getClusters: Identify clusters containing high-confidence substitutions...
In wavClusteR: Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data

Description Usage Arguments Value Note Author(s) References See Also Examples

Identifies clusters using the mini-rank norm (MRN) algorithm, which employs thresholding of background coverage differences and finds the optimal cluster boundaries by exhaustively evaluating all putative clusters using a rank-based approach. This method has higher sensitivity and an approximately 10-fold faster running time than the CWT-based cluster identification algorithm.

1 2	getClusters(highConfSub, coverage, sortedBam, cores = 1, threshold)

`highConfSub`	GRanges object containing high-confidence substitution sites as returned by the getHighConfSub function
`coverage`	An Rle object containing the coverage at each genomic position as returned by a call to coverage
`sortedBam`	a GRanges object containing all aligned reads, including read sequence (qseq) and MD tag (MD), as returned by the readSortedBam function
`cores`	integer, the number of cores to be used for parallel evaluation. Default is 1.
`threshold`	numeric, the difference in coverage to be considered noise. If not specified, a Gaussian mixture model is used to learn a threshold from the data. Empirically, 10% of the minimum coverage required at substitutions (see argument `minCov` in the getHighConfSub function) might suffice to provide highly resolved clusters. However, if `minCov` is much lower than the median strand-specific coverage at substitutions m, which can be computed using `summary(elementMetadata(highConfSub)[, 'coverage'])['Median']`), 10% of m might represent an optimal choice.

GRanges object containing the identified cluster boundaries.

Clusters returned by this function need to be further merged by the function filterClusters, which also computes all relevant cluster statistics.

Federico Comoglio and Cem Sievers

Sievers C, Schlumpf T, Sawarkar R, Comoglio F and Paro R. (2012) Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data, Nucleic Acids Res. 40(20):e160. doi: 10.1093/nar/gks697

Comoglio F, Sievers C and Paro R (2015) Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data, BMC Bioinformatics 16, 32.

getHighConfSub, filterClusters

filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
example <- readSortedBam( filename = filename )
countTable <- getAllSub( example, minCov = 10, cores = 1 )
highConfSub <- getHighConfSub( countTable, supportStart = 0.2, supportEnd = 0.7, substitution = "TC" )
coverage <- coverage( example )
clusters <- getClusters( highConfSub = highConfSub, 
                         coverage = coverage, 
                         sortedBam = example, 
	                 cores = 1, 
	                 threshold = 2 )