Description Usage Arguments Value Note Author(s) References See Also Examples
View source: R/filterClusters.R
If clusters have been identified using the mini-rank norm algorithm, cluster statistics are computed. In contrast, if the CWT-based cluster identification algorithm was used, clusters are first filtered to retain only those instances containing a wavelet peak and a high-confidence substitution site within their cluster boundaries.
1 2 | filterClusters(clusters, highConfSub, coverage, model, genome,
refBase = 'T', minWidth = 12, verbose = TRUE)
|
clusters |
GRanges object containing individual clusters as identified by the getClusters function |
highConfSub |
GRanges object containing high-confidence substitution sites as returned by the getHighConfSub function |
coverage |
An Rle object containing the coverage at each genomic position as returned by a call to coverage |
model |
List of 5 items containing the estimated mixture model as returned by the fitMixtureModel function |
genome |
BSgenome object of the relevant reference genome (e.g.
|
refBase |
A character specifying the base in the reference genome for
which transitions are experimentally induced (e.g. 4-SU treatment - a
standard in PAR-CLIP experiments - induces T to C transitions and hence
|
minWidth |
An integer corresponding to the minimum width of reported
clusters. Shorter clusters are extended to |
verbose |
Logical, if TRUE processing steps are printed |
GRanges object containing the transcriptome-wide identified clusters, having metadata:
Ntransitions |
The number of high-confidence transitions within the cluster |
MeanCov |
The mean coverage within the cluster |
NbasesInRef |
The number of genomic positions within the
cluster corresponding to |
CrossLinkEff |
The
crosslinking efficiency within the cluster, estimated as the ratio between
the number of high-confidence transitions within the cluster and the total
number of genomic positions therein corresponding to |
Sequence |
The genomic sequence undelying the cluster (plus strand) |
SumLogOdds |
The sum of the log-odd values within the cluster |
RelLogOdds |
The sum of the log-odds divided by the number of high-confidence transitions within the cluster. This variable can be regarded as a proxy for statistical significance and can be therefore used to rank clusters. See Comoglio, Sievers and Paro for details. |
1) This function calls the appropriate processing function according
to the method used to compute clusters. This information is stored in the
metadata(ranges(clusters))
slot as an object of type list.
2) Notice that genome
corresponds to the according reference genome
matching the organism in which experiments have been carried out. For
example genome = Hsapiens
is used for the human reference genome
(assembly 19), where Hsapiens
is provided by
BSgenome.Hsapiens.UCSC.hg19
.
Federico Comoglio and Cem Sievers
Herve Pages, BSgenome: Infrastructure for Biostrings-based genome data packages
Sievers C, Schlumpf T, Sawarkar R, Comoglio F and Paro R. (2012) Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data, Nucleic Acids Res. 40(20):e160. doi: 10.1093/nar/gks697
Comoglio F, Sievers C and Paro R (2015) Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data, BMC Bioinformatics 16, 32.
getClusters
, getHighConfSub
,
fitMixtureModel
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | require(BSgenome.Hsapiens.UCSC.hg19)
data( model, package = "wavClusteR" )
filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
example <- readSortedBam( filename = filename )
countTable <- getAllSub( example, minCov = 10, cores = 1 )
highConfSub <- getHighConfSub( countTable, supportStart = 0.2, supportEnd = 0.7, substitution = "TC" )
coverage <- coverage( example )
clusters <- getClusters( highConfSub = highConfSub,
coverage = coverage,
sortedBam = example,
cores = 1,
threshold = 2 )
fclusters <- filterClusters( clusters = clusters,
highConfSub = highConfSub,
coverage = coverage,
model = model,
genome = Hsapiens,
refBase = 'T',
minWidth = 12 )
fclusters
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.