Description
This function tries to obtain the minimum number of components needed in an
FFT filter to reach, or get as close as possible to, a given correlation
value. You usually don't need to call this function directly; filterFFT
uses it by default.
Usage
pcKeepCompDetect(
  data,
  pc.min = 0.01,
  pc.max = 0.1,
  max.iter = 20,
  verbose = FALSE,
  cor.target = 0.98,
  cor.tol = 0.001,
  smpl.num = 25,
  smpl.min.size = 2^10,
  smpl.max.size = 2^14
)
Arguments
data: Numeric vector to be filtered.
pc.min, pc.max: Range of allowed values for
max.iter: Maximum number of iterations.
verbose: Print extra information (for debugging).
cor.target: Target correlation between the filtered and the original profiles. A value around 0.99 is recommended for Next Generation Sequencing data and around 0.7 for Tiling Arrays.
cor.tol: Tolerance allowed between the obtained correlation and the target one.
smpl.num: If
smpl.min.size, smpl.max.size: Minimum and maximum size of the samples. These are used for the selection and sub-selection of ranges with meaningful values (i.e. different from 0 and NA). Powers of 2 are recommended, although not mandatory.
...: Parameters to be passed to
Details
This function predicts a suitable pcKeepComp value for the filterFFT
function, that is, the recommended proportion of components to keep in
filterFFT to obtain a correlation equal to, or close to, cor.target.
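To make "proportion of components to keep" concrete, here is a minimal low-pass FFT filter in base R. The helper fft_keep() is hypothetical and is not nucleR's filterFFT. It assumes pc is the fraction of positive frequencies retained (starting from the DC term), together with their mirrored negative frequencies. filterFFT may count components differently.

```r
# Hypothetical toy low-pass filter (not nucleR's filterFFT).
# Keeps the k lowest frequency bins (DC term included) plus their mirrored
# negative-frequency bins, zeroes the rest and inverts the FFT.
fft_keep <- function(x, pc) {
  n <- length(x)
  spec <- fft(x)
  k <- max(1, round(pc * n / 2))          # number of low-frequency bins to keep
  keep <- rep(FALSE, n)
  keep[seq_len(k)] <- TRUE                # DC term and positive frequencies
  if (k > 1) keep[(n - k + 2):n] <- TRUE  # matching negative frequencies
  spec[!keep] <- 0
  Re(fft(spec, inverse = TRUE)) / n       # base R's inverse FFT is unnormalised
}

# Usage: keeping more components gives a profile closer to the original
set.seed(1)
x <- sin(2 * pi * (1:1024) / 64) + rnorm(1024, sd = 0.5)
cor(x, fft_keep(x, 0.01))
cor(x, fft_keep(x, 0.10))
```

Keeping positive and negative frequencies symmetrically keeps the filtered spectrum conjugate-symmetric, so the inverse transform is real apart from rounding noise.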
The search starts from the two given values pc.min and pc.max and uses
linear interpolation to quickly reach a value that gives a correlation
between the filtered and the original profiles near cor.target, within the
specified tolerance cor.tol.
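One way to picture this search is a false-position loop. It keeps a lower and an upper bound on pc, draws a straight line between their correlations, and evaluates the pc where that line reaches cor.target. The sketch below is a hypothetical illustration, not nucleR's implementation: detect_pc() and the toy filter fft_keep() are assumed names. It also assumes that correlation grows with pc. That holds for this toy filter, because keeping more Fourier components projects onto a larger subspace.

```r
# Hypothetical toy low-pass filter (not nucleR's filterFFT)
fft_keep <- function(x, pc) {
  n <- length(x)
  spec <- fft(x)
  k <- max(1, round(pc * n / 2))
  keep <- rep(FALSE, n)
  keep[seq_len(k)] <- TRUE
  if (k > 1) keep[(n - k + 2):n] <- TRUE
  spec[!keep] <- 0
  Re(fft(spec, inverse = TRUE)) / n
}

# Hypothetical sketch of a linear-interpolation (false position) search for pc.
# Assumes the correlation with the original grows as pc grows.
detect_pc <- function(x, pc.min = 0.01, pc.max = 0.1, cor.target = 0.98,
                      cor.tol = 0.001, max.iter = 20) {
  score <- function(pc) {
    r <- suppressWarnings(cor(x, fft_keep(x, pc)))
    if (is.na(r)) 0 else r               # constant output: treat as no correlation
  }
  lo <- pc.min; c.lo <- score(lo)
  hi <- pc.max; c.hi <- score(hi)
  # Target outside what the allowed range can reach: return the nearest bound
  if (cor.target <= c.lo) return(lo)
  if (cor.target >= c.hi) return(hi)
  best <- if (cor.target - c.lo < c.hi - cor.target) lo else hi
  best.err <- min(cor.target - c.lo, c.hi - cor.target)
  for (i in seq_len(max.iter)) {
    if (best.err <= cor.tol) break
    # Straight line through (lo, c.lo) and (hi, c.hi), solved for cor.target
    pc <- lo + (cor.target - c.lo) * (hi - lo) / (c.hi - c.lo)
    c.pc <- score(pc)
    if (abs(c.pc - cor.target) < best.err) {
      best <- pc
      best.err <- abs(c.pc - cor.target)
    }
    # Keep the target bracketed between lo and hi
    if (c.pc < cor.target) { lo <- pc; c.lo <- c.pc } else { hi <- pc; c.hi <- c.pc }
  }
  best
}

set.seed(1)
x <- sin(2 * pi * (1:4096) / 128) + rnorm(4096, sd = 0.3)
pc <- detect_pc(x, cor.target = 0.9)
cor(x, fft_keep(x, pc))
```

Because the target always stays between the two bounds, the denominator in the interpolation step never reaches zero. The loop returns the best value seen if cor.tol is not met within max.iter iterations.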
To allow quick detection without an exhaustive search, this function works
on a subset of the data. It randomly samples regions with meaningful
coverage values (i.e. different from 0 or NA) that are longer than
smpl.min.size.
If a region of smpl.max.size cannot be obtained (for example, because of
flanking zeros), a region of at least smpl.min.size is used to check the
correlation. The mean correlation across all sampled regions is used to
evaluate the pcKeepComp parameter.
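The sampling step can be sketched in base R with rle(). The sketch finds runs of meaningful values, keeps only runs of at least smpl.min.size, and draws random windows of up to smpl.max.size from them. The helpers sample_windows() and mean_sample_cor() are hypothetical and are not part of nucleR. The filter_fun argument stands in for whatever filter is being tuned. Sampling regions with replacement is also an assumption of this sketch.

```r
# Hypothetical sketch of the sampling step (not nucleR code).
sample_windows <- function(x, smpl.num = 25, smpl.min.size = 2^10,
                           smpl.max.size = 2^14) {
  runs <- rle(!is.na(x) & x != 0)        # runs of meaningful / non-meaningful values
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  ok <- runs$values & runs$lengths >= smpl.min.size
  starts <- starts[ok]
  ends <- ends[ok]
  if (length(starts) == 0) return(list())
  picks <- sample.int(length(starts), smpl.num, replace = TRUE)
  lapply(picks, function(r) {
    len <- ends[r] - starts[r] + 1
    size <- min(len, smpl.max.size)      # short run: use all of it (>= smpl.min.size)
    offset <- sample.int(len - size + 1, 1) - 1
    seq(starts[r] + offset, length.out = size)
  })
}

# Mean correlation between each sampled window and its filtered version
mean_sample_cor <- function(x, windows, filter_fun) {
  mean(vapply(windows, function(idx) cor(x[idx], filter_fun(x[idx])), numeric(1)))
}

# Usage with a running median as a stand-in filter
set.seed(1)
prof <- c(rep(0, 200), rpois(5000, 10) + 1, rep(NA, 100))
w <- sample_windows(prof, smpl.num = 5, smpl.min.size = 1024, smpl.max.size = 2048)
mean_sample_cor(prof, w, function(v) runmed(v, 11))
```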
If the number of meaningful bases in data is less than
smpl.min.size * (smpl.num/2), the whole data vector is used instead of
sampling.
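As a worked example of this threshold, the default arguments give 2^10 * (25/2) = 12800 meaningful values. The hypothetical helper use_sampling() below restates the rule as a check. It is an illustration, not a nucleR function.

```r
# Hypothetical helper restating the rule: sampling is used only when the number
# of meaningful (non-NA, non-zero) values is at least smpl.min.size * (smpl.num / 2)
use_sampling <- function(x, smpl.num = 25, smpl.min.size = 2^10) {
  n.meaningful <- sum(!is.na(x) & x != 0)
  n.meaningful >= smpl.min.size * (smpl.num / 2)
}

2^10 * (25 / 2)                  # default threshold: 12800
use_sampling(rep(1, 12800))      # enough meaningful values: sample
use_sampling(rep(1, 12799))      # too few: use the whole vector
```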
Value
Fitted pcKeepComp value.
Author(s)
Oscar Flores oflores@mmb.pcb.ub.es, David Rosell david.rosell@irbbarcelona.org
Examples
library(nucleR)
library(ggplot2)

# Load the example dataset as a plain numeric coverage vector
data(nucleosome_htseq)
data <- as.vector(coverage.rpm(nucleosome_htseq)[[1]])

# Get the recommended pcKeepComp value
pckeepcomp <- pcKeepCompDetect(data, cor.target = 0.99)
print(pckeepcomp)

# Call filterFFT with the detected value
f1 <- filterFFT(data, pcKeepComp = pckeepcomp)

# Alternatively, let filterFFT detect the value itself
f2 <- filterFFT(data, pcKeepComp = "auto", cor.target = 0.99)

# Plot the first 2000 positions of each profile
i <- 1:2000
plot_data <- rbind(
  data.frame(x = i, y = data[i], coverage = "original"),
  data.frame(x = i, y = f1[i], coverage = "two calls"),
  data.frame(x = i, y = f2[i], coverage = "one call")
)
ggplot(plot_data, aes(x = x, y = y, color = coverage)) +
  geom_line() +
  labs(x = "position", y = "coverage")