View source: R/otimalBinsize.R
optimalBinsize | R Documentation |
Calculates Akaike's information criterion (AIC) and the cross-validation (CV) log-likelihood to infer the optimal bin size for partitioning read counts across the genome.
optimalBinsize(bamfiles = NULL, bamnames = NULL, pathToBams = NULL,
binSizes = c(10, 30, 50, 100, 250, 500, 750, 1000), measure = "CV",
lineColor = "red4", chromosomesFilter = c("X", "Y", "M", "MT"),
savePlot = FALSE, plotPrefix = "optimalBinsize", minMapq = 20,
isPaired = NA, isProperPair = NA, isUnmappedQuery = FALSE,
hasUnmappedMate = NA, isMinusStrand = NA, isMateMinusStrand = NA,
isFirstMateRead = NA, isSecondMateRead = NA, isSecondaryAlignment = NA,
isDuplicate = FALSE)
bamfiles |
A |
bamnames |
An optional |
pathToBams |
If |
binSizes |
A |
measure |
The goodness-of-fit criterion (AIC or CV). Defaults to "CV". |
lineColor |
Line color to use in plot. |
chromosomesFilter |
A |
savePlot |
If TRUE, saves plots of each sample to the working directory. Defaults to FALSE. |
plotPrefix |
Prefix for plot title and pdf file name. Defaults to "optimalBinsize". |
minMapq |
If quality scores exist, the minimum quality score required to keep a read. Defaults to 20. |
isPaired |
A |
isProperPair |
A |
isUnmappedQuery |
A |
hasUnmappedMate |
A |
isMinusStrand |
A |
isMateMinusStrand |
A |
isFirstMateRead |
A |
isSecondMateRead |
A |
isSecondaryAlignment |
A |
isDuplicate |
A |
As a guide, choose bin sizes that have low AIC and/or high CV log-likelihood values and that also contain 30-180 reads on average. This strikes a reasonable balance between error variability and bias in the CNA analysis. A much smaller bin size may leave many genomic regions with zero read counts and make the overall analysis uninformative. At the other extreme, a much larger bin size will 'smooth out' some patterns of alteration (i.e. increase bias). The optimal bin size is estimated in the context of low-coverage sequence data, so use sensible values for the binSizes argument when the input data is not of shallow whole-genome depth (<10 million reads).
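The 30-180 reads-per-bin guideline can be checked with a rough calculation before running the function. The sketch below is illustrative only and is not part of the package: the helper expectedReadsPerBin is hypothetical, and it assumes that bin sizes are given in kbp, that the genome is roughly 3.1e9 bp long, and that reads are spread uniformly across it.

```r
# Hypothetical helper (not part of the package): back-of-envelope estimate of
# the average number of reads per bin, assuming reads are spread uniformly.
expectedReadsPerBin <- function(totalReads, binSizesKb, genomeSizeBp = 3.1e9) {
    totalReads * (binSizesKb * 1000) / genomeSizeBp
}

# Default candidate bin sizes from the usage above (assumed here to be in kbp)
binSizes <- c(10, 30, 50, 100, 250, 500, 750, 1000)

# Example: a shallow whole-genome sample with 5 million reads
avgReads <- expectedReadsPerBin(totalReads = 5e6, binSizesKb = binSizes)

# Keep only the bin sizes whose expected average falls within 30-180 reads
candidates <- binSizes[avgReads >= 30 & avgReads <= 180]

print(data.frame(binSize = binSizes, avgReads = round(avgReads, 1)))
print(candidates)
```

Under these assumptions, only a subset of the default bin sizes passes the filter. The AIC or CV curves reported by the function would then be used to choose among them.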
Returns a list. The first element is a data.frame holding the average read counts per bin size; the other elements are sample-specific ggplot objects.
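Because the first element differs in type from the rest, it is convenient to split the result by position. The sketch below uses a mock list in place of a real return value. The column names, sample names and placeholder strings are hypothetical and stand in for the actual data.frame and ggplot objects.

```r
# Mock of the returned structure: a data.frame first, then one entry per sample.
# Column names and sample names are hypothetical; strings stand in for ggplot objects.
res <- list(
    data.frame(binSize = c(10, 30, 50), avgReads = c(16.1, 48.4, 80.6)),
    sampleA = "plot placeholder",
    sampleB = "plot placeholder"
)

# First element: table of average read counts per bin size
avgTable <- res[[1]]

# Remaining elements: the sample-specific plots
plots <- res[-1]

print(avgTable)
print(names(plots))
```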
Dineika Chandrananda
Internally, the function opt.win.onesample of the NGSoptwin package is used.
## Not run:
vignette("CNAclinic")
## End(Not run)