filterIntervals: Remove low quality intervals

View source: R/filterIntervals.R

filterIntervals    R Documentation

Remove low quality intervals

Description

This function determines which intervals in the coverage files should be included in or excluded from the segmentation. It is called via the fun.filterIntervals argument of runAbsoluteCN; its arguments are passed via args.filterIntervals.

Usage

filterIntervals(
  normal,
  tumor,
  log.ratio,
  seg.file,
  filter.lowhigh.gc = 0.001,
  min.coverage = 15,
  min.total.counts = 100,
  min.targeted.base = 5,
  min.mappability = c(0.6, 0.1),
  min.fraction.offtarget = 0.05,
  normalDB = NULL
)

Arguments

normal

Coverage data for normal sample.

tumor

Coverage data for tumor sample.

log.ratio

Copy number log-ratios, one for each interval in the coverage file.

seg.file

If not NULL, then do not filter intervals, because data is already segmented via the provided segmentation file.

filter.lowhigh.gc

Quantile q (defines lower q and upper 1-q) for removing intervals with outlier GC profile. The assumption is that GC correction might not have worked on those intervals. Requires interval.file.

min.coverage

Minimum coverage in both normal and tumor. Intervals with lower coverage are ignored. If a normalDB is provided, then this database already provides information about low quality intervals and the min.coverage is set to min.coverage/10000.

min.total.counts

Exclude intervals with fewer than this many reads in tumor and normal combined.

min.targeted.base

Exclude intervals with targeted base (size in bp) smaller than this cutoff. This is useful when the same interval file was used to calculate GC content. For such small targets, the GC content is likely very different from the true GC content of the probes.

min.mappability

double(2) specifying the minimum mappability score for on-target and off-target intervals, in that order.

min.fraction.offtarget

Skip off-target regions when fewer than the specified fraction of all intervals pass all filters.

normalDB

Normal database, created with createNormalDatabase.
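
The threshold arguments above can be read as independent per-interval tests whose results are combined with a logical AND. The following is a minimal base R sketch of that idea on made-up toy data. The data frame columns and the helper toy_filter are hypothetical illustrations, not part of PureCN, and the actual implementation may differ (for example, the normalDB adjustment of min.coverage, min.targeted.base and min.fraction.offtarget are not modeled here). A large GC quantile of 0.1 is used only so that the effect is visible on six intervals.

```r
# Toy per-interval annotations (hypothetical values, not PureCN data structures)
intervals <- data.frame(
  gc          = c(0.10, 0.45, 0.50, 0.55, 0.95, 0.48),
  cov.normal  = c(40, 50, 5, 60, 45, 55),
  cov.tumor   = c(35, 60, 30, 70, 50, 65),
  counts      = c(300, 400, 90, 500, 350, 450),  # tumor + normal reads
  mappability = c(0.9, 0.95, 0.8, 0.3, 0.9, 0.2),
  on.target   = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
)

# Hypothetical helper mimicking a subset of the documented thresholds
toy_filter <- function(x, filter.lowhigh.gc = 0.001, min.coverage = 15,
                       min.total.counts = 100,
                       min.mappability = c(0.6, 0.1)) {
  # Keep intervals whose GC content lies between the q and 1-q quantiles
  gc.range <- quantile(x$gc, c(filter.lowhigh.gc, 1 - filter.lowhigh.gc))
  ok.gc <- x$gc >= gc.range[[1]] & x$gc <= gc.range[[2]]
  # Coverage must reach the minimum in both normal and tumor
  ok.cov <- x$cov.normal >= min.coverage & x$cov.tumor >= min.coverage
  # Combined read count must reach the minimum
  ok.counts <- x$counts >= min.total.counts
  # Separate mappability cutoffs for on-target and off-target intervals
  min.map <- ifelse(x$on.target, min.mappability[1], min.mappability[2])
  ok.map <- x$mappability >= min.map
  ok.gc & ok.cov & ok.counts & ok.map
}

use <- toy_filter(intervals, filter.lowhigh.gc = 0.1)
```

In this sketch, an interval is kept only if it passes every test, so a single failed criterion is enough to exclude it.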

Value

logical(length(log.ratio)) specifying which intervals should be used in segmentation.
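
Because the return value is a logical vector aligned with log.ratio, it can be used directly for subsetting. A brief base R illustration with hypothetical values (both vectors below are made up and not produced by PureCN):

```r
# Hypothetical log-ratios and a matching filter result
log.ratio <- c(-0.2, 0.1, 1.5, 0.0, -0.9)
use <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# The filter must have one entry per interval
stopifnot(length(use) == length(log.ratio))

# Log-ratios of the intervals that would enter segmentation
log.ratio.used <- log.ratio[use]

# Number of intervals that were filtered out
n.removed <- sum(!use)
```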

Author(s)

Markus Riester

Examples


normal.coverage.file <- system.file("extdata", "example_normal.txt.gz",
    package = "PureCN")
normal2.coverage.file <- system.file("extdata", "example_normal2.txt.gz",
    package = "PureCN")
normal.coverage.files <- c(normal.coverage.file, normal2.coverage.file)
normalDB <- createNormalDatabase(normal.coverage.files)

tumor.coverage.file <- system.file("extdata", "example_tumor.txt.gz",
    package = "PureCN")
vcf.file <- system.file("extdata", "example.vcf.gz",
    package = "PureCN")
interval.file <- system.file("extdata", "example_intervals.txt",
    package = "PureCN")

# The max.candidate.solutions, max.ploidy and test.purity parameters are set to
# non-default values to speed up this example.  This is not a good idea for real
# samples.
ret <- runAbsoluteCN(normal.coverage.file = normal.coverage.file,
    tumor.coverage.file = tumor.coverage.file,
    genome = "hg19", vcf.file = vcf.file, normalDB = normalDB,
    sampleid = "Sample1", interval.file = interval.file,
    args.filterIntervals = list(min.targeted.base = 10), max.ploidy = 4,
    test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1)


lima1/PureCN documentation built on Sept. 17, 2024, 5:48 a.m.