View source: R/anomIdentifyLowQuality.R
anomIdentifyLowQuality | R Documentation |
Identify low-quality samples for which the false positive rate for anomaly detection is likely to be high. Measures of noise (high variance) and high segmentation are used.
anomIdentifyLowQuality(snp.annot, med.sd, seg.info,
  sd.thresh, sng.seg.thresh, auto.seg.thresh)
snp.annot |
|
med.sd |
data.frame of median standard deviation of BAlleleFrequency (BAF) or LogRRatio (LRR) values across autosomes for each scan, with
columns "scanID" and "med.sd". Usually the result of
|
seg.info |
data.frame with segmentation information from |
sd.thresh |
Threshold for |
sng.seg.thresh |
Threshold for segmentation factor for a given chromosome, above which the chromosome is said to be highly segmented. See Details. Suggested values are 0.0008 for BAF and 0.0048 for LOH. |
auto.seg.thresh |
Threshold for segmentation factor across autosomes, above which the scan is said to be highly segmented. See Details. Suggested values are 0.0001 for BAF and 0.0006 for LOH. |
Low quality samples are determined separately with regard to each of the two methods of segmentation, anomDetectBAF and anomDetectLOH. BAF anomalies (respectively LOH anomalies) found for samples identified as low quality for BAF (respectively LOH) tend to have a high false positive rate.
A scan is identified as low quality due to high variance (noise), i.e. if med.sd is above a certain threshold sd.thresh.
High segmentation is often an indication of artifactual patterns in the B Allele Frequency (BAF) or Log R Ratio (LRR) values that are not always captured by high variance. Here segmentation information is determined by anomDetectBAF or anomDetectLOH, which use circular binary segmentation as implemented in the R package DNAcopy.
The measure for high segmentation is a "segmentation factor" = (number of segments)/(number of eligible SNPs). A single chromosome segmentation factor uses information for one chromosome. A segmentation factor across autosomes uses the total number of segments and eligible SNPs across all autosomes.
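The segmentation factor defined above can be illustrated with a small sketch. The data frame `seg`, its column names `num.segs` and `num.elig`, and all the counts are made up for illustration; they are not the structure of `seg.info` or the internals of anomIdentifyLowQuality. The thresholds are the suggested BAF values from the argument descriptions.

```r
# Hypothetical per-scan, per-chromosome summary (toy values, not real data):
# number of segments found and number of eligible SNPs on each autosome.
seg <- data.frame(
  scanID     = c(1, 1, 1, 2, 2, 2),
  chromosome = c(1, 2, 3, 1, 2, 3),
  num.segs   = c(1, 1, 1, 40, 2, 1),
  num.elig   = c(40000, 35000, 30000, 40000, 35000, 30000)
)

# Single chromosome segmentation factor:
# (number of segments) / (number of eligible SNPs) on that chromosome.
seg$fac <- seg$num.segs / seg$num.elig

# Segmentation factor across autosomes:
# total segments / total eligible SNPs, summed over all autosomes per scan.
tot <- aggregate(cbind(num.segs, num.elig) ~ scanID, data = seg, FUN = sum)
tot$fac.all.auto <- tot$num.segs / tot$num.elig

# Suggested BAF thresholds from the argument descriptions.
sng.seg.thresh  <- 0.0008
auto.seg.thresh <- 0.0001

# Chromosomes and scans that exceed the respective thresholds.
high.chrom <- seg[seg$fac > sng.seg.thresh, c("scanID", "chromosome", "fac")]
high.scan  <- tot$scanID[tot$fac.all.auto > auto.seg.thresh]
```

In this toy data only scan 2 is highly segmented, both on chromosome 1 (40/40000 = 0.001) and across autosomes (43/105000, roughly 0.0004); scan 1 stays below both thresholds.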
See med.sd, sd.thresh, sng.seg.thresh, and auto.seg.thresh.
A data.frame with the following columns:
scanID |
integer id for the scan |
chrX.num.segs |
number of segments for chromosome X |
chrX.fac |
segmentation factor for chromosome X |
max.autosome |
autosome with highest single segmentation factor |
max.auto.fac |
segmentation factor for chromosome = |
max.auto.num.segs |
number of segments for chromosome = |
num.ch.segd |
number of chromosomes segmented, i.e. for which change points were found |
fac.all.auto |
segmentation factor across all autosomes |
med.sd |
median standard deviation of BAF (or LRR values) across autosomes. See |
type |
one of the following, indicating reason for identification as low quality:
|
Cecelia Laurie
findBAFvariance, anomDetectBAF, anomDetectLOH
library(GWASdata)
data(illuminaScanADF, illuminaSnpADF)
blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <- IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)
genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <- GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)
# initial scan for low quality with median SD
baf.sd <- sdByScanChromWindow(blData, genoData)
med.baf.sd <- medianSdOverAutosomes(baf.sd)
low.qual.ids <- med.baf.sd$scanID[med.baf.sd$med.sd > 0.05]
# segment and filter BAF
scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]
data(centromeres.hg18)
anom <- anomDetectBAF(blData, genoData, scan.ids=scan.ids, chrom.ids=chrom.ids,
  snp.ids=snp.ids, centromere=centromeres.hg18, low.qual.ids=low.qual.ids)
# further screen for low quality scans
snp.annot <- illuminaSnpADF
snp.annot$eligible <- snp.annot$missing.n1 < 1
low.qual <- anomIdentifyLowQuality(snp.annot, med.baf.sd, anom$seg.info,
  sd.thresh=0.1, sng.seg.thresh=0.0008, auto.seg.thresh=0.0001)
close(blData)
close(genoData)