View source: R/anomIdentifyLowQuality.R
Identifies low-quality samples for which the false positive rate of anomaly detection is likely to be high. Two measures are used: noise (high variance) and high segmentation.
anomIdentifyLowQuality(snp.annot, med.sd, seg.info,
  sd.thresh, sng.seg.thresh, auto.seg.thresh)
snp.annot |
med.sd | data.frame of the median standard deviation of BAlleleFrequency (BAF) or LogRRatio (LRR) values across autosomes for each scan, with columns "scanID" and "med.sd". Usually the result of medianSdOverAutosomes.
seg.info | data.frame with segmentation information from anomDetectBAF or anomDetectLOH.
sd.thresh | Threshold for med.sd, above which a scan is identified as low quality. See Details.
sng.seg.thresh | Threshold for the segmentation factor of a given chromosome, above which the chromosome is said to be highly segmented. See Details. Suggested values are 0.0008 for BAF and 0.0048 for LOH.
auto.seg.thresh | Threshold for the segmentation factor across autosomes, above which the scan is said to be highly segmented. See Details. Suggested values are 0.0001 for BAF and 0.0006 for LOH.
Low quality samples are determined separately for each of the two methods of segmentation, anomDetectBAF and anomDetectLOH. BAF anomalies (respectively LOH anomalies) found for samples identified as low quality for BAF (respectively LOH) tend to have a high false positive rate.
A scan is identified as low quality due to high variance (noise) if its med.sd is above the threshold sd.thresh.
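The variance rule above can be sketched in a few lines of base R. This is only an illustration on made-up values, not the package's internal code: the scan IDs and med.sd values are hypothetical, and the threshold of 0.1 is simply the value used in the Examples.

```r
# Hypothetical toy input shaped like the med.sd argument:
# one row per scan, with columns "scanID" and "med.sd".
med.sd <- data.frame(scanID = c(101L, 102L, 103L),
                     med.sd = c(0.04, 0.12, 0.09))
sd.thresh <- 0.1  # illustrative threshold, as in the Examples

# A scan is flagged as noisy when its median SD exceeds the threshold.
high.sd <- med.sd$med.sd > sd.thresh
noisy.ids <- med.sd$scanID[high.sd]
noisy.ids
```

Only the scan whose med.sd exceeds sd.thresh is selected; a scan exactly at the threshold would not be flagged under this strict comparison, which is an assumption of the sketch.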
High segmentation is often an indication of artifactual patterns in the B Allele Frequency (BAF) or Log R Ratio (LRR) values that are not always captured by high variance. Here segmentation information is determined by anomDetectBAF or anomDetectLOH, which use circular binary segmentation as implemented in the R package DNAcopy.
The measure of high segmentation is a "segmentation factor" = (number of segments)/(number of eligible SNPs). A single-chromosome segmentation factor uses the segments and eligible SNPs of one chromosome. A segmentation factor across autosomes uses the total number of segments and eligible SNPs across all autosomes.
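As a rough illustration of the two segmentation factors, the following base R sketch computes them for a single scan. The per-chromosome counts and the column names num.segs and num.eligible are hypothetical, chosen only for this example; the thresholds are the BAF values suggested in the Arguments.

```r
# Hypothetical per-chromosome counts for one scan (not real data).
segs <- data.frame(chromosome   = c(1L, 2L, 3L),
                   num.segs     = c(1L, 4L, 1L),
                   num.eligible = c(5000L, 4000L, 3000L))

# Single-chromosome factor: segments / eligible SNPs on that chromosome.
segs$fac <- segs$num.segs / segs$num.eligible

# Factor across autosomes: total segments / total eligible SNPs.
fac.all.auto <- sum(segs$num.segs) / sum(segs$num.eligible)

# Suggested BAF thresholds from the Arguments.
sng.seg.thresh  <- 0.0008
auto.seg.thresh <- 0.0001

# Chromosomes and scan considered highly segmented under these thresholds.
highly.segmented.chroms <- segs$chromosome[segs$fac > sng.seg.thresh]
scan.highly.segmented   <- fac.all.auto > auto.seg.thresh
```

In this toy case chromosome 2 has a factor of 4/4000 = 0.001, which exceeds sng.seg.thresh, and the factor across the three chromosomes is 6/12000 = 0.0005, which exceeds auto.seg.thresh.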
See med.sd, sd.thresh, sng.seg.thresh, and auto.seg.thresh.
A data.frame with the following columns:
scanID | integer id for the scan
chrX.num.segs | number of segments for chromosome X
chrX.fac | segmentation factor for chromosome X
max.autosome | autosome with highest single segmentation factor
max.auto.fac | segmentation factor for chromosome = max.autosome
max.auto.num.segs | number of segments for chromosome = max.autosome
num.ch.segd | number of chromosomes segmented, i.e. for which change points were found
fac.all.auto | segmentation factor across all autosomes
med.sd | median standard deviation of BAF (or LRR) values across autosomes. See the med.sd argument.
type | one of the following, indicating reason for identification as low quality:
Author(s): Cecelia Laurie
See Also: findBAFvariance, anomDetectBAF, anomDetectLOH
library(GWASTools)
library(GWASdata)
data(illuminaScanADF, illuminaSnpADF)

# open the BAF/LRR intensity data
blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <- IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

# open the genotype data
genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <- GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

# initial scan for low quality with median SD
baf.sd <- sdByScanChromWindow(blData, genoData)
med.baf.sd <- medianSdOverAutosomes(baf.sd)
low.qual.ids <- med.baf.sd$scanID[med.baf.sd$med.sd > 0.05]

# segment and filter BAF
scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]
data(centromeres.hg18)
anom <- anomDetectBAF(blData, genoData, scan.ids=scan.ids, chrom.ids=chrom.ids,
  snp.ids=snp.ids, centromere=centromeres.hg18, low.qual.ids=low.qual.ids)

# further screen for low quality scans
snp.annot <- illuminaSnpADF
snp.annot$eligible <- snp.annot$missing.n1 < 1
low.qual <- anomIdentifyLowQuality(snp.annot, med.baf.sd, anom$seg.info,
  sd.thresh=0.1, sng.seg.thresh=0.0008, auto.seg.thresh=0.0001)

# release the file connections
close(blData)
close(genoData)