GenotypeData-class | R Documentation
The GenotypeData class is a container for storing genotype data from a genome-wide association study together with the metadata associated with the subjects and SNPs involved in the study.
The GenotypeData class consists of three slots: data, SNP annotation, and scan annotation. There may be multiple scans associated with a subject (e.g., duplicate scans for quality control), hence the use of "scan" as one dimension of the data. SNP and scan annotation are optional, but if they are included in the GenotypeData object, their unique integer IDs (snpID and scanID) are checked against the IDs stored in the data slot to ensure consistency.
GenotypeData(data, snpAnnot=NULL, scanAnnot=NULL):
data must be an NcdfGenotypeReader, GdsGenotypeReader, or MatrixGenotypeReader object.
snpAnnot, if not NULL, must be a SnpAnnotationDataFrame or SnpAnnotationSQLite object.
scanAnnot, if not NULL, must be a ScanAnnotationDataFrame or ScanAnnotationSQLite object.
The GenotypeData constructor creates and returns a GenotypeData instance, ensuring that data, snpAnnot, and scanAnnot are internally consistent.
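As a minimal sketch of the constructor, the following builds a GenotypeData object from an in-memory matrix through MatrixGenotypeReader, assuming the GWASTools package is installed. The genotype values, IDs, and positions are made up for illustration; the annotation data frames reuse the same IDs as the reader so that the consistency check should pass.

```r
library(GWASTools)

# Toy data (illustrative values only): 6 SNPs x 3 scans,
# genotypes coded as the number of A alleles.
snpID  <- 1:6
chrom  <- rep(1:2, each = 3)   # integer chromosome codes
pos    <- 101:106              # integer base-pair positions
scanID <- 1:3
geno   <- matrix(rep(c(0, 1, 2), 6), nrow = 6, ncol = 3)

mgr <- MatrixGenotypeReader(genotype = geno, snpID = snpID,
                            chromosome = chrom, position = pos,
                            scanID = scanID)

# Annotation built from the same IDs as the reader, so the two agree.
snpAnnot <- SnpAnnotationDataFrame(
  data.frame(snpID = snpID, chromosome = chrom, position = pos))
scanAnnot <- ScanAnnotationDataFrame(
  data.frame(scanID = scanID, sex = c("M", "F", "F"),
             stringsAsFactors = FALSE))

genoData <- GenotypeData(mgr, snpAnnot = snpAnnot, scanAnnot = scanAnnot)
nsnp(genoData)
nscan(genoData)
```

Passing annotation whose snpID or scanID values differ from those in the reader is expected to make the constructor fail, since that is the consistency check described above.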
In the code snippets below, object is a GenotypeData object.
nsnp(object): The number of SNPs in the data.
nscan(object): The number of scans in the data.
getSnpID(object, index): A unique integer vector of snp IDs. The optional index is a logical or integer vector specifying elements to extract.
getChromosome(object, index, char=FALSE): A vector of chromosomes. The optional index is a logical or integer vector specifying elements to extract. If char=FALSE (default), returns an integer vector. If char=TRUE, returns a character vector with elements in (1:22,X,XY,Y,M,U).
getPosition(object, index): An integer vector of base pair positions. The optional index is a logical or integer vector specifying elements to extract.
getAlleleA(object, index): A character vector of A alleles. The optional index is a logical or integer vector specifying elements to extract.
getAlleleB(object, index): A character vector of B alleles. The optional index is a logical or integer vector specifying elements to extract.
getScanID(object, index): A unique integer vector of scan IDs. The optional index is a logical or integer vector specifying elements to extract.
getSex(object, index): A character vector of sex, with values 'M' or 'F'. The optional index is a logical or integer vector specifying elements to extract.
hasSex(object): Returns TRUE if the column 'sex' is present in object.
getGenotype(object, snp=c(1,-1), scan=c(1,-1), char=FALSE, sort=TRUE, drop=TRUE, use.names=FALSE, ...): Extracts genotype values (number of A alleles). snp and scan indicate which elements to return along the snp and scan dimensions. They must be integer vectors of the form (start, count), where start is the index of the first data element to read and count is the number of elements to read. A value of '-1' for count indicates that the entire dimension should be read. If drop=TRUE, the result is coerced to the lowest possible dimension. If use.names=TRUE, names of the resulting vector or matrix are set to the SNP and scan IDs. Missing values are represented as NA. If char=TRUE, genotypes are returned as characters of the form "A/B". If sort=TRUE, alleles are lexicographically sorted ("G/T" instead of "T/G").
getGenotypeSelection(object, snp=NULL, scan=NULL, snpID=NULL, scanID=NULL, char=FALSE, sort=TRUE, drop=TRUE, use.names=TRUE, ...): May be used only if the data slot contains a GdsGenotypeReader or MatrixGenotypeReader object. Extracts genotype values (number of A alleles). snp and scan may be integer or logical vectors indicating which elements to return along the snp and scan dimensions. snpID and scanID allow selection by values of snpID and scanID. Unlike getGenotype, the values requested need not be in contiguous blocks. Other arguments are identical to getGenotype.
getSnpVariable(object, varname, index): Returns the snp annotation variable varname. The optional index is a logical or integer vector specifying elements to extract.
getSnpVariableNames(object): Returns a character vector with the names of the columns in the snp annotation.
hasSnpVariable(object, varname): Returns TRUE if the variable varname is present in the snp annotation.
getScanVariable(object, varname, index): Returns the scan annotation variable varname. The optional index is a logical or integer vector specifying elements to extract.
getScanVariableNames(object): Returns a character vector with the names of the columns in the scan annotation.
hasScanVariable(object, varname): Returns TRUE if the variable varname is present in the scan annotation.
getVariable(object, varname, drop=TRUE, ...): Extracts the contents of the variable varname from the data. If drop=TRUE, the result is coerced to the lowest possible dimension. Missing values are represented as NA. If the variable is not found, returns NULL.
hasVariable(object, varname): Returns TRUE if the data contains varname, FALSE if not.
getSnpAnnotation(object): Returns the snp annotation.
hasSnpAnnotation(object): Returns TRUE if the snp annotation slot is not NULL.
getScanAnnotation(object): Returns the scan annotation.
hasScanAnnotation(object): Returns TRUE if the scan annotation slot is not NULL.
open(object): Opens a connection to the data.
close(object): Closes the data connection.
autosomeCode(object): Returns the integer codes for the autosomes.
XchromCode(object): Returns the integer code for the X chromosome.
XYchromCode(object): Returns the integer code for the pseudoautosomal region.
YchromCode(object): Returns the integer code for the Y chromosome.
MchromCode(object): Returns the integer code for mitochondrial SNPs.
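The (start, count) convention used by getGenotype can be easy to misread, so here is a base R sketch of how such a pair maps onto an index range. The helper start_count_to_index is hypothetical and written only for illustration; it is not part of GWASTools and says nothing about how the readers are implemented. The toy matrix stands in for genotype data with SNPs as rows and scans as columns.

```r
# Hypothetical helper (not part of GWASTools): expand a (start, count)
# pair into explicit indices along a dimension of length n.
# A count of -1 means "read from start to the end of the dimension".
start_count_to_index <- function(sc, n) {
  start <- sc[1]
  count <- sc[2]
  if (count == -1) count <- n - start + 1
  stopifnot(start >= 1, count >= 0, start + count - 1 <= n)
  seq(from = start, length.out = count)
}

# Toy genotype matrix: 200 SNPs (rows) x 8 scans (columns).
geno <- matrix(seq_len(200 * 8) %% 3, nrow = 200, ncol = 8)

# snp=c(100,10), scan=c(1,5): 10 SNPs starting at SNP 100, first 5 scans.
snp_idx  <- start_count_to_index(c(100, 10), nrow(geno))
scan_idx <- start_count_to_index(c(1, 5), ncol(geno))
block <- geno[snp_idx, scan_idx, drop = FALSE]
dim(block)

# snp=c(1,-1), scan=c(1,1): all SNPs for the first scan. With drop = TRUE
# the single-column result collapses to a vector.
all_snps <- geno[start_count_to_index(c(1, -1), nrow(geno)),
                 start_count_to_index(c(1, 1), ncol(geno)),
                 drop = TRUE]
length(all_snps)
```

Because a (start, count) pair can only describe one contiguous run, selections that skip elements need getGenotypeSelection instead.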
Stephanie Gogarten
See also: SnpAnnotationDataFrame, SnpAnnotationSQLite, ScanAnnotationDataFrame, ScanAnnotationSQLite, GdsGenotypeReader, NcdfGenotypeReader, MatrixGenotypeReader, IntensityData
library(GWASdata)
file <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gds <- GdsGenotypeReader(file)
# object without annotation
genoData <- GenotypeData(gds)
# object with annotation
data(illuminaSnpADF)
data(illuminaScanADF)
# need to rebuild old SNP annotation object to get allele methods
snpAnnot <- SnpAnnotationDataFrame(pData(illuminaSnpADF))
genoData <- GenotypeData(gds, snpAnnot=snpAnnot, scanAnnot=illuminaScanADF)
# dimensions
nsnp(genoData)
nscan(genoData)
# get snpID and chromosome
snpID <- getSnpID(genoData)
chrom <- getChromosome(genoData)
# get positions only for chromosome 22
pos22 <- getPosition(genoData, index=(chrom == 22))
# get other annotations
if (hasSex(genoData)) sex <- getSex(genoData)
plate <- getScanVariable(genoData, "plate")
rsID <- getSnpVariable(genoData, "rsID")
# get all snps for first scan
geno <- getGenotype(genoData, snp=c(1,-1), scan=c(1,1))
# starting at snp 100, get 10 snps for the first 5 scans
geno <- getGenotype(genoData, snp=c(100,10), scan=c(1,5))
geno
# return genotypes as "A/B" rather than number of A alleles
geno <- getGenotype(genoData, snp=c(100,10), scan=c(1,5), char=TRUE)
geno
close(genoData)
#--------------------------------------
# An example using a non-human organism
#--------------------------------------
# Chicken has 38 autosomes, Z, and W. Male is ZZ, female is ZW.
# Define sex chromosomes as X=Z and Y=W.
gdsfile <- tempfile()
simulateGenotypeMatrix(n.snps=10, n.chromosomes=40, n.samples=5,
                       filename=gdsfile, file.type="gds")
gds <- GdsGenotypeReader(gdsfile, autosomeCode=1:38L,
                         XchromCode=39L, YchromCode=40L,
                         XYchromCode=41L, MchromCode=42L)
table(getChromosome(gds))
table(getChromosome(gds, char=TRUE))
# SNP annotation
snpdf <- data.frame(snpID=getSnpID(gds),
                    chromosome=getChromosome(gds),
                    position=getPosition(gds))
snpAnnot <- SnpAnnotationDataFrame(snpdf, autosomeCode=1:38L,
                                   XchromCode=39L, YchromCode=40L,
                                   XYchromCode=41L, MchromCode=42L)
varMetadata(snpAnnot)[,"labelDescription"] <-
  c("unique integer ID",
    "chromosome coded as 1:38=autosomes, 39=Z, 40=W",
    "base position")
# reverse sex coding to get proper counting of sex chromosome SNPs
scandf <- data.frame(scanID=1:5, sex=c("M","M","F","F","F"),
                     stringsAsFactors=FALSE)
scanAnnot <- ScanAnnotationDataFrame(scandf)
varMetadata(scanAnnot)[,"labelDescription"] <-
  c("unique integer ID",
    "sex coded as M=female and F=male")
genoData <- GenotypeData(gds, snpAnnot=snpAnnot, scanAnnot=scanAnnot)
afreq <- alleleFrequency(genoData)
# frequency of Z chromosome in females ("M") and males ("F")
afreq[snpAnnot$chromosome == 39, c("M","F")]
# frequency of W chromosome in females ("M") and males ("F")
afreq[snpAnnot$chromosome == 40, c("M","F")]
close(genoData)
unlink(gdsfile)
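The examples above read contiguous blocks with getGenotype. As a complementary sketch, the following selects non-adjacent SNPs and scans by ID with getGenotypeSelection, assuming the GWASTools package is installed. It uses a MatrixGenotypeReader with made-up values so that it does not depend on any file; the IDs and genotypes are illustrative only.

```r
library(GWASTools)

# Toy in-memory data (illustrative values only): 10 SNPs x 4 scans.
snpID  <- 1:10
scanID <- 101:104
geno   <- matrix(rep(c(0, 1, 2, 1), 10), nrow = 10, ncol = 4)
mgr <- MatrixGenotypeReader(genotype = geno, snpID = snpID,
                            chromosome = rep(1L, 10), position = 1:10,
                            scanID = scanID)
genoData <- GenotypeData(mgr)

# Select three non-adjacent SNPs and two non-adjacent scans by ID.
# With the default use.names=TRUE, the dimnames of the result are
# set to the selected SNP and scan IDs.
sel <- getGenotypeSelection(genoData, snpID = c(2, 5, 9),
                            scanID = c(101, 104))
dim(sel)
```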