gdsSubset takes a subset of data (SNPs and samples) from a GDS file and writes it to a new GDS file.
gdsSubsetCheck checks that a GDS file is the desired subset of another GDS file.
parent.gds | Name of the parent GDS file
sub.gds | Name of the subset GDS file
sample.include | Vector of sampleIDs to include in
snp.include | Vector of snpIDs to include in
sub.storage | Storage type for the subset file; defaults to original storage type
compress | The compression level for variables in a GDS file (see
block.size | For GDS files stored with scan,snp dimensions, the number of SNPs to read from the parent file at a time. Ignored for snp,scan dimensions.
verbose | Logical value specifying whether to show progress information.
allow.fork | Logical value specifying whether to enable multiple forks to access the GDS file simultaneously.
gdsSubset can select a subset of SNPs for all samples by setting snp.include, a subset of samples for all SNPs by setting sample.include, or a subset of SNPs and samples with both arguments.
The GDS nodes "snp.id", "snp.position", "snp.chromosome", and "sample.id" are copied, as well as any 2-dimensional nodes. Other nodes are not copied.
The attributes of the 2-dimensional nodes are also copied to the subset file.
If sub.storage is specified, the subset gds file will have a different storage mode for any 2-dimensional array.
In the special case where the 2-dimensional node has an attribute named "missing.value" and the sub.storage type is "bit2", the missing.value attribute for the subset node is automatically set to 3.
At this point, no checking is done to ensure that the values will be properly stored with a different storage type, but gdsSubsetCheck will return an error if the values do not match.
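Because no check is made that values fit the new storage type, it can help to inspect the value range before choosing sub.storage="bit2". The following is a minimal base-R sketch, not part of GWASTools: `fits_bit2` is a hypothetical helper, and it assumes that a 2-bit node holds the integers 0 to 3 with 3 reserved as the missing code, as the special case above implies.

```r
# Hypothetical helper (not a GWASTools function): returns TRUE if every
# non-missing value can be stored in 2 bits with 3 reserved for missing.
fits_bit2 <- function(x, missing.value = 3L) {
  vals <- x[!is.na(x)]
  # Values must be whole numbers in 0..3 and must not collide
  # with the code reserved for missing data.
  all(vals == round(vals)) &&
    all(vals >= 0 & vals <= 3) &&
    !any(vals == missing.value)
}

# Toy genotype matrix; NA marks a missing call.
geno <- matrix(c(0L, 1L, 2L, NA, 2L, 0L), nrow = 2)
ok_geno <- fits_bit2(geno)        # genotypes 0, 1, 2 and NA fit
ok_big  <- fits_bit2(c(0L, 4L))   # 4 does not fit in 2 bits
```

If such a check fails, keeping the original storage type (leaving sub.storage unset) avoids the mismatch that gdsSubsetCheck would otherwise report.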
If the nodes in the GDS file are stored with scan,snp dimensions, then block.size allows you to loop over a block of SNPs at a time.
If the nodes are stored with snp,scan dimensions, then the function simply loops over samples, one at a time.
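The block-wise strategy for scan,snp data can be illustrated in base R. This sketch uses an in-memory matrix in place of a GDS node, so `copy_in_blocks` and the toy data are hypothetical and do not reflect the package's actual implementation; it only shows how block.size bounds the number of SNP columns handled per iteration.

```r
# Hypothetical illustration: copy selected samples and SNPs from a
# scan x snp matrix, handling at most block.size SNP columns at a time.
copy_in_blocks <- function(parent, scan.idx, snp.idx, block.size) {
  sub <- matrix(NA_integer_, nrow = length(scan.idx), ncol = length(snp.idx))
  if (length(snp.idx) == 0) return(sub)
  # First column of each block within the subset
  starts <- seq(1, length(snp.idx), by = block.size)
  for (s in starts) {
    e <- min(s + block.size - 1, length(snp.idx))
    cols <- s:e
    # Read one block from the parent and write it to the subset
    sub[, cols] <- parent[scan.idx, snp.idx[cols], drop = FALSE]
  }
  sub
}

parent <- matrix(1:20, nrow = 4)   # toy data: 4 scans x 5 SNPs
sub <- copy_in_blocks(parent, scan.idx = c(1, 3), snp.idx = c(2, 4, 5),
                      block.size = 2)
# The block-wise copy matches direct indexing of the parent
same <- identical(sub, parent[c(1, 3), c(2, 4, 5)])
```

A smaller block.size lowers peak memory use at the cost of more read operations; the result is the same either way.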
gdsSubsetCheck checks that a subset GDS file has the expected SNPs and samples of the parent file. It also checks that attributes were similarly copied, except for the above-mentioned special case of missing.value for sub.storage="bit2".
Adrienne Stilp
gdsfmt, createDataFile
library(GWASTools)
# Parent GDS file shipped with the GWASdata package
gdsfile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
# Read the first 10 sample IDs and the first 100 SNP IDs, then close the file
gds <- GdsGenotypeReader(gdsfile)
sample.sel <- getScanID(gds, index=1:10)
snp.sel <- getSnpID(gds, index=1:100)
close(gds)
# Write the subset to a temporary file and verify it against the parent
subfile <- tempfile()
gdsSubset(gdsfile, subfile, sample.include=sample.sel, snp.include=snp.sel)
gdsSubsetCheck(gdsfile, subfile, sample.include=sample.sel, snp.include=snp.sel)
file.remove(subfile)