ScanVcfParam-class: Parameters for scanning VCF files

ScanVcfParam-classR Documentation

Parameters for scanning VCF files

Description

Use ScanVcfParam() to create a parameter object influencing which records and fields are imported from a VCF file. Record parsing is based on genomic coordinates and requires a Tabix index file. Individual VCF elements can be specified in the ‘fixed’, ‘info’, ‘geno’ and ‘samples’ arguments.

Usage

  ScanVcfParam(fixed=character(), info=character(), geno=character(), 
               samples=character(), trimEmpty=TRUE, which, ...)

  ## Getters and Setters 
 
  vcfFixed(object)
  vcfFixed(object) <- value
  vcfInfo(object)
  vcfInfo(object) <- value
  vcfGeno(object)
  vcfGeno(object) <- value
  vcfSamples(object)
  vcfSamples(object) <- value
  vcfTrimEmpty(object)
  vcfTrimEmpty(object) <- value
  vcfWhich(object)
  vcfWhich(object) <- value

Arguments

fixed

A character() vector of fixed fields to be returned. Possible values are ALT, QUAL and FILTER. The CHROM, POS, ID and REF fields are needed to create the GRanges of variant locations. Because these are essential fields there is no option to request or omit them. If not specified, all fields are returned; if fixed=NA only REF is returned.

info

A character() vector naming the ‘INFO’ fields to return. scanVcfHeader() returns a vector of available fields. If not specified, all fields are returned; if info=NA no fields are returned.

geno

A character() vector naming the ‘GENO’ fields to return. scanVcfHeader() returns a vector of available fields. If not specified, all fields are returned; if geno=NA no fields are returned and requests for specific samples are ignored.

samples

A character() vector of sample names to return. samples(scanVcfHeader()) returns all possible names. If not specified, data for all samples are returned; if either samples=NA or geno=NA no fields are returned. Requests for specific samples when geno=NA are ignored.

trimEmpty

A logical(1) indicating whether ‘GENO’ fields with no values should be returned.

which

A GRanges describing the sequences and ranges to be queried. Variants whose POS lies in the interval(s) [start, end] are returned. If which is not specified all ranges are returned.

object

An instance of class ScanVcfParam.

value

An instance of the corresponding slot, to be assigned to object.

...

Arguments passed to methods.

Objects from the Class

Objects can be created by calls of the form ScanVcfParam().

Slots

which:

Object of class "IntegerRangesList" indicating which reference sequence and coordinate variants must overlap.

fixed:

Object of class "character" indicating fixed fields to be returned.

info:

Object of class "character" indicating portions of ‘INFO’ to be returned.

geno:

Object of class "character" indicating portions of ‘GENO’ to be returned.

samples:

Object of class "character" indicating the samples to be returned.

trimEmpty:

Object of class "logical" indicating whether empty ‘GENO’ fields are to be returned.

Functions and methods

See 'Usage' for details on invocation.

Constructor:

ScanVcfParam:

Returns a ScanVcfParam object. The which argument to the constructor can be one of several types, as documented above.

Accessors:

vcfFixed, vcfInfo, vcfGeno, vcfSamples, vcfTrimEmpty, vcfWhich:

Return the corresponding field from object.

Methods:

show

Compactly display the object.

Author(s)

Martin Morgan and Valerie Obenchain

See Also

readVcf

Examples

  ScanVcfParam()

  fl <- system.file("extdata", "structural.vcf", package="VariantAnnotation")
  compressVcf <- bgzip(fl, tempfile())
  idx <- indexTabix(compressVcf, "vcf")
  tab <- TabixFile(compressVcf, idx)
  ## ---------------------------------------------------------------------
  ## 'which' argument
  ## ---------------------------------------------------------------------
  ## To subset on genomic coordinates, supply an object 
  ## containing the ranges of interest. These ranges can
  ## be given directly to the 'param' argument or wrapped
  ## inside ScanVcfParam() as the 'which' argument. 

  ## When using a list, the outer list names must correspond to valid
  ## chromosome names in the vcf file. In this example they are "1" 
  ## and "2".
  gr1 <- GRanges("1", IRanges(13219, 2827695, name="regionA"))
  gr2 <- GRanges(rep("2", 2), IRanges(c(321680, 14477080), 
                 c(321689, 14477090), name=c("regionB", "regionC")))
  grl <- GRangesList("1"=gr1, "2"=gr2)
  vcf <- readVcf(tab, "hg19", grl)

  ## Names of the ranges are in the 'paramRangeID' metadata column of the
  ## GRanges object returned by the rowRanges() accessor.
  rowRanges(vcf)

  ## which can be used for subsetting the VCF object
  vcf[rowRanges(vcf)$paramRangeID == "regionA"]
 
  ## When using ranges, the seqnames must correspond to valid
  ## chromosome names in the vcf file.
  gr <- unlist(grl, use.names=FALSE)
  vcf <- readVcf(tab, "hg19", gr)
 
  ## ---------------------------------------------------------------------
  ## 'fixed', 'info', 'geno' and 'samples' arguments
  ## ---------------------------------------------------------------------
  ## This param specifies the "GT" 'geno' field for a single sample
  ## and the subset of ranges in 'which'. All 'fixed' and 'info' fields 
  ## will be returned. 
  ScanVcfParam(geno="GT", samples="NA00002", which=gr)
 
  ## Here two 'fixed' and one 'geno' field are specified
  ScanVcfParam(fixed=c("ALT", "QUAL"), geno="GT", info=NA) 

  ## Return only the 'fixed' fields
  ScanVcfParam(geno=NA, info=NA) 

Bioconductor/VariantAnnotation documentation built on Jan. 9, 2025, 12:03 a.m.