scanBcf: Operations on 'BCF' files.

BcfInput R Documentation

Operations on ‘BCF’ files.

Description

Import, coerce, or index variant call files in text or binary format.

Usage


scanBcfHeader(file, ...)
## S4 method for signature 'character'
scanBcfHeader(file, ...)

scanBcf(file, ...)
## S4 method for signature 'character'
scanBcf(file, index = file, ..., param=ScanBcfParam())

asBcf(file, dictionary, destination, ...,
      overwrite=FALSE, indexDestination=TRUE)
## S4 method for signature 'character'
asBcf(file, dictionary, destination, ...,
      overwrite=FALSE, indexDestination=TRUE)

indexBcf(file, ...)
## S4 method for signature 'character'
indexBcf(file, ...)

Arguments

file

For scanBcf and scanBcfHeader, the character() file name of the ‘BCF’ file to be processed, or an instance of class BcfFile.

index

The character() file name(s) of the ‘BCF’ index to be processed.

dictionary

A character vector of the unique “CHROM” names in the VCF file.

destination

The character(1) file name of the location where the BCF output file will be created. For asBcf this is without the “.bcf” file suffix.

param

An instance of ScanBcfParam influencing which records are parsed and the ‘INFO’ and ‘GENO’ information returned.

...

Additional arguments; e.g., for the scanBcfHeader,character-method, the mode argument of BcfFile.

overwrite

A logical(1) indicating whether the destination can be overwritten if it already exists.

indexDestination

A logical(1) indicating whether the created destination file should also be indexed.

Details

bcf* functions are restricted to the GENO fields supported by ‘bcftools’ (see the bcftools documentation listed under References). The argument param allows portions of the file to be input, but requires that the file be BCF or bgzip'd and indexed as a TabixFile. For similar functions operating on VCF files see ?scanVcf in the VariantAnnotation package.
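
As a minimal sketch of region-restricted input, the code below passes a ScanBcfParam with a which range to scanBcf. It assumes the example file ex1.bcf.gz and its index are installed with Rsamtools, and that the file contains a reference sequence named "seq1"; the coordinates are chosen only for illustration.

```r
library(Rsamtools)
library(GenomicRanges)  # provides GRanges(); attaches IRanges for IRanges()

## Example BCF file shipped with Rsamtools (the index sits alongside it)
fl <- system.file("extdata", "ex1.bcf.gz", package="Rsamtools",
                  mustWork=TRUE)

## Assumption: "seq1" is a reference sequence name in this file;
## the range 1-1575 is illustrative only
rng <- GRanges("seq1", IRanges(1, 1575))
param <- ScanBcfParam(which=rng)

## Only records overlapping 'rng' are parsed
bcf_region <- scanBcf(fl, param=param)
names(bcf_region)
```

Because index defaults to file, the index file is looked up next to the BCF file; pass index explicitly if it is stored elsewhere.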

Value

scanBcfHeader returns a list, with one element for each file named in file. Each element of the list is itself a list containing three elements. The Reference element is a character() vector with names of reference sequences. The Sample element is a character() vector of names of samples. The Header element is a DataFrameList with one DataFrame per unique key value in the header (preceded by “##”).

NOTE: In Rsamtools >=1.33.6, the Header DataFrameList no longer contains a DataFrame named "META". The META DataFrame contained all "simple" key-value header lines from a bcf / vcf file. These "simple" header lines are now parsed into individual DataFrames named for the unique key.
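
The structure described above can be explored as in the following sketch, which assumes the example file ex1.bcf.gz shipped with Rsamtools. The variable names hdr and h are illustrative only.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bcf.gz", package="Rsamtools",
                  mustWork=TRUE)

## One list element per file named in 'file'
hdr <- scanBcfHeader(fl)
h <- hdr[[1]]

## Each element holds the Reference, Sample, and Header components
names(h)
h[["Reference"]]       # names of reference sequences
h[["Sample"]]          # names of samples
names(h[["Header"]])   # one DataFrame per unique header key
```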

scanBcf returns a list, with one element per file. Each list has elements corresponding to the columns of the VCF specification: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, GENO.

The GENO element is itself a list, with elements corresponding to fields supported by ‘bcftools’ (see the bcftools documentation listed under References).

asBcf creates a binary BCF file from a text VCF file.

indexBcf creates an index into the BCF file.
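
A minimal sketch of a VCF-to-BCF conversion follows. The helper vcf_to_bcf and its argument names are hypothetical (they are not part of Rsamtools), and the file paths in the usage comment are placeholders; only the asBcf arguments shown under Usage are relied on.

```r
library(Rsamtools)

## Hypothetical helper (not part of Rsamtools): convert a text VCF file
## to binary BCF and index the result in one call.
vcf_to_bcf <- function(vcf, chrom_names, dest_prefix) {
    asBcf(vcf, dictionary=chrom_names, destination=dest_prefix,
          overwrite=FALSE,         # do not overwrite an existing destination
          indexDestination=TRUE)   # also index the created file
}

## Usage (placeholder paths and names, so the call is left commented out):
## vcf_to_bcf("calls.vcf", c("seq1", "seq2"), "calls")
```

Note that destination is given without the “.bcf” suffix, and dictionary must list the unique “CHROM” names of the input VCF file.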

Author(s)

Martin Morgan <mtmorgan@fhcrc.org>.

References

http://vcftools.sourceforge.net/specs.html outlines the VCF specification.

http://samtools.sourceforge.net/mpileup.shtml contains information on the portion of the specification implemented by bcftools.

http://samtools.sourceforge.net/ provides information on samtools.

See Also

BcfFile, TabixFile

Examples

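## Locate the example BCF file shipped with Rsamtools
fl <- system.file("extdata", "ex1.bcf.gz", package="Rsamtools",
                  mustWork=TRUE)
## Header: reference names, sample names, and parsed header lines
scanBcfHeader(fl)
## Records: import the whole file
bcf <- scanBcf(fl)
## value: list-of-lists
str(bcf[1:8])
## GENO fields available, and the first few PL entries
names(bcf[["GENO"]])
str(head(bcf[["GENO"]][["PL"]]))
## Run the BcfFile examples for file-object usage
example(BcfFile)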

Bioconductor/Rsamtools documentation built on Oct. 31, 2024, 1:23 p.m.