injectSNPs: SNP injection


Description

Inject SNPs from a SNPlocs data package into a genome.

Usage

injectSNPs(x, SNPlocs_pkgname)

SNPlocs_pkgname(x)
SNPcount(x)
SNPlocs(x, seqname)

## Related utilities
available.SNPs(type=getOption("pkgType"))
installed.SNPs()

Arguments

x

A BSgenome object.

SNPlocs_pkgname

The name of a SNPlocs data package containing SNP information for the single sequences contained in x. This package must already be installed (injectSNPs won't try to install it).

seqname

The name of a single sequence in x.

type

Character string indicating the type of package ("source", "mac.binary" or "win.binary") to look for.

Value

injectSNPs returns a copy of the original genome x in which some or all of the single sequences have been altered by injecting the SNPs defined in the SNPlocs data package specified through the SNPlocs_pkgname argument. In the altered genome, each SNP is represented by an IUPAC ambiguity code at its location.
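As a rough illustration of this representation, the following base R sketch maps a set of alleles to its IUPAC ambiguity letter and writes those letters into a plain character string. It is a toy model, not how injectSNPs is implemented (the real function works on BSgenome sequences); iupac_map is a hand-written table assumed to match Biostrings' IUPAC_CODE_MAP, and alleles_to_iupac and inject_snps_toy are hypothetical helpers.

```r
## Hand-written copy of the IUPAC ambiguity table (assumed to mirror
## Biostrings::IUPAC_CODE_MAP): names are codes, values are the bases
## each code stands for, in alphabetical order.
iupac_map <- c(A="A", C="C", G="G", T="T",
               M="AC", R="AG", W="AT", S="CG", Y="CT", K="GT",
               V="ACG", H="ACT", D="AGT", B="CGT", N="ACGT")

## Hypothetical helper: return the IUPAC letter for a set of alleles,
## e.g. c("G", "A") -> "R".
alleles_to_iupac <- function(alleles) {
    key <- paste(sort(unique(toupper(alleles))), collapse="")
    code <- names(iupac_map)[match(key, iupac_map)]
    if (is.na(code))
        stop("not a set of A/C/G/T bases: ", key)
    code
}

## Hypothetical helper: replace the letters at positions 'pos' of the
## string 'seq' by the ambiguity code of the corresponding allele set.
inject_snps_toy <- function(seq, pos, alleles) {
    chars <- strsplit(seq, "")[[1]]
    chars[pos] <- vapply(alleles, alleles_to_iupac, character(1))
    paste(chars, collapse="")
}

inject_snps_toy("ACGTACGT", pos=c(2, 7), alleles=list(c("C", "T"), c("G", "A")))
## [1] "AYGTACRT"
```

Position 2 (C/T) becomes Y and position 7 (A/G) becomes R, while every other letter is left untouched.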

SNPlocs_pkgname, SNPcount and SNPlocs return NULL if no SNPs were injected into x (i.e. if x is not a BSgenome object returned by a previous call to injectSNPs). Otherwise, SNPlocs_pkgname returns the name of the package from which the SNPs were injected, SNPcount returns the number of SNPs in each altered sequence of x, and SNPlocs returns the locations of the SNPs in the sequence named by seqname.
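To make the relationship between SNPcount and SNPlocs concrete, here is a base R model of the same bookkeeping applied to a small, made-up table of SNP positions, including the NULL convention for "nothing injected". The data frame, its columns, and the functions toy_SNPcount and toy_SNPlocs are hypothetical; the real accessors take a BSgenome object instead.

```r
## Made-up SNP table: one row per SNP (hypothetical data and columns).
snp_table <- data.frame(
    seqname = c("chr1", "chr1", "chr2", "chr1", "chrX"),
    loc     = c(10L, 25L, 7L, 40L, 3L),
    alleles = c("CT", "AG", "GT", "AC", "AG"),
    stringsAsFactors = FALSE
)

## Like SNPcount(): NULL when nothing was injected, otherwise the number
## of SNPs per sequence as a named integer vector.
toy_SNPcount <- function(snps) {
    if (is.null(snps))
        return(NULL)
    tab <- table(snps$seqname)
    setNames(as.vector(tab), names(tab))
}

## Like SNPlocs(x, seqname): NULL when nothing was injected, otherwise
## the rows describing the SNPs located on one sequence.
toy_SNPlocs <- function(snps, seqname) {
    if (is.null(snps))
        return(NULL)
    snps[snps$seqname == seqname, c("loc", "alleles")]
}

toy_SNPcount(snp_table)          # chr1 = 3, chr2 = 1, chrX = 1
toy_SNPlocs(snp_table, "chr1")   # the three chr1 rows
toy_SNPcount(NULL)               # NULL: no SNPs injected
```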

available.SNPs returns a character vector containing the names of the SNPlocs data packages that are currently available on the Bioconductor repositories for your version of R/Bioconductor. A SNPlocs data package contains basic SNP information (location and alleles) for a given organism.

installed.SNPs returns a character vector containing the names of the SNPlocs data packages that are already installed.
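A rough equivalent of installed.SNPs can be written in base R by selecting the installed packages whose names start with "SNPlocs.". This sketch assumes every SNPlocs data package follows that naming convention (as the package names on this page do); list_snplocs_pkgs is a hypothetical helper, not part of BSgenome, and its result may differ from installed.SNPs().

```r
## Hypothetical helper: keep only package names that look like SNPlocs
## data packages (assumes the "SNPlocs." naming convention).
list_snplocs_pkgs <- function(pkgs = rownames(utils::installed.packages())) {
    grep("^SNPlocs\\.", pkgs, value = TRUE)
}

## Filtering an explicit vector of package names:
list_snplocs_pkgs(c("BSgenome", "SNPlocs.Hsapiens.dbSNP.20100427", "Biostrings"))
## [1] "SNPlocs.Hsapiens.dbSNP.20100427"

## Scanning the packages installed in the current library paths:
list_snplocs_pkgs()
```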

Note

injectSNPs, SNPlocs_pkgname, SNPcount and SNPlocs have the side effect of trying to load the SNPlocs data package if it is not already loaded.
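Loading a package on demand like this is a common R pattern, sketched here with base R only: load the namespace of the named package if it is not loaded yet, and never install anything. ensure_pkg_loaded is a hypothetical helper written for illustration; it is not claimed to be the code used inside BSgenome, and raising an error for a missing package is a choice made for this sketch.

```r
## Hypothetical helper: make sure the namespace of 'pkgname' is loaded,
## loading it if necessary. Never installs anything; stops with an
## error if the package is not installed.
ensure_pkg_loaded <- function(pkgname) {
    if (!(pkgname %in% loadedNamespaces())) {
        if (!requireNamespace(pkgname, quietly = TRUE))
            stop("package '", pkgname, "' is not installed")
    }
    invisible(TRUE)
}

ensure_pkg_loaded("tools")   # loads 'tools' if it was not loaded yet
"tools" %in% loadedNamespaces()
## [1] TRUE
```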

Author(s)

H. Pages

See Also

BSgenome-class, IUPAC_CODE_MAP, injectHardMask, letterFrequencyInSlidingView, .inplaceReplaceLetterAt

Examples

## What SNPlocs data packages are already installed:
installed.SNPs()

## What SNPlocs data packages are available:
available.SNPs()

if (interactive()) {
  ## Make your choice and install with:
  if (!requireNamespace("BiocManager", quietly=TRUE))
      install.packages("BiocManager")
  BiocManager::install("SNPlocs.Hsapiens.dbSNP.20100427")
}

## Inject SNPs from dbSNP into the Human genome:
library(BSgenome.Hsapiens.UCSC.hg19.masked)
genome <- BSgenome.Hsapiens.UCSC.hg19.masked
SNPlocs_pkgname(genome)

genome2 <- injectSNPs(genome, "SNPlocs.Hsapiens.dbSNP.20100427")
genome2  # note the extra "with SNPs injected from ..." line
SNPlocs_pkgname(genome2)
SNPcount(genome2)
head(SNPlocs(genome2, "chr1"))

alphabetFrequency(genome$chr1)
alphabetFrequency(genome2$chr1)

## Find runs of SNPs of length at least 25 in chr1. Might require
## more memory than some platforms can handle (e.g. 32-bit Windows
## and maybe some Mac OS X machines with little memory):
is_32bit_windows <- .Platform$OS.type == "windows" &&
                    .Platform$r_arch == "i386"
is_macosx <- substr(R.version$os, start=1, stop=6) == "darwin"
if (!is_32bit_windows && !is_macosx) {
    chr1 <- injectHardMask(genome2$chr1)
    ambiguous_letters <- paste(DNA_ALPHABET[5:15], collapse="")
    lf <- letterFrequencyInSlidingView(chr1, 25, ambiguous_letters)
    sl <- slice(as.integer(lf), lower=25)
    v1 <- Views(chr1, start(sl), end(sl)+24)
    v1
    max(width(v1))  # length of longest SNP run
}
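The last example finds runs of at least 25 consecutive ambiguity letters by sliding a window of width 25 over chr1 and keeping the windows made only of ambiguous letters. On a short string, the same runs can be found directly with run-length encoding, as in this base R sketch. find_ambiguous_runs is a hypothetical helper, and its default letter set is assumed to match DNA_ALPHABET[5:15] (M, R, W, S, Y, K, V, H, D, B, N). Because it splits the whole sequence into single characters, it is meant for small inputs, not for a full chromosome.

```r
## Hypothetical helper: locate maximal runs of at least 'min_len'
## consecutive ambiguity letters in a DNA string.
find_ambiguous_runs <- function(seq, min_len, ambiguous = "MRWSYKVHDBN") {
    chars <- strsplit(seq, "")[[1]]
    is_amb <- chars %in% strsplit(ambiguous, "")[[1]]
    runs <- rle(is_amb)                      # alternating TRUE/FALSE runs
    ends <- cumsum(runs$lengths)             # last position of each run
    starts <- ends - runs$lengths + 1L       # first position of each run
    keep <- runs$values & runs$lengths >= min_len
    data.frame(start = starts[keep], end = ends[keep],
               width = runs$lengths[keep])
}

runs <- find_ambiguous_runs("ACRYWGTNNNKAC", min_len = 3)
runs              # runs at positions 3-5 (RYW) and 8-11 (NNNK)
max(runs$width)   # length of the longest run
```

Run-length encoding reports each maximal run once, which is what the example above obtains by extending each slice of windows by 24 positions.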

Przemol/mirrors-bioc-BSgenome documentation built on May 8, 2019, 3:46 a.m.