getPromoterSeq-methods: Get gene promoter or terminator sequences

getPromoterSeq R Documentation

Get gene promoter or terminator sequences

Description

Extract promoter or terminator sequences for the genes or transcripts specified in the query (a GRanges or GRangesList object) from a BSgenome or FaFile object.

Usage

## S4 method for signature 'GRanges'
getPromoterSeq(query, subject, upstream=2000, downstream=200)
## S4 method for signature 'GRanges'
getTerminatorSeq(query, subject, upstream=2000, downstream=200)

## S4 method for signature 'GRangesList'
getPromoterSeq(query, subject, upstream=2000, downstream=200)
## S4 method for signature 'GRangesList'
getTerminatorSeq(query, subject, upstream=2000, downstream=200)

Arguments

query

A GRanges object containing transcripts, or a GRangesList object containing transcripts grouped by gene.

subject

A BSgenome or FaFile object from which the sequences will be taken.

upstream

The number of DNA bases to include upstream of the TSS (transcription start site) for getPromoterSeq, or of the TES (transcription end site) for getTerminatorSeq.

downstream

The number of DNA bases to include downstream of the TSS (transcription start site) for getPromoterSeq, or of the TES (transcription end site) for getTerminatorSeq.

Details

getPromoterSeq and getTerminatorSeq are generic functions dispatching on query, which is either a GRanges or a GRangesList. They are convenience wrappers for the promoters, terminators, and getSeq functions. The purpose is to allow sequence extraction from either a BSgenome or FaFile object.
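The two-step composition described above can be sketched as follows. This is an illustration, not the package's actual source. It assumes a small synthetic FASTA file written to a temporary path; the sequence name chrT, the transcript coordinates, and all variable names are hypothetical.

```r
library(GenomicRanges)     # GRanges(), promoters()
library(GenomicFeatures)   # getPromoterSeq()
library(Rsamtools)         # FaFile(), indexFa()
library(Biostrings)        # DNAStringSet(), writeXStringSet(), getSeq()

## Hypothetical 200-base "chromosome" written to a temporary FASTA file.
set.seed(1)
toy_genome <- DNAStringSet(
    c(chrT = paste(sample(c("A", "C", "G", "T"), 200, replace = TRUE),
                   collapse = "")))
fa_path <- tempfile(fileext = ".fa")
writeXStringSet(toy_genome, fa_path)
indexFa(fa_path)           # creates the .fai index next to the FASTA file
fa <- FaFile(fa_path)

## Two made-up transcripts, one per strand.
toy_tx <- GRanges("chrT",
                  IRanges(start = c(51, 101), end = c(150, 180)),
                  strand = c("+", "-"))

## Step 1: compute the promoter windows. Step 2: fetch their sequences.
windows <- promoters(toy_tx, upstream = 10, downstream = 5)
by_hand <- getSeq(fa, windows)

## The wrapper is expected to perform both steps in one call.
wrapped <- getPromoterSeq(toy_tx, fa, upstream = 10, downstream = 5)

## Compare the extracted bases, ignoring names and metadata columns.
identical(as.character(unname(by_hand)), as.character(unname(wrapped)))
```

The same pattern should apply to getTerminatorSeq with terminators() in place of promoters(), and to a BSgenome object in place of the FaFile.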

Default values for upstream and downstream were chosen based on our current understanding of gene regulation. On average, promoter regions in the mammalian genome lie within 5000 bp upstream and downstream of the transcription start site.
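To make the default window concrete, the sketch below applies promoters() from GenomicRanges with upstream=2000 and downstream=200 to two hypothetical transcripts that share coordinates but lie on opposite strands. The coordinates and variable names are invented for illustration.

```r
library(GenomicRanges)

## Two hypothetical transcripts with identical coordinates on opposite strands.
toy_tx <- GRanges(c("chr1", "chr1"),
                  IRanges(start = c(10000, 10000), end = c(12000, 12000)),
                  strand = c("+", "-"))

## Windows corresponding to the default arguments of getPromoterSeq().
default_windows <- promoters(toy_tx, upstream = 2000, downstream = 200)

## Each window holds upstream + downstream bases.
width(default_windows)   # 2200 2200

## Plus strand: the TSS is the start coordinate (10000).
## Minus strand: the TSS is the end coordinate (12000), so the window flips.
start(default_windows)   # 8000 11801
end(default_windows)     # 10199 14000
```

The window is strand-aware: "upstream" always means toward the 5' end of the transcript, so on the minus strand it extends to higher genomic coordinates.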

Value

A DNAStringSet or DNAStringSetList instance corresponding to the GRanges or GRangesList supplied in the query.

Author(s)

Paul Shannon

See Also

  • The promoters man page in the GenomicRanges package for the promoters() and terminators() methods for GenomicRanges objects.

  • getSeq in the Biostrings package for extracting a set of sequences from a sequence container like a BSgenome or FaFile object.

Examples

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(BSgenome.Hsapiens.UCSC.hg19)


## A GRangesList object describing all the known human transcripts grouped
## by gene:
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
tx_by_gene <- transcriptsBy(txdb, by="gene")

e2f3 <- "1871"  # Entrez Gene ID for a cell cycle control transcription
                # factor, chr6 on the plus strand

## A GRanges object describing the three transcripts for gene 1871:
e2f3_tx <- tx_by_gene[[e2f3]]

## Promoter sequences for gene 1871:
e2f3_promoter_seqs <- getPromoterSeq(e2f3_tx, Hsapiens,
                                     upstream=40, downstream=15)
e2f3_promoter_seqs

mcols(e2f3_promoter_seqs)

## Terminator sequences for gene 1871:
e2f3_terminator_seqs <- getTerminatorSeq(e2f3_tx, Hsapiens,
                                         upstream=25, downstream=10)

e2f3_terminator_seqs

mcols(e2f3_terminator_seqs)  # same as 'mcols(e2f3_promoter_seqs)'

## All human promoter sequences grouped by gene:
getPromoterSeq(tx_by_gene, Hsapiens, upstream=6, downstream=4)

Bioconductor/GenomicFeatures documentation built on Nov. 7, 2024, 4:25 a.m.