BiocStyle::markdown()
Package: Pbase
Author: Laurent Gatto
Last compiled: r date()
Last modified: r file.info("ensucsc.Rmd")$mtime
This vignette described how to convert coordinates between different
genome references. We will use transcript ENST00000373316
and GRCh38
and GRCh37 as working example.
tr <- "ENST00000373316"
Here is use r Biocpkg("Gviz")
to query the latest Ensembl biomart
and extract the transcript of interest.
suppressMessages(library("Gviz")) suppressMessages(library("biomaRt")) h38 <- useMart("ensembl", "hsapiens_gene_ensembl") tr38 <- BiomartGeneRegionTrack(biomart = h38, transcript = tr) tr38 <- split(tr38, transcript(tr38)) tr38 <- ranges(tr38[[tr]]) tr38
Note the starting position of the transcript is r min(start(tr38))
.
Below, I repeat the same operation without using my own ens Mart instance. As far as I understand, Gviz queries the UCSC genome reference by default.
h37 <- useMart(host = "feb2014.archive.ensembl.org", biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl") tr37 <- BiomartGeneRegionTrack(biomart = h37, transcript = tr) tr37 <- split(tr37, transcript(tr37)) tr37 <- ranges(tr37[[tr]]) tr37
Note the starting position of the transcript is r min(start(tr37))
.
These differences seem to stem from different genome builds. Ensembl
release 78 uses GRCh38, while UCSC uses GRCh37. Indeed,
r Biocpkg("Gviz")
sets the Ensembl biomart server to Feb.2014
GRCh37.p13
.
We will use the coordinate mapping infrastructure described in the January 2015 Bioconductor Newletter and the Changing genomic coordinate systems with rtracklayer::liftOver workflow.
First, we query r Biocpkg("AnnotationHub")
for a chain file to
perform the operation we want.
library("AnnotationHub") hub <- AnnotationHub() query(hub, 'hg19ToHg38') chain <- query(hub, 'hg19ToHg38')[[1]]
The liftOver
function from the r Biocpkg("rtracklayer")
package
will use the chain and translate the coordinates of a GRanges
object
into a new GRangesList
object.
library("rtracklayer") res <- liftOver(tr37, chain) res <- unlist(res) ## set annotation names(res) <- NULL genome(res) <- "hsapiens_gene_ensembl" all.equal(res, tr38)
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.