BiocStyle::markdown()
# Ensure that any errors cause the Vignette build to fail. library(knitr) opts_chunk$set(error=FALSE)
apiKey <- Sys.getenv("GOOGLE_API_KEY") if (nchar(apiKey) == 0) { warning(paste("To build this vignette, please setup the environment variable", "GOOGLE_API_KEY with the public API key from your Google", "Developer Console before loading the GoogleGenomics package,", "or run GoogleGenomics::authenticate.")) knitr::knit_exit() }
Google Genomics implements the GA4GH reads API and this R package can retrieve data from that implementation. For more detail, see https://cloud.google.com/genomics/v1beta2/reference/reads
library(GoogleGenomics) # This vignette is authenticated on package load from the env variable GOOGLE_API_KEY. # When running interactively, call the authenticate method. # ?authenticate
By default, this function retrieves reads for a small genomic region for one sample in 1,000 Genomes.
reads <- getReads() length(reads)
We can see that r length(reads)
individual reads were returned and that the JSON response was deserialized into an R list object:
class(reads) mode(reads)
The top level field names are:
names(reads[[1]])
And examining the alignment we see:
reads[[1]]$alignedSequence reads[[1]]$alignment$position$referenceName reads[[1]]$alignment$position$position
This is good, but this data becomes much more useful when it is converted to Bioconductor data types. For example, we can convert reads in this list representation to r Biocpkg("GAlignments")
:
readsToGAlignments(reads)
Let's take a look at the reads that overlap rs9536314 for sample NA12893 within the Illumina Platinum Genomes dataset. This SNP resides on chromosome 13 at position 33628137 in 0-based coordinates.
# Change the values of 'chromosome', 'start', or 'end' here if you wish to plot # alignments from a different portion of the genome. alignments <- getReads(readGroupSetId="CMvnhpKTFhD3he72j4KZuyc", chromosome="chr13", start=33628130, end=33628145, converter=readsToGAlignments) alignments
Notice that we passed the converter to the getReads method so that we're immediately working with GAlignments which means that we can start taking advantage of other Bioconductor functionality. Also keep in mind that the parameters start
and end
are expressed in 0-based coordinates per the GA4GH specification but the Bioconductor data type converters in r Biocpkg("GoogleGenomics")
, by default, transform the returned data to 1-based coordinates.
library(ggbio)
We can display the basic alignments and coverage data:
alignmentPlot <- autoplot(alignments, aes(color=strand, fill=strand)) coveragePlot <- ggplot(as(alignments, "GRanges")) + stat_coverage(color="gray40", fill="skyblue") tracks(alignmentPlot, coveragePlot, xlab="Reads overlapping rs9536314 for NA12893")
And also display the ideogram for the corresponding location on the chromosome:
ideogramPlot <- plotIdeogram(genome="hg19", subchr="chr13") ideogramPlot + xlim(as(alignments, "GRanges"))
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.