suppressPackageStartupMessages({ library(jpNetPack) library(DT) library(mongolite) library(GenomicFiles) })
This package demonstrates high-level tasks related to research in genomic regulatory networks that we can perform with Bioconductor packages.
We use GRanges
instances to represent variants and genomic regions of interest.
suppressMessages({ library(jpNetPack) library(TxRegInfra) library(GenomeInfoDb) library(TFutils) library(knitr) library(DT) library(mongolite) }) head(demo_eQTL_granges, 3)
The FIMO runs from Kimbie Glass and Abhijeet Sonawane can be imported as GRanges, via a GenomicFiles object. In this example, binding affinities of VDR and POU2F1 are tabulated in a small region of chr17.
lapply(demo_fimo_granges, lapply, head, 3)
head(sbov_output_HS,3) head(sbov_output_FP,3)
In this example, we search TFs for binding sites that overlap with eQTL.
seqlevelsStyle(demo_eQTL_granges) = "UCSC" lapply(demo_fimo_granges, lapply, function(x) subsetByOverlaps(x, demo_eQTL_granges))
In the other direction, we enumerate eQTL assertions that overlap with TFBS:
lapply(demo_fimo_granges, lapply, function(x) subsetByOverlaps(demo_eQTL_granges, x))
This shows that two SNPs that have association with expression of multiple genes overlap with binding sites asserted for POU2F1, but not for any with VDR.
data(basicColData) as.data.frame(basicColData) %>% dplyr::filter(type=="eQTL") %>% datatable
We use mongolite to provide access to a specific tissue type for which GTEx eQTL are available.
con1 = mongo(url=URL_txregInAWS(), db="txregnet", collection="Adipose_Subcutaneous_allpairs_v7_eQTL") con1$find(limit=1)
This interface can be wrapped to simplify access to various tissue/assay types.
cd = TxRegInfra::basicColData rme0 = RaggedMongoExpt(con1, colData=cd) alleq = rme0[, which(colData(rme0)$type=="eQTL")] eq2 = alleq[, which(colData(alleq)$base == "Adipose")] eq2
We can use sbov
to look for overlaps between the eQTLs
and ranges of interest. At present we do one tissue at
a time. We need to make our queries with sturdy and self-describing
GRanges. Here we are interested in eQTL lying on chr17 between
positions 38000000 and 38100000.
query = GRanges("17", IRanges(38e6, 38.1e6)) si = GenomeInfoDb::Seqinfo(genome="hg19")["chr17"] seqlevelsStyle(si) = "NCBI" seqinfo(query) = si BiocParallel::register(BiocParallel::SerialParam()) chksub = sbov(eq2[,"Adipose_Subcutaneous_allpairs_v7_eQTL"], query) chkoment = sbov(eq2[,"Adipose_Visceral_Omentum_allpairs_v7_eQTL"], query)
Are there eQTL shared between subcutaneous adipose and visceral omentum samples in the query region?
fo = findOverlaps(chksub, chkoment) fo
Many are shared.
We don't use mongoDB for TF data -- the FIMO data are too voluminous. In the cloud we have a small number of FIMO runs. They have 'bed' in their names, but they are not really BED format.
library(TFutils) colnames(fimo16) = fimo16$HGNC fimo16 psfimo = fimo16[,c("POU2F1", "STAT1")] colnames(psfimo)
To find predicted binding regions and scores in an interval of interest, use
seqlevelsStyle(query) = "UCSC" # switch back pslook = fimo_granges( psfimo, query ) pslook
We need to tack gene symbols on to our eQTL reports.
addsyms = function(x, EnsDb=EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75) { ensids = gsub("\\..*", "", x$gene_id) # remove post period gns = ensembldb::genes(EnsDb) x$symbol = gns[ensids]$symbol x } sbov_output_eQTL = addsyms(sbov_output_eQTL) seqlevelsStyle(sbov_output_eQTL) = "UCSC"
Now we can obtain the intersections of our eQTL with binding sites for the two TFs inspected above:
ints = lapply(pslook, lapply, function(x) subsetByOverlaps(sbov_output_eQTL, x))
and convert these to graphNEL instances:
ll = lapply(ints, lapply, function(x) sbov_to_graphNEL(x))
then to igraph:
library(igraph) stat1 = igraph.from.graphNEL(ll$STAT1[[1]]) plot(stat1, main="SNP:Gene assoc in GTEx lung within STAT1 binding sites by FIMO")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.