suppressMessages({
  suppressPackageStartupMessages({
    library(bedbaseRClient)
    library(httr)
    library(hca)
    library(TFutils)
    library(dplyr)
  })
})
bedbase.org assembles genomic data on many thousands of region-based scoring files. See the about page for bedbase.org for details.
This package explores the resource using Bioconductor/R. We'll use interesting facilities of the hca package to explore JSON outputs of the bedbase API.
We'll find out about resources related to the GM12878 cell line.
The get_bb_metadata function is very simple. We'll learn how to vary the types of data retrieved later on. One could question whether 'GM12878' is a 'cell type' and we will examine this concept another time.
library(bedbaseRClient)
q1 = get_bb_metadata(query_type="cell_type", query_val="GM12878")
q1
cc = httr::content(q1)
names(cc[[1]][[2]])
The content returned by the API is complex. The hca package can help navigate.
library(hca)
lcc = lol(cc) # list of lists
lol_path(lcc)
With lol_pull, we can extract the content present along a given structural path.
exps = lol_pull(lcc, "[*][*].exp_protocol")
table(exps)
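To make the idea of a structural path concrete, here is a minimal base-R sketch of what pulling "[*][*].exp_protocol" amounts to: visit every element at the first two levels of nesting and collect the named field wherever it exists. The toy list, its values, and the helper name pull_field are hypothetical and only mimic the shape suggested by the path; this is not how hca implements lol_pull.

```r
# Toy stand-in for parsed JSON: a list of records, each holding a list of
# components, each component a named list. Shape and values are assumptions.
toy <- list(
  list(list(id = "a1"), list(exp_protocol = "ChiPseq", target = "CTCF")),
  list(list(id = "b2"), list(exp_protocol = "DNase"))
)

# Hypothetical helper: collect `field` from every component two levels down.
pull_field <- function(x, field) {
  out <- character(0)
  for (rec in x) {
    for (comp in rec) {
      val <- comp[[field]]          # NULL when the component lacks the field
      if (!is.null(val)) out <- c(out, val)
    }
  }
  out
}

exps_toy <- pull_field(toy, "exp_protocol")
table(exps_toy)
```

Components that lack the field are skipped, so the result can be shorter than the number of components visited.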
That's informative. Let's check how many targets are transcription factors. We'll use the Lambert table in the TFutils package.
library(TFutils)
lam = retrieve_lambert_main()
targs = lol_pull(lcc, "[*][*].target")
length(intersect(targs, lam$Name))
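One detail worth keeping in mind when reading this count: intersect() removes duplicates, so the number reported is the number of distinct targets that appear in the Lambert table, not the number of records. The sketch below uses made-up vectors (targs_toy and tf_names_toy are hypothetical) to contrast the two counts.

```r
# Made-up target labels, with a repeat and a missing value.
targs_toy <- c("CTCF", "CTCF", "IKZF1", "H3K27ac", NA)
# Made-up stand-in for the transcription factor names column.
tf_names_toy <- c("CTCF", "IKZF1", "GATA1")

# Distinct targets that are also in the TF list.
n_distinct_tf <- length(intersect(targs_toy, tf_names_toy))

# Records whose target is in the TF list; NA is simply not matched.
n_records_tf <- sum(targs_toy %in% tf_names_toy)

c(distinct = n_distinct_tf, records = n_records_tf)
```

Which count is of interest depends on whether the question is about factor coverage or about the volume of files.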
We can pivot to a different cell type as follows:
q2 = get_bb_metadata(query_type="cell_type", query_val="K562")
kk = httr::content(q2)
ktargs = lol_pull(lol(kk), "[*][*].target")
length(intersect(ktargs, lam$Name))
We'll use the API to acquire 50 metadata records. Cell type information is in an API component called other.
fixc = function(x) ifelse(is.null(x), NA_character_, x)
ctcheck = httr::GET("http://bedbase.org/api/bed/all/data?ids=md5sum&ids=other&limit=50")
ddl = lol(httr::content(ctcheck))
types = vapply(lol_pull(ddl, "data[*]")[seq(2,100,2)], function(x) x[["cell_type"]], character(1))
abs = vapply(lol_pull(ddl, "data[*]")[seq(2,100,2)], function(x) fixc(x[["antibody"]]), character(1))
targs = vapply(lol_pull(ddl, "data[*]")[seq(2,100,2)], function(x) fixc(x[["target"]]), character(1))
exps = vapply(lol_pull(ddl, "data[*]")[seq(2,100,2)], function(x) fixc(x[["exp_protocol"]]), character(1))
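The fixc helper matters because vapply insists that every result be a length-one character value, while a missing field comes back from `[[` as NULL. A small sketch with made-up records shows the difference; na_if_null is a hypothetical name for an if/else variant of the same guard, and the record contents are invented.

```r
# Variant of the NULL guard written with if/else (hypothetical name).
na_if_null <- function(x) if (is.null(x)) NA_character_ else x

# Made-up records; the second one has no antibody field.
recs <- list(
  list(cell_type = "GM12878", antibody = "IKZF1"),
  list(cell_type = "K562")
)

# With the guard, the missing field becomes NA and vapply succeeds.
abs_toy <- vapply(recs, function(r) na_if_null(r[["antibody"]]), character(1))

# Without the guard, vapply stops because NULL has length 0.
unguarded <- tryCatch(
  vapply(recs, function(r) r[["antibody"]], character(1)),
  error = function(e) "error"
)

abs_toy
unguarded
```

Note that cell_type is extracted without the guard in the code above, which assumes that field is present in every record retrieved.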
The md5sum component is used for direct file access.
mds = unlist(lol_pull(ddl, "data[*]")[seq(1,100,2)])
tab50 = data.frame(celltype=types, expt=exps, Ab=abs, targ=targs, md5=mds)
DT::datatable(tab50)
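The indexing with seq(1,100,2) and seq(2,100,2) suggests that the pulled elements alternate between the md5sum value and the other component for each record. Assuming that layout, the sketch below splits a made-up interleaved list and derives the positions from its length rather than hard-coding 100, which would keep working if the limit were changed. All names ending in _toy and the list contents are hypothetical.

```r
# Made-up interleaved list: md5sum, other, md5sum, other, ...
flat_toy <- list(
  "md5_a", list(cell_type = "GM12878"),
  "md5_b", list(cell_type = "K562")
)

n <- length(flat_toy)
md5_toy   <- unlist(flat_toy[seq(1, n, 2)])   # odd positions: md5sum values
other_toy <- flat_toy[seq(2, n, 2)]           # even positions: metadata lists

cell_toy <- vapply(other_toy, function(x) x[["cell_type"]], character(1))
tab_toy  <- data.frame(celltype = cell_toy, md5 = md5_toy)
tab_toy
```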
sel = tab50 |>
  filter(expt=="ChiPseq", celltype=="GM12878", Ab=="IKZF1") |>
  select(md5)
query_bb(sel[1,1], which=GenomicRanges::GRanges("chr17:38000000-39000000"))