knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
suppressPackageStartupMessages({
  library(JASPAR2022)
  library(TFBSTools)
})
JASPAR (http://jaspar.genereg.net/) is an open-access database containing manually curated, non-redundant transcription factor (TF) binding profiles for TFs across six taxonomic groups. In this 9th release, we expanded the CORE collection with 341 new profiles (148 for plants, 101 for vertebrates, 85 for urochordates, and 7 for insects), which corresponds to a 19% expansion over the previous release. We added 298 new profiles to the Unvalidated collection when no orthogonal evidence was found in the literature. All the profiles were clustered to provide familial binding profiles for each taxonomic group. Moreover, we revised the structural classification of DNA binding domains to consider plant-specific TFs. This release introduces word clouds to represent the scientific knowledge associated with each TF. We updated the genome tracks of TFBSs predicted with JASPAR profiles in eight organisms; the human and mouse TFBS predictions can be visualized as native tracks in the UCSC Genome Browser. Finally, we provide a new tool to perform JASPAR TFBS enrichment analysis in user-provided genomic regions. All the data is accessible through the JASPAR website, its associated RESTful API, the R/Bioconductor data package, and a new Python package, pyJASPAR, that facilitates serverless access to the data.
The easiest way to use the JASPAR2022 data package [@10.1093/nar/gkab1113] is through the TFBSTools package interface [@Tan:2016], which provides functions to retrieve and manipulate data from the JASPAR database. This vignette demonstrates how to use those functions.
library(JASPAR2022)
library(TFBSTools)
Matrices can be retrieved from JASPAR with either the getMatrixByID or the getMatrixByName function, given a matrix ID or a matrix name, respectively. Both functions accept either a single value or a vector of values for the ID/name argument. A single value returns a PFMatrix object, while a vector returns a PFMatrixList containing multiple PFMatrix objects.
## the user assigns a single matrix ID to the argument ID
pfm <- getMatrixByID(JASPAR2022, ID = "MA0139.1")
## the function returns a PFMatrix object
pfm
The user can utilise the PFMatrix object for further analysis and visualisation. Here is an example of how to plot a sequence logo of a given matrix using functions available in the TFBSTools package.
seqLogo(toICM(pfm))
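The toICM step converts the count matrix into an information content matrix, which determines the letter heights in the logo. As a rough base R illustration of the underlying arithmetic only, the sketch below uses a hypothetical count matrix and assumes a uniform background with no pseudocounts or small-sample correction. It is not the TFBSTools implementation, and toICM may apply such adjustments, so its values can differ.

```r
# Hypothetical 4 x 3 count matrix (rows: A, C, G, T; columns: motif positions).
# matrix() fills column by column, so each group of four values is one position.
counts <- matrix(
  c(8, 0, 0, 0,   # position 1: only A observed
    2, 2, 2, 2,   # position 2: all bases equally frequent
    4, 4, 0, 0),  # position 3: A and C equally frequent
  nrow = 4,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Convert counts to per-position base probabilities.
probs <- sweep(counts, 2, colSums(counts), "/")

# Shannon entropy per position, treating 0 * log2(0) as 0.
entropy <- -colSums(ifelse(probs > 0, probs * log2(probs), 0))

# Information content in bits: the maximum for DNA is log2(4) = 2.
ic <- 2 - entropy

# Scale each probability by the information content of its position.
icm <- sweep(probs, 2, ic, "*")

ic   # 2, 0 and 1 bits for positions 1, 2 and 3
icm
```

A fully conserved position carries 2 bits, a uniform position carries 0 bits, and the letters at each position share that position's total height in proportion to their frequency.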
## the user assigns multiple matrix IDs to the argument ID
pfmList <- getMatrixByID(JASPAR2022, ID = c("MA0139.1", "MA1102.1"))
## the function returns a PFMatrixList object
pfmList
## PFMatrixList can be subsetted to extract enclosed PFMatrix objects
pfmList[[2]]
getMatrixByName retrieves matrices by name. If multiple matrix names are supplied, the function returns a PFMatrixList object.
pfm <- getMatrixByName(JASPAR2022, name = "Arnt")
pfm

pfmList <- getMatrixByName(JASPAR2022, name = c("Arnt", "Ahr::Arnt"))
pfmList
The getMatrixSet function fetches all matrices that match criteria defined by the named arguments, and it returns a PFMatrixList object.
## select all matrices found in a specific species (9606 is the NCBI taxonomy ID for human)
## and constructed from SELEX experiments
opts <- list()
opts[["species"]] <- 9606
opts[["type"]] <- "SELEX"
opts[["all_versions"]] <- TRUE
PFMatrixList <- getMatrixSet(JASPAR2022, opts)
PFMatrixList

## retrieve all matrices constructed from SELEX experiments
opts2 <- list()
opts2[["type"]] <- "SELEX"
PFMatrixList2 <- getMatrixSet(JASPAR2022, opts2)
PFMatrixList2
Additional details about TFBS matrix analysis can be found in the TFBSTools documentation.
Here is the output of sessionInfo() on the system on which this document was compiled:
sessionInfo()