knitr::opts_chunk$set( tidy=FALSE, cache=TRUE, dev="png", message=FALSE, error=FALSE, warning=TRUE )
TFregulomeR data compendium is hosted in Singapore and Canada. As an API, TFregulomeR (from v2.0.0)
will dynamically access and retrieve the data either from Singapore (default) or
Canada server. The functions in TFregulomeR that dynamically link to the servers include
dataBrowser()
, loadPeaks()
, searchMotif()
, commonPeaks()
, exclusivePeaks()
,
intersectPeakMatrix()
, motifDistrib()
, genomeAnnotate()
and toTFBSTools()
.
By default, these functions are linking to Singapore server. If users opt to choose the
Canada server, they can use the input parameter server='ca'
when using these functions.
TFregulomeR allows users to easily browse TFregulomeR compendium using only one simple
function dataBrowser
(it’s called TFBSBrowser
before v1.2.0). Users can search the TFBS according to species, organ, sample type,
cell/tissue name, TF name, disease state, and source. A data.frame will be returned upon
searching, including the information in TFregulomeR ID (important for downstream analysis), species,
organ, sample type, cell/tissue name, description of cell/tissue, disease state, TF name,
source, TF source ID, number of peaks and number of peaks with motif.
In TFregulomeR project, we
used MEME-ChIP to perform motif de novo discovery in each ChIP-seq. Highly and centrally enriched
motifs were selected and compared with the existing TFBS databases, such as HOCOMOCO and JASPAR.
91 highly enriched motifs were not consistent with the TFBS databases. it might
be due to the fact that in those cell types, the given TFs are indirectly recruited to genome,
and/or that highly abundant presence of cohesion and polycomb group proteins masks the motif
enrichment of the ChIP’ed TF.
Besides, 136 motifs were not recorded for their corresponding TFs in the databases.
In order to confirm the reliability of these 227 motifs, we used HOMER to perform a de novo motif
discovery again. Motif results by HOMER were compared with those by MEME-ChIP and their similarity
were measured by normalised Pearson correlation coefficient using compare-matrices function in
RSAT with the formula: Ncor = cor * w / w_smaller,
where cor is raw Pearson correlation coefficient, w is the alignment width of two matrices from
MEME-ChIP and HOMER (the minimum value of w was set as 5), and w_smaller is the width of smaller
motifs from MEME-ChIP and HOMER. We found that majority of those PWM matrices generated by MEME-ChIP,
a combined algorithm suite of expectation maximization and regular expressions, were able to be
recapitulated by HOMER, which takes advantage of hypergeometric enrichment (Figure 1). We have added the information into the last two columns of dataBrowser
output (from v1.2.0).
In particular, if no input is given for the function, all records in TFregulomeR compendium will be returned.
knitr::include_graphics("../inst/extdata/motif_consistency.png")
library(TFregulomeR) # browse all records in TFregulomeR TFBS compendium all_record <- dataBrowser() # or TFBSBrowser() before v1.2.0 #> 1468 record(s) found: ... #> ... covering 415 TF(s) #> ... from 1 species: #> ... ...human #> ... from 29 organ(s): #> ... ... stem_cell, blood_and_lymph, connective_tissue, colorectum, brain, bone, stomach, prostate, breast, pancreas, skin, kidney, lung, eye, esophagus, heart, muscle, uterus, spleen, cervix, testis, liver, adrenal_gland, neck_and_mouth, pleura, ovary, thymus, fallopian, vagina #> ... in 3 sample type(s): #> ... ... primary_cells, cell_line, tissue #> ... in 414 different cell(s) or tissue(s) #> ... in 8 type(s) of disease state(s): #> ... ... normal, tumor, Simpson_Golabi_Behmel_syndrome, progeria, metaplasia, unknown, immortalized, premetastatic #> ... from the source(s): GTRD, MethMotif # returned table head(all_record) #> ID species organ sample_type #> 1 GTRD-EXP000061_HSA_embryonic-stem-cells_PRDM14 human stem_cell primary_cells #> 2 GTRD-EXP000080_HSA_CD4pos-T-cells_YY1 human blood_and_lymph primary_cells #> 3 GTRD-EXP000128_HSA_EWS502_FLI1 human connective_tissue cell_line #> 4 GTRD-EXP000132_HSA_HUVEC_FLI1 human connective_tissue cell_line #> 5 GTRD-EXP000140_HSA_LS180_TCF7L2 human colorectum cell_line #> 6 GTRD-EXP000142_HSA_LS180_CEBPB human colorectum cell_line #> cell_tissue_name description disease_state TF source source_ID #> 1 embryonic-stem-cells embryonic stem cells normal PRDM14 GTRD EXP000061 #> 2 CD4pos-T-cells CD4+ T-cells normal YY1 GTRD EXP000080 #> 3 EWS502 Ewing sarcoma tumor FLI1 GTRD EXP000128 #> 4 HUVEC umbilical vein endothelial cells normal FLI1 GTRD EXP000132 #> 5 LS180 colon cancer tumor TCF7L2 GTRD EXP000140 #> 6 LS180 colon cancer tumor CEBPB GTRD EXP000142 #> peak_num peak_with_motif_num Consistent_with_HOCOMOCO_JASPAR Ncor_between_MEME_ChIP_and_HOMER #> 1 17482 5656 YES NA #> 2 18298 2038 YES NA #> 3 89523 28914 YES NA #> 4 71437 36126 YES NA #> 5 16517 2312 YES NA #> 6 128542 67747 YES NA # browse TFBSs in blood and lymph blood_and_lymph_record <- dataBrowser(organ = "blood_and_lymph") # or TFBSBrowser() before v1.2.0 #> 494 record(s) found: ... #> ... covering 197 TF(s) #> ... from 1 species: #> ... ...human #> ... from 1 organ(s): #> ... ... blood_and_lymph #> ... in 3 sample type(s): #> ... ... primary_cells, cell_line, tissue #> ... in 129 different cell(s) or tissue(s) #> ... in 2 type(s) of disease state(s): #> ... ... normal, tumor #> ... from the source(s): GTRD, MethMotif # browse all CEBPB TFBSs CEBPB_record <- dataBrowser(tf = "CEBPB") # or TFBSBrowser() before v1.2.0 #> 16 record(s) founded: ... #> ... covering 1 TF(s) #> ... from 1 species: #> ... ...human #> ... from 9 organ(s): #> ... ... colorectum, uterus, blood_and_lymph, stem_cell, bone, lung, cervix, liver, #> breast #> ... in 2 sample type(s): #> ... ... cell_line, primary_cells #> ... in 16 different cell(s) or tissue(s) #> ... in 2 type(s) of disease state(s): #> ... ... tumor, normal #> ... from the source(s): GTRD, MethMotif
We have designed some useful functions for data retrieval from TFregulomeR compendium.
Retrieval of motif matrix and DNA methylation matrix if the source is MethMotif
(If the source is GTRD, no DNA methylation information is available) can be achieved
using searchMotif
. Further, these obtained matrices can be easily saved locally using exportMMPFM
,
and corresponding (Meth)Motif logo (MethMotif logo, if the source is MethMotif; motif logo, if the source is GTRD) can be simply plotted using plotLogo
. Here, we introduced an object of class
"MethMotif" using S4 class in order for an easy and intuitive storage, manipulation and conversion
(with other packages such as "TFBSTools") of a MethMotif matrix, which contains a TF
motif weight position matrix and its DNA methylation matrix (beta score matrix). What's more, we allow users to directly load peak regions of a TF of interest
(all peaks or peaks with motif only) from TFregulomeR compendium using loadPeaks
.
# according to TFBSBrowser results for all CEBPB TFBS query, we select two CEBPB TFBSs # from MethMotif and GTRD: MM1_HSA_K562_CEBPB, GTRD-EXP040801_HSA_HL-60_CEBPB. # loading MethMotif object in "MEME" format. Currently we support "MEME" and "TRANSFAC". K562_CEBPB <- searchMotif(id = "MM1_HSA_K562_CEBPB", motif_format = "MEME") #> There are a matched record exported in a MethMotif object. HL60_CEBPB <- searchMotif(id = "GTRD-EXP040801_HSA_HL-60_CEBPB", motif_format = "MEME") #> There are a matched record exported in a MethMotif object. class(K562_CEBPB) #> [1] "MethMotif" #> attr(,"package") #> [1] "TFregulomeR"
After obtaining a MethMotif matrix, user can use plotLogo
to plot logo as below (Figure 2).
If the TFBS source is MethMotif, then a MethMotif logo will be saved. Two options are available for motif logo,
"entropy" and "frequency", and also different methylation levels
("all", "methylated" and "unmethylated") can be opted for methylation bar charts. However,
if the TFBS source is GTRD, only a motif logo will be saved due to the unknown DNA methylation within motif.
In the plot, the number of peaks with motif will be printed. It should be NOTED that the motif logo is generated by the all possible TFBSs (initally scanned by FIMO after de novo motif discovery using MEME-ChIP suite with a p-value less than 1e-4) in the peak regions (+/-100bp around peak summits). It's possbile that one peak region contains more than one significant TFBSs. Hence, the number of TFBSs in the peak regions could be larger than the number of peaks with motif.
For each MethMotif logo, bar plot above motif logo denotes the cytosines in CpG context covered by WGBS at each base position in motif and these CpGs are segregated into three groups shown in different colors, namely homogeneously methylated (orange bar, beta score > 90%), heterogeneously methylated (green bar, beta score 10-90%) and homogenously unmethylated (blue bar, beta score < 10%).
plotLogo(K562_CEBPB, logo_type = "entropy", meth_level = "all") #> Success: a PDF named 'MM1_HSA_K562_CEBPB-logo-entropy.pdf' has been saved! plotLogo(K562_CEBPB, logo_type = "entropy", meth_level = "methylated") #> Success: a PDF named 'MM1_HSA_K562_CEBPB-logo-entropy-methylated-only.pdf' has been saved! plotLogo(K562_CEBPB, logo_type = "entropy", meth_level = "unmethylated") #> Success: a PDF named 'MM1_HSA_K562_CEBPB-logo-entropy-unmethylated-only.pdf' has been saved! plotLogo(K562_CEBPB, logo_type = "frequency", meth_level = "all") #> Success: a PDF named 'MM1_HSA_K562_CEBPB-logo-frequency.pdf' has been saved! plotLogo(K562_CEBPB, logo_type = "frequency", meth_level = "methylated") #> Success: a PDF named 'MM1_HSA_K562_CEBPB-logo-frequency-methylated-only.pdf' has been saved! plotLogo(K562_CEBPB, logo_type = "frequency", meth_level = "unmethylated") #> Success: a PDF named 'MM1_HSA_K562_CEBPB-logo-frequency-unmethylated-only.pdf' has been saved! # plot motif logo for GTRD-EXP040801_HSA_HL-60_CEBPB. No DNA methylation states available here plotLogo(HL60_CEBPB, logo_type = "entropy") #> Success: a PDF named 'GTRD-EXP040801_HSA_HL-60_CEBPB-logo-entropy.pdf' has been saved!
knitr::include_graphics("../inst/extdata/plotLogo.png")
Motif matrix as well as methylation matrix (beta score matrix), if available, can be saved locally using exportMMPFM
.
To be noted, the function exportMMPFM
is also able to export (Meth)Motif matrix for
the outputs of commonPeaks
, exclusivePeaks
and intersectPeakMatrix
by specifying "fun = ".
We will introduce it in the following section.
# export MethMotif matrix for MM1_HSA_K562_CEBPB exportMMPFM(fun_output = K562_CEBPB, fun = "searchMotif", save_motif_PFM = TRUE, save_betaScore_matrix = TRUE) #> Start exporting ... ... #> ... ... You chose to save motif PFM and beta score matrix. #> ... ... export searchMotif #> ... ... ... ... Beta score matrix has been saved as 'MM1_HSA_K562_CEBPB-methScore.txt'. #> ... ... ... ... Motif PFM has been saved as 'MM1_HSA_K562_CEBPB-motif-MEME.txt'. # export motif matrix for GTRD-EXP040801_HSA_HL-60_CEBPB exportMMPFM(fun_output = HL60_CEBPB, fun = "searchMotif", save_motif_PFM = TRUE, save_betaScore_matrix = TRUE) #> Start exporting ... ... #> ... ... You chose to save motif PFM and beta score matrix. #> ... ... export searchMotif #> ... ... ... ... No beta score matrix is available. Skip! #> ... ... ... ... Motif PFM has been saved as 'GTRD-EXP040801_HSA_HL-60_CEBPB-motif-MEME.txt'. # exprot motif matrix in TRANSFAC format K562_CEBPB_TRANSFAC <- searchMotif(id = "MM1_HSA_K562_CEBPB", motif_format = "TRANSFAC") #> There are a matched record exported in a MethMotif object. exportMMPFM(fun_output <- K562_CEBPB_TRANSFAC, fun = "searchMotif", save_motif_PFM = TRUE, save_betaScore_matrix = TRUE) #> Start exporting ... ... #> ... ... You chose to save motif PFM and beta score matrix. #> ... ... export searchMotif #> ... ... ... ... Beta score matrix has been saved as 'MM1_HSA_K562_CEBPB-methScore.txt'. #> ... ... ... ... Motif PFM has been saved as 'MM1_HSA_K562_CEBPB-motif-TRANSFAC.txt'.
More importantly, we allow the users to load peak regions (all peaks or peaks only with motif)
of all TFs in TFregulomeR compendium using loadPeaks
. To be noted, the peak regions of a given
TF in TFregulomeR compendium are the peak summits (hg38 for human), and the TFBS is enriched
in a +/- 100bp window surrounding the peak summits. For each peak region, we also provide its tag (read) fold change (fifth column of returned peaks). This read enrichment value is obtained from MACS2 and denotes the fold change of reads in TF ChIP-seq compared to input ChIP-seq.
K562_CEBPB_peaks <- loadPeaks(id = "MM1_HSA_K562_CEBPB", includeMotifOnly = TRUE) #> Success: peak file has been returned in a data frame! head(K562_CEBPB_peaks) #> chr start end id tag_fold_change #> 1 chr3 101823721 101823722 MM1_HSA_K562_CEBPB_peaks_with_motif_1 22.91742 #> 2 chr3 101850619 101850620 MM1_HSA_K562_CEBPB_peaks_with_motif_2 13.09647 #> 3 chr3 102182290 102182291 MM1_HSA_K562_CEBPB_peaks_with_motif_3 17.28870 #> 4 chr3 105626970 105626971 MM1_HSA_K562_CEBPB_peaks_with_motif_4 23.36092 #> 5 chr3 105647238 105647239 MM1_HSA_K562_CEBPB_peaks_with_motif_5 34.80412 #> 6 chr3 105899733 105899734 MM1_HSA_K562_CEBPB_peaks_with_motif_6 13.61153
knitr::include_graphics("../inst/extdata/commonPeaks_tutorial.png")
TFregulomeR provides the functionality to find the common peak regions along with DNA methylation
profiles and read (tag) enrichments using commonPeaks
.
For the target peak sets, users can directly use TFregulomeR TF peaks (hg38 for human) by inputting its TFregulomeR IDs in target_peak_id
(all peaks or peaks with motif only can be opted using motif_only_for_target_peak
), or their
own peak regions in user_target_peak_list
. If customised peak sets are provided, all peak sets
should be stored in an R list(), and each peak set should be a bed-format data.frame with the
first three columns as chromosome (starting with 'chr'), start and end. It's recommended that users
provide the UNIQUE IDs for their customised peak list in user_target_peak_id
(also should be unique to the provided TFregulomeR
ID list in target_peak_id
). If unavailable, the function will automatically assign IDs for the
user's peak sets. It should be noted that if the customised peak set is derived TFregulomeR
compendium, it's highly recommended that its TFregulomeR ID should be provided correspondingly in
user_target_peak_id
if one opts to profile the DNA methylation levels. Even though TFregulomeR peak sets
are peak summits, the function is able to recognise it with the provided TFregulomeR ID in user_target_peak_id
and
automatically expand +/- 100bp during the analysis.
For the compared peak sets, same rules are applicable to options compared_peak_id
,
motif_only_for_compared_peak
, user_compared_peak_list
and user_compared_peak_id
when loading compared peak sets.
During the analysis, EACH of target peak set from TFregulomeR compendium using target_peak_id
and/or user provided using
user_target_peak_list
will be compared with ALL input compared peak sets (compared_peak_id
and
user_compared_peak_list
), to get a final target sub-ensemble peaks shared by
all compared peak sets.
If methylation_profile_in_narrow_region=TRUE
, DNA methylation profiling in +/- 100bp surrounding peak summit will be performed
for each target common peak sub-ensemble, if its ID labeled in target_peak_id
and user_target_peak_list
matches a MethMotif ID ("MM1_HSA_") in TFregulomeR.
# read my local file. Here we use the HCT116 CEBPB binding sites from the publication PMID: 30380113 my_peak_path <- system.file("extdata", "HCT116_CEBPb_binding_sites.txt", package = "TFregulomeR") my_peak <- read.delim(my_peak_path, sep = "\t", header = FALSE) head(my_peak) #> V1 V2 V3 #> 1 chr1 58585814 58585827 #> 2 chr12 122925699 122925712 #> 3 chr9 5818111 5818124 #> 4 chr19 5850889 5850902 #> 5 chr10 5951274 5951287 #> 6 chr1 8175025 8175038 # To get the sub-ensemble peaks of K562 CEBPB peaks and my peaks, which are # share with the CEBPB peaks in all cell types from TFregulomeR compendium, # and at the same time to profile the DNA methylation states in the final subsets. # 1) Get all CEBPB records in TFregulomeR compendium CEBPB_record <- dataBrowser(tf = "CEBPB") # or TFBSBrowser() before v1.2.0 #> 16 records(s) founded: ... #> ... covering 1 TF(s) #> ... from 1 species: #> ... ...human #> ... from 9 organ(s): #> ... ... colorectum, uterus, blood_and_lymph, stem_cell, bone, lung, cervix, liver, breast #> ... in 2 sample type(s): #> ... ... cell_line, primary_cells #> ... in 16 different cell(s) or tissue(s) #> ... in 2 type(s) of disease state(s): #> ... ... tumor, normal #> ... from the source(s): GTRD, MethMotif # 2) Start commonPeaks analysis commonPeak_output <- commonPeaks(target_peak_id = "MM1_HSA_K562_CEBPB", motif_only_for_target_peak = TRUE, user_target_peak_list = list(my_peak), user_target_peak_id = c("HCT116_CEBPB"), compared_peak_id = CEBPB_record$ID, motif_only_for_compared_peak = TRUE, methylation_profile_in_narrow_region = TRUE) #> TFregulomeR::commonPeaks() starting ... ... #> You chose to profile the methylation levels in 200bp window around peak summits, #> if there is any peak loaded from TFregulome #> Loading target peak list ... ... #> ... You have 1 TFBS(s) requested to be loaded from TFregulomeR server #> ... You chose to load TF peaks with motif only. Using 'motif_only_for_target_peak' tunes your options #> ... loading TFBS(s) from TFregulomeR now #> ... ... peak file loaded successfully for id 'MM1_HSA_K562_CEBPB' #> ... Done loading TFBS(s) from TFregulome #> ... You have 1 customised peak set(s) #> Loading compared peak list ... ... #> ... You have 16 TFBS(s) requested to be loaded from TFregulomeR server #> ... You chose to load TF peaks with motif only. Using 'motif_only_for_compared_peak' tunes your options #> ... loading TFBS(s) from TFregulomeR now #> ... ... peak file loaded successfully for id 'GTRD-EXP000142_HSA_LS180_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP010975_HSA_Ishikawa_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP030173_HSA_LoVo_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP030702_HSA_blood-monocytes_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP034967_HSA_mesenchymal-stem-cells_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP036478_HSA_fetal-osteoblasts_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP040652_HSA_monocyte-derived-macrophages_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP040801_HSA_HL-60_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_A549_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_H1-hESC_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_HCT116_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_HeLa-S3_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_HepG2_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_IMR-90_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_K562_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_MCF-7_CEBPB' #> ... Done loading TFBS(s) from TFregulome #> Start analysing: MM1_HSA_K562_CEBPB... ... #> Start analysing: HCT116_CEBPB... ... #> Done analysing.
After getting the output of commonPeaks
, you can use commonPeakResult
to get
1) the summary, 2) the common peak regions, 3) DNA methylation levels in 200bp around common peak summits if the input TF source is
MethMotif in TFregulomeR compendium and 4) the (Meth)Motif logo if TF input is from TFregulomeR warehouse or with TFregulomeR ID.
commonPeak_result <- commonPeakResult(commonPeaks = commonPeak_output, return_common_peak_sites = TRUE, save_MethMotif_logo = TRUE, return_methylation_profile = TRUE, return_summary = TRUE) #> Start getting the results of commonPeakResult ... #> ... ... You chose to return common peak sites; #> ... ... You chose to return methylation profile; #> ... ... You chose to return common peak summary; #> ... ... ... ALL of common peak sets, methylation profiles and peak summary will be #> stored in a list, and named with 'common_peak_list', 'methylation_profile' and #> 'peak_summary' in the list. Use 'names()' in the output for its detials. #> ... ... You chose to save MethMotif logo in PDF if any; #> ... ... ... You chose entropy logo; #> ... ... ... You chose to show all methylation levels; #> Success: a PDF named 'MM1_HSA_K562_CEBPB_common_peaks-logo-entropy.pdf' has been saved! #> ... ... ... The input peak set for the results 'HCT116_CEBPB_common_peaks' was #> not originated from TFregulomeR or the number of direct binding sites in the #> common peaks is 0, so no motif logo available. # the contents in commonPeak_result names(commonPeak_result) #> [1] "common_peak_list" "methylation_profile" "peak_summary" common_peak_list <- commonPeak_result$common_peak_list methylation_profile <- commonPeak_result$methylation_profile peak_summary <- commonPeak_result$peak_summary # peak summary: 1.137225% of K562 CEBPB peaks and 2.517540% of my peaks are common with # the CEBPB peaks in all cell types available in TFregulomeR warehouse. peak_summary #> percentage_in_original_inputs(%) #> MM1_HSA_K562_CEBPB_common_peaks 1.137225 #> HCT116_CEBPB_common_peaks 2.517540 # common peak regions K562_CEBPB_common_peak <- common_peak_list$MM1_HSA_K562_CEBPB_common_peaks head(K562_CEBPB_common_peak) #> chr start end id tag_fold_change #> 39 chr3 126261153 126261154 MM1_HSA_K562_CEBPB_peaks_with_motif_39 91.77499 #> 145 chr3 152100699 152100700 MM1_HSA_K562_CEBPB_peaks_with_motif_145 74.01529 #> 276 chr3 188239782 188239783 MM1_HSA_K562_CEBPB_peaks_with_motif_276 13.06116 #> 283 chr3 194003766 194003767 MM1_HSA_K562_CEBPB_peaks_with_motif_283 26.67740 #> 435 chr4 73704747 73704748 MM1_HSA_K562_CEBPB_peaks_with_motif_435 54.26637 #> 456 chr4 77158137 77158138 MM1_HSA_K562_CEBPB_peaks_with_motif_456 31.97526 # methylation profile in common peak regions names(methylation_profile) #> [1] "MM1_HSA_K562_CEBPB_common_peaks" "HCT116_CEBPB_common_peaks" methylation_profile$MM1_HSA_K562_CEBPB_common_peaks #> CpG_num #> 0-10% 161 #161 CpG methylation scores are less than 0.1 (homogeneously unmethylated) in +/-100bp window around common peak summits #> 10-20% 9 #> 20-30% 17 #> 30-40% 2 #> 40-50% 1 #> 50-60% 2 #> 60-70% 9 #> 70-80% 5 #> 80-90% 13 #> 90-100% 25 #25 CpG methylation scores are more than 0.9 (homogeneously methylated) in +/-100bp window around common peak summits # customised input peaks are not originated from TFregulomeR compendium, so no DNA methylation states methylation_profile$HCT116_CEBPB_common_peaks #> [,1] #> [1,] NA
knitr::include_graphics("../inst/extdata/MM1_HSA_K562_CEBPB_common_peaks-logo-entropy.png")
knitr::include_graphics("../inst/extdata/exclusivePeaks_tutorial.png")
Exclusive peak regions are important to study a TF's context-specific function.
Hence, we implemented such functionality to achieve the extraction of the context dependent peak loci
along with DNA methylation profiles, exclusivePeaks
.
For the target peak sets, users can directly use TFregulomeR TF peaks (hg38 for human) by inputting the TFregulomeR IDs in target_peak_id
(all peaks or peaks with motif only can be selected using motif_only_for_target_peak
), or their
own peak regions in user_target_peak_list
. If customised peak sets are provided, all peak sets
should be stored in an R list(), and each peak set should be a bed-format data.frame with the
first three columns as chromosome (starting with 'chr'), start and end. It's recommended that users
provide the UNIQUE IDs for their customised peak list in user_target_peak_id
(also should be unique to the provided TFregulomeR
ID list in target_peak_id
). If unavailable, the function will automatically assign IDs for the
provided peak sets. It should be noted that if the customised target peak set is derived from TFregulomeR
compendium, it's highly recommended that its TFregulomeR ID should be provided correspondingly in
user_target_peak_id
if one opts to profile the DNA methylation levels . Even though TFregulomeR peak sets
are peak summits, the function is able to recognise it with the provided TFregulomeR ID in user_target_peak_id
and
automatically expand +/- 100bp during the analysis.
For the excluded peak sets, same rules are applicable to options excluded_peak_id
,
motif_only_for_excluded_peak
, user_excluded_peak_list
and user_excluded_peak_id
when loading excluded peak sets.
During the analysis, EACH of target peak set from TFregulomeR compendium using target_peak_id
and/or user provided using
user_target_peak_list
will be compared with ALL input excluded peak sets from excluded_peak_id
and
user_excluded_peak_list
, to get a final target peak sub-ensemble which is exclusive
from all input excluded peak sets.
If methylation_profile_in_narrow_region=TRUE
, DNA methylation profiling in +/- 100bp surrounding peak summit will be performed
for each target exclusive peak sub-ensemble, if its ID labeled in target_peak_id
and user_target_peak_list
matches a MethMotif ID record ("MM1_HSA_") in TFregulomeR
compendium.
# To get the exclusive subset of K562 CEBPB peaks and at the same time to profile the DNA # methylation states in the exclusive subset # 1) Get all CEBPB records in TFregulomeR warehouse CEBPB_record <- dataBrowser(tf = "CEBPB") # or TFBSBrowser() before v1.2.0 #> 16 record(s) founded: ... #> ... covering 1 TF(s) #> ... from 1 species: #> ... ...human #> ... from 9 organ(s): #> ... ... colorectum, uterus, blood_and_lymph, stem_cell, bone, lung, cervix, liver, breast #> ... in 2 sample type(s): #> ... ... cell_line, primary_cells #> ... in 16 different cell(s) or tissue(s) #> ... in 2 type(s) of disease state(s): #> ... ... tumor, normal #> ... from the source(s): GTRD, MethMotif # 2) All CEBPB TFregulomeR IDs except MM1_HSA_K562_CEBPB CEBPB_record_ID_noK562 <- CEBPB_record$ID[!(CEBPB_record$ID %in% "MM1_HSA_K562_CEBPB")] exclusivePeak_output <- exclusivePeaks(target_peak_id = "MM1_HSA_K562_CEBPB", motif_only_for_target_peak = TRUE, excluded_peak_id = CEBPB_record_ID_noK562, motif_only_for_excluded_peak = TRUE, methylation_profile_in_narrow_region = TRUE) #> TFregulomeR::exclusivePeaks() starting ... ... #> You chose to profile the methylation levels in 200bp window around peak summits, #> if there is any peak loaded from TFregulome #> Loading target peak list ... ... #> ... You have 1 TFBS(s) requested to be loaded from TFregulomeR server #> ... You chose to load TF peaks with motif only. Using 'motif_only_for_target_peak' tunes your options #> ... loading TFBS(s) from TFregulomeR now #> ... ... peak file loaded successfully for id 'MM1_HSA_K562_CEBPB' #> ... Done loading TFBS(s) from TFregulome #> Loading excluded peak list ... ... #> ... You have 15 TFBS(s) requested to be loaded from TFregulomeR server #> ... You chose to load TF peaks with motif only. Using 'motif_only_for_excluded_peak' tunes your options #> ... loading TFBS(s) from TFregulomeR now #> ... ... peak file loaded successfully for id 'GTRD-EXP000142_HSA_LS180_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP010975_HSA_Ishikawa_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP030173_HSA_LoVo_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP030702_HSA_blood-monocytes_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP034967_HSA_mesenchymal-stem-cells_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP036478_HSA_fetal-osteoblasts_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP040652_HSA_monocyte-derived-macrophages_CEBPB' #> ... ... peak file loaded successfully for id 'GTRD-EXP040801_HSA_HL-60_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_A549_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_H1-hESC_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_HCT116_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_HeLa-S3_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_HepG2_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_IMR-90_CEBPB' #> ... ... peak file loaded successfully for id 'MM1_HSA_MCF-7_CEBPB' #> ... Done loading TFBS(s) from TFregulome #> Start analysing: MM1_HSA_K562_CEBPB... ... #> Done analysing.
After getting the output of exclusivePeaks
, you can use exclusivePeakResult
to get
1) the summary, 2) the exclusive peak regions, 3) DNA methylation levels in exclusive peak subsets if the input source is
MethMotif in TFregulomeR compendium and 4) the (Meth)Motif logo if the input target is from TFregulomeR compendium or comes with a TFregulomeR ID.
exclusivePeak_result <- exclusivePeakResult(exclusivePeaks = exclusivePeak_output, return_exclusive_peak_sites = TRUE, save_MethMotif_logo = TRUE, return_methylation_profile = TRUE, return_summary = TRUE) #> Start getting the results of exclusivePeaks ... #> ... ... You chose to return exclusive peak sites; #> ... ... You chose to return methylation profile; #> ... ... You chose to return exclusive peak summary; #> ... ... ... ALL of exclusive peak sets, methylation profiles and peak summary will #> be stored in a list, and named with 'exclusive_peak_list', 'methylation_profile' and #> 'peak_summary' in the list. Use 'names()' in the output for its detials. #> ... ... You chose to save MethMotif logo in PDF if any; #> ... ... ... You chose entropy logo; #> ... ... ... You chose to show all methylation levels; #> Success: a PDF named 'MM1_HSA_K562_CEBPB_exclusive_peaks-logo-entropy.pdf' has been saved! # the contents in exclusivePeak_result names(exclusivePeak_result) #> [1] "exclusive_peak_list" "methylation_profile" "peak_summary" exclusive_peak_list <- exclusivePeak_result$exclusive_peak_list peak_summary <- exclusivePeak_result$peak_summary methylation_profile <- exclusivePeak_result$methylation_profile # peak summary, 9.110437% of K562 CEBPB peaks are unique compared with # all CEBPB TFBSs in TFregulomeR warehouse peak_summary #> percentage_in_original_inputs(%) #> MM1_HSA_K562_CEBPB_exclusive_peaks 9.110437 K562_CEBPB_exclusive_peak <- exclusive_peak_list$MM1_HSA_K562_CEBPB_exclusive_peaks head(K562_CEBPB_exclusive_peak) #> chr start end id tag_fold_change #> 4 chr3 105626970 105626971 MM1_HSA_K562_CEBPB_peaks_with_motif_4 23.36092 #> 22 chr3 119695686 119695687 MM1_HSA_K562_CEBPB_peaks_with_motif_22 10.89517 #> 45 chr3 128415444 128415445 MM1_HSA_K562_CEBPB_peaks_with_motif_45 40.57037 #> 57 chr3 130184388 130184389 MM1_HSA_K562_CEBPB_peaks_with_motif_57 23.26712 #> 58 chr3 130343212 130343213 MM1_HSA_K562_CEBPB_peaks_with_motif_58 13.14518 #> 68 chr3 133629332 133629333 MM1_HSA_K562_CEBPB_peaks_with_motif_68 20.32872 # methylation profile in exclusive peak regions names(methylation_profile) #> [1] "MM1_HSA_K562_CEBPB_exclusive_peaks" methylation_profile$MM1_HSA_K562_CEBPB_exclusive_peaks #> CpG_num #> 0-10% 852 #852 CpG methylation scores are less than 0.1 (homogeneously unmethylated) in +/-100bp window around exclusive peak summits #> 10-20% 149 #> 20-30% 86 #> 30-40% 63 #> 40-50% 45 #> 50-60% 39 #> 60-70% 25 #> 70-80% 46 #> 80-90% 29 #> 90-100% 40 #40 CpG methylation scores are more than 0.9 ((homogeneously methylated)) in +/-100bp window around exclusive peak summits
knitr::include_graphics("../inst/extdata/MM1_HSA_K562_CEBPB_exclusive_peaks-logo-entropy.png")
knitr::include_graphics("../inst/extdata/intersectPeakMatrix_tutorial.png")
TFregulomeR allows users to portray the co-binding landscapes between two collections
of TFs, along with DNA methylation states in the pair-wise intersected peaks, using intersectPeakMatrix
.
This functionality is particularly useful to study TF cofactor and interactome in a cell type. Different from commonPeaks
,
intersectPeakMatrix
perform an exhaustive pair-wise intersection analysis between peak list x and peak list y,
to form an x*y intersection matrix. Therefore, it is required for users to provide the two lists of peak sets.
For peak list x, users can directly use TFregulomeR peaks by providing TFregulomeR ID in peak_id_x
and indicating
whether loading peaks with motif only using motif_only_for_id_x
. In addition, customised peak sets can also be
input in user_peak_list_x
. It's recommended that UNIQUE IDs (also unique to IDs in peak_id_x
) be provided for
each customised peak set in user_peak_x_id
. If the customised peak set is derived from TFregulomeR compendium, it's highly
recommended that the corresponding TFregulomeR ID be provided in user_peak_x_id
, which allows the function
to recognise the source of peak set and to properly profile the DNA methylation states
(if methylation_profile_in_narrow_region=TRUE
)
as well as read enrichments in the intersected regions. Even though TFregulomeR peak sets
are peak summits, the function is able to recognise it with the provided TFregulomeR ID in peak_id_x
and
automatically expand +/- 100bp during the analysis.
Same principles are applicable for peak list y.
Advice: It's to be noted that function intersectPeakMatrix
could take up several minutes to hours depending on the number of input TFs. It is a wise choice to save a intersectPeakMatrix
output into R data
format using saveRDS(object, file="my_data.rds")
and restore it using readRDS(file="my_data.rds")
if you need to re-use the output to extract other results by function intersectPeakMatrixResult
in the future.
# profile the co-binding landscapes of all K562 TFs in TFregulomeR warehouse surrounding # K562 CEBPB common and exclusive peaks # browse all TFBS record in K562 K562_TFBS = dataBrowser(cell_tissue_name = "K562") # or TFBSBrowser() before v1.2.0 #> 131 record(s) found: ... #> ... covering 131 TF(s) #> ... from 1 species: #> ... ...human #> ... from 1 organ(s): #> ... ... blood_and_lymph #> ... in 1 sample type(s): #> ... ... cell_line #> ... in 1 different cell(s) or tissue(s) #> ... in 1 type(s) of disease state(s): #> ... ... tumor #> ... from the source(s): MethMotif # co-binding landscapes in K562 CEBPB common peaks intersectMatrix_common <- intersectPeakMatrix(user_peak_list_x = list(K562_CEBPB_common_peak), user_peak_x_id = c("MM1_HSA_K562_CEBPB"), peak_id_y = K562_TFBS$ID, motif_only_for_id_y = TRUE, methylation_profile_in_narrow_region = TRUE) #> TFregulomeR::intersectPeakMatrix() starting ... ... #> You chose to profile the methylation levels in 200bp window around peak summits, #> if there is any peak loaded from TFregulome. It will make the program slow. #> Disable it if you want a speedy analysis and do not care about methylation #> Loading peak list x ... ... #> ... You have 1 customised peak set(s) #> Loading peak list y ... ... #> ... You have 131 TFBS(s) requested to be loaded from TFregulomeR server #> ... You chose to load TF peaks with motif only. Using 'motif_only_for_id_y' tunes your options #> ... loading TFBS(s) from TFregulomeR now #> ... ... peak file loaded successfully for id 'MM1_HSA_K562_AFF1' #> ... ... peak file loaded successfully for id 'MM1_HSA_K562_ARID2' #> ... ... peak file loaded successfully for id 'MM1_HSA_K562_ARID3A' #> ... ... peak file loaded successfully for id 'MM1_HSA_K562_ATF1' #> ... ... #> ... ... peak file loaded successfully for id 'MM1_HSA_K562_ZSCAN29' #> ... Done loading TFBS(s) from TFregulome #> Start analysing list x:MM1_HSA_K562_CEBPB... ... #> ... ... Start analysing list y:MM1_HSA_K562_AFF1 #> ... ... Start analysing list y:MM1_HSA_K562_ARID2 #> ... ... Start analysing list y:MM1_HSA_K562_ARID3A #> ... ... Start analysing list y:MM1_HSA_K562_ATF1 #> ... ... #> ... ... Start analysing list y:MM1_HSA_K562_ZSCAN29 # K562 CEBPB exclusive peaks intersectMatrix_exclusive <- intersectPeakMatrix(user_peak_list_x = list(K562_CEBPB_exclusive_peak), user_peak_x_id = c("MM1_HSA_K562_CEBPB"), peak_id_y = K562_TFBS$ID, motif_only_for_id_y = TRUE, methylation_profile_in_narrow_region = TRUE) #> TFregulomeR::intersectPeakMatrix() starting ... ... #> You chose to profile the methylation levels in 200bp window around peak summits, #> if there is any peak loaded from TFregulome. It will make the program slow. #> Disable it if you want a speedy analysis and do not care about methylation #> Loading peak list x ... ... #> ... You have 1 customised peak set(s) #> Loading peak list y ... ... #> ... You have 108 TFBS(s) requested to be loaded from TFregulomeR server #> ... You chose to load TF peaks with motif only. Using 'motif_only_for_id_y' tunes your options #> ... loading TFBS(s) from TFregulomeR now #> .. ... peak file loaded successfully for id 'MM1_HSA_K562_AFF1' #> .. ... peak file loaded successfully for id 'MM1_HSA_K562_ARID2' #> .. ... peak file loaded successfully for id 'MM1_HSA_K562_ARID3A' #> .. ... peak file loaded successfully for id 'MM1_HSA_K562_ATF1' #> ... ... #> .. ... peak file loaded successfully for id 'MM1_HSA_K562_ZSCAN29' #> ... Done loading TFBS(s) from TFregulome #> Start analysing list x:MM1_HSA_K562_CEBPB... ... #> ... ... Start analysing list y:MM1_HSA_K562_AFF1 #> ... ... Start analysing list y:MM1_HSA_K562_ARID2 #> ... ... Start analysing list y:MM1_HSA_K562_ARID3A #> ... ... Start analysing list y:MM1_HSA_K562_ATF1 #> ... ... #> ... ... Start analysing list y:MM1_HSA_K562_ZSCAN29
We have implemented intersectPeakMatrixResult
in TFregulomeR package, allowing an easy extraction and interpretation of intersectPeakMatrix
output. It is worth noting that there are two ways of interpretations in intersection between set A and B, that is, 1) the percentage of A overlapped with B, and 2) the percentage of B intersected with A. Same principle is applicable for the output of intersectPeakMatrix
.
The output of intersectPeakMatrix
is a X*Y matrix table (X peak sets in peak list x
and Y peak sets in peak list y) and each table cell contains an
intersectPeakMatrix class object. IntersectPeakMatrix class is a exclusively-designed
S4 class and each object contains:
1) Information of peak x subset overlapped with peak y:
i) overlap percentage in peak x;
ii) motif in peak x overlapped with peak y;
iii) tag enrichment in peak x overlapped with peak y;
iv) DNA methylation in peak x overlapped with peak y, if peak x is derived from MethMotif in TFregulomeR compendium.
and
2) Information of peak y subset overlapped with peak x:
i) overlap percentage in peak y;
ii) motif in peak y overlapped with peak x;
iii) tag enrichment in peak y overlapped with peak x;
iv) DNA methylation in peak y overlapped with peak x, if peak y is derived from MethMotif in TFregulomeR compendium.
By using return_intersection_matrix = TRUE
and angle_of_matrix = "x"
in function
intersectPeakMatrixResult
, user can obtain an intersection matrix table. Row i
and column j table cell denotes the percent of peak x(i) overlapped with peak y(j). If
angle_of_matrix = "y"
, then row i and column j table cell denotes the percent of peak y(j) overlapped with peak x(i).
By using return_methylation_profile = TRUE
and angle_of_methylation_profile = "x"
in function
intersectPeakMatrixResult
, user can obtain a DNA methylation matrix table. Row i
and column j table cell contains a vector about the statistics of CpG methylation states
within the peak x(i) overlapped with peak y(j). If
angle_of_matrix = "y"
, then row i and column j table cell contains a vector about the statistics of CpG methylation states within the peak y(j) overlapped with peak x(i).
By using save_MethMotif_logo = TRUE
and angle_of_logo = "x"
in function
intersectPeakMatrixResult
, the function will plot and save (Meth)Motif logo for each peak x
intersected with peak y (if any of peak x is derived from TFregulomeR compendium).
If angle_of_logo = "y"
, the function will plot and save (Meth)Motif logo for each peak y
intersected with peak x (if any of peak y is derived from TFregulomeR compendium). Indeed,
it's not necessary to plot all (Meth)Motif logos for every pair of intersection at the same time. One can focus only on some subsets of peak_list_x
and peak_list_y
, using
saving_MethMotif_logo_x_id
and saving_MethMotif_logo_y_id
respectively.
Lastly, By using return_tag_density = TRUE
and angle_of_tag_density = "x"
in function
intersectPeakMatrixResult
, user can obtain a read enrichment matrix table. Row i
and column j table cell denotes a read enrichment value within the peak x(i) overlapped
with peak y(j). If angle_of_tag_density = "y"
, then row i and column j table cell denotes a
read enrichment value within the peak y(j) overlapped with peak y(x).There are five
read enrichment values to be selected using tag_density_value
: median, mean,
SD (standard deviation), quartile_25 (first quartile) and quartile_75 (third quartile).
In order to make cofactor analysis more straightforward, a function cofactorReport
has been exclusively designed. Just by simply inputting the output of intersectPeakMatrix
into the function, cofactor report will be saved as a PDF file for each peak x. In the
report, top cofactors of peak x will be reported along with motif sequences, DNA
methylation within motif and 200bp peak regions as well as read enrichment scores (median,
first quartile and third quartile).
# get the intersection matrix for K562 common peaks IM_K562_common_result <- intersectPeakMatrixResult(intersectPeakMatrix = intersectMatrix_common, return_intersection_matrix = TRUE, angle_of_matrix = "x", return_methylation_profile = TRUE, angle_of_methylation_profile = "x", return_tag_density = TRUE, angle_of_tag_density = "x", tag_density_value = "median") #> Start getting the results of intersectPeakMatrix ... #> ... ... You chose to return intersection matrix; #> ... ... ... You chose x-wise intersection matrix; #> ... ... You chose to return tag density; #> ... ... ... the tag density value you chose to return is median #> ... ... ... You chose to return tag density for peak list x; #> ... ... You chose to return methylation profile; #> ... ... ... You chose to return methylation profile for peak list x; #> ... ... You chose NOT to save MethMotif logo in PDF if any; names(IM_K562_common_result) #> [1] "intersection_matrix" "tag_density_matrix" "methylation_profile_matrix" # return intersection matrix table for K562 CEBPB shared peaks IM_K562_common_intersect_matrix <- IM_K562_common_result$intersection_matrix dim(IM_K562_common_intersect_matrix) #> [1] 1 131 # 1 row represents the 1 input peak x, 131 columns represent the 131 peak y # e.g. 1.111111%, 3.333333% and 1.111111% of K562 common peaks overlapped with MM1_HSA_K562_AFF1, MM1_HSA_K562_ARID2 and MM1_HSA_K562_ARID3A respectively. IM_K562_common_intersect_matrix[1,1:3] #> MM1_HSA_K562_AFF1 MM1_HSA_K562_ARID2 MM1_HSA_K562_ARID3A #> MM1_HSA_K562_CEBPB 1.111111 3.333333 1.111111 # find the top 10 TFs co-binding in K562 CEBPB shared peaks IM_K562_common_result_t <- as.data.frame(t(IM_K562_common_intersect_matrix)) attach(IM_K562_common_result_t) IM_K562_common_result_order <- as.data.frame(IM_K562_common_result_t[order(-MM1_HSA_K562_CEBPB),,drop = FALSE]) detach(IM_K562_common_result_t) head(IM_K562_common_result_order, n = 10) #> MM1_HSA_K562_CEBPB #> MM1_HSA_K562_CEBPB 100.000000 #> MM1_HSA_K562_CEBPD 36.666667 #> MM1_HSA_K562_CTCF 12.222222 #> MM1_HSA_K562_E4F1 10.000000 #> MM1_HSA_K562_JUND 10.000000 #> MM1_HSA_K562_FOS 8.888889 #> MM1_HSA_K562_JUN 8.888889 #> MM1_HSA_K562_CTCFL 7.777778 #> MM1_HSA_K562_ELF4 7.777778 #> MM1_HSA_K562_ATF4 6.666667 # The highest co-binding factor in K562 CEBPB common peaks is CEBPD. # Then plot MethMotif logo for the K562 CEBPB common sites intersected with CEBPD peaks intersectPeakMatrixResult(intersectPeakMatrix = intersectMatrix_common, save_MethMotif_logo = TRUE, angle_of_logo = "x", saving_MethMotif_logo_y_id = c("MM1_HSA_K562_CEBPD")) #> Start getting the results of intersectPeakMatrix ... #> ... ... You chose NOT to return intersection matrix; #> ... ... You chose NOT to return methylation profile; #> ... ... You chose to save MethMotif logo in PDF if any; #> ... ... ... You chose x-wise MethMotif logo; #> ... ... ... You chose entropy logo; #> ... ... ... You chose to show all methylation levels; #> Success: a PDF named 'MM1_HSA_K562_CEBPB_overlapped_with_MM1_HSA_K562_CEBPD-logo-entropy.pdf' #> has been saved! # return DNA methylation matrix table for K562 CEBPB shared peaks IM_K562_common_methylation <- IM_K562_common_result$methylation_profile_matrix dim(IM_K562_common_methylation) #> [1] 1 131 # 1 row represents the 1 input peak x, 131 columns represent the 131 peak y rownames(IM_K562_common_methylation) #> [1] "MM1_HSA_K562_CEBPB" colnames(IM_K562_common_methylation)[1:5] #> [1] "MM1_HSA_K562_AFF1" "MM1_HSA_K562_ARID2" "MM1_HSA_K562_ARID3A" #> [4] "MM1_HSA_K562_ATF1" "MM1_HSA_K562_ATF2" # the methylation level in the intersected regions between CEBPB and CEBPBD in K562 # each cell in the matrix is a list() IM_K562_common_methylation["MM1_HSA_K562_CEBPB","MM1_HSA_K562_CEBPD"][[1]] #> CpG_num #> 0-10% 64 #64 CpG methylation scores are less than 0.1 (homogeneously unmethylated) in +/-100bp window around intersected peak summits #> 10-20% 8 #> 20-30% 7 #> 30-40% 2 #> 40-50% 1 #> 50-60% 1 #> 60-70% 8 #> 70-80% 3 #> 80-90% 10 #> 90-100% 16 #16 CpG methylation scores are more than 0.9 (homogeneously methylated) in +/-100bp window around intersected peak summits # return read enrichment matrix table for K562 CEBPB shared peaks IM_K562_common_read_enrichment <- IM_K562_common_result$tag_density_matrix IM_K562_common_read_enrichment[,seq(1,3,1)] #> MM1_HSA_K562_AFF1 MM1_HSA_K562_ARID2 MM1_HSA_K562_ARID3A #> MM1_HSA_K562_CEBPB 54.27878 34.56091 17.68113 # the read ernichment median value in the K562 CEBPB peaks overlapped with AFF1 is 54.27878 # Simply using cofactorReport(), cofactors for each peak x will be automatically reported cofactorReport(intersectPeakMatrix = intersectMatrix_common) #> Start cofactorReport ... #> ... The maximum number of cofactors to be reported is 10 #> ... The minimum percent of co-binding peaks for a cofactor is 5% #> ... Each peak set derived from TFregulomeR compendium in 'peak list x' will be reported in an individual PDF file #> ... ... Start reporting peak id 'MM1_HSA_K562_CEBPB' ... #> ... ... ... The number of cofactors passing 'cobinding_threshold' for peak id 'MM1_HSA_K562_CEBPB' is 15. Only top 10 will be selected. #> ... ... ... Cofactor report for id 'MM1_HSA_K562_CEBPB' has been saved as MM1_HSA_K562_CEBPB_cofactor_report.pdf # get the intersection matrix for K562 exclusive peaks IM_K562_exclusive_result <- intersectPeakMatrixResult(intersectPeakMatrix = intersectMatrix_exclusive, return_intersection_matrix = TRUE, angle_of_matrix = "x", return_methylation_profile = TRUE, angle_of_methylation_profile = "x", return_tag_density = TRUE, angle_of_tag_density = "x", tag_density_value = "median") #> Start getting the results of intersectPeakMatrix ... #> ... ... You chose to return intersection matrix; #> ... ... ... You chose x-wise intersection matrix; #> ... ... You chose to return tag density; #> ... ... ... the tag density value you chose to return is median #> ... ... ... You chose to return tag density for peak list x; #> ... ... You chose to return methylation profile; #> ... ... ... You chose to return methylation profile for peak list x; #> ... ... You chose NOT to save MethMotif logo in PDF if any; names(IM_K562_exclusive_result) #> [1] "intersection_matrix" "tag_density_matrix" "methylation_profile_matrix" # return intersection matrix table for K562 CEBPB exclusive peaks IM_K562_exclusive_intersect_matrix <- IM_K562_exclusive_result$intersection_matrix # e.g. 0.4160888%, 0% and 0.554785% of K562 exclusive peaks overlapped with MM1_HSA_K562_AFF1, MM1_HSA_K562_ARID2 and MM1_HSA_K562_ARID3A respectively. IM_K562_exclusive_intersect_matrix[1,1:3] #> MM1_HSA_K562_AFF1 MM1_HSA_K562_ARID2 MM1_HSA_K562_ARID3A #> MM1_HSA_K562_CEBPB 0.4160888 0 0.554785 # find the top 10 TFs co-binding in K562 exclusive results IM_K562_exclusive_result_t <- as.data.frame(t(IM_K562_exclusive_intersect_matrix)) attach(IM_K562_exclusive_result_t) IM_K562_exclusive_result_order <- as.data.frame(IM_K562_exclusive_result_t[order(-MM1_HSA_K562_CEBPB),,drop = FALSE]) detach(IM_K562_exclusive_result_t) head(IM_K562_exclusive_result_order, n = 10) #> MM1_HSA_K562_CEBPB #> MM1_HSA_K562_CEBPB 100.000000 #> MM1_HSA_K562_ATF4 35.367545 #> MM1_HSA_K562_TCF12 17.337032 #> MM1_HSA_K562_CBFA2T3 16.643551 #> MM1_HSA_K562_TAL1 16.366158 #> MM1_HSA_K562_GATA1 11.650485 #> MM1_HSA_K562_FOXM1 9.431345 #> MM1_HSA_K562_ATF7 8.876560 #> MM1_HSA_K562_NFE2 8.599168 #> MM1_HSA_K562_MAFG 7.073509 # The highest co-binding factor in K562 CEBPB exclusive peaks is ATF4. # Then plot MethMotif logo for the K562 CEBPB exclusive sites intersected with ATF4 peaks intersectPeakMatrixResult(intersectPeakMatrix = intersectMatrix_exclusive, save_MethMotif_logo = TRUE, angle_of_logo = "x", saving_MethMotif_logo_y_id = c("MM1_HSA_K562_ATF4")) #> Start getting the results of intersectPeakMatrix ... #> ... ... You chose NOT to return intersection matrix; #> ... ... You chose NOT to return methylation profile; #> ... ... You chose to save MethMotif logo in PDF if any; #> ... ... ... You chose x-wise MethMotif logo; #> ... ... ... You chose entropy logo; #> ... ... ... You chose to show all methylation levels; #> Success: a PDF named 'MM1_HSA_K562_CEBPB_overlapped_with_MM1_HSA_K562_ATF4-logo-entropy.pdf' #> has been saved! # the methylation levels in the intersected regions between CEBPB and ATF4 IM_K562_exclusive_methylation <- IM_K562_exclusive_result$methylation_profile_matrix IM_K562_exclusive_methylation["MM1_HSA_K562_CEBPB","MM1_HSA_K562_ATF4"][[1]] #> CpG_num #> 0-10% 303 #> 10-20% 44 #> 20-30% 22 #> 30-40% 19 #> 40-50% 12 #> 50-60% 11 #> 60-70% 8 #> 70-80% 14 #> 80-90% 12 #> 90-100% 8 # Simply using cofactorReport(), cofactors for each peak x will be automatically reported cofactorReport(intersectPeakMatrix = intersectMatrix_exclusive) #> Start cofactorReport ... #> ... The maximum number of cofactors to be reported is 10 #> ... The minimum percent of co-binding peaks for a cofactor is 5% #> ... Each peak set derived from TFregulomeR compendium in 'peak list x' will be reported in an individual PDF file #> ... ... Start reporting peak id 'MM1_HSA_K562_CEBPB' ... #> ... ... ... The number of cofactors passing 'cobinding_threshold' for peak id 'MM1_HSA_K562_CEBPB' is 13. Only top 10 will be selected. #> ... ... ... Cofactor report for id 'MM1_HSA_K562_CEBPB' has been saved as MM1_HSA_K562_CEBPB_cofactor_report.pdf
knitr::include_graphics("../inst/extdata/MM1_HSA_K562_CEBPB_overlapped_with_MM1_HSA_K562_CEBPD-logo-entropy.png")
knitr::include_graphics("../inst/extdata/MM1_HSA_K562_CEBPB_overlapped_with_MM1_HSA_K562_ATF4-logo-entropy.png")
knitr::include_graphics("../inst/extdata/cofactor_report.png")
As aforementioned, exportMMPFM
is not only designed for searchMotif
outputs, but also compatible with the outputs from commonPeaks
, exclusivePeaks
, intersectPeakMatrix
. Here, we want to export the motif PFMs and beta score matrices for K562 CEBPB common and exclusive peaks, as well as the common peaks intersected with CEBPD peaks and the exclusive peaks intersected with ATF4 peaks.
# export motif PFM and beta score matrix for K562 CEBPB common peaks exportMMPFM(fun_output = commonPeak_output, fun = "commonPeaks", save_motif_PFM = TRUE, save_betaScore_matrix = TRUE) #> Start exporting ... ... #> ... ... You chose to save motif PFM and beta score matrix. #> ... ... export commonPeaks... #> ... ... ... export id = MM1_HSA_K562_CEBPB_common_peaks #> ... ... ... ... Beta score matrix has been saved as 'MM1_HSA_K562_CEBPB_common_peaks-methScore.txt'. #> ... ... ... ... Motif PFM has been saved as 'MM1_HSA_K562_CEBPB_common_peaks-motif-MEME.txt'. #> ... ... ... export id = HCT116_CEBPB_common_peaks #> ... ... ... the original peaks of HCT116_CEBPB_common_peaks is not loaded from TFregulomeR, #> or in the common peak the number of TFBS is zero. Hence no further action for this id! # export motif PFM and beta score matrix for K562 CEBPB exclusive peaks exportMMPFM(fun_output = exclusivePeak_output, fun = "exclusivePeaks", save_motif_PFM = TRUE, save_betaScore_matrix = TRUE) #> Start exporting ... ... #> ... ... You chose to save motif PFM and beta score matrix. #> ... ... export exclusivePeaks... #> ... ... ... export id = MM1_HSA_K562_CEBPB_exclusive_peaks #> ... ... ... ... Beta score matrix has been saved as 'MM1_HSA_K562_CEBPB_exclusive_peaks-methScore.txt'. #> ... ... ... ... Motif PFM has been saved as 'MM1_HSA_K562_CEBPB_exclusive_peaks-motif-MEME.txt'. # export motif PFM and beta score matrix for K562 CEBPB common peaks intersected with K562 CEBPD peaks exportMMPFM(fun_output = intersectMatrix_common, fun = "intersectPeakMatrix", save_motif_PFM = TRUE, save_betaScore_matrix = TRUE, angle_of_matrix_for_intersectPeakMatrix = "x", saving_id_y_for_intersectPeakMatrix = c("MM1_HSA_K562_CEBPD")) #> Start exporting ... ... #> ... ... You chose to save motif PFM and beta score matrix. #> ... ... export intersectPeakMatrix... #> ... ... we will export in the x wide of intersectPeakMatrix output since the input angle_of_matrix_for_intersectPeakMatrix = 'x' #> ... ... ... export id = MM1_HSA_K562_CEBPB #> ... ... ... ... Beta score matrix has been saved as 'MM1_HSA_K562_CEBPB_overlapped_with_MM1_HSA_K562_CEBPD-methScore.txt'. #> ... ... ... ... Motif PFM has been saved as 'MM1_HSA_K562_CEBPB_overlapped_with_MM1_HSA_K562_CEBPD-motif-MEME.txt'. # export motif PFM and beta score matrix for K562 CEBPB exclusive peaks intersected with K562 ATF4 peaks exportMMPFM(fun_output = intersectMatrix_exclusive, fun = "intersectPeakMatrix", save_motif_PFM = TRUE, save_betaScore_matrix = TRUE, angle_of_matrix_for_intersectPeakMatrix = "x", saving_id_y_for_intersectPeakMatrix = c("MM1_HSA_K562_ATF4")) #> Start exporting ... ... #> ... ... You chose to save motif PFM and beta score matrix. #> ... ... export intersectPeakMatrix... #> ... ... we will export in the x wide of intersectPeakMatrix output since the input angle_of_matrix_for_intersectPeakMatrix = 'x' #> ... ... ... export id = MM1_HSA_K562_CEBPB #> ... ... ... ... Beta score matrix has been saved as 'MM1_HSA_K562_CEBPB_overlapped_with_MM1_HSA_K562_ATF4-methScore.txt'. #> ... ... ... ... Motif PFM has been saved as 'MM1_HSA_K562_CEBPB_overlapped_with_MM1_HSA_K562_ATF4-motif-MEME.txt'.
TFregulomeR also allows users to plot the distributions of a given TFBS from the TFregulomeR warehouse in a list of peak sets, using motifDistrib
and plotDistrib
sequentially. By providing the TFregulomeR ID in id
as the input of the motifDistrib
, motifDistrib
will calculate the occurrences the TFBSs in the given list of peak sets input in peak_list
. The unique IDs corresponding to peak_list
is required to be provided at the same time using peak_id
. If a peak set is derived from TFregulomeR compendium, the TFregulomeR ID should be provided correspondingly; if it is self provided, you can name it with a unique ID yourself.
It should be noted that even though the loaded peak regions from the TFregulomeR compendium are the peak summits, you don't need to expand the regions. Once you tell motifDistrib
the peak set is a TFregulomeR TF subset by providing TFregulomeR ID in the peak_id
, it will automatically operate on the peaks itself.
The output of motifDistrib
is the input of plotDistrib
. In each motif distribution plot, x-axis is the relative distance (bp) to the peak center, while y-axis is the percentage of the TFBS at the position.
Here, we show the distributions of K562 CEBPB motif in K562 CEBPB exclusive peaks and our own peaks locally loaded previously.
# loading my peaks my_peak_path <- system.file("extdata", "HCT116_CEBPb_binding_sites.txt", package = "TFregulomeR") my_peak <- read.delim(my_peak_path, sep = "\t", header = FALSE) motifDistrib_output <- motifDistrib(id = "MM1_HSA_K562_CEBPB", peak_list = list(K562_CEBPB_exclusive_peak, my_peak), peak_id = c("MM1_HSA_K562_CEBPB","my_peak")) #> motifDistrib starts analysing for MethMotif ID = MM1_HSA_K562_CEBPB #> ... ... analysing peak set MM1_HSA_K562_CEBPB #> ... ... analysing peak set my_peak plotDistrib(motifDistrib = motifDistrib_output) #> Distribution of motif MM1_HSA_K562_CEBPB in peak set MM1_HSA_K562_CEBPB has been saved! #> Distribution of motif MM1_HSA_K562_CEBPB in peak set my_peak has been saved!
knitr::include_graphics("../inst/extdata/motif_distribution.png")
TFregulomeR is able to annotate TFBS genomic locations using genomeAnnotate
. The annotation process is following the order: promoter, TTS, exon, 5' UTR exon, 3' UTR exon, intron and intergenic region. Specifically, promoter is defined as the range from 1000bp upstream of TSS to 100bp downstream of TSS, and TTS is defined as the range from 100bp upstream of TTS to 1000bp downstream of TTS. Users can change the parameters using promoter_range
and TTS_range
respectively. The annotation output of genomeAnnotate
is intuitive, not only will a data.frame containing annotation results be returned, but also an HTML report will be saved.
# annotate the locations of K562 CEBPB exclusive peaks # loading UCSC knownGene library(TxDb.Hsapiens.UCSC.hg38.knownGene) K562_CEBPB_exclusivePeak_loc <- genomeAnnotate(peaks = K562_CEBPB_exclusive_peak, return_annotation = TRUE, return_html_report = TRUE) #> Start genomeAnnotate ... #> ... ... You chose to return annotated results in a data.frame. #> ... ... You chose to return an HTML report. #> ... ... annotating promoters defined as upstream 1000bp and downstream 100bp #> ... ... annotating TTS defined as upstream 100bp and downstream 1000bp #> ... ... annotating exons #> ... ... annotating 5' UTR #> ... ... annotating 3' UTR #> ... ... annotating introns #> ... ... annotating intergenic regions #> ... ... An html report has been generated as 'genomeAnnotate_result.html'! #> ... ... The annotation results have been returned in a data.frame! head(K562_CEBPB_exclusivePeak_loc) #> chr start end id annotation geneName #> 1 chr3 133629333 133629333 genomeAnnotate_peak_68 promoter-TSS TOPBP1 #> 2 chr1 155858041 155858041 genomeAnnotate_peak_255 promoter-TSS GON4L #> 3 chr3 193662882 193662882 genomeAnnotate_peak_280 promoter-TSS OPA1 #> 4 chr4 72038972 72038972 genomeAnnotate_peak_426 promoter-TSS NPFFR2 #> 5 chr4 74448009 74448009 genomeAnnotate_peak_447 promoter-TSS AREG #> 6 chr4 143905637 143905637 genomeAnnotate_peak_635 promoter-TSS GYPE #> transcript #> 1 ENST00000513818.1;ENST00000506779.1 #> 2 ENST00000368331.5;ENST00000271883.9;ENST00000620426.4;ENST00000361040.9;ENST00000471341.5;ENST00000622608.1 #> 3 ENST00000445863.1 #> 4 ENST00000395999.5;ENST00000358749.3 #> 5 ENST00000511560.1 #> 6 ENST00000358615.8;ENST00000506264.5;ENST00000437468.2 #> distanceToTSS #> 1 941;602 #> 2 745;745;745;836;823;745 #> 3 8 #> 4 205;401 #> 5 629 #> 6 77;117;73
knitr::include_graphics("../inst/extdata/genomeAnnotate.png")
The key function of transcription factors is to regulate gene expression. By working with Genomic Regions Enrichment of Annotations Tool (GREAT), TFregulomeR allows users to annotate the functions of TFBSs using greatAnnotate
. Given that GREAT server doesn't support hg38, liftOver R package has been incorporated in TFregulomeR to convert hg38 to hg19. The annotation output of greatAnnotate
is intuitive, not only will a data.frame containing annotation results be returned, but also an HTML report will be saved. The HTML report takes advantage of rbokeh
package, which presents a vivid and dynamic interface (Figure 13).
# annotate the functions of K562 CEBPB exclusive peaks # loading GREAT R package 'rGREAT' library(rGREAT) # the peak assembly is "hg38", and 'liftOver' is needed for conversion library(liftOver) # 'rbokeh' is required for an HTML report generation library(rbokeh) K562_CEBPB_exclusivePeak_func <- greatAnnotate(peaks = K562_CEBPB_exclusive_peak, return_annotation = TRUE, return_html_report = TRUE) #> Start greatAnnotate ... #> ... ... You chose to return annotated results in a data.frame. #> ... ... You chose to return an HTML report. #> ... ... assembly is hg38. Now converting to hg19 using liftOver... #> ... ... number of the original input regions is 721 #> ... ... number of the regions successfully converted to hg19 is 721 #> ... ... start GREAT analysis #> ... ... An html report has been generated as 'greatAnnotate_result.html'! #> ... ... The annotation results have been returned in a data.frame! head(K562_CEBPB_exclusivePeak_func) #> category ID name #> 1 MF GO:0005488 binding #> 2 MF GO:0005515 protein binding #> 3 BP GO:0065007 biological regulation #> 4 BP GO:0050794 regulation of cellular process #> 5 BP GO:0019222 regulation of metabolic process #> 6 BP GO:0050789 regulation of biological process #> number_of_targeting_genes adjusted_pvalue #> 1 817 0.0001415278 #> 2 497 0.0007221064 #> 3 653 0.0059313854 #> 4 602 0.0059313854 #> 5 386 0.0072758686 #> 6 628 0.0072758686
knitr::include_graphics("../inst/extdata/greatAnnotate.png")
TFregulomeR is not working alone. We have built a function allowing conversion of the motif matrix in TFregulomeR warehouse to the subclass PFMatrix
in TFBSTools, using toTFBSTools
.
library(TFBSTools) K562_CEBPB_TFBS_PFM <- toTFBSTools(id = "MM1_HSA_K562_CEBPB") K562_CEBPB_TFBS_PFM #> An object of class PFMatrix #> ID: MM1_HSA_K562_CEBPB #> Name: CEBPB #> Matrix Class: Unknown #> strand: * #> Tags: #> list() #> Background: #> A C G T #> 0.2717 0.2283 0.2283 0.2717 #> Matrix: #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] #> A 93 305 0 0 49 0 280 91 290 533 15 #> C 61 74 0 0 0 533 59 205 242 0 215 #> G 192 141 0 0 390 0 139 3 0 0 73 #> T 187 13 533 533 94 0 55 234 1 0 230
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.