find_tss: Predict TSSs of miRNA



Description

Searches for putative TSSs of miRNAs by integrating available data such as H3K4me3 data, Pol II data, miRNA expression data, and protein-coding gene data. It can also report the transcriptional regulation relationships between TFs and miRNAs.

Usage

find_tss(
  bed_merged,
  expressed_mir = "all",
  flanking_num = 1000,
  threshold = 0.7,
  ignore_DHS_check = TRUE,
  DHS,
  allmirdhs_byforce = TRUE,
  expressed_gene = "all",
  allmirgene_byforce = TRUE,
  seek_tf = FALSE,
  tf_n = 1000,
  min.score = 0.8
)

Arguments

bed_merged

ChIP-seq peaks to be analyzed. These can be H3K4me3 peaks, Pol II peaks, or both. If only one kind of peak data is used, the peaks should be merged (see also peak_merge) before calling find_tss. If both H3K4me3 and Pol II data are provided, the peaks should first be merged and then joined together (see also peak_join).

expressed_mir

A list of miRNAs whose TSSs should be searched for (e.g., the miRNAs expressed in a certain cell line). If expressed_mir is not specified, the default value "all" is used and the function employs all the miRNAs currently listed in the miRBase database.

flanking_num

A parameter of the Eponine model used to detect TSSs. A peak signal whose flanking regions are enriched in C-G content is an important marker of TSSs. The default value is 1000.

threshold

The score threshold applied to candidate TSSs scored with the Eponine method. The default value is 0.7.

ignore_DHS_check

The DHS check further filters putative TSSs. A putative TSS is retained if a DHS peak lies within 1 kb upstream of it, since this is consistent with the character of an authentic TSS; putative TSSs with no DHS within 1 kb upstream are discarded. The default value is TRUE, which, as the parameter name indicates, skips this check.

DHS

Peak data for DNase I hypersensitive sites (DHSs), used by the DHS check.

allmirdhs_byforce

When DHS data are used to check the validity of TSSs, it is possible that no DHS lies within 1 kb upstream of any putative TSS, so that all putative TSSs are filtered out and nothing is output. Setting allmirdhs_byforce = TRUE ensures that at least one TSS, the most likely one, is output even if the nearest DHS signal lies more than 1 kb upstream of it.

expressed_gene

The genes expressed in the cell line being analyzed. The default value is "all", which means that all the expressed genes annotated in Ensembl will be employed.

allmirgene_byforce

When expressed_gene data are integrated to improve the prediction, all putative TSSs of a miRNA may be discarded. Setting allmirgene_byforce = TRUE prevents this by ensuring that at least one putative TSS is output for each miRNA.

seek_tf

Based on the predicted TSSs, seek_tf gives users the option to predict the TFs related to each miRNA. The transcription factor data are taken from the JASPAR2018 database. The default value is FALSE.

tf_n

TFBSs are searched for in the region upstream of a predicted TSS, which is treated as the promoter region. tf_n sets the length of this promoter region for predicting the transcriptional regulation between miRNAs and TFs. The default value is 1000.

min.score

The threshold for scoring transcription factor binding sites, given as a single absolute value between 0 and 1. The default value is 0.8.
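The role of tf_n can be pictured with a small base R sketch. This is only an illustration under stated assumptions, not the package's implementation: the helper name promoter_window is hypothetical, the coordinates are made up, and it assumes that "upstream" follows the strand (smaller coordinates on "+", larger coordinates on "-") and that the window ends at the TSS itself.

```r
# Hypothetical helper, not part of primirTSS: returns the assumed promoter
# window of length tf_n located upstream of each predicted TSS.
promoter_window <- function(tss, strand, tf_n = 1000) {
  plus <- strand == "+"
  data.frame(
    # On the plus strand, upstream means smaller coordinates.
    start = ifelse(plus, tss - tf_n, tss),
    # On the minus strand, upstream means larger coordinates.
    end   = ifelse(plus, tss, tss + tf_n)
  )
}

# Toy coordinates, made up for illustration only.
win <- promoter_window(tss = c(10000, 50000), strand = c("+", "-"), tf_n = 1000)
print(win)
```

A larger tf_n widens the region scanned for binding sites, which may report more candidate TFs at the cost of more weakly supported ones.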

Value

The first part of the result returns details of the predicted TSSs, composed of ten columns: mir_name, chrom, stem_loop_p1, stem_loop_p2, strand, mir_context, tss_type, gene, predicted_tss and pri_tss_distance:

mir_name: Name of miRNA.

chrom: Chromosome.

stem_loop_p1: The start site of a stem-loop.

stem_loop_p2: The end site of a stem-loop.

strand: Polynucleotide strands. (+/-)

mir_context: The relative position of the stem-loop with respect to protein-coding genes. (intra/inter)

tss_type: Four types of predicted TSSs. See the TSS types section below for details. (host_TSS/intra_TSS/overlap_inter_TSS/inter_inter_TSS)

gene: Ensembl gene ID

predicted_tss: Predicted transcription start site (TSS).

pri_tss_distance: The distance between a predicted TSS and the start site of the stem-loop.
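To make the column layout concrete, here is a toy table built in base R. Every value is invented for illustration (the miRNA names and gene IDs are placeholders, not real records or real find_tss output), and computing pri_tss_distance as an absolute difference from stem_loop_p1 is an assumption; the package may use a different sign or strand convention.

```r
# Toy table with the documented columns; all values are placeholders.
tss_table <- data.frame(
  mir_name      = c("mir-example-1", "mir-example-2"),
  chrom         = c("chr1", "chr2"),
  stem_loop_p1  = c(10000, 80000),
  stem_loop_p2  = c(10080, 80090),
  strand        = c("+", "-"),
  mir_context   = c("intra", "intra"),
  tss_type      = c("host_TSS", "intra_TSS"),
  gene          = c("gene-placeholder-1", "gene-placeholder-2"),
  predicted_tss = c(7500, 83000),
  stringsAsFactors = FALSE
)

# Assumed definition: absolute distance between the predicted TSS and the
# start site of the stem-loop.
tss_table$pri_tss_distance <-
  abs(tss_table$predicted_tss - tss_table$stem_loop_p1)

# Typical follow-up: keep only the TSSs of one type.
host_only <- subset(tss_table, tss_type == "host_TSS")
print(host_only)
```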

TSS types

TSSs are classified into four types, as follows.

host_TSS The TSS of a miRNA that is close to the TSS of the protein-coding gene, implying that they may share the same TSS, when mir_context is "intra". (See above: Value-mir_context)

intra_TSS The TSS of a miRNA that is NOT close to the TSS of the protein-coding gene, when mir_context is "intra".

overlap_inter_TSS The TSS of a miRNA is classified as "overlap_inter_TSS" when the pri-miRNA gene overlaps with an Ensembl gene and mir_context is "inter".

inter_inter_TSS The TSS of a miRNA is classified as "inter_inter_TSS" when the miRNA gene does NOT overlap with an Ensembl gene and mir_context is "inter".

(See Xu Hua et al., 2016 for more details.)
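The four types amount to a two-level decision, which the following base R sketch spells out. The helper classify_tss and its two logical inputs are hypothetical: how "close to the TSS of the protein-coding gene" and "overlaps with an Ensembl gene" are actually determined is not described here, so both are simply passed in as flags.

```r
# Hypothetical helper, not part of primirTSS: maps the documented conditions
# to the four TSS type labels.
classify_tss <- function(mir_context, near_gene_tss, overlaps_gene) {
  if (mir_context == "intra") {
    # Intragenic miRNA: does it appear to share the host gene's TSS?
    if (near_gene_tss) "host_TSS" else "intra_TSS"
  } else {
    # Intergenic miRNA: does the pri-miRNA gene overlap an Ensembl gene?
    if (overlaps_gene) "overlap_inter_TSS" else "inter_inter_TSS"
  }
}

types <- c(
  classify_tss("intra", near_gene_tss = TRUE,  overlaps_gene = FALSE),
  classify_tss("intra", near_gene_tss = FALSE, overlaps_gene = FALSE),
  classify_tss("inter", near_gene_tss = FALSE, overlaps_gene = TRUE),
  classify_tss("inter", near_gene_tss = FALSE, overlaps_gene = FALSE)
)
print(types)
```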

Log

The second part of the result returns logs recorded during the prediction process:

find_nearest_peak_log If no peak lies upstream of a stem-loop to help determine the putative TSSs of a miRNA, the nearest peak cannot be found and the miRNA is logged in find_nearest_peak_log.

eponine_score_log For a given miRNA, if none of the candidate TSSs scored with the Eponine method meets the threshold, no Eponine score is obtained and the miRNA is logged in eponine_score_log.

DHS_check_log For a given miRNA, if no DHS signal lies within 1 kb upstream of any of its putative TSSs, these putative TSSs are filtered out and the miRNA is logged in DHS_check_log.

gene_filter_log For a given miRNA, when expressed_gene data are integrated to improve the prediction, if no putative TSS is confirmed after considering the relative positions of the TSSs, stem-loops and expressed genes, the miRNA is filtered out and logged in gene_filter_log.
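The DHS check and its by-force fallback can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: dhs_check is a hypothetical helper, the coordinates are made up, "upstream" is taken to follow the strand, and the "most likely" TSS under the by-force option is assumed to be the one whose nearest upstream DHS is closest.

```r
# Hypothetical helper, not part of primirTSS. For one miRNA, returns the
# indices of the candidate TSSs that are kept and a flag telling whether no
# candidate had a DHS within `window` bp upstream (the situation that
# DHS_check_log records).
dhs_check <- function(tss, strand, dhs_start, dhs_end,
                      window = 1000, by_force = TRUE) {
  # Distance from each TSS to the nearest DHS peak lying upstream of it:
  # 0 if a peak spans the TSS, Inf if there is no upstream peak at all.
  upstream_dist <- vapply(tss, function(pos) {
    if (strand == "+") {
      d <- pos - dhs_end[dhs_start <= pos]
    } else {
      d <- dhs_start[dhs_end >= pos] - pos
    }
    if (length(d) == 0) Inf else max(min(d), 0)
  }, numeric(1))

  keep <- which(upstream_dist <= window)
  none_within_window <- length(keep) == 0
  if (none_within_window && by_force && any(is.finite(upstream_dist))) {
    # Assumed fallback: keep the single candidate with the closest DHS.
    keep <- which.min(upstream_dist)
  }
  list(keep = keep, none_within_window = none_within_window)
}

# Toy data: the only DHS peak is several kb upstream of both candidates.
forced   <- dhs_check(c(9000, 5000), "+", dhs_start = 1000, dhs_end = 1200)
unforced <- dhs_check(c(9000, 5000), "+", dhs_start = 1000, dhs_end = 1200,
                      by_force = FALSE)
print(forced$keep)
print(unforced$keep)
```

With the fallback enabled, the second candidate is kept because its nearest upstream DHS is closer; with it disabled, nothing is kept for this miRNA.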

Reference

Xu Hua, Luxiao Chen, Jin Wang*, Jie Li* and Edgar Wingender*, Identifying cell-specific microRNA transcriptional start sites. Bioinformatics 2016, 32(16), 2403-10.

Examples

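library(primirTSS)
library(GenomicRanges)  # provides the data.frame -> GRanges coercion

# Merged ChIP-seq peaks as a data frame of genomic intervals
bed_merged <- data.frame(
                chrom = c("chr1", "chr1", "chr1", "chr1", "chr2"),
                start = c(9910686, 9942202, 9996940, 10032962, 9830615),
                end = c(9911113, 9944469, 9998065, 10035458, 9917994),
                stringsAsFactors = FALSE)
# Convert the data frame to a GRanges object before calling find_tss
bed_merged <- as(bed_merged, "GRanges")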

## Not run: 
ownmiRNA <- find_tss(bed_merged, expressed_mir = "hsa-mir-5697",
                     ignore_DHS_check = TRUE,
                     expressed_gene = "all",
                     allmirgene_byforce = TRUE)

## End(Not run)

ipumin/primirTSS documentation built on June 10, 2020, 9:52 a.m.