findBumps: Bump-finding from transcriptome bins.
In haowulab/TRESS: Toolbox for mRNA epigenetics sequencing analysis

findBumps

R Documentation

Bump-finding from transcriptome bins.

Description

This function constructs transcriptome m6A bumps for each input \& IP replicate, by merging together bins having significant enrichment of IP over input control reads.

Usage

findBumps(chr, pos, strand, x, count,
          use = "pval",
          pval.cutoff,
          fdr.cutoff,
          lfc.cutoff,
          sep = 2000,
          minlen = 100,
          minCount = 3,
          dis.merge = 100,
          scorefun = mean,
          sort = TRUE)

Arguments

`chr`	Chromosome number of all bins.
`pos`	Transcriptome start position of all bins.
`strand`	Strand of all bins.
`x`	A dataframe containing the p-values, fdrs and log fold changes of all bins.
`count`	Read counts in each bin from paired input and IP sample.
`use`	A character to specify which criterion to select significant bins. It takes among "pval", "fdr", "lfc", "pval_lfc" and "fdr_lfc". "pval": The selection is only based on P-values; "fdr": The selection is only based on FDR; "lfc": The selection is only based on log fold changes between normalized IP and normalized input read counts; "pval_lfc": The selection is based on both p-values and log fold changes; "fdr_lfc": The selection is based on both FDR and log fold changes. Default is "pval".
`pval.cutoff`	A numerical value to specify a cutoff for p-value. Default is 1e-5.
`fdr.cutoff`	A numerical value to specify a cutoff for fdr. Default is 0.05.
`lfc.cutoff`	A numerical value to specify a cutoff for log fold change between normalized IP and input read counts. Default is 0.7 for fold change of 2.
`sep`	A constant used divide genome into consecutive sequenced regions. Any two bins with distance greater than sep will be grouped into different regions. Default is 2000.
`minlen`	A constant to select bumps who have minimum length of minlen. Default is 100.
`minCount`	A constant to select bumps who have at least minlen number of bins. Default is 3.
`dis.merge`	A constant. Any twp bumps with distance smaller than dis.merge would be merged. Default is 100.
`scorefun`	A character indicating a function used to assign a score for each bump base on p-values of all spanned bins. Default is "mean", meaning that the score is an average of bin-level p-values.
`sort`	A logical value indicating whether rank (TRUE) bumps with the score output from scorefun or not (FALSE). Default is TRUE.

Value

This function returns a dataframe containing the chromosome, start position, end position, length, strand, summit, total read counts (both IP and input) and score of each bump.

Examples

### Use example dataset "Basal" in TRESS
### to illustrate usage of this function
data("Basal")
bins = Basal$Bins$Bins
Counts = Basal$Bins$Counts
sf = Basal$Bins$sf
colnames(Counts)
dat = Counts[, 1:2]
thissf = sf[1:2]
### pvals based on binomial test
idx = rowSums(dat) > 0
Pvals = rep(1, nrow(dat))
Pvals[idx] = 1 - pbinom(dat[idx, 2],
                        rowSums(dat[idx, ]),
                        prob = 0.5)
### lfc
c0 = mean(as.matrix(dat), na.rm = TRUE)  ### pseudocount
lfc = log((dat[, 2]/thissf[2] + c0)/(dat[, 1]/thissf[1] + c0))
x.vals = data.frame(pvals = Pvals,
                    fdr = p.adjust(Pvals, method = "fdr"),
                    lfc = lfc)
### find bumps based on pvals, fdr or lfc
Bumps = findBumps(chr = bins$chr,
                  pos = bins$start,
                  strand = bins$strand,
                  x = x.vals,
                  use = "fdr_lfc",
                  fdr.cutoff = 0.01,
                  lfc.cutoff = 0.5,
                  count = dat)
head(Bumps, 3)

haowulab/TRESS documentation built on Aug. 27, 2022, 7:11 p.m.