diffPatternTest: Differential pattern analysis of Ribo-seq data


diffPatternTest R Documentation

Differential pattern analysis of Ribo-seq data


The normalized gene data are pooled into a large matrix, on which parameter estimation and testing are performed. Within each gene, multiplicity correction is then performed on the codon/bin-level p-values. The minimum adjusted codon/bin-level p-value is defined as the gene-level p-value.


diffPatternTest(data, classlabel, method = c('gtxr', 'qvalue'))



A named list of matrices, as returned by the dataBinning function. In each element of the list, rows correspond to replicates and columns correspond to bins.


For matrix input: a DataFrame or data.frame with at least a column named comparison. In comparison, 1s stand for the reference condition, 2s stand for the target condition, and 0s, if present, mark replicates that are not involved in the test. Rows of classlabel correspond to rows of data, which are biological replicates.


For a 2-component character vector input: the first element is the multiplicity correction method for codon/bin-level p-value adjustment. The second element is the multiplicity correction method for gene-level p-value adjustment. Methods include: "qvalue" for the q-value from the qvalue package, and "gtxr", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none" from the elitism package.


Using binned data, this function first estimates a normalizing constant by excluding outlier bins, which may represent the true differential pattern. An outlier bin is defined as one whose log2 fold change is more than 1.5 interquartile ranges below the first quartile or above the third quartile. For a given gene, the normalizing constant is defined based on the total read counts from each replicate.
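The outlier rule above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the log2 fold changes and counts are made-up values, the names log2fc, counts, and norm_const are hypothetical, and scaling the per-replicate totals by their mean is an assumed choice, since the text only says the constant is based on total read counts.

```r
# Hypothetical log2 fold changes for the bins of one gene (illustrative values).
log2fc <- c(-0.2, 0.1, 0.0, 0.3, -0.1, 4.0, 0.2, -0.3)

# First and third quartiles and the interquartile range.
q <- quantile(log2fc, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Flag bins more than 1.5 IQRs below the first quartile or above the third.
is_outlier <- log2fc < q[1] - 1.5 * iqr | log2fc > q[2] + 1.5 * iqr

# Hypothetical counts: rows are replicates, columns are bins.
counts <- matrix(c(10, 12, 11,  9, 10, 80, 12,  9,
                   20, 25, 21, 19, 22, 30, 24, 18),
                 nrow = 2, byrow = TRUE)

# Assumption: total reads per replicate over non-outlier bins,
# divided by their mean so that the constants average to 1.
totals <- rowSums(counts[, !is_outlier, drop = FALSE])
norm_const <- totals / mean(totals)
```

Excluding the flagged bin keeps a single strongly changed region from inflating one replicate's total and distorting the normalization of the remaining bins.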

It then performs differential pattern testing on P-site counts bin by bin for each gene. Briefly, counts are modeled by a negative binomial distribution to call bins with statistically significant differences across conditions. Bin-level p-values are adjusted for multiple hypothesis testing within a given gene, and the smallest adjusted p-value for each gene is then adjusted again to control for multiple hypothesis testing across all genes.
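The two-stage adjustment described above can be sketched with base R's p.adjust. The p-values below are made up, the negative binomial test itself is not shown, and "holm" and "BH" are swapped in for the default 'gtxr' and 'qvalue' only because they are available without extra packages. The names bin_pvalues, gene_pvalue, and gene_padj are hypothetical.

```r
# Hypothetical bin-level p-values for three genes (illustrative values).
bin_pvalues <- list(
  geneA = c(0.001, 0.20, 0.65, 0.90),
  geneB = c(0.04, 0.03, 0.50),
  geneC = c(0.70, 0.85)
)

# Stage 1: adjust the bin-level p-values within each gene.
bin_padj <- lapply(bin_pvalues, p.adjust, method = "holm")

# The gene-level p-value is the smallest adjusted bin-level p-value.
gene_pvalue <- vapply(bin_padj, min, numeric(1))

# Stage 2: adjust the gene-level p-values across all genes.
gene_padj <- p.adjust(gene_pvalue, method = "BH")
```

The first stage accounts for testing many bins within one gene, and the second accounts for testing many genes.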

Additionally, the T-value is a supplementary statistic that quantifies the magnitude of the difference between conditions. It is defined as 1 minus the cosine of the angle between the first right singular vectors of the footprint matrices of the two conditions under comparison. It ranges from 0 to 1, with larger values representing larger differences between conditions. In practice, it can be used to identify genes with a larger magnitude of pattern difference beyond statistical significance, which may help investigators prioritize genes for follow-up among the many that pass the significance test for differential pattern.
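A minimal sketch of the T-value definition, using base R's svd. The helper t_value and the example matrices are hypothetical and are not taken from the package. Because singular vectors are defined only up to sign, the absolute value of the cosine is used here, which is an assumption made to keep the result between 0 and 1.

```r
# Hypothetical helper: 1 minus the cosine of the angle between the first
# right singular vectors of two footprint matrices
# (rows are replicates, columns are bins).
t_value <- function(mat1, mat2) {
  v1 <- svd(mat1)$v[, 1]  # unit-length first right singular vector
  v2 <- svd(mat2)$v[, 1]
  # Sign of a singular vector is arbitrary, so take the absolute cosine;
  # max() guards against tiny negative values from rounding.
  max(0, 1 - abs(sum(v1 * v2)))
}

# Illustrative footprint matrices for two conditions.
cond1 <- rbind(c(10, 20, 30, 40),
               c(12, 18, 33, 37))
cond2 <- rbind(c(40, 30, 20, 10),
               c(37, 33, 18, 12))

t_same <- t_value(cond1, 2 * cond1)  # same shape, different depth
t_diff <- t_value(cond1, cond2)      # reversed pattern along the bins
```

Scaling a matrix does not change its singular vectors, so a pure change in abundance gives a T-value near 0, while a change in the shape of the footprint profile gives a larger value.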



A List object of codon/bin-level results. Each element of the list corresponds to a gene and contains the codon/bin-level result columns: pvalue, log2FoldChange, and the adjusted p-value, named after the first string in method. Names of bins are set to "start-end" genomic coordinates.


A DataFrame object of gene-level results. It contains the columns tvalue, pvalue, and the adjusted p-value, named after the second string in method.


The same as the input method.


Names of genes without sufficient reads, which are not reported in bin and gene.


Subset of the input data, including all genes reported in bin and gene.


The same as the input classlabel.

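# data.binned is assumed to be the list of binned matrices returned by dataBinning.
# In comparison, 1 marks the reference condition and 2 marks the target condition.
classlabel <- data.frame(condition = c("mutant", "mutant", 
    "wildtype", "wildtype"), comparison = c(2, 2, 1, 1))
rownames(classlabel) <- c("mutant1", "mutant2", "wildtype1", "wildtype2")
# Adjust bin-level p-values with 'gtxr' and gene-level p-values with 'qvalue'.
result.pst <- diffPatternTest(data = data.binned, 
    classlabel = classlabel, method = c('gtxr', 'qvalue'))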

