Calculates the reproducibility-optimized test statistic (ROTS) for ranking genes in order of evidence for differential expression in two-group or multi-group comparisons.
data | A numeric data matrix or an ExpressionSet instance, in which rows correspond to genes and columns correspond to samples.
groups | A vector indicating the sample groups.
B | An integer specifying the number of bootstrap and permutation resamplings (default 1000).
K | An integer indicating the largest top list size considered. If no value is given, 1/4 of the features are used.
paired | A logical indicating whether a paired test is performed. The samples are expected to be in the same order in both groups.
seed | An integer seed for the random number generator.
a1, a2 | Non-negative parameters. See the details section for further information.
log | A logical (default TRUE) indicating whether the input data is log2 scaled. This information is only used to calculate the log fold change.
progress | A logical indicating whether additional progress bars are shown.
verbose | A logical indicating whether messages are shown.
The reproducibility-optimization procedure ROTS enables the selection of a suitable gene ranking statistic directly from the given dataset. The statistic is optimized among a family of t-type statistics d = m/(a1+a2*s), where m is the absolute difference between the group averages, s is the pooled standard error, and a1 and a2 are the non-negative parameters to be optimized. Two special cases of this family are the ordinary t-statistic (a1=0, a2=1) and the signal log-ratio (a1=1, a2=0). The optimality is defined in terms of maximal overlap of top-ranked genes in group-preserving bootstrap datasets. Importantly, besides the group labels, no a priori information about the properties of the data is required and no fixed cutoff for the gene rankings needs to be specified. For more details about the reproducibility-optimization procedure, see Elo et al. (2008).
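The family of statistics described above can be sketched in base R. This is an illustration only, not the package's implementation: the helper name `rots_like_statistic` is hypothetical, the pooled standard error is assumed to be the usual equal-variance two-sample form, and missing values are not handled.

```r
# Hypothetical helper (not part of the ROTS package): computes
# d = m / (a1 + a2 * s) for each row of a two-group matrix.
rots_like_statistic <- function(x, groups, a1, a2) {
  g <- unique(groups)
  stopifnot(length(g) == 2, a1 >= 0, a2 >= 0)
  x1 <- x[, groups == g[1], drop = FALSE]
  x2 <- x[, groups == g[2], drop = FALSE]
  n1 <- ncol(x1)
  n2 <- ncol(x2)
  # m: absolute difference between the group averages
  m <- abs(rowMeans(x1) - rowMeans(x2))
  # s: pooled standard error (assumed equal-variance form)
  ss1 <- rowSums((x1 - rowMeans(x1))^2)
  ss2 <- rowSums((x2 - rowMeans(x2))^2)
  s <- sqrt((1 / n1 + 1 / n2) * (ss1 + ss2) / (n1 + n2 - 2))
  m / (a1 + a2 * s)
}

set.seed(1)
x <- matrix(rnorm(60), nrow = 10)   # 10 simulated genes, 6 samples
groups <- c(0, 0, 0, 1, 1, 1)

# Special case a1 = 0, a2 = 1: absolute ordinary t-statistic
t_like <- rots_like_statistic(x, groups, a1 = 0, a2 = 1)
# Special case a1 = 1, a2 = 0: absolute difference of group means
slr <- rots_like_statistic(x, groups, a1 = 1, a2 = 0)
```

Under these assumptions, `t_like` agrees with the absolute value of the statistic reported by `t.test(..., var.equal = TRUE)` row by row, and `slr` is the signal log-ratio when the input is on the log scale.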
The user can adjust the largest top list size considered in the reproducibility calculations, since lowering this size can markedly reduce the computation time. For large data matrices with thousands of rows, we generally recommend a size of several thousand. For smaller data matrices, and especially if there are many rows with only a few non-missing entries, K should be decreased accordingly.
ROTS tolerates a moderate number of missing values in the data matrix by effectively ignoring their contribution during the operation of the procedure. However, each row of the data matrix must contain at least two values in both groups. Rows containing only a few non-missing values should be removed; alternatively, the missing entries can be imputed using, e.g., K-nearest neighbors imputation, which is implemented in the Bioconductor package impute.
ROTS assumes that the input data matrix is log2 transformed (the log parameter defaults to TRUE). Although this only affects the fold change values, we recommend setting the log parameter to FALSE if the input matrix is not log transformed, to avoid downstream confusion.
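The requirement of at least two values per group in every row can be checked before calling ROTS. The following is a minimal base R sketch on simulated data; the object names (`x`, `n_obs`, `keep`, `x_filtered`) are illustrative and not part of the package.

```r
set.seed(2)
x <- matrix(rnorm(40), nrow = 5)       # 5 simulated genes, 8 samples
groups <- c(0, 0, 0, 0, 1, 1, 1, 1)
x[1, 1:3] <- NA                        # row 1: one value left in group 0
x[2, 5] <- NA                          # row 2: three values left in group 1

# Count the non-missing values per row within each group
# (one column of counts per group)
n_obs <- sapply(unique(groups), function(g) {
  rowSums(!is.na(x[, groups == g, drop = FALSE]))
})

# Keep only rows with at least two values in every group
keep <- apply(n_obs >= 2, 1, all)
x_filtered <- x[keep, , drop = FALSE]
```

Here row 1 is dropped and row 2 is kept. A stricter threshold than two may be preferable when many rows are sparse, as noted above.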
If the parameter values a1 and a2 are set by the user, then no optimization is performed but the statistic and FDR-values are calculated for the given parameters. The false discovery rate (FDR) for the optimized test statistic is calculated by permuting the sample labels. The results for all the genes can be obtained by setting the FDR cutoff to 1.
ROTS returns an object of class ROTS, which is a list containing the following components:
data | the expression data matrix.
B | the number of bootstrap and permutation resamplings.
d | the value of the optimized ROTS-statistic for each gene.
pvalue | the corresponding p-values.
FDR | the corresponding FDR-values.
a1 | the optimized parameter a1.
a2 | the optimized parameter a2.
k | the optimized top list size.
R | the optimized reproducibility value.
Z | the optimized reproducibility Z-score.
print prints the optimized parameters a1 and a2, the optimized top list size and the corresponding reproducibility values.
summary summarizes the results of a ROTS analysis. If fdr and num.genes are not specified, then the optimized parameters a1 and a2, the optimized top list size and the corresponding reproducibility values are shown. If fdr or num.genes is specified, then the gene-specific information is also shown for the genes at the specified FDR level or top list size, respectively.
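A call using the arguments and methods documented above might look like the following sketch. The data are simulated here purely for illustration (the matrix `x`, its row names, and the choices B = 100 and K = 50 are assumptions, not the package's own example), and the block only runs if the ROTS package is installed.

```r
set.seed(3)
# Simulated log2-scale data: 200 genes, 3 + 3 samples (illustrative only)
x <- matrix(rnorm(200 * 6), nrow = 200)
rownames(x) <- paste0("gene", seq_len(nrow(x)))
x[1:10, 4:6] <- x[1:10, 4:6] + 3       # shift ten genes in the second group
groups <- c(0, 0, 0, 1, 1, 1)

if (requireNamespace("ROTS", quietly = TRUE)) {
  rots.out <- ROTS::ROTS(data = x, groups = groups, B = 100, K = 50,
                         seed = 1234, log = TRUE)
  print(rots.out)                      # optimized a1, a2, top list size, R, Z
  summary(rots.out, num.genes = 10)    # gene-level results for the top 10
}
```

Passing `fdr` instead of `num.genes` to `summary` selects genes by FDR level; as noted above, an FDR cutoff of 1 returns the results for all genes.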
Fatemeh Seyednasrollah, Tomi Suomi, Laura L. Elo
Maintainer: Tomi Suomi <tomi.suomi@utu.fi>
Suomi T, Seyednasrollah F, Jaakkola MK, Faux T, Elo LL.
ROTS: An R package for reproducibility-optimized statistical testing.
PLoS Comput Biol 2017; 13: e1005562.
Bootstrapping samples
Optimizing parameters
Calculating p-values
Calculating FDR
ROTS results:
Number of resamplings: 100
a1: 1.6
a2: 1
Top list size: 10
Reproducibility value: 0.908
Z-score: 23.32576
5 rows satisfy the condition.
         Row ROTS-statistic  pvalue FDR
684_at   315     -4.3078654 0.00002   0
36202_at 555      0.4830924 0.00023   0
36085_at 710      0.4443196 0.00025   0
1024_at  833      0.3993879 0.00027   0
36311_at 303      0.3805451 0.00030   0