The iCOBRA interactive shiny application provides an easy-to-use,
general-purpose benchmarking interface for comparing multiple methods
in terms of their ability to correctly classify features in a (high-throughput)
dataset as "positive" or "negative", as well as in terms of their ability to
correctly estimate a continuous target. As an example, iCOBRA can be
used to evaluate the performance of methods aimed at finding differentially
expressed or differentially spliced genes between conditions, or to
evaluate how well methods manage to estimate the (known) expression of a
set of genes. The only formal requirements are that each evaluated method
has assigned either (adjusted) p-values or a more general "score" (for
example, the estimated expression level that will be compared to the true
one) to each feature, and that the true status of the features to which
the evaluated methods have been applied is known. The results can be
visualized from different perspectives, using different metrics, and can
also be stratified by feature attributes.
The iCOBRA app can be launched in two modes:
With an input object of the type COBRAData, generated by the iCOBRA R
package. This object contains data frames with calculated p-values, adjusted
p-values and/or scores, as well as a table of true values. When the application
is launched with an input object, all evaluations will be performed using the
data in that object. For more information about the COBRAData object type,
consult the corresponding help page in the iCOBRA R package.
Without an input object, either from the iCOBRA R package or from the server
(http://imlspenticton.uzh.ch:3838/iCOBRA).
In this case, all input data is uploaded in the form of text files containing
the truth and the results, respectively. These text files are described in more
detail in the next section. Note that the iCOBRA R package can be used to
convert between COBRAData objects and correctly formatted text files (see the
help pages for the functions COBRAData_to_text() and COBRAData_from_text()).
If the iCOBRA app is launched without a data object, two types of input text
files are necessary for the calculations to be performed:
The truth file is a tab-delimited text file (with a header line), listing all the features that were investigated (in rows), together with one or more attributes (in columns). The columns are of different types, and are used by the app for different purposes:
The table below shows the first lines of an example truth file. It contains the columns feature (indicating the feature identifier), status (the true binary assignment), logFC (the true continuous variable corresponding to the scores calculated by the evaluated methods), as well as additional columns representing stratification annotations.
The result files contain the p-values, adjusted p-values and scores for the evaluated features. Each file can contain results obtained by one or multiple methods. It is also possible to load multiple result files into the app. Each result file must have a column corresponding to the feature identifier. This column must have the same name as in the truth file. In order to correctly interpret the other columns, each column header must be of the form method:type, where type is either adjP (if the column contains adjusted p-values or FDR estimates), P (if the column contains nominal p-values), or score (if the column contains a general score).
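As an illustration of the method:type naming convention, the following Python sketch groups the columns of a result file header by method name. It is not part of iCOBRA: the function name parse_result_header and the method names edgeR and voom are hypothetical, and splitting on the last colon is a choice made for this sketch only.

```python
from collections import defaultdict

VALID_TYPES = {"P", "adjP", "score"}


def parse_result_header(columns, feature_column="feature"):
    """Group result-file columns by method name.

    Returns a dict mapping method name -> {type: original column header}.
    """
    methods = defaultdict(dict)
    for col in columns:
        if col == feature_column:
            continue  # identifier column, named as in the truth file
        # Assumption of this sketch: split on the last colon only.
        name, sep, kind = col.rpartition(":")
        if not sep or not name or kind not in VALID_TYPES:
            raise ValueError(f"column {col!r} is not of the form method:type")
        methods[name][kind] = col
    return dict(methods)


# Hypothetical header with two methods, each providing two column types.
header = ["feature", "edgeR:P", "edgeR:adjP", "voom:adjP", "voom:score"]
parsed = parse_result_header(header)
```

Here parsed maps each method name to the column types that were supplied for it, which is the information needed to decide how that method can be evaluated.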
Nominal p-values will be adjusted by the app, using the Benjamini-Hochberg correction method, unless adjusted p-values for the same method have already been loaded or are part of the same result file as the p-values. The part of the column name preceding the ":adjP", ":P" or ":score" will be considered the "name" of the method. Please make sure that this name is unique for each evaluated method. However, note that several types of columns can be provided for the same method (e.g., p-values, adjusted p-values and a score can all be provided under the same method name, and they will be considered as different representations corresponding to the same method).
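For reference, the Benjamini-Hochberg adjustment can be sketched in a few lines of Python. This is a generic illustration of the procedure, not the code used by the app. The function name bh_adjust is hypothetical, and the choice to leave missing values missing and to count only the non-missing p-values as the number of tests is an assumption of this sketch.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values; missing entries stay None."""

    def is_missing(p):
        return p is None or (isinstance(p, float) and math.isnan(p))

    observed = [(p, i) for i, p in enumerate(pvalues) if not is_missing(p)]
    m = len(observed)  # assumption: only non-missing p-values are counted
    adjusted = [None] * len(pvalues)
    running_min = 1.0
    # Walk from the largest to the smallest p-value (rank m down to 1),
    # taking a running minimum so the adjusted values stay monotone.
    for rank, (p, i) in zip(range(m, 0, -1), sorted(observed, reverse=True)):
        running_min = min(running_min, p * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up nominal p-values, with one missing value.
adjusted = bh_adjust([0.01, 0.04, 0.03, None, 0.005])
```

Each p-value is multiplied by the number of tests divided by its rank, and the running minimum ensures that a feature with a smaller nominal p-value never receives a larger adjusted p-value.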
The table below shows the first lines of an example result file, containing nominal p-values, adjusted p-values and scores for several methods. Missing values (NA) are allowed.
The combination of measures that are provided for a given method (P, adjP, score) determines how the performance evaluation is carried out:
Sometimes, not all features are assigned a score by each of the evaluated methods. For example, some methods filter out variables for which they cannot perform reliable inference. Similarly, some features may not be present in the truth table. This can happen, for example, when an inference method generates "new" features by combining original features. In this case, the true status for the new features is not known. The table below lists the possible sets of features in a data set. Features that are neither present in the result tables nor in the truth table will not be considered.
The default setting of iCOBRA is to consider all features that are present
(with non-missing status) in the truth file. Thus, A = B = 0. In this case, all
features that are not called (i.e., where there are missing values in the result
table) will be considered negative ("not significant") and will thus be added to
the FN and TN, respectively, in the calculations of TPR and FPR. This is
motivated by the assumption that features are left out of the result table
because there was not enough evidence to call them significant. However, in some
circumstances it may mean that the given TPR and FPR values are slight
underestimates of the true values. Choosing to consider only features shared
between the truth and result table (by checking the box in the input controls)
will disregard features with missing values in the truth table such that also C
= D = 0. This will be done separately for each evaluated method. One exception
is made for the overlap plots (Venn diagrams and UpSet plots), which are the only
aspect that is interpretable even without a given truth (to evaluate agreement
between methods). For these, the following feature collections are used:
Note that this means that, for example, the value represented in the Number of
detections column in the FDR/NBR plots may differ from the total number of
calls in the Venn diagrams.
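The effect of the two ways of handling uncalled features on TPR and FPR can be illustrated with a small Python sketch. It is not iCOBRA's implementation: the function tpr_fpr, the feature names, the 0.05 threshold and the use of "less than or equal to" for calling a feature significant are all assumptions made for illustration.

```python
def tpr_fpr(truth, adj_p, threshold=0.05, shared_only=False):
    """Compute (TPR, FPR) for one method.

    truth: feature -> 1 (positive), 0 (negative) or None (status unknown)
    adj_p: feature -> adjusted p-value, or None / absent if not called
    """
    tp = fp = fn = tn = 0
    for feature, status in truth.items():
        if status is None:
            continue  # true status unknown: the feature cannot be scored
        p = adj_p.get(feature)
        if p is None and shared_only:
            continue  # keep only features present in both tables
        # By default, a missing result counts as "not significant".
        called = p is not None and p <= threshold
        if status == 1:
            tp += called
            fn += not called
        else:
            fp += called
            tn += not called
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return tpr, fpr


# Made-up example: g2 has a missing value and g4 is absent from the results.
truth = {"g1": 1, "g2": 1, "g3": 0, "g4": 0}
adj_p = {"g1": 0.01, "g2": None, "g3": 0.03}

default_rates = tpr_fpr(truth, adj_p)                   # all truth features
shared_rates = tpr_fpr(truth, adj_p, shared_only=True)  # shared features only
```

In this toy example the default setting gives a TPR and FPR of 0.5 each, because g2 is added to the FN and g4 to the TN. Restricting to shared features drops both and gives 1.0 for each rate.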
To summarise, the different columns shown in the information boxes below the plots when hovering over a displayed point are defined as follows:
The iCOBRA application calculates several different types of comparison and
evaluation metrics, each represented in a separate tab. The available methods
are described briefly below.
For all plots except Venn diagrams and UpSet plots, more information about a given point can be obtained by hovering over the point in the plot. The information will be displayed in a table below the plot. This table also shows which type of input measure (p-value, adjusted p-value or score) was used for each calculation.
Input controls are located in the sidebar as well as in the individual tabs. By changing one or more of the input parameters, the user can get more precise control of what is displayed. The following general parameters are available:
COBRAData object that contains the binary truth (classification) for the features.
COBRAData object that contains the continuous truth for the features.
hue_pal palette will be used. Note that the number of required colors may be larger than the number of methods, e.g., if results are stratified by an annotation but all strata are shown in the same panel (and thus each method/stratum combination needs a unique color), or if the truth is included as a method in the Venn diagram.
COBRAPlot object containing all results needed for plotting (see the iCOBRA R package for details).
If the plots are not displayed, it can be for one of the following reasons:
autorun = FALSE (default), and the button "Start calculation!" has not been pressed.
:P, :adjP or :score). Check the "Data preview" tab to see the data that iCOBRA has extracted from the input files.
If not all input controls are visible in the left-hand sidebar, either "fold" one of the three sections (Truth, Results, Plot settings) by clicking on the corresponding title, or change the size of the window slightly.
The sidebar can be hidden by clicking on the three lines next to the main title.
If the colors of the plots do not change when a new color palette is chosen in
the left-hand sidebar, most likely the number of colors in the chosen palette is
not enough. Note that the number of required colors depends not only on the
number of different methods in the evaluation, but also on whether the plots are
facetted or not, and whether the truth is included or not. Also note that not
all methods may be included in each plot, so the number of methods that need to
be assigned a unique color may exceed the number of methods displayed in any
given plot. iCOBRA will attempt to keep the colors for a given method
consistent throughout the different visualizations.