Nearest Template Prediction (NTP) based on predefined class templates.
emat | a numeric matrix with features in rows and samples in columns.
templates | a data frame with two columns: class (coerced to factor) and probe (coerced to character).
nPerm | an integer, the number of permutations for p-value estimation.
distance | a character, one of "cosine", "pearson", "spearman" or "kendall".
nCores | an integer specifying the number of threads for parallelization.
seed | an integer, for p-value reproducibility. Setting seed enforces serial processing.
verbose | logical, whether console messages are to be displayed.
doPlot | logical, whether to produce prediction
ntp implements the Nearest Template Prediction (NTP) algorithm, largely as proposed by Yujin Hoshida (2010) (see below). For each sample, distances to templates are calculated and the class is assigned based on the smallest distance. Distances are transformed from the sample-template correlations as follows:
$d_{\mathrm{class}} = \sqrt{\tfrac{1}{2}\,\bigl(1 - \mathrm{cor}(\mathrm{sample}, \mathrm{templates})\bigr)}$
Template values are 1 for class features and 0 for non-class features (-1 if there are only two classes). Prediction confidence is estimated by comparing the observed distance with a null distribution of distances obtained from permutation tests. Consequently, the lowest possible p-value estimate is 1/nPerm.
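The steps described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper names (cosine_sim, ntp_sketch), the toy templates, and the way the null distribution is built (shuffling the sample's values) are assumptions made for illustration only.

```r
# Hypothetical helper: cosine similarity between two numeric vectors.
cosine_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Hypothetical sketch of nearest-template assignment for ONE sample.
# x:            centered/scaled values, one per feature
# template_mat: features x classes matrix, 1 = class feature, 0 = otherwise
ntp_sketch <- function(x, template_mat, n_perm = 100) {
  to_dist <- function(v) {
    sims <- apply(template_mat, 2, function(tmpl) cosine_sim(v, tmpl))
    sqrt(0.5 * (1 - sims))          # distance transform from the text
  }
  d <- to_dist(x)
  d_min <- min(d)
  # Assumed null model: smallest distance after shuffling the sample's values.
  d_null <- replicate(n_perm, min(to_dist(sample(x))))
  # Rank-based estimate, floored so it cannot fall below 1 / n_perm.
  p <- max(sum(d_null <= d_min), 1) / n_perm
  list(prediction = names(d)[which.min(d)], d = d, p.value = p)
}

set.seed(1)
# Toy templates: 30 features, 10 marker features per class (made-up data).
template_mat <- cbind(A = rep(c(1, 0, 0), each = 10),
                      B = rep(c(0, 1, 0), each = 10),
                      C = rep(c(0, 0, 1), each = 10))
# Toy sample that resembles class B, plus noise.
x <- as.numeric(scale(template_mat[, "B"] + rnorm(30, sd = 0.3)))
res <- ntp_sketch(x, template_mat)
res$prediction
res$p.value
```

With n_perm = 100 the sketch cannot report a p-value below 0.01, which mirrors the 1/nPerm floor stated above.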
emat should be a row-wise centered and scaled matrix. For large, balanced datasets, this may be achieved by applying the ematAdjust function.
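As a rough illustration of what "row-wise centered and scaled" means, the following base-R sketch standardizes each feature of a made-up matrix. It is not a substitute for ematAdjust, whose internals are not described here; the object names (raw, emat_toy) are hypothetical.

```r
# Toy matrix: 4 features (rows) x 6 samples (columns); values are made up.
set.seed(42)
raw <- matrix(rnorm(24, mean = 8, sd = 2), nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("s", 1:6)))

# scale() operates on columns, so transpose, scale, and transpose back.
emat_toy <- t(scale(t(raw)))

round(rowMeans(emat_toy), 10)   # each feature now has mean 0
apply(emat_toy, 1, sd)          # and standard deviation 1
```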
templates is a data.frame defining class templates. A class template is a set of marker genes with higher expected expression in samples belonging to the class than in non-class samples. templates must contain at least two columns named probe and class.
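A templates object with the required layout might look like the sketch below. The gene and class names are invented for illustration, and it is assumed (not stated above) that probe identifiers are meant to match the row names of emat.

```r
# Hypothetical marker genes; probe IDs are assumed to match rownames(emat).
templates_toy <- data.frame(
  probe = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF"),
  class = c("class1", "class1", "class2", "class2", "class3", "class3"),
  stringsAsFactors = FALSE
)

# Mirror the coercions described for the templates argument.
templates_toy$class <- factor(templates_toy$class)
templates_toy$probe <- as.character(templates_toy$probe)

table(templates_toy$class)   # number of marker features per class
```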
Compared to Hoshida (2010), the resulting p-value estimates are more conservative (by a factor equal to the number of classes) and the distances are a monotonic transformation of 1-cor (see Details section above).
Hoshida (2010) does not explicitly state whether input should be log2-transformed, and the examples include both. Based on experience, this choice affects results only at the margins, but for high-quality datasets, normalized, untransformed inputs may yield a small increase in accuracy.
For further details on the NTP algorithm, please refer to the package vignette and Hoshida (2010).
Parallel processing is implemented through parallel::mclapply or snow::parLapply for *nix and Windows systems, respectively.
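The platform split described above can be sketched as follows. This is an illustrative pattern, not the package's code: the work function is a hypothetical stand-in, and the sketch uses the parLapply shipped with the parallel package (rather than snow) so that it runs without extra dependencies.

```r
library(parallel)

# Hypothetical stand-in for one unit of work (e.g. one sample's permutations).
work <- function(i) sqrt(i)

n_cores <- 2
if (.Platform$OS.type == "windows") {
  cl <- makeCluster(n_cores)        # socket cluster; forking is unavailable
  out <- parLapply(cl, 1:8, work)
  stopCluster(cl)
} else {
  out <- mclapply(1:8, work, mc.cores = n_cores)   # fork-based workers
}
unlist(out)
```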
a data frame with class predictions, template distances, p-values and false discovery rate adjusted p-values (p.adjust(method = "fdr")). Rownames equal emat colnames.
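For readers unfamiliar with the adjustment step, this small sketch applies p.adjust(method = "fdr") to made-up p-values. In base R, "fdr" is an alias for the Benjamini-Hochberg method. The column names p.value and FDR are illustrative and not necessarily those returned by ntp.

```r
# Made-up p-values for five samples.
p <- c(s1 = 0.001, s2 = 0.010, s3 = 0.020, s4 = 0.300, s5 = 0.800)

# Benjamini-Hochberg adjustment ("fdr" is an alias for "BH").
fdr <- p.adjust(p, method = "fdr")

data.frame(p.value = p, FDR = fdr)
```

Adjusted values are never smaller than the raw p-values, so a sample that passes an FDR threshold also passes the same threshold on its unadjusted p-value.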
Features with missing values are discarded.
Setting seed disables parallel processing to ensure p-value reproducibility.
For two random uncorrelated vectors $x, y \sim N(0,1)$, $E[d_{xy}] \approx 0.71$ when distance is cosine.
Internally, correlations instead of distances are calculated.
Reuse of features is accepted (a marker need not be specific to one class only).
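The expected cosine distance quoted in the notes above can be checked numerically. The sketch below is a simple simulation under assumed settings (200 vector pairs of length 1000); the helper name cosine_dist is hypothetical.

```r
set.seed(123)

# Hypothetical helper: cosine similarity mapped to the distance
# sqrt(1/2 * (1 - similarity)) used in the Details section.
cosine_dist <- function(x, y) {
  sim <- sum(x * y) / sqrt(sum(x^2) * sum(y^2))
  sqrt(0.5 * (1 - sim))
}

# 200 pairs of independent standard-normal vectors, 1000 features each.
d <- replicate(200, cosine_dist(rnorm(1000), rnorm(1000)))
mean(d)   # close to sqrt(1/2), i.e. about 0.71
```

Because uncorrelated vectors have a similarity near 0, the distance concentrates around sqrt(1/2), which is where the 0.71 figure comes from.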
Hoshida, Y. (2010). Nearest Template Prediction: A Single-Sample-Based Flexible Class Prediction with Confidence Assessment. PLoS ONE 5, e15543.
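# Prepare the example expression matrix (quantile normalization via normMethod)
emat <- ematAdjust(crcTCGAsubset, normMethod = "quantile")
# Predict classes against the CMS templates using 100 permutations
res <- ntp(emat, templates.CMS, doPlot = TRUE, nPerm = 100)
head(res)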