Predict classes for new samples based on signature centroid matrix
## S4 method for signature 'matrix'
predict_classes(object, mat, dist_method = c("euclidean", "correlation", "cosine"),
    nperm = 1000, p_cutoff = 0.05, plot = TRUE, verbose = TRUE, prefix = "")
object |
The signature centroid matrix. See the Details section. |
mat |
The new matrix where the classes are going to be predicted. The number of rows should be the same as the signature centroid matrix (also make sure the row orders are the same). Be careful that |
dist_method |
Distance method. Value should be "euclidean", "correlation" or "cosine". |
nperm |
Number of permutations. It is used when |
p_cutoff |
Cutoff for the p-values for determining class assignment. |
plot |
Whether to draw the plot that visualizes the process of prediction. |
verbose |
Whether to print messages. |
prefix |
Used internally. |
The signature centroid matrix is a k-column matrix where each column is the centroid of samples in the corresponding class (k-group classification).
For each sample in the new matrix, the task is to determine which signature centroid the sample is closest to. Two approaches are available: a distance-based one (Euclidean or cosine distance) and a correlation-based one (Spearman correlation).
For the Euclidean/cosine distance method, let x denote the vector that corresponds to sample i
in the new matrix. To decide which class should be assigned to sample i, the distances between
sample i and all k signature centroids are calculated and denoted as d_1, d_2, ..., d_k. The class with the smallest distance is assigned to sample i.
The distances to the k centroids are then sorted in increasing order, and a statistic named the "difference ratio", denoted as r,
is calculated as (|d_(1) - d_(2)|)/mean(d), i.e. the difference between the smallest distance and the second
smallest distance, normalized by the mean distance.
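The difference ratio can be illustrated with a minimal base R sketch for the Euclidean case. The helper `diff_ratio()` and the toy centroids are hypothetical and only follow the description above; this is not the package's internal code.

```r
# Hypothetical helper (not part of the package): class assignment and
# difference ratio for one sample `x` against a centroid matrix.
diff_ratio = function(x, centroids) {
    # Euclidean distance from x to every centroid (one centroid per column)
    d = sqrt(colSums((centroids - x)^2))
    d_sorted = sort(d)
    # |d_(1) - d_(2)| / mean(d)
    r = abs(d_sorted[1] - d_sorted[2]) / mean(d)
    list(class = names(d)[which.min(d)], r = unname(r), d = d)
}

# toy data: three centroids in two dimensions, sample sits on class1
centroids = cbind(class1 = c(0, 0), class2 = c(3, 4), class3 = c(6, 8))
x = c(0, 0)
out = diff_ratio(x, centroids)
out$class   # "class1"
out$r       # distances are 0, 5, 10, so r = 5 / 5 = 1
```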
To test the statistical significance of r, the rows of the signature centroid matrix are randomly permuted and r_rand is calculated.
The random permutation is performed nperm
times and the p-value is calculated as the proportion of r_rand values that are
larger than r.
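A rough sketch of the permutation test, again in base R. The function `perm_pvalue()` is hypothetical, and it assumes that one row permutation is applied to all centroid columns at once; the package may permute differently.

```r
# Hypothetical helper (not part of the package): permutation p-value for the
# difference ratio of one sample `x`.
perm_pvalue = function(x, centroids, nperm = 1000) {
    ratio = function(cen) {
        d = sort(sqrt(colSums((cen - x)^2)))
        unname(abs(d[1] - d[2]) / mean(d))
    }
    r = ratio(centroids)
    # assumption: the same row shuffle is applied to every centroid column
    r_rand = replicate(nperm, {
        shuffled = centroids[sample(nrow(centroids)), , drop = FALSE]
        ratio(shuffled)
    })
    # proportion of permuted ratios larger than the observed one
    mean(r_rand > r)
}

set.seed(123)
centroids = matrix(rnorm(60), nrow = 20,
    dimnames = list(NULL, paste0("class", 1:3)))
# a sample that lies close to the first centroid
x = centroids[, 1] + rnorm(20, sd = 0.1)
p = perm_pvalue(x, centroids, nperm = 200)
```

With `nperm` permutations the smallest non-zero p-value this estimate can take is 1/`nperm`, which is worth keeping in mind when choosing `p_cutoff`.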
For the correlation method, the Spearman correlation is calculated between sample i and each signature
centroid. The label of the class with the maximal correlation value is assigned to sample i. The
p-value is simply calculated by cor.test
between sample i and the corresponding centroid.
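The correlation method can be sketched with `cor()` and `cor.test()` from the stats package. The helper `predict_by_cor()` is hypothetical, and the two-sided default of `cor.test()` is an assumption, since the text does not say which alternative is used.

```r
# Hypothetical helper (not part of the package): assign the class whose
# centroid has the highest Spearman correlation with sample `x`.
predict_by_cor = function(x, centroids) {
    rho = apply(centroids, 2, function(cen) cor(x, cen, method = "spearman"))
    best = which.max(rho)
    # assumption: default two-sided test
    test = cor.test(x, centroids[, best], method = "spearman")
    data.frame(class = colnames(centroids)[best], p = test$p.value)
}

# toy data without ties, so the exact Spearman test can be used
centroids = cbind(
    class1 = 1:10,
    class2 = 10:1,
    class3 = c(3, 1, 4, 10, 5, 9, 2, 6, 8, 7)
)
x = c(1.1, 2.3, 2.9, 4.2, 5.5, 5.9, 7.4, 8.0, 9.6, 10.2)
res_cor = predict_by_cor(x, centroids)
```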
If a sample is tested with a p-value higher than p_cutoff, the corresponding class label is set to NA.
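As a small illustration of the cutoff rule, with made-up labels and p-values:

```r
# made-up example values, for illustration only
labels = c("class1", "class2", "class3")
pvals = c(0.001, 0.20, 0.04)
p_cutoff = 0.05

# samples whose p-value exceeds the cutoff get no class label
labels[pvals > p_cutoff] = NA
labels
```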
A data frame with two columns: the class labels (the column names of the signature centroid matrix are treated as class labels) and the corresponding p-values.
data(golub_cola)
res = golub_cola["ATC:skmeans"]
mat = get_matrix(res)
# note scaling should be applied here because the matrix was scaled in the cola analysis
mat2 = t(scale(t(mat)))
tb = get_signatures(res, k = 3, plot = FALSE)
sig_mat = tb[, grepl("scaled_mean", colnames(tb))]
sig_mat = as.matrix(sig_mat)
colnames(sig_mat) = paste0("class", seq_len(ncol(sig_mat)))
# this is what the signature centroid matrix looks like:
head(sig_mat)
mat2 = mat2[tb$which_row, , drop = FALSE]
# now we predict the class for `mat2` based on `sig_mat`
predict_classes(sig_mat, mat2)