nscentroids | R Documentation |
Nearest shrunken centroids performs regularized classification of high-dimensional data. Originally developed for classification of microarrays, it calculates test statistics for each feature/dimension based on the deviation between the class centroids and the global centroid. It applies regularization (via soft thresholding) to these test statistics to produce shrunken centroids for each class.
# Nearest shrunken centroids
nscentroids(x, y, s = 0, distfun = NULL,
priors = table(y), center = NULL, transpose = FALSE,
verbose = NA, chunkopts=list(),
BPPARAM = bpparam(), ...)
## S3 method for class 'nscentroids'
fitted(object, type = c("response", "class"), ...)
## S3 method for class 'nscentroids'
predict(object, newdata,
type = c("response", "class"), priors = NULL, ...)
## S3 method for class 'nscentroids'
logLik(object, ...)
x | The data matrix. |
y | The response. (Coerced to a factor.) |
s | The sparsity (soft thresholding) parameter used to shrink the test statistics. May be a vector. |
distfun | A distance function with the same usage (i.e., supports the same arguments and return values) as |
priors | The prior probabilities or sample sizes for each class. (Will be normalized.) |
center | An optional vector giving the pre-calculated global centroid. |
transpose | A logical value indicating whether |
verbose | Should progress be printed for the initial centroid calculations and for each fitted model (i.e., each value of |
chunkopts | An (optional) list of chunk options including |
BPPARAM | An optional instance of |
... | Additional options passed to |
object | An object inheriting from |
newdata | An optional data matrix to use for the prediction. |
type | The type of prediction, where |
This function implements nearest shrunken centroids based on the original algorithm by Tibshirani et al. (2002). It provides a sparse strategy for classification based on regularized class centroids. The class centroids are shrunken toward the global centroid. The shrunken test statistics used to perform the regularization can then be interpreted to determine which features are relevant to the classification. (Important features will have nonzero test statistics after soft thresholding.)
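The shrinkage step described above can be sketched in base R. This is a minimal illustration following the general recipe in Tibshirani et al. (2002), not the code used by nscentroids: the helper name soft, the offset s0 (taken here as the median pooled standard deviation), and the form of the per-class factor mk are assumptions made for the example.

```r
# Soft thresholding: move each value toward zero by s, stopping at zero
soft <- function(d, s) sign(d) * pmax(abs(d) - s, 0)

set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow=n, ncol=p)
y <- factor(ifelse(x[,1L] > 0 | x[,2L] < 0, "a", "b"))

nk <- as.vector(table(y))        # class sizes
global <- colMeans(x)            # global centroid
centers <- rowsum(x, y) / nk     # class centroids (one row per class)

# Pooled within-class standard deviation for each feature
res <- x - centers[as.integer(y), , drop=FALSE]
sdev <- sqrt(colSums(res^2) / (n - length(nk)))

# Assumed standardization: offset s0 and per-class factor mk
s0 <- median(sdev)
mk <- sqrt(1 / nk - 1 / n)
denom <- outer(mk, sdev + s0)    # same shape as centers

# Test statistics: standardized deviation of each class centroid
# from the global centroid
stat <- sweep(centers, 2L, global) / denom

# Shrink the statistics, then map them back to shrunken centroids
shrunk <- soft(stat, 1.5)
shrunken_centers <- sweep(shrunk * denom, 2L, global, "+")

# Features whose statistics are zero in every class drop out of the model
colSums(shrunk != 0) > 0
```

With s = 0 the shrunken centroids equal the ordinary class centroids; larger values of s pull more features all the way to the global centroid.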
A custom distance function can be passed via distfun. If not provided, then this defaults to rowDists if transpose=FALSE or colDists if transpose=TRUE.
If a custom function is passed, it must support the same arguments and return values as rowDists and colDists.
An object of class nscentroids, with the following components:
class: The predicted classes.
probability: A matrix of posterior class probabilities.
centers: The shrunken class centroids used for classification.
statistic: The shrunken test statistics.
sd: The pooled within-class standard deviations for each feature.
priors: The prior class probabilities.
s: The regularization (soft thresholding) parameter.
distfun: The function used to generate the dissimilarity function.
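To show how the centers, sd, and priors components could combine into the class and probability components, here is a base R sketch of the discriminant score from Tibshirani et al. (2002). It is an illustration only: the function name score_classes is hypothetical, the toy inputs are made up, and sd is assumed to already include any offset. The predict method for nscentroids objects may differ in detail (for example, in the distance function used).

```r
# Hypothetical helper: classify rows of newdata by nearest (shrunken) centroid
score_classes <- function(newdata, centers, sd, priors) {
  priors <- priors / sum(priors)   # normalize sample sizes to probabilities
  n <- nrow(newdata)
  # Standardized squared distance to each centroid, corrected by the log prior
  scores <- vapply(seq_len(nrow(centers)), function(k) {
    z <- sweep(sweep(newdata, 2L, centers[k,]), 2L, sd, "/")
    rowSums(z^2) - 2 * log(priors[[k]])
  }, numeric(n))
  scores <- matrix(scores, nrow=n)
  # Posterior probabilities; subtracting the row minimum avoids underflow
  prob <- exp(-(scores - apply(scores, 1L, min)) / 2)
  prob <- prob / rowSums(prob)
  colnames(prob) <- rownames(centers)
  list(
    class=factor(rownames(centers)[max.col(prob, ties.method="first")],
      levels=rownames(centers)),
    probability=prob)
}

# Toy inputs (made up for illustration)
centers <- rbind(a=c(0, 0), b=c(4, 4))
newdata <- rbind(c(0.5, -0.5), c(3.5, 4.5), c(2, 2))
out <- score_classes(newdata, centers, sd=c(1, 1), priors=c(a=50, b=50))
out$class
out$probability
```

Unequal priors shift the decision boundary toward the less common class, which is why sample sizes passed as priors are normalized before use.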
Kylie A. Bemis
R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Diagnosis of multiple cancer types by shrunken centroids of gene expression.” Proceedings of the National Academy of Sciences of the USA, vol. 99, no. 10, pp. 6567-6572, 2002.
R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. “Class prediction by nearest shrunken centroids, with applications to DNA microarrays.” Statistical Science, vol. 18, no. 1, pp. 104-117, 2003.
rowDists, colDists
register(SerialParam())
set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow=n, ncol=p)
colnames(x) <- paste0("x", seq_len(p))
y <- ifelse(x[,1L] > 0 | x[,2L] < 0, "a", "b")
nscentroids(x, y, s=1.5)