runPCA: Perform PCA on expression data
In davismcc/scater: Single-Cell Analysis Toolkit for Gene Expression Data in R

calculatePCA

R Documentation

Perform PCA on expression data

Description

Perform a principal components analysis (PCA) on cells, based on the expression data in a SingleCellExperiment object.

Usage

calculatePCA(x, ...)

## S4 method for signature 'ANY'
calculatePCA(
  x,
  ncomponents = 50,
  ntop = 500,
  subset_row = NULL,
  scale = FALSE,
  transposed = FALSE,
  BSPARAM = bsparam(),
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
calculatePCA(x, ..., exprs_values = "logcounts", assay.type = exprs_values)

## S4 method for signature 'SingleCellExperiment'
calculatePCA(
  x,
  ...,
  exprs_values = "logcounts",
  dimred = NULL,
  n_dimred = NULL,
  assay.type = exprs_values
)

## S4 method for signature 'SingleCellExperiment'
runPCA(x, ..., altexp = NULL, name = "PCA")

Arguments

`x`	For `calculatePCA`, a numeric matrix of log-expression values where rows are features and columns are cells. Alternatively, a SummarizedExperiment or SingleCellExperiment containing such a matrix. For `runPCA`, a SingleCellExperiment object containing such a matrix.
`...`	For the `calculatePCA` generic, additional arguments to pass to specific methods. For the SummarizedExperiment and SingleCellExperiment methods, additional arguments to pass to the ANY method. For `runPCA`, additional arguments to pass to `calculatePCA`.
`ncomponents`	Numeric scalar indicating the number of principal components to obtain.
`ntop`	Numeric scalar specifying the number of features with the highest variances to use for dimensionality reduction.
`subset_row`	Vector specifying the subset of features to use for dimensionality reduction. This can be a character vector of row names, an integer vector of row indices or a logical vector.
`scale`	Logical scalar, should the expression values be standardized?
`transposed`	Logical scalar, is `x` transposed with cells in rows?
`BSPARAM`	A BiocSingularParam object specifying which algorithm should be used to perform the PCA.
`BPPARAM`	A BiocParallelParam object specifying whether the PCA should be parallelized.
`exprs_values`	Alias for `assay.type`.
`assay.type`	Integer scalar or string indicating which assay of `x` contains the expression values.
`dimred`	String or integer scalar specifying the existing dimensionality reduction results to use.
`n_dimred`	Integer scalar or vector specifying the dimensions to use if `dimred` is specified.
`altexp`	String or integer scalar specifying an alternative experiment containing the input data.
`name`	String specifying the name to be used to store the result in the `reducedDims` of the output.

Details

Fast approximate SVD algorithms like BSPARAM=IrlbaParam() or RandomParam() use a random initialization, after which they converge towards the exact PCs. This means that the result will change slightly across different runs. For full reproducibility, users should call set.seed prior to running runPCA with such algorithms. (Note that this includes BSPARAM=bsparam(), which uses approximate algorithms by default.)
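
To illustrate why the seed matters, here is a minimal base-R sketch of a one-pass randomized PCA. It is not the BiocSingular implementation: `approx_pca` is a hypothetical helper and the data are simulated. The random matrix drawn inside the helper is what makes repeated runs differ unless set.seed is called first.

```r
# Hypothetical helper: one-pass randomized PCA on a cells-by-features matrix.
approx_pca <- function(x, k = 2) {
  x <- scale(x, center = TRUE, scale = FALSE)    # center each feature
  omega <- matrix(rnorm(ncol(x) * k), ncol = k)  # random starting subspace
  q <- qr.Q(qr(x %*% omega))                     # orthonormal basis of the sampled range
  s <- svd(crossprod(q, x))                      # exact SVD of the small projected matrix
  (q %*% s$u) %*% diag(s$d, nrow = k)            # approximate PC coordinates
}

set.seed(42)
cells <- matrix(rnorm(100 * 20), nrow = 100)     # 100 cells, 20 features (simulated)

set.seed(1); run1 <- approx_pca(cells)
set.seed(1); run2 <- approx_pca(cells)           # same seed: same random start
run3 <- approx_pca(cells)                        # seed not reset: different random start

identical(run1, run2)
isTRUE(all.equal(run1, run3))
```

The same pattern applies to the real function: calling set.seed immediately before runPCA fixes the random initialization used by the approximate algorithm.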

Value

For calculatePCA, a numeric matrix of coordinates for each cell (row) in each of ncomponents PCs (column).

For runPCA, a SingleCellExperiment object is returned containing this matrix in reducedDims(..., name).

In both cases, the attributes of the PC coordinate matrix contain the following elements:

"percentVar", the percentage of variance explained by each PC. This may not sum to 100 if not all PCs are reported.
"varExplained", the actual variance explained by each PC.
"rotation", the rotation matrix containing loadings for all genes used in the analysis and for each PC.
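
As a rough illustration of how these attributes relate to one another, the following base-R sketch uses prcomp rather than scater. The simulated matrix and all variable names are assumptions made for the example. It shows why percentVar falls short of 100 when only the first few PCs are reported.

```r
set.seed(10)
cells <- matrix(rnorm(200 * 30), nrow = 200)   # 200 cells x 30 features (simulated)

pca <- prcomp(cells, center = TRUE, scale. = FALSE)
ncomponents <- 5
idx <- seq_len(ncomponents)

var_explained <- pca$sdev^2                    # variance along every PC
total_var <- sum(apply(cells, 2, var))         # total variance over the features used
percent_var <- 100 * var_explained / total_var

coords <- pca$x[, idx, drop = FALSE]           # cells in rows, PCs in columns
attr(coords, "varExplained") <- var_explained[idx]
attr(coords, "percentVar") <- percent_var[idx]
attr(coords, "rotation") <- pca$rotation[, idx, drop = FALSE]

sum(attr(coords, "percentVar"))                # below 100: only 5 of 30 PCs are kept
```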

Feature selection

This section is relevant if x is a numeric matrix of (log-)expression values with features in rows and cells in columns; or if x is a SingleCellExperiment and dimred=NULL. In the latter case, the expression values are obtained from the assay specified by assay.type.

The subset_row argument specifies the features to use for dimensionality reduction. The aim is to allow users to specify highly variable features to improve the signal/noise ratio, or to specify genes in a pathway of interest to focus on particular aspects of heterogeneity.

If subset_row=NULL, the ntop features with the largest variances are used instead. The variances are computed directly from the expression values without considering any mean-variance trend, so a more considered choice of genes is often possible, e.g., with scran functions. Note that the value of ntop is ignored if subset_row is specified.

If scale=TRUE, the expression values for each feature are standardized so that their variance is unity. This will also remove features with standard deviations below 1e-8.
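
The selection and scaling rules described in this section can be sketched in base R as follows. This is an illustration under stated assumptions, not scater's internal code: the data are simulated, the variable names are made up, and the order of operations (select first, then scale) is assumed.

```r
set.seed(20)
# Simulated log-expression matrix: 1000 features (rows) x 50 cells (columns),
# with a different spread for each feature.
feature_sd <- runif(1000, 0.1, 2)
logexpr <- matrix(rnorm(1000 * 50, sd = rep(feature_sd, 50)), nrow = 1000)
rownames(logexpr) <- paste0("gene", seq_len(nrow(logexpr)))

ntop <- 500
subset_row <- NULL                      # e.g. c("gene1", "gene2") would bypass ntop

row_vars <- apply(logexpr, 1, var)      # plain variances, no mean-variance trend
if (is.null(subset_row)) {
  keep <- order(row_vars, decreasing = TRUE)[seq_len(min(ntop, nrow(logexpr)))]
} else {
  keep <- subset_row                    # ntop is ignored in this branch
}
selected <- logexpr[keep, , drop = FALSE]

scale_features <- TRUE
if (scale_features) {
  sds <- apply(selected, 1, sd)
  ok <- sds >= 1e-8                     # drop near-constant features
  selected <- selected[ok, , drop = FALSE] / sds[ok]   # unit variance per feature
}

pcs <- prcomp(t(selected), center = TRUE)$x   # cells in rows, PCs in columns
dim(pcs)
```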

Using reduced dimensions

If x is a SingleCellExperiment, the method can be applied on existing dimensionality reduction results in x by setting the dimred argument. This is typically used to run slower non-linear algorithms (t-SNE, UMAP) on the results of fast linear decompositions (PCA). We might also use this with existing reduced dimensions computed from a priori knowledge (e.g., gene set scores), where further dimensionality reduction could be applied to compress the data.

The matrix of existing reduced dimensions is taken from reducedDim(x, dimred). By default, all dimensions are used to compute the second set of reduced dimensions. If n_dimred is also specified, only the first n_dimred columns are used. Alternatively, n_dimred can be an integer vector specifying the column indices of the dimensions to use.
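
The two ways of interpreting n_dimred can be sketched in base R. `pick_dims` is a hypothetical helper written for this example, and `existing` is a simulated stand-in for reducedDim(x, dimred).

```r
set.seed(30)
existing <- matrix(rnorm(100 * 10), nrow = 100)   # 100 cells x 10 existing dimensions

# Hypothetical helper mirroring the behaviour described above.
pick_dims <- function(mat, n_dimred = NULL) {
  if (is.null(n_dimred)) return(mat)                          # default: all dimensions
  if (length(n_dimred) == 1L) n_dimred <- seq_len(n_dimred)   # scalar: first n columns
  mat[, n_dimred, drop = FALSE]                               # vector: explicit column indices
}

dim(pick_dims(existing))                # 100 x 10
dim(pick_dims(existing, 5))             # 100 x 5
dim(pick_dims(existing, c(1, 3, 7)))    # 100 x 3
```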

When dimred is specified, no additional feature selection or standardization is performed. This means that any settings of ntop, subset_row and scale are ignored.

If x is a numeric matrix, setting transposed=TRUE will treat the rows as cells and the columns as the variables/dimensions. This allows users to manually pass in dimensionality reduction results without needing to wrap them in a SingleCellExperiment. As such, no feature selection or standardization is performed, i.e., ntop, subset_row and scale are ignored.

Using alternative Experiments

This section is relevant if x is a SingleCellExperiment and altexp is not NULL. In such cases, the method is run on data from an alternative SummarizedExperiment nested within x. This is useful for performing dimensionality reduction on other features stored in altExp(x, altexp), e.g., antibody tags.

Setting altexp with assay.type will use the specified assay from the alternative SummarizedExperiment. If the alternative is a SingleCellExperiment, setting dimred will use the specified dimensionality reduction results from the alternative. This option will also interact as expected with n_dimred.

Note that the output is still stored in the reducedDims of the output SingleCellExperiment. It is advisable to use a different name to distinguish this output from the results generated from the main experiment's assay values.

Author(s)

Aaron Lun, based on code by Davis McCarthy

Examples

example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)

example_sce <- runPCA(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))

davismcc/scater documentation built on Feb. 15, 2025, 8:06 a.m.
