View source: R/preprocess_cds.R
preprocess_cds (R Documentation)
Most analyses in Monocle3 (including trajectory inference and clustering) require various normalization and preprocessing steps. preprocess_cds executes and stores these preprocessing steps. Specifically, depending on the options selected, preprocess_cds first normalizes the data, either by size factor followed by a log transformation or by size factor only, to address differences in sequencing depth between cells. Next, preprocess_cds calculates a lower-dimensional space that is used as the input for further dimensionality reduction such as t-SNE and UMAP.
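To make the normalization step concrete, here is a minimal base-R sketch of what size-factor and log normalization do to a genes-by-cells count matrix. It is not monocle3's implementation. The size-factor formula (each cell's total count divided by the geometric mean of all totals), the use of log base 2, and adding pseudo_count after dividing by the size factor (which keeps zero counts at zero) are all assumptions of this sketch, and size_factors_sketch and normalize_sketch are hypothetical helpers.

```r
# Toy count matrix: 4 genes (rows) x 3 cells (columns)
counts <- matrix(c(10,  0, 5,
                    2,  4, 0,
                    0,  8, 1,
                    8, 12, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Hypothetical helper: size factor = cell total / geometric mean of all cell totals,
# so the size factors have a geometric mean of 1 (an assumption for this sketch)
size_factors_sketch <- function(counts) {
  totals <- colSums(counts)
  totals / exp(mean(log(totals)))
}

# Hypothetical helper mirroring the three norm_method options
normalize_sketch <- function(counts, norm_method = c("log", "size_only", "none"),
                             pseudo_count = NULL) {
  norm_method <- match.arg(norm_method)
  if (norm_method == "none") return(counts)  # data assumed to be normalized already
  # Documented defaults: 1 for "log", 0 for "size_only"
  if (is.null(pseudo_count)) pseudo_count <- if (norm_method == "log") 1 else 0
  # Divide each cell (column) by its size factor to remove depth differences
  scaled <- sweep(counts, 2, size_factors_sketch(counts), "/")
  shifted <- scaled + pseudo_count
  if (norm_method == "log") log2(shifted) else shifted  # log base 2 is an assumption
}

log_norm <- normalize_sketch(counts)               # default: "log"
size_norm <- normalize_sketch(counts, "size_only")
```

With the default pseudo_count of 1, a zero count maps to log2(0 + 1) = 0, so the sparsity pattern of the input survives the log transformation.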
preprocess_cds(
  cds,
  method = c("PCA", "LSI"),
  num_dim = 50,
  norm_method = c("log", "size_only", "none"),
  use_genes = NULL,
  pseudo_count = NULL,
  scaling = TRUE,
  verbose = FALSE,
  build_nn_index = FALSE,
  nn_control = list()
)
cds: the cell_data_set upon which to perform this operation.
method: a string specifying the initial dimensionality reduction method to use, currently either "PCA" or "LSI". For "LSI" (latent semantic indexing), the (sparse) expression matrix is converted into a tf-idf matrix, and singular value decomposition (SVD) is then used to decompose gene expression across cells into modules or topics. Default is "PCA".
num_dim: the dimensionality of the reduced space. Default is 50.
norm_method: determines how expression values are transformed prior to reducing dimensionality. Options are "log", "size_only", and "none". Default is "log". Use "none" only if you are confident that the data are already normalized.
use_genes: NULL or a list of gene IDs. If a list of gene IDs is given, only this subset of genes is used for dimensionality reduction. Default is NULL.
pseudo_count: NULL or the amount added to expression values before normalization and dimensionality reduction. If NULL (default), a pseudo_count of 1 is added for log normalization and 0 for size-factor-only normalization.
scaling: when TRUE (default), each gene is scaled before the PCA step. Relevant for method = "PCA" only.
verbose: whether to emit verbose output during dimensionality reduction. Default is FALSE.
build_nn_index: logical. When TRUE, preprocess_cds builds and stores a nearest neighbor index from the reduced dimension matrix for later use. Default is FALSE.
nn_control: an optional list of parameters used to build the nearest neighbor index. See the set_nn_control help for details.
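The method and scaling arguments choose between two different ways of building the lower-dimensional space. The base-R sketch below contrasts them on a toy matrix: PCA on the cells-by-genes matrix with optional per-gene scaling, and LSI as tf-idf weighting followed by SVD. It assumes one common tf-idf variant and returns the left singular vectors multiplied by the singular values; monocle3's actual weighting, solver (the package may use a truncated SVD routine for speed) and output layout may differ. pca_sketch and lsi_sketch are hypothetical names.

```r
# Toy genes x cells matrix standing in for already-normalized expression values
set.seed(42)
expr <- matrix(rpois(20 * 12, lambda = 3), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("cell", 1:12)))

# Hypothetical PCA sketch: cells are the observations, so work on the
# cells x genes transpose; scaling = TRUE gives every gene unit variance
pca_sketch <- function(mat, num_dim, scaling = TRUE) {
  x <- t(mat)
  x <- x[, apply(x, 2, var) > 0, drop = FALSE]  # constant genes cannot be scaled
  prcomp(x, center = TRUE, scale. = scaling, rank. = num_dim)$x  # cells x num_dim
}

# Hypothetical LSI sketch: tf-idf weighting followed by SVD
lsi_sketch <- function(counts, num_dim) {
  tf <- sweep(counts, 2, colSums(counts), "/")              # term frequency per cell
  idf <- log(1 + ncol(counts) / (1 + rowSums(counts > 0)))  # genes seen in fewer cells weigh more
  tfidf <- sweep(tf, 1, idf, "*")                           # weight each gene (row) by its idf
  s <- svd(t(tfidf), nu = num_dim, nv = num_dim)            # decompose the cells x genes matrix
  sweep(s$u, 2, s$d[seq_len(num_dim)], "*")                 # cells x num_dim "topic" scores
}

pc <- pca_sketch(expr, num_dim = 5)
lsi <- lsi_sketch(expr, num_dim = 5)
```

Both functions return one row per cell and num_dim columns, which is the shape later steps such as t-SNE or UMAP consume. Setting scaling = FALSE in pca_sketch lets highly expressed genes dominate the components, which is why per-gene scaling is the default for PCA.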
Returns an updated cell_data_set object.
library(monocle3)

# Load the example worm embryo data shipped with monocle3
cell_metadata <- readRDS(system.file('extdata',
                                     'worm_embryo/worm_embryo_coldata.rds',
                                     package='monocle3'))
gene_metadata <- readRDS(system.file('extdata',
                                     'worm_embryo/worm_embryo_rowdata.rds',
                                     package='monocle3'))
expression_matrix <- readRDS(system.file('extdata',
                                         'worm_embryo/worm_embryo_expression_matrix.rds',
                                         package='monocle3'))

# Build the cell_data_set, then run the default preprocessing
# (log normalization followed by a 50-dimensional PCA)
cds <- new_cell_data_set(expression_data=expression_matrix,
                         cell_metadata=cell_metadata,
                         gene_metadata=gene_metadata)
cds <- preprocess_cds(cds)
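The example above uses the defaults. The variation below, a sketch that assumes monocle3 and its bundled worm_embryo files are installed, lowers num_dim, then runs LSI with size-factor-only normalization and checks where the results were stored. read_worm is a hypothetical local helper, and the storage names "PCA" and "LSI" are what monocle3 is expected to use rather than something this page documents.

```r
library(monocle3)

# Hypothetical local helper (not part of monocle3) to read the bundled example files
read_worm <- function(file) {
  readRDS(system.file("extdata", file.path("worm_embryo", file), package = "monocle3"))
}

cds <- new_cell_data_set(
  expression_data = read_worm("worm_embryo_expression_matrix.rds"),
  cell_metadata = read_worm("worm_embryo_coldata.rds"),
  gene_metadata = read_worm("worm_embryo_rowdata.rds")
)

# Default normalization ("log"), but a 30-dimensional PCA instead of 50
cds <- preprocess_cds(cds, num_dim = 30)
pca_dim <- dim(SingleCellExperiment::reducedDim(cds, "PCA"))  # cells x components

# LSI on size-factor-only normalized data, again with 30 dimensions
cds <- preprocess_cds(cds, method = "LSI", num_dim = 30, norm_method = "size_only")

# List the reduced-dimension matrices now stored on the cell_data_set
SingleCellExperiment::reducedDimNames(cds)
```

For PCA, monocle3's plot_pc_variance_explained() can help judge whether num_dim captures enough of the variance.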