View source: R/learn_graph.R

learn_graph    R Documentation

Learn principal graph from the reduced dimension space using reversed graph embedding

Description

Monocle3 aims to learn how cells transition through a biological program of gene expression changes in an experiment. Each cell can be viewed as a point in a high-dimensional space, where each dimension describes the expression of a different gene. Identifying the program of gene expression changes is equivalent to learning a trajectory that the cells follow through this space. However, the more dimensions there are in the analysis, the harder the trajectory is to learn. Fortunately, many genes typically co-vary with one another, so the dimensionality of the data can be reduced with a variety of algorithms. Monocle3 provides two algorithms for dimensionality reduction via reduce_dimension (UMAP and tSNE). Both take a cell_data_set object and the number of dimensions allowed for the reduced space. You can also provide a model formula indicating variables (e.g. batch ID or other technical factors) to "subtract" from the data so that they do not contribute to the trajectory. The function learn_graph is the fourth step in the trajectory building process, after preprocess_cds, reduce_dimension, and cluster_cells. After learn_graph, order_cells is typically called.

Usage

learn_graph(
  cds,
  use_partition = TRUE,
  close_loop = TRUE,
  learn_graph_control = NULL,
  verbose = FALSE
)

Arguments

cds

the cell_data_set upon which to perform this operation

use_partition

logical parameter that determines whether to use the partitions calculated during cluster_cells and therefore to learn a disjoint graph in each partition. When use_partition = FALSE, a single graph is learned across all partitions. Default is TRUE.

close_loop

logical parameter that determines whether or not to perform an additional run of loop closing after estimating the principal graphs to identify potential loop structure in the data space. Default is TRUE.

learn_graph_control

NULL or a list of control parameters to be passed to the reversed graph embedding function. Default is NULL. The available control parameters are listed under "Optional learn_graph_control parameters".

verbose

Whether to emit verbose output during graph learning.

Value

an updated cell_data_set object
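
As a minimal sketch of how the arguments above combine, the wrapper below learns a single graph across all partitions and skips the loop-closing run. The name learn_single_graph is hypothetical and not part of monocle3; only the learn_graph arguments documented above are used, and the input is assumed to have already been through preprocess_cds, reduce_dimension, and cluster_cells.

```r
# Hypothetical convenience wrapper (not part of monocle3).
# Assumes `cds` is a cell_data_set that has already been preprocessed,
# dimension-reduced, and clustered.
learn_single_graph <- function(cds, verbose = FALSE) {
  monocle3::learn_graph(
    cds,
    use_partition = FALSE,  # one graph across all partitions
    close_loop = FALSE,     # skip the additional loop-closing run
    verbose = verbose
  )
}
```

Because learn_graph returns an updated cell_data_set, the result would normally be assigned back, for example cds <- learn_single_graph(cds).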

Optional learn_graph_control parameters

euclidean_distance_ratio:

The maximal ratio between the Euclidean distance of two tip nodes in the spanning tree and the maximum distance between any connecting points on the spanning tree for which the two tips are allowed to be connected during the loop closure procedure. Default is 1.

geodesic_distance_ratio:

The minimal ratio between the geodesic distance of two tip nodes in the spanning tree and the length of the diameter path on the spanning tree for which the two tips are allowed to be connected during the loop closure procedure. Both euclidean_distance_ratio and geodesic_distance_ratio must be satisfied for a loop-closing edge to be introduced. Default is 1/3.

minimal_branch_len:

The minimal length of the diameter path for a branch to be preserved during the graph pruning procedure. Default is 10.

orthogonal_proj_tip:

Whether to perform orthogonal projection for cells corresponding to the tip principal points. Default is FALSE.

prune_graph:

Whether or not to perform an additional round of graph pruning to remove small insignificant branches. Default is TRUE.

scale:
ncenter:
nn.k:

Maximum number of nearest neighbors to compute in the reversed graph embedding. Set nn.k = NULL to let learn_graph estimate the value. Default is 25.

rann.k:

Superseded by nn.k; rann.k remains available for compatibility with existing code.

maxiter:
eps:
L1.gamma:
L1.sigma:
nn.method:

The method to use for finding nearest neighbors. nn.method can be one of 'nn2', 'annoy', or 'hnsw'.

nn.metric:

The distance metric for the annoy or hnsw nearest neighbor index build. See help(set_nn_control) for more information.

nn.n_trees:

The number of trees used to build the annoy nearest neighbor index. See help(set_nn_control) for more information.

nn.search_k:

The number of nodes to search in an annoy index search. See help(set_nn_control) for more information.

nn.M:

Related to the internal dimensionality of the HNSW index. See help(set_nn_control) for more information.

nn.ef_construction:

Controls the HNSW index build speed/accuracy tradeoff.

nn.ef:

Controls the HNSW index search speed/accuracy tradeoff. See help(set_nn_control) for more information.

nn.grain_size:

Used by annoy and HNSW to set the minimum amount of work to do per thread. See help(set_nn_control) for more information.

nn.cores:

Used by annoy and HNSW to control the number of threads used. See help(set_nn_control) for more information.
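
The way euclidean_distance_ratio and geodesic_distance_ratio jointly gate loop closure can be illustrated with a small base R sketch. It encodes one reading of the two descriptions above (two tips must be close in space yet far apart along the tree); it is not monocle3's internal code, and the function name, its argument names, and the choice of inclusive comparisons are assumptions made for illustration.

```r
# Hypothetical helper: decides whether two tip nodes may be joined,
# following one reading of the documented ratio thresholds.
can_close_loop <- function(tip_euclidean_dist,  # straight-line distance between the tips
                           max_edge_len,        # longest distance between connected points on the tree
                           tip_geodesic_dist,   # path length between the tips along the tree
                           diameter_len,        # length of the tree's diameter path
                           euclidean_distance_ratio = 1,
                           geodesic_distance_ratio = 1 / 3) {
  # The tips must be near each other in the embedding ...
  close_in_space <- tip_euclidean_dist / max_edge_len <= euclidean_distance_ratio
  # ... and far enough apart along the tree for the new edge to form a real loop.
  far_on_tree <- tip_geodesic_dist / diameter_len >= geodesic_distance_ratio
  close_in_space && far_on_tree
}

can_close_loop(0.8, 1.0, 6, 12)  # TRUE: 0.8 <= 1 and 6/12 >= 1/3
can_close_loop(0.8, 1.0, 2, 12)  # FALSE: 2/12 < 1/3
can_close_loop(1.5, 1.0, 6, 12)  # FALSE: 1.5 > 1
```

Under this reading, raising euclidean_distance_ratio or lowering geodesic_distance_ratio makes loop closure more permissive.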

Examples

  
    library(monocle3)

    # Load the worm embryo example data shipped with monocle3.
    cell_metadata <- readRDS(system.file('extdata',
                                         'worm_embryo/worm_embryo_coldata.rds',
                                         package='monocle3'))
    gene_metadata <- readRDS(system.file('extdata',
                                         'worm_embryo/worm_embryo_rowdata.rds',
                                         package='monocle3'))
    expression_matrix <- readRDS(system.file('extdata',
                                             'worm_embryo/worm_embryo_expression_matrix.rds',
                                             package='monocle3'))

    # Assemble the cell_data_set from the three components.
    cds <- new_cell_data_set(expression_data=expression_matrix,
                             cell_metadata=cell_metadata,
                             gene_metadata=gene_metadata)

    # Preprocess, remove batch and background effects, reduce dimensions, and cluster.
    cds <- preprocess_cds(cds)
    cds <- align_cds(cds,
                     alignment_group = "batch",
                     residual_model_formula_str = "~ bg.300.loading +
                      bg.400.loading + bg.500.1.loading + bg.500.2.loading +
                      bg.r17.loading + bg.b01.loading + bg.b02.loading")
    cds <- reduce_dimension(cds)
    cds <- cluster_cells(cds)
    # Learn the principal graph on the clustered, reduced data.
    cds <- learn_graph(cds)
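
Building on the example above, the sketch below shows how a learn_graph_control list might be passed. Every list name is one of the control parameters documented in this page, but the values are illustrative assumptions rather than recommendations, and the call is guarded so that it only runs when monocle3 is installed and a cds object from the preceding steps exists.

```r
# Illustrative values only; the names are documented learn_graph_control
# parameters, the chosen values are assumptions.
graph_control <- list(
  minimal_branch_len = 15,  # documented default is 10
  prune_graph = TRUE,       # documented default
  nn.k = 25,                # documented default
  nn.method = "annoy",      # one of 'nn2', 'annoy', or 'hnsw'
  nn.n_trees = 50,          # annoy-specific; see help(set_nn_control)
  nn.cores = 1
)

# Run only when monocle3 is installed and a `cds` from the steps above exists.
if (requireNamespace("monocle3", quietly = TRUE) &&
    exists("cds") && inherits(cds, "cell_data_set")) {
  cds <- monocle3::learn_graph(cds, learn_graph_control = graph_control)
}
```

Parameters left out of the list are expected to keep their defaults, so only the settings that should change need to be named.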
  


cole-trapnell-lab/monocle3 documentation built on April 7, 2024, 9:24 p.m.