peco
is a supervised approach for predicting cell cycle phase in a
continuum using single-cell RNA sequencing data. The R package
provides functions to build a training data set and predict cell cycle
on a continuum.
Our work shows that peco is able to predict continuous cell cylce phase using a small set of cyclic genes, CDK1, UBE2C, TOP2A, HISTH1E, and HISTH1C (These were dentified as cell cycle marker genes in studies of yeast (Spellman et al, 1998) and HeLa cells (Whitfield et al, 2002).
Below, we present two use cases.
In the first use case, we show how to use the built-in training dataset to predict continuous cell cycle.
In the second use case, we show how to create a training data set, then build a predictor using the training data.
knitr::opts_chunk$set(results = "hold",collapse = TRUE,comment = "#>", fig.align = "center")
peco
uses SingleCellExperiment
class objects:
library(SingleCellExperiment) library(doParallel) library(foreach) library(peco)
training_human
provides training data for 101 significant
cyclic genes. The data include:
predict.yy
, a gene-by-sample matrix (101 x 888) that storing
predicted cyclic expression values.
cellcycle_peco_ordered
, cell cycle phase in a unit circle
(angle), ordered from 0 to 2$\pi$.
cellcycle_function
, functions used for prediction corresponding to
the top 101 cyclic genes identified.
sigma
, standard error associated with cyclic trends of gene
expression.
pve
, proportion of variance explained by the cyclic trend.
data(training_human)
peco
is integrated with SingleCellExperiment
objects in
Bioconductor. Here we illustrate using a SingleCellExperiment
object
to perform cell cycle phase prediction.
sce_top101genes
is a SingleCellExpression
object for 101 genes and
888 single-cell samples.
data(sce_top101genes) assays(sce_top101genes)
Transform expression to quantile-normalizesd counts-per-million (CPM)
values; peco
uses the cpm_quantNormed
slot as the input data for
predictions.
sce_top101genes <- data_transform_quantile(sce_top101genes) assays(sce_top101genes)
Generate predictions using cycle_npreg_outsample
.
pred_top101genes <- cycle_npreg_outsample(Y_test = sce_top101genes, sigma_est = training_human$sigma[rownames(sce_top101genes),], funs_est = training_human$cellcycle_function[rownames(sce_top101genes)], method.trend = "trendfilter",get_trend_estimates = FALSE)
pred_top101genes$Y
contains a SingleCellExperiment
object with
the predicted cell cycle phase in the colData
slot:
head(colData(pred_top101genes$Y)$cellcycle_peco)
View the predictions for the CDK1 gene (Ensembl id ENSG00000170312). Because CDK1 is a known cell cycle gene, this visualization serves as a "sanity check" for the predictions.
x <- seq(0,2*pi,length.out = 100) plot(x = colData(pred_top101genes$Y)$theta_shifted, y = assay(pred_top101genes$Y,"cpm_quantNormed")["ENSG00000170312",], pch = 21,col = "white",bg = "darkblue", main = "CDK1",xlab = "FUCCI phase", ylab = "quantile-normalized expression") lines(x = x,y = training_human$cellcycle_function[["ENSG00000170312"]](x), col = "tomato",lwd = 2)
Next, we view the predictions for the top six genes. Here we use
fit_cyclical_many
to estimate cell cycle.
theta_predict <- colData(pred_top101genes$Y)$cellcycle_peco names(theta_predict) <- rownames(colData(pred_top101genes$Y)) yy_input <- assay(pred_top101genes$Y,"cpm_quantNormed")[1:6,] fit_cyclic <- fit_cyclical_many(Y = yy_input,theta = theta_predict) gene_symbols <- rowData(pred_top101genes$Y)[rownames(yy_input),"hgnc"] x <- seq(0,2*pi,length.out = 100) par(mfrow = c(3,2)) for (i in 1:6) { plot(x = fit_cyclic$cellcycle_peco_ordered,y = yy_input[i,], pch = 1,col = "darkblue",main = gene_symbols[i], xlab = "FUCCI phase",ylab = "normalized expression") lines(x = x,y = fit_cyclic$cellcycle_function[[i]](x), col = "tomato",lwd = 2) }
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.