knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
To install and load the package, run:
install.packages("devtools") library(devtools) install_github("jhsiao999/peco")
peco
uses SingleCellExperiment
class objects.
library(peco) library(SingleCellExperiment) library(doParallel) library(foreach)
peco
is a supervised approach for PrEdicting cell cycle phase in a
COntinuum using single-cell RNA sequencing data. The R package provides functions to build training dataset and also functions to use existing training data to predict cell cycle on a continuum.
Our work demonstrated that peco is able to predict continuous cell cylce phase using a small set of cylcic genes: CDK1, UBE2C, TOP2A, HISTH1E, and HISTH1C (identified as cell cycle marker genes in studies of yeast (Spellman et al., 1998) and HeLa cells (Whitfield et al., 2002)).
Below we provide two use cases. Vignette 1 shows how to use the built-training dataset to predict continuous cell cycle. Vignette 2 shows how to make a training datast and build a predictor using training data.
Users can also view the vigenettes via browseVignettes("peco")
.
training_human
stores built-in training data of 101 significant cyclic genes. Below are the slots contained in training_human
:
predict.yy
: a gene by sample matrix (101 by 888) that stores predict cyclic expression values. cellcycle_peco_reordered
: cell cycle phase in a unit circle (angle), ordered from 0 to 2$pi$cellcycle_function
: lists of 101 function corresponding to the top 101 cyclic genes identified in our datasetsigma
: standard error associated with cyclic trends of gene expressionpve
: proportion of variance explained by the cyclic trenddata("training_human")
peco
is integrated with SingleCellExperiment
object in Bioconductor. Below shows an example of inputting SingleCellExperiment
object to perform cell cycle phase prediction.
sce_top101genes
includes 101 genes and 888 single-cell samples and one assay slot of counts
.
data("sce_top101genes") assays(sce_top101genes)
Transform the expression values to quantile-normalizesd counts-per-million values. peco
uses the cpm_quantNormed
slot as input data for predictions.
sce_top101genes <- data_transform_quantile(sce_top101genes) assays(sce_top101genes)
Apply the prediction model using function cycle_npreg_outsample
and generate prediction results contained in a list object pred_top101genes
.
pred_top101genes <- cycle_npreg_outsample( Y_test=sce_top101genes, sigma_est=training_human$sigma[rownames(sce_top101genes),], funs_est=training_human$cellcycle_function[rownames(sce_top101genes)], method.trend="trendfilter", ncores=1, get_trend_estimates=FALSE)
The pred_top101genes$Y
contains a SingleCellExperiment object with the predict cell cycle phase in the colData
slot.
head(colData(pred_top101genes$Y)$cellcycle_peco)
Visualize results of prediction for one gene. Below we choose CDK1 ("ENSG00000170312"). Because
CDK1 is a known cell cycle gene, this visualization serves as a sanity
check for the results of fitting. The fitted function
training_human$cellcycle_function[[1]]
was obtained from our training data.
plot(y=assay(pred_top101genes$Y,"cpm_quantNormed")["ENSG00000170312",], x=colData(pred_top101genes$Y)$theta_shifted, main = "CDK1", ylab = "quantile normalized expression") points(y=training_human$cellcycle_function[["ENSG00000170312"]](seq(0,2*pi, length.out=100)), x=seq(0,2*pi, length.out=100), col = "blue", pch =16)
Visualize results of prediction for the top 10 genesone genes. Use fit_cyclical_many
to estimate cyclic function based on the input data.
# predicted cell time in the input data theta_predict = colData(pred_top101genes$Y)$cellcycle_peco names(theta_predict) = rownames(colData(pred_top101genes$Y)) # expression values of 10 genes in the input data yy_input = assay(pred_top101genes$Y,"cpm_quantNormed")[1:6,] # apply trendfilter to estimate cyclic gene expression trend fit_cyclic <- fit_cyclical_many(Y=yy_input, theta=theta_predict) gene_symbols = rowData(pred_top101genes$Y)$hgnc[rownames(yy_input)] par(mfrow=c(2,3)) for (i in 1:6) { plot(y=yy_input[i,], x=fit_cyclic$cellcycle_peco_ordered, main = gene_symbols[i], ylab = "quantile normalized expression") points(y=fit_cyclic$cellcycle_function[[i]](seq(0,2*pi, length.out=100)), x=seq(0,2*pi, length.out=100), col = "blue", pch =16) }
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.