In jhsiao999/peco: A Supervised Approach for Predicting Cell Cycle Progression Using Single-Cell RNA-seq Data

peco is a supervised approach for predicting cell cycle phase in a continuum using single-cell RNA sequencing data. The R package provides functions to build a training data set and predict cell cycle on a continuum.

Our work shows that peco is able to predict continuous cell cylce phase using a small set of cyclic genes, CDK1, UBE2C, TOP2A, HISTH1E, and HISTH1C (These were dentified as cell cycle marker genes in studies of yeast (Spellman et al, 1998) and HeLa cells (Whitfield et al, 2002).

Below, we present two use cases.

In the first use case, we show how to use the built-in training dataset to predict continuous cell cycle.

In the second use case, we show how to create a training data set, then build a predictor using the training data.

knitr::opts_chunk$set(results = "hold",collapse = TRUE,comment = "#>",
                      fig.align = "center")

peco uses SingleCellExperiment class objects:

library(SingleCellExperiment)
library(doParallel)
library(foreach)
library(peco)

About the training data

training_human provides training data for 101 significant cyclic genes. The data include:

predict.yy, a gene-by-sample matrix (101 x 888) that storing predicted cyclic expression values.
cellcycle_peco_ordered, cell cycle phase in a unit circle (angle), ordered from 0 to 2$\pi$.
cellcycle_function, functions used for prediction corresponding to the top 101 cyclic genes identified.
sigma, standard error associated with cyclic trends of gene expression.
pve, proportion of variance explained by the cyclic trend.

data(training_human)

Predict cell cycle phase using gene expression data

peco is integrated with SingleCellExperiment objects in Bioconductor. Here we illustrate using a SingleCellExperiment object to perform cell cycle phase prediction.

sce_top101genes is a SingleCellExpression object for 101 genes and 888 single-cell samples.

data(sce_top101genes)
assays(sce_top101genes)

Transform expression to quantile-normalizesd counts-per-million (CPM) values; peco uses the cpm_quantNormed slot as the input data for predictions.

sce_top101genes <- data_transform_quantile(sce_top101genes)
assays(sce_top101genes)

Generate predictions using cycle_npreg_outsample.

pred_top101genes <-
  cycle_npreg_outsample(Y_test = sce_top101genes,
    sigma_est = training_human$sigma[rownames(sce_top101genes),],
    funs_est = training_human$cellcycle_function[rownames(sce_top101genes)],
    method.trend = "trendfilter",get_trend_estimates = FALSE)

pred_top101genes$Y contains a SingleCellExperiment object with the predicted cell cycle phase in the colData slot:

head(colData(pred_top101genes$Y)$cellcycle_peco)

View the predictions for the CDK1 gene (Ensembl id ENSG00000170312). Because CDK1 is a known cell cycle gene, this visualization serves as a "sanity check" for the predictions.

x <- seq(0,2*pi,length.out = 100)
plot(x = colData(pred_top101genes$Y)$theta_shifted,
     y = assay(pred_top101genes$Y,"cpm_quantNormed")["ENSG00000170312",],
     pch = 21,col = "white",bg = "darkblue",
     main = "CDK1",xlab = "FUCCI phase",
     ylab = "quantile-normalized expression")
lines(x = x,y = training_human$cellcycle_function[["ENSG00000170312"]](x),
      col = "tomato",lwd = 2)

Visualize cyclic expression trend based on predicted phase

Next, we view the predictions for the top six genes. Here we use fit_cyclical_many to estimate cell cycle.

theta_predict <- colData(pred_top101genes$Y)$cellcycle_peco
names(theta_predict) <- rownames(colData(pred_top101genes$Y))
yy_input <- assay(pred_top101genes$Y,"cpm_quantNormed")[1:6,]
fit_cyclic <- fit_cyclical_many(Y = yy_input,theta = theta_predict)
gene_symbols <- rowData(pred_top101genes$Y)[rownames(yy_input),"hgnc"]
x <- seq(0,2*pi,length.out = 100)
par(mfrow = c(3,2))
for (i in 1:6) {
  plot(x = fit_cyclic$cellcycle_peco_ordered,y = yy_input[i,],
       pch = 1,col = "darkblue",main = gene_symbols[i],
       xlab = "FUCCI phase",ylab = "normalized expression")
  lines(x = x,y = fit_cyclic$cellcycle_function[[i]](x),
        col = "tomato",lwd = 2)
}

Session information

sessionInfo()

jhsiao999/peco documentation built on Nov. 21, 2020, 5:34 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jhsiao999/peco
A Supervised Approach for Predicting Cell Cycle Progression Using Single-Cell RNA-seq Data

In jhsiao999/peco: A Supervised Approach for Predicting Cell Cycle Progression Using Single-Cell RNA-seq Data

About the training data

Predict cell cycle phase using gene expression data

Visualize cyclic expression trend based on predicted phase

Session information

R Package Documentation

Browse R Packages

We want your feedback!

jhsiao999/peco A Supervised Approach for Predicting Cell Cycle Progression Using Single-Cell RNA-seq Data

In jhsiao999/peco: A Supervised Approach for Predicting Cell Cycle Progression Using Single-Cell RNA-seq Data

About the training data

Predict cell cycle phase using gene expression data

Visualize cyclic expression trend based on predicted phase

Session information

R Package Documentation

Browse R Packages

We want your feedback!

jhsiao999/peco
A Supervised Approach for Predicting Cell Cycle Progression Using Single-Cell RNA-seq Data