ILoReg

Introduction

ILoReg is a novel tool for cell population identification from single-cell RNA-seq (scRNA-seq) data. In our study [1], we showed that ILoReg was able to identify, by both unsupervised clustering and visually, rare cell populations that other scRNA-seq data analysis pipelines were unable to identify.

The figure below illustrates the workflows of ILoReg and a typical pipeline that applies feature selection prior to dimensionality reduction by principal component analysis (PCA).

*Figure: Analysis workflows of ILoReg and a feature-selection based approach*

In contrast to most scRNA-seq data analysis pipelines, ILoReg does not reduce the dimensionality of the gene expression matrix by feature selection. Instead, it performs probabilistic feature extraction using iterative clustering projection (ICP), which yields a probability matrix containing the probability of each of the N cells belonging to each of the k clusters. ICP is a novel machine learning algorithm that iteratively seeks a clustering with k clusters that maximizes the adjusted Rand index (ARI) between the clustering and its projection by L1-regularized logistic regression. In the ILoReg consensus approach, ICP is run L times, and the L probability matrices are merged into a joint probability matrix, which is then transformed by PCA into a lower-dimensional matrix (the consensus matrix). The final clustering step uses hierarchical clustering with Ward's method, after which the user can extract a clustering with K consensus clusters.

Two-dimensional visualization is supported using two popular nonlinear dimensionality reduction methods: t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP). Additionally, ILoReg provides user-friendly functions for identifying differentially expressed (DE) genes and visualizing gene expression.
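The consensus step described above can be sketched in base R. This is only a toy illustration, not ILoReg's implementation: it does not call any ILoReg function, the probability matrices are simulated rather than produced by ICP, and the helper names (`simulate_probabilities`, `adjusted_rand_index`) and the sizes N, k, L, p and K are assumptions chosen for the example. The ARI helper is included only to show the metric that ICP is described as maximizing; here it simply compares the consensus clusters with the simulated groups.

```r
# Toy sketch of the consensus step using base R only (no ILoReg calls).
set.seed(1)

N <- 60  # number of cells (assumed toy size)
k <- 3   # clusters per simulated ICP run
L <- 5   # number of simulated ICP runs

# Hypothetical ground truth: k groups of equal size.
truth <- rep(seq_len(k), each = N / k)

# Simulate one N x k probability matrix. Each cell gets a high probability
# for its own group (under a run-specific label permutation) plus noise.
simulate_probabilities <- function(truth, k) {
  perm <- sample(k)  # cluster labels are arbitrary and differ between runs
  raw <- matrix(runif(length(truth) * k, 0, 0.2), ncol = k)
  idx <- cbind(seq_along(truth), perm[truth])
  raw[idx] <- raw[idx] + 1
  raw / rowSums(raw)  # normalize so that each row sums to 1
}

prob_list <- replicate(L, simulate_probabilities(truth, k), simplify = FALSE)

# Merge the L matrices column-wise into an N x (L * k) joint matrix.
joint <- do.call(cbind, prob_list)

# PCA on the joint matrix; the first p components form the consensus matrix.
p <- 2
consensus <- prcomp(joint, center = TRUE, scale. = FALSE)$x[, seq_len(p)]

# Hierarchical clustering with Ward's method on Euclidean distances.
tree <- hclust(dist(consensus), method = "ward.D2")

# Extract K consensus clusters from the tree.
K <- 3
clusters <- cutree(tree, k = K)

# Adjusted Rand index between two label vectors of equal length.
adjusted_rand_index <- function(x, y) {
  tab <- table(x, y)
  sum_cells <- sum(choose(tab, 2))
  sum_rows <- sum(choose(rowSums(tab), 2))
  sum_cols <- sum(choose(colSums(tab), 2))
  expected <- sum_rows * sum_cols / choose(length(x), 2)
  (sum_cells - expected) / ((sum_rows + sum_cols) / 2 - expected)
}

print(table(truth, clusters))
print(adjusted_rand_index(truth, clusters))
```

Because cluster labels are arbitrary in each run, the matrices are merged column-wise rather than averaged; PCA then summarizes the shared structure across runs before the tree is cut at K clusters.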

Installation

The latest version of ILoReg can be installed from GitHub using the devtools R package.


devtools::install_github("elolab/ILoReg")

Example

Please follow this link to an example, in which a peripheral blood mononuclear cell (PBMC) dataset is analyzed using ILoReg. In Bioconductor the vignette can be accessed in a readable format.

Contact information

If you have questions related to ILoReg, please contact us here.

References

  1. Johannes Smolander, Sini Junttila, Mikko S Venäläinen, Laura L Elo. "ILoReg: a tool for high-resolution cell population identification from single-cell RNA-seq data". Bioinformatics, Volume 37, Issue 8, 15 April 2021, Pages 1107–1114, https://doi.org/10.1093/bioinformatics/btaa919.

