knitr::opts_knit$set(root.dir = "d:/my_analysis/BRIC_TEST/2.Yan/")
library(IRISFGM)
load("d:/my_analysis/BRIC_TEST/2.Yan/YanObjectBRIC_qubic1.Rdata")
IRIS-FGM integrates in-house and state-of-the-art computational tools and provides two analysis strategies, including bicluster-based co-expression gene analysis (Xie, et al., 2020) and LTMG (left-truncated mixture Gaussian model)-embedded scRNA-Seq analysis (Wan, et al., 2019).
The main idea of IRIS-FGM consists of two major strategies:
We recommend installing IRIS-FGM on a Linux system with large memory (32 GB) if you plan to run bicluster-based co-expression analysis. If you plan to analyze data in quick mode, a Windows or Linux system with less memory (8 GB) is sufficient. IRIS-FGM does not support macOS. We assume you have the following installed:
Pre-install packages
install.packages(c('BiocManager','devtools', 'AdaptGauss', "pheatmap", 'mixtools','MCL', 'anocva', 'qgraph','Rtools','ggpubr',"ggraph"))
BiocManager::install(c('org.Mm.eg.db','multtest', 'org.Hs.eg.db','clusterProfiler','DEsingle', 'DrImpute', 'scater', 'scran'))
devtools::install_github(repo = 'satijalab/seurat')
The input to IRIS-FGM is the single-cell RNA-seq expression matrix:
Rows correspond to genes and columns correspond to cells.
The data file should be tab delimited.
IRIS-FGM also accepts output files from 10X CellRanger, including a folder that contains three individual files, or an h5 file.
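As a minimal sketch of the expected layout, the following base R snippet writes a toy tab-delimited matrix (genes in rows, cells in columns) to a temporary file and reads it back in the same way the tutorial later reads Yan's data. The gene names, cell names, and values are made up for illustration only.

```r
# Toy expression matrix: 3 genes (rows) x 4 cells (columns).
# All names and values are hypothetical.
toy <- matrix(c(0, 5, 2, 1,
                3, 0, 0, 4,
                7, 1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("GeneA", "GeneB", "GeneC"),
                              c("Cell1", "Cell2", "Cell3", "Cell4")))

# Write a tab-delimited file with gene names in the first column.
toy_file <- tempfile(fileext = ".txt")
write.table(toy, file = toy_file, sep = "\t", quote = FALSE, col.names = NA)

# Read it back: the first column becomes the row names (gene IDs).
toy_in <- read.table(toy_file, header = TRUE, row.names = 1, sep = "\t")
dim(toy_in)
```

A file in this shape can be read with read.table and passed on as the expression matrix.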
When you perform co-expression analysis, IRIS-FGM writes several intermediate files, so please make sure that you have write permission to the folder where IRIS-FGM is located.
For installation, type the following command in your R console. Please select option 3 when R asks whether to update packages:
devtools::install_github("BMEngineeR/IRISCEM", force = T)
This tutorial runs on a real dataset to illustrate the results obtained at each step.
As an example, we will use Yan's data, a dataset containing 90 cells and 20,214 genes from human embryos, to conduct cell type prediction.
Yan, L. et al. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nat. Struct. Mol. Biol. 20, 1131-1139 (2013)
The original expression matrix was downloaded from https://s3.amazonaws.com/scrnaseq-public-datasets/manual-data/yan/nsmb.2660-S2.csv. Expression is provided as RPKM values. For convenience, we removed the spaces in the column names and deleted the second column (Transcript_ID). The processed data is available at https://bmbl.bmi.osumc.edu/downloadFiles/Yan_expression.txt.
IRIS-FGM accepts 10X Chromium input files, including a folder (containing gene names, cell names, and a sparse matrix) and an .h5 file.
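The two cleaning steps described above (dropping the second column and removing spaces from the column names) can be sketched in base R as follows. The toy table is a hypothetical stand-in for the downloaded file; its column names and values are not taken from the real data.

```r
# Hypothetical stand-in for the downloaded table.
raw <- data.frame(Gene_ID = c("GeneA", "GeneB"),
                  Transcript_ID = c("T1", "T2"),
                  `Cell 1` = c(1.5, 0),
                  `Cell 2` = c(2.0, 3.2),
                  check.names = FALSE)

# Drop the second column (Transcript_ID).
clean <- raw[, -2]

# Remove spaces from the remaining column names.
colnames(clean) <- gsub(" ", "", colnames(clean), fixed = TRUE)
colnames(clean)
```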
setwd("~/2.Yan/")
library(IRISFGM)
InputMatrix <- ReadFrom10X_h5("~/5k_pbmc_protein_v3_filtered_feature_bc_matrix.h5")
InputMatrix <- ReadFrom10X_folder("~/10X_3K/folder_10X/")
We will use this dataset as an example to run the pipeline.
InputMatrix <- read.table("~/2.Yan/Yan_expression.txt",header = T, row.names = 1)
object <- CreateIRISFGMObject(InputMatrix)
meta.info is a data frame whose row names should be cell IDs and whose column names should be cell types.
object <- AddMeta(object, meta.info = NULL)
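For illustration, a metadata table of this shape could be built in base R as sketched below. The cell IDs, the column name Cluster, and the labels are all hypothetical. The sketch assumes a single annotation column, which may not match every use case.

```r
# Hypothetical annotation: row names are cell IDs, the column holds cell type labels.
meta.info <- data.frame(
  Cluster = c("Oocyte", "Zygote", "Zygote"),
  row.names = c("Cell1", "Cell2", "Cell3")
)
meta.info
```

The row names should match the column names (cell IDs) of the expression matrix.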
PlotMeta(object)
object <- SubsetData(object , nFeature.upper=15000,nFeature.lower=8000, Counts.upper=700000,Counts.lower=400000)
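To make the thresholds above more concrete, here is a base R sketch of this kind of cell filtering. It assumes that nFeature means the number of genes detected (non-zero) in a cell and that Counts means the total expression per cell. These are assumptions about the parameter meanings, and the code is not IRIS-FGM's own implementation. The toy values and cutoffs are made up.

```r
# Toy counts: 4 genes x 3 cells (hypothetical values).
counts <- matrix(c(10, 0, 5,
                    0, 0, 8,
                    3, 1, 0,
                    7, 0, 2),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:3)))

n_feature <- colSums(counts > 0)  # genes detected per cell (assumed meaning of nFeature)
n_counts  <- colSums(counts)      # total expression per cell (assumed meaning of Counts)

# Keep cells that fall inside both ranges (cutoffs chosen for the toy data).
keep <- n_feature >= 2 & n_feature <= 4 & n_counts >= 10 & n_counts <= 30
filtered <- counts[, keep, drop = FALSE]
colnames(filtered)
```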
Users can choose to perform normalization or imputation based on their needs. The normalization method has two options. The simplest is CPM normalization (default, normalization = 'LibrarySizeNormalization'). The other comes from the scran package and is enabled with normalization = 'scran'. The imputation method comes from the DrImpute package and is enabled with IsImputation = TRUE (disabled by default).
object <- ProcessData(object, normalization = "cpm", IsImputation = FALSE, seed = 123)
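As a rough illustration of what CPM (counts per million) normalization does, the following base R sketch divides each cell's values by that cell's library size and scales the result to one million. It is a generic sketch on toy data, not the code IRIS-FGM runs internally, and the package may use a different scale factor.

```r
# Toy counts: 3 genes x 2 cells (hypothetical values, filled column by column).
counts <- matrix(c(10, 0, 90, 5, 5, 10), nrow = 3,
                 dimnames = list(paste0("Gene", 1:3), c("Cell1", "Cell2")))

# CPM: divide each column by its library size, then scale to one million.
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6
colSums(cpm)
```

After this step every cell has the same total, so differences in sequencing depth no longer dominate comparisons between cells.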
The argument Gene_use = 500 selects the top 500 highly variable genes on which to run LTMG. For quick mode, we recommend using the top 2000 genes (here we use the top 500 genes to save time). For co-expression gene analysis, in contrast, we recommend using all genes by setting Gene_use = "all".
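The idea of keeping only the most variable genes can be sketched in base R as below. Ranking by plain per-gene variance is an assumption made for illustration, and IRIS-FGM may use a different variability criterion. The toy matrix is hypothetical.

```r
# Toy expression matrix: 4 genes x 4 cells (hypothetical values).
expr <- rbind(GeneA = c(1, 1, 1, 1),
              GeneB = c(0, 10, 0, 10),
              GeneC = c(2, 3, 2, 3),
              GeneD = c(0, 0, 0, 20))

# Rank genes by variance across cells and keep the top 2.
gene_var  <- apply(expr, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[1:2]
expr_top  <- expr[top_genes, , drop = FALSE]
top_genes
```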
# The demo runs only the top 500 genes to save time.
object <- RunLTMG(object, Gene_use = 500, seed = 123)
Users can set reduction = "umap" or reduction = "tsne" to perform dimension reduction.
# The demo runs only the top 500 genes to save time.
object <- RunDimensionReduction(object, reduction = "umap")
object <- RunClassification(object, k.param = 20, resolution = 0.5, algorithm = 1)
PlotDimension(object,reduction = "umap")
This function asks the user to enter the group used to label cells in the figure. Entering 4 selects the "Seurat0.5" group as the cell label to plot.
IRIS-FGM provides a biclustering function based on our in-house algorithm, QUBIC2 (https://github.com/maqin2001/qubic2). Here we show the basic biclustering usage of IRIS-FGM using a $200\times 88$ expression matrix generated from the previous top 500 variable genes. However, we recommend that users set Gene_use = "all" to generate the LTMG matrix.
Users can type the following commands to run discretization (LTMG) + biclustering directly:
object <- RunLTMG(object, Gene_use = "all", seed = 123)
object <- CalBinaryMultiSignal(object)
object <- RunBicluster(object, DiscretizationModel = "LTMG",OpenDual = TRUE, NumBlockOutput = 100, BlockOverlap = 0.7, BlockCellMin = 15)
This will output several files. Among them is one named Yan_sub.txt.chars.blocks, which contains the predicted biclusters.
Alternatively, users may use the first-version discretization strategy provided by QUBIC 1.0.
object <- RunDiscretization(object)
object <- RunBicluster(object, DiscretizationModel = "Quantile",OpenDual = TRUE, Extension = 0.90, NumBlockOutput = 1000, BlockOverlap = 0.7, BlockCellMin = 15)
(The default parameters in IRIS-FGM are BlockCellMin=15, BlockOverlap=0.7, Extension=0.90, and NumBlockOutput=100. You may use other values by specifying them as arguments.)
The cell type prediction of IRIS-FGM is based on the biclustering results. In short, it constructs a weighted graph from the biclusters and then clusters that graph. Currently, we provide two commonly used clustering methods, including MCL.
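One simple way to picture a weighted graph built from biclusters is sketched below in base R. The edge weight between two cells is taken to be the number of biclusters that contain both of them. This weighting scheme is an assumption chosen for illustration and is not necessarily the one IRIS-FGM uses. The biclusters and cell names are hypothetical.

```r
cells <- paste0("Cell", 1:5)

# Hypothetical biclusters, each listed by the cells it contains.
biclusters <- list(
  BC1 = c("Cell1", "Cell2", "Cell3"),
  BC2 = c("Cell2", "Cell3"),
  BC3 = c("Cell4", "Cell5")
)

# Cell-by-bicluster membership matrix (1 = cell belongs to the bicluster).
membership <- sapply(biclusters, function(b) as.integer(cells %in% b))
rownames(membership) <- cells

# Edge weight = number of biclusters shared by a pair of cells.
weights <- membership %*% t(membership)
diag(weights) <- 0
weights
```

A graph clustering method such as MCL would then be applied to a weight matrix of this kind to group cells.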
object <- FindClassBasedOnMC(object)
PlotHeatmap(object ,N.bicluster =c(1,5),show.annotation = T)
PlotModuleNetwork(object, N.bicluster = 1, Node.color = "#E8E504")
object <- FindMarkers(object)
Users need to select a cell type category to compare. Here we select 4 : Seurat0.5 as the cell type category to analyze.
IRIS-FGM then asks the user to choose a first group as the reference. Here we select the third option (3 : 2), marked as cluster 2 in the UMAP.
The user is then asked to select the second group for comparison. This can be either a single group (2 : 1, 3 : 2) or all remaining groups (4 : rest of all).
After running FindMarkers, the marker gene table is stored in object@LTMG@MarkerGene if quick mode was used, or otherwise in object@BiCluster@MarkerGene.
The first pathway analysis is based on quick mode and is selected with genes.source = "CTS", which means cell-type-specific marker genes. The second pathway analysis is based on genes from a bicluster block.
object <- RunPathway(object, selected.gene.cutoff = 0.05, species = "Human", database = "GO", genes.source = "CTS")
object <- RunPathway(object ,module.number = 5, selected.gene.cutoff = 0.05, species = "Human", database = "GO", genes.source = "Bicluster")
sessionInfo()