scMAGeCK is a computational model to identify genes associated with multiple expression phenotypes from CRISPR screening coupled with single-cell RNA sequencing data (CROP-seq).
scMAGeCK is based on our previous MAGeCK and MAGeCK-VISPR models for pooled CRISPR screens, and extends them to use scRNA-seq as the readout of the screening experiment. scMAGeCK consists of two modules: scMAGeCK-Robust Rank Aggregation (RRA), a sensitive and precise algorithm to detect genes whose perturbation is linked to the expression of a single marker; and scMAGeCK-LR, a linear-regression based approach that unravels the perturbation effects on the expression of thousands of genes, especially in cells that undergo multiple perturbations.

    library(scMAGeCK)

    ### BARCODE file contains cell identity information, generated from the cell identity collection step
    BARCODE <- system.file("extdata","barcode_rec.txt",package = "scMAGeCK")

    ### RDS can be a Seurat object or local RDS file path that contains the scRNA-seq dataset
    RDS <- system.file("extdata","singles_dox_mki67_v3.RDS",package = "scMAGeCK")

    ### Set RRA executable file path.
    ### You can generate RRA executable file by following commands:
    ### wget https://bitbucket.org/weililab/scmageck/downloads/RRA_0.5.9.zip
    ### unzip RRA_0.5.9.zip
    ### cd RRA_0.5.9
    ### make
    RRAPATH <- "/Library/RRA_0.5.9/bin/RRA"

    target_gene <- "MKI67"

    rra_result <- scmageck_rra(BARCODE=BARCODE, RDS=RDS, GENE=target_gene,
                               RRAPATH=RRAPATH, LABEL='dox_mki67', NEGCTRL=NULL,
                               KEEPTMP=FALSE, PATHWAY=FALSE, SAVEPATH=NULL)

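Because the RRA module relies on an external executable, it can help to confirm that the path is valid before starting the analysis. The following is a minimal sketch using base R only; `check_rra_path` is a hypothetical helper written for illustration and is not part of scMAGeCK.

```r
# Hypothetical helper (not part of scMAGeCK): check that a path points to
# an existing, executable file before passing it on as RRAPATH.
check_rra_path <- function(path) {
  if (!file.exists(path) || dir.exists(path)) {
    stop("RRA executable not found at: ", path)
  }
  # file.access() returns 0 on success; mode = 1 tests execute permission
  if (file.access(path, mode = 1) != 0) {
    stop("File exists but is not executable: ", path)
  }
  invisible(normalizePath(path))
}

# Example with the path used in this vignette; adjust it to wherever
# RRA was built on your system.
RRAPATH <- "/Library/RRA_0.5.9/bin/RRA"
rra_ok <- tryCatch(
  {
    check_rra_path(RRAPATH)
    TRUE
  },
  error = function(e) {
    message(conditionMessage(e))
    FALSE
  }
)
```

If `rra_ok` is `FALSE`, the message indicates whether the file is missing or lacks execute permission.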

    library(scMAGeCK)

    ### BARCODE file contains cell identity information, generated from the cell identity collection step
    BARCODE <- system.file("extdata","barcode_rec.txt",package = "scMAGeCK")

    ### RDS can be a Seurat object or local RDS file path that contains the scRNA-seq dataset
    RDS <- system.file("extdata","singles_dox_mki67_v3.RDS",package = "scMAGeCK")

    lr_result <- scmageck_lr(BARCODE=BARCODE, RDS=RDS, LABEL='dox_scmageck_lr',
                             NEGCTRL = 'NonTargetingControlGuideForHuman',
                             PERMUTATION = 1000, SAVEPATH=NULL, LAMBDA=0.01)
    lr_score <- lr_result[1]
    lr_score_pval <- lr_result[2]

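The last two lines above index the result with single brackets. Assuming `scmageck_lr` returns a list of two tables (an assumption based only on how the result is indexed here), single brackets keep the list wrapper, while double brackets return the table itself. The sketch below illustrates the difference on a toy list; `toy_result` is a stand-in, not real scMAGeCK output.

```r
# Toy stand-in for the value returned by scmageck_lr(); the real object
# is assumed (not confirmed here) to be a list holding two tables.
toy_result <- list(
  data.frame(Perturbedgene = "APC", MKI67 = -0.150287676553705),
  data.frame(Perturbedgene = "APC", MKI67 = NA_real_)  # placeholder p value
)

as_sublist <- toy_result[1]    # still a list, of length 1
as_element <- toy_result[[1]]  # the data frame itself

class(as_sublist)  # "list"
class(as_element)  # "data.frame"

# Columns can be accessed directly only on the extracted element
as_element$MKI67
```

If the tables are needed as data frames for downstream work, `lr_result[[1]]` and `lr_result[[2]]` may be the more convenient form.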
The scmageck_rra function outputs the ranking and p value of each perturbed gene, using the RRA program in MAGeCK. Users familiar with MAGeCK will find the output similar to the gene_summary output of MAGeCK.
Here is the example output of scMAGeCK-RRA:
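|Row.names|items_in_group.low|lo_value.low|p.low|FDR.low|goodsgrna.low|items_in_group.high|lo_value.high|p.high|FDR.high|goodsgrna.high|
|---------|------------------|------------|-----|-------|-------------|-------------------|-------------|------|--------|--------------|
|TP53|271|0.11832|0.95619|1|48|271|1.014e-83|4.9975e-06|0.00015|184|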
Explanations of each column are below:
|Column|Content|
|------|-------|
|Row.names| Perturbed gene name|
|items_in_group.low| The number of single cells in which the gene is perturbed |
|lo_value.low | The RRA score in negative selection (reducing the marker expression if this gene is perturbed). The RRA score uses a p value from rank order statistics to measure the degree of selection; the smaller the score, the stronger the selection. More information on the calculation of the RRA score can be found in our original MAGeCK paper. |
|p.low | The raw p-value (using permutation) of this gene in negative selection |
|FDR.low | The false discovery rate of this gene in negative selection |
|goodsgrna.low | The number of single cells that pass the threshold and are considered in the RRA score calculation in negative selection |
|items_in_group.high| The same as items_in_group.low (the number of single cells in which the gene is perturbed) |
|lo_value.high| The RRA score in positive selection (increasing the marker expression if this gene is perturbed) |
|p.high| The raw p-value (using permutation) of this gene in positive selection |
|FDR.high| The false discovery rate of this gene in positive selection |
|goodsgrna.high| The number of single cells that pass the threshold and are considered in the RRA score calculation in positive selection |
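As an illustration of how these columns might be used, the sketch below selects perturbed genes by FDR in each direction. It is built only from the TP53 example row shown in this section; the data frame name `rra_example` and the 0.05 cutoff are assumptions made for illustration, and the real return value of `scmageck_rra` may be structured differently.

```r
# Toy table holding only the example TP53 row shown in this section.
rra_example <- data.frame(
  Row.names = "TP53",
  items_in_group.low = 271, lo_value.low = 0.11832, p.low = 0.95619,
  FDR.low = 1, goodsgrna.low = 48,
  items_in_group.high = 271, lo_value.high = 1.014e-83, p.high = 4.9975e-06,
  FDR.high = 0.00015, goodsgrna.high = 184,
  stringsAsFactors = FALSE
)

fdr_cutoff <- 0.05  # illustrative threshold (an assumption, not a package default)

# Genes whose perturbation is associated with reduced marker expression
neg_hits <- rra_example[rra_example$FDR.low < fdr_cutoff, "Row.names"]

# Genes whose perturbation is associated with increased marker expression
pos_hits <- rra_example[rra_example$FDR.high < fdr_cutoff, "Row.names"]

neg_hits  # empty: TP53 has FDR.low = 1
pos_hits  # "TP53": FDR.high = 0.00015 is below the cutoff
```

In this example row, TP53 is selected only in the positive direction, that is, perturbing TP53 is associated with increased expression of the marker.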
The scmageck_lr function generates the following files:
|File|Description|
|----|----------|
|lr_score|The score (similar to log fold change) of each perturbed gene (rows) on each marker gene (columns)|
|lr_score.pval|The associated p values of each score|
|LR.RData|An R object to store scores and p values|
The format of score.txt and score.pval.txt is a simple table file with rows corresponding to perturbed genes and columns corresponding to marker genes. For example, in score.txt:
|Perturbedgene|APC|ARID1A|TP53|MKI67|
|-------------|---|------|----|-----|
|APC|0.138075836476524|-0.0343441660045313|0.214449590551132|-0.150287676553705|
This row records the effects of perturbing the APC gene on the expression of APC, ARID1A, TP53 and MKI67.
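To make the table layout concrete, the following sketch parses the example header and APC row with base R and ranks the marker genes by the magnitude of their scores. It assumes the file is whitespace-delimited with a header line, as the example suggests; the variable names are illustrative.

```r
# The header and APC row from the example, as whitespace-delimited text.
score_text <- "Perturbedgene APC ARID1A TP53 MKI67
APC 0.138075836476524 -0.0343441660045313 0.214449590551132 -0.150287676553705"

scores <- read.table(text = score_text, header = TRUE, stringsAsFactors = FALSE)

# Effects of perturbing APC, as a named numeric vector (one entry per marker)
apc_row <- scores[scores$Perturbedgene == "APC", -1]
apc_effects <- unlist(apc_row)

# Marker genes ordered by the absolute size of the score
ranked <- apc_effects[order(abs(apc_effects), decreasing = TRUE)]
names(ranked)
```

For this row, TP53 has the largest score in absolute value and ARID1A the smallest. For a file on disk, the same `read.table` call can take a file path in place of the `text` argument.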
Questions? Comments? Join the MAGeCK Google group or email us (wli2@childrensnational.org) directly.
Any advice and suggestions will be greatly appreciated.