View source: R/scmageck_eff_estimate.R
scmageck_eff_estimate | R Documentation |
"This function uses constrained linear least squares to calculate perturbation-rsponse score (PS), which measures heterogenous perturbation effects from single-cell CRISPR screens (e.g., Perturb-seq, CROP-seq)."
scmageck_eff_estimate(
rds_object,
bc_frame,
perturb_gene,
non_target_ctrl,
perturb_target_gene = NULL,
scale_factor = 3,
target_gene_min = 10,
target_gene_max = 500,
assay_for_cor = 'MAGIC_RNA',
subset_rds = TRUE,
scale_score = TRUE,
perturb_gene_exp_id_list = NULL,
lambda = 0,
background_correction=FALSE,
use_perturb_exp=TRUE,
logfc.threshold=0.1
)
rds_object |
A Seurat object or local RDS file path that contains the scRNA-seq dataset; or a path to RDS file. Note that the dataset has to be normalized and scaled. |
bc_frame |
A txt file to include cell identity information, generated from the cell identity collection step; or a corresponding data.frame. |
perturb_gene |
The list of perturbed genes. By default, all genes in the table are subject to regression. |
non_target_control |
The list of genes (separated by ",") served as negative controls. |
perturb_target_gene |
The list of target genes for modeling. If null, will automatic search and identify the target genes. |
scale_factor |
The upper bound of the constraints. Must be a positive value. Default 3. Assign a higher value for a more continuous distribution of the signa scores. |
target_gene_min |
The minimum number of genes selected for target genes. If DEG analysis does not provide enough number of genes that reach this number, the algorithm will iteratively (at most 3 times) decrease LFC threshold to select more genes. |
target_gene_max |
The maximum number of genes selected for target genes. |
assay_for_cor |
The assays used for estimating correlation. Default: MAGIC_RNA, which is generated by MAGIC. |
subset_rds |
Whether to return an R object that only contains cells that express guides targeting perturbed genes (or negative control genes). If TRUE, a gene column in metadata will be added (or updated) that assigns perturbed genes to single cells. Default: TRUE |
scale_score |
Whether to scale the scores for each gene to 1. Default: TRUE |
perturb_gene_exp_id_list |
If the perturbed_gene id is different from expression feature id, use this parameter to provide the corresponding expression features. Must be the same length as perturb_gene. Default: NULL |
lambda |
Sparse penalty (similar with the lambda value in LASSO reguession). Must be non-negative. Default: 0 |
background_correction |
Whether to extract background gene expression, which is estimated from negative control cells. Turn this option on will reduce false positives in datasets containing multiple cell types, where gene expressions may be largely different from different cell types. Default: False |
use_perturb_exp |
Whether to use the expression of perturbed gene in the estimation of efficiency score. If False, the perturbed gene (or the corresponding id in perturb_gene_exp_id_list) will be removed from estimating the score. Default: True |
logfc.threshold |
logfc threshold when determining the downstream targets of a perturbed gene. Default: 0.1 |
Returns a list of several items: eff_matrix: the PS score matrix containing the PS scores of each cells for each perturbed gene rds: the R object if subset_rds is set as TRUE optimization_matrix: the matrix used for actually performing the constrained optimizaiton target_gene_search_result: the results of target gene search for each perturbed gene
library(Seurat)
# set the BARCODE and RDS file path
# if you have a guide matrix, use guidematrix_to_triplet() function in scMAGeCK to convert guide matrix to BARCODE file
BARCODE = system.file("extdata","barcode_rec.txt",package = "scMAGeCK")
bc_frame=read.table(BARCODE,header = T,as.is = T)
# needs clean later, but cell identity will need to be fixed
bc_frame$cell=sub('-1','',bc_frame$cell)
## RDS can be a Seurat object or local RDS file path that contains the scRNA-seq dataset
RDS = system.file("extdata","singles_dox_mki67_v3.RDS",package = "scMAGeCK")
rds_object=readRDS(RDS)
# Run scmageck_eff_estimate function
# By default, the result will be saved to the current working directory.
## rds_object<-assign_cell_identity(bc_frame,rds_object)
eff_object <- scmageck_eff_estimate(rds_object, bc_frame, perturb_gene='TP53',
non_target_ctrl = 'NonTargetingControlGuideForHuman',assay_for_cor='RNA')
eff_estimat=eff_object$eff_matrix
rds_subset=eff_object$rds
# TP53 scores clearly show the pattern of clustering
FeaturePlot(rds_subset,features='TP53_eff',reduction = 'tsne')
# whereas TP53 gene expression did not have this pattern
FeaturePlot(rds_subset,features='TP53',reduction = 'tsne')
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.