suppressPackageStartupMessages(library(magrittr)) if (rlang::is_true(getOption("knitr.in.progress"))) { params_ <- scdrake::scdrake_list(params) } drake_cache_dir <- params_$drake_cache_dir drake::loadd( config_main, config_input_qc, empty_droplets, sce_valid_cells_info, barcode_ranks, qc_filter, custom_filter, sce_qc_filter_rowSums, sce_custom_filter_rowSums, path = drake_cache_dir ) cfg <- config_input_qc empty_droplets_enabled <- cfg$EMPTY_DROPLETS_ENABLED cell_filtering_enabled <- cfg$ENABLE_CELL_FILTERING gene_filtering_enabled <- cfg$ENABLE_GENE_FILTERING input_type <- cfg$INPUT_DATA$type filtering_type <- ifelse(cfg$SAVE_DATASET_SENSITIVE_FILTERING, "dataset-sensitive", "custom")
if (input_type == "cellranger") { scdrake::md_header("Input data: 10x Genomics Cell Ranger data", 1) cat(scdrake::str_space( "The feature-barcode matrix was imported from", "[Cell Ranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger)", "output (the official quantification tool from 10x Genomics)." )) } else if (input_type == "table") { scdrake::md_header("Input data: delimited text (table)", 1) cat("The feature-barcode matrix was imported from a delimited file.") } else if (input_type == "sce") { scdrake::md_header("Input data: `SingleCellExperiment` object", 1) cat("The object holding experimental data (feature-barcode matrix, gene annotation etc.) was imported from a Rds file.") }
Each row of feature-barcode matrix corresponds to a gene, while each column corresponds to a cell barcode. Summary of imported data:
cat(drake::readd(sce_raw_info, path = drake_cache_dir)$str)
r scdrake::format_used_functions("DropletUtils::read10xCounts()")
Empty droplets often contain RNA from the ambient solution, resulting in non-zero counts after debarcoding. It is desired to discard such droplets.
A useful diagnostic for droplet-based data is the barcode rank plot, which shows the total UMI (log-)count for each barcode on the y-axis and the (log-)rank on the x-axis. This is effectively a transposed empirical cumulative density plot with log-transformed axes. It is useful as it allows examine the distribution of total UMI counts across barcodes, focusing on those with the largest counts.
uniq <- !duplicated(barcode_ranks$rank) plot(barcode_ranks$rank[uniq], barcode_ranks$total[uniq], log = "xy", xlab = "Rank", ylab = "Total") o <- order(barcode_ranks$rank) lines(barcode_ranks$rank[o], barcode_ranks$fitted[o], col = "red") abline(h = metadata(barcode_ranks)$knee, col = "dodgerblue", lty = 2) abline(h = metadata(barcode_ranks)$inflection, col = "forestgreen", lty = 2) if (empty_droplets_enabled) { abline(h = cfg$EMPTY_DROPLETS_LOWER, col = "firebrick", lty = 2) legend( "bottomleft", lty = 2, col = c("dodgerblue", "forestgreen", "firebrick"), legend = c("knee", "inflection", "emptyDroplets lower bound") ) } else { legend( "bottomleft", lty = 2, col = c("dodgerblue", "forestgreen"), legend = c("knee", "inflection") ) }
The knee and inflection points on the curve mark the transition between two components of the total UMI count distribution. This is assumed to represent the difference between empty droplets with little RNA and cell-containing droplets with much more RNA.
if (empty_droplets_enabled) { cat( "The emptyDroplets lower bound specifies at or below which number of the total UMI count all barcodes", "are assumed to correspond to empty droplets." ) } else { cat("Removal of empty droplets was disabled. You can enable it by setting `EMPTY_DROPLETS_ENABLED` parameter to `TRUE`.") }
```r)}
*** # Gene + Cell quality filtering ## Pre-filtering QC Given sets of mitochondrial and ribosomal genes in the data, the `scater` package automatically calculates several per-cell QC metrics: - Number of UMI. - Number of detected genes (non-zero UMI count). - Percentage of expressed mitochondrial genes ($\frac {UMI_{mitochondrial}} {UMI_{sum}} * 100$). Then we can use two different methods to filter cells based on the metrics above: - **Custom filtering**: a standard approach is to filter cells with low amount of reads as well as genes that are present in at least a certain amount of cells, using fixed thresholds. While simple, using fixed thresholds requires knowledge of the experiment and of the experimental protocol. - **Dataset-sensitive filtering**: an alternative approach is to use adaptive, data-driven thresholds to identify outlying cells, based on the set of QC metrics just calculated. We identify cells that are outliers for the various QC metrics, based on the median absolute deviation (MAD) from the median value of each QC metric across all cells. Specifically, a value is considered an outlier if it is more than `r cfg$MAD_THRESHOLD` MADs from the median in the "problematic" direction. Additionaly, extremely high number of detected genes could indicate doublets (more sensitive doublet detection is done after library normalization). However, depending on the cell type composition in your sample, you may have cells with higher number of genes (and also higher counts) from one cell type. Now we can plot some of the QC features. Cells are colored by `discard_qc`, meaning if a cell would be discarded by MAD thresholding on a QC metric. ```r cowplot::plot_grid(plotlist = drake::readd(sce_unfiltered_plotlist, path = drake_cache_dir), ncol = 2)
r scdrake::format_used_functions("scuttle::perCellQCMetrics()")
```r)}
```r)} cat("Cell filtering was disabled.")
```r)}
```r)} cat("Gene filtering was disabled.")
```r)}
```r)} cat("Cell filtering was disabled.")
```r)}
```r)} cat("Gene filtering was disabled.")
Final filtering selection: using r filtering_type
filtering.
cat(drake::readd(sce_final_input_qc_info, path = drake_cache_dir)$str)
scdrake::render_bootstrap_table(drake::readd(sce_history, path = drake_cache_dir), full_width = FALSE, position = "left")
print(drake::readd(sce_history_plot, path = drake_cache_dir))
Plots of QC metrics after dataset-sensitive filtering.
discard_custom
means if given cell was discarded in custom filtering.
cowplot::plot_grid(plotlist = drake::readd(sce_qc_filter_genes_plotlist, path = drake_cache_dir), ncol = 2)
Plots of QC metrics after custom filtering.
discard_qc
means if given cell was discarded in dataset-sensitive filtering.
cowplot::plot_grid(plotlist = drake::readd(sce_custom_filter_genes_plotlist, path = drake_cache_dir), ncol = 2)
r config_main$ANNOTATION_PKG
(vr sessioninfo::package_info(config_main$ANNOTATION_PKG, dependencies = FALSE)$loadedversion
),
).drake::readd(gene_annotation, path = drake_cache_dir) %>% head() %>% scdrake::render_bootstrap_table()
Show input parameters
Main config
print(config_main)
print(cfg)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.