suppressPackageStartupMessages(library(magrittr)) if (rlang::is_true(getOption("knitr.in.progress"))) { params_ <- scdrake::scdrake_list(params) } drake_cache_dir <- params_$drake_cache_dir drake::loadd( config_main, config_input_qc, empty_droplets, sce_valid_cells_info, barcode_ranks, qc_filter, custom_filter, sce_qc_filter_rowSums, sce_custom_filter_rowSums, path = drake_cache_dir ) cfg <- config_input_qc empty_droplets_enabled <- cfg$EMPTY_DROPLETS_ENABLED cell_filtering_enabled <- cfg$ENABLE_CELL_FILTERING gene_filtering_enabled <- cfg$ENABLE_GENE_FILTERING input_type <- cfg$INPUT_DATA$type filtering_type <- ifelse(cfg$SAVE_DATASET_SENSITIVE_FILTERING, "dataset-sensitive", "custom")
if (input_type == "cellranger") { scdrake::md_header("Input data: 10x Genomics Space Ranger data", 1) cat(scdrake::str_space( "The feature-barcode matrix was imported from", "[Space Ranger](https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/what-is-space-ranger)", "output (the official quantification tool from 10x Genomics)." )) } else if (input_type == "table") { scdrake::md_header("Input data: delimited text (table)", 1) cat("The feature-barcode matrix was imported from a delimited file.") } else if (input_type == "sce") { scdrake::md_header("Input data: `SingleCellExperiment` object", 1) cat("The object holding experimental data (feature-barcode matrix, gene annotation etc.) was imported from a Rds file.") }
Each row of feature-barcode matrix corresponds to a gene, while each column corresponds to a spot barcode. Summary of imported data:
cat(drake::readd(sce_raw_info, path = drake_cache_dir)$str)
r scdrake::format_used_functions("DropletUtils::read10xCounts()")
In droplet-based single cell RNA-seq, empty droplets often contain RNA from the ambient solution, resulting in non-zero counts after debarcoding. In spot-based spatial transcriptomics, a residual tissue can be accidentally placed on the spots, resulting in non-zero counts in such spot.
It is desired to discard such droplets/spots.
A useful diagnostic for both droplet- and spot- based data is the barcode rank plot, which shows the total UMI (log-)count for each barcode on the y-axis and the (log-)rank on the x-axis. This is effectively a transposed empirical cumulative density plot with log-transformed axes. It is useful as it allows examine the distribution of total UMI counts across barcodes, focusing on those with the largest counts.
uniq <- !duplicated(barcode_ranks$rank) plot(barcode_ranks$rank[uniq], barcode_ranks$total[uniq], log = "xy", xlab = "Rank", ylab = "Total") o <- order(barcode_ranks$rank) lines(barcode_ranks$rank[o], barcode_ranks$fitted[o], col = "red") abline(h = metadata(barcode_ranks)$knee, col = "dodgerblue", lty = 2) abline(h = metadata(barcode_ranks)$inflection, col = "forestgreen", lty = 2) if (empty_droplets_enabled) { abline(h = cfg$EMPTY_DROPLETS_LOWER, col = "firebrick", lty = 2) legend( "bottomleft", lty = 2, col = c("dodgerblue", "forestgreen", "firebrick"), legend = c("knee", "inflection", "emptyDroplets lower bound") ) } else { legend( "bottomleft", lty = 2, col = c("dodgerblue", "forestgreen"), legend = c("knee", "inflection") ) }
The knee and inflection points on the curve mark the transition between two components of the total UMI count distribution. This is assumed to represent the difference between empty droplets with little RNA and spots containing much more RNA.
if (empty_droplets_enabled) { cat( "The emptyDroplets lower bound specifies at or below which number of the total UMI count all barcodes", "are assumed to correspond to empty droplets." ) } else { cat("Removal of empty droplets was disabled. You can enable it by setting `EMPTY_DROPLETS_ENABLED` parameter to `TRUE`.") }
```r)}
*** # Gene + Spot quality filtering ## Pre-filtering QC Given sets of mitochondrial and ribosomal genes in the data, the `scater` package automatically calculates several per-spot QC metrics: - Number of UMI. - Number of detected genes (non-zero UMI count). - Percentage of expressed mitochondrial (ribosomal) genes ($\frac {UMI_{mitochondrial}} {UMI_{sum}} * 100$). Then we can use two different methods to filter spots based on the metrics above: - **Custom filtering**: a standard approach is to filter spots with low amount of reads as well as genes that are present in at least a certain amount of spots, using fixed thresholds. While simple, using fixed thresholds requires knowledge of the experiment and of the experimental protocol. - **Dataset-sensitive filtering**: an alternative approach is to use adaptive, data-driven thresholds to identify outlying spots, based on the set of QC metrics just calculated. We identify spots that are outliers for the various QC metrics, based on the median absolute deviation (MAD) from the median value of each QC metric across all spots. Specifically, a value is considered an outlier if it is more than `r cfg$MAD_THRESHOLD` MADs from the median in the "problematic" direction. Doublets detection and/or removal is not recomended for spot-based spatial transcriptomics data. Now we can plot some of the QC features. spots are colored by `discard_qc`, meaning if a spot would be discarded by MAD thresholding on a QC metric. ```r cowplot::plot_grid(plotlist = drake::readd(sce_unfiltered_plotlist, path = drake_cache_dir), ncol = 2)
Visualisation of prefiltering (raw) QC metrics in spatial coordinates.
pl <- plot_spat_visuals(drake::readd(sce_unfiltered, path = drake_cache_dir)) cowplot::plot_grid(plotlist = pl,ncol=2)
r scdrake::format_used_functions("scuttle::perCellQCMetrics()")
```r)}
```r)} cat("Spot filtering was disabled.")
```r)}
```r)} cat("Gene filtering was disabled.")
```r)}
```r)} cat("Spot filtering was disabled.")
```r)}
```r)} cat("Gene filtering was disabled.")
Final filtering selection: using r filtering_type
filtering.
cat(drake::readd(sce_final_input_qc_info, path = drake_cache_dir)$str)
scdrake::render_bootstrap_table(drake::readd(sce_history, path = drake_cache_dir), full_width = FALSE, position = "left")
print(drake::readd(sce_history_plot, path = drake_cache_dir))
Plots of QC metrics after dataset-sensitive filtering.
discard_custom
means if given spot was discarded in custom filtering.
cowplot::plot_grid(plotlist = drake::readd(sce_qc_filter_genes_plotlist, path = drake_cache_dir), ncol = 2)
Plots in spatial coordinates of QC metrics after dataset-sensitive filtering.
pl <- plot_spat_visuals(drake::readd(sce_qc_filter, path = drake_cache_dir)) cowplot::plot_grid(plotlist = pl,ncol=2)
Plots of QC metrics after custom filtering.
discard_qc
means if given spot was discarded in dataset-sensitive filtering.
cowplot::plot_grid(plotlist = drake::readd(sce_custom_filter_genes_plotlist, path = drake_cache_dir), ncol = 2)
Plots in spatial coordinates of QC metrics after custom filtering.
pl <- plot_spat_visuals(drake::readd(sce_custom_filter, path = drake_cache_dir)) cowplot::plot_grid(plotlist = pl,ncol=2)
r config_main$ANNOTATION_PKG
(vr sessioninfo::package_info(config_main$ANNOTATION_PKG, dependencies = FALSE)$loadedversion
),
).drake::readd(gene_annotation, path = drake_cache_dir) %>% head() %>% scdrake::render_bootstrap_table()
Show input parameters
Main config
print(config_main)
print(cfg)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.