runScrublet: Find doublets using 'scrublet'.
In compbiomed/singleCellTK: Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

View source: R/scrublet_doubletDetection.R

runScrublet

R Documentation

Find doublets using `scrublet`.

Description

A wrapper function that calls scrub_doublets from python module scrublet. Simulates doublets from the observed data and uses a k-nearest-neighbor classifier to calculate a continuous scrublet_score (between 0 and 1) for each transcriptome. The score is automatically thresholded to generate scrublet_call, a boolean array that is TRUE for predicted doublets and FALSE otherwise.

Usage

runScrublet(
  inSCE,
  sample = NULL,
  useAssay = "counts",
  simDoubletRatio = 2,
  nNeighbors = NULL,
  minDist = NULL,
  expectedDoubletRate = 0.1,
  stdevDoubletRate = 0.02,
  syntheticDoubletUmiSubsampling = 1,
  useApproxNeighbors = TRUE,
  distanceMetric = "euclidean",
  getDoubletNeighborParents = FALSE,
  minCounts = 3,
  minCells = 3L,
  minGeneVariabilityPctl = 85,
  logTransform = FALSE,
  meanCenter = TRUE,
  normalizeVariance = TRUE,
  nPrinComps = 30L,
  tsneAngle = NULL,
  tsnePerplexity = NULL,
  verbose = TRUE,
  seed = 12345
)

Arguments

`inSCE`	A SingleCellExperiment object.
`sample`	Character vector or colData variable name. Indicates which sample each cell belongs to. Default `NULL`.
`useAssay`	A string specifying which assay in the SCE to use. Default `"counts"`.
`simDoubletRatio`	Numeric. Number of doublets to simulate relative to the number of observed transcriptomes. Default `2.0`.
`nNeighbors`	Integer. Number of neighbors used to construct the KNN graph of observed transcriptomes and simulated doublets. If `NULL`, this is set to `round(0.5 * sqrt(n_cells))`. Default `NULL`.
`minDist`	Float Determines how tightly UMAP packs points together. If `NULL`, this is set to `0.1`. Default `NULL`.
`expectedDoubletRate`	The estimated doublet rate for the experiment. Default `0.1`.
`stdevDoubletRate`	Uncertainty in the expected doublet rate. Default `0.02`.
`syntheticDoubletUmiSubsampling`	Numeric. Rate for sampling UMIs when creating synthetic doublets. If `1.0`, each doublet is created by simply adding the UMIs from two randomly sampled observed transcriptomes. For values less than 1, the UMI counts are added and then randomly sampled at the specified rate. Defuault `1.0`.
`useApproxNeighbors`	Boolean. Use approximate nearest neighbor method (annoy) for the KNN classifier. Default `TRUE`.
`distanceMetric`	Character. Distance metric used when finding nearest neighbors. See detail. Default `"euclidean"`.
`getDoubletNeighborParents`	Boolean. If `TRUE`, return the parent transcriptomes that generated the doublet neighbors of each observed transcriptome. This information can be used to infer the cell states that generated a given doublet state. Default `FALSE`.
`minCounts`	Numeric. Used for gene filtering prior to PCA. Genes expressed at fewer than `minCounts` in fewer than `minCells` are excluded. Default `3`.
`minCells`	Integer. Used for gene filtering prior to PCA. Genes expressed at fewer than `minCounts` in fewer than `minCells` are excluded. Default `3`.
`minGeneVariabilityPctl`	Numeric. Used for gene filtering prior to PCA. Keep the most highly variable genes (in the top `minGeneVariabilityPctl` percentile), as measured by the v-statistic (Klein et al., Cell 2015). Default `85`.
`logTransform`	Boolean. If `TRUE`, log-transform the counts matrix (log1p(TPM)). `sklearn.decomposition.TruncatedSVD` will be used for dimensionality reduction, unless `meanCenter` is `TRUE`. Default `FALSE`.
`meanCenter`	If `TRUE`, center the data such that each gene has a mean of `0`. `sklearn.decomposition.PCA` will be used for dimensionality reduction. Default `TRUE`.
`normalizeVariance`	Boolean. If `TRUE`, normalize the data such that each gene has a variance of 1. `sklearn.decomposition.TruncatedSVD` will be used for dimensionality reduction, unless `meanCenter` is `TRUE`. Default `TRUE`.
`nPrinComps`	Integer. Number of principal components used to embed the transcriptomes prior to k-nearest-neighbor graph construction. Default `30`.
`tsneAngle`	Float. Determines angular size of a distant node as measured from a point in the t-SNE plot. If `NULL`, it is set to `0.5`. Default `NULL`.
`tsnePerplexity`	Integer. The number of nearest neighbors that is used in other manifold learning algorithms. If `NULL`, it is set to 30. Default `NULL`.
`verbose`	Boolean. If `TRUE`, print progress updates. Default `TRUE`.
`seed`	Seed for the random number generator, can be set to `NULL`. Default `12345`.

Details

For the list of valid values for distanceMetric, see the documentation for annoy (if useApproxNeighbors is TRUE) or sklearn.neighbors.NearestNeighbors (if useApproxNeighbors is FALSE).

Value

A SingleCellExperiment object with scrub_doublets output appended to the colData slot. The columns include scrublet_score and scrublet_call.

Examples

data(scExample, package = "singleCellTK")
## Not run: 
sce <- subsetSCECols(sce, colData = "type != 'EmptyDroplet'")
sce <- runScrublet(sce)

## End(Not run)

compbiomed/singleCellTK documentation built on Feb. 24, 2025, 2:43 p.m.

compbiomed/singleCellTK index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

compbiomed/singleCellTK
Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

runScrublet: Find doublets using 'scrublet'.
In compbiomed/singleCellTK: Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

Find doublets using `scrublet`.

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to runScrublet in compbiomed/singleCellTK...

R Package Documentation

Browse R Packages

We want your feedback!

compbiomed/singleCellTK Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

runScrublet: Find doublets using 'scrublet'. In compbiomed/singleCellTK: Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

Find doublets using scrublet.

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to runScrublet in compbiomed/singleCellTK...

R Package Documentation

Browse R Packages

We want your feedback!

compbiomed/singleCellTK
Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

runScrublet: Find doublets using 'scrublet'.
In compbiomed/singleCellTK: Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

Find doublets using `scrublet`.