cleanTagCounts: R Documentation
Remove low-quality libraries from a count matrix where each row is a tag and each column corresponds to a cell-containing barcode.
```r
cleanTagCounts(x, ...)

## S4 method for signature 'ANY'
cleanTagCounts(
  x,
  controls,
  ...,
  ambient = NULL,
  exclusive = NULL,
  sparse.prop = 0.5
)

## S4 method for signature 'SummarizedExperiment'
cleanTagCounts(x, ..., assay.type = "counts")
```
| Argument | Description |
|---|---|
| x | A numeric matrix-like object containing counts for each tag (row) in each cell (column). Alternatively, a SummarizedExperiment containing such a matrix. |
| ... | For the generic, further arguments to pass to individual methods. For the SummarizedExperiment method, further arguments to pass to the ANY method. For the ANY method, further arguments to pass to isOutlier. |
| controls | A vector specifying the rows of x that correspond to control tags. |
| ambient | A numeric vector of length equal to the number of rows of x, containing the relative abundance of each tag in the ambient solution. |
| exclusive | A character vector of names of mutually exclusive tags that should never be expressed on the same cell. Alternatively, a list of vectors of mutually exclusive sets of tags; see ?ambientContribNegative for details. |
| sparse.prop | Numeric scalar specifying the minimum proportion of tags that should be present per cell. |
| assay.type | Integer or string specifying the assay containing the count matrix. |
We remove cells for which there is no detectable ambient contamination.
Specifically, we expect non-zero counts for most tags due to the deeply sequenced nature of tag-based data.
If the proportion of tags with zero counts in a cell is equal to or greater than sparse.prop, this is indicative of a failure in library preparation for that cell.
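The zero-count rule above can be sketched in base R. This is a minimal illustration, assuming the test is a per-cell proportion of zero-count tags compared against sparse.prop with "equal to or greater than"; the helper name flagZeroAmbient is hypothetical and is not part of the package.

```r
# Hypothetical helper: flag cells (columns) in which the proportion of tags
# with zero counts is at least sparse.prop, following the rule described above.
flagZeroAmbient <- function(x, sparse.prop = 0.5) {
    x <- as.matrix(x)
    zero.prop <- colMeans(x == 0)  # fraction of tags with zero counts in each cell
    zero.prop >= sparse.prop
}

# Three tags by four cells: the first cell has no counts at all and the
# second cell has zero counts for two of its three tags.
counts <- cbind(c(0, 0, 0), c(5, 0, 0), c(12, 3, 7), c(0, 9, 4))
flagZeroAmbient(counts)
# [1]  TRUE  TRUE FALSE FALSE
```

Because tag-based libraries are deeply sequenced, a healthy cell should pick up at least some ambient counts for most tags, so widespread zeros point to a failed library rather than a clean one.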
We also remove cells for which the total control count is unusually high.
The control coverage is used as a proxy for non-specific binding, most notably from contamination of droplets by protein aggregates.
High levels of non-specific activity are undesirable as they mask the actual marker profile of affected cells.
The upper threshold is defined with isOutlier on the log-transformed total control count.
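The idea of an upper threshold on the log-total control count can be sketched with a median/MAD rule in base R. This is an approximation for illustration, not the isOutlier implementation: the helper name flagHighLog is hypothetical, and the cut-off of three MADs above the median is an assumption chosen for the example.

```r
# Hypothetical helper approximating an isOutlier()-style upper threshold:
# flag values lying more than nmads MADs above the median on the log scale.
flagHighLog <- function(values, nmads = 3) {
    lv <- log(values)
    ok <- is.finite(lv)                   # zero totals give -Inf; keep them out of the fit
    med <- median(lv[ok])
    spread <- mad(lv[ok], center = med)   # base R mad() is scaled by 1.4826 by default
    lv > med + nmads * spread             # -Inf values are never flagged as high
}

# Made-up control totals: most cells sit near 100, one cell is contaminated.
sum.controls <- c(95, 100, 105, 98, 102, 110, 90, 5000)
high.controls <- flagHighLog(sum.controls)
which(high.controls)
# [1] 8
```

Working on the log scale makes the threshold multiplicative, so the choice of logarithm base does not change which cells are flagged.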
If controls is missing, we instead compute the ambient scaling factor for each cell.
This represents the amount of ambient contamination (see ?ambientContribSparse for more details), and cells with unusually high values are assumed to be affected by protein aggregates.
High outliers are again identified and removed based on the log-transformed ambient scaling factor.
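To make the notion of a per-cell ambient scaling factor concrete, the sketch below uses a deliberately crude stand-in: for each cell, the median across tags of the observed count divided by the ambient profile. This median-ratio rule is an assumption for illustration only and is not the estimator used by ambientContribSparse; the helper name, the ambient profile and the counts are all hypothetical.

```r
# Hypothetical stand-in for an ambient scaling factor: for each cell, the
# median ratio of observed counts to the ambient profile. If most tags in a
# cell are present only through contamination, this ratio tracks how much
# ambient solution the droplet captured.
ambientScaleSketch <- function(x, ambient) {
    x <- as.matrix(x)
    ratios <- x / ambient          # ambient is recycled down each column (one value per tag)
    apply(ratios, 2, median)
}

ambient <- c(0.2, 0.3, 0.5)        # assumed relative abundance of each tag in the ambient pool
counts <- cbind(
    c(2, 3, 50),                   # cell 1: tag 3 genuinely expressed
    c(60, 4, 6),                   # cell 2: tag 1 genuinely expressed
    c(1, 40, 4),                   # cell 3: tag 2 genuinely expressed
    c(3, 3, 7),                    # cell 4: ambient only
    c(200, 300, 500)               # cell 5: heavy contamination, e.g. protein aggregates
)
amb.scale <- ambientScaleSketch(counts, ambient)

# Flag high outliers on the log scale (three MADs is an illustrative choice).
log.scale <- log(amb.scale)
high.ambient <- log.scale > median(log.scale) + 3 * mad(log.scale)
which(high.ambient)
# [1] 5
```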
If controls is missing and exclusive is specified, the ambient scaling factor is computed by ambientContribNegative instead.
This can be helpful for explicitly removing cells with impossible marker combinations, though it is only as comprehensive as our knowledge of the mutually exclusive marker sets.
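The reasoning behind mutually exclusive tags can be illustrated with another simplified stand-in, which is not the algorithm used by ambientContribNegative. If at most one tag in an exclusive set can be genuinely expressed in a real cell, the smallest ambient-normalized count within the set should reflect ambient contamination alone; a cell with high counts for every tag in the set (an impossible combination) therefore receives a large value. The tag names, ambient profile and helper name below are hypothetical.

```r
# Hypothetical stand-in: per-cell minimum of count / ambient across a set of
# mutually exclusive tags. At most one tag in the set should be truly
# expressed, so the minimum ratio should only reflect ambient contamination.
exclusiveScaleSketch <- function(x, ambient, exclusive) {
    x <- as.matrix(x)
    ratios <- x[exclusive, , drop = FALSE] / ambient[exclusive]
    apply(ratios, 2, min)
}

# Two mutually exclusive lineage markers (T cell vs B cell) plus one unrelated tag.
counts <- rbind(
    CD3  = c(500, 4, 6, 450),
    CD19 = c(5, 400, 7, 420),
    CD45 = c(300, 350, 280, 310)
)
ambient <- c(CD3 = 0.25, CD19 = 0.25, CD45 = 0.5)
exclusiveScaleSketch(counts, ambient, exclusive = c("CD3", "CD19"))
# [1]   20   16   24 1680
```

The fourth cell is high for both CD3 and CD19 and stands out clearly. Cells with impossible combinations that involve markers outside the supplied sets would not be caught, which is the limitation noted above.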
A DataFrame with one row per column of x, containing the following fields:

- zero.ambient, a logical field indicating whether each cell has zero ambient contamination.
- sum.controls, a numeric field containing the sum of counts for all control features. Only present if controls is supplied.
- high.controls, a logical field indicating whether each cell has an unusually high control total. Only present if controls is supplied.
- ambient.scale, a numeric field specifying the relative amount of ambient contamination. Only present if controls is not supplied.
- high.ambient, a logical field indicating whether each cell has unusually high ambient contamination. Only present if controls is not supplied.
- discard, a logical field indicating whether a column in x should be discarded.
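A cell is removed if it fails either of the checks described in the Details, so discard can be read as the logical OR of zero.ambient and high.controls (or high.ambient when controls is not supplied). The sketch below shows that bookkeeping and the subsequent filtering in base R; it assumes the OR relationship, uses a plain data.frame in place of the DataFrame that the function returns, and the flag values are made up for illustration.

```r
# Illustrative flags for five cells (made-up values, not real output).
zero.ambient  <- c(TRUE, FALSE, FALSE, FALSE, FALSE)
high.controls <- c(FALSE, FALSE, TRUE, FALSE, FALSE)

# A cell is discarded if it fails either check.
df <- data.frame(
    zero.ambient = zero.ambient,
    high.controls = high.controls,
    discard = zero.ambient | high.controls
)

# Keep only the columns (cells) of the count matrix that pass both checks.
counts <- matrix(rpois(3 * 5, lambda = 20), nrow = 3)
filtered <- counts[, !df$discard, drop = FALSE]
ncol(filtered)
# [1] 3
```

With the real function, the same subsetting applies directly to its output, e.g. x[, !df$discard] after df <- cleanTagCounts(x).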
Author: Aaron Lun
See also:

- ambientContribSparse, to estimate the ambient contamination for each droplet.
- isOutlier, to identify outliers in a distribution of values.
```r
library(DropletUtils)

# Simulating counts for three tags (rows) across 1000 cells (columns).
x <- rbind(
    rpois(1000, rep(c(100, 10), c(100, 900))),
    rpois(1000, rep(c(20, 100, 20), c(100, 100, 800))),
    rpois(1000, rep(c(30, 100, 30), c(200, 700, 100)))
)

# Adding a zero-ambient column plus a high-ambient column.
x <- cbind(0, x, 1000)

df <- cleanTagCounts(x)
df
```