testDA_voom | R Documentation |
Calculate tests for differential abundance of cell populations using method 'diffcyt-DA-voom'
testDA_voom(
d_counts,
design,
contrast,
block_id = NULL,
min_cells = 3,
min_samples = NULL,
normalize = FALSE,
norm_factors = "TMM",
plot = FALSE,
path = "."
)
d_counts |
|
design |
Design matrix, created with |
contrast |
Contrast matrix, created with |
block_id |
(Optional) Vector or factor of block IDs (e.g. patient IDs) for paired
experimental designs, to be included as random effects. If provided, the block IDs
will be included as random effects using the |
min_cells |
Filtering parameter. Default = 3. Clusters are kept for differential
testing if they have at least |
min_samples |
Filtering parameter. Default = |
normalize |
Whether to include optional normalization factors to adjust for composition effects (see details). Default = FALSE. |
norm_factors |
Normalization factors to use, if |
plot |
Whether to save diagnostic plots for the |
path |
Path for diagnostic plots, if |
Calculates tests for differential abundance of clusters, using functions from the
limma
package and voom
method.
This method uses the limma
package (Ritchie et al. 2015, Nucleic
Acids Research) to fit models and calculate moderated tests at the cluster level.
Moderated tests improve statistical power by sharing information on variability (i.e.
variance across samples for a single cluster) between clusters. Since count data are
often heteroscedastic, we use the voom
method (Law et al. 2014,
Genome Biology) to transform the raw cluster cell counts and estimate
observation-level weights to stabilize the mean-variance relationship. Diagnostic plots
are shown if plot = TRUE
.
The experimental design must be specified using a design matrix, which can be created
with createDesignMatrix
. Flexible experimental designs are possible,
including blocking (e.g. paired designs), batch effects, and continuous covariates. See
createDesignMatrix
for more details.
For paired designs, either fixed effects or random effects can be used. Fixed effects
are simpler, but random effects may improve power in data sets with unbalanced designs
or very large numbers of samples. To use fixed effects, provide the block IDs (e.g.
patient IDs) to createDesignMatrix
. To use random effects, provide the
block_id
argument here instead. This will make use of the limma
duplicateCorrelation
methodology. Note that >2 measures per sample are
not possible in this case (fixed effects should be used instead). Block IDs should not
be included in the design matrix if the limma
duplicateCorrelation
methodology is used.
The contrast matrix specifying the contrast of interest can be created with
createContrast
. See createContrast
for more details.
Filtering: Clusters are kept for differential testing if they have at least
min_cells
cells in at least min_samples
samples. This removes clusters
with very low cell counts across conditions, to improve power.
Normalization for the total number of cells per sample (library sizes) and total number
of cells per cluster is automatically performed by the limma
and voom
functions. Optional normalization factors can also be included to adjust for
composition effects in the cluster cell counts per sample. For example, in an extreme
case, if several additional clusters are present in only one condition, while all other
clusters are approximately equally abundant between conditions, then simply normalizing
by the total number of cells per sample will create a false positive differential
abundance signal for the non-differential clusters. (For a detailed explanation in the
context of RNA sequencing gene expression, see Robinson and Oshlack, 2010.)
Normalization factors can be calculated automatically using the 'trimmed mean of
M-values' (TMM) method (Robinson and Oshlack, 2010), implemented in the edgeR
package (see also the edgeR
User's Guide for details). Alternatively, a vector
of values can be provided (the values should multiply to 1).
Returns a new SummarizedExperiment
object, with differential test
results stored in the rowData
slot. Results include raw p-values
(p_val
) and adjusted p-values (p_adj
) from the limma
moderated
tests, which can be used to rank clusters by evidence for differential abundance.
Additional output columns from the limma
tests are also included. The results
can be accessed with the rowData
accessor function.
# For a complete workflow example demonstrating each step in the 'diffcyt' pipeline,
# see the package vignette.
# Function to create random data (one sample)
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
d
}
# Create random data (without differential signal)
set.seed(123)
d_input <- list(
sample1 = d_random(),
sample2 = d_random(),
sample3 = d_random(),
sample4 = d_random()
)
# Add differential abundance (DA) signal
ix_DA <- 801:900
ix_cols_type <- 1:10
d_input[[3]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)
d_input[[4]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)
experiment_info <- data.frame(
sample_id = factor(paste0("sample", 1:4)),
group_id = factor(c("group1", "group1", "group2", "group2")),
stringsAsFactors = FALSE
)
marker_info <- data.frame(
channel_name = paste0("channel", sprintf("%03d", 1:20)),
marker_name = paste0("marker", sprintf("%02d", 1:20)),
marker_class = factor(c(rep("type", 10), rep("state", 10)),
levels = c("type", "state", "none")),
stringsAsFactors = FALSE
)
# Prepare data
d_se <- prepareData(d_input, experiment_info, marker_info)
# Transform data
d_se <- transformData(d_se)
# Generate clusters
d_se <- generateClusters(d_se)
# Calculate counts
d_counts <- calcCounts(d_se)
# Create design matrix
design <- createDesignMatrix(experiment_info, cols_design = "group_id")
# Create contrast matrix
contrast <- createContrast(c(0, 1))
# Test for differential abundance (DA) of clusters
res_DA <- testDA_voom(d_counts, design, contrast)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.