findTopCorrelations | R Documentation |
For each feature, find the subset of other features in the same or another modality that have the strongest positive/negative Spearman's rank correlations in a pair of normalized expression matrices.
findTopCorrelations(x, number, ...)
## S4 method for signature 'ANY'
findTopCorrelations(
x,
number = 10,
y = NULL,
d = 50,
direction = c("both", "positive", "negative"),
subset.cols = NULL,
block = NULL,
equiweight = TRUE,
use.names = TRUE,
deferred = TRUE,
BSPARAM = IrlbaParam(),
BNPARAM = KmknnParam(),
BPPARAM = SerialParam()
)
## S4 method for signature 'SummarizedExperiment'
findTopCorrelations(
x,
number,
y = NULL,
use.names = TRUE,
...,
assay.type = "logcounts"
)
x , y |
Normalized expression matrices containing features in the rows and cells in the columns. Each matrix should have the same set of columns but a different set of features, usually corresponding to different modes for the same cells. Alternatively, SummarizedExperiment objects containing such a matrix. Finally, y may be NULL, in which case correlations are computed between features in x. |
number |
Integer scalar specifying the number of top correlated features to report for each feature in x. |
... |
For the generic, further arguments to pass to specific methods. For the SummarizedExperiment method, further arguments to pass to the ANY method. |
d |
Integer scalar specifying the number of dimensions to use for the approximate search via PCA.
If |
direction |
String specifying the sign of the correlations to search for. |
subset.cols |
Vector indicating the columns of x (and y) to use for computing correlations. |
block |
A vector or factor of length equal to the number of cells, specifying the block of origin for each cell. |
equiweight |
Logical scalar indicating whether each block should be given equal weight, if block is specified. |
use.names |
Logical scalar specifying whether row names of For the SummarizedExperiment method, this may also be a string specifying the |
deferred |
Logical scalar indicating whether a fast deferred calculation should be used for the rank-based PCA. |
BSPARAM |
A BiocSingularParam object specifying the algorithm to use for the PCA. |
BNPARAM |
A BiocNeighborParam object specifying the algorithm to use for the neighbor search. |
BPPARAM |
A BiocParallelParam object specifying the parallelization scheme to use. |
assay.type |
String or integer scalar specifying the assay containing the matrix of interest in x (and in y, if it is also a SummarizedExperiment). |
In most cases, we only care about the top-correlated features, allowing us to skip a lot of unnecessary computation. This is achieved by transforming the problem of finding the largest Spearman correlation into a nearest-neighbor search in rank space. For the sake of speed, we approximate the search by performing PCA to compress the rank values for all features.
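The link between Spearman correlation and distance in rank space can be checked with a few lines of base R. This is only an illustrative sketch, not the function's internal code: the helper to_unit_rank and the simulated vectors a and b are hypothetical. After each feature's ranks are centered and scaled to unit length, the squared Euclidean distance between two features equals 2 - 2 * rho, so the closest neighbors are the most positively correlated features.

```r
# Illustrative sketch only; 'to_unit_rank', 'a' and 'b' are hypothetical names.
set.seed(1)
n <- 50
a <- rnorm(n)
b <- a + rnorm(n)

# Convert a feature to ranks, then center and scale to unit length.
to_unit_rank <- function(v) {
    r <- rank(v)
    r <- r - mean(r)
    r / sqrt(sum(r^2))
}
ra <- to_unit_rank(a)
rb <- to_unit_rank(b)

# Squared Euclidean distance between the two scaled rank vectors.
d2 <- sum((ra - rb)^2)

# The distance determines the correlation: rho = 1 - d2 / 2.
rho_from_dist <- 1 - d2 / 2
rho_direct <- cor(a, b, method = "spearman")
all.equal(rho_from_dist, rho_direct)
```

By the same identity, the distance between the negated vector -ra and rb is 2 + 2 * rho, so the most negative correlations could be found by searching with negated rank vectors.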
For each direction, we compute the one-sided p-value for each feature using the approximate method implemented in cor.test.
The FDR correction is performed by considering all possible pairs of features, as these are implicitly tested in the neighbor search.
Note that this is somewhat conservative as it does not consider strong correlations outside the reported features.
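A rough sketch of this calculation in base R follows. It assumes that the approximate method corresponds to cor.test with exact=FALSE and that the adjustment is Benjamini-Hochberg; neither detail is confirmed by the text above. The matrices, the helper p_up and the reported pairs are all hypothetical. The key point is the n argument of p.adjust, which sets the number of tests to the count of all possible pairs rather than the number of reported pairs.

```r
# Illustrative sketch only; all object names here are hypothetical.
set.seed(2)
ncells <- 40
x <- matrix(rnorm(5 * ncells), nrow = 5)    # 5 features in 'x'
y <- matrix(rnorm(8 * ncells), nrow = 8)    # 8 features in 'y'
y[1, ] <- x[1, ] + rnorm(ncells, sd = 0.2)  # plant one strong positive pair

# One-sided (positive direction) approximate p-value for a single pair.
p_up <- function(i, j) {
    cor.test(x[i, ], y[j, ], method = "spearman", exact = FALSE,
        alternative = "greater")$p.value
}

# Pretend the neighbor search reported only these three pairs.
reported <- data.frame(feature1 = c(1, 2, 3), feature2 = c(1, 4, 7))
reported$p.value <- mapply(p_up, reported$feature1, reported$feature2)

# Adjust as if every possible pair had been tested (assumes BH).
total_pairs <- nrow(x) * nrow(y)
reported$FDR <- p.adjust(reported$p.value, method = "BH", n = total_pairs)
reported
```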
If block is specified, correlations are computed separately for each block of cells.
For each feature pair, the reported rho is set to the average of the correlations across all blocks.
Similarly, the p-value corresponding to each correlation is computed separately for each block and then combined across blocks with Stouffer's method.
If equiweight=FALSE, the average correlation and each per-block p-value are weighted by the number of cells.
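The blocking scheme can be sketched for a single feature pair in base R. This is a minimal illustration under stated assumptions, not the function's own implementation: the simulated data, the helper stouffer and the exact form of the weights (taken here as directly proportional to the number of cells) are hypothetical.

```r
# Illustrative sketch only; names and the weighting form are assumptions.
set.seed(3)
block <- rep(c("A", "B"), c(30, 60))
a <- rnorm(length(block))
b <- 0.5 * a + rnorm(length(block))

# Per-block Spearman correlation and one-sided approximate p-value.
per_block <- lapply(split(seq_along(block), block), function(idx) {
    test <- cor.test(a[idx], b[idx], method = "spearman", exact = FALSE,
        alternative = "greater")
    list(rho = unname(test$estimate), p = test$p.value, n = length(idx))
})
rho <- vapply(per_block, function(z) z$rho, numeric(1))
p <- vapply(per_block, function(z) z$p, numeric(1))
n <- vapply(per_block, function(z) z$n, numeric(1))

# Weighted Stouffer's Z-score method for combining one-sided p-values.
stouffer <- function(p, w) {
    z <- qnorm(p, lower.tail = FALSE)
    pnorm(sum(w * z) / sqrt(sum(w^2)), lower.tail = FALSE)
}

# Equal weight per block, as with equiweight=TRUE.
w_equal <- rep(1, length(n))
rho_equal <- weighted.mean(rho, w_equal)
p_equal <- stouffer(p, w_equal)

# Weight by the number of cells, as with equiweight=FALSE.
rho_weighted <- weighted.mean(rho, n)
p_weighted <- stouffer(p, n)
```

With equal weights the combined rho is the plain mean of the per-block values; with cell-count weights the larger block B contributes twice as much as block A in this example.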
We only consider pairs of features that have computable correlations in at least one block.
Blocks are ignored if one or the other feature has tied values (typically zeros) for all cells in that block.
This means that a feature may not have any entries in feature1 if it forms no valid pairs, e.g., because it is not expressed.
Similarly, the total number of rows may be less than the maximum if insufficient valid pairs are available.
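The effect of tied values can be seen directly in base R, where the Spearman correlation is undefined (NA) when one feature is constant within a block. The sketch below uses hypothetical vectors f1 and f2 and simply drops the blocks with no computable correlation before averaging; it illustrates the idea rather than reproducing the function's own code.

```r
# Illustrative sketch only; 'f1', 'f2' and 'block' are hypothetical.
block <- rep(c("A", "B"), each = 10)
f1 <- c(rep(0, 10), 1:10)   # all values tied (zero) in block A
f2 <- c(10:1, (1:10)^2)

# cor() returns NA (with a warning) when one input has zero variance.
rho_by_block <- vapply(split(seq_along(block), block), function(idx) {
    suppressWarnings(cor(f1[idx], f2[idx], method = "spearman"))
}, numeric(1))

# Only blocks with a computable correlation contribute to the average.
valid <- !is.na(rho_by_block)
rho_avg <- mean(rho_by_block[valid])
```

If no block were valid for a pair, the pair would have no computable correlation at all, which is why some features may be absent from the output.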
A List containing one or two DataFrames for results in each direction.
These are named "positive" and "negative", and are generated according to direction; if direction="both", both DataFrames will be present.
Each DataFrame has up to nrow(x) * number rows, containing the top number correlated features for each feature in x.
Each DataFrame contains the following fields:
feature1, the name (character) or row index (integer) of each feature in x. Not all features may be reported here, see Details.
feature2, the name (character) or row index (integer) of one of the top correlated features to feature1. This is another feature in x if y=NULL, otherwise it is a feature in y.
rho, the Spearman rank correlation for the current pair of feature1 and feature2.
p.value, the approximate p-value associated with rho under the null hypothesis that the correlation is zero.
FDR, the adjusted p-value.
The rows are sorted by feature1 and then p.value.
Aaron Lun
computeCorrelations, to compute correlations for all pairs of features.
library(scran) # provides findTopCorrelations
library(scuttle)
sce1 <- mockSCE()
sce1 <- logNormCounts(sce1)
sce2 <- mockSCE(ngenes=20) # pretend this is CITE-seq data, or something.
sce2 <- logNormCounts(sce2)
# Top 20 correlated features in 'sce2' for each feature in 'sce1':
df <- findTopCorrelations(sce1, y=sce2, number=20)
df