remove_redundancy | R Documentation |
remove_redundancy() takes as input a 'tbl' (with at least three columns for sample, feature and transcript abundance) or a 'SummarizedExperiment' (more convenient if abstracted to a tibble with library(tidySummarizedExperiment)) for the correlation method, or a table that includes the columns | <DIMENSION 1> | <DIMENSION 2> | <...> | for the reduced_dimensions method. It returns an object consistent with the input, with redundant elements (e.g., samples) dropped.
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column,
Dim_b_column,
log_transform = NULL
)
## S4 method for signature 'spec_tbl_df'
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column = NULL,
Dim_b_column = NULL,
log_transform = NULL
)
## S4 method for signature 'tbl_df'
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column = NULL,
Dim_b_column = NULL,
log_transform = NULL
)
## S4 method for signature 'tidybulk'
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column = NULL,
Dim_b_column = NULL,
log_transform = NULL
)
## S4 method for signature 'SummarizedExperiment'
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column = NULL,
Dim_b_column = NULL,
log_transform = NULL
)
## S4 method for signature 'RangedSummarizedExperiment'
remove_redundancy(
.data,
.element = NULL,
.feature = NULL,
.abundance = NULL,
method,
of_samples = TRUE,
correlation_threshold = 0.9,
top = Inf,
transform = identity,
Dim_a_column = NULL,
Dim_b_column = NULL,
log_transform = NULL
)
.data |
A 'tbl' (with at least three columns for sample, feature and transcript abundance) or 'SummarizedExperiment' (more convenient if abstracted to tibble with library(tidySummarizedExperiment)) |
.element |
The name of the element column (normally samples). |
.feature |
The name of the feature column (normally transcripts/genes) |
.abundance |
The name of the column including the numerical value the clustering is based on (normally transcript abundance) |
method |
A character string. The method to use: "correlation" or "reduced_dimensions". The latter eliminates one sample from each of the most proximal pairs of samples in PCA reduced dimensions. |
of_samples |
A boolean. If the input is a tidybulk object, it indicates whether the element column is the sample column or the transcript column. |
correlation_threshold |
A real number between 0 and 1. For correlation based calculation. |
top |
An integer. How many top genes to select for correlation based method |
transform |
A function that will transform the counts (e.g., log1p for RNA sequencing data). The default shown in the usage above is identity, which applies no transformation. |
Dim_a_column |
A character string. For reduced_dimension based calculation. The column of one principal component |
Dim_b_column |
A character string. For reduced_dimension based calculation. The column of another principal component |
log_transform |
DEPRECATED - A boolean, whether the value should be log-transformed (e.g., TRUE for RNA sequencing data) |
Lifecycle: maturing
This function removes redundant elements (e.g., samples or transcripts) from the original data set. This is useful, for example, when defining cell-type-specific signatures with low sample redundancy. The function returns a tibble with the redundant elements (e.g., samples) dropped. Two redundancy estimation approaches are supported: (i) removal of highly correlated clusters of elements (keeping a representative) with method="correlation"; (ii) removal of the most proximal element pairs in a reduced dimensional space with method="reduced_dimensions".
Underlying method for correlation: widyr::pairwise_cor(sample, transcript, count, sort = TRUE, diag = FALSE, upper = FALSE)
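The correlation approach can be illustrated without the package. The following is a minimal base R sketch, not the tidybulk implementation: the helper name drop_correlated and the toy matrix are hypothetical, and it simply drops one sample from every pair whose correlation exceeds the threshold.

```r
# Toy abundance matrix: rows are features, columns are samples.
# s2 is a near copy of s1; s3 is unrelated. All values are made up.
set.seed(1)
s1 <- rnorm(50)
counts <- cbind(s1 = s1, s2 = s1 + rnorm(50, sd = 0.01), s3 = rnorm(50))

# Hypothetical helper (not part of tidybulk): drop one sample from each
# pair of samples whose correlation exceeds `threshold`.
drop_correlated <- function(mat, threshold = 0.9) {
  cors <- cor(mat)
  # Keep only the lower triangle so each pair is considered once
  cors[upper.tri(cors, diag = TRUE)] <- NA
  redundant <- which(cors > threshold, arr.ind = TRUE)
  to_drop <- unique(rownames(cors)[redundant[, "row"]])
  mat[, !colnames(mat) %in% to_drop, drop = FALSE]
}

kept <- drop_correlated(counts, threshold = 0.9)
colnames(kept)
```

Lowering the threshold makes the filter stricter, since more pairs then count as redundant.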
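Underlying custom method for reduced dimensions:
select_closest_pairs = function(df) {
  couples <- df |> head(n = 0)
  while (df |> nrow() > 0) {
    # take the closest remaining pair
    pair <- df |> arrange(dist) |> head(n = 1)
    couples <- couples |> bind_rows(pair)
    # the filter condition is truncated in this page; presumably it drops
    # every row that involves either sample of the selected pair
    used <- c(pair$`sample 1`, pair$`sample 2`)
    df <- df |> filter(!`sample 1` %in% used, !`sample 2` %in% used)
  }
  couples
}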
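The pair-selection loop above can be mirrored in base R to see what it returns. This is a sketch under stated assumptions: the function name select_closest_pairs_base and the distance table are hypothetical, and the row-dropping rule (remove every row that involves either sample of the chosen pair) is an assumption about the condition that is cut off in the text.

```r
# Toy table of pairwise distances between samples in reduced-dimension
# space. Column names follow the snippet above; the values are made up.
df <- data.frame(
  `sample 1` = c("a", "a", "a", "b", "b", "c"),
  `sample 2` = c("b", "c", "d", "c", "d", "d"),
  dist       = c(0.1, 2.0, 3.0, 2.5, 3.5, 0.2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical base R counterpart of the loop (not the package's code)
select_closest_pairs_base <- function(df) {
  couples <- df[0, ]
  while (nrow(df) > 0) {
    # Take the closest remaining pair
    pair <- df[which.min(df$dist), ]
    couples <- rbind(couples, pair)
    # Assumed rule: discard all rows involving either sample of that pair
    used <- c(pair$`sample 1`, pair$`sample 2`)
    df <- df[!df$`sample 1` %in% used & !df$`sample 2` %in% used, ]
  }
  couples
}

couples <- select_closest_pairs_base(df)
couples
```

Each sample appears in at most one selected pair, so dropping one member of every pair removes the closest neighbours while keeping a representative of each.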
A tbl object with dropped redundant elements (e.g., samples).
A 'SummarizedExperiment' object
tidybulk::se_mini |>
  identify_abundant() |>
  remove_redundancy(
    .element = sample,
    .feature = transcript,
    .abundance = count,
    method = "correlation"
  )

counts.MDS =
  tidybulk::se_mini |>
  identify_abundant() |>
  reduce_dimensions(method = "MDS", .dims = 3)

remove_redundancy(
  counts.MDS,
  Dim_a_column = `Dim1`,
  Dim_b_column = `Dim2`,
  .element = sample,
  method = "reduced_dimensions"
)