correct_batch_effects | R Documentation |
Batch correction of normalized data. Batch correction brings each feature in each batch to a comparable shape. The following batch correction functions are currently implemented:
Per-feature median centering: center_feature_batch_medians_df(). Median centering of the features (per batch median).
Correction with ComBat: correct_with_ComBat_df(). Adjusts for discrete batch effects using ComBat. ComBat, described in Johnson et al. 2007, uses either parametric or non-parametric empirical Bayes frameworks to adjust data for batch effects. It returns an expression matrix that has been corrected for batch effects. The input data are assumed to be free of missing values and normalized before batch effect removal. Note that missing values are common in proteomics, which is why in some cases corrections such as center_feature_batch_medians_df() are more appropriate.
Continuous drift correction: adjust_batch_trend_df(). Adjusts the batch signal trend with a custom (continuous) fit. Should be followed by a discrete correction, e.g. center_feature_batch_medians_df() or correct_with_ComBat_df().
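The idea behind per-feature median centering can be illustrated in base R without the package. This is only a minimal sketch, not the package's implementation: the toy values are invented, the column names simply reuse the defaults from the signatures, and re-centering each feature on its overall median is an assumption made here so that values keep their original scale.

```r
# Toy long-format data: 2 features x 2 batches x 3 runs per batch (invented).
toy_df <- data.frame(
  peptide_group_label = rep(c("pepA", "pepB"), each = 6),
  FullRunName = rep(paste0("run", 1:6), times = 2),
  MS_batch = rep(rep(c("Batch_1", "Batch_2"), each = 3), times = 2),
  Intensity = c(10, 11, 12, 20, 21, 22,
                5, 5, 8, 7, 9, 11),
  stringsAsFactors = FALSE
)

# Median of each feature within each batch, aligned to the rows of toy_df.
batch_median <- ave(toy_df$Intensity,
                    toy_df$peptide_group_label, toy_df$MS_batch,
                    FUN = median)

# Median of each feature across all runs (assumed re-centering target).
feature_median <- ave(toy_df$Intensity, toy_df$peptide_group_label,
                      FUN = median)

# Keep the original values, then shift every batch onto the feature median.
toy_df$preBatchCorr_Intensity <- toy_df$Intensity
toy_df$Intensity <- toy_df$Intensity - batch_median + feature_median
```

After this shift, every batch of a given feature has the same median, while the spread of values within each batch is left untouched.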
Alternatively, one can call the correction functions through the correct_batch_effects_df() wrapper. The batch correction method allows correction of continuous signal drift within a batch (if required) and adjustment for discrete differences across batches.
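The two-step logic (continuous drift first, then a discrete adjustment) can be sketched in base R for a single feature. This is an illustration under stated assumptions rather than the package's code: the data are simulated, remove_drift() is a hypothetical helper, and leaving batches with fewer than min_measurements runs unchanged is an assumption of this sketch.

```r
set.seed(1)

# One feature measured in two batches of 20 runs each (simulated values).
toy <- data.frame(
  order = 1:40,
  MS_batch = rep(c("Batch_1", "Batch_2"), each = 20),
  stringsAsFactors = FALSE
)
position_in_batch <- ave(toy$order, toy$MS_batch, FUN = seq_along)
# Downward drift within each batch plus a constant offset for Batch_2.
toy$Intensity <- 20 - 0.1 * position_in_batch +
  ifelse(toy$MS_batch == "Batch_2", 2, 0) + rnorm(40, sd = 0.1)

# Hypothetical helper: subtract a loess fit from one batch, keeping its median.
remove_drift <- function(d, span = 0.7, min_measurements = 8) {
  if (nrow(d) < min_measurements) return(d$Intensity)  # too few points to fit
  fit <- loess(Intensity ~ order, data = d, span = span)
  d$Intensity - predict(fit) + median(d$Intensity)
}

# Step 1 (continuous): remove the within-batch trend, batch by batch.
toy$detrended <- NA_real_
for (b in unique(toy$MS_batch)) {
  idx <- toy$MS_batch == b
  toy$detrended[idx] <- remove_drift(toy[idx, ])
}

# Step 2 (discrete): align batch medians on the overall median.
batch_median <- ave(toy$detrended, toy$MS_batch, FUN = median)
toy$corrected <- toy$detrended - batch_median + median(toy$detrended)
```

Running the discrete step second matters in this sketch: the drift is removed first so that the batch median used for alignment is no longer influenced by the trend.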
center_feature_batch_medians_df(df_long, sample_annotation = NULL,
sample_id_col = "FullRunName", batch_col = "MS_batch",
feature_id_col = "peptide_group_label", measure_col = "Intensity",
keep_all = "default", no_fit_imputed = TRUE, qual_col = NULL,
qual_value = NULL)
center_feature_batch_medians_dm(data_matrix, sample_annotation,
sample_id_col = "FullRunName", batch_col = "MS_batch",
feature_id_col = "peptide_group_label", measure_col = "Intensity")
center_feature_batch_means_df(df_long, sample_annotation = NULL,
sample_id_col = "FullRunName", batch_col = "MS_batch",
feature_id_col = "peptide_group_label", measure_col = "Intensity",
keep_all = "default", no_fit_imputed = TRUE, qual_col = NULL,
qual_value = NULL)
center_feature_batch_means_dm(data_matrix, sample_annotation,
sample_id_col = "FullRunName", batch_col = "MS_batch",
feature_id_col = "peptide_group_label", measure_col = "Intensity")
adjust_batch_trend_df(df_long, sample_annotation = NULL,
batch_col = "MS_batch", feature_id_col = "peptide_group_label",
sample_id_col = "FullRunName", measure_col = "Intensity",
order_col = "order", keep_all = "default",
fit_func = "loess_regression", no_fit_imputed = TRUE,
qual_col = NULL, qual_value = NULL, min_measurements = 8, ...)
adjust_batch_trend_dm(data_matrix, sample_annotation,
batch_col = "MS_batch", feature_id_col = "peptide_group_label",
sample_id_col = "FullRunName", measure_col = "Intensity",
order_col = "order", fit_func = "loess_regression",
return_fit_df = TRUE, min_measurements = 8, ...)
correct_with_ComBat_df(df_long, sample_annotation = NULL,
feature_id_col = "peptide_group_label", measure_col = "Intensity",
sample_id_col = "FullRunName", batch_col = "MS_batch",
par.prior = TRUE, no_fit_imputed = TRUE, qual_col = NULL,
qual_value = NULL, keep_all = "default")
correct_with_ComBat_dm(data_matrix, sample_annotation = NULL,
feature_id_col = "peptide_group_label", measure_col = "Intensity",
sample_id_col = "FullRunName", batch_col = "MS_batch",
par.prior = TRUE)
correct_batch_effects_df(df_long, sample_annotation,
continuous_func = NULL, discrete_func = c("MedianCentering",
"MeanCentering", "ComBat"), batch_col = "MS_batch",
feature_id_col = "peptide_group_label",
sample_id_col = "FullRunName", measure_col = "Intensity",
order_col = "order", keep_all = "default", no_fit_imputed = TRUE,
qual_col = NULL, qual_value = NULL, min_measurements = 8, ...)
correct_batch_effects_dm(data_matrix, sample_annotation,
continuous_func = NULL, discrete_func = c("MedianCentering",
"ComBat"), batch_col = "MS_batch",
feature_id_col = "peptide_group_label",
sample_id_col = "FullRunName", measure_col = "Intensity",
order_col = "order", min_measurements = 8, ...)
df_long |
data frame where each row is a single feature in a single
sample. It minimally has a |
sample_annotation |
data frame with:
.
See |
sample_id_col |
name of the column in |
batch_col |
column in |
feature_id_col |
name of the column with feature/gene/peptide/protein
ID used in the long format representation |
measure_col |
if |
keep_all |
when transforming the data (normalize, correct), which set of columns to keep; acceptable values: all/default/minimal. |
no_fit_imputed |
(logical) whether to use imputed (requant) values, as flagged in
|
qual_col |
column to color point by certain value denoted
by |
qual_value |
value in |
data_matrix |
features (in rows) vs samples (in columns) matrix, with
feature IDs in rownames and file/sample names as colnames.
See "example_proteome_matrix" for more details (to call the description,
use |
order_col |
column in |
fit_func |
function to fit the (non)-linear trend |
min_measurements |
the number of samples in a batch required for curve fitting. |
... |
other parameters, usually of |
return_fit_df |
(logical) whether to return the |
par.prior |
whether to use a parametric or non-parametric prior |
continuous_func |
function to use for the fit (currently
only |
discrete_func |
function to use for adjustment of discrete batch effects
( |
The data in the same format as the input (data_matrix or df_long). For df_long, the data frame stores the original values of measure_col in another column called "preBatchCorr_[measure_col]", and the batch-corrected values in the measure_col column.
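The column naming convention described above makes it easy to see how much each measurement was adjusted. The following is a small sketch on an invented stand-in data frame (not actual package output); the adjustment column is a hypothetical name introduced only for this illustration.

```r
measure_col <- "Intensity"
# Name of the column holding the original values, following the convention above.
original_col <- paste0("preBatchCorr_", measure_col)

# Invented stand-in for a corrected long-format data frame.
corrected_df <- data.frame(
  peptide_group_label = c("pepA", "pepA", "pepB", "pepB"),
  FullRunName = c("run1", "run2", "run1", "run2"),
  preBatchCorr_Intensity = c(10, 20, 5, 9),
  Intensity = c(15, 15, 7, 7),
  stringsAsFactors = FALSE
)

# Size of the adjustment applied to each measurement (hypothetical column).
corrected_df$adjustment <-
  corrected_df[[measure_col]] - corrected_df[[original_col]]
```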
The function adjust_batch_trend_dm(), if return_fit_df is TRUE, returns a list of two items:
data_matrix
fit_df, used to examine the fitting curves
fit_nonlinear, plot_with_fitting_curve
#Median centering per feature per batch:
median_centered_df <- center_feature_batch_medians_df(
example_proteome, example_sample_annotation)
#Correct with ComBat:
combat_corrected_df <- correct_with_ComBat_df(example_proteome,
example_sample_annotation)
#Adjust the MS signal drift:
test_peptides <- unique(example_proteome$peptide_group_label)[1:3]
test_peptide_filter <- example_proteome$peptide_group_label %in% test_peptides
test_proteome <- example_proteome[test_peptide_filter,]
adjusted_df <- adjust_batch_trend_df(test_proteome,
example_sample_annotation, span = 0.7,
min_measurements = 8)
plot_fit <- plot_with_fitting_curve(unique(adjusted_df$peptide_group_label),
df_long = adjusted_df, measure_col = 'preTrendFit_Intensity',
fit_df = adjusted_df, sample_annotation = example_sample_annotation)
#Correct the data in one go (the _df wrapper returns a long data frame):
batch_corrected_df <- correct_batch_effects_df(example_proteome,
example_sample_annotation,
continuous_func = 'loess_regression',
discrete_func = 'MedianCentering',
batch_col = 'MS_batch',
span = 0.7, min_measurements = 8)