We can use `metflow2` for data normalization and data integration.
First, we need to prepare the data for `metflow2`.
The peak table (csv format) can come from any software. We recommend that you use the `Peak_table_for_cleaning.csv` file produced by the `processData()` function in `metflow2`.
If you use other software, please make sure that the first 3 columns are `name` (peak name), `mz` and `rt` (retention time, in seconds), and that the remaining columns are sample intensities.
![](../man/figures/Screen Shot 2020-04-01 at 1.07.37 PM.png)
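The layout described above can be checked before the table is handed to `metflow2`. The following is only a minimal base R sketch: `check_peak_table()` is a hypothetical helper that is not part of `metflow2`, and the toy values are made up.

```r
# Hypothetical helper (not part of metflow2): check the expected column layout.
check_peak_table <- function(peak_table) {
  required <- c("name", "mz", "rt")
  if (ncol(peak_table) < 4) {
    stop("Expected 3 annotation columns plus at least one sample column.")
  }
  if (!identical(colnames(peak_table)[1:3], required)) {
    stop("The first three columns must be: ", paste(required, collapse = ", "))
  }
  # Everything after the first three columns should be sample intensities.
  intensity <- peak_table[, -(1:3), drop = FALSE]
  if (!all(vapply(intensity, is.numeric, logical(1)))) {
    stop("All sample columns must be numeric intensities.")
  }
  invisible(TRUE)
}

# Toy peak table with made-up values, for illustration only.
toy_peaks <- data.frame(
  name = c("M100T60", "M200T120"),
  mz = c(100.0012, 200.0034),
  rt = c(60, 120),
  QC_1 = c(1500, 32000),
  Sample_1 = c(1700, NA),
  stringsAsFactors = FALSE
)
check_peak_table(toy_peaks)
```

For a real file, the same check could be applied to the result of `read.csv()` on your own peak table.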
We also need the sample information (csv format), which describes each sample. Column 1 is `sample.name`, column 2 is `injection.order`, column 3 is `class` (such as Subject, QC, Blank), column 4 is `batch` and column 5 is `group` (such as control and case).
![](../man/figures/Screen Shot 2020-04-02 at 8.40.02 AM.png)
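As a rough illustration of this layout, the sketch below builds a tiny sample information table in base R and checks it against the column order given above. The sample names, the `group` labels used for QC and Blank samples, and the `peak_table_samples` vector are all assumptions made for the example.

```r
# Toy sample information with made-up entries, for illustration only.
toy_info <- data.frame(
  sample.name = c("QC_1", "Sample_1", "Blank_1"),
  injection.order = c(1, 2, 3),
  class = c("QC", "Subject", "Blank"),
  batch = c(1, 1, 1),
  group = c("QC", "control", "Blank"),  # assumed labels for non-subject samples
  stringsAsFactors = FALSE
)

# Column names and order as described in the text.
expected_cols <- c("sample.name", "injection.order", "class", "batch", "group")
layout_ok <- identical(colnames(toy_info), expected_cols)

# Sample names should be unique.
names_unique <- !any(duplicated(toy_info$sample.name))

# Hypothetical sample columns of the matching peak table.
peak_table_samples <- c("QC_1", "Sample_1", "Blank_1")
all_described <- all(peak_table_samples %in% toy_info$sample.name)
```

If any of the three flags is `FALSE`, the csv file probably needs to be corrected before it is used.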
Then place the peak table and sample information in a folder. We use the demo data from the `demoData` package.
library(metflow2)
library(demoData)
library(tidyverse)

## create a folder named "example"
path <- file.path(".", "example")
dir.create(path = path, showWarnings = FALSE)

## get demo data
demo_data <- system.file("metflow2", package = "demoData")
file.copy(
  from = file.path(demo_data, dir(demo_data)),
  to = path,
  overwrite = TRUE,
  recursive = TRUE
)
Now the two peak tables, `batch1.data.csv` and `batch2.data.csv`, and the sample information, `sample_info.csv`, are in your `./example` folder.
Next, create a `metflowClass` object from the peak tables and the sample information:
object <- create_metflow_object(
  ms1.data = c("batch1.data.csv", "batch2.data.csv"),
  sample.information = "sample_info.csv",
  path = path
)
`object` is a `metflowClass` object, so you can print it in the console.
Because there are two batches of peak tables, we must first align them.
object <- align_batch(
  object = object,
  combine.mz.tol = 15,
  combine.rt.tol = 30,
  use.int.tol = FALSE
)
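To give an intuition for tolerance-based alignment, here is a minimal base R sketch that pairs peaks from two batches when both their m/z and retention time agree within a tolerance. It is not the algorithm used by `align_batch()`: `match_peaks()` is a hypothetical helper, and treating the m/z tolerance as ppm and the retention time tolerance as seconds is an assumption that the text above does not confirm.

```r
# Hypothetical helper (not part of metflow2).
# Assumption: m/z tolerance in ppm, retention time tolerance in seconds.
match_peaks <- function(mz1, rt1, mz2, rt2, mz_tol_ppm = 15, rt_tol_s = 30) {
  matches <- lapply(seq_along(mz1), function(i) {
    mz_error_ppm <- abs(mz2 - mz1[i]) / mz1[i] * 1e6
    rt_error_s <- abs(rt2 - rt1[i])
    hits <- which(mz_error_ppm <= mz_tol_ppm & rt_error_s <= rt_tol_s)
    if (length(hits) == 0) {
      return(NULL)
    }
    data.frame(batch1 = i, batch2 = hits)
  })
  # Drop peaks without a partner, then stack the remaining pairs.
  do.call(rbind, Filter(Negate(is.null), matches))
}

# Two toy batches with made-up peaks.
batch1 <- data.frame(mz = c(114.0660, 200.1000), rt = c(670, 300))
batch2 <- data.frame(mz = c(114.0665, 200.1500), rt = c(655, 305))

matched <- match_peaks(batch1$mz, batch1$rt, batch2$mz, batch2$rt)
matched
```

In this toy data only the first peak of each batch is paired; the second pair is close in retention time but far apart in m/z.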
## filter out low-quality peaks
object2 <- filter_peaks(
  object = object,
  min.fraction = 0.5,
  type = "any",
  min.subject.blank.ratio = 2,
  according.to = "class",
  which.group = "QC"
)
Next, we should remove samples that have a lot of missing values.
object2 <- filter_samples(object = object2, min.fraction.peak = 0.9)
## impute the remaining missing values (k-nearest neighbours)
object2 <- impute_mv(object = object2, method = "knn")
object2
Now we can normalize data using different methods.
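## option 1: mean normalization
object3 <- normalize_data(object = object2, method = "mean")
## option 2: SVR normalization (overwrites the result above; used in the following steps)
object3 <- normalize_data(object = object2, method = "svr", threads = 1)
## option 3 (not run): PQN normalization
# object3 <- normalize_data(object = object2, method = "pqn")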
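For intuition, the simplest of these ideas can be written in a few lines of base R. The sketch below divides every sample by its own mean intensity, which is one common definition of mean normalization; whether `method = "mean"` in `metflow2` does exactly this is an assumption, and the values are made up.

```r
# Toy intensity matrix: rows are peaks, columns are samples (made-up values).
intensity <- matrix(
  c(100, 200, 300,
    200, 400, 600),
  nrow = 3,
  dimnames = list(c("peak1", "peak2", "peak3"), c("sample_A", "sample_B"))
)

# Divide each sample (column) by its own mean intensity.
sample_means <- colMeans(intensity, na.rm = TRUE)
normalized <- sweep(intensity, MARGIN = 2, STATS = sample_means, FUN = "/")
normalized
```

After this step every sample has a mean of 1, so a sample that was simply measured at twice the overall intensity (`sample_B` here) becomes identical to the other.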
After data normalization, you can use the `get_peak_int_distribution()` function to plot the intensity distribution of any peak.
get_peak_int_distribution(object = object3, peak_name = "M114T670", interactive = TRUE)
get_peak_int_distribution(object = object2, peak_name = "M114T670", interactive = TRUE)
Then we can use the `integrate_data()` function to do data integration.
object4 <- integrate_data(object = object3, method = "qc.mean")
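The idea behind QC-based integration can be sketched in base R as follows. This is only one plausible reading of `method = "qc.mean"`, namely scaling each peak in each batch by the mean of the QC samples in that batch; the actual implementation in `metflow2` may differ, and the numbers are made up.

```r
# One peak measured in two batches (made-up values).
peak <- data.frame(
  batch = c(1, 1, 1, 2, 2, 2),
  class = c("QC", "QC", "Subject", "QC", "QC", "Subject"),
  intensity = c(100, 120, 90, 200, 240, 180)
)

# Mean QC intensity of this peak in each batch.
qc <- peak[peak$class == "QC", ]
qc_mean <- tapply(qc$intensity, qc$batch, mean)

# Scale every sample by the QC mean of its own batch.
peak$adjusted <- peak$intensity / as.numeric(qc_mean[as.character(peak$batch)])
peak
```

In this toy example batch 2 was measured at twice the intensity of batch 1; after scaling, the two batches are on the same level.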
We can also get the RSDs of all the peaks before and after data normalization and data integration.
rsd2 <- calculate_rsd(object = object2, slot = "QC")
rsd4 <- calculate_rsd(object = object4, slot = "QC")
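For reference, the relative standard deviation (RSD) of a peak is its standard deviation divided by its mean, usually expressed as a percentage. The base R sketch below computes it per peak across QC samples; `rsd_percent()` is a hypothetical helper, the values are made up, and `calculate_rsd()` may handle details such as missing values differently.

```r
# Hypothetical helper: RSD (%) = 100 * sd / mean, computed per peak (row).
rsd_percent <- function(x) {
  100 * apply(x, 1, sd, na.rm = TRUE) / rowMeans(x, na.rm = TRUE)
}

# Toy QC intensities: rows are peaks, columns are QC injections.
qc_intensity <- matrix(
  c(100, 110, 90,
    500, 500, 500),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(c("peak1", "peak2"), c("QC_1", "QC_2", "QC_3"))
)

rsd <- rsd_percent(qc_intensity)
rsd
```

A lower RSD in the QC samples after normalization and integration suggests that unwanted technical variation has been reduced for that peak.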
Then we can draw the comparison plot:
library(ggplot2)
## rsd.x comes from rsd2 (before normalization), rsd.y from rsd4 (after)
dplyr::left_join(rsd2, rsd4, by = c("index", "name")) %>%
  dplyr::mutate(class = dplyr::case_when(
    rsd.y < rsd.x ~ "Decrease",
    rsd.y > rsd.x ~ "Increase",
    rsd.y == rsd.x ~ "Equal"
  )) %>%
  ggplot(aes(rsd.x, rsd.y, colour = class)) +
  ggsci::scale_color_jama() +
  geom_abline(slope = 1, intercept = 0) +
  geom_point() +
  labs(
    x = "RSD before normalization",
    y = "RSD after normalization and integration"
  ) +
  theme_bw()