
We can use metflow2 for missing value (MV) imputation.

First, we need to prepare the input data for metflow2.

Data preparation

Peak table

The peak table (csv format) can be generated by any software. We recommend using the Peak_table_for_cleaning.csv file generated by the processData() function in metflow2.

If you use other software, make sure that the first three columns are name (peak name), mz, and rt (retention time, in seconds), and that the remaining columns are the sample intensities.

![](../man/figures/Screen Shot 2020-04-01 at 1.07.37 PM.png)
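To make the expected layout concrete, here is a minimal base R sketch: a toy peak table whose peak names, m/z values, retention times, sample names (S1, S2, QC1) and intensities are all made up, plus a hypothetical helper, check_peak_table(), that tests whether the first three columns are name, mz and rt and the remaining columns are numeric. It is an illustration only, not a metflow2 function.

```r
## toy peak table: name, mz, rt (second), then one intensity column per sample
## (every name and value here is made up for illustration)
peak_table <- data.frame(
  name = c("M100T60", "M150T120", "M200T300"),
  mz   = c(100.0757, 150.0550, 200.1281),
  rt   = c(60.1, 120.4, 300.2),
  S1   = c(1520, NA, 8830),
  S2   = c(1610, 240, NA),
  QC1  = c(1580, 260, 9010)
)

## hypothetical helper: TRUE if the first three columns are name, mz and rt
## and all remaining (sample) columns are numeric
check_peak_table <- function(x) {
  identical(colnames(x)[1:3], c("name", "mz", "rt")) &&
    is.numeric(x$mz) && is.numeric(x$rt) &&
    all(vapply(x[-(1:3)], is.numeric, logical(1)))
}

check_peak_table(peak_table)
#> [1] TRUE
```

Missing intensities are simply NA; they are what the later filtering and imputation steps deal with.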

Sample information

The sample information file (csv format) describes each sample in detail. Column 1 is, column 2 is injection.order, column 3 is class (such as Subject, QC, or Blank), column 4 is batch, and column 5 is group (such as control and case).

![](../man/figures/Screen Shot 2020-04-02 at 8.40.02 AM.png)

Read data

Place the peak tables and the sample information file in one folder. Here we use the demo data from the demoData package.


Load demo data

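## load metflow2 (the demo files come from the demoData package)
library(metflow2)
## create a folder named "example"
path <- file.path(".", "example")
dir.create(path = path, showWarnings = FALSE)
## locate the metflow2 demo data shipped with demoData
demo_data <- system.file("metflow2", package = "demoData")
## copy all demo files into ./example
file.copy(from = file.path(demo_data, dir(demo_data)), 
          to = path, overwrite = TRUE, recursive = TRUE)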

Now the two peak tables, and, and sample_info.csv are in your ./example folder.

Create a metflowClass object

object <-
  create_metflow_object( = c("", ""),
    sample.information = "sample_info.csv",
    path = path
  )

object is a metflowClass object, so you can print it in the console.


Align different batches

Because there are two batches of peak tables, we must first align them.

object <- align_batch(
  object = object, = 15,
  combine.rt.tol = 30, = FALSE
)
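Conceptually, aligning batches means deciding which peak in one batch corresponds to which peak in the other. The base R sketch below is a simplified illustration, not metflow2's algorithm: the helper match_peaks() is hypothetical, it assumes that the 15 above is an m/z tolerance in ppm and that combine.rt.tol = 30 is a retention time tolerance in seconds, and when several peaks qualify it picks the one with the smallest rt difference.

```r
## hypothetical helper: for each peak in batch 2, return the row index of the
## matching peak in batch 1, or NA if no peak is within both tolerances
## assumptions: mz.tol is in ppm, rt.tol is in seconds
match_peaks <- function(mz1, rt1, mz2, rt2, mz.tol = 15, rt.tol = 30) {
  vapply(seq_along(mz2), function(i) {
    mz.error <- abs(mz1 - mz2[i]) / mz2[i] * 1e6  # m/z difference in ppm
    rt.error <- abs(rt1 - rt2[i])                 # rt difference in seconds
    hit <- which(mz.error <= mz.tol & rt.error <= rt.tol)
    if (length(hit) == 0) return(NA_integer_)
    ## several candidates: keep the one closest in retention time
    hit[which.min(rt.error[hit])]
  }, integer(1))
}

## toy batches (made-up values)
mz1 <- c(100.0757, 150.0550, 200.1281); rt1 <- c(60, 120, 300)
mz2 <- c(100.0760, 200.1290, 250.0000); rt2 <- c(65, 310, 100)
match_peaks(mz1, rt1, mz2, rt2)
#> [1]  1  3 NA
```

Peaks that find no partner (NA) would either be dropped or kept as batch-specific, depending on how the alignment is designed.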

Missing value processing

First, we remove peaks and samples that have many missing values.

Remove noisy peaks and outlier samples

We use the filter_peaks() function to remove noisy peaks.

object2 <- filter_peaks(
  object = object,
  min.fraction = 0.5,
  type = "any",
  min.subject.blank.ratio = 2, = "class", = "QC"
)

filter_peaks() removes peaks according to three criteria:

To remove peaks according to their NAs in samples, you should specify which groups you want to use. For example, to remove peaks that have more than 50% NAs in QC samples, you can set as class, because the QC group is defined in the class column in; then the should be set as QC, and min.fraction as 0.5.

object2 <- filter_peaks(
  object = object,
  min.fraction = 0.5, = "class", = "QC"
)
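The NA criterion can be sketched in base R on a toy intensity matrix (peaks in rows, samples in columns, made-up values). The helper present_fraction() is hypothetical, and the sketch assumes a peak is kept when at least min.fraction of its values in the chosen group are non-missing, which matches "more than 50% NAs are removed" for min.fraction = 0.5; metflow2's exact boundary handling may differ.

```r
## toy intensity matrix: 3 peaks (rows) x 5 samples (columns), made-up values
int <- matrix(c(10, NA, 12, 11, 13,
                NA, NA, NA,  5,  6,
                20, 21, NA, NA, 22),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("peak", 1:3),
                              c("QC1", "QC2", "QC3", "S1", "S2")))
sample_class <- c("QC", "QC", "QC", "Subject", "Subject")

## hypothetical helper: fraction of non-missing values per peak in one group
present_fraction <- function(x, sample_class, group) {
  rowMeans(!is.na(x[, sample_class == group, drop = FALSE]))
}

## keep peaks with at least 50% non-missing values in QC samples
keep <- present_fraction(int, sample_class, "QC") >= 0.5
int[keep, , drop = FALSE]
```

Here peak2 is removed because it is missing in all three QC samples, even though it is present in both Subject samples.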

To remove peaks that have more than 50% NAs in QC and/or Subject samples, set as the vector c("QC", "Subject"). With type = "all", a peak must meet min.fraction in both QC and Subject samples to be kept; with type = "any", it only needs to meet min.fraction in QC or Subject samples.

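## keep peaks that meet min.fraction in both QC and Subject samples
object2 <- filter_peaks(
  object = object,
  min.fraction = 0.5, = "class", = c("QC", "Subject"),
  type = "all"
)
## keep peaks that meet min.fraction in QC or Subject samples
object2 <- filter_peaks(
  object = object,
  min.fraction = 0.5, = "class", = c("QC", "Subject"),
  type = "any"
)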
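The difference between type = "all" and type = "any" can be sketched by evaluating the criterion once per group and combining the results with & or |. keep_by_groups() is a hypothetical helper that uses the same at-least-min.fraction assumption as above; it illustrates the rule as described here, not metflow2's code.

```r
## toy intensity matrix: 3 peaks (rows) x 5 samples (columns), made-up values
int <- matrix(c(10, NA, 12, 11, 13,
                NA, NA, NA,  5,  6,
                20, 21, NA, NA, 22),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("peak", 1:3),
                              c("QC1", "QC2", "QC3", "S1", "S2")))
sample_class <- c("QC", "QC", "QC", "Subject", "Subject")

## hypothetical helper: a peak passes a group when at least min.fraction of its
## values in that group are non-missing; "all" needs every group to pass,
## "any" needs at least one
keep_by_groups <- function(x, sample_class, groups, min.fraction = 0.5,
                           type = c("all", "any")) {
  type <- match.arg(type)
  per_group <- lapply(groups, function(g) {
    rowMeans(!is.na(x[, sample_class == g, drop = FALSE])) >= min.fraction
  })
  combine <- if (type == "all") `&` else `|`
  Reduce(combine, per_group)
}

keep_by_groups(int, sample_class, c("QC", "Subject"), type = "all")
keep_by_groups(int, sample_class, c("QC", "Subject"), type = "any")
```

With "all", peak2 is dropped because it fails in QC samples; with "any", it is kept because it passes in Subject samples.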

If your data include Blank samples, you can also remove peaks according to them. Setting min.subject.blank.ratio to 2 means that only peaks whose intensities in samples are more than twice their intensities in Blank samples are kept. If min.subject.blank.ratio is set below 1, no peaks are removed by this criterion.

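## min.subject.blank.ratio < 1: the Blank criterion removes no peaks
object2 <- filter_peaks(
  object = object,
  min.fraction = 0.5, = "class", = c("QC", "Subject"),
  type = "any",
  min.subject.blank.ratio = 0
)
## keep only peaks more than 2 times as intense in samples as in Blank samples
object2 <- filter_peaks(
  object = object,
  min.fraction = 0.5, = "class", = c("QC", "Subject"),
  type = "all",
  min.subject.blank.ratio = 2
)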
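A minimal base R sketch of the Blank criterion follows. It assumes that the mean intensity across Subject samples is compared with the mean intensity across Blank samples (the text does not say which summary metflow2 uses) and that a peak never detected in any Blank sample counts as having a blank intensity of 0. keep_by_blank() and the toy data are hypothetical.

```r
## toy intensity matrix: 3 peaks x (2 Subject + 2 Blank) samples, made-up values
int <- matrix(c(100, 120, 30, 40,
                 50,  60, 40, 45,
                200, 210, NA, NA),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("peak", 1:3),
                              c("S1", "S2", "Blank1", "Blank2")))
sample_class <- c("Subject", "Subject", "Blank", "Blank")

## hypothetical helper: keep peaks whose mean Subject intensity is more than
## ratio times their mean Blank intensity
keep_by_blank <- function(x, sample_class, ratio = 2) {
  ## a ratio below 1 keeps every peak, mirroring the behavior described above
  if (ratio < 1) return(rep(TRUE, nrow(x)))
  subject_mean <- rowMeans(x[, sample_class == "Subject", drop = FALSE], na.rm = TRUE)
  blank_mean   <- rowMeans(x[, sample_class == "Blank", drop = FALSE], na.rm = TRUE)
  ## assumption: a peak missing from every Blank sample has blank intensity 0
  blank_mean[is.nan(blank_mean)] <- 0
  keep <- subject_mean > ratio * blank_mean
  ## peaks missing from every Subject sample are dropped
  keep[is.na(keep)] <- FALSE
  keep
}

keep_by_blank(int, sample_class, ratio = 2)
keep_by_blank(int, sample_class, ratio = 0)
```

With ratio = 2, peak2 is removed (mean 55 in Subject samples versus 42.5 in Blank samples); with ratio = 0, all peaks are kept.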

After removing the noisy peaks, only 5864 peaks remain.

Remove outlier samples

Next, we remove samples that have many missing values.

object2 <- filter_samples(object = object2,
                          min.fraction.peak = 0.5)

In the code above, min.fraction.peak = 0.5. Setting it to 0.8, for example, means that only the Subject and QC samples in which more than 80% of the peaks are non-missing are kept.
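The sample criterion works on columns instead of rows. In this hypothetical base R sketch, keep_samples() keeps a Subject or QC sample when its fraction of non-missing peaks is more than min.fraction.peak; as an assumption, samples of other classes (such as Blank) are left untouched. The toy matrix and class labels are made up.

```r
## toy intensity matrix: 4 peaks (rows) x 4 samples (columns), made-up values
int <- matrix(c(1, NA,  3, NA,
                2, NA,  4, NA,
                3,  5, NA, NA,
                4,  6,  7, NA),
              nrow = 4, byrow = TRUE,
              dimnames = list(NULL, c("QC1", "S1", "S2", "Blank1")))
sample_class <- c("QC", "Subject", "Subject", "Blank")

## hypothetical helper: TRUE for samples to keep
keep_samples <- function(x, sample_class, min.fraction.peak = 0.5) {
  present <- colMeans(!is.na(x))  # fraction of non-missing peaks per sample
  ## only Subject and QC samples are checked; other classes are kept (assumption)
  checked <- sample_class %in% c("Subject", "QC")
  !checked | present > min.fraction.peak
}

keep_samples(int, sample_class, min.fraction.peak = 0.5)
keep_samples(int, sample_class, min.fraction.peak = 0.8)
```

S1 has only 50% non-missing peaks, so it fails both thresholds; S2 (75%) passes at 0.5 but fails at 0.8.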

We can plot the distribution of missing values across samples:

get_mv_plot_samples(object = object2, interactive = TRUE)

Missing value imputation

The impute_mv() function performs missing value imputation.

object2 <- impute_mv(object = object2,
                     method = "knn")

Note: Only the Subject and QC samples are imputed.

object2 now contains missing values imputed with the KNN method.
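To illustrate what KNN imputation does, here is a generic base R sketch of peak-wise KNN imputation. For each missing value, it finds the k peaks most similar to the current peak (root-mean-square difference over the samples both peaks have observed) and averages their values in the missing sample. knn_impute() is hypothetical, and the distance and averaging choices are assumptions; impute_mv() may rely on a different KNN implementation and settings. Because only Subject and QC samples are imputed, you would pass just those columns to a function like this.

```r
## hypothetical helper: peak-wise KNN imputation on a peaks x samples matrix
knn_impute <- function(x, k = 2) {
  imputed <- x
  for (i in which(rowSums(is.na(x)) > 0)) {
    obs_i <- !is.na(x[i, ])
    ## distance from peak i to every other peak over shared observed samples
    d <- vapply(seq_len(nrow(x)), function(j) {
      shared <- obs_i & !is.na(x[j, ])
      if (j == i || !any(shared)) return(Inf)
      sqrt(mean((x[i, shared] - x[j, shared])^2))
    }, numeric(1))
    for (col in which(!obs_i)) {
      ## neighbours must have an observed value in the column being imputed
      cand <- which(is.finite(d) & !is.na(x[, col]))
      if (length(cand) == 0) next
      nn <- cand[order(d[cand])][seq_len(min(k, length(cand)))]
      imputed[i, col] <- mean(x[nn, col])
    }
  }
  imputed
}

## toy matrix: 4 peaks x 4 samples, one missing value (made-up values)
int <- matrix(c(10, 11, 12, 13,
                20, 21, NA, 23,
                11, 12, 13, 14,
                50, 52, 54, 56), nrow = 4, byrow = TRUE)
knn_impute(int, k = 2)
```

The two nearest neighbours of peak 2 are peaks 3 and 1, so the missing value becomes mean(13, 12) = 12.5; distances are always computed on the original values, not on earlier imputations.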

To export the peak table and sample information, use the get_data() function.

## get the peak table
peak_table2 <- get_data(object = object2, slot = "peak.table")
## get the sample information
sample_info2 <- get_data(object = object2, slot = "")
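If get_data() returns a data frame (an assumption here), the result can be saved as a csv file with base R's write.csv(). The sketch below uses a made-up stand-in for peak_table2 and a temporary file path; replace both with your own object and output path.

```r
## made-up stand-in for the data frame returned by get_data()
peak_table2 <- data.frame(name = c("M100T60", "M200T300"),
                          mz   = c(100.0757, 200.1281),
                          rt   = c(60.1, 300.2),
                          S1   = c(1520, 8830))

## write without row names so the csv keeps the name/mz/rt/sample layout
out_file <- file.path(tempdir(), "peak_table2.csv")
write.csv(peak_table2, file = out_file, row.names = FALSE)

## read it back to check the round trip
str(read.csv(out_file, stringsAsFactors = FALSE))
```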

jaspershen/metflow2 documentation built on Aug. 15, 2021, 4:38 p.m.