pre_process: Pre-Process the data

View source: R/fct_02_pre_process.R

pre_process    R Documentation

Pre-Process the data

Description

This function takes user-defined values and processes the data for exploratory data analysis (EDA). The processing steps depend on the data format, but generally include missing value imputation, data filtering, and data transformation.
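The steps named above can be illustrated with a minimal base R sketch for normalized expression data. This is not the package's implementation: the helpers impute_gene_median() and filter_and_log() are hypothetical, and the use of >= in the filter and of log2 for the transformation are assumptions made for illustration only.

```r
# Hypothetical helper: replace each gene's missing values with that gene's
# median across samples (one plausible reading of "geneMedian").
impute_gene_median <- function(x) {
  row_medians <- apply(x, 1, median, na.rm = TRUE)
  for (i in seq_len(nrow(x))) {
    x[i, is.na(x[i, ])] <- row_medians[i]
  }
  x
}

# Hypothetical helper: keep genes that reach `low_filter` in at least
# `n_min_samples` samples, then optionally apply log2(x + log_start).
# The >= comparison and the log base are assumptions.
filter_and_log <- function(x, low_filter, n_min_samples,
                           log_transform, log_start) {
  keep <- rowSums(x >= low_filter) >= n_min_samples
  x <- x[keep, , drop = FALSE]
  if (log_transform) {
    x <- log2(x + log_start)
  }
  x
}

# Toy expression matrix: genes in rows, samples in columns.
expr <- matrix(
  c(10,  NA,  14,
    0.2, 0.1, 0.3,
    7,   3,   NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

imputed <- impute_gene_median(expr)
processed <- filter_and_log(imputed, low_filter = 1, n_min_samples = 2,
                            log_transform = TRUE, log_start = 1)
```

In this toy input, geneB never reaches the threshold of 1, so it is dropped before the transformation, while the missing entries of geneA and geneC are filled from the remaining samples of the same gene.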

Usage

pre_process(
  data,
  missing_value = c("geneMedian", "treatAsZero", "geneMedianInGroup"),
  data_file_format = c(1, 2, 3),
  low_filter_fpkm,
  n_min_samples_fpkm,
  log_transform_fpkm,
  log_start_fpkm,
  min_counts,
  n_min_samples_count,
  counts_transform,
  counts_log_start,
  no_fdr
)

Arguments

data

Matrix of data that has already gone through convert_data()

missing_value

String indicating the method used to handle missing data. This should be one of "geneMedian", "treatAsZero", or "geneMedianInGroup".

data_file_format

Integer indicating the data format. This should be one of 1 for read counts data, 2 for normalized expression, or 3 for fold changes and adjusted P-values

low_filter_fpkm

Integer for low count filter if data_file_format is normalized expression, NULL otherwise

n_min_samples_fpkm

Integer for minimum samples if data_file_format is normalized expression, NULL otherwise

log_transform_fpkm

TRUE/FALSE if a log transformation should be applied to normalized expression data

log_start_fpkm

Integer added to log transformation if data_file_format is normalized expression, NULL otherwise

min_counts

Numeric value for minimum count if data_file_format is read counts

n_min_samples_count

Integer for minimum libraries with min_counts if data_file_format is read counts

counts_transform

Integer indicating which transformation to apply if data_file_format is read counts. This should be one of 1 for log2(CPM + c) (edgeR), 2 for variance stabilizing transformation (VST), or 3 for regularized log (rlog)

counts_log_start

Integer c added to CPM before taking the log if counts_transform is 1, i.e. log2(CPM + c)

no_fdr

TRUE/FALSE to indicate fold-changes-only data with no P-values if data_file_format is fold changes
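For read counts data, the roles of min_counts, n_min_samples_count, and counts_log_start can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: filter_counts() and log2_cpm() are hypothetical helpers, the filter is applied here to raw counts (the real function may filter on a different scale), and CPM is computed from plain column sums without the library size normalization that edgeR can apply.

```r
# Hypothetical helper: keep genes with at least `min_counts` reads in at
# least `n_min_samples` libraries. Filtering on raw counts is an assumption.
filter_counts <- function(counts, min_counts, n_min_samples) {
  keep <- rowSums(counts >= min_counts) >= n_min_samples
  counts[keep, , drop = FALSE]
}

# Hypothetical helper: simplified log2(CPM + c), where CPM is counts divided
# by the library size (column sum) and multiplied by one million.
log2_cpm <- function(counts, log_start) {
  lib_sizes <- colSums(counts)
  cpm <- sweep(counts, 2, lib_sizes, "/") * 1e6
  log2(cpm + log_start)
}

# Toy count matrix: genes in rows, libraries in columns.
counts <- matrix(
  c(500, 300, 400,
    0,   1,   0,
    500, 700, 600),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneX", "geneY", "geneZ"), c("s1", "s2", "s3"))
)

filtered <- filter_counts(counts, min_counts = 10, n_min_samples = 2)
transformed <- log2_cpm(filtered, log_start = 4)
```

The constant added before the log keeps zero counts finite and damps the variance of low-abundance genes. The VST and rlog options (counts_transform 2 and 3) are model-based alternatives and are not reproduced in this sketch.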

Value

A list containing the transformed data, the mean kurtosis, the raw counts, a data type warning, the size of the original data, and p-values.

See Also

  • cpm() for information on calculating counts per million

  • vst() for information on variance stabilizing transformation

  • rlog() for information on the regularized log transformation

Other preprocess functions: chr_counts_ggplot(), chr_normalized_ggplot(), eda_boxplot(), eda_density(), eda_scatter(), gene_counts_ggplot(), individual_plots(), mean_sd_plot(), rRNA_counts_ggplot(), total_counts_ggplot()


espors/idepGolem documentation built on Oct. 27, 2024, 4:56 a.m.