preprocess: (Internal function) Perform the pre-processing step of ipcaps


View source: R/preprocess.R

Description

(Internal function) Perform the pre-processing step of ipcaps

Usage

preprocess(
  files,
  label.file,
  lab.col,
  rdata.infile,
  bed.infile,
  cate.list,
  result.dir,
  threshold,
  min.fst,
  reanalysis = FALSE,
  method = "mix",
  min.in.group = 20,
  datatype = "snp",
  nonlinear = FALSE,
  missing.char = NA,
  regression.file = NA,
  regression.col.first = NA,
  regression.col.last = NA,
  reg.method = "linear",
  plot.as.pdf = NA,
  no.plot = NA,
  silence.mode = FALSE,
  max.thread = 0,
  seed = NULL
)

Arguments

files

ipcaps supports SNPs encoded as 0, 1 and 2 (dosage encoding). Rows represent SNPs and columns represent subjects. Columns must be separated by a space or a tab. A large text file should be split into smaller files so that it loads faster. For instance, to input 3 files, use: files = c('input1.txt', 'input2.txt', 'input3.txt').
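As a minimal sketch of this layout, the snippet below writes a toy genotype matrix to two space-separated text files. The matrix values, the file names and the temporary output directory are made up for illustration. It also assumes that a large input is split by SNP rows, so that every file keeps the same subjects in its columns; the description above does not spell this out.

```r
# Toy genotype matrix: 6 SNPs (rows) x 4 subjects (columns), coded 0/1/2.
set.seed(1)
geno <- matrix(sample(0:2, 24, replace = TRUE), nrow = 6, ncol = 4)

# Assumption: chunks are split by SNP rows, so each file has all 4 subjects.
out.dir <- tempdir()
chunks <- split(seq_len(nrow(geno)), rep(1:2, each = 3))
files <- character(length(chunks))
for (i in seq_along(chunks)) {
  files[i] <- file.path(out.dir, paste0("input", i, ".txt"))
  # No row or column names: the file holds only the space-separated genotypes.
  write.table(geno[chunks[[i]], , drop = FALSE], file = files[i],
              sep = " ", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# 'files' now holds the paths that would be passed as the files argument.
print(files)
```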

label.file

Additional information about the subjects (called 'labels' in ipcaps), for example geographic location or disease phenotype. These labels (one at a time) are used when displaying the clustering outcome of ipcaps. A label file must contain at least one column. If it contains more than one column, the columns must be separated by a space or a tab.

lab.col

The label in the label file to be used in the tree-like display of ipcaps clustering results.
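The following sketch shows what a two-column label file might look like and how one label column could be picked from it. The subject IDs, population names and file name are invented. Treating lab.col as a column index and writing the file without a header line are assumptions, not something this page confirms.

```r
# Hypothetical label file: one row per subject, two tab-separated columns
# (subject ID, population). All values are invented for illustration.
labels <- data.frame(id = paste0("ind", 1:4),
                     population = c("A", "A", "B", "B"))
label.file <- file.path(tempdir(), "labels.txt")
write.table(labels, file = label.file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Assumption: lab.col is a column index, so 2 selects the population column.
lab.col <- 2
selected.labels <- read.table(label.file, sep = "\t",
                              stringsAsFactors = FALSE)[, lab.col]
print(selected.labels)
```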

rdata.infile

In case of re-analysis, it is convenient to run ipcaps using the file rawdata.RData generated by ipcaps. This file contains a matrix of SNPs (raw.data) and a vector of labels (label).

bed.infile

The PLINK binary format consists of 3 files: bed, bim, and fam. To generate these files with PLINK, use the option --make-bed. See more details at: http://zzz.bwh.harvard.edu/plink/data.shtml.

cate.list

(Unimplemented) A list of categorical input files (text). For instance, to input 3 files, use: cate.list = c('input1.txt', 'input2.txt', 'input3.txt').

result.dir

The absolute path for the ipcaps output. If the specified output directory already exists, result files are saved in sub-directories cluster_out, cluster_out1, cluster_out2, etc.

threshold

Cutoff value for EigenFit.

min.fst

Minimum Fst between a pair of subgroups.

reanalysis

(Unimplemented) To specify whether it is re-analysis or not. If TRUE, it is re-analysis, otherwise it is not. Default = FALSE.

method

The internal clustering method. It can be set to 'mix' (rubikclust & mixmod), 'mixmod' (Lebret et al., 2015), 'clara' (R: Clustering Large Applications), 'pam' (R: Partitioning Around Medoids (PAM) Object), 'meanshift' (Wang, 2016), 'apcluster' (Bodenhofer et al., 2016), or 'hclust' (R: Hierarchical Clustering). Default = 'mix'.

min.in.group

Minimum number of individuals to constitute a cluster or subgroup.

datatype

To specify whether the input data are 'snp' or 'linear'. Default = 'snp'.

nonlinear

(Unimplemented) To specify whether linear or non-linear method is used for ipcaps analysis. If TRUE, non-linear method is used, otherwise linear method is used. Default = FALSE.

missing.char

Symbol used for missing genotypes. Default = NA.

regression.file

A file of covariates; one covariate per column. SNPs can be adjusted for these covariates via regression modeling and residual computation.

regression.col.first

Refers to the covariate file: the first covariate to be considered as a confounding variable.

regression.col.last

Refers to the covariate file: the last covariate to be considered as a confounding variable. All variables between regression.col.first and regression.col.last are considered in the adjustment process.
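To make the idea of adjusting SNPs via regression and residual computation concrete, here is a minimal sketch using base R's lm(). It is not taken from the ipcaps source. The toy data, the covariate names and the variables col.first and col.last (standing in for regression.col.first and regression.col.last, assumed here to be column indices) are all hypothetical.

```r
# Toy data: 50 subjects, one SNP (coded 0/1/2) and three covariates.
set.seed(42)
n <- 50
covariates <- data.frame(age = rnorm(n, 40, 10),
                         sex = rbinom(n, 1, 0.5),
                         pc1 = rnorm(n))
snp <- rbinom(n, 2, 0.3)

# Hypothetical column range: use the first two covariates only.
col.first <- 1
col.last <- 2
selected <- covariates[, col.first:col.last, drop = FALSE]

# Regress the SNP on the selected covariates (with an intercept) and keep
# the residuals as the covariate-adjusted values.
dat <- data.frame(snp = snp, selected)
fit <- lm(snp ~ ., data = dat)
adjusted <- residuals(fit)
```

Because the model includes an intercept, the residuals have mean zero and are uncorrelated with each selected covariate, which is the sense in which the covariates have been "adjusted out".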

reg.method

(Fixed) Specify a method for regression analysis. Default = 'linear'.

plot.as.pdf

To export plots as PDF. When omitted, plots are saved as PNG.

no.plot

No plot is generated if this option is TRUE. This option is useful when a Unix-based system does not support X Windows. Default = NA.

silence.mode

To enable or disable silence mode. If silence mode is enabled, the function runs without printing any messages on the screen, and it is slightly faster. Default = FALSE.

max.thread

To specify the number of threads used to run an analysis in parallel. If max.thread is larger than the number of available CPU cores, the number of available CPU cores is used instead. If max.thread is given as a floating-point number, it is rounded to the nearest integer using the function round(). Default = 0, which means that all available CPU cores are used.
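The rule described for max.thread can be sketched as follows. The helper resolve.threads() is hypothetical and is not part of ipcaps. Treating values below 1 (including the default 0) as "use all cores" is an assumption that goes slightly beyond what is stated above.

```r
library(parallel)

# Hypothetical helper that mirrors the rule described for max.thread.
resolve.threads <- function(max.thread = 0) {
  n.cores <- detectCores()
  if (is.na(n.cores)) n.cores <- 1   # detectCores() can return NA
  n <- round(max.thread)             # nearest integer, not a ceiling
  # Assumption: anything below 1 or above the core count means "all cores".
  if (n < 1 || n > n.cores) n.cores else n
}

resolve.threads(0)     # all available cores
resolve.threads(1.4)   # round(1.4) is 1, so one thread
resolve.threads(1e6)   # capped at the number of available cores
```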

seed

To specify a seed number for the random number generator. Default = NULL, which means that a seed number is chosen automatically.

Value

A data frame of input data.


kridsadakorn/ipcaps.bioc documentation built on Jan. 22, 2020, 11:18 p.m.