CAMPrep: Data preprocessing for CAM
In Lululuella/CAMTHC: Convex Analysis of Mixtures for Tissue Heterogeneity Characterization


This function performs preprocessing for CAM, including norm-based filtering, dimension reduction, perspective projection, local outlier removal and aggregation of gene expression vectors by clustering.

CAMPrep(data, dim.rdc = 10, thres.low = 0.05, thres.high = 0.95,
  cluster.method = c("K-Means", "apcluster"), cluster.num = 50,
  MG.num.thres = 20, lof.thres = 0.02, quick.select = NULL,
  sample.weight = NULL, generalNMF = FALSE)

`data`	Matrix of mixture expression profiles. Data frame, SummarizedExperiment or ExpressionSet object will be internally coerced into a matrix. Each row is a gene and each column is a sample. Data should be in non-log linear space with non-negative numerical values (i.e. >= 0). Missing values are not supported. All-zero rows will be removed internally.
`dim.rdc`	Reduced data dimension; should be no less than the maximum candidate K.
`thres.low`	The lower bound of the fraction of genes, ranked by L2 norm, kept for CAM; genes ranked below this fraction are removed as low-expressed. The value should be between 0 and 1. The default is 0.05.
`thres.high`	The upper bound of the fraction of genes, ranked by L2 norm, kept for CAM; genes ranked above this fraction are removed as high-expressed. The value should be between 0 and 1. The default is 0.95.
`cluster.method`	The clustering method. The default, "K-Means", uses the `kmeans` function; the alternative, "apcluster", uses `apclusterK-methods`.
`cluster.num`	The number of clusters; should be much larger than K. The default is 50.
`MG.num.thres`	Clusters containing fewer than MG.num.thres genes are treated as outliers. The default is 20.
`lof.thres`	The fraction of local outliers to remove with the `lofactor` function. MG.num.thres is used as the number of neighbors when computing the local outlier factors. The default, 0.02, removes the top 2% of local outliers; a value of zero disables local outlier removal.
`quick.select`	The number of candidate corners kept after quickhull and the SFFS greedy search. If NULL (the default), only quickhull is applied. If this value is larger than the number of candidate corners found by quickhull, the greedy search is also skipped.
`sample.weight`	Vector of sample weights. If NULL, all samples have equal weights. Its length should equal the number of samples, and all values should be positive.
`generalNMF`	If TRUE, the decomposed proportion matrix has no sum-to-one constraint on its rows. Because samples are then not assumed to be normalized, the first principal component is not forced to lie along c(1,1,...,1); a standard PCA is applied during preprocessing instead.
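The interaction of `thres.low` and `thres.high` can be illustrated with a minimal base-R sketch. It assumes that genes are ranked by the L2 norm of their expression vectors and that only genes whose norm lies between the `thres.low` and `thres.high` quantiles are kept. The helper `normFilter` is hypothetical and not part of CAMTHC; the package's own filtering may differ in detail, for example in how boundary ties are handled.

```r
# Hypothetical helper (not part of CAMTHC): norm-based gene filtering.
# Assumption: genes whose L2 norm lies between the thres.low and
# thres.high quantiles of all gene norms are kept.
normFilter <- function(X, thres.low = 0.05, thres.high = 0.95) {
    X <- as.matrix(X)
    # Drop all-zero rows (non-negative data, so a zero row sum means all zeros)
    X <- X[rowSums(X) > 0, , drop = FALSE]
    # L2 norm of each gene (row) across samples
    gene.norm <- sqrt(rowSums(X^2))
    lo <- quantile(gene.norm, thres.low, names = FALSE)
    hi <- quantile(gene.norm, thres.high, names = FALSE)
    # Logical vector analogous to the 'Valid' component
    valid <- gene.norm >= lo & gene.norm <= hi
    list(Valid = valid, X = X[valid, , drop = FALSE])
}

# Toy data: 100 genes x 4 samples, strictly positive values
set.seed(1)
toy <- matrix(rexp(400), nrow = 100, ncol = 4)
res <- normFilter(toy, thres.low = 0.30, thres.high = 0.95)
sum(res$Valid)  # 65: genes ranked 31 to 95 by norm are kept
```

With `thres.low = 0.30` as in the example below, roughly the lowest 30% of genes by norm are discarded, which removes weakly expressed, noise-dominated genes before the geometry-based steps.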

This function is used internally by the CAM function to preprocess data; it can also be called directly when you want to perform CAM step by step.

Low- and high-expressed genes are filtered out by their L2-norm ranks. Dimension reduction differs slightly from standard PCA: the first loading vector is forced to be c(1,1,...,1), normalized to unit norm, and the remaining loading vectors are eigenvectors from PCA in the space orthogonal to this first vector. Perspective projection then projects the dimension-reduced gene expression vectors onto the hyperplane orthogonal to c(1,0,...,0), i.e., the first axis of the new coordinate system. Optional local outlier removal excludes outliers in the simplex formed after perspective projection. Finally, gene expression vectors are aggregated by clustering, which further reduces the impact of noise and outliers and makes simplex corner detection more efficient.
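The dimension reduction and perspective projection described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not CAMTHC's implementation: the helper `reduceProject` is hypothetical, the "PCA in the orthogonal space" is taken to be an SVD of the column-centred residuals after removing each gene's component along the unit-norm c(1,1,...,1), and the projection is read as rescaling each gene so that its first coordinate equals 1.

```r
# Hypothetical helper (not part of CAMTHC). X is a genes x samples matrix.
# Returns W (rows are loading vectors), Xprep and Xproj (dim.rdc x genes).
reduceProject <- function(X, dim.rdc = 3) {
    X <- as.matrix(X)
    M <- ncol(X)                              # number of samples
    stopifnot(dim.rdc >= 2, dim.rdc <= M)
    # Forced first loading vector: c(1, 1, ..., 1) scaled to unit norm
    w1 <- rep(1, M) / sqrt(M)
    # Remove each gene's component along w1
    Xres <- X - (X %*% w1) %*% t(w1)
    # PCA in the orthogonal complement: leading right singular vectors of
    # the column-centred residuals (these are orthogonal to w1)
    Xres.c <- scale(Xres, center = TRUE, scale = FALSE)
    V <- svd(Xres.c, nu = 0, nv = dim.rdc - 1)$v
    W <- unname(rbind(w1, t(V)))              # dim.rdc x M
    Xprep <- W %*% t(X)                       # dim.rdc x genes
    # Perspective projection: divide each gene (column) by its first
    # coordinate, mapping it onto the hyperplane x1 = 1
    Xproj <- sweep(Xprep, 2, Xprep[1, ], "/")
    list(W = W, Xprep = Xprep, Xproj = Xproj)
}

set.seed(2)
toy <- matrix(rexp(600), nrow = 200, ncol = 6)
out <- reduceProject(toy, dim.rdc = 3)
round(out$W %*% t(out$W), 10)  # identity: loading vectors are orthonormal
```

Because the data are non-negative, the first coordinate of every non-zero gene is positive, so the division in the projection step is well defined. With `generalNMF = TRUE` the documentation states that a standard PCA is used instead, so `w1` would no longer be forced.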

An object of class "CAMPrepObj" containing the following components:

`Valid`	Logical vector indicating the genes kept after filtering.
`Xprep`	Preprocessed data matrix.
`Xproj`	Preprocessed data matrix after perspective projection.
`W`	The matrix whose rows are loading vectors.
`SW`	Sample weights.
`cluster`	Clustering results, consisting of two vectors. The first gives the cluster to which each gene is allocated; the second gives the number of genes in each cluster.
`c.outlier`	The clusters containing fewer than MG.num.thres genes, which are treated as outliers.
`centers`	The centers of candidate corner clusters (candidate clusters containing marker genes).
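A rough sketch of the final aggregation step with the default "K-Means" method shows how the `cluster`, `c.outlier` and `centers` components could relate to one another. It assumes that `centers` holds the centres of the non-outlier clusters, one per column; the helper `aggregateGenes` and its `nstart`/`iter.max` settings are illustrative choices, not CAMTHC's.

```r
# Hypothetical helper (not part of CAMTHC). Xproj is a dim.rdc x genes
# matrix with one column per gene.
aggregateGenes <- function(Xproj, cluster.num = 50, MG.num.thres = 20) {
    # kmeans() clusters rows, so transpose to genes x dimensions
    km <- kmeans(t(Xproj), centers = cluster.num, iter.max = 50, nstart = 10)
    # Clusters with fewer than MG.num.thres genes are treated as outliers
    c.outlier <- which(km$size < MG.num.thres)
    keep <- setdiff(seq_len(cluster.num), c.outlier)
    list(cluster = list(cluster = km$cluster, size = km$size),
         c.outlier = c.outlier,
         # Centres of the remaining clusters, one column per cluster
         centers = t(km$centers[keep, , drop = FALSE]))
}

set.seed(3)
# Toy projected data: first coordinate fixed at 1, as after perspective projection
Xproj <- rbind(1, matrix(runif(1000), nrow = 2))
agg <- aggregateGenes(Xproj, cluster.num = 10, MG.num.thres = 20)
agg$c.outlier   # indices of clusters treated as outliers (may be empty)
```

Working with cluster centres rather than individual genes means the later corner search operates on `cluster.num` points instead of thousands of genes, which is the efficiency gain the Details section refers to.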

#load the package and obtain data
library(CAMTHC)
data(ratMix3)
data <- ratMix3$X

#set seed to generate reproducible results
set.seed(111)

#preprocess data
rPrep <- CAMPrep(data, dim.rdc = 3, thres.low = 0.30, thres.high = 0.95)

Lululuella/CAMTHC documentation built on May 5, 2019, 2:39 a.m.
