estimateProportionsCP: estimateProportionsCP
In RnBeads: RnBeads


Estimates cell type proportions using the constrained projection method of Houseman et al. [1].

estimateProportionsCP(
  rnb.set,
  cell.type.column,
  n.most.variable = NA,
  n.markers = 500L,
  constrained = TRUE,
  full.output = FALSE
)

`rnb.set`	RnBSet object containing both the reference methylomes of purified cell types and the target samples.
`cell.type.column`	Integer index or character name of the column in the sample annotation table of `rnb.set` that maps the reference samples to cell types (see Details).
`n.most.variable`	Singleton integer specifying how many of the most variable CpGs should be used for marker selection. If set to `NA` or `NULL`, all sites are considered; note that this can substantially increase computation time.
`n.markers`	Singleton integer specifying how many CpGs should be used as markers for fitting the projection model.
`constrained`	If `TRUE`, the returned cell type proportion estimates are non-negative, lie between 0 and 1, and sum to at most 1 within each sample (see Details).
`full.output`	If `TRUE`, the intermediate analysis results are returned in addition to the estimated proportions.
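The effect of `constrained` can be stated as an optimization problem solved separately for each target sample. This is a sketch under the assumption that the projection is a least-squares fit, consistent with the description of the constraints in Details; the symbols y (the target sample's methylation values at the selected markers), B (the markers × cell types matrix of reference coefficients) and ω (the vector of K cell type contributions) are notation introduced here for illustration only.

```latex
% constrained = TRUE: non-negative contributions with a within-sample sum of at most 1
\hat{\omega} = \arg\min_{\omega \in \mathbb{R}^{K}} \lVert y - B\omega \rVert_2^2
\quad \text{subject to} \quad \omega_k \ge 0 \ (k = 1, \dots, K), \qquad \sum_{k=1}^{K} \omega_k \le 1

% constrained = FALSE: constraints dropped, giving the ordinary least-squares solution
\hat{\omega}_{\mathrm{raw}} = (B^{\top} B)^{-1} B^{\top} y
```

Together, the two constraints imply 0 ≤ ω_k ≤ 1 for every cell type.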

This is a minimally customized implementation of the method by Houseman et al. [1], based on the original code kindly provided by Andres Houseman. Note that RnBeads does not provide any reference data sets: the methylomes of purified cell types must be supplied by the user as part of the object passed via `rnb.set`. The column specified by `cell.type.column` should assign each reference methylome replicate to a cell type and contain missing values (`NA`) for all target samples.

First, the marker selection model is fit to estimate the association of each CpG with the given reference cell types (first expression in eq. (1) of [1]). The strength of association is expressed as an F-statistic. Since fitting the marker selection model to all CpGs can take a long time, the marker search can be limited to variable CpG positions by setting `n.most.variable` to a positive integer. The CpGs are then ranked by their across-sample variance in the reference data set, and only the top `n.most.variable` of them are used to fit the marker selection model. The coefficients of the fit, together with the F-statistic for each CpG, are returned if `full.output` is `TRUE`.

Thereafter, `n.markers` CpGs are selected as quantitative markers, and the projection model (eq. [2]) is fit to estimate the contribution of each cell type. If `constrained` is `FALSE`, the raw coefficients are returned; if it is `TRUE`, they are constrained to values between 0 and 1 with a within-sample sum of at most 1.
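The marker selection step described above can be sketched in base R. This is a hypothetical illustration, not the RnBeads code. The function name `select_markers` and its inputs are assumptions: `ref` is a CpG × sample matrix of reference methylation values, and `cell.type` gives each reference sample's cell type. The sketch fits a plain fixed-effects `lm` per CpG and does not use nlme. It ranks markers by decreasing F-statistic and takes the per-cell-type means as the marker coefficients.

```r
# Hypothetical sketch of the marker-selection step; NOT the RnBeads implementation.
# ref:       CpG x sample matrix of reference methylation values (rownames = CpG IDs)
# cell.type: cell type label of each reference sample (one per column of ref)
select_markers <- function(ref, cell.type, n.most.variable = NA, n.markers = 500L) {
  cell.type <- factor(cell.type)
  # Optional pre-filter: keep only the CpGs with the largest across-sample variance
  if (!is.null(n.most.variable) && !is.na(n.most.variable) &&
      n.most.variable < nrow(ref)) {
    v <- apply(ref, 1, var)
    keep <- order(v, decreasing = TRUE)[seq_len(n.most.variable)]
    ref <- ref[keep, , drop = FALSE]
  }
  # Per-CpG fixed-effects linear model of methylation on cell type;
  # the F-statistic of the cell type term measures the strength of association
  f.stat <- apply(ref, 1, function(y) anova(lm(y ~ cell.type))[["F value"]][1])
  # Keep the n.markers CpGs with the largest F-statistics
  top <- order(f.stat, decreasing = TRUE)[seq_len(min(n.markers, length(f.stat)))]
  markers <- ref[top, , drop = FALSE]
  # In a cell-means parameterization (no intercept), the coefficients are the
  # per-cell-type means at each marker (marker x cell type matrix)
  coefs <- t(apply(markers, 1, function(y) tapply(y, cell.type, mean)))
  list(f.stat = f.stat, markers = rownames(markers), coefficients = coefs)
}
```

For example, `select_markers(ref, cell.type, n.most.variable = 50000L, n.markers = 500L)` would restrict the F-test to the 50,000 most variable CpGs. Fitting one model per CpG is slow, which is why the variance pre-filter matters on genome-scale data.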

A matrix of estimated cell type contributions (samples × cell types) or, if `full.output` is `TRUE`, a list with the results of the intermediate steps (see Details).
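To illustrate how a samples × cell types matrix of this form can arise, the projection step can be sketched in base R. This is a hypothetical sketch, not the RnBeads code. The following names are assumptions:

- `project_proportions`, the function itself;
- `coefs`, a marker × cell type matrix of reference coefficients, such as per-cell-type means from marker selection;
- `target`, a marker × sample matrix of target methylation values over the same markers.

The unconstrained case uses ordinary least squares via `qr.solve`. For the constrained case, the sketch substitutes base R's barrier-method optimizer `constrOptim` for a dedicated quadratic-programming solver, enforcing non-negative contributions with a within-sample sum of at most 1.

```r
# Hypothetical sketch of the projection step; NOT the RnBeads implementation.
# coefs:  marker x cell type matrix of reference coefficients (e.g. cell type means)
# target: marker x sample matrix of target methylation values, same marker rows
project_proportions <- function(coefs, target, constrained = TRUE) {
  k <- ncol(coefs)
  fit_one <- function(y) {
    # Unconstrained: ordinary least-squares projection of y onto the columns of coefs
    if (!constrained) return(qr.solve(coefs, y))
    # Constraints in constrOptim's form ui %*% w - ci >= 0:
    # the first k rows give w >= 0, the last row gives 1 - sum(w) >= 0
    ui <- rbind(diag(k), rep(-1, k))
    ci <- c(rep(0, k), -1)
    sse  <- function(w) sum((y - coefs %*% w)^2)
    grad <- function(w) as.vector(-2 * crossprod(coefs, y - coefs %*% w))
    w0 <- rep(1 / (k + 1), k)  # strictly inside the feasible region
    constrOptim(w0, sse, grad, ui = ui, ci = ci)$par
  }
  # One fit per target sample; transpose to samples x cell types
  props <- t(apply(target, 2, fit_one))
  colnames(props) <- colnames(coefs)
  props
}
```

Because `constrOptim` uses a logarithmic barrier, estimates that lie on the boundary of the feasible region (for example, a contribution of exactly 0) are only approached from inside. They can therefore differ slightly from an exact quadratic-programming solution.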

Requires the package nlme.

Pavlo Lutsik

1. Houseman E, Accomando W, Koestler D, Christensen B, Marsit C, Nelson H, Wiencke J, Kelsey K. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 2012, 13:86.

RnBeads documentation built on March 3, 2021, 2 a.m.