fusedLasso: Generalized fused lasso to partition cell types by allelic...
In Wancen/airpart: Differential cell-type-specific allelic imbalance

fusedLasso

R Documentation

Generalized fused lasso to partition cell types by allelic imbalance

Description

Fits generalized fused lasso with either binomial(link="logit") or Gaussian likelihood, leveraging functions from the smurf package.

Usage

fusedLasso(
  sce,
  formula,
  model = c("binomial", "gaussian"),
  genecluster,
  niter = 1,
  pen.weights,
  lambda = "cv1se.dev",
  k = 5,
  adj.matrix,
  lambda.length = 25L,
  se.rule.nct = 8,
  se.rule.mult = 0.5,
  ...
)

Arguments

`sce`	A SingleCellExperiment containing assays (`"ratio"`, `"counts"`) and colData `"x"`
`formula`	A `formula` object which will typically involve a fused lasso penalty: default is just using cell-type 'x': `ratio ~ p(x, pen="gflasso")`. Other possibilities would be to use the Graph-Guided Fused Lasso penalty, or add covariates want to be adjusted for, which can include a gene-level baseline 'gene' `ratio ~ p(x, pen = "ggflasso") + gene + batch` See `glmsmurf` for more details
`model`	Either `"binomial"` or `"gaussian"` used to fit the generalized fused lasso
`genecluster`	which gene cluster to run the fused lasso on. Usually one first identifies an interesting gene cluster pattern by `summaryAllelicRatio`
`niter`	number of iterations to run; recommended to run 5 times if allelic ratio differences across cell types are within [0.05, 0.1]
`pen.weights`	argument as described in `glmsmurf`
`lambda`	argument as described in `glmsmurf`. Default lambda is determined by `"cv1se.dev"` (cross-validation within 1 standard error rule(SE); deviance)
`k`	number of cross-validation folds
`adj.matrix`	argument as described in `glmsmurf`
`lambda.length`	argument as described in `glmsmurf`
`se.rule.nct`	the number of cell types to trigger a different SE-based rule than 1 SE (to prioritize larger models, less fusing, good for detecting smaller, e.g. 0.05, allelic ratio differences). When the number of cell types is less than or equal to this value, `se.rule.mult` SE rule is used. When the number of cell types is larger than this value, the standard 1 SE rule is used.
`se.rule.mult`	the multiplier of the SE in determining the lambda: the chosen lambda is within `se.rule.mult` x SE of the minimum deviance. Small values will prioritize larger models, less fusing. Only used when number of cell types is equal to or less than `se.rule.nct`
`...`	additional arguments passed to `glmsmurf`

Details

Usually, we used a Generalized Fused Lasso penalty for the cell states in order to regularize all possible coefficient differences. Another possibility would be to use the Graph-Guided Fused Lasso penalty to only regularize the differences of coefficients of neighboring cell states.

When using a Graph-Guided Fused Lasso penalty, the adjacency matrix corresponding to the graph needs to be provided. The elements of this matrix are zero when two levels are not connected, and one when they are adjacent.

See the package vignette for more details and a complete description of a use case.

Value

A SummarizedExperiment with attached metadata and colData: a matrix grouping factor partition and the penalized parameter lambda are returned in metadata "partition" and "lambda". Partition and logistic group allelic estimates are stored in colData: "part" and "coef".

References

This function leverages the glmsmurf function from the smurf package. For more details see the following manuscript:

Devriendt S, Antonio K, Reynkens T, et al. Sparse regression with multi-type regularized feature modeling[J]. Insurance: Mathematics and Economics, 2021, 96: 248-261.

Examples

library(S4Vectors)
library(smurf)
sce <- makeSimulatedData()
sce <- preprocess(sce)
sce <- geneCluster(sce, G = seq_len(4))
f <- ratio ~ p(x, pen = "gflasso") # formula for the GFL
sce_sub <- fusedLasso(sce,
  formula = f, model = "binomial", genecluster = 1, ncores = 1)
metadata(sce_sub)$partition
metadata(sce_sub)$lambda

# can add covariates or `gene` to the formula
f2 <- ratio ~ p(x, pen = "gflasso") + gene
sce_sub <- fusedLasso(sce[1:5,],
  formula = f2, model = "binomial",
  genecluster = 1, ncores = 1)

# Suppose we have 4 cell states, if we only want neibouring cell states
# to be grouped together with other cell states. Note here the names of
# the cell states should be given as row and column names.
nct <- nlevels(sce$x)
adjmatrix <- makeOffByOneAdjMat(nct)
colnames(adjmatrix) <- rownames(adjmatrix) <- levels(sce$x)
f <- ratio ~ p(x, pen = "ggflasso") # use graph-guided fused lasso
sce_sub <- fusedLasso(sce,
  formula = f, model = "binomial", genecluster = 1,
  lambda = 0.5, ncores = 1,
  adj.matrix = list(x = adjmatrix)
)
metadata(sce_sub)$partition

Wancen/airpart documentation built on March 12, 2023, 11:53 a.m.