rarcat: Robustness Assessment of Regressions using Cluster Analysis...


rarcat    R Documentation

Robustness Assessment of Regressions using Cluster Analysis Typologies (RARCAT)

Description

rarcat is a wrapper for the functions regressboot and unirarcat that performs the entire RARCAT procedure on all possible associations between a typology and covariates of interest. See Roth et al. (2024) or the corresponding WeightedCluster vignette for details on the methods and their utility.

Usage

rarcat(diss, covar, df, 
        clustering=NULL, robust=TRUE, B=500, count=FALSE, 
        algo="pam", method="ward.D", 
        fixed=FALSE, ncluster=10, eval="CH",
        parallel="no", ncpus=1, cl=NULL,
        transformation=FALSE, conflevel=0.05, digits=3)

Arguments

diss

The numerical dissimilarity matrix used for clustering. Only a pre-computed matrix (i.e., where pairwise dissimilarities do not depend on the resample) is currently supported.
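To illustrate what a pre-computed matrix implies, here is a minimal base R sketch on toy data. It is not the package's internal code, and the names toy_diss, idx and boot_diss are hypothetical. It draws one bootstrap resample and looks up the stored dissimilarities by index instead of recomputing them.

```r
## Toy symmetric dissimilarity matrix for 5 individuals (hypothetical data)
set.seed(1)
toy_diss <- as.matrix(dist(matrix(rnorm(10), ncol = 2)))

## One bootstrap resample: draw individuals with replacement
idx <- sample(nrow(toy_diss), replace = TRUE)

## Pairwise dissimilarities are looked up by index, not recomputed
boot_diss <- toy_diss[idx, idx]
```

An individual drawn twice appears as two rows with a dissimilarity of zero between them, which is the expected behaviour when resampling a fixed matrix.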

covar

A character vector containing the names of the covariates whose association with the clustering is studied. The formula object is then created inside the function based on this.
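How the formula is assembled internally is not shown on this page. As a rough sketch only, a character vector of covariate names can be turned into a formula in base R with reformulate; the response name membership is a hypothetical placeholder for cluster membership.

```r
## Covariate names as they would be passed in `covar`
covar <- c("funemp", "gcse5eq")

## Build a formula; `membership` is a hypothetical response name
form <- reformulate(covar, response = "membership")
print(form)
```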

df

The original dataset (data frame) containing the covariates of interest. The number of rows should equal the length of the clustering argument, and the column names should include the names given in covar.

clustering

Optional. An integer vector containing the clustering solution (one entry for each individual) from the original analysis. If not given (default), it is computed based on the other information inside the function.

robust

Logical. TRUE (the default) indicates that RARCAT should be performed. FALSE makes the function run much faster but only outputs the original analysis, which is a standard regression analysis for all combinations of reference clusters and covariates.

B

Integer. The number of bootstrap replications. Set to 500 by default to attain a satisfactory precision around the estimates, as the procedure involves multiple steps.

count

Logical. Whether the bootstrap runs are counted on the screen or not.

algo

The clustering algorithm as a character string. Currently only "pam" (calling the function wcKMedRange) and "hierarchical" (calling the function fastcluster::hclust) are supported. "pam" by default.

method

A character string with the method argument of hclust, "ward.D" by default.

fixed

Logical. TRUE implies that the number of clusters is the same in every bootstrap. FALSE (default) implies that an optimal number of clusters is evaluated each time.

ncluster

Integer. Either the number of clusters in every bootstrap if fixed is TRUE or the maximum number of clusters (starting from 2) to be evaluated in each bootstrap if fixed is FALSE.
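The two roles of ncluster can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: pseudo_f and choose_k are hypothetical helpers, and the pseudo-F index computed here only stands in for the quality indices that as.clustrange provides.

```r
## Pseudo-F (Calinski-Harabasz-type) index computed from dissimilarities.
## Hypothetical helper for this sketch, not a WeightedCluster function.
pseudo_f <- function(d, cl) {
  n <- nrow(d)
  k <- length(unique(cl))
  ss_total <- sum(d[upper.tri(d)]) / n
  ss_within <- sum(sapply(split(seq_len(n), cl), function(i) {
    dk <- d[i, i, drop = FALSE]
    sum(dk[upper.tri(dk)]) / length(i)
  }))
  ((ss_total - ss_within) / (k - 1)) / (ss_within / (n - k))
}

## Toy data: three well-separated groups on a line
x <- c(0, 0.1, 0.2, 10, 10.1, 10.2, 20, 20.1, 20.2)
d <- as.matrix(dist(x))
hc <- hclust(as.dist(d), method = "ward.D")

## Hypothetical helper mimicking the fixed / ncluster logic
choose_k <- function(hc, d, ncluster, fixed) {
  if (fixed) return(ncluster)   # same number of clusters every time
  ks <- 2:ncluster              # candidate numbers start from 2
  scores <- sapply(ks, function(k) pseudo_f(d, cutree(hc, k)))
  ks[which.max(scores)]         # keep the best-scoring partition size
}

k_fixed <- choose_k(hc, d, ncluster = 5, fixed = TRUE)
k_best  <- choose_k(hc, d, ncluster = 5, fixed = FALSE)
```

With fixed = TRUE the value of ncluster is used as is, whereas with fixed = FALSE it only bounds the search from above.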

eval

A character string with the cluster quality index to be evaluated for each new partition. Any column of as.clustrange is supported, "CH" (the Calinski-Harabasz index) by default. Also works with algo = "pam".

parallel

A character string with the type of parallel operation to be used (if any) by the function boot::boot. Options are "no" (default), "multicore" and "snow" (for Windows).

ncpus

Integer. Number of processes to be used in case of parallel operation. Typically, one would choose this to be the number of available CPUs.

cl

A parallel cluster for use if parallel = "snow". If not supplied, a cluster on the local machine is created for the duration of the boot call.

transformation

Logical. TRUE means that a Fisher transformation is applied in the rarcat function. This can be recommended in case of extreme associations (close to the -1 or 1 boundaries). FALSE by default.
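To show why such a transformation helps near the boundaries, here is a small base R sketch. The estimates are hypothetical and the normal-type interval is used for illustration only; the way rarcat applies the transformation internally is not described on this page.

```r
## Hypothetical AME estimates from bootstrap runs, close to the upper boundary
ame <- c(0.90, 0.95, 0.97, 0.99)

## Fisher transformation maps (-1, 1) onto the whole real line
z <- atanh(ame)   # same as 0.5 * log((1 + ame) / (1 - ame))

## A symmetric normal-type interval built on the transformed scale ...
z_bounds <- mean(z) + c(-1, 1) * qnorm(0.975) * sd(z)

## ... stays inside (-1, 1) once transformed back
ame_bounds <- tanh(z_bounds)

## The same interval built on the raw scale can cross the boundary
raw_bounds <- mean(ame) + c(-1, 1) * qnorm(0.975) * sd(ame)
```

In this toy case the upper raw bound exceeds 1, while the back-transformed bounds remain valid.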

conflevel

Level for the confidence intervals from the original analysis and the prediction intervals from the robustness assessment, expressed as a significance level. 0.05 by default, which corresponds to 95% intervals.
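The following base R sketch shows how a value of 0.05 maps to a 95% interval, assuming that conflevel acts as a significance level (as the default suggests). The estimate, the standard error and the normal-based interval are hypothetical and serve as illustration only.

```r
conflevel <- 0.05                       # default value of the argument
coverage  <- 1 - conflevel              # nominal coverage of the interval
z_crit    <- qnorm(1 - conflevel / 2)   # two-sided normal critical value

## Hypothetical AME estimate and standard error
est <- 0.12
se  <- 0.03
ci  <- est + c(-1, 1) * z_crit * se
```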

digits

Controls the number of significant digits to print. 3 by default.

Details

The rarcat function runs a standard typology-based association study and evaluates the impact of sampling uncertainty on the results, thus assessing the reproducibility of the analysis.

Value

The output of rarcat contains the following tables:

original.analysis

Average Marginal Effects (AMEs) estimated with multivariable logistic regressions and representing the expected change in the probability of belonging to a trajectory group (a reference cluster) for a change in the level of a variable (a covariate of interest), together with their confidence intervals.

robust.analysis

Pooled AMEs from the bootstrap procedure and their prediction intervals, representing the range of expected values if the clustering and associated regressions were performed on a new sample from the same underlying distribution. This table provides robust estimates for a typology-based association study.

Author(s)

Leonard Roth

References

Roth, L., Studer, M., Zuercher, E., & Peytremann-Bridevaux, I. (2024). Robustness assessment of regressions using cluster analysis typologies: a bootstrap procedure with application in state sequence analysis. BMC Medical Research Methodology, 24(1), 303. https://doi.org/10.1186/s12874-024-02435-8

See Also

regressboot, unirarcat

Examples


## Set the seed for reproducible results
set.seed(1)

## Loading the packages and the data (mvad comes from the TraMineR package)
library(TraMineR)
library(WeightedCluster)
data(mvad)

## Creating the state sequence object
mvad.seq <- seqdef(mvad, 17:86)

## Distance computation
diss <- seqdist(mvad.seq, method="LCS")

## Hierarchical clustering
hc <- fastcluster::hclust(as.dist(diss), method="ward.D")

## Computing cluster quality measures
clustqual <- as.clustrange(hc, diss=diss, ncluster=6)

## A character vector with the names of the covariates of interest (to be related to the typology)
covar <- c("funemp", "gcse5eq")

## As in the original analysis, hierarchical clustering with the Ward method is used
## The number of clusters is fixed to 6 here
## For illustration purposes, the number of bootstrap replications is smaller than it ought to be
rarcatout <- rarcat(diss, covar, mvad, B = 50, 
                    algo = "hierarchical", method = "ward.D", 
                    fixed = TRUE, ncluster = 6)

## Assess the robustness of the original analysis
rarcatout$original.analysis
rarcatout$robust.analysis

WeightedCluster documentation built on April 24, 2025, 3:01 a.m.