rankSimilarPerturbations: Rank differential expression profile against CMap...
In nuno-agostinho/cTRAP: Identification of candidate causal perturbations from differential gene expression data

rankSimilarPerturbations

R Documentation

Rank differential expression profile against CMap perturbations by similarity

Description

Compare differential expression results against CMap perturbations.

Usage

rankSimilarPerturbations(
  input,
  perturbations,
  method = c("spearman", "pearson", "gsea"),
  geneSize = 150,
  cellLineMean = "auto",
  rankPerCellLine = FALSE,
  threads = 1,
  chunkGiB = 1,
  verbose = FALSE
)

Arguments

`input`	`Named numeric vector` of differentially expressed genes, whose names are gene identifiers and whose values are a statistic representing both the significance and the magnitude of differential expression (e.g. t-statistics); or `character` vector of gene symbols composing a gene set that is tested for enrichment in reference data (only used if `method` includes `gsea`)
`perturbations`	`perturbationChanges` object: CMap perturbations (check `prepareCMapPerturbations()`)
`method`	Character: comparison method (`spearman`, `pearson` or `gsea`; multiple methods may be selected at once)
`geneSize`	Numeric: number of top up-/down-regulated genes to use as gene sets to test for enrichment in reference data; if a 2-length numeric vector, the first index is the number of top up-regulated genes and the second index is the number of down-regulated genes used to create gene sets; only used if `method` includes `gsea` and if `input` is not a gene set
`cellLineMean`	Boolean or `"auto"`: add rows with the mean of `method` across cell lines? If `cellLineMean = "auto"` (default), rows are added only when data for more than one cell line is available.
`rankPerCellLine`	Boolean: rank results based on both individual cell lines and mean scores across cell lines (`TRUE`) or based on mean scores alone (`FALSE`)? If `cellLineMean = FALSE`, individual cell line conditions are always ranked.
`threads`	Integer: number of parallel threads
`chunkGiB`	Numeric: if `perturbations` is a path to an HDF5 file (`.h5` extension), that file is loaded and processed in chunks of the given size in gibibytes (GiB); lower values decrease peak RAM usage (see details below)
`verbose`	Boolean: print additional details?
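
To illustrate how `geneSize` could translate into gene sets, here is a minimal base R sketch. The helper `topGeneSets()` and the toy statistics are hypothetical and not part of cTRAP; whether the package selects genes in exactly this way is an assumption.

```r
# Hypothetical helper (not part of cTRAP): pick the top up- and down-regulated
# genes from a named numeric vector of statistics
topGeneSets <- function(stat, geneSize = 150) {
    # A single value is used for both directions; a 2-length vector gives
    # c(number of up-regulated genes, number of down-regulated genes)
    geneSize <- rep_len(geneSize, 2)
    ordered  <- sort(stat, decreasing = TRUE)
    list(up   = names(head(ordered, geneSize[1])),
         down = names(tail(ordered, geneSize[2])))
}

# Toy statistics for five made-up genes
stat <- setNames(c(2.5, -1.2, 0.3, -3.1, 1.8), c("A", "B", "C", "D", "E"))
sets <- topGeneSets(stat, geneSize = c(2, 1))
sets$up   # two genes with the highest statistics
sets$down # gene with the lowest statistic
```

Note that in this sketch the two sets overlap if the requested sizes add up to more than the number of genes in `stat`.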

Value

Data table with correlation and/or GSEA score results
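
As a rough illustration of what ranking by correlation means, the base R sketch below correlates a toy profile with the columns of a made-up reference matrix and orders the result. All names and data are hypothetical; this is not cTRAP's implementation and the real output contains more columns.

```r
set.seed(1)
genes <- paste0("gene", 1:50)

# Toy differential expression profile (named numeric vector)
input <- setNames(rnorm(50), genes)

# Hypothetical reference: genes in rows, perturbations in columns
reference <- cbind(similar   =  input + rnorm(50, sd = 0.1),
                   opposite  = -input + rnorm(50, sd = 0.1),
                   unrelated = rnorm(50))
rownames(reference) <- genes

# Compare only the genes present in both data sets
common <- intersect(names(input), rownames(reference))

scores <- data.frame(
    perturbation = colnames(reference),
    spearman = as.vector(cor(input[common], reference[common, ],
                             method = "spearman")),
    pearson  = as.vector(cor(input[common], reference[common, ],
                             method = "pearson")))

# Most similar perturbations first
scores <- scores[order(scores$spearman, decreasing = TRUE), ]
scores
```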

Process data by chunks

If a file path to a valid HDF5 (`.h5`) file is provided instead of a data matrix, that file can be loaded and processed in chunks of size `chunkGiB`, resulting in decreased peak memory usage.

The default value of 1 GiB (1 GiB = 1024^3 bytes) allows loading chunks of ~10000 columns and 14000 rows (10000 * 14000 * 8 bytes / 1024^3 = 1.04 GiB).
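
The arithmetic above can be reproduced in base R to estimate how many columns fit in a chunk. The helper `estimateChunkColumns()` is hypothetical (not part of cTRAP) and assumes 8-byte double-precision values with no additional overhead.

```r
# Size in GiB of a 14000 x 10000 matrix of doubles (8 bytes each)
sizeGiB <- 10000 * 14000 * 8 / 1024^3
round(sizeGiB, 2)

# Hypothetical helper: number of whole columns that fit in a chunk
estimateChunkColumns <- function(chunkGiB = 1, rows = 14000,
                                 bytesPerValue = 8) {
    floor(chunkGiB * 1024^3 / (rows * bytesPerValue))
}

colsDefault <- estimateChunkColumns()               # default chunk of 1 GiB
colsHalf    <- estimateChunkColumns(chunkGiB = 0.5) # smaller chunk, less RAM
```

Halving `chunkGiB` roughly halves the number of columns held in memory at once, at the cost of processing more chunks.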

GSEA score

When `method` includes `"gsea"`, weighted connectivity scores (WTCS) are calculated (https://clue.io/connectopedia/cmap_algorithms).
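
According to the CMap documentation linked above, WTCS combines the enrichment scores (ES) of the up- and down-regulated gene sets: if the two scores have opposite signs, WTCS is half their difference; otherwise it is zero. A minimal sketch follows, where the function `wtcs()` and the enrichment scores are made up for illustration and do not come from cTRAP.

```r
# Hypothetical helper (not part of cTRAP): weighted connectivity score from
# the enrichment scores of the up- and down-regulated gene sets
wtcs <- function(esUp, esDown) {
    ifelse(sign(esUp) != sign(esDown), (esUp - esDown) / 2, 0)
}

# Made-up enrichment scores
concordant <- wtcs(esUp = 0.6,  esDown = -0.4) # up set enriched at the top, down set at the bottom
reversed   <- wtcs(esUp = -0.5, esDown = 0.3)  # opposite pattern gives a negative score
sameSign   <- wtcs(esUp = 0.6,  esDown = 0.2)  # same sign: score is set to 0
```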

Examples

# Example of a differential expression profile
data("diffExprStat")

## Not run: 
# Download and load CMap perturbations to compare with
cellLine <- c("HepG2", "HUH7")
cmapMetadataCompounds <- filterCMapMetadata(
    "cmapMetadata.txt", cellLine=cellLine, timepoint="24 h",
    dosage="5 \u00B5M", perturbationType="Compound")

cmapPerturbationsCompounds <- prepareCMapPerturbations(
    cmapMetadataCompounds, "cmapZscores.gctx", "cmapGeneInfo.txt",
    "cmapCompoundInfo_drugs.txt", loadZscores=TRUE)

## End(Not run)
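# The block marked "Not run" is skipped when examples are checked, so
# `cmapPerturbationsCompounds` must already be available in the session
perturbations <- cmapPerturbationsCompounds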

# Rank similar CMap perturbations (by default, Spearman's and Pearson's
# correlations are used, as well as GSEA with the top and bottom 150 genes of
# the differential expression profile used as gene sets)
rankSimilarPerturbations(diffExprStat, perturbations)

# Rank similar CMap perturbations using only Spearman's correlation
rankSimilarPerturbations(diffExprStat, perturbations, method="spearman")

nuno-agostinho/cTRAP documentation built on Jan. 2, 2025, 12:11 a.m.