dev/benchmark/README.md

Time and memory profiling of cTRAP

Nuno Agostinho, 27 November 2020

cTRAP is a multi-threaded R package composed of three modules. These scripts profile the critical module, which ranks user-provided differential expression results against differential expression results from CMap perturbations.

They also benchmark the prediction of targeting drugs (using the NCI60 gene expression and drug sensitivity association, the most time-consuming option) and drug set enrichment analysis.

cTRAP performance milestones (dev versions)

General instructions

Ranking CMap perturbations

Input

CMap perturbation data loading

CMap perturbation data is first filtered according to available variables (cell lines, timepoints, drug dosage, perturbation types). Only the data matching the user criteria is loaded into memory.
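As a rough illustration of this filtering step, the sketch below selects perturbation identifiers from a small metadata table before any z-scores are read. It is written in Python for brevity and is not cTRAP code: the field names, the sample rows and the `filter_metadata` helper are all hypothetical.

```python
# Hypothetical metadata rows; field names and values are illustrative only.
metadata = [
    {"id": "sig1", "cell_line": "HepG2", "timepoint": "24 h", "type": "compound"},
    {"id": "sig2", "cell_line": "HepG2", "timepoint": "6 h", "type": "knockdown"},
    {"id": "sig3", "cell_line": "MCF7", "timepoint": "24 h", "type": "compound"},
    {"id": "sig4", "cell_line": "HepG2", "timepoint": "24 h", "type": "compound"},
]


def filter_metadata(rows, **criteria):
    """Keep only the rows whose fields match every given criterion."""
    return [
        row for row in rows
        if all(row.get(field) == value for field, value in criteria.items())
    ]


# Only the identifiers that pass the filter would later be read from disk.
selected = filter_metadata(metadata, cell_line="HepG2", type="compound")
selected_ids = [row["id"] for row in selected]
```

Filtering on metadata first means the much larger z-score matrix is only touched for the perturbations that are actually needed.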

CMap perturbation types tested:

- knockdown: consensus signature from shRNAs targeting the same gene
- overexpression: cDNA for overexpression of wild-type gene
- compound

Because the CMap perturbation data is usually too large to fit in the available RAM, there are two options for loading it:

- On-demand (default): load ~1GB chunks of filtered z-scores while comparing data
- Pre-load: load all filtered z-scores into memory before comparing data
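The difference between the two loading options can be sketched as follows. This is a simplified Python illustration, not the package's implementation: `store` stands in for the z-scores on disk, `load_zscores` for the read operation, chunks are sized by number of perturbations rather than by bytes, and a dot product is used as a placeholder similarity score.

```python
def iter_chunks(ids, chunk_size):
    """Yield consecutive slices of identifiers, chunk_size at a time."""
    for start in range(0, len(ids), chunk_size):
        yield ids[start:start + chunk_size]


def load_zscores(store, ids):
    """Stand-in for reading the selected perturbations from disk."""
    return {pid: store[pid] for pid in ids}


def placeholder_score(user, zscores):
    """Dot product used only as a placeholder for a real similarity measure."""
    return sum(u * z for u, z in zip(user, zscores))


def compare_on_demand(store, ids, user, chunk_size):
    """Load one chunk at a time, so only that chunk is held in memory."""
    scores = {}
    for chunk_ids in iter_chunks(ids, chunk_size):
        chunk = load_zscores(store, chunk_ids)
        for pid, zscores in chunk.items():
            scores[pid] = placeholder_score(user, zscores)
    return scores


def compare_preloaded(store, ids, user):
    """Load every selected perturbation first, then compare."""
    data = load_zscores(store, ids)
    return {pid: placeholder_score(user, z) for pid, z in data.items()}


# Toy data: three perturbations measured on three genes.
store = {"p1": [0.5, -1.0, 2.0], "p2": [1.5, 0.0, -0.5], "p3": [-2.0, 1.0, 0.0]}
user = [1.0, -0.5, 2.0]
ids = list(store)

on_demand = compare_on_demand(store, ids, user, chunk_size=2)
preloaded = compare_preloaded(store, ids, user)
```

Both strategies produce the same scores; they differ only in peak memory use and in how reads are interleaved with the comparison.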

Similarity ranking

CMap data is ranked against user-provided differential expression results. The less similar the data, the higher the final rank value. Similarity is measured using:

- Spearman's correlation coefficient
- Pearson's correlation coefficient
- GSEA-based score (weighted connectivity score as described in the original CMap article)
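To make the two correlation-based measures concrete, here is a minimal pure-Python sketch comparing one user-provided vector with one perturbation vector. The input values are made up, the helper names are hypothetical, and the GSEA-based score is deliberately left out because it depends on details not given here.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Rank values from smallest (1) to largest; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up differential expression values for four genes.
user = [2.0, -1.0, 0.5, 3.0]
perturbation = [1.0, -2.0, 0.0, 10.0]

r_pearson = pearson(user, perturbation)
r_spearman = spearman(user, perturbation)
```

In this toy case the genes are ordered identically in both vectors, so Spearman's coefficient is 1, while Pearson's coefficient is lower because the relationship is not linear.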

The scores from each method are ranked. These per-method ranks are then combined into a rank product, and the rank of that rank product is the final rank.
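The ranking procedure can be sketched in Python as below. This is an illustration under stated assumptions rather than the package's code: the scores are invented, the rank product is taken as the plain product of the per-method ranks (a geometric mean would give the same ordering), and tied values simply share the lowest rank of the tie.

```python
import math

# Invented similarity scores for three perturbations and three methods.
scores = {
    "spearman": {"A": 0.9, "B": 0.1, "C": -0.5},
    "pearson": {"A": 0.7, "B": 0.8, "C": -0.2},
    "gsea": {"A": 0.6, "B": 0.2, "C": -0.4},
}


def rank(values, largest_first):
    """Map each key to its rank; tied values share the lowest rank of the tie."""
    ordered = sorted(values.values(), reverse=largest_first)
    return {key: ordered.index(value) + 1 for key, value in values.items()}


# Step 1: within each method, the most similar perturbation gets rank 1.
per_method_ranks = {
    method: rank(values, largest_first=True) for method, values in scores.items()
}

# Step 2: combine the per-method ranks into a rank product per perturbation.
perturbations = list(scores["spearman"])
rank_product = {
    p: math.prod(per_method_ranks[method][p] for method in scores)
    for p in perturbations
}

# Step 3: rank the rank products; the smallest product gets final rank 1.
final_rank = rank(rank_product, largest_first=False)
```

Here perturbation `C` is the least similar under every method, so it ends up with the highest final rank value.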


