Introduction to uniConSig and CSEA


UniConSig

UniConSig is an algorithm that quantifies the functional association between a gene set and a single gene. Here, "functional association" refers to shared molecular concepts such as domains, pathways, GO terms, and interactomes, which are the sources of the pre-compiled molecular concept database. The input is a set of genes grouped by a certain criterion, e.g. cancer genes, genes down-regulated after knockdown of a certain gene, or genes included in a certain pathway. The algorithm assigns a score to each gene in the genome according to the similarities between the input gene set and the concepts associated with that gene. Scores range from 0 to 1; the higher the score, the more closely the gene is related to the input gene set.

This algorithm differs from common learning algorithms in that it does not select features (each concept can be treated as a feature). Instead, it penalizes the concepts associated with a gene according to the similarities among those concepts. After penalization, similar concepts contribute to the final score as if they were a single concept (for example, two or more identical concepts are treated as one concept). The final score is the average of the Jaccard indices between the input gene set and all the penalized concepts.
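
The idea of penalizing redundant concepts can be illustrated with a small toy example. This is only a sketch under stated assumptions, not the package's actual implementation: the gene and concept names are made up, and the penalty used here (weighting each concept by the inverse of its summed Jaccard similarity to all concepts of the gene) is an assumed scheme chosen because it makes identical concepts count as one.

```r
# Jaccard index between two gene sets
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical training gene set and concepts associated with one gene
training <- c("g1", "g2", "g3", "g4")
concepts <- list(
  pathwayA      = c("g1", "g2", "g5"),
  pathwayA_copy = c("g1", "g2", "g5"),  # exact duplicate of pathwayA
  domainB       = c("g3", "g6")
)

# Similarity of each concept to the training gene set
sim_to_training <- vapply(concepts, jaccard, numeric(1), b = training)

# Pairwise similarity among the concepts themselves
n <- length(concepts)
pairwise <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    pairwise[i, j] <- jaccard(concepts[[i]], concepts[[j]])
  }
}

# Assumed penalty: a concept shared k times gets weight 1/k
weights <- 1 / rowSums(pairwise)

score_unpenalized <- mean(sim_to_training)
score_penalized   <- sum(weights * sim_to_training) / sum(weights)
```

In this toy case the penalized score equals the plain average over the two distinct concepts, so the duplicated pathway no longer inflates the result.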

This R package uses a pre-calculated result dataset. Pre-calculating these results avoids most of the redundant computation and dramatically improves efficiency. Because the dataset is relatively large, it was split into 4 parts. Users can reconstruct the complete pre-calculated results by following the examples in this vignette.

CSEA

CSEA stands for "Concept Signature Enrichment Analysis". It is essentially an extension of the well-established pathway analysis algorithm GSEA (Gene Set Enrichment Analysis). In contrast to GSEA, CSEA uses the output of uniConSig as the weight for each gene in the genome and then performs a weighted random-walk Kolmogorov-Smirnov (K-S) test for each pathway gene set being tested. Specifically, before the weighted random-walk K-S test, each gene is given a score based on its functional association with the top 50 differentially expressed genes. The advantage of CSEA is that uniConSig can detect the underlying association between a single gene and the differentially expressed gene set, even when that gene is not directly included in the core pathway that responds to the change in biological conditions.
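
A weighted random-walk K-S statistic of this kind can be sketched in a few lines. The function below follows the general GSEA-style running sum and is an assumption for illustration only: the function name `weighted_ks`, the toy gene names, and the scores are hypothetical, and the CSEA function in this package may differ in its details.

```r
# GSEA-style weighted running-sum statistic (illustrative sketch only)
weighted_ks <- function(scores, gene_set) {
  ranked <- sort(scores, decreasing = TRUE)        # rank genes by score
  hit <- names(ranked) %in% gene_set               # which ranked genes are in the set
  hit_step <- ifelse(hit, abs(ranked), 0)
  hit_step <- hit_step / sum(hit_step)             # hits step up, weighted by score
  miss_step <- ifelse(hit, 0, 1 / sum(!hit))       # misses step down uniformly
  running <- cumsum(hit_step - miss_step)
  unname(running[which.max(abs(running))])         # maximum deviation from zero
}

# Hypothetical per-gene scores standing in for uniConSig output
scores <- c(g1 = 0.9, g2 = 0.8, g3 = 0.4, g4 = 0.3, g5 = 0.1, g6 = 0.05)

es_top    <- weighted_ks(scores, c("g1", "g2"))  # set concentrated at the top
es_bottom <- weighted_ks(scores, c("g5", "g6"))  # set concentrated at the bottom
```

A gene set concentrated among the highest-scoring genes yields a positive statistic, while one concentrated among the lowest-scoring genes yields a negative one.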

Installation


To install this package, start R and enter:

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("uniConSigPreCal")

Examples


Calculating uniConSig scores

Attach the package, download the pre-calculation file, and load the pre-compiled data. The pre-calculation file is stored on GitHub.

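library("uniConSigPreCal")
# Download the pre-calculation file; the returned value is passed to load() below
preCal.local <- get_data_uniConSigPreCal()
# Load the pre-compiled data (provides preCal.data.all, used in the calls below)
load(preCal.local)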

This will create a new directory under the current working directory.

Load the cancer gene list generated by Cancer Gene Census:

data(trList.cgc)  # trList.cgc is a vector of Entrez IDs of cancer genes from the Cancer Gene Census

For a customized training gene set, put the Entrez gene IDs in a vector:

trList.my<-1:100

Then run cal_uniConSig:

result.cgc<-cal_uniConSig(trList.cgc,preCal=preCal.data.all)
result.my<-cal_uniConSig(trList.my,preCal=preCal.data.all) 

CSEA calculation

To calculate CSEA based on differentially expressed (DE) genes, first calculate uniConSig scores based on the top DE genes (the top 50 genes usually perform better). Suppose trList.DEG is a simple vector containing the Entrez gene IDs of the top 50 DE genes:

trList.DEG<-1:50 #user's gene list should be generated based on differentially expressed genes
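
In practice, such a vector would typically come from a differential expression result table. The sketch below shows one way to pick the top 50 genes, assuming a hypothetical data frame `de_table` with an `entrez` column and a test statistic column `t_stat`; both the table and its column names are made up for illustration, and the values are simulated.

```r
# Simulated differential expression table (hypothetical columns and values)
set.seed(1)
de_table <- data.frame(
  entrez = 1001:1200,   # placeholder Entrez gene IDs
  t_stat = rnorm(200)   # placeholder test statistics
)

# Rank genes by the absolute value of the statistic, strongest first
ranked_table <- de_table[order(abs(de_table$t_stat), decreasing = TRUE), ]

# Keep the Entrez IDs of the 50 most strongly changed genes
trList.top50 <- head(ranked_table$entrez, 50)
```

The resulting vector can be used in place of trList.DEG in the calls that follow.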

Load the pathway gene sets data, for example the MSigDB c2cp gene sets:

data(pathway.c2cp) #This package also includes a compiled list of hallmark gene sets from MSigDB

Then calculate the uniConSig scores, followed by CSEA:

result.uniConSig.DEG<-cal_uniConSig(trList.DEG,preCal=preCal.data.all)
result.csea.DEG.c2cp<-CSEA(result.uniConSig.DEG,pathway.c2cp) # CSEA calculation
head(result.csea.DEG.c2cp)

The results are stored in the table "result.csea.DEG.c2cp". For other uses of CSEA, simply specify the input training gene list of interest.

Session info

sessionInfo()


wangxlab/uniConSigPreCal documentation built on May 23, 2019, 9:31 a.m.