GIGSEAdata is the gene set collection used by GIGSEA (Genotype Imputed Gene Set Enrichment Analysis), a novel SNP enrichment method that uses differential gene expression imputed from GWAS and eQTL data to test trait-associated SNPs for gene set enrichment. The gene sets are stored as matrices. Because these matrices are largely sparse, we built them as sparse matrices with the functions provided by the R package "Matrix" to save space, and saved them in the GIGSEAdata package.
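As a rough illustration of the space savings, the sketch below builds a toy 0/1 gene-by-gene-set matrix with `Matrix::sparseMatrix()` and compares its memory footprint with that of the equivalent dense matrix. The dimensions, set sizes, and names are hypothetical and are not taken from GIGSEAdata; the package's actual matrices may have been built differently.

```r
library(Matrix)

# Hypothetical toy example (not GIGSEAdata): 2000 genes x 300 gene sets,
# each gene set containing 20 distinct, randomly chosen genes.
set.seed(1)
n_genes <- 2000
n_sets <- 300
set_size <- 20

# Row (gene) indices of the non-zero entries, one block of 20 per gene set;
# sampling without replacement avoids duplicate entries within a column.
i <- unlist(lapply(seq_len(n_sets), function(k) sample(n_genes, set_size)))
# Column (gene set) index of each non-zero entry
j <- rep(seq_len(n_sets), each = set_size)

# Compressed sparse column matrix storing only the non-zero entries
gs_sparse <- sparseMatrix(
  i = i, j = j, x = 1,
  dims = c(n_genes, n_sets),
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("set", seq_len(n_sets)))
)
# The same matrix stored densely, with every zero kept explicitly
gs_dense <- as.matrix(gs_sparse)

# Compare the memory footprint of the two representations
print(object.size(gs_sparse), units = "Kb")
print(object.size(gs_dense), units = "Kb")
```

Only about 1% of the entries are non-zero here, so the sparse object is far smaller than the dense one; the savings depend on how sparse a given gene set collection is.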
GIGSEA is built on a weighted linear regression model, so it accommodates both discrete-valued and continuous-valued gene sets. The GIGSEA package already includes four categories of gene sets: "MSigDB.KEGG.Pathway", "MSigDB.miRNA", "MSigDB.TF", and "TargetScan.miRNA". Here, we add two more categories in the GIGSEAdata package:
1) discrete-valued gene sets:
- org.Hs.eg.GO: Gene sets that contain genes annotated by the same Gene Ontology (GO) term. For each GO term, the gene set includes not only the genes annotated to the term itself but also the genes annotated to its offspring (descendant) terms. See the databases "org.Hs.eg.GO.db" and "GO.db" in R.
2) continuous-valued gene sets:
- Fantom5.TF: The human transcript promoter locations were obtained from Fantom5. Based on these promoter locations, the tool MotEvo was used to predict human transcription factor (TF) target sites. The dataset contains 500 Position Weight Matrices (PWMs) and 21964 genes. Each PWM is associated with a list of human TFs, ordered by the percent identity of the TFs known to bind sites of that PWM; this list of associations was checked manually. The full set of PWMs and their mapping to associated TFs is available from the SwissRegulon website http://www.swissregulon.unibas.ch.
- TargetScan.miRNA: Gene sets of predicted human miRNA target sites were downloaded from TargetScan. TargetScan groups miRNAs that have identical subsequences at positions 2 through 8 of the miRNA, i.e. the 2-7 seed region plus the 8th nucleotide, and provides predictions for each such seed motif. For each seed motif and each RefSeq transcript, TargetScan provides a score called preferential conservation scoring (aggregate Pct), which shows consistently high performance in various benchmark tests. To obtain a per-gene score, we average the TargetScan aggregate Pct scores of all RefSeq transcripts associated with each gene. The resulting gene sets cover all 87 human miRNA seed motifs in TargetScan and 9861 genes. See http://www.targetscan.org.
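A weighted linear regression treats a gene set simply as a numeric covariate, which is why 0/1 and continuous gene sets can be handled the same way. The minimal sketch below regresses a simulated gene-level statistic on a 0/1 gene set and, separately, on a continuous gene set using base R's `lm()` with its `weights` argument. All data are simulated, and the per-gene weights are an assumption made for illustration; this is not the GIGSEA implementation, whose exact response variable, weights, and test statistic may differ.

```r
# Simulated (hypothetical) data for 500 genes
set.seed(42)
n <- 500
w <- runif(n, min = 0.1, max = 1)          # assumed per-gene weights
in_set <- rbinom(n, size = 1, prob = 0.1)  # discrete-valued gene set (0/1)
score <- rbeta(n, shape1 = 1, shape2 = 5)  # continuous-valued gene set
# Gene-level statistic; genes with larger weights are simulated with less noise
y <- 0.5 * in_set + 1.0 * score + rnorm(n, sd = 1 / sqrt(w))

# Weighted least squares: the same formula works for either kind of gene set
fit_discrete <- lm(y ~ in_set, weights = w)
fit_continuous <- lm(y ~ score, weights = w)

# Estimate, standard error, t value and p value of the gene set term
print(coef(summary(fit_discrete))["in_set", ])
print(coef(summary(fit_continuous))["score", ])
```

A positive, significant slope indicates that genes with higher gene set values tend to have higher statistics; for a 0/1 gene set, the slope is the weighted difference in mean statistic between member and non-member genes.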
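Two of the construction rules described in the list above can be sketched in base R on hypothetical toy inputs: a GO term's gene set also includes the genes annotated to all of its offspring terms, and a gene's TargetScan value is the mean aggregate Pct over that gene's transcripts. The term names, gene names, transcript rows, and scores are invented, and giving a transcript without a predicted site a score of 0 is an assumption; the real gene sets were derived from the GO annotation databases and TargetScan downloads, possibly with different handling.

```r
## (a) GO: propagate annotations from offspring terms (hypothetical toy DAG)
children <- list(            # each term mapped to its direct child terms
  GO_A = c("GO_B", "GO_C"),
  GO_B = "GO_D",
  GO_C = character(0),
  GO_D = character(0)
)
direct <- list(              # hypothetical direct gene annotations
  GO_A = "g1",
  GO_B = "g2",
  GO_C = c("g3", "g4"),
  GO_D = "g5"
)

# All offspring (descendants) of a term, found by recursive traversal
offspring <- function(term) {
  kids <- children[[term]]
  unique(c(kids, unlist(lapply(kids, offspring))))
}

# Gene set of a term = its own genes plus the genes of all its offspring
propagated <- lapply(names(direct), function(term) {
  sort(unique(unlist(direct[c(term, offspring(term))])))
})
names(propagated) <- names(direct)

# 0/1 matrix with genes in rows and GO terms in columns
genes <- sort(unique(unlist(direct)))
go_net <- sapply(propagated, function(s) as.integer(genes %in% s))
rownames(go_net) <- genes

## (b) TargetScan: average transcript-level aggregate Pct per gene
# Hypothetical transcript-level scores; 0 is assumed to mean "no site"
tx <- data.frame(
  gene = c("g1", "g1", "g2", "g3"),
  transcript = c("tx1", "tx2", "tx3", "tx4"),
  motif1 = c(0.8, 0.4, 0.0, 0.3),
  motif2 = c(0.0, 0.2, 0.9, 0.0)
)
pct <- as.matrix(tx[, c("motif1", "motif2")])
# Per-gene sum divided by the number of transcripts of that gene (= mean);
# rowsum() and table() both order genes alphabetically, so rows line up
gene_pct <- rowsum(pct, tx$gene) / as.vector(table(tx$gene))

print(go_net)
print(gene_pct)
```

In this toy example, GO_A ends up containing all five genes because every other term is one of its offspring, and gene g1 receives the mean of its two transcripts' scores.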
We first take the gene set "org.Hs.eg.GO" as an example. In its matrix, each row represents a gene and each column represents a GO term. Each entry takes the discrete value 0 or 1, where 1 indicates that the gene (row) belongs to the GO term (column) and 0 indicates that it does not.
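```r
library(GIGSEAdata)
data(org.Hs.eg.GO)                 # load the GO gene sets
class(org.Hs.eg.GO)                # type of the stored object
names(org.Hs.eg.GO)                # its components
dim(org.Hs.eg.GO$net)              # number of genes x number of GO terms
head(colnames(org.Hs.eg.GO$net))   # GO terms (columns)
head(rownames(org.Hs.eg.GO$net))   # genes (rows)
head(org.Hs.eg.GO$annot)           # annotation accompanying the gene sets
head(org.Hs.eg.GO$net[, 1:30])     # first rows of the first 30 GO terms
```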
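Because each column of such a 0/1 matrix is a membership indicator, common queries reduce to simple row and column operations. The sketch below uses a small hypothetical sparse matrix with the same layout (genes in rows, GO terms in columns); the gene and term names are invented. Assuming `org.Hs.eg.GO$net` is a sparse matrix from the "Matrix" package, the same calls should carry over to it.

```r
library(Matrix)

# Hypothetical toy membership matrix: rows are genes, columns are GO terms,
# and an entry of 1 means the gene belongs to the term.
net <- sparseMatrix(
  i = c(1, 2, 3, 2, 4),
  j = c(1, 1, 1, 2, 3),
  x = 1,
  dims = c(4, 3),
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("term1", "term2", "term3"))
)

# Size of each gene set: column sums of the 0/1 matrix
term_size <- colSums(net)

# Genes belonging to one term: rows with a non-zero entry in that column
genes_in_term1 <- rownames(net)[net[, "term1"] != 0]

# Terms containing one gene: columns with a non-zero entry in that row
terms_of_geneB <- colnames(net)[net["geneB", ] != 0]

print(term_size)
print(genes_in_term1)
print(terms_of_geneB)
```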