TFactS predicts which transcription factors (TFs) are regulated in a biological condition, based on lists of differentially expressed genes (DEGs) obtained from transcriptome experiments. The TFactSR package is based on the TFactS concept and expands it: it lets users perform a TFactS-like enrichment analysis. The package can import and use the original catalogue file from the TFactS website as well as user-defined catalogues for organisms that TFactS does not support (e.g., Arabidopsis).
This vignette is largely based on the TFactS manual. For the details about TFactS, please also see the original paper by Essaghir et al. (2010).
Briefly, the current package assumes the Sign-Less catalogue, i.e. a catalogue that does not contain any regulation type information (up- or down-regulation). TFactSR compares the list of query DEGs (up and/or down) with a catalogue of target gene signatures. The core algorithm is based on Fisher's exact test, using the following contingency table for each TF:
TF | DEGs: Present | DEGs: Absent | Total
------------------ | --------------- | --------------- | -----
Catalogue: Present | k | m - k | m
Catalogue: Absent | n - k | N + k - n - m | N - m
Total | n | N - n | N
$$ Pval = \sum_{i=k}^{\min(n,m)} \left( \begin{array}{c} m \\ i \end{array} \right) \left( \begin{array}{c} N-m \\ n-i \end{array} \right) / \left( \begin{array}{c} N \\ n \end{array} \right) $$
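As a minimal sketch of the test above, the table can be filled with made-up counts and the p-value computed in base R. The values of `N`, `m`, `n` and `k` below are illustrative assumptions, not taken from any catalogue, and the sketch assumes the one-sided (enrichment) form of Fisher's exact test. It is not the package's internal code.

```r
# Illustrative counts (assumptions, not real data)
N <- 1000  # total number of genes considered
m <- 40    # catalogue targets of the TF
n <- 50    # query DEGs
k <- 8     # query DEGs that are also catalogue targets of the TF

# 2 x 2 contingency table laid out as in the table above
tab <- matrix(c(k,     m - k,
                n - k, N + k - n - m),
              nrow = 2, byrow = TRUE,
              dimnames = list(Catalogue = c("Present", "Absent"),
                              DEGs      = c("Present", "Absent")))

# One-sided Fisher's exact test (enrichment)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Equivalent hypergeometric upper tail, P(X >= k)
p_hyper <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)

# The same tail written as an explicit sum over i = k, ..., min(n, m)
i <- k:min(n, m)
p_sum <- sum(choose(m, i) * choose(N - m, n - i) / choose(N, n))
```

All three values should agree up to floating-point error, which is a convenient sanity check when building a custom catalogue.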
E-value is the number of tests done ($T$) times the p-value.
$Eval = Pval \times T$
Benjamini and Hochberg false discovery rate (FDR): this is the FDR-controlling method of Benjamini and Hochberg (1995), calculated with the p.adjust() function. Note that, under default settings, the current TFactSR package does not use the Q-value (Storey 2003).
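The two corrections can be illustrated with base R alone. The p-values below are made-up numbers for five hypothetical TFs, and the sketch assumes that the number of tests is simply the number of TFs tested.

```r
# Hypothetical p-values for five TFs (made-up numbers)
pval <- c(TF1 = 0.0004, TF2 = 0.012, TF3 = 0.03, TF4 = 0.2, TF5 = 0.7)

# Number of tests done (assumed here to be one test per TF)
n_tests <- length(pval)

# E-value: p-value times the number of tests
e_value <- pval * n_tests

# Benjamini-Hochberg FDR via p.adjust()
fdr <- p.adjust(pval, method = "BH")

data.frame(p.value = pval, e.value = e_value, fdr = fdr)
```

Unlike the FDR, an E-value is not capped at 1: it reads as the expected number of TFs reaching that p-value by chance.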
RC is the percentage of random simulations of the user list, over a specified number of repetitions, in which a TF is called significant under a given E-value threshold ($\lambda$):
$$ RC_{(TF)} = \frac{\#\left\{ Eval(TF) \leq \lambda \right\} \times 100} { \#\left\{ rep \right\} } $$
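The RC calculation can be sketched as a small simulation. Everything below is an assumption made for illustration: the toy gene universe, the three-TF catalogue, the helper `evalue_per_tf()`, and the choice to draw random lists uniformly from the universe. The package may run its simulation differently.

```r
set.seed(1)  # reproducible simulation

# Toy universe and sign-less catalogue (hypothetical names and sizes)
universe  <- paste0("gene", 1:500)
catalogue <- list(TF_A = universe[1:30],
                  TF_B = universe[21:80],
                  TF_C = universe[200:210])

# E-values for one gene list: hypergeometric upper tail per TF,
# multiplied by the number of TFs tested (hypothetical helper)
evalue_per_tf <- function(genes, catalogue, universe) {
  N <- length(universe)
  n <- length(genes)
  p <- vapply(catalogue, function(targets) {
    m <- length(targets)
    k <- length(intersect(genes, targets))
    phyper(k - 1, m, N - m, n, lower.tail = FALSE)
  }, numeric(1))
  p * length(catalogue)
}

lambda <- 0.05  # E-value threshold
n_rep  <- 100   # number of random repetitions
n_degs <- 40    # size of the user list being mimicked

# One column per repetition; TRUE where Eval(TF) <= lambda
hits <- replicate(n_rep, {
  random_genes <- sample(universe, n_degs)
  evalue_per_tf(random_genes, catalogue, universe) <= lambda
})

# RC: percentage of repetitions in which each TF is called significant
rc <- rowSums(hits) * 100 / n_rep
```

A TF with a high RC is often called significant for random lists of the same size, so its enrichment in the real list deserves more caution.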
The TFactSR package requires (1) a list of DEGs and (2) a catalogue of interest. For Arabidopsis, we prepared the catalogue based on AtRegNet and ATRM. For human data, the package can do the calculation using default settings.
The original TFactS supports human, rat and mouse genes. As shown below, if you have a list of DEGs and a catalogue, you can run an enrichment analysis to find which TFs are regulated.
For human/rat/mouse data, we can do the TFactS analysis as follows.
```{R the original TFactS}
library(TFactSR)
data(DEGs)
data(catalog)

# TFs and target genes taken from the catalogue
tftg <- extractTFTG(DEGs, catalog)
TFs <- tftg$TFs
all.targets <- tftg$all.targets

# Enrichment analysis
res <- calculateTFactS(DEGs, catalog, TFs, all.targets)
head(res)
```

Using the options "TF.col" and "TG.col", we can specify which columns of your catalogue dataset hold the TFs and the target genes. Take care to choose the columns that match the TF-target relationships, as follows.

```{R Arabidopsis}
data(AtCatalog)
data(GenesUp_SH1H)

# Name the TF and target-gene columns of the catalogue explicitly
d <- extractTFTG(GenesUp_SH1H, AtCatalog, TF.col = "TF", TG.col = "target.genes")

res <- calculateTFactS(GenesUp_SH1H, AtCatalog, d$TFs, d$all.targets, TF.col = "TF")
head(res)
```
We thank the Bio"Pack"thon community for helpful discussions. This work was supported by JSPS KAKENHI Grant Numbers 26850024 and 17K07663.
Here is the output of sessionInfo() on the system on which this document was compiled:
sessionInfo()