export-funs: Generate and manipulate tables and sub-networks from an...

Description

In general, generateResultsTable, generateEnzymesTable and generateResultsGraph provide the results of an enrichment in several formats.

Function generateResultsTable returns a table that contains the best hits from a FELLA.USER object with a successful enrichment analysis. Similarly, generateEnzymesTable returns a data frame with the best scoring enzyme families and their annotated genes.

Function generateResultsGraph gives a sub-network, plottable through plotGraph, with the nodes that have the lowest p.score in an enrichment analysis. Function addGOToGraph can be applied to such sub-networks to overlay GO labels and similarity to a user-defined GO term.

Function exportResults is a wrapper around generateResultsTable, generateEnzymesTable and generateResultsGraph to write the results to files.

Usage

generateResultsTable(method = "diffusion", threshold = 0.05,
    plimit = 15, nlimit = 250, LabelLengthAtPlot = 45,
    capPscores = 1e-06, object = NULL, data = NULL, ...)

generateEnzymesTable(method = "diffusion", threshold = 0.05,
    nlimit = 250, LabelLengthAtPlot = 45, capPscores = 1e-06,
    mart.options = list(biomart = "ensembl", dataset =
    "hsapiens_gene_ensembl"), object = NULL, data = NULL, ...)

generateResultsGraph(method = "diffusion", threshold = 0.05,
    plimit = 15, nlimit = 250, thresholdConnectedComponent = 0.05,
    LabelLengthAtPlot = 22, object = NULL, data = NULL, ...)

exportResults(format = "csv", file = "myOutput",
    method = "diffusion", object = NULL, data = NULL, ...)

addGOToGraph(graph = NULL, GOterm = NULL, godata.options = list(OrgDb
    = "org.Hs.eg.db", ont = "CC"), mart.options = list(biomart = "ensembl",
    dataset = "hsapiens_gene_ensembl"))

plotGraph(graph = NULL, layout = FALSE, graph.layout = NULL,
    plotLegend = TRUE, plot.fun = "plot.igraph", NamesAsLabels = TRUE,
    ...)

Arguments

method

Character, one of "diffusion" or "pagerank". generateResultsTable additionally accepts "hypergeom", as used in the Examples.

threshold

Numeric value between 0 and 1. p.score threshold applied when filtering KEGG nodes. Lower thresholds are more stringent.

plimit

Pathway limit, must be a numeric value between 1 and 50. Limits the number of pathways reported when method = "hypergeom"

nlimit

Node limit, must be a numeric value between 1 and 1000. Limits the order of the solution sub-graph when method = "diffusion" or method = "pagerank"

LabelLengthAtPlot

Numeric value between 10 and 50. Maximum length that a label can reach when plotting the graph. The remaining characters will be truncated using "..."

capPscores

Numeric value, minimum p-score admitted for the readable formatting. Smaller p-scores will be displayed as < capPscores

object

FELLA.USER object

data

FELLA.DATA object

...

Optional arguments for the plotting function in plotGraph. Arguments passed to the exporting function in exportResults. Ignored otherwise.

mart.options

List, options for the biomaRt function getBM. Importantly, this defines the organism, see listDatasets for possibilities. If calling generateEnzymesTable, the user can set mart.options = NULL to avoid adding GO labels to enzymes.

thresholdConnectedComponent

Numeric value between 0 and 1. Connected components whose estimated probability of arising by chance (see Details) is below the threshold are kept, while those exceeding it (because they are too small) are discarded.

format

Character, one of: "csv" for regular results table, "enzyme" for table with enzyme data, "igraph" for igraph format. Alternatively, any format supported by igraph, see write_graph

file

Character specifying the output file name

graph

An igraph object, typically a small one, coming from an enrichment through "diffusion" or "pagerank".

GOterm

Character, GO entry to draw semantic similarity in the solution graph. If NULL, the GO labels will be appended without similarities.

godata.options

List, options for the database creator godata

layout

Logical, should the plot be returned as a layout?

graph.layout

Two-column numeric matrix, if this argument is not null then it is used as graph layout

plotLegend

Logical, should the legend be plotted as well?

plot.fun

Character, can be either "plot.igraph" or "tkplot"

NamesAsLabels

Logical, should KEGG names be displayed as labels instead of KEGG identifiers?
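
The formatting arguments LabelLengthAtPlot and capPscores can be pictured with a short base R sketch. The helper names truncate_label and cap_pscores are hypothetical and are not part of FELLA; the sketch assumes that the "..." suffix counts towards the maximum label length, which the text above does not specify.

```r
# Hypothetical helpers, NOT part of FELLA: they only illustrate the
# behaviour described for LabelLengthAtPlot and capPscores.

# Shorten labels longer than max_len and mark the cut with "...".
# Assumption: the three dots count towards max_len.
truncate_label <- function(x, max_len = 45) {
    too_long <- nchar(x) > max_len
    x[too_long] <- paste0(substr(x[too_long], 1, max_len - 3), "...")
    x
}

# Display p-scores below the cap as "<cap" instead of their exact value.
cap_pscores <- function(p, cap = 1e-06) {
    ifelse(
        p < cap,
        paste0("<", formatC(cap, format = "g")),
        formatC(p, format = "g"))
}

truncate_label(c("Citrate cycle (TCA cycle)", strrep("a", 60)), max_len = 45)
cap_pscores(c(0.03, 2e-09))
```

The actual formatting in FELLA may differ in detail; the point is only that long labels are shortened for plotting and very small p-scores are reported as an upper bound.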

Details

Functions generateResultsTable and generateEnzymesTable need a FELLA.DATA object and a FELLA.USER object with a successful enrichment. generateResultsTable provides the entries whose p-score is below the chosen threshold in a tabular format. generateEnzymesTable returns a table that contains (1) the enzymes that are below the user-defined p-score threshold, along with (2) the genes that belong to the enzymatic families in the organism defined in the database, and (3) GO labels of such enzymes, if mart.options is not NULL and points to the right database.

Function generateResultsGraph returns an igraph object with a relevant sub-network for manual examination. A FELLA.USER object with a successful enrichment analysis and the corresponding FELLA.DATA must be supplied. Graph nodes are prioritised by p.score and selected using the more stringent of (1) the p.score threshold and (2) the maximum number of nodes nlimit.
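
As an illustration of this selection rule, the following base R sketch keeps the nodes under the p.score threshold and then caps the selection at nlimit nodes. The function select_nodes and the toy p-scores are hypothetical and are not taken from FELLA; the strict comparison with the threshold is also an assumption.

```r
# Hypothetical helper, NOT part of FELLA: pick node names by p-score,
# applying whichever of the two criteria is more stringent.
select_nodes <- function(pscores, threshold = 0.05, nlimit = 250) {
    # Keep nodes under the threshold, ordered from lowest to highest p-score
    below <- sort(pscores[pscores < threshold])
    # Then keep at most nlimit of them
    names(head(below, nlimit))
}

# Toy p-scores for five made-up nodes
pscores <- c(C1 = 0.001, R1 = 0.2, E1 = 0.03, M1 = 0.04, P1 = 0.01)

# Four nodes pass the threshold, but nlimit = 3 is more stringent
select_nodes(pscores, threshold = 0.05, nlimit = 3)

# Here the threshold is more stringent than nlimit
select_nodes(pscores, threshold = 0.02, nlimit = 3)
```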

There is an additional filtering feature for tiny connected components, controllable through thresholdConnectedComponent (smaller is stricter). The user can turn off this filter by setting thresholdConnectedComponent = 1. The idea is to discard connected components so small that they are likely to arise from a random selection of nodes. Let k be the order of the current sub-network. A connected component of order r will be kept only if the probability that a random subgraph of order k from the whole KEGG knowledge model contains a connected component of order at least r is smaller than thresholdConnectedComponent. Such probabilities are estimated during buildDataFromGraph; the number of random trials can be controlled by its niter argument.
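
The probability behind this filter can be approximated by simulation. The base R sketch below is not FELLA's implementation: it uses a toy path graph stored as an adjacency matrix, and the helpers component_orders and estimate_prob are hypothetical names. It draws random node sets of order k and counts how often the induced subgraph contains a connected component of order at least r.

```r
# Hypothetical helpers, NOT part of FELLA.

# Orders of the connected components induced by `nodes` in a graph
# given as a symmetric 0/1 adjacency matrix `adj`.
component_orders <- function(adj, nodes) {
    orders <- integer(0)
    unvisited <- nodes
    while (length(unvisited) > 0) {
        # Start a breadth-first search from the next unvisited node
        frontier <- unvisited[1]
        unvisited <- unvisited[-1]
        size <- 0
        while (length(frontier) > 0) {
            size <- size + length(frontier)
            # Unvisited nodes adjacent to any node in the frontier
            reached <- colSums(adj[frontier, unvisited, drop = FALSE]) > 0
            frontier <- unvisited[reached]
            unvisited <- unvisited[!reached]
        }
        orders <- c(orders, size)
    }
    orders
}

# Fraction of random node sets of order k whose induced subgraph has
# a connected component of order at least r.
estimate_prob <- function(adj, k, r, niter = 1000) {
    hits <- replicate(niter, {
        nodes <- sample(nrow(adj), k)
        max(component_orders(adj, nodes)) >= r
    })
    mean(hits)
}

# Toy graph: a path 1 - 2 - 3 - 4 - 5
adj <- matrix(0, 5, 5)
for (i in 1:4) {
    adj[i, i + 1] <- 1
    adj[i + 1, i] <- 1
}

set.seed(1)
# Chance that 3 random nodes form a single connected component of order 3
p <- estimate_prob(adj, k = 3, r = 3, niter = 200)
# The component would be kept only if p is below the chosen threshold
p < 0.05
```

In this sketch the estimate is recomputed on demand, whereas the text above states that FELLA estimates such probabilities once, during buildDataFromGraph.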

Function exportResults writes the enrichment results as the specified file type. Options are: a csv table ("csv"), an enzyme csv table ("enzyme"), an igraph object as an RData file ("igraph"), or any format supported by igraph's write_graph.

Function addGOToGraph takes a graph object of class igraph and returns it with the following attributes added: GO labels in V(graph)$GO, and semantic similarities in V(graph)$GO.simil if GOterm != NULL.

The GO database describes genes in terms of three ontologies: molecular function (MF), biological process (BP) and cellular component (CC) [Gene Ontology Consortium, 2015]. The user can be interested in finding which enzymatic families reported with a low p.score are closest to a particular GO term. To assess similarity between GO labels, FELLA uses the semantic similarity defined in [Yu, 2010] and their implementation in the GOSemSim R package. The user will obtain, for each enzymatic family, the closest GO term to his or her GO query and the semantic similarity between them. Exact matches have a similarity of 1. Function plotGraph detects the presence of the GO similarity option and plots its magnitude.

Function plotGraph plots a solution graph from the diffusion and pagerank analyses. For plotting hypergeom results, please use plot instead. Specific colors and shapes for each KEGG category are used: pathways are maroon, modules are violet, enzymes are orange, reactions are blue and compounds are green. If the graph contains the similarity to a GO term, enzymes will be displayed as triangles whose color depicts the strength of that similarity (yellow: weak, purple: strong). At the moment, plotGraph allows plotting through the static plot.igraph and the interactive tkplot.

Value

generateResultsTable returns a data.frame that contains the nodes below the p.score threshold from an enrichment analysis

generateEnzymesTable returns a data.frame that contains the enzymes below the p.score threshold, along with their genes and GO labels

generateResultsGraph returns an igraph object: a sub-network from the whole KEGG knowledge model under the specified thresholds (threshold and thresholdConnectedComponent)

exportResults returns invisible(), but as a side effect the specified file is created.

addGOToGraph returns an igraph object, which is the input graph with extra attributes: GO labels in V(graph)$GO, and semantic similarities in V(graph)$GO.simil if GOterm != NULL

plotGraph returns invisible() if layout = FALSE and the plotting layout as a data.frame otherwise.

References

Gene Ontology Consortium. (2015). Gene ontology consortium: going forward. Nucleic acids research, 43(D1), D1049-D1056.

Yu, G., Li, F., Qin, Y., Bo, X., Wu, Y., & Wang, S. (2010). GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics, 26(7), 976-978.

Examples

## First generate a toy enrichment
library(FELLA)
library(igraph)
data(FELLA.sample)
data(input.sample)
## Enrich input
obj <- enrich(
    compounds = input.sample,
    data = FELLA.sample)

######################
## Results table
tab.res <- generateResultsTable(
    method = "hypergeom",
    threshold = 0.1,
    object = obj,
    data = FELLA.sample)
head(tab.res)

tab.res <- generateResultsTable(
    method = "diffusion",
    threshold = 0.1,
    object = obj,
    data = FELLA.sample)
head(tab.res)

######################
## Use wrapper to write the table to a file
out.file <- tempfile()
exportResults(
    format = "csv",
    threshold = 0.1,
    file = out.file,
    object = obj,
    data = FELLA.sample)
tab.wrap <- read.csv(out.file)
head(tab.wrap)

######################
## Enzymes table
tab.ec <- generateEnzymesTable(
    threshold = 0.1,
    object = obj,
    data = FELLA.sample,
    mart.options = NULL)
head(tab.ec)

######################
## Generate graph
g.res <- generateResultsGraph(
    method = "pagerank",
    threshold = 0.1,
    object = obj,
    data = FELLA.sample)
g.res

## Plot graph (without GO terms)
plotGraph(g.res)

## Add similarity to the GO CC term "mitochondrion"
## Not run: 
g.cc <- FELLA:::addGOToGraph(
    graph = g.res,
    GOterm = "GO:0005739")

## Plot graph (with GO terms)
plotGraph(g.cc)

## Without the CC
any(V(g.res)$GO.simil >= 0)
## With the CC
v.cc <- unlist(V(g.cc)$GO.simil)
sum(v.cc >= 0, na.rm = TRUE)
## Similarity values
table(v.cc)

## End(Not run)

b2slab/FELLA documentation built on March 3, 2021, 2:22 p.m.