```r
library(BiocStyle)
library(knitr)
knitr::opts_chunk$set(error = FALSE, message = FALSE, warning = FALSE, cache = FALSE)
# knitr::opts_chunk$set(fig.asp = 1)
```
R version: `r R.version.string`

Bioconductor version: `r BiocManager::version()`

Package: `r packageVersion("txGeneNetwork")`
```{=html}
<!--
This workflow was motivated by ...
Functional analysis is widely used to ...
-->
```
# Getting started

Make sure all required packages are installed, then load them.

```r
# txGeneNetwork:::load_dependencies()
# suppressPackageStartupMessages({
#   library(ggraph)
#   library(tidygraph)
#   library(ggforce)
#   library(readr)
#   library(tidyr)
#   library(ggnewscale)
#   library(concaveman)
# })

# requireNamespace() returns FALSE if a package is not installed
requireNamespace("ggplot2")
requireNamespace("ggforce")
requireNamespace("dplyr")
requireNamespace("readr")
requireNamespace("tidyr")
requireNamespace("ggnewscale")
requireNamespace("tidygraph")
requireNamespace("ggraph")
requireNamespace("concaveman")

library(ggplot2)    # plotting grammar that ggraph builds on
library(dplyr)      # mutate() and the %>% pipe
library(ggforce)    # geom_mark_hull()
library(readr)      # read_csv()
library(tidyr)
library(ggnewscale) # new_scale()
library(tidygraph)  # tbl_graph objects, activate(), centrality_*()
library(ggraph)     # geom_node_*() and geom_edge_*()
library(concaveman) # concave hulls used by geom_mark_hull()
library(txGeneNetwork)
```
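The `requireNamespace()` calls above report availability only through their return value. The following sketch collects the names of any missing packages so you can install them, for example with `BiocManager::install()`. The package vector mirrors the list above, and `missing_pkgs` is an illustrative name.

```r
# Packages used in this workflow (mirrors the library() calls above)
pkgs <- c("ggplot2", "dplyr", "ggforce", "readr", "tidyr",
          "ggnewscale", "tidygraph", "ggraph", "concaveman")

# requireNamespace() returns TRUE/FALSE without attaching the package
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # BiocManager::install(missing_pkgs)
}
```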
## Quick Start

Your input data must be a *.csv* file with **From** and **To** columns. For this graph, the from-to order is Process -\> Gene in one row and Gene -\> Transcript in another row:

+------------+---------------+
| From       | To            |
+============+===============+
| Process\_1 | Gene\_1       |
+------------+---------------+
| Gene\_1    | Transcript\_1 |
+------------+---------------+
| Gene\_1    | Transcript\_2 |
+------------+---------------+

You can add further metadata columns describing the **edges**. They are imported as edge attributes. For this network, we add two columns:

- **Process**: the biological process the edge belongs to.
- **Direction**: whether the transcript is up- or down-regulated in our example data.

+------------+---------------+------------+------------+
| From       | To            | Process    | Direction  |
+============+===============+============+===========:+
| Process\_1 | Gene\_1       | Process\_1 | NA         |
+------------+---------------+------------+------------+
| Gene\_1    | Transcript\_1 | Process\_1 | Up         |
+------------+---------------+------------+------------+
| Gene\_1    | Transcript\_2 | Process\_1 | Down       |
+------------+---------------+------------+------------+

# Plotting

`tidygraph` represents a network as two tibbles, one for the nodes and one for the edges. It wraps them in a `tbl_graph` object so that both can be displayed and manipulated in a tidy way.

```r
example_dataset_path <- system.file("extdata", "example_dataset.csv",
                                    package = "txGeneNetwork")
example_dataset <- read_csv(example_dataset_path)
```
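You can try the expected input format without the package's example file. This sketch builds the second table above as a tibble, writes it to a temporary *.csv* file, and reads it back with `read_csv()`. The object names (`toy_edges`, `toy_path`) are illustrative only.

```r
library(tibble)
library(readr)

# Toy edge list mirroring the table above: one row per edge,
# From/To first, followed by edge metadata columns
toy_edges <- tibble(
  From      = c("Process_1", "Gene_1", "Gene_1"),
  To        = c("Gene_1", "Transcript_1", "Transcript_2"),
  Process   = c("Process_1", "Process_1", "Process_1"),
  Direction = c(NA, "Up", "Down")
)

# Round-trip through a temporary .csv file (NA is written as "NA" and read back as NA)
toy_path <- tempfile(fileext = ".csv")
write_csv(toy_edges, toy_path)
toy_edges_read <- read_csv(toy_path, show_col_types = FALSE)
toy_edges_read
```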
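The `as_tbl_graph()` function creates a `tbl_graph` object directly from our edge table. The first two columns (**From** and **To**) define the edges, and the remaining columns become edge attributes.

```r
example_tbl_graph <- as_tbl_graph(example_dataset)
example_tbl_graph
```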
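To see how `as_tbl_graph()` split the table, you can inspect the two tibbles separately with `as_tibble()` and its `active` argument. This sketch uses a small hypothetical edge list that mirrors the table in the Quick Start, so it runs on its own.

```r
library(tibble)
library(tidygraph)

toy_edges <- tibble(
  From      = c("Process_1", "Gene_1", "Gene_1"),
  To        = c("Gene_1", "Transcript_1", "Transcript_2"),
  Process   = "Process_1",
  Direction = c(NA, "Up", "Down")
)
toy_graph <- as_tbl_graph(toy_edges)

# Nodes: one row per unique name found in From/To, stored in a `name` column
toy_nodes <- as_tibble(toy_graph, active = "nodes")
# Edges: integer from/to indices into the nodes table, plus the edge attributes
toy_edge_tbl <- as_tibble(toy_graph, active = "edges")

toy_nodes
toy_edge_tbl
```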
Now that we have the `tbl_graph` object, we can start plotting the data. `ggraph` uses a syntax very similar to `ggplot2`. Most `ggplot2` extensions also work with `ggraph`, such as the `theme_*()` functions from `ggthemes` and the `geom_*_repel()` functions from `ggrepel`.
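A `ggraph()` call returns a regular `ggplot` object, so standard `ggplot2` functions such as `labs()` and `theme_void()` can be added to it. This sketch also repels node labels with `geom_node_text(repel = TRUE)`, which requires the `ggrepel` package. The toy graph is hypothetical.

```r
library(tibble)
library(ggplot2)
library(tidygraph)
library(ggraph)

toy_graph <- as_tbl_graph(tibble(
  From = c("Process_1", "Gene_1", "Gene_1"),
  To   = c("Gene_1", "Transcript_1", "Transcript_2")
))

p <- ggraph(toy_graph, layout = "kk") +
  geom_edge_link() +
  geom_node_point() +
  # repel = TRUE uses ggrepel to keep labels from overlapping
  geom_node_text(aes(label = name), repel = TRUE) +
  labs(title = "Toy network") +  # ordinary ggplot2 functions still apply
  theme_void()
```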
To start, we construct the basic network from the example data and then add extra information for nodes and edges. For the basic network we use `geom_node_point()` and `geom_edge_link()`.

```r
# Edges are added first so that the nodes are drawn on top of them
example_tbl_graph %>%
  ggraph() +
  geom_edge_link() +
  geom_node_point()
```
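`ggraph()` emits a message saying that it used the `sugiyama` layout by default. In this vignette the message is hidden by the `message = FALSE` chunk option. You can choose a different layout with the `layout` argument of `ggraph()`. Here we use `"kk"`, the Kamada-Kawai force-directed layout.

```r
example_tbl_graph %>%
  ggraph(layout = "kk") +
  geom_edge_link() +
  geom_node_point()
```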
Now we have a network more similar to the final product.
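To compare layouts or reuse one across several plots, you can compute the node coordinates once with `create_layout()`. The result is a data frame with `x` and `y` columns that `ggraph()` accepts directly. Here is a minimal sketch on a hypothetical toy graph:

```r
library(tibble)
library(tidygraph)
library(ggraph)

toy_graph <- as_tbl_graph(tibble(
  From = c("Process_1", "Gene_1", "Gene_1"),
  To   = c("Gene_1", "Transcript_1", "Transcript_2")
))

# Compute the Kamada-Kawai ("kk") layout once
toy_layout <- create_layout(toy_graph, layout = "kk")
head(toy_layout[, c("x", "y", "name")])

# Reuse the same coordinates for a plot
ggraph(toy_layout) +
  geom_edge_link() +
  geom_node_point()
```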
To modify the `tbl_graph` object and add other variables, you can use the usual `dplyr` verbs together with `activate()`. The `activate()` verb selects which tibble, nodes or edges, the following verbs operate on; the nodes are active by default. Here we add a centrality measure[^1] to the nodes and map it to point size with an `aes()` call inside `geom_node_point()`.

```r
example_tbl_graph %>%
  mutate(centrality = centrality_power()) %>% # nodes are active, so this adds a node column
  ggraph(layout = "kk") +
  geom_edge_link() +
  geom_node_point(aes(size = centrality))
```
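Because `activate()` switches which tibble the `dplyr` verbs act on, you can add node and edge columns in one pipeline. This sketch on a hypothetical toy graph adds node degree with `centrality_degree()` and then a logical edge column. The name `is_up` is illustrative.

```r
library(tibble)
library(dplyr)
library(tidygraph)

toy_graph <- as_tbl_graph(tibble(
  From      = c("Process_1", "Gene_1", "Gene_1"),
  To        = c("Gene_1", "Transcript_1", "Transcript_2"),
  Direction = c(NA, "Up", "Down")
))

annotated <- toy_graph %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree(mode = "all")) %>% # counts in- and out-edges
  activate(edges) %>%
  mutate(is_up = Direction == "Up")                     # NA Direction stays NA

node_tbl <- as_tibble(annotated, active = "nodes")
edge_tbl <- as_tibble(annotated, active = "edges")
```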
We can also color the edges by the process they belong to or by the direction of transcript expression. The syntax is the same, but the `aes()` call goes inside `geom_edge_link()`.

```r
# Color edges by expression direction (Up / Down)
example_tbl_graph %>%
  mutate(centrality = centrality_power()) %>%
  ggraph(layout = "kk") +
  geom_edge_link(aes(col = Direction)) +
  geom_node_point(aes(size = centrality))
```
```r
# Color edges by biological process
example_tbl_graph %>%
  mutate(centrality = centrality_power()) %>%
  ggraph(layout = "kk") +
  geom_edge_link(aes(color = Process)) +
  geom_node_point(aes(size = centrality))
```
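You can set edge colors explicitly with `ggraph`'s edge scales, such as `scale_edge_colour_manual()`. Edge aesthetics use their own `edge_colour` scale rather than the node `colour` scale. This sketch uses a hypothetical toy graph, and the chosen colors are arbitrary.

```r
library(tibble)
library(tidygraph)
library(ggraph)

toy_graph <- as_tbl_graph(tibble(
  From      = c("Process_1", "Gene_1", "Gene_1"),
  To        = c("Gene_1", "Transcript_1", "Transcript_2"),
  Direction = c(NA, "Up", "Down")
))

p <- ggraph(toy_graph, layout = "kk") +
  geom_edge_link(aes(colour = Direction)) +
  geom_node_point() +
  # The edge color mapping is controlled by the edge_colour scale
  scale_edge_colour_manual(values = c(Up = "firebrick", Down = "steelblue"))
```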
Some genes belong to more than one biological process, so coloring edges by process does not visualize the processes adequately. Drawing each process as its own hull works better, and we cover that later in this workflow.
So far, our example table only provides aesthetics for the edges. Next, we add the transcript type and the hull aesthetics to the nodes. First, extract the nodes table so that you can modify it:

```r
example_tbl_graph %>%
  activate(nodes) %>%
  as_tibble()
```
Assigning this table to an object lets you save and modify it as needed. Here we load an already modified version of the nodes table.

```r
modified_nodes_path <- system.file("extdata", "modified_nodes.csv",
                                   package = "txGeneNetwork")
modified_nodes <- read_csv(modified_nodes_path)
```
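One way to produce a modified nodes table like this is to export the extracted nodes to a *.csv* file, add the new columns in a spreadsheet or text editor, and read the file back. This sketch shows that round trip on a hypothetical toy graph, writing to a temporary file.

```r
library(tibble)
library(dplyr)
library(readr)
library(tidygraph)

toy_graph <- as_tbl_graph(tibble(
  From = c("Process_1", "Gene_1", "Gene_1"),
  To   = c("Gene_1", "Transcript_1", "Transcript_2")
))

# Extract the nodes tibble and save it for manual editing
toy_nodes <- toy_graph %>%
  activate(nodes) %>%
  as_tibble()
nodes_path <- tempfile(fileext = ".csv")
write_csv(toy_nodes, nodes_path)

# (edit the file outside R here, e.g. add Type and Process_* columns)

# Read the edited table back in
edited_nodes <- read_csv(nodes_path, show_col_types = FALSE)
```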
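We now add the modified transcript information to the `tbl_graph` object and map it to node color with `aes(color = Type)`. `mutate()` assigns `modified_nodes$Type` by position, so the rows of `modified_nodes` must be in the same order as the nodes of the graph.

```r
example_tbl_graph %>%
  activate(nodes) %>%
  mutate(Type = modified_nodes$Type) %>% # assigned by row position
  mutate(centrality = centrality_power()) %>%
  ggraph(layout = "kk") +
  geom_edge_link(aes(col = Direction)) +
  geom_node_point(aes(size = centrality, color = Type))
```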
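Assigning columns by position silently mislabels nodes if the two tables are ordered differently. A more robust approach joins the annotations by node name with `left_join()`, which `tidygraph` supports on the active tibble. This sketch assumes the annotation table has a `name` column matching the node names. That is an assumption for illustration, not a statement about the columns of `modified_nodes.csv`.

```r
library(tibble)
library(dplyr)
library(tidygraph)

toy_graph <- as_tbl_graph(tibble(
  From = c("Process_1", "Gene_1", "Gene_1"),
  To   = c("Gene_1", "Transcript_1", "Transcript_2")
))

# Hypothetical annotation table, deliberately in a different row order
toy_annotation <- tibble(
  name = c("Transcript_2", "Gene_1", "Transcript_1", "Process_1"),
  Type = c("transcript", "gene", "transcript", "process")
)

# Join by node name instead of relying on row order
annotated <- toy_graph %>%
  activate(nodes) %>%
  left_join(toy_annotation, by = "name")

node_tbl <- as_tibble(annotated, active = "nodes")
```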
`geom_mark_hull()` from `ggforce` draws the hulls that indicate biological processes, but it does not work well with the `tbl_graph` object. Each node holds a single value per mapped column, so one `fill` mapping cannot place a node in several process hulls. The best way to draw overlapping hulls is therefore to add one column per overlay and make one `geom_mark_hull()` call for each.

```r
example_tbl_graph %>%
  activate(nodes) %>%
  mutate(
    Type = modified_nodes$Type,
    Process = modified_nodes$Process_1,
    Process_2 = modified_nodes$Process_2,
    Process_3 = modified_nodes$Process_3
  ) %>%
  mutate(centrality = centrality_power()) %>%
  ggraph(layout = "kk") +
  # One hull layer per process column; all three share the fill and color scales
  geom_mark_hull(aes(x = x, y = y, fill = Process, color = Process)) +
  geom_mark_hull(aes(x = x, y = y, fill = Process_2, color = Process_2)) +
  geom_mark_hull(aes(x = x, y = y, fill = Process_3, color = Process_3)) +
  geom_edge_link(aes(col = Direction)) +
  # Start a fresh color scale so node colors do not reuse the hull colors
  new_scale("color") +
  geom_node_point(aes(size = centrality, color = Type)) +
  theme_graph()
```
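You can think of the `Process_1`, `Process_2` and `Process_3` columns as a wide version of a long node-to-process membership table. If you keep memberships in long form, `tidyr::pivot_wider()` can produce one column per overlapping process. In this sketch, the `membership` table and the `slot` helper column are hypothetical.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical long table: one row per (node, process) membership
membership <- tibble(
  name    = c("Gene_1", "Gene_1", "Transcript_1"),
  Process = c("Process_A", "Process_B", "Process_A")
)

process_columns <- membership %>%
  group_by(name) %>%
  # Number each node's memberships: Process_1, Process_2, ...
  mutate(slot = paste0("Process_", row_number())) %>%
  ungroup() %>%
  pivot_wider(names_from = slot, values_from = Process)
# Nodes with fewer memberships get NA in the extra columns
```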
Nodes without a value in one of the process columns have `NA` there, so each hull layer also draws an `NA` hull. With this approach, the `NA` hulls have to be removed a posteriori.
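Instead of deleting the `NA` hulls by hand, you can take advantage of the fact that a `ggplot2` layer's `data` argument also accepts a function, which is applied to the plot data. Each `geom_mark_hull()` call can then drop the rows where its process column is `NA`. This is a sketch on a hypothetical toy graph. The helper `drop_na_rows()` is illustrative and is not part of `ggraph` or `ggforce`.

```r
library(tibble)
library(dplyr)
library(tidygraph)
library(ggraph)
library(ggforce)

toy_graph <- as_tbl_graph(tibble(
  From = c("Root", "Gene_1", "Gene_1", "Root", "Gene_2", "Gene_2"),
  To   = c("Gene_1", "Transcript_1", "Transcript_2",
           "Gene_2", "Transcript_3", "Transcript_4")
)) %>%
  activate(nodes) %>%
  mutate(Process = case_when(
    name %in% c("Gene_1", "Transcript_1", "Transcript_2") ~ "Process_1",
    name %in% c("Gene_2", "Transcript_3", "Transcript_4") ~ "Process_2",
    TRUE ~ NA_character_  # the Root node belongs to no process
  ))

# Illustrative helper: returns a function that keeps rows with a value in `column`
drop_na_rows <- function(column) {
  function(layout_data) layout_data[!is.na(layout_data[[column]]), ]
}

p <- ggraph(toy_graph, layout = "kk") +
  geom_mark_hull(
    aes(x = x, y = y, fill = Process, color = Process),
    data = drop_na_rows("Process")  # no NA hull is drawn
  ) +
  geom_edge_link() +
  geom_node_point()

hull_data <- layer_data(p, 1)
```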
Finally, we add some finishing touches, such as the legend size and a title.

```r
example_tbl_graph %>%
  activate(nodes) %>%
  mutate(
    Type = modified_nodes$Type,
    Process = modified_nodes$Process_1,
    Process_2 = modified_nodes$Process_2,
    Process_3 = modified_nodes$Process_3
  ) %>%
  mutate(centrality = centrality_power()) %>%
  ggraph(layout = "kk") +
  geom_mark_hull(aes(x = x, y = y, fill = Process, color = Process)) +
  geom_mark_hull(aes(x = x, y = y, fill = Process_2, color = Process_2)) +
  geom_mark_hull(aes(x = x, y = y, fill = Process_3, color = Process_3)) +
  geom_edge_link(aes(col = Direction)) +
  new_scale("color") +
  geom_node_point(aes(size = centrality, color = Type)) +
  theme_graph() +
  # Placeholder finishing touches: adjust the key size and title text as needed
  theme(legend.key.size = grid::unit(0.4, "cm")) +
  labs(title = "Transcript-gene network")
```
Now you have the final network. Save it in a vector format such as *.pdf* or *.svg*, then remove the `NA` hull layer.
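You can script the saving step with `ggplot2::ggsave()`, which chooses the graphics device from the file extension. This minimal sketch saves a hypothetical toy plot to a temporary *.pdf* file. The size values are arbitrary.

```r
library(tibble)
library(ggplot2)
library(tidygraph)
library(ggraph)

toy_graph <- as_tbl_graph(tibble(
  From = c("Process_1", "Gene_1", "Gene_1"),
  To   = c("Gene_1", "Transcript_1", "Transcript_2")
))

p <- ggraph(toy_graph, layout = "kk") +
  geom_edge_link() +
  geom_node_point()

# PDF output stores the plot as vector graphics that a vector editor can modify
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 7, height = 5, units = "in")
```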
```r
sessionInfo()
# sessioninfo::session_info()
# xfun::session_info()
```
[^1]: add a reference about network statistics