knitr::opts_chunk$set(dpi = 300)
knitr::opts_chunk$set(cache = FALSE)
devtools::load_all(".")
Motivation:
New technologies have made it possible to identify marker gene signatures. However, gene expression-based signatures have limitations: they do not consider the metabolic role of the genes, and they are affected by genetic heterogeneity across patient cohorts. Considering the activity of entire pathways rather than the expression levels of individual genes can be a way to overcome these limitations [@ref12].
StarBioTrek provides methods to measure pathway activity and cross-talk among pathways, also integrating network information and TCGA data. New measures are under development.
To install use the code below.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("StarBioTrek")
Get data: Get pathway and network data
SELECT_path_species: Select the pathway database and species of interest
The user can select the pathway database and species of interest using functions implemented in graphite [@ref1].
library(graphite)
sel <- pathwayDatabases()
knitr::kable(sel, digits = 2, caption = "List of pathway databases and species", row.names = FALSE)
GetData: Searching pathway data for download
The user can easily search pathway data and their genes using the GetData function. It can download pathways from several databases and species using the following parameters:
species <- "hsapiens"
pathwaydb <- "kegg"
path <- GetData(species, pathwaydb)
GetPathData: Get genes inside pathways
The user can identify the genes inside the pathways of interest:
pathway_ALLGENE<-GetPathData(path_ALL=path[1:3])
GetPathNet: Get interacting genes inside pathways
GetPathNet generates a list of interacting genes for each pathway:
pathway_net<-GetPathNet(path_ALL=path[1:3])
ConvertedIDgenes: Get genes inside pathways
The user can convert the gene IDs into gene symbols:
pathway<-ConvertedIDgenes(path_ALL=path[1:10])
getNETdata: Searching network data for download
You can easily search human network data from GeneMania using the getNETdata function [@ref2].
The network category can be filtered using the following parameters: PHint, COloc, GENint, PATH, SHpd.
The species can be filtered using the following parameters:
* Arabidopsis_thaliana
* Caenorhabditis_elegans
* Danio_rerio
* Drosophila_melanogaster
* Escherichia_coli
* Homo_sapiens
* Mus_musculus
* Rattus_norvegicus
* Saccharomyces_cerevisiae
By default, the organism is Homo sapiens.
The example shows the shared protein domain network for Saccharomyces_cerevisiae. For more information, see the SpidermiR package.
organismID <- "Saccharomyces_cerevisiae"
netw <- getNETdata(network = "SHpd", organismID)
Integration data: Integration between pathway and network data
pathnet: Network of interacting genes for each pathway according to a network type (PHint, COloc, GENint, PATH, SHpd)
The function pathnet creates a network of interacting genes (downloaded from GeneMania) for each pathway. Interacting genes are genes that belong to the same pathway and are connected in the network chosen by the user through the parameters of the function getNETdata.
The output is a network of genes belonging to the same pathway.
lista_net<-pathnet(genes.by.pathway=pathway[1:5],data=netw)
listpathnet: List of interacting genes for each pathway (list of genes) according to a network type (PHint, COloc, GENint, PATH, SHpd)
The function listpathnet creates a list of interacting genes for each pathway. Interacting genes are genes that belong to the same pathway and are connected in the network chosen by the user through the parameters of the function getNETdata.
The output is a list of the genes that belong to the same pathway and have an interaction in the network.
list_path<-listpathnet(lista_net=lista_net,pathway=pathway[1:5])
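The filtering idea described above can be sketched in base R. This is only an illustration on made-up data, not the package's implementation: the edge list `edges`, its column names `gene_a` and `gene_b`, and the pathway list `toy_pathways` are all hypothetical.

```r
# Hypothetical edge list: one row per interaction (gene names are made up).
edges <- data.frame(
  gene_a = c("G1", "G1", "G2", "G3"),
  gene_b = c("G2", "G3", "G5", "G4"),
  stringsAsFactors = FALSE
)

# Hypothetical pathway definitions: a named list of gene names.
toy_pathways <- list(PathA = c("G1", "G2", "G3"), PathB = c("G3", "G4"))

# Network per pathway: keep an edge only if both endpoints are in the pathway.
pathway_edges <- lapply(toy_pathways, function(genes) {
  edges[edges$gene_a %in% genes & edges$gene_b %in% genes, , drop = FALSE]
})

# Gene list per pathway: the genes that appear in at least one retained edge.
pathway_genes <- lapply(pathway_edges, function(e) unique(c(e$gene_a, e$gene_b)))
```

In this toy case the edge G2-G5 is dropped from both pathways because G5 belongs to neither.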
Pathway summary indexes: Score for each pathway
GE_matrix: Grouping gene expression profiles in pathways
Get human KEGG pathway data and a gene expression matrix in order to obtain a matrix with the gene expression levels grouped by pathways.
Starting from a gene expression matrix (rows are genes and columns are samples, TCGA data), the function GE_matrix creates a profile of gene expression levels for each pathway given by the user:
list_path_gene<-GE_matrix(DataMatrix=tumo[,1:2],genes.by.pathway=pathway[1:10])
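As a rough picture of what grouping expression profiles by pathway means, the base R sketch below subsets a toy matrix by pathway membership. The matrix `expr`, the list `toy_pathways`, and the helper `group_by_pathway` are hypothetical and are not part of StarBioTrek; the package's own output format may differ.

```r
# Toy expression matrix: rows are genes, columns are samples (values are made up).
expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2"))
)

# Hypothetical pathway definitions; "GX" is deliberately absent from the matrix.
toy_pathways <- list(PathA = c("G1", "G2", "GX"), PathB = c("G3", "G4"))

# For each pathway, keep only the rows of genes that were actually measured.
group_by_pathway <- function(expr, pathways) {
  lapply(pathways, function(genes) {
    expr[intersect(genes, rownames(expr)), , drop = FALSE]
  })
}

grouped <- group_by_pathway(expr, toy_pathways)
```

Using `intersect` silently drops pathway genes that are missing from the expression matrix, and `drop = FALSE` keeps a matrix even when a single gene remains.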
GE_matrix_mean
Get human KEGG pathway data and a gene expression matrix in order to obtain a matrix PXG (pathways in the columns and genes in the rows) with the mean gene expression of only those genes contained in the pathways given as input by the user.
list_path_plot<-GE_matrix_mean(DataMatrix=tumo[,1:2],genes.by.pathway=pathway[1:10])
average: Average of genes for each pathway starting from a matrix of gene expression
Starting from a gene expression matrix (rows are genes and columns are samples, TCGA data), the function average creates an average matrix (SXP: S are the samples and P the pathways) of gene expression for each pathway:
score_mean<-average(pathwayexpsubset=list_path_gene)
stdv: Standard deviations of genes for each pathway starting from a matrix of gene expression
Starting from a gene expression matrix (rows are genes and columns are samples, TCGA data), the function stdv creates a standard deviation matrix of gene expression for each pathway:
score_st_dev<-stdv(gslist=list_path_gene)
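The two summary indexes can be illustrated with a short base R sketch. It assumes one plausible reading of the text: for each sample, the statistic is taken across the genes of a pathway, giving a samples-by-pathways matrix. The toy data and the helper `pathway_stat` are hypothetical and do not reproduce the package internals.

```r
# Toy expression matrix: rows are genes, columns are samples (values are made up).
expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2"))
)

# Hypothetical pathway definitions.
toy_pathways <- list(PathA = c("G1", "G2"), PathB = c("G3", "G4"))

# Apply a summary statistic across the genes of each pathway, sample by sample.
# The result has samples in the rows and pathways in the columns.
pathway_stat <- function(expr, pathways, stat) {
  sapply(pathways, function(genes) {
    sub <- expr[intersect(genes, rownames(expr)), , drop = FALSE]
    apply(sub, 2, stat)
  })
}

score_mean_toy <- pathway_stat(expr, toy_pathways, mean)
score_sd_toy <- pathway_stat(expr, toy_pathways, sd)
```

Note that `sd` returns `NA` for a pathway with a single measured gene, so very small pathways need separate handling.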
Pathway cross-talk indexes: Score for pairwise pathways
eucdistcrtlk: Euclidean distance for cross-talk measure
Starting from a gene expression matrix (rows are genes and columns are samples, TCGA data), the function eucdistcrtlk creates a Euclidean distance matrix of gene expression for each pair of pathways.
score_euc_distance<-eucdistcrtlk(dataFilt=tumo[,1:2],pathway_exp=pathway[1:10])
dsscorecrtlk: Discriminating score for cross-talk measure
Starting from a gene expression matrix (rows are genes and columns are samples, TCGA data), the function dsscorecrtlk creates a discriminating score matrix for each pair of pathways as a measure of cross-talk. The discriminating score is given by |M1-M2|/(S1+S2), where M1 and M2 are the means and S1 and S2 the standard deviations of the expression levels of the genes in pathway 1 and in pathway 2.
cross_talk_st_dv<-dsscorecrtlk(dataFilt=tumo[,1:2],pathway_exp=pathway[1:10])
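To make the discriminating score concrete, here is a minimal base R sketch. It assumes the formula is read as |M1-M2|/(S1+S2), with the sum of standard deviations in the denominator, and that the score is computed per sample across the genes of each pathway. The toy data and the helper `discriminating_score` are hypothetical and are not the package's code.

```r
# Toy expression matrix: rows are genes, columns are samples (values are made up).
expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8,
    9, 10,
    13, 12),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("G", 1:6), c("S1", "S2"))
)

# Hypothetical pathway definitions (at least two genes each, so sd() is defined).
toy_pathways <- list(
  PathA = c("G1", "G2"),
  PathB = c("G3", "G4"),
  PathC = c("G5", "G6")
)

# Discriminating score between two vectors of expression values.
discriminating_score <- function(x1, x2) {
  abs(mean(x1) - mean(x2)) / (sd(x1) + sd(x2))
}

# All unordered pairs of pathways.
pathway_pairs <- combn(names(toy_pathways), 2, simplify = FALSE)

# One row per pathway pair, one column per sample.
ds_toy <- t(sapply(pathway_pairs, function(p) {
  g1 <- intersect(toy_pathways[[p[1]]], rownames(expr))
  g2 <- intersect(toy_pathways[[p[2]]], rownames(expr))
  sapply(colnames(expr), function(s) {
    discriminating_score(expr[g1, s], expr[g2, s])
  })
}))
rownames(ds_toy) <- sapply(pathway_pairs, paste, collapse = "_")
```

A larger score means the two pathways are better separated in that sample, relative to their within-pathway spread.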
Selection of pathway cross-talk: Selection of pathway cross-talk
svm_classification: SVM classification
Given the substantial difference in the activities of many pathways between two classes (e.g. normal and cancer), we examined how effectively the classes can be separated based on their pairwise pathway profiles. This function is used to find the interacting pathways that are altered in a particular pathology in terms of Area Under the Curve (AUC). AUC is estimated by cross-validation (k-fold cross-validation, k=10). The function randomly selects a fraction of the TCGA data (e.g. nf=60, i.e. 60% of the original dataset) to form the training set and assigns the remaining samples to the testing set (40% of the original dataset). For each pair of pathways, the user can obtain a score matrix using the methods described above (e.g. dev_std_crtlk) and can focus on the pairs of pathways able to differentiate a particular subtype from the normal type.
nf <- 60
res_class <- svm_classification(TCGA_matrix = score_euc_dista[1:30, ], nfs = nf,
                                normal = colnames(norm[, 1:10]), tumour = colnames(tumo[, 1:10]))
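The split and the AUC evaluation can be illustrated in base R. This sketch deliberately replaces the SVM with a much simpler stand-in: the score of a single pathway pair is used directly as a ranking score, and the training set only decides its direction. The data are simulated, the split is stratified by class (an assumption made here so that both classes always appear in the test set), and the helper `auc_rank` is hypothetical.

```r
set.seed(1)  # reproducible toy data and split

# Simulated scores of one pathway pair in 20 normal and 20 tumour samples.
score <- c(rnorm(20, mean = 0), rnorm(20, mean = 2))
label <- rep(c("normal", "tumour"), each = 20)

# AUC from ranks (Mann-Whitney statistic): the probability that a positive
# sample receives a higher score than a negative one.
auc_rank <- function(score, is_positive) {
  r <- rank(score)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Draw nf% of each class for training; the rest forms the test set.
nf <- 60
train_idx <- unlist(lapply(split(seq_along(label), label), function(idx) {
  sample(idx, size = round(length(idx) * nf / 100))
}))
test_idx <- setdiff(seq_along(label), train_idx)

# "Training" step of the stand-in: learn on which side the tumour samples fall.
train_score <- score[train_idx]
train_label <- label[train_idx]
direction <- sign(mean(train_score[train_label == "tumour"]) -
                  mean(train_score[train_label == "normal"]))

# Evaluate on the held-out samples only.
auc_test <- auc_rank(direction * score[test_idx], label[test_idx] == "tumour")
```

Repeating the split several times and averaging `auc_test` gives a more stable estimate than a single split.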
IPPI: Driver genes for each pathway
The function IPPI, using pathway and network data, calculates the driver genes for each pathway. Please see Cava et al., BMC Genomics 2017.
DRIVER_SP<-IPPI(pathax=pathway_matrix[,1:3],netwa=netw_IPPI[1:50000,])
Visualization: Gene interactions and pathways
StarBioTrek provides several functions that prepare data for the visualization of gene-gene interactions and pathway cross-talk using the qgraph package [@ref3]. The function plotcrosstalk prepares the data:
formatplot <- plotcrosstalk(pathway_plot = pathway[1:6], gs_expre = tumo)
library(qgraph)
qgraph(formatplot[[1]], minimum = 0.25, cut = 0.6, vsize = 5, groups = formatplot[[2]], legend = TRUE, borders = FALSE, layoutScale = c(0.8, 0.8))
qgraph(formatplot[[1]],groups=formatplot[[2]], layout="spring", diag = FALSE, cut = 0.6,legend.cex = 0.5,vsize = 6,layoutScale=c(0.8,0.8))
A circle plot can be generated using the function circleplot [@ref4]. A score can be assigned to each gene.
formatplot <- plotcrosstalk(pathway_plot = pathway[1:6], gs_expre = tumo)
score <- runif(length(formatplot[[2]]), min = -10, max = +10)
circleplot(preplot = formatplot, scoregene = score)
library(png)
library(grid)
img <- readPNG("circleplot.png")
grid.raster(img)
sessionInfo()