library(knitr) knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
suppressMessages(library(synaptome.ldb)) suppressMessages(library(dplyr)) library(ggplot2) library(pander)
58 published synaptic proteomic datasets (2000-2022 years) that describe over 8,000 proteins were integrated and combined with direct protein-protein interactions and functional metadata to build a network resource.
The set includes 29 post synaptic proteome (PSP) studies (2000 to 2019) contributing a total of 5,560 mouse and human unique gene identifiers; 19 presynaptic studies (2004 to 2022) describe 2,772 unique human and mouse gene IDs, and 11 studies that span the whole synaptosome and report 7,198 unique genes.
NB: With the latest update of the database we have added 6 new studies: 1 presynaptic and 5 postsynaptic, new compsrtment (Synaptic Vesicle), and coding mutations for Autistic Spectral Disorder (ASD) and epilepsy (Epi).
To reconstruct protein-protein interaction (PPI) networks for the pre- and post-synaptic proteomes we used human PPI data filtered for the highest confidence direct and physical interactions from BioGRID, Intact and DIP. The resulting postsynaptic proteome (PSP) network contains 4,817 nodes and 27,788 edges in the Largest Connected Component (LCC). The presynaptic network is significantly smaller and comprises 2,221 nodes and 8,678 edges in the LCC.
The database includes: proteomic and interactomic data with supporting information on compartment, specie and brain region, GO function information for three species: mouse, rat and human, disease annotation for human (based on Human Disease Ontology (HDO) and GeneToModel table, which links certain synaptic proteins to existing computational models of synaptic plasticity and synaptic signal transduction
The original files are maintained at Ednburgh Datashare https://doi.org/10.7488/ds/3017. Updated database file could be found here: https://doi.org/10.7488/ds/3771
The dataset was described in the @Sorokina:2021hl.
suppressMessages(library(synaptome.db)) suppressMessages(library(dplyr)) library(ggplot2) library(pander)
The dataset can be used to answer frequent questions such as “What is known about my favourite gene? Is it pre- or postsynaptic? Which brain region was it identified in?”, "Which publication it was reported in?" Information could be obtained by submitting gene EntrezID or Gene name
t <- getGeneInfoByEntrez(1742) pander(head(t)) t <- getGeneInfoByName("CASK") pander(head(t)) t <- getGeneInfoByName(c("CASK", "DLG2")) pander(head(t))
Obtaining Internal database GeneIDs is a useful intermediate step for more complex queries including those for building protein-protein interaction (PPI) networks for compartments and brain regions. Internal GeneID is specie-neutral and unique, which allows exact identification of the object of interest in case of redundancy (e.g. one Human genes matches on a few mouse ones, etc.)
t <- findGenesByEntrez(c(1742, 1741, 1739, 1740)) pander(head(t)) t <- findGenesByName(c("SRC", "SRCIN1", "FYN")) pander(head(t))
Synaptic genes are annotated with disease information from Human Disease
Ontology, where available. To get disease information one can submit the
list of Human Entrez Is or Human genes names, it could be also the list of
Internal GeneIDs if using getGeneDiseaseByIDs
function
t <- getGeneDiseaseByName (c("CASK", "DLG2", "DLG1")) pander(head(t)) t <- getGeneDiseaseByEntres (c(8573, 1742, 1739)) pander(head(t))
One can obtain the overview of synaptic proteome papers combined into the database, which includes paper PMID, specie Tax ID, year of publication, subcellular localisation, brain region and number of proteins identified in the paper. This information may help to choose the specific study(ies) for further work.
p <- getPapers() pander(head(p))
It is also possible to obtain the list of proteins found in more than one study, for the whole synaptic proteome. For that, the user needs to provide a "count" value as a desired minimal number of identifications (e.g. 2 or more). The command returns the table with gene identifiers and "Npmid" column, which contains the number of studies where this gene was identified.
gp <- findGeneByPaperCnt(cnt = 2)#find all proteins in synaptic proteome identified 2 times or more pander(head(gp))
Following section 5, when the information for all considered proteomic studies was obtained, user can select specific study(ies) by PMID and get the proteins identified in those studies. By providing "count" value user can extract either all proteins from specified studies (count = 1), or just frequently found ones (count >=2). As above, the command returns the table with gene identifiers with "Npmid" column, which contains the number of studies where this protein was identified
spg <- findGeneByPapers(p$PaperPMID[1:5], cnt = 1) pander(head(spg))
Most of the times, user is interested in the specific compartment rather then in total synaptic proteome. To help identify the genes most probably residenting in the specific compartment and exclude possible contaminants, findGeneByCompartmentPaperCnt function provides the table of proteins found "cnt" or more times in different compartment-paper pairs.
gcp <- findGeneByCompartmentPaperCnt(cnt = 2) pander(head(gcp))
Now user can select the specific compartment and proceed working with obtained list of frequently found proteins
presgp <- gcp[gcp$Localisation == "Presynaptic",] dim(presgp) pander(head(presgp))
Custom Protein-protein interactions based on bespoke subsets of molecules could be extracted in two general ways: "induced" and "limited". In the first case, the command will return all possible interactors for the genes within the whole interactome. In the second case it will return only interactions between the genes of interest. PPIs could be obtained by submitting list of EntrezIDs or gene names, or Internal IDs - in all cases the interactions will be returned as a list of interacting pairs of Intenal GeneIDs.
t <- getPPIbyName( c("CASK", "DLG4", "GRIN2A", "GRIN2B","GRIN1"), type = "limited") pander(head(t)) t <- getPPIbyEntrez(c(1739, 1740, 1742, 1741), type='induced') pander(head(t)) #obtain PPIs for the list of frequently found genes in presynaptc compartment t <- getPPIbyEntrez(presgp$HumanEntrez, type='induced') pander(head(t))
Three main synaptic compartments considered in the database are "presynaptic",
"postsynaptic" and "synaptosome". Genes are classified to compartments based
on respective publications, so that each gene can belong to one or two, or
even three compartments. The full list of genes for specific compartment
could be obtained with command getAllGenes4Compartment
, which returns
the table with main gene identifiers, like internal GeneIDs, MGI ID,
Human Entrez ID, Human Gene Name, Mouse Entrez ID, Mouse Gene Name,
Rat Entrez ID, Rat Gene Name.
If you need to check which genes of your list belong to specific compartment,
you can use getGenes4Compartment
command, which will select from your list
only genes associated with specific compartment.
To obtain the PPI network for compartment one has to submit the list of
Internal GeneIDs obtained with previous commands.
#getting the list of compartment comp <- getCompartments() pander(comp) #getting all genes for postsynaptic compartment gns <- getAllGenes4Compartment(compartmentID = 1) pander(head(gns)) #getting full PPI network for postsynaptic compartment ppi <- getPPIbyIDs4Compartment(gns$GeneID,compartmentID =1, type = "induced") pander(head(ppi))
Three are 12 brain regions considered in the database based on respective
publications, so that each gene can belong to the single or to the several
brain regions. Brain regions differ between species, and specie brain region
information is not 100% covered in the database(e.g. we don't have yet studies
for Human Striatum, but do have for Mouse and Rat), that's why when querying
the database for brain region information you will need to specify the specie.
The full list of genes for specific region could be obtained wuth command
getAllGenes4BrainRegion
, which returns the table with main gene identifiers,
like internal Gene IDs, MGI ID, Human Entrez ID, Human Gene Name,
Mouse Entrez ID, Mouse Gene Name, Rat Entrez ID, Rat Gene Name.
If you need to check which genes of your list were identified in
specific region, you can use getGenes4BrainRegion
command, which will
select only genes associated with specific region from your list.
To obtain the PPI network for brain region you need to submit the list of Internal GeneIDs obtained with previous commands.
#getting the full list of brain regions reg <- getBrainRegions() pander(reg) #getting all genes for mouse Striatum gns <- getAllGenes4BrainRegion(brainRegion = "Striatum",taxID = 10090) pander(head(gns)) #getting full PPI network for postsynaptic compartment ppi <- getPPIbyIDs4BrainRegion( gns$GeneID, brainRegion = "Striatum", taxID = 10090, type = "limited") pander(head(ppi))
listG<-findGenesByEntrez(1:250) #check whic genes from 250 random EntrezIds are in the database dim(listG) head(listG) getCompartments() presG <- getGenes4Compartment(listG$GeneID, 2) #check which genes from subset identified as synaptic are presynaptic dim(presG) head(presG) getBrainRegions() listR <- getGenes4BrainRegion(listG$GeneID, brainRegion = "Cerebellum", taxID = 10090) #check which genes from subset identified as synaptic are found in human cerebellum dim(listR) head(listR)
Combine information from PPI data.frame obtained with functions like
getPPIbyName
, getPPIbyEntrez
, getPPIbyIDs4Compartment
or
getPPIbyIDs4BrainRegion
with information about genes obtained
from getGenesByID
to make interpretable undirected PPI graph in
igraph format. In this format network could be further analysed and
visualized by algorithms from igraph package.
library(igraph) g<-getIGraphFromPPI( getPPIbyIDs(c(48, 129, 975, 4422, 5715, 5835), type='lim')) plot(g,vertex.label=V(g)$RatName,vertex.size=25)
If Igraph is not an option, the PPI network could be exported as an interpretible table to be processed with other tools, e.g. Cytoscape,etc.
tbl<-getTableFromPPI(getPPIbyIDs(c(48, 585, 710), type='limited')) tbl
c<-devtools::session_info() pander::pander(t(data.frame(c(c$platform)))) pander::pander(as.data.frame(c$packages)[,-c(4,5,10,11)])
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.