Background: The BioPlex project has created two proteome-scale, cell-line-specific PPI networks: one for human embryonic kidney 293T cells, and one for human colon cancer HCT116 cells. The BioPlex R package serves transcriptome and proteome data for the two cell lines in dedicated Bioconductor data structures. This includes (i) transcriptome and proteome data for HCT116 from the Cancer Cell Line Encyclopedia (CCLE) , and (ii) transcriptome and proteome data profiling differential expression between 293T and HCT116 cells from different sources.
Objective: Can we identify patterns driving (dis-)agreement between transcriptome and proteome of HEK293 and HCT116 cells?
Proposed methods: The Transcriptome-Proteome analysis vignette implements a scaffold for integrative analysis of transcriptome and proteome data from both cell lines. This includes demonstration of how to obtain the corresponding datasets and initial results from (i) correlating HCT116 transcript levels and protein levels from CCLE, and (ii) correlating differential gene expression and differential protein abundance when comparing 293T against HCT116 cells.
Given that HCT116 is a cancer cell line, an important driver of transcriptomic and proteomic expression could be somatic copy number alteration. One approach to this problem would thus be to obtain copy number data from cBioPortal for HCT116 (the cBioPortalData package provides an R/Bioconductor interface for accessing cBioPortal study data incl. CCLE). A simplified version of this would be to obtain pre-computed gene sets of high and low copy number in HCT116 from Enrichr and test for their enrichment in genes/proteins that have high or low expression in HCT116 cells (i) individually, and (ii) when compared to 293T cells.
More generally, standard gene set enrichment analysis of eg. GO terms and KEGG pathways could be employed to identify themes in the sets of genes showing agreement for (i) transcript levels and protein expression in HCT116 CCLE samples, and (ii) differential gene and protein expression between 293T and HCT116 cells from different sources.
What a successful result would look like: Pull request to the BioPlexAnalysis github repository that extends the Transcriptome-Proteome analysis vignette. Pull requests will be reviewed and discussed. Accepted contributions will be acknowledged.
Potential follow-up work: Analyze PPI networks for differential wiring in areas of differential expression between 293T and HCT116 cells.
Background: Network propagation of genetic disease associations is an approach for identifying potential drug targets (MacNamara et al., 2020). Such approaches typically leverage publicly available collections of genes and variants associated with human diseases such as OpenTargets. OpenTargets integrates data from expert curated repositories, GWAS catalogues, animal models, and the scientific literature. It has been shown that protein networks from specific functional linkages such as protein complexes are suitable for simple guilt-by-association network propagation approaches (MacNamara et al., 2020). Global protein–protein interaction networks typically require more sophisticated approaches such as HotNet2.
Objective: Application of network propagation methods to CORUM complexes and BioPlex networks using genetic disease association scores from OpenTargets.
Proposed methods: A list of all OpenTargets datasets is available from the Platform Data Downloads page. Example scripts for accessing the data from within R are available here. Following MacNamara et al., 2020, one would first use the scored gene-disease associations from OpenTargets to obtain high‑confidence genetic hits (HCGHs). Proxy gene sets of HCGHs can then be defined by identifying genes that (i) share a protein complex with a HCGH, (ii) are annotated to the same KEGG or REACTOME pathway, (iii) are first- or second-degree neighbors in a BioPlex network, and (iv) are found in a network module identified with approaches such as BioNet or HotNet2.
What a successful result would look like:
Pull request to the
BioPlexAnalysis
github repository outlining the approach in a separate network propagation vignette
(.Rmd
file in the vignettes
folder).
Pull requests will be reviewed and discussed.
Accepted contributions will be acknowledged.
Potential follow-up work: Discussion of how to best encapsulate the OpenTargets gene-disease associations in a package with a sparklyr backend.
Background: GraphFrames is a package for Apache Spark that provides a DataFrame-based API for working with graphs. Functionality includes motif finding and common graph algorithms, such as PageRank and Breadth-first search.
Objective: Can we refactor the graph backend of the BioPlex data package using
graphframes
?
Proposed methods: The graph backend of the BioPlex data package is
currently based on the Bioconductor
graph
package. The function bioplex2graph
turns an ordinary data.frame
storing the
BioPlex PPI data into a graphNEL
object, where bait and prey relationships are
represented by directed edges from bait to prey. Data for individual nodes/proteins
is stored in the nodeData
slot, and data for the edges/interactions is stored
in the edgeData
slot. We might also want to store graph-level metadata such as
cell line, network version, and PMID of the primary publication associated with
a PPI network, among others. The function bioplex2graph
will need to be adapted
to be based on GraphFrames
instead.
What a successful result would look like: Pull request to the BioPlex github repository on a new branch (named graphframes). Pull requests will be reviewed and discussed. Contributions will be acknowledged.
Potential follow-up work: Several functions of the BioPlex package add data
via the graph
API to the nodeData
and the edgeData
of a BioPlex graph (typically obtained
via bioplex2graph
). This includes (i) the annotatePFAM
function, which adds
PFAM protein domain annotations to the nodes, and (ii) the
mapSummarizedExperimentOntoGraph
function, which allows to transfer assay
data
and/or rowData
annotations from the rows of a SummarizedExperiment
to the
node data of a graph. The implementation of these functions will need to be adapted
to make use of the GraphFrames
API instead.
Background: A variety of data and annotations are available for the nodes (= proteins) and the edges (= PPIs) of the BioPlex networks. As illustrated in the BioPlex data retrieval vignette, this includes (i) for the nodes: transcript and protein abundance, results of differential expression analysis, disease association scores, etc., and (ii) for the edges: confidence scores, interaction probabilities, etc. On the other hand, graph algorithms such as maximum scoring subnetwork analysis, potentially combined with gene set enrichment analysis, typically produce subnetwork/pathway-sized graphs of interest prompting further interactive exploration.
Objective: Setting up an R/Shiny graph viewer that allows exploration of data and annotations for nodes and edges of the networks.
Proposed methods: The GraphViewer github repository contains a scaffold implementation based on ggnetwork. Alternatives implementing a ggplot2 approach to the visualization of graphs and networks such as ggraph exist. The app is supposed to take subnetwork/pathway-sized graphs of interest as input, and is expected to support overlay of node attributes and edge attributes using ggplot2 grammar. The UI should be minimal. A drop-down for the node data attribute to display on the graph, and a drop-down to display the edge data attribute to display on a graph. Upload of a serialized graph object by the user should be supported.
What a successful result would look like: Pull request to the GraphViewer github repository. Pull requests will be reviewed and discussed. Contributions will be acknowledged.
Potential follow-up work:
Integration of the graph viewer as a new panel type for
iSEE, which takes a panel-based approach
to visualizing assay data, column data, and row data of a SummarizedExperiment
.
Extensions for individual panels for visualizing node data and edge data of a graph
could be modeled after the RowDataPlot
and ColumnDataPlot
classes.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.