Gene-ontology-methods: Gene ontology (over-representation) analysis using enriched...

Gene-ontology-methods    R Documentation

Gene ontology (over-representation) analysis using enriched genes of top alternative splicing events

Description

Genes containing differential alternative splicing events (ASEs) may be enriched in key functional pathways. This can be identified using a simple over-representation analysis. Biologists can identify key pathways of interest in order to focus on studying ASEs belonging to genes of functional interest.

Usage

goASE(
  enrichedEventNames,
  universeEventNames = NULL,
  se,
  ontologyType = c("BP", "MF", "CC"),
  pAdjustMethod = c("BH", "holm", "hochberg", "hommel", "bonferroni", "BY", "fdr",
    "none"),
  ontologyRef = NULL,
  ...
)

goGenes(
  enrichedGenes,
  universeGenes = NULL,
  ontologyRef,
  ontologyType = c("BP", "MF", "CC"),
  pAdjustMethod = c("BH", "holm", "hochberg", "hommel", "bonferroni", "BY", "fdr",
    "none"),
  ...
)

extract_gene_ids_for_GO(enrichedEventNames, universeEventNames = NULL, se)

subset_EventNames_by_GO(EventNames, go_id, se)

plotGO(
  res_go = NULL,
  plot_x = c("log10FDR", "foldEnrichment", "nGenes"),
  plot_size = c("nGenes", "foldEnrichment", "log10FDR"),
  plot_color = c("foldEnrichment", "nGenes", "log10FDR"),
  filter_n_terms = 20,
  filter_padj = 1,
  filter_pvalue = 1,
  trim_go_term = 50
)

Arguments

enrichedEventNames

A vector of EventNames, typically those of one or more differential ASEs.

universeEventNames

A vector of EventNames, typically the EventNames of all ASEs that were tested. If left as NULL, all genes are considered background genes.

se

The NxtSE object containing the GO reference and the EventNames

ontologyType

One of "BP" (biological process), "MF" (molecular function), or "CC" (cellular component).

pAdjustMethod

The method for p-value adjustment for FDR. See ?p.adjust

ontologyRef

A valid gene ontology reference. This can be generated either using viewGO(reference_path) or ref(se)$ontology. This field is required for goGenes() and optional for goASE(). See details.

...

Additional arguments to be passed to fgsea::fora()

enrichedGenes

A vector of gene_id representing the list of enriched genes. To generate a list of valid gene_id, see viewGenes

universeGenes

(default NULL) A vector of gene_id representing the list of background genes.

EventNames, go_id

In subset_EventNames_by_GO(), EventNames is a vector of ASE EventNames to subset, and go_id is the gene ontology ID to subset against.

res_go

For plotGO(), the gene ontology results data object returned by goASE() or goGenes().

plot_x, plot_size, plot_color

Which parameters should be plotted on the x-axis, bubble size, and bubble color? Valid options are c("log10FDR", "foldEnrichment", "nGenes"). Defaults are "log10FDR", "nGenes", and "foldEnrichment" for x-axis, bubble size, and bubble color, respectively.

filter_n_terms

(default 20) How many top terms to plot.

filter_padj, filter_pvalue

(default 1) Thresholds by which the given GO results are filtered by adjusted p value (FDR) or nominal p value, respectively, prior to plotting.

trim_go_term

(default 50) For long GO terms, the description will be trimmed to the first n characters, where trim_go_term = n.

Details

Users can perform GO analysis using either the GO annotation compiled when building the SpliceWiz reference with buildRef(), or a custom-supplied gene ontology annotation. The latter is done by supplying the GO annotations as the ontologyRef argument. This should be coercible to a data.frame containing the following columns:

  • gene_id Gene IDs matching those used by the SpliceWiz reference

  • go_id Gene ontology ID terms, of the form GO:XXXXXXX
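As a minimal sketch of the custom annotation described above, the following builds a table with the two required columns and checks its shape before use. The gene IDs are illustrative placeholders and the object name custom_ontology is hypothetical. The commented goGenes() call assumes SpliceWiz is loaded and that my_genes holds valid gene_ids.

```r
# Hypothetical custom GO annotation: one row per gene / GO-term pair.
# The gene IDs below are placeholders for illustration, not real annotations.
custom_ontology <- data.frame(
    gene_id = c("ENSG00000000001", "ENSG00000000001", "ENSG00000000002",
                "ENSG00000000003"),
    go_id = c("GO:0008380", "GO:0006397", "GO:0008380", "GO:0003723"),
    stringsAsFactors = FALSE
)

# Basic sanity checks before supplying the table as `ontologyRef`
required_cols <- c("gene_id", "go_id")
has_cols <- all(required_cols %in% colnames(custom_ontology))

# GO IDs should be of the form GO:XXXXXXX (seven digits)
valid_ids <- all(grepl("^GO:[0-9]{7}$", custom_ontology$go_id))

# With SpliceWiz loaded, the table could then be supplied as, for example:
# go_df <- goGenes(enrichedGenes = my_genes, ontologyRef = custom_ontology)
```

The gene_id values must match those in the SpliceWiz reference, so a real table would normally be derived from the same annotation source as the GTF used in buildRef().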

Value

For goASE() and goGenes(), a data table containing the following:

  • go_id: Gene ontology ID

  • go_term: Gene ontology term

  • pval: Raw p values

  • padj: Adjusted p values

  • overlap: Number of enriched genes (of enriched ASEs)

  • size: Number of background genes (of background ASEs)

  • overlapGenes: A list of gene_id's from genes of enriched ASEs

  • expected: The number of overlap genes expected by chance
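To make these columns concrete, here is a rough sketch of an over-representation test for a single GO term in base R. It assumes a one-sided hypergeometric test, which is what fgsea::fora() wraps. The formulas for expected and fold enrichment are assumptions about how such quantities are commonly defined, not confirmed SpliceWiz internals, and all counts are invented.

```r
# Toy counts (invented for illustration)
n_universe <- 2000   # number of background genes
n_enriched <- 100    # number of genes from enriched ASEs
size       <- 50     # background genes annotated with the GO term
overlap    <- 8      # enriched genes annotated with the GO term

# One-sided hypergeometric test: probability of observing at least
# `overlap` annotated genes when drawing `n_enriched` genes at random
pval <- phyper(
    q = overlap - 1,
    m = size,
    n = n_universe - size,
    k = n_enriched,
    lower.tail = FALSE
)

# Assumed definitions: overlap expected by chance, and its ratio to observed
expected <- size * n_enriched / n_universe
fold_enrichment <- overlap / expected

# Multiple-testing adjustment across several terms, as in ?p.adjust;
# the remaining p values are made up to stand in for other GO terms
pvals <- c(pval, 0.04, 0.20, 0.75)
padj <- p.adjust(pvals, method = "BH")
```

With these counts, 2.5 overlap genes are expected by chance, so observing 8 corresponds to a 3.2-fold enrichment.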

For extract_gene_ids_for_GO(), a list containing the following:

  • genes: A vector of enriched gene_ids

  • universe: A vector of background gene_ids

For subset_EventNames_by_GO(), a vector of all ASE EventNames belonging to the given gene ontology go_id

Functions

  • goASE(): Performs over-representation gene ontology analysis using a given list of enriched / background ASEs

  • goGenes(): Performs GO analysis given the set of enriched and (optionally) the background (universe) genes.

  • extract_gene_ids_for_GO(): Produces a list containing enriched and universe gene_ids of given enriched and background ASE EventNames

  • subset_EventNames_by_GO(): Returns the ASEs, from the given EventNames, that belong to a given gene ontology category

  • plotGO(): Produces a lollipop plot based on the given gene ontology results object

See Also

Build-Reference-methods on how to generate gene ontology annotations

Examples

# Generate example reference with `Homo sapiens` gene ontology

ref_path <- file.path(tempdir(), "Reference_withGO")
buildRef(
    reference_path = ref_path,
    fasta = chrZ_genome(),
    gtf = chrZ_gtf(),
    ontologySpecies = "Homo sapiens"
)

# Perform GO analysis using first 1000 genes
ontology <- viewGO(ref_path)
allGenes <- sort(unique(ontology$gene_id))

exampleGeneID <- allGenes[1:1000]
exampleBkgdID <- allGenes

go_df <- goGenes(
    enrichedGenes = exampleGeneID, 
    universeGenes = exampleBkgdID, 
    ontologyRef = ontology
)

# Plots the top 12 GO terms

plotGO(go_df, filter_n_terms = 12)

# Below is example code showing how to use the output of differential ASE
# analysis for GO analysis. It will not work with the example dataset because
# the reference must be either human / mouse, or built with a valid
# `ontologySpecies` given to buildRef().
# We hope the example code is simple enough for users to adapt to their own
# workflows.

## Not run: 

se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE)

colData(se)$treatment <- rep(c("A", "B"), each = 3)

require("limma")
res_limma <- ASE_limma(se, "treatment", "A", "B")

# Perform gene ontology analysis of the first 10 differential ASEs

go_df <- goASE(
  enrichedEventNames = res_limma$EventName[1:10],
  universeEventNames = res_limma$EventName,
  se = se
)

# Return a list of all ASEs belonging to the top enriched category

GOsubset_EventName <- subset_EventNames_by_GO(
  EventNames = res_limma$EventName,
  go_id = go_df$go_id[1],
  se = se
)

# Return the gene_ids of the enriched and background ASEs.
# - typically used if one wishes to export `gene_id` for use in other gene
#   ontology tools

gene_id_list <- extract_gene_ids_for_GO(
  enrichedEventNames = res_limma$EventName[1:10],
  universeEventNames = res_limma$EventName,
  se = se
)


## End(Not run)

alexchwong/SpliceWiz documentation built on Oct. 15, 2024, 10:12 a.m.