In d3b-center/annoFuse: annoFuse: an R Package to annotate, prioritize, and interactively explore putative oncogenic RNA fusions

knitr::opts_chunk$set(warning = FALSE, message = FALSE)

Introduction

In this vignette, we demonstrate how to use annoFuse to filter putative oncogenic fusions from a filtered fusion list followed by plotting recurrent fusions, recurrently fused genes, as well as a summary plot.

Step by step analysis

We start by loading annoFuse and the other required packages in the chunk below

library("annoFuse")
suppressPackageStartupMessages(library("readr"))
suppressPackageStartupMessages(library("dplyr"))
suppressPackageStartupMessages(library("qdapRegex"))

Overview of the package

Here, we present annoFuse, an R package developed to annotate and filter expressed gene fusions, along with highlighting artifact filtered novel fusions.

Filtered fusion list FilteredFusionAnnoFuse.tsv was obtained from Star-Fusion, arriba and rsem expression (fpkm) files from release-v16-20200320 of OpenPBTA . We used FusionAnnotator to annotate arriba files to specifically identify "red flag" fusions found in healthy tissues or in gene homology databases saved as column "annots" to match annotation in STAR-Fusion calls. The merged dataset was processed using the following steps fusion_standardization -> fusion_filtering_QC -> expression_filter_fusion -> annotate_fusion_calls ). Please refer to the filtering criterias available per function below:

Standard fusion call format requirements

Mandatory fields : "Sample", "LeftBreakpoint", "RightBreakpoint", "FusionName", "Gene1A", "Gene1B"

For OpenPBTA samples we will also require some project specific fields to visualize and explore the data

Additional fields required for OpenPBTA project specific filter : "Kids_First_Biospecimen_ID", "BreakpointLocation", "broad_histology"

fusion_calls <- read_tsv(system.file("extdata", "FilteredFusionAnnoFuse.tsv", package = "annoFuseData"))

# distance are being removed here to capture all intergenic fusions in count
# distance within () were making them count as unique instead of the same fusion
fusion_calls$FusionName <- unlist(lapply(
  fusion_calls$FusionName, 
  function(x) rm_between(x, "(", ")", extract = FALSE)
))

cols_fusioncalls <- c("LeftBreakpoint", "RightBreakpoint", "FusionName",
                      "Gene1A", "Gene1B","Sample")

head(fusion_calls[,cols_fusioncalls])

If the fusion has been previously reported in TCGA or if fusions containing gene partners are known oncogenes, tumor suppressor genes, COSMIC genes, and/or transcription factors, then we consider the fusion call to be known oncogenic or putative oncogenic.

# Add reference gene list containing known oncogenes, tumor suppressors, kinases, and transcription factors
geneListReferenceDataTab <- read.delim(
  system.file("extdata", "genelistreference.txt", package = "annoFuseData"),
  stringsAsFactors = FALSE
)

# Add fusion list containing previously reported oncogenic fusions.
fusionReferenceDataTab <- read.delim(
  system.file("extdata", "fusionreference.txt", package = "annoFuseData"),
  stringsAsFactors = FALSE
)

# filter for driver fusions
putative_driver_fusions <- fusion_driver(
  standardFusioncalls = fusion_calls, 
  annotated = TRUE, 
  geneListReferenceDataTab = geneListReferenceDataTab, 
  fusionReferenceDataTab = fusionReferenceDataTab,checkDomainStatus = TRUE
)

# aggregate caller
putative_driver_fusions <- aggregate_fusion_calls(putative_driver_fusions, removeother = FALSE)

Putative Driver Fusions found in more than four distinct histologies were filtered out, as these fusions were considered likely artifactual.

# checking if fusions are called in multiple histologies
# this will suggest these fusion calls are artifactual or
# commonly occuring
found_in_morethan_4 <- groupcount_fusion_calls(
  putative_driver_fusions, 
  group = "broad_histology", 
  numGroup = 4
) %>% 
  arrange(desc(group.ct))

found_in_morethan_4

Other non-oncogenic (not annotated as oncogene/transcription factor/tumor suppressor gene/kinase) fusions that are recurrent fusions, called by both callers, and are unique to a specific histology in this filtered dataset are of interest as well.

However, non-oncogenic genes that are fused more than five times per sample were removed, as they were considered artifactual within our dataset from manual review.

# To scavenge back non-oncogenic recurrent/unique per broad histology fusions
fusion_calls <- aggregate_fusion_calls(
  fusion_calls,
  removeother = TRUE,
  filterAnnots = "LOCAL_REARRANGEMENT|LOCAL_INVERSION"
)

# Keep
# 1. Called by at least n callers
fusion_calls.summary <- called_by_n_callers(fusion_calls, 
                                            numCaller = 2)

# OR
# 2. Found in at least n samples in each group
sample.count <- samplecount_fusion_calls(fusion_calls, 
                                         numSample = 2, 
                                         group = "broad_histology")

# Remove
# 1. non-oncogenic fusions that are in > numGroup
group.count <- groupcount_fusion_calls(fusion_calls, 
                                       group = "broad_histology", 
                                       numGroup = 1)

# 2. non-oncogenic multi-fused genes
fusion_recurrent5_per_sample <- fusion_multifused(fusion_calls, 
                                                  limitMultiFused = 5)

# filter fusion_calls to keep recurrent fusions from above sample.count and fusion_calls.summary
QCGeneFiltered_recFusion <- fusion_calls %>%
  dplyr::filter(FusionName %in% unique(c(sample.count$FusionName, fusion_calls.summary$FusionName)))

# filter QCGeneFiltered_recFusion to remove fusions found in more than 1 group and multifused gene per samples
QCGeneFiltered_recFusionUniq <- QCGeneFiltered_recFusion %>%
  dplyr::filter(!FusionName %in% group.count$FusionName) %>%
  dplyr::filter(!Gene1A %in% fusion_recurrent5_per_sample$GeneSymbol |
    !Gene2A %in% fusion_recurrent5_per_sample$GeneSymbol |
    !Gene1B %in% fusion_recurrent5_per_sample$GeneSymbol |
    !Gene2B %in% fusion_recurrent5_per_sample$GeneSymbol)

# Check for domain retention
QCGeneFiltered_recFusionUniq <- fusion_driver(
  standardFusioncalls = QCGeneFiltered_recFusionUniq, 
  geneListReferenceDataTab = geneListReferenceDataTab, 
  fusionReferenceDataTab = fusionReferenceDataTab,
  # don't filter because these are non-oncogenic fusions
  filterPutativeDriver = FALSE,
  checkDomainStatus = TRUE,
  annotated = TRUE
)

Here, we visualize the distribution of fusions within two different genes that are annotated as "Genic" in our standard fusion format. Fusions that are in-frame are predicted to translate into protein, so we also filter for only in-frame fusions for visualization.

putative_driver_fusions <- putative_driver_fusions %>%
  dplyr::bind_rows(QCGeneFiltered_recFusionUniq[, colnames(putative_driver_fusions)]) %>%
  # filter fusions found in more than 4 histologies
  dplyr::filter(
    !FusionName %in% found_in_morethan_4$FusionName,
    # remove intergenic and intragenic fusion
    BreakpointLocation == "Genic",
    Fusion_Type == "in-frame"
  ) %>%
  as.data.frame()


head(putative_driver_fusions)

The summary of fusions called can provide an overview of the genomic rearrangements within cohort in-terms of intra or inter chromosomal changes and biotypes of genes fused. We also provide distribution of kinase domain retained in 5and 3 genes as well as overall distribution of annotation in each group. Here we've used broad_histology to group our cohorts.

# plot summary
plot_summary(standardFusioncalls = putative_driver_fusions, 
             groupby = "broad_histology")

Recurrent fusions and genes fused provide insights into subtypes of samples within a cohort. Here we choose broad_histology as the grouping variable and plot the n participants (Kids_First_Participant_ID) with these fusions.

# recurrently fused genes
plot_recurrent_genes(standardFusioncalls = putative_driver_fusions, 
                     groupby = "broad_histology", 
                     countID = "Kids_First_Participant_ID", 
                     plotn = 20)

# recurrent fusions
plot_recurrent_fusions(standardFusioncalls = putative_driver_fusions, 
                       groupby = "broad_histology", 
                       countID = "Kids_First_Participant_ID", 
                       plotn = 20)

Session info {-}

sessionInfo()

d3b-center/annoFuse documentation built on Oct. 2, 2024, 4:17 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

d3b-center/annoFuse
annoFuse: an R Package to annotate, prioritize, and interactively explore putative oncogenic RNA fusions

In d3b-center/annoFuse: annoFuse: an R Package to annotate, prioritize, and interactively explore putative oncogenic RNA fusions

Introduction

Step by step analysis

Overview of the package

Standard fusion call format requirements

Session info {-}

R Package Documentation

Browse R Packages

We want your feedback!

d3b-center/annoFuse annoFuse: an R Package to annotate, prioritize, and interactively explore putative oncogenic RNA fusions

In d3b-center/annoFuse: annoFuse: an R Package to annotate, prioritize, and interactively explore putative oncogenic RNA fusions

Introduction

Step by step analysis

Overview of the package

Standard fusion call format requirements

Session info {-}

R Package Documentation

Browse R Packages

We want your feedback!

d3b-center/annoFuse
annoFuse: an R Package to annotate, prioritize, and interactively explore putative oncogenic RNA fusions