knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  warning = FALSE,
  message = FALSE
)
library(dplyr)
library(gtsummary)
library(gnomeR)
You can install the development version of {gnomeR} from GitHub with:
# install.packages("devtools")
devtools::install_github("MSKCC-Epi-Bio/gnomeR")
You can also install its companion package for cBioPortal data download:
devtools::install_github("karissawhiting/cbioportalR")
The {gnomeR} package provides a consistent framework for genetic data processing, visualization, and analysis. It is primarily aimed at IMPACT datasets but can also be applied to any genomic data provided by cBioPortal. With {gnomeR} and {cbioportalR} you can:
{gnomeR} is part of gnomeverse, a collection of R packages designed to work together seamlessly to create reproducible clinico-genomic analysis pipelines.
{gnomeR} works with any genomic data that follows cBioPortal guidelines for mutation, CNA, or fusion data file formats.
If you wish to pull the data directly from cBioPortal, see how to get set up with credentials with the {cbioportalR} package.
The examples below use the data sets mutations, sv, and cna, which were pulled from cBioPortal and are included in the package as example data sets. We will draw a random sample of 50 samples for the examples:
set.seed(123)

mut <- gnomeR::mutations
cna <- gnomeR::cna
sv <- gnomeR::sv

un <- unique(mut$sampleId)
sample_patients <- sample(un, size = 50, replace = FALSE)
The main data processing function is create_gene_binary(), which takes mutation, CNA, and fusion files as input and outputs a binary matrix with N rows (one per sample) and one column per gene alteration found in the data set. You can specify which samples to include with the samples argument; every listed sample then appears in the resulting data frame, even if it has no alterations.
gen_dat <- create_gene_binary(
  samples = sample_patients,
  mutation = mut,
  fusion = sv,
  cna = cna
)
head(gen_dat[, 1:6])
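To make the shape of that output concrete, here is a minimal base R sketch of how a long-format event table can be turned into a sample-by-gene 0/1 matrix. It does not call {gnomeR}; the data frame `toy_mut`, its column names, and the sample IDs are hypothetical, and the package's own processing does considerably more than this.

```r
# Hypothetical long-format mutation records (one row per sample-gene event).
toy_mut <- data.frame(
  sample_id   = c("S1", "S1", "S2", "S4"),
  hugo_symbol = c("TP53", "KRAS", "TP53", "EGFR"),
  stringsAsFactors = FALSE
)

# Samples we want in the output, including S3, which has no alterations.
toy_samples <- c("S1", "S2", "S3", "S4")

# Cross-tabulate samples by genes. Using a factor with all sample levels
# keeps S3 in the result as an all-zero row.
counts <- table(
  factor(toy_mut$sample_id, levels = toy_samples),
  toy_mut$hugo_symbol
)

# Convert event counts to 0/1 indicators and return a plain data frame.
toy_binary <- as.data.frame.matrix((counts > 0) * 1L)
toy_binary
```

The row for `S3` is the point of the example: listing a sample explicitly keeps it in the matrix even though it contributes no events.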
By default, mutations, CNA and fusions will be returned in separate columns. You can combine these at the gene level using the following:
by_gene <- gen_dat %>%
  summarize_by_gene()
head(by_gene[, 1:6])
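Conceptually, combining at the gene level means a sample counts as altered in a gene if any of that gene's alteration columns is 1. The base R sketch below illustrates the idea on hypothetical data; the `.Amp`, `.Del`, and `.fus` column suffixes are an assumed naming convention for this example, and the code is not the implementation of summarize_by_gene().

```r
# Hypothetical binary matrix with one column per gene and alteration type.
# The ".Amp" and ".fus" suffixes are an assumption made for this sketch.
toy <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  TP53      = c(1, 0, 0),
  TP53.fus  = c(0, 1, 0),
  ERBB2.Amp = c(0, 0, 1),
  check.names = FALSE
)

alt_cols <- setdiff(names(toy), "sample_id")

# Strip the alteration-type suffix to recover the gene name for each column.
genes <- sub("\\.(Amp|Del|fus)$", "", alt_cols)

# For each gene, flag a sample as altered if any of its columns equals 1.
toy_by_gene <- data.frame(sample_id = toy$sample_id)
for (g in unique(genes)) {
  cols <- alt_cols[genes == g]
  toy_by_gene[[g]] <- as.integer(rowSums(toy[, cols, drop = FALSE]) > 0)
}
toy_by_gene
```

Here `S1` and `S2` are both flagged for TP53, one through a mutation and one through a fusion, which is the information lost and gained when columns are merged.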
You can visualize your processed and raw alteration data sets using {gnomeR}'s many data visualization functions.
Quickly visualize mutation characteristics with ggvarclass(), ggvartype(), ggsnvclass(), ggsamplevar(), ggtopgenes(), gggenecor(), and ggcomut().
ggvarclass(mutation = mut)
You can tabulate and summarize your genomic data frame using the tbl_genomic() function, a wrapper for gtsummary::tbl_summary().
gen_dat <- gen_dat %>% dplyr::mutate(trt_status = sample(x = c("pre-trt", "post-trt"), size = nrow(gen_dat), replace = TRUE))
gene_tbl_trt <- gen_dat %>% subset_by_frequency(t = .1, other_vars = trt_status) %>% tbl_genomic(by = trt_status) %>% gtsummary::add_p()
# gt::gtsave(as_gt(gene_tbl_trt), file = file.path(tempdir(), "temp.png"))
gt::gtsave(
  as_gt(gene_tbl_trt),
  filename = here::here("man", "figures", "README-tbl_genomic_print.png")
)
knitr::include_graphics(here::here("man/figures/README-tbl_genomic_print.png"))
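The subset_by_frequency(t = .1) step in the pipeline above restricts the table to alterations seen in at least roughly 10% of samples. The base R sketch below shows that kind of filter on hypothetical data. Whether the package compares with `>=` or `>`, and how it treats missing values, are assumptions not confirmed here.

```r
# Hypothetical 0/1 alteration matrix for 20 samples.
toy <- data.frame(
  GENE_A = rep(c(1, 0), times = c(6, 14)),  # altered in 30% of samples
  GENE_B = rep(c(1, 0), times = c(1, 19)),  # altered in 5% of samples
  GENE_C = rep(c(1, 0), times = c(4, 16))   # altered in 20% of samples
)

# Alteration frequency per column is the column mean of the 0/1 values.
freq <- colMeans(toy)

# Keep only columns at or above the threshold (assumed inclusive).
threshold <- 0.1
keep <- names(freq)[freq >= threshold]
toy_subset <- toy[, keep, drop = FALSE]
names(toy_subset)
```

Filtering before tabulating keeps the summary table short and avoids running tests on alterations too rare to compare between groups.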
Additionally, you can analyze custom pathways, or a set of default gene pathways, using add_pathways():
path_by_trt <- gen_dat %>% add_pathways() %>% select(sample_id, trt_status, contains("pathway_")) %>% tbl_genomic(by = trt_status) %>% gtsummary::add_p()
gt::gtsave(as_gt(path_by_trt), filename = here::here("man", "figures" , "README-path_by_trt.png"))
knitr::include_graphics(here::here("man/figures/README-path_by_trt.png"))
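As a rough mental model, a pathway-level column can be thought of as an indicator that a sample has an alteration in at least one gene belonging to that pathway. The base R sketch below builds such columns by hand. The pathway names and gene memberships in `toy_pathways` are hypothetical and are not the package's default definitions; only the `pathway_` column prefix is taken from the example above.

```r
# Hypothetical gene-level 0/1 matrix.
toy <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  KRAS = c(1, 0, 0),
  BRAF = c(0, 0, 1),
  TP53 = c(0, 1, 0)
)

# Hypothetical pathway definitions, for illustration only.
toy_pathways <- list(
  RTK_RAS = c("KRAS", "BRAF"),
  TP53    = c("TP53", "MDM2")
)

# A sample is flagged for a pathway if any member gene present in the
# data is altered; genes absent from the data (MDM2 here) are ignored.
for (p in names(toy_pathways)) {
  present <- intersect(toy_pathways[[p]], names(toy))
  toy[[paste0("pathway_", p)]] <-
    as.integer(rowSums(toy[, present, drop = FALSE]) > 0)
}
toy[, c("sample_id", "pathway_RTK_RAS", "pathway_TP53")]
```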
Please note that the gnomeR project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Thank you to all contributors!
r usethis::use_tidy_thanks("MSKCC-Epi-Bio/gnomeR", from = "2020-01-01") %>% {glue::glue("[@{.}](https://github.com/{.})")} %>% glue::glue_collapse(sep = ", ", last = ", and ")