knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

CRANpkg <- function(pkg) {
  cran <- "https://cran.r-project.org/package"
  fmt <- "[%s](%s=%s)"
  sprintf(fmt, pkg, cran, pkg)
}

Biocpkg <- function(pkg) {
  sprintf("[%s](http://bioconductor.org/packages/%s)", pkg, pkg)
}

# Packages -------------------------------------------------------------------
library(ggmsa)
library(ggplot2)
if (!require("BiocManager"))
  install.packages("BiocManager")
BiocManager::install("ggmsa")
ggmsa is a package designed to plot multiple sequence alignments.
This package implements functions that make it simple to produce publication-quality visualizations of multiple sequence alignments (protein/DNA/RNA) in R. It uses a modular design to annotate sequence alignments and can accept other data sets so that diagrams can be combined.
In this tutorial, we’ll work through the basics of using ggmsa.
library(ggmsa)
knitr::include_graphics("man/figures/workflow.png")
We’ll start by importing some example data to use throughout this
tutorial. Besides FASTA files, some R objects can also be used
as input. available_msa() lists the MSA objects currently supported.
available_msa()

protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
miRNA_sequences <- system.file("extdata", "seedSample.fa", package = "ggmsa")
nt_sequences <- system.file("extdata", "LeaderRepeat_All.fa", package = "ggmsa")
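As a sketch of the "R objects as input" route, the example below reads the bundled FASTA file into an AAStringSet with Biostrings and passes that object to ggmsa() instead of the file path. It assumes Biostrings is installed and that AAStringSet is among the classes reported by available_msa(). The variable names are illustrative only.

```r
library(ggmsa)
library(Biostrings)

# Path to the example protein alignment shipped with ggmsa
fasta_path <- system.file("extdata", "sample.fasta", package = "ggmsa")

# Read the alignment into an in-memory Biostrings object
aa_set <- readAAStringSet(fasta_path)
class(aa_set)

# Pass the object (rather than the file path) to ggmsa();
# assumption: AAStringSet is one of the supported input classes
p_from_object <- ggmsa(aa_set, 300, 350, color = "Clustal", seq_name = TRUE)
p_from_object
```

This can be convenient when the sequences have already been filtered or reordered in R before plotting.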
The simplest way to use ggmsa:
ggmsa(protein_sequences, 300, 350, color = "Clustal", font = "DroidSansMono", char_width = 0.5, seq_name = TRUE )
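Because ggmsa builds on ggplot2, the result of ggmsa() can usually be treated like any other ggplot object. The sketch below stores the plot, adds a title, and saves it with ggsave(). The output file name and the figure dimensions are arbitrary choices for illustration, not recommendations from the package.

```r
library(ggmsa)
library(ggplot2)

protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")

# Store the plot instead of printing it immediately
p <- ggmsa(protein_sequences, 300, 350, color = "Clustal", seq_name = TRUE)

# Assumption: ordinary ggplot2 additions such as ggtitle() apply to the result
p_titled <- p + ggtitle("Alignment positions 300-350")

# Hypothetical output location; width and height (in inches) are arbitrary
out_file <- file.path(tempdir(), "msa_300_350.pdf")
ggsave(out_file, p_titled, width = 10, height = 3)
```

A wide, short canvas tends to suit alignments, since each residue occupies one column of the plot.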
ggmsa ships with several predefined color schemes for rendering MSAs.
In the same way as available_msa(), use
available_colors()
to list the color schemes currently available.
Note that the schemes for amino acids (protein) and nucleotides (DNA/RNA) have
different names.
available_colors()
knitr::include_graphics("man/figures/schemes.png")
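A minimal sketch of switching schemes is shown here: one amino-acid scheme for the protein alignment and one nucleotide scheme for the DNA alignment. The names "Chemistry_AA" and "Chemistry_NT" are assumed to appear in the output of available_colors(); check that list if your installed version differs.

```r
library(ggmsa)

protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
nt_sequences <- system.file("extdata", "LeaderRepeat_All.fa", package = "ggmsa")

# Protein alignment with an amino-acid color scheme
p_aa <- ggmsa(protein_sequences, 300, 350,
              color = "Chemistry_AA", seq_name = TRUE)

# Nucleotide alignment with the matching nucleotide scheme;
# font = NULL draws only the colored tiles, without letters
p_nt <- ggmsa(nt_sequences, color = "Chemistry_NT", font = NULL)

p_aa
p_nt
```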
Several predefined fonts are shipped with ggmsa.
Use available_fonts()
to list the fonts currently available.
available_fonts()
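The following sketch compares two font settings on the same region. The font name "helvetical" is assumed to be one of the names printed by available_fonts(), and font = NULL is assumed to suppress the letters so that only the background tiles are drawn.

```r
library(ggmsa)

protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")

# Same region rendered with a different predefined font
p_font <- ggmsa(protein_sequences, 300, 350,
                font = "helvetical", color = "Clustal", char_width = 0.5)

# No letters at all: only the colored tiles are drawn
p_tiles <- ggmsa(protein_sequences, 300, 350,
                 font = NULL, color = "Clustal")

p_font
p_tiles
```

Dropping the letters can help when a long alignment is plotted and individual residues would be too small to read.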
ggmsa supports annotations for MSAs. As in ggplot2,
annotations are implemented as geom layers, and users add
them with +, like this: ggmsa() + geom_*().
Automatically generated annotations containing colored
labels and symbols are overlaid on MSAs to indicate
potentially conserved or divergent regions.
For example, to visualize a multiple sequence alignment with a sequence logo and a bar chart:
ggmsa(protein_sequences, 221, 280, seq_name = TRUE, char_width = 0.5) + geom_seqlogo(color = "Chemistry_AA") + geom_msaBar()
The following table lists the annotation layers supported by ggmsa:
library(kableExtra)

x <- "geom_seqlogo()\tgeometric layer\tautomatically generated sequence logos for a MSA\n
geom_GC()\tannotation module\tshows GC content with bubble chart\n
geom_seed()\tannotation module\thighlights seed region on miRNA sequences\n
geom_msaBar()\tannotation module\tshows sequences conservation by a bar chart\n
geom_helix()\tannotation module\tdepicts RNA secondary structure as arc diagrams(need extra data)\n
"
xx <- strsplit(x, "\n\n")[[1]]
y <- strsplit(xx, "\t") %>% do.call("rbind", .)
y <- as.data.frame(y, stringsAsFactors = FALSE)
colnames(y) <- c("Annotation modules", "Type", "Description")
knitr::kable(y, align = "l", booktabs = TRUE, escape = TRUE) %>%
  kable_styling(latex_options = c("striped", "hold_position", "scale_down"))
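Two of the layers in the table can be sketched on the bundled nucleotide data as follows. The seed string "GAGGUAG" and the star = TRUE option are assumptions chosen to fit the example miRNA file; substitute the seed that applies to your own sequences.

```r
library(ggmsa)

miRNA_sequences <- system.file("extdata", "seedSample.fa", package = "ggmsa")
nt_sequences <- system.file("extdata", "LeaderRepeat_All.fa", package = "ggmsa")

# Highlight the seed region on the miRNA alignment;
# the seed sequence here is an illustrative assumption
p_seed <- ggmsa(miRNA_sequences, char_width = 0.5, color = "Chemistry_NT") +
  geom_seed(seed = "GAGGUAG", star = TRUE)

# Add a GC-content bubble chart next to a nucleotide alignment
p_gc <- ggmsa(nt_sequences, font = NULL, color = "Chemistry_NT") +
  geom_GC()

p_seed
p_gc
```

geom_helix() is left out of this sketch because, as the table notes, it needs extra structure data in addition to the alignment.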
Check out the guides for learning everything there is to know about all the different features:
sessionInfo()