library(knitr)
library(HLAClustRView)
Package: r packageDescription("HLAClustRView")[["Package"]]
Authors: r packageDescription("HLAClustRView")[["Author"]]
Version: r packageDescription("HLAClustRView")$Version
Compiled date: r Sys.Date()
License: r packageDescription("HLAClustRView")[["License"]]
The r packageDescription("HLAClustRView")[["Package"]] package and the underlying r packageDescription("HLAClustRView")[["Package"]] code are distributed under the MIT license. You are free to use and redistribute this software.
If you use the r packageDescription("HLAClustRView")[["Package"]] package for a publication, please cite the following:
citation("HLAClustRView")
The human leukocyte antigen (HLA) complex plays an important biological role in the regulation of the immune system. HLA alleles encode major histocompatibility complex (MHC) proteins, which display peptide antigens for recognition by T cells. MHCs are essential for antiviral, antibacterial, and anti-tumor immunity [@Kumar2012]. In addition, inheritance of specific HLA alleles is implicated in autoimmune disorders, such as inflammatory bowel disease, type 1 diabetes, rheumatoid arthritis, and systemic lupus erythematosus [@Gutierrez-Arcelus2016a]. HLA gene products also play a critical role in the outcomes of human organ transplantation [@Choo2007].
The set of genes that form the HLA complex is highly polymorphic, and novel alleles are still being discovered [@Abraham2018]. Although the high polymorphism of HLA alleles provides some immunologic advantages against infectious disease, it also presents challenges for organ transplantation. Successful tissue and organ transplantation requires that donors and recipients have compatible HLA alleles [@Kumar2012]. Because HLA genes are so polymorphic, accurate typing with short-read sequencing data is a challenging task, and software specialized in HLA typing, such as xHLA [@Xie2017] and HLAProfiler [@Buchkovich2017], had to be developed.
Since 1998, the IMGT/HLA Database [@Robinson2015] has provided curated information about polymorphism in the human genes of the immune system. The naming of HLA genes and allele sequences, as well as their quality control, is the responsibility of the WHO Nomenclature Committee for Factors of the HLA System.
Metrics that capture the degree of affiliation between HLA alleles would facilitate association studies and clustering analysis. However, establishing such similarity metrics is challenging for two main reasons: first, the number of HLA alleles is very large; second, the HLA nomenclature is complex. Only a few similarity metrics based on HLA nomenclature are currently available. As an example, van Dorp and Kesmir have developed a Bayesian method that takes functional HLA similarities into account to find HLA associations with quantitative traits [@VanDorp2018].
The HLAClustRView package implements novel metrics that use HLA typing to calculate the degree of similarity between HLA molecules. Those metrics have been developed to ease the integration of HLA typing nomenclature into complex analyses. In addition, functionalities enabling cluster analysis and visualisation of associated RNA expression have been added to the HLAClustRView package.
To quantify the similarity between two HLA typings, a similarity metric must be used. The HLAClustRView package implements two metrics.
In information theory, the Hamming distance is broadly applied to quantify similarity among data strings. The Hamming distance between two binary strings of equal length is calculated by counting the positions at which the two strings differ. The Hamming distance also has applications in computational biology, where it can be used to approximate pattern matching between sequences [@Ristov2016].
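As a minimal illustration of the classic definition, the sketch below counts the differing positions between two equal-length strings in base R. The function name `hammingDistance` is hypothetical and is not part of the HLAClustRView package.

```r
# Classic Hamming distance: number of positions at which two
# equal-length strings differ.
# hammingDistance() is a hypothetical helper, not a package function.
hammingDistance <- function(x, y) {
    stopifnot(nchar(x) == nchar(y))
    xChars <- strsplit(x, "")[[1]]
    yChars <- strsplit(y, "")[[1]]
    sum(xChars != yChars)
}

# The two strings differ at positions 3 and 5
hammingDistance("1011101", "1001001")
```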
We used the first HLA typing field, which designates the allele type based on genetic similarity, to define a Hamming-like distance. The metric is defined as the sum, over all HLA genes, of the minimal number of differing allele types. As alleles are not phased, all combinations between the alleles of the two samples are tested. The combination with the minimal difference is retained for the calculation of the metric.
where:
TODO
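The Hamming-like distance described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the implementation behind calculateHamming(): the helper names (`firstField`, `geneDistance`, `hammingLike`) are hypothetical, each sample is represented as a named list with exactly two typings per gene, and only genes present in both samples are compared.

```r
# Extract the first field (allele type) from a typing such as "A*24:02:01:01".
# All helpers below are hypothetical and not part of HLAClustRView.
firstField <- function(typing) {
    sub("^[^*]+\\*([^:]+).*$", "\\1", typing)
}

# Minimal number of differing allele types for one gene between two
# unphased samples, each given as a character vector of two typings.
geneDistance <- function(allelesA, allelesB) {
    a <- firstField(allelesA)
    b <- firstField(allelesB)
    direct  <- sum(a != b)       # pairing a1-b1, a2-b2
    crossed <- sum(a != rev(b))  # pairing a1-b2, a2-b1
    min(direct, crossed)
}

# Hamming-like distance: sum of the per-gene minima.
# Assumption: samples are named lists with one entry per gene.
hammingLike <- function(sampleA, sampleB) {
    genes <- intersect(names(sampleA), names(sampleB))
    sum(vapply(genes,
               function(g) geneDistance(sampleA[[g]], sampleB[[g]]),
               numeric(1)))
}

sample1 <- list(A = c("A*24:02:01:01", "A*01:01:03"),
                DMA = c("DMA*01:01:01:01", "DMA*01:02"))
sample2 <- list(A = c("A*01:01:01:20", "A*01:01:10"),
                DMA = c("DMA*01:02", "DMA*01:01:01:01"))

# Gene A contributes 1 (types 24/01 versus 01/01), gene DMA contributes 0
hammingLike(sample1, sample2)
```

Testing both the direct and the crossed pairing is what makes the result independent of the order in which the two alleles of a gene are listed.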
The r packageDescription("HLAClustRView")[["Package"]] package is organized into three main sections: input, processing and visualization. Figure 1 shows the workflow within each section.
The file containing the HLA typing for multiple samples needs to respect a specific format.
The general format is:
The specifications for the columns are:
This is an example of what the file should look like:
demoData <- data.frame(
    Sample=c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "..."),
    A_1=c("A*24:02:01:01", "A*01:01:01:20", "A*01:01:03", "A*01:01:03", "..."),
    A_2=c("A*01:01:03", "A*01:01:10", "A*01:01:21", "A*01:01:10", "..."),
    DMA_1=c("DMA*01:01:01:01", "DMA*01:02", "DMA*01:03", "DMA*01:02", "..."),
    DMA_2=c("DMA*01:02", "DMA*01:01:01:01", "DMA*01:01:01:01", "DMA*01:01:01:04", "..."),
    "..."=c(rep("...", 5)))
knitr::kable(demoData, caption = 'An example of an input file containing multiple typings.')
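A quick sanity check of a typing table can be written in base R before handing the file to the package. The sketch below is hypothetical and relies only on what the example table suggests: a first column named `Sample`, followed by columns named `<gene>_1` and `<gene>_2` whose values start with `<gene>*`. The helper `checkTypingLayout` is not a package function, and the tab delimiter is an assumption taken from the `read.table()` call used later in this document.

```r
# Hypothetical helper: checks the layout suggested by the example table.
# Assumptions: first column is "Sample"; every other column is named
# <gene>_1 or <gene>_2; each typing starts with "<gene>*".
checkTypingLayout <- function(typing) {
    stopifnot(is.data.frame(typing), names(typing)[1] == "Sample")
    alleleCols <- names(typing)[-1]
    if (!all(grepl("_[12]$", alleleCols))) {
        return(FALSE)
    }
    genes <- sub("_[12]$", "", alleleCols)
    okCols <- vapply(seq_along(alleleCols), function(i) {
        all(startsWith(as.character(typing[[alleleCols[i]]]),
                       paste0(genes[i], "*")))
    }, logical(1))
    all(okCols)
}

toyTyping <- data.frame(
    Sample = c("Sample_1", "Sample_2"),
    A_1 = c("A*24:02:01:01", "A*01:01:01:20"),
    A_2 = c("A*01:01:03", "A*01:01:10"),
    DMA_1 = c("DMA*01:01:01:01", "DMA*01:02"),
    DMA_2 = c("DMA*01:02", "DMA*01:01:01:01"),
    stringsAsFactors = FALSE
)

# Round trip through a tab-delimited temporary file
tmpFile <- tempfile(fileext = ".txt")
write.table(toyTyping, tmpFile, sep = "\t", quote = FALSE, row.names = FALSE)
reloaded <- read.table(tmpFile, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)
checkTypingLayout(reloaded)
```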
Some metrics need the information provided by the HLA database. The protein information from the most recent HLA database versions (3.35.0 and 3.36.0) is available through the HLAClustRView package.
data(hladb_protein_3.35.0)
summary(hladb_protein_3.35.0)
The devtools package provides install_github() that enables installing packages from GitHub.
library(devtools)
install_github("NCBI-Hackathons/HLAClustRView")
library(HLAClustRView)
The sample file "Samples_HLA_typing.txt" used in this analysis looks like this (only the first five columns and first five rows are shown):
demoData <- read.table("./Samples_HLA_typing.txt", header=TRUE, sep="\t")
kable(demoData[1:5, 1:5], caption = 'A table of the first 5 rows and columns of the Samples_HLA_typing.txt file.')
The file containing the HLA typing for multiple samples needs to be loaded. This is done by the readHLADataset() function. The output is an object of class HLADataset.
HLAdata <- readHLADataset(hlaFilePath = "./Samples_HLA_typing.txt")
HLAdata
The Hamming-like distance, which is based on the first HLA typing field, can be calculated through the calculateHamming() function. The output is an object of class HLAMetric.
hammingMetric <- calculateHamming(HLAdata)
print(hammingMetric)
The draw_heatmap() function enables the creation of a sample-to-sample heatmap.
draw_heatmap(hammingMetric)
The draw_heatmap() function enables customization by accepting parameters that are passed to the internal Heatmap() function from the r Biocpkg("ComplexHeatmap") package.
library(circlize)
## Create a color mapping function
col_fun = colorRamp2(c(0, 10, 20), c("white", "orange", "violet"))
draw_heatmap(hammingMetric, col = col_fun, clustering_method_rows = "median", clustering_method_columns = "median")
The draw_dendrogram() function enables the creation of a cluster dendrogram graph from an \code{HLAMetric} object, as shown in Figure \@ref(fig:clusteringHamming01).
## Draw a basic dendrogram using a HLAMetric object
draw_dendrogram(hlaMetric = hammingMetric)
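For readers unfamiliar with dendrograms, the sketch below shows the general technique of hierarchical clustering on a pairwise distance matrix using base R only. It makes no claim about how draw_dendrogram() works internally: the distance values are made up, and the choice of `hclust()` with complete linkage is an assumption for illustration.

```r
# Toy symmetric distance matrix between four samples (made-up values,
# not produced by calculateHamming()).
sampleNames <- paste0("Sample_", 1:4)
toyDist <- matrix(c(0, 1, 4, 5,
                    1, 0, 3, 4,
                    4, 3, 0, 2,
                    5, 4, 2, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(sampleNames, sampleNames))

# Hierarchical clustering with base R ("complete" is the hclust default)
hc <- hclust(as.dist(toyDist), method = "complete")

# Cut the tree into two groups of samples
groups <- cutree(hc, k = 2)
groups
```

Calling `plot(hc)` on the resulting object draws the corresponding dendrogram with the base graphics system.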
The draw_dendrogram() function enables customization by accepting parameters that are passed to the internal plot() function from the r Rpackage("graphics") package. An example is shown in Figure \@ref(fig:clusteringHamming02).
## Get a triangle dendrogram with type="t"
draw_dendrogram(hlaMetric = hammingMetric, type="t", main="Dendrogram based on HLA typing Hamming-like distance", xlab="", sub= "")
The draw_dendrogram() function can also display phylogenetic trees through the \code{as.phylo} parameter. The phylogenetic tree option is provided by the R package r Rpackage("ape"). An example is shown in Figure \@ref(fig:clusteringHamming03).
## Get a circular display with type="fan" when as.phylo is set to TRUE
draw_dendrogram(hlaMetric = hammingMetric, as.phylo=TRUE, type="fan", main="Phylogenetic trees display")
Here is the output of sessionInfo() on the system on which this document was compiled:
sessionInfo()