A toy GWAS dataset is made available along with the package. Let's look at the dimensions, head and tail of the dataset.
library(ggman)
dim(toy.gwas)
head(toy.gwas)
tail(toy.gwas)
To create a Manhattan plot, only the first four columns (chrom, snp, bp, pvalue) are required. The column classes do not need any specific preformatting. The chromosome identifiers can be either numbers (1, 2, 3, ...) or strings ("Chr1", "Chr2", ...).
ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue")
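As a rough illustration of the expected input shape, the sketch below builds a small data frame with the four required columns from simulated values. The object name my.gwas and all values in it are hypothetical and are not part of the package.

```r
# Hypothetical input: simulated association results, not real data
set.seed(42)
n <- 20
my.gwas <- data.frame(
  chrom  = rep(c("Chr1", "Chr2"), each = n / 2),  # string identifiers
  snp    = paste0("rs", seq_len(n)),              # unique SNP identifiers
  bp     = rep(seq(1000, by = 5000, length.out = n / 2), times = 2),
  pvalue = runif(n),                              # simulated p-values
  stringsAsFactors = FALSE
)

# Inspect the column names and classes
str(my.gwas)
```

A data frame of this shape could then be passed to ggman with the same column-name arguments as in the call above.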
With relative positioning enabled, the base pair positions are scaled in proportion to the real genome positions, so gaps with no SNPs become visible. By default this is not enabled. To use relative positions, set relative.positions = TRUE.
ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", relative.positions = TRUE)
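To see why genome-scaled positions expose gaps, the base R sketch below compares an index-based x coordinate with a cumulative genome coordinate on hypothetical data. It only illustrates the general idea and is not necessarily how ggman computes positions internally. Chromosome lengths are approximated by the largest observed position, which is an assumption made for this example.

```r
# Hypothetical toy data: three chromosomes with a few SNP positions each
snps <- data.frame(
  chrom = c(1, 1, 1, 2, 2, 3),
  bp    = c(100, 500, 900, 200, 400, 300)
)

# Approximate each chromosome's length by its largest observed position
chrom.len <- tapply(snps$bp, snps$chrom, max)

# Offset of a chromosome = total length of all chromosomes before it
offset <- c(0, cumsum(chrom.len)[-length(chrom.len)])
names(offset) <- names(chrom.len)

# Genome-wide coordinate: within-chromosome position plus the offset
snps$genome.pos <- unname(snps$bp + offset[as.character(snps$chrom)])

# Index-based coordinate for comparison: SNPs are evenly spaced
snps$index.pos <- seq_len(nrow(snps))

snps
```

With the index-based coordinate every SNP is one unit from its neighbour, whereas the genome coordinate preserves the uneven spacing between positions.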
A specific set of points in the plot can be annotated by providing a data.frame containing only the SNPs that need to be labelled. Let's take a subset of the main data frame toy.gwas.
# subset only the SNPs with -log10(pvalue) > 8
toy.gwas.sig <- toy.gwas[-log10(toy.gwas$pvalue) > 8, ]
# dimensions
dim(toy.gwas.sig)
# head
head(toy.gwas.sig)
The main layer of the Manhattan plot should be saved in a variable and then passed to the ggmanLabel function.
The names of the columns containing the SNPs and the labels have to be supplied. In this case, we will label with SNP identifiers.
## save the main layer in a variable
p1 <- ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", relative.positions = TRUE)
## add label
ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp")
Annotations can be plain text instead of labels. Use the type argument.
# add text
ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp", type = "text")
The R package ggrepel is used for annotations. All the arguments that are applicable to geom_text_repel and geom_label_repel can be passed on to ggmanLabel. Let's change the size and colour of the labels.
ggmanLabel(p1, labelDfm = toy.gwas.sig, snp = "snp", label = "snp", colour = "black", size = 2)
Caution: providing the whole main data frame as labelDfm will fill the entire plot with text, or might crash R if the data frame is too big.
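One way to keep the label data frame small is to label only the strongest few associations. The base R sketch below uses hypothetical names (gwas, top.n, label.dfm) and simulated p-values. It shows only the subsetting step, under the assumption that the result would then be supplied as labelDfm.

```r
# Hypothetical results table with simulated p-values
set.seed(1)
gwas <- data.frame(
  snp    = paste0("rs", 1:1000),
  pvalue = runif(1000),
  stringsAsFactors = FALSE
)

# Number of SNPs to label (an arbitrary choice for this example)
top.n <- 10

# Sort by p-value and keep the top.n smallest
label.dfm <- head(gwas[order(gwas$pvalue), ], top.n)

dim(label.dfm)
```

Capping the number of rows this way bounds the number of labels regardless of how large the main data frame is.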
The function ggmanHighlight can be used to highlight a single group of points. By default, while highlighting specific points, the main layer of the Manhattan plot is greyed out. We need to supply a vector of SNP names to highlight. The example object toy.highlights comes with the package.
class(toy.highlights)
length(toy.highlights)
head(toy.highlights)
ggmanHighlight(p1, highlight = toy.highlights)
The function ggmanHighlightGroup can be used to highlight multiple groups of points, and a legend can be added. Let's look at the example object toy.highlights.group.
class(toy.highlights.group)
dim(toy.highlights.group)
head(toy.highlights.group)
Unlike ggmanHighlight, the function ggmanHighlightGroup requires a data.frame as input. One of the column names should be supplied as the grouping variable. The size of the highlighted points can be changed with the size argument. The legend title can be specified with the legend.title argument.
ggmanHighlightGroup(p1, highlightDfm = toy.highlights.group, snp = "snp", group = "group", size = 0.5, legend.title = "Significant groups")
It is also possible to remove the legend using the legend.remove argument.
ggmanHighlightGroup(p1, highlightDfm = toy.highlights.group, snp = "snp", group = "group", size = 0.5, legend.remove = TRUE)
In a typical genome-wide association study, it is standard practice to display SNPs in linkage disequilibrium with the index SNP as clumps. The plink software has a clumping procedure, which outputs a clump file with the .clumped extension.
Adding clumps to a Manhattan plot involves four steps.
1. Clump the GWAS results using plink's --clump function.
2. Read the resulting .clumped file into R. For example, if the file is named plink.clumped, then
gwas.clump <- read.table("plink.clumped", header = TRUE)
Here, the example file toy.clumped is a data.frame, which was created by reading the plink.clumped file and keeping only the columns 'SNP' and 'SP2'.
3. Convert the toy.clumped data.frame to a ggclumps object using the ggmanClumps function. The arguments index.snp.column and clumps.column are mandatory. The name of the column containing the index SNPs ('SNP') should be passed to index.snp.column, and the name of the column containing the clumps should be passed to clumps.column.
toy.clumps <- ggmanClumps(toy.clumped, index.snp.column = "SNP", clumps.column = "SP2")
4. Pass the ggclumps object to the clumps argument of the ggman function.
ggman(toy.gwas, clumps = toy.clumps, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", relative.positions = TRUE, pointSize = 0.5)
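The reading and subsetting step can be tried without a real plink run. The sketch below parses a small, made-up table laid out like a plink .clumped report, keeps the 'SNP' and 'SP2' columns, and splits the comma-separated SP2 field to list the members of each clump. The SNP identifiers and the names clumped.text, my.clumped and clump.members are hypothetical. The splitting is shown only for inspection and is not a description of what ggmanClumps does internally.

```r
# Made-up text laid out like a plink .clumped report (not real results)
clumped.text <- "CHR F SNP BP P TOTAL NSIG S05 S01 S001 S0001 SP2
1 1 rs100 1000 1e-10 2 0 0 0 0 2 rs101(1),rs102(1)
2 1 rs200 5000 1e-9 1 0 0 0 0 1 rs201(1)"

# Read the table; with a real file, use read.table("plink.clumped", header = TRUE)
gwas.clump <- read.table(text = clumped.text, header = TRUE,
                         stringsAsFactors = FALSE)

# Keep only the index SNP column and the clump members column
my.clumped <- gwas.clump[, c("SNP", "SP2")]

# Split SP2 on commas and drop the trailing "(n)" suffix from each SNP
clump.members <- lapply(
  strsplit(my.clumped$SP2, ","),
  function(x) sub("\\(.*\\)$", "", x)
)
names(clump.members) <- my.clumped$SNP

clump.members
```

Listing the members this way is a quick check that the index SNPs and their clumps match identifiers present in the main data frame before plotting.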
The clumps can be grouped using a grouping variable, and the index SNPs can be labelled with user-preferred labels. All you need to do is add extra columns to the plink.clumped file and specify them in the ggmanClumps function. In the example toy.clumped file, there are two extra columns named 'group' and 'label'.
toy.clumps <- ggmanClumps(toy.clumped, index.snp.column = "SNP", clumps.column = "SP2", group.column = "group", label.column = "label")
ggman(toy.gwas, clumps = toy.clumps, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", relative.positions = TRUE, pointSize = 0.5)
Use legend.title to change the legend title. If you prefer plain text without a box for labels, use clumps.label.type = 'text'.
ggman(toy.gwas,clumps = toy.clumps, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", relative.positions = TRUE, pointSize = 0.5, legend.title = "Groups", clumps.label.type = 'text')
The function ggmanZoom can be used to create a regional association plot. Plotting a single chromosome is very simple.
ggmanZoom(p1, chromosome = 1)
To plot a specific region, the starting and the ending basepair positions have to be specified. Let's zoom in to the chromosome 1 region containing genes: GENE21, GENE22 and GENE23.
ggmanZoom(p1, chromosome = 1, start.position = 215388741, end.position = 238580695)
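To check which SNPs fall inside a window before zooming, the region can be selected with plain subsetting. The sketch below uses a hypothetical table named gwas with made-up positions and the same boundaries as the call above. Whether ggmanZoom treats the boundaries as inclusive is not stated here, so the inclusive comparison is an assumption.

```r
# Hypothetical results table with the same column names as toy.gwas
gwas <- data.frame(
  chrom = c(1, 1, 1, 1, 2),
  snp   = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  bp    = c(100000000, 215388741, 230000000, 240000000, 220000000),
  stringsAsFactors = FALSE
)

# Region boundaries taken from the ggmanZoom call above
chromosome     <- 1
start.position <- 215388741
end.position   <- 238580695

# Keep SNPs on the chosen chromosome whose position lies inside the window
in.region <- gwas$chrom == chromosome &
  gwas$bp >= start.position &
  gwas$bp <= end.position
region <- gwas[in.region, ]

region$snp
```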
It is also possible to highlight points using a specific grouping variable. Here we have a column named gene in the main data frame toy.gwas that was used to construct the main layer p1.
ggmanZoom(p1, chromosome = 1, start.position = 215388741, end.position = 238580695, highlight.group = "gene", legend.title = "Genes")
An inverted Manhattan plot can be created by inverting the direction of the p-values of variants with negative beta values (or odds ratio < 1). Set the argument invert to TRUE to get an inverted Manhattan plot. If invert = TRUE, then invert.method and invert.var should be specified. The invert.method can be either 'or' or 'beta'. The invert.var is the name of the column containing the beta or odds ratio, according to the value passed to invert.method.
ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "pvalue", invert = TRUE, invert.method = 'or', invert.var = "or")
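The inversion described above amounts to giving each -log10(p) value the sign of the effect direction. The base R sketch below illustrates that idea on hypothetical data with an odds ratio column. It is an assumption about the general technique, not a copy of what ggman does internally, and the names assoc, direction and signed.logp are made up for this example.

```r
# Hypothetical association results with odds ratios
assoc <- data.frame(
  snp    = c("rs1", "rs2", "rs3", "rs4"),
  pvalue = c(1e-8, 1e-3, 1e-5, 0.5),
  or     = c(1.5, 0.8, 0.6, 1.0),
  stringsAsFactors = FALSE
)

# Direction of effect: -1 when the odds ratio is below 1, +1 otherwise
# (for beta values the equivalent test would be beta < 0)
direction <- ifelse(assoc$or < 1, -1, 1)

# Signed -log10(p): variants with OR < 1 point downwards
assoc$signed.logp <- direction * -log10(assoc$pvalue)

assoc
```

Plotting signed.logp on the y axis places variants with odds ratio below 1 under the zero line and the remaining variants above it.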
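Other statistics, such as odds ratios or beta values, can be plotted in place of p-values by passing the column name to pvalue and setting logTransform = FALSE.
ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "or", logTransform = FALSE, ymax = 3)
ggman(toy.gwas, snp = "snp", bp = "bp", chrom = "chrom", pvalue = "beta", logTransform = FALSE, ymin = -2, ymax = 2)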