plotVariants: Plots variant positions
In R453Plus1Toolbox: A package for importing and analyzing data from Roche's Genome Sequencer System

Description Usage Arguments Details Value Note Author(s) See Also Examples

This function illustrates the positions and types of mutations within a given gene and transcript. The plot shows only coding regions (thus, units are amino acids / codons). The coding region is further divided into exons labeled with their rank in the transcript. Transcript annotation is obtained from the ensembl GRCh37 server.

1	plotVariants(data, gene, transcript, regions, mutationInfo, groupBy, horiz=FALSE, cex=1, title="", legend=TRUE)

`data`	This can be either a data.frame or an instance of class `annotatedVariants` you get by calling `annotateVariants`. A data frame requires the columns "label", "pos" "mutation" and "color" specifying an annotation for each mutation, its position (a single numeric value), the mutation type and (optionally) the color of the mutation (e.g. depicting a certain group identity independent from the mutation type). The position needs to be given as amino acids / codons.
`gene`	A string containing the Ensembl id of the gene of interest (required).
`transcript`	A string containing an Ensembl transcript id (optional, but recommended). If no Ensembl transcript-id is passed, it is chosen automatically and the function will return an appropriate data frame with all annotated transcripts for the given gene.
`regions`	A data frame having columns "name", "start", "end" and "color". The plot will highlight these regions with the given colors and print their name in the legend.
`mutationInfo`	A data frame with annotations for the different mutations occurring in the data (optional but recommended). It requires the columns "mutation", "legend" and "color". The first column must list the exact mutation names that occur in the data column "mutation". The column "legend" allows for a more detailed name of the mutation that will appear in the legend. The color of each mutation type is optional and can also be assigned automatically.
`groupBy`	By default, the mutations will be grouped by their position (i.e. the column "pos"). If necessary, one might give the name of an other column in the `data` here; for example the mutation labels. Please see the details below for more information.
`horiz`	In more comprehensive datasets, more than one mutation may be listed for a single position. If `horiz=FALSE`, these overlapping mutations will be aligned vertically. If `horiz=TRUE` they will be aligned horizontally in groups allowing a label for every single mutation.
`cex`	A numeric value `> 0` giving the factor by which the labels are magnified relative to the default text size. If the width of the device is too small for all mutation labels, their size will be scaled automatically until it fits.
`title`	A title for the plot (optional).
`legend`	A logical value (TRUE/FALSE) whether to plot a legend for the mutations types (see `mutationInfo`) and the highlighted regions (if any).

The plot will show the coding part of the gene and its exons as x-axis at the bottom. Mutations will be marked and ordered above according to their genomic position. The axis units are amino acids / codons (hence, all given genomic positions should be divided by three if necessary).
Passing a data frame to this function allows a much more individual and detailed annotation of the mutations like labels, colors and user defined mutation types.
Passing an instance of class annotatedVariants is useful for integration into the R453Plus1Toolbox pipeline and for compatibility to older versions of the plot. Then, it will only distinguish deletions and missense, nonsense and silent mutations.
By default, the plot will group mutations (horizontally or vertically) by their position. It is possible to group by an other column in the data (see parameter groupBy), but in the current version this makes only sense if the mutations in one group are locally clustered, i.e. have the same or a similar position. The parameter groupBy is mainly useful to modify or even disable the automatic grouping of different mutations at the same position.

The function will return a data frame containing all Ensembl transcript information for the given gene ("ensembl_transcript_id", "rank", "cds_start", "cds_end" and "cds_length"). This data frame may prove useful for retrieving a Ensembl transcript id for future plots.

Depending on the amount of mutations and the size of the gene, the plot may not fit into the device or the text may become too small. It is recommended to carefully select the right size of your device before starting this function to ensure a well scaled and beautiful plot.
This function requires the package TeachingDemos to work, which can be found at CRAN.

Christoph Bartenhagen

annotateVariants.

# EXAMPLE 1: Working with intances of class annotatedVariants

# one missense, one nonsense point mutation and one deletion
variants = data.frame(
    row.names=c("missense", "deletion", "nonsense"), 
    start=c(106157528, 106157635, 106193892),
    end=c(106157528, 106157635, 106193892),
    chromosome=c("4", "4", "4"),
    strand=c("+", "+", "+"),
    seqRef=c("A", "G", "C"),
    seqMut=c("G", "-", "T"),
    seqSur=c("TACAGAA", "TAAGCAG", "CGGCGAA"),
    stringsAsFactors=FALSE)

# annotate variants with affected genes, exons and codons (may take a minute to finish)
## Not run: varAnnot = annotateVariants(variants)

# plot variants for gene TET2 having the Ensembl id "ENSG00000168769"
# when passing no transcript, the largest transcript annotated in the Ensembl database for this gene will be selected automatically
## Not run: plotVariants(data=varAnnot, gene="ENSG00000168769", title="plotVariants Example", legend=TRUE)

# EXAMPLE 2: Working with a data frame

# two missense at one position, one nonsense point mutation and one deletion
# it is possible to assign a color to every single mutation independently from its type
variants = data.frame(
    label=c("A>G","A>G(2)","delG","C>T"),
    pos=c(831,831,867,1437),
    mutation=c("M","M","D","N"),
    color=c("black", "black", "green", "red"),
    stringsAsFactors=FALSE
)

# more detailed names for mutation abbreviations can be passed as mutationInfo
# this is useful for the legend, but can also be generated automatically
mutationInfo = data.frame(
    mutation=c("M","D","S","N"),
    legend=c("Missense","Nonsense","Silent","Deletion"),
    stringsAsFactors=FALSE
)

# regions of interest can be highlighted using the regions parameter
regions = data.frame(
    name = c("region1", "region2"),
    start = c(700, 1400),
    end = c(1000, 1900),
    color = c("red", "blue")
)

# using the horiz parameter, multiple mutations occurring at the same place can be either aligned ...
# ... vertically
## Not run: plotVariants(data=variants, gene="ENSG00000168769", transcript="ENST00000513237", regions=regions, mutationInfo=mutationInfo, horiz=FALSE, title="'plotVariants' Example", legend=TRUE)

# ... or horizontally in groups
## Not run: plotVariants(data=variants, gene="ENSG00000168769", transcript="ENST00000513237", regions=regions, mutationInfo=mutationInfo, horiz=TRUE, title="'plotVariants' Example", legend=TRUE)

# group mutations by their label and not by their position (which is the default)
## Not run: plotVariants(data=variants, gene="ENSG00000168769", transcript="ENST00000513237", regions=regions, mutationInfo=mutationInfo, groupBy="label", horiz=TRUE, title="'plotVariants' Example", legend=TRUE)