# set global chunk options: images will be bigger knitr::opts_chunk$set(fig.width=6, fig.height=4) options(digits = 2)
In this vignette, we will show how to work with morphological data in phangorn [@Schliep2011]. In most cases the different morphological characters or character states are encoded with the numbers 0:9 (or less, if there are less differences). Morphological data can come in different formats. The most common ones are .csv and .nexus.
We start by loading the phangorn package and setting a random seed:
library(phangorn) set.seed(9)
The dataset we're using contains morphological data for 12 mite species, with 79 encoded characters [@schaffer2010phylogenetic].
When reading in the .csv file, row.names = 1
uses the first column (species) as row names. To get a phyDat
object, we have to convert the dataframe into a matrix with as.matrix
.
``` {r load data}
fdir <- system.file("extdata", package = "phangorn")
mm <- read.csv(file.path(fdir, "mites.csv"), row.names = 1)
mm_pd <- phyDat(as.matrix(mm), type = "USER", levels = 0:7)
The data can then be written into a _nexus_ file: ```r write.phyDat(mm_pd, file.path(fdir, "mites.nex"), format = "nexus")
Reading in a nexus file is even easier than reading in a csv file:
mm_pd <- read.phyDat(file.path(fdir, "mites.nex"), format = "nexus", type = "STANDARD")
After reading in the nexus file, we have the states 0:9, but the data only has the states 0:7. Here is one possibility to change the contrast matrix: ``` {r contrast_matrix} contrast <- matrix(data = c(1,0,0,0,0,0,0,0,0, 0,1,0,0,0,0,0,0,0, 0,0,1,0,0,0,0,0,0, 0,0,0,1,0,0,0,0,0, 0,0,0,0,1,0,0,0,0, 0,0,0,0,0,1,0,0,0, 0,0,0,0,0,0,1,0,0, 0,0,0,0,0,0,0,1,0, 0,0,0,0,0,0,0,0,1, 1,1,1,1,1,1,1,1,1), ncol = 9, byrow = TRUE) dimnames(contrast) <- list(c(0:7,"-","?"), c(0:7, "-")) contrast mm_pd <- phyDat(mm_pd, type="USER", contrast=contrast)
Now that we have our data, we can start the analyses. # Parsimony For morphological data, one of the most frequently used approaches to conduct phylogenetic trees is maximum parsimony (MP). `pratchet` (as already described in _Estimating phylogenetic trees with phangorn_) implements the parsimony ratchet [@Nixon1999]. To create a starting tree, we can use the function `random.addition`: ```r mm_start <- random.addition(mm_pd)
This tree can then be given to pratchet
:
mm_tree <- pratchet(mm_pd, start = mm_start, minit = 1000, maxit = 10000, all = TRUE, trace = 0) mm_tree
With all=TRUE
we get all (in this case 19) trees with lowest parsimony score in a multiPhylo
object. Since we we did a minimum of 1000 iterations, we already have some edge support. Now we can assign the edge lengths.
mm_tree <- acctran(mm_tree, mm_pd)
In the case of our mites-dataset with 12 sequences, it's also possible to use the branch and bound algorithm [@Hendy1982] to find all most parsimonious trees. With bigger datasets it is definitely recommended to use pratchet
.
``` {r bab} mm_bab <- bab(mm_pd, trace = 0) mm_bab
## Root trees If we want our unrooted trees to be rooted, we have the possibility to use `midpoint` to perform midpoint rooting. Rooting the trees with a specific species (we chose _C. cymba_ here) can be done with the function `root` from the _ape_ package [@Paradis2018]. To save the correct node labels (edge support), it's important to set `edgelabel=TRUE`. ``` {r root trees, message=FALSE} mm_tree_rooted <- root(mm_tree, outgroup = "C._cymba", resolve.root = TRUE, edgelabel = TRUE)
With plotBS
we plot a tree with their respective edge support. It is also possible to save the plots as .pdf (or various other formats, e.g. svg, png, tiff) file. digits
is an argument to determine the number of digits shown for the bootstrap values.
# subsetting for tree nr. 9 plotBS(mm_tree_rooted[[9]], digits = 2) # save plot as pdf pdf(file = "mm_rooted.pdf") plotBS(mm_tree_rooted[[9]], digits = 2) dev.off()
To look at the consensus tree of our 19 trees from pratchet
, or of our 37 most parsimonious trees from bab
, we can use the consensus
function from ape.
# unrooted pratchet tree mm_cons <- consensus(mm_tree) # rooted pratchet tree mm_cons_root <- consensus(mm_tree_rooted, rooted = TRUE) # branch and bound, we root the consensus tree in the same step mm_bab_cons <- root(consensus(mm_bab), outgroup = "C._cymba", resolve.root = TRUE, edgelabel = TRUE)
plot(mm_cons, main="Unrooted pratchet consensus tree") plot(mm_cons_root, main="Rooted pratchet consensus tree") plot(mm_bab_cons, main="Rooted bab consensus tree")
We can clearly see that, as expected, the two rooted trees have the same topology.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.