1 History and definitions

1.1 What is phylogenetics and when did it start?

Phylogenetics is the study of the evolutionary history and relationships among individuals or groups of organisms

Phylogenetics goes back to the origins of evolutionary theory

Darwin drew this tree

There are many, many applications of phylogenetics:

Applications

There are a lot of learning materials online. Here’s an intro course from EBI: https://www.ebi.ac.uk/training/online/course/introduction-phylogenetics

1.2 What are phylogenetic trees?

A phylogenetic tree is a tree structure (with nodes and edges) in which tips correspond to organisms and internal nodes correspond to inferred common ancestors

The root is usually an inferred oldest node, or a node separating your study organisms from an “outgroup”. There may not be a root.

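# Load the demo tree (assumed to be an ape "phylo" object called tr) and plot it,
# colouring tips 4 to 6 blue and all other tips grey
load("demotree.Rdata")
library(ape)
plot(tr, edge.width = 2, edge.color = "grey", tip.color = c("grey", "grey", "grey", "blue", "blue", "blue", rep("grey", 6)))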

If we are studying the blue organisms we might want to root the tree this way (but we don’t have to – it depends on the question):

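# Re-root the tree using Organisms 4, 5 and 6 as the outgroup, then plot it again
tr <- root(phy = tr, outgroup = c("Organism 4", "Organism 5", "Organism 6"))
plot(tr, edge.width = 2, edge.color = "grey", tip.color = c("grey", "grey", "grey", "blue", "blue", "blue", rep("grey", 6)), main = "The same tree, with a different root")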

The horizontal axis shows branch lengths, which are in units of genetic distance, typically something like the number of substitutions per site in molecular (sequence) data. Sometimes we create a timed phylogenetic tree, whose branch lengths are in units of time.

The vertical axis – NOTHING!

In the above tree, Organism 11 is as closely related to Organism 8 as Organism 10 is to Organism 9.

1.3 Terminology

Sister groups: two groups descending from the same ancestor

Clade: an ancestor and all its descendants. This is also sometimes called a monophyletic group.

Common ancestor (of a set of tips): a node that is an ancestor of all tips in the set

Most recent common ancestor (MRCA) of a set of tips: the common ancestor of the set of tips that is farthest from the root
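To make the MRCA definition concrete, here is a minimal base R sketch. The tree, its node names (`root`, `n1`, `n2`) and the `parent` lookup are hypothetical and only for illustration; in practice a package such as ape stores trees differently.

```r
# Hypothetical rooted tree stored as a child -> parent lookup; the root has parent NA.
# Shape: root -> (n1, D); n1 -> (n2, C); n2 -> (A, B)
parent <- c(root = NA, n1 = "root", n2 = "n1", A = "n2", B = "n2", C = "n1", D = "root")

# Path from a node up to the root (the node itself comes first).
ancestors <- function(node) {
  path <- node
  while (!is.na(parent[[node]])) {
    node <- parent[[node]]
    path <- c(path, node)
  }
  path
}

# Common ancestors are the nodes shared by every tip's path to the root.
# The MRCA is the first shared node on the way up, i.e. the one farthest from the root.
mrca <- function(tips) {
  common <- Reduce(intersect, lapply(tips, ancestors))
  common[1]
}

mrca(c("A", "B"))  # "n2"
mrca(c("A", "C"))  # "n1"
```

Here `n1` and `root` are both common ancestors of A and B, but only `n2` is their most recent common ancestor.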

1.4 Quiz time!

This tree shows groups of animals and also indicates inferred evolutionary events that happened on some of the branches of the tree:

Q1: Do the reptiles form a clade?
Q2: Do the mammals form a clade?
Q3: What is the closest relative of the fish?
Q4: According to the groupings on the tree, what feature separates mammals from other organisms?
Q5: What feature separates sponges from other organisms?
Q6: Are snakes closer to mollusks or to amphibians in this tree?

2 Lots of trees

So: is it practical to find the truly best tree among all the possible trees?

A: not really, but the vast majority of the trees are terrible matches to data, which constrains the set of trees we need to examine. However, there is usually some uncertainty.
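To get a feel for how many trees there are, the sketch below uses a standard counting result that is not stated in the text above: for \(n\) labelled tips there are \((2n-3)!!\) rooted binary trees and \((2n-5)!!\) unrooted binary trees, where \(!!\) is the product of odd numbers. The function names are made up for this example.

```r
# Number of rooted binary trees on n labelled tips: 1 * 3 * 5 * ... * (2n - 3)
n_rooted_trees <- function(n) {
  if (n < 2) return(1)
  prod(seq(1, 2 * n - 3, by = 2))
}

# Number of unrooted binary trees on n labelled tips: 1 * 3 * 5 * ... * (2n - 5)
n_unrooted_trees <- function(n) {
  if (n < 3) return(1)
  prod(seq(1, 2 * n - 5, by = 2))
}

n_rooted_trees(4)    # 15
n_rooted_trees(12)   # 13749310575, for a tree with only 12 tips
n_unrooted_trees(10) # 2027025
```

Even for a dozen tips there are billions of rooted trees, so exhaustive search is only possible for very small datasets.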

3 Phylogenetic analysis

3.1 Stages

Phylogenetic analysis usually proceeds in stages like the ones in this cartoon.

We’ll start from step 6 in this lecture. The other steps are important, but 1-3 are specific to your research question. 4 and 5 are in other areas of bioinformatics.

3.2 What data do we need for phylogenetics?

Now that sequencing technologies are so affordable, molecular (sequence) data are usually the key data for phylogenetic tree reconstruction.

Why use molecular data?

DNA is inherited material

We can now easily, quickly, inexpensively and reliably sequence genetic material

Sequences are highly specific and rich in information

But we could use morphological or phenotype data

In fact we can also test how consistent trees from genetic and phenotype data are with each other

The characters are the entries of the aligned data. In molecular data, each character is a genetic site or locus. For example, suppose the multiple sequence alignment looks like this:

head demo.fas

>A130
atgaaaccct cgcgccctta tctttaccac gcgtactaca actggatttt agttaatgat
>A131
atgaaaccct tgcgccctta tcttttccac gcgtcctaca actggatttt agataatgat
>A132
atgaaaccct cgcgccctta tctttaccac gcgtactaca actcgatttt agataatgat
>A133
atgaaaccct tgcgccctta tctttgccac gcgtactaca actggatttt agataatgat

Here the organism names are “A130”, “A131”, “A132” and “A133”. There are 60 characters, written in 6 groups of 10. All organisms have an ‘a’ as the first character.

3.3 THE TASK (of phylogenetic reconstruction):

Use a multiple sequence alignment to make a phylogenetic tree that best captures the evolution of the sequences.

Some principles we might want to have in place:

Similar sequences should be grouped together

Simplicity (or: parsimony) – minimal number of changes needed to explain the data

Or - choose the tree for which the probability that the observed sequences evolved is the highest (maximum likelihood)

Or - describe a lot of possible trees, sampling them with the appropriate probability (Bayesian models)

These principles motivate the methods in this lecture.

3.4 Distance- and character-based tree building

Distance-based methods

Compute distances between pairs of sequences and find a tree (e.g. using clustering) that best describes these distances. Example: neighbour joining.

In fact the distances are still based on characters, but the characters aren’t considered one at a time - they are pooled together to create pairwise distances between all the sequences in the dataset.

Character-based methods

Consider sequences of (DNA, RNA, morphological) characters, one at a time, and seek the tree with an optimal “score”: likelihood or parsimony; or use a Bayesian method to compute a posterior collection of phylogenetic trees.

Can we get the best tree?

Either way, we need to understand how to compare sequences to each other! We need to either compute a distance between all the pairs of sequences, or to figure out a likelihood or score for a tree.

To do these comparisons, we use models that describe how sequences evolve.

4 Models of sequence evolution

4.1 What is this and how does it describe distance?

How can we compute the distance between two sequences?

A common approach: the number of differences per site. For example, the two sequences “AACCTATCGG” and “AATCTAACGG” differ in two places, so they would be 2/10 = 0.2 substitutions per site apart.

In other words, if the number of differences is \(m\) and the total number of sites is \(n\) then this basic distance is \(d_s=\frac{m}{n}\).
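As a small illustration of \(d_s = m/n\), here is a base R sketch. The function name `raw_distance` is made up for this example, and it assumes the two sequences are already aligned and of equal length.

```r
# Raw distance: the fraction of aligned sites at which two sequences differ.
raw_distance <- function(seq1, seq2) {
  s1 <- strsplit(seq1, "")[[1]]
  s2 <- strsplit(seq2, "")[[1]]
  stopifnot(length(s1) == length(s2))  # sequences must be aligned
  mean(s1 != s2)                       # m / n
}

raw_distance("AACCTATCGG", "AATCTAACGG")  # 0.2
```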

However: what if there were actually 2 changes at the same site? This simple computation would underestimate the true distance. What if some columns are “N” (the sequencing data was unclear about whether there was an A, C, T or G at the site)?

There are many models for sequence evolution. They define the probability that a character at a site will evolve into a different character at that site over a specified time period. They can take various complexities into account. Models of sequence evolution can be used to define distances between two sequences, and they can be used to simulate evolving sequences.

Yang and Rannala, Molecular phylogenetics: principles and practice; https://www.nature.com/articles/nrg3186

A, C, T and G: adenine, cytosine, thymine, guanine

A and G: 2-ring structures; purine

C and T: 1-ring structures; pyrimidine

Transition: a purine-purine change, or a pyrimidine-pyrimidine change (C-T or A-G)

Transversion: a pyrimidine-purine change (or vice versa)

The Jukes-Cantor model treats these as having the same probability

But it is reasonable to think that transitions might be more probable than transversions

Also, A, C, T and G might not be equally frequent.

4.2 Assumptions and parameters in several common models

The Jukes-Cantor (JC) model and other models of sequence evolution account for the possibility of more than one change at the same site. In the JC model, any possible change is equally likely. It is the simplest model of sequence evolution.

Conceptually, the link is this: the JC model (and higher-complexity variations) define the probability that one sequence evolves into another. The “distance” is related to the amount of time that this would take in the model.

The Jukes-Cantor model is the simplest of these models, but they are all based on homogeneous continuous-time Markov chains (CTMC). These are beyond the scope of this course. The Wikipedia page on these models is quite good as a quick reference: https://en.wikipedia.org/wiki/Models_of_DNA_evolution.

“Markov Processes for Stochastic Modeling” by Masaaki Kijima is a math reference.

If the substitution rate is \(\mu\) substitutions per unit time, then the mean number of substitutions in time \(t\) is \(\mu t\). The CTMC has a rate matrix \(Q\) whose off-diagonal entries are the instantaneous rates of transition A to C, A to T, C to G and so on, and these rates are all equal to \(\mu/4\). The transition matrix is \(P(t) = e^{Qt}\).

Consider the probability that a change from state \(i\) to state \(j\) is observed in time \(t\), if the substitution rate is \(\mu\). We have a probability \(e^{-\mu t}\) that no event (ie no substitution) happens in time \(t\). The rate of leaving each state is \(3\mu/4\) since 3 of the 4 changes (A to A, C, T, G) arrive at a different base.

This means that the expected number of changes that occur in time \(t\) is

\[v = (3/4) \mu t.\]

\(v\) is going to be our measure of genetic distance. But we don’t necessarily see every change, because more than one change can happen at the same site. We want to relate \(v\), the expected number of changes that occur, to the number of changes we observe.

From \(v = (3/4) \mu t\), we can rearrange to get \(\mu t = (4/3) v\).

Main idea: use the relationship between (1) this expected number of changes and (2) the fraction of sites where a change is actually observed to compute \(v\), as a measure of the evolutionary distance.

The probability that character \(i\) (where we start) ends up at a different character \(j \neq i\), at a single site, in time \(t\) is given below. (If \(j = i\), we would also have to add the probability \(e^{-\mu t}\) that no event happened at all.)

\[ P(i \rightarrow j) = P( i\rightarrow j | \text{ an event happened in time t}) P(\text{an event happened in time t}) \]

\[ P(i \rightarrow j) = \tfrac{1}{4} (1-e^{-\mu t}) = \tfrac{1}{4} - \tfrac{1}{4} e^{-\mu t}\]

Here, 1/4 represents that \(j\) is 1 of the 4 characters that \(i\) could become (in the Jukes-Cantor model they are all equally likely). The \((1-e^{-\mu t})\) is the probability that any number (greater than 0) of events happened, because it’s 1 - Prob(no event happened).

This accounts for the fact that we could have 1 or more events at the same site. We could have A –> C –> G –> A. That’s why we do this transformation instead of just using the fraction of sites with a change for our genetic distance.

The probability of any change at the site is 3 times this, because there are 3 choices of base that are different from the current one:

\[ P(\text{any observed change}) = \tfrac{3}{4} - \tfrac{3}{4} e^{-\mu t} = \tfrac{3}{4} - \tfrac{3}{4} e^{-4/3 v}\]

Now let \(d_{raw}\) denote \(P(\text{any observed change})\), estimated as:

\[d_{raw} = \text{the fraction of sites where a change is observed.}\]

Solve for \(v\):

\[ d_{raw} = \tfrac{3}{4} - \tfrac{3}{4} e^{-\mu t} = \tfrac{3}{4} - \tfrac{3}{4} e^{-4/3 v}\]

\[ 1- e^{-4/3v} = \tfrac{4}{3}d_{raw}\]

\[ e^{-4/3v} = 1 - \tfrac{4}{3}d_{raw}\]

\[ v = - \frac{3}{4} \ln(1-\frac{4}{3}d_{raw}) \]

The Jukes-Cantor model’s distance is \(v\) in the above equation:

\[ d_{JC} = -\frac{3}{4}\ln(1-\frac{4}{3}d_{raw}) \]

where \(d_{raw}\) is the number of differences / sequence length.

Note that \(\ln(x) \approx x-1\) when \(x\) is near 1. So when the distance is small, \(\ln(1-\tfrac{4}{3} d_{raw}) \approx -\tfrac{4}{3} d_{raw}\). This means that when the distance is small, \(d_{JC} \approx d_{raw}\), as you’d expect.
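The correction can be checked numerically. The sketch below is not taken from any package: `jc_distance` is a name made up here, returning `Inf` when \(d_{raw} \ge 3/4\) is an assumption about how to handle saturated comparisons, and the values of `mu`, `t` and `n_sites` are arbitrary. The simulation follows the description above: events arrive at rate \(\mu\), and each event picks one of the four bases uniformly at random.

```r
# Jukes-Cantor distance from the raw fraction of differing sites.
jc_distance <- function(d_raw) {
  if (d_raw >= 3/4) return(Inf)  # assumption: the logarithm is undefined, treat as saturated
  -3/4 * log(1 - 4/3 * d_raw)
}

jc_distance(0.2)   # about 0.233, a little larger than the raw 0.2
jc_distance(0.01)  # about 0.0101, almost the same as the raw 0.01

# Simulation check with hypothetical values for the rate, time and number of sites.
set.seed(1)
mu <- 1; t <- 0.8; n_sites <- 100000
v <- 3/4 * mu * t                    # expected number of changes per site (0.6)
n_events <- rpois(n_sites, mu * t)   # number of events at each site
# After at least one event the base is uniform over the 4 bases,
# so it differs from the starting base with probability 3/4.
final_differs <- sample(4, n_sites, replace = TRUE) != 1
d_raw_sim <- mean(n_events > 0 & final_differs)

d_raw_sim               # close to 3/4 * (1 - exp(-mu * t)), about 0.413
jc_distance(d_raw_sim)  # close to v = 0.6
```

The raw fraction of changed sites (about 0.41) underestimates the expected number of changes (0.6) because of multiple hits at the same site, and the JC formula recovers it.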

Also note that we have eliminated the time, \(t\), and mutation rate \(\mu\), wrapping them together, because they combine to shape what we observe: the number of polymorphisms. There is more on timed trees later in this lecture.

There is a pdf at this link that does a nice job of explaining how the Jukes-Cantor model is related to sequence distances: https://people.montefiore.uliege.be/kvansteen/GBIO0009-1/ac20132014/T4/jc.pdf

Quick guide to a few evolutionary models:

JC: ACTG all equally frequent and all changes equally likely

K80: (Kimura 1980) ACTG all equally frequent, but transitions are more likely than transversions

F81 (Felsenstein 1981): as with JC but ACTG are not all equally frequent

HKY85: (Hasegawa, Kishino and Yano 1985) Transitions and transversions are not equally likely, and ACTG are not all equally frequent

In practice, these models are all implemented within software packages that we use to build phylogenetic trees.

5 Building phylogenetic trees: the central task of phylogenetics

5.1 Neighbour joining

Distances are defined using sequence evolution models

Idea: Join closest tips, make new node, repeat!

Details: differing clustering algorithms: Single linkage, complete linkage, UPGMA

Software: bionj, nj in R

Reference: Saitou and Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, Molecular Biology and Evolution, 1987. https://pubmed.ncbi.nlm.nih.gov/3447015/.

5.1.1 The neighbour joining algorithm:

Intuition: (1) Start with all the tips separate. (2) Create a new node that joins the two that are closest to each other. (3) Then define the distances between these two tips and the new node. (4) Then define the distances between the new node and all the other tips. (5) Forget about those two tips, but keep the new node. (6) Repeat until everything is joined up.

Suppose there are \(n\) tips and the distance between \(i\) and \(j\) is \(D_{ij}\).

To make our description more precise, we will need to know three key things. (1) How do we choose which pair to join up next? (2) When we make a new node, how far is our new node away from the two tips it is joining? (3) How far away is our new node from all the other tips?

Two definitions before we start

\[Q_{ij} = (n-2) D_{ij} - \sum_{k=1}^n D_{ik} - \sum_{k=1}^n D_{jk}\]

If \(E\) is the new node joining say \(A\) and \(B\), let

\[ D_{EA} = \frac{1}{2}D_{AB} + \frac{1}{2(n-2)}\left(\sum_{k=1}^n D_{Ak} -\sum_{k=1}^n D_{Bk} \right)\]

The NJ algorithm

Begin with a collection of sequences, unlinked. Compute the distance matrix \(D_{ij}\).

While there remain nodes that are not joined up:

1. Calculate \(Q_{ij}\) for all pairs \((i,j)\), using the current distance matrix (starting from the distance matrix on all the tips). Here \(n\) is the number of nodes currently in the distance matrix.

2. Find the pair of taxa \((A,B)\) with the lowest \(Q\) value, \(Q_{A,B}\). If there’s more than one such pair, choose one at random. The \(Q\) criterion comes from wanting to choose \(A\) and \(B\) to minimize the total branch length of the tree in the next step. See Saitou and Nei 1987: https://doi.org/10.1093/oxfordjournals.molbev.a040454.

3. Create a new node, temporarily called E. This node will join A and B.

4. Use the formula for \(D_{EA}\) above to compute the distance between E and A. Do the same for \(D_{EB}\).

5. Calculate the distance between E and every other node \(k\) in the tree (not A or B): \(D_{Ek} = \frac{1}{2}(D_{Ak} + D_{Bk} - D_{AB})\). (This is easy given the distance matrix and the result of step 4.)

6. Update the distance matrix: remove the columns and rows corresponding to A and B and add a row and column for the temporary node E. This way, the size of the distance matrix reduces by 1.

7. Repeat steps 1 - 6. Eventually there are only two nodes left to join.
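The loop above can be sketched in base R as follows. This is an illustration, not the implementation used by `nj` or `bionj`: the function `nj_step` and the 5-taxon distance matrix `D0` are made up for the example, ties are broken by taking the first minimal pair rather than a random one, and the distance from the new node to each remaining node \(k\) uses the standard update \(\frac{1}{2}(D_{Ak} + D_{Bk} - D_{AB})\).

```r
# One neighbour-joining step on a symmetric distance matrix D with row/column names.
nj_step <- function(D) {
  n <- nrow(D)
  r <- rowSums(D)                        # r[i] = sum over k of D[i, k]
  Q <- (n - 2) * D - outer(r, r, "+")    # Q[i, j] = (n - 2) D[i, j] - r[i] - r[j]
  Q[lower.tri(Q, diag = TRUE)] <- Inf    # only consider each pair i < j once
  idx <- which(Q == min(Q), arr.ind = TRUE)[1, ]  # first minimal pair (not random)
  a <- idx[[1]]; b <- idx[[2]]
  d_ea <- 0.5 * D[a, b] + (r[[a]] - r[[b]]) / (2 * (n - 2))  # new node to A
  d_eb <- D[a, b] - d_ea                                     # new node to B
  others <- setdiff(seq_len(n), c(a, b))
  d_e <- 0.5 * (D[a, others] + D[b, others] - D[a, b])       # new node to the rest
  labels <- c(rownames(D)[others], paste0("(", rownames(D)[a], ",", rownames(D)[b], ")"))
  D_new <- rbind(cbind(D[others, others, drop = FALSE], d_e), c(d_e, 0))
  dimnames(D_new) <- list(labels, labels)
  list(joined = rownames(D)[c(a, b)], lengths = c(d_ea, d_eb), D = D_new)
}

# Hypothetical distance matrix on five taxa.
D0 <- matrix(c(0,  5,  9,  9, 8,
               5,  0, 10, 10, 9,
               9, 10,  0,  8, 7,
               9, 10,  8,  0, 3,
               8,  9,  7,  3, 0),
             nrow = 5, byrow = TRUE,
             dimnames = list(letters[1:5], letters[1:5]))

step1 <- nj_step(D0)
step1$joined   # "a" "b"
step1$lengths  # 2 3

# Keep joining until only two nodes remain; their distance is the final branch.
D <- D0
while (nrow(D) > 2) {
  step <- nj_step(D)
  cat("join", step$joined, "with branch lengths", step$lengths, "\n")
  D <- step$D
}
```

Each pass shrinks the matrix by one row and column, so the loop runs \(n-2\) times for \(n\) tips.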

NJ strengths

Can use realistic substitution model and distances

Computationally efficient

No need to compare lots of trees for an optimality condition

Great for (reasonably) large datasets and short distances. (Still need an \(n\) by \(n\) distance matrix, so not for HUGE datasets).

Tree-like distances: If the distances are perfect (and reflect the true tree) the tree is guaranteed to be correct.

Weaknesses:

If the distances aren’t perfect then the tree may well be wrong.

In some circumstances the branch lengths can be negative!

The method is sensitive to gaps in the sequences.

The model does not allow for different sites to have different “matches” to the tree - they all just contribute to the distance.

Distance measures may be inaccurate for large distances.

5.2 Maximum parsimony

Based on character/sequence evolution on tree

Find the tree that minimises the number of changes you need to describe each character

Here, the tree on the right requires 4 changes. The one on the left only needs 2. So the one on the left is more parsimonious.

Strengths: Simple to score and understand; Computationally efficient (because it’s simple)

Weaknesses: Lack of explicit assumptions (eg hypervariable sites? Sites with many changes?); No branch length or timing information; Not statistically consistent

Software: PAUP, MEGA, TNT
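Counting the changes a tree needs for one character can be done with Fitch's algorithm, a standard method that is not described in these notes. The sketch below is a minimal base R version for rooted binary trees; the nested-list tree encoding, the tip names and the character states are all made up for illustration.

```r
# Fitch's algorithm for one character on a rooted binary tree.
# A tree is either a tip name (a character string) or a list of two subtrees.
fitch <- function(tree, states) {
  if (is.character(tree)) {
    return(list(set = states[[tree]], changes = 0))
  }
  left <- fitch(tree[[1]], states)
  right <- fitch(tree[[2]], states)
  common <- intersect(left$set, right$set)
  if (length(common) > 0) {
    # The children can agree on a state: no extra change is needed here.
    list(set = common, changes = left$changes + right$changes)
  } else {
    # The children disagree: count one change and keep both options open.
    list(set = union(left$set, right$set), changes = left$changes + right$changes + 1)
  }
}

states <- c(A = "a", B = "a", C = "c", D = "c")   # hypothetical character states
tree1 <- list(list("A", "B"), list("C", "D"))     # ((A,B),(C,D))
tree2 <- list(list("A", "C"), list("B", "D"))     # ((A,C),(B,D))

fitch(tree1, states)$changes  # 1
fitch(tree2, states)$changes  # 2
```

For this character, `tree1` is the more parsimonious tree. The parsimony score of a tree for a whole alignment is the sum of these counts over all sites.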

5.3 Maximum likelihood

Likelihood here: the likelihood of the observed sequence data, given the parameters of the evolutionary model \(M\), and a tree \(T\). Denoted \(L(D | M, T)\).

These are computed using the same continuous time Markov chains described above: Jukes-Cantor and so on.

These CTMCs give the probability that (for example) an A evolved from a C on a branch of length \(t\), \(P_{a,c}(t)\).

For large samples: max likelihood estimates of the parameters in the evolutionary models (MLEs) are unbiased, consistent and efficient

5.3.1 Felsenstein’s pruning algorithm - a quick description by example

Since we don’t know the states at the internal nodes, we have to add up their probabilities to get the total: the states could have been A or C or T or G. The probability of “event 1 OR event 2”, if they are mutually exclusive, is probability(event 1) \(+\) probability(event 2).

The Felsenstein pruning algorithm makes it feasible to sum over the unknown states at the internal nodes.

For example, the likelihood on the little demo tree for site 1 is the likelihood that the root (node 1) is in state \(s_1\), that it evolves from there to the state observed at taxon C (here ‘a’) over branch length \(v_c\), and that \(s_1\) evolves to \(s_2\) (the state at node 2) over branch length \(v_{12}\). Also, node 2 has to evolve from state \(s_2\) to state ‘a’ at site 1, on the branch between node 2 and taxon A. And so on.

Just as \(P_{ij}(t)\) was the probability that \(i \rightarrow j\) in time \(t\) in the JC model, here, define \(P_{s_1, s_2}(v)\) to be the probability that state \(s_1\) transitions to state \(s_2\) on a branch of length \(v\). Let \(\pi_{s_1}\) be the probability of state \(s_1\); in the Jukes-Cantor model \(\pi_{s_1}=1/4\) for all four states.

Since we don’t know the sequences at the internal nodes ‘node 1’ and ‘node 2’, they could be any of the 4 bases.

\[ L(\text{site 1}) = \sum_{s_1\in \{a,c,t,g\}}\sum_{s_2\in \{a,c,t,g\}} \pi_{s_1} P_{s_1, a}(v_c) P_{s_1, s_2}(v_{12}) P_{s_2,a}(v_a) P_{s_2,a}(v_b) \]

The key observation in Felsenstein’s pruning algorithm is that we can group these sums together:

\[ L(\text{site 1}) = \sum_{s_1\in \{a,c,t,g\}}\pi_{s_1} P_{s_1, a}(v_c)\sum_{s_2\in \{a,c,t,g\}} P_{s_1, s_2}(v_{12}) P_{s_2,a}(v_a) P_{s_2,a}(v_b) \]

We can do this for every single site (in this example, there are 4 sites) to get the tree likelihood:

\[ L(\text{data}|\text{tree}) = \prod_j L(\text{site j}) \]
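The grouping of sums can be checked numerically for one site. The sketch below assumes the Jukes-Cantor model with branch lengths measured in expected substitutions per site, a three-taxon tree in which the root joins taxon C and node 2, node 2 joins taxa A and B, and all three taxa show ‘a’ at the site. The branch lengths and the helper `jc_prob` are hypothetical.

```r
# JC transition probability on a branch of length v (expected substitutions per site).
jc_prob <- function(from, to, v) {
  if (from == to) 1/4 + 3/4 * exp(-4 * v / 3) else 1/4 - 1/4 * exp(-4 * v / 3)
}

bases <- c("a", "c", "t", "g")
pi_s <- 1/4                                        # JC: all bases equally frequent
v_a <- 0.1; v_b <- 0.2; v_c <- 0.3; v_12 <- 0.05   # hypothetical branch lengths

# Direct double sum over the unknown states at node 1 (s1) and node 2 (s2).
brute <- 0
for (s1 in bases) {
  for (s2 in bases) {
    brute <- brute + pi_s * jc_prob(s1, "a", v_c) * jc_prob(s1, s2, v_12) *
      jc_prob(s2, "a", v_a) * jc_prob(s2, "a", v_b)
  }
}

# Pruning: first store the partial likelihood at node 2 for each possible state,
# then reuse those four numbers when summing over the root state.
L2 <- sapply(bases, function(s2) jc_prob(s2, "a", v_a) * jc_prob(s2, "a", v_b))
L1 <- sapply(bases, function(s1) {
  jc_prob(s1, "a", v_c) * sum(sapply(bases, function(s2) jc_prob(s1, s2, v_12) * L2[[s2]]))
})
pruned <- sum(pi_s * L1)

all.equal(brute, pruned)  # TRUE: both orderings give the same site likelihood
```

On a tree this small the saving is modest, but on large trees storing partial likelihoods at each internal node avoids summing over every combination of internal states. In practice the per-site likelihoods are combined on the log scale, as a sum of logs rather than a product.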

Statistically, the topology is a model and the rates of substitution and so on are parameters. One of the huge triumphs of maximum likelihood methods is the ability to estimate these parameters by computing the above likelihood and maximizing it. This requires searching over the space of possible trees, which is BIG.

Central assumptions used in defining the likelihood:

Independent evolution at different sites

Independent evolution in different branches

\[L = \prod_{i \in sites} L_i(D_i |M, T)\]

where \(D_i\) is the data for the characters at site \(i\).

Real datasets may break both assumptions! So be careful

See Felsenstein 1981, Felsenstein book: Inferring phylogenies

Software: RAxML, PhyML, R (phangorn), IQ-TREE, FastTree

Strengths: rich repertoire of sophisticated models. Can estimate parameters of underlying models. This means we can estimate the transition/transversion ratio, etc.

Remember the demo tree of organisms? ML methods allow us to estimate what the sequence (or phenotype, if we use phenotype data) was like at internal points in the tree.

Weaknesses: Computational cost. Cannot capture uncertainty. The space of trees is huge. What if there are many trees that match the data quite well?

5.4 Bayesian methods

Bayes’ theorem: \(P(M, T|D) = \frac{P(D|M, T) P(M) P(T)}{P(D)}\) where M: model, D: data, T: tree.

\(P(D|M, T)\) is the same as the likelihood (in the ML)

\(P(M), P(T)\) : prior information about the model and the tree. Some people would see the tree as part of the model.

Use MCMC to generate a sample from the posterior

Software: BEAST, MrBayes, RevBayes. In our field people mainly use BEAST and BEAST2.
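To see how Bayes’ theorem combines likelihood and prior, here is a toy base R sketch over just three candidate trees. The log-likelihood values are invented, the prior is assumed uniform, and the model is held fixed; real analyses cannot enumerate trees like this, which is why MCMC is used.

```r
# Invented log-likelihoods log P(D | M, T) for three candidate trees.
log_lik <- c(tree1 = -1000, tree2 = -1002, tree3 = -1005)
prior <- c(tree1 = 1/3, tree2 = 1/3, tree3 = 1/3)   # assumed uniform prior P(T)

# Posterior is proportional to likelihood * prior. Work on the log scale and
# subtract the maximum before exponentiating to avoid numerical underflow.
log_post <- log_lik + log(prior)
posterior <- exp(log_post - max(log_post))
posterior <- posterior / sum(posterior)   # dividing by the sum plays the role of P(D)

round(posterior, 3)
```

With these numbers the first tree gets most of the posterior probability, but the second still has appreciable support, which is the kind of uncertainty a posterior sample of trees makes visible.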

Strengths:

Clear re: uncertainty. The posterior collection of trees includes trees supported by the data, with tree topologies more often included if they are better supported.

There are many flexible models of evolution (as in ML)

Parameter estimation (as in ML).

Direct construction of timed phylogenetic trees. (No reason this couldn’t be done in the ML world, but it isn’t)

Bayesian approaches take prior knowledge into account. This means that there can be priors for the tree. That, with time, means that there is a link to many many models that connect with epidemiological ideas: birth-death, multi-type birth-death, coalescent skyline (population size over time), and more

See module on “phylodynamics”: many ways to run BEAST and estimate parameters for simple time dynamics.

Weaknesses

Posterior probabilities can appear too high: why? Sensitive to model violations and wrong priors.

Prior knowledge hard to obtain; people use defaults in software but innocent-looking priors can really shape the posterior

Computationally slow: intractable past a few hundred sequences

6 Tree quality

Three concepts:

Consistency: approaches the true value as the amount of data grows. Parsimony may not be consistent. MLEs of parameters usually are, at least as the number of sites grows.

Efficiency: unbiased estimate with lowest variance. Usually people look at the probability of recovering the correct tree as the number of sites increases.

Robustness: correct answer even if assumptions are violated?

You will probably also consider speed! Broadly: ML beats parsimony for molecular data and is more interpretable, more connected to biological processes. So use of parsimony has declined.

7 Bootstrapping

This is typically used in maximum likelihood tree reconstruction to explore uncertainty. In Bayesian reconstruction, that’s handled by sampling from the posterior, so bootstrapping is not needed.

In bootstrapping, we resample columns of the data with replacement (so some are left out, others repeated)

Then repeat the tree reconstruction

Perform this many times

Bootstrap support of an edge: how many times does its clade occur among the bootstrap trees?
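The resampling step can be sketched in base R. Everything here is a toy: the three-taxon alignment is invented, and `closest_pair` is only a stand-in for a real tree reconstruction method (for three taxa, the closest pair by raw distance plays the role of the clade).

```r
set.seed(42)  # make the resampling reproducible

# Toy alignment: rows are taxa, columns are sites (hypothetical data).
seqs <- c(A = "AACCTATCGGAT", B = "AACCTATCGGTT", C = "AATCTAACGGTA")
aln <- do.call(rbind, strsplit(seqs, ""))

# Resample alignment columns with replacement, keeping the alignment width.
bootstrap_alignment <- function(aln) {
  aln[, sample(ncol(aln), replace = TRUE), drop = FALSE]
}

# Stand-in for tree reconstruction: report the closest pair by raw distance.
closest_pair <- function(aln) {
  pairs <- combn(rownames(aln), 2)
  d <- apply(pairs, 2, function(p) mean(aln[p[1], ] != aln[p[2], ]))
  paste(pairs[, which.min(d)], collapse = "+")
}

closest_pair(aln)  # "A+B" on the original data

# Repeat the reconstruction on many bootstrap replicates and count
# how often the original grouping is recovered.
reps <- replicate(200, closest_pair(bootstrap_alignment(aln)))
support <- mean(reps == closest_pair(aln))
support
```

With a real method, `closest_pair` would be replaced by a full tree reconstruction, and support would be tallied for every clade in the original tree.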

Bootstrap numbers don’t have a clear statistical interpretation. But bootstrapping is important to understand tree uncertainty.

Not so good if sites aren’t independent. Why? –because you get similar data on different bootstraps, more than you would if the sites were actually independent. So nodes can have “high bootstrap values” (ie a split in a tree occurs often) not because the tree is a particularly great model but because the bootstrap resampling isn’t doing what it would be doing if the independence condition were met.

8 Timed trees

In our maximum likelihood section, we used the continuous time Markov process to define the likelihood that sequences evolved on a tree, given the parameters of the evolutionary model.

By default, tree reconstruction methods create trees whose branch lengths are in units of substitutions per site (i.e. genetic distance), not calendar time.

But for transmission reconstruction, we need to relate our genomic data to calendar time, not just genetic distance, because our epidemiological data is in calendar time: when individuals became symptomatic, when tests were done, when we think individuals were infected, when they were exposed.

Some of our transmission reconstruction methods use phylogenetic trees whose branch lengths are in units of calendar time.

So, we need to be able to construct these timed phylogenetic trees.

Two approaches:

(1) Make a maximum-likelihood tree in units of genetic distance. Then, keeping the tree topology the same, modify the branch lengths, so that (a) The distance from the root to each tip is consistent with that tip’s observed date; (b) The likelihood of the branch’s rates and node times, given the observed numbers of polymorphisms on the branch, is maximized. The observed changes on the branches also have to be inferred (standard methods, outside the scope here).

(2) Use Bayesian methods, simultaneously constructing the tree and its timing (BEAST, for example).

Here, I’ll briefly describe (1), with reference to the ‘treedater’ R package and the corresponding paper: Volz and Frost, Scalable relaxed clock phylogenetic dating, Virus Evolution, 2017. https://academic.oup.com/ve/article/3/2/vex025/4100592

What’s a molecular clock, and what’s a relaxed (molecular) clock?

Molecular clock: a model that describes how mutations accumulate over time. In a “strict molecular clock”, mutations accumulate at a consistent, or “clock-like,” rate over time. In other words, the mutation rate is (mostly) constant, though in practice it could vary over branches or sites.

Nice resource (and source for this image) : https://help.czgenepi.org/hc/en-us/articles/6238483054868-Module-4-Estimating-evolutionary-rates-molecular-clocks-from-sequence-data

Over time, genetic changes accumulate that distinguish viruses (and groups of viruses) from their ancestors and each other.

The rate of genetic change can vary: natural selection, population size, generation time, and environmental pressures impact it.

“Relaxed” molecular clock: a model that allows for different rates of molecular evolution in different lineages (here: branches in the phylogeny).

Summary of the treedater method:

Ingredients

Likelihood for the branch-specific rates of evolution \(\lambda_i\) (in subst per site) given the node times \(t_i\) and parameters \(r, \phi\) for the relevant Gamma distribution.

Likelihood for \((r, \phi)\) given the node times

Least squares to minimize: try to get the branch lengths \(b_i\) (in subst per site, in the original max likelihood tree) to be close to what you’d expect according to the clock: minimize sum of \(\frac{1}{\sigma_i} (b_i - \omega_i \tau_i)^2\), where \(\sigma_i\) is a variance term, \(\omega_i\) are the rates of evolution in subst/site/time and \(\tau_i\) are the branch lengths in units of time - in the new proposed timed tree.
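The least-squares ingredient can be written out directly. This sketch only evaluates the objective for given inputs; it is not the treedater API, the function name `clock_least_squares` is made up, and all numbers are hypothetical.

```r
# Weighted least-squares mismatch between the original branch lengths b
# (subst per site) and the clock prediction omega * tau
# (rate in subst/site/time multiplied by branch duration in time units).
clock_least_squares <- function(b, omega, tau, sigma) {
  sum((b - omega * tau)^2 / sigma)
}

b     <- c(0.010, 0.020, 0.005)   # hypothetical branch lengths in the ML tree
tau   <- c(10, 15, 5)             # hypothetical proposed branch durations
omega <- rep(0.001, 3)            # hypothetical rates (the same on every branch)
sigma <- rep(1, 3)                # hypothetical variance terms

clock_least_squares(b, omega, tau, sigma)  # only the second branch contributes
```

A proposed timed tree whose durations match the clock exactly (`b` equal to `omega * tau`) gives an objective of zero; the optimisation adjusts node times and rates to move towards that.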

With these ingredients, treedater is an algorithm to simultaneously optimize the \(\omega_i\), \(r, \phi\) and node times (iteratively). The algorithm (below) is only visible in the pdf version of the paper, not the online version (!). Also note the typo in Eq (2) (some of the \(s_i\) should be \(\lambda_i\)).

(Taken from the PDF version of the treedater paper at https://academic.oup.com/ve/article/3/2/vex025/4100592)

If not all nodes have known times, treedater can estimate the missing times. This is useful if some of the dates have errors or are missing.

There is more! But in summary: treedater takes in a phylogeny with branches in units of substitutions per site, together with dates for the tips (perhaps not all). It returns a timed phylogenetic tree.

You’ll be able to play with phylogenies and make timed phylogenies with treedater in the exercise.

9 Resources

Yang Z, Rannala B. Molecular phylogenetics: principles and practice. Nat Rev Genet. 2012;13: 303–314. doi:10.1038/nrg3186

Felsenstein book: Inferring Phylogenies https://www.amazon.co.uk/Inferring-Phylogenies-Joseph-Felsenstein/dp/0878931775

EBI course: http://www.ebi.ac.uk/training/online/course/introduction-phylogenetics

Neighbour joining revealed (Gascuel and Steel 2006) https://academic.oup.com/mbe/article/23/11/1997/1322446

Many other online resources are freely available

Introduction to Phylogenetics

Caroline Colijn