Phylogenetic Inference Using RevBayes

Sebastian Höhna (1), Michael J. Landis (2), Tracy A. Heath (3)

(1) Department of Statistics, University of California, Berkeley
(2) Department of Ecology & Evolutionary Biology, Yale University, New Haven, Connecticut
(3) Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, Iowa
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 6.16
DOI:  10.1002/cpbi.22
Online Posting Date:  May, 2017


Bayesian phylogenetic inference aims to estimate the evolutionary relationships among lineages (species, populations, gene families, viral strains, etc.) within a model‐based statistical framework that uses the likelihood function to estimate parameters. In recent years, evolutionary models for Bayesian analysis have grown in number and complexity. RevBayes uses a probabilistic graphical model framework and an interactive scripting language for model specification to accommodate and exploit this diversity and complexity within a single software package. In this unit, we describe how to specify standard phylogenetic models and perform Bayesian phylogenetic analyses in RevBayes. The protocols focus on the basic analysis of inferring a phylogeny from single and multiple loci, describe a hypothesis‐testing approach, and point to advanced topics. This unit therefore serves as a starting point for exploring the power and potential of Bayesian inference under complex phylogenetic models in RevBayes. © 2017 by John Wiley & Sons, Inc.

Keywords: Bayesian phylogenetics; Markov chain Monte Carlo; posterior probabilities; probabilistic graphical models; substitution model
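The core ingredients named above are a likelihood under a substitution model, a prior, and Markov chain Monte Carlo sampling of the posterior. They can be illustrated on the smallest possible case: estimating the Jukes–Cantor (JC69) distance d between two aligned sequences. The sketch below is plain Python, not RevBayes code. The Exponential(10) prior on d, the multiplier (scale) proposal, the toy data (20 differences in 100 sites), and all function names are illustrative assumptions, not the models or moves used in the protocols.

```python
import math
import random


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d (expected substitutions per site) under JC69
    for two aligned sequences that differ at n_diff of n_sites sites.
    The constant factor 1/4 per site (equal base frequencies) is dropped."""
    if d <= 0.0:
        return float("-inf")
    one_minus_e = -math.expm1(-4.0 * d / 3.0)  # 1 - exp(-4d/3), accurate for small d
    p_same = 1.0 - 0.75 * one_minus_e  # P(tip 2 shows the same base as tip 1)
    p_diff = 0.25 * one_minus_e        # P(tip 2 shows one particular other base)
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


def log_prior(d, rate=10.0):
    """Assumed Exponential(rate) prior on d (prior mean 1/rate)."""
    if d <= 0.0:
        return float("-inf")
    return math.log(rate) - rate * d


def mcmc_jc69(n_sites, n_diff, n_iter=20000, tuning=1.0, seed=1):
    """Metropolis-Hastings sampler for d using a multiplier (scale) proposal."""
    rng = random.Random(seed)
    d = 0.1  # arbitrary starting value
    log_post = jc69_log_likelihood(d, n_sites, n_diff) + log_prior(d)
    samples = []
    for _ in range(n_iter):
        m = math.exp(tuning * (rng.random() - 0.5))  # multiplier in [e^-t/2, e^t/2]
        d_new = d * m
        log_post_new = jc69_log_likelihood(d_new, n_sites, n_diff) + log_prior(d_new)
        # Acceptance ratio = posterior ratio x Hastings ratio (equal to m for a scale move).
        log_ratio = log_post_new - log_post + math.log(m)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            d, log_post = d_new, log_post_new
        samples.append(d)
    return samples


if __name__ == "__main__":
    # Toy data: 20 differing sites out of 100; the first 2000 draws are discarded as burn-in.
    draws = mcmc_jc69(n_sites=100, n_diff=20)[2000:]
    print("posterior mean of d:", sum(draws) / len(draws))
```

A multiplier proposal keeps d strictly positive without a reflection step. Its Hastings ratio is simply the multiplier m, which is why the sampler adds log m to the log posterior ratio.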


Table of Contents

  • Introduction
  • Basic Protocol 1: Estimating Phylogeny (Topology and Branch Lengths)
  • Basic Protocol 2: Partitioned Data Analysis
  • Basic Protocol 3: Model Comparison Using Bayes Factors
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
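Basic Protocol 3 compares models using Bayes factors. The arithmetic that turns two estimated marginal likelihoods into a Bayes factor is small enough to sketch in plain Python. This is not RevBayes code. The function names are hypothetical. The evidence categories follow the Kass and Raftery (1995) scale on 2 ln BF, and the posterior-probability helper assumes equal prior probabilities for the two models. The marginal likelihoods themselves must first be estimated separately, for example by stepping-stone or path sampling.

```python
import math


def two_ln_bayes_factor(log_ml_0, log_ml_1):
    """2 ln BF for model M0 over M1, from natural-log marginal likelihoods."""
    return 2.0 * (log_ml_0 - log_ml_1)


def kass_raftery_evidence(two_ln_bf):
    """Return (favoured model, strength) on the Kass & Raftery (1995) 2 ln BF scale."""
    favoured = "M0" if two_ln_bf >= 0.0 else "M1"
    x = abs(two_ln_bf)
    if x < 2.0:
        strength = "not worth more than a bare mention"
    elif x < 6.0:
        strength = "positive"
    elif x < 10.0:
        strength = "strong"
    else:
        strength = "very strong"
    return favoured, strength


def posterior_prob_m0(log_ml_0, log_ml_1):
    """P(M0 | data), assuming equal prior probabilities for M0 and M1.
    Computed as a numerically stable logistic function of ln BF."""
    ln_bf = log_ml_0 - log_ml_1
    if ln_bf >= 0.0:
        return 1.0 / (1.0 + math.exp(-ln_bf))
    z = math.exp(ln_bf)  # avoids overflow when ln_bf is very negative
    return z / (1.0 + z)
```

Because marginal likelihoods are reported on the log scale, the Bayes factor is computed as a difference of logs rather than a ratio of raw probabilities, which would underflow for realistic alignments.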


Basic Protocol 1: Estimating Phylogeny (Topology and Branch Lengths)

  Necessary Resources
  • All of the necessary resources for this tutorial are described above in protocol 1. All data files and analysis scripts are available for download from the RevBayes Web site.


