Getting a Tree Fast: Neighbor Joining, FastME, and Distance‐Based Methods

Richard Desper1, Olivier Gascuel2

1 Department of Biology, University College, London, United Kingdom, 2 Equipe “Méthodes et Algorithmes pour la, Bioinformatique”, LRMM‐CNRS, Montpellier
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 6.3
DOI:  10.1002/0471250953.bi0603s15
Online Posting Date:  October, 2006
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


Neighbor Joining (NJ), FastME, and other distance‐based programs including BIONJ, WEIGHBOR, and (to some extent) FITCH, are fast methods to build phylogenetic trees. This makes them particularly effective for large‐scale studies or for bootstrap analysis, which require runs on multiple data sets. Like maximum likelihood methods, distance methods are based on a sequence evolution model that is used to estimate the matrix of pairwise evolutionary distances. Computer simulations indicate that the topological accuracy of FastME is best, followed by FITCH, WEIGHBOR, and BIONJ, while NJ is worse. Moreover, FastME is even faster than NJ with large data sets. Best‐distance methods are equivalent to parsimony in most cases, but become more accurate when the molecular clock is strongly violated or in the presence of long (e.g., outgroup) branches. This unit describes how to use distance‐based methods, focusing on NJ (the most popular) and FastME (the most efficient today). It also describes how to estimate evolutionary distances from DNA and proteins, how to perform bootstrap analysis, and how to use CLUSTAL to compute both a sequence alignment and a phylogenetic tree.

Keywords: Phylogenetic trees; evolutionary distances; distance‐based methods; NJ; FastME

PDF or HTML at Wiley Online Library

Table of Contents

  • Basic Protocol 1: Using the NEIGHBOR Program from the PHYLIP Package to Construct a Phylogenetic Tree
  • Support Protocol 1: Distance Matrix Estimation from DNA (or RNA) Sequences Using DNADIST
  • Support Protocol 2: Distance Matrix Estimation from Proteins Using PROTDIST
  • Support Protocol 3: Bootstrapping Using SEQBOOT and CONSENSE
  • Alternate Protocol 1: Using BIONJ, WEIGHBOR, or FITCH to Construct a Tree
  • Alternate Protocol 2: Using FastME to Construct a Tree
  • Alternate Protocol 3: Computing NJ Trees Using Clustal
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
PDF or HTML at Wiley Online Library


PDF or HTML at Wiley Online Library



Literature Cited

   Atteson, K. 1997. The performance of the NJ method of phylogeny reconstruction. In Mathematical Hierarchies and Biology (B. Mirkin, F.R. McMorris, F.S. Roberts, and A. Rzhetsky, eds.) pp.133‐148. American Mathematical Society, Providence, R.I.
   Berry, V. and Gascuel, O. 1996. Interpretation of bootstrap trees: Threshold of clade selection and induced gain. Mol. Biol. Evol. 13:999‐1011.
   Bruno, W.J., Socci, N.D., and Halpern, A.L. 2000. Weighted neighbor joining: A likelihood‐based approach to distance‐based phylogeny reconstruction. Mol. Biol. Evol. 17:189‐197.
   Bulmer, M. 1991. Use of the method of generalized least squares in reconstructing phylogenies from sequence data. Mol. Biol. Evol. 8:868‐883.
   Dayhoff, M.O., Schwartz, R.M., and Orcutt, B.C. 1979. A model for evolutionary change in proteins. In Atlas of Protein Sequence and Structure (M.O. Dayhoff, ed.), vol. 5, pp. 345‐352.
   Desper, R. and Gascuel, O. 2002. Fast and accurate phylogeny reconstruction algorithms based on the minimum‐evolution principle. J. Comp. Biol. 9:687‐705.
   Desper, R. and Gascuel, O. 2004. Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least‐squares tree fitting. Mol. Biol. Evol. 21:587‐598.
   Desper, R. and Gascuel, O. 2005. The minimum evolution distance‐based approach to phylogenetic inference. In Mathematics of Evolution and Phylogeny (O. Gascuel, ed.) pp. 1‐32. Oxford University Press, Oxford.
   Felsenstein, J. 1989. PHYLIP–Phylogeny inference package (version 3.2). Cladistics 5:164‐166.
   Felsenstein, J. 1997. An alternating least‐squares approach to inferring phylogenies from pairwise distances. Syst. Biol. 46:101‐111.
   Felsenstein, J. and Churchill, G.A. 1996. A hidden Markov model approach to variation among sites in rate of evolution Mol. Biol. Evol. 13:93‐104
   Fitch, W.M. and Margoliash, E. 1967. Construction of phylogenetic trees. Science 155:279‐284.
   Galtier, N. 2001. Maximum‐likelihood phylogenetic analysis under a covarion‐like model. Mol. Biol. Evol. 18:866‐873.
   Gascuel, O. 1997a. BIONJ: An improved version of the NJ algorithm based on a simple model of sequence data. Mol. Biol. Evol. 14:685‐695.
   Gascuel, O. 1997b. Concerning the NJ algorithm and its unweighted version, UNJ. Mathematical Hierarchies and Biology (B. Mirkin, F.R. McMorris, F.S. Roberts, and A. Rzhetsky, eds.) pp. 149‐170. American Mathematical Society, Providence, R.I.
   Graur, D. and Li, W.‐H. 2000. Fundamentals of Molecular Evolution. Sinauer Associates, Sunderland, Mass.
   Guindon, S. and Gascuel, O. 2002. Efficient biased estimation of evolutionary distances when substitution rates vary across sites. Mol. Biol. Evol. 19:534‐543.
   Guindon, S. and Gascuel, O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696‐704.
   Howe, K., Bateman, A., and Durbin, R. 2002. QuickTree: Building huge Neighbour‐Joining trees of protein sequences. Bioinformatics 18:1546‐1547.
   Jin, L. and Nei, M. 1990. Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol. Biol. Evol. 7:82‐102.
   Jones, D.T., Taylor, W.R., and Thornton, J.M. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275‐82.
   Jukes, T.H. and Cantor, C.R. 1969. Evolution of protein molecules. Mammalian Protein Metabolism (H. N. Munro, ed.) pp.21‐132. Academic Press, New York.
   Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111‐120.
   Kimura, M. 1983. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, U.K.
   Kishino, H. and Hasegawa, M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea J. Mol. Evol. 29:170‐179.
   Nei, M. and Jin, L. 1989. Variances of the average numbers of nucleotide substitutions within and between populations. Mol. Biol. Evol. 6:290‐300.
   Page, R.D.M. and Holmes, E.C. 1998. Molecular Evolution: A Phylogenetic Approach. Blackwell Scientific, Oxford, U.K.
   Pauplin, Y. 2000. Direct calculation of a tree length using a distance matrix. J. Mol. Evol. 51:41‐47.
   Perrière, G. and Gouy, M. 1996. WWW‐Query: An on‐line retrieval system for biological sequence banks. Biochimie 78:364‐369.
   Rzhetsky, A. and Nei, M. 1993. Theoretical foundation of the minimum‐evolution method of phylogenetic inference. Mol. Biol. Evol. 10:1073‐1095.
   Saitou, N. and Nei, M. 1987. The neighbor‐joining method: A new method for reconstruction of phylogenetic trees. Mol. Biol. Evol. 4:406‐425.
   Sattath, S. and Tversky, A. 1977. Additive similarity trees. Psychometrika 42:319‐345.
   Sokal, R.R. and Michener, C.D. 1958. A statistical method for evaluating systematic relationships. Univ. Kans. Sci. Bull. 38:1409‐1438.
   Stamatakis, A., Ludwig, T., and Meier, H. 2004. RAxML‐III: A fast program for maximum likelihood‐based inference of large phylogenetic trees. Bioinformatics 21:456‐463.
   Steel, M.A. 1994. Recovering a tree from the Markov leaf colourations it generates under a Markov model. Appl. Math. Lett. 7:19‐23.
   Studier, J.A. and Keppler, K.J. 1988. A note on the neighbor‐joining algorithm of Saitou and Nei. Mol. Biol. Evol. 5:729‐31.
   Swofford, D.L., Olsen, G.L., Waddell, P.J., and Hillis, D.M. 1996. Phylogenetic inference. In Molecular Systematics (D.M. Hillis, C. Moritz, and B.K. Mable, eds.) pp. 407‐514. Sinauer Associates, Sunderland, Mass.
   Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and Higgins, D.G. 1997. The ClustalX windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876‐4882.
   Veerassamy, S., Smith, A., and Tillier, E.R. 2003. A transition probability model for amino acid substitutions from blocks. J. Comput Biol. 10:997‐1010.
   Vinh, L.S. and von Haeseler, A. 2005. Shortest triplet clustering: Reconstructing large phylogenies using representative sets. BMC Bionfirmatics 6:92.
   Xia, X. and Xie, Z. 2001. DAMBE: Data analysis in molecular biology and evolution. J. Hered. 92:371‐373.
   Yang, Z. 1996. Among‐site rate variation and its impact on phylogenetic analyses. TREE 11:367‐372.
   Zaretskii, K. 1965. Reconstructing a tree from the distances between its leaves. Uspehi Mathematicheskikh Nauk 20:90‐92 (in Russian).
Internet Resources
  This web page from the authors provides FastME C source code and binaries for Windows, MAC OS and LINUX, as well as several papers to understand in depth the minimum evolution principle, its algorithms and its properties.
  This web page provides PHYML binaries for Windows, MAC OS and LINUX, and a web server to run PHYML online.
  Goloboff, P., Farris, S., and Nixon, K. 2000. TNT: Tree analysis using new technology. Beta version, published by the authors, Tucumán, Argentina.
  Joe Felsenstein's Web page, containing an extensive list of phylogeny software programs, including numerous distance‐based methods.
PDF or HTML at Wiley Online Library