Phylogenetic Analysis of Protein Sequence Data Using the Randomized Axelerated Maximum Likelihood (RAXML) Program

Antonis Rokas1

1 Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee
Publication Name: Current Protocols in Molecular Biology
Unit Number: Unit 19.11
DOI: 10.1002/0471142727.mb1911s96
Online Posting Date: October, 2011


Phylogenetic analysis is the study of evolutionary relationships among molecules, phenotypes, and organisms. In the context of protein sequence data, phylogenetic analysis is one of the cornerstones of comparative sequence analysis and has many applications in the study of protein evolution and function. This unit provides a brief review of the principles of phylogenetic analysis and describes several different standard phylogenetic analyses of protein sequence data using the RAXML (Randomized Axelerated Maximum Likelihood) Program. Curr. Protoc. Mol. Biol. 96:19.11.1‐19.11.14. © 2011 by John Wiley & Sons, Inc.

Keywords: molecular evolution; bootstrap; multiple sequence alignment; amino acid substitution matrix; evolutionary relationship; systematics


Table of Contents

  • Introduction
  • Brief Introduction to Phylogenetic Analysis
  • Phylogenetic Analysis Using the RAXML Program
  • Concluding Remarks
  • Literature Cited
  • Figures
