What If I Don't Have a Tree?: Split Decomposition and Related Models

Daniel H. Huson¹

¹ Center for Bioinformatics Tübingen, Tübingen University, Tübingen, Germany
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 6.7
DOI:  10.1002/0471250953.bi0607s01
Online Posting Date:  May, 2003
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

A set of aligned character sequences or a matrix of evolutionary distances often contains a number of different and sometimes conflicting phylogenetic signals and thus does not always support a unique tree. The method of split decomposition addresses this problem. For ideal data, this method gives rise to a phylogenetic tree, whereas less ideal data are represented by a tree-like network that may indicate evidence of different and conflicting phylogenies. The SplitsTree program, described here, implements this approach and can be used to compute and visualize phylogenetic networks called splits graphs. It also implements a number of distance transformations, the computation of parsimony splits, spectral analysis, and bootstrapping.
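To make the idea behind split decomposition concrete, the following Python sketch computes the isolation index of a split as defined by Bandelt and Dress (1992a). It is not SplitsTree's algorithm: it simply enumerates every split of the taxa by brute force, which is feasible only for a handful of taxa, and all function names are hypothetical. A split A|B is kept (a "d-split") when its isolation index is positive. For distances that fit a tree exactly, the d-splits are the edges of that tree and the isolation indices are the edge lengths; for less ideal data, incompatible splits can also receive positive weight, which is what the splits graph then displays.

```python
from itertools import combinations, product


def distance_dict(taxa, pairs):
    """Build a symmetric nested-dict distance from {(x, y): distance} pairs."""
    d = {x: {y: 0.0 for y in taxa} for x in taxa}
    for (x, y), value in pairs.items():
        d[x][y] = d[y][x] = value
    return d


def isolation_index(d, A, B):
    """Bandelt-Dress isolation index of the split A|B.

    Minimum over a, a' in A and b, b' in B (not necessarily distinct) of
    max{d(a,b)+d(a',b'), d(a,b')+d(a',b), d(a,a')+d(b,b')} - d(a,a') - d(b,b'),
    divided by two.
    """
    smallest = None
    for a, a2 in product(A, repeat=2):
        for b, b2 in product(B, repeat=2):
            largest = max(d[a][b] + d[a2][b2],
                          d[a][b2] + d[a2][b],
                          d[a][a2] + d[b][b2])
            value = largest - d[a][a2] - d[b][b2]
            if smallest is None or value < smallest:
                smallest = value
    return smallest / 2


def d_splits(taxa, d, eps=1e-9):
    """Return (A, B, alpha) for every split with isolation index > eps.

    Brute force: exponential in the number of taxa, for small examples only.
    """
    taxa = list(taxa)
    first, rest = taxa[0], taxa[1:]
    found = []
    # Keep the first taxon on side A so that each split is listed exactly once.
    for k in range(len(rest)):
        for others in combinations(rest, k):
            A = [first, *others]
            B = [t for t in rest if t not in others]
            alpha = isolation_index(d, A, B)
            if alpha > eps:
                found.append((frozenset(A), frozenset(B), alpha))
    return found


if __name__ == "__main__":
    # Toy tree ((A,B),(C,D)): pendant edges of length 1, internal edge of length 2.
    taxa = ["A", "B", "C", "D"]
    d = distance_dict(taxa, {("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 4,
                             ("A", "D"): 4, ("B", "C"): 4, ("B", "D"): 4})
    for A, B, alpha in d_splits(taxa, d):
        print(sorted(A), "|", sorted(B), alpha)
```

On this toy tree metric the sketch reports the four pendant splits with index 1 and the split {A,B}|{C,D} with index 2, while the two splits that contradict the tree, {A,C}|{B,D} and {A,D}|{B,C}, both receive index 0 and are dropped.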


Table of Contents

  • Basic Protocol 1: Using SplitsTree Interactively
  • Alternate Protocol 1: Using the Command‐Line Version of SplitsTree
  • Support Protocol 1: Obtaining SplitsTree
  • Guidelines for Understanding Results
  • Commentary
  • Figures

Materials

Basic Protocol 1: Using SplitsTree Interactively

  Necessary Resources
  • Hardware
    • A computer running Windows, Linux, or Solaris
  • Software
    • The SplitsTree program, downloadable from the SplitsTree Web site: http://www-ab.informatik.uni-tuebingen.de/software/splits/welcome_en.html
  • Files
    • SplitsTree uses the NEXUS file format for input and output of data. Unfortunately, two variants of this format are in use: an old version and the “official” format defined in Maddison et al. (1997). A number of existing programs use the old format or can parse both formats. SplitsTree, however, supports only the “official” format.
    • For identification purposes, the first line of a file in NEXUS format must consist of the statement #NEXUS. For a file containing input data, this is followed by a taxa block that lists the names of all occurring taxa. The taxa block is then followed by a characters block containing a set of aligned character sequences for the named taxa, or by a distances block, if the input consists of a distance matrix.
    • It is usually easy to convert a file containing aligned sequences or distances into the NEXUS format, because both the characters block and the distances block contain a format command that describes the precise format in which the data are given. For example, in the characters block, one can specify whether the sequences are interleaved, not interleaved, or transposed. Moreover, one can specify whether the sequences are DNA, RNA, or protein and what the gap, missing, and match characters are. For distances, one can specify, e.g., whether the upper, lower, or both triangles of the matrix are given, with or without the diagonal. An example of an input file in NEXUS format is given in Figure 6.7.1. Note that the program does not distinguish between upper‐ and lowercase letters. Moreover, a taxon label must be a single word without spaces or special characters (see Maddison et al., 1997, for details), unless it is surrounded by single quotes, e.g., 'Example One'.
    • A formal description of the input format used by SplitsTree is given in Figure 6.7.2.
    • In practice, the most tedious and error‐prone step in formatting input data in NEXUS format is producing a list of all taxa names to place in the taxa block. To avoid this, a special label, _detect_, can be used in the taxlabels command of the taxa block instead of a list of taxon names. In this case, SplitsTree will read the labels from the supplied distances or characters block.
    • The sample data used below are shown in Figure and can be downloaded from the Current Protocols Web site (http://www3.interscience.wiley.com/c_p/cpbi_sampledatafiles.htm).
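As an illustration of the file layout described above, the following Python sketch writes a square distance matrix as a minimal NEXUS file: the #NEXUS line, a taxa block, and a distances block given as the lower triangle including the diagonal. Labels that are not plain words are wrapped in single quotes, as in 'Example One', and the _detect_ label can optionally replace the explicit list of taxon names. The helper names are hypothetical, and the DIMENSIONS line inside the distances block and the exact FORMAT keywords follow the general NEXUS standard (Maddison et al., 1997); treat them as assumptions and check them against the SplitsTree input format in Figure 6.7.2 before relying on them.

```python
def nexus_label(name: str) -> str:
    """Quote a taxon label unless it is a plain word (letters, digits, underscores)."""
    if name and all(ch.isalnum() or ch == "_" for ch in name):
        return name
    # Inside single quotes, a literal quote is written twice (NEXUS convention).
    return "'" + name.replace("'", "''") + "'"


def write_nexus_distances(names, matrix, detect_labels=False):
    """Return NEXUS text for a square, symmetric distance matrix.

    The matrix is written as the lower triangle including the diagonal.
    If detect_labels is True, the taxa block uses the _detect_ label so
    that the taxon labels are read from the distances block instead.
    """
    n = len(names)
    labels = [nexus_label(name) for name in names]
    taxlabels = "_detect_" if detect_labels else " ".join(labels)
    lines = [
        "#NEXUS",
        "",
        "BEGIN TAXA;",
        f"  DIMENSIONS NTAX={n};",
        f"  TAXLABELS {taxlabels};",
        "END;",
        "",
        "BEGIN DISTANCES;",
        f"  DIMENSIONS NTAX={n};",  # assumption: check against Figure 6.7.2
        "  FORMAT TRIANGLE=LOWER DIAGONAL LABELS;",
        "  MATRIX",
    ]
    for i in range(n):
        # Row i lists the distances to taxa 1..i, ending with the diagonal entry.
        row = " ".join(f"{matrix[i][j]:g}" for j in range(i + 1))
        lines.append(f"    {labels[i]} {row}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"
```

For example, `write_nexus_distances(["Example One", "B"], [[0, 0.25], [0.25, 0]])` produces a taxa block listing 'Example One' and B, followed by a two-row lower-triangle matrix. Writing only one triangle keeps the file short and avoids the risk of the two halves of a symmetric matrix disagreeing.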

Alternate Protocol 1: Using the Command‐Line Version of SplitsTree

  Necessary Resources
  • Hardware
    • A computer running Windows, Linux, or Solaris
  • Software
    • The command‐line version of the SplitsTree program, downloadable from the SplitsTree Web site: http://www-ab.informatik.uni-tuebingen.de/software/splits/welcome_en.html
  • Files
    • The same input files as for Basic Protocol 1: data in the “official” NEXUS format (Maddison et al., 1997), consisting of a taxa block followed by a characters block or a distances block. The sample data can be downloaded from the Current Protocols Web site (http://www3.interscience.wiley.com/c_p/cpbi_sampledatafiles.htm).

Figures

  •   Figure 6.7.1 Example of an input file in NEXUS format. Usually, an input file will contain either a characters or a distances block, but not both.
  •   Figure 6.7.2 A depiction of the input format used by SplitsTree. Here, square brackets (e.g., [NO]) indicate optional statements and vertical lines (e.g., {LOWER|UPPER|BOTH}) indicate alternative choices. Note that the format is not case‐sensitive, unless the RESPECTCASE label is present in the FORMAT command of the CHARACTERS block, in which case the program will distinguish between upper‐ and lowercase characters in the input matrix.
  •   Figure 6.7.3 The menus provided by SplitsTree.
  •   Figure 6.7.4 Splits graphs are displayed in the main window. In this graph, a strong split groups M. fascicularis and M. mulatta together, whereas a second split, incompatible with the first, groups M. fascicularis, Lemur catta, and Saimiri sciureus together against the rest. Notice the Fit value of 85.3 in the lower left‐hand corner (see ). This graph was obtained from the data in Figure , using split decomposition applied to the first position of each codon.
  •   Figure 6.7.5 (A) Taxa listed on the left are “active” and included in all computations. All taxa listed on the right are “hidden.” Clicking on a list item moves it to the other list. (B) Specific sites in aligned character sequences can be excluded from consideration; for example, here sites 1, 99, and 200‐500 will be ignored. Additionally, one can specify which codon positions (and with what offset) are to be considered.
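The site-exclusion and codon-position options of Figure 6.7.5B amount to choosing a subset of alignment columns before distances or splits are computed. The Python sketch below illustrates that selection; it is not SplitsTree's own code, the function names are hypothetical, and the offset convention (an offset of k means that site k+1 is the first position of the first codon) is an assumption made for illustration.

```python
def select_sites(n_sites, excluded=(), codon_positions=(1, 2, 3), offset=0):
    """Return the 1-based alignment columns that remain after filtering.

    excluded holds single sites (int) or inclusive ranges (start, end),
    e.g. [1, 99, (200, 500)]. codon_positions lists the positions (1, 2, 3)
    to keep. Hypothetical offset convention: offset k means site k+1 is
    codon position 1.
    """
    skip = set()
    for item in excluded:
        if isinstance(item, tuple):
            start, end = item
            skip.update(range(start, end + 1))
        else:
            skip.add(item)
    kept = []
    for site in range(1, n_sites + 1):
        if site in skip:
            continue
        # Python's % is non-negative here, so sites before the offset wrap to 3, 2.
        position = (site - 1 - offset) % 3 + 1
        if position in codon_positions:
            kept.append(site)
    return kept


def filter_alignment(sequences, kept):
    """Restrict each aligned sequence to the kept 1-based columns."""
    return {name: "".join(seq[i - 1] for i in kept) for name, seq in sequences.items()}
```

For instance, `select_sites(n, excluded=[1, 99, (200, 500)])` mirrors the exclusion shown in the figure, and adding `codon_positions=(1,)` keeps only first codon positions, as used for the splits graph in Figure 6.7.4. Working with column indices first and slicing the sequences afterward keeps the selection independent of how many taxa the alignment contains.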

Literature Cited
   Bandelt, H.‐J. and Dress, A.W.M. 1992a. A canonical decomposition theory for metrics on a finite set. Adv. Math. 92:47‐105.
   Bandelt, H.‐J. and Dress, A.W.M. 1992b. Split decomposition: A new and useful approach to phylogenetic analysis of distance data. Mol. Phylogenet. Evol. 1:242‐252.
   Bandelt, H.‐J. and Dress, A.W.M. 1993. A relational approach to split decomposition. In Information and Classification: Concepts, Methods and Applications. (O. Opitz, B. Lausen, and R. Klar, eds.) Heidelberg, Germany.
   Buneman, P. 1971. The recovery of trees from measures of dissimilarity. In Mathematics and the Archeological and Historical Sciences (F.R. Hodson, D.G. Kendall, and P. Tautu, eds.) pp. 387‐395. Edinburgh University Press, Edinburgh.
   Dress, A.W.M., Huson, D.H., and Moulton, V. 1996. Analyzing and visualizing sequence and distance data using SplitsTree. Discrete Appl. Math. 71:95‐109.
   Hendy, M.D. and Penny, D. 1993. Spectral analysis of phylogenetic data. J. Classif. 10:5‐24.
   Huson, D.H. 1998. SplitsTree: A program for analyzing and visualizing evolutionary data. Bioinformatics 14:68‐73.
   Jukes, T.H. and Cantor, C.R. 1969. Evolution of protein molecules. In Mammalian Protein Metabolism (H.N. Munro, ed.) pp. 21‐132. Academic Press, London.
   Kimura, M. 1981. Estimation of evolutionary distances between homologous nucleotide sequences. Proc. Natl. Acad. Sci. U.S.A. 78:454‐458.
   Lockhart, P.J., Steel, M.A., Hendy, M., and Penny, D. 1994. Recovering the correct tree under a more realistic model of evolution. Mol. Biol. Evol. 11:605‐612.
   Maddison, D.R., Swofford, D.L., and Maddison, W.P. 1997. NEXUS: An extensible file format for systematic information. Syst. Biol. 46:590‐621.
   Steel, M.A. 1994. Recovering a tree from the leaf colorations it generates under a Markov model. Appl. Math. Lett. 7:19‐24.
Key References
   Bandelt and Dress, 1992a,b; 1993. See above.
  The theory of split decomposition and related methods was introduced by Hans‐Jürgen Bandelt and Andreas Dress.
Internet Resources
  http://www-ab.informatik.uni-tuebingen.de/software/splits/welcome_en.html
  This Web site contains additional information on SplitsTree, has links to Web servers running the program, and provides downloads of the different versions of the program.