Multiple Sequence Alignment Using ClustalW and ClustalX

Julie D. Thompson¹, Toby J. Gibson², Des G. Higgins³

¹ Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch Cedex, France; ² European Molecular Biology Laboratory, Heidelberg, Germany; ³ University College, Cork, Ireland
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 2.3
DOI: 10.1002/0471250953.bi0203s00
Online Posting Date: August, 2002
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

The Clustal programs are widely used for carrying out automatic multiple alignment of nucleotide or amino acid sequences. The most familiar version is ClustalW, which uses a simple text menu system that is portable to virtually all computer systems. ClustalX features a graphical user interface and some powerful graphical utilities that aid the interpretation of alignments, and it is the preferred version for interactive use. Users may run Clustal remotely via the Web at several sites, or the programs may be downloaded and run locally on PCs, Macintosh, or Unix computers. The protocols in this unit describe how to use ClustalX and ClustalW to construct an alignment and how to create profile alignments by merging existing alignments.


Table of Contents

  • Basic Protocol 1: Using ClustalW and ClustalX to Do Multiple Alignments
  • Alternate Protocol 1: Using ClustalW and ClustalX for Profile Alignments
  • Support Protocol 1: Obtaining the ClustalW and ClustalX Programs
  • Guidelines for Understanding Results
  • Commentary
  • Figures

Materials

Basic Protocol 1: Using ClustalW and ClustalX to Do Multiple Alignments

  Necessary Resources
  • Hardware
    • Unix (including Linux) workstation (e.g., Sun, Alpha, Silicon Graphics, PC), PC with MS Windows, or Power Macintosh
  • Software
    • ClustalW or ClustalX program (see Support Protocol 1)
  • Files
    • Sequences can be input to both ClustalW and ClustalX in any of seven file formats, and all sequences must be in the same file. The automatically recognized formats are NBRF/PIR, EMBL/Swiss‐Prot, Pearson (FASTA; appendix 1B), Clustal, GCG/MSF, GCG9/RSF, and GDE flat file. The sequences must be either all nucleotide or all amino acid; the program attempts to guess which from the composition of the letters. Either upper‐ or lowercase can be used, most symbols and numbers are ignored (removed), and unrecognized residues are counted as X (amino acids) or N (nucleotides).
    If using a word processor to prepare the input file, save it as plain text with line breaks (i.e., as a simple ASCII file). ClustalX cannot read native word processor formats.
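The input rules above (automatic format recognition, guessing nucleotide versus amino acid from letter composition, discarding numbers and symbols, and counting unrecognized residues as X or N) can be illustrated with a short Python sketch. This is not Clustal's own code: `sniff_format`, `guess_type`, and `clean_sequence` are hypothetical helpers, the first-line signatures cover only six of the seven formats (GDE flat files are not handled), and the 85% nucleotide threshold is an assumed value rather than the one ClustalW or ClustalX uses.

```python
import re

# Hypothetical helpers illustrating Clustal-style input handling.
# Assumptions: the first-line signatures and the 85% threshold are
# illustrative choices, not the rules used inside ClustalW/ClustalX.

PROTEIN = set("ACDEFGHIKLMNPQRSTVWYBZX")
NUCLEOTIDE = set("ACGTUNRYKMSWBDHV")  # IUPAC nucleotide codes


def sniff_format(text: str) -> str:
    """Guess the file format from its first non-blank line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    first = lines[0] if lines else ""
    if re.match(r">[A-Z][A-Z0-9];", first):  # e.g. ">P1;" in NBRF/PIR
        return "NBRF/PIR"
    if first.startswith(">"):
        return "Pearson/FASTA"
    if first.startswith("ID "):
        return "EMBL/Swiss-Prot"
    if first.upper().startswith("CLUSTAL"):
        return "Clustal"
    if first.startswith("!!RICH_SEQUENCE"):
        return "GCG9/RSF"
    if "MSF:" in text:
        return "GCG/MSF"
    return "unknown"  # GDE flat files and anything else fall through


def guess_type(residues: str, threshold: float = 0.85) -> str:
    """Call a sequence nucleotide if enough of its letters are A/C/G/T/U/N."""
    letters = [c for c in residues.upper() if c.isalpha()]
    if not letters:
        return "unknown"
    acgtun = sum(c in "ACGTUN" for c in letters)
    return "nucleotide" if acgtun / len(letters) >= threshold else "protein"


def clean_sequence(raw: str, seq_type: str) -> str:
    """Drop digits/symbols, keep gaps, and map unknown letters to N or X."""
    alphabet, unknown = (NUCLEOTIDE, "N") if seq_type == "nucleotide" else (PROTEIN, "X")
    out = []
    for c in raw.upper():
        if c == "-":
            out.append(c)  # keep alignment gap characters
        elif c.isalpha():
            out.append(c if c in alphabet else unknown)
        # digits, whitespace, and other symbols are silently dropped
    return "".join(out)


seq = "acgtacgt 12 acgx-*"
kind = guess_type(seq)
print(sniff_format(">seq1\nACGT\n"), kind, clean_sequence(seq, kind))
# Pearson/FASTA nucleotide ACGTACGTACGN-
```

Case is normalized before any test, which mirrors the statement that either upper‐ or lowercase input is accepted.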

Alternate Protocol 1: Using ClustalW and ClustalX for Profile Alignments

  Necessary Resources
  • Hardware
    • Unix (including Linux) workstation (e.g., Sun, Alpha, Silicon Graphics, PC), PC with MS Windows, or Power Macintosh
  • Software
    • ClustalW or ClustalX program (see Support Protocol 1)
  • Files
    • Sequences and existing alignments can be input to both ClustalW and ClustalX in any of seven file formats. All sequences must be in the same file. The automatically recognized formats are NBRF/PIR, EMBL/Swiss‐Prot, Pearson (FASTA; appendix 1B), Clustal, GCG/MSF, GCG9/RSF, and GDE flat file. In the examples here, unaligned sequences are in FASTA format and existing alignments are in Clustal and GCG/MSF formats.
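When existing alignments are merged as profiles, each input alignment is kept intact, so every row of an input alignment should already have the same length. The sketch below, in Python, reads a Clustal-format alignment and checks that condition. `read_clustal` is a hypothetical helper, not part of ClustalW or ClustalX, and it assumes the common layout (a header line beginning with CLUSTAL, then blocks of name/segment lines separated by blank lines and by conservation lines that begin with spaces) rather than implementing a full specification of the format.

```python
# Hypothetical reader for Clustal-format alignments; assumes the usual
# layout rather than reproducing ClustalW/ClustalX's own parser.
def read_clustal(text: str) -> dict:
    """Return {name: aligned sequence} from Clustal-format text."""
    lines = text.splitlines()
    if not lines or not lines[0].upper().startswith("CLUSTAL"):
        raise ValueError("not a Clustal-format alignment")
    seqs = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines (they start with spaces).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # fields[2], if present, is an optional running residue count.
        name, segment = fields[0], fields[1]
        seqs[name] = seqs.get(name, "") + segment
    if len({len(s) for s in seqs.values()}) > 1:
        raise ValueError("rows differ in length; input is not an alignment")
    return seqs


profile1 = """CLUSTAL W (1.82) multiple sequence alignment

seq1      ACGT-A
seq2      ACG--A
          *** *

seq1      GG
seq2      GC
          *
"""
print(read_clustal(profile1))
# {'seq1': 'ACGT-AGG', 'seq2': 'ACG--AGC'}
```

Segments for each name are concatenated across blocks, so the length check runs on the full aligned rows rather than on individual blocks.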

Support Protocol 1: Obtaining the ClustalW and ClustalX Programs

  Necessary Resources
  • Hardware
    • Unix (including Linux) workstation (Sun, Alpha, Silicon Graphics, PC), PC with either MS‐DOS or MS Windows, Power Macintosh, or any other computer supporting a C compiler

Key References
   Jeanmougin, F., Thompson, J.D., Gouy, M., Higgins, D.G., and Gibson, T.J. 1998. Multiple sequence alignment with ClustalX. Trends Biochem. Sci. 23:403‐405.
   Higgins, D.G., Thompson, J.D., and Gibson, T.J. 1996. Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266:383‐402.
  Both of these articles give extensive background, describe in detail what happens when Clustal is run, and explain what each of the parameters means. They are intended for a lay, nontechnical audience.
Internet Resources
  http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html
  Get information on or download ClustalX.
  http://www.ebi.ac.uk/clustalw/
  Run ClustalW at the EBI using the Web.
  http://cmgm.stanford.edu/phylip/
  PHYLIP (Phylogeny Inference Package) version 3.5c, by J. Felsenstein. Department of Genetics, University of Washington, Seattle.