
Multiple Sequence Alignment Using ClustalW and ClustalX

Julie D. Thompson¹, Toby J. Gibson², Des G. Higgins³

¹Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch Cedex, France
²European Molecular Biology Laboratory, Heidelberg, Germany
³University College, Cork, Ireland



Unit Number: Unit 2.3
DOI: 10.1002/0471250953.bi0203s00
Online Posting Date: January 2003

Abstract

The Clustal programs are widely used for automatic multiple alignment of nucleotide or amino acid sequences. The most familiar version is ClustalW, which uses a simple text menu system that is portable to virtually all computer systems. ClustalX has a graphical user interface and several powerful graphical utilities that aid the interpretation of alignments, and it is the preferred version for interactive use. Users can run Clustal remotely at several sites via the Web, or download the programs and run them locally on PCs, Macintosh, or Unix computers. The protocols in this unit describe how to use ClustalX and ClustalW to construct a multiple alignment and how to create profile alignments by merging existing alignments.

     
 

Table of Contents

  • Unit Introduction
  • Basic Protocol: Using ClustalW and ClustalX to Do Multiple Alignments
  • Alternate Protocol: Using ClustalW and ClustalX for Profile Alignments
  • Support Protocol: Obtaining the ClustalW and ClustalX Programs
  • Guidelines for Understanding Results
  • Commentary
  • Bibliography
  • Figures
     
 

Materials

Basic Protocol: Using ClustalW and ClustalX to Do Multiple Alignments

 Necessary Resources
  • Hardware
    • Unix (including Linux) workstation (e.g., Sun, Alpha, Silicon Graphics, PC), PC with MS Windows, or Power Macintosh
  • Software
    • ClustalW or ClustalX program (see Support Protocol)
  • Files
    • Sequences can be input to both ClustalW and ClustalX in any of seven file formats, and all sequences must be in the same file. The formats recognized automatically are NBRF/PIR, EMBL/Swiss-Prot, Pearson (FASTA; appendix 1B), Clustal, GCG/MSF, GCG9/RSF, and GDE flat file. The sequences must be either all nucleotide or all amino acid; the program guesses which from the composition of the letters. Upper- or lowercase can be used, and most symbols and numbers are ignored (removed); unrecognized residues are counted as X (amino acid) or N (nucleotide).
    If using a word processor to prepare the input file, save the file as plain text with line breaks (i.e., as a simple ASCII file). ClustalX cannot read native word processor formats.
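
The input handling described above (format recognition, case folding, removal of symbols and numbers, and the nucleotide-versus-protein guess) can be sketched in Python as follows. This is a minimal illustration, not Clustal's own code: the first-line signatures used to recognize each format, the 85% nucleotide threshold, the residue alphabets, and the helper names guess_format, clean_residues, is_nucleotide, and normalize are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# Residue alphabets (assumed for this sketch, not taken from Clustal's source).
NUC_CORE = set("ACGTUN")                    # letters counted when guessing "nucleotide"
NUC_OK = NUC_CORE | set("RYKMSWBDHV")       # plus IUPAC ambiguity codes
PROT_OK = set("ACDEFGHIKLMNPQRSTVWYBZX")    # 20 amino acids plus B, Z, X


def guess_format(text: str) -> str:
    """Guess the input format from the first non-blank line (assumed signatures)."""
    first = next((ln.strip() for ln in text.splitlines() if ln.strip()), "").upper()
    if first.startswith("CLUSTAL"):
        return "Clustal"
    if first.startswith("!!RICH_SEQUENCE"):
        return "GCG9/RSF"
    if first.startswith(("PILEUP", "!!AA_MULTIPLE_ALIGNMENT",
                         "!!NA_MULTIPLE_ALIGNMENT")) or "MSF:" in first:
        return "GCG/MSF"
    if re.match(r">[A-Z0-9]{2};", first):   # e.g. ">P1;" or ">DL;"
        return "NBRF/PIR"
    if first.startswith(">"):
        return "Pearson/FASTA"
    if first.startswith("ID "):
        return "EMBL/Swiss-Prot"
    if first.startswith(("%", "#")):
        return "GDE"
    return "unknown"


def clean_residues(raw: str) -> str:
    """Uppercase and keep letters only, dropping digits, spaces and symbols."""
    return "".join(ch for ch in raw.upper() if ch.isalpha())


def is_nucleotide(seqs: List[str], threshold: float = 0.85) -> bool:
    """Call the set nucleotide if >= threshold of all letters are A, C, G, T, U or N."""
    letters = "".join(seqs)
    if not letters:
        return False
    return sum(ch in NUC_CORE for ch in letters) / len(letters) >= threshold


def normalize(raw_seqs: List[str]) -> Tuple[str, List[str]]:
    """Clean every sequence, guess the type, and map unrecognized residues to N or X."""
    cleaned = [clean_residues(s) for s in raw_seqs]
    if is_nucleotide(cleaned):
        kind, allowed, unknown = "nucleotide", NUC_OK, "N"
    else:
        kind, allowed, unknown = "protein", PROT_OK, "X"
    fixed = ["".join(ch if ch in allowed else unknown for ch in s) for s in cleaned]
    return kind, fixed
```

For example, normalize(["acgt 12 ACGX"]) drops the spaces and digits, finds that 7 of the 8 remaining letters are nucleotide letters, and replaces the unrecognized X with N. A protein unusually rich in A, C, G, T, and N could be misclassified by any threshold rule of this kind.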


Alternate Protocol: Using ClustalW and ClustalX for Profile Alignments

 Necessary Resources
  • Hardware
    • Unix (including Linux) workstation (e.g., Sun, Alpha, Silicon Graphics, PC), PC with MS Windows, or Power Macintosh
  • Software
    • ClustalW or ClustalX program (see Support Protocol)
  • Files
    • Sequences and existing alignments can be input to both ClustalW and ClustalX in one of seven file formats. All sequences must be in the same file. The formats that are automatically recognized are: NBRF/PIR, EMBL/Swiss-Prot, Pearson (FASTA; appendix 1B), Clustal, GCG/MSF, GCG9/RSF, and GDE flat file. In the examples here, unaligned sequences are in FASTA format and existing alignments are in Clustal and GCG/MSF formats.
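
Profile alignment starts from existing alignments. As an illustration, the sketch below reads a Clustal-format alignment (the .aln layout shown in Figure 2.3.19: a CLUSTAL header line, then blocks of name and sequence-segment lines separated by blank lines and conservation lines) into a dictionary of aligned sequences. It is a minimal sketch under stated assumptions: the function name read_clustal is hypothetical, sequence names are assumed to contain no spaces, and the optional residue counts at the ends of lines are ignored.

```python
from typing import Dict, List


def read_clustal(text: str) -> Dict[str, str]:
    """Read a Clustal-format (.aln) alignment into {name: aligned sequence}."""
    lines = text.splitlines()
    if not lines or not lines[0].upper().startswith("CLUSTAL"):
        raise ValueError("not a Clustal-format alignment (missing CLUSTAL header)")
    segments: Dict[str, List[str]] = {}
    for line in lines[1:]:
        # Skip blank lines between blocks and conservation lines (*, :, .),
        # which start with the spaces that pad the name column.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, segment = fields[0], fields[1]   # an optional residue count may follow
        segments.setdefault(name, []).append(segment)
    aligned = {name: "".join(parts) for name, parts in segments.items()}
    if len({len(seq) for seq in aligned.values()}) > 1:
        raise ValueError("sequences have different aligned lengths")
    return aligned
```

Gap characters are kept, so every row of one profile has the same length; the two profiles loaded in profile alignment mode can still differ in length until they are aligned to each other.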

Support Protocol: Obtaining the ClustalW and ClustalX Programs

 Necessary Resources
  • Hardware
    • Unix (including Linux) workstation (Sun, Alpha, Silicon Graphics, PC), PC with either MS-DOS or MS Windows, Power Macintosh, or any other computer supporting a C compiler
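
Once a ClustalW binary is installed, both protocols can also be run non-interactively from the command line. The sketch below looks for the executable on the PATH and builds argument lists for a multiple alignment and a profile alignment. It assumes a ClustalW build that accepts the options -INFILE, -ALIGN, -OUTFILE, -NEWTREE, -USETREE, -OUTPUT, -PROFILE1, -PROFILE2, and -PROFILE; the executable names tried and the helper names (find_clustalw, multiple_alignment_cmd, profile_alignment_cmd, run) are hypothetical, so check the option list printed by your own version before relying on it.

```python
import shutil
import subprocess
from typing import List, Optional


def find_clustalw(candidates=("clustalw", "clustalw2")) -> Optional[str]:
    """Return the path of the first ClustalW executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def multiple_alignment_cmd(exe: str, infile: str, outfile: Optional[str] = None,
                           newtree: Optional[str] = None, usetree: Optional[str] = None,
                           output: Optional[str] = None) -> List[str]:
    """Build a ClustalW command line for a full multiple alignment (assumed options)."""
    cmd = [exe, f"-INFILE={infile}", "-ALIGN"]
    if outfile:
        cmd.append(f"-OUTFILE={outfile}")
    if usetree:                      # reuse an existing guide tree (.dnd)
        cmd.append(f"-USETREE={usetree}")
    elif newtree:                    # file name for the newly computed guide tree
        cmd.append(f"-NEWTREE={newtree}")
    if output:                       # alternative output format, e.g. "GCG" or "PIR"
        cmd.append(f"-OUTPUT={output}")
    return cmd


def profile_alignment_cmd(exe: str, profile1: str, profile2: str,
                          outfile: Optional[str] = None) -> List[str]:
    """Build a ClustalW command line that aligns two existing alignments (profiles)."""
    cmd = [exe, f"-PROFILE1={profile1}", f"-PROFILE2={profile2}", "-PROFILE"]
    if outfile:
        cmd.append(f"-OUTFILE={outfile}")
    return cmd


def run(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run ClustalW, raising CalledProcessError if it exits with an error status."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

For example, if find_clustalw() returns a path, run(multiple_alignment_cmd(path, "1wit")) would align the sequences in a file called 1wit; the default output names would then be 1wit.aln and 1wit.dnd, as in Figure 2.3.5. Building the argument list separately from running it keeps the command easy to inspect or log.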
     
 

Figures

  • Figure 2.3.1
    The ClustalX window on a Unix workstation before any sequences are loaded.

  • Figure 2.3.2
    The input file selection window for ClustalX.

  • Figure 2.3.3
    ClustalX with five loaded but unaligned sequences.

  • Figure 2.3.4
    Changing the format of the multiple alignment output in ClustalX. Clustal format is the default.

  • Figure 2.3.5
    Selecting the names for the output files for the dendrogram (1wit.dnd is offered as the default) and the multiple alignment (1wit.aln is the default) for an input file called 1wit.

  • Figure 2.3.6
    ClustalX after a multiple alignment has been carried out on the five sequences. The alignment has been written to a text file which can be used for further analysis. The user can also choose to analyse this alignment further within ClustalX (e.g., to calculate a phylogenetic tree).

  • Figure 2.3.7
    The windows containing the buttons and (default) settings for the pairwise alignment parameters (left) and the multiple alignment parameters (right).

  • Figure 2.3.8
    Producing a new multiple alignment (1wit.aln) using an old guide tree file (1wit.dnd).

  • Figure 2.3.9
    Window displayed upon selecting the Show Low Scoring Segments option from the Quality menu.

  • Figure 2.3.10
    The Save As menu from ClustalX which is used to save an alignment after it is produced. Alignments are written to output files by default anyway, but this option allows users to save the output afterwards, perhaps in a different format. The full alignment is saved by default; here the user has chosen to save residues 10 to 55.

  • Figure 2.3.11
    The PostScript output menu from ClustalX. This is used to save the colored alignment with or without some of the ornamentation in the window.

  • Figure 2.3.12
    ClustalX in profile alignment mode before any sequences or profiles are loaded. The two empty windows will hold the two profiles (existing alignments) or groups of sequences.

  • Figure 2.3.13
    ClustalX in profile alignment mode after the first profile (a five-sequence alignment) has been loaded (only three sequences are visible in the scrollable window).

  • Figure 2.3.14
    ClustalX in profile alignment mode with both profiles loaded. The alignment was based on secondary structure superposition and adjusted manually.

  • Figure 2.3.15
    Window displayed upon loading a profile with a structure mask in Profile Alignment Mode.

  • Figure 2.3.16
    The default file names for the output files from the profile alignment.

  • Figure 2.3.17
    The two profiles after they have been aligned together. They are still in separate windows but have been locked together by pressing the Lock Scroll button. They are moved together by the single scroll bar at the bottom of the screen.

  • Figure 2.3.18
    The final profile alignment can be viewed in a single window by switching back to Multiple Alignment Mode (from Profile Alignment Mode).

  • Figure 2.3.19
    A sample text output file (x.aln) showing the alignment (obtained with default parameters) of seven globin sequences. The stars, dots, and colons below the alignment indicate the degree of conservation in each column.

  • Figure 2.3.20
    Dendrogram of the alignment shown in Figure 2.3.6.
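
The conservation line in Figure 2.3.19 can be approximated with a few lines of code. In the sketch below, "*" marks a column of identical residues, ":" a column whose residues all fall within one "strong" group, and "." a column whose residues all fall within one "weak" group. The group lists are the ones commonly documented for ClustalW 1.8x, and leaving gapped columns unmarked is an assumption of this sketch; verify both against the documentation for your version. The function name conservation_line is hypothetical, and the ":" and "." rules are meant for protein alignments only.

```python
# Residue groups as commonly documented for ClustalW 1.8x (assumption; verify for your version).
STRONG_GROUPS = [set(g) for g in
                 ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")]
WEAK_GROUPS = [set(g) for g in
               ("CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
                "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY")]


def conservation_line(aligned):
    """Return one conservation symbol per column of equal-length aligned protein sequences."""
    if not aligned:
        return ""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):
        residues = {ch.upper() for ch in column}
        if "-" in residues:
            marks.append(" ")        # gapped column: no symbol (assumed rule)
        elif len(residues) == 1:
            marks.append("*")        # fully conserved column
        elif any(residues <= group for group in STRONG_GROUPS):
            marks.append(":")        # all residues in one strong group
        elif any(residues <= group for group in WEAK_GROUPS):
            marks.append(".")        # all residues in one weak group
        else:
            marks.append(" ")
    return "".join(marks)
```

For the three toy sequences MKVL-, MRIL-, and MKLFA, the columns are M/M/M, K/R/K, V/I/L, L/L/F, and a gapped column, so the line reads "*::: ".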

 Key References
    Jeanmougin, F., Thompson, J.D., Gouy, M., Higgins, D.G., and Gibson, T.J. 1998. Multiple sequence alignment with ClustalX. Trends Biochem. Sci. 23:403-405.
    Higgins, D.G., Thompson, J.D., and Gibson, T.J. 1996. Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266:383-402.

Both of these articles give extensive background on what happens when you use Clustal and what each of the parameters means. They are written for a nontechnical audience.

 Internet Resources
    http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html

Get information on or download ClustalX.

    http://www.ebi.ac.uk/clustalw/

Run ClustalW at the EBI using the Web.

    http://cmgm.stanford.edu/phylip/

PHYLIP (Phylogeny Inference Package) version 3.5c, by J. Felsenstein, Department of Genetics, University of Washington, Seattle.
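
The guide tree files that Clustal writes (the .dnd files in Figures 2.3.5 and 2.3.8) are plain-text trees in the Newick format used by PHYLIP. A minimal recursive-descent reader is sketched below. The names Node, tokenize, parse_newick, and leaves are hypothetical, and the sketch handles only node names and branch lengths, not quoted labels or comments.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A token is one punctuation character or a run of other non-space characters.
TOKEN = re.compile(r"\s*([(),:;]|[^(),:;\s]+)")
PUNCT = {"(", ")", ",", ":", ";"}


@dataclass
class Node:
    name: Optional[str] = None
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def tokenize(text: str) -> List[str]:
    """Split Newick text into tokens; newlines and spaces are ignored."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_newick(text: str) -> Node:
    """Parse a Newick tree such as '(a:0.1,(b:0.2,c:0.3):0.05);'."""
    tokens = tokenize(text)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":                      # internal node: parse its children
            pos += 1
            node.children.append(parse_node())
            while peek() == ",":
                pos += 1
                node.children.append(parse_node())
            if peek() != ")":
                raise ValueError("expected ')'")
            pos += 1
        if peek() is not None and peek() not in PUNCT:
            node.name = tokens[pos]            # optional label
            pos += 1
        if peek() == ":":                      # optional branch length
            pos += 1
            node.length = float(tokens[pos])
            pos += 1
        return node

    root = parse_node()
    if peek() != ";":
        raise ValueError("tree must end with ';'")
    return root


def leaves(node: Node) -> List[Tuple[Optional[str], Optional[float]]]:
    """Return (name, branch length) for every leaf, left to right."""
    if not node.children:
        return [(node.name, node.length)]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result
```

Because whitespace between tokens is skipped, the one-node-per-line layout that Clustal uses in .dnd files parses the same way as a single-line tree.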

     
 