Clustal Omega

Fabian Sievers¹, Desmond G. Higgins¹

¹ School of Medicine and Medical Science, Conway Institute, University College Dublin
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 3.13
DOI: 10.1002/0471250953.bi0313s48
Online Posting Date: December 2014
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

Clustal Omega is a package for making fast, accurate multiple sequence alignments of amino acid or nucleotide sequences. It is a complete upgrade and rewrite of the earlier Clustal programs. This unit describes how to run Clustal Omega interactively from a command line, although it can also be run online from several sites. The unit describes a basic protocol for taking a set of unaligned sequences and producing a full alignment. There are also protocols for using an external hidden Markov model (HMM) or iteration to help improve an alignment. © 2014 by John Wiley & Sons, Inc.

Keywords: Clustal Omega; multiple sequence alignment; iteration; HMM; hidden Markov model
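The command-line workflow summarized in the abstract can be sketched as a small Python wrapper. This is an illustrative sketch, not code from the unit. It assumes a `clustalo` binary is on the PATH and Python 3.8 or later (for `shlex.join`). The helper names (`build_clustalo_command`, `run_clustalo`) and the file names are hypothetical. The options used (`-i`, `-o`, `--outfmt`, `--hmm-in`, `--iter`) are standard Clustal Omega options, but check `clustalo --help` for the version you have installed.

```python
import shlex
import subprocess
from typing import List, Optional


def build_clustalo_command(
    infile: str,
    outfile: str,
    outfmt: str = "clustal",
    hmm_in: Optional[str] = None,
    iterations: int = 0,
) -> List[str]:
    """Assemble a clustalo argument list (hypothetical helper)."""
    # Basic run: unaligned sequences in, full alignment out
    cmd = ["clustalo", "-i", infile, "-o", outfile, f"--outfmt={outfmt}"]
    if hmm_in is not None:
        # Optional external HMM used to guide the alignment
        cmd.append(f"--hmm-in={hmm_in}")
    if iterations > 0:
        # Optional number of combined guide-tree/HMM iterations
        cmd.append(f"--iter={iterations}")
    return cmd


def run_clustalo(cmd: List[str]) -> None:
    # Requires clustalo on PATH; raises CalledProcessError if it exits non-zero
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; prints the command rather than running it
    cmd = build_clustalo_command("unaligned.fa", "aligned.aln", iterations=2)
    print(shlex.join(cmd))
```

Building the argument list separately from running it lets you inspect or log the exact command before calling `run_clustalo`. Passing a list, not a shell string, also avoids quoting problems with unusual file names.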


Table of Contents

  • Introduction
  • Basic Protocol 1: Creating a Multiple Sequence Alignment from Unaligned Sequences with Clustal Omega
  • Basic Protocol 2: Using an Existing Multiple Alignment to Increase Alignment Accuracy
  • Basic Protocol 3: Iterative Refinement of a Multiple Sequence Alignment
  • Support Protocol 1: Compiling Clustal Omega from Source
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures