Constructing and Refining Multiple Sequence Alignments with PileUp, SeqLab, and the GCG Suite

Steven M. Thompson1

1 Florida State University, Tallahassee, Florida
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 3.6
DOI:  10.1002/0471250953.bi0306s00
Online Posting Date:  February, 2003

Abstract

This unit describes how the SeqLab graphical user interface of the Accelrys GCG Wisconsin Package can be used to align, annotate, analyze, and export multiple biological sequences into alternative formats. The emphasis is on discovering and recognizing common elements within the dataset. The programs investigated, either native GCG programs or GCG implementations of public domain programs, include LookUp, PileUp, PlotSimilarity, FASTA, Motifs, MEME/MotifSearch, the Profile Package, the HMMER Package, PAUPSearch, and ToFastA. ReadSeq, a non‐GCG, public domain program, is also used.

     
 

Table of Contents

  • Basic Protocol 1: Multiple Sequence Alignment Using PileUp Within SeqLab
  • Support Protocol 1: Using LookUp to Assemble a Dataset
  • Support Protocol 2: Similarity Searching to Increase (or Decrease) Dataset Size
  • Support Protocol 3: Using PlotSimilarity and SeqLab to Improve and Edit the Multiple Sequence Alignment
  • Support Protocol 4: Consensus and Masking Issue: GCG's Mask Operation
  • Support Protocol 5: Convert a Multiple Sequence Alignment to PAUP* Format for Phylogenetic Analysis
  • Support Protocol 6: Convert a GCG Multiple Sequence Alignment to PHYLIP Format for Phylogenetic Analysis
  • Basic Protocol 2: Searching Prosite: GCG's Motifs—A Quick and Dirty Method
  • Basic Protocol 3: Searching MEME Within GCG to Identify Motifs
  • Basic Protocol 4: Profile‐Analysis: Position‐Specific Weighted Score Matrices of Multiple Sequence Alignments
  • Commentary
  • Figures
  • Tables
     
 

Materials

Basic Protocol 1: Multiple Sequence Alignment Using PileUp Within SeqLab

  Necessary Resources
  • Hardware
    • Terminal or personal workstation with access to a Unix server running commercial GCG software
  • Software
    • SeqLab (GCG Wisconsin Package; see Internet Resources)
    • X‐server graphics communications software (appendix 1D)
    X‐server emulation software needs to be installed separately on personal‐style Microsoft Windows or Macintosh machines, but genuine X‐Windowing comes standard with most Unix operating systems. Microsoft Windows machines are often set up with either XWin32 or eXceed to provide this function, while Macintoshes are often loaded with either MacX or eXodus software. The details of X are provided in appendix 1D. If the user is unsure of these procedures, assistance from local computer support personnel should be sought.
  • Files
    • Protein or DNA sequences of interest in GCG format (e.g., from LookUp in the GCG package and/or FASTA; see Support Protocols 1 and 2)

Support Protocol 1: Using LookUp to Assemble a Dataset

  Necessary Resources
  • Hardware
    • Terminal or personal workstation with access to a Unix server running commercial GCG software
  • Software
    • LookUp (GCG Wisconsin Package; see Internet Resources)
    • X‐server graphics communications software (appendix 1D)
    X‐server emulation software needs to be installed separately on personal‐style Microsoft Windows or Macintosh machines, but genuine X‐Windowing comes standard with most Unix operating systems. Microsoft Windows machines are often set up with either XWin32 or eXceed to provide this function, while Macintoshes are often loaded with either MacX or eXodus software. The details of X are given in appendix 1D. If the user is unsure of these procedures, assistance from local computer support personnel should be sought.
  • Files
    • None

Support Protocol 2: Similarity Searching to Increase (or Decrease) Dataset Size

  Necessary Resources
  • Hardware
    • Terminal or personal workstation with access to a Unix server running commercial GCG software
  • Software
    • SeqLab (GCG Wisconsin Package; see Internet Resources)
    • X‐server graphics communications software (appendix 1D)
    X‐server emulation software needs to be installed separately on personal‐style Microsoft Windows or Macintosh machines, but genuine X‐Windowing comes standard with most Unix operating systems. Microsoft Windows machines are often set up with either XWin32 or eXceed to provide this function, while Macintoshes are often loaded with either MacX or eXodus software. The details of X are given in appendix 1D. If the user is unsure of these procedures, ask for assistance from local computer support personnel.
  • Files
    • Protein or DNA sequences of interest in GCG format (e.g., from LookUp in the GCG package; see protocol 2)

Support Protocol 3: Using PlotSimilarity and SeqLab to Improve and Edit the Multiple Sequence Alignment

  Necessary Resources
  • Hardware
    • Terminal or personal workstation with access to a Unix server running commercial GCG software
  • Software
    • PlotSimilarity and SeqLab (GCG Wisconsin Package; see Internet Resources)
    • X‐server graphics communications software (appendix 1D)
    X‐server emulation software needs to be installed separately on personal‐style Microsoft Windows or Macintosh machines, but genuine X‐Windowing comes standard with most Unix operating systems. Microsoft Windows machines are often set up with either XWin32 or eXceed to provide this function, while Macintoshes are often loaded with either MacX or eXodus software. The details of X are provided in appendix 1D. If the user is unsure of these procedures, ask for assistance from local computer support personnel.
  • Files
    • Multiple sequence alignment in GCG format (see protocol 2)

Support Protocol 4: Consensus and Masking Issue: GCG's Mask Operation

  Necessary Resources
  • Hardware
    • Terminal or personal workstation with access to a Unix server running commercial GCG software
  • Software
    • SeqLab (GCG Wisconsin Package; see Internet Resources)
    • X‐server graphics communications software (appendix 1D)
    X‐server emulation software needs to be installed separately on personal‐style Microsoft Windows or Macintosh machines, but genuine X‐Windowing comes standard with most Unix operating systems. Microsoft Windows machines are often set up with either XWin32 or eXceed to provide this function, while Macintoshes are often loaded with either MacX or eXodus software. The details of X are discussed in appendix 1D. If the user is unsure of these procedures, ask for assistance from local computer support personnel.
  • Files
    • Multiple sequence alignment in GCG format (see protocol 2)

Support Protocol 5: Convert a Multiple Sequence Alignment to PAUP* Format for Phylogenetic Analysis

  Necessary Resources
  • Hardware
    • Terminal or personal workstation with access to a Unix server running commercial GCG software
  • Software
    • SeqLab (GCG Wisconsin Package; see Internet Resources)
    • X‐server graphics communications software (appendix 1D)
    X‐server emulation software needs to be installed separately on personal‐style Microsoft Windows or Macintosh machines, but genuine X‐Windowing comes standard with most Unix operating systems. Microsoft Windows machines are often set up with either XWin32 or eXceed to provide this function, while Macintoshes are often loaded with either MacX or eXodus software. The details of X are provided in appendix 1D. If the user is unsure of these procedures, ask for assistance from local computer support personnel.
  • Files
    • Multiple sequence alignment loaded into SeqLab (see protocol 1)

Support Protocol 6: Convert a GCG Multiple Sequence Alignment to PHYLIP Format for Phylogenetic Analysis

  Necessary Resources
  • Hardware
    • Terminal or personal workstation with access to a Unix server running commercial GCG software
  • Software
    • SeqLab (GCG Wisconsin Package; see Internet Resources)
    • ReadSeq (D.G. Gilbert; see Internet Resources; appendix 1E)
    • X‐server graphics communications software (appendix 1D)
    X‐server emulation software needs to be installed separately on personal‐style Microsoft Windows or Macintosh machines, but genuine X‐Windowing comes standard with most Unix operating systems. Microsoft Windows machines are often set up with either XWin32 or eXceed to provide this function, while Macintoshes are often loaded with either MacX or eXodus software. The details of X are provided in appendix 1D. If the user is unsure of these procedures, ask for assistance from local computer support personnel.
  • Files
    • Multiple sequence alignment loaded into SeqLab (see protocol 1)
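
For readers unfamiliar with the target format of this conversion, the following is a minimal Python sketch of what a strict sequential PHYLIP file looks like. It is not ReadSeq's implementation and does not parse GCG files; the function name `to_phylip` and the example sequences are hypothetical, and the mapping of `.` and `~` gap characters to `-` is an assumption about how gaps appear in an exported GCG alignment.

```python
def to_phylip(alignment):
    """Return a strict sequential PHYLIP string.

    `alignment` is a dict mapping sequence names to aligned sequences
    of equal length (hypothetical input; not a GCG file parser).
    """
    if not alignment:
        raise ValueError("alignment is empty")

    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()

    # Strict PHYLIP reserves exactly 10 columns for each name, so
    # truncated names must remain unique.
    short_names = [name[:10] for name in alignment]
    if len(set(short_names)) != len(short_names):
        raise ValueError("names are not unique after truncation to 10 characters")

    # Header line: number of sequences, then number of alignment columns.
    lines = [f" {len(alignment)} {n_sites}"]
    for short, seq in zip(short_names, alignment.values()):
        # Assumption: '.' and '~' mark gaps in the source alignment;
        # PHYLIP programs expect '-'.
        cleaned = seq.replace(".", "-").replace("~", "-")
        lines.append(f"{short:<10}{cleaned}")
    return "\n".join(lines) + "\n"


example = {
    "EF1A_HUMAN": "MGKEK..THI",
    "EFTU_ECOLI_LONG": "~~SKEKFERT",
}
phylip_text = to_phylip(example)
print(phylip_text, end="")
```

The 10-column name field is the detail most likely to cause trouble in practice: long database identifiers are silently truncated by many converters, so the sketch checks that names stay unique after truncation.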

Basic Protocol 2: Searching Prosite: GCG's Motifs—A Quick and Dirty Method

  Necessary Resources
  • Hardware
    • Terminal or personal workstation with access to a Unix server running commercial GCG software
  • Software
    • MotifSearch and SeqLab (GCG Wisconsin Package; see Internet Resources)
    • X‐server graphics communications software (appendix 1D)
    X‐server emulation software needs to be installed separately on personal‐style Microsoft Windows or Macintosh machines, but genuine X‐Windowing comes standard with most Unix operating systems. Microsoft Windows machines are often set up with either XWin32 or eXceed to provide this function, while Macintoshes are often loaded with either MacX or eXodus software. The details of X are provided in appendix 1D. If the user is unsure of these procedures, ask for assistance from local computer support personnel.
  • Files
    • Protein or DNA sequences of interest in GCG format (e.g., from LookUp in the GCG package; see protocol 2; also see Internet Resources)

Basic Protocol 3: Searching MEME Within GCG to Identify Motifs

  Necessary Resources
  • Hardware
    • Terminal or personal workstation with access to a Unix server running commercial GCG software
  • Software
    • SeqLab (GCG Wisconsin Package; see Internet Resources)
    • X‐server graphics communications software (appendix 1D)
    X‐server emulation software needs to be installed separately on personal‐style Microsoft Windows or Macintosh machines, but genuine X‐Windowing comes standard with most Unix operating systems. Microsoft Windows machines are often set up with either XWin32 or eXceed to provide this function, while Macintoshes are often loaded with either MacX or eXodus software. The details of X are provided in appendix 1D. If the user is unsure of these procedures, ask for assistance from local computer support personnel.
  • Files
    • Multiple sequence alignment loaded into SeqLab (see protocol 1)

Literature Cited
   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucl. Acids Res. 25:3389‐3402.
   Bailey, T.L. and Elkan, C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (R. Altman, D. Brutlag, P. Karp, R. Lathrop, and D. Searls, eds.), pp.28‐36. AAAI Press, Menlo Park, Calif.
   Bailey, T.L. and Gribskov, M. 1998. Combining evidence using p‐values: Application to sequence homology searches. Bioinformatics 14:48‐54.
   Bairoch, A. 1992. PROSITE: A dictionary of sites and patterns in proteins. Nucl. Acids Res. 20:2013‐2018.
   Dobzhansky, T., Ayala, F.J., Stebbins, G.L., and Valentine, J.W. 1977. Evolution. W.H. Freeman and Co. San Francisco, Calif. [The source of the original 1973 quote is obscure though it has been cited as being transcribed from the American Biology Teacher, March 1973, 35:125‐129].
   Doolittle, R.F. 1986. Of URFs and ORFs: A primer on how to analyze derived amino acid sequences. University Science Books, Mill Valley, Calif.
   Eddy, S.R. 1996. Hidden Markov models. Curr. Opin. Struct. Biol. 6:361‐365.
   Eddy, S.R. 1998. Profile hidden Markov models. Bioinformatics 14:755‐763.
   Etzold, T. and Argos, P. 1993. SRS — An indexing and retrieval tool for flat file data libraries. Comp. App. Biosci. 9:49‐57.
   Feng, D.F. and Doolittle, R.F. 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25:351‐360.
   Gribskov, M., McLachlan, A.D., and Eisenberg, D. 1987. Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci. U.S.A. 84:4355‐4358.
   Gribskov, M., Luethy, R., and Eisenberg, D. 1989. Profile analysis. Methods Enzymol. 183:146‐159.
   Hasegawa, M., Hashimoto, T., Adachi, J., Iwabe, N., and Miyata, T. 1993. Early branchings in the evolution of Eukaryotes: Ancient divergence of Entamoeba that lacks mitochondria revealed by protein sequence data. J. Mol. Evol. 36:380‐388.
   Henikoff, S. and Henikoff, J.G. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A. 89:10915‐10919.
   Iwabe, N., Kuma, K.‐I., Hasegawa, M., Osawa, S., and Miyata, T. 1989. Evolutionary relationship of Archaebacteria, Eubacteria, and Eukaryotes inferred from phylogenetic trees of duplicated genes. Proc. Natl. Acad. Sci. U.S.A. 86:9355‐9359.
   Kjeldgaard, M., Nissen, P., Thirup, S., and Nyborg, J. 1993. The crystal structure of elongation factor EF‐Tu from Thermus aquaticus in the GTP conformation. Structure 1:35‐50.
   Madsen, H.O., Poulsen, K., Dahl, O., Clark, B.F., and Hjorth, J.P. 1990. Retropseudogenes constitute the major part of the human elongation factor 1 alpha gene family. Nucl. Acids Res. 18:1513‐1516.
   Pearson, W.R. 1998. Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276:71‐84.
   Pearson, W.R. and Lipman, D.J. 1988. Improved tools for biological sequence analysis. Proc. Natl. Acad. Sci. U.S.A. 85:2444‐2448.
   Pearson, P., Francomano, C., Foster, P., Bocchini, C., Li, P., and McKusick, V. 1994. The status of online Mendelian inheritance in man (OMIM). Nucl. Acids Res. 22:3470‐3473.
   Rivera, M.C. and Lake, J.A. 1992. Evidence that Eukaryotes and Eocyte Prokaryotes are immediate relatives. Science 257:74‐76.
   Saraste, M., Sibbald, P.R., and Wittinghofer, A. 1990. The P‐loop—a common motif in ATP‐ and GTP‐binding proteins. Trends Biochem. Sci. 15:430‐434.
   Sayle, R. and Milner‐White, E.J. 1995. RasMol: Biomolecular graphics for all. Trends Biochem. Sci. 20:374‐376.
   Schwartz, R.M. and Dayhoff, M.O. 1979. Matrices for detecting distant relationships. In Atlas of Protein Sequences and Structure, Vol.5 (M.O. Dayhoff, ed.) pp.353‐358. National Biomedical Research Foundation, Washington, D.C.
   Smith, S.W., Overbeek, R., Woese, C.R., Gilbert, W., and Gillevet, P.M. 1994. The genetic data environment, an expandable GUI for multiple sequence analysis. Comp. App. Biosci. 10:671‐675.
   Sogin, M.L., Morrison, H.G., Hinkle, G., and Silberman, J.D. 1996. Ancestral relationships of the major eukaryotic lineages. Microbiologia Sem. 12:17‐28.
   Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. ClustalW: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position‐specific gap penalties and weight matrix choice. Nucl. Acids Res. 22:4673‐4680.
   Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and Higgins, D.G. 1997. The ClustalX windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucl. Acids Res. 25:4876‐4882.
   von Heijne, G. 1987. Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit. Academic Press, San Diego.
Internet Resources
   http://evolution.genetics.washington.edu/phylip.html
  The Phylogeny Inference Package (PHYLIP), version 3.5+, is public domain software distributed by the author, J. Felsenstein. It is available online from the Department of Genome Sciences, University of Washington, Seattle.
   http://www.uni‐giessen.de/∼gx1052/ECDC/ecdc.htm
  The E. coli Database Collection (ECDC). The K12 chromosome. Available online from Justus‐Liebig‐Universitaet, Giessen, Germany.
   http://www.accelrys.com/products/gcg_wisconsin_package/index.html
  The Wisconsin Package, version 10.3, is available from the Genetics Computer Group (GCG), a part of Accelrys, which is in turn a subsidiary of Pharmacopeia, and is copyright protected (1982–2002). The home page includes a copy of the program manual.
  The Wisconsin Package provides a comprehensive toolkit of almost 150 integrated DNA and protein analysis programs, from database, pattern and motif searching, fragment assembly, mapping, and sequence comparison, to gene finding, protein and evolutionary analysis, primer selection, and DNA and RNA secondary structure prediction. The powerful SeqLab X‐windows based graphical user interface (GUI) is a front end to the package. It provides an intuitive alternative to the Unix command line by allowing menu‐driven access to most of GCG's programs. SeqLab is based on Steve Smith and collaborators' (Smith et al., 1994) genetic data environment (GDE) and makes running the Wisconsin Package much easier by providing a common editing interface from which most programs can be launched and alignments manipulated.
   http://iubio.bio.indiana.edu/soft/molbio/readseq
  ReadSeq is public domain software distributed by the author, D.G. Gilbert, and is available from the Bioinformatics Group at the Biology Department of Indiana University, Bloomington.
   http://www.ncbi.nlm.nih.gov/Entrez
  Entrez is public domain software distributed by the authors and available from the National Center for Biotechnology Information (NCBI) at the National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland.
   http://www.ncbi.nlm.nih.gov/omim
  Online Mendelian Inheritance in Man (OMIM). Available from the Center for Medical Genetics, Johns Hopkins University, Baltimore, Maryland, and the NCBI at the National Library of Medicine, NIH, Bethesda, Maryland. Also see Pearson et al. (1994).
   http://www.sinauer.com
  Phylogenetic Analysis Using Parsimony (PAUP*) was developed by D.L. Swofford (copyright, 1989–2002). The official homepage is located at Florida State University (see below). A 4.0 beta version is available at the time of this writing, and is distributed by Sinauer Associates.
   http://paup.csit.fsu.edu
  The official PAUP* homepage at Florida State University.