Computer Manipulation of DNA and Protein Sequences

J. Michael Cherry

Stanford University, Palo Alto, California
Publication Name: Current Protocols in Molecular Biology
Unit Number: Unit 7.7
DOI: 10.1002/0471142727.mb0707s30
Online Posting Date: May 2001
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


This unit outlines a variety of methods by which DNA sequences can be manipulated by computer. Procedures for entering sequence data into the computer and assembling raw sequence data into a contiguous sequence are described first, followed by methods for analyzing and manipulating sequences, e.g., verifying sequences, constructing restriction maps, designing oligonucleotides, identifying protein‐coding regions, and predicting secondary structures. This unit also provides information on the large amount of software available for sequence analysis. The appendix to this unit lists some of the commercial software, shareware, and free software related to DNA sequence manipulation. The goal of this unit is to serve as a starting point for researchers interested in utilizing the tremendous sequencing resources available to the computer‐knowledgeable molecular biology laboratory.
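
As a rough illustration of the kinds of manipulation named above, the following Python sketch locates restriction sites and picks out the longest open reading frame on the forward strand. It is not taken from the unit: the function names and the example sequence are hypothetical, the recognition site shown is EcoRI (GAATTC), and only unambiguous A/C/G/T input and the standard stop codons are assumed.

```python
# Illustrative sketch only; function names and the example sequence are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq, site):
    """Return 0-based start positions of every match of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # Step by one base so overlapping matches are also reported.
        start = seq.find(site, start + 1)
    return positions


def longest_orf(seq):
    """Return the longest ATG...stop reading frame on the forward strand."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > len(best):
                    best = seq[start:i + 3]
                start = None
    return best


example = "CCATGGAATTCAAATAGGAATTCGG"
print(find_sites(example, "GAATTC"))
print(longest_orf(example))
```

To search the opposite strand, the same functions can be applied to `reverse_complement(example)`. Real restriction mapping and coding-region prediction involve considerably more (ambiguity codes, cut-site offsets, codon statistics) than this sketch covers.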


Table of Contents

  • Sequence Data Entry
  • Sequence Data Verification
  • Restriction Mapping
  • Prediction of Nucleic Acid Structure
  • Oligonucleotide Design Strategy
  • Identification of Protein‐Coding Regions
  • Homology Searching
  • Genetic Sequence Databases and Other Electronic Resources Available to Molecular Biologists
  • Literature Cited
  • Figures
  • Tables


