Predicting the Secondary Structure Common to Two RNA Sequences with Dynalign

David Mathews1

1 Center for Human Genetics and Molecular Pediatric Disease, Aab Institute of Biomedical Sciences, University of Rochester Medical Center, Rochester, New York
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 12.4
DOI:  10.1002/0471250953.bi1204s08
Online Posting Date:  December, 2004
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

Dynalign is a dynamic programming algorithm for the simultaneous prediction of the lowest‐free‐energy secondary structure common to two RNA sequences and the alignment of the two sequences. It has been shown that the average accuracy of secondary structure prediction is improved using Dynalign, as compared to free‐energy minimization of a single sequence. This unit provides protocols for using Dynalign on a Microsoft Windows platform as part of the RNAstructure package, and for compiling and using Dynalign on Unix/Linux computers.

     
 

Table of Contents

  • Basic Protocol 1: Using Dynalign on a Windows Platform with RNAstructure
  • Alternate Protocol 1: Using Dynalign for Unix/Linux
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
     
 



Figures

  •   Figure 12.4.1 A sample sequence (.seq) file. Any number of comment lines, labeled with a semicolon, can precede the sequence. The required title is on the next line and must be contained on a single line. The sequence follows, listed from 5′ to 3′. White space is ignored, but the sequence must terminate with 1. Note that RNAstructure and Dynalign do not allow lowercase nucleotides to base‐pair; therefore most nucleotides must be entered in uppercase.
  •   Figure 12.4.2 The RNAstructure sequence editor.
  •   Figure 12.4.3 The Dynalign window in RNAstructure. Two tRNA sequences that are included with RNAstructure are selected and the default parameters are shown. The calculation is started by clicking the START button.
  •   Figure 12.4.4 Sample output from the Dynalign algorithm. This is the secondary structure predicted for the tRNA sequence RA7680.
  •   Figure 12.4.5 Dynalign input. User‐provided responses are shown in bold for clarity. This input will predict the common secondary structure and alignment for the two tRNA sequences, RA7680 and RD0260.
  •   Figure 12.4.6 Sample alignment (.ali) file. This is the alignment predicted for two tRNA sequences, RA7680 and RD0260, with RA7680 on top. The score reported is the ΔG_total (Equation ). Inserts are represented by a dash (–), and the caret symbols show locations of base pairs.
  •   Figure 12.4.7 A sample CT file. The structure predicted for RA7680 is shown (for graphic equivalent, see Fig. ), with nucleotides 20 through 60 omitted for brevity. The first line gives a count of nucleotides, followed by the sequence title. Each nucleotide is then represented by a single line. From left to right are the nucleotide position, nucleotide type, 5′ connected nucleotide, 3′ connected nucleotide, canonical pair, and historical numbering. A break in the 5′ connected nucleotide and 3′ connected nucleotide sequence can be used to indicate separate strands, although Dynalign does not use this functionality. A 0 in the canonical pair column indicates that the nucleotide is unpaired. Note the symmetry in the canonical pair column: for example, nucleotide 2 is paired to 71 and 71 is paired to 2. Historical numbering can be used to annotate a structure with a natural numbering scheme; for example, if the sequence was extracted from a longer sequence, the natural numbering does not need to start with 1. Dynalign will always output historical numbering identical to nucleotide position.
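
The .seq and CT layouts described in the captions for Figures 12.4.1 and 12.4.7 can be read with a few lines of code. The following Python sketch is an illustration only and is not part of RNAstructure or Dynalign. The function names (parse_seq, parse_ct, pairs_are_symmetric) and the CtRow type are hypothetical, and the sketch assumes a complete, whitespace-delimited CT file with no omitted lines and a .seq file whose title line immediately follows the comment lines.

```python
from typing import List, NamedTuple, Tuple


def parse_seq(text: str) -> Tuple[str, str]:
    """Return (title, sequence) from the text of a .seq file.

    Assumed layout (from the Figure 12.4.1 caption): semicolon comment
    lines, then a one-line title, then the sequence terminated by "1".
    """
    lines = text.splitlines()
    i = 0
    # Skip the leading comment lines, which are labeled with a semicolon.
    while i < len(lines) and lines[i].lstrip().startswith(";"):
        i += 1
    if i >= len(lines):
        raise ValueError("missing title line")
    title = lines[i].strip()
    # White space is ignored, so remove it from everything after the title.
    body = "".join("".join(lines[i + 1:]).split())
    if "1" not in body:
        raise ValueError("sequence must terminate with 1")
    # Case is kept as-is: lowercase nucleotides are not allowed to pair.
    return title, body[: body.index("1")]


class CtRow(NamedTuple):
    """One nucleotide line of a CT file (hypothetical helper type)."""
    position: int
    nucleotide: str
    five_prime: int   # 5' connected nucleotide
    three_prime: int  # 3' connected nucleotide
    pair: int         # canonical pair; 0 means unpaired
    historical: int   # historical numbering


def parse_ct(text: str) -> Tuple[str, List[CtRow]]:
    """Return (title, rows) from the text of a complete CT file."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line: nucleotide count followed by the sequence title.
    header = lines[0].split(None, 1)
    count = int(header[0])
    title = header[1].strip() if len(header) > 1 else ""
    rows = []
    for ln in lines[1 : count + 1]:
        f = ln.split()
        rows.append(CtRow(int(f[0]), f[1], int(f[2]),
                          int(f[3]), int(f[4]), int(f[5])))
    if len(rows) != count:
        raise ValueError("nucleotide count does not match number of lines")
    return title, rows


def pairs_are_symmetric(rows: List[CtRow]) -> bool:
    """Check that if i is paired to j, then j is paired to i."""
    by_pos = {r.position: r for r in rows}
    return all(
        r.pair == 0
        or (r.pair in by_pos and by_pos[r.pair].pair == r.position)
        for r in rows
    )


if __name__ == "__main__":
    # Toy inputs invented for illustration; not real Dynalign output.
    title, seq = parse_seq("; a comment\nToy title\nGGGa AAcc\nCC1\n")
    print(title, seq)
    ct_title, rows = parse_ct(
        "4 toy\n1 G 0 2 4 1\n2 A 1 3 0 2\n3 A 2 4 0 3\n4 C 3 0 1 4\n"
    )
    print(ct_title, len(rows), pairs_are_symmetric(rows))
```

The symmetry check mirrors the note in the CT caption: a file in which nucleotide 2 lists 71 as its pair but 71 does not list 2 would fail it, which is a quick way to catch a hand-edited or truncated file.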

Literature Cited

   Bruccoleri, R.E. and Heinrich, H. 1988. An improved algorithm for nucleic acid secondary structure display. Comput. Appl. Biosci. 4:167‐173.
   Burgstaller, P. and Famulok, M. 1997. Flavin‐dependent photocleavage of RNA at G·U base pairs. J. Am. Chem. Soc. 119:1137‐1138.
   Chen, J., Le, S., and Maizel, J.V. 2000. Prediction of common secondary structures of RNAs: A genetic algorithm approach. Nucleic Acids Res. 28:991‐999.
   Corpet, F. and Michot, B. 1994. RNAlign program: Alignment of RNA sequences using both primary and secondary structures. Comput. Appl. Biosci. 10:389‐399.
   Eddy, S.R. and Durbin, R. 1994. RNA sequence analysis using covariance models. Nucleic Acids Res. 22:2079‐2088.
   Ehresmann, C., Baudin, F., Mougel, M., Romby, P., Ebel, J., and Ehresmann, B. 1987. Probing the structure of RNAs in solution. Nucleic Acids Res. 15:9109‐9128.
   Gorodkin, J., Heyer, L.J., and Stormo, G.D. 1997. Finding the most significant common sequence and structure in a set of RNA sequences. Nucleic Acids Res. 25:3724‐3732.
   Hofacker, I.L., Fekete, M., and Stadler, P.F. 2002. Secondary structure prediction for aligned RNA sequences. J. Mol. Biol. 319:1059‐1066.
   Holmes, I. and Rubin, G.M. 2002. Pairwise RNA structure comparison using stochastic context‐free grammars. In Proceedings of the 7th Pacific Symposium on Biocomputing (PSB 2002), Lihue, Hawaii, January 3‐7, 2002 pp. 163‐174. World Scientific Press, Singapore.
   Juan, V. and Wilson, C. 1999. RNA secondary structure prediction based on free energy and phylogenetic analysis. J. Mol. Biol. 289:935‐947.
   Knapp, G. 1989. Enzymatic approaches to probing RNA secondary and tertiary structure. Methods Enzymol. 180:192‐212.
   Knudsen, B. and Hein, J.J. 1999. Using stochastic context free grammars and molecular evolution to predict RNA secondary structure. Bioinformatics 15:446‐454.
   Lück, R., Gräf, S., and Steger, G. 1999. ConStruct: A tool for thermodynamic controlled prediction of conserved secondary structure. Nucleic Acids Res. 27:4208‐4217.
   Mathews, D.H. and Turner, D.H. 2002a. Dynalign: An algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol. 317:191‐203.
   Mathews, D.H. and Turner, D.H. 2002b. Use of chemical modification to elucidate RNA folding pathways. In Current Protocols in Nucleic Acid Chemistry (S.L. Beaucage, D.E. Bergstrom, G.D. Glick, and R.A. Jones, eds.) pp. 11.9.1‐11.9.4. John Wiley & Sons, Hoboken, N.J.
   Mathews, D.H., Sabina, J., Zuker, M., and Turner, D.H. 1999. Expanded sequence dependence of thermodynamic parameters provides improved prediction of RNA secondary structure. J. Mol. Biol. 288:911‐940.
   Mathews, D.H., Turner, D.H., and Zuker, M. 2000. RNA secondary structure prediction. In Current Protocols in Nucleic Acid Chemistry (S.L. Beaucage, D.E. Bergstrom, G.D. Glick, and R.A. Jones, eds.) pp. 11.2.1‐11.2.10. John Wiley & Sons, Hoboken, N.J.
   Mathews, D.H., Disney, M.D., Childs, J.L., Schroeder, S.J., Zuker, M., and Turner, D.H. 2004. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl. Acad. Sci. U.S.A. 101:7287‐7292.
   Pace, N.R., Thomas, B.C., and Woese, C.R. 1999. Probing RNA structure, function, and history by comparative analysis. In The RNA World, 2nd ed. (R.F. Gesteland, T.R. Cech, and J.F. Atkins, eds.) pp. 113‐141. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
   Sankoff, D. 1985. Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J. Appl. Math. 45:810‐825.
   Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A., and Steinberg, S. 1998. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 26:148‐153.
   Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position‐specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673‐4680.
   Xia, T., SantaLucia, J. Jr., Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao, X., Cox, C., and Turner, D.H. 1998. Parameters for an expanded nearest‐neighbor model for formation of RNA duplexes with Watson‐Crick pairs. Biochemistry 37:14719‐14735.
Key References
   Mathews and Turner, 2002a. See above.
  Describes the Dynalign algorithm and benchmarks the accuracy of secondary structure prediction using Dynalign.
   Sankoff, 1985. See above.
  The paper that first proposed using dynamic programming to find a structure common to multiple sequences.
Internet Resources
   http://rna.chem.rochester.edu/RNAstructure.html
  The Dynalign algorithm for Microsoft Windows, as part of RNAstructure, is available for download at this URL.
   http://rna.chem.rochester.edu/dynalign.html
  The Dynalign algorithm for Unix/Linux is available for download.