Using the RNAstructure Software Package to Predict Conserved RNA Structures

David H. Mathews¹

¹ Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 12.4
DOI: 10.1002/0471250953.bi1204s46
Online Posting Date: June, 2014
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

The structures of many non‐coding RNAs (ncRNAs) are conserved by evolution to a greater extent than their sequences. Predicting the structure conserved in two or more homologous sequences can therefore improve the accuracy of secondary structure prediction compared with prediction for a single sequence. This unit provides protocols for using four programs in the RNAstructure suite that predict conserved structures: Multilign, TurboFold, Dynalign, and PARTS. These programs can be run via Web servers, on the command line, or with graphical interfaces. Curr. Protoc. Bioinform. 46:12.4.1‐12.4.22. © 2014 by John Wiley & Sons, Inc.

Keywords: RNA secondary structure prediction; RNA folding thermodynamics; RNA comparison

Table of Contents

  • Introduction
  • Basic Protocol 1: Predicting a Structure Conserved in Three or More Sequences with the RNAstructure Web Server
  • Basic Protocol 2: Predicting a Structure Conserved in Two Sequences with the RNAstructure Web Server
  • Alternate Protocol 1: Predicting a Structure Conserved in Three or More Sequences with TurboFold in the RNAstructure Graphical Interface
  • Alternate Protocol 2: Predicting a Structure Conserved in Two Sequences with Dynalign in the RNAstructure Graphical Interface
  • Commentary
  • Literature Cited
  • Figures

Figures

  •   Figure 12.4.1 The RNAstructure multiple sequence FASTA format. Sequences are uploaded to the RNAstructure “Predict a Secondary Structure Common to Three or More Sequences” Web server in multiple sequence FASTA format. For each sequence, the first line is a title line that must start with >. Subsequent lines for each sequence should contain only sequence and whitespace; whitespace is ignored. Lowercase nucleotides are forced to be single‐stranded in structure prediction. X can also be used to indicate a nucleotide that neither pairs nor stacks. Note that T is treated as U. Each subsequent sequence also begins with a title line starting with >, which marks the start of a new sequence.
  •   Figure 12.4.2 A screen shot of the Web form for the RNAstructure “Predict a Secondary Structure Common to Three or More Sequences” Web server. Clicking “Click here to add example sequences to the box” will paste the example sequences used here.
  •   Figure 12.4.3 A screen shot of the results page for the RNAstructure “Predict a Secondary Structure Common to Three or More Sequences” Web server. The top link, “Click here to download the alignment file,” provides the plain text alignment file produced by Multilign. The drawing at the top of the page, shown here, is the Multilign structure prediction for the first sequence. Below it are the structure predictions for the remaining sequences, followed by the structure predictions by TurboFold.
  •   Figure 12.4.4 The RNAstructure ct file format. A ct (connectivity table) file contains secondary structure information for a sequence. The format used by RNAstructure is as follows. The first line begins with the number of nucleotides in the sequence and ends with the title of the structure. Each following line describes one nucleotide, with these elements in order: nucleotide number (starting with 1), base (A, C, G, T, U, X), the nucleotide connected in the 5′ direction, the nucleotide connected in the 3′ direction, the number of the nucleotide to which the current nucleotide is paired (0, zero, indicates no pairing), and the natural numbering (for the calculations described in this unit, this repeats the nucleotide index). A ct file may hold multiple structures for a single sequence; these are stored by repeating the format for each structure, with no blank lines between structures. The example shown here is the structure predicted for RA7680 by the “Predict a Secondary Structure Common to Three or More Sequences” Web server, as illustrated in . “…” indicates parts of the ct file not displayed in the figure.
  •   Figure 12.4.5 The FASTA file format. Sequences can be uploaded to the RNAstructure Web server in FASTA format. In FASTA, the first line is a title line that must start with >. Subsequent lines should contain only sequence and whitespace; whitespace is ignored. Lowercase nucleotides are forced to be single‐stranded in structure prediction. X can also be used to indicate a nucleotide that neither pairs nor stacks.
  •   Figure 12.4.6 A screen shot of the Web form for the RNAstructure “Predict a Secondary Structure Common to Two Sequences” Web server. Clicking “Click here to add an example sequence to both sequence boxes” will paste the example sequences used here.
  •   Figure 12.4.7 The RNAstructure constraint file format. Folding constraint files are plain text files that can be edited manually. When a constraint type has multiple entries, each entry is listed on a separate line. Note that RNAstructure expects every specifier to be present, each followed by its entries and terminated by −1 (or −1 −1 for specifiers that take two arguments). For all specifiers that take two arguments, the first argument is assumed to be the 5′ nucleotide. Panel (A) shows the specification of the fields. The constraints are XA, nucleotides that will be double‐stranded; XB, nucleotides that will be single‐stranded (unpaired); XC, nucleotides accessible to chemical modification; XD1 and XD2, a base pair forced between XD1 and XD2; XE, nucleotides accessible to FMN cleavage (U in a GU pair); and XF1 and XF2, a base pair prohibited between XF1 and XF2. Nucleotides are numbered 5′ to 3′, with the nucleotide at the 5′ end having an index of 1. Panel (B) shows an example.
  •   Figure 12.4.8 A screen shot of the results page for the RNAstructure “Predict a Secondary Structure Common to Two Sequences” Web server. “Click here to download the alignment file” links to the plain text alignment file produced by Dynalign. The first structure drawing, shown here, is the structure predicted by Dynalign for the first sequence. The predicted structures for the second sequence are shown below it. Farther down the page are the structures predicted by PARTS.
  •   Figure 12.4.9 A screen shot of the RNAstructure sequence editor, as it appears on Microsoft Windows 7 for RNAstructure 5.6. The Java versions for Linux and Macintosh have the same items; on OS X, the menu appears on the Macintosh menu bar. The tRNA sequence, RA7680, has been opened from disk. Note that lowercase nucleotides are not allowed to pair in structure predictions. For tRNA sequences, this is a convenient way to specify modified nucleotides that cannot base pair in a helix (Mathews et al., ). It is important that most nucleotides be uppercase.
  •   Figure 12.4.10 A screen shot of the RNAstructure TurboFold input form, as it appears on Microsoft Windows 7. The Java versions for Linux and Macintosh have the same items; on OS X, the menu appears on the Macintosh menu bar. The two example sequences used here, RA7680.seq and RD0260.seq, have been selected and now appear in the list to the right.
  •   Figure 12.4.11 A screen shot of the RNAstructure structure drawing window showing the output of the TurboFold calculation, as it appears on Microsoft Windows 7. The predicted structure is for RD0260. The protocol steps for adding color annotation and displaying the color annotation key have been followed. The probability dot plot windows have been closed.
  •   Figure 12.4.12 A screen shot of the RNAstructure Dynalign input form, as it appears on Microsoft Windows 7. The example sequences, RA7680.seq and RD0260.seq, have already been selected for input.
  •   Figure 12.4.13 A screen shot of Dynalign output, as it appears on Microsoft Windows 7. The structure predicted for RA7680 is shown above the structure predicted for RD0260.
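
The FASTA rules in Figures 12.4.1 and 12.4.5 (a > title line, whitespace ignored, lowercase forced single‐stranded, T treated as U) can be illustrated with a small Python parser. This is a minimal sketch, not RNAstructure's own reader: the names `parse_multi_fasta` and `FastaRecord` are hypothetical, and the set of accepted letters (A, C, G, U, T, X in either case) is an assumption taken from the base list in the ct file caption.

```python
from dataclasses import dataclass, field

# Letters assumed acceptable in sequence lines (either case). This set is an
# assumption based on the ct file base list, not a documented RNAstructure rule.
ALLOWED = set("ACGUTX")


@dataclass
class FastaRecord:
    """One sequence from an RNAstructure-style (multiple sequence) FASTA file."""
    title: str
    sequence: str                                       # uppercase, T converted to U
    forced_single: list = field(default_factory=list)  # 1-based lowercase positions


def _make_record(title, chunks):
    raw = "".join(chunks)
    bad = {c for c in raw if c.upper() not in ALLOWED}
    if bad:
        raise ValueError(f"unexpected characters in {title!r}: {sorted(bad)}")
    # Lowercase nucleotides are forced single-stranded; remember their positions.
    forced = [i + 1 for i, c in enumerate(raw) if c.islower()]
    # T is treated as U.
    return FastaRecord(title, raw.upper().replace("T", "U"), forced)


def parse_multi_fasta(text):
    """Split FASTA text into records; each record starts with a '>' title line."""
    records, title, chunks = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if title is not None:
                records.append(_make_record(title, chunks))
            title, chunks = line[1:].strip(), []
        elif title is not None:
            # Whitespace inside sequence lines is ignored.
            chunks.append("".join(line.split()))
        elif line.strip():
            raise ValueError("sequence data found before the first '>' title line")
    if title is not None:
        records.append(_make_record(title, chunks))
    return records
```

For example, `parse_multi_fasta(">s1\nGGAc u\nT\n")` returns one record whose sequence is `GGACUU` and whose forced single‐stranded positions are `[4, 5]`. A single‐sequence FASTA file is simply the one‐record case.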
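
The ct layout in Figure 12.4.4 (a header with the nucleotide count and title, then one six‐column line per nucleotide, with structures stored back to back) can be read with a short parser. This is a sketch under stated assumptions: `parse_ct`, `CtStructure`, and `base_pairs` are hypothetical names, columns are assumed to be whitespace separated, and blank lines are tolerated even though RNAstructure does not write them between structures.

```python
from dataclasses import dataclass


@dataclass
class CtStructure:
    title: str
    sequence: str
    partners: list  # partners[i - 1] is the partner of nucleotide i, or 0 if unpaired


def parse_ct(text):
    """Parse every structure in a ct file (structures are stored back to back)."""
    # Blank lines are skipped for robustness (an assumption of this sketch).
    lines = [ln for ln in text.splitlines() if ln.strip()]
    structures, pos = [], 0
    while pos < len(lines):
        # Header: nucleotide count first, structure title at the end of the line.
        head = lines[pos].split(maxsplit=1)
        n = int(head[0])
        title = head[1].strip() if len(head) > 1 else ""
        body = lines[pos + 1:pos + 1 + n]
        if len(body) != n:
            raise ValueError(f"{title!r} declares {n} nucleotides but has {len(body)} lines")
        bases, partners = [], []
        for expected, line in enumerate(body, start=1):
            # Columns: index, base, 5' neighbor, 3' neighbor, partner, natural numbering.
            cols = line.split()
            if len(cols) < 6 or int(cols[0]) != expected:
                raise ValueError(f"malformed line for nucleotide {expected}: {line!r}")
            bases.append(cols[1])
            partners.append(int(cols[4]))
        # Every pair must be listed from both sides: i pairs j and j pairs i.
        for i, j in enumerate(partners, start=1):
            if j != 0 and not (1 <= j <= n and partners[j - 1] == i):
                raise ValueError(f"inconsistent pair {i}-{j} in {title!r}")
        structures.append(CtStructure(title, "".join(bases), partners))
        pos += 1 + n
    return structures


def base_pairs(structure):
    """Return (i, j) pairs with i < j, using 1-based nucleotide numbers."""
    return [(i, j) for i, j in enumerate(structure.partners, start=1) if j > i]
```

Checking that each pair appears from both sides catches hand‐edited ct files in which only one partner was updated.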
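
The constraint file rules in Figure 12.4.7 (every specifier present, one entry per line, lists closed by −1 or −1 −1, and the 5′ nucleotide given first in two‐argument entries) can be sketched as a small writer. This is a hedged illustration: the function name `write_constraint_file` is hypothetical, and the section labels `DS:`, `SS:`, `Mod:`, `Pairs:`, `FMN:`, and `Forbids:` reflect our understanding of the RNAstructure documentation rather than anything stated in the caption, so check them against panel (A) before use.

```python
def write_constraint_file(double_stranded=(), single_stranded=(), modified=(),
                          forced_pairs=(), fmn_cleaved=(), forbidden_pairs=()):
    """Build constraint file text with every section present.

    The section labels used below are ASSUMED (not given in the caption).
    Nucleotide indices are 1-based and numbered 5' to 3'.
    """
    lines = []

    def check(i):
        if i < 1:
            raise ValueError(f"nucleotide indices start at 1, got {i}")

    def single_section(label, indices):
        lines.append(label)
        for i in sorted(indices):
            check(i)
            lines.append(str(i))
        lines.append("-1")  # terminator for one-argument specifiers

    def pair_section(label, pairs):
        lines.append(label)
        for a, b in pairs:
            check(a)
            check(b)
            if a == b:
                raise ValueError(f"a nucleotide cannot pair with itself: {a}")
            # The first argument must be the 5' nucleotide.
            lines.append(f"{min(a, b)} {max(a, b)}")
        lines.append("-1 -1")  # terminator for two-argument specifiers

    single_section("DS:", double_stranded)     # XA: double-stranded
    single_section("SS:", single_stranded)     # XB: single-stranded
    single_section("Mod:", modified)           # XC: chemically modified
    pair_section("Pairs:", forced_pairs)       # XD1 XD2: forced pairs
    single_section("FMN:", fmn_cleaved)        # XE: FMN cleavage (U in GU pair)
    pair_section("Forbids:", forbidden_pairs)  # XF1 XF2: prohibited pairs
    return "\n".join(lines) + "\n"
```

For example, `write_constraint_file(single_stranded=[5, 3], forced_pairs=[(20, 1)])` lists 3 and 5 in the single‐stranded section and writes the forced pair as `1 20`, with the 5′ nucleotide first; every other section contains only its terminator.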

Literature Cited

  Bellaousov, S., Reuter, J.S., Seetin, M.G., and Mathews, D.H. 2013. RNAstructure: Web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res. 41:W471‐W474.
  Bernhart, S.H. and Hofacker, I.L. 2009. From consensus structure prediction to RNA gene finding. Brief. Funct. Genomic Proteomic 8:461‐471.
  Burgstaller, P. and Famulok, M. 1997. Flavin‐dependent photocleavage of RNA at G·U base pairs. J. Am. Chem. Soc. 119:1137‐1138.
  Cordero, P., Kladwang, W., VanLang, C.C., and Das, R. 2012. Quantitative dimethyl sulfate mapping for automated RNA secondary structure inference. Biochemistry 51:7037‐7039.
  Darty, K., Denise, A., and Ponty, Y. 2009. VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics 25:1974‐1975.
  Deigan, K.E., Li, T.W., Mathews, D.H., and Weeks, K.M. 2009. Accurate SHAPE‐directed RNA structure determination. Proc. Natl. Acad. Sci. U.S.A. 106:97‐102.
  Ehresmann, C., Baudin, F., Mougel, M., Romby, P., Ebel, J., and Ehresmann, B. 1987. Probing the structure of RNAs in solution. Nucleic Acids Res. 15:9109‐9128.
  Gutell, R.R., Lee, J.C., and Cannone, J.J. 2002. The accuracy of ribosomal RNA comparative structure models. Curr. Opin. Struct. Biol. 12:301‐310.
  Hajdin, C.E., Bellaousov, S., Huggins, W., Leonard, C.W., Mathews, D.H., and Weeks, K.M. 2013. Accurate SHAPE‐directed RNA secondary structure modeling, including pseudoknots. Proc. Natl. Acad. Sci. U.S.A. 110:5498‐5503.
  Harmanci, A.O., Sharma, G., and Mathews, D.H. 2007. Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign. BMC Bioinformatics 8:130.
  Harmanci, A.O., Sharma, G., and Mathews, D.H. 2008. PARTS: Probabilistic Alignment for RNA joinT Secondary structure prediction. Nucleic Acids Res. 36:2406‐2417.
  Harmanci, A.O., Sharma, G., and Mathews, D.H. 2009. Stochastic sampling of the RNA structural alignment space. Nucleic Acids Res. 37:4063‐4075.
  Harmanci, A.O., Sharma, G., and Mathews, D.H. 2011. TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences. BMC Bioinformatics 12:108.
  Knapp, G. 1989. Enzymatic approaches to probing RNA secondary and tertiary structure. Methods Enzymol. 180:192‐212.
  Liu, B., Mathews, D.H., and Turner, D.H. 2010. RNA pseudoknots: Folding and finding. F1000 Biol. Rep. 2:8.
  Lu, Z.J., Turner, D.H., and Mathews, D.H. 2006. A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation. Nucleic Acids Res. 34:4912‐4924.
  Mathews, D.H. 2004. Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA 10:1178‐1190.
  Mathews, D.H. 2005. Predicting a set of minimal free energy RNA secondary structures common to two sequences. Bioinformatics 21:2246‐2253.
  Mathews, D.H. and Turner, D.H. 2002. Dynalign: An algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol. 317:191‐203.
  Mathews, D.H., Sabina, J., Zuker, M., and Turner, D.H. 1999. Expanded sequence dependence of thermodynamic parameters provides improved prediction of RNA secondary structure. J. Mol. Biol. 288:911‐940.
  Mathews, D.H., Disney, M.D., Childs, J.L., Schroeder, S.J., Zuker, M., and Turner, D.H. 2004. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl. Acad. Sci. U.S.A. 101:7287‐7292.
  Mathews, D.H., Schroeder, S.J., Turner, D.H., and Zuker, M. 2006. Predicting RNA secondary structure. In The RNA World, Third Edition (R.F. Gesteland, T.R. Cech, and J.F. Atkins, eds.) pp. 631‐657. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
  Merino, E.J., Wilkinson, K.A., Coughlan, J.L., and Weeks, K.M. 2005. RNA structure analysis at single nucleotide resolution by selective 2′‐hydroxyl acylation and primer extension (SHAPE). J. Am. Chem. Soc. 127:4223‐4231.
  Pace, N.R., Thomas, B.C., and Woese, C.R. 1999. Probing RNA structure, function, and history by comparative analysis. In The RNA World, 2nd Ed. (R.F. Gesteland, T.R. Cech, and J.F. Atkins, eds.) pp. 113‐141. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
  Reeder, J., Höchsmann, M., Rehmsmeier, M., Voss, B., and Giegerich, R. 2006. Beyond Mfold: Recent advances in RNA bioinformatics. J. Biotechnol. 124:41‐55.
  Reuter, J.S. and Mathews, D.H. 2010. RNAstructure: Software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11:129.
  Seetin, M.G. and Mathews, D.H. 2012a. RNA structure prediction: An overview of methods. Methods Mol. Biol. 905:99‐122.
  Seetin, M.G. and Mathews, D.H. 2012b. TurboKnot: Rapid prediction of conserved RNA secondary structures including pseudoknots. Bioinformatics 28:792‐798.
  Sprinzl, M. and Vassilenko, K.S. 2005. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 33:D139‐D140.
  Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A., and Steinberg, S. 1998. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 26:148‐153.
  Uzilov, A.V., Keegan, J.M., and Mathews, D.H. 2006. Detection of non‐coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics 7:173.
  Xia, T., SantaLucia, J. Jr., Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao, X., Cox, C., and Turner, D.H. 1998. Thermodynamic parameters for an expanded nearest‐neighbor model for formation of RNA duplexes with Watson‐Crick pairs. Biochemistry 37:14719‐14735.
  Xu, Z. and Mathews, D.H. 2011. Multilign: An algorithm to predict secondary structures conserved in multiple RNA sequences. Bioinformatics 27:626‐632.