Studying RNA Homology and Conservation with Infernal: From Single Sequences to RNA Families

Lars Barquist¹, Sarah W. Burge¹, Paul P. Gardner²

¹ Wellcome Trust Sanger Institute, Hinxton, Cambridge; ² Biomolecular Interaction Centre, University of Canterbury, Christchurch
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 12.13
DOI:  10.1002/cpbi.4
Online Posting Date:  June, 2016

Abstract

Emerging high‐throughput technologies have led to a deluge of putative non‐coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequences allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNA presents a challenge for homology detection, as the primary sequence is often poorly conserved and de novo secondary structure prediction and search remain difficult. This unit introduces methods developed by the Rfam database for identifying “families” of homologous ncRNAs starting from single “seed” sequences, using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step‐by‐step iterative protocol for identifying ncRNA homologs and then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in the genus Xenorhabdus in the process. © 2016 by John Wiley & Sons, Inc.

Keywords: covariance model; homology; RNA; Rfam; alignment; ncRNA; conservation

     
 

Table of Contents

  • Introduction
  • Strategic Planning
  • Choosing the Right Protocols
  • Basic Protocol 1: Gathering an Initial Set of Homologous Sequences
  • Basic Protocol 2: Aligning and Predicting Secondary Structure
  • Basic Protocol 3: Guidance for Manually Refining Alignments
  • Basic Protocol 4: Building a Covariance Model With Infernal
  • Basic Protocol 5: Strategies for Expanding Model Coverage
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
     
 

Materials

Basic Protocol 1: Gathering an Initial Set of Homologous Sequences

  Necessary Resources
  • Computer with an up‐to‐date Web browser (e.g., Firefox, Chrome, Internet Explorer)
  • Text editor

Basic Protocol 2: Aligning and Predicting Secondary Structure

  Necessary Resources
  • Computer with a modern Web browser (e.g., Firefox, Chrome, Internet Explorer)
  • Text editor

Basic Protocol 3: Guidance for Manually Refining Alignments

  Necessary Resources
  • Computer, preferably running a *NIX‐based operating system (e.g., Linux, OS X)
  • Emacs with RALEE mode installed (see http://sgjlab.org/ralee/)

Basic Protocol 4: Building a Covariance Model With Infernal

  Necessary Resources
  • Computer running a *NIX‐based operating system (e.g., Linux, OS X)
  • Infernal (see http://eddylab.org/infernal/)
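
As a rough illustration of what Basic Protocol 4 involves once Infernal is installed, the following Python sketch assembles the command lines for one build, calibrate, and search round. It is not taken from the protocol itself: the file names (mica_seed.sto, genomes.fa) and the helper function infernal_commands are hypothetical, and the sketch only prints the commands rather than running them. It assumes the seed alignment is a Stockholm file carrying a consensus secondary structure annotation and that the sequence database is in FASTA format.

```python
import shlex


def infernal_commands(seed_alignment, database, prefix="family"):
    """Return argv lists for one build/calibrate/search round.

    Nothing is executed here; all file names are hypothetical placeholders.
    """
    cm = f"{prefix}.cm"
    return [
        # Build a covariance model from the structure-annotated seed alignment.
        ["cmbuild", cm, seed_alignment],
        # Calibrate the model so that searches can report E-values.
        ["cmcalibrate", cm],
        # Search the sequence database and write a tabular summary of hits.
        ["cmsearch", "--tblout", f"{prefix}.tbl", cm, database],
    ]


commands = infernal_commands("mica_seed.sto", "genomes.fa", prefix="mica")
for argv in commands:
    print(shlex.join(argv))
```

To actually run a round, each list could be passed to subprocess.run(argv, check=True). In an iterative workflow of the kind the abstract describes, hits judged to be true homologs would be added to the alignment and the round repeated.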