Annotating Non‐Coding RNAs with Rfam

Sam Griffiths‐Jones1

1 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 12.5
DOI:  10.1002/0471250953.bi1205s9
Online Posting Date:  April, 2005
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


Non‐coding RNA (ncRNA) genes produce a functional RNA product rather than a translated protein. The range and importance of such genes have only recently become apparent, with known ncRNAs participating in a wide range of structural, regulatory, and catalytic roles within the cell. As with protein‐coding genes, multiple sequence alignments of ncRNA families tell us much about their structure and function, and enable the formulation of statistical models for the detection of related sequences. Rfam is a database of ncRNA families, each represented by a structure‐annotated multiple sequence alignment and a covariance model.

Keywords: non‐coding RNA; multiple sequence alignment; genome annotation


Table of Contents

  • Basic Protocol 1: Finding Members of a Family via the Rfam Web Interface
  • Alternate Protocol 1: Using Rfam and Infernal to Identify ncRNAs in Genomic DNA
  • Guidelines for Understanding Results
  • Commentary
  • Appendix: Rfam Alignment Formats
  • Literature Cited
  • Figures
  • Tables


  • Figure 12.5.1 A typical Rfam sequence search results page. The table in the upper part of the figure summarizes the hits, locations, and scores; the alignments of each subsequence to the model are shown in the lower part of the figure. The first sequence line shows the simplified consensus sequence represented by the model as generated by the search software, with the bottom line showing the subsequence match of the query. The central line shows bases conserved between the model consensus and the query, with the base‐paired secondary structure depicted as nested parentheses above (see Appendix for more information).
  • Figure 12.5.2 The Rfam Genomes page for Clostridium perfringens from the Rfam U.K. Web site. The table describes the families identified in the genome, together with types and counts. The graphic on the left shows the locations of all hits in the genome sequence.
  • Figure 12.5.3 The Rfam family page for the nuclear RNase P family.
  • Figure 12.5.4 The scheme for construction of an Rfam family. The programs cmbuild, cmsearch, and cmalign are components of the INFERNAL software. See text for more detail.
  • Figure 12.5.5 The Rfam Seed alignment for the U12 minor spliceosomal RNA family.
  • Figure 12.5.6 The SS_cons line (per‐column) annotation describes the consensus base‐paired secondary structure of a given alignment, encoded as nested sets of brackets.
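
The nested‐bracket encoding described for the SS_cons line (Figure 12.5.6) can be illustrated with a short sketch. It is not part of Rfam or INFERNAL: the function name parse_ss_cons is hypothetical, and the sketch assumes that an opening bracket (<, (, [ or {) in one column pairs with the matching closing bracket in a later column, while every other character marks an unpaired column.

```python
def parse_ss_cons(ss_cons):
    """Return sorted (open_col, close_col) pairs, 0-based, from a bracket string.

    Assumption: <>, (), [] and {} all denote base pairs; any other
    character (for example '.') is treated as an unpaired column.
    """
    closers = {">": "<", ")": "(", "]": "[", "}": "{"}
    openers = set(closers.values())
    stack = []   # open brackets still waiting for a partner
    pairs = []
    for col, ch in enumerate(ss_cons):
        if ch in openers:
            stack.append((ch, col))
        elif ch in closers:
            # The most recent unmatched opener must be of the same bracket type.
            if not stack or stack[-1][0] != closers[ch]:
                raise ValueError(f"unbalanced bracket at column {col}")
            _, open_col = stack.pop()
            pairs.append((open_col, col))
    if stack:
        raise ValueError(f"unclosed bracket at column {stack[-1][1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A single hairpin: columns 0-2 pair with columns 7-5.
    print(parse_ss_cons("<<<..>>>."))
```

Because the brackets are nested, a single stack is enough to recover every pair: each closing bracket pairs with the most recently opened column that is still unmatched.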

