Studying RNA Homology and Conservation with Infernal: From Single Sequences to RNA Families

Lars Barquist (1), Sarah W. Burge (1), Paul P. Gardner (2)

(1) Wellcome Trust Sanger Institute, Hinxton, Cambridge; (2) Biomolecular Interaction Centre, University of Canterbury, Christchurch
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 12.13
DOI: 10.1002/cpbi.4
Online Posting Date: June 2016

Emerging high‐throughput technologies have led to a deluge of putative non‐coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequences allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNAs present a challenge for homology detection, as the primary sequence is often poorly conserved, and de novo secondary structure prediction and search remain difficult. This unit introduces methods developed by the Rfam database for identifying “families” of homologous ncRNAs starting from single “seed” sequences, using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step‐by‐step iterative protocol for identifying ncRNA homologs and then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in the genus Xenorhabdus in the process. © 2016 by John Wiley & Sons, Inc.

Keywords: covariance model; homology; RNA; Rfam; alignment; ncRNA; conservation

Table of Contents

  • Introduction
  • Strategic Planning
  • Choosing the Right Protocols
  • Basic Protocol 1: Gathering an Initial Set of Homologous Sequences
  • Basic Protocol 2: Aligning and Predicting Secondary Structure
  • Basic Protocol 3: Guidance for Manually Refining Alignments
  • Basic Protocol 4: Building a Covariance Model With Infernal
  • Basic Protocol 5: Strategies for Expanding Model Coverage
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Basic Protocol 1: Gathering an Initial Set of Homologous Sequences

  Necessary Resources
  • Computer with an up‐to‐date Web browser (e.g., Firefox, Chrome, Internet Explorer)
  • Text editor

Basic Protocol 2: Aligning and Predicting Secondary Structure

  Necessary Resources
  • Computer with a modern Web browser (e.g., Firefox, Chrome, Internet Explorer)
  • Text editor

Basic Protocol 3: Guidance for Manually Refining Alignments

  Necessary Resources
  • Computer, preferably running a *NIX‐based operating system (e.g., Linux, OS X)
  • Emacs with RALEE mode installed (see

Basic Protocol 4: Building a Covariance Model With Infernal

  Necessary Resources
  • Computer running a *NIX‐based operating system (e.g., Linux, OS X)
  • Infernal (see
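
As a rough illustration of the build, calibrate, and search cycle that building a covariance model with Infernal involves, the Python sketch below assembles the three Infernal command lines (cmbuild, cmcalibrate, cmsearch) for one round. The file names and the helper name infernal_round are hypothetical, the options shown are a minimal assumption rather than the settings recommended in this unit, and the commands are only printed, not executed.

```python
import shlex


def infernal_round(seed_alignment, cm_file, sequence_db, hits_table):
    """Return the command lines for one build/calibrate/search round.

    All four arguments are file paths chosen by the user (hypothetical here).
    The seed alignment is assumed to be a Stockholm file that carries a
    consensus secondary structure annotation (SS_cons).
    """
    return [
        # Build the covariance model; -F lets a later round overwrite the
        # CM file produced by an earlier one.
        ["cmbuild", "-F", cm_file, seed_alignment],
        # Calibrate the model so that searches can report E-values.
        ["cmcalibrate", cm_file],
        # Search the sequence database and save a tabular summary of hits.
        ["cmsearch", "--tblout", hits_table, cm_file, sequence_db],
    ]


# Hypothetical file names, used only for illustration.
for command in infernal_round("MicA.sto", "MicA.cm", "genomes.fa", "MicA.tblout"):
    print(shlex.join(command))
```

To actually run a round, each list could be passed to subprocess.run(command, check=True) on a machine where Infernal is installed. New homologs found in the hits table would then be reviewed and added to the seed alignment by hand before the next round.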