Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences

Maja Tarailo‐Graovac¹, Nansheng Chen¹

¹ Simon Fraser University, Burnaby, British Columbia, Canada
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 4.10
DOI: 10.1002/0471250953.bi0410s25
Online Posting Date: March 2009
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


RepeatMasker is a popular software tool widely used in computational genomics to identify, classify, and mask repetitive elements, including low‐complexity sequences and interspersed repeats. RepeatMasker finds repetitive sequence by aligning the input genomic sequence against a library of known repeats, such as Repbase. Here, we describe two Basic Protocols that give detailed guidelines for using RepeatMasker, either through the Web interface or from the command line on a Unix/Linux system, to analyze repetitive elements in genomic sequences. Sequence comparisons in RepeatMasker are usually performed by the alignment program cross_match, which requires substantial processing time for large sequences. An Alternate Protocol describes how to reduce processing time by using an alternative alignment program such as WU‐BLAST. We also discuss the advantages, limitations, and known bugs of the software, and we provide guidelines for interpreting the results. Curr. Protoc. Bioinform. 25:4.10.1‐4.10.14. © 2009 by John Wiley & Sons, Inc.
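RepeatMasker reports each repeat it finds as an annotation line and, by default, also returns a masked copy of the input in which repeats are replaced by Ns; the `-xsmall` option instead returns repeats in lowercase ("soft" masking). The sketch below illustrates that masking step under stated assumptions. The column order follows the commonly seen RepeatMasker `.out` layout and should be checked against the documentation for your version. The names `RepeatHit`, `parse_out_line`, and `mask` are hypothetical, and the sample record is invented for illustration. This is not RepeatMasker's own code.

```python
"""Minimal sketch: apply repeat annotations to a sequence by hard- or soft-masking.

ASSUMPTION: annotation lines are whitespace-separated with this column order:
SW score, % div., % del., % ins., query name, query begin, query end,
(left), strand ('+' or 'C'), repeat name, repeat class/family, ...
Coordinates are taken to be 1-based and inclusive.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RepeatHit:
    query: str
    begin: int         # 1-based, inclusive
    end: int           # 1-based, inclusive
    strand: str        # '+' or 'C' (complement)
    repeat: str
    repeat_class: str


def parse_out_line(line: str) -> RepeatHit:
    """Parse one annotation line using the assumed column order above."""
    f = line.split()
    return RepeatHit(query=f[4], begin=int(f[5]), end=int(f[6]),
                     strand=f[8], repeat=f[9], repeat_class=f[10])


def mask(seq: str, hits: list[RepeatHit], soft: bool = False) -> str:
    """Replace each annotated interval with 'N' (hard) or lowercase (soft)."""
    chars = list(seq)
    for h in hits:
        # Convert the 1-based inclusive interval to 0-based Python indices.
        for i in range(h.begin - 1, h.end):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical 25-bp sequence with a poly-T tract at positions 9-17.
    seq = "ACGTACGT" + "T" * 9 + "ACGTACGT"
    # Hypothetical record describing that tract as a simple repeat.
    line = "25 0.0 0.0 0.0 toy_seq 9 17 (8) + (T)n Simple_repeat 1 9 (0) 1"
    hit = parse_out_line(line)
    print(mask(seq, [hit]))             # hard-masked
    print(mask(seq, [hit], soft=True))  # soft-masked
```

Soft masking keeps the original bases while still flagging the repeat, so downstream tools can choose whether to ignore or use those positions; hard masking discards them.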

Keywords: RepeatMasker; genome annotation; repetitive elements; repeat library; cross_match; WU‐BLAST; RECON


Table of Contents

  • Introduction
  • Basic Protocol 1: Using RepeatMasker via the Web Interface
  • Basic Protocol 2: Using the Command‐Line Unix/Linux Version of RepeatMasker to Study Repetitive Elements in Genomic Sequences
  • Alternate Protocol 1: Running RepeatMasker with WU‐BLAST
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
