The HMMER Web Server for Protein Sequence Similarity Search

Ananth Prakash1, Matt Jeffryes1, Alex Bateman1, Robert D. Finn1

1 European Molecular Biology Laboratory, The European Bioinformatics Institute (EMBL‐EBI), Wellcome Genome Campus, Cambridge, Hinxton
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 3.15
DOI:  10.1002/cpbi.40
Online Posting Date:  December, 2017
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence‐search algorithms use this principle to identify homologs. The need for a fast and sensitive sequence search method led to the development of the HMMER software, whose latest version (v3.1) combines sophisticated acceleration heuristics with mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform that links the HMMER algorithms to sequence databases, enabling homolog searches, and links to external databases that supply sequence and functional annotation. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server with various input formats and user‐defined parameters. © 2017 by John Wiley & Sons, Inc.

Keywords: bioinformatics; homology; profile hidden Markov model; protein sequence analysis
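The abstract's core idea, that homologs share position-specific similarity a profile can capture, can be illustrated with a small Python sketch. It builds an ungapped position-specific profile (log-odds scores in bits, with a pseudocount of 1 and a uniform background) from a toy alignment, then slides it along a query to find the best-scoring window. This is a deliberate simplification and not HMMER's algorithm: a profile HMM also models insertions and deletions with transition probabilities, and HMMER adds priors, acceleration filters and E-value statistics that are not reproduced here. The alignment, the pseudocount, and the function names build_profile and best_window_score are illustrative assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Uniform background frequency: a simplifying assumption (HMMER uses empirical frequencies).
BACKGROUND = 1.0 / len(AMINO_ACIDS)


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores (bits) per column of an ungapped alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Start every residue at the pseudocount so unseen residues keep a finite score.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            residue = seq[col]
            if residue in counts:  # non-standard characters are skipped
                counts[residue] += 1
        total = sum(counts.values())
        profile.append(
            {aa: math.log2((counts[aa] / total) / BACKGROUND) for aa in AMINO_ACIDS}
        )
    return profile


def best_window_score(profile, sequence):
    """Slide the profile along the sequence; return (best score, start index)."""
    width = len(profile)
    best_score, best_start = -math.inf, -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0 bits.
        score = sum(column.get(aa, 0.0) for column, aa in zip(profile, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Toy ungapped alignment of a short motif (illustrative data, not from the protocol).
alignment = ["HGKKV", "HGRKV", "HGKRI", "NGKKV"]
profile = build_profile(alignment)
score, start = best_window_score(profile, "MAAHGKKVLST")
print(f"best window starts at {start} with {score:.2f} bits")
```

A positive score means the window looks more like the aligned family than like the background; replacing the uniform background with observed amino-acid frequencies, or adding insert and delete states, moves the sketch toward a true profile HMM.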


Table of Contents

  • Introduction
  • Basic Protocol 1: Quick Search Using PHMMER
  • Alternate Protocol 1: PHMMER Advanced Search
  • Support Protocol 1: Protein Sequence Databases
  • Basic Protocol 2: Quick Profile Search Using HMMSCAN
  • Basic Protocol 3: Iterative Searching with JACKHMMER
  • Alternate Protocol 2: JACKHMMER Using Profile HMM or Multiple Sequence Alignment as Input
  • Support Protocol 2: Generating a Multiple Sequence Alignment
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Materials

Basic Protocol 1: Quick Search Using PHMMER

  Necessary Resources
  • An up‐to‐date Web browser such as Firefox, Safari, or Chrome

Alternate Protocol 1: PHMMER Advanced Search

  Necessary Resources
  • An up‐to‐date Web browser such as Firefox, Safari, or Chrome

Support Protocol 1: Protein Sequence Databases

  Necessary Resources
  • An up‐to‐date Web browser such as Firefox, Safari, or Chrome

Basic Protocol 2: Quick Profile Search Using HMMSCAN

  Necessary Resources
  • An up‐to‐date Web browser such as Firefox, Safari, or Chrome

Basic Protocol 3: Iterative Searching with JACKHMMER

  Necessary Resources
  • An up‐to‐date Web browser such as Firefox, Safari, or Chrome

Alternate Protocol 2: JACKHMMER Using Profile HMM or Multiple Sequence Alignment as Input

  Necessary Resources
  • An up‐to‐date Web browser such as Firefox, Safari, or Chrome

Support Protocol 2: Generating a Multiple Sequence Alignment

  Necessary Resources
  • An up‐to‐date Web browser such as Firefox, Safari, or Chrome

