Identifying Protein Domains with the Pfam Database

Penny Coggill¹, Robert D. Finn¹, Alex Bateman¹

¹ Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 2.5
DOI:  10.1002/0471250953.bi0205s23
Online Posting Date:  September, 2008

Abstract

Pfam is a database of protein domain families, with each family represented by multiple sequence alignments and profile hidden Markov models (HMMs). In addition, each family has associated annotation, literature references, and links to other databases. The entries in Pfam are available via the World Wide Web and in flatfile format. This unit describes in detail how to access and use the information in the Pfam database, namely the families, multiple alignments, and annotation. Details on running Pfam, both remotely and locally, are presented. Curr. Protoc. Bioinform. 23:2.5.1‐2.5.17. © 2008 by John Wiley & Sons, Inc.

Keywords: protein domain; HMM; protein family; superfamily; sequence alignment; sequence analysis

     
 

Table of Contents

  • Introduction
  • Basic Protocol 1: Analyzing a Protein Sequence with Pfam via the Web
  • Alternate Protocol 1: Running Pfam/HMMER Locally
  • Alternate Protocol 2: Using Pfam Profile HMMs to Find Domains in Genomic Sequence
  • Guidelines for Understanding Results
  • Commentary
  • Appendix: Pfam Data Available via the Web Site
  • Literature Cited
  • Figures
  • Tables
     
 




Literature Cited
   Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths‐Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30: 276‐280.
   Birney, E., Clamp, M., and Durbin, R. 2004. GeneWise and Genomewise. Genome Res. 14: 988‐995.
   Chothia, C. 1992. Proteins. One thousand families for the molecular biologist. Nature 357: 543‐544.
   Corpet, F., Servant, F., Gouzy, J., and Kahn, D. 2000. ProDom and ProDom‐CG: Tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res. 28: 267‐269.
   Finn, R.D., Tate, J., Mistry, J., Coggill, P.C., Sammut, S.J., Hotz, H.R., Ceric, G., Forslund, K., Eddy, S.R., Sonnhammer, E.L.L., and Bateman, A. 2008. The Pfam protein families database. Nucleic Acids Res. 36: D281‐D288.
   Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features. Biopolymers 22: 2577‐2637.
   Mistry, J., Bateman, A., and Finn, R.D. 2007. Predicting active site residue annotations in the Pfam database. BMC Bioinformatics 8: 298.
   Sonnhammer, E.L.L., Eddy, S.R., and Durbin, R. 1997. Pfam: A comprehensive database of protein domain families based on seed alignments. Proteins 28: 405‐420.
   Sonnhammer, E.L.L., Eddy, S.R., Birney, E., Bateman, A., and Durbin, R. 1998. Pfam: Multiple sequence alignments and HMM‐profiles of protein domains. Nucleic Acids Res. 26: 320‐322.
   Wu, C.H., Apweiler, R., Bairoch, A., Natale, D.A., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Mazumder, R., O'Donovan, C., Redaschi, N., and Suzek, B. 2006. The Universal Protein Resource (UniProt): An expanding universe of protein information. Nucleic Acids Res. 34: D187‐D191.
   Yooseph, S., Sutton, G., Rusch, D.B., Halpern, A.L., Williamson, S.J., Remington, K., Eisen, J.A., Heidelberg, K.B., Manning, G., Li, W., Jaroszewski, L., Cieplak, P., Miller, C.S., Li, H., Mashiyama, S.T., Joachimiak, M.P., van Belle, C., Chandonia, J.M., Soergel, D.A., Zhai, Y., Natarajan, K., Lee, S., Raphael, B.J., Bafna, V., Friedman, R., Brenner, S.E., Godzik, A., Eisenberg, D., Dixon, J.E., Taylor, S.S., Strausberg, R.L., Frazier, M., and Venter, J.C. 2007. The Sorcerer II Global Ocean Sampling expedition: Expanding the universe of protein families. PLoS Biol. 5: e16.