Identifying Protein Domains with the Pfam Database
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Abstract
Pfam is a database of protein domain families, with each family represented by multiple sequence alignments and profile hidden Markov models (HMMs). Each family also has associated annotation, literature references, and links to other databases. The entries in Pfam are available via the World Wide Web and in flatfile format. This unit describes in detail how to access and use the information in the Pfam database, namely the families, multiple alignments, and annotation. Details on running Pfam, both remotely and locally, are also presented. Curr. Protoc. Bioinform. 23:2.5.1-2.5.17. © 2008 by John Wiley & Sons, Inc.
Keywords: protein domain; HMM; protein family; superfamily; sequence alignment; sequence analysis
Table of Contents
- Introduction
- Basic Protocol: Analyzing a Protein Sequence with Pfam via the Web
- Alternate Protocol 1: Running Pfam/HMMER Locally
- Alternate Protocol 2: Using Pfam Profile HMMs to Find Domains in Genomic Sequence
- Guidelines for Understanding Results
- Commentary
- Appendix: Pfam Data Available via the Web Site
- Literature Cited
- Figures
- Tables
Figures
- Figure 2.5.1. Screenshot of the protein search submission page. The use of this search page is described in detail in the main text.
- Figure 2.5.2. Example search output from the server for a query sequence of ~1300 residues. The site was queried with the UniProt identifier VAV_HUMAN. A graphical representation of the domains within the query sequence is shown, with a table below giving the domain positions and match scores.
- Figure 2.5.3. View of the CBS domain pair family home page. It contains a description of the function of the family and cross-links to other databases. The links on the left allow the user to view the seed and full alignments, the domain organization, and other family-specific pages, as described in the main text. There is also information on when, how, and by whom the family was built.
- Figure 2.5.4. The Pfam view of domain organization within proteins of the CBS (cystathionine-β-synthase) domain family. Pfam-A domains are shown as large colored boxes, with Pfam-B families as smaller striped boxes.
- Figure 2.5.5. The species distribution of the JmjN domain family. Nodes within the tree are clickable and can be expanded or collapsed. The numbers in brackets give the number of proteins containing the domain at the respective level of the tree, as described in the main text.
- Figure 2.5.6. The entry in the Pfam-A.full file for the family 7kD_DNA_binding, in Stockholm format. The different sections (header, references, comment, and alignment) are labeled. Fields are described in detail in Table 2.5.2 and the main text.