Identifying Protein Domains with the Pfam Database

Penny Coggill (1), Robert D. Finn (1), Alex Bateman (1)

(1) Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom

Unit Number: UNIT 2.5
DOI: 10.1002/0471250953.bi0205s23
Online Posting Date: September 2008
GO TO THE FULL TEXT:
PDF or HTML at Wiley Interscience

Abstract

Pfam is a database of protein domain families, with each family represented by multiple sequence alignments and profile hidden Markov models (HMMs). In addition, each family has associated annotation, literature references, and links to other databases. The entries in Pfam are available via the World Wide Web and in flatfile format. This unit contains detailed information on how to access and use the information in the Pfam database, namely the families, multiple alignments, and annotation. Details on running Pfam, both remotely and locally, are also presented. Curr. Protoc. Bioinform. 23:2.5.1-2.5.17. © 2008 by John Wiley & Sons, Inc.

Keywords: protein domain; HMM; protein family; superfamily; sequence alignment; sequence analysis

     

Table of Contents

  • Introduction
  • Basic Protocol: Analyzing a Protein Sequence with Pfam via the Web
  • Alternate Protocol 1: Running Pfam/HMMER Locally
  • Alternate Protocol 2: Using Pfam Profile HMMs to Find Domains in Genomic Sequence
  • Guidelines for Understanding Results
  • Commentary
  • Appendix: Pfam Data Available via the Web Site
  • Literature Cited
  • Figures
  • Tables
  • Other Versions
     

Figures

  • Figure 2.5.1
    Screenshot of the protein search submission page. The use of this search page is described in detail in the main text.

  • Figure 2.5.2
    Example search output from the server for a query sequence of ~1300 residues. The site was queried with the UniProt identifier VAV_HUMAN. In the example, a graphical representation of the domains within the query sequence is shown, with a table below giving the domain positions and match scores.

  • Figure 2.5.3
    View of the CBS domain pair family home page. This contains a description of the function of the family and cross-links to other databases. The links on the left allow the user to view both the seed and full alignments, the domain organization, and other family-specific pages, as described in the main text. There is also information on when, how, and by whom the family was built.

  • Figure 2.5.4
    The Pfam view of domain organization within proteins of the CBS (cystathionine-β-synthase) domain family. Pfam-A domains are shown as large colored boxes, with Pfam-B families as smaller striped boxes.

  • Figure 2.5.5
    The species distribution of the JmjN domain family. Nodes within the tree are clickable and can be expanded or collapsed. The numbers in brackets give the number of proteins containing the domain at the respective level of the tree, as described in the main text.

  • Figure 2.5.6
    The entry in the Pfam-A.full file for the family 7kD_DNA_binding, in Stockholm format. The different sections (header, references, comment, and alignment) are labeled. Fields are described in detail in Table 2.5.2 and the main text.
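
As a rough illustration of the Stockholm layout mentioned for Figure 2.5.6, the following Python sketch separates the `#=GF` annotation lines (header, references, and comment fields) from the aligned sequence rows of a single entry. It is a minimal sketch, not the Pfam team's own tooling: the function name `parse_stockholm_entry`, the `SAMPLE` entry, its family name, its placeholder accession, and its sequences are all invented for illustration and do not come from Pfam.

```python
from collections import defaultdict

# Hypothetical, made-up entry in Stockholm layout (not real Pfam data).
SAMPLE = """# STOCKHOLM 1.0
#=GF ID   Example_fam
#=GF AC   PF00000.0
#=GF DE   Invented family used only for this sketch
#=GF CC   First comment line.
#=GF CC   Second comment line.
seqA/1-10     MKVLAG-TRE
seqB/5-14     MKILAGETRE
//
"""


def parse_stockholm_entry(text):
    """Split one Stockholm entry into #=GF fields and aligned rows."""
    annotations = defaultdict(list)  # two-letter tag -> list of values
    alignment = {}                   # sequence name -> aligned string
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("# STOCKHOLM"):
            continue                 # skip blank lines and the format header
        if line == "//":
            break                    # end-of-entry marker
        if line.startswith("#=GF "):
            parts = line.split(None, 2)
            value = parts[2] if len(parts) > 2 else ""
            annotations[parts[1]].append(value)
        elif line.startswith("#"):
            continue                 # other markup (#=GS, #=GR, #=GC) is ignored here
        else:
            parts = line.split(None, 1)
            if len(parts) == 2:
                name, seq = parts
                # Append so that interleaved alignment blocks are joined.
                alignment[name] = alignment.get(name, "") + seq.strip()
    return dict(annotations), alignment


annotations, alignment = parse_stockholm_entry(SAMPLE)
print(annotations["ID"], len(alignment))
```

Repeated tags such as `CC` are collected in order, so a multi-line comment can be rejoined with `" ".join(annotations["CC"])`. A file holding many families, such as Pfam-A.full, would need to be split on the `//` terminator before each entry is passed to this function.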

Literature Cited

    Bateman, A., Birney, E., Cerrutti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30: 276-280.
    Birney, E., Clamp, M., and Durbin, R. 2004. GeneWise and Genomewise. Genome Res. 14: 988-995.
    Chothia, C. 1992. Proteins. One thousand families for the molecular biologist. Nature 357: 543-544.
    Corpet, F., Servant, F., Gouzy, J., and Kahn, D. 2000. ProDom and ProDom-CG: Tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res. 28: 267-269.
    Finn, R.D., Tate, J., Mistry, J., Coggill, P.C., Sammut, S.J., Hotz, H.R., Ceric, G., Forslund, K., Eddy, S.R., Sonnhammer, E.L.L., and Bateman, A. 2008. The Pfam protein families database. Nucleic Acids Res. 36: D281-D288.
    Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637.
    Mistry, J., Bateman, A., and Finn, R.D. 2007. Predicting active site residue annotations in the Pfam database. BMC Bioinformatics 8: 298.
    Sonnhammer, E.L.L., Eddy, S.R., and Durbin, R. 1997. Pfam: A comprehensive database of protein domain families based on seed alignments. Proteins 28: 405-420.
    Sonnhammer, E.L.L., Eddy, S.R., Birney, E., Bateman, A., and Durbin, R. 1998. Pfam: Multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res. 26: 320-322.
    Wu, C.H., Apweiler, R., Bairoch, A., Natale, D.A., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Mazumder, R., O'Donovan, C., Redaschi, N., and Suzek, B. 2006. The Universal Protein Resource (UniProt): An expanding universe of protein information. Nucleic Acids Res. 34: D187-D191.
    Yooseph, S., Sutton, G., Rusch, D.B., Halpern, A.L., Williamson, S.J., Remington, K., Eisen, J.A., Heidelberg, K.B., Manning, G., Li, W., Jaroszewski, L., Cieplak, P., Miller, C.S., Li, H., Mashiyama, S.T., Joachimiak, M.P., van Belle, C., Chandonia, J.M., Soergel, D.A., Zhai, Y., Natarajan, K., Lee, S., Raphael, B.J., Bafna, V., Friedman, R., Brenner, S.E., Godzik, A., Eisenberg, D., Dixon, J.E., Taylor, S.S., Strausberg, R.L., Frazier, M., and Venter, J.C. 2007. The Sorcerer II Global Ocean Sampling expedition: Expanding the universe of protein families. PLoS Biol. 5: e16.
     