Identifying Protein Domains with the Pfam Database

Penny Coggill, Robert D. Finn, Alex Bateman

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 2.5
DOI: 10.1002/0471250953.bi0205s23
Online Posting Date: September, 2008
Pfam is a database of protein domain families, with each family represented by multiple sequence alignments and profile hidden Markov models (HMMs). In addition, each family has associated annotation, literature references, and links to other databases. The entries in Pfam are available via the World Wide Web and in flatfile format. This unit describes in detail how to access and use the information in the Pfam database, namely the families, multiple alignments, and annotation. Details on running Pfam, both remotely and locally, are also presented. Curr. Protoc. Bioinform. 23:2.5.1‐2.5.17. © 2008 by John Wiley & Sons, Inc.

Keywords: protein domain; HMM; protein family; superfamily; sequence alignment; sequence analysis

Table of Contents

  • Introduction
  • Basic Protocol 1: Analyzing a Protein Sequence with Pfam via the Web
  • Alternate Protocol 1: Running Pfam/HMMER Locally
  • Alternate Protocol 2: Using Pfam Profile HMMs to Find Domains in Genomic Sequence
  • Guidelines for Understanding Results
  • Commentary
  • Appendix: Pfam Data Available via the Web Site
  • Literature Cited
  • Figures
  • Tables
