Using CATH‐Gene3D to Analyze the Sequence, Structure, and Function of Proteins

Ian Sillitoe¹, Tony Lewis¹, Christine Orengo¹

¹ University College London, London
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 1.28
DOI: 10.1002/0471250953.bi0128s50
Online Posting Date: June 2015


The CATH database is a classification of the protein structures deposited in the Protein Data Bank (PDB). Protein structures are chopped into their constituent structural domains, and these domains are grouped into superfamilies when there is sufficient evidence that they have diverged from a common ancestor. A sister resource, Gene3D, extends this information by scanning sequence profiles of the CATH domain superfamilies against many millions of known proteins to identify related sequences. The combined CATH‐Gene3D resource therefore provides confident predictions of the likely structural fold, domain organisation, and evolutionary relatives of these proteins. In addition, the resource incorporates annotations from a large number of external databases, such as known enzyme active sites, GO molecular functions, physical interactions, and mutations. This unit details how to access and understand the information contained in the CATH‐Gene3D Web pages, the downloadable data files, and the remotely accessible Web services. © 2015 by John Wiley & Sons, Inc.

Keywords: protein structure; protein domain; protein classification; functional family; superfamily
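Predicting the domain organisation of a protein from superfamily profile scans implies a step in which many overlapping profile matches on one sequence are reduced to a consistent set of non‐overlapping domains. The Python sketch below illustrates that step under simplifying assumptions: each match is a single contiguous segment with one score, and the goal is the non‐overlapping set with the highest total score, found with textbook weighted interval scheduling. It is not the CATH‐Gene3D pipeline itself (Yeats et al., 2010 describe the approach developed for resolving domain architectures); `Hit`, `resolve_hits`, and the labels `SF-A` to `SF-D` are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical profile match of one superfamily against a protein sequence."""
    superfamily: str  # illustrative label only, not a real CATH superfamily ID
    start: int        # first matched residue (1-based, inclusive)
    stop: int         # last matched residue (inclusive)
    score: float      # match score; higher is better


def resolve_hits(hits: list[Hit]) -> list[Hit]:
    """Return the non-overlapping subset of hits with the maximum total score.

    Weighted interval scheduling: sort hits by stop position, then
    best[k] = max(best[k-1], score of hit k + best[p]), where p is the number
    of earlier hits that end before hit k starts.
    """
    ordered = sorted(hits, key=lambda h: h.stop)
    stops = [h.stop for h in ordered]
    n = len(ordered)
    best = [0.0] * (n + 1)    # best[k]: optimal total score using the first k hits
    take = [False] * (n + 1)  # take[k]: hit k-1 is part of the optimum for best[k]
    prev = [0] * (n + 1)      # prev[k]: prefix length to continue from if taken

    for k in range(1, n + 1):
        hit = ordered[k - 1]
        # Number of earlier hits whose stop is strictly before this hit's start.
        p = bisect_right(stops, hit.start - 1, 0, k - 1)
        with_hit = hit.score + best[p]
        if with_hit > best[k - 1]:
            best[k], take[k], prev[k] = with_hit, True, p
        else:
            best[k] = best[k - 1]

    # Trace back through the table to recover the chosen hits.
    chosen = []
    k = n
    while k > 0:
        if take[k]:
            chosen.append(ordered[k - 1])
            k = prev[k]
        else:
            k -= 1
    return sorted(chosen, key=lambda h: h.start)


# Hypothetical matches on a 400-residue sequence.
hits = [
    Hit("SF-A", 1, 120, 50.0),
    Hit("SF-B", 100, 250, 80.0),
    Hit("SF-C", 130, 240, 45.0),
    Hit("SF-D", 260, 400, 60.0),
]
for h in resolve_hits(hits):
    print(h.superfamily, h.start, h.stop)
# prints: SF-A 1 120, then SF-C 130 240, then SF-D 260 400
```

Scoring the set as a whole matters: in this example, greedily keeping the single best match (SF-B, score 80) would block SF-A and SF-C and give a total of 140, whereas the selected set scores 155.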


Table of Contents

  • Introduction
  • Basic Protocol 1: Searching CATH with a New Protein Sequence
  • Alternate Protocol 1: Access CATH Sequence Scan Remotely
  • Basic Protocol 2: Search CATH with a New Protein Structure
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures

Literature Cited
  Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer, E.E. Jr., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and Tasumi, M. 1977. The protein data bank: A computer‐based archival file for macromolecular structures. J. Mol. Biol. 112:535‐542.
  Furnham, N., Sillitoe, I., Holliday, G.L., Cuff, A.L., Rahman, S.A., Laskowski, R.A., Orengo, C.A., and Thornton, J.M. 2012. FunTree: A resource for exploring the functional evolution of structurally defined enzyme superfamilies. Nucl. Acids Res. 40:D776‐D782.
  Lees, J., Yeats, C., Perkins, J., Sillitoe, I., Rentzsch, R., Dessailly, B.H., and Orengo, C. 2012. Gene3D: A domain‐based resource for comparative genomics, functional annotation and protein network analysis. Nucl. Acids Res. 40:D465‐D471.
  Redfern, O.C., Harrison, A., Dallman, T., Pearl, F.M., and Orengo, C.A. 2007. CATHEDRAL: A fast and effective algorithm to predict folds and domain boundaries from multidomain protein structures. PLoS Comput. Biol. 3(11):e232.
  Sillitoe, I., Cuff, A.L., Dessailly, B.H., Dawson, N.L., Furnham, N., Lee, D., Lees, J.G., Lewis, T.E., Studer, R.A., Rentzsch, R., Yeats, C., Thornton, J.M., and Orengo, C.A. 2013. New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. Nucl. Acids Res.:D490‐D498.
  Sillitoe, I., Lewis, T.E., Cuff, A., Das, S., Ashford, P., Dawson, N.L., Furnham, N., Laskowski, R.A., Lee, D., Lees, J.G., Lehtinen, S., Studer, R.A., Thornton, J., and Orengo, C.A. 2015. CATH: Comprehensive structural and functional annotations for genome sequences. Nucl. Acids Res. 43:D376‐D381.
  Supek, F., Bošnjak, M., Škunca, N., and Šmuc, T. 2011. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One 6(7):e21800.
  Tamuri, A.U. and Laskowski, R.A. 2010. ArchSchema: A tool for interactive graphing of related Pfam domain architectures. Bioinformatics 26:1260‐1261.
  Valdar, W.S. 2002. Scoring residue conservation. Proteins 48:227‐241.
  Yeats, C., Redfern, O.C., and Orengo, C. 2010. A fast and automated solution for accurately resolving protein domain architectures. Bioinformatics 26:745‐751.