Basic Protein Sequence Analysis

Nandini Krishnamurthy¹, Kimmen V. Sjölander¹

¹ University of California, Berkeley, California
Publication Name: Current Protocols in Molecular Biology
Unit Number: Unit 19.5
DOI: 10.1002/0471142727.mb1905s70
Online Posting Date: May, 2005
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

Prediction of molecular function of proteins has become an important task in the genomics era. A wide variety of sequence analysis tools are available to biologists for this task. We have selected one or two primary protocols for tasks such as domain detection, subcellular localization, and motif detection. We also present a strategy for integration of results from different protocols. All the resources needed for these protocols are accessible via publicly available Web servers and databases and require little or no computational expertise.

Keywords: protein sequence analysis; domain detection; subcellular localization; motif detection

Table of Contents

  • Basic Protocol 1: Identifying Structural and Functional Domains Using Integrated Meta‐Servers
  • Support Protocol 1: Guidelines for Understanding Results of Analyses from Integrated Meta‐Servers
  • Alternate Protocol 1: Identifying Structural and Functional Domains Using the NCBI CD‐Search
  • Support Protocol 2: Guidelines for Understanding Results of Analyses from the NCBI CD‐Search
  • Alternate Protocol 2: Predicting Structural Domains and Secondary Structure Using 3D‐PSSM
  • Support Protocol 3: Guidelines for Understanding Results of Analyses from the 3D‐PSSM Server
  • Basic Protocol 2: Predicting Helical Transmembrane Regions and Subcellular Localization
  • Support Protocol 4: Guidelines for Understanding Results of Predictions of Helical Transmembrane Regions and Subcellular Localization
  • Alternate Protocol 3: Predicting the Subcellular Localization of a Protein Using TargetP
  • Support Protocol 5: Guidelines for Understanding Results Predicting the Subcellular Localization of a Protein Using TargetP
  • Basic Protocol 3: Predicting Key Functional Residues and Motifs Using the PROSITE Web Server
  • Support Protocol 6: Guidelines for Understanding Results of Searches Done Using the PROSITE Web Server
  • Support Protocol 7: Homolog Identification
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Materials

Basic Protocol 1: Identifying Structural and Functional Domains Using Integrated Meta‐Servers

Materials

Support Protocol 1: Guidelines for Understanding Results of Analyses from Integrated Meta‐Servers

  Materials
  • See protocol 1.

Alternate Protocol 1: Identifying Structural and Functional Domains Using the NCBI CD‐Search

  Materials
  • See protocol 1.

Support Protocol 2: Guidelines for Understanding Results of Analyses from the NCBI CD‐Search

  Materials
  • See protocol 7.

Literature Cited

   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths‐Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30:276‐280.
   Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O'Donovan, C., Phan, I., Pilbout, S., and Schneider, M. 2003. The SWISS‐PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31:365‐370.
   Chen, C.P., Kernytsky, A., and Rost, B. 2002. Transmembrane helix predictions revisited. Protein Sci. 11:2774‐2791.
   Eisen, J.A. 1998. Phylogenomics: Improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8:163‐167.
   Emanuelsson, O. and von Heijne, G. 2001. Prediction of organellar targeting signals. Biochim. Biophys. Acta 1541:114‐119.
   Geer, L.Y., Domrachev, M., Lipman, D.J., and Bryant, S.H. 2002. CDART: Protein homology by domain architecture. Genome Res. 12:1619‐1623.
   Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., Durbin, R., Eyras, E., Gilbert, J., Hammond, M., Huminiecki, L., Kasprzyk, A., Lehvaslaiho, H., Lijnzaad, P., Melsopp, C., Mongin, E., Pettett, R., Pocock, M., Potter, S., Rust, A., Schmidt, E., Searle, S., Slater, G., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Stupka, E., Ureta‐Vidal, A., Vastrik, I., and Clamp, M. 2002. The Ensembl genome database project. Nucleic Acids Res. 30:38‐41.
   Hulo, N., Sigrist, C.J., Le Saux, V., Langendijk‐Genevaux, P.S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P., and Bairoch, A. 2004. Recent improvements to the PROSITE database. Nucleic Acids Res. 32:D134‐D137.
   Jones, D.T. 1999. Protein secondary structure prediction based on position‐specific scoring matrices. J. Mol. Biol. 292:195‐202.
   Kelley, L.A., MacCallum, R.M., and Sternberg, M.J. 2000. Enhanced genome annotation using structural profiles in the program 3D‐PSSM. J. Mol. Biol. 299:499‐520.
   Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L. 2001. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol. 305:567‐580.
   Letunic, I., Copley, R.R., Schmidt, S., Ciccarelli, F.D., Doerks, T., Schultz, J., Ponting, C.P., and Bork, P. 2004. SMART 4.0: Towards genomic data integration. Nucleic Acids Res. 32:D142‐D144.
   Marchler‐Bauer, A., Anderson, J.B., DeWeese‐Scott, C., Fedorova, N.D., Geer, L.Y., He, S., Hurwitz, D.I., Jackson, J.D., Jacobs, A.R., Lanczycki, C.J., Liebert, C.A., Liu, C., Madej, T., Marchler, G.H., Mazumder, R., Nikolskaya, A.N., Panchenko, A.R., Rao, B.S., Shoemaker, B.A., Simonyan, V., Song, J.S., Thiessen, P.A., Vasudevan, S., Wang, Y., Yamashita, R.A., Yin, J.J., and Bryant, S.H. 2003. CDD: A curated Entrez database of conserved domain alignments. Nucleic Acids Res. 31:383‐387.
   Marchler‐Bauer, A. and Bryant, S.H. 2004. CD‐Search: Protein domain annotations on the fly. Nucleic Acids Res. 32:W327‐W331.
   Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247:536‐540.
   Schatz, G. and Dobberstein, B. 1996. Common principles of protein translocation across membranes. Science 271:1519‐1526.
   Schultz, J., Milpetz, F., Bork, P., and Ponting, C.P. 1998. SMART, a simple modular architecture research tool: Identification of signaling domains. Proc. Natl. Acad. Sci. U.S.A. 95:5857‐5864.
   Sigrist, C.J., Cerutti, L., Hulo, N., Gattiker, A., Falquet, L., Pagni, M., Bairoch, A., and Bucher, P. 2002. PROSITE: A documented database using patterns and profiles as motif descriptors. Brief. Bioinform. 3:265‐274.
   Sjölander, K. 2004. Phylogenomic inference of protein molecular function: Advances and challenges. Bioinformatics 20:170‐179.
   Tatusov, R.L., Natale, D.A., Garkavtsev, I.V., Tatusova, T.A., Shankavaram, U.T., Rao, B.S., Kiryutin, B., Galperin, M.Y., Fedorova, N.D., and Koonin, E.V. 2001. The COG database: New developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29:22‐28.