Using the Blocks Database to Recognize Functional Domains

Jorja G. Henikoff1, Elizabeth A. Greene1, Nick Taylor1, Steven Henikoff1, Shmuel Pietrokovski2

1 Fred Hutchinson Cancer Research Center, Seattle, Washington, 2 Weizmann Institute of Science, Rehovot, Israel
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 2.2
DOI:  10.1002/0471250953.bi0202s00
Online Posting Date:  August, 2002
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

Blocks are ungapped multiple alignments of related protein sequence segments that correspond to the most conserved regions of the proteins. The Blocks Database is a collection of blocks representing known protein families that can be used to compare a protein or DNA sequence with documented families of proteins. Protocols in this unit describe the analysis of proteins and families using Blocks‐based tools, including searching, exploring relationships with trees, making new blocks, and designing PCR primers from blocks for isolating homologous sequences.
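To make the idea of a block concrete, the following minimal Python sketch treats a block as a list of equal-length, ungapped sequence segments, converts it to simple per-column log-odds scores, and slides it along a query protein to find the best-scoring position. It is an illustration only, not the scoring used by the Blocks tools: the uniform background frequency, the flat pseudocount, the function names `block_to_pssm` and `best_match`, and the toy block and query are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_to_pssm(block, pseudocount=1.0):
    """Convert an ungapped block into per-column log-odds scores.

    Assumes a uniform background and a flat pseudocount (illustrative
    simplifications, not the Blocks Database scoring scheme).
    """
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("a block is ungapped: all segments must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(segment[col] for segment in block)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_match(pssm, sequence):
    """Return (start, score) of the best ungapped window, or None if too short."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute a score of 0.
        score = sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical three-segment block and query, for illustration only.
toy_block = ["GDSGG", "GDSGS", "GDAGG"]
toy_query = "MKTAYGDSGGPLV"
hit = best_match(block_to_pssm(toy_block), toy_query)
```

Here `hit` holds the zero-based start of the best-scoring window and its score. Because blocks are ungapped, the scan only needs to compare fixed-width windows, which is what keeps this kind of search simple.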

     
 

Table of Contents

  • Basic Protocol 1: Exploring Protein Families Using the Blocks Database
  • Support Protocol 1: Search Blocks Versus Other Databases
  • Basic Protocol 2: Analyzing Protein Sequences with the Block Searcher
  • Basic Protocol 3: Analyzing DNA Sequences with the Block Searcher
  • Basic Protocol 4: Viewing Trees Based on Blocks
  • Basic Protocol 5: Using Block Maker
  • Basic Protocol 6: Designing Primers from Blocks
  • Guidelines for Understanding Results
  • Commentary
  • Figures
     
 

Materials

Basic Protocol 1: Exploring Protein Families Using the Blocks Database

  Necessary Resources
  • Hardware
    • Workstation, personal computer, or terminal connected to the Internet
  • Software
    • Any type of Web browser for the Web interface
    • Either the Chime or the RasMol helper application, to view protein structures in a browser

Support Protocol 1: Search Blocks Versus Other Databases

  Necessary Resources
  • Hardware
    • Workstation, personal computer, or terminal connected to the Internet. The programs can be installed on common Unix workstations.
  • Software
    • E‐mail program for the E‐mail interface
    • Web browser for the Web interface
    • Pre‐compiled versions of the programs are provided for Sun Solaris and Linux systems. Other Unix systems need an ANSI C compiler. See the downloaded INSTALL file for installation instructions.
  • Files
    • Query sequences are accepted in FASTA or GenBank format (Appendix 1B)
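
As a rough illustration of the FASTA input listed above, the following Python sketch reads FASTA text into (header, sequence) pairs so that a query can be checked before it is submitted. The function name `read_fasta` and the sample records are hypothetical, and the sketch covers only plain FASTA, not GenBank format.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record input, for illustration only.
sample = ">query1 example protein\nMKTAY\nGDSGG\n>query2\nacgt\n"
records = read_fasta(sample)
```

Sequence lines are joined and upper-cased, so a record that wraps across several lines comes back as a single string.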

Basic Protocol 2: Analyzing Protein Sequences with the Block Searcher

  Necessary Resources
  • Hardware
    • Workstation, personal computer, or terminal connected to the Internet. The programs can be installed on common Unix workstations.
  • Software
    • E‐mail program for the E‐mail interface
    • Web browser for the Web interface
    • Pre‐compiled versions of the programs are provided for Sun Solaris and Linux systems. Other Unix systems need an ANSI C compiler. See the downloaded INSTALL file for installation instructions.
  • Files
    • Query sequences are accepted in FASTA or GenBank format (Appendix 1B)

Basic Protocol 3: Analyzing DNA Sequences with the Block Searcher

  Necessary Resources
  • Hardware
    • Workstation, personal computer, or terminal connected to the Internet
  • Software
    • Any type of Web browser
  • Files
    • None

Basic Protocol 4: Viewing Trees Based on Blocks

  Necessary Resources
  • Hardware
    • Workstation, personal computer, or terminal connected to the Internet. The programs can be installed on common Unix workstations.
  • Software
    • E‐mail program for the E‐mail interface
    • Web browser for the Web interface
    • Pre‐compiled versions of the programs are provided for Sun Solaris and Linux systems. Other Unix systems need an ANSI C compiler. See the downloaded INSTALL file for installation instructions.
  • Files
    • Query sequences are accepted in FASTA or GenBank format (Appendix 1B)

Basic Protocol 5: Using Block Maker

  Necessary Resources
  • Hardware
    • Workstation, personal computer, or terminal connected to the Internet for the Web interface. The programs can be installed on common Unix workstations.
  • Software
    • Web browser for the Web interface
    • Pre‐compiled versions of the programs are provided for Sun Solaris and Linux systems. Other Unix systems need an ANSI C compiler. See the downloaded INSTALL file for installation instructions.
  • Files
    • Input is in Blocks format as described at http://blocks.fhcrc.org/block_format.html.
    • Utilities are available at http://blocks.fhcrc.org/process_blocks.html to convert common multiple alignment formats to Blocks format.

Literature Cited

   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
   Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J., Hermjakob, H., Holo, N., Jonassen, I., Kahn, D., Kanapin, A., Karavidopoulou, Y., Lopez, R., Marx, B., Mulder, N.J., Oinn, T.M., Pagni, M., Servant, F., Sigrist, C.J., and Zdobnov, E.M. 2000. InterPro—An integrated documentation resource for protein families, domains and functional sites. Bioinformatics 16:1145‐1150.
   Attwood, T.K., Croning, M.D.R., Flower, D.R., Lewis, A.P., Mabey, J.E., Scordia, P., Selley, J.N., and Wright, W. 2000. PRINTS‐S: The database formerly known as PRINTS. Nucleic Acids Res. 28:225‐227.
   Bailey, T. and Elkan, C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology pp. 28‐36. AAAI Press, Menlo Park, Calif.
   Bailey, T.L. and Gribskov, M. 1998. Combining evidence using p‐values: Application to sequence homology searches. Bioinformatics 14:48‐54.
   Bairoch, A. and Apweiler, R. 2000. The SWISS‐PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28:45‐48.
   Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffith‐Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30:276‐280.
   Hall, B.G. 2001. Phylogenetic Trees Made Easy: A How‐To Manual for Molecular Biologists. Sinauer Press, Sunderland, Mass.
   Henikoff, S. 1991. Playing with blocks: Some pitfalls of forcing multiple alignments. New Biol. 3:1148‐1154.
   Henikoff, S. and Henikoff, J.G. 1991. Automated assembly of protein blocks for database searching. Nucleic Acids Res. 19:6565‐6572.
   Henikoff, S. and Henikoff, J.G. 1994. Position‐based sequence weights. J. Mol. Biol. 243:574‐578.
   Henikoff, J.G. and Henikoff, S. 1996. Using substitution probabilities to improve position‐specific scoring matrices. Comput. Appl. Biosci. 12:135‐143.
   Henikoff, S. and Henikoff, J.G. 1997. Embedding strategies for effective use of multiple sequence alignment information. Protein Sci. 6:698‐705.
   Henikoff, S., Henikoff, J.G., Alford, W.J., and Pietrokovski, S. 1995. Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 163:GC17‐GC26.
   Huang, J.Y. and Brutlag, D.L. 2001. The eMOTIF database. Nucleic Acids Res. 29:202‐204.
   Kunin, V., Chan, B., Sitbon, E., Lithwick, G., and Pietrokovski, S. 2001. Consistency analysis of similarity between multiple alignments: Prediction of protein function and fold structure from analysis of local sequence motifs. J. Mol. Biol. 307:939‐949.
   Mount, D.W. 2001. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
   Neuwald, A.F., Liu, J.S., and Lawrence, C.E. 1995. Gibbs motif sampling: Detection of bacterial outer membrane protein repeats. Protein Sci. 4:1618‐1632.
   Ng, P.C. and Henikoff, S. 2001. Predicting deleterious amino acid substitutions. Genome Res. 11:863‐874.
   Ng, P.C. and Henikoff, S. 2002. Accounting for human polymorphisms predicted to affect protein function. Genome Res. 12:436‐446.
   Pearson, W.R. 1990. Rapid and sensitive sequence comparison with FASTP and FASTA. Meth. Enzymol. 183:63‐98.
   Pietrokovski, S. 1996. Searching databases of conserved sequence regions by aligning protein multiple‐alignments. Nucleic Acids Res. 24:3836‐3845.
   Pietrokovski, S. and Henikoff, S. 1997. A helix‐turn‐helix DNA‐binding motif predicted for transposases of DNA transposons. Mol. Gen. Genet. 254:689‐695.
   Pietrokovski, S., Henikoff, J.G., and Henikoff, S. 1998. Exploring protein homology with the Blocks server. Trends Genet. 14:162‐163.
   Pinarbasi, E., Elliott, J., and Hornby, D.P. 1996. Activation of a yeast pseudo DNA methyltransferase by deletion of a single amino acid. J. Mol. Biol. 257:804‐813.
   Rose, T.M., Schultz, E.R., Henikoff, J.G., Pietrokovski, S., McCallum, C.M., and Henikoff, S. 1998. Consensus‐degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res. 26:1628‐1635.
   Saitou, N. and Nei, M. 1987. The neighbor‐joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406‐425.
   Schaffer, A.A., Wolf, Y.I., Ponting, C.P., Koonin, E.V., Aravind, L., and Altschul, S.F. 1999. IMPALA: Matching a protein sequence against a collection of PSI‐BLAST‐constructed position‐specific score matrices. Bioinformatics 15:1000‐1011.
   Schaffer, A.A., Aravind, L., Madden, T.L., Shavirin, S., Spouge, J.L., Wolf, Y.I., Koonin, E.V., and Altschul, S.F. 2001. Improving the accuracy of PSI‐BLAST protein database searches with composition‐based statistics and other refinements. Nucleic Acids Res. 29:2994‐3005.
   Schneider, T.D. and Stephens, R.M. 1990. Sequence logos: A new way to display consensus sequences. Nucleic Acids Res. 18:6097‐6100.
   Silverstein, K.A., Shoop, E., Johnson, J.E., and Retzel, E.F. 2001. MetaFam: A unified classification of protein families. I. Overview and statistics. Bioinformatics 17:249‐261.
   Smith, H.O., Annau, T.M., and Chandrasegaran, S. 1990. Finding sequence motifs in groups of functionally related proteins. Proc. Natl. Acad. Sci. U.S.A. 87:826‐830.
   Tatusov, R.L., Altschul, S.F., and Koonin, E.V. 1994. Detection of conserved segments in proteins: Iterative scanning of sequence databases with alignment blocks. Proc. Natl. Acad. Sci. U.S.A. 91:12091‐12095.
   Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position‐specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673‐4680.
   Waskiewicz, A.J., Rikhof, H.A., Hernandez, R.E., and Moens, C.B. 2001. Zebrafish Meis functions to stabilize Pbx proteins and regulate hindbrain patterning. Development 128:4139‐4151.
   Wootton, J.C. and Federhen, S. 1993. Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem. 17:149‐163.
Key References
   Henikoff and Henikoff, 1991. See above.
  Introduces the Blocks Database and describes how it is constructed using PROTOMAT and searched using Block Searcher.
   Pietrokovski, 1996. See above.
  Introduces LAMA for searching blocks versus a database of blocks as an example of searching multiple alignments against one another for sensitive detection of motifs.
   Rose et al., 1998. See above.
  Describes the CODEHOP strategy for detecting distant homologs using PCR and the Web‐based implementation for designing optimal CODEHOP primers.
Internet Resources
   http://blocks.fhcrc.org
  This is the Blocks Web page.
   http://www.proweb.org
  This is the ProWeb Web page.