Using the Gibbs Motif Sampler to Find Conserved Domains in DNA and Protein Sequences

William Thompson (1), Lee Ann McCue (2), Charles E. Lawrence (1)

(1) Brown University, Providence, Rhode Island; (2) Center for Bioinformatics, The Wadsworth Center, New York State Department of Health, Albany, New York
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 2.8
DOI: 10.1002/0471250953.bi0208s10
Online Posting Date: July 2005
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

The Gibbs Motif Sampler (Gibbs) is a software package for discovering conserved elements in biopolymer sequences. This unit describes the basic operation of the Web-based interface to Gibbs, presents advanced examples of its use, and introduces the Web interface to dscan, a program that searches sequence databases for other sequences containing similar motifs.
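
To give a sense of what discovering conserved elements by Gibbs sampling involves, the following is a minimal sketch of a toy site sampler for DNA. It is not the Gibbs package's implementation and is far simpler than it. It assumes exactly one motif occurrence per sequence, a fixed motif width, a uniform background, and sequences containing only A, C, G, and T. The names `column_counts` and `site_sampler` are hypothetical and chosen for illustration only.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA input with no ambiguity codes


def column_counts(seqs, positions, width, skip, pseudo=1.0):
    """Per-column residue counts over all current sites except sequence `skip`."""
    counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
    for i, (seq, pos) in enumerate(zip(seqs, positions)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[pos + j]] += 1
    return counts


def site_sampler(seqs, width, sweeps=200, seed=0):
    """Toy site sampler: returns one sampled motif start per sequence."""
    rng = random.Random(seed)
    # Start from random site positions
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(sweeps):
        for i, seq in enumerate(seqs):
            # Build the motif model from every sequence except the current one
            counts = column_counts(seqs, positions, width, skip=i)
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    col = counts[j]
                    p_motif = col[seq[start + j]] / sum(col.values())
                    w *= p_motif / 0.25  # ratio to the uniform background
                weights.append(w)
            # Resample this sequence's site in proportion to its weight
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

A call such as `site_sampler(["ACGTTGACATTT", "GGTTGACACCGA", "CCCATTGACAGG"], width=6)` returns one start position per sequence. Because the procedure is stochastic, different seeds can give different alignments, so runs are usually repeated and compared.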

Keywords: Gibbs sampling; transcription factor binding site; sequence alignment; motif; DNA; protein; phylogenetic footprinting; stochastic algorithm; Markov chain Monte Carlo; Bayesian statistics

     
 

Table of Contents

  • Basic Protocol 1: Running the Gibbs Motif Sampler
  • Basic Protocol 2: Searching for Other Sequences Containing Similar Motifs Using dscan
  • Guidelines for Understanding Results
  • Commentary
  • Appendix A
  • Appendix B
  • Literature Cited
  • Figures
  • Tables
     
 



Internet Resources

  http://bayesweb.wadsworth.org/gibbs/gibbs.html
  http://www.bioinfo.rpi.edu/applications/bayesian/gibbs/gibbs.html
  Web sites for running the Gibbs sampler.

  http://bayesweb.wadsworth.org/GIBBS-SAMPLER-ACADEMIC.htm
  http://bayesweb.wadsworth.org/GIBBS-SAMPLER-COMMERCIAL.htm
  The above sites provide information about obtaining Gibbs.

  http://bayesweb.wadsworth.org/gibbs/module
  Auxiliary data for running the examples.

  http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html#AA21
  IUPAC amino acid codes.

  http://bayesweb.wadsworth.org/web_help.PF.html
  http://bayesweb.wadsworth.org/web_help_text.CE.htm
  Annotated examples using Gibbs to analyze bacterial data.