Using the Gibbs Motif Sampler to Find Conserved Domains in DNA and Protein Sequences

William Thompson1, Lee Ann McCue2, Charles E. Lawrence1

1 Brown University, Providence, Rhode Island; 2 Center for Bioinformatics, The Wadsworth Center, New York State Department of Health, Albany, New York
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 2.8
DOI:  10.1002/0471250953.bi0208s10
Online Posting Date:  July, 2005
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The Gibbs Motif Sampler (Gibbs) is a software package for discovering conserved elements in biopolymer sequences. This unit describes the basic operation of the Web‐based interface to Gibbs, along with advanced examples of its use, and the Web interface to dscan, a sequence database search program.

Keywords: Gibbs sampling; Transcription factor binding site; Sequence alignment; Motif; DNA; Protein; Phylogenetic footprinting; Stochastic algorithm; Markov chain Monte‐Carlo; Bayesian statistics

Table of Contents

  • Basic Protocol 1: Running the Gibbs Motif Sampler
  • Basic Protocol 2: Searching for Other Sequences Containing Similar Motifs Using dscan
  • Guidelines for Understanding Results
  • Commentary
  • Appendix A
  • Appendix B
  • Literature Cited
  • Figures
  • Tables

Literature Cited

  Altschul, S.F. and Lipman, D.J. 1990. Protein database searches for multiple alignments. Proc. Natl. Acad. Sci. U.S.A. 87:5509‐5513.
  Altschul, S.F., Boguski, M.S., Gish, W., and Wootton, J.C. 1994. Issues in searching molecular sequence databases. Nat. Genet. 6:119.
  Bailey, T.L. and Elkan, C. 1995. Unsupervised learning of multiple motifs in biopolymers using EM. Machine Learning 21:51‐80.
  Claverie, J.M. and States, D.J. 1993. Information enhancement methods for large scale sequence analysis. Comput. Chem. 17:191‐201.
  Florczyk, M.A., McCue, L.A., Purkayastha, A., Currenti, E., Wolin, M.J., and McDonough, K.A. 2003. A family of acr‐coregulated Mycobacterium tuberculosis genes shares a common DNA motif and requires Rv3133c (dosR or devR) for expression. Infect. Immun. 71:5332‐5343.
  Lawrence, C.E. and Reilly, A.A. 1990. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins Struct. Funct. Genet. 7:41‐51.
  Lawrence, C., Altschul, S., Boguski, M., Liu, J., Neuwald, A., and Wootton, J. 1993. Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science 262:208‐214.
  Liu, J.S. and Lawrence, C.E. 1999. Bayesian inference on biopolymer models. Bioinformatics 15:38‐52.
  Liu, J., Neuwald, A., and Lawrence, C. 1995. Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc. 90:1156‐1170.
  Liu, X., Brutlag, D.L., and Liu, J.S. 2001. BioProspector: Discovering conserved DNA motifs in upstream regulatory regions of co‐expressed genes. In Proceedings of the Pacific Symposium on Biocomputing, pp. 127‐138. World Scientific Press, Hawaii.
  McCue, L., Thompson, W., Carmack, C., Ryan, M.P., Liu, J.S., Derbyshire, V., and Lawrence, C.E. 2001. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucl. Acids Res. 29:774‐782.
  McCue, L.A., Thompson, W., Carmack, C.S., and Lawrence, C.E. 2002. Factors influencing the identification of transcription factor binding sites by cross‐species comparison. Genome Res. 12:1523‐1532.
  Neuwald, A., Liu, J., and Lawrence, C. 1995. Gibbs motif sampling: Detection of bacterial outer membrane protein repeats. Protein Sci. 4:1618‐1632.
  Schneider, T.D. and Stephens, R.M. 1990. Sequence logos: A new way to display consensus sequences. Nucl. Acids Res. 18:6097‐6100.
  Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.C., Haussler, D., and Miller, W. 2003. Human‐mouse alignments with BLASTZ. Genome Res. 13:103‐107.
  Sherman, D.R., Voskuil, M., Schnappinger, D., Liao, R., Harrell, M.I., and Schoolnik, G.K. 2001. Regulation of the Mycobacterium tuberculosis hypoxic response gene encoding alpha‐crystallin. Proc. Natl. Acad. Sci. U.S.A. 98:7534‐7539.
  Thompson, W., Rouchka, E.C., and Lawrence, C.E. 2003. Gibbs Recursive Sampler: Finding transcription factor binding sites. Nucl. Acids Res. 31:3580‐3585.
  Thompson, W., Palumbo, M.J., Wasserman, W.W., Liu, J.S., and Lawrence, C.E. 2004. Decoding human regulatory circuits. Genome Res. 14:1967‐1974.
  Wanner, B.L. 1996. Phosphorus assimilation and control of the phosphate regulon. In Escherichia coli and Salmonella: Cellular and Molecular Biology (F.C. Neidhardt, ed.), pp. 1357‐1381. ASM Press, Washington, D.C.
  Webb, B.J., Liu, J.S., and Lawrence, C.E. 2002. BALSA: Bayesian algorithm for local sequence alignment. Nucl. Acids Res. 30:1268‐1277.
Internet Resources
  Web sites for running the Gibbs sampler
  The above sites provide information about obtaining Gibbs.‐SAMPLER‐ACADEMIC.htm
  Auxiliary data for running the examples‐SAMPLER‐COMMERCIAL.htm
  IUPAC amino acid codes
  Annotated examples using Gibbs to analyze bacterial data