Using PhyloCon to Identify Conserved Regulatory Motifs

Ting Wang¹

¹ University of California at Santa Cruz, Santa Cruz, California
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 2.12
DOI: 10.1002/0471250953.bi0212s19
Online Posting Date: September 2007


Understanding gene regulation has been and remains one of the major challenges for the molecular biology community. Gene regulation is mediated by a variety of short DNA sequences called regulatory elements, which include transcription factor binding sites. A first step toward understanding gene regulation is the identification of regulatory elements present in the genome. This challenge has been defined as the “motif finding problem” in the field of computational biology. Over the past 20 years, many algorithms have been developed to tackle the motif finding problem computationally. The PhyloCon algorithm, developed in 2003, is one of the first motif finding algorithms that take advantage of two important data resources, i.e., phylogenetic conservation and gene co‐regulation, to improve the efficiency of motif identification in biological datasets. This unit presents basic protocols to obtain, install, and apply the PhyloCon program and discusses the underlying algorithm and how to interpret the results. Curr. Protoc. Bioinform. 19:2.12.1‐2.12.29. © 2007 by John Wiley & Sons, Inc.

Keywords: motif discovery; comparative genomics; algorithm

Table of Contents

  • Introduction
  • Basic Protocol 1: Running the PhyloCon Program
  • Basic Protocol 2: Post‐Processing PhyloCon Results with Auxiliary Scripts
  • Support Protocol 1: Obtaining and Installing the PhyloCon Software
  • Support Protocol 2: Understanding PhyloCon's File Format
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
