PipMaker: A World Wide Web Server for Genomic Sequence Alignments

Laura Elnitski¹, Cathy Riemer¹, Scott Schwartz¹, Ross Hardison¹, Webb Miller¹

¹ The Pennsylvania State University, University Park, Pennsylvania
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 10.2
DOI:  10.1002/0471250953.bi1002s00
Online Posting Date:  February, 2003
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

PipMaker is a World Wide Web site used to compare two long genomic sequences and identify conserved segments between them. This unit describes how to use the PipMaker server and explains the resulting output files. PipMaker aligns genomic sequences efficiently and returns a compact, easy‐to‐interpret form of output, the percent identity plot (pip). For each aligning segment between the two sequences, the pip shows both its position relative to the first sequence and its degree of similarity. Optional annotations on the pip provide additional information to help interpret the alignment. The default parameters of the underlying blastz alignment program are tuned for human‐mouse alignments.
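The two quantities a pip plots for each segment can be made concrete with a short calculation. The following Python sketch is illustrative only and is not PipMaker or blastz code. It assumes, as a simplification, that each plotted segment is a gap‐free block of the alignment (a gap column ends the current segment) and that percent identity is the number of identical columns divided by the segment length, times 100. The function name gap_free_segments is hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int, float]  # (begin, end, percent identity)


def gap_free_segments(row1: str, row2: str, start1: int) -> List[Segment]:
    """Split a gapped pairwise alignment into gap-free segments.

    row1 and row2 are the two aligned rows ('-' marks a gap); start1 is the
    1-based position in the first sequence where row1 begins. Each segment is
    reported as (begin, end, percent identity), with begin and end given in
    first-sequence coordinates, i.e. the horizontal extent on a pip.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    segments: List[Segment] = []
    pos = start1 - 1          # last first-sequence position consumed so far
    begin = 0                 # first-sequence start of the current segment
    matches = length = 0      # identities and columns in the current segment
    for a, b in zip(row1.upper(), row2.upper()):
        if a == "-" or b == "-":
            if length:        # a gap column closes the open segment
                segments.append((begin, pos, 100.0 * matches / length))
            matches = length = 0
            if a != "-":      # a gap in row2 still consumes a first-sequence base
                pos += 1
            continue
        pos += 1
        if length == 0:
            begin = pos
        length += 1
        matches += int(a == b)
    if length:
        segments.append((begin, pos, 100.0 * matches / length))
    return segments
```

For example, gap_free_segments("ACGTACGT--ACGT", "ACGTTCGTAAACGA", 101) returns [(101, 108, 87.5), (109, 112, 75.0)]: two segments that would appear on a pip as horizontal lines at 87.5% and 75% identity over positions 101 to 108 and 109 to 112 of the first sequence.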


Table of Contents

  • Strategic Planning
  • Basic Protocol 1: Submitting Sequences to PipMaker
  • Support Protocol 1: Generating a Repeats File for Use with PipMaker
  • Support Protocol 2: Generating an Exons File for Use with PipMaker
  • Support Protocol 3: Generating Color Underlays for Use with PipMaker
  • Support Protocol 4: Generating Annotation Files for Use with PipMaker
  • Support Protocol 5: Installing Stand‐Alone Blastz
  • Guidelines for Understanding Results
  • Commentary
  • Figures

Materials

Basic Protocol 1: Submitting Sequences to PipMaker

  Necessary Resources
  • Hardware
    • PipMaker can be used from any computer with a World Wide Web browser and E‐mail access.
  • Software
    • PipMaker is accessible through a Web interface at http://bio.cse.psu.edu/. All output files are returned to the user by E‐mail, so the E‐mail account and software must be able to handle large messages. Viewing the pip or dot plot requires a PDF viewer such as Aladdin GhostScript or Adobe Acrobat Reader, available for free download at http://www.cs.wisc.edu/~ghost/ and http://www.adobe.com/, respectively. Acrobat Reader currently offers better support for the hyperlinks that PipMaker can optionally embed in its PDF files. PipMaker can also generate PostScript versions of the output files, which is useful for importing a plot into a graphics program in preparation for publication.
  • Files
    • The following file types are used:
      • Sequences: The PipMaker server accepts two DNA sequences, in FASTA format (appendix 1B) only. The sequence files must be plain text consisting of the characters A, C, G, T, N, and X, typically uppercase, with lines no longer than about 70 characters. The first sequence should be one contiguous piece, whereas the second sequence can consist of unordered, unoriented contigs.
      • Repeats file (see Support Protocol 1)
      • Exons file (optional; see Support Protocol 2)
      • Underlay file (optional; see Support Protocol 3)
      • Annotation file (optional; see Support Protocol 4)
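The sequence‐file requirements above can be checked locally before submission. The following Python sketch is a hypothetical helper, not part of PipMaker: it parses FASTA text, rejects characters other than A, C, G, T, N, and X (in either case, keeping the original case), requires the first file to hold exactly one record, accepts any number of contigs in the second, and re‐wraps lines to 70 characters. The 70‐character width and all function names are assumptions chosen for illustration.

```python
import re
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # (header, sequence)

# Residue characters the server is described as accepting. Matching is done on an
# uppercased copy because sequences are only "typically" uppercase.
_VALID = re.compile(r"[ACGTNX]*")
LINE_WIDTH = 70  # assumption: lines should be within ~70 characters


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) records, rejecting other characters."""
    records: List[Record] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError(f"line {lineno}: sequence data before the first '>' header")
        elif not _VALID.fullmatch(line.upper()):
            raise ValueError(f"line {lineno}: characters other than A, C, G, T, N, X")
        else:
            chunks.append(line)  # original case is preserved
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_pair(first_text: str, second_text: str) -> Tuple[List[Record], List[Record]]:
    """Require one contiguous first sequence and at least one contig in the second."""
    first = parse_fasta(first_text)
    second = parse_fasta(second_text)
    if len(first) != 1:
        raise ValueError(f"first sequence must be one contiguous piece, found {len(first)} records")
    if not second:
        raise ValueError("second sequence file contains no FASTA records")
    return first, second


def format_fasta(records: List[Record], width: int = LINE_WIDTH) -> str:
    """Re-wrap sequences so that no sequence line is longer than `width` characters."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

A typical use is to call check_pair on the contents of the two files and, if it succeeds, write format_fasta(first) and format_fasta(second) back out as the files to upload. Raising an error, rather than silently dropping unexpected characters, leaves the decision about how to clean the sequence to the user.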

Support Protocol 5: Installing Stand‐Alone Blastz

  Necessary Resources
  • Hardware
    • The authors test and use Blastz on Solaris/SPARC and Linux/x86 platforms, but it should be portable to virtually any ANSI/POSIX system, including Windows and Macintosh.
  • Software
    • The current development snapshot of Blastz is available as a tar.gz file from the authors' Web site (http://bio.cse.psu.edu/). Unpacking it requires tar and gzip (or compatible programs); compiling and installing it requires an ANSI‐compatible C compiler and the make utility.
  • Files
    • The stand‐alone Blastz program uses the same sequence and repeats files as the PipMaker Web server (see Basic Protocol 1).
NOTE: for an introduction to Unix, see appendix 1C.
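The unpack‐and‐compile steps listed under Software can also be scripted. The following Python sketch is illustrative only: the archive and directory names are hypothetical, and it assumes the snapshot unpacks into a directory containing a Makefile that a plain make invocation builds. Consult the snapshot's own instructions for the actual build and install targets.

```python
import subprocess
import tarfile
from pathlib import Path
from typing import List


def unpack_snapshot(tarball: str, dest: str) -> List[str]:
    """Unpack a .tar.gz snapshot into `dest` and return its top-level entries."""
    dest_path = Path(dest).resolve()
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest_path / member.name).resolve()
            # Refuse entries that would be written outside `dest` (e.g. "../x").
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions provide a built-in extraction filter; use it when present.
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
    return sorted({Path(m.name).parts[0] for m in members if Path(m.name).parts})


def build(source_dir: str) -> None:
    """Run `make` in the unpacked tree; requires an ANSI C compiler and make on PATH."""
    subprocess.run(["make"], cwd=source_dir, check=True)
```

For example, top = unpack_snapshot("blastz.tar.gz", "blastz-build") followed by build(str(Path("blastz-build") / top[0])) would unpack a (hypothetically named) snapshot and compile it. Keeping unpacking and compiling as separate functions makes it possible to inspect the unpacked sources before running make.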
Literature Cited
   Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
   Bulger, M., van Doorninck, J.H., Saitoh, N., Telling, A., Farrell, C., Bender, M.A., Felsenfeld, G., Axel, R., and Groudine, M. 1999. Conservation of sequence and structure flanking the mouse and human beta‐globin loci: The beta‐globin genes are embedded within an array of odorant receptor genes. Proc. Natl. Acad. Sci. U.S.A. 96:5129‐5134.
   Bulger, M., Bender, M.A., van Doorninck, J.H., Wertman, B., Farrell, C.M., Felsenfeld, G., Groudine, M., and Hardison, R. 2000. Comparative structural and functional analysis of the olfactory receptor genes flanking the human and mouse beta‐globin gene clusters. Proc. Natl. Acad. Sci. U.S.A. 97:14560‐14565.
   Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78‐94.
   Chiaromonte, F., Yap, V.B., and Miller, W. 2002. Scoring pairwise genomic sequence alignments. Pac. Symp. Biocomput. 2002:115‐126.
   Elnitski, L., Riemer, C., Petrykowska, H., Florea, L., Schwartz, S., Hardison, R., and Miller, W. 2002. PipTools: A computational toolkit to prepare and evaluate annotated pairwise comparisons of genomic sequences. Genomics. In press.
   Endrizzi, M.G., Hadinoto, V., Growney, J.D., Miller, W., and Dietrich, W.F. 2000. Genomic sequence analysis of the mouse Naip gene array. Genome Res. 10:1095‐1102.
   Florea, L., Riemer, C., Schwartz, S., Zhang, Z., Stojanovic, N., Miller, W., and McClelland, M. 2000. Web‐based visualization tools for bacterial genome alignments. Nucleic Acids Res. 28:3486‐3496.
   Gumucio, D., Shelton, D., Zhu, W., Millinoff, D., Gray, T., Bock, J., Slightom, J., and Goodman, M. 1996. Evolutionary strategies for the elucidation of cis and trans factors that regulate the developmental switching programs of the beta‐like globin genes. Mol. Phylogenet. Evol. 5:18‐32.
   Hardison, R. and Miller, W. 1993. Use of long sequence alignments to study the evolution and regulation of mammalian globin gene clusters. Mol. Biol. Evol. 10:73‐102.
   Hardison, R., Slightom, J.L., Gumucio, D.L., Goodman, M., Stojanovic, N., and Miller, W. 1997. Locus control regions of mammalian globin gene clusters: Combining phylogenetic analyses and experimental results to gain functional insights. Gene 205:73‐94.
   Jang, W., Hua, A., Spilson, S.V., Miller, W., Roe, B.A., and Meisler, M.H. 1999. Comparative sequence of human and mouse BAC clones from the mnd2 region of chromosome 2p13. Genome Res. 9:53‐61.
   Kent, W.J. and Zahler, A.M. 2000. Conservation, regulation, synteny, and introns in a large‐scale C. briggsae–C. elegans genomic alignment. Genome Res. 10:1115‐1125.
   Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The Human Genome Browser at UCSC. Genome Res. 12:996‐1006.
   Kurihara, L.J., Semenova, E., Miller, W., Ingram, R.S., Guan, X.J., and Tilghman, S.M. 2002. Candidate genes required for embryonic development: A comparative analysis of distal mouse chromosome 14 and human chromosome 13q22. Genomics 79:154‐161.
   Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860‐921.
   Liang, Y., Wang, A., Belyantseva, I., Anderson, D., Probst, F.J., Barber, T.D., Miller, W., Touchman, J., Jin, L., and Sullivan, S. 1999. Structure and expression of the human and mouse novel unconventional myosin XV genes responsible for hereditary deafness, DFNB3 and shaker‐2. Genomics 61:243‐258.
   Loots, G.G., Locksley, R.M., Blankespoor, C.M., Wang, Z.E., Miller, W., Rubin, E.M., and Frazer, K.A. 2000. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross‐species sequence comparisons. Science 288:136‐140.
   McClelland, M., Florea, L., Sanderson, K., Clifton, S.W., Parkhill, J., Churcher, C., Dougan, G., Wilson, R.K., and Miller, W. 2000. Comparison of the Escherichia coli K‐12 genome with sampled genomes of a Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi. Nucleic Acids Res. 28:4974‐4986.
   Oeltjen, J.C., Malley, T.M., Muzny, D.M., Miller, W., Gibbs, R.A., and Belmont, J.W. 1997. Large‐scale comparative sequence analysis of the human and murine Bruton's tyrosine kinase loci reveals conserved regulatory domains. Genome Res. 7:315‐329.
   Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. 2000. PipMaker: A web server for aligning two genomic DNA sequences. Genome Res. 10:577‐586.
   Waterston, R., Lindblad‐Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., Antonarakis, S.E., Attwood, J., Baertsch, R., Bailey, J., Barlow, K., Beck, S., Berry, E., Birren, B., Bloom, T., Bork, P., Botcherby, M., Bray, N., Brent, M.R., Brown, D.B., Bult, C., Burton, J., Butler, J., Campbell, R.D., Carninci, P., Cawley, S., Chinwalla, A., Church, D., Clamp, M., Clee, C., Collins, F.S., Cook, L., Copley, R.R., Coulson, A., Couronne, O., Cuff, J., Curwen, V., Cutts, T., Daly, M., David, R., Davies, J., Delehaunty, K., Deri, J., Dermitzakis, E.T., Dewey, C., Dickens, N.J., Diekhans, M., Dodge, S., Dubchak, I., Dunn, D.M., Eddy, S.R., Elnitski, L., Emes, R.D., Eswara, P., Eyras, E., Felsenfeld, A., Fewell, G., Flicek, P., Foley, K., Frankel, W.N., Fulton, L., Fulton, R., Furey, T.S., Gage, D., Gibbs, R.A., Glusman, G., Gnerre, S., Goldman, N., Goodstadt, L., Graffham, D., Graves, T., Green, E.D., Gregory, S., Guigo, R., Guyer, M., Hardison, R.C., Haussler, D., Hayashizaki, Y., Hillier, L., Hinrichs, A., Hlavina, W., Holzer, T., Hsu, F., Hua, A., Hubbard, T., Hunt, A., Jackson, I., Jaffe, D.B., Johnson, L.S., Jones, M., Jones, T.A., Joy, A., Kamal, M., Karlsson, E.K., Karolchik, D., Kasprzyk, A., Kawai, A., Keibler, E., Kells, C., Kent, W.J., Kirby, A., Kolbe, D., Korf, I., Kucherlapati, R.S., Kulbokas, R.J. III, Kulp, D., Landers, T., Leger, J.P., Leonard, S., Letunic, I., Levine, R., Li, J., Li, M., Lloyd, C., Lucas, S., Ma, B., Maglott, D.R., Maier, J., Mardis, E.R., Matthews, L., Mauceli, E., Mayer, J.H., McCarthy, M., McCombie, R., McLaren, S., McLay, K., McPherson, J., Meldrim, J., Meredith, B., Mesirov, J.P., Miller, W., Miner, T., Mongin, E., Montgomery, K.T., Morgan, M., Mott, R., Mullikin, J.C., Muzny, D.M., Nash, W., Nelson, J., Nhan, M., Nicol, R., Ning, Z., Nusbaum, C., O'Connor, M.J., Okazaki, Y., Oliver, K., Overton‐Larty, E., Pachter, L., Parra, G., Pepin, K., Peterson, J., Pevzner, P., Plumb, R., Pohl, C., Poliakov, A., Ponce, T., Ponting, C., Potter, S., Quail, M., Reymond, A., Roe, B.A., Roskin, K.M., Rubin, E., Rust, A.G., Santos, R., Sapojnikov, V., Schultz, B., Schultz, J., Schwartz, M.S., Schwartz, S., Scott, C., Seaman, S., Searle, S., Sharpe, T., Sheridan, A., Shownkeen, R., Sims, S., Singer, J.B., Slater, G., Smit, A., Smith, D.R., Spencer, B., Stabenau, A., Stange‐Thomann, N., Sugnet, C., Suyama, M., Tesler, G., Thompson, J., Torrents, D., Trevaskis, E., Tromp, J., Ucla, C., Ureta‐Vidal, A., Vinson, J.P., von Niederhausern, A.C., Wade, C.M., Wall, M., Weber, R.J., Weiss, R.B., Wendl, M., West, T., Wetterstrand, C., Wheeler, R., Wierzbowski, J., Willey, T., Williams, S., Wilson, R., Winter, E., Worley, K.C., Wyman, D., Yang, S., Yang, S.‐P., Zdobnov, E., Zody, M.C., and Lander, E.S. 2002. Initial sequencing and comparative analysis of the mouse genome. In press.
   Wiehe, T., Gebauer‐Jung, S., Mitchell‐Olds, T., and Guigo, R. 2001. SGP‐1: Prediction and validation of homologous genes based on sequence alignments. Genome Res. 11:1574‐1583.
   Wilson, M.D., Riemer, C., Martindale, D.W., Schnupf, P., Boright, A.P., Cheung, T.L., Hardy, D.M., Schwartz, S., Scherer, S.W., Tsui, L.C., Miller, W., and Koop, B.F. 2001. Comparative analysis of the gene‐dense ACHE/TFR2 region on human chromosome 7q22 with the orthologous region on mouse chromosome 5. Nucleic Acids Res. 29:1352‐1365.
Key References
   Schwartz et al. 2000. See above.
  Detailed documentation for the PipMaker server.
Internet Resources
   http://bio.cse.psu.edu/
  Links to alignment programs and complementary programs, including the PipMaker server homepage, the list of PipMaker underlay and annotation colors, PipMaker examples, whole genome human/mouse homology, and Laj download and instruction site.
   http://www.cs.wisc.edu/~ghost/
  Aladdin homepage (for the GhostScript program).
   http://www.adobe.com/
  Adobe homepage (for the Acrobat Reader program).
   http://ftp.genome.washington.edu/cgi‐bin/RepeatMasker/
  RepeatMasker Web site.
   http://www.ncbi.nlm.nih.gov/
  Contains a link to the GenBank Web site.
   http://genome.cse.ucsc.edu/
  Human Genome Browser.
   http://www.ensembl.org/
  Ensembl Genome Browser.
   http://soft.ice.mpg.de/sgp‐1
  SGP‐1 (Syntenic Gene Prediction Program) server.