PipMaker: A World Wide Web Server for Genomic Sequence Alignments

Laura Elnitski, Cathy Riemer, Scott Schwartz, Ross Hardison, and Webb Miller

The Pennsylvania State University, University Park, Pennsylvania

Unit Number: UNIT 10.2
DOI: 10.1002/0471250953.bi1002s00
Online Posting Date: February, 2003
GO TO THE FULL TEXT:
PDF or HTML at Wiley Online Library

Abstract

PipMaker is a World Wide Web site used to compare two long genomic sequences and identify the conserved segments between them. This unit describes how to use the PipMaker server and explains the resulting output files. PipMaker aligns genomic sequences efficiently. It returns a compact, easy-to-interpret form of output called the percent identity plot (pip). For each aligning segment between the two sequences, the pip shows both its position relative to the first sequence and its degree of similarity (percent identity). Optional annotations on the pip provide additional information to help interpret the alignment. The default parameters of the underlying Blastz alignment program are tuned for human-mouse alignments.
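The pip reduces an alignment to horizontal lines. Each line is placed by its position in the first sequence and by its percent identity. The following is a minimal Python sketch of that reduction. It assumes that each plotted line corresponds to one gap-free piece of the alignment. It also assumes that the identity axis starts at 50%, as described for Figure 10.2.2. The names `gap_free_segments` and `pip_lines` are hypothetical and are not part of PipMaker.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GapFreeSegment:
    start1: int      # 1-based start in the first sequence
    end1: int        # 1-based inclusive end in the first sequence
    identity: float  # percent identity over the segment


def gap_free_segments(row1: str, row2: str, start1: int = 1) -> list[GapFreeSegment]:
    """Split a gapped pairwise alignment into gap-free segments.

    row1 and row2 are the two aligned rows ('-' marks a gap); start1 is the
    first-sequence coordinate of the first residue in row1.
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have the same length")
    segments: list[GapFreeSegment] = []
    pos1 = start1                        # coordinate of the next residue in sequence 1
    seg_start, matches, length = None, 0, 0
    for a, b in zip(row1.upper(), row2.upper()):
        if a == "-" or b == "-":
            if length:  # a gap ends the current gap-free run
                segments.append(GapFreeSegment(seg_start, pos1 - 1, 100.0 * matches / length))
            seg_start, matches, length = None, 0, 0
            if a != "-":
                pos1 += 1  # a gap in row2 still consumes a first-sequence position
            continue
        if seg_start is None:
            seg_start = pos1
        matches += int(a == b)
        length += 1
        pos1 += 1
    if length:
        segments.append(GapFreeSegment(seg_start, pos1 - 1, 100.0 * matches / length))
    return segments


def pip_lines(segments: list[GapFreeSegment], floor: float = 50.0) -> list[tuple[int, int, float]]:
    """Keep the segments that would fall on a pip whose identity axis starts at `floor` percent."""
    return [(s.start1, s.end1, s.identity) for s in segments if s.identity >= floor]
```

For example, `gap_free_segments("ACGTACGT--ACGT", "ACGTTCGTAAACG-")` yields two segments. The first covers first-sequence positions 1-8 at 87.5% identity, and the second covers positions 9-11 at 100%.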

     
 

Table of Contents

  • Unit Introduction
  • Strategic Planning
  • Basic Protocol: Submitting Sequences to PipMaker
  • Support Protocol 1: Generating a Repeats File for Use with PipMaker
  • Support Protocol 2: Generating an Exons File for Use with PipMaker
  • Support Protocol 3: Generating Color Underlays for Use with PipMaker
  • Support Protocol 4: Generating Annotation Files for Use with PipMaker
  • Support Protocol 5: Installing Stand-Alone Blastz
  • Guidelines for Understanding Results
  • Commentary
  • Bibliography
  • Figures
     
 

Materials

Basic Protocol: Submitting Sequences to PipMaker

 Necessary Resources
  • Hardware
    • PipMaker can be accessed and used by any computer with a World Wide Web browser and E-mail access.
  • Software
    • PipMaker is accessible via a Web interface at http://bio.cse.psu.edu/. All output files are returned to the user by E-mail, so the E-mail account and software must be able to handle large messages. Viewing the pip or dot plot requires a PDF viewer such as Aladdin Ghostscript or Adobe Acrobat Reader. Both are available as free downloads from http://www.cs.wisc.edu/~ghost/ and http://www.adobe.com/, respectively. At present, Acrobat Reader has better support for hyperlinks in PDF files, which PipMaker can optionally include. PipMaker can also optionally generate a PostScript version of the output files. This is useful for importing the plot into a graphics program when preparing figures for publication.
  • Files
    • The following file types are used:
      • Sequences: The PipMaker server accepts two DNA sequences, which must be in FASTA format (appendix 1B). These sequence files must be plain text consisting of the characters A, C, G, T, N, and X (typically uppercase). Lines should be no longer than about 70 characters. The first sequence should be a single contiguous piece, whereas the second sequence can consist of unordered, unoriented contigs.
      • Repeats file (see Support Protocol 1)
      • Exons file (optional; see Support Protocol 2)
      • Underlay file (optional; see Support Protocol 3)
      • Annotation file (optional; see Support Protocol 4)
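The sequence-file requirements listed above can be checked locally before submission. The following Python sketch is not part of PipMaker, and the helper names are hypothetical. It treats the "about 70 characters" guideline as a hard limit. It also tolerates lowercase letters by upper-casing the sequence before checking it.

```python
from __future__ import annotations

ALLOWED = set("ACGTNX")  # characters listed for PipMaker sequence files
MAX_LINE = 70            # "about 70 characters", treated here as a hard limit


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA text."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_pipmaker_input(text: str, first: bool) -> list[str]:
    """List problems with one sequence file; an empty list means none were found.

    first=True enforces that the first sequence is a single contiguous piece.
    """
    try:
        records = parse_fasta(text)
    except ValueError as err:
        return [str(err)]
    if not records:
        return ["no FASTA records found"]
    problems = []
    if first and len(records) > 1:
        problems.append(f"first sequence must be one contiguous piece; found {len(records)} records")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.startswith(">") and len(line.strip()) > MAX_LINE:
            problems.append(f"line {lineno} is longer than {MAX_LINE} characters")
    for header, seq in records:
        bad = set(seq.upper()) - ALLOWED  # lowercase is tolerated by upper-casing
        if bad:
            problems.append(f"record '{header}' has unexpected characters: {''.join(sorted(bad))}")
    return problems
```

A second-sequence file with several `>` records passes with `first=False`. The same file is flagged with `first=True`.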

Support Protocol 5: Installing Stand-Alone Blastz

 Necessary Resources
  • Hardware
    • The authors test and use Blastz on Solaris/Sparc and Linux/x86 platforms, but it should be portable to virtually any ANSI/POSIX system, including Windows and Macintosh.
  • Software
    • The current development snapshot of Blastz is available on the authors' Web site (http://bio.cse.psu.edu/), in a tar.gz file. To unpack it, tar and gzip (or compatible programs) will be needed. An ANSI-compatible C compiler and the make utility will be needed to compile and install it.
  • Files
    • The stand-alone version of the Blastz program uses the same sequence and repeats files as the PipMaker Web server (see Basic Protocol).

NOTE: for an introduction to Unix, see appendix 1C.
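Unpacking and compiling Blastz is normally done directly with `tar` and `make`. The Python sketch below only scripts those two steps using the standard `tarfile` and `subprocess` modules. The snapshot's internal layout is not described here. The sketch therefore assumes, hypothetically, that the archive contains a single top-level directory holding the Makefile. Passing `run_make=False` performs a dry run that only unpacks the archive.

```python
from __future__ import annotations

import subprocess
import tarfile
from pathlib import Path


def unpack_and_build(archive: str | Path, dest: str | Path, run_make: bool = True) -> Path:
    """Unpack a .tar.gz source snapshot into dest and optionally run make.

    Returns the directory where make is (or would be) run: the archive's single
    top-level directory if there is exactly one, otherwise dest itself.
    """
    dest = Path(dest).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse archive members that would be written outside dest.
        for member in members:
            if not (dest / member.name).resolve().is_relative_to(dest):
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # safer extraction where supported
        else:
            tar.extractall(dest)
    tops = {Path(m.name).parts[0] for m in members if Path(m.name).parts}
    src_dir = dest / tops.pop() if len(tops) == 1 else dest
    if run_make:
        # Needs make and an ANSI C compiler; raises CalledProcessError if the build fails.
        subprocess.run(["make"], cwd=src_dir, check=True)
    return src_dir
```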

     
 

Figures

  • Figure 10.2.1
    Flowchart outlining the steps for using PipMaker.

  • Figure 10.2.2
    A pip showing the region of human chromosome 22 associated with velocardiofacial syndrome, aligned with the orthologous region in mouse. A 60-kb segment surrounding the CDC45L and CLDN5 genes illustrates the alignments and annotations on a pip. Each aligning segment is displayed as a series of horizontal lines whose positions correspond to the first sequence used in the alignment. The aligning segments are drawn according to their percent identity, which is shown on the vertical axis from 50% to 100%. Several optional annotation files allow the user to augment the information content of the display. For instance, the names of genes and their direction of transcription are part of the exons file format. An underlay file specifies the colors used to paint genomic features such as exons (e.g., light blue), introns (e.g., light yellow), UTRs (e.g., light orange), and conserved noncoding regions (e.g., shades of red). An annotations file adds colored horizontal lines above the alignment. In the PDF file these lines are hyperlinks to relevant Internet sites, e.g., the appropriate PubMed citation(s) for the gene (red), the LocusLink entry for the gene (blue), or a protein sequence from GenBank (green). The bookmarks along the left side link to compiled information about the various genes and other annotations describing the comparative analysis of the sequences. The bookmarks cover a much larger region than that shown in the image.

  • Figure 10.2.3
    A dot plot of the 1.5-Mb region from human chromosome 22 associated with velocardiofacial syndrome, aligned to the orthologous sequences from mouse. Annotations used in the alignment are displayed along the horizontal axis as gene names with the direction of transcription. The mouse sequence is represented by two contigs that are labeled along the vertical axis of the plot (gi: 20346266 and 20346218). The Order and Orient option attempts to arrange the mouse sequences in the same relative order as the human and indicates the presence of rearrangements in the mouse sequence relative to the human sequence. The dot plot uses the same underlay file as the pip to color the image. Note that the gene names used in this example are not all recognized by the HUGO nomenclature committee, and serve as illustrations only.

  • Figure 10.2.4
    Illustration of the PipMaker options (A) Show All Matches, (B) Chaining, and (C) Single Coverage. The human β-globin gene locus contains a family of duplicated genes. When this locus is aligned to the orthologous region in mouse, it shows matches to multiple family members in the mouse globin locus. The Show All Matches option reveals extensive sequence similarity between the globin gene clusters in the human and mouse sequences. Panel A shows, from left to right, a dot plot of the extended genomic region and the pip of the human δ-globin gene, HBD. In addition, a cluster of related olfactory receptor genes (ORGs) surrounds the globin locus in both species and creates the checkerboard pattern. The Chaining option reduces the number of alignments by keeping only those that appear in the same relative order in the two species (panel B, leftmost box). For this reason, most of the ORG alignments disappear. One aligning segment is identified for each human globin gene, and multiple hits are removed from the pip (panel B, rightmost box). The Single Coverage option identifies the highest-scoring alignments and allows any position in the first sequence to align only once to the second sequence. Therefore, no two alignments occupy the same vertical space, although they may appear very close if the display resolution is insufficient. In the globin locus, several human genes are most similar to the same sequence in the mouse locus (see panel C, leftmost and middle boxes) and therefore appear on the same horizontal line. The pip in panel C (rightmost box) shows more alignments than the Chaining option does. This is because the best match is not restricted to being in the same relative order along the two sequences.

  • Figure 10.2.5
    Example of documentation output from RepeatMasker.

  • Figure 10.2.6
    Alternate format for repeats file.

  • Figure 10.2.7
    An exons file containing two genes from human chromosome 22 that are transcribed in opposite orientations.

  • Figure 10.2.8
    An example of an underlay file which refers to Figure 10.2.7.

  • Figure 10.2.9
    Alternate format for an underlay file which refers to Figure 10.2.8.

  • Figure 10.2.10
    An example of a line in the body of an underlay file used to paint just the upper or lower half of a region by using a + or - sign.

  • Figure 10.2.11
    A sample type definition entry for the header of an annotation file for Advanced PipMaker.

  • Figure 10.2.12
    A sample annotation entry for the body of an annotation file for Advanced PipMaker.

  • Figure 10.2.13
    Compact overview of the alignment. The two-panel image shows the locations of aligned regions (upper panel) and the position of colors specified by the underlay file (lower panel). Green bars represent all regions within an alignment and red bars are those regions that align at a high level of similarity (at least 100 bp without a gap and with at least 70% nucleotide identity). The colors are specified from the underlay file and the gene names and directionality come from the exons file.

  • Figure 10.2.14
    Icons used in a pip that represent features in a genomic sequence, such as exons, repeats, and CpG islands.

  • Figure 10.2.15
    An example of the concise output file.

  • Figure 10.2.16
    The traditional textual form of alignment output.

  • Figure 10.2.17
    Analysis of Exons output when the second sequence has only one contig.

  • Figure 10.2.18
    Putative coding sequence from optional Analysis of Exons output.

  • Figure 10.2.19
    A sample index for Analysis of Exons output when the second sequence file contains multiple contigs.

  • Figure 10.2.20
    A sample listing of the exon positions for Analysis of Exons output (analogous to Fig. 10.2.17) when the second sequence file contains multiple contigs.

  • Figure 10.2.21
    Text file describing the predicted arrangement of ordered and oriented contigs.

  • Figure 10.2.22
    An example of a second sequence consisting of two segments separated by 100 Ns, which is then treated as shown in Figure 10.2.23.

  • Figure 10.2.23
    See Figure 10.2.22.

  • Figure 10.2.24
    Example messages from a PDF file with embedded contig names.

  • Figure 10.2.25
    The default scoring matrix.
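Figure 10.2.4 contrasts Show All Matches with Chaining, which keeps only alignments that appear in the same relative order in both sequences. This page does not describe PipMaker's actual chaining procedure. The Python sketch below is therefore a generic O(n²) dynamic-programming approach to colinear chaining, offered only to illustrate what "same relative order" means. The `Block` record and the `best_chain` function are hypothetical. The sketch ignores reverse-strand alignments.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    start1: int   # start in the first sequence
    end1: int     # end in the first sequence
    start2: int   # start in the second sequence
    end2: int     # end in the second sequence
    score: float  # alignment score


def best_chain(blocks: list[Block]) -> list[Block]:
    """Return the highest-scoring colinear subset of blocks.

    In the returned chain, each block lies strictly after the previous one
    in both sequences.
    """
    if not blocks:
        return []
    order = sorted(blocks, key=lambda b: (b.start1, b.start2))
    best = [b.score for b in order]  # best chain score ending at block i
    prev = [-1] * len(order)         # back-pointer to the previous block in that chain
    for i, bi in enumerate(order):
        for j in range(i):
            bj = order[j]
            colinear = bj.end1 < bi.start1 and bj.end2 < bi.start2
            if colinear and best[j] + bi.score > best[i]:
                best[i] = best[j] + bi.score
                prev[i] = j
    i = max(range(len(order)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]
```

Consider an out-of-order hit, such as a match to a related gene far away in the second sequence. It is dropped whenever it conflicts with a higher-scoring colinear chain, which mirrors how most ORG alignments vanish in panel B.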

Literature Cited

    Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
    Bulger, M., van Doorninck, J.H., Saitoh, N., Telling, A., Farrell, C., Bender, M.A., Felsenfeld, G., Axel, R., and Groudine, M. 1999. Conservation of sequence and structure flanking the mouse and human beta-globin loci: The beta-globin genes are embedded within an array of odorant receptor genes. Proc. Natl. Acad. Sci. U.S.A. 96:5129-5134.
    Bulger, M., Bender, M.A., van Doorninck, J.H., Wertman, B., Farrell, C.M., Felsenfeld, G., Groudine, M., and Hardison, R. 2000. Comparative structural and functional analysis of the olfactory receptor genes flanking the human and mouse beta-globin gene clusters. Proc. Natl. Acad. Sci. U.S.A. 97:14560-14565.
    Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78-94.
    Chiaromonte, F., Yap, V.B., and Miller, W. 2002. Scoring pairwise genomic sequence alignments. Pac. Symp. Biocomput. 2002:115-126.
    Elnitski, L., Riemer, C., Petrykowska, H., Florea, L., Schwartz, S., Hardison, R., and Miller, W. 2002. PipTools: A computational toolkit to prepare and evaluate annotated pairwise comparisons of genomic sequences. Genomics. In press.
    Endrizzi, M.G., Hadinoto, V., Growney, J.D., Miller, W., and Dietrich, W.F. 2000. Genomic sequence analysis of the mouse Naip gene array. Genome Res. 10:1095-1102.
    Florea, L., Riemer, C., Schwartz, S., Zhang, Z., Stojanovic, N., Miller, W., and McClelland, M. 2000. Web-based visualization tools for bacterial genome alignments. Nucleic Acids Res. 28:3486-3496.
    Gumucio, D., Shelton, D., Zhu, W., Millinoff, D., Gray, T., Bock, J., Slightom, J., and Goodman, M. 1996. Evolutionary strategies for the elucidation of cis and trans factors that regulate the developmental switching programs of the β-like globin genes. Mol. Phylogenet. Evol. 5:18-32.
    Hardison, R. and Miller, W. 1993. Use of long sequence alignments to study the evolution and regulation of mammalian globin gene clusters. Mol. Biol. Evol. 10:73-102.
    Hardison, R., Slightom, J.L., Gumucio, D.L., Goodman, M., Stojanovic, N., and Miller, W. 1997. Locus control regions of mammalian globin gene clusters: Combining phylogenetic analyses and experimental results to gain functional insights. Gene 205:73-94.
    Jang, W., Hua, A., Spilson, S.V., Miller, W., Roe, B.A., and Meisler, M.H. 1999. Comparative sequence of human and mouse BAC clones from the mnd2 region of chromosome 2p13. Genome Res. 9:53-61.
    Kent, W.J. and Zahler, A.M. 2000. Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment. Genome Res. 10:1115-1125.
    Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The Human Genome Browser at UCSC. Genome Res. 12:996-1006.
    Kurihara, L.J., Semenova, E., Miller, W., Ingram, R.S., Guan, X.J., and Tilghman, S.M. 2002. Candidate genes required for embryonic development: A comparative analysis of distal mouse chromosome 14 and human chromosome 13q22. Genomics 79:154-161.
    Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.
    Liang, Y., Wang, A., Belyantseva, I., Anderson, D., Probst, F.J., Barber, T.D., Miller, W., Touchman, J., Jin, L., and Sullivan, S. 1999. Structure and expression of the human and mouse novel unconventional myosin XV genes responsible for hereditary deafness, DFNB3 and shaker-2. Genomics 61:243-258.
    Loots, G.G., Locksley, R.M., Blankespoor, C.M., Wang, Z.E., Miller, W., Rubin, E.M., and Frazer, K.A. 2000. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288:136-140.
    McClelland, M., Florea, L., Sanderson, K., Clifton, S.W., Parkhill, J., Churcher, C., Dougan, G., Wilson, R.K., and Miller, W. 2000. Comparison of the Escherichia coli K-12 genome with sampled genomes of a Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi. Nucleic Acids Res. 28:4974-4986.
    Oeltjen, J.C., Malley, T.M., Muzny, D.M., Miller, W., Gibbs, R.A., and Belmont, J.W. 1997. Large-scale comparative sequence analysis of the human and murine Bruton's tyrosine kinase loci reveals conserved regulatory domains. Genome Res. 7:315-329.
    Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. 2000. PipMaker: A web server for aligning two genomic DNA sequences. Genome Res. 10:577-586.
    Waterston, R., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., Antonarakis, S.E., Attwood, J., Baertsch, R., Bailey, J., Barlow, K., Beck, S., Berry, E., Birren, B., Bloom, T., Bork, P., Botcherby, M., Bray, N., Brent, M.R., Brown, D.B., Bult, C., Burton, J., Butler, J., Campbell, R.D., Carninci, P., Cawley, S., Chinwalla, A., Church, D., Clamp, M., Clee, C., Collins, F.S., Cook, L., Copley, R.R., Coulson, A., Couronne, O., Cuff, J., Curwen, V., Cutts, T., Daly, M., David, R., Davies, J., Delehaunty, K., Deri, J., Dermitzakis, E.T., Dewey, C., Dickens, N.J., Diekhans, M., Dodge, S., Dubchak, I., Dunn, D.M., Eddy, S.R., Elnitski, L., Emes, R.D., Eswara, P., Eyras, E., Felsenfeld, A., Fewell, G., Flicek, P., Foley, K., Frankel, W.N., Fulton, L., Fulton, R., Furey, T.S., Gage, D., Gibbs, R.A., Glusman, G., Gnerre, S., Goldman, N., Goodstadt, L., Graffham, D., Graves, T., Green, E.D., Gregory, S., Guigo, R., Guyer, M., Hardison, R.C., Haussler, D., Hayashizaki, Y., Hillier, L., Hinrichs, A., Hlavina, W., Holzer, T., Hsu, F., Hua, A., Hubbard, T., Hunt, A., Jackson, I., Jaffe, D.B., Johnson, L.S., Jones, M., Jones, T.A., Joy, A., Kamal, M., Karlsson, E.K., Karolchik, D., Kasprzyk, A., Kawai, A., Keibler, E., Kells, C., Kent, W.J., Kirby, A., Kolbe, D., Korf, I., Kucherlapati, R.S., Kulbokas, R.J. III, Kulp, D., Landers, T., Leger, J.P., Leonard, S., Letunic, I., Levine, R., Li, J., Li, M., Lloyd, C., Lucas, S., Ma, B., Maglott, D.R., Maier, J., Mardis, E.R., Matthews, L., Mauceli, E., Mayer, J.H., McCarthy, M., McCombie, R., McLaren, S., McLay, K., McPherson, J., Meldrim, J., Meredith, B., Mesirov, J.P., Miller, W., Miner, T., Mongin, E., Montgomery, K.T., Morgan, M., Mott, R., Mullikin, J.C., Muzny, D.M., Nash, W., Nelson, J., Nhan, M., Nicol, R., Ning, Z., Nusbaum, C., O'Connor, M.J., Okazaki, Y., Oliver, K., Overton-Larty, E., Pachter, L., Parra, G., Pepin, K., Peterson, J., Pevzner, P., Plumb, R., Pohl, C., Poliakov, A., Ponce, T., Ponting, C., Potter, S., Quail, M., Reymond, A., Roe, B.A., Roskin, K.M., Rubin, E., Rust, A.G., Santos, R., Sapojnikov, V., Schultz, B., Schultz, J., Schwartz, M.S., Schwartz, S., Scott, C., Seaman, S., Searle, S., Sharpe, T., Sheridan, A., Shownkeen, R., Sims, S., Singer, J.B., Slater, G., Smit, A., Smith, D.R., Spencer, B., Stabenau, A., Stange-Thomann, N., Sugnet, C., Suyama, N., Tesler, G., Thompson, J., Torrents, D., Trevaskis, E., Tromp, J., Ucla, C., Ureta-Vidal, A., Vinson, J.P., von Niederhausern, A.C., Wade, C.M., Wall, M., Weber, R.J., Weiss, R.B., Wendl, M., West, T., Wetterstrand, C., Wheeler, R., Wierzbowski, J., Willey, T., Williams, S., Wilson, R., Winter, E., Worley, K.C., Wyman, D., Yang, S., Yang, S.-P., Zdobnov, E., Zody, M.C., and Lander, E.S. 2002. Initial sequencing and comparative analysis of the mouse genome. In press.
    Wiehe, T., Gebauer-Jung, S., Mitchell-Olds, T., and Guigo, R. 2001. SGP-1: Prediction and validation of homologous genes based on sequence alignments. Genome Res. 11:1574-1583.
    Wilson, M.D., Riemer, C., Martindale, D.W., Schnupf, P., Boright, A.P., Cheung, T.L., Hardy, D.M., Schwartz, S., Scherer, S.W., Tsui, L.C., Miller, W., and Koop, B.F. 2001. Comparative analysis of the gene-dense ACHE/TFR2 region on human chromosome 7q22 with the orthologous region on mouse chromosome 5. Nucleic Acids Res. 29:1352-1365.
 Key References
    Schwartz et al. 2000. See above.

Detailed documentation for the PipMaker server.

 Internet Resources
    http://bio.cse.psu.edu/

Links to alignment programs and complementary programs, including the PipMaker server homepage, the list of PipMaker underlay and annotation colors, PipMaker examples, whole genome human/mouse homology, and Laj download and instruction site.

    http://www.cs.wisc.edu/~ghost/

Aladdin homepage (for the Ghostscript program).

    http://www.adobe.com/

Adobe homepage (for the Acrobat Reader program).

    http://ftp.genome.washington.edu/cgi-bin/RepeatMasker/

RepeatMasker Web site.

    http://www.ncbi.nlm.nih.gov/

Contains a link to the GenBank Web site.

    http://genome.cse.ucsc.edu/

Human Genome Browser.

    http://www.ensembl.org/

Ensembl Genome Browser.

    http://soft.ice.mpg.de/sgp-1

SGP-1 (Syntenic Gene Prediction Program) server.

     
 