Generating a Genome Assembly with PCAP

Xiaoqiu Huang (1), Shiaw‐Pyng Yang (2)

(1) Iowa State University, Ames, Iowa; (2) Washington University Medical School, St. Louis, Missouri
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 11.3
DOI: 10.1002/0471250953.bi1103s11
Online Posting Date: October 2005

Abstract

This unit describes how to use the Parallel Contig Assembly Program (PCAP) to assemble data produced by a whole‐genome shotgun sequencing project. The first basic protocol runs PCAP on a multiprocessor computer for a 300‐Mb genome assembly project, and a support protocol describes how to prepare input files for PCAP. A second basic protocol runs PCAP on a distributed cluster of computers for a 3‐Gb genome assembly project. The unit also offers suggestions for understanding the results from PCAP.

Keywords: Whole‐Genome Shotgun Sequencing; Genome Assembly

     
 
Table of Contents

  • Basic Protocol 1: Producing an Assembly with PCAP Using an Example Data Set
  • Support Protocol 1: Downloading and Installing PCAP
  • Support Protocol 2: Preparation of Input Files
  • Support Protocol 3: Generating the fofn.con File
  • Basic Protocol 2: Generating a Large‐Scale Assembly with PCAP Using Distributed Computing
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
     
 

Literature Cited

  Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
  Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A.F., Gelpke, M.D., Roach, J., Oh, T., Ho, I.Y., Wong, M., Detter, C., Verhoef, F., Predki, P., Tay, A., Lucas, S., Richardson, P., Smith, S.F., Clark, M.S., Edwards, Y.J., Doggett, N., Zharkikh, A., Tavtigian, S.V., Pruss, D., Barnstead, M., Evans, C., Baden, H., Powell, J., Glusman, G., Rowen, L., Hood, L., Tan, Y.H., Elgar, G., Hawkins, T., Venkatesh, B., Rokhsar, D., and Brenner, S. 2002. Whole‐genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301‐1310.
  Havlak, P., Chen, R., Durbin, K.J., Egan, A., Ren, Y., Song, X.‐Z., Weinstock, G.M., and Gibbs, R. 2004. The Atlas genome assembly system. Genome Res. 14:721‐732.
  Huang, X. and Madan, A. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9:868‐877.
  Huang, X., Wang, J., Aluru, S., Yang, S.‐P., and Hillier, L. 2003. PCAP: A whole‐genome assembly program. Genome Res. 13:2164‐2170.
  Jaffe, D.B., Butler, J., Gnerre, S., Mauceli, E., Lindblad‐Toh, K., Mesirov, J.P., Zody, M.C., and Lander, E.S. 2003. Whole‐genome sequence assembly for mammalian genomes: ARACHNE 2. Genome Res. 13:91‐96.
  Kent, W.J. 2002. BLAT: The BLAST‐like alignment tool. Genome Res. 12:656‐664.
  Kruskal, J.B. 1956. On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Amer. Math. Soc. 7:48‐50.
  Mullikin, J.C. and Ning, Z. 2003. The Phusion assembler. Genome Res. 13:81‐90.
  Myers, E.W., Sutton, G.G., Delcher, A.L., Dew, I.M., Fasulo, D.P., Flanigan, M.J., Kravitz, S.A., Mobarry, C.M., Reinert, K.H., Remington, K.A., Anson, E.L., Bolanos, R.A., Chou, H.H., Jordan, C.M., Halpern, A.L., Lonardi, S., Beasley, E.M., Brandon, R.C., Chen, L., Dunn, P.J., Lai, Z., Liang, Y., Nusskern, D.R., Zhan, M., Zhang, Q., Zheng, X., Rubin, G.M., Adams, M.D., and Venter, J.C. 2000. A whole‐genome assembly of Drosophila. Science 287:2196‐2204.
  Needleman, S.B. and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. Mol. Biol. 48:443‐453.
  Pearson, W.R. and Lipman, D. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A. 85:2444‐2448.
  Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147:195‐197.
Key References
   Huang et al., 2003. See above.
  This article describes the methods used in PCAP in detail.
Internet Resources
   http://seq.cs.iastate.edu
  This site contains documentation on PCAP and example test data sets.