PolyPhred Analysis Software for Mutation Detection from Fluorescence‐Based Sequence Data

Kate T. Montgomery¹, Oleg Iartchouck¹, Li Li², Stephanie Loomis¹, Vanessa Obourn¹, Raju Kucherlapati¹

¹ Harvard Medical School ‐ Partners Healthcare Center for Genetics and Genomics, Boston, Massachusetts; ² Albany Medical College, Albany, New York
Publication Name: Current Protocols in Human Genetics
Unit Number: Unit 7.16
DOI: 10.1002/0471142905.hg0716s59
Online Posting Date: October, 2008
The ability to search for genetic variants that may be related to human disease is one of the most exciting consequences of the availability of the sequence of the human genome. Large cohorts of individuals exhibiting certain phenotypes can be studied and candidate genes resequenced. However, the challenge of analyzing sequence data from many individuals with accuracy, speed, and economy is great. This unit describes one set of software tools: Phred, Phrap, PolyPhred, and Consed. Coverage includes the advantages and disadvantages of these analysis tools, details for obtaining and using the software, and the results one may expect. The software is being continually updated to permit further automation of mutation analysis. Currently, however, at least some manual review is required if one wishes to identify 100% of the variants in a sample set. Curr. Protoc. Hum. Genet. 59:7.16.1‐7.16.21. © 2008 by John Wiley & Sons, Inc.

Keywords: DNA sequencing; mutation identification; SNPs; indels; sequence traces; Consed; Phred; Phrap; PolyPhred

Basic Protocol 1: Using PolyPhred for Sequence Analysis and Mutation Detection

  • High‐speed access to the Internet for Web‐based tools and information, especially access to genome databases
  • A UNIX server (or another supported system, as described below) for running the mutation identification software. The programs also run under Mac OS X or Linux, but they do not run under Microsoft Windows.
  • A directory on the UNIX computer where data analysis will be performed, referred to here as ANALYSIS_DIRECTORY. The user must have full privileges to read, write, and execute here, and the sequence analysis programs must be available. Within this directory, each project has its own folder. A project may consist of one gene with multiple exons, or of many genes. The program will align like sequences into “contigs.”
  • A Windows PC terminal with an X‐terminal emulator installed and running, to interface with the UNIX system. One option is X‐Win32, which may be downloaded from http://www.starnet.com/products/xwin32/. A free trial version is available, and various types of licenses may be purchased at reasonable cost. Other options are Exceed, Reflection X, and OpenNT.X.
  • A three‐button mouse or a mouse capable of emulating a three‐button mouse. A two‐button mouse with a scroll button is easily adapted to Consed.
  • Current versions of Phred, Phrap (and its associated programs), Consed, and PolyPhred, installed in the user's path on the UNIX machine. The software developers provide a sample data set and, once it is downloaded, a tutorial that is very useful for developing a complete understanding of how to use the programs.
  • Experimental sequence data equivalent to .ab1 or .scf files generated by an ABI 3730xl or other fluorescence‐based automated sequencer (see also Beckman Instruments, LI‐COR Life Sciences, or Amersham Biosciences analyzers). Such data are generally known as “chromatograms.”
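Several of the requirements above can be checked before any analysis is attempted. The following Python sketch is one possible way to do so and is not part of the protocol itself. The executable names in ASSUMED_TOOLS are assumptions (installations may name the programs differently), and all function names are hypothetical helpers introduced for illustration.

```python
import os
import shutil
from pathlib import Path

# Assumed executable names; a local installation may use different ones.
ASSUMED_TOOLS = ("phred", "phrap", "consed", "polyphred")

# Chromatogram extensions named in the materials list.
CHROMATOGRAM_SUFFIXES = {".ab1", ".scf"}


def missing_tools(tools=ASSUMED_TOOLS):
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in tools if shutil.which(name) is None]


def has_full_access(directory):
    """True if the directory exists and can be read, written, and entered."""
    path = Path(directory)
    return path.is_dir() and os.access(path, os.R_OK | os.W_OK | os.X_OK)


def make_project_folder(analysis_directory, project_name):
    """Create (if needed) a per-project folder inside the analysis directory."""
    project = Path(analysis_directory) / project_name
    project.mkdir(parents=True, exist_ok=True)
    return project


def find_chromatograms(directory):
    """List files in the directory whose extension is .ab1 or .scf."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in CHROMATOGRAM_SUFFIXES
    )
```

A typical use would be to call missing_tools() once after installation, then has_full_access() on ANALYSIS_DIRECTORY, and find_chromatograms() on the folder holding the sequencer output. The extension match is case insensitive by design, since sequencer exports sometimes use upper-case suffixes.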
NOTE: In the following directions, all commands to be typed in the UNIX command window are in italics, while directories are bold. All UNIX commands are case sensitive, and spaces are not allowed within file or directory names. Underscore is frequently used in place of a space.
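The naming rule in the note above is easy to violate when files arrive from a Windows PC. As a minimal sketch, assuming names are screened before being copied to the UNIX machine, the two hypothetical helpers below flag names containing spaces and produce the underscore form the note recommends.

```python
def has_space(name):
    """True if a file or directory name contains any whitespace character."""
    return any(ch.isspace() for ch in name)


def underscore_name(name):
    """Replace each run of whitespace in a name with a single underscore."""
    return "_".join(name.split())


if __name__ == "__main__":
    for original in ("exon 2 forward.ab1", "exon_2_reverse.ab1"):
        status = "rename" if has_space(original) else "ok"
        print(f"{status}: {original} -> {underscore_name(original)}")
    # Names are case sensitive on UNIX, so these are two different files.
    print("Sample.ab1" == "sample.ab1")  # prints False
```

Names that are already free of spaces pass through unchanged, so the helper can be applied to every file without checking first.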
