Whole‐Genome Sequencing Analysis Using Next‐Generation Sequencing Data

Chi Kent Ho¹, Xiaohui Cui¹, Sharon Grubner¹, Christopher A. Larson¹, Ying Wei¹, Paul K. Flook¹

¹ Illumina, Inc., San Diego, California
Publication Name: Current Protocols Essential Laboratory Techniques
Unit Number: Unit 11.5
DOI: 10.1002/cpet.2
Online Posting Date: May 2016


Next‐generation sequencing (NGS) technologies have revolutionized the biosciences and have become invaluable for discovering gene function and its role in disease. The fast pace of innovation in NGS technologies has enabled the production of huge volumes of sequence data at progressively lower cost. However, the increasing throughput, combined with the growing accessibility of these technologies, poses significant challenges for downstream data analysis. Here, we provide an overview of NGS methods and key secondary analysis pipelines, with a focus on technology provided by Illumina. As a case study, we highlight potential applications in cancer research. © 2016 by John Wiley & Sons, Inc.

Keywords: cloud computing; Next‐Generation Sequencing (NGS); tumor normal analysis; Whole‐Genome Sequencing (WGS); BaseSpace


Table of Contents

  • Overview and Principles
  • Strategic Questions
  • Strategic Planning
  • Protocols
  • Basic Protocol 1: Whole‐Genome Sequencing
  • Basic Protocol 2: Tumor‐Normal Analysis
  • Commentary
  • Figures
  • Tables


Basic Protocol 1: Whole‐Genome Sequencing

  • NGS data in FASTQ format (the recommended coverage depth for human whole‐genome sequencing is 30×)
  • An Illumina BaseSpace account

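As a rough check that a FASTQ dataset reaches the recommended 30× depth, raw coverage can be estimated as total sequenced bases divided by genome size. The sketch below assumes uncompressed four‐line FASTQ records and an approximate human genome size of 3.1 Gb (a hypothetical round figure held in `HUMAN_GENOME_SIZE`); the function names are illustrative and are not part of BaseSpace or any Illumina tool. The result overstates usable depth, because duplicate, unmapped, and low‐quality reads are discarded during secondary analysis.

```python
import io
from typing import Iterable

# Hypothetical round figure for the human genome size in bases; substitute
# the length of the reference actually used for alignment.
HUMAN_GENOME_SIZE = 3_100_000_000


def count_fastq_bases(lines: Iterable[str]) -> tuple[int, int]:
    """Return (reads, bases) for FASTQ data stored as 4-line records."""
    reads = 0
    bases = 0
    for i, line in enumerate(lines):
        field = i % 4  # 0 = header, 1 = sequence, 2 = '+', 3 = qualities
        if field == 0 and not line.startswith("@"):
            raise ValueError(f"expected '@' header at line {i + 1}")
        if field == 1:
            reads += 1
            bases += len(line.strip())
    return reads, bases


def raw_coverage(total_bases: int, genome_size: int = HUMAN_GENOME_SIZE) -> float:
    """Mean raw depth: sequenced bases divided by genome size.

    Computed before alignment, duplicate removal, or quality filtering,
    so it overstates the depth available for variant calling.
    """
    return total_bases / genome_size


if __name__ == "__main__":
    # Tiny in-memory FASTQ with two reads (10 bp and 5 bp).
    demo = io.StringIO("@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACGTA\n+\nIIIII\n")
    reads, bases = count_fastq_bases(demo)
    print(reads, bases)                        # 2 15
    print(raw_coverage(bases, genome_size=5))  # 3.0
```

For paired‐end data, sum the bases from both the R1 and R2 files; gzip‐compressed FASTQ files can be read line by line with `gzip.open(path, "rt")`.
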
Basic Protocol 2: Tumor‐Normal Analysis

  • Tumor sample NGS data in FASTQ format (the recommended coverage is 80×)
  • Matched normal sample NGS data in FASTQ format (the recommended coverage is 40×)
  • An Illumina BaseSpace account
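The tumor and normal depths above translate into a required number of reads through the Lander–Waterman relation C = LN/G (coverage C, read length L, number of reads N, genome size G). The sketch below rearranges it for paired‐end runs, counting both mates of each pair, and assumes a hypothetical 2 × 150 bp read length and a 3.1 Gb genome. It gives raw sequenced depth only; duplicates and filtered reads lower the effective depth, so some margin above these numbers is usually planned.

```python
# Hypothetical round figure for the human genome size in bases.
HUMAN_GENOME_SIZE = 3_100_000_000


def read_pairs_for_depth(target_depth: int, read_length: int,
                         genome_size: int = HUMAN_GENOME_SIZE) -> int:
    """Smallest N with 2 * read_length * N / genome_size >= target_depth.

    Rearranged Lander-Waterman coverage C = L*N/G, where each read pair
    contributes 2 * read_length sequenced bases.
    """
    if target_depth <= 0 or read_length <= 0 or genome_size <= 0:
        raise ValueError("depth, read length, and genome size must be positive")
    bases_needed = target_depth * genome_size
    bases_per_pair = 2 * read_length
    return -(-bases_needed // bases_per_pair)  # integer ceiling division


if __name__ == "__main__":
    # Depths from the tumor-normal materials list; 150 bp reads are assumed.
    for label, depth in (("tumor", 80), ("normal", 40)):
        print(label, depth, read_pairs_for_depth(depth, read_length=150))
```

Integer ceiling division avoids floating‐point rounding for the large base counts involved; with these assumptions the tumor sample needs about 827 million read pairs and the matched normal about 413 million.
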



Literature Cited

  Chen, K., Wallis, J.W., McLellan, M.D., Larson, D.E., Kalicki, J.M., Pohl, C.S., McGrath, S.D., Wendl, M.C., Zhang, Q., Locke, D.P., Shi, X., Fulton, R.S., Ley, T.J., Wilson, R.K., Ding, L., and Mardis, E.R. 2009. BreakDancer: An algorithm for high‐resolution mapping of genomic structural variation. Nat. Methods 6:677‐681. doi: 10.1038/nmeth.1363.
  Cibulskis, K., Lawrence, M.S., Carter, S.L., Sivachenko, A., Jaffe, D., Sougnez, C., Gabriel, S., Meyerson, M., Lander, E.S., and Getz, G. 2013. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31:213‐219. doi: 10.1038/nbt.2514.
  Goecks, J., Nekrutenko, A., Taylor, J., and The Galaxy Team. 2010. Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11:R86. doi: 10.1186/gb‐2010‐11‐8‐r86.
  Head, S.R., Komori, K.H., LaMere, S.A., Whisenant, T., Nieuwerburgh, F.V., Salomon, D.R., and Ordoukhanian, P. 2014. Library construction for next‐generation sequencing: Overviews and challenges. BioTechniques 56:61‐77. doi: 10.2144/000114133.
  Illumina. 2011. Paired‐End Sample Preparation Guide. Available at http://support.illumina.com/content/dam/illumina‐support/documents/myillumina/e5af4eb5‐6742‐40c8‐bcb1‐d8b350bcb964/paired‐end_sampleprep_guide_1005063_e.pdf.
  Illumina. 2013a. An Introduction to Next‐Generation Sequencing Technology. Pub. No. 770‐2012‐008. Available at http://www.illumina.com/content/dam/illumina‐marketing/documents/products/illumina_sequencing_introduction.pdf.
  Illumina. 2013b. Cancer Analysis Services Guide. Available at http://support.illumina.com/content/dam/illumina‐support/documents/documentation/cancer_analysis/fasttrack‐cancer‐analysis‐services‐guide‐15040893‐01.pdf.
  Illumina. 2014a. BaseSpace User Guide. Available at http://support.illumina.com/content/dam/illumina‐support/documents/documentation/software_documentation/basespace/basespace‐user‐guide‐15044182‐e.pdf.
  Illumina. 2014b. Nextera DNA Sample Preparation Kits. Pub. No. 770‐2011‐021. Available at http://www.illumina.com/documents/products/datasheets/datasheet_nextera_dna_sample_prep.pdf.
  Jesaitis, A. 2014. The state of variant annotation: A comparison of AnnoVar, snpEff and VEP. Available at http://blog.goldenhelix.com/ajesaitis/the‐sate‐of‐variant‐annotation‐a‐comparison‐of‐annovar‐snpeff‐and‐vep/.
  Li, H. 2012. Exploring single‐sample SNP and INDEL calling with whole‐genome de novo assembly. Bioinformatics 28:1838‐1844. doi: 10.1093/bioinformatics/bts280.
  Li, H. and Homer, N. 2010. A survey of sequence alignment algorithms for next‐generation sequencing. Brief. Bioinform. 11:473‐483. doi: 10.1093/bib/bbq015.
  Linnarsson, S. 2010. Recent advances in DNA sequencing methods ‐ general principles of sample preparation. Exp. Cell Res. 316:1339‐1343. doi: 10.1016/j.yexcr.2010.02.036.
  Raczy, C., Petrovski, R., Saunders, C.T., Chorny, I., Kruglyak, S., Margulies, E.H., Chuang, H., Källberg, M., Kumar, S.A., Liao, A., Little, K.M., Strömberg, M.P., and Tanner, S.W. 2013. Isaac: Ultra‐fast whole‐genome secondary analysis on Illumina sequencing platforms. Bioinformatics 29:2041‐2043. doi: 10.1093/bioinformatics/btt314.
  Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlén, M., and Nyrén, P. 1996. Real‐time DNA sequencing using detection of pyrophosphate release. Anal. Biochem. 242:84‐89. doi: 10.1006/abio.1996.0432.
  Sanger, F., Nicklen, S., and Coulson, A.R. 1977. DNA sequencing with chain‐terminating inhibitors. Proc. Natl. Acad. Sci. U.S.A. 74:5463‐5467. doi: 10.1073/pnas.74.12.5463.
  Saunders, C.T., Wong, W.S., Swamy, S., Becq, J., Murray, L.J., and Cheetham, R.K. 2012. Strelka: Accurate somatic small‐variant calling from sequenced tumor‐normal sample pairs. Bioinformatics 28:1811‐1817. doi: 10.1093/bioinformatics/bts271.
  Shendure, J.A., Porreca, G.J., Church, G.M., Gardner, A.F., Hendrickson, C.L., Kieleczawa, J., and Slatko, B.E. 2011. Overview of DNA sequencing strategies. Curr. Protoc. Mol. Biol. 96:7.1.1‐7.1.23. doi: 10.1002/0471142727.mb0701s96.
  Slatko, B.E., Albright, L.M., Tabor, S., and Ju, J. 1999. DNA sequencing by the dideoxy method. Curr. Protoc. Mol. Biol. 47:7.4A.1‐7.4A.39. doi: 10.1002/0471142727.mb0704as47.
  Touchman, J.W. 2009. DNA sequencing: An outsourcing guide. Curr. Protoc. Essent. Lab. Tech. 2:12.1.1‐12.1.19.
  Van Dijk, E.L., Jaszczyszyn, Y., and Thermes, C. 2014. Library preparation methods for next‐generation sequencing: Tone down the bias. Exp. Cell Res. 322:12‐20. doi: 10.1016/j.yexcr.2014.01.008.
  Watson, J.D. and Crick, F.H. 1953. Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature 171:737‐738. doi: 10.1038/171737a0.
  Wong, K.H., Jin, Y., and Moqtaderi, Z. 2013. Multiplex Illumina sequencing using DNA barcoding. Curr. Protoc. Mol. Biol. 101:7.11.1‐7.11.11. doi: 10.1002/0471142727.mb0711s101.
  Wu, R. 1970. Nucleotide sequence analysis of DNA. I. Partial sequence of the cohesive ends of bacteriophage lambda and 186 DNA. J. Mol. Biol. 51:501‐521. doi: 10.1016/0022‐2836(70)90004‐5.
  Xu, H., DiCarlo, J., Satya, R.V., Peng, Q., and Wang, Y. 2014. Comparison of somatic mutation calling methods in amplicon and whole exome sequence data. BMC Genomics 15:244. doi: 10.1186/1471‐2164‐15‐244.
Key References
  Raczy et al., 2013. See above.
  The original paper describing the Isaac aligner and the Isaac Variant Caller; it explains the methodology and compares Isaac with the combination of BWA for alignment and GATK for variant calling.
Internet Resources
  Manta User Guide (v0.18.1). 2014.
  A general introduction to Illumina Sequencing By Synthesis (SBS) sequencing technology.
  A comprehensive overview of sequencing techniques, including RNA‐seq, DNA‐seq, and others.
  A look at some common applications of NGS including CNV detection, methylation detection, ChIP, and other techniques.
  A collection of public and example data housed within BaseSpace. Users can browse, search, or filter for specific types of public datasets and, once a dataset is found, import one or all of its datasets. These data can also be used to test different applications or analyses within BaseSpace. Imported data are treated as shared with the user, so all rules and restrictions that govern shared data apply.
  A guide to Tumor‐Normal NGS analysis services from Illumina.
  Platinum Genomes is a collection of manually curated truth datasets that can be used to evaluate the quality of an analysis pipeline.
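Evaluating a pipeline against a truth set such as Platinum Genomes comes down to counting true positives, false positives, and false negatives and deriving precision and recall from them. The sketch below is a deliberately naive exact‐match comparison on (chromosome, position, REF, ALT) keys, with hypothetical example variants; production benchmarking typically also restricts the comparison to the truth set's high‐confidence regions and normalizes variant representation, neither of which this code does.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Minimal variant key: chromosome, 1-based position, REF and ALT alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_to_truth(calls: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Naive exact-match benchmark of a call set against a truth set.

    Identical variants written differently (e.g., an indel placed at a
    different position within a repeat) are counted as mismatches here.
    """
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical variants for illustration only.
    truth = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
             Variant("chr2", 50, "G", "GA")}
    calls = {Variant("chr1", 100, "A", "G"), Variant("chr2", 50, "G", "GA"),
             Variant("chr3", 10, "T", "C")}
    print(compare_to_truth(calls, truth))
```
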