Using VarScan 2 for Germline Variant Calling and Somatic Mutation Detection

Daniel C. Koboldt¹, David E. Larson¹, Richard K. Wilson¹

¹ The Genome Institute at Washington University, St. Louis, Missouri
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 15.4
DOI: 10.1002/0471250953.bi1504s44
Online Posting Date: December 2013

The identification of small sequence variants remains a challenging but critical step in the analysis of next‐generation sequencing data. Our variant‐calling tool, VarScan 2, applies heuristic and statistical thresholds based on user‐defined criteria to call variants, using SAMtools mpileup data as input. Here, we provide guidelines for generating that input and describe protocols for using VarScan 2 to (1) identify germline variants in individual samples; (2) call somatic mutations, copy‐number alterations, and loss‐of‐heterozygosity (LOH) events in tumor‐normal pairs; and (3) identify germline variants, de novo mutations, and Mendelian inheritance errors in family trios. We also describe a variant‐filtering strategy that removes likely false positives associated with common sequencing‐ and alignment‐related artifacts. Curr. Protoc. Bioinform. 44:15.4.1‐15.4.17. © 2013 by John Wiley & Sons, Inc.

Keywords: variant calling; mutation detection; trio calling; SNVs; indels; VarScan 2; next‐generation sequencing

Table of Contents

  • Introduction
  • Strategic Planning
  • Basic Protocol 1: Germline Variant Calling in Individual or Pooled Samples
  • Alternate Protocol 1: Germline Variant Calling in a Cohort of Individuals
  • Basic Protocol 2: Somatic Mutation Detection in Tumor‐Normal Pairs
  • Support Protocol 1: Filtering to Remove False Positives
  • Alternate Protocol 2: Somatic Copy‐Number Alteration Detection in Tumor‐Normal Pairs
  • Basic Protocol 3: Pedigree‐Aware Calling of Family Trios
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Literature Cited

  1000 Genomes Project Consortium. 2010. A map of human genome variation from population‐scale sequencing. Nature 467:1061‐1073.
  Albers, C.A., Lunter, G., MacArthur, D.G., McVean, G., Ouwehand, W.H., and Durbin, R. 2010. Dindel: Accurate indel calls from short‐read data. Genome Res. 21:961‐973.
  Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T., McVean, G., and Durbin, R.; 1000 Genomes Project Analysis Group. 2011. The variant call format and VCFtools. Bioinformatics 27:2156‐2158.
  Koboldt, D.C., Chen, K., Wylie, T., Larson, D.E., McLellan, M.D., Mardis, E.R., Weinstock, G.M., Wilson, R.K., and Ding, L. 2009. VarScan: Variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25:2283‐2285.
  Koboldt, D.C., Ding, L., Mardis, E.R., and Wilson, R.K. 2010. Challenges of sequencing human genomes. Brief. Bioinform. 11:484‐498.
  Koboldt, D.C., Zhang, Q., Larson, D.E., Shen, D., McLellan, M.D., Lin, L., Miller, C.A., Mardis, E.R., Ding, L., and Wilson, R.K. 2012a. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22:568‐576.
  Koboldt, D.C., Larson, D.E., Chen, K., Ding, L., and Wilson, R.K. 2012b. Massively parallel sequencing approaches for characterization of structural variation. Methods Mol. Biol. 838:369‐384.
  Larson, D.E., Harris, C.C., Chen, K., Koboldt, D.C., Abbott, T.E., Dooling, D.J., Ley, T.J., Mardis, E.R., Wilson, R.K., and Ding, L. 2012. SomaticSniper: Identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28:311‐317.
  Li, H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27:2987‐2993.
  Li, B., Chen, W., Zhan, X., Busonero, F., Sanna, S., Sidore, C., Cucca, F., Kang, H.M., and Abecasis, G.R. 2012. A likelihood‐based framework for variant calling and de novo mutation detection in families. PLoS Genet. 8:e1002944.
  Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R.; 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078‐2079.
  Li, R., Li, Y., Fang, X., Yang, H., Wang, J., Kristiansen, K., and Wang, J. 2009. SNP detection for massively parallel whole‐genome resequencing. Genome Res. 19:1124‐1132.
  Li, S., Li, R., Li, H., Lu, J., Li, Y., Bolund, L., Schierup, M.H., and Wang, J. 2013. SOAPindel: Efficient identification of indels from short paired reads. Genome Res. 23:195‐200.
  Mardis, E.R. 2008. Next‐generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet. 9:387‐402.
  McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., and DePristo, M.A. 2010. The Genome Analysis Toolkit: A MapReduce framework for analyzing next‐generation DNA sequencing data. Genome Res. 20:1297‐1303.
  Pabinger, S., Dander, A., Fischer, M., Snajder, R., Sperk, M., Efremova, M., Krabichler, B., Speicher, M.R., Zschocke, J., and Trajanoski, Z. 2013. A survey of tools for variant analysis of next‐generation genome sequencing data. Brief. Bioinform. Jan 21. [Epub ahead of print].
  Roach, J.C., Glusman, G., Smit, A.F., Huff, C.D., Hubley, R., Shannon, P.T., Rowen, L., Pant, K.P., Goodman, N., Bamshad, M., Shendure, J., Drmanac, R., Jorde, L.B., Hood, L., and Galas, D.J. 2010. Analysis of genetic inheritance in a family quartet by whole‐genome sequencing. Science 328:636‐639.
  Robinson, J.T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. 2011. Integrative genomics viewer. Nat. Biotechnol. 29:24‐26.
  Saunders, C.T., Wong, W.S., Swamy, S., Becq, J., Murray, L.J., and Cheetham, R.K. 2012. Strelka: Accurate somatic small‐variant calling from sequenced tumor‐normal sample pairs. Bioinformatics 28:1811‐1817.
  Shen, Y., Wan, Z., Coarfa, C., Drabek, R., Chen, L., Ostrowski, E.A., Liu, Y., Weinstock, G.M., Wheeler, D.A., Gibbs, R.A., and Yu, F. 2010. A SNP discovery method to assess variant allele probability from next‐generation resequencing data. Genome Res. 20:273‐280.
  Stead, L.F., Sutton, K.M., Taylor, G.R., Quirke, P., and Rabbitts, P. 2013. Accurately identifying low‐allelic fraction variants in single samples with next‐generation sequencing: Applications in tumor subclone resolution. Hum. Mutat. 20:273‐280.
  Wei, Z., Wang, W., Hu, P., Lyon, G.J., and Hakonarson, H. 2011. SNVer: A statistical tool for variant calling in analysis of pooled or individual next‐generation sequencing data. Nucleic Acids Res. 39:e132.
  Ye, K., Schulz, M.H., Long, Q., Apweiler, R., and Ning, Z. 2009. Pindel: A pattern growth approach to detect break points of large deletions and medium sized insertions from paired‐end short reads. Bioinformatics 25:2865‐2871.
Key References
  Koboldt et al., 2010. See above.
  This review article offers guidelines for next‐generation sequencing data analysis while highlighting some of the important challenges of human genome resequencing with NGS technologies.
  Koboldt et al., 2012a. See above.
  The VarScan 2 publication describes the algorithm's underlying methodology and showcases its performance (variant calling, mutation detection, somatic CNA detection, and false positive filtering) using exome data from tumor‐normal pairs.
Internet Resources
  VarScan Web site.
  SAMtools Web site.
  bam‐readcount Web site.
  SAM/BAM format specification.
  Variant Call Format (VCF) specification.