Using VarScan 2 for Germline Variant Calling and Somatic Mutation Detection

Daniel C. Koboldt¹, David E. Larson¹, Richard K. Wilson¹

¹ The Genome Institute at Washington University, St. Louis, Missouri
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 15.4
DOI:  10.1002/0471250953.bi1504s44
Online Posting Date:  December, 2013
The identification of small sequence variants remains a challenging but critical step in the analysis of next‐generation sequencing data. Our variant‐calling tool, VarScan 2, employs heuristic and statistical thresholds based on user‐defined criteria to call variants using SAMtools mpileup data as input. Here, we provide guidelines for generating that input, and describe protocols for using VarScan 2 to (1) identify germline variants in individual samples; (2) call somatic mutations, copy‐number alterations, and LOH events in tumor‐normal pairs; and (3) identify germline variants, de novo mutations, and Mendelian inheritance errors in family trios. Further, we describe a strategy for variant filtering that removes likely false positives associated with common sequencing‐ and alignment‐related artifacts. Curr. Protoc. Bioinform. 44:15.4.1‐15.4.17. © 2013 by John Wiley & Sons, Inc.
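
As a rough illustration of the input flow described above (SAMtools mpileup output passed to VarScan 2), the following minimal sketch only assembles a command line as a string; it does not run anything. The file names (`reference.fa`, `sample.bam`, `VarScan.jar`), the helper name `mpileup_to_varscan`, and the threshold values are hypothetical placeholders, not recommendations from this unit; consult the protocols themselves for the parameters the authors advise.

```python
import shlex


def mpileup_to_varscan(reference, bams, varscan_jar="VarScan.jar",
                       subcommand="mpileup2snp", options=None):
    """Build a shell pipeline string: samtools mpileup piped into VarScan 2.

    All paths and option values are supplied by the caller; nothing is executed.
    """
    # samtools mpileup needs the reference FASTA (-f) and one or more BAM files.
    mpileup = ["samtools", "mpileup", "-f", reference, *bams]
    # VarScan 2 reads the mpileup stream from standard input.
    varscan = ["java", "-jar", varscan_jar, subcommand]
    for flag, value in (options or {}).items():
        varscan += [f"--{flag}", str(value)]
    # shlex.join quotes any argument that would otherwise be split by the shell.
    return " | ".join(shlex.join(part) for part in (mpileup, varscan))


# Illustrative thresholds only (assumed values, not taken from the unit).
cmd = mpileup_to_varscan(
    "reference.fa",
    ["sample.bam"],
    options={"min-coverage": 8, "min-var-freq": 0.2, "output-vcf": 1},
)
print(cmd)
```

Changing `subcommand` to `mpileup2indel` would give the corresponding indel-calling command; in practice the printed pipeline would be redirected to an output file before being run.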

Keywords: variant calling; mutation detection; trio calling; SNVs; indels; VarScan 2; next‐generation sequencing

Table of Contents

  • Introduction
  • Strategic Planning
  • Basic Protocol 1: Germline Variant Calling in Individual or Pooled Samples
  • Alternate Protocol 1: Germline Variant Calling in a Cohort of Individuals
  • Basic Protocol 2: Somatic Mutation Detection in Tumor‐Normal Pairs
  • Support Protocol 1: Filtering to Remove False Positives
  • Alternate Protocol 2: Somatic Copy‐Number Alteration Detection in Tumor‐Normal Pairs
  • Basic Protocol 3: Pedigree‐Aware Calling of Family Trios
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
Literature Cited

Key References
  Koboldt et al., 2010. See above.
  This review article offers guidelines for next‐generation sequencing data analysis while highlighting some of the important challenges of human genome resequencing with NGS technologies.
  Koboldt et al., 2012a. See above.
  The VarScan 2 publication describes the algorithm's underlying methodology and showcases its performance (variant calling, mutation detection, somatic CNA detection, and false positive filtering) using exome data from tumor‐normal pairs.
Internet Resources
  VarScan Web site.
  SAMtools Web site.
  bam‐readcount Web site.
  SAM/BAM format specification.
  Variant Call Format (VCF) specification.
