Using VAAST to Identify Disease‐Associated Variants in Next‐Generation Sequencing Data

Brett Kennedy1, Zev Kronenberg1, Hao Hu1, Barry Moore2, Steven Flygare2, Martin G. Reese3, Lynn B. Jorde2, Mark Yandell2, Chad Huff4

1 These authors collectively are the first authors of the unit; 2 Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah; 3 Omicia, Inc., Emeryville, California; 4 Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas
Publication Name:  Current Protocols in Human Genetics
Unit Number:  Unit 6.14
DOI:  10.1002/0471142905.hg0614s81
Online Posting Date:  April, 2014
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The VAAST pipeline is specifically designed to identify disease‐associated alleles in next‐generation sequencing data. In the protocols presented in this paper, we outline the best practices for variant prioritization using VAAST. Examples and test data are provided for case‐control, small pedigree, and large pedigree analyses. These protocols will teach users the fundamentals of VAAST, VAAST 2.0, and pVAAST analyses. Curr. Protoc. Hum. Genet. 81:6.14.1‐6.14.25. © 2014 by John Wiley & Sons, Inc.

Keywords: VAAST; rare‐variant association test; variant classification; disease‐gene identification; next‐generation sequencing; genome‐wide association studies; human disease; genomics; computational genomics; bioinformatics

Table of Contents

  • Introduction
  • Basic Protocol 1: Variant Calling
  • Support Protocol 1: Obtaining and Installing VAAST
  • Basic Protocol 2: Using VAAST with Case‐Control Data
  • Basic Protocol 3: Using VAAST with Pedigrees
  • Basic Protocol 4: Using pVAAST With Pedigree Data
  • Alternate Protocol 1: Accessing VAAST Through Opal
  • Commentary
  • Literature Cited
  • Figures
