Quality Control for the Illumina HumanExome BeadChip

Robert P. Igo1, Jessica N. Cooke Bailey1, Jane Romm2, Jonathan L. Haines3, Janey L. Wiggs4

1 Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, 2 Center for Inherited Disease Research, Johns Hopkins University, Baltimore, Maryland, 3 Institute of Computational Biology, Case Western Reserve University, Cleveland, Ohio, 4 Department of Ophthalmology, Harvard Medical School, Massachusetts Eye and Ear Infirmary, Boston, Massachusetts
Publication Name:  Current Protocols in Human Genetics
Unit Number:  Unit 2.14
DOI:  10.1002/cphg.15
Online Posting Date:  July, 2016
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

The Illumina HumanExome BeadChip and other exome‐based genotyping arrays offer inexpensive genotyping of some 240,000 mostly nonsynonymous coding variants across the human genome. The HumanExome chip, with its highly non‐uniform distribution of markers and emphasis on rare coding variants, presents some unique challenges for quality control (QC) and data cleaning. Here, we describe QC procedures for HumanExome data, with examples of challenges specific to exome arrays from our experience cleaning a data set of ∼7,500 samples from the NEIGHBORHOOD Consortium. We focus on standard procedures for QC of genome‐wide array data, including genotype calling, sex verification, sample identity verification, relationship checking, and assessment of population structure, that are complicated by the HumanExome panel's enrichment in rare, exonic variation. © 2016 by John Wiley & Sons, Inc.

Keywords: quality control; genetic association studies; exome arrays; Illumina HumanExome BeadChip; NEIGHBORHOOD Consortium

     
 

Table of Contents

  • Introduction
  • Genotyping and Initial QC
  • Sample Quality
  • Marker Quality
  • Acknowledgments
  • Figures
  • Tables
     
 

Literature Cited
  Babron, M.‐C., de Tayrac, M., Rutledge, D.N., Zeggini, E., and Génin, E. 2012. Rare and low frequency variant stratification in the UK population: Description and impact on association tests. PLoS ONE 7:e46519. doi: 10.1371/journal.pone.0046519.
  Chen, C.‐Y., Pollack, S., Hunter, D.J., Hirschhorn, J.N., Kraft, P., and Price, A.L. 2013. Improved ancestry inference using weights from external reference panels. Bioinformatics 29:1399‐1406. doi: 10.1093/bioinformatics/btt144.
  Edwards, T.L. and Gao, X. 2012. Methods for detecting and correcting for population stratification. Curr. Protoc. Hum. Genet. 73:1.22.1‐1.22.14. doi: 10.1002/0471142905.hg0122s73.
  Forsberg, L.A., Rasi, C., Malmqvist, N., Davies, H., Pasupulati, S., Pakalapati, G., Sandgren, J., de Ståhl, T.D., Zaghlool, A., Giedraitis, V., Lannfelt, L., Score, J., Cross, N.C.P., Absher, D., Janson, E.T., Lindgren, C.M., Morris, A.P., Ingelsson, E., Lind, L., and Dumanski, J.P. 2014. Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer. Nat. Genet. 46:624‐628. doi: 10.1038/ng.2966.
  Geschwind, D.H., Boone, K.B., Miller, B.L., and Swerdloff, R.S. 2000. Neurobehavioral phenotype of Klinefelter syndrome. Ment. Retard. Dev. Disabil. Res. Rev. 6:107‐116. doi: 10.1002/1098-2779(2000)6:2<107::AID-MRDD4>3.0.CO;2-2.
  Goldstein, J.I., Crenshaw, A., Carey, J., Grant, G.B., Maguire, J., Fromer, M., O'Dushlaine, C., Moran, J.L., Chambert, K., Stevens, C., Swedish Schizophrenia Consortium, ARRA Autism Sequencing Collaboration, Sklar, P., Hultman, C.M., Purcell, S., McCarroll, S.A., Sullivan, P.F., Daly, M.J., and Neale, B.M. 2012. zCall: A rare variant caller for array‐based genotyping. Bioinformatics 28:2543‐2545. doi: 10.1093/bioinformatics/bts479.
  Grove, M.L., Yu, B., Cochran, B.J., Haritunians, T., Bis, J.C., Taylor, K.D., Hansen, M., Borecki, I.B., Cupples, L.A., Fornage, M., Gudnason, V., Harris, T.B., Kathiresan, S., Kraaij, R., Launer, L.J., Levy, D., Liu, Y., Mosley, T., Peloso, G.M., Psaty, B.M., Rich, S.S., Rivadeneira, F., Siscovick, D., Smith, A.V., Uitterlinden, A., van Duijn, C., Wilson, J.G., O'Donnell, C.J., Rotter, J.I., and Boerwinkle, E. 2013. Best practices and joint calling of the HumanExome BeadChip: The CHARGE Consortium. PLoS ONE 8:e68095. doi: 10.1371/journal.pone.0068095.
  Guo, X., Liu, Z., Wang, X., and Zhang, H. 2012. Genetic association test for multiple traits at gene level. Genet. Epidemiol. 37:122‐129. doi: 10.1002/gepi.21688.
  Guo, Y., He, J., Zhao, S., Wu, H., Zhong, X., Sheng, Q., Samuels, D.C., Shyr, Y., and Long, J. 2014. Illumina human exome genotyping array clustering and quality control. Nat. Protoc. 9:2643‐2662. doi: 10.1038/nprot.2014.174.
  Hindorff, L.A., MacArthur, J., Morales, J., Junkins, H.A., Hall, P.N., Klemm, A.K., and Manolio, T.A. 2016. A catalog of published genome‐wide association studies. http://www.genome.gov/gwastudies. Accessed January 31, 2016.
  Howie, B.N., Donnelly, P., and Marchini, J. 2009. A flexible and accurate genotype imputation method for the next generation of genomewide association studies. PLoS Genet. 5:e1000529. doi: 10.1371/journal.pgen.1000529.
  Jun, G., Flickinger, M., Hetrick, K.N., Romm, J.M., Doheny, K.F., Abecasis, G.R., Boehnke, M., and Kang, H.M. 2012. Detecting and estimating contamination of human DNA samples in sequencing and array‐based genotype data. Am. J. Hum. Genet. 91:839‐848. doi: 10.1016/j.ajhg.2012.09.004.
  Laurie, C.C., Doheny, K.F., Mirel, D.B., Pugh, E.W., Bierut, L.J., Bhangale, T., Boehm, F., Caporaso, N.E., Cornelis, M.C., Edenberg, H.J., Gabriel, S.B., Harris, E.L., Hu, F.B., Jacobs, K.B., Kraft, P., Landi, M.T., Lumley, T., Manolio, T.A., McHugh, C., Painter, I., Paschall, J., Rice, J.P., Rice, K.M., Zheng, X., and Weir, B.S., for the GENEVA Investigators. 2010. Quality control and quality assurance in genotypic data for genome‐wide association studies. Genet. Epidemiol. 34:591‐602. doi: 10.1002/gepi.20516.
  Lee, S., Abecasis, G., Boehnke, M., and Lin, X. 2014. Rare‐variant association analysis: Study designs and statistical tests. Am. J. Hum. Genet. 95:5‐23. doi: 10.1016/j.ajhg.2014.06.009.
  Li, Y., Willer, C.J., Ding, J., Scheet, P., and Abecasis, G.R. 2010. MaCH: Using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol. 34:816‐834. doi: 10.1002/gepi.20533.
  Manichaikul, A., Mychaleckyj, J., Rich, S.S., Daly, K., Sale, M., and Chen, W.M. 2010. Robust relationship inference in genome‐wide association studies. Bioinformatics 26:2867‐2873. doi: 10.1093/bioinformatics/btq559.
  Manolio, T.A., Collins, F.S., Cox, N.J., Goldstein, D.B., Hindorff, L.A., Hunter, D.J., McCarthy, M.I., Ramos, E.M., Cardon, L.R., Chakravarti, A., Cho, J.H., Guttmacher, A.E., Kong, A., Kruglyak, L., Mardis, E., Rotimi, C.N., Slatkin, M., Valle, D., Whittemore, A.S., Boehnke, M., Clark, A.G., Eichler, E.E., Gibson, G., Haines, J.L., Mackay, T.F.C., McCarroll, S.A., and Visscher, P.M. 2009. Finding the missing heritability of complex diseases. Nature 461:747‐753. doi: 10.1038/nature08494.
  Milligan, B.G. 2003. Maximum‐likelihood estimation of relatedness. Genetics 163:1153‐1167.
  Page, C.M., Baranzini, S.E., Mevik, B.‐H., Bos, S.D., Harbo, H.F., and Andreassen, B.K. 2015. Assessing the power of exome chips. PLoS ONE 10:e0139642. doi: 10.1371/journal.pone.0139642.
  Patterson, N., Price, A.L., and Reich, D. 2006. Population structure and eigenanalysis. PLoS Genet. 2:e190. doi: 10.1371/journal.pgen.0020190.
  Perreault, L.‐P.L., Legault, M.‐A., Barhdadi, A., Provost, S., Normand, V., Tardif, J.‐C., and Dubé, M.‐P. 2014. Comparison of genotype clustering tools with rare variants. BMC Bioinformat. 15:52. doi: 10.1186/1471-2105-15-52.
  Porcu, E., Sanna, S., Fuchsberger, C., and Fritsche, L.G. 2013. Genotype imputation in genome‐wide association studies. Curr. Protoc. Hum. Genet. 78:1.25.1‐1.25.14. doi: 10.1002/0471142905.hg0125s78.
  Purcell, S., Neale, B., Todd‐Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., Maller, J., Sklar, P., de Bakker, P.I.W., Daly, M.J., and Sham, P.C. 2007. PLINK: A toolset for whole‐genome association and population‐based linkage analysis. Am. J. Hum. Genet. 81:559‐575. doi: 10.1086/519795.
  Romm, J., McMullen, I., Jewell, M., Zhang, J., Zhang, P., Ling, H., Pugh, E., and Doheny, K.F. 2013. Abstract #1916. Presented at the American Society of Human Genetics 63rd Annual Meeting, Boston, Massachusetts. October 22‐26, 2013.
  Turner, S., Armstrong, L.L., Bradford, Y., Carlson, C.S., Crawford, D.C., Crenshaw, A.T., de Andrade, M., Doheny, K.F., Haines, J.L., Hayes, G., Jarvik, G., Jiang, L., Kullo, I.J., Li, R., Ling, H., Manolio, T.A., Matsumoto, M., McCarty, C.A., McDavid, A.N., Mirel, D.B., Paschall, J.E., Pugh, E.W., Rasmussen, L.V., Wilke, R.A., Zuvich, R.L., and Ritchie, M.D. 2011. Quality control procedures for genome‐wide association studies. Curr. Protoc. Hum. Genet. 68:1.19.1‐1.19.18. doi: 10.1002/0471142905.hg0119s68.
  Veerappa, A.M., Padakannaya, P., and Ramachandra, N.B. 2013. Copy number variation‐based polymorphism in a new pseudoautosomal region 3 (PAR3) of a human X‐chromosome‐transposed region (XTR) in the Y chromosome. Funct. Integr. Genomics 13:285‐293. doi: 10.1007/s10142-013-0323-6.
  Wellcome Trust Case Control Consortium. 2010. Genome‐wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 464:713‐720. doi: 10.1038/nature08979.
  Wiggs, J.L., Yaspan, B.L., Hauser, M.A., Kang, J.H., Allingham, R.R., Olson, L.M., Abdrabou, W., Fan, B.J., Wang, D.Y., Brodeur, W., Budenz, D.L., Caprioli, J., Crenshaw, A., Crooks, K., DelBono, E., Doheny, K.F., Friedman, D.S., Gaasterland, D., Gaasterland, T., Laurie, C., Lee, R.K., Lichter, P.R., Loomis, S., Liu, Y., Medeiros, F.A., McCarty, C., Mirel, D., Moroi, S.E., Musch, D.C., Realini, A., Rozsa, F.W., Schuman, J.S., Scott, K., Singh, K., Stein, J.D., Trager, E.H., VanVeldhuisen, P., Vollrath, D., Wollstein, G., Yoneyama, S., Zhang, K., Weinreb, R.N., Ernst, J., Kellis, M., Masuda, T., Zack, D., Richards, J.E., Pericak‐Vance, M.A., Pasquale, L.R., and Haines, J.L. 2012. Common variants at 9p21 and 8q22 are associated with increased susceptibility to optic nerve degeneration in glaucoma. PLoS Genet. 8:e1002654. doi: 10.1371/journal.pgen.1002654.
  Wiggs, J.L., Hauser, M.A., Abdrabou, W., Allingham, R.R., Budenz, D.L., DelBono, E., Friedman, D.S., Kang, J.H., Gaasterland, D., Gaasterland, T., Lee, R.K., Lichter, P.R., Loomis, S., Liu, Y., McCarty, C., Medeiros, F.A., Moroi, S.E., Olson, L.M., Realini, A., Richards, J.E., Rozsa, F., Schuman, J.S., Singh, K., Stein, J.D., Vollrath, D., Weinreb, R.N., Wollstein, G., Yaspan, B.L., Yoneyama, S., Zack, D., Zhang, K., Pericak‐Vance, M.A., Pasquale, L.R., and Haines, J.L. 2013. The NEIGHBOR Consortium primary open angle glaucoma genome‐wide association study: Rationale, study design and clinical variables. J. Glaucoma 22:517‐525. doi: 10.1097/IJG.0b013e31824d4fd8.
  Zeng, Z., Weeks, D.E., Chen, W., Mukhopadhyay, N., and Feingold, E. 2016. A pipeline for classifying relationships using dense SNP/SNV data and putative pedigree information. Genet. Epidemiol. 40:161‐171. doi: 10.1002/gepi.21948.
  Zhang, Y., Shen, X., and Pan, W. 2013. Adjusting for population stratification in a fine scale with principal components and sequencing data. Genet. Epidemiol. 37:787‐801. doi: 10.1002/gepi.21764.
Key References
  Turner et al., 2011. See above.
  Comprehensive overview of common quality‐control screens for genome‐wide association data.
  Guo et al., 2014. See above.
  Detailed protocols for processing and quality control of Illumina HumanExome chip data, with
  Grove et al., 2013. See above.
  CHARGE Consortium quality control criteria for genotype calling on a combined set of 62,000 individuals genotyped at multiple study centers, with links to CHARGE Consortium resources for the HumanExome chip, including genotypes for HapMap samples and a custom GenomeStudio clustering (.egt) file.
Internet Resources
  http://genome.sph.umich.edu/wiki/Exome_Chip_Design
  Description of the development and contents of the exome‐based marker array implemented as the Illumina HumanExome chip.
  ftp://share.sph.umich.edu/exomeChip/IlluminaDesigns/annotatedList.txt
  List of proposed markers for the HumanExome chip with role of variants in the panel (e.g., nonsynonymous coding variant, splice variant, AIM, linkage grid marker).
  http://pngu.mgh.harvard.edu/~purcell/plink/index.shtml
  Source for PLINK software download, documentation, and tutorials.
  https://cran.r-project.org
  Source for the R software and R packages.
  http://www.chargeconsortium.com/main/exomechip
  Resources for Illumina HumanExome data processing developed by the CHARGE Consortium.
  https://github.com/jigold/zCall
  Source for zCall scripts and protocols.
  http://support.illumina.com/array/array_software/genomestudio/downloads.html
  Source for extensions to the GenomeStudio software for Illumina genotype calling and QC.
  http://www.illumina.com/documents/products/technotes/technote_topbot.pdf
  Illumina algorithm for assigning the TOP strand.
  http://www.well.ox.ac.uk/~wrayner/strand/
  Correspondence between the Illumina TOP strand and the dbSNP + strand for HumanExome markers.
  https://support.illumina.com/array/array_kits/infinium_humanexome_beadchip_kit/downloads.html
  Accessory files for the HumanExome BeadChip, including cluster (.egt) files, gene annotation data, rsID information, and changes made between array versions.