Quality Control for the Illumina HumanExome BeadChip

Robert P. Igo¹, Jessica N. Cooke Bailey¹, Jane Romm², Jonathan L. Haines³, Janey L. Wiggs⁴

¹ Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio; ² Center for Inherited Disease Research, Johns Hopkins University, Baltimore, Maryland; ³ Institute of Computational Biology, Case Western Reserve University, Cleveland, Ohio; ⁴ Department of Ophthalmology, Harvard Medical School, Massachusetts Eye and Ear Infirmary, Boston, Massachusetts
Publication Name: Current Protocols in Human Genetics
Unit Number: Unit 2.14
DOI: 10.1002/cphg.15
Online Posting Date: July 2016

The Illumina HumanExome BeadChip and other exome‐based genotyping arrays offer inexpensive genotyping of some 240,000 mostly nonsynonymous coding variants across the human genome. The HumanExome chip, with its highly non‐uniform distribution of markers and emphasis on rare coding variants, presents distinctive challenges for quality control (QC) and data cleaning. Here, we describe QC procedures for HumanExome data, with examples of challenges specific to exome arrays drawn from our experience cleaning a data set of ∼7,500 samples from the NEIGHBORHOOD Consortium. We focus on standard QC procedures for genome‐wide array data, including genotype calling, sex verification, sample identity verification, relationship checking, and assessment of population structure, that are complicated by the HumanExome panel's enrichment in rare, exonic variation. © 2016 by John Wiley & Sons, Inc.

Keywords: quality control; genetic association studies; exome arrays; Illumina HumanExome BeadChip; NEIGHBORHOOD Consortium

Table of Contents

  • Introduction
  • Genotyping and Initial QC
  • Sample Quality
  • Marker Quality
  • Acknowledgments
  • Figures
  • Tables
  • Literature Cited

Key References
  Turner et al., 2011. See above.
  Comprehensive overview of common quality‐control screens for genome‐wide association data.
  Guo et al., 2012. See above.
  Detailed protocols for processing and quality control of Illumina HumanExome chip data, with
  Grove et al., 2013. See above.
  CHARGE Consortium quality control criteria for genotype calling on a combined set of 62,000 individuals genotyped at multiple study centers, with links to CHARGE Consortium resources for the HumanExome chip, including genotypes for HapMap samples and a custom GenomeStudio clustering (.egt) file.

Internet Resources
  Description of the development and contents of the exome‐based marker array implemented as the Illumina HumanExome chip.
  List of proposed markers for the HumanExome chip with role of variants in the panel (e.g., nonsynonymous coding variant, splice variant, AIM, linkage grid marker).
  Source for PLINK software download, documentation, and tutorials.
  Source for the R software and R packages.
  Resources for Illumina HumanExome data processing developed by the CHARGE Consortium.
  Source for zCall scripts and protocols.
  Source for extensions to the GenomeStudio software for Illumina genotype calling and QC.
  Illumina algorithm for assigning the TOP strand.
  Correspondence between the Illumina TOP strand and the dbSNP + strand for HumanExome markers.
  Accessory files for the HumanExome BeadChip, including cluster (.egt) files, gene annotation data, rsID information, and changes made between array versions.
