Analysis of Gene‐Gene Interactions

Brian S. Cole1, Molly A. Hall2, Ryan J. Urbanowicz1, Diane Gilbert‐Diamond3, Jason H. Moore1

1 Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania; 2 The Center for Systems Genomics, The Pennsylvania State University, University Park, Pennsylvania; 3 Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire
Publication Name: Current Protocols in Human Genetics
Unit Number: Unit 1.14
DOI: 10.1002/cphg.45
Online Posting Date: October 2017
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The goal of this unit is to introduce epistasis, or gene‐gene interactions, as a significant contributor to the genetic architecture of complex traits, including disease susceptibility. This unit begins with an historical overview of the concept of epistasis and the challenges inherent in the identification of potential gene‐gene interactions. Then, it reviews statistical and machine learning methods for discovering epistasis in the context of genetic studies of quantitative and categorical traits. This unit concludes with a discussion of meta‐analysis, replication, and other topics of active research. © 2017 by John Wiley & Sons, Inc.

Keywords: epistasis; gene‐gene interaction; complex genetic traits; disease susceptibility

Table of Contents

  • Introduction
  • Epistasis: An Historical Perspective
  • Statistical Approaches to Discover Epistasis
  • Machine Learning Approaches to Epistasis Discovery
  • Conclusions and Future Directions
  • Literature Cited
  • Figures
  • Tables
