Analysis of Gene‐Gene Interactions

Diane Gilbert‐Diamond¹, Jason H. Moore²

¹ Computational Genetics Laboratory, Departments of Genetics and Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire
² Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, New Hampshire
Publication Name: Current Protocols in Human Genetics
Unit Number: Unit 1.14
DOI: 10.1002/0471142905.hg0114s70
Online Posting Date: July 2011
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The goal of this unit is to introduce gene‐gene interactions (epistasis) as a significant complicating factor in the search for disease susceptibility genes. This unit begins with an overview of gene‐gene interactions and why they are likely to be common. Then, it reviews several statistical and computational methods for detecting and characterizing genes with effects that are dependent on other genes. The focus of this unit is genetic association studies of discrete and quantitative traits because most of the methods for detecting gene‐gene interactions have been developed specifically for these study designs.

Curr. Protoc. Hum. Genet. 70:1.14.1‐1.14.12 © 2011 by John Wiley & Sons, Inc.

Keywords: epistasis; genetics; statistics; bioinformatics

Table of Contents

  • Introduction
  • What are Gene‐Gene Interactions?
  • Why are Gene‐Gene Interactions Likely to be Common?
  • Why are Gene‐Gene Interactions Difficult to Detect?
  • Methods for Detecting Gene‐Gene Interactions in Association Studies of Discrete Traits
  • Methods for Detecting Gene‐Gene Interactions in Association Studies of Quantitative Traits
  • Detecting Gene‐Gene Interactions on a Genome‐Wide Scale
  • Summary
  • Literature Cited
  • Figures
  • Tables

Internet Resources
  The Power program for estimation of sample size and power for two‐locus interactions in both cohort and case‐control studies.
  The focused interaction testing framework (FITF) for detecting gene‐gene interactions using logistic regression.
  The Quanto program for estimation of sample size and power in matched case‐control, case‐sibling, case‐parent, and case‐only designs.
  Computational Genetics Laboratory at Dartmouth Medical School, where open‐source software for MDR can be obtained.