Strategies for Pathway Analysis from GWAS Data

Brian L. Yaspan1, Olivia J. Veatch1

1 Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tennessee
Publication Name: Current Protocols in Human Genetics
Unit Number: Unit 1.20
DOI: 10.1002/0471142905.hg0120s71
Online Posting Date: October 2011
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


Genome‐wide association studies (GWAS) are a standard approach for investigating the relationship between common variation in the human genome and a phenotype of interest. However, the single‐allele association results published for many GWAS represent only the tip of the iceberg of the information that can be extracted from these datasets. The primary analysis strategy for GWAS is single‐marker association testing, in which only the single nucleotide polymorphisms (SNPs) with the smallest p values are declared statistically significant, because the large number of tests requires stringent correction to control type I error. Factors such as locus heterogeneity, epistasis, and multiple genes each conferring small effects contribute to the complexity of the genetic models underlying phenotype expression. Thus, many biologically meaningful associations with smaller effect sizes at individual genes are overlooked because they are difficult to separate from a sea of false‐positive associations. It is therefore desirable to organize individual SNPs into biologically meaningful groups and examine the overall effect of minor perturbations to genes and pathways. This pathway‐based approach gives researchers insight into the functional foundations of the phenotype being studied and allows testing of various genetic scenarios. Curr. Protoc. Hum. Genet. 71:1.20.1‐1.20.15 © 2011 by John Wiley & Sons, Inc.
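
To make the contrast between single‐SNP and pathway‐level testing concrete, the following is a minimal Python sketch, not the method of this unit. It assumes a Bonferroni correction for the single‐marker threshold and a simple hypergeometric over‐representation test for the pathway step. The function names, the SNP‐to‐gene map, the pathway definitions, and the toy p values are all hypothetical. The sketch ignores gene size and linkage disequilibrium, which real analyses typically address, for example with permutation.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction (assumed here)."""
    return alpha / n_tests


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def pathway_enrichment(snp_p, snp_to_gene, pathways, nominal=0.05):
    """Over-representation p value per pathway.

    A gene counts as a 'hit' if any of its SNPs has p <= nominal.
    This is a deliberately simple, hypothetical scoring rule.
    """
    genes = set(snp_to_gene.values())
    hits = {snp_to_gene[s] for s, p in snp_p.items()
            if s in snp_to_gene and p <= nominal}
    results = {}
    for name, members in pathways.items():
        members = set(members) & genes          # keep only genes that were tested
        k = len(members & hits)                 # pathway genes that are hits
        results[name] = hypergeom_upper_tail(k, len(members), len(hits), len(genes))
    return results


# Toy data (entirely made up for illustration).
snp_p = {"rs1": 0.01, "rs2": 0.03, "rs3": 0.40,
         "rs4": 0.60, "rs5": 0.20, "rs6": 0.04}
snp_to_gene = {"rs1": "G1", "rs2": "G2", "rs3": "G3",
               "rs4": "G4", "rs5": "G5", "rs6": "G6"}
pathways = {"A": ["G1", "G2", "G3"], "B": ["G4", "G5", "G6"]}

# No toy SNP passes a genome-wide threshold for one million tests ...
threshold = bonferroni_threshold(0.05, 1_000_000)
genome_wide_hits = [s for s, p in snp_p.items() if p <= threshold]

# ... but nominal signals can still be aggregated per pathway.
enrichment = pathway_enrichment(snp_p, snp_to_gene, pathways)
```

In this toy input no SNP reaches the corrected single‐marker threshold, yet the nominally associated genes can still be compared across pathways. With so few genes the enrichment p values are not meaningful in themselves; the point is only the shape of the calculation.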

Keywords: GWAS; genome‐wide association; pathway analysis; genetic epidemiology


Table of Contents

  • Introduction
  • Key Concepts
  • Strategic Approach
  • Commentary
  • Literature Cited
  • Figures
  • Tables
