Some Phenotype Association Tools in Galaxy: Looking for Disease SNPs in a Full Genome

Belinda M. Giardine (1), Cathy Riemer (1), Richard Burhans (1), Aakrosh Ratan (1), Webb Miller (1)

(1) Pennsylvania State University, University Park, Pennsylvania
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 15.2
DOI: 10.1002/0471250953.bi1502s39
Online Posting Date: September 2012
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


This unit focuses on some of the tools available on the public Galaxy server that are useful for exploring possible associations between human genetic variants and phenotypes. We walk step by step through an example illustrating several methods for examining a single full‐coverage genome to look for single‐nucleotide polymorphisms (SNPs) that are either known to be associated with disease or suspected to have an impact for other reasons. The example makes use of public genomic data, tools designed specifically for working with variants, and some general tools for text manipulation and operations on genomic coordinates. Curr. Protoc. Bioinform. 39:15.2.1‐15.2.27. © 2012 by John Wiley & Sons, Inc.
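
As an illustration of what an "operation on genomic coordinates" can look like, the following is a minimal Python sketch that keeps only the SNPs falling inside a set of regions. It is not the Galaxy implementation and is not taken from the unit; the function name `snps_in_regions`, the tuple layouts, and the BED-style 0-based, half-open coordinates are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def snps_in_regions(snps, regions):
    """Return the SNPs whose position falls inside at least one region.

    Assumed (hypothetical) layouts:
      snps    -- iterable of (chrom, pos, snp_id), pos is 0-based
      regions -- iterable of (chrom, start, end), 0-based and half-open
    """
    # Group region intervals by chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    # Sort and merge overlapping intervals so a binary search is valid.
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = []
        for start, end in intervals:
            if out and start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([iv[0] for iv in out], out)

    hits = []
    for chrom, pos, snp_id in snps:
        if chrom not in merged:
            continue
        starts, intervals = merged[chrom]
        # Index of the last interval starting at or before pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits.append((chrom, pos, snp_id))
    return hits


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
    snps = [("chr1", 99, "a"), ("chr1", 100, "b"), ("chr1", 299, "c"),
            ("chr1", 300, "d"), ("chr2", 55, "e"), ("chr3", 10, "f")]
    for hit in snps_in_regions(snps, regions):
        print(*hit, sep="\t")
```

Merging the regions first means each SNP needs only one binary search per lookup, which matters when a full genome contributes millions of variants.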

Keywords: disease; SNP; genome variation; coding; non‐coding; gene‐based analysis; Web application


Table of Contents

  • Introduction
  • Basic Protocol 1: Using Galaxy to Look for Disease SNPs in a Full Genome: Preparing Input Data
  • Basic Protocol 2: Selecting Known Coding SNPs Predicted to be Damaging, then Finding Their Genes and Associated Pathways
  • Basic Protocol 3: Running New Predictions of Coding SNPs Likely to be Detrimental
  • Basic Protocol 4: Finding SNPs that Fall in Suspected Functional Regions
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables



Internet Resources
  The main public instance of Galaxy.
  A collection of human phenotype‐associated SNPs from Locus‐Specific Databases.
  A version of this tutorial in HTML format.
  Descriptions of file formats used by the UCSC Table Browser.
Supplementary File
  This is an alternate URL to access the file “test.masterVar.gz” cited under Necessary Resources, Files, on page 15.2.3.