Some Phenotype Association Tools in Galaxy: Looking for Disease SNPs in a Full Genome

Belinda M. Giardine1, Cathy Riemer1, Richard Burhans1, Aakrosh Ratan1, Webb Miller1

1 Pennsylvania State University, University Park, Pennsylvania
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 15.2
DOI: 10.1002/0471250953.bi1502s39
Online Posting Date: September 2012
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

This unit focuses on some of the tools available on the public Galaxy server that are useful for exploring possible associations between human genetic variants and phenotypes. We trace step‐by‐step through an example illustrating several methods for examining a single full‐coverage genome to look for single‐nucleotide polymorphisms (SNPs) that are either known to be associated with disease or suspected to have an impact for other reasons. The example makes use of public genomic data, tools designed specifically for working with variants, and some general tools for text manipulation and operations on genomic coordinates. Curr. Protoc. Bioinform. 39:15.2.1‐15.2.27. © 2012 by John Wiley & Sons, Inc.

Keywords: disease; SNP; genome variation; coding; non‐coding; gene‐based analysis; Web application

     
 

Table of Contents

  • Introduction
  • Basic Protocol 1: Using Galaxy to Look for Disease SNPs in a Full Genome: Preparing Input Data
  • Basic Protocol 2: Selecting Known Coding SNPs Predicted to be Damaging, then Finding Their Genes and Associated Pathways
  • Basic Protocol 3: Running New Predictions of Coding SNPs Likely to be Detrimental
  • Basic Protocol 4: Finding SNPs that Fall in Suspected Functional Regions
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
     
 


Figures

  • Figure 15.2.1 Uploading a data file. See text for details.
  • Figure 15.2.2 Converting to pgSnp format. See text for details.
  • Figure 15.2.3 Putative SNP Phenotypes library. See text for details.
  • Figure 15.2.4 Removing SNPs found in healthy individuals. See text for details.
  • Figure 15.2.5 Completed input dataset. See text for details.
  • Figure 15.2.6 Details about the PolyPhen‐2 dataset. See text for details.
  • Figure 15.2.7 Joining on genomic intervals. See text for details.
  • Figure 15.2.8 Selecting damaging results. See text for details.
  • Figure 15.2.9 PolyPhen‐2 results. See text for details.
  • Figure 15.2.10 Mapping between identifiers. See text for details.
  • Figure 15.2.11 Choosing the identifier fields. See text for details.
  • Figure 15.2.12 Joining on identifiers. See text for details.
  • Figure 15.2.13 CTD. See text for details.
  • Figure 15.2.14 CTD results. See text for details.
  • Figure 15.2.15 Input for SIFT. See text for details.
  • Figure 15.2.16 Viewing the workflow. See text for details.
  • Figure 15.2.17 Running the workflow. See text for details.
  • Figure 15.2.18 SIFT. See text for details.
  • Figure 15.2.19 Selecting damaging SNPs. See text for details.
  • Figure 15.2.20 SIFT results. See text for details.
  • Figure 15.2.21 Intersecting with the PRPs. See text for details.
  • Figure 15.2.22 SNPs in PRPs. See text for details.
  • Figure 15.2.23 DNase hypersensitive sites (HSSs) from ENCODE. See text for details.
  • Figure 15.2.24 Intersecting with the HSSs. See text for details.
  • Figure 15.2.25 SNPs in HSSs. See text for details.
  • Figure 15.2.26 PhyloP. See text for details.
  • Figure 15.2.27 Distribution of phyloP scores. See text for details.
  • Figure 15.2.28 Histogram. See text for details.
  • Figure 15.2.29 Filtering the SNPs based on phyloP score. See text for details.
  • Figure 15.2.30 Highly conserved SNPs. See text for details.

Literature Cited

   Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A., Bork, P., Kondrashov, A.S., and Sunyaev, S.R. 2010. A method and server for predicting damaging missense mutations. Nat. Methods 7:248‐249.
   Blankenberg, D., Von Kuster, G., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. 2010. Galaxy: A web‐based genome analysis tool for experimentalists. Curr. Protoc. Mol. Biol. 89:19.10.1‐19.10.21.
   Davis, A.P., Murphy, C.G., Saraceni‐Richards, C.A., Rosenstein, M.C., Wiegers, T.C., and Mattingly, C.J. 2009. Comparative Toxicogenomics Database: A knowledgebase and discovery tool for chemical‐gene‐disease networks. Nucleic Acids Res. 37:D786‐D792.
   Drmanac, R., Sparks, A.B., Callow, J.M., Halpern, A.L., Burns, N.L., Kermani, B.G., Carnevali, P., Nazarenko, I., Nilsen, G.B., Yeung, G., Dahl, F., Fernandez, A., Staker, B., Pant, K.P., Baccash, J., Borcherding, A.P., Brownley, A., Cedeno, R., Chen, L., Chernikoff, D., Cheung, A., Chirita, R., Curson, B., Ebert, J.C., Hacker, C.R., Hartlage, R., Hauser, B., Huang, S., Jiang, Y., Karpinchyk, V., Koenig, M., Kong, C., Landers, T., Le, C., Liu, J., McBride, C.E., Morenzoni, M., Morey, R.E., Mutch, K., Perazich, H., Perry, K., Peters, B.A., Peterson, J., Pethiyagoda, C.L., Pothuraju, K., Richter, C., Rosenbaum, A.M., Roy, S., Shafto, J., Sharanhovich, U., Shannon, K.W., Sheppy, C.G., Sun, M., Thakuria, J.V., Tran, A., Vu, D., Zaranek, A.W., Wu, X., Drmanac, S., Oliphant, A.R., Banyai, W.C., Martin, B., Ballinger, D.G., Church, G.M., and Reid, C.A. 2009. Human genome sequencing using unchained base reads on self‐assembling DNA nanoarrays. Science 327:78‐81.
   Ferretti, V., Poitras, C., Bergeron, D., Coulombe, B., Robert, F., and Blanchette, M. 2007. PReMod: A database of genome‐wide mammalian cis‐regulatory module predictions. Nucleic Acids Res. 35:D122‐D126.
   Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., Miller, W., Kent, W.J., and Nekrutenko, A. 2005. Galaxy: A platform for interactive large‐scale genome analysis. Genome Res. 15:1451‐1455.
   Giardine, B., Riemer, C., Hefferon, T., Thomas, D., Hsu, F., Zielenski, J., Sang, Y., Elnitski, L., Cutting, G., Trumbower, H., Kern, A., Kuhn, R., Patrinos, G.P., Hughes, J., Higgs, D., Chui, D., Scriver, C., Phommarinh, M., Patnaik, S.K., Blumenfeld, O., Gottlieb, B., Vihinen, M., Väliaho, J., Kent, J., Miller, W., and Hardison, R.C. 2007. PhenCode: Connecting ENCODE data with mutations and phenotype. Hum. Mutat. 28:554‐562.
   Goecks, J., Nekrutenko, A., Taylor, J.; Galaxy Team. 2010. Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11:R86.
   Huang, D.W., Sherman, B.T., and Lempicki, R.A. 2009. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4:44‐57.
   Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and Kent, W.J. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32:D493‐D496.
   Kumar, P., Henikoff, S., and Ng, P.C. 2009. Predicting the effects of coding non‐synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4:1073‐1081.
   Reimand, J., Kull, M., Peterson, H., Hansen, J., and Vilo, J. 2007. g:Profiler: A web‐based toolset for functional profiling of gene lists from large‐scale experiments. Nucleic Acids Res. 35:W193‐W200.
   Seal, R.L., Gordon, S.M., Lush, M.J., Wright, M.W., and Bruford, E.A. 2011. genenames.org: The HGNC resources in 2011. Nucleic Acids Res. 39:D514‐D519.
   Siepel, A., Pollard, K.S., and Haussler, D. 2006. New methods for detecting lineage‐specific selection. In Proceedings of the 10th International Conference on Research in Computational Molecular Biology (RECOMB 2006), pp. 190‐205, Venice, Italy.
   Taylor, J., Tyekucheva, S., King, D.C., Hardison, R.C., Miller, W., and Chiaromonte, F. 2006. ESPERR: Learning strong and weak signals in genomic sequence alignments to identify functional elements. Genome Res. 16:1596‐1604.

Internet Resources

  http://galaxyproject.org
  The main public instance of Galaxy.
  http://phencode.bx.psu.edu
  A collection of human phenotype‐associated SNPs from Locus‐Specific Databases.
  http://www.bx.psu.edu/miller_lab/docs/galaxy_phen_assoc/tutorial/
  A version of this tutorial in HTML format.
  http://genome.ucsc.edu/FAQ/FAQformat.html
  Descriptions of file formats used by the UCSC Table Browser.

Supplementary File

  http://www.currentprotocols.com/protocol/bi1502
  This is an alternate URL for accessing the file “test.masterVar.gz” cited under Necessary Resources, Files, on page 15.2.3.