Galaxy: A Web‐Based Genome Analysis Tool for Experimentalists

Daniel Blankenberg1, Gregory Von Kuster1, Nathaniel Coraor1, Guruprasad Ananda1, Ross Lazarus1, Mary Mangan2, Anton Nekrutenko1, James Taylor1

1 The Galaxy Team, Pennsylvania State University, University Park, Pennsylvania, 2 OpenHelix LLC, Bellevue, Washington
Publication Name:  Current Protocols in Molecular Biology
Unit Number:  Unit 19.10
DOI:  10.1002/0471142727.mb1910s89
Online Posting Date:  January, 2010
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

High‐throughput data production has revolutionized molecular biology. However, massive increases in data generation capacity require analysis approaches that are more sophisticated and often very computationally intensive. Thus, making sense of high‐throughput data requires informatics support. Galaxy (http://galaxyproject.org) is a software system that provides this support through a framework that gives experimentalists simple interfaces to powerful tools, while automatically managing the computational details. Galaxy is distributed both as a publicly available Web service, which provides tools for the analysis of genomic, comparative genomic, and functional genomic data, and as a downloadable package that can be deployed in individual laboratories. Either way, it allows experimentalists without informatics or programming expertise to perform complex large‐scale analysis with just a Web browser. Curr. Protoc. Mol. Biol. 89:19.10.1‐19.10.21. © 2010 by John Wiley & Sons, Inc.

Keywords: Galaxy; analysis; bioinformatics; workflow; algorithm; pipeline; genomics; SNPs

     
 

Table of Contents

  • Introduction
  • Basic Protocol 1: An Introduction to the Galaxy Approach: Finding Promoters Containing TAF1 Binding Sites Identified From a ChIP‐Seq Experiment
  • Basic Protocol 2: Combining and Filtering Genome Annotations: Finding Exons with the Highest Number of Nucleotide Polymorphisms
  • Support Protocol 1: Saving Results in Galaxy and Sharing Data with Others
  • Basic Protocol 3: Generating a Workflow From a History in Galaxy
  • Support Protocol 2: Modify a Parameter in the Workflow in Galaxy
  • Support Protocol 3: Running Workflows with Galaxy
  • Support Protocol 4: Sharing Workflows with Galaxy
  • Basic Protocol 4: Generating Workflows from Scratch with Galaxy
  • Basic Protocol 5: Extracting Sequences and Alignments with Galaxy: An SNPs in Exons Example
  • Commentary
  • Literature Cited
  • Figures
     
 

Materials

Basic Protocol 1: An Introduction to the Galaxy Approach: Finding Promoters Containing TAF1 Binding Sites Identified From a ChIP‐Seq Experiment

  Materials
  • A file containing genomic coordinates for TAF1‐binding sites from the ChIP‐Seq experiment (an example file can be downloaded at http://galaxy.psu.edu/CPMB/TAF1_ChIP.txt; Kim et al., 2005)
  • An internet‐accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)

Basic Protocol 2: Combining and Filtering Genome Annotations: Finding Exons with the Highest Number of Nucleotide Polymorphisms

  Materials
  • An internet‐accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
NOTE: Before starting, it helps to clear the current history so that dataset numbering restarts from 1: open History Options and select Create New. This makes the numbered steps easier to follow.

Support Protocol 1: Saving Results in Galaxy and Sharing Data with Others

  Materials
  • An internet‐accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
  • Results from protocol 2
  • A Galaxy account (created by clicking Register in the Galaxy interface); histories must be linked to a user to be stored and shared

Basic Protocol 3: Generating a Workflow From a History in Galaxy

  Materials
  • An internet‐accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
  • History created from protocol 2
  • A Galaxy account (created by clicking Register in the Galaxy interface); all workflow manipulation in Galaxy requires the user to be logged in with an account

Support Protocol 2: Modify a Parameter in the Workflow in Galaxy

  Materials
  • An internet‐accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
  • Workflow created by protocol 4

Support Protocol 3: Running Workflows with Galaxy

  Materials
  • An internet‐accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
  • Workflow saved in protocol 4

Support Protocol 4: Sharing Workflows with Galaxy

  Materials
  • An internet‐accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
  • Workflow created by protocol 4

Basic Protocol 4: Generating Workflows from Scratch with Galaxy

  Materials
  • An internet‐accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
  • A Galaxy account (created by clicking Register in the Galaxy interface); all workflow manipulation in Galaxy requires the user to be logged in with an account

Basic Protocol 5: Extracting Sequences and Alignments with Galaxy: An SNPs in Exons Example

  Materials
  • An internet‐accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
  • Completed and saved history created by protocol 2 and protocol 3

Figures

  • Figure 19.10.1 Galaxy's Analyze Data interface consists of four regions: the masthead (A) at the top; the tool menu (B) on the left‐hand side; the work area (C) in the middle; and the history panel (D) on the right. The Get Data section has been expanded in the tool menu and the Upload File tool has been selected. In the work area, a local file containing TAF1 ChIP‐Seq data has been chosen (see , step 1); clicking the Execute button will cause the data to be uploaded and appear in the history panel. See the TAF1 screencast (http://galaxycast.org/cpmb‐2009‐1) for more details.
  • Figure 19.10.2 To change the properties of a dataset (see , step 2), click on the question mark (or the pencil icon) associated with the dataset in the history panel (A). This causes the Edit Attributes page to appear in the center panel (B) where the datatype has been changed from tabular to interval. Clicking Save causes the page to refresh, allowing additional interval‐specific information to be set (C).
  • Figure 19.10.3 The UCSC Table browser tool has been selected and its interface (A) appears in the center panel. The refGene table has been selected and the output is marked to be sent to Galaxy (see , step 3). Once output style is specified (B), clicking Send query to Galaxy will create a new dataset in the history panel. The history item has been renamed to RefSeq after clicking on the pencil icon next to its name and making the required changes in the Edit Attributes page (see Fig. ) which appears.
  • Figure 19.10.4 Selecting the Get flanks tool (see , step 4) from the Operate on Genomic Intervals Section (A) allows the creation of new data containing the region 1000 nucleotides upstream of our RefSeq genes (B).
  • Figure 19.10.5 The Join tool is used to create a dataset that contains the coordinates of putative promoters and TAF1 binding sites side by side (see , step 6).
  • Figure 19.10.6 The Build custom track tool (see , step 7) allows the user to design a custom track suitable for display at the UCSC Genome Browser (D) by progressively adding new tracks containing varying datasets (A–C).
  • Figure 19.10.7 A dataset containing exons and overlapping SNPs was created (see , step 4) using the Join tool and displayed in the middle panel by clicking on the eye icon next to dataset 3. A red rectangle has been drawn around an exon, which overlaps with four SNPs. See the Exons and SNPs screencast (http://galaxycast.org/cpmb‐2009‐2) for more details.
  • Figure 19.10.8 To create a workflow from an existing history (see ), the user needs to make sure that they are logged in and then select History Options and click Extract Workflow. A new workflow will be populated from the current history as shown; the workflow can now be renamed and created. See the Workflow screencast (http://galaxycast.org/cpmb‐2009‐4) for more details.
  • Figure 19.10.9 The Workflow Editor allows users to click to add new tools and connect the output of one tool to the input of another by simple clicking and dragging. Here, the output of the Sort tool is being connected to the Select first tool (see , step 9), as is shown by the green rope; when the mouse button is released, the connection will be created and the rope will become white.
  • Figure 19.10.10 Several options exist for obtaining multi‐species alignments (see ). The Extract MAF blocks tool (A) creates a MAF dataset, which contains only the trimmed alignment blocks that overlap a specified set of intervals. The Stitch MAF blocks tool (B) creates a FASTA file, which contains a single alignment block per provided interval. See the SeqAlign screencast (http://galaxycast.org/cpmb‐2009‐6) for more details.
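
The two interval operations described in Figures 19.10.4 and 19.10.5 (taking the 1000 nucleotides upstream of each gene, then joining those regions with binding sites that overlap them) can be illustrated outside Galaxy with a minimal Python sketch. This is not Galaxy's implementation. The names Interval, upstream_flank, and join_overlapping are hypothetical, the example coordinates are invented, and coordinates are assumed to be 0-based and half-open (BED-style).

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A genomic interval; 0-based, half-open coordinates are assumed."""
    chrom: str
    start: int
    end: int
    strand: str = "+"


def upstream_flank(gene: Interval, length: int = 1000) -> Interval:
    """Return the region `length` nucleotides upstream of a gene.

    Upstream depends on strand: before `start` on the plus strand,
    after `end` on the minus strand.
    """
    if gene.strand == "-":
        return Interval(gene.chrom, gene.end, gene.end + length, gene.strand)
    # Clamp at 0 so a gene near the chromosome start yields a shorter flank.
    return Interval(gene.chrom, max(0, gene.start - length), gene.start, gene.strand)


def overlaps(a: Interval, b: Interval, min_overlap: int = 1) -> bool:
    """True if a and b share at least `min_overlap` bases on one chromosome."""
    if a.chrom != b.chrom:
        return False
    return min(a.end, b.end) - max(a.start, b.start) >= min_overlap


def join_overlapping(
    first: List[Interval], second: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """Pair every interval in `first` with each overlapping one in `second`.

    A simple all-against-all scan; fine for a sketch, slow for large inputs.
    """
    return [(a, b) for a in first for b in second if overlaps(a, b)]


# Invented example data: two genes and three candidate binding sites.
genes = [Interval("chr1", 5000, 9000, "+"), Interval("chr1", 20000, 24000, "-")]
sites = [
    Interval("chr1", 4200, 4400),
    Interval("chr1", 24500, 24700),
    Interval("chr1", 15000, 15200),
]

promoters = [upstream_flank(g) for g in genes]
for promoter, site in join_overlapping(promoters, sites):
    print(promoter, site)
```

The result places each putative promoter side by side with a site that overlaps it, which is the shape of output the Join step is described as producing. The third site falls in neither flank and so does not appear.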

Literature Cited
   Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and Kent, W.J. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32:D493‐D496.
   Karolchik, D., Kuhn, R.M., Baertsch, R., Barber, G.P., Clawson, H., Diekhans, M., Giardine, B., Harte, R.A., Hinrichs, A.S., Hsu, F., Miller, W., Pedersen, J.S., Pohl, A., Raney, B.J., Rhead, B., Rosenbloom, K.R., Smith, K.E., Stanke, M., Thakkapallayil, A., Trumbower, H., Wang, T., Zweig, A.S., Haussler, D., and Kent, W.J. 2008. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 36:D773‐D779.
   Kim, T.H., Barrera, L.O., Zheng, M., Qu, C., Singer, M.A., Richmond, T.A., Wu, Y., Green, R.D., and Ren, B. 2005. A high‐resolution map of active promoters in the human genome. Nature 436:876‐880.
   Taylor, J., Schenck, I., Blankenberg, D., and Nekrutenko, A. 2007. Using Galaxy to perform large‐scale interactive data analyses. Curr. Protoc. Bioinformatics 19:10.5.1‐10.5.25.