Galaxy: A Web‐Based Genome Analysis Tool for Experimentalists
1 The Galaxy Team, Pennsylvania State University, University Park, Pennsylvania
2 The Huck Institutes for the Life Sciences, Pennsylvania State University, University Park, Pennsylvania
3 Channing Laboratory, Harvard Medical School, Boston, Massachusetts
4 OpenHelix LLC, Bellevue, Washington
5 Emory University, Atlanta, Georgia
Abstract
High-throughput data production has revolutionized molecular biology. However, massive increases in data generation capacity require analysis approaches that are more sophisticated and often very computationally intensive. Thus, making sense of high-throughput data requires informatics support. Galaxy (http://galaxyproject.org) is a software system that provides this support through a framework that gives experimentalists simple interfaces to powerful tools while automatically managing the computational details. Galaxy is distributed both as a publicly available Web service, which provides tools for the analysis of genomic, comparative genomic, and functional genomic data, and as a downloadable package that can be deployed in individual laboratories. Either way, it allows experimentalists without informatics or programming expertise to perform complex large-scale analyses with just a Web browser. Curr. Protoc. Mol. Biol. 89:19.10.1-19.10.21. © 2010 by John Wiley & Sons, Inc.
Keywords: Galaxy; analysis; bioinformatics; workflow; algorithm; pipeline; genomics; SNPs
Table of Contents
- Introduction
- Basic Protocol 1: An Introduction to the Galaxy Approach: Finding Promoters Containing TAF1 Binding Sites Identified From a ChIP-Seq Experiment
- Basic Protocol 2: Combining and Filtering Genome Annotations: Finding Exons with the Highest Number of Nucleotide Polymorphisms
- Support Protocol 1: Saving Results in Galaxy and Sharing Data with Others
- Basic Protocol 3: Generating a Workflow From a History in Galaxy
- Support Protocol 2: Modify a Parameter in the Workflow in Galaxy
- Support Protocol 3: Running Workflows with Galaxy
- Support Protocol 4: Sharing Workflows with Galaxy
- Basic Protocol 4: Generating Workflows from Scratch with Galaxy
- Basic Protocol 5: Extracting Sequences and Alignments with Galaxy: An SNPs in Exons Example
- Commentary
- Literature Cited
- Figures
- Topics
- Bioinformatics
- Genetics and Genomics
- Molecular Biology
Materials
Basic Protocol 1: An Introduction to the Galaxy Approach: Finding Promoters Containing TAF1 Binding Sites Identified From a ChIP-Seq Experiment
- A file containing genomic coordinates for TAF1-binding sites from the ChIP-Seq experiment (an example file can be downloaded at http://galaxy.psu.edu/CPMB/TAF1_ChIP.txt; Kim et al., 2005)
- An internet-accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
Basic Protocol 2: Combining and Filtering Genome Annotations: Finding Exons with the Highest Number of Nucleotide Polymorphisms
- An internet-accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
Support Protocol 1: Saving Results in Galaxy and Sharing Data with Others
- An internet-accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
- Results from Basic Protocol 2
- A Galaxy account (created by clicking Register in the Galaxy interface); histories must be linked to a user to be stored and shared
Basic Protocol 3: Generating a Workflow From a History in Galaxy
- An internet-accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
- History created from Basic Protocol 2
- A Galaxy account (created by clicking Register in the Galaxy interface); all workflow manipulation in Galaxy requires the user to be logged in with an account
Support Protocol 2: Modify a Parameter in the Workflow in Galaxy
- An internet-accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
- Workflow created by Basic Protocol 3
Support Protocol 3: Running Workflows with Galaxy
- An internet-accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
- Workflow saved in Basic Protocol 3
Support Protocol 4: Sharing Workflows with Galaxy
- An internet-accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
- Workflow created by Basic Protocol 3
Basic Protocol 4: Generating Workflows from Scratch with Galaxy
- An internet-accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
- A Galaxy account (created by clicking Register in the Galaxy interface); all workflow manipulation in Galaxy requires the user to be logged in with an account
Basic Protocol 5: Extracting Sequences and Alignments with Galaxy: An SNPs in Exons Example
- An internet-accessible computer with any modern Web browser (Firefox, Safari, Opera, Internet Explorer)
- Completed and saved history created by Basic Protocol 2 and Support Protocol 1
Figures
- Figure 19.10.1 Galaxy's Analyze Data interface consists of four regions: the masthead (A) at the top; the tool menu (B) on the left-hand side; the work area (C) in the middle; and the history panel (D) on the right. The Get Data section has been expanded in the tool menu and the Upload File tool has been selected. In the work area, a local file containing TAF1 ChIP-Seq data has been chosen (see Basic Protocol 1, step 1); clicking the Execute button will cause the data to be uploaded and appear in the history panel. See the TAF1 screencast (http://galaxycast.org/cpmb-2009-1) for more details.
- Figure 19.10.2 To change the properties of a dataset (see Basic Protocol 1, step 2), click on the question mark (or the pencil icon) associated with the dataset in the history panel (A). This causes the Edit Attributes page to appear in the center panel (B), where the datatype has been changed from tabular to interval. Clicking Save causes the page to refresh, allowing additional interval-specific information to be set (C).
- Figure 19.10.3 The UCSC Table browser tool has been selected and its interface (A) appears in the center panel. The refGene table has been selected and the output is marked to be sent to Galaxy (see Basic Protocol 1, step 3). Once the output style is specified (B), clicking Send query to Galaxy will create a new dataset in the history panel. The history item has been renamed to RefSeq by clicking on the pencil icon next to its name and making the required changes in the Edit Attributes page that appears (see Fig. 19.10.2).
- Figure 19.10.4 Selecting the Get flanks tool (see Basic Protocol 1, step 4) from the Operate on Genomic Intervals section (A) allows the creation of new data containing the region 1000 nucleotides upstream of our RefSeq genes (B).
- Figure 19.10.5 The Join tool is used to create a dataset that contains the coordinates of putative promoters and TAF1 binding sites side by side (see Basic Protocol 1, step 6).
- Figure 19.10.6 The Build custom track tool (see Basic Protocol 1, step 7) allows the user to design a custom track suitable for display at the UCSC Genome Browser (D) by progressively adding new tracks containing varying datasets (A-C).
- Figure 19.10.7 A dataset containing exons and overlapping SNPs was created (see Basic Protocol 2, step 4) using the Join tool and displayed in the middle panel by clicking on the eye icon next to dataset 3. A red rectangle has been drawn around an exon that overlaps with four SNPs. See the Exons and SNPs screencast (http://galaxycast.org/cpmb-2009-2) for more details.
- Figure 19.10.8 To create a workflow from an existing history (see Basic Protocol 3), the user needs to be logged in, then select History Options and click Extract Workflow. A new workflow will be populated from the current history as shown; the workflow can now be renamed and created. See the Workflow screencast (http://galaxycast.org/cpmb-2009-4) for more details.
- Figure 19.10.9 The Workflow Editor allows users to add new tools by clicking and to connect the output of one tool to the input of another by clicking and dragging. Here, the output of the Sort tool is being connected to the Select first tool (see Basic Protocol 4, step 9), as shown by the green rope; when the mouse button is released, the connection will be created and the rope will become white.
- Figure 19.10.10 Several options exist for obtaining multi-species alignments (see Basic Protocol 5). The Extract MAF blocks tool (A) creates a MAF dataset, which contains only the trimmed alignment blocks that overlap a specified set of intervals. The Stitch MAF blocks tool (B) creates a FASTA file, which contains a single alignment block per provided interval. See the SeqAlign screencast (http://galaxycast.org/cpmb-2009-6) for more details.
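The interval operations named in the captions above (Get flanks, Join, then Sort and Select first) can be illustrated outside Galaxy with a small Python sketch. This is not Galaxy's implementation: the `Interval` type, the function names, the half-open 0-based coordinate convention, and all coordinates and identifiers in the toy data are assumptions made up for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval; coordinates assumed 0-based, half-open."""
    chrom: str
    start: int
    end: int
    name: str
    strand: str = "+"


def upstream_flank(gene: Interval, size: int = 1000) -> Interval:
    """Region `size` nucleotides upstream of a gene, respecting strand."""
    if gene.strand == "-":
        # For a minus-strand gene, upstream lies beyond the end coordinate.
        return Interval(gene.chrom, gene.end, gene.end + size, gene.name, "-")
    return Interval(gene.chrom, max(0, gene.start - size), gene.start,
                    gene.name, "+")


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one nucleotide."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def join(left: list[Interval], right: list[Interval]) -> list[tuple[Interval, Interval]]:
    """Pair every left interval with every overlapping right interval."""
    return [(a, b) for a in left for b in right if overlaps(a, b)]


# Hypothetical data in the spirit of Basic Protocol 1: genes and binding sites.
genes = [Interval("chr1", 5000, 9000, "geneA", "+"),
         Interval("chr1", 20000, 24000, "geneB", "-")]
sites = [Interval("chr1", 4200, 4300, "site1"),
         Interval("chr1", 24500, 24600, "site2"),
         Interval("chr1", 15000, 15100, "site3")]

promoters = [upstream_flank(g) for g in genes]
bound = [(p.name, s.name) for p, s in join(promoters, sites)]

# Hypothetical data in the spirit of Basic Protocol 2: exons and SNPs.
exons = [Interval("chr1", 100, 200, "exon1"),
         Interval("chr1", 300, 400, "exon2")]
snps = [Interval("chr1", 110, 111, "rs1"),
        Interval("chr1", 150, 151, "rs2"),
        Interval("chr1", 199, 200, "rs3"),
        Interval("chr1", 200, 201, "rs4"),
        Interval("chr1", 350, 351, "rs5")]

# Count SNPs per exon, then sort by count and keep the first row.
snp_counts = Counter(exon.name for exon, _ in join(exons, snps))
top_exon = snp_counts.most_common(1)

print(bound)
print(top_exon)
```

The nested-loop join is quadratic and only suitable for toy inputs; genome-scale data would need an indexed or sorted-sweep approach.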
Literature Cited
- Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and Kent, W.J. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32:D493-D496.
- Karolchik, D., Kuhn, R.M., Baertsch, R., Barber, G.P., Clawson, H., Diekhans, M., Giardine, B., Harte, R.A., Hinrichs, A.S., Hsu, F., Miller, W., Pedersen, J.S., Pohl, A., Raney, B.J., Rhead, B., Rosenbloom, K.R., Smith, K.E., Stanke, M., Thakkapallayil, A., Trumbower, H., Wang, T., Zweig, A.S., Haussler, D., and Kent, W.J. 2008. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 36:D773-D779.
- Kim, T.H., Barrera, L.O., Zheng, M., Qu, C., Singer, M.A., Richmond, T.A., Wu, Y., Green, R.D., and Ren, B. 2005. A high-resolution map of active promoters in the human genome. Nature 436:876-880.
- Taylor, J., Schenck, I., Blankenberg, D., and Nekrutenko, A. 2007. Using Galaxy to perform large-scale interactive data analyses. Curr. Protoc. Bioinformatics 19:10.5.1-10.5.25.