Analysis of Gene‐Expression Data Using J‐Express

Anne Kristin Stavrum (1,2), Kjell Petersen (3), Inge Jonassen (3,2), Bjarte Dysvik (4)

(1) Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
(2) University of Bergen, Bergen, Norway
(3) Computational Biology Unit, BCCS, University of Bergen, Bergen, Norway
(4) MolMine AS, Thormoehlens, Bergen, Norway




Unit Number: Unit 7.3
DOI: 10.1002/0471250953.bi0703s21
Online Posting Date: March 2008

Abstract

The J-Express package has been designed to facilitate the analysis of microarray data with an emphasis on efficiency, usability, and comprehensibility. The J-Express system provides a powerful and integrated platform for the analysis of microarray gene expression data. It is platform-independent in that it requires only the availability of a Java virtual machine on the system. The system includes a range of analysis tools and a project management system supporting the organization and documentation of an analysis project. This unit describes the J-Express tool, emphasizing central concepts and principles, and gives examples of how it can be used to explore gene expression data sets. Curr. Protoc. Bioinform. 21:7.3.1-7.3.25. © 2008 by John Wiley & Sons, Inc.

Keywords: gene expression; J-Express; microarray; spot intensity quantitation

     
 

Table of Contents

  • Introduction
  • Basic Protocol 1: Create a Gene-Expression Matrix from Spot Intensity Data with J-Express
  • Basic Protocol 2: Analyze a Gene-Expression Matrix Using J-Express
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
     
 

Figures

  • Figure 7.3.1
    Synthetic data were generated from seven seed profiles by addition of (white) noise. The seed profiles are shown on the left and the resulting synthetic data on the right. The color of each profile is that of the seed profile from which it was generated. If the profiles are thought of as generated from a time-series experiment, the x axis corresponds to the time points. The y axis gives the log-ratio of a gene's expression level (logarithm of the expression level of a gene at a certain time point divided by its expression level in a reference sample). For example, the “black genes” have an expression level that does not change much during the time course, whereas the “red genes” are unchanged during the first few time steps (but below reference level), then increase through a number of time steps, and stay the same for the last few time steps. The noise consisted of random numbers between –0.5 and 0.5 (uniform probability), added to each gene at each time point.
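
The procedure in this caption can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the figure: the two seed profiles, the number of copies per seed, and the function name `make_synthetic_profiles` are hypothetical, and only the uniform noise range of –0.5 to 0.5 comes from the caption.

```python
import random

def make_synthetic_profiles(seeds, copies_per_seed, noise=0.5, rng=None):
    """Return (seed_index, noisy_profile) pairs derived from the seed profiles."""
    rng = rng or random.Random()
    data = []
    for seed_index, seed in enumerate(seeds):
        for _ in range(copies_per_seed):
            # Add an independent uniform draw from [-noise, noise] at each time point.
            profile = [value + rng.uniform(-noise, noise) for value in seed]
            data.append((seed_index, profile))
    return data

# Hypothetical seed profiles (log-ratios over five time points); the figure uses seven.
flat_seed = [0.0, 0.0, 0.0, 0.0, 0.0]
rising_seed = [-1.0, -1.0, 0.0, 1.0, 1.0]
seeds = [flat_seed, rising_seed]

# A fixed random seed makes the example repeatable.
synthetic = make_synthetic_profiles(seeds, copies_per_seed=3, rng=random.Random(0))
```

Keeping the seed index next to each profile mirrors the coloring in the figure, where every synthetic profile can be traced back to its template.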

  • Figure 7.3.2
    Data-import pipeline. Spot-intensity data are loaded from a file. A subset of the genes is selected through a filtering step, the intensity values for the remaining genes are normalized, and log-ratios are calculated. The prepared data set is a gene-expression data matrix that can be analyzed using, e.g., clustering methods.

  • Figure 7.3.3
    Screen shot of the SpotPix suite in J-Express including a visualization of a lowess normalization. The SpotPix suite shows the experimental design linking data files to samples. For each data file, a sequence of processes (including filters, plots, and normalization procedures) can be edited and executed. The figure shows a process batch including a plot that has been executed so that the plot is included in the screenshot. The plot shows the “regression line” defined by the lowess normalization procedure just above the plot in the process batch.

  • Figure 7.3.4
    The profile similarity search in J-Express allows the user to find the profiles most similar to a query profile when a particular dissimilarity measure is used. The figure illustrates the difference between (A) Euclidean distance, and (B) Pearson correlation–based dissimilarity measure (mathematically, the dissimilarity measure is 1 minus the correlation coefficient). See Background Information for more about dissimilarity measures.
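
To make the contrast between the two measures concrete, here is a minimal Python sketch. It is not J-Express code; the function names and the three example profiles are made up for illustration. The Pearson-based measure is computed as 1 minus the correlation coefficient, as stated in the caption.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_dissimilarity(a, b):
    """1 minus the Pearson correlation; undefined for constant profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # A constant profile has zero variance and would cause a division by zero here.
    return 1.0 - cov / math.sqrt(var_a * var_b)

# Hypothetical profiles: same shape at a higher level, and a mirrored shape.
query = [0.0, 1.0, 2.0, 1.0]
shifted = [3.0, 4.0, 5.0, 4.0]
mirrored = [2.0, 1.0, 0.0, 1.0]

for name, profile in [("shifted", shifted), ("mirrored", mirrored)]:
    print(name, euclidean_distance(query, profile), pearson_dissimilarity(query, profile))
```

With these values, Euclidean distance ranks the mirrored profile as closer to the query than the shifted one, while the correlation-based measure treats the shifted profile as identical in shape and the mirrored profile as maximally dissimilar.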

  • Figure 7.3.5
    The user-defined Profile Search window in J-Express allows the user to define a search profile consisting of a lower and upper limit on the expression values and to find all profiles matching that profile. The search profile is defined by the red/green barred boxes and the matching expression profiles are shown in black.

  • Figure 7.3.6
    Example showing hierarchical clustering of the synthetic data set using: (A) single linkage, (B) average linkage, and (C) complete linkage. The far right of each clustering shows the seed from which each profile was generated (using the gene group visualization functionality in the dendrogram window of J-Express).

  • Figure 7.3.7
    (A) K-means dialog box; (B) K-means result when clustering the synthetic data set. Each cluster is represented by its mean profile and by the bars showing the variation within the cluster at each data point.

  • Figure 7.3.8
    (A) PCA window with applied density map and a selected green area. (B) Result from PCA selection.

  • Figure 7.3.9
    (A) PCA window with over 6000 points (genes). (B) The same number of points with density threshold to find outliers.

  • Figure 7.3.10
    (A) SOM training control window. (B) SOM visualized in PCA window.

  • Figure 7.3.11
    Results from a GSEA analysis (window bottom right). The figure in the middle of the GSEA window shows the path of the running sum used to find the Enrichment Score (ES) for a gene set. The ES is determined by the highest (or in this case the lowest) point along this walk. The genes that appear at or before this point are the ones contributing to the score and are referred to as the “Leading Edge.” These may be important genes. The data set used shows the life cycle of Plasmodium falciparum (Bozdech et al. 2003). The genes in the dataset were ranked according to their correlation to the search profile shown in the top right-hand corner, and Gene Ontology was used to create gene sets (window top middle). When browsing the GSEA result window, the corresponding GO terms will be selected in the GO tree and the gene profiles will be selected in the Gene Graph window (window bottom left).
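
The running sum described in this caption can be illustrated with a simplified, unweighted variant in which every gene-set member adds the same step and every non-member subtracts the same step. This is an assumption made for brevity: the published GSEA method can weight the steps by each gene's correlation, and nothing here reflects how J-Express implements it. The gene names and function names are hypothetical.

```python
def running_sum(ranked_genes, gene_set):
    """Walk down the ranked list, stepping up at set members and down otherwise."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")
    walk = []
    total = 0.0
    for is_hit in hits:
        # Equal-sized steps (unweighted variant); the walk ends at zero.
        total += 1.0 / n_hits if is_hit else -1.0 / n_miss
        walk.append(total)
    return walk

def enrichment_score(walk):
    """Point of the walk farthest from zero (may be positive or negative)."""
    return max(walk, key=abs)

# Hypothetical ranked list and gene set.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
members = {"g1", "g2", "g4"}

walk = running_sum(ranked, members)
es = enrichment_score(walk)
```

In this toy case the set members cluster near the top of the ranking, so the walk peaks early and the score is positive; a set concentrated at the bottom of the ranking would give a negative score instead.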

  • Figure 7.3.12
    Data flow. Data are loaded from a data medium (typically a hard disk) through a loader/saver module and maintained within the J-Express system as a data set. The project-management system holds the different data sets loaded, as well as derived data sets produced by the user through analysis and processing (e.g., normalization/filtering) steps. The system also stores information on relationships between data sets.

  • Figure 7.3.13
    Illustration of distance measures for pairs of points in a two-dimensional space. (A) Euclidean distance; (B) Manhattan (city block) distance.

  • Figure 7.3.14
    Different experimental designs using a two-channel system. In a two-channel system, one typically uses either a common control hybridized to each array (in either one of the two channels), or one performs competitive hybridizations between all (or a subset of) the pairs of samples under analysis. Presently, J-Express supports the first experimental design (left). Note that, on the left, all samples are hybridized together with a common control (referred to as A in the example), while, if one uses the all-pairs approach, every possible pair of samples is hybridized together.

  • Figure 7.3.15
    The result of filtering the (original) synthetic data set by (A) requiring at least 5 values with absolute values above 2; (B) imposing a lower limit on standard deviation only.
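
Both filters can be expressed as simple per-profile tests. The Python sketch below is illustrative only: the function names, the example profiles, and the standard-deviation cutoff of 1.0 are hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator is applied. The count of 5 and the threshold of 2 come from the caption.

```python
import statistics

def passes_count_filter(profile, min_count=5, threshold=2.0):
    """Filter A: keep a profile with at least min_count values where |value| > threshold."""
    return sum(1 for value in profile if abs(value) > threshold) >= min_count

def passes_sd_filter(profile, min_sd):
    """Filter B: keep a profile whose standard deviation reaches min_sd."""
    # Sample standard deviation is an assumption made for this sketch.
    return statistics.stdev(profile) >= min_sd

# Hypothetical log-ratio profiles over seven time points.
profiles = {
    "flat": [0.1, -0.2, 0.0, 0.1, -0.1, 0.2, 0.0],
    "high": [2.5, 2.6, 2.4, 2.5, 2.6, 2.5, 2.4],
    "rising": [-1.5, -1.5, -0.5, 0.5, 1.5, 1.5, 1.5],
}

kept_by_count = [name for name, p in profiles.items() if passes_count_filter(p)]
kept_by_sd = [name for name, p in profiles.items() if passes_sd_filter(p, min_sd=1.0)]
```

The two filters select different genes here: the count filter keeps the profile that is consistently far from the reference level, whereas the standard-deviation filter keeps the profile that changes over the time course.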

  • Figure 7.3.16
    J-Express allows the user to normalize the expression profiles of genes (rows in the gene-expression matrix). The example shows the results of normalizing the synthetic data set by (A) mean normalization and (B) mean and variance normalization.
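
As a rough illustration of the two row normalizations, the sketch below assumes that mean normalization subtracts the row mean and that mean and variance normalization additionally divides by the row's standard deviation. Those definitions, the choice of the population standard deviation, and the function names are assumptions made for this example and are not taken from J-Express.

```python
import statistics

def mean_normalize(profile):
    """Shift a profile so that its mean is zero."""
    mean = statistics.mean(profile)
    return [value - mean for value in profile]

def mean_variance_normalize(profile):
    """Shift to zero mean and scale to unit standard deviation."""
    centered = mean_normalize(profile)
    # Population standard deviation is an assumption; a constant profile
    # has zero spread and would cause a division by zero here.
    sd = statistics.pstdev(profile)
    return [value / sd for value in centered]

# Hypothetical expression profile (one row of the gene-expression matrix).
profile = [1.0, 2.0, 3.0]

centered = mean_normalize(profile)          # [-1.0, 0.0, 1.0]
scaled = mean_variance_normalize(profile)
```

After mean normalization, profiles that differ only in overall level coincide; after mean and variance normalization, profiles that differ only in level and amplitude coincide, so only the shape of each profile remains.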

Literature Cited

    Beissbarth, T., Fellenberg, K., Brors, B., Arribas-Prat, R., Boer, J.M., Hauser, N.C., Scheideler, M., Hoheisel, J.D., Schütz, G., Poustka, A., and Vingron, M. 2001. Processing and quality control of DNA array hybridization data. Bioinformatics 16: 1014-1022.
    Bø, T.H., Dysvik, B., and Jonassen, I. 2004. LSimpute: Accurate estimation of missing values in microarray data with least squares methods. Nucleic Acids Res. 32: e34.
    Bolstad, B.M., Irizarry, R.A., Astrand, M., and Speed, T.P. 2003. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19: 185-193.
    Bozdech, Z., Llinás, M., Pulliam, B.L., Wong, E.D., Zhu, J., and DeRisi, J.L. 2003. The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol. 1: 001-016.
    Brazma, A. and Vilo, J. 2000. Gene expression data analysis. FEBS Lett. 480: 17-24.
    Brazma, A., Jonassen, I., Vilo, J., and Ukkonen, E. 1998. Predicting gene regulatory elements in silico on a genomic scale. Genome Res. 8: 1202-1215.
    Breitling, R., Armengaud, P., Amtmann, A., and Herzyk, P. 2004. Rank products: A simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett. 573: 83-92.
    Brown, M.P.S., Grundy, W.N., Lin, D., Cristianini, N., Sugnet, C.W., Furey, T.S., Ares, M. Jr., and Haussler, D. 2000. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. U.S.A. 97: 262-267.
    Cleveland, W.S. 1979. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. 74: 829-836.
    Cui, X. and Churchill, G.A. 2003. Statistical tests for differential expression in cDNA microarray experiments. Genome Biol. 4: 210.
    Dysvik, B. and Jonassen, I. 2001. J-Express: Exploring gene expression data using Java. Bioinformatics 17: 369-370.
    Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 95: 14863-14868.
    Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., and Lander, E.S. 1999. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286: 531-537.
    Holter, N.S., Mitra, M., Maritan, A., Cieplak, M., Banavar, J.R., and Fedoroff, N.V. 2000. Fundamental patterns underlying gene expression profiles: Simplicity from complexity. Proc. Natl. Acad. Sci. U.S.A. 97: 8409-8414.
    Irizarry, R.A., Bolstad, B.M., Collin, F., Cope, L.M., Hobbs, B., and Speed, T.P. 2003. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31: e15.
    Jain, A.K. and Dubes, R.C. 1988. Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, New Jersey.
    Jolliffe, I.T. 1986. Principal Component Analysis. Springer-Verlag, New York.
    Kanehisa, M., Goto, S., Kawashima, S., and Nakaya, A. 2002. The KEGG databases at GenomeNet. Nucleic Acids Res. 30: 42-46.
    Kohonen, T. 1997. Self-Organizing Maps. Springer-Verlag, New York.
    Peña, J.M., Lozano, J.A., and Larrañaga, P. 1999. An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recogn. Lett. 20: 1027-1040.
    Quackenbush, J. 2001. Computational analysis of microarray data. Nat. Rev. Genet. 2: 418-427.
    Raychaudhuri, S., Stuart, J.M., and Altman, R.B. 2000. Principal components analysis to summarize microarray experiments: Application to sporulation time series. Pacific Symposium on Biocomputing, 455-466. Stanford Medical Informatics, Stanford University, Calif.
    Spellman, P.T., Miller, M., Stewart, J., Troup, C., Sarkans, U., Chervitz, S., Bernhart, D., Sherlock, G., Ball, C., Lepage, M., Swiatek, M., Marks, W.L., Goncalves, J., Markel, S., Iordan, D., Shojatalab, M., Pizarro, A., White, J., Hubley, R., Deutsch, E., Senger, M., Aronow, B.J., Robinson, A., Bassett, D., Stoeckert, C.J. Jr., and Brazma, A. 2002. Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol. 3: RESEARCH0046.
    Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., and Mesirov, J.P. 2005. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102: 15545-15550.
    Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E.S., and Golub, T.R. 1999. Interpreting gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. U.S.A. 96: 2907-2912.
    Törönen, P., Kolehmainen, M., Wong, G., and Castrén, E. 1999. Analysis of gene expression data using self-organizing maps. FEBS Lett. 451: 142-146.
    Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R.B. 2001. Missing value estimation methods for DNA microarrays. Bioinformatics 17: 520-525.
    Tusher, V.G., Tibshirani, R., and Chu, G. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. U.S.A. 98: 5116-5121.
    Workman, C., Jensen, L.J., Jarmer, H., Berka, R., Gautier, L., Nielsen, H.B., Saxild, H.-H., Nielsen, C., Brunak, S., and Knudsen, S. 2002. A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol. 3: research0048.
     
 