Pattern Discovery in Expression Profiling Data

Fumiaki Katagiri¹, Jane Glazebrook¹

¹ University of Minnesota, St. Paul, Minnesota
Publication Name:  Current Protocols in Molecular Biology
Unit Number:  Unit 22.5
DOI:  10.1002/0471142727.mb2205s85
Online Posting Date:  January, 2009

In expression profiling studies, it is often necessary to identify groups of genes with similar expression profiles in a variety of samples, and/or groups of samples with similar expression profiles. Each profile can be expressed as a single data point in a space with the same number of dimensions as there are parameters in the profiles. In this way, pattern discovery among expression profiles is translated into pattern discovery in the spatial distribution of data points: the similarity between profiles is defined by the distance between the corresponding data points. Various multivariate analysis methods, such as clustering and dimensionality reduction methods, are used to summarize the data point distribution to help the investigator recognize major trends. As different methods may identify different features of the distribution, it is important to analyze a particular data set with multiple methods.

Curr. Protoc. Mol. Biol. 85:22.5.1‐22.5.15. © 2009 by John Wiley & Sons, Inc.

Keywords: hierarchical clustering; K‐means; dimensionality reduction; multivariate analysis; principal component analysis; self‐organizing maps; Pearson correlation coefficient
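
The idea that each profile is a point in space, and that similarity is a distance between points, can be illustrated with a minimal sketch. The gene names and expression values below are hypothetical toy data, not taken from the unit, and the two distance definitions (Euclidean distance and one minus the Pearson correlation coefficient) are common choices assumed here for illustration rather than the authors' prescribed procedure.

```python
import math

# Hypothetical toy data: each gene's profile is one point in a
# 4-dimensional space (one dimension per sample).
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],  # same shape as gene_a, larger amplitude
    "gene_c": [1.5, 1.5, 3.5, 3.5],  # close to gene_a in absolute level
    "gene_d": [4.0, 3.0, 2.0, 1.0],  # mirror image of gene_a
}


def euclidean_distance(p, q):
    """Straight-line distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pearson_distance(p, q):
    """One minus the Pearson correlation coefficient (0 = same shape)."""
    mean_p = sum(p) / len(p)
    mean_q = sum(q) / len(q)
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    norm_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    norm_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    return 1.0 - cov / (norm_p * norm_q)


def nearest(name, distance):
    """Name of the profile closest to `name` under the given distance."""
    others = [n for n in profiles if n != name]
    return min(others, key=lambda n: distance(profiles[name], profiles[n]))


print(nearest("gene_a", euclidean_distance))
print(nearest("gene_a", pearson_distance))
```

With this toy data, the nearest neighbor of gene_a is gene_c under Euclidean distance but gene_b under the Pearson-based distance, because the former is sensitive to absolute expression level and the latter only to the shape of the profile. This is one small instance of why different methods can highlight different features of the same distribution.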

Table of Contents

  • Introduction
  • General Concepts
  • Multivariate Analysis Methods
  • Acknowledgment
  • Literature Cited
  • Figures
