Using GenePattern for Gene Expression Analysis

Heidi Kuehn, Arthur Liberzon, Michael Reich, Jill P. Mesirov

Broad Institute of MIT and Harvard, Cambridge, Massachusetts
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 7.12
DOI: 10.1002/0471250953.bi0712s22
Online Posting Date: June 2008
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

The abundance of genomic data now available in biomedical research has stimulated the development of sophisticated statistical methods for interpreting the data, and of special visualization tools for displaying the results in a concise and meaningful manner. However, biologists often find these methods and tools difficult to understand and use correctly. GenePattern is a freely available software package that addresses this issue by providing more than 100 analysis and visualization tools for genomic research in a comprehensive user‐friendly environment for users at all levels of computational experience and sophistication. This unit demonstrates how to prepare and analyze microarray data in GenePattern. Curr. Protoc. Bioinform. 22:7.12.1–7.12.39. © 2008 by John Wiley & Sons, Inc.

Keywords: GenePattern; microarray data analysis; workflow; clustering; classification; differential expression analysis; pipelines
 
Table of Contents

  • Introduction
  • Preparing the Dataset
  • Basic Protocol 1: Creating a GCT File
  • Basic Protocol 2: Creating a CLS File
  • Basic Protocol 3: Preprocessing Gene Expression Data
  • Basic Protocol 4: Differential Analysis: Identifying Differentially Expressed Genes
  • Basic Protocol 5: Class Discovery: Clustering Methods
  • Basic Protocol 6: Class Prediction: Classification Methods
  • Basic Protocol 7: Pipelines: Reproducible Analysis Methods
  • Alternate Protocol 1: Using the GenePattern Desktop Client
  • Alternate Protocol 2: Using the GenePattern Programming Environment
  • Support Protocol 1: Setting User Preferences for the GenePattern Web Client
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Literature Cited

   Benjamini, Y. and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57:289‐300.
   Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. 1984. Classification and regression trees. Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, Calif.
   Brunet, J., Tamayo, P., Golub, T.R., and Mesirov, J.P. 2004. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. U.S.A. 101:4164‐4169.
   Cover, T.M. and Hart, P.E. 1967. Nearest neighbor pattern classification. IEEE Trans. Info. Theory 13:21‐27.
   D'haeseleer, P. 2005. How does gene expression clustering work? Nat. Biotechnol. 23:1499‐1501.
   Getz, G., Monti, S., and Reich, M. 2006. Workshop: Analysis Methods for Microarray Data. October 18‐20, 2006. Cambridge, MA.
   Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., and Lander, E.S. 1999. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286:531‐537.
   Gould, J., Getz, G., Monti, S., Reich, M., and Mesirov, J.P. 2006. Comparative gene marker selection suite. Bioinformatics 22:1924‐1925.
   Lu, J., Getz, G., Miska, E.A., Alvarez‐Saavedra, E., Lamb, J., Peck, D., Sweet‐Cordero, A., Ebert, B.L., Mak, R.H., Ferrando, A.A., Downing, J.R., Jacks, T., Horvitz, H.R., and Golub, T.R. 2005. MicroRNA expression profiles classify human cancers. Nature 435:834‐838.
   MacQueen, J.B. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1 (L. Le Cam and J. Neyman, eds.) pp. 281‐297. University of California Press, Berkeley, California.
   Monti, S., Tamayo, P., Mesirov, J.P., and Golub, T. 2003. Consensus clustering: A resampling‐based method for class discovery and visualization of gene expression microarray data. Functional Genomics Special Issue. Machine Learning Journal 52:91‐118.
   Quackenbush, J. 2002. Microarray data normalization and transformation. Nat. Genet. 32:496‐501.
   Slonim, D.K. 2002. From patterns to pathways: Gene expression data analysis comes of age. Nat. Genet. 32:502‐508.
   Slonim, D.K., Tamayo, P., Mesirov, J.P., Golub, T.R., and Lander, E.S. 2000. Class prediction and discovery using gene expression data. In Proceedings of the Fourth Annual International Conference on Computational Molecular Biology (RECOMB). (R. Shamir, S. Miyano, S. Istrail, P. Pevzner, and M. Waterman, eds.) pp. 263‐272. ACM Press, New York.
   Specht, D.F. 1990. Probabilistic neural networks. Neural Netw. 3:109‐118.
   Storey, J.D. and Tibshirani, R. 2003. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U.S.A. 100:9440‐9445.
   Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Dmitrovsky, E., Lander, E.S., and Golub, T.R. 1999. Interpreting gene expression with self‐organizing maps: Methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. U.S.A. 96:2907‐2912.
   Vapnik, V. 1998. Statistical Learning Theory. John Wiley & Sons, New York.
   Westfall, P.H. and Young, S.S. 1993. Resampling‐Based Multiple Testing: Examples and Methods for p‐Value Adjustment (Wiley Series in Probability and Statistics). John Wiley & Sons, New York.
   Wit, E. and McClure, J. 2004. Statistics for Microarrays. John Wiley & Sons, West Sussex, England.
   Zeeberg, B.R., Riss, J., Kane, D.W., Bussey, K.J., Uchio, E., Linehan, W.M., Barrett, J.C., and Weinstein, J.N. 2004. Mistaken identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinformatics 5:80.

Key References

   Reich, M., Liefeld, T., Gould, J., Lerner, J., Tamayo, P., and Mesirov, J.P. 2006. GenePattern 2.0. Nat. Genet. 38:500‐501.
  Overview of GenePattern 2.0, including comparison with other tools.
   Wit and McClure, 2004. See above.
  Describes setting up a microarray experiment and analyzing the results.

Internet Resources

   http://www.genepattern.org
  Download GenePattern software and view GenePattern documentation.
   http://www.genepattern.org/tutorial/gp_concepts.html
  GenePattern concepts guide.
   http://www.genepattern.org/tutorial/gp_web_client.html
  GenePattern Web Client guide.
   http://www.genepattern.org/tutorial/gp_java_client.html
  GenePattern Desktop Client guide.
   http://www.genepattern.org/tutorial/gp_programmer.html
  GenePattern Programmer's guide.
   http://www.genepattern.org/tutorial/gp_fileformats.html
  GenePattern file formats.