Analysis of Expression Data: An Overview

Anoop Grewal1, Peter Lambert1, Jordan Stockton2

1 NextBio, Cupertino, California, 2 Agilent Technologies, Santa Clara, California
Publication Name:  Current Protocols in Human Genetics
Unit Number:  Unit 11.4
DOI:  10.1002/0471142905.hg1104s54
Online Posting Date:  July, 2007
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


After providing a brief introduction to microarray chips and experimental details, this overview discusses analysis techniques. Data analysis from microarray experiments generally involves two parts: acquiring and normalizing the data, and interpreting it. This unit focuses mostly on the latter, as it is less technology‐specific. Curr. Protoc. Hum. Genet. 54:11.4.1‐11.4.14. © 2007 by John Wiley & Sons, Inc.

Keywords: microarray data analysis; genomic expression; transcriptomics

Table of Contents

  • Experimental Design
  • Raw Data Output
  • Data Normalization
  • Data Analysis
  • Data Standards
  • Informatics and Databases
  • Conclusion
  • Literature Cited
  • Tables

Literature Cited

   Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel‐Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., and Sherlock, G. 2000. Gene Ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25:25‐29.
   Barrett, T., Troup, D.B., Wilhite, S.E., Ledoux, P., Rudnev, D., Evangelista, C., Kim, I.F., Soboleva, A., Tomashevsky, M., and Edgar, R. 2007. NCBI GEO: Mining tens of millions of expression profiles—database and tools update. Nucleic Acids Res. 35:D760‐D765.
   Benjamini, Y. and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. B 57:289‐300.
   Bolstad, B.M. 2006. RMAExpress. URL:
   Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C.A., Causton, H.C., Gaasterland, T., Glenisson, P., Holstege, F.C., Kim, I.F., Markowitz, V., Matese, J.C., Parkinson, H., Robinson, A., Sarkans, U., Schulze‐Kremer, S., Stewart, J., Taylor, R., Vilo, J., and Vingron, M. 2001. Minimum information about a microarray experiment (MIAME)‐toward standards for microarray data. Nat. Genet. 29:365‐371.
   Brenner, S., Johnson, M., Bridgham, J., Golda, G., Lloyd, D.H., Johnson, D., Luo, S., McCurdy, S., Foy, M., Ewan, M., Roth, R., George, D., Eletr, S., Albrecht, G., Vermaas, E., Williams, S.R., Moon, K., Burcham, T., Pallas, M., DuBridge, R.B., Kirchner, J., Fearon, K., Mao, J., and Corcoran, K. 2000. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 18:630‐634.
   Cope, L.M., Irizarry, R.A., Jaffee, H.A., Wu, Z., and Speed, T.P. 2004. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 20:323‐331.
   Dahlquist, K.D., Salomonis, N., Vranizan, K., Lawlor, S.C., and Conklin, B.R. 2002. GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat. Genet. 31:19‐20.
   Dudoit, S., Fridlyand, J., and Speed, T. 2000. Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. Tech. Rep. 576, Dept. of Statistics, University of California, Berkeley.
   Durbin, B.P., Hardin, J.S., Hawkins, D.M., and Rocke, D.M. 2002. A variance‐stabilizing transformation for gene expression microarray data. Bioinformatics 18:S105‐S110.
   Fehlbaum, P., Guihal, C., Bracco, L., and Cochet, O. 2005. A microarray configuration to quantify expression levels and relative abundance of splice variants. Nucleic Acids Res. 33:e47.
   GeneLogic. 2002. Datasets.
   Gentleman, R.C., Carey, V.J., Bates, D.J., Bolstad, B.M., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C., Smyth, G.K., Tierney, L., Yang, Y.H., and Zhang, J. 2004. Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol. 5:R80.
   Hughes, J.D., Estep, P.W., Tavazoie, S., and Church, G.M. 2000. Computational identification of cis‐regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol. 296:1205‐1214.
   Irizarry, R.A., Bolstad, B.M., Collin, F., Cope, L.M., Hobbs, B., and Speed, T.P. 2003. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31:e15.
   Joho, K. and Su, Q.J. 2006. Addressing the challenges of organizing and correlating diverse high throughput data types. NextBio. (March 23, 2007).
   Kanehisa, M., Goto, S., Kawashima, S., and Nakaya, A. 2002. The KEGG databases at GenomeNet. Nucleic Acids Res. 30:42‐46.
   Kerr, M.K. and Churchill, G.A. 2001. Statistical design and the analysis of gene expression microarrays. Genet. Res. 77:123‐128.
   Kerr, M.K., Martin, M., and Churchill, G.A. 2000. Analysis of variance for gene expression microarray data. J. Comput. Biol. 7:819‐837.
   Li, C. and Wong, W. 2001. Model‐based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc. Natl. Acad. Sci. U.S.A. 98:31‐36.
   Lipshutz, R., Fodor, S., Gingeras, T., and Lockhart, D. 1999. High density synthetic oligonucleotide arrays. Nat. Genet. 21:20‐24.
   Pan, Q., Shai, O., Misquitta, C., Zhang, W., Saltzman, A.L., Mohammad, N., Babak, T., Siu, H., Hughes, T.R., Morris, Q.D., Frey, B.J., and Blencowe, B.J. 2004. Revealing global regulatory features of mammalian alternative splicing using a quantitative microarray platform. Mol. Cell. 16:929‐941.
   Parkinson, H., Kapushesky, M., Shojatalab, M., Abeygunawardena, N., Coulson, R., Farne, A., Holloway, E., Kolesnikov, N., Lilja, P., Lukk, M., Mani, R., Rayner, T., Sharma, A., William, E., Sarkans, U., and Brazma, A. 2007. ArrayExpress—a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. 35:D747‐D750.
   Reich, M., Liefeld, T., Gould, J., Lerner, J., Tamayo, P., and Mesirov, J.P. 2006. GenePattern 2.0. Nat. Genet. 38:500‐501.
   Saeed, A.I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati, N., Braisted, J., Klapa, M., Currier, T., Thiagarajan, M., Sturn, A., Snuffin, M., Rezantsev, A., Popov, D., Ryltsov, A., Kostukovich, E., Borisovsky, I., Liu, Z., Vinsavich, A., Trush, V., and Quackenbush, J. 2003. TM4: A free, open‐source system for microarray data management and analysis. Biotechniques 34:374‐378.
   Schena, M., Shalon, D., Davis, R.W., and Brown, P.O. 1995. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467‐470.
   Shalon, D., Smith, S.J., and Brown, P.O. 1996. A DNA microarray system for analyzing complex DNA samples using two‐color fluorescent probe hybridization. Genome Res. 6:639‐645.
   Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. 2003. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13:2498‐2504.
   Aerts, S., Van Loo, P., Thijs, G., Mayer, H., de Martin, R., Moreau, Y., and De Moor, B. 2005. TOUCAN 2: The all‐inclusive open source workbench for regulatory sequence analysis. Nucleic Acids Res. 33:W393‐W396.
   Thijs, G., Marchal, K., Lescot, M., Rombauts, S., DeMoor, B., Rouze, P., and Moreau, Y. 2002. A Gibbs sampling method to detect over‐represented motifs in upstream regions of coexpressed genes. J. Comput. Biol. 9:447‐464.
   Tibshirani, R., Hastie, T., Narasimhan, B., and Chu, G. 2002. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. U.S.A. 99:6567‐6572.
   Tusher, V.G., Tibshirani, R., and Chu, G. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. U.S.A. 98:5116‐5121.
   von Eschenbach, A.C. and Buetow, K. 2006. Cancer informatics vision: caBIG. 2:22‐24.
   Wolfsberg, T.G., Gabrielian, A.E., Campbell, M.J., Cho, R.J., Spouge, J.L., and Landsman, D. 1999. Candidate regulatory sequence elements for cell cycle–dependent transcription in Saccharomyces cerevisiae. Genome Res. 9:775‐792.
   Wu, Z., LeBlanc, R., and Irizarry, R.A. 2004. Stochastic Models Based on Molecular Hybridization Theory for Short Oligonucleotide Microarrays. Technical report, Johns Hopkins University, Dept. of Biostatistics Working Papers.
   Yang, Y.H., Dudoit, S., Luu, P., Lin, D.M., Peng, V., Ngai, J., and Speed, T.P. 2002. Normalization for cDNA microarray data: A robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30:e15.
Internet Resources
  The Gene Expression Omnibus (GEO) is a public database of expression data derived from a number of different expression analysis technologies.
  ArrayExpress is a public repository for gene expression data, focused on providing a rich source of experimental background for each experiment set.
  Web site for Biocarta Pathways—interactive graphic models of molecular and cellular pathways.
  Kyoto Encyclopedia of Genes and Genomes.
  The Cancer Biomedical Informatics Grid (caBIG) launched by the National Cancer Institute (NCI). This initiative aims to accelerate research discoveries and improve patient outcomes by linking researchers, physicians, and patients throughout the cancer community.
  An open source cancer array informatics project, caArray, from the NCI's Center for Bioinformatics (NCICB). CaArray consists of a microarray database, data analysis and visualization tools.
  geWorkbench (genomics Workbench) is the Bioinformatics platform of MAGNet, the National Center for the Multi‐scale Analysis of Genomic and Cellular Networks. geWorkbench is a Java‐based open‐source platform for integrated genomics.
  GenePattern is a powerful analysis workflow tool developed to support multidisciplinary genomic research programs and designed to encourage rapid integration of new techniques.
  webGenome is an application for creating genomics plots, designed to operate as a plotting client for other applications.