Loading and Preparing Data for Analysis in Spotfire

Deepak Kaushal1, Clayton W. Naeve1

1 Hartwell Center for Bioinformatics and Biotechnology, St. Jude Children's Research Hospital, Memphis, Tennessee
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 7.8
DOI:  10.1002/0471250953.bi0708s6
Online Posting Date:  September, 2004
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


This unit focuses strictly on data preparation within Spotfire. Microarray data exist in a variety of formats, which often depend on the particular array technology and detection instruments used. The first protocols in this unit describe loading Affymetrix and GenePix data into Spotfire. Once loaded, the data must be filtered and preprocessed prior to analysis. The data transformation and normalization techniques presented here are then critical to performing microarray data mining correctly. These steps extract or enhance meaningful data characteristics and prepare the data for analysis methods, such as statistical tests of significance and clustering methods, most of which require the data to be normally distributed. The unit outlines several methods for normalizing the data within an experiment and between multiple experiments.
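As a rough illustration of why log transformation and normalization matter, the following Python sketch log2-transforms two sets of intensities and centers each on its own median. It is not Spotfire's implementation and does not reproduce the unit's protocols. The function names, the `floor` parameter, and the intensity values are hypothetical, and median centering is only one of several possible normalization choices.

```python
import math
from statistics import median


def log2_transform(values, floor=1.0):
    """Return log2 of each intensity, clamping values below `floor` first.

    The floor is an assumption made here to avoid log2 of zero or
    negative background-subtracted values.
    """
    return [math.log2(max(v, floor)) for v in values]


def median_center(log_values):
    """Subtract the array's median so its log-intensities center on zero."""
    m = median(log_values)
    return [v - m for v in log_values]


# Hypothetical raw intensities for two arrays (illustrative, not real data).
# array_b has the same expression profile as array_a but is half as bright,
# as might happen with a difference in labeling or scanner settings.
array_a = [120.0, 480.0, 960.0, 15360.0, 240.0]
array_b = [60.0, 240.0, 480.0, 7680.0, 120.0]

norm_a = median_center(log2_transform(array_a))
norm_b = median_center(log2_transform(array_b))

for a, b in zip(norm_a, norm_b):
    print(f"{a:6.2f} {b:6.2f}")
```

On the log scale, the twofold brightness difference between the arrays becomes a constant offset, so subtracting each array's median makes the two profiles directly comparable.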


Table of Contents

  • Basic Protocol 1: Uploading GenePix Data Into Spotfire
  • Alternate Protocol 1: Uploading Affymetrix Text Data Into Spotfire
  • Support Protocol 1: Filtering and Preprocessing Microarray Data
  • Support Protocol 2: Log Transformation of Microarray Data
  • Basic Protocol 2: Normalization of Microarray Data within an Experiment
  • Basic Protocol 3: Normalization of Microarray Data Between Experiments
  • Basic Protocol 4: Row Summarization
  • Commentary
  • Literature Cited
  • Figures


