Metabolomic Data Processing, Analysis, and Interpretation Using MetaboAnalyst

Jianguo Xia¹, David S. Wishart²

¹ Department of Computing Science, University of Alberta, Alberta, Canada
² National Research Council, National Institute for Nanotechnology (NINT), Edmonton, Alberta, Canada
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 14.10
DOI: 10.1002/0471250953.bi1410s34
Online Posting Date: June 2011
Full text: PDF or HTML at Wiley Online Library


MetaboAnalyst is a comprehensive, Web‐based tool for processing, analyzing, and interpreting metabolomic data. It handles most common metabolomic data types, including compound concentration lists, spectral bin lists, peak lists, and raw MS spectra. In addition to a variety of data processing and normalization procedures, MetaboAnalyst supports a number of data‐analysis tasks using univariate, multivariate, and machine‐learning methods. It also offers two newly developed approaches for interpreting metabolomic data: Metabolite Set Enrichment Analysis (MSEA) and Metabolic Pathway Analysis (MetPA). MSEA helps detect biologically meaningful metabolite sets that are enriched in human metabolomic studies, while MetPA helps users identify metabolic pathways that have been perturbed. Nearly all results can be explored and visualized interactively. At the end of each session, MetaboAnalyst produces a detailed analysis report with graphical, tabular, and textual output that summarizes each analytical method used and each result generated. Curr. Protoc. Bioinform. 34:14.10.1‐14.10.48. © 2011 by John Wiley & Sons, Inc.

Keywords: Web application; metabolomics; bioinformatics; univariate analysis; multivariate analysis; metabolite set enrichment analysis; metabolic pathway analysis
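The abstract describes MSEA as detecting metabolite sets that are enriched in a study. One standard way to test for such enrichment is often called over-representation analysis. It asks whether a set contains more of the metabolites flagged as significant than chance would predict, using a one-sided hypergeometric test. The following is a minimal, generic Python sketch of that test. It assumes that the significant metabolites come from an upstream univariate test and that the background is every measured compound. The function names, compound IDs, and set names are hypothetical. The sketch is not presented as MetaboAnalyst's implementation.

```python
from math import comb


def hypergeom_upper_tail(k, population, set_size, draws):
    """Return P(X >= k) for X ~ Hypergeometric(population, set_size, draws).

    population: number of metabolites in the background (all measured compounds)
    set_size:   number of background metabolites that belong to the set
    draws:      number of metabolites flagged as significant
    """
    upper = min(set_size, draws)
    # Sum the exact probability mass of every overlap from k up to the maximum possible.
    numerator = sum(
        comb(set_size, i) * comb(population - set_size, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return numerator / comb(population, draws)


def over_representation(significant, background, metabolite_sets):
    """Return {set name: (hits, one-sided p-value)} for each metabolite set (hypothetical helper)."""
    background = set(background)
    significant = set(significant) & background  # ignore compounds outside the background
    results = {}
    for name, members in metabolite_sets.items():
        members = set(members) & background  # only measured compounds can be hits
        hits = len(members & significant)
        p = hypergeom_upper_tail(hits, len(background), len(members), len(significant))
        results[name] = (hits, p)
    return results


# Hypothetical toy data: eight measured compounds and two made-up sets.
background = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"]
significant = ["m1", "m2", "m5"]
metabolite_sets = {"set_A": ["m1", "m2", "m3"], "set_B": ["m6", "m7"]}

for name, (hits, p) in over_representation(significant, background, metabolite_sets).items():
    print(f"{name}: {hits} hits, p = {p:.3f}")
# set_A: 2 hits, p = 0.286
# set_B: 0 hits, p = 1.000
```

The test is exact and uses only the standard library. When many sets are tested, the p-values would normally be adjusted for multiple testing, for example with a false discovery rate procedure, before any set is called enriched.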

Table of Contents

  • Introduction
  • Basic Protocol 1: Data Uploading, Processing, and Normalization
  • Basic Protocol 2: Identification of Significant Variables
  • Basic Protocol 3: Multivariate Exploratory Data Analysis
  • Basic Protocol 4: Functional Interpretation of Metabolomic Data
  • Guidelines for Understanding the Results
  • Commentary
  • Literature Cited
  • Figures