Using MetaboAnalyst 3.0 for Comprehensive Metabolomics Data Analysis

Jianguo Xia1, David S. Wishart2

1 Department of Microbiology and Immunology, McGill University, Montreal, Quebec; 2 National Research Council, National Institute for Nanotechnology (NINT), Edmonton, Alberta
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 14.10
DOI:  10.1002/cpbi.11
Online Posting Date:  September, 2016
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


MetaboAnalyst is a comprehensive Web application for metabolomic data analysis and interpretation. It handles most of the common metabolomic data types from most kinds of metabolomics platforms (MS and NMR) for most kinds of metabolomics experiments (targeted, untargeted, quantitative). In addition to providing a variety of data processing and normalization procedures, MetaboAnalyst supports a number of data analysis and data visualization tasks using a range of univariate and multivariate methods, such as PCA (principal component analysis), PLS‐DA (partial least squares discriminant analysis), heatmap clustering, and machine learning methods. MetaboAnalyst also offers a variety of tools for metabolomic data interpretation, including MSEA (metabolite set enrichment analysis), MetPA (metabolite pathway analysis), and biomarker selection via ROC (receiver operating characteristic) curve analysis, as well as time‐series and power analysis. This unit provides an overview of the main functional modules and the general workflow of the latest version of MetaboAnalyst (MetaboAnalyst 3.0), followed by eight detailed protocols. © 2016 by John Wiley & Sons, Inc.

Keywords: batch effect correction; biomarker analysis; chemometrics; integrative pathway analysis; metabolomics; metabolic pathway analysis; metabolite set enrichment analysis; power analysis; ROC curve; sample size estimation; Web application

Table of Contents

  • Introduction
  • Basic Protocol 1: Data Uploading, Processing, and Normalization
  • Basic Protocol 2: Identification of Significant Variables
  • Basic Protocol 3: Multivariate Exploratory Data Analysis
  • Basic Protocol 4: Functional Interpretation of Metabolomic Data
  • Basic Protocol 5: Biomarker Analysis Based on Receiver Operating Characteristic (ROC) Curves
  • Basic Protocol 6: Time‐Series and Two‐Factor Data Analysis
  • Basic Protocol 7: Sample Size Estimation and Power Analysis
  • Basic Protocol 8: Integrated Pathway Analysis & Batch Effect Correction
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
