Metaproteomics: Extracting and Mining Proteome Information to Characterize Metabolic Activities in Microbial Communities

Paul E. Abraham1, Richard J. Giannone1, Weili Xiong2, Robert L. Hettich1

1 Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
2 Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, Tennessee
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 13.26
DOI: 10.1002/0471250953.bi1326s46
Online Posting Date: June 2014
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


Contemporary microbial ecology studies usually employ one or more “omics” approaches to investigate the structure and function of microbial communities. Among these, metaproteomics aims to characterize the metabolic activities of the microbial membership, providing a direct link between the genetic potential and functional metabolism. The successful deployment of metaproteomics research depends on the integration of high‐quality experimental and bioinformatic techniques for uncovering the metabolic activities of a microbial community in a way that is complementary to other “meta‐omic” approaches. The essential, quality‐defining informatics steps in metaproteomics investigations are: (1) construction of the metagenome, (2) functional annotation of predicted protein‐coding genes, (3) protein database searching, (4) protein inference, and (5) extraction of metabolic information. In this article, we provide an overview of current bioinformatic approaches and software implementations in metaproteome studies in order to highlight the key considerations needed for successful implementation of this powerful community‐biology tool. Curr. Protoc. Bioinform. 46:13.26.1‐13.26.14. © 2014 by John Wiley & Sons, Inc.

Keywords: metaproteomics; proteomics; mass spectrometry; shotgun proteomics; protein database search; protein inference; metagenomics

Table of Contents

  • Introduction
  • Metagenomic Database Construction and Subsequent Effect on Metaproteome
  • Metaproteomics Workflow
  • Concluding Remarks
  • Acknowledgments
  • Disclaimer
  • Figures
