PepArML: A Meta‐Search Peptide Identification Platform for Tandem Mass Spectra

Nathan J. Edwards1

1 Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, D.C.
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 13.23
DOI: 10.1002/0471250953.bi1323s44
Online Posting Date: December 2013
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

The PepArML meta‐search peptide identification platform for tandem mass spectra provides a unified search interface to seven search engines; a robust cluster, grid, and cloud computing scheduler for large‐scale searches; and an unsupervised, model‐free, machine‐learning‐based result combiner, which selects the best peptide identification for each spectrum, estimates false‐discovery rates, and outputs pepXML format identifications. The meta‐search platform supports Mascot; Tandem with native, k‐score and s‐score scoring; OMSSA; MyriMatch; and InsPecT with MS‐GF spectral probability scores—reformatting spectral data and constructing search configurations for each search engine on the fly. The combiner selects the best peptide identification for each spectrum based on search engine results and features that model enzymatic digestion, retention time, precursor isotope clusters, mass accuracy, and proteotypic peptide properties, requiring no prior knowledge of feature utility or weighting. The PepArML meta‐search peptide identification platform often identifies two to three times more spectra than individual search engines at 10% FDR. Curr. Protoc. Bioinform. 44:13.23.1‐13.23.23. © 2013 by John Wiley & Sons, Inc.

Keywords: proteomics; tandem mass spectra; machine learning; cloud computing
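
The combiner described in the Abstract selects one identification per spectrum and estimates false‐discovery rates. Below is a minimal Python sketch of those two steps, assuming the target‐decoy strategy cited in the Literature Cited list (Elias and Gygi, 2007). It is not the PepArML combiner. PepArML scores identifications with an unsupervised machine‐learning model. This sketch instead assumes that each identification (a hypothetical `PSM` record) already carries a single numeric score, with higher being better, plus a flag marking decoy‐database hits. FDR is estimated as decoys divided by targets down the score ranking, one common variant of the target‐decoy estimate, and is then converted to monotone q‐values.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record (not a PepArML type)."""
    spectrum: str    # spectrum identifier, e.g. a scan number
    peptide: str     # identified peptide sequence
    score: float     # assumed: a single score where higher is better
    is_decoy: bool   # True if the peptide came from the decoy database


def best_per_spectrum(psms):
    """Keep only the highest-scoring identification for each spectrum."""
    best = {}
    for psm in psms:
        current = best.get(psm.spectrum)
        if current is None or psm.score > current.score:
            best[psm.spectrum] = psm
    return list(best.values())


def target_decoy_qvalues(psms):
    """Return (psm, q-value) pairs sorted from best to worst score.

    At each position in the ranking, FDR is estimated as
    (#decoys ranked so far) / (#targets ranked so far). The q-value is the
    minimum FDR at that position or any lower-scoring one. Tied scores are
    not grouped in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Sweep from the worst score upward so q-values never decrease down the ranking.
    for i in range(len(fdrs) - 2, -1, -1):
        fdrs[i] = min(fdrs[i], fdrs[i + 1])
    return list(zip(ranked, fdrs))


if __name__ == "__main__":
    psms = [
        PSM("scan1", "PEPTIDEK", 50.0, False),
        PSM("scan1", "PEPTLDEK", 30.0, False),
        PSM("scan2", "KEDITPEP", 40.0, True),
        PSM("scan3", "SAMPLER", 35.0, False),
        PSM("scan4", "ELPMASR", 10.0, False),
    ]
    for psm, q in target_decoy_qvalues(best_per_spectrum(psms)):
        label = "decoy" if psm.is_decoy else "target"
        print(psm.spectrum, psm.peptide, label, round(q, 3))
```

With this toy data, scan1 has a q‐value of 0.0 and the remaining three spectra have 1/3. At a 10% FDR threshold, therefore, only scan1 would be reported. A better combined score pushes decoys further down the ranking, which is how a stronger combiner can accept more spectra at the same FDR.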

Table of Contents

  • Introduction
  • Basic Protocol 1: Upload Tandem Mass Spectra
  • Alternate Protocol 1: Batch Upload of Many, Large, or Vendor‐Format Spectra Datafiles
  • Support Protocol 1: Registration and Login
  • Basic Protocol 2: Configure and Initiate the Search
  • Basic Protocol 3: Monitor and Manage the Search Jobs
  • Alternate Protocol 2: Run Search Jobs in the Cloud
  • Basic Protocol 4: Combine Search Results using PepArML Combiner
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Materials

Basic Protocol 1: Upload Tandem Mass Spectra

  Necessary Resources
  • A modern Web browser, such as Microsoft Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (see Support Protocol 1). To follow the example analysis, download the example spectra datafile 17mix‐test2.mzXML.gz (see Table 13.23.1).
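
Before uploading, it can help to confirm that a spectra datafile such as 17mix‐test2.mzXML.gz actually contains MS/MS scans. The following Python sketch uses only the standard library. It counts mzXML `<scan>` elements by their `msLevel` attribute and reads gzip‐compressed files directly. It assumes the usual mzXML layout, in which each spectrum is a `<scan>` element carrying an `msLevel` attribute, possibly nested inside its MS1 survey scan. The sketch is not part of PepArML, and the function names are hypothetical.

```python
import gzip
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any XML namespace prefix, e.g. '{http://...}scan' -> 'scan'."""
    return tag.rsplit("}", 1)[-1]


def count_scans_by_level(path):
    """Count <scan> elements in an mzXML file, keyed by msLevel.

    Files ending in .gz are decompressed on the fly. Parsing is incremental,
    and each scan is cleared after it is counted to keep memory use modest.
    """
    opener = gzip.open if path.endswith(".gz") else open
    counts = {}
    with opener(path, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if local_name(elem.tag) == "scan":
                level = elem.get("msLevel", "unknown")
                counts[level] = counts.get(level, 0) + 1
                # Free this scan's children (e.g. encoded peaks); enclosing
                # scans keep their own attributes because only this element is cleared.
                elem.clear()
    return counts


if __name__ == "__main__":
    # e.g. python count_scans.py 17mix-test2.mzXML.gz
    for level, n in sorted(count_scans_by_level(sys.argv[1]).items()):
        print(f"msLevel {level}: {n} scans")
```

A file that reports no msLevel 2 scans holds no tandem spectra for the search engines to identify.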

Alternate Protocol 1: Batch Upload of Many, Large, or Vendor‐Format Spectra Datafiles

  Necessary Resources
  • The PepArML batch uploader must be downloaded from the Edwards lab (see Table 13.23.1) and installed. If vendor‐format conversion and peak picking (peak detection or centroiding) with the ProteoWizard tools (Kessner et al., 2008) are required, the uploader must be run on a Windows computer, and instrument vendor software may also need to be installed. Users must register for PepArML (see Support Protocol 1). To follow the example analysis, download the example spectra datafile 17mix‐test2.mzXML.gz (see Table 13.23.1).

Support Protocol 1: Registration and Login

  Necessary Resources
  • A modern Web browser, such as Microsoft Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. A valid e‐mail address is required for registration.

Basic Protocol 2: Configure and Initiate the Search

  Necessary Resources
  • A modern Web browser, such as Microsoft Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (see Support Protocol 1). Spectra must already have been uploaded to the PepArML server (see Basic Protocol 1 or Alternate Protocol 1).

Basic Protocol 3: Monitor and Manage the Search Jobs

  Necessary Resources
  • A modern Web browser, such as Microsoft Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (see Support Protocol 1). Spectra must already have been uploaded to the PepArML server (see Basic Protocol 1 or Alternate Protocol 1), and a peptide identification search must have been configured and submitted (see Basic Protocol 2).

Alternate Protocol 2: Run Search Jobs in the Cloud

  Necessary Resources
  • A modern Web browser, such as Microsoft Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register for PepArML (see Support Protocol 1). Spectra must already have been uploaded to the PepArML server (see Basic Protocol 1 or Alternate Protocol 1), and a peptide identification analysis must have been configured and submitted (see Basic Protocol 2). Users should have verified that the search jobs are being scheduled and are completing successfully (see Basic Protocol 3). Finally, users must have signed up for an EC2‐capable account with Amazon Web Services at http://aws.amazon.com.

Basic Protocol 4: Combine Search Results using PepArML Combiner

  Necessary Resources
  • A modern Web browser, such as Microsoft Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (see Support Protocol 1). Spectra must already have been uploaded to the PepArML server (see Basic Protocol 1 or Alternate Protocol 1), and a peptide identification search must have been configured and submitted (see Basic Protocol 2). Finally, the search jobs must have completed and the corresponding result files must be populated (see Basic Protocol 3 and, optionally, as described in this protocol).
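
The combiner writes its identifications in pepXML format, as stated in the Abstract. For a quick look at such a file outside PepArML, a Python sketch like the following lists the rank‐1 hit for each spectrum. It assumes only the standard pepXML element names: `spectrum_query`, `search_hit` with a `hit_rank` attribute, and `search_score` name/value pairs. The score names that PepArML itself writes are not described here, so the sketch collects all `search_score` values and leaves them as strings. The function name and the example file name are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any XML namespace prefix, e.g. '{http://...}search_hit' -> 'search_hit'."""
    return tag.rsplit("}", 1)[-1]


def top_hits(path):
    """Yield (spectrum, peptide, protein, scores) for each rank-1 search_hit.

    `scores` maps each search_score name to its value, left as a string.
    """
    for _event, elem in ET.iterparse(path, events=("end",)):
        if local_name(elem.tag) != "spectrum_query":
            continue
        spectrum = elem.get("spectrum")
        for hit in elem.iter():
            if local_name(hit.tag) == "search_hit" and hit.get("hit_rank") == "1":
                scores = {
                    child.get("name"): child.get("value")
                    for child in hit
                    if local_name(child.tag) == "search_score"
                }
                yield spectrum, hit.get("peptide"), hit.get("protein"), scores
        elem.clear()  # free the finished spectrum_query subtree


if __name__ == "__main__":
    # e.g. python top_hits.py results.pep.xml  (file name is illustrative)
    for spectrum, peptide, protein, scores in top_hits(sys.argv[1]):
        print(spectrum, peptide, protein, scores)
```

The reader streams one `spectrum_query` at a time, so it is also usable on large result files.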

Literature Cited
  Breiman, L. 2001. Random forests. Mach. Learn. 45:5‐32.
  Craig, R. and Beavis, R.C. 2004. TANDEM: Matching proteins with tandem mass spectra. Bioinformatics 20:1466‐1467.
  Edwards, N., Wu, X., and Tseng, C.‐W. 2009. An unsupervised, model‐free, machine‐learning combiner for peptide identifications from tandem mass spectra. Clin. Proteomics 5(1).
  Elias, J.E. and Gygi, S.P. 2007. Target‐decoy search strategy for increased confidence in large‐scale protein identifications by mass spectrometry. Nat. Methods 4:207‐214.
  Geer, L.Y., Markey, S.P., Kowalak, J.A., Wagner, L., Xu, M., Maynard, D.M., Yang, X., Shi, W., and Bryant, S.H. 2004. Open mass spectrometry search algorithm. J. Proteome Res. 3:958‐964.
  Keller, A., Nesvizhskii, A.I., Kolker, E., and Aebersold, R. 2002. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74:5383‐5392.
  Kessner, D., Chambers, M., Burke, R., Agus, D., and Mallick, P. 2008. ProteoWizard: Open source software for rapid proteomics tools development. Bioinformatics 24:2534‐2536.
  Kim, S., Gupta, N., and Pevzner, P.A. 2008. Spectral probabilities and generating functions of tandem mass spectra: A strike against decoy databases. J. Proteome Res. 7:3354‐3363.
  MacLean, B., Eng, J.K., Beavis, R.C., and McIntosh, M. 2006. General framework for developing and evaluating database scoring algorithms using the TANDEM search engine. Bioinformatics 22:2830‐2832.
  Mallick, P., Schirle, M., Chen, S.S., Flory, M.R., Lee, H., Martin, D., Ranish, J., Raught, B., Schmitt, R., Werner, T., Kuster, B., and Aebersold, R. 2006. Computational prediction of proteotypic peptides for quantitative proteomics. Nat. Biotechnol. 25:125‐131.
  Nesvizhskii, A.I., Keller, A., Kolker, E., and Aebersold, R. 2003. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75:4646‐4658.
  Peng, J., Elias, J.E., Thoreen, C.C., Licklider, L.J., and Gygi, S.P. 2003. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC−MS/MS) for large‐scale protein analysis: The yeast proteome. J. Proteome Res. 2:43‐50.
  Perkins, D.N., Pappin, D.J., Creasy, D.M., and Cottrell, J.S. 1999. Probability‐based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551‐3567.
  Tabb, D.L., Fernando, C.G., and Chambers, M.C. 2007. MyriMatch: Highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome Res. 6:654‐661.
  Tanner, S., Shu, H., Frank, A., Wang, L.C., Zandi, E., Mumby, M., Pevzner, P.A., and Bafna, V. 2005. InsPecT: Identification of post translationally modified peptides from tandem mass spectra. Anal. Chem. 77:4626‐4639.