Identifying Proteomic LC‐MS/MS Data Sets with Bumbershoot and IDPicker

Jerry D. Holman1, Ze‐Qiang Ma1, David L. Tabb1

1 Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 13.17
DOI:  10.1002/0471250953.bi1317s37
Online Posting Date:  March, 2012
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The identification of peptides and proteins by LC‐MS/MS requires the use of bioinformatics. Tools developed in the Tabb Laboratory contribute significant flexibility and discrimination to this process. The Bumbershoot tools (MyriMatch, DirecTag, TagRecon, and Pepitome) enable the identification of peptides represented by MS/MS scans. All of these tools can work directly from instrument capture files of multiple vendors, such as Thermo RAW format, or from standard XML‐based formats, such as mzML or mzXML. Peptide identifications are written to mzIdentML or pepXML format. Protein assembly is handled by the IDPicker algorithm. Raw identifications are filtered to a confident set by use of the target‐decoy strategy. IDPicker arranges large sets of input files into a hierarchy for reporting, and the software applies a parsimony algorithm to report the smallest possible number of proteins to explain the observed peptides. This protocol details the use of these tools for new users. Curr. Protoc. Bioinform. 37:13.17.1‐13.17.15. © 2012 by John Wiley & Sons, Inc.
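
The two filtering ideas named in the abstract, target-decoy filtering and parsimonious protein assembly, can be illustrated with a minimal Python sketch. This is not IDPicker's implementation. The PSM tuple layout, the function names, and the FDR estimate 2 × decoys / (targets + decoys) are assumptions chosen for illustration, and the greedy set cover below is a simple stand-in for parsimony analysis that is not guaranteed to find the smallest possible protein list.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical PSM record: (score, peptide, is_decoy); a higher score is better.
PSM = Tuple[float, str, bool]


def filter_by_fdr(psms: List[PSM], max_fdr: float = 0.05) -> List[PSM]:
    """Return target PSMs from the longest score-ranked prefix with FDR <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = 0
    for i, (_, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumed estimate: each decoy hit implies one undetected false target hit.
        fdr = 2 * decoys / (targets + decoys)
        if fdr <= max_fdr:
            cutoff = i
    # Decoy matches are only used for estimation; report target matches.
    return [p for p in ranked[:cutoff] if not p[2]]


def greedy_parsimony(protein_to_peptides: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick proteins until every observed peptide is explained."""
    uncovered: Set[str] = set().union(*protein_to_peptides.values())
    chosen: List[str] = []
    while uncovered:
        # Pick the protein explaining the most still-unexplained peptides;
        # sorting the names first makes tie-breaking deterministic.
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


if __name__ == "__main__":
    example_psms = [
        (10.0, "PEPTIDEA", False),
        (9.0, "PEPTIDEB", False),
        (8.0, "PEPTIDEC", False),
        (7.0, "DECOYX", True),
        (6.0, "PEPTIDED", False),
    ]
    print(filter_by_fdr(example_psms, max_fdr=0.05))
    print(greedy_parsimony({"P1": {"a", "b", "c"}, "P2": {"a", "b"}, "P3": {"c", "d"}}))
```

In this toy input, the first decoy match pushes the estimated FDR above 5%, so only the three higher-scoring target matches are kept. Protein P2 is dropped from the protein list because its peptides are already explained by P1.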

Keywords: shotgun proteomics; protein database search; sequence tagging; protein assembly; proteome informatics; peptide‐spectrum matches

Table of Contents

  • Introduction
  • Strategic Planning
  • Basic Protocol 1: MyriMatch: Database Search for Peptide Identification
  • Alternate Protocol 1: TagRecon: Sequence Tagging for Peptide Identification with PTMs
  • Basic Protocol 2: IDPicker: Identification Filtering and Protein Assembly
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures

Literature Cited

   Dasari, S., Chambers, M.C., Slebos, R.J., Zimmerman, L.J., Ham, A.‐J.L., and Tabb, D.L. 2010. TagRecon: High‐throughput mutation identification through sequence tagging. J. Proteome Res. 9:1716‐1726.
   Dasari, S., Chambers, M.C., Codreanu, S.G., Liebler, D.C., Collins, B.C., Pennington, S.R., Gallagher, W.M., and Tabb, D.L. 2011a. Sequence tagging reveals unexpected modifications in toxicoproteomics. Chem. Res. Toxicol. 24:204‐216.
   Dasari, S., Chambers, M.C., Martinez, M.A., Carpenter, K.L., Ham, A.‐J.L., Vega‐Montoto, L.J., and Tabb, D.L. 2011b. Pepitome: Evaluating improved spectral library search for identification complementarity and quality assessment. J. Proteome Res. January 5, 2012. Epub ahead of print.
   Elias, J.E. and Gygi, S.P. 2010. Target‐decoy search strategy for mass spectrometry‐based proteomics. Methods Mol. Biol. 604:55‐71.
   Eng, J.K., McCormack, A.L., and Yates, J.R. III. 1994. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5:976‐989.
   Hunt, D.F., Yates, J.R. III, Shabanowitz, J., Winston, S., and Hauer, C.R. 1986. Protein sequencing by tandem mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 83:6233‐6237.
   Keller, A., Nesvizhskii, A.I., Kolker, E., and Aebersold, R. 2002. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74:5383‐5392.
   Kessner, D., Chambers, M., Burke, R., Agus, D., and Mallick, P. 2008. ProteoWizard: Open source software for rapid proteomics tools development. Bioinformatics 24:2534‐2536.
   Lam, H., Deutsch, E.W., Eddes, J.S., Eng, J.K., King, N., Stein, S.E., and Aebersold, R. 2007. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7:655‐667.
   Ma, Z.‐Q., Dasari, S., Chambers, M.C., Litton, M.D., Sobecki, S.M., Zimmerman, L.J., Halvey, P.J., Schilling, B., Drake, P.M., Gibson, B.W., and Tabb, D.L. 2009. IDPicker 2.0: Improved protein assembly with high discrimination peptide identification filtering. J. Proteome Res. 8:3872‐3881.
   Mann, M. and Wilm, M. 1994. Error‐tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66:4390‐4399.
   Olsen, J.V., Ong, S.‐E., and Mann, M. 2004. Trypsin cleaves exclusively C‐terminal to arginine and lysine residues. Mol. Cell. Proteomics 3:608‐614.
   Paizs, B. and Suhai, S. 2005. Fragmentation pathways of protonated peptides. Mass Spectrom. Rev. 24:508‐548.
   Roepstorff, P. and Fohlman, J. 1984. Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed. Mass Spectrom. 11:601.
   Swaney, D.L., McAlister, G.C., Wirtala, M., Schwartz, J.C., Syka, J.E.P., and Coon, J.J. 2007. Supplemental activation method for high‐efficiency electron‐transfer dissociation of doubly protonated peptide precursors. Anal. Chem. 79:477‐485.
   Tabb, D.L., Saraf, A., and Yates, J.R. 2003. GutenTag: High‐throughput sequence tagging via an empirically derived fragmentation model. Anal. Chem. 75:6415‐6421.
   Tabb, D.L., Friedman, D.B., and Ham, A.‐J.L. 2006. Verification of automated peptide identifications from proteomic tandem mass spectra. Nat. Protoc. 1:2213‐2222.
   Tabb, D.L., Fernando, C.G., and Chambers, M.C. 2007. MyriMatch: Highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome Res. 6:654‐661.
   Tabb, D.L., Ma, Z.‐Q., Martin, D.B., Ham, A.‐J.L., and Chambers, M.C. 2008. DirecTag: Accurate sequence tags from peptide MS/MS through statistical scoring. J. Proteome Res. 7:3838‐3846.
   Tanner, S., Shu, H., Frank, A., Wang, L.‐C., Zandi, E., Mumby, M., Pevzner, P.A., and Bafna, V. 2005. InsPecT: Identification of posttranslationally modified peptides from tandem mass spectra. Anal. Chem. 77:4626‐4639.
   Wysocki, V.H., Tsaprailis, G., Smith, L.L., and Breci, L.A. 2000. Mobile and localized protons: A framework for understanding peptide dissociation. J. Mass Spectrom. 35:1399‐1406.
   Zhang, B., Chambers, M.C., and Tabb, D.L. 2007. Proteomic parsimony through bipartite graph analysis improves accuracy and transparency. J. Proteome Res. 6:3549‐3557.
Internet Resources
  Matrix Science Data File Format page. Many file formats have been created to support peptide identification, and this Web site enumerates and diagrams some of the most common types.
  Tabb Laboratory Web page. The Bumbershoot and IDPicker tools described in this protocol may be acquired from the Tabb Laboratory Team City server, which is accessible from the Software page at this Web site.
  NIST Spectral Libraries. The National Institute of Standards and Technology has amassed spectral libraries for a large variety of samples and instruments; these collections are available from its Web site.