Overview of Tandem Mass Spectrometry (MS/MS) Database Search Algorithms

Eugene Kapp (1), Frédéric Schütz (2)

(1) Ludwig Institute for Cancer Research, Melbourne, Australia; (2) Swiss Institute of Bioinformatics, Lausanne, Switzerland
Publication Name: Current Protocols in Protein Science
Unit Number: Unit 25.2
DOI: 10.1002/0471140864.ps2502s49
Online Posting Date: August 2007
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


Mass spectrometry–based methods for the identification of proteins are fundamental platform technologies for proteomics. One comprehensive approach is to subject trypsinized peptides to tandem mass spectrometry (MS/MS) to obtain detailed structural information. Different strategies are available for interpreting MS/MS data and hence deducing the amino acid sequence of the peptides. The most common method is to use a search algorithm that identifies peptides by correlating experimental MS/MS data with theoretical MS/MS data (the latter being generated from possible peptides in the protein sequence database). The identified peptides are then collated, and the corresponding protein entries in the sequence database are inferred. This unit focuses on the most widely used tandem MS peptide identification search algorithms (commercial and open source), their availability, ease of use, strengths, speed, and scoring, as well as their relative sensitivity and specificity. Curr. Protoc. Protein Sci. 49:25.2.1‐25.2.19. © 2007 by John Wiley & Sons, Inc.

Keywords: SEQUEST; Mascot; X!Tandem; OMSSA; PLGS; Sorcerer; ProteinPilot; Phenyx; SpectrumMill
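
The database search strategy summarized in the abstract can be illustrated with a minimal Python sketch. It is not the scoring scheme of any engine named in the keywords; it only assumes strict tryptic cleavage (after K or R, not before P, no missed cleavages), singly charged b- and y-ions, and a simple shared-peak count as the score. All function names, tolerances, and the toy sequences are hypothetical and chosen for illustration.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(protein, min_len=5):
    """In silico digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        last = i == len(protein) - 1
        if last or (aa in "KR" and protein[i + 1] != "P"):
            if i + 1 - start >= min_len:
                peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def peptide_mh(peptide):
    """Singly protonated monoisotopic peptide mass, [M+H]+."""
    return sum(RESIDUE_MASS[a] for a in peptide) + WATER + PROTON


def fragment_mz(peptide):
    """Theoretical singly charged b- and y-ion m/z values."""
    ions = []
    for i in range(1, len(peptide)):
        b = sum(RESIDUE_MASS[a] for a in peptide[:i]) + PROTON
        y = sum(RESIDUE_MASS[a] for a in peptide[i:]) + WATER + PROTON
        ions.extend([b, y])
    return ions


def count_matches(observed, theoretical, tol=0.5):
    """Number of theoretical ions with an observed peak within tol (Da)."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def search(observed, precursor_mh, database, precursor_tol=1.0, frag_tol=0.5):
    """Score every candidate peptide whose mass matches the precursor."""
    hits = []
    for name, protein in database.items():
        for pep in tryptic_peptides(protein):
            if abs(peptide_mh(pep) - precursor_mh) <= precursor_tol:
                score = count_matches(observed, fragment_mz(pep), frag_tol)
                hits.append((score, pep, name))
    return sorted(hits, reverse=True)


# Toy example: two made-up proteins containing isobaric peptides.
database = {
    "toyA": "AAAAKSAMPLERGGGGGK",
    "toyB": "GGGKSAMPELRAAAK",
}
truth = "SAMPLER"
hits = search(fragment_mz(truth), peptide_mh(truth), database)
for score, peptide, protein in hits:
    print(score, peptide, protein)
```

In this toy case both candidates pass the precursor mass filter because they have the same composition, and only the fragment ions separate them. A shared-peak count is the simplest possible score; it has no notion of peak intensity or statistical significance.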


Table of Contents

  • Introduction
  • SEQUEST
  • Spectrum Mill
  • X!Tandem
  • Mascot
  • ProteinLynx Global Server
  • Phenyx
  • OMSSA
  • PEAKS (SPIDER)
  • ProteinPilot
  • SEQUEST Sorcerer
  • Acknowledgements
  • Literature Cited
  • Figures
  • Tables


Literature Cited
   Biemann, K., Cone, C., Webster, B.R., and Arsenault, G.P. 1966. Determination of the amino acid sequence in oligopeptides by computer interpretation of their high‐resolution mass spectra. J. Am. Chem. Soc. 88:5598‐5606.
   Cargile, B.J., Bundy, J.L., and Stephenson, J.L., Jr. 2004. Potential for false positive identifications from large databases through tandem mass spectrometry. J. Proteome Res. 3:1082‐1085.
   Carr, S., Aebersold, R., Baldwin, M., Burlingame, A., Clauser, K., and Nesvizhskii, A. 2004. The need for guidelines in publication of peptide and protein identification data: Working Group on Publication Guidelines for Peptide and Protein Identification Data. Mol. Cell Proteomics 3:531‐533.
   Chelius, D., Wu, S.L., and Bondarenko, P.V. 2002. Identification of N‐linked oligosaccharides of rat insulin‐like growth factor binding protein‐4. Growth Horm. IGF Res. 12:169‐177.
   Chiang, D. 2006. Ten things you absolutely need to know about proteomics analysis for mass spectrometry. Sage‐N Research, Inc.
   Colinge, J., Masselot, A., Giron, M., Dessingy, T., and Magnin, J. 2003. OLAV: Towards high‐throughput tandem mass spectrometry data identification. Proteomics 3:1454‐1463.
   Colinge, J., Masselot, A., Cusin, I., Mahe, E., Niknejad, A., Argoud‐Puy, G., Reffas, S., Bederr, N., Gleizes, A., Rey, P.A., and Bougueleret, L. 2004. High‐performance peptide identification by tandem mass spectrometry allows reliable automatic data processing in proteomics. Proteomics 4:1977‐1984.
   Craig, R. and Beavis, R.C. 2003. A method for reducing the time required to match protein sequences with tandem mass spectra. Rapid Commun. Mass Spectrom. 17:2310‐2316.
   Craig, R. and Beavis, R.C. 2004. TANDEM: Matching proteins with tandem mass spectra. Bioinformatics 20:1466‐1467.
   Desiere, F., Deutsch, E.W., King, N.L., Nesvizhskii, A.I., Mallick, P., Eng, J., Chen, S., Eddes, J., Loevenich, S.N., and Aebersold, R. 2006. The PeptideAtlas project. Nucleic Acids Res. 34:D655‐D658.
   Dongre, A.R., Jones, J.L., Somogyi, Á.A., and Wysocki, V.H. 1996. Influence of peptide composition, gas‐phase basicity, and chemical modification on fragmentation efficiency: Evidence for the mobile proton model. J. Am. Chem. Soc. 118:8365‐8374.
   Duncan, D.T., Craig, R., and Link, A.J. 2005. Parallel tandem: A program for parallel processing of tandem mass spectra using PVM or MPI and X!Tandem. J. Proteome Res. 4:1842‐1847.
   Edwards, N. and Lippert, R. 2004. Sequence database compression for peptide identification from tandem mass spectra. In Proc. 4th Workshop on Algorithms in Bioinformatics (WABI), Bergen, Norway. Springer‐Verlag.
   Elias, J.E., Gibbons, F.D., King, O.D., Roth, F.P., and Gygi, S.P. 2004. Intensity‐based protein identification by machine learning from a library of tandem mass spectra. Nat. Biotechnol. 22:214‐219.
   Eng, J.K., McCormack, A.L., and Yates, J.R., 3rd 1994. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5:976‐989.
   Fenyo, D. and Beavis, R.C. 2003. A method for assessing the statistical significance of mass spectrometry‐based protein identifications using general scoring schemes. Anal. Chem. 75:768‐774.
   Geer, L.Y., Markey, S.P., Kowalak, J.A., Wagner, L., Xu, M., Maynard, D.M., Yang, X., Shi, W., and Bryant, S.H. 2004. Open mass spectrometry search algorithm. J. Proteome Res. 3:958‐964.
   Gibson, B.W. and Biemann, K. 1984. Strategy for the mass spectrometric verification and correction of the primary structures of proteins deduced from their DNA sequences. Proc. Natl. Acad. Sci. U.S.A. 81:1956‐1960.
   Griffin, P.R., MacCoss, M.J., Eng, J.K., Blevins, R.A., Aaronson, J.S., and Yates, J.R., 3rd 1995. Direct database searching with MALDI‐PSD spectra of peptides. Rapid Commun. Mass Spectrom. 9:1546‐1551.
   Guo, T., Rudnick, P.A., Wang, W., Lee, C.S., Devoe, D.L., and Balgley, B.M. 2006. Characterization of the human salivary proteome by capillary isoelectric focusing/nanoreversed‐phase liquid chromatography coupled with ESI‐tandem MS. J. Proteome Res. 5:1469‐1478.
   Heller, M., Ye, M., Michel, P.E., Morier, P., Stalder, D., Junger, M.A., Aebersold, R., Reymond, F., and Rossier, J.S. 2005. Added value for tandem mass spectrometry shotgun proteomics data validation through isoelectric focusing of peptides. J. Proteome Res. 4:2273‐2282.
   Kapp, E.A., Schutz, F., Reid, G.E., Eddes, J.S., Moritz, R.L., O'Hair, R.A., Speed, T.P., and Simpson, R.J. 2003. Mining a tandem mass spectrometry database to determine the trends and global factors influencing peptide fragmentation. Anal. Chem. 75:6251‐6264.
   Kapp, E.A., Schutz, F., Connolly, L.M., Chakel, J.A., Meza, J.E., Miller, C.A., Fenyo, D., Eng, J.K., Adkins, J.N., Omenn, G.S., and Simpson, R.J. 2005. An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis. Proteomics 5:3475‐3490.
   Keller, A., Nesvizhskii, A.I., Kolker, E., and Aebersold, R. 2002. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74:5383‐5392.
   Keller, A., Eng, J., Zhang, N., Li, X.J., and Aebersold, R. 2005. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol. Syst. Biol. 1:2005‐0017.
   Kinter, M. and Sherman, N.E. 2000. Collisionally induced dissociation of protonated peptide ions and the interpretation of product ion spectra. In Protein Sequencing and Identification Using Tandem Mass Spectrometry. (M. Kinter and N.E. Sherman, eds.) Wiley‐Interscience, Inc.
   Ma, B., Zhang, K., Hendrie, C., Liang, C., Li, M., Doherty‐Kirby, A., and Lajoie, G. 2003. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 17:2337‐2342.
   Maclean, B., Eng, J.K., Beavis, R.C., and McIntosh, M. 2006. General framework for developing and evaluating database scoring algorithms using the TANDEM search engine. Bioinformatics 22:2830‐2832.
   Mann, M. and Wilm, M. 1994. Error‐tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66:4390‐4399.
   Nesvizhskii, A.I., Keller, A., Kolker, E., and Aebersold, R. 2003. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75:4646‐4658.
   Nielsen, M.L., Savitski, M.M., and Zubarev, R.A. 2005. Improving protein identification using complementary fragmentation techniques in Fourier transform mass spectrometry. Mol. Cell Proteomics 4:835‐845.
   Pappin, D.J., Hojrup, P., and Bleasby, A.J. 1993. Rapid identification of proteins by peptide‐mass fingerprinting. Curr. Biol. 3:327‐332.
   Patel, A.A., Seymour, S.L., Shilov, I.V., Stanick, W.A., Hattan, S.J., Hunter, C.L., Tang, W.H., Parker, K., Schaeffer, D.A., and Purkayastha, B. 2005. Application of a novel tag‐based protein identification algorithm to serum. In 53rd ASMS Conference on Mass Spectrometry, San Antonio, TX.
   Patel, A.A., Tang, W.H., Seymour, S.L., Shilov, I.V., and Schaeffer, D.A. 2006. Investigation of atypical peptides found via thorough database search. In 54th ASMS Conference on Mass Spectrometry, Seattle, WA.
   Perkins, D.N., Pappin, D.J., Creasy, D.M., and Cottrell, J.S. 1999. Probability‐based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551‐3567.
   Rooney, F.R. 2006. Assessing the diversity of the immunopeptidome. In 54th ASMS Conference on Mass Spectrometry, Seattle, WA.
   Sadygov, R.G. and Yates, J.R., 3rd 2003. A hypergeometric probability model for protein identification and validation using tandem mass spectral data and protein sequence databases. Anal. Chem. 75:3792‐3798.
   Seymour, S.L. 2005a. Methodology advances in ease of use in protein ID and expression analysis. In 7th International Symposium on Mass Spectrometry in the Health & Life Sciences San Francisco, CA.
   Seymour, S.L. 2005b. Pro Group: Criteria for Publication of Proteomic Data. MCP Workshop Paris, France.
   Seymour, S.L. 2006. Assembly of peptide MS/MS database search results to determine which proteins to report. In ABRF 2006 Integrating Science, Tools, and Technologies with Systems Biology, Long Beach, CA.
   Seymour, S.L., Loboda, A., Tang, W.H., Nimkar, S., and Schaeffer, D.A. 2004. A new protein identification software analysis tool to group proteins and assemble and view results. In 52nd ASMS Conference on Mass Spectrometry, Nashville, TN.
   Seymour, S.L., Shilov, I.V., Patel, A.A., Loboda, A., Keating, S.P., Tang, W.H., and Schaeffer, D.A. 2006. A next generation search engine that substantially improves peptide identification by using sequence temperatures and feature probabilities. In 54th ASMS Conference on Mass Spectrometry, Seattle, WA.
   Shadforth, I., Xu, W., Crowther, D., and Bessant, C. 2006. GAPP: A fully automated software for the confident identification of human peptides from tandem mass spectra. J. Proteome Res.
   Simpson, R.J. 2003. Proteins and proteomics: A laboratory manual. Cold Spring Harbor Laboratory Press New York.
   Steen, H. and Mann, M. 2004. The ABC's (and XYZ's) of peptide sequencing. Nat. Rev. Mol. Cell Biol. 5:699‐711.
   Tanner, S., Shu, H., Frank, A., Wang, L.C., Zandi, E., Mumby, M., Pevzner, P.A., and Bafna, V. 2005. InsPecT: Identification of posttranslationally modified peptides from tandem mass spectra. Anal. Chem. 77:4626‐4639.
   Yates, J.R., 3rd, Eng, J.K., and McCormack, A.L. 1995a. Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases. Anal. Chem. 67:3202‐3210.
   Yates, J.R., 3rd, Eng, J.K., McCormack, A.L., and Schieltz, D. 1995b. Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal. Chem. 67:1426‐1436.
   Yates, J.R., 3rd, Eng, J.K., Clauser, K.R., and Burlingame, A.L. 1996. Search of sequence databases with uninterpreted high‐energy collision‐induced dissociation spectra of peptides. J. Am. Soc. Mass Spectrom. 7:1089‐1098.