Scoring Large‐Scale Affinity Purification Mass Spectrometry Datasets with MiST

Erik Verschueren¹, John Von Dollen¹, Peter Cimermancic¹, Natali Gulbahce², Andrej Sali¹, Nevan J. Krogan¹

¹ California Institute for Quantitative Biomedical Sciences, San Francisco, California
² Department of Cellular & Molecular Pharmacology, University of California, San Francisco, San Francisco, California
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 8.19
DOI: 10.1002/0471250953.bi0819s49
Online Posting Date: March, 2015
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


High‐throughput Affinity Purification Mass Spectrometry (AP‐MS) experiments can identify a large number of protein interactions, but only a fraction of these interactions are biologically relevant. Here, we describe a comprehensive computational strategy to process raw AP‐MS data, perform quality controls, and prioritize biologically relevant bait‐prey pairs in a set of replicated AP‐MS experiments with Mass spectrometry interaction STatistics (MiST). The MiST score is a linear combination of prey quantity (abundance), abundance invariability across repeated experiments (reproducibility), and prey uniqueness relative to other baits (specificity). We describe how to run the full MiST analysis pipeline in an R environment and discuss a number of configurable options that allow the lay user to convert any large‐scale AP‐MS data into an interpretable, biologically relevant protein‐protein interaction network. © 2015 by John Wiley & Sons, Inc.
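To make the idea of "a linear combination of abundance, reproducibility, and specificity" concrete, here is a minimal Python sketch on toy spectral counts. It is not the MiST R pipeline described in this unit. The exact feature definitions are assumptions chosen for illustration: abundance as the mean count across replicates scaled by the dataset maximum, reproducibility as the normalized Shannon entropy of a prey's counts across replicates, and specificity as the bait's share of a prey's total abundance over all baits. The equal weights and all function names are likewise hypothetical placeholders.

```python
import math
from collections import defaultdict

# Hypothetical input format: spectral counts for each (bait, prey) pair,
# one value per replicate purification of that bait.
Counts = dict[tuple[str, str], list[float]]


def _mean_counts(counts: Counts) -> dict[tuple[str, str], float]:
    return {pair: sum(reps) / len(reps) for pair, reps in counts.items()}


def abundance_scores(counts: Counts) -> dict[tuple[str, str], float]:
    """Mean count across replicates, scaled so the largest pair scores 1.0.
    Assumption: no protein-length normalization is applied here."""
    means = _mean_counts(counts)
    top = max(means.values())
    return {pair: (m / top if top > 0 else 0.0) for pair, m in means.items()}


def reproducibility_scores(counts: Counts) -> dict[tuple[str, str], float]:
    """Normalized Shannon entropy of counts across replicates:
    1.0 if the prey is equally abundant in every replicate,
    0.0 if it was detected in a single replicate only."""
    scores = {}
    for pair, reps in counts.items():
        total, k = sum(reps), len(reps)
        if total == 0 or k < 2:
            scores[pair] = 0.0
            continue
        entropy = -sum((c / total) * math.log(c / total) for c in reps if c > 0)
        scores[pair] = max(0.0, entropy / math.log(k))
    return scores


def specificity_scores(counts: Counts) -> dict[tuple[str, str], float]:
    """Fraction of a prey's summed mean count (over all baits) that
    comes from this particular bait."""
    means = _mean_counts(counts)
    per_prey: dict[str, float] = defaultdict(float)
    for (_bait, prey), m in means.items():
        per_prey[prey] += m
    return {
        (bait, prey): (m / per_prey[prey] if per_prey[prey] > 0 else 0.0)
        for (bait, prey), m in means.items()
    }


def mist_scores(
    counts: Counts,
    w_abundance: float = 1 / 3,
    w_repro: float = 1 / 3,
    w_spec: float = 1 / 3,
) -> dict[tuple[str, str], float]:
    """Weighted linear combination of the three features.
    The default weights are placeholders, not values used by MiST."""
    a = abundance_scores(counts)
    r = reproducibility_scores(counts)
    s = specificity_scores(counts)
    return {
        pair: w_abundance * a[pair] + w_repro * r[pair] + w_spec * s[pair]
        for pair in counts
    }


if __name__ == "__main__":
    toy = {
        ("BaitA", "Prey1"): [10, 12, 11],  # abundant, reproducible, bait-specific
        ("BaitA", "Prey2"): [5, 0, 0],     # seen in one replicate only
        ("BaitB", "Prey2"): [4, 5, 6],
        ("BaitB", "Prey3"): [3, 3, 3],
    }
    for pair, score in sorted(mist_scores(toy).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

In this toy example, a prey detected in only one of three replicates and shared with another bait (BaitA/Prey2) ranks lowest. A prey detected consistently with a single bait (BaitA/Prey1) ranks highest, which is the kind of prioritization the combined score is meant to produce.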

Keywords: affinity purification mass spectrometry; protein interactions; scoring algorithms; interaction networks; proteomics

Table of Contents

  • Introduction
  • Basic Protocol 1: Data Pre‐Processing
  • Basic Protocol 2: Quality Control
  • Basic Protocol 3: Calculating the MiST Score
  • Support Protocol 1: Installation of MiST
  • Guidelines for Understanding Results
  • Commentary
  • Figures
  • Tables


Internet Resources
  The GitHub repository is the main online resource for this unit. The MiST repository is open to the public, but currently only approved collaborators can edit it. In the near future we will also update the webpage at to reflect the protocols described in this manuscript.