Using OrthoMCL to Assign Proteins to OrthoMCL‐DB Groups or to Cluster Proteomes Into New Ortholog Groups

Steve Fischer¹, Brian P. Brunk², Feng Chen³, Xin Gao², Omar S. Harb², John B. Iodice¹, Dhanasekaran Shanmugam², David S. Roos², Christian J. Stoeckert¹

¹Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania; ²Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania; ³Bayer Business and Technology Services, Bayer Healthcare Pharmaceuticals, Wayne, New Jersey
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 6.12
DOI: 10.1002/0471250953.bi0612s35
Online Posting Date: September, 2011
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


OrthoMCL is an algorithm for grouping proteins into ortholog groups based on their sequence similarity. OrthoMCL‐DB is a public database that allows users to browse and view ortholog groups precomputed with the OrthoMCL algorithm. Version 4 of this database contained 116,536 ortholog groups clustered from 1,270,853 proteins obtained from 88 eukaryotic, 16 archaeal, and 34 bacterial genomes. Future versions of OrthoMCL‐DB will include more proteomes as more genomes are sequenced. Here, we describe two ways the OrthoMCL system lets you group your proteins of interest into ortholog clusters. First, the OrthoMCL‐DB Web site has a tool for uploading and grouping a set of protein sequences, typically representing a proteome; this method maps the uploaded proteins to existing groups in OrthoMCL‐DB. Alternatively, if you have proteins from a set of genomes that need to be clustered into new ortholog groups, you can download, install, and run the stand‐alone OrthoMCL software.

Curr. Protoc. Bioinform. 35:6.12.1‐6.12.19. © 2011 by John Wiley & Sons, Inc.
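The keywords name reciprocal best hits as one ingredient of OrthoMCL. As a rough illustration of that idea only, the sketch below finds reciprocal best hits between genomes from BLAST‐like hit tuples. The input format, the function names (best_hits, reciprocal_best_hits), and the rule of ranking hits by lowest e‐value are assumptions made for this example. The full OrthoMCL method (Li et al., 2003) also identifies within‐genome paralogs, applies similarity cutoffs, normalizes scores, and clusters the resulting graph with MCL, none of which is shown here.

```python
"""Toy reciprocal-best-hit (RBH) finder, for illustration only.

Hypothetical input: BLAST-like hits as (query, subject, evalue) tuples plus a
protein -> genome mapping. This is NOT the OrthoMCL implementation.
"""
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, e-value)


def best_hits(hits: List[Hit], genome_of: Dict[str, str]) -> Dict[Tuple[str, str], str]:
    """Map (query, other genome) to the subject with the lowest e-value there."""
    best: Dict[Tuple[str, str], Tuple[float, str]] = {}
    for query, subject, evalue in hits:
        if genome_of[query] == genome_of[subject]:
            continue  # within-genome hits are ignored in this toy version
        key = (query, genome_of[subject])
        if key not in best or evalue < best[key][0]:
            best[key] = (evalue, subject)  # ties keep the first hit seen
    return {key: subject for key, (_, subject) in best.items()}


def reciprocal_best_hits(hits: List[Hit], genome_of: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Return unordered protein pairs that are each other's best hit."""
    best = best_hits(hits, genome_of)
    pairs: Set[Tuple[str, str]] = set()
    for (query, _), subject in best.items():
        # Keep the pair only if the subject's best hit back in the
        # query's genome is the query itself.
        if best.get((subject, genome_of[query])) == query:
            pairs.add((min(query, subject), max(query, subject)))
    return pairs


if __name__ == "__main__":
    # Hypothetical proteins from two genomes, A and B.
    genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
    hits = [
        ("a1", "b1", 1e-50), ("a1", "b2", 1e-10),
        ("b1", "a1", 1e-48), ("b1", "a2", 1e-5),
        ("a2", "b2", 1e-30),
        ("b2", "a1", 1e-40), ("b2", "a2", 1e-20),
    ]
    print(reciprocal_best_hits(hits, genome_of))  # prints {('a1', 'b1')}
```

In this toy data, a2 and b2 are not paired because b2's best hit in genome A is a1. A plain RBH rule simply drops such relationships, which is one reason OrthoMCL adds further steps, including MCL clustering, on top of pairwise best hits.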

Keywords: OrthoMCL; ortholog groups; paralog; proteome; Markov clustering; reciprocal best hits; MCL


Table of Contents

  • Introduction
  • Strategic Planning
  • Basic Protocol 1: Assign a Proteome to OrthoMCL‐DB Groups
  • Basic Protocol 2: Create Ortholog Groups from Your Proteomes Using the OrthoMCL Software
  • Support Protocol 1: Downloading, Installing, and Configuring the OrthoMCL Programs
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables





Literature Cited

   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths‐Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., Studholme, D.J., Yeats, C., and Eddy, S.R. 2004. The Pfam protein families database. Nucleic Acids Res. 32:D138‐D141.
   Bell, K.S., Sebaihia, M., Pritchard, L., Holden, M.T., Hyman, L.J., Holeva, M.C., Thomson, N.R., Bentley, S.D., Churcher, L.J., Mungall, K., Atkin, R., Bason, N., Brooks, K., Chillingworth, T., Clark, K., Doggett, J., Fraser, A., Hance, Z., Hauser, H., Jagels, K., Moule, S., Norbertczak, H., Ormond, D., Price, C., Quail, M.A., Sanders, M., Walker, D., Whitehead, S., Salmond, G.P., Birch, P.R., Parkhill, J., and Toth, I.K. 2004. Genome sequence of the enterobacterial phytopathogen Erwinia carotovora subsp. atroseptica and characterization of virulence factors. Proc. Natl. Acad. Sci. U.S.A. 101:11105‐11110.
   Chen, F., Mackey, A.J., Stoeckert, C.J. Jr., and Roos, D.S. 2006. OrthoMCL‐DB: Querying a comprehensive multi‐species collection of ortholog groups. Nucleic Acids Res. 34:D363‐D368.
   Chen, F., Mackey, A.J., Vermunt, J.K., and Roos, D.S. 2007. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS One 2:e383.
   Enright, A.J., Van Dongen, S., and Ouzounis, C.A. 2002. An efficient algorithm for large‐scale detection of protein families. Nucleic Acids Res. 30:1575‐1584.
   The Gene Ontology Consortium. 2000. Gene ontology: Tool for the unification of biology. Nat. Genet. 25:25‐29.
   Li, L., Stoeckert, C.J. Jr., and Roos, D.S. 2003. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178‐2189.
   Webb, E., and International Union of Biochemistry and Molecular Biology. 1992. Enzyme Nomenclature 1992: Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes. Academic Press, New York.
Key References
   Li et al., 2003. See above.
  The original paper describing the OrthoMCL algorithm.
  Chen et al., 2006. See above.
  A paper describing the OrthoMCL‐DB.
  Chen et al., 2007. See above.
  A paper comparing OrthoMCL to other approaches.
Internet Resources
  The OrthoMCL‐DB site.
  Submit a set of proteins to find Pfam domains.
  Submit a set of proteins for multiple sequence alignment.
  Download software to visualize groups using BioLayout.