Inference of Episodic Changes in Natural Selection Acting on Protein Coding Sequences via CODEML

Joseph P. Bielawski (1), Jennifer L. Baker (2), Joseph Mingrone (1)

(1) Department of Mathematics & Statistics, Dalhousie University, Halifax, Nova Scotia
(2) Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 6.15
DOI:  10.1002/cpbi.2
Online Posting Date:  June, 2016
This unit provides protocols for using the CODEML program from the PAML package to make inferences about episodic natural selection in protein‐coding sequences. The protocols cover three inference tasks: maximum likelihood estimation of selection intensity, testing the hypothesis of episodic positive selection, and identifying sites with a history of episodic evolution. We also provide protocols for using the rich set of models implemented in CODEML to assess the robustness of results, and for using bootstrapping to assess whether the requirements for reliable statistical inference have been met. An example dataset illustrates how the protocols are applied to real protein‐coding sequences. Once automated, the workflow extends readily to larger‐scale evolutionary surveys. © 2016 by John Wiley & Sons, Inc.

Keywords: codon model; natural selection; episodic evolution; maximum likelihood; dN/dS ratio; experimental design

Table of Contents

  • Introduction
  • Basic Protocol 1: Maximum Likelihood Estimation of Episodic Selection Intensity
  • Basic Protocol 2: Using the Bootstrap to Assess if the Requirements for Inference Have Been Met
  • Basic Protocol 3: Testing the Hypothesis of Episodic Evolution and Making Site‐Specific Inferences
  • Support Protocol 1: Obtain and Install PAML
  • Support Protocol 2: Obtain and Install CODEML_SBA for UNIX/UNIX‐like and OS X Systems
  • Support Protocol 3: Labeling the Foreground Branch of a Newick Tree
  • Support Protocol 4: Assess Robustness of Results to Alternative Models for Codon Frequencies
  • Support Protocol 5: Smoothed Bootstrap Aggregation for Identifying Sites with a History of Positive Selection
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
