Interpreting de novo Variation in Human Disease Using denovolyzeR

James S. Ware1, Kaitlin E. Samocha2, Jason Homsy3, Mark J. Daly2

1 NIHR Cardiovascular Biomedical Research Unit at Royal Brompton Hospital and Imperial College London, London, 2 Analytical and Translational Genetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, 3 Cardiovascular Research Center, Massachusetts General Hospital, Boston, Massachusetts
Publication Name:  Current Protocols in Human Genetics
Unit Number:  Unit 7.25
DOI:  10.1002/0471142905.hg0725s87
Online Posting Date:  October, 2015
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


Spontaneously arising (de novo) genetic variants are important in human disease, yet every individual carries many such variants, with a median of 1 de novo variant affecting the protein‐coding portion of the genome. A recently described mutational model provides a powerful framework for the robust statistical evaluation of such coding variants, enabling the interpretation of de novo variation in human disease. Here we describe a new open‐source software package, denovolyzeR, that implements this model and provides tools for the analysis of de novo coding sequence variants. © 2015 by John Wiley & Sons, Inc.

Keywords: de novo variant; exome sequencing


Table of Contents

  • Introduction
  • Basic Protocol 1: Assessing the Genome‐Wide Burden of De Novo Variants
  • Basic Protocol 2: Assessing the Number of Genes with Multiple De Novo Variants
  • Basic Protocol 3: Assessing the Frequency of De Novo Variants in Individual Genes
  • Basic Protocol 4: Assessing a Pre‐Specified Gene Set
  • Support Protocol 1: Getting Help
  • Support Protocol 2: Viewing the Mutational Probability Tables
  • Support Protocol 3: Using an Alternative Mutational Probability Table
  • Commentary
  • Literature Cited
  • Figures


Basic Protocol 1: Assessing the Genome‐Wide Burden of De Novo Variants

  • A computer running the R software environment, available for Unix, Windows, and MacOS from http://www.r‐
  • The denovolyzeR package. The latest stable release can be installed directly from the Comprehensive R Archive Network (CRAN) from within R:
    • install.packages("denovolyzeR")
    • Other download and installation options, including for the latest development version, are described at
  • dplyr and reshape packages. These dependencies may be installed automatically when denovolyzeR is installed (depending on your installation route). Otherwise they can be installed by running:
    • install.packages(c("dplyr", "reshape"))
  • A table of de novo variants. The minimum input comprises two columns of data: gene names and variant classes (the functional consequence of each variant).
    • Example data are included in the denovolyzeR package and will be used in this protocol. The dataset, named autismDeNovos, is a data.frame of de novo variants identified in 1078 individuals with autism (Samocha et al., 2014).
    • It is assumed that readers are able to import their own data into the R environment, using the read.table function or equivalent (in R, ?read.table will provide help).
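
The materials above can be checked in a short R session. The following is a minimal sketch rather than part of the protocol: the inline table and the name `my_denovos` are hypothetical stand-ins for a reader's own data, the gene symbols and class labels are illustrative placeholders, and the `denovolyzeByClass` call (with `nsamples = 1078` taken from the cohort size stated above) assumes the interface given in the package documentation. The package-dependent step is skipped if denovolyzeR is not installed.

```r
# Hypothetical two-column input, inlined so the example is self-contained.
# With real data, replace `text = ...` with `file = "<your file>"`.
my_denovos <- read.table(
  text = "gene\tclass
GENE1\tmis
GENE2\tlof
GENE1\tsyn",
  header = TRUE, sep = "\t", stringsAsFactors = FALSE
)

# Confirm the minimum input: one column of gene names, one of variant classes.
stopifnot(all(c("gene", "class") %in% names(my_denovos)))
print(table(my_denovos$class))

# Inspect the bundled example data only if the package is available.
if (requireNamespace("denovolyzeR", quietly = TRUE)) {
  library(denovolyzeR)
  print(head(autismDeNovos))
  # Assumed interface: gene names, variant classes, and number of samples.
  print(denovolyzeByClass(genes = autismDeNovos$gene,
                          classes = autismDeNovos$class,
                          nsamples = 1078))
}
```

Reading from an inline string keeps the sketch runnable anywhere; the column check is the useful part to carry over to real data, since the analysis needs at least gene names and variant classes.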

Basic Protocol 2: Assessing the Number of Genes with Multiple De Novo Variants

  • See Basic Protocol 1

Basic Protocol 3: Assessing the Frequency of De Novo Variants in Individual Genes

  • See Basic Protocol 1

Basic Protocol 4: Assessing a Pre‐Specified Gene Set

  • See Basic Protocol 1

Support Protocol 1: Getting Help

  • An alternative probability table. Examples are available to download from



Literature Cited

  Conrad, D.F., Keebler, J.E., DePristo, M.A., Lindsay, S.J., Zhang, Y., Casals, F., Idaghdour, Y., Hartl, C.L., Torroja, C., Garimella, K.V., Zilversmit, M., Cartwright, R., Rouleau, G.A., Daly, M., Stone, E.A., Hurles, M.E., and Awadalla, P. 2011. Variation in genome‐wide mutation rates within and between human families. Nat. Genet. 43:712‐714. doi: 10.1038/ng.862.
  Cunningham, F., Amode, M.R., Barrell, D., Beal, K., Billis, K., Brent, S., Carvalho‐Silva, D., Clapham, P., Coates, G., Fitzgerald, S., Gil, L., Giron, C.G., Gordon, L., Hourlier, T., Hunt, S.E., Janacek, S.H., Johnson, N., Juettemann, T., Kahari, A.K., Keenan, S., Martin, F.J., Maurel, T., McLaren, W., Murphy, D.N., Nag, R., Overduin, B., Parker, A., Patricio, M., Perry, E., Pignatelli, M., Riat, H.S., Sheppard, D., Taylor, K., Thormann, A., Vullo, A., Wilder, S.P., Zadissa, A., Aken, B.L., Birney, E., Harrow, J., Kinsella, R., Muffato, M., Ruffier, M., Searle, S.M., Spudich, G., Trevanion, S.J., Yates, A., Zerbino, D.R., and Flicek, P. 2015. Ensembl 2015. Nucleic Acids Res. 43:D662‐669. doi: 10.1093/nar/gku1010.
  Darnell, J.C., Van Driesche, S.J., Zhang, C., Hung, K.Y., Mele, A., Fraser, C.E., Stone, E.F., Chen, C., Fak, J.J., Chi, S.W., Licatalosi, D.D., Richter, J.D., and Darnell, R.B. 2011. FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146:247‐261. doi: 10.1016/j.cell.2011.06.013.
  Durinck, S., Spellman, P.T., Birney, E., and Huber, W. 2009. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4:1184‐1191. doi: 10.1038/nprot.2009.97.
  Krawczak, M., Ball, E.V., and Cooper, D.N. 1998. Neighboring‐nucleotide effects on the rates of germ‐line single‐base‐pair substitution in human genes. Am. J. Hum. Genet. 63:474‐488. doi: 10.1086/301965.
  Kryukov, G.V., Pennacchio, L.A., and Sunyaev, S.R. 2007. Most rare missense alleles are deleterious in humans: Implications for complex disease and association studies. Am. J. Hum. Genet. 80:727‐739. doi: 10.1086/513473.
  Lynch, M. 2010. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl. Acad. Sci. U.S.A. 107:961‐968. doi: 10.1073/pnas.0912629107.
  Neale, B.M., Kou, Y., Liu, L., Ma'ayan, A., Samocha, K.E., Sabo, A., Lin, C.F., Stevens, C., Wang, L.S., Makarov, V., Polak, P., Yoon, S., Maguire, J., Crawford, E.L., Campbell, N.G., Geller, E.T., Valladares, O., Schafer, C., Liu, H., Zhao, T., Cai, G., Lihm, J., Dannenfelser, R., Jabado, O., Peralta, Z., Nagaswamy, U., Muzny, D., Reid, J.G., Newsham, I., Wu, Y., Lewis, L., Han, Y., Voight, B.F., Lim, E., Rossin, E., Kirby, A., Flannick, J., Fromer, M., Shakir, K., Fennell, T., Garimella, K., Banks, E., Poplin, R., Gabriel, S., DePristo, M., Wimbish, J.R., Boone, B.E., Levy, S.E., Betancur, C., Sunyaev, S., Boerwinkle, E., Buxbaum, J.D., Cook, E.H., Jr., Devlin, B., Gibbs, R.A., Roeder, K., Schellenberg, G.D., Sutcliffe, J.S., and Daly, M.J. 2012. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485:242‐245. doi: 10.1038/nature11011.
  Ng, S.B., Bigham, A.W., Buckingham, K.J., Hannibal, M.C., McMillin, M.J., Gildersleeve, H.I., Beck, A.E., Tabor, H.K., Cooper, G.M., Mefford, H.C., Lee, C., Turner, E.H., Smith, J.D., Rieder, M.J., Yoshiura, K., Matsumoto, N., Ohta, T., Niikawa, N., Nickerson, D.A., Bamshad, M.J., and Shendure, J. 2010. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat. Genet. 42:790‐793. doi: 10.1038/ng.646.
  R Core Team. 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.
  Samocha, K.E., Robinson, E.B., Sanders, S.J., Stevens, C., Sabo, A., McGrath, L.M., Kosmicki, J.A., Rehnstrom, K., Mallick, S., Kirby, A., Wall, D.P., MacArthur, D.G., Gabriel, S.B., DePristo, M., Purcell, S.M., Palotie, A., Boerwinkle, E., Buxbaum, J.D., Cook, E.H., Jr., Gibbs, R.A., Schellenberg, G.D., Sutcliffe, J.S., Devlin, B., Roeder, K., Neale, B.M., and Daly, M.J. 2014. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46:944‐950. doi: 10.1038/ng.3050.
Key Reference
  Samocha et al., 2014. See above.
  This paper provides the first exposition of the analytical framework implemented in the denovolyzeR package, and demonstrates the application of this framework to the study of autism spectrum disorders and intellectual disability.
Internet Resources
  The home page of the R project. The R software environment can be downloaded from this site.
  The Comprehensive R Archive Network (CRAN) is a network of ftp and Web servers around the world that store identical, up‐to‐date, versions of code and documentation for R. The latest stable version of denovolyzeR should always be available on CRAN, and archived there in perpetuity.
  The home page for the denovolyzeR project, with further supporting material
  GitHub is a Web‐based Git repository hosting service. The latest development version of denovolyzeR is hosted here prior to release on CRAN.