Genome‐Scale Sequencing to Identify Genes Involved in Mendelian Disorders

Thomas C. Markello¹, David R. Adams¹

¹ Undiagnosed Diseases Program, National Institutes of Health, Bethesda, Maryland
Publication Name:  Current Protocols in Human Genetics
Unit Number:  Unit 6.13
DOI:  10.1002/0471142905.hg0613s79
Online Posting Date:  October, 2013
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The analysis of genome‐scale sequence data can be defined as the interrogation of a complete set of genetic instructions in a search for individual loci that produce or contribute to a pathological state. Bioinformatic analysis of sequence data requires sufficient discriminant power to find this needle in a haystack. Current approaches make choices about selectivity and specificity thresholds, and about the quality, quantity, and completeness of the data used in these analyses. Many software tools, both commercial and open‐source, are available for the individual analytic component tasks. Three major types of techniques have been included in most published exome projects to date: frequency/population genetic analysis, inheritance state consistency, and predictions of deleteriousness. The required infrastructure and the use of each technique during analysis of genomic sequence data for clinical and research applications are discussed. Future developments that will alter the strategies for using these tools, and the order in which they are applied, are also discussed. Curr. Protoc. Hum. Genet. 79:6.13.1‐6.13.19. © 2013 by John Wiley & Sons, Inc.

Keywords: exome; Mendelian inheritance; next generation sequencing; bioinformatics; clinical sequencing

Table of Contents

  • Introduction
  • Commentary
  • Literature Cited
  • Figures

Literature Cited

  Abecasis, G.R., Auton, A., Brooks, L.D., DePristo, M.A., Durbin, R.M., Handsaker, R.E., Kang, H.M., Marth, G.T., and McVean, G.A. 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 491:56‐65.
  Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A., Bork, P., Kondrashov, A.S., and Sunyaev, S.R. 2010. A method and server for predicting damaging missense mutations. Nat. Methods 7:248‐249.
  Blankenberg, D., Von Kuster, G., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. 2010. Galaxy: A web‐based genome analysis tool for experimentalists. Curr. Protoc. Mol. Biol. 89:19.10.11‐19.10.21.
  Cao, A., Galanello, R., Furbetta, M., Muroni, P.P., Garbato, L., Rosatelli, C., Scalas, M.T., Addis, M., Ruggeri, R., Maccioni, L., and Melis, M.A. 1978. Thalassaemia types and their incidence in Sardinia. J. Med. Genet. 15:443‐447.
  Chen, B., Gagnon, M., Shahangian, S., Anderson, N.L., Howerton, D.A., and Boone, D.J. 2009. Good Laboratory Practices for Molecular Genetic Testing for Heritable Diseases and Conditions. Division of Laboratory Systems, National Center for Preparedness, Detection, and Control of Infectious Diseases, Coordinating Center for Infectious Diseases, Atlanta, GA.
  Cooper, G.M., Stone, E.A., Asimenos, G., Green, E.D., Batzoglou, S., and Sidow, A. 2005. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15:901‐913.
  Davydov, E.V., Goode, D.L., Sirota, M., Cooper, G.M., Sidow, A., and Batzoglou, S. 2010. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6:e1001025.
  Edwards, A.W.F. 2000. Foundations of mathematical genetics, 2nd ed. Cambridge University Press, Cambridge, U.K.
  Eigen, M. and Winkler, R. 1981. Laws of the Game: How the Principles of Nature Govern Chance, 1st American ed. Knopf. Distributed by Random House, New York.
  Ewing, B. and Green, P. 1998. Base‐calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8:186‐194.
  Ewing, B., Hillier, L., Wendl, M.C., and Green, P. 1998. Base‐calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8:175‐185.
  Flicek, P., Amode, M.R., Barrell, D., Beal, K., Brent, S., Carvalho‐Silva, D., Clapham, P., Coates, G., Fairley, S., Fitzgerald, S., Gil, L., Gordon, L., Hendrix, M., Hourlier, T., Johnson, N., Kahari, A.K., Keefe, D., Keenan, S., Kinsella, R., Komorowska, M., Koscielny, G., Kulesha, E., Larsson, P., Longden, I., McLaren, W., Muffato, M., Overduin, B., Pignatelli, M., Pritchard, B., Riat, H.S., Ritchie, G.R., Ruffier, M., Schuster, M., Sobral, D., Tang, Y.A., Taylor, K., Trevanion, S., Vandrovcova, J., White, S., Wilson, M., Wilder, S.P., Aken, B.L., Birney, E., Cunningham, F., Dunham, I., Durbin, R., Fernandez‐Suarez, X.M., Harrow, J., Herrero, J., Hubbard, T.J., Parker, A., Proctor, G., Spudich, G., Vogel, J., Yates, A., Zadissa, A., and Searle, S.M. 2012. Ensembl 2012. Nucleic Acids Res. 40:D84‐D90.
  Fuentes Fajardo, K.V., Adams, D., NISC Comparative Sequencing Program, Mason, C.E., Sincan, M., Tifft, C., Toro, C., Boerkoel, C.F., Gahl, W., and Markello, T. 2012. Detecting false‐positive signals in exome sequencing. Hum. Mutat. 33:609‐613.
  Grantham, R. 1974. Amino acid difference formula to help explain protein evolution. Science 185:862‐864.
  Green, R.C., Berg, J.S., Grody, W.W., Kalia, S.S., Korf, B.R., Martin, C.L., McGuire, A., Nussbaum, R.L., O'Daniel, J.M., Ormond, K.E., Rehm, H.L., Watson, M.S.W., Williams, M.S., and Biesecker, L.G. 2013. ACMG Recommendations for Reporting of Incidental Findings in Clinical Exome and Genome Sequencing, Bethesda, MD.
  Hillman‐Jackson, J., Clements, D., Blankenberg, D., Taylor, J., Nekrutenko, A., and Galaxy Team. 2012. Using Galaxy to perform large‐scale interactive data analyses. Curr. Protoc. Bioinform. 38:10.5.1‐10.5.47.
  Hsu, F., Kent, W.J., Clawson, H., Kuhn, R.M., Diekhans, M., and Haussler, D. 2006. The UCSC known genes. Bioinformatics 22:1036‐1046.
  Johnston, J.J., Teer, J.K., Cherukuri, P.F., Hansen, N.F., Loftus, S.K., NISC, Chong, K., Mullikin, J.C., and Biesecker, L.G. 2010. Massively parallel sequencing of exons on the X chromosome identifies RBM10 as the gene that causes a syndromic form of cleft palate. Am. J. Hum. Genet. 86:743‐748.
  Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078‐2079.
  McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., and DePristo, M.A. 2010. The genome analysis toolkit: A MapReduce framework for analyzing next‐generation DNA sequencing data. Genome Res. 20:1297‐1303.
  Muller, H.J. 1950. Our load of mutations. Am. J. Hum. Genet. 2:111‐176.
  Ng, P.C. and Henikoff, S. 2006. Predicting the effects of amino acid substitutions on protein function. Annu. Rev. Genomics Hum. Genet. 7:61‐80.
  Ng, S.B., Bigham, A.W., Buckingham, K.J., Hannibal, M.C., McMillin, M.J., Gildersleeve, H.I., Beck, A.E., Tabor, H.K., Cooper, G.M., Mefford, H.C., Lee, C., Turner, E.H., Smith, J.D., Rieder, M.J., Yoshiura, K., Matsumoto, N., Ohta, T., Niikawa, N., Nickerson, D.A., Bamshad, M.J., and Shendure, J. 2010. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat. Genet. 42:790‐793.
  Peters, B.A., Kermani, B.G., Sparks, A.B., Alferov, O., Hong, P., Alexeev, A., Jiang, Y., Dahl, F., Tang, Y.T., Haas, J., Robasky, K., Zaranek, A.W., Lee, J.H., Ball, M.P., Peterson, J.E., Perazich, H., Yeung, G., Liu, J., Chen, L., Kennemer, M.I., Pothuraju, K., Konvicka, K., Tsoupko‐Sitnikov, M., Pant, K.P., Ebert, J.C., Nilsen, G.B., Baccash, J., Halpern, A.L., Church, G.M., and Drmanac, R. 2012. Accurate whole‐genome sequencing and haplotyping from 10 to 20 human cells. Nature 487:190‐195.
  Pruitt, K.D., Harrow, J., Harte, R.A., Wallin, C., Diekhans, M., Maglott, D.R., Searle, S., Farrell, C.M., Loveland, J.E., Ruef, B.J., Hart, E., Suner, M.M., Landrum, M.J., Aken, B., Ayling, S., Baertsch, R., Fernandez‐Banet, J., Cherry, J.L., Curwen, V., Dicuccio, M., Kellis, M., Lee, J., Lin, M.F., Schuster, M., Shkeda, A., Amid, C., Brown, G., Dukhanina, O., Frankish, A., Hart, J., Maidak, B.L., Mudge, J., Murphy, M.R., Murphy, T., Rajan, J., Rajput, B., Riddick, L.D., Snow, C., Steward, C., Webb, D., Weber, J.A., Wilming, L., Wu, W., Birney, E., Haussler, D., Hubbard, T., Ostell, J., Durbin, R., and Lipman, D. 2009. The consensus coding sequence (CCDS) project: Identifying a common protein‐coding gene set for the human and mouse genomes. Genome Res. 19:1316‐1323.
  Pruitt, K.D., Tatusova, T., Brown, G.R., and Maglott, D.R. 2012. NCBI Reference Sequences (RefSeq): Current status, new features and genome annotation policy. Nucleic Acids Res. 40:D130‐D135.
  Rimmer, A., Mathieson, I., Lunter, G., and McVean, G. 2012. Platypus: An Integrated Variant Caller.
  Roach, J.C., Glusman, G., Smit, A.F., Huff, C.D., Hubley, R., Shannon, P.T., Rowen, L., Pant, K.P., Goodman, N., Bamshad, M., Shendure, J., Drmanac, R., Jorde, L.B., Hood, L., and Galas, D.J. 2010. Analysis of genetic inheritance in a family quartet by whole‐genome sequencing. Science 328:636‐639.
  Schwarz, J.M., Rodelsperger, C., Schuelke, M., and Seelow, D. 2010. MutationTaster evaluates disease‐causing potential of sequence alterations. Nat. Methods 7:575‐576.
  Silver, N. 2012. The signal and the noise: Why so many predictions fail–but some don't. Penguin Press, New York.
  Simpson, J.T. and Durbin, R. 2012. Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22:549‐556.
  Smith, C.A.B. 1953. The detection of linkage in human genetics. J. R. Stat. Soc. B 15:153‐192.
  Stenson, P.D., Ball, E.V., Mort, M., Phillips, A.D., Shiel, J.A., Thomas, N.S., Abeysinghe, S., Krawczak, M., and Cooper, D.N. 2003. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 21:577‐581.
  Stenson, P.D., Ball, E.V., Mort, M., Phillips, A.D., Shaw, K., and Cooper, D.N. 2012. The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution. Curr. Protoc. Bioinform. 39:1.13.1‐1.13.20.
  Teer, J.K., Green, E.D., Mullikin, J.C., and Biesecker, L.G. 2011.
  Teer, J.K., Green, E.D., Mullikin, J.C., and Biesecker, L.G. 2012. VarSifter: Visualizing and analyzing exome‐scale sequence variation data on a desktop computer. Bioinformatics 28:599‐600.
  Tennessen, J.A., Bigham, A.W., O'Connor, T.D., Fu, W., Kenny, E.E., Gravel, S., McGee, S., Do, R., Liu, X., Jun, G., Kang, H.M., Jordan, D., Leal, S.M., Gabriel, S., Rieder, M.J., Abecasis, G., Altshuler, D., Nickerson, D.A., Boerwinkle, E., Sunyaev, S., Bustamante, C.D., Bamshad, M.J., and Akey, J.M. 2012. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337:64‐69.
  Wang, K., Li, M., and Hakonarson, H. 2010. ANNOVAR: Functional annotation of genetic variants from high‐throughput sequencing data. Nucleic Acids Res. 38:e164.
  Wei, X., Walia, V., Lin, J.C., Teer, J.K., Prickett, T.D., Gartner, J., Davis, S., Stemke‐Hale, K., Davies, M.A., Gershenwald, J.E., Robinson, W., Robinson, S., Rosenberg, S.A., and Samuels, Y. 2011. Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nat. Genet. 43:442‐446.
  Yandell, M., Huff, C., Hu, H., Singleton, M., Moore, B., Xing, J., Jorde, L.B., and Reese, M.G. 2011. A probabilistic disease‐gene finder for personal genomes. Genome Res. 21:1529‐1542.
  Yang, Z. 1995. A space‐time process model for the evolution of DNA sequences. Genetics 139:993‐1005.