Identification of Mutations in Zebrafish Using Next‐Generation Sequencing

Katrin Henke¹, Margot E. Bowen¹, Matthew P. Harris¹

¹ Department of Genetics, Harvard Medical School, and Department of Orthopedics, Boston Children's Hospital, Boston, Massachusetts
Publication Name:  Current Protocols in Molecular Biology
Unit Number:  Unit 7.13
DOI:  10.1002/0471142727.mb0713s104
Online Posting Date:  October, 2013
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

Whole‐genome sequencing (WGS) has been used in many invertebrate model organisms as an efficient tool for mapping and identification of mutations affecting particular morphological or physiological processes. However, the application of WGS in highly polymorphic, larger genomes of vertebrates has required new experimental and analytical approaches. As a consequence, a wealth of different analytical tools has been developed. As the generation and analysis of data stemming from WGS can be unwieldy and daunting to researchers not accustomed to many common bioinformatic analyses and Unix‐based computational tools, we focus on how to manage and analyze next‐generation sequencing datasets without an extensive computational infrastructure and in‐depth bioinformatic knowledge. Here we describe methods for the analysis of WGS for use in mapping and identification of mutations in the zebrafish. We stress key elements of the experimental design and the analytical approach that allow the use of this method across different sequencing platforms and in different model organisms with annotated genomes. Curr. Protoc. Mol. Biol. 104:7.13.1‐7.13.33. © 2013 by John Wiley & Sons, Inc.

Keywords: whole‐genome sequencing; WGS; mutation mapping; zebrafish; next‐generation sequencing

     
 

Table of Contents

  • Introduction
  • Strategic Planning of Mapping Experiments
  • Basic Protocol 1: Preparation of the DNA Library for Next‐Generation Sequencing
  • Basic Protocol 2: Sequence Data Alignment and Variant Identification
  • Support Protocol 1: Software and Datasets Used for Data Analysis
  • Basic Protocol 3: Linkage Mapping Based on Homozygosity‐by‐Descent
  • Support Protocol 2: Verification of Linkage
  • Basic Protocol 4: Identification of Candidate Mutations
  • Support Protocol 3: Identifying Candidate Causative Mutations in Regions Covered by Only One Read
  • Basic Protocol 5: Identification of Small Insertions or Deletions within a Linked Interval as Candidate Mutations
  • Commentary
  • Literature Cited
  • Figures
  • Tables
     
 

Materials

Basic Protocol 1: Preparation of the DNA Library for Next‐Generation Sequencing

  Materials
  • F2 generation of a genetic cross, sorted by phenotype into mutants and siblings (Fig. A)
  • Reagents for DNA extraction: e.g., DNeasy Blood & Tissue Kit (Qiagen, cat. no. 69504)
  • Optional: Kit for library preparation for next‐generation sequencing, e.g., TruSeq DNA Sample Preparation kit (Illumina, cat. no. CES FC‐121‐2001)
  • Spectrophotometer, e.g., NanoDrop (see APPENDICES & )
  • Additional reagents and equipment for DNA extraction (unit 2.1), quantitation of nucleic acids (APPENDICES & ), and library preparation for Illumina sequencing (Son and Taylor, 2011)

Literature Cited

  Afgan, E., Chapman, B., Jadan, M., Franke, V., and Taylor, J. 2012. Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy. Curr. Protoc. Bioinform. 38:11.9.1‐11.9.20.
  Arnold, C.N., Xia, Y., Lin, P., Ross, C., Schwander, M., Smart, N.G., Müller, U., and Beutler, B. 2011. Rapid identification of a disease allele in mouse through whole genome sequencing and bulk segregation analysis. Genetics 187:633‐641.
  Austin, R.S., Vidaurre, D., Stamatiou, G., Breit, R., Provart, N.J., Bonetta, D., Zhang, J., Fung, P., Gong, Y., Wang, P.W., McCourt, P., and Guttman, D.S. 2011. Next‐generation mapping of Arabidopsis genes. Plant J. 67:715‐725.
  Blankenberg, D., Von Kuster, G., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. 2010. Galaxy: A web‐based genome analysis tool for experimentalists. Curr. Protoc. Mol. Biol. 89:19.10.1‐19.10.21.
  Bowen, M.E., Henke, K., Siegfried, K.R., Warman, M.L., and Harris, M.P. 2012. Efficient mapping and cloning of mutations in zebrafish by low‐coverage whole‐genome sequencing. Genetics 190:1017‐1024.
  Bradley, K.M., Elmore, J.B., Breyer, J.P., Yaspan, B.L., Jessen, J.R., Knapik, E.W., and Smith, J.R. 2007. A major zebrafish polymorphism resource for genetic mapping. Genome Biol. 8:R55.
  Coe, T.S., Hamilton, P.B., Griffiths, A.M., Hodgson, D.J., Wahab, M.A., and Tyler, C.R. 2009. Genetic variation in strains of zebrafish (Danio rerio) and the implications for ecotoxicology studies. Ecotoxicology 18:144‐150.
  Cuperus, J.T., Montgomery, T.A., Fahlgren, N., Burke, R.T., Townsend, T., Sullivan, C.M., and Carrington, J.C. 2010. Identification of MIR390a precursor processing–defective mutants in Arabidopsis by direct genome sequencing. PNAS 107:466‐471.
  Doitsidou, M., Poole, R.J., Sarin, S., Bigelow, H., and Hobert, O. 2010. C. elegans mutant identification with a one‐step whole‐genome‐sequencing and SNP mapping strategy. PLoS ONE 5:e15435.
  Flicek, P., Amode, M.R., Barrell, D., Beal, K., Brent, S., Carvalho‐Silva, D., Clapham, P., Coates, G., Fairley, S., Fitzgerald, S., Gil, L., Gordon, L., Hendrix, M., Hourlier, T., Johnson, N., Kähäri, A.K., Keefe, D., Keenan, S., Kinsella, R., Komorowska, M., Koscielny, G., Kulesha, E., Larsson, P., Longden, I., McLaren, W., Muffato, M., Overduin, B., Pignatelli, M., Pritchard, B., Riat, H.S., Ritchie, G.R., Ruffier, M., Schuster, M., Sobral, D., Tang, Y.A., Taylor, K., Trevanion, S., Vandrovcova, J., White, S., Wilson, M., Wilder, S.P., Aken, B.L., Birney, E., Cunningham, F., Dunham, I., Durbin, R., Fernández‐Suarez, X.M., Harrow, J., Herrero, J., Hubbard, T.J., Parker, A., Proctor, G., Spudich, G., Vogel, J., Yates, A., Zadissa, A., and Searle, S.M. 2012. Ensembl 2012. Nucleic Acids Res. 40:D84‐D90.
  Geisler, R., Rauch, G.J., Geiger‐Rudolph, S., Albrecht, A., van Bebber, F., Berger, A., Busch‐Nentwich, E., Dahm, R., Dekens, M.P., Dooley, C., Elli, A.F., Gehring, I., Geiger, H., Geisler, M., Glaser, S., Holley, S., Huber, M., Kerr, A., Kirn, A., Knirsch, M., Konantz, M., Küchler, A.M., Maderspacher, F., Neuhauss, S.C., Nicolson, T., Ober, E.A., Praeg, E., Ray, R., Rentzsch, B., Rick, J.M., Rief, E., Schauerte, H.E., Schepp, C.P., Schönberger, U., Schonthaler, H.B., Seiler, C., Sidi, S., Söllner, C., Wehner, A., Weiler, C., and Nüsslein‐Volhard, C. 2007. Large‐scale mapping of mutations affecting zebrafish development. BMC Genomics 8:11.
  Goecks, J., Nekrutenko, A., and Taylor, J. 2010. Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11:R86.
  Guryev, V., Koudijs, M.J., Berezikov, E., Johnson, S.L., Plasterk, R.H.A., van Eeden, F.J.M., and Cuppen, E. 2006. Genetic variation in the zebrafish. Genome Res. 16:491‐497.
  Hill, J.T., Demarest, B.L., Bisgrove, B.W., Gorsi, B., Su, Y‐C., and Yost, H.J. 2013. MMAPPR: Mutation Mapping Analysis Pipeline for Pooled RNA‐seq. Genome Res. 23:687‐697.
  Knapik, E.W., Goodman, A., Ekker, M., Chevrette, M., Delgado, J., Neuhauss, S., Shimoda, N., Driever, W., Fishman, M.C., and Jacob, H.J. 1998. A microsatellite genetic linkage map for zebrafish (Danio rerio). Nat. Genet. 18:338‐343.
  Leshchiner, I., Alexa, K., Kelsey, P., Adzhubei, I., Austin, C., Cooney, J., Anderson, H., King, M.J., Stottmann, R.W., Garnaas, M.K., Ha, S., Drummond, I.A., Paw, B.H., North, T.E., Beier, D.R., Goessling, W., and Sunyaev, S.R. 2012. Mutation mapping and identification by whole genome sequencing. Genome Res. 22:1541‐1548.
  Li, H. and Homer, N. 2010. A survey of sequence alignment algorithms for next‐generation sequencing. Brief. Bioinform. 11:473‐483.
  Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R.; 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map Format and SAMtools. Bioinformatics 25:2078‐2079.
  Liu, S., Yeh, C‐T., Tang, H.M., Nettleton, D., and Schnable, P.S. 2012. Gene mapping via bulked segregant RNA‐Seq (BSR‐Seq). PLoS ONE 7:e36406.
  McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., and DePristo, M.A. 2010. The Genome Analysis Toolkit: A MapReduce framework for analyzing next‐generation DNA sequencing data. Genome Res. 20:1297‐1303.
  Miller, A.C., Obholzer, N.D., Shah, A.N., Megason, S.G., and Moens, C.B. 2013. RNA‐seq based mapping and candidate identification of mutations from forward genetic screens. Genome Res. 23:679‐686.
  Noveroske, J.K., Weber, J.S., and Justice, M.J. 2000. The mutagenic action of N‐ethyl‐N‐nitrosourea in the mouse. Mamm. Genome 11:478‐483.
  Nüsslein‐Volhard, C. and Dahm, R. 2002. Zebrafish: A Practical Approach. 1st ed. Oxford University Press, New York.
  Obholzer, N., Swinburne, I.A., Schwab, E., Nechiporuk, A.V., Nicolson, T., and Megason, S.G. 2012. Rapid positional cloning of zebrafish mutations by linkage and homozygosity mapping using whole‐genome sequencing. Development 139:4280‐4290.
  Robinson, J.T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. 2011. Integrative genomics viewer. Nat. Biotechnol. 29:24‐26.
  Schneeberger, K., Ossowski, S., Lanz, C., Juul, T., Petersen, A.H., Nielsen, K.L., Jørgensen, J., Weigel, D., and Andersen, S.O. 2009. SHOREmap: Simultaneous mapping and mutation identification by deep sequencing. Nat. Methods 6:550‐551.
  Sobreira, N.L.M., Cirulli, E.T., Avramopoulos, D., Wohler, E., Oswald, G.L., Stevens, E.L., Ge, D., Shianna, K.V., Smith, J.P., Maia, J.M., Gumbs, C.E., Pevsner, J., Thomas, G., Valle, D., Hoover‐Fong, J.E., and Goldstein, D.B. 2010. Whole‐genome sequencing of a single proband together with linkage analysis identifies a Mendelian disease gene. PLoS Genet. 6:e1000991.
  Son, M.S. and Taylor, R.K. 2011. Preparing DNA libraries for multiplexed paired‐end deep sequencing for Illumina GA sequencers. Curr. Protoc. Microbiol. 20:1E.4.1‐1E.4.13.
  Stickney, H.L., Schmutz, J., Woods, I.G., Holtzer, C.C., Dickson, M.C., Kelly, P.D., Myers, R.M., and Talbot, W.S. 2002. Rapid mapping of zebrafish mutations with SNPs and oligonucleotide microarrays. Genome Res. 12:1929‐1934.
  Uchida, N., Sakamoto, T., Kurata, T., and Tasaka, M. 2011. Identification of EMS‐induced causal mutations in a non‐reference Arabidopsis thaliana accession by whole genome sequencing. Plant Cell Physiol. 52:716‐722.
  Voz, M.L., Coppieters, W., Manfroid, I., Baudhuin, A., Von Berg, V., Charlier, C., Meyer, D., Driever, W., Martial, J.A., and Peers, B. 2012. Fast homozygosity mapping and identification of a Zebrafish ENU‐induced mutation by whole‐genome sequencing. PLoS ONE 7:e34671.
  Wang, K., Li, M., and Hakonarson, H. 2010. ANNOVAR: Functional annotation of genetic variants from high‐throughput sequencing data. Nucleic Acids Res. 38:e164.
  Zuryn, S., Le Gras, S., Jamet, K., and Jarriault, S. 2010. A strategy for direct mapping and identification of mutations by whole‐genome sequencing. Genetics 186:427‐430.
Key Reference
  Bowen et al., 2012. See above.
  The protocol described here is based on the technique developed by the authors of this paper. More detailed information about the limitations of the technique, for example minimal number of reads needed for mapping, as well as how many potential candidate mutations can be expected to be identified, can be found in this paper.
Internet Resource
  http://seqanswers.com/wiki/Software/list
  An extensive list of algorithms used in next‐generation sequence analysis software, along with other useful Internet links and resources, can be found at the URL above.