Bisulfite Sequence Analyses Using CyVerse Discovery Environment: From Mapping to DMRs

Jawon Song (1), Greg Zynda (1), Samuel Beck (2), Nathan M. Springer (3), Matthew W. Vaughn (1)

(1) Texas Advanced Computing Center, University of Texas at Austin, Austin, Texas; (2) Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas; (3) Microbial and Plant Genomics Institute, Department of Plant Biology, University of Minnesota, Saint Paul, Minnesota
Publication Name: Current Protocols in Plant Biology
DOI: 10.1002/cppb.20034
Online Posting Date: September 2016

Abstract

Epigenetic modification of DNA through methylation is involved in multiple biological processes, including gene suppression. However, the exact mechanism by which DNA methylation acts remains unclear. In mammals, CpG islands (CGIs) have been studied extensively for their involvement in cancer. In plants, methylation occurs not only in the CpG context but also in the CHG and CHH contexts, yet an efficient and easy‐to‐use pipeline for deciphering these phenomena has still to be developed. ZED‐align and BisuKit are user‐friendly apps deployed on CyVerse infrastructure that let users run multiple command‐line packages on their bisulfite sequence files with minimal intervention. © 2016 by John Wiley & Sons, Inc.
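
To make the three sequence contexts concrete, here is a minimal Python sketch that classifies a forward‐strand cytosine by its two downstream bases, where H denotes A, C, or T. It is an illustration only and is not taken from ZED‐align or BisuKit; the function name cytosine_context is hypothetical.

```python
def cytosine_context(seq: str, i: int) -> str:
    """Return 'CpG', 'CHG', 'CHH', or 'NA' for the cytosine at seq[i] (forward strand)."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    nxt = seq[i + 1 : i + 3]            # up to two downstream bases
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CpG"
    if len(nxt) < 2 or any(b not in "ACGT" for b in nxt):
        return "NA"                      # too close to the end, or an ambiguous base
    # H stands for A, C, or T (any base except G)
    return "CHG" if nxt[1] == "G" else "CHH"
```

Cytosines on the reverse strand can be handled the same way after reverse‐complementing the sequence.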

Keywords: bisulfite alignment; DNA methylation; differentially methylated regions; epigenetic modifications; low‐coverage sequencing data analysis; whole genome bisulfite sequencing

     
 

Table of Contents

  • Introduction
  • Basic Protocol 1: Bisulfite Mapping: from Reads to Methylation ratios through Discovery Environment in CyVerse
  • Basic Protocol 2: Identification of Differentially Methylated regions using BisuKit Pipeline using Discovery Environment
  • Alternate Protocol 1: Generating Differentially Methylated regions using outputs of ZED‐align
  • Commentary
  • Literature Cited
  • Figures
     
 

Materials

Basic Protocol 1: Bisulfite Mapping: from Reads to Methylation ratios through Discovery Environment in CyVerse

  Materials
  • Computer with Internet access
  • Input files:
    • Reference genome in FASTA format with the corresponding .fai index, available from the UCSC Genome Browser (http://hgdownload.cse.ucsc.edu), Ensembl (http://useast.ensembl.org/info/data/ftp/index.html), or directly from the CyVerse Data Store, accessible from the Discovery Environment (/iplant/home/shared/iplantcollaborative/genomeservices/builds/1.0.0/24_77)
    • Bisulfite reads in FASTQ format (single‐end or paired‐end)
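
The .fai index is a five‐column, tab‐delimited table giving, for each sequence, its name, length, byte offset of the first base, bases per line, and bytes per line; it is normally produced with samtools faidx. As an illustration of what the index contains, the following Python sketch computes the same five fields for a FASTA file held in memory. The function name build_fai is hypothetical, and the sketch assumes every record is wrapped at a uniform line width.

```python
import io

def build_fai(fasta_bytes: bytes) -> list[tuple[str, int, int, int, int]]:
    """Return (name, length, offset, linebases, linewidth) for each FASTA record."""
    records = []
    name = None
    length = offset = linebases = linewidth = 0
    pos = 0                                      # byte position where the current line starts
    for line in io.BytesIO(fasta_bytes):
        if line.startswith(b">"):
            if name is not None:
                records.append((name, length, offset, linebases, linewidth))
            name = line[1:].split()[0].decode()  # header text up to the first whitespace
            length = linebases = linewidth = 0
            offset = pos + len(line)             # first base follows the header line
        elif name is not None and line.strip():
            bases = len(line.rstrip(b"\r\n"))
            if linewidth == 0:                   # first sequence line defines the layout
                linebases = bases
                linewidth = len(line)
            length += bases
        pos += len(line)
    if name is not None:
        records.append((name, length, offset, linebases, linewidth))
    return records
```

Writing each tuple as a tab‐separated line gives a table in the .fai layout; for real genomes, samtools faidx remains the safer choice because it also validates the line wrapping.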

Basic Protocol 2: Identification of Differentially Methylated regions using BisuKit Pipeline using Discovery Environment

  Materials
  • Computer with Internet access
  • Input files:
    • Reference genome in FASTA format with the corresponding .fai index, available from the UCSC Genome Browser (http://hgdownload.cse.ucsc.edu), Ensembl (http://useast.ensembl.org/info/data/ftp/index.html), or directly from the CyVerse Data Store, accessible from the Discovery Environment (/iplant/home/shared/iplantcollaborative/genomeservices/builds/1.0.0/24_77)
    • Context‐specific Bismark methylation extractor output for the top (OT), bottom (OB), complementary top (CTOT), and complementary bottom (CTOB) strands (Fig.  )
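
As a rough illustration of how such files can be summarized, the Python sketch below tallies methylated and total calls per genomic position. It assumes the usual layout of Bismark methylation extractor output: a one‐line version header followed by tab‐delimited rows of read ID, methylation state (+ or -), chromosome, position, and call letter. That layout is an assumption to verify against your own files, and the function name tally_calls is hypothetical; this is not BisuKit's own code.

```python
from collections import defaultdict

def tally_calls(lines):
    """Count methylated and total calls per (chromosome, position)."""
    counts = defaultdict(lambda: [0, 0])       # (chrom, pos) -> [methylated, total]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:                   # skips the version header and blank lines
            continue
        _read_id, state, chrom, pos, _call = fields
        if state not in ("+", "-"):            # ignore rows that do not match the assumed layout
            continue
        site = counts[(chrom, int(pos))]
        site[0] += state == "+"                # "+" marks a methylated call
        site[1] += 1
    return {key: tuple(value) for key, value in counts.items()}
```

The function accepts any iterable of text lines, so an open file handle for one of the OT, OB, CTOT, or CTOB files can be passed directly.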

Alternate Protocol 1: Generating Differentially Methylated regions using outputs of ZED‐align

  Materials
  • Computer with Internet access
  • Input files:
    • Genome file in FASTA format
    • $PREFIX_methratios.txt file generated using the ZED‐align app (see Basic Protocol 1)
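
Before running DMR detection, it can help to check the methratios file directly. The Python sketch below computes a coverage‐weighted methylation level per chromosome. It assumes the file is tab‐delimited with a header row that names the columns, and that the header contains chr, C_count, and CT_count as in BSMAP's methratio.py output; these column names are assumptions to check against the actual header, and the function name methylation_by_chrom is hypothetical.

```python
import csv
from collections import defaultdict

def methylation_by_chrom(handle, min_cov=4,
                         chrom_col="chr", c_col="C_count", ct_col="CT_count"):
    """Return {chromosome: sum(C) / sum(C+T)} over sites with at least min_cov reads."""
    totals = defaultdict(lambda: [0.0, 0.0])   # chrom -> [methylated C, total C+T]
    for row in csv.DictReader(handle, delimiter="\t"):
        c, ct = float(row[c_col]), float(row[ct_col])
        if ct < min_cov:                        # drop low-coverage sites
            continue
        totals[row[chrom_col]][0] += c
        totals[row[chrom_col]][1] += ct
    return {chrom: c / ct for chrom, (c, ct) in totals.items() if ct > 0}
```

A call such as methylation_by_chrom(open("sample_methratios.txt")) would use the defaults; the file name here is only a placeholder for the $PREFIX_methratios.txt output.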

Internet Resources
  http://www.cyverse.org
  CyVerse provides services aimed at enabling scientific discovery.
  http://de.iplantcollaborative.org/
  The Discovery Environment, one of the services provided by CyVerse, allows users to manage and analyze their scientific data through a user interface.
  http://www.iplantcollaborative.org/learning‐center/discovery‐environment/tour
  Basic tutorial page for using Discovery Environment.
  http://github.com/zyndagj/ZED‐bsmap‐align
  This repository provides the source code for ZED‐align.
  http://github.com/wonaya/BisuKit
  This repository provides the source code for BisuKit.
  http://github.com/ShengLi/edmr
  This repository provides R scripts for eDMR.
  http://github.com/al2na/methylKit
  This repository provides R scripts for methylKit.
  http://rpy2.bitbucket.org
  This site provides rpy2, a Python interface to R.
  http://hgdownload.soe.ucsc.edu/downloads.html
  The UCSC Genome Database hosts genome sequence files as well as annotations.
  http://useast.ensembl.org/info/data/ftp/index.html
  Ensembl Genome Database hosts genome sequence files as well as many other data types.
  https://en.wikipedia.org/wiki/FASTQ_format#Quality
  This Web site provides information on FASTQ format and quality of data.
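
  As a brief illustration of the quality encoding described there, the Python sketch below decodes a FASTQ quality string into Phred scores and converts a score Q to an error probability using p = 10^(-Q/10). It assumes the Phred+33 (Sanger) offset; older Illumina files may use an offset of 64. The function names are hypothetical.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (offset 33 assumed by default)."""
    return [ord(ch) - offset for ch in quality]

def error_probability(q: int) -> float:
    """Convert a Phred score Q to the base-call error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)
```

  For example, a score of 20 corresponds to an expected error rate of 1 in 100 base calls.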