BEDTools: The Swiss‐Army Tool for Genome Feature Analysis

Aaron R. Quinlan¹

¹ Department of Public Health Sciences, Center for Public Health Genomics, Department of Biochemistry and Molecular Genetics, and Department of Computer Science, University of Virginia, Charlottesville, Virginia

Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 11.12
DOI: 10.1002/0471250953.bi1112s47
Online Posting Date: September 2014

Abstract

Technological advances have enabled the use of DNA sequencing as a flexible tool to characterize genetic variation and to measure diverse cellular phenomena such as gene isoform expression and transcription factor binding. Extracting biological insight from the experiments enabled by these advances demands the analysis of large, multi‐dimensional datasets. This unit describes the use of the BEDTools toolkit for the exploration of high‐throughput genomics datasets. Several protocols are presented for common genomic analyses, demonstrating how simple BEDTools operations may be combined to create bespoke pipelines addressing complex questions. Curr. Protoc. Bioinform. 47:11.12.1‐11.12.34. © 2014 by John Wiley & Sons, Inc.

Keywords: genomics; bioinformatics; genome analysis; genome intervals; genome features

Table of Contents

  • Introduction
  • Basic Protocol 1: Installing and Preparing to Use BEDTools
  • Basic Protocol 2: Finding Intersections Between Genome Interval Files
  • Basic Protocol 3: Measuring Coverage in Whole‐Genome DNA Sequencing Experiments
  • Alternate Protocol 1: Identifying Specific Genomic Regions With High or Low Sequence Coverage
  • Basic Protocol 4: Measuring Coverage in Targeted DNA Sequencing Experiments
  • Alternate Protocol 2: Identifying Specific Targeted Intervals That Lacked Coverage
  • Basic Protocol 5: Measuring Transcription Factor Occupancy at Transcription Start Sites
  • Basic Protocol 6: Comparing Intervals Among Many Datasets
  • Alternate Protocol 3: Comparing Quantitative Measures Among Multiple Bedgraph Files
  • Basic Protocol 7: Statistics for Measuring Dataset Similarity
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures