Searching for Non‐B DNA‐Forming Motifs Using nBMST (Non‐B DNA Motif Search Tool)

R.Z. Cer (1), K.H. Bruce (1), D.E. Donohue (1), N.A. Temiz (1), U.S. Mudunuri (1), M. Yi (1), N. Volfovsky (1), A. Bacolla (2), B.T. Luke (1), J.R. Collins (1), R.M. Stephens (1)

(1) Advanced Biomedical Computing Center, Information Systems Program, SAIC‐Frederick, Inc., National Cancer Institute‐Frederick, Frederick, Maryland
(2) The Dell Pediatric Research Institute, Division of Pharmacology and Toxicology, The University of Texas at Austin, Austin, Texas
Publication Name:  Current Protocols in Human Genetics
Unit Number:  Unit 18.7
DOI:  10.1002/0471142905.hg1807s73
Online Posting Date:  April, 2012
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

This unit describes basic protocols for using the non‐B DNA Motif Search Tool (nBMST) to search for sequence motifs predicted to form non‐B DNA, that is, alternative DNA conformations that differ from the canonical right‐handed Watson‐Crick double helix, and for using the associated PolyBrowse, a GBrowse‐based genomic browser. The nBMST is a Web‐based resource that allows users to submit one or more DNA sequences to search for inverted repeats (cruciform DNA), mirror repeats (triplex DNA), direct/tandem repeats (slipped/hairpin structures), G4 motifs (tetraplex, G‐quadruplex DNA), alternating purine‐pyrimidine tracts (left‐handed Z‐DNA), and A‐phased repeats (static bending). The nBMST is versatile, simple to use, does not require bioinformatics skills, and can be applied to any type of DNA sequence, including viral and bacterial genomes, up to an aggregate of 20 megabasepairs (Mbp). Curr. Protoc. Hum. Genet. 73:18.7.1‐18.7.22. © 2012 by John Wiley & Sons, Inc.

Keywords: nBMST; non‐B DNA; nucleotide sequence analysis; G‐quadruplex; triplex; cruciform; Z‐DNA; hairpin; slipped DNA; alternative DNA structure; tandem repeats; PolyBrowse


Table of Contents

  • Introduction
  • Basic Protocol 1: Using the nBMST Server
  • Basic Protocol 2: Using the PolyBrowse Viewer
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Materials

Basic Protocol 1: Using the nBMST Server

  Materials
  • Computer with Internet access
  • Up‐to‐date Web browser, such as Firefox (Windows, Mac OS X, and Linux; http://www.mozilla.org/firefox); Safari (Windows, Mac OS X; http://www.apple.com/safari); or Internet Explorer (Windows; http://www.microsoft.com/ie)
  • A text file up to 20 Mb with one or more DNA sequences in FASTA format. Each FASTA record begins with a description line: a greater‐than sign (>) followed immediately, without any spaces, by a description. The DNA sequence follows on one or more new lines. Sequences may contain only the letters A, C, G, T, or N; uppercase and lowercase letters are both accepted, and spaces are allowed. If the file holds more than one DNA sequence, each sequence must be preceded by its own description line. Two short FASTA sequences are shown below as examples:
      >seq1
      TTTATAATTTTATAATTATAAAATTTTATAATTTTATAATTTTATAATTTTATAATTATTTATAAT
      >seq2
      gggtgggttgggtgggg
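
The input requirements above can be checked locally before a file is submitted. The following Python sketch is illustrative only and is not part of nBMST: the function names `parse_fasta` and `check_records` are hypothetical, and the 20,000,000‐base ceiling is an assumption that reads the "aggregate of 20 megabasepairs" stated in the abstract literally.

```python
import re

# Assumption: "an aggregate of 20 megabasepairs" is taken as 20,000,000 bases in total.
MAX_TOTAL_BASES = 20_000_000
# Only A, C, G, T, or N are permitted (checked after folding to uppercase).
ALLOWED = re.compile(r"^[ACGTN]*$")


def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before any '>' description line")
            # Spaces and mixed case are allowed, so normalize both.
            chunks.append(line.replace(" ", "").upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_records(records):
    """Return a list of problems; an empty list means the input looks acceptable."""
    problems = []
    if not records:
        problems.append("no sequences found")
    total = 0
    for desc, seq in records:
        if not desc or desc[0].isspace():
            problems.append(f"description must follow '>' without spaces: {desc!r}")
        if not seq:
            problems.append(f"{desc.strip()!r}: empty sequence")
        elif not ALLOWED.match(seq):
            problems.append(f"{desc.strip()!r}: letters other than A, C, G, T, N")
        total += len(seq)
    if total > MAX_TOTAL_BASES:
        problems.append(f"aggregate length {total} exceeds {MAX_TOTAL_BASES} bases")
    return problems


if __name__ == "__main__":
    example = ">seq2\ngggtgggttgggtgggg\n"
    for problem in check_records(parse_fasta(example)):
        print(problem)
```

An empty list from `check_records` only means that the file satisfies the rules listed here; the server may apply further checks of its own.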

Basic Protocol 2: Using the PolyBrowse Viewer

  Materials
  • Computer with Internet access
  • An up‐to‐date Web browser, such as Firefox (Windows, Mac OS X, and Linux; http://www.mozilla.org/firefox); Safari (Windows, Mac OS X; http://www.apple.com/safari); or Internet Explorer (Windows; http://www.microsoft.com/ie)

Internet Resources
   http://nonb.abcc.ncifcrf.gov
  Non‐B DB, a database resource for integrated annotations and analysis of non‐B DNA‐forming motifs.
   http://pbrowse3.abcc.ncifcrf.gov/cgi-bin/gb2/gbrowse/Human_37/
  PolyBrowse, ABCC genome browser for variations and annotations.
   http://tandem.bu.edu/trf/trf.submit.options.html
  Tandem Repeats Finder.
   http://miracle.igib.res.in/quadfinder/crux.html
  QuadFinder to find cruciform DNA.
   http://quadbase.igib.res.in/
  QuadBase, a database of quadruplex motifs.
   http://tubic.tju.edu.cn/greglist/
  Greglist, a database of G‐quadruplex regulated genes.
   http://bioinformatics.ramapo.edu/GRSDB2/
  GRSDB, a database of G‐Rich sequences.
   http://bioinformatics.ramapo.edu/QGRS/index.php
  Quadruplex forming G‐Rich Sequences (QGRS) Mapper.
   http://tandem.bu.edu/irf/irf.download.html
  Inverted Repeat Finder, a command line version of the IRF algorithm used to investigate inverted repeat structure of the human genome.
   http://ggeda.bii.a-star.edu.sg/~piroonj/TTS_mapping/TTS_mapping.php
  Triplex Target DNA Site (TTS) Mapping.
   http://spi.mdanderson.org/tfo/
  Triplex‐Forming Oligonucleotide Target Sequence Search program.
   http://bioportal.weizmann.ac.il/tracts/tracts.html
  The Tracts program to detect and analyze binary tracts in a DNA sequence.
   http://gac-web.cgrb.oregonstate.edu/zDNA/
  Z‐Hunt tool to find Z‐DNA.