Identifying Functional Annotation for Noncoding Genomic Sequences

Douglas P. Mortlock1, Steven Pregizer1

1 Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tennessee
Publication Name:  Current Protocols in Human Genetics
Unit Number:  Unit 1.10
DOI:  10.1002/0471142905.hg0110s72
Online Posting Date:  January, 2012
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The recent success of genome‐wide association studies has generated a trove of biologically significant variants implicated in human disease. However, many, if not most, of these variants fall in noncoding regions that have traditionally lacked much functional annotation. New data sets and tools allow a more detailed assessment of the potential importance of noncoding genetic variants. This unit provides an overview of the types of regulatory annotation currently available and of approaches to analyzing these data, with emphasis on use of the UCSC Genome Browser. Curr. Protoc. Hum. Genet. 72:1.10.1‐1.10.10 © 2012 by John Wiley & Sons, Inc.

Keywords: cis‐regulatory element; epigenomics; ENCODE; conservation; ChIP‐seq


Table of Contents

  • Introduction
  • Key Concepts
  • Basic Protocol 1: Identifying Candidate Noncoding Regulatory Elements Using the UCSC Genome Browser and the ENCODE Integrated Regulation Track
  • Literature Cited
  • Figures
  • Tables


Basic Protocol 1: Identifying Candidate Noncoding Regulatory Elements Using the UCSC Genome Browser and the ENCODE Integrated Regulation Track

  • Computer with high‐speed internet connection
  • Web browser
  • Genomic coordinates for region of interest
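
The last item in the list above, genomic coordinates for the region of interest, is usually written as a position string of the form chrom:start-end. As a rough illustration only (not part of the published protocol), the Python sketch below checks such a string and builds a link that opens the region in the UCSC Genome Browser. The function names, the example coordinates, and the choice of the hg19 assembly are assumptions made for this sketch; substitute the assembly that matches your own coordinates.

```python
import re
from urllib.parse import urlencode

# Pattern for a UCSC-style position string such as "chr1:1,000,000-1,050,000".
# Commas inside the numbers are allowed and removed during parsing.
_POSITION = re.compile(r"^(chr[0-9A-Za-z_]+):([\d,]+)-([\d,]+)$")


def parse_position(text):
    """Split 'chrom:start-end' into (chrom, start, end) with integer coordinates."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinate range: {text!r}")
    return chrom, start, end


def browser_url(text, assembly="hg19"):
    """Build a UCSC Genome Browser link for the region.

    The default assembly ("hg19") is an assumption of this sketch.
    """
    chrom, start, end = parse_position(text)
    query = urlencode({"db": assembly, "position": f"{chrom}:{start}-{end}"})
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + query


if __name__ == "__main__":
    # Placeholder region, chosen only to demonstrate the format.
    print(browser_url("chr1:1,000,000-1,050,000"))
```

Validating the string before building the link catches common slips, such as a missing chromosome prefix or a start coordinate larger than the end, before any time is spent in the browser.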


