Using TESS to Predict Transcription Factor Binding Sites in DNA Sequence
University of Pennsylvania, Philadelphia, Pennsylvania
Abstract
This unit describes how to use the Transcription Element Search System (TESS), a Web site that predicts transcription factor binding sites (TFBS) in DNA sequence using two kinds of site models: strings and positional weight matrices. The binding of transcription factors to DNA is a major component of gene expression control. Transcription factors bind in a sequence-specific manner; they bind some DNA sequences more strongly than others. Identifying a good binding site in the promoter of a gene suggests that the corresponding factor may play a role in regulating that gene. However, the sequences that transcription factors recognize are typically short and tolerate some mismatch, so binding sites for a given factor can typically be found at random every few hundred to a thousand base pairs. TESS has features to help sort through predicted sites and evaluate their significance. Curr. Protoc. Bioinform. 21:2.6.1-2.6.15. © 2008 by John Wiley & Sons, Inc.
Keywords: transcription factor; DNA sequence; genome; promoter; gene regulation
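The abstract's estimate that a factor's sites turn up by chance every few hundred to a thousand base pairs follows from simple probability. The back-of-the-envelope sketch below assumes independent, equiprobable bases and a string model searched on both strands, ignoring palindromes and overlapping hits. Under those assumptions the chance that one position matches a length-L string with at most k mismatches is the sum over m = 0..k of C(L, m)(3/4)^m(1/4)^(L-m), and the mean spacing between chance hits is the reciprocal of twice that probability. The function names are illustrative, not part of TESS.

```python
from math import comb


def match_probability(length, max_mismatches, p_match=0.25):
    """Chance that one random position matches a fixed string on one strand
    with at most `max_mismatches` mismatches, assuming i.i.d. bases where
    each base equals the site base with probability `p_match`."""
    p_mismatch = 1.0 - p_match
    return sum(
        comb(length, m) * p_mismatch**m * p_match**(length - m)
        for m in range(max_mismatches + 1)
    )


def mean_spacing_bp(length, max_mismatches):
    """Average distance (bp) between chance hits when both strands are searched."""
    hits_per_bp = 2 * match_probability(length, max_mismatches)
    return 1.0 / hits_per_bp


if __name__ == "__main__":
    for length, k in [(6, 0), (6, 1), (8, 1), (10, 2)]:
        print(f"{length}-mer, <= {k} mismatch(es): one hit every ~{mean_spacing_bp(length, k):.0f} bp")
```

Under these assumptions an exact 6-mer appears about every 2 kb, allowing one mismatch in a 6-mer brings that down to roughly every 110 bp, and an 8-mer with one mismatch or a 10-mer with two lands near 1.2 to 1.3 kb.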
Figures
Figure 2.6.1 Sample of TESS site search results. The windows show (clockwise from upper left) a tabulation of predicted sites with scores and p-values, annotated sequences, and details of a weight matrix model for a predicted site.
Figure 2.6.2 The title, navigation bar, and disclaimers from the TESS home page. The line of links (Home | Site Searches ) is the primary navigation bar. Clicking one of these links changes the links shown in the secondary navigation bar, which is the lower line (About | TESS ).
Figure 2.6.3 The section of the job submission form for entering the minimal parameters for a TESS search. The circled "i" icons at the right edge of the form link to help for each parameter.
Figure 2.6.4 The section of the form for selecting which databases are included in the search and for filtering which factors are included. The number buttons, 0 to 5, are shortcuts to queries with the corresponding number of search terms.
Figure 2.6.5 The Factor Filters section expanded to select only mammalian factors. This search is case sensitive but does not require a complete match.
Figure 2.6.6 The String Scoring section of the form, showing the default parameters (see text for more detail).
Figure 2.6.7 The Matrix Scoring and Output Control sections of the job submission form. The default is to use log-likelihood scoring and not to perform Poisson significance thresholding.
Figure 2.6.8 The Expert Parameters section, for adjusting matrix smoothing, background models, and the handling of ambiguous query bases.
Figure 2.6.9 The page that appears when a job has been successfully submitted. Note the job number for future reference, and click the URL to retrieve the results.
Figure 2.6.10 The central results page. The top shows the table of results links; the bottom shows the top portion of a table summarizing the search parameters.
Figure 2.6.11 A sample Tabular Results page. Use the top section to navigate through the pages of tabular results; it provides paging buttons and landmark links that depend on the column used for sorting. Use the middle section to control which columns appear in the table: check the columns you want to see, then click Select. The bottom section is the table of predicted binding sites; click a column header to sort the table on that column.
Figure 2.6.12 A sample Annotated Sequence results page. Sites with $L_d$ scores better than the Secondary Log-Likelihood Deficit are indicated with double bars. Blue bars indicate hits in the forward sense; red bars indicate hits in the reverse sense.
Figure 2.6.13 A sample Poisson Significance results page. The table lists, for each model, the p-value of the observed number of hits, calculated from the number of hits (N) and the estimated expected rate of random occurrence (Rate). The threshold actually applied, which takes both $t_a$ and $t_d$ into account, is indicated at the right.
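The String Scoring options (Figure 2.6.6) and the forward- and reverse-sense hits in the Annotated Sequence view (Figure 2.6.12) both rest on matching a site string against each strand. The Python below is a generic sketch of that idea with a mismatch allowance; it does not reproduce TESS's actual string-scoring parameters, and the function names and the example site are hypothetical.

```python
# Translation table for complementing A/C/G/T in either case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_string(sequence, site, max_mismatches=0):
    """Report (position, strand, mismatches) for every window that matches
    `site` on the forward (+) or reverse (-) strand with at most
    `max_mismatches` mismatches. Positions are 0-based offsets of the window
    on the forward strand; any non-ACGT base simply counts as a mismatch."""
    sequence, site = sequence.upper(), site.upper()
    # Matching the reverse complement of the site on the forward strand is
    # equivalent to matching the site itself on the reverse strand.
    targets = (("+", site), ("-", reverse_complement(site)))
    width = len(site)
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        for strand, target in targets:
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((pos, strand, mismatches))
    return hits


if __name__ == "__main__":
    # Hypothetical 10-bp site, present once on each strand of a toy sequence.
    print(scan_string("AAGGGACTTTCCAAGGAAAGTCCCTT", "GGGACTTTCC"))
```

A palindromic site matches both targets at the same position, so a real tool has to decide whether to report it once or twice.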
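Figures 2.6.7, 2.6.8, and 2.6.12 refer to log-likelihood scoring, matrix smoothing, background models, ambiguous-base handling, and the log-likelihood deficit $L_d$. The minimal Python sketch below shows how these pieces commonly fit together. It assumes, rather than reproduces, the TESS formulas: pseudocounts spread in proportion to the background, a background defined by GC content, scores as natural-log odds ($L_a$), and $L_d$ taken as the best achievable $L_a$ for the matrix minus the site's $L_a$, so lower is better. The count matrix and cutoff are hypothetical, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def background_from_gc(gc):
    """Independent-base background model with the given GC fraction."""
    at = (1.0 - gc) / 2.0
    return {"A": at, "C": gc / 2.0, "G": gc / 2.0, "T": at}


def smooth(counts, background, pseudocount=1.0):
    """Turn a count matrix (one dict per position) into probabilities.
    The pseudocount is spread over bases in proportion to the background,
    so no probability is zero and every log-odds term stays finite."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES)
        matrix.append({
            b: (column.get(b, 0) + pseudocount * background[b]) / (total + pseudocount)
            for b in BASES
        })
    return matrix


def log_likelihood(window, matrix, background):
    """La: sum of natural-log odds of each base versus the background."""
    return sum(math.log(col[b] / background[b]) for col, b in zip(matrix, window))


def max_log_likelihood(matrix, background):
    """Best achievable La: the highest-odds base at every position."""
    return sum(max(math.log(col[b] / background[b]) for b in BASES) for col in matrix)


def scan(sequence, matrix, background, max_deficit):
    """Keep forward-strand windows whose deficit Ld = max La - La <= max_deficit."""
    width = len(matrix)
    best = max_log_likelihood(matrix, background)
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width].upper()
        if any(b not in BASES for b in window):
            continue  # skip windows with ambiguous bases such as N (one possible policy)
        la = log_likelihood(window, matrix, background)
        if best - la <= max_deficit:
            hits.append((pos, round(la, 3), round(best - la, 3)))
    return hits


# Hypothetical counts from 10 aligned sites with consensus ACGT.
COUNTS = [{"A": 9, "G": 1}, {"C": 8, "T": 2}, {"G": 10}, {"T": 7, "C": 3}]

if __name__ == "__main__":
    bg = background_from_gc(0.5)
    pwm = smooth(COUNTS, bg)
    print(scan("TTACGTTTACGA", pwm, bg, max_deficit=3.0))
```

Spreading the pseudocount according to the background, instead of adding a flat count, keeps an unobserved base's probability near its background frequency, so its log-odds penalty stays finite and is shaped by that background.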
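Figure 2.6.13 describes p-values computed from the number of hits N and an expected random-occurrence Rate. The sketch below assumes the p-value is the Poisson probability of seeing at least N hits when λ hits are expected by chance, with λ taken as Rate times the number of positions scanned. Whether TESS's Rate is per base pair or already a per-sequence count is an assumption here, and the helper names are hypothetical.

```python
import math


def expected_hits(rate_per_bp, positions_scanned):
    """Hypothetical helper: expected chance hits, assuming Rate is per base pair."""
    return rate_per_bp * positions_scanned


def poisson_upper_tail(n_observed, expected):
    """P(X >= n_observed) for X ~ Poisson(expected), computed as
    1 - P(X <= n_observed - 1). Adequate for the small expected counts of a
    promoter-length query; loses precision for very small p-values."""
    if n_observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, n_observed):
        term *= expected / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    lam = expected_hits(0.002, 1000)  # about 2 hits expected by chance
    print(f"P(N >= 5 | lambda = {lam:g}) = {poisson_upper_tail(5, lam):.3f}")
```

For example, five hits where two are expected gives P(N >= 5) = 1 - 7*exp(-2), about 0.053. For large expected counts or very small p-values, `scipy.stats.poisson.sf(n - 1, mu)` computes the same tail more robustly.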
Internet Resources
http://www.pcbi.upenn.edu/tess
The TESS Web site.
http://www.biobase.de
Web site for the company that now maintains TRANSFAC.