
An Introduction to Recognizing Functional Domains

Gary D. Stormo

Washington University School of Medicine, St. Louis, Missouri

Unit Number: Unit 2.1
DOI: 10.1002/0471250953.bi0201s34
Online Posting Date: June 2011
GO TO THE FULL TEXT:
PDF or HTML at Wiley Online Library

Abstract

This unit provides an overview of issues involved in domain recognition in protein and DNA sequences. It opens with a discussion of the two primary methods of domain representation, namely consensus sequences and alignment matrices (e.g., the log-odds matrix). The unit continues with a brief overview of some of the resources available for identifying functional domains in nucleotide sequences (e.g., transcription factor binding sites). In addition, it reviews databases such as Pfam and InterPro, which are available for protein analysis. Curr. Protoc. Bioinform. 34:2.1.1-2.1.6. © 2011 by John Wiley & Sons, Inc.

Keywords: functional domains; protein domains; transcription factor binding sites; regulatory sites; promoters

     
 

Table of Contents

  • Introduction
  • Domain Representations
  • DNA Analyses
  • Protein Analyses
  • Literature Cited
  • Figures
     
 

Figures

  • Figure 2.1.1
    Binding site representation for CREB (M00178 from TRANSFAC using TESS, unit 2.6). (A) Alignment matrix that lists the number of times each base occurs at each position in a collection of 20 known binding sites. (B) Consensus sequence, showing the most common base at each position, or using ambiguous base representations when two bases are very common (see appendix 1A for IUPAC ambiguous base code). (C) Weight matrix for the set of sites, using the logarithms of the observed counts over those expected by chance. All numbers in the alignment matrix are increased by one as a small sample size correction. (D) The scores obtained with the matrix in C for two specific sequences. The scores are the sum of the bold numbers, corresponding to the base at each position in each sequence.
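    The steps described in this caption (counts, consensus, log-odds weights with a +1 correction, and additive scoring) can be sketched in a few lines of Python. This is a minimal illustration, not the unit's own procedure: the count matrix is a hypothetical 4-position toy example rather than the CREB M00178 data, and the uniform background of 0.25 per base, the base-2 logarithm, and the function names are all assumptions made for the sketch.

```python
import math

BASES = "ACGT"

# Hypothetical toy alignment matrix (NOT the CREB M00178 counts).
# One row per site position; columns are counts of A, C, G, T in 20 sites.
counts = [
    [0, 0, 0, 20],
    [0, 0, 18, 2],
    [19, 0, 1, 0],
    [0, 17, 2, 1],
]


def consensus(counts):
    """Return the most common base at each position (no ambiguity codes)."""
    return "".join(BASES[row.index(max(row))] for row in counts)


def weight_matrix(counts, background=0.25, pseudocount=1):
    """Convert counts to log-odds weights.

    Each count is increased by `pseudocount` (the small-sample correction),
    turned into a frequency, and divided by the assumed background frequency
    before taking log2.
    """
    weights = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        weights.append(
            [math.log2((c + pseudocount) / total / background) for c in row]
        )
    return weights


def score(weights, seq):
    """Sum the weight of the observed base at each position of `seq`."""
    if len(seq) != len(weights):
        raise ValueError("sequence length must match matrix length")
    return sum(weights[i][BASES.index(base)] for i, base in enumerate(seq))


weights = weight_matrix(counts)
for candidate in ("TGAC", "ACGT"):
    print(candidate, round(score(weights, candidate), 2))
```

    A sequence matching the most common base at every position receives the highest possible score, while bases never seen in the aligned sites contribute negative weights instead of an undefined logarithm, which is the practical purpose of the +1 correction.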

Literature Cited

    Eddy, S.R. 1998. Profile hidden Markov models. Bioinformatics 14:755-763.
    Portales-Casamar, E., Thongjuea, S., Kwon, A.T., Arenillas, D., Zhao, X., Valen, E., Yusuf, D., Lenhard, B., Wasserman, W.W., and Sandelin, A. 2010. JASPAR 2010: The greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 38:D105-D110.
    Stormo, G.D. 2000. DNA binding sites: Representation and discovery. Bioinformatics 16:16-23.
    Wingender, E., Chen, X., Fricke, E., Geffers, R., Hehl, R., Liebich, I., Krull, M., Matys, V., Michael, H., Ohnhäuser, R., Prüss, M., Schacherer, F., Thiele, S., and Urbach, S. 2001. The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 29:281-283.
     
 