An Introduction to Sequence Similarity (“Homology”) Searching
Washington University, School of Medicine, St. Louis, Missouri
Abstract
Homologous sequences usually have the same, or very similar, functions, so new sequences can be reliably assigned functions if homologous sequences with known functions can be identified. Homology is inferred based on sequence similarity, and many methods have been developed to identify sequences that have statistically significant similarity. This unit provides an overview of some of the basic issues in identifying similarity among sequences and points out other units in this chapter that describe specific programs that are useful for this task. Curr. Protoc. Bioinform. 27:3.1.1-3.1.7. © 2009 by John Wiley & Sons, Inc.
Keywords: sequence similarity; homology; dynamic programming; similarity-scoring matrices; sequence alignment; multiple alignment; sequence evolution
Table of Contents
- An Introduction to Identifying Homologous Sequences
- Optimal Sequence Alignments
- Scoring Sequence Similarity
- Fast Searching Methods
- The Significance of an Alignment Score
- Making and Using Multiple Sequence Alignments
- Summary
- Literature Cited
- Figures
Figures
- Figure 3.1.1 Dynamic programming algorithm for optimum sequence alignment. The two sequences are written across the top and along the right side of the matrix. The score of each element is determined by the simple rules shown for the enlarged section and described by the equations below it. (The top row and left column have special rules as described in the text.) The score of the best global alignment is the element S(n,m), and the alignment with that score can be obtained by backtracking through the matrix, determining the path that generated the score at each element.
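The matrix fill and backtracking described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the unit's own formulation: the scoring values (`match`, `mismatch`, `gap`), the linear gap penalty, and the function name `global_align` are all assumptions chosen for the example. In particular, the "special rules" for the top row and left column are assumed here to be cumulative gap penalties, which is one common convention for global alignment.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming (illustrative sketch).

    Scoring values are hypothetical; a real search would use a
    similarity-scoring matrix and possibly affine gap penalties.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # S[i][j] = best score for aligning a[:i] with b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]

    # Assumed boundary rule: leading gaps accumulate the gap penalty.
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap

    # Each element is the best of three moves: diagonal (align two
    # residues), up (gap in b), or left (gap in a).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Backtrack from S[n][m], at each step finding a move that
    # could have generated the current score.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1

    return S[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = global_align("ACGT", "AGT")
    print(score)
    print(top)
    print(bottom)
```

With the assumed scores, aligning `ACGT` against `AGT` gives a score of 1 (three matches and one gap) and the alignment `ACGT` over `A-GT`. When several paths tie, this sketch prefers the diagonal move, so it reports only one of the possibly many optimal alignments.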
Literature Cited
- Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.
- Eddy, S.R. 1996. Hidden Markov models. Curr. Opin. Struct. Biol. 6:361-365.
- Fitch, W. and Smith, T. 1983. Optimal sequence alignments. Proc. Natl. Acad. Sci. U.S.A. 80:1382-1386.
- Gribskov, M., McLachlan, A.D., and Eisenberg, D. 1987. Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci. U.S.A. 84:4355-4358.
- Karlin, S. and Altschul, S.F. 1990. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. U.S.A. 87:2264-2268.
- Krogh, A., Brown, M., Mian, I.S., Sjölander, K., and Haussler, D. 1994. Hidden Markov models in computational biology. Applications to protein modeling. J. Mol. Biol. 235:1501-1531.
- Lipman, D.J., Altschul, S.F., and Kececioglu, J.D. 1989. A tool for multiple sequence alignment. Proc. Natl. Acad. Sci. U.S.A. 86:4412-4415.
- Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147:195-197.