Sequence Similarity Searching Using the BLAST Family of Programs
Tyra G. Wolfsberg1, Thomas L. Madden1
1National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, Maryland
Publication Name:
Current Protocols in Protein Science
Unit Number:
Unit 2.5
DOI:
10.1002/0471140864.ps0205s15
Online Posting Date:
May, 2001
Abstract
The BLAST (Basic Local Alignment Search Tool) family of sequence similarity search programs allows users to input either a nucleotide or amino acid query sequence, and search a nucleotide or amino acid sequence database. The program returns a list of the sequence hits, alignments to the query sequence, and statistical values. This unit describes how to choose an appropriate BLAST program and database, perform the search, and interpret the results.
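As a rough illustration of the program choice described above, the sketch below maps the type of query and the type of database to the standard BLAST program for that combination. The function name `choose_blast_program` and the `translate_both` flag are hypothetical names introduced here for illustration; they are not part of any BLAST distribution.

```python
# Hypothetical helper (not part of BLAST): pick a BLAST program name
# from the type of the query sequence and the type of the database.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",   # nucleotide query vs. nucleotide database
    ("protein", "protein"): "blastp",         # protein query vs. protein database
    ("nucleotide", "protein"): "blastx",      # translated nucleotide query vs. protein database
    ("protein", "nucleotide"): "tblastn",     # protein query vs. translated nucleotide database
}


def choose_blast_program(query_type, database_type, translate_both=False):
    """Return the BLAST program name for a query/database type pair.

    translate_both is an assumption of this sketch: when both inputs are
    nucleotide and it is True, the translated comparison (tblastx) is chosen.
    """
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unsupported combination: {key}")
    if translate_both and key == ("nucleotide", "nucleotide"):
        return "tblastx"
    return BLAST_PROGRAMS[key]


if __name__ == "__main__":
    print(choose_blast_program("protein", "nucleotide"))
```

The lookup table keeps the four basic combinations in one place. PSI-BLAST, which is an iterative protein search, is deliberately left out of this sketch.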
Table of Contents
- Unit Introduction
- Accessing BLAST Programs and Documentation
- Introduction to BLAST
- Examples of BLAST Searches
- Searching Strategies
- Sequence Alignment Algorithms
- Appendix A: BLAST Parameters
- Appendix B: Sequence Identifier Syntax
- Literature Cited
- Figures
- Tables
Figures
- Figure 2.5.1 Submitting a BLASTP search using the NCBI's World Wide Web interface.
- Figure 2.5.2 Example of the top portion of a BLASTP report.
- Figure 2.5.3 Example of the graphical view of a BLASTP report. The bars are color coded by the strength of the database match. The strongest matches (those with a bit score >200) are red, followed by pink (bit score 80 to 200), green (50 to 80), blue (40 to 50), and black (<40).
- Figure 2.5.4 Example of the hit list from a BLASTP report.
- Figure 2.5.5 Example of a BLASTP alignment.
- Figure 2.5.6 Example of the graphical view of a BLASTX report.
- Figure 2.5.7 Example of the hit list from a BLASTX report.
- Figure 2.5.8 Example of selected BLASTX alignments.
- Figure 2.5.9 Example of the graphical view of a TBLASTN report.
- Figure 2.5.10 Example of the hit list from a TBLASTN report.
- Figure 2.5.11 Example of a TBLASTN alignment.
- Figure 2.5.12 Example of the graphical view of a BLASTN report.
- Figure 2.5.13 Example of the hit list from a BLASTN report.
- Figure 2.5.14 Example of a BLASTN alignment.
- Figure 2.5.15 Example of the hit list from a PSI-BLAST report.
- Figure 2.5.16 Example of a hit list from a BLASTP report in which the query sequence was not filtered. Black squares, added manually by the authors, indicate hits that would not appear if the query had been filtered.
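The color scheme given in the caption of Figure 2.5.3 can be written as a small lookup. This is a minimal sketch: the function name `bit_score_color` is hypothetical, and because the caption's ranges share endpoints (80 and 50 each appear in two ranges), the sketch assumes a shared endpoint belongs to the stronger category.

```python
def bit_score_color(bit_score):
    """Map a bit score to the bar color described for Figure 2.5.3.

    Assumption of this sketch: scores of exactly 80, 50, or 40 fall into
    the stronger (higher) category, since the caption's ranges overlap
    at their endpoints.
    """
    if bit_score > 200:
        return "red"      # strongest matches
    if bit_score >= 80:
        return "pink"     # 80 to 200
    if bit_score >= 50:
        return "green"    # 50 to 80
    if bit_score >= 40:
        return "blue"     # 40 to 50
    return "black"        # below 40


if __name__ == "__main__":
    for score in (350, 120, 65, 45, 12):
        print(score, bit_score_color(score))
```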
Join the Conversation
>> Seq = getgenpept('AAA59174')
When I execute the above command in MATLAB, it returns the error message given below:
Warning: Unable to locate GenPept information for access number AAA59174. Trying FASTA...
> In getgenpept at 73
Please help.
I can't use the BLAST program in MATLAB.
MATLAB returns an error when I run
>> getpdb('1A00', 'ToFile', 'collagen.pdb') or
>> Seq = getgenpept('AAA59174')
Please help me. What can I do?