
Using NCBI BLAST
Abstract
BLAST is the most widely used software in bioinformatics research. Its main function is to compare a sequence of interest, the query sequence, to sequences in a large database. BLAST then reports the best matches, or hits, found in the database. This simple program has two primary applications. First, if the function of the query sequence is unknown, it may be possible to infer its function from the known functions of similar sequences. Second, if the researcher has a query sequence with a known function, it may be possible to identify sequences in the database that have similar functions. The usefulness of BLAST therefore depends on the researcher's choice of query sequence and database. An appreciation of the functions and limitations of BLAST is vital to using this program effectively. This unit introduces the basic concepts behind BLAST, walks through BLAST searching protocols, and explains how to interpret common results. Curr. Protoc. Essential Lab. Tech. 1:11.1.1-11.1.36. © 2009 by John Wiley & Sons, Inc.
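The core operation described above, scoring a query against every sequence in a database and reporting the best hits, can be illustrated with a small sketch. It assumes a toy in-memory database and a simple scoring scheme (match +2, mismatch -1, gap -2, chosen for illustration and not BLAST defaults). It also swaps in exact Smith-Waterman local alignment in place of BLAST's much faster seed-and-extend heuristic, so it shows what a ranked hit list means rather than how BLAST computes one. The function names and sequences are hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative scoring values (assumptions, not BLAST defaults)
MATCH = 2
MISMATCH = -1
GAP = -2


def local_alignment_score(query: str, subject: str) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(subject) + 1)
        for j in range(1, len(subject) + 1):
            diag = prev[j - 1] + (MATCH if query[i - 1] == subject[j - 1] else MISMATCH)
            up = prev[j] + GAP        # query residue aligned to a gap in the subject
            left = curr[j - 1] + GAP  # subject residue aligned to a gap in the query
            # Local alignment: scores never drop below zero
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_hits(query: str, database: Dict[str, str], top: int = 3) -> List[Tuple[str, int]]:
    """Score the query against every database entry and return the best hits."""
    scores = [(name, local_alignment_score(query, seq)) for name, seq in database.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]


if __name__ == "__main__":
    # Hypothetical toy database; names and sequences are made up
    toy_db = {
        "seqA": "TTGACGTACGTAGCTAGG",
        "seqB": "CCCCCCCCCCCC",
        "seqC": "GACGTACGTTT",
    }
    for name, score in rank_hits("ACGTACGTAG", toy_db):
        print(name, score)
```

Running the script prints seqA, seqC, and seqB in that order. Unlike this sketch, BLAST also attaches a statistical significance (E-value) to each hit, which is what makes its rankings comparable across searches of databases of different sizes.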
Keywords: BLAST; sequence alignment; sequence analysis; sequence annotation
Table of Contents
- Overview and Principles
- Strategic Planning
- Basic Protocol 1: Selecting a Sequence Using Entrez
- Basic Protocol 2: Search a Nucleotide Database Using a Nucleotide Query: Nucleotide BLAST (BLASTN)
- Basic Protocol 3: Search a Protein Database Using a Protein Query: Protein BLAST (BLASTP)
- Basic Protocol 4: Search a Protein Database Using a Translated Nucleotide Query: BLASTX
- Basic Protocol 5: Search a Translated Nucleotide Database Using a Protein Query: TBLASTN
- Basic Protocol 6: Search a Translated Nucleotide Database Using a Translated Nucleotide Query: TBLASTX
- Support Protocol 1: Preparing a Sequence in FASTA Format
- Support Protocol 2: Formatting a Sequence in GenBank/GenPept
- Understanding Results
- Troubleshooting
- A Practical Example
- Literature Cited
- Figures
- Tables
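As the titles of Basic Protocols 2 to 6 indicate, the choice of BLAST program follows from the molecule type of the query and of the database. The sketch below encodes that mapping. The function name `choose_program` and the `translate` flag are hypothetical conveniences for illustration; the flag only matters when both query and database are nucleotide sequences.

```python
def choose_program(query_type: str, database_type: str, translate: bool = False) -> str:
    """Return the BLAST program matching the query and database molecule types.

    query_type and database_type must be "nucleotide" or "protein".
    translate=True asks for a protein-level comparison of two nucleotide
    sequences (TBLASTX); it is ignored for every other combination.
    """
    valid = {"nucleotide", "protein"}
    if query_type not in valid or database_type not in valid:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    if query_type == "nucleotide" and database_type == "nucleotide":
        # Translated nucleotide query vs. translated nucleotide database
        return "tblastx" if translate else "blastn"
    if query_type == "protein" and database_type == "protein":
        return "blastp"
    if query_type == "nucleotide":
        # Translated nucleotide query vs. protein database
        return "blastx"
    # Protein query vs. translated nucleotide database
    return "tblastn"


if __name__ == "__main__":
    print(choose_program("nucleotide", "protein"))  # blastx
```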
Figures
Figure 11.1.3 Screenshot of the NCBI Web page (http://www.ncbi.nlm.nih.gov/) showing the databases available for an Entrez search.

Figure 11.1.4 Screenshot of the BLAST program selection homepage.

Figure 11.1.5 Screenshot of the nucleotide BLAST form.

Figure 11.1.7 Example screenshot of a BLAST results page.

Figure 11.1.8 Screenshot of the protein BLAST form.

Figure 11.1.9 Screenshot of the BLASTX form, showing the genetic codes available for the query sequence.

Figure 11.1.10 Screenshot of the TBLASTN form.

Figure 11.1.11 Screenshot of the TBLASTX form, showing the genetic codes available for the query sequence.

Figure 11.1.16 Screenshot of the formatting options in a BLAST results page.

Figure 11.1.17 Screenshot of the Search Summary showing the parameters BLAST used in the search.

Figure 11.1.19 Screenshot of the Graphics Summary of the BLASTP search of the yeast formaldehyde dehydrogenase (SFA1) protein (accession number: NP_010113) against the protein nr database.

Figure 11.1.20 Screenshot of the Graphics Summary of the BLASTP search of the hydra tyrosine kinase HTK30 (accession number: AAC34124) against the protein nr database.

Figure 11.1.21 Screenshot of part of the Descriptions section of the BLASTP search of the yeast formaldehyde dehydrogenase (SFA1) protein (accession number: NP_010113) against the protein nr database.

Figure 11.1.24 Screenshot showing how to alter the Algorithm parameters to return 5000 hits to the query sequence.

Figure 11.1.25 Screenshot showing how to limit your BLAST search to Tetrahymena thermophila sequences in the database.

Figure 11.1.27 Graphical result of a BLASTP search of the Tetrahymena formaldehyde dehydrogenase protein (accession: XP_001013202) against the nr protein database.

Figure 11.1.28 Using the Query subrange boxes to limit the BLAST search to only part of your sequence.

Figure 11.1.29 Graphical result of a BLASTP search of the Tetrahymena formaldehyde dehydrogenase protein (accession: XP_001013202) against all the human sequences in the nr protein database.
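The BLASTX and TBLASTX forms (Figures 11.1.9 and 11.1.11) translate nucleotide sequences into protein before comparing them; TBLASTN does the same to the database. A minimal sketch of six-frame translation follows, assuming the standard genetic code (NCBI translation table 1) only. The BLAST forms let you choose other genetic codes, which this sketch does not support, and the frame labels follow one common convention (frame -1 starts at the first base of the reverse complement). The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1): amino acids listed in TCAG codon order,
# i.e. TTT, TTC, TTA, TTG, TCT, ... GGG; "*" marks stop codons
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate complete codons in one frame; codons with other characters become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGTN characters are left as-is)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict:
    """Translate the three forward and three reverse-complement reading frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frames("ATGGCC").items():
        print(frame, protein)
```

For the query ATGGCC, frame +1 reads ATG GCC (MA) and frame -1 reads the reverse complement GGC CAT (GH); leftover bases that do not form a full codon are dropped.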