Naomi A. Stover (1), Andre R.O. Cavalcanti (2)

(1) Bradley University, Peoria, Illinois; (2) Pomona College, Claremont, California
Publication Name: Current Protocols Essential Laboratory Techniques
Unit Number: Unit 11.1
DOI: 10.1002/cpet.8
Online Posting Date: May 2017
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


BLAST is the most widely used software in bioinformatics research. Its main function is to compare a sequence of interest, the query sequence, to sequences in a large database. BLAST then reports the best matches, or “hits,” found in the database. This simple program has two primary applications. First, if the function of the query sequence is unknown, it may be possible to infer its function based on the recognized functions of similar sequences. Second, if the researcher has a query sequence with a known function, it may be possible to identify sequences in the database that have similar functions. The utility of BLAST therefore depends on the researcher's choice of query sequence and database. An appreciation for the functions and limitations of BLAST is vital to using this program effectively. This unit will introduce the basic concepts behind BLAST, walk through BLAST searching protocols, and interpret common results. © 2017 by John Wiley & Sons, Inc.

Keywords: BLAST; sequence alignment; sequence analysis; sequence annotation
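
As a rough illustration of the idea described in the abstract (score a query against every database sequence, then report the best hits), the following minimal sketch ranks a toy database by local alignment score. It is not the BLAST algorithm: it uses an exhaustive Smith-Waterman style dynamic program rather than BLAST's faster heuristic search, and the scoring values (match +2, mismatch -1, gap -2), the function names, and the toy sequences are all assumptions chosen for illustration.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (Smith-Waterman style).

    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(subject) + 1)  # scores for the previous query row
    best = 0
    for q in query:
        curr = [0]
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def rank_hits(query, database):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = [(name, local_alignment_score(query, seq))
              for name, seq in database.items()]
    return sorted(scores, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database; real searches run against large public collections.
toy_database = {
    "seq_A": "TTGACCGATTACAGGT",
    "seq_B": "CCCCGGGGAAAATTTT",
    "seq_C": "ACGATTACA",
}

hits = rank_hits("GATTACA", toy_database)
for name, score in hits:
    print(name, score)
```

Sequences that contain the query exactly receive the maximum possible score and appear first in the ranking. A real BLAST report additionally attaches statistical measures to each hit, which this sketch does not attempt to compute.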

Table of Contents

  • Overview and Principles
  • Strategic Questions
  • Strategic Planning
  • Protocols
  • Basic Protocol 1: Selecting a Sequence Using Entrez
  • Basic Protocol 2: Search a Nucleotide Database Using a Nucleotide Query: Nucleotide BLAST (BLASTN)
  • Support Protocol 1: Search a Protein Database Using a Protein Query: Protein BLAST (BLASTP)
  • Basic Protocol 3: Search a Protein Database Using a Translated Nucleotide Query: BLASTX
  • Basic Protocol 4: Search a Translated Nucleotide Database Using a Protein Query: TBLASTN
  • Basic Protocol 5: Search a Translated Nucleotide Database Using a Translated Nucleotide Query: TBLASTX
  • Support Protocol 2: Preparing a Sequence in FASTA Format
  • Support Protocol 3: Formatting a Sequence in GenBank/GenPept
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Literature Cited
   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410. doi: 10.1016/S0022‐2836(05)80360‐2
   Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
   Bateman, A., Pearson, W.R., Stein, L.D., Stormo, G.D., and Yates, J.R. III (eds.) 2017. Current Protocols in Bioinformatics. Chapter 3. John Wiley & Sons, Hoboken, N.J.
   Chang, W.‐J., Zaila, K.E., and Coppola, T.W. 2016. Submitting a sequence to GenBank. Curr. Protoc. Essen. Lab. Tech. 12:11.2.1–11.2.24. doi: 10.1002/9780470089941.et1101s12
   Dayhoff, M.O., Schwartz, R.M., and Orcutt, B.C. 1978. A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure. Vol. 5 (M.O. Dayhoff, ed.) pp. 345‐352. National Biomedical Research Foundation, Washington, D.C.
   Eddy, S.R. 2004a. Where did the BLOSUM62 alignment score matrix come from? Nat. Biotechnol. 22:1035‐1036. doi: 10.1038/nbt0804‐1035
   Eddy, S.R. 2004b. What is dynamic programming? Nat. Biotechnol. 22:909‐910. doi: 10.1038/nbt0704‐909
   Engel, S.R., and MacPherson, K.A. 2016. Using model organism databases (MODs). Curr. Protoc. Essen. Lab. Tech. 13:11.4.1–11.4.22. doi: 10.1002/cpet.4
   Henikoff, S. and Henikoff, J. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A. 89:10915‐10919.
   Korf, I., Yandell, M., and Bedell, J. 2003. BLAST. O'Reilly Media, Inc., Sebastopol, Calif.
   Ladunga, I. 2009. Finding similar nucleotide sequences using network BLAST searches. Curr. Protoc. Bioinform. 26:3.3.1‐3.3.26. doi: 10.1002/0471250953.bi0303s26
   Leonard, S.A., Littlejohn, T.G., and Baxevanis, A.D. 2006. Common file formats. Curr. Protoc. Bioinform. 16:A.1B.1‐A.1B.9. doi: 10.1002/0471250953.bia01bs16
   Needleman, S.B. and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48:443‐453. doi: 10.1016/0022‐2836(70)90057‐4
   Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147:195‐197. doi: 10.1016/0022‐2836(81)90087‐5
   Stover, N.A., Cavalcanti, A.R., Li, A.J., Richardson, B.C., and Landweber, L.F. 2005. Reciprocal fusions of two genes in the formaldehyde detoxification pathway in ciliates and diatoms. Mol. Biol. Evol. 22:1539‐1542. doi: 10.1093/molbev/msi151
   Wheeler, D. 2003. Selecting the right protein‐scoring matrix. Curr. Protoc. Bioinform. 00:3.5.1‐3.5.6. doi: 10.1002/0471250953.bi0305s00
   Zufall, R.A. 2017. Beyond simple homology searches: Multiple sequence alignments and phylogenetic trees. Curr. Protoc. Essen. Lab. Tech. 1:11.3.1–11.3.17. doi: 10.1002/9780470089941.et1103s01