Using BLAST for Performing Sequence Alignment

Matthew D. Healy1

1 Bristol Myers Squibb Pharmaceutical Research Institute, Wallingford, Connecticut
Publication Name:  Current Protocols in Human Genetics
Unit Number:  Unit 6.8
DOI:  10.1002/0471142905.hg0608s52
Online Posting Date:  January, 2007
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

BLAST is a widely used genetic sequence comparison program developed at the National Center for Biotechnology Information (NCBI). In this unit, three Basic Protocols and one Support Protocol are provided for general‐purpose BLAST searches on the NCBI and Ensembl Web‐accessible BLAST servers. Key parameters affecting how the search algorithm works are reviewed, with advice on modifying search parameters for specific situations. Many other public and private Web sites offer BLAST interfaces that may differ from those described in this unit, but the general principles will be similar. The Support Protocol describes how to obtain sequences in various formats from NCBI for use in BLAST searches. It is emphasized that no algorithm can be a substitute for biological understanding; performing a BLAST search takes only a few minutes, but understanding the implications of the results takes much longer.

Keywords: Algorithms; Molecular Sequence Data; Sequence Alignment; Software

Table of Contents

  • Basic Protocol 1: BLASTP: Searching the NCBI Protein Databases Using a Protein Query Sequence
  • Basic Protocol 2: BLASTN: Searching the NCBI Nucleotide Databases Using a Nucleotide Query Sequence
  • Basic Protocol 3: Searching the Ensembl Human Genomic Nucleotide Database Using a Nucleotide Query Sequence
  • Support Protocol 1: Downloading Protein and Nucleotide Sequences from the NCBI Databases
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Materials

Basic Protocol 1: BLASTP: Searching the NCBI Protein Databases Using a Protein Query Sequence

  Materials
  • A computer with Internet access running any standard Web browser such as Internet Explorer or Mozilla. While a broadband connection is preferable, most features of the NCBI BLAST interface perform acceptably over a dial‐up connection.

Basic Protocol 2: BLASTN: Searching the NCBI Nucleotide Databases Using a Nucleotide Query Sequence

  Materials
  • A computer with Internet access running any standard Web browser such as Internet Explorer or Mozilla. While a broadband connection is preferable, most features of the NCBI BLAST interface perform acceptably over a dial‐up connection.

Basic Protocol 3: Searching the Ensembl Human Genomic Nucleotide Database Using a Nucleotide Query Sequence

  Materials
  • A computer with Internet access running any standard Web browser such as Internet Explorer or Mozilla. Because Ensembl is very graphics‐rich, a broadband connection is strongly preferable to a dial‐up connection.

Support Protocol 1: Downloading Protein and Nucleotide Sequences from the NCBI Databases

  Materials
  • A computer with Internet access running any standard Web browser such as Internet Explorer or Mozilla. While a broadband connection is preferable, most features of the NCBI search interface perform acceptably over a dial‐up connection. For downloading large amounts of sequence data from NCBI, however, a fast connection is essential.
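
The Support Protocol retrieves sequences through the NCBI Web interface. As a scripted alternative that is not part of this unit, the sketch below builds a request URL for NCBI's E‐utilities efetch service and parses FASTA‐formatted text such as that service returns. The function names build_efetch_url and parse_fasta are hypothetical, the accession NP_000509.1 is only an example, and no network request is actually sent; treat this as an illustration under those assumptions rather than the protocol's method.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="protein", rettype="fasta"):
    """Return an efetch URL for one accession (hypothetical helper).

    Only the URL string is built; nothing is downloaded here.
    Use db="nuccore" for nucleotide records.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Example accession chosen for illustration only.
    print(build_efetch_url("NP_000509.1"))
    # Made-up FASTA text standing in for a downloaded record.
    example = ">example_protein illustrative record\nMVHLTPEEKS\nAVTALWGKVN\n"
    print(parse_fasta(example))
```

The parsed sequence string can then be pasted into a BLAST query box, or the URL opened in a browser, in the same way as a sequence saved by hand from the NCBI site.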

Literature Cited

   Altschul, S.F., Boguski, M.S., Gish, W., and Wootton, J.C. 1994. Issues in searching molecular sequence databases. Nat. Genet. 6:119‐129.
   Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
   Altschul, S.F., Wootton, J.C., Gertz, E.M., Agarwala, R., Morgulis, A., Schäffer, A.A., and Yu, Y.K. 2005. Protein database searches using compositionally adjusted substitution matrices. FEBS Journal 272:5099‐5100.
   Gilks, W.R., Audit, B., de Angelis, D., Tsoka, S., and Ouzounis, C.A. 2005. Percolation of annotation errors through hierarchically structured protein sequence databases. Math. Biosci. 193:223‐234.
   Jackson, D.G., Healy, M.D., and Davison, D.B. 2003. Bioinformatics: Not just for sequences anymore. Biosilico 1:103‐111.
   Jones, D.T. and Swindells, M.B. 2002. Getting the most from PSI‐BLAST. Trends Biochem. Sci. 27:161‐164.
   McGinnis, S., and Madden, T.L. 2004. BLAST: At the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 32:W20‐W25.
Key References
   Altschul et al., 1997. See above.
  In the late 1990s NCBI made major improvements to the BLAST algorithms; this paper summarizes how those improvements work and why they matter.
   Korf, I., Yandell, M., and Bedell, J. 2003. BLAST. O'Reilly Media, Sebastopol, CA.
  This is an entire book dedicated to the BLAST program, from a leading publisher of technical books.
   Woodford, N. 2004. Public databases: retrieving and manipulating sequences for beginners. Methods Mol. Biol. 266:17‐28.
  This general discussion of how to use the major sequence databases can supplement this unit.