Finding Similar Nucleotide Sequences Using Network BLAST Searches

Istvan Ladunga1

1 Departments of Statistics, Biochemistry, and School of Biological Sciences, University of Nebraska‐Lincoln, Lincoln, Nebraska
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 3.3
DOI:  10.1002/cpbi.29
Online Posting Date:  June, 2017
The Basic Local Alignment Search Tool (BLAST) is the first tool used in the annotation of nucleotide or amino acid sequences. BLAST is a flagship of bioinformatics due to its performance and user‐friendliness. Beginners and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages at the National Center for Biotechnology Information. We map nucleic acid sequences to genomes, find identical or similar mRNA, expressed sequence tag, and noncoding RNA sequences, and run Megablast searches, which are much faster than blastn. Taxonomy reports, genomic views, and multiple alignments help in understanding the results. We interpret expected frequency thresholds, biological significance, and statistical significance. Weak hits do not provide evidence, but they can offer hints for further analyses. Using translated BLAST, we find genes that may code for homologous proteins. We reduce false positives by filtering out low‐complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez and PubMed, as well as to structural, sequence, interaction, and expression databases, which facilitates integration with a wide spectrum of biological knowledge. © 2017 by John Wiley & Sons, Inc.

Keywords: BLAST; sequence alignment; database search; homology search; mapping; nucleic acid; DNA; RNA; genome; blastn; Megablast

Table of Contents

  • Introduction
  • Basic Protocol 1: Using the WEB‐Interface Blast from the NCBI Blast Server for Nucleotide Sequences
  • Basic Protocol 2: The Default Blastn Result Output
  • Support Protocol 1: Setting Optional Parameters
  • Support Protocol 2: Formatting Results of a Blast Search
  • Alternate Protocol 1: Megablast Search for Ribosomal RNA
  • Alternate Protocol 2: Finding Transcribed Gene Copies and Splice Variants Using Megablast
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
Literature Cited

Key References
  Altschul et al., 1994. See above.
  An excellent review on the application of pair‐wise BLAST tools for identifying possible coding regions and for elucidating gene structure and protein function. This review discusses significance, sequence filtering, database issues, alignment statistics, gap costs, scoring systems, and other topics.
  Altschul et al., 1997. See above.
  This is the original research paper on gapped alignment BLAST and position‐specific iterative BLAST. It covers a series of algorithmic and performance improvements, gap penalties, and statistical considerations, as well as biological examples with marginal similarities.
  Baxevanis, A. D., & Ouellette, B. F. (2005). Bioinformatics. A practical guide to the analysis of genes and proteins. Hoboken, NJ: John Wiley & Sons.
  A widely taught, clearly written textbook that introduces pairwise sequence similarity searches, biological databases, and many other areas of bioinformatics. Reviews the general concepts of alignments, scoring matrices, and BLAST with practical applications and guidelines for interpretation.
  Korf et al., 2003. See above.
  An excellent overview of the theory and practice of the BLAST tools as of 2003. This comprehensive and easy‐to‐understand textbook is highly recommended to everyone in bioinformatics or computational biology.
Internet Resources
  The NCBI BLAST Web site.
  The Entrez Documentation at NCBI.
  The Entrez site for nucleic acid searches at NCBI.
  The BioPerl site.
  The full documentation for BLAST at NCBI.
  The new Server for the Washington University BLAST.
  The RepeatMasker Web site.
