An Introduction to Sequence Similarity (“Homology”) Searching

William R. Pearson

University of Virginia School of Medicine, Charlottesville, Virginia
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 3.1
DOI: 10.1002/0471250953.bi0301s42
Online Posting Date: June 2013

Abstract

Sequence similarity searching, typically with BLAST, is the most widely used and most reliable strategy for characterizing newly determined sequences. Sequence similarity searches can identify “homologous” proteins or genes by detecting excess similarity: statistically significant similarity that reflects common ancestry. This unit provides an overview of the inference of homology from significant similarity and introduces other units in this chapter that provide more details on effective strategies for identifying homologs. Curr. Protoc. Bioinform. 42:3.1.1‐3.1.8. © 2013 by John Wiley & Sons, Inc.

Keywords: sequence similarity; homology; orthology; paralogy; sequence alignment; multiple alignment; sequence evolution

     
 

Table of Contents

  • An Introduction to Identifying Homologous Sequences
  • Inferring Homology from Similarity
  • Inferring Function from Homology
  • From Pairwise to Multiple Sequence Alignment
  • Summary
  • Literature Cited
  • Figures
     
 