Gene Identification in Prokaryotic Genomes, Phages, Metagenomes, and EST Sequences with GeneMarkS Suite

Mark Borodovsky1, Alex Lomsadze1

1 Georgia Institute of Technology, Atlanta, Georgia
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 4.5
DOI: 10.1002/0471250953.bi0405s35
Online Posting Date: September 2011
This unit describes how to use several gene‐finding programs from the GeneMark line. These programs were developed to find protein‐coding ORFs in genomic DNA of prokaryotic species, in genomic DNA of eukaryotic species with intronless genes, in genomes of viruses and phages, in prokaryotic metagenomic sequences, and in EST sequences with spliced‐out introns. The tools have been shown to have state‐of‐the‐art accuracy and are frequently used for gene annotation in novel nucleotide sequences. A further advantage of these sequence‐analysis tools is that algorithm parameterization is handled automatically: parameters are estimated by iterative self‐training (unsupervised training).

Curr. Protoc. Bioinform. 35:4.5.1‐4.5.17. © 2011 by John Wiley & Sons, Inc.

Keywords: gene finding; hidden Markov model; unsupervised parameter estimation

Table of Contents

  • Introduction
  • Basic Protocol 1: Using GeneMarkS
  • Basic Protocol 2: Using GeneMark.hmm for Prokaryotic Gene Prediction
  • Basic Protocol 3: Using GeneMark for Prokaryotic Gene Prediction
  • Basic Protocol 4: Using the Heuristic Approach for Prokaryotic Model Building
  • Basic Protocol 5: Using MetaGeneMark for Finding Genes in Metagenomes
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
