PubSearch and PubFetch: A Simple Management System for Semiautomated Retrieval and Annotation of Biological Information from the Literature

Danny Yoo1, Iris Xu1, Tanya Z. Berardini1, Seung Yon Rhee1, Vijay Narayanasamy2, Simon Twigger2

1 Carnegie Institution, Stanford, California, 2 Medical College of Wisconsin, Milwaukee, Wisconsin
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 9.7
DOI:  10.1002/0471250953.bi0907s13
Online Posting Date:  March, 2006
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


For most systems in biology, a large body of literature exists that describes the complexity of the system based on experimental results. Manual review of this literature to extract targeted information into biological databases is difficult and time consuming. To address this problem, we developed PubSearch and PubFetch, which store literature, keyword, and gene information in a relational database, index the literature with keywords and gene names, and provide a Web user interface for annotating the genes from experimental data found in the associated literature. A set of protocols is provided in this unit for installing, populating, running, and using PubSearch and PubFetch. In addition, we provide support protocols for performing controlled vocabulary annotations. Intended users of PubSearch and PubFetch are database curators and biology researchers interested in tracking the literature and capturing information about genes of interest in a more effective way than with conventional spreadsheets and lab notebooks.

Keywords: literature curation; ontology; controlled vocabulary; annotation; genes; relational database; web application
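
The indexing idea summarized in the abstract (literature, keywords, and gene names stored in a relational database, with the literature indexed by those terms) can be illustrated with a minimal sketch. This is not PubSearch's actual schema or code: the table names (`article`, `term`, `hit`), the function names, and the use of an in-memory SQLite database with whole-word, case-insensitive matching are all assumptions made for illustration.

```python
import re
import sqlite3


def build_index(articles, terms):
    """Store articles and terms, then record which terms occur in which articles.

    articles: iterable of (title, abstract) pairs
    terms:    iterable of (name, kind) pairs, e.g. ("FLC", "gene")
    The schema below is hypothetical, not the one used by PubSearch.
    """
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, abstract TEXT);
        CREATE TABLE term (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
        CREATE TABLE hit (
            article_id INTEGER,
            term_id INTEGER,
            PRIMARY KEY (article_id, term_id)
        );
        """
    )
    db.executemany("INSERT INTO article (title, abstract) VALUES (?, ?)", articles)
    db.executemany("INSERT INTO term (name, kind) VALUES (?, ?)", terms)

    article_rows = db.execute("SELECT id, title, abstract FROM article").fetchall()
    term_rows = db.execute("SELECT id, name FROM term").fetchall()
    for term_id, name in term_rows:
        # Whole-word, case-insensitive match of the term in title or abstract.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for article_id, title, abstract in article_rows:
            if pattern.search(title + " " + abstract):
                db.execute("INSERT INTO hit VALUES (?, ?)", (article_id, term_id))
    db.commit()
    return db


def articles_for_term(db, name):
    """Return titles of all articles indexed under the given term name."""
    rows = db.execute(
        """
        SELECT a.title
        FROM article a
        JOIN hit h ON h.article_id = a.id
        JOIN term t ON t.id = h.term_id
        WHERE t.name = ?
        ORDER BY a.id
        """,
        (name,),
    )
    return [title for (title,) in rows]


# Made-up sample records, for illustration only.
sample_articles = [
    ("FLC represses flowering", "FLC delays the floral transition."),
    ("Root hair development", "We studied root hair growth in GL2 mutants."),
]
sample_terms = [("FLC", "gene"), ("GL2", "gene"), ("flowering", "keyword")]

db = build_index(sample_articles, sample_terms)
print(articles_for_term(db, "GL2"))
```

A curator-facing interface would then query the `hit` table to list candidate papers for a gene before any manual annotation takes place. Real gene nomenclature (synonyms, symbols containing punctuation) would need more careful matching than the simple pattern used here.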

Table of Contents

  • Basic Protocol 1: Populating PubSearch
  • Support Protocol 1: Installing PubSearch
  • Support Protocol 2: Installing PubFetch for Use Outside of PubSearch
  • Alternate Protocol 1: Other Ways to Populate PubSearch
  • Basic Protocol 2: Setting up a PDF Repository for Full‐Text Indexing
  • Basic Protocol 3: Using PubSearch to Search Data
  • Basic Protocol 4: Using PubSearch to Add and Update Data
  • Basic Protocol 5: Using PubSearch to Make Gene Ontology Annotations
  • Basic Protocol 6: Generating and Loading InterProToGo Annotations
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Literature Cited

   The Gene Ontology Consortium. 2001. Creating the gene ontology resource: Design and implementation. Genome Res. 11:1425‐1433.
   Müller, H., Kenny, E.E., and Sternberg, P.W. 2004. Textpresso: An ontology‐based information retrieval and extraction system for biological literature. PLoS Biol 2:e309.
   Rhee, S.Y., Beavis, W., Berardini, T.Z., Chen, G., Dixon, D., Doyle, A., Garcia‐Hernandez, M., Huala, E., Lander, G., Montoya, M., Miller, N., Mueller, L.A., Mundodi, S., Reiser, L., Tacklind, J., Weems, D.C., Wu, Y., Xu, I., Yoo, D., Yoon, J., and Zhang, P. 2003. The Arabidopsis Information Resource (TAIR): A model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucl. Acids Res. 31:224‐228.

Internet Resources

  Gene Ontology's SourceForge repository.
  PubSearch homepage.
  PubSearch demo version.
  PubSearch support mailing list.
  Generic Model Organism Database project home page.