Creating Databases for Biological Information: An Introduction
Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
Abstract
The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, and relational databases, as well as ACeDB. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system.
Figures
- Figure 9.1.1 A relational schema for protein sequences separates information into distinct tables to minimize redundancy.
- Figure 9.1.2 In a flat-file representation of the same data, two proteins that share the same function or taxon each duplicate the information in common_name, genus, species, go-accession, and description.
- Figure 9.1.3 The protein database as an ACeDB schema.
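The contrast drawn in the captions for Figures 9.1.1 and 9.1.2 can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the figures themselves are not reproduced here, so the table names (taxon, go_term, protein), the key columns, and all inserted values are hypothetical placeholders. Only the field names common_name, genus, species, go-accession (written go_accession below, since a hyphen is not valid in an unquoted SQL identifier), and description come from the captions.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical relational layout: shared taxon and function data are
# stored once and referenced from the protein table by key.
conn.executescript("""
CREATE TABLE taxon (
    taxon_id    INTEGER PRIMARY KEY,
    common_name TEXT,
    genus       TEXT,
    species     TEXT
);
CREATE TABLE go_term (
    go_term_id   INTEGER PRIMARY KEY,
    go_accession TEXT,
    description  TEXT
);
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    name       TEXT,
    sequence   TEXT,
    taxon_id   INTEGER REFERENCES taxon(taxon_id),
    go_term_id INTEGER REFERENCES go_term(go_term_id)
);
""")

# Placeholder values for illustration only (not real annotations).
conn.execute("INSERT INTO taxon VALUES (1, 'human', 'Homo', 'sapiens')")
conn.execute("INSERT INTO go_term VALUES (1, 'GO:0000000', 'placeholder function')")
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?, ?, ?)",
    [(1, "protein_A", "MKT", 1, 1), (2, "protein_B", "MKV", 1, 1)],
)

# Joining the tables reproduces the flat-file view: every protein row
# carries its own copy of the taxon and function fields.
flat_rows = conn.execute("""
    SELECT p.name, p.sequence,
           t.common_name, t.genus, t.species,
           g.go_accession, g.description
    FROM protein AS p
    JOIN taxon   AS t ON p.taxon_id = t.taxon_id
    JOIN go_term AS g ON p.go_term_id = g.go_term_id
    ORDER BY p.protein_id
""").fetchall()

# Number of taxon records actually stored in the relational layout.
taxon_count = conn.execute("SELECT COUNT(*) FROM taxon").fetchone()[0]

for row in flat_rows:
    print(row)
print("taxon rows stored:", taxon_count)
```

In this sketch the two flattened rows repeat the same five taxon and function fields, while the relational layout stores that information in a single taxon row and a single go_term row.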