Common File Formats

Shonda A. Leonard (1), Timothy G. Littlejohn (2), Andreas D. Baxevanis (3)

(1) IBM Life Sciences, St. Leonards, NSW, Australia
(2) Bethesda, Maryland


Unit Number: Appendix 1B
DOI: 10.1002/0471250953.bia01bs16
Online Posting Date: January 2007
Full text: PDF or HTML at Wiley Online Library

Abstract

This appendix discusses a few of the file formats frequently encountered in bioinformatics. Specifically, it reviews the rules for generating FASTA files and provides guidance for interpreting NCBI descriptor lines, commonly found in FASTA files. In addition, it reviews the construction of GenBank, Phylip, MSF and Nexus files.

Keywords: file format; FASTA; NCBI descriptor lines; GenBank; Phylip; MSF; Nexus
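The FASTA conventions described in this appendix (the caption of Figure A.1B.1 notes that a greater-than sign opens each entry and that sequence lines stay under 80 characters), together with the pipe-delimited identifiers found in NCBI descriptor lines, can be illustrated with a short Python sketch. The function names `write_fasta`, `read_fasta` and `split_descriptor`, the 60-character wrap width, and the example identifiers are hypothetical choices made for illustration, not part of the original unit. Because NCBI descriptor lines vary with the source database, the splitter only separates the `|`-delimited identifier fields from the free-text description and does not try to interpret them.

```python
from typing import Iterator, List, Optional, Tuple


def write_fasta(records: List[Tuple[str, str]], width: int = 60) -> str:
    """Format (descriptor, sequence) pairs as FASTA text.

    Each entry begins with '>' and its descriptor; the sequence is wrapped
    at `width` characters so every sequence line stays under 80.
    The default width of 60 is an assumption, not a requirement.
    """
    if not 0 < width < 80:
        raise ValueError("width must be between 1 and 79")
    lines: List[str] = []
    for descriptor, seq in records:
        lines.append(">" + descriptor)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (descriptor, sequence) pairs from FASTA-formatted text."""
    descriptor: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between entries
        if line.startswith(">"):
            if descriptor is not None:
                yield descriptor, "".join(chunks)
            descriptor, chunks = line[1:], []
        elif descriptor is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if descriptor is not None:
        yield descriptor, "".join(chunks)


def split_descriptor(descriptor: str) -> Tuple[List[str], str]:
    """Split an NCBI-style descriptor into identifier fields and description.

    The identifier is the text before the first whitespace and its fields
    are separated by '|'. A trailing '|' (seen, for example, when the locus
    field is empty) is dropped. Fields are returned uninterpreted.
    """
    parts = descriptor.split(None, 1)
    ident = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    fields = ident.rstrip("|").split("|") if ident else []
    return fields, description


if __name__ == "__main__":
    # Hypothetical records; the identifiers are placeholders, not real entries.
    records = [
        ("gi|12345|gb|AAA00000.1| example protein", "MA" * 40),
        ("seq2 second example", "MKT"),
    ]
    text = write_fasta(records)
    print(text)
    for descriptor, seq in read_fasta(text):
        print(split_descriptor(descriptor), len(seq))
```

Reading the output back with `read_fasta` recovers the original pairs. Wrapping long sequences, rather than rejecting them, keeps the writer usable for sequences of any length while still respecting the line-length convention.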

Table of Contents

  • Unit Introduction
  • FASTA Files
  • GenBank Flat Files
  • Phylip Files
  • MSF Files
  • Nexus Files
  • Converting between File Formats
  • Disclaimer
  • Bibliography
  • Figures
  • Tables

Figures

  • Figure A.1B.1
    A sample FASTA file that contains the sequences for two homologous proteins, actophorin and yeast cofilin. Note that a greater-than sign (>) designates the beginning of each entry and that each line of sequence contains fewer than 80 characters.

  • Figure A.1B.2
    A sample GenBank record. Circled numbers identify the fields listed in Table A.1B.1.

  • Figure A.1B.3
    A sample PHYLIP-formatted file. The five sequences shown are HIV-1 and HIV-2 gag proteins from a variety of isolates. See text for details.

  • Figure A.1B.4
    A sample MSF-formatted file. The five sequences shown are HIV-1 and HIV-2 gag proteins from a variety of isolates. See text for details.

  • Figure A.1B.5
    A sample Nexus-formatted file. The five sequences shown are HIV-1 and HIV-2 gag proteins from a variety of isolates. See text for details.
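Figure A.1B.3 shows a PHYLIP file holding five aligned gag protein sequences. The sketch below writes aligned sequences in what is commonly called strict sequential PHYLIP layout: a first line giving the number of sequences and the number of alignment columns, then one line per sequence whose name field is exactly 10 characters wide. The function name `write_phylip` and the toy names and sequences are hypothetical; interleaved and "relaxed" (longer-name) PHYLIP variants also exist and are not handled here.

```python
from typing import List, Tuple


def write_phylip(alignment: List[Tuple[str, str]]) -> str:
    """Format (name, aligned_sequence) pairs as sequential strict PHYLIP."""
    if not alignment:
        raise ValueError("alignment is empty")
    ncols = len(alignment[0][1])
    # Every row of an alignment must have the same number of columns (gaps included).
    if any(len(seq) != ncols for _, seq in alignment):
        raise ValueError("sequences must all have the same aligned length")
    # Strict PHYLIP names occupy exactly 10 characters: truncate, then pad with spaces.
    names = [name[:10].ljust(10) for name, _ in alignment]
    if len(set(names)) != len(names):
        raise ValueError("names are not unique after truncation to 10 characters")
    # Header line: number of sequences, then number of columns.
    lines = [f"{len(alignment)} {ncols}"]
    lines.extend(name + seq for name, (_, seq) in zip(names, alignment))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy alignment, not the gag sequences from Figure A.1B.3.
    print(write_phylip([("alpha", "MK-T"), ("beta_long_name_x", "MKAT")]), end="")
```

Because the 10-character name field has no separator after it, a truncated name that runs straight into the sequence (as `beta_long_` does here) is still valid strict PHYLIP; the uniqueness check guards against two different names collapsing to the same 10 characters.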

Literature Cited

Internet Resources

http://iubio.bio.indiana.edu/cgi-bin/readseq.cgi
ReadSeq biosequence interconversion tool.

http://www.ebi.ac.uk/clustalw
ClustalW multiple sequence alignment interface.