Metagenomics sequence formats

From TheSeed
Revision as of 02:45, 5 October 2007 by RobEdwards (talk | contribs) (What files to upload)

File formats

The metagenomics RAST server accepts sequence data uploads in several file formats.

  • You can upload a FASTA file containing just the nucleotide sequences. This is the simplest option: take a valid FASTA-format nucleotide sequence file and upload it as is. However, there may be a limit on the file size.
  • You can compress the sequence file containing just the nucleotide sequences with gzip, a popular compression tool. This will significantly reduce the size of the file to upload, and hence speed things up.
  • You can also include a separate quality file alongside the sequence file. gzip on its own compresses each file separately rather than bundling them, so combine both files into a single compressed archive, for example with tar:
   tar -czf archive.tar.gz sequence.fa sequence.qual
  
   and then upload the archive.tar.gz file (don't worry, we'll take care of the name!)
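
For the single-file case, compressing a sequence file with gzip and checking the result before uploading might look like the sketch below. The file name sequence.fa and the two sample reads are placeholders invented for illustration, not anything the server requires; run it in a scratch directory.

```shell
#!/bin/sh
# Minimal sketch: compress one FASTA file with gzip and verify it.
# sequence.fa and its contents are hypothetical sample data.
set -e

# A tiny two-read FASTA file, so the sketch runs end to end.
printf '>read1\nACGTACGT\n>read2\nGGCCTTAA\n' > sequence.fa

# -c writes to stdout, so the original sequence.fa is left untouched.
gzip -c sequence.fa > sequence.fa.gz

# -t tests the integrity of the compressed file without extracting it.
gzip -t sequence.fa.gz && echo "sequence.fa.gz passed the integrity test"

# Round trip: the decompressed data must be identical to the original.
gzip -dc sequence.fa.gz | cmp - sequence.fa && echo "round trip matches"
```

Writing to a new file with -c keeps the uncompressed original around, which is convenient if the upload has to be repeated.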


If you include a quality file, we will renumber the sequences and their corresponding quality scores at the same time. At the moment we don't use the quality scores, although we are experimenting with assembly tools that may take advantage of them, so including quality scores is completely optional.
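
Since sequences and quality scores are renumbered together, it may be worth confirming before upload that both files hold the same number of records. The sketch below assumes the quality file uses FASTA-style ">" header lines followed by space-separated scores; that layout, the file names, and the sample data are assumptions made for illustration, and the server's own checks are not described here.

```shell
#!/bin/sh
# Minimal sketch: compare record counts in a FASTA file and a quality file.
# Assumes both files mark each record with a line starting with ">".

# Hypothetical sample data so the sketch runs as written.
printf '>read1\nACGTACGT\n>read2\nGGCCTTAA\n' > sequence.fa
printf '>read1\n30 30 28 27 30 31 29 30\n>read2\n25 26 30 30 28 27 30 29\n' > sequence.qual

# grep -c counts the lines that match, i.e. one per record header.
n_seq=$(grep -c '^>' sequence.fa)
n_qual=$(grep -c '^>' sequence.qual)

if [ "$n_seq" -eq "$n_qual" ]; then
    echo "OK: $n_seq records in both files"
else
    echo "Mismatch: $n_seq sequences but $n_qual quality records" >&2
    exit 1
fi
```

This only compares counts; it does not check that the records appear in the same order or carry the same identifiers.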