RAST Quality Report

From TheSeed
Revision as of 08:35, 15 February 2008 by Marland (talk | contribs)

The RAST Server offers a brief quality report in the Job Details for your genome. The purpose of this page is to explain what those statistics mean and how we compute them.

In these explanations, PEG refers to a protein-encoding gene and is equivalent to a CDS (Coding Sequence).

Summary

Number of Features: Total number of features (PEGs + RNAs)

Number of Warnings: Total number of non-fatal warning-conditions detected*.

Number of fatal problems: Total number of fatal error-conditions detected*.

*The difference between warnings and fatal problems is their impact on the RAST pipeline. While both are serious quality problems found by our automated quality control, only fatal problems require your intervention, e.g., running the automated correction methods provided by the RAST pipeline. Please note that you can choose to apply all available automatic corrections during the upload.

Below you will find detailed explanations of these warnings and errors.


Possibly missing genes: Crude but conservative estimate of the expected number of "undercalled" PEGs in the remaining gaps between features:

 Estimated number of potentially "missing" PEGs :=
 (number of base-pairs in gaps longer than 2 kbp) / (1 kbp/PEG)

Since the probability of a "random" gap longer than 2 kbp is less than 1 in 22000, such gaps are quite unlikely to arise by chance. The 2 kbp minimum gap threshold is therefore very conservative, so the estimated number of "missing" PEGs should also be conservative.
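As an illustration of the estimate above, here is a minimal Python sketch. It is not the RAST code: the function name, the 1-based inclusive (begin, end) coordinates, and the decision to count only gaps between consecutive features on a single contig (ignoring the contig ends) are all assumptions made for the example.

```python
def estimate_missing_pegs(features, min_gap_bp=2000, bp_per_peg=1000):
    """Estimate "missing" PEGs from the gaps between features on one contig.

    features: list of (begin, end) pairs, 1-based inclusive, either strand.
    Only gaps strictly longer than min_gap_bp contribute to the estimate.
    """
    # Normalise so that each span is (leftmost, rightmost) regardless of strand.
    spans = sorted((min(b, e), max(b, e)) for b, e in features)
    gap_bp = 0
    covered_to = None  # rightmost position covered by any feature seen so far
    for left, right in spans:
        if covered_to is not None:
            gap = left - covered_to - 1
            if gap > min_gap_bp:
                gap_bp += gap
        covered_to = right if covered_to is None else max(covered_to, right)
    return gap_bp / bp_per_peg


# One 3000 bp gap (counted) and one 499 bp gap (ignored).
example = [(1, 1000), (4001, 5000), (5500, 6000)]
print(estimate_missing_pegs(example))
```

With the example features, only the 3000 bp gap exceeds the 2 kbp threshold, so the estimate is 3000 / 1000 = 3 potentially missing PEGs.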

Gene problems

Genes with Bad Starts: Non-truncated PEGs with non-ATG/GTG/TTG STARTs. These will be shown as warnings since we currently do not offer automated correction methods for them.

Genes with Bad Stops: Non-truncated PEGs with non-TAA/TAG/TGA STOPs (or whatever is appropriate for a variant genetic code). Bad STOPs should be considered fatal problems, but have been downgraded to "Warnings" as they should never occur in RAST.

Too Short: Number of PEGs shorter than the (default) threshold of 90 bp. Such PEGs are usually considered "lint".
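The three gene checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and label names are invented here, the sequence is assumed to be the coding strand read 5' to 3' and to include the STOP codon, and only the standard STOP codons are checked (a variant genetic code would need a different set).

```python
VALID_STARTS = {"ATG", "GTG", "TTG"}
VALID_STOPS = {"TAA", "TAG", "TGA"}  # standard code only; variant codes differ


def gene_problems(cds, truncated=False, min_length_bp=90):
    """Return the list of problem labels for one PEG's DNA sequence."""
    cds = cds.upper()
    problems = []
    # START and STOP checks apply only to non-truncated PEGs.
    if not truncated:
        if cds[:3] not in VALID_STARTS:
            problems.append("bad_start")
        if cds[-3:] not in VALID_STOPS:
            problems.append("bad_stop")
    # PEGs shorter than the default 90 bp threshold are treated as "lint".
    if len(cds) < min_length_bp:
        problems.append("too_short")
    return problems


print(gene_problems("ATG" + "AAA" * 30 + "TAA"))  # clean 96 bp PEG
print(gene_problems("CTG" + "AAA" * 10 + "TAA"))  # bad START and too short
```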


Overlaps

We recognize the following classes of overlaps:

Embedded PEGs: Number of PEGs completely contained within another PEG. Considered a fatal error on the first pass through the corrector. The following procedure is applied to automatically correct this error: If neither PEG is in a FIGfam, the embedded PEG is eliminated. If one PEG is in a FIGfam, automated removal eliminates the PEG that is not in a FIGfam. If both PEGs are in FIGfams, the shorter PEG is removed if it is less than half the length of the longer PEG. If an embedded PEG cannot be removed because both it and the PEG it is embedded in are in FIGfams and the PEGs have comparable lengths, then this problem is downgraded to a "Warning" so that processing may still proceed.
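The removal rules for embedded PEGs can be written as a small decision function. The Python sketch below is a restatement of the rules in the paragraph above, not the corrector's actual code; the function name, argument names, and return values are assumptions chosen for illustration.

```python
def resolve_embedded(outer_len, inner_len, outer_in_figfam, inner_in_figfam):
    """Decide which PEG of an embedded pair to remove.

    Returns "inner" or "outer" for the PEG to remove, or None when neither
    can be removed and the problem is downgraded to a warning.
    """
    # Neither PEG is in a FIGfam: eliminate the embedded PEG.
    if not outer_in_figfam and not inner_in_figfam:
        return "inner"
    # Exactly one PEG is in a FIGfam: remove the one that is not.
    if outer_in_figfam != inner_in_figfam:
        return "inner" if outer_in_figfam else "outer"
    # Both are in FIGfams: remove the shorter (embedded) PEG only if it is
    # less than half the length of the longer one.
    if 2 * inner_len < outer_len:
        return "inner"
    return None  # comparable lengths: downgrade to a warning


print(resolve_embedded(900, 300, False, False))
print(resolve_embedded(900, 300, False, True))
print(resolve_embedded(900, 600, True, True))
```

The three calls cover the plain case (the embedded PEG is removed), the case where only the embedded PEG is in a FIGfam (the enclosing PEG is removed), and the case that cannot be corrected automatically.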

Bad RNA Overlaps: Number of PEGs that overlap an RNA by more than the (default) threshold of 20 bp. Such overlaps are considered a fatal problem and the offending PEGs are unconditionally removed when automated correction has been selected.
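A minimal Python sketch of the RNA overlap check follows. The data layout (a dict of PEG ids to coordinate pairs, a list of RNA coordinate pairs), the ids, and the function names are hypothetical; coordinates are assumed to be 1-based and inclusive on a single contig.

```python
def overlap_bp(a, b):
    """Number of shared base pairs between two (begin, end) spans."""
    a_lo, a_hi = sorted(a)
    b_lo, b_hi = sorted(b)
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


def pegs_with_bad_rna_overlap(pegs, rnas, max_overlap_bp=20):
    """Return ids of PEGs that overlap any RNA by more than the threshold."""
    return [
        peg_id
        for peg_id, span in pegs.items()
        if any(overlap_bp(span, rna) > max_overlap_bp for rna in rnas)
    ]


pegs = {"peg.1": (100, 500), "peg.2": (590, 900)}
rnas = [(480, 600)]
# peg.1 shares 21 bp with the RNA (flagged); peg.2 shares 11 bp (tolerated).
print(pegs_with_bad_rna_overlap(pegs, rnas))
```

When automated correction is selected, the PEGs returned by such a check would be the ones removed.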

Convergent overlaps: Number of pairs of opposite-strand PEGs oriented towards each other, such that the STOP of each PEG is inside the other PEG, the START is not inside the other PEG, and the overlap exceeds the (default) threshold of 50 bp. Such overlaps are considered "Warning" conditions, not fatal. Overlaps below the threshold are not reported.

  ------->
      <-------

Divergent overlaps: Number of pairs of opposite-strand PEGs oriented away from each other, such that the START of each PEG is inside the other PEG, the STOP is not inside the other PEG, and the overlap exceeds the (default) threshold of 150 bp. Such overlaps are considered "Warning" conditions, not fatal. Overlaps below the threshold are not reported.

  <-------
      ------->


Same-strand overlaps: Number of pairs of PEGs on the same strand, and therefore oriented in the same direction, that overlap by more than the (default) threshold of 120 bp. Such overlaps are considered "Warning" conditions, not fatal. (They are also a proxy for the number of frameshift errors.)

  ------->                <-------
      ------->                <-------
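The overlap classes above can be told apart from the coordinates alone. The Python sketch below is one way to do it, under assumptions that are not taken from the RAST code: each PEG is a (START, STOP) pair of 1-based positions on one contig, a reverse-strand PEG has START greater than STOP, and the function names and class labels are invented for the example.

```python
# Default reporting thresholds in bp, as stated in the text above.
THRESHOLDS = {"convergent": 50, "divergent": 150, "same_strand": 120}


def classify_overlap(peg_a, peg_b):
    """Return (kind, shared_bp) for two overlapping PEGs, or None."""
    a_lo, a_hi = sorted(peg_a)
    b_lo, b_hi = sorted(peg_b)
    shared = min(a_hi, b_hi) - max(a_lo, b_lo) + 1
    if shared <= 0:
        return None
    if (b_lo <= a_lo and a_hi <= b_hi) or (a_lo <= b_lo and b_hi <= a_hi):
        return ("embedded", shared)
    a_forward = peg_a[0] < peg_a[1]
    b_forward = peg_b[0] < peg_b[1]
    if a_forward == b_forward:
        return ("same_strand", shared)
    fwd, rev = (peg_a, peg_b) if a_forward else (peg_b, peg_a)
    # Forward PEG on the left: both STOPs fall inside the shared region.
    if fwd[0] < rev[1]:
        return ("convergent", shared)
    # Reverse PEG on the left: both STARTs fall inside the shared region.
    return ("divergent", shared)


def is_reported(kind, shared_bp):
    """Embedded PEGs are always reported; others only above threshold."""
    return kind == "embedded" or shared_bp > THRESHOLDS[kind]


pairs = [((100, 400), (600, 340)),   # convergent, 61 bp
         ((400, 100), (340, 600)),   # divergent, 61 bp
         ((100, 400), (250, 700))]   # same strand, 151 bp
for a, b in pairs:
    kind, shared = classify_overlap(a, b)
    print(kind, shared, is_reported(kind, shared))
```

In this example the 61 bp convergent overlap is reported because it exceeds 50 bp, while a divergent overlap of the same size stays below its 150 bp threshold and is not.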


In addition, there is a flag that should never be reported to you, but just in case:

Impossible Overlap: This serves as a code development flag. It is a "This Can't Happen!" condition that should never occur; if observed, it indicates that a severe logic error may exist within the overlap detection software.