Difference between revisions of "SEED Viewer Numbers"

Latest revision as of 12:48, 15 August 2007

The SeedViewer, RAST-Server and MG-RAST-Server offer you a large number of statistics and detailed numbers about your organism. The purpose of this page is to explain how we calculate these numbers and what they mean.

MG-RAST

On the details page of your organism, you will find the following numbers:

Number of sequences

This is the total number of sequences submitted by the user for this genome. Not all of them will produce results later on; it is possible, and indeed very likely, that some sequences cannot be matched to anything in our database.

Total sequence length

This is the sum of the lengths of all submitted sequences.

Average read length

This is the Total sequence length divided by the Number of sequences.
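As an illustration, the three numbers above can be computed from a list of submitted reads as follows. This is a minimal sketch, assuming the reads are available as (identifier, sequence) pairs; the function name `read_summary` is hypothetical, and the page does not say whether MG-RAST rounds the average.

```python
# Minimal sketch: Number of sequences, Total sequence length and
# Average read length, computed from hypothetical (id, sequence) pairs.
def read_summary(reads):
    """Return the count, total length and average length of the reads."""
    count = len(reads)
    total_length = sum(len(seq) for _, seq in reads)
    # Average read length = Total sequence length / Number of sequences.
    average = total_length / count if count else 0.0
    return {
        "number_of_sequences": count,
        "total_sequence_length": total_length,
        "average_read_length": average,
    }


reads = [("r1", "ACGTACGT"), ("r2", "ACGT"), ("r3", "ACGTAC")]
print(read_summary(reads))
# {'number_of_sequences': 3, 'total_sequence_length': 18, 'average_read_length': 6.0}
```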

Longest sequence id

This is the identifier string of the longest sequence submitted.

Longest sequence length

This is the length of the longest sequence submitted.

Shortest sequence id

This is the identifier string of the shortest sequence submitted.

Shortest sequence length

This is the length of the shortest sequence submitted.
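The longest and shortest sequence entries can be sketched the same way, again assuming (identifier, sequence) pairs. The page does not say how ties are resolved; this sketch simply keeps the first read encountered, which is how Python's `max` and `min` behave. The function name `extreme_reads` is hypothetical.

```python
# Minimal sketch: id and length of the longest and shortest reads.
def extreme_reads(reads):
    """Return ((longest_id, length), (shortest_id, length)).

    Ties go to the first read encountered (an assumption). An empty
    list raises ValueError, as max() and min() do.
    """
    longest = max(reads, key=lambda r: len(r[1]))
    shortest = min(reads, key=lambda r: len(r[1]))
    return (longest[0], len(longest[1])), (shortest[0], len(shortest[1]))


reads = [("r1", "ACGTACGT"), ("r2", "ACGT"), ("r3", "ACGTAC")]
print(extreme_reads(reads))  # (('r1', 8), ('r2', 4))
```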

RAST

On the details page of your organism, you will find the following numbers:

Number of features

This is the number of features the RAST-Server could identify in your uploaded genome and match to our database.

Number of warnings

This is the number of warnings issued by the pipeline while processing your genome. Warnings refer to detected inconsistencies that are not fatal to the annotation but should be investigated further. The number of warnings of each type is listed below.

Number of fatal problems

This is the number of problems that prevent the pipeline from processing your genome. These problems must be addressed before the annotation process can finish.

Possibly missing genes

Convergent overlaps

Divergent overlaps

Same strand overlaps
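The three overlap categories are not defined on this page. As an illustration only, the sketch below classifies a pair of genes using the usual genomics convention: convergent when the 3' ends overlap (tail-to-tail), divergent when the 5' ends overlap (head-to-head), and same strand when both genes lie on the same strand. Whether RAST applies exactly these criteria, or a minimum overlap length, is an assumption; the `Gene` type and `classify_overlap` function are hypothetical.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    # Hypothetical representation: 1-based inclusive coordinates, left <= right.
    left: int
    right: int
    strand: str  # "+" or "-"


def classify_overlap(a: Gene, b: Gene) -> Optional[str]:
    """Classify the overlap between two genes on the same fragment.

    Returns None if they share no base pairs. Genes nested inside one
    another are not treated specially in this sketch.
    """
    first, second = sorted((a, b), key=lambda g: (g.left, g.right))
    if second.left > first.right:
        return None  # no overlap
    if first.strand == second.strand:
        return "same strand"
    # Opposite strands: a "+" gene has its 3' end on the right,
    # a "-" gene has its 3' end on the left.
    if first.strand == "+":
        return "convergent"  # -> <- : the two 3' ends meet
    return "divergent"  # <- -> : the two 5' ends meet


print(classify_overlap(Gene(100, 400, "+"), Gene(380, 700, "-")))  # convergent
print(classify_overlap(Gene(100, 400, "-"), Gene(380, 700, "+")))  # divergent
print(classify_overlap(Gene(100, 400, "+"), Gene(390, 700, "+")))  # same strand
```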

SeedViewer

On the Organism Overview page, there are a number of statistical counts about the selected genome:

Size

This is the total number of base pairs of sequence in this genome.

Number of Fragments

This is the number of fragments that contain at least one coding sequence that could be matched to our database.

Number of Subsystems

The number of different subsystems in which at least one member was found in the fragments of the genome.

Number of Coding Sequences

The number of protein-encoding genes found in the submitted fragments that matched against our database.

Number of RNAs

The number of RNAs found in the submitted fragments that matched against our database.
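The overview counts above can be sketched from a table of features. This is a minimal sketch under stated assumptions: the `Feature` record, its field names, the subsystem names and the `overview_counts` function are all hypothetical, Size is taken to be the summed length of all submitted fragments, and only matched features contribute to the other counts.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    # Hypothetical feature record; field names are illustrative only.
    contig: str               # fragment the feature lies on
    kind: str                 # "CDS" or "RNA"
    matched: bool             # matched against the database
    subsystems: tuple = ()    # subsystems this feature is a member of


def overview_counts(contig_lengths, features):
    """Compute Size, Fragments, Subsystems, Coding Sequences and RNAs."""
    matched = [f for f in features if f.matched]
    return {
        "size": sum(contig_lengths.values()),
        # Fragments with at least one matched coding sequence.
        "fragments": len({f.contig for f in matched if f.kind == "CDS"}),
        # Distinct subsystems with at least one member found.
        "subsystems": len({s for f in matched for s in f.subsystems}),
        "coding_sequences": sum(1 for f in matched if f.kind == "CDS"),
        "rnas": sum(1 for f in matched if f.kind == "RNA"),
    }


contig_lengths = {"c1": 5000, "c2": 3000, "c3": 1200}
features = [
    Feature("c1", "CDS", True, ("Glycolysis",)),
    Feature("c1", "CDS", True, ("Glycolysis", "TCA cycle")),
    Feature("c2", "RNA", True),
    Feature("c2", "CDS", False),
    Feature("c3", "CDS", True),
]
print(overview_counts(contig_lengths, features))
# {'size': 9200, 'fragments': 2, 'subsystems': 2, 'coding_sequences': 3, 'rnas': 1}
```

Note that fragment c2 is not counted as a fragment here: it carries a matched RNA but no matched coding sequence.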

Protein Encoding Genes

The numbers are given both as absolute values and as percentages. The percentages should add up to 100% (allowing for rounding error), and the absolute values should sum to the number of coding sequences displayed on the left.

non-hypothetical

This is the number of coding sequences that were annotated with a function that is not hypothetical. Functions counted as hypothetical include a list of synonyms such as "hypothetical protein" or "putative protein".

hypothetical

This is the number of coding sequences that were assigned a hypothetical function (or a synonym).
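The split into non-hypothetical and hypothetical coding sequences can be sketched as below. This is a minimal sketch, assuming an exact, case-insensitive match against a synonym list; only the two synonyms named on this page are included, since the full list used by the SeedViewer is not given here. The function names are hypothetical, and the percentages are rounded to one decimal place, which is why they may not sum to exactly 100.

```python
# Only the synonyms named on this page; the real list is longer.
HYPOTHETICAL_SYNONYMS = {"hypothetical protein", "putative protein"}


def is_hypothetical(function: str) -> bool:
    """Exact, case-insensitive match against the synonym list (an assumption)."""
    return function.strip().lower() in HYPOTHETICAL_SYNONYMS


def protein_encoding_breakdown(functions):
    """Return (count, percent) for non-hypothetical and hypothetical CDSs."""
    total = len(functions)
    hypothetical = sum(1 for f in functions if is_hypothetical(f))
    non_hypothetical = total - hypothetical  # the two counts sum to the total

    def pct(n):
        return round(100 * n / total, 1) if total else 0.0

    return {
        "non-hypothetical": (non_hypothetical, pct(non_hypothetical)),
        "hypothetical": (hypothetical, pct(hypothetical)),
    }


functions = ["hypothetical protein", "Enolase (EC 4.2.1.11)", "Putative protein"]
print(protein_encoding_breakdown(functions))
# {'non-hypothetical': (1, 33.3), 'hypothetical': (2, 66.7)}
```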

Subsystem Counts

The numbers in the tree of the subsystem hierarchy represent the number of coding sequences that are part of the corresponding group, subgroup, subsystem or role. Note that not every coding sequence is part of a subsystem, and that a single CDS may be part of more than one subsystem.
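Because one CDS can belong to several subsystems, the counts of the children in the tree can add up to more than the count of their parent. The sketch below illustrates this, assuming each node counts distinct coding sequences below it; the page does not state whether a CDS in two subsystems of the same group is counted once or twice at the group level. The function name, CDS identifiers and category names are hypothetical.

```python
from collections import defaultdict


def subsystem_tree_counts(assignments):
    """Count distinct CDSs at every node of the subsystem hierarchy.

    assignments: iterable of (cds_id, group, subgroup, subsystem, role).
    Returns a dict mapping each node path (a tuple) to its CDS count.
    """
    members = defaultdict(set)
    for cds, *path in assignments:
        # Register the CDS at every level: group, subgroup, subsystem, role.
        for depth in range(1, len(path) + 1):
            members[tuple(path[:depth])].add(cds)
    return {node: len(cds_ids) for node, cds_ids in members.items()}


assignments = [
    ("cds1", "Carbohydrates", "Central carbohydrate metabolism", "Glycolysis", "Enolase"),
    ("cds2", "Carbohydrates", "Central carbohydrate metabolism", "Glycolysis", "Pyruvate kinase"),
    ("cds2", "Carbohydrates", "Central carbohydrate metabolism", "Pyruvate metabolism", "Pyruvate kinase"),
]
counts = subsystem_tree_counts(assignments)
print(counts[("Carbohydrates",)])  # 2
```

Here the two subsystems have counts of 2 and 1, which sum to 3, while the group above them contains only 2 distinct coding sequences.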

