https:/// /w/api.php?action=feedcontributions&user=DanielPaarmann&feedformat=atomTheSeed - User contributions [en]2024-03-29T12:31:56ZUser contributionsMediaWiki 1.35.6 /w/index.php?title=RAST_Download_Formats&diff=1795RAST Download Formats2008-02-15T17:55:20Z<p>DanielPaarmann: </p>
<hr />
<div>The RAST service provides the following download formats for your genome:<br />
<br />
GTF<br />
GFF3<br />
GenBank<br />
EMBL<br />
<br />
EC numbers stripped<br />
<br />
<br />
Genome Directory</div>DanielPaarmann /w/index.php?title=RAST_Quality_Report&diff=1794RAST Quality Report2008-02-15T17:04:41Z<p>DanielPaarmann: </p>
<hr />
<div>The RAST Server offers a brief quality report on the [[RAST_Tutorial#Job_Details|Job Details]] of your genome. The purpose of this page is to explain what those statistics mean and how we compute them.<br />
<br />
In these explanations, PEG stands for protein-encoding gene and is equivalent to a CDS (coding sequence).<br />
<br />
== Summary == <br />
<br />
<br />
'''Number of Features:''' Total number of features (PEGs + RNAs)<br />
<br />
'''Number of Warnings:''' Total number of non-fatal warning-conditions detected*.<br />
<br />
'''Number of fatal problems:''' Total number of fatal error-conditions detected*.<br />
<br />
<nowiki>*</nowiki>Warnings and fatal problems differ in their impact on the RAST pipeline. Both are serious quality problems found by our automated quality control, but only fatal problems require your intervention, e.g. running the automated correction methods provided by the RAST pipeline. Please note that you can choose to apply all available automatic corrections during the upload.<br />
<br />
Below you will find detailed explanations of these warnings and errors.<br />
<br />
<br />
'''Possibly missing genes:'''<br />
Crude but conservative estimate of the expected number of "undercalled" PEGs in the remaining gaps between features:<br />
<br />
Estimated number of potentially "missing" PEGs :=<br />
(number of base-pairs in gaps longer than 2 kbp) / (1 kbp/PEG)<br />
<br />
Since the probability of a "random" gap longer than 2 kbp is less than 1 in 22000, such gaps are unlikely to occur by chance. The 2 kbp minimum gap threshold is therefore very conservative, so the estimated number of "missing" PEGs should also be conservative.<br />
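
The estimate above can be sketched in a few lines of Python. This is only an illustration, not the RAST implementation: the function name, the 1-based inclusive coordinates, and the decision to count the unannotated contig ends as gaps are all assumptions made for the example.

```python
def estimate_missing_pegs(features, contig_length, min_gap=2000, bp_per_peg=1000):
    """Estimate "missing" PEGs on one contig.

    features: list of (start, end) pairs, 1-based inclusive, in any orientation.
    Only gaps longer than min_gap base pairs contribute to the estimate.
    """
    gap_bp = 0
    cursor = 1  # first position not yet covered by any feature
    for start, end in sorted((min(s, e), max(s, e)) for s, e in features):
        gap = start - cursor
        if gap > min_gap:
            gap_bp += gap
        cursor = max(cursor, end + 1)
    # Assumption: the stretch after the last feature is treated as a gap too.
    tail = contig_length - cursor + 1
    if tail > min_gap:
        gap_bp += tail
    return gap_bp / bp_per_peg


# One 3 kbp gap between two features gives an estimate of 3 missing PEGs.
print(estimate_missing_pegs([(1, 1000), (4001, 5000)], contig_length=5000))
```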
<br />
== Gene problems ==<br />
<br />
'''Genes with Bad Starts:''' <br />
Non-truncated PEGs with non-ATG/GTG/TTG STARTs. These will be shown as warnings since we currently do not offer automated correction methods for them.<br />
<br />
'''Genes with Bad Stops:''' <br />
Non-truncated PEGs with non-TAA/TAG/TGA STOPs (or whatever is appropriate for a variant genetic code). Bad STOPs should be considered fatal problems, but have been downgraded to "Warnings" as they should never occur in RAST.<br />
<br />
'''Too Short''': <br />
Number of PEGs shorter than the (default) threshold of 90 bp. Such PEGs are usually considered "lint".<br />
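
As a rough illustration of the three checks above, the following Python sketch tests a single PEG given as a DNA string in its coding orientation. The function and constant names are hypothetical, only the standard genetic code is handled, and the real pipeline may apply the checks differently.

```python
VALID_STARTS = {"ATG", "GTG", "TTG"}
VALID_STOPS = {"TAA", "TAG", "TGA"}  # standard code only; variant codes differ
MIN_PEG_LENGTH = 90  # default "too short" threshold in bp


def gene_problems(dna, truncated=False):
    """Return the list of problems found for one PEG sequence."""
    dna = dna.upper()
    problems = []
    # START and STOP codons are only checked for non-truncated PEGs.
    if not truncated:
        if dna[:3] not in VALID_STARTS:
            problems.append("bad start")
        if dna[-3:] not in VALID_STOPS:
            problems.append("bad stop")
    if len(dna) < MIN_PEG_LENGTH:
        problems.append("too short")
    return problems


print(gene_problems("CTGAAATAA"))
```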
<br />
<br />
== Overlaps ==<br />
<br />
We recognize the following classes of overlaps:<br />
<br />
'''Embedded PEGs:'''<br />
Number of PEGs completely contained within another PEG. Considered a fatal error on first pass through the corrector. The following procedure is applied to automatically correct this error: If neither PEG is in a FIGfam, the embedded PEG is eliminated. If one PEG is in a FIGfam, automated removal eliminates the PEG that is not in a FIGfam. If both PEGs are in FIGfams, the shorter PEG is removed if it is less than half the length of the longer PEG. If an embedded PEG cannot be removed because both it and the PEG it is embedded in are in FIGfams and the PEGs have comparable lengths, then this problem is downgraded to a "Warning" so that processing may still proceed.<br />
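
The decision rules for embedded PEGs can be written down as a small Python function. This is a sketch of the rules as stated above, not the corrector's actual code; the function name, its arguments, and the returned strings are invented for the example. Because an embedded PEG lies completely inside its container, it is taken to be the shorter of the two.

```python
def resolve_embedded(embedded_len, embedded_in_figfam,
                     container_len, container_in_figfam):
    """Decide what automated correction does with an embedded PEG pair."""
    if not embedded_in_figfam and not container_in_figfam:
        return "remove embedded"
    if embedded_in_figfam != container_in_figfam:
        # Exactly one PEG is in a FIGfam: drop the one that is not.
        return "remove embedded" if container_in_figfam else "remove container"
    # Both are in FIGfams: remove the embedded (shorter) PEG only if it is
    # less than half the length of the container.
    if 2 * embedded_len < container_len:
        return "remove embedded"
    return "warning"  # comparable lengths: downgraded, processing continues


print(resolve_embedded(900, True, 1200, True))
```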
<br />
'''Bad RNA Overlaps:'''<br />
Number of PEGs that overlap an RNA by more than the (default) threshold of 20 bp. Such overlaps are considered a fatal problem and the offending PEGs are unconditionally removed when automated correction has been selected.<br />
<br />
'''Convergent overlaps:'''<br />
Number of pairs of opposite-strand PEGs oriented towards each other, such that the STOP of each PEG is inside the other PEG, the START is not inside the other PEG, and the overlap exceeds the (default) threshold of 50 bp. Such overlaps are considered "Warning" conditions, not fatal. <br />
Overlaps shorter than the threshold are not reported.<br />
<br />
 -------><br />
     <-------<br />
<br />
'''Divergent overlaps:'''<br />
Number of pairs of opposite-strand PEGs oriented away from each other, such that the START of each PEG is inside the other PEG, the STOP is not inside the other PEG, and the overlap exceeds the (default) threshold of 150 bp. Such overlaps are considered "Warning" conditions, not fatal.<br />
Overlaps shorter than the threshold are not reported.<br />
<br />
 <-------<br />
     -------><br />
<br />
<br />
'''Same-strand overlaps:'''<br />
Number of pairs of same-strand PEGs, oriented in the same direction, that overlap by more than the (default) threshold of 120 bp. Such overlaps are considered "Warning" conditions, not fatal. <br />
(They are also a proxy for the number of frameshift errors.)<br />
<br />
 ------->         <-------<br />
     ------->         <-------<br />
<br />
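
The PEG-versus-PEG overlap classes described above can be illustrated with the following Python sketch. It is a simplified model under stated assumptions: each PEG is given by its leftmost and rightmost coordinate plus a strand, the names `Peg` and `classify_overlap` are hypothetical, and RNA overlaps are left out.

```python
from typing import NamedTuple


class Peg(NamedTuple):
    left: int    # leftmost coordinate on the contig
    right: int   # rightmost coordinate on the contig
    strand: str  # '+' or '-'


# Default reporting thresholds in bp, as given in the text.
THRESHOLDS = {"convergent": 50, "divergent": 150, "same-strand": 120}


def classify_overlap(a, b):
    """Return the overlap class of two PEGs, or None if nothing is reported."""
    overlap = min(a.right, b.right) - max(a.left, b.left) + 1
    if overlap <= 0:
        return None
    first, second = sorted((a, b))  # order by left, then right coordinate
    if second.right <= first.right or first.left == second.left:
        return "embedded"  # one PEG lies completely inside the other
    if first.strand == second.strand:
        kind = "same-strand"
    elif first.strand == "+":
        kind = "convergent"  # -----> followed by <-----
    else:
        kind = "divergent"   # <----- followed by ----->
    return kind if overlap > THRESHOLDS[kind] else None


print(classify_overlap(Peg(1, 1000, "+"), Peg(900, 2000, "-")))
```

In this model a 101 bp overlap is reported for a convergent pair but not for a divergent pair, which reflects the different default thresholds.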
<br />
<br />
In addition, there is a flag that should never be reported to you. But just in case...<br />
<br />
'''Impossible Overlap:'''<br />
This serves as a code development flag. It is a "This Can't Happen!" condition that should never occur; if observed, it indicates that a severe logic error may exist within the overlap detection software.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1725MG-RAST Numbers2007-10-03T16:13:55Z<p>DanielPaarmann: </p>
<hr />
<div>The MG-RAST-Server and SEED Viewer offer you a large number of statistics and detailed numbers about your metagenome. The purpose of this page is to explain how we calculate these numbers and what they mean.<br />
<br />
<br />
=== MG-RAST ===<br />
<br />
On the details page of your MG-RAST job, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this metagenome. Not all of these will produce results later on. It is quite likely that some sequences cannot be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths (bp) of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the ''Total sequence length'' divided by the ''Number of sequences''.<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length (bp) of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length (bp) of the shortest sequence submitted.<br />
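
The job statistics listed above are simple to reproduce. The Python sketch below assumes the submitted sequences are available as a non-empty dictionary from sequence id to sequence string; the function name and the keys of the returned dictionary are made up for the example and do not correspond to an MG-RAST interface.

```python
def sequence_stats(sequences):
    """Compute the job statistics for a non-empty dict of id -> sequence."""
    lengths = {seq_id: len(seq) for seq_id, seq in sequences.items()}
    total = sum(lengths.values())
    longest = max(lengths, key=lengths.get)
    shortest = min(lengths, key=lengths.get)
    return {
        "number_of_sequences": len(lengths),
        "total_sequence_length": total,
        "average_read_length": total / len(lengths),
        "longest_sequence_id": longest,
        "longest_sequence_length": lengths[longest],
        "shortest_sequence_id": shortest,
        "shortest_sequence_length": lengths[shortest],
    }


stats = sequence_stats({"r1": "ACGT", "r2": "ACGTACGT", "r3": "AC"})
print(stats["total_sequence_length"], stats["longest_sequence_id"])
```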
<br />
<br />
=== SeedViewer - Overview ===<br />
<br />
On the ''Metagenome Overview'' page, there are a number of statistical counts about the selected metagenome:<br />
<br />
* ''Size''<br />
: This is the number of basepairs (total length) of all of the sequences submitted for a given metagenome.<br />
<br />
: '''Known bug:''' Unfortunately there is currently a bug which shows a higher than actual sequence length. The MG-RAST Job Details page shows the correct sequence size. <br />
<br />
* ''Number of Fragments''<br />
: This is the number of submitted sequences which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems where one or more functional roles were found in the submitted fragments of the metagenome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
: '''Note:''' This number may be higher than the ''Number of Fragments'' if there are multiple matches on a single fragment.<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given as absolute counts and as percentages. The percentages should add up to 100% (allowing for rounding error).<br />
: * ''non-hypothetical''<br />
: This is the number of coding sequences annotated with a function that is not hypothetical. Functions treated as hypothetical include synonyms such as ''hypothetical protein'' or ''putative protein''.<br />
: * ''hypothetical''<br />
: This is the number of coding sequences that were annotated as hypothetical (or a synonym).<br />
<br />
: '''Known bug:''' In some cases coding sequences do not have any functional assignment, but are not counted as hypothetical proteins. As a result, the numbers of hypothetical and non-hypothetical coding sequences do not add up to the total number of fragments.<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences which are part of the corresponding group, subgroup, subsystem or role. <br />
: '''Note:''' Not every coding sequence is part of a subsystem and a single coding sequence may fulfill functional roles in more than one subsystem (and thus be counted multiple times).<br />
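
To make the ''Protein Encoding Genes'' split concrete, here is a minimal Python sketch of how hypothetical and non-hypothetical counts and percentages could be derived from a list of assigned functions. The synonym set contains only the two examples named above and is not the full list used by the server; the function name is hypothetical as well.

```python
# Illustrative only: the real synonym list is longer.
HYPOTHETICAL_SYNONYMS = {"hypothetical protein", "putative protein"}


def protein_encoding_gene_counts(functions):
    """Split assigned functions into hypothetical and non-hypothetical."""
    total = len(functions)
    hypothetical = sum(
        1 for f in functions if f.strip().lower() in HYPOTHETICAL_SYNONYMS
    )
    non_hypothetical = total - hypothetical

    def percent(n):
        return round(100.0 * n / total, 1) if total else 0.0

    return {
        "hypothetical": (hypothetical, percent(hypothetical)),
        "non-hypothetical": (non_hypothetical, percent(non_hypothetical)),
    }


print(protein_encoding_gene_counts(
    ["DNA polymerase I", "hypothetical protein", "Putative protein", "ATP synthase"]
))
```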
<br />
<br />
=== SeedViewer - Taxonomy ===<br />
<br />
The taxonomic classification is calculated in several independent ways. First, all sequences are compared to the following rDNA databases: (1) RDP, (2) the European Ribosomal Database project, and (3) Greengenes. The criterion for a sequence being similar is a BLASTN E-value < 1×10<sup>-5</sup> and at least 50 nt in the alignment.<br />
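
The similarity criterion for the rDNA comparison amounts to a two-part filter on each BLASTN hit. The Python sketch below assumes hits have already been parsed into tuples of (query id, subject id, E-value, alignment length); that tuple layout and the function names are assumptions for the example, not an MG-RAST format.

```python
def is_similar(evalue, alignment_length, max_evalue=1e-5, min_length=50):
    """Apply the criterion: E-value below 1e-5 and at least 50 nt aligned."""
    return evalue < max_evalue and alignment_length >= min_length


def filter_hits(hits):
    """Keep only hits that satisfy the similarity criterion.

    hits: iterable of (query_id, subject_id, evalue, alignment_length).
    """
    return [hit for hit in hits if is_similar(hit[2], hit[3])]


hits = [
    ("read1", "rdp_0001", 1e-30, 120),
    ("read2", "rdp_0002", 1e-3, 200),   # E-value too high
    ("read3", "rdp_0003", 1e-12, 40),   # alignment too short
]
print(filter_hits(hits))
```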
<br />
We also calculate the taxonomic profile of your sample from all the protein similarities computed to annotate the metagenome. The advantage of this approach is that we use a lot more data than is available for the 16S analysis, however, the disadvantage of this approach is that it is obviously limited to those genomes that are in our underlying SEED database.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1724MG-RAST Numbers2007-10-03T16:03:28Z<p>DanielPaarmann: /* SeedViewer - Overview */</p>
<hr />
<div>The MG-RAST-Server and SEED Viewer offer you a large number of statistics and detailed numbers about your metagenome. The purpose of this page is to explain how we calculate these numbers and what they mean.<br />
<br />
=== MG-RAST ===<br />
<br />
On the details page of your MG-Rast job, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this metagenome. Not all of these will produce results later on. It is possible and very probable that some sequences can not be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths (bp) of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the ''Total sequence length'' divided by the ''Number of sequences''<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length (bp) of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length of the shortest sequence submitted<br />
<br />
<br />
=== SeedViewer - Overview ===<br />
<br />
On the ''Metagenome Overview'' page, there are a number of statistical counts about the selected metagenome:<br />
<br />
* ''Size''<br />
: This is the number of basepairs (total length) of all of the sequences submitted for a given metagenome.<br />
<br />
: '''Known bug:''' Unfortunately there is currently a bug which shows a higher than actual sequence length. The MG-RAST Job Details page shows the correct sequence size. <br />
<br />
* ''Number of Fragments''<br />
: This is the number of submitted sequences which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems where one or more functional roles were found in the submitted fragments of the metagenome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
: '''Note:''' This number may be higher than the ''Number of Fragments'' if there are multiple matches on a single fragment.<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given in absolute and percent value. They should add up to 100% (given rounding error) <br />
: * ''non-hypothetical''<br />
: This is the number of coding sequences, which were annotated with a function which is not hypothetical. Values for hypothetical include a list of synonyms like ''hypothetical protein'' or ''putative protein''<br />
: * ''hypothetical''<br />
: This is the number of coding sequences which were assigned to be hypothetical (or a synonym)<br />
<br />
: '''Known bug:''' In some cases coding sequences do not have any functional assignment, but are not counted as hypothetical protein. That causes the number of hypothetical and non-hypothetical coding sequences not to add up to the total number of fragments.<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences which are part of the according group, subgroup, subsystem or role. <br />
: '''Note:''' Not every coding sequence is part of a subsystem and a single coding sequence may fulfill functional roles in more than one subsystem (and thus be counted multiple times).<br />
<br />
=== SeedViewer - Taxonomy ===<br />
<br />
The taxonomic classification is calculated in several independent ways. First, all sequences are compared to the different rDNA databases: (1) RDP, (2)the European Ribosomal Database project, and (3)Greengenes. The criteria for a sequence being similar is a BLASTN E value < 1x10-5 and at least 50nt in the alignment.<br />
<br />
We also calculate the taxonomic profile of your sample from all the protein similarities computed to annotate the metagenome. The advantage of this approach is that we use a lot more data than is available for the 16S analysis, however, the disadvantage of this approach is that it is obviously limited to those genomes that are in our underlying SEED database.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1723MG-RAST Numbers2007-10-03T16:02:30Z<p>DanielPaarmann: </p>
<hr />
<div>The MG-RAST-Server and SEED Viewer offer you a large number of statistics and detailed numbers about your metagenome. The purpose of this page is to explain how we calculate these numbers and what they mean.<br />
<br />
=== MG-RAST ===<br />
<br />
On the details page of your MG-Rast job, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this metagenome. Not all of these will produce results later on. It is possible and very probable that some sequences can not be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths (bp) of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the ''Total sequence length'' divided by the ''Number of sequences''<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length (bp) of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length of the shortest sequence submitted<br />
<br />
<br />
=== SeedViewer - Overview ===<br />
<br />
On the ''Metagenome Overview'' page, there are a number of statistical counts about the selected metagenome:<br />
<br />
* ''Size''<br />
: This is the number of basepairs (total length) of all of the sequences submitted for a given metagenome.<br />
<br />
: '''Known bug:''' Unfortunately there is currently a bug which shows a higher than actual sequence length. The MG-RAST Job Details page shows the correct sequence size. <br />
<br />
* ''Number of Fragments''<br />
: This is the number of submitted sequences which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems where one or more functional roles were found in the submitted fragments of the metagenome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
: '''Note:''' This number may be higher than the ''Number of Fragments'' if there are multiple matches on a single fragment.<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given in absolute and percent value. They should add up to 100% (given rounding error) <br />
: * ''non-hypothetical''<br />
: This is the number of coding sequences, which were annotated with a function which is not hypothetical. Values for hypothetical include a list of synonyms like ''hypothetical protein'' or ''putative protein''<br />
: * ''hypothetical''<br />
: This is the number of coding sequences which were assigned to be hypothetical (or a synonym)<br />
: '''Known bug:''' In some cases coding sequences do not have any functional assignment, but are not counted as hypothetical protein. That causes hypothetical and non-hypothetical not to add up to the total number of fragments.<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences which are part of the according group, subgroup, subsystem or role. <br />
: '''Note:''' Not every coding sequence is part of a subsystem and a single coding sequence may fulfill functional roles in more than one subsystem (and thus be counted multiple times).<br />
<br />
=== SeedViewer - Taxonomy ===<br />
<br />
The taxonomic classification is calculated in several independent ways. First, all sequences are compared to the different rDNA databases: (1) RDP, (2)the European Ribosomal Database project, and (3)Greengenes. The criteria for a sequence being similar is a BLASTN E value < 1x10-5 and at least 50nt in the alignment.<br />
<br />
We also calculate the taxonomic profile of your sample from all the protein similarities computed to annotate the metagenome. The advantage of this approach is that we use a lot more data than is available for the 16S analysis, however, the disadvantage of this approach is that it is obviously limited to those genomes that are in our underlying SEED database.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1722MG-RAST Numbers2007-10-03T15:55:53Z<p>DanielPaarmann: /* SeedViewer */</p>
<hr />
<div>The MG-RAST-Server and SEED Viewer offer you a large number of statistics and detailed numbers about your metagenome. The purpose of this page is to explain how we calculate these numbers and what they mean.<br />
<br />
=== MG-RAST ===<br />
<br />
On the details page of your MG-Rast job, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this metagenome. Not all of these will produce results later on. It is possible and very probable that some sequences can not be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths (bp) of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the ''Total sequence length'' divided by the ''Number of sequences''<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length (bp) of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length of the shortest sequence submitted<br />
<br />
<br />
=== SeedViewer - Overview ===<br />
<br />
On the ''Metagenome Overview'' page, there are a number of statistical counts about the selected metagenome:<br />
<br />
* ''Size''<br />
: This is the number of basepairs (total length) of all of the sequences submitted for a given metagenome.<br />
<br />
: '''Known bug:''' Unfortunately there is currently a bug which shows a higher than actual sequence length. The MG-RAST Job Details page shows the correct sequence size. <br />
<br />
* ''Number of Fragments''<br />
: This is the number of submitted sequences which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems where one or more functional roles were found in the submitted fragments of the metagenome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
: '''Note:''' This number may be higher than the ''Number of Fragments'' if there are multiple matches on a single fragment.<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given in absolute and percent value. They should add up to 100% (given rounding error) and their sum should be the equal to the number of coding sequences displayed on the left.<br />
: '''non-hypothetical'''<br />
: This is the number of coding sequences, which were annotated with a function which is not hypothetical. Values for hypothetical include a list of synonyms like ''hypothetical protein'' or ''putative protein''<br />
: '''hypothetical'''<br />
: This is the number of coding sequences which were assigned to be hypothetical (or a synonym)<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences which are part of the according group, subgroup, subsystem or role. <br />
: '''Note:''' Not every coding sequence is part of a subsystem and a single coding sequence may fulfill functional roles in more than one subsystem (and thus be counted multiple times).<br />
<br />
=== SeedViewer - Taxonomy ===<br />
<br />
The taxonomic classification is calculated in several independent ways. First, all sequences are compared to the different rDNA databases: (1) RDP, (2)the European Ribosomal Database project, and (3)Greengenes. The criteria for a sequence being similar is a BLASTN E value < 1x10-5 and at least 50nt in the alignment.<br />
<br />
We also calculate the taxonomic profile of your sample from all the protein similarities computed to annotate the metagenome. The advantage of this approach is that we use a lot more data than is available for the 16S analysis, however, the disadvantage of this approach is that it is obviously limited to those genomes that are in our underlying SEED database.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1721MG-RAST Numbers2007-10-03T15:54:18Z<p>DanielPaarmann: </p>
<hr />
<div>The MG-RAST-Server and SEED Viewer offer you a large number of statistics and detailed numbers about your metagenome. The purpose of this page is to explain how we calculate these numbers and what they mean.<br />
<br />
=== MG-RAST ===<br />
<br />
On the details page of your MG-Rast job, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this metagenome. Not all of these will produce results later on. It is possible and very probable that some sequences can not be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths (bp) of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the ''Total sequence length'' divided by the ''Number of sequences''<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length (bp) of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length of the shortest sequence submitted<br />
<br />
<br />
=== SeedViewer ===<br />
<br />
On the ''Metagenome Overview'' page, there are a number of statistical counts about the selected metagenome:<br />
<br />
* ''Size''<br />
: This is the number of basepairs (total length) of all of the sequences submitted for a given metagenome.<br />
<br />
: '''Known bug:''' Unfortunately there is currently a bug which shows a higher than actual sequence length. The MG-RAST Job Details page shows the correct sequence size. <br />
<br />
* ''Number of Fragments''<br />
: This is the number of submitted sequences which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems where one or more functional roles were found in the submitted fragments of the metagenome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
: '''Note:''' This number may be higher than the ''Number of Fragments'' if there are multiple matches on a single fragment.<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given as absolute counts and as percentages. The percentages should add up to 100% (allowing for rounding error), and the absolute counts should sum to the number of coding sequences displayed on the left.<br />
: '''non-hypothetical'''<br />
: This is the number of coding sequences that were annotated with a function that is not hypothetical. Functions counted as hypothetical include a list of synonyms such as ''hypothetical protein'' or ''putative protein''.<br />
: '''hypothetical'''<br />
: This is the number of coding sequences whose assigned function is hypothetical (or a synonym).<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences that are part of the corresponding group, subgroup, subsystem or role. Note that not every coding sequence is part of a subsystem and that a single CDS may be part of more than one subsystem.<br />
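
The ''Protein Encoding Genes'' breakdown described above can be sketched as follows. This is an illustration under stated assumptions, not the SeedViewer code: the function name <code>peg_breakdown</code> is hypothetical, and the synonym set contains only the two examples named on this page, whereas the real list is longer and not given here.

```python
# Assumption: only the two synonyms mentioned in the text; the actual
# list used by the SeedViewer is longer and not documented on this page.
HYPOTHETICAL_SYNONYMS = {"hypothetical protein", "putative protein"}


def peg_breakdown(functions):
    """Split annotated functions into hypothetical and non-hypothetical.

    Returns a dict mapping each class to (absolute count, percentage).
    """
    total = len(functions)
    hypothetical = sum(
        1 for f in functions if f.strip().lower() in HYPOTHETICAL_SYNONYMS
    )
    non_hypothetical = total - hypothetical

    def percent(count):
        # Guard against an empty input to avoid dividing by zero.
        return 100.0 * count / total if total else 0.0

    return {
        "non_hypothetical": (non_hypothetical, percent(non_hypothetical)),
        "hypothetical": (hypothetical, percent(hypothetical)),
    }


annotations = [
    "DNA polymerase I",
    "hypothetical protein",
    "Putative protein",
    "ATP synthase",
]
breakdown = peg_breakdown(annotations)
print(breakdown)
```

The two absolute counts always sum to the number of coding sequences, and the two percentages sum to 100% before rounding for display.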
<br />
=== SeedViewer - Taxonomy ===<br />
<br />
The taxonomic classification is calculated in several independent ways. First, all sequences are compared to three rDNA databases: (1) RDP, (2) the European Ribosomal Database project, and (3) Greengenes. A sequence is considered similar if it has a BLASTN E-value < 1 x 10<sup>-5</sup> and an alignment of at least 50 nt.<br />
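
The similarity criteria above amount to a simple filter on each BLASTN hit. The sketch below only restates those two thresholds; the function name <code>is_similar</code> and its parameter names are assumptions, and it does not parse real BLAST output.

```python
def is_similar(evalue, alignment_length, max_evalue=1e-5, min_length=50):
    """Return True if a BLASTN hit meets both criteria from the text:
    E-value strictly below 1e-5 and an alignment of at least 50 nt."""
    return evalue < max_evalue and alignment_length >= min_length


# (E-value, alignment length in nt) pairs; made-up values for illustration.
hits = [(1e-10, 120), (1e-5, 120), (1e-10, 49), (1e-6, 50)]
accepted = [hit for hit in hits if is_similar(*hit)]
print(accepted)
```

Note that the E-value comparison is strict while the length comparison is inclusive, following the wording "E-value < 1 x 10<sup>-5</sup>" and "at least 50 nt".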
<br />
We also calculate the taxonomic profile of your sample from all the protein similarities computed to annotate the metagenome. The advantage of this approach is that it uses much more data than is available for the 16S analysis. The disadvantage is that it is limited to those genomes that are in our underlying SEED database.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1720MG-RAST Numbers2007-10-03T15:17:23Z<p>DanielPaarmann: </p>
<hr />
<div>The MG-RAST-Server and SEED Viewer offer you a large number of statistics and detailed numbers about your metagenome. The purpose of this page is to explain how we calculate these numbers and what they mean.<br />
<br />
=== MG-RAST ===<br />
<br />
On the details page of your MG-RAST job, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this metagenome. Not all of them will produce results later on: it is very likely that some sequences cannot be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths (bp) of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the ''Total sequence length'' divided by the ''Number of sequences''.<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length (bp) of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length (bp) of the shortest sequence submitted.<br />
<br />
<br />
=== SeedViewer ===<br />
<br />
On the ''Metagenome Overview'' page, there are a number of statistical counts about the selected metagenome:<br />
<br />
* ''Size''<br />
: This is the number of basepairs (total length) of all of the sequences submitted for a given metagenome.<br />
<br />
: '''Known bug:''' There is currently a bug that causes this page to show a sequence length higher than the actual one. The MG-RAST Job Details page shows the correct sequence size.<br />
<br />
* ''Number of Fragments''<br />
: This is the number of submitted sequences which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems where one or more functional roles were found in the submitted fragments of the metagenome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
: '''Note:''' This number may be higher than the ''Number of Fragments'' if there are multiple matches on a single fragment.<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given as absolute counts and as percentages. The percentages should add up to 100% (allowing for rounding error), and the absolute counts should sum to the number of coding sequences displayed on the left.<br />
: '''non-hypothetical'''<br />
: This is the number of coding sequences that were annotated with a function that is not hypothetical. Functions counted as hypothetical include a list of synonyms such as ''hypothetical protein'' or ''putative protein''.<br />
: '''hypothetical'''<br />
: This is the number of coding sequences whose assigned function is hypothetical (or a synonym).<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences that are part of the corresponding group, subgroup, subsystem or role. Note that not every coding sequence is part of a subsystem and that a single CDS may be part of more than one subsystem.<br />
<br />
=== SeedViewer - Taxonomy ===<br />
<br />
The taxonomic classification is calculated in several independent ways. First, all sequences are compared to three rDNA databases: (1) RDP, (2) the European Ribosomal Database project, and (3) Greengenes. A sequence is considered similar if it has a BLASTN E-value < 1 x 10<sup>-5</sup> and an alignment of at least 50 nt.<br />
<br />
We also calculate the taxonomic profile of your sample from all the protein similarities computed to annotate the metagenome. The advantage of this approach is that it uses much more data than is available for the 16S analysis. The disadvantage is that it is limited to those genomes that are in our underlying SEED database.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1719MG-RAST Numbers2007-10-03T15:17:04Z<p>DanielPaarmann: </p>
<hr />
<div>The MG-RAST-Server and SEED Viewer offer you a large number of statistics and detailed numbers about your metagenome. The purpose of this page is to explain how we calculate these numbers and what they mean.<br />
<br />
=== MG-RAST ===<br />
<br />
On the details page of your MG-RAST job, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this metagenome. Not all of them will produce results later on: it is very likely that some sequences cannot be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths (bp) of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the ''Total sequence length'' divided by the ''Number of sequences''.<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length (bp) of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length (bp) of the shortest sequence submitted.<br />
<br />
<br />
=== SeedViewer ===<br />
<br />
On the ''Metagenome Overview'' page, there are a number of statistical counts about the selected metagenome:<br />
<br />
* ''Size''<br />
: This is the number of basepairs (total length) of all of the sequences submitted for a given metagenome.<br />
<br />
: '''Known bug:''' There is currently a bug that causes this page to show a sequence length higher than the actual one. The MG-RAST Job Details page shows the correct sequence size.<br />
<br />
* ''Number of Fragments''<br />
: This is the number of submitted sequences which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems where one or more functional roles were found in the submitted fragments of the metagenome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
: '''Note:''' This number may be higher than the ''Number of Fragments'' if there are multiple matches on a single fragment.<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given as absolute counts and as percentages. The percentages should add up to 100% (allowing for rounding error), and the absolute counts should sum to the number of coding sequences displayed on the left.<br />
: '''non-hypothetical'''<br />
: This is the number of coding sequences that were annotated with a function that is not hypothetical. Functions counted as hypothetical include a list of synonyms such as ''hypothetical protein'' or ''putative protein''.<br />
: '''hypothetical'''<br />
: This is the number of coding sequences whose assigned function is hypothetical (or a synonym).<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences that are part of the corresponding group, subgroup, subsystem or role. Note that not every coding sequence is part of a subsystem and that a single CDS may be part of more than one subsystem.<br />
<br />
=== SeedViewer - Taxonomy ===<br />
<br />
The taxonomic classification is calculated in several independent ways. First, all sequences are compared to three rDNA databases: (1) RDP, (2) the European Ribosomal Database project, and (3) Greengenes. A sequence is considered similar if it has a BLASTN E-value < 1 x 10<sup>-5</sup> and an alignment of at least 50 nt.<br />
<br />
We also calculate the taxonomic profile of your sample from all the protein similarities computed to annotate the metagenome. The advantage of this approach is that it uses much more data than is available for the 16S analysis. The disadvantage is that it is limited to those genomes that are in our underlying SEED database.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1717MG-RAST Numbers2007-10-02T17:53:36Z<p>DanielPaarmann: </p>
<hr />
<div>The MG-RAST-Server and SEED Viewer offer you a large number of statistics and detailed numbers about your organism. The purpose of this page is to explain how we calculate these numbers and what they mean.<br />
<br />
=== MG-RAST ===<br />
<br />
On the details page of your organism, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this genome. Not all of them will produce results later on: it is very likely that some sequences cannot be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the '''Total sequence length''' divided by the '''Number of sequences'''.<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length of the shortest sequence submitted.<br />
<br />
<br />
=== SeedViewer ===<br />
<br />
On the ''Metagenome Overview'' page, there are a number of statistical counts about the selected metagenome:<br />
<br />
* ''Size''<br />
: This is the total number of base pairs in the sequences of this genome.<br />
: '''Known bug:''' There is currently a bug that causes this page to show a sequence length higher than the actual one. The MG-RAST Job Details page shows the correct sequence size.<br />
<br />
* ''Number of Fragments''<br />
: This is the number of fragments which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems in which at least one member was found in the fragments of the genome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
: '''Note:''' This number may be higher than the ''Number of Fragments'' if ...<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given as absolute counts and as percentages. The percentages should add up to 100% (allowing for rounding error), and the absolute counts should sum to the number of coding sequences displayed on the left.<br />
: '''non-hypothetical'''<br />
: This is the number of coding sequences that were annotated with a function that is not hypothetical. Functions counted as hypothetical include a list of synonyms such as ''hypothetical protein'' or ''putative protein''.<br />
: '''hypothetical'''<br />
: This is the number of coding sequences whose assigned function is hypothetical (or a synonym).<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences that are part of the corresponding group, subgroup, subsystem or role. Note that not every coding sequence is part of a subsystem and that a single CDS may be part of more than one subsystem.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1716MG-RAST Numbers2007-10-02T16:59:08Z<p>DanielPaarmann: </p>
<hr />
<div>The MG-RAST-Server and SEED Viewer offer you a large number of statistics and detailed numbers about your organism. The purpose of this page is to explain how we calculate these numbers and what they mean.<br />
<br />
=== MG-RAST ===<br />
<br />
On the details page of your organism, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this genome. Not all of them will produce results later on: it is very likely that some sequences cannot be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the '''Total sequence length''' divided by the '''Number of sequences'''.<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length of the shortest sequence submitted.<br />
<br />
<br />
=== SeedViewer ===<br />
<br />
On the ''Metagenome Overview'' page, there are a number of statistical counts about the selected metagenome:<br />
<br />
* ''Size''<br />
: This is the total number of base pairs in the sequences of this genome.<br />
: '''Known bug:''' Unfortunately there is currently a bug which adds 1 to the length of each sequence.<br />
<br />
* ''Number of Fragments''<br />
: This is the number of fragments which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems in which at least one member was found in the fragments of the genome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given as absolute counts and as percentages. The percentages should add up to 100% (allowing for rounding error), and the absolute counts should sum to the number of coding sequences displayed on the left.<br />
: '''non-hypothetical'''<br />
: This is the number of coding sequences that were annotated with a function that is not hypothetical. Functions counted as hypothetical include a list of synonyms such as ''hypothetical protein'' or ''putative protein''.<br />
: '''hypothetical'''<br />
: This is the number of coding sequences whose assigned function is hypothetical (or a synonym).<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences that are part of the corresponding group, subgroup, subsystem or role. Note that not every coding sequence is part of a subsystem and that a single CDS may be part of more than one subsystem.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1715MG-RAST Numbers2007-10-02T16:58:34Z<p>DanielPaarmann: </p>
<hr />
<div>The MG-RAST-Server and SEED Viewer offer you a large number of statistics and detailed numbers about your organism. The purpose of this page is to explain how we calculate these numbers and what they mean.<br />
<br />
=== MG-RAST ===<br />
<br />
On the details page of your organism, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this genome. Not all of them will produce results later on: it is very likely that some sequences cannot be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the '''Total sequence length''' divided by the '''Number of sequences'''.<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length of the shortest sequence submitted.<br />
<br />
<br />
=== SeedViewer ===<br />
<br />
On the ''Metagenome Overview'' page, there are a number of statistical counts about the selected metagenome:<br />
<br />
* ''Size''<br />
: This is the total number of base pairs in the sequences of this genome.<br />
: '''Known bug:''' Unfortunately there is currently a bug which adds 1 to the length of each sequence.<br />
<br />
* ''Number of Fragments''<br />
: This is the number of fragments which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems in which at least one member was found in the fragments of the genome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given as absolute counts and as percentages. The percentages should add up to 100% (allowing for rounding error), and the absolute counts should sum to the number of coding sequences displayed on the left.<br />
: '''non-hypothetical'''<br />
: This is the number of coding sequences that were annotated with a function that is not hypothetical. Functions counted as hypothetical include a list of synonyms such as ''hypothetical protein'' or ''putative protein''.<br />
: '''hypothetical'''<br />
: This is the number of coding sequences whose assigned function is hypothetical (or a synonym).<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences that are part of the corresponding group, subgroup, subsystem or role. Note that not every coding sequence is part of a subsystem and that a single CDS may be part of more than one subsystem.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1714MG-RAST Numbers2007-10-02T16:55:07Z<p>DanielPaarmann: </p>
<hr />
<div>The MG-RAST-Server and SEED Viewer offer you a large number of statistics and detailed numbers about your organism. The purpose of this page is to explain how we calculate these numbers and what they mean.<br />
<br />
=== MG-RAST ===<br />
<br />
On the details page of your organism, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this genome. Not all of them will produce results later on: it is very likely that some sequences cannot be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the '''Total sequence length''' divided by the '''Number of sequences'''.<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length of the shortest sequence submitted.<br />
<br />
<br />
=== SeedViewer ===<br />
<br />
On the ''Organism Overview'' page, there are a number of statistical counts about the selected genome:<br />
<br />
* ''Size''<br />
: This is the total number of base pairs in the sequences of this genome.<br />
<br />
* ''Number of Fragments''<br />
: This is the number of fragments which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems in which at least one member was found in the fragments of the genome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given as absolute counts and as percentages. The percentages should add up to 100% (allowing for rounding error), and the absolute counts should sum to the number of coding sequences displayed on the left.<br />
: '''non-hypothetical'''<br />
: This is the number of coding sequences that were annotated with a function that is not hypothetical. Functions counted as hypothetical include a list of synonyms such as ''hypothetical protein'' or ''putative protein''.<br />
: '''hypothetical'''<br />
: This is the number of coding sequences whose assigned function is hypothetical (or a synonym).<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences that are part of the corresponding group, subgroup, subsystem or role. Note that not every coding sequence is part of a subsystem and that a single CDS may be part of more than one subsystem.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1713MG-RAST Numbers2007-10-02T16:54:20Z<p>DanielPaarmann: </p>
<hr />
<div>=== MG-RAST ===<br />
<br />
On the details page of your organism, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this genome. Not all of them will produce results later on: it is very likely that some sequences cannot be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the '''Total sequence length''' divided by the '''Number of sequences'''.<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length of the shortest sequence submitted.<br />
<br />
<br />
=== SeedViewer ===<br />
<br />
On the ''Organism Overview'' page, there are a number of statistical counts about the selected genome:<br />
<br />
* ''Size''<br />
: This is the total number of base pairs in the sequences of this genome.<br />
<br />
* ''Number of Fragments''<br />
: This is the number of fragments which included at least one coding sequence that could be matched to our database.<br />
<br />
* ''Number of Subsystems''<br />
: The number of different subsystems in which at least one member was found in the fragments of the genome.<br />
<br />
* ''Number of Coding Sequences''<br />
: The number of protein encoding genes found in the submitted fragments that matched against our database.<br />
<br />
* ''Number of RNAs''<br />
: The number of RNAs found in the submitted fragments that matched against our database.<br />
<br />
* ''Protein Encoding Genes''<br />
: The numbers are given as absolute counts and as percentages. The percentages should add up to 100% (allowing for rounding error), and the absolute counts should sum to the number of coding sequences displayed on the left.<br />
: '''non-hypothetical'''<br />
: This is the number of coding sequences that were annotated with a function that is not hypothetical. Functions counted as hypothetical include a list of synonyms such as ''hypothetical protein'' or ''putative protein''.<br />
: '''hypothetical'''<br />
: This is the number of coding sequences whose assigned function is hypothetical (or a synonym).<br />
<br />
* ''Subsystem Counts''<br />
: The numbers in the tree of the subsystem hierarchy represent the number of coding sequences that are part of the corresponding group, subgroup, subsystem or role. Note that not every coding sequence is part of a subsystem and that a single CDS may be part of more than one subsystem.</div>DanielPaarmann /w/index.php?title=MG-RAST_Numbers&diff=1712MG-RAST Numbers2007-10-02T16:53:53Z<p>DanielPaarmann: </p>
<hr />
<div>=== MG-RAST ===<br />
<br />
On the details page of your organism, you will find the following numbers:<br />
<br />
* ''Number of sequences''<br />
: This is the total number of sequences submitted by the user for this genome. Not all of them will produce results later on: it is very likely that some sequences cannot be matched to anything in our database.<br />
<br />
* ''Total sequence length''<br />
: This is the sum of the lengths of all submitted sequences.<br />
<br />
* ''Average read length''<br />
: This is the '''Total sequence length''' divided by the '''Number of sequences'''.<br />
<br />
* ''Longest sequence id''<br />
: This is the identifier string of the longest sequence submitted.<br />
<br />
* ''Longest sequence length''<br />
: This is the length of the longest sequence submitted.<br />
<br />
* ''Shortest sequence id''<br />
: This is the identifier string of the shortest sequence submitted.<br />
<br />
* ''Shortest sequence length''<br />
: This is the length of the shortest sequence submitted.</div>DanielPaarmann /w/index.php?title=SpecialPurposeDBs&diff=1491SpecialPurposeDBs2006-10-13T21:49:31Z<p>DanielPaarmann: </p>
<hr />
<div>== HOPS Database ==<br />
('''H'''ypotheses and '''O'''pen '''P'''roblems revealed by '''S'''ubsystems)<br />
<br />
Sequencing and analysis of hundreds, soon to be thousands, of genomes reveals multiple gaps in our knowledge of basic biochemical and cellular processes. Accurate mapping of the revealed open problems within a framework of specific subsystems and groups of organisms sets the stage for generating hypotheses amenable to experimental validation. In a growing number of cases, predictions of novel genes and pathways delivered by comparative genomics techniques (e.g. analysis of gene clustering on prokaryotic chromosomes) are successfully verified. <br />
<br />
[http://www.theseed.org/HOPSS/HOPSS.cgi HOPS Database]<br />
<br />
<br />
<br />
== EGGS database: Essential Genes on Genome Scale ==<br />
<br />
SEED maintains an up-to-date database of all microbial gene essentiality data experimentally obtained in the currently published genome-scale gene essentiality screens (listed in Table 1). Comparative analysis of these data across multiple organisms in the rich genomic, biochemical, and phylogenetic contexts provided by the collection of annotated Subsystems greatly facilitates their interpretation and practical applications, such as understanding of cellular networks, gene and pathway discovery, identification of novel drug targets, and strain engineering.<br />
<br />
[http://theseed.uchicago.edu/FIG/eggs.cgi EGGS Database]</div>DanielPaarmann /w/index.php?title=MediaWiki:Sidebar&diff=1490MediaWiki:Sidebar2006-10-13T21:42:34Z<p>DanielPaarmann: </p>
<hr />
<div>* navigation<br />
** Home_of_the_SEED|Home of the SEED<br />
** Annotating_1000_genomes|Manifesto<br />
** SEED_People| SEED People<br />
** Contact| Contact<br />
* SEEDs<br />
** http://seed-viewer.theseed.org/FIG/index.cgi|SEED-Viewer<br />
** http://theseed.uchicago.edu/FIG/index.cgi|Trial-SEED<br />
** http://seed.sdsu.edu/FIG/index.cgi|Metagenomics SEED<br />
* Help and other Materials<br />
** DownloadPage|Download Page<br />
** Glossary|Glossary<br />
** SOPs|SOPs<br />
** SpecialPurposeDBs|Special Purpose DBs</div>DanielPaarmann /w/index.php?title=Home_of_the_SEED&diff=1489Home of the SEED2006-10-13T19:49:58Z<p>DanielPaarmann: </p>
<hr />
<div>With the growing number of available genomes, the need for an environment to support effective comparative analysis increases. The original SEED Project was started by the [http://thefig.info Fellowship for Interpretation of Genomes (FIG)] as an open source effort. Argonne National Lab and the University of Chicago joined the project, and now much of the activity occurs at those two institutions (as well as the University of Illinois at Urbana-Champaign, Hope College, San Diego State University, the Burnham Institute and a number of other institutions). The cooperative effort focuses on the development of the annotation environment called the SEED and, more importantly, on the development of curated genomic data. <br />
<br />
We provide a public [http://seed-viewer.theseed.org/FIG/index.cgi SEED-Viewer] that allows read-only access to the latest data and annotations. For users interested in editing and learning how to use the system, we also provide a [http://theseed.uchicago.edu/FIG/index.cgi Trial-SEED]. As described in our [[Annotating_1000_genomes|manifesto]] the [[Glossary#annotation|annotation]] is not performed on a gene by gene basis per genome, but rather by [[Glossary#subsystem|subsystem]] by an expert curator across many genomes at a time. <br />
<br />
<br />
We make all our software and data available for download and use on our [[DownloadPage]] page.<br />
<br />
<br />
* When using the SEED, please cite: Overbeek et al., [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=16214803&query_hl=2&itool=pubmed_docsum Nucleic Acids Res 33(17)], 2005 ([http://www.theseed.org/SubsystemPaperSupplementalMaterial/index.html Supplementary material])<br />
* Our approaches to annotation, gene calling, etc. are outlined in a series of [[SOPs|Standard Operating Procedures]].</div>DanielPaarmann /w/index.php?title=Home_of_the_SEED&diff=1488Home of the SEED2006-10-13T19:43:31Z<p>DanielPaarmann: </p>
<hr />
<div>With the growing number of available genomes, the need for an environment to support effective comparative analysis increases. The original SEED Project was started at FIG as an open source effort. Argonne National Lab and the University of Chicago joined the project, and now much of the activity occurs at those two institutions (as well as the University of Illinois at Urbana-Champaign, Hope College, San Diego State University, the Burnham Institute and a number of other institutions). The cooperative effort focuses on the development of the annotation environment called the SEED and, more importantly, on the development of curated genomic data. <br />
<br />
We provide a public [http://seed-viewer.theseed.org/FIG/index.cgi SEED-Viewer] that allows read-only access to the latest data and annotations. For users interested in editing and learning how to use the system, we also provide a [http://theseed.uchicago.edu/FIG/index.cgi Trial-SEED]. As described in our [[Annotating_1000_genomes|manifesto]] the [[Glossary#annotation|annotation]] is not performed on a gene by gene basis per genome, but rather by [[Glossary#subsystem|subsystem]] by an expert curator across many genomes at a time. <br />
<br />
<br />
We make all our software and data available for download and use on our [[DownloadPage]] page.<br />
<br />
<br />
* When using the SEED, please cite: Overbeek et al., [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=16214803&query_hl=2&itool=pubmed_docsum Nucleic Acids Res 33(17)], 2005 ([http://www.theseed.org/SubsystemPaperSupplementalMaterial/index.html Supplementary material])<br />
* Our approaches to annotation, gene calling, etc. are outlined in a series of [[SOPs|Standard Operating Procedures]].</div>DanielPaarmann