MG RAST v2.0 tutorial

From TheSeed
Revision as of 16:03, 8 September 2008 by Marland

MG-RAST Tutorial

Start Page

The start page of MG-RAST provides users with access to registration, data submission, and management tools for uploaded data. Access to public genomes is also available once you log in. Only after you have logged in and selected a metagenome can you access your jobs, view your results, and use the comparative tools.


Registration

Registering for the first time? → Choose New Account. Please enter your first and last name as well as your email address into the registration form. Then please select your country and choose a login name. It's recommended to use only letters and digits for your login name, without spaces. You will then shortly receive an email with more information about the registration.

Already have an account for one of our other services? → Choose Existing Account. Please enter the login and email address of that account. If your group administrator has given you a group name, please enter it in the group name field; otherwise, leave this field blank.


Upload a Genome/Creating a Job

Uploading your metagenome has a few steps. The first is uploading your fasta file. File requirements and suggestions:

• The fasta file name must end in .fa, .fasta, .fas, .fsa, or .fna.

• If your file is larger than 30 MB, compress it (.tgz) or contact us for other options.

• Quality files (.qual) may also be included along with the sequence file for submission to MG-RAST. The quality file and the sequence file should be combined into a single archive (that ends in .tgz) and then uploaded to the server.

 How do I create a .tgz file?
 Example: create the file metagenome.tgz from two fna files.
 tar -cvzf metagenome.tgz seqfile1.fna seqfile2.fna
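The same check and packaging can also be scripted. The following is a minimal Python sketch that uses only the standard library; the function names `has_valid_fasta_name` and `make_tgz` are illustrative and are not part of MG-RAST. It assumes the extension check is case-sensitive, which the requirements above do not state.

```python
import os
import tarfile

# Extensions listed in the file requirements above.
FASTA_EXTENSIONS = (".fa", ".fasta", ".fas", ".fsa", ".fna")


def has_valid_fasta_name(path):
    """Return True if the file name ends in one of the accepted extensions.

    Assumption: the check is case-sensitive.
    """
    return path.endswith(FASTA_EXTENSIONS)


def make_tgz(archive_path, file_paths):
    """Bundle the given files into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for file_path in file_paths:
            # Store each file under its base name, without directories.
            archive.add(file_path, arcname=os.path.basename(file_path))
    return archive_path
```

For example, `make_tgz("metagenome.tgz", ["seqfile1.fna", "seqfile2.fna"])` would produce an archive equivalent to the one created by the tar command above.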

The second step requires that you provide a project name, a name for your metagenome and a brief description of the sample. The second tab shows an Upload Summary of the number of files uploaded for submission.

The third and last step asks users to supply information about the metagenome sample. These description fields were adopted from the MIGS (Minimum Information about a Genome Sequence) specification. You can also elect to make your metagenome publicly available; this option remains available if you wish to do so at a later date. During this step you also have the option of removing duplicate sequences from the analysis.
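This tutorial does not say how MG-RAST decides that two sequences are duplicates. As an illustration only, the sketch below assumes the simplest definition: two reads are duplicates when their sequences are exactly identical (ignoring case), and the first occurrence is kept. The function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_exact_duplicates(records):
    """Keep the first record for each distinct sequence.

    Assumption: "duplicate" means an exactly identical sequence,
    compared case-insensitively. MG-RAST may use a different rule.
    """
    seen = set()
    unique = []
    for header, sequence in records:
        key = sequence.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, sequence))
    return unique
```

Running such a check locally before upload can give a rough idea of how many reads the duplicate-removal option might affect.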

Jobs Overview

The overall status of your metagenome analyses can be viewed from the main portion of the Jobs Overview page. The table lists your personal jobs and all public jobs. For each job/metagenome it shows the job number, the name of the user who started the job, the metagenome name, and the annotation progress.

The table of jobs can be sorted by Job ID or searched using the text boxes in the header row. This is especially useful when you have numerous metagenomes to select from.

Clicking on the bars for a given job in the annotation progress column directs you to the “Job Details” page, where the detailed job status and access to the metagenome analysis can be found.

Job Details

Here you are able to:

• Share with selected users by providing their email addresses.

• Make the metagenome publicly accessible.

• View detailed information on the processing of your job.

• View your results!!

• Download your results!!


Overview

The overview page has several sections, the first being the overall statistics of your sample. How these numbers were calculated can be found here (http://www.theseed.org/wiki/MG-RAST_Numbers). The second section is a summary table of taxonomic distribution based on best protein similarity to SEED and 16S based similarity to RDP. The third section is the statistical summary in paragraph form along with graphical representations of sequence length and GC distributions. The last section outlines the metagenome description and MIGS data you submitted along with your sequence file.
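The exact definitions used by MG-RAST are given on the MG-RAST_Numbers page linked above. As a rough illustration of what sequence length and GC statistics involve, here is a generic Python sketch; the function names and the choice to ignore ambiguous bases such as N are assumptions, not MG-RAST's documented method.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C among the A, C, G and T bases.

    Assumption: ambiguous bases (for example N) are ignored.
    """
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    total = gc + at
    return gc / total if total else 0.0


def summarize(sequences):
    """Return basic length and GC statistics for a list of sequences."""
    lengths = [len(s) for s in sequences]
    count = len(sequences)
    return {
        "count": count,
        "total_bp": sum(lengths),
        "mean_length": sum(lengths) / count if count else 0.0,
        "mean_gc": sum(gc_fraction(s) for s in sequences) / count if count else 0.0,
    }
```

Plotting the per-sequence lengths and GC fractions as histograms would give distributions comparable in spirit to the graphs on the overview page.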

The navigation bar has new options not previously seen on the start page or job management pages. You now have access to tools that allow you to compare your metagenome to other metagenomes with regard to metabolism and phylogeny (Fragment Profile). Also available are metabolic comparisons against bacterial genomes (also known as a recruitment plot).

Fragment Profile

To view your metabolic or phylogenetic profiles, first select the category. Once a category is selected, you can choose the dataset on which to base your profile. For metabolic reconstructions the Subsystem dataset is available. For phylogeny, RDP, Silva, European Ribosomal and GREENGENES are all options. Parameters are also changeable; users can change e-value, p-value, percent identity, and minimum alignment length. This allows you to refine the analysis to suit the sequence characteristics of your sample. We recommend a minimum alignment length of 50 bp with all RNA databases.
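Conceptually, these parameters act as cutoffs on the similarity hits behind a profile. The sketch below shows that idea in Python, assuming a hypothetical `Hit` record; the field names and the example threshold values are illustrative and are not MG-RAST's internal data model or defaults. The p-value parameter is left out because this tutorial does not define it.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One similarity hit; all field names are hypothetical."""
    read_id: str
    target: str
    evalue: float
    percent_identity: float
    alignment_length: int


def filter_hits(hits, max_evalue, min_identity, min_alignment_length):
    """Keep only hits that pass all three thresholds."""
    return [
        hit for hit in hits
        if hit.evalue <= max_evalue
        and hit.percent_identity >= min_identity
        and hit.alignment_length >= min_alignment_length
    ]


# Example: apply the recommended 50 bp minimum alignment length.
# The e-value and identity cutoffs here are arbitrary illustrations.
example_hits = [
    Hit("read1", "16S_A", 1e-10, 97.0, 120),
    Hit("read2", "16S_B", 1e-10, 97.0, 35),   # alignment too short
    Hit("read3", "16S_C", 1e-2, 99.0, 200),   # e-value too high
]
kept = filter_hits(example_hits, max_evalue=1e-5,
                   min_identity=80.0, min_alignment_length=50)
```

Tightening any one cutoff can only shrink the set of retained hits, which is why stricter settings give smaller but more reliable profiles.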

  • Note: Metabolic reconstructions are based on SEED functional roles and Subsystems. (There is also a tool to view this via KEGG maps and do comparisons by going to the “Compare Metagenomes” link in the navigation bar.)

Profile results are presented in two ways: Pie chart and table. Phylogeny and Metabolism are hierarchical and the pie charts reflect that notion. By clicking on a section of the pie chart, an additional chart appears detailing the breakdown of that group. This is possible down to a third level. All selections made to the chart are reflected in the accompanying table (second tab). The numbers shown in the chart and table are actual counts.
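The drill-down behaviour can be pictured as counting reads per category at one level of the hierarchy, restricted to a chosen parent group. The following Python sketch assumes each read carries a hypothetical lineage tuple (for example domain, phylum, class); the function name and data layout are illustrative only.

```python
from collections import Counter


def drill_down(assignments, parent=()):
    """Count reads per child category directly below `parent`.

    `assignments` is an iterable of lineage tuples, one per read.
    With the default empty parent, the top level is counted.
    """
    parent = tuple(parent)
    depth = len(parent)
    counts = Counter()
    for path in assignments:
        path = tuple(path)
        if len(path) > depth and path[:depth] == parent:
            counts[path[depth]] += 1
    return counts


# Illustrative lineages, not real results.
reads = [
    ("Bacteria", "Proteobacteria", "Gammaproteobacteria"),
    ("Bacteria", "Proteobacteria", "Alphaproteobacteria"),
    ("Bacteria", "Firmicutes", "Bacilli"),
    ("Archaea", "Euryarchaeota", "Methanobacteria"),
]
top_level = drill_down(reads)
within_bacteria = drill_down(reads, ("Bacteria",))
```

Here `top_level` corresponds to the first pie chart and `within_bacteria` to the chart that would appear after clicking the Bacteria section; both hold actual counts, as in the table.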

Compare Metagenome to other Metagenomes - Heat Maps

You can compare the metabolism or phylogeny of your metagenome with one or more other metagenomes. Just as with the Fragment Profile, you can select your database and modify your parameters. For metabolic reconstructions the Subsystem dataset is available. For phylogeny, RDP, Silva, European Ribosomal and GREENGENES are all options. Parameters are also changeable; users can change e-value, p-value, percent identity, and minimum alignment length. This allows you to refine the analysis to suit the sequence characteristics of your sample. We recommend a minimum alignment length of 50 bp with all RNA databases. The Heat Maps show the relative abundance, which is calculated using the number of sequences in a subsystem/tax class as a fraction of the total number of sequences in a subsystem/dataset. This allows for correction based on the sample size.
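The relative abundance calculation can be sketched as follows. This is a minimal Python illustration, assuming the denominator is the total number of sequences assigned within the chosen dataset for one sample; the function name and the example counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert raw category counts to fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: n / total for category, n in counts.items()}


# Two samples of different size but identical composition.
small_sample = {"Carbohydrates": 10, "Respiration": 30}
large_sample = {"Carbohydrates": 100, "Respiration": 300}

small_fractions = relative_abundance(small_sample)
large_fractions = relative_abundance(large_sample)
```

Both samples yield the same fractions even though one has ten times as many sequences, which is the sample-size correction the heat map relies on.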

Compare Metagenome to Organism - Recruitment Plot

You can compare the metabolism of your sample with the metabolic reconstructions from bacterial genomes. By choosing an organism predicted in your sample, you can compare the metabolic coverage. As with most of the comparative tools in MG-RAST, you can modify the parameters of the calculated Metabolic Reconstruction, including e-value, p-value, percent identity, and minimum alignment length.

Compare Metagenome - KEGG Map

MG-RAST also enables users to view their sample on KEGG maps and compare it with others. Mapping of functional roles to KEGG maps was done using functional assignments from analysis against the SEED. Absolute counts are provided for each KEGG map. These maps are hierarchical, just like the Subsystems, which allows you to browse the sample on various levels or compare it with other metagenomes.