SEED Viewer Manual/Evidence

From TheSeed

Revision as of 07:59, 25 November 2008

The Evidence Page is divided into two tabs: Visual Protein Evidence and Tabular Protein Evidence.

If you are logged in and the feature belongs to your private genome, this page will have additional options for you to annotate the feature. These are described here.

Visual Protein Evidence

When the Evidence Page loads, the first tab is selected. It gives a visual summary of several pre-computed tool results for the given feature: evidence for the location of the gene product in the cell, evidence for protein domains, and evidence for similarities to other features.

Location

Location refers to the location of the feature's product in the cell. This section presents the output of tools that look for transmembrane helices (TM) or signal peptides (SP) in the feature. The example shows five transmembrane helices in the protein, identified by the Phobius tool. They are drawn as small boxes, and the position of each box on the line corresponds to the position of the helix in the protein.

EvidenceLocation.png

Domains

This section shows pre-computed domains for the selected feature. In the example, you can find a CDD domain and a Pfam domain for the feature. The blue bar marks the location of the domain found in the protein (the line depicts the full length of the protein).

Additional tools can be accessed via the Feature Tools Menu in the menu bar.

EvidenceDomain.png

Similarities

This section graphically lists evidence for similarities to other features in the SEED and in other databases. The E-Value Key at the top defines the colors used for the different E-value ranges of the similarities to the hit features. Hovering over the E-Value Key shows the value range for each color.

Each similarity is represented by two bars that show the alignment: the first bar is the query feature, the second the hit feature. The abbreviation in front of the hit bar identifies the organism the hit feature belongs to; hover over it to see the complete organism name. The functional role of the hit feature appears to the right of the checkbox.

The length of the outer box shows the complete length of the respective sequence, and its color shows the E-value range according to the E-Value Key. The inner (white) box marks the region of the sequence covered by the similarity to the other feature. Hovering over the box shows information about the hit feature (see the tooltip graphic below), including the functional role, the subsystems and some values describing the hit region.
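The proportions described above can be sketched as follows. This is a minimal illustration, not the SEED Viewer's drawing code: the function name `box_geometry`, the 1-based inclusive region coordinates, the `max_px` width and the use of one shared pixels-per-residue scale for the query and hit bars are all assumptions.

```python
def box_geometry(seq_len, region, longest_len, max_px=400):
    """Hypothetical layout for one similarity bar.

    seq_len     -- full length of this sequence (query or hit), in residues
    region      -- (start, end), 1-based inclusive residues covered by the hit
    longest_len -- length of the longest sequence drawn, which spans max_px
    Returns (outer_width, inner_left, inner_width) in pixels.
    """
    start, end = region
    if not (1 <= start <= end <= seq_len <= longest_len):
        raise ValueError("expected 1 <= start <= end <= seq_len <= longest_len")
    px_per_residue = max_px / longest_len            # same scale for every bar
    outer_width = round(seq_len * px_per_residue)    # whole sequence
    inner_left = round((start - 1) * px_per_residue)  # offset of the white box
    inner_width = max(1, round((end - start + 1) * px_per_residue))  # aligned part
    return outer_width, inner_left, inner_width


# A 200-residue hit aligned over residues 51-150, drawn next to a 400-residue query.
print(box_geometry(200, (51, 150), longest_len=400))  # (200, 50, 100)
```

Using one scale for both bars keeps residue positions comparable between the query and the hit, which is what makes the two white boxes line up visually.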

If you check some of the checkboxes in front of the functional role descriptions of the hit genes, you can use two functions via the buttons above the Similarity graphics. The button Align Selected leads to an alignment page showing a TCoffee alignment of the selected features. FASTA Download Selected lets you download the selected sequences in amino acid FASTA format.
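For readers who want to process the downloaded file, the following sketch reads amino acid FASTA text into (header, sequence) pairs. It assumes only the general FASTA layout (a `>` header line followed by one or more sequence lines). The exact header content written by FASTA Download Selected is not described on this page, so the FIG-style IDs and the sequences in the example are hypothetical placeholders.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>fig|000000.1.peg.1
MKVLAAGIT
GLLA
>fig|000000.1.peg.2
MSTNPK
"""
for header, seq in parse_fasta(example):
    print(header, len(seq))
# fig|000000.1.peg.1 13
# fig|000000.1.peg.2 6
```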

EvidenceSims1.png

EvidenceHoverSim.png

To change how the hits are sorted and filtered, use the small control box above the similarity graphics. Max Sims is the number of similarities listed on the page. Max E-Value filters out all similarities with an E-value higher than the value given here. In the combo box below these two fields, you can choose to see only hits against the SEED database (Just FIG IDs) or also hits against other databases (Show all Databases). You can Sort the Results By Score, Percent Identity (default) or Score per position. As in BLAST output, these values refer only to the aligned region of the hit, so a high percent identity over a very short region can place a similarity among the first hits, as shown in the example. Checking Group by Genome aggregates all hits to features in the same genome; a blue box marks hits that belong to the same genome. After selecting the values, press the Resubmit button to update the evidence view.

EvidenceFil1.png
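The effect of these controls can be modelled in a few lines. The sketch below is an assumption about the behavior, not the SEED Viewer's code: the `Hit` fields, the function and parameter names, the placeholder IDs, recognizing SEED hits by a `fig|` ID prefix, computing Score per position as score divided by alignment length, applying Max Sims after sorting, and the default numbers are all hypothetical. It does reproduce the point made above: sorting by percent identity can rank a short, perfectly identical hit above a long, strong one.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    hit_id: str              # FIG IDs start with "fig|"; other databases differ
    genome: str
    evalue: float
    score: float             # alignment score of the aligned region
    percent_identity: float  # identity over the aligned region only
    align_len: int           # length of the aligned region


# Hypothetical sort keys for the three "Sort the Results By" options.
SORT_KEYS = {
    "score": lambda h: h.score,
    "percent_identity": lambda h: h.percent_identity,
    "score_per_position": lambda h: h.score / h.align_len,
}


def filter_and_sort(hits, max_sims=50, max_evalue=1e-5, just_fig=True,
                    sort_by="percent_identity", group_by_genome=False):
    # Max E-Value: drop hits whose E-value is higher than the cutoff.
    kept = [h for h in hits if h.evalue <= max_evalue]
    # Just FIG IDs: keep only hits against the SEED database.
    if just_fig:
        kept = [h for h in kept if h.hit_id.startswith("fig|")]
    # Sort the Results By: best value first.
    kept.sort(key=SORT_KEYS[sort_by], reverse=True)
    # Max Sims: cap the number of listed similarities.
    kept = kept[:max_sims]
    if group_by_genome:
        # Stable sort: genomes appear in the order of their best-ranked hit,
        # and hits keep their ranked order within each genome.
        first_rank = {}
        for i, h in enumerate(kept):
            first_rank.setdefault(h.genome, i)
        kept.sort(key=lambda h: first_rank[h.genome])
    return kept


hits = [
    Hit("fig|1.1.peg.1", "1.1", 1e-100, 500.0, 60.0, 400),  # long, strong hit
    Hit("fig|2.1.peg.7", "2.1", 1e-6, 40.0, 100.0, 20),     # short, identical hit
]
print([h.hit_id for h in filter_and_sort(hits)])
# ['fig|2.1.peg.7', 'fig|1.1.peg.1']
```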

Tabular Protein Evidence

Select the second tab of the page-spanning TabView to see the tabular view of the evidence. It contains most of the information already shown in the visual view, presented differently and enriched with some additional information. This tab adds the Identical Proteins and Functionally Coupled sections but does not include the Location information.

Similarities

The similarity table lists hits to similar features in the SEED database (and optionally other databases), as described for the Visual Protein Evidence. Each row in the table represents one hit.

The first column provides a checkbox to select a hit feature. As in the visual view, the Align Selected and FASTA Download Selected buttons lead to a TCoffee alignment page or download the protein sequences of the selected features in FASTA format. The two buttons in the column header allow mass selection: All selects all features visible in the table, and check to last checked selects all features up to the last checked feature in the table.

The column Similar FIG Sequence shows the ID of each hit feature, together with a link to its annotation page.

The next four columns describe the hit regions of the query and hit features: E-value, Percent Identity, Region in Query peg and Region in Similar Sequence.

The next two columns show the Organism of the hit peg and its Function. If the function differs from the function of the query feature, it is colored, and identical functions in the table get the same color.

Associated Subsystems of the feature are shown in the next column. If the feature is not associated with a subsystem, the cell contains the text None.

The last column can contain three Evidence Codes. ISU means that the feature is unique in a cell of a subsystem, that is, no other feature in the genome is thought to have the same function. ICW(number) means that the feature is clustered with number features in the genome. FF means that the feature belongs to a FIGfam.
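If you export the table and want to interpret this column programmatically, a parser along these lines could help. The function name, the returned dictionary keys and the assumption that the codes in one cell are separated by semicolons, commas or spaces are all hypothetical; check an exported file for the actual layout before relying on it.

```python
import re

# One token per code: ISU, FF, or ICW(<number>).
_CODE = re.compile(r"ISU|FF|ICW\((\d+)\)")


def parse_evidence_codes(cell):
    """Turn a cell such as 'ISU; ICW(2); FF' into a dictionary."""
    result = {"unique_in_subsystem": False,  # ISU
              "clustered_with": None,        # ICW(n): clustered with n features
              "in_figfam": False}            # FF
    for token in re.split(r"[;,\s]+", cell.strip()):
        if not token:
            continue  # empty cell or trailing separator
        m = _CODE.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognised evidence code: {token!r}")
        if token == "ISU":
            result["unique_in_subsystem"] = True
        elif token == "FF":
            result["in_figfam"] = True
        else:
            result["clustered_with"] = int(m.group(1))
    return result


print(parse_evidence_codes("ISU; ICW(2); FF"))
# {'unique_in_subsystem': True, 'clustered_with': 2, 'in_figfam': True}
```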

The table can be exported with the export table button above it.

EvidenceSims2.png

You can filter and sort the table using the TabView above it. The second tab, Sims Filter, works the same way as described for the Similarities in the Visual Protein Evidence. The first tab, Edit Columns, offers a number of columns with additional information that can be added to the table (FIGfams, aliases in other databases and many others). Choose a column name and press the arrow to move it into the right-hand field; the column is then added to the table.

EvidenceFilter.png

Domains

This section shows pre-computed domains for the selected feature. In the example, you can find a CDD domain and a Pfam domain for the feature. The blue bar marks the location of the domain found in the protein (the line depicts the whole length of the protein).

The table lists the Domain DB (the database for the domain that was hit), the ID in the domain database, the Name of the domain, the Location of the hit in the selected feature, the Score for the hit against the domain, as well as the Function of the domain.

The table can be exported using the export table button.

Additional tools can be accessed via the Feature Tools Menu in the menu bar.

EvidenceDomTable.png

Identical Proteins

Essentially Identical Proteins share a common sequence, although their start positions may differ slightly. The definition allows for this because, in different databases or in closely related strains, the same protein is often present but its start position was called differently during gene finding. In effect, this table lists aliases of the feature based on protein identity.
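The idea of proteins that differ only in their start position can be illustrated with a simple test. This is a hypothetical criterion for illustration, not the SEED's actual definition. It treats two sequences as essentially identical when the shorter one matches the end of the longer one and the length difference is at most `max_start_shift` residues (an arbitrary illustrative threshold). The first residue of the shorter sequence is ignored because an alternative start codon such as GTG or TTG is translated as methionine when it initiates translation, so the shorter protein may begin with M where the longer one has V or L.

```python
def essentially_identical(seq_a, seq_b, max_start_shift=10):
    """Hypothetical test: True if two protein sequences differ only by an
    N-terminal extension of at most max_start_shift residues."""
    shorter, longer = sorted((seq_a, seq_b), key=len)
    if not shorter:
        return False
    shift = len(longer) - len(shorter)
    # Skip the shorter protein's initial residue: its start codon may encode
    # a different amino acid (e.g. V or L) at that position in the longer one.
    return shift <= max_start_shift and longer.endswith(shorter[1:])


# Same protein, called with a start 3 codons further upstream.
print(essentially_identical("MKTAYIAKQR", "MSSLKTAYIAKQR"))  # True
# Same length, but a different residue at the C-terminus.
print(essentially_identical("MKTAYIAKQR", "MKTAYIAKQW"))     # False
```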

The first column of the table shows the Database in which the alias is found, and the second column (ID) gives the alias name with a link to the protein in that database. The next two columns give the Organism and the Assignment recorded for the feature under that alias.

EvidenceEIPs.png

Functionally Coupled

This table lists all genes in the organism that are functionally coupled to the feature. For each one it shows the Score, the ID of the feature and the Function of the feature.

EvidenceFCs.png