Evidence Page

From TheSeed
Revision as of 15:32, 30 April 2008 by Arodri7 (talk | contribs)

The Seedviewer Evidence page shows all of the protein and gene sequence data that support the annotations given to a FIG sequence.




Visual Protein Evidence Tab

This tab gives a visual display of the localization, domain structure, and similarities of the query protein sequence.

Tabular Protein Evidence Tab

This tab presents the same information as the Visual Protein Evidence tab, but in tables. The tabular form lets the user view additional information about the similar sequences.



Explanation of "Region in ..." Colors

In the protein similarities table, the table cells that describe the sequence regions responsible for the similarity are colored to reflect the extent and location of the similarity. The coloring is intended to provide a quick visual cue as to the nature of the similarity, and help in assessing broad classes of possible relationships (most particularly, whether there are reasons to doubt that these sequences have the same function).

The intensity (saturation) of the color depends on the fraction of the sequence that is covered by the similarity.

If the similarity extends over at least 90% of the sequence, then the table cell background is white.

If the similarity covers less than 90% of the sequence, then there is a background color. The shorter the region of similarity, the more intense the color.


The color (hue) of the cell depends on which region of the sequence is similar.

The colors go from red through green to blue as the region of matching moves from the beginning to the end of the sequence.

The color is determined by the center of the matching region, so long matches will always have greenish hues. This might be changed in the future to give clearer indication of the position of long matches.
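
The rules above can be sketched in a few lines of Python. This is only an illustration, not the SEED implementation: the function name `region_color`, the linear saturation ramp, and the hue range (0 for red to 2/3 for blue on the HSV wheel) are assumptions chosen to match the description. Only the 90% threshold and the use of the match center come from the text.

```python
import colorsys

def region_color(start, end, seq_len, full_cover=0.90):
    """Return an (r, g, b) tuple, each in 0..1, for a match covering
    positions start..end (1-based, inclusive) of a sequence of seq_len."""
    covered = (end - start + 1) / seq_len
    if covered >= full_cover:
        # At least 90% of the sequence is covered: white background.
        return (1.0, 1.0, 1.0)
    # Assumption: saturation grows linearly as coverage drops below the threshold.
    saturation = 1.0 - covered / full_cover
    # The hue is set by the center of the match: 0.0 at the start, 1.0 at the end.
    center = (start + end) / 2 / seq_len
    # Assumption: map the center onto the red (0) -> green (1/3) -> blue (2/3) arc.
    hue = center * (2.0 / 3.0)
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)
```

Under these assumptions, a short match at the start of the sequence comes out strongly red, one in the middle green, and one at the end blue. Any match long enough to approach the threshold is necessarily centered near the middle, which is why long matches look pale green.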


Taken together, the colors of the subject sequence (left column) and the query sequence (right column) tell a story.

Subject color | Query color | Some possible interpretations
White | White | Both sequences match over (essentially) their full length.
White | Colored | The subject sequence is shorter than the query sequence. The subject sequence might be a fragment of a protein (perhaps the result of a sequencing error, or running off the end of a contig). Alternatively, the query sequence might be a multifunctional protein, and the subject sequence does not have all of the functions. A related situation is when proteins found as one peptide in some organisms are found in two separate peptides in others.
Colored | White | The query sequence is shorter than the subject sequence. The query sequence might be a fragment of a protein (perhaps the result of a sequencing error, or running off the end of a contig). Alternatively, the subject sequence might be a multifunctional protein, and the query sequence does not have all of the functions. A related situation is when proteins found as one peptide in some organisms are found in two separate peptides in others.
Colored | Same color | The query and subject are of similar lengths, and the region of similarity is the same in both sequences. This often happens when analyzing sequences that are distantly related, or when some part(s) of the molecule diverge particularly quickly.
Colored | Different color | This happens when comparing two partial sequences that cover different parts of a molecule. Alternatively, the similarity could be due to a conserved motif shared by two molecules of quite different function.
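
The table can also be read as a small decision procedure. The sketch below is a hypothetical helper, not part of the SEED code. It takes the covered fraction and the relative center of the match for each sequence and returns a short label for the matching row. The `center_tol` tolerance used to decide that two colors are "the same" is an assumption, and comparing only the centers is a simplification of comparing the displayed colors.

```python
def interpret_pair(subject_cov, query_cov, subject_center, query_center,
                   full_cover=0.90, center_tol=0.10):
    """Classify a similarity by the table rows above.

    *_cov    : fraction of the sequence covered by the match (0..1)
    *_center : center of the match as a fraction of sequence length (0..1)
    """
    subject_white = subject_cov >= full_cover
    query_white = query_cov >= full_cover
    if subject_white and query_white:
        return "full-length match"
    if subject_white:
        return "subject shorter than query"
    if query_white:
        return "query shorter than subject"
    # Both cells are colored: compare where the match falls in each sequence.
    # Assumption: centers within center_tol of each other count as the same color.
    if abs(subject_center - query_center) <= center_tol:
        return "same region in both"
    return "different regions"
```

For example, `interpret_pair(0.95, 0.40, 0.5, 0.2)` returns `"subject shorter than query"`, the second row of the table.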

Final thoughts:

In considering whether two proteins have the same function, one would prefer that the similarity cover essentially the entirety of both molecules (little or no color). When this is not true, one would prefer that the colors match as closely as possible.

An Explanation of the SEED Evidence Codes

Within the SEED, we use evidence codes to reflect significant factors that go into making assignments of function. Some of these codes are computed and reflect information that we consider particularly useful. Others are used to reflect experimental evidence of function.

icw(n): in cluster with

This code indicates that the PEG occurs in a cluster with n other genes from the same subsystem (very strong evidence). There may be several of these for a PEG (up to one for each subsystem the PEG occurs in).

isu: in subsystem unique -- the only entry in a subsystem cell

This code indicates that the PEG occurs in a subsystem, and it is the only PEG for that genome that has been assigned the functional role (i.e., the cell in the spreadsheet contains a single entry). This means that, if you wish to change an annotation, you should discuss it with the owner of the subsystem.

idu(n): in subsystem duplicates

This code indicates that the PEG occurs in a subsystem, but it is in a cell of the spreadsheet containing duplicates (and it is not clustered with other genes connected to the same subsystem). In this case, you may make a change without notifying the owner of the subsystem, since you are probably disambiguating the situation to the owner's benefit.

ff: in FIGfam

This code indicates that the protein-encoding gene is included in a FIGfam.

cwn: Clustered with Nonhypothetical

This code indicates that the protein-encoding gene is functionally coupled to at least one other protein that has been assigned a function that is considered "nonhypothetical". The functional coupling score must be 5 or more for this code to apply. This means that the gene co-occurs on the chromosome with another gene that is considered nonhypothetical in at least 5 genomes that are not close strains.

cwh: Clustered with Hypothetical

This code indicates that the protein-encoding gene is functionally coupled to at least one other protein that has been assigned a function, but none that is considered "nonhypothetical". The functional coupling score must be 5 or more for this code to apply. This means that the gene co-occurs on the chromosome with at least one other gene, but none that is considered nonhypothetical, in at least 5 genomes that are not close strains.
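
The choice between cwn and cwh can be summarized as follows. This is a minimal sketch, not the SEED implementation. The function name `clustering_code` and the input format (a list of score and nonhypothetical-flag pairs, one per coupled gene) are assumptions made for illustration. Only the score threshold of 5 and the cwn/cwh distinction come from the descriptions above.

```python
def clustering_code(coupled, min_score=5):
    """Return "cwn", "cwh", or None for a protein-encoding gene.

    coupled: iterable of (score, is_nonhypothetical) pairs, one for each
             gene that is functionally coupled to the gene of interest.
    """
    # Keep only couplings that reach the required functional coupling score.
    strong = [is_nonhyp for score, is_nonhyp in coupled if score >= min_score]
    if not strong:
        return None  # neither code applies
    # cwn if any sufficiently coupled gene is nonhypothetical, otherwise cwh.
    return "cwn" if any(strong) else "cwh"
```

A gene coupled with score 6 to a hypothetical protein and with score 4 to a nonhypothetical one would receive cwh here, because the second coupling falls below the threshold.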

dlit: Direct Literature References to the Gene Exist

This code is used to indicate that at least one paper (that is a "nongenome paper" in the sense that it does not reference hundreds of genes) is associated with this gene.

ilit: Indirect Literature References to the Gene Exist

This code is used to indicate that at least one paper (that is a "nongenome paper" in the sense that it does not reference hundreds of genes) is associated with a gene assigned the same functional role, but none to this gene itself (as far as we know).
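
For scripts that need to turn these codes back into their meanings, a lookup table like the one below may be useful. It is a sketch under a stated assumption: a code is taken to be written either as a bare name (`ff`) or as a name followed by an integer in parentheses (`icw(3)`), mirroring the headings above. The form actually emitted by the SEED may carry additional fields, and the name `describe_code` is hypothetical.

```python
import re

# Code names and meanings as listed in this section.
EVIDENCE_CODES = {
    "icw": "in cluster with",
    "isu": "in subsystem unique",
    "idu": "in subsystem duplicates",
    "ff": "in FIGfam",
    "cwn": "clustered with nonhypothetical",
    "cwh": "clustered with hypothetical",
    "dlit": "direct literature references to the gene exist",
    "ilit": "indirect literature references to the gene exist",
}

# Assumed token shape: a lowercase name, optionally followed by "(n)".
_CODE_RE = re.compile(r"^([a-z]+)(?:\((\d+)\))?$")

def describe_code(token):
    """Split a code such as "icw(3)" into (name, n or None, meaning)."""
    match = _CODE_RE.match(token.strip())
    if match is None or match.group(1) not in EVIDENCE_CODES:
        raise ValueError(f"unrecognized evidence code: {token!r}")
    name, n = match.group(1), match.group(2)
    return name, (int(n) if n is not None else None), EVIDENCE_CODES[name]
```

With this table, `describe_code("icw(3)")` returns `("icw", 3, "in cluster with")`.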


Explanation of "Function" Colors in Similarities Table

The functions in the similarities table are color-coded to help find assignments that are identical to that of the query, and also the most common alternative functions.

Meaning of each background color
Same function as query
Most common other function
Second most common other function
Third most common other function
Fourth most common other function
Fifth most common other function
Sixth most common other function
Seventh most common other function
Other function

When two or more functions are equally frequent, their ranking (and hence order of colors) relative to one-another is arbitrary.
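
The ranking behind the colors can be sketched as follows. This illustrates the scheme described above rather than the SEED code itself. The function name `function_labels` is hypothetical, and it returns the legend entry for each function instead of an actual color, since the color values are not given here.

```python
from collections import Counter

RANK_LABELS = [
    "most common other function",
    "second most common other function",
    "third most common other function",
    "fourth most common other function",
    "fifth most common other function",
    "sixth most common other function",
    "seventh most common other function",
]

def function_labels(query_function, hit_functions):
    """Map each function seen in the similarities table to its legend entry."""
    # Count every function that differs from the query's own assignment.
    others = Counter(f for f in hit_functions if f != query_function)
    labels = {query_function: "same function as query"}
    # most_common() sorts by frequency; ties keep first-seen order, which is
    # one way of realizing the "arbitrary" tie-breaking noted above.
    for rank, (func, _count) in enumerate(others.most_common()):
        if rank < len(RANK_LABELS):
            labels[func] = RANK_LABELS[rank]
        else:
            labels[func] = "other function"
    return labels
```

Functions ranked eighth or lower all fall into the single "other function" entry.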