Difference between revisions of "Keyword Box"
(Changed to a more complicated example gene)
(Added notes on using negation)
(2 intermediate revisions by the same user not shown)
Latest revision as of 13:17, 29 July 2007
The NMPDR keyword search works like a typical search engine: you type in the appropriate words, and a list of matching genes comes back. Our keyword database contains millions of words, including vitamins, aldolase, and pyrophosphokinase. The NMPDR looks at nine specific data items when computing the keywords for a gene. The table below shows each of the nine data items along with the keywords derived from it for the gene fig|171101.1.peg.269, which encodes a dual-role protein in Streptococcus pneumoniae R6 and has 40 keywords.
Data item | Keywords |
FIG gene identifier | fig|171101.1.peg.269 |
The aliases | 15902313, kegg|spd:SPD_0272, kegg|spr:spr0269, NP_357863.1, sp|P59657, spr0269, sulD, tr|Q04MF8, uni|P59657, uni|Q04MF8 |
All words in the functional role | dihydroneopterin, aldolase, 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine, pyrophosphokinase, amino, hydroxy, hydroxymethyldihydropteridine |
The genome ID | 171101.1 |
All words in the taxonomy | bacteria, firmicutes, lactobacillales, streptococcaceae, streptococcus, pneumoniae, r6 |
The subsystem names and classifications | folate, biosynthesis, cofactors, vitamins, prosthetic, groups, pigments, folates, pterines |
The EC number | 2.7.6.3, 4.1.2.25 |
The subsystem role | 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine, pyrophosphokinase, amino, hydroxy, hydroxymethylhydroperidine |
Special Keywords | essential |
Notes
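- Some keywords appear twice because more than one data item produces them. For example, pyrophosphokinase comes from both the functional role and the subsystem role.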
- In the functional role, hyphenated words are stored in their full form (2-amino-4-hydroxy-6-hydroxymethyldihydropteridine) as well as broken up on the hyphen boundaries (amino hydroxy hydroxymethyldihydropteridine).
- Keywords are case-insensitive.
- Special keywords indicate attributes of the gene. Most of these are incomplete: for example, we know certain genes are virulence-associated, but for most genes we have no virulence data. The special keywords are:
  - virulence, which indicates that the gene participates in helping the organism damage its host. This attribute is incomplete.
  - essential, which indicates that the gene is essential to the survival of the organism. This attribute is incomplete.
  - iedb, which indicates that the gene is listed in the Immune Epitope Database.
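The hyphen and case rules in the notes above can be sketched as follows. This is only an illustration, not the NMPDR's actual code: the function name `role_keywords` is hypothetical, and dropping the purely numeric fragments (the 2, 4, and 6 in the example) is an assumption inferred from the table, which lists only amino, hydroxy, and hydroxymethyldihydropteridine as the broken-up pieces.

```python
def role_keywords(role: str) -> list[str]:
    """Derive keywords from a functional role string (illustrative sketch)."""
    keywords = []
    # Keywords are case-insensitive, so normalize to lower case first.
    for word in role.lower().split():
        # A hyphenated word is stored in its full form...
        keywords.append(word)
        if "-" in word:
            # ...and also broken up on the hyphen boundaries.
            # Assumption: bare numbers such as "2" are not kept as keywords.
            keywords.extend(
                piece for piece in word.split("-")
                if piece and not piece.isdigit()
            )
    return keywords


if __name__ == "__main__":
    role = "2-amino-4-hydroxy-6-hydroxymethyldihydropteridine pyrophosphokinase"
    for keyword in role_keywords(role):
        print(keyword)
```

Lower-casing at indexing time means the same normalization has to be applied to whatever the user types into the keyword box.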
Advanced Keyword Searching
Normally, the search process selects the genes relevant to all the words in the keyword box. You can modify the default behavior using the following control characters.
Character | Meaning | Example | Explanation of Example |
- | negation | 2.7.6.3 -firmicutes | search for all genes with EC number 2.7.6.3 that are not in firmicutes |
() | optional | (2.7.6.3 4.1.2.25) | search for any gene with EC number 2.7.6.3 or 4.1.2.25 |
"" | phrase | "folate biosynthesis" | search for all genes that participate in folate biosynthesis |
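To make the three control characters concrete, here is a minimal sketch of how such a query could be split into its parts and tested against one gene. It is an assumption-laden illustration, not how the NMPDR search is implemented: the names `Query`, `parse_query`, and `matches` are hypothetical, and phrase matching is reduced to a plain substring test on a free-text description.

```python
import re
from dataclasses import dataclass, field

# A quoted phrase, a parenthesized group, or a single bare word.
TOKEN = re.compile(r'"([^"]+)"|\(([^)]*)\)|(\S+)')


@dataclass
class Query:
    required: list[str] = field(default_factory=list)      # plain words
    excluded: list[str] = field(default_factory=list)      # -word
    any_of: list[list[str]] = field(default_factory=list)  # (word word)
    phrases: list[str] = field(default_factory=list)       # "word word"


def parse_query(text: str) -> Query:
    """Split a keyword-box string into its four kinds of terms."""
    query = Query()
    for phrase, group, word in TOKEN.findall(text.lower()):
        if phrase:
            query.phrases.append(phrase)
        elif group:
            query.any_of.append(group.split())
        elif word.startswith("-") and len(word) > 1:
            query.excluded.append(word[1:])
        elif word:
            query.required.append(word)
    return query


def matches(query: Query, keywords: set[str], description: str = "") -> bool:
    """Check one gene: all required, none excluded, one per group, all phrases."""
    return (
        all(word in keywords for word in query.required)
        and not any(word in keywords for word in query.excluded)
        and all(any(word in keywords for word in group) for group in query.any_of)
        and all(phrase in description for phrase in query.phrases)
    )


if __name__ == "__main__":
    gene_keywords = {"2.7.6.3", "4.1.2.25", "firmicutes", "folate", "biosynthesis"}
    print(matches(parse_query("2.7.6.3 -firmicutes"), gene_keywords))
    print(matches(parse_query("(2.7.6.3 4.1.2.25)"), gene_keywords))
```

With the example gene's keywords, the first query is rejected because the gene is in firmicutes, while the second is accepted because either EC number is enough.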
Using Negation
A query cannot consist solely of negated keywords. For example, you can't use
-hypothetical
to get all non-hypothetical proteins. You can work around this restriction by including a positive keyword for a broad category, as in
bacteria -hypothetical
which will return all non-hypothetical proteins for bacteria. This is not recommended, however, because you will get over a million results.
Queries that rely primarily on negation make the most sense when your result set is already restricted.
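The rule that a query needs at least one positive keyword can be expressed as a small pre-check. This is a sketch under the assumption that terms are separated by whitespace and that any term not starting with "-" counts as positive; the function name `has_positive_term` is hypothetical and not part of the NMPDR.

```python
def has_positive_term(query_text: str) -> bool:
    """Return True if at least one whitespace-separated term is not negated."""
    return any(not term.startswith("-") for term in query_text.split())


if __name__ == "__main__":
    for query_text in ("-hypothetical", "bacteria -hypothetical"):
        status = "accepted" if has_positive_term(query_text) else "rejected"
        print(f"{query_text!r}: {status}")
```

A check like this only enforces the rule itself. It does not protect against broad queries such as the bacteria example, which pass the check but still return a very large result set.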