The thesaurus is a database that stores relations (links) between words and phrases.
Each link consists of the following elements:
1. References to two word or phrase entries. Word entries and phrase entries make up the main part of the lexicon.
2. Link type, distinguishing synonyms, antonyms, diminutives, hypernyms and other relations.
3. Link tags - a set of markers indicating theme, relation strength or distance, etc.
4. Link flags - a set of hints for the word/phrase substitution algorithm.
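The link structure described above can be modelled as a small record type. The Python sketch below is illustrative only: the class name ThesaurusLink, the LinkType values, and the representation of tags and flags as sets of strings are assumptions, not the dictionary's actual storage format.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    # Hypothetical subset of link types; the real dictionary defines many more.
    SYNONYM = "synonym"
    ANTONYM = "antonym"
    DIMINUTIVE = "diminutive"
    HYPERNYM = "hypernym"
    DERIVATIVE = "derivative"
    TO_RUSSIAN = "to_russian"


@dataclass(frozen=True)
class ThesaurusLink:
    """One thesaurus link: two entry references plus type, tags and flags."""
    entry1: int                          # id of the first word or phrase entry
    entry2: int                          # id of the second word or phrase entry
    link_type: LinkType                  # kind of relation
    tags: frozenset[str] = frozenset()   # assumed markers: theme, strength, style
    flags: frozenset[str] = frozenset()  # assumed hints for substitution


link = ThesaurusLink(101, 202, LinkType.SYNONYM, tags=frozenset({"rude"}))
print(link.link_type.value, sorted(link.tags))  # synonym ['rude']
```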
Word links include translations, grammatical relations (derivatives of several kinds) and semantic relations (synonyms, antonyms, hypernyms and hyponyms, diminutives and others). The set of words and their relations can be pictured as a network in which entries are the nodes and links are the edges connecting them.
In many applications the thesaurus data is the main source of quality improvement.
Synonyms and derivatives can be used by a search engine to expand the original query, making it possible to find more relevant documents.
The synonymizer uses the thesaurus as the primary source of synonyms for word substitution.
The translation engine retrieves translations, derivatives and hypernyms when performing full-text translation.
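As an illustration of query expansion, the following Python sketch adds the synonyms and derivatives of each query term. It assumes a hypothetical in-memory list of (word, word, link type) triples; a real system would read the links from the thesaurus database instead.

```python
from collections import defaultdict

# Hypothetical in-memory thesaurus: (word1, word2, link_type) triples.
LINKS = [
    ("car", "automobile", "synonym"),
    ("drive", "driver", "derivative"),
    ("car", "vehicle", "hypernym"),
]

# Link types used for expansion (an assumption; tune per application).
EXPANSION_TYPES = {"synonym", "derivative"}


def build_index(links):
    """Index expansion links in both directions so lookup works from either end."""
    index = defaultdict(set)
    for a, b, link_type in links:
        if link_type in EXPANSION_TYPES:
            index[a].add(b)
            index[b].add(a)
    return index


def expand_query(terms, index):
    """Return the original terms followed by their synonyms and derivatives."""
    expanded = list(terms)
    seen = set(terms)
    for term in terms:
        for related in sorted(index.get(term, ())):
            if related not in seen:
                seen.add(related)
                expanded.append(related)
    return expanded


index = build_index(LINKS)
print(expand_query(["car", "drive"], index))  # ['car', 'drive', 'automobile', 'driver']
```

Hypernym links are deliberately left out of the expansion set here; whether to include them is an application-specific choice.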
The thesaurus database can be accessed via API calls, SQL queries or ORM classes.
The API functions for accessing the thesaurus are sol_Thesaurus and sol_ListLinkstxt.
The ORM library includes several .NET classes that represent the thesaurus: the WordLink class for links between word entries, the PhraseLink class for links between phrase entries, and additional classes representing tags and flags.
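The SQL route can be sketched with Python's built-in sqlite3 module. The schema below (tables word_entry and word_link and their columns) is hypothetical and only mirrors the link elements described at the top of this page; the actual database layout of the dictionary may differ.

```python
import sqlite3

# Hypothetical schema; the real dictionary's tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE word_entry (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE word_link (
        entry1 INTEGER REFERENCES word_entry(id),
        entry2 INTEGER REFERENCES word_entry(id),
        link_type TEXT NOT NULL
    );
    INSERT INTO word_entry VALUES (1, 'big'), (2, 'large'), (3, 'small');
    INSERT INTO word_link VALUES (1, 2, 'synonym'), (1, 3, 'antonym');
    """
)


def linked_words(word, link_type):
    """Return names of entries linked from `word` by links of `link_type`."""
    rows = conn.execute(
        """
        SELECT e2.name
        FROM word_entry AS e1
        JOIN word_link AS l ON l.entry1 = e1.id
        JOIN word_entry AS e2 ON e2.id = l.entry2
        WHERE e1.name = ? AND l.link_type = ?
        ORDER BY e2.name
        """,
        (word, link_type),
    )
    return [name for (name,) in rows]


print(linked_words("big", "synonym"))  # ['large']
print(linked_words("big", "antonym"))  # ['small']
```

Links are stored in one direction in this sketch, so linked_words("large", "synonym") returns an empty list; a real query might match either end of the link.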
The Russian Grammatical Dictionary displays the contents of the thesaurus in a human-friendly way, either as lists of categorized links or as a diagram.
Figure: number of links of each type in the Russian-English bilingual dictionary (some rare categories are not included in the statistics).
Synonyms are the largest category of links in the thesaurus, almost 18% of all links in the Russian-English bilingual thesaurus.
The main criterion for registering a pair of words as synonyms in the thesaurus is that they have the same, or almost the same, meaning and are interchangeable.
In some cases words have almost the same meaning but differ in style of speech or usage conditions. For example, two words may both mean "dog" while one of them is considered rude or vulgar.
The thesaurus allows links to be marked with special signs that represent such additional conditions for synonyms. These signs are called thesaurus tags.
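A synonymizer can use tags to skip substitutions that would change the style of a text. In this Python sketch the tag names "rude" and "vulgar", and the word pairs, are illustrative assumptions; the real thesaurus defines its own tag set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynonymLink:
    word: str
    synonym: str
    # Hypothetical style tags attached to the link, e.g. {"rude"}.
    tags: frozenset = frozenset()


LINKS = [
    SynonymLink("dog", "hound"),
    SynonymLink("dog", "cur", frozenset({"rude"})),
]


def substitutes(word, links, banned_tags=frozenset({"rude", "vulgar"})):
    """Return synonyms of `word` whose links carry none of the banned tags."""
    return [
        link.synonym
        for link in links
        if link.word == word and not (link.tags & banned_tags)
    ]


print(substitutes("dog", LINKS))                           # ['hound']
print(substitutes("dog", LINKS, banned_tags=frozenset()))  # ['hound', 'cur']
```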
Antonyms are words with opposite meanings.
Derivatives comprise several types of word relations in the thesaurus. Almost all of these pairs link words with the same root. In most cases the linked words belong to different parts of speech, which makes it possible, for example, to find a verb derived from a noun and vice versa.
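Because derivative links usually cross parts of speech, a lookup can combine the link with the part of speech of the target entry. The Python sketch below uses hypothetical English entries and a plain dictionary for parts of speech; it does not reflect the dictionary's real data model.

```python
# Hypothetical part-of-speech table: entry name -> part of speech.
POS = {"teach": "verb", "teacher": "noun", "run": "verb", "runner": "noun"}

# Hypothetical derivative links between words sharing a root (unordered pairs).
DERIVATIVES = [("teach", "teacher"), ("run", "runner")]


def derived(word, target_pos):
    """Return words linked to `word` as derivatives with part of speech `target_pos`."""
    result = []
    for a, b in DERIVATIVES:
        if a == word:
            other = b
        elif b == word:
            other = a
        else:
            continue
        if POS.get(other) == target_pos:
            result.append(other)
    return result


print(derived("teacher", "verb"))  # ['teach']
print(derived("run", "noun"))      # ['runner']
```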
Translations are one of the largest groups of thesaurus links in multilingual dictionaries.
The grammatical dictionary has many translation link types, one for each target language. For example, "to_russian" links are used for English-Russian, French-Russian and other translations into Russian.
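A translation lookup then reduces to filtering links by type. In this Python sketch only the "to_russian" link type comes from the text above; "to_english" and the sample entries are assumptions.

```python
# Hypothetical translation links: (source word, target word, link type).
# "to_russian" is named in the text; "to_english" is an assumed counterpart.
LINKS = [
    ("cat", "кошка", "to_russian"),   # English -> Russian
    ("chat", "кошка", "to_russian"),  # French -> Russian, same link type
    ("кошка", "cat", "to_english"),
]


def translate(word, link_type):
    """Return all targets reachable from `word` via links of `link_type`."""
    return [dst for src, dst, kind in LINKS if src == word and kind == link_type]


print(translate("chat", "to_russian"))   # ['кошка']
print(translate("кошка", "to_english"))  # ['cat']
```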
Almost all Russian nouns are assigned a gender: masculine, feminine or neuter. Usually there is no relation between grammatical gender and biological sex. However, nouns referring to the social roles and professions of men and women often come in two variants with the same root and different masculine and feminine suffixes, for example учитель/учительница (male/female teacher). The male/female distinction for animals also produces pairs of nouns with the same root.
Hypernym/hyponym links relate a category (hypernym) to an instance of that category (hyponym), for example "animal" and "dog".
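Hypernym links can be followed transitively to obtain a chain of increasingly general categories. The Python sketch below assumes a hypothetical mapping in which every word has at most one hypernym; in a real thesaurus a word may have several, which would turn the chain into a graph search.

```python
# Hypothetical hyponym -> hypernym links (at most one hypernym per word here).
HYPERNYM_OF = {"poodle": "dog", "dog": "animal", "cat": "animal"}


def hypernym_chain(word):
    """Walk hypernym links upward until a word with no hypernym is reached."""
    chain = []
    seen = {word}
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        if word in seen:  # guard against accidental cycles in the data
            break
        seen.add(word)
        chain.append(word)
    return chain


print(hypernym_chain("poodle"))  # ['dog', 'animal']
```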
© Elijah Koziev 2010