Thesaurus

Thesaurus links

The thesaurus is a database that stores relations (links) between words and phrases.

Each link comprises the following elements.

1. References to two word entries or two phrase entries. Word entries and phrase entries make up the main part of the lexicon.

2. A link type, which distinguishes synonyms, antonyms, diminutives, hypernyms and other relations.

3. Link tags: a set of markers indicating theme, relation strength or distance, and so on.

4. Link flags: a set of hints for the word/phrase substitution algorithm.
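The four elements listed above can be modeled as a small record type. The following is only an illustrative sketch in Python, not the actual storage format or the ORM classes: the names `Link`, `LinkType`, `source_entry` and `target_entry`, as well as the sample tag string, are assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class LinkType(Enum):
    # A few of the relation kinds named in the text; the real set is larger.
    SYNONYM = "synonym"
    ANTONYM = "antonym"
    DIMINUTIVE = "diminutive"
    HYPERNYM = "hypernym"


@dataclass(frozen=True)
class Link:
    source_entry: int   # reference (id) of the first word or phrase entry
    target_entry: int   # reference (id) of the second word or phrase entry
    link_type: LinkType
    tags: frozenset = field(default_factory=frozenset)   # theme, strength, ...
    flags: frozenset = field(default_factory=frozenset)  # substitution hints


# Hypothetical entry ids and tag value, for illustration only.
link = Link(source_entry=101, target_entry=202,
            link_type=LinkType.SYNONYM,
            tags=frozenset({"style:colloquial"}))
```

Keeping tags and flags as sets reflects the description of each as "a set of" markers or hints; a link without any of them simply carries empty sets.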

Word links include translations, grammatical relations (derivatives of several kinds) and semantic relations (synonyms, antonyms, hypernyms and hyponyms, diminutives and others). The set of words and their relations can be pictured as follows:

[Figure: words and their relations in the thesaurus]

Thesaurus applications

Thesaurus data can be a major source of improvement in many applications.

A search engine can use synonyms and derivatives to expand the original search query, making it possible to find more relevant documents.

The synonymizer uses the thesaurus as its primary source of synonyms for word substitutions.

The translation engine retrieves translations, derivatives and hypernyms when performing full-text translation.
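The query expansion case mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the in-memory `THESAURUS` dictionary, its sample words, the `expand_query` function and the underscore spelling of the link type names are all hypothetical and do not reflect the real database or API.

```python
# Hypothetical in-memory thesaurus: word -> list of (link_type, related_word).
THESAURUS = {
    "quick": [("synonym", "fast"), ("to_adverb", "quickly"), ("antonym", "slow")],
    "dog": [("synonym", "hound"), ("hypernym", "animal")],
}

# Only synonyms and derivatives are used for expansion (assumed type names).
EXPANDING_TYPES = {"synonym", "to_noun", "to_verb", "to_adjective", "to_adverb"}


def expand_query(query, thesaurus=THESAURUS):
    """Return the query terms plus their synonyms and derivatives, no duplicates."""
    expanded = []
    for term in query.lower().split():
        candidates = [term] + [
            word for link_type, word in thesaurus.get(term, [])
            if link_type in EXPANDING_TYPES
        ]
        for word in candidates:
            if word not in expanded:
                expanded.append(word)
    return expanded


print(expand_query("quick dog"))
# ['quick', 'fast', 'quickly', 'dog', 'hound']
```

Antonyms and hypernyms are left out of the expansion here on purpose: the first would invert the meaning of the query, and the second would broaden it beyond the original terms.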

Programmatic and user access to thesaurus database

The information in the thesaurus database can be accessed via API calls, SQL queries or ORM classes.

The API functions for accessing the thesaurus database are sol_Thesaurus and sol_ListLinkstxt.

The ORM library includes several .NET classes that represent the thesaurus: the WordLink class for links between word entries, the PhraseLink class for links between phrase entries, and additional classes representing tags and flags.

The Russian Grammatical Dictionary displays the contents of the thesaurus in a human-friendly way, either as lists of categorized links or as a diagram:

[Figures: thesaurus; thesaurus semantic network]

Thesaurus statistics

The share of links of each type in the Russian-English bilingual dictionary, in percent. Some rare categories are not included in the statistics:

Relation name              Percentage
synonym                    17.6
to noun                    12.1
to infinitive              10.7
to adjective               10
to english                 9.8
to russian                 9.7
to verb                    7.6
hypernym                   7
to adverbial participle    2.9
to imperfective aspect     2.3
to perfective aspect       2.3
from reflexive verb        2.1
hyponym                    1.6
to adverb                  1.6
antonym                    1.1
action                     0.4
diminutive                 0.3
actor                      0.3
gender pair                0.2
neutral style              0.2
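
A breakdown of this kind can be computed by counting links per type and dividing by the total. The sketch below shows the arithmetic in Python on a made-up toy sample; the function name `link_type_shares` and the sample list are assumptions for illustration and are not the data behind the table.

```python
from collections import Counter


def link_type_shares(link_types):
    """Map each link type to its percentage of all links, rounded to 0.1."""
    counts = Counter(link_types)
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.most_common()}


# Toy sample, not real thesaurus data: one type name per link.
sample = ["synonym"] * 5 + ["hypernym"] * 3 + ["antonym"] * 2
shares = link_type_shares(sample)
print(shares)
# {'synonym': 50.0, 'hypernym': 30.0, 'antonym': 20.0}
```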

Synonyms

Synonyms are the largest single category of links in the thesaurus, accounting for almost 18% of all links in the Russian-English bilingual thesaurus.

The main criterion for registering a pair of words as synonyms in the thesaurus is that they have the same, or almost the same, meaning and are interchangeable:

[Figure: synonyms in thesaurus]

In some cases the words have almost the same meaning but differ in style of speech or usage conditions. For example, both of the following words mean "dog", but the one on the right may be considered rude or vulgar:

[Figure: synonyms and tags]

The thesaurus allows links to be marked with special signs that represent such additional conditions for synonyms. These signs are called thesaurus tags.
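
One way a consumer such as a synonymizer might use tags is to skip links whose tags make a substitution inappropriate. The following Python sketch is an assumption-laden illustration: the `SynonymLink` type, the tag name "rude", the English sample words and the `synonyms` function are all hypothetical and are not taken from the actual thesaurus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynonymLink:
    word: str
    synonym: str
    tags: frozenset = frozenset()  # empty set means no usage restrictions


# Hypothetical links; the second one carries a style tag.
LINKS = [
    SynonymLink("dog", "hound"),
    SynonymLink("dog", "mutt", tags=frozenset({"rude"})),
]


def synonyms(word, links=LINKS, excluded_tags=frozenset({"rude"})):
    """Return synonyms of `word`, skipping links that carry an excluded tag."""
    return [link.synonym for link in links
            if link.word == word and not (link.tags & excluded_tags)]


print(synonyms("dog"))                             # ['hound']
print(synonyms("dog", excluded_tags=frozenset()))  # ['hound', 'mutt']
```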

More about synonymy ...

Antonyms

Antonyms are words with opposite meanings:

[Figure: antonyms in thesaurus]

More about antonyms ...

Diminutives

[Figure: diminutives]

More about diminutives ...

Derivatives

Derivatives cover several types of word relations in the thesaurus. Almost all of these links pair words with the same root. In most cases the words in such a relation belong to different parts of speech, which makes it possible, for example, to find the verb derived from a noun and vice versa.

[Figure: English derivatives in thesaurus]

[Figure: Russian derivatives in thesaurus]
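
Finding a word of another part of speech through a derivative link amounts to a lookup keyed by the source word and the link type. The Python sketch below is illustrative only: the `DERIVATIVES` table, its sample words and the `derive` function are hypothetical, and the underscore spelling of type names such as "to_noun" is an assumption modeled on the "to_russian" name used elsewhere in this text.

```python
# Hypothetical derivative links: (source word, link type) -> derived word.
DERIVATIVES = {
    ("decide", "to_noun"): "decision",
    ("decision", "to_verb"): "decide",
    ("quick", "to_adverb"): "quickly",
}


def derive(word, link_type, links=DERIVATIVES):
    """Return the word reached from `word` via `link_type`, or None if no link exists."""
    return links.get((word, link_type))


print(derive("decide", "to_noun"))    # decision
print(derive("decision", "to_verb"))  # decide
print(derive("quick", "to_noun"))     # None
```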

More about derivatives ...

Translations

Translations are one of the largest groups of thesaurus links in multilingual dictionaries:

[Figure: translations in multilingual thesaurus]

The grammatical dictionary has many translation link types, one for each language. For example, "to_russian" links are used for English-Russian and French-Russian word translations.

Gender pairs

Almost all nouns in Russian are assigned a gender: masculine, feminine or neuter. Usually there is no relation between grammatical gender and sex. However, nouns referring to the social roles and professions of men and women often have two variants with the same root and different suffixes, one masculine and one feminine. The male/female distinction for animals also produces pairs of nouns with the same root.

[Figure: gender pairs in thesaurus]

More about gender pairs ...

Hypernyms and hyponyms

This type of thesaurus link relates a category (hypernym) to an instance of that category (hyponym):

[Figure: hypernyms and hyponyms in thesaurus]
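
Because a hypernym can itself have a hypernym, such links can be followed step by step toward more general categories. The Python sketch below illustrates this under simplifying assumptions that are not stated in the source: each word has at most one hypernym, and the `HYPERNYM_OF` table, its sample words and the `hypernym_chain` function are hypothetical.

```python
# Hypothetical hypernym links: word -> its (single) hypernym.
HYPERNYM_OF = {"poodle": "dog", "dog": "mammal", "mammal": "animal"}


def hypernym_chain(word, hypernym_of=HYPERNYM_OF):
    """Follow hypernym links upward and return the categories in order."""
    chain = []
    seen = {word}
    while word in hypernym_of:
        word = hypernym_of[word]
        if word in seen:  # guard against accidental cycles in the data
            break
        seen.add(word)
        chain.append(word)
    return chain


print(hypernym_chain("poodle"))  # ['dog', 'mammal', 'animal']
print(hypernym_chain("animal"))  # []
```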

 

More about hypernyms and hyponyms ...

Additional information about grammatical dictionary

Grammatical dictionary API

Thesaurus - the Russian version

WordLink class - thesaurus link between word entries

PhraseLink class - thesaurus link between phrase entries

Word and phrase entries in lexicon

Russian Grammatical Dictionary

  © Козиев Илья 2019
changed 05-Feb-12