Russian and English Vocabulary and Morphology Database

Composition of the Morphological Dictionary Database

1. Scripts for creating the database in the following DBMSs:

  MySQL

  MS SQL

  FireBird

  SQLite

  Oracle

  PostgreSQL

The dictionary contains a Russian or English vocabulary with complete paradigms for the inflected parts of speech (in Russian: nouns, adjectives, participles, verbs, and comparative forms of adverbs; in English: nouns, adjectives, adverbs with synthetic degrees of comparison, and verbs) and morphological attributes for each word form; a thesaurus of synonyms, antonyms, and paronyms; and a lemmatizer.

The morphological database lets you solve the following tasks:

  Morphological analysis of individual words;

  Generating a required word form (declension of nouns, adjectives, and participles; conjugation of verbs; or forming the comparative or superlative degree of adverbs);

  Lemmatization (reducing a word to its base form), substantivization, or other grammatical transformations;

  Searching for synonyms, antonyms, translations, hypernyms, etc.

2. SQLex, a dictionary editor for MySQL

3. The ORM Persistent Dictionary library for accessing the dictionary in MySQL, MS SQL, FireBird, and Oracle, as well as via ODBC, from .NET code (C#, etc.)

Programmatic Access to the Dictionary

This version of the grammar dictionary is an ordinary relational database, so application programmers can access it with standard tools. Depending on the DBMS and the programming language, you can use a native API such as OCI, a universal API such as ODBC, or a platform-specific API such as OLE DB or ADO.NET.

You can write SQL queries yourself or use an intermediary layer such as Linq2SQL or ORM Persistent Dictionary, which lets you stay within the OOP paradigm.
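As a rough illustration of direct SQL access, the sketch below runs a morphological lookup and a word-form lookup against an in-memory SQLite database. The tables entry and word_form, their columns, the tag strings, and the sample rows are all hypothetical and were invented for this example; the real dictionary schema is described in the documentation and may look quite different.

```python
import sqlite3

# Hypothetical toy schema: table and column names are assumptions made for
# this sketch and are NOT taken from the real dictionary.
SCHEMA = """
CREATE TABLE entry (
    id INTEGER PRIMARY KEY,
    lemma TEXT NOT NULL,
    part_of_speech TEXT NOT NULL
);
CREATE TABLE word_form (
    entry_id INTEGER NOT NULL REFERENCES entry(id),
    form TEXT NOT NULL,
    tags TEXT NOT NULL
);
CREATE INDEX idx_word_form_form ON word_form(form);
"""


def build_toy_dictionary():
    """Create an in-memory database with a few invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO entry (id, lemma, part_of_speech) VALUES (?, ?, ?)",
        [(1, "cat", "NOUN"), (2, "run", "VERB")],
    )
    conn.executemany(
        "INSERT INTO word_form (entry_id, form, tags) VALUES (?, ?, ?)",
        [
            (1, "cat", "number=singular"),
            (1, "cats", "number=plural"),
            (2, "run", "tense=present"),
            (2, "ran", "tense=past"),
        ],
    )
    conn.commit()
    return conn


def analyze(conn, word):
    """Return (lemma, part of speech, tags) for every entry matching a word form."""
    cursor = conn.execute(
        "SELECT e.lemma, e.part_of_speech, f.tags "
        "FROM word_form AS f JOIN entry AS e ON e.id = f.entry_id "
        "WHERE f.form = ? ORDER BY e.lemma, f.tags",
        (word,),
    )
    return cursor.fetchall()


def inflect(conn, lemma, tags):
    """Return the word form of a lemma with the given tags, or None if absent."""
    row = conn.execute(
        "SELECT f.form "
        "FROM word_form AS f JOIN entry AS e ON e.id = f.entry_id "
        "WHERE e.lemma = ? AND f.tags = ?",
        (lemma, tags),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_toy_dictionary()
    print(analyze(conn, "cats"))
    print(inflect(conn, "run", "tense=past"))
```

The same pattern (a parameterized query joining word forms to their entries) should carry over to the other supported DBMSs through their own drivers, with only the connection code changing.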

The grammar server is available as a separate product, where the SQL dictionary is supplemented with specially compiled versions of applications and DLLs that can load the dictionary from the relational database. The grammar server lets you not only make direct queries to the database but also use any functions of the procedural API, including those for morphological or syntactic analysis of sentences.

Editing the Dictionary

The SQLex application lets you edit the grammar dictionary: add, modify, or delete entries and the relations between them (the thesaurus).

Documentation and Examples

A detailed description of the relational schema of the grammar dictionary.

An article describing the process of loading the SQL dictionary.

An introductory lesson on different ways to search for words in the dictionary.

A lesson on determining a word's part of speech.

A lesson on searching and restoring words containing the letter ё.

For the main parts of speech, you can find detailed solutions to typical tasks:

Noun

Verb

Adjective

Participle

Adverb

Adverbial participle

Demo Version of the Database

You can download a script for creating a trial version of the morphological dictionary for the following DBMSs:

MySQL (7.6 MB)

MS SQL (7.4 MB)

FireBird (3.9 MB)

PostgreSQL (7.7 MB)

Oracle (6.4 MB)

SQLite (7.7 MB)

The trial version of the dictionary includes a reduced set of dictionary entries and relations between them, limited to the 10,000 most frequently used Russian words. The relational structure of the database tables is identical to that of the commercial version. For a quick introduction, we recommend the SQLite version.
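A minimal sketch of loading such a script into SQLite from Python follows. It assumes the downloaded script is a plain UTF-8 SQL file that SQLite can execute as is; the helper names and any file names you pass in are hypothetical, and the actual loading procedure is covered by the article listed in the documentation section.

```python
import sqlite3
from pathlib import Path


def load_sql_script(db_path, script_path):
    """Run a SQL script file against a SQLite database and return the connection.

    Assumes the script is plain UTF-8 SQL text; this is an assumption about
    the download, not something confirmed by the product documentation.
    """
    script = Path(script_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    conn.executescript(script)  # executes every statement in the script
    conn.commit()
    return conn


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in cursor]
```

Calling list_tables on the returned connection is a quick way to check that the script ran and to see which tables it created before writing any queries.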

An online demo version of the dictionary is also available.

  © Козиев Илья 2019
updated 15-Mar-15