Stemmer plugin API

The search engine uses a stemmer as a simplified morphological analyzer. The Integra and FAIND distributions include a multilingual stemming engine that supports Russian, English, Spanish, Finnish, French, Italian and several other languages. You can replace it with your own stemmer plugin.

Plugin API

The API is very simple. A stemmer DLL must export the following C functions:

1. Stemmer initialization and creation.

HSTEMMER sol_CreateStemmerForLanguage( const char *lang2 )

lang2 is a two-character language identifier: "en" for English, "ru" for Russian, "de" for German, and so on.

The return value is a stemmer object handle (a pointer) that is passed to subsequent API calls.

2. Stemmer object cleanup and destruction.

void sol_DeleteStemmer( HSTEMMER hStemmer )

3. Stemming a word.

int sol_Stem( HSTEMMER hStemmer, wchar_t *Word )

Word is a Unicode (wide-character) string containing a single word.

If possible, the stemmer truncates the Word buffer in place to the word's stem and returns 0.

If an error occurs, it returns -1 or -2.
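The three exports above can be sketched as a minimal plugin skeleton. This is only an illustration under several assumptions: the HSTEMMER typedef is a guess based on the statement that the handle is a pointer, the ToyStemmer structure and the STEMMER_EXPORT macro are hypothetical names, the default C calling convention is assumed, and the mapping of -1 and -2 to particular failures is invented here because this page does not define it. The stemming rule itself (dropping a trailing "s") is a toy placeholder, not the algorithm used by the bundled engine.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical export macro: marks functions as DLL exports on Windows. */
#ifdef _WIN32
#define STEMMER_EXPORT __declspec(dllexport)
#else
#define STEMMER_EXPORT
#endif

/* Assumption: the real header is not shown here; the text only says
   that the handle is a pointer. */
typedef void *HSTEMMER;

/* Hypothetical internal state of this toy stemmer. */
typedef struct ToyStemmer {
    char lang[3]; /* two-character language code plus terminating NUL */
} ToyStemmer;

/* 1. Create a stemmer for the given two-character language code.
      Returns NULL on a bad argument or allocation failure (assumption). */
STEMMER_EXPORT HSTEMMER sol_CreateStemmerForLanguage(const char *lang2)
{
    ToyStemmer *s;

    if (lang2 == NULL || strlen(lang2) != 2)
        return NULL;

    s = (ToyStemmer *)malloc(sizeof *s);
    if (s == NULL)
        return NULL;

    memcpy(s->lang, lang2, 3); /* copies both characters and the NUL */
    return s;
}

/* 2. Destroy a stemmer created by sol_CreateStemmerForLanguage. */
STEMMER_EXPORT void sol_DeleteStemmer(HSTEMMER hStemmer)
{
    free(hStemmer); /* free(NULL) is a no-op */
}

/* 3. Stem a single word in place.
      Returns 0 on success. The meanings of -1 (bad handle) and
      -2 (bad word pointer) are assumptions made for this sketch. */
STEMMER_EXPORT int sol_Stem(HSTEMMER hStemmer, wchar_t *Word)
{
    size_t len;

    if (hStemmer == NULL)
        return -1;
    if (Word == NULL)
        return -2;

    len = wcslen(Word);

    /* Toy rule: drop a trailing 's' from words longer than three
       characters, unless the word ends in "ss". */
    if (len > 3 && Word[len - 1] == L's' && Word[len - 2] != L's')
        Word[len - 1] = L'\0';

    return 0;
}
```

In this sketch a word that cannot be shortened is left unchanged and 0 is still returned; whether the host expects that behaviour is not stated on this page. If the plugin is built as C++, the three functions would also need extern "C" so that their names are exported unmangled.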

Loading with the dictionary

The stemmer is an optional dictionary module. The path to the module file is specified in the dictionary.xml file, which is usually stored in c:\program files\integra.

The XML entry <stemmer>...</stemmer> contains a relative path to the stemmer DLL, for example dictionary\empty\stemmer.dll. Replace it with the filename of your stemmer DLL and restart the search system.

Additional information

Russian stemming

Lemmatizer API (ru)

Grammatical dictionary API (ru)

Morphology analyzer API (ru)

Syntax analyzer API (ru)


© Mental Computing 2009