Search results processing and formatting

Introduction

Result dataset formatting: -listfiles options

Convenient representation of search results: -listfiles:html option

Executing action on every file matching query

Searching for files and data retrieval - option -onceperfile

Printable context

Sorting (order of the result)

Introduction

By default the search engine tries to find only one matching context (a snippet of text) per file (see the -onceperfile option). This fragment of text is displayed on screen (this can be redefined in the ini file):

[Screenshot: displaying matching context]

Use the -showcontext=off option to suppress printing of matching contexts.


The -printfn option prints only the names of the files matching the criteria. The resulting list can be redirected to a file or piped to another program:

faind <commands> -printfn > my_files.txt

In this example a simple list of matching files is saved in a plain text file (the -listfiles option offers many more formatting capabilities).
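A list saved this way can be consumed by any script. The following is a minimal sketch, assuming that my_files.txt holds one file name per line and is UTF-8 encoded (both are assumptions, not documented behaviour); the helper name read_file_list is hypothetical and not part of FAIND.

```python
from pathlib import Path


def read_file_list(list_path, encoding="utf-8"):
    """Return the file names stored in a saved list, skipping blank lines.

    Assumption: the list contains one file name per line.
    The encoding is an assumption as well; adjust it if needed.
    """
    text = Path(list_path).read_text(encoding=encoding)
    return [line for line in text.splitlines() if line.strip()]
```

Calling read_file_list("my_files.txt") returns a plain Python list of names that can then be filtered, counted or passed to other tools.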


Result dataset formatting: -listfiles options

By default, the search engine prints the names of the files that match the given criteria to the standard output. In addition, the results can be stored as a file in several formats: TXT (names of matching files), HTML (UTF-8 encoding), XML (UTF-8 encoding) and SQL script.

A TXT result file (UTF-8 encoded) is generated with -listfiles:txt file_name. See an example of a generated file. As you can see, it contains file names only. A more informative TXT format is generated by the options:

-tune txtres_rich -listfiles:txt file_name

See the result file.

 

The -listfiles option generates an XML file with the results. For example:

faind c:\ -name *.txt -listfiles my_files.xml ...

The syntax -listfiles:xml is equivalent to -listfiles. Sample results can be viewed here.

The -listfiles:html filename option generates an HTML file with the results (it looks like the results page of an Internet search engine). More information about this format is available here.

The -listfiles:sql filename option creates a text file with SQL commands. This file (an SQL script) can be executed in an RDBMS to load the results into a relational database. The database schema (ER diagram) is described here. See an example of a file in this format.

The -listfiles:odbc login/psw@alias option puts the search results into an SQL database through an ODBC connection (see the description of the database schema).

 

Convenient representation of search results: -listfiles:html option

Although FAIND is a command-line tool, it may be convenient to get search results in a more presentable format than XML or SQL. The -listfiles:html filename option generates results in HTML format, which can be viewed in a web browser (UTF-8 support is required to display national characters; MSIE, Mozilla or Firefox are fine).

An example of generated HTML file can be seen here.

As you can see, it looks like the results page of an Internet search engine (although simpler). The links refer to the found files that contain the query patterns. More importantly, the links also lead to files that are actually located inside compressed files or archives.


Executing action on every file matching query

It is possible to execute a command for every matching file. The -exec option is used like this:

-exec ls '{}' ';'

The characters {} are replaced with the name of the matching file. The semicolon ; marks the end of the -exec command. The example above prints short information about every matching file (using the GNU/Linux shell command ls).

The -ok option does the same work as -exec but asks for confirmation before processing each file.
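The substitution rule described above can be illustrated with a short sketch. This is not FAIND's implementation, only a model of the documented behaviour under the assumption that the arguments after -exec arrive as a list of strings; the function name build_command is hypothetical.

```python
def build_command(template, filename):
    """Expand an -exec style template for one matching file.

    template: the arguments written after -exec, e.g. ["ls", "{}", ";"].
    Everything before the first ";" is treated as the command, and every
    "{}" inside it is replaced by the name of the matching file.
    """
    if ";" not in template:
        raise ValueError("the command must be terminated by ';'")
    command = template[: template.index(";")]
    return [argument.replace("{}", filename) for argument in command]
```

For the example above, build_command(["ls", "{}", ";"], "notes.txt") yields the argument list ["ls", "notes.txt"], which could then be run once per matching file, for instance with subprocess.run.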


Searching for files and data retrieval - option -onceperfile

By default the search engine tries to find the pattern only once per file. This search strategy can be called file search.

There is another search strategy: data retrieval. In this mode the search engine looks for every context matching the pattern in each file.

The option

-onceperfile=false

tells the search engine to use the data retrieval strategy.
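The difference between the two strategies can be sketched as follows. This is an illustration only: it uses Python regular expressions as a stand-in for FAIND patterns, and the function name find_contexts and its once_per_file parameter are hypothetical.

```python
import re


def find_contexts(text, pattern, once_per_file=True):
    """Return the (start, end) spans of matches in the text of one file.

    once_per_file=True  models file search: stop after the first match.
    once_per_file=False models data retrieval: collect every match.
    """
    matches = re.finditer(pattern, text)
    if once_per_file:
        first = next(matches, None)
        return [] if first is None else [first.span()]
    return [match.span() for match in matches]
```

File search is enough to decide whether a file belongs in the result list, while data retrieval does more work per file but reports every occurrence.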


Printable context

By default the printable context includes 8 lexemes before the first and 8 after the last lexeme of the matched context. This default value is set in the ini file. You can set it explicitly with the option:

-lexems_margin=N
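The effect of the margin can be shown with a small sketch. It assumes that the text has already been split into a list of lexemes and that the positions of the first and last matched lexeme are known; the function name printable_context is hypothetical and the code is not taken from FAIND.

```python
def printable_context(lexemes, first, last, margin=8):
    """Return the matched lexemes plus up to `margin` lexemes on each side.

    first, last: inclusive indices of the first and last matched lexeme.
    The window is clipped at the beginning and end of the lexeme list.
    """
    start = max(0, first - margin)
    end = min(len(lexemes), last + margin + 1)
    return lexemes[start:end]
```

With the default margin of 8 and a two-lexeme match in the middle of a long text, the printed context holds 18 lexemes; near the start or end of a file it is shorter.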


Sorting (order of the result)

By default the hits are listed in the dataset without any sorting. The search results can be sorted to show the most relevant matches first. The command

-sort freq_rank

activates sorting by keyword frequency (see -index calc_freq_rank).

The search domain must be indexed with the -index frequency command so that the statistical information is available for relevance estimation.

The HTML result page (see -listfiles option) looks like this:

[Screenshot: search result page with document ranking]

Only the relative values of document ranks are important.

Versions later than 0.92 implement additional sorting criteria:

-sort filename

by filenames (alphabetically);

-sort size

by file size;

-sort cdate

by file creation date;

-sort mdate

by file modification date;

Additionally, there are two modifier commands:

-sort asc

arrange the results in ascending order;

-sort desc

arrange the results in descending order.
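The combination of a sorting criterion with a direction modifier can be modelled as below. This is a sketch under stated assumptions: the Hit record, its fields and the sort_hits function are hypothetical, modification dates are stored as plain numbers, and file names are compared case-sensitively, which may differ from what FAIND does.

```python
from operator import attrgetter
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical result record; not FAIND's actual data model."""
    filename: str
    size: int
    mdate: float  # modification time, e.g. seconds since the epoch


# Maps a criterion name (as used after -sort) to a key function.
SORT_KEYS = {
    "filename": attrgetter("filename"),
    "size": attrgetter("size"),
    "mdate": attrgetter("mdate"),
}


def sort_hits(hits, criterion, descending=False):
    """Return a new list of hits ordered by the chosen criterion."""
    return sorted(hits, key=SORT_KEYS[criterion], reverse=descending)
```

Here sort_hits(hits, "size", descending=True) corresponds to asking for the results by file size in descending order.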

Limiting the number of matched contexts

The number of records in the result dataset is not limited by default. You can specify the maximum number of contexts to find with the command:

-maxhitcount=N

Additional information

Embeddable search engine API

Search engine commands

Mental Computing 2009

changed 18-Apr-10