This page is also available in Spanish/Español
Swish-e 2.x is a new version of the indexer package SWISH-E built by Kevin Hughes. The original URL is http://sunsite.berkeley.edu/SWISH_E. There you can find the documentation, the mailing list, etc. Swish-e 1.3.X is free software, so Swish-e 2.x is also free software. You can use it wherever you want. More information about the license is available on the main site.
The latest stable version is 2.0.5. This version includes the following additions:
Phrase search. Use " to delimit the phrase.
Eg: swish-e -w 'tit="this is a phrase"' -f index_file
Eg: swish-e -w 'tit="th* is a pra*"' -f index_file
Limited support for XML documents. It supports tags like <field> </field>. Nested tags are allowed.
Eg: <field1> bla bla <field2> bla bla </field2> bla bla</field1>
Fixes several bugs and some memory leaks from 1.3.X.
New option for sorting results: sort by relevance (the previous behavior) or by one or more fields (properties).
Eg: swish-e -w 'search' -f index_file -s title address
Faster indexing and searching.
Automatic extraction of fields (metanames) by including the reserved word automatic in the Metanames option of the config file. (Do not use this feature with HTML documents: they include tags like <p> without the corresponding end tag.)
Includes the external filtering option from Rainer Scherg (it allows parsing PDF documents, Word documents, etc.). More information
Now you can put your stopwords in an external file using IgnoreWords File:path-to-file in the config file. Stopword files are available in German (contributed by Rainer Scherg), English (taken from the 1.3.X distribution), Dutch (contributed by Bas Meijer) and Spanish (contributed by me). This option was contributed by Rainer Scherg.
Eg: IgnoreWords File:/path/german.txt
New option for the config file: TranslateCharacters. This option replaces some characters with different ones before a word is indexed. This is very useful, for example, for replacing accented characters with their unaccented equivalents, which helps mainly with non-English languages.
Eg: TranslateCharacters áéíóú aeiou
With this configuration the word camión will be indexed as camion and the word árbol as arbol.
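The effect of the TranslateCharacters example can be pictured with a small Python sketch. This is only an illustration of the character mapping, not swish-e code; the function name translate_characters is made up for the example.

```python
# Illustrative only: mimics the effect of "TranslateCharacters áéíóú aeiou".
# translate_characters is a hypothetical helper, not part of swish-e.
def translate_characters(word, source, target):
    """Replace each character of `source` with the character at the same position in `target`."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    return word.translate(str.maketrans(source, target))


for word in ("camión", "árbol"):
    # Each word is normalized before it would be stored in the index.
    print(word, "->", translate_characters(word, "áéíóú", "aeiou"))
```

Characters that are not listed in the first string pass through unchanged.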
The old option -D now shows more information about the contents of the index file. If you also use -v 4, the output is even richer.
Development of new features continues. The latest beta is 2.1-dev20:
A C library. The code has been partly rewritten to produce a "thread-safe" C library. This library is being used in the development of a Perl module and a PHP extension. More information about the library
A fully functional example Perl module, based on the C library. This helps with writing Perl CGI scripts.
Now you can define document types using DefaultContents and IndexContents in your config file. To date there are only 3 document types: text (TXT), HTML (HTML) and XML (XML).
Eg:
DefaultContents TXT
IndexContents XML .xml
IndexContents HTML .htm .html .php .php3
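As a rough picture of how the example above could be read, the following Python sketch maps file extensions to document types and falls back to the default. It is not swish-e's parser: the function names are hypothetical, and matching extensions case-insensitively is an assumption made for the example.

```python
import os

CONFIG = """\
DefaultContents TXT
IndexContents XML .xml
IndexContents HTML .htm .html .php .php3
"""


def parse_content_rules(config_text):
    """Return (default_type, {extension: type}) from the two directives (hypothetical helper)."""
    default = None
    by_extension = {}
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "DefaultContents":
            default = parts[1]
        elif len(parts) >= 3 and parts[0] == "IndexContents":
            for extension in parts[2:]:
                # Assumption: extensions are compared case-insensitively.
                by_extension[extension.lower()] = parts[1]
    return default, by_extension


def content_type(path, default, by_extension):
    """Pick the document type from the file extension, else the default."""
    extension = os.path.splitext(path)[1].lower()
    return by_extension.get(extension, default)


default, by_extension = parse_content_rules(CONFIG)
for path in ("docs/index.html", "data/feed.xml", "notes/readme"):
    print(path, "->", content_type(path, default, by_extension))
```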
As an option, the indexing process can use less memory in economic mode (option -e). If set, the indexing process writes part of the information to temporary files. This option is very useful if your machine does not have enough memory. You can detect this condition if indexing takes a long time (look at your swap usage). By default (without -e) swish-e keeps all data in memory while indexing.
Extended search output using the -x option. If your search uses more than one index file at the same time, it displays the header info of all the index files. Also, each result line gets a new value: the index file the result came from. All the results are displayed merged together, as if you had searched a single index file.
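The idea of merged results from several index files can be sketched in Python as follows. This does not reproduce the real -x output format; the (rank, path) tuples, the index file names and the ordering by rank are assumptions made only for illustration.

```python
# Illustrative only: merges per-index result lists into one list,
# tagging each result with the index file it came from.
def merge_results(results_by_index):
    """results_by_index maps an index file name to a list of (rank, path) tuples."""
    merged = [
        (rank, path, index_file)
        for index_file, results in results_by_index.items()
        for rank, path in results
    ]
    # Assumption: the combined list is ordered by rank, highest first.
    merged.sort(key=lambda result: result[0], reverse=True)
    return merged


# Hypothetical sample data for two index files.
sample = {
    "a.idx": [(1000, "/docs/x.html"), (400, "/docs/y.html")],
    "b.idx": [(700, "/manual/z.html")],
}
for rank, path, index_file in merge_results(sample):
    print(rank, path, index_file)
```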
Optional compression of the file data (file path, title and properties). Indexing is slower, but input/output during searches is reduced. More information
As in version 2.0.X, it can sort the result list by relevance or by properties, but now you can also combine ascending and descending sorting (using asc and desc).
Eg: swish-e -w 'search' -f index_file -s title asc otherfield desc
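A combined ascending/descending sort like the one in the example can be sketched in Python. This only illustrates the ordering; it is not how swish-e implements it. The helper names are hypothetical, and treating a field with no direction as ascending is an assumption.

```python
# Illustrative only: sorts result records by several properties,
# each with its own direction, e.g. "title asc otherfield desc".
def parse_sort_spec(tokens):
    """Turn ['title', 'asc', 'otherfield', 'desc'] into [('title', False), ('otherfield', True)]."""
    keys = []
    for token in tokens:
        if token in ("asc", "desc") and keys:
            keys[-1] = (keys[-1][0], token == "desc")
        else:
            # Assumption: a field without a direction sorts ascending.
            keys.append((token, False))
    return keys


def sort_results(results, tokens):
    ordered = list(results)
    # list.sort is stable, so sorting by the least significant key first
    # leaves ties of later passes in the right order.
    for name, descending in reversed(parse_sort_spec(tokens)):
        ordered.sort(key=lambda record: record[name], reverse=descending)
    return ordered


# Hypothetical sample records.
records = [
    {"title": "b", "otherfield": 1},
    {"title": "a", "otherfield": 2},
    {"title": "a", "otherfield": 5},
]
for record in sort_results(records, ["title", "asc", "otherfield", "desc"]):
    print(record["title"], record["otherfield"])
```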
New directive in the config file: BumpPositionCounterCharacters. With this option, when one of the listed characters is found, the word position counter is incremented. This is useful for separating phrases inside a document.
Eg:
BumpPositionCounterCharacters |
Consider this document: this a phrase | this is another phrase. With the option set, you cannot find the phrase "phrase this". Without it you can, because "phrase" and "this" have consecutive position counters.
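The example above can be reproduced with a toy position counter in Python. This is a sketch of the idea only, not swish-e's indexer; both function names are hypothetical, and words are simply taken as runs of letters and digits.

```python
# Illustrative only: a toy word-position counter and phrase matcher.
def word_positions(text, bump_chars=""):
    """Return (word, position) pairs; each bump character adds one extra step."""
    positions = []
    position = 0
    word = []
    for ch in text + " ":  # trailing space flushes the last word
        if ch.isalnum():
            word.append(ch)
            continue
        if word:
            position += 1
            positions.append(("".join(word).lower(), position))
            word = []
        if ch in bump_chars:
            position += 1  # leaves a gap so phrases cannot span this character
    return positions


def contains_phrase(positions, phrase):
    """True if the phrase words occur at consecutive positions."""
    words = phrase.lower().split()
    for i in range(len(positions) - len(words) + 1):
        window = positions[i:i + len(words)]
        same_words = [w for w, _ in window] == words
        consecutive = all(
            window[j + 1][1] - window[j][1] == 1 for j in range(len(window) - 1)
        )
        if same_words and consecutive:
            return True
    return False


document = "this a phrase | this is another phrase"
print(contains_phrase(word_positions(document), "phrase this"))       # True
print(contains_phrase(word_positions(document, "|"), "phrase this"))  # False
```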
New directive in the config file: UseWords. With this option, only the words in the list are indexed. Like IgnoreWords, it can use an external file.
Eg:
UseWords word1 word2 word3
Eg:
UseWords File: path_to_external_file
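The difference between UseWords and IgnoreWords can be pictured with two small Python filters. These are illustrations only, with hypothetical function names; how swish-e itself reads the word files is not shown here.

```python
# Illustrative only: UseWords keeps just the listed words,
# IgnoreWords drops the listed words (stopwords).
def apply_use_words(words, allowed):
    """Keep only the words that appear in the allowed list."""
    allowed = set(allowed)
    return [word for word in words if word in allowed]


def apply_ignore_words(words, stopwords):
    """Drop the words that appear in the stopword list."""
    stopwords = set(stopwords)
    return [word for word in words if word not in stopwords]


document_words = ["the", "word1", "is", "near", "word3"]
print(apply_use_words(document_words, ["word1", "word2", "word3"]))  # ['word1', 'word3']
print(apply_ignore_words(document_words, ["the", "is"]))             # ['word1', 'near', 'word3']
```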
New command line option -k. It returns all the words in the index file that start with the given character.
Eg:
swish-e -k t -f index_file
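What -k returns can be pictured with a short Python sketch over a plain list of words. The word list is made up, the function name is hypothetical, and the sorted output is an assumption; it does not read a real index file.

```python
# Illustrative only: lists the indexed words that start with a given character.
def words_starting_with(index_words, character):
    # Assumption: results are shown in alphabetical order, without duplicates.
    return sorted({word for word in index_words if word.startswith(character)})


# Hypothetical index vocabulary.
vocabulary = ["title", "search", "tag", "index", "text"]
print(words_starting_with(vocabulary, "t"))  # ['tag', 'text', 'title']
```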
Swish-e 2.X has been developed entirely under Linux and has been tested on Solaris and AIX. Although it was not initially developed for Windows, Windows users can find binaries at http://www.webaugur.com/wares/swish (thanks to David Norris).