SWISH-E 2


SWISH-E 2 is a new version of the indexer package SWISH-E built by Kevin Hughes. The original URL is http://sunsite.berkeley.edu/SWISH_E. There you can find the documentation, the mailing list, etc. Swish-e 1.3.X is free software, so Swish-e 2.x is also free software. You can use it wherever you want. More information about the license is available on the main site.
The latest stable version is 2.0.5. This version includes the following additions:
	Phrase search. Use double quotes (") to delimit the phrase.
	Eg: swish-e -w 'tit="this is a phrase"' -f index_file
	Eg: swish-e -w 'tit="th* is a pra*"' -f index_file
	Limited support for XML documents. It supports tags such as <field> </field>. Nested tags are allowed.
	Eg: <field1> bla bla <field2> bla bla </field2> bla bla</field1>
	Fixes several bugs and some memory leaks present in 1.3.X.
	New option for sorting results: sort by relevance (the previous behavior) or by one or more fields (properties).
	Eg: swish-e -w 'search' -f index_file -s title address
	Faster indexing and searching.
	Automatic extraction of fields (metanames) by including the reserved word automatic in the Metanames option of the config file. (Do not use this feature with HTML documents, which often contain tags such as <p> without the corresponding end tag.)
	Includes an external filtering option from Rainer Scherg (it allows the parsing of PDF documents, Word documents, etc.).
	You can now put your stopwords in an external file by using IgnoreWords File:path-to-file in the config file. Stopword files are available in German (contributed by Rainer Scherg), English (taken from the 1.3.X distribution), Dutch (contributed by Bas Meijer) and Spanish (contributed by me). This option was contributed by Rainer Scherg.
	Eg: IgnoreWords File:/path/german.txt
	New option for the config file: TranslateCharacters. This option replaces some characters with different ones before a word is indexed. It is very useful, for example, for replacing accented characters with their unaccented equivalents, which makes it especially helpful for languages other than English.
	Eg: TranslateCharacters áéíóú aeiou
	With this configuration, the word camión is indexed as camion and the word árbol as arbol.
	The old option -D now shows more information about the contents of the index file. If you also use -v 4, the output is even richer.
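
The effect of the TranslateCharacters example in the list above can be pictured as a character-by-character substitution applied to each word before it is indexed. The Python sketch below only illustrates that idea and is not Swish-e code; the helper name translate_word is hypothetical.

```python
# Illustration only: mimics the effect of "TranslateCharacters áéíóú aeiou".
# translate_word is a hypothetical helper, not part of Swish-e.
SOURCE_CHARS = "áéíóú"
TARGET_CHARS = "aeiou"

# str.maketrans pairs each source character with the target character
# found at the same position in the second string.
TABLE = str.maketrans(SOURCE_CHARS, TARGET_CHARS)


def translate_word(word: str) -> str:
    """Return the word as it would look after character translation."""
    return word.translate(TABLE)


if __name__ == "__main__":
    for word in ("camión", "árbol"):
        print(word, "->", translate_word(word))
```

Because both strings are matched position by position, they must have the same length, which mirrors the two equally long character lists given to the directive in the example.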


Development of new features continues. The latest beta is 2.1-dev20:
	A C library. The code has been partly rewritten to provide a "thread-safe" C library. This library is being used in the development of a Perl module and a PHP extension.
	An example, but fully functional, Perl module based on the C library. It simplifies writing Perl CGI scripts.
	You can now define document types using DefaultContents and IndexContents in your config file. So far there are only three types of documents: text (TXT), HTML (HTML) and XML (XML).
	Eg:
	DefaultContents TXT
	IndexContents XML .xml
	IndexContents HTML .htm .html .php .php3
	Optionally, the indexing process can use less memory through the economic mode (option -e). If set, the indexing process writes part of its information to temporary files. This option is very useful if your machine does not have enough memory. A sign of this condition is an indexing run that takes a long time (look at your swap usage). By default (without -e), swish-e keeps all data in memory during indexing.
	Extended search output using the -x option. If your search uses more than one index file at the same time, it displays the header information of all the index files. Also, each result line gains a new value: the index file the result came from. All the results are merged, as if you had searched a single index file.
	Optional compression of the file data (file path, title and properties). Indexing is slower, but input/output during searches is reduced.
	As in version 2.0.X, the result list can be sorted by relevance or by properties, but now you can also combine ascending and descending sort orders (using asc and desc).
	Eg: swish-e -w 'search' -f index_file -s title asc otherfield desc
	New directive in the config file: BumpPositionCounterCharacters. With this option, whenever one of the listed characters is found, the word position counter is incremented. This is useful for separating phrases inside a document.
	Eg:
	BumpPositionCounterCharacters \|
	Consider this document: *this is a phrase \| this is another phrase*. With the option set, a search for the phrase "phrase this" finds nothing. Without it, the search matches, because "phrase" and "this" have consecutive position counters.
	New directive in the config file: UseWords. With this option, only the words in the list are indexed. Like IgnoreWords, it can use an external file.
	Eg:
	UseWords word1 word2 word3
	Eg:
	UseWords File: path_to_external_file
	New command line option -k. It returns all the words in the index file starting with the given character.
	Eg:
	swish-e -k t -f index_file
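
The BumpPositionCounterCharacters behavior described in the list above can be sketched with a toy model of word positions. This is only an illustration, not Swish-e's indexer. It assumes the bump character is the pipe (|), that words are separated by whitespace, and that a phrase matches when its words occupy consecutive positions. The names word_positions and phrase_found are hypothetical.

```python
# Illustration only: a toy model of word positions, not Swish-e's real indexer.
# Assumption: the bump character is "|" and tokens are whitespace-separated.

def word_positions(text, bump_chars=""):
    """Return (word, position) pairs for the words in text.

    Non-alphanumeric tokens are never indexed. If such a token contains a
    bump character, the position counter still advances by one, which
    leaves a gap between the words on either side of it.
    """
    pairs = []
    position = 0
    for token in text.split():
        if not token.isalnum():
            if any(ch in bump_chars for ch in token):
                position += 1  # bump: skip one position, index nothing
            continue
        position += 1
        pairs.append((token, position))
    return pairs


def phrase_found(pairs, phrase):
    """Return True if the phrase's words appear at consecutive positions."""
    words = phrase.split()
    for i in range(len(pairs) - len(words) + 1):
        window = pairs[i:i + len(words)]
        same_words = [w for w, _ in window] == words
        consecutive = all(b[1] - a[1] == 1 for a, b in zip(window, window[1:]))
        if same_words and consecutive:
            return True
    return False


if __name__ == "__main__":
    doc = "this is a phrase | this is another phrase"
    print(phrase_found(word_positions(doc), "phrase this"))
    print(phrase_found(word_positions(doc, bump_chars="|"), "phrase this"))
```

In this model the bump leaves one unused position between "phrase" and the following "this", so the two words are no longer adjacent, while phrases that lie entirely on one side of the separator still match.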


Swish-e 2.X has been developed entirely under Linux and has been tested on Solaris and AIX. Although it was not initially developed for Windows, Windows users can find binaries at http://www.webaugur.com/wares/swish (thanks to David Norris).

Credits

jmruiz@boe.es