Re: [aseek-users] utf-8 vs. unicode

From: Alexander F Avdonkin (no email)
Date: Thu Dec 13 2001 - 09:25:52 EST


Markus Rietzler wrote:

>
>
> since aspseek 1.2.5 there is the new utf-8 storage mode.
>
> it says that utf-8 will reduce memory and harddrive space and increase
> indexing and search speed. now i have two questions:
>
> 1) with index -b can an existing index/database be converted to utf-8.
> can this be done with a productive index? you also have to set up a
> special config entry in searchd/aspseek.conf. what happens during
> conversion? can the database still be searched or do i have a
> downtime. at the moment we have indexed at about 150.000 webpages.

The old database can still be searched during the conversion process.
Search will only be interrupted for a short time while the MySQL tables
are renamed and "searchd" is restarted.

>
> 2) utf-8 will be most efficient with us-ascii. what happens when there
> are also words with special chars like the german umlauts. will there
> still be all those improvements?
>

ASCII characters in those words are encoded as 1 byte each, and umlauts
are encoded as 2 bytes each.
Either way, the "wordurl" table will be smaller than with plain
Unicode storage.
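
To illustrate the byte counts, here is a small Python sketch (not ASPseek
code). It assumes that "plain unicode storage" means a fixed 2 bytes per
character, which is modelled here with UTF-16-LE; the function names and
sample words are made up for the example.

```python
def utf8_size(word: str) -> int:
    """Bytes needed to store the word as UTF-8."""
    return len(word.encode("utf-8"))


def fixed16_size(word: str) -> int:
    """Bytes needed at a fixed 2 bytes per character.

    Assumption: this stands in for the "plain unicode" storage mode;
    UTF-16-LE gives 2 bytes for every character used below.
    """
    return len(word.encode("utf-16-le"))


# Sample words: plain ASCII, one umlaut, several non-ASCII letters.
# Escapes are used so the example does not depend on the file encoding.
words = ["search", "K\u00e4se", "\u00dcbergr\u00f6\u00dfe"]

for word in words:
    print(word, utf8_size(word), fixed16_size(word))
```

Even for the last word, where three of nine letters are non-ASCII, UTF-8
needs 12 bytes against 18 for the fixed 2-byte form.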

>
> Markus Rietzler
> * kommunikation & online service
> * RZF NRW
> * Tel: 0211.4572-130







