Re: [aseek-users] Time frame thoughts

From: Karen Barnes (no email)
Date: Tue Oct 29 2002 - 22:27:52 EST


Hello Matt,

I can understand you are part of the development team, but I can tell you
from experience that this is exactly what happened to me. Here is the actual
configuration I had (by mistake) which caused the indexer to go into an
infinite loop.

The very first time I ran the indexer I ONLY indexed our site which
consisted of roughly 250 pages. Here's the actual configuration:

Period 1m

server http://www.mysite.com/
server http://www.mysite.com/page2.html

and so on.

Keep in mind that this was a FRESH install with NO other URLs in the
database at all. I did NOT have the index follow links as I wanted to
restrict the index to specific pages. When I ran index I could see the same
pages being indexed over and over and over for several minutes. Then I
stopped the indexer (./index -E) and found that I had set the Period to 1m
(one minute). I changed this to:

Period 14d

deleted the site using index just to make sure:

./index -C "http://www.mysite.com%"

and then ran index again. Within a couple of seconds the indexer stopped,
everything was indexed, and it worked as expected.

So the design may say it can't loop, but I can assure you through experience
that this is how it reacted when I did it this way.
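
A toy model of the symptom, for illustration only. This is NOT ASPseek's
actual queuer (Matt's description below says a URL is indexed at most once
per run). It assumes a naive scheduler that re-queues any URL as soon as its
Period has expired, and a made-up fetch time of one second per page. The
function name and all parameters are hypothetical.

```python
def simulate_run(num_pages, period_s, fetch_s=1.0, max_fetches=2000):
    """Simulate one index run under a naive 'refetch when expired' rule.

    Returns (fetches, finished). finished is False when the run was still
    finding expired URLs after max_fetches fetches.
    Assumption: this is a toy scheduler, not ASPseek's real URL queuer.
    """
    if num_pages == 0:
        return 0, True
    next_due = [0.0] * num_pages   # fresh database: every URL is due at t=0
    now = 0.0
    fetches = 0
    while fetches < max_fetches:
        # Pick the URL whose due time is earliest.
        i = min(range(num_pages), key=lambda k: next_due[k])
        if next_due[i] > now:
            return fetches, True   # nothing has expired, the run ends
        now += fetch_s             # assumed cost of fetching one page
        next_due[i] = now + period_s
        fetches += 1
    return fetches, False          # gave up: URLs kept expiring mid-run


if __name__ == "__main__":
    print(simulate_run(250, 60))              # Period 1m
    print(simulate_run(250, 14 * 24 * 3600))  # Period 14d
```

Under these assumptions, 250 pages at one second each take longer than the
one-minute Period, so the first pages have expired again before the last
ones are fetched and the run never drains its queue. With Period 14d the
same run stops after 250 fetches.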

BTW - You asked me to post the printout of "ulimit -a". Have you had a
chance to look at this? Hoping I can stop the "can't connect to host" errors
I'm experiencing.

Regards,
Karen

>On Tue, 29 Oct 2002 at 16:53:54 -0700, Karen Barnes wrote:
>
> > using the "Period" command? For example; if you have this set like the
> > following:
> >
> > Period 14d
> >
> > then you have set a reindex every 14 days and if you run the indexer
> > for 14 days non stop the process is going to start all over again and
> > never finish. When I run an initial crawl I set this to a very large
> > number like
>
>This is not correct. The URL queuer queues URLs in incrementally
>increasing time slices. It can never loop in time during a single index
>run and an URL will not be indexed more than once per run (even if
>expired).
>
>
>Matt.
