From: Karen Barnes (no email)
Date: Tue Oct 29 2002 - 22:27:52 EST
I understand you are part of the development team, but I can tell you
from experience that this is exactly what happened to me. Here is the actual
configuration I had (by mistake), which caused the indexer to go into an
endless loop.
The very first time I ran the indexer I ONLY indexed our site, which
consisted of roughly 250 pages. Here's the actual configuration:
and so on.
Keep in mind that this was a FRESH install with NO other URLs in the
database at all. I did NOT have the indexer follow links, as I wanted to
restrict the index to specific pages. When I ran index I could see the same
pages being indexed over and over and over for several minutes. Then I stopped
the index (./index -E) and found that I had set the Period to 1m (one minute). I
changed this to:
Then I deleted the site using index, just to make sure:
./index -C "http://www.mysite.com%"
and then ran index again, and within a couple of seconds the indexer stopped
and everything was indexed and worked as expected.
So the design may say it can't loop, but I can assure you through experience
that this is how it reacted when I did it this way.
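To sum up, the mistake and the fix came down to one line in the configuration file (directive syntax as I used it; your values and file locations may differ):

```
# Wrong: pages expire after one minute, so a crawl that takes
# longer than that keeps re-queuing the same pages
Period 1m

# Fixed: reindex every 14 days (or set a very large number for
# an initial crawl)
Period 14d
```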
BTW - You asked me to post the printout of "ulimit -a". Have you had a
chance to look at this? Hoping I can stop the "can't connect to host" errors.
>On Tue, 29 Oct 2002 at 16:53:54 -0700, Karen Barnes wrote:
> > using the "Period" command? For example, if you have this set like the
> > following:
> > Period 14d
> > then you have set a reindex every 14 days, and if you run the indexer for
> > days non-stop, the process is going to start all over again and never
> > finish. When I run an initial crawl I set this to a very large number
>This is not correct. The URL queuer queues URLs in incremental
>time slices. It can never loop in time during a single index run, and a URL
>will not be indexed more than once per run (even if expired).
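For what it's worth, the behavior the developer describes can be sketched in a few lines (this is an illustration of the idea, not the actual indexer code): a URL is queued only if its next-index time fell before the moment the run started, so a URL that expires mid-run is left for the next run and a single run cannot loop.

```python
import time

def queue_due_urls(urls, run_start):
    """Queue each URL at most once per run.

    A URL is due only if its next-index timestamp falls at or before
    the moment the run started; URLs expiring *during* the run are
    left for the next run, so one run cannot loop over them.
    """
    seen = set()
    due = []
    for url, next_index_time in urls:
        if next_index_time <= run_start and url not in seen:
            seen.add(url)
            due.append(url)
    return due

# Hypothetical data: (url, next-index timestamp)
now = time.time()
urls = [
    ("http://www.mysite.com/a.html", now - 60),  # expired before the run
    ("http://www.mysite.com/a.html", now - 60),  # duplicate: queued once
    ("http://www.mysite.com/b.html", now + 60),  # expires mid-run: skipped
]
print(queue_due_urls(urls, run_start=now))
# -> ['http://www.mysite.com/a.html']
```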