From: mark david mcCreary (no email)
Date: Fri Jan 04 2002 - 10:47:14 EST
I have a web archive that constantly has new web pages added to it.
Each web page that I want indexed has this filename pattern -
msgxxxxx.html (where xxxxx is a unique number).
My first attempt at getting new pages indexed was to run a crontab
job that calls index. The aspseek.conf file has an include line that
pulls in the server statements:
include /home/mhonarc/aspseek_server_start_url
That file contains lines like this:
AuthBasic listone:
Server http://www.internet-tools.com/listone/
The aspseek.conf file also has these lines:
Allow msg.*\.html$ \/$
Disallow .*
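
To make the intent of the filter lines above concrete, here is a rough Python sketch of how I expect them to sort URLs. It assumes the rules are tried in order and the first matching pattern wins, and that a pattern may match anywhere in the URL; I have not confirmed that ASPseek evaluates them exactly this way. The decide() helper and the sample URLs (other than the Server URL) are made up for illustration.

```python
import re

# Patterns copied from the Allow/Disallow lines in aspseek.conf.
# Assumption: rules are checked top to bottom and the first match wins.
RULES = [
    ("Allow", re.compile(r"msg.*\.html$")),
    ("Allow", re.compile(r"\/$")),
    ("Disallow", re.compile(r".*")),
]

def decide(url):
    """Return the action of the first rule whose pattern matches the URL."""
    for action, pattern in RULES:
        if pattern.search(url):
            return action
    return "Allow"  # hypothetical default when nothing matches

base = "http://www.internet-tools.com/listone/"
# The file names below are hypothetical examples, not real pages.
for url in (base, base + "msg00042.html", base + "threads.html"):
    print(decide(url), url)
```

Under these assumptions the directory URL and the msgxxxxx.html pages are allowed and everything else is rejected, which is the behaviour I am after.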
This works fine the first time I run index.
The next time the index program is called, no new URLs are found to
process, despite new messages having been added to the web site.
Does anybody have any suggestions on how to get around this bug, or
another way to index recently added web pages?
Thanks
mark