Re: [aseek-users] Need help limiting found documents to one directory

From: Kir Kolyshkin (no email)
Date: Mon Jan 13 2003 - 10:12:44 EST


Looks like you have done everything right. Hmm... could you check searchd's log
file /usr/local/aspseek/var/aspseek12/dlog.log for any "Subset not found"
messages? Also, as a last resort, try restarting searchd.
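
If the log is large, a short script can pull out the relevant lines. The Python
sketch below is only an illustration: it assumes dlog.log is plain text with one
message per line, and that the message has the shape reported further down in
this thread ("Subset http://www.jhuccp.org/ not found"). The function name
find_subset_errors is made up for this example and is not part of aspseek.

```python
import re
from pathlib import Path

# Assumed message shape, taken from the line reported in this thread:
#   Subset http://www.jhuccp.org/ not found
SUBSET_ERROR = re.compile(r"Subset\s+(\S+)\s+not found")


def find_subset_errors(lines):
    """Return the subset masks that the log reports as not found."""
    masks = []
    for line in lines:
        match = SUBSET_ERROR.search(line)
        if match:
            masks.append(match.group(1))
    return masks


if __name__ == "__main__":
    # Log path as given in the message above.
    log_path = Path("/usr/local/aspseek/var/aspseek12/dlog.log")
    if log_path.exists():
        with log_path.open(errors="replace") as log:
            for mask in find_subset_errors(log):
                print("searchd could not find subset:", mask)
```

Comparing the masks it prints with the rows in the subsets table shows whether
searchd is being asked for a subset that differs from the one stored.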

KEVIN ZEMBOWER wrote:
> I'm trying to restrict the found documents to ones in a particular directory. Our aspseek search engine is at http://www.jhuccp.org/cgi-bin/s.cgi.
>
> If you enter a search term like 'advocacy', you should get a return of about 424 documents. To do this, aspseek uses this URL:
> http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0
>
> We want to limit the found documents to the ones that have 'advocacy' in them in the /popreporter/ directory. To do this, I created this record in MySQL:
> www:/usr/local/aspseek/etc# mysql -u aspseek12 -p aspseek12
> mysql> select * from subsets;
> +-----------+-------------------------------------+
> | subset_id | mask                                |
> +-----------+-------------------------------------+
> |         2 | http://www.jhuccp.org/popreporter/% |
> +-----------+-------------------------------------+
>
> When I run index -B, I get:
> www:/usr/local/aspseek/etc# su - -s /bin/bash aspseek
> aspseek@www:~$ sbin/index -B
> Loading configuration from /usr/local/aspseek/etc/db.conf
> Loading configuration from /usr/local/aspseek/etc/ucharset.conf
> Loading configuration from /usr/local/aspseek/etc/stopwords.conf
> Loading configuration from /usr/local/aspseek/etc/aspseek.conf
> Generating subset http://www.jhuccp.org/popreporter/% ... done (96 URLs)
> index process finished.
> aspseek@www:~$
>
> This seems to indicate that I've got the subset set up correctly.
>
> Then, to test this, I manually edit the URL in the browser's location box to:
> http://www.jhuccp.org/cgi-bin/s.cgi?q=advocacy&cs=&ps=20&o=0&ul=http://www.jhuccp.org/popreporter/%
> I've tried variations on this, such as putting the URL in quotes, just using '/popreporter/' etc. Still no joy.
>
> When I submit it, it returns the same 424 documents as before; no restriction to the /popreporter/ directory is done.
>
> I've read in some of the posts to this list that the subset should be set up without the '%', so I also tried that:
> aspseek@www:~$ mysql -u aspseek12 -p aspseek12
> Enter password:
> mysql> select * from subsets;
> +-----------+------------------------------------+
> | subset_id | mask                               |
> +-----------+------------------------------------+
> |         1 | http://www.jhuccp.org/popreporter/ |
> +-----------+------------------------------------+
> 1 row in set (0.00 sec)
>
> Then I run:
> aspseek@www:~$ sbin/index -a -m -u "http://www.jhuccp.org/popreporter/%"
> Loading configuration from /usr/local/aspseek/etc/db.conf
> Loading configuration from /usr/local/aspseek/etc/ucharset.conf
> Loading configuration from /usr/local/aspseek/etc/stopwords.conf
> Loading configuration from /usr/local/aspseek/etc/aspseek.conf
> Adding URL: http://www.jhuccp.org/popreporter/current.shtml
> Adding URL: http://www.jhuccp.org/popreporter/subscribe.shtml
> Adding URL: http://www.jhuccp.org/popreporter/index.shtml
> Adding URL: http://www.jhuccp.org/popreporter/2002/02-25.shtml
> <snip>
> Adding URL: http://www.jhuccp.org/popreporter/2001/06-11.shtml
> Adding URL: http://www.jhuccp.org/popreporter/2001/06-04.shtml
> Saving real-time database ... done.
> Saving delta files [..................................................] done.
> Deleting 'deleted' records from urlword[s] ... done. (0 records deleted)
> Saving real-time ... done
> Saving redirects ... done
> Splitting href delta file ... done
> Saving href delta files ... done
> Saving direct href delta files ... done
> Calculating ranks [................................................] done.
> Saving lastmods ... done
> Generating word site ... done
> Generating subset http://www.jhuccp.org/popreporter/ ... done (0 URLs)
> index process finished.
> aspseek@www:~$
>
> The dlog.log says, "Subset http://www.jhuccp.org/ not found". Yet, the index command suggests that it found plenty.
>
> Could someone please set me straight on how this should work? Thank you very much for your help.
>
> -Kevin Zembower
>
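
One detail in the hand-edited test URL quoted above may be worth ruling out: it
ends in a bare '%', which is not valid percent-encoding in a query string, so
how the CGI decodes it is not guaranteed. The Python sketch below builds the
same query with every parameter encoded. It is an illustration only. The helper
name build_search_url is hypothetical, and whether s.cgi expects the 'ul' value
with or without the trailing '%' is not established in this thread.

```python
from urllib.parse import urlencode


def build_search_url(base, query, url_limit):
    """Build a search URL with all query parameters percent-encoded."""
    # Parameter names (q, cs, ps, o, ul) are copied from the URLs quoted above.
    params = {"q": query, "cs": "", "ps": 20, "o": 0, "ul": url_limit}
    return base + "?" + urlencode(params)


url = build_search_url(
    "http://www.jhuccp.org/cgi-bin/s.cgi",
    "advocacy",
    "http://www.jhuccp.org/popreporter/%",
)
print(url)
# The literal '%' is sent as '%25', and ':' and '/' as '%3A' and '%2F'.
```

If the encoded URL still returns the unrestricted result set, the encoding can
be ruled out and the "Subset ... not found" message in dlog.log is the better
lead.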

-- 
== kir_at_asplinux.ru == 7551596_at_ICQ == 6722750_at_sms.beemail.ru ==
Dream like you'll live forever...Love like you've never been hurt...
Work like you don't need the money...and Dance like nobody is watching!
        -- Satchel Paige
