From: Kir Kolyshkin (no email)
Date: Wed Dec 04 2002 - 11:33:50 EST
John Grubb wrote:
> I have posted a similar query on the pdftohtml list
>
> I'm attempting to crawl portions of the web with aspseek. HTML output
> is working fine and is very stable. I have configured pdftohtml as a
> converter. It indexes most PDFs fine, so I don't think it's a config
> problem, but it crashes the crawl on some. When I download the file and
> try it from the command line, it works fine. I'm currently running the
> latest sources from CVS, having first tried 1.2.6 and 1.2.10. The aspseek
> log output is as follows:
>
> ( 2 20 20 182 12 29 7 20) Adding URL: http://www.lsic.com/fin/annual01.pdf
> exec /usr/bin/pdftohtml -i -noframes -stdout /tmp/asi5dQRXA >/tmp/asoXjR7TX
> Address of param: ba072d20
> Address of param: ba07a560
>
> All 20 threads then crash.
>
> Just started using pdftohtml yesterday. Do I need different params?

Try running pdftohtml from the command line:
/usr/bin/pdftohtml -i -noframes -stdout /tmp/asi5dQRXA > out
and see what's in the file 'out'.
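If you want to repeat this on several problem files, something like the
following sketch may help. It is a hypothetical helper (check_pdftohtml and
the PDFTOHTML override variable are made up for illustration, not part of
aspseek or pdftohtml): it runs the converter with the same flags as in your
log, then tells you whether it exited normally, failed, or was killed by a
signal (the shell reports death by signal N as exit status 128+N).

```shell
# Hypothetical helper, not part of aspseek: run the converter with the
# flags from the indexer log and report how it terminated.
# Usage: check_pdftohtml /tmp/asi5dQRXA out
check_pdftohtml() {
    input=$1
    output=$2
    # PDFTOHTML is an assumed override; defaults to the path from the log.
    ${PDFTOHTML:-/usr/bin/pdftohtml} -i -noframes -stdout "$input" > "$output"
    status=$?
    if [ "$status" -gt 128 ]; then
        # Exit status above 128 means the process died from signal (status - 128).
        echo "converter killed by signal $((status - 128))"
    elif [ "$status" -ne 0 ]; then
        echo "converter exited with status $status"
    else
        echo "converter OK, $(wc -c < "$output") bytes of output"
    fi
    return "$status"
}
```

If pdftohtml itself survives every file this way, the crash is more likely
in the indexer than in the converter, which is what the backtrace is for.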
Also, it would be great if you could send us the output of gdb's 'bt full'
command, obtained like this:
$ ulimit -c unlimited
$ /path/to/index [your flags]
...it crashes...
$ gdb /path/to/index core | tee crashfile
....gdb starts up...
(gdb) bt full
....backtrace is shown, press 'space' when asked
(gdb) quit
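Alternatively, if your gdb supports the -batch and -ex options (an
assumption; older versions may not), the same report can be written to
'crashfile' without an interactive session, so there is no paging prompt to
answer. Note that a core file is only written if the core size limit allows
it (ulimit -c unlimited). The helper name collect_backtrace is hypothetical,
and 'thread apply all bt full' is an extra request on top of 'bt full' that
also dumps the other indexer threads.

```shell
# Hypothetical helper: collect a backtrace from a core file non-interactively.
# Usage: collect_backtrace /path/to/index core crashfile
collect_backtrace() {
    binary=$1
    corefile=$2
    report=$3
    if [ ! -f "$corefile" ]; then
        # No core usually means the core size limit was 0 when the crash happened.
        echo "no core file at $corefile; was 'ulimit -c unlimited' set?" >&2
        return 1
    fi
    # -batch exits after the -ex commands and disables paging;
    # the second command prints a full backtrace for every thread.
    gdb -batch -ex "bt full" -ex "thread apply all bt full" \
        "$binary" "$corefile" > "$report" 2>&1
}
```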
And send the file 'crashfile' to this list, or to
--
== kir_at_asplinux.ru == 7551596_at_ICQ == 6722750_at_sms.beemail.ru ==
Dream like you'll live forever...Love like you've never been hurt...
Work like you don't need the money...and Dance like nobody is watching!
-- Satchel Paige