From: Danil Pismenny (no email)
Date: Fri Apr 12 2002 - 05:32:34 EDT
I've added debug output that shows which sites (URLs) are added to the
database, which are disallowed, and why. Which loglevel should I use
for it? I use the DEBUG loglevel now, but perhaps the INFO loglevel
would be more useful.
Also, some pages that are added to the database are not parsed, or are
parsed with errors (those pages contain buggy tags). I added debug
output that shows which tags are parsed. Does anybody else need this
output?
The HTML parser is very strict, so I've patched it to parse tag
attributes that contain mixed quotes (e.g. <meta content='asdas") and
unclosed tags (a tag is automatically closed if its length exceeds
1024 chars and a '<' symbol is encountered). Any comments?
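For anyone who wants to try the same idea, here is a minimal sketch of such a tolerant tag scanner. It is not the actual patch: the function name scan_tag and the MAX_TAG_LEN constant are hypothetical. Two rules are assumed. First, inside an attribute value, either quote character ends the value. Second, an overlong unterminated tag is cut just before the next '<'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limit, mirroring the 1024-char threshold described above. */
#define MAX_TAG_LEN 1024

/* Return the number of bytes belonging to the tag that starts at s[0] == '<'.
 * - Any quote character (' or ") opens or closes an attribute value, so a
 *   value with mixed quotes such as content='asdas" is accepted.
 * - A '>' outside a quoted value ends the tag normally (it is included).
 * - If more than MAX_TAG_LEN bytes have been scanned without a closing '>'
 *   and another '<' appears, the tag is treated as closed just before it. */
size_t scan_tag(const char *s, size_t len)
{
    int in_quote = 0;

    for (size_t i = 1; i < len; i++) {
        char c = s[i];

        if (c == '<' && i > MAX_TAG_LEN)
            return i;              /* force-close an overlong, unterminated tag */

        if (c == '\'' || c == '"')
            in_quote = !in_quote;  /* either quote character toggles the value */
        else if (c == '>' && !in_quote)
            return i + 1;          /* normal end of tag, '>' included */
    }
    return len;                    /* input ended inside the tag */
}
```

For example, scan_tag applied to `<meta content='asdas" name=x>rest` returns 29, the length of the tag up to and including its '>'. A strict parser would instead see an unterminated single-quoted value. The trade-off is that an apostrophe inside a double-quoted value, as in alt="don't", now ends the value early. In that case the MAX_TAG_LEN fallback is what eventually recovers.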
-- Danil Pismenny http://dapi.chaz.ru/