From: Greg A. Woods (no email)
Date: Tue Sep 03 2002 - 11:50:50 EDT
[ On Tuesday, September 3, 2002 at 09:57:13 (-0400), Vivek Khera wrote: ]
> Subject: RE: Using blacklists and RBL's with Postfix
>
> >>>>> "PLS" == Paul L Schmehl <Schmehl> writes:
>
>
> PLS> I think the single most significant aspect of his research is the 0
> PLS> false positives result. That alone makes it worth pursuing.
>
> That assumes you have the time and inclination (and wherewithal) to
> generate a statistical description of *your* personal mail patterns.
> You can do this today with spamassassin -- just forget the scores
> published by the project and tune it with scores derived from your
> personal sampling of mail. It will much better match your definition
> of spam that way.

You can do that _today_ with a filter based on Bayes' Theorem. Indeed,
that's how Graham's idea is expected to work. Didn't you read his
paper? You just feed it all your good mail and then feed it all the
spam you receive. If integrated into something like your IMAP server, it
could classify your e-mail when it's first delivered to your inbox, and
then when you "delete" your spam (or perhaps file it to a spam folder),
it'll take the tokens from that message out of the "good" list and put
them in the "bad" list. It will much better match your definition of
spam that way.

For an idea of how this could work, either via an IMAP server or just
in a mail client, read the description of how it's implemented in
Apple's latest mail application release:
http://www.apple.com/macosx/jaguar/mail.html
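
To make the "good" list / "bad" list idea concrete, here is a minimal
sketch in Python. It is an assumption-laden illustration, not Graham's
exact scoring formula and not how bogofilter or Apple's client works:
it uses plain naive Bayes with add-one smoothing, and the names
TokenFilter, train, refile and spam_probability are all hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a "token" is any run of letters, digits, $, ' or -.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Return the set of lowercased tokens found in a message."""
    return set(t.lower() for t in TOKEN_RE.findall(text))


class TokenFilter:
    """Hypothetical per-user filter keeping a "good" and a "bad" token list."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.messages = {"good": 0, "bad": 0}

    def train(self, text, label):
        """Count each token of the message once under the given label."""
        for tok in tokenize(text):
            self.counts[label][tok] += 1
        self.messages[label] += 1

    def refile(self, text, old, new):
        """Move a message's tokens from one list to the other,
        e.g. when the user files a delivered message as spam."""
        for tok in tokenize(text):
            if self.counts[old][tok] > 0:
                self.counts[old][tok] -= 1
        self.messages[old] = max(0, self.messages[old] - 1)
        self.train(text, new)

    def spam_probability(self, text):
        """Naive Bayes estimate that the message is spam (add-one smoothing)."""
        log_odds = math.log((self.messages["bad"] + 1) / (self.messages["good"] + 1))
        for tok in tokenize(text):
            p_bad = (self.counts["bad"][tok] + 1) / (self.messages["bad"] + 2)
            p_good = (self.counts["good"][tok] + 1) / (self.messages["good"] + 2)
            log_odds += math.log(p_bad / p_good)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Feed it your good mail with train(text, "good") and your spam with
train(text, "bad"). A delivery agent could then call
spam_probability() on each new message, and call
refile(text, "good", "bad") whenever you move a message to the spam
folder.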

Since nobody's mentioned it yet, and since this is wandering away from
MTA filtering and towards final delivery filtering (i.e. almost out of
the scope of Postfix), I'll note that there's a decent filter designed
specifically for filtering e-mail, and written in C for decent
performance, available here:
http://www.tuxedo.org/~esr/bogofilter/

> That and you have to continually tune it by giving it more samples of
> spam and non-spam as they evolve.

Indeed, that's the very idea behind using Bayes' Theorem to classify
your e-mail! :-)

> I don't see statistical techniques like this working on a server-level
> with multiple types of users, unless they are applied per-user.

Well, if you had an IMAP server where all the users could re-file their
spam into a common spam filter, then for any set of users with similar
types of e-mail a statistical analysis of good vs. bad is likely to be
quite accurate. It probably won't work at a large ISP because they have
too diverse a user base (and some lusers actually want to receive spam!).
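
As a rough sketch of the shared re-filing idea, in Python: users report
spam into one common pool, and a new message is checked against what
*other* users have already reported. Everything here (SharedSpamPool,
report, seen_by_others, whitespace tokenizing) is a hypothetical
illustration. It only measures token overlap and is not a full
statistical classifier.

```python
class SharedSpamPool:
    """Hypothetical pool of spam tokens reported by a group of users."""

    def __init__(self):
        # token -> set of users who re-filed a message containing it as spam
        self.reporters = {}

    def report(self, user, text):
        """Record that `user` filed this message as spam."""
        for tok in set(text.lower().split()):
            self.reporters.setdefault(tok, set()).add(user)

    def seen_by_others(self, user, text):
        """Fraction of the message's tokens that some other user
        has already reported as spam."""
        toks = set(text.lower().split())
        if not toks:
            return 0.0
        hits = sum(1 for t in toks if self.reporters.get(t, set()) - {user})
        return hits / len(toks)
```

For users with similar mail, a high overlap suggests the pool's
evidence carries over from one user to another. With a very diverse
user base the overlap would mean much less.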

--
Greg A. Woods
+1 416 218-0098; Planix, Inc.; VE3TCP; Secrets of the Weird