RE: Using blacklists and RBL's with Postfix

From: Schmehl, Paul L (no email)
Date: Wed Sep 04 2002 - 16:39:40 EDT


I'm afraid I have to disagree with you. Comments embedded in the middle
of words would simply be classified as highly spammish. The Bayesian
approach would have *absolutely* no problem dealing with this.

You seem to have the mistaken impression that a Bayesian filter "cares"
about words. It does not. It takes in email as a single string, and
then it parses that single string for "conspicuous" tokens that indicate
either spammishness or "normalcy". For a spammer to get past a properly
done Bayesian filter, they would have to make the spam look like normal
email. They cannot do that *and* get their spammish message across.
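
As a rough illustration of that scoring, here is a minimal Python sketch
that loosely follows the formula in Graham's essay. The tokenizer regex,
the function names, and the min_count default of 1 (Graham ignores tokens
seen fewer than five times) are my own assumptions for a toy corpus, not
anything taken from a real filter.

```python
import re
from collections import Counter
from math import prod

# Assumption: a deliberately simple tokenizer. A real filter would also
# look at headers, markup, and punctuation.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def train(messages):
    """Count how often each token occurs across a corpus of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts

def token_probability(token, good, bad, ngood, nbad, min_count=1):
    """Estimate the probability that a message containing token is spam."""
    g = 2 * good[token]          # good counts are doubled to bias against false positives
    b = bad[token]
    if g + b < max(min_count, 1):
        return 0.4               # unseen or rare token: mildly innocent
    bad_rate = min(1.0, b / nbad)
    good_rate = min(1.0, g / ngood)
    p = bad_rate / (good_rate + bad_rate)
    return max(0.01, min(0.99, p))

def spam_score(text, good, bad, ngood, nbad, top_n=15):
    """Combine the top_n most 'conspicuous' token probabilities."""
    probs = [token_probability(t, good, bad, ngood, nbad)
             for t in set(tokenize(text))]
    # Most conspicuous means furthest from the neutral value 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:top_n]
    spam = prod(top)
    ham = prod(1 - p for p in top)
    return spam / (spam + ham)
```

Here good and bad are the Counters returned by train() for the two
corpora, and ngood and nbad are the number of messages in each. In Graham's
essay a message is treated as spam when the combined score exceeds 0.9.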

E.g. how do you sell something without ever saying it's for sale? You
*have* to use some phraseology that indicates that you have something to
sell. Normal email doesn't embed comments within words. Normal email
doesn't embed spaces in words. All of these behaviors are dead
giveaways of spammishness. And the entire premise of Graham's thesis is
that the algorithm needs to recognize the same things that you recognize
as spammish.

You instantly know that spaces between letters are abnormal. So would
the Bayesian filter. You instantly know that comments within words in
HTML are abnormal. So would the Bayesian filter.
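
To make that concrete, here is one hypothetical way a tokenizer could
surface those two patterns as tokens of their own, so that the filter can
learn them like any other token. The marker names (*comment-in-word* and
*spaced-letters*) and the regular expressions are made up for
illustration; they are not from Graham's essay or from any existing
filter.

```python
import re

# Assumed patterns, for illustration only.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
COMMENT_IN_WORD = re.compile(r"\w<!--.*?-->\w", re.DOTALL)
SPACED_WORD = re.compile(r"\b(?:[A-Za-z] ){3,}[A-Za-z]\b")
WORD = re.compile(r"[A-Za-z0-9$'-]+")

def obfuscation_tokens(text):
    """Tokenize text, adding marker tokens for two obfuscation tricks."""
    tokens = []
    # An HTML comment with word characters on both sides splits a word.
    if COMMENT_IN_WORD.search(text):
        tokens.append("*comment-in-word*")
    stripped = HTML_COMMENT.sub("", text)
    # Four or more single letters separated by single spaces.
    if SPACED_WORD.search(stripped):
        tokens.append("*spaced-letters*")
    tokens.extend(w.lower() for w in WORD.findall(stripped))
    return tokens
```

The markers contain asterisks, which the word pattern never matches, so
they cannot collide with ordinary words. Because normal mail rarely
produces either marker, a trained filter would tend to give both a high
spam probability.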

This isn't directed at you, Clifton, but I get the distinct impression
that a lot of people have either not read Paul Graham's thesis at all or
have skimmed it quickly without understanding its central point. A
Bayesian filter, as described by Graham, does not depend on keywords.
It depends on "normal" versus "abnormal" patterns in email.

Paul Schmehl
Supervisor of Support Services
The University of Texas at Dallas
AVIEN Founding Member
http://www.utdallas.edu/~pauls/

> -----Original Message-----
> From: Clifton Royston [mailto:]
> Sent: Wednesday, September 04, 2002 1:24 PM
> To: Postfix Users List
> Subject: Re: Using blacklists and RBL's with Postfix
>
>
> On Tue, Sep 03, 2002 at 04:52:06PM -0400, Greg A. Woods wrote:
> > > This also presumes (like every other anti-spam silver bullet) that
> > > spammers are completely incapable of adapting, which has been
> > > repeatedly shown to be wrong.
> >
> > Do you have a scientific criticism of Graham's assertion about how
> > robust a Bayesian filter should be even when the "attacker" knows
> > exactly what algorithm is being used?
>
> Sure. And in fact, they are *already* counter-adapting to
> both the regular expression body matches and the Bayesian
> approach. I have seen 3 different approaches already!
>
> Approach 1, outline of proof: the Bayesian algorithm deals
> with tokens. It is trivial for an "attacker" who knows the
> algorithm for the tokenization to randomly break up the text
> in each spam being sent out such that the same tokens do not
> reappear and so do not get "learned" and can not be matched.







