I’ve developed a basic Java implementation of Paul Graham’s Bayesian spam filter as an example for my Programming from A to Z course. A full explanation is available. This isn’t a robust spam filter by any means; it simply demonstrates the basics.
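For anyone who wants the core idea before digging into the course code, here is a minimal sketch of the scoring scheme from Graham's "A Plan for Spam". Each token's spam probability comes from how often it appears in spam versus good messages, with good counts doubled to bias against false positives and the result clamped to [0.01, 0.99]. Tokens seen fewer than five times get 0.4. The 15 tokens furthest from 0.5 are then combined with Bayes' rule. The class and method names (`GrahamFilter`, `train`, `tokenProbability`, `spamProbability`) and the tokenizer regex are hypothetical, chosen for illustration; this is not the course implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Graham-style Bayesian scoring (not the course code).
class GrahamFilter {
    private final Map<String, Integer> goodCounts = new HashMap<>();
    private final Map<String, Integer> badCounts = new HashMap<>();
    private int goodMessages = 0;
    private int badMessages = 0;

    // Add every token occurrence in a training message to the matching corpus.
    void train(String message, boolean isSpam) {
        Map<String, Integer> counts = isSpam ? badCounts : goodCounts;
        for (String token : tokenize(message)) {
            counts.merge(token, 1, Integer::sum);
        }
        if (isSpam) badMessages++; else goodMessages++;
    }

    // Assumed tokenizer: lowercase, split on anything other than letters, digits, ' $ -.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9'$-]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Probability that a message containing this token is spam.
    double tokenProbability(String token) {
        double g = 2.0 * goodCounts.getOrDefault(token, 0); // good counts doubled
        double b = badCounts.getOrDefault(token, 0);
        if (g + b < 5) return 0.4; // too rare to trust: use a slightly "good" default
        double goodFreq = Math.min(1.0, g / Math.max(1, goodMessages));
        double badFreq = Math.min(1.0, b / Math.max(1, badMessages));
        return Math.max(0.01, Math.min(0.99, badFreq / (goodFreq + badFreq)));
    }

    // Combine the 15 most "interesting" distinct tokens (furthest from 0.5).
    double spamProbability(String message) {
        List<Double> probs = new ArrayList<>();
        for (String token : new LinkedHashSet<>(tokenize(message))) {
            probs.add(tokenProbability(token));
        }
        probs.sort(Comparator.comparingDouble((Double p) -> Math.abs(p - 0.5)).reversed());
        double product = 1.0, inverseProduct = 1.0;
        for (int i = 0; i < Math.min(15, probs.size()); i++) {
            product *= probs.get(i);
            inverseProduct *= 1.0 - probs.get(i);
        }
        return product / (product + inverseProduct);
    }

    public static void main(String[] args) {
        GrahamFilter filter = new GrahamFilter();
        for (int i = 0; i < 5; i++) {
            filter.train("cheap viagra offer click now", true);
            filter.train("meeting notes for the course project", false);
        }
        System.out.println(filter.spamProbability("click now for a cheap offer"));
    }
}
```

Graham treats a message as spam when the combined probability exceeds 0.9. Scoring each distinct token once, rather than every occurrence, is a design choice in this sketch; it keeps one repeated word from dominating the result.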
The above visualization was quickly hacked out with Processing. It shows the words most likely to indicate a spam e-mail, with each word's size tied to its frequency of occurrence. (Note that these aren’t the same thing: just because a word appears more often doesn’t mean it’s more likely to indicate spam, since it could just as well appear often in so-called “good” e-mails.) It also uses an incredibly lame (i.e., small) training set of “bad” and “good” messages and is flawed in many other ways. Someday, I might actually do something interesting with this. Sigh.
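To make the frequency-versus-spamminess distinction concrete, here is a small sketch that applies a Graham-style per-token formula (good counts doubled, result clamped to [0.01, 0.99]) to two made-up words. The counts, the corpus sizes of 100 good and 100 bad messages, and the names `FrequencyVsSpamminess` and `spamminess` are all hypothetical.

```java
// Hypothetical illustration: a frequent word is not necessarily a spammy word.
class FrequencyVsSpamminess {
    // Graham-style token probability (sketch): good counts doubled, clamped to [0.01, 0.99].
    static double spamminess(int goodCount, int badCount, int goodMessages, int badMessages) {
        double g = 2.0 * goodCount;
        double b = badCount;
        if (g + b < 5) return 0.4; // too rare to judge
        double goodFreq = Math.min(1.0, g / goodMessages);
        double badFreq = Math.min(1.0, b / badMessages);
        return Math.max(0.01, Math.min(0.99, badFreq / (goodFreq + badFreq)));
    }

    public static void main(String[] args) {
        // Made-up counts over 100 good and 100 bad messages.
        // "the" appears 185 times in total but is common in both corpora.
        System.out.printf("the:    %.2f%n", spamminess(90, 95, 100, 100)); // prints 0.49
        // "viagra" appears only 12 times, all in spam.
        System.out.printf("viagra: %.2f%n", spamminess(0, 12, 100, 100)); // prints 0.99
    }
}
```

A word-cloud sized by raw frequency would draw "the" far larger than "viagra", even though only the rarer word actually separates spam from good mail.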
