Thursday, September 09, 2004

My spam fighting idea...

I've had the same email address for about 10 years, I use it publicly, and I get a lot of spam... over 1000 messages / day ... I use several layers of spam cleaning (my mail routes through 2 ISPs, one with 2 mail filters and the other with 1), plus I use Mozilla with its built-in junk mail filter (which I love because it has learned what I consider spam or not...) ...

BTW: Mozilla folks... After marking about 200,000 pieces of spam, the junk mail filter is getting to be pretty slow now...

Anyway... Stuff still gets through... The main thing that comes through these days are messages that are just images, and a link to a website... The problem is that there isn't a lot of content to filter on in terms of determining whether the message is spam or not... I might have a friend send a message like... "Check out this picture - it's hilarious" ... and a picture of whatever is funny this week... And I'll have spammers send the same thing, but the picture is a big pile of Viagra, and I can go buy it from their website...

My idea is this... I glanced through the patent directory and didn't see anything, so I'm establishing this entry as prior art for anyone that implements it, and giving the idea up to the public domain...

My [insert spam filter here - on the mail server or on the client] should actually follow the links on these short little messages, and filter on the content that the link leads to... Here's a message that came today... it's actually got a plaintext part and an html part, and the plaintext part has nothing to do with the html (the text is designed to look ordinary and defeat the various spam filters), but the email reader displays the html:


CJ: Message headers here...

This is a multi-part message in MIME format.

------=_NextPart_000_0000_EBDD29ED.5AE6B772
Content-Type: multipart/alternative;
boundary="----=_NextPart_001_0001_40C35639.6C2C2D59"


CJ: This plain text throws off content based filters... Most mail
readers won't even display this...


------=_NextPart_001_0001_40C35639.6C2C2D59
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 7bit

Hello!

Her dark, pretty face xvhfglittered there in front of me. I stood with my
mouth open, trying to ljeu think of some way to answer her. We were locked
together this way for dcsqmaybe a couple of seconds; then the sound of the
mill jumped a hitch, and something kmdncommenced to draw her back away from
me. A string somewhere I didn't see hooked on oufb that flowered red skirt and
was tugging her back. Her fingernails peeled khjx.
Her dark, pretty face bpryglittered there in front of me. I stood with my
mouth open, trying to mhtx think of some way to answer her. We were locked
together this way for cpegmaybe a couple of seconds; then the sound of the
mill jumped a hitch, and something plbacommenced to draw her back away from
me. A string somewhere I didn't see hooked on cbes that flowered red skirt and
was tugging her back. Her fingernails peeled euns.

Bye.


CJ: This is the part that I end up seeing... low content, an image, and
a whole bunch of links to drug websites...

------=_NextPart_001_0001_40C35639.6C2C2D59
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 7bit

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body>
<p>
Hi!
</p>
<p>
<a href="http://meltinpb.toprecommend.info/sv/index.php?pid=eph5320">
<img src="cid:mexjheto_mlsjprse_mzwuzxlq" border="0"></a>

</p>
<p>
Mirror sites:   
<a href="http://kyeongsonvxxkdpy.the-only-place.info/sv/index.php?pid=eph5320">#1</a>   
<a href="http://fredoz.mymedsnet.info/sv/index.php?pid=eph5320">#2</a>   
<a href="http://neenieuhzrjxxe.masterjoy.biz/sv/index.php?pid=eph5320">#3</a>

</p>
<p>
wbr,
Starbuck Davis
</p>
</body>
</html>




SO... My filters have been fooled... Following the first link however I'm
taken to a pretty slick looking website... The content on the website includes:



Super Viagra, or CIALIS is used to treat erectile dysfunction, also known as impotence. This is when a man cannot get, or keep, a hard erect penis suitable for sexual activity.


The active ingredient in CIALIS tablets, tadalafil, belongs to a group of medicines called phosphodiesterase type 5 inhibitors.
Following sexual stimulation CIALIS helps the blood vessels in your penis to relax which allows the blood flow into your penis.
The result is the improved erectile function.
CIALIS will not help you if you do not have erectile dysfunction.



So I'm pretty sure that there's some good content there that would have my
filters freaking out...
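
A rough sketch of the idea in Python, just to make it concrete... This is not a real filter: the word list, the 5 second timeout, the 10 KB read cap, and all of the function names (html_links, fetch_page, linked_content_score) are placeholders made up for illustration. A real filter would feed the fetched text into whatever scoring it already uses for message bodies.

```python
import re
from email import message_from_string
from html.parser import HTMLParser
from urllib.request import urlopen

# Placeholder word list (an assumption); a trained filter would learn its own.
SPAMMY_WORDS = {"viagra", "cialis"}


class LinkAndTextParser(HTMLParser):
    """Collects http(s) link targets and text content from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith(("http://", "https://")):
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def parse_html(html):
    """Return (links, text) for one HTML string."""
    parser = LinkAndTextParser()
    parser.feed(html)
    return parser.links, " ".join(parser.text)


def html_links(raw_message):
    """Return the links found in the text/html parts of a raw message."""
    links = []
    for part in message_from_string(raw_message).walk():
        if part.get_content_type() == "text/html":
            charset = part.get_content_charset() or "iso-8859-1"
            html = part.get_payload(decode=True).decode(charset, "replace")
            links.extend(parse_html(html)[0])
    return links


def fetch_page(url, timeout=5):
    """Download up to 10 KB of a page (timeout and cap are arbitrary choices)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read(10_000).decode("iso-8859-1", "replace")


def linked_content_score(raw_message, fetch=fetch_page):
    """Fraction of linked pages whose text contains a word from SPAMMY_WORDS."""
    links = html_links(raw_message)
    if not links:
        return 0.0
    hits = 0
    for url in links:
        try:
            _, text = parse_html(fetch(url))
        except OSError:
            continue  # unreachable or timed-out page: counted as no evidence
        if set(re.findall(r"[a-z]+", text.lower())) & SPAMMY_WORDS:
            hits += 1
    return hits / len(links)
```

The fetch argument is there so the scoring can be tried without touching the network, by passing in a function that returns canned HTML for each URL.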

This technique would have a couple of advantages:

1) If _every single email_ that a spammer sent resulted in their website delivering 10K of data, there would be some pretty serious (ie. expensive) bandwidth used up... (And might result in spammers effectively DOS'ing their own sites)...

2) The website(s) for these spammers are pretty professional... they spell "Viagra", not "V1agr!" - so the link is more representative of the content of the message than the actual message content...

3) Legitimate emailers that send 50-100 messages wouldn't have bandwidth or DOS issues... legitimate emailers would have links to sites that are of interest to me... the content on their sites would match my preferences, and wouldn't be filtered...

So there it is folks... Mail / spam developers, go build this into server side and client side mail so that just a bit more spam will be auto-filtered before I see it... :)

12 Comments:

Blogger Yeroc said...

Check out Paul Graham's article Filters that Fight Back which talks about the same idea...

September 13, 2004 5:59 PM  
Blogger dim said...

I'm all for any filtering, but one problem with that is if the links are encoded so they identify your email address as live. You may or may not care about this, but something that needs to be considered....

September 13, 2004 7:35 PM  
Anonymous Anonymous said...

It's all very well coming up with technical solutions to a social problem, but the only real way that spam is going to be eradicated is if the underlying infrastructure/network providers, over whose networks most of this spam travels, jointly agree to do something about it. By this I mean these companies standing up as one and publicly stating that if any ISP/hosting company does not get its house in order and stamp down on spammers then they will have all network traffic bounced back at them and effectively be disconnected from the Internet. This directly hits the ISP/hosting company in the pocket and if they don't sort things out quickly then they go out of business.

There are plenty of spam reports posted every day to identify the source of a lot of the junk. Administrators are notified, yet nothing is done. Enough of the technical solutions, time to use a baseball bat.

Time to disconnect them.

OK so the spammers can jump from ISP to ISP and hosting company to hosting company but before long they will run out of places to go. If this means that a lot of the free email providers require more stringent identity checks then so be it. A small price to pay to get rid of spam.

December 18, 2004 4:58 AM  
Anonymous Anonymous said...

Hmm. As soon as this scheme works, a spammer will get through by converting their red-flag text (or all text) into graphics. Still you just might save the planet collectively a few billion man-hours of hand-filtering.

-- Bob Stein, VisiBone

December 18, 2004 12:19 PM  
Anonymous Anonymous said...

One way spammers might defeat this (intentionally or not) is to have web pages load slowly. Humans will wait at least several seconds, but a spam filter that takes several seconds to check a link on every possible spam message will overload the server rather quickly (or if it's a filter running in your mail client, slow things down even more).

December 18, 2004 1:41 PM  
Anonymous Anonymous said...

Spammers could use this to DOS legitimate websites as well. Just put the URLs of a few anti-spam websites in there that they don't like, and those sites would still have to deal with the increased bandwidth from everyone's filters as well. And what if each email has 30 URLs in it? How slow is that going to be? Thomas thomasamos.com

December 18, 2004 2:35 PM  
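
A minimal sketch, in Python, of one way a filter might bound the cost raised in the two comments above (slow pages, and messages with 30 URLs). The names links_to_check and fetch_limited, the cap of 3 links, the 3 second timeout, and the 10 KB read limit are all assumptions for illustration, not anything proposed in the thread.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def links_to_check(urls, max_links=3):
    """Pick at most max_links URLs, one per distinct host, in message order."""
    seen_hosts = set()
    chosen = []
    for url in urls:
        if len(chosen) >= max_links:
            break
        host = (urlsplit(url).hostname or "").lower()
        if not host or host in seen_hosts:
            continue  # skip non-web links and repeat visits to the same host
        seen_hosts.add(host)
        chosen.append(url)
    return chosen


def fetch_limited(url, timeout=3.0, max_bytes=10_000):
    """Read at most max_bytes of a page.

    The timeout applies to each blocking socket operation, not to the whole
    download, so a server that trickles data could still take longer.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read(max_bytes)
```

Capping the links per message limits how much traffic any one email can trigger, though it does nothing about the other concern above, that the chosen links might point at an innocent third party's site.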
Anonymous Anonymous said...

As much as everybody hates Mr. D. J. Bernstein, his cdb files tend to be rather teeny... SO, what we could do is this: every time the filter checks a website, if the url contains content that is deemed ok/not-ok, we could cache this information in either a flatfile or one of Mr. DJB's cdb files, if the particular link is read and is considered ok, we'll jump to the next link *until* we find one that isn't ok, and at that point, we'll bounce the message as user-does-not-exist (550 i think), if we come across a link that isn't in the db, we can then test to see if it leads to spam content and if it does, add it to the db... kinda destroys the entire concept of DOS'ing them, but there are other fun things we could do (ARIN lookups and emails to abuse@net.work.tld?) who knows... anyway, keeping a database of known good/bad urls would probably be the best way to get around a huge backlog of http request in the spam engine.

Please forgive the runon sentences, I type like I think sometimes.

Allen Parker
hardcore-linux.net

(please also add this as prior art ;-) )

December 19, 2004 6:09 AM  
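
A minimal sketch of the verdict cache described in the comment above, in Python. It uses the standard library's sqlite3 in place of a cdb file, and the names VerdictCache, message_is_spam, and classify are hypothetical; classify stands in for whatever actually fetches a page and decides whether it is spam.

```python
import sqlite3


class VerdictCache:
    """Stores a spam / not-spam verdict per URL so each URL is checked once."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS verdicts "
            "(url TEXT PRIMARY KEY, is_spam INTEGER NOT NULL)"
        )

    def lookup(self, url):
        """Return True or False for a known URL, or None if never checked."""
        row = self.db.execute(
            "SELECT is_spam FROM verdicts WHERE url = ?", (url,)
        ).fetchone()
        return None if row is None else bool(row[0])

    def store(self, url, is_spam):
        self.db.execute(
            "INSERT OR REPLACE INTO verdicts VALUES (?, ?)", (url, int(is_spam))
        )
        self.db.commit()


def message_is_spam(urls, cache, classify):
    """Walk the links in order and stop at the first one judged to be spam."""
    for url in urls:
        verdict = cache.lookup(url)
        if verdict is None:
            verdict = classify(url)  # only unknown URLs cost an HTTP request
            cache.store(url, verdict)
        if verdict:
            return True
    return False
```

Spam links often carry a per-recipient query string, so caching on the exact URL may miss more often than caching on the host name; which key to use is left open here.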
Anonymous Anonymous said...

0spam takes care of my 7 years old email address and not one single spam mail gets in my inbox!

They offer a little program that sifts the spam out, and people can get into the inbox if they click a link to remove them from the junk mail. And it's free.

http://www.0spam.com/

December 20, 2004 8:32 AM  
Anonymous Anonymous said...

Hi Chris,

What about "let's write that filter!" approach? I'd be glad to help. Especially if it is released as F/OSS.

My name is Grigor Gatchev (g_gatchev), and my e-mail is on yahoo. Will be glad to hear from you.

December 21, 2004 6:23 AM  
Anonymous Anonymous said...

You may want to consider using an approach other than content filtering. Disposable email is one. There are some free ones as well as commercial ones, but I found ZoEmail (http://www.zoemail.com/) to be the easiest to use among the disposable email approaches.

December 22, 2004 8:42 AM  
Anonymous Anonymous said...

The real problem is the people who click and buy. These people make the whole game so profitable for the spammers.

I do not know how politically or legally feasible it is, but I would love to see "sting" spam sites. These sting sites would actually send spam (yes, adding to the problem), but the link would lead to a site that would warn the spamee of the dangers of clicking on links that come from unsolicited sources. The site could also have a hall of shame where you could check if neighbors or friends were purchasing penis extenders, fake degrees, phony drivers licenses etc.

After enough stings, people would start to trust e-mail offers less thus making spamming less effective and therefore less likely.

March 14, 2005 12:16 PM  
Anonymous Anonymous said...

In late 2003, I wrote software that was going to turn spam against spammers, launching a DDoS against their web servers (see An Anti-Spam Manifesto). What stopped me from going live with it was the threat of joe jobs.

I now utilize a system of domain-specific email addresses (see PhishProof.com). I agree with your previous commenter who said that the real problem is that there are still customers who make spamming a viable endeavor. The hall of shame idea is an interesting one, but I suspect that the people who buy penis extenders and farm porn have a fairly high tolerance for shame.

One thing is certain. Spam filters don't work. Even the guys at Google haven't been able to get this right. I don't think it's possible to consistently filter out spam based on content. Spammers will always be able to come up with another trick, and our computers run slower and slower, performing all those tests.

SMTP is broken. Did Microsoft create SMTP? :P

April 21, 2005 1:39 AM  
