Build your own spam filter with PHP and DNSBLs

By Brian, December 2, 2006

Have you ever gotten an email asking if you want certain parts of your body enlarged, parts that you might not even have? Was the next email you read one asking if you want to lose the inches you’ve recently gained? Did you ever notice how these emails are always from people that you are fairly certain have nothing to do with the contents of the email? Did MTeresa@Vatican.org really send that diet pill email? Have you ever gotten returned or rejected “can’t be delivered” emails from addresses you’ve never ever sent an email to?

I have.

SPAM. It’s HORRIBLE! My email box for Brian@TheCodeCave.com probably gets 3 to 1 spam over real email. I expected that. I put that address out everywhere and don’t protect it. It is meant to be my public address. But the FROM addresses on all that email never indicate who the email is really from. Even the company information inside the email header is faked. The spammers will grab some other name on their spam list and use it as their from address. I’ve had my name put into the from address of emails a few times. It’s an annoying problem, just ask the Nuclear Moose.

Why this can happen is a long story. It all relates back to the fact that SMTP and port 25 were never meant for submitting emails to email servers. SMTP was only meant for server-to-server communications. However, that’s for a different post. The long and short of it is that everything can be faked except for one thing: the IP address of the machine that handed the email to your mail server.

Because that IP address is accurate, you can use it to tell if the person that sent the email is a spammer. This post shows you a couple of ways to do that. And because this is The Code Cave, you get a fully functional PHP routine to boot.

First let me show you what I am talking about. Emails contain lots of information that you don’t normally see. Most of that information is in the header of the email. Each email client has different ways of showing email headers. You might find it by viewing the properties of the email. In Outlook, you can see it by choosing, from an open email, View->Options. It will be there under the name of “Internet Headers”.

Here is what one email header looked like that came to me back in June:

Quote:

Return-Path: <tkuhnel@alushiptechnology.com>
Delivery-Date: Mon, 26 Jun 2006 23:53:47 -0400
Received-SPF: none (mxus6: 74.139.17.40 is neither permitted nor denied by domain of alushiptechnology.com) client-ip=74.139.17.40; envelope-from=tkuhnel@alushiptechnology.com; helo=Laskowski6;
Received: from [74.139.17.40] (helo=Laskowski6)
by mx.perfora.net (node=mxus6) with ESMTP (Nemesis),
id 0MKvMg-1Fv4dv28vt-0006m9 for brian@thecodecave.com; Mon, 26 Jun 2006 23:53:47 -0400
From: "Adam Field" <tkuhnel@alushiptechnology.com>
To: <brian@thecodecave.com>
Subject: RICARDO examined BENJAMIN of a please
Date: Tue, 27 Jun 2006 03:53:44 +0480
MIME-Version: 1.0
Content-Type: multipart/related;
type="multipart/alternative";
boundary="----=_NextPart_000_006A_01C69962.9CE1DB50"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2670
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2670
Message-ID: <0MKvMg-1Fv4dv28vt-0006m9@mx.perfora.net>
Envelope-To: brian@thecodecave.com

Almost all of that information in there about who sent this stuff is garbage. “alushiptechnology.com” had nothing to do with the email. Poor TKuhnel certainly had nothing to do with it. He/She just has their email address out there in the spam databases. Google even shows 10 or more names associated with this poor shlep.

But the secret is that “client-ip=74.139.17.40” in the Received-SPF header is accurate, because the receiving mail server (mx.perfora.net here) recorded it when it accepted the connection. That is the machine that sent the spam. And if it sent one spam, it’s quite likely that it has sent more. That’s where DNSBLs come in. DNSBL stands for DNS-based Blackhole List, or DNS blacklist for short. Usually these lists are generated by creating an email address that is never meant for real use. Then any email that arrives at that address is, by definition, spam. These addresses are called SPAM Traps. And the places where they are located are often called Honey Pots.

These lists of spammer IP addresses are made available to anyone that wants to use them. Let’s take a look at how another IP address, 202.177.183.110, is identified by various DNSBLs:

DNSBL – Result (Reason)
AHBL – LISTED (127.0.0.3)
CBL – LISTED (127.0.0.2)
DNSBLNETAUOSPS – LISTED (127.0.0.2)
DNSBLNETAUT1 – LISTED (127.0.0.2)
DSBL – LISTED (127.0.0.2)
DSBLALL – LISTED (127.0.0.2)
EMAILBASURA – LISTED (127.0.0.2)
NJABLPROXIES – LISTED (127.0.0.9)
PSBL – LISTED (127.0.0.2)
SBL-XBL – LISTED (127.0.0.2)
SORBS-HTTP – LISTED (127.0.0.2)
SORBS-SOCKS – LISTED (127.0.0.3)
TQM-DYNAMIC – LISTED (127.0.0.2)
TQM-SPAMTRAP – LISTED (127.0.0.3)
UCEPROTECTL1 – LISTED (127.0.0.2)

Clearly this address was used by a spammer. But how can you take advantage of this?

Well, one of two ways. First, you can contact these DNSBL sites and download the list. Or you can ask them to check a single address at a time. That’s what my routine does. It takes an IP address (202.177.183.110), reverses its octets, and appends the DNSBL’s domain, like this: 110.183.177.202.bl.spamcannibal.org, once for each spam list I want to check. Then I call PHP’s gethostbyname to resolve that name. If the name doesn’t resolve, gethostbyname hands back the exact same text that I sent, and I know the address is clean as far as that list is concerned. If I get back something like 127.0.0.2, I know that the address is listed in their DNSBL for reason #2 (whatever that means to them). 127.0.0.3 would indicate reason 3.
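To make the lookup concrete, here is a minimal sketch of it. The function names `dnsblLookupName` and `dnsblReasonCode` are made up for illustration, and the resolver is passed in as a parameter (defaulting to `gethostbyname`) so you can try the logic without querying a real DNSBL. Treat it as one way to do it, not the routine itself.

```php
<?php
// Hypothetical helper: build the name to query, e.g.
// 202.177.183.110 + bl.spamcannibal.org -> 110.183.177.202.bl.spamcannibal.org
function dnsblLookupName($ip, $zone) {
    return implode('.', array_reverse(explode('.', $ip))) . '.' . $zone;
}

// Hypothetical helper: returns null when the IP is not listed, otherwise the
// last octet of the 127.0.0.x answer (the list's own reason code).
// $resolver defaults to PHP's gethostbyname(), which hands back the name
// unchanged when the lookup fails.
function dnsblReasonCode($ip, $zone, $resolver = 'gethostbyname') {
    $name = dnsblLookupName($ip, $zone);
    $answer = call_user_func($resolver, $name);
    if ($answer === $name) {
        return null; // nothing came back: not listed
    }
    $octets = explode('.', $answer);
    return (int) end($octets);
}
```

Calling `dnsblLookupName('202.177.183.110', 'bl.spamcannibal.org')` gives back `110.183.177.202.bl.spamcannibal.org`, the same name used in the example above. What each reason code means is still up to the individual list.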

When that happens, my routine deletes the email.
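A rough sketch of that decision, checking several lists and stopping at the first hit so no more DNS queries are made than needed. The name `isListedAnywhere` and the `$zones` parameter are assumptions for illustration, and the resolver is injectable again so the logic can be exercised offline.

```php
<?php
// Hypothetical helper: true as soon as any one of the given DNSBL zones lists
// the IP. Stopping at the first hit keeps the number of DNS queries down.
function isListedAnywhere($ip, array $zones, $resolver = 'gethostbyname') {
    $reversed = implode('.', array_reverse(explode('.', $ip)));
    foreach ($zones as $zone) {
        $name = $reversed . '.' . $zone;
        // gethostbyname() returns the name unchanged when nothing is found
        if (call_user_func($resolver, $name) !== $name) {
            return true;
        }
    }
    return false;
}
```

With PHP’s IMAP extension, a `true` result is the point where you would call `imap_delete($mbox, $num)` for the message and `imap_expunge($mbox)` once you are done with the mailbox. While you are still tuning your lists, moving the message to a folder instead is the more cautious choice.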

What I’m posting here is my preliminary version. I have a fuller version that is much more optimized and is much nicer to the DNSBLs. I’ll give that to anyone who makes a donation to the site and requests it. I’d like to start getting a little something out of this site and this routine adds some real value. It’s eliminated 98% of my SPAM emails. I’m confident enough in it that I have it set to permanently delete the emails without my even seeing them. This version doesn’t do that and if it is used too frequently, it could cause your requests to be ignored by the DNSBLs. I’ll also provide a large list of all of the name servers I have and provide some tools you can use to tune your lists.

Here is the download link:
http://www.thecodecave.com/downloads/php/TCCSpamFilter.php.txt
[php]
= $rangestart) && ($remote_ip <= $rangeend)) {
        return true;
    } else {
        return false;
    }
}

// *******************************************************************************
// IMAP Get Full Header
// (Thanks JamieD - http://www.codingforums.com/archive/index.php?t-89994.html)
// Returns an array containing the original message header
// *******************************************************************************
function imap_get_full_header($p_stream, $p_msg_number) {
    $header_string = imap_fetchheader($p_stream, $p_msg_number);
    $header_array = explode("\n", $header_string);
    $header_obj = array();
    $last = null;
    foreach ($header_array as $line) {
        // "Name: value" starts a new header
        if (preg_match('/^([^:]*): (.*)/i', $line, $arg)) {
            $header_obj[$arg[1]] = $arg[2];
            $last = $arg[1];
        } elseif ($last !== null) {
            // Anything else is a continuation of the previous header
            $header_obj[$last] .= "\n" . $line;
        }
    }
    return $header_obj;
}

// *******************************************************************************
// Blocked IP
// Performs a DNS check against a specific IP address.
// Domain Name Server Blacklists (DNSBLs) use this method to declare whether an
// email has been sent from an IP address that has been known to send spam.
// *******************************************************************************
function BlockedIP($Suspect_IP, $DNSvr_Address) {
    $ReverseOrderedIP = array_reverse(explode('.', $Suspect_IP));
    $FullLookupAddress = implode('.', $ReverseOrderedIP) . '.' . $DNSvr_Address;
    // gethostbyname() returns its argument unchanged when the lookup fails,
    // so any other result means the DNSBL has an entry for this IP.
    if ($FullLookupAddress != gethostbyname($FullLookupAddress)) {
        return true;
    } else {
        return false;
    }
}

// *******************************************************************************
// Sender IP
// Given a mailbox and message number, this routine returns the IP address of
// computer that sent the email. The "from" address can be faked, this IP
// address cannot.
// *******************************************************************************
function senderip($mbox, $num){
    $struct = imap_get_full_header($mbox, $num);
    $str_in = $struct['Received-SPF'];
    $tween = ""; // not needed but good practice when appending
    $chr1 = 'client-ip=';
    $chr2 = ';';
    for ($i=strpos($str_in, $chr1)+10;$i

[/php]
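The senderip listing above is cut off partway through its loop. From what survives, it looks for the text between `client-ip=` and the next `;` in the Received-SPF header. Here is a minimal sketch of one way to get that value with a regular expression. This is a stand-in rather than the original code, and the name `clientIpFromSpf` is made up for illustration.

```php
<?php
// Hypothetical helper (not the original senderip): pull the client-ip value
// out of a Received-SPF header string. Returns null when there isn't one.
function clientIpFromSpf($receivedSpf) {
    // Capture everything after "client-ip=" up to the next ';' or whitespace
    if (preg_match('/client-ip=([^;\s]+)/', $receivedSpf, $m)) {
        return $m[1];
    }
    return null;
}
```

Fed the Received-SPF line from the sample header earlier in this post, it returns `74.139.17.40`. It relies on your mail provider adding a Received-SPF header with a `client-ip=` field, which not every provider does, so check your own headers first.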