Build your own spam filter with PHP and DNSBLs
December 2, 2006
Have you ever gotten an email asking if you want certain parts of your body enlarged, parts that you might not even have? Was the next email you read one asking if you want to lose the inches you’ve recently gained? Did you ever notice how these emails are always from people that you are fairly certain have nothing to do with the contents of the email? Did MTeresa@Vatican.org really send that diet pill email? Have you ever gotten returned or rejected “can’t be delivered” emails from addresses you’ve never sent an email to?
I have.
SPAM. It’s HORRIBLE! My email box for Brian@TheCodeCave.com probably gets 3 to 1 spam over real email. I expected that. I put that address out everywhere and don’t protect it. It is meant to be my public address. But the FROM addresses on all that email never indicate who the email is really from. Even the company information inside the email header is faked. The spammers will grab some other name on their spam list and use it as their from address. I’ve had my name put into the from address of emails a few times. It’s an annoying problem, just ask the Nuclear Moose.
Why this can happen is a long story. It all relates back to the fact that SMTP and port 25 were never meant for submitting emails to email servers. SMTP was only meant for server to server communications. However, that’s for a different post. The long and short of it is that everything can be faked except for one thing: the IP address of the server that sent the email.
Because that IP address is accurate, you can use it to tell if the person that sent the email is a spammer. This post tells you a couple of ways to do that. And because this is The Code Cave, you get a fully functional PHP routine to boot.
First let me show you what I am talking about. Emails contain lots of information that you don’t normally see. Most of that information is in the header of the email. Each email client has different ways of showing email headers. You might find it by viewing the properties of the email. In Outlook, you can see it by choosing, from an open email, View->Options. It will be there under the name of “Internet Headers”.
Here is what one email header looked like that came to me back in June:
Return-Path: <tkuhnel@alushiptechnology.com>
Almost all of that information in there about who sent this stuff is garbage. “alushiptechnology.com” had nothing to do with the email. Poor TKuhnel certainly had nothing to do with it. He/She just has their email address out there in the spam databases. Google even shows 10 or more names associated with this poor shlep.
But the secret is that “client-ip=74.139.17.40” in the Received-SPF header is accurate. That is the machine that sent the spam. And if it sent one spam, it’s quite likely that it has sent more. That’s where DNSBLs come in. DNSBL stands for DNS-based Blackhole List (often just called a DNS blacklist). Usually these lists are generated by creating an email address that is never meant for real use. Any email that arrives at that address is, by definition, spam. These addresses are called spam traps, and the places where they are located are often called honey pots.
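Pulling that client-ip value out of the header text can be sketched roughly like this. It is only an illustration: the function name extract_client_ip is made up for this example and is not part of my script, and the sample header text is invented apart from the IP address.

```php
<?php
// Hypothetical helper (not part of TCCSpamFilter.php): pull the client-ip
// value out of the text of a Received-SPF header.
function extract_client_ip($received_spf)
{
    // Match "client-ip=" followed by something shaped like an IPv4 address.
    if (preg_match('/client-ip=(\d{1,3}(?:\.\d{1,3}){3})/', $received_spf, $m)) {
        // Let PHP confirm that every octet is in range before trusting it.
        if (filter_var($m[1], FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
            return $m[1];
        }
    }
    return null; // header missing, malformed, or not IPv4
}

// Sample header text, invented for illustration apart from the IP address.
$sample = 'pass (example.net: domain of someone@example.org designates '
        . '74.139.17.40 as permitted sender) client-ip=74.139.17.40;';
var_dump(extract_client_ip($sample));
```

Returning null instead of an empty string makes it easy for the caller to skip messages that have no usable Received-SPF header.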
These lists of spammer IP addresses are made available to anyone that wants to use them. Let’s take a look at how another IP address, 202.177.183.110, is identified by various DNSBLs:
DNSBL – Result (Reason)
AHBL – LISTED (127.0.0.3)
CBL – LISTED (127.0.0.2)
DNSBLNETAUOSPS – LISTED (127.0.0.2)
DNSBLNETAUT1 – LISTED (127.0.0.2)
DSBL – LISTED (127.0.0.2)
DSBLALL – LISTED (127.0.0.2)
EMAILBASURA – LISTED (127.0.0.2)
NJABLPROXIES – LISTED (127.0.0.9)
PSBL – LISTED (127.0.0.2)
SBL-XBL – LISTED (127.0.0.2)
SORBS-HTTP – LISTED (127.0.0.2)
SORBS-SOCKS – LISTED (127.0.0.3)
TQM-DYNAMIC – LISTED (127.0.0.2)
TQM-SPAMTRAP – LISTED (127.0.0.3)
UCEPROTECTL1 – LISTED (127.0.0.2)
Clearly this address was used by a spammer. But how can you take advantage of this?
Well, one of two ways. First, you can contact these DNSBL sites and download the list. Or you can ask them to check a single address at a time. That’s what my routine does. It takes an IP address (202.177.183.110), reverses the order of its four numbers, and appends the DNSBL’s domain, giving 110.183.177.202.bl.spamcannibal.org, for each spam list I want to check. Then I call PHP’s gethostbyname to resolve that name. If I get back the exact same text that I sent, the lookup found nothing, so I know the address is clean. If I get back something like 127.0.0.2, I know that the address is listed in their DNSBL for reason #2 (whatever that means to them). 127.0.0.3 would indicate reason 3.
When that happens, my routine deletes the email.
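The lookup described above can be sketched in two small steps: building the name, and interpreting the answer. The function names dnsbl_query_name and dnsbl_reason_code are invented for this illustration, and the answers are hard-coded so the sketch runs without touching a real DNSBL.

```php
<?php
// Hypothetical helpers for illustration; the names are not from my script.

// Build the name to look up: the IP's four numbers reversed, then the DNSBL zone.
function dnsbl_query_name($ip, $zone)
{
    return implode('.', array_reverse(explode('.', $ip))) . '.' . $zone;
}

// Interpret what gethostbyname() handed back for that name.
// Returns the last number of the answer (the list's reason code),
// or null if the address is not listed.
function dnsbl_reason_code($query_name, $answer)
{
    // gethostbyname() echoes the name back unchanged when nothing resolves.
    if ($answer === $query_name) {
        return null;
    }
    // Listings are conventionally answered with an address in 127.0.0.0/8.
    if (strpos($answer, '127.') !== 0) {
        return null;
    }
    $octets = explode('.', $answer);
    return (int) end($octets);
}

$name = dnsbl_query_name('202.177.183.110', 'bl.spamcannibal.org');
echo $name, "\n"; // 110.183.177.202.bl.spamcannibal.org

// A real check would pass gethostbyname($name) as the second argument.
var_dump(dnsbl_reason_code($name, '127.0.0.2')); // listed
var_dump(dnsbl_reason_code($name, $name));       // not listed
```

Ignoring answers outside 127.x.x.x is a design choice of this sketch, not something every DNSBL documents; it guards against a list's domain resolving to something unrelated.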
What I’m posting here is my preliminary version. I have a fuller version that is much more optimized and is much nicer to the DNSBLs. I’ll give that to anyone who makes a donation to the site and requests it. I’d like to start getting a little something out of this site and this routine adds some real value. It has eliminated 98% of my spam emails. I’m confident enough in it that I have it set to permanently delete the emails without my even seeing them. This version doesn’t do that, and if it is run too frequently, the DNSBLs could start ignoring your requests. I’ll also provide a large list of all of the name servers I have and provide some tools you can use to tune your lists.
Here is the download link:
http://www.thecodecave.com/downloads/php/TCCSpamFilter.php.txt
[php]
// *************************************************************************
// TCCSpamFilter.php 11/27/2006
// Written by Brian Layman
//
// A PHP example written to filter out spam from a webmail account.
// Provides an example of DNSBL filtering via domain name lookups.
// See http://www.thecodecave.com/article288 for details
//
// Usage:
// Customize the password, place it on your web site, and call it.
// Alternatively, add it to your cron tab file with a line like this:
// 00,15,30,45 * * * * wget -q http://www.example.com/TCCSpamFilter.php
//
// WARNING: DO NOT USE THIS SCRIPT MORE OFTEN THAN EVERY 15 MINUTES.
// You will be blocked if you abuse DNSBL services in this fashion.
// I have written an optimized version of this script that can be run
// once a minute and will produce MUCH less traffic than this version.
// Anyone who makes a donation to the site, and requests that source
// can get it.
//
// Original Author - Brian Layman
//
// Created - 27/Nov/2006
// Last Modified - 02/Dec/2006
// Contributors: (Put your name & Initials at the top)
// Brian Layman - BL - http://www.TheCodeCave.com
//
//
// History:
// 27/Nov/2006 - BL - Created
// 02/Dec/2006 - BL - Further Cleaning. Final comments about Rot13
//
// License - If this helps you - Great! Use it; modify it; share it,
// link back to my site.
//
// Indemnity -
// Use this file at your own risk. I'm not going to deliberately hack
// your server, but others might. I may or may not have been worried
// about security when I wrote this routine. It is up to YOU to make
// certain that ANY routines that you put on your Site are safe. Just
// because you see a variable here protected by AddSlashes or
// HTMLSpecialChar does not mean that ALL variables are protected.
//
// If this file allows a hole into your site, it is not my fault. In
// fact, you should just stop right now and delete this file. For if
// it causes blue smoke to be emitted from your web server, if it
// resets your business URL to point to MyClientsSuck.com, or if it
// causes your sister to break up with her lawyer boyfriend and start
// dating a caver, it is not my fault. (Actually that last one might
// be an improvement, but it is still not my fault.) YOU are
// responsible for YOUR site. Learn how to protect it and understand
// what every line of code does that you use.
//
// Donations - If this script really helps you out, feel free to
// donate an espresso via PayPal to Brian@TheCodeCave.com or just
// leave a comment at http://www.thecodecave.com/did-that-help and
// include your country of origin.
/*********************************************************************************/
/* Support Routines */
/*********************************************************************************/
// *******************************************************************************
// IP In Mask Range - NOT CURRENTLY USED
// Allows filtering via IP masks just as normal network configurations do.
// Example: ipinmaskrange("192.168.100.0", "255.255.255.0", "192.168.100.20")
// Returns true because the example is in the network
// Example2: ipinmaskrange("192.168.100.0", "255.255.255.0", "192.168.101.20")
// Returns false because the example is outside the network
// *******************************************************************************
function ipinmaskrange($network, $mask, $ip) {
    $ip_long=ip2long($ip);
    $network_long=ip2long($network);
    $mask_long=ip2long($mask);
    // The IP is in the network if masking it yields the network address.
    return (($ip_long & $mask_long) == $network_long);
}
// *******************************************************************************
// IP In Range
// Specify a range in the form a-b and the routine returns a true if the passed
// IP address is in that range.
// Example: ipinrange("192.168.100.0-192.168.100.255", "192.168.100.20")
// Returns true because the example is in the network
// Example2: ipinrange("192.168.100.0-192.168.100.255", "192.168.101.20")
// Returns false because the example is outside the network
// *******************************************************************************
function ipinrange($range, $ip) {
    $range = explode("-", $range);
    $rangestart = ip2long($range[0]);
    $rangeend = ip2long($range[1]);
    $remote_ip = ip2long($ip);
    // True when the IP falls between the start and end of the range, inclusive.
    return (($remote_ip >= $rangestart) && ($remote_ip <= $rangeend));
}
// *******************************************************************************
// IMAP Get Full Header
// (Thanks JamieD - http://www.codingforums.com/archive/index.php?t-89994.html)
// Returns an array containing the original message header
// *******************************************************************************
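function imap_get_full_header( $p_stream, $p_msg_number )
{
    $header_string = imap_fetchheader ( $p_stream, $p_msg_number );
    $header_array = explode ( "\n", $header_string );
    $header_obj = array();
    $last = null;
    foreach($header_array as $line)
    {
        $line = rtrim($line, "\r");
        // preg_match is used here because eregi was removed in PHP 7.
        // A header name never contains whitespace, so folded continuation
        // lines (which start with a space or tab) fall through to the else.
        if(preg_match('/^([^:\s]+): (.*)/', $line, $arg))
        {
            $header_obj[$arg[1]] = $arg[2];
            $last = $arg[1];
        }
        elseif($last !== null)
        {
            // Continuation of the previous header
            $header_obj[$last] .= "\n" . $line;
        }
    }
    return ( $header_obj );
}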
// *******************************************************************************
// Blocked IP
// Performs a DNS check against a specific IP address.
// DNS blacklists (DNSBLs) use this method to declare whether an
// email has been sent from an IP address that has been known to send spam.
// *******************************************************************************
function BlockedIP($Suspect_IP, $DNSvr_Address)
{
    // Don't query the DNSBL for anything that isn't a valid IPv4 address.
    if (filter_var($Suspect_IP, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return false;
    }
    $ReverseOrderedIP = array_reverse(explode('.', $Suspect_IP));
    $FullLookupAddress = implode('.', $ReverseOrderedIP) . '.' . $DNSvr_Address;
    // gethostbyname() returns its argument unchanged when the lookup fails.
    // A listing is answered with an address such as 127.0.0.2.
    $Result = gethostbyname($FullLookupAddress);
    return (($Result != $FullLookupAddress) && (strpos($Result, '127.') === 0));
}
// *******************************************************************************
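// Sender IP
// Given a mailbox and message number, this routine returns the IP address of
// the computer that sent the email. The "from" address can be faked, this IP
// address cannot. Returns an empty string if no IP address was found.
// *******************************************************************************
function senderip($mbox, $num){
    $struct = imap_get_full_header($mbox, $num);
    // Not every mail server adds a Received-SPF header.
    $str_in = isset($struct['Received-SPF']) ? $struct['Received-SPF'] : '';
    $tween=""; // not needed but good practise when appending
    $chr1='client-ip=';
    $chr2=';';
    $start = strpos($str_in, $chr1);
    if ($start === false) {
        return $tween;
    }
    // Copy the characters between "client-ip=" and the next ";"
    for ($i=$start+strlen($chr1); $i<strlen($str_in) && $str_in[$i]!=$chr2; $i++) {
        $tween .= $str_in[$i];
    }
    return $tween;
}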
// *******************************************************************************
// Is Black Listed
// This is the core routine. Given an IP address, it runs some checks to decide if
// the email was sent from a black listed spammer.
// Usage: $is_it_spam = isblacklisted("192.168.100.1");
// *******************************************************************************
function isblacklisted($ip){
    // If there are some people I never even want to see an email from, I would put
    // their IP address in the blacklist ranges.
    // Example: $BlackList = array("192.168.100.1-192.168.100.5","192.168.102.112-192.168.102.112");
    $BlackList = array();
    // If there are some senders that are declared as spammers by a blocking service I want
    // to use, but whose email I still want, I would declare them in the white list.
    $WhiteList = array("64.233.160.0-64.233.191.255", // Google mail is allowed
                       "12.196.88.128-12.196.88.159"); // A false listing due to virus infection that has been purged.
    // Check white list membership first for optimization reasons.
    $allowed = false;
    foreach($WhiteList as $range) {
        if(ipinrange($range, $ip)) {
            $allowed=true;
        }
    }
    // If this address doesn't get a free pass, check it out further.
    if (!$allowed) {
        // Iterate the black lists and check the ip address against them
        $blocked = false; // must be set in case $BlackList is empty
        foreach($BlackList as $range) {
            if(ipinrange($range, $ip)) {
                $blocked=true;
            }
        }
        // PHP uses "Short Circuit Evaluation" so as soon as a true is hit, the routine exits out.
        // This statement should be optimized with the local check first, and then the DNSBLs from
        // the most inclusive to the least. You want to do as few external checks as possible.
        // The full version of this script comes with several other recommended DNSBLs and my full list
        // of DNSBLs of which I am aware.
        return ($blocked ||
                (BlockedIP($ip, 'bl.spamcop.net')) ||
                (BlockedIP($ip, 'sbl-xbl.spamhaus.org')) ||
                (BlockedIP($ip, 'dnsbl-2.uceprotect.net')) ||
                (BlockedIP($ip, 'blackholes.five-ten-sg.com')) ||
                (BlockedIP($ip, 'bl.spamcannibal.org')) ||
                (false));
    }
    else {
        return false ;
    }
    // NEVER BLOCK WITH: BLARSBL, FIVETENIGNORE, FIVETENSRC, JAMMDNSBL, SPAMBAG, SPEWS (these block large IP ranges)
    // NEVER BLOCK WITH: MAPS-DUL, SORBS-DUHL (these knowingly list IPs that do not meet listing criteria).
}
// This routine iterates all of the emails on the server and checks if they are spam.
// If they are, it marks them as deleted. Because it is an IMAP server, they stay on
// the server until they are purged. If you wish to remove them, call imap_expunge().
function blockspam($MAILSERVER, $PHP_AUTH_USER, $PHP_AUTH_PW){
    $mbox=imap_open($MAILSERVER, $PHP_AUTH_USER, $PHP_AUTH_PW);
    if ($mbox === false) {
        return; // Could not connect or log in.
    }
    // In this example version, this iterates ALL messages in your mailbox.
    // The full version only iterates the new messages that have come in.
    // By doing that, you can run the check as often as once a minute and still have much
    // less traffic than running this version once an hour. If you abuse a DNSBL service,
    // they might block your IP address.
    $count = imap_num_msg($mbox);
    for($idx=1; $idx <= $count; $idx++) {
        $ip=senderip($mbox, $idx);
        // Skip messages where no sender IP could be found.
        if ($ip != "" && isblacklisted($ip)) {
            imap_delete($mbox, $idx);
        }
    }
    imap_close($mbox);
}
/*********************************************************************************/
/* Main Calls */
/*********************************************************************************/
// This initial version only works with imap servers. That means you use
// port 143.
// Uncomment this line and put in your email server, email login and password
// blockspam("{imap.example.com:143}", "you@example.com","yourpassword");
// I don't like storing passwords in plain text. You can use Rot13 as a
// really simple obfuscation method so you can have this file on your screen
// without a passer-by seeing your email password. Rot13 doesn't make it safe
// (no two-way encryption in source does), but it will block wandering eyes.
// Use http://rot13.thecodecave.com to get the encoded versions of the text.
// That above line would look something like this:
// blockspam(str_rot13("{vznc.rknzcyr.pbz:143}"), str_rot13("lbh@rknzcyr.pbz"), str_rot13("lbhecnffjbeq"));
// If you have multiple accounts, add another block spam line.
?>
[/php]
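The script warns several times about hammering the DNSBLs. One simple way to cut down on repeat lookups is to remember each answer for the length of a run, so the same IP is never sent to the same list twice. The sketch below is only an illustration of that idea and is not my optimized version. The name make_cached_dnsbl_check, the fake resolver, and the zone bl.example.org are all invented so that the example runs without a network.

```php
<?php
// Hypothetical sketch; this is NOT the optimized version mentioned in the post.
// $resolver is any callable that takes a host name and returns what
// gethostbyname() would return, so it can be swapped out for testing.
function make_cached_dnsbl_check(callable $resolver)
{
    $cache = array(); // "ip|zone" => bool

    return function ($ip, $zone) use (&$cache, $resolver) {
        $key = $ip . '|' . $zone;
        if (!array_key_exists($key, $cache)) {
            $name = implode('.', array_reverse(explode('.', $ip))) . '.' . $zone;
            $answer = $resolver($name);
            // Listed means the name resolved (answer differs) to a 127.x.x.x address.
            $cache[$key] = ($answer !== $name) && (strpos($answer, '127.') === 0);
        }
        return $cache[$key];
    };
}

// Stand-in resolver so the example runs offline: it pretends one IP is listed.
$lookups = 0;
$fake_resolver = function ($name) use (&$lookups) {
    $lookups++;
    return ($name === '110.183.177.202.bl.example.org') ? '127.0.0.2' : $name;
};

$is_listed = make_cached_dnsbl_check($fake_resolver);
var_dump($is_listed('202.177.183.110', 'bl.example.org')); // bool(true)
var_dump($is_listed('202.177.183.110', 'bl.example.org')); // bool(true), from the cache
echo $lookups, "\n"; // 1
```

In real use you would pass 'gethostbyname' as the resolver. The cache here only lives for one run of the script; keeping answers between cron runs would need a file or a database, which this sketch does not attempt.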
Comments
2 Responses to “Build your own spam filter with PHP and DNSBLs”
http://www.thecodecave.com/downloads/php/TCCSpamFilter.php.txt doesn’t work =’(
huh.. the file is there but the server can’t see it… Here try this.. http://www.thecodecave.com/downloads/php/TCCSpamFilter.txt