Fight SPAM With IPWorks
Unsolicited email, or Spam, costs millions in wasted time, overloaded servers, and overloaded networks every day.
There are a number of ways to identify Spam messages, ranging from simple keyword matching to sophisticated heuristic approaches, and even Artificial Intelligence. These solutions are hard to develop and require constant maintenance because spammers continuously find ways around them. The problem itself is so complex that it is almost hopeless for the normal IT developer with little time on his or her hands to even attempt to develop a solution.
The good news is you don't have to. You can instead leverage what thousands of people around the world are building: hundreds of freely available Spam Blacklist Databases that identify known spam sources. When an email arrives, issue a special DNS request to a blacklist DNS server. The blacklist will reply, telling you if the message comes from a known spammer.
Requirements
The purpose of this article is to explain exactly how you can programmatically detect if an email comes from a blacklisted mail server, and if so, how to get rid of it. To make the Internet communications simple, we'll use two components from the IPWorks Internet Toolkit. This particular example uses the IPWorks .NET Edition, though the same functionality can be performed with any edition of IPWorks.
The DNS component will be used for making DNS queries to the blacklist and the IMAP component will be used for communicating with the mail server. While this tutorial uses an IMAP server for email storage, the same principles apply if you use a POP server for email. IPWorks also includes a POP component which can be used for mail retrieval.
Connecting to the Mail Server
To connect to the IMAP server, I simply set the MailServer, User, and Password properties of the IMAP component and call the Connect() method. After that, I can select any folder on the server by setting the Mailbox property. The IMAP component can open a mailbox in two modes: read-write or read-only. I'll want to move or delete spam messages, which requires read-write access, so I'll use the SelectMailbox method. To open the mailbox in read-only mode, use ExamineMailbox instead.
    'connect to imap server:
    Imap1.MailServer = txtServer.Text
    Imap1.User = txtUser.Text
    Imap1.Password = txtPass.Text
    Imap1.Connect()
    Imap1.Mailbox = txtMailbox.Text 'for example, "INBOX"
    'open the mailbox in read/write mode:
    Imap1.SelectMailbox()
For each message in the mailbox, I'll fetch the message headers and examine them to determine if a spammer could have sent the email.
    For i = 1 To Imap1.MessageCount
        'fetch the message headers
        Imap1.Mailbox = txtMailbox.Text
        Imap1.SelectMailbox()
        Imap1.MessageSet = i.ToString()
        Imap1.FetchMessageHeaders()
        '...
        ' Here I'll add code to parse these headers, find out
        ' who is sending the email, and check the blacklist to determine
        ' if it's sent by a spammer or not
        '...
    Next i
    Imap1.Disconnect()
If you read the block of code above, you no doubt noticed that I left some code out and replaced it with comments. What goes in the place of those comments? Once I fetch the headers of each message, I'll need to look at the "Received" headers of the email to find out who is sending the message. Then, I'll query the blacklist to determine if this is a known spammer.
"Received" Headers - Who Sent This Email?
Every email that you receive has one or more "Received" headers. These headers provide a trail of where the email has been on its path from the sender to the recipient. Because of this, these message headers are the key to determining if a message is spam.
For example, I will select a random email from my inbox and look at its headers. Below are three actual headers which were included in this message. They are shown in the order in which they appear in the message headers (the most recent one first).
    Received: from mail.nsoftware.com by lizard.nsoftware.com with POP3 (fetchmail-5.9.0) for lancer@localhost (single-drop); Fri, 07 Nov 2003 16:34:47 -0500 (EST)
    Received: from GUILLERMO [200.47.102.143] by MAXMAIL001.maximumasp.com id AAF07E40036; Fri, 07 Nov 2003 16:13:20 -0500
    Received: from [92.85.134.32] by GUILLERMO id <5448835-66373>; Fri, 07 Nov 2003 22:12:28 +0100
This tells me the trail that this email has followed across the Internet. It originated (according to the message header) at 92.85.134.32 and was sent to GUILLERMO. Then GUILLERMO sent it to MAXMAIL001.maximumasp.com. Then MAXMAIL, which happens to be the same as mail.nsoftware.com, sent it to lizard.nsoftware.com (my local mail server). Let me explain:
The last Received header above is the first one added to the message, and is added by the sender's mail server:
Received: from [92.85.134.32] by GUILLERMO id <5448835-66373>; Fri, 07 Nov 2003 22:12:28 +0100
Here, GUILLERMO is the sender's mail server. "92.85.134.32" is the IP address from which the message was sent to this mail server. This particular header cannot be trusted, as spammers can fake Received headers.
Instead, I want to trust the line added by the recipient's (my) Internet mail server. Look at the next Received header (the second one added):
Received: from GUILLERMO [200.47.102.143] by MAXMAIL001.maximumasp.com id AAF07E40036; Fri, 07 Nov 2003 16:13:20 -0500
Here, MAXMAIL001.maximumasp.com (my Internet mail server) is noting that it has received a message from GUILLERMO, the suspect mail server, and has also provided GUILLERMO's real IP address. This header can be trusted, since it comes from my own mail server. This is the header I am most interested in.
In order to determine IF this piece of mail is spam, I want to examine the IP address of origin (the place it arrives from when it comes to MY Internet mail server), which in this case is GUILLERMO (200.47.102.143).
So the code that goes inside the For loop above will find the Received header of interest (the one added by the recipient's Internet mail server), extract the IP address or hostname of the sending mail server, and then query a spam blacklist. I won't get into the parsing code here; if you want to see it, just download the sample project. Instead, I'll resume with querying the blacklist after I have extracted the IP address or hostname in question.
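For readers who want a rough idea of what that parsing step involves, here is a minimal sketch. It is not the sample project's code: the function name ExtractSenderAddress, its parameters, and the regular expressions are my own assumptions, and it only handles Received headers shaped like the ones shown above ("from <host> [ip] by <server>").

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReceivedHeaderSketch
    ' Hypothetical helper (not from the sample project): given the raw header
    ' block and the name of the mail server I trust, return the IP address
    ' (or, failing that, the hostname) of the host that handed the message to it.
    Public Function ExtractSenderAddress(ByVal headers As String, ByVal trustedServer As String) As String
        ' Unfold continuation lines (a line break followed by spaces or tabs).
        Dim unfolded As String = Regex.Replace(headers, "\r?\n[ \t]+", " ")

        For Each line As String In Regex.Split(unfolded, "\r?\n")
            If Not line.StartsWith("Received:", StringComparison.OrdinalIgnoreCase) Then Continue For

            ' Capture the "from" clause and the name of the receiving server.
            Dim m As Match = Regex.Match(line, "^Received:\s*from\s+(?<from>.+?)\s+by\s+(?<by>\S+)", RegexOptions.IgnoreCase)
            If Not m.Success Then Continue For

            ' Only trust the header written by my own Internet mail server.
            If Not String.Equals(m.Groups("by").Value, trustedServer, StringComparison.OrdinalIgnoreCase) Then Continue For

            Dim fromClause As String = m.Groups("from").Value
            ' Prefer the bracketed IP address recorded by the trusted server.
            Dim ip As Match = Regex.Match(fromClause, "\[(\d{1,3}(?:\.\d{1,3}){3})\]")
            If ip.Success Then Return ip.Groups(1).Value

            ' No bracketed IP address: fall back to the first token (the hostname).
            Return fromClause.Split(" "c)(0)
        Next

        ' No Received header from the trusted server was found.
        Return String.Empty
    End Function

    Sub Main()
        Dim headers As String = String.Join(Environment.NewLine, New String() { _
            "Received: from mail.nsoftware.com by lizard.nsoftware.com with POP3 (fetchmail-5.9.0) for lancer@localhost (single-drop); Fri, 07 Nov 2003 16:34:47 -0500 (EST)", _
            "Received: from GUILLERMO [200.47.102.143] by MAXMAIL001.maximumasp.com", _
            "    id AAF07E40036; Fri, 07 Nov 2003 16:13:20 -0500", _
            "Received: from [92.85.134.32] by GUILLERMO id <5448835-66373>; Fri, 07 Nov 2003 22:12:28 +0100"})

        ' Prints 200.47.102.143
        Console.WriteLine(ExtractSenderAddress(headers, "MAXMAIL001.maximumasp.com"))
    End Sub
End Module
```

Matching on the "by" server name is a deliberate choice in this sketch: it skips any Received line that my own mail server did not write, which is exactly the set of headers a spammer could have forged.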
Internet Blacklists
Now that I know the IP address of the mail server that sent this email, I want to find out if this mail server has been used for sending spam.
For the purposes of this tutorial, I'm going to use a combination of 3 popular blacklists:
Blacklist | Query Zone Domain |
SpamCop | bl.spamcop.net |
DSBL | list.dsbl.org |
SpamHaus | sbl.spamhaus.org |
These blacklists can be queried using DNS requests for an IP4R hostname (IPv4 Reversed), which is a DNS lookup of a reversed IP address. This takes the form:
4.3.2.1.blacklist-domain
Where "4.3.2.1" is the reversed form of the real IP address "1.2.3.4", and "blacklist-domain" is the "zone domain" of the blacklist itself.
To clarify this, in order to perform an IP4R lookup of GUILLERMO (200.47.102.143) at spamhaus (sbl.spamhaus.org), perform the following DNS query:
143.102.47.200.sbl.spamhaus.org
sbl.spamhaus.org is the "zone domain" provided by SpamHaus. If the query returns an address record (DNS blacklists conventionally answer with an address in the 127.0.0.x range), SpamHaus considers this IP address to be that of a spammer. If the DNS server reports that no records were found, the address is not listed.
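The reversal itself is simple string manipulation. The following is a minimal sketch of one way to write it, not the version that ships in the sample project; the validation and the error messages are my own assumptions.

```vbnet
Imports System

Module Ip4rSketch
    ' Illustrative sketch (not the sample project's code): reverses the four
    ' octets of a dotted-quad IPv4 address, so "1.2.3.4" becomes "4.3.2.1".
    Public Function ip4r(ByVal ipaddress As String) As String
        Dim octets As String() = ipaddress.Trim().Split("."c)
        If octets.Length <> 4 Then
            Throw New ArgumentException("Expected a dotted-quad IPv4 address.", "ipaddress")
        End If

        ' Reject anything that is not a number between 0 and 255.
        For Each octet As String In octets
            Dim value As Integer
            If Not Integer.TryParse(octet, value) OrElse value < 0 OrElse value > 255 Then
                Throw New ArgumentException("Each octet must be a number from 0 to 255.", "ipaddress")
            End If
        Next

        Array.Reverse(octets)
        Return String.Join(".", octets)
    End Function

    Sub Main()
        ' Prints 143.102.47.200.sbl.spamhaus.org
        Console.WriteLine(ip4r("200.47.102.143") & ".sbl.spamhaus.org")
    End Sub
End Module
```

Throwing on malformed input is just one option; a spam filter might prefer to treat an unparseable address as "not listed" and move on to the next message.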
For this I've created a Boolean function called ListedInBL() which will return true if the sender is found in the blacklist. The first thing I need to do in this function is make sure I have an IP address - not a hostname. The DNS component can be used to easily determine the IP address of any hostname.
    Private Function ListedInBL(ByVal ipaddress As String, ByVal blacklist As String, ByVal msgnum As Integer) As Boolean
        'if the ipaddress contains any non-numerics, then
        'I'll need to resolve it to an ipaddress
        If Regex.Match(ipaddress, "[^0-9.]").Success Then
            ipaddress = Resolve(ipaddress)
        End If
After I have an IP address I can create the ip4r address and perform the blacklist query. If the blacklist responds with a resolution of the IP address, I know that this server is blacklisted.
        Dim hostaddress As String
        hostaddress = ip4r(ipaddress) + "." + blacklist
        hostaddress = Resolve(hostaddress)
        If hostaddress = "0.0.0.0" Then
            'no results in table...this is a good thing!
            Return False
        Else
            'this blacklist says this server is a spammer!
            Return True
        End If
    End Function
For the ip4r() function code, please download the sample project. The Resolve function uses the DNS component to perform a lookup and return the result.
    Private Function Resolve(ByVal hostname As String) As String
        Dns1.DNSServer = txtDNSServer.Text
        'the DNS component supports many types of query types, but
        'I'm interested in "A" (address) records
        Dns1.QueryType = nsoftware.IPWorks.DnsQueryTypes.qtAddress
        'NOTE: Some blacklists also keep TXT (qtText) DNS records
        'With more information regarding the blacklisted domains
        Dns1.Query(hostname)
When the Query method is called, the DNS component will provide records for each response sent by the DNS server. If there are no records returned, that means no answer was available for that query.
        If Dns1.RecordCount = 0 Then Return "0.0.0.0"
I'm only interested in the 1st field of the 1st record, because A records only have 1 field. All of the query types, record sources and field values are explained in the extensive documentation of the DNS component.
        Dns1.FieldIndex = 1
        Return Dns1.RecordFieldValue(1)
    End Function
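Since I'm combining three blacklists, each sender has to be checked against every zone domain. Here is a rough sketch of how that loop could look. The FindListings name and the delegate parameter are my own assumptions; the delegate stands in for ListedInBL so that the sketch can run without the DNS component.

```vbnet
Imports System
Imports System.Collections.Generic

Module BlacklistComboSketch
    ' Zone domains from the table earlier in this article.
    Private ReadOnly Zones As String() = New String() {"bl.spamcop.net", "list.dsbl.org", "sbl.spamhaus.org"}

    ' Hypothetical helper: asks the supplied lookup about every zone and
    ' returns the zones that report the address as listed.
    Public Function FindListings(ByVal ipaddress As String, ByVal isListed As Func(Of String, String, Boolean)) As List(Of String)
        Dim hits As New List(Of String)()
        For Each zone As String In Zones
            If isListed(ipaddress, zone) Then hits.Add(zone)
        Next
        Return hits
    End Function

    Sub Main()
        ' Stand-in lookup for demonstration only: pretends that exactly one
        ' zone lists the address. A real application would query DNS here.
        Dim fakeLookup As Func(Of String, String, Boolean) = Function(ip, zone) zone = "sbl.spamhaus.org"

        Dim hits As List(Of String) = FindListings("200.47.102.143", fakeLookup)
        ' Prints: Listed by 1 blacklist(s): sbl.spamhaus.org
        Console.WriteLine("Listed by " & hits.Count & " blacklist(s): " & String.Join(", ", hits.ToArray()))
    End Sub
End Module
```

In the application the stand-in would be replaced by something like Function(ip, zone) ListedInBL(ip, zone, i). How many hits should count as spam is a policy decision: treating a single listing as spam catches more, while requiring two or more lowers the risk of a false positive.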
What If This Is Spam?
If you determine that a message really is spam, you can simply delete it. In an automated spam killer application you may want to be careful, as you could be deleting a message that you wanted to receive. It's possible that someone you know, or someone you would like to accept email from, is sending through a mail server that has been blacklisted because someone else used that mail server to send spam.
If you are using an IMAP mail server, you can simply "flag" the suspected spam messages as deleted and allow the user to permanently purge these marked messages if and when they desire. In the sample project the user is allowed to optionally move all of these spam messages to a "spam" folder on their mail server.
If you are using a POP mail server, your options are more limited. POP servers do not include subfolders, and messages do not have the "flag" capabilities of IMAP servers. Messages can only be deleted, not modified.
The Results
Using this method on thousands of email messages we have been able to detect and delete over 95% of the received spam without a single false positive. Your results may vary, but this is a very fast and effective way to eliminate spam from your Inbox. The same techniques in this article can be extended to create Microsoft Outlook extensions, email proxies, or any number of custom solutions to curb the ever-growing amount of spam.
We appreciate your feedback. If you have any questions, comments, or suggestions about this article please contact our support team at kb@nsoftware.com.