MAIL FROM vs From vs Sender – exploiting SPF

After 10 years of managing computer networks, I still learn something new almost every day. This week I inadvertently discovered that there are many more ways of “identifying” the sender of an e-mail than I had ever known (or in this case, of spoofing the sender of an e-mail).

Despite our multiple layers of spam protection, including strict SPF records and advanced content filtering, a particular set of spam was cleanly passing every layer. This set of spam claimed, no less, to have originated internally (clearly, this was not the case).

So, here’s how mail is identified:

A) MAIL FROM: (Also known as the envelope address)
B) Mail headers:
B1) From: (Generally what you see in your mail client)
B2) Sender:
B3) Reply-To: (Hitting “Reply” will usually go to this address)
B4) Resent-From:
B5) Resent-Sender:
B6) Resent-Reply-To:

The SPF specification checks the envelope address (the MAIL FROM identity), not the mail headers, and the envelope address is rarely seen by anything other than the e-mail relays. So, some clever spammer has been sending out a deluge of e-mail with forged mail headers that all match our own e-mail domain, while sending a different identity in the MAIL FROM, which conveniently passed SPF with flying colors. The e-mail client then looked at the forged headers, which contained our own (whitelisted) domain. The rest is history.

… or was, until I added a few lines of code to our mimedefang-filter. Any e-mail with headers claiming to be sent from our own domain is now cross-checked against the MAIL FROM. If the domains don’t match, poof, the message is silently discarded.
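
The cross-check described above can be sketched roughly as follows. This is not the actual mimedefang-filter code (which is not shown here); it is a minimal Python illustration of the same idea. The domain `example.com`, the function names, and the choice to inspect only the From: and Sender: headers are assumptions made for the example.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

OUR_DOMAIN = "example.com"  # hypothetical stand-in for the protected domain

# Assumption: only these two headers are checked; the tuple can be extended.
CHECKED_HEADERS = ("From", "Sender")


def domain_of(addr):
    """Return the lower-cased domain of a bare address, or '' if it has none."""
    local, at, domain = addr.rpartition("@")
    return domain.lower() if at else ""


def should_discard(envelope_from, raw_message):
    """True if a header claims our domain but the MAIL FROM address does not."""
    _, envelope_addr = parseaddr(envelope_from)
    if domain_of(envelope_addr) == OUR_DOMAIN:
        return False  # envelope and headers agree; leave it to the other filters

    msg = message_from_string(raw_message)
    for name in CHECKED_HEADERS:
        # A header may appear more than once and may list several addresses.
        for _, addr in getaddresses(msg.get_all(name, [])):
            if domain_of(addr) == OUR_DOMAIN:
                return True  # forged: header says "us", envelope says otherwise
    return False
```

A message whose From: header reads `boss@example.com` but whose MAIL FROM is `<spammer@elsewhere.net>` would be flagged, while mail that never mentions the protected domain passes through untouched.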

Why you can’t afford NOT to have an online identity

In this age of social networking, there’s no limit to the selection of sites you could sign up for… MySpace, Facebook, Twitter, LinkedIn, Xanga, Blogger – the list goes on. From time to time, I run into the occasional person who absolutely refuses to so much as sign up for any of these sites.

They often cite fears of identity theft, lack of privacy, lack of control over the data, and so on. These fears are legitimate and by no means unfounded, and for the most part I would agree that they are true. Being a member of a social networking site does, to some extent, increase your risk of identity theft. Having all the appropriate information in one single place makes it all the easier for the would-be thief to gather what he needs. Further, we seem to forget that each site we sign up for is ultimately run by people. Employees of these companies are often free to browse every detail of information that you have posted, even that which is supposedly kept private. And yes, it is very true: you give up some element of privacy when you make part of your life public, however limited that may be.

With that said, I would like to propose what I feel is a very strong reason why these individuals who refrain from online “social networking” should join as soon as possible:

A) Current statistics show that a very small percentage of identity theft results from an online data compromise. Take a look at some statistics from the BBB [1]: only 9% to 11% of all identity theft results from a “compromise” that took place on-line (and only a fraction of these were related to social networking). It is significantly more likely that your identity will be stolen by a friend, neighbor, in-home employee, or family member than by a random stranger who stumbles across your profile on a social networking site.

B) Your data is already at risk anyway: If you use any part of today’s modern infrastructure (credit cards, driver’s licenses, colleges and universities, and even snail mail), your chances of having your identity stolen are nearly the same as if you were a member of any major social networking site. A number of recent high-profile cases have demonstrated this, as names, addresses, social security numbers, and credit card information have been stolen from hacked databases at universities, retailers, and more. [2][3]

C) (My key point) Social Networking is a very established form of communication. NOT having an identity on one of these sites makes it trivial for someone else to impersonate you, by signing up an account under your own name. Using this semi-formal means of communication, they will be able to gain access to other real-life information about you. However, if you at least have an account under your name on each of the most popular sites, it is much less likely that someone will try to impersonate you on the same site, particularly if you are already “connected” with all your friends.

[1] The Better Business Bureau – Identity Theft Quiz
[2] 45.7 million credit and debit cards stolen – all these people did was go shopping
[3] 100,000 iPad owners’ identities stolen – all these people did was own an iPad

Fitting a curve to three points

I recently needed to fit a curve to three points. The geometry can get somewhat complex due to the limits of finite math; however, here is a Java applet I’ve created with a relatively simple solution to the problem. The circle can be found using only basic math calculations and a single square root.

Drag the green points around to see how the curve fits different sets of points.

Source Code: (View) (Download)
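
For readers without the applet, here is one way the circle through three points can be computed. This is a hedged Python sketch using the standard circumcenter formula, not a copy of the applet’s source; the function name and the collinearity tolerance are assumptions made for the example. It does use only basic arithmetic plus a single square root for the radius.

```python
import math


def circle_through(p1, p2, p3, eps=1e-12):
    """Return (cx, cy, r) for the circle through three points.

    Returns None when the points are (nearly) collinear, since no finite
    circle passes through them. The eps tolerance is an arbitrary choice.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3

    # Twice the signed area of the triangle; zero means collinear points.
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < eps:
        return None

    # Squared distance of each point from the origin.
    s1 = x1 * x1 + y1 * y1
    s2 = x2 * x2 + y2 * y2
    s3 = x3 * x3 + y3 * y3

    # Circumcenter coordinates.
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d

    # The single square root: distance from the center to any one point.
    r = math.sqrt((x1 - cx) ** 2 + (y1 - cy) ** 2)
    return cx, cy, r
```

For example, the points (1, 0), (0, 1) and (-1, 0) lie on the unit circle centered at the origin, while three points on a straight line yield `None`.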

Tower of Hanoi Animation

I recently made this animation in about half an hour using the powerful 3D modeler Blender. It demonstrates a solution to the Tower of Hanoi with 4 disks in the fewest possible moves.
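
The move sequence itself is easy to generate. The following is a minimal Python sketch of the classic recursive solution (the peg names are arbitrary, and this is unrelated to how the animation was built in Blender). The fewest possible moves for n disks is 2^n - 1, so 4 disks take 15 moves.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the shortest list of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    # Move the n-1 smaller disks out of the way, move the largest disk,
    # then move the n-1 smaller disks back on top of it.
    return (hanoi(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi(n - 1, spare, target, source))


moves = hanoi(4)
print(len(moves))  # 15
```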

Storing a Directed Graph in SQL

I recently ran into the problem of having to store a directed graph in MySQL. Actually, I was trying to efficiently store a thesaurus in SQL so that it both used little space and had high performing lookups. A thesaurus, as you probably know, could best be represented as a graph of some type; perhaps as a tree (in a naive implementation), or a directed acyclic graph, or even a directed multigraph.

SQL, unfortunately, does not readily lend itself to these types of data structures at first glance. With some thought, however, one can readily store a tree or any other graph in SQL. Consider the following example:


This quickly becomes much more complex when we want to attach definitions to words, and when some entries in the thesaurus share the same ‘word’ but belong to a different part of speech, such as a noun, verb, etc.

How to store this in SQL? One possible solution is to build a node list and an adjacency list:

The node list:

ID    Word  Definition                          …
1392  dork  nerd; stupid person                 …
3729  geek  odd person; computer expert         …
7318  nerd  geek, but lacking in social grace   …
8504  tech  computer guru                       …

The adjacency list:

src dst
1392 7318
3729 1392
3729 7318
3729 8504
7318 1392
7318 3729
7318 8504

Relatively fast and efficient lookups can now be done using an SQL JOIN. While I’m sure this isn’t the optimal structure for performance, it has to be close to the most efficient(*) means of storing these cross references.
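
To make the JOIN concrete, here is a small sketch using the sample rows above. It uses Python’s built-in sqlite3 module rather than MySQL so that it is self-contained; the table names `node` and `edge`, the index, and the function names are assumptions for the example, not the schema behind the live demonstration.

```python
import sqlite3

# Sample rows copied from the node list and adjacency list above.
NODES = [
    (1392, "dork", "nerd; stupid person"),
    (3729, "geek", "odd person; computer expert"),
    (7318, "nerd", "geek, but lacking in social grace"),
    (8504, "tech", "computer guru"),
]
EDGES = [
    (1392, 7318), (3729, 1392), (3729, 7318), (3729, 8504),
    (7318, 1392), (7318, 3729), (7318, 8504),
]


def build_db():
    """Create an in-memory database holding the node and adjacency lists."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (
            id         INTEGER PRIMARY KEY,
            word       TEXT NOT NULL,
            definition TEXT
        );
        CREATE TABLE edge (
            src INTEGER NOT NULL REFERENCES node(id),
            dst INTEGER NOT NULL REFERENCES node(id),
            PRIMARY KEY (src, dst)
        );
        CREATE INDEX node_word ON node(word);
    """)
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", NODES)
    conn.executemany("INSERT INTO edge VALUES (?, ?)", EDGES)
    return conn


def synonyms(conn, word):
    """Follow every outgoing edge from `word` and return the target words."""
    rows = conn.execute("""
        SELECT t.word
        FROM node AS s
        JOIN edge AS e ON e.src = s.id
        JOIN node AS t ON t.id = e.dst
        WHERE s.word = ?
        ORDER BY t.word
    """, (word,))
    return [row[0] for row in rows]


conn = build_db()
print(synonyms(conn, "geek"))  # ['dork', 'nerd', 'tech']
```

Because the graph is directed, the lookup is not symmetric: with the sample data, “geek” points to “tech”, but “tech” has no outgoing edges and so returns an empty list.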

I’ve set up a live demonstration of the thesaurus right here:

It contains 83,318 words and phrases, and 1,112,705 cross references for synonyms. Most uncached lookups take less than a tenth of a second on this busy shared hosting server. Cached queries are virtually instantaneous. And of course, MySQL is not my area of expertise; I would imagine that this could be made even faster.

(*) The context of this problem was specifically storing a directed graph in SQL, not a thesaurus in SQL, nor a directed graph in general. If you’re looking for an optimal structure for a thesaurus or dictionary, you should look into a DAWG like this one: How to Create an Optimal DAWG