I recently ran into the problem of having to store a directed graph in MySQL. Actually, I was trying to efficiently store a thesaurus in SQL so that it both used little space and had high performing lookups. A thesaurus, as you probably know, could best be represented as a graph of some type; perhaps as a tree (in a naive implementation), or a directed acyclic graph, or even a directed multigraph.
SQL, unfortunately, does not readily lend itself to these type of data structures at first glance. With some thought, however, one can readily store a tree or any other graph in SQL. Consider the following example:
This quickly becomes much more complex as we want to attach definitions to words, and some words in the thesaurus may be the same ‘word’, but a differing class of speech, such as a noun, verb, etc.
How to store this in SQL? One possible solution is to build a node list and an adjacency list:
The node list:
|1392||dork||nerd; stupid person …|
|3729||geek||odd person; computer expert|
|7318||nerd||geek, but lacking in social grace …|
|8504||tech||computer guru …|
The adjacency list:
Relatively fast and efficient lookups can now be done using SQL JOIN. While I’m sure this isn’t the most optimal structure for performance, it has to be close to the most efficient(*) means of storing these cross references.
I’ve set up a live demonstration of the thesaurus right here:
It contains 83,318 words and phrases, and 1,112,705 cross references for synonyms. Most uncached lookups take place on the order of less than a 1/10th of a second on this busy shared hosting server. Cached queries are virtually instantaneous. And of course, MySQL is not my area of expertise; I would imagine that this could be made even faster.
(*) The context of this problem was specifically storing a directed graph in SQL, not a thesaurus in SQL, nor a directed graph in general. If you’re looking for an optimal structure for a thesaurus or dictionary, you should look into a DAWG like this one: How to Create an Optimal DAWG