Why you can’t afford NOT to have an online identity

In this age of social networking, there’s no limit to the selection of sites you could sign up for… MySpace, Facebook, Twitter, LinkedIn, Xanga, Blogger – the list goes on. From time to time, I run into the occasional person who absolutely refuses to so much as sign up for any of these sites.

They often cite fears of identity theft, lack of privacy, lack of control over their data, and so on. These fears are legitimate, and for the most part I would agree that they are well founded. Being a member of a social networking site does, to some extent, increase your risk of identity theft. Having all the relevant information in one place makes it that much easier for a would-be thief to gather what they need. Further, we seem to forget that each site we sign up for is ultimately run by people. Employees of these companies are often free to browse every detail you have posted, even that which is supposedly kept private. And yes, it is very true: you give up some element of privacy when you make part of your life public, however limited that part may be.

With that said, I would like to propose what I feel are three strong reasons why individuals who refrain from online “social networking” should join as soon as possible:

A) Current statistics show that a very small percentage of identity theft results from an online data compromise. Take a look at some statistics from the BBB [1]: only 9% to 11% of all identity theft stems from a “compromise” that took place online (and only a fraction of these were related to social networking). It is significantly more likely that your identity will be stolen by a friend, neighbor, in-home employee, or family member than by a random stranger who stumbles across your profile on a social networking site.

B) Your data is already at risk anyway. If you use any part of today’s infrastructure (credit cards, driver’s licenses, colleges and universities, even snail mail), your chances of having your identity stolen are nearly the same as if you were a member of any major social networking site. A number of recent high-profile cases have demonstrated this, as names, addresses, social security numbers, and credit card information have been stolen from hacked databases at universities, retailers, and more. [2][3]

C) (My key point) Social networking is a well-established form of communication. NOT having an identity on one of these sites makes it trivial for someone else to impersonate you by signing up for an account under your name. Using this semi-formal means of communication, they can gain access to other real-life information about you. However, if you at least have an account under your name on each of the most popular sites, it is much less likely that someone will try to impersonate you on the same site, particularly if you are already “connected” with all your friends.

[1] The Better Business Bureau – Identity Theft Quiz
[2] 45.7 million credit and debit cards stolen – all these people did was go shopping
[3] 100,000 iPad owners’ identities stolen – all these people did was own an iPad

Fitting a curve to three points

I recently needed to fit a curve to three points. The geometry can get somewhat complex due to the limits of finite math; however, here is a Java applet I’ve created with a relatively simple solution to the problem. The circle can be found using only basic math calculations and a single square root.

Drag the green points around to see how the curve fits different sets of points.

Source Code: (View) (Download)
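
The applet’s own source is linked above. As a separate illustration, here is a minimal Python sketch of one way to find the circle through three points, using the standard circumcenter formula. This is not necessarily what the applet does. The function name circle_through and the collinearity tolerance are assumptions made for the example.

```python
import math


def circle_through(p1, p2, p3, eps=1e-12):
    """Return (cx, cy, r) for the circle through three 2D points.

    Returns None when the points are (nearly) collinear, since no finite
    circle passes through them. The absolute tolerance eps is an assumption;
    scale it to your coordinate range.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3

    # Twice the signed area of the triangle, times two; zero means collinear.
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < eps:
        return None

    # Squared distances of each point from the origin.
    s1 = x1 * x1 + y1 * y1
    s2 = x2 * x2 + y2 * y2
    s3 = x3 * x3 + y3 * y3

    # Circumcenter: only additions, multiplications, and one division each.
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d

    # The single square root: the radius is the distance to any of the points.
    r = math.sqrt((x1 - cx) ** 2 + (y1 - cy) ** 2)
    return cx, cy, r


if __name__ == "__main__":
    print(circle_through((1, 0), (0, 1), (-1, 0)))
```

The center needs only arithmetic; the square root appears once, for the radius. Nearly collinear points give a very large radius, which is where the finite-precision trouble shows up.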

Tower of Hanoi Animation

I recently made this animation in about half an hour using the powerful 3D modeler Blender. It demonstrates a solution to the Tower of Hanoi with 4 disks in the fewest possible moves (15, since n disks require 2^n − 1 moves).
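
For reference, the minimal solution can be generated with the classic recursive approach. The following is a small Python sketch, unrelated to how the animation itself was built. The peg labels "A", "B", "C" and the function name hanoi are assumptions made for the example.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the minimal list of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then move
    # the n-1 disks back on top of it.
    return (
        hanoi(n - 1, source, spare, target)
        + [(source, target)]
        + hanoi(n - 1, spare, target, source)
    )


if __name__ == "__main__":
    moves = hanoi(4)
    print(len(moves))  # 15
    for src, dst in moves:
        print(f"{src} -> {dst}")
```

Each call produces twice the moves of the previous size plus one, which is where the 2^n − 1 total comes from.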

Storing a Directed Graph in SQL

I recently ran into the problem of having to store a directed graph in MySQL. Actually, I was trying to efficiently store a thesaurus in SQL so that it both used little space and had high performing lookups. A thesaurus, as you probably know, could best be represented as a graph of some type; perhaps as a tree (in a naive implementation), or a directed acyclic graph, or even a directed multigraph.

SQL, unfortunately, does not readily lend itself to these types of data structures at first glance. With some thought, however, one can store a tree or any other graph in SQL. Consider the following example:

[Figure: directed cyclic graph]

This quickly becomes much more complex when we want to attach definitions to words, and when some entries in the thesaurus are the same ‘word’ but a different part of speech, such as a noun, verb, etc.

How to store this in SQL? One possible solution is to build a node list and an adjacency list:

The node list:

ID    Word  Definition                          …
1392  dork  nerd; stupid person                 …
3729  geek  odd person; computer expert         …
7318  nerd  geek, but lacking in social grace   …
8504  tech  computer guru                       …

The adjacency list:

src   dst
1392  7318
3729  1392
3729  7318
3729  8504
7318  1392
7318  3729
7318  8504

Relatively fast and efficient lookups can now be done using an SQL JOIN. While I’m sure this isn’t the optimal structure for performance, it has to be close to the most efficient(*) means of storing these cross references.
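
To make the two lists and the JOIN concrete, here is a minimal sketch using Python’s built-in sqlite3 module with an in-memory database. It stands in for MySQL and is not the schema behind the live demo. The table names nodes and edges, the column names, and the helper functions are assumptions made for the example; the rows are the sample data from the lists above.

```python
import sqlite3


def build_demo():
    """Create an in-memory database holding the sample node and adjacency lists."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (
            id         INTEGER PRIMARY KEY,
            word       TEXT NOT NULL,
            definition TEXT
        );
        CREATE TABLE edges (
            src INTEGER NOT NULL REFERENCES nodes(id),
            dst INTEGER NOT NULL REFERENCES nodes(id),
            PRIMARY KEY (src, dst)
        );
        CREATE INDEX idx_nodes_word ON nodes(word);
        """
    )
    conn.executemany(
        "INSERT INTO nodes (id, word, definition) VALUES (?, ?, ?)",
        [
            (1392, "dork", "nerd; stupid person"),
            (3729, "geek", "odd person; computer expert"),
            (7318, "nerd", "geek, but lacking in social grace"),
            (8504, "tech", "computer guru"),
        ],
    )
    conn.executemany(
        "INSERT INTO edges (src, dst) VALUES (?, ?)",
        [
            (1392, 7318),
            (3729, 1392), (3729, 7318), (3729, 8504),
            (7318, 1392), (7318, 3729), (7318, 8504),
        ],
    )
    return conn


def synonyms(conn, word):
    """Follow outgoing edges from `word` and return the target words, sorted."""
    rows = conn.execute(
        """
        SELECT t.word
        FROM nodes AS s
        JOIN edges AS e ON e.src = s.id
        JOIN nodes AS t ON t.id = e.dst
        WHERE s.word = ?
        ORDER BY t.word
        """,
        (word,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo()
    print(synonyms(conn, "geek"))  # ['dork', 'nerd', 'tech']
```

The composite primary key on (src, dst) doubles as an index for lookups by src, and the index on word keeps the initial match from scanning the whole node list. Because the edges are directed, “tech” has no outgoing references here even though other words point to it.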

I’ve set up a live demonstration of the thesaurus right here:

It contains 83,318 words and phrases, and 1,112,705 cross references for synonyms. Most uncached lookups take less than a tenth of a second on this busy shared hosting server. Cached queries are virtually instantaneous. And of course, MySQL is not my area of expertise; I would imagine that this could be made even faster.


(*) The context of this problem was specifically storing a directed graph in SQL, not a thesaurus in SQL, nor a directed graph in general. If you’re looking for an optimal structure for a thesaurus or dictionary, you should look into a DAWG like this one: How to Create an Optimal DAWG

Objective-C Memory Management

Regardless of what your language background is, memory management in Objective-C can be a bit difficult to get a handle on. I won’t claim to be an expert on the topic, since I’m still learning myself, but here are a few of the things I’ve learned while trying to unravel the complexity lately:

There’s a seeming mass of memory-related methods with which you can message NSObject and any class that extends NSObject, including: alloc, dealloc, retain, release, autorelease, and finalize. But just what do all these mean?

First of all, you need to know that Objective-C has garbage collection capability. If you don’t know what garbage collection is, do a Google Search and familiarize yourself with the topic. Once you’re familiar with the wonders of garbage collection, you need to know that Objective-C may not always use it…

So, as best as I can tell, here’s what the above methods do:

  • alloc: Allocates a block of memory the size necessary for a new instance of an object, returning an uninitialized instance of the object – the object should be initialized using a class initializer after being allocated.
  • dealloc: Automatically called when an object’s memory is about to be freed – ONLY if garbage collection is NOT enabled. From here, a class should release (not dealloc) any objects it owns, as they should no longer be needed. Finally, subclasses should also call the superclass dealloc method. This method should NOT be called manually.
  • retain: Increments a “retain” count for the object – if the retain count of an object is greater than zero, it will not be garbage collected, autoreleased, dealloced, or finalized in any case.
  • release: Decrements a “retain” count for the object – when the retain count of an object becomes zero, it is free to be garbage collected, autoreleased, and dealloced or finalized.
  • autorelease: Adds the object to the “innermost” autorelease pool; the object will be released when the autorelease pool schedules the next release. When the scheduled autorelease occurs, the “retain” count will be decremented and the object removed from the autorelease pool. Calling autorelease multiple times will add the object to the autorelease pool multiple times, and thus call release the same number of times before the object is removed from the pool. If the “retain” count reaches zero, the object will be dealloced.
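
To make the counting concrete, here is a toy model of the behavior described above, written in Python rather than Objective-C. It is only a sketch of my understanding of reference counting without garbage collection, not how the runtime is actually implemented. The class names RefCounted and AutoreleasePool are made up for the example, and the pool is passed in explicitly, whereas real Objective-C finds the innermost pool on its own.

```python
class AutoreleasePool:
    """Toy pool: remembers objects and sends each one release() when drained."""

    def __init__(self):
        self._pending = []

    def add(self, obj):
        # The same object may be added several times; each entry is one release.
        self._pending.append(obj)

    def drain(self):
        pending, self._pending = self._pending, []
        for obj in pending:
            obj.release()


class RefCounted:
    """Toy object with a retain count, created with a count of 1 (like alloc)."""

    def __init__(self):
        self.retain_count = 1
        self.deallocated = False

    def retain(self):
        self.retain_count += 1
        return self

    def release(self):
        if self.deallocated:
            raise RuntimeError("release sent to a deallocated object")
        self.retain_count -= 1
        if self.retain_count == 0:
            self.dealloc()

    def autorelease(self, pool):
        # Deferred release: nothing changes until the pool is drained.
        pool.add(self)
        return self

    def dealloc(self):
        # Called by release(), never directly by client code.
        self.deallocated = True


if __name__ == "__main__":
    pool = AutoreleasePool()
    obj = RefCounted()          # count is 1
    obj.retain()                # count is 2
    obj.autorelease(pool)       # still 2 until the pool drains
    obj.release()               # count is 1
    pool.drain()                # count is 0, dealloc runs
    print(obj.deallocated)      # True
```

Every alloc or retain is balanced by exactly one release or autorelease; the model raises an error on an extra release, which in real code would typically be a crash.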

That’s the ugly… In general, things are actually quite simple:

1) If an object is created using alloc, it will have a retain count of 1, and should be released using release when the object is no longer needed.

2) Calling retain will increment the retain count; release will need to be called the same number of times to bring the retain count back to its pre-retain state. (Excluding autorelease scenarios)