COMPLETE GEEKOUT:MD5s as database keys
2007-01-09 07:47:00
'I had a client ask me today why we use these big, ugly, hashes as our database keys. They look like this: 7d768dd2641e45a1f14498b4513fc23c. A lot of databases just use an autonumber, meaning it adds 1 to the last entry in the database, so your records have kesy of 1, 2, 3, 4…
So, I gave him my answer, and I liked it so much I thought I'd share. If you're not a database programmer, prepare to be bored shitless.
The reason for the MD5 hash is twofold -- first comes from web programming where we don't want to have to insert a record, retrieve the record to get the key, and then go on to the next step (like inserting into a BUSINESS table and a CONTACT table where you need the BUSINESSID before you can create the associated CONTACT). The MD5 is extremely unique, and is generally generated with something like MD5(CONCAT(NOW(), [some random number])) and works great.
The second is a side effect in that it's a constant width which is handy for simple filters in queries. So, say you want to use a query for a list of all customer invoices. You can do a SELECT * from CUSTOMER, INVOICE… WHERE CSTMRID LIKE '%[variable]'. If the variable exists, you only get the one customer, and if it doesn't you get all customers. If you're using an integer, you would have to write two queries, one for = variable and one without the constraint.
And, I suppose there is a third consideration in security. A lot of web programming posts the database key on the address URL, which is more prone to hacking. If someone sees ID=3, they might be curious to see what they get with ID=4 and they might actually get something. It is nearly impossible to guess a valid MD5 database key.
So, we ended up standardizing on MD5 hashes for all our database keys no matter if it was truly necessary or not. It keeps it consistent, and then if you have some sort of trick you use that expects the hash, it will consistently work. Of course there are other ways to solve this in transactional databases, but mysql doesn't have the power and flexibility of Oracle.
Hope everyone had fun with that…