We’ve all more than likely spotted tables in databases with no primary keys. But does a primary key always have to be defined something like…

ALTER TABLE MyTable ADD Id INTEGER NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED;

I think we have ORM to blame for this widespread practice of using auto-incrementing integers as PK’s.

In the ORM, these additional restrictions are placed on primary keys:

  • Primary keys should be anonymous integer or numeric identifiers.

source http://en.wikipedia.org/wiki/Unique_key

I’m not saying this isn’t generally sensible. Probably most of the primary keys I have ever defined have been of this type. This is the best choice if the key will be used regularly in table joins. But this need not always be the case. Lets take an example. The table below contains UK Postcodes and location information.

Postcode Table with Id

What’s my beef with this? Well, lets assume that this table is solely used for for location lookups by postcode. What is the purpose of Id? Will it ever be used in any joins to other tables? Very unlikely in my opinion. Why not change the Postcode column to be the primary key?

Postcode Table

I saw this exact situation in a live system a few years ago. The Id column was redundant but the Postcode column wasn’t even indexed. Not good in a table with 2.4+ Million records. Careful selection of your Primary Keys not only result in fewer redundant columns but will also help with your indexing strategy.

Primary Keys, when selected like this, also protect us from screwing up our data. The Postcode table above contained over 300K duplicated postcodes. I’ve seen this cause developers to eliminate duplicates with SQL SELECT DISTINCT. Lets fix our data, not our compose our queries around it causing performance hits!

For further reading see this article about the proper selection of a Primary Key.