An InnoDB table must have a primary key (one is created if you don’t do it yourself). You may have a natural key at hand. Stop! Allow me to suggest an AUTO_INCREMENT may be better.
Why should one add an AUTO_INCREMENT PRIMARY KEY on a table on which there’s a natural key? Isn’t an AUTO_INCREMENT a pseudo key, meaning, it doesn’t have any explicit relation to the row data, other than it is a number and unique?
Yes, indeed so. Nevertheless, consider:
- Natural keys are many times multi-columned.
- Multi column PRIMARY KEYs make for larger keys, and make for bloated secondary keys as well. You may be wasting space for storing the additional AUTO_INCREMENT column, but you may gain space back on secondary keys.
- Multi column PRIMARY KEYs make for more locks. See also this post.
- InnoDB INSERTs work considerably faster when worked in ascending PRIMARY KEY order. Can you ensure your natural key is in such order?
- Even though an AUTO_INCREMENT makes for an INSERT bottleneck (values must be given serially), it is in particular helpful to InnoDB by ensuring PRIMARY KEY values are in ascending order.
- AUTO_INCEMENT makes for chronological resolution. You know what came first, and what came next.
- In many datasets, more recent entries are often being accessed more, and are therefore “hotter”. By using AUTO_INCREMENT, you’re ensuring that recent entries are grouped together within the B+ Tree. This means less random I/O when looking for recent data.
- A numerical key is in particular helpful in splitting your table (and tasks on your table) into smaller chunks. I write tools which can work out with any PRIMARY KEY combination, but it’s easier to work with numbers.
Great Post!
In 5.0 there were some issues with auto_inc performance, so this might only be a good practice if you’re using 5.1 or 5.5
Autinc is also tunable:
http://dev.mysql.com/doc/refman/5.5/en/innodb-auto-increment-handling.html
Also keep in mind that one of the things you note could be a reason FOR using Natural PRIMARY KEYs. That is to say, if the fields a query requests are only in the PRIMARY KEY and one SECONDARY KEY, then it is possible to serve the query entirely from the SECONDARY KEY.
This is all true, but if you have to create a secondary key on the natural key order to access the table AND you mainly access the table by the primary key, then it might make sense to organize the table around the natural key.
You will end up significantly increasing random IO once a table becomes large if you use a surrogate key but don’t access the data via it. Of course, the adaptive hash index could help with this, but it isn’t guaranteed if your access pattern can’t utilize it (perhaps range scans)..
I mean to say “mainly access the table by the ‘natural key'” then it might make sense to use it.
An auto increment PK is needed for m2m join tables?