UUID performance in MySQL?

We're considering using UUID values as primary keys for our MySQL database. The data being inserted is generated from dozens, hundreds, or even thousands of remote computers and being inserted at a rate of 100-40,000 inserts per second, and we'll never do any updates.
The database itself will typically get to around 50M records before we start to cull data, so not a massive database, but not tiny either. We're also planning to run on InnoDB, though we are open to changing that if there is a better engine for what we're doing.
We were ready to go with Java's Type 4 UUID, but in testing have been seeing some strange behavior. For one, we're storing as varchar(36) and I now realize we'd be better off using binary(16) - though how much better off I'm not sure.
The bigger question is: how badly does this random data screw up the index when we have 50M records? Would we be better off if we used, for example, a type-1 UUID where the leftmost bits were timestamped? Or maybe we should ditch UUIDs entirely and consider auto_increment primary keys?
I'm looking for general thoughts/tips on the performance of different types of UUIDs when they are stored as an index/primary key in MySQL. Thanks!

At my job, we use UUID as PKs. What I can tell you from experience is DO NOT USE THEM as PKs (SQL Server by the way).
It's one of those things that is OK when you have fewer than 1,000 records, but when you have millions, it's the worst thing you can do. Why? Because UUIDs are not sequential, every time a new record is inserted MSSQL needs to find the correct page to insert the record in, and then insert the record. The really ugly consequence is that the pages end up in different sizes and fragmented, so now we have to defragment periodically.
When you use an autoincrement, MSSQL will always go to the last page, and you end up with equally sized pages (in theory) so the performance to select those records is much better (also because the INSERTs will not block the table/page for so long).
However, the big advantage of using UUID as PKs is that if we have clusters of DBs, there will not be conflicts when merging.
I would recommend the following model:
PK INT Identity
Additional column automatically generated as UUID.
This way, the merge process is possible (UUID would be your REAL key, while the PK would just be something temporary that gives you good performance).
NOTE: The best solution is to use NEWSEQUENTIALID (as I was saying in the comments), but for a legacy app with not much time to refactor (and, even worse, without control over all inserts), that is not possible.
But indeed as of 2017, I'd say the best solution here is NEWSEQUENTIALID or doing Guid.Comb with NHibernate.

A UUID is a Universally Unique ID. It's the universally part that you should be considering here.
Do you really need the IDs to be universally unique? If so, then UUIDs may be your only choice.
I would strongly suggest that if you do use UUIDs, you store them as a number and not as a string. If you have 50M+ records, then the saving in storage space will improve your performance (although I couldn't say by how much).
If your IDs do not need to be universally unique, then I don't think that you can do much better than just using auto_increment, which guarantees that IDs will be unique within a table (since the value will increment each time).

Something to take into consideration is that auto-increment values are generated one at a time and cannot be generated in parallel. The fight for using UUIDs eventually comes down to what you want to achieve versus what you potentially sacrifice.
On performance, briefly:
A UUID like the one above is 36 characters long, including dashes. If you store this as VARCHAR(36), you're going to decrease compare performance dramatically. This is your primary key, you don't want it to be slow.
At its bit level, a UUID is 128 bits, which means it will fit into 16 bytes. Note this is not very human readable, but it will keep storage low, and is only 4 times larger than a 32-bit int, or 2 times larger than a 64-bit int. I will use a VARBINARY(16). Theoretically, this can work without a lot of overhead.
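To make the size difference in the quote concrete, here is a minimal Python sketch (Python's standard `uuid` module is my choice for illustration, not something the quoted post uses). It compares the 36-character text form with the 16-byte raw form of the same UUID and shows that the conversion is lossless.

```python
import uuid

u = uuid.uuid4()        # random (version 4) UUID
as_text = str(u)        # canonical dashed form, what a VARCHAR(36)/CHAR(36) column would hold
as_bytes = u.bytes      # raw 128-bit value, what a BINARY(16) column would hold

print(len(as_text))     # 36
print(len(as_bytes))    # 16

# The binary form loses nothing: the original UUID can be rebuilt from it.
restored = uuid.UUID(bytes=as_bytes)
print(restored == u)    # True
```

The text form is what you would show to humans; the 16-byte form is what you would index.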
I recommend reading the following two posts:
Brian "Krow" Aker's Idle Thoughts - Myths, GUID vs Autoincrement
To UUID or not to UUID ?
I reckon between the two, they answer your question.

I tend to avoid UUID simply because it is a pain to store and a pain to use as a primary key but there are advantages. The main one is they are UNIQUE.
I usually solve the problem and avoid UUID by using dual key fields.
COLLECTOR = UNIQUE ASSIGNED TO A MACHINE
ID = RECORD COLLECTED BY THE COLLECTOR (auto_inc field)
This offers me two things. Speed of auto-inc fields and uniqueness of data being stored in a central location after it is collected and grouped together. I also know while browsing the data where it was collected which is often quite important for my needs.
I have seen many cases while dealing with other data sets for clients where they have decided to use UUID but then still have a field for where the data was collected which really is a waste of effort. Simply using two (or more if needed) fields as your key really helps.
I have just seen too many performance hits using UUID. They feel like a cheat...

Instead of centrally generating unique keys for each insertion, how about allocating blocks of keys to individual servers? When they run out of keys, they can request a new block. That avoids the overhead of connecting to the keyserver for each insert.
Keyserver maintains next available id
Server 1 requests id block.
Keyserver returns (1,1000)
Server 1 can insert 1,000 records until it needs to request a new block
Server 2 requests id block.
Keyserver returns (1001,2000)
etc...
You could come up with a more sophisticated version where a server could request the number of needed keys, or return unused blocks to the keyserver, which would then of course need to maintain a map of used/unused blocks.
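The basic scheme above can be sketched in a few lines of Python. This is only an in-process illustration under my own assumptions: `KeyServer`, `Inserter`, and their methods are hypothetical names, and a real keyserver would be a network service or a database row updated in a transaction, not a Python object.

```python
import threading


class KeyServer:
    """Hands out contiguous, non-overlapping blocks of ids (hypothetical sketch)."""

    def __init__(self, start=1):
        self._next_id = start
        self._lock = threading.Lock()

    def allocate_block(self, size=1000):
        # Reserve [first, last] atomically so two callers never get the same ids.
        with self._lock:
            first = self._next_id
            self._next_id += size
        return first, first + size - 1


class Inserter:
    """A remote server that only contacts the keyserver when its block is used up."""

    def __init__(self, key_server, block_size=1000):
        self._key_server = key_server
        self._block_size = block_size
        self._current = 0
        self._last = -1  # empty range, forces a block request on first use

    def next_id(self):
        if self._current > self._last:
            self._current, self._last = self._key_server.allocate_block(self._block_size)
        value = self._current
        self._current += 1
        return value


demo_server = KeyServer()
print(demo_server.allocate_block())  # (1, 1000)
print(demo_server.allocate_block())  # (1001, 2000)
```

Ids left over in a block when a server shuts down are simply lost in this version, which is the gap the "more sophisticated version" would have to handle.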

I realize this question is rather old, but I did hit upon it in my research. Since then a number of things have happened (SSDs are ubiquitous, InnoDB got updates, etc.).
In my research I found a rather interesting post on performance, claiming that due to the randomness of a GUID/UUID, index trees can get rather unbalanced. In the MariaDB KB I found another post that suggested a solution.
But since then the new UUID_TO_BIN function takes care of this. It is only available in MySQL (tested version 8.0.18) and not in MariaDB (version 10.4.10).
TL;DR: Store UUID as converted/optimized BINARY(16) values.
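For readers without MySQL 8.0, the "optimized" conversion can be sketched in Python. As I understand the MySQL documentation, UUID_TO_BIN with its swap flag set to 1 swaps the time-low and time-high groups (the first and third groups of hex digits), so that type-1 UUIDs generated close together in time sort close together. The function names below are my own, and this is an illustration of that byte reordering, not MySQL's implementation.

```python
import uuid


def uuid_to_bin_swapped(u):
    """Reorder a UUID's bytes so the high timestamp bits come first."""
    b = u.bytes  # layout: time_low(4) | time_mid(2) | time_hi_and_version(2) | rest(8)
    return b[6:8] + b[4:6] + b[0:4] + b[8:]


def bin_to_uuid_swapped(b):
    """Inverse of uuid_to_bin_swapped."""
    return uuid.UUID(bytes=b[4:8] + b[2:4] + b[0:2] + b[8:])


example = uuid.UUID("6ccd780c-baba-1026-9564-5b8c656024db")
packed = uuid_to_bin_swapped(example)
print(packed.hex())                 # 1026baba6ccd780c95645b8c656024db
print(bin_to_uuid_swapped(packed))  # 6ccd780c-baba-1026-9564-5b8c656024db
```

The reordering only helps for time-based (type-1) UUIDs; a type-4 UUID is random in every position, so shuffling its bytes changes nothing about insert locality.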

I would assign each server a numeric ID in a transactional manner.
Then, each record inserted will just autoincrement its own counter.
Combination of ServerID and RecordID will be unique.
ServerID field can be indexed and future select performance based on ServerID (if needed) may be much better.
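A minimal sketch of the ServerID + RecordID idea, using Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere. The table and column names are hypothetical, and each server is assumed to keep its own counter for `record_id`.

```python
import sqlite3


def make_db():
    """Create an in-memory table keyed by (server_id, record_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE readings (
            server_id INTEGER NOT NULL,   -- assigned once per remote machine
            record_id INTEGER NOT NULL,   -- per-server counter
            payload   TEXT,
            PRIMARY KEY (server_id, record_id)
        )
        """
    )
    return conn


def insert_reading(conn, server_id, record_id, payload):
    conn.execute(
        "INSERT INTO readings (server_id, record_id, payload) VALUES (?, ?, ?)",
        (server_id, record_id, payload),
    )


conn = make_db()
insert_reading(conn, 1, 1, "from server 1")
insert_reading(conn, 2, 1, "from server 2")   # same record_id, different server: no conflict
print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # 2
```

Only a repeated (server_id, record_id) pair is rejected, so the servers never have to coordinate with each other after they receive their ID.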

The short answer is that many databases have performance problems (in particular with high INSERT volumes) due to a conflict between their indexing method and UUIDs' deliberate entropy in the high-order bits. There are several common hacks:
choose a different index type (e.g. nonclustered on MSSQL) that doesn't mind it
munge the data to move the entropy to lower-order bits (e.g. reordering bytes of V1 UUIDs on MySQL)
make the UUID a secondary key with an auto-increment int primary key
... but these are all hacks--and probably fragile ones at that.
The best answer, but unfortunately the slowest one, is to demand your vendor improve their product so it can deal with UUIDs as primary keys just like any other type. They shouldn't be forcing you to roll your own half-baked hack to make up for their failure to solve what has become a common use case and will only continue to grow.

What about some hand crafted UID? Give each of the thousands of servers an ID and make primary key a combo key of autoincrement,MachineID ???

Since the primary key is generated decentralised, you don't have the option of using an auto_increment anyway.
If you don't have to hide the identity of the remote machines, use Type 1 UUIDs instead of Type 4 UUIDs. They are easier to generate and should at least not hurt the performance of the database.
The same goes for varchar (char, really) vs. binary: it can only help matters. Is it really important, how much performance is improved?

The main case where UUIDs cause miserable performance is ...
When the INDEX is too big to be cached in the buffer_pool, each lookup tends to be a disk hit. For HDD, this can slow down the access by 10x or worse. (No, that is not a typo for "10%".) With SSDs, the slowdown is less, but still significant.
This applies to any "hash" (MD5, SHA256, etc), with one exception: A type-1 UUID with its bits rearranged.
Background and manual optimization: UUIDs
MySQL 8.0: see UUID_TO_BIN() and BIN_TO_UUID()
MariaDB 10.7 carries this further with its UUID datatype.

Related

Opinions please - switching legacy DB from CHAR(14) PKs to INT

I am administering a MySQL db for a payments processing system. For various legacy reasons it was originally built using CHAR(14) for many of the primary keys, which store a sequential ID based on a prefix identifying the type of data followed by a base36 encoded string representing a large number in sequence, e.g.
'PA00003NFMWHMQ' translating to 'payment 286103946050'
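For reference, the decoding in that example can be sketched in Python as follows. The two-character prefix length is an assumption taken from the single 'PA' example, and `decode_legacy_id` is a hypothetical helper name.

```python
def decode_legacy_id(legacy_id, prefix_len=2):
    """Split a legacy key into its type prefix and its base36-encoded sequence number."""
    prefix = legacy_id[:prefix_len]
    number = int(legacy_id[prefix_len:], 36)  # int() accepts bases up to 36
    return prefix, number


print(decode_legacy_id("PA00003NFMWHMQ"))  # ('PA', 286103946050)
```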
The advantage here is a semi-unique key that is still sequential, disadvantage being large values used for both clustered and non-clustered indexes, slowing down joins and lookups and requiring extra memory/storage.
I'm considering migrating them all to integers prior to releasing an API although I like the uniqueness too. I'm also wary of premature optimisation.
I'm not looking for a definite answer here, only some experienced opinions.
Thanks!

My first thought is "are you going to have to hang onto this ID for backwards compatibility anyway?" IDs that have meaning, like yours, tend to get stored and referenced in external systems. Will you wind up with a table that has an integer primary key for internal use and a char(14) legacy id and two indexes? That may still be an improvement, but it impacts whether this change is worth it. Keep this in mind in the rest of my commentary.
If you can switch completely to auto incremented integers and get rid of special ID generation code, that should certainly make things simpler and inserts faster. How much simpler and faster is something you need to determine. Is it just one extra function somewhere deep in the creation code that doesn't bother anybody? Or does it impact the code and design all over the place?
...disadvantage being large values used for both clustered and non-clustered indexes, slowing down joins and lookups and requiring extra memory/storage.
As with any performance claim, the first thing would be to investigate whether it's true. Is a char(14) key really slowing down joins and consuming memory and storage?
A char(14) (14 bytes) isn't much larger than an integer (4 bytes). The extra 10 bytes per row is just 10 MB per million records. But that's just to store the key. Every reference adds another 10 bytes. And every index featuring it another 10+ bytes. Still, I wouldn't assume this is a major storage and memory issue without measuring it.
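The back-of-the-envelope arithmetic above can be written down as a tiny Python helper. It is a rough sketch only: it assumes one byte per character and ignores row, page, and index overhead, and `extra_bytes` and its parameters are names I made up for illustration.

```python
def extra_bytes(rows, char_len=14, int_len=4, copies=1):
    """Extra bytes used by char(char_len) keys compared with int_len-byte integers.

    copies counts how many times each key value is stored
    (the key itself, plus each foreign-key reference or index entry).
    """
    return rows * (char_len - int_len) * copies


print(extra_bytes(1_000_000) / 1_000_000)            # 10.0 (MB per million rows, key only)
print(extra_bytes(1_000_000, copies=3) / 1_000_000)  # 30.0 (MB if each key is stored three times)
```

Numbers like these are a starting point for deciding whether measuring is worthwhile, not a substitute for measuring.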
Disk and memory are generally much cheaper than developer time. This doesn't mean to be wasteful, but consider whether saving a few gigs is worth however long this is likely to take (and the testing). Or if you can buy a bigger disk and more memory instead. For example, I have one project that could benefit from using enum fields instead of strings. But I haven't bothered because that would mean more developer time to make the change, and also to maintain the enum field. Instead it's still cheaper to pay for extra disk. That may change, and when it does I'll reconsider.
Similar with joins. If they're indexed, they should perform well regardless of whether it's a char or int. But you need testing.
I'd suggest you make a sanitized copy of the database, or generate one of decent size perhaps using your test factories, and run some performance tests with char(14) and with int. Be sure to test reality and whether this change will have a real impact on performance. Just running bare SQL queries may give you an outsized impression of their impact on performance. Also call the real functions you'd use in production, they might be swamping any SQL impact.
'PA00003NFMWHMQ' translating to 'payment 286103946050'
I'm considering migrating them all to integers prior to releasing an API
Exposing primary keys (or any other piece of implementation information) to the outside world has security and compatibility considerations. It's knowledge an attacker can use; for example, they can predict what the next key will be. Don't do it.
Instead, assign each thing you're exposing a random API ID like a UUIDv4 (don't use MySQL's UUID function, it is guessable UUIDv1). Store them as binary(16) if space is a big concern.
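A minimal sketch of generating such an API ID in application code, using Python's standard `uuid` module (my choice for illustration; `new_public_id` is a hypothetical helper). It also shows why a version 1 UUID is considered guessable: it carries a timestamp and a node identifier in the value itself.

```python
import uuid


def new_public_id():
    """Return a random (version 4) UUID as 16 raw bytes, sized for a binary(16) column."""
    return uuid.uuid4().bytes


public_id = new_public_id()
print(len(public_id))                        # 16
print(uuid.UUID(bytes=public_id).version)    # 4

# A version 1 UUID exposes when (and on which node) it was generated.
v1 = uuid.uuid1()
print(v1.version)   # 1
print(v1.time)      # embedded timestamp, in 100-nanosecond intervals
print(v1.node)      # embedded node identifier
```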
Then it doesn't matter what your primary key is. You can change your design when you like.
The advantage here is a semi-unique key that is still sequential...
This is a puzzler. Primary keys must be unique, so I'm not sure what you mean by "semi-unique". Do you mean across tables? That the ID of a row in table A is probably distinct from the ID of a row in table B? If that's the case, consider UUID primary keys. Or consider whether this is truly an advantage that you can actually use because of the semi part.

MYSQL: What is the impact of varchar length on performance when used a primary key? [duplicate]

What would be the performance penalty of using strings as primary keys instead of bigints etc.? String comparison is much more expensive than integer comparison, but on the other hand I can imagine that internally a DBMS will compute hash keys to reduce the penalty.
An application that I work on uses strings as primary keys in several tables (MySQL). It is not trivial to change this, and I'd like to know what can be gained performance wise to justify the work.

on the other hand I can imagine that internally a DBMS will compute hash keys to reduce the penalty.
The DB needs to maintain a B-Tree (or a similar structure) with the key in a way to have them ordered.
If the key were hashed and the hash stored in the B-Tree, that would be fine for rapidly checking the uniqueness of the key, and the key could still be looked up efficiently. But you would not be able to search efficiently for a range of data (e.g. with LIKE) because the B-Tree would no longer be ordered according to the String value.
So I think most DBs really store the String in the B-Tree, which can (1) take more space than numeric values and (2) require the B-Tree to be re-balanced if keys are inserted in arbitrary order (no notion of increasing value as with a numeric pk).
The penalty in practice can range from insignificant to huge. It all depends on the usage, the number of rows, the average size of the string key, the queries that join tables, etc.

In our product we use varchar(32) for primary keys (GUIDs) and we haven't run into performance issues because of this. Our product is a web site under extremely heavy load, and stability is critical.
We use SQL Server 2005.
Edit: In our biggest tables we have more than 3 000 000 records with lots of inserts and selects from them. I think in general, the benefit of migrating to int key will be very low, but the problems while migrating very high.

One thing to watch out for is page splits (I know this can happen in SQL Server - probably the same in MySQL).
Primary keys are physically ordered. By using an auto-increment integer you guarantee that each time you insert you are inserting the next number up, so there is no need for the db to reorder the keys. If you use strings however, the pk you insert may need to be placed in the middle of the other keys to maintain the pk order. That process of reordering the pks on the insert can get expensive.
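A toy Python simulation of the ordering argument above. A sorted Python list stands in for the physically ordered key space; it does not model pages or a real B-tree, and `count_mid_inserts` is a hypothetical helper. It simply counts how many inserts land somewhere other than the end.

```python
import bisect
import uuid


def count_mid_inserts(keys):
    """Insert keys one by one into a sorted list; count inserts that are not appends."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos != len(ordered):
            mid_inserts += 1      # would have to go between existing keys
        ordered.insert(pos, key)
    return mid_inserts


sequential_keys = range(1, 1001)                           # auto-increment style
random_keys = [uuid.uuid4().bytes for _ in range(1000)]    # random UUID style

print(count_mid_inserts(sequential_keys))  # 0: every insert is an append
print(count_mid_inserts(random_keys))      # almost every insert lands in the middle
```

In a real engine, inserts in the middle are the ones that can trigger page splits and touch pages that are not in cache.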

It depends on several factors (the RDBMS, the number of indexes involving those columns), but in general it will be more efficient to use ints, followed by bigints.
Any performance gains depend on usage, so without concrete examples of table schema and query workload it is hard to say.
Unless it makes sense in the domain (I'm thinking of something unique like a social security number), a surrogate integer key is a good choice; referring objects do not need to have their FK reference updated when the referenced object changes.

Should a primary key necessarily be auto-incrementing when I'm sure it is and will always be unique?

I've looked for a satisfying answer a tad more specific to my particular problem for a while now, but to no avail. Whether I'm just not looking in the right places or not, I don't know, but here goes:
I'm pulling data from an application that afterwards is manipulated and sent to my own server. Among the data pulled is an identifier that was auto-incremented in the application's original database. An example of this identifier, which I retrieved just now, is 955534861. Isn't it a better and more effective design to not auto-increment my primary key and just use the value I know is and will always stay unique, or should I look into concepts such as surrogate keys?
Thanks in advance.

The situation you describe resembles my primary job which is maintaining a data warehouse. We get data from other systems and store it.
Something that happens to us is that these "other systems" change. That leads to the possibility that the new version of the "other system" will duplicate a unique identifier from the previous system. We deal with this by adding something to that record in our data warehouse to guarantee its uniqueness. It might be a field to identify the source system or it might be a date. It is never an autogenerated number.
If there is any chance of this happening to you, you might want to expand your options.

If there is a natural key in your model, you cannot replace it by creating a surrogate key.
You can only add a surrogate key and keep the existing natural key, which has its pros and cons, as described here.

This'll get a little nerdy, but bear with me:
As long as a key value is unique, it'll serve its function. But for performance, you ideally want that key value to be as short as possible.
GUIDs are commonly used, because they are statistically highly unlikely to ever be repeated. But that comes at the expense of size: they are 128 bits long, which makes them longer than a machine word. Comparing two GUIDs (as must be done repeatedly when sorting, or when descending a b-tree for indexes) will take multiple processor instructions to load and compare the values. And they will consume more memory when cached.
The advantage of auto-incrementing key values is that
They are guaranteed to be unique. Proxy index values are only predicted to be unique.
Because they will have full value coverage over the range of their underlying datatype, the most compact possible type may be used. This makes for smaller indexes and more efficient compare operations
Because the smallest possible type can be used, more index values can be stored on a single database page, which means you're more likely to get a cache hit when searching or joining on that value. That means that performance will be, all other things being equal, somewhat better.
On most databases, auto-incrementing keys are worked into the database engine, so there is very small overhead in generating them.
If you employ a clustered index on your key value, new record inserts are less likely to require a random disk seek, and more likely to be read during read-ahead, so if you do any kind of sequential processing or lookup based on that key, it'll probably run faster.

The primary key, typically an auto-incrementing ID, is what MySQL uses as a row identifier as well, so it should be left alone. If you need a secondary key that's generated by your application for some other purpose, you may want to add that as another column with a UNIQUE index on it.
In other databases where there's a proper row identifier mechanism, this is less of an issue.

Optimizing Innodb table indexes with GUID/UUID keys

I have an InnoDB based schema with roughly 100 tables, most use GUID/UUID's as the primary key. I started this at a point in time where I didn't really understand the implications of a UUID PK with regard to Disk IO and fragmentation, but wanted the benefits of avoiding a single key dispenser when dealing with server clusters. We're not currently dealing with large numbers of rows, but we will be (in the hundreds of millions) and I would like to be prepared for that.
Now that I understand indexing in InnoDB better, specifically the clustered nature of the primary key, I can see that my UUID's are a poor choice for scalability from a DISK IO perspective, but I don't want to stop using them due to the server clustering requirement.
The accepted/recommended solution seems to be a mix of Autoincrement PK (INT|BIGINT), with UNIQUE Indexed UUID keys. My intention is to add a new first column ai_col to each table and assign it as the new PK. I'm taking cues from:
http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html
I would then update/recreate a new "UNIQUE" index on my UUID keys and continue to use them in our application layer.
My expectation is that once this is done that I can essentially ignore the ai_col and everything else runs business as usual. InnoDB will have a relatively small int based PK from which to cluster on and append to the other unique indexes.
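The layout described above can be sketched as follows, again using Python's built-in `sqlite3` as a runnable stand-in for InnoDB. Only `ai_col` comes from the question; the table name, the other columns, and the helper functions are hypothetical. The application keeps addressing rows by UUID and never looks at `ai_col`.

```python
import sqlite3
import uuid


def make_table(conn):
    conn.execute(
        """
        CREATE TABLE items (
            ai_col    INTEGER PRIMARY KEY AUTOINCREMENT,  -- small, append-only key
            item_uuid BLOB NOT NULL UNIQUE,               -- 16-byte UUID used by the application
            name      TEXT
        )
        """
    )


def add_item(conn, name):
    """Insert a row identified by a fresh UUID; the integer key is assigned by the database."""
    item_uuid = uuid.uuid4().bytes
    conn.execute("INSERT INTO items (item_uuid, name) VALUES (?, ?)", (item_uuid, name))
    return item_uuid


def find_item(conn, item_uuid):
    """Look a row up by UUID via the unique index."""
    return conn.execute(
        "SELECT ai_col, name FROM items WHERE item_uuid = ?", (item_uuid,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
make_table(conn)
first = add_item(conn, "first")
print(find_item(conn, first))  # (1, 'first')
```

In InnoDB the unique UUID index would carry the integer primary key in each entry, which is part of why a small PK matters there; this sketch does not attempt to show that.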
Question 1: Am I correct in assuming that in this new scenario, I can have my cake and eat it too?
The follow up question is with regard to smaller 'associational' tables, i.e. Only two columns, both Foreign Keys to other tables joining them implicitly. In these cases I have typically two indexes, one being a UNIQUE two column index with the more heavily used column first, then a second single index on the other column. I know that this is essentially 2.5x as large as the actual row data, but it seems to really help our more complex queries during optimization, and is on smaller tables so relatively acceptable.
Most of these associational tables will only be a fraction the number of records in the primary tables because they're typically more specific, however, there are a few cases where these have many multiples the number of records as their foreign parents, i.e. potentially billions.
Question 2: Is it a good idea to add the numeric PK's to these tables as well? I'm guessing that the answer will be something along the lines of "Benchtest it" but I'm just looking for helpful nuggets of wisdom.
If I've obviously mis-interpreted anything or you can offer insights that I may not be considering, I'd really appreciate that too!
Many thanks!
EDIT: As promised in the answer, I just wanted to follow up for anyone interested... This solution has worked famously :) Read and write performance increased across the board, and so far it's been tested up to about 6 billion i/o's / month, without breaking a sweat.

Without any other suggestions, confirmations, or otherwise, I've begun testing on our dev server with a number of less used tables, but ones that would nonetheless be affected if the new AI-based ids were going to affect our application layer.
So far it's looking good, indexes are performing as expected and the new table fields haven't required any changes to our application layer, we've been basically able to ignore them.
I haven't run any thorough bench testing though to test the actual Disk IO under heavy load but from the sheer amount of information out there on the subject, I can surmise that we're in good shape for scaling up.
Once this has been in place for a while I'll drop in a follow up in case anyone's in the same boat we were.
