When to fix auto-increment gaps in MySQL

The database I am working on right now has records being added and removed by the tens of thousands, and because of this there are gaps in the auto-incremented key that are hundreds of thousands wide, with auto-increment values well into the billions.
These numbers are never stored to reference individual records, but they are used to reference records when doing on-the-fly calculations.
Is there any reason to remove these gaps and reset the auto-increment number or is this inconsequential?
The id field is an unsigned INT; should I increase it to an unsigned BIGINT? From what I understand, right now if it hits 4,294,967,295 it will break.

The only reason I'd worry about it is if you find yourself close to that 2^32 limit. If you're not using the column as a row id, then don't even worry about it.
EDIT If you are using this column for any kind of identifying information, then I'd switch the column over to a GUID or something, because you're gonna get overflow, and then you'll get duplicate values. And that's no bueno.

I don't know what the growth rate of your autoincrement field is, but it should be simple math for you to estimate when you will hit the 4,294,967,295 limit.
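One quick way to do that math is to read the current counter out of information_schema and see how much headroom is left (a sketch; 'mydb' and 'mytable' are placeholders for your schema and table):
SELECT AUTO_INCREMENT AS next_id,
       4294967295 - AUTO_INCREMENT AS ids_remaining   -- headroom for an UNSIGNED INT key
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'mytable';
Divide ids_remaining by your daily consumption (inserts, including rows you later delete, since their IDs are burned too) and you have your deadline.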
If you still feel that you need to do something, you have the following options:
Reset the current count to 1. Do this ideally by dropping the column and recreating it. Since you are not using this field for referential integrity, it should be a quick and simple fix... until the next time.
Change the datatype to an unsigned BIGINT (a sketch follows after this list). Now you can go up to 18,446,744,073,709,551,615. But you need more storage space for the larger values, and you have only postponed your problem.
Change from an autoincrement (INT / BIGINT) to a UUID. Then you can stop worrying about numbers and the nature of infinity, but you will most likely have to change all of your client code.
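For option 2, the change itself is one statement; a sketch assuming a table called mytable with an id column (on a big table this rebuilds the whole table, so expect it to take a while and plan the downtime or use an online schema-change tool):
ALTER TABLE mytable
  MODIFY COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;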
On a separate note, I sense a poor decision or two somewhere earlier up the line here.

Related

How to fix values missed by MySQL auto_increment

I have a MySQL table to which I added another column, "Number", that is auto_incremented and has a UNIQUE KEY constraint.
There are 17,000+ records in the table. After adding the "Number" column, one value is missing: there is a value of 14,369 and the next one is 14,371.
I tried removing the column and adding it again, but the value is still missing.
What might be the problem, and what is the least painful way to solve this?
There is no problem and there is nothing to fix.
MySQL's auto_increment provides unique values, and it calculates them with a simple sequential algorithm (it just increments a number).
That algorithm is the fastest reliable way of generating unique values.
That's its job. It doesn't "reuse" numbers, and forcing it to do so would be disastrous for performance and stability.
Since queries do fail sometimes, these numbers get "lost" and you can't have them back.
If you require sequential numbers for whatever reason, create a procedure or scheduled event and maintain the numbers yourself.
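If you do need an unbroken sequence for display or reporting, one approach is to keep it in a separate column and renumber that column on a schedule, leaving the primary key alone; a rough sketch, assuming a hypothetical display_num column on mytable:
SET @n := 0;
UPDATE mytable
SET display_num = (@n := @n + 1)   -- hand out 1, 2, 3, ... in primary-key order
ORDER BY id;
Wrap that in a stored procedure or a scheduled EVENT and the "gap-free" numbers stay fresh without ever touching auto_increment.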
You have to bear in mind that MySQL is a transactional database designed to operate under concurrent access. If it were to reuse these numbers, the performance would be abysmal since it'd have to use locks and force people to wait until it reorganizes the numbers.
InnoDB, the default engine, uses primary key values to organize records on disk. If you were to change any of those values, it would start rewriting the records, incurring a HUGE I/O cost that depends on the amount of data on disk - it could bring the whole server to a grinding halt.
TL;DR: there is no problem, there is nothing to fix, don't do it. If you persist, expect abnormal behavior.

MySQL PRIMARY KEYs: UUID / GUID vs BIGINT (timestamp+random)

tl;dr: Is assigning rows IDs of {unixtimestamp}{randomdigits} (such as 1308022796123456) as a BIGINT a good idea if I don't want to deal with UUIDs?
Just wondering if anyone has some insight into any performance or other technical considerations / limitations in regards to IDs / PRIMARY KEYs assigned to database records across multiple servers.
My PHP+MySQL application runs on multiple servers, and the data needs to be able to be merged. So I've outgrown the standard sequential / auto_increment integer method of identifying rows.
My research into a solution brought me to the concept of using UUIDs / GUIDs. However the need to alter my code to deal with converting UUID strings to binary values in MySQL seems like a bit of a pain/work. I don't want to store the UUIDs as VARCHAR for storage and performance reasons.
Another possible annoyance of UUIDs stored in a binary column is that row IDs aren't obvious when looking at the data in phpMyAdmin - I could be wrong about this though - but straight numbers seem a lot simpler overall anyway and are universal across any kind of database system with no conversion required.
As a middle ground I came up with the idea of making my ID columns a BIGINT, and assigning IDs using the current unix timestamp followed by 6 random digits. So let's say my random number came out to be 123456; my generated ID today would come out as: 1308022796123456
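For illustration, the same scheme can be expressed as a single MySQL expression (just a sketch; in my application the ID would actually be generated in PHP):
SELECT CAST(CONCAT(UNIX_TIMESTAMP(),
                   LPAD(FLOOR(RAND() * 1000000), 6, '0'))
            AS UNSIGNED) AS generated_id;
-- e.g. 1308022796123456: a 10-digit timestamp followed by 6 zero-padded random digits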
A one in 10 million chance of a conflict for rows created within the same second is fine with me. I'm not doing any sort of mass row creation quickly.
One issue I've read about with randomly generated UUIDs is that they're bad for indexes, as the values are not sequential (they're spread out all over the place). The UUID() function in MySQL addresses this by generating the first part of the UUID from the current timestamp. Therefore I've copied that idea of having the unix timestamp at the start of my BIGINT. Will my indexes be slow?
Pros of my BIGINT idea:
Gives me the multi-server/merging advantages of UUIDs
Requires very little change to my application code (everything is already programmed to handle integers for IDs)
Half the storage of a UUID (8 bytes vs 16 bytes)
Cons:
??? - Please let me know if you can think of any.
Some follow up questions to go along with this:
Should I use more or less than 6 random digits at the end? Will it make a difference to index performance?
Is one of these methods any "randomer" ?: Getting PHP to generate 6 digits and concatenating them together -VS- getting PHP to generate a number in the 1 - 999999 range and then zerofilling to ensure 6 digits.
Thanks for any tips. Sorry about the wall of text.
I have run into this very problem in my professional life. We used timestamp + random number and ran into serious issues when our applications scaled up (more clients, more servers, more requests). Granted, we (stupidly) used only 4 digits and then changed to 6, but you would be surprised how often the errors still happened.
Over a long enough period of time, you are guaranteed to get duplicate key errors. Our application is mission critical, and therefore even the smallest chance it could fail due to inherently random behavior was unacceptable. We started using UUIDs to avoid this issue, and carefully managed their creation.
Using UUIDs, your index size will increase, and a larger index will result in poorer performance (perhaps unnoticeable, but poorer nonetheless). However, MySQL has no native UUID column type; store UUIDs compactly (e.g. in a BINARY(16) column - never use VARCHAR as a primary key!!) and MySQL can handle indexing, searching, etc. pretty damn efficiently, even compared to BIGINT. The biggest performance hit to your index is almost always the number of rows indexed, rather than the size of the item being indexed (unless you want to index a LONGTEXT or something ridiculous like that).
To answer your question: BIGINT (with random numbers attached) will be OK if you do not plan on scaling your application/service significantly. If your code can handle the change without much alteration and your application will not explode if a duplicate key error occurs, go with it. Otherwise, bite the bullet and go for the more substantial option.
You can always implement a larger change later, like switching to an entirely different backend (which we are now facing... :P)
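If you do go the UUID route, the usual way to avoid VARCHAR keys is a BINARY(16) column; a sketch using MySQL 8.0's UUID_TO_BIN()/BIN_TO_UUID() (on older versions UNHEX(REPLACE(UUID(), '-', '')) does the same job); the table and column names are made up:
CREATE TABLE items (
  id   BINARY(16)   NOT NULL PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);
INSERT INTO items (id, name)
VALUES (UUID_TO_BIN(UUID(), 1), 'example');   -- the swap flag puts the time bits first, so inserts stay roughly ordered
SELECT BIN_TO_UUID(id, 1) AS id, name FROM items;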
You can manually change the autonumber starting number.
ALTER TABLE foo AUTO_INCREMENT = ####
An unsigned INT can store up to 4,294,967,295; let's round it down to 4,290,000,000.
Use the first 3 digits for the server serial number, and the final 7 digits for the row id.
This gives you up to 430 servers (including 000), and up to 10 million IDs for each server.
So for server #172 you manually change the autonumber to start at 1,720,000,000, then let it assign IDs sequentially.
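For example, that one-time adjustment for server #172 would be (table name is a placeholder):
-- server #172 owns the range 1,720,000,000 .. 1,729,999,999
ALTER TABLE mytable AUTO_INCREMENT = 1720000000;
After that, inserts get 1,720,000,000, 1,720,000,001, and so on; you just need to watch the counter as it nears the top of the server's range.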
If you think you might have more servers, but less IDs per server, then adjust it to 4 digits per server and 6 for the ID (i.e. up to 1 million IDs).
You can also split the number using binary digits instead of decimal digits (perhaps 10 binary digits per server, and 22 for the ID. So, for example, server 76 starts at 2^22*76 = 318,767,104 and ends at 322,961,407).
For that matter you don't even need a clear split. Take 4,294,967,295 divide it by the maximum number of servers you think you will ever have, and that's your spacing.
You could use a bigint if you think you need more identifiers, but that's a seriously huge number.
Use the GUID as a unique index, but also calculate a 64-bit (BIGINT) hash of the GUID, store that in a separate NOT UNIQUE column, and index it. To retrieve, query for a match to both columns - the 64-bit index should make this efficient.
What's good about this is that the hash:
a. Doesn't have to be unique.
b. Is likely to be well-distributed.
The cost: extra 8-byte column and its index.
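A sketch of that layout; the table, the payload column, and the MD5-based hash expression are just illustrative (any 64-bit hash of the GUID will do):
CREATE TABLE things (
  guid      CHAR(36)        NOT NULL,   -- the full GUID, unique
  guid_hash BIGINT UNSIGNED NOT NULL,   -- 64-bit hash of the GUID, NOT unique
  payload   VARCHAR(255),
  UNIQUE KEY uk_guid (guid),
  KEY idx_guid_hash (guid_hash)
);

SET @guid = UUID();
INSERT INTO things (guid, guid_hash, payload)
VALUES (@guid,
        CAST(CONV(LEFT(MD5(@guid), 16), 16, 10) AS UNSIGNED),   -- first 64 bits of MD5 as a number
        'example');

-- retrieve by matching both columns; the narrow 8-byte index does the heavy lifting
SELECT payload
FROM things
WHERE guid_hash = CAST(CONV(LEFT(MD5(@guid), 16), 16, 10) AS UNSIGNED)
  AND guid = @guid;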
If you want to use the timestamp method then do this:
Give each server a number; to that, append the process ID of the application doing the insert (or the thread ID) (in PHP it's getmypid()); then append how long that process has been alive/active (in PHP it's getrusage()); and finally add a counter that starts at 0 at each script invocation (i.e. each insert within the same script adds one to it).
Also, you don't need to store the full unix timestamp - most of those digits are for saying it's year 2011, and not year 1970. So if you can't get a number saying how long the process has been alive, then at least subtract a fixed timestamp representing today - that way you'll need far fewer digits.

MySQL autoincrement a column or just have an integer, difference?

Say I have a column set as the primary index, and set as INT.
If I don't set it as auto increment and just insert random integers which are unique into it, does that slow down future queries compared to auto-incrementing?
Does it speed things up if I run OPTIMIZE on a table with its primary and only index as INT? (assuming only 2 columns, and second column is just some INT value)
(the main worry is the upper limit on the autoincrement, as there's lots of adds and deletes in my table)
If I don't set it as auto increment and just insert random integers which are unique into it, does that slow it down compared to auto-incrementing?
In MyISAM it will in fact speed it up (marginally).
In InnoDB, this may slow the INSERT operations down due to page splits.
This of course implies that your numbers are really unique.
Does it speed things up if I optimise a table with its primary and only index as INT? (assuming only 2 columns, and second column is just some INT value)
AUTO_INCREMENT and INT may be used together.
OPTIMIZE TABLE will compact your table and indexes, freeing the space left over from deleted rows and page splits. If you had lots of DELETE operations on the table or INSERTs out of order (as in your solution with random numbers), this will help.
It will also bring the logical and physical order of the index pages into consistency with each other which will speed up full scans or ranged queries on PK (PK BETWEEN val1 AND val2), but will hardly matter for random seeks.
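For reference, the syntax is simply (table name is a placeholder):
OPTIMIZE TABLE mytable;   -- on InnoDB this is implemented as a full table rebuild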
(the main worry is the upper limit on the autoincrement, as there's lots of adds and deletes in my table)
BIGINT UNSIGNED (which can also be used with AUTO_INCREMENT) can hold values up to 18,446,744,073,709,551,615.
The upper limit for autoincremented integers is 18446744073709551615:
http://dev.mysql.com/doc/refman/5.1/en/numeric-types.html
Are you really hitting that limit? If you are, allowing MySQL to add one to the previous number is an algorithm that can hardly be improved upon.
The upper limit on AUTO_INCREMENT is the upper limit of the number type in the respective column. Even with INT UNSIGNED, this can take a while to hit; with BIGINT it's going to be very hard to reach (and seriously, what kind of app are you building where 4 extra bytes per row are too much?). So, if you're going to hit that limit, you'll hit it with autoincrement or without it.
Also, although not having AUTOINCREMENT will speed your inserts up a tiny bit, I'm willing to bet that any code to generate a unique integer to use instead of the AUTOINCREMENT will slow down the code more than the autoincrement would (generating random non-conflicting numbers will get progressively harder as your table fills up).
In other words, IMNSHO this looks like premature optimization, and will not significantly contribute to faster code (if at all), but it will make it less maintainable (as the PK will need to be generated explicitly, instead of the database taking care of it).

How can I handle the problem when AUTO_INCREMENT hits its limit?

Is there any good practice for this?
I wish I could solve the problem when the primary key hits the limit, rather than just avoid it, because this is what will happen in my specific case.
If it's unavoidable... what can I do?
This is a MySQL question; does Sybase SQL Anywhere have the same problem?
Why would you hit the limit on that field? If you defined it with a datatype large enough it should be able to hold all your records.
If you use an unsigned BIGINT you can have up to 18,446,744,073,709,551,615 records!
You should pick the correct type for the primary key; if you know you will have lots of rows, you could use BIGINT instead of the commonly used INT.
In MySQL you can easily adjust the primary key column with an ALTER TABLE statement to widen the range.
You should also use the UNSIGNED property on that column, because an auto-increment primary key is always positive.
When the limit is reached, you could maybe create some algorithm to put inside the ON DUPLICATE KEY UPDATE clause of an INSERT.
Well, it depends on the autoincrement column's datatype.
An unsigned INT goes up to 4,294,967,295.
If you want to prevent the error, you can check the last autoincrement value with LAST_INSERT_ID().
If it's approaching the datatype's max, either do not allow insertion or handle it in some other way.
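A minimal sketch of that check after an insert; the table, column and the 95% threshold are just illustrative:
INSERT INTO mytable (name) VALUES ('example');
SELECT LAST_INSERT_ID() AS new_id,
       LAST_INSERT_ID() >= 4080218931 AS nearing_limit;   -- roughly 95% of the UNSIGNED INT max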
Other than that, I can only suggest you use BIGINT, so that in most scenarios you will never come close to the max.
Can't give you a foolproof answer though :)
I know this question might be too old, but I would like to answer as well.
Strictly speaking, you can never make that scenario impossible - if nothing else, there is a physical limit to how many storage drives humankind is able to make. But actually filling all available storage is extremely unlikely to happen.
As others have told you, an UNSIGNED BIGINT is able to handle up to 18,446,744,073,709,551,615 records, probably the way to go in "most" cases.
Here is another idea: by watching the number of records in your table (for example, when it is 85% full) you could back that table up into another table/region/database and scale your infrastructure accordingly, and then reset the initial table.
And my last approach: some companies opt to make a tiny change to their license and usage agreement, saying that they will shut down an account if the user does not log in for a certain amount of time (say, 6 months for free users, 6 years for pro users, 60 years for ultimate users... <-- hey! you can also use different tables for those!).
Hope somebody finds this useful.

Is there any harm in resetting the auto-increment?

I have 100 million rows, and it's getting too big.
I see a lot of gaps. (since I delete, add, delete, add.)
I want to fill these gaps with auto-increment.
If I do reset it, is there any harm?
If I do this, will it fill the gaps?:
mysql> ALTER TABLE tbl AUTO_INCREMENT = 1;
Potentially very dangerous, because you can get a number again that is already in use.
What you propose is resetting the sequence to 1 again. It will just produce 1,2,3,4,5,6,7,.. and so on, regardless of these numbers being in a gap or not.
Update: According to Martin's answer, because of the dangers involved, MySQL will not even let you do that. It will reset the counter to at least the current value + 1.
Think again what real problem the existence of gaps causes. Usually it is only an aesthetic issue.
If the number gets too big, switch to a larger data type (bigint should be plenty).
FWIW... According to the MySQL docs applying
ALTER TABLE tbl AUTO_INCREMENT = 1
where tbl contains existing data should have no effect:
To change the value of the AUTO_INCREMENT counter to be used for new rows, do this:
ALTER TABLE t2 AUTO_INCREMENT = value;
You cannot reset the counter to a value less than or equal to any that have already been used. For MyISAM, if the value is less than or equal to the maximum value currently in the AUTO_INCREMENT column, the value is reset to the current maximum plus one. For InnoDB, if the value is less than the current maximum value in the column, no error occurs and the current sequence value is not changed.
I ran a small test that confirmed this for a MyISAM table.
So the answers to your questions are: no harm, and no, it won't fill the gaps. As other responders have said: a change of data type looks like the least painful choice.
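For the curious, the test is easy to repeat on a throwaway table (a sketch):
CREATE TABLE t (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM;
INSERT INTO t (id) VALUES (NULL), (NULL), (NULL);   -- ids 1, 2, 3
DELETE FROM t WHERE id = 2;                         -- leave a gap
ALTER TABLE t AUTO_INCREMENT = 1;                   -- attempt the reset
INSERT INTO t (id) VALUES (NULL);
SELECT id FROM t ORDER BY id;                       -- 1, 3, 4: the counter went to max + 1, the gap stays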
Chances are you wouldn't gain anything from doing this, and you could easily screw up your application with duplicate key errors, since you'd be reusing IDs. (In other words, the next time you insert a row, it would try to use ID 1, then 2, etc., colliding with existing rows.) What will you gain from filling the gaps? If the number gets too big, just change the column to a larger type (such as BIGINT).
Edit: I stand corrected. It won't do anything at all, which supports my point that you should just change the type of the column to a larger integer type. The maximum possible value for an unsigned BIGINT is 2^64 - 1, which is over 18 quintillion. If you only have 100 million rows at the moment, that should be plenty for the foreseeable future.
I agree with musicfreak... The maximum for an integer (INT(10)) is 4,294,967,295 (unsigned, of course). If you need to go even higher, switching to BIGINT brings you up to 18,446,744,073,709,551,615.
Since you can't change the next auto-increment value, you have other options. The datatype switch could be done, but it seems a little unsettling to me since you don't actually have that many rows. You'd have to make sure your code can handle IDs that large, which may or may not be tough for you.
Are you able to do much downtime? If you are, there are two options I can think of:
Dump/reload the data. You can do this so it won't keep the ID numbers. For example you could use an INSERT ... SELECT to copy the data, sans IDs, to a new table with identical DDL (see the sketch below). Then you drop the old table and rename the new table to the old name. Depending on how much data there is, this could take a noticeable amount of time (and temporary disk space).
You could make a little program to issue UPDATE statements to change the IDs. If you let that run slowly, it would "defragment" your IDs over time. Then you could temporarily stop the inserts (just a minute or two), update the last IDs, then restart it. After updating the last IDs you can change the AUTO_INCREMENT value to be the next number and your hole will be gone. This shouldn't cause any real downtime (at least on InnoDB), but it could take quite a while depending on how aggressive your program is.
Of course, both of these ignore referential integrity. I'm assuming that's not a problem (log statements that aren't used as foreign keys, or some such).
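A sketch of the copy from option 1, with made-up table and column names (log, message, created_at):
CREATE TABLE log_new LIKE log;                      -- identical DDL, fresh AUTO_INCREMENT counter
INSERT INTO log_new (message, created_at)           -- list every column except the id
SELECT message, created_at FROM log ORDER BY id;    -- rows get new, gap-free ids 1..N
RENAME TABLE log TO log_old, log_new TO log;        -- atomic swap
-- DROP TABLE log_old;                              -- once the copy is verified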
Does it really matter if there are gaps?
If you really want to go back and fill them, you can always turn off auto increment, and manually scan for the next available id every time you want to insert a row -- remembering to lock the table to avoid race conditions, of course. But it's a lot of work to do for not much gain.
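If you did want to hunt for gaps, the classic self-join finds the first free id (a sketch, assuming a table t with an integer primary key id):
SELECT MIN(t1.id) + 1 AS first_free_id
FROM t AS t1
LEFT JOIN t AS t2 ON t2.id = t1.id + 1
WHERE t2.id IS NULL;
Even then you'd still have to lock the table around the lookup and the insert, as noted above, so two sessions don't grab the same id.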
Do you really need a surrogate key anyway? Depending on the data (you haven't mentioned a schema) you can probably find a natural key.