I have noticed that some people, after deleting rows from a table, also reset the AUTO_INCREMENT for that table's primary key column so that values are numbered as if they started from 1 again (or whatever the initial starting point was).
My question is, is there a specific reason for doing this, other than just preference? As in, is there any detrimental impact on the database or future queries if you do not reset the auto-increment and just leave it as-is? If there is, could somebody provide an example where it would be necessary to reset AUTO_INCREMENT?
Thanks!
I don't think it is ever necessary to reset auto_increment, unless you are running out of values.
One case where auto-increment is often reset is when all the rows are deleted. If you use TRUNCATE TABLE, the auto-increment value is reset automatically. A plain DELETE without a WHERE clause does not reset it (though before MySQL 8.0, an InnoDB server restart would reinitialize the counter from the surviving rows), so for consistency you might want to reset it yourself.
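For illustration, with a hypothetical table named mytable:

TRUNCATE TABLE mytable;                  -- removes all rows AND resets the auto-increment counter
-- versus:
DELETE FROM mytable;                     -- removes all rows, counter keeps its value
ALTER TABLE mytable AUTO_INCREMENT = 1;  -- manual reset, for consistency with TRUNCATE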
Another case is when a large insert fails, particularly if it fails repeatedly. You might not want the really large gaps.
When moving tables around you might want to keep the original id values. So, essentially, you ignore the auto-increment on inserts. Afterwards, though, you might want to set the automatic value to be consistent with other systems.
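A minimal sketch of that pattern (table and column names here are made up):

-- explicit id values override the auto-increment on insert
INSERT INTO target_table (id, name)
SELECT id, name FROM source_table;

-- afterwards, bump the counter past the highest copied id
ALTER TABLE target_table AUTO_INCREMENT = 50001;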
In general, though, resetting the auto-increment is not recommended.
Unfortunately, I've seen this behavior. And from what I observed, it's not due to a technical reason - it's closer to OCD.
Some people really don't like gaps in the ID column - they like the idea of it smoothly increasing by 1 for each record. The idea that some manual data manipulation of theirs might screw that up isn't pleasant - so they jump through hoops to make sure they don't cause gaps in the numbers.
But, yeah, this is a terrible practice. It's just asking for data integrity problems.
Resetting auto-inc is an uncommon operation. Under normal day-to-day work, just let it keep incrementing.
I've reset auto-inc in MySQL instances used for automated testing. A given set of tables is loaded with data over and over, and the test data is deleted afterwards. Resetting the auto-inc may be the best way to make tests repeatable, if they're looking for specific id values in the results.
Another scenario is when creating archive tables. Suppose you have a huge table, and you want to empty out the data efficiently (not using DELETE), but you want to archive the data, and you want new data to use id values higher than your old data.
CREATE TABLE mytable_new LIKE mytable;
SELECT AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'mytable';
ALTER TABLE mytable_new AUTO_INCREMENT = /* value + 10000 */;
RENAME TABLE mytable TO mytable_archive, mytable_new TO mytable;
The above series of statements lets you shuffle a new empty table into place atomically, so your app can continue writing to the table by the name it's used to. The auto-inc value you set in the new table should be higher than the max id value in the old table, plus some comfortable gap to avoid overlap during the moments between the statements.
Resetting the auto-increment usually helps in terms of organization: you won't see a gap between id 6 and id 60 when the rows in between have been deleted.
However, you should be careful about resetting auto-increments, because most likely your code depends on specific ids to fetch certain information.
In my opinion, just truncate the whole thing after your tests and seed the database with the correct information. If it's production, let it run wild and free; resetting could cause more harm than any benefit you'd get.
As per comment on abr's answer, assuming that auto-increment ids are contiguous (or even sequential) is not just a bad idea, it is a dangerous one.
There may be good reason for deliberately creating gaps in the allocated ids if you intend to patch the data at a later point (e.g. if you have restored from an old backup and expect to recover some of the missing data, but need to restore service ASAP), or when you migrate from a single active server to multiple master nodes. But in these scenarios you are setting the counter to a higher value than currently used - not resetting it back to the start.
If there is a risk that you are going to wrap around the numbers, then you've probably picked the wrong data type for your auto-increment attribute - changing the data type is the right way to fix the problem, not deleting data and resetting the counter to 0.
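For reference, widening the column is a single statement (note it rebuilds the table, so it can take a while on big tables); a sketch assuming a table tbl with an integer auto-increment column id:

ALTER TABLE tbl MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;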
I have a system whereby users can input data into a mysql table from many sites across the globe.
The data is posted via ajax to my table without issues. But, I would like to improve my insertion code to prevent insertion if the timestamp is within some interval. This would weed out duplicate rows in my table.
Before you get angry -> I do understand I can put a primary or unique key across certain columns and prevent duplicate insertion.
In my use case, I need to allow duplicate numeric data when it genuinely comes from distinct submissions - that is valid in my case. I would like to leverage the timestamp to weed out obvious double insertions where the same variables were accidentally submitted twice.
I have tried to disable the button for 1-2 seconds, but this hasn't solved the problem entirely.
If I have columns weight, height, country and the timestamp, I'd like to somehow check whether there was an insert within n seconds of the new timestamp where the posted data matches these variables. That would tell me there is an accidental duplication from a user and I shouldn't insert it into the database.
I'm not too familiar with MySQL, so I was hoping to get some guidance here.
Thanks.
There are different solutions, depending on the specifics of your case:
If you need to apply a rule that validates the new row using values inside the row itself, a CHECK constraint will do. Consider, though, that MySQL only enforces CHECK constraints as of version 8.0.16; before that they are parsed but ignored.
If you want to enforce a rule in relation to other rows, you can serialize the insertions into a queue. The consumer of the queue validates the insertions one by one and accepts or rejects them. Consider that serialization is not a good option for a massive volume of insertions, since it produces a bottleneck (which may be your case, since you mention insertions from across the globe).
Alternatively, you can insert optimistically: always perform the insertion with an intermediate "waiting for validation" status. Other process(es) then validate the row. If all is good, the row is approved; if not, a compensation procedure is executed, à la microservices.
Which one is your case?
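If a single conditional statement is enough for your case, here is a minimal sketch (the table name submissions and the column created_at are assumptions; the other columns come from your question):

INSERT INTO submissions (weight, height, country, created_at)
SELECT 81.5, 183.0, 'CA', NOW()
FROM DUAL
WHERE NOT EXISTS (
    SELECT 1 FROM submissions
    WHERE weight = 81.5
      AND height = 183.0
      AND country = 'CA'
      AND created_at > NOW() - INTERVAL 5 SECOND
);

Note this still leaves a small race window under concurrent inserts; the queue or validation approaches above close it.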
I have a MYSQL table, where (to an already existing table) I added another column "Number" that is auto_incremented and has a UNIQUE KEY constraint.
There are 17,000+ records in the table. After adding the "Number" column, one value is skipped - there is a value of 14,369 and the next one is 14,371.
I tried removing the column and adding it again, but the missing value is still missing.
What might be the problem, and what is the least painful way to solve this?
There is no problem and there is nothing to fix.
MySQL's auto_increment provides unique values, calculated with a simple sequential algorithm (it just increments a number).
That algorithm is the fastest correct way of generating unique values.
That's its job. It doesn't "reuse" numbers, and forcing it to do so comes at a disastrous cost in performance and stability.
Since queries do sometimes fail, some numbers get "lost" and you can't have them back.
If you require sequential numbers for whatever reason, create a procedure or scheduled event and maintain the numbers yourself.
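A minimal sketch of that approach, using a counter table locked inside the caller's transaction (all names here are made up):

CREATE TABLE seq_counter (
    name     VARCHAR(32) PRIMARY KEY,
    next_val BIGINT UNSIGNED NOT NULL
);
INSERT INTO seq_counter VALUES ('mytable', 1);

-- inside a transaction: lock the counter row, read the value, then bump it
START TRANSACTION;
SELECT next_val FROM seq_counter WHERE name = 'mytable' FOR UPDATE;
UPDATE seq_counter SET next_val = next_val + 1 WHERE name = 'mytable';
COMMIT;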
You have to bear in mind that MySQL is a transactional database designed to operate under concurrent access. If it were to reuse these numbers, the performance would be abysmal since it'd have to use locks and force people to wait until it reorganizes the numbers.
InnoDB, the default engine, uses primary key values to organize records on disk. If you were to change any of those values, it would start rewriting the records, incurring a HUGE I/O wait that depends on the amount of data on the disk - it could bring the whole server to a grinding halt.
TL;DR: there is no problem, there is nothing to fix, don't do it. If you persist, expect abnormal behavior.
I was previously under the impression that deleting rows in an autoincremented table can harm SELECT performance, and so I've been using a tinyint column called "removed" to mark whether an item is removed or not.
My SELECT queries are something like this:
SELECT * FROM items WHERE removed = 0 ORDER BY id DESC LIMIT 25
But I'm wondering whether it does, in fact, make sense to just delete those rows instead. Less than 1% of rows are marked as "removed", so it seems dumb for MySQL to have to check whether removed = 0 for each row.
So can deleting rows harm performance in any way?
That depends a lot on your use case - and on your users. Marking the row as deleted can help you in various situations:
if a user decides "oh, I did need that item after all", you don't need to go through the backups to restore it - just flip the "deleted" bit again (note potential privacy implications)
with foreign keys, you can't just go around deleting rows, you'd break the relationships in the database; same goes for security/audit logs
you aren't changing the number of rows (note, though, that if removed rows add up, they may decrease index efficiency)
Moreover, when properly indexed, in my measurements, the impact was always insignificant (note that I wrote "measurements" - go and profile likewise, don't just blindly trust some people on the Internet). So, my advice would be "use the removed column, it has significant benefits and no significant negative impact".
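For a query shaped like the one above, "properly indexed" means something like this composite index (names taken from the question):

-- lets MySQL resolve "WHERE removed = 0 ORDER BY id DESC" from the index alone
ALTER TABLE items ADD INDEX idx_removed_id (removed, id);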
I don't think deleting rows harms SELECT queries. Normally people add an extra column named deleted ("removed" in your case) to provide restore-like functionality. So if you are not providing restore functionality, you can delete the rows; it will not affect SELECT queries as far as I know. But while deleting, keep relationships in mind: related rows should also be deleted, or you will get errors or wrong results.
You just fill the table with more and more records that you don't need. If you don't plan to use them in the future, I don't think you need to store them at all. If you want to keep them anyway but don't plan to use them often, you can move your "removed" records into a separate archive table.
I've inherited the task of maintaining a very poorly-coded e-commerce site and I'm working on refactoring a lot of the code and trying to fix ongoing bugs.
Every database insert (adding an item to cart, etc.) begins with a grab_new_id function which COUNTs the number of rows in the table, then, starting with that number, queries the database to find an unused index number. In addition to being terrible performance-wise (there are 40,000+ rows already, and indexes are regularly deleted, so sometimes it takes several seconds just to find a new id), this breaks regularly when two operations are performed simultaneously, as two entries are added with duplicate id numbers.
This seems idiotic to me - why not just use auto-increment on the index field? I've tested it both ways, and adding rows to the table without specifying an index id is (obviously) many times faster. My question is: can anyone think of any reason the original programmer might have done this? Is there some school of thought where auto_increment is somehow considered bad form? Are there databases that don't have auto-increment capabilities?
I've seen this before from someone that didn't know that feature existed. Definitely use the auto-increment feature.
Some people take the "roll your own" approach to everything, often because they haven't taken the time to see if that is an available feature or if someone else had already come up with it. You'll often see crazy workarounds or poor performing/fragile code from these people. Inheriting a bad database is no fun at all, good luck!
Well, Oracle has sequences but not auto-generated ids, as I understand it. However, usually this kind of stuff is done by devs who don't understand database programming and who hate to see gaps in the data (as you get from rollbacks). There are also people who like to create the id themselves so they have it available beforehand to use for child tables, but most databases with auto-generated ids also have a way to return that id to the user at the time of creation.
The only argument against auto_inc fields that I found partially reasonable (but totally avoidable!) is that some backup tools by default include the AUTO_INCREMENT value in the table definition, even if you don't include the data in the dump, which may be inconvenient.
Depending on the specific situation, there are clearly many reasons for not using consecutive numbers as a primary key.
However, given that I do want consecutive numbers as a primary key, I see no reason not to use the built-in auto_increment functionality MySQL offers.
It was probably done that way for historical reasons; i.e. earlier versions didn't have auto-increment columns. I've written code that uses manual auto-increment fields on databases that don't support auto-increment types, but my code wasn't quite as inefficient as pulling a COUNT().
One issue with using autoinc fields as a primary key is that moving records in and out of tables may result in the primary key changing. So, I'd recommend designing in a "LegacyID" field up front that can be used as future storage for the primary key for times when you are moving records in and out of the table.
They may just have been inexperienced and unfamiliar with auto-increment. One reason I can think of, though it doesn't necessarily make much sense, is that it is difficult (not impossible) to copy data from one environment to another when using auto-increment ids.
For this reason, I have used sequential GUIDs as my primary key before for ease of transitioning data, but counting the rows to populate the ID is a bit of a WTF.
Two things to watch for:
1. Your RDBMS may not set the auto-increment value intelligently upon restart. Our engineers were rolling their own auto-increment key to get around the auto-increment field jumping by hundreds of thousands whenever the server restarted. At some point, however, Sybase added an option to set the size of that jump.
2. The other place where auto-increment can get nasty is if you are replicating databases in a master-master configuration. If you write on both databases (NOT ADVISED), you can run into identity collisions.
I doubt either of these were the case, but things to be aware of.
I could see it if the ids were generated on the client and pushed into the database - this is common practice when speed is necessary - but what you described seems over the top and unnecessary. Remove it and use an auto-incrementing id.
I have a table with 100 million rows, and it's getting too big.
I see a lot of gaps (since I delete, add, delete, add).
I want to fill these gaps with auto-increment.
If I do reset it, is there any harm?
If I do this, will it fill the gaps?:
mysql> ALTER TABLE tbl AUTO_INCREMENT = 1;
Potentially very dangerous, because you can get a number again that is already in use.
What you propose is resetting the sequence to 1 again. It would just produce 1, 2, 3, 4, 5, 6, 7, ... and so on, regardless of whether those numbers fall in a gap or not.
Update: according to Martin's answer, because of the dangers involved, MySQL will not even let you do that. It keeps the counter at least at the current maximum value + 1.
Think again what real problem the existence of gaps causes. Usually it is only an aesthetic issue.
If the number gets too big, switch to a larger data type (bigint should be plenty).
FWIW... According to the MySQL docs, applying
ALTER TABLE tbl AUTO_INCREMENT = 1
where tbl contains existing data should have no effect:
To change the value of the AUTO_INCREMENT counter to be used for new rows, do this:

ALTER TABLE t2 AUTO_INCREMENT = value;

You cannot reset the counter to a value less than or equal to any that have already been used. For MyISAM, if the value is less than or equal to the maximum value currently in the AUTO_INCREMENT column, the value is reset to the current maximum plus one. For InnoDB, if the value is less than the current maximum value in the column, no error occurs and the current sequence value is not changed.
I ran a small test that confirmed this for a MyISAM table.
So the answers to your questions are: no harm, and no, it won't fill the gaps. As other responders have said, a change of data type looks like the least painful choice.
Chances are you wouldn't gain anything from doing this, and you could easily break your application, since resetting the counter means newly generated IDs will collide with existing rows. (In other words, the next insert would try to use ID 1, then 2, etc.) What will you gain from filling the gaps? If the number gets too big, just change the column to a larger type (such as BIGINT).
Edit: I stand corrected. It won't do anything at all, which supports my point that you should just change the type of the column to a larger integer type. The maximum possible value for an unsigned BIGINT is 2^64 - 1, which is over 18 quintillion. If you only have 100 million rows at the moment, that should be plenty for the foreseeable future.
I agree with musicfreak... The maximum for an integer (unsigned INT, of course) is 4,294,967,295. If you need to go even higher, switching to an unsigned BIGINT brings you up to 18,446,744,073,709,551,615.
Since you can't lower the next auto-increment value, you have other options. The data type switch could be done, but it seems a little unsettling to me since you don't actually have that many rows. You'd have to make sure your code can handle IDs that large, which may or may not be tough for you.
Are you able to do much downtime? If you are, there are two options I can think of:
Dump/reload the data. You can do this in a way that doesn't keep the ID numbers. For example, you could use a SELECT ... INTO to copy the data, sans IDs, to a new table with identical DDL. Then you drop the old table and rename the new table to the old name. Depending on how much data there is, this could take a noticeable amount of time (and temporary disk space).
You could make a little program that issues UPDATE statements to change the IDs. If you let it run slowly, it would "defragment" your IDs over time. Then you could temporarily stop the inserts (just a minute or two), update the last IDs, and restart them. After updating the last IDs you can change the AUTO_INCREMENT value to be the next number, and the gap will be gone. This shouldn't cause any real downtime (at least on InnoDB), but it could take quite a while depending on how aggressive your program is.
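A hypothetical gap-finding query that could drive such a program (assuming the table is tbl and ids start at 1):

-- returns the start of the first gap in tbl.id
-- (note: max(id) + 1 also matches, so stop once the only "gap" is past the maximum)
SELECT t1.id + 1 AS gap_start
FROM tbl AS t1
LEFT JOIN tbl AS t2 ON t2.id = t1.id + 1
WHERE t2.id IS NULL
ORDER BY t1.id
LIMIT 1;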
Of course, both of these ignore referential integrity. I'm assuming that's not a problem (log statements that aren't used as foreign keys, or some such).
Does it really matter if there are gaps?
If you really want to go back and fill them, you can always turn off auto increment, and manually scan for the next available id every time you want to insert a row -- remembering to lock the table to avoid race conditions, of course. But it's a lot of work to do for not much gain.
Do you really need a surrogate key anyway? Depending on the data (you haven't mentioned a schema) you can probably find a natural key.