Reserve a block of auto-increment ids in MySQL - mysql

I receive batches of, say, 100 items that I need to insert into three related MySQL tables: say current, recent, and historical. I want to insert each batch in each table as a group in a single insert statement for speed. The current table has an auto-increment primary key id that I need to obtain for each inserted row and use as the primary key to insert the same row in the recent and historical tables. My idea is to get the current auto_increment value for current, increment it by 100 using alter table current AUTO_INCREMENT=, then insert the 100 rows into current with programmatically set ids from the block that I just "reserved". Then I can use the same 100 reserved id values for the inserts into the recent and historical tables without having to query them again from the current table.
My question: Is there some reason that this is a bad idea? I have seen nothing about it on the web. The closest I have seen on stack overflow is Insert into an auto increment field but that is not quite the same thing. I can see possible threading issues if I tried to do this from more than one thread at a time.
I'm also open to other suggestions on how to accomplish this.

There might be concurrency issues: If another connection inserts values between the time you get the current value and you set the new value, you would get duplicate keys.
I am not aware if that can happen in your situation, however, or if the inserts happen only from your batch described above, and there is never another instance of it running in parallel.

Methinks you should decouple the IDs from the 3 tables, and using ALTER TABLE sounds very fishy too.
The most proper way I can think of:
In recent and historical, add a column that references the current ID; don't try to force the primary IDs to be the same.
Acquire a WRITE table lock on current.
Get the auto_increment value X for current.
Insert your 100 records; their IDs should now run from X+1 to X+100.
Release the table lock.
Insert records in recent and historical with the known IDs in the extra column.
Note: I'm not sure if the auto_increment value points to the next ID, or the current highest value. If you use MAX(id) then you should use the code above.

This is [a bit] late, but in case someone else has this same question (as I did):
As Ethan pointed out in his comment, auto_increment is an internal MySQL utility to produce unique keys. Since you have the ability to generate your own id values external to MySQL, I suggest removing the auto_increment overhead from the table (but keep id as PK, for transport to the other tables). You can then insert your own id values along with the data.
Obviously once you do this you'll have to program your own incrementing id values. To retrieve a "starting point" for each batch and maintain the speed of a single batch INSERT call, create another table (I'll call it management) with just a single record of last_id, which is equivalent to, but independent of, max(id) of your three primary tables. Then, each time a new batch is ready to be processed, start a transaction on management with a write lock, read management.last_id, UPDATE management.last_id to (last_id+1)+number in batch, then close the transaction. You now have sequential id values to insert that are reserved for that batch, because any future calls to management.last_id will return the next-larger set of id values.
The write-locked transaction removes any concurrency issues (as stated in FrankPI's answer) because any other processes attempting to read management must wait for the lock to be removed and will return the value after the UPDATE. This also removes the id ambiguity in JvO's answer: "...IDs should now run from X+1 to X+100", which can be a dangerous assumption.
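A minimal sketch of the reservation step described above, using Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The function name reserve_id_block is made up for illustration, and here last_id is assumed to hold the last id already handed out. In MySQL/InnoDB the equivalent would presumably be a transaction that reads the row with SELECT ... FOR UPDATE before the UPDATE.

```python
import sqlite3

def reserve_id_block(conn: sqlite3.Connection, batch_size: int) -> range:
    """Reserve batch_size consecutive ids and return them as a range."""
    # BEGIN IMMEDIATE takes SQLite's write lock up front; it stands in for
    # the write-locked transaction on the management table (assumption).
    conn.execute("BEGIN IMMEDIATE")
    try:
        (last_id,) = conn.execute("SELECT last_id FROM management").fetchone()
        new_last = last_id + batch_size
        conn.execute("UPDATE management SET last_id = ?", (new_last,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    # The ids last_id+1 .. new_last now belong to this caller only.
    return range(last_id + 1, new_last + 1)

# isolation_level=None disables implicit transactions, so the explicit
# BEGIN/COMMIT statements above are in control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE management (last_id INTEGER NOT NULL)")
conn.execute("INSERT INTO management (last_id) VALUES (0)")

first_batch = reserve_id_block(conn, 100)
second_batch = reserve_id_block(conn, 100)
print(first_batch[0], first_batch[-1], second_batch[0])
```

The two batches do not overlap, so the same reserved ids can be reused for the inserts into current, recent, and historical without querying them back.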

Related

Query optimation for insert to database

I need a way to insert a lot of rows concurrently into my SQL DB.
I have a rule that every time I insert into my transaction table, I need a unique ID composed of currentTime+transactionSource+sequenceNumber. My problem is that when I test my service with JMeter, the service goes down once the concurrent inserts reach 3000 rows. The cause is duplication of the unique ID I generate. My assumption is that the duplicates happen because a new insert starts before the previous one has finished, so both generate the same ID.
Can anyone suggest the best way to do this? Thank you.
MySQL has three wonderful methods to ensure that an id is unique:
auto_increment columns
uuid()
uuid_short()
Use them! The most common way to implement a unique id is the first one:
create table t (
t_id int auto_increment primary key,
. . .
)
I strongly, strongly advise you not to maintain your own id. You get race conditions (as you have seen). Your code will be less efficient than the code in the database. If you need the separate components, you can implement them as columns in the table.
In other words, your fundamental problem is your "rule". And there are zillions of databases in the world that work perfectly well without such a rule.
Why don't you let the database handle the insert id and then update the row with a secondary field containing the format you want? If you have duplicates, you can always append the row id to this identifier so it will always be unique.

unique id without auto_increment

I have an existing schema with a non-auto-incrementing primary key. The key is used as a foreign key in a dozen other tables.
I have inherited a program with major performance problems. Currently, when a new row is added to this table, this is how a new unique id is created:
1) a query for all existing primary key values is retrieved
2) a random number is generated
3) if the number does not exist in the retrieved values, use it, otherwise goto (2)
The app is multi-threaded and multi-server, so simply grabbing the existing ids once at startup isn't an option. I do not have unique information from the initiating request to grab and convert into a pseudo-unique value (like a member id).
I understand it is theoretically possible to perform surgery on the internals to add autoincrementing to an existing primary key. I understand it would also be possible to systematically drop all foreign keys pointing to this table, then create-rename-insert a new version of the table, then add back foreign keys, but this table format is dictated by a third-party app and if I mess this up then Bad Things happen.
Is there a way to leverage sql/mysql to come up with unique row values?
The closest I have come up with is choosing a number randomly from a large space and hoping it is unique in the database, then retrying when the odd collision occurs.
Ideas?
If the table has a primary key that isn't being used for foreign key references, then drop that primary key. The goal is to make your column an auto-incremented primary key.
So, look for the maximum value and then the following should do what you want:
alter table t modify id int not null auto_increment primary key;
alter table t auto_increment = <maximum value> + 1;
I don't think it is necessary to explicitly set the auto_increment value, but I like to be sure.
I think you can SELECT MAX('strange-id-column')+1. That value will be unique and you can put that sql code inside a transaction with the INSERT code in order to prevent errors.
It seems really expensive to pull back a list of all primary key values (for large sets), then generate a pseudo-random value and verify that it's unique by checking it against the list.
One of the big problems I see with this approach is that a pseudo-random number generator will generate the same sequence of values, when the sequence is started with the same seed value.
If that ever happens, then there will be collision after collision after collision until the sequence reaches a value that hasn't yet been used. And the next time it happens, you'd spin through that whole list again, to add one more value.
I don't understand why the value has to be random.
If there's not a requirement for pseudo-randomness, and an ascending value would be okay, here's what I would do if I didn't want to make any changes to the existing table:
I'd create another "id-generator" table that has an auto_increment column. I perform inserts to that table to generate id values.
Instead of running a query to pull back all existing id values from the existing table, I'd perform an INSERT into the "id-generator" table, then a SELECT LAST_INSERT_ID() to retrieve the id of the row I just inserted, and use that as the "generated" id value.
Basically, emulating an Oracle SEQUENCE object. It wouldn't be necessary to keep all of the rows in "id-generator" table. So, I could perform a DELETE of all rows that have an id value less than the maximum id value.
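As a rough sketch of that id-generator idea, here it is with Python's built-in sqlite3 module standing in for MySQL. The table name id_generator and the helper next_id are made up for illustration, and cursor.lastrowid plays the role of LAST_INSERT_ID().

```python
import sqlite3

def next_id(conn: sqlite3.Connection) -> int:
    """Generate one id by inserting a row into the id-generator table."""
    cur = conn.execute("INSERT INTO id_generator DEFAULT VALUES")
    new_id = cur.lastrowid  # stand-in for SELECT LAST_INSERT_ID()
    # Older rows are not needed any more; keep only the newest one.
    conn.execute("DELETE FROM id_generator WHERE id < ?", (new_id,))
    conn.commit()
    return new_id

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_generator (id INTEGER PRIMARY KEY AUTOINCREMENT)")

ids = [next_id(conn) for _ in range(3)]
print(ids)
```

The generated value would then be used as the primary key for the INSERT into the existing table, which itself stays unchanged.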
If there is a requirement for pseudo-randomness (shudder) I'd probably just attempt the INSERT as a way to find out if the key exists or not. If the insert fails due to a duplicate key, I'd try again with a different id value.
The repeated sequence from a pseudo-random generator scares me... if I got several collisions in a row... are these from a previously used sequence, or are they values from a different sequence. I don't have any way of knowing. Abandoning the sequence and restarting with a new seed, if that seed has been used before, I'm off chasing another series of previously generated values.
For low levels of concurrency (average concurrent ongoing inserts < 1) you can use optimistic locking to achieve a unique id without autoincrement:
set up a one-row table for this function, eg:
create table last_id (last_id bigint not null default 0);
To get your next id, retrieve this value in your app code, apply your newId function, and then attempt to update the value, eg:
select last_id from last_id; // In DB
newId = lastId + 1 // In app code
update last_id set last_id=$newId where last_id=$lastId // In DB
Check the number of rows that were updated. If it was 0, another server beat you to it and you should return to step 1.
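The steps above could be sketched roughly like this, with Python's built-in sqlite3 module standing in for MySQL. The helper name next_id and the max_attempts limit are my own additions, not part of the answer.

```python
import sqlite3

def next_id(conn: sqlite3.Connection, max_attempts: int = 10) -> int:
    """Optimistically bump last_id; retry if someone else got there first."""
    for _ in range(max_attempts):
        (last_id,) = conn.execute("SELECT last_id FROM last_id").fetchone()
        new_id = last_id + 1  # the "newId function" from the answer
        cur = conn.execute(
            "UPDATE last_id SET last_id = ? WHERE last_id = ?",
            (new_id, last_id),
        )
        conn.commit()
        if cur.rowcount == 1:
            return new_id  # our update won
        # rowcount == 0: another writer changed last_id in between, retry
    raise RuntimeError("could not obtain an id after %d attempts" % max_attempts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE last_id (last_id BIGINT NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO last_id (last_id) VALUES (0)")
conn.commit()

first = next_id(conn)
second = next_id(conn)

# An update based on a stale value (0) matches no row, which is exactly
# the signal that another writer beat us to it.
stale = conn.execute("UPDATE last_id SET last_id = 99 WHERE last_id = 0")
stale_rows = stale.rowcount
conn.commit()
print(first, second, stale_rows)
```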

Mysql fetch the row just inserted

So I'm designing a function that inserts a row into the MySQL database. The table has a Primary key with Auto-Increment enabled. So I don't insert the value of this column. But the PK is the only unique column of the entire table. How can I fetch the row I just inserted?
I don't see a problem if the function is in light traffic, but when its load is heavier and heavier, I can see a potential bug: say I inserted a row and the DB's AI value is 1, then before the fetch function starts to request the "latest inserted row", another row is inserted with the AI value 2. Now if the fetch function of Insert 1 runs, Row 2 will be fetched. I know the time gap will need to be so small to allow this bug to actually exist, but is there a better way to fetch the right row, while maintain the table only having the PK as the unique column? (I don't want to implement an additional checksum column, though I see it's a potential solution.)
It's not very logical, but you could:
insert into `table1` (`column1`,`column2`,`column3`) VALUES ("value1","value2","value3");
select * from `table1` where `PK`=LAST_INSERT_ID();
Instead, you should only run SELECT LAST_INSERT_ID(); as jurgen d suggested, and reuse the other data you already have.
Please read this php function mysqli_insert_id()
Sorry about the above, I foolishly assumed you were using php. MySQL also has a native LAST_INSERT_ID() function.
The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for most recent statement affecting an AUTO_INCREMENT column by that client. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.
Reference: http://dev.mysql.com/doc/refman/5.5/en/information-functions.html#function_last-insert-id
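To illustrate the per-connection behaviour, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite's last_insert_rowid() is also tracked per connection, so it behaves analogously to MySQL's LAST_INSERT_ID() for this purpose. The file name demo.db and the column names are made up.

```python
import os
import sqlite3
import tempfile

# Two separate connections to the same database file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn_a = sqlite3.connect(path)
conn_b = sqlite3.connect(path)

conn_a.execute(
    "CREATE TABLE table1 (pk INTEGER PRIMARY KEY AUTOINCREMENT, column1 TEXT)"
)
conn_a.commit()

# Connection A inserts first, then connection B inserts before A fetches.
conn_a.execute("INSERT INTO table1 (column1) VALUES ('from A')")
conn_a.commit()
conn_b.execute("INSERT INTO table1 (column1) VALUES ('from B')")
conn_b.commit()

# A still sees the id of its own insert, not B's later one.
(id_a,) = conn_a.execute("SELECT last_insert_rowid()").fetchone()
row_a = conn_a.execute(
    "SELECT pk, column1 FROM table1 WHERE pk = ?", (id_a,)
).fetchone()
print(id_a, row_a)
```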

mysql delete,autoincrement

I have a table in MySQL using InnoDB and a column is there with the name "id".
So my problem is that whenever I delete the last row from the table and then insert a new value, the new value gets inserted after the deleted id.
I mean, suppose my last id is 32 and I delete that row; if I then insert a new row, the id column auto-increments to 33. So the sequence is broken, i.e. id = 30, 31, 33 and no 32.
So please help me assign the id 32 instead of 33 whenever I insert after deleting the last row.
Short answer: No.
Why?
It's unnecessary work. It doesn't matter, if there are gaps in the serial number.
If you don't want that, don't use auto_increment.
Don't worry, you won't run out of numbers if your column is of type int or even bigint, I promise.
There are reasons why MySQL doesn't automatically decrease the autoincrement value when you delete a row. Those reasons are
danger of broken data integrity (imagine multiple users perform deletes or inserts...doubled entries may occur or worse)
errors may occur when you use master slave replication or transactions
and so on ...
I highly recommend you don't waste time on this! It's really, really error prone.
You have two major misunderstandings about how a relational database works:
there is no such thing as the "last row" in a relational database.
The ID (assuming that is your primary key) has no meaning whatsoever. It doesn't matter if the new row is assigned the 33, 35354 or 236532652632. It's just a value to uniquely identify that row.
Do not rely on consecutive values in your primary key column.
And do not try the max(id)+1 approach. It will simply not work in a system with more than one transaction.
You should stop fighting this; even using SELECT max(id) will not fix this properly when using a transactional database engine like InnoDB.
Why, you might ask? Imagine that you have 2 transactions, A and B, that started at almost the same time, both doing an INSERT. First, transaction A needs a new row id, and it takes it from the invisible sequence associated with this table (known as the AUTO_INCREMENT value), say 21. Transaction B then takes the next value (say 22). So far so good.
But, what if transaction A rolls back? Value 21 cannot be reused, and 22 is already committed. And what if there were 10 such transactions?
And max(id) can assign the same value to both A and B, so this is not valid as well.
I suppose you mean "Whenever I delete the last row from the table", isn't it?
Anyway this is how autoincrement works. It's made to keep correct data relations. If in another table you use an id of a record that has been deleted it's more correct to get an error instead of get another record when querying that id.
Anyway here you can see how to get the first free id in a field.

Select and insert at the same time

So, i need to get max number of field called chat_id and after that i need to increment it by one and insert some data in that field, so the query should look something like this:
SELECT MAX(`chat_id`) FROM `messages`;
Let's say it returns 10; now I need to insert new data:
INSERT INTO `messages` SET `chat_id` = 11 -- other data here....
That would work the way I want, but my question is: what if, between the time I increment and the time I insert the new record, another user does the same? Then there would already be a record with id 11 and it could mess up my data. Is there a way to make sure that the right id goes where I need it? By the way, I can't use auto increment for this.
EDIT: As I said, I cannot use auto increment because that table already has an id field with auto increment. This id is for a different purpose; also, it's not unique and it can't be unique.
EDIT 2 Solved it by redoing my whole tables structure since no one gave me better ideas
Don't try to do this on your own. You've already identified one of the pitfalls of that approach. I'm not sure why you're saying you can't use auto increment here. That's really the way to go.
CREATE TABLE messages (
chat_id INT NOT NULL AUTO_INCREMENT,
....
)
If you cannot use an auto-increment primary key then you will either have to exclusively lock the table (which is generally not a good idea), or be prepared to encounter failures.
Assuming that the chat_id column is UNIQUE (which it should be from what you're saying), you can put these two queries inside a loop. If the INSERT succeeds then everything is fine, you can break out of the loop and continue. Otherwise it means that someone else managed to snatch this particular id out of your hands, so repeat the process until successful.
At this point I have to mention that you should not actually use a totally naive approach in production code (e.g. you might want to put an upper limit in how many iterations are possible before you give up) and that this solution will not work well if there is a lot of contention for the database (it will work just fine to ensure that the occasional race does not cause you problems). You should examine your access patterns and load before deciding on this.
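A minimal sketch of such a bounded retry loop, with Python's built-in sqlite3 module standing in for MySQL. It assumes chat_id is declared UNIQUE as discussed above; the body column, the helper name, and the max_attempts limit are made up for illustration.

```python
import sqlite3

def insert_with_next_chat_id(conn, body, max_attempts=5):
    """Try MAX(chat_id)+1 and retry if another writer took that id first."""
    for _ in range(max_attempts):
        (max_id,) = conn.execute(
            "SELECT COALESCE(MAX(chat_id), 0) FROM messages"
        ).fetchone()
        candidate = max_id + 1
        try:
            conn.execute(
                "INSERT INTO messages (chat_id, body) VALUES (?, ?)",
                (candidate, body),
            )
            conn.commit()
            return candidate
        except sqlite3.IntegrityError:
            # The UNIQUE constraint fired: someone else used this id.
            conn.rollback()
    raise RuntimeError("gave up after %d attempts" % max_attempts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (chat_id INTEGER UNIQUE NOT NULL, body TEXT)")

first = insert_with_next_chat_id(conn, "hello")
second = insert_with_next_chat_id(conn, "world")
print(first, second)
```

The upper limit on attempts is the safeguard mentioned above; under heavy contention this loop would spend most of its time retrying.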
AUTO_INCREMENT would solve this problem. But for other similar situations this would be a great use of transactions. If you're using the InnoDB engine you can use transactions to ensure that operations happen in a specific order so that your data stays consistent.
You can solve this by using MySQL's built-in uuid() function to calculate the new primary key value, instead of leaving it to the auto increment feature.
Alter your table to make messages.chat_id a char(36) and remove the AUTO_INCREMENT clause.
Then do this:
-- Generate a unique primary key value before inserting.
set @new_id = uuid();
-- Insert the new record.
insert into messages
(chat_id, ...)
values
(@new_id, ...);
-- Select the new record.
select *
from messages
where chat_id = @new_id;
The MySQL documentation on uuid() says:
A UUID is designed as a number that is globally unique in space and time. Two calls to UUID() are expected to generate two different values, even if these calls are performed on two separate devices not connected to each other.
Meaning it's perfectly safe to use the value generated by uuid as a primary key value.
This way you can predict what the primary key value of the new record will be before you insert it and then query by it knowing for sure that no other process has "stolen" that id from you in between the insert and the select. Which in turn removes the need for a transaction.
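The same idea also works if the key is generated in application code. Below is a small sketch that uses Python's uuid.uuid4() (a random UUID, swapped in here instead of calling MySQL's uuid()) and the built-in sqlite3 module as a stand-in database; the body column is made up for illustration.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (chat_id CHAR(36) PRIMARY KEY, body TEXT)")

# Generate the key before inserting, so it is known up front.
new_id = str(uuid.uuid4())

conn.execute(
    "INSERT INTO messages (chat_id, body) VALUES (?, ?)", (new_id, "hello")
)
conn.commit()

# Select the new record by the key we already hold.
row = conn.execute(
    "SELECT chat_id, body FROM messages WHERE chat_id = ?", (new_id,)
).fetchone()
print(row)
```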