unique id without auto_increment - mysql

I have an existing schema with a non-auto-incrementing primary key. The key is used as a foreign key in a dozen other tables.
I have inherited a program with major performance problems. Currently, when a new row is added to this table, this is how a new unique id is created:
1) a query for all existing primary key values is retrieved
2) a random number is generated
3) if the number does not exist in the retrieved values, use it, otherwise goto (2)
The app is multi-threaded and multi-server, so simply grabbing the existing ids once at startup isn't an option. I do not have unique information from the initiating request to grab and convert into a pseudo-unique value (like a member id).
I understand it is theoretically possible to perform surgery on the internals to add autoincrementing to an existing primary key. I understand it would also be possible to systematically drop all foreign keys pointing to this table, then create-rename-insert a new version of the table, then add back foreign keys, but this table format is dictated by a third-party app and if I mess this up then Bad Things happen.
Is there a way to leverage sql/mysql to come up with unique row values?
The closest I have come up with is choosing a number randomly from a large space and hoping it is unique in the database, then retrying when the odd collision occurs.
Ideas?

If the table has a primary key that isn't being used for foreign key references, then drop that primary key. The goal is to make your column an auto-incremented primary key.
So, look for the maximum value and then the following should do what you want:
alter table t modify id int not null auto_increment primary key;
alter table t auto_increment = <maximum value> + 1;
I don't think it is necessary to explicitly set the auto_increment value, but I like to be sure.

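I think you can SELECT MAX(`strange-id-column`)+1 (use backticks; in single quotes the name is treated as a string literal, not a column). That value will be unique at the moment you read it, and you can put that SQL inside a transaction with the INSERT, ideally with a locking read such as SELECT ... FOR UPDATE, in order to prevent errors from concurrent inserts.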

It seems really expensive to pull back a list of all primary key values (for large sets), and then to generate a pseudo-random value and verify it's unique by checking it against the list.
One of the big problems I see with this approach is that a pseudo-random number generator will generate the same sequence of values, when the sequence is started with the same seed value.
If that ever happens, then there will be collision after collision after collision until the sequence reaches a value that hasn't yet been used. And the next time it happens, you'd spin through that whole list again, to add one more value.
I don't understand why the value has to be random.
If there's not a requirement for pseudo-randomness, and an ascending value would be okay, here's what I would do if I didn't want to make any changes to the existing table:
I'd create another "id-generator" table that has an auto_increment column. I perform inserts to that table to generate id values.
Instead of running a query to pull back all existing id values from the existing table, I'd perform an INSERT into the "id-generator" table, then a SELECT LAST_INSERT_ID() to retrieve the id of the row I just inserted, and use that as the "generated" id value.
Basically, emulating an Oracle SEQUENCE object. It wouldn't be necessary to keep all of the rows in "id-generator" table. So, I could perform a DELETE of all rows that have an id value less than the maximum id value.
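The "id-generator" table idea can be sketched roughly as follows. This is only an illustration: Python's built-in sqlite3 module stands in for MySQL, the table name `id_generator` and the helper names are made up, and `cursor.lastrowid` plays the role that SELECT LAST_INSERT_ID() would play on a MySQL connection.

```python
import sqlite3


def create_generator(conn: sqlite3.Connection) -> None:
    # Hypothetical table name. In MySQL the column would be declared
    # AUTO_INCREMENT; AUTOINCREMENT is the SQLite stand-in used here.
    conn.execute(
        "CREATE TABLE id_generator (id INTEGER PRIMARY KEY AUTOINCREMENT)"
    )


def next_id(conn: sqlite3.Connection) -> int:
    """Insert a placeholder row and return the id the database assigned."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO id_generator DEFAULT VALUES")
        new_id = cur.lastrowid  # MySQL equivalent: SELECT LAST_INSERT_ID()
        # Optional housekeeping: rows below the newest id are not needed.
        conn.execute("DELETE FROM id_generator WHERE id < ?", (new_id,))
    return new_id


conn = sqlite3.connect(":memory:")
create_generator(conn)
first, second = next_id(conn), next_id(conn)
```

The value returned by `next_id` would then be used as the primary key for the INSERT into the real table, which itself stays unchanged.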
If there is a requirement for pseudo-randomness (shudder) I'd probably just attempt the INSERT as a way to find out if the key exists or not. If the insert fails due to a duplicate key, I'd try again with a different id value.
The repeated sequence from a pseudo-random generator scares me. If I got several collisions in a row, are these from a previously used sequence, or are they values from a different sequence? I don't have any way of knowing. And if I abandon the sequence and restart with a new seed that has been used before, I'm off chasing another series of previously generated values.
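If random ids really are required, the "attempt the INSERT and retry on a duplicate key" approach might look something like the sketch below. The assumptions are mine: sqlite3 stands in for MySQL, the `items` table and the function name are hypothetical, and `random.SystemRandom` is used so that the generator is not driven by a reusable seed.

```python
import random
import sqlite3


def insert_with_random_id(conn, payload, id_space=2**31, max_attempts=10,
                          rng=random.SystemRandom()):
    """Pick random ids until an INSERT succeeds.

    The PRIMARY KEY constraint does the uniqueness check, so no list of
    existing ids is ever fetched. `items` is a hypothetical table.
    """
    for _ in range(max_attempts):
        candidate = rng.randrange(1, id_space)
        try:
            with conn:  # commit on success, roll back on error
                conn.execute(
                    "INSERT INTO items (id, payload) VALUES (?, ?)",
                    (candidate, payload),
                )
            return candidate
        except sqlite3.IntegrityError:
            continue  # duplicate key: try a different id
    raise RuntimeError(f"no free id found after {max_attempts} attempts")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
new_id = insert_with_random_id(conn, "example row")
```

With a MySQL driver the exception to catch would be that driver's duplicate-key error (error code 1062) rather than `sqlite3.IntegrityError`. The cap on attempts keeps a nearly full id space from looping forever.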

For low levels of concurrency (average concurrent ongoing inserts < 1) you can use optimistic locking to achieve a unique id without autoincrement:
Set up a one-row table for this function, e.g.:
create table last_id (last_id bigint not null default 0);
insert into last_id values (0);
To get your next id, retrieve this value in your app code, apply your newId function, and then attempt to update the value, e.g.:
select last_id from last_id; -- in DB
newId = lastId + 1 // in app code
update last_id set last_id=$newId where last_id=$lastId; -- in DB
Check the number of rows that were updated. If it was 0, another server beat you to it and you should go back to the select and try again.
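The read, compute, conditional-update loop described above could be sketched like this. It is an illustration under assumptions: sqlite3 stands in for MySQL, the function names are invented, and the "rows updated" check relies on the driver reporting the affected-row count via `cursor.rowcount`.

```python
import sqlite3


def try_claim(conn, last_id, new_id):
    """Compare-and-swap: succeeds only if nobody changed last_id meanwhile."""
    with conn:
        cur = conn.execute(
            "UPDATE last_id SET last_id = ? WHERE last_id = ?",
            (new_id, last_id),
        )
    return cur.rowcount == 1  # 0 rows updated means another writer won


def next_id_optimistic(conn, max_attempts=20):
    """Retry the read/compute/update cycle until the update sticks."""
    for _ in range(max_attempts):
        (last_id,) = conn.execute("SELECT last_id FROM last_id").fetchone()
        new_id = last_id + 1  # the "newId function" from the text
        if try_claim(conn, last_id, new_id):
            return new_id
    raise RuntimeError("too much contention on last_id")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE last_id (last_id INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO last_id VALUES (0)")
conn.commit()
new_id = next_id_optimistic(conn)
```

A stale read shows up as `try_claim` returning False, which is exactly the "0 rows updated" case; the retry limit is an addition of this sketch, not part of the original recipe.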

Related

Auto-increment a primary key in MySql

When creating tables with MySQL in phpMyAdmin, I always run into an issue with primary keys and their auto-increment. When I insert rows into my table, the auto_increment works perfectly, adding 1 to the primary key for each new row. But when I delete a row, for example the one where the primary key is 'id = 4', and then add a new row to the table, the primary key of the new row gets a value of 'id = 5' instead of 'id = 4'. It acts as if the old row was never deleted.
Here is an example of the SQL statement:
CREATE TABLE employe(
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(30) NOT NULL
)
ENGINE = INNODB;
How do I find a solution to this problem?
Thank you.
I'm pretty sure this is by design. If you had IDs up to 6 in your table and you deleted ID 2, would you want the next input to be an ID of 2? That doesn't seem to follow the ACID properties. Also, if there was a dependence on that data, for example, if it was user data, and the ID determined user IDs, it would invalidate pre-existing information, since if user X was deleted and the same ID was assigned to user Y, that could cause integrity issues in dependent systems.
Also, imagine a table with 50 billion rows. Should the table run an O(n) search for the smallest missing ID every time you're trying to insert a new record? I can see that getting out of hand really quickly.
Some links you might like to read:
Principles of Transaction-Oriented Database Recovery (1983)
How can we re-use the deleted id from any MySQL-DB table?
Why do you care?
Primary keys are internal row identifiers that are not supposed to be sexy or good looking. As long as they are able to identify each row uniquely, they serve their purpose.
Now, if you care about its value, then you probably want to expose the primary key value somewhere, and that's a big red flag. If you need an external, visible identifier, you can create a secondary column with any formatting sequence and values you want.
As a side note, the term AUTO_INCREMENT is a bit misleading. It doesn't really mean the values increase one by one all the time. It just means it will try to produce sequential numbers, as long as that is possible. In multi-threaded apps that's usually not possible, since batches of numbers are reserved per thread, so the row insertion sequence may end up not following the natural numbering. Row deletions have a similar effect, as do INSERTs that are rolled back.
Primary keys are meant to be used for joining tables together and indexing; they are not meant for human use. Reordering primary key columns could orphan data and wreak havoc on your queries.
Tips: Add another column to your table and reorder that column to your will if needed (show that column to your user instead of the primary key).

generate id number mysql

i want to generate a id number for my user table.
id number is unique index.
here my trigger
USE `schema_epolling`;
DELIMITER $$
CREATE DEFINER=`root`@`localhost` TRIGGER `tbl_user_BINS` BEFORE INSERT ON `tbl_user`
FOR EACH ROW
BEGIN
SET NEW.id_number = CONCAT(DATE_FORMAT(NOW(),'%y'),LPAD((SELECT auto_increment FROM
information_schema.tables WHERE table_schema = 'schema_epolling' AND table_name =
'tbl_user'),6,0));
END$$
DELIMITER ;
It works if I insert rows one by one, or maybe 5 rows at a time.
But if I insert rows in bulk, an error occurs.
Here's the code I use for inserting bulk rows from another schema/table:
INSERT INTO schema_epolling.tbl_user (last_name, first_name)
SELECT last_name, first_name
FROM schema_nc.tbl_person
Here's the error:
Error Code: 1062. Duplicate entry '14000004' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000011' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000018' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000025' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000032' for key 'id_number_UNIQUE'
If I use the uuid() function it works fine, but I don't want uuid(); it's too long.
You don't want to generate id values that way.
The auto-increment value for the current INSERT is not generated yet at the time the BEFORE INSERT trigger executes.
Even if it were, the INFORMATION_SCHEMA would contain the maximum auto-increment value generated by any thread, not just the thread executing the trigger. So you would have a race condition that would easily conflict with other concurrent inserts and get the wrong value.
Also, querying INFORMATION_SCHEMA on every INSERT is likely to be a bottleneck for your performance.
In this case, to get the auto-increment value formatted with the two-digit year number prepended, you could advance the table's auto-increment value up to %y million, and then when we reach January 1 2015 you would ALTER TABLE to advance it again.
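As a rough illustration of the arithmetic behind "advance the auto-increment value up to %y million", the small helper below computes the starting value for a given year and builds the corresponding ALTER TABLE statement. The function names are hypothetical, and the scheme assumes fewer than one million rows per year and a two-digit year of 10 or more (otherwise the leading zero is lost in an integer column).

```python
from datetime import date


def auto_increment_floor(today: date) -> int:
    """Two-digit year times one million, e.g. 2015 gives 15000000."""
    return (today.year % 100) * 1_000_000


def alter_statement(table: str, today: date) -> str:
    """Build the yearly ALTER TABLE; `table` must be a trusted identifier."""
    return f"ALTER TABLE {table} AUTO_INCREMENT = {auto_increment_floor(today)}"


stmt = alter_statement("tbl_user", date(2015, 1, 1))
# stmt == "ALTER TABLE tbl_user AUTO_INCREMENT = 15000000"
```

Run once at the start of each year, this would make the plain auto-increment id itself carry the year prefix, with no trigger involved.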
Re your comments:
The answer I gave above applies to how MySQL's auto-increment works. If you don't rely on auto-increment, you can generate the values by some other means.
Incrementing another one-row table as @Vatev suggests (though this creates a relatively long-lived lock on that table, which could be a bottleneck for your inserts).
Generating values in your application, based on a central, atomic id-generator like memcached. See other ideas here: Generate unique IDs in a distributed environment
Using UUID(). Yes, sorry, it's 32 hex digits long (36 characters with the hyphens). Don't truncate it or you will lose uniqueness.
But combining triggers with auto-increment in the way you show simply won't work.
I'd like to add my two cents to expound on Bill Karwin's point.
It's better that you don't generate a Unique ID by attempting to manually cobble one together.
The fact that your school produces an ID in that way does not mean that's the best way to do it (assuming that is what they are using that generated value for which I can't know without more information).
Your database work will be simpler and less error prone if you accept that the purpose for an ID field (or key) is to guarantee uniqueness in each row of data, not as a reference point to store certain pieces of human readable data in a central spot.
This type of a ID/key is known as a surrogate key.
If you'd like to read more about them here's a good article: http://en.wikipedia.org/wiki/Surrogate_key
It's common for a surrogate key to also be the primary key of a table, (and when it's used in this way it can greatly simplify creating relationships between tables).
If you would like to add a secondary column that concatenates date values and other information because that's valuable for an application you are writing, or any other purpose you see fit, then create that as a separate column in your table.
Thinking of an ID column/key in this "fire and forget" way may simplify the concept enough that you experience a number of benefits in your database creation efforts.
As an example, should you require uniqueness between un-associated databases, you will more easily be able to stomach the use of a UUID.
(Because you'll know its purpose is merely to ensure uniqueness, NOT to be useful to you in any other way.)
Additionally, as you've found, taking the responsibility on yourself, instead of relying on the database, to produce a unique value adds time consuming complexity that can otherwise be avoided.
Hope this helps.

Reserve a block of auto-increment ids in MySQL

I receive batches of, say, 100 items that I need to insert into three related MySQL tables: say current, recent, and historical. I want to insert each batch in each table as a group in a single insert statement for speed. The current table has an auto-increment primary key id that I need to obtain for each inserted row and use as the primary key to insert the same row in the recent and historical tables. My idea is to get the current auto_increment value for current, increment it by 100 using alter table current AUTO_INCREMENT=, then insert the 100 rows into current with programmatically set ids from the block that I just "reserved". Then I can use the same 100 reserved id values for the inserts into the recent and historical tables without having to query them again from the current table.
My question: Is there some reason that this is a bad idea? I have seen nothing about it on the web. The closest I have seen on stack overflow is Insert into an auto increment field but that is not quite the same thing. I can see possible threading issues if I tried to do this from more than one thread at a time.
I'm also open to other suggestions on how to accomplish this.
There might be concurrency issues: If another connection inserts values between the time you get the current value and you set the new value, you would get duplicate keys.
I am not aware if that can happen in your situation, however, or if the inserts happen only from your batch described above, and there is never another instance of it running in parallel.
Methinks you should decouple the IDs of the 3 tables, and using ALTER TABLE sounds very fishy too.
The most proper way I can think of:
In recent and historical, add a column that references the current ID; don't try to force the primary IDs to be the same.
Acquire a WRITE table lock on current.
Get the auto_increment value X for current.
Insert your 100 records; their IDs should now run from X+1 to X+100.
Release the table lock.
Insert records in recent and historical with the known IDs in the extra column.
Note: I'm not sure if the auto_increment value points to the next ID, or the current highest value. If you use MAX(id) then you should use the code above.
This is [a bit] late, but in case someone else has this same question (as I did):
As Ethan pointed out in his comment, auto_increment is an internal MySQL utility to produce unique keys. Since you have the ability to generate your own id values external to MySQL, I suggest removing the auto_increment overhead from the table (but keep id as PK, for transport to the other tables). You can then insert your own id values along with the data.
Obviously once you do this you'll have to program your own incrementing id values. To retrieve a "starting point" for each batch and maintain the speed of a single batch INSERT call, create another table (I'll call it management) with just a single record of last_id, which is equivalent to, but independent of, max(id) of your three primary tables. Then, each time a new batch is ready to be processed, start a transaction on management with a write lock, read management.last_id, UPDATE management.last_id to (last_id+1)+number in batch, then close the transaction. You now have sequential id values to insert that are reserved for that batch, because any future read of management.last_id will return the next-larger set of id values.
The write-locked transaction removes any concurrency issues (as stated in FrankPI's answer) because any other processes attempting to read management must wait for the lock to be removed and will return the value after the UPDATE. This also removes the id ambiguity in JvO's answer: "...IDs should now run from X+1 to X+100", which can be a dangerous assumption.
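A block reservation against a one-row management table might be sketched as below. Several things here are assumptions of the sketch rather than part of the answer: sqlite3 stands in for MySQL (where the read would more likely be a SELECT ... FOR UPDATE inside a transaction), the function name is invented, and the arithmetic is one reasonable reading in which last_id always holds the highest id handed out so far.

```python
import sqlite3


def reserve_block(conn, count):
    """Atomically reserve `count` consecutive ids and return them as a range.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below control the transaction.
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        (last_id,) = conn.execute(
            "SELECT last_id FROM management"
        ).fetchone()
        conn.execute(
            "UPDATE management SET last_id = ?", (last_id + count,)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return range(last_id + 1, last_id + count + 1)


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE management (last_id INTEGER NOT NULL)")
conn.execute("INSERT INTO management VALUES (0)")
batch_ids = reserve_block(conn, 100)  # ids 1 through 100
```

The returned ids can then be written explicitly into current, recent, and historical in three batch INSERTs without reading anything back from current.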

Select and insert at the same time

I need to get the max value of a field called chat_id, then increment it by one and insert a new row with that value, so the queries should look something like this:
SELECT MAX(`chat_id`) FROM `messages`;
Let's say it returns 10; now I need to insert new data:
INSERT INTO `messages` SET `chat_id` = 11 -- other data here....
This would work the way I want, but my question is: what if, in the time between incrementing and inserting the new record, another user does the same? Then there would already be a record with id 11 and it could mess up my data. Is there a way to make sure that the right id goes where I need it? By the way, I can't use auto increment for this.
EDIT: As I said, I cannot use auto increment because that table already has an id field with auto increment. This id is for a different purpose; also, it's not unique and it can't be unique.
EDIT 2: Solved it by redoing my whole table structure, since no one gave me better ideas.
Don't try to do this on your own. You've already identified one of the pitfalls of that approach. I'm not sure why you're saying you can't use auto increment here. That's really the way to go.
CREATE TABLE messages (
chat_id INT NOT NULL AUTO_INCREMENT,
....
)
If you cannot use an auto-increment primary key then you will either have to exclusively lock the table (which is generally not a good idea), or be prepared to encounter failures.
Assuming that the chat_id column is UNIQUE (which it should be from what you're saying), you can put these two queries inside a loop. If the INSERT succeeds then everything is fine, you can break out of the loop and continue. Otherwise it means that someone else managed to snatch this particular id out of your hands, so repeat the process until successful.
At this point I have to mention that you should not actually use a totally naive approach in production code (e.g. you might want to put an upper limit in how many iterations are possible before you give up) and that this solution will not work well if there is a lot of contention for the database (it will work just fine to ensure that the occasional race does not cause you problems). You should examine your access patterns and load before deciding on this.
AUTO_INCREMENT would solve this problem. But for other similar situations this would be a great use of transactions. If you're using InnoDb engine you can use transactions to ensure that operations happen in a specific order so that your data stays consistent.
You can solve this by using MySQL's built-in uuid() function to calculate the new primary key value, instead of leaving it to the auto increment feature.
Alter your table to make messages.chat_id a char(36) and remove the AUTO_INCREMENT clause.
Then do this:
# Generate a unique primary key value before inserting.
# (A user variable is used because DECLARE only works inside stored programs.)
set @new_id = uuid();
# Insert the new record.
insert into messages
(chat_id, ...)
values
(@new_id, ...);
# Select the new record.
select *
from messages
where chat_id = @new_id;
The MySQL documentation on uuid() says:
A UUID is designed as a number that is globally unique in space and time. Two calls to UUID() are expected to generate two different values, even if these calls are performed on two separate devices not connected to each other.
Meaning it's perfectly safe to use the value generated by uuid as a primary key value.
This way you can predict what the primary key value of the new record will be before you insert it and then query by it knowing for sure that no other process has "stolen" that id from you in between the insert and the select. Which in turn removes the need for a transaction.
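The same "know the key before you insert" idea also works when the UUID is generated in application code rather than by the database. A minimal sketch, assuming a Python client and a hypothetical helper name; `uuid.uuid1()` produces a version 1 (time-based) UUID, the same version MySQL's UUID() is documented to return:

```python
import uuid


def new_chat_id() -> str:
    """Return a 36-character UUID string suitable for a char(36) column."""
    return str(uuid.uuid1())


chat_id = new_chat_id()
# 32 hex digits plus 4 hyphens, so the value fits char(36).
assert len(chat_id) == 36
```

The generated string would be passed as the chat_id parameter of the INSERT and reused for the follow-up SELECT.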

MySQL, autoincrement sequence?

In a MySQL database column that has been set to AUTO_INCREMENT, can I assume that the values will always be created sequentially?
For instance, if 10 rows are inserted and receive values 1,2,3,...10, and then 3 is deleted, can I assume the next row inserted will receive 11?
The reason I ask is that I'd like to sort values based on the order in which they were inserted into the table, and if I can sort based on the auto incremented primary key it will be a little easier.
From what I understand from the manual: yes. Each table has its own 'next auto increment value' that is incremented by the amount defined in auto_increment_increment (http://dev.mysql.com/doc/refman/5.0/en/replication-options-master.html#sysvar_auto_increment_increment) and that is never automatically reset, even though it can be manually reset. But as @miku said, if possible a timestamp would be preferable.
I've seen auto_increment mainly used for the primary key column. If you want to sort items by say date_added you should create an extra timestamp, date/datetime or int (epoch) column.
This way you make your intent explicit and easier to follow - also you can safely migrate, export and import your DB without the need to worry about how auto_increment is handled.