Query optimization for inserts into a MySQL database

I need a solution for inserting a lot of rows concurrently into my MySQL DB.
I have a rule that every time I insert into my transaction table, I need a unique ID composed of currentTime+transactionSource+sequenceNumber. My problem is that when I test my service with JMeter, the service goes down once the concurrent insert load reaches 3000 rows, because the unique IDs I generate contain duplicates. My assumption is that the duplication happens because one insert process hasn't finished when another starts, so both generate the same unique ID.
Can anyone suggest the best way of doing this? Thank you.

MySQL has three wonderful methods to ensure that an id is unique:
auto_increment columns
uuid()
uuid_short()
Use them! The most common way to implement a unique id is the first one:
create table t (
t_id int auto_increment primary key,
. . .
)
I strongly, strongly advise you not to maintain your own id. You get race conditions (as you have seen). Your code will be less efficient than the code in the database. If you need the separate components, you can implement them as columns in the table.
In other words, your fundamental problem is your "rule". And there are zillions of databases in the world that work perfectly well without such a rule.
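
For example, a minimal sketch of that layout (the table and column names are illustrative, not from the original post):

CREATE TABLE transactions (
    transaction_id     INT AUTO_INCREMENT PRIMARY KEY,  -- uniqueness handled by MySQL
    created_at         DATETIME NOT NULL,               -- the currentTime component
    transaction_source VARCHAR(20) NOT NULL,            -- the transactionSource component
    amount             DECIMAL(10,2)
);

The composite string can always be reconstructed in a SELECT whenever a human-readable reference is needed.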

Why don't you let the database handle the insert id and then update the row with a secondary field containing the format you want? If you have duplicates, you can always append the row id to this identifier so it will always be unique.
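
A minimal sketch of that idea, assuming a transactions table with an auto-increment id column and a ref column for the formatted identifier (all names here are hypothetical):

INSERT INTO transactions (transaction_source, created_at)
VALUES ('WEB', NOW());

-- LAST_INSERT_ID() is tracked per connection, so this is safe under concurrent inserts.
UPDATE transactions
SET ref = CONCAT(DATE_FORMAT(created_at, '%y%m%d%H%i%s'), transaction_source, id)
WHERE id = LAST_INSERT_ID();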

Related

generate id number mysql

I want to generate an id number for my user table.
The id number is a unique index.
Here's my trigger:
USE `schema_epolling`;
DELIMITER $$
CREATE DEFINER=`root`@`localhost` TRIGGER `tbl_user_BINS` BEFORE INSERT ON `tbl_user`
FOR EACH ROW
BEGIN
    SET NEW.id_number = CONCAT(
        DATE_FORMAT(NOW(), '%y'),
        LPAD((SELECT auto_increment FROM information_schema.tables
              WHERE table_schema = 'schema_epolling'
                AND table_name = 'tbl_user'), 6, 0));
END$$
DELIMITER ;
It works if I insert rows one by one, or maybe 5 at a time.
But if I insert rows in bulk, an error occurs.
Here's the code I use for inserting bulk rows from another schema/table:
INSERT INTO schema_epolling.tbl_user (last_name, first_name)
SELECT last_name, first_name
FROM schema_nc.tbl_person
Here's the error:
Error Code: 1062. Duplicate entry '14000004' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000011' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000018' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000025' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000032' for key 'id_number_UNIQUE'
If I use the uuid() function it works fine, but I don't want uuid(): it's too long.
You don't want to generate id values that way.
The auto-increment value for the current INSERT is not generated yet at the time the BEFORE INSERT trigger executes.
Even if it were, INFORMATION_SCHEMA would contain the maximum auto-increment value generated by any thread, not just the thread executing the trigger. So you would have a race condition that could easily conflict with other concurrent inserts and get the wrong value.
Also, querying INFORMATION_SCHEMA on every INSERT is likely to be a bottleneck for your performance.
In this case, to get the auto-increment value formatted with the two-digit year number prepended, you could advance the table's auto-increment value up to %y million, and then when you reach January 1, 2015 you would ALTER TABLE to advance it again.
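For example, a sketch matching the two-digit-year scheme above:

-- Ids generated in 2014 then start at 14000000:
ALTER TABLE tbl_user AUTO_INCREMENT = 14000000;

-- Then on January 1, 2015, advance it again:
ALTER TABLE tbl_user AUTO_INCREMENT = 15000000;

MySQL will not lower the counter below the current maximum id, so each ALTER only ever moves it forward.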
Re your comments:
The answer I gave above applies to how MySQL's auto-increment works. If you don't rely on auto-increment, you can generate the values by some other means.
Incrementing another one-row table as @Vatev suggests (though this creates a relatively long-lived lock on that table, which could be a bottleneck for your inserts); a sketch follows below.
Generating values in your application, based on a central, atomic id generator like memcached. See other ideas here: Generate unique IDs in a distributed environment
Using UUID(). Yes, sorry, it's 32 characters long. Don't truncate it or you will lose uniqueness.
But combining triggers with auto-increment in the way you show simply won't work.
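A minimal sketch of the one-row counter table mentioned above (the table name id_counter is hypothetical):

-- One-time setup:
CREATE TABLE id_counter (last_id BIGINT NOT NULL);
INSERT INTO id_counter VALUES (0);

-- For each insert, atomically claim the next id:
UPDATE id_counter SET last_id = LAST_INSERT_ID(last_id + 1);
SELECT LAST_INSERT_ID();  -- the claimed id, tracked per connection

The LAST_INSERT_ID(expr) form makes the updated value available to the same connection without a second read of the table.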
I'd like to add my two cents to expound on Bill Karwin's point.
It's better that you don't generate a Unique ID by attempting to manually cobble one together.
The fact that your school produces an ID in that way does not mean that's the best way to do it (assuming that is what the generated value is used for, which I can't know without more information).
Your database work will be simpler and less error-prone if you accept that the purpose of an ID field (or key) is to guarantee uniqueness in each row of data, not to serve as a reference point for storing certain pieces of human-readable data in a central spot.
This type of ID/key is known as a surrogate key.
If you'd like to read more about them here's a good article: http://en.wikipedia.org/wiki/Surrogate_key
It's common for a surrogate key to also be the primary key of a table (and when it's used in this way it can greatly simplify creating relationships between tables).
If you would like to add a secondary column that concatenates date values and other information because that's valuable for an application you are writing, or any other purpose you see fit, then create that as a separate column in your table.
Thinking of an ID column/key in this fire-and-forget way may simplify the concept enough that you experience a number of benefits in your database creation efforts.
As an example, should you require uniqueness between unassociated databases, you will more easily be able to stomach the use of a UUID
(because you'll know its purpose is merely to ensure uniqueness, NOT to be useful to you in any other way).
Additionally, as you've found, taking responsibility for producing a unique value yourself, instead of relying on the database, adds time-consuming complexity that can otherwise be avoided.
Hope this helps.

Reserve a block of auto-increment ids in MySQL

I receive batches of, say, 100 items that I need to insert into three related MySQL tables: say current, recent, and historical. I want to insert each batch in each table as a group in a single insert statement for speed. The current table has an auto-increment primary key id that I need to obtain for each inserted row and use as the primary key to insert the same row in the recent and historical tables. My idea is to get the current auto_increment value for current, increment it by 100 using alter table current AUTO_INCREMENT=, then insert the 100 rows into current with programmatically set ids from the block that I just "reserved". Then I can use the same 100 reserved id values for the inserts into the recent and historical tables without having to query them again from the current table.
My question: Is there some reason that this is a bad idea? I have seen nothing about it on the web. The closest I have seen on stack overflow is Insert into an auto increment field but that is not quite the same thing. I can see possible threading issues if I tried to do this from more than one thread at a time.
I'm also open to other suggestions on how to accomplish this.
There might be concurrency issues: If another connection inserts values between the time you get the current value and you set the new value, you would get duplicate keys.
I am not aware if that can happen in your situation, however, or if the inserts happen only from your batch described above, and there is never another instance of it running in parallel.
Methinks you should decouple the IDs from the 3 tables, and using ALTER TABLE sounds very fishy too.
The most proper way I can think of:
In recent and historical, add a column that references the current ID; don't try to force the primary IDs to be the same.
Acquire a WRITE table lock on current.
Get the auto_increment value X for current.
Insert your 100 records; their IDs should now run from X+1 to X+100.
Release the table lock.
Insert records in recent and historical with the known IDs in the extra column.
Note: I'm not sure whether the auto_increment value points to the next ID or to the current highest value; if you use MAX(id) to get it, then the numbering above applies.
This is [a bit] late, but in case someone else has this same question (as I did):
As Ethan pointed out in his comment, auto_increment is an internal MySQL utility to produce unique keys. Since you have the ability to generate your own id values external to MySQL, I suggest removing the auto_increment overhead from the table (but keeping id as PK, for transport to the other tables). You can then insert your own id values along with the data.
Obviously, once you do this you'll have to program your own incrementing id values. To retrieve a "starting point" for each batch while maintaining the speed of a single batch INSERT call, create another table (I'll call it management) with just a single record holding last_id, which is equivalent to, but independent of, max(id) of your three primary tables. Then, each time a new batch is ready to be processed, start a transaction on management with a write lock, read management.last_id, UPDATE management.last_id to last_id plus the number of rows in the batch, then close the transaction. You now have sequential id values to insert that are reserved for that batch, because any future read of management.last_id will return the next-larger set of id values.
The write-locked transaction removes any concurrency issues (as stated in FrankPI's answer), because any other process attempting to read management must wait for the lock to be released and will see the value after the UPDATE. This also removes the id ambiguity in JvO's answer ("...IDs should now run from X+1 to X+100"), which can be a dangerous assumption.
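A sketch of that reservation step in SQL (the management table and the batch size of 100 are assumptions taken from the discussion above):

START TRANSACTION;
SELECT last_id FROM management FOR UPDATE;   -- write lock held until COMMIT
UPDATE management SET last_id = last_id + 100;
COMMIT;
-- ids last_id+1 .. last_id+100 are now reserved for this batch

Any concurrent process running the same block waits on the FOR UPDATE lock, so two batches can never reserve overlapping ranges.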

Select and insert at the same time

So, I need to get the max number of a field called chat_id, and after that I need to increment it by one and insert some data with that value, so the query should look something like this:
SELECT MAX(`chat_id`) FROM `messages`;
Let's say it returns 10; now I need to insert new data:
INSERT INTO `messages` SET `chat_id` = 11 -- other data here....
So it would work the way I want, but my question is: what if, between the time I'm incrementing and inserting the new record, another user does the same? Then there would already be a record with id 11 and it could mess up my data. Is there a way to make sure that the right id goes where I need it? By the way, I can't use auto increment for this.
EDIT: As I said, I cannot use auto increment, because that table already has an id field with auto increment. This id is for a different purpose; also, it's not unique and it can't be unique.
EDIT 2: Solved it by redoing my whole table structure, since no one gave me better ideas.
Don't try to do this on your own. You've already identified one of the pitfalls of that approach. I'm not sure why you're saying you can't use auto increment here. That's really the way to go.
CREATE TABLE messages (
chat_id INT NOT NULL AUTO_INCREMENT,
....
)
If you cannot use an auto-increment primary key then you will either have to exclusively lock the table (which is generally not a good idea), or be prepared to encounter failures.
Assuming that the chat_id column is UNIQUE (which it should be from what you're saying), you can put these two queries inside a loop. If the INSERT succeeds then everything is fine: you can break out of the loop and continue. Otherwise it means that someone else managed to snatch this particular id out of your hands, so repeat the process until successful.
At this point I have to mention that you should not actually use a totally naive approach in production code (e.g. you might want to put an upper limit on how many iterations are possible before you give up), and that this solution will not work well if there is a lot of contention for the database (it will work just fine to ensure that the occasional race does not cause you problems). You should examine your access patterns and load before deciding on this.
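One way to sketch that retry loop entirely inside MySQL is a stored procedure with a duplicate-key handler. This assumes chat_id has a UNIQUE index as described above; the procedure name and body column are hypothetical, and a production version should cap the retries as just noted:

DELIMITER $$
CREATE PROCEDURE insert_chat_message(IN p_body TEXT)
BEGIN
    DECLARE done INT DEFAULT 0;
    -- On a duplicate-key error (1062), resume with done reset to 0 and retry.
    DECLARE CONTINUE HANDLER FOR 1062 SET done = 0;
    REPEAT
        SET done = 1;
        INSERT INTO messages (chat_id, body)
        SELECT COALESCE(MAX(chat_id), 0) + 1, p_body FROM messages;
    UNTIL done = 1 END REPEAT;
END$$
DELIMITER ;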
AUTO_INCREMENT would solve this problem. But for other similar situations this would be a great use of transactions. If you're using the InnoDB engine you can use transactions to ensure that operations happen in a specific order, so that your data stays consistent.
You can solve this by using MySQL's built-in uuid() function to calculate the new primary key value, instead of leaving it to the auto increment feature.
Alter your table to make messages.chat_id a char(36) and remove the AUTO_INCREMENT clause.
Then do this:
# Generate a unique primary key value before inserting.
SET @new_id = UUID();

# Insert the new record.
insert into messages
(chat_id, ...)
values
(@new_id, ...);

# Select the new record.
select *
from messages
where chat_id = @new_id;
MySQL's documentation on uuid() says:
A UUID is designed as a number that is globally unique in space and time. Two calls to UUID() are expected to generate two different values, even if these calls are performed on two separate devices not connected to each other.
Meaning it's perfectly safe to use the value generated by uuid as a primary key value.
This way you can predict what the primary key value of the new record will be before you insert it and then query by it knowing for sure that no other process has "stolen" that id from you in between the insert and the select. Which in turn removes the need for a transaction.

How to deal with duplicates in database?

In a program, should we use try catch to check insertion of duplicate values into tables, or should we check if the value is already present in the table and avoid insertion?
This is easy enough to enforce with a UNIQUE constraint on the database side so that's my recommendation. I try to put as much of the data integrity into the database so that I can avoid having bad data (although sometimes unavoidable).
If this is how you already have it, you might as well just catch the MySQL exception for a duplicate value insertion on such a table, as doing the check and then the insertion is more costly than having the database do one simple lookup (and possibly an insert).
Depends upon whether you are inserting one, or a million, as well as whether the duplicate is the primary key.
If its the primary key, read: http://database-programmer.blogspot.com/2009/06/approaches-to-upsert.html
An UPSERT or ON DUPLICATE KEY... The idea behind an UPSERT is simple.
The client issues an INSERT command. If a row already exists with the
given primary key, then instead of throwing a key violation error, it
takes the non-key values and updates the row.
This is one of those strange (and very unusual) cases where MySQL
actually supports something you will not find in all of the other more
mature databases. So if you are using MySQL, you do not need to do
anything special to make an UPSERT. You just add the term "ON
DUPLICATE KEY UPDATE" to the INSERT statement:
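For instance, a minimal sketch (the table and columns are hypothetical):

INSERT INTO t (id, col)
VALUES (1, 'x')
ON DUPLICATE KEY UPDATE col = VALUES(col);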
If it's not the primary key, and you are inserting just one row, then you can still make sure this doesn't cause a failure.
For your actual question, I don't really like the idea of using try/catch for program flow, but really, you have to evaluate readability and user experience (in this case performance), and pick what you think is the best of mix of the two.
You can add a UNIQUE constraint to your table. Something like:
CREATE TABLE IF NOT EXISTS login
(
loginid SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
loginname CHAR(20) NOT NULL,
UNIQUE (loginname)
);
This will ensure no two login names are the same.
You can create a unique composite key:
ALTER TABLE `TableName` ADD UNIQUE KEY (KeyOne, KeyTwo, ...);
You just need to create a unique key on your table so that it will not permit adding the same value again.
You should try inserting the value and catch the exception. In a busy system, if you check for the existence of a value, it might get inserted between the time you check and the time you insert it.
Let the database do its job: let the database check for the duplicate entry.
A database is a computerized representation of a set of business rules, and a DBMS is used to enforce these business rules as constraints. Neither can verify that a proposition in the database is true in the real world. For example, if the model in question is the employees of an enterprise and the Employees table contains two people named 'Jimmy Barnes', neither the DBMS nor the database can know whether one is a duplicate, whether either is a real person, etc. A trusted source is required to determine existence and identity. In the above example, the enterprise's personnel department is responsible for checking public records, perusing references, ensuring the person is not already on the payroll, etc., then allocating a unique employee reference number that can be used as a key. This is why we look for industry-standard identifiers with a trusted source: ISBN for books, VIN for cars, ISO 4217 for currencies, ISO 3166 for countries, etc.
I think it is better to check if the value already exists and avoid the insertion. The check for duplicate values can be done in the procedure that saves the data (using exists if your database is an SQL database).
If a duplicate exists you avoid the insertion and can return a value to your app indicating so and then show a message accordingly.
For example, a piece of SQL code could be something like this:
select @ret_val = 0
If exists (select * from employee where last_name = @param_ln and first_name = @param_fn)
    select @ret_val = -1
Else
    -- your insert statement here
Select @ret_val
Your condition for duplicate values will depend on what you define as a duplicate record. In your application you would use the return value to know if the data was a duplicate. Good luck!

MySQL PhpMyAdmin: Alter AUTO_INCREMENT and/or INSERT_ID

I have an invoices table which stores a single record for each invoice, with the id column (int AUTO_INCREMENT) being the primary key, but also the invoice reference number.
Now, unfortunately, I've had to manually migrate some invoices generated on an old system, which have a five-digit id instead of the four-digit one the current system uses.
However, even when I reset the AUTO_INCREMENT through PhpMyAdmin (Table Operations) back to the next four-digit id, it still inserts a five-digit one: the highest id currently in the table plus one.
From searching around, it would seem that I actually need to change the insert_id as well as the AUTO_INCREMENT? I've tried to execute ALTER TABLE invoices SET insert_id=8125 as well as ALTER TABLE invoices insert_id=8125, but neither of these commands seems to be valid.
Can anyone explain the correct way to reset the AUTO_INCREMENT so that it will insert records with ids from 8125 onwards, and then when it gets to 10962 it will skip over the four records I've manually added and continue with sequential ids from 10966 onwards? If it won't skip over 10962 - 10966 then this doesn't really matter, as the company doesn't generate that many invoices each year, so this will occur in a subsequent year and hopefully won't cause a problem.
I would really appreciate any help with this sticky situation I've found myself in! Many Thanks
First thing I'll suggest is to ditch PHPMyAdmin because it's one of the worst "applications" ever made to be used to work with MySQL. Get a proper GUI. My favourite is SQLYog.
Now on to the problem. Never, ever tamper with the primary key, don't try to "reset" it as you said or to update columns that have an integer generated by the database. As for why, the topic is broad and can be discussed in another question, just never, ever touch the primary key once you've set it up.
Second thing is that someone was deleting records of invoices, hence the auto-increment is now at 10k+ rather than at 8k+. It's not a bad thing, but if you need sequential values for your invoices (such that there can't be a gap between invoices 1 and 5), then use an extra field called sequence_id or invoice_ref and use triggers to calculate that number. Don't rely on the auto_increment feature to reuse numbers that have been lost through a DELETE operation.
Alternatively, what you can do is export the database you've been using, find the CREATE TABLE definition for the invoices table, find the line where it says "AUTO_INCREMENT = [some number]", and delete that statement. Import into your new database and the auto_increment will continue from the latest invoice. You could do the same by using ALTER TABLE, however it's safer to re-import.
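
For reference, the ALTER TABLE form mentioned above looks like this (note that MySQL silently raises the value back to the current maximum id plus one if you try to set it lower, which is why resetting it to a four-digit value had no visible effect):

ALTER TABLE invoices AUTO_INCREMENT = 8125;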