MySQL bulk insert caused a sequence gap

I have a query that selects rows and inserts them into a table with an auto-increment column as its key:
insert into table A select * from table B
I found that after executing the query there is a sequence gap in the auto-increment column of the destination table.
Is that possible?
Or did my statement fail to insert some records?
Please help.
I cannot reproduce the case because this is a production server, and I don't have a backup from before executing the query.
Many thanks.

If you are using InnoDB with INSERT IGNORE, you may want to read this article that talks about this issue:
Avoiding auto-increment holes on InnoDB with INSERT IGNORE
Why do we have gaps?
InnoDB keeps an auto_increment counter for the table and, when a new value is needed, increments that counter and assigns the new value to the column. Prior to MySQL 5.1.22, InnoDB used a method to access that counter value called “Traditional”. This one uses a special table lock called AUTO-INC that remains until the end of the query or transaction. Because of this, two queries can’t hold the AUTO-INC lock at the same time, so we lose concurrency and performance. The problems are even worse with long-running queries like INSERT INTO table1 … SELECT … FROM table2.
When using a simple INSERT … SELECT, this is a known issue.
In conclusion, I wouldn't say this is a problem with your query; it is more related to the way the engine handles INSERT … SELECT blocks.
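If you want to see which auto-increment lock mode your server is actually using, here is a small read-only check (the variable cannot be changed at runtime; it has to be set in the server configuration and needs a restart):
SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';
-- 0 = traditional, 1 = consecutive, 2 = interleaved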

Related

What is the best way to save many rows to the DB at the same time?

I have some words like ["happy","bad","terrible","awesome","happy","happy","horrible",.....,"love"].
There are a lot of these words, maybe 100–200 or more.
I want to save them to the DB at the same time.
I think hitting the DB connection for every single word is wasteful.
What is the best way to save them?
table structure
wordId userId word
You are right that executing repeated INSERT statements to insert rows one at a time, i.e. processing RBAR (row by agonizing row), can be expensive and excruciatingly slow in MySQL.
Assuming that you are inserting the string values ("words") into a column in a table, and each word will be inserted as a new row in the table... (and that's a whole lot of assumptions there...)
For example, a table like this:
CREATE TABLE mytable (mycol VARCHAR(50) NOT NULL PRIMARY KEY) ENGINE=InnoDB
You are right that running a separate INSERT statement for each row is expensive. MySQL provides an extension to the INSERT statement syntax which allows multiple rows to be inserted.
For example, this sequence:
INSERT IGNORE INTO mytable (mycol) VALUES ('happy');
INSERT IGNORE INTO mytable (mycol) VALUES ('bad');
INSERT IGNORE INTO mytable (mycol) VALUES ('terrible');
can be emulated with a single INSERT statement:
INSERT IGNORE INTO mytable (mycol) VALUES ('happy'),('bad'),('terrible');
Each "row" to be inserted is enclosed in parens, just as it is in the regular INSERT statement. The trick is the comma separator between the rows.
The trouble with this comes in when there are constraint violations: either the whole statement succeeds or it fails, unlike the individual inserts, where one of them can fail while the other two succeed.
Also, be careful that the size (in bytes) of the statement does not exceed the max_allowed_packet variable setting.
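If in doubt, you can check (and, assuming you have the privilege to change global variables, raise) that limit; the 64 MB value below is just an example:
SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 67108864;  -- 64 MB; only affects connections opened after this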
Alternatively, a LOAD DATA statement is an even faster way to load rows into a table. But for a couple of hundred rows, it's not really going to be much faster. (If you were loading thousands and thousands of rows, the LOAD DATA statement could potentially be much faster.)
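For completeness, a minimal LOAD DATA sketch; the file name, its location, and the one-word-per-line format are assumptions here:
LOAD DATA LOCAL INFILE '/tmp/words.txt'
INTO TABLE mytable (mycol);
-- note: LOCAL requires local_infile to be enabled on both client and server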
It would be helpful to know how you are generating that list of words, but you could do:
insert into tablename (word) values ('happy'), ('bad');
Without more info, that is about as much as we can help.
You could add a loop in whatever language you are using to iterate over the list and build that statement.

Performance of MySQL counting rows in a big table

This fairly obvious question has very few (couldn't find any) solid answers.
I do a simple select from a table of 2 million rows.
select count(id) as total from big_table
On any machine I try this query on, it usually takes at least 5 seconds to complete. This is unacceptable for real-time queries.
The reason I need an exact value of rows fetched is for precise statistical calculations later on.
Using the last auto-increment value is unfortunately not an option because rows also get deleted periodically.
It can indeed be slow when running on an InnoDB engine. As stated in section 14.24 of the MySQL 5.7 Reference Manual, “InnoDB Restrictions and Limitations”, 3rd bullet point:
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
For information about how InnoDB processes SELECT COUNT(*) statements, refer to the COUNT() description in Section 12.20.1, “Aggregate Function Descriptions”.
The suggested solution is a counter table: a separate table with one row and one column holding the current record count. It can be kept up to date via triggers, something like this:
create table big_table_count (rec_count int default 0);
-- one-shot initialisation:
insert into big_table_count select count(*) from big_table;
create trigger big_insert after insert on big_table
for each row
update big_table_count set rec_count = rec_count + 1;
create trigger big_delete after delete on big_table
for each row
update big_table_count set rec_count = rec_count - 1;
You can see a fiddle here, where you should alter the insert/delete statements in the build section to see the effect on:
select rec_count from big_table_count;
You could extend this to several tables, either by creating such a counter table for each of them, or by reserving a row per table in a single counter table keyed by a "table_name" column.
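A minimal sketch of that multi-table variant; the table_counts name and the trigger below are illustrative and would replace the ones above:
create table table_counts (
  table_name varchar(64) primary key,
  rec_count  int not null default 0
);
-- one-shot initialisation:
insert into table_counts values ('big_table', (select count(*) from big_table));
create trigger big_insert after insert on big_table
for each row
update table_counts set rec_count = rec_count + 1 where table_name = 'big_table';
-- the delete trigger mirrors this with rec_count - 1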
Improving concurrency
The above method does have an impact if you have many concurrent sessions inserting or deleting records, because they need to wait for each other to complete the update of the counter.
A solution is to not let the triggers update the same, single record, but to let them insert a new record, like this:
create trigger big_insert after insert on big_table
for each row
insert into big_table_count (rec_count) values (1);
create trigger big_delete after delete on big_table
for each row
insert into big_table_count (rec_count) values (-1);
The way to get the count then becomes:
select sum(rec_count) from big_table_count;
Then, once in a while (e.g. daily) you should re-initialise the counter table to keep it small:
truncate table big_table_count;
insert into big_table_count select count(*) from big_table;
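If you want the server to do this housekeeping itself, here is a sketch using the event scheduler; the event name is made up, it assumes event_scheduler is enabled, and it has the same brief race window between the truncate and the insert as the manual version:
delimiter //
create event reinit_big_table_count
on schedule every 1 day
do begin
  truncate table big_table_count;
  insert into big_table_count select count(*) from big_table;
end//
delimiter ;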

auto increment and uniqueness issue

I'm having a problem implementing auto-increment logic in my app. Say I insert a 'group', and in MySQL it gets the value 10 for its Id; the next one would be 11, 12 and so forth.
But once a record (assume it has Id 12) gets deleted, the next new item gets Id 12 again, so it may cause a conflict.
Is it possible to make the auto-increment never repeat the same int? I want every Id to be unique; once it's deleted it should never come back.
InnoDB really does have this "feature" (or bug) that the current auto_increment counter is NOT stored in the tablespace. As soon as you restart the MySQL server, the auto_increment value is recalculated from the highest value present in the table, thus conflicting with possibly deleted values.
The solution to this is really ugly. You could create a table with the highest value used per table, in the form
tablename maxvalue
tableA 375
tableB 12
and you could write a post-startup script, if you manage the MySQL server. After every delete of a row in such a table, you would check in an AFTER DELETE trigger whether that row held the max value. That is a bit easier with newer versions of MySQL, since table information is stored in INFORMATION_SCHEMA and not only calculated on every select (which means reading INFORMATION_SCHEMA does not fire heavy, blocking queries as often).
You only have to update maxvalue if the deleted row was that max value.
It is a bit easier to update the maxvalue on every insert of a row, if that does not slow down the system.
In some cases you have just one table with critical references, and that table has an index, so you can retrieve maxvalue from that table.
All in all this is a big problem with InnoDB, and writing a lot of triggers just for this single unsaved auto_increment number is really not nice.
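A minimal sketch of that bookkeeping; all names below (max_ids, groups, groups_track_max) are made up for illustration:
create table max_ids (tablename varchar(64) primary key, maxvalue bigint not null default 0);
insert into max_ids values ('groups', 0);
create trigger groups_track_max after insert on `groups`
for each row
update max_ids set maxvalue = greatest(maxvalue, new.id) where tablename = 'groups';
-- a post-startup script would then read maxvalue and run something like
-- ALTER TABLE `groups` AUTO_INCREMENT = <maxvalue + 1>;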
I think you have not set id as a primary key with auto-increment.

How to atomically move rows from one table to another?

I am collecting readings from several thousand sensors and storing them in a MySQL database. There are several hundred inserts per second. To improve the insert performance I am storing the values initially into a MEMORY buffer table. Once a minute I run a stored procedure which moves the inserted rows from the memory buffer to a permanent table.
Basically I would like to do the following in my stored procedure to move the rows from the temporary buffer:
INSERT INTO data SELECT * FROM data_buffer;
DELETE FROM data_buffer;
Unfortunately this is not usable because the data collection processes insert additional rows into "data_buffer" between the INSERT and the DELETE above. Those rows would then be deleted without ever being inserted into the "data" table.
How can I make the operation atomic, or make the DELETE statement delete only the rows that were SELECTed and INSERTed in the preceding statement?
I would prefer doing this in a standard way which works on different database engines if possible.
I would prefer not adding any additional "id" columns because of performance overhead and storage requirements.
I wish there was a SELECT_AND_DELETE or MOVE statement in standard SQL, or something similar...
I believe this will work, but it will block inserts until it is done:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
INSERT INTO data SELECT * FROM data_buffer FOR UPDATE;
DELETE FROM data_buffer;
COMMIT;
A possible way to avoid all those problems, and also stay fast, would be to use two buffer tables (let's call them data_buffer1 and data_buffer2): while the collection processes insert into data_buffer1, you can do the insert and delete on data_buffer2; then you switch, so collected data goes into data_buffer2 while the rows in data_buffer1 are inserted into data and deleted.
How about having a row id: get the max value before the insert, do the insert, and then delete the records <= that max(id)?
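A minimal sketch of that idea, assuming data_buffer has an auto-increment id column (which the question would rather avoid):
SET @max_id = (SELECT MAX(id) FROM data_buffer);
INSERT INTO data SELECT * FROM data_buffer WHERE id <= @max_id;
-- if data does not have the id column, list the columns explicitly instead of *
DELETE FROM data_buffer WHERE id <= @max_id;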
This is a similar solution to #ammoQ's answer. The difference is that instead of having the INSERTing process figure out which table to write to, you can transparently swap the tables in the scheduled procedure.
Use RENAME in the scheduled procedure to swap tables:
CREATE TABLE IF NOT EXISTS data_buffer_new LIKE data_buffer;
RENAME TABLE data_buffer TO data_buffer_old, data_buffer_new TO data_buffer;
INSERT INTO data SELECT * FROM data_buffer_old;
DROP TABLE data_buffer_old;
This works because the RENAME statement swaps the tables atomically, so the INSERTing processes will not fail with "table not found". This is MySQL-specific, though.
I assume the tables are identical, with the same columns and primary key(s)? If that is the case, you could nest a select inside a where clause... something like this:
DELETE FROM data_buffer
WHERE primarykey IN (SELECT primarykey FROM data)
This is a MySQL specific solution. You can use locking to prevent the INSERTing processes from adding new rows while you are moving rows.
The procedure which moves the rows should be as follows:
LOCK TABLES data WRITE, data_buffer WRITE;
INSERT INTO data SELECT * FROM data_buffer;
DELETE FROM data_buffer;
UNLOCK TABLES;
The code which INSERTs new rows in the buffer should be changed as follows:
LOCK TABLE data_buffer WRITE;
INSERT INTO data_buffer VALUES (1, 2, 3);
UNLOCK TABLE;
The INSERT process will obviously block while the lock is in place.

MySQL INSERT and SELECT Order of precedence

If an INSERT and a SELECT are done simultaneously on a MySQL table, which one will go first?
Example: Suppose the "users" table row count is 0.
Then these two queries are run at the same time (assume it's at the same milli/microsecond):
INSERT into users (id) values (1)
and
SELECT COUNT(*) from users
Will the last query return 0 or 1?
Depends whether your users table is MyISAM or InnoDB.
If it's MyISAM, one statement or the other takes a lock on the table, and there's little you can do to control that, short of locking tables yourself.
If it's InnoDB, it's transaction-based. The multi-versioning architecture allows concurrent access to the table, and the SELECT will see the count of rows as of the instant its transaction started. If there's an INSERT going on simultaneously, the SELECT will see 0 rows. In fact you could even see 0 rows by a SELECT executed some seconds later, if the transaction for the INSERT has not committed yet.
There's no way for the two transactions to start truly simultaneously. Transactions are guaranteed to have some order.
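A minimal two-session sketch of that behaviour under InnoDB's default REPEATABLE READ isolation (the session labels are just annotations):
-- session 1:
START TRANSACTION;
SELECT COUNT(*) FROM users;  -- snapshot starts here, returns 0
-- session 2:
INSERT INTO users (id) VALUES (1);  -- autocommits
-- session 1:
SELECT COUNT(*) FROM users;  -- still 0 within the same transaction
COMMIT;
SELECT COUNT(*) FROM users;  -- now 1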
It depends on which statement is executed first. If the INSERT executes first, the SELECT will return 1; if the SELECT executes first, it will return 0. Even if you run them on a machine with multiple physical cores, the locking mechanism means they will never execute at exactly the same timestamp.