How to atomically move rows from one table to another? - mysql

I am collecting readings from several thousand sensors and storing them in a MySQL database. There are several hundred inserts per second. To improve the insert performance I am storing the values initially into a MEMORY buffer table. Once a minute I run a stored procedure which moves the inserted rows from the memory buffer to a permanent table.
Basically I would like to do the following in my stored procedure to move the rows from the temporary buffer:
INSERT INTO data SELECT * FROM data_buffer;
DELETE FROM data_buffer;
Unfortunately the previous approach is not usable, because the data collection processes insert additional rows into "data_buffer" between the INSERT and DELETE above. Thus those rows would get deleted without ever being inserted into the "data" table.
How can I make the operation atomic, or make the DELETE statement delete only the rows that were SELECTed and INSERTed by the preceding statement?
I would prefer doing this in a standard way which works on different database engines if possible.
I would prefer not adding any additional "id" columns because of performance overhead and storage requirements.
I wish there was a SELECT_AND_DELETE or MOVE statement in standard SQL, or something similar...

I believe this will work, but it will block the inserting processes until it is done:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
INSERT INTO data SELECT * FROM data_buffer FOR UPDATE;
DELETE FROM data_buffer;
COMMIT;

A possible way to avoid all those problems, and to also stay fast, would be to use two data_buffer tables (let's call them data_buffer1 and data_buffer2): while the collection processes insert into data_buffer1, you can do the insert and delete on data_buffer2; then you switch, so collected data goes into data_buffer2 while rows are inserted+deleted from data_buffer1 into data (see the sketch below).
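A minimal sketch of one move cycle, assuming the collectors have already been switched over to data_buffer2 (how the switch is signalled to the collectors is application logic and not shown here):
-- drain the buffer the collectors are no longer writing to
INSERT INTO data SELECT * FROM data_buffer1;
DELETE FROM data_buffer1;
-- on the next cycle the roles swap: collectors write to data_buffer1,
-- and the same two statements run against data_buffer2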

How about adding a row id: get the max value before the insert, do the insert, and then delete the records with id <= that max (see the sketch below)?
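A minimal sketch of that idea, assuming an AUTO_INCREMENT column id is added to data_buffer (which the question would prefer to avoid) and that data has matching columns:
SET @max_id = (SELECT MAX(id) FROM data_buffer);
INSERT INTO data SELECT * FROM data_buffer WHERE id <= @max_id;
DELETE FROM data_buffer WHERE id <= @max_id;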

This is a similar solution to #ammoQ's answer. The difference is that instead of having the INSERTing process figure out which table to write to, you can transparently swap the tables in the scheduled procedure.
Use RENAME in the scheduled procedure to swap tables:
CREATE TABLE IF NOT EXISTS data_buffer_new LIKE data_buffer;
RENAME TABLE data_buffer TO data_buffer_old, data_buffer_new TO data_buffer;
INSERT INTO data SELECT * FROM data_buffer_old;
DROP TABLE data_buffer_old;
This works because the RENAME TABLE statement swaps the tables atomically, so the INSERTing processes will not fail with "table not found". This is MySQL-specific, though.

I assume the tables are identical, with the same columns and primary key(s)? If that is the case, you could nest a select inside a where clause... something like this:
DELETE FROM data_buffer
WHERE primarykey IN (SELECT primarykey FROM data)

This is a MySQL specific solution. You can use locking to prevent the INSERTing processes from adding new rows while you are moving rows.
The procedure which moves the rows should be as follows:
LOCK TABLES data WRITE, data_buffer WRITE; -- both tables must be locked: the session can only touch locked tables, and the DELETE needs a WRITE lock
INSERT INTO data SELECT * FROM data_buffer;
DELETE FROM data_buffer;
UNLOCK TABLES;
The code which INSERTs new rows in the buffer should be changed as follows:
LOCK TABLES data_buffer WRITE;
INSERT INTO data_buffer VALUES (1, 2, 3);
UNLOCK TABLES;
The INSERT process will obviously block while the lock is in place.

Related

How to insert to two tables with one query if one table is a junction table?

How can I INSERT values to two tables using only one query? I am using MySQL. One of the tables I want to insert to is a many-to-many relationship table. Please see my example below:
I recently added the many-to-many relationship tables. When I insert into news, I run the following statement:
INSERT INTO news (title, reporter_id)
VALUES ('Some Title', 15);
How can I have one query and be able to insert into two tables? Per the MySQL insert documentation, it seems like I can do a query like
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
The problem is, I don't know my news_id until I execute my first insert. Should I just use two insert statements, or is there a better way? Thanks for your help!
As mentioned by Uueerdo in the comments, you can use an AFTER INSERT trigger and access the generated ID with NEW.id (see the sketch at the end of this answer). However, if you want to keep that logic in your application layer, you can copy the data from the first table to the second after your bulk insert with:
start transaction;
insert into news(title, reporter_id) values
('title2', 2),
('title3', 3);
insert into junctions(news_id, reporter_id)
select id, reporter_id
from news
where id >= last_insert_id()
order by id asc
limit 2;
commit;
This works for InnoDB if the innodb_autoinc_lock_mode is set to 0 or 1, because the generated IDs are guaranteed to be consecutive.
With innodb_autoinc_lock_mode set to 0 (“traditional”) or 1 (“consecutive”), the auto-increment values generated by any given statement are consecutive, without gaps, because the table-level AUTO-INC lock is held until the end of the statement, and only one such statement can execute at a time.
AUTO_INCREMENT Handling in InnoDB
LAST_INSERT_ID() will return the generated ID of the first inserted row.
If you insert multiple rows using a single INSERT statement, LAST_INSERT_ID() returns the value generated for the first inserted row only.
Information Functions - LAST_INSERT_ID()
You know the first generated ID. You know the number of inserted rows. So you know which rows to copy.
Demo: http://rextester.com/UEN69961
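For completeness, a rough sketch of the trigger approach mentioned at the top of this answer (the trigger name is arbitrary; the junctions columns are taken from the query above):
create trigger news_after_insert after insert on news
for each row
insert into junctions (news_id, reporter_id)
values (NEW.id, NEW.reporter_id);
With this in place, a plain insert into news is enough; the junction row is created automatically.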

Fastest way to replace data in a table from a temporary table in MySQL

I have a need to "update" some table data I receive from an external source (every time I receive "all" the data, with some fields updated for some records).
There's no unique field or combination of fields, so I figured the best approach would be to wipe out all the data from the DB each time and write all the (now updated) data in again. There are up to 1000 records (there will never be more than that), with about 15 short fields each: text, numbers, datetime. And I'm writing to a remote DB (so it's slow).
Currently I'm doing:
delete from `table` where `date_dt` > ?
and then for each row
INSERT INTO `table` ( `field_0`,`field_1`,... ) VALUES (?,?,...)
It's not only slow, but it's possible that the end user may not see the complete data while I'm still inserting.
I figured I could do:
CREATE TEMPORARY TABLE `temp_table` ( ... ); -- same structure as in main table
INSERT INTO `temp_table` ( `field_0`,`field_1`,... ) VALUES (?,?,...) -- repeat 1000x
START TRANSACTION;
DELETE FROM `table`;
INSERT INTO `table` SELECT * FROM `temp_table`;
DROP TEMPORARY TABLE `temp_table`;
COMMIT;
Does this make any sense? What is a better way of solving this?
The speed of filling up the temp table with data is not crucial, but filling the main table with data is (so users don't see incomplete data, or the period of time they do is minimal).
mysqlimport --delete will empty the table first and then load your external data from a CSV file. It runs many times faster than doing INSERTs one row at a time.
See https://dev.mysql.com/doc/refman/5.7/en/mysqlimport.html
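For reference, mysqlimport is essentially a command-line interface to the LOAD DATA statement, so the SQL-level equivalent looks roughly like this (a sketch; the file path and the field/line terminators are assumptions you would adapt to your CSV):
DELETE FROM `table`;
LOAD DATA INFILE '/tmp/table.csv'
INTO TABLE `table`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';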
I did a presentation in April 2017 about performance of bulk data loads for MySQL:
https://www.slideshare.net/billkarwin/load-data-fast
P.S.: Don't use the temp table solution if you have a MySQL replication environment. This is a well-known way of breaking replication. If the slave restarts in between your creation of the temp table and the INSERT...SELECT that reads from the temp table, then the slave will find the temp table is gone, and this will result in an error and stop replication. This might seem unlikely, but it does happen eventually.

Performance of mysql counting rows in a big table

This fairly obvious question has very few (I couldn't find any) solid answers.
I do a simple select from a table of 2 million rows.
select count(id) as total from big_table
On any machine I try this query on, it usually takes at least 5 seconds to complete. This is unacceptable for realtime queries.
The reason I need an exact count of the rows is for precise statistical calculations later on.
Using the last auto-increment value is unfortunately not an option, because rows also get deleted periodically.
It can indeed be slow when running on an InnoDB engine. As stated in section 14.24 of the MySQL 5.7 Reference Manual, “InnoDB Restrictions and Limitations”, 3rd bullet point:
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
For information about how InnoDB processes SELECT COUNT(*) statements, refer to the COUNT() description in Section 12.20.1, “Aggregate Function Descriptions”.
The suggested solution is a counter table: a separate table with a single row and column holding the current record count, kept up to date via triggers. Something like this:
create table big_table_count (rec_count int default 0);
-- one-shot initialisation:
insert into big_table_count select count(*) from big_table;
create trigger big_insert after insert on big_table
for each row
update big_table_count set rec_count = rec_count + 1;
create trigger big_delete after delete on big_table
for each row
update big_table_count set rec_count = rec_count - 1;
You can see a fiddle here, where you can alter the insert/delete statements in the build section to see the effect on:
select rec_count from big_table_count;
You could extend this to several tables, either by creating such a counter table for each, or by reserving a row per table in a single counter table, keyed by a "table_name" column (a sketch of the latter follows below).
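A rough sketch of the shared counter table variant (table and column names are illustrative, and these triggers would replace the ones shown above):
create table table_counts (
  table_name varchar(64) primary key,
  rec_count  int not null default 0
);
insert into table_counts select 'big_table', count(*) from big_table;
create trigger big_insert after insert on big_table
for each row
update table_counts set rec_count = rec_count + 1 where table_name = 'big_table';
create trigger big_delete after delete on big_table
for each row
update table_counts set rec_count = rec_count - 1 where table_name = 'big_table';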
Improving concurrency
The above method does have an impact if you have many concurrent sessions inserting or deleting records, because they need to wait for each other to complete the update of the counter.
A solution is to not let the triggers update the same, single record, but to let them insert a new record, like this:
create trigger big_insert after insert on big_table
for each row
insert into big_table_count (rec_count) values (1);
create trigger big_delete after delete on big_table
for each row
insert into big_table_count (rec_count) values (-1);
The way to get the count then becomes:
select sum(rec_count) from big_table_count;
Then, once in a while (e.g. daily) you should re-initialise the counter table to keep it small:
truncate table big_table_count;
insert into big_table_count select count(*) from big_table;

mysql query bulk insert caused sequence gap

I have a query that selects rows from one table and inserts them into another table with an auto-increment column as the key.
insert into a select * from B;
I found that, after executing the query, there is a sequence gap in the auto-increment column in table B.
Can this happen?
Or did my INSERT ... SELECT fail to insert some records?
Please help.
I cannot reproduce the case, as this is a production server and I don't have a backup from before executing the query.
Many thanks.
If you are using InnoDB with INSERT IGNORE, you may want to read this article that talks about this issue:
Avoiding auto-increment holes on InnoDB with INSERT IGNORE
Why do we have gaps?
InnoDB checks an auto_increment counter on the table and, if a new value is needed, increments that counter and assigns the new value to the column. Prior to MySQL 5.1.22, InnoDB used a method to access that counter value called “Traditional”. This one uses a special table lock called AUTO-INC that remains until the end of the query or transaction. Because of this, two queries can’t have the AUTO-INC lock at the same time, so we lose concurrency and performance. The problems are even worse with long running queries like INSERT INTO table1 … SELECT … FROM table2.
When using a simple INSERT ... SELECT, there is a known issue with this.
In conclusion, I wouldn't say this is a problem with your query; it is more related to the way the engine deals with INSERT ... SELECT blocks.
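If you want to check which allocation mode your server uses and where the auto-increment counter currently stands, something like the following should help (a sketch; the table name a is taken from the question):
SELECT @@innodb_autoinc_lock_mode;  -- 0 = traditional, 1 = consecutive, 2 = interleaved
SHOW TABLE STATUS LIKE 'a';         -- the Auto_increment column shows the next value to be assigned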

Atomically copying one MySQL table over another?

I am trying to copy one table over another one "atomically". Basically I want to update a table periodically, such that a process that reads from the table will not get an incomplete result if another process is updating the table.
To give some background info, I want a table that acts as a leaderboard for a game. This leaderboard will update every few minutes via a separate process. My thinking is as follows:
Table SCORES contains the publicly-viewable leaderboard that will be read from when a user views the leaderboard. This table is updated every few minutes. The process that updates the leaderboard will create a SCORES_TEMP table that contains the new leaderboard. Once that table is created, I want to copy all of its contents over to SCORES "atomically". I think what I want to do is something like:
TRUNCATE TABLE SCORES;
INSERT INTO SCORES SELECT * FROM SCORES_TEMP;
I want to replace everything in SCORES. I don't need to maintain my primary keys or auto increment values. I just want to bring in all the data from SCORES_TEMP. But I know that if someone views the scores before these 2 statements are done, the leaderboard will be blank. How can I do this atomically, such that it will never show blank or incomplete data? Thanks!
Use RENAME TABLE:
RENAME TABLE old_table TO backup_table, new_table TO old_table;
It's atomic, works on all storage engines, and doesn't have to rebuild the indexes.
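Applied to the tables in the question, the swap would look something like this (a sketch; SCORES_OLD is just an arbitrary name for the retired copy):
RENAME TABLE SCORES TO SCORES_OLD, SCORES_TEMP TO SCORES;
DROP TABLE SCORES_OLD;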
In MySQL, because of the behavior of TRUNCATE I think you'll need to:
START TRANSACTION;
DELETE FROM SCORES;
INSERT INTO SCORES SELECT * FROM SCORES_TEMP;
COMMIT;
I'm not sure there's a way to make TRUNCATE, which is effectively a DDL operation, transaction-safe.
You may use transactions (for InnoDB),
START TRANSACTION;
DELETE FROM SCORES;
INSERT INTO SCORES SELECT * FROM SCORES_TEMP;
COMMIT;
or LOCK TABLES (for MyISAM):
LOCK TABLES SCORES WRITE, SCORES_TEMP READ;
DELETE FROM SCORES;
INSERT INTO SCORES SELECT * FROM SCORES_TEMP;
UNLOCK TABLES;
I don't know how MySQL deals with transactions, but in T-SQL you could write:
BEGIN TRAN
DELETE FROM SCORES
INSERT INTO SCORES SELECT * FROM SCORES_TEMP
COMMIT TRAN
This way your operation would be "atomic", but not instantaneous.