Insertion into database - MySQL

I have to write a lot of data into a MySQL database, about 5 times per second.
Which is faster: inserting every 1/5 of a second, or queuing the data and inserting everything stored every ~5 seconds? If the second way is better, is it possible to insert several rows into one table with a single request?

Considering the frequency of the insertions,
it's better to go with the second approach: queuing the data and then inserting it in one go.
But you should consider these scenarios first:
Is your system real-time? If yes, what is the maximum delay you can afford (it will take ~5 seconds until the next insertion, so data will not be persisted/available until then)?
What are the chances of incorrect values/errors in the data? If one row is invalid and the query fails, you will lose all the others in the batch.

Use multiple buffer pools via innodb_buffer_pool_instances; a sensible value can depend on the number of cores on the machine.
Use partitioning of the table.
You can also insert data collectively using XML.

As each transaction comes with a fixed cost, I'd say that doing a multi-line insert every few seconds is better. With some of the systems we use at work we cache hundreds of lines before inserting them all in one go.
From the MySQL documentation you can do a multi-line insert like so:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
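A minimal sketch of the queue-and-flush approach in Python (the table/column names are taken from the example above; the `cursor_execute` callback is a hypothetical stand-in for a real DB-API cursor call):

```python
# Sketch of the buffer-and-flush pattern: rows are queued as they arrive
# and written out as one multi-row INSERT every few seconds.

def build_multirow_insert(table, columns, rows):
    """Build a single multi-row INSERT with %s placeholders,
    suitable for cursor.execute(sql, flat_params)."""
    row_tpl = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row_tpl] * len(rows)))
    params = [v for row in rows for v in row]
    return sql, params

buffer = []

def enqueue(row):
    buffer.append(row)

def flush(cursor_execute):
    """Send everything queued so far as one statement, then clear the queue."""
    if not buffer:
        return
    sql, params = build_multirow_insert("tbl_name", ("a", "b", "c"), buffer)
    cursor_execute(sql, params)
    buffer.clear()
```

In a real system `flush` would run on a timer (every ~5 seconds) while `enqueue` is called by the producers; using placeholders rather than string concatenation avoids SQL injection.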

My experience is that when inserting data into a MySQL database it is faster to work with batches.
So the slower option is executing multiple insert queries:
INSERT INTO my_table VALUES (1, "a");
INSERT INTO my_table VALUES (2, "b");
The faster option would be:
INSERT INTO my_table VALUES (1, "a"), (2, "b");

You can make a single insert with all the data using something like this:
INSERT INTO table (field1, field2, ... , fieldN)
VALUES
(value1_1, value1_2, ... , value1_N),
(value2_1, value2_2, ... , value2_N),
...
(valueM_1, valueM_2, ... , valueM_N);

Related

What is the best way to save many rows to the DB at the same time?

I have some words like ["happy","bad","terrible","awesome","happy","happy","horrible",.....,"love"].
These words are large in number, maybe exceeding 100~200.
I want to save them all to the DB at the same time.
I think opening a DB connection for every word is wasteful.
What is the best way to save them?
table structure
wordId userId word
You are right that executing repeated INSERT statements to insert rows one at a time, i.e. processing RBAR (row by agonizing row), can be expensive, and excruciatingly slow, in MySQL.
Assuming that you are inserting the string values ("words") into a column in a table, and each word will be inserted as a new row in the table... (and that's a whole lot of assumptions there...)
For example, a table like this:
CREATE TABLE mytable (mycol VARCHAR(50) NOT NULL PRIMARY KEY) ENGINE=InnoDB
You are right that running a separate INSERT statement for each row is expensive. MySQL provides an extension to the INSERT statement syntax which allows multiple rows to be inserted.
For example, this sequence:
INSERT IGNORE INTO mytable (mycol) VALUES ('happy');
INSERT IGNORE INTO mytable (mycol) VALUES ('bad');
INSERT IGNORE INTO mytable (mycol) VALUES ('terrible');
can be emulated with a single INSERT statement:
INSERT IGNORE INTO mytable (mycol) VALUES ('happy'),('bad'),('terrible');
Each "row" to be inserted is enclosed in parens, just as it is in the regular INSERT statement. The trick is the comma separator between the rows.
The trouble with this comes in when there are constraint violations; either the whole statement succeeds or fails. Unlike the individual inserts, where one of them can fail and the other two succeed.
Also, be careful that the size (in bytes) of the statement does not exceed the max_allowed_packet variable setting.
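As a sketch of staying under that limit, rows can be split into chunks so each rendered statement fits a byte budget (the budget and pre-rendered value strings here are illustrative; in practice you would compare against the server's max_allowed_packet):

```python
# Sketch: split pre-rendered "('word')" value strings into chunks so each
# multi-row INSERT statement stays under a byte budget.

def chunk_rows(rows, header, max_bytes):
    """Yield lists of rows whose rendered statement fits in max_bytes.
    Each row is a pre-rendered values tuple string like "('happy')"."""
    chunk, size = [], len(header)
    for r in rows:
        rendered = len(r) + 1  # +1 for the comma separator
        if chunk and size + rendered > max_bytes:
            yield chunk
            chunk, size = [], len(header)
        chunk.append(r)
        size += rendered
    if chunk:
        yield chunk
```

Each yielded chunk would then be joined with commas after the header and sent as one statement.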
Alternatively, a LOAD DATA statement is an even faster way to load rows into a table. But for a couple of hundred rows, it's not really going to be much faster. (If you were loading thousands and thousands of rows, the LOAD DATA statement could potentially be much faster.)
It would be helpful to know how you are generating that list of words, but you could do:
insert into table (column) values (word1), (word2);
Without more info, that is about as much as we can help.
You could add a loop in whatever language is needed to iterate over the list to add them.

Optimization of INSERT Query in SQL

I am comparing the effectiveness of two ways of INSERTing data in SQL, to find out which executes faster.
The first one is
INSERT INTO mytb(id, name)
select 0, 'uyen' union all
select 1, 'uyen' union all
....
select 1000, 'uyen'
and the second one is:
INSERT INTO mytb(id, name) values (0,'uyen');
INSERT INTO mytb(id, name) values (1,'uyen');
....
INSERT INTO mytb(id, name) values (1000,'uyen');
I have tried executing both with 1000 rows. Sometimes one runs faster and sometimes the other does (I think this is because the system's available resources differ on every execution, or some other issue). I searched around the internet without success. Could you please help me with the answer and explain the reasons? I want to know this in order to optimize my queries.
Thanks

Insert query vs select query performance mysql

I have executed an insert query as follows -
Insert into tablename
select
query1 union query2
Now if I execute just the select part of this insert query, it takes around 2-3 minutes. However, the entire insert script takes more than 8 minutes. As far as I know, the insert and its corresponding select should take almost the same time to execute.
So is their any other factor that could impact the execution time of the insert?
It's not correct that an insert and its corresponding select take the same time; they should not!
The select query just reads data and transmits it; if you are trying the query in an application (like phpMyAdmin), it is very likely to limit the result set in order to paginate it, so the select is faster (as it doesn't fetch all the data).
The insert query must read that data, insert it into the table, update the primary key tree, update every other index on that table, update every view using that table, fire any trigger on that table/columns, etc. So the insert performs a LOT more actions than the select.
So it IS normal that the insert is slower than the select; how much slower depends on your tables and DB structure.
You can optimize the insert with some DB-specific options; for example you could read here for MySQL, or if you are on DB2 you could create a temp file and then CPYF it into the real one, and so on...

How to atomically move rows from one table to another?

I am collecting readings from several thousand sensors and storing them in a MySQL database. There are several hundred inserts per second. To improve the insert performance I am storing the values initially into a MEMORY buffer table. Once a minute I run a stored procedure which moves the inserted rows from the memory buffer to a permanent table.
Basically I would like to do the following in my stored procedure to move the rows from the temporary buffer:
INSERT INTO data SELECT * FROM data_buffer;
DELETE FROM data_buffer;
Unfortunately the previous is not usable because the data collection processes insert additional rows in "data_buffer" between INSERT and DELETE above. Thus those rows will get deleted without getting inserted to the "data" table.
How can I make the operation atomic or make the DELETE statement to delete only the rows which were SELECTed and INSERTed in the preceding statement?
I would prefer doing this in a standard way which works on different database engines if possible.
I would prefer not adding any additional "id" columns because of performance overhead and storage requirements.
I wish there was SELECT_AND_DELETE or MOVE statement in standard SQL or something similar...
I believe this will work, but it will block inserts until the move is done:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
INSERT INTO data SELECT * FROM data_buffer FOR UPDATE;
DELETE FROM data_buffer;
COMMIT;
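The insert-then-delete pair inside one transaction can be sketched like this (using sqlite3 purely as a self-contained stand-in; this only demonstrates the atomic commit/rollback of the two statements, not MySQL's locking, which is what the SERIALIZABLE/FOR UPDATE parts above address):

```python
import sqlite3

# Illustration of moving rows with INSERT ... SELECT + DELETE inside one
# transaction: both statements commit or roll back together.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (v INTEGER)")
conn.execute("CREATE TABLE data_buffer (v INTEGER)")
conn.executemany("INSERT INTO data_buffer VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

with conn:  # the context manager commits on success, rolls back on error
    conn.execute("INSERT INTO data SELECT * FROM data_buffer")
    conn.execute("DELETE FROM data_buffer")
```

Against a concurrent MySQL server the transaction alone is not enough: without the serializable isolation or explicit locks, a collector could still insert rows between the two statements and lose them to the DELETE.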
A possible way to avoid all those problems, and to also stay fast, would be to use two buffer tables (let's call them data_buffer1 and data_buffer2): while the collection processes insert into data_buffer1, you can do the insert and delete on data_buffer2; then you switch, so collected data goes into data_buffer2 while data is inserted+deleted from data_buffer1 into data.
How about having a row id: get the max value before the insert, do the insert, and then delete the records with id <= that max?
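A sketch of that max-id idea (sqlite3 as a stand-in; the `id` column is the hypothetical row id this answer proposes). Rows inserted after the MAX(id) snapshot survive the move:

```python
import sqlite3

# Sketch of the max-id approach: snapshot MAX(id), then move only rows
# at or below that snapshot, leaving later arrivals untouched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, v TEXT)")
conn.execute("CREATE TABLE data_buffer (id INTEGER, v TEXT)")
conn.executemany("INSERT INTO data_buffer VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

max_id = conn.execute("SELECT MAX(id) FROM data_buffer").fetchone()[0]

# Simulate a collector inserting a new row after the snapshot;
# it must survive the move.
conn.execute("INSERT INTO data_buffer VALUES (4, 'd')")

conn.execute("INSERT INTO data SELECT * FROM data_buffer WHERE id <= ?",
             (max_id,))
conn.execute("DELETE FROM data_buffer WHERE id <= ?", (max_id,))
conn.commit()
```

This trades the extra id column (the storage overhead the question wanted to avoid) for not having to lock out the collectors during the move.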
This is a similar solution to #ammoQ's answer. The difference is that instead of having the INSERTing process figure out which table to write to, you can transparently swap the tables in the scheduled procedure.
Use RENAME in the scheduled procedure to swap tables:
CREATE TABLE IF NOT EXISTS data_buffer_new LIKE data_buffer;
RENAME TABLE data_buffer TO data_buffer_old, data_buffer_new TO data_buffer;
INSERT INTO data SELECT * FROM data_buffer_old;
DROP TABLE data_buffer_old;
This works because the RENAME statement swaps the tables atomically, so the INSERTing processes will not fail with "table not found". This is MySQL-specific though.
I assume the tables are identical, with the same columns and primary key(s)? If that is the case, you could nest a select inside a where clause... something like this:
DELETE FROM data_buffer
WHERE primarykey IN (SELECT primarykey FROM data)
This is a MySQL specific solution. You can use locking to prevent the INSERTing processes from adding new rows while you are moving rows.
The procedure which moves the rows should be as follows:
LOCK TABLES data_buffer WRITE;
INSERT INTO data SELECT * FROM data_buffer;
DELETE FROM data_buffer;
UNLOCK TABLES;
The code which INSERTs new rows in the buffer should be changed as follows:
LOCK TABLES data_buffer WRITE;
INSERT INTO data_buffer VALUES (1, 2, 3);
UNLOCK TABLES;
The INSERT process will obviously block while the lock is in place.

Which option is most efficient to insert data to mysql db?

I have a MySQL DB with several tables; let's call them Table1, Table2, etc. I have to make several calls to each of these tables.
Which is most efficient,
a) Collecting all queries for each table in one message, then executing them separately, e.g.:
INSERT INTO TABLE1 VALUES (A,B);
INSERT INTO TABLE1 VALUES (A,B);
...execute
INSERT INTO TABLE2 VALUES (A,B);
INSERT INTO TABLE2 VALUES (A,B);
...execute
b) Collecting ALL queries in one long message(not in order of table), then executing this query, e.g:
INSERT INTO TABLE1 VALUES (A,B);
INSERT INTO TABLE2 VALUES (B,C);
INSERT INTO TABLE1 VALUES (B,A);
INSERT INTO TABLE3 VALUES (D,B);
c) Something else?
Currently I am doing it like option (b), but I am wondering if there is a better way.
(I am using jdbc to access the db, in a groovy script).
Thanks!
Third option - using prepared statements.
Since you didn't post your code, this is a bit of a wild guess, but this blog post shows great performance improvements using the groovy Sql.withBatch method.
The code they show (which uses sqlite) is reproduced here for posterity:
Sql sql = Sql.newInstance("jdbc:sqlite:/home/ron/Desktop/test.db", "org.sqlite.JDBC")
sql.execute("create table dummyTable(number)")
sql.withBatch {stmt->
100.times {
stmt.addBatch("insert into dummyTable(number) values(${it})")
}
stmt.executeBatch()
}
which inserts the numbers 0 to 99 into the table dummyTable.
This will obviously need tweaking to work with your (unposted) code.
Rather than looking at which is more efficient, first consider whether the tables are large and whether you need concurrency.
If they are large (millions of records), then you may want to separate the statements and leave some time between them, so you do not lock a table for too long at a stretch.
If your tables aren't that large, or concurrency is not a problem, then by all means do whichever. You should check the slow query log for the statements and see which is faster.