I am comparing the effectiveness of two ways of inserting data in SQL, to find out which executes faster.
The first one is
INSERT INTO mytb(id, name)
select 0, 'uyen' union all
select 1, 'uyen' union all
....
select 1000, 'uyen'
and the second one is:
INSERT INTO mytb(id, name) values (0,'uyen');
INSERT INTO mytb(id, name) values (1,'uyen');
....
INSERT INTO mytb(id, name) values (1000,'uyen');
I have tried executing both with 1,000 rows. Sometimes one runs faster and sometimes the other (I suspect the system's available resources differ between executions, or something similar). I searched around the internet without success. Could you please give me the answer and explain the reasons? I want to know this so I can optimize my queries.
Thanks
Related
I used to run this command to insert some rows into a counter table:
insert into `monthly_aggregated_table`
select year(r.created_at), month(r.created_at), count(r.id) from
raw_items r
group by 1,2;
This query is very heavy and takes some time to run (millions of rows), and since the raw_items table is MyISAM it caused table locking: writes to it had to wait for the insert to finish.
Now I created a slave server to do the SELECT.
What I would like to do is to execute the SELECT in the slave, but get the results and insert into the master database. Is it possible? How? What is the most efficient way to do this? (The insert used to have 1.3 million rows)
I am running MariaDB 10.0.17
You will have to split the action into two parts, with a programming language such as Java or PHP in between.
First run the SELECT, load the result set into your application, and then insert the data, as sketched below.
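A rough sketch of the two halves (illustrative only; the fetch logic lives in your application, and the values in the INSERT are placeholders):

-- 1) on the slave: the heavy, read-only aggregation
SELECT YEAR(r.created_at), MONTH(r.created_at), COUNT(r.id)
FROM raw_items r
GROUP BY 1, 2;

-- 2) on the master: a multi-row insert built from the fetched result set
INSERT INTO monthly_aggregated_table VALUES
(2015, 1, 123456),
(2015, 2, 234567);

With 1.3 million result rows, chunk the inserts (e.g., a few thousand rows per statement) rather than building one giant statement.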
Another optimization you could do to speed up the SELECT is to add one new column to your table, ym_created_at, containing a concatenation of year(created_at) and month(created_at). Put an index on that column (see the sketch after the query) and then run the updated statement:
insert into `monthly_aggregated_table`
select ym_created_at, count(r.id) from
raw_items r
group by 1;
Easier, and it might be a lot quicker, since no functions act on the column you are grouping by.
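A minimal sketch of setting up that column, assuming created_at is a DATETIME and using DATE_FORMAT as the concatenation (the column and index names are illustrative):

ALTER TABLE raw_items ADD COLUMN ym_created_at CHAR(7);

-- backfill existing rows, e.g. '2015-03'
UPDATE raw_items SET ym_created_at = DATE_FORMAT(created_at, '%Y-%m');

CREATE INDEX idx_ym_created_at ON raw_items (ym_created_at);

New rows will need the column kept in sync, e.g. by the application or a trigger.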
I have executed an insert query as follows:
Insert into tablename
select
query1 union query2
Now, if I execute just the SELECT part of this insert query, it takes around 2-3 minutes. However, the entire INSERT script takes more than 8 minutes. As far as I know, the INSERT and its corresponding SELECT should take almost the same time to execute.
So is there any other factor that could impact the execution time of the INSERT?
It's not correct that an INSERT and its corresponding SELECT take the same time; they should not!
The SELECT query just "reads" data and transmits it; if you are trying the query in an application (like phpMyAdmin), it very likely limits the query result in order to paginate it, so the SELECT seems faster (as it doesn't fetch all the data).
The INSERT query must read that data, insert it into the table, update the primary key tree, update every other index on that table, update every view using that table, fire any triggers on that table/columns, etc., so the INSERT performs a LOT more work than the SELECT.
So it IS normal that the INSERT is slower than the SELECT; how much slower depends on your tables and DB structure.
You could optimize the INSERT with some DB-specific options; for example, see the MySQL documentation on insert speed. If you are on DB2 you could create a temp file and then CPYF that into the real one, and so on...
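For MySQL, a minimal sketch of such options, assuming an InnoDB target table (these session settings come from the standard InnoDB bulk-loading advice; remember to re-enable the checks afterwards):

SET autocommit = 0;          -- group the whole load into one transaction
SET unique_checks = 0;       -- defer unique-key checks during the load
SET foreign_key_checks = 0;  -- defer foreign-key checks during the load

INSERT INTO tablename
SELECT ...;                  -- the bulk insert from the question

COMMIT;
SET unique_checks = 1;
SET foreign_key_checks = 1;
SET autocommit = 1;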
I have to write a lot of data into a MySQL database, about 5 times per second.
What is the fastest way: insert every 1/5 of a second, or build a queue and insert all the stored data every ~5 seconds? If the second way is better, is it possible to insert several rows into one table using one request?
Considering the frequency of the insertions, it's better to go with the second approach, that is, queuing and then inserting in one go!
But you should consider these scenarios first:
Is your system real-time? If yes, what is the maximum delay you can afford (as it will take ~5 seconds for the data to be persisted/available)?
What are the chances of incorrect values/errors in the data? If one row is incorrect and the query fails, you will lose all the rest.
Use multiple buffer pools with innodb_buffer_pool_instances; a good value can depend on the number of cores on the machine.
Use partitioning of the table.
You can collectively insert data using XML (see the sketch after this list).
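For the XML option, a minimal sketch using MySQL's LOAD XML statement (available since MySQL 5.5; the file path and table are hypothetical, and the <row> elements must match the table's columns):

LOAD XML LOCAL INFILE '/tmp/rows.xml'
INTO TABLE my_table
ROWS IDENTIFIED BY '<row>';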
As each transaction comes with a fixed cost, I'd say that doing a multi-line insert every few seconds is better. With some of the systems we use at work we cache hundreds of lines before inserting them all in one go.
From the MySQL documentation you can do a multi-line insert like so:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
My experience is that when inserting data into a MySQL database it is faster to work with batches.
So the slower option is executing multiple insert queries:
INSERT INTO my_table VALUES (1, "a");
INSERT INTO my_table VALUES (2, "b");
The faster option would be:
INSERT INTO my_table VALUES (1, "a"), (2, "b");
You can make an insert with all the data using something like this:
INSERT INTO table (field1, field2, ... , fieldN)
VALUES
(value1_1, value1_2, ... , value1_N),
(value2_1, value2_2, ... , value2_N),
...
(valueM_1, valueM_2, ... , valueM_N);
I am running an SQL query on a MySQL platform.
Table A has a single column (its primary key) and 25K rows.
Table B has several columns and 75K rows.
It takes 20 minutes to execute the following query. I would be glad if you could help.
INSERT INTO sometable
SELECT A.PrimaryKeyColumn as keyword, 'SomeText', B.*
FROM A, B
WHERE B.PrimaryKeyColumn = CONCAT(A.PrimaryKeyColumn, B.NotUniqueButIndexedColumn);
Run the SELECT without the INSERT to see if the problem is with the SELECT or not.
If it is with the SELECT, follow the MySQL documentation explaining how to optimize queries using EXPLAIN.
If the SELECT runs fine but the INSERT takes forever, make sure you don't have a lot of unnecessary indexes on sometable. Beyond that, you may need to do some MySQL tuning and/or OS tuning (e.g., memory or disk performance) to get a measurable performance boost with the INSERT.
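A sketch of both checks, reusing the query from the question (EXPLAIN output and index listings vary by MySQL version):

-- diagnose the SELECT half: look for missing index usage and huge row estimates
EXPLAIN SELECT A.PrimaryKeyColumn AS keyword, 'SomeText', B.*
FROM A, B
WHERE B.PrimaryKeyColumn = CONCAT(A.PrimaryKeyColumn, B.NotUniqueButIndexedColumn);

-- list the indexes the INSERT has to maintain on the target table
SHOW INDEX FROM sometable;

Note that because the right-hand side of the comparison itself depends on a column of B, MySQL cannot turn it into an index lookup, so it likely evaluates every pairing of the two tables.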
If I get it right, the join has to consider roughly 1.875 billion row combinations (25K × 75K), of which only those matching the WHERE clause are inserted.
For that, 20 minutes doesn't sound too bad...
I have a MySQL DB with several tables; let's call them Table1, Table2, etc. I have to make several calls to each of these tables.
Which is most efficient:
a) Collecting all queries for each table in one message, then executing them separately, e.g.:
INSERT INTO TABLE1 VALUES (A,B);
INSERT INTO TABLE1 VALUES (A,B);
...execute
INSERT INTO TABLE2 VALUES (A,B);
INSERT INTO TABLE2 VALUES (A,B);
...execute
b) Collecting ALL queries in one long message (not in order of table), then executing this query, e.g.:
INSERT INTO TABLE1 VALUES (A,B);
INSERT INTO TABLE2 VALUES (B,C);
INSERT INTO TABLE1 VALUES (B,A);
INSERT INTO TABLE3 VALUES (D,B);
c) Something else?
Currently I am doing it like option (b), but I am wondering if there is a better way.
(I am using JDBC to access the DB, in a Groovy script.)
Thanks!
Third option: using prepared statements.
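A minimal sketch at the SQL level, reusing TABLE1 from the question (through JDBC/Groovy you would instead create one PreparedStatement and add the parameter sets to a batch):

PREPARE ins FROM 'INSERT INTO TABLE1 VALUES (?, ?)';  -- parsed once
SET @a = 'A', @b = 'B';
EXECUTE ins USING @a, @b;                             -- executed many times
SET @a = 'B', @b = 'A';
EXECUTE ins USING @a, @b;
DEALLOCATE PREPARE ins;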
Since you haven't posted your code, this is a bit of a wild guess, but this blog post shows great performance improvements using the Groovy Sql.withBatch method.
The code they show (which uses sqlite) is reproduced here for posterity:
import groovy.sql.Sql

// open a connection and create a throwaway table
Sql sql = Sql.newInstance("jdbc:sqlite:/home/ron/Desktop/test.db", "org.sqlite.JDBC")
sql.execute("create table dummyTable(number)")

// queue the inserts and send them to the database as one batch
sql.withBatch { stmt ->
    100.times {
        stmt.addBatch("insert into dummyTable(number) values(${it})")
    }
    stmt.executeBatch()
}
which inserts the numbers 0 to 99 into the table dummyTable.
This will obviously need tweaking to work with your unknown code.
Rather than looking at which is more efficient, first consider whether the tables are large and whether you need concurrency.
If they are (millions of records), then you may want to separate the inserts on a statement-by-statement basis and give some time between statements, so you will not lock each table for too long at a time.
If your tables aren't that large or concurrency is not a problem, then by all means do whichever. You should look at the slow logs of the statements and see which is faster; see the sketch below.
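A minimal sketch of using the slow log for that comparison (assumes you can set global variables; the log destination depends on your configuration):

SET GLOBAL slow_query_log = 1;   -- turn the slow query log on
SET GLOBAL long_query_time = 0;  -- temporarily log every statement
-- run both batching variants, then compare their timings in the log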