Insert query vs select query performance - MySQL

I have executed an insert query as follows -
Insert into tablename
select
query1 union query2
Now if I execute the select part of this insert query on its own, it takes around 2-3 minutes. However, the entire insert script takes more than 8 minutes. To my knowledge, the insert and its corresponding select query should take almost the same time to execute.
So is there any other factor that could impact the execution time of the insert?

It's not true that an insert and its corresponding select take the same time; they shouldn't!
The select query just reads data and transmits it; if you are running the query in an application (like phpMyAdmin), it very likely limits the result set for pagination, so the select appears faster (as it doesn't fetch all the data).
The insert query must read that data, insert it into the table, update the primary key tree, update every other index on that table, fire any triggers on that table/columns, etc., so the insert performs far more work than the select.
So it's normal that the insert is slower than the select; how much slower depends on your tables and database structure.
You can optimize the insert with some database-specific options; the MySQL documentation covers INSERT speed optimizations, and on DB2 you could create a temporary file and then CPYF it into the real one, and so on...
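For MySQL specifically, one common approach is to relax per-row checks for the duration of the bulk INSERT ... SELECT and restore them afterwards. A minimal sketch, assuming the incoming data is already known to be valid (tablename and the two selects are placeholders from the question):

```sql
-- Sketch: relax checks around a bulk INSERT ... SELECT (MySQL).
-- Only safe when the incoming data is known to be valid.
SET SESSION unique_checks = 0;       -- defer uniqueness checks
SET SESSION foreign_key_checks = 0;  -- skip FK validation during the load

INSERT INTO tablename
SELECT ... FROM ...      -- query1 (placeholder)
UNION ALL
SELECT ... FROM ...;     -- query2 (placeholder)

SET SESSION unique_checks = 1;       -- restore the defaults
SET SESSION foreign_key_checks = 1;
```

Note also that plain UNION deduplicates the combined result, which costs an extra sort; if the two selects cannot produce overlapping rows, UNION ALL avoids that work entirely.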


Mysql Locking 3 consecutive interdependent queries

I don't have real code, sorry, only a problem description.
I would like to understand the best way to solve this problem.
I have 3 queries:
The first is a long transaction which performs an SQL INSERT into a table.
The second query COUNTs the number of rows in that table after the INSERT has taken place.
The third query UPDATEs one field of the previously inserted record with the count retrieved by the second query.
So far so good. My 3 queries are executed correctly.
Now suppose these 3 queries are executed inside an API call. If multiple API calls execute too quickly and simultaneously, the second COUNT query retrieves a wrong value, and consequently the third UPDATE also writes a wrong value.
I also get deadlocks on the INSERT query because, while one call is performing the INSERT, the SELECT COUNT of a second API call tries to read at the same time.
My question is what would be the best approach to solve this kind of problem.
I don't need code. I just would like to understand the best way to go.
Would I need to lock all the tables, for example?
It is unclear what you are doing, but this might be faster:
CREATE TEMPORARY TABLE t ...; -- all columns except count
INSERT INTO t ...; -- the incoming data
SELECT COUNT(*) INTO #ct FROM t;
INSERT INTO real_table
(...) -- including the count-column last
SELECT ..., #ct FROM t; -- Note how count is tacked on last
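Another option, sketched below under the assumption that the count column should reflect the table's row count at the moment of each insert (real_table, id, row_count, col1 are placeholder names): run the INSERT and the UPDATE inside one transaction, and use a derived table to work around MySQL's restriction on updating a table while selecting from it in the same statement (error 1093):

```sql
START TRANSACTION;

INSERT INTO real_table (col1) VALUES ('...');
SET @new_id = LAST_INSERT_ID();      -- id of the row just inserted

-- The derived table x is materialized first, which is why MySQL
-- allows reading and updating real_table in the same statement here.
UPDATE real_table r
JOIN (SELECT COUNT(*) AS ct FROM real_table) x
SET r.row_count = x.ct
WHERE r.id = @new_id;

COMMIT;
```

Under the default REPEATABLE READ isolation, simultaneous calls can still read stale counts; if the counts must be exact across concurrent API calls, the calls have to be serialized (e.g. SERIALIZABLE isolation or an explicit lock).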

With indexes in place, why is a SELECT query much faster than an UPDATE or INSERT query in SQL?

Put simply, a select reads data that is already written. An update or insert has to write data to the pages and also maintain the indexes, so it must traverse every index on the affected tables.
Additionally (credit to obe), select queries can take advantage of the cache if the data was touched by a prior query; they do not need to go back to the original data pages/indexes to re-read it.
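Two quick ways to see this asymmetry on your own schema (orders and customer_id are hypothetical names used for illustration):

```sql
-- Every index listed here is extra work for each INSERT/UPDATE:
SHOW INDEX FROM orders;

-- A SELECT covered by a single index reads only that index;
-- EXPLAIN reports "Using index" in the Extra column when it does:
EXPLAIN SELECT customer_id FROM orders WHERE customer_id = 42;
```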

automatically updated and cached views

Assume I want to find ids that appear in both mode=1 and mode=2:
SELECT id FROM tab a WHERE mode=1 AND (SELECT COUNT(*) FROM tab b WHERE b.mode=2 AND a.id=b.id) > 0
and I need this query to run very quickly, even though the table contains millions of rows (I already have an index on id1 and on id2). Is there a way to create something like a view that contains this query and is updated automatically every time the table changes, so the results are prepared for me in advance?
You can create a table called summary_tab. Use a programming language or command line to execute a query like this:
insert into summary_tab
select id from ...
Then, use a task scheduler like cron to execute the script or command line every few minutes.
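A sketch of such a refresh script, assuming summary_tab(id) holds the precomputed result and using the same pair-finding logic as the question:

```sql
-- Periodic refresh of the summary table (run from cron).
-- Rebuilds the full result each time; fine if the query only takes seconds.
TRUNCATE TABLE summary_tab;

INSERT INTO summary_tab (id)
SELECT id
FROM tab
WHERE mode IN (1, 2)
GROUP BY id
HAVING COUNT(DISTINCT mode) = 2;   -- id present with both modes
```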
The other option is to create an AFTER INSERT trigger on your table that executes a query like this and updates the summary table. However, if the query takes a long time, and/or if you insert a lot of records into the tab table, the trigger will slow inserts down.
You could also try something like this:
select id
from tab
where mode in (1, 2)
group by id
having count(*) = 2
Check the speed and results of this query. If it is not fast enough, try creating an index on id, another on mode, and yet another on the combination id+mode, and see whether one of them makes the query fast enough that you don't need a summary table.
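For reference, the candidate indexes would look like this (names are arbitrary); for the GROUP BY query above, the composite one is usually the winner, because both the WHERE filter and the grouping can then be answered from the index alone:

```sql
CREATE INDEX idx_id      ON tab (id);
CREATE INDEX idx_mode    ON tab (mode);
-- Composite: filters on mode first, then groups by id within it.
CREATE INDEX idx_mode_id ON tab (mode, id);
```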

MySQL - query from slave and insert on master

I used to run this command to insert some rows in a counter table:
insert into `monthly_aggregated_table`
select year(r.created_at), month(r.created_at), count(r.id) from
raw_items r
group by 1,2;
This query is very heavy and takes some time to run (millions of rows), and the raw_items table is MyISAM, so it caused table-level locking: writes had to wait for the insert to finish.
Now I created a slave server to do the SELECT.
What I would like to do is execute the SELECT on the slave, but take the results and insert them into the master database. Is that possible? How? What is the most efficient way to do this? (The insert used to be about 1.3 million rows.)
I am running MariaDB 10.0.17
You will have to split the action into two parts, with a programming language like Java or PHP in between.
First run the select, load the result set into your application, and then insert the data.
Another optimization to speed up the select is to add a new column ym_created_at to your table, containing a concatenation of year(created_at) and month(created_at). Put an index on that column and then run the updated statement:
insert into `monthly_aggregated_table`
select ym_created_at, count(r.id) from
raw_items r
group by 1;
Simpler, and possibly a lot quicker, since no functions act on the columns you group by.
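Since you are on MariaDB 10.0, you could also let the server maintain that column for you with a PERSISTENT computed column (which, unlike a VIRTUAL one, can be indexed). A sketch, assuming the column stores 'YYYY-MM':

```sql
ALTER TABLE raw_items
  ADD COLUMN ym_created_at CHAR(7)
    AS (DATE_FORMAT(created_at, '%Y-%m')) PERSISTENT;

CREATE INDEX idx_ym ON raw_items (ym_created_at);

INSERT INTO monthly_aggregated_table
SELECT ym_created_at, COUNT(id)
FROM raw_items
GROUP BY ym_created_at;
```

Note that computed columns require converting the table away from MyISAM-era syntax carefully; test the ALTER on a copy first, since it rewrites the whole table.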

Which option is most efficient to insert data to mysql db?

I have a MySQL db with several tables; let's call them Table1, Table2, etc. I have to make several calls to each of these tables.
Which is most efficient,
a) Collecting all queries for each table in one message, then executing them separately, e.g.:
INSERT INTO TABLE1 VALUES (A,B);
INSERT INTO TABLE1 VALUES (A,B);
...execute
INSERT INTO TABLE2 VALUES (A,B);
INSERT INTO TABLE2 VALUES (A,B);
...execute
b) Collecting ALL queries in one long message(not in order of table), then executing this query, e.g:
INSERT INTO TABLE1 VALUES (A,B);
INSERT INTO TABLE2 VALUES (B,C);
INSERT INTO TABLE1 VALUES (B,A);
INSERT INTO TABLE3 VALUES (D,B);
c) Something else?
Currently I am doing it like option (b), but I am wondering if there is a better way.
(I am using jdbc to access the db, in a groovy script).
Thanks!
A third option: use prepared statements with batching.
Since you haven't posted your code, this is a bit of a guess, but this blog post shows great performance improvements using Groovy's Sql.withBatch method.
The code they show (which uses sqlite) is reproduced here for posterity:
Sql sql = Sql.newInstance("jdbc:sqlite:/home/ron/Desktop/test.db", "org.sqlite.JDBC")
sql.execute("create table dummyTable(number)")
sql.withBatch { stmt ->
    100.times {
        stmt.addBatch("insert into dummyTable(number) values(${it})")
    }
    stmt.executeBatch()
}
which inserts the numbers 0 to 99 into a table called dummyTable
This will obviously need tweaking to work with your own (unposted) code.
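Option (c) at the SQL level is the multi-row INSERT: one statement per table, many rows per statement, which cuts both network round-trips and parsing overhead (column names and values below are placeholders):

```sql
INSERT INTO Table1 (col1, col2) VALUES
  ('A', 'B'),
  ('B', 'A');

INSERT INTO Table2 (col1, col2) VALUES
  ('B', 'C'),
  ('D', 'B');
```

With MySQL Connector/J, setting the connection property rewriteBatchedStatements=true makes batched prepared statements get rewritten into this multi-row form automatically, combining both approaches.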
Rather than asking which is more efficient, first consider whether the tables are large and whether you need concurrency.
If they are large (millions of records), you may want to separate the statements and leave some time between them, so you don't lock a table for too long at a time.
If your tables aren't that large, or concurrency is not a problem, then either approach will do; check the slow query log to see which statements are faster.