I used to run this command to insert some rows in a counter table:
insert into `monthly_aggregated_table`
select year(r.created_at), month(r.created_at), count(r.id)
from raw_items r
group by 1, 2;
This query is very heavy and takes a while to run (the table has millions of rows). Because raw_items is MyISAM, the SELECT locks the whole table, so writes to it had to wait for the insert to finish.
Now I created a slave server to do the SELECT.
What I would like to do is execute the SELECT on the slave, but take the results and insert them into the master database. Is it possible? How? What is the most efficient way to do this? (The insert used to produce about 1.3 million rows.)
I am running MariaDB 10.0.17
You will have to split the action into two parts, with a programming language such as Java or PHP in between.
First run the SELECT on the slave, load the result set into your application, and then run the INSERT against the master.
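A minimal sketch of that two-step flow, using Python with two sqlite3 in-memory databases as stand-ins for the slave and master connections (with MySQL/MariaDB you would open two connections through a driver such as mysqlclient; the table and column names follow the question):

```python
import sqlite3

# Stand-ins for the two servers: read from the "slave", write to the "master".
slave = sqlite3.connect(":memory:")
master = sqlite3.connect(":memory:")

slave.execute("CREATE TABLE raw_items (id INTEGER PRIMARY KEY, created_at TEXT)")
slave.executemany("INSERT INTO raw_items (created_at) VALUES (?)",
                  [("2015-01-10",), ("2015-01-20",), ("2015-02-05",)])

master.execute("CREATE TABLE monthly_aggregated_table (y INT, m INT, cnt INT)")

# Step 1: run the heavy aggregate on the slave (sqlite's strftime stands in
# for MySQL's year()/month() functions here).
rows = slave.execute(
    "SELECT CAST(strftime('%Y', created_at) AS INT), "
    "CAST(strftime('%m', created_at) AS INT), COUNT(id) "
    "FROM raw_items "
    "GROUP BY strftime('%Y', created_at), strftime('%m', created_at)"
).fetchall()

# Step 2: insert the (much smaller) aggregated result set on the master.
master.executemany("INSERT INTO monthly_aggregated_table VALUES (?, ?, ?)", rows)
master.commit()
```

Only the small aggregated result crosses the wire, so the master never runs the heavy scan.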
Another optimization that would speed up the SELECT is to add a new column "ym_created_at" to your table, containing the concatenation of year(created_at) and month(created_at). Put an index on that column and then run the updated statement:
insert into `monthly_aggregated_table`
select ym_created_at, count(r.id)
from raw_items r
group by 1;
Easier, and it might be a lot quicker, since no functions are applied to the column you are grouping by.
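A runnable sketch of the precomputed-column idea (Python with sqlite3 as a stand-in; in MariaDB you could alternatively define ym_created_at as a PERSISTENT virtual column so it stays current automatically):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_items (id INTEGER PRIMARY KEY,"
            " created_at TEXT, ym_created_at TEXT)")
con.executemany("INSERT INTO raw_items (created_at) VALUES (?)",
                [("2015-01-10",), ("2015-01-20",), ("2015-02-05",)])

# Precompute year+month once, then index it so GROUP BY can use the index
# instead of evaluating functions on every row.
con.execute("UPDATE raw_items SET ym_created_at = strftime('%Y%m', created_at)")
con.execute("CREATE INDEX idx_ym ON raw_items (ym_created_at)")

con.execute("CREATE TABLE monthly_aggregated_table (ym TEXT, cnt INT)")
con.execute("INSERT INTO monthly_aggregated_table "
            "SELECT ym_created_at, COUNT(id) FROM raw_items "
            "GROUP BY ym_created_at")
```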
I know that you can use
SELECT *
FROM table
WHERE id IN (ids)
In my case, I have 100,000 ids.
I'm wondering whether MySQL has a limit on the size of the IN clause. If anyone knows a more efficient way to do this, that would be great!
Thanks!
Just this week I had to kill -9 a MySQL 5.7 server where one of the developers had run a query like the one you describe, with a list of hundreds of thousands of ids in an IN( ) predicate. It caused the thread running the query to hang, and it wouldn't even respond to a KILL command. I had to shut down the MySQL Server instance forcibly.
(Fortunately it was just a test server.)
So don't do that. I would recommend one of the following options:
Split your list of 100,000 ids into batches of at most 1,000, and run the query on each batch. Then use application code to merge the results.
Create a temporary table with an integer primary key.
CREATE TEMPORARY TABLE mylistofids (id INT PRIMARY KEY);
INSERT the 100,000 ids into it. Then run a JOIN query for example:
SELECT t.* FROM mytable AS t JOIN mylistofids USING (id)
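The batching option can be sketched in application code like this (Python with sqlite3 as a stand-in for the MySQL connection; a batch size of 500 is used here, the answer suggests anything up to 1,000):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, val TEXT)")
con.executemany("INSERT INTO mytable VALUES (?, ?)",
                [(i, f"row{i}") for i in range(10_000)])

ids = list(range(0, 10_000, 3))   # a large id list, as in the question
BATCH = 500                       # keep each IN() list small

results = []
for start in range(0, len(ids), BATCH):
    batch = ids[start:start + BATCH]
    placeholders = ",".join("?" * len(batch))
    # One bounded-size query per batch; merge the results in the application.
    results.extend(con.execute(
        f"SELECT id, val FROM mytable WHERE id IN ({placeholders})", batch))
```

Using placeholders (rather than string-building the ids) also keeps the query safe and lets the server cache the statement per batch size.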
Bill Karwin's suggestions are good.
The number of values in an IN clause is limited only by max_allowed_packet in my.ini.
MariaDB creates a temporary table when the IN clause exceeds 1000 values.
Another problem with that many IDs is transferring the data from the PHP script (for example) to the MySQL server: it will be a very long query string. You can also create a stored procedure containing that SELECT and just call it from your script, which is more efficient in terms of passing data from your script to MySQL.
Assume I want to find ids that appear in both mode=1 and mode=2:
SELECT id FROM tab a
WHERE mode = 1
  AND (SELECT COUNT(*) FROM tab b WHERE b.mode = 2 AND a.id = b.id) > 0
and I need this query to run very quickly, even though the table contains millions of rows (I already have indexes on id and mode). Is there a way to create something like a view containing this query that is updated automatically every time the table changes, so the results are prepared for me in advance?
You can create a table called summary_tab. Use a programming language or command line to execute a query like this:
insert into summary_tab
select id from ...
Then, use a task scheduler like cron to execute the script or command line every few minutes.
The other option is to create an AFTER INSERT trigger on your table that executes a query like this and updates the summary table. However, if the query takes a long time, and/or you insert a bunch of records into the tab table, the trigger will slow the inserts down.
You could also try something like this:
select id
from tab
where mode in (1, 2)
group by id
having count(*) = 2
Check the speed and results of this query. If it is not fast enough, try creating an index on id, another on mode, and a composite index on (mode, id), and see whether one of them makes the query fast enough that you don't need a summary table.
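A quick runnable check of the GROUP BY / HAVING form (Python with sqlite3 as a stand-in; the tiny schema follows the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (id INT, mode INT)")
con.executemany("INSERT INTO tab VALUES (?, ?)",
                [(1, 1), (1, 2),   # id 1 appears in both modes
                 (2, 1),           # id 2 only in mode 1
                 (3, 2)])          # id 3 only in mode 2

# ids present in both mode=1 and mode=2
both = [r[0] for r in con.execute(
    "SELECT id FROM tab WHERE mode IN (1, 2) "
    "GROUP BY id HAVING COUNT(*) = 2")]
```

Note that if (id, mode) pairs are not unique in your table, HAVING COUNT(DISTINCT mode) = 2 is the safer condition.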
I am trying to generate some data as follows (everything is done in MySQL Workbench 6.0.8.11354 build 833):
What I have:
users table (~9,000,000 entries):
SUBSCRIPTION   text (3 options)
NUMBER         number
STATUS         text (3 options)
CODE           number
TEST1          yes/no
TEST2          yes/no
TEST3          yes/no
MANUFACTURER   number (6 options)
TYPE           text (50 options)
PROFILE        text (30 options)
What I need:
stats1 table (data that I want and I need to create):
PROFILE,TYPE,SUBSCRIPTION,STATUS,MANUFACTURER,TEST1YES,TEST1NO,TEST2YES,TEST2NO,TEST3YES,TEST3NO
profile1,type1,subscription1,status1,man1,count,count,count,count,count,count
profile1,type2,subscription2,status2,man2,count,count,count,count,count,count
Each PROFILE, TYPE, SUBSCRIPTION, STATUS, MANUFACTURER combination is unique.
What I did so far:
Created the stats1 table
Execute the following query in order to populate the table (I ended up with ~500 distinct entries):
insert into stats1 (PROFILE, TYPE, SUBSCRIPTION, STATUS, MANUFACTURER)
select distinct users.PROFILE, users.TYPE, users.SUBSCRIPTION, users.STATUS, users.MANUFACTURER
from users;
Executed the following statement to count the values for TEST1YES, for each of the ~500 entries:
update stats1
set TEST1YES = (
    select count(*)
    from users
    where users.TEST1 = 'yes'
      and users.PROFILE = stats1.PROFILE
      and users.TYPE = stats1.TYPE
      and users.SUBSCRIPTION = stats1.SUBSCRIPTION
      and users.STATUS = stats1.STATUS
      and users.MANUFACTURER = stats1.MANUFACTURER
);
I receive the following error in Workbench:
Error Code: 2013. Lost connection to MySQL server during query 600.573 sec
This is a known bug in Workbench and, even so, the server continues to execute the query.
However, the query runs for more than 70 minutes in the background (as I saw in client connections / management) and I need to run 5 more queries like this, for the rest of the columns.
Is there a better / faster / more efficient way for performing the count for those 6 columns in stats1 table?
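One way to avoid six full-table UPDATE passes is a single grouped query with conditional aggregation, filling all the counters in one scan of users. A sketch (Python with sqlite3 as a stand-in; only TEST1 is shown, the other columns follow the same SUM(...) pattern, and the same SUM(col = 'yes') trick works in MySQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (PROFILE TEXT, TYPE TEXT, SUBSCRIPTION TEXT,"
            " STATUS TEXT, MANUFACTURER TEXT, TEST1 TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?, ?, ?, ?, ?)", [
    ("p1", "t1", "s1", "a", "m1", "yes"),
    ("p1", "t1", "s1", "a", "m1", "no"),
    ("p1", "t1", "s1", "a", "m1", "yes"),
])

# One pass over users fills every counter at once: the boolean comparison
# yields 1/0, so SUM() counts the matching rows per group.
con.execute("CREATE TABLE stats1 AS "
            "SELECT PROFILE, TYPE, SUBSCRIPTION, STATUS, MANUFACTURER, "
            "SUM(TEST1 = 'yes') AS TEST1YES, "
            "SUM(TEST1 = 'no')  AS TEST1NO "
            "FROM users "
            "GROUP BY PROFILE, TYPE, SUBSCRIPTION, STATUS, MANUFACTURER")
```

This replaces the ~500 correlated subquery evaluations per column with a single grouped scan, which is usually far faster on a 9-million-row table.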
I have executed an insert query as follows -
Insert into tablename
select
query1 union query2
Now if I execute the select part of this insert query,it takes around 2-3 minutes.However,the entire insert script is taking more than 8 minutes.As per my knowledge the insert and corresponding select queries should take almost the same time for execution.
So is their any other factor that could impact the execution time of the insert?
It's not correct that an INSERT and its corresponding SELECT take the same time; they should not!
The SELECT query just reads data and transmits it; if you are running the query in an application (like phpMyAdmin), it is very likely limiting the result for pagination, so the SELECT looks faster (it doesn't fetch all the data).
The INSERT query must read that data, insert it into the table, update the primary key tree, update every other index on that table, refresh any materialized view based on that table, fire any trigger on that table/columns, etc., so the INSERT performs a LOT more work than the SELECT.
So it IS normal that the INSERT is slower than the SELECT; how much slower depends on your tables and DB structure.
You could optimize the INSERT with some DB-specific options; for MySQL, see the documentation on optimizing INSERT statements, and if you are on DB2 you could create a temp file and then CPYF it into the real one, and so on...
I run the following SQL Query on a MySQL platform.
Table A is a table which has a single column (primary key) and 25K rows.
Table B has several columns and 75K rows.
It takes 20 minutes to execute the following query. I would be glad if you could help.
INSERT INTO sometable
SELECT A.PrimaryKeyColumn as keyword, 'SomeText', B.*
FROM A, B
WHERE B.PrimaryKeyColumn = CONCAT(A.PrimaryKeyColumn, B.NotUniqueButIndexedColumn);
Run the SELECT without the INSERT to see if the problem is with the SELECT or not.
If it is with the SELECT, follow the MySQL documentation explaining how to optimize queries using EXPLAIN.
If the SELECT runs fine but the INSERT takes forever, make sure you don't have a lot of unnecessary indexes on sometable. Beyond that, you may need to do some MySQL tuning and/or OS tuning (e.g., memory or disk performance) to get a measurable performance boost with the INSERT.
If I read it right, the cross join of A and B evaluates roughly 1.875 billion row combinations (25K × 75K) before the WHERE clause filters them down.
For that, 20 minutes doesn't sound too bad....