I run the following SQL Query on a MySQL platform.
Table A has a single column (its primary key) and 25K rows.
Table B has several columns and 75K rows.
It takes 20 minutes to execute the following query. I would be glad if you could help.
INSERT INTO sometable
SELECT A.PrimaryKeyColumn as keyword, 'SomeText', B.*
FROM A, B
WHERE B.PrimaryKeyColumn = CONCAT(A.PrimaryKeyColumn, B.NotUniqueButIndexedColumn);
Run the SELECT without the INSERT to see if the problem is with the SELECT or not.
If it is with the SELECT, follow the MySQL documentation explaining how to optimize queries using EXPLAIN.
If the SELECT runs fine but the INSERT takes forever, make sure you don't have a lot of unnecessary indexes on sometable. Beyond that, you may need to do some MySQL tuning and/or OS tuning (e.g., memory or disk performance) to get a measurable performance boost with the INSERT.
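As a minimal sketch of that EXPLAIN step, run against the query from the question as-is:
EXPLAIN
SELECT A.PrimaryKeyColumn AS keyword, 'SomeText', B.*
FROM A, B
WHERE B.PrimaryKeyColumn = CONCAT(A.PrimaryKeyColumn, B.NotUniqueButIndexedColumn);
Because the predicate wraps columns from both tables in CONCAT, expect the plan to show a full scan of one table for each row of the other; no index can be used for that comparison directly.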
If I read it right, the join has to examine roughly 1.875 billion row combinations (25K x 75K), because the CONCAT in the WHERE clause rules out index use, even though most of those combinations will not match.
For that, 20 minutes doesn't sound too bad....
I have a database table which is around 700 GB with 1 billion rows; the data is approximately 500 GB and the index is 200 GB.
I am trying to delete all the data before 2021.
There are roughly 298,970,576 rows in 2021 and 708,337,583 rows remaining.
To delete this I am running a non-stop query in my python shell:
DELETE FROM table_name WHERE id < 1762163840 LIMIT 1000000;
The id 1762163840 marks the start of the 2021 data. Deleting 1 million rows takes almost 1200-1800 seconds.
Is there any way I can speed this up? The current approach has been running for more than 15 days, not much data has been deleted so far, and it will take many more days.
I thought that if I make a table with just the ids of all the records I want to delete and then do an exact match like
DELETE FROM table_name WHERE id IN (SELECT id FROM _tmp_table_name);
Will that be fast? Is it going to be faster than first making a new table with all the records I want to keep and then dropping the original?
The database is set up on RDS; the instance class is db.r3.large (2 vCPU, 15.25 GB RAM), with only 4-5 connections running.
I would suggest recreating the data you want to keep -- if you have enough space:
create table keep_data as
select *
from table_name
where id >= 1762163840;
Then you can truncate the table and re-insert new data:
truncate table table_name;
insert into table_name
select *
from keep_data;
This will recreate the index.
The downside is that this will still take a while to re-insert the data (renaming keep_data would be faster). But it should be much faster than deleting the rows.
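A minimal sketch of that renaming shortcut, assuming keep_data has been populated as above and the old data can simply be dropped:
RENAME TABLE table_name TO table_name_old,
             keep_data TO table_name;
DROP TABLE table_name_old;
Note that CREATE TABLE ... AS SELECT does not copy indexes, so with this variant you would need to add the original indexes to keep_data yourself (before or after the swap).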
AND . . . this will give you the opportunity to partition the table so future deletes can be handled much faster. You should look into table partitioning if you have such a large table.
Multiple techniques for big deletes: http://mysql.rjweb.org/doc.php/deletebig
It points out that LIMIT 1000000 is unnecessarily big and causes more locking than might be desirable.
In the long run, PARTITIONing would be beneficial; the article discusses that as well.
If you do Gordon's technique (rebuilding table with what you need), you lose access to the table for a long time; I provide an alternative that has essentially zero downtime.
id IN (SELECT...) can be terribly slow -- both because of the inefficiency of in-SELECT and due to the fact that DELETE will hang on to a huge number of rows for transactional integrity.
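As an illustrative sketch of smaller chunks (the 10,000 figure is an assumption, not a number from the article):
DELETE FROM table_name
WHERE id < 1762163840
LIMIT 10000;   -- repeat until the statement affects 0 rows
Walking explicit PRIMARY KEY ranges from the driving script (WHERE id >= ? AND id < ?) is even better, since each pass then avoids re-scanning rows that were already deleted.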
I know that you can use
SELECT *
FROM table
WHERE id IN (ids)
In my case, I have 100,000 ids.
I'm wondering if MySQL has a limit for IN clause. If anyone knows a more efficient way to do this, that would be great!
Thanks!
Just this week I had to kill -9 a MySQL 5.7 Server where one of the developers had run a query like you describe, with a list of hundreds of thousands of id's in an IN( ) predicate. It caused the thread running the query to hang, and it wouldn't even respond to a KILL command. I had to shut down the MySQL Server instance forcibly.
(Fortunately it was just a test server.)
So I recommend you don't do that. I would suggest one of the following options:
Split your list of 100,000 ids into batches of at most 1,000, and run the query on each batch. Then use application code to merge the results.
Create a temporary table with an integer primary key.
CREATE TEMPORARY TABLE mylistofids (id INT PRIMARY KEY);
INSERT the 100,000 ids into it. Then run a JOIN query for example:
SELECT t.* FROM mytable AS t JOIN mylistofids USING (id)
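The insert step mentioned above might look like this (the id values are placeholders; in practice you would send the 100,000 ids in batched multi-row VALUES lists from application code):
INSERT INTO mylistofids (id) VALUES (101), (102), (103);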
Bill Karwin's suggestions are good.
The number of values in the IN clause is limited only by max_allowed_packet in my.ini.
MariaDB creates a temporary table when the IN clause exceeds 1000 values.
Another problem with that many IDs is the transfer of data from the PHP script (for example) to the MySQL server: it will be a very long query. You can also create a stored procedure with that SELECT and just call it from your script, which will be more efficient in terms of passing data from your script to MySQL.
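To inspect or raise the max_allowed_packet limit mentioned above (a sketch; 64 MB is an arbitrary example, SET GLOBAL needs the appropriate privilege and only affects new connections):
SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 67108864;  -- 64 MB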
Can anyone help me rewrite this query to speed up the execution time? It takes 37 seconds to execute.
DELETE FROM storefront_categories
WHERE userid IN (SELECT userid
                 FROM MASTER
                 WHERE expirydate < '2020-2-4');
At the same time, this query took only 4.69 seconds to execute.
DELETE FROM storefront_categories
WHERE userid NOT IN (SELECT userid FROM MASTER)
The table storefront_categories has 97K records, whereas MASTER has 40K records. We have created an index on the MASTER.expirydate field.
When deleting 40K rows, expect it to take time. The main cost (assuming adequate indexing and a decent query) is the overhead of transactional semantics of an "atomic" delete. This involves making a copy of each row being deleted, just in case there is a crash. That way, InnoDB can bring the database back to what it had been before the crash.
When deleting 40% of a table, it is much faster to copy the rows you want to keep into another table, then swap tables.
When deleting a large number of rows (regardless of the percentage), it is better to do it in chunks. And it is best to walk through the table based on the PRIMARY KEY.
I discuss both of those techniques, plus others, in http://mysql.rjweb.org/doc.php/deletebig
As for the query formulation:
It is version-dependent; old versions of MySQL did a poor job on some flavors.
NOT IN (SELECT ...) and NOT EXISTS tend to be the worst performers.
IN (SELECT ...) and/or EXISTS may be better.
Multi-table DELETE is another option. It works like a JOIN (see the sketch below).
(Bottom line: You did not say what version you are running; I can't predict which formulation will be best.)
My blog avoids the formulation debate.
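A sketch of the multi-table DELETE formulation mentioned above, using the tables from the question:
DELETE sc
FROM storefront_categories AS sc
JOIN MASTER AS m ON m.userid = sc.userid
WHERE m.expirydate < '2020-02-04';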
The query looks fine as it is.
I would suggest the following indexes for optimization:
master(expirydate, userid)
storefront_categories(userid)
The first index is a covering index for the subquery on master: it means that the database should be able to execute the subquery by looking at the index only (whereas with just expirydate in the index, it still needs to look at the table data to fetch the related userid).
The second index lets the database optimize the IN operation.
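A sketch of creating those two indexes (the index names are made up for illustration):
CREATE INDEX idx_master_expirydate_userid ON master (expirydate, userid);
CREATE INDEX idx_storefront_userid ON storefront_categories (userid);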
I would try with EXISTS:
DELETE
FROM storefront_categories
WHERE EXISTS (SELECT 1
              FROM MASTER M
              WHERE M.userid = storefront_categories.userid AND
                    M.expirydate < '2020-02-04'
             );
Indexes would matter here; I would expect indexes on storefront_categories(userid) and MASTER(userid, expirydate).
I would advise you to use NOT EXISTS with the correct index:
DELETE sc
FROM storefront_categories sc
WHERE NOT EXISTS (SELECT 1
                  FROM master m
                  WHERE m.userid = sc.userid AND
                        m.expirydate < '2020-02-04'
                 );
The index you want is on master(userid, expirydate). The order of the columns is important. For this version, an index on storefront_categories does not help.
Note that I changed the date format. I recommend using YYYY-MM-DD to avoid ambiguity -- and to use the full 10 characters.
I used to run this command to insert some rows in a counter table:
insert into `monthly_aggregated_table`
select year(r.created_at), month(r.created_at), count(r.id)
from raw_items r
group by 1, 2;
This query is very heavy and takes some time to run (millions of rows), and since the raw_items table is MyISAM, it causes table locking: writes to it have to wait for the insert to finish.
Now I created a slave server to do the SELECT.
What I would like to do is to execute the SELECT in the slave, but get the results and insert into the master database. Is it possible? How? What is the most efficient way to do this? (The insert used to have 1.3 million rows)
I am running MariaDB 10.0.17
You will have to split the action into two parts, with a programming language like Java or PHP in between.
First run the SELECT (on the slave), load the result set into your application, and then run the INSERT (against the master).
Another optimization you could make to speed up the SELECT is to add a new column "ym_created_at" to your table, containing a concatenation of year(created_at) and month(created_at). Place an index on that column and then run the updated statement:
insert into `monthly_aggregated_table`
select ym_created_at, count(r.id)
from raw_items r
group by 1;
This is simpler and might be a lot quicker, since no functions are acting on the column you are grouping by.
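A sketch of adding and back-filling such a column (the CHAR(7) type and the 'YYYY-MM' format are assumptions; any comparable representation would work):
ALTER TABLE raw_items ADD COLUMN ym_created_at CHAR(7);
UPDATE raw_items SET ym_created_at = DATE_FORMAT(created_at, '%Y-%m');
ALTER TABLE raw_items ADD INDEX idx_ym_created_at (ym_created_at);
Keep in mind that back-filling millions of rows will itself take a while and, on a MyISAM table, will lock it while it runs.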
I have a MySQL MYISAM table (say tbl) consisting of 2 unsigned int fields, say, f1 and f2. There is an index on f2 and the table is very large (approximately 320,000,000+ rows). I update this table periodically (with approximately 100,000 new rows a week), and, in order to be able to search this table without doing an ORDER BY (which would be very time consuming in real-time queries), I physically ORDER the table according to the way in which I want to retrieve its rows.
So, I perform an ALTER TABLE tbl ORDER BY f1 DESC. (I know I have enough physical space on the server for a copy of the table.) I have read that during this operation, a temporary table is created and SELECT statements are not affected on the current rows.
However, I have experienced that this is not the case, and SELECT statements on the table that occur at the same time with the ALTER table are getting blocked and do not terminate. After the ALTER TABLE tbl completes (about 40 minutes on the production server), the SELECT statements on tbl start executing fine again.
Is there any reason why the "ALTER table tbl ORDER BY f1 DESC" seems to be blocking other clients from querying tbl?
Altering a table will always grab a lock on the table, preventing SELECTs from running.
I'll admit that I didn't even know you could do that with an ALTER TABLE.
What are you trying to get from the table? For example, all records in a given range? 320 million rows is not a trivial number. I'll give you my gut reactions:
1. Switch to InnoDB (allows #2, also gives transactions, but without #2 may hurt performance)
2. Partition the table (makes it act like a number of slightly smaller tables)
3. Consider a redesign, such as having a "working set" table and a "historical" table, basically manually partitioning. If you usually look for recently inserted data, this (along with partitioning) will help a lot. If your lookups are evenly distributed, this probably won't make a difference.
4. Consider adding a new column you could use in conjunction to narrow down selects (so instead of searching on date, search on date and customer ID)
Since I don't know what you're storing, some of these (such as #4) may not apply.
There are some other things you could try. OPTIMIZE TABLE may help and take less time, but I doubt it. I think internally it's implemented as a dump/reload, at least on the InnoDB side.
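For the partitioning suggestion (#2), a sketch of RANGE partitioning on f1 (the boundary values are purely illustrative; also, if the table has a primary or unique key, it must include the partitioning column):
ALTER TABLE tbl
PARTITION BY RANGE (f1) (
    PARTITION p0   VALUES LESS THAN (100000000),
    PARTITION p1   VALUES LESS THAN (200000000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);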