MySQL update with subselect too slow - mysql

Having an issue with an update query taking more than 20 minutes (I kill it after that).
Scenario:
Table one has some 300K records.
Table two contains the same set of records (copied over), but with an extra field that needs to contain the id of the record that matches a number of fields, and has the highest value of another (a score). To clarify, the end result should be table two containing 300K records with each record having the id of another record that has the same set of basic properties, and the highest score within the set of records with those properties.
The below completes in ~5s when I only copy 2K records instead of the full 300k records into table two.
UPDATE vtable2 v1 SET v1.buddy = (
SELECT v2.id FROM vtable1 v2
WHERE
v2.group_id = v1.group_id
-- AND 6 more basic comparisons
ORDER BY v2.score DESC LIMIT 1
)
I need to find buddies for the full 300K records. All fields involved in joining and sorting have indexes.
Help much appreciated.

Correlated subqueries in MySQL tend to be slow because the subquery is re-evaluated for each outer row. I prefer joins in such cases. I am not entirely clear on your schema design, but you can try something like this:
UPDATE vtable2 v1
INNER JOIN vtable1 v2
ON v2.group_id = v1.group_id
-- AND other join conditions, if any
SET
v1.buddy = v2.id
-- WHERE any other conditions
PS: Of course, you need to make sure you have proper indexes on your columns. If you need help with that, you can post the whole query with an EXPLAIN plan.

You could test the subquery on its own with a constant value:
SELECT v2.id FROM vtable1 v2
WHERE
v2.group_id = 1
-- AND 6 more basic comparisons
ORDER BY score DESC LIMIT 1
Either way, I think a join is the better approach, but I can't say for sure without seeing your schema.
You may also have a problem with the indexes on your tables.

You can use an exclusion join to find the row in vtable1 such that no other row in vtable1 with a higher score can be found.
UPDATE vtable2 AS v1
INNER JOIN vtable1 AS v2a ON v1.group_id = v2a.group_id AND (...conditions...)
LEFT OUTER JOIN vtable1 AS v2b ON v1.group_id = v2b.group_id
AND v2a.score < v2b.score AND (...conditions...)
SET v1.buddy = v2a.id
WHERE v2b.group_id IS NULL;
You do have to duplicate all the other conditions in the expression for the outer join; you can't put them into the WHERE clause.
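As a rough illustration of the exclusion join above, here is a minimal sketch that runs the same join shape as a SELECT. It uses Python's sqlite3 module as a stand-in for MySQL, and the table contents and the reduced column set (only group_id and score) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down versions of the tables from the question.
    CREATE TABLE vtable1 (id INTEGER PRIMARY KEY, group_id INTEGER, score INTEGER);
    CREATE TABLE vtable2 (id INTEGER PRIMARY KEY, group_id INTEGER, buddy INTEGER);
    INSERT INTO vtable1 VALUES (1, 10, 5), (2, 10, 9), (3, 20, 7), (4, 20, 3);
    INSERT INTO vtable2 (id, group_id) VALUES (1, 10), (2, 10), (3, 20), (4, 20);
""")

# Keep the v2a row for which no v2b row in the same group has a higher score.
best_per_row = conn.execute("""
    SELECT v1.id, v2a.id AS buddy
    FROM vtable2 AS v1
    INNER JOIN vtable1 AS v2a
            ON v1.group_id = v2a.group_id
    LEFT OUTER JOIN vtable1 AS v2b
            ON v1.group_id = v2b.group_id
           AND v2a.score < v2b.score
    WHERE v2b.group_id IS NULL
    ORDER BY v1.id
""").fetchall()

print(best_per_row)
```

Note that if two rows in a group tie for the highest score, both survive the exclusion join, so the UPDATE form would have more than one candidate buddy for the same row.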

Related

Retrieving a huge amount of data using MySQL

I have a database consisting of 7 tables, and each table contains millions of records. I am trying to retrieve data from different tables using aggregate functions and JOIN clauses. I am using MySQL Workbench and phpMyAdmin to run queries.
The problem is I cannot retrieve the data even when I limit the number of records. However, when I indicate the IDs of the required records to retrieve it works fine. For example:
select avg(grade)
from TableA
inner join TableB on TableA.ID = TableB.ID limit 5;
If I use the above query, MySQL stops responding until the connection to the server is lost.
select avg(grade)
from TableA
inner join TableB on TableA.ID = TableB.ID
where TableA.ID = 1 OR TableA.ID = 2 .... to 5
In the last query, MySQL will return the required result.
I would like to optimise these queries if I can, rather than increasing the server timeout.
In your sample queries:
1. Read all of one table, or less than all of it if there is a useful WHERE clause. (Not in either of your cases.)
2. For each row in that table, reach for the matching row(s) in the next table. This uses the ON clause, and the WHERE clause if relevant. (You have ON, but neither query has a usable WHERE; see below.)
3. That gives you a big temp table.
4. Do any other filtering, such as that WHERE .. OR ...
5. Do the aggregation.
6. Sort the result. Oh, there is no ORDER BY.
7. Deliver a few rows, based on LIMIT.
That is, most of the effort occurs before the LIMIT. (There are cases where an INDEX can be effective even into the LIMIT, but not in your examples.)
Which table is grade in? Does the query really give you the correct answer? Are the two tables in a 1:1 relationship, or 1:many? If 1:many, shouldn't you do the AVG against one of the tables, then do the JOIN? Note: this will also speed things up. Please provide SHOW CREATE TABLE so I can help you further.
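To make the 1:many concern concrete, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. It assumes, purely for illustration, that grade lives in TableA and that TableB holds several rows per ID; the note column and all of the data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout: one grade per TableA row, many TableB rows per ID.
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY, grade REAL);
    CREATE TABLE TableB (ID INTEGER, note TEXT);
    INSERT INTO TableA VALUES (1, 100.0), (2, 40.0);
    INSERT INTO TableB VALUES (1, 'a'), (1, 'b'), (1, 'c'), (2, 'd');
""")

# The original shape: the join multiplies TableA rows before AVG runs,
# and LIMIT only trims the single aggregated row at the very end.
joined_rows = conn.execute("""
    SELECT AVG(grade)
    FROM TableA
    INNER JOIN TableB ON TableA.ID = TableB.ID
    LIMIT 5
""").fetchall()

# Aggregating over TableA alone counts each grade once.
matched_avg = conn.execute("""
    SELECT AVG(grade)
    FROM TableA
    WHERE EXISTS (SELECT 1 FROM TableB WHERE TableB.ID = TableA.ID)
""").fetchone()[0]

print(joined_rows, matched_avg)
```

In this made-up data the joined average weights ID 1 three times, so the two queries disagree; which one is "correct" depends on what the real schema means.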

Optimizing INNER JOIN across multiple tables

I have trawled many of the similar responses on this site and have improved my code at several stages along the way. Unfortunately, this 3-row query still won't run.
I have one table with 100k+ rows and about 30 columns of which I can filter down to 3-rows (in this example) and then perform INNER JOINs across 21 small lookup tables.
In my first attempt, I was lazy and used implicit joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `lookup_table` x 21
WHERE `master_table`.`indexed_col` = "value"
AND `lookup_table`.`id` = `lookup_col` x 21
The query looked to be timing out:
#2013 - Lost connection to MySQL server during query
Following this, I tried being explicit about the joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `master_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `master_table`.`lookup_col` x 21
WHERE `master_table`.`indexed_col` = "value"
Still got the same result. I then realised that the query was probably trying to perform the joins first, then filter down via the WHERE clause. So after a bit more research, I learned how I could apply a subquery to perform the filter first and then perform the joins on the newly created table. This is where I got to, and it still returns the same error. Is there any way I can improve this query further?
SELECT `temp_table`.*, `lookup_table`.`data_point` x 21
FROM (SELECT * FROM `master_table` WHERE `indexed_col` = "value") as `temp_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `temp_table`.`lookup_col` x 21
Is this the best way to write up this kind of query? I tested the subquery to ensure it only returns a small table and can confirm that it returns only three rows.
First, at its most simple aspect you are looking for
select
mt.*
from
Master_Table mt
where
mt.indexed_col = 'value'
That is probably instantaneous provided you have an index on your master table on the given indexed_col in the first position (in case you had a compound index of many fields)…
Now, if I am understanding you correctly about your different lookup columns (21 in total), you have just simplified them in this post to avoid redundancy, but you are actually doing something to the effect of
select
mt.*,
lt1.lookupDescription1,
lt2.lookupDescription2,
...
lt21.lookupDescription21
from
Master_Table mt
JOIN Lookup_Table1 lt1
on mt.lookup_col1 = lt1.pk_col1
JOIN Lookup_Table2 lt2
on mt.lookup_col2 = lt2.pk_col2
...
JOIN Lookup_Table21 lt21
on mt.lookup_col21 = lt21.pk_col21
where
mt.indexed_col = 'value'
I had a project well over a decade ago dealing with a similar situation... the master table had about 21+ million records and had to join to about 30+ lookup tables. The system crawled, and queries died after running for more than 24 hours.
This too was on a MySQL server and the fix was a single MySQL keyword...
Select STRAIGHT_JOIN mt.*, ...
With your master table in the first position, and the WHERE clause and its criteria applied directly to the master table, you are good. You know the relationships between the tables. STRAIGHT_JOIN tells MySQL: "Do the query in the exact order I presented it to you. Don't try to think for me on this and optimize based on a subsidiary table that may have a smaller record count, thinking that will somehow make the query faster"... it won't.
Try the STRAIGHT_JOIN keyword. It took the query I was working on and finished it in about 1.5 hrs... it was returning all 21 million rows with all corresponding lookup key descriptions for final output, hence still needed a longer duration than just 3 records.
First, don't use a subquery. Write the query as:
SELECT mt.*, lt.`data_point`
FROM `master_table` mt INNER JOIN
`lookup_table` lt
ON lt.`id` = mt.`lookup_col`
WHERE mt.`indexed_col` = 'value';
The indexes that you want are master_table(indexed_col, lookup_col) and lookup_table(id, data_point).
If you are still having performance problems, then there are multiple possibilities. High among them is that the result set is simply too big to return in a reasonable amount of time. To see if that is the case, you can use select count(*) to count the number of returned rows.

Mysql: Why is WHERE IN much faster than JOIN in this case?

I have a query with a long list (> 2000 ids) in a WHERE IN clause in mysql (InnoDB):
SELECT id
FROM table
WHERE user_id IN ('list of >2000 ids')
I tried to optimize this by using an INNER JOIN instead of the WHERE IN, like this (both id columns and user_id use an index):
SELECT table.id
FROM table
INNER JOIN users ON table.user_id = users.id WHERE users.type = 1
Surprisingly, however, the first query is much faster (by a factor of 5 to 6). Why is this the case? Could the second query outperform the first one when the number of ids in the WHERE IN clause becomes much larger?
This is not an answer to your question, but as an alternative to your first query, you may be able to improve performance by replacing the IN clause with EXISTS, which often performs better than IN:
SELECT id
FROM table t
WHERE EXISTS (SELECT 1 FROM USERS WHERE t.user_id = users.id)
This is an unfair comparison between the 2 queries.
In the 1st query you provide a list of constants as the search criteria, so MySQL has to open and search only one table and/or one index file.
In the 2nd query you instruct MySQL to obtain the list dynamically from another table and join that list back to the main table. It is also not clear, if indexes were used to create a join or a full table scan was needed.
To have a fair comparison, time the query that you used to obtain the list in the 1st query along with the query itself. Or try
SELECT table.id FROM table WHERE user_id IN (SELECT users.id FROM users WHERE users.type = 1)
The above fetches the list of ids dynamically in a subquery.

Optimize performance of MySQL UPDATE query containing EXISTS

Can anybody please give me a hint on how to optimize this update MySQL query that takes about a minute to process?
UPDATE store s
SET reservation=1
WHERE EXISTS (
SELECT 1
FROM item i
WHERE s.reservation=0
AND s.status!=9
AND s.id=i.store_id
AND i.store_id!=0
)
I need to update (set reservation=1) all rows in the "store" table (which is very large) where reservation=0 currently but whose id exists in another table, "item". Table "item" is also large, but not as large as "store".
I'm not an expert on creating efficient queries, so forgive me if this is just a completely wrong approach and the whole thing has a simple solution.
Thanks for any ideas.
It looks like some of the predicates in the correlated subquery could be moved to the outer query. For example, I believe this is equivalent:
UPDATE store s
SET s.reservation = 1
WHERE s.reservation = 0
AND s.status != 9
AND s.id != 0
AND EXISTS ( SELECT 1
FROM item i
WHERE i.store_id = s.id
)
For best performance of that, at a minimum, we'd want an index on store that has reservation as the leading column. Also including the status and id columns would mean those conditions could be checked from the index page, without a lookup of the underlying page in the table.
And for that correlated subquery (dependent query), we'd want an index on item with a store_id as the leading column.
As another option, consider re-writing the correlated subquery as a JOIN operation, for example:
UPDATE store s
JOIN item i
ON i.store_id = s.id
SET s.reservation = 1
WHERE s.reservation = 0
AND s.status != 9
AND s.id != 0
If you're running MySQL 5.5 or earlier, you can't get an EXPLAIN on an UPDATE statement. The closest we can get is rewriting the query as a SELECT, and getting an EXPLAIN on that. MySQL 5.6 does support EXPLAIN on an UPDATE statement.
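The "rewrite the UPDATE as a SELECT" idea can be sketched as follows. This uses Python's sqlite3 module as a stand-in for MySQL, with invented data; in MySQL you would prefix the SELECT with EXPLAIN to see the plan, whereas here the SELECT is only used to preview which rows the UPDATE would touch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down versions of the store and item tables.
    CREATE TABLE store (id INTEGER PRIMARY KEY, reservation INTEGER, status INTEGER);
    CREATE TABLE item (id INTEGER PRIMARY KEY, store_id INTEGER);
    INSERT INTO store VALUES (1, 0, 1), (2, 0, 9), (3, 1, 1), (4, 0, 1);
    INSERT INTO item (store_id) VALUES (1), (1), (2), (3), (0);
""")

# SELECT form of the EXISTS-based UPDATE: same WHERE clause, no SET.
exists_ids = conn.execute("""
    SELECT s.id
    FROM store s
    WHERE s.reservation = 0
      AND s.status != 9
      AND s.id != 0
      AND EXISTS (SELECT 1 FROM item i WHERE i.store_id = s.id)
    ORDER BY s.id
""").fetchall()

# SELECT form of the JOIN-based UPDATE. DISTINCT is needed here because a
# store with several items appears once per matching item in the join.
join_ids = conn.execute("""
    SELECT DISTINCT s.id
    FROM store s
    JOIN item i ON i.store_id = s.id
    WHERE s.reservation = 0
      AND s.status != 9
      AND s.id != 0
    ORDER BY s.id
""").fetchall()

# Run the EXISTS-based UPDATE itself and count the rows it changed.
updated = conn.execute("""
    UPDATE store
    SET reservation = 1
    WHERE reservation = 0
      AND status != 9
      AND id != 0
      AND EXISTS (SELECT 1 FROM item i WHERE i.store_id = store.id)
""").rowcount

print(exists_ids, join_ids, updated)
```

With this sample data both SELECT forms pick out the same store row, and the UPDATE changes exactly that row.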
You can try to use:
UPDATE store s INNER JOIN item i ON s.id=i.store_id SET reservation=1 WHERE i.store_id!=0 AND s.reservation=0 AND s.status != 9;
This should run faster because you will not go through the whole 'item' table each time you need to check a 'store' row.

Mysql range check instead of index usage on inner join

I'm having a serious problem with MySQL (innoDB) 5.0.
A very simple SQL query is executed with a very unexpected query plan.
The query:
SELECT
SQL_NO_CACHE
mbCategory.*
FROM
MBCategory mbCategory
INNER JOIN ResourcePermission as rp
ON rp.primKey = mbCategory.categoryId
where mbCategory.groupId = 12345 AND mbCategory.parentCategoryId = 0
limit 20;
MBCategory - contains 216583 rows
ResourcePermission - contains 3098354 rows.
In MBCategory I've multiple indexes (columns order as in index):
Primary (categoryId)
A (groupId,parentCategoryId,categoryId)
B (groupId,parentCategoryId)
In ResourcePermission I've multiple indexes (columns order as in index):
Primary - on some column
A (primKey).
When I look into query plan Mysql changes tables sequence and selects rows from ResourcePermission at first and then it joins the MBCategory table (crazy idea) and it takes ages. So I added STRAIGHT_JOIN to force the innodb engine to use correct table sequence:
SELECT
STRAIGHT_JOIN SQL_NO_CACHE
mbCategory.*
FROM
MBCategory
mbCategory
INNER JOIN ResourcePermission as rp
ON rp.primKey = mbCategory.categoryId
where mbCategory.groupId = 12345 AND mbCategory.parentCategoryId = 0
limit 20;
But here the second problem materializes:
In my opinion, MySQL should use index A (primKey) for the join; instead it performs "Range checked for each record (index map: 0x400)", and again it takes ages!
FORCE INDEX doesn't help; MySQL still performs "Range checked for each record".
Only 23 rows in MBCategory satisfy the WHERE criteria, and after the join there are only 75 rows.
How can I make MySQL choose the correct index for this operation?
OK, elementary problem. I owe myself a beer.
I did not develop the system I have recently been tuning; management assigned me to it to improve performance (the original team doesn't have knowledge of this topic).
After a few weeks of improving SQL queries, indexes, and the number of SQL queries executed by the application, I had not checked one of the most important things in this case!
THE COLUMN TYPES ARE DIFFERENT! The two columns compared in the join condition (rp.primKey and mbCategory.categoryId) do not have the same type, which is presumably why MySQL could not use the index on primKey in the normal way.
The developer who wrote that kind of code should get quite a big TALK.
Thanks for the help!
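A quick way to catch this kind of mismatch is to compare the declared types of the two join columns before tuning anything else. The sketch below uses Python's sqlite3 module and SQLite's pragma_table_info as a stand-in; on MySQL the equivalent information is available from information_schema.COLUMNS. The BIGINT and VARCHAR(255) types are assumptions for the example, since the post does not say what the actual types were, and declared_type is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed column types; the original post does not state them.
    CREATE TABLE MBCategory (categoryId BIGINT PRIMARY KEY, groupId BIGINT);
    CREATE TABLE ResourcePermission (id INTEGER PRIMARY KEY, primKey VARCHAR(255));
""")


def declared_type(connection, table, column):
    """Return the declared type of table.column, or None if it does not exist."""
    row = connection.execute(
        "SELECT type FROM pragma_table_info(?) WHERE name = ?",
        (table, column),
    ).fetchone()
    return row[0] if row else None


left_type = declared_type(conn, "MBCategory", "categoryId")
right_type = declared_type(conn, "ResourcePermission", "primKey")
types_differ = left_type != right_type

print(left_type, right_type, types_differ)
```

A plain string comparison like this is deliberately strict; it will also flag harmless differences, so treat a mismatch as a prompt to look closer rather than as proof of a problem.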
I had the same problem with a different cause. I was joining a large table, and the ON clause used OR to compare the primary key (ii.itemid) to two different columns:
SELECT *
FROM share_detail sd
JOIN box_view bv ON sd.container_id = bv.id
JOIN boxes b ON b.id = bv.shared_id
JOIN item_index ii ON ii.itemid = bv.shared_id OR b.parent_itemid = ii.itemid;
Fortunately, it turned out the parent_itemid comparison was redundant, so I was able to remove it. Now the index is being used as expected. Otherwise, I was going to try splitting the item_index join into two separate joins.
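If the OR condition had not been redundant, one possible way to split it is a UNION of two single-condition joins, which is not necessarily what the poster had in mind by "two separate joins". The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with a reduced, invented version of the boxes and item_index tables, and checks that both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down tables; only the columns used in the join.
    CREATE TABLE boxes (id INTEGER PRIMARY KEY, parent_itemid INTEGER);
    CREATE TABLE item_index (itemid INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO boxes VALUES (1, 100), (2, 200), (3, 3);
    INSERT INTO item_index VALUES
        (1, 'box-1'), (100, 'parent-100'), (200, 'parent-200'), (3, 'self-3');
""")

# Original shape: one join whose ON clause uses OR across two columns.
or_join = conn.execute("""
    SELECT b.id, ii.itemid
    FROM boxes b
    JOIN item_index ii
      ON ii.itemid = b.id OR ii.itemid = b.parent_itemid
    ORDER BY b.id, ii.itemid
""").fetchall()

# Split shape: each branch has a single equality that an index can serve;
# UNION removes the duplicate produced when both conditions match one row.
union_join = conn.execute("""
    SELECT b.id, ii.itemid
    FROM boxes b
    JOIN item_index ii ON ii.itemid = b.id
    UNION
    SELECT b.id, ii.itemid
    FROM boxes b
    JOIN item_index ii ON ii.itemid = b.parent_itemid
    ORDER BY 1, 2
""").fetchall()

print(or_join)
print(union_join)
```

Box 3 in the sample data matches item 3 through both conditions, which is the case where UNION (rather than UNION ALL) matters for keeping the results identical.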