How to optimize a MySQL update which contains an "in" subquery? - mysql

How do I optimize the following update? The subquery is being executed for each row in table a.
update
a
set
col = 1
where
col_foreign_id not in (select col_foreign_id in b)

Instead of your NOT IN, you could use an outer join and keep only the rows that have no matching record:
update table1 a
left join table2 b on a.col_foreign_id = b.col_foreign_id
set a.col = 1
where b.col_foreign_id is null
In EXPLAIN, this should show a SIMPLE select type rather than a DEPENDENT SUBQUERY.
Your current query (or the one that actually works, since the example in the OP doesn't look like it would run as written) is potentially dangerous: a NULL in b.col_foreign_id would cause nothing to match, and you'd update no rows.
NOT EXISTS is also worth looking at if you want to replace NOT IN.
I can't promise that this will make your query any faster; you'll have to test it in your environment.
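A minimal sketch of the NULL pitfall and the NOT EXISTS rewrite, assuming a scratch MySQL database. The tables demo_a and demo_b are hypothetical stand-ins for a and b. Because demo_b contains a NULL, the NOT IN update changes nothing, while the NOT EXISTS update marks the rows that have no match.

```sql
-- Hypothetical scratch tables standing in for a and b from the question.
DROP TABLE IF EXISTS demo_b, demo_a;
CREATE TABLE demo_a (
  id INT PRIMARY KEY,
  col_foreign_id INT,
  col INT NOT NULL DEFAULT 0
);
CREATE TABLE demo_b (
  col_foreign_id INT,
  KEY (col_foreign_id)
);
INSERT INTO demo_a (id, col_foreign_id) VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO demo_b VALUES (10), (NULL);

-- NOT IN: "x NOT IN (10, NULL)" is FALSE for 10 and NULL (unknown) otherwise,
-- never TRUE, so this statement updates no rows.
UPDATE demo_a
SET col = 1
WHERE col_foreign_id NOT IN (SELECT col_foreign_id FROM demo_b);

-- NOT EXISTS only asks whether a matching row exists, so the NULL is harmless:
-- ids 2 and 3 (foreign ids 20 and 30) have no match and get col = 1.
UPDATE demo_a
SET col = 1
WHERE NOT EXISTS (
  SELECT 1
  FROM demo_b
  WHERE demo_b.col_foreign_id = demo_a.col_foreign_id
);
```

As with the join version, compare the EXPLAIN output of both forms on your own data before choosing one.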
Here's a SQL Fiddle illuminating the differences between in, exists, and outer join (check the rows returned, null handling, and execution plans).

Related

Join Performances When Searching For NULL Value

I need to find values that exist in the LoyaltyTransactionsBasketItemsStores table but not in the DimProductConsolidate table. I need the item code and its corresponding company. This is my query:
SELECT
A.ProductReference, A.CompanyCode
FROM
(SELECT ProductReference, CompanyCode FROM dwhdb.LoyaltyTransactionsBasketItemsStores GROUP BY ProductReference) A
LEFT JOIN
(SELECT LoyaltyVariantArticleCode FROM dwhdb.DimProductConsolidate) B ON B.LoyaltyVariantArticleCode = A.ProductReference
WHERE
B.LoyaltyVariantArticleCode IS NULL
It is a pretty straightforward query, but when I run it, it runs for an hour and still doesn't finish. Then I used EXPLAIN and this is the result:
But when I remove CompanyCode from my query, its performance improves a lot. This is the EXPLAIN result:
I want to know why this is happening, and whether there is any way to get ProductReference and its company with much better performance.
Your current query is rife with syntax and structural errors. I would use exists logic here:
SELECT a.ProductReference, a.CompanyCode
FROM dwhdb.LoyaltyTransactionsBasketItemsStores a
WHERE NOT EXISTS (SELECT 1 FROM dwhdb.DimProductConsolidate b
WHERE b.LoyaltyVariantArticleCode = a.ProductReference);
Your current query does a GROUP BY in the first subquery but never selects any aggregates, only a non-aggregated column (CompanyCode is neither grouped nor aggregated). Most other databases reject this, and so does MySQL when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5). Also, there is no need for two subqueries here. Rather, just select from the basket table and then assert that matching records do not exist in the other table.
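A sketch of the NOT EXISTS version against small hypothetical tables: demo_basket and demo_products stand in for the two dwhdb tables, and the index name idx_article_code is an assumption. Because the basket table holds one row per basket line, the query above can return the same product/company pair many times; SELECT DISTINCT is one way to collapse those without the ambiguous GROUP BY, and an index on the lookup column lets each NOT EXISTS probe use an index instead of scanning.

```sql
-- Hypothetical scratch tables mirroring the columns used in the question.
DROP TABLE IF EXISTS demo_basket, demo_products;
CREATE TABLE demo_basket (
  ProductReference VARCHAR(20),
  CompanyCode VARCHAR(10)
);
CREATE TABLE demo_products (
  LoyaltyVariantArticleCode VARCHAR(20),
  -- Index on the column the NOT EXISTS subquery looks up.
  KEY idx_article_code (LoyaltyVariantArticleCode)
);
INSERT INTO demo_basket VALUES
  ('P1', 'C1'), ('P2', 'C1'), ('P2', 'C1'), ('P2', 'C2'), ('P3', 'C3');
INSERT INTO demo_products VALUES ('P1'), ('P3');

-- One row per (product, company) pair whose product is missing from demo_products.
SELECT DISTINCT a.ProductReference, a.CompanyCode
FROM demo_basket a
WHERE NOT EXISTS (
  SELECT 1
  FROM demo_products b
  WHERE b.LoyaltyVariantArticleCode = a.ProductReference
);
```

On this data the query returns the two pairs (P2, C1) and (P2, C2).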

Using a join in Mysql update statement instead of sub-query

I'm currently using the following to update a table of mine:
UPDATE Check_Dictionary
SET InDict = "No" WHERE (Leagues, Matches, Line) IN (SELECT * FROM (
SELECT Leagues, Matches, Line FROM Check_Dictionary
WHERE InDict = "No")as X)
However, when I have large data sets (40k+ rows) this seems to be fairly inefficient/slow. All of the searching I'm doing suggests that joins are far more efficient for this sort of thing than a sub-query. However, being a mysql newbie I'm not sure of the best way to do it.
My table may have multiple rows where the League/Matches/Line fields are the same. Generally the InDict field on these rows will be "Yes". However, if one of them is "No" I need to update all of the other rows with the same League/Matches/Line columns to "No" as well (so they all have a value of "No").
Would using a join in Mysql update statement instead of sub-query be more efficient?
How can I do it using a join?
I would expect a join to be faster, but it depends on indexing and other factors. You should try both to see which performs better (and use EXPLAIN to analyze the queries).
As for syntax, either of these should work:
UPDATE Check_Dictionary c1
JOIN (
SELECT Leagues, Matches, Line
FROM Check_Dictionary
WHERE InDict = "No"
) AS X USING (Leagues, Matches, Line)
SET InDict = "No"
UPDATE Check_Dictionary AS c1
JOIN Check_Dictionary AS c2 USING (Leagues, Matches, Line)
SET c1.InDict = "No"
WHERE c2.InDict = "No"
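Since the speed of either form depends heavily on indexing, here is a sketch with a composite index on the three join columns plus InDict. It assumes a hypothetical demo_check_dictionary table with made-up rows, and the index name is also hypothetical. Running the self-join update should turn every row in the L1/M1/X group to No and leave the L2 group alone.

```sql
-- Hypothetical copy of Check_Dictionary for experimenting.
DROP TABLE IF EXISTS demo_check_dictionary;
CREATE TABLE demo_check_dictionary (
  id INT AUTO_INCREMENT PRIMARY KEY,
  Leagues VARCHAR(50),
  Matches VARCHAR(50),
  Line VARCHAR(50),
  InDict VARCHAR(3),
  -- Composite index: the join columns first, then the filter column.
  KEY idx_lml_indict (Leagues, Matches, Line, InDict)
);
INSERT INTO demo_check_dictionary (Leagues, Matches, Line, InDict) VALUES
  ('L1', 'M1', 'X', 'Yes'),
  ('L1', 'M1', 'X', 'No'),
  ('L1', 'M1', 'X', 'Yes'),
  ('L2', 'M2', 'Y', 'Yes');

-- Self-join form: c2 finds a "No" row, c1 is every row in the same group.
UPDATE demo_check_dictionary AS c1
JOIN demo_check_dictionary AS c2 USING (Leagues, Matches, Line)
SET c1.InDict = 'No'
WHERE c2.InDict = 'No';
```

Prefix either UPDATE's equivalent SELECT with EXPLAIN (or, on MySQL 5.6 and later, EXPLAIN the UPDATE itself) to confirm the index is chosen.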
The update join query given by "jpw" is correct and you can use it; I won't repeat it. Having said that, I want to add that a join is often faster than a subquery, especially when you are updating 40K+ rows. Here is what the MySQL documentation says about this:
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone. Prior to SQL-92, outer joins did not exist, so subqueries were the only way to do certain things. Today, MySQL Server and many other modern database systems offer a wide range of outer join types.
MySQL Server supports multiple-table DELETE statements that can be used to efficiently delete rows based on information from one table or even from many tables at the same time. Multiple-table UPDATE statements are also supported. See Section 13.2.2, “DELETE Syntax”, and Section 13.2.10, “UPDATE Syntax”.
Source : http://dev.mysql.com/doc/refman/5.1/en/rewriting-subqueries.html

MYSQL - not equal joins not working properly

I'm having trouble getting a query to work properly. I feel that this should be easy but for some reason I can't get it correct.
I have two tables joined by an ID field. I'm trying to get all the records that are in t1 and don't show up in t2.
This works currently:
select * from at_templates a
left join at_vault b on a.id = b.template
where b.at_id is null
BUT, I also want to put another condition in the query to limit the data to a subset and it is not working:
select * from at_templates a
left join at_vault b on a.id = b.template
where b.at_id != 1
The second query comes up empty but I want the same results as the first, based upon the input of at_id.
Any ideas?
Your working example implies that the "first table" you want to see records from is a and the "second table" you want to use to exclude records is b. If you are excluding all records that exist in b, then you can't further limit the result set by any value like b.at_id because there are no values associated with b in your result set.
Additionally, if the condition b.at_id is null is true, the condition b.at_id != 1 will never be true because an inequality comparison with null will always return null. (The reason for this is that null is not a value; it is a placeholder indicating the absence of a value.)
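The NULL behaviour described above can be checked directly in MySQL. This one-line query uses NULL as a stand-in for b.at_id on a template with no vault row, and contrasts the ordinary inequality with MySQL's NULL-safe equality operator <=>, which returns 0 or 1 and never NULL.

```sql
-- NULL stands in for b.at_id on an unmatched row of the LEFT JOIN.
SELECT
  NULL != 1        AS plain_inequality,    -- NULL: neither true nor false
  NULL <=> 1       AS null_safe_equal,     -- 0: <=> never returns NULL
  NOT (NULL <=> 1) AS null_safe_not_equal; -- 1
```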
If you want to get the same results from both queries, based on a comparison between some user input parameter and the field b.at_id (and noting that your second query currently returns an empty set), you might be able to use MySQL's null-safe equality operator in the following way:
SELECT
*
FROM
at_templates AS a
LEFT JOIN
at_vault AS b ON a.id = b.template
WHERE NOT (b.at_id <=> 1);
This is a MySQL extension, not a standard syntax; unfortunately the ANSI SQL standard syntax, IS [NOT] DISTINCT FROM, doesn't appear to be widely supported. Some alternate ways to rewrite this condition are discussed in How to rewrite IS DISTINCT FROM and IS NOT DISTINCT FROM?.
Keep in mind that if in the future you have some values of b.at_id that are not 1, this query would return those rows as well, and not just the rows returned by your first query.
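If the intent is instead "templates that have no vault row with at_id = 1", a common alternative is to move the at_id filter into the ON clause, so the LEFT JOIN still keeps unmatched templates and the IS NULL test from your first query keeps working. This is a different query from the one above, shown here as a sketch on hypothetical demo tables; check it against your own data.

```sql
-- Hypothetical scratch copies of at_templates and at_vault.
DROP TABLE IF EXISTS demo_at_vault, demo_at_templates;
CREATE TABLE demo_at_templates (id INT PRIMARY KEY);
CREATE TABLE demo_at_vault (at_id INT, template INT);
INSERT INTO demo_at_templates VALUES (1), (2), (3);
-- Template 1 is linked for at_id 1; template 2 only for at_id 2.
INSERT INTO demo_at_vault VALUES (1, 1), (2, 2);

SELECT a.*
FROM demo_at_templates AS a
LEFT JOIN demo_at_vault AS b
  ON a.id = b.template
 AND b.at_id = 1          -- filter applied while joining, not after
WHERE b.at_id IS NULL;    -- keep templates with no at_id = 1 row
```

On this data the query returns templates 2 and 3.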

Is this query well written? I am fairly new at this and am wondering if there is a better way to write it

UPDATE table1
INNER JOIN table2
ON table1.var1=table2.var1
SET table1.var2=table2.var2
My table has about 975,000 rows in it and I know this will take a while no matter what. Is there any better way to write this?
Thanks!
If the typical case is that table1.var2 already equals table2.var2, you may end up with an inflated write count, as the database may still update all those rows with no functional change in value.
You may get better performance by updating only those rows which have a different value than the one you desire.
UPDATE table1
INNER JOIN table2
ON table1.var1=table2.var1
SET table1.var2=table2.var2
WHERE (table1.var2 is null and table2.var2 is not null OR
table1.var2 is not null and table2.var2 is null OR
table1.var2 <> table2.var2)
Edit: Never mind. MySQL only writes rows whose values actually change, unlike some other RDBMSs (MS SQL Server, for example).
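For reference, the three-branch NULL-aware inequality in the WHERE clause above can be written more compactly with MySQL's NULL-safe equality operator <=>. This is a sketch on hypothetical demo_t1/demo_t2 tables; it covers the same three cases (NULL to value, value to NULL, value to different value).

```sql
-- Hypothetical scratch tables shaped like table1 and table2.
DROP TABLE IF EXISTS demo_t2, demo_t1;
CREATE TABLE demo_t1 (var1 INT PRIMARY KEY, var2 INT);
CREATE TABLE demo_t2 (var1 INT PRIMARY KEY, var2 INT);
INSERT INTO demo_t1 VALUES (1, 10), (2, NULL), (3, 30), (4, 40);
INSERT INTO demo_t2 VALUES (1, 10), (2, 20), (3, NULL), (4, 44);

-- NOT (x <=> y) is true exactly when x and y differ, treating NULL as a value:
-- rows 2, 3 and 4 are updated; row 1 already matches and is skipped.
UPDATE demo_t1
INNER JOIN demo_t2 ON demo_t1.var1 = demo_t2.var1
SET demo_t1.var2 = demo_t2.var2
WHERE NOT (demo_t1.var2 <=> demo_t2.var2);
```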
Your query:
UPDATE table1 INNER JOIN
table2
ON table1.var1 = table2.var1
SET table1.var2 = table2.var2;
A priori, this looks fine. The major issue that I can see would be a 1-many relationship from table1 to table2. In that case, multiple rows from table2 might match a given row from table1. MySQL assigns an arbitrary value in such a case.
You could fix this by choosing one value, such as the min():
UPDATE table1 INNER JOIN
(select var1, min(var2) as var2
from table2
group by var1
) t2
ON table1.var1 = t2.var1
SET table1.var2 = t2.var2;
For performance reasons, you should have an index on table2(var1, var2). By including both columns in the index, the query will be able to use the index only and not have to fetch rows directly from the table.
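A sketch combining the min() fix with the suggested covering index, on hypothetical demo_table1/demo_table2 tables (the index name is also an assumption). var1 = 1 has two candidate values in demo_table2, and min() picks 5 deterministically.

```sql
-- Hypothetical scratch tables shaped like table1 and table2.
DROP TABLE IF EXISTS demo_table2, demo_table1;
CREATE TABLE demo_table1 (var1 INT PRIMARY KEY, var2 INT);
CREATE TABLE demo_table2 (
  var1 INT,
  var2 INT,
  -- Covering index: the grouped subquery can be answered from the index alone.
  KEY idx_var1_var2 (var1, var2)
);
INSERT INTO demo_table1 VALUES (1, NULL), (2, NULL);
INSERT INTO demo_table2 VALUES (1, 7), (1, 5), (2, 9);

UPDATE demo_table1 INNER JOIN
  (SELECT var1, MIN(var2) AS var2
   FROM demo_table2
   GROUP BY var1
  ) t2
  ON demo_table1.var1 = t2.var1
SET demo_table1.var2 = t2.var2;
```

On an existing table the index can be added with ALTER TABLE table2 ADD INDEX idx_var1_var2 (var1, var2); building it on a large table takes time of its own, so schedule it accordingly.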

Use ORDER BY 'x' with a JOIN, but keep rows that don't have a value for 'x'

This is simplified version of a relatively complex problem that myself and my colleagues can't quite get our heads around.
Consider two tables, table_a and table_b. In our CMS table_a holds metadata for all the data stored in the database, and table_b has some more specific information, so for simplicity's sake, a title and date column.
At the moment our query looks like:
SELECT *
FROM `table_a` LEFT OUTER JOIN `table_b` ON (table_a.id = table_b.id)
WHERE table_a.col = 'value'
ORDER BY table_b.date ASC
LIMIT 0,20
This degrades badly when table_a has a large number of rows. If the JOIN is changed to a RIGHT OUTER JOIN (which makes MySQL use the INDEX on table_b.date), the query is far quicker, but it doesn't produce the same results, because rows that have no table_b.date are dropped.
This becomes an issue in our CMS because if the user sorts on the date column, any rows that don't have a date set yet disappear from the interface, which creates a confusing UI experience and makes it difficult to add dates for the rows that are missing them.
Is there a solution that will:
- Use table_b.date's INDEX so that the query will scale better
- Somehow retain those rows in table_b that don't have a date set so that a user can enter the data
I'm going to second ArtoAle's comment. Since the ORDER BY applies to a NULL value in the outer join for rows missing from table_b, those rows will be out of order anyway.
The simulated outer join is the ugly part, so let's look at that first. MySQL (before 8.0.31) doesn't have EXCEPT, so you need to write the query in terms of EXISTS.
SELECT table_a.col1, table_a.col2, table_a.col3, ..., NULL AS table_b_col1, NULL AS ...
FROM
table_a
WHERE
NOT EXISTS (SELECT 1 FROM table_b WHERE table_b.id = table_a.id);
This should then be combined with UNION ALL with the original query, rewritten as an inner join. UNION ALL is preferable to UNION here because the two halves cannot overlap, so UNION's duplicate-removal step is wasted work; note that neither form guarantees a row order unless the combined result gets its own ORDER BY.
This sort of query is probably going to be dog-slow no matter what you do, because there won't be an index that readily supports a "foreign key not present" query. It basically boils down to an index scan on table_a.id with a lookup (or maybe a parallel scan) for the corresponding row in table_b.id.
So we ended up implementing a different solution that, while not as good as using an INDEX, still provided a nice speed boost of around 25%.
We removed the JOIN and instead used an ORDER BY subquery:
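A sketch of how the two halves might be combined, assuming hypothetical demo_table_a/demo_table_b tables. Two choices here go beyond the answer and are assumptions: each branch gets its own ORDER BY ... LIMIT so the first 20 rows can be found without sorting all of table_a (the dated branch can read rows in table_b.date index order), and a missing_b marker column gives the combined result an explicit order, listing dated rows first.

```sql
-- Hypothetical scratch tables with the shape described in the question.
DROP TABLE IF EXISTS demo_table_b, demo_table_a;
CREATE TABLE demo_table_a (id INT PRIMARY KEY, col VARCHAR(10), KEY (col));
CREATE TABLE demo_table_b (
  id INT PRIMARY KEY,
  title VARCHAR(50),
  date DATE,
  KEY (date)
);
INSERT INTO demo_table_a VALUES (1, 'value'), (2, 'value'), (3, 'value'), (4, 'other');
-- id 2 has no table_b row, so it has no date yet.
INSERT INTO demo_table_b VALUES (1, 'First', '2011-03-01'), (3, 'Third', '2011-01-15');

-- Branch 1: the inner join, ordered by date and cut off at the page size.
(SELECT a.id, b.title, b.date, 0 AS missing_b
 FROM demo_table_a a
 INNER JOIN demo_table_b b ON a.id = b.id
 WHERE a.col = 'value'
 ORDER BY b.date, a.id
 LIMIT 20)
UNION ALL
-- Branch 2: rows with no table_b row, padded with NULLs.
(SELECT a.id, NULL, NULL, 1
 FROM demo_table_a a
 WHERE a.col = 'value'
   AND NOT EXISTS (SELECT 1 FROM demo_table_b b WHERE b.id = a.id)
 ORDER BY a.id
 LIMIT 20)
-- The outer sort only sees at most 40 rows; this returns ids 3, 1, 2.
ORDER BY missing_b, date, id
LIMIT 0, 20;
```

For later pages, each branch's LIMIT has to cover offset plus page size (for example LIMIT 40 in both branches for an outer LIMIT 20, 20). To list undated rows first instead, sort on missing_b DESC.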
SELECT *
FROM `table_a`
WHERE table_a.col = 'value'
ORDER BY (
SELECT date
FROM table_b
WHERE id = table_a.id
) ASC
LIMIT 0,20
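In MySQL, NULL sorts before any non-NULL value in ascending order, so with the ORDER BY subquery above the rows without a date come first. If you would rather list them last, one option is to alias the subquery and sort on IS NULL first. This is a sketch against hypothetical demo tables; b_date is an assumed alias, and it assumes table_b.id is unique (otherwise the scalar subquery fails with "Subquery returns more than 1 row").

```sql
-- Hypothetical scratch tables; id 2 has no table_b row.
DROP TABLE IF EXISTS demo_table_b, demo_table_a;
CREATE TABLE demo_table_a (id INT PRIMARY KEY, col VARCHAR(10));
CREATE TABLE demo_table_b (id INT PRIMARY KEY, date DATE);
INSERT INTO demo_table_a VALUES (1, 'value'), (2, 'value'), (3, 'value');
INSERT INTO demo_table_b VALUES (1, '2011-03-01'), (3, '2011-01-15');

-- Alias the scalar subquery so ORDER BY can refer to it;
-- "b_date IS NULL" is 0 for dated rows and 1 for undated ones.
SELECT a.*,
       (SELECT b.date FROM demo_table_b b WHERE b.id = a.id) AS b_date
FROM demo_table_a a
WHERE a.col = 'value'
ORDER BY b_date IS NULL, b_date ASC
LIMIT 0, 20;
```

On this data the rows come back as ids 3, 1, 2.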