MySQL: use a subquery from FROM in WHERE

I have a question about MySQL. My script looks like this:
SELECT *
FROM (subQuery) AS q1
WHERE
(SELECT something
FROM q1
WHERE id = q1.id-1) = x;
My problem is that in the last subquery the table q1 is unknown. How can I solve this?

You can't refer to the derived table q1 from a subquery in the WHERE clause, so you can't compare one row of its results with another that way. The best you could do is duplicate the subquery, call the copy q1b for example, and join it to q1 on q1.id - 1 = q1b.id; then in the WHERE clause put something like q1b.something = x (though that condition can go in the ON clause as well).
There are also some possibilities with temporary tables, but you still need the two joins (or perhaps a correlated subquery in the WHERE clause would work). ... If the result is stored in a true TEMPORARY table, not just some "dummy" table you drop afterwards, you'll need two copies: you can't use the same TEMPORARY table twice in the same query.
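A minimal sketch of that self-join, run through Python's built-in sqlite3 module so it is self-contained (SQLite stands in for MySQL here). The base table `t`, its columns, and the sample rows are hypothetical stand-ins for whatever (subQuery) returns. As a swapped-in convenience, the subquery is written once as a common table expression: MySQL 8.0+ supports CTEs, and unlike a TEMPORARY table a CTE can be referenced more than once in the same query, so q1 and q1b don't need two copies of the subquery text.

```python
import sqlite3


def rows_after_match(x):
    """Return q1 rows whose predecessor row (id - 1) has something = x."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical base table standing in for whatever (subQuery) reads.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, something TEXT)")
    conn.executemany(
        "INSERT INTO t (id, something) VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "a"), (4, "c")],
    )
    # The subquery is written once as a CTE and joined to itself:
    # q1b is the row whose id is one less than q1's id.
    query = """
        WITH q1 AS (SELECT id, something FROM t)
        SELECT q1.id, q1.something
        FROM q1
        JOIN q1 AS q1b ON q1b.id = q1.id - 1
        WHERE q1b.something = ?
        ORDER BY q1.id
    """
    result = conn.execute(query, (x,)).fetchall()
    conn.close()
    return result


print(rows_after_match("a"))  # [(2, 'b'), (4, 'c')]
```

The condition on q1b could equally be moved into the ON clause; for an inner join the result is the same.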

Related

Performing a custom SQL query

I want to select all those rows in table A where column x's value is present in table B's column y.
I am new to writing SQL queries and have tried different combinations of the SELECT statement, the COUNT function, and the WHERE clause for a really long time, but was unable to do it.
Is it possible to do this using plain SQL queries, or is something more complex, like a stored procedure, needed?
A typical method is exists:
select a.*
from a
where exists (select 1
from b
where b.y = a.x
);
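A self-contained sketch, using SQLite through Python's sqlite3 module as a stand-in for MySQL and invented sample data, showing one property that makes EXISTS a good fit here: each row of a is returned at most once, however many rows of b match, whereas a plain JOIN repeats it once per match.

```python
import sqlite3


def rows_with_match():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, x INTEGER)")
    conn.execute("CREATE TABLE b (y INTEGER)")
    conn.executemany("INSERT INTO a (id, x) VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
    # 10 appears twice in b; 30 does not appear at all.
    conn.executemany("INSERT INTO b (y) VALUES (?)", [(10,), (10,), (20,)])
    # Semi-join: keep a row if at least one matching b row exists.
    exists_rows = conn.execute(
        """SELECT a.* FROM a
           WHERE EXISTS (SELECT 1 FROM b WHERE b.y = a.x)
           ORDER BY a.id"""
    ).fetchall()
    # A plain join repeats an a row once per matching b row.
    join_rows = conn.execute(
        "SELECT a.* FROM a JOIN b ON b.y = a.x ORDER BY a.id"
    ).fetchall()
    conn.close()
    return exists_rows, join_rows


exists_rows, join_rows = rows_with_match()
print(exists_rows)  # [(1, 10), (2, 20)]
print(join_rows)    # [(1, 10), (1, 10), (2, 20)]
```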

Join Performances When Searching For NULL Value

I need to find values that exist in the LoyaltyTransactionsBasketItemsStores table but not in the DimProductConsolidate table. I need the item code and its corresponding company. This is my query:
SELECT
A.ProductReference, A.CompanyCode
FROM
(SELECT ProductReference, CompanyCode FROM dwhdb.LoyaltyTransactionsBasketItemsStores GROUP BY ProductReference) A
LEFT JOIN
(SELECT LoyaltyVariantArticleCode FROM dwhdb.DimProductConsolidate) B ON B.LoyaltyVariantArticleCode = A.ProductReference
WHERE
B.LoyaltyVariantArticleCode IS NULL
It is a pretty straightforward query, but when I run it, it still hasn't finished after an hour. I then ran EXPLAIN on it.
But when I remove CompanyCode from the query, its performance improves a lot; I ran EXPLAIN on that version too.
I want to know why this is happening, and whether there is any way to get ProductReference and its company with much better performance.
Your current query is rife with syntax and structural errors. I would use exists logic here:
SELECT a.ProductReference, a.CompanyCode
FROM dwhdb.LoyaltyTransactionsBasketItemsStores a
WHERE NOT EXISTS (SELECT 1 FROM dwhdb.DimProductConsolidate b
WHERE b.LoyaltyVariantArticleCode = a.ProductReference);
Your current query does a GROUP BY in the first subquery, but instead of selecting aggregates it selects a non-aggregated column (CompanyCode). Most other databases reject this, and so does MySQL when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5). Also, there is no need for two subqueries here. Rather, just select from the basket table and then assert that no matching record exists in the other table.
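A minimal sketch of the NOT EXISTS anti-join, run in SQLite via Python's sqlite3 module so it is self-contained. The table names basket_items and dim_product and the sample rows are hypothetical stand-ins for dwhdb.LoyaltyTransactionsBasketItemsStores and dwhdb.DimProductConsolidate. DISTINCT is an addition that returns each (product, company) pair once, assuming the basket table repeats products across many baskets; the primary key on dim_product assumes the lookup column is indexed, which is what keeps each NOT EXISTS probe cheap.

```python
import sqlite3


def unmatched_products():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE basket_items (ProductReference TEXT, CompanyCode TEXT)")
    # Hypothetical: the lookup column is indexed (here via the primary key).
    conn.execute("CREATE TABLE dim_product (LoyaltyVariantArticleCode TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO basket_items VALUES (?, ?)",
        [("P1", "C1"), ("P2", "C1"), ("P2", "C1"), ("P3", "C2"), ("P3", "C3")],
    )
    conn.executemany("INSERT INTO dim_product VALUES (?)", [("P1",)])
    # Anti-join: keep basket rows with no matching product in dim_product.
    # DISTINCT collapses repeated (product, company) pairs.
    rows = conn.execute(
        """SELECT DISTINCT a.ProductReference, a.CompanyCode
           FROM basket_items a
           WHERE NOT EXISTS (SELECT 1 FROM dim_product b
                             WHERE b.LoyaltyVariantArticleCode = a.ProductReference)
           ORDER BY a.ProductReference, a.CompanyCode"""
    ).fetchall()
    conn.close()
    return rows


print(unmatched_products())  # [('P2', 'C1'), ('P3', 'C2'), ('P3', 'C3')]
```

Note that P3 appears under two companies; the original GROUP BY ProductReference would have silently kept only one of them.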

Query performance issue with multiple left joins

In MySQL 8.x I have a table with about 7000 records in it. I'm trying to create a single query that combines two subqueries of the same table.
I thought I could achieve this by left joining on the subqueries and then matching on any records that have values for these, as shown in the example below (note: this effect happens when my_table has just an id column).
The query runs quickly when the subqueries return records, but not when they return nothing (which I've recreated in the example below with WHERE FALSE). In that case the combined query takes 12 seconds, even though each of these queries takes a millisecond or so when run on its own.
My understanding is that these joins should return the same number of rows as the source table, so there shouldn't be such a big difference. I'd like to understand how the join works in this kind of case and why it produces such a difference in execution time.
SELECT my_table.* FROM accessory_requests
LEFT JOIN
( SELECT my_table.id
FROM my_table
WHERE FALSE
) as join1
ON join1.id = my_table.id
LEFT JOIN
( SELECT my_table.id
FROM my_table
WHERE FALSE
) as join2
ON join2.id = my_table.id
WHERE join1.id IS NOT NULL OR join2.id IS NOT NULL;
Your query is all messed up and it is not really clear what you are trying to do.
However, I can comment on your performance issues. MySQL has a tendency to materialize subqueries in the FROM clause. That means a new copy of the data is created, and the indexes on the original table are not available on that copy. So, eliminate the subqueries in the FROM clause.
If you ask another question with sample data, desired results, and a decent explanation, then it might be possible to help with a more efficient form of the query. I suspect you just want not exists, but that is a rather large leap from this question.
You wrote "combines two subqueries of the same table."
What do you mean?
If you want to take the rows from each subquery, then simply do
( SELECT ... ) -- what you are calling the first subquery
UNION
( SELECT ... ) -- 2nd
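A small self-contained check, using SQLite through Python's sqlite3 module and an invented two-column my_table, that the UNION form returns the same ids as the question's LEFT JOIN ... OR ... IS NOT NULL pattern. The two filters (kind = 'a' and id > 3) are hypothetical stand-ins for the real subqueries. SQLite does not accept parentheses around UNION operands, so they are dropped here; MySQL accepts either form.

```python
import sqlite3


def combine():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, kind TEXT)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "a"), (4, "b"), (5, "a")],
    )
    # UNION: rows matched by either subquery, duplicates removed.
    union_ids = [r[0] for r in conn.execute(
        """SELECT id FROM my_table WHERE kind = 'a'
           UNION
           SELECT id FROM my_table WHERE id > 3
           ORDER BY id""")]
    # The LEFT JOIN ... OR ... IS NOT NULL pattern from the question.
    join_ids = [r[0] for r in conn.execute(
        """SELECT my_table.id FROM my_table
           LEFT JOIN (SELECT id FROM my_table WHERE kind = 'a') AS join1
             ON join1.id = my_table.id
           LEFT JOIN (SELECT id FROM my_table WHERE id > 3) AS join2
             ON join2.id = my_table.id
           WHERE join1.id IS NOT NULL OR join2.id IS NOT NULL
           ORDER BY my_table.id""")]
    conn.close()
    return union_ids, join_ids


print(combine())  # ([1, 3, 4, 5], [1, 3, 4, 5])
```

Use UNION ALL instead if duplicates across the two subqueries should be kept.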
Also,
LEFT JOIN ( ... ) as join1 ON ...
WHERE join1.id IS NOT NULL;
is probably the same as simply
JOIN ( ... ) as join1 ON ...
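A quick check of that equivalence under SQLite (via Python's sqlite3 module) with invented data: a LEFT JOIN whose WHERE clause requires the joined id to be non-NULL keeps exactly the rows an inner JOIN keeps, because the unmatched rows are precisely the ones where join1.id is NULL.

```python
import sqlite3


def compare_joins():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, kind TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "a")])
    # Hypothetical subquery standing in for "( ... )" above.
    sub = "(SELECT id FROM my_table WHERE kind = 'a')"
    # LEFT JOIN, then discard rows where the join found nothing.
    left_ids = [r[0] for r in conn.execute(
        f"SELECT my_table.id FROM my_table LEFT JOIN {sub} AS join1 "
        "ON join1.id = my_table.id WHERE join1.id IS NOT NULL ORDER BY my_table.id")]
    # Inner JOIN keeps only matched rows in the first place.
    inner_ids = [r[0] for r in conn.execute(
        f"SELECT my_table.id FROM my_table JOIN {sub} AS join1 "
        "ON join1.id = my_table.id ORDER BY my_table.id")]
    conn.close()
    return left_ids, inner_ids


print(compare_joins())  # ([1, 3], [1, 3])
```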
If by "combining" you mean to have multiple columns, then see the tag [pivot-table].

Optimizing INNER JOIN across multiple tables

I have trawled many of the similar responses on this site and have improved my code at several stages along the way. Unfortunately, this 3-row query still won't run.
I have one table with 100k+ rows and about 30 columns, which I can filter down to 3 rows (in this example), and then perform INNER JOINs across 21 small lookup tables.
In my first attempt, I was lazy and used implicit joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `lookup_table` x 21
WHERE `master_table`.`indexed_col` = "value"
AND `lookup_table`.`id` = `lookup_col` x 21
The query looked to be timing out:
#2013 - Lost connection to MySQL server during query
Following this, I tried being explicit about the joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `master_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `master_table`.`lookup_col` x 21
WHERE `master_table`.`indexed_col` = "value"
Still got the same result. I then realised that the query was probably trying to perform the joins first, then filter down via the WHERE clause. So after a bit more research, I learned how I could apply a subquery to perform the filter first and then perform the joins on the newly created table. This is where I got to, and it still returns the same error. Is there any way I can improve this query further?
SELECT `temp_table`.*, `lookup_table`.`data_point` x 21
FROM (SELECT * FROM `master_table` WHERE `indexed_col` = "value") as `temp_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `temp_table`.`lookup_col` x 21
Is this the best way to write up this kind of query? I tested the subquery to ensure it only returns a small table and can confirm that it returns only three rows.
First, at its simplest, you are looking for
select
mt.*
from
Master_Table mt
where
mt.indexed_col = 'value'
That is probably instantaneous, provided your master table has an index with indexed_col in the first position (in case it is a compound index on several columns)…
Now, if I understand you correctly, your different lookup columns (21 in total) were just abbreviated in this post to avoid repetition, and you are actually doing something to the effect of
select
mt.*,
lt1.lookupDescription1,
lt2.lookupDescription2,
...
lt21.lookupDescription21
from
Master_Table mt
JOIN Lookup_Table1 lt1
on mt.lookup_col1 = lt1.pk_col1
JOIN Lookup_Table2 lt2
on mt.lookup_col2 = lt2.pk_col2
...
JOIN Lookup_Table21 lt21
on mt.lookup_col21 = lt21.pk_col21
where
mt.indexed_col = 'value'
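A scaled-down sketch of that pattern with two lookup tables instead of 21, run in SQLite through Python's sqlite3 module. The table and column names follow the hypothetical placeholders above, and the sample rows are invented. STRAIGHT_JOIN is MySQL-specific, so it is not exercised here; the sketch only shows the shape: filter the master table on its indexed column and join each lookup table by its primary key.

```python
import sqlite3


def describe_rows(value):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lookup_table1 (pk_col1 INTEGER PRIMARY KEY, lookupDescription1 TEXT);
        CREATE TABLE lookup_table2 (pk_col2 INTEGER PRIMARY KEY, lookupDescription2 TEXT);
        CREATE TABLE master_table (
            id INTEGER PRIMARY KEY,
            indexed_col TEXT,
            lookup_col1 INTEGER REFERENCES lookup_table1 (pk_col1),
            lookup_col2 INTEGER REFERENCES lookup_table2 (pk_col2)
        );
        -- The filter column is indexed so the WHERE clause can use it.
        CREATE INDEX idx_master_indexed_col ON master_table (indexed_col);
        INSERT INTO lookup_table1 VALUES (1, 'red'), (2, 'blue');
        INSERT INTO lookup_table2 VALUES (1, 'small'), (2, 'large');
        INSERT INTO master_table VALUES
            (1, 'value', 1, 2), (2, 'other', 2, 2), (3, 'value', 2, 1);
    """)
    # Master table first, one primary-key join per lookup table.
    rows = conn.execute(
        """SELECT mt.id, lt1.lookupDescription1, lt2.lookupDescription2
           FROM master_table mt
           JOIN lookup_table1 lt1 ON mt.lookup_col1 = lt1.pk_col1
           JOIN lookup_table2 lt2 ON mt.lookup_col2 = lt2.pk_col2
           WHERE mt.indexed_col = ?
           ORDER BY mt.id""",
        (value,),
    ).fetchall()
    conn.close()
    return rows


print(describe_rows("value"))  # [(1, 'red', 'large'), (3, 'blue', 'small')]
```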
I had a project well over a decade ago dealing with a similar situation... the master table had about 21+ million records and had to join to about 30+ lookup tables. The system crawled, and the query died after running for more than 24 hours.
This too was on a MySQL server and the fix was a single MySQL keyword...
Select STRAIGHT_JOIN mt.*, ...
By having your master table in the first position, with the WHERE clause and its criteria applied directly to the master table, you are in good shape. You know the relationships between the tables. STRAIGHT_JOIN effectively tells MySQL: do the query in the exact order I presented it to you. Don't try to think for me on this and optimize based on a subsidiary table that may have a smaller record count, somehow thinking that will make the query faster... it won't.
Try the STRAIGHT_JOIN keyword. With it, the query I was working on finished in about 1.5 hours... and that was returning all 21 million rows with every corresponding lookup description for the final output, so it naturally took far longer than a 3-record result would.
First, don't use a subquery. Write the query as:
SELECT mt.*, l.`data_point`
FROM `master_table` mt INNER JOIN
`lookup_table` l
ON l.`id` = mt.`lookup_col`
WHERE mt.`indexed_col` = 'value';
The indexes that you want are master_table(indexed_col, lookup_col) and lookup_table(id, data_point).
If you are still having performance problems, then there are multiple possibilities. High among them is that the result set is simply too big to return in a reasonable amount of time. To see if that is the case, you can use select count(*) to count the number of returned rows.
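A small sketch of the count-first check against the corrected query, using SQLite through Python's sqlite3 module as a stand-in for MySQL. The composite indexes mirror the ones suggested above; the 200 generated master rows and two lookup rows are invented test data. In SQLite an INTEGER PRIMARY KEY is already the row key, so the lookup_table index mainly illustrates the covering-index idea.

```python
import sqlite3


def count_then_fetch(value):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lookup_table (id INTEGER PRIMARY KEY, data_point TEXT);
        CREATE TABLE master_table (id INTEGER PRIMARY KEY, indexed_col TEXT, lookup_col INTEGER);
        -- Composite indexes along the lines suggested above.
        CREATE INDEX idx_master ON master_table (indexed_col, lookup_col);
        CREATE INDEX idx_lookup ON lookup_table (id, data_point);
    """)
    conn.executemany("INSERT INTO lookup_table VALUES (?, ?)", [(1, "x"), (2, "y")])
    # Invented data: every 50th row has indexed_col = 'value'.
    conn.executemany(
        "INSERT INTO master_table VALUES (?, ?, ?)",
        [(i, "value" if i % 50 == 0 else "other", 1 + i % 2) for i in range(1, 201)],
    )
    join = """ FROM master_table mt
               INNER JOIN lookup_table l ON l.id = mt.lookup_col
               WHERE mt.indexed_col = ?"""
    # Check the size of the result set before pulling the rows themselves.
    (n,) = conn.execute("SELECT COUNT(*)" + join, (value,)).fetchone()
    rows = conn.execute(
        "SELECT mt.id, l.data_point" + join + " ORDER BY mt.id", (value,)
    ).fetchall()
    conn.close()
    return n, rows


print(count_then_fetch("value"))  # (4, [(50, 'x'), (100, 'x'), (150, 'x'), (200, 'x')])
```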

MYSQL: how to find entries corresponding to MIN() of costly function

I am running a complicated and costly query to find the MIN() values of a function grouped by another attribute. But I don't just need the value; I need the entry that produces it, plus the value.
My current pseudoquery goes something like this:
SELECT MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) FROM (prefiltering) as a GROUP BY a.group_att;
but I want a.* and MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) as my result.
The only way I can think of is using this ugly beast:
SELECT a1.*, COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2)
FROM (prefiltering) as a1
WHERE COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2) =
(SELECT MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) FROM (prefiltering) as a GROUP BY a.group_att)
But now I am executing the prefiltering_query 2 times and have to run the costly function twice. This is ridiculous and I hope that I am doing something seriously wrong here.
Possible solution?:
Just now I realize that I could create a temporary table containing:
(SELECT a1.*, COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2) as complex FROM (prefiltering) as a1)
and then run the MIN() as subquery and compare it at greatly reduced cost. Is that the way to go?
A problem with your temporary table solution is that I can't see any way to avoid using it twice in the same query.
However, if you're willing to use an actual permanent table (perhaps with ENGINE = MEMORY), it should work.
You can also move the subquery into the FROM clause, where it might be more efficient:
CREATE TABLE temptable ENGINE = MEMORY
SELECT a1.*,
COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2) AS complex
FROM prefiltering AS a1;
CREATE INDEX group_att_complex USING BTREE
ON temptable (group_att, complex);
SELECT a2.*
FROM temptable AS a2
NATURAL JOIN (
SELECT group_att, MIN(complex) AS complex
FROM temptable GROUP BY group_att
) AS a3;
DROP TABLE temptable;
(You can try it without the index too, but I suspect it'll be faster with it.)
Edit: Of course, if one temporary table won't do, you could always use two:
CREATE TEMPORARY TABLE temp1
SELECT *, COSTLY_FUNCTION(att1,att2,$v1,$v2) AS complex
FROM prefiltering;
CREATE INDEX group_att_complex ON temp1 (group_att, complex);
CREATE TEMPORARY TABLE temp2
SELECT group_att, MIN(complex) AS complex
FROM temp1 GROUP BY group_att;
SELECT temp1.* FROM temp1 NATURAL JOIN temp2;
(Again, you may want to try it with or without the index; when I ran EXPLAIN on it, MySQL didn't seem to want to use the index for the final query at all, although that might be just because my test data set was so small. Anyway, here's a link to SQLize if you want to play with it; I used CONCAT() to stand in for your expensive function.)
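A runnable sketch of the two-temporary-table approach, with SQLite (via Python's sqlite3 module) standing in for MySQL. COSTLY_FUNCTION is registered as a hypothetical Python function that counts its calls, and prefiltering is a small invented table. The point is that the function runs once per prefiltered row, when temp1 is filled, and the final NATURAL JOIN only compares stored values. temp1 is created first and then filled with INSERT ... SELECT so that $v1 and $v2 can be passed as bound parameters.

```python
import sqlite3


def min_rows(v1, v2):
    """Per group_att, return the rows with the minimum COSTLY_FUNCTION value."""
    calls = [0]

    def costly_function(att1, att2, p1, p2):
        # Hypothetical stand-in for the expensive function; counts its calls.
        calls[0] += 1
        return abs(att1 - p1) + abs(att2 - p2)

    conn = sqlite3.connect(":memory:")
    conn.create_function("COSTLY_FUNCTION", 4, costly_function)
    conn.executescript("""
        CREATE TABLE prefiltering (id INTEGER PRIMARY KEY, group_att TEXT, att1 INTEGER, att2 INTEGER);
        INSERT INTO prefiltering VALUES
            (1, 'g1', 1, 1), (2, 'g1', 5, 5), (3, 'g2', 2, 8), (4, 'g2', 9, 9);
        CREATE TEMPORARY TABLE temp1 (id INTEGER, group_att TEXT, att1 INTEGER, att2 INTEGER, complex INTEGER);
    """)
    # The costly function runs once per prefiltered row, while filling temp1.
    conn.execute(
        "INSERT INTO temp1 SELECT *, COSTLY_FUNCTION(att1, att2, ?, ?) FROM prefiltering",
        (v1, v2),
    )
    conn.executescript("""
        CREATE INDEX group_att_complex ON temp1 (group_att, complex);
        CREATE TEMPORARY TABLE temp2 AS
            SELECT group_att, MIN(complex) AS complex FROM temp1 GROUP BY group_att;
    """)
    # NATURAL JOIN matches on the shared columns group_att and complex.
    rows = conn.execute(
        "SELECT temp1.* FROM temp1 NATURAL JOIN temp2 ORDER BY temp1.id"
    ).fetchall()
    conn.close()
    return rows, calls[0]


rows, calls = min_rows(0, 0)
print(rows)   # [(1, 'g1', 1, 1, 2), (3, 'g2', 2, 8, 10)]
print(calls)  # 4
```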
You can use the HAVING clause to get columns in addition to that MIN value. For example:
SELECT a.*, COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)
FROM (prefiltering) as a
GROUP BY a.group_att
HAVING MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) = COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2);
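The HAVING version compares each group's minimum against COSTLY_FUNCTION evaluated on whichever row MySQL picks for the non-aggregated a.* columns, and MySQL does not guarantee that this is the minimizing row (with ONLY_FULL_GROUP_BY enabled, the query is rejected outright). A swapped-in alternative, assuming MySQL 8.0+ window functions, is to compute the function in one derived table and keep the first row per group with ROW_NUMBER(). The sketch below runs it in SQLite through Python's sqlite3 module (SQLite 3.25 or newer is needed for window functions); COSTLY_FUNCTION and the prefiltering data are hypothetical stand-ins.

```python
import sqlite3


def costly_function(att1, att2, v1, v2):
    # Hypothetical stand-in for the expensive function.
    return abs(att1 - v1) + abs(att2 - v2)


def min_rows_window(v1, v2):
    """Per group_att, return the row with the smallest COSTLY_FUNCTION value."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("COSTLY_FUNCTION", 4, costly_function)
    conn.executescript("""
        CREATE TABLE prefiltering (id INTEGER PRIMARY KEY, group_att TEXT, att1 INTEGER, att2 INTEGER);
        INSERT INTO prefiltering VALUES
            (1, 'g1', 1, 1), (2, 'g1', 5, 5), (3, 'g2', 2, 8), (4, 'g2', 9, 9);
    """)
    rows = conn.execute(
        """SELECT id, group_att, att1, att2, complex
           FROM (
               SELECT p.*,
                      -- number rows inside each group by the computed value
                      ROW_NUMBER() OVER (PARTITION BY p.group_att ORDER BY p.complex) AS rn
               FROM (
                   SELECT *, COSTLY_FUNCTION(att1, att2, ?, ?) AS complex
                   FROM prefiltering
               ) AS p
           ) AS ranked
           WHERE rn = 1
           ORDER BY id""",
        (v1, v2),
    ).fetchall()
    conn.close()
    return rows


print(min_rows_window(0, 0))  # [(1, 'g1', 1, 1, 2), (3, 'g2', 2, 8, 10)]
```

ROW_NUMBER() keeps exactly one row per group even when several rows tie for the minimum; RANK() would keep all tied rows instead.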