MySQL: Optimizing a subquery which uses a WHERE clause?

In my query, I use a subquery that gets the second-highest offer date for a given product.
Here's my subquery:
LEFT JOIN (SELECT product_id, MAX(offer_date) AS sec_last_date
FROM t_offers AS s1
WHERE offer_date < (SELECT MAX(offer_date)
FROM t_offers
WHERE product_id=s1.product_id)
GROUP BY product_id) AS t2 ON t2.product_id=p.id
LEFT JOIN t_offers AS o2 ON o2.product_id=t2.product_id AND
o2.offer_date=t2.sec_last_date
It works fine, but for now there are only a few rows in the t_offers table.
It will probably not work as well with thousands or millions of rows, because the correlated subquery in the WHERE clause forces MySQL to iterate over the t_offers table again for each product_id.
How could I optimize this subquery?

Subqueries often perform poorly in MySQL because the joins against them may not be able to use indexes.
However, it might be worth trying a subquery with a join, rather than a subquery with a nested subquery:
LEFT JOIN (SELECT s1.product_id, MAX(s1.offer_date) AS sec_last_date
FROM t_offers AS s1
INNER JOIN t_offers AS s2
ON s2.product_id = s1.product_id
AND s2.offer_date > s1.offer_date
GROUP BY s1.product_id) AS t2 ON t2.product_id=p.id
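To check that the self-join really returns the same "second-highest date" as the question's correlated subquery, here is a small sketch that runs both inner queries against an in-memory SQLite database. SQLite is only a convenient stand-in for MySQL here, the table is reduced to the two columns the queries use, and the sample rows are hypothetical; it demonstrates equivalence of results, not MySQL performance.

```python
import sqlite3

# Hypothetical sample data: product 1 has three offers, product 2 has two,
# product 3 has only one (so it has no second-highest date).
# ISO-8601 date strings compare correctly as text.
OFFERS = [
    (1, "2020-01-01"), (1, "2020-02-01"), (1, "2020-03-01"),
    (2, "2020-01-15"), (2, "2020-01-20"),
    (3, "2020-05-05"),
]

# The question's version: correlated subquery in the WHERE clause.
CORRELATED = """
SELECT product_id, MAX(offer_date) AS sec_last_date
FROM t_offers AS s1
WHERE offer_date < (SELECT MAX(offer_date)
                    FROM t_offers
                    WHERE product_id = s1.product_id)
GROUP BY product_id
ORDER BY product_id
"""

# The answer's version: keep only dates that have a later date, take the max.
SELF_JOIN = """
SELECT s1.product_id, MAX(s1.offer_date) AS sec_last_date
FROM t_offers AS s1
INNER JOIN t_offers AS s2
        ON s2.product_id = s1.product_id
       AND s2.offer_date > s1.offer_date
GROUP BY s1.product_id
ORDER BY s1.product_id
"""

def second_latest(sql):
    """Run one of the queries above against a fresh in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t_offers (product_id INTEGER, offer_date TEXT)")
    conn.executemany("INSERT INTO t_offers VALUES (?, ?)", OFFERS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

print(second_latest(CORRELATED))  # [(1, '2020-02-01'), (2, '2020-01-15')]
print(second_latest(SELF_JOIN))   # [(1, '2020-02-01'), (2, '2020-01-15')]
```

Product 3 is missing from both results, which is fine because the outer query attaches t2 with a LEFT JOIN.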

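Can't you just sort by offer date and take the latest two? Something like:
SELECT product_id, offer_date
FROM t_offers
WHERE product_id = ?
ORDER BY offer_date DESC
LIMIT 2
This works for one product at a time (the ? stands for the product id), so it would have to be run once per product.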
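A minimal sketch of the sort-and-limit idea, again using SQLite as a stand-in for MySQL with hypothetical data. The PER_PRODUCT query is a hypothetical extension, not part of the answer: it applies the same "sort descending, skip the first" idea to every product through a correlated scalar subquery with LIMIT 1 OFFSET 1. DISTINCT is added so that two offers sharing the latest date do not both count, which matches the MAX()-based queries; the plain LIMIT 2 query would return the tied date twice.

```python
import sqlite3

# Hypothetical sample data; ISO-8601 date strings sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_offers (product_id INTEGER, offer_date TEXT)")
conn.executemany(
    "INSERT INTO t_offers VALUES (?, ?)",
    [(1, "2020-01-01"), (1, "2020-02-01"), (1, "2020-03-01"),
     (2, "2020-01-15"), (2, "2020-01-20")],
)

# The answer's query, restricted to a single product via the ? placeholder.
LATEST_TWO = """
SELECT product_id, offer_date
FROM t_offers
WHERE product_id = ?
ORDER BY offer_date DESC
LIMIT 2
"""

# Hypothetical per-product variant: skip the latest distinct date, take the next.
PER_PRODUCT = """
SELECT p.product_id,
       (SELECT DISTINCT o.offer_date
        FROM t_offers AS o
        WHERE o.product_id = p.product_id
        ORDER BY o.offer_date DESC
        LIMIT 1 OFFSET 1) AS sec_last_date
FROM (SELECT DISTINCT product_id FROM t_offers) AS p
ORDER BY p.product_id
"""

latest_two = conn.execute(LATEST_TWO, (1,)).fetchall()
# The second row (if any) holds the second-latest offer.
second_row = latest_two[1] if len(latest_two) > 1 else None

print(latest_two)                            # [(1, '2020-03-01'), (1, '2020-02-01')]
print(conn.execute(PER_PRODUCT).fetchall())  # [(1, '2020-02-01'), (2, '2020-01-15')]
```

A product with a single offer would get NULL from PER_PRODUCT rather than being dropped.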

Related

Combining simple selects in mysql (join or union?)

I have two queries that are both very quick (20 ms). When I combine them with a join, the query takes 30 seconds and the data is wrong. What's wrong?
SELECT
count(profile.id),
date(profile.createdAt)
FROM profile
GROUP BY date(profile.createdAt)
ORDER BY date(profile.createdAt) DESC;
and
SELECT
count(product.id),
date(product.createdAt)
FROM product
GROUP BY date(product.createdAt)
ORDER BY date(product.createdAt) desc;
Joining them, I get a very slow query:
SELECT
count(profile._id),
date(profile.createdAt),
count(product._id),
date(product.createdAt)
FROM profile
INNER JOIN product
ON date(product.createdAt) = date(profile.createdAt)
GROUP BY
date(product.createdAt),
date(profile.createdAt)
ORDER BY date(product.createdAt) desc;
The logical error with your current approach is that the join double counts: every profile row for a given date is paired with every product row for that date, so both COUNTs return the product of the two row counts. You may try doing the aggregations in separate subqueries, and then joining those subqueries:
SELECT
t1.createdAt,
COALESCE(t1.profile_cnt, 0) AS profile_cnt,
COALESCE(t2.product_cnt, 0) AS product_cnt
FROM
(
SELECT DATE(createdAt) AS createdAt, COUNT(id) AS profile_cnt
FROM profile
GROUP BY DATE(createdAt)
) t1
INNER JOIN
(
SELECT DATE(createdAt) AS createdAt, COUNT(id) AS product_cnt
FROM product
GROUP BY DATE(createdAt)
) t2
ON t1.createdAt = t2.createdAt;
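The double counting is easy to reproduce. This sketch runs the question's join and the answer's pre-aggregated join side by side in SQLite (a stand-in for MySQL); the data is hypothetical, and the question's `_id` columns are written as `id` to match the single-table queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profile (id INTEGER PRIMARY KEY, createdAt TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, createdAt TEXT);
-- Hypothetical data: 2 profiles and 3 products on March 1, 1 of each on March 2.
INSERT INTO profile (createdAt) VALUES
  ('2021-03-01 09:00:00'), ('2021-03-01 17:30:00'), ('2021-03-02 08:00:00');
INSERT INTO product (createdAt) VALUES
  ('2021-03-01 10:00:00'), ('2021-03-01 11:00:00'),
  ('2021-03-01 12:00:00'), ('2021-03-02 13:00:00');
""")

# The question's join: every profile row is paired with every product row
# from the same day before anything is counted.
JOINED = """
SELECT DATE(profile.createdAt), COUNT(profile.id), COUNT(product.id)
FROM profile
INNER JOIN product ON DATE(product.createdAt) = DATE(profile.createdAt)
GROUP BY DATE(profile.createdAt)
ORDER BY DATE(profile.createdAt) DESC
"""

# The answer's approach: aggregate each table first, then join the totals.
PRE_AGGREGATED = """
SELECT t1.createdAt, t1.profile_cnt, t2.product_cnt
FROM (SELECT DATE(createdAt) AS createdAt, COUNT(id) AS profile_cnt
      FROM profile GROUP BY DATE(createdAt)) AS t1
INNER JOIN (SELECT DATE(createdAt) AS createdAt, COUNT(id) AS product_cnt
            FROM product GROUP BY DATE(createdAt)) AS t2
        ON t1.createdAt = t2.createdAt
ORDER BY t1.createdAt DESC
"""

print(conn.execute(JOINED).fetchall())
# [('2021-03-02', 1, 1), ('2021-03-01', 6, 6)]  (2 x 3 = 6 for both counts)
print(conn.execute(PRE_AGGREGATED).fetchall())
# [('2021-03-02', 1, 1), ('2021-03-01', 2, 3)]
```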
If the two tables don't both contain the same dates, then the above query might drop certain dates. To avoid this, we could join with a calendar table which includes all dates we want to appear in the output.
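If no calendar table is available, one possible workaround (a swapped-in technique, not the answer's) is to build the list of dates on the fly with UNION over both tables and LEFT JOIN each aggregate onto it. Unlike a real calendar table, this only covers dates that appear in at least one of the two tables. The sketch uses SQLite as a stand-in for MySQL, with hypothetical data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profile (id INTEGER PRIMARY KEY, createdAt TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, createdAt TEXT);
-- Hypothetical data: March 2 has a product but no profile.
INSERT INTO profile (createdAt) VALUES ('2021-03-01 09:00:00'), ('2021-03-01 17:30:00');
INSERT INTO product (createdAt) VALUES ('2021-03-01 10:00:00'), ('2021-03-02 13:00:00');
""")

# Derived list of every date present in either table; each per-table
# aggregate is LEFT JOINed onto it so a missing count becomes 0.
ALL_DATES = """
SELECT d.createdAt,
       COALESCE(t1.profile_cnt, 0) AS profile_cnt,
       COALESCE(t2.product_cnt, 0) AS product_cnt
FROM (SELECT DATE(createdAt) AS createdAt FROM profile
      UNION
      SELECT DATE(createdAt) FROM product) AS d
LEFT JOIN (SELECT DATE(createdAt) AS createdAt, COUNT(id) AS profile_cnt
           FROM profile GROUP BY DATE(createdAt)) AS t1
       ON t1.createdAt = d.createdAt
LEFT JOIN (SELECT DATE(createdAt) AS createdAt, COUNT(id) AS product_cnt
           FROM product GROUP BY DATE(createdAt)) AS t2
       ON t2.createdAt = d.createdAt
ORDER BY d.createdAt
"""

print(conn.execute(ALL_DATES).fetchall())
# [('2021-03-01', 2, 1), ('2021-03-02', 0, 1)]
```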
Regarding performance, you are joining two aggregation queries, so the result should not be expected to be especially fast. Also, calling DATE() on every row to truncate createdAt to a pure date is expensive, and could perhaps be avoided by maintaining a dedicated date column.
I think the problem is that you are joining on the result of the DATE() function, which has to be evaluated for every record in each table.
If you can, join with the primary keys/foreign keys of the tables to take advantage of indexes.

Two SELECT statements on the same table to get COUNT(*)

I'm trying to combine two queries on the same table to get a COUNT(*) value.
I have this:
SELECT `a`.`name`, `a`.`points` FROM `rank` AS a WHERE `id` = 1
And in the same query I want to do this:
SELECT `b`.`Count(*)` FROM `rank` as b WHERE `b`.`points` >= `a`.`points`
I tried searching but could not find how to do a COUNT(*) in the same query.
Typically you would not mix a non-aggregate query and an aggregate query together in MySQL. You might do this with analytic (window) functions in databases that support them, such as SQL Server or MySQL 8.0 and later, but not in older versions of MySQL. That being said, your second query can be handled using a correlated subquery in the SELECT clause of the first query. So you may try the following:
-- `rank` is backticked because RANK is a reserved word in MySQL 8.0
SELECT
a.name,
a.points,
(SELECT COUNT(*) FROM `rank` b WHERE b.points >= a.points) AS cnt
FROM `rank` a
WHERE a.id = 1;
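A quick way to sanity-check the correlated subquery is to run it on a few hypothetical rows. This sketch uses SQLite as a stand-in for MySQL; SQLite also accepts backtick-quoted identifiers, so the query text is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `rank` (id INTEGER PRIMARY KEY, name TEXT, points INTEGER)")
# Hypothetical rows: alice (id 1) has 50 points.
conn.executemany(
    "INSERT INTO `rank` VALUES (?, ?, ?)",
    [(1, "alice", 50), (2, "bob", 70), (3, "carol", 50), (4, "dave", 30)],
)

# The answer's query: the correlated subquery is evaluated for each outer row
# (here only id = 1) and counts rows with at least as many points.
ROW_WITH_COUNT = """
SELECT a.name, a.points,
       (SELECT COUNT(*) FROM `rank` b WHERE b.points >= a.points) AS cnt
FROM `rank` a
WHERE a.id = 1
"""

print(conn.execute(ROW_WITH_COUNT).fetchall())  # [('alice', 50, 3)]
```

The count of 3 includes alice's own row, since her points satisfy `>=` against themselves.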
As I understand from the question, you want to find out, for a given id, how many other rows in the table have points greater than or equal to that row's points. This can be achieved using a self-join:
SELECT COUNT(*) FROM `rank` a JOIN `rank` b ON (a.id != b.id) WHERE a.id = 1 AND b.points >= a.points;
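The two answers do not return the same number: the self-join excludes the row itself through `a.id != b.id`, while the correlated subquery includes it. A small SQLite sketch (standing in for MySQL, with hypothetical data) makes the off-by-one visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `rank` (id INTEGER PRIMARY KEY, name TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO `rank` VALUES (?, ?, ?)",
    [(1, "alice", 50), (2, "bob", 70), (3, "carol", 50), (4, "dave", 30)],
)

# Self-join answer: pair row 1 with every *other* row that has >= points.
SELF_JOIN = """
SELECT COUNT(*)
FROM `rank` a JOIN `rank` b ON (a.id != b.id)
WHERE a.id = 1 AND b.points >= a.points
"""

# Correlated-subquery answer, for comparison: it also counts row 1 itself.
CORRELATED = """
SELECT (SELECT COUNT(*) FROM `rank` b WHERE b.points >= a.points)
FROM `rank` a
WHERE a.id = 1
"""

others = conn.execute(SELF_JOIN).fetchone()[0]
including_self = conn.execute(CORRELATED).fetchone()[0]
print(others, including_self)  # 2 3
```

Which one is "right" depends on whether the row itself should be counted, for example when turning the count into a rank position.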

Can't understand. Is this a subquery?

There is something in a query I have to edit that I don't understand.
Four tables are joined: tickets, ticket_tasks, tickets_users and users. The whole query is not important, but there is an example at the end of the post. What bugs me is this kind of code, used many times in relation to other tables:
(SELECT name
FROM users
WHERE users.id=tickets_users.users_id
) AS RequesterName,
Is this a subquery with the tables users and tickets_users joined? What is this?
WHERE users.id=tickets_users.users_id
If this was a join I would have expected to see:
ON users.id = tickets_users.users_id
And how is this different from a typical join? Why not just select users.name and join with the users table?
Can anyone enlighten me on the advanced SQL querying prowess of the original author?
The query looks like this:
SELECT
description,
(SELECT name
FROM users
WHERE users.id = tickets_users.users_id) AS RequesterName,
(SELECT description
FROM tickets
WHERE tickets.id = ticket_tasks.tickets_id) AS TicketDescription,
ticket_tasks.content AS TaskDescription
FROM
ticket_tasks
RIGHT JOIN
tickets ON ticket_tasks.tickets_id = tickets.id
INNER JOIN
tickets_users ON tickets_users.tickets_id = ticket_tasks.tickets_id
This is what is called a correlated subquery. To describe it in simple terms, it's doing a SELECT inside a SELECT.
However, doing this more than once in a query is really not recommended; the performance cost can be huge.
A correlated subquery is evaluated row by row, once for each row of the outer SELECT. If that doesn't make sense, think of it this way:
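SELECT
t.id,
-- MIN() keeps the result to one value; a bare ta.id would fail with "Subquery returns more than 1 row"
(SELECT MIN(ta.id) FROM tableA AS ta WHERE ta.id > t.id) AS next_id
FROM
tableB AS t;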
For each row in tableB, every row in tableA is read and compared to that tableB id (unless an index on tableA.id lets MySQL narrow the search).
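The row-by-row behavior can be observed directly by registering a counting function and calling it inside the subquery's WHERE clause. This is a sketch in SQLite (not MySQL, whose planner may behave differently, for example by using an index on tableA.id); `counted` is a hypothetical helper, and the tables deliberately have no index so every tableA row is checked for every tableB row.

```python
import sqlite3

calls = 0

def counted(value):
    """Return value unchanged, counting each time SQLite evaluates it."""
    global calls
    calls += 1
    return value

conn = sqlite3.connect(":memory:")
conn.create_function("counted", 1, counted)
# Hypothetical tables with no index on id, so no rows can be skipped.
conn.execute("CREATE TABLE tableA (id INTEGER)")
conn.execute("CREATE TABLE tableB (id INTEGER)")
conn.executemany("INSERT INTO tableA VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO tableB VALUES (?)", [(0,), (2,), (5,)])

# The answer's example, with MIN() so the subquery yields a single value.
rows = conn.execute("""
SELECT t.id,
       (SELECT MIN(ta.id) FROM tableA AS ta WHERE counted(ta.id) > t.id) AS next_id
FROM tableB AS t
""").fetchall()

print(sorted(rows))  # [(0, 1), (2, 3), (5, None)]
print(calls)         # 12: 4 tableA rows checked for each of the 3 tableB rows
```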
NOTE:
If you have 100 rows in each of the 4 tables and you use a correlated subquery for each one, then in the worst case you are doing 100*100*100*100 row comparisons. That's 100,000,000 (one hundred million) comparisons!
A correlated subquery is NOT a join, but rather a subquery.
SELECT *
FROM
(SELECT id FROM t -- this is a subquery
) AS temp
JOINs are different. Generally, you can write an inner join in one of these two ways.
The explicit JOIN syntax:
SELECT *
FROM t
JOIN t1 ON t1.id = t.id
The older comma syntax:
SELECT *
FROM t, t1
WHERE t1.id = t.id
Read literally, the second form describes the Cartesian product of the two tables, with the extra rows filtered out by the WHERE clause, whereas the first filters as it joins. In practice, MySQL's optimizer treats these two inner joins the same way, so the explicit JOIN syntax is preferable mainly because it keeps each join condition next to the table it applies to.
There are a few different types of joins, and each is useful for its own purpose:
INNER JOIN (same as JOIN)
LEFT JOIN (same as LEFT OUTER JOIN)
RIGHT JOIN (same as RIGHT OUTER JOIN)
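The practical difference between an inner and a left join shows up as soon as one side has no match. This sketch uses SQLite as a stand-in for MySQL; the `customers` and `orders` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO orders VALUES (10, 1);  -- Ben has no orders
""")

def run(join_kind):
    """Run the same query with a different join keyword (fixed strings only)."""
    sql = f"""
    SELECT c.name, o.id
    FROM customers AS c
    {join_kind} orders AS o ON o.customer_id = c.id
    ORDER BY c.id
    """
    return conn.execute(sql).fetchall()

print(run("INNER JOIN"))       # [('Ann', 10)]
print(run("LEFT JOIN"))        # [('Ann', 10), ('Ben', None)]
print(run("LEFT OUTER JOIN"))  # [('Ann', 10), ('Ben', None)]
```

The inner join drops Ben entirely; the left join keeps him and fills the missing order with NULL. OUTER is an optional keyword and changes nothing.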
MySQL does not support FULL JOIN or FULL OUTER JOIN, so in order to do a full join you need to combine a LEFT JOIN and a RIGHT JOIN with UNION. See this link for a better understanding of what joins do, illustrated with Venn diagrams: LINK
Remember that the link covers SQL in general, so it includes FULL joins as well; those don't work in MySQL.
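A sketch of the LEFT-plus-RIGHT emulation, run in SQLite as a stand-in for MySQL with hypothetical tables `left_t` and `right_t`. In MySQL the second half would normally be written as `FROM left_t RIGHT JOIN right_t ...`; here it is written as a LEFT JOIN with the tables swapped, which is equivalent and also runs on SQLite versions that predate RIGHT JOIN support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE left_t (id INTEGER);
CREATE TABLE right_t (id INTEGER);
INSERT INTO left_t VALUES (1), (2);
INSERT INTO right_t VALUES (2), (3);
""")

# Emulated FULL OUTER JOIN: all left rows (matched or not) UNION all right
# rows (matched or not). UNION removes the matched row (2, 2) that both
# halves produce.
FULL_JOIN = """
SELECT l.id AS left_id, r.id AS right_id
FROM left_t AS l LEFT JOIN right_t AS r ON r.id = l.id
UNION
SELECT l.id, r.id
FROM right_t AS r LEFT JOIN left_t AS l ON l.id = r.id
"""

rows = set(conn.execute(FULL_JOIN).fetchall())
print(rows)
```

One design caveat: UNION also collapses rows that are genuinely duplicated in the data. Using UNION ALL and adding `WHERE l.id IS NULL` to the second half keeps only the unmatched right rows there and preserves real duplicates.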

MySQL - Why would two nearly identical queries give different results? GROUP BY

I have two queries that are nearly identical, one with a GROUP BY and one without. The results are very different: the GROUP BY query returns more than double the result of the non-GROUP BY query.
Query 1:
SELECT table2.name, COUNT(DISTINCT(op.id))
FROM op INNER JOIN table1 ON table1.EID = op.ID
INNER JOIN table3 ON table3.id = table1.jobid
INNER JOIN table2 ON table2.id = table3.CatID
WHERE op.BID = 1
AND op.ActiveStartDate <= NOW()
AND op.ActiveEndDate >= NOW()
GROUP BY table2.name
ORDER BY COUNT(*) DESC;
Query 2:
SELECT op.Type, COUNT(DISTINCT op.id)
FROM op
WHERE op.BID = 1
AND op.ActiveStartDate <= NOW()
AND op.ActiveEndDate >= NOW()
ORDER BY op.Type ASC;
These should return the same result. While playing around with them, I found that once I remove the GROUP BY from query 1, the result is the same. If I put the GROUP BY back into query 1, the result is more than doubled.
Edit: It seems the additional INNER JOINS are not affecting the results, but rather the GROUP BY in query 1. If I remove the GROUP BY in query 1, the results between the 2 queries are identical. If I add the GROUP BY back into query 1, the results are very different.
I don't know how you think that those are nearly identical queries; they are very different. Anyway, you shouldn't remove the GROUP BY from the first one, but add a GROUP BY on the second query:
SELECT op.Type, COUNT(DISTINCT op.id)
FROM op
WHERE op.BID = 1
AND op.ActiveStartDate <= NOW()
AND op.ActiveEndDate >= NOW()
GROUP BY op.Type
ORDER BY op.Type ASC;
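Of course, this doesn't mean that you'll get the same results anyway, since the first query has three extra joins and groups by a different column (table2.name rather than op.Type). Note also that COUNT(DISTINCT op.id) only removes duplicates within each group: if the joins link one op row to several table2.name values, that op is counted once in each of those groups, so the per-group counts can add up to more than the ungrouped distinct count.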
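A reduced sketch of that effect, in SQLite as a stand-in for MySQL. The question's three joins are collapsed into one hypothetical link table, `op_category`, in which op 1 belongs to two categories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE op (id INTEGER PRIMARY KEY);
CREATE TABLE op_category (op_id INTEGER, name TEXT);
INSERT INTO op VALUES (1), (2);
-- op 1 is linked to two categories, op 2 to one.
INSERT INTO op_category VALUES (1, 'x'), (1, 'y'), (2, 'x');
""")

FROM_CLAUSE = "FROM op INNER JOIN op_category AS c ON c.op_id = op.id"

# DISTINCT applies per group, so op 1 is counted in both 'x' and 'y'.
grouped = conn.execute(
    f"SELECT c.name, COUNT(DISTINCT op.id) {FROM_CLAUSE} GROUP BY c.name ORDER BY c.name"
).fetchall()
# Without GROUP BY there is one group, so each op is counted once overall.
ungrouped = conn.execute(f"SELECT COUNT(DISTINCT op.id) {FROM_CLAUSE}").fetchone()[0]

print(grouped)    # [('x', 2), ('y', 1)]  (adds up to 3)
print(ungrouped)  # 2
```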
The queries are not at all "nearly identical" in my view.
You have INNER JOINs with other tables that can contain duplicate matches, so each INNER JOIN can multiply the number of rows by the number of matches.
You can check the explanation here:
INNER JOIN and GROUP BY

How to find at most two items in MySQL without aggregations and groupings

I have a table Transactions(store_id, item_id, price). I want to find the store_ids that sell at most two different items, without using aggregate functions or grouping.
Is there any way to do that ?
Interesting requirements. This would be a lot faster and easier with aggregate functions and grouping, but here's another way:
SELECT DISTINCT t1.store_id
FROM
Transactions t1
LEFT JOIN Transactions t2
ON t1.store_id = t2.store_id
AND t1.item_id <> t2.item_id
LEFT JOIN Transactions t3
ON t1.store_id = t3.store_id
AND t3.item_id NOT IN (t1.item_id, t2.item_id)
WHERE t3.store_id IS NULL
The query works by joining from one store record to another record for the same store but a different item. It then attempts to join to a third record for the same store with an item different from both. If it finds such a record, the store sells more than two items and is excluded by the WHERE clause.
Just to give you an idea, here's how the query would normally look:
SELECT store_id
FROM Transactions
GROUP BY store_id
HAVING COUNT(DISTINCT item_id) < 3
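To confirm that the join-only query and the usual GROUP BY query agree, here is a sketch that runs both on hypothetical transactions, using SQLite as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (store_id INTEGER, item_id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [
        (1, 10, 1.0), (1, 10, 1.5),                # store 1: one item
        (2, 10, 1.0), (2, 20, 2.0),                # store 2: two items
        (3, 10, 1.0), (3, 20, 2.0), (3, 30, 3.0),  # store 3: three items
    ],
)

# The answer's join-only query.
NO_AGGREGATES = """
SELECT DISTINCT t1.store_id
FROM Transactions t1
LEFT JOIN Transactions t2
       ON t1.store_id = t2.store_id AND t1.item_id <> t2.item_id
LEFT JOIN Transactions t3
       ON t1.store_id = t3.store_id
      AND t3.item_id NOT IN (t1.item_id, t2.item_id)
WHERE t3.store_id IS NULL
"""

# The usual GROUP BY / HAVING version.
WITH_AGGREGATES = """
SELECT store_id FROM Transactions GROUP BY store_id HAVING COUNT(DISTINCT item_id) < 3
"""

no_agg = {row[0] for row in conn.execute(NO_AGGREGATES)}
with_agg = {row[0] for row in conn.execute(WITH_AGGREGATES)}
print(sorted(no_agg), sorted(with_agg))  # [1, 2] [1, 2]
```

Store 1 is kept for a subtle reason: it has no second item, so t2.item_id is NULL, `NOT IN (10, NULL)` is never true, and t3 never matches.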