I have two queries that are both very quick (20 ms). When I combine them with a join, I get a 30-second query and the data is wrong. What's wrong?
SELECT
count(profile.id),
date(profile.createdAt)
FROM profile
GROUP BY date(profile.createdAt)
ORDER BY date(profile.createdAt) DESC;
and
SELECT
count(product.id),
date(product.createdAt)
FROM product
GROUP BY date(product.createdAt)
ORDER BY date(product.createdAt) desc;
Joining them, I get a very slow query:
SELECT
count(profile._id),
date(profile.createdAt),
count(product._id),
date(product.createdAt)
FROM profile
INNER JOIN product
ON date(product.createdAt) = date(profile.createdAt)
GROUP BY
date(product.createdAt),
date(profile.createdAt)
ORDER BY date(product.createdAt) desc;
The logical error with your current approach is that the join pairs every profile row with every product row created on the same date, so both counts are inflated. You may try doing the aggregations in separate subqueries, and then join those subqueries:
SELECT
t1.createdAt,
COALESCE(t1.profile_cnt, 0) AS profile_cnt,
COALESCE(t2.product_cnt, 0) AS product_cnt
FROM
(
SELECT DATE(createdAt) AS createdAt, COUNT(id) AS profile_cnt
FROM profile
GROUP BY DATE(createdAt)
) t1
INNER JOIN
(
SELECT DATE(createdAt) AS createdAt, COUNT(id) AS product_cnt
FROM product
GROUP BY DATE(createdAt)
) t2
ON t1.createdAt = t2.createdAt;
If the two tables don't both contain the same dates, then the above query might drop certain dates. To avoid this, we could join with a calendar table which includes all dates we want to appear in the output.
Regarding performance, you are doing a join of two aggregation queries, so it is not expected to be that performant. Also, calling DATE to cast createdAt to a pure date is expensive, and maybe could be avoided by maintaining a dedicated date column.
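To make the double counting and the calendar-table idea concrete, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the sample rows and the calendar table (with its day column) are assumptions made up for illustration, not part of the original schema.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profile (id INTEGER PRIMARY KEY, createdAt TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, createdAt TEXT);
CREATE TABLE calendar (day TEXT PRIMARY KEY);  -- hypothetical calendar table
INSERT INTO profile (createdAt) VALUES
  ('2024-01-01 09:00:00'), ('2024-01-01 17:30:00'), ('2024-01-02 08:15:00');
INSERT INTO product (createdAt) VALUES
  ('2024-01-01 10:00:00'), ('2024-01-01 11:00:00'), ('2024-01-01 12:00:00'),
  ('2024-01-03 13:00:00');
INSERT INTO calendar (day) VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
""")

# Joining the raw rows pairs every profile with every product of the same day,
# so each count becomes the product of the two per-day counts.
naive = conn.execute("""
SELECT date(profile.createdAt), COUNT(profile.id), COUNT(product.id)
FROM profile
INNER JOIN product ON date(product.createdAt) = date(profile.createdAt)
GROUP BY date(profile.createdAt)
""").fetchall()

# Aggregate each table on its own, then hang both results off the calendar
# so that days missing from one (or both) tables still appear.
fixed = conn.execute("""
SELECT c.day, COALESCE(t1.profile_cnt, 0), COALESCE(t2.product_cnt, 0)
FROM calendar c
LEFT JOIN (SELECT date(createdAt) AS day, COUNT(id) AS profile_cnt
           FROM profile GROUP BY date(createdAt)) t1 ON t1.day = c.day
LEFT JOIN (SELECT date(createdAt) AS day, COUNT(id) AS product_cnt
           FROM product GROUP BY date(createdAt)) t2 ON t2.day = c.day
ORDER BY c.day DESC
""").fetchall()

print(naive)  # [('2024-01-01', 6, 6)]
print(fixed)  # [('2024-01-03', 0, 1), ('2024-01-02', 1, 0), ('2024-01-01', 2, 3)]
```

With 2 profiles and 3 products on the same day, the raw join reports 6 for both counts, while the pre-aggregated version reports 2 and 3 and keeps the days that only one table covers.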
I think the problem is that you are joining on the result of the date function, which is likely doing a lot under the hood. That function has to execute for every record in each table.
If you can, join with the primary keys/foreign keys of the tables to take advantage of indexes.
Related
I'm trying to do two queries on the same table to get the Count(*) value.
I have this
SELECT `a`.`name`, `a`.`points` FROM `rank` AS a WHERE `id` = 1
And in the same query I want to do this
SELECT `b`.`Count(*)` FROM `rank` as b WHERE `b`.`points` >= `a`.`points`
I tried searching but did not find how to do a Count(*) in the same query.
Typically you would not mix a non-aggregate and an aggregate query in MySQL. You might do this in databases that support analytic functions, such as SQL Server, and in MySQL from version 8.0 on, which added window functions, but not in earlier versions. That being said, your second query can be handled using a correlated subquery in the select clause of the first query. So you may try the following:
SELECT
a.name,
a.points,
(SELECT COUNT(*) FROM rank b WHERE b.points >= a.points) AS cnt
FROM rank a
WHERE a.id = 1;
As I understand from the question, you want to find out, for a given id, how many rows in the table have points greater than or equal to this row's. This can be achieved using a self-join. Note that the condition a.id != b.id excludes the row itself from the count.
select count(*) from rank a join rank b on(a.id != b.id) where a.id=1 and b.points >= a.points;
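The two approaches can be compared side by side with a small sketch. It runs on SQLite through Python's sqlite3 module rather than MySQL, the sample rows are invented, and the table name is double-quoted only to keep it unambiguous as an identifier.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "rank" (id INTEGER PRIMARY KEY, name TEXT, points INTEGER);
INSERT INTO "rank" (id, name, points) VALUES
  (1, 'alice', 50), (2, 'bob', 70), (3, 'carol', 50), (4, 'dave', 30);
""")

# Correlated subquery: counts every row with at least as many points,
# including the row with id = 1 itself.
subquery_row = conn.execute("""
SELECT a.name, a.points,
       (SELECT COUNT(*) FROM "rank" b WHERE b.points >= a.points) AS cnt
FROM "rank" a
WHERE a.id = 1
""").fetchone()

# Self-join: the a.id != b.id condition leaves the row itself out.
join_count = conn.execute("""
SELECT COUNT(*)
FROM "rank" a
JOIN "rank" b ON a.id != b.id
WHERE a.id = 1 AND b.points >= a.points
""").fetchone()[0]

print(subquery_row)  # ('alice', 50, 3)
print(join_count)    # 2
```

The results differ by one because only the subquery version counts the row being ranked, so pick whichever matches the ranking you want.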
I'm trying to get the rank of a particular lap time of a specific track owned by a particular user.
There are multiple rows (laps) in this table for a specific user. So I'm trying to GROUP BY as seen in the subquery of FIND_IN_SET.
Right now MySQL (latest version) is complaining that my session_id,user_id,track_id,duration are not aggregated for the GROUP BY.
I don't understand why it's complaining about this, since the GROUP BY is in a subquery.
session_lap_times schema:
session_id, int
user_id, int
track_id, int
duration, decimal
This is what I've got so far.
SELECT
session_id
user_id,
track_id,
duration,
FIND_IN_SET( duration,
(SELECT GROUP_CONCAT( duration ORDER BY duration ASC ) FROM
(SELECT user_id,track_id,min(duration)
FROM session_lap_times GROUP BY user_id,track_id) AS aa WHERE track_id=s1.track_id)
) as ranking
FROM session_lap_times s1
WHERE user_id=1
It seems like it's trying to enforce the GROUP BY rules on the parent queries as well.
For reference, this is the error I'm getting: http://imgur.com/a/ILufE
Any help is greatly appreciated.
If I'm not mistaken, the problem is here (broken out for clarity):
SELECT user_id,track_id,any_value(duration)
FROM session_lap_times
GROUP BY user_id
The query is probably barfing because track_id is in the select and not in the group by. That means the subselect doesn't stand on its own and makes the whole thing fail.
Try adding track_id to your group by and adjust from there.
You are grouping by user_id, but you do not do any aggregation in the SELECT or HAVING of the following sub-query:
SELECT
user_id,any_value(track_id),any_value(duration)
FROM session_lap_times GROUP BY user_id
You are using GROUP_CONCAT in the wrong context in the following sub-query, because you do not group by any column in the ranking temporary table:
(SELECT GROUP_CONCAT( duration ORDER BY duration ASC ) FROM
(SELECT user_id,track_id,any_value(duration)
FROM session_lap_times GROUP BY user_id,track_id) AS aa WHERE track_id=s1.track_id)
) as ranking
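As a point of comparison, here is a sketch of one way to rank a lap against each user's best lap on the same track. It is not the FIND_IN_SET/GROUP_CONCAT approach from the question: it swaps in a counting subquery (rank = 1 + number of best laps that are faster), runs on SQLite via Python's sqlite3 rather than MySQL, and uses invented sample rows. The alias best is also an assumption.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE session_lap_times (
  session_id INTEGER, user_id INTEGER, track_id INTEGER, duration REAL
);
INSERT INTO session_lap_times VALUES
  (1, 1, 10, 62.5), (2, 1, 10, 60.0),
  (3, 2, 10, 59.0), (4, 2, 10, 61.0),
  (5, 3, 10, 65.0),
  (6, 2, 20, 40.0), (7, 1, 20, 45.0);
""")

# The inner query groups by both user_id and track_id and aggregates duration,
# so every selected column is either grouped or aggregated.
rows = conn.execute("""
SELECT s1.session_id, s1.user_id, s1.track_id, s1.duration,
       1 + (SELECT COUNT(*)
            FROM (SELECT user_id, track_id, MIN(duration) AS best
                  FROM session_lap_times
                  GROUP BY user_id, track_id) aa
            WHERE aa.track_id = s1.track_id
              AND aa.best < s1.duration) AS ranking
FROM session_lap_times s1
WHERE s1.user_id = 1
ORDER BY s1.session_id
""").fetchall()

for row in rows:
    print(row)
```

Because the grouped subquery stands on its own, the outer query does not need a GROUP BY and can return every lap of the user with its rank.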
In my query, I use a subquery which gets the second-latest offer date for a given product.
Here's my subquery:
LEFT JOIN (SELECT product_id, MAX(offer_date) AS sec_last_date
FROM t_offers AS s1
WHERE offer_date < (SELECT MAX(offer_date)
FROM t_offers
WHERE product_id=s1.product_id)
GROUP BY product_id) AS t2 ON t2.product_id=p.id
LEFT JOIN t_offers AS o2 ON o2.product_id=t2.product_id AND
o2.offer_date=t2.sec_last_date
It works fine, but for now there are only a few rows in the t_offers table.
It will probably not perform well with thousands or millions of rows, because the WHERE clause forces MySQL to iterate over the t_offers table for each product_id.
How could I optimize this subquery?
Sub queries are often not great in MySQL due to not using indexes for the joins.
However it might be worth trying a sub query with a join, rather than a sub query with a sub query:-
LEFT JOIN (SELECT s1.product_id, MAX(s1.offer_date) AS sec_last_date
FROM t_offers AS s1
INNER JOIN t_offers AS s2
ON s2.product_id = s1.product_id
AND s2.offer_date > s1.offer_date
GROUP BY s1.product_id) AS t2 ON t2.product_id=p.id
Can't you just sort by the offer date and take the latest two? Something like:
select product_id, offer_date
from your table
order by offer_Date desc
limit 2
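The behaviour of the self-join variant and of the plain ORDER BY ... LIMIT 2 can be checked on a small data set. This is only a sketch: it runs on SQLite via Python's sqlite3 instead of MySQL, the sample offers are invented, and it assumes offer dates are unique per product.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_offers (product_id INTEGER, offer_date TEXT);
INSERT INTO t_offers VALUES
  (1, '2024-01-05'), (1, '2024-02-10'), (1, '2024-03-15'),
  (2, '2024-01-20'), (2, '2024-02-25'),
  (3, '2024-04-01');
""")

# Self-join: keep only offers that have a later offer for the same product,
# then take the latest of those, which is the second-latest overall.
second_latest = conn.execute("""
SELECT s1.product_id, MAX(s1.offer_date) AS sec_last_date
FROM t_offers AS s1
INNER JOIN t_offers AS s2
  ON s2.product_id = s1.product_id
 AND s2.offer_date > s1.offer_date
GROUP BY s1.product_id
ORDER BY s1.product_id
""").fetchall()

# A plain ORDER BY with LIMIT 2 returns two rows for the whole table,
# not two rows per product.
latest_two = conn.execute("""
SELECT product_id, offer_date
FROM t_offers
ORDER BY offer_date DESC
LIMIT 2
""").fetchall()

print(second_latest)  # [(1, '2024-02-10'), (2, '2024-01-20')]
print(latest_two)     # [(3, '2024-04-01'), (1, '2024-03-15')]
```

Product 3 has a single offer, so it has no second-latest date and drops out of the self-join result; the LEFT JOIN in the surrounding query would then yield NULL for it.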
I have 2 queries that are nearly identical, one with a GROUP BY, one without. The results are very different. The GROUP BY query results in over double the non-GROUP BY query result.
Query 1:
SELECT table2.name, COUNT(DISTINCT(op.id))
FROM op INNER JOIN table1 ON table1.EID = op.ID
INNER JOIN table3 ON table3.id = table1.jobid
INNER JOIN table2 ON table2.id = table3.CatID
WHERE op.BID = 1
AND op.ActiveStartDate <= NOW()
AND op.ActiveEndDate >= NOW()
GROUP BY table2.name
ORDER BY COUNT(*) DESC;
Query 2:
SELECT op.Type, COUNT(DISTINCT op.id)
FROM op
WHERE op.BID = 1
AND op.ActiveStartDate <= NOW()
AND op.ActiveEndDate >= NOW()
ORDER BY op.Type ASC;
These should result in the same result. When playing around with them, once I remove the "GROUP BY" from query 1, the result is the same. If I put the "GROUP BY" back into Query 1, the result is more than doubled.
Edit: It seems the additional INNER JOINS are not affecting the results, but rather the GROUP BY in query 1. If I remove the GROUP BY in query 1, the results between the 2 queries are identical. If I add the GROUP BY back into query 1, the results are very different.
I don't know how you think that those are nearly identical queries; they are very different. Anyway, you shouldn't remove the GROUP BY from the first one, but add a GROUP BY on the second query:
SELECT op.Type, COUNT(DISTINCT op.id)
FROM op
WHERE op.BID = 1
AND op.ActiveStartDate <= NOW()
AND op.ActiveEndDate >= NOW()
GROUP BY op.Type
ORDER BY op.Type ASC;
Of course, this doesn't mean that you'll get the same results anyway, since the first query has 3 extra joins.
The queries are not at all "nearly identical" in my view.
You have INNER JOINs with other tables that can contain several matching rows, so each INNER JOIN multiplies the number of rows accordingly.
You can check the explanation here:
INNER JOIN and GROUP BY
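A small sketch shows how the joins plus GROUP BY can more than double the total. It runs on SQLite via Python's sqlite3 rather than MySQL, and the op_category table is a hypothetical simplification that collapses table1, table2 and table3 into one mapping from op rows to category names.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE op (id INTEGER PRIMARY KEY);
-- hypothetical table standing in for the table1/table3/table2 join chain
CREATE TABLE op_category (op_id INTEGER, name TEXT);
INSERT INTO op (id) VALUES (1), (2);
INSERT INTO op_category VALUES
  (1, 'A'), (1, 'B'), (2, 'A'), (2, 'B'), (2, 'C');
""")

# Without GROUP BY, DISTINCT collapses the join fan-out into one total.
total = conn.execute("""
SELECT COUNT(DISTINCT op.id)
FROM op INNER JOIN op_category c ON c.op_id = op.id
""").fetchone()[0]

# With GROUP BY, each op row is counted once per category it belongs to.
per_group = conn.execute("""
SELECT c.name, COUNT(DISTINCT op.id)
FROM op INNER JOIN op_category c ON c.op_id = op.id
GROUP BY c.name
ORDER BY c.name
""").fetchall()

print(total)                         # 2
print(per_group)                     # [('A', 2), ('B', 2), ('C', 1)]
print(sum(n for _, n in per_group))  # 5
```

DISTINCT only removes duplicates within a group, so an op row that belongs to several categories is counted in each of them and the per-group counts add up to more than the overall distinct count.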
Is there an efficient way to write queries that join the results of several GROUP BYs on a common table? How does MySQL handle merging the results of aggregate functions on a subgroup with full fields from the original table?
I am using this and it's slow (and I also need conditions other than CONDITION=1):
SELECT a.CID,a.AS_ALL,b.AS_ACTIVE FROM
(SELECT CID,COUNT(DISTINCT RAID) AS AS_ALL FROM MYTABLE GROUP BY CID) a
LEFT JOIN
(SELECT CID,COUNT(DISTINCT RAID) AS AS_ACTIVE FROM MYTABLE WHERE CONDITION=1 GROUP BY CID ) b ON a.CID=b.CID;
Also, is it safe to use something like the following? Will MySQL always correctly merge COLUMN_A with the results of the aggregation?
SELECT COLUMN_A, COUNT(DISTINCT COLUMN_A), SUM(COLUMN_A),SUM(COLUMN_B) FROM ATABLE WHERE CONDITION=1 GROUP BY COLUMN_C
Thank you for your advice.
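Try this, which scans the table only once. To match the original AS_ACTIVE, the condition goes inside COUNT(DISTINCT ...); summing a 1/0 flag would count matching rows rather than distinct RAID values:
SELECT CID,COUNT(DISTINCT RAID) AS AS_ALL,COUNT(DISTINCT IF(CONDITION=1, RAID, NULL)) AS AS_ACTIVE
FROM MYTABLE GROUP BY CID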
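The difference between counting distinct values under a condition and summing a 1/0 flag can be seen in a short sketch. It runs on SQLite via Python's sqlite3 rather than MySQL, so it uses CASE WHEN in place of IF(), the column cond is a hypothetical stand-in for CONDITION, and the sample rows are invented.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MYTABLE (CID INTEGER, RAID INTEGER, cond INTEGER);
INSERT INTO MYTABLE VALUES
  (1, 100, 1), (1, 100, 1), (1, 101, 0),
  (2, 200, 0), (2, 201, 0);
""")

# One pass over the table: the CASE expression yields NULL for rows that
# fail the condition, and COUNT(DISTINCT ...) ignores NULLs.
rows = conn.execute("""
SELECT CID,
       COUNT(DISTINCT RAID) AS AS_ALL,
       COUNT(DISTINCT CASE WHEN cond = 1 THEN RAID END) AS AS_ACTIVE,
       SUM(CASE WHEN cond = 1 THEN 1 ELSE 0 END) AS ACTIVE_ROWS
FROM MYTABLE
GROUP BY CID
ORDER BY CID
""").fetchall()

print(rows)  # [(1, 2, 1, 2), (2, 2, 0, 0)]
```

For CID 1 the same RAID appears twice with the condition set, so the distinct count is 1 while the row count is 2. Further conditions can be added as extra conditional aggregates in the same SELECT.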