I have the following query which returns the correct results but I'm sure it's not the best way to get these results...
select * from (
select * from features where feature_area = 0
order by updateStamp desc limit 1
) as feature_1
union all
select * from (
select * from features where feature_area = 1
order by updateStamp desc limit 1
) as feature_2
union all
select * from (
select * from features where feature_area = 2
order by updateStamp desc limit 1
) as feature_3
This returns results which look something like...
id feature_area title updateStamp
--------------------------------------------------------------------
103 0 This is a title 2014-04-15 09:26:14
102 1 Another title 2014-03-27 14:09:49
98 2 More title 2014-01-21 16:00:55
Could this be improved by using joins rather than unions? If so, could you point me in the right direction, please?
EDIT:
Having looked at the other options pointed out by @Ben, it seems I already have the quickest query (albeit not the most attractive one) for my particular purpose. Feel free to correct me if you think I'm wrong, though. I'm no expert, hence I'm asking for advice.
select f.* from features f
inner join (
    select
        feature_area,
        max(updateStamp) as updateStamp
    from features
    where feature_area IN (0, 1, 2)
    group by feature_area
) sq on sq.feature_area = f.feature_area
    and sq.updateStamp = f.updateStamp
Hope I read your question correctly.
SELECT f.*
FROM features f
INNER JOIN ( SELECT feature_area, MAX(updateStamp) AS maxUpdateStamp
             FROM features
             GROUP BY feature_area
           ) AS maxfeatures
  ON maxfeatures.feature_area = f.feature_area
 AND maxfeatures.maxUpdateStamp = f.updateStamp
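The join-to-MAX idea used in the two queries above can be sanity-checked on a toy table. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the rows are hypothetical (loosely based on the sample output in the question), and ids 90, 95 and 99 are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; all rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (id INTEGER PRIMARY KEY, feature_area INTEGER,"
    " title TEXT, updateStamp TEXT)"
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?)",
    [
        (90, 0, "Old title", "2014-01-02 10:00:00"),
        (103, 0, "This is a title", "2014-04-15 09:26:14"),
        (95, 1, "Older title", "2014-02-01 08:00:00"),
        (102, 1, "Another title", "2014-03-27 14:09:49"),
        (98, 2, "More title", "2014-01-21 16:00:55"),
        (99, 3, "Other area", "2014-05-01 00:00:00"),
    ],
)

# Join every row to the latest updateStamp of its feature_area; only rows
# carrying that latest stamp satisfy the join condition. ISO-formatted
# text timestamps sort chronologically, so MAX() works on them.
LATEST_PER_AREA = """
SELECT f.id, f.feature_area, f.title, f.updateStamp
FROM features f
INNER JOIN (
    SELECT feature_area, MAX(updateStamp) AS maxUpdateStamp
    FROM features
    WHERE feature_area IN (0, 1, 2)
    GROUP BY feature_area
) m ON m.feature_area = f.feature_area
   AND m.maxUpdateStamp = f.updateStamp
ORDER BY f.feature_area
"""

rows = conn.execute(LATEST_PER_AREA).fetchall()
for row in rows:
    print(row)
```

One grouped subquery replaces the three UNION ALL branches, so adding a fourth area only means changing the IN list.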
Assuming proper indexes, it is often the most performant way to solve this with an anti-join (LEFT JOIN ... IS NULL):
SELECT f1.*
FROM features f1
LEFT JOIN features f2
  ON f2.feature_area = f1.feature_area
 AND f2.updateStamp > f1.updateStamp
WHERE f1.feature_area < 3
  AND f2.id IS NULL
ORDER BY f1.feature_area
If several rows in the same feature_area share the highest updateStamp, all of them are returned.
For more explanation of this technique:
Get records with highest/smallest <whatever> per group
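A small sketch of the anti-join, again assuming SQLite (via Python's sqlite3) in place of MySQL and hypothetical rows. It joins on f2.updateStamp > f1.updateStamp: a row survives only when no row in its area is newer, so it returns the latest row per area, plus every row of a tie. Flipping the comparison to < would return the earliest row per area instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (id INTEGER PRIMARY KEY, feature_area INTEGER,"
    " updateStamp TEXT)"
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [
        (1, 0, "2014-01-01 00:00:00"),
        (2, 0, "2014-04-15 09:26:14"),  # latest in area 0
        (3, 1, "2014-03-27 14:09:49"),  # tied latest in area 1
        (4, 1, "2014-03-27 14:09:49"),  # tied latest in area 1
        (5, 2, "2014-01-21 16:00:55"),  # only row in area 2
        (6, 3, "2014-06-01 00:00:00"),  # excluded by feature_area < 3
    ],
)

# Keep f1 only when no f2 in the same area has a later updateStamp.
ANTI_JOIN = """
SELECT f1.id, f1.feature_area, f1.updateStamp
FROM features f1
LEFT JOIN features f2
  ON f2.feature_area = f1.feature_area
 AND f2.updateStamp > f1.updateStamp
WHERE f1.feature_area < 3
  AND f2.id IS NULL
ORDER BY f1.feature_area, f1.id
"""

rows = conn.execute(ANTI_JOIN).fetchall()
print([r[0] for r in rows])
```

Both tied rows (ids 3 and 4) come back for area 1, which is the duplicate behaviour described above.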
WITH MaxFeature AS (
    SELECT feature_area, MAX(updateStamp) AS MaxUpdateStamp
    FROM features
    GROUP BY feature_area
)
SELECT features.*
FROM features
INNER JOIN MaxFeature
    ON features.feature_area = MaxFeature.feature_area
   AND features.updateStamp = MaxFeature.MaxUpdateStamp
ORDER BY features.feature_area ASC
This is just a test project. I want to know how to select all professors with more than 5 failed students in a subject.
I already know how to select all professors with at least 2 subjects using the following query:
SELECT paulin_professors.*,
IFNULL(sub_p.total, 0) num
FROM paulin_professors
LEFT JOIN ( SELECT COUNT(*) total, pau_profid
FROM paulin_profsubject
GROUP BY pau_profid
) sub_p ON (sub_p.pau_profid = paulin_professors.pau_profid)
WHERE sub_p.total >= 2;
I know I'm close, but I can't get it to work (all professors with more than 5 failed students in a subject). Any ideas? TIA
Try using SELECT with UNION:
select [columnName1],[columnName2] from [Table1] where [condition] union select [columnName1],[columnName2] from [Table1] where [condition] union ....
It looks like you can get the professor IDs from the profsubject table and JOIN the studentenrolled table on subjid. In a similar way to what you had, you can count the students whose grade is below a pass/fail threshold (65 in this case).
Then, to get a short list, you can select the distinct profids from this derived table.
SELECT DISTINCT t3.pau_profid
FROM (
    SELECT
        t1.pau_profid,
        IFNULL(t2.total_failed, 0) AS number_failed
    FROM paulin_profsubject t1
    LEFT JOIN (
        SELECT COUNT(*) AS total_failed, pau_subjid
        FROM paulin_studentenrolled
        WHERE pau_grade < 65
        GROUP BY pau_subjid
    ) t2 ON t1.pau_subjid = t2.pau_subjid
) t3
WHERE t3.number_failed > 5;
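Here is a toy run of this approach, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL. The filter sits in the outer query because a column alias such as number_failed cannot be referenced in the WHERE clause of the same SELECT that defines it. The pau_studid column and all rows are hypothetical, and the threshold is > 5 to match "more than 5" in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paulin_profsubject (pau_profid INTEGER, pau_subjid INTEGER);
CREATE TABLE paulin_studentenrolled (
    pau_studid INTEGER, pau_subjid INTEGER, pau_grade INTEGER
);
""")
conn.executemany("INSERT INTO paulin_profsubject VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 20), (3, 30)])
# Subject 30 deliberately has nobody enrolled.
enrolled = (
    [(s, 10, 50) for s in range(6)]       # subject 10: 6 failing grades
    + [(s, 10, 90) for s in range(6, 9)]  # subject 10: 3 passing grades
    + [(0, 11, 40)]                       # subject 11: 1 failing grade
    + [(s, 20, 60) for s in range(5)]     # subject 20: exactly 5 failing grades
)
conn.executemany("INSERT INTO paulin_studentenrolled VALUES (?, ?, ?)", enrolled)

QUERY = """
SELECT DISTINCT t3.pau_profid
FROM (
    SELECT t1.pau_profid, IFNULL(t2.total_failed, 0) AS number_failed
    FROM paulin_profsubject t1
    LEFT JOIN (
        SELECT pau_subjid, COUNT(*) AS total_failed
        FROM paulin_studentenrolled
        WHERE pau_grade < 65
        GROUP BY pau_subjid
    ) t2 ON t1.pau_subjid = t2.pau_subjid
) t3
WHERE t3.number_failed > 5   -- the alias is visible here, in the outer query
ORDER BY t3.pau_profid
"""
profs = [r[0] for r in conn.execute(QUERY)]
print(profs)
```

Only professor 1 qualifies: professor 2 has exactly 5 failures in subject 20, and professor 3's subject has no enrolments (IFNULL turns the missing count into 0).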
I have the following query…
SELECT DISTINCT * FROM
vPAS_Posts_Users
WHERE (post_user_id =:id AND post_type != 4)
AND post_updated >:updated
GROUP BY post_post_id
UNION
SELECT DISTINCT vPAS_Posts_Users.* FROM PAS_Follow
JOIN vPAS_Posts_Users ON
( PAS_Follow.folw_followed_user_id = vPAS_Posts_Users.post_user_id )
WHERE (( PAS_Follow.folw_follower_user_id =:id AND PAS_Follow.folw_deleted = 0 )
OR ( post_type = 4 AND post_passed_on_by = PAS_Follow.folw_follower_user_id
AND post_user_id !=:id ))
AND post_updated >:updated
GROUP BY post_post_id ORDER BY post_posted_date DESC LIMIT :limit
For example, :id = 7, :updated = 0.0 and :limit = 40.
My issue is that the query is taking about a minute to return results. Is there anything in this query that I can do to speed up the result?
I am using Amazon RDS.
EDIT:
I was asked to run the query with EXPLAIN; the result is below.
EDIT:
View definition
CREATE ALGORITHM=UNDEFINED DEFINER=`MySQLUSer`@`%` SQL SECURITY DEFINER VIEW `vPAS_Posts_Users`
AS SELECT
`PAS_User`.`user_user_id` AS `user_user_id`,
`PAS_User`.`user_country` AS `user_country`,
`PAS_User`.`user_city` AS `user_city`,
`PAS_User`.`user_company` AS `user_company`,
`PAS_User`.`user_account_type` AS `user_account_type`,
`PAS_User`.`user_account_premium` AS `user_account_premium`,
`PAS_User`.`user_sign_up_date` AS `user_sign_up_date`,
`PAS_User`.`user_first_name` AS `user_first_name`,
`PAS_User`.`user_last_name` AS `user_last_name`,
`PAS_User`.`user_avatar_url` AS `user_avatar_url`,
`PAS_User`.`user_cover_image_url` AS `user_cover_image_url`,
`PAS_User`.`user_bio` AS `user_bio`,
`PAS_User`.`user_telephone` AS `user_telephone`,
`PAS_User`.`user_dob` AS `user_dob`,
`PAS_User`.`user_sector` AS `user_sector`,
`PAS_User`.`user_job_type` AS `user_job_type`,
`PAS_User`.`user_unique` AS `user_unique`,
`PAS_User`.`user_deleted` AS `user_deleted`,
`PAS_User`.`user_updated` AS `user_updated`,
`PAS_Post`.`post_post_id` AS `post_post_id`,
`PAS_Post`.`post_language_id` AS `post_language_id`,
`PAS_Post`.`post_type` AS `post_type`,
`PAS_Post`.`post_promoted` AS `post_promoted`,
`PAS_Post`.`post_user_id` AS `post_user_id`,
`PAS_Post`.`post_posted_date` AS `post_posted_date`,
`PAS_Post`.`post_latitude` AS `post_latitude`,
`PAS_Post`.`post_longitude` AS `post_longitude`,
`PAS_Post`.`post_location_name` AS `post_location_name`,
`PAS_Post`.`post_text` AS `post_text`,
`PAS_Post`.`post_media_url` AS `post_media_url`,
`PAS_Post`.`post_image_height` AS `post_image_height`,
`PAS_Post`.`post_link` AS `post_link`,
`PAS_Post`.`post_link_title` AS `post_link_title`,
`PAS_Post`.`post_unique` AS `post_unique`,
`PAS_Post`.`post_deleted` AS `post_deleted`,
`PAS_Post`.`post_updated` AS `post_updated`,
`PAS_Post`.`post_original_post_id` AS `post_original_post_id`,
`PAS_Post`.`post_original_type` AS `post_original_type`,
`PAS_Post`.`post_passed_on_by` AS `post_passed_on_by`,
`PAS_Post`.`post_passed_on_caption` AS `post_passed_on_caption`,
`PAS_Post`.`post_passed_on_fullname` AS `post_passed_on_fullname`,
`PAS_Post`.`post_passed_on_avatar_url` AS `post_passed_on_avatar_url`
FROM (`PAS_User` join `PAS_Post` on((`PAS_User`.`user_user_id` = `PAS_Post`.`post_user_id`)));
Try this query:
SELECT *
FROM vPAS_Posts_Users
WHERE post_user_id = :id
  AND post_type != 4
  AND post_updated > :updated
UNION
SELECT u.*
FROM vPAS_Posts_Users u
JOIN PAS_Follow f ON f.folw_followed_user_id = u.post_user_id
WHERE u.post_updated > :updated
  AND ( (f.folw_follower_user_id = :id AND f.folw_deleted = 0)
     OR (u.post_type = 4 AND u.post_passed_on_by = f.folw_follower_user_id AND u.post_user_id != :id)
  )
ORDER BY post_posted_date DESC
LIMIT :limit;
Other improvements
Indices:
Be sure you have indices on the following columns:
PAS_User.user_user_id
PAS_Post.post_user_id
PAS_Post.post_type
PAS_Post.post_updated
PAS_Follow.folw_followed_user_id
PAS_Follow.folw_deleted
PAS_Post.post_passed_on_by
Once that is done, please (1) check the performance again (with SQL_NO_CACHE) and (2) run EXPLAIN again so we can adjust the query.
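The index list above translates directly into CREATE INDEX statements. This is a minimal sketch, assuming SQLite (via Python's sqlite3) instead of MySQL, a trimmed-down hypothetical version of the three tables, and hypothetical index names; SQLite's EXPLAIN QUERY PLAN is only a rough analogue of MySQL's EXPLAIN, but it shows whether a lookup uses an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed-down, hypothetical versions of the tables in the question.
CREATE TABLE PAS_User (user_user_id INTEGER PRIMARY KEY, user_first_name TEXT);
CREATE TABLE PAS_Post (post_post_id INTEGER PRIMARY KEY, post_user_id INTEGER,
                       post_type INTEGER, post_updated REAL,
                       post_passed_on_by INTEGER);
CREATE TABLE PAS_Follow (folw_follower_user_id INTEGER,
                         folw_followed_user_id INTEGER, folw_deleted INTEGER);

-- Secondary indexes on the listed columns (PAS_User.user_user_id is
-- already indexed here because it is the PRIMARY KEY).
CREATE INDEX idx_post_user_id ON PAS_Post (post_user_id);
CREATE INDEX idx_post_type ON PAS_Post (post_type);
CREATE INDEX idx_post_updated ON PAS_Post (post_updated);
CREATE INDEX idx_post_passed_on_by ON PAS_Post (post_passed_on_by);
CREATE INDEX idx_folw_followed ON PAS_Follow (folw_followed_user_id);
CREATE INDEX idx_folw_deleted ON PAS_Follow (folw_deleted);
""")

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM PAS_Post WHERE post_user_id = ?", (7,)
).fetchall()
details = [row[-1] for row in plan]
print(details)
```

On MySQL the equivalent check is to run EXPLAIN on the real query and look at the key column for each table.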
Here are some suggestions for the query and the view. First of all, you are combining the two result sets with UNION, which can make your query slow; you can use UNION ALL instead.
Why do I recommend UNION ALL?
Both UNION ALL and UNION use a temporary table to generate the result. The difference in execution speed comes from the fact that UNION requires an internal temporary table with an index (to skip duplicate rows), while UNION ALL creates the table without such an index. This explains the slight performance improvement when using UNION ALL.
UNION on its own removes duplicate rows, so there is no need for the DISTINCT clause. Also, try to apply a single GROUP BY to the whole result set instead of grouping the results of each subquery; this also reduces execution time.
Make sure you have added the right indexes, especially on the columns used in WHERE, ORDER BY and GROUP BY. The data type of each column should also suit the data it holds; for example, post_posted_date should be DATETIME or DATE, and it should be indexed too.
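The semantic difference between UNION and UNION ALL, and how one outer GROUP BY restores de-duplication, can be seen on a tiny example. This sketch assumes SQLite (via Python's sqlite3) as a stand-in for MySQL and a hypothetical posts table in which post 2 is reachable from both branches of the feed query; it illustrates the results, not the performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_post_id INTEGER, source TEXT)")
# Hypothetical rows: post 2 appears in both the 'own' and 'followed' branches.
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(1, "own"), (2, "own"), (2, "followed"), (3, "followed")])

OWN = "SELECT post_post_id FROM posts WHERE source = 'own'"
FOLLOWED = "SELECT post_post_id FROM posts WHERE source = 'followed'"

union = conn.execute(f"{OWN} UNION {FOLLOWED}").fetchall()          # de-duplicated
union_all = conn.execute(f"{OWN} UNION ALL {FOLLOWED}").fetchall()  # duplicates kept

# A single GROUP BY over the combined UNION ALL result removes duplicates again.
grouped = conn.execute(
    f"SELECT post_post_id FROM ({OWN} UNION ALL {FOLLOWED}) q "
    "GROUP BY post_post_id ORDER BY post_post_id"
).fetchall()

print(sorted(r[0] for r in union))
print(sorted(r[0] for r in union_all))
print([r[0] for r in grouped])
```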
Here is a rough idea of the query:
SELECT q.* FROM (
SELECT * FROM
vPAS_Posts_Users
WHERE (post_user_id =:id AND post_type != 4)
AND post_updated >:updated
UNION ALL
SELECT vPAS_Posts_Users.* FROM PAS_Follow
JOIN vPAS_Posts_Users ON
( PAS_Follow.folw_followed_user_id = vPAS_Posts_Users.post_user_id
AND vPAS_Posts_Users.post_updated >:updated)
WHERE (( PAS_Follow.folw_follower_user_id =:id AND PAS_Follow.folw_deleted = 0 )
OR ( post_type = 4 AND post_passed_on_by = PAS_Follow.folw_follower_user_id
AND post_user_id !=:id ))
) q
GROUP BY q.post_post_id ORDER BY q.post_posted_date DESC LIMIT :limit
References
Difference Between Union vs. Union All – Optimal Performance Comparison
Optimize Mysql Union
MySQL Performance Blog
From your EXPLAIN output I can see that most of your tables have no keys other than the primary key. I suggest adding indexes on the columns you join on, for example PAS_Follow.folw_followed_user_id and PAS_Post.post_user_id (the column behind vPAS_Posts_Users.post_user_id, since a view cannot be indexed directly). This alone should give a big performance boost.
I have a simple query that uses a subquery:
SELECT pictures.*
FROM pictures
WHERE pictures.user_id IN
(SELECT follows.following_id
FROM follows
WHERE follows.follower_id = 9)
ORDER BY created_at DESC LIMIT 5;
I am wondering:
a) How can I remove the subquery and use a JOIN instead, and b) will there be a performance benefit in using a JOIN instead of a subquery?
(follows.following_id, follows.follower_id, pictures.user_id are all indexed)
Thanks
SELECT DISTINCT pictures.*
FROM pictures
INNER JOIN follows
ON pictures.user_ID = follows.following_id
WHERE follows.follower_id = 9
ORDER BY pictures.created_at DESC
LIMIT 5
To learn more about joins, see:
Visual Representation of SQL Joins
UPDATE
Another way to achieve the same result is to use EXISTS:
SELECT *
FROM pictures
WHERE EXISTS
(
SELECT 1
FROM follows
WHERE pictures.user_ID = follows.following_id AND
follows.follower_id = 9
)
ORDER BY pictures.created_at DESC
LIMIT 5
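The three forms (IN subquery, JOIN with DISTINCT, EXISTS) can be compared on toy data. This is a sketch assuming SQLite (via Python's sqlite3) in place of MySQL and hypothetical rows; the follows table deliberately contains a repeated (9, 100) row to show why the JOIN version needs DISTINCT. If (follower_id, following_id) were unique, the plain JOIN would return the same rows without it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pictures (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
CREATE TABLE follows (follower_id INTEGER, following_id INTEGER);
""")
conn.executemany("INSERT INTO pictures VALUES (?, ?, ?)", [
    (1, 100, "2014-01-01"), (2, 101, "2014-01-02"),
    (3, 102, "2014-01-03"), (4, 100, "2014-01-04"),
])
# User 9 follows users 100 and 101; the repeated (9, 100) row is deliberate.
conn.executemany("INSERT INTO follows VALUES (?, ?)",
                 [(9, 100), (9, 101), (9, 100), (8, 102)])

IN_SQL = """SELECT pictures.id, pictures.created_at FROM pictures
WHERE pictures.user_id IN (SELECT following_id FROM follows WHERE follower_id = 9)
ORDER BY pictures.created_at DESC LIMIT 5"""

JOIN_PLAIN_SQL = """SELECT pictures.id, pictures.created_at FROM pictures
INNER JOIN follows ON pictures.user_id = follows.following_id
WHERE follows.follower_id = 9
ORDER BY pictures.created_at DESC LIMIT 5"""

JOIN_DISTINCT_SQL = JOIN_PLAIN_SQL.replace("SELECT", "SELECT DISTINCT", 1)

EXISTS_SQL = """SELECT pictures.id, pictures.created_at FROM pictures
WHERE EXISTS (SELECT 1 FROM follows
              WHERE pictures.user_id = follows.following_id
                AND follows.follower_id = 9)
ORDER BY pictures.created_at DESC LIMIT 5"""

in_rows = conn.execute(IN_SQL).fetchall()
join_plain = conn.execute(JOIN_PLAIN_SQL).fetchall()  # duplicates from the repeated follow
join_distinct = conn.execute(JOIN_DISTINCT_SQL).fetchall()
exists_rows = conn.execute(EXISTS_SQL).fetchall()

print(in_rows)
print([r[0] for r in join_plain])
```

Whether the JOIN is actually faster than the subquery depends on the MySQL version and the optimizer, so comparing EXPLAIN output on the real data is the reliable test.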
Below query is doing what I need:
SELECT assign.from_uid, assign.aid, assign.message, curriculum.asset,
curriculum.title, curriculum.description
FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum
ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13 AND assign.status = 1
GROUP BY assign.from_uid, assign.to_uid, assign.nid
ORDER BY assign.created DESC
Now I need the total number of rows in the result. For example, if the query returns 5 rows, the output should look like my expected output below. This is the query I tried:
SELECT count(description) FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13 AND assign.status = 1
GROUP BY assign.from_uid, assign.to_uid, assign.nid
ORDER BY assign.created DESC
My expected output:
count(*)
---------
5
My current output:
count(*)
---------
6
2
5
6
6
The easiest solution would be to place your initial GROUP BY query in a subselect and count the rows returned by that subselect.
SQL Statement
SELECT COUNT(*)
FROM (
SELECT assign.from_uid
FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13
AND assign.status = 1
GROUP BY
assign.from_uid
, assign.to_uid
, assign.nid
) q
Edit: why doesn't the original query return the required result?
It had already done most of the work needed to get the correct result.
Your query without grouping returns a resultset of 25 records (6+2+5+6+6)
From these 25 records, you have 5 unique combinations of from_uid, to_uid, nid
Now you don't want to count how many records each combination has (as you did in your example) but how many unique (distinct anyone?) combinations there are.
One solution is the subselect I presented, but the following equivalent statement using a DISTINCT clause might be easier to understand.
SELECT COUNT(*)
FROM (
SELECT DISTINCT assign.from_uid
, assign.to_uid
, assign.nid
FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13
AND assign.status = 1
) q
Note that my personal preference goes to the GROUP BY solution.
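The difference between counting rows per group and counting the groups themselves can be reproduced with made-up data that yields the per-group counts from the question (6, 2, 5, 6, 6). This sketch assumes SQLite (via Python's sqlite3) instead of MySQL; the column lists are trimmed and every row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignment (from_uid INTEGER, to_uid INTEGER, nid INTEGER, status INTEGER);
CREATE TABLE curriculum_topics_assets (asset INTEGER, description TEXT);
""")
conn.executemany("INSERT INTO curriculum_topics_assets VALUES (?, ?)",
                 [(500, "a"), (501, "b"), (502, "c")])
# Five (from_uid, to_uid, nid) combinations with 6, 2, 5, 6 and 6 rows each.
combos = {(1, 13, 500): 6, (2, 13, 500): 2, (1, 13, 501): 5,
          (3, 13, 502): 6, (4, 13, 501): 6}
for combo, n in combos.items():
    conn.executemany("INSERT INTO assignment VALUES (?, ?, ?, 1)", [combo] * n)

JOIN_WHERE = """ FROM assignment assign
INNER JOIN curriculum_topics_assets curriculum ON assign.nid = curriculum.asset
WHERE assign.to_uid = 13 AND assign.status = 1"""
GROUP = " GROUP BY assign.from_uid, assign.to_uid, assign.nid"

# What the original attempt computes: one count per group.
per_group = [r[0] for r in conn.execute("SELECT COUNT(*)" + JOIN_WHERE + GROUP)]

# Counting the groups: wrap the grouped query, or the DISTINCT query, in a subselect.
grouped_total = conn.execute(
    "SELECT COUNT(*) FROM (SELECT assign.from_uid" + JOIN_WHERE + GROUP + ") q"
).fetchone()[0]
distinct_total = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT assign.from_uid, assign.to_uid, assign.nid"
    + JOIN_WHERE + ") q"
).fetchone()[0]

print(sorted(per_group), grouped_total, distinct_total)
```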
To get the number of rows for a query do:
SELECT COUNT(*) as RowCount FROM (--insert other query here--) s
In your example:
SELECT COUNT(*) as RowCount FROM (SELECT a.from_uid
FROM assignment a
INNER JOIN curriculum_topics_assets c ON a.nid = c.asset
WHERE a.to_uid = 13
AND a.status = 1
GROUP BY a.from_uid, a.to_uid, a.nid
) s
Note that I dropped the extra columns and the ORDER BY, which have no effect on the number of rows, to make the query run slightly faster.
You should use COUNT(*) instead of count(description), since COUNT(description) skips rows where description is NULL. See: http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/
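A minimal illustration of that difference, using SQLite (via Python's sqlite3) as a stand-in for MySQL and a hypothetical one-column table: COUNT(*) counts rows, while COUNT(description) counts only the non-NULL values in that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (description TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("a",), (None,), ("b",)])

count_star, count_col = conn.execute(
    "SELECT COUNT(*), COUNT(description) FROM t"
).fetchone()
print(count_star, count_col)  # 3 2
```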
I have the following queries -
SELECT COUNT(capture_id) as count_captures
FROM captures
WHERE user_id = 9
...returns 5
SELECT COUNT(id) as count_items
FROM items
WHERE creator_user_id = 9
...returns 22
I tried the following query -
SELECT COUNT(capture_id) as count_captures,
COUNT(items.id) as count_items
FROM captures
LEFT JOIN items ON captures.user_id = items.creator_user_id
WHERE user_id = 9
...but it returns two columns both with 110 as the value. I would want 5 in one column and 22 in the other. What am I doing wrong?
My knee-jerk reaction would be a subquery:
select count(capture_id) as count_captures,
(select count(id) as count_items
from items i where i.creator_user_id = captures.user_id) as count_items
from captures
where user_id = 9
I'm not really sure what you can do to avoid this; you're seeing expected (and generally desired) behavior.
Of course, if you know that the IDs in each table won't repeat, you can use DISTINCT:
SELECT COUNT( DISTINCT capture_id) as count_captures,
COUNT( DISTINCT items.id) as count_items
FROM captures
LEFT JOIN items ON captures.user_id = items.creator_user_id
WHERE user_id = 9
A LEFT JOIN pairs each row in the left table with every row in the right table that matches the join condition. Because all of the rows share the same user ID (9), every one of the 5 captures matches every one of the 22 items, which produces a Cartesian product (5 * 22 = 110 rows).
This is expected to happen.
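The 5 * 22 = 110 effect is easy to reproduce. This sketch assumes SQLite (via Python's sqlite3) in place of MySQL and hypothetical rows: 5 captures and 22 items, all belonging to user 9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE captures (capture_id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY, creator_user_id INTEGER);
""")
conn.executemany("INSERT INTO captures (user_id) VALUES (?)", [(9,)] * 5)
conn.executemany("INSERT INTO items (creator_user_id) VALUES (?)", [(9,)] * 22)

# Every capture row is paired with every matching item row: 5 * 22 = 110 rows.
joined = conn.execute("""
SELECT COUNT(capture_id), COUNT(items.id)
FROM captures
LEFT JOIN items ON captures.user_id = items.creator_user_id
WHERE user_id = 9
""").fetchone()

# DISTINCT collapses the repeated ids back to the per-table counts.
distinct = conn.execute("""
SELECT COUNT(DISTINCT capture_id), COUNT(DISTINCT items.id)
FROM captures
LEFT JOIN items ON captures.user_id = items.creator_user_id
WHERE user_id = 9
""").fetchone()

print(joined)    # (110, 110)
print(distinct)  # (5, 22)
```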
You could always union the results (warning, untested):
SELECT SUM(sub.count_captures), SUM(sub.count_items)
FROM (SELECT COUNT(capture_id) as count_captures, 0 as count_items
from captures where user_id = 9
UNION
SELECT 0 as count_captures, count(id) as count_items
from items where creator_user_id = 9) sub
Another way to combine two (seemingly not related) queries into one:
SELECT
( SELECT COUNT(capture_id)
FROM captures
WHERE user_id = 9
)
AS count_captures
, ( SELECT COUNT(id)
FROM items
WHERE creator_user_id = 9
)
AS count_items
There is really no need for a derived table (a subquery in FROM) or a JOIN in these cases. Although the optimizer may be smart enough to figure that out, I wouldn't try to confuse it.
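Both join-free approaches from this thread (summing a UNION of the two counts, and two scalar subqueries in one SELECT) can be checked on the same toy data. This sketch assumes SQLite (via Python's sqlite3) in place of MySQL and hypothetical rows; it uses UNION ALL for the summing variant because no de-duplication is wanted there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE captures (capture_id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE items (id INTEGER PRIMARY KEY, creator_user_id INTEGER);
""")
conn.executemany("INSERT INTO captures (user_id) VALUES (?)", [(9,)] * 5)
conn.executemany("INSERT INTO items (creator_user_id) VALUES (?)", [(9,)] * 22)

# Two independent scalar subqueries: each table is counted on its own.
scalar = conn.execute("""
SELECT
  (SELECT COUNT(capture_id) FROM captures WHERE user_id = 9) AS count_captures,
  (SELECT COUNT(id) FROM items WHERE creator_user_id = 9) AS count_items
""").fetchone()

# One row per table, padded with 0, then summed into a single row.
union_sum = conn.execute("""
SELECT SUM(sub.count_captures), SUM(sub.count_items)
FROM (
  SELECT COUNT(capture_id) AS count_captures, 0 AS count_items
  FROM captures WHERE user_id = 9
  UNION ALL
  SELECT 0 AS count_captures, COUNT(id) AS count_items
  FROM items WHERE creator_user_id = 9
) sub
""").fetchone()

print(scalar)     # (5, 22)
print(union_sum)  # (5, 22)
```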