MySQL: how to select the first row of each group with a count

I have a table like this (simplified version):
+----+-------+-----+--------------+-----
| id | name  | age | company.name | ...
+----+-------+-----+--------------+-----
| 1  | Adam  | 21  | Google       | ...
| 3  | Peter | 20  | Apple        | ...
| 2  | Bob   | 20  | Microsoft    | ...
| 9  | Alice | 18  | Google       | ...
+----+-------+-----+--------------+-----
I need to group the data by any one column and count the rows in each group. I also need the first row of each group. The user selects which column is used for grouping.
If the user selects the age column, the result should be:
+----+------------+-------+
| id | group_name | count |
+----+------------+-------+
| 9  | 18         | 1     |
| 2  | 20         | 2     |
| 1  | 21         | 1     |
+----+------------+-------+
The grouping column may be numeric or a string.
Currently I do it with this query:
SELECT id, group_name, users_name, count(id) as count FROM (
SELECT persons.id as id, company.type as group_name, users.name as users_name
FROM persons
LEFT JOIN company on company.id = persons.company_id
LEFT JOIN position on position.id=persons.position_id
...
LEFT JOIN source on source.id=persons.source_id
WHERE ...
ORDER BY if(company.type = '' or company.type is null,1,0) ASC,
company.type ASC, IF(persons.status = '' or persons.status is null,1,0) ASC,
persons.status ASC, persons.id
) t1 GROUP BY group_name
But with a newer version of MySQL this query stopped working; I think the ORDER BY in the subquery is now ignored.
I know similar questions have been asked, but the proposed solutions do not work with my query. I have to join many tables, add multiple conditions, use a cascading order, and then select the first row from each group. I would be very happy if the solution were optimized for performance.
---- EDIT ----
The proposed solution,
SQL select only rows with max value on a column
which suggests using MAX() and GROUP BY, does not work well, for two reasons:
If the grouped column contains strings, the query returns the last row of each group instead of the first.
If my dataset has a cascading order, I cannot use MAX() on several columns at the same time.
I created an sqlfiddle with an exact example:
http://sqlfiddle.com/#!9/23225d/11/0
-- EXAMPLE 1 - Group by string
-- base query
SELECT persons.*, company.* FROM persons
LEFT JOIN company ON persons.company_id = company.id
ORDER BY company.name ASC, company.id ASC;
-- grouping query
SELECT MAX(persons.id) as id, company.name, count(persons.id) as count
FROM persons
LEFT JOIN company ON persons.company_id = company.id
GROUP BY company.name
ORDER BY company.name ASC, persons.id ASC;
-- The results will be:
-- |ID | NAME | COUNT|
-- |1 | Google | 2 |
-- |3 | Microsoft| 3 |
-- EXAMPLE 2 - Cascade order
-- base query
SELECT persons.*, company.* FROM persons
LEFT JOIN company ON persons.company_id = company.id
ORDER BY company.type ASC, persons.status ASC;
-- grouping query
SELECT MAX(persons.id) as id, company.type, count(persons.id) as count
FROM persons
LEFT JOIN company ON persons.company_id = company.id
GROUP BY company.type
ORDER BY company.type ASC, persons.status ASC;
-- The results will be:
-- |ID | NAME| COUNT|
-- |3 | 1 | 2 |
-- |2 | 2 | 3 |

Just change MAX() to MIN() to get the first row instead of the last row in each group.
To get the extreme values of cascading columns, see SQL : Using GROUP BY and MAX on multiple columns. Use that in the subquery part of the query to get the row containing those extremes, as in SQL select only rows with max value on a column.
So the form of the full query is:
SELECT t1.id, t1.grouped_column, t2.count
FROM yourTable AS t1
JOIN (SELECT t3.grouped_column, t3.order_column1,
             MIN(t4.order_column2) AS order_column2,
             -- MAX, not SUM: t3.count repeats for every t4 row tied on order_column1
             MAX(t3.count) AS count
      FROM (SELECT grouped_column, MIN(order_column1) AS order_column1, COUNT(*) AS count
            FROM yourTable
            GROUP BY grouped_column) AS t3
      JOIN yourTable AS t4
        ON t3.grouped_column = t4.grouped_column AND t3.order_column1 = t4.order_column1
      GROUP BY t3.grouped_column, t3.order_column1) AS t2
  ON t1.grouped_column = t2.grouped_column
 AND t1.order_column1 = t2.order_column1
 AND t1.order_column2 = t2.order_column2
Since you want to operate on a join, I suggest you define a view that uses the join. Then you can use that view in place of yourTable in the above query.
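The nested-MIN form can be tried out without a MySQL server. The following is only a sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the table your_table, its columns, the alias cnt, and the sample rows are all hypothetical. It takes the per-group count with MAX so that rows tied on order_column1 do not inflate it.

```python
import sqlite3

# Hypothetical table and data, chosen only to exercise the query shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (
        id INTEGER PRIMARY KEY,
        grouped_column TEXT,
        order_column1 INTEGER,
        order_column2 INTEGER
    );
    INSERT INTO your_table VALUES
        (1, 'Google', 2, 1),
        (2, 'Google', 1, 5),
        (3, 'Google', 1, 3),
        (4, 'Apple',  4, 4);
""")

first_rows = conn.execute("""
    SELECT t1.id, t1.grouped_column, t2.cnt
    FROM your_table AS t1
    JOIN (SELECT t3.grouped_column, t3.order_column1,
                 MIN(t4.order_column2) AS order_column2,
                 MAX(t3.cnt) AS cnt
          -- t3: smallest order_column1 and row count per group
          FROM (SELECT grouped_column,
                       MIN(order_column1) AS order_column1,
                       COUNT(*) AS cnt
                FROM your_table
                GROUP BY grouped_column) AS t3
          -- t4: among rows tied on order_column1, find the smallest order_column2
          JOIN your_table AS t4
            ON t3.grouped_column = t4.grouped_column
           AND t3.order_column1 = t4.order_column1
          GROUP BY t3.grouped_column, t3.order_column1) AS t2
      ON t1.grouped_column = t2.grouped_column
     AND t1.order_column1 = t2.order_column1
     AND t1.order_column2 = t2.order_column2
    ORDER BY t1.grouped_column
""").fetchall()

print(first_rows)  # [(4, 'Apple', 1), (3, 'Google', 3)]
```

For 'Google', rows 2 and 3 tie on order_column1 = 1, so order_column2 breaks the tie in favour of row 3, and the count stays at 3.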

Related

MySql Join and count number of records

I have two tables
tbl_groups:
id | name
----------------
1 | BSCS
2 | BSIT
3 | BBA
tbl_students:
id | name | group_id
-------------------------------
1 | Student Name | 1
2 | Student 2 | 1
3 | Student 3 | 2
I want to show group details: the group name and the number of students in each group.
I am using this query, but it only shows groups that have students; it does not show groups with 0 students.
select tb2.id, tb2.name, count(*) from tbl_students tb1 JOIN tbl_groups tb2 ON tb1.group_id = tb2.id
How do I show all groups? Please give me some ideas.
EDIT:
If I use the above query I get the following result:
id | name | count(*)
-------------------------------
1 | Student Name | 2
2 | BSIT | 1
(It doesn't show the 3rd group because it has 0 students; I want to show this group as well.)
Just use a left join:
select tb2.id, tb2.name, count(tb1.id) as no_std
from tbl_groups tb2
LEFT JOIN tbl_students tb1 ON tb2.id = tb1.group_id
group by tb2.id, tb2.name
See it working live here: http://sqlfiddle.com/#!9/2282a3/5
I would just use a correlated subquery to get the count of students in each group, like so:
select
g.*,
(select count(*) from tbl_students s where s.group_id = g.id) no_students
from tbl_groups g
This does not filter out groups that have no students (it will give a count of 0 instead). And with an index on tbl_students(group_id), this should be as efficient as it gets (this index is already there if you set up a foreign key constraint on that column - as you should have).
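Both suggestions can be compared side by side. This is a minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL and using the sample data from the question; the variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_groups (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tbl_students (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
    INSERT INTO tbl_groups VALUES (1, 'BSCS'), (2, 'BSIT'), (3, 'BBA');
    INSERT INTO tbl_students VALUES
        (1, 'Student Name', 1), (2, 'Student 2', 1), (3, 'Student 3', 2);
""")

# LEFT JOIN keeps every group; COUNT(s.id) skips the NULLs of unmatched groups.
left_join_counts = conn.execute("""
    SELECT g.id, g.name, COUNT(s.id) AS no_std
    FROM tbl_groups g
    LEFT JOIN tbl_students s ON g.id = s.group_id
    GROUP BY g.id, g.name
    ORDER BY g.id
""").fetchall()

# Correlated subquery: one count per group row, 0 when nothing matches.
subquery_counts = conn.execute("""
    SELECT g.id, g.name,
           (SELECT COUNT(*) FROM tbl_students s WHERE s.group_id = g.id) AS no_students
    FROM tbl_groups g
    ORDER BY g.id
""").fetchall()

print(left_join_counts)  # [(1, 'BSCS', 2), (2, 'BSIT', 1), (3, 'BBA', 0)]
print(subquery_counts)   # same rows
```

Note that COUNT(*) in the LEFT JOIN version would report 1 for the empty group, because the unmatched group still produces one joined row.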

leetcode 574 winning candidate query

Please see the picture for ERROR SCREENSHOT
Table: Candidate
+-----+---------+
| id | Name |
+-----+---------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
+-----+---------+
Table: Vote
+-----+--------------+
| id | CandidateId |
+-----+--------------+
| 1 | 2 |
| 2 | 4 |
| 3 | 3 |
| 4 | 2 |
| 5 | 5 |
+-----+--------------+
id is the auto-increment primary key; CandidateId is the id that appears in the Candidate table.
Write a SQL query to find the name of the winning candidate. The above example will return the winner B.
+------+
| Name |
+------+
| B |
+------+
Notes:
You may assume there is no tie, in other words there will be at most one winning candidate.
Why doesn't this code work? I am trying to do it without LIMIT.
SELECT c.Name AS Name
FROM Candidate AS c
JOIN
(SELECT r.CandidateId AS can, MAX(r.Total_vote) AS big
FROM (SELECT CandidateId, COUNT(id) AS Total_vote
FROM Vote
GROUP BY CandidateId) AS r) AS v
ON c.id = v.can;
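In your query, here: SELECT r.CandidateId AS can, MAX(r.Total_vote) AS big
you use the MAX aggregate function alongside a non-aggregated column without GROUP BY. With ONLY_FULL_GROUP_BY enabled MySQL rejects this; with it disabled the query runs, but CandidateId comes from an arbitrary row, not necessarily the one with the maximum.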
Try:
SELECT Candidate.* FROM Candidate
JOIN (
SELECT CandidateId, COUNT(id) AS Total_vote
FROM Vote
GROUP BY CandidateId
ORDER BY COUNT(id) DESC LIMIT 1
) v
ON Candidate.id = v.CandidateId
This is a join/group by query with order by:
select c.name
from candidate c join
vote v
on v.candidateid = c.id
group by c.id, c.name
order by count(*) desc
limit 1;
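The join/group by form can be checked against the sample data from the question. This is a sketch only, assuming Python's built-in sqlite3 module in place of MySQL; the variable name winner is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Candidate (id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Vote (id INTEGER PRIMARY KEY, CandidateId INTEGER);
    INSERT INTO Candidate VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
    INSERT INTO Vote VALUES (1, 2), (2, 4), (3, 3), (4, 2), (5, 5);
""")

# Count votes per candidate, sort by that count, keep the top row.
winner = conn.execute("""
    SELECT c.Name
    FROM Candidate c
    JOIN Vote v ON v.CandidateId = c.id
    GROUP BY c.id, c.Name
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()

print(winner)  # ('B',)
```

Candidate B has two votes and every other candidate has at most one, so the no-tie assumption from the problem statement holds for this data.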
SELECT c.Name AS Name
FROM Candidate AS c JOIN (SELECT r.CandidateId AS can
FROM
(SELECT CandidateId, COUNT(id) AS Total_vote
FROM Vote
GROUP BY CandidateId) AS r
WHERE r.Total_vote = (SELECT MAX(r.Total_vote) FROM (SELECT
CandidateId, COUNT(id) AS Total_vote
FROM Vote
GROUP BY CandidateId) r)) AS v
ON c.id = v.can;
This is my updated code.
My original code had two errors. The first is that using an aggregate such as MAX requires a GROUP BY clause if there are any non-aggregated columns in the select list; I am not sure why my previous code still ran without an error. Maybe the system adds the GROUP BY automatically when it runs.
The second is that MAX can't be used with GROUP BY in this format.

Select AVG of a column and a single specific row

feedback table
-------------------------------
|rating|feedback|feedback_date|
-------------------------------
| 5 | good | 1452638788 |
| 1 | bad | 1452638900 |
| 0 | ugly | 1452750303 |
| 3 | ok | 1453903030 |
-------------------------------
desired result
average_rating | rating | feedback | feedback_date
2.25 | 3 | ok | 1453903030
Is it possible (in a single query) to select the average of one column and also one specific row from the table?
For example, I'd like to retrieve the average of the rating column and the most recent row as a whole.
I tried the following, and also with the ORDER BY direction as DESC, but both just gave me the average_rating and the first row in the table.
SELECT AVG(f.rating) AS average_rating, f.* FROM feedback f ORDER BY feedback_date ASC
SELECT * FROM feedback NATURAL JOIN (
SELECT AVG(rating), MAX(feedback_date) feedback_date FROM feedback
) t
See it on sqlfiddle.
You can do it with a subquery like this (ordered DESC so that the single row is the most recent one):
SELECT AVG(f.rating) AS average_rating, t1.*
FROM feedback f
INNER JOIN (SELECT * FROM feedback ORDER BY feedback_date DESC LIMIT 1) t1 ON TRUE
You can put a subquery in the SELECT clause, and calculate the average in the subquery.
SELECT (SELECT AVG(rating) FROM feedback) AS avg_rating, feedback.*
FROM feedback
ORDER BY feedback_date DESC
LIMIT 1
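The scalar-subquery form can be run against the sample rows from the question. This is a minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL; the variable name latest_with_avg is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feedback (rating INTEGER, feedback TEXT, feedback_date INTEGER);
    INSERT INTO feedback VALUES
        (5, 'good', 1452638788),
        (1, 'bad',  1452638900),
        (0, 'ugly', 1452750303),
        (3, 'ok',   1453903030);
""")

# The scalar subquery computes the table-wide average once;
# ORDER BY ... DESC LIMIT 1 picks the most recent row.
latest_with_avg = conn.execute("""
    SELECT (SELECT AVG(rating) FROM feedback) AS avg_rating, feedback.*
    FROM feedback
    ORDER BY feedback_date DESC
    LIMIT 1
""").fetchone()

print(latest_with_avg)  # (2.25, 3, 'ok', 1453903030)
```

Because the average lives in its own subquery, the outer query has no aggregate and can freely sort and limit individual rows.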

Join with Group By & Order by using Case in MySQL

Why won't this query work? Is it because of the combination of ORDER BY and GROUP BY?
One table holds adverts, another subscriptions, a third services, and a fourth is a many-to-many relation between services and locations (a location is a position where an advert should be shown).
What I want is to order the adverts stored in the adverts table so that those with location 2 come first, then those with no location defined, and then those with location 1 (this order is generated programmatically).
adverts table:
id, name, subscription_id
subscriptions table:
subscription_id, service_id, date, paid etc...
service_locations table:
service_id, location_id
As you can see, there is a fourth table in this case, but it is unimportant.
The query:
select adverts.id, GROUP_CONCAT(service_locations.location_id) AS locations from adverts
left join subscriptions
on adverts.subscription_id = subscriptions.id
left join service_locations
on service_locations.service_id = subscriptions.service_id
group by adverts.id
order by case service_locations.location_id
when 2 then 1
when 1 then 3
else 2
end
Expected results:
+----+-----------+
| id | locations |
+----+-----------+
| 1 | 2 |
| 3 | 1,2 |
| 2 | null |
+----+-----------+
What I actually get (the third row has location 2 but it is placed after null):
+----+-----------+
| id | locations |
+----+-----------+
| 1 | 2 |
| 2 | null |
| 3 | 1,2 |
+----+-----------+
When you use group by, all columns not in the group by should have aggregation functions. So, I think you intend something like this:
select a.id, GROUP_CONCAT(sl.location_id) AS locations
from adverts a left join
subscriptions s
on a.subscription_id = s.id left join
service_locations sl
on sl.service_id = s.service_id
group by a.id
order by max(case sl.location_id
when 2 then 1
when 1 then 3
else 2
end);
I'm not sure if max() is what you really need, but you do need an aggregation function. This specifically produces the output in the question:
order by (case min(sl.location_id)
when 2 then 1
when 1 then 2
else 3
end);
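The second ordering, the one built on MIN(), can be checked against data shaped like the question's expected output. This is a sketch only, assuming Python's built-in sqlite3 module in place of MySQL; the sample rows and the service ids are hypothetical, chosen so that advert 1 has location 2, advert 3 has locations 1 and 2, and advert 2 has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adverts (id INTEGER PRIMARY KEY, name TEXT, subscription_id INTEGER);
    CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, service_id INTEGER);
    CREATE TABLE service_locations (service_id INTEGER, location_id INTEGER);
    INSERT INTO adverts VALUES (1, 'a1', 1), (2, 'a2', 2), (3, 'a3', 3);
    INSERT INTO subscriptions VALUES (1, 10), (2, 20), (3, 30);
    -- service 20 has no locations at all
    INSERT INTO service_locations VALUES (10, 2), (30, 1), (30, 2);
""")

# The CASE is applied to an aggregate, so it yields one sort key per group.
rows = conn.execute("""
    SELECT a.id, GROUP_CONCAT(sl.location_id) AS locations
    FROM adverts a
    LEFT JOIN subscriptions s ON a.subscription_id = s.id
    LEFT JOIN service_locations sl ON sl.service_id = s.service_id
    GROUP BY a.id
    ORDER BY CASE MIN(sl.location_id)
                 WHEN 2 THEN 1
                 WHEN 1 THEN 2
                 ELSE 3
             END
""").fetchall()

ordered_ids = [row[0] for row in rows]
print(ordered_ids)  # [1, 3, 2]
```

The advert without locations has MIN(...) equal to NULL, which matches no WHEN branch and therefore falls through to ELSE.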
I have found a solution: ORDER BY must be executed before GROUP BY, which is not the default behavior (more about that behavior here: https://stackoverflow.com/a/14771322/4329156), so a subquery must be used.
So the query should look like this:
select *, GROUP_CONCAT(location_id) as locations from (
select adverts.id AS id, service_locations.location_id AS location_id from adverts
left join subscriptions
on adverts.subscription_id = subscriptions.id
left join service_locations
on service_locations.service_id = subscriptions.service_id
order by case service_locations.location_id
when 2 then 1
when 1 then 3
else 2
end
) as t
group by t.id
order by case t.location_id
when 2 then 1
when 1 then 3
else 2
end

Sort data before using GROUP BY?

I have read that grouping happens before ordering, is there any way that I can order first before grouping without having to wrap my whole query around another query just to do this?
Let's say I have this data:
id | user_id | date_recorded
1 | 1 | 2011-11-07
2 | 1 | 2011-11-05
3 | 1 | 2011-11-06
4 | 2 | 2011-11-03
5 | 2 | 2011-11-06
Normally, I'd have to do this query in order to get what I want:
SELECT
*
FROM (
SELECT * FROM table ORDER BY date_recorded DESC
) t1
GROUP BY t1.user_id
But I'm wondering if there's a better solution.
Your question is somewhat unclear but I have a suspicion what you really want is not any GROUP aggregates at all, but rather ordering by date first, then user ID:
SELECT
id,
user_id,
date_recorded
FROM tbl
ORDER BY date_recorded DESC, user_id ASC
Here would be the result. Note the reordering by date_recorded from your original example:
id | user_id | date_recorded
1 | 1 | 2011-11-07
3 | 1 | 2011-11-06
5 | 2 | 2011-11-06
2 | 1 | 2011-11-05
4 | 2 | 2011-11-03
Update
To retrieve the full latest record per user_id, a JOIN is needed. The subquery (mx) locates the latest date_recorded per user_id, and that result is joined to the full table to retrieve the remaining columns.
SELECT
mx.user_id,
mx.maxdate,
t.id
FROM (
SELECT
user_id,
MAX(date_recorded) AS maxdate
FROM tbl
GROUP BY user_id
) mx JOIN tbl t ON mx.user_id = t.user_id AND mx.maxdate = t.date_recorded
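The join-back approach can be run against the sample data from the question. This is a minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL; the join condition compares against the subquery's maxdate alias, and the added ORDER BY is only there to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER PRIMARY KEY, user_id INTEGER, date_recorded TEXT);
    INSERT INTO tbl VALUES
        (1, 1, '2011-11-07'),
        (2, 1, '2011-11-05'),
        (3, 1, '2011-11-06'),
        (4, 2, '2011-11-03'),
        (5, 2, '2011-11-06');
""")

# mx finds the latest date per user; joining back recovers the full row.
latest_per_user = conn.execute("""
    SELECT mx.user_id, mx.maxdate, t.id
    FROM (
        SELECT user_id, MAX(date_recorded) AS maxdate
        FROM tbl
        GROUP BY user_id
    ) mx
    JOIN tbl t ON mx.user_id = t.user_id AND mx.maxdate = t.date_recorded
    ORDER BY mx.user_id
""").fetchall()

print(latest_per_user)  # [(1, '2011-11-07', 1), (2, '2011-11-06', 5)]
```

If a user had two rows on the same latest date, this form would return both of them; a further tie-breaker would be needed to keep exactly one.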
I am just using this technique:
"Using an ORDER BY clause before GROUP BY by putting it inside the GROUP_CONCAT clause"
SELECT SUBSTRING_INDEX(group_concat(cast(id as char)
ORDER BY date_recorded desc),',',1),
user_id,
SUBSTRING_INDEX(group_concat(cast(`date_recorded` as char)
ORDER BY `date_recorded` desc),',',1)
FROM data
GROUP BY user_id