Select distinct column along with some other columns in MySQL - mysql

I can't seem to find a suitable solution for the following (probably age-old) problem, so I'm hoping someone can shed some light. I need to return one distinct column along with other non-distinct columns in MySQL.
I have the following table in MySQL:
id  name    destination  rating  country
----------------------------------------------------
1   James   Barbados     5       WI
2   Andrew  Antigua      6       WI
3   James   Barbados     3       WI
4   Declan  Trinidad     2       WI
5   Steve   Barbados     4       WI
6   Declan  Trinidad     3       WI
I would like an SQL statement that returns each DISTINCT name along with its destination and rating, filtered by country.
id  name    destination  rating  country
----------------------------------------------------
1   James   Barbados     5       WI
2   Andrew  Antigua      6       WI
4   Declan  Trinidad     2       WI
5   Steve   Barbados     4       WI
As you can see, James and Declan have different ratings, but the same name, so they are returned only once.
The following query returns all rows because the ratings are different. Is there any way I can return the above result set?
SELECT (distinct name), destination, rating
FROM table
WHERE country = 'WI'
ORDER BY id

Using a subquery, you can get the highest id for each name, then select the rest of the rows based on that:
SELECT * FROM table
WHERE id IN (
SELECT MAX(id) FROM table GROUP BY name
)
If you'd prefer, use MIN(id) to get the first record for each name instead of the last.
It can also be done with an INNER JOIN against the subquery. For this purpose the performance should be similar, and sometimes you need to join on two columns from the subquery.
SELECT
table.*
FROM
table
INNER JOIN (
SELECT MAX(id) AS id FROM table GROUP BY name
) maxid ON table.id = maxid.id
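The join-against-a-subquery approach can be checked on the question's sample data. The following is only a minimal sketch: it uses Python's built-in sqlite3 module in place of MySQL, the table name trips is hypothetical, and it uses MIN(id) with a country filter so that the output matches the result set the question asks for.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; "trips" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (id INTEGER PRIMARY KEY, name TEXT, "
    "destination TEXT, rating INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?, ?)",
    [
        (1, "James", "Barbados", 5, "WI"),
        (2, "Andrew", "Antigua", 6, "WI"),
        (3, "James", "Barbados", 3, "WI"),
        (4, "Declan", "Trinidad", 2, "WI"),
        (5, "Steve", "Barbados", 4, "WI"),
        (6, "Declan", "Trinidad", 3, "WI"),
    ],
)

# The subquery picks the lowest id per name; the join fetches those full rows.
first_rows = conn.execute(
    """
    SELECT t.id, t.name, t.destination, t.rating, t.country
    FROM trips AS t
    INNER JOIN (
        SELECT MIN(id) AS id FROM trips WHERE country = 'WI' GROUP BY name
    ) AS firstid ON t.id = firstid.id
    ORDER BY t.id
    """
).fetchall()

for row in first_rows:
    print(row)
```

Swapping MIN(id) for MAX(id) would return rows 2, 3, 5 and 6 instead, that is, the last record for each name.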

The problem is that distinct works across the entire return set and not just the first field. Otherwise MySQL wouldn't know what record to return. So, you want to have some sort of group function on rating, whether MAX, MIN, GROUP_CONCAT, AVG, or several other functions.
Michael has already posted a good answer, so I'm not going to re-write the query.
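To see that DISTINCT applies to the whole selected row, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (the table name trips is hypothetical). It compares DISTINCT over three columns with DISTINCT over name alone.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; "trips" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (id INTEGER PRIMARY KEY, name TEXT, "
    "destination TEXT, rating INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?, ?)",
    [
        (1, "James", "Barbados", 5, "WI"),
        (2, "Andrew", "Antigua", 6, "WI"),
        (3, "James", "Barbados", 3, "WI"),
        (4, "Declan", "Trinidad", 2, "WI"),
        (5, "Steve", "Barbados", 4, "WI"),
        (6, "Declan", "Trinidad", 3, "WI"),
    ],
)

# DISTINCT compares every selected column, so differing ratings keep rows apart.
distinct_rows = conn.execute(
    "SELECT DISTINCT name, destination, rating FROM trips WHERE country = 'WI'"
).fetchall()

# With only the name selected, duplicates collapse as the asker expected.
distinct_names = conn.execute(
    "SELECT DISTINCT name FROM trips WHERE country = 'WI'"
).fetchall()

print(len(distinct_rows), len(distinct_names))
```

On the sample data the first query keeps all six rows, because no two rows agree on name, destination and rating at once, while the second returns four names.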

I agree with @rcdmk. Using a DEPENDENT subquery can kill performance; GROUP BY seems more suitable, provided that you have already INDEXed the country field and only a few rows will reach the server. Rewriting the query given by @rcdmk, I added the ORDER BY NULL clause to suppress the implicit ordering by GROUP BY, to make it a little faster:
SELECT MIN(id) as id, name, destination, AVG(rating) as rating, country
FROM table WHERE country = 'WI'
GROUP BY name, destination ORDER BY NULL

You can do a GROUP BY clause:
SELECT MIN(id) AS id, name, destination, AVG(rating) AS rating, country
FROM TABLE_NAME
GROUP BY name, destination, country
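This query is likely to perform better on large datasets than the subquery alternatives, and it can be easier to read as well. Note that it returns the average rating for each name rather than the rating of one particular row.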
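A quick way to see what the GROUP BY version returns is to run it on the question's sample rows. This is a sketch only, using Python's sqlite3 module rather than MySQL, with the hypothetical table name trips.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; "trips" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (id INTEGER PRIMARY KEY, name TEXT, "
    "destination TEXT, rating INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?, ?)",
    [
        (1, "James", "Barbados", 5, "WI"),
        (2, "Andrew", "Antigua", 6, "WI"),
        (3, "James", "Barbados", 3, "WI"),
        (4, "Declan", "Trinidad", 2, "WI"),
        (5, "Steve", "Barbados", 4, "WI"),
        (6, "Declan", "Trinidad", 3, "WI"),
    ],
)

# One output row per (name, destination, country) group, with aggregated id and rating.
grouped = conn.execute(
    """
    SELECT MIN(id) AS id, name, destination, AVG(rating) AS rating, country
    FROM trips
    GROUP BY name, destination, country
    ORDER BY MIN(id)
    """
).fetchall()

for row in grouped:
    print(row)
```

James comes back with a rating of 4.0 and Declan with 2.5, the averages of their two rows, so this differs from the question's expected output, which shows the rating of the first row for each name.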

Related

MySQL: Count occurrences of distinct values for each row

Based on an example already given, I would like to ask my further question.
MySQL: Count occurrences of distinct values
example db
id name
----- ------
1 Mark
2 Mike
3 Paul
4 Mike
5 Mike
6 John
7 Mark
expected result
name count
----- -----
Mark 2
Mike 3
Paul 1
Mike 3
Mike 3
John 1
Mark 2
In my opinion 'GROUP BY' doesn't help.
Thank you very much.
The simplest approach would be to use COUNT() as a window function over a partition by name, but window functions are available only in MySQL 8.0.2 and onwards.
However, another approach is possible using a Derived Table. In a sub-select query (Derived Table), we will identify the counts for each unique name. Now, we simply need to join this to the main table, to show counts against each name (while not doing a grouping on them):
SELECT
t1.name,
dt.total_count
FROM your_table AS t1
JOIN
(
SELECT name,
COUNT(*) AS total_count
FROM your_table
GROUP BY name
) AS dt ON dt.name = t1.name
ORDER BY t1.id
If MySQL 8.0.2+ is available, the solution would be less verbose:
SELECT
name,
COUNT(*) OVER (PARTITION BY name) AS total_count
FROM your_table
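Both variants can be compared side by side. The sketch below runs them through Python's sqlite3 module instead of MySQL, and assumes the bundled SQLite is version 3.25 or later (the first release with window functions). The table name your_table follows the answer; an ORDER BY id is added to the window query so the two results line up.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "Mark"), (2, "Mike"), (3, "Paul"), (4, "Mike"),
     (5, "Mike"), (6, "John"), (7, "Mark")],
)

# Derived-table version: count per name once, then join back to every row.
joined = conn.execute(
    """
    SELECT t1.name, dt.total_count
    FROM your_table AS t1
    JOIN (
        SELECT name, COUNT(*) AS total_count
        FROM your_table
        GROUP BY name
    ) AS dt ON dt.name = t1.name
    ORDER BY t1.id
    """
).fetchall()

# Window-function version: the count is computed per partition, rows are kept.
windowed = conn.execute(
    """
    SELECT name, COUNT(*) OVER (PARTITION BY name) AS total_count
    FROM your_table
    ORDER BY id
    """
).fetchall()

print(joined)
print(windowed)
```

Both queries keep all seven rows and attach the per-name count to each, which is the shape the question asks for.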

How can I write a query that aggregate a single row with latest date among multiple set of rows?

I have a MySQL table with many rows for each person, and I want to write a query that aggregates rows under a special constraint (one row per person).
For example, let's say the table consists of the following data.
name date reason
---------------------------------------
John 2013-04-01 14:00:00 Vacation
John 2013-03-31 18:00:00 Sick
Ted 2012-05-06 20:00:00 Sick
Ted 2012-02-20 01:00:00 Vacation
John 2011-12-21 00:00:00 Sick
Bob 2011-04-02 20:00:00 Sick
I want to see the distribution of 'reason' column. If I just write a query like below
select reason, count(*) as count from table group by reason
then I will be able to see number of reasons for this table overall.
reason count
------------------
Sick 4
Vacation 2
However, I am only interested in single reason from each person. The reason that should be counted should be from a row with latest date from the person's records. For example, John's latest reason would be Vacation while Ted's latest reason would be Sick. And Bob's latest reason (and the only reason) is Sick.
The expected result for that query should be like below. (Sum of count will be 3 because there are only 3 people)
reason count
-----------------
Sick 2
Vacation 1
Is it possible to write a query such that single latest reason will be counted when I want to see distribution(count) of reasons?
Here are some facts about the table.
The table has tens of millions of rows
For most of times, each person has one reason.
Some people have multiple reasons, but 99.99% of people have fewer than 5 reasons.
There are about 30 different reasons while there are millions of distinct names.
The table is partitioned based on date range.
SELECT T.REASON, COUNT(*)
FROM
(
SELECT NAME, MAX(DATE) AS MAX_DATE
FROM TABLE_NAME
GROUP BY NAME
) A, TABLE_NAME T
WHERE T.NAME = A.NAME AND T.DATE = A.MAX_DATE
GROUP BY T.REASON
Try this:
select reason, count(*) from
(select reason from table where (name, date) in
(select name, max(date) from table group by name)) t
group by reason
In MySQL versions before 8.0, this kind of query is not very efficient, since you don't have access to the window (partitioning) functions available in SQL Server or Oracle.
You can still emulate it with a subquery that retrieves the rows matching the condition you need, here the maximum date:
SELECT t.reason, COUNT(1)
FROM
(
SELECT name, MAX(adate) AS maxDate
FROM aTable
GROUP BY name
) maxDateRows
INNER JOIN aTable t ON maxDateRows.name = t.name
AND maxDateRows.maxDate = t.adate
GROUP BY t.reason
Test this query on your samples, but I'm afraid that it will be slow as hell.
For your information, you can do the same thing in a more elegant and much much faster way in SQL Server :
SELECT reason, COUNT(1)
FROM
(
SELECT name
, reason
, RANK() OVER(PARTITION BY name ORDER BY adate DESC) as Rank
FROM #aTable
) AS rankTable
WHERE Rank = 1
GROUP BY reason
The sample is here
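The ranking idea can also be tried outside SQL Server. The following sketch runs it through Python's sqlite3 module, assuming SQLite 3.25 or later for window function support. The table name absences is hypothetical, the adate column name follows the answer, and the alias rnk replaces Rank to avoid clashing with the function name.

```python
import sqlite3

# In-memory SQLite stands in for the database; "absences" is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absences (name TEXT, adate TEXT, reason TEXT)")
conn.executemany(
    "INSERT INTO absences VALUES (?, ?, ?)",
    [
        ("John", "2013-04-01 14:00:00", "Vacation"),
        ("John", "2013-03-31 18:00:00", "Sick"),
        ("Ted", "2012-05-06 20:00:00", "Sick"),
        ("Ted", "2012-02-20 01:00:00", "Vacation"),
        ("John", "2011-12-21 00:00:00", "Sick"),
        ("Bob", "2011-04-02 20:00:00", "Sick"),
    ],
)

# Rank each person's rows from newest to oldest, keep rank 1, then count reasons.
latest_counts = conn.execute(
    """
    SELECT reason, COUNT(*)
    FROM (
        SELECT name, reason,
               RANK() OVER (PARTITION BY name ORDER BY adate DESC) AS rnk
        FROM absences
    ) AS ranked
    WHERE rnk = 1
    GROUP BY reason
    ORDER BY reason
    """
).fetchall()

print(latest_counts)
```

Because the timestamps are stored as ISO-formatted text, ordering them as strings gives chronological order. If two rows for one person shared the same latest timestamp, RANK() would keep both; ROW_NUMBER() would keep exactly one.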
If you are really stuck with MySQL and the first query is too slow, then you can split the problem.
Run a first query that creates a table:
CREATE TABLE maxDateRows AS
SELECT name, MAX(adate) AS maxDate
FROM aTable
GROUP BY name
Then create an index on both name and maxDate.
Finally, get the results:
SELECT t.reason, COUNT(1)
FROM maxDateRows m
INNER JOIN aTable t ON m.name = t.name
AND m.maxDate = t.adate
GROUP BY t.reason
The solution you are looking for seems to be solved by this query :
select
reason,
count(*)
from (select * from tablename group by name) abc
group by
reason
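It is quite fast and simple, but be aware that it relies on MySQL's non-standard GROUP BY handling: the row returned for each name is not guaranteed to be the one with the latest date, and the query is rejected when ONLY_FULL_GROUP_BY is enabled.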
Apologies if this answer duplicates an existing one. Maybe I'm suffering from some form of aphasia, but I cannot see it...
SELECT x.reason
, COUNT(*)
FROM absentism x
JOIN
( SELECT name,MAX(date) max_date FROM absentism GROUP BY name) y
ON y.name = x.name
AND y.max_date = x.date
GROUP
BY reason;
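The join on each person's maximum date can be checked against the sample data. The sketch below uses Python's sqlite3 module in place of MySQL, with a hypothetical table name absences and column name adate. It also adds one hypothetical extra row (an older John record that happens to share Ted's latest timestamp) to show why the match has to include the name as well as the date.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; "absences" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absences (name TEXT, adate TEXT, reason TEXT)")
conn.executemany(
    "INSERT INTO absences VALUES (?, ?, ?)",
    [
        ("John", "2013-04-01 14:00:00", "Vacation"),
        ("John", "2013-03-31 18:00:00", "Sick"),
        ("Ted", "2012-05-06 20:00:00", "Sick"),
        ("Ted", "2012-02-20 01:00:00", "Vacation"),
        ("John", "2011-12-21 00:00:00", "Sick"),
        ("Bob", "2011-04-02 20:00:00", "Sick"),
        # Hypothetical extra row: same timestamp as Ted's latest record.
        ("John", "2012-05-06 20:00:00", "Vacation"),
    ],
)

# Correct: join on both name and that name's maximum date.
by_name_and_date = conn.execute(
    """
    SELECT x.reason, COUNT(*)
    FROM absences AS x
    JOIN (SELECT name, MAX(adate) AS max_date FROM absences GROUP BY name) AS y
      ON y.name = x.name AND y.max_date = x.adate
    GROUP BY x.reason
    ORDER BY x.reason
    """
).fetchall()

# Flawed: matching on the date alone also picks up John's older row.
by_date_only = conn.execute(
    """
    SELECT reason, COUNT(*)
    FROM absences
    WHERE adate IN (SELECT MAX(adate) FROM absences GROUP BY name)
    GROUP BY reason
    ORDER BY reason
    """
).fetchall()

print(by_name_and_date)
print(by_date_only)
```

With the extra row present, the date-only filter counts four records for three people, while the join on name and date still counts one record per person.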

MS Access query-String aggregation

I am looking for an MS Access query for the following problem.
Below is my data set, in which one row has NULL in the Value column. I extracted this data by taking Max(Value) for each Name+Office+Person+Category, to avoid multiple rows per value.
ID   Name   Office    Person  Category  Value
1    FMR    Americas  Ben     Global    7
1    FMR    London    Ben     Global    5
1    FMR    London    Ben     Overall   4.2
156  Asset  London    Ben     Global    13
156  Asset  London    Ben     Overall   NULL
157  WSR    Paris     Zen     Global    2
My expected result set is below. I am expecting a cross mark, or any other indicator, showing in a single row whether each ID, Name, Office, Person combination has a value for the Global/Overall categories. I know it is somewhat similar to "string aggregation".
ID   Name   Office    Person  Global  Overall
1    FMR    Americas  Ben     X
1    FMR    London    Ben     X       X
156  Asset  London    Ben     X
157  WSR    Paris     Zen     X
Appreciate your inputs..
I played around with this a little. I created two select queries Global and Overall
Global
SELECT ID, Name, Office, Person, Category AS Global
FROM [YourTable]
WHERE Category="Global" AND Value IS NOT NULL
Overall
SELECT ID, Name, Office, Person, Category AS Overall
FROM [YourTable]
WHERE Category="Overall" AND Value IS NOT NULL
Then I created a new query to join the select queries
SELECT g.ID, g.Name, g.Office, g.Person, Global, Overall
FROM Global g
LEFT JOIN Overall o ON g.ID = o.ID AND g.Name = o.Name AND g.Office = o.Office AND g.Person = o.Person
Hope this helps.
First, get a list of unique ID/Name/Office/Person combinations:
SELECT DISTINCT ID, Name, Office, Person
FROM TableName
Next, create subqueries for each category:
For Global:
SELECT ID, Name, Office, Person
FROM TableName
WHERE Category="Global"
For Overall:
SELECT ID, Name, Office, Person
FROM TableName
WHERE Category="Overall"
Finally, left join the subqueries to the main query, and use an expression to show the X:
SELECT DISTINCT TableName.ID, TableName.Name, TableName.Office, TableName.Person,
Iif(Global.ID Is Not Null, "X") AS IsGlobal,
Iif(Overall.ID Is Not Null, "X") AS IsOverall
FROM (TableName
LEFT JOIN (
SELECT ID, Name, Office, Person
FROM TableName
WHERE Category="Global"
) AS Global
ON TableName.ID=Global.ID
AND TableName.Name=Global.Name
AND TableName.Office=Global.Office
AND TableName.Person=Global.Person)
LEFT JOIN (
SELECT ID, Name, Office, Person
FROM TableName
WHERE Category="Overall"
) AS Overall
ON TableName.ID=Overall.ID
AND TableName.Name=Overall.Name
AND TableName.Office=Overall.Office
AND TableName.Person=Overall.Person
It may be easier for you to save the subqueries as Access queries and reference the saved queries by name, instead of including the whole subquery in this query.
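For comparison, the same X-marker layout can be produced with conditional aggregation instead of joins. This is a different technique from the answers above, sketched here with Python's sqlite3 module rather than Access; the table name ratings is hypothetical, and Access itself has no CASE expression, so an Access version would need IIf in its place. The sketch also skips rows whose Value is NULL, which is what the question's expected output implies.

```python
import sqlite3

# In-memory SQLite stands in for Access; "ratings" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (ID INTEGER, Name TEXT, Office TEXT, "
    "Person TEXT, Category TEXT, Value REAL)"
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "FMR", "Americas", "Ben", "Global", 7),
        (1, "FMR", "London", "Ben", "Global", 5),
        (1, "FMR", "London", "Ben", "Overall", 4.2),
        (156, "Asset", "London", "Ben", "Global", 13),
        (156, "Asset", "London", "Ben", "Overall", None),
        (157, "WSR", "Paris", "Zen", "Global", 2),
    ],
)

# Each CASE yields 'X' for a matching non-NULL row and '' otherwise;
# MAX then keeps 'X' if any row in the group produced it.
pivoted = conn.execute(
    """
    SELECT ID, Name, Office, Person,
           MAX(CASE WHEN Category = 'Global' AND Value IS NOT NULL
                    THEN 'X' ELSE '' END) AS IsGlobal,
           MAX(CASE WHEN Category = 'Overall' AND Value IS NOT NULL
                    THEN 'X' ELSE '' END) AS IsOverall
    FROM ratings
    GROUP BY ID, Name, Office, Person
    ORDER BY ID, Office
    """
).fetchall()

for row in pivoted:
    print(row)
```

The row for ID 156 shows an X under IsGlobal only, because its Overall entry has no value.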

How does this count work?

My query is given below:
select vend_id,
COUNT(*) as num_prods
from Products
group by vend_id;
Please tell me how does this part work - select vend_id, COUNT(vend_id) as opposed to select COUNT(vend_id)?
select COUNT(vend_id)
That will return the number of rows where the vendor ID is not null
select vend_id, COUNT(*) as num_prods
from Products
group by vend_id
That will group the rows by vendor ID and return, for each ID, how many rows it has.
An example:
ID name salary start_date city region
----------- ---------- ----------- ----------------------- ---------- ------
1 Jason 40420 1994-02-01 00:00:00.000 New York W
2 Robert 14420 1995-01-02 00:00:00.000 Vancouver N
3 Celia 24020 1996-12-03 00:00:00.000 Toronto W
4 Linda 40620 1997-11-04 00:00:00.000 New York N
5 David 80026 1998-10-05 00:00:00.000 Vancouver W
6 James 70060 1999-09-06 00:00:00.000 Toronto N
7 Alison 90620 2000-08-07 00:00:00.000 New York W
8 Chris 26020 2001-07-08 00:00:00.000 Vancouver N
If you run this query, you will get one row per city, and you can apply a function (in this case, count) to each group. So, for each city, you will get the count of rows. You can also use other functions.
SELECT City, COUNT(*) as Employees
FROM Employee
GROUP BY City
The result is:
City Employees
--------- ---------
New York 3
Toronto 2
Vancouver 3
As you can see, you get the number of rows for each city.
When you simply select COUNT(vend_id) with no GROUP BY clause, you get one row with the total count of rows with a non-NULL vendor ID - that last bit is important and is one reason why you may prefer COUNT(*) so as to avoid "missing" rows. Some people may argue that COUNT(*) is somehow less efficient but that's true in no DBMS I've used. In any case, if you are using a brain-dead DBMS, you can always try COUNT(1).
When you group by vend_id, you get one row per vendor ID with the count being the number of rows for that ID.
In step-by-step detail (conceptually, though there are almost certainly efficiencies to be gained by optimising), the first query:
SELECT COUNT(vend_id) AS num_prods FROM products
Get a list of all rows in products.
Count the rows where vend_id is not NULL, then deliver one row containing that count in the single num_prods column.
For the grouping one:
SELECT vend_id, COUNT(vend_id) AS num_prods FROM products GROUP BY vend_id
Get a list of all rows in products.
For each value of vend_id:
Count the rows matching that vend_id where vend_id is not NULL, then deliver one row containing the vend_id in the first column and that count in the second num_prods column.
Note that those rows with a null vend_id do not contribute to the aggregate function (count in this case).
In the first query, that simply means they don't appear in the overall total.
In the second case, it means that the output row still exists but the count will be zero. That's another good reason to use COUNT(*) or COUNT(1).
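The difference between COUNT(vend_id) and COUNT(*) shows up as soon as some rows have a NULL vendor. Here is a minimal sketch using Python's sqlite3 module; the product rows are made up for illustration and are not from the question.

```python
import sqlite3

# In-memory SQLite database with made-up rows; two products have no vendor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (prod_id INTEGER PRIMARY KEY, vend_id TEXT)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, None), (5, None)],
)

# Without GROUP BY: one row for the whole table.
totals = conn.execute(
    "SELECT COUNT(vend_id), COUNT(*) FROM Products"
).fetchone()

# With GROUP BY: one row per vendor, including a group for NULL.
per_vendor = conn.execute(
    """
    SELECT vend_id, COUNT(vend_id) AS non_null_count, COUNT(*) AS num_prods
    FROM Products
    GROUP BY vend_id
    ORDER BY vend_id
    """
).fetchall()

print(totals)
print(per_vendor)
```

COUNT(vend_id) reports 3 for the whole table while COUNT(*) reports 5, and in the grouped query the NULL group appears with a COUNT(vend_id) of 0 but a COUNT(*) of 2. SQLite sorts NULL first in ascending order, which is why that group leads the output here.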
select vend_id will only select the vend_id field, whereas select * will select all the fields.
select vend_id, COUNT(vend_id) and select COUNT(vend_id) give the same result for the count column as long as you use group by vend_id. When you use select vend_id, COUNT(vend_id), you must group by vend_id.

getting individual records from a group by

I have two tables, one is a table of names with a category tag and the other is a table of scores for each name
ID Name Category
1 Dave 1
2 John 1
3 Lisa 2
4 Jim 2
and the score table is
PersonID Score
1 50
2 100
3 75
4 50
4 75
I would then like a query that returned something like
Category TotalScore Names
1 150 Dave, John
2 200 Lisa, Jim
Is this possible to do with one query?
I can get the totals with a sum query and grouping by category but cannot see a way to get the names as I would like.
Many thanks
You need to use group_concat:
select Category, sum(Score) as TotalScore, group_concat(Name) as Names from categories
join scores on scores.PersonID = categories.ID
group by Category
Or even better:
group_concat(DISTINCT Name ORDER BY Name ASC SEPARATOR ',') as names
Just add group_concat(Name) as names into your sum query.
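The effect of DISTINCT inside group_concat is easiest to see with the question's data, where Jim has two score rows. The sketch below uses Python's sqlite3 module in place of MySQL; SQLite also has a group_concat function, though it does not accept MySQL's ORDER BY and SEPARATOR clauses. The table name people and the snake_case column names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, category INTEGER)")
conn.execute("CREATE TABLE scores (person_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Dave", 1), (2, "John", 1), (3, "Lisa", 2), (4, "Jim", 2)],
)
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(1, 50), (2, 100), (3, 75), (4, 50), (4, 75)],
)

# The join yields one row per score, so Jim is listed once per score row
# in the plain concatenation and once in the DISTINCT one.
rows = conn.execute(
    """
    SELECT p.category,
           SUM(s.score) AS total_score,
           GROUP_CONCAT(p.name) AS all_names,
           GROUP_CONCAT(DISTINCT p.name) AS names
    FROM people AS p
    JOIN scores AS s ON s.person_id = p.id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()

for category, total_score, all_names, names in rows:
    print(category, total_score, all_names, names)
```

The totals come out as 150 and 200, matching the question. The order of names inside each concatenated string is not guaranteed in SQLite, so the sketch does not rely on it.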
Here is a solution working for Postgres (which doesn't have a group_concat() function):
select category, sum(score) as TotalScore, array(select id from perso where category=P.category order by id) as Names from perso P JOIN scores S ON S."PersonID" = P.id GROUP BY category;
(I know this was a MySQL question, but nonetheless someone might google it up but needs an answer for Postgres :) )