Distinct based on one row with multiple non-aggregate columns selected - mysql

I'm trying to run a query that returns distinct AddressIDs.
The row to be returned for each AddressID should be the one with the latest ReadDate.
I also want to return the value from (non-aggregate) columns PhoneNumber, SomeCode, and Country for the given records.
There are similar questions on here to mine, but nothing seems to suit my exact situation. I've tried different subqueries and making the other columns aggregates, but I can't seem to get the results I desire.
Say the base of the query looks like:
select cr.AddressID, cr.ReadDate, ci.PhoneNumber, ci.SomeCode, ci.Country
from CustomerReadings cr, CustomerInfo ci
where cr.AddressID = ci.AddressID
For example, if I have a table that looks like:
AddressID ReadDate PhoneNumber SomeCode Country
1005 01/01/1997 5556565 GHS Canada
1005 05/06/2006 5556753 ROT USA
1005 08/12/2018 5552345 JKR USA
2007 02/05/2012 5558746 MSC Canada
2007 12/07/2018 5552345 RRE France
4000 03/01/1999 5552345 RRE France
4000 09/05/2007 5551243 MSR USA
I want the query results to look like:
AddressID ReadDate PhoneNumber SomeCode Country
1005 08/12/2018 5552345 JKR USA
2007 12/07/2018 5552345 RRE France
4000 09/05/2007 5551243 MSR USA
If anything is unclear please let me know and I'll update my question accordingly.
In the case of 1 table as you used in your answer example, the code works.
But when I bring in another table, I no longer get just one distinct AddressID back, eg:
select  -- (or select distinct)
    cr.AddressID, cr.ReadDate, ci.PhoneNumber, ci.SomeCode, ci.Country
from
    CustomerReadings cr,
    CustomerInfo ci
where
    cr.AddressID = ci.AddressID
    and cr.ReadDate =
        (select max(cr2.ReadDate)
         from CustomerReadings cr2
         where cr2.AddressID = cr.AddressID)
order by
    2 desc,
    1;

There should be questions that are very similar. I use a correlated subquery:
select t.*
from t
where t.readdate = (select max(t2.readdate) from t t2 where t2.addressid = t.addressid);
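As a rough check of the correlated subquery against the question's sample rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL, the single table t is an assumption, and the dates are rewritten in ISO format (assuming the originals are MM/DD/YYYY) so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (addressid INTEGER, readdate TEXT, "
    "phonenumber TEXT, somecode TEXT, country TEXT)"
)
# Sample rows from the question, dates converted to ISO (assumed MM/DD/YYYY).
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (1005, "1997-01-01", "5556565", "GHS", "Canada"),
    (1005, "2006-05-06", "5556753", "ROT", "USA"),
    (1005, "2018-08-12", "5552345", "JKR", "USA"),
    (2007, "2012-02-05", "5558746", "MSC", "Canada"),
    (2007, "2018-12-07", "5552345", "RRE", "France"),
    (4000, "1999-03-01", "5552345", "RRE", "France"),
    (4000, "2007-09-05", "5551243", "MSR", "USA"),
])

# Keep only the row whose readdate equals the latest readdate of its address.
latest = conn.execute("""
    SELECT t.*
    FROM t
    WHERE t.readdate = (SELECT MAX(t2.readdate)
                        FROM t t2
                        WHERE t2.addressid = t.addressid)
    ORDER BY t.addressid
""").fetchall()

for row in latest:
    print(row)
```

If the second table holds more than one row per AddressID, the join will still multiply the result, so the latest-row filter alone cannot guarantee one row per address in that case.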

You need a correlated subquery:
select t.*
from table t
where readdate = (select max(t1.readdate) from table t1 where t1.addressid = t.addressid);
If you are working with MySQL 8.0 or later, row_number() would be helpful:
select t.*
from (select t.*,
row_number() over (partition by addressid order by readdate desc) as seq
from table t
) t
where seq = 1;
However, if readdate has ties and you want every tied row back, row_number() will keep only one of them, chosen arbitrarily; use dense_rank() instead.
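The difference on tied dates can be seen in a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL 8.0 (it assumes the bundled SQLite is 3.25 or newer, which added window functions), and the four rows are hypothetical, chosen so that address 1 has two readings on the same latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (addressid INTEGER, readdate TEXT, somecode TEXT)")
# Hypothetical data: address 1 has a tie on its latest readdate.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "2018-01-01", "A"),
    (1, "2018-01-01", "B"),
    (1, "2017-06-30", "C"),
    (2, "2016-03-15", "D"),
])

def top_rows(rank_fn):
    """Return rows ranked first per addressid using the given window function."""
    sql = f"""
        SELECT addressid, somecode
        FROM (SELECT t.*,
                     {rank_fn}() OVER (PARTITION BY addressid
                                       ORDER BY readdate DESC) AS seq
              FROM t) ranked
        WHERE seq = 1
        ORDER BY addressid, somecode
    """
    return conn.execute(sql).fetchall()

with_row_number = top_rows("row_number")  # one row per address; tie broken arbitrarily
with_dense_rank = top_rows("dense_rank")  # every row tied for the latest date

print(len(with_row_number), with_dense_rank)
```

Which behaviour is right depends on whether exactly one row per address is required; if it is, adding a tie-breaking column to the ORDER BY of row_number() makes the choice deterministic.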

Related

Sql query max, group by

I am trying to get all students grouped by class_id, student_id, and teacher_id.
What I mean is this:
Select id,class_id, student_id,teacher_id, max(active)
FROM student_classes
GROUP BY class_id, student_id, teacher_id
But this is what I get
Actually what I want as a result is:
114 137 1 47 1
108 138 2 49 0
113 197 3 47 1
So basically the problem is at the third row. Instead of having id = 113 I get ID=111.
What should I do in this case? Can you please help me with the query
As mentioned in the comments, MySQL allows something against the SQL standard, letting you include a non-aggregated column (in this case id) in the select list of a query that includes a group by. As far as I know, it will arbitrarily pick one row in each grouping and display the id value from that row.
If you have a specific rule about which id value you want to see, you need to express that in your query.
By the way, your desired output appears to have multiple typos (e.g. 197, which doesn't appear in your data at all).
From your comment (which you should edit into your original question), and your desired output, I think the rule you want for the id column is:
If there are any rows with active=1 in the group, choose the maximum id value from those rows
If all rows in the group have active=0, choose the minimum id value. (You didn't say this specifically; I'm assuming it based on the presence of 108 on the second row of your desired output.)
I think that this query will produce those results. (And also eliminate the non-standard MySQL behavior.)
SELECT
    COALESCE(
        MAX(CASE WHEN active=1 THEN id ELSE NULL END),
        MIN(id)
    ) AS some_id,
    class_id, student_id, teacher_id, max(active)
FROM student_classes
GROUP BY class_id, student_id, teacher_id
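To see how the COALESCE rule behaves, here is a minimal sketch run through Python's sqlite3 module (SQLite standing in for MySQL). The question's actual rows are not shown in the thread, so the six rows below are hypothetical, made up only to exercise both branches of the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_classes "
    "(id INTEGER, class_id INTEGER, student_id INTEGER, teacher_id INTEGER, active INTEGER)"
)
# Hypothetical rows: two groups contain an active row, one group has none.
conn.executemany("INSERT INTO student_classes VALUES (?, ?, ?, ?, ?)", [
    (110, 137, 1, 47, 0),
    (114, 137, 1, 47, 1),
    (108, 138, 2, 49, 0),
    (109, 138, 2, 49, 0),
    (111, 139, 3, 47, 0),
    (113, 139, 3, 47, 1),
])

# Highest id among active rows if any exist, otherwise the lowest id of the group.
picked = conn.execute("""
    SELECT COALESCE(MAX(CASE WHEN active = 1 THEN id END), MIN(id)) AS some_id,
           class_id, student_id, teacher_id, MAX(active)
    FROM student_classes
    GROUP BY class_id, student_id, teacher_id
    ORDER BY class_id
""").fetchall()

for row in picked:
    print(row)
```

Because every selected column is either grouped or aggregated, the query does not depend on MySQL's permissive GROUP BY handling.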
MySQL 5.5 and 5.6 run the query as you coded it, but the result is not actually correct. Version 5.7 and higher, where the ONLY_FULL_GROUP_BY SQL mode is enabled by default, will throw an error like "SELECT list is not in GROUP BY clause and contains nonaggregated column 'student_classes.id'..."
So it seems your MySQL version is old; this query should give you what you wanted:
select
    min(x.id) as id,          -- lowest id among the rows with the highest active value
    x.class_id,
    x.student_id,
    x.teacher_id,
    x.active
from student_classes x
inner join (select
                class_id,
                student_id,
                teacher_id,
                max(active) max_active   -- highest active value per group
            from student_classes
            group by class_id, student_id, teacher_id
           ) y
    on x.class_id = y.class_id and
       x.student_id = y.student_id and
       x.teacher_id = y.teacher_id and
       x.active = y.max_active
group by x.class_id, x.student_id, x.teacher_id, x.active
order by id, class_id, student_id
;
You don't want an aggregation actually, but rather pick particular rows. The rule for picking a row is: Per class_id, student_id, teacher_id get the one with the maximum active and in case of a tie the lowest id. This is a ranking of rows.
As of MySQL 8 you can use a window function like ROW_NUMBER to rank rows:
select *
from
(
select
sc.*,
row_number() over (partition by class_id, student_id, teacher_id
order by active desc, id) as rn
from student_classes sc
) with_wanted_id
where rn = 1;
In older versions you could use NOT EXISTS to exclude rows for which a better row exists:
select *
from student_classes sc1
where not exists
(
select null
from student_classes sc2
where sc2.class_id = sc1.class_id
and sc2.student_id = sc1.student_id
and sc2.teacher_id = sc1.teacher_id
and
(
sc2.active > sc1.active
or
(sc2.active = sc1.active and sc2.id < sc1.id)
)
);
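The NOT EXISTS form can be tried out with a short sketch via Python's sqlite3 module (SQLite standing in for an older MySQL). The rows are hypothetical, since the question's data is not shown, and include a tie on active so the "lowest id wins" part of the rule is exercised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_classes "
    "(id INTEGER, class_id INTEGER, student_id INTEGER, teacher_id INTEGER, active INTEGER)"
)
# Hypothetical rows; the last group has two active rows (a tie on active).
conn.executemany("INSERT INTO student_classes VALUES (?, ?, ?, ?, ?)", [
    (110, 137, 1, 47, 0),
    (114, 137, 1, 47, 1),
    (108, 138, 2, 49, 0),
    (109, 138, 2, 49, 0),
    (112, 139, 3, 47, 1),
    (113, 139, 3, 47, 1),
])

# Keep a row only if no row in the same group has a higher active value,
# or the same active value with a lower id.
best = conn.execute("""
    SELECT *
    FROM student_classes sc1
    WHERE NOT EXISTS (
        SELECT NULL
        FROM student_classes sc2
        WHERE sc2.class_id = sc1.class_id
          AND sc2.student_id = sc1.student_id
          AND sc2.teacher_id = sc1.teacher_id
          AND (sc2.active > sc1.active
               OR (sc2.active = sc1.active AND sc2.id < sc1.id))
    )
    ORDER BY sc1.id
""").fetchall()

for row in best:
    print(row)
```

Note that under this rule a tie between active rows resolves to the lowest id (112 here), which differs from a rule that prefers the highest id among active rows.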

How can I write a query that aggregates a single row with the latest date among multiple sets of rows?

I have a MySQL table where there are many rows for each person, and I want to write a query which aggregates rows with special constraint. (one per person)
For example, let's say the table consists of the following data.
name date reason
---------------------------------------
John 2013-04-01 14:00:00 Vacation
John 2013-03-31 18:00:00 Sick
Ted 2012-05-06 20:00:00 Sick
Ted 2012-02-20 01:00:00 Vacation
John 2011-12-21 00:00:00 Sick
Bob 2011-04-02 20:00:00 Sick
I want to see the distribution of 'reason' column. If I just write a query like below
select reason, count(*) as count from table group by reason
then I will be able to see number of reasons for this table overall.
reason count
------------------
Sick 4
Vacation 2
However, I am only interested in single reason from each person. The reason that should be counted should be from a row with latest date from the person's records. For example, John's latest reason would be Vacation while Ted's latest reason would be Sick. And Bob's latest reason (and the only reason) is Sick.
The expected result for that query should be like below. (Sum of count will be 3 because there are only 3 people)
reason count
-----------------
Sick 2
Vacation 1
Is it possible to write a query such that single latest reason will be counted when I want to see distribution(count) of reasons?
Here are some facts about the table.
The table has tens of millions of rows
Most of the time, each person has one reason.
Some people have multiple reasons, but 99.99% of people have fewer than 5 reasons.
There are about 30 different reasons while there are millions of distinct names.
The table is partitioned based on date range.
SELECT T.REASON, COUNT(*)
FROM
(
SELECT PERSON, MAX(DATE) AS MAX_DATE
FROM TABLE-NAME
GROUP BY PERSON
) A, TABLE-NAME T
WHERE T.PERSON = A.PERSON AND T.DATE = A.MAX_DATE
GROUP BY T.REASON
Try this
select reason, count(*) from
(select reason from table where (name, date) in
(select name, max(date) from table group by name)) t
group by reason
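One thing to watch with an IN filter of this kind: matching on the date alone can pick up a row whose date merely equals some other person's latest date. The sketch below compares date-only matching with matching on the (name, date) pair, using Python's sqlite3 module as a stand-in for MySQL. The table name absences and its three rows are hypothetical, chosen so that Ann's older date coincides with Bob's latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absences (name TEXT, date TEXT, reason TEXT)")
# Hypothetical data: Ann's older row shares its date with Bob's latest row.
conn.executemany("INSERT INTO absences VALUES (?, ?, ?)", [
    ("Ann", "2013-01-01", "Sick"),
    ("Ann", "2013-02-01", "Vacation"),
    ("Bob", "2013-01-01", "Sick"),
])

# Date-only match: Ann's older row slips through because Bob's max date matches it.
date_only = conn.execute("""
    SELECT reason, COUNT(*) FROM absences
    WHERE date IN (SELECT MAX(date) FROM absences GROUP BY name)
    GROUP BY reason ORDER BY reason
""").fetchall()

# Pair match: each row must equal the latest date of its own name.
name_and_date = conn.execute("""
    SELECT reason, COUNT(*) FROM absences
    WHERE (name, date) IN (SELECT name, MAX(date) FROM absences GROUP BY name)
    GROUP BY reason ORDER BY reason
""").fetchall()

print(date_only)
print(name_and_date)
```

With two people in the table, only the pair match yields counts that sum to two.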
In MySQL before 8.0, it's not very efficient to do this kind of query, since you don't have access to window functions (PARTITION BY) as you do in SQL Server or Oracle.
You can still emulate it by doing a subquery and retrieve the rows based on the condition you need, here the maximum date :
SELECT t.reason, COUNT(1)
FROM
(
    SELECT name, MAX(adate) AS maxDate
    FROM aTable
    GROUP BY name
) maxDateRows
INNER JOIN aTable t ON maxDateRows.name = t.name
    AND maxDateRows.maxDate = t.adate
GROUP BY t.reason
You can see a sample here.
Test this query on your samples, but I'm afraid that it will be slow as hell.
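For a quick check of the join-on-max-date approach against the six sample rows from the question, here is a minimal sketch using Python's sqlite3 module (SQLite standing in for MySQL). The table name absences and the column name adate are assumptions made for the example, and it says nothing about performance on tens of millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absences (name TEXT, adate TEXT, reason TEXT)")
# The six sample rows from the question.
conn.executemany("INSERT INTO absences VALUES (?, ?, ?)", [
    ("John", "2013-04-01 14:00:00", "Vacation"),
    ("John", "2013-03-31 18:00:00", "Sick"),
    ("Ted",  "2012-05-06 20:00:00", "Sick"),
    ("Ted",  "2012-02-20 01:00:00", "Vacation"),
    ("John", "2011-12-21 00:00:00", "Sick"),
    ("Bob",  "2011-04-02 20:00:00", "Sick"),
])

# Derived table of each person's latest date, joined back to fetch that row's reason.
distribution = conn.execute("""
    SELECT t.reason, COUNT(*)
    FROM (SELECT name, MAX(adate) AS maxDate
          FROM absences
          GROUP BY name) m
    INNER JOIN absences t
            ON m.name = t.name AND m.maxDate = t.adate
    GROUP BY t.reason
    ORDER BY t.reason
""").fetchall()

print(distribution)
```

The counts sum to three, one per person, which matches the expected result in the question.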
For your information, you can do the same thing in a more elegant and much much faster way in SQL Server :
SELECT reason, COUNT(1)
FROM
(
SELECT name
, reason
, RANK() OVER(PARTITION BY name ORDER BY adate DESC) as Rank
FROM #aTable
) AS rankTable
WHERE Rank = 1
GROUP BY reason
The sample is here
If you are really stuck with MySQL, and the first query is too slow, then you can split the problem.
Do a first query creating a table:
CREATE TABLE maxDateRows AS
SELECT name, MAX(adate) AS maxDate
FROM aTable
GROUP BY name
Then create an index on both name and maxDate.
Finally, get the results :
SELECT t.reason, COUNT(1)
FROM maxDateRows m
INNER JOIN aTable t ON m.name = t.name
    AND m.maxDate = t.adate
GROUP BY t.reason
The solution you are looking for seems to be solved by this query :
select
reason,
count(*)
from (select * from tablename group by name) abc
group by
reason
It is quite fast and simple. You can view the SQL Fiddle
Apologies if this answer duplicates an existing one. Maybe I'm suffering from some form of aphasia, but I cannot see it...
SELECT x.reason
, COUNT(*)
FROM absentism x
JOIN
( SELECT name,MAX(date) max_date FROM absentism GROUP BY name) y
ON y.name = x.name
AND y.max_date = x.date
GROUP
BY reason;

MS Access query-String aggregation

Looking for query in MS Access for below question-
Following is my data set where last row is with NULL in Value column. Also by doing Max(Value) for each Name+Office+Person+Category, I have extracted this data to avoid multiple rows with value
ID Name Office Person Category Value
1 FMR Americas Ben Global 7
1 FMR London Ben Global 5
1 FMR London Ben Overall 4.2
156 Asset London Ben Global 13
156 Asset London Ben Overall
157 WSR Paris Zen Global 2
My expected result set is below. I am expecting a cross mark or any indicator showing, in a single row, whether each ID, Name, Office, Person combination has a value for the Global/Overall categories. I know it's somewhat similar to "String aggregation".
ID   Name   Office    Person  Global  Overall
1    FMR    Americas  Ben     X
1    FMR    London    Ben     X       X
156  Asset  London    Ben     X
157  WSR    Paris     Zen     X
Appreciate your inputs..
I played around with this a little. I created two select queries Global and Overall
Global
SELECT ID, Name, Office, Person, Category AS Global
FROM [YourTable]
WHERE Category="Global" AND Value IS NOT NULL
Overall
SELECT ID, Name, Office, Person, Category AS Overall
FROM [YourTable]
WHERE Category="Overall" AND Value IS NOT NULL
Then I created a new query to join the select queries
SELECT g.ID, g.Name, g.Office, g.Person, Global, Overall
FROM Global g
LEFT JOIN Overall o ON g.ID = o.ID AND g.Name = o.Name AND g.Office = o.Office AND g.Person = o.Person
Hope this helps.
First, get a list of unique id/name/office combinations:
SELECT DISTINCT ID, Name, Office, Person
FROM TableName
Next, create subqueries for each category:
For Global:
SELECT ID, Name, Office, Person
FROM TableName
WHERE Category="Global"
For Overall:
SELECT ID, Name, Office, Person
FROM TableName
WHERE Category="Overall"
Finally, left join the subqueries to the main query, and use an expression to show the X:
SELECT DISTINCT TableName.ID, TableName.Name, TableName.Office, TableName.Person,
Iif(Global.ID Is Not Null, "X") AS IsGlobal,
Iif(Overall.ID Is Not Null, "X") AS IsOverall
FROM (TableName
LEFT JOIN (
SELECT ID, Name, Office, Person
FROM TableName
WHERE Category="Global"
) AS Global
ON TableName.ID=Global.ID
AND TableName.Name=Global.Name
AND TableName.Office=Global.Office
AND TableName.Person=Global.Person)
LEFT JOIN (
SELECT ID, Name, Office, Person
FROM TableName
WHERE Category="Overall"
) AS Overall
ON TableName.ID=Overall.ID
AND TableName.Name=Overall.Name
AND TableName.Office=Overall.Office
AND TableName.Person=Overall.Person
It may be easier for you to save the subqueries as Access queries and reference the saved queries by name, instead of including the whole subquery in this query.
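The left-join idea can be tried outside Access with a small sketch using Python's sqlite3 module. This is SQLite, not Access SQL: CASE is swapped in for Iif, the table name readings is hypothetical, and each subquery filters on Value IS NOT NULL (an assumption, made so that the row with an empty Overall value gets no mark, as in the expected output).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings "
    "(ID INTEGER, Name TEXT, Office TEXT, Person TEXT, Category TEXT, Value REAL)"
)
# Sample rows from the question; the 156/Overall row has no value.
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?, ?)", [
    (1,   "FMR",   "Americas", "Ben", "Global",  7),
    (1,   "FMR",   "London",   "Ben", "Global",  5),
    (1,   "FMR",   "London",   "Ben", "Overall", 4.2),
    (156, "Asset", "London",   "Ben", "Global",  13),
    (156, "Asset", "London",   "Ben", "Overall", None),
    (157, "WSR",   "Paris",    "Zen", "Global",  2),
])

def category_rows(category):
    """Subquery text selecting the keys that have a non-null value for one category."""
    return (
        "SELECT ID, Name, Office, Person FROM readings "
        f"WHERE Category = '{category}' AND Value IS NOT NULL"
    )

flags = conn.execute(f"""
    SELECT DISTINCT t.ID, t.Name, t.Office, t.Person,
           CASE WHEN g.ID IS NOT NULL THEN 'X' ELSE '' END AS IsGlobal,
           CASE WHEN o.ID IS NOT NULL THEN 'X' ELSE '' END AS IsOverall
    FROM readings t
    LEFT JOIN ({category_rows('Global')}) g
           ON t.ID = g.ID AND t.Name = g.Name
          AND t.Office = g.Office AND t.Person = g.Person
    LEFT JOIN ({category_rows('Overall')}) o
           ON t.ID = o.ID AND t.Name = o.Name
          AND t.Office = o.Office AND t.Person = o.Person
    ORDER BY t.ID, t.Office
""").fetchall()

for row in flags:
    print(row)
```

The DISTINCT collapses the two base rows of a combination that has both categories into a single output row.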

Select distinct column along with some other columns in MySQL

I can't seem to find a suitable solution for the following (probably an age old) problem so hoping someone can shed some light. I need to return 1 distinct column along with other non distinct columns in mySQL.
I have the following table in mySQL:
id name destination rating country
----------------------------------------------------
1 James Barbados 5 WI
2 Andrew Antigua 6 WI
3 James Barbados 3 WI
4 Declan Trinidad 2 WI
5 Steve Barbados 4 WI
6 Declan Trinidad 3 WI
I would like SQL statement to return the DISTINCT name along with the destination, rating based on country.
id name destination rating country
----------------------------------------------------
1 James Barbados 5 WI
2 Andrew Antigua 6 WI
4 Declan Trinidad 2 WI
5 Steve Barbados 4 WI
As you can see, James and Declan have different ratings, but the same name, so they are returned only once.
The following query returns all rows because the ratings are different. Is there anyway I can return the above result set?
SELECT (distinct name), destination, rating
FROM table
WHERE country = 'WI'
ORDER BY id
Using a subquery, you can get the highest id for each name, then select the rest of the rows based on that:
SELECT * FROM table
WHERE id IN (
SELECT MAX(id) FROM table GROUP BY name
)
If you'd prefer, use MIN(id) to get the first record for each name instead of the last.
It can also be done with an INNER JOIN against the subquery. For this purpose the performance should be similar, and sometimes you need to join on two columns from the subquery.
SELECT
table.*
FROM
table
INNER JOIN (
SELECT MAX(id) AS id FROM table GROUP BY name
) maxid ON table.id = maxid.id
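Here is a minimal sketch of the id-per-name filter run against the question's six rows, using Python's sqlite3 module as a stand-in for MySQL. The table name trips is hypothetical, MIN(id) is used because the expected output keeps the first row per name, and the country filter from the question is added inside the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips "
    "(id INTEGER, name TEXT, destination TEXT, rating INTEGER, country TEXT)"
)
# The six sample rows from the question.
conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?, ?)", [
    (1, "James",  "Barbados", 5, "WI"),
    (2, "Andrew", "Antigua",  6, "WI"),
    (3, "James",  "Barbados", 3, "WI"),
    (4, "Declan", "Trinidad", 2, "WI"),
    (5, "Steve",  "Barbados", 4, "WI"),
    (6, "Declan", "Trinidad", 3, "WI"),
])

# Pick the lowest id per name, then fetch the full rows for those ids.
first_per_name = conn.execute("""
    SELECT *
    FROM trips
    WHERE id IN (SELECT MIN(id)
                 FROM trips
                 WHERE country = 'WI'
                 GROUP BY name)
    ORDER BY id
""").fetchall()

for row in first_per_name:
    print(row)
```

Because the outer query returns whole rows by id, destination and rating always come from the same record, which a plain GROUP BY on name does not guarantee.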
The problem is that distinct works across the entire return set and not just the first field. Otherwise MySQL wouldn't know what record to return. So, you want to have some sort of group function on rating, whether MAX, MIN, GROUP_CONCAT, AVG, or several other functions.
Michael has already posted a good answer, so I'm not going to re-write the query.
I agree with @rcdmk. Using a DEPENDENT subquery can kill performance; GROUP BY seems more suitable, provided that you have already indexed the country field and only a few rows will reach the server. Rewriting the query given by @rcdmk, I added the ORDER BY NULL clause to suppress the implicit ordering by GROUP BY, to make it a little faster:
SELECT MIN(id) as id, name, destination, AVG(rating) as rating, country
FROM table WHERE country = 'WI'
GROUP BY name, destination ORDER BY NULL
You can do a GROUP BY clause:
SELECT MIN(id) AS id, name, destination, AVG(rating) AS rating, country
FROM TABLE_NAME
GROUP BY name, destination, country
This query would perform better in large datasets than the subquery alternatives and it can be easier to read as well.

how to number the datas from mysql

This is a doubt about a MySQL select query.
Let me explain my doubt with a simple example.
consider this is my query
SELECT dbCountry from tableCountry
tableCountry has fields dbCuntryId, dbCountry and dbState
I have the result as
dbCountry
india
america
england
kenya
pakisthan
I need the result as
1 india
2 america
3 england
4 kenya
5 pakisthan
The numbers 1 to 5 must be generated as the data grows; they are not an auto-increment id.
How can I get this?
Is it something like a loop?
You can try this:
SELECT dbCountry,
(SELECT COUNT(*) FROM tableCountry t2 WHERE t2.dbCountry <= t1.dbCountry)
AS RowNum
FROM tableCountry t1
ORDER BY dbCountry
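The counting subquery can be tried with a short sketch using Python's sqlite3 module (SQLite standing in for MySQL). The table below is reduced to the single dbCountry column, which is an assumption made for brevity, and the rows are inserted in the order shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableCountry (dbCountry TEXT)")
# Rows inserted in the order they appear in the question.
conn.executemany(
    "INSERT INTO tableCountry VALUES (?)",
    [("india",), ("america",), ("england",), ("kenya",), ("pakisthan",)],
)

# Each row's number is the count of country names sorting at or before it.
numbered = conn.execute("""
    SELECT dbCountry,
           (SELECT COUNT(*)
            FROM tableCountry t2
            WHERE t2.dbCountry <= t1.dbCountry) AS RowNum
    FROM tableCountry t1
    ORDER BY dbCountry
""").fetchall()

for country, row_num in numbered:
    print(row_num, country)
```

The numbers follow alphabetical order rather than insertion order, so india is numbered 3 here, not 1, and duplicate country names would share a number.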
The following should do what you need. It uses a variable that is incremented and returned for each row:
SELECT
    @rownum:=@rownum+1 number,
    c.dbCountry
FROM
    tableCountry c,
    (SELECT @rownum:=0) r
If you want the result to always be in the same order you'll need to add an order by constraint to the query, for example, ORDER BY c.dbCountry to order by the country name.