How to get MAX date from table join that use UNION? - mysql

I have 3 tables that need to be joined to get the max date.
table_grade_A
ID_GRADE GRADE NOTE SURVEYOR
1 70.7 PASS TOM
3 51.2 FAIL TOM
table_grade_B
ID_GRADE SUB_GRADE_I SUB_GRADE_II TOTAL_GRADE NOTE SURVEYOR
2 30.8 40.1 70.9 PASS MARVOLO
4 10.3 54.1 64.4 FAIL MARVOLO
5 41.7 20.9 62.6 FAIL RIDDLE
table_grade
ID_GRADE STUDENT TEST_DATE
1 MIYA 2018-12-20
2 LAYLA 2018-12-21
3 MIYA 2018-12-21
4 MIYA 2018-12-22
5 KARRIE 2018-12-28
Each student may take different tests, and different tests are stored in different tables. I use UNION to combine the values from table_grade_a and table_grade_b, and JOIN the result to table_grade.
My current query:
SELECT tg.STUDENT, MAX(tg.TEST_DATE) AS 'TEST_DATE', temp_grade.* FROM `table_grade` tg
INNER JOIN (
SELECT ID_GRADE,GRADE,NOTE
FROM table_grade_a 'tga'
UNION ALL
SELECT ID_GRADE,TOTAL_GRADE AS GRADE,NOTE
FROM table_grade_a 'tgb'
)temp_grade ON tg.ID_GRADE = temp_grade.ID_GRADE
WHERE tg.STUDENT = 'MIYA'
The result of above query is:
STUDENT TEST_DATE GRADE NOTE
MIYA 2018-12-22 70.7 PASS
The expected output should be:
STUDENT TEST_DATE GRADE NOTE
MIYA 2018-12-22 64.4 FAIL

For a result corresponding to the max date of each student:
The MIN or MAX of a column does not necessarily align to the other values of the wanted row(s), so you need to do more than just calculate the maximum date. In MySQL prior to version 8 you could do something like this, by calculating the maximum dates then using that as an inner join to limit the rows to those corresponding to the maximum values:
select
tg.student, tg.test_date, temp_grade.*
from table_grade tg
inner join (
select student, max(test_date) as test_date
from table_grade
group by student
) gd on tg.student = gd.student and tg.test_date = gd.test_date
INNER JOIN (
SELECT ID_GRADE,GRADE,NOTE
FROM table_grade_a tga
UNION ALL
SELECT ID_GRADE,TOTAL_GRADE AS GRADE,NOTE
FROM table_grade_b tgb
)temp_grade ON tg.ID_GRADE = temp_grade.ID_GRADE
# WHERE tg.STUDENT = 'MIYA'
In MySQL v8+ you could use row_number() over(...) instead:
select
tg.student, tg.test_date, temp_grade.*
from (
select *
, row_number() over(partition by student order by test_date DESC) as rn
from table_grade
) tg
INNER JOIN (
SELECT ID_GRADE,GRADE,NOTE
FROM table_grade_a tga
UNION ALL
SELECT ID_GRADE,TOTAL_GRADE AS GRADE,NOTE
FROM table_grade_b tgb
)temp_grade ON tg.ID_GRADE = temp_grade.ID_GRADE
where tg.rn = 1
# and tg.STUDENT = 'MIYA'

The problem with your current approach is that you are selecting the max date, a table-level aggregate, while also asking for all individual records at the same time. This does not make sense. One correct possibility would be to use LIMIT with ORDER BY:
SELECT tg1.STUDENT, tg1.TEST_DATE, tg2.*
FROM table_grade tg1
INNER JOIN
(
SELECT ID_GRADE, GRADE, NOTE
FROM table_grade_a
UNION ALL
SELECT ID_GRADE, TOTAL_GRADE, NOTE
FROM table_grade_b
) tg2
ON tg1.ID_GRADE = tg2.ID_GRADE
WHERE
tg1.STUDENT = 'MIYA'
ORDER BY
tg1.TEST_DATE DESC
LIMIT 1;
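To check the per-student variant against the sample data, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. That substitution and the column types are assumptions made only to keep the example self-contained. It joins table_grade to the maximum test date per student, then to the UNION ALL of the two grade tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema is assumed from the sample rows in the question.
conn.executescript("""
CREATE TABLE table_grade (ID_GRADE INTEGER, STUDENT TEXT, TEST_DATE TEXT);
CREATE TABLE table_grade_a (ID_GRADE INTEGER, GRADE REAL, NOTE TEXT, SURVEYOR TEXT);
CREATE TABLE table_grade_b (ID_GRADE INTEGER, SUB_GRADE_I REAL, SUB_GRADE_II REAL,
                            TOTAL_GRADE REAL, NOTE TEXT, SURVEYOR TEXT);
""")
conn.executemany("INSERT INTO table_grade VALUES (?, ?, ?)", [
    (1, "MIYA", "2018-12-20"), (2, "LAYLA", "2018-12-21"), (3, "MIYA", "2018-12-21"),
    (4, "MIYA", "2018-12-22"), (5, "KARRIE", "2018-12-28"),
])
conn.executemany("INSERT INTO table_grade_a VALUES (?, ?, ?, ?)", [
    (1, 70.7, "PASS", "TOM"), (3, 51.2, "FAIL", "TOM"),
])
conn.executemany("INSERT INTO table_grade_b VALUES (?, ?, ?, ?, ?, ?)", [
    (2, 30.8, 40.1, 70.9, "PASS", "MARVOLO"), (4, 10.3, 54.1, 64.4, "FAIL", "MARVOLO"),
    (5, 41.7, 20.9, 62.6, "FAIL", "RIDDLE"),
])

# Latest test per student: restrict table_grade to each student's max date,
# then pick up the grade from whichever grade table holds that ID_GRADE.
latest_per_student = conn.execute("""
SELECT tg.STUDENT, tg.TEST_DATE, g.GRADE, g.NOTE
FROM table_grade tg
JOIN (SELECT STUDENT, MAX(TEST_DATE) AS TEST_DATE
      FROM table_grade GROUP BY STUDENT) gd
  ON gd.STUDENT = tg.STUDENT AND gd.TEST_DATE = tg.TEST_DATE
JOIN (SELECT ID_GRADE, GRADE, NOTE FROM table_grade_a
      UNION ALL
      SELECT ID_GRADE, TOTAL_GRADE AS GRADE, NOTE FROM table_grade_b) g
  ON g.ID_GRADE = tg.ID_GRADE
ORDER BY tg.STUDENT
""").fetchall()

for row in latest_per_student:
    print(row)
```

For MIYA this yields the 2018-12-22 row with grade 64.4 and note FAIL, which is the output the question expects.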

Related

How to SELECT rows with MIN(DateTime column), GROUP by another column and DISTINCT by another column in SQL?

My table is:
id student_id exam_date license result
1 101 01-11-2020 B2 FAILED
2 102 15-11-2020 A PASSED
3 103 22-11-2020 D FAILED
4 101 01-10-2020 D PASSED
5 104 01-12-2020 A PASSED
6 103 29-11-2020 D PASSED
7 101 01-12-2020 B2 PASSED
8 105 01-09-2020 B2 FAILED
9 104 01-11-2020 A FAILED
10 105 01-11-2020 B2 PASSED
I need to select the results that would have the first result according to the exam date according to each student id and the license column. If the same student takes different license exam, these two results need to come up as well. But I need only one result row for each student id and license value.
The result should look like this:
id student_id exam_date license result
1 101 01-11-2020 B2 FAILED
2 102 15-11-2020 A PASSED
3 103 22-11-2020 D FAILED
4 101 01-10-2020 D PASSED
8 105 01-09-2020 B2 FAILED
9 104 01-11-2020 A FAILED
I've done research and tried several queries, but so far I only get 1 row per student_id even when the student has taken two different license examinations.
The following is my query:
SELECT scct_outer.id, scct_outer.stud_id, scct_outer.exam_date, scct_outer.license, scct_outer.result
FROM stud_cdl_comp_test AS scct_outer
INNER JOIN
(SELECT stud_id, MIN(exam_date) AS MinExamDate
FROM stud_cdl_comp_test AS scct
INNER JOIN stud AS s ON scct.stud_id = s.id
INNER JOIN agent_profile AS ap ON s.agent_profile_id = ap.id
GROUP BY stud_id) groupedscct
ON scct_outer.stud_id = groupedscct.stud_id
AND scct_outer.exam_date = groupedscct.MinExamDate
The problem with your original code is that it is missing a correlation on the license between the outer query and the subquery. You would phrase it as:
select s.*
from stud_cdl_comp_test s
inner join (
select student_id, license, min(exam_date) as minexamdate
from stud_cdl_comp_test as scct
group by student_id, license
) s1 on s1.student_id = s.student_id and s1.license = s.license and s1.minexamdate = s.exam_date
I have no idea what stud and agent_profile are, so I removed them from the query.
That said, this is not the method I would recommend - a simple and efficient option is to filter with a subquery:
select *
from stud_cdl_comp_test s
where s.exam_date = (
select min(s1.exam_date)
from stud_cdl_comp_test s1
where s1.student_id = s.student_id and s1.license = s.license
)
This can take advantage of an index on (student_id, license, exam_date).
Alternatively, you can use row_number(), available in MySQL 8.0:
select *
from (
select s.*,
row_number() over(partition by student_id, license order by exam_date) rn
from stud_cdl_comp_test s
) s
where rn = 1
Thinking of this as grouping by student_id alone is incorrect. What you are actually grouping by is student + license. Let's call this key combination individual_license.
Here's what the solution will look like:
SELECT
st.id,
st.student_id,
st.exam_date,
st.license,
st.result
FROM stud_cdl_comp_test AS st
INNER JOIN
(SELECT
MIN(exam_date) AS min_date,
st_inner.student_id,
st_inner.license
FROM stud_cdl_comp_test AS st_inner
GROUP BY st_inner.student_id, st_inner.license
) grouped_inner
ON grouped_inner.student_id = st.student_id
AND grouped_inner.license = st.license
AND grouped_inner.min_date = st.exam_date;
This should work.
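As a quick check of the "first exam per student and license" logic, here is a small sketch that runs the correlated-subquery form against the sample rows using Python's sqlite3 module. SQLite is used here only as a convenient stand-in for MySQL, and exam_date is assumed to be stored in ISO format (YYYY-MM-DD) so that MIN() orders dates correctly. With DD-MM-YYYY strings the comparison would be textual and could pick the wrong row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stud_cdl_comp_test (
    id INTEGER PRIMARY KEY, student_id INTEGER, exam_date TEXT,
    license TEXT, result TEXT)""")
# Sample rows from the question, with dates rewritten as ISO strings (assumption).
conn.executemany("INSERT INTO stud_cdl_comp_test VALUES (?, ?, ?, ?, ?)", [
    (1, 101, "2020-11-01", "B2", "FAILED"),
    (2, 102, "2020-11-15", "A", "PASSED"),
    (3, 103, "2020-11-22", "D", "FAILED"),
    (4, 101, "2020-10-01", "D", "PASSED"),
    (5, 104, "2020-12-01", "A", "PASSED"),
    (6, 103, "2020-11-29", "D", "PASSED"),
    (7, 101, "2020-12-01", "B2", "PASSED"),
    (8, 105, "2020-09-01", "B2", "FAILED"),
    (9, 104, "2020-11-01", "A", "FAILED"),
    (10, 105, "2020-11-01", "B2", "PASSED"),
])

# Keep a row only if its date is the earliest for the same student AND license.
first_attempt_ids = [row[0] for row in conn.execute("""
    SELECT s.id
    FROM stud_cdl_comp_test s
    WHERE s.exam_date = (SELECT MIN(s1.exam_date)
                         FROM stud_cdl_comp_test s1
                         WHERE s1.student_id = s.student_id
                           AND s1.license = s.license)
    ORDER BY s.id""")]

print(first_attempt_ids)
```

The ids returned match the rows listed in the question's expected result (1, 2, 3, 4, 8 and 9).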

Nested queries and Join

As a beginner with SQL, I’m ok to do simple tasks but I’m struggling right now with multiple nested queries.
My problem is that I have 3 tables like this:
a Case table:
id nd date username
--------------------------------------------
1 596 2016-02-09 16:50:03 UserA
2 967 2015-10-09 21:12:23 UserB
3 967 2015-10-09 22:35:40 UserA
4 967 2015-10-09 23:50:31 UserB
5 580 2017-02-09 10:19:43 UserA
a Value table:
case_id labelValue_id Value Type
-------------------------------------------------
1 3633 2731858342 X
1 124 ["864","862"] X
1 8981 -2.103 X
1 27 443 X
... ... ... ...
2 7890 232478 X
2 765 0.2334 X
... ... ... ...
and a Label table:
id label
----------------------
3633 Value of W
124 Value of X
8981 Value of Y
27 Value of Z
Obviously, I want to join these tables. So I can do something like this:
SELECT *
from Case, Value, Label
where Case.id= Value.case_id
and Label.id = Value.labelValue_id
but I get pretty much everything whereas I would like to be more specific.
What I want is to do some filtering on the Case table and then use the resulting id's to join the two other tables. I'd like to:
1. Filter the Case.nd's such that if there are several instances of the same nd, take the oldest one,
2. Limit the number of nd's in the query. For example, I want to be able to join the tables for just 2, 3, 4 etc... different nd.
3. Use this query to make a join on the Value and Label table.
For example, the output of the queries 1 and 2 would be:
id nd date username
--------------------------------------------
1 596 2016-02-09 16:50:03 UserA
2 967 2015-10-09 21:12:23 UserB
if I ask for 2 different nd. The nd 967 appears several times but we take the oldest one.
In fact, I think I found out how to do all these things but I can't/don't know how to merge them.
To select the oldest nd, I can do something like:
select min((date)), nd,id
from Case
group by nd
Then, to limit the number of nd in the output, I found this (based on this and that) :
select *,
@num := if(@type <> t.nd, @num + 1, 1) as row_number,
@type := t.nd as dummy
from(
select min((date)), nd,id
from Case
group by nd
) as t
group by t.nd
having row_number <= 2 -- number of output
It works but I feel it's getting slow.
Finally, when I try to make a join with this subquery and with the two other tables, the processing keeps going on for ever.
During my research, I could find answers for every part of the problem but I can't merge them. Also, for the "counting" problem, where I want to limit the number of nd, I feel my approach is kind of far-fetched.
I realize this is a long question but I think I miss something and I wanted to give details as much as possible.
To filter the Case table down to the oldest row for each nd:
select * from `Case` c
where c.date = (select min(date) from `Case`
where nd = c.nd)
Then just join this to the other tables:
select * from `Case` c
join Value v on v.case_id = c.id
join Label l on l.id = v.labelValue_id
where c.date = (select min(date) from `Case`
where nd = c.nd)
To limit it to a certain number of records, MySQL has a specific clause, LIMIT:
select * from `Case` c
join Value v on v.case_id = c.id
join Label l on l.id = v.labelValue_id
where c.date = (select min(date) from `Case`
where nd = c.nd)
limit 4 -- <=== will limit the returned result set to 4 rows
If you only want records for the top N values of nd, then the LIMIT goes on a subquery restricting which values of nd to retrieve. MySQL does not allow LIMIT inside an IN (...) subquery, so join to it as a derived table instead:
select * from `Case` c
join (select distinct nd from `Case`
order by nd desc limit N) top_nd on top_nd.nd = c.nd
join Value v on v.case_id = c.id
join Label l on l.id = v.labelValue_id
where c.date = (select min(date) from `Case`
where nd = c.nd)
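The two filtering steps (oldest row per nd, then only the top N nd values) can be tried on the sample Case rows with the sketch below. It uses Python's sqlite3 module in place of MySQL, and the helper name oldest_cases is hypothetical. The LIMIT sits in a joined derived table because MySQL rejects LIMIT inside an IN subquery, even though SQLite would accept it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Case" is a reserved word, so it is quoted; the schema is assumed from the sample.
conn.execute('CREATE TABLE "Case" (id INTEGER, nd INTEGER, date TEXT, username TEXT)')
conn.executemany('INSERT INTO "Case" VALUES (?, ?, ?, ?)', [
    (1, 596, "2016-02-09 16:50:03", "UserA"),
    (2, 967, "2015-10-09 21:12:23", "UserB"),
    (3, 967, "2015-10-09 22:35:40", "UserA"),
    (4, 967, "2015-10-09 23:50:31", "UserB"),
    (5, 580, "2017-02-09 10:19:43", "UserA"),
])

def oldest_cases(limit_nd):
    """Return the oldest Case row for each of the `limit_nd` highest nd values."""
    return conn.execute('''
        SELECT c.id, c.nd, c.date, c.username
        FROM "Case" c
        JOIN (SELECT DISTINCT nd FROM "Case" ORDER BY nd DESC LIMIT ?) top_nd
          ON top_nd.nd = c.nd
        WHERE c.date = (SELECT MIN(date) FROM "Case" WHERE nd = c.nd)
        ORDER BY c.id''', (limit_nd,)).fetchall()

print(oldest_cases(2))
```

The filtered result can then be joined to Value and Label as in the queries above. Ordering the nd values by nd DESC is only one possible way to pick which N to keep.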
So finally, here is what worked well for me:
select *
from (
select *
from Case
join (
select nd as T_ND, date as T_date
from Case
where nd in (select distinct nd from Case)
group by T_ND Limit 5 -- <========= Limit of nd's
) as t
on Case.nd = t.T_ND
where date = (select min(date)
from Case
where nd = t.T_ND)
) as subquery
join Value
on Value.context_id = subquery.id
join Label
on Label.id = Value.labelValue_id
Thank you @charlesbretana for leading me on the right track :).

Count first occurrence with column value ordered by another column

I have an assigns table with the following columns:
id - int
id_lead - int
id_source - int
date_assigned - int (this represents a unix timestamp)
Now, let's say I have the following data in this table:
id id_lead id_source date_assigned
1 20 5 1462544612
2 20 6 1462544624
3 22 6 1462544615
4 22 5 1462544626
5 22 7 1462544632
6 25 6 1462544614
7 25 8 1462544621
Now, let's say I want to get a count of the rows whose id_source is 6 and which are the first entry for their lead (sorted by date_assigned asc).
So in this case, the count would = 2, because there are 2 leads (id_lead 22 and 25) whose first id_source is 6.
How would I write this query so that it is fast and would work fine as a subquery select? I was thinking something like this which doesn't work:
select count(*) from `assigns` where `id_source`=6 order by `date_assigned` asc limit 1
I have no idea how to write this query in an optimal way. Any help would be appreciated.
Pseudocode:
select rows
with a.id_source = 6
but only if
there do not exist any row
with same id_lead
and smaller date_assigned
Translate it to SQL
select * -- select rows
from assigns a
where a.id_source = 6 -- with a.id_source = 6
and not exists ( -- but only if there do not exist any row
select 1
from assigns a1
where a1.id_lead = a.id_lead -- with same id_lead
and a1.date_assigned < a.date_assigned -- and smaller date_assigned
)
Now replace select * with select count(*) and you'll get your result.
http://sqlfiddle.com/#!9/3dc0f5/7
Update:
The NOT EXISTS query can be rewritten as an excluding LEFT JOIN query:
select count(*)
from assigns a
left join assigns a1
on a1.id_lead = a.id_lead
and a1.date_assigned < a.date_assigned
where a.id_source = 6
and a1.id_lead is null
If you want to get the count for all values of id_source, the following query might be the fastest:
select a.id_source, count(1)
from (
select a1.id_lead, min(a1.date_assigned) date_assigned
from assigns a1
group by a1.id_lead
) a1
join assigns a
on a.id_lead = a1.id_lead
and a.date_assigned = a1.date_assigned
group by a.id_source
You still can replace group by a.id_source with where a.id_source = 6.
The queries need indexes on assigns(id_source) and assigns(id_lead, date_assigned).
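The NOT EXISTS form and the grouped form can both be checked against the question's sample rows with a short sketch. Python's sqlite3 module is used here as a stand-in for MySQL, which is an assumption made only for convenience. The SQL itself is the same as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE assigns (
    id INTEGER, id_lead INTEGER, id_source INTEGER, date_assigned INTEGER)""")
conn.executemany("INSERT INTO assigns VALUES (?, ?, ?, ?)", [
    (1, 20, 5, 1462544612), (2, 20, 6, 1462544624),
    (3, 22, 6, 1462544615), (4, 22, 5, 1462544626), (5, 22, 7, 1462544632),
    (6, 25, 6, 1462544614), (7, 25, 8, 1462544621),
])

# Count rows with id_source = 6 that have no earlier row for the same lead.
first_source_6 = conn.execute("""
    SELECT COUNT(*)
    FROM assigns a
    WHERE a.id_source = 6
      AND NOT EXISTS (SELECT 1 FROM assigns a1
                      WHERE a1.id_lead = a.id_lead
                        AND a1.date_assigned < a.date_assigned)""").fetchone()[0]

# Same idea for every source at once: find each lead's first row, then group.
counts_by_source = dict(conn.execute("""
    SELECT a.id_source, COUNT(*)
    FROM (SELECT id_lead, MIN(date_assigned) AS date_assigned
          FROM assigns GROUP BY id_lead) f
    JOIN assigns a
      ON a.id_lead = f.id_lead AND a.date_assigned = f.date_assigned
    GROUP BY a.id_source""").fetchall())

print(first_source_6, counts_by_source)
```

Both queries agree that two leads (22 and 25) start with id_source 6, while lead 20 starts with id_source 5.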
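A simple query for that would be the following (check here: http://sqlfiddle.com/#!9/8666e0/7). Note that it relies on MySQL returning the earliest row for each id_lead group, which is not guaranteed, and it is rejected when ONLY_FULL_GROUP_BY is enabled: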
select count(*) from
(select * from assigns group by id_lead )t
where t.id_source=6

SELECT MAX DATE for each ID

I have two tables called "tipo_hh" and "tipo_hh_historial".
I need to make a join between the two tables, where "id" is the same in both tables.
For each "id" in the table "tipo_hh", I need to select the "valor" from the table "tipo_hh_historial" for the record with the maximum "fecha_cambio" and "hora_cambio".
"id" is primary key and auto increment in the table "tipo_hh"
Something like this.
This is the table "tipo_hh"
id nombre
1 Reefer
2 Lavados
3 Dry
4 Despacho
This is the table "tipo_hh_historial"
id valor fecha_cambio hora_cambio
1 1.50 27/06/2013 19:15:05
1 5.50 27/06/2013 19:19:32
1 5.50 27/06/2013 19:20:06
1 2.50 27/06/2013 21:03:30
2 4.66 27/06/2013 19:15:17
2 3.00 27/06/2013 19:20:22
3 5.00 27/06/2013 19:20:32
4 1.50 27/06/2013 19:20:50
And I need this:
id nombre valor
1 Reefer 2.50
2 Lavados 3.00
3 Dry 5.00
4 Despacho 1.50
Using a sub query to get the max date / time for the historical record for each id, and using that to get the rest of the latest historical record:-
SELECT tipo_hh.id, tipo_hh.nombre, tipo_hh_historial.valor
FROM tipo_hh
INNER JOIN
(
SELECT id, MAX(STR_TO_DATE(CONCAT(fecha_cambio, hora_cambio), '%d/%m/%Y%k:%i:%s')) AS MaxDateTime
FROM tipo_hh_historial
GROUP BY id
) Sub1
ON tipo_hh.id = Sub1.id
INNER JOIN tipo_hh_historial
ON tipo_hh_historial.id = Sub1.id
AND STR_TO_DATE(CONCAT(fecha_cambio, hora_cambio), '%d/%m/%Y%k:%i:%s') = Sub1.MaxDateTime
SQL Fiddle:-
http://www.sqlfiddle.com/#!2/68baa/2
First of all, you should use proper data types for your columns: a DATE column for the date and a TIME column for the time. In your sample data set the date is formatted as '%d/%m/%Y'; it would be better to change it to the standard format '%Y-%m-%d'. The query below assumes proper types for the columns:
SELECT t.* ,new_tipo_hh_historial.`valor`
FROM tipo_hh_new t
JOIN (
SELECT th.*
FROM tipo_hh_historial_new th
JOIN (
SELECT id,valor,
MAX(fecha_cambio ) fecha_cambio
,MAX(hora_cambio) hora_cambio
FROM `tipo_hh_historial_new`
GROUP BY id
) thh
ON (
th.`id` =thh.`id`
AND th.fecha_cambio=thh.`fecha_cambio`
AND th.hora_cambio = thh.`hora_cambio`
)
) new_tipo_hh_historial
USING (id)
Fiddle Demo
If your date and time are stored as strings, you need to convert them to real types. You can use the query below, but this is not recommended:
SELECT t.* ,new_tipo_hh_historial.`valor`
FROM tipo_hh t
JOIN (
SELECT th.*
FROM tipo_hh_historial th
JOIN (
SELECT id,valor,
MAX(STR_TO_DATE(fecha_cambio , '%d/%m/%Y')) fecha_cambio
,MAX(TIME_FORMAT(hora_cambio,'%H:%i:%s')) hora_cambio
FROM `tipo_hh_historial`
GROUP BY id
) thh
ON (
th.`id` =thh.`id`
AND STR_TO_DATE(th.fecha_cambio , '%d/%m/%Y')=thh.`fecha_cambio`
AND TIME_FORMAT(th.hora_cambio,'%H:%i:%s') = thh.`hora_cambio`
)
) new_tipo_hh_historial
USING (id)
Fiddle Demo
Your problem looks like the greatest-n-per-group problem: first get the maxima of fecha_cambio and hora_cambio from your table tipo_hh_historial, then self join on multiple conditions to pick the rows matching those maxima, i.e.
ON (
th.`id` =thh.`id`
AND th.fecha_cambio=thh.`fecha_cambio`
AND th.hora_cambio = thh.`hora_cambio`
)
and then join with your first table to get the expected results
Edit: @Kickstart spotted a problem with this approach and has already answered, so I will provide another way to overcome it. There should be a single field storing both the date and time of the record, e.g. fecha_cambio DATETIME, so that the maximum date and time for each id always come from the same row. See the updated query below:
SELECT t.* ,new_tipo_hh_historial.`valor`
FROM tipo_hh_new t
JOIN (
SELECT th.*
FROM tipo_hh_historial_alter th
JOIN (
SELECT id,valor,
MAX(fecha_cambio ) fecha_cambio
FROM `tipo_hh_historial_alter`
GROUP BY id
) thh
ON (
th.`id` =thh.`id`
AND th.fecha_cambio=thh.`fecha_cambio`
)
) new_tipo_hh_historial
USING (id)
Updated fiddle demo
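The single-DATETIME-column variant can be tried on the sample data with the sketch below. It uses Python's sqlite3 module rather than MySQL, stores fecha_cambio as ISO datetime text, and reuses the table name tipo_hh_historial_alter from the query above. The exact schema is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tipo_hh (id INTEGER PRIMARY KEY, nombre TEXT);
CREATE TABLE tipo_hh_historial_alter (id INTEGER, valor REAL, fecha_cambio TEXT);
""")
conn.executemany("INSERT INTO tipo_hh VALUES (?, ?)", [
    (1, "Reefer"), (2, "Lavados"), (3, "Dry"), (4, "Despacho"),
])
# Date and time from the question merged into one ISO datetime string per row.
conn.executemany("INSERT INTO tipo_hh_historial_alter VALUES (?, ?, ?)", [
    (1, 1.50, "2013-06-27 19:15:05"), (1, 5.50, "2013-06-27 19:19:32"),
    (1, 5.50, "2013-06-27 19:20:06"), (1, 2.50, "2013-06-27 21:03:30"),
    (2, 4.66, "2013-06-27 19:15:17"), (2, 3.00, "2013-06-27 19:20:22"),
    (3, 5.00, "2013-06-27 19:20:32"), (4, 1.50, "2013-06-27 19:20:50"),
])

# Find the latest change per id, then join back to fetch that row's valor.
latest = conn.execute("""
    SELECT t.id, t.nombre, th.valor
    FROM tipo_hh t
    JOIN (SELECT id, MAX(fecha_cambio) AS fecha_cambio
          FROM tipo_hh_historial_alter GROUP BY id) m
      ON m.id = t.id
    JOIN tipo_hh_historial_alter th
      ON th.id = m.id AND th.fecha_cambio = m.fecha_cambio
    ORDER BY t.id""").fetchall()

for row in latest:
    print(row)
```

Because the maximum is taken over one combined value, the date and the time always come from the same history row.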
try this:
SELECT A.id, B.nombre, A.valor, MAX(A.hora_cambio) AS hora_cambio_time
FROM tipo_hh_historial AS A
INNER JOIN tipo_hh AS B
ON(A.id = B.id)
GROUP BY A.id
SELECT tipo_hh.id, tipo_hh.nombre, tipo_hh_historial.valor
FROM tipo_hh INNER JOIN tipo_hh_historial
ON tipo_hh.id = tipo_hh_historial.id
group by tipo_hh_historial.id
Having max(tipo_hh_historial.hora_cambio);

Select Count of Rows with Joined Tables

I have two tables with a one to many relationship. I join the tables by an id column. My problem is that I need a count of all matching entries from the second (tablekey_id) table but I need the information from the row marked with the boolean is_basedomain. As a note there is only one row with is_basedomain = 1 per set of rows with the same tablekey_id.
Table: tablekey
id linkdata_id timestamp
22 9495028175 2013-03-10 01:13:46
23 8392740179 2013-03-10 21:23:25
Table: searched_domains.
NOTE: tablekey_id is the foreign key to the id in the tablekey table.
id tablekey_id domain is_basedomain
1 22 somesite.com 1
2 22 yahoo.com 0
3 23 red.com 1
4 23 blue.com 0
5 23 green.com 0
Here's the query I'm working with. I was trying to use a subquery, but I can't seem to select only the count for the current tablekey_id, so this does not work.
SELECT `tablekey_id`, `linkdata_id`, `timestamp`, `domain`, `is_basedomain`,
(SELECT COUNT(1) AS other FROM `searched_domains` AS dd
ON dd.tablekey_id = d.tablekey_id GROUP BY `tablekey_id`) AS count
FROM `tablekey` AS k
JOIN `searched_domains` AS d
ON k.id = d.tablekey_id
WHERE `is_basedomain` = 1 GROUP BY `tablekey_id`
The result that I would like to get back is:
tablekey_id linkdata_id timestamp domain is_basedomain count
22 9495028175 2013-03-10 01:13:46 somesite.com 1 2
23 8392740179 2013-03-10 21:23:25 red.com 1 3
Can anyone help me get this into one query?
You can treat the searched_domains rows that have is_basedomain=1 as a separate table in the query and join it with another instance of searched_domains (to get the count):
SELECT
d.tablekey_id,
k.linkdata_id,
k.timestamp,
d.domain,
d.is_basedomain,
COUNT(*) as 'count'
FROM
tablekey AS k
join searched_domains AS d on d.tablekey_id=k.id
join searched_domains AS d2 on d2.tablekey_id=d.tablekey_id
WHERE
d.is_basedomain = 1
GROUP BY
d.tablekey_id,
k.linkdata_id,
k.timestamp,
d.domain,
d.is_basedomain
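To see the self-join count in action on the sample rows, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption made for convenience). The count column is named domain_count here, instead of count, purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablekey (id INTEGER PRIMARY KEY, linkdata_id INTEGER, timestamp TEXT);
CREATE TABLE searched_domains (id INTEGER PRIMARY KEY, tablekey_id INTEGER,
                               domain TEXT, is_basedomain INTEGER);
INSERT INTO tablekey VALUES
  (22, 9495028175, '2013-03-10 01:13:46'),
  (23, 8392740179, '2013-03-10 21:23:25');
INSERT INTO searched_domains VALUES
  (1, 22, 'somesite.com', 1), (2, 22, 'yahoo.com', 0),
  (3, 23, 'red.com', 1), (4, 23, 'blue.com', 0), (5, 23, 'green.com', 0);
""")

# d is restricted to the base-domain row; d2 brings in every row that shares
# the same tablekey_id, so COUNT(*) counts all domains for that key.
rows = conn.execute("""
    SELECT d.tablekey_id, k.linkdata_id, k.timestamp, d.domain, d.is_basedomain,
           COUNT(*) AS domain_count
    FROM tablekey k
    JOIN searched_domains d ON d.tablekey_id = k.id
    JOIN searched_domains d2 ON d2.tablekey_id = d.tablekey_id
    WHERE d.is_basedomain = 1
    GROUP BY d.tablekey_id, k.linkdata_id, k.timestamp, d.domain, d.is_basedomain
    ORDER BY d.tablekey_id""").fetchall()

for row in rows:
    print(row)
```

The counts come out as 2 for tablekey_id 22 and 3 for tablekey_id 23, matching the result the question asks for.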
You have an error: use WHERE instead of ON inside the subquery.
Try this:
SELECT `tablekey_id`, `linkdata_id`, `timestamp`, `domain`, `is_basedomain`,
(SELECT COUNT(1) AS other FROM `searched_domains` AS dd
where dd.tablekey_id = d.tablekey_id GROUP BY `tablekey_id`) AS count
FROM `tablekey` AS k
JOIN `searched_domains` AS d
ON k.id = d.tablekey_id
WHERE `is_basedomain` = 1 GROUP BY `tablekey_id`
DEMO HERE
There is no reason to use subquery, or what is your opinion?
SELECT
`tablekey_id`,
`linkdata_id`,
`timestamp`,
`domain`,
`is_basedomain`,
COUNT(*) as count
FROM
`tablekey` AS k ,
`searched_domains` AS d
WHERE
k.id = d.tablekey_id AND
`is_basedomain` = 1
GROUP BY
`tablekey_id`,
`linkdata_id`,
`timestamp`,
`domain`,
`is_basedomain`
If you want only latest timestamp use MAX(timestamp) as timestamp and remove it from group by.