MySQL JOIN and get the full row based on MAX

I really tried to search for a solution to this, but I can't get it to work.
I have two tables in MySQL: event_parties and events.
event_parties:
g_event_id agent_id
---------- --------
2917 2
2918 2
2919 2
3067 3
3078 3
events:
g_event_id event_id event_time
---------- -------- ----------
2917 29 2016-10-19 15:24:25
2918 31 2016-10-19 15:24:28
2919 30 2016-10-19 15:29:46
3067 29 2016-10-20 15:33:46
3078 30 2016-10-21 15:29:46
I need a JOIN between these two tables on g_event_id.
I need all fields from table events, and for each agent_id I need only the row with the highest g_event_id (or the highest event_time).
Like this:
agent_id g_event_id event_id event_time
-------- ---------- -------- ----------
2 2919 30 2016-10-19 15:29:46
3 3078 30 2016-10-21 15:29:46
Been struggling with this for several days :(
/ Kristian

For just the single highest row:
ORDER BY g_event_id DESC, event_time DESC LIMIT 1
UPDATE:
For the highest row per group, you need a double join.
First join everything into one row set, then join in any later event for the same agent;
the row that has no later event is the latest event.
Using g_event_id as the definition of "later":
SELECT event_parties.agent_id, event_parties.g_event_id, event_id, event_time
FROM event_parties
INNER JOIN events USING (g_event_id)
-- look for a later event of the same agent
LEFT JOIN event_parties AS later_event
ON (later_event.agent_id = event_parties.agent_id
AND later_event.g_event_id > event_parties.g_event_id)
-- keep only rows with no later event; this already leaves one row per agent,
-- so no GROUP BY is needed (it would fail under ONLY_FULL_GROUP_BY)
WHERE later_event.g_event_id IS NULL
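If you want to try the "no later event" join without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite standing in for MySQL is an assumption (the query only uses syntax both accept), the function name latest_event_per_agent is made up for illustration, and the rows are the sample data from the question.

```python
import sqlite3


def latest_event_per_agent():
    """Return the newest event row per agent using a LEFT JOIN anti-join."""
    con = sqlite3.connect(":memory:")  # SQLite is a stand-in for MySQL here
    con.executescript("""
        CREATE TABLE event_parties (g_event_id INTEGER, agent_id INTEGER);
        CREATE TABLE events (g_event_id INTEGER, event_id INTEGER, event_time TEXT);
        INSERT INTO event_parties VALUES
            (2917, 2), (2918, 2), (2919, 2), (3067, 3), (3078, 3);
        INSERT INTO events VALUES
            (2917, 29, '2016-10-19 15:24:25'),
            (2918, 31, '2016-10-19 15:24:28'),
            (2919, 30, '2016-10-19 15:29:46'),
            (3067, 29, '2016-10-20 15:33:46'),
            (3078, 30, '2016-10-21 15:29:46');
    """)
    rows = con.execute("""
        SELECT ep.agent_id, ep.g_event_id, e.event_id, e.event_time
        FROM event_parties AS ep
        JOIN events AS e ON e.g_event_id = ep.g_event_id
        -- try to find a later event for the same agent
        LEFT JOIN event_parties AS later
               ON later.agent_id = ep.agent_id
              AND later.g_event_id > ep.g_event_id
        -- rows without a later event are the latest ones
        WHERE later.g_event_id IS NULL
        ORDER BY ep.agent_id
    """).fetchall()
    con.close()
    return rows
```

On this data the function returns the rows for g_event_id 2919 (agent 2) and 3078 (agent 3), which is the result the question asks for.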

try this:
select
ep.agent_id,
ep.g_event_id,
e.event_id,
e.event_time
from event_parties ep
join events e on e.g_event_id = ep.g_event_id
order by e.g_event_id DESC, e.event_time desc
limit 1

Related

MySQL Max of a Date not returning the correct tuple

I have a table "messages", that stores messages sent to people over time, regarding some items.
The structure of the messages table is:
message_id
user_id
date_sent
created_at
For each user, I can have multiple tuples in the table.
Some of these messages are already sent, and some are not sent yet.
I'm trying to get the last created message for each user.
I'm using max(created_at) with GROUP BY user_id, but the returned message_id is not the one belonging to the row with the max(created_at).
Table data:
message_id | user_id | date_sent | created_at
----------------------------------------------
1 1 2021-07-01 2021-07-01
2 1 2021-07-02 2021-07-02
3 2 2021-07-01 2021-07-01
4 3 2021-07-04 2021-07-04
5 1 2021-07-22 2021-07-22
6 1 NULL 2021-07-23
7 2 NULL 2021-07-29
8 1 NULL 2021-07-29
9 3 2021-07-29 2021-07-29
My Select:
select * from messages ma right join
( SELECT max(mb.created_at), message_id
FROM `messages` mb WHERE mb.created_at <= '2021-07-24'
group by user_id)
mc on ma.message_id=mc.message_id
the result is
message_id | user_id | date_sent | created_at
----------------------------------------------
5 1 2021-07-22 2021-07-23
3 2 2021-07-01 2021-07-01
4 3 2021-07-04 2021-07-04
I don't know why but for user 1, the message_id returned is not the one associated with the tuple that has the max(created_at).
I was expecting this (for each user_id, the row with the max(created_at)):
message_id | user_id | date_sent | created_at
----------------------------------------------
6 1 NULL 2021-07-23
3 2 2021-07-01 2021-07-01
4 3 2021-07-04 2021-07-04
Any idea? Any help?
thank you.
You're stumbling over MySQL's notorious nonstandard extension to GROUP BY. It gives you the illusion you can do things you can't. For example,
SELECT max(created_at), message_id
FROM messages
GROUP BY user_id
actually means
SELECT max(created_at), ANY_VALUE(message_id)
FROM messages
GROUP BY user_id
where ANY_VALUE() means MySQL can choose any message_id it finds most convenient from among that user's messages. That's not what you want.
To solve your problem, you first need a subquery that finds the latest created_at date for each user_id.
SELECT user_id, MAX(created_at) created_at
FROM messages
WHERE created_at <= '2021-07-24'
GROUP BY user_id
Then, you need to find the message for the particular user_id created on that date. Join the subquery back to the messages table for that.
SELECT a.*
FROM messages a
JOIN (
SELECT user_id, MAX(created_at) created_at
FROM messages
WHERE created_at <= '2021-07-24'
GROUP BY user_id
) b ON a.user_id = b.user_id AND a.created_at = b.created_at
See how that JOIN works? It pulls out the rows matching the latest date for each user.
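To check the join against the sample data from the question, here is a small sketch using Python's built-in sqlite3 module. SQLite in place of MySQL is an assumption, and the names MESSAGES and latest_message_per_user are made up for illustration; the query itself is the one above with an ORDER BY added so the output order is predictable.

```python
import sqlite3

# (message_id, user_id, date_sent, created_at), copied from the question
MESSAGES = [
    (1, 1, "2021-07-01", "2021-07-01"),
    (2, 1, "2021-07-02", "2021-07-02"),
    (3, 2, "2021-07-01", "2021-07-01"),
    (4, 3, "2021-07-04", "2021-07-04"),
    (5, 1, "2021-07-22", "2021-07-22"),
    (6, 1, None, "2021-07-23"),
    (7, 2, None, "2021-07-29"),
    (8, 1, None, "2021-07-29"),
    (9, 3, "2021-07-29", "2021-07-29"),
]


def latest_message_per_user(cutoff):
    """Return each user's newest message created on or before `cutoff`."""
    con = sqlite3.connect(":memory:")  # SQLite is a stand-in for MySQL here
    con.execute(
        "CREATE TABLE messages (message_id INTEGER PRIMARY KEY, "
        "user_id INTEGER, date_sent TEXT, created_at TEXT)"
    )
    con.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", MESSAGES)
    rows = con.execute("""
        SELECT a.message_id, a.user_id, a.date_sent, a.created_at
        FROM messages a
        JOIN (
            -- latest created_at per user, within the cutoff
            SELECT user_id, MAX(created_at) AS created_at
            FROM messages
            WHERE created_at <= ?
            GROUP BY user_id
        ) b ON a.user_id = b.user_id AND a.created_at = b.created_at
        ORDER BY a.user_id
    """, (cutoff,)).fetchall()
    con.close()
    return rows
```

With the cutoff '2021-07-24' this returns message 6 for user 1, message 3 for user 2 and message 4 for user 3. Note that if two messages of one user share the same latest created_at, this join returns both of them.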
There's a possible optimization. If
1. your message_id is an autoincrementing primary key, and
2. you never UPDATE your created_at columns, but only set them to the current date when you INSERT the rows,
then the most recent message for each user_id is also the message with the largest message_id. In that case you can use this query instead.
SELECT a.*
FROM messages a
JOIN (
SELECT user_id, MAX(message_id) message_id
FROM messages
WHERE created_at <= '2021-07-24'
GROUP BY user_id
) b ON a.message_id=b.message_id
Due to the way primary key indexes work, this can be faster.
You want an ordinary JOIN rather than a RIGHT or LEFT JOIN here: the ordinary JOIN only returns rows that match the ON condition.
Pro tip: almost nobody actually uses RIGHT JOIN. When you want that kind of JOIN, use LEFT JOIN (with the table order swapped). You don't want that kind of join to solve this problem.

SQL left join two times

The user table looks like this:
user_id name surname
------- ---- -------
1       a    aa
2       b    bb
3       c    cc
The books table looks like this:
user_id book_name
------- ---------
1       book1
1       book2
1       book3
2       book1
The expenses table looks like this:
user_id amount_spent date
------- ------------ ----------
1       10           2020-02-03
1       30           2020-02-02
1       10           2020-02-01
1       15           2020-01-31
1       13           2020-01-15
2       15           2020-02-01
3       20           2020-02-01
The result which I want:
CountUsers amount_spent
---------- ------------
2          65
Explanation: I want to count how many users have book1 and how much they spent in total between 2020-02-01 and 2020-02-03.
What should the query look like?
I am using MySQL version 8.
I have tried:
SELECT
count(*), sum(amount_spend) as total_amount_spend
FROM
(select sum(amount_spend) as amount_spend
FROM expanses
LEFT JOIN books ON books.user_id = expanses.user_id WHERE books.book_name ='book1 GROUP BY expanses.user_id) src'
The result is wrong because I get a higher amount_spend than in the expected result above. I think the join produces duplicate rows, but I do not know how to fix that.
I want to count how many users have book1 and how much total they spend on a date between 2020-02-01 - 2020-02-03.
I am thinking:
select count(distinct e.user_id), sum(e.amount_spent)
from user_books ub join
expenses e
on ub.user_id = e.user_id
where ub.book_name = 'book1' and
e.date between '2020-02-01' and '2020-02-03';
Note: This assumes that user_books doesn't have duplicate rows.
You are missing the date condition in your code.
SELECT
count(*), sum(amount_spent) as total_amount_spend
FROM
(select sum(amount_spent) as amount_spent
FROM expanses
LEFT JOIN books ON books.user_id = expanses.user_id
WHERE books.book_name ='book1'
and expanses.date between '2020-02-01' and '2020-02-03'
GROUP BY expanses.user_id) src;
will do the job.
Please note that you don't need a left join here (unless it may happen that a given user has no expenses at all), and you don't need the grouping in the subquery. So your query could look like:
select count(distinct expanses.user_id), sum(amount_spent) as amount_spent
from expanses
inner join books on books.user_id = expanses.user_id
where books.book_name ='book1'
and expanses.date between '2020-02-01' and '2020-02-03';
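
To check this last query against the sample tables from the question, here is a small sketch using Python's built-in sqlite3 module. SQLite in place of MySQL 8 is an assumption, the function name book1_spending is made up for illustration, and the table names books and expanses follow the spelling used in the queries above.

```python
import sqlite3


def book1_spending(start, end):
    """Count users owning book1 and sum their spending between two dates."""
    con = sqlite3.connect(":memory:")  # SQLite is a stand-in for MySQL here
    con.executescript("""
        CREATE TABLE books (user_id INTEGER, book_name TEXT);
        CREATE TABLE expanses (user_id INTEGER, amount_spent INTEGER, date TEXT);
        INSERT INTO books VALUES
            (1, 'book1'), (1, 'book2'), (1, 'book3'), (2, 'book1');
        INSERT INTO expanses VALUES
            (1, 10, '2020-02-03'), (1, 30, '2020-02-02'), (1, 10, '2020-02-01'),
            (1, 15, '2020-01-31'), (1, 13, '2020-01-15'), (2, 15, '2020-02-01'),
            (3, 20, '2020-02-01');
    """)
    row = con.execute("""
        SELECT COUNT(DISTINCT expanses.user_id), SUM(amount_spent)
        FROM expanses
        INNER JOIN books ON books.user_id = expanses.user_id
        WHERE books.book_name = 'book1'   -- one books row per user survives
          AND expanses.date BETWEEN ? AND ?
    """, (start, end)).fetchone()
    con.close()
    return row
```

book1_spending('2020-02-01', '2020-02-03') gives (2, 65), the result asked for. Filtering on book_name leaves at most one books row per user (assuming no duplicate book rows), so the join no longer multiplies the expense rows.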

Self Join MYSQL Parent Child

For some reason this is eluding me tonight. I have 3 tables, appointments, clients, tasks. I am trying to get all the appointments for today or tomorrow, check the users table to pull the current clients, and any family members, and then pull tasks that meet another criteria to get a full list of tasks for the day or past due.
On the client table we are importing information from another data source, keeping the original id as reference_id. If a client doesn't have a parent_id, they are the parent; otherwise they are a child.
appointment table
id | user_id | client_id | start_at
1 1 2 2018-02-15
2 1 1 2018-02-15
3 1 2 2018-02-15
4 2 4 2018-02-15
clients table:
id | reference_id | parent_id | user_id
1 35 null 1
2 36 35 1
3 37 35 1
4 35 null 2
5 36 35 2
tasks table
id | client_id | task | due_date
1 2 do something 2018-02-15
2 4 do something 2018-01-10
3 2 do something 2018-02-01
4 2 do something 2018-05-15
I started by joining the table onto itself, but when I try to bring in the parent it kills all results, though only when I add WHERE parent_id > 0 to the query. Before that I get about 40 results, but there should be well over 100. In the dataset I have 50 appointments, and some of the users have 2-5 children/relationships. I am not sure why this happens: since I am using LEFT JOIN instead of INNER JOIN, if anything I should end up with redundant extra results.
SELECT
start_at,
clients.id,
clients.reference_id,
clients.parent_id,
clients.first_name,
clients.last_name,
children.user_id,
children.first_name,
children.last_name,
children.reference_id,
children.parent_id,
parents.last_name
FROM
`reminderdental2`.`appointments` AS appointment
INNER JOIN clients AS clients
ON appointment.patient_id = clients.id
INNER JOIN clients AS children
ON clients.parent_id = children.parent_id
LEFT JOIN
(SELECT
reference_id,
user_id,
last_name,
first_name,
parent_id
FROM
clients WHERE parent_id > 0) AS parents
ON clients.parent_id = parents.reference_id
WHERE clients.user_id = '27'
AND children.user_id = '27'
AND parents.user_id = '27'
AND start_at > NOW() - INTERVAL 1 DAY
AND start_at < NOW() + INTERVAL 1 DAY
Is there a better way to design this than joining the client table 3 times? Why did adding WHERE parent_id > 0 kill the query?

Super complicated mysql query

The database (MySQL) that I query comes from a telephony system, and I need to read how many agents (event_parties.agent_id) are logged in to the different groups (event_groups.group_id).
Each time an agent logs in to a group, a new record is entered in the events table with event_id=29 (event_id=30 for a logout). At the same time, a new entry appears in table event_parties with the same g_event_id and the agent_id representing the agent,
and a new entry appears in table event_groups with the same g_event_id and the group_id representing the group that the agent logs in to or out of (in event_groups the same g_event_id can appear in more than one entry if the agent logs in to or out of more than one group at the same time).
So my thinking is that I could get the agents logged in to each group_id by selecting all records where there is no newer entry (by event_time) with the same event_groups.group_id and the same event_parties.agent_id, and where events.event_id is 29 or 30.
events.event_id = 29 means that the agent logs in.
events.event_id = 30 means that the agent logs out.
I have some serious difficulties designing such a MySQL select :(
Here are some example data in each table.
Table:
events
g_event_id event_id event_time
---------- -------- ----------
7816 31 2016-11-03 09:46:18
7815 30 2016-11-03 09:45:18
7814 31 2016-11-03 09:44:18
7813 29 2016-11-03 09:43:18
7812 30 2016-11-03 09:42:18
7811 29 2016-11-03 09:41:18
7810 31 2016-11-03 09:40:18
7809 29 2016-11-03 09:39:18
7808 31 2016-11-03 09:38:18
7807 7 2016-11-03 09:37:18
7806 29 2016-11-03 09:36:18
7805 30 2016-11-03 09:35:18
7804 30 2016-11-03 09:34:18
7803 29 2016-11-03 09:33:18
7802 29 2016-11-03 09:32:18
Table:
event_parties
g_event_id agent_id
---------- --------
7816 1
7815 1
7814 1
7813 1
7812 1
7811 1
7810 2
7809 2
7808 2
7807 3
7806 3
7805 3
7804 3
7803 3
7802 3
Table:
event_groups
g_event_id group_id
---------- --------
7816 1
7815 1
7814 1
7813 1
7813 2
7813 3
7813 4
7812 1
7811 1
7810 1
7809 1
7808 1
7807 1
7806 1
7806 3
7805 4
7804 1
7804 2
7803 4
7802 1
7802 2
From the tables above I want my select statement result to be:
group_id agent_id
-------- --------
4 1
3 1
2 1
1 2
1 3
3 3
Is such a query possible? Is there any SQL genius out there? :)
/ Kristian
SELECT group_id, agent_id
FROM (SELECT agent_id, eg.group_id, if(event_id = 29, 1, -1) AS transitions
FROM event_parties ep
JOIN `events` e ON ep.g_event_id = e.g_event_id
JOIN event_groups eg ON ep.g_event_id = eg.g_event_id
WHERE e.event_id IN (29, 30)) AS t
GROUP BY agent_id, group_id
HAVING sum(transitions) > 0
ORDER BY agent_id, group_id DESC
I think that this will do what you are asking. For every agent/group event, it sets transitions to 1 for a login and -1 for a logout. Over the whole data set, if an agent has logged in and then logged out, the sum is 0 for that agent/group combination; the sum is calculated in the outer query, which keeps only the combinations with a positive sum.
This does depend on not starting with a log out event for a specific agent/group combination. If the data set you are looking at starts with a log out event, then the agent will never appear to be logged in.
Alternatively, you could get the same result by looking at the last record, and determining if it's a 29 or a 30, and only displaying the ones that are 29.
SELECT group_id, agent_id
FROM (SELECT agent_id, group_id, max(e.g_event_id) AS last_event_id
FROM event_parties ep
JOIN `events` e ON ep.g_event_id = e.g_event_id
JOIN event_groups eg ON ep.g_event_id = eg.g_event_id
WHERE e.event_id IN (29, 30)
GROUP BY agent_id, group_id) AS last_event
JOIN `events` e ON e.g_event_id = last_event.last_event_id
WHERE e.event_id = 29;
This is less dependent on where you are starting in the series, but the join is slightly more complex.
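To compare the two approaches above on the sample data from the question, here is a sketch using Python's built-in sqlite3 module. SQLite in place of MySQL is an assumption, so if() is written as a CASE expression; the names EVENTS, PARTIES, GROUPS, BY_SUM, BY_LAST_EVENT and logged_in are made up for illustration, and an ascending ORDER BY is added to both queries so the two results can be compared directly.

```python
import sqlite3

# (g_event_id, event_id) from the question; event_time is left out because
# neither query needs it.
EVENTS = [(7816, 31), (7815, 30), (7814, 31), (7813, 29), (7812, 30),
          (7811, 29), (7810, 31), (7809, 29), (7808, 31), (7807, 7),
          (7806, 29), (7805, 30), (7804, 30), (7803, 29), (7802, 29)]
# (g_event_id, agent_id)
PARTIES = ([(g, 1) for g in range(7811, 7817)]
           + [(g, 2) for g in range(7808, 7811)]
           + [(g, 3) for g in range(7802, 7808)])
# (g_event_id, group_id)
GROUPS = [(7816, 1), (7815, 1), (7814, 1), (7813, 1), (7813, 2), (7813, 3),
          (7813, 4), (7812, 1), (7811, 1), (7810, 1), (7809, 1), (7808, 1),
          (7807, 1), (7806, 1), (7806, 3), (7805, 4), (7804, 1), (7804, 2),
          (7803, 4), (7802, 1), (7802, 2)]

# Approach 1: +1 per login, -1 per logout, keep positive sums.
BY_SUM = """
    SELECT eg.group_id, ep.agent_id
    FROM event_parties ep
    JOIN events e ON e.g_event_id = ep.g_event_id
    JOIN event_groups eg ON eg.g_event_id = ep.g_event_id
    WHERE e.event_id IN (29, 30)
    GROUP BY ep.agent_id, eg.group_id
    HAVING SUM(CASE WHEN e.event_id = 29 THEN 1 ELSE -1 END) > 0
    ORDER BY ep.agent_id, eg.group_id
"""

# Approach 2: find the last login/logout event per agent and group,
# keep it only if it is a login.
BY_LAST_EVENT = """
    SELECT last_event.group_id, last_event.agent_id
    FROM (SELECT ep.agent_id, eg.group_id, MAX(e.g_event_id) AS last_event_id
          FROM event_parties ep
          JOIN events e ON e.g_event_id = ep.g_event_id
          JOIN event_groups eg ON eg.g_event_id = ep.g_event_id
          WHERE e.event_id IN (29, 30)
          GROUP BY ep.agent_id, eg.group_id) AS last_event
    JOIN events e ON e.g_event_id = last_event.last_event_id
    WHERE e.event_id = 29
    ORDER BY last_event.agent_id, last_event.group_id
"""


def logged_in(query):
    """Load the sample data into SQLite and run one of the two queries."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (g_event_id INTEGER, event_id INTEGER)")
    con.execute("CREATE TABLE event_parties (g_event_id INTEGER, agent_id INTEGER)")
    con.execute("CREATE TABLE event_groups (g_event_id INTEGER, group_id INTEGER)")
    con.executemany("INSERT INTO events VALUES (?, ?)", EVENTS)
    con.executemany("INSERT INTO event_parties VALUES (?, ?)", PARTIES)
    con.executemany("INSERT INTO event_groups VALUES (?, ?)", GROUPS)
    rows = con.execute(query).fetchall()
    con.close()
    return rows
```

Both logged_in(BY_SUM) and logged_in(BY_LAST_EVENT) return the same six (group_id, agent_id) pairs as the expected result in the question, only in a different order.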
FWIW syntax style change using natural join:
select group_id, agent_id
from ( select agent_id, group_id,
max( g_event_id ) as g_event_id
from event_parties natural join `events` natural join event_groups
where event_id in (29, 30)
group by agent_id, group_id ) as last_event
natural join `events`
where event_id = 29;
I have just made a demo; please clarify what you want:
select g.group_id,p.agent_id from event_groups g
join event_parties p on g.g_event_id=p.g_event_id
join events e on p.g_event_id =e.g_event_id where e.event_id=29

MySQL result count from multiple table getting wrong result?

I have three tables: attendance, cv_target, and candidate. I need to find the candidate count for a specific user.
I am not an expert in MySQL. I have tried the query below, but I'm unable to find the exact value.
SELECT
attendance_date,
cv_target_date_for,
cv_requirement,
job_id,
cv_target,
achi,
recruiter_comment,
recruiter_rating
FROM
attendance f
RIGHT JOIN
(
SELECT
cv_requirement,
cv_target,
cv_target_date_for,
achi,
recruiter_comment,
recruiter_rating
FROM
cv_target a
LEFT JOIN
(
SELECT
COUNT(candidate_id) AS achi,
cv_target_date,
fk_job_id
FROM
candidate
GROUP BY
fk_job_id,
cv_target_date
) b
ON a.cv_requirement = b.fk_job_id
AND a.cv_target_date_for = b.cv_target_date
WHERE
cv_target_date_for BETWEEN '2014-02-01' AND '2014-03-01'
AND cv_recruiter = '36'
) c
ON f.attendance_date=c.cv_target_date_for
GROUP BY
cv_requirement,
cv_target_date_for
ORDER BY
`c`.`cv_target_date_for` ASC
attendance
id fk_user_id attendance_date
1 44 2014-02-24
2 44 2014-02-25
3 44 2014-02-26
4 44 2014-02-27
5 36 2014-02-24
6 44 2014-02-28
cv_target
id cv_recruiter cv_requirement cv_target cv_target_date_for
1 44 1 3 2014-02-24
2 44 2 2 2014-02-24
3 44 3 2 2014-02-25
4 44 4 3 2014-02-25
4 44 4 3 2014-02-26
candidate
candidate_id fk_posted_user_id fk_job_id cv_target_date
1 44 1 2014-02-24
2 44 3 2014-02-25
3 44 3 2014-02-25
3 44 4 2014-02-25
4 44 4 2014-02-26
5 44 5 2014-02-28
5 44 5 2014-02-28
Desired result
attendance_date cv_target_date_for job_id cv_target achi(count)
2014-02-24 2014-02-24 1 3 1
2014-02-24 2014-02-24 2 2 null
2014-02-25 2014-02-25 3 2 2
2014-02-25 2014-02-25 4 3 1
2014-02-26 2014-02-26 4 3 1
2014-02-27 2014-02-27 null null null
2014-02-28 null 5 null 2
Output that I am getting
attendance_date cv_target_date_for job_id cv_target achi(count)
2014-02-24 2014-02-24 1 3 1
2014-02-24 2014-02-24 2 2 null
2014-02-25 2014-02-25 3 2 2
2014-02-25 2014-02-25 4 3 1
2014-02-26 2014-02-26 4 3 1
The rows for the 27th and 28th are not showing. I want those rows as well.
Original Answer
I think I understand what you want. The following assumes you want all attendance dates within a specific range for a specific user. And for each of those attendance dates, you want all cv_target records, if any. And for each of those, you want a count of the candidates.
Use a subquery to get the count. That's the only part that needs to go in the subquery. Only use a GROUP BY expression in the subquery, not the outer query. Only select the fields you need.
Use LEFT JOIN to get all the records from the table on the left side of the expression and only matching records from the table on the right side. So all records from attendance (that match the WHERE expression), and matching records from cv_target (regardless of whether they have a match in the candidate subquery), and then matching records from the candidate subquery.
Try this:
SELECT
DATE_FORMAT(a.attendance_date, '%Y-%m-%d') AS attendance_date,
DATE_FORMAT(t.cv_target_date_for, '%Y-%m-%d') AS cv_target_date_for,
t.cv_requirement AS job_id,
t.cv_target,
c.achi AS `achi(count)`
FROM
attendance AS a
LEFT JOIN
cv_target AS t
ON a.fk_user_id = t.cv_recruiter
AND a.attendance_date = t.cv_target_date_for
LEFT JOIN
(
SELECT
COUNT(candidate_id) AS achi,
fk_job_id,
cv_target_date
FROM
candidate
WHERE
fk_posted_user_id = 44
AND cv_target_date BETWEEN '2014-02-01' AND '2014-03-01'
GROUP BY
fk_job_id,
cv_target_date
) AS c
ON t.cv_requirement = c.fk_job_id
AND t.cv_target_date_for = c.cv_target_date
WHERE
a.fk_user_id = 44
AND a.attendance_date BETWEEN '2014-02-01' AND '2014-03-01'
ORDER BY
ISNULL(t.cv_target_date_for), t.cv_target_date_for, t.cv_requirement
Note that the following line is not necessary for the correct result. However, depending on the database structure and amount of data, it may improve performance.
AND cv_target_date BETWEEN '2014-02-01' AND '2014-03-01'
The ISNULL function is being used to sort NULL to the bottom.
I've created an SQL Fiddle showing the output you request, except for cv_target_date_for. It's not possible to output values that do not exist in the data.
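For anyone who wants to reproduce the LEFT JOIN chain locally, here is a sketch using Python's built-in sqlite3 module with the sample rows from the question. SQLite in place of MySQL is an assumption, so DATE_FORMAT is dropped (dates are stored as text) and the ORDER BY is simplified; the function name attendance_report is made up for illustration.

```python
import sqlite3


def attendance_report(user_id):
    """All attendance dates of a user, with targets and candidate counts."""
    con = sqlite3.connect(":memory:")  # SQLite is a stand-in for MySQL here
    con.executescript("""
        CREATE TABLE attendance (id INTEGER, fk_user_id INTEGER, attendance_date TEXT);
        CREATE TABLE cv_target (id INTEGER, cv_recruiter INTEGER,
                                cv_requirement INTEGER, cv_target INTEGER,
                                cv_target_date_for TEXT);
        CREATE TABLE candidate (candidate_id INTEGER, fk_posted_user_id INTEGER,
                                fk_job_id INTEGER, cv_target_date TEXT);
        INSERT INTO attendance VALUES
            (1, 44, '2014-02-24'), (2, 44, '2014-02-25'), (3, 44, '2014-02-26'),
            (4, 44, '2014-02-27'), (5, 36, '2014-02-24'), (6, 44, '2014-02-28');
        INSERT INTO cv_target VALUES
            (1, 44, 1, 3, '2014-02-24'), (2, 44, 2, 2, '2014-02-24'),
            (3, 44, 3, 2, '2014-02-25'), (4, 44, 4, 3, '2014-02-25'),
            (4, 44, 4, 3, '2014-02-26');
        INSERT INTO candidate VALUES
            (1, 44, 1, '2014-02-24'), (2, 44, 3, '2014-02-25'),
            (3, 44, 3, '2014-02-25'), (3, 44, 4, '2014-02-25'),
            (4, 44, 4, '2014-02-26'), (5, 44, 5, '2014-02-28'),
            (5, 44, 5, '2014-02-28');
    """)
    rows = con.execute("""
        SELECT a.attendance_date, t.cv_target_date_for,
               t.cv_requirement AS job_id, t.cv_target, c.achi
        FROM attendance AS a
        -- keep every attendance row, with or without a target
        LEFT JOIN cv_target AS t
               ON a.fk_user_id = t.cv_recruiter
              AND a.attendance_date = t.cv_target_date_for
        -- the only grouped part: candidate counts per job and date
        LEFT JOIN (SELECT COUNT(candidate_id) AS achi, fk_job_id, cv_target_date
                   FROM candidate
                   WHERE fk_posted_user_id = ?
                   GROUP BY fk_job_id, cv_target_date) AS c
               ON t.cv_requirement = c.fk_job_id
              AND t.cv_target_date_for = c.cv_target_date
        WHERE a.fk_user_id = ?
        ORDER BY a.attendance_date, t.cv_requirement
    """, (user_id, user_id)).fetchall()
    con.close()
    return rows
```

attendance_report(44) returns seven rows: the five rows with targets, plus 2014-02-27 and 2014-02-28 with NULL in every other column, because no cv_target row exists for those dates.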
UPDATE
With the new data and new requirement of retrieving data where either cv_target or candidate has data for a particular attendance date, you need to add another table to get the job IDs. In your original question you had a table with ID numbers and job titles, but it had no dates.
You might want to rethink your database design. I'm not sure I understand how your tables relate to one another, but those two new records for the candidate table appear to be orphaned. All your joins are based on date, but you don't appear to have a table that links job ID numbers to dates.
You could create a derived table by doing a UNION of cv_target and candidate. Then use the derived table as the left side of the join.
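As a quick illustration of what such a derived table contains, here is a sketch using Python's built-in sqlite3 module. SQLite in place of MySQL is an assumption, only the job and date columns of the sample data are loaded, the user and date-range filters are left out for brevity, and the function name job_dates is made up for illustration.

```python
import sqlite3


def job_dates():
    """Distinct (job_id, job_date) pairs found in cv_target or candidate."""
    con = sqlite3.connect(":memory:")  # SQLite is a stand-in for MySQL here
    con.executescript("""
        CREATE TABLE cv_target (cv_requirement INTEGER, cv_target_date_for TEXT);
        CREATE TABLE candidate (fk_job_id INTEGER, cv_target_date TEXT);
        INSERT INTO cv_target VALUES
            (1, '2014-02-24'), (2, '2014-02-24'), (3, '2014-02-25'),
            (4, '2014-02-25'), (4, '2014-02-26');
        INSERT INTO candidate VALUES
            (1, '2014-02-24'), (3, '2014-02-25'), (3, '2014-02-25'),
            (4, '2014-02-25'), (4, '2014-02-26'), (5, '2014-02-28'),
            (5, '2014-02-28');
    """)
    rows = con.execute("""
        SELECT cv_requirement AS job_id, cv_target_date_for AS job_date
        FROM cv_target
        UNION            -- UNION (not UNION ALL) removes duplicate pairs
        SELECT fk_job_id, cv_target_date
        FROM candidate
        ORDER BY job_date, job_id
    """).fetchall()
    con.close()
    return rows
```

The result has six pairs, including job 5 on 2014-02-28, which exists only in candidate and would be lost if cv_target alone were on the left side of the join.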
Updated query:
SELECT
DATE_FORMAT(a.attendance_date, '%Y-%m-%d') AS attendance_date,
DATE_FORMAT(t.cv_target_date_for, '%Y-%m-%d') AS cv_target_date_for,
j.job_id,
t.cv_target,
c.achi AS `achi(count)`
FROM
attendance AS a
LEFT JOIN
(
SELECT
cv_requirement AS job_id,
cv_target_date_for AS job_date
FROM
cv_target
WHERE
cv_recruiter = 44
AND cv_target_date_for BETWEEN '2014-02-01' AND '2014-03-01'
UNION
SELECT
fk_job_id AS job_id,
cv_target_date AS job_date
FROM
candidate
WHERE
fk_posted_user_id = 44
AND cv_target_date BETWEEN '2014-02-01' AND '2014-03-01'
) AS j
ON a.attendance_date = j.job_date
LEFT JOIN
cv_target AS t
ON a.fk_user_id = t.cv_recruiter
AND j.job_id = t.cv_requirement
AND j.job_date = t.cv_target_date_for
LEFT JOIN
(
SELECT
COUNT(candidate_id) AS achi,
fk_job_id,
cv_target_date
FROM
candidate
WHERE
fk_posted_user_id = 44
AND cv_target_date BETWEEN '2014-02-01' AND '2014-03-01'
GROUP BY
fk_job_id,
cv_target_date
) AS c
ON j.job_id = c.fk_job_id
AND j.job_date = c.cv_target_date
WHERE
a.fk_user_id = 44
AND a.attendance_date BETWEEN '2014-02-01' AND '2014-03-01'
ORDER BY
ISNULL(t.cv_target_date_for), t.cv_target_date_for, j.job_id
I've created an updated SQL Fiddle showing the output you request, except for cv_target_date_for. It's not possible to output values that do not exist in the data (i.e. 2014-02-27).
If that's a typo and you meant 2014-02-28, then you'll need to select the date from the derived table instead of the cv_target table. And you should probably change the column heading in the result because it's no longer the cv_target_date_for date.
To get the date from either cv_target or candidate, change this line:
DATE_FORMAT(t.cv_target_date_for, '%Y-%m-%d') AS cv_target_date_for,
to this:
DATE_FORMAT(j.job_date, '%Y-%m-%d') AS job_date,
And you may need to tweak the order by expression to suit your needs.