Groupwise maximum in larger query - mysql

Really struggling with a query that uses groupwise maximum, any help would be much appreciated. Feel free to point out if I should not be using groupwise maximum.
I have two tables application and email, one application can have many emails. What I'm trying to do in my query is get all details from application and join the email table (I'm actually only getting a foreign key from email for another table which indicates if the email has been replied to), getting the last email sent based on the max(timestamp), which is why I am trying to use groupwise maximum.
I've tried this, but it seems to make a duplicate of each row:
SELECT `application` . * , `email1`.`student_email_id` AS `email_student_email_id`
FROM `application`
LEFT JOIN (
SELECT MAX( tstamp ) AS tstamp, id, student_email_id, application_id
FROM email
GROUP BY id, student_email_id, application_id
) AS email1 ON `email1`.`application_id` = `application`.`id`
WHERE `application`.`status` = 'returned'
This is what seemed to work at first but is causing issues now and I'm sure it's pretty sloppy code:
select `application`.*, `email1`.`student_email_id` as `email_student_email_id`
from `application`
left join (
select student_email_id, max(tstamp) as tstamp, application_id
from email
group by application_id, tstamp
order by tstamp desc
limit 1) as email1 on `email1`.`application_id` = `application`.`id`
where `application`.`status` = 'returned'
Any guidance would be highly appreciated, if you need to see more code please ask! Thanks.
Further clarity if needed for my db set up and what should be happening (left out unimportant parts):
Application Table
+----+----------+
| id | status |
+----+----------+
| 1 | returned |
+----+----------+
Email Table
+----+------------+----------------+------------------+
| id | tstamp | application_id | student_email_id |
+----+------------+----------------+------------------+
| 1 | 2014-12-26 | 1 | NULL |
| 2 | 2014-12-27 | 1 | 3 |
+----+------------+----------------+------------------+
The query should be showing the following:
+----+----------+------------------------+
| id | status | email_student_email_id |
+----+----------+------------------------+
| 1 | returned | 3 |
+----+----------+------------------------+
First solution above shows duplicates of everything (maybe I'm nearly there) and second one shows null for the joined table columns, although I'm sure it did work at one stage or in isolation at least!

You're looking for the latest row in your Email table for each distinct application_id.
Your subquery to get that isn't quite right. Here's how you get that.
SELECT s.application_id, e.student_email_id
FROM email e
JOIN (
SELECT MAX(tstamp) tstamp, application_id
FROM email
GROUP BY application_id
) s ON e.application_id = s.application_id AND e.tstamp = s.tstamp
There's another way to do this, that might be more efficient. It will work if the id column is an autoincrement column.
SELECT e.application_id, e.student_email_id
FROM email e
JOIN (
SELECT MAX(id) id
FROM email
GROUP BY application_id
) s ON e.id = s.id
Either of these preceding subqueries gets the latest student_email_id for each application_id. The second one uses the JOIN to extract only the highest id number for each application_id, and uses that id to find the latest student_email_id.
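To check that both variants agree on the sample data from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so that the example is self-contained), and the second variant takes application_id from the email row itself.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL email table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE email (
    id INTEGER PRIMARY KEY,
    tstamp TEXT,
    application_id INTEGER,
    student_email_id INTEGER
);
INSERT INTO email VALUES (1, '2014-12-26', 1, NULL);
INSERT INTO email VALUES (2, '2014-12-27', 1, 3);
""")

# Variant 1: match each email against the newest tstamp of its application.
by_tstamp = conn.execute("""
SELECT s.application_id, e.student_email_id
FROM email e
JOIN (
    SELECT MAX(tstamp) AS tstamp, application_id
    FROM email
    GROUP BY application_id
) s ON e.application_id = s.application_id AND e.tstamp = s.tstamp
""").fetchall()

# Variant 2: match each email against the highest id of its application.
by_id = conn.execute("""
SELECT e.application_id, e.student_email_id
FROM email e
JOIN (
    SELECT MAX(id) AS id
    FROM email
    GROUP BY application_id
) s ON e.id = s.id
""").fetchall()

print(by_tstamp, by_id)
```

Both lists contain the single row for application 1 with student_email_id 3. Note that the tstamp variant can return several rows per application when two emails share the same latest timestamp.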
Your subquery was this. It doesn't get what you hoped for.
SELECT MAX( tstamp ) AS tstamp, id, student_email_id, application_id /*wrong*/
FROM email
GROUP BY id, student_email_id, application_id
You grouped this by id. Since each id appears only once, every row ends up in its own group, so you get all the detail rows back. That's not what you want. Even this
SELECT MAX( tstamp ) AS tstamp, student_email_id, application_id /*wrong*/
FROM email
GROUP BY student_email_id, application_id
will give you more than one record for each application_id value.
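A quick way to see the difference is to count the groups each GROUP BY produces on the question's sample rows. This is only a sketch, again using SQLite through Python's sqlite3 module in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE email (
    id INTEGER PRIMARY KEY,
    tstamp TEXT,
    application_id INTEGER,
    student_email_id INTEGER
);
INSERT INTO email VALUES (1, '2014-12-26', 1, NULL);
INSERT INTO email VALUES (2, '2014-12-27', 1, 3);
""")

# Grouping by student_email_id as well: NULL and 3 form separate groups.
wrong = conn.execute("""
SELECT MAX(tstamp) AS tstamp, student_email_id, application_id
FROM email
GROUP BY student_email_id, application_id
""").fetchall()

# Grouping by application_id alone: one group per application.
right = conn.execute("""
SELECT MAX(tstamp) AS tstamp, application_id
FROM email
GROUP BY application_id
""").fetchall()

print(len(wrong), "rows vs", len(right), "row")
```

The first query returns two rows for application 1, which is exactly what duplicates every application row after the LEFT JOIN.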
So the query you need is:
SELECT application.* , email1.student_email_id AS email_student_email_id
FROM application
LEFT JOIN (
SELECT e.application_id, e.student_email_id
FROM email e
JOIN (
SELECT MAX(id) id
FROM email
GROUP BY application_id
) s ON e.id = s.id
) AS email1 ON email1.application_id = application.id
WHERE application.status = 'returned'
When you're designing queries like this, it's smart to test from the inside out, starting with the innermost subquery.
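Here is a sketch of that inside-out testing on a small data set, run against SQLite via Python's sqlite3 module rather than MySQL. Applications 2 and 3 and email 3 are hypothetical rows added only to exercise the LEFT JOIN and the status filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE email (
    id INTEGER PRIMARY KEY,
    tstamp TEXT,
    application_id INTEGER,
    student_email_id INTEGER
);
INSERT INTO application VALUES (1, 'returned');
INSERT INTO application VALUES (2, 'returned');  -- hypothetical: no emails
INSERT INTO application VALUES (3, 'pending');   -- hypothetical: filtered out
INSERT INTO email VALUES (1, '2014-12-26', 1, NULL);
INSERT INTO email VALUES (2, '2014-12-27', 1, 3);
INSERT INTO email VALUES (3, '2014-12-28', 3, 7);
""")

# Step 1: innermost subquery, the newest email id per application.
latest_ids = conn.execute("""
SELECT MAX(id) AS id FROM email GROUP BY application_id ORDER BY id
""").fetchall()

# Step 2: the complete query built around it.
rows = conn.execute("""
SELECT application.*, email1.student_email_id AS email_student_email_id
FROM application
LEFT JOIN (
    SELECT e.application_id, e.student_email_id
    FROM email e
    JOIN (
        SELECT MAX(id) AS id
        FROM email
        GROUP BY application_id
    ) s ON e.id = s.id
) AS email1 ON email1.application_id = application.id
WHERE application.status = 'returned'
ORDER BY application.id
""").fetchall()

print(latest_ids)
print(rows)
```

Application 1 comes back once with student_email_id 3, and application 2 comes back with NULL (None in Python) because it has no emails.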

Related

SQL: join 2 tables based on column value and select row values where grouped column value is max

I've got this query and I want to get the names of all clients that have the highest price of a day.
If multiple clients exist having the same max price, they shall be selected too.
I managed to get the customers with the max price grouped by date, but I don't think it gives me both customers if they have the same max value on the same day.
The names should be distinct.
The output needs to be as follows:
| Name (asc) |
------------------
| customer name |
| customer name |
| ...... |
The Orders table looks as follows:
|Client|Price|Orderdate |
------------------------
|1 |100.0|2010.01.10|
|... |... | ..... |
and the Client table:
|Client_NR|Name |
-----------------------
|1 |customer#001|
|2 |customer#002|
select distinct k1.NAME from Orders a LEFT JOIN Order b on a.Orderdate = b.Orderdate
JOIN Client k1 on k1.Client_NR = a.Client
where a.Price IN
(SELECT MAX(a.Price) from Order a group by Orderdate)
order by NAME asc
I presume my error lies within the JOIN Client line, but I just can't figure it out.
I've tried to use a.price = b.price in the first join, but the test would fail.
Any advice is highly appreciated.
WITH cte AS ( SELECT Client.Name,
RANK() OVER (PARTITION BY Orders.Orderdate
ORDER BY Orders.Price DESC) rnk
FROM Client
JOIN Orders ON Client.Client_NR = Orders.Client )
SELECT Name
FROM cte
WHERE rnk = 1
ORDER BY Name
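If window functions are not available, the same tie handling can be had by joining Orders against the per-day maximum price. The sketch below is that alternative, not the RANK() query above. It runs on SQLite through Python's sqlite3 module, and all rows apart from the first order are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client (Client_NR INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (Client INTEGER, Price REAL, Orderdate TEXT);
INSERT INTO Client VALUES (1, 'customer#001');
INSERT INTO Client VALUES (2, 'customer#002');
INSERT INTO Client VALUES (3, 'customer#003');
INSERT INTO Client VALUES (4, 'customer#004');
INSERT INTO Orders VALUES (1, 100.0, '2010.01.10');
INSERT INTO Orders VALUES (2, 100.0, '2010.01.10');  -- tie with client 1
INSERT INTO Orders VALUES (3,  50.0, '2010.01.10');
INSERT INTO Orders VALUES (3,  70.0, '2010.01.11');
INSERT INTO Orders VALUES (4,  10.0, '2010.01.11');
""")

# Keep every order whose price equals the maximum for its day,
# then look up the distinct client names.
names = [row[0] for row in conn.execute("""
SELECT DISTINCT c.Name
FROM Orders o
JOIN (
    SELECT Orderdate, MAX(Price) AS max_price
    FROM Orders
    GROUP BY Orderdate
) m ON m.Orderdate = o.Orderdate AND m.max_price = o.Price
JOIN Client c ON c.Client_NR = o.Client
ORDER BY c.Name
""")]

print(names)
```

Clients 1 and 2 are both listed because they tie on 2010.01.10, client 3 qualifies through 2010.01.11, and client 4 never holds a daily maximum.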

Update duplicate email addresses on mysql database table

I have a large database with roughly 10k rows in my user table, and there are 2700 duplicate email addresses.
Basically, the application did not stop users from registering accounts with the same email address over and over again. I have manually cleaned up the addresses that occurred more than twice (there weren't many), but there are still 2700 email addresses that occur at least twice. So I want to update the duplicates and change the email address on the row with the smaller id from something like "email#mail.com" to "1email#mail.com", basically adding "1" to the beginning of all duplicate email addresses. I can select and display the duplicate email addresses, but could not find a way to update only one of the rows and leave the other one untouched.
My table structure is like id username email password.
If you do not have MySQL 8:
Here I am just prepending the id of the row to the email address:
UPDATE my_table JOIN (
SELECT email, MAX(id) AS max_id, COUNT(*) AS cnt FROM my_table
GROUP BY email
HAVING cnt > 1
) sq ON my_table.email = sq.email AND my_table.id <> sq.max_id
SET my_table.email = CONCAT( my_table.id, my_table.email)
;
See DB-Fiddle
The inner query:
SELECT email, MAX(id) AS max_id, COUNT(*) AS cnt FROM my_table
GROUP BY email
HAVING cnt > 1
looks for all emails that are duplicated (i.e. there is more than one row with the same email address) and computes the maximum id value for each such email address. For the sample data in my DB-Fiddle demo, it would return the following:
| email | max_id | cnt |
| ---------------- | ------ | --- |
| emaila#dummy.com | 3 | 3 |
| emailb#dummy.com | 5 | 2 |
The above inner query is aliased as table sq.
Now if I join my_table with the above query as follows:
SELECT my_table.* from my_table join (
SELECT email, MAX(id) AS max_id, COUNT(*) AS cnt FROM my_table
GROUP BY email
HAVING cnt > 1
) sq on my_table.email = sq.email and my_table.id <> sq.max_id
I get:
| id | email |
| --- | ---------------- |
| 1 | emaila#dummy.com |
| 2 | emaila#dummy.com |
| 4 | emailb#dummy.com |
because I am selecting from my_table all rows that have duplicate email addresses (condition my_table.email = sq.email) except for the rows that have the highest value of id for each email address (condition my_table.id <> sq.max_id).
It is the ids from the above join whose email addresses are to be modified.
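The selection step can be tried out with a small sketch. It uses SQLite via Python's sqlite3 module, which does not support MySQL's UPDATE ... JOIN syntax, so the update is applied per id from Python instead. Row 6 is a hypothetical non-duplicated address added to show that it is left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO my_table VALUES (1, 'emaila#dummy.com');
INSERT INTO my_table VALUES (2, 'emaila#dummy.com');
INSERT INTO my_table VALUES (3, 'emaila#dummy.com');
INSERT INTO my_table VALUES (4, 'emailb#dummy.com');
INSERT INTO my_table VALUES (5, 'emailb#dummy.com');
INSERT INTO my_table VALUES (6, 'emailc#dummy.com');  -- hypothetical, unique
""")

# Same join as above: duplicated emails, minus the row with the highest id.
ids_to_rename = conn.execute("""
SELECT my_table.id
FROM my_table
JOIN (
    SELECT email, MAX(id) AS max_id
    FROM my_table
    GROUP BY email
    HAVING COUNT(*) > 1
) sq ON my_table.email = sq.email AND my_table.id <> sq.max_id
ORDER BY my_table.id
""").fetchall()

# SQLite stand-in for the MySQL UPDATE ... JOIN: prepend the id, row by row.
conn.executemany(
    "UPDATE my_table SET email = id || email WHERE id = ?", ids_to_rename
)

result = conn.execute("SELECT id, email FROM my_table ORDER BY id").fetchall()
print(ids_to_rename)
print(result)
```

Rows 1, 2 and 4 get their id prepended, while rows 3, 5 and 6 keep their original address.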
WITH cte AS ( SELECT id,
email,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) rn
FROM sourcetable )
UPDATE sourcetable src, cte
SET src.email = CONCAT(rn - 1, src.email)
WHERE src.id = cte.id
AND cte.rn > 1;
You wrote: "I want to update the duplicate email addresses and change the email address with a smaller id number".
If so, the ordering in the window function must be reversed:
WITH cte AS ( SELECT id,
email,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id DESC) rn
FROM sourcetable )
UPDATE sourcetable src, cte
SET src.email = CONCAT(rn - 1, src.email)
WHERE src.id = cte.id
AND cte.rn > 1;

SQL Select Join the same table (sort by most replies)

So I want to select * from "board_b" the thread that has the most replies. My problem is that the replies are actually in the same table. Take a look at this:
+---+-----------+---------+
|ID | name | replyto |
+---+-----------+---------+
| 1 | newthread | |
| 2 | reply | 1 |
+---+-----------+---------+
(NOTE: the name column is not set to those, it is just to demonstrate) As you can see, 1 is a new thread, and 2 is a reply to 1. Now I have a table full of these, and the table has more columns (text, timestamp, etc...) but the general idea is like the one above.
The thing I want to achieve is to select all threads and sort them by most replies (and also limit by 0, 20). I've tried looking into joining tables, but it gets too complicated for me to understand, so sample code would be great.
Something like this will do it:
SELECT board.id, board.name, COUNT(reply.id)
FROM board_b board INNER JOIN board_b reply ON board.id = reply.replyto
GROUP BY board.id, board.name
ORDER BY COUNT(reply.id) desc
LIMIT 20
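To see the self-join at work, here is a small sketch on made-up rows, run on SQLite through Python's sqlite3 module instead of MySQL. The thread names and the reply_count alias are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE board_b (id INTEGER PRIMARY KEY, name TEXT, replyto INTEGER);
INSERT INTO board_b VALUES (1, 'thread A', NULL);
INSERT INTO board_b VALUES (2, 'reply',    1);
INSERT INTO board_b VALUES (3, 'thread B', NULL);
INSERT INTO board_b VALUES (4, 'reply',    3);
INSERT INTO board_b VALUES (5, 'reply',    3);
INSERT INTO board_b VALUES (6, 'thread C', NULL);
""")

# "board" is the thread row, "reply" is every row pointing at it.
threads = conn.execute("""
SELECT board.id, board.name, COUNT(reply.id) AS reply_count
FROM board_b board
INNER JOIN board_b reply ON board.id = reply.replyto
GROUP BY board.id, board.name
ORDER BY reply_count DESC
LIMIT 20
""").fetchall()

print(threads)
```

Thread B comes first with two replies, then thread A with one. Thread C has no replies and is dropped by the INNER JOIN; switching to a LEFT JOIN would keep it with a count of 0.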
You want to use group by:
select replyto as thread, count(*) as cnt
from board_b
where replyto is not null
group by replyto
order by cnt desc
limit 0, 20;
select c.replyto, c.replycount
from
(
select a.replyto as replyto, count(*) as replycount
from board_b a
inner join (
select id, name, replyto
from board_b
where replyto is null
) b
on b.id = a.replyto
group by a.replyto
) c
order by c.replycount desc
limit 0, 20

SQL Query - Not in a set of already in-use items

I am trying to select jobs that are not currently assigned to a user.
Users table: id | name
Jobs: id | name
Assigned: id | user_id | job_id | date_assigned
I want to select all the jobs that are not currently taken. Example:
Users:
id | name
--------------
1 | Chris
2 | Steve
Jobs
id | name
---------------
1 | Sweep
2 | Skids
3 | Mop
Assigned
id | user_id | job_id | date_assigned
-------------------------------------------------
1 | 1 | 1 | 2012-01-01
2 | 1 | 2 | 2012-01-02
3 | 2 | 3 | 2012-01-05
No two people can be assigned the same job. So the query would return
[1, Sweep]
because no one is working on it any more: Chris got moved to Skids a day later.
So far:
SELECT
*
FROM
jobs
WHERE
id
NOT IN
(
SELECT
DISTINCT(job_id)
FROM
assigned
ORDER BY
date_assigned
DESC
)
However, this query returns nothing on the same data set. It does not take into account that the Sweep job is open again because nobody is currently working on it.
SELECT a.*
FROM jobs a
LEFT JOIN
(
SELECT a.job_id
FROM assigned a
INNER JOIN
(
SELECT MAX(id) AS maxid
FROM assigned
GROUP BY user_id
) b ON a.id = b.maxid
) b ON a.id = b.job_id
WHERE b.job_id IS NULL
This gets the most recent job per user. Once we have a list of those jobs, we select all jobs that aren't on that list.
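As a rough check against the sample data from the question, the sketch below runs the same query on SQLite via Python's sqlite3 module (a stand-in, since the question does not name the database).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE assigned (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    job_id INTEGER,
    date_assigned TEXT
);
INSERT INTO jobs VALUES (1, 'Sweep');
INSERT INTO jobs VALUES (2, 'Skids');
INSERT INTO jobs VALUES (3, 'Mop');
INSERT INTO assigned VALUES (1, 1, 1, '2012-01-01');
INSERT INTO assigned VALUES (2, 1, 2, '2012-01-02');
INSERT INTO assigned VALUES (3, 2, 3, '2012-01-05');
""")

# Inner part: each user's latest assignment (highest assigned.id).
# Outer part: jobs that no latest assignment points at.
open_jobs = conn.execute("""
SELECT a.*
FROM jobs a
LEFT JOIN (
    SELECT a.job_id
    FROM assigned a
    INNER JOIN (
        SELECT MAX(id) AS maxid
        FROM assigned
        GROUP BY user_id
    ) b ON a.id = b.maxid
) b ON a.id = b.job_id
WHERE b.job_id IS NULL
""").fetchall()

print(open_jobs)
```

Only Sweep is returned, because Chris's latest assignment is Skids and Steve's is Mop. This relies on assigned.id growing with date_assigned, as it does in the sample data.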
You can try this variant:
select * from jobs
where id not in (
select job_id from (
select user_id, job_id, max(date_assigned)
from assigned
group by user_id, job_id) latest);
I think you might want:
SELECT *
FROM jobs
WHERE id NOT IN (SELECT job_id
from assigned
where user_id is not null
)
This assumes that re-assigning someone changes the user id on the original assignment. Does this happen? By the way, I also simplified the subquery.
First you need to be looking at a list of only current job assignments. Ordering isn't enough. The way you have it set up, you need a distinct subset of job assignments from Assigned that are the most recent assignments.
So you want a grouping subquery something like
select job_id, user_id, max(date_assigned) last_assigned from assigned group by job_id, user_id
Put it all together and you get
select id, name from jobs
where id not in (
select job_id as id from (
select job_id, user_id, max(date_assigned) last_assigned from assigned
group by job_id, user_id
) latest
)
As an extra feature, you could pass up the value of "last_assigned" and it would tell you how long a job has been idle for.

MySQL getting the lowest ID for a certain user -or- the ID of the entry with the highest urgency for each row

I have the following database
id | user | urgency | problem | solved
The information in there has different users, but these users all have multiple entries
1 | marco | 0 | MySQL problem | n
2 | marco | 0 | Email problem | n
3 | eddy | 0 | Email problem | n
4 | eddy | 1 | MTV doesn't work | n
5 | frank | 0 | out of coffee | y
What I want to do is this: Normally I would check everybody's oldest problem first. I use this query to get the ID's of the oldest problem.
select min(id) from db group by user
This gives me a list of the oldest problem IDs. But I want people to be able to make a certain problem more urgent. So for each user I want either the lowest ID or, if one of their problems has been made more urgent, the ID of the problem with the highest urgency.
Getting the max(urgency) won't give the ID of the problem, it will give me the max urgency.
To be clear: I want to get this as a result
row | id
0 | 1
1 | 4
The last entry should not be in the results since it's solved.
Select ...
From SomeTable As T
Join (
Select T1.User, Min( T1.Id ) As Id
From SomeTable As T1
Join (
Select T2.User, Max( T2.Urgency ) As Urgency
From SomeTable As T2
Where T2.Solved = 'n'
Group By T2.User
) As MaxUrgency
On MaxUrgency.User = T1.User
And MaxUrgency.Urgency = T1.Urgency
Where T1.Solved = 'n'
Group By T1.User
) As Z
On Z.User = T.User
And Z.Id = T.Id
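Here is a sketch of that query on the rows from the question, using SQLite through Python's sqlite3 module in place of MySQL. The table name problems and the selected columns are assumptions, since the query above leaves them open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE problems (
    id INTEGER PRIMARY KEY,
    user TEXT,
    urgency INTEGER,
    problem TEXT,
    solved TEXT
)
""")
conn.executemany(
    "INSERT INTO problems VALUES (?, ?, ?, ?, ?)",
    [
        (1, "marco", 0, "MySQL problem", "n"),
        (2, "marco", 0, "Email problem", "n"),
        (3, "eddy", 0, "Email problem", "n"),
        (4, "eddy", 1, "MTV doesn't work", "n"),
        (5, "frank", 0, "out of coffee", "y"),
    ],
)

# Innermost: highest urgency per user among unsolved problems.
# Middle: lowest id per user at that urgency.
# Outer: the full row for that id.
next_up = conn.execute("""
SELECT T.id, T.user, T.problem
FROM problems AS T
JOIN (
    SELECT T1.user, MIN(T1.id) AS id
    FROM problems AS T1
    JOIN (
        SELECT T2.user, MAX(T2.urgency) AS urgency
        FROM problems AS T2
        WHERE T2.solved = 'n'
        GROUP BY T2.user
    ) AS MaxUrgency
      ON MaxUrgency.user = T1.user
     AND MaxUrgency.urgency = T1.urgency
    WHERE T1.solved = 'n'
    GROUP BY T1.user
) AS Z
  ON Z.user = T.user
 AND Z.id = T.id
ORDER BY T.id
""").fetchall()

print(next_up)
```

This yields ids 1 and 4, matching the result asked for: marco's oldest problem, eddy's urgent one, and nothing for frank because his only problem is solved.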
There are lots of esoteric ways to do this, but here's one of the clearer ones.
First build a query to get your min id and max urgency:
SELECT
user,
MIN(id) AS min_id,
MAX(urgency) AS max_urgency
FROM
db
GROUP BY
user
Then incorporate that as a logical table into
a larger query for your answers:
SELECT
user,
min_id,
max_urgency,
( SELECT MIN(id) FROM db
WHERE user = a.user
AND urgency = a.max_urgency
) AS max_urgency_min_id
FROM
(
SELECT
user,
MIN(id) AS min_id,
MAX(urgency) AS max_urgency
FROM
db
GROUP BY
user
) AS a
Given the obvious indexes, this should be pretty efficient.
The following will get you exactly one row back -- the most urgent, probably oldest problem in your table.
select id from my_table where id = (
select min(id) from my_table where urgency = (
select max(urgency) from my_table
)
)
I was about to suggest adding a create_date column to your table so that you could get the oldest problem first for those problems of the same urgency level. But I'm now assuming you're using the lowest ID for that purpose.
But now I see you wanted a list of them. For that, you'd sort the results by ID:
select id from my_table where urgency = (
select max(urgency) from my_table
) order by id;
[Edit: Left out the order by!]
I forget, honestly, how to get the row number. Someone on the interwebs suggests something like this, but no idea if it works:
select @rownum:=@rownum+1 'row', id from my_table where ...