Select max date by grouping? - mysql

PLEASE will someone help? I've put HOURS into this silly, stupid problem. This Stack Overflow post is EXACTLY my question, and I have tried BOTH suggested solutions to no avail.
Here are MY specifics. I have extracted 5 records from my actual database, and excluded no fields:
master_id  date_sent            type   mailing              response
00001      2015-02-28 00:00:00  PHONE  NULL                 NULL
00001      2015-03-13 14:45:20  EMAIL  ThankYou.html        NULL
00001      2015-03-13 14:34:43  EMAIL  ThankYou.html        NULL
00001      2015-01-11 00:00:00  EMAIL  KS_PREVIEW           TRUE
00001      2015-03-23 21:42:03  EMAIL  MailChimp Update #2  NULL
(sorry about the alignment of the columns.)
I want to get the most recent mailing and date_sent for each master_id. (My extract is of only one master_id to make this post simple.)
So I run this query:
SELECT master_id,date_sent,mailing
FROM contact_copy
WHERE type="EMAIL"
and get the expected result:
master_id date_sent mailing
1 3/13/2015 14:45:20 ThankYou.html
1 3/13/2015 14:34:43 ThankYou.html
1 1/11/2015 0:00:00 KS_PREVIEW
1 3/23/2015 21:42:03 MailChimp Update #2
BUT, when I add this simple aggregation to get the most recent date:
SELECT master_id,max(date_sent),mailing
FROM contact_copy
WHERE type="EMAIL"
group BY master_id
;
I get an UNEXPECTED result:
master_id max(date_sent) mailing
00001 2015-03-23 21:42:03 ThankYou.html
So my question: why is it returning the WRONG MAILING?
It's making me nuts! Thanks.
By the way, I'm not a developer, so sorry if I'm breaking some etiquette rule of asking. :)

That's because when you use GROUP BY, every column in the SELECT list has to be either listed in the GROUP BY clause or wrapped in an aggregate function, and mailing is neither.
You should use a subquery or a join to make it work:
SELECT cc.master_id, cc.date_sent, cc.mailing
FROM contact_copy cc
JOIN
( SELECT master_id, MAX(date_sent) AS date_sent
FROM contact_copy
WHERE type = 'EMAIL'
GROUP BY master_id
) result
ON cc.master_id = result.master_id AND cc.date_sent = result.date_sent
WHERE cc.type = 'EMAIL'
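For anyone who wants to try the join approach without touching a real database, here is a minimal sketch in Python. It assumes SQLite (through the standard sqlite3 module) as a stand-in for MySQL and loads the five sample rows from the question. The alias max_date_sent and the extra type filter on the outer query are my own additions, not something from the original post.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the join itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact_copy "
    "(master_id TEXT, date_sent TEXT, type TEXT, mailing TEXT, response TEXT)"
)
# The five sample rows from the question.
conn.executemany(
    "INSERT INTO contact_copy VALUES (?, ?, ?, ?, ?)",
    [
        ("00001", "2015-02-28 00:00:00", "PHONE", None, None),
        ("00001", "2015-03-13 14:45:20", "EMAIL", "ThankYou.html", None),
        ("00001", "2015-03-13 14:34:43", "EMAIL", "ThankYou.html", None),
        ("00001", "2015-01-11 00:00:00", "EMAIL", "KS_PREVIEW", "TRUE"),
        ("00001", "2015-03-23 21:42:03", "EMAIL", "MailChimp Update #2", None),
    ],
)

# Find the latest EMAIL date per master_id, then join back to fetch the
# whole row that carries that date.
latest_rows = conn.execute(
    """
    SELECT cc.master_id, cc.date_sent, cc.mailing
    FROM contact_copy cc
    JOIN ( SELECT master_id, MAX(date_sent) AS max_date_sent
           FROM contact_copy
           WHERE type = 'EMAIL'
           GROUP BY master_id
         ) latest
      ON cc.master_id = latest.master_id
     AND cc.date_sent = latest.max_date_sent
    WHERE cc.type = 'EMAIL'
    """
).fetchall()

print(latest_rows)
```

With the sample data this returns the MailChimp Update #2 row, which is the mailing that actually belongs to the latest date.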

You're getting an "unexpected" result because of a MySQL specific extension to the GROUP BY functionality. The result you're getting is actually expected, according to the MySQL Reference Manual.
Ref: https://dev.mysql.com/doc/refman/5.5/en/group-by-handling.html
Other database engines would reject your query as invalid, with an error along the lines of "non-aggregate expressions included in the SELECT list not included in the GROUP BY".
We can get MySQL to behave like other databases (and return an error for that query) if we include ONLY_FULL_GROUP_BY in the SQL mode.
Ref: https://dev.mysql.com/doc/refman/5.5/en/sql-mode.html#sqlmode_only_full_group_by
To get the result you are looking for...
If the (master_id,type,date_sent) tuple is UNIQUE in contact_copy (that is, if for given values of master_id and type, there will be no "duplicate" values of date_sent), we could use a JOIN operation to retrieve the specified result.
First, we write a query to get the "maximum" date_sent for a given master_id and type. For example:
SELECT mc.master_id
, mc.type
, MAX(mc.date_sent) AS max_date_sent
FROM contact_copy mc
WHERE mc.master_id = '00001'
AND mc.type = 'EMAIL'
GROUP BY mc.master_id, mc.type
To retrieve the entire row associated with that "maximum" date_sent, we can use that query as an inline view. That is, wrap the query text in parens, assign an alias, and then reference that as if it were a table, for example:
SELECT c.master_id
, c.date_sent
, c.mailing
FROM ( SELECT mc.master_id
, mc.type
, MAX(mc.date_sent) AS max_date_sent
FROM contact_copy mc
WHERE mc.master_id = '00001'
AND mc.type = 'EMAIL'
GROUP BY mc.master_id, mc.type
) m
JOIN contact_copy c
ON c.master_id = m.master_id
AND c.type = m.type
AND c.date_sent = m.max_date_sent
Note that if there are multiple rows that have the same values of master_id,type and date_sent, there is potential to return more than one row. You could add a LIMIT 1 clause to guarantee that you return only one row; which of those rows is returned is indeterminate, without an ORDER BY clause before the LIMIT clause.

Related

Ordering MySQL 8 results by count existence in a crosswalk table

I have the following MySQL 8 tables:
[submissions]
===
id
submission_type
name
[reject_reasons]
===
id
name
[submission_reject_reasons] -- crosswalk joining the first 2 tables
===
id
submission_id
reject_reason_id
In my application, users can submit submissions, and other users can request changes to those submissions. When they request these rejections, 1+ entries get saved to the submission_reject_reasons table (which stores the ID of the submission for which rejections are requested, as well as the ID of the reason for why the rejection is being made). So a typical entry in the table might look like:
id submission_id reject_reason_id
==============================================
45 384 294
Where submission_id = 384 is the "Fizz Buzz" submission and reject_reason_id = 294 is the "Missing Required Field" reason.
I currently have a query that fetches all the reject_reasons out of the DB:
SELECT * FROM reject_reasons
I now want to modify this query to sort the results based on their usage frequency. Meaning the query might currently return:
294 | Missing Required Field
14 | Malformed Entry
1885 | Makes No Sense
etc. But let's say there are 5 entries in the submission_reject_reasons table where 294 (Missing Required Field) is the reject_reason_id, 15 entries where 1885 (Makes No Sense) is present, and 120 entries where 14 (Malformed Entry) is present. I need a query that returns all reject_reasons sorted by their count in the submission_reject_reasons (SRR) table, descending, so that the most frequently used appear earlier in the sort. Hence the result set would be:
14 | Malformed Entry --> because there are 120 instances of this in the SRR table
1885 | Makes No Sense --> because there are 15 instances in the SRR
294 | Missing Required Field --> because there are only 5 instances in the SRR
Furthermore, I need a ranking from most-used to least-used. If a reason doesn't exist in the SRR table it should have a default "count" of zero (0) but should still come back in the query. If 2+ reason counts are tied, then I don't care how they are sorted. Any ideas here? I need the final result set to only contain the rr.id and rr.name field/values.
My best attempt is not getting me anywhere:
SELECT rr.id, rr.name
FROM reject_reasons AS rr
LEFT JOIN submission_reject_reasons AS srr on rr.id = srr.reject_reason_id
GROUP BY rr.id
ORDER BY COUNT(*) DESC
Can anyone help me over the finish line here? Can anyone spot where I'm going awry? Thanks in advance!
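Grouping by the reject reason ID is right. The problem is COUNT(*): it also counts the NULL-extended row that the LEFT JOIN produces for an unused reason, so unused reasons get a count of 1 instead of 0. Count a column from the crosswalk table instead:
SELECT rr.id, rr.name
FROM reject_reasons AS rr
LEFT JOIN submission_reject_reasons AS srr on rr.id = srr.reject_reason_id
GROUP BY rr.id, rr.name
ORDER BY COUNT(srr.id) DESC
If you only wanted reasons that have been used at least once, there would be no need for any EXISTS check: an INNER JOIN won't return any reject reasons that don't exist in submission_reject_reasons.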
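Since the question asks for unused reasons to still come back with a count of zero, one way to sketch that is a LEFT JOIN combined with COUNT(srr.id), which skips the NULLs produced for unmatched reasons. The snippet below is only an illustration: it assumes SQLite via Python's sqlite3 module instead of MySQL 8, uses much smaller counts than the 120/15/5 in the question, and adds a hypothetical 'Never Used' reason (id 7) that is not in the original post.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL 8 here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reject_reasons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE submission_reject_reasons (
        id INTEGER PRIMARY KEY,
        submission_id INTEGER,
        reject_reason_id INTEGER
    );
    INSERT INTO reject_reasons VALUES
        (294, 'Missing Required Field'),
        (14, 'Malformed Entry'),
        (1885, 'Makes No Sense'),
        (7, 'Never Used');  -- hypothetical reason with no usages
    """
)

# Scaled-down usage counts: 14 is used 3 times, 1885 twice, 294 once.
usage = [14] * 3 + [1885] * 2 + [294]
conn.executemany(
    "INSERT INTO submission_reject_reasons (submission_id, reject_reason_id) "
    "VALUES (?, ?)",
    [(384, reason_id) for reason_id in usage],
)

# COUNT(srr.id) ignores NULLs, so a reason with no crosswalk rows counts as 0.
ranked = conn.execute(
    """
    SELECT rr.id, rr.name
    FROM reject_reasons AS rr
    LEFT JOIN submission_reject_reasons AS srr
           ON rr.id = srr.reject_reason_id
    GROUP BY rr.id, rr.name
    ORDER BY COUNT(srr.id) DESC
    """
).fetchall()

print(ranked)
```

The unused reason sorts last. With COUNT(*) it would tie with 'Missing Required Field' at 1 instead.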

Get rows which are related to the searched row, by specific column

I am trying to write a SQL query for the scenario below.
user_id  nic_number  reg_number  full_name  code
B123     12345       1212        John       123
B124     12346       1213        Peter      124
B125     12347       1214        Darln      125
B123     12345       1212        John       126
B123     12345       1212        John       127
In the subscribers table there can be rows with the same user_id, nic_number, reg_number and full_name, but a different code.
First -> get the user who has the code I typed in the query (I have implemented a query for that and it is working fine).
Second -> then find the rows related to that user (matched on nic_number and reg_number) and display only those related rows. The query below gets the data for code = 123, which shows the first row of the table.
But I need to display only the other rows that have the same nic_number or reg_number as the searched code, each only once.
That means the last 2 rows of the table.
select code,
GROUP_CONCAT(distinct trim(nic_number)) as nic_number,
GROUP_CONCAT(distinct trim(reg_number)) as reg_number,
GROUP_CONCAT(distinct trim(full_name)) as full_name from subscribers
where code like lower(concat('123')) group by code;
I need to implement a SQL query for this scenario by changing the above query (only one query, without joins or triggers).
I have tried this for a long time without getting the result. If anyone can help me get there, it would be very helpful.
You can combine the nic and reg numbers into a single key to get your records.
EDITED to extract only the related rows and not the one searched by code. By the way, code does not seem to be unique in the subscribers table.
select
trim(code) as code,
trim(nic_number) as nic_number,
trim(reg_number) as reg_number,
trim(full_name) as full_name
from
subscribers s1
where
code <> lower(trim('123'))
and concat(trim(nic_number), '|', trim(reg_number)) IN (
select concat(trim(nic_number), '|', trim(reg_number))
from subscribers
where code = lower(trim('123'))
)
I'm not sure why you have specified "without joins" - I get that you may not want to have triggers on a table (which you don't need to achieve this anyway), but a JOIN is standard SQL syntax that will help you achieve the result you are after.
Try:
SELECT
s1.code, s1.nic_number, s1.reg_number, s1.full_name
FROM subscribers s1
INNER JOIN
(
SELECT nic_number, reg_number
FROM subscribers
WHERE code = '123'
) s2
ON s1.nic_number = s2.nic_number
AND s1.reg_number = s2.reg_number
WHERE s1.code <> '123';
Or, if you really need to achieve it with no JOINs at all, then you're just doubling-up the sub-query that you need to include:
SELECT
s1.code, s1.nic_number, s1.reg_number, s1.full_name
FROM subscribers s1
WHERE s1.nic_number IN
(
SELECT nic_number FROM subscribers
WHERE code = '123'
)
AND s1.reg_number IN
(
SELECT reg_number FROM subscribers
WHERE code = '123'
)
AND s1.code <> '123';
The latter query is not necessarily ideal, but it still achieves the desired result.
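To check the JOIN variant against the sample table from the question, here is a small sketch. It assumes SQLite through Python's sqlite3 module rather than MySQL, stores every column as text, and adds an ORDER BY only so the output order is predictable.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all columns are stored as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscribers "
    "(user_id TEXT, nic_number TEXT, reg_number TEXT, full_name TEXT, code TEXT)"
)
# The five sample rows from the question.
conn.executemany(
    "INSERT INTO subscribers VALUES (?, ?, ?, ?, ?)",
    [
        ("B123", "12345", "1212", "John", "123"),
        ("B124", "12346", "1213", "Peter", "124"),
        ("B125", "12347", "1214", "Darln", "125"),
        ("B123", "12345", "1212", "John", "126"),
        ("B123", "12345", "1212", "John", "127"),
    ],
)

searched_code = "123"

# Rows sharing nic_number and reg_number with the searched code,
# excluding the searched row itself.
related = conn.execute(
    """
    SELECT s1.code, s1.nic_number, s1.reg_number, s1.full_name
    FROM subscribers s1
    INNER JOIN ( SELECT nic_number, reg_number
                 FROM subscribers
                 WHERE code = ?
               ) s2
            ON s1.nic_number = s2.nic_number
           AND s1.reg_number = s2.reg_number
    WHERE s1.code <> ?
    ORDER BY s1.code
    """,
    (searched_code, searched_code),
).fetchall()

print(related)
```

Only the rows with code 126 and 127 come back, which matches "the last 2 rows of the table" from the question.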

Complex SQL Select query with inner join

My SQL query needs to return a list of values alongside the date, but with my limited knowledge I have only been able to get this far.
This is my SQL:
select lsu_students.student_grouping,lsu_attendance.class_date,
count(lsu_attendance.attendance_status) AS count
from lsu_attendance
inner join lsu_students
ON lsu_students.student_grouping="Central1A"
and lsu_students.student_id=lsu_attendance.student_id
where lsu_attendance.attendance_status="Present"
and lsu_attendance.class_date="2015-02-09";
This returns:
student_grouping class_date count
Central1A 2015-02-09 23
I want it to return:
student_grouping class_date count
Central1A 2015-02-09 23
Central1A 2015-02-10 11
Central1A 2015-02-11 21
Central1A 2015-02-12 25
This query gets the list of the dates according to the student grouping:
select distinct(class_date)from lsu_attendance,lsu_students
where lsu_students.student_grouping like "Central1A"
and lsu_students.student_id = lsu_attendance.student_id
order by class_date
I think you just want a group by:
select s.student_grouping, a.class_date, count(a.attendance_status) AS count
from lsu_attendance a inner join
lsu_students s
ON s.student_grouping = 'Central1A' and
s.student_id = a.student_id
where a.attendance_status = 'Present'
group by s.student_grouping, a.class_date;
Comments:
Use single quotes for string constants, unless you have a good reason not to.
If you want a range of class dates, then use a where with appropriate filtering logic.
Notice the table aliases. The query is easier to write and to read.
I added student grouping to the group by. This would be required by any SQL engine other than MySQL.
Just take out the condition and lsu_attendance.class_date="2015-02-09" (or change it to a range), and then add GROUP BY lsu_students.student_grouping, lsu_attendance.class_date at the end.
The group by clause is what you're looking for, to limit aggregates (e.g. the count function) to work within each group.
To get the number of students present in each group on each date, you would do something like this:
select student_grouping, class_date, count(*) as present_count
from lsu_students join lsu_attendance using (student_id)
where attendance_status = 'Present'
group by student_grouping, class_date
Note: for your example, using is simpler than on (if your SQL supports it), and putting the table name before each field name isn't necessary if the column name doesn't appear in more than one table (though it doesn't hurt).
If you want to limit which data rows get included, put your constraints in the where clause (this constrains which rows are counted). If you want to constrain the aggregate values that are displayed, you have to use the having clause. For example, to see the count of Central1A students present each day, but only display those dates where more than 20 students showed up:
select student_grouping, class_date, count(*) as present_count
from lsu_students join lsu_attendance using (student_id)
where attendance_status = 'Present' and student_grouping = 'Central1A'
group by student_grouping, class_date
having count(*) > 20
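The difference between where and having is easier to see on a tiny data set. The sketch below assumes SQLite via Python's sqlite3 module in place of MySQL, and the students, dates and the lowered threshold (more than 1 instead of more than 20) are all made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lsu_students (student_id INTEGER, student_grouping TEXT)")
conn.execute(
    "CREATE TABLE lsu_attendance "
    "(student_id INTEGER, class_date TEXT, attendance_status TEXT)"
)
conn.executemany(
    "INSERT INTO lsu_students VALUES (?, ?)",
    [(1, "Central1A"), (2, "Central1A"), (3, "Central1A"), (4, "North2B")],
)
conn.executemany(
    "INSERT INTO lsu_attendance VALUES (?, ?, ?)",
    [
        (1, "2015-02-09", "Present"),
        (2, "2015-02-09", "Present"),
        (3, "2015-02-09", "Present"),
        (4, "2015-02-09", "Present"),
        (1, "2015-02-10", "Present"),
        (2, "2015-02-10", "Absent"),
        (3, "2015-02-10", "Absent"),
    ],
)

# WHERE decides which rows are counted; GROUP BY gives one count per date.
per_day = conn.execute(
    """
    SELECT student_grouping, class_date, COUNT(*) AS present_count
    FROM lsu_students JOIN lsu_attendance USING (student_id)
    WHERE attendance_status = 'Present' AND student_grouping = 'Central1A'
    GROUP BY student_grouping, class_date
    ORDER BY class_date
    """
).fetchall()

# HAVING filters the groups after the counts have been computed.
busy_days = conn.execute(
    """
    SELECT student_grouping, class_date, COUNT(*) AS present_count
    FROM lsu_students JOIN lsu_attendance USING (student_id)
    WHERE attendance_status = 'Present' AND student_grouping = 'Central1A'
    GROUP BY student_grouping, class_date
    HAVING COUNT(*) > 1
    ORDER BY class_date
    """
).fetchall()

print(per_day)
print(busy_days)
```

The first query returns one row per class date; the second drops the date on which only one Central1A student was present.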

MySQL ORDER BY Column = value AND distinct?

I'm getting grey hair by now...
I have a table like this.
ID - Place - Person
1 - London - Anna
2 - Stockholm - Johan
3 - Gothenburg - Anna
4 - London - Nils
And I want to get the result where all the different persons are included, but I want to choose which Place to order by.
For example. I want to get a list where they are ordered by LONDON and the rest will follow, but distinct on PERSON.
Output like this:
ID - Place - Person
1 - London - Anna
4 - London - Nils
2 - Stockholm - Johan
Tried this:
SELECT ID, Person
FROM users
ORDER BY FIELD(Place,'London'), Person ASC
But it gives me:
ID - Place - Person
1 - London - Anna
4 - London - Nils
3 - Gothenburg - Anna
2 - Stockholm - Johan
And I really don't want Anna, or any person, to be in the result more than once.
This is one way to get the specified output, but this uses MySQL specific behavior which is not guaranteed:
SELECT q.ID
, q.Place
, q.Person
FROM ( SELECT IF(p.Person<=>@prev_person,0,1) AS r
, @prev_person := p.Person AS person
, p.Place
, p.ID
FROM users p
CROSS
JOIN (SELECT @prev_person := NULL) i
ORDER BY p.Person, !(p.Place<=>'London'), p.ID
) q
WHERE q.r = 1
ORDER BY !(q.Place<=>'London'), q.Person
This query uses an inline view to return all the rows in a particular order, by Person, so that all of the 'Anna' rows are together, followed by all the 'Johan' rows, etc. The set of rows for each person is ordered with Place='London' first, then by ID.
The "trick" is to use a MySQL user variable to compare the values from the current row with values from the previous row. In this example, we're checking if the 'Person' on the current row is the same as the 'Person' on the previous row. Based on that check, we return a 1 if this is the "first" row we're processing for a person, otherwise we return a 0.
The outermost query processes the rows from the inline view, and excludes all but the "first" row for each Person (the 0 or 1 we returned from the inline view.)
(This isn't the only way to get the resultset. But this is one way of emulating analytic functions which are available in other RDBMS.)
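For comparison, in databases that support window functions (MySQL 8.0 and later included), we could use SQL something like this. The row number has to be computed in an inline view, because a window function result can't be referenced in the WHERE clause of the same query block:
SELECT q.ID
, q.Place
, q.Person
FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY t.Person ORDER BY
CASE WHEN t.Place='London' THEN 0 ELSE 1 END, t.ID) AS rn
, t.ID
, t.Place
, t.Person
FROM users t
) q
WHERE q.rn=1
ORDER BY CASE WHEN q.Place='London' THEN 0 ELSE 1 END, q.Person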
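Here is a runnable sketch of the ROW_NUMBER() idea on the four sample rows from the question. It assumes SQLite 3.25 or later (the first version with window functions) through Python's sqlite3 module, not MySQL, and it computes the row number in an inline view so the outer query can filter on it. The aliases ranked and rn are just illustrative names.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (ID INTEGER, Place TEXT, Person TEXT)")
# The four sample rows from the question.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "London", "Anna"),
        (2, "Stockholm", "Johan"),
        (3, "Gothenburg", "Anna"),
        (4, "London", "Nils"),
    ],
)

# Number each person's rows with any London row first, keep only row 1,
# then list the London rows ahead of everyone else.
one_per_person = conn.execute(
    """
    SELECT ID, Place, Person
    FROM ( SELECT t.ID, t.Place, t.Person,
                  ROW_NUMBER() OVER (
                      PARTITION BY t.Person
                      ORDER BY CASE WHEN t.Place = 'London' THEN 0 ELSE 1 END,
                               t.ID
                  ) AS rn
           FROM users t
         ) ranked
    WHERE rn = 1
    ORDER BY CASE WHEN Place = 'London' THEN 0 ELSE 1 END, Person
    """
).fetchall()

print(one_per_person)
```

Each person appears once, and Anna is represented by her London row rather than the Gothenburg one.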
Followup
At the beginning of the answer, I referred to MySQL behavior that was not guaranteed. I was referring to the usage of MySQL User-Defined variables within a SQL statement.
Excerpts from MySQL 5.5 Reference Manual http://dev.mysql.com/doc/refman/5.5/en/user-variables.html
"As a general rule, other than in SET statements, you should never assign a value to a user variable and read the value within the same statement."
"For other statements, such as SELECT, you might get the results you expect, but this is not guaranteed."
"the order of evaluation for expressions involving user variables is undefined."
Try this:
SELECT ID, Place, Person
FROM users
GROUP BY Person
ORDER BY FIELD(Place,'London') DESC, Person ASC;
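You want to use group by instead of distinct:
SELECT MIN(ID) AS ID, Person
FROM users
GROUP BY Person
ORDER BY MAX(FIELD(Place, 'London')) DESC, Person ASC;
Grouping by Person alone returns one row per person, much like SELECT DISTINCT Person, but you are also allowed to use aggregates over other fields in clauses such as HAVING and ORDER BY. Note that MIN(ID) is each person's lowest ID, which is not necessarily the ID of their London row.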

mysql query orderby groupby,

I have a table with the following key fields: commentid, comment, time, parentid
The comments are of two types, regular and replies. Regular have parentid 0; replies have parentid equal to the commentid they reply to.
I thought I had this figured out with group by and orderby but my idea didn't work and I'm now struggling as my knowledge of sql is rudimentary.
I want it to display by time DESC, except that I would like those with the same parentid grouped together and, within that parentid, also sorted by time. I am allowing only one level of reply.
As an example, I would like following order:
time3 commentid3 parentid0
time2 commentid2 parentid0
parentid2 commentid4 time4 (this one is a reply)
parentid2 commentid5 time5 (this one also reply)
time1 comment1 parentid0
I tried SELECT * from comments GROUP BY parentid ORDER BY TIME DESC but this did not work. If needed, I can add another column. Any suggestions would be appreciated! Thx.
I'm making a few assumptions here. I'm assuming your commentid is an auto-incrementing id, which means insert order runs from oldest to newest. This will not work if you are not using auto-incrementing ids or if you have some kind of partial-save functionality with these tables, so it's kind of fragile.
I'm also assuming parentid is 0 for a regular (parent) comment, as you describe.
SELECT commentid, comment, time, parentid, if(parentid = 0, commentid, parentid) thread
FROM comments
ORDER BY thread desc, time asc
To add some more information:
Group By is not what you want to use because it will group all rows by the grouping column into one row. Group By is usually used for aggregate calculations such as counting or summing the values in rows, etc.
EDIT:
I updated the query to sort by time asc, which will put the regular comment first and then the replies below the parent comment from oldest to newest.
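To see the "thread" ordering in action, here is a minimal sketch on the five comments from the question's example. It assumes SQLite via Python's sqlite3 module, so the MySQL if() is written as a CASE expression, and the integer time values are made up to mirror time1 through time5.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; time is a made-up integer clock.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments "
    "(commentid INTEGER PRIMARY KEY, comment TEXT, time INTEGER, parentid INTEGER)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?)",
    [
        (1, "first", 1, 0),
        (2, "second", 2, 0),
        (3, "third", 3, 0),
        (4, "reply to 2", 4, 2),
        (5, "another reply to 2", 5, 2),
    ],
)

# A regular comment is its own thread; a reply belongs to its parent's thread.
# Newest thread first, then oldest to newest inside each thread.
rows = conn.execute(
    """
    SELECT commentid,
           CASE WHEN parentid = 0 THEN commentid ELSE parentid END AS thread
    FROM comments
    ORDER BY thread DESC, time ASC
    """
).fetchall()

display_order = [commentid for commentid, _thread in rows]
print(display_order)
```

The resulting order is comment 3, then comment 2 followed by its replies 4 and 5, then comment 1. It only works because commentid grows with time, which is the fragile assumption mentioned above.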
You can't retrieve a "tree-like" result from a SQL query (well, there are ways, but they don't seem practical here).
What you can do is retrieve all the data for regular comments and replies (which implies that "regular" data will be repeated when a comment has many replies, and you'll have to process the rows after fetching them to get a "tree" result).
The result will look like this:
regular.time, regular.commentid, replies.commentid, replies.time
If a regular comment has no replies, replies.commentid and replies.time will be null.
The query (with a "self left join") would look like this (I removed some fields that seem to be useless now):
select
regular.time as regulartime,
regular.commentid as regularid,
replies.commentid as repliesid,
replies.time as repliestime
from comments regular
left join comments replies on replies.parentid = regular.commentid
where regular.parentid = 0
order by regular.time desc, replies.time asc
Following your example, you should get
time3 commentid3 null null
time2 commentid2 commentid4 time4 (this one is a reply)
time2 commentid2 commentid5 time5 (this one also reply)
time1 commentid1 null null
To get both layers of the data, you will need a UNION of just the top-most layer of the original comment, and another for any POSSIBLE replies. The first column of the first part of the query will hold a 1 or 2 for sorting purposes. This will be used to float the original post to the top of the group for a given question... then, all replies will show in natural order after that.
Also, to retain the proper grouping by original date/time, I am preserving the original post comment time with the '2' CommentType records so they do stay grouped with exact same original time start basis, but grab the actual comment and time of the RESPONSE (alias "r") for their respective sorting.
select
PreQuery.*
from
( select
'1' as CommentType,
c.Time as OriginalTime,
c.CommentID StartingCommentID,
c.Comment,
c.Time as LastTime,
c.CommentID as EndCommentID
from
comments c
where
c.ParentID = 0
UNION ALL
select
'2' as CommentType,
c.Time as OriginalTime,
c.CommentID StartingCommentID,
r.Comment,
r.Time as LastTime,
r.CommentID as EndCommentID
from
comments c
join comments r
on c.CommentID = r.ParentID
where
c.ParentID = 0 ) PreQuery
order by
PreQuery.OriginalTime DESC,
PreQuery.StartingCommentID,
PreQuery.CommentType,
PreQuery.LastTime
This should give you the results I think you are looking for (slightly modified)
CommentType OriginalTime StartingCommentID Comment LastTime EndCommentID
1 Time3 ID3 Comm3 Time3 ID3 <-- ID 3 IS the start
1 Time2 ID2 Comm2 Time2 ID2 <-- ID 2 is the start of next
2 Time2 ID2 Comm4 Time4 ID4 <- ID4 is reply to orig ID2
2 Time2 ID2 Comm5 Time5 ID5 <- another reply to ID2
1 Time1 ID1 Comm1 Time1 ID1 <-- start of new comment ID1
So, for all rows, the 2nd and 3rd columns will always represent the parent ID that is starting the first comment... and for those with comment type = 1, the comment, last time and end comment ID is the actual content from the starting comment. For comment type = 2, the final comment, last time and end comment will be the ID of the RESPONSE record.
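A quick way to convince yourself that the UNION ALL ordering does not depend on the comment IDs being chronological is to run it on data where they are not. The sketch below assumes SQLite via Python's sqlite3 module instead of MySQL, trims the select list to the columns needed for sorting, and uses made-up IDs and integer times.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. IDs are deliberately NOT in time
# order, and the integer times are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments "
    "(CommentID INTEGER, Comment TEXT, Time INTEGER, ParentID INTEGER)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?)",
    [
        (10, "oldest post", 1, 0),
        (7, "middle post", 2, 0),
        (3, "newest post", 3, 0),
        (8, "reply to 7", 4, 7),
        (9, "another reply to 7", 5, 7),
    ],
)

# Type 1 rows are the original posts, type 2 rows are replies that carry
# their parent's time so each thread sorts as one block.
rows = conn.execute(
    """
    SELECT CommentType, EndCommentID
    FROM ( SELECT 1 AS CommentType,
                  c.Time AS OriginalTime,
                  c.CommentID AS StartingCommentID,
                  c.Time AS LastTime,
                  c.CommentID AS EndCommentID
           FROM comments c
           WHERE c.ParentID = 0
           UNION ALL
           SELECT 2, c.Time, c.CommentID, r.Time, r.CommentID
           FROM comments c
           JOIN comments r ON c.CommentID = r.ParentID
           WHERE c.ParentID = 0
         ) PreQuery
    ORDER BY OriginalTime DESC, StartingCommentID, CommentType, LastTime
    """
).fetchall()

print(rows)
```

The newest post comes first, the middle post is followed by its two replies in time order, and the oldest post comes last, even though its ID is the highest.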