GROUP BY with MAX date field - erratic results - mysql

I have a table containing form data. Each row contains a section_id and field_id. There are 50 distinct fields for each section. When a user updates an existing field, a new row is inserted with an updated date_modified. This keeps a rolling archive of changes.
The problem is that I'm getting erratic results when pulling the most recent set of fields to display on a page.
I've narrowed down the problem to a couple of fields, and have recreated a portion of the table in question on SQLFiddle.
Schema:
CREATE TABLE IF NOT EXISTS `cTable` (
`section_id` int(5) NOT NULL,
`field_id` int(5) DEFAULT NULL,
`content` text,
`user_id` int(11) NOT NULL,
`date_modified` datetime NOT NULL,
KEY `section_id` (`section_id`),
KEY `field_id` (`field_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This query shows all previously edited rows for field_id 39. There are five rows returned:
SELECT cT.*
FROM cTable cT
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Here's what I'm trying to do to pull the most recent row for field_id 39. No rows returned:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Record Count: 0;
If I try the same query on a different field_id, say 54, I get the correct result:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=54;
Record Count: 1;
Why would same query work on one field_id, but not the other?

In the subquery that computes the maxima, you need to GROUP BY section_id, field_id. Grouping by field_id alone ignores the section id, which is the column your outer query filters on, so the maximum date can come from a different section.
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT section_id,field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY section_id,field_id
) AS max
ON(max.field_id =cT.field_id
AND max.date_modified=cT.date_modified
AND max.section_id=cT.section_id
)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
See Fiddle Demo
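To see why grouping by field_id alone can return nothing, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The four rows are invented for illustration (they are not the asker's data), and the schema is trimmed to the columns that matter.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cTable ("
    " section_id INTEGER NOT NULL,"
    " field_id INTEGER,"
    " content TEXT,"
    " date_modified TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO cTable VALUES (?, ?, ?, ?)",
    [
        (123, 39, "old", "2014-01-01 10:00:00"),
        (123, 39, "latest in 123", "2014-01-02 10:00:00"),
        # Another section edited field 39 later, so it owns the overall maximum.
        (456, 39, "other section", "2014-01-03 10:00:00"),
        (123, 54, "only edit", "2014-01-01 09:00:00"),
    ],
)

# Maxima grouped by field_id only: the newest date for field 39 belongs to section 456.
by_field = conn.execute(
    """
    SELECT cT.section_id, cT.field_id, cT.content
    FROM cTable cT
    JOIN (SELECT field_id, MAX(date_modified) AS date_modified
          FROM cTable GROUP BY field_id) AS m
      ON m.field_id = cT.field_id AND m.date_modified = cT.date_modified
    WHERE cT.section_id = 123 AND cT.field_id = 39
    """
).fetchall()

# Maxima grouped by section_id and field_id: each section keeps its own newest row.
by_section_and_field = conn.execute(
    """
    SELECT cT.section_id, cT.field_id, cT.content
    FROM cTable cT
    JOIN (SELECT section_id, field_id, MAX(date_modified) AS date_modified
          FROM cTable GROUP BY section_id, field_id) AS m
      ON m.section_id = cT.section_id
     AND m.field_id = cT.field_id
     AND m.date_modified = cT.date_modified
    WHERE cT.section_id = 123 AND cT.field_id = 39
    """
).fetchall()

print(by_field)              # no rows: the joined maximum is in section 456
print(by_section_and_field)  # the newest row of section 123
```

With this data the first query finds the section 456 row in the join and then discards it in the WHERE clause, which matches the "works for one field_id but not the other" symptom: field 54 only works because its newest edit happens to be in section 123.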

You are looking for the max(date_modified) per field_id. But you should look for the max(date_modified) per field_id where the section_id is 123. Otherwise you may find a date for which you find no match later.
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable
WHERE section_id = 123
GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Here is the SQL fiddle: http://www.sqlfiddle.com/#!2/0cefd8/19.

Related

MySQL JOIN/AGGREGATE function output

I have two tables in my database :
select * from marks;
select * from subjects;
I need to find the id of the students who got the highest marks in each subject along with the subject name, i.e., Resultset should have 3 columns:
student_id | subject_name | maximum_marks
1 PHYSICS 97.5
2 CHEMISTRY 98.5
Please help me write the query for the above result set
This is what I've tried so far
select m.student_id, s.subject_name, max(m.marks) as maximum_marks from
marks m inner join subjects s
on m.subject_id=s.subject_id
group by m.subject_id;
OUTPUT:
SQL Fiddle Demo
select m.student_id, s.subject_name, m.max_marks
from subjects s join (
select student_id,subject_id, max(marks) as max_marks
from marks
group by student_id,subject_id
order by 3 desc
) as m
on s.subject_id = m.subject_id
group by s.subject_id
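Schema and sample data follow. The query above only runs with ONLY_FULL_GROUP_BY disabled, and in that mode MySQL does not guarantee which student_id it picks for each group.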
CREATE TABLE IF NOT EXISTS `marks` (
`student_id` int(6) NOT NULL,
`subject_id` int(6) NOT NULL,
`marks` float NOT NULL
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `subjects` (
`subject_id` int(6) NOT NULL,
`subject_name` varchar(10) NOT NULL
) DEFAULT CHARSET=utf8;
INSERT INTO `marks` (`student_id`, `subject_id`, `marks`) VALUES
(1,1,97.5),(1,2,92.5),
(2,1,90.5),(2,2,98.5),
(3,1,90.5),(3,2,67.5),
(4,1,80.5),(4,2,97.5);
INSERT INTO `subjects` (`subject_id`, `subject_name`) VALUES
(2,"Chemistry"),(1,"Physics");
I've found a slightly better solution. This is a common use case for correlated subqueries, and the output can be achieved without a GROUP BY.
select m1.student_id, m1.subject_id, m1.marks, s.subject_name
from marks m1 inner join subjects s
on m1.subject_id=s.subject_id
where m1.marks=
(select max(marks) from marks m2 where m1.subject_id=m2.subject_id);
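The correlated subquery can be checked against the sample data from this thread. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (the column types are SQLite equivalents, which is an assumption of the sketch) and returns the three columns the question asked for.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the data is the sample from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE marks (
        student_id INTEGER NOT NULL,
        subject_id INTEGER NOT NULL,
        marks REAL NOT NULL
    );
    CREATE TABLE subjects (
        subject_id INTEGER NOT NULL,
        subject_name TEXT NOT NULL
    );
    INSERT INTO marks VALUES
        (1,1,97.5),(1,2,92.5),
        (2,1,90.5),(2,2,98.5),
        (3,1,90.5),(3,2,67.5),
        (4,1,80.5),(4,2,97.5);
    INSERT INTO subjects VALUES (2,'Chemistry'),(1,'Physics');
    """
)

# For every row, keep it only if its mark equals the maximum for its own subject.
top = conn.execute(
    """
    SELECT m1.student_id, s.subject_name, m1.marks
    FROM marks m1
    JOIN subjects s ON s.subject_id = m1.subject_id
    WHERE m1.marks = (SELECT MAX(m2.marks)
                      FROM marks m2
                      WHERE m2.subject_id = m1.subject_id)
    ORDER BY m1.subject_id
    """
).fetchall()

print(top)  # [(1, 'Physics', 97.5), (2, 'Chemistry', 98.5)]
```

Note that if two students tie for the top mark in a subject, this form returns both of them, which is usually what is wanted.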

MySQL: different ORDER BY result between INNER JOIN query and EXISTS query

I have 2 tables in the database
User table
has columns (name, name_ar, ...)
User Profile table
has columns (user_id, office_id, address, mobile, ...)
The relationship between the two tables is one-to-one.
Now, I'm trying to filter users by their office and order them by name_ar.
I tried two different queries to do this and I expect the same result from the two queries but the result is different in order.
SELECT
`id`, `name_ar`
FROM
`users`
WHERE EXISTS
(
SELECT
*
FROM
`user_profiles`
WHERE
`users`.`id` = `user_profiles`.`user_id` AND `office_id` = 1
) AND(
`group` = "doctor" AND `state` = "active"
) AND `users`.`deleted_at` IS NULL
ORDER BY
`name_ar` IS NULL, `name_ar` ASC
SELECT
`u`.`id`,
`name_ar`
FROM
`users` u
INNER JOIN `user_profiles` up ON
`u`.`id` = `up`.`user_id`
WHERE
`group` = "doctor" AND `state` = "active" AND `up`.`office_id` = 1
ORDER BY
`name_ar` IS NULL, `name_ar` ASC
The two results stop matching at the point where NULL values start to appear in the name_ar column (from the fifth row onward the order differs between the two results). Can anyone explain why this happens? Is it because of the NULL values, or is there another reason?
The 1st condition of the ORDER BY clause:
`name_ar` IS NULL
sends all nulls to the end of the results.
The 2nd:
`name_ar` ASC
sorts the non-null names alphabetically, but the null names at the end all compare as equal, so there is no defined order among them and each query plan is free to return them differently.
What you can do is add another final condition, like:
`id` ASC
so you have all the nulls (and duplicate names if they exist) sorted by id:
ORDER BY `name_ar` IS NULL, `name_ar`, `id`
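A small sketch of the tie-breaker, run with Python's sqlite3 module as a stand-in for MySQL. The table is reduced to two columns and the names are placeholders invented for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; ids and names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name_ar TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(5, None), (2, "b"), (4, None), (1, "a"), (3, "a")],
)

# 1) non-null names first, 2) alphabetical, 3) id breaks every remaining tie,
# so rows with NULL or duplicate names always come back in the same order.
ordered = conn.execute(
    "SELECT id, name_ar FROM users ORDER BY name_ar IS NULL, name_ar, id"
).fetchall()

print(ordered)  # [(1, 'a'), (3, 'a'), (2, 'b'), (4, None), (5, None)]
```

Because id is unique, the full sort key is unique too, so any query shape that selects the same rows should produce the same order.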

How to get max date of group of rows

I am looking to get the latest date of a select statement inside a select statement. I am using Hibernate, so there are limitations to normal MySQL such as not being able to have the select statement in the from area or inside MAX.
Here is a test structure:
CREATE TABLE User (
username varchar(20) NOT NULL PRIMARY KEY,
locationId int(10) NOT NULL
);
CREATE TABLE UserRecords (
id int(10) NOT NULL AUTO_INCREMENT PRIMARY KEY,
username varchar(20) NOT NULL,
recordDate datetime NOT NULL
);
INSERT INTO User VALUES ('test',1);
INSERT INTO User VALUES ('test2',2);
INSERT INTO User VALUES ('test3',1);
INSERT INTO UserRecords VALUES (null,'test','2018-02-10 14:29:40');
INSERT INTO UserRecords VALUES (null,'test2','2018-03-11 12:21:10');
INSERT INTO UserRecords VALUES (null,'test3','2018-05-18 11:11:15');
INSERT INTO UserRecords VALUES (null,'test','2018-06-20 16:58:50');
This is what I am after. It works in regular MySQL, but doesn't work in Hibernate:
SELECT
u.locationId,
MAX(
SELECT
MAX(ur.recordDate)
FROM
UserRecords
WHERE
ur.username=u.username
)
FROM
User u
GROUP BY
u.locationId
The closest I can get is by just listing the max dates of each user and then have to parse them after.
SELECT
u.locationId,
GROUP_CONCAT(
CONCAT('''',
SELECT
MAX(ur.recordDate)
FROM
UserRecords
WHERE
ur.username=u.username
, '''')
)
FROM
User u
GROUP BY
u.locationId
This is really stripped down, but hopefully you get the idea.
It looks like you're trying to get the max record date per location id, which can be achieved by joining against a nested subquery.
Max record date per location id:
SELECT
u.locationId,
Max(urRecordDate.maxRecordDate)
FROM User u
INNER JOIN
(SELECT
ur.username,
MAX(ur.recordDate) AS maxRecordDate
FROM UserRecords ur
GROUP BY ur.username) AS urRecordDate
ON u.username = urRecordDate.username
GROUP BY u.locationId
Each user's max record date and locationId:
SELECT
u.locationId,
urRecordDate.maxRecordDate
FROM User u
INNER JOIN
(SELECT
ur.username,
MAX(ur.recordDate) AS maxRecordDate
FROM UserRecords ur
GROUP BY ur.username) AS urRecordDate
ON u.username = urRecordDate.username
Both can be run using native SQL queries in Hibernate.
Another approach:
select u.locationId, ur.recordDate
FROM User u
JOIN UserRecords ur on (ur.username = u.username)
ORDER BY ur.recordDate desc
LIMIT 1;
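For comparison, here is a sketch run against the test data from the question with Python's sqlite3 module standing in for MySQL. It uses a plain join with GROUP BY, which is a different formulation from the answers above: it needs no subquery in the FROM clause, so it may be easier to express under the Hibernate restrictions described in the question (that part is an assumption, since no Hibernate mapping is shown). It also shows what the ORDER BY ... LIMIT 1 variant returns.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the data is the test structure from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE User (
        username TEXT NOT NULL PRIMARY KEY,
        locationId INTEGER NOT NULL
    );
    CREATE TABLE UserRecords (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL,
        recordDate TEXT NOT NULL
    );
    INSERT INTO User VALUES ('test',1),('test2',2),('test3',1);
    INSERT INTO UserRecords VALUES
        (null,'test','2018-02-10 14:29:40'),
        (null,'test2','2018-03-11 12:21:10'),
        (null,'test3','2018-05-18 11:11:15'),
        (null,'test','2018-06-20 16:58:50');
    """
)

# The max over all records of all users in a location equals the max of the
# per-user maxima, so one join plus GROUP BY is enough.
per_location = conn.execute(
    """
    SELECT u.locationId, MAX(ur.recordDate)
    FROM User u
    JOIN UserRecords ur ON ur.username = u.username
    GROUP BY u.locationId
    ORDER BY u.locationId
    """
).fetchall()

# ORDER BY ... LIMIT 1 yields only the single newest record overall.
newest_overall = conn.execute(
    """
    SELECT u.locationId, ur.recordDate
    FROM User u
    JOIN UserRecords ur ON ur.username = u.username
    ORDER BY ur.recordDate DESC
    LIMIT 1
    """
).fetchall()

print(per_location)    # one row per location
print(newest_overall)  # one row in total
```

The LIMIT 1 form is only equivalent when a single location is of interest; for one row per location the grouped query is the one to use.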

Update mysql table based with group_concat

UPDATE BELOW!
Who can help me out
I have a table:
CREATE TABLE `group_c` (
`parent_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`child_id` int(11) DEFAULT NULL,
`number` int(11) DEFAULT NULL,
PRIMARY KEY (`parent_id`)
) ENGINE=InnoDB;
INSERT INTO group_c(parent_id,child_id)
VALUES (1,1),(2,2),(3,3),(4,1),(5,4),(6,4),(7,6),(8,1),(9,2),(10,1),(11,1),(12,1),(13,0);
I want to update the number field to 1 for each child that has multiple parents:
SELECT group_concat(parent_id), count(*) as c FROM group_c group by child_id having c>1
Result:
GROUP_CONCAT(PARENT_ID) C
12,11,10,8,1,4 6
9,2 2
6,5 2
So all rows with parent_id 12,11,10,8,1,4,9,2,6,5 should be updated to number =1
I've tried something like:
UPDATE group_c SET number=1 WHERE FIND_IN_SET(parent_id, SELECT pid FROM (select group_concat(parent_id), count(*) as c FROM group_c group by child_id having c>1));
but that is not working.
How can I do this?
SQLFIDDLE: http://sqlfiddle.com/#!2/acb75/5
[edit]
I tried to make the example simple but the real thing is a bit more complicated since I'm grouping by multiple fields. Here is a new fiddle: http://sqlfiddle.com/#!2/7aed0/11
Why use GROUP_CONCAT() and then try to do something with its result via FIND_IN_SET()? That's not how SQL is intended to work. You can use a simple JOIN to retrieve your records:
SELECT
parent_id
FROM
group_c
INNER JOIN
(SELECT
child_id,
count(*) as c
FROM
group_c
group by
child_id
having c>1) AS childs
ON childs.child_id=group_c.child_id
Check your modified demo. If you want an UPDATE, then just use:
UPDATE
group_c
INNER JOIN
(SELECT
child_id,
count(*) as c
FROM
group_c
group by
child_id
having c>1) AS childs
ON childs.child_id=group_c.child_id
SET
group_c.number=1
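The effect of the update can be checked on the sample data with the sketch below, which uses Python's sqlite3 module as a stand-in. SQLite has no MySQL-style multi-table UPDATE ... INNER JOIN, so the sketch swaps in an IN subquery that selects the same child_id values; MySQL generally rejects a subquery that reads the table being updated, which is one reason the answer joins against a derived table instead.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE group_c ("
    " parent_id INTEGER PRIMARY KEY,"
    " child_id INTEGER,"
    " number INTEGER)"
)
conn.executemany(
    "INSERT INTO group_c (parent_id, child_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (4, 1), (5, 4), (6, 4), (7, 6),
     (8, 1), (9, 2), (10, 1), (11, 1), (12, 1), (13, 0)],
)

# Flag every row whose child_id occurs more than once.
# (SQLite form; the MySQL answer expresses the same condition as a join.)
conn.execute(
    """
    UPDATE group_c
    SET number = 1
    WHERE child_id IN (SELECT child_id
                       FROM group_c
                       GROUP BY child_id
                       HAVING COUNT(*) > 1)
    """
)

flagged = [row[0] for row in conn.execute(
    "SELECT parent_id FROM group_c WHERE number = 1 ORDER BY parent_id"
)]
print(flagged)  # [1, 2, 4, 5, 6, 8, 9, 10, 11, 12]
```

The flagged parent_id values are the same ones the question derived from its GROUP_CONCAT output, with no string parsing involved.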
For anyone interested. This is how I solved it. It's in two queries but in my case it's not really an issue.
UPDATE group_c INNER JOIN (
SELECT parent_id, count( * ) AS c
FROM `group_c`
GROUP BY child1,child2
HAVING c >1
) AS cc ON cc.parent_id = group_c.parent_id
SET group_c.number =1 WHERE number =0;
UPDATE group_c INNER JOIN group_c as gc ON
(gc.child1=group_c.child1 AND gc.child2=group_c.child2 AND gc.number=1)
SET group_c.number=1;
fiddle: http://sqlfiddle.com/#!2/46d0b4/1/0
Here's a similar solution...
UPDATE group_c a
JOIN
( SELECT DISTINCT x.child_id candidate
FROM group_c x
JOIN group_c y
ON y.child_id = x.child_id
AND y.parent_id < x.parent_id
) b
ON b.candidate = a.child_id
SET number = 1;
http://sqlfiddle.com/#!2/bc532/1

Is there a more efficient way to write this query?

Ok imagine the following DB structure
USERS:
id | name | company_id
1 John 1
2 Jane 1
3 Jack 2
4 Jill 3
COMPANIES:
id | name
1 CompanyA
2 CompanyB
3 CompanyC
4 CompanyD
First I want to SELECT all the companies that have more than one user
SELECT
`c`.`name`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
Easy enough. Now I want to SELECT all the users that belong to a company that has more than one user. I have this combined query, but I think it is not efficient:
SELECT * FROM `users` WHERE `company_id` = (
SELECT
`c`.`id`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
)
Basically I take the id returned from the first query (companies that have more than 1 user) and then query the users table to find all users with that company.
Why not
SELECT * FROM users u GROUP BY u.company_id HAVING COUNT(u.id) > 1
You don't really need any information from the companies table according to the data you say needs returning. "Now I want to SELECT all the users that belong to a company that has more than one user."
try this:
SELECT u.id,u.name,u.company_id FROM users u
inner join companies c on u.company_id = c.id
group by c.id
having count(u.id) > 1
Simplest way to get the users only is probably to keep the subquery but eliminate the join; since it's not a correlated subquery, it should be fairly efficient (obviously an index on company_id helps here);
SELECT u.* FROM USERS u WHERE company_id IN (
SELECT company_id FROM USERS GROUP BY company_id HAVING COUNT(*)>1
);
You could for example rewrite it as a LEFT JOIN, but I suspect it will actually be less efficient since you'd most likely need to use a DISTINCT when using a JOIN;
SELECT DISTINCT u.*
FROM USERS u
LEFT JOIN USERS u2
ON u.company_id=u2.company_id AND u.id<>u2.id
WHERE u2.id IS NOT NULL;
An SQLfiddle to test both.
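One thing worth checking on the sample data is how the IN subquery differs from putting GROUP BY ... HAVING directly on users, as suggested earlier in the thread. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; like MySQL with ONLY_FULL_GROUP_BY disabled, SQLite accepts non-aggregated columns in a grouped SELECT, which is what makes the comparison possible.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are the USERS table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
    INSERT INTO users VALUES (1,'John',1),(2,'Jane',1),(3,'Jack',2),(4,'Jill',3);
    """
)

# Grouping users directly collapses each company to a single output row,
# so only one (arbitrary) member of company 1 comes back.
collapsed = conn.execute(
    "SELECT id, name, company_id FROM users "
    "GROUP BY company_id HAVING COUNT(id) > 1"
).fetchall()

# The IN subquery finds the qualifying company ids first,
# then returns every user that belongs to one of them.
all_members = conn.execute(
    """
    SELECT id, name, company_id
    FROM users
    WHERE company_id IN (SELECT company_id
                         FROM users
                         GROUP BY company_id
                         HAVING COUNT(*) > 1)
    ORDER BY id
    """
).fetchall()

print(len(collapsed))  # 1
print(all_members)     # [(1, 'John', 1), (2, 'Jane', 1)]
```

So the grouped form answers "which companies have more than one user", while the IN form answers the actual question of which users belong to such companies.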
Try also a semi-join query:
SELECT *
FROM users u
WHERE EXISTS (
SELECT null FROM users u1
WHERE u.company_id=u1.company_id
AND u.id <> u1.id
)
demo --> http://www.sqlfiddle.com/#!2/12dc34/2
Assuming that id is the primary key column, creating an index on the company_id column gives better performance.
If you are really obsessed with the performance of this query, create a composite index on columns company_id + id:
CREATE INDEX very_fast ON users( company_id, id );
Could you try this?
SELECT users.*
FROM users INNER JOIN
(
SELECT company_id
FROM users
GROUP BY company_id
HAVING COUNT(*) > 1
) x USING(company_id);
You should have an index INDEX(company_id)
Performance Test
I tested three of the queries from the answers.
Q1 = sub-query (with GROUP BY) and INNER JOIN
Q2 = LEFT JOIN and IS NOT NULL
Q3 = EXISTS
All queries return the same result. The test was done with the TPC-H lineitem table, and the problem was "find line items whose order has more than one line item".
Test Results
The result depends on whether you want to retrieve only the first N rows or all rows.
Q1 (get FIRST 10K rows) : 2.85 sec
Q2 (get FIRST 10K rows) : 0.03 sec
Q3 (get FIRST 10K rows) : 0.03 sec
Q1 (get all rows) : 8.19 sec
Q2 (get all rows) : 34.12 sec
Q3 (get all rows) : 29.54 sec
Schema and DATA
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM lineitem\G
*************************** 1. row ***************************
COUNT(*): 11997996
1 row in set (1.68 sec)
mysql> SHOW CREATE TABLE lineitem\G
*************************** 1. row ***************************
Table: lineitem
Create Table: CREATE TABLE `lineitem` (
`l_orderkey` int(11) NOT NULL,
`l_partkey` int(11) NOT NULL,
`l_suppkey` int(11) NOT NULL,
`l_linenumber` int(11) NOT NULL,
`l_quantity` decimal(15,2) NOT NULL,
`l_extendedprice` decimal(15,2) NOT NULL,
`l_discount` decimal(15,2) NOT NULL,
`l_tax` decimal(15,2) NOT NULL,
`l_returnflag` char(1) NOT NULL,
`l_linestatus` char(1) NOT NULL,
`l_shipDATE` date NOT NULL,
`l_commitDATE` date NOT NULL,
`l_receiptDATE` date NOT NULL,
`l_shipinstruct` char(25) NOT NULL,
`l_shipmode` char(10) NOT NULL,
`l_comment` varchar(44) NOT NULL,
PRIMARY KEY (`l_orderkey`,`l_linenumber`),
KEY `l_orderkey` (`l_orderkey`),
KEY `l_partkey` (`l_partkey`,`l_suppkey`),
CONSTRAINT `lineitem_ibfk_1` FOREIGN KEY (`l_orderkey`) REFERENCES `orders` (`o_orderkey`),
CONSTRAINT `lineitem_ibfk_2` FOREIGN KEY (`l_partkey`, `l_suppkey`) REFERENCES `partsupp` (`ps_partkey`, `ps_suppkey`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
Queries
Q1 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey)
LIMIT 10000;
Q2 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL
LIMIT 10000;
Q3 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
)
LIMIT 10000;
retrieve entire rows
Q1 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey);
Q2 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL;
Q3 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
);