GROUP BY, MAX() and only_full_group_by - MySQL

I have a table:
ID ACCOUNT BALANCE TIME
1 Bill 10 1478885000
2 Bill 10 1478885001
3 James 5 1478885002
4 Ann 20 1478885003
5 Ann 15 1478885004
I want to get the latest balance (based on TIME) of several accounts, i.e.:
ACCOUNT BALANCE
Bill 10
Ann 15
I tried this SQL:
SELECT ACCOUNT, BALANCE, max(TIME)
FROM T1
WHERE ACCOUNT IN ( 'Bill', 'Ann')
GROUP BY ACCOUNT
I receive error:
1055 - Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'BALANCE' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
I understand the error and have tried different queries, but I still don't understand how to retrieve the needed data without running multiple queries.
P.S. I use MySQL 5.7.

SELECT T1.ACCOUNT, T1.BALANCE, T1.TIME
FROM T1
JOIN (SELECT ACCOUNT, max(TIME) as m_time
FROM T1
WHERE T1.ACCOUNT IN ( 'Bill', 'Ann')
GROUP BY ACCOUNT ) T2
ON T1.ACCOUNT = T2.ACCOUNT
AND T1.TIME = T2.m_time
WHERE T1.ACCOUNT IN ( 'Bill', 'Ann')
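A minimal sketch of the join-to-the-per-account-maximum approach, run against an in-memory SQLite database as a stand-in for MySQL 5.7 (assumption: the query only uses standard SQL, so it should behave the same in both). The helper name `latest_balances` is hypothetical, and the outer WHERE is dropped because the derived table already filters the accounts.

```python
import sqlite3


def latest_balances(conn, accounts):
    """Return {account: balance} taken from the row with the highest TIME per account."""
    placeholders = ", ".join("?" for _ in accounts)
    sql = f"""
        SELECT T1.ACCOUNT, T1.BALANCE
        FROM T1
        JOIN (SELECT ACCOUNT, MAX(TIME) AS m_time
              FROM T1
              WHERE ACCOUNT IN ({placeholders})
              GROUP BY ACCOUNT) T2
          ON T1.ACCOUNT = T2.ACCOUNT
         AND T1.TIME = T2.m_time
    """
    return dict(conn.execute(sql, accounts).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (ID INTEGER, ACCOUNT TEXT, BALANCE INTEGER, TIME INTEGER)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?, ?, ?)",
    [(1, "Bill", 10, 1478885000), (2, "Bill", 10, 1478885001),
     (3, "James", 5, 1478885002), (4, "Ann", 20, 1478885003),
     (5, "Ann", 15, 1478885004)],
)
print(latest_balances(conn, ["Bill", "Ann"]))  # expected: Bill -> 10, Ann -> 15
```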
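EDIT: if an account can have several rows with the same maximum TIME, the join above returns all of them; in that case it is better to use user variables so that exactly one row per account is kept.
SQL DEMO: I changed the TIME of Ann's two rows to be the same, so the tie is broken by BALANCE desc: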
SELECT ACCOUNT, BALANCE, TIME
FROM (
SELECT ACCOUNT, BALANCE, TIME,
@rn := if(ACCOUNT = @acc,
@rn + 1,
if(@acc := ACCOUNT, 1, 1)) as rn
FROM T1, (SELECT @rn := 0, @acc:= '') P
WHERE ACCOUNT IN ( 'Bill', 'Ann')
ORDER BY ACCOUNT, TIME desc, BALANCE desc
) T
WHERE T.rn = 1
OUTPUT
| ACCOUNT | BALANCE | TIME |
|---------|---------|------------|
| Bill | 10 | 1478885001 |
| Ann | 20 | 1478885003 |
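On MySQL 8.0 or later (not the 5.7 used in the question), the user-variable numbering can be replaced by ROW_NUMBER(), a swapped-in technique rather than the answer's own. A sketch in SQLite (window functions need SQLite 3.25+) with the demo data where Ann's two rows share the same TIME; `ORDER BY TIME DESC, BALANCE DESC` mirrors the tie-break of the variable query, and the final ORDER BY is only there to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (ID INTEGER, ACCOUNT TEXT, BALANCE INTEGER, TIME INTEGER)")
# Demo data: Ann's two rows get the same TIME to create a tie.
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?, ?, ?)",
    [(1, "Bill", 10, 1478885000), (2, "Bill", 10, 1478885001),
     (3, "James", 5, 1478885002), (4, "Ann", 20, 1478885003),
     (5, "Ann", 15, 1478885003)],
)

# Number the rows of each account from newest to oldest (ties: higher balance
# first) and keep only the first one.
rows = conn.execute("""
    SELECT ACCOUNT, BALANCE, TIME
    FROM (SELECT ACCOUNT, BALANCE, TIME,
                 ROW_NUMBER() OVER (PARTITION BY ACCOUNT
                                    ORDER BY TIME DESC, BALANCE DESC) AS rn
          FROM T1
          WHERE ACCOUNT IN ('Bill', 'Ann')) T
    WHERE rn = 1
    ORDER BY ACCOUNT
""").fetchall()
print(rows)  # [('Ann', 20, 1478885003), ('Bill', 10, 1478885001)]
```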

The error is quite clear. If you want the latest balance for each account, here is one way:
select t1.*
from t1
where t1.time = (select max(tt1.time) from t1 tt1 where t1.account = tt1.account);
You can add a WHERE condition on account to the outer query to filter for particular accounts.

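You have columns in your SELECT that are not in the GROUP BY.
Either you add all the columns that are not in an aggregate function to the GROUP BY (note that this returns one row per distinct ACCOUNT/BALANCE pair, so Ann appears twice, once with 20 and once with 15):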
SELECT ACCOUNT, BALANCE, max(TIME)
FROM T1
WHERE ACCOUNT IN ( 'Bill', 'Ann')
GROUP BY ACCOUNT, BALANCE
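or you turn the check off by changing the sql_mode (MySQL then takes BALANCE from an indeterminate row of each group, not necessarily the latest one):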
SET sql_mode = ''
or
SELECT ACCOUNT, BALANCE, TIME
FROM T1
where id In (
select id from T1 where (account, time ) in (
select account, max(time)
from t1
WHERE ACCOUNT IN ( 'Bill', 'Ann') group by account))

Related

How join statements execute in sql

I'm trying to fetch data from the user table so that every row contains a (non-null) date value. If the date is null, the row should show the date of the row above it that has the same id.
I want to do this without updating the table rows, using only a SELECT statement.
Here is the table:
NAME, DATE, ID
A, 2021-01-21, 1
B, null, 1
C, null, 1
D, 2021-01-18, 2
D, null, 2
It should be displayed like this:
A, 2021-01-21, 1
B, 2021-01-21, 1
C, 2021-01-21, 1
D, 2021-01-18, 2
D, 2021-01-18, 2
The query I came up with is:
select t1.name, t2.date ,t1.id from user t1
left join (select id ,date from user where id=1) t2
on t1.id=t2.id;
But this query doesn't work the way I expected.
Can anyone please explain how the above join query works, and how I can improve it to get the required result?
To test the above query, use these statements:
create table user(
name varchar(20),
date date,
id integer
);
insert into user values("A",'2021-01-21',1);
insert into user values("",null,1);
insert into user values("",null,1);
insert into user values("",null,1);
insert into user values("",null,1);
insert into user values("",null,1);
insert into user values("B",'2021-01-20',2);
select t1.name, t2.date ,t1.id from user t1
left join (select id ,date from user where id=1) t2
on t1.id=t2.id;
The first problem is that you are joining the table with itself on the condition t1.id = t2.id. For example, if you have 4 rows with id=1 and 3 rows with id=2, the result will have 4 * 4 + 3 * 3 = 25 rows. In your specific case the subquery only contains the 6 rows with id=1, so you get 6 * 6 = 36 rows for id=1, plus the single id=2 row, which the LEFT JOIN keeps once with a NULL date: 37 rows in total.
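To check that row count, here is a sketch that loads the test data into an in-memory SQLite database (a stand-in for MySQL, with dates stored as ISO text) and runs the original query unchanged:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (name TEXT, date TEXT, id INTEGER)")
test_rows = [("A", "2021-01-21", 1)] + [("", None, 1)] * 5 + [("B", "2021-01-20", 2)]
conn.executemany("INSERT INTO user VALUES (?, ?, ?)", test_rows)

result = conn.execute("""
    SELECT t1.name, t2.date, t1.id
    FROM user t1
    LEFT JOIN (SELECT id, date FROM user WHERE id = 1) t2
      ON t1.id = t2.id
""").fetchall()

# Each of the 6 id=1 rows matches all 6 rows of t2 (36 rows);
# the id=2 row matches nothing and is kept once with a NULL date.
print(len(result))  # 37
```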
The second problem is that you have hard-coded id=1 in your subquery:
(select id ,date from user where id=1) t2
This can't be the appropriate value for all possible rows.
You could try the obvious:
select
t1.name,
ifnull(t1.date, (select t2.date from user t2 where t2.date is not null and t2.id = t1.id limit 1)) as date,
t1.id
from user t1
;
see db-fiddle
| name | id | date       |
|------|----|------------|
| A    | 1  | 2021-01-21 |
|      | 1  | 2021-01-21 |
|      | 1  | 2021-01-21 |
|      | 1  | 2021-01-21 |
|      | 1  | 2021-01-21 |
|      | 1  | 2021-01-21 |
| B    | 2  | 2021-01-20 |
But better would be to use a join:
select u.name, ifnull(u.date, sq.date) as date, u.id
from user u join (
select id, min(date) as date from user group by id
) sq on u.id = sq.id
;
see db-fiddle
I would expect the second version using a join to be more efficient because the first version has a dependent subquery that has to get executed for every row that has a null date.
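A sketch, again using SQLite as a stand-in for MySQL (SQLite also provides IFNULL), that runs both versions on the test data and checks that they return the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (name TEXT, date TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?)",
    [("A", "2021-01-21", 1)] + [("", None, 1)] * 5 + [("B", "2021-01-20", 2)],
)

# Version 1: dependent subquery that looks up a non-null date of the same id.
correlated = conn.execute("""
    SELECT t1.name,
           IFNULL(t1.date, (SELECT t2.date FROM user t2
                            WHERE t2.date IS NOT NULL AND t2.id = t1.id
                            LIMIT 1)) AS date,
           t1.id
    FROM user t1
""").fetchall()

# Version 2: join against one pre-aggregated row per id (MIN ignores NULLs).
joined = conn.execute("""
    SELECT u.name, IFNULL(u.date, sq.date) AS date, u.id
    FROM user u
    JOIN (SELECT id, MIN(date) AS date FROM user GROUP BY id) sq
      ON u.id = sq.id
""").fetchall()

print(sorted(correlated) == sorted(joined))  # True
```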
You don't need a join. Just use a window function (this requires MySQL 8.0 or later):
select name,
max(date) over (partition by id) as date,
id
from user;
Note that your test data doesn't match the data in the question. The test data suggests:
select max(name) over (partition by id) as name,
max(date) over (partition by id) as date,
id
from user;
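A sketch of the window-function version on the data from the question, with SQLite 3.25+ standing in for MySQL 8.0. Dates are stored as ISO text, so MAX compares them chronologically; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (name TEXT, date TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?)",
    [("A", "2021-01-21", 1), ("B", None, 1), ("C", None, 1),
     ("D", "2021-01-18", 2), ("D", None, 2)],
)

# MAX(date) OVER (PARTITION BY id) ignores NULLs, so every row of an id
# receives the non-null date of that id without collapsing any rows.
rows = conn.execute("""
    SELECT name, MAX(date) OVER (PARTITION BY id) AS date, id
    FROM user
    ORDER BY id, name
""").fetchall()
for row in rows:
    print(row)
```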
Here is a db<>fiddle.

Showing balances of transactions for a given user

I have a situation where I need to show the balance of a given user in relation to each of the other users.
Table structure & dummy data script:
CREATE TABLE transactions (
id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
user1 INT NOT NULL,
user2 INT NOT NULL,
amount INT NOT NULL
);
INSERT INTO transactions VALUES(1, 1, 2, 10);
INSERT INTO transactions VALUES(2, 1, 3, 15);
INSERT INTO transactions VALUES(3, 4, 1, 25);
INSERT INTO transactions VALUES(4, 1, 5, 20);
INSERT INTO transactions VALUES(5, 5, 1, 18);
INSERT INTO transactions VALUES(6, 5, 1, 2);
Result:
Now I want to sum up the balances for user = 1. The result that I want to see is this:
user balance
2 10
3 15
4 -25
5 0
I am using the latest stable MySQL version, 5.7.17-0ubuntu0.16.04.1.
And I have 2 problems:
MySQL does not support the FULL OUTER JOIN clause
MySQL 5.7 does not support the WITH clause
My hands are tied at this point. I want to write a fast and efficient query for the above situation. Here are my two attempts (neither works):
This one doesn't work because I can't use the FULL OUTER JOIN clause:
SELECT IFNULL(t3.user, t4.user), IFNULL(t3.amount, 0) - IFNULL(t4.amount, 0)
FROM (
select t1.user2 user, sum(t1.amount) amount
from transactions t1
where 1=1
and t1.user1 = 1
group by t1.user2
) t3
FULL OUTER JOIN (
select t2.user1 user, sum(t2.amount) amount
from transactions t2
where 1=1
and t2.user2 = 1
group by t2.user1
) t4 ON t3.user = t4.user
This one doesn't work because I can't use the WITH clause:
WITH t3 AS
(
select t1.user2 user, sum(t1.amount) amount
from transactions t1
where 1=1
and t1.user1 = 1
group by t1.user2
),
t4 AS
(
select t2.user1 user, sum(t2.amount) amount
from transactions t2
where 1=1
and t2.user2 = 1
group by t2.user1
)
SELECT
t1.user,
IFNULL(t3.amount, 0) - IFNULL(t4.amount, 0) balance
FROM t1
LEFT JOIN t3 ON t1.user = t2.user
UNION
SELECT t2.user FROM t1
RIGHT JOIN t3 ON t1.user = t2.user
Update
Using the solution provided by Gurwinder Singh, I was able to test the performance of both queries on around 5 million rows of test data (although the number of rows where either user1 = 1 or user2 = 1 is far smaller than that).
Comparing the run times of query 1 (conditional aggregation) and query 2 (with UNION), query 1 is 34% faster ((3.4 - 2.24) / 3.4 * 100 ≈ 34).
Note that there are no indexes on this table. I will later try to do the same kind of testing using MariaDB and compare the results.
Update 2
After indexing the columns user1, user2 and amount, the situation changed.
Query 1 run time:
Showing rows 0 - 2 (3 total, Query took 1.9857 seconds.)
Query 2 run time:
Showing rows 0 - 2 (3 total, Query took 1.5641 seconds.)
But I still think this is quite a bad result. Maybe I will add triggers that keep the balances in a dedicated table. But at this point the question is answered.
You can use CASE based conditional aggregation:
Try this:
select case
when user1 = 1
then user2
else user1
end as user,
sum(case
when user1 = 1
then amount
else - amount
end) as amount
from transactions
where 1 in (user1, user2)
group by case
when user1 = 1
then user2
else user1
end;
Demo
Or a two step aggregation:
select user, sum(amount) as amount
from (
select user2 as user, sum(amount) as amount
from transactions
where user1 = 1
group by user2
union all
select user1 as user, -sum(amount) as amount
from transactions
where user2 = 1
group by user1
) t
group by user;
Demo
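A sketch that checks both queries against the question's data, using SQLite as a stand-in for MySQL 5.7 (assumption: `INT(6) UNSIGNED AUTO_INCREMENT` is replaced by a plain INTEGER primary key, since the inserts supply the ids explicitly). Both should give the expected balances for user 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE transactions (
    id INTEGER PRIMARY KEY, user1 INTEGER NOT NULL,
    user2 INTEGER NOT NULL, amount INTEGER NOT NULL)""")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 10), (2, 1, 3, 15), (3, 4, 1, 25),
     (4, 1, 5, 20), (5, 5, 1, 18), (6, 5, 1, 2)],
)

# Query 1: conditional aggregation in a single pass over rows involving user 1.
conditional = conn.execute("""
    SELECT CASE WHEN user1 = 1 THEN user2 ELSE user1 END AS user,
           SUM(CASE WHEN user1 = 1 THEN amount ELSE -amount END) AS amount
    FROM transactions
    WHERE 1 IN (user1, user2)
    GROUP BY CASE WHEN user1 = 1 THEN user2 ELSE user1 END
""").fetchall()

# Query 2: aggregate each direction separately, then combine with UNION ALL.
two_step = conn.execute("""
    SELECT user, SUM(amount) AS amount
    FROM (SELECT user2 AS user, SUM(amount) AS amount
          FROM transactions WHERE user1 = 1 GROUP BY user2
          UNION ALL
          SELECT user1 AS user, -SUM(amount) AS amount
          FROM transactions WHERE user2 = 1 GROUP BY user1) t
    GROUP BY user
""").fetchall()

print(sorted(conditional))  # [(2, 10), (3, 15), (4, -25), (5, 0)]
```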

MySQL SUM of LIMIT(3) rows with GROUP BY

Good people, I need a bit of help with MySQL. I tried a few solutions online but couldn't get it right.
I have this simple table.
name | amount
john | 150
john | 100
john | 100
john | 150
jack | 300
jack | 100
jack | 100
Basically, I need the users whose 3 highest amounts (ordered by amount, highest first) add up to at least 500.
The correct answer should return only jack, because only his 3 highest records sum to 500. Although john's total is also 500, his 3 highest amounts only add up to 400 (150 + 150 + 100), so the query should not return john.
SELECT
*,
SUM(amount) as sums
FROM (SELECT * FROM transfer GROUP BY name ORDER BY amount DESC LIMIT 3) as ttl
GROUP BY name
HAVING sums >= 500
It runs (no errors, at least), but the inner SELECT (the one inside the parentheses) only returns the first row.
Any help is highly appreciated.
Let me assume that you have another column that is a unique id. Then you can do this:
select distinct t1.name
from transfer t1 left join
transfer t2
on t1.name = t2.name and t1.id < t2.id left join
transfer t3
on t1.name = t3.name and t2.id < t3.id
where t1.amount + coalesce(t2.amount, 0) + coalesce(t3.amount, 0) >= 500;
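A sketch of the self-join in SQLite (stand-in for MySQL), assuming, as the answer does, an extra unique `id` column; here it is a hypothetical INTEGER PRIMARY KEY filled in insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: a unique id column exists (hypothetical, auto-assigned here).
conn.execute("CREATE TABLE transfer (id INTEGER PRIMARY KEY, name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transfer (name, amount) VALUES (?, ?)",
    [("john", 150), ("john", 100), ("john", 100), ("john", 150),
     ("jack", 300), ("jack", 100), ("jack", 100)],
)

# t1, t2, t3 are distinct rows of the same name in increasing id order, so
# every set of three rows is checked; the LEFT JOINs plus COALESCE also let
# names with fewer than three rows qualify.
names = conn.execute("""
    SELECT DISTINCT t1.name
    FROM transfer t1
    LEFT JOIN transfer t2 ON t1.name = t2.name AND t1.id < t2.id
    LEFT JOIN transfer t3 ON t1.name = t3.name AND t2.id < t3.id
    WHERE t1.amount + COALESCE(t2.amount, 0) + COALESCE(t3.amount, 0) >= 500
""").fetchall()
print(names)  # [('jack',)]
```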
This is not wildly efficient for larger tables. For that, use variables to enumerate the values:
select name
from (select t.*,
(@rn := if(@n = name, @rn + 1,
if(@n := name, 1, 1)
)
) as rn
from transfer cross join
(select @n := '', @rn := 0) params
order by name, amount desc
) t
where rn <= 3
group by name
having sum(amount) >= 500;
This also has the benefit that it does not rely on an id column.
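On MySQL 8.0 or later, the same enumeration can be written with ROW_NUMBER() instead of user variables (a swapped-in technique, not the answer's). A sketch in SQLite 3.25+ on the question's data, which likewise needs no id column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfer (name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transfer VALUES (?, ?)",
    [("john", 150), ("john", 100), ("john", 100), ("john", 150),
     ("jack", 300), ("jack", 100), ("jack", 100)],
)

# Number each person's rows from the highest amount down, keep the top 3,
# and return the names whose top-3 total reaches 500.
names = conn.execute("""
    SELECT name
    FROM (SELECT name, amount,
                 ROW_NUMBER() OVER (PARTITION BY name ORDER BY amount DESC) AS rn
          FROM transfer) t
    WHERE rn <= 3
    GROUP BY name
    HAVING SUM(amount) >= 500
""").fetchall()
print(names)  # [('jack',)]
```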

Select row with most recent date per user

I have a table ("lms_attendance") of users' check-in and out times that looks like this:
id user time io (enum)
1 9 1370931202 out
2 9 1370931664 out
3 6 1370932128 out
4 12 1370932128 out
5 12 1370933037 in
I'm trying to create a view of this table that would output only the most recent record per user id, while giving me the "in" or "out" value, so something like:
id user time io
2 9 1370931664 out
3 6 1370932128 out
5 12 1370933037 in
I'm pretty close so far, but I realized that views won't accept subqueries, which makes it a lot harder. The closest query I got was:
select
`lms_attendance`.`id` AS `id`,
`lms_attendance`.`user` AS `user`,
max(`lms_attendance`.`time`) AS `time`,
`lms_attendance`.`io` AS `io`
from `lms_attendance`
group by
`lms_attendance`.`user`,
`lms_attendance`.`io`
But what I get is:
id user time io
3 6 1370932128 out
1 9 1370931664 out
5 12 1370933037 in
4 12 1370932128 out
Which is close, but not perfect. I know that the last GROUP BY column shouldn't be there, but without it the query returns the most recent time, but not its corresponding io value.
Any ideas?
Thanks!
Query:
SQLFiddle example
SELECT t1.*
FROM lms_attendance t1
WHERE t1.time = (SELECT MAX(t2.time)
FROM lms_attendance t2
WHERE t2.user = t1.user)
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
Note that if a user has multiple records with the same "maximum" time, the query above will return more than one record. If you only want 1 record per user, use the query below:
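To see the duplicate case, here is a sketch in SQLite (stand-in for MySQL) with one hypothetical extra row, id 6, that gives user 9 a second record at the same latest time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)")
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [(1, 9, 1370931202, "out"), (2, 9, 1370931664, "out"),
     (3, 6, 1370932128, "out"), (4, 12, 1370932128, "out"),
     (5, 12, 1370933037, "in"),
     # Hypothetical extra row: user 9 has a second record at the same latest time.
     (6, 9, 1370931664, "in")],
)

rows = conn.execute("""
    SELECT t1.*
    FROM lms_attendance t1
    WHERE t1.time = (SELECT MAX(t2.time)
                     FROM lms_attendance t2
                     WHERE t2.user = t1.user)
    ORDER BY t1.id
""").fetchall()
for row in rows:
    print(row)
# User 9 now appears twice (ids 2 and 6); users 6 and 12 appear once each.
```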
SQLFiddle example
SELECT t1.*
FROM lms_attendance t1
WHERE t1.id = (SELECT t2.id
FROM lms_attendance t2
WHERE t2.user = t1.user
ORDER BY t2.time DESC, t2.id DESC
LIMIT 1)
There's no need to reinvent the wheel, as this is the common greatest-n-per-group problem, for which a very nice solution has already been presented.
I prefer the simplest solution (see SQLFiddle, an updated version of Justin's), which uses no subqueries and is thus easy to use in views:
SELECT t1.*
FROM lms_attendance AS t1
LEFT OUTER JOIN lms_attendance AS t2
ON t1.user = t2.user
AND (t1.time < t2.time
OR (t1.time = t2.time AND t1.Id < t2.Id))
WHERE t2.user IS NULL
This also works when two different records within the same group share the greatest value, thanks to the trick with (t1.time = t2.time AND t1.Id < t2.Id): it ensures that when two records of the same user have the same time, only one of them is chosen. It doesn't actually matter whether the criterion is Id or something else; any criterion that is guaranteed to be unique would do the job here.
Based on @TMS's answer: I like it because there's no need for subqueries, but I think omitting the 'OR' part is sufficient and much simpler to read and understand (at the cost of returning every tied row when a user has two records with the same latest time).
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
ON t1.user = t2.user
AND t1.time < t2.time
WHERE t2.user IS NULL
If you are not interested in rows with null times, you can filter them out in the WHERE clause:
SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
ON t1.user = t2.user
AND t1.time < t2.time
WHERE t2.user IS NULL and t1.time IS NOT NULL
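A sketch comparing the two anti-join variants in SQLite (stand-in for MySQL), again with a hypothetical extra row, id 6, that ties with id 2 for user 9's latest time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)")
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [(1, 9, 1370931202, "out"), (2, 9, 1370931664, "out"),
     (3, 6, 1370932128, "out"), (4, 12, 1370932128, "out"),
     (5, 12, 1370933037, "in"),
     (6, 9, 1370931664, "in")],  # hypothetical tie for user 9
)

# Anti-join: keep t1 only when no "better" t2 row exists for the same user.
with_tiebreak = """
    SELECT t1.id FROM lms_attendance AS t1
    LEFT OUTER JOIN lms_attendance AS t2
      ON t1.user = t2.user
     AND (t1.time < t2.time OR (t1.time = t2.time AND t1.id < t2.id))
    WHERE t2.user IS NULL ORDER BY t1.id
"""
without_tiebreak = """
    SELECT t1.id FROM lms_attendance AS t1
    LEFT JOIN lms_attendance AS t2
      ON t1.user = t2.user AND t1.time < t2.time
    WHERE t2.user IS NULL ORDER BY t1.id
"""
ids_with = [r[0] for r in conn.execute(with_tiebreak)]
ids_without = [r[0] for r in conn.execute(without_tiebreak)]
print(ids_with)     # [3, 5, 6]
print(ids_without)  # [2, 3, 5, 6]
```

With the OR branch, the tie keeps only the row with the higher id; without it, both tied rows of user 9 come back.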
Already solved, but just for the record, another approach would be to create two views...
CREATE TABLE lms_attendance
(id int, user int, time int, io varchar(3));
CREATE VIEW latest_all AS
SELECT la.user, max(la.time) time
FROM lms_attendance la
GROUP BY la.user;
CREATE VIEW latest_io AS
SELECT la.*
FROM lms_attendance la
JOIN latest_all lall
ON lall.user = la.user
AND lall.time = la.time;
INSERT INTO lms_attendance
VALUES
(1, 9, 1370931202, 'out'),
(2, 9, 1370931664, 'out'),
(3, 6, 1370932128, 'out'),
(4, 12, 1370932128, 'out'),
(5, 12, 1370933037, 'in');
SELECT * FROM latest_io;
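The same two-view script can be tried in SQLite as a stand-in for MySQL; neither view has a subquery in its FROM clause, which is the restriction the question ran into. The ORDER BY on the final SELECT is added only for a deterministic result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lms_attendance (id INT, user INT, time INT, io VARCHAR(3));

    -- Latest time per user.
    CREATE VIEW latest_all AS
    SELECT la.user, MAX(la.time) AS time
    FROM lms_attendance la
    GROUP BY la.user;

    -- Full rows that match a user's latest time.
    CREATE VIEW latest_io AS
    SELECT la.*
    FROM lms_attendance la
    JOIN latest_all lall
      ON lall.user = la.user
     AND lall.time = la.time;

    INSERT INTO lms_attendance VALUES
      (1, 9, 1370931202, 'out'),
      (2, 9, 1370931664, 'out'),
      (3, 6, 1370932128, 'out'),
      (4, 12, 1370932128, 'out'),
      (5, 12, 1370933037, 'in');
""")
rows = conn.execute("SELECT * FROM latest_io ORDER BY id").fetchall()
print(rows)
```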
Click here to see it in action at SQL Fiddle
If you're on MySQL 8.0 or higher, you can use window functions:
Query:
DB Fiddle example
SELECT DISTINCT
FIRST_VALUE(ID) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS ID,
FIRST_VALUE(USER) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS USER,
FIRST_VALUE(TIME) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS TIME,
FIRST_VALUE(IO) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS IO
FROM lms_attendance;
Result:
| ID | USER | TIME | IO |
--------------------------------
| 2 | 9 | 1370931664 | out |
| 3 | 6 | 1370932128 | out |
| 5 | 12 | 1370933037 | in |
The advantage I see over using the solution proposed by Justin is that it enables you to select the row with the most recent data per user (or per id, or per whatever) even from subqueries without the need for an intermediate view or table.
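A sketch of the FIRST_VALUE query in SQLite 3.25+ (stand-in for MySQL 8.0); the result is sorted in Python only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)")
conn.executemany(
    "INSERT INTO lms_attendance VALUES (?, ?, ?, ?)",
    [(1, 9, 1370931202, "out"), (2, 9, 1370931664, "out"),
     (3, 6, 1370932128, "out"), (4, 12, 1370932128, "out"),
     (5, 12, 1370933037, "in")],
)

# Every row of a user's partition gets the values of that user's latest row,
# so DISTINCT collapses each partition to a single row.
rows = sorted(conn.execute("""
    SELECT DISTINCT
        FIRST_VALUE(id)   OVER (PARTITION BY user ORDER BY time DESC) AS id,
        FIRST_VALUE(user) OVER (PARTITION BY user ORDER BY time DESC) AS user,
        FIRST_VALUE(time) OVER (PARTITION BY user ORDER BY time DESC) AS time,
        FIRST_VALUE(io)   OVER (PARTITION BY user ORDER BY time DESC) AS io
    FROM lms_attendance
""").fetchall())
print(rows)
```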
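And in case you're running HANA, it is also about 7 times faster :D
OK, this might be either a hack or error-prone, but somehow this works as well (caution: because id is unique in this table, GROUP BY id returns every row, not one row per user):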
SELECT id, MAX(user) as user, MAX(time) as time, MAX(io) as io FROM lms_attendance GROUP BY id;
select b.* from
(select
`lms_attendance`.`user` AS `user`,
max(`lms_attendance`.`time`) AS `time`
from `lms_attendance`
group by
`lms_attendance`.`user`) a
join
(select *
from `lms_attendance` ) b
on a.user = b.user
and a.time = b.time
I tried one solution which works for me:
SELECT user, MAX(TIME) as time
FROM lms_attendance
GROUP by user
HAVING MAX(time)
I have a very large table, and all of the other suggestions here were taking a very long time to execute. I came up with this hacky method, which was much faster. The downside is that if the max(date) row has a duplicate date for that user, it will return both of them.
SELECT * FROM mb_web.devices_log WHERE CONCAT(dtime, '-', user_id) in (
SELECT concat(max(dtime), '-', user_id) FROM mb_web.devices_log GROUP BY user_id
)
select result from (
select vorsteuerid as result, count(*) as anzahl from kreditorenrechnung where kundeid = 7148
group by vorsteuerid
) a order by anzahl desc limit 0,1
I did the same thing like this:
SELECT t1.*
FROM lms_attendance t1
WHERE t1.id in (SELECT max(t2.id) as id
FROM lms_attendance t2
group BY t2.user)
This will also reduce memory utilization.
Thanks.
Possibly you can GROUP BY user and then ORDER BY time DESC, something like this:
SELECT * FROM lms_attendance group by user order by time desc;
Try this query:
select id,user, max(time), io
FROM lms_attendance group by user;
This worked for me:
SELECT user, time FROM
(
SELECT user, time FROM lms_attendance -- optional WHERE clause
) AS T
WHERE (SELECT COUNT(0) FROM lms_attendance WHERE user = T.user AND time > T.time) = 0
ORDER BY user ASC, time DESC

Finding a users maximum score and the associated details

I have a table in which users store scores and other information about each score (for example, notes on the score or the time taken). I want a MySQL query that finds each user's personal best score and its associated notes, time, etc.
What I have tried to use is something like this:
SELECT *, MAX(score) FROM table GROUP BY (user)
The problem with this is that whilst you can extract the user's personal best from that query [MAX(score)], the returned notes, times etc. are not associated with the maximum score but with a different score (specifically the one contained in *). Is there a way I can write a query that selects what I want? Or will I have to do it manually in PHP?
I'm assuming that you only want one result per player, even if they have achieved the same maximum score more than once. I am also assuming that, in the case of repeats, you want the first time each player reached their personal best.
There are a few ways of doing this. Here's a way that is MySQL-specific:
SELECT user, scoredate, score, notes FROM (
SELECT *, @prev <> user AS is_best, @prev := user
FROM table1, (SELECT @prev := -1) AS vars
ORDER BY user, score DESC, scoredate
) AS T1
WHERE is_best
Here's a more general way that uses ordinary SQL:
SELECT T3.* FROM table1 AS T3
JOIN (
SELECT T1.user, T1.score, MIN(scoredate) AS scoredate
FROM table1 AS T1
JOIN (SELECT user, MAX(score) AS score FROM table1 GROUP BY user) AS T2
ON T1.user = T2.user AND T1.score = T2.score
GROUP BY T1.user, T1.score
) AS T4
ON T3.user = T4.user AND T3.score = T4.score AND T3.scoredate = T4.scoredate
Result:
1, '2010-01-01 17:00:00', 50, 'Much better'
2, '2010-01-01 14:00:00', 100, 'Perfect score'
Test data I used to test this:
CREATE TABLE table1 (user INT NOT NULL, scoredate DATETIME NOT NULL, score INT NOT NULL, notes NVARCHAR(100) NOT NULL);
INSERT INTO table1 (user, scoredate, score, notes) VALUES
(1, '2010-01-01 12:00:00', 10, 'First attempt'),
(1, '2010-01-01 17:00:00', 50, 'Much better'),
(1, '2010-01-01 22:00:00', 30, 'Time for bed'),
(2, '2010-01-01 14:00:00', 100, 'Perfect score'),
(2, '2010-01-01 16:00:00', 100, 'This is too easy');
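A sketch that runs the general-SQL query on this test data in SQLite (stand-in for MySQL; DATETIME and NVARCHAR columns are stored as TEXT, which sorts correctly for this date format). T1.score is listed in the inner GROUP BY so that stricter engines accept it; it does not change the result, because each user has a single best score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (user INT NOT NULL, scoredate TEXT NOT NULL,
                         score INT NOT NULL, notes TEXT NOT NULL);
    INSERT INTO table1 (user, scoredate, score, notes) VALUES
      (1, '2010-01-01 12:00:00', 10, 'First attempt'),
      (1, '2010-01-01 17:00:00', 50, 'Much better'),
      (1, '2010-01-01 22:00:00', 30, 'Time for bed'),
      (2, '2010-01-01 14:00:00', 100, 'Perfect score'),
      (2, '2010-01-01 16:00:00', 100, 'This is too easy');
""")

# T2: best score per user; T4: earliest date on which that best was reached;
# the outer join pulls back the full row, notes included.
rows = conn.execute("""
    SELECT T3.* FROM table1 AS T3
    JOIN (SELECT T1.user, T1.score, MIN(T1.scoredate) AS scoredate
          FROM table1 AS T1
          JOIN (SELECT user, MAX(score) AS score FROM table1 GROUP BY user) AS T2
            ON T1.user = T2.user AND T1.score = T2.score
          GROUP BY T1.user, T1.score) AS T4
      ON T3.user = T4.user AND T3.score = T4.score AND T3.scoredate = T4.scoredate
    ORDER BY T3.user
""").fetchall()
for row in rows:
    print(row)
```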
You can join with a subquery, as in the following example:
SELECT t.*,
sub_t.max_score
FROM table t
JOIN (SELECT MAX(score) as max_score,
user
FROM table
GROUP BY user) sub_t ON (sub_t.user = t.user AND
sub_t.max_score = t.score);
The above query can be explained as follows. It starts with:
SELECT t.* FROM table t;
... This by itself will obviously list all the contents of the table. The goal is to keep only the rows that represent a maximum score of a particular user. Therefore if we had the data below:
+------+-------+---------+
| user | score | notes |
+------+-------+---------+
| 1 | 10 | note a |
| 1 | 15 | note b |
| 1 | 20 | note c |
| 2 | 8 | note d |
| 2 | 12 | note e |
| 2 | 5 | note f |
+------+-------+---------+
...We would have wanted to keep just the "note c" and "note e" rows.
To find the rows that we want to keep, we can simply use:
SELECT MAX(score), user FROM table GROUP BY user;
Note that we cannot get the notes attribute from the above query because, as you already noticed, columns that are neither wrapped in an aggregate function such as MAX() nor part of the GROUP BY clause do not give the expected results. For further reading on this topic, you may want to check:
Debunking GROUP BY Myths
How does MySQL decide which id to return in group by clause?
Why does MySql allow “group by” queries WITHOUT aggregate functions?
Now we only need to keep the rows from the first query that match the second query. We can do this with an INNER JOIN:
...
JOIN (SELECT MAX(score) as max_score,
user
FROM table
GROUP BY user) sub_t ON (sub_t.user = t.user AND
sub_t.max_score = t.score);
The subquery is given the alias sub_t. It is the set of all users together with their personal best score. The ON clause of the JOIN applies the restriction to the relevant fields. Remember that we only want to keep the rows that match this subquery.
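A sketch of the whole query on the six-row example, using SQLite as a stand-in for MySQL. Because `table` is a reserved word, the sketch uses the hypothetical table name `scores`; only the "note c" and "note e" rows should survive the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "table" is a reserved word, so this sketch uses the hypothetical name "scores".
conn.execute("CREATE TABLE scores (user INTEGER, score INTEGER, notes TEXT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, 10, "note a"), (1, 15, "note b"), (1, 20, "note c"),
     (2, 8, "note d"), (2, 12, "note e"), (2, 5, "note f")],
)

# sub_t holds one (max_score, user) pair per user; the join keeps only the
# rows of scores whose score equals that user's maximum.
rows = conn.execute("""
    SELECT t.*, sub_t.max_score
    FROM scores t
    JOIN (SELECT MAX(score) AS max_score, user
          FROM scores
          GROUP BY user) sub_t
      ON sub_t.user = t.user AND sub_t.max_score = t.score
    ORDER BY t.user
""").fetchall()
print(rows)  # [(1, 20, 'note c', 20), (2, 12, 'note e', 12)]
```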
SELECT *
FROM table t
ORDER BY t.score DESC
GROUP BY t.user
LIMIT 1
Side note: It is better to specify the fields than use SELECT *