use mysql SUM() in a WHERE clause - mysql

suppose I have this table
id | cash
1 200
2 301
3 101
4 700
and I want to return the first row in which the sum of all the previous cash is greater than a certain value:
So for instance, if I want to return the first row in which the sum of all the previous cash is greater than 500, it should return row 3
How do I do this using mysql statement?
using WHERE SUM(cash) > 500
doesn't work

You can only use aggregates for comparison in the HAVING clause:
GROUP BY ...
HAVING SUM(cash) > 500
The HAVING clause is normally paired with a GROUP BY clause; without one, the whole result set is treated as a single group.
To get the first row where the sum of all the previous cash is greater than a certain value, use:
SELECT y.id, y.cash
FROM (SELECT t.id,
t.cash,
(SELECT SUM(x.cash)
FROM TABLE x
WHERE x.id < t.id) AS running_total
FROM TABLE t
ORDER BY t.id) y
WHERE y.running_total > 500
ORDER BY y.id
LIMIT 1
Because the aggregate function occurs in a subquery, the column alias for it can be referenced in the WHERE clause.
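As a rough way to check the subquery approach against the sample data, here is a minimal sketch that runs it from Python using the standard-library sqlite3 module as a stand-in for MySQL. The table name mytable, the alias previous_total and the function name are made up for illustration, and it assumes "previous" means rows with a strictly smaller id:

```python
import sqlite3

def first_row_past_threshold(rows, threshold):
    """Return (id, cash) of the first row whose previous rows sum past threshold."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table name; the question does not name its table.
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, cash INTEGER)")
    conn.executemany("INSERT INTO mytable (id, cash) VALUES (?, ?)", rows)
    query = """
        SELECT y.id, y.cash
        FROM (SELECT t.id,
                     t.cash,
                     (SELECT SUM(x.cash)
                      FROM mytable x
                      WHERE x.id < t.id) AS previous_total
              FROM mytable t) y
        WHERE y.previous_total > ?
        ORDER BY y.id
        LIMIT 1
    """
    result = conn.execute(query, (threshold,)).fetchone()
    conn.close()
    return result

sample = [(1, 200), (2, 301), (3, 101), (4, 700)]
print(first_row_past_threshold(sample, 500))  # (3, 101)
```

The first row has no previous rows, so its sum is NULL and the comparison filters it out without any special handling.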

Not tested, but I think this will be close?
SELECT m1.id
FROM mytable m1
INNER JOIN mytable m2 ON m2.id < m1.id
GROUP BY m1.id
HAVING SUM(m2.cash) > 500
ORDER BY m1.id
LIMIT 1
The idea is to join each row to all the previous rows, SUM up their cash, keep only the rows where that sum is > 500, and return the first one.

In general, a condition in the WHERE clause of an SQL query can reference only a single row. The context of a WHERE clause is evaluated before any order has been defined by an ORDER BY clause, and there is no implicit order to an RDBMS table.
You can use a derived table to join each row to the group of rows with a lesser id value, and produce the sum of each such group. Then test where the sum meets your criterion.
CREATE TABLE MyTable ( id INT PRIMARY KEY, cash INT );
INSERT INTO MyTable (id, cash) VALUES
(1, 200), (2, 301), (3, 101), (4, 700);
SELECT s.*
FROM (
SELECT t.id, SUM(prev.cash) AS cash_sum
FROM MyTable t JOIN MyTable prev ON (t.id > prev.id)
GROUP BY t.id) AS s
WHERE s.cash_sum >= 500
ORDER BY s.id
LIMIT 1;
Output:
+----+----------+
| id | cash_sum |
+----+----------+
| 3 | 501 |
+----+----------+
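On engines with window functions (MySQL 8.0 or later, as far as I know), the same "sum of the previous rows" can probably be written without the self-join. This is a different technique from the join above, sketched here with Python's sqlite3 module standing in for MySQL; it assumes the bundled SQLite is 3.25 or newer, and the function name is made up:

```python
import sqlite3

def first_row_past_threshold_window(rows, threshold):
    """Return (id, cash_sum) for the first row whose previous rows sum to >= threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, cash INTEGER)")
    conn.executemany("INSERT INTO MyTable (id, cash) VALUES (?, ?)", rows)
    query = """
        SELECT s.id, s.cash_sum
        FROM (SELECT id,
                     -- frame ends one row before the current row,
                     -- so the current row's cash is not included
                     SUM(cash) OVER (ORDER BY id
                                     ROWS BETWEEN UNBOUNDED PRECEDING
                                              AND 1 PRECEDING) AS cash_sum
              FROM MyTable) AS s
        WHERE s.cash_sum >= ?
        ORDER BY s.id
        LIMIT 1
    """
    result = conn.execute(query, (threshold,)).fetchone()
    conn.close()
    return result

sample = [(1, 200), (2, 301), (3, 101), (4, 700)]
print(first_row_past_threshold_window(sample, 500))  # (3, 501)
```

The window version reads the table once instead of joining it to itself, which should matter on large tables.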

When using aggregate functions to filter, you must use a HAVING clause.
SELECT *
FROM tblMoney
HAVING Sum(CASH) > 500

Related

SQL select maximum number of duplicates value in a column

Here I have this table:
Copies
nInv | Subject | LoanDate | BookCode |MemberCode|
1 |Storia |15/04/2019 00:00:00 |7844455544| 1 |
2 |Geografia |12/09/2020 00:00:00 |8004554785| 4 |
4 |Francese |17/05/2006 00:00:00 |8004894886| 3 |
5 |Matematica |17/06/2014 00:00:00 |8004575185| 3 |
I'm trying to find the value of the highest number of duplicates in the MemberCode column. So in this case I should get 3 as result, as its value appears two times in the table. Also, MemberCode is PK in another table, so ideally I should select all rows of the second table that match the MemberCode in both tables. For the second part I guess I should write something like SELECT * FROM Table2, Copies WHERE Copies.MemberCode = Table2.MemberCode but I'm missing out almost everything on the first part. Can you guys help me?
Use group by and limit:
select membercode, count(*) as num
from t
group by membercode
order by count(*) desc
limit 1;
SELECT MAX(counted) FROM
(SELECT COUNT(MemberCode) AS counted
FROM table_name GROUP BY MemberCode) AS counts
Using analytic functions, we can assign a rank to each member code based on its count. Then, we can figure out what its count is.
WITH cte AS (
SELECT t2.MemberCode, COUNT(*) AS cnt,
RANK() OVER (ORDER BY COUNT(*) DESC, t2.MemberCode) rnk
FROM Table2 t2
INNER JOIN Copies c ON c.MemberCode = t2.MemberCode
GROUP BY t2.MemberCode
)
SELECT cnt
FROM cte
WHERE rnk = 1;
Something like this
with top_dupe_member_cte as (
select top(1) MemberCode, Count(*) as cnt
from MemberTable
group by MemberCode
order by cnt desc)
select /* columns from your other table */
from OtherTable ot
join top_dupe_member_cte dmc on ot.MemberCode=dmc.MemberCode;
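To tie both parts of the question together (find the most frequent MemberCode, then fetch the matching row from the second table), here is a minimal sketch using Python's sqlite3 module as a stand-in database. The Members table, its Name column, the member names and the function name are all hypothetical; only the Copies rows come from the question:

```python
import sqlite3

def member_with_most_copies(copies, members):
    """Return (MemberCode, Name, num) for the member code that occurs most often in Copies."""
    conn = sqlite3.connect(":memory:")
    # Members and Name are made-up stand-ins for the asker's second table.
    conn.execute("CREATE TABLE Members (MemberCode INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE Copies (nInv INTEGER PRIMARY KEY, Subject TEXT, MemberCode INTEGER)")
    conn.executemany("INSERT INTO Members VALUES (?, ?)", members)
    conn.executemany("INSERT INTO Copies VALUES (?, ?, ?)", copies)
    query = """
        SELECT m.MemberCode, m.Name, busiest.num
        FROM (SELECT MemberCode, COUNT(*) AS num
              FROM Copies
              GROUP BY MemberCode
              ORDER BY num DESC, MemberCode  -- MemberCode breaks ties deterministically
              LIMIT 1) AS busiest
        JOIN Members m ON m.MemberCode = busiest.MemberCode
    """
    result = conn.execute(query).fetchone()
    conn.close()
    return result

copies = [(1, "Storia", 1), (2, "Geografia", 4), (4, "Francese", 3), (5, "Matematica", 3)]
members = [(1, "Member A"), (3, "Member C"), (4, "Member D")]
print(member_with_most_copies(copies, members))  # (3, 'Member C', 2)
```

With LIMIT 1 only one member code is returned even when several are tied for the highest count; the RANK() approach above is the one to reach for if ties should all be kept.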

How do I find duplicate values across multiple columns in Mysql?

I have a table like this
I want to check the all rows in Column A with column B and get the count of duplicates.
For example, I want to get the
count of 12 as 3(2 times in A+1 time in B)
count of 11 as 2(2 times in A+0 time in B)
count of 13 as 1(1 time in A+0 time in B)
How can I achieve it?
You can calculate the total occurrences from a union all. A where clause can show only the values that occur in the A column:
select nr
, count(*)
from (
select A as nr
from YourTable
union all
select B
from YourTable
) sub
where nr in -- only values that occur at least once in the A column
(
select A
from YourTable
)
group by
nr
having count(*) > 1 -- show only duplicates
You can combine all values in A and B then do the group by.
Then only select those values found in column A.
Select A, count(A) as cnt
From (
Select A
from yourTable
Union All
Select B
from yourTable) t
Where t.A in
(select distinct A from yourTable)
Group by t.A
Order by t.A;
Result:
A cnt
11 2
12 3
13 1
See demo: http://sqlfiddle.com/#!9/9fcfe9/3
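Since the original table image is not available, here is a small sketch with made-up rows chosen to match the counts described in the question (12 twice in A and once in B, 11 twice in A, 13 once in A). It runs the UNION ALL approach through Python's sqlite3 module as a stand-in for MySQL; the function name and the B values other than 12 are assumptions:

```python
import sqlite3

def count_across_columns(pairs):
    """Count how often each value of column A occurs in A and B combined."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (A INTEGER, B INTEGER)")
    conn.executemany("INSERT INTO YourTable (A, B) VALUES (?, ?)", pairs)
    query = """
        SELECT sub.nr, COUNT(*) AS cnt
        FROM (SELECT A AS nr FROM YourTable
              UNION ALL              -- keep duplicates, unlike plain UNION
              SELECT B FROM YourTable) AS sub
        WHERE sub.nr IN (SELECT A FROM YourTable)  -- only values present in A
        GROUP BY sub.nr
        ORDER BY sub.nr
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Illustrative data only; the question's real rows are unknown.
rows = [(11, 14), (11, 15), (12, 16), (12, 17), (13, 12)]
print(count_across_columns(rows))  # [(11, 2), (12, 3), (13, 1)]
```

Adding HAVING COUNT(*) > 1 after the GROUP BY would drop 13 and leave only the values that are actually duplicated.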

MySQL max value in row

I am facing a problem with MySQL query which is a variant of "Id for row with max value". I am either getting error or incorrect result for all my trials.
Here is the table structure
Row_id
Group_id
Grp_col1
Grp_col2
Field_for_aggregate_func
Another_field_for_row
For all rows with a particular group_id, I want to group by fields Grp_col1, Grp_col2 then get max value of Field_for_aggregate_func and then corresponding value of Another_field_for_row.
Query I have tried is like below
SELECT c.*
FROM mytable as c left outer join mytable as c1
on (
c.group_id=c1.group_id and
c.Grp_col1 = c1.Grp_col1 and
c.Grp_col2 = c1.Grp_col2 and
c.Field_for_aggregate_func > c1.Field_for_aggregate_func
)
where c.group_id=2
Among alternative solutions for this problem I want a high performance solution as this will be used for large set of data.
EDIT: Here is the sample set of row and expected answer
Group_ID Grp_col1 Grp_col2 Field_for_aggregate_func Another_field_for_row
2 -- N 12/31/2015 35
2 -- N 1/31/2016 15 select 15 from group for max value 1/31/2016
2 -- Y 12/31/2015 5
2 -- Y 1/1/2016 15
2 -- Y 1/2/2016 25
2 -- Y 1/3/2016 30 select 30 from group for max value 1/3/2016
You can use a sub-query to find the maximums, then join that with the original table, along the lines of:
select m1.group_id, m1.grp_col1, m1.grp_col2, m1.another_field_for_row, max_value
from mytable m1, (
select group_id, grp_col1, grp_col2, max(field_for_aggregate_func) as max_value
from mytable
group by group_id, grp_col1, grp_col2) as m2
where m1.group_id=m2.group_id
and m1.grp_col1=m2.grp_col1
and m1.grp_col2=m2.grp_col2
and m1.field_for_aggregate_func=m2.max_value;
Watch out for when there is more than one max_value for the given grouping. You'll get multiple rows for that grouping. Fiddle here.
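Here is a minimal sketch of that join-on-the-maximum idea run against the sample rows from the question, using Python's sqlite3 module in place of MySQL. The dates are stored as ISO-formatted text so that MAX() orders them correctly (an assumption of this sketch; a DATE column would be the natural choice in MySQL), and the function name is made up:

```python
import sqlite3

def rows_with_group_max(rows, group_id):
    """Return (grp_col2, max date, another_field_for_row) per (grp_col1, grp_col2) group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE mytable (
            group_id INTEGER, grp_col1 TEXT, grp_col2 TEXT,
            field_for_aggregate_func TEXT,  -- ISO date string, e.g. '2016-01-31'
            another_field_for_row INTEGER)
    """)
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT m1.grp_col2, m1.field_for_aggregate_func, m1.another_field_for_row
        FROM mytable m1
        JOIN (SELECT group_id, grp_col1, grp_col2,
                     MAX(field_for_aggregate_func) AS max_value
              FROM mytable
              WHERE group_id = ?
              GROUP BY group_id, grp_col1, grp_col2) m2
          ON m1.group_id = m2.group_id
         AND m1.grp_col1 = m2.grp_col1
         AND m1.grp_col2 = m2.grp_col2
         AND m1.field_for_aggregate_func = m2.max_value
        ORDER BY m1.grp_col2
    """
    result = conn.execute(query, (group_id,)).fetchall()
    conn.close()
    return result

sample = [
    (2, "--", "N", "2015-12-31", 35),
    (2, "--", "N", "2016-01-31", 15),
    (2, "--", "Y", "2015-12-31", 5),
    (2, "--", "Y", "2016-01-01", 15),
    (2, "--", "Y", "2016-01-02", 25),
    (2, "--", "Y", "2016-01-03", 30),
]
print(rows_with_group_max(sample, 2))
# [('N', '2016-01-31', 15), ('Y', '2016-01-03', 30)]
```

Filtering on group_id inside the subquery keeps the aggregation small; on a large table an index covering the grouping columns and the aggregated column would likely help as well.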
Try this.
See Fiddle demo here
http://sqlfiddle.com/#!9/9a3c26/8
Select t1.* from table1 t1 inner join
(
Select a.group_id,a.grp_col2,
a.Field_for_aggregate_func,
count(*) as rnum from table1 a
Inner join table1 b
On a.group_id=b.group_id
And a.grp_col2=b.grp_col2
And a.Field_for_aggregate_func
<=b.Field_for_aggregate_func
Group by a.group_id,
a.grp_col2,
a.Field_for_aggregate_func) t2
On t1.group_id=t2.group_id
And t1.grp_col2=t2.grp_col2
And t1.Field_for_aggregate_func
=t2.Field_for_aggregate_func
And t2.rnum=1
Here I first assign a row number in descending order based on the date, then select all the records for the latest date.

Fetch 2nd Highest value from MySql DB with GROUP BY

I have a table tbl_patient and I want to fetch last 2 visit of each patient in order to compare whether patient condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient as,
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the 2nd last visit of each patient with the query below, but it gives me an error
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id1 is the ID of the last visit, id2 is the ID of the 2nd to last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant looking statements tend to come with poor (or unbearable) performance when dealing with large sets. And the statements that tend to have better performance are less elegant looking.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic RANK() OVER(PARTITION ...) function available in other DBMS:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patients p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the return to just those patients that have at least two rows.
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patients t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patients p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patients t
LEFT
JOIN tbl_patients p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
HAVING COUNT(p.id) < 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n**2).
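On MySQL 8.0 or later the user-variable emulation should no longer be needed, since ROW_NUMBER() OVER (PARTITION BY ...) is available directly (to the best of my knowledge). Here is a minimal sketch of that variant against the question's sample rows, run through Python's sqlite3 module as a stand-in and assuming the bundled SQLite is 3.25 or newer; the function name is made up:

```python
import sqlite3

def last_two_visits(rows):
    """Return (patient_ID, id, patient_result) for the two latest rows of each patient."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE tbl_patient (
            id INTEGER PRIMARY KEY, patient_ID INTEGER,
            visit_ID INTEGER, patient_result INTEGER)
    """)
    conn.executemany("INSERT INTO tbl_patient VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT ranked.patient_ID, ranked.id, ranked.patient_result
        FROM (SELECT id, patient_ID, patient_result,
                     -- number each patient's rows from newest (1) to oldest
                     ROW_NUMBER() OVER (PARTITION BY patient_ID
                                        ORDER BY id DESC) AS rn
              FROM tbl_patient) AS ranked
        WHERE ranked.rn <= 2
        ORDER BY ranked.patient_ID, ranked.id DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7),
          (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)]
print(last_two_visits(sample))
# [(1, 6, 9), (1, 4, 3), (2, 5, 2), (2, 3, 7)]
```

As with the user-variable query, a patient with a single row still appears once in the result.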
Try this..
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not use simply...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?

Get all the rows for the most recent 3 groups

I googled a bit and looked on SO but I didn't find anything that helped me.
I have a working MySQL query that selects some columns (across three tables, with two JOIN statements) and I am looking to do something extra on the result set.
I would like to SELECT all rows from the 3 most recent groups. (I can only assume I have to use a GROUP BY on that column) I'm having a hard time explaining this clearly so I'll use an example:
id | group
--------------
1 | 1
2 | 2
3 | 2
4 | 2
5 | 3
6 | 3
7 | 4
8 | 4
Of course, I dumbed it down a lot for the sake of simplicity (and my current query doesn't include an id column).
Right now my ideal query would return, in order (that's the id field):
8, 7, 6, 5, 4, 3, 2
If I were to add the following 9th element:
id | group
--------------
9 | 5
My ideal query would then return, in order:
9, 8, 7, 6, 5
Because these are all the rows from the 3 most recent groups. Also, when two rows have the same group (and are still in the results set), I would like to ORDER them BY another field (which I have not included in my dumbed down example).
In my search I only found how to do actions on elements of GROUPS (MAX of each, AVG of group elements, etc.) and not GROUPS themselves (first 3 groups ordered by a field).
Thank you in advance for your help!
Edit: Here is what my real query looks like.
SELECT t1.f1, t1.f2, t2.f1, t2.f2, t2.f3, t3.f1, t3.f2, t3.f3, t3.f4
FROM t1
LEFT JOIN t2 ON t2.f1=t1.f3
LEFT JOIN t3 ON t2.f1=t3.f5
WHERE t1.f4='some_constant' AND t2.f4='some_other_constant'
ORDER BY t1.f2 DESC
SELECT `table`.* FROM
(SELECT DISTINCT `group`
FROM `table`
ORDER BY `group` DESC LIMIT 3) t1
INNER JOIN `table` ON `table`.`group` = t1.`group`
The subquery should return the three groups with the largest values, and the INNER JOIN ensures that no rows are included which do not have one of these group values.
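To check this against the example in the question, here is a minimal sketch run through Python's sqlite3 module as a stand-in for MySQL. The table name items and the column name grp are made up (group is a reserved word, so a different name avoids quoting), and an explicit ORDER BY is added to get the order the asker wants:

```python
import sqlite3

def ids_in_latest_groups(rows, how_many=3):
    """Return ids of all rows belonging to the `how_many` highest group values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, grp INTEGER)")
    conn.executemany("INSERT INTO items (id, grp) VALUES (?, ?)", rows)
    query = """
        SELECT t.id
        FROM (SELECT DISTINCT grp
              FROM items
              ORDER BY grp DESC
              LIMIT ?) AS recent
        INNER JOIN items t ON t.grp = recent.grp
        ORDER BY t.grp DESC, t.id DESC
    """
    result = [row[0] for row in conn.execute(query, (how_many,))]
    conn.close()
    return result

sample = [(1, 1), (2, 2), (3, 2), (4, 2), (5, 3), (6, 3), (7, 4), (8, 4)]
print(ids_in_latest_groups(sample))             # [8, 7, 6, 5, 4, 3, 2]
print(ids_in_latest_groups(sample + [(9, 5)]))  # [9, 8, 7, 6, 5]
```

The second sort key in the outer ORDER BY is where the asker's other field would go to order rows within the same group.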
assuming t1.f2 is your group column:
SELECT a,b,c,d,e,f,g,h,i
FROM
(
SELECT t1.f1 as a, t1.f2 as b, t2.f1 as c, t2.f2 as d, t2.f3 as e, t3.f1 as f, t3.f2 as g, t3.f3 as h, t3.f4 as i
FROM t1
LEFT JOIN t2 ON t2.f1=t1.f3
LEFT JOIN t3 ON t2.f1=t3.f5
WHERE t1.f4='some_constant' AND t2.f4='some_other_constant'
ORDER BY t1.f2 DESC
) first_table
INNER JOIN
(
SELECT DISTINCT `f2`
FROM `t1`
ORDER BY `f2` DESC LIMIT 3
) second_table
ON first_table.b = second_table.f2
Note that this may be very inefficient depending on your table structure, but is the best I can do without more information.
How about this way... (I use groupId instead of 'group')
[QUERY] => something like (SELECT id, groupId from tables.....) (your query with 2 joins).
-- with this query you have the last three groups.
[QUERY2] => SELECT distinct(groupId) as groupId FROM ([QUERY]) AS q ORDER BY groupId DESC LIMIT 0,3
and finally you will have:
SELECT id, groupId from tables----...... WHERE groupId in ([QUERY2]) order by groupId DESC, id DESC