MySQL Conditions from Multiple Rows - mysql

I have been trying to calculate a table using a SELECT statement. I have a table like this:
--------------------------------------------------------------
AgentID | Date | Incurred | FallOffDate
==============================================================
kegomez | 2012-11-19 | 2.0 | 2013-11-19
kegomez | 2012-11-24 | 0.5 | 2013-11-24
kegomez | 2013-01-21 | 2.0 | 2014-01-21
kegomez | 2013-08-18 | 2.0 | 2014-08-18
I was trying to do the calculations during the SELECT and possibly create a view, but have had no luck so far. In the end the table should look like this:
--------------------------------------------------------------
AgentID | Date | Incurred | 90 | 180 | Total | FallOffDate
==============================================================
kegomez | 2012-11-19 | 2.0 | 2.0 | 2.0 | 2.0 | 2013-11-19
kegomez | 2012-11-24 | 0.5 | 0.5 | 0.5 | 2.5 | 2013-11-24
kegomez | 2013-01-21 | 2.0 | 1.0 | 0.0 | 2.5 | 2014-01-21
kegomez | 2013-08-18 | 2.0 | 2.0 | 2.0 | 4.5 | 2014-08-18
The Total column uses values from the previous row to calculate its value. For example, the date in row 4 needs to reference the date in row 3 to see whether it is greater. Would I need to do this with a subquery? The way this will eventually work is that every 90 days, up to 180 days, the agent will lose 1 point if no more points are incurred, which is why I need to reference other rows. If it helps, this data is currently in Excel, but it is getting too large to manage and we need to move it to something that performs better.
SELECT AgentID, Date, Incurred,
@90 := IF(Date<=CURDATE()-90 AND @r=0, Incurred-1.0, IF(Difference>90, Incurred-1, Incurred)) AS 90Day,
@180 := IF(Date<=CURDATE()-90 AND @r=0, Incurred-1.0, IF(Difference>180, Incurred-2, @90)) AS 180Day,
@Total := IF(@180<0,0,IF(FallOffDate<=CURDATE(),0, @180)) AS Total,
FallOffDate
FROM (SELECT mo.AgentID, mo.Incurred, FallOffDate,
@r AS LEAD_date,
DATEDIFF(@r,Date) AS Difference,
(@r := Date) AS Date
FROM (
SELECT m.*
FROM (
SELECT @_date = NULL
) VARIABLE,
attendance m
ORDER BY
AgentID, Date DESC
) mo
WHERE (CASE WHEN @_date IS NULL OR @_date <> date THEN @r := NULL ELSE NULL END IS NULL)
AND (@_date := date) IS NOT NULL) T
ORDER BY AgentID, Date;

Yes, you need to do this with a subquery. Try something like this (if you want the 4th row to access the date in the 3rd row):
SELECT mo.AgentID, mo.date,
@r AS 'LAG(date)',
(case when @r<Date then 'YES' when @r is null then 'IS NULL' else 'NO' end) 'Is Bigger',
(@r := Date) AS Date
FROM (
SELECT m.*
FROM (
SELECT @_date = NULL
) variable,
data m
ORDER BY
AgentID
) mo
WHERE (CASE WHEN @_date IS NULL OR @_date <> date THEN @r := NULL ELSE NULL END IS NULL)
AND (@_date := date) IS NOT NULL
You can see a working demo here
Or you can try this query if you want the 3rd row to have access to the date in the 4th row:
SELECT AgentID,date,LEAD_date,concat(Difference,' days') FROM
(SELECT mo.AgentID,
@r AS LEAD_date,
DATEDIFF(@r,Date) as Difference,
(@r := Date) AS Date
FROM (
SELECT m.*
FROM (
SELECT @_date = NULL
) variable,
data m
ORDER BY
AgentID,date desc
) mo
WHERE (CASE WHEN @_date IS NULL OR @_date <> date THEN @r := NULL ELSE NULL END IS NULL)
AND (@_date := date) IS NOT NULL) T
order by AgentID,date;
You can see a working demo here
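As an alternative that does not depend on the order in which user variables are evaluated, the next date per agent can also be looked up with a correlated subquery. The sketch below is only an illustration: it runs the idea against an in-memory SQLite database from Python (an assumption made so the example is self-contained), the table name `attendance` is taken from the question's query, and `julianday()` is SQLite-specific; in MySQL the difference would be written as `DATEDIFF(lead_date, Date)`.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance (AgentID TEXT, Date TEXT, Incurred REAL)")
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?)",
    [
        ("kegomez", "2012-11-19", 2.0),
        ("kegomez", "2012-11-24", 0.5),
        ("kegomez", "2013-01-21", 2.0),
        ("kegomez", "2013-08-18", 2.0),
    ],
)

# The inner query finds, for each row, the earliest later date of the same
# agent. The outer query turns the pair of dates into a day difference.
LEAD_SQL = """
SELECT AgentID, Date, lead_date,
       CAST(julianday(lead_date) - julianday(Date) AS INTEGER) AS difference
FROM (
    SELECT a.AgentID, a.Date,
           (SELECT MIN(b.Date)
              FROM attendance b
             WHERE b.AgentID = a.AgentID
               AND b.Date > a.Date) AS lead_date
    FROM attendance a
)
ORDER BY AgentID, Date
"""

rows = conn.execute(LEAD_SQL).fetchall()
for row in rows:
    print(row)
```

The last row of each agent has no later date, so both `lead_date` and `difference` come back as NULL (`None` in Python).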

Related

mysql avg length of a date squence

I have a report I'm trying to figure out, but I would like to do it all within a SQL statement instead of needing to iterate over a bunch of data in a script.
I have a table that is structured like:
CREATE TABLE `batch_item` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`record_id` int(11) DEFAULT NULL,
`created` DATE NOT NULL,
PRIMARY KEY (`id`),
KEY `record_id` (`record_id`)
);
The Date field is always YEAR-MONTH-01. Data looks something like:
+------+-----------+------------+
| id | record_id | created |
+------+-----------+------------+
| 1 | 1 | 2019-01-01 |
| 2 | 2 | 2019-01-01 |
| 3 | 3 | 2019-01-01 |
| 4 | 1 | 2019-02-01 |
| 5 | 2 | 2019-02-01 |
| 6 | 1 | 2019-03-01 |
| 7 | 3 | 2019-03-01 |
| 8 | 1 | 2019-04-01 |
| 9 | 2 | 2019-04-01 |
+------+-----------+------------+
So what I'm trying to do, without having to create a looping script, is find the average number of sequential months for each record. With the data above, that would be:
Record_id 1 would have an average of 4 months.
Record_id 2 would be 1.5
Record_id 3 would be 1
I can write a script to iterate through all the records. I just would rather avoid that.
This is a gaps-and-islands problem. You simply need an enumeration of the rows for this to work. In MySQL 8+, you would use row_number() but you can use a global enumeration here:
select record_id, min(created) as min_created, max(created) as max_created, count(*) as num_months
from (select bi.*, (created - interval n month) as grp
from (select bi.*, (@rn := @rn + 1) as n -- generate some numbers
from batch_item bi cross join
(select @rn := 0) params
order by bi.record_id, bi.created
) bi
) bi
group by record_id, grp;
Note that when using row_number(), you would normally partition by record_id. However that is not necessary, if the numbers are created in the correct sequence.
The above query gets the islands. For your final results, you need one more level of aggregation:
select record_id, avg(num_months)
from (select record_id, min(created) as min_created, max(created) as max_created, count(*) as num_months
from (select bi.*, (created - interval n month) as grp
from (select bi.*, (@rn := @rn + 1) as n -- generate some numbers
from batch_item bi cross join
(select @rn := 0) params
order by bi.record_id, bi.created
) bi
) bi
group by record_id, grp
) bi
group by record_id;
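The island idea can be checked against the sample data with a small sketch. This is an illustration under assumptions, not the answer's exact query: it uses an in-memory SQLite database from Python, numbers the rows per `record_id` with a correlated `COUNT(*)` instead of a user variable, and builds the island key as "month index minus row number" (SQLite has no `interval n month` arithmetic, so `strftime` is used to derive a month index).

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batch_item (id INTEGER PRIMARY KEY, record_id INTEGER, created TEXT)"
)
conn.executemany(
    "INSERT INTO batch_item VALUES (?, ?, ?)",
    [
        (1, 1, "2019-01-01"), (2, 2, "2019-01-01"), (3, 3, "2019-01-01"),
        (4, 1, "2019-02-01"), (5, 2, "2019-02-01"), (6, 1, "2019-03-01"),
        (7, 3, "2019-03-01"), (8, 1, "2019-04-01"), (9, 2, "2019-04-01"),
    ],
)

# Innermost query: month index minus per-record row number. Consecutive
# months share the same value, so it identifies an island.
# Middle query: one row per island with its length in months.
# Outer query: average island length per record.
ISLANDS_SQL = """
SELECT record_id, AVG(num_months) AS avg_months
FROM (
    SELECT record_id, COUNT(*) AS num_months
    FROM (
        SELECT b.record_id,
               CAST(strftime('%Y', b.created) AS INTEGER) * 12
                 + CAST(strftime('%m', b.created) AS INTEGER)
                 - (SELECT COUNT(*)
                      FROM batch_item b2
                     WHERE b2.record_id = b.record_id
                       AND b2.created <= b.created) AS grp
        FROM batch_item b
    )
    GROUP BY record_id, grp
)
GROUP BY record_id
ORDER BY record_id
"""

avg_by_record = dict(conn.execute(ISLANDS_SQL).fetchall())
print(avg_by_record)
```

For the sample rows this gives 4 for record 1, 1.5 for record 2 and 1 for record 3, matching the averages the question asks for. The correlated count assumes at most one row per record and month.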
This is not a tested solution, but it should work in MySQL 8.x with minor tweaks. Window functions are not allowed in WHERE, so LEAD() is computed in a derived table first:
with
a as ( -- the last row of each island
select id, record_id, created
from (
select bi.*,
lead(created) over(partition by record_id order by created) as next_created
from batch_item bi
) x
where next_created is null
or next_created > created + interval 1 month
),
e as ( -- each row, now with the last row of its island
select b.id, b.record_id, min(a.created) as end_created
from batch_item b
join a on b.record_id = a.record_id and b.created <= a.created
group by b.id, b.record_id
),
m as ( -- each island with the number of months it has
select
record_id, end_created, count(*) as months
from e
group by record_id, end_created
)
select -- the average length of islands for each record_id
record_id, avg(months) as avg_months
from m
group by record_id

MySQL: how to assign same ID for records with close timestamp

I have a MySQL table with a timestamp column t. I need to create another integer column (groupId) which will have the same value for records whose timestamps are less than 3 seconds apart. My version of MySQL has no window function support. This is the expected output in the 2nd column:
+---------------------+--------+
| t | groupId|
+---------------------+--------+
| 2017-06-17 18:15:13 | 1 |
| 2017-06-17 18:15:14 | 1 |
| 2017-06-17 20:30:06 | 2 |
| 2017-06-17 20:30:07 | 2 |
| 2017-06-17 22:44:58 | 3 |
| 2017-06-17 22:44:59 | 3 |
| 2017-06-17 23:59:50 | 4 |
| 2017-06-17 23:59:51 | 4 |
I tried to use a self-join and TIMESTAMPDIFF(SECOND,t1,t2) < 3, but I do not know how to generate the unique groupId.
P.S.
It is guaranteed by the nature of the data that there is no continuous range which spans more than 3 seconds.
You can do this using variables.
select tm
,@diff:=timestampdiff(second,@prev,tm)
,@prev:=tm
,@grp:=case when @diff<3 or @diff is null then @grp else @grp+1 end as groupID
from t
cross join (select @prev:='',@diff:=0,@grp:=1) r
order by tm
For this, I believe you need to create a stored procedure that first sorts your table by the column t (timestamp) and then goes through it, grouping rows and assigning the groupId accordingly. In this case you can use your own counter as the groupId.
What is important here is how you split the time into frames of 2 seconds; you could end up with different results depending on your point of reference.
This query puts a record in the same group as the previous record when the previous record is at most 3 seconds earlier:
UPDATE t
JOIN (
SELECT
t.*
, @gid := IF(TIMESTAMPDIFF(SECOND, @prev, t) > 3, @gid + 1, @gid) AS gid
, @prev := t
FROM t
, (SELECT @prev := NULL, @gid := 1) v
ORDER BY t
) sq ON t.t = sq.t
SET t.groupId = sq.gid;
see it working live in an sqlfiddle
learn more about user-defined variables here
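The row-by-row logic that the variable-based queries rely on can be sketched outside the database to make the grouping rule explicit. This is a hypothetical helper (`assign_group_ids` and its `threshold_seconds` parameter are names invented here), assuming the question's rule that rows less than 3 seconds after the previous row stay in the same group:

```python
from datetime import datetime


def assign_group_ids(timestamps, threshold_seconds=3):
    """Return (timestamp, group_id) pairs; a new group starts whenever the
    gap to the previous timestamp is at least threshold_seconds."""
    result = []
    prev = None
    group_id = 0
    for ts in sorted(timestamps):
        # The first row always opens group 1; later rows open a new group
        # only when they are too far from their predecessor.
        if prev is None or (ts - prev).total_seconds() >= threshold_seconds:
            group_id += 1
        result.append((ts, group_id))
        prev = ts
    return result


raw = [
    "2017-06-17 18:15:13", "2017-06-17 18:15:14",
    "2017-06-17 20:30:06", "2017-06-17 20:30:07",
    "2017-06-17 22:44:58", "2017-06-17 22:44:59",
    "2017-06-17 23:59:50", "2017-06-17 23:59:51",
]
parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in raw]
group_ids = [gid for _, gid in assign_group_ids(parsed)]
print(group_ids)
```

Each row is compared only with its immediate predecessor, which is the same comparison the `@prev` variable performs in the queries above.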
This query will work in Oracle sql:
select *
from (
select e.*,
rank() over (partition by trunc(hiredate,'mi') order by trunc(hiredate,'mi') desc) MINu
from emp e
)

Same Query returns different results (MySQL Group By)

This only happens for queries that force GROUP BY after ORDER BY.
Goal:
Get latest balance for each unit for the given cardID.
Table:
cardID | unit | balance | date
--------|-----------|-----------|--------------
A1 | DEPOSIT | 100 | 2016-05-01
A1 | DEPOSIT | 90 | 2016-05-02
A1 | DEPOSIT | 80 | 2016-05-03
A1 | DEPOSIT | 75 | 2016-05-04
A1 | MINUTE | 1000 | 2016-05-01
A1 | MINUTE | 900 | 2016-05-02
A1 | MINUTE | 800 | 2016-05-03
Query:
SELECT * FROM (
SELECT unit, balance
FROM cardBalances
WHERE cardID = 'A1'
ORDER BY date DESC
) AS cb
GROUP BY cb.unit;
Expected Result (MySQL v5.5.38):
unit | balance
---------|-----------
DEPOSIT | 75
MINUTE | 800
Unexpected Result (MySQL v5.7.13):
unit | balance
---------|-----------
DEPOSIT | 100
MINUTE | 1000
After upgrading to MySQL v5.7.13, the result returns the initial balances; as if no deduction occurred for the given card.
Is this a bug in MySQL version?
Would you suggest any other, more reliable way to solve this?
This is a bug in your use of the database. MySQL is quite explicit that when you include columns in the SELECT clause in an aggregation query -- and they are not in the GROUP BY -- then they come from indeterminate rows.
Such syntax is specific to MySQL. It is not only a bad habit to learn; it simply will not work in most other databases.
You can do what you want in various ways. Here is one:
SELECT cb.*
FROM cardBalances cb
WHERE cardId = 'A1' AND
cb.date = (SELECT MAX(date)
FROM cardBalances cb2
WHERE cb2.cardId = 'A1' AND cb2.unit = cb.unit
);
This has the advantage that it can use an index on cardBalances(unit, CardId, date).
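To see the correlated-subquery approach return the latest balances for the sample data, here is a minimal sketch. It runs against an in-memory SQLite database from Python, which is an assumption made for a self-contained example; the query itself uses only standard SQL and mirrors the one above, with the table and column names taken from the question.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cardBalances (cardID TEXT, unit TEXT, balance INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO cardBalances VALUES (?, ?, ?, ?)",
    [
        ("A1", "DEPOSIT", 100, "2016-05-01"),
        ("A1", "DEPOSIT", 90, "2016-05-02"),
        ("A1", "DEPOSIT", 80, "2016-05-03"),
        ("A1", "DEPOSIT", 75, "2016-05-04"),
        ("A1", "MINUTE", 1000, "2016-05-01"),
        ("A1", "MINUTE", 900, "2016-05-02"),
        ("A1", "MINUTE", 800, "2016-05-03"),
    ],
)

# Keep only the row whose date equals the maximum date for its card and unit.
LATEST_SQL = """
SELECT cb.unit, cb.balance
FROM cardBalances cb
WHERE cb.cardID = 'A1'
  AND cb.date = (SELECT MAX(cb2.date)
                   FROM cardBalances cb2
                  WHERE cb2.cardID = cb.cardID
                    AND cb2.unit = cb.unit)
ORDER BY cb.unit
"""

latest = conn.execute(LATEST_SQL).fetchall()
print(latest)
```

Because the row is chosen by an explicit condition rather than by the order of a derived table, the result does not depend on which row the engine happens to pick for a group.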
Just another perspective: add a row number based on the cardId, unit and descending order of date.
Query
select t1.unit, t1.balance from
(
select cardId, unit, balance, `date`,
(
case unit when @curA
then @curRow := @curRow + 1
else @curRow := 1 and @curA := unit end
) + 1 as num
from cardBalances t,
(select @curRow := 0, @curA := '') r
order by cardId, unit, `date` desc
)t1
where t1.num = 1
order by t1.unit;
SQL Fiddle Demo

SQL: transform rows into columns in MySQL (SELECT statement)

I have the tables orders and order_comments. Each order can have from 0 to n comments. I would like to get a list of all orders with their comments in a specific order.
Table orders:
order_id | order_nr
1 | 5252
4 | 6783
5 | 6785
Table order_comments
id_order_comments | order_fk | created_at | email | content
1 | 4 | 2015-01-12 | jack | some text here
2 | 5 | 2015-01-13 | marta | some text here
3 | 5 | 2015-01-14 | beata | some text here
4 | 4 | 2015-01-16 | julia | some text here
As a result, I would like to get 1 row for each order. Comments should be shown in separate columns, starting from the oldest comment. So desired output in this case is:
order_id | 1_comment_created_at | 1_comment_author | 1_comment_content | 2_comment_created_at | 2_comment_author | 2_comment_content
1 | NULL | NULL | NULL | NULL | NULL | NULL
4 | 2015-01-12 | jack | some text here | 2015-01-16 | Julia | some text here
5 | 2015-01-13 | marta | some text here | 2015-01-14 | beata | some text here
I found this: MySQL - Rows to Columns - but I cannot use 'create view'.
I found this: http://dev.mysql.com/doc/refman/5.5/en/while.html - but I cannot create procedure in this db.
What I got:
SELECT @c := (SELECT count(*) FROM order_comments GROUP BY order_fk ORDER BY count(*) DESC LIMIT 1);
SET @rank=0;
SET @test=0;
SELECT
CASE WHEN @test < @c AND temp.comment_id = @test THEN temp.created_at END AS created,
CASE WHEN @test < @c AND temp.comment_id = @test THEN temp.author END AS author,
CASE WHEN @test < @c AND temp.comment_id = @test THEN temp.content END AS content
/*But I cannot set @test as +1. And I cannot name column with variable - like CONCAT(@test, '_created')*/
FROM (
SELECT @rank := @rank +1 AS comment_id, created_at, author, content
FROM order_comments
WHERE order_fk = 4
ORDER BY created_at
) AS temp
Problem: I would like to search more than 1 order. I should get orders with no comments too.
What can I do?
You can use variables for this type of pivot, but the query is a bit more complicated, because you need to enumerate the values for each order:
SELECT o.order_id,
MAX(case when rank = 1 then created_at end) as created_at_1,
MAX(case when rank = 1 then email end) as email_1,
MAX(case when rank = 1 then content end) as content_1,
MAX(case when rank = 2 then created_at end) as created_at_2,
MAX(case when rank = 2 then email end) as email_2,
MAX(case when rank = 2 then content end) as content_2
FROM orders o LEFT JOIN
(SELECT oc.*,
(@rn := if(@o = order_fk, @rn + 1,
if(@o := order_fk, 1, 1)
)
) as rank
FROM order_comments oc CROSS JOIN
(SELECT @rn := 0, @o := 0) vars
ORDER BY order_fk, created_at
) oc
ON o.order_id = oc.order_fk
GROUP BY o.order_id;
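The pivot itself (conditional aggregation over a per-order comment number) can be tried on the sample data with the sketch below. It is only an illustration under assumptions: it uses an in-memory SQLite database from Python, numbers the comments with a correlated `COUNT(*)` instead of user variables, assumes comment dates are distinct within an order, and leaves out the `content` columns for brevity.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_nr INTEGER)")
conn.execute(
    "CREATE TABLE order_comments (id_order_comments INTEGER, order_fk INTEGER, "
    "created_at TEXT, email TEXT, content TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 5252), (4, 6783), (5, 6785)])
conn.executemany(
    "INSERT INTO order_comments VALUES (?, ?, ?, ?, ?)",
    [
        (1, 4, "2015-01-12", "jack", "some text here"),
        (2, 5, "2015-01-13", "marta", "some text here"),
        (3, 5, "2015-01-14", "beata", "some text here"),
        (4, 4, "2015-01-16", "julia", "some text here"),
    ],
)

# rn numbers each order's comments from oldest to newest; the CASE
# expressions then route comment 1 and comment 2 into separate columns.
# The LEFT JOIN keeps orders that have no comments at all.
PIVOT_SQL = """
SELECT o.order_id,
       MAX(CASE WHEN oc.rn = 1 THEN oc.created_at END) AS created_at_1,
       MAX(CASE WHEN oc.rn = 1 THEN oc.email END)      AS email_1,
       MAX(CASE WHEN oc.rn = 2 THEN oc.created_at END) AS created_at_2,
       MAX(CASE WHEN oc.rn = 2 THEN oc.email END)      AS email_2
FROM orders o
LEFT JOIN (
    SELECT c.*,
           (SELECT COUNT(*)
              FROM order_comments c2
             WHERE c2.order_fk = c.order_fk
               AND c2.created_at <= c.created_at) AS rn
    FROM order_comments c
) oc ON oc.order_fk = o.order_id
GROUP BY o.order_id
ORDER BY o.order_id
"""

pivot = conn.execute(PIVOT_SQL).fetchall()
for row in pivot:
    print(row)
```

As in the answer's query, the number of comment columns is fixed in the SQL text, so supporting a third comment means adding another set of `CASE` expressions.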

Null Values during Query

I have the below table, pretty simple.
==========================================================================
attendanceID | agentID | incurredDate | points | comment
==========================================================================
10 | vimunson | 2013-07-22 | 2 | Some Text
11 | vimunson | 2013-07-29 | 2 | Some Text
12 | vimunson | 2013-12-06 | 1 | Some Text
This is the query below:
SELECT
attendanceID,
agentID,
incurredDate,
leadDate,
points,
@1F:=IF(incurredDate <= curdate() - 90
AND leadDate = NULL,
points - 1,
IF(DATEDIFF(leadDate, incurredDate) > 90,
points - 1,
points)) AS '1stFallOff',
@2F:=IF(incurredDate <= curdate() - 180
AND leadDate = NULL,
points - 2,
IF(DATEDIFF(leadDate, incurredDate) > 180,
points - 2,
@1F)) AS '2ndFallOff',
IF(@total < 0, 0, @total:=@total + @2F) AS Total,
comment,
linked
FROM
(SELECT
attendanceID,
mo.agentID,
@r AS leadDate,
(@r:=incurredDate) AS incurredDate,
comment,
points,
linked
FROM
(SELECT
m . *
FROM
(SELECT @_date = NULL, @total:=0) varaible, attendance m
ORDER by agentID , incurredDate desc) mo
where
agentID = 'vimunson'
AND (case
WHEN @_date is NULL or @_date <> incurredDate THEN @r:=NULL
ELSE NULL
END IS NULL)
AND (@_date:=incurredDate) IS NOT NULL) T
ORDER BY agentID , incurredDate
When I run the query it returns the below:
========================================================================================================================================
attendanceID | agentID | incurredDate | leadDate | points | 1stFallOff | 2ndFallOff | Total | comment
========================================================================================================================================
10 | vimunson | 2013-07-22 | NULL | 2 | 2 | 2 | 2 | Some Text
11 | vimunson | 2013-07-29 | NULL | 2 | 2 | 2 | 4 | Some Text
12 | vimunson | 2013-12-06 | NULL | 1 | 2 | 1 | 5 | Some Text
I cannot figure out why the leadDate column is NULL. I have narrowed it down to the user session: for example, if I run the query again in the same user session, it comes back with what I want.
The way variables #r and #_date are passed around relies on a specific order in which certain parts of the query are evaluated. That's a risky assumption to make in a query language that is declarative rather than imperative. The more sophisticated a query optimizer is, the more unpredictable the behaviour of this query will be. A 'simple' engine might follow your intentions, another engine might adapt its behaviour as you go, for example because it uses temporary indexes to improve query performance.
In situations where you need to pass values from one row to another, it would be better to use a cursor.
http://dev.mysql.com/doc/refman/5.0/en/cursors.html
EDIT: sample code below.
I focused on column 'leadDate'; implementation of the falloff and total columns should be similar.
CREATE PROCEDURE MyProc()
BEGIN
DECLARE done int DEFAULT FALSE;
DECLARE currentAttendanceID int;
DECLARE currentAgentID, previousAgentID varchar(8);
DECLARE currentIncurredDate date;
DECLARE currentLeadDate date;
DECLARE currentPoints int;
DECLARE currentComment varchar(9);
DECLARE myCursor CURSOR FOR
SELECT attendanceID, agentID, incurredDate, points, comment
FROM attendance
ORDER BY agentID, incurredDate DESC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
CREATE TEMPORARY TABLE myTemp (
attendanceID int,
agentID varchar(8),
incurredDate date,
leadDate date,
points int,
comment varchar(9)
) ENGINE=MEMORY;
OPEN myCursor;
SET previousAgentID := NULL;
read_loop: LOOP
FETCH myCursor INTO currentAttendanceID, currentAgentID, currentIncurredDate, currentPoints, currentComment;
IF done THEN
LEAVE read_loop;
END IF;
IF previousAgentID IS NULL OR previousAgentID <> currentAgentID THEN
SET currentLeadDate := NULL;
SET previousAgentID := currentAgentID;
END IF;
INSERT INTO myTemp VALUES (currentAttendanceID, currentAgentID, currentIncurredDate, currentLeadDate, currentPoints, currentComment);
SET currentLeadDate := currentIncurredDate;
END LOOP;
CLOSE myCursor;
SELECT *
FROM myTemp
ORDER BY agentID, incurredDate;
DROP TABLE myTemp;
END
FYC: http://sqlfiddle.com/#!2/910a3/1/0
Try this:
SELECT a.attendanceID, a.agentID, a.incurredDate, a.leadDate, a.points,
@1F:= (CASE WHEN incurredDate <= CURDATE()-90 AND leadDate=NULL THEN points-1
WHEN DATEDIFF(leadDate, incurredDate) > 90 THEN points-1
ELSE points
END) AS '1stFallOff',
@2F:= (CASE WHEN incurredDate <= CURDATE()-180 AND leadDate=NULL THEN points-2
WHEN DATEDIFF(leadDate, incurredDate) > 180 THEN points-2
ELSE @1F
END) AS '2ndFallOff',
(@Total:=@Total + a.points) AS Total, a.comment
FROM (SELECT a.attendanceID, a.agentID, a.incurredDate, MIN(b.incurredDate) leadDate, a.points, a.comment
FROM attendance a
LEFT JOIN attendance b ON a.agentID = b.agentID AND a.incurredDate < b.incurredDate
GROUP BY a.agentID, a.incurredDate
) AS A, (SELECT @Total:=0, @1F:=0, @2F:=0) B;
Check the SQL FIDDLE DEMO
OUTPUT
| ATTENDANCEID | AGENTID | INCURREDDATE | LEADDATE | POINTS | 1STFALLOFF | 2NDFALLOFF | TOTAL | COMMENT |
|--------------|----------|---------------------------------|---------------------------------|--------|------------|------------|-------|-----------|
| 10 | vimunson | July, 22 2013 00:00:00+0000 | July, 29 2013 00:00:00+0000 | 2 | 2 | 2 | 2 | Some Text |
| 11 | vimunson | July, 29 2013 00:00:00+0000 | December, 06 2013 00:00:00+0000 | 2 | 1 | 1 | 4 | Some Text |
| 12 | vimunson | December, 06 2013 00:00:00+0000 | (null) | 1 | 1 | 1 | 5 | Some Text |
After reviewing multiple results I was able to come up with something that is what I expect. I have entered more data and the below syntax. I liked the idea of the cursor but it was not ideal for my use, so I did not use it. I did not want to use CASE or any JOINS since they can be complex.
http://sqlfiddle.com/#!2/2fb86/1
SELECT
attendanceID,
agentID,
incurredDate,
@ld:=(select
incurredDate
from
attendance
where
incurredDate > a.incurredDate
and agentID = a.agentID
order by incurredDate
limit 1) leadDate,
points,
@1F:=IF(incurredDate <= DATE_SUB(curdate(),
INTERVAL IF(incurredDate < '2013-12-02', 90, 60) DAY)
AND @ld <=> NULL,
points - 1,
IF(DATEDIFF(COALESCE(@ld, '1900-01-01'),
incurredDate) > IF(incurredDate < '2013-12-02', 90, 60),
points - 1,
points)) AS '1stFallOff',
@2F:=IF(incurredDate <= DATE_SUB(curdate(),
INTERVAL IF(incurredDate < '2013-12-02',
180,
120) DAY)
AND getLeadDate(incurredDate, agentID) <=> NULL,
points - 1,
IF(DATEDIFF(COALESCE(@ld, '1900-01-01'),
incurredDate) > IF(incurredDate < '2013-12-02',
180,
120),
points - 2,
@1F)) AS '2ndFallOff',
IF((@total + @2F) < 0,
0,
IF(DATE_ADD(incurredDate, INTERVAL 365 DAY) <= CURDATE(),
@total:=0,
@total:=@total + @2F)) AS Total,
comment,
linked,
DATE_ADD(incurredDate, INTERVAL 365 DAY) AS 'fallOffDate'
FROM
(SELECT @total:=0) v,
attendance a
WHERE
agentID = 'vimunson'
GROUP BY agentID , incurredDate
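One detail worth noting in the query above is the `<=> NULL` comparison. In SQL, `x = NULL` evaluates to NULL rather than true, so a condition written as `leadDate = NULL` never matches, even for rows whose lead date is missing. MySQL's `<=>` is a NULL-safe equality operator, and `IS NULL` is the portable spelling. The sketch below demonstrates the difference using an in-memory SQLite database from Python (an assumption for a self-contained example; SQLite has no `<=>`, so `IS NULL` stands in for it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "= NULL" yields NULL (None in Python); "IS NULL" yields true (1).
eq_null, is_null = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()
print(eq_null, is_null)

# In a conditional, the "= NULL" branch is therefore never taken.
with_equals, with_is = conn.execute(
    """
    SELECT CASE WHEN NULL = NULL THEN 'matched' ELSE 'not matched' END,
           CASE WHEN NULL IS NULL THEN 'matched' ELSE 'not matched' END
    """
).fetchone()
print(with_equals, with_is)
```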