Create a time series from from-to-entries - mysql

I have some period data that look like this:
PName DtFrom DtTo Amount
Period_1 2018-01-01 2018-01-10 100
Period_2 2018-01-03 2018-01-08 10
Period_3 2018-01-05 2018-01-12 1
I would like to get the following time series adding the amounts that are inside the date frame of each period:
2018-01-01 100
2018-01-02 100
2018-01-03 110
2018-01-04 110
2018-01-05 111
2018-01-06 111
2018-01-07 111
2018-01-08 111
2018-01-09 101
2018-01-10 101
2018-01-11 1
2018-01-12 1
I have done a lot of research using DATE_ADD, DATEDIFF and so on, and looked at other StackOverflow questions, but without success. Any ideas?

Although I'd really advocate handling this in application code, here's a solution which uses a little integer utility table. You could use a calendar table instead, or construct a sequence of integers on-the-fly...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(Perid SERIAL PRIMARY KEY
,DtFrom DATE NOT NULL
,DtTo DATE NOT NULL
,Amount INT NOT NULL
);
INSERT INTO my_table VALUES
(1,'2018-01-01','2018-01-10',100),
(2,'2018-01-03','2018-01-08',10),
(3,'2018-01-05','2018-01-12',1);
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
SELECT dtfrom + INTERVAL y.i DAY dt
, SUM(x.amount) total
FROM my_table x
JOIN ints y
WHERE dtfrom + INTERVAL y.i DAY <= dtto
GROUP
BY dt;
+------------+-------+
| dt | total |
+------------+-------+
| 2018-01-01 | 100 |
| 2018-01-02 | 100 |
| 2018-01-03 | 110 |
| 2018-01-04 | 110 |
| 2018-01-05 | 111 |
| 2018-01-06 | 111 |
| 2018-01-07 | 111 |
| 2018-01-08 | 111 |
| 2018-01-09 | 101 |
| 2018-01-10 | 101 |
| 2018-01-11 | 1 |
| 2018-01-12 | 1 |
+------------+-------+
If the gap between dtfrom and dtto can be more than 10 days, you can expand the solution along these lines...
SELECT dtfrom + INTERVAL i2.i*10 + i1.i DAY dt
, SUM(x.amount) total
FROM my_table x
JOIN ints i1
JOIN ints i2
WHERE dtfrom + INTERVAL i2.i*10 + i1.i DAY <= dtto
GROUP
BY dt;
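For anyone taking the application-code route mentioned above, here is a minimal Python sketch of the same idea: walk each period day by day and add its amount to that day's total. The names `periods` and `daily_totals` are made up for illustration, and the rows are hard-coded from the question rather than read from MySQL.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (name, dt_from, dt_to, amount).
periods = [
    ("Period_1", date(2018, 1, 1), date(2018, 1, 10), 100),
    ("Period_2", date(2018, 1, 3), date(2018, 1, 8), 10),
    ("Period_3", date(2018, 1, 5), date(2018, 1, 12), 1),
]


def daily_totals(rows):
    """Sum the amounts of all periods covering each day (both bounds inclusive)."""
    totals = defaultdict(int)
    for _name, dt_from, dt_to, amount in rows:
        day = dt_from
        while day <= dt_to:
            totals[day] += amount
            day += timedelta(days=1)
    # Return the series ordered by date.
    return dict(sorted(totals.items()))


series = daily_totals(periods)
for day, total in series.items():
    print(day, total)
```

Unlike the integer-table join, this needs no upper bound on the period length, at the cost of pulling the rows out of the database first.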

Try this:
-- TO_DATE is Oracle syntax; in MySQL use DATE() (or STR_TO_DATE) instead.
-- SUM ... GROUP BY adds up the amounts of all periods covering each day.
SELECT A.`DATE`, SUM(B.Amount) AS Amount
FROM
(SELECT DATE('2018-01-01') `DATE` UNION ALL
SELECT DATE('2018-01-02') `DATE` UNION ALL
SELECT DATE('2018-01-03') `DATE` UNION ALL
SELECT DATE('2018-01-04') `DATE` UNION ALL
SELECT DATE('2018-01-05') `DATE` UNION ALL
SELECT DATE('2018-01-06') `DATE` UNION ALL
SELECT DATE('2018-01-07') `DATE` UNION ALL
SELECT DATE('2018-01-08') `DATE` UNION ALL
SELECT DATE('2018-01-09') `DATE` UNION ALL
SELECT DATE('2018-01-10') `DATE` UNION ALL
SELECT DATE('2018-01-11') `DATE` UNION ALL
SELECT DATE('2018-01-12') `DATE`
) A LEFT JOIN yourTable B ON A.`DATE` BETWEEN B.DtFrom AND B.DtTo
GROUP BY A.`DATE`;
That's the query if you must do it in SQL, but the expected result is easy to get in application logic, as @Strawberry indicated in his comment on your question.

Try this
select
sum(if(STR_TO_DATE(?, '%d-%m-%Y') >= so_period_date.dt_from and
STR_TO_DATE(?, '%d-%m-%Y') <= so_period_date.dt_to,
so_period_date.amount, 0)) as amount
from so_period_date

Related

How to find duplicate records in MySQL, but with a degree of variance?

Assume I have the following table structure and data:
+------------------+-------------------------+--------+
| transaction_date | transaction_description | amount |
+------------------+-------------------------+--------+
| 2020-08-20 | Burger King | 10.06 |
| 2020-08-23 | Burger King | 10.06 |
| 2020-08-29 | McDonalds | 6.48 |
| 2020-09-04 | Wendy's | 7.45 |
| 2020-09-05 | Dairy Queen | 14.36 |
| 2020-09-06 | Wendy's | 7.45 |
| 2020-09-13 | Burger King | 10.06 |
+------------------+-------------------------+--------+
I'd like to be able to find duplicate transactions where the description and amounts match, but the date would have some degree of variance +/- 3 days from each other.
Because the "Burger King" transactions are within three days of each other (2020-08-20 and 2020-08-23), they would be counted as duplicates, but the entry on 2020-09-13 would not be.
I have the following query so far, but the degree of variance piece is what's stumping me.
SELECT t.transaction_date, t.transaction_description, t.amount
FROM transactions t
JOIN (SELECT transaction_date, transaction_description, amount, COUNT(*)
FROM transactions
GROUP BY transaction_description, amount
HAVING count(*) > 1 ) b
ON t.transaction_description = b.transaction_description
AND t.amount = b.amount
ORDER BY t.amount ASC;
Ideally, I'd love for the output to be something along the lines of:
+------------------+-------------------------+--------+
| transaction_date | transaction_description | amount |
+------------------+-------------------------+--------+
| 2020-08-20 | Burger King | 10.06 |
| 2020-08-23 | Burger King | 10.06 |
| 2020-09-04 | Wendy's | 7.45 |
| 2020-09-06 | Wendy's | 7.45 |
+------------------+-------------------------+--------+
Am I way off? Or is this even possible? Thanks in advance.
You can use exists:
select t.*
from mytable t
where exists (
select 1
from mytable t1
where
t1.transaction_description = t.transaction_description
and t1.amount = t.amount -- the question asks for matching amounts too
and t1.transaction_date <> t.transaction_date
and t1.transaction_date >= t.transaction_date - interval 3 day
and t1.transaction_date <= t.transaction_date + interval 3 day
)
If you are running MySQL 8.0, a count within a window date range is a reasonable alternative:
select t.*
from (
select t.*,
count(*) over(
partition by transaction_description, amount
order by transaction_date
range between interval 3 day preceding and interval 3 day following
) cnt
from mytable t
) t
where cnt > 1
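As a cross-check outside the database, the same rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows are hard-coded from the question, `near_duplicates` is a made-up helper name, and it requires the amount to match as well as the description, as the question asks. Like the EXISTS query, it ignores a partner row that has exactly the same date.

```python
from datetime import date, timedelta
from decimal import Decimal

# Sample rows from the question: (transaction_date, description, amount).
transactions = [
    (date(2020, 8, 20), "Burger King", Decimal("10.06")),
    (date(2020, 8, 23), "Burger King", Decimal("10.06")),
    (date(2020, 8, 29), "McDonalds", Decimal("6.48")),
    (date(2020, 9, 4), "Wendy's", Decimal("7.45")),
    (date(2020, 9, 5), "Dairy Queen", Decimal("14.36")),
    (date(2020, 9, 6), "Wendy's", Decimal("7.45")),
    (date(2020, 9, 13), "Burger King", Decimal("10.06")),
]


def near_duplicates(rows, max_days=3):
    """Keep rows that have a same-description, same-amount partner within max_days."""
    window = timedelta(days=max_days)
    kept = []
    for day, description, amount in rows:
        has_partner = any(
            other_desc == description
            and other_amount == amount
            and other_day != day
            and abs(other_day - day) <= window
            for other_day, other_desc, other_amount in rows
        )
        if has_partner:
            kept.append((day, description, amount))
    return kept


dupes = near_duplicates(transactions)
for row in dupes:
    print(*row)
```

The nested scan is quadratic, which is fine for a spot check but not for a large table; there the SQL versions above are the better fit.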

Percentage growth month by month - MySQL 5.x

I have a table sales with some columns and data like this:
SELECT order_date, sale FROM sales;
+------------+------+
| order_date | sale |
+------------+------+
| 2020-01-01 | 20 |
| 2020-01-02 | 25 |
| 2020-01-03 | 15 |
| 2020-01-04 | 30 |
| 2020-02-05 | 20 |
| 2020-02-10 | 20 |
| 2020-02-06 | 25 |
| 2020-03-07 | 15 |
| 2020-03-08 | 30 |
| 2020-03-09 | 20 |
| 2020-03-10 | 40 |
| 2020-04-01 | 20 |
| 2020-04-02 | 25 |
| 2020-04-03 | 10 |
+------------+------+
and I would like to calculate, for example, monthly growth rate.
From the previous data example the expected result would be like this:
month sale growth_rate
1 90 0
2 65 -27.78
3 105 61.54
4 55 -47.62
We have an old MySQL version, 5.x.
Could anyone help or give me some clues to achieve this?
It is a bit complicated:
select
s.*
-- calculate rate
, ifnull(round((s.mnt_sale - n.mnt_sale)/n.mnt_sale * 10000)/100, 0) as growth_rate
from (
-- calculate monthly summary
select month(order_date) mnt, sum(sale) mnt_sale
from sales
group by mnt
) s
left join ( -- join previous month summary
-- calculate monthly summary one more time
select month(order_date) mnt, sum(sale) mnt_sale
from sales
group by mnt) n on n.mnt = s.mnt - 1
;
DB Fiddle
You can use aggregation and window functions (window functions require MySQL 8.0 or later). Something like this:
select year(order_date) as year, month(order_date) as month, sum(sale) as sale,
100 * (sum(sale) / lag(sum(sale), 1, sum(sale)) over (order by min(order_date)) - 1) as growth_rate
from sales
group by year, month
A little tricky for me, but I think the code below works as expected
SELECT month, sale,growth_rate
FROM(
SELECT month, sale,
IF(@last_entry = 0, 0, ROUND(((sale - @last_entry) / @last_entry) * 100,2)) AS growth_rate,
@last_entry := sale AS last_entry
FROM
(SELECT @last_entry := 0) x,
(SELECT month, sum(sale) sale
FROM (SELECT month(order_date) as month,sum(sale) as sale
FROM sales GROUP BY month(order_date)) monthly_sales
GROUP BY month) y) t;
expected result
+-------+------+-------------+
| month | sale | growth_rate |
+-------+------+-------------+
| 1 | 90 | 0.00 |
| 2 | 65 | -27.78 |
| 3 | 105 | 61.54 |
| 4 | 55 | -47.62 |
+-------+------+-------------+
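If the server version rules out window functions, the arithmetic is also simple enough to do client-side. A minimal Python sketch, assuming the rows have already been fetched as (order_date, sale) pairs; `sales` and `monthly_growth` are illustrative names, and the first month is given a growth rate of 0 as in the expected result.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (order_date, sale).
sales = [
    (date(2020, 1, 1), 20), (date(2020, 1, 2), 25), (date(2020, 1, 3), 15),
    (date(2020, 1, 4), 30), (date(2020, 2, 5), 20), (date(2020, 2, 10), 20),
    (date(2020, 2, 6), 25), (date(2020, 3, 7), 15), (date(2020, 3, 8), 30),
    (date(2020, 3, 9), 20), (date(2020, 3, 10), 40), (date(2020, 4, 1), 20),
    (date(2020, 4, 2), 25), (date(2020, 4, 3), 10),
]


def monthly_growth(rows):
    """Return (year, month, total, growth_rate_percent) per month, in date order."""
    totals = defaultdict(int)
    for order_date, sale in rows:
        totals[(order_date.year, order_date.month)] += sale

    result = []
    previous = None
    for (year, month), total in sorted(totals.items()):
        if previous:  # skip the first month and avoid dividing by zero
            growth = round((total - previous) / previous * 100, 2)
        else:
            growth = 0.0
        result.append((year, month, total, growth))
        previous = total
    return result


growth_rows = monthly_growth(sales)
for row in growth_rows:
    print(*row)
```

Grouping on (year, month) rather than month alone keeps the ordering correct once the data spans more than one year.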

MySQL query time periods and value of maximum drop vs preceding max value

Question 1: How can I write a query that shows the date of the maximum drop in value relative to the preceding maximum? If a series of values is lower than the preceding maximum and two or more of them share the lowest value, the date of the first occurrence of that lowest value should be returned.
This query will be executed on real-time data, so for a particular date only the values for that date and all earlier dates are considered.
Question 2: How can I write a query that shows the date range of each series in which the measured data was lower than the maximum that preceded it?
This is equivalent to the period between the last maximum and the following date on which the value is equal to or higher than that previous maximum (whichever comes first).
This query will be executed on historical data, so all rows before and after the row under consideration are available.
Please see the replication section at the end of the question to generate the test table and an example query.
I tried to use window functions to build these queries but couldn't. I only managed to get the difference between the current measurement and the closest maximum that preceded it.
The test data looks like this:
+---------------------+------+
| date_time | data |
+---------------------+------+
| 2017-01-02 00:00:00 | 2 |
| 2017-01-03 00:00:00 | 4 |
| 2017-01-04 00:00:00 | 1 |
| 2017-01-05 00:00:00 | 3 |
| 2017-01-06 00:00:00 | 1 |
| 2017-01-07 00:00:00 | 4 |
| 2017-01-08 00:00:00 | 5 |
| 2017-01-09 00:00:00 | -2 |
| 2017-01-10 00:00:00 | 0 |
| 2017-01-11 00:00:00 | -5 |
| 2017-01-12 00:00:00 | 6 |
| 2017-01-13 00:00:00 | 4 |
| 2017-01-14 00:00:00 | 6 |
+---------------------+------+
and this is the difference of a current data row vs prev max data I already have
+------------+------+----------+-----------+
| date | data | data_max | data_diff |
+------------+------+----------+-----------+
| 2017-01-02 | 2 | 2 | NULL |
| 2017-01-03 | 4 | 4 | NULL |
| 2017-01-04 | 1 | 4 | -3 |
| 2017-01-05 | 3 | 4 | -1 |
| 2017-01-06 | 1 | 4 | -3 |
| 2017-01-07 | 4 | 4 | NULL |
| 2017-01-08 | 5 | 5 | NULL |
| 2017-01-09 | -2 | 5 | -7 |
| 2017-01-10 | 0 | 5 | -5 |
| 2017-01-11 | -5 | 5 | -10 |
| 2017-01-12 | 6 | 6 | NULL |
| 2017-01-13 | 4 | 6 | -2 |
| 2017-01-14 | 6 | 6 | NULL |
+------------+------+----------+-----------+
This is the desired result (Question 1):
+---------------+----------+
| diff_max_date | diff_max |
+---------------+----------+
| 2017-01-04 | -3 |
| 2017-01-09 | -7 |
| 2017-01-11 | -10 |
| 2017-01-13 | -2 |
+---------------+----------+
Please note that the first entry, -3, is dated 2017-01-04 because that is the first occurrence of the lowest value after its preceding maximum (value 4 on 2017-01-03); the equal value of -3 on 2017-01-06 is therefore ignored.
The query for Question 1 works on live data as it is inserted into the test table, so it cannot look ahead to future entries. This is why there are two lowest entries, -7 on 2017-01-09 and -10 on 2017-01-11: on 2017-01-09 the value of -10 on 2017-01-11 was not yet known.
Desired result (Question 2):
+----------------+--------------+---------------+----------+
| diff_date_from | diff_date_to | diff_max_date | diff_max |
+----------------+--------------+---------------+----------+
| 2017-01-04 | 2017-01-06 | 2017-01-04 | -3 |
| 2017-01-09 | 2017-01-11 | 2017-01-11 | -10 |
| 2017-01-13 | 2017-01-13 | 2017-01-13 | -2 |
+----------------+--------------+---------------+----------+
Please note that the second row reports only -10 on 2017-01-11; -7 on 2017-01-09 is ignored because it is not the lowest value. This query works on historical data, so the whole date range is available to it, not just the current row and the rows before it.
The queries do not need to be a single query. I could create a dedicated table for Q1 and, for example, use it to generate another table for Q2, or add a column with the Q1 data to the test table and then generate the table for Q2. But I have tried many times and failed.
Query (MySQL 8) to replicate the test data table and compute data_diff and data_max:
CREATE TABLE IF NOT EXISTS `test`
(
`date_time` DATETIME UNIQUE NOT NULL,
`data` INT NOT NULL
)
ENGINE InnoDB;
INSERT INTO `test` VALUES
('2017-01-02', 2),
('2017-01-03', 4),
('2017-01-04', 1),
('2017-01-05', 3),
('2017-01-06', 1),
('2017-01-07', 4),
('2017-01-08', 5),
('2017-01-09', -2),
('2017-01-10', 0),
('2017-01-11', -5),
('2017-01-12', 6),
('2017-01-13', 4),
('2017-01-14', 6)
;
SELECT
DATE(`date_time`) AS `date`,
`data`,
`data_max`,
IF(`data` < `data_max`, - (`data_max` - `data`), NULL)
AS `data_diff`
FROM
(
SELECT
`date_time`,
`data`,
MAX(`data`) OVER (ORDER BY `date_time` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS `data_max`
FROM
`test`
) t
;
Perhaps you know at least how to get date ranges and can help me solving this problem by answering this question
I suspect they can be optimised somewhat, but these queries should give you the results you want. They share the same first 3 CTEs, which generate the diff_max value for each data_max. In the first query we then just look for a change in that value (from NULL to a value, or a decrease in the value) in order to generate the output rows. The second query's 4th and 5th CTEs are similar to the first query, but we add a RANK to the diff_max values so we can JOIN the minimum value (with its associated date) to the diff_date_from and diff_date_to values from the 6th CTE (which is the same as my answer to your other question).
Question 1:
WITH cte AS (SELECT DATE(`date_time`) AS `date`,
`data`,
MAX(`data`) OVER (ORDER BY `date_time`) AS `data_max`
FROM `test`),
cte2 AS (SELECT `date`,
`data`,
`data_max`,
CASE WHEN `data` < `data_max` THEN `data` - `data_max` END AS `data_diff`
FROM cte),
cte3 AS (SELECT `date`,
MIN(`data_diff`) OVER (PARTITION BY `data_max` ORDER BY `date`) AS `diff_max`
FROM cte2),
cte4 AS (SELECT `date`, `diff_max`, LAG(`diff_max`) OVER (ORDER BY `date`) AS `old_diff_max`
FROM cte3)
SELECT `date`, `diff_max`
FROM cte4
WHERE `diff_max` < `old_diff_max` OR `old_diff_max` IS NULL AND `diff_max` IS NOT NULL
Output:
date diff_max
2017-01-04 -3
2017-01-09 -7
2017-01-11 -10
2017-01-13 -2
Question 2:
WITH cte AS (SELECT DATE(`date_time`) AS `date`,
`data`,
MAX(`data`) OVER (ORDER BY `date_time`) AS `data_max`
FROM `test`),
cte2 AS (SELECT `date`,
`data`,
`data_max`,
CASE WHEN `data` < `data_max` THEN `data` - `data_max` END AS `data_diff`
FROM cte),
cte3 AS (SELECT `data_max`, `date`,
MIN(`data_diff`) OVER (PARTITION BY `data_max` ORDER BY date) AS `diff_max`
FROM cte2),
cte4 AS (SELECT `data_max`, `date`, `diff_max`,
LAG(`diff_max`) OVER (ORDER BY `date`) AS `old_diff_max`
FROM cte3),
cte5 AS (SELECT `date`, `diff_max`,
RANK() OVER (PARTITION BY `data_max` ORDER BY `diff_max`) AS `diff_rank`
FROM cte4
WHERE `diff_max` < `old_diff_max` OR `old_diff_max` IS NULL AND `diff_max` IS NOT NULL),
cte6 AS (SELECT `data_max`,
MIN(CASE WHEN `data_diff` IS NOT NULL THEN date END) AS diff_date_from,
MAX(CASE WHEN `data_diff` IS NOT NULL THEN date END) AS diff_date_to
FROM cte2
GROUP BY `data_max`
HAVING diff_date_from IS NOT NULL)
SELECT diff_date_from, diff_date_to, `date` AS diff_max_date, `diff_max`
FROM cte6
JOIN cte5 ON cte5.date BETWEEN cte6.diff_date_from AND cte6.diff_date_to
WHERE cte5.diff_rank = 1
Output:
diff_date_from diff_date_to diff_max_date diff_max
2017-01-04 2017-01-06 2017-01-04 -3
2017-01-09 2017-01-11 2017-01-11 -10
2017-01-13 2017-01-13 2017-01-13 -2
Demo on dbfiddle
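To sanity-check the expected outputs for both questions, the logic can also be traced in plain Python. This is a rough sketch, not a translation of the CTEs: it assumes the rows arrive sorted by date, it treats a series as finished as soon as a value reaches or exceeds the running maximum (as described in the question), and `readings`, `new_lows` and `drawdown_periods` are made-up names.

```python
from datetime import date

# Test data from the question: (day, data), already ordered by date.
values = [2, 4, 1, 3, 1, 4, 5, -2, 0, -5, 6, 4, 6]
readings = [(date(2017, 1, 2 + i), v) for i, v in enumerate(values)]


def new_lows(rows):
    """Question 1: dates on which the drop below the running maximum deepens."""
    events, running_max, lowest = [], None, None
    for day, value in rows:
        if running_max is None or value >= running_max:
            running_max, lowest = value, None  # new or regained maximum
            continue
        diff = value - running_max
        if lowest is None or diff < lowest:  # strictly deeper: first date wins ties
            lowest = diff
            events.append((day, diff))
    return events


def drawdown_periods(rows):
    """Question 2: (date_from, date_to, date of deepest drop, deepest drop)."""
    periods, running_max, current = [], None, None
    for day, value in rows:
        if running_max is None or value >= running_max:
            running_max = value
            if current is not None:  # the series below the maximum has ended
                periods.append(tuple(current))
                current = None
            continue
        diff = value - running_max
        if current is None:
            current = [day, day, day, diff]
        else:
            current[1] = day
            if diff < current[3]:
                current[2], current[3] = day, diff
    if current is not None:  # data ends while still below the maximum
        periods.append(tuple(current))
    return periods


q1 = new_lows(readings)
q2 = drawdown_periods(readings)
for row in q1 + q2:
    print(*row)
```

On the sample data this reproduces the four rows listed for Question 1 and the three rows listed for Question 2.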

MySQL user retention and day to day

I'm trying to figure out how to write a SQL query to get day-to-day users and retention.
Consider the following table, round_statistics.
For each round played I have the date of the round.
Now I would like to:
1. know how many users played two days in a row, meaning they played on Sunday and Monday, or Monday and Tuesday; Sunday and Tuesday doesn't count as two days in a row.
2. get user retention for days 1-7.
Retention 7 is the % of users who had the chance to play over the last 7 days (meaning they have been registered for at least 7 days) and had some activity (a record) after 7 days.
Retention 6 through 1 are the same, only for 6 down to 1 days.
Please help me find out my game's retention :) you will get free coins to play it....
Thanks.
The table structure is:
user_id,round_time
for example if i played 3 times today:
user id | round_time
1000, | '2013-08-10 14:02:53'
1000, | '2013-08-10 14:03:25'
1000, | '2013-08-10 14:04:47'
the result structure is:
date | 2013-08-10 | 2013-07-10
day to day | 10 | 100
retention 7 | 15 | 125
retention 6 | 20 | 210
retention 5 | 30 | 320
retention 4 | 40 | 430
retention 3 | 50 | 540
retention 2 | 60 | 650
retention 1 | 120 | 1620
MySQL (before 8.0) has no analytic functions or CTEs, and no pivot feature, so the query you need is not straightforward (which is probably why nobody has answered your question yet).
For this data:
create table t ( uid int, rt date);
insert into t values
(99, '2013-08-7 14:02:53' ), -- gap: user 99 has no round on 2013-08-08
(99, '2013-08-9 14:02:53' ),
(99, '2013-08-10 14:03:25' ),
(1000, '2013-08-7 14:02:53' ),
(1000, '2013-08-8 14:03:25' ),
(1000, '2013-08-9 14:03:25' ),
(1000, '2013-08-10 14:04:47');
This is an approach that computes the counts for a given date ('2013-08-10'), before pivoting them into the retention layout:
select count( distinct uid ) as n, d, dt from
(
select uid,
'2013-08-10 00:00:00' as d,
G.dt
from
t
inner join
( select 7 as dt union all
select 6 union all select 5 union all
select 4 union all select 3 union all
select 2 union all select 1 union all select 0) G
on DATE_FORMAT( t.rt, '%Y-%m-%d') between
DATE_FORMAT( date_add( '2013-08-10 00:00:00', Interval -1 * G.dt DAY) ,
'%Y-%m-%d')
and
DATE_FORMAT( '2013-08-10 00:00:00' , '%Y-%m-%d')
where DATE_FORMAT(rt , '%Y-%m-%d') <= DATE_FORMAT( '2013-08-10 00:00:00' ,
'%Y-%m-%d')
group by uid, G.dt
having count( distinct DATE_FORMAT( t.rt, '%Y-%m-%d') ) = G.dt + 1
) TT
group by dt
Your pre-cooked data ( DT = 0 means today visits, DT = 1 means 2 consecutive days, ...):
| N | D | DT |
--------------------------------
| 2 | 2013-08-10 00:00:00 | 0 |
| 2 | 2013-08-10 00:00:00 | 1 |
| 1 | 2013-08-10 00:00:00 | 2 |
| 1 | 2013-08-10 00:00:00 | 3 |
Here is the same query for every date in the table (same data):
select count( distinct uid ) as n, d, dt from
(
select uid,
z.zt as d,
G.dt
from
t
cross join
( select distinct DATE_FORMAT( t.rt, '%Y-%m-%d') as zt from t) z
inner join
( select 7 as dt union all
select 6 union all select 5 union all
select 4 union all select 3 union all
select 2 union all select 1 union all select 0) G
on DATE_FORMAT( t.rt, '%Y-%m-%d') between
DATE_FORMAT( date_add( z.zt, Interval -1 * G.dt DAY) ,
'%Y-%m-%d')
and
z.zt
where z.zt <= z.zt
group by uid, G.dt, z.zt
having count( distinct DATE_FORMAT( t.rt, '%Y-%m-%d') ) = G.dt + 1
) TT
group by d,dt
order by d,dt
Results at sqlfiddle: http://sqlfiddle.com/#!2/c26ec/10/0
| N | D | DT | GROUP_CONCAT( UID) |
--------------------------------------------
| 2 | 2013-08-07 | 0 | 1000,99 |
| 1 | 2013-08-08 | 0 | 1000 |
| 1 | 2013-08-08 | 1 | 1000 |
| 2 | 2013-08-09 | 0 | 1000,99 |
| 1 | 2013-08-09 | 1 | 1000 |
| 1 | 2013-08-09 | 2 | 1000 |
| 2 | 2013-08-10 | 0 | 1000,99 |
| 2 | 2013-08-10 | 1 | 99,1000 |
| 1 | 2013-08-10 | 2 | 1000 |
| 1 | 2013-08-10 | 3 | 1000 |
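The DT column is easier to follow with the counting rule spelled out procedurally. Here is a small Python sketch under assumptions: the rounds are hard-coded from the sample data with the time of day dropped, and `streak_counts` is a made-up helper. For a given day it counts, for each DT, the users who played on that day and on each of the DT days before it.

```python
from datetime import date, timedelta

# Sample data from the answer: (uid, day the round was played).
rounds = [
    (99, date(2013, 8, 7)), (99, date(2013, 8, 9)), (99, date(2013, 8, 10)),
    (1000, date(2013, 8, 7)), (1000, date(2013, 8, 8)),
    (1000, date(2013, 8, 9)), (1000, date(2013, 8, 10)),
]


def streak_counts(rows, day, max_streak=7):
    """Map DT -> number of users who played on `day` and the DT days before it."""
    days_by_user = {}
    for uid, played in rows:
        days_by_user.setdefault(uid, set()).add(played)

    counts = {}
    for dt in range(max_streak + 1):
        needed = {day - timedelta(days=k) for k in range(dt + 1)}
        # A user qualifies only if every needed day is among the days played.
        n = sum(1 for played in days_by_user.values() if needed <= played)
        if n:
            counts[dt] = n
    return counts


print(streak_counts(rounds, date(2013, 8, 10)))
print(streak_counts(rounds, date(2013, 8, 9)))
```

For 2013-08-10 this gives 2, 2, 1, 1 for DT 0 to 3, and for 2013-08-09 it gives 2, 1, 1 for DT 0 to 2, in line with the result table above.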

MySQL - how to select id where min/max dates difference is more than 3 years

I have a table like this:
| id | date | user_id |
----------------------------------------------------
| 1 | 2008-01-01 | 10 |
| 2 | 2009-03-20 | 15 |
| 3 | 2008-06-11 | 10 |
| 4 | 2009-01-21 | 15 |
| 5 | 2010-01-01 | 10 |
| 6 | 2011-06-01 | 10 |
| 7 | 2012-01-01 | 10 |
| 8 | 2008-05-01 | 15 |
I'm looking for a way to select each user_id where the difference between the MIN and MAX dates is more than 3 years. For the above data I should get:
| user_id |
-----------------------
| 10 |
Can anyone help?
SELECT user_id
FROM mytable
GROUP BY user_id
HAVING MAX(`date`) > (MIN(`date`) + INTERVAL '3' YEAR);
Tested here: http://sqlize.com/MC0618Yg58
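The same comparison, written out in Python for anyone who wants to check the rule against the sample rows by hand. It is a sketch under assumptions: `rows`, `add_years` and `users_spanning_more_than` are illustrative names, and the Feb 29 handling clamps to Feb 28, which is how MySQL adjusts an out-of-range day when adding a YEAR interval.

```python
from datetime import date

# Sample rows from the question: (id, date, user_id).
rows = [
    (1, date(2008, 1, 1), 10), (2, date(2009, 3, 20), 15),
    (3, date(2008, 6, 11), 10), (4, date(2009, 1, 21), 15),
    (5, date(2010, 1, 1), 10), (6, date(2011, 6, 1), 10),
    (7, date(2012, 1, 1), 10), (8, date(2008, 5, 1), 15),
]


def add_years(day, years):
    """Shift a date by whole years, clamping Feb 29 to Feb 28 when needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:  # Feb 29 and the target year is not a leap year
        return day.replace(year=day.year + years, day=28)


def users_spanning_more_than(rows, years=3):
    """Return user_ids whose latest date is more than `years` after their earliest."""
    first, last = {}, {}
    for _id, day, user_id in rows:
        first[user_id] = min(day, first.get(user_id, day))
        last[user_id] = max(day, last.get(user_id, day))
    return sorted(u for u in first if last[u] > add_years(first[u], years))


print(users_spanning_more_than(rows))
```

Comparing against the shifted minimum date, rather than counting days, avoids having to decide how many days three years should be.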
Similar to bernie's approach, I'd keep date formats native. I'd also list the MAX first so as to avoid an ABS call (ensuring a positive number is always returned).
SELECT user_id
FROM my_table
GROUP BY user_id
-- aggregates must go in HAVING, not WHERE; 3 * 365 days approximates 3 years
HAVING DATEDIFF(MAX(date),MIN(date)) > 3 * 365
DATEDIFF just returns delta (in days) between two given date fields.
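SELECT user_id
FROM (SELECT user_id, MIN(date) m0, MAX(date) m1
FROM mytable
GROUP BY user_id) spans -- MySQL requires an alias on a derived table
-- compares calendar years only, so it approximates "more than 3 years"
WHERE EXTRACT(YEAR FROM m1) - EXTRACT(YEAR FROM m0) > 3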
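-- DISTINCT avoids one output row per qualifying pair of dates
SELECT DISTINCT A.USER_ID FROM mytable AS A
JOIN mytable AS B
ON A.USER_ID = B.USER_ID
-- 3 * 365 days approximates 3 years
WHERE DATEDIFF(A.DATE,B.DATE) > 3 * 365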