Uneven automated buckets/bins in SQL - mysql

ads Table:
-one row per ad per day
date | ad_id | account_id | spend
2018-05-01 123 1101 100
2018-05-02 123 1101 125
2018-05-03 124 1101 150
2018-05-04 124 1101 150
2018-05-04 125 1105 150
2018-05-04 126 1105 150
2018-05-04 123 1101 150
2018-01-01 123 1101 150
I am trying to create a histogram showing how much advertisers have spent in the last 7 days.
I want the first bucket to be $10-999.99 and the rest to be $1000-1999.99, $2000-2999.99, and so on, but I want the buckets generated automatically rather than listed by hand in a CASE expression.
My current query creates even buckets automatically:
select CONCAT(1000*FLOOR(last_7_days_spend/1000), "-", 1000*FLOOR(last_7_days_spend/1000)+999.99) "spend($)" , count(*) "frequency"
from
(select account_id, sum(spend) "last_7_days_spend"
from fb_ads
where date between date_sub(curdate(), interval 7 day) and date_sub(curdate(), interval 1 day)
group by account_id) as abc
group by 1
order by 1;
and it returns this:
spend | frequency
0-999.99 2
2000-2999.99 1
But I want a similar query that filters out the small spends, so that the first bucket starts at $10 ($10-999.99) instead of $0 ($0.00-999.99).
Desired output:
spend | frequency
10-999.99 2
2000-2999.99 1

You'll need to use a CASE expression to define the first bucket, but you can automate the other buckets within that expression. Note that if you don't want a bucket for a spend of less than $10, you'll need to filter those values out:
SELECT
CASE WHEN last_7_days_spend < 1000 THEN '10-999.99'
ELSE CONCAT(1000*FLOOR(last_7_days_spend/1000), "-", 1000*FLOOR(last_7_days_spend/1000)+999.99)
END AS `spend($)`,
COUNT(*) AS `frequency`
FROM (
SELECT account_id, SUM(spend) AS `last_7_days_spend`
FROM fb_ads
WHERE date BETWEEN DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND DATE_SUB(CURDATE(), INTERVAL 1 DAY)
GROUP BY account_id
) as abc
WHERE last_7_days_spend >= 10
GROUP BY 1
ORDER BY 1
Small demo on db-fiddle
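To make the bucketing rule in the answer easier to check by hand, here is a minimal Python sketch of the same logic. It is not MySQL; the function name bucket_label, its parameters, and the sample totals are assumptions made up for illustration.

```python
from collections import Counter

def bucket_label(spend, first_low=10, width=1000):
    """Return the histogram bucket label for one account's 7-day spend."""
    if spend < first_low:
        return None  # below the first bucket: dropped, like the WHERE clause
    if spend < width:
        return f"{first_low}-{width - 0.01:.2f}"  # the one irregular bucket
    low = width * int(spend // width)  # same idea as 1000 * FLOOR(spend / 1000)
    return f"{low}-{low + width - 0.01:.2f}"

# Hypothetical per-account totals (account_id -> last_7_days_spend).
totals = {1101: 525, 1105: 300, 1200: 2500, 1300: 4}
labels = [bucket_label(s) for s in totals.values()]
histogram = Counter(label for label in labels if label is not None)
print(sorted(histogram.items()))
```

Only the first bucket needs a special case; every other label is derived from the spend itself, which is why the CASE expression has just one hard-coded branch.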

Related

Subtract rows and create new columns with results

I have an issue with one SQL query in MySQL. My table looks like below:
Index User Date Speed
1 X 2018-01-01 10:00:00 23
1 X 2018-01-01 10:00:20 50
1 X 2018-01-02 10:00:00 40
1 Z 2018-01-01 10:00:00 20
1 Z 2018-01-02 10:00:00 40
1 Z 2018-01-03 10:00:00 50
and result should be like this:
Index User Date Speed Date_diff Speed_diff
1 X 2018-01-01 10:00:00 23
1 X 2018-01-01 10:00:20 50 20s 27
1 X 2018-01-01 10:02:00 40 1m40s -10
1 Z 2018-01-01 10:00:00 20 -2m -20
1 Z 2018-01-02 10:00:00 40 1d 20
1 Z 2018-01-03 10:00:00 50 1d 10
So basically I need to subtract each row from the one after it and put the results in new columns. I am just starting out with SQL and I am not sure how to do this. Any ideas?
I tried to do this using https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_lag but I think that my syntax is wrong:
SELECT objid,
LAG(Date) OVER AS 'lag',
LEAD(Speed) OVER AS 'Lead',
date- LAG(date) OVER AS 'lag diff',
speed- LEAD(speed) OVER AS 'Lead diff',
FROM tabel;
Try something like:
SELECT `Index`, `User`, `Date`, Speed,
TIMESTAMPDIFF(SECOND, LAG(`Date`) OVER w, `Date`) AS Date_diff, -- seconds since the previous row
Speed - LAG(Speed) OVER w AS Speed_diff
FROM tabel
WINDOW w AS (ORDER BY `User`, `Date`);
Only use single quotes for string and date values, never for column names.
Your code also needs a windowing clause, and the date/time arithmetic needs adjusting. If you can represent the date/time difference as a time, then:
SELECT t.*,
SEC_TO_TIME(TO_SECONDS(t.date) - LAG(TO_SECONDS(t.date)) OVER (PARTITION BY user ORDER BY date)) AS date_diff,
(t.speed - LAG(speed) OVER (PARTITION BY user ORDER BY date)) AS speed_diff
FROM tabel t;
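For readers new to window functions, the following Python sketch mimics what LAG with PARTITION BY user ORDER BY date computes. It is an illustration only, not part of either answer; the name with_diffs and the tuple layout are assumptions, and the rows are taken from the question's sample.

```python
from datetime import datetime
from itertools import groupby

rows = [  # (user, timestamp, speed), from the question's sample table
    ("X", datetime(2018, 1, 1, 10, 0, 0), 23),
    ("X", datetime(2018, 1, 1, 10, 0, 20), 50),
    ("X", datetime(2018, 1, 2, 10, 0, 0), 40),
    ("Z", datetime(2018, 1, 1, 10, 0, 0), 20),
    ("Z", datetime(2018, 1, 2, 10, 0, 0), 40),
]

def with_diffs(rows):
    """Mimic LAG(...) OVER (PARTITION BY user ORDER BY date)."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        prev = None  # LAG yields NULL on the first row of each partition
        for user, ts, speed in group:
            date_diff = ts - prev[1] if prev else None
            speed_diff = speed - prev[2] if prev else None
            out.append((user, ts, speed, date_diff, speed_diff))
            prev = (user, ts, speed)
    return out

result = with_diffs(rows)
```

Because the previous row is reset for every user, the first row of each user gets empty differences, which is the effect of PARTITION BY in the second query.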

How to do Week over Week Increase in SQL [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
ads Table:
-one row per ad per day
date | ad_id | account_id | spend
2018-05-01 123 1101 100
2018-05-02 123 1101 125
2018-05-03 124 1101 150
2018-05-04 124 1101 150
2018-05-04 125 1105 150
2018-05-04 126 1105 150
2018-05-04 123 1101 150
2018-01-01 123 1101 150
I am trying to write a query that returns, for each day since 1 January, the total spend and the week-over-week change.
Week over week means the 8th day's total spend minus the 1st day's total spend. I can do that with the LAG window function, but I am not sure what to do when a day is missing from the date column. If, say, there is no spend on the first day of May, LAG would compare against the wrong row and the answer would be wrong. Is there a way to look up the total spend by date rather than by row offset, so that when the first day has no spend the week-over-week change is 1200 - 0 = 1200? Also, I can't create a dates table to join the ads table against.
I have written this much so far:
select dates, sum(spend) "total_spend_each_day"
from fb_ads as f
where dates>= '2018-01-01'
group by dates
order by 1;
Desired Output:
date | total_spend_each_day | Week_over_week_change
2018-05-01 500 Null
2018-05-02 600 Null
2018-05-03 700 Null
2018-05-04 800 Null
2018-05-05 900 Null
2018-05-06 1000 Null
2018-01-07 1100 Null
2018-01-08 1200 700
Just use lag(). Assuming you have at least one record per day:
select dates, sum(spend) as total_spend_each_day,
sum(spend) - lag(sum(spend), 7) over (order by dates) as diff
from fb_ads as f
where dates >= '2018-01-01'
group by dates
order by 1;
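If you don't have data for each day, use a window frame with RANGE. The frame then contains only the row dated exactly 7 days earlier, so the difference is NULL when that day is missing:
select dates, sum(spend) as total_spend_each_day,
(sum(spend) -
max(sum(spend)) over (order by dates range between interval 7 day preceding and interval 7 day preceding)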
) as diff
from fb_ads as f
where dates >= '2018-01-01'
group by dates
order by 1;
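The difference between looking up by date and looking up by row offset can be sketched in Python. This is a hedged illustration, not MySQL: week_over_week, its missing parameter, and the sample totals are all assumptions. Passing missing=0 corresponds to the "1200 - 0 = 1200" behaviour asked for in the question, while the default mirrors the NULL result of the RANGE frame.

```python
from datetime import date, timedelta

def week_over_week(daily_totals, missing=None):
    """Map each day to its total minus the total exactly 7 days earlier.

    `missing` is what a day with no rows counts as: None behaves like
    SQL NULL, 0 treats an absent day as zero spend.
    """
    changes = {}
    for day, total in sorted(daily_totals.items()):
        earlier = daily_totals.get(day - timedelta(days=7), missing)
        changes[day] = None if earlier is None else total - earlier
    return changes

# Hypothetical daily totals; 2018-05-01 has no rows at all.
daily_totals = {date(2018, 5, d): 400 + 100 * d for d in range(2, 10)}
wow_null = week_over_week(daily_totals)
wow_zero = week_over_week(daily_totals, missing=0)
```

Note that missing=0 also treats days before the first recorded date as zero spend, so the first seven days report their full total as the change.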

SUM timestampdiff of multiple durations per day

I have a table of data from my thermostat.
It records data as follows.
When the heating switches on, I get a timestamp with Status 1; when it switches off, I get one with Status 0. With every on/off row it also gives me the running count of heating sessions for that day.
Date | Status | Total_heatings
2019-01-20 10:00:00 | 1 | 1
2019-01-20 10:10:00 | 0 | 1
2019-01-20 14:00:00 | 1 | 2
2019-01-20 14:25:00 | 0 | 2
2019-01-20 18:00:00 | 1 | 3
2019-01-20 18:15:00 | 0 | 3
2019-01-21 01:00:00 | 1 | 1
2019-01-21 01:30:00 | 0 | 1
2019-01-21 06:00:00 | 1 | 2
2019-01-21 06:15:00 | 0 | 2
I'm trying to get the total duration per day. The script below gives me the duration of each individual heating session on each day.
When I use SUM(TIMESTAMPDIFF(MINUTE, MIN(Date), MAX(Date))) it throws an error about wrong usage of grouping.
SELECT
DATE_FORMAT(Date, '%d.%m') AS 'day',
TIMESTAMPDIFF(MINUTE,MIN(Date),MAX(Date)) AS 'Duration'
FROM thermostat
WHERE (Date BETWEEN '2019-01-21 00:00:00' + INTERVAL -7 DAY AND '2019-01-21 00:00:00')
GROUP BY DAY(Date),Total_heatings;
All I would need is to get a SUM by day of these various heating sessions per day.
So the result should have the following:
Day | Duration
20.01 | 50
21.01 | 45
Now I'm stuck: I can't work out how to sum all the heating sessions of a day into one total duration per day.
Thanks a lot for any pointers and help.
This query will work for MySQL versions before 8.0. It uses a self join to find the matching heater-off row for a given heater-on row. Where a matching row doesn't exist, it uses either the end of the day or the current time, whichever is earlier.
SELECT DATE_FORMAT(t1.Date, '%d.%m') AS `day`,
SUM(TIMESTAMPDIFF(MINUTE, t1.Date, COALESCE(t2.Date, LEAST(NOW(), DATE(t1.Date) + INTERVAL 1 DAY)))) AS Duration,
MAX(t1.Total_heatings) AS Total_heatings
FROM thermostat t1
LEFT JOIN thermostat t2 ON t2.Status = 0 AND t2.Total_heatings = t1.Total_heatings AND DATE(t2.Date) = DATE(t1.Date)
WHERE t1.Status = 1 AND DATE(t1.Date) BETWEEN '2019-01-21' - INTERVAL 7 DAY AND '2019-01-21'
GROUP BY `day`
Output:
day Duration Total_heatings
20.01 50 3
21.01 45 2
Demo on dbfiddle
If you are using MySQL 8, you can use window function LAG to access the previous switch. In the outer query, you can filter on intervals where the previous status was on.
SELECT
DATE_FORMAT(x.date, '%d.%m'),
SUM(TIMESTAMPDIFF(MINUTE, x.last_date, x.date)) AS duration
FROM (
SELECT
t.*,
LAG(t.date) OVER (PARTITION BY DATE_FORMAT(t.date, '%d.%m') ORDER BY t.date) last_date,
LAG(t.status) OVER (PARTITION BY DATE_FORMAT(t.date, '%d.%m') ORDER BY t.date) last_status
FROM mytable t
) x
WHERE x.last_status = 1
GROUP BY DATE_FORMAT(x.date, '%d.%m')
ORDER BY 1
In this db fiddle, this matches your expected output.
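The LAG approach above amounts to walking the rows in time order and, whenever the previous row on the same day was an "on" row, adding the elapsed minutes. A minimal Python sketch of that idea follows; the function name heating_minutes_per_day is an assumption, and the events are the question's sample data.

```python
from collections import defaultdict
from datetime import datetime

events = [  # (timestamp, status) from the question's sample
    ("2019-01-20 10:00:00", 1), ("2019-01-20 10:10:00", 0),
    ("2019-01-20 14:00:00", 1), ("2019-01-20 14:25:00", 0),
    ("2019-01-20 18:00:00", 1), ("2019-01-20 18:15:00", 0),
    ("2019-01-21 01:00:00", 1), ("2019-01-21 01:30:00", 0),
    ("2019-01-21 06:00:00", 1), ("2019-01-21 06:15:00", 0),
]

def heating_minutes_per_day(events):
    """Sum the minutes between each 'on' row and the row that follows it."""
    totals = defaultdict(int)
    previous = None  # (timestamp, status) of the preceding row, like LAG
    for raw, status in sorted(events):
        ts = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
        # Only pair rows within one day, like PARTITION BY day.
        if previous and previous[0].date() == ts.date() and previous[1] == 1:
            minutes = int((ts - previous[0]).total_seconds() // 60)
            totals[ts.strftime("%d.%m")] += minutes
        previous = (ts, status)
    return dict(totals)

durations = heating_minutes_per_day(events)
```

Like the partitioned query, this sketch ignores a heating session that runs past midnight; the self-join answer handles that case by capping the session at the end of the day.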
Using window functions, available in MySQL 8.0 and MariaDB 10.2 (the on time is in seconds):
select date(from_unixtime(ts)) as `day`, sum(ontime) as `on time`
from (
select ts, status, lead(ts,1,ts) over w - ts as ontime
from (
select unix_timestamp(ts) as ts, status
from t
) x
window w as (order by ts)
) y
where status=1
group by `day`;

Finding date where conditions within 30 days has elapsed

For my website, I have a loyalty program where a customer gets some goodies if they've spent $100 within the last 30 days. A query like below:
SELECT u.username, SUM(total-shipcost) as tot
FROM orders o
LEFT JOIN users u
ON u.userident = o.user
WHERE shipped = 1
AND user = :user
AND date >= DATE(NOW() - INTERVAL 30 DAY)
:user being their user ID. Column 2 of this result gives how much a customer has spent in the last 30 days, if it's over 100, then they get the bonus.
I want to display to the user which day they'll leave the loyalty program. Something like "x days until bonus expires", but how do I do this?
Take today's date, March 16th, and a user's order history:
id | tot | date
-----------------------
84 38 2016-03-05
76 21 2016-02-29
74 49 2016-02-20
61 42 2015-12-28
This user is part of the loyalty program now but leaves it on March 20th. What SQL could I do which returns how many days (4) a user has left on the loyalty program?
If the user then placed another order:
id | tot | date
-----------------------
87 12 2016-03-09
They're still in the loyalty program until the 20th, so the days remaining doesn't change in this instance, but if the total were 50 instead, then they instead leave the program on the 29th (so instead of 4 days it's 13 days remaining). For what it's worth, I care only about 30 days prior to the current date. No consideration for months with 28, 29, 31 days is needed.
Some create table code:
create table users (
userident int,
username varchar(100)
);
insert into users values
(1, 'Bob');
create table orders (
id int,
user int,
shipped int,
date date,
total decimal(6,2),
shipcost decimal(3,2)
);
insert into orders values
(84, 1, 1, '2016-03-05', 40.50, 2.50),
(76, 1, 1, '2016-02-29', 22.00, 1.00),
(74, 1, 1, '2016-02-20', 56.31, 7.31),
(61, 1, 1, '2015-12-28', 43.10, 1.10);
An example output of what I'm looking for is:
userident | username | days_left
--------------------------------
1 Bob 4
This is using March 16th as today for use with DATE(NOW()) to remain consistent with the previous bits of the question.
The following is roughly how to do what you want. Note that "30 days" is approximate: you may need 29 or 31 days to land on the exact date you want.
Retrieve the dates and amounts that are still active, i.e. within the last 30 days (as you did in your example), as a table (I'll call it Active) like the one you showed.
Join Active with the original table on the date fields, so that each row of Active is joined to all of the matching rows of the original table, and compute a total of the amounts from the original table. The new table has a Date field from Active and a Total field that is the sum of the amounts in the joined rows of the original table.
Select from the resulting table all records where the Total is greater than 100.00, and create a new table with the Date and the minimum Total of those records.
Compute 30 days ahead from those dates to find the end date of their loyalty program.
You would need to take the following steps (per user):
join the orders table with itself to calculate sums for different (bonus) starting dates, for any of the starting dates that are in the last 30 days
select from those records only those starting dates which yield a sum of 100 or more
select from those records only the one with the most recent starting date: this is the start of the bonus period for the selected user.
Here is a query to do that:
SELECT u.userident,
u.username,
MAX(base.date) AS bonus_start,
DATE(MAX(base.date) + INTERVAL 30 DAY) AS bonus_expiry,
30-DATEDIFF(NOW(), MAX(base.date)) AS bonus_days_left
FROM users u
LEFT JOIN (
SELECT o.user,
first.date AS date,
SUM(o.total-o.shipcost) as tot
FROM orders first
INNER JOIN orders o
ON o.user = first.user
AND o.shipped = 1
AND o.date >= first.date
WHERE first.shipped = 1
AND first.date >= DATE(NOW() - INTERVAL 30 DAY)
GROUP BY o.user,
first.date
HAVING SUM(o.total-o.shipcost) >= 100
) AS base
ON base.user = u.userident
GROUP BY u.username,
u.userident
Here is a fiddle.
With this input as orders:
+----+------+---------+------------+-------+----------+
| id | user | shipped | date | total | shipcost |
+----+------+---------+------------+-------+----------+
| 61 | 1 | 1 | 2015-12-28 | 42 | 0 |
| 74 | 1 | 1 | 2016-02-20 | 49 | 0 |
| 76 | 1 | 1 | 2016-02-29 | 21 | 0 |
| 84 | 1 | 1 | 2016-03-05 | 38 | 0 |
| 87 | 1 | 1 | 2016-03-09 | 50 | 0 |
+----+------+---------+------------+-------+----------+
The above query will return this output (when executed on 2016-03-20):
+-----------+----------+-------------+--------------+-----------------+
| userident | username | bonus_start | bonus_expiry | bonus_days_left |
+-----------+----------+-------------+--------------+-----------------+
| 1 | John | 2016-02-29 | 2016-03-30 | 10 |
+-----------+----------+-------------+--------------+-----------------+
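The three steps behind this query can also be sketched in Python for a single user. This is only an illustration of the logic under the same assumptions as the query (a fixed 30-day window, shipped orders only); the function name bonus_status and its parameters are hypothetical.

```python
from datetime import date, timedelta

def bonus_status(orders, today, threshold=100, window_days=30):
    """orders: list of (order_date, amount) for one user, shipped only."""
    window_start = today - timedelta(days=window_days)
    qualifying = []
    for start, _ in orders:
        if start < window_start:
            continue  # candidate start dates must lie in the last 30 days
        total = sum(amount for d, amount in orders if d >= start)
        if total >= threshold:
            qualifying.append(start)
    if not qualifying:
        return None  # the user is not in the loyalty program
    bonus_start = max(qualifying)  # most recent start that still reaches 100
    expiry = bonus_start + timedelta(days=window_days)
    days_left = window_days - (today - bonus_start).days
    return bonus_start, expiry, days_left

orders = [
    (date(2015, 12, 28), 42), (date(2016, 2, 20), 49), (date(2016, 2, 29), 21),
    (date(2016, 3, 5), 38), (date(2016, 3, 9), 50),
]
status = bonus_status(orders, today=date(2016, 3, 20))
```

With the input table above and 2016-03-20 as today, the most recent start date whose running total reaches 100 is 2016-02-29, which is how the query arrives at an expiry of 2016-03-30.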
Simple solution
Seeing how you do your first query, I guessed that when you are at the point where you look for the "expiration date", you already know that the user meets the 100 points over last 30 days. Then you can do this :
SELECT DATE_ADD(MIN(date),INTERVAL 30 DAY)
FROM orders o
WHERE shipped = 1
AND user = :user
AND date >= (DATE(NOW() - INTERVAL 30 DAY))
It takes the user's earliest order date within the last 30 days and adds 30 days to it.
But that really is a poor design for what you want.
You would do better to think further ahead and implement what comes next.
Advanced solution
To reproduce the solution below, I used the fiddle that Trincot kindly built and expanded it to test on more data: 4 users having 4 orders.
SQL Fiddle: http://sqlfiddle.com/#!9/668939/1
Step 1 : Design
The following query returns all the users who meet the loyalty program criteria, along with their earliest order date within the last 30 days, the loyalty program expiration date calculated from that date, and the number of days before it expires.
SELECT o.user, u.username, SUM(total-shipcost) as tot, MIN(date) AS mindate,
DATE_ADD(MIN(date),INTERVAL 30 DAY) AS expirationdate,
DATEDIFF(DATE_ADD(MIN(date),INTERVAL 30 DAY), DATE(NOW())) AS daysleft
FROM orders o
LEFT JOIN users u
ON u.userident = o.user
WHERE shipped = 1
AND date >= DATE(NOW() - INTERVAL 30 DAY)
GROUP BY user
HAVING tot >= 100;
Now, create a VIEW with the above query
CREATE VIEW loyalty_program AS
SELECT o.user, u.username, SUM(total-shipcost) as tot, MIN(date) AS mindate,
DATE_ADD(MIN(date),INTERVAL 30 DAY) AS expirationdate,
DATEDIFF(DATE_ADD(MIN(date),INTERVAL 30 DAY), DATE(NOW())) AS daysleft
FROM orders o
LEFT JOIN users u
ON u.userident = o.user
WHERE shipped = 1
AND date >= DATE(NOW() - INTERVAL 30 DAY)
GROUP BY user
HAVING tot >= 100;
It is important to understand that this is only a one-shot action on your database.
Step 2 : Use your new VIEW
Once you have the view, you can get easily, for all users, the "state" of the loyalty program:
SELECT * FROM loyalty_program
user username tot mindate expirationdate daysleft
1 John 153 February, 28 2016 March, 29 2016 9
2 Joe 112 February, 24 2016 March, 25 2016 5
3 Jack 474 February, 23 2016 March, 24 2016 4
4 Averel 115 February, 22 2016 March, 23 2016 3
For a specific user, you can get the date you are looking for like this:
SELECT expirationdate FROM loyalty_program WHERE username='Joe'
You can also request all the users for which the expiration date is today
SELECT user FROM loyalty_program WHERE expirationdate=DATE(NOW())
But there are other easy possibilities that you'll discover after having played with your VIEW.
Conclusion
Make your life easier: learn to use VIEWS !
I am assuming your table looks like this:
user | id | total | date
-------------------------------
12 84 38 2016-03-05
12 76 21 2016-02-29
23 74 49 2016-02-20
23 61 42 2015-12-28
then try this:
SELECT x.user, x.date, x.id, x.cum_sum, d.start_date, DATEDIFF(NOW(), x.date) from (SELECT a.user, a.id, a.date, a.total,
(SELECT SUM(b.total) FROM order_table b WHERE b.date <= a.date and a.user=b.user ORDER BY b.user, b.id DESC) AS cum_sum FROM order_table a where a.date>=DATE(NOW() - INTERVAL 30 DAY) ORDER BY a.user, a.id DESC) as x
left join
(SELECT c.user, c.date as start_date, c.id from (SELECT a.user, a.id, a.date, a.total,
(SELECT SUM(b.total) FROM order_table b WHERE b.date <= a.date and a.user=b.user ORDER BY b.user, b.id DESC) AS cum_sum FROM order_table a where a.date>=DATE(NOW() - INTERVAL 30 DAY) ORDER BY a.user, a.id DESC) as c WHERE FLOOR(c.cum_sum/100)=MIN(FLOOR(c.cum_sum/100)) and MOD(c.cum_sum,100)=MAX(MOD(c.cum_sum,100)) group by concat(c.user, "_", c.id)) as d on concat(x.user, "_", x.id)=concat(d.user, "_", d.id) where x.date=d.date;
You will get a table something like this:
user | Date | cum_sum | start_date | Time_left
----------------------------------------------------
12 2016-03-05 423 2016-03-05 24
13 2016-02-29 525 2016-02-29 12
23 2016-02-20 944 2016-02-20 3
29 2015-12-28 154 2015-12-28 4
I have not tested this. What I am trying to do is create a table in descending order of id and user, with a cumulative total column alongside. From that table I have created a second one holding, for each user, the relevant date (i.e. the date from which the date difference is to be calculated). I have left joined these two tables and added the condition x.date=d.date. I have included start_date and date in the output to check whether the query is working.
Also, this is not the most efficient way of writing this code, but I have tried to stay as safe as possible by using subqueries, since I did not have the data to test this. Let me know if you hit any errors.

Daily count of Active Users for a given date range

I need to find the Daily total count of Active Users based on the Start Date and End Date.
REGISTRATION TABLE
id registration_no start_date end_date
1 1000 2014/12/01 2014/12/03
2 1001 2014/12/01 2014/12/03
3 1002 2014/12/02 2014/12/04
4 1003 2014/12/02 2014/12/04
5 1004 2014/12/02 2014/12/04
6 1005 2014/12/03 2014/12/05
7 1006 2014/12/05 2014/12/06
8 1007 2014/12/05 2014/12/09
9 1008 2014/12/06 2014/12/10
10 1009 2014/12/07 2014/12/11
The result should be in the following format.
Date Active Users
2014-12-01 2
2014-12-02 5
2014-12-03 6
2014-12-04 4
2014-12-05 3
2014-12-06 3
2014-12-07 3
2014-12-08 3
2014-12-09 3
2014-12-10 2
2014-12-11 1
2014-12-12 0
I know the following query does not work:
SELECT start_date, count(*) FROM registration
WHERE start_date >= '2014/12/01' AND end_date <='2014/12/12'
GROUP BY start_date
It returns this, which is not the desired output:
2014-12-01 2
2014-12-02 3
2014-12-03 1
2014-12-05 2
2014-12-06 1
2014-12-07 1
Any help would be much appreciated.
You need to create a "calendar" with all the days you need and then use a query like:
SELECT calDay as `Date`, count(id) as `Active Users`
FROM (SELECT cast('2014-12-01' + interval `day` day as date) calDay
FROM days31
WHERE cast('2014-12-01' + interval `day` day as date) <= '2014-12-12') calendar
LEFT JOIN registration on (calDay between start_date and end_date)
GROUP BY calDay
ORDER BY calDay;
You can see it working in this fiddle, where days31 is just a view with integers 0-30. This allows the query to work in any calendar up to a period of 31 days. You can add more days to the view or generate them on the fly using cross joins. See http://www.artfulsoftware.com/infotree/qrytip.php?id=95
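The calendar join boils down to: for every day in the range, count the registrations whose start and end dates enclose that day. A short Python sketch of that counting, using the question's registration table, is shown below; the function name daily_active is an assumption for illustration.

```python
from datetime import date, timedelta

registrations = [  # (start_date, end_date) from the question's table
    (date(2014, 12, 1), date(2014, 12, 3)),
    (date(2014, 12, 1), date(2014, 12, 3)),
    (date(2014, 12, 2), date(2014, 12, 4)),
    (date(2014, 12, 2), date(2014, 12, 4)),
    (date(2014, 12, 2), date(2014, 12, 4)),
    (date(2014, 12, 3), date(2014, 12, 5)),
    (date(2014, 12, 5), date(2014, 12, 6)),
    (date(2014, 12, 5), date(2014, 12, 9)),
    (date(2014, 12, 6), date(2014, 12, 10)),
    (date(2014, 12, 7), date(2014, 12, 11)),
]

def daily_active(registrations, first_day, last_day):
    """Count registrations active on each calendar day, inclusive."""
    counts = {}
    day = first_day
    while day <= last_day:  # the loop plays the role of the calendar table
        counts[day] = sum(1 for start, end in registrations
                          if start <= day <= end)
        day += timedelta(days=1)
    return counts

active = daily_active(registrations, date(2014, 12, 1), date(2014, 12, 12))
```

Days with no active registration still appear with a count of 0, which is what the LEFT JOIN against the calendar achieves and what a plain GROUP BY start_date cannot.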
Try this. Note that the WHERE condition checks a single date (2014-12-02), as per the comment: a user is active on that date if they started on or before it and ended on or after it.
SELECT '2014-12-02' AS Date, count(*) AS ActiveUser FROM registration
WHERE (start_date <= '2014-12-02' AND end_date >= '2014-12-02')