SQL: aggregation from different tables with different date formats - mysql

Working with the following sql tables:
table: fiscal
DateID | date | fiscal_year | fiscal_week
20170101 2017-01-01 00:00:00.0 2017 2017 WK 01
20170102 2017-01-02 00:00:00.0 2017 2017 WK 01
table: email_info
email_id | email_name | email_subjectline
123 New_Year_2017 Welcome the new year!
345 Reminder Don't forget
table: sent_info
email_id | sent_date
123 | 1/1/2017 8:58:39 PM
345 | 1/2/2017 6:33:39 AM
table: click_info
recipient | email_id | click_date
XYZ 123 1/7/2017 4:25:27 PM
ABC 123 1/5/2017 3:13:56 AM
CDF 345 1/6/2017 2:20:16 AM
ABC 345 1/14/2017 3:33:25 AM
Obviously there are many rows in each table.
The joining between the email tables is straightforward.
SELECT *
FROM email_info
JOIN sent_info
ON sent_info.email_id = email_info.email_id
JOIN click_info
ON click_info.email_id = email_info.email_id
I am struggling with the following:
how to get all dates into the same format? ( I don't need the times,
only the day)
how to join the fiscal table so I can filter by fiscal week for example
how to count all clicks for an email for 7 days after the sent date (this cannot be hard-coded by dates, but must be dynamic)
This is the output I am looking for (filtered by fiscal week = 2017 WK 01):
email_id | email_name | sent_date | fiscal_week | Clicks
123 New_year_2017 1/1/2017 2017 WK 01 2
345 Reminder 1/2/2017 2017 WK 01 1
*Please note that the last click in the click_info table example was not counted, because it was beyond the 7 days after sent date.
** DateID is an integer and sent_date and click_date are strings/varchar

Assuming that DateID is a varchar and the other date columns are datetime, the query should be:
select a.email_id, a.email_name, date(b.sent_date), c.fiscal_week, count(d.click_date)
from email_info a
inner join sent_info b on b.email_id = a.email_id
inner join fiscal c on str_to_date(c.dateID, '%Y%m%d') = date(b.sent_date)
inner join click_info d on d.email_id = b.email_id
and date(d.click_date) between date(b.sent_date) and date(b.sent_date) + interval 7 day
group by a.email_id, a.email_name, date(b.sent_date), c.fiscal_week
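PS: the structure of the query stays the same whatever the column formats are. If a column is stored in another format, convert it in that one place and leave the rest unchanged; for example, for varchar values such as '1/1/2017 8:58:39 PM' you can replace date(b.sent_date) with date(str_to_date(b.sent_date, '%c/%e/%Y %l:%i:%s %p')).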
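To make the date handling concrete, here is a minimal Python sketch of the same logic, using the sample rows from the question. It is only an illustration of the steps (parse the strings, truncate to the day, look up the fiscal week by DateID, count clicks in the 7 days after the sent date), not the SQL itself; the function names `to_day`, `date_id` and `report` are made up for this example.

```python
from datetime import datetime, timedelta

# Sample rows copied from the question.
FISCAL = {20170101: "2017 WK 01", 20170102: "2017 WK 01"}  # DateID -> fiscal_week
SENT = {123: "1/1/2017 8:58:39 PM", 345: "1/2/2017 6:33:39 AM"}
CLICKS = [
    ("XYZ", 123, "1/7/2017 4:25:27 PM"),
    ("ABC", 123, "1/5/2017 3:13:56 AM"),
    ("CDF", 345, "1/6/2017 2:20:16 AM"),
    ("ABC", 345, "1/14/2017 3:33:25 AM"),
]


def to_day(text):
    """Parse 'M/D/YYYY h:mm:ss AM/PM' and keep only the calendar day."""
    return datetime.strptime(text, "%m/%d/%Y %I:%M:%S %p").date()


def date_id(day):
    """Turn a date into the integer DateID used by the fiscal table."""
    return int(day.strftime("%Y%m%d"))


def report(sent, clicks, fiscal, days=7):
    """One row per email: id, sent day, fiscal week, clicks within `days`."""
    rows = []
    for email_id, sent_text in sorted(sent.items()):
        sent_day = to_day(sent_text)
        last_day = sent_day + timedelta(days=days)
        count = sum(
            1
            for _, clicked_id, click_text in clicks
            if clicked_id == email_id and sent_day <= to_day(click_text) <= last_day
        )
        rows.append((email_id, sent_day.isoformat(), fiscal.get(date_id(sent_day)), count))
    return rows


ROWS = report(SENT, CLICKS, FISCAL)
```

The window is inclusive on both ends, like BETWEEN in the query, so a click exactly 7 days after the sent date still counts.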

Related

How to get the days within a range of two dates?

I have the calendar table with all the dates of the month of December 2021 (I will only exemplify some dates within the table, but it is understood that it actually contains all the days of said month):
ID | date
01 | 2021-12-01
02 | 2021-12-02
03 | 2021-12-03
04 | 2021-12-04
05 | 2021-12-05
I have the users table:
ID | name | num_employee
01 | Andrew | 101
02 | Mary | 102
I have the attendances table:
ID | date | num_employee
01 | 2021-12-03 | 101
02 | 2021-12-04 | 101
03 | 2021-12-03 | 102
04 | 2021-12-04 | 102
05 | 2021-12-05 | 101
06 | 2021-12-06 | 102
I have a query to display the employee number, their name, the days they attended and the days they were absent:
SELECT u.num_employee,
u.name,
a.date AS attendances,
(SELECT GROUP_CONCAT(DISTINCT EXTRACT(DAY FROM date)) AS date FROM calendar
WHERE date BETWEEN '2021-12-01' AND '2021-12-31'
AND NOT FIND_IN_SET(EXTRACT(DAY FROM date),a.date)) as faults FROM users u
JOIN (SELECT num_employee,
GROUP_CONCAT(DISTINCT EXTRACT(DAY FROM date)) AS date FROM attendances
WHERE date BETWEEN '2021-12-01' AND '2021-12-31'
GROUP BY num_employee) a ON a.num_employee = u.num_employee
With the above query, I get this:
num_employee | name | attendances | faults
101 | Andrew | 3,4,5 | 1,2,3,6,7,8,9,10...
102 | Mary | 3,4,6 | 1,2,5,7,8,9,10...
Now, the point is that in addition to the attendances table, I have to consider another table called vacations. The structure of this table is as follows:
id | initial_date | final_date | num_employee
01 | 2021-12-07 | 2021-12-09 | 101
02 | 2021-12-07 | 2021-12-09 | 102
And taking this table into consideration, the days within the ranges that are handled as vacations should stop appearing in the "faults" column. The result should be the following:
num_employee | name | attendances | faults
101 | Andrew | 3,4,5 | 1,2,3,6,10...
102 | Mary | 3,4,6 | 1,2,5,10...
How can I adapt my query to get the above?
The query in question cannot be adapted to use CTE given the version of MariaDB I am using. I am working on phpMyAdmin.
MySQL and MariaDB have no convenient way to generate a sequence of dates, so it is good that you already have a calendar table to run against.
So another subquery that retrieves the dates of the vacation is needed.
I used a GROUP BY in the subselect as there could be more than 1 vacation period in a month.
SELECT u.num_employee,
u.name,
a.date AS attendances,
(SELECT GROUP_CONCAT(DISTINCT EXTRACT(DAY FROM date)) AS date FROM calendar
WHERE date BETWEEN '2021-12-01' AND '2021-12-31'
AND NOT FIND_IN_SET(EXTRACT(DAY FROM date),a.date)
AND NOT FIND_IN_SET(EXTRACT(DAY FROM date),vac.vac_days)) as faults
FROM users u
LEFT JOIN (SELECT num_employee,
GROUP_CONCAT(DISTINCT EXTRACT(DAY FROM date)) AS date FROM attendances
WHERE date BETWEEN '2021-12-01' AND '2021-12-31'
GROUP BY num_employee) a ON a.num_employee = u.num_employee
LEFT JOIN (SELECT v.`num_employee`, GROUP_CONCAT(DAY(c.`date`)) vac_days
FROM vacations v INNER JOIN calendar c ON c.`date` BETWEEN v.`initial_date` AND `final_date`
AND c. date BETWEEN '2021-12-01' AND '2021-12-31'
GROUP BY v.`num_employee`) vac ON vac.`num_employee` = u.num_employee
num_employee | name | attendances | faults
-----------: | :----- | :---------- | :----------
101 | Andrew | 3,4,5 | 1,2,6,10,11
102 | Mary | 3,4,6 | 1,2,5,10,11
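FIND_IN_SET returns NULL when either argument is NULL, so NOT FIND_IN_SET(...) is NULL as well and the row is filtered out. For employees with no attendances or no vacations in the period you therefore need to check for NULL values and replace them with an empty string: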
SELECT u.num_employee,
u.name,
a.date AS attendances,
(SELECT GROUP_CONCAT(DISTINCT EXTRACT(DAY FROM date)) AS date FROM calendar
WHERE date BETWEEN '2021-12-01' AND '2021-12-31'
AND NOT FIND_IN_SET(EXTRACT(DAY FROM date),IFNULL(a.date,''))
AND NOT FIND_IN_SET(EXTRACT(DAY FROM date),IFNULL(vac.vac_days,''))
) as faults
FROM users u
LEFT JOIN (SELECT num_employee,
GROUP_CONCAT(DISTINCT EXTRACT(DAY FROM date)) AS date FROM attendances
WHERE date BETWEEN '2021-12-01' AND '2021-12-31'
GROUP BY num_employee) a ON a.num_employee = u.num_employee
LEFT JOIN (SELECT v.`num_employee`, GROUP_CONCAT(DAY(c.`date`)) vac_days
FROM vacations v INNER JOIN calendar c ON c.`date` BETWEEN v.`initial_date` AND `final_date`
AND c. date BETWEEN '2021-12-01' AND '2021-12-31'
GROUP BY v.`num_employee`) vac ON vac.`num_employee` = u.num_employee
num_employee | name | attendances | faults
-----------: | :----- | :---------- | :----------------
101 | Andrew | 3,4,5 | 1,2,6,10,11
102 | Mary | 3,4,6 | 1,2,5,7,8,9,10,11
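The rule the queries implement can be written out as plain set logic: a fault is a calendar day that is neither an attended day nor inside any vacation range. The Python sketch below is only an illustration of that rule, not a replacement for the SQL; the function names are invented for the example, and the calendar is cut to 1-11 December on the assumption that this is what the sample results above cover.

```python
from datetime import date, timedelta


def days_between(first, last):
    """Every calendar day from first to last, inclusive."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def faults(calendar, attended, vacations):
    """Day numbers in the calendar that are neither attended nor vacation."""
    on_vacation = set()
    for start, end in vacations:  # an employee may have several ranges
        on_vacation.update(days_between(start, end))
    return [d.day for d in calendar if d not in attended and d not in on_vacation]


# Assumption: the sample calendar holds 1-11 December 2021.
CALENDAR = days_between(date(2021, 12, 1), date(2021, 12, 11))

ANDREW = faults(
    CALENDAR,
    attended={date(2021, 12, 3), date(2021, 12, 4), date(2021, 12, 5)},
    vacations=[(date(2021, 12, 7), date(2021, 12, 9))],
)

# An employee with no vacation rows: the empty list plays the role of IFNULL(..., '').
MARY_NO_VACATION = faults(
    CALENDAR,
    attended={date(2021, 12, 3), date(2021, 12, 4), date(2021, 12, 6)},
    vacations=[],
)
```

Passing an empty vacation list leaves the faults untouched, which is the behaviour the IFNULL version of the query restores for employees without vacation rows.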

Finding date where conditions within 30 days has elapsed

For my website, I have a loyalty program where a customer gets some goodies if they've spent $100 within the last 30 days. A query like below:
SELECT u.username, SUM(total-shipcost) as tot
FROM orders o
LEFT JOIN users u
ON u.userident = o.user
WHERE shipped = 1
AND user = :user
AND date >= DATE(NOW() - INTERVAL 30 DAY)
:user being their user ID. Column 2 of this result gives how much a customer has spent in the last 30 days, if it's over 100, then they get the bonus.
I want to display to the user which day they'll leave the loyalty program. Something like "x days until bonus expires", but how do I do this?
Take today's date, March 16th, and a user's order history:
id | tot | date
-----------------------
84 38 2016-03-05
76 21 2016-02-29
74 49 2016-02-20
61 42 2015-12-28
This user is part of the loyalty program now but leaves it on March 20th. What SQL could I do which returns how many days (4) a user has left on the loyalty program?
If the user then placed another order:
id | tot | date
-----------------------
87 12 2016-03-09
They're still in the loyalty program until the 20th, so the days remaining doesn't change in this instance, but if the total were 50 instead, then they instead leave the program on the 29th (so instead of 4 days it's 13 days remaining). For what it's worth, I care only about 30 days prior to the current date. No consideration for months with 28, 29, 31 days is needed.
Some create table code:
create table users (
userident int,
username varchar(100)
);
insert into users values
(1, 'Bob');
create table orders (
id int,
user int,
shipped int,
date date,
total decimal(6,2),
shipcost decimal(3,2)
);
insert into orders values
(84, 1, 1, '2016-03-05', 40.50, 2.50),
(76, 1, 1, '2016-02-29', 22.00, 1.00),
(74, 1, 1, '2016-02-20', 56.31, 7.31),
(61, 1, 1, '2015-12-28', 43.10, 1.10);
An example output of what I'm looking for is:
userident | username | days_left
--------------------------------
1 Bob 4
This is using March 16th as today for use with DATE(NOW()) to remain consistent with the previous bits of the question.
The following is basically how to do what you want. Note that references to "30 days" are rough estimates and what you may be looking for is "29 days" or "31 days" as works to get the exact date that you want.
Retrieve the list of dates and amounts that are still active, i.e., within the last 30 days (as you did in your example), as a table (I'll call it Active) like the one you showed.
Join that new table (Active) with the original table, where a row from Active is joined to all of the rows of the original table using the date fields. Compute a total of the amounts from the original table. The new table would have a Date field from Active and a Total field that is the sum of all the amounts in the joined records from the original table.
Select from the resulting table all records where the Total is greater than 100.00 and create a new table with Date and the minimum Total of those records.
Compute 30 days ahead from those dates to find the ending date of their loyalty program.
You would need to take the following steps (per user):
join the orders table with itself to calculate sums for different (bonus) starting dates, for any of the starting dates that are in the last 30 days
select from those records only those starting dates which yield a sum of 100 or more
select from those records only the one with the most recent starting date: this is the start of the bonus period for the selected user.
Here is a query to do that:
SELECT u.userident,
u.username,
MAX(base.date) AS bonus_start,
DATE(MAX(base.date) + INTERVAL 30 DAY) AS bonus_expiry,
30-DATEDIFF(NOW(), MAX(base.date)) AS bonus_days_left
FROM users u
LEFT JOIN (
SELECT o.user,
first.date AS date,
SUM(o.total-o.shipcost) as tot
FROM orders first
INNER JOIN orders o
ON o.user = first.user
AND o.shipped = 1
AND o.date >= first.date
WHERE first.shipped = 1
AND first.date >= DATE(NOW() - INTERVAL 30 DAY)
GROUP BY o.user,
first.date
HAVING SUM(o.total-o.shipcost) >= 100
) AS base
ON base.user = u.userident
GROUP BY u.username,
u.userident
Here is a fiddle.
With this input as orders:
+----+------+---------+------------+-------+----------+
| id | user | shipped | date | total | shipcost |
+----+------+---------+------------+-------+----------+
| 61 | 1 | 1 | 2015-12-28 | 42 | 0 |
| 74 | 1 | 1 | 2016-02-20 | 49 | 0 |
| 76 | 1 | 1 | 2016-02-29 | 21 | 0 |
| 84 | 1 | 1 | 2016-03-05 | 38 | 0 |
| 87 | 1 | 1 | 2016-03-09 | 50 | 0 |
+----+------+---------+------------+-------+----------+
The above query will return this output (when executed on 2016-03-20):
+-----------+----------+-------------+--------------+-----------------+
| userident | username | bonus_start | bonus_expiry | bonus_days_left |
+-----------+----------+-------------+--------------+-----------------+
| 1 | John | 2016-02-29 | 2016-03-30 | 10 |
+-----------+----------+-------------+--------------+-----------------+
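The three steps of this answer can be traced in a few lines of Python. This is a sketch of the logic only, run on the sample orders listed above with 2016-03-20 as "today"; the function name `bonus_window` and its parameters are invented for the illustration.

```python
from datetime import date, timedelta

# (order date, total - shipcost) for the shipped orders of one user.
ORDERS = [
    (date(2015, 12, 28), 42),
    (date(2016, 2, 20), 49),
    (date(2016, 2, 29), 21),
    (date(2016, 3, 5), 38),
    (date(2016, 3, 9), 50),
]


def bonus_window(orders, today, window=30, threshold=100):
    """Return (bonus_start, bonus_expiry, days_left), or None if not in the program."""
    cutoff = today - timedelta(days=window)
    # Steps 1 and 2: candidate start dates in the window whose running sum
    # (that order plus every later one) reaches the threshold.
    qualifying = [
        start
        for start, _ in orders
        if start >= cutoff
        and sum(amount for day, amount in orders if day >= start) >= threshold
    ]
    if not qualifying:
        return None
    # Step 3: the most recent qualifying start date decides when the bonus ends.
    start = max(qualifying)
    return start, start + timedelta(days=window), window - (today - start).days


RESULT = bonus_window(ORDERS, today=date(2016, 3, 20))
SUMMARY = tuple(str(value) for value in RESULT)
```

Both 2016-02-20 and 2016-02-29 qualify here; taking the later one is what moves the expiry date forward when a new order arrives.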
Simple solution
Seeing how you do your first query, I guessed that when you are at the point where you look for the "expiration date", you already know that the user meets the 100 points over last 30 days. Then you can do this :
SELECT DATE_ADD(MIN(date),INTERVAL 30 DAY)
FROM orders o
WHERE shipped = 1
AND user = :user
AND date >= (DATE(NOW() - INTERVAL 30 DAY))
It takes the earliest order date of a user within the last 30 days and adds 30 days to it.
But that really is a poor design for what you want to achieve.
You would do better to think further ahead and implement what follows.
Advanced solution
In order to reproduce all the following solution, I have used the Fiddle that Trincot kindly built, and expanded it to test on more data : 4 users having 4 orders.
SQL FIddle http://sqlfiddle.com/#!9/668939/1
Step 1 : Design
The following query will return all the users meeting the loyalty program criteria, along with their earliest order date within the last 30 days, the loyalty program expiration date calculated from that earliest date, and the number of days before it expires.
SELECT o.user, u.username, SUM(total-shipcost) as tot, MIN(date) AS mindate,
DATE_ADD(MIN(date),INTERVAL 30 DAY) AS expirationdate,
DATEDIFF(DATE_ADD(MIN(date),INTERVAL 30 DAY), DATE(NOW())) AS daysleft
FROM orders o
LEFT JOIN users u
ON u.userident = o.user
WHERE shipped = 1
AND date >= DATE(NOW() - INTERVAL 30 DAY)
GROUP BY user
HAVING tot >= 100;
Now, create a VIEW with the above query
CREATE VIEW loyalty_program AS
SELECT o.user, u.username, SUM(total-shipcost) as tot, MIN(date) AS mindate,
DATE_ADD(MIN(date),INTERVAL 30 DAY) AS expirationdate,
DATEDIFF(DATE_ADD(MIN(date),INTERVAL 30 DAY), DATE(NOW())) AS daysleft
FROM orders o
LEFT JOIN users u
ON u.userident = o.user
WHERE shipped = 1
AND date >= DATE(NOW() - INTERVAL 30 DAY)
GROUP BY user
HAVING tot >= 100;
It is important to understand that this is only a one-shot action on your database.
Step 2 : Use your new VIEW
Once you have the view, you can get easily, for all users, the "state" of the loyalty program:
SELECT * FROM loyalty_program
user | username | tot | mindate | expirationdate | daysleft
1 | John | 153 | February, 28 2016 | March, 29 2016 | 9
2 | Joe | 112 | February, 24 2016 | March, 25 2016 | 5
3 | Jack | 474 | February, 23 2016 | March, 24 2016 | 4
4 | Averel | 115 | February, 22 2016 | March, 23 2016 | 3
For a specific user, you can get the date you are looking for like this:
SELECT expirationdate FROM loyalty_program WHERE username='Joe'
You can also request all the users for which the expiration date is today
SELECT user FROM loyalty_program WHERE expirationdate=DATE(NOW())
But there are other easy possibilities that you'll discover after having played with your VIEW.
Conclusion
Make your life easier: learn to use VIEWS !
I am assuming your table looks like this:
user | id | total | date
-------------------------------
12 84 38 2016-03-05
12 76 21 2016-02-29
23 74 49 2016-02-20
23 61 42 2015-12-28
then try this:
SELECT x.user, x.date, x.id, x.cum_sum, d.start_date, DATEDIFF(NOW(), x.date) from (SELECT a.user, a.id, a.date, a.total,
(SELECT SUM(b.total) FROM order_table b WHERE b.date <= a.date and a.user=b.user ORDER BY b.user, b.id DESC) AS cum_sum FROM order_table a where a.date>=DATE(NOW() - INTERVAL 30 DAY) ORDER BY a.user, a.id DESC) as x
left join
(SELECT c.user, c.date as start_date, c.id from (SELECT a.user, a.id, a.date, a.total,
(SELECT SUM(b.total) FROM order_table b WHERE b.date <= a.date and a.user=b.user ORDER BY b.user, b.id DESC) AS cum_sum FROM order_table a where a.date>=DATE(NOW() - INTERVAL 30 DAY) ORDER BY a.user, a.id DESC) as c WHERE FLOOR(c.cum_sum/100)=MIN(FLOOR(c.cum_sum/100)) and MOD(c.cum_sum,100)=MAX(MOD(c.cum_sum,100)) group by concat(c.user, "_", c.id)) as d on concat(x.user, "_", x.id)=concat(d.user, "_", d.id) where x.date=d.start_date;
You will get a table something like this:
user | Date | cum_sum | start_date | Time_left
----------------------------------------------------
12 2016-03-05 423 2016-03-05 24
13 2016-02-29 525 2016-02-29 12
23 2016-02-20 944 2016-02-20 3
29 2015-12-28 154 2015-12-28 4
I have not tested this. What I am trying to do is create a table in descending order of id and user, with a cumulative total column alongside. From that table I build a second one that holds, for each user, the relevant date (i.e. the date from which the date difference is to be calculated). I then left join the two tables with the condition that x.date equals that start date. I have put start_date and date in the output to check whether the query is working.
Also, this is not the most efficient way of writing this code, but I have tried to stay as safe as possible by using subqueries, since I did not have the data to test it. Let me know if you run into any errors.

Finding MAX and MIN values for each same start and end week

I am trying to write a query that finds the MAX and MIN for each week, without much success so far.
I have 2 Tables:
SYMBOL_DATA (contains open,high,low,close, and volume)
WEEKLY_LOOKUP (contains a list of weeks(no weekends) with a WEEK_START and WEEK_END)
**SYMBOL_DATA Example:**
OPEN, HIGH, LOW, CLOSE, VOLUME
23.22 26.99 21.45 22.49 34324995
**WEEKLY_LOOKUP Example:**
WEEK_START WEEK_END
2016-01-25 2016-01-29
2016-01-18 2016-01-22
2016-01-11 2016-01-15
2016-01-04 2016-01-08
I am trying to find for each WEEK_START and WEEK_END the high and low for that particular week.
For instance, if the WEEK is WEEK_START=2016-01-11 and WEEK_END=2016-01-15, I would have
5 entries for that particular symbol listed:
DATE HIGH LOW
2016-01-15 96.38 93.54
2016-01-14 98.87 92.45
2016-01-13 100.50 95.21
2016-01-12 99.96 97.55
2016-01-11 98.60 95.39
2016-01-08 100.50 97.03
2016-01-07 101.43 97.30
2016-01-06 103.77 100.90
2016-01-05 103.71 101.67
2016-01-04 102.24 99.76
For each week_ending (2016-01-15) the HIGH is 100.50 on 2016-01-13 and the LOW is 92.45 on 2016-01-14
I attempted to write a query that gives me a list of highs and lows, but when I tried adding a MAX(HIGH), I had only 1 row returned back.
I tried a few more things in which I couldn't get the query to work (some sort of infinite run type). For now, I just have this that gives me a list of highs and lows for every day instead of the roll-up for each week which I am not sure how to do.
select date, t1.high, t1.low
from SYMBOL_DATA t1, WEEKLY_LOOKUP t2
where symbol='ABCDE' and (t1.date>=t2.START_DATE and t1.date<=t2.END_DATE)
and t1.date<=CURDATE()
LIMIT 30;
How can I get for each week (Start and End) the High_Date, MAX(High), and Low_Date, MIN(LOW) found each week? I probably don't need a
full history for a symbol, so a LIMIT of like 30 or (30 week periods) would be sufficient so I can see trending.
If I wanted to know for example each week MAX(High) and MIN(LOW) start week ending 2016-01-15 the result would show
**Result:**
WEEK_ENDING 2016-01-15 100.50 2016-01-13 92.45 2016-01-14
WEEK_ENDING 2016-01-08 103.77 2016-01-06 97.03 2016-01-08
etc
etc
Thanks to all of you with the expertise and knowledge. I greatly appreciate your help very much.
Edit
Once the week-ending list is returned containing the MAX(HIGH) and MIN(LOW) for each week, is it then possible to find the MAX(HIGH) and MIN(LOW) across that result set, so that only 1 entry is returned for the 30 week periods?
Thank you!
To Piotr
select part1.end_date,part1.min_l,part1.max_h, s1.date, part1.min_l,s2.date from
(
select t2.start_date, t2.end_date, max(t1.high) max_h, min(t1.low) min_l
from SYMBOL_DATA t1, WEEKLY_LOOKUP t2
where symbol='FB'
and t1.date<='2016-01-22'
and (t1.date>=t2.START_DATE and t1.date<=t2.END_DATE)
group by t2.start_date, t2.end_date order by t1.date DESC LIMIT 1;
) part1, symbol_data s1, symbol_data s2
where part1.max_h = s1.high and part1.min_l = s2.low;
You will notice that the MAX and MIN for each week is staying roughly the same and not changing as it should be different for week to week for both the High and Low.
I have abbreviated some of your names in my example.
Getting the high and low for each week is pretty simple; you just have to use GROUP BY:
SELECT s1.symbol, w.week_end, MAX(s1.high) AS weekly_high, MIN(s1.LOW) as weekly_low
FROM weeks AS w
INNER JOIN symdata AS s1 ON s1.zdate BETWEEN w.week_start AND w.week_end
GROUP BY s1.symbol, w.week_end
Results:
| symbol | week_end | weekly_high | weekly_low |
|--------|---------------------------|-------------|------------|
| ABCD | January, 08 2016 00:00:00 | 103.77 | 97.03 |
| ABCD | January, 15 2016 00:00:00 | 100.5 | 92.45 |
Unfortunately, getting the dates of the high and low requires that you re-join to the symbol_data table, based on the symbol, week and values. And even that doesn't do the job; you have to account for the possibility that there might be two days where the same high (or low) was achieved, and decide which one to choose. I arbitrarily chose the first occurrence in the week of the high and low. So to get that second level of choice, you need another GROUP BY. The whole thing winds up looking like this:
SELECT wl.symbol, wl.week_end, wl.weekly_high, MIN(hd.zdate) as high_date, wl.weekly_low, MIN(ld.zdate) as low_date
FROM (
SELECT s1.symbol, w.week_start, w.week_end, MAX(s1.high) AS weekly_high, MIN(s1.low) as weekly_low
FROM weeks AS w
INNER JOIN symdata AS s1 ON s1.zdate BETWEEN w.week_start AND w.week_end
GROUP BY s1.symbol, w.week_start, w.week_end) AS wl
INNER JOIN symdata AS hd
ON hd.zdate BETWEEN wl.week_start AND wl.week_end
AND hd.symbol = wl.symbol
AND hd.high = wl.weekly_high
INNER JOIN symdata AS ld
ON ld.zdate BETWEEN wl.week_start AND wl.week_end
AND ld.symbol = wl.symbol
AND ld.low = wl.weekly_low
GROUP BY wl.symbol, wl.week_start, wl.week_end, wl.weekly_high, wl.weekly_low
Results:
| symbol | week_end | weekly_high | high_date | weekly_low | low_date |
|--------|---------------------------|-------------|---------------------------|------------|---------------------------|
| ABCD | January, 08 2016 00:00:00 | 103.77 | January, 06 2016 00:00:00 | 97.03 | January, 08 2016 00:00:00 |
| ABCD | January, 15 2016 00:00:00 | 100.5 | January, 13 2016 00:00:00 | 92.45 | January, 14 2016 00:00:00 |
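As a cross-check of the tie-breaking rule described above (take the first date in the week on which the high or low occurred), here is a small Python sketch over the sample prices from the question. It illustrates the logic only; the function name `weekly_extremes` and the tuple layout are assumptions made for the example.

```python
from datetime import date

WEEKS = [
    (date(2016, 1, 4), date(2016, 1, 8)),
    (date(2016, 1, 11), date(2016, 1, 15)),
]

# (date, high, low) rows for one symbol, copied from the question.
PRICES = [
    (date(2016, 1, 15), 96.38, 93.54),
    (date(2016, 1, 14), 98.87, 92.45),
    (date(2016, 1, 13), 100.50, 95.21),
    (date(2016, 1, 12), 99.96, 97.55),
    (date(2016, 1, 11), 98.60, 95.39),
    (date(2016, 1, 8), 100.50, 97.03),
    (date(2016, 1, 7), 101.43, 97.30),
    (date(2016, 1, 6), 103.77, 100.90),
    (date(2016, 1, 5), 103.71, 101.67),
    (date(2016, 1, 4), 102.24, 99.76),
]


def weekly_extremes(weeks, prices):
    """Per week: (week_end, high, high_date, low, low_date) as strings and floats."""
    result = []
    for week_start, week_end in weeks:
        rows = sorted(p for p in prices if week_start <= p[0] <= week_end)
        if not rows:
            continue  # no trading days recorded for this week
        # max() and min() return the first match, and rows are sorted by date,
        # so a tie goes to the earliest date, like MIN(zdate) in the query.
        high_row = max(rows, key=lambda p: p[1])
        low_row = min(rows, key=lambda p: p[2])
        result.append(
            (str(week_end), high_row[1], str(high_row[0]), low_row[2], str(low_row[0]))
        )
    return result


EXTREMES = weekly_extremes(WEEKS, PRICES)
```

Calling the same function with a single range that spans the whole history gives the overall high and low asked about in the edit to the question.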
To get the global highs and lows, just remove the weekly table from the original query:
SELECT wl.symbol, wl.high, MIN(hd.zdate) as high_date, wl.low, MIN(ld.zdate) as low_date
FROM (
SELECT s1.symbol, MAX(s1.high) AS high, MIN(s1.low) as low
FROM symdata AS s1
GROUP BY s1.symbol) AS wl
INNER JOIN symdata AS hd
ON hd.symbol = wl.symbol
AND hd.high = wl.high
INNER JOIN symdata AS ld
ON ld.symbol = wl.symbol
AND ld.low = wl.low
GROUP BY wl.symbol, wl.high, wl.low
Results:
| symbol | high | high_date | low | low_date |
|--------|--------|---------------------------|-------|---------------------------|
| ABCD | 103.77 | January, 06 2016 00:00:00 | 92.45 | January, 14 2016 00:00:00 |
The week table seems entirely redundant...
SELECT symbol
, WEEK(zdate)
, MIN(low) min
, MAX(high) max_high
FROM symdata
GROUP
BY symbol, WEEK(zdate);
This is a simplified example. In reality, you might use DATE_FORMAT or something like that instead.
http://sqlfiddle.com/#!9/c247f/3
Check if following query produces desired result:
select part1.end_date, part1.max_h, s1.date, part1.min_l, s2.date from
(
select t2.start_date, t2.end_date, max(t1.high) max_h, min(t1.low) min_l
from SYMBOL_DATA t1, WEEKLY_LOOKUP t2
where symbol='ABCDE'
and (t1.date>=t2.START_DATE and t1.date<=t2.END_DATE)
group by t2.start_date, t2.end_date
) part1, symbol_data s1, symbol_data s2
where part1.max_h = s1.high and part1.min_l = s2.low
and (s1.date >= part1.start_date and s1.date <= part1.end_date)
and (s2.date >= part1.start_date and s2.date <= part1.end_date)

Group and sum data based on a day of the month

I have a recurring payment day of the 14th of each month and want to group a subset of data by month/year and sum the Sent column. For example, for the given data:
Table `Counter`
Id Date Sent
1 10/04/2013 2
2 11/04/2013 4
3 15/04/2013 7
4 10/05/2013 3
5 14/05/2013 5
6 15/05/2013 3
7 16/05/2013 4
The output I want is something like:
From Count
14/03/2013 6
14/04/2013 10
14/05/2013 12
I am not worried about how the From column is formatted, or whether it's easier to split it into month/year, as I can recreate a date from multiple columns in the GUI. So the output could just as easily be:
FromMth FromYr Count
03 2013 6
04 2013 10
05 2013 12
or even
toMth toYr Count
04 2013 6
05 2013 10
06 2013 12
If the payment date is for example the 31st then the date comparison would need to be the last date of each month. I am also not worried about missing months in the result-set.
I will also turn this into a stored procedure so that I can pass in the payment date and other filter criteria. It is also worth mentioning that the data can span several years.
Try this query
select
if(day(STR_TO_DATE(date, "%Y-%d-%m")) >= 14,
concat('14/', month(STR_TO_DATE(date, "%Y-%d-%m")), '/', year(STR_TO_DATE(date, "%Y-%d-%m"))) ,
concat('14/', if ((month(STR_TO_DATE(date, "%Y-%d-%m")) - 1) = 0,
concat('12/', year(STR_TO_DATE(date, "%Y-%d-%m")) - 1),
concat(month(STR_TO_DATE(date, "%Y-%d-%m"))-1,'/',year(STR_TO_DATE(date, "%Y-%d-%m")))
)
)
) as fromDate,
sum(sent)
from tbl
group by fromDate
| FROMDATE | SUM(SENT) |
--------------------------
| 14/10/2013 | 3 |
| 14/12/2012 | 1 |
| 14/3/2013 | 6 |
| 14/4/2013 | 10 |
| 14/5/2013 | 12 |
| 14/9/2013 | 1 |
The pay period can also be grouped by month and year separately:
select Sum(Sent) as "Count",
Extract(Month from Date - interval 13 day) as FromMth,
Extract(Year from Date - interval 13 day) as FromYr
from Counter
group by Extract(Year from Date - interval 13 day),
Extract(Month from Date - interval 13 day)
Be careful, since the field name "Date" coincides with the keyword "date" in ANSI SQL.
I think the simplest way to do what you want is to subtract 13 days from the date, so that the 14th lands on the 1st of its month, and group by the resulting month:
select date_format(date - interval 13 day, '%Y-%m'), sum(sent)
from counter
group by date_format(date - interval 13 day, '%Y-%m')
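The shift-and-group idea can be checked against the sample data with a short Python sketch. It shifts each date back by (payment day - 1) days, so that the payment day itself maps to the 1st of its month, and then sums by the shifted year and month. The function name `sum_by_pay_period` is invented for the illustration, and the shift is only safe for payment days up to the 28th; the end-of-month case mentioned in the question would need separate handling.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (Date as DD/MM/YYYY, Sent) rows from the Counter table in the question.
ROWS = [
    ("10/04/2013", 2),
    ("11/04/2013", 4),
    ("15/04/2013", 7),
    ("10/05/2013", 3),
    ("14/05/2013", 5),
    ("15/05/2013", 3),
    ("16/05/2013", 4),
]


def sum_by_pay_period(rows, pay_day=14):
    """Sum Sent per period, keyed by (year, month) in which the period starts."""
    totals = defaultdict(int)
    for text, sent in rows:
        day = datetime.strptime(text, "%d/%m/%Y").date()
        # The pay day maps to the 1st; earlier days fall into the previous month.
        shifted = day - timedelta(days=pay_day - 1)
        totals[(shifted.year, shifted.month)] += sent
    return dict(totals)


TOTALS = sum_by_pay_period(ROWS)
```

Because the key includes the year, periods that start in December and run into January stay separate from those of other years.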

MySQL: Finding a span of dates across rows

I am looking for some help with even knowing where to start. Essentially we have a table for clients that hold employment start dates and end dates. For annual reports we have to calculate "continuous employment" which is defined as earliest start date to last end date as long as there is not more than 21 days between one end date and the next start date.
here is an example
employee | Start Date | End Date
1 | 2012-10-1 | 2012-11-05
1 | 2012-11-08 | 2013-1-25
2 | 2012-10-1 | 2012-11-05
2 | 2012-11-30 | 2013-1-02
In the above, I would like to see employee 1 as continuously employed from 2012-10-1 to 2013-1-25,
but employee 2 would have 2 separate employment lines, showing continuous employment from 2012-10-1 to 2012-11-05 and another from 2012-11-30 to 2013-1-02.
Thanks for the help!
The theory is similar to @mellamokb's answer, but somewhat more concise:
SELECT employee, MIN(start) start, end
FROM (
SELECT @end:=IF(employee<=>@emp AND @stt<=end+INTERVAL 21 DAY,@end,end) end,
@stt:=start start,
@emp:=employee AS employee
FROM my_table, (SELECT @emp:=NULL, @stt:=0, @end:=0) init
ORDER BY employee, start DESC
) t
GROUP BY employee, end
See it on sqlfiddle.
One way to find "continuous groups" among a set of records is to use variables to track the difference between each line and develop groupings that combine continuous ranges together. In the example below, I use three variables to track enough information for generating the groups:
@curEmployee - tracks the current employee from the previous record, and is compared with the employee on the current record to know when we've switched to a different employee, which automatically becomes another grouping
@curEndDate - tracks the last end date from the previous record, so it can be compared with the start date of the current record to see if the current record belongs in the same "group" as the previous record - that is to say, it is part of continuous employment with the previous record
@curGroup - this is the key variable which segregates the rows into separate "groups" that represent continuous employment. The logic is that a row should be considered as continuous with the previous row if and only if the following two conditions are true: the two rows have the same employee number, and the start date of the current row is no more than 21 days after the end date of the previous row.
NOTE: You may want to validate the edge conditions, i.e., whether exactly 20/21/22 days apart will be considered continuous employment or not, and tweak the logic below.
Here is the sample query that calculates those groups. A couple of things to take note of: the order of variable assignment matters, because they are assigned from top to bottom in the select list. We need to assign @curGroup first, so that it still has the values of @curEmployee and @curEndDate from the previous record to draw on. Secondly, the order by clause is very important to ensure that when we are comparing the previous and current record, they are the two records that are the closest to each other. If we looked at the records in a random order, they would likely end up all as separate groups.
select
e.employee, e.`start date`, e.`end date`
,@curGroup :=
case when employee = @curEmployee
and @curEndDate + INTERVAL 21 DAY >= e.`start date`
then @curGroup
else @curGroup + 1
end as curGroup
,@curEmployee := employee as curEmployee
,@curEndDate := e.`end date` as curEndDate
from
employment e
JOIN (SELECT @curEmployee := 0, @curEndDate := NULL, @curGroup := 0) r
order by e.employee, e.`start date`
Sample Result (DEMO) - notice how CURGROUP stays at 1 for the first two lines, because they are within 21 days of each other and represent continuous employment, while the last two lines get identified as separate group numbers:
| EMPLOYEE | START DATE | END DATE | CURGROUP | CUREMPLOYEE | CURENDDATE |
-------------------------------------------------------------------------------------------------------------------------------
| 1 | October, 01 2012 00:00:00+0000 | November, 05 2012 00:00:00+0000 | 1 | 1 | 2012-11-05 00:00:00 |
| 1 | November, 08 2012 00:00:00+0000 | January, 25 2013 00:00:00+0000 | 1 | 1 | 2013-01-25 00:00:00 |
| 2 | October, 01 2012 00:00:00+0000 | November, 05 2012 00:00:00+0000 | 2 | 2 | 2012-11-05 00:00:00 |
| 2 | November, 30 2012 00:00:00+0000 | January, 02 2013 00:00:00+0000 | 3 | 2 | 2013-01-02 00:00:00 |
Now that we've established groups of records that are part of continuous employment, we merely need to group by those group numbers and find the minimum and maximum date range for the output:
select
employee,
min(`start date`) as `start date`,
max(`end date`) as `end date`
from (
select
e.employee, e.`start date`, e.`end date`
,@curGroup :=
case when employee = @curEmployee
and @curEndDate + INTERVAL 21 DAY >= e.`start date`
then @curGroup
else @curGroup + 1
end as curGroup
,@curEmployee := employee as curEmployee
,@curEndDate := e.`end date` as curEndDate
from
employment e
JOIN (SELECT @curEmployee := 0, @curEndDate := NULL, @curGroup := 0) r
order by e.employee, e.`start date`
) as T
group by employee, curGroup
Sample Result (DEMO):
| EMPLOYEE | START DATE | END DATE |
--------------------------------------------------------------------------------
| 1 | October, 01 2012 00:00:00+0000 | January, 25 2013 00:00:00+0000 |
| 2 | October, 01 2012 00:00:00+0000 | November, 05 2012 00:00:00+0000 |
| 2 | November, 30 2012 00:00:00+0000 | January, 02 2013 00:00:00+0000 |
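For comparison, the same grouping can be sketched outside the database. The Python below sorts the rows by employee and start date and extends the current span whenever the next start date is at most 21 days after the span's end, which mirrors the comparison in the query. The function name `continuous_spans` is invented for the example, and unlike the query it compares against the latest end date seen in the span rather than only the previous row's end date, which matters only if rows overlap.

```python
from datetime import date, timedelta

# (employee, start date, end date) rows from the question.
EMPLOYMENT = [
    (1, date(2012, 10, 1), date(2012, 11, 5)),
    (1, date(2012, 11, 8), date(2013, 1, 25)),
    (2, date(2012, 10, 1), date(2012, 11, 5)),
    (2, date(2012, 11, 30), date(2013, 1, 2)),
]


def continuous_spans(rows, max_gap_days=21):
    """Merge rows of one employee whose gap is at most max_gap_days."""
    spans = []
    for employee, start, end in sorted(rows):  # by employee, then start date
        last = spans[-1] if spans else None
        if (
            last is not None
            and last[0] == employee
            and start <= last[2] + timedelta(days=max_gap_days)
        ):
            last[2] = max(last[2], end)  # still continuous: extend the span
        else:
            spans.append([employee, start, end])  # new employee or gap too long
    return [(emp, str(first), str(final)) for emp, first, final in spans]


SPANS = continuous_spans(EMPLOYMENT)
```

Whether a gap of exactly 21 days counts as continuous is decided by the `<=` comparison, so that is the one place to change when tuning the edge condition.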