SQL self-join to get values of 3 weeks ago - mysql

I am not very familiar with SQL, but I have to write a query for a sales prediction.
The data holds sales with distinct prodID, shopID, weekDay, date and sale. I need the sale of the same product in the same shop on the same weekDay for each of the past (e.g.) 3 weeks, so some pivoting is necessary. For some days there may be no sale record for that product-shop-weekday combination. Importantly, I also have to ignore negative sale values (if any) when calculating the average.
This is the head of the data:
DATE prodID shopID sale weekDay
2017-03-01 8 16 4.8 Wednesday
2017-03-01 2 16 18.8 Wednesday
2017-03-01 62 16 1.7 Wednesday
2017-03-01 34 16 3.6 Wednesday
2017-03-01 32 16 12.0 Wednesday
2017-03-02 8 16 3.6 Thursday
2017-03-02 34 16 15.8 Thursday
Ideal outcome is:
DATE prodID shopID sale weekDay saleWeek-1 saleWeek-2 saleWeek-3 ave_3sale
Perhaps a self-join can be used to build the new columns. Thank you very much for your help.

I made an SQL Fiddle example to show how you can build your SQL statement:
http://www.sqlfiddle.com/#!9/1877b3/4/0
You will of course need to substitute your own table and column names.
The example shows each day's sales plus what was sold of that product for that saleID in the last week and in the week before that.
CREATE TABLE salestable (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
datelit Date NOT NULL,
productID int(10) NOT NULL,
saleID int(10) NOT NULL,
sale float(4,2) NOT NULL,
weekday CHAR(30) NOT NULL,
PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;
INSERT INTO `salestable` (`id`,`datelit`, `productID`, `saleID`,`sale`,`weekday`) VALUES
(NULL,'2019-05-18', 8, 16, 4.8, 'Wednesday'),
(NULL,'2019-05-18', 2, 16, 18.8, 'Wednesday'),
(NULL,'2019-05-18', 62, 16, 1.7, 'Wednesday'),
(NULL,'2019-05-18', 34, 16, 3.6, 'Wednesday'),
(NULL,'2019-05-17', 32, 16, 12.0, 'Wednesday'),
(NULL,'2019-05-18', 8, 16, 3.6, 'Wednesday'),
(NULL,'2019-05-18', 34, 16, 15.8, 'Wednesday');
SELECT a.datelit,a.productID, a.saleID, a.sale,a.weekday, b.salesumweek1, c.salesumweek2
FROM `salestable` a
Left JOIN (
SELECT saleID,productID, SUM(sale) as salesumweek1
FROM `salestable`
Where datelit BETWEEN DATE_ADD(CURDATE(), INTERVAL -7 DAY) AND CURDATE()
GROUP BY saleID,productID
) b ON a.saleID = b.saleID AND a.productID = b.productID
Left JOIN (
SELECT saleID,productID, SUM(sale) as salesumweek2
FROM `salestable`
Where datelit BETWEEN DATE_ADD(CURDATE(), INTERVAL -14 DAY) AND DATE_ADD(CURDATE(), INTERVAL -8 DAY)
GROUP BY saleID,productID
) c ON a.saleID = c.saleID AND a.productID = c.productID;
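For the question as asked (same product, same shop, same weekday, one to three weeks back, negatives left out of the average), one possible approach is a self-join on the date shifted by 7, 14 and 21 days; shifting by whole weeks keeps the weekday the same automatically. The sketch below runs the idea through Python's built-in sqlite3 module so that it is self-contained. The table name `sales`, the column name `sale_date` and the sample rows are made up for illustration, it assumes at most one row per product, shop and date, and SQLite's `date(d, '-7 days')` stands in for MySQL's `DATE_SUB(d, INTERVAL 7 DAY)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, prodID INT, shopID INT, sale REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2017-03-01", 8, 16, 4.0),
        ("2017-03-08", 8, 16, -2.0),  # negative: shown, but left out of the average
        ("2017-03-15", 8, 16, 6.0),
        ("2017-03-22", 8, 16, 5.0),
    ],
)

query = """
SELECT a.sale_date, a.prodID, a.shopID, a.sale,
       w1.sale AS saleWeek1, w2.sale AS saleWeek2, w3.sale AS saleWeek3,
       -- average of the non-negative sales found 1, 2 and 3 weeks back
       (SELECT AVG(p.sale)
          FROM sales p
         WHERE p.prodID = a.prodID AND p.shopID = a.shopID
           AND p.sale >= 0
           AND p.sale_date IN (date(a.sale_date, '-7 days'),
                               date(a.sale_date, '-14 days'),
                               date(a.sale_date, '-21 days'))) AS ave_3sale
FROM sales a
LEFT JOIN sales w1 ON w1.prodID = a.prodID AND w1.shopID = a.shopID
                  AND w1.sale_date = date(a.sale_date, '-7 days')
LEFT JOIN sales w2 ON w2.prodID = a.prodID AND w2.shopID = a.shopID
                  AND w2.sale_date = date(a.sale_date, '-14 days')
LEFT JOIN sales w3 ON w3.prodID = a.prodID AND w3.shopID = a.shopID
                  AND w3.sale_date = date(a.sale_date, '-21 days')
ORDER BY a.sale_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The LEFT JOINs leave NULL (None in Python) where no record exists for a given week, and AVG skips those along with the filtered-out negative values.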

Related

count by hours in between with start and end time data

In the table the data is stored as Timestamp, but I am showing it here as Time(start_at), Time(end_at).
Table structure:
id, start_at, end_at
1, 03:00:00, 06:00:00
2, 02:00:00, 05:00:00
3, 01:00:00, 08:00:00
4, 08:00:00, 13:00:00
5, 09:00:00, 21:00:00
6, 13:00:00, 16:00:00
6, 15:00:00, 19:00:00
For the result we need to count, for each hour, the ids that were active between start_at and end_at.
hours, count
0, 0
1, 1
2, 2
3, 3
4, 3
5, 2
6, 1
7, 1
8, 1
9, 2
10, 2
11, 2
12, 2
13, 3
14, 2
15, 3
16, 2
17, 2
18, 2
19, 1
20, 1
21, 0
22, 0
23, 0
Either
WITH RECURSIVE
cte AS (
SELECT 0 `hour`
UNION ALL
SELECT `hour` + 1 FROM cte WHERE `hour` < 23
)
SELECT cte.`hour`, COUNT(test.id) `count`
FROM cte
LEFT JOIN test ON cte.`hour` >= HOUR(test.start_at)
AND cte.`hour` < HOUR(test.end_at)
GROUP BY 1
ORDER BY 1;
or
WITH RECURSIVE
cte AS (
SELECT CAST('00:00:00' AS TIME) `hour`
UNION ALL
SELECT `hour` + INTERVAL 1 HOUR FROM cte WHERE `hour` < '23:00:00'
)
SELECT cte.`hour`, COUNT(test.id) `count`
FROM cte
LEFT JOIN test ON cte.`hour` >= test.start_at
AND cte.`hour` < test.end_at
GROUP BY 1
ORDER BY 1;
The 1st query returns the hour column as a numeric value, whereas the 2nd one returns it in time format. Pick whichever variant suits you.
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=5a77b6e3158be06c7a551cb7e64673de
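Both join conditions treat each row as a half-open interval: a row counts for hour h when start hour <= h < end hour. As a rough way to sanity-check that rule outside the database, it can be sketched in plain Python; the `intervals` list below just restates the sample rows as (start hour, end hour) pairs.

```python
# Sample rows from the question, reduced to (start_hour, end_hour).
intervals = [(3, 6), (2, 5), (1, 8), (8, 13), (9, 21), (13, 16), (15, 19)]

# For every hour of the day, count rows with start <= hour < end,
# mirroring the two conditions in the LEFT JOIN.
counts = {
    hour: sum(1 for start, end in intervals if start <= hour < end)
    for hour in range(24)
}

for hour, count in counts.items():
    print(hour, count)
```

With this rule hour 13 gets a count of 2 rather than the 3 shown in the question's table, because the row ending at 13:00:00 is no longer counted in that hour. If the end hour should be inclusive, the second condition would need `<=` instead of `<`.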

Cumulative sum grouped by year, month and day in a JSON object

Let's say I have a table orders with the following rows:
ID Cost Date (timestamp)
1 100 2020-06-30 21:18:53.328386+00
2 45 2020-06-30 11:18:53.328386+00
3 200 2020-05-29 21:32:56.620174+00
4 20 2020-06-28 21:32:56.620174+00
And I need a query that returns exactly this:
Month Year Costs
5 2020 {"1": 0, "2": 0, ..., "29": 200, "30": 200, "31": 200}
6 2020 {"1": 0, "2": 0, ..., "28": 20, "29": 20, "30": 165}
Please note that the column Costs has to be a json with the key being the day in the month and the value being the cumulative sum of all previous days in that month.
I know this is probably not a task that Postgres should be doing, but I'm just curious to see what the solution is (even if it's not the most efficient in production environments).
You can use two levels of aggregation and json_object_agg():
select date_month, date_year, json_object_agg(date_day, cnt) costs
from (
select
extract(month from date) date_month,
extract(year from date) date_year,
extract(day from date) date_day,
sum(sum(cost)) over(
partition by extract(month from date), extract(year from date)
order by extract(day from date)
) cnt
from mytable
group by 1, 2, 3
) t
group by date_month, date_year
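Note that the query above only produces keys for days that actually have orders, while the expected output lists every day of the month. In PostgreSQL the missing days could probably be supplied by joining against a calendar built with generate_series; the sketch below shows the same fill-and-accumulate logic in plain Python instead, to make the intended result concrete. The `orders` list restates the question's rows with the timestamps simplified (fractional seconds and time zone dropped), which is an assumption made for brevity.

```python
import calendar
import json
from collections import defaultdict
from datetime import datetime

orders = [
    (1, 100, "2020-06-30 21:18:53"),
    (2, 45, "2020-06-30 11:18:53"),
    (3, 200, "2020-05-29 21:32:56"),
    (4, 20, "2020-06-28 21:32:56"),
]

# Total cost per calendar day.
daily = defaultdict(int)
for _order_id, cost, ts in orders:
    d = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    daily[(d.year, d.month, d.day)] += cost

# For each month that has orders, walk every day of the month and
# keep a running total, so days without orders repeat the last value.
result = {}
for year, month in sorted({(y, m) for y, m, _ in daily}):
    running = 0
    costs = {}
    days_in_month = calendar.monthrange(year, month)[1]
    for day in range(1, days_in_month + 1):
        running += daily.get((year, month, day), 0)
        costs[str(day)] = running
    result[(year, month)] = json.dumps(costs)

for (year, month), costs_json in result.items():
    print(month, year, costs_json)
```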

Combine two queries and add a new column

DB-Fiddle
CREATE TABLE sales (
id int auto_increment primary key,
category VARCHAR(255),
event_date DATE,
sent_date DATE,
sales_Product_gross VARCHAR(255),
return_Product_gross VARCHAR(255)
);
INSERT INTO sales
(category, event_date, sent_date,
sales_Product_gross, return_Product_gross
)
VALUES
("CAT_01", "2017-05-30", "2017-05-30", "500", NULL),
("CAT_01", "2017-06-05", "2017-05-30", NULL, "250"),
("CAT_01", "2018-07-08", "2018-07-08", "700", NULL),
("CAT_01", "2018-07-18", "2018-07-08", NULL, "370"),
("CAT_01", "2019-02-15", "2019-02-15", "400", NULL),
("CAT_01", "2019-03-21", "2019-02-15", NULL, "120"),
("CAT_02", "2019-04-24", "2019-04-24", "300", NULL),
("CAT_02", "2019-04-30", "2019-04-24", NULL, "145"),
("CAT_02", "2019-12-14", "2019-12-14", "900", NULL),
("CAT_02", "2019-12-28", "2019-12-14", NULL, "340"),
("CAT_03", "2020-03-09", "2020-03-09", "800", NULL),
("CAT_03", "2020-03-17", "2020-03-09", NULL, "425");
The table displays the sales and returns in different categories.
Now, I want to calculate:
a) the return_rate per month per campaign and store it in a new column called calc_type with the name monthly.
b) the return_rate on a rolling 2 YEAR basis and also store it in the new column calc_type with the name rolling.
The result should look like this:
category calc_type year month return_rate
CAT_01 rolling NULL NULL 0.445
CAT_01 monthly 2017 5 0.500
CAT_01 monthly 2018 7 0.528
CAT_01 monthly 2019 2 0.300
CAT_02 rolling NULL NULL 0.404
CAT_02 monthly 2019 4 0.480
CAT_02 monthly 2019 12 0.377
CAT_03 rolling NULL NULL 0.531
CAT_03 monthly 2020 3 0.531
I have created a query for criteria a) and for criteria b). Separately, those queries work exactly the way I need it.
Now, I tried to combine them using UNION ALL the same way it is done here:
SELECT
category,
'rolling' AS calc_type,
'NULL' AS year,
'NULL' As month,
sum(return_Product_gross) / sum(sales_Product_gross) as return_rate
FROM sales
WHERE sent_date BETWEEN DATE_SUB(CURDATE(), INTERVAL 2 YEAR) AND CURDATE()
GROUP BY 1,2,3,4
ORDER BY 1,2,3,4;
UNION ALL
SELECT
category,
'monthly' AS calc_type,
YEAR(sent_date) AS year,
MONTH(sent_date) AS month,
sum(return_Product_gross) / sum(sales_Product_gross) as return_rate
FROM sales
WHERE sent_date BETWEEN "2017-01-01" AND CURDATE()
GROUP BY 1,2,3,4
ORDER BY 1,2,3,4;
However, now only the values for rolling are displayed in the result.
What do I need to change in my queries to get the expected result?
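This query seems to work. The semicolon after the first ORDER BY ended the statement before UNION ALL, so only the rolling part was executed; remove it and keep a single ORDER BY at the very end: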
SELECT
category,
'rolling' AS calc_type,
'NULL' AS year,
'NULL' As month,
sum(return_Product_gross) / sum(sales_Product_gross) as return_rate
FROM sales
WHERE sent_date BETWEEN DATE_SUB(CURDATE(), INTERVAL 2 YEAR) AND CURDATE()
GROUP BY category, year, month
UNION ALL
SELECT
category,
'monthly' AS calc_type,
YEAR(sent_date) AS year,
MONTH(sent_date) AS month,
sum(return_Product_gross) / sum(sales_Product_gross) as return_rate
FROM sales
WHERE sent_date BETWEEN "2017-01-01" AND CURDATE()
GROUP BY category, year, month
ORDER BY category, calc_type DESC, year, month;
DBFiddle

SQL Query:- Difference prices of same month with same year with different tables , if not just show zero

There are two different tables. I just need to subtract the prices for the same month of the same year; if there is no data, just show zero for that particular month and year. Right now it subtracts row by row, irrespective of month and year.
Table 1                 Table2
Price  tran_date        Price  post_date
60     2018-01-01       30     2018-01-15
40     2018-02-08       30     2018-02-02
50     2018-12-28       30     2018-11-01
40     2019-03-01       10     2019-01-08
80     2019-04-11       60     2019-04-29
                        40     2019-10-01
Expected Answer:
Sum(price) Year
30 January 2018
10 February 2018
30 November 2018
50 December 2018
-10 January 2019
40 March 2019
20 April 2019
40 October 2019
Actual Answer:
Sum(Price) Year
30 January 2018
10 February 2018
10 December 2018
30 March 2019
20 April 2019
-40 October 2019
SQL Query for table1
Select sum(price) from table1 where date(tran_date)
between '2018-01-01' and '2019-12-31'
group by month(tran_date),year(tran_date)
SQL Query for table2
Select sum(price) from table2 where date(post_date)
between '2018-01-01' and '2019-12-31'
group by month(post_date),year(post_date)
It should not subtract the 1st row of table2 from the 1st row of table1; it should subtract the values for the same month of the same year. If there is no data, just show zero for that particular month and year.
Please help. Thanks in advance.
It seems you want the absolute difference; try adding abs().
Sample:
select date_year, date_month,
abs(sum(price))
from ((select date_year, date_month, price from
(values (60, '2018', '01'),
(40, '2018', '02'),
(50, '2018', '12'),
(40, '2019', '03'),
(80, '2019', '04') ) table1 (price, date_year, date_month)
) union all
(select date_year, date_month, - price from (
values (30, '2018', '01'),
(30, '2018', '02'),
(30, '2018', '11'),
(10, '2019', '01'),
(60, '2019', '04'),
(40, '2019', '10')
) table2 (price, date_year, date_month)
)
) t
group by date_year, date_month
order by date_year, date_month
see the fiddle
https://www.db-fiddle.com/f/qVQYB2KXSTbJNEkSH1oGuG/0
Is this what you want?
select year(dte), month(dte),
greatest( sum(price), 0)
from ((select tran_date as dte, price from table1
) union all
(select post_date, - price from table2
)
) t
group by year(dte), month(dte);
It seems very strange to not subtract the values. I suspect you might just want:
select year(dte), month(dte),
sum(price)
from ((select tran_date as dte, price from table1
) union all
(select post_date, - price from table2
)
) t
group by year(dte), month(dte)
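To see what the last query produces on the question's data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite has no year() or month() functions, so `strftime('%Y', ...)` and `strftime('%m', ...)` are used in their place; that substitution, and the choice of integer prices, are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (price INT, tran_date TEXT)")
conn.execute("CREATE TABLE table2 (price INT, post_date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(60, "2018-01-01"), (40, "2018-02-08"), (50, "2018-12-28"),
     (40, "2019-03-01"), (80, "2019-04-11")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [(30, "2018-01-15"), (30, "2018-02-02"), (30, "2018-11-01"),
     (10, "2019-01-08"), (60, "2019-04-29"), (40, "2019-10-01")],
)

# Stack both tables, negate table2's prices, then sum per year and month.
query = """
SELECT strftime('%Y', dte) AS yr, strftime('%m', dte) AS mon, SUM(price) AS diff
FROM (SELECT tran_date AS dte, price FROM table1
      UNION ALL
      SELECT post_date, -price FROM table2) t
GROUP BY yr, mon
ORDER BY yr, mon
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Months that appear only in table2 come out negative here (November 2018 gives -30 and October 2019 gives -40), which differs from the question's expected output for those two months; wrapping the sum in abs() or greatest(..., 0), as in the earlier variants, changes that behaviour.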

How can I modify this query to add a new field containing the maximum value of a field of a subset of the total original records?

I am not very familiar with databases and I have the following problem implementing a query. I am using MySQL.
I have a MeteoForecast table like this:
CREATE TABLE MeteoForecast (
id BigInt(20) NOT NULL AUTO_INCREMENT,
localization_id BigInt(20) NOT NULL,
seasonal_forecast_id BigInt(20),
meteo_warning_id BigInt(20),
start_date DateTime NOT NULL,
end_date DateTime NOT NULL,
min_temp Float,
max_temp Float,
icon_link VarChar(255) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
PRIMARY KEY (
id
)
) ENGINE=InnoDB AUTO_INCREMENT=3 ROW_FORMAT=DYNAMIC DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;
It contains meteo forecast information, something like this:
id  localization_id  start_date           end_date             min_temp  max_temp  icon_link
---------------------------------------------------------------------------------------------------------
1   1                18/09/2017 06:00:00  18/09/2017 12:00:00  15        24        Mostly_Cloudy_Icon.png
2   1                18/09/2017 12:00:00  18/09/2017 18:00:00  15        24        Light_Rain.png
3   1                19/09/2017 06:00:00  19/09/2017 12:00:00  12        22        Mostly_Cloudy_Icon.png
4   1                19/09/2017 12:00:00  19/09/2017 18:00:00  13        16        Mostly_Cloudy_Icon.png
5   1                20/09/2017 06:00:00  20/09/2017 12:00:00  18        26        Light_Rain.png
6   1                20/09/2017 12:00:00  20/09/2017 18:00:00  17        25        Light_Rain.png
So, as you can see in the previous dataset, each record has a starting datetime and an ending datetime. This is because I collect several forecasts for a given day, each covering a time range (in the example, each day has one record from 06:00 to 12:00 and another from 12:00 to 18:00).
So, I created this simple query that extracts all the records in a specific range (in this case 2 days):
select * from MeteoForecast
where start_date between '2017-09-18 06:00:00' and '2017-09-20 06:00:00'
order by start_date desc;
I have to modify this query in the following way:
For each record retrieved by the previous query, a new field named global_max_temp must be added, holding the maximum value of the max_temp field on the same day.
For example, for the records of the day whose start_date is 19/09/2017, these are the records I need to obtain:
id  localization_id  start_date           end_date             min_temp  max_temp  icon_link               global_max_temp
--------------------------------------------------------------------------------------------------------------------------
3   1                19/09/2017 06:00:00  19/09/2017 12:00:00  12        22        Mostly_Cloudy_Icon.png  22
4   1                19/09/2017 12:00:00  19/09/2017 18:00:00  13        16        Mostly_Cloudy_Icon.png  22
As you can see, the last field (inserted manually in this mock-up) is global_max_temp, and in both records for this day it contains the value 22, because that is the maximum max_temp over all the records for that day.
This is the query that calculates the global_max_temp value:
select max(max_temp) from MeteoForecast
where start_date = '2017-09-19 06:00:00'
How can I add this feature to my original query?
Can you try something like this:
SELECT A.*, B.GLOBAL_MAX_TEMP
FROM (
select id, start_date, end_date, min_temp, max_temp
from MeteoForecast
where start_date between '2017-09-18 06:00:00' and '2017-09-20 06:00:00'
) A
INNER JOIN (SELECT date(start_date) AS date_only, MAX(max_temp) AS GLOBAL_MAX_TEMP
FROM MeteoForecast
WHERE start_date BETWEEN '2017-09-18 06:00:00' and '2017-09-20 06:00:00'
GROUP BY date(start_date)
) B ON date(A.start_date) = B.date_only
ORDER by start_date desc;
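If window functions are available (MySQL 8.0 and later support them), the same per-day maximum can be written without the extra join, using MAX(...) OVER (PARTITION BY ...). The sketch below is only an illustration of that alternative: it runs in an in-memory SQLite database (3.25 or newer) through Python's sqlite3 module, the table is reduced to the columns that matter here, and the dates are stored as ISO text, all of which are assumptions of the sketch rather than part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MeteoForecast ("
    "id INTEGER PRIMARY KEY, start_date TEXT, end_date TEXT, "
    "min_temp REAL, max_temp REAL)"
)
conn.executemany(
    "INSERT INTO MeteoForecast VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2017-09-18 06:00:00", "2017-09-18 12:00:00", 15, 24),
        (2, "2017-09-18 12:00:00", "2017-09-18 18:00:00", 15, 24),
        (3, "2017-09-19 06:00:00", "2017-09-19 12:00:00", 12, 22),
        (4, "2017-09-19 12:00:00", "2017-09-19 18:00:00", 13, 16),
        (5, "2017-09-20 06:00:00", "2017-09-20 12:00:00", 18, 26),
        (6, "2017-09-20 12:00:00", "2017-09-20 18:00:00", 17, 25),
    ],
)

# The window is evaluated after the WHERE filter, so the maximum is
# taken only over the rows of each day that fall inside the range.
query = """
SELECT id, start_date, max_temp,
       MAX(max_temp) OVER (PARTITION BY date(start_date)) AS global_max_temp
FROM MeteoForecast
WHERE start_date BETWEEN '2017-09-18 06:00:00' AND '2017-09-20 06:00:00'
ORDER BY start_date DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

As with the join version, the maximum is computed only over rows inside the selected range, so a day that is partially cut off by the range boundary reports the maximum of its included rows.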