SQL: Join two tables by date range and group by year - mysql

I want to join two tables:
Table1:
Task | Hours | Client | Time
Task A | 1 | Client A | 2023-01-01
Task A | 2 | Client A | 2022-03-04
Task A | 3 | Client A | 2023-01-01
Task A | 4 | Client A | 2022-03-04
Task B | 5 | Client A | 2023-01-01
Task B | 6 | Client A | 2022-03-04
Task B | 7 | Client A | 2023-01-01
Task B | 8 | Client A | 2022-03-04
Table 2:
Task | Time Budget | Client | Start Range | End Range
Task A | 50 | Client A | 2023-01-01 | 2023-12-31
Task A | 60 | Client A | 2022-01-01 | 2022-12-31
Task B | 80 | Client A | 2023-01-01 | 2023-12-31
Task B | 70 | Client A | 2022-01-01 | 2022-12-31
I want to get such a table:
Task | Time Budget | Client | Start Range | End Range | Time spent
Task A | 50 | Client A | 2023-01-01 | 2023-12-31 | 4
Task A | 60 | Client A | 2022-01-01 | 2022-12-31 | 6
Task B | 80 | Client A | 2023-01-01 | 2023-12-31 | 12
Task B | 70 | Client A | 2022-01-01 | 2022-12-31 | 14
What I tried:
SELECT
t2.task as task,
t2.budget as budget,
t1.client as client,
t2.from_date as start_range,
t2.to_date as end_range,
sum(t1.hours) AS time_spent,
FROM `Table1` t1
LEFT JOIN
`Table2` t2
ON t1.task = t2.task
AND t1.client = t2.client
AND date(t1.time) BETWEEN t2.start_range and t2.end_range
Group by
task, client, start_range, end_range
However, this does not work. The best result I can get is a joined table in which, for example, the whole year 2022 is ignored.
Any help is so much appreciated!
With this query (and the suggested one) it leads to:
Task | Time Budget | Client | Start Range | End Range | Time spent
Task A | 50 | Client A | 2023-01-01 | 2023-12-31 | 4
Task A | 60 | NULL | 2022-01-01 | 2022-12-31 | NULL
Task B | 80 | Client A | 2023-01-01 | 2023-12-31 | 12
Task B | 70 | NULL | 2022-01-01 | 2022-12-31 | NULL

t2.budget is not included in the GROUP BY, and you need to sum the hours column.
If you want to keep every row of Table2 in the result, start from that table and then LEFT JOIN Table1 onto it.
Using https://www.db-fiddle.com/
Schema (MySQL v5.7)
CREATE TABLE Table1 (
`Task` VARCHAR(6),
`Hours` INTEGER,
`Client` VARCHAR(8),
`Time` DATE
);
INSERT INTO Table1
(`Task`, `Hours`, `Client`, `Time`)
VALUES
('Task A', '1', 'Client A', '2023-01-01'),
('Task A', '2', 'Client A', '2022-03-04'),
('Task A', '3', 'Client A', '2023-01-01'),
('Task A', '4', 'Client A', '2022-03-04'),
('Task B', '5', 'Client A', '2023-01-01'),
('Task B', '6', 'Client A', '2022-03-04'),
('Task B', '7', 'Client A', '2023-01-01'),
('Task B', '8', 'Client A', '2022-03-04');
CREATE TABLE Table2 (
`Task` VARCHAR(6),
`Time Budget` INTEGER,
`Client` VARCHAR(8),
`Start Range` DATE,
`End Range` DATE
);
INSERT INTO Table2
(`Task`, `Time Budget`, `Client`, `Start Range`, `End Range`)
VALUES
('Task A', '50', 'Client A', '2023-01-01', '2023-12-31'),
('Task A', '60', 'Client A', '2022-01-01', '2022-12-31'),
('Task B', '80', 'Client A', '2023-01-01', '2023-12-31'),
('Task B', '70', 'Client A', '2022-01-01', '2022-12-31');
Query #1
SELECT
t2.`task`,
t2.`Time Budget`,
t2.client,
t2.`Start Range`,
t2.`End Range`,
SUM(t1.hours)
FROM Table2 AS t2
LEFT JOIN Table1 AS t1
ON t2.Task=t1.task
AND t2.client=t1.client
AND t1.time between t2.`Start Range` AND t2.`End Range`
GROUP BY 1,2,3,4,5;
task | Time Budget | client | Start Range | End Range | SUM(t1.hours)
Task A | 50 | Client A | 2023-01-01 | 2023-12-31 | 4
Task A | 60 | Client A | 2022-01-01 | 2022-12-31 | 6
Task B | 70 | Client A | 2022-01-01 | 2022-12-31 | 14
Task B | 80 | Client A | 2023-01-01 | 2023-12-31 | 12
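A minimal sketch of why the join direction matters. It assumes SQLite through Python's standard-library sqlite3 module rather than MySQL, snake_case column names instead of the names with spaces, and one hypothetical 'Task C' budget row that is not in the original data. Starting from Table2 keeps every budget row, and COALESCE turns a missing sum into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (task TEXT, hours INTEGER, client TEXT, time TEXT);
INSERT INTO table1 VALUES
    ('Task A', 1, 'Client A', '2023-01-01'),
    ('Task A', 2, 'Client A', '2022-03-04'),
    ('Task A', 3, 'Client A', '2023-01-01'),
    ('Task A', 4, 'Client A', '2022-03-04'),
    ('Task B', 5, 'Client A', '2023-01-01'),
    ('Task B', 6, 'Client A', '2022-03-04'),
    ('Task B', 7, 'Client A', '2023-01-01'),
    ('Task B', 8, 'Client A', '2022-03-04');
CREATE TABLE table2 (task TEXT, time_budget INTEGER, client TEXT,
                     start_range TEXT, end_range TEXT);
INSERT INTO table2 VALUES
    ('Task A', 50, 'Client A', '2023-01-01', '2023-12-31'),
    ('Task A', 60, 'Client A', '2022-01-01', '2022-12-31'),
    ('Task B', 80, 'Client A', '2023-01-01', '2023-12-31'),
    ('Task B', 70, 'Client A', '2022-01-01', '2022-12-31');
""")
# Hypothetical budget with no logged hours (not part of the original data).
conn.execute(
    "INSERT INTO table2 VALUES ('Task C', 10, 'Client A', '2023-01-01', '2023-12-31')"
)

query = """
SELECT t2.task, t2.time_budget, t2.client, t2.start_range, t2.end_range,
       COALESCE(SUM(t1.hours), 0) AS time_spent
FROM table2 AS t2
LEFT JOIN table1 AS t1
       ON t1.task = t2.task
      AND t1.client = t2.client
      AND t1.time BETWEEN t2.start_range AND t2.end_range
GROUP BY t2.task, t2.time_budget, t2.client, t2.start_range, t2.end_range
ORDER BY t2.task, t2.start_range
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With Table1 on the left side instead, the 'Task C' row would not appear at all, because a LEFT JOIN only guarantees the rows of its left table.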

Try using UNION and WHERE to fetch the data from two SELECT statements:
Select * from
(Select a as a1
UNION
Select b as b1)
WHERE b1=a1

Related

Calculate the period of validity of the price

I have a table with an item, its cost and the date it was added.
CREATE TABLE item_prices (
item_id INT,
item_name VARCHAR(30),
item_price DECIMAL(12, 2),
created_dttm DATETIME
);
INSERT INTO item_prices(item_id, item_name, item_price, created_dttm) VALUES
(1, 'spoon', 10.20 , '2023-01-01 01:00:00'),
(1, 'spoon', 10.20 , '2023-01-08 01:35:00'),
(1, 'spoon', 10.35 , '2023-01-14 15:00:00'),
(2, 'table', 40.00 , '2023-01-01 01:00:00'),
(2, 'table', 40.00 , '2023-01-03 11:22:00'),
(2, 'table', 41.00 , '2023-01-10 08:28:22'),
(1, 'spoon', 10.35 , '2023-01-28 21:52:00'),
(1, 'spoon', 11.00 , '2023-02-15 16:36:00'),
(2, 'table', 41.00 , '2023-02-16 21:42:11'),
(2, 'table', 45.20 , '2023-02-19 20:25:25'),
(1, 'spoon', 9.00 , '2023-03-02 14:50:00'),
(1, 'spoon', 9.00 , '2023-03-06 16:36:00'),
(1, 'spoon', 8.50 , '2023-03-15 12:00:00'),
(2, 'table', 30 , '2023-03-05 10:10:10'),
(2, 'table', 30 , '2023-03-10 15:45:00');
I need to create a new table with the following fields:
"item_id",
"item_name",
"item_price",
"valid_from_dt": date on which the price was effective (created_dttm price record)
"valid_to_dt": date until which this price was valid (created_dttm of the next record for this product "minus" one day)
I thought it might be possible to start by selecting the days on which entries with new prices are added, using a query like this:
SELECT item_id, item_name, item_price,
MIN(created_dttm) as dt
FROM item_prices
GROUP BY item_price, item_id, item_name
That gives me the first date on which each price appears for each item.
The expected output is the following:
item_id | item_name | item_price | valid_from_dt | valid_to_dt
1 | spoon | 10.20 | 2023-01-01 | 2023-01-13
1 | spoon | 10.35 | 2023-01-14 | 2023-02-14
1 | spoon | 11.00 | 2023-02-15 | 2023-03-01
1 | spoon | 9.00 | 2023-03-02 | 2023-03-01
1 | spoon | 8.50 | 2023-03-15 | 2023-03-14
2 | table | 40.00 | 2023-01-01 | 2023-01-09
2 | table | 41.00 | 2023-01-10 | 2023-02-18
.... | .... | .... | .... | ....
select distinct
item_id,
item_name,
first_value(item_price) over (partition by item_id order by created_dttm) as item_price,
min(created_dttm) over (partition by item_id ) as valid_from_dt,
max(created_dttm) over (partition by item_id ) as valid_to_dt
from item_prices
;
output:
item_id | item_name | item_price | valid_from_dt | valid_to_dt
1 | spoon | 10.20 | 2023-01-01 01:00:00 | 2023-03-15 12:00:00
2 | table | 40.00 | 2023-01-01 01:00:00 | 2023-03-10 15:45:00
Your query is correct. It is only missing the next step: retrieve the next "valid_from_dt" within the partition <item_id, item_name> using the LEAD function, then subtract 1 day from it.
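-- CTEs and window functions require MySQL 8.0 or later.
WITH cte AS (
    SELECT item_id, item_name, item_price,
           DATE(MIN(created_dttm)) AS valid_from_dt  -- keep only the date part
    FROM item_prices
    GROUP BY item_id, item_name, item_price
)
SELECT *,
       -- ORDER BY makes LEAD return the chronologically next period start
       LEAD(valid_from_dt) OVER (PARTITION BY item_id, item_name
                                 ORDER BY valid_from_dt) - INTERVAL 1 DAY AS valid_to_dt
FROM cte;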
Check the demo here.
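The LEAD step can be sketched as follows. This is a hedged illustration, assuming SQLite through Python's standard-library sqlite3 module instead of MySQL (so `date(x, '-1 day')` replaces `- INTERVAL 1 DAY`), and using only the first five spoon rows from the question's sample data.

```python
import sqlite3

# Spoon rows copied from the question's sample data (a subset, for brevity).
rows = [
    (1, "spoon", 10.20, "2023-01-01 01:00:00"),
    (1, "spoon", 10.20, "2023-01-08 01:35:00"),
    (1, "spoon", 10.35, "2023-01-14 15:00:00"),
    (1, "spoon", 10.35, "2023-01-28 21:52:00"),
    (1, "spoon", 11.00, "2023-02-15 16:36:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_prices ("
    "item_id INTEGER, item_name TEXT, item_price REAL, created_dttm TEXT)"
)
conn.executemany("INSERT INTO item_prices VALUES (?, ?, ?, ?)", rows)

query = """
WITH first_seen AS (
    -- first day on which each price appears for an item
    SELECT item_id, item_name, item_price,
           date(MIN(created_dttm)) AS valid_from_dt
    FROM item_prices
    GROUP BY item_id, item_name, item_price
)
SELECT item_id, item_name, item_price, valid_from_dt,
       -- start of the next period minus one day; NULL for the latest price
       date(LEAD(valid_from_dt) OVER (
                PARTITION BY item_id ORDER BY valid_from_dt),
            '-1 day') AS valid_to_dt
FROM first_seen
ORDER BY item_id, valid_from_dt
"""
periods = conn.execute(query).fetchall()
for period in periods:
    print(period)
```

One caveat of grouping by price: if an item later returns to an earlier price, both periods collapse into a single group, so this approach only fits data where each price occurs in one contiguous run per item.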

Is it possible to fetch needed data in one query?

I have a database containing tickets. Each ticket has its own number, but that number is not unique in the table. For example, ticket #1000 can appear multiple times in the table with different values in other columns (which I have removed here for the example).
create table countries
(
isoalpha varchar(2),
pole varchar(50)
);
insert into countries values ('DE', 'EMEA'),('FR', 'EMEA'),('IT', 'EMEA'),('US','USCAN'),('CA', 'USCAN');
create table tickets
(
id int primary key auto_increment,
number int,
isoalpha varchar(2),
created datetime
);
insert into tickets (number, isoalpha, created) values
(1000, 'DE', '2021-01-01 00:00:00'),
(1001, 'US', '2021-01-01 00:00:00'),
(1002, 'FR', '2021-01-01 00:00:00'),
(1003, 'CA', '2021-01-01 00:00:00'),
(1000, 'DE', '2021-01-01 00:00:00'),
(1000, 'DE', '2021-01-01 00:00:00'),
(1004, 'DE', '2021-01-02 00:00:00'),
(1001, 'US', '2021-01-01 00:00:00'),
(1002, 'FR', '2021-01-01 00:00:00'),
(1005, 'IT', '2021-01-02 00:00:00'),
(1006, 'US', '2021-01-02 00:00:00'),
(1007, 'DE', '2021-01-02 00:00:00');
Here is an example:
http://sqlfiddle.com/#!9/3f4ba4/6
What I need as output is the number of newly created tickets for each day, divided into tickets from USCAN and the rest of the world.
So for this example the output should be
Date | USCAN | Other
'2021-01-01' | 2 | 2
'2021-01-02' | 1 | 3
At the moment I use these two queries to fetch all new tickets and then count the rows with the same date in my application code:
SELECT MIN(ti.created) AS date
FROM tickets ti
LEFT JOIN countries ct ON (ct.isoalpha = ti.isoalpha)
WHERE ct.pole = 'USCAN'
GROUP BY ti.number
ORDER BY date;
SELECT MIN(ti.created) AS date
FROM tickets ti
LEFT JOIN countries ct ON (ct.isoalpha = ti.isoalpha)
WHERE ct.pole <> 'USCAN'
GROUP BY ti.number
ORDER BY date;
but that doesn't look like a very clean method. So how can I improve the query to get the needed data with less overhead?
It should preferably work with MySQL 5.7.
You may logically combine the queries using conditional aggregation:
SELECT
MIN(CASE WHEN ct.pole = 'USCAN' THEN ti.created END) AS date_uscan,
MIN(CASE WHEN ct.pole <> 'USCAN' THEN ti.created END) AS date_other
FROM tickets ti
LEFT JOIN countries ct ON ct.isoalpha = ti.isoalpha
GROUP BY ti.number
ORDER BY MIN(ti.created);
You can create unique entries for each date/country then use that value to count USCAN and non-USCAN
SELECT created,
SUM(1) as total,
SUM(CASE WHEN pole = 'USCAN' THEN 1 ELSE 0 END) as uscan,
SUM(CASE WHEN pole != 'USCAN' THEN 1 ELSE 0 END) as nonuscan
FROM (
SELECT created, t.isoalpha, MIN(pole) AS pole
FROM tickets t JOIN countries c ON t.isoalpha = c.isoalpha
GROUP BY created,isoalpha
) AS uniqueTickets
GROUP BY created
Results:
created total uscan nonuscan
2021-01-01T00:00:00Z 4 2 2
2021-01-02T00:00:00Z 3 1 2
http://sqlfiddle.com/#!9/3f4ba4/45/0
Building on the answer from SQL Hacks, I found the right solution:
SELECT DATE(created) AS created_date,
       SUM(1) AS total,
       SUM(CASE WHEN pole = 'USCAN' THEN 1 ELSE 0 END) AS uscan,
       SUM(CASE WHEN pole != 'USCAN' THEN 1 ELSE 0 END) AS nonuscan
FROM (
    -- one row per ticket number, dated by its first appearance
    SELECT MIN(t.created) AS created, MIN(c.pole) AS pole
    FROM tickets t JOIN countries c ON t.isoalpha = c.isoalpha
    GROUP BY t.number
) AS uniqueTickets
GROUP BY DATE(created);
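As a runnable check of the idea above (reduce to one row per ticket number, then count conditionally per day), here is a small sketch. It assumes SQLite through Python's standard-library sqlite3 module in place of MySQL 5.7; the data is the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (isoalpha TEXT, pole TEXT);
INSERT INTO countries VALUES
    ('DE', 'EMEA'), ('FR', 'EMEA'), ('IT', 'EMEA'), ('US', 'USCAN'), ('CA', 'USCAN');
CREATE TABLE tickets (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    number INTEGER, isoalpha TEXT, created TEXT);
INSERT INTO tickets (number, isoalpha, created) VALUES
    (1000, 'DE', '2021-01-01 00:00:00'), (1001, 'US', '2021-01-01 00:00:00'),
    (1002, 'FR', '2021-01-01 00:00:00'), (1003, 'CA', '2021-01-01 00:00:00'),
    (1000, 'DE', '2021-01-01 00:00:00'), (1000, 'DE', '2021-01-01 00:00:00'),
    (1004, 'DE', '2021-01-02 00:00:00'), (1001, 'US', '2021-01-01 00:00:00'),
    (1002, 'FR', '2021-01-01 00:00:00'), (1005, 'IT', '2021-01-02 00:00:00'),
    (1006, 'US', '2021-01-02 00:00:00'), (1007, 'DE', '2021-01-02 00:00:00');
""")

query = """
SELECT date(first_created) AS day,
       SUM(CASE WHEN pole = 'USCAN' THEN 1 ELSE 0 END) AS uscan,
       SUM(CASE WHEN pole <> 'USCAN' THEN 1 ELSE 0 END) AS other
FROM (
    -- one row per ticket number, dated by its first appearance
    SELECT t.number, MIN(t.created) AS first_created, MIN(c.pole) AS pole
    FROM tickets t
    JOIN countries c ON c.isoalpha = t.isoalpha
    GROUP BY t.number
) AS first_seen
GROUP BY date(first_created)
ORDER BY day
"""
per_day = conn.execute(query).fetchall()
for row in per_day:
    print(row)
```

Grouping the inner query by ticket number rather than by date and country is what keeps the two German tickets created on 2021-01-02 from being merged into one.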

How to get the difference (delta) between the entry counts of consecutive days with window functions?

I have a table with a few fields such as id, country, ip and created_at. I am trying to get the deltas between the total number of entries on consecutive days.
CREATE TABLE session (
id int NOT NULL AUTO_INCREMENT,
country varchar(50) NOT NULL,
ip varchar(255),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
);
INSERT INTO `session` (`id`, `country`, `ip`, `created_at`) VALUES
('1', 'IN', '10.100.102.11', '2021-04-05 20:26:02'),
('2', 'IN', '10.100.102.11', '2021-04-05 19:26:02'),
('3', 'US', '10.120.102.11', '2021-04-17 10:26:02'),
('4', 'US', '10.100.112.11', '2021-04-16 12:26:02'),
('5', 'AU', '10.100.102.122', '2021-04-12 19:36:02'),
('6', 'AU', '10.100.102.122', '2021-04-12 18:20:02'),
('7', 'AU', '10.100.102.122', '2021-04-12 23:26:02'),
('8', 'US', '10.100.102.2', '2021-04-16 21:33:01'),
('9', 'AU', '10.100.102.122', '2021-04-18 20:46:02'),
('10', 'AU', '10.100.102.111', '2021-04-04 13:19:12'),
('11', 'US', '10.100.112.11', '2021-04-16 12:26:02'),
('12', 'IN', '10.100.102.11', '2021-04-05 15:26:02'),
('13', 'IN', '10.100.102.11', '2021-04-05 19:26:02');
Now I have written this query to get the delta
SELECT T1.date1 as date, IFNULL(T1.cnt1-T2.cnt2, T1.cnt1) as delta from (
select TA.dateA as date1, MAX(TA.countA) as cnt1 from (
select DATE(created_at) AS dateA, COUNT(*) AS countA
FROM session
GROUP BY DATE(created_at)
UNION
select DISTINCT DATE(DATE(created_at)+1) AS dateA, 0 AS countA
FROM session
) as TA
group by TA.dateA
) as T1
LEFT OUTER JOIN (
select DATE(DATE(created_at)+1) AS date2,
COUNT(*) AS cnt2
FROM session
GROUP BY DATE(created_at)
) as T2
ON T1.date1=T2.date2
ORDER BY date;
http://sqlfiddle.com/#!9/4f5fd26/60
Then I am getting the results as
date delta
2021-04-04 1
2021-04-05 3
2021-04-06 -4
2021-04-12 3
2021-04-13 -3
2021-04-16 3
2021-04-17 -2
2021-04-18 0
2021-04-19 -1
Now, is there any room to improve or optimize this, with or without window functions? (I am new to SQL and still playing around.)
Try a shorter version
-- LATERAL derived tables require MySQL 8.0.14 or later.
WITH grp AS (
    SELECT t.dateA, SUM(t.cnt) AS countA
    FROM session,
    LATERAL (
        SELECT DATE(created_at) AS dateA, 1 AS cnt
        UNION ALL
        -- + INTERVAL 1 DAY stays valid across month ends, unlike DATE(...) + 1
        SELECT DATE(created_at) + INTERVAL 1 DAY, 0 AS cnt
    ) t
    GROUP BY dateA
)
SELECT t1.dateA AS date, IFNULL(t1.countA - t2.countA, t1.countA) AS delta
FROM grp t1
LEFT JOIN grp t2 ON t2.dateA + INTERVAL 1 DAY = t1.dateA
ORDER BY t1.dateA;
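Since the question asks about window functions, here is a hedged sketch that replaces the self-join with LAG. It assumes SQLite through Python's standard-library sqlite3 module instead of MySQL, and it loads only the created_at values from the question's INSERT statement.

```python
import sqlite3

# created_at values copied from the question's INSERT statement.
timestamps = [
    "2021-04-05 20:26:02", "2021-04-05 19:26:02", "2021-04-17 10:26:02",
    "2021-04-16 12:26:02", "2021-04-12 19:36:02", "2021-04-12 18:20:02",
    "2021-04-12 23:26:02", "2021-04-16 21:33:01", "2021-04-18 20:46:02",
    "2021-04-04 13:19:12", "2021-04-16 12:26:02", "2021-04-05 15:26:02",
    "2021-04-05 19:26:02",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO session (created_at) VALUES (?)", [(ts,) for ts in timestamps]
)

query = """
WITH days AS (
    -- every entry counts once on its own day ...
    SELECT date(created_at) AS day, 1 AS cnt FROM session
    UNION ALL
    -- ... and adds a zero-count placeholder for the following day
    SELECT date(created_at, '+1 day'), 0 FROM session
),
daily AS (
    SELECT day, SUM(cnt) AS total FROM days GROUP BY day
)
SELECT day, total - LAG(total, 1, 0) OVER (ORDER BY day) AS delta
FROM daily
ORDER BY day
"""
deltas = conn.execute(query).fetchall()
for row in deltas:
    print(row)
```

LAG looks at the previous row rather than the previous calendar day. That is safe here because every day with entries also produces a row for the day after it, so the row before any given day is either the day before or a zero-count placeholder.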

Select change column value if in list

I am trying to query my table to count the number of votes. If the voting method is in the list ['C', 'M', 'S', 'L', 'T', 'V', 'B', 'E'], those votes should be counted together as one group and the voting_method reported as 'L'.
Right now I have the following query which returns the right results but doesn't take care of the duplicates.
select `election_lbl`, `voting_method`, count(*) as numVotes
from `gen2014` group by `election_lbl`, `voting_method` order by `election_lbl` asc
election_lbl voting_method numVotes
2014-09-04 M 1
2014-09-05 M 2
2014-09-05 S 1
2014-09-08 C 16
2014-09-08 M 5
2014-09-08 S 9
2014-09-09 10 5
2014-09-09 C 46
2014-09-09 M 4
2014-09-09 S 5
2014-09-10 C 92
2014-09-10 M 3
2014-09-10 S 7
2014-09-11 C 96
2014-09-11 M 3
2014-09-11 S 2
2014-09-12 C 104
2014-09-12 M 10
2014-09-12 S 3
2014-09-15 C 243
2014-09-15 M 18
2014-09-15 S 3
2014-09-16 10 1
2014-09-16 C 161
2014-09-16 M 4
2014-09-16 S 3
2014-09-17 C 157
2014-09-17 M 5
2014-09-17 S 12
You can see that for 2014-09-05 I have two voting_method values, M and S, both of which are in the list. Ideally the result would merge the rows for the same date when the values are in the list, so it would be 2014-09-05 'L' 3. I don't want the votes for that date to disappear, so the results should count them as one.
Changed the query to this but mysql says wrong syntax.
select `election_lbl`, `voting_method`, count(*) as numVotes from `gen2014`
(case `voting_method` when in ('C', 'M', 'S', 'L', 'T', 'V', 'B', 'E')
then 'L' END) group by `election_lbl`, `voting_method` order by `election_lbl` asc
Table Schema
CREATE TABLE `gen2014` (
`voting_method` varchar(255) DEFAULT NULL,
`election_lbl` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
SELECT election_lbl
, CASE WHEN voting_method IN ('C','M','S','L','T','V','B','E')
THEN 'L'
ELSE voting_method END AS my_voting_method
, COUNT(*) AS numVotes
FROM gen2014
GROUP
BY my_voting_method -- or vice
, election_lbl; -- versa
If you just want the total votes using those methods for each date, listed as method 'L', then leave the method out of the GROUP BY and select the literal 'L' as voting_method:
select `election_lbl`, 'L' AS `voting_method`, count(*) as numVotes
from `gen2014`
where voting_method IN ('C', 'M', 'S', 'L', 'T', 'V', 'B', 'E')
group by `election_lbl`
order by `election_lbl` asc
select x.`election_lbl`, x.`voting_method`, count(*) as numVotes
from (
select `election_lbl`,
CASE when `voting_method` in ('C', 'M', 'S', 'L', 'T', 'V', 'B', 'E')
then 'L'
else `voting_method`
END as `voting_method`
from `gen2014`) x
group by x.`election_lbl`, x.`voting_method`
order by x.`election_lbl` asc
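The CASE-based remapping used in these answers can be tried out with the sketch below. It assumes SQLite through Python's standard-library sqlite3 module rather than MySQL, and the five inserted rows are hypothetical sample data chosen to mirror the 2014-09-05 example from the question (two M votes and one S vote) plus one method outside the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gen2014 (voting_method TEXT, election_lbl TEXT)")

# Hypothetical sample rows, not the real table contents.
votes = [
    ("M", "2014-09-05"),
    ("M", "2014-09-05"),
    ("S", "2014-09-05"),
    ("10", "2014-09-09"),  # method outside the list, kept as-is
    ("C", "2014-09-09"),
]
conn.executemany("INSERT INTO gen2014 VALUES (?, ?)", votes)

query = """
SELECT election_lbl,
       CASE WHEN voting_method IN ('C', 'M', 'S', 'L', 'T', 'V', 'B', 'E')
            THEN 'L'
            ELSE voting_method
       END AS method,
       COUNT(*) AS numVotes
FROM gen2014
GROUP BY election_lbl, method
ORDER BY election_lbl, method
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

Grouping by the remapped value instead of the raw column is what merges M and S into a single 'L' row per date, while methods outside the list keep their own rows.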

Joining Summed data that has nulls - SQL Server

How do I retain the Acct_Name field where appropriate when summing the data below by the Amount column and grouping by the Line_Num field? The "Null" values in Line_Num column cause a problem in the grouping terms when the account name is added. Accounts C and D both have Null values in Line_Num. If I add Acct_Name to the group by clause, I lose the ability to sum the values only by the Line_Num field.
I am attempting to sum lines of accounting and group based on the line number. The null data isn't my doing, unfortunately it's just the data set I was handed.
Original data:
Acct_Name ID Line_Num Amount
Acct A 1 1_01 100.0000
Acct A 1 1_01 -50.0000
Acct A 1 1_02 75.0000
Acct A 1 _02 125.0000
Acct B 2 2_01 200.0000
Acct B 2 2_01 50.0000
Acct B 2 2_02 25.0000
Acct C 3 3_01 75.0000
Acct C 3 3_02 50.0000
Acct C 3 3_03 -25.0000
Acct C 3 Null 65.0000
Acct D 4 Null 300.0000
Acct D 4 _02 100.0000
Acct D 4 Null -50.0000
Acct D 4 Null 75.0000
If the Line_Num value is null, that line is allowed to be aggregated with the other null values. It will show up in reports as being unaccounted for and it can be dealt with appropriately.
Ideal processed data set:
Amount Line_Num Acct_Name
390.00 Null Null
225.00 _02 Null
50.00 1_01 Acct A
75.00 1_02 Acct A
250.00 2_01 Acct B
25.00 2_02 Acct B
75.00 3_01 Acct C
50.00 3_02 Acct C
-25.00 3_03 Acct C
Here are the queries I have tried:
Select SUM(Amount), Line_Num
FROM dbo.tblRawData
Group By Line_Num
This query works just fine, but it does not include the account name in any of the aggregated fields. I need the account name in the fields that did not contain null values.
Select SUM(Amount), Line_Num, Acct_Name
FROM dbo.tblRawData
Group By Line_Num, Acct_Name
This query includes the account name, but it ends up grouping based on Account Name and not just Line_Num.
Select *
From dbo.tblRawData a
Inner Join dbo.tblRawData b On (a.Line_Num = b.Line_Num)
(SELECT SUM(CAST(Amount as money)) as Amount, Line_Num
FROM dbo.tblRawData
GROUP BY Line_Num)
This inner join is intended to join only those lines that are equivalent on the Line Num, but I am receiving a cartesian result set. Clearly I have not written this join correctly or I am using the incorrect command.
Here is the query that can be used to build the same schema that I am using:
CREATE TABLE [dbo].[tblRawData](
[Acct_Name] [nvarchar](50) NULL,
[ID] [nvarchar](50) NULL,
[Line_Num] [nvarchar] (50),
[Amount] [money]
) ON [PRIMARY]
GO
insert into dbo.tblRawData values ('Acct A', '1', '1_01', '100')
insert into dbo.tblRawData values ('Acct A', '1', '1_01', '-50')
insert into dbo.tblRawData values ('Acct A', '1', '1_02', '75')
insert into dbo.tblRawData values ('Acct A', '1', '_02', '125')
insert into dbo.tblRawData values ('Acct B', '2', '2_01', '200')
insert into dbo.tblRawData values ('Acct B', '2', '2_01', '50')
insert into dbo.tblRawData values ('Acct B', '2', '2_02', '25')
insert into dbo.tblRawData values ('Acct C', '3', '3_01', '75')
insert into dbo.tblRawData values ('Acct C', '3', '3_02', '50')
insert into dbo.tblRawData values ('Acct C', '3', '3_03', '-25')
insert into dbo.tblRawData values ('Acct C', '3', '', '65')
insert into dbo.tblRawData values ('Acct D', '4', '', '300')
insert into dbo.tblRawData values ('Acct D', '4', '_02', '100')
insert into dbo.tblRawData values ('Acct D', '4', '', '-50')
insert into dbo.tblRawData values ('Acct D', '4', '', '75')
P.S. SQL Fiddle appears to be inaccessible at the moment (might be on my end, don't know)
Edit
Take a look at the following code and holler if there seem to be blatant flaws in how it tries to accomplish my goal. I'd prefer for Acct_Name to remain null if Line_Num doesn't match up, but perhaps I can sort that out.
IF (SELECT object_id('TempDB..#temp4')) IS NOT NULL
BEGIN
DROP TABLE #temp4
END
SELECT SUM(CAST(Amount as money)) as Amount, Line_Num INTO #temp4
FROM dbo.tblRawData
GROUP BY Line_Num
Select * from #temp4
Select MAX(a.Acct_Name) as Acct_Name, MAX(b.Line_Num) as Line_Num, MAX(b.Amount) as Amount
From dbo.tblRawData a
Inner Join #temp4 b On (a.Line_Num = b.Line_Num)
Group By b.Line_Num
Results:
Acct_Name Line_Num Amount
Acct D Null 390.00
Acct D _02 225.00
Acct A 1_01 50.00
Acct A 1_02 75.00
Acct B 2_01 250.00
Acct B 2_02 25.00
Acct C 3_01 75.00
Acct C 3_02 50.00
Acct C 3_03 -25.00
Here you go:
;WITH CTE AS
(
SELECT Line_Num,
SUM(Amount) Amount,
MIN(Acct_Name) MinAcct_Name,
MAX(Acct_Name) MaxAcct_Name
FROM tblRawData
GROUP BY Line_Num
)
SELECT Amount,
Line_Num,
CASE WHEN Line_Num IS NULL
OR MinAcct_Name <> MaxAcct_Name THEN NULL
ELSE MinAcct_Name END Acct_Name
FROM CTE
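The MIN/MAX comparison in this answer can be checked with the sketch below. It assumes SQLite through Python's standard-library sqlite3 module instead of SQL Server, and it stores the missing Line_Num values as real NULLs (Python None), as shown in the question's data table, rather than as the empty strings used in the INSERT script.

```python
import sqlite3

# (Acct_Name, Line_Num, Amount) rows from the question; None stands for Null.
raw = [
    ("Acct A", "1_01", 100), ("Acct A", "1_01", -50), ("Acct A", "1_02", 75),
    ("Acct A", "_02", 125), ("Acct B", "2_01", 200), ("Acct B", "2_01", 50),
    ("Acct B", "2_02", 25), ("Acct C", "3_01", 75), ("Acct C", "3_02", 50),
    ("Acct C", "3_03", -25), ("Acct C", None, 65), ("Acct D", None, 300),
    ("Acct D", "_02", 100), ("Acct D", None, -50), ("Acct D", None, 75),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblRawData (Acct_Name TEXT, Line_Num TEXT, Amount REAL)")
conn.executemany("INSERT INTO tblRawData VALUES (?, ?, ?)", raw)

query = """
WITH grouped AS (
    SELECT Line_Num,
           SUM(Amount) AS Amount,
           MIN(Acct_Name) AS MinAcct_Name,
           MAX(Acct_Name) AS MaxAcct_Name
    FROM tblRawData
    GROUP BY Line_Num
)
SELECT Amount, Line_Num,
       -- keep the name only when the whole group shares a single account
       CASE WHEN Line_Num IS NULL OR MinAcct_Name <> MaxAcct_Name
            THEN NULL
            ELSE MinAcct_Name
       END AS Acct_Name
FROM grouped
"""
totals = {line: (amount, acct) for amount, line, acct in conn.execute(query)}
for line, (amount, acct) in totals.items():
    print(line, amount, acct)
```

When MIN and MAX of Acct_Name differ, the group mixes accounts, which is the case for '_02' (Acct A and Acct D), so the name is blanked there just as it is for the Null group.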