Recursive CTE to traverse a hierarchy with dual descendant IDs - mysql

Given the following table of sports matches with two players:
match_id match_date p1_id p2_id
1 01/01/2022 1 2
2 02/01/2022 3 1
3 03/01/2022 3 4
4 04/01/2022 2 3
5 05/01/2022 5 6
6 06/01/2022 1 2
7 07/01/2022 3 1
8 08/01/2022 3 4
9 09/01/2022 2 3
10 10/01/2022 5 6
11 11/01/2022 3 4
12 12/01/2022 7 8
13 13/01/2022 3 1
14 14/01/2022 5 7
15 15/01/2022 4 5
I’m trying to write a recursive CTE that, given a match_id, returns the match_id values of all future matches for each of the two players. Recursion is needed because the query must also include all future matches of every player who appears in those future matches.
Using the example above with match_id = 6, the two player IDs are 1 and 2. I need the query to return all future matches for these player IDs, which means it needs to return 7, 9 and 13. However, in match_id = 7 player ID 1 plays player ID 3, so all of player 3's future match_id values from that point also need to be included. This means the query also needs to return 8 and 11. In match_id = 8 and match_id = 11 player ID 3 plays player ID 4, so the final match_id to be returned is 15.
The expected output is as follows:
match_id
7
8
9
11
13
15
I've written the following query:
WITH RECURSIVE match_ids AS (
SELECT
m1.match_id,
m1.match_date,
m1.p1_id,
m1.p2_id
FROM recursive_test AS m1
WHERE m1.match_id = 6
UNION ALL
SELECT
m2.match_id,
m2.match_date,
m2.p1_id,
m2.p2_id
FROM recursive_test AS m2
INNER JOIN match_ids
ON (
match_ids.p1_id = m2.p1_id
OR match_ids.p1_id = m2.p2_id
OR match_ids.p2_id = m2.p1_id
OR match_ids.p2_id = m2.p2_id
)
AND match_ids.match_date > m2.match_date
)
SELECT match_id
FROM match_ids
However, this returns:
match_id
6
2
4
1
1
2
3
1
2
1
Where might I be going wrong?
Here's the SQL to create the table:
CREATE TABLE `recursive_test` (
`match_id` int NOT NULL,
`match_date` date NOT NULL,
`p1_id` int NOT NULL,
`p2_id` int NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `recursive_test` VALUES (1,'2022-01-01',1,2),(2,'2022-01-02',3,1),(3,'2022-01-03',3,4),(4,'2022-01-04',2,3),(5,'2022-01-05',5,6),(6,'2022-01-06',1,2),(7,'2022-01-07',3,1),(8,'2022-01-08',3,4),(9,'2022-01-09',2,3),(10,'2022-01-10',5,6),(11,'2022-01-11',3,4),(12,'2022-01-12',7,8),(13,'2022-01-13',3,1),(14,'2022-01-14',5,7),(15,'2022-01-15',4,5);

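The date comparison in your recursive member is reversed: match_ids.match_date > m2.match_date selects earlier matches, so the query walks backwards in time. Compare in the other direction (set @starting_match_id to the match you want to start from):
WITH RECURSIVE
cte AS (
    -- anchor: the starting match
    SELECT *
    FROM recursive_test
    WHERE match_id = @starting_match_id
    UNION ALL
    -- recursive step: later matches that share a player with a match already found
    SELECT recursive_test.*
    FROM recursive_test
    JOIN cte ON recursive_test.match_date > cte.match_date
    WHERE recursive_test.p1_id IN (cte.p1_id, cte.p2_id)
       OR recursive_test.p2_id IN (cte.p1_id, cte.p2_id)
)
SELECT DISTINCT *
FROM cte;
Note that the result includes the starting match itself.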
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=6ca1e57845bae995bacb04455beb6340
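As a sanity check, the recursive traversal can be run against the sample rows. The sketch below is not the MySQL query itself: it assumes Python's built-in sqlite3 module as a stand-in engine, a hypothetical helper name future_match_ids, UNION instead of UNION ALL to drop repeated rows, and an extra filter that excludes the starting match.

```python
import sqlite3

# Sample rows from the question's INSERT statement.
MATCHES = [
    (1, "2022-01-01", 1, 2), (2, "2022-01-02", 3, 1), (3, "2022-01-03", 3, 4),
    (4, "2022-01-04", 2, 3), (5, "2022-01-05", 5, 6), (6, "2022-01-06", 1, 2),
    (7, "2022-01-07", 3, 1), (8, "2022-01-08", 3, 4), (9, "2022-01-09", 2, 3),
    (10, "2022-01-10", 5, 6), (11, "2022-01-11", 3, 4), (12, "2022-01-12", 7, 8),
    (13, "2022-01-13", 3, 1), (14, "2022-01-14", 5, 7), (15, "2022-01-15", 4, 5),
]

QUERY = """
WITH RECURSIVE cte AS (
    SELECT match_id, match_date, p1_id, p2_id
    FROM recursive_test
    WHERE match_id = :start
    UNION  -- assumption: UNION rather than UNION ALL, so repeated rows are dropped
    SELECT m.match_id, m.match_date, m.p1_id, m.p2_id
    FROM recursive_test AS m
    JOIN cte ON m.match_date > cte.match_date
    WHERE m.p1_id IN (cte.p1_id, cte.p2_id)
       OR m.p2_id IN (cte.p1_id, cte.p2_id)
)
SELECT match_id
FROM cte
WHERE match_id <> :start  -- leave out the starting match itself
ORDER BY match_id
"""

def future_match_ids(start_match_id):
    """Return the ids of all later matches reachable from the starting match."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recursive_test ("
        "match_id INTEGER PRIMARY KEY, match_date TEXT NOT NULL, "
        "p1_id INTEGER NOT NULL, p2_id INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO recursive_test VALUES (?, ?, ?, ?)", MATCHES)
    rows = conn.execute(QUERY, {"start": start_match_id}).fetchall()
    conn.close()
    return [row[0] for row in rows]

print(future_match_ids(6))  # prints [7, 8, 9, 11, 13, 15]
```

The recursion terminates because every step only moves to strictly later dates.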

Related

SQL - Max value from a group by when creating a new field

I have a database with a table called BOOKINGS containing the following values
main-id place-id begin-date end-date
1 1 2018-8-1 2018-8-8
2 2 2018-6-6 2018-6-9
3 3 2018-5-5 2018-5-8
4 4 2018-4-4 2018-4-5
5 5 2018-3-3 2018-3-10
5 1 2018-1-1 2018-1-6
4 2 2018-2-1 2018-2-10
3 3 2018-3-1 2018-3-28
2 4 2018-4-1 2018-4-6
1 5 2018-5-1 2018-5-15
1 3 2018-6-1 2018-8-8
1 4 2018-7-1 2018-7-6
1 1 2018-8-1 2018-8-18
1 2 2018-9-1 2018-9-3
1 5 2018-10-1 2018-10-6
2 5 2018-11-1 2018-11-5
2 3 2018-12-1 2018-12-25
2 2 2018-2-2 2018-2-19
2 4 2018-4-4 2018-4-9
2 1 2018-5-5 2018-5-23
For each main-id, I need to find the place-id with the largest total number of days. Basically, I need to determine where each main-id has spent the most time.
This information must then be put into a view, so unfortunately I can't use temporary tables.
The query that gets me the closest is
CREATE VIEW `MOSTTIME` (`main-id`,`place-id`,`total`) AS
SELECT `BOOKINGS`.`main-id`, `BOOKINGS`.`place-id`, SUM(DATEDIFF(`end-date`, `begin-date`)) AS `total`
FROM `BOOKINGS`
GROUP BY `BOOKINGS`.`main-id`,`BOOKINGS`.`place-id`
Which yields:
main-id place-id total
1 1 24
1 2 18
1 5 5
2 1 2
2 2 20
2 4 9
3 1 68
3 2 24
3 3 30
4 1 5
4 2 10
4 4 1
5 1 19
5 2 4
5 5 7
What I need is then the max total for each distinct main-id:
main-id place-id total
1 1 24
2 2 20
3 1 68
4 2 10
5 1 19
I've dug through a large amount of similar posts that recommend things like self joins; however, due to the fact that I have to create the new field total using an aggregate function (SUM) and another function (DATEDIFF) rather than just querying an existing field, my attempts at implementing those solutions have been unsuccessful.
I am hoping that my query that got me close will only require a small modification to get the correct solution.
Using the hyphen character - in column names is a really bad idea, because it is also the minus operator and forces you to quote the names with backticks everywhere. Consider replacing it with the underscore character _.
One possible way is to use derived tables. One derived table computes the total for each (main-id, place-id) group. Another derived table takes the maximum of those totals per main-id. Joining the two back together keeps only the row that matches the maximum value.
CREATE VIEW `MOSTTIME` (`main-id`,`place-id`,`total`) AS
SELECT b1.main_id, b1.place_id, b1.total
FROM
(
SELECT `main-id` AS main_id,
`place-id` AS place_id,
SUM(DATEDIFF(`end-date`, `begin-date`)) AS total
FROM BOOKINGS
GROUP BY main_id, place_id
) AS b1
JOIN
(
SELECT dt.main_id, MAX(dt.total) AS max_total
FROM
(
SELECT `main-id` AS main_id,
`place-id` AS place_id,
SUM(DATEDIFF(`end-date`, `begin-date`)) AS total
FROM BOOKINGS
GROUP BY main_id, place_id
) AS dt
GROUP BY dt.main_id
) AS b2
ON b1.main_id = b2.main_id AND
b1.total = b2.max_total
In MySQL 8+, you can use the ROW_NUMBER() window function instead:
CREATE VIEW `MOSTTIME` (`main-id`,`place-id`,`total`) AS
SELECT b.main_id, b.place_id, b.total
FROM
(
    -- rank each place within its main_id, largest total first
    SELECT dt.main_id,
           dt.place_id,
           dt.total,
           ROW_NUMBER() OVER (PARTITION BY dt.main_id
                              ORDER BY dt.total DESC) AS row_num
    FROM
    (
        SELECT `main-id` AS main_id,
               `place-id` AS place_id,
               SUM(DATEDIFF(`end-date`, `begin-date`)) AS total
        FROM BOOKINGS
        GROUP BY main_id, place_id
    ) AS dt
) AS b
WHERE b.row_num = 1
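The ranking step can be tried in isolation on the per-place totals listed in the question. This is only a sketch under assumptions: it uses Python's sqlite3 module (SQLite 3.25 or later for window functions) rather than MySQL, and it loads the already aggregated totals into a hypothetical table named totals instead of computing them with DATEDIFF.

```python
import sqlite3

# (main_id, place_id, total) rows copied from the question's intermediate result.
TOTALS = [
    (1, 1, 24), (1, 2, 18), (1, 5, 5),
    (2, 1, 2), (2, 2, 20), (2, 4, 9),
    (3, 1, 68), (3, 2, 24), (3, 3, 30),
    (4, 1, 5), (4, 2, 10), (4, 4, 1),
    (5, 1, 19), (5, 2, 4), (5, 5, 7),
]

def top_place_per_main_id():
    """Keep, for each main_id, the row with the largest total."""
    conn = sqlite3.connect(":memory:")
    # 'totals' is a hypothetical table holding the pre-aggregated sums.
    conn.execute(
        "CREATE TABLE totals (main_id INTEGER, place_id INTEGER, total INTEGER)"
    )
    conn.executemany("INSERT INTO totals VALUES (?, ?, ?)", TOTALS)
    rows = conn.execute(
        """
        SELECT main_id, place_id, total
        FROM (
            SELECT main_id, place_id, total,
                   ROW_NUMBER() OVER (PARTITION BY main_id
                                      ORDER BY total DESC) AS row_num
            FROM totals
        ) AS ranked
        WHERE row_num = 1
        ORDER BY main_id
        """
    ).fetchall()
    conn.close()
    return rows

print(top_place_per_main_id())
```

If two places tie for the largest total, ROW_NUMBER() keeps only one of them; the join on the maximum value would return both.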

group by month returns only April for two tables

I am honestly at a loss as to what I am doing wrong. It is a rather simple query, I think.
Tables:
operations:
id processedon clientid
1 2018-01-01 9
2 2018-03-16 9
3 2018-04-21 9
4 2018-04-20 9
5 2018-05-09 9
items:
id operation_id quantity unitprice
1 1 10 2
2 1 5 3
3 2 20 4
4 3 10 2
5 4 8 4
6 4 10 4
7 5 2 2
The expected result of the operation/query is:
month total_value
1 35
3 80
4 92
5 4
That is quantity * unitprice based. For some reason, it only returns month=4
SELECT
month(`operations`.`processedon`) AS `month`,
SUM((`items`.`quantity` * `items`.`unitprice`)) AS `total_value`
FROM `items`
INNER JOIN `operations` ON (`items`.`operation_id` = `operations`.`id`)
GROUP BY 'month'
ORDER BY 'month'
According to the info provided, the join should be
INNER JOIN operations ON items.operation_id = operations.id
For example:
SELECT
month(`operations`.`processedon`) AS `month`,
SUM((`items`.`quantity` * `items`.`unitprice`)) AS `total_value`
FROM `items`
INNER JOIN `operations` ON `items`.`operation_id` = `operations`.`id`
GROUP BY month(`operations`.`processedon`)
ORDER BY `month`
There is no efficiency gain from using a column alias in the GROUP BY clause. I prefer to avoid aliases there, except perhaps in the ORDER BY clause.
The following query will give you the required answer
SELECT
month(`operations`.`processedon`) AS `month`,
SUM((`items`.`quantity` * `items`.`unitprice`)) AS `total_value`
FROM items
INNER JOIN operations ON (items.operation_id = operations.id)
GROUP BY month(operations.processedon)
ORDER BY month(operations.processedon)
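In your query, 'month' in single quotes is a string literal, not the column alias, so every row falls into the same group. Since month is not an existing column, group by the expression itself (or by the alias quoted with backticks).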
You'll get the following result
month total_value
1 35
3 80
4 92
5 4
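The difference between grouping by a quoted string and grouping by the month expression can be reproduced on the sample rows. The sketch below makes several assumptions: it runs on Python's sqlite3 module rather than MySQL, it uses strftime('%m', ...) as a stand-in for MONTH(), and the helper name run is made up for illustration.

```python
import sqlite3

# Sample rows from the question.
OPERATIONS = [(1, "2018-01-01", 9), (2, "2018-03-16", 9), (3, "2018-04-21", 9),
              (4, "2018-04-20", 9), (5, "2018-05-09", 9)]
ITEMS = [(1, 1, 10, 2), (2, 1, 5, 3), (3, 2, 20, 4), (4, 3, 10, 2),
         (5, 4, 8, 4), (6, 4, 10, 4), (7, 5, 2, 2)]

# Assumption: SQLite has no MONTH(), so strftime('%m', ...) stands in for it.
MONTH_EXPR = "CAST(strftime('%m', o.processedon) AS INTEGER)"

SELECT_PART = f"""
SELECT {MONTH_EXPR} AS month,
       SUM(i.quantity * i.unitprice) AS total_value
FROM items AS i
INNER JOIN operations AS o ON i.operation_id = o.id
"""

def run(group_by):
    """Run the aggregate with the given GROUP BY term and return all rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE operations (id INTEGER PRIMARY KEY, processedon TEXT, clientid INTEGER)"
    )
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, operation_id INTEGER, "
        "quantity INTEGER, unitprice INTEGER)"
    )
    conn.executemany("INSERT INTO operations VALUES (?, ?, ?)", OPERATIONS)
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", ITEMS)
    rows = conn.execute(
        SELECT_PART + f"GROUP BY {group_by} ORDER BY month"
    ).fetchall()
    conn.close()
    return rows

by_literal = run("'month'")       # constant string: every row lands in one group
by_expression = run(MONTH_EXPR)   # real month value: one group per month

print(len(by_literal), by_expression)
```

With the sample data, by_expression holds the four monthly rows from the question, while by_literal collapses to a single row whose total is the sum over all items.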

MySQL multiple count based on two column with multiple GROUP BY in single table

I have the query below. It works fine but is not optimized: it takes 1.5 seconds to run. How can I optimize it?
select h.keyword_id,
( select count(DISTINCT(user_id)) from history where category_id = 6
and h.keyword_id=keyword_id group by keyword_id ) as cat_6,
( select count(DISTINCT(user_id)) from history where category_id = 7
and h.keyword_id = keyword_id group by keyword_id ) as cat_7
from
history h group by h.keyword_id
History table
his_id keyword_id category_id user_id
1 1 6 12
2 1 6 12
3 1 7 12
4 1 7 12
5 2 6 13
6 2 6 13
7 2 7 13
8 3 6 13
Result:
keyword_id cat_6 cat_7
1 2 2 (unique users)
2 2 1
3 1 0
You can rewrite your query like this:
select h.keyword_id,
count(distinct if(category_id = 6, user_id, null)) as cat_6,
count(distinct if(category_id = 7, user_id, null)) as cat_7
from
history h
group by h.keyword_id
By the way, your desired result does not match the sample data: within each keyword_id there is always just one distinct user_id.
For more optimization, you'd have to post the result of show create table history and the output of explain <your_query>;
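The single-pass rewrite can be checked against the sample history rows. This sketch assumes Python's sqlite3 module instead of MySQL and swaps MySQL's IF() for a portable CASE expression; the helper name distinct_users_per_category is hypothetical.

```python
import sqlite3

# (his_id, keyword_id, category_id, user_id) rows from the question.
HISTORY = [
    (1, 1, 6, 12), (2, 1, 6, 12), (3, 1, 7, 12), (4, 1, 7, 12),
    (5, 2, 6, 13), (6, 2, 6, 13), (7, 2, 7, 13), (8, 3, 6, 13),
]

def distinct_users_per_category():
    """Count distinct users per keyword for categories 6 and 7 in one scan."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (his_id INTEGER PRIMARY KEY, keyword_id INTEGER, "
        "category_id INTEGER, user_id INTEGER)"
    )
    conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?)", HISTORY)
    rows = conn.execute(
        """
        SELECT keyword_id,
               -- CASE yields NULL for other categories, and COUNT ignores NULLs
               COUNT(DISTINCT CASE WHEN category_id = 6 THEN user_id END) AS cat_6,
               COUNT(DISTINCT CASE WHEN category_id = 7 THEN user_id END) AS cat_7
        FROM history
        GROUP BY keyword_id
        ORDER BY keyword_id
        """
    ).fetchall()
    conn.close()
    return rows

print(distinct_users_per_category())  # prints [(1, 1, 1), (2, 1, 1), (3, 1, 0)]
```

The output shows one distinct user per keyword and category, which is why the counts differ from the result table in the question.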

Mysql JOIN with extra priority column

I have spent two days trying to write this query with no luck.
I have two tables, 'DEMAND' and 'DEMAND_STATE' (one-to-many relation). The table DEMAND_STATE has millions of entries.
CREATE TABLE DEMAND
(
ID INT NOT NULL,
DESTINY_ID INT NOT NULL
)
CREATE TABLE DEMAND_STATE
(
ID INT NOT NULL,
PRIORITY INT NOT NULL,
QUANTITY DOUBLE NOT NULL,
CASE_ID INT NOT NULL,
DEMAND_ID INT NOT NULL,
PHASE_ID INT NOT NULL
)
The QUANTITY of a DEMAND_STATE is given according to a CASE_ID and PHASE_ID. We have N phases in M cases, always with the same number of phases in every case. There is always an initial base quantity, called the 'BASE CASE', in the case with CASE_ID = 1.
For example, to obtain the quantities for Case (id=2) and the Base Case (id=1):
select D.*, S.PRIORITY, S.QUANTITY, S.CASE_ID, S.DEMAND_ID, S.PHASE_ID
FROM DEMAND D
join DEMAND_STATE S on (D.ID = S.DEMAND_ID)
WHERE (S.CASE_ID = 2 OR S.CASE_ID = 1)
(output shown only for ID = 8)
ID PRIORITY QUANTITY CASE_ID DEMAND_ID PHASE_ID
8 0 85 1 8 1
8 0 83 1 8 2
8 0 88 1 8 3
8 0 89 1 8 4
8 10 85 2 8 1
8 10 84 2 8 2
8 10 86 2 8 3
8 10 89 2 8 4
We need to obtain, for every demand in 'DEMAND', only the quantity for each phase with the MAX priority. The idea is to avoid duplicating DEMAND_STATE data for each newly created case: new state rows are created only when a Demand-Case-Phase differs from the Base Case. This is a new project, and we accept changes to the model for better performance.
I also tried the MAX calculation. This query over DEMAND_STATE works fine, but it only returns data for one specific DEMAND_ID. Furthermore, I think this solution could be expensive.
SELECT P.ID, P.QUANTITY, P.CASE_ID, P.DEMAND_ID, P.PHASE_ID
FROM DEMAND_STATE P
JOIN (
SELECT PHASE_ID, MAX(PRIORITY) max_priority, S.DEMAND_ID
from DEMAND_STATE S
WHERE S.DEMAND_ID = 1
AND (S.CASE_ID=1 OR S.CASE_ID=2)
GROUP BY S.PHASE_ID
) SUB
ON (SUB.PHASE_ID = P.PHASE_ID AND SUB.max_priority = P.PRIORITY)
WHERE P.DEMAND_ID = 1
GROUP BY P.PHASE_ID
The result:
ID QUANTITY CASE_ID DEMAND_ID PHASE_ID
1 86 1 1 1
2 85 1 1 2
3 81 1 1 3
8 500 2 1 4
This is the result expected:
ID ID PRIORITY QUANTITY CASE_ID PHASE_ID
8 1 0 86 1 1 (data from Case Base id=1 priority 0)
8 2 10 85 1 2 (data from Case Base id=1 priority 0)
8 3 10 81 1 3 (data from Case Base id=1 priority 0)
8 64 10 500 2 4 (data from Case id=2 priority 10)
Thanks for the help :)
Edit:
Result of Simon proposal:
ID QUANTITY CASE_ID DEMAND_ID PHASE_ID
1 86 1 1 1
2 85 1 1 2
3 81 1 1 3
4 84 1 1 4 (this row shouldn't exist)
8 500 2 1 4 (this is the correct row)
Also would have to join it with DEMAND
@didierc response:
ID ID MAX(S.PRIORITY) QUANTITY CASE_ID PHASE_ID
1 8 10 500 2 4
2 13 10 81 2 1
2 14 10 83 2 2
2 15 10 84 2 3
3 21 10 81 2 1
4 31 10 86 2 3
4 32 10 80 2 4
4 29 10 85 2 1
4 30 10 81 2 2
We need four rows with the quantity value for each DEMAND. In the Base Case we have four quantities, and in Case 2 we only change the quantity for phase 4. We always need four rows for each demand.
Database DEMAND_STATE data:
ID PRIORITY QUANTITY CASE_ID DEMAND_ID PHASE_ID
1 0 86 1 1 1
2 0 85 1 1 2
3 0 81 1 1 3
4 0 84 1 1 4
8 10 500 2 1 4
We need to obtain for all Demand in 'DEMAND' only the Quantity for Each Phase with MAX priority
I translate the above, according to your sample result set, as:
SELECT
D.ID, S.ID, MAX(S.PRIORITY), S.QUANTITY, S.CASE_ID, S.PHASE_ID
FROM DEMAND D
LEFT JOIN DEMAND_STATE S
ON D.ID = S.DEMAND_ID
GROUP BY S.PHASE_ID, S.DEMAND_ID
Update:
To get the maximum priority for each pair (demand_id, phase_id), we use the following query:
SELECT
DEMAND_ID, PHASE_ID, MAX(PRIORITY) AS PRIORITY
FROM DEMAND_STATE
GROUP BY DEMAND_ID, PHASE_ID
Next, to retrieve the set of phases for a given demand, just make an inner join on demand state:
SELECT S.* FROM DEMAND_STATE S
INNER JOIN (
SELECT
DEMAND_ID, PHASE_ID, MAX(PRIORITY) AS PRIORITY
FROM DEMAND_STATE
GROUP BY DEMAND_ID, PHASE_ID
) S2
USING (DEMAND_ID,PHASE_ID, PRIORITY)
WHERE DEMAND_ID = 1
If you want to limit the possible cases, include a where clause in the query S2:
SELECT S.* FROM DEMAND_STATE S
INNER JOIN (
SELECT
DEMAND_ID, PHASE_ID, MAX(PRIORITY) AS PRIORITY
FROM DEMAND_STATE
WHERE CASE_ID IN (1,2)
GROUP BY DEMAND_ID, PHASE_ID
) S2
USING (DEMAND_ID,PHASE_ID, PRIORITY)
WHERE DEMAND_ID = 1
However, your comments and update indicate that MAX(PRIORITY) is not very relevant after all. My understanding is that you have a base case, which may be overridden by another case in a given scenario (a scenario being the pair base case + some other case). Clarify that point in your question body if this is incorrect. If that is the case, you can change the above query by replacing PRIORITY with CASE_ID:
SELECT S.* FROM DEMAND_STATE S
INNER JOIN (
SELECT
DEMAND_ID, PHASE_ID, MAX(CASE_ID) AS CASE_ID
FROM DEMAND_STATE
WHERE CASE_ID IN (1,2)
GROUP BY DEMAND_ID, PHASE_ID
) S2
USING (DEMAND_ID,PHASE_ID, CASE_ID)
WHERE DEMAND_ID = 1
The only reason I see for having a priority is if you wish to combine more than 2 cases and use the priority to select which case prevails in each phase.
You may of course prepend an inner join on DEMAND to include the related demand data.
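The priority-based join can be checked against the five DEMAND_STATE rows posted in the question. The sketch below rests on assumptions: it runs on Python's sqlite3 module rather than MySQL, declares QUANTITY as REAL in place of DOUBLE, and spells the join condition with ON instead of USING; the helper name max_priority_rows is hypothetical.

```python
import sqlite3

# (ID, PRIORITY, QUANTITY, CASE_ID, DEMAND_ID, PHASE_ID) rows from the question.
DEMAND_STATE = [
    (1, 0, 86, 1, 1, 1),
    (2, 0, 85, 1, 1, 2),
    (3, 0, 81, 1, 1, 3),
    (4, 0, 84, 1, 1, 4),
    (8, 10, 500, 2, 1, 4),
]

def max_priority_rows(demand_id):
    """Return (ID, QUANTITY, CASE_ID, PHASE_ID) of the top-priority row per phase."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DEMAND_STATE (ID INTEGER NOT NULL, PRIORITY INTEGER NOT NULL, "
        "QUANTITY REAL NOT NULL, CASE_ID INTEGER NOT NULL, "
        "DEMAND_ID INTEGER NOT NULL, PHASE_ID INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO DEMAND_STATE VALUES (?, ?, ?, ?, ?, ?)", DEMAND_STATE)
    rows = conn.execute(
        """
        SELECT S.ID, S.QUANTITY, S.CASE_ID, S.PHASE_ID
        FROM DEMAND_STATE AS S
        INNER JOIN (
            -- highest priority per (demand, phase) among the chosen cases
            SELECT DEMAND_ID, PHASE_ID, MAX(PRIORITY) AS PRIORITY
            FROM DEMAND_STATE
            WHERE CASE_ID IN (1, 2)
            GROUP BY DEMAND_ID, PHASE_ID
        ) AS S2
            ON S2.DEMAND_ID = S.DEMAND_ID
           AND S2.PHASE_ID = S.PHASE_ID
           AND S2.PRIORITY = S.PRIORITY
        WHERE S.DEMAND_ID = ?
        ORDER BY S.PHASE_ID
        """,
        (demand_id,),
    ).fetchall()
    conn.close()
    return rows

print(max_priority_rows(1))
```

For demand 1 this yields one row per phase: rows 1, 2 and 3 from the base case, and row 8 (quantity 500, case 2) for phase 4 in place of row 4.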
Use of subqueries should be able to do as you wish, if I understand your question correctly. Something along the lines of the following:
SELECT
P.ID,
P.QUANTITY,
P.CASE_ID,
P.DEMAND_ID,
P.PHASE_ID
FROM DEMAND_STATE P
INNER JOIN (
-- Next level up groups it down and so gets the rows first returned for each PHASE_ID, which is the highest priority due to the subquery
SELECT
D.PHASE_ID,
D.PRIORITY,
D.DEMAND_ID
FROM (
-- Top level query to get all rows and order them in desc priority order
SELECT
S.PHASE_ID,
S.PRIORITY,
S.DEMAND_ID
FROM DEMAND_STATE S
WHERE S.DEMAND_ID IN (1) -- Update this to be whichever DEMAND_IDs you are interested in
AND S.CASE_ID IN (1,2)
ORDER BY
S.PHASE_ID ASC,
S.DEMAND_ID ASC,
S.PRIORITY DESC
) D
GROUP BY
D.PHASE_ID,
D.DEMAND_ID
) SUB
ON SUB.PHASE_ID = P.PHASE_ID
AND SUB.DEMAND_ID = P.DEMAND_ID
The top level subquery exists to get the rows you are interested in and order them in an order which allows predictable results when they are then grouped down by PHASE_ID and DEMAND_ID. This in turn allows a simple INNER JOIN to DEMAND_STATE hopefully (unless I have misunderstood your query)
This may still be expensive though depending on how much data is within that top level query.

How to add duplicate rows using sql sum function

A few days ago, I ran into a problem where I have to sum the values of some duplicate rows in MySQL. I've tried some queries, but they didn't work.
Here is table data :-
card_id tic_id game_id card_symbol card_symbol_no qty
1 6 1 C 6 2
2 6 1 H 7 6
3 6 1 C 6 7
And My desired output is :-
card_id tic_id game_id card_symbol card_symbol_no qty
1 6 1 C 6 (9)
2 6 1 H 7 (6)
One other given:
1.) the "tic_id" and "game_id" are the same.
select
min(card_id) as card_id,
tic_id,
game_id,
card_symbol,
card_symbol_no,
sum(qty) as qty
from
yourTable
group by
tic_id,
game_id,
card_symbol,
card_symbol_no
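The grouping can be confirmed on the three sample rows. This is a sketch that assumes Python's sqlite3 module rather than MySQL and a hypothetical table name cards in place of the placeholder table name.

```python
import sqlite3

# (card_id, tic_id, game_id, card_symbol, card_symbol_no, qty) rows from the question.
CARDS = [
    (1, 6, 1, "C", 6, 2),
    (2, 6, 1, "H", 7, 6),
    (3, 6, 1, "C", 6, 7),
]

def merged_cards():
    """Collapse duplicate cards into one row each, summing their quantities."""
    conn = sqlite3.connect(":memory:")
    # 'cards' is a hypothetical name standing in for the real table.
    conn.execute(
        "CREATE TABLE cards (card_id INTEGER PRIMARY KEY, tic_id INTEGER, "
        "game_id INTEGER, card_symbol TEXT, card_symbol_no INTEGER, qty INTEGER)"
    )
    conn.executemany("INSERT INTO cards VALUES (?, ?, ?, ?, ?, ?)", CARDS)
    rows = conn.execute(
        """
        SELECT MIN(card_id) AS card_id,  -- keep the lowest id of each duplicate set
               tic_id, game_id, card_symbol, card_symbol_no,
               SUM(qty) AS qty
        FROM cards
        GROUP BY tic_id, game_id, card_symbol, card_symbol_no
        ORDER BY card_id
        """
    ).fetchall()
    conn.close()
    return rows

print(merged_cards())  # prints [(1, 6, 1, 'C', 6, 9), (2, 6, 1, 'H', 7, 6)]
```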