I have 3 tables:
users
user_groups
groups
A user can be in multiple (sub)groups. They are stored in the user_groups table like this
+----+---------+----------+
| id | user_id | group_id |
+----+---------+----------+
|  1 |       1 |       23 |
|  2 |       2 |       24 |
+----+---------+----------+
In my groups table, the top-level groups are the rows with parent_id = 0:
+----+-----------+-------------+
| id | parent_id | name        |
+----+-----------+-------------+
|  1 |         2 | Group 1.1   |
|  2 |         0 | Group 1     |
|  3 |         2 | Group 1.2   |
|  4 |         3 | Group 1.2.1 |
|  5 |         2 | Group 1.3   |
+----+-----------+-------------+
Now I want to build a query which gives me all the parent groups for all users. I did some research about recursive queries and I found this particular post:
How to create a MySQL hierarchical recursive query
But I have no idea how I should approach this when I join the tables.
This is what I got so far:
SELECT
`users`.`id`,
`users`.`first_name`,
`users`.`last_name`,
`users`.`email`,
`users`.`language`,
`groups`.`name`,
`groups`.`parent_id`
FROM `users`
LEFT JOIN `user_groups`
ON `user_groups`.`user_id` = `users`.`id`
LEFT JOIN `groups`
ON `groups`.`id` = `user_groups`.`group_id`
WHERE
`users`.`created`
BETWEEN
DATE_SUB(NOW(), INTERVAL 365 DAY) AND NOW()
But this query just gets me the name and the id of the subgroup. What I want is the top level group.
Thanks for the help!
The typical solution is to create a stored function that returns the top-level group for any given group by tracing the parentage up until it finds a row with parent_id = 0.
Then you can apply that function to each group that the user is a member of, and select the distinct set of top level groups.
Something like this should work for you:
delimiter $$
drop function if exists get_top_level_group_id $$
create function get_top_level_group_id (p_group_id int) returns int
reads sql data
begin
  declare v_group_id int;
  declare v_parent_id int;
  -- a group id that has no row in `groups` (broken chain) yields -1
  declare continue handler for not found
  begin
    return -1;
  end;
  set v_group_id = p_group_id;
  set v_parent_id = p_group_id;
  -- walk up the tree until the current group has parent_id = 0
  while v_parent_id != 0
  do
    set v_group_id = v_parent_id;
    select `parent_id`
    into v_parent_id
    from `groups`
    where id = v_group_id;
  end while;
  return v_group_id;
end $$
delimiter ;
Then you can update your query like this to get those users and their distinct top-level groups:
SELECT DISTINCT
`users`.`id`,
`users`.`first_name`,
`users`.`last_name`,
`users`.`email`,
`users`.`language`,
get_top_level_group_id(`user_groups`.`group_id`) as top_level_group_id
FROM `users`
LEFT JOIN `user_groups`
ON `user_groups`.`user_id` = `users`.`id`
WHERE
`users`.`created`
BETWEEN
DATE_SUB(NOW(), INTERVAL 365 DAY) AND NOW()
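If a recursive common table expression is available (MySQL 8 or above), the same walk up the tree can probably be written without a stored function. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own, the groups rows come from the question, and the user_groups memberships are hypothetical (the group ids 23 and 24 shown in the question have no matching rows in the sample groups table).

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8+; WITH RECURSIVE is written the same way.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "groups" (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
INSERT INTO "groups" VALUES
  (1, 2, 'Group 1.1'), (2, 0, 'Group 1'), (3, 2, 'Group 1.2'),
  (4, 3, 'Group 1.2.1'), (5, 2, 'Group 1.3');
-- Hypothetical memberships (not from the question):
-- user 1 is in 'Group 1.2.1', user 2 is in 'Group 1.3' and 'Group 1'.
CREATE TABLE user_groups (id INTEGER PRIMARY KEY, user_id INTEGER, group_id INTEGER);
INSERT INTO user_groups VALUES (1, 1, 4), (2, 2, 5), (3, 2, 2);
""")

TOP_LEVEL_SQL = """
WITH RECURSIVE ancestry (start_id, id, parent_id) AS (
  -- anchor: every group is the starting point of its own chain
  SELECT id, id, parent_id FROM "groups"
  UNION ALL
  -- step: move one level up for as long as a parent row exists
  SELECT a.start_id, g.id, g.parent_id
  FROM ancestry AS a
  JOIN "groups" AS g ON g.id = a.parent_id
)
SELECT DISTINCT ug.user_id, a.id AS top_level_group_id
FROM user_groups AS ug
JOIN ancestry AS a ON a.start_id = ug.group_id
WHERE a.parent_id = 0          -- keep only the top of each chain
ORDER BY ug.user_id
"""

top_level_rows = conn.execute(TOP_LEVEL_SQL).fetchall()
print(top_level_rows)
```

In MySQL the table name would be quoted with backticks instead of double quotes, and the result can be joined to users in the same way as in the query above.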
The problem:
I want to move the links of the categories from the table companies_1 into the company_categories table. The company_id in the company_categories table needs to be equal to the id of the companies_2 table. The records of the companies_1 and the companies_2 table are linked by the "name"-column.
The code below ran for more than a night and still had not finished! I want to learn to be more efficient and speed this process up. I feel there is a lot to optimize, because there are A LOT of company records.
Another issue was that I found no way to see how far the query had got while looping (so there was no way to check the progress). Because the process took so long, I killed the query, and I'm now looking for a better way to solve this.
The information:
There is a table with companies like:
----------------------------------------
| companies_1 |
----------------------------------------
| id | category_id | name |
----------------------------------------
| 1 | 1 | example-1 |
| 2 | 2 | example-1 |
| 3 | 1 | example-2 |
| 4 | 2 | example-2 |
| 5 | 3 | example-2 |
| 6 | 1 | example-3 |
----------------------------------------
A table with the DISTINCT company names:
-------------------------
| companies_2 |
-------------------------
| id | name |
-------------------------
| 1 | example-1 |
| 2 | example-2 |
| 3 | example-3 |
-------------------------
A categories table, like:
-------------------------
| categories |
-------------------------
| id | name |
-------------------------
And a junction table, like:
---------------------------------
| company_categories |
---------------------------------
| company_id | category_id |
---------------------------------
The current code:
This code works, but is far from efficient.
DELIMITER $$
DROP PROCEDURE IF EXISTS fill_junc_table$$
CREATE PROCEDURE fill_junc_table()
BEGIN
DECLARE r INT;
DECLARE i INT;
DECLARE i2 INT;
DECLARE loop_length INT;
DECLARE company_old_len INT;
DECLARE _href VARCHAR(255);
DECLARE cat_id INT;
DECLARE comp_id INT;
SET r = 0;
SET i = 0;
SET company_old_len = 0;
SELECT COUNT(*) INTO loop_length FROM companies;
WHILE i < loop_length DO
SELECT href INTO _href FROM company_old LIMIT i,1;
SELECT id INTO comp_id FROM companies WHERE site_href=_href;
SELECT COUNT(*) INTO company_old_len FROM company_old WHERE href=_href;
SET i2 = 0;
WHILE i2 < company_old_len DO
SELECT category_id INTO cat_id FROM company_old WHERE href=_href LIMIT i2,1;
INSERT INTO company_categories (company_id, category_id) VALUES (comp_id, cat_id);
SET r = r + 1;
SET i2 = i2 + 1;
END WHILE;
SET i = i + 1;
END WHILE;
SELECT r;
END$$
DELIMITER ;
CALL fill_junc_table();
Edit (new idea):
I am going to test another way to solve this problem by fully copying the companies_1 table with the following columns (company_id empty on copy):
---------------------------------------------
| company_id | category_id | name |
---------------------------------------------
Then, I will loop through the companies_2 table to fill the correct company_id related to the name-column.
I hope you can give your thoughts about this. When I finish my test I will leave the result over here for others.
To clarify, I don't see any PIVOT transformation in company_categories. What I see is that you want a JUNCTION TABLE, because the companies and categories tables appear to have a many-to-many relationship.
In your case, a company can have multiple categories, and a category can be assigned to multiple companies.
Now, based on your requirement:
I want to move the links of the categories from the table companies_1
into the company_categories table. The company_id in the
company_categories table needs to be equal to the id of the companies_2
table. The records of the companies_1 and the companies_2 table are
linked by the "name"-column.
I arrived at this query:
INSERT INTO company_categories (company_id, category_id)
SELECT C2.id
, C1.category_id
FROM companies_1 C1
INNER JOIN companies_2 C2 ON C2.name = C1.name
Let me know if this works. The nested loops that you created will really take a while.
As @DanielE pointed out, this query works on the assumption that company_categories is empty. Otherwise we will need to use UPDATE.
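For the case where company_categories is not empty, one possible variation is to let the INSERT skip the pairs that are already linked. The sketch below assumes the sample rows from the question and runs against an in-memory SQLite database through Python's sqlite3 module (a stand-in for MySQL); the NOT EXISTS guard is an addition for illustration, not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies_1 (id INTEGER PRIMARY KEY, category_id INTEGER, name TEXT);
INSERT INTO companies_1 VALUES
  (1, 1, 'example-1'), (2, 2, 'example-1'), (3, 1, 'example-2'),
  (4, 2, 'example-2'), (5, 3, 'example-2'), (6, 1, 'example-3');
CREATE TABLE companies_2 (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO companies_2 VALUES (1, 'example-1'), (2, 'example-2'), (3, 'example-3');
CREATE TABLE company_categories (company_id INTEGER, category_id INTEGER);
""")

FILL_SQL = """
INSERT INTO company_categories (company_id, category_id)
SELECT C2.id, C1.category_id
FROM companies_1 AS C1
INNER JOIN companies_2 AS C2 ON C2.name = C1.name
-- skip pairs that are already linked, so running it again adds nothing
WHERE NOT EXISTS (
  SELECT 1 FROM company_categories AS cc
  WHERE cc.company_id = C2.id AND cc.category_id = C1.category_id
)
"""

def link_count():
    return conn.execute("SELECT COUNT(*) FROM company_categories").fetchone()[0]

conn.execute(FILL_SQL)
count_after_first_run = link_count()
conn.execute(FILL_SQL)              # second run finds every pair already present
count_after_second_run = link_count()

links = conn.execute(
    "SELECT company_id, category_id FROM company_categories ORDER BY 1, 2"
).fetchall()
print(count_after_first_run, count_after_second_run, links)
```

Either way the work is done by one set-based statement instead of one SELECT per row, which is where the nested loops lose their time. With a large companies_1 table, an index on the name columns would presumably matter as well.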
Why not just update companies_1?
ALTER TABLE companies_1 ADD (company_id INT);
UPDATE companies_1 SET company_id = (SELECT id FROM companies_2 WHERE name = companies_1.name);
ALTER TABLE companies_1 DROP name, RENAME TO company_categories;
SELECT * FROM `company_categories`;
Output
+----+-------------+------------+
| id | category_id | company_id |
+----+-------------+------------+
|  1 |           1 |          1 |
|  2 |           2 |          1 |
|  3 |           1 |          2 |
|  4 |           2 |          2 |
|  5 |           3 |          2 |
|  6 |           1 |          3 |
+----+-------------+------------+
I have a MySQL table like this:
| CategoryId | Name | CategoryParentId |
|------------|---------------|------------------|
| 0 | Tech Support | (null) |
| 1 | Configuration | 0 |
| 2 | Questions | 1 |
| 3 | Sales | (null) |
| 4 | Questions | 3 |
| 5 | Other | (null) |
This is the output I desire when I query the ID 2 (for example):
Tech Support/Configuration/Questions
How do I do this without having to do multiple joins?
Fiddle
EDIT: Not sure if this is the best way to do it, but I solved it by creating a function:
DELIMITER $$
CREATE FUNCTION get_full_tree (CategoryId int) RETURNS VARCHAR(200)
BEGIN
SET @CategoryParentId = (SELECT CategoryParentId FROM category c WHERE c.CategoryId = CategoryId);
SET @Tree = (SELECT Name FROM category c WHERE c.CategoryId = CategoryId);
WHILE (@CategoryParentId IS NOT NULL) DO
SET @ParentName = (SELECT Name FROM category c WHERE c.CategoryId = @CategoryParentId);
SET @Tree = CONCAT(@ParentName, '/', @Tree);
SET @CategoryParentId = (SELECT CategoryParentId FROM category c WHERE c.CategoryId = @CategoryParentId);
END WHILE;
RETURN @Tree;
END $$
DELIMITER ;
I can now do this query:
SELECT CategoryId, get_full_tree(CategoryId) FROM category
You could create a new table, let's call it hierarchy (there may be a better name), in which we store the full ancestry of every category.
CREATE TABLE `hierarchy` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`parent` int(11) NOT NULL,
`child` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
For example, for Questions (ID->2) we will have the entries below:
id parent child
====================
6 0 2
7 1 2
8 2 2
For the whole example the content of the table would be:
id parent child
===========================
1 0 0
2 3 3
3 5 5
4 0 1
5 1 1
6 0 2
7 1 2
8 2 2
9 3 4
10 4 4
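Now, whenever you want to retrieve the whole ancestry of a node, execute the query below. It joins instead of using IN (...), because an ORDER BY inside an IN subquery has no effect on the order of the result:
select c.name from hierarchy h join category c on c.id = h.parent where h.child = 2 order by h.id
The above query will return all the ancestor names for Questions (ID->2), root first, since the hierarchy rows of each child were inserted in that order, i.e.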
name
==================
Tech Support
Configuration
Questions
For completeness' sake, below is the content of the category table:
id Name
============================
0 Tech Support
1 Configuration
2 Questions
3 Sales
4 Questions
5 Other
N.B. This is just an idea; I am sure you can build a more elegant solution on top of it.
If you are using MySQL 8 or above, you can use Common Table Expressions for recursive queries. The query would be the following:
WITH RECURSIVE CategoryPath (CategoryId, Name, path) AS
(
SELECT CategoryId, Name, Name as path
FROM category
WHERE CategoryParentId IS NULL
UNION ALL
SELECT c.CategoryId, c.Name, CONCAT(cp.path, ' / ', c.Name)
FROM CategoryPath AS cp JOIN category AS c
ON cp.CategoryId = c.CategoryParentId
)
SELECT * FROM CategoryPath ORDER BY path;
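To get only the path of a single category, such as ID 2 from the question, the final SELECT can be filtered by CategoryId. A minimal sketch, assuming the sample rows from the question and using Python's sqlite3 module in place of MySQL 8 so that it is self-contained; full_path is a hypothetical helper name, and the separator is '/' to match the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (CategoryId INTEGER PRIMARY KEY, Name TEXT, CategoryParentId INTEGER);
INSERT INTO category VALUES
  (0, 'Tech Support', NULL), (1, 'Configuration', 0), (2, 'Questions', 1),
  (3, 'Sales', NULL), (4, 'Questions', 3), (5, 'Other', NULL);
""")

PATH_SQL = """
WITH RECURSIVE CategoryPath (CategoryId, path) AS (
  -- anchor: root categories start a path with their own name
  SELECT CategoryId, Name FROM category WHERE CategoryParentId IS NULL
  UNION ALL
  -- step: append each child's name to its parent's path
  SELECT c.CategoryId, cp.path || '/' || c.Name
  FROM CategoryPath AS cp
  JOIN category AS c ON cp.CategoryId = c.CategoryParentId
)
SELECT path FROM CategoryPath WHERE CategoryId = ?
"""

def full_path(category_id):
    """Return the root-to-node path of one category, or None if the id is unknown."""
    row = conn.execute(PATH_SQL, (category_id,)).fetchone()
    return row[0] if row else None

print(full_path(2))
```

In MySQL the concatenation would be written CONCAT(cp.path, '/', c.Name), as in the query above, since || is not string concatenation there by default.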
I have a Categories table which has some duplicate Categories as described below,
`Categories`
+========+============+============+
| cat_id | cat_name | item_count |
+========+============+============+
| 1 | Category 1 | 2 |
| 2 | Category 1 | 1 |
| 3 | Category 2 | 2 |
| 4 | Category 3 | 1 |
| 5 | Category 3 | 1 |
+--------+------------+------------+
Here is another junction table which relates to another Items table. The item_count in the first table is the total number of items per cat_id.
`Junction`
+========+=========+
| cat_id | item_id |
+========+=========+
| 1 | 100 |
| 1 | 101 |
| 2 | 102 |
| 3 | 103 |
| 3 | 104 |
| 4 | 105 |
| 5 | 106 |
+--------+---------+
How do I combine the items of the duplicate categories into a single category per name, namely the one with the maximum item_count among its duplicates (e.g. Category 1)?
Also, if the item_count is the same for the duplicates, then the category with the maximum cat_id is chosen and the item_count is combined into that record (e.g. Category 3).
Note: Instead of removing the duplicate records, their item_count will be set to 0.
Below is the expected result.
+========+============+============+
| cat_id | cat_name | item_count |
+========+============+============+
| 1 | Category 1 | 3 |
| 2 | Category 1 | 0 |
| 3 | Category 2 | 2 |
| 4 | Category 3 | 0 |
| 5 | Category 3 | 2 |
+--------+------------+------------+
+========+=========+
| cat_id | item_id |
+========+=========+
| 1 | 100 |
| 1 | 101 |
| 1 | 102 |
| 3 | 103 |
| 3 | 104 |
| 5 | 105 |
| 5 | 106 |
+--------+---------+
In the result there are two duplicated names, Category 1 and Category 3, which cover the 2 scenarios:
1. cat_id=2 is eliminated because its item_count=1 is less than that of cat_id=1, which has item_count=2.
2. cat_id=4 is eliminated even though its item_count is the same as that of cat_id=5, since 5 is the maximum cat_id among the duplicates of Category 3.
Please help me find a query that can join and update both tables in order to resolve the duplicates.
Here's a SELECT. You can figure out how to adapt it to an UPDATE ;-)
I've ignored the junction table for simplicity.
SELECT z.cat_id
, z.cat_name
, (z.cat_id = x.cat_id) * new_count item_count
FROM categories x
LEFT
JOIN categories y
ON y.cat_name = x.cat_name
AND (y.item_count > x.item_count OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
LEFT
JOIN
( SELECT a.cat_id, b.*
FROM categories a
JOIN
( SELECT cat_name, SUM(item_count) new_count, MAX(item_count) max_count FROM categories GROUP BY cat_name) b
ON b.cat_name = a.cat_name
) z
ON z.cat_name = x.cat_name
WHERE y.cat_id IS NULL;
+--------+------------+------------+
| cat_id | cat_name | item_count |
+--------+------------+------------+
| 1 | Category 1 | 3 |
| 2 | Category 1 | 0 |
| 3 | Category 2 | 2 |
| 4 | Category 3 | 0 |
| 5 | Category 3 | 2 |
+--------+------------+------------+
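Since the SELECT above leaves out the junction table, here is one possible way to apply the same rule to both tables. It is a sketch under assumptions rather than a finished migration: it runs on SQLite through Python's sqlite3 module instead of MySQL, it uses the sample rows from the question, and the helper table winners with its columns keep_id and total is a name made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (cat_id INTEGER PRIMARY KEY, cat_name TEXT, item_count INTEGER);
INSERT INTO Categories VALUES
  (1, 'Category 1', 2), (2, 'Category 1', 1), (3, 'Category 2', 2),
  (4, 'Category 3', 1), (5, 'Category 3', 1);
CREATE TABLE Junction (cat_id INTEGER, item_id INTEGER);
INSERT INTO Junction VALUES
  (1, 100), (1, 101), (2, 102), (3, 103), (3, 104), (4, 105), (5, 106);

-- Snapshot taken before anything is changed. For every cat_id it stores the
-- surviving cat_id of its name (highest item_count, ties broken by highest
-- cat_id) and the combined item_count of that name.
CREATE TEMPORARY TABLE winners AS
SELECT c.cat_id,
       (SELECT k.cat_id FROM Categories AS k
        WHERE k.cat_name = c.cat_name
        ORDER BY k.item_count DESC, k.cat_id DESC
        LIMIT 1) AS keep_id,
       (SELECT SUM(t.item_count) FROM Categories AS t
        WHERE t.cat_name = c.cat_name) AS total
FROM Categories AS c;

-- Point every junction row at the surviving category.
UPDATE Junction
SET cat_id = (SELECT w.keep_id FROM winners AS w WHERE w.cat_id = Junction.cat_id);

-- The survivor gets the combined count, its duplicates get 0.
UPDATE Categories
SET item_count = (
  SELECT CASE WHEN w.cat_id = w.keep_id THEN w.total ELSE 0 END
  FROM winners AS w WHERE w.cat_id = Categories.cat_id
);
""")

categories = conn.execute(
    "SELECT cat_id, item_count FROM Categories ORDER BY cat_id").fetchall()
junction = conn.execute(
    "SELECT cat_id, item_id FROM Junction ORDER BY item_id").fetchall()
print(categories)
print(junction)
```

Taking the snapshot first means both UPDATE statements read the original counts, so the order in which they run does not change the outcome.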
DELIMITER $$
DROP PROCEDURE IF EXISTS cursor_proc $$
CREATE PROCEDURE cursor_proc()
BEGIN
  -- local variables (MySQL cannot DECLARE @user variables)
  DECLARE v_cat_id INT;
  DECLARE v_cat_name VARCHAR(255);
  DECLARE v_item_count INT;
  DECLARE v_prev_cat_name VARCHAR(255) DEFAULT NULL;
  DECLARE v_max_item_count INT DEFAULT 0;
  DECLARE v_best_cat_id INT DEFAULT NULL;
  DECLARE v_total_items INT DEFAULT 0;
  -- this flag will be set to true when the cursor reaches the end of the table
  DECLARE exit_loop BOOLEAN DEFAULT FALSE;
  -- Declare the cursor
  DECLARE categories_cursor CURSOR FOR
    SELECT cat_id, cat_name, item_count FROM Categories ORDER BY cat_name, cat_id;
  -- set exit_loop flag to true if there are no more rows
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET exit_loop = TRUE;
  -- open the cursor
  OPEN categories_cursor;
  -- start looping
  categories_loop: LOOP
    -- read the next row into the variables
    FETCH categories_cursor INTO v_cat_id, v_cat_name, v_item_count;
    -- close the cursor and exit the loop if there are no more rows
    IF exit_loop THEN
      CLOSE categories_cursor;
      LEAVE categories_loop;
    END IF;
    IF v_prev_cat_name IS NULL OR v_prev_cat_name <> v_cat_name THEN
      -- Category has changed: give the 'best' row of the previous category the total items count
      IF v_best_cat_id IS NOT NULL THEN
        UPDATE Categories
        SET item_count = v_total_items
        WHERE cat_id = v_best_cat_id;
      END IF;
      -- Reset values with the current row values
      SET v_max_item_count = v_item_count;
      SET v_prev_cat_name = v_cat_name;
      SET v_best_cat_id = v_cat_id;
      SET v_total_items = v_item_count;
    ELSE
      -- increment the total items count
      SET v_total_items = v_total_items + v_item_count;
      -- rows arrive in cat_id order, so <= lets the larger cat_id win a tie
      IF v_max_item_count <= v_item_count THEN
        -- the current row becomes the 'best'; the previous best is a duplicate
        UPDATE Categories
        SET item_count = 0
        WHERE cat_id = v_best_cat_id;
        SET v_max_item_count = v_item_count;
        SET v_best_cat_id = v_cat_id;
      ELSE
        -- else, this row is not the best of its category
        UPDATE Categories
        SET item_count = 0
        WHERE cat_id = v_cat_id;
      END IF;
    END IF;
  END LOOP categories_loop;
  -- the last category has no following row to trigger its update
  IF v_best_cat_id IS NOT NULL THEN
    UPDATE Categories
    SET item_count = v_total_items
    WHERE cat_id = v_best_cat_id;
  END IF;
END $$
DELIMITER ;
It's not pretty and copied in part from Strawberry's SELECT
UPDATE categories cat,
junction jun,
(select
(z.cat_id = x.cat_id) * new_count c,
x.cat_id newcatid,
z.cat_id oldcatid
from categories x
LEFT
JOIN categories y
ON y.cat_name = x.cat_name
AND (y.item_count > x.item_count OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
LEFT
JOIN
( SELECT a.cat_id, b.*
FROM categories a
JOIN
( SELECT cat_name, SUM(item_count) new_count, MAX(item_count) max_count FROM categories GROUP BY cat_name) b
ON b.cat_name = a.cat_name
) z
ON z.cat_name = x.cat_name
WHERE
y.cat_id IS NULL) sourceX
SET cat.item_count = sourceX.c, jun.cat_id = sourceX.newcatid
WHERE cat.cat_id = jun.cat_id and cat.cat_id = sourceX.oldcatid
I think it's better to do what you want one step at time:
First, get data you need:
SELECT Max(`cat_id`), sum(`item_count`) FROM `Categories` GROUP BY `cat_name`
With this data you will be able to check that the update was done correctly.
Then, looping over the acquired data, update each category name:
update Categories set item_count =
(
Select Tot FROM (
Select sum(`item_count`) as Tot
FROM `Categories`
WHERE `cat_name` = '@cat_name') as tmp1
)
WHERE cat_id = (
Select MaxId
FROM (
select max(cat_id) as MaxId
FROM Categories
WHERE `cat_name` = '@cat_name') as tmp2)
Pay attention: if you run this code twice, the result will be wrong.
Finally, set the item_count of the other ids to 0:
UPDATE Categories set item_count = 0
WHERE `cat_name` = '@cat_name'
AND cat_id <> (
Select MaxId
FROM (
select max(cat_id) as MaxId
FROM Categories
WHERE `cat_name` = '@cat_name') as tmp2)
I need to create a procedure which finds the worst user in one table by counting the rows with status 'P' and 'U', calculating the ratio, and comparing it with the other users. It should then take that user id, look it up in another table, and return all the user information held in the two tables. I call that procedure from a Java application.
Table Rezervacija
id | SifKorisnikPK | Status
1 | 1 | 'P'
2 | 1 | 'U'
3 | 1 | 'U'
4 | 2 | 'U'
5 | 2 | 'P'
6 | 2 | 'P'
7 | 2 | 'P'
8 | 2 | 'P'
9 | 3 | 'U'
10 | 3 | 'U'
11 | 3 | 'P'
12 | 3 | 'P'
13 | 3 | 'P'
14 | 3 | 'P'
So the user with id 2 is the worst user: he has 4 P's and one U, so his ratio is 3 P. The procedure should then go to the Korisnik table and return all the info for the user with id 2.
I tried this, but I can't get any return values:
CREATE PROCEDURE sp_getBadPremiumUsers
AS
BEGIN
DECLARE @BrLr int
DECLARE @BrDr int
SELECT @BrLr = (SELECT COUNT(*) FROM Rezervacija A
INNER JOIN Rezervacija B
ON A.SifKorisnikPK = B.SifKorisnikPK
WHERE A.Status = 'P')
SELECT @BrDr = (SELECT COUNT(*) FROM Rezervacija A
INNER JOIN Rezervacija B
ON A.SifKorisnikPK = B.SifKorisnikPK
WHERE A.Status = 'U')
SELECT * INTO #PremiKoris FROM Korisnik
INNER JOIN PremiumKorisnik
ON SifKorisnik = SifKorisnikPK
ALTER TABLE #PremiKoris
DROP COLUMN Password
SELECT * FROM #PremiKoris
WHERE @BrLr > @BrDr
DROP TABLE #PremiKoris
END
GO
You can get the worst user using:
select r.SifKorisnikPK
from Rezervacija r
where status = 'P'
group by SifKorisnikPK
order by count(*) desc
limit 1;
You can then use this in a query to get more information:
select k.*
from PremiumKorisnik k join
(select r.SifKorisnikPK
from Rezervacija r
where status = 'P'
group by SifKorisnikPK
order by count(*) desc
limit 1
) r
on r.SifKorisnikPK = k.SifKorisnikPK
This does in one query what you describe you want to do.
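The queries above rank users by their number of 'P' rows alone. If the "ratio" in the question is read as the number of 'P' rows minus the number of 'U' rows (that is how "4 P's and one U, so his ratio is 3 P" appears to be meant, but it is an assumption), conditional aggregation can compute it per user. A small sketch with the sample data, run through Python's sqlite3 module so it is self-contained; the column alias score is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rezervacija (id INTEGER PRIMARY KEY, SifKorisnikPK INTEGER, Status TEXT);
INSERT INTO Rezervacija (SifKorisnikPK, Status) VALUES
  (1, 'P'), (1, 'U'), (1, 'U'),
  (2, 'U'), (2, 'P'), (2, 'P'), (2, 'P'), (2, 'P'),
  (3, 'U'), (3, 'U'), (3, 'P'), (3, 'P'), (3, 'P'), (3, 'P');
""")

SCORE_SQL = """
SELECT SifKorisnikPK,
       SUM(CASE WHEN Status = 'P' THEN 1 ELSE 0 END)
     - SUM(CASE WHEN Status = 'U' THEN 1 ELSE 0 END) AS score
FROM Rezervacija
GROUP BY SifKorisnikPK
ORDER BY score DESC, SifKorisnikPK   -- worst user first
"""

scores = conn.execute(SCORE_SQL).fetchall()
worst_user_id = scores[0][0]
print(scores, worst_user_id)
```

The first row of this result can take the place of the derived table r in the join above, so that the user details are fetched for the user with the highest score.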
I have exports of person data which I would like to import into a table, keeping a history of the changes (historization).
I wrote the individual SQL steps, but two questions arise:
1. There is a step where I get an unexpected date.
2. I would like to avoid submitting the steps manually and use a stored procedure instead.
The tables are:
Table to be filled considering historization:
CREATE TABLE person (
id INTEGER DEFAULT NULL
, name VARCHAR(50) DEFAULT NULL
, effective_dt DATE DEFAULT NULL
, expiry_dt DATE DEFAULT NULL
);
Table with person data to be imported:
CREATE TABLE person_stg (
id INTEGER DEFAULT NULL
, name VARCHAR(50) DEFAULT NULL
, export_dt DATE DEFAULT NULL
, import_flag TINYINT DEFAULT 0
);
-- Several exports which have to be imported
INSERT INTO person_stg (id, name, export_dt) VALUES
(1,'Jonn' , '2000-01-01')
, (2,'Marry' , '2000-01-01')
, (1,'John' , '2000-01-05')
, (2,'Marry' , '2000-01-06')
, (2,'Mary' , '2000-01-10')
, (3,'Samuel', '2000-01-10')
, (2,'Maria' , '2000-01-15')
;
The following first step (1) populates the table person with the first state of the person:
INSERT INTO person
SELECT a.id, a.name, a.export_dt, '9999-12-31' expiry_dt
FROM person_stg a
LEFT JOIN person_stg b
ON a.id = b.id
AND a.export_dt > b.export_dt
WHERE b.id IS NULL
;
SELECT * FROM person ORDER BY id, effective_dt;
+----+--------+--------------+------------+
| id | name | effective_dt | expiry_dt |
+----+--------+--------------+------------+
| 1 | Jonn | 2000-01-01 | 9999-12-31 |
| 2 | Marry | 2000-01-01 | 9999-12-31 |
| 3 | Samuel | 2000-01-10 | 9999-12-31 |
+----+--------+--------------+------------+
Step (2) changes the expiry date:
-- (2) Update expiry_dt where changes happened
UPDATE
person a
, person_stg b
SET a.expiry_dt = SUBDATE(b.export_dt,1)
WHERE a.id = b.id
AND a.name <> b.name
AND a.expiry_dt = '9999-12-31'
AND b.export_dt = (SELECT MIN(b.export_dt)
FROM person_stg c
WHERE b.id = c.id
AND c.import_flag = 0
)
;
SELECT * FROM person ORDER BY id, effective_dt;
+----+--------+--------------+------------+
| id | name | effective_dt | expiry_dt |
+----+--------+--------------+------------+
| 1 | Jonn | 2000-01-01 | 2000-01-04 |
| 2 | Marry | 2000-01-01 | 2000-01-09 |
| 3 | Samuel | 2000-01-10 | 9999-12-31 |
+----+--------+--------------+------------+
The third step (3) inserts the second status of person data:
-- (3) Insert new exports which have changes
INSERT INTO person
SELECT a.id, a.name, a.export_dt, '9999-12-31' expiry_dt
FROM person_stg a
INNER JOIN person b
ON a.id = b.id
AND b.expiry_dt = SUBDATE(a.export_dt,1)
AND a.export_dt > b.effective_dt
AND a.import_flag = 0
;
SELECT * FROM person ORDER BY id, effective_dt;
+----+--------+--------------+------------+
| id | name | effective_dt | expiry_dt |
+----+--------+--------------+------------+
| 1 | Jonn | 2000-01-01 | 2000-01-04 |
| 1 | John | 2000-01-05 | 9999-12-31 |
| 2 | Marry | 2000-01-01 | 2000-01-09 |
| 2 | Mary | 2000-01-10 | 9999-12-31 |
| 3 | Samuel | 2000-01-10 | 9999-12-31 |
+----+--------+--------------+------------+
And the last step (4) defines on person_stg which record was inserted:
-- (4) Define imported records
UPDATE
person_stg a
, person b
SET import_flag = 1
WHERE a.id = b.id
AND a.export_dt = b.effective_dt
;
So far, so good. If I repeat step (2), I get the following table:
+----+--------+--------------+------------+
| id | name | effective_dt | expiry_dt |
+----+--------+--------------+------------+
| 1 | Jonn | 2000-01-01 | 2000-01-04 |
| 1 | John | 2000-01-05 | 9999-12-31 |
| 2 | Marry | 2000-01-01 | 2000-01-09 |
| 2 | Mary | 2000-01-10 | 1999-12-31 | <--- ??? Should be 2000-01-14
| 3 | Samuel | 2000-01-10 | 9999-12-31 |
+----+--------+--------------+------------+
Mary/2000-01-10 got the expiry_dt 1999-12-31 instead of 2000-01-14. I don't understand how this can happen.
So, my questions are:
(1a) Why does this update of the expiry date give this strange date?
(1b) Is there perhaps better code than (2)?
(2) How can I repeat steps (2) to (4) automatically? I only need some hints for a stored procedure.
If I understand what you want to do, you don't need a multi-step process. You are just looking for the "end date" for each record. Here is a method that uses correlated subqueries:
SELECT p.*, export_dt as effdate,
COALESCE((SELECT export_dt - interval 1 day
FROM person_stg p2
WHERE p2.id = p.id AND
p2.export_dt > p.export_dt
ORDER BY p2.export_dt
LIMIT 1
), '9999-12-31') as enddate
FROM person_stg p;
You can also do something using variables.
I'm not sure if this answers your question, because it replaces the whole process with a simpler query.
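Note that the correlated-subquery version returns one row per export, so the two unchanged 'Marry' exports would appear as separate periods. A different technique, window functions (available in MySQL 8 and above), can collapse them: LAG drops the exports whose name did not change, and LEAD then finds the next effective date. The sketch below is illustrative only and runs on SQLite through Python's sqlite3 module, which is why the date arithmetic uses SQLite's date() function; the CTE name changes and the column prev_name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_stg (id INTEGER, name TEXT, export_dt TEXT);
INSERT INTO person_stg (id, name, export_dt) VALUES
  (1, 'Jonn',   '2000-01-01'), (2, 'Marry', '2000-01-01'),
  (1, 'John',   '2000-01-05'), (2, 'Marry', '2000-01-06'),
  (2, 'Mary',   '2000-01-10'), (3, 'Samuel', '2000-01-10'),
  (2, 'Maria',  '2000-01-15');
""")

HISTORY_SQL = """
WITH changes AS (
  -- attach the previous name of the same id to every export
  SELECT id, name, export_dt,
         LAG(name) OVER (PARTITION BY id ORDER BY export_dt) AS prev_name
  FROM person_stg
)
SELECT id, name, export_dt AS effective_dt,
       -- the day before the next kept export, or the open-ended date
       COALESCE(
         date(LEAD(export_dt) OVER (PARTITION BY id ORDER BY export_dt), '-1 day'),
         '9999-12-31'
       ) AS expiry_dt
FROM changes
WHERE prev_name IS NULL OR prev_name <> name   -- keep first exports and real changes
ORDER BY id, effective_dt
"""

history = conn.execute(HISTORY_SQL).fetchall()
for row in history:
    print(row)
```

In MySQL 8 the expiry expression would presumably be written as LEAD(export_dt) OVER (...) - INTERVAL 1 DAY, in line with the query above.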
I found a solution using a cursor, which I had never used before. First I wrote a stored procedure (SP), sp_add_record, which, given an id and export_dt from person_stg, updates the expiry date, inserts a new status, or inserts a new element. This stored procedure is then called from a second SP that uses a cursor (curs_add_records):
CALL curs_add_records();
SELECT * FROM person;
+----+--------+--------------+------------+
| id | name | effective_dt | expiry_dt |
+----+--------+--------------+------------+
| 1 | Jonn | 2000-01-01 | 2000-01-04 |
| 2 | Marry | 2000-01-01 | 2000-01-09 |
| 1 | John | 2000-01-05 | 9999-12-31 |
| 2 | Mary | 2000-01-10 | 2000-01-14 |
| 3 | Samuel | 2000-01-10 | 9999-12-31 |
| 2 | Maria | 2000-01-15 | 9999-12-31 |
+----+--------+--------------+------------+
The advantage of this procedure is that I can load the table with the same code, regardless of whether it is an initial load (population load) or an incremental one.
Literature I used:
Djoni Darmawikarta: Dimensional Data Warehousing with MySQL (DWH issues)
Ben Forta: MariaDB Crash Course (SP issues)
The stored procedures I used follow.
PS: Was it appropriate to answer my own question?
DELIMITER //
DROP PROCEDURE IF EXISTS sp_add_record //
CREATE PROCEDURE sp_add_record(
IN p_id INTEGER
, IN p_export_dt DATE
)
BEGIN
-- Change expiry_dt
UPDATE
person p
, person_stg s
SET p.expiry_dt = SUBDATE(p_export_dt,1)
WHERE p.id = s.id
AND p.id = p_id
AND s.export_dt = p_export_dt
AND p.effective_dt <= p_export_dt
AND ( p.name <> s.name )
AND p.expiry_dt = '9999-12-31'
;
-- Add new status
INSERT INTO person
SELECT s.id, s.name, s.export_dt, '9999-12-31' expiry_dt
FROM
person p
, person_stg s
WHERE p.id = s.id
AND p.id = p_id
AND s.export_dt = p_export_dt
AND ( p.name <> s.name )
-- does an entry exist with the new expiry_dt?
AND EXISTS (SELECT *
FROM person p2
WHERE p2.id = p.id
AND p.expiry_dt = SUBDATE(p_export_dt,1)
)
-- an entry with an open expiry_dt should not exist
AND NOT EXISTS (SELECT *
FROM person p3
WHERE p3.id = p.id
AND p3.expiry_dt = '9999-12-31'
)
;
-- Add new id
INSERT INTO person
SELECT s.id, s.name, s.export_dt, '9999-12-31' expiry_dt
FROM person_stg s
WHERE s.export_dt = p_export_dt
AND s.id = p_id
-- Add new id from stage if it does not exist in person
AND s.id NOT IN (SELECT p3.id
FROM person p3
WHERE p3.id = s.id
AND p3.expiry_dt = '9999-12-31'
)
;
END
//
DELIMITER ;
DELIMITER //
DROP PROCEDURE IF EXISTS curs_add_records //
CREATE PROCEDURE curs_add_records()
BEGIN
-- Local variables
DECLARE done BOOLEAN DEFAULT 0;
DECLARE p_id INTEGER;
DECLARE p_export_dt DATE;
-- Cursor
DECLARE c1 CURSOR
FOR
SELECT id, export_dt
FROM person_stg
ORDER BY export_dt, id
;
-- Declare continue handler
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=1;
-- Open cursor
OPEN c1;
-- Loop through all rows
REPEAT
-- Get record
FETCH c1 INTO p_id, p_export_dt;
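-- Call add record procedure (skipped when the FETCH found no more rows,
-- otherwise the last record would be processed a second time)
IF NOT done THEN
CALL sp_add_record(p_id,p_export_dt);
END IF;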
-- End of loop
UNTIL done END REPEAT;
-- Close cursor
CLOSE c1;
END;
//
DELIMITER ;