Generic procedure to perform SCD in sql - mysql

I have 2 tables in mssql server.I can perform scd through custom insert/update/delete and also through Merge statement.
Awesome Merge
I want to know that is there any generic procedure that could server the purpose. we just pass it 2 tables and it should porform the SCD. any option in SQL server 2008?
Thanks

No, there isn't and there can't be a generic one suitable for no matter what tables you pass to it. For several reasons:
How do you know which SCD type? (Okay, could be another parameter, but...)
How do you know which column should be historicized and which should be overwritten?
How do you determine which column is the business key, the surrogate key, the expiration column and so on?
To specify the columns in an update statement you must write dynamic sql, which is possible, but the above point comes into play
Not a reason why it's not possible but also consider: For a proper UPSERT one usually works with temporary tables, the MERGE statement sucks for SCDs except in special cases. That is because you can't use a MERGE statement together with an INSERT/UPDATE and you would have to disable foreign keys for that, since an UPDATE is implemented as DELETE THEN INSERT (or something like that, don't remember clearly, but I had those problems when I tried).
I prefer doing it this way (SCD type 2 and SQL Server that is):
Step 1:
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimSource')
DROP TABLE tmpDimSource;
SELECT
*
INTO tmpDimSource
FROM
(
SELECT whatever
FROM yourTable
);
Step 2:
IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimYourDimensionName')
DROP TABLE tmpDimYourDimensionName;
SELECT * INTO tmpDimYourDimensionName FROM D_yourDimensionName WHERE 1 = 0;
INSERT INTO tmpDimYourDimensionName
(
sid, /*a surrogate id column*/
theColumnsYouNeedInYourDimension,
validFrom
)
SELECT
ISNULL(d.sid, 0),
ds.theColumnsYouNeedInYourDimension,
DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0) /*the current date*/
FROM
tmpDimSource ds
LEFT JOIN D_yourDimensionName d ON ds.whateverId = c.whateverId
;
The ISNULL(d.sid, 0) in step 2 is important. It returns the surrogate id of your dimension, if an entry already exists, otherwise 0.
Step 3:
UPDATE D_yourDimensionName SET
validTo = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 1, 0) /*yesterday*/
FROM
D_yourDimensionName d
INNER JOIN tmpDimYourDimensionName t ON d.sid = t.sid
WHERE t.sid <> 0 AND
(
d.theColumnWhichHasChangedAndIsImportant <> t.theColumnWhichHasChangedAndIsImportant OR
d.anotherColumn <> t.anotherColumn
)
;
In Step 3 you mark the existing entry as not valid anymore and keep a history of it. The valid entry you get with WHERE validTo IS NULL.
You can also add another UPDATE to overwrite any other column with the new value if needed.
Step 4:
INSERT INTO D_yourDimensionName
SELECT * FROM tmpDimYourDimensionName
WHERE sid = 0;
And that's it.

Related

How can I show datas not in table with out compairing with other tables in sql

I have one table named FILEINFO. Daily some file_names(around 100) will come. I need to check whether the names are present in the table or not. in a single query
Thanks in advance
I don't know if this fits in with your data pipeline, but one clean approach here would be to write those new incoming file names into some sort of table (temporary or permanent), and then use a simple exists query to check if they are already present in your current table, e.g.
SELECT filename AS new_file
FROM temp_names t1
WHERE NOT EXISTS (SELECT 1 FROM FILEINFO t2 WHERE t2.filename = t1.filename);
If you wanted to insert only new file names, you could use similar logic, e.g.
INSERT INTO FILEINFO (filename)
SELECT filename
FROM temp_names t1
WHERE NOT EXISTS (SELECT 1 FROM FILEINFO t2 WHERE t2.filename = t1.filename);
The other approach would be without using a temporary table (Tim Biegeleisen answer).
This approach will only use one query
SELECT
FILEINFO.filename
FROM (
SELECT 'filename1' AS filename FROM DUAL
UNION
SELECT 'filename2' AS filename FROM DUAL
) AS file_names
LEFT JOIN
FILEINFO
ON
file_names.filename = FILEINFO.filename
WHERE
FILEINFO.filename IS NULL
This query should also work in both MySQL and Oracle database.
Both database systems are using DUAL as a "dummy" table to allow tableless SELECTS

convert mysql (on duplicate key ) query to oracle merge

INSERT INTO table1(id,dept_id,name,description,creation_time,modified_time)
VALUES('id','dept_id','name','description','creation_time','modified_time')
ON DUPLICATE KEY UPDATE dept_id=VALUES(dept_id),name=VALUES(name),
description=VALUES(description),creation_time=VALUES(creation_time),
modified_time=VALUES(modified_time)
I used the below oracle to convert the above mysql query. The query fails. Can you please help me figure out what is wrong with the oracle query.
Merge into table1 t1 using
(VALUES ('id','dept_id','name','description','creation_time','modified_time')) as temp
(id,dept_id,name,description,creation_time,modified_time) on t1. id = temp.id
WHEN MATCHED THEN UPDATE SET dept_id=t1.dept_id, description=t1.description, name=t1.name,
creation_time=t1.creation_time, modified_time=t1.modified_time
WHEN NOT MATCHED THEN INSERT (id,dept_id,name,description,creation_time,modified_time)
VALUES ('id','dept_id','name','description','creation_time','modified_time')
To do this, you need to use a table or subquery in the using clause (in your case, you need a subquery).
In Oracle, you can use the dual table if you need to select something without needing to select from an actual table; this is a table that contains only a single row and a single column.
Your merge statement should therefore look something like:
MERGE INTO table1 tgt
USING (SELECT 'id' id,
'dept_id' dept_id,
'name' NAME,
'description' description,
'creation_time' creation_time,
'modified_time' modified_time
FROM dual) src
ON tgt.id = src.id
WHEN MATCHED THEN
UPDATE
SET tgt.dept_id = src.dept_id,
tgt.description = src.description,
tgt.name = src.name,
tgt.creation_time = src.creation_time,
tgt.modified_time = src.modified_time
WHEN NOT MATCHED THEN
INSERT
(tgt.id,
tgt.dept_id,
tgt.name,
tgt.description,
tgt.creation_time,
tgt.modified_time)
VALUES
(src.id,
src.dept_id,
src.name,
src.description,
src.creation_time,
src.modified_time);
Note how the when not matched clause uses the columns from the source subquery, rather than using the literal values you supplied. (I assume that in your actual code, these literal values are actually variables; creation_time is a pretty odd value to store in a column labelled creation_time!).
I've also switched the aliases to make it clearer where you're merging to and from; I find this makes it easier to understand what the merge statement is doing. YMMV.

Update MySQL table any time another table changes

I am trying to reduce the number of queries my application uses to build the dashboard and so am trying to gather all the info I will need in advance into one table. Most of the dashboard can be built in javascript using the JSON which will reduce server load doing tons of PHP foreach, which was resulting in excess queries.
With that in mind, I have a query that pulls together user information from 3 other tables, concatenates the results in JSON group by family. I need to update the JSON object any time anything changes in any of the 3 tables, but not sure what the "right " way to do this is.
I could set up a regular job to do an UPDATE statement where date is newer than the last update, but that would miss new records, and if I do inserts it misses updates. I could drop and rebuild the table, but it takes about 16 seconds to run the query as a whole, so that doesn't seem like the right answer.
Here is my initial query:
SET group_concat_max_len = 100000;
SELECT family_id, REPLACE(REPLACE(REPLACE(CONCAT("[", GROUP_CONCAT(family), "]"), "\\", ""), '"[', '['), ']"', ']') as family_members
FROM (
SELECT family_id,
JSON_OBJECT(
"customer_id", c.id,
"family_id", c.family_id,
"first_name", first_name,
"last_name", last_name,
"balance_0_30", pa.balance_0_30,
"balance_31_60", pa.balance_31_60,
"balance_61_90", pa.balance_61_90,
"balance_over_90", pa.balance_over_90,
"account_balance", pa.account_balance,
"lifetime_value", pa.lifetime_value,
"orders", CONCAT("[", past_orders, "]")
) AS family
FROM
customers AS c
LEFT JOIN accounting AS pa ON c.id = pa.customer_id
LEFT JOIN (
SELECT patient_id,
GROUP_CONCAT(
JSON_OBJECT(
"id", id,
"item", item,
"price", price,
"date_ordered", date_ordered
)
) as past_orders
FROM orders
WHERE date_ordered < NOW()
GROUP BY customer_id
) AS r ON r.customer_id = c.id
where c.user_id = 1
) AS results
GROUP BY family_id
I briefly looked into triggers, but what I was hoping for was something like:
create TRIGGER UPDATE_FROM_ORDERS
AFTER INSERT OR UPDATE
ON orders
(EXECUTE QUERY FROM ABOVE WHERE family_id = orders.family_id)
I was hoping to create something like that for each table, but at first glance it doesn't look like you can run complex queries such as that where we are creating nested JSON.
Am I wrong? Are triggers the right way to do this, or is there a better way?
As a demonstration:
DELIMITER $$
CREATE TRIGGER orders_au
ON orders
AFTER UPDATE
FOR EACH ROW
BEGIN
SET group_concat_max_len = 100000
;
UPDATE target_table t
SET t.somecol = ( SELECT expr
FROM ...
WHERE somecol = NEW.family_id
ORDER BY ...
LIMIT 1
)
WHERE t.family_id = NEW.family_id
;
END$$
DELIMITER ;
Notes:
MySQL triggers are row level triggers; a trigger is fired for "for each row" that is affected by the triggering statement. MySQL does not support statement level triggers.
The reference to NEW.family_id is a reference to the value of the family_id column of the row that was just updated, the row that the trigger was fired for.
MySQL trigger prohibits the SQL statements in the trigger from modifying any rows in the orders table. But it can modify other tables.
SQL statements in a trigger body can be arbitrarily complex, as long as its not a bare SELECT returning a resultset, or DML INSERT/UPDATE/DELETE statements. DDL statements (most if not all) are disallowed in a MySQL trigger.

Table is specified twice, both as a target for 'UPDATE' and as a separate source for data in mysql

I have below query in mysql where I want to check if branch id and year of finance type from branch_master are equal with branch id and year of manager then update status in manager table against branch id in manager
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
(
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)
)
)
but getting error
Table 'm1' is specified twice, both as a target for 'UPDATE' and as a
separate source for data
This is a typical MySQL thing and can usually be circumvented by selecting from the table derived, i.e. instead of
FROM manager AS m2
use
FROM (select * from manager) AS m2
The complete statement:
UPDATE manager
SET status = 'Y'
WHERE branch_id IN
(
select branch_id
FROM (select * from manager) AS m2
WHERE (branch_id, year) IN
(
SELECT branch_id, year
FROM branch_master
WHERE type = 'finance'
)
);
The correct answer is in this SO post.
The problem with here accepted answer is - as was already mentioned multiple times - creating a full copy of the whole table. This is way far from optimal and the most space complex one. The idea is to materialize the subset of data used for update only, so in your case it would be like this:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT * FROM(
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance')
) t
)
Basically you just encapsulate your previous source for data query inside of
SELECT * FROM (...) t
Try to use the EXISTS operator:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE EXISTS (SELECT 1
FROM (SELECT m2.branch_id
FROM branch_master AS bm
JOIN manager AS m2
WHERE bm.type = 'finance' AND
bm.branch_id = m2.branch_id AND
bm.year = m2.year) AS t
WHERE t.branch_id = m1.branch_id);
Note: The query uses an additional nesting level, as proposed by #Thorsten, as a means to circumvent the Table is specified twice error.
Demo here
Try :::
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
(SELECT DISTINCT branch_id
FROM branch_master
WHERE type = 'finance'))
AND m1.year IN ((SELECT DISTINCT year
FROM branch_master
WHERE type = 'finance'))
The problem I had with the accepted answer is that create a copy of the whole table, and for me wasn't an option, I tried to execute it but after several hours I had to cancel it.
A very fast way if you have a huge amount of data is create a temporary table:
Create TMP table
CREATE TEMPORARY TABLE tmp_manager
(branch_id bigint auto_increment primary key,
year datetime null);
Populate TMP table
insert into tmp_manager (branch_id, year)
select branch_id, year
from manager;
Update with join
UPDATE manager as m, tmp_manager as tmp_m
inner JOIN manager as man on tmp_m.branch_id = man.branch_id
SET status = 'Y'
WHERE m.branch_id = tmp_m.branch_id and m.year = tmp_m.year and m.type = 'finance';
This is by far the fastest way:
UPDATE manager m
INNER JOIN branch_master b on m.branch_id=b.branch_id AND m.year=b.year
SET m.status='Y'
WHERE b.type='finance'
Note that if it is a 1:n relationship the SET command will be run more than once. In this case that is no problem. But if you have something like "SET price=price+5" you cannot use this construction.
Maybe not a solution, but some thoughts about why it doesn't work in the first place:
Reading data from a table and also writing data into that same table is somewhat an ill-defined task. In what order should the data be read and written? Should newly written data be considered when reading it back from the same table? MySQL refusing to execute this isn't just because of a limitation, it's because it's not a well-defined task.
The solutions involving SELECT ... FROM (SELECT * FROM table) AS tmp just dump the entire content of a table into a temporary table, which can then be used in any further outer queries, like for example an update query. This forces the order of operations to be: Select everything first into a temporary table and then use that data (instead of the data from the original table) to do the updates.
However if the table involved is large, then this temporary copying is going to be incredibly slow. No indexes will ever speed up SELECT * FROM table.
I might have a slow day today... but isn't the original query identical to this one, which souldn't have any problems?
UPDATE manager as m1
SET m1.status = 'Y'
WHERE (m1.branch_id, m1.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)

Reorder a MYSQL table

I have a MySql table with a 'Order' field but when a record gets deleted a gap appears
how can i update my 'Order' field sequentially ?
If possible in one query 1 1
id.........order
1...........1
5...........2
4...........4
3...........6
5...........8
to
id.........order
1...........1
5...........2
4...........3
3...........4
5...........5
I could do this record by record
Getting a SELECT orderd by Order and row by row changing the Order field
but to be honest i don't like it.
thanks
Extra info :
I also would like to change it this way :
id.........order
1...........1
5...........2
4...........3
3...........3.5
5...........4
to
id.........order
1...........1
5...........2
4...........3
3...........4
5...........5
In MySQL you can do this:
update t join
(select t.*, (#rn := #rn + 1) as rn
from t cross join
(select #rn := 0) const
order by t.`order`
) torder
on t.id = torder.id
set `order` = torder.rn;
In most databases, you can also do this with a correlated subquery. But this might be a problem in MySQL because it doesn't allow the table being updated as a subquery:
update t
set `order` = (select count(*)
from t t2
where t2.`order` < t.`order` or
(t2.`order` = t.`order` and t2.id <= t.id)
);
There is no need to re-number or re-order. The table just gives you all your data. If you need it presented a certain way, that is the job of a query.
You don't even need to change the order value in the query either, just do:
SELECT * FROM MyTable WHERE mycolumn = 'MyCondition' ORDER BY order;
The above answer is excellent but it took me a while to grok it so I offer a slight rewrite which I hope brings clarity to others faster:
update
originalTable
join (select originalTable.ID,
(#newValue := #newValue + 10) as newValue
from originalTable
cross join (select #newValue := 0) newTable
order by originalTable.Sequence)
originalTable_reordered
on originalTable.ID = originalTable_reordered.ID
set originalTable.Sequence = originalTable_reordered.newValue;
Note that originalTable.* is NOT required - only the field used for the final join.
My example assumes the field to be updated is called Sequence (perhaps clearer in intent than order but mainly sidesteps the reserved keyword issue)
What took me a while to get was that "const" in the original answer was not a MySQL keyword. (I'm never a fan of abbreviations for that reason -- the can be interpreted many ways at times especially at these very when it is best they not be misinterpreted. Makes for verbose code I know but clarity always trumps convenience in my books.)
Not quite sure what the select #newValue := 0 is for but I think this is a side effect of having to express a variable before it can be used later on.
The value of this update is of course an atomic update to all the rows in question rather than doing a data pull and updating single rows one by one pragmatically.
My next question, which should not be difficult to ascertain, but I've learned that SQL can be a trick beast at the best of times, is to see if this can be safely done on a subset of data. (Where some originalTable.parentID is a set value).