We have the below requirement:
Currently, we get data from the source (another server, another team, another DB) into a temp DB via batch jobs. Once the data is in our temp DB, we process and transform it and update our primary DB with the difference (i.e. the records that changed or were newly added).
Source->tempDB (daily recreated)->delta->primaryDB
Requirement:
- To delete data in the primary DB once it is deleted in the source.
Example: suppose a record with ID=1 is created in the source; it comes to the temp DB and eventually makes it to the primary DB. When this record is deleted in the source, it should be deleted in the primary DB as well.
Challenge:
How do we delete from the primary DB when there is nothing to refer to in the temp DB (since the record is already deleted in the source, nothing arrives in the temp DB)?
Naive approach:
- We could clean out the primary DB before every transform and load it afresh. However, cleaning up and repopulating the primary DB takes a significant amount of time every time.
You could create triggers on each table that fill a history table with deleted entries. Sync that over to your temp DB and use it to delete the corresponding rows in your primary DB.
You either want one delete-history table per table, or a combined history table that also includes the name of the table that triggered the deletion.
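For example, a minimal delete-history trigger on the source side might look like this (SQL Server syntax; the Customer table and its Id column are just assumptions):
CREATE TABLE dbo.DeleteHistory (
    TableName SYSNAME NOT NULL,
    DeletedId INT NOT NULL,
    DeletedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO
CREATE TRIGGER trg_Customer_Delete
ON dbo.Customer
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- "deleted" is the pseudo-table holding the rows that were just removed
    INSERT INTO dbo.DeleteHistory (TableName, DeletedId)
    SELECT 'Customer', d.Id
    FROM deleted AS d;
END;
GO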
You might want to look into SQL Compare or other tools for synching tables.
If you have access to tempDB and primeDB at the same time (same server or linked servers), you could also try something like
DELETE p
FROM primeDB.dbo.Tablename AS p
WHERE NOT EXISTS (
    SELECT 1
    FROM tempDB.dbo.Tablename AS t
    WHERE t.Id = p.Id
);
which will perform awfully; ask your DB designers.
In this scenario, if the temp DB and the primary DB have no direct reference to each other, you can use event notifications at the database level to track the changes.
Here is a link I found on the same topic:
https://www.mssqltips.com/sqlservertip/2121/event-notifications-in-sql-server-for-tracking-changes/
I have a help desk system that uses a MySQL database and I've linked to some of its tables from Access 2016. I also created a local table in the Access DB that contains some fields for management to add info such as comments, priority, target date and a few others and linked it to the main 'ticket' table's ID I brought in from MySQL. What I'd like to do is pull in the ticket data (read-only) and display it on a form with my local table data (writable) so management can track the status of the tickets. Here's the property sheet of the query:
Record Locks = Edited Record
Recordset Type = Dynaset (Inconsistent Updates)
So, the linked table 'Tickets' is left joined to my comments table (the comments table has the ticket's ID). When I run the query (which includes fields from both the local and linked tables), I can add info into the fields from the local table, and a record is created in that local table, but the ID from the linked table (Tickets) is not written to the local table (Comments), although a sub-record is created (the record in the local table has a 'plus' sign that opens what I assume is the record from the MySQL DB).
I tried using a one-to-one relationship (which is the most there should ever be) but I don't get anything because the local table doesn't have any records yet.
Hopefully I'm overlooking something simple...
I want to be able to update a specific column of a table using data from another table. Here's what the two tables look like, the DB type and SSIS components used to get the tables data (btw, both ID and Code are unique).
Table1(ID, Code, Description) [T-SQL DB accessed using ADO NET Source component]
Table2(..., Code, Description,...) [MySQL DB accessed using ODBC Source component]
I want to update the column Table1.Description using the Table2.Description by matching them with the right Code first (because Table1.Code is the same as Table2.Code).
What I tried:
Doing a Merge Join transformation using the Code column, but I couldn't figure out how to reinsert the result, because Table1 has relationships so I can't simply drop the table and replace it with the new one.
Using a Lookup transformation, but since the two tables are not of the same type, it didn't allow me to create the lookup table's connection manager (which in my case would be MySQL).
I'm still new to SSIS, but any ideas or help would be greatly appreciated.
My solution is based on @Akina's comments. Although using a linked server would've definitely fit, my requirement is to make an SSIS package to take care of migrating some old data.
The first and last are SQL tasks, while the Migrate ICDDx is the DFT that transfers the data to a staging table created during the first SQL task.
Here are the SQL commands that get executed during Create Staging Table:
DROP TABLE IF EXISTS [tempdb].[##stagedICDDx];
CREATE TABLE ##stagedICDDx (
ID INT NOT NULL,
Code VARCHAR(15) NOT NULL,
Description NVARCHAR(500) NOT NULL,
........
);
and here's the SQL command (based on @Akina's comment) for transferring from staged to final (inside Transfer Staged):
UPDATE [MyDB].[dbo].[ICDDx]
SET [ICDDx].[Description] = [##stagedICDDx].[Description]
FROM [dbo].[##stagedICDDx]
WHERE [ICDDx].[Code]=[##stagedICDDx].[Code]
GO
Here's the DFT used (both the T-SQL and MySQL sources return sorted output using ORDER BY Code, so I didn't have to insert Sort components before the Merge Join):
Note: By the way, you have to set up the connection manager to retain/reuse the same connection (RetainSameConnection) so that the temporary table doesn't get dropped before we transfer data into it. If all goes well, then after the Transfer Staged SQL Task the connection is closed and the global temporary table is dropped.
I am designing a MySQL database for a new project. I will be importing 50-60 MB of data on a daily basis.
There will be a main table with a primary key. Then there will be child tables with their own primary key and a foreign key pointing back to the main table.
New data has to be parsed from a giant text file and then some minor manipulations made prior to importing into the master database. The parsing and import operation may involve a significant amount of troubleshooting so I want to import new data into a temporary database and ensure its integrity before adding to the master.
For this reason, I thought initially to parse and import new data into a separate, temporary database each day. In this way, I would be able to inspect the data prior to adding to the master and at the same time I would have each day's data stored as a separate database should I ever need to rebuild the master later on from the individual temporary databases.
I am considering the use of primary keys / foreign keys with the InnoDB engine in order to maintain relational integrity across tables. This means I have to worry about auto-increment ids (primary key) not having any duplicates when I go to import the new data each day.
So, given this situation, what would be best?
Make a copy of the master and import directly into the copy of the master each day. Replace existing master with the new copy.
Import new data into a temporary database each day, but change the auto-increment start value of the primary keys to be greater than the maximum in the master (a rough sketch of this follows the list). Would I then also have to change the auto-increment values of the primary keys for all tables (the main table and its children)?
Import new data into a temporary database each day, not worrying about the primary key values. Find some other way to merge the temporary database with the master without collisions of the primary keys? If using this strategy, how can I update the primary key in the main table for the new data while making sure all the relationships with the child tables remain correct?
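For concreteness, I assume option 2 would boil down to something like this before each day's import (database/table names and the offset value are made up; the real offset would come from the master's current maximum ID):
ALTER TABLE temp_db.main_table AUTO_INCREMENT = 100000;
-- and presumably the same for each child table:
ALTER TABLE temp_db.child_table AUTO_INCREMENT = 100000;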
I'm not sure this is as complicated as you are making it?
Why not just do this:
Import raw data into temporary table (why does it have to be a separate database?)
Run your transformations/integrity checks on the temporary table.
When the data is good, insert it directly into the master table.
Use auto-incrementing IDs on the master table that are not dependent on the data being imported. That allows you to have a unique ID as well as the original IDs that might have existed in your import.
Add a field to your master table(s) that gives you a record of which import the records came from.
In addition to copying the data to your master table, make a log that ties back to the data you merged. Helps you back out the data if you find it's wrong/bad and gives you an audit trail.
In the end just set up a sandbox database, write a bunch of stored procedures and test the crap out of it. =)
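As a rough illustration of the staging-then-merge flow above (every table and column name here is hypothetical):
-- Staging table holds the raw parsed rows plus the id they had in the source file, if any.
CREATE TABLE staging_main (
    source_id INT,
    col_a     VARCHAR(100),
    col_b     INT
);
-- Master table assigns its own ids and records which import each row came from.
CREATE TABLE master_main (
    id              INT AUTO_INCREMENT PRIMARY KEY,
    source_id       INT,
    col_a           VARCHAR(100),
    col_b           INT,
    import_batch_id INT NOT NULL
);
-- After the integrity checks on staging_main pass, merge it into the master:
INSERT INTO master_main (source_id, col_a, col_b, import_batch_id)
SELECT source_id, col_a, col_b, 42   -- 42 = id of today's import batch (made up)
FROM staging_main;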
I've finally reached the data migration part of my project and am now trying to move data from MySQL to SQL Server.
SQL Server has new schema (mapping is not always one to one).
I am trying to use SSIS for the conversion, which I started learning this morning.
We have customer and customer location tables in MySQL and equivalent tables in SQL Server. In SQL Server all my tables now have a surrogate key column (GUID), and I am creating it in a Script Component.
Also note that I do have a primary key in the current MySQL tables.
What I am looking for is how I can add child records to the customer location table with the newly created GUID as the parent key.
I see that SSIS has a Foreach Loop container; is this of any use here?
If not, another possibility that I can think of is to create two Data Flow Tasks and, [somehow] just before the master data is sent to the Destination Component [Table] in the primary data flow task, add a variable with the newly created GUID and another with the old primary ID, which would then be used to build the source for the data flow task for the child records.
Maybe to simplify, this could also be done once the data flow task for the master is complete; the data flow task for the children would then read this master data and insert the child records from MySQL into the SQL Server table. This would, though, mean that I have to load all my parent table records back into memory.
I know this is all too confusing, and that is mainly because I am very confused :-(, so bear with me, and if you want more information let me know.
I have been through many links that I found through a Google search, but none of them really explains (or I was not able to understand) how the process is carried out.
Please advise
regards,
Mar
**Edit 1**
After further searching and refining keywords, I found this link on SO and am going through it to see if it can be used in my scenario:
How to load parent child data found in EDI 823 lockbox file using SSIS?
OK, here is what I would do. Put the MySQL data into staging tables in SQL Server that have identity columns set up and an extra column for the eventual GUID, which will start out as null. Now your records have a primary key.
Next comes the sneaky trick. Pick a required field (we use last_name) and, instead of the real data, insert the value from the ID field in the staging table. Now you have a record that has both the GUID and the ID in it. Update the GUID field in the staging table by joining to it on the ID and the required field you picked out. Now update the last_name field with the real data.
To avoid the sneaky trick, and if this is only a one-time upload, add a column to your tables that contains the staging table ID. Again, you can use this to get the GUID when inserting into the related tables. Then, when you are done, drop the extra column.
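A rough sketch of that one-time approach (every table and column name here is hypothetical, and the GUID columns are assumed to be filled by a DEFAULT as shown further down):
-- Parents first, carrying the old MySQL id in a temporary helper column.
INSERT INTO dbo.Customer (OldID, Name)
SELECT s.OldID, s.Name
FROM dbo.Customer_Staging AS s;
-- Child rows look up their new parent GUID through the old MySQL id.
INSERT INTO dbo.CustomerLocation (CustomerGUID, Address)
SELECT c.CustomerGUID, s.Address
FROM dbo.CustomerLocation_Staging AS s
JOIN dbo.Customer AS c ON c.OldID = s.OldCustomerID;
-- Once the children are loaded, the helper column can be dropped.
ALTER TABLE dbo.Customer DROP COLUMN OldID;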
You are aware that there are performance issues involved with using GUIDs? Make sure not to make them the clustered index (which, as the PK, they will be by default unless you specify differently) and use NEWSEQUENTIALID() to populate them. Why are you using GUIDs? If an identity would work, it is usually better to use it.
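Hypothetical DDL illustrating that advice (sequential GUIDs via a default, primary key not clustered):
CREATE TABLE dbo.Customer (
    CustomerGUID UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Customer_GUID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED,
    CustomerSeq  INT IDENTITY(1,1) NOT NULL,   -- narrow, ever-increasing clustering key
    OldID        INT NULL,                     -- temporary helper column from the sketch above
    Name         NVARCHAR(200) NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX CIX_Customer_Seq ON dbo.Customer (CustomerSeq);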
Is there a way to keep a timestamped record of every change to every column of every row in a MySQL table? This way I would never lose any data and keep a history of the transitions. Row deletion could be just setting a "deleted" column to true, but would be recoverable.
I was looking at HyperTable, an open source implementation of Google's BigTable, and this feature really whetted my appetite. It would be great if I could have it in MySQL, because my apps don't handle the huge amount of data that would justify deploying HyperTable. More details about how this works can be seen here.
Is there any configuration, plugin, fork or whatever that would add just this one functionality to MySQL?
I've implemented this in the past in a PHP model similar to what chaos described.
If you're using MySQL 5, you could also accomplish this with triggers that hook into the UPDATE and DELETE events of your table.
http://dev.mysql.com/doc/refman/5.0/en/stored-routines.html
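In trigger terms, a minimal sketch might look like this (the widget table and its columns are just assumptions):
-- History table capturing the old state of each changed or deleted row.
CREATE TABLE widget_history (
    history_id INT AUTO_INCREMENT PRIMARY KEY,
    widget_id  INT NOT NULL,
    name       VARCHAR(100),
    price      DECIMAL(10,2),
    action     ENUM('UPDATE','DELETE') NOT NULL,
    changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
DELIMITER //
CREATE TRIGGER widget_before_update
BEFORE UPDATE ON widget
FOR EACH ROW
BEGIN
    INSERT INTO widget_history (widget_id, name, price, action)
    VALUES (OLD.id, OLD.name, OLD.price, 'UPDATE');
END//
CREATE TRIGGER widget_before_delete
BEFORE DELETE ON widget
FOR EACH ROW
BEGIN
    INSERT INTO widget_history (widget_id, name, price, action)
    VALUES (OLD.id, OLD.name, OLD.price, 'DELETE');
END//
DELIMITER ;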
I do this in a custom framework. Each table definition also generates a Log table related many-to-one with the main table, and when the framework does any update to a row in the main table, it inserts the current state of the row into the Log table. So I have a full audit trail on the state of the table. (I have time records because all my tables have LoggedAt columns.)
No plugin, I'm afraid, more a method of doing things that needs to be baked into your whole database interaction methodology.
Create a table that stores the following info...
CREATE TABLE MyData (
ID INT IDENTITY,
DataID INT )
CREATE TABLE Data (
ID INT IDENTITY,
MyID INT,
Name VARCHAR(50),
Timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)
Now create a sproc that does this...
INSERT Data (MyID, Name)
VALUES (@MyID, @Name)
UPDATE MyData SET DataID = @@IDENTITY
WHERE ID = @MyID
In general, the MyData table is just a key table. You then point it at the record in the Data table that is the most current. Whenever you need to change data, you simply call the sproc, which inserts the new data into the Data table and then updates MyData to point to the most recent record. All of the other tables in the system would key off of MyData.ID for foreign key purposes.
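To read the current state with this layout, a query joins through the key table, e.g. (column names follow the tables above):
SELECT m.ID, d.Name, d.Timestamp
FROM MyData AS m
JOIN Data AS d ON d.ID = m.DataID;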
This arrangement sidesteps the need for a second log table (and keeping the two in sync when the schema changes), but at the cost of an extra join and some overhead when creating new records.
Do you need it to remain queryable, or will this just be for recovering from bad edits? If the latter, you could just set up a cron job to back up the actual files where MySQL stores the data and send it to a version control server.