How to insert the updated record count, new record count and table name into an audit table using SSIS?

I am creating an SSIS package with 15 data flow tasks, one for each table. In each data flow task I insert records from source to destination: if a record already exists in the destination I update it, otherwise I insert it as a new record. Now I want to store the updated record count, the new record count and the table name in an audit table.
How can I do that easily?
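For illustration only, the kind of audit insert being asked about might look like the sketch below. dbo.AuditLog and its columns are assumed names, and the two counts would come from package variables filled by Row Count transformations placed on the insert and update paths of each data flow:

-- Hypothetical audit table; names are placeholders.
CREATE TABLE dbo.AuditLog (
    TableName      VARCHAR(128) NOT NULL,
    NewRecords     INT          NOT NULL,
    UpdatedRecords INT          NOT NULL,
    LoadDate       DATETIME     NOT NULL DEFAULT GETDATE()
);

-- Run from an Execute SQL Task after each data flow,
-- mapping the ? parameters to the package variables.
INSERT INTO dbo.AuditLog (TableName, NewRecords, UpdatedRecords)
VALUES (?, ?, ?);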

Related

Delete and insert records using a lookup in SSIS

I have an SSIS package with an ODBC source. After an Aggregate, a lookup inserts records into a SQL table through an OLE DB destination: the lookup sends new records to the destination table (SQL) and matching records to a temp table (SQL). The destination table has a primary key on a date field, and while inserting, some records are duplicated and violate the PK. So, based on the date field, I want to delete the last 3 days of records and insert them again.
In SSIS I first took an Execute SQL Task and wrote a delete query, then a Data Flow Task to move the data from source to destination. I am still not getting the correct records.
Task:
Before inserting, delete the existing records and then insert them again.
Please help me out with how to achieve this.
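As a rough sketch, the delete in the Execute SQL Task could look something like the following (dbo.DestinationTable and DateField are assumed names):

-- Remove the last three days of data so the data flow can reload them
-- without violating the primary key on the date field.
DELETE FROM dbo.DestinationTable
WHERE DateField >= DATEADD(DAY, -3, CAST(GETDATE() AS DATE));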

Dealing with large overlapping sets of data - updating just the delta

My Python application generates a CSV file containing a few hundred unique records, one unique row per line. It runs hourly and very often the data remains the same from one run to another. If there are changes, then they are small, e.g.
one record removed.
a few new records added.
occasional update to an existing record.
Each record is just four simple fields (name, date, id, description), and there will be no more than 10,000 records by the time the project is at maximum, so it can all be contained in a single table.
What is the best way to merge the changes into the table?
A few approaches I'm considering are:
1) empty the table and re-populate on each run.
2) write the latest data to a staging table and run a DB job to merge the changes into the main table.
3) read the existing table data into my Python script, collect the new data, find the differences, and run multiple CRUD operations to apply the changes one by one
Can anyone suggest a better way?
thanks
I would do this in the following way:
Load the new CSV file into a second table.
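A rough sketch of that load, assuming MySQL, a staging table cloned from the main table, and an assumed file path:

-- Sketch only: staging table and file path are assumptions.
CREATE TABLE new_table LIKE main_table;
LOAD DATA LOCAL INFILE '/path/to/export.csv'
INTO TABLE new_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(name, date, id, description);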
DELETE rows in the main table that are missing from the second table:
DELETE m FROM main_table AS m
LEFT OUTER JOIN new_table AS t ON m.id = t.id
WHERE t.id IS NULL;
Use INSERT ON DUPLICATE KEY UPDATE to update rows that need to be updated. This becomes a no-op on each row that already contains the same values.
INSERT INTO main_table (id, name, date, description)
SELECT id, name, date, description FROM new_table
ON DUPLICATE KEY UPDATE
name = VALUES(name), date = VALUES(date), description = VALUES(description);
Drop the second table once you're done with it.
This is assuming id is the primary key and you have no other UNIQUE KEY in the table.
Given a data set size of 10,000 rows, this should be quick enough to do in one batch. Once the data set gets 10x larger, you may have to reconsider the solution, for example by doing batches of 10,000 rows at a time.
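If it ever comes to that, one way to batch the DELETE step in MySQL (sketch only, repeated until no rows are affected) is to pick the orphaned ids through a derived table:

-- Sketch: delete at most 10,000 orphaned rows per run; rerun until
-- ROW_COUNT() returns 0. The derived table lets MySQL reference
-- main_table both as the delete target and inside the subquery.
DELETE FROM main_table
WHERE id IN (
    SELECT id FROM (
        SELECT m.id
        FROM main_table AS m
        LEFT OUTER JOIN new_table AS t ON m.id = t.id
        WHERE t.id IS NULL
        LIMIT 10000
    ) AS batch
);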

SSIS - Update a few columns of a row for which the primary key already exists

The following is an example to better explain my scenario. My database table has the following columns:
Column 1: Operating_ID (the primary key)
Column 2: Name
Column 3: Phone
Column 4: Address
Column 5: Start Date
Column 6: End Date
The values for columns 1, 2, 3 and 4 come from an extract, which is pushed to the database daily using an SSIS data flow task.
The values for columns 5 and 6 are entered by users in a web application and saved to the database.
Now, in the SSIS process, instead of throwing a primary key violation error, I need to update columns 2, 3 and 4 if the primary key (column 1) already exists.
I first considered a replace, but that deletes the user-entered data in columns 5 and 6.
I would like to keep the data in columns 5 and 6 and update columns 2, 3 and 4 when column 1 already exists.
Do a Lookup on Operating_ID. Change the lookup's no-match behaviour from "Fail component" to "Redirect rows to no match output".
If no match is found, go to the INSERT path.
If a match is found, go to the UPDATE path. You can use an OLE DB Command to do the updates, but if it is a large set of data you are better off writing the matched rows to a table and doing an UPDATE with a JOIN.
This is what I would do. I would put all the data in a staging table. Then I would use a data flow to insert the new records, with the staging table as the source of that data flow and a NOT EXISTS clause referencing the prod table.
Then I would use an Execute SQL task in the control flow to update the data for existing rows.
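Roughly, the two pieces of SQL that approach translates to (Staging and Prod are assumed table names, keyed on Operating_ID):

-- Data flow source (or a plain INSERT ... SELECT): only rows whose key
-- is not yet in the prod table.
INSERT INTO Prod (Operating_ID, Name, Phone, Address)
SELECT s.Operating_ID, s.Name, s.Phone, s.Address
FROM Staging AS s
WHERE NOT EXISTS (SELECT 1 FROM Prod AS p WHERE p.Operating_ID = s.Operating_ID);

-- Execute SQL Task: refresh the extract-fed columns for existing rows,
-- leaving the user-entered Start Date / End Date untouched.
UPDATE p
SET p.Name = s.Name,
    p.Phone = s.Phone,
    p.Address = s.Address
FROM Prod AS p
INNER JOIN Staging AS s ON s.Operating_ID = p.Operating_ID;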

How to get unique records using Pentaho Data Integration

I have log_table with columns state, region, district, timestamp in SQL Server.
ID  state  region  district  timestamp
--  -----  ------  --------  -----------------------
1   GJ     RE056   DI137     2014-02-05 09:00:00.257
2   CA     RE027   DI154     2014-02-04 14:00:00.183
3   GJ     RE056   DI137     2014-12-09 16:00:00.257
I would like to load these records into another table in MySQL as unique records. While loading data from SQL Server to MySQL, existing data should not be inserted into the new table again; only the daily updated records should be loaded, without duplicates. Please help me with how to do this using Pentaho Data Integration.
I assume the timestamp column is the last-updated timestamp of a row in the source database (SQL Server).
If your goal is to run the transformation daily and you expect only new or updated records from the source database to be loaded into the target (MySQL) database, you need to store the timestamp in the target database (e.g. in a table log_target). The transformation steps could be:
Table input (target db): Get MAX timestamp from table log_target.
SELECT COALESCE(MAX(timestamp), '1970-01-01 00:00:00') AS max FROM log_target
Table input (source db): Select updated data from log_table
Step settings: "Insert data from step" (to obtain data from the previous step); "Replace variables in script?" set to true.
SELECT * FROM log_table WHERE (timestamp > ?)
Process your data
Table output or Insert/Update (target db): Store output data to log_target table. Don't forget to store timestamp value.
Table Input -> Sort rows (state, region, district, timestamp descending) -> Unique rows -> Insert/Update.
Sorting rows by timestamp descending will help you keep the last modified record after removing duplicates.

How to only sync the new records from table A to table B in SSIS?

Does anyone know how I can sync only the new records from table A to table B in SSIS?
I have 2 tables, table A and table B. Table A is updated from time to time by users, and table B is synced from table A every 30 minutes. I have created an SSIS job to do the sync; see below for more details:
The issue is that every time I run the job, it copies all the data from table A and inserts it into table B (which causes duplicate records to be inserted). Is there any way I can set up the job so that it syncs only the new records into table B?
I would use a MERGE statement in an Execute SQL Task.
Please review some examples about how to use the merge command:
Using MERGE in SQL Server to insert, update and delete at the same time
The MERGE Statement in SQL Server 2008
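A minimal sketch of such a MERGE for this case (Id, Col1 and Col2 are placeholder names for the real key and columns; WHEN MATCHED / WHEN NOT MATCHED BY SOURCE clauses can be added to update or delete, as the linked articles show):

-- Sketch only: copy rows from table A that do not yet exist in table B.
MERGE INTO dbo.TableB AS target
USING dbo.TableA AS source
    ON target.Id = source.Id
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Col1, Col2)
    VALUES (source.Id, source.Col1, source.Col2);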
Kind Regards,
Paul