Customised data load in SSIS

I am trying to design an SSIS package where, if records in a file arrive for the first time, they are loaded into the primary table; otherwise they are loaded into the secondary table.
Files from different customers have unique customer IDs.
I am not sure whether this is possible in SSIS... can anyone please help me with this?
Thanks

You can read the data into a staging table and then compare the ID in the staging table with the ID in the target table; if it matches, insert into the secondary table, else insert into the primary table.
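A minimal T-SQL sketch of that staging comparison; the names stg_customer_load, primary_table, secondary_table and the customer_id/col1 columns are all hypothetical, not from the question:

-- Rows whose customer_id already exists in the primary table go to the secondary table
INSERT INTO secondary_table (customer_id, col1)
SELECT s.customer_id, s.col1
FROM stg_customer_load AS s
WHERE EXISTS (SELECT 1 FROM primary_table AS p WHERE p.customer_id = s.customer_id);

-- First-time customer_ids go to the primary table
INSERT INTO primary_table (customer_id, col1)
SELECT s.customer_id, s.col1
FROM stg_customer_load AS s
WHERE NOT EXISTS (SELECT 1 FROM primary_table AS p WHERE p.customer_id = s.customer_id);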

If you add a Lookup transformation to your data flow, you can compare the incoming data to table1: rows with no match go to table1, and rows that do match go to table2.

Related

Pentaho Kettle (Spoon) - Delete Records From Different Tables

I'm trying to delete records in my target table based on whether records exist in the source table. I tried using a 'Delete' step, but I noticed that this step is based on a conditional clause.
My condition is quite simple "if the record/row does NOT exist in table A [source], delete the record/row from table B [destination]".
I also read about using a 'Merge Rows (diff)' step, but it seems to check/compare the entire set of tables for differences.
The table has several million records with many hundreds of columns in a MySQL server; I need to do this in the most efficient way.
I'm querying table A with a Table input step and this SQL command:
SELECT id, user, password, attribute, op FROM viewuserradiusunisulma
Any help would be appreciated.
[Screenshot: Pentaho transformation with a Delete step]
If your source and target tables are in the same database, you can use a SQL query to delete all records in tableB that don't have a corresponding entry in tableA:
DELETE FROM tableB WHERE NOT EXISTS (SELECT tableA.id FROM tableA WHERE tableA.id = tableB.id);
If the source and destination tables are not in the same database, you would have to go through all rows in tableB and check whether each record exists in tableA. If your source tableA has a limited number of rows, loading the key values into memory and then performing a stream lookup instead of a database lookup would be much faster. I'd probably try that even with a higher number of rows, because of the significant performance impact.
Note: I hope I haven't messed up the SQL syntax; I'm thinking almost exclusively in ABAP at the moment, and that messes with my memory a bit. So please test this on a backup before firing away.
I found the solution. In this case, I check the records, then report, update and enter the new data.
[Screenshot: transformation]

CSV to MySQL INSERT INTO the last row

I have a problem: when I import data from a .csv file into MySQL, the data is inserted as a new row. I want this data next to the other cells, in the same row.
I have an auto-incrementing ID, and the default/expression of the columns is NULL. I don't know where the problem is.
Can you help me?
Thanks!!
Assuming that the imported data rows have some columns with values in common with the existing data (auto-incrementing IDs won't do it), import the data into a temporary table, then use an UPDATE statement to join your base table to the temporary table on the common columns and transfer the data from the temporary table to the base table.
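A minimal MySQL sketch of that approach; base_table, the staging table csv_import, and the email/phone/address columns are hypothetical names for illustration:

-- Load the CSV into csv_import first, then join on the shared column
UPDATE base_table AS b
JOIN csv_import AS c ON c.email = b.email
SET b.phone = c.phone,
    b.address = c.address;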

Duplicate data in SQL

I have to keep duplicate data in my database, so my question is: is it preferable to keep the duplicate data in the same table and just add a column to identify the original data, or should I create another table to hold the copied data?
I suggest saving the duplicate data in a different table, or even a different schema, so it won't be confusing to keep working with this table.
Imagine yourself six months from now trying to guess what all these duplicate rows are for.
In addition, those duplicate rows do not reflect the business purpose of this table.
It would be nicer to store them in a table named [table_name]_dup or a schema named [schema_name]_dup.
To create a backup, you should read this.
Duplicating a website with its content is a bad solution, but you still have to make a backup and restore it in a different database.
Duplicate a table in MySQL:
CREATE TABLE newtable LIKE oldtable;
INSERT INTO newtable SELECT * FROM oldtable;

Can't copy unique records from one database table to another?

Hi,
I am trying to copy unique records from a database table to a table of the same name in a different database. The source database contains some records that are already present in the destination database, so I don't need those, only the other ones. The destination database is called "test" and the source database is "forums". The table name is store in both cases. I am using this query:
INSERT INTO test.store (cs_key, cs_value, cs_array, cs_updated,cs_rebuild)
SELECT DISTINCT cs_key, cs_value, cs_array, cs_updated,cs_rebuild
FROM forums.store
But I get many errors when I try to run this query. Why?
Thank you.
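One hedged guess at the cause: DISTINCT only removes duplicates within the source result set, so rows that already exist in test.store can still collide with its unique keys. Filtering against the destination instead should avoid that; here cs_key is assumed to be the unique column, which the question doesn't actually state:

INSERT INTO test.store (cs_key, cs_value, cs_array, cs_updated, cs_rebuild)
SELECT s.cs_key, s.cs_value, s.cs_array, s.cs_updated, s.cs_rebuild
FROM forums.store AS s
WHERE NOT EXISTS (SELECT 1 FROM test.store AS t WHERE t.cs_key = s.cs_key);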

SSIS how to find deleted records

I have a data flow from source tables to a destination table. To simplify the question, I'll say there are two merge-joined source tables and one destination table. Also, there are primary keys helping me identify each record.
The package runs every day, and if a record is deleted from a source table, how can I tell which one was deleted so that I can delete it in the destination table?
(FYI~~ I'm already checking whether a record exists in the destination table, and if so I update, else insert; but I don't know how to find deleted data.)
Another possible approach:
Assuming you receive all records from the source, not just inserts and updates:
Amend the package to stamp records that have been inserted or updated, using a unique ID or run datetime (see the sketch after this list).
After the package run, process the destination table rows that weren't inserted or updated in the last run. By a process of elimination, any records that weren't provided in the source file should be deleted.
Again, this assumes that all records are sent, not just inserts and updates. But then again, if you don't receive all records, it's going to be physically impossible to detect whether a record has been deleted.
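A minimal T-SQL sketch of that stamping approach; dest_table and its last_run_stamp column are hypothetical names:

DECLARE @run_start DATETIME2 = SYSUTCDATETIME();

-- ... the package's data flow runs here, setting last_run_stamp = @run_start
-- on every row it inserts or updates ...

-- Post-load cleanup: any row not touched in this run was absent from the source
DELETE FROM dest_table
WHERE last_run_stamp IS NULL OR last_run_stamp < @run_start;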
The problem with comparing source to destination is that you have to compare every source row to the destination in every load, and as the number of rows increases that takes up more and more time.
As a result, the best way to handle this is probably on the source side. Two common approaches are a 'soft delete' where you set a flag column to mark the row as deleted; or a trigger that records the PK of the deleted row in a log table (or moves the entire row to an archive log table). Your ETL process then looks at the flags or the log/archive table to determine which rows were deleted since the last load.
Another possibility is that the source platform offers some built-in feature you can use to track deleted rows, e.g. CDC in SQL Server. But if you have no control at all over the source database (if it even is a database) then there may be no alternative to comparing the full data set.
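A minimal SQL Server sketch of the trigger option; source_table, deleted_log and the pk_id column are hypothetical:

CREATE TABLE deleted_log (
    pk_id      INT       NOT NULL,
    deleted_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

CREATE TRIGGER trg_source_delete ON source_table
AFTER DELETE
AS
BEGIN
    -- Record the primary key of every deleted row so the ETL can replay the deletes
    INSERT INTO deleted_log (pk_id)
    SELECT pk_id FROM deleted;
END;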
One possible approach:
Prior to running the package, delete the destination table records (using a stored procedure; see the sketch after this list).
Just import all records into the destination table.
Pros:
Your destination table will always mirror the incoming data, no need to check for deletions
Cons:
You won't have any historical information (if that is required)
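A sketch of that clear-down as a stored procedure the package could call first; dest_table is a placeholder name:

CREATE PROCEDURE dbo.usp_clear_dest_table
AS
BEGIN
    -- TRUNCATE is faster than DELETE when the whole table is reloaded anyway
    TRUNCATE TABLE dbo.dest_table;
END;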
I had the same problem: how to mark my old/archive records as "deleted" because they no longer exist in the original data source.
Basically, I built two tables: a main table containing all the records that came in from the original data source, and a temporary table where I stored the original data source every time I ran my scripts.
MAIN TABLE
ID, NAME, SURNAME, DATE_MODIFIED, ORDERS_COUNT, etc
plus a STATUS column (1 for Active, 0 for Deleted)
TEMP TABLE (the same as the main table, but without the STATUS column)
ID, NAME, SURNAME, DATE_MODIFIED, ORDERS_COUNT, etc
The key was to update the MAIN table with STATUS = 0 if the ID in the MAIN table was no longer in the TEMP table, i.e. the source records have been deleted.
I did it like this:
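-- Flag main-table rows whose ID no longer appears in the temp table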
UPDATE m
SET m.Status = 0
FROM tblMAIN AS m
LEFT JOIN tblTEMP AS t
ON t.ID = m.ID
WHERE t.ID IS NULL