There are duplicates in SSIS - ssis

I have built an SSIS package that should insert values when they don't match the lookup.
I use a lookup on 4 columns.
Then I ran a check on the source data
(combining the 4 columns into an nvarchar to make the check easy):
Select value
From X
Group by value
Having count(value) > 1
There are no duplicates,
but the lookup still says there are duplicates.
The package doesn't error; it just keeps spinning and says there are duplicates in the output.
Is there a way to catch which value it thinks is a duplicate?

Related

SSIS- Update few columns of a row for which the primary key already exists

The following is an example to better explain my scenario. My database table has the following columns:
Column 1: Operating_ID (the primary key)
Column 2: Name
Column 3: Phone
Column 4: Address
Column 5: Start Date
Column 6: End Date
The values for columns 1, 2, 3, and 4 come from an extract, and this extract is pushed to the database daily using an SSIS data flow task.
The values for columns 5 and 6 are entered by users from a web application and saved to the database.
Now, in the SSIS process, instead of throwing a primary key violation error, I need to update columns 2, 3, and 4 if the primary key (column 1) already exists.
First I considered REPLACE, but that deletes the user-entered data in columns 5 and 6.
I would like to keep the data in columns 5 and 6 and update columns 2, 3, and 4 when column 1 already exists.
Do a LOOKUP on Operating_ID. Change that lookup from "Fail component" on no match to "Redirect rows to no match output".
If no match is found, go to an INSERT.
If a match is found, go to an UPDATE. You can use the OLE DB Command transformation to run the update, but if it is a large set of data, you are better off putting the rows into a table and doing an UPDATE with a JOIN.
This is what I would do. I would put all the data in a staging table. Then I would use a data flow to insert the new records, and the source of that data flow would be the staging table with a NOT EXISTS clause referencing the prod table.
Then I would use an Execute SQL task in the control flow to update the data for existing rows.
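For the staging approach above, the SQL might look like the following sketch. The table names Staging_Extract and Operating are assumptions based on the question's column list, not names from the original post:

```sql
-- Sketch only; Staging_Extract and Operating are hypothetical names.

-- Update columns 2-4 for rows whose Operating_ID already exists,
-- leaving the user-entered Start/End dates untouched:
UPDATE o
SET o.Name    = s.Name,
    o.Phone   = s.Phone,
    o.Address = s.Address
FROM Operating AS o
INNER JOIN Staging_Extract AS s
    ON s.Operating_ID = o.Operating_ID;

-- Insert only the genuinely new rows:
INSERT INTO Operating (Operating_ID, Name, Phone, Address)
SELECT s.Operating_ID, s.Name, s.Phone, s.Address
FROM Staging_Extract AS s
WHERE NOT EXISTS (SELECT 1
                  FROM Operating AS o
                  WHERE o.Operating_ID = s.Operating_ID);
```

The UPDATE runs from the Execute SQL task; the INSERT can either run there too or serve as the source query of the data flow.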

How to run a series of checks governed by a MySQL table and store the results

I have to run a series of checks (governed by the table "Checks") and store the results in a table "Check_results" (in a MySQL database).
The table "Checks" contains an identifier (checkno) and an SQL statement (possibly returning many rows with a single value) to be executed.
The table "Check_results" has to contain all the rows returned by the SQL statement, with a reference to checkno and an auto-increment column checkentry for each returned row.
Is it possible to do this?
What I was suggesting was: once your table holds the SQL statements, read each record and construct another SQL statement along the lines of:
insert into check_results(checkno, checkresult)
select 1, i.val1 - i.val2 from import i;
The select just needs the checkno added into it; checkentry should be an auto-increment column.
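In MySQL the per-check statement can be built and executed dynamically with a prepared statement. A minimal sketch for a single check, assuming each stored statement returns one column; the column names sqlstmt and checkresult are assumptions:

```sql
-- Assumed schema:
--   Checks(checkno INT, sqlstmt TEXT)
--   Check_results(checkentry INT AUTO_INCREMENT PRIMARY KEY,
--                 checkno INT, checkresult VARCHAR(255))
SET @checkno = 1;
SELECT sqlstmt INTO @stmt FROM Checks WHERE checkno = @checkno;

-- Wrap the stored statement as a derived table and insert its rows:
SET @sql = CONCAT(
    'INSERT INTO Check_results (checkno, checkresult) ',
    'SELECT ', @checkno, ', t.* FROM (', @stmt, ') AS t');
PREPARE run_check FROM @sql;
EXECUTE run_check;
DEALLOCATE PREPARE run_check;
```

To process every row of "Checks", the same steps would go inside a stored procedure that loops over the table with a cursor.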

SSIS - Replace the duplicate column with an empty string, keeping the original column

Can anyone please help me with the below requirement?
I need to check whether a column in a record matches any other column; if it does, I want to replace the duplicate column with an empty string.
Say I have columns x1, x2, x3. How do I check if x1 matches any of the x1, x2, x3 columns, and if it matches, replace the duplicate column with an empty string?
Doing this is more complex than one would expect. Here are 2 options:
Try the fuzzy lookup by duplicating the file and comparing it with itself with a high threshold. I suspect you want to check, for the same record, whether there is a match on other columns, so you will need to create an exact match on the key (go under the Columns tab, right-click on the link, and choose Edit Mappings) and do the fuzzy match on the others. You can only link a field once, so duplicate the columns as needed.
Do a stored proc with all the combinations and have it generate an output table with the results (you can run a stored proc using the OLE DB Command). I would probably go with that one if I am sure of the "exactness" of the data. Otherwise, go with the fuzzy lookup.
Since you only have a few columns, you could just run a set of update statements like the following:
update Contacts
set Phone2 = null
where Phone2 = Phone1
update Contacts
set Phone3 = null
where Phone3 = Phone1
update Contacts
set Phone3 = null
where Phone3 = Phone2
Accomplishing this task within an SSIS data flow would be a bit tricky, because you would be trying to compare the current row against all of the other rows in all the buffers.
Instead, I would recommend staging the data in a table, as Gordon Bell has suggested. Then you need to determine which row wins when a duplicate is found. You might have a date column to sort it out, or you could add a row number column in the SSIS data flow and sort by the order in which you received the data.
Here is an example of how you might find the winning row and update the others with a self join: Deleting duplicate record in SQL Server
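A sketch of that pattern using ROW_NUMBER() over the staged data; the table name Contacts_Staging and the LoadDate tie-breaker column are assumptions for illustration:

```sql
-- Sketch only; assumes staged rows in Contacts_Staging and that the
-- earliest LoadDate "wins" within each duplicate group.
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Phone1, Phone2, Phone3
                              ORDER BY LoadDate) AS rn
    FROM Contacts_Staging
)
DELETE FROM ranked
WHERE rn > 1;  -- removes every row except the winner in each group
```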

SSIS Lookup data update

I have created an SSIS package that reads data from a CSV file and loads it into table1. The other data flow task does a lookup on table1. Table1 has columns x, y, z, a, b. Table2 has columns a, b, y, z. The lookup is done on columns y and z. Based on columns y and z, it picks up a and b from table1 and updates table2. The problem is that the data gets updated, but I get multiple rows: one without the update and one with it.
I can provide a clearer explanation if needed.
Fleshing out Nick's suggestion, I would get rid of your second data flow (the one from Table 2 to Table 2).
After the first data flow that populates Table 1, just add an Execute SQL task that performs an UPDATE on Table 2, joining to Table 1 to get the new data.
EDIT in response to comment:
You need to use a WHERE clause that will match rows uniquely. Apparently Model_Cd is not a unique column in JLRMODEL_DIMS. If you cannot make the WHERE clause unique because of the relationship between the two tables, then you need to select either an aggregate of [Length (cm)], such as MIN() or MAX(), or use TOP 1, so that you only get one row from the subquery.
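To illustrate forcing a single row from the subquery: only Model_Cd, [Length (cm)], and JLRMODEL_DIMS come from the comment above; SourceTable is a placeholder name.

```sql
-- Sketch: the aggregate guarantees the subquery returns exactly
-- one value per Model_Cd even when the join is not unique.
UPDATE j
SET j.[Length (cm)] =
    (SELECT MAX(s.[Length (cm)])   -- or SELECT TOP 1 ... with an ORDER BY
     FROM SourceTable AS s
     WHERE s.Model_Cd = j.Model_Cd)
FROM JLRMODEL_DIMS AS j;
```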

Show Duplicates on MySQL bulk insert

I have a CSV file which I need to load into my database. My modus operandi is a bulk insert. One of the columns has a uniqueness constraint attached to it, but it is not the primary key column. If there is a duplicate entry, it correctly skips the line and does not enter it into the database. (On the command line it indicates Duplicates: n, where n is the total number of duplicates.)
Is there any way I can retrieve the duplicate row numbers? For instance, Show Warnings and Show Errors state the last MySQL errors and warnings; is there any way I can retrieve the duplicates from MySQL alone?
Thanks.
You could enter the data into a temporary table first, without the uniqueness constraint, and perform a query to find all the duplicates.
SELECT unique_column, count(*) c
FROM temp_tablename
GROUP BY unique_column
HAVING c > 1;
Then copy it from the temporary table to the real table with:
INSERT IGNORE INTO tablename SELECT * FROM temp_tablename;
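If you also want to see which staged rows would be skipped (rather than just counting duplicates within the file), a sketch that lists the rows colliding with data already in the real table:

```sql
-- Staged rows whose unique_column already exists in the real table;
-- these are the rows INSERT IGNORE will silently drop.
SELECT t.*
FROM temp_tablename AS t
JOIN tablename AS r
  ON r.unique_column = t.unique_column;
```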