Removing duplicate rows based on a null value - ssis

We have an SSIS package that does a Union All on two tables, and we need to get rid of a row if it is a duplicate and one of its column values is null. Any ideas?

You can use the Aggregate component to group by all the columns, which gives you a distinct list for your destination.
You will also need a Conditional Split to branch the rows where the column is null away from your target table.
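For comparison, the same two steps can be sketched outside SSIS. This is only an illustration run against an in-memory SQLite database from Python, not the SSIS data flow itself; the tables `a` and `b` and the columns `id` and `val` are hypothetical names.

```python
import sqlite3


def union_distinct_non_null(rows_a, rows_b):
    """Union two row lists, drop rows whose val is NULL, and collapse exact duplicates."""
    con = sqlite3.connect(":memory:")
    # Hypothetical source tables standing in for the two Union All inputs.
    con.execute("CREATE TABLE a (id INTEGER, val TEXT)")
    con.execute("CREATE TABLE b (id INTEGER, val TEXT)")
    con.executemany("INSERT INTO a VALUES (?, ?)", rows_a)
    con.executemany("INSERT INTO b VALUES (?, ?)", rows_b)
    rows = con.execute(
        """
        SELECT id, val
        FROM (SELECT id, val FROM a UNION ALL SELECT id, val FROM b)
        WHERE val IS NOT NULL   -- plays the role of the Conditional Split
        GROUP BY id, val        -- plays the role of the Aggregate (group by all columns)
        ORDER BY id, val
        """
    ).fetchall()
    con.close()
    return rows
```

Grouping by every column has the same effect as SELECT DISTINCT here; the GROUP BY form is used only because it mirrors the Aggregate component.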

Related

SQL loader when clause conditions to satisfy if a column is starting with certain four digits then skip the row

I am trying to create a SQL loader CTL file and need to ignore rows where a column starts with a specific number. Suppose a column record_id = 1211.6540D; the row should be ignored if the column value starts with 1211.
I tried using the position of the digits, but the widths of the column values before record_id are not constant.
Can you please help?

id not auto incremented and duplicate records

This is my table tb_currency_minimum_amount
When I insert a record using
INSERT IGNORE INTO tb_currency_minimum_amount ( id, currency_id, payment_method, minimum_amount) VALUES (NULL, 1, 16, 0.02)
again and again, it creates a new entry each time.
Rows with id 32, 33 and 34 are the same.
It is assigning id 32, not 27, and ignoring duplicates.
The purpose of an AUTO_INCREMENT column is to ensure unique identifiers for the rows in the table.
It is just an implementation detail that it uses consecutive integer values. It is not a requirement for these values to be consecutive. Your code must not rely on the values being consecutive.
The value of the AUTO_INCREMENT is increased by each INSERT statement when a value for that column is not provided, no matter if the updated value is used (the query creates a new row) or not (it fails because of a duplicate key constraint or it updates an existing row because of ON DUPLICATE KEY).
Auto increment simply adds 1 for every record. It does not check whether the other columns are unique or duplicated.
I do not see any duplicates in your id column, which is the auto-increment column.
Could you elaborate on what you are trying to achieve?
It is assigning id 32, not 27, and ignoring duplicates.
If I understand your question correctly, you should use the SQL SELECT DISTINCT statement.
The SELECT DISTINCT statement is used to return only distinct (different) values.
Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values.
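A minimal sketch of what SELECT DISTINCT does with rows like the ones in the question, run from Python against an in-memory SQLite database as a stand-in for MySQL. The table and column names come from the question; the sample ids and values passed in are made up for illustration.

```python
import sqlite3


def distinct_currency_rows(rows):
    """Return the distinct (currency_id, payment_method, minimum_amount) combinations."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tb_currency_minimum_amount ("
        " id INTEGER PRIMARY KEY,"
        " currency_id INTEGER,"
        " payment_method INTEGER,"
        " minimum_amount REAL)"
    )
    con.executemany(
        "INSERT INTO tb_currency_minimum_amount"
        " (id, currency_id, payment_method, minimum_amount) VALUES (?, ?, ?, ?)",
        rows,
    )
    # The id column is left out, otherwise every row would count as distinct.
    result = con.execute(
        "SELECT DISTINCT currency_id, payment_method, minimum_amount"
        " FROM tb_currency_minimum_amount"
        " ORDER BY currency_id, payment_method"
    ).fetchall()
    con.close()
    return result
```

Note that this only filters duplicates when reading; the duplicate rows are still stored in the table.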

How to check each row in a table for at least one null or constant value in SSIS?

I have a table with 20 fields.
1) Is there any way to check all fields if any of them are NULL?
2) For all columns that are nvarchar, I want to search whether they contain a constant string value, e.g. "Missing".
Can I do the above in SSIS without having to check the columns one by one?
There is no Null-inator that would check if any column is null in a row. However, you could concatenate all the columns in the row and if any column is null then the result will be null. So in a derived column task it could look like this:
Field1 + Field2 + (DT_WSTR, 50)Field3...
As demonstrated with Field3, all fields that are not strings would need to be converted.
Similarly, you can find a keyword, like "Missing", by concatenating all the fields in a derived column task and using FINDSTRING(). That could look like this:
FINDSTRING(Field1 + Field2, "Missing",1)
If the value is greater than 0, you have a hit. There are drawbacks, though. Given that your columns are nullable, you would need null handling on all the columns unless this test was only performed on rows where there are no nulls. Also, it doesn't tell you which column has the value, so it is probably not very useful unless you are rejecting or quarantining the entire row.
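The two tricks can be tried outside SSIS too. The sketch below uses SQLite from Python, where `||` concatenation also yields NULL if any operand is NULL and `instr()` returns a 1-based position (0 when not found). These are SQLite functions standing in for the SSIS expressions, not the SSIS expression language itself, and the table `t` is a hypothetical one-row holder for Field1, Field2 and Field3.

```python
import sqlite3


def inspect_row(field1, field2, field3):
    """Return (any_null, position_of_Missing) for one row of Field1, Field2, Field3."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (Field1 TEXT, Field2 TEXT, Field3 INTEGER)")
    con.execute("INSERT INTO t VALUES (?, ?, ?)", (field1, field2, field3))
    any_null, position = con.execute(
        """
        SELECT
            -- NULL propagates through concatenation, so one test covers all columns.
            (Field1 || Field2 || CAST(Field3 AS TEXT)) IS NULL,
            -- COALESCE is the per-column null handling the keyword search needs.
            instr(COALESCE(Field1, '') || COALESCE(Field2, ''), 'Missing')
        FROM t
        """
    ).fetchone()
    con.close()
    return bool(any_null), position
```

As with the derived column version, a hit tells you the row contains the keyword but not which column it came from.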

Not able to insert merge values in one row using ssis?

I have two Excel sources on one Excel file: the first fetches the date value and the second fetches the price value.
I have tried Merge and then Union All to get those two values into one derived column,
but when I execute my package it inserts the values separately,
into two rows one after the other, whereas I want to insert these two values in one row only.
For example, this is my problem:
date price
12-12-2001 null
date price
null 54
but I want it in one row only, like
date price
12-12-2001 54
Create two derived columns with the same value, one in each source (e.g. call them id1 and id2 and set both to 1).
Change the sort on each source to sort by the new id column.
Change the Merge component to a Merge Join and use the newly created ids to link the data with an inner join,
which will give you a single row.
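The constant-key join can be sketched in SQL to show why it produces one row. This runs against an in-memory SQLite database from Python rather than SSIS; the tables `date_src` and `price_src` are hypothetical stand-ins for the two Excel sources, while id1 and id2 are the derived columns described above.

```python
import sqlite3


def merge_single_rows(date_value, price_value):
    """Join one date row and one price row on a constant key, giving a single row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE date_src (date TEXT)")
    con.execute("CREATE TABLE price_src (price INTEGER)")
    con.execute("INSERT INTO date_src VALUES (?)", (date_value,))
    con.execute("INSERT INTO price_src VALUES (?)", (price_value,))
    row = con.execute(
        """
        SELECT d.date, p.price
        FROM (SELECT 1 AS id1, date FROM date_src) AS d        -- derived column id1 = 1
        INNER JOIN (SELECT 1 AS id2, price FROM price_src) AS p -- derived column id2 = 1
            ON d.id1 = p.id2
        """
    ).fetchone()
    con.close()
    return row
```

Because every row carries the same key, this only gives a single row when each source has exactly one row; with more rows on either side the join returns every combination.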

Deleting Duplicates in Access 2003

I have an Access 2003 table with ~4000 records which was made from 17 different tables. Roughly half of these records are duplicates. There is no unique identifying column (id, name etc). There is an id column which was auto filled when the tables were combined meaning that the duplicates aren't completely identical (though this column could be removed if it makes things easier).
I have used the Access Find Duplicates Query Wizard which gives me a list of the duplicated records but won't let me delete them (seriously what use is this query if I can't delete them?). I've tried converting the generated query to a remove query but that changes the number of rows that it finds. I'd alter the sql by hand but it's a bit beyond me and is 7 lines long.
Does anyone know a good way of getting rid of the duplicates?
The reason the find duplicates query won't let you delete the records is because it is basically just an aggregate query, it is counting the number of duplicates it finds and returning the cases where the count is greater than 1.
Consider that if you did make a delete query based on the find duplicates, it would delete all rows that have duplicate values, which is maybe not what you want. You want to delete all but one of the duplicates.
You should delete all duplicates of a record apart from one, excluding the ID column from your comparison. I suggest the simplest way to do this is a make-table query of all the unique values (Select Distinct Field1, Field2... from MyTable) over every field except the ID field, using the results to create a new table of around 2000 records (if half are duplicates).
Then create an ID column on your new table and use an update query to set this ID to the first matching ID in the original table (you could do this using DLookup, which will return the first EXPRESSION value where CRITERIA is true in DOMAIN).
The DLookup() function returns one value from a single field even if more than one record satisfies the criteria. If no record satisfies the criteria, or if the domain contains no records, DLookup() returns a Null.
Since you are identifying the first matching ID based on all the other fields, which are unique values, the unmatched IDs will belong to duplicates. You will be reversing the PK relation, identifying the first matching key given a set of unique fields. After that, you should set the ID to be the PK. Of course, this assumes the ID has no inherent meaning and you don't care which of the IDs belonging to a set of duplicated rows is kept. It also assumes you do want to preserve an original ID for each remaining row; otherwise just skip the DLookup step and do a Select Distinct on all columns apart from the ID.
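A rough sketch of the make-table plus ID lookup steps, using SQLite from Python as a stand-in for Access. MyNewTable is a hypothetical name, and a correlated subquery with MIN(id) replaces DLookup here, so "first matching ID" is taken to mean the smallest one, which is an assumption.

```python
import sqlite3


def rebuild_with_first_ids(rows):
    """Build a de-duplicated table and give each row the smallest matching original id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyTable (id INTEGER, Field1 TEXT, Field2 TEXT)")
    con.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)
    # Make-table step: unique values over every field except the id.
    con.execute("CREATE TABLE MyNewTable AS SELECT DISTINCT Field1, Field2 FROM MyTable")
    con.execute("ALTER TABLE MyNewTable ADD COLUMN id INTEGER")
    # Update step: MIN(id) stands in for DLookup's "first" match.
    # Note that '=' never matches NULL, so rows with NULL fields would keep a NULL id.
    con.execute(
        """
        UPDATE MyNewTable
        SET id = (
            SELECT MIN(o.id)
            FROM MyTable AS o
            WHERE o.Field1 = MyNewTable.Field1
              AND o.Field2 = MyNewTable.Field2
        )
        """
    )
    result = con.execute(
        "SELECT id, Field1, Field2 FROM MyNewTable ORDER BY id"
    ).fetchall()
    con.close()
    return result
```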
Use a SELECT DISTINCT with all columns except the ID column (DISTINCTROW is ignored when the query reads from a single table):
SELECT DISTINCT Column1, Column2, Column3
INTO MYNEWTABLE
FROM TABLE
You can then simply swap the table names.
This solution will give you a new table without duplicates.
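The copy-and-swap approach might look like this end to end. It is a sketch in SQLite driven from Python, not Access SQL: SQLite uses CREATE TABLE ... AS SELECT where Access uses SELECT ... INTO, and `source_table` and `source_table_old` are hypothetical names for the original table and its renamed copy.

```python
import sqlite3


def dedupe_by_copy(rows):
    """Copy distinct rows (ignoring id) into a new table, then swap the table names."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE source_table (id INTEGER, Column1 TEXT, Column2 TEXT, Column3 TEXT)"
    )
    con.executemany("INSERT INTO source_table VALUES (?, ?, ?, ?)", rows)
    # Equivalent of SELECT DISTINCT ... INTO MYNEWTABLE.
    con.execute(
        "CREATE TABLE MYNEWTABLE AS"
        " SELECT DISTINCT Column1, Column2, Column3 FROM source_table"
    )
    # Swap the names; the original is kept under a different name as a fallback.
    con.execute("ALTER TABLE source_table RENAME TO source_table_old")
    con.execute("ALTER TABLE MYNEWTABLE RENAME TO source_table")
    result = con.execute(
        "SELECT Column1, Column2, Column3 FROM source_table"
        " ORDER BY Column1, Column2, Column3"
    ).fetchall()
    con.close()
    return result
```

The new table has no id column, so this fits the case where the auto-filled ids do not need to be preserved.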
The following will preserve original IDs and do it in one step:
DELETE FROM table_with_duplicates
WHERE table_with_duplicates.id NOT IN
(SELECT max(id)
FROM table_with_duplicates
GROUP BY duplicated_field_1, duplicated_field_2, ...
)
Now you have the original table with no duplicates and preserved ids.
And always remember to back up your data before trying large DELETEs.
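To see the one-step DELETE in action on throwaway data first, here is a small sketch using SQLite from Python as a stand-in for Access. The table and column names follow the placeholders in the query above, with only two duplicated fields for brevity.

```python
import sqlite3


def delete_duplicates_keep_max_id(rows):
    """Delete every row except the one with the highest id in each duplicate group."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table_with_duplicates ("
        " id INTEGER PRIMARY KEY,"
        " duplicated_field_1 TEXT,"
        " duplicated_field_2 TEXT)"
    )
    con.executemany("INSERT INTO table_with_duplicates VALUES (?, ?, ?)", rows)
    # The subquery lists one surviving id per group; everything else is deleted.
    con.execute(
        """
        DELETE FROM table_with_duplicates
        WHERE id NOT IN (
            SELECT MAX(id)
            FROM table_with_duplicates
            GROUP BY duplicated_field_1, duplicated_field_2
        )
        """
    )
    result = con.execute(
        "SELECT id, duplicated_field_1, duplicated_field_2"
        " FROM table_with_duplicates ORDER BY id"
    ).fetchall()
    con.close()
    return result
```

Rows that were never duplicated form a group of one, so their id is also the MAX(id) of its group and they are left alone.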
DELETE * FROM table_with_duplicates
WHERE table_with_duplicates.ID In
(SELECT max(ID)
FROM table_with_duplicates
GROUP BY [duplicated_field_1]
HAVING Count(*)>1
)
Actually, I found a very simple solution, though it took a while. If all of the fields are the same across the duplicates (a completely duplicated record), just make one query with every field and set each one to "Group By". The duplicates will then combine, and you can append this information to a new table and rename it to the name of the existing table. If you have a primary key field, you can leave it out of the query and the data will still combine (assuming that you don't care about the data in the primary key field). I don't know why no one has mentioned this solution; it took me 5 hours to come up with. :)