Merge Join Transformation - Matching Multiple Columns - ssis

I have two OLE DB data sources. One has the columns ID and Premium, and the other has ID, Premium and Cost.
They are both in the same data flow and both sorted.
I use both of them as inputs to a Merge Join transformation, and I have told it to match on both ID and Premium.
I was hoping the output of this merge join would contain ONLY rows where both ID and Premium match. Instead, I seem to be getting rows that match EITHER ID or Premium.
It is set to be an Inner Join, as I don't want rows with NULLs / non-matches hanging around.
Does anyone know how to achieve the desired output? Am I using the right Transformation tool to achieve this?

You are using the correct transformation. However, by default the Merge Join transformation treats NULL values as equal, so rows whose join columns are both NULL will match each other. You can turn this off by setting the transformation's TreatNullsAsEqual property to False. That way you should get the same result as a regular SQL inner join, where NULL never equals NULL.
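To make the NULL behaviour concrete, here is a minimal Python model (not SSIS code) of an inner merge join on the composite key (ID, Premium). The function names and sample rows are hypothetical; None stands in for NULL, and the sketch assumes both inputs arrive sorted on the key with NULLs first. Rows only join when every key column matches, and the treat_nulls_as_equal flag imitates the TreatNullsAsEqual property.

```python
from itertools import groupby


def null_first(value):
    """Sort key for one column: NULL (None) sorts before any non-NULL value."""
    return (0, 0) if value is None else (1, value)


def composite_key(row, key_cols):
    """Sort key for a whole row over the join columns, in order."""
    return tuple(null_first(row[c]) for c in key_cols)


def merge_join_inner(left, right, key_cols, treat_nulls_as_equal=True):
    """Inner merge join of two inputs already sorted on key_cols (NULLs first)."""
    def grouped(rows):
        # Group adjacent rows with the same composite key so duplicates join correctly.
        return [(k, list(g)) for k, g in
                groupby(rows, key=lambda r: composite_key(r, key_cols))]

    lgroups, rgroups = grouped(left), grouped(right)
    i = j = 0
    out = []
    while i < len(lgroups) and j < len(rgroups):
        lkey, lrows = lgroups[i]
        rkey, rrows = rgroups[j]
        if lkey < rkey:
            i += 1  # left key is smaller: it has no partner on the right
        elif lkey > rkey:
            j += 1  # right key is smaller: it has no partner on the left
        else:
            # Every key column is equal. With SQL semantics, a NULL in any
            # key column means the rows do not match.
            has_null = any(part[0] == 0 for part in lkey)
            if treat_nulls_as_equal or not has_null:
                out.extend({**lrow, **rrow} for lrow in lrows for rrow in rrows)
            i += 1
            j += 1
    return out


# Hypothetical sample data, each input sorted on (ID, Premium) with NULLs first.
left = [{"ID": None, "Premium": 5}, {"ID": 1, "Premium": 10},
        {"ID": 1, "Premium": 20}, {"ID": 2, "Premium": 30}]
right = [{"ID": None, "Premium": 5, "Cost": 99}, {"ID": 1, "Premium": 10, "Cost": 1},
         {"ID": 2, "Premium": 31, "Cost": 3}]

with_nulls = merge_join_inner(left, right, ["ID", "Premium"], treat_nulls_as_equal=True)
sql_style = merge_join_inner(left, right, ["ID", "Premium"], treat_nulls_as_equal=False)
print(len(with_nulls), sql_style)
```

Rows (1, 20) and (2, 30) never appear in either result, because only one of their two key columns agrees with a row on the other side. The only difference between the two calls is whether the (NULL, 5) pair is allowed to match.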

Related

Performing Merge-Join Using Derived Column as Join Key

I have created a derived column in my data flow that is the simple concatenation of two columns. I have done this to two separate data sources. I then want to perform a merge join with my newly derived column as the outer join key. However, it doesn't seem to be possible. Does anyone have experience with something like this?
The issue stems from the fact that I am unable to set a "Sort Key Position" to my newly created column as this is specified at the source. It is not possible to set this at the Derived Column transformation.
You would need to add a Sort transformation between each Derived Column and the Merge Join, sorting on the column(s) introduced in the Derived Column components.
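A small Python sketch of why the extra Sort is needed: rows sorted on two columns are not necessarily sorted on the concatenation of those columns, so the sort order declared at the source does not carry over to the derived key. The column values and the concat_key and is_sorted helpers are hypothetical.

```python
def concat_key(row):
    """Hypothetical Derived Column expression: FirstCol + SecondCol."""
    first, second = row
    return first + second


def is_sorted(values):
    """True if every element is <= the next one."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Rows sorted on (FirstCol, SecondCol), as a source flagged IsSorted might deliver them.
source_rows = [("a", "z"), ("ab", "c")]
derived_keys = [concat_key(r) for r in source_rows]  # ['az', 'abc']

# The source order holds for the original columns...
source_sorted = is_sorted(source_rows)
# ...but not for the derived key, because 'abc' sorts before 'az'.
derived_sorted = is_sorted(derived_keys)

# A Sort on the derived column restores the order the Merge Join needs.
resorted = sorted(source_rows, key=concat_key)
print(source_sorted, derived_sorted, resorted)
```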
While Excel is definitely going to need the Derived Column + Sort to make this work, I have not run into a situation where I could express an idea in the SSIS expression language that I could not also express in T-SQL. If you can compute the key (and ORDER BY it) in the source query instead, it will simplify your package as well as speed up execution.
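A sketch of pushing the derived key into the source query, using Python's built-in sqlite3 module as a stand-in database. SourceA and its columns are hypothetical. SQLite concatenates with ||, whereas T-SQL would use FirstCol + SecondCol or CONCAT(FirstCol, SecondCol). With the key computed and ordered in the query, you would then mark the source output IsSorted and give JoinKey a SortKeyPosition of 1, so neither a Derived Column nor a Sort component is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SourceA (FirstCol TEXT, SecondCol TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO SourceA VALUES (?, ?, ?)",
    [("a", "z", 1), ("ab", "c", 2), ("b", "a", 3)],
)

# Compute the join key and sort on it in the source query itself.
query = """
    SELECT FirstCol || SecondCol AS JoinKey, Amount
    FROM SourceA
    ORDER BY JoinKey
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```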
Also, in my experience a Lookup transformation is more often the tool people actually want than a Merge Join. If I'm augmenting an existing row, I use a Lookup. If one row needs to be able to generate zero to many rows, then a Merge Join may be appropriate.
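A rough Python illustration of the cardinality difference described above. The orders and addresses data and both functions are hypothetical. The Lookup-style function keeps the first match per key and returns None when there is no match; a real SSIS Lookup instead routes unmatched rows according to its no-match setting.

```python
from collections import defaultdict

# Hypothetical data: (OrderId, CustomerId) and (CustomerId, AddressKind).
orders = [("o1", "c1"), ("o2", "c2"), ("o3", "c9")]
addresses = [("c1", "home"), ("c1", "work"), ("c1", "billing"), ("c2", "home")]


def lookup(rows, reference):
    """Lookup-style: each input row comes out exactly once, with at most one match."""
    first_match = {}
    for cust, kind in reference:
        first_match.setdefault(cust, kind)  # keep the first match per key
    return [(oid, cust, first_match.get(cust)) for oid, cust in rows]


def join_one_to_many(rows, reference):
    """Inner-join-style: each input row yields zero, one or many output rows."""
    by_cust = defaultdict(list)
    for cust, kind in reference:
        by_cust[cust].append(kind)
    return [(oid, cust, kind) for oid, cust in rows for kind in by_cust[cust]]


lookup_rows = lookup(orders, addresses)            # 3 rows, one per order
joined_rows = join_one_to_many(orders, addresses)  # 4 rows: o1 x3, o2 x1, o3 x0
print(lookup_rows)
print(joined_rows)
```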
I know you had a Lookup question earlier. Excel can act as a Lookup source if you load it through the Cache Connection Manager (see "Excel Source as Lookup Transformation Connection").

Both inputs of the transformation must contain at least one sorted column, and those columns must have matching metadata - ssis

I have created an SSIS package and I used Merge Join to join a Dimension with the result of another Merge Join and I got the following error:
Both inputs of the transformation must contain at least one sorted column, and those columns must have matching metadata.
I found that the issue was related to the data types of the two sorted columns. I converted both of them to "INT", and now everything works fine.
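A short Python sketch of why converting the key type matters, and why the converted column has to be re-sorted: keys stored as text sort in a different order than the same keys as integers. The sample keys and the types_match helper are hypothetical. In SSIS the conversion could be a CAST in the source query (with ORDER BY on the converted value), or a Data Conversion transformation, probably followed by a Sort on the new column.

```python
# Hypothetical join keys: one source delivers them as text, the other as integers.
left_keys = ["1", "10", "2"]   # sorted as strings: "1" < "10" < "2"
right_keys = [1, 2, 10]        # sorted as integers


def types_match(a, b):
    """Rough stand-in for the metadata check: both key columns share one type."""
    return {type(k) for k in a} == {type(k) for k in b}


before = types_match(left_keys, right_keys)          # False: str vs int

# Convert to INT, then re-sort, because the text order is not the numeric order.
left_converted = sorted(int(k) for k in left_keys)   # [1, 2, 10]
after = types_match(left_converted, right_keys)      # True

right_set = set(right_keys)
matched = [k for k in left_converted if k in right_set]
print(before, after, matched)
```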
The message is pretty clear. SSIS merge operations require the data being compared to be sorted, so that matching rows can be found in a single pass over both inputs.
Make sure that you retrieve ordered data from your database using an ORDER BY clause (if the source is SQL). Then, in the source's Advanced Editor, set IsSorted to True on the output and set SortKeyPosition on each ordered column.
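As far as I know, SSIS trusts the IsSorted flag rather than checking it, so a source marked as sorted whose rows are not actually ordered gives silently wrong results instead of an error. This minimal single-pass merge join in Python (unique integer keys, hypothetical data) shows the effect: with an unsorted left input, the join misses a match.

```python
def merge_join_keys(left, right):
    """Single-pass inner merge join on unique keys; assumes both inputs are sorted."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1          # advance the side with the smaller key
        elif left[i] > right[j]:
            j += 1
        else:
            out.append(left[i])
            i += 1
            j += 1
    return out


right = [1, 2, 3, 4]
sorted_left = [1, 3, 4]
unsorted_left = [3, 1, 4]   # flagged as sorted, but not actually ordered

good = merge_join_keys(sorted_left, right)
bad = merge_join_keys(unsorted_left, right)   # key 1 is skipped past and lost
print(good, bad)
```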
If you can't order the data at the source, you can add a Sort transformation in SSIS to sort on the merge columns before the actual merge. You will have to do this on both flows. Please be advised that this component blocks the data flow until all rows have been sorted.
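A Python sketch of what "blocking" means here: a sort cannot emit its first row until it has read the last one. The source and blocking_sort generators are hypothetical stand-ins for a streaming source and a Sort component; the log records the order in which rows are read and emitted.

```python
def source(rows, log):
    """Yields rows one at a time, logging each read (a streaming source)."""
    for r in rows:
        log.append(("read", r))
        yield r


def blocking_sort(rows, log):
    """Like a Sort component: nothing is emitted until every input row has been read."""
    for r in sorted(rows):   # sorted() consumes the whole input first
        log.append(("emit", r))
        yield r


log = []
out = list(blocking_sort(source([3, 1, 2], log), log))
print(log)
```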
The Merge error message will go away once you join both data flows with sorted columns.

Columns changing to Null after passing through a Merge Join Transformation

I have a project where I am migrating data from a DB2 table over to a SQL table.
I pull the data in, then pass the dataset through a Script Transformation to hash each row in the dataset (to be used for comparison later in detecting any updates that need to be made). After the Script Transformation, I sort both datasets and pass them into a Merge Join Transformation.
Here's the problem that I'm running into.
After I pass the sorted datasets into the Merge Join, the resulting Left Outer Join dataset contains a NULL value in the SQL Hash column for every record, and I don't know why.
Here is a picture of my Merge Join Transformation Editor:
I enabled a data viewer earlier in my project to verify that hashes are being generated for both the Host table and the SQL table. Everything works fine up until after it passes through the Merge Join Transformation.
I have a similar project that does the exact same thing with a different table. Both are modeled and designed the same, except this one is the only one that is having this particular hiccup with the SQL Hash column.
Does anyone have any thoughts that could help me troubleshoot this?
My apologies again. After some digging, it turned out that EVERY single row coming out of the Merge Join was a new row (so the SQL Hash column was legitimately NULL in the left outer join), and I was searching for the new rows by the wrong column name. Logic error on my part.
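A Python sketch of the left outer join semantics behind this, using a dictionary rather than a real merge for brevity. The column names and rows are hypothetical. Every left row is kept, and rows with no match get NULL (None) in the right-side columns, so filtering on the right-side hash finds the new rows while filtering on a left-side column finds nothing.

```python
def left_outer_join(left, right, key):
    """Left outer join: every left row survives; unmatched rows get None on the right side."""
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    right_cols = sorted({c for row in right for c in row if c != key})
    out = []
    for lrow in left:
        matches = right_by_key.get(lrow[key], [])
        if matches:
            out.extend({**lrow, **rrow} for rrow in matches)
        else:
            out.append({**lrow, **{c: None for c in right_cols}})
    return out


# Hypothetical rows: none of the host IDs exist on the SQL side yet.
host = [{"ID": 1, "HostHash": "aa"}, {"ID": 2, "HostHash": "bb"}]
sql = [{"ID": 3, "SqlHash": "cc"}]

joined = left_outer_join(host, sql, "ID")
new_rows = [r for r in joined if r["SqlHash"] is None]       # both rows are new
wrong_column = [r for r in joined if r["HostHash"] is None]  # finds nothing
print(joined)
```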

SSIS merge join giving null results while adding more columns to one of the sources

I am trying to use the Merge Join in SSIS. When I join on the two columns I need, it gives me correct results. But as soon as I add more columns to one of the sources, it only gives me matches with NULLs.

Merge data from two sources into one destination without duplicates

I have data from two different source locations that need to be combined into one. I am assuming I would want to do this with a merge or a merge join, but I am unsure of what exactly I need to do.
Table 1 has the same fields as Table 2 but the data is different which is why I would like to combine them into one destination table. I am trying to do this with SSIS, but I have never had to merge data before.
The other issue I have is that some of the data is duplicated between the two. How would I keep only one of the duplicated records?
Instead of making an entirely new table which will need to be updated again every time Table 1 or 2 changes, you could use a combination of views and UNIONs. In other words create a view that is the result of a UNION query between your two tables. To get rid of duplicates you could group by whatever column uniquely identifies each record.
Here is a UNION query using Group By to remove duplicates:
SELECT
    MAX(ID) AS ID,
    NAME,
    MAX(going) AS going
FROM
(
    SELECT ID::VARCHAR AS ID, NAME, going
    FROM facebook_events
    UNION
    SELECT ID::VARCHAR AS ID, NAME, going
    FROM events
) AS merged_events
GROUP BY
    NAME
(Postgres not SSIS, but same concept)
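The same idea run end to end in SQLite through Python's sqlite3 module, wrapped in a view as the answer suggests. The table contents are hypothetical, and the ::VARCHAR casts are dropped because both ID columns are integers here. UNION already removes the exact duplicate Launch row; note that MAX(ID) and MAX(going) are computed independently per NAME, so they can come from different source rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facebook_events (ID INTEGER, NAME TEXT, going INTEGER);
    CREATE TABLE events (ID INTEGER, NAME TEXT, going INTEGER);
    INSERT INTO facebook_events VALUES (1, 'Meetup', 10), (2, 'Launch', 5);
    INSERT INTO events VALUES (2, 'Launch', 5), (7, 'Meetup', 12), (8, 'Workshop', 3);

    -- A view, so the combined result never has to be refreshed by hand.
    CREATE VIEW merged_events AS
    SELECT MAX(ID) AS ID, NAME, MAX(going) AS going
    FROM (
        SELECT ID, NAME, going FROM facebook_events
        UNION
        SELECT ID, NAME, going FROM events
    ) AS u
    GROUP BY NAME;
""")
rows = conn.execute("SELECT ID, NAME, going FROM merged_events ORDER BY NAME").fetchall()
conn.close()
print(rows)
```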
Instead of Merge and Sort, use Union All followed by Sort, because the Merge transformation needs two sorted inputs, and sorting both of them decreases performance.
1) Connect Source1 and Source2 as inputs to a Union All transformation.
2) Connect the output of the Union All transformation to a Sort transformation and check "Remove rows with duplicate sort values".
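A minimal Python model of the Union All + Sort steps above. source1, source2 and the helper names are hypothetical. Python's sort is stable, so this sketch keeps the first row per key in input order; I would not rely on the SSIS Sort transformation keeping any particular one of the duplicates.

```python
def union_all(*inputs):
    """Union All: concatenates the inputs; no sorting required, duplicates kept."""
    return [row for rows in inputs for row in rows]


def sort_remove_duplicates(rows, key):
    """Sort with 'remove rows with duplicate sort values': keep one row per key."""
    out = []
    for row in sorted(rows, key=key):
        if not out or key(out[-1]) != key(row):
            out.append(row)
    return out


# Hypothetical (ID, Name) rows; ID 2 appears in both sources.
source1 = [(1, "alice"), (2, "bob")]
source2 = [(2, "bob"), (3, "carol")]

combined = sort_remove_duplicates(union_all(source1, source2), key=lambda row: row[0])
print(combined)
```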
This sounds like a pretty classic merge. Create your source and destination connections and add a Data Flow task. Put both sources into the Data Flow. Make sure both sources are sorted and connect them to a Merge transformation. You can either add a Sort transformation between each source and the Merge, or sort the rows with an ORDER BY in the query when you pull them in (and then mark the output as sorted with IsSorted/SortKeyPosition). It's easier to do it in the query if that's possible in your situation. Put a Sort transformation after the Merge and check the "Remove rows with duplicate sort values" box; that will take care of any duplicates. Connect the Sort transformation to the data destination.
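A Python sketch of the Merge approach: heapq.merge interleaves two already-sorted inputs without re-sorting them, much like the Merge transformation, after which duplicates sit next to each other and can be dropped in one pass. The tables and helper names are hypothetical. In SSIS the Sort transformation is the built-in component with a duplicate-removal option, which is presumably why the answer places it after the Merge.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical (ID, Name) rows, each table already sorted by ID.
table1 = [(1, "alice"), (2, "bob"), (5, "eve")]
table2 = [(2, "bob"), (3, "carol"), (5, "eve")]

by_id = itemgetter(0)


def merge_sorted(a, b, key):
    """Like the Merge transformation: interleave two sorted inputs, preserving order."""
    return list(heapq.merge(a, b, key=key))


def drop_duplicate_keys(rows, key):
    """On an already-sorted stream, keep the first row of each run of equal keys."""
    return [next(group) for _, group in groupby(rows, key=key)]


merged = merge_sorted(table1, table2, by_id)
combined = drop_duplicate_keys(merged, by_id)
print(combined)
```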
You can do this without SSIS, too.