Can I use SQL on the output of a Merge Join? - ssis

I'm using SSIS to fetch data from two Informix servers. I'm using a Merge Join object to combine the data together. Now, I need to summarize the data for reporting.
I worked up the T-SQL for pivoting, counting and summing the data I want, but I don't know how to do that in SSIS. I just want to run a query against the output of the Merge Join. How do I do that?
The Pivot transformation looks too simple for the job.
Thanks!

I'd suggest that you insert the output of the Merge Join into a staging table and end the data flow there. Then start a new data flow that uses your T-SQL code (referencing the staging table) as the source and runs it directly into the destination.
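A rough sketch of the staging-table idea, using Python's sqlite3 as a stand-in for SQL Server. The table name staging_merge_output and its columns are made up for illustration; the real names depend on what the Merge Join produces. The summary query pivots with SUM(CASE ...), which is plain SQL that T-SQL also accepts, so something of this shape could serve as the source query of the second data flow.

```python
import sqlite3

# In-memory SQLite stands in for the staging database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_merge_output (region TEXT, status TEXT, amount REAL)"
)

# Rows the first data flow would have written from the Merge Join output.
rows = [
    ("East", "open", 10.0),
    ("East", "closed", 5.0),
    ("West", "open", 7.5),
    ("East", "open", 2.5),
]
conn.executemany("INSERT INTO staging_merge_output VALUES (?, ?, ?)", rows)

# Source query for the second data flow: count rows per region and
# pivot the status values into columns with conditional aggregation.
SUMMARY_SQL = """
SELECT region,
       COUNT(*) AS row_count,
       SUM(CASE WHEN status = 'open'   THEN amount ELSE 0.0 END) AS open_amount,
       SUM(CASE WHEN status = 'closed' THEN amount ELSE 0.0 END) AS closed_amount
FROM staging_merge_output
GROUP BY region
ORDER BY region
"""
summary = conn.execute(SUMMARY_SQL).fetchall()
for row in summary:
    print(row)
```

In the package itself, the first data flow would end in an OLE DB Destination pointing at the staging table, and the second would start from an OLE DB Source that uses the summary query as its SQL command.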

I would try sending the results of your data flow task to an Object-typed variable. I believe you can then manipulate the data directly in memory rather than dumping it to disk.
http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/64014/

Related

Passing 50000+ parameters in WHERE clause using SSIS package

I have a query that extracts data from a server. The server contains millions of rows, and I need to filter out only 56000 Doc IDs from them. Could anyone help me build an SSIS package? I cannot use Merge here, since I would have to extract the entire data set from the server and then merge it with the 56000 IDs.
Put your 56000 IDs into a table called e.g. IDsWanted on the server. Join your data table to IDsWanted in an SQL query, and use this as the source for your SSIS operations.
Obviously a PK on column IDsWanted.ID will help performance of this query.
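A minimal sketch of that lookup-table join, again with sqlite3 standing in for the real server. IDsWanted and its ID column come from the answer above; the Documents table, its DocID and Title columns, and the sample IDs are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Documents (DocID INTEGER PRIMARY KEY, Title TEXT);  -- hypothetical data table
CREATE TABLE IDsWanted (ID INTEGER PRIMARY KEY);                 -- PK helps the join
""")

# Pretend the server holds many documents.
conn.executemany(
    "INSERT INTO Documents VALUES (?, ?)",
    [(i, f"doc {i}") for i in range(1, 1001)],
)

# Load the wanted IDs once, instead of passing them as query parameters.
wanted = [3, 250, 999]
conn.executemany("INSERT INTO IDsWanted VALUES (?)", [(i,) for i in wanted])

# This join is what the SSIS source component would run.
SOURCE_SQL = """
SELECT d.DocID, d.Title
FROM Documents AS d
JOIN IDsWanted AS w ON w.ID = d.DocID
ORDER BY d.DocID
"""
filtered = conn.execute(SOURCE_SQL).fetchall()
print(filtered)
```

The filtering happens on the server, so only the matching rows ever enter the data flow.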

SSIS: recordset or temp table

I have an SSIS application that needs to get data from 2 databases on different servers (not linked). I need to find the records whose names and DOB match between the 2 databases, then use the results to insert into or update a table.
My initial approach is to use OLE DB Sources, then a Merge Join, and put the results into a recordset. Then, on the control flow, I would use the contents of the recordset to insert into or update a table. But I can't see the recordset on the control flow.
An alternative solution is to create temp tables. But the temp tables are not visible, since they reside in the tempdb database of each server.
What is a better approach for this problem?
What do you mean by "put the results to recordset"?
If you join two sources in the data flow using a join, the "recordset" coming out of the join is only available during the current data flow. You can't use it on the control flow after the data flow has finished.
Why can't you just insert the result set into the destination DB? You can perform any other transformation in the same data flow and insert the result into the destination database.
Or, if you really need to do something that can only be done on the control flow before inserting the data, you can insert the result set into a temp table on the destination using an OLE DB Destination and access it in another data flow (not a very good approach, though).
In this case, I would keep a database around for work table or create a schema for those work tables.
Next, add a SQL control flow task that truncates the table that will hold the intermediate result. After this, load the intermediate result set into the table, do the operation and optionally, truncate the table again.
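The truncate, load, and apply sequence could look roughly like the sketch below. It uses sqlite3 in place of SQL Server, and the tables work_matches and people and the email column are hypothetical. SQLite has no TRUNCATE TABLE, so DELETE stands in for it here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE work_matches (name TEXT, dob TEXT, email TEXT);  -- hypothetical work table
CREATE TABLE people (name TEXT, dob TEXT, email TEXT, PRIMARY KEY (name, dob));
INSERT INTO people VALUES ('Ann Lee', '1980-01-02', 'old@example.com');
""")

def load_and_apply(conn, matched_rows):
    # Step 1: empty the work table (TRUNCATE TABLE in T-SQL).
    conn.execute("DELETE FROM work_matches")
    # Step 2: the data flow would write the Merge Join output here.
    conn.executemany("INSERT INTO work_matches VALUES (?, ?, ?)", matched_rows)
    # Step 3a: update target rows that match on name and DOB.
    conn.execute("""
        UPDATE people
        SET email = (SELECT w.email FROM work_matches AS w
                     WHERE w.name = people.name AND w.dob = people.dob)
        WHERE EXISTS (SELECT 1 FROM work_matches AS w
                      WHERE w.name = people.name AND w.dob = people.dob)
    """)
    # Step 3b: insert rows that have no match in the target yet.
    conn.execute("""
        INSERT INTO people (name, dob, email)
        SELECT w.name, w.dob, w.email
        FROM work_matches AS w
        WHERE NOT EXISTS (SELECT 1 FROM people AS p
                          WHERE p.name = w.name AND p.dob = w.dob)
    """)
    conn.commit()

load_and_apply(conn, [
    ("Ann Lee", "1980-01-02", "new@example.com"),
    ("Bo Chan", "1975-06-30", "bo@example.com"),
])
people = conn.execute("SELECT name, dob, email FROM people ORDER BY name").fetchall()
print(people)
```

In a package, step 1 and step 3 would each be an Execute SQL task on the control flow, with the data flow task for step 2 in between.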
The Recordset Destination is fine for smaller datasets. But if you plan to use it for larger datasets that don't fit in memory, it will be very slow.
If you don't have a database or schema that can serve as a workspace, you could use RAW files to hold the intermediate result. Those are very fast too.

complex query as source

I have a query that I want to use as a source, but it is a huge query that creates a lot of temp tables and finally joins them to select the data. So I used a Script Task for this query, which works perfectly. Is there any other way to do this instead of a Script Task?
If you use CTEs instead of temp tables you can directly use it as a source query in the OLE DB Source.
Alternatively, you could keep your current logic in a script task but then insert the data from the final select into a temporary (physical) table. The data flow task could then do a simple select directly on that temp table.
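To illustrate the CTE suggestion: each temp table becomes a named WITH block, and the whole thing stays a single SELECT statement that a source component can run. The sketch below uses sqlite3 and invented tables (orders, customers); the WITH syntax shown is also valid in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 20.0);
""")

# Each CTE replaces what would otherwise be a temp table.
SOURCE_SQL = """
WITH order_totals AS (
    SELECT customer_id, SUM(total) AS total_spent
    FROM orders
    GROUP BY customer_id
),
big_spenders AS (
    SELECT customer_id, total_spent
    FROM order_totals
    WHERE total_spent >= 100
)
SELECT c.name, b.total_spent
FROM big_spenders AS b
JOIN customers AS c ON c.customer_id = b.customer_id
ORDER BY c.name
"""
result = conn.execute(SOURCE_SQL).fetchall()
print(result)
```

Because the statement creates no temp tables, its output columns are easier for a source component to determine up front.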

SSIS two staging tables

I would like to bring in an XML source, convert the data, and load it into a table. Data from this table will then be used to update another table. How can I accomplish this in SSIS?
I understand the first two steps, but I am lost after that.
XML Source (under dataflow task)
Data Conversion
OLE DB Destination? (If I use OLE DB Destination, then I cannot use that as a source again to update another table). What component should I be using to accomplish this?
TIA
Within a data flow you can split the records to go to multiple tables using either a Conditional Split (if you want some records to go one way and some to go another way) or a Multicast transformation (if you want all records to go to both destinations). We use a Multicast to create two staging tables: one where the raw data from the file stays untouched, and one where the data is cleaned and transformed before going into our prod tables. This lets us easily research whether problem data was caused by our transformation process (a bug) or by bad data being sent (a problem at the client end, which might require more steps to handle if they can't fix it).
You can also have multiple data flows that all have the same source. Or you can insert to one staging table and then have a second data flow or exec SQL task to move that data to where you want it.
Use the OLE DB Destination to inject your XML source data into your staging table. Then, in your control flow use an Execute SQL task after your data flow task to execute a stored procedure or T-SQL script to move your data from the staging table into the production table(s) and truncate the staging table if required.
I've found that SSIS is great for ETL work, but moving data around inside a DB or doing aggregation work is best carried out using T-SQL in stored procs. It is easier to write and control, and you know you're not going to hit any of the RBAR (row-by-agonizing-row) shenanigans you can run into in a data flow task.
YMMV

Join multiple tables using MergeJoin programmatically using ssis in C#

Any idea how to join multiple tables in SSIS programmatically using Merge Join? I'm using C#.
thanks,
Jibin
First of all, I would caution you about using the Merge Join, because it can cause performance issues. If you are pulling data from the same server, I would recommend joining the data in your proc or inline SQL. If your particular case is that you have heterogeneous data sources (Excel, Oracle, etc.) or data on different servers, then all you need to do to use the Merge Join is drag the outputs from your data sources to the Merge Join component. After a little configuration, which is explained here http://www.mssqltips.com/tip.asp?tip=1322, you are done. Note: don't forget that your data has to be sorted before you can use the Merge Join.
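To see why the sort requirement matters, here is a small Python sketch of a generic merge join over two inputs sorted on the join key. It illustrates the general algorithm only and is not a description of how the SSIS component is implemented; the function name and sample rows are invented.

```python
def merge_join(left, right, key):
    """Inner-join two lists of dicts that are already sorted by `key`."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left row has no partner yet, advance left
        elif lk > rk:
            j += 1          # right row has no partner yet, advance right
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out

customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}, {"id": 4, "name": "Cy"}]
orders_unsorted = [{"id": 4, "total": 1}, {"id": 2, "total": 5}, {"id": 2, "total": 7}]
orders_sorted = sorted(orders_unsorted, key=lambda r: r["id"])

joined = merge_join(customers, orders_sorted, "id")
broken = merge_join(customers, orders_unsorted, "id")
print(len(joined), len(broken))
```

Each input is read once from front to back, so a row that arrives out of order is passed over and its matches are silently lost. That is the reason both inputs need a Sort transformation or an ORDER BY in the source query on the join key.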