I'm trying to create an SSIS package that reads a mapping table that contains foreign key information and tables they point to and store the full result set to be used to populate 7 columns representing columns in the result set that is then used to update an xxxSID column on 6 servers.
I'm stuck! Please help.
I've created the SQL Task with query to build the result set and mapped to object variable SidMap and the task runs successfully however, I don't know where to go from there. Some blogs say create a ForEachLoop Container and map the object variable to the collection which I've done. I've also created string variables representing the 7 columns but don't know how to populate them.
The blogs I've read so far suggest this can only be done from a Script task. Is that true? If so how is it done?
Another user posted a question that sounded like he may be doing the same or very similar thing using a SQL Task but I didn't see how he was populating the column object variables and then converting data into string variables.
SSIS Result set, Foreachloop and Variable
Currently I'm updating tables manually using a cursor. If anyone cares to see the code I can post it but didn't think it relevant to the question other than providing a clear picture of what I'm doing.
I would create a For Each Loop Container using the Foreach ADO Enumerator, and map the object variable to the collection. I would map the 7 string variables on the Variable Mappings page.
This process is documented in detail here:
http://technet.microsoft.com/en-us/library/cc879316.aspx
A common "gotcha" is mismatched datatypes between the result set and the Variables. To avoid this I always wrap CAST ( ... AS NVARCHAR ( 4000 ) ) or similar around the columns in the dataflow that produces the dataset, and all my receiving Variables are String datatype.
Related
I am trying to accomplish something that is pretty easy to do in SQL, but seemingly very challenging to do in SSIS without using SQL. Basically, I need to consolidate and concatenate a field of a many-to-one relationship.
Given entities: [Contract Item] (many) to (one) [Account]
There is a field [ari_productsummary] that contains the product listed on the Contract Item entity. We want to write that value to the Account as [ari_activecontractitems]. However, an Account may have more than one Contract Item record associated to it, in which case, we want to concatenate those values. We also only want the distinct values to be concatenated (distinct rows already solved within my data flow).
This can be accomplished by writing to a temporary table, and then using a query or view to obtain the summarized results as followed. I created a SQL table called TESTTABLE that contains the [ari_productsummary] from the Contract Item entity along with the referring [accountid] to map it back to Account. I then wrote the following query as a view:
SELECT distinct accountid,
(SELECT TT2.ari_productsummary + '; '
FROM TESTTABLE TT2
WHERE TT2.accountid = TT.accountid
FOR XML PATH ('')
) AS 'ari_activecontractitems'
FROM TESTTABLE TT
Executing that Query provides me the results that I want, which I can then use for importing into the Account entity as shown below:
But how do I do this in a SSIS dataflow without writing to a SQL table as a temporary placeholder for the data?? I want to do the entire process inside one dataflow container, without using a temporary SQL table/view. The whole summarization process needs to be done on the fly:
Does anyone have a solution that doesn't require a temporary SQL table/view/query, but is contained entirely within the data flow?
I am using VS 2017 and the KingswaySoft Dynamic CRM 365 ETL toolset to develop my solution/package.
Spit balling here as I don't Dynamics nor do I have the custom components.
Data Flow 1 - Contract aggregation
The purpose of this data flow is to replicate your logic in the elegant query you provided and shove that into a Cache Connection Manager (see Notes for 2008+ at the end)
KingswaySoft Dynamics Source -> Script Task -> Cache Transform
If you want to keep the sort in there, do it before the script task. The implementation I'll take with the Script Task is that it's fully blocking - that is all the rows must arrive before it can send any on. Tasks like the Merge Join are only partially blocking because the requirement of sorted data means that once you no longer have a match for the current item, you can send it on down the pipeline.
The Script Task is going to be asynchronous transformation. You'll have two output columns, your key accountid and your new derived column of ari_activecontractitems. That column will might need to be big - you'll know your data best but if it's a blob type in Dynamics (> 4k unicode or > 8k ascii characters) then you'll have to define the data type as DT_TEXT/DT_NTEXT
As inputs, you'll select accountid and ari_productsummary from your source.
The code should be pretty easy. We're going to accumulate the inbound data into a Dictionary.
// member variable
Dictionary<string, List<string>> accumulator;
The PreProcess method, we'll tack this in there to initialize our variable
// initialize in PreProcess method
accumulator = new Dictionary<string, List<string>>();
In the OnBufferRowSent (name approx)
// simulate the inbound queue
// row_id would be something like Rows.row_id
if (!accumulator.ContainsKey(row_id))
{
// Create an empty dictionary for our list
accumulator.Add(row_id, new List<string>());
}
// add it if we don't have it
if (!accumulator[row_id].Contains(invoice))
{
accumulator[row_id].Add(invoice);
}
Once you get the signal sent of no more data available, that's when you start buffering output data. The auto generated code will have placeholders for all this.
// This is how we shove data out the pipe
foreach(var kvp in accumulator)
{
// approximately thus
OutputBuffer1.AddRow();
OutputBuffer1.row_id = kvp.Key;
OutputBuffer1.ari_productsummary = string.Join("; ", kvp.Value);
}
We have an upcoming release that comes with a component that does exactly what you are trying to achieve without the need of writing custom code. The feature is currently under preview, please reach out to us for private access to the feature. You can find our contact information on our website.
UPDATE - June 5, 2020, we have made the components available for public access at https://www.kingswaysoft.com/products/ssis-productivity-pack/ as a result of our 2020 Release Wave 1. We have two components available that serve this kind of purpose. The Composition component will take input values and transform into a composite value in a SSIS column. The Decomposition component does the opposite, it would take an input value and split it into multiple rows using either delimiter-based text splitting or XML/JSON array splitting.
I want to insert values into database when the biml code is ran and the package has completed expansion is this possible using BIML or c#?
I have a table called BIML expansion created in my DB and I have test.biml which loads the package test.dtsx whenever the BIML expansion is completed a record should be inserted into my table that expansion has been completed.
Let me know if you have any questions or needs any additional info.
From comments
I tried your code
string connectionString = "Data Source=hq-dev-sqldw01;Initial Catalog=IM_Stage;Integrated Security=SSPI;Provider=SQLNCLI11.1";
string SrcTablequery=#"INSERT INTO BIML_audit (audit_id,Package,audit_Logtime) VALUES (#audit_id, #Package,#audit_Logtime)";
DataTable dt = ExternalDataAccess.GetDataTable(connectionString,SrcTablequery);
It has an error below must declare the scalar variable audit_id can you let me know the issue behind it?
In it's simplest form, you'd have content like this in your Biml script
// Define the connection string to our database
string connectionStringSource = #"Server=localhost\dev2012;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;Provider=SQLNCLI11.1";
// Define the query to be run after *ish* expansion
string SrcTableQuery = #"INSERT INTO dbo.MyTable (BuildDate) SELECT GETDATE()";
// Run our query, nothing populates the data table
DataTable dt = ExternalDataAccess.GetDataTable(connectionStringSource, SrcTableQuery);
Plenty of different ways to do this - you could have spun up your own OLE/ADO connection manager and used the class methods. You could have pulled the connection string from the Biml Connections collection (depending on the tier this is executed in), etc.
Caveats
Depending on the product (BimlStudio vs BimlExpress), there may be a background process compiling your BimlScript to ensure all the metadata is ready for intellisense to pick it up. You might need to stash that logic into a very high tiered Biml file to ensure it's only called when you're ready for it. e.g.
<## template tier="999" #>
<#
// Define the connection string to our database
string connectionStringSource = #"Server=localhost\dev2012;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;Provider=SQLNCLI11.1";
// Define the query to be run after *ish* expansion
string SrcTableQuery = #"INSERT INTO dbo.MyTable (BuildDate) SELECT GETDATE()";
// Run our query, nothing populates the data table
DataTable dt = ExternalDataAccess.GetDataTable(connectionStringSource, SrcTableQuery);
#>
Is that the problem you're trying to solve?
Addressing comment/questions
Given the query of
string SrcTablequery=#"INSERT INTO BIML_audit (audit_id,Package,audit_Logtime) VALUES (#audit_id, #Package,#audit_Logtime)";
it errors out due to #audit_id not being specified. Which makes sense - this query specifies it will provide three variables and none are provided.
Option 1 - the lazy way
The quickest resolution would be to redefine your query in a manner like this
string SrcTablequery=string.Format(#"INSERT INTO BIML_audit (audit_id,Package,audit_Logtime) VALUES ({0}, '{1}', '{2})'", 123, "MyPackageName", DateTime.Now);
I use the string library's Format method to inject the actual values into the placeholders. I assume that audit_id is a number and the other two are strings thus the tick marks surrounding 1 and 2 there. You'd need to define a value for your audit id but I stubbed in 123 as an example. If I were generating packages, I'd likely have a variable for my packageName so I'd reference that in my statement as well.
Option 2 - the better way
Replace the third line with .NET library usage much as you see in heikofritz on using parameters inserting data into access database.
1) Create a database Connection
2) Open connection
3) Create a command object and associate with the connection
4) Specify your statement (use ? as your ordinal marker instead of named parameters since this is oledb)
5) Create an Parameter list and associate with values
Many, many examples out there beyond the referenced but it was the first hit. Just ignore the Access connection string and use your original value.
Ive been searching the internet to what I thought would be a straight forward question to answer. Hope you guys can help?
I am using for each loop to look for specific files and move them with file system task to a different folder.
Say I have 10 csv files. called listed A to J
I only want to move a,e and j but cant seem to get the foreachloop to look for that group.
In the enumerator Files text box i have tried inserting the 3 file names split by various separators, but SSIS thinks its all one specific file and none of the 3 get moved.
Can someone advise how it can be done? Just to confirm, I dont to use wild card logic just group of specific file names - similur to the IN function of SQL query
Thanks in advance
now added img - please advise how to slect 3 specific files in the text box with arrow
Since OP isn't able to proceed with just my comment, I'll explain a bit more in detail -
Use an EXECUTE SQL TASK to dump the names of the files needed into an SSIS object (to do this you could use a stored procedure or an SQL query). Create an object-type variable in the variables tab prior, change the output in the EXECUTE SQL TASK to Full Result Set and map the result to the object you just created. Now this object holds the list of files you need to loop through.
Now drag-and-drop a ForEach container from the SSIS toolbox. It should be configured as a ForEach ADO Enumerator and map the object to it. Create another variable of type string that will hold the file names after each iteration of the ForEach container. Map this also in the Variables tab of the ForEach container.
Now, place the File System Task which you would use to move these files into the ForEach loop. Use the file-name-variable you created to move just the required files.
Now if you're not sure what SQL query to use for your case in step 1 to get the 3 file names -
SELECT 'A.csv'
UNION
SELECT 'E.csv'
UNION
SELECT 'J.csv'
My SSIS package fails with the error:
The type of the value being assigned to variable differs from the
current variable type.
I declared a variable of type string and corresponding column name in SQL table is varchar(33). If I use data type as object it succeeds but again i need to use the variable value in an expression for connection string and there object type is not supported.
Please help me how to proceed.
Note: The goal of my SSIS package is to get server list from a table (that's where it fails) and execute some script with a foreach loop container in all the servers.
If I'm reading your question right, what you actually need are two variables: one is an Object variable that receives the data from your SQL table, and one is a String value that you will pass to your ForEach loop.
The difference between them is that your Object variable can hold multiple rows, while your String value can only hold a single item at a time. What you will do is declare an Object variable, populate it with a list of the servers, then use a ForEach loop to step through each item in the list, feeding the values one at a time to your String variable. You will then use the String variable to set up the script you mentioned, and execute it once per server.
For further reading, there is an excellent walkthrough here that will give you screenshots and examples of what I am talking about.
I know i am replying to this thread very late, below explanation is for people who see this in future.
I encountered this situation recently. I did the below
Firstly my Record Set Destination Variable of object type. Using For Each Loop container, i mapped each and every row data(with multiple columns) to an variable(s) of OBJECT type. When i use any other datatypes other than OBJECT datatype for variable(s) i got the error as below
Error: ForEach Variable Mapping number 3 to variable "User::EMPID" cannot be applied.
Then I used the variable(s) of object type in my Execute SQL Task and it inserts data successfully without any error.
Thumb Rule: Map Record Set Destination output to the variable of
OBJECT type.
I encountered this situation recently. I did the below
Firstly my Record Set Destination Variable of object type.
Using For Each Loop container, i mapped each and every row data(with multiple columns) to an variable(s) of OBJECT type. When i use any other datatypes other than OBJECT datatype for variable(s) i got the error as below
Error: ForEach Variable Mapping number 3 to variable "User::EMPID" cannot be applied.
Then I used the variable(s) of object type in my Execute SQL Task and it inserts data successfully without any error.
Thumb Rule: Map Record Set Destination output to the variable of OBJECT type.
In my SSIS package I have a dataflow that looks something like this.
My requirement is to log the end time of each flatfile destination (Or the time when each of the flat files is created) , in a SQL server table. To be more clear, there will be one row per flatfile in the log table. Is there any simple way(preferably) to accomplish this? Thanks in advance.
Update: I ended up using a script task after the dataflow and read the creation time of each of the file created in the dataflow. I also used same script task to insert logs into the table, just to keep things in one place. For details refer the post masked as answer.
In order to get the accurate date and timestamp of each flat file created as the destination, you'll need to create three new global variables and set up a for-each loop container in the control flow following your current data flow task and then add to the for-each loop container a script task that will read from one flat file at a time the date/time information. That information will then be saved to one of the new global variables that can then be applied in a second SQL task (also in the for-each loop) to write the information to a database table.
The following link provides a good example of the steps you'll need to apply. There are a few extra steps not applicable that you can easily exclude.
http://microsoft-ssis.blogspot.com/2011/01/use-filedates-in-ssis.html
Hope this helps.
After looking more closely at the toolbox, I think the best way to do this is to move each source/destination pairing into its own dataflow and use the OnPostExecute event of each dataflow to write to the SQL table.
Wanted to provide more detail to #TabAlleman's approach.
For each control flow task with a name like Bene_hic, you will have a source file and a destination file.
On the 'Event Handlers' tab for that executable (use the drop-down list,) you can create the OnPostExecute event.
In that event, I have two SQL tasks. One generates the SQL to execute for this control flow task, the second executes the SQL.
These SQL tasks are dependent on two user variables scoped in the OnPostExecute event. The EvaluateAsExpression property for both is set to True. The first one, Variable1, is used as a template for the SQL to execute and has a value like:
"SELECT execSQL FROM db.Ssis_onPostExecute
where stgTable = '" + #[System::SourceName] + "'"
#[System::SourceName] is an SSIS system variable containing the name of the control flow task.
I have a table in my database named Ssis_onPostExecute with two fields, an execSQL field with values like:
DELETE FROM db.TableStats WHERE TABLENAME = 'Bene_hic';
INSERT INTO db.TableStats
SELECT CreatorName ,t.tname, CURRENT_TIMESTAMP ,rcnt FROM
(SELECT databasename, TABLENAME AS tname, CreatorName FROM dbc.TablesV) t
INNER JOIN
(SELECT 'Bene_hic' AS tname,
COUNT(*) AS rcnt FROM db.Bene_hic) u ON
t.tname = u.tname
WHERE t.databasename = 'db' AND t.tname = 'Bene_hic';
and a stgTable field with the name of the corresponding control flow task in the package (case-sensitive!) like Bene_hic
In the first SQL task (named SQL,) I have the SourceVariable set to a user variable (User::Variable1) and the ResultSet property set to 'single row.' The Result Set detail includes a Result Name = 0 and Variable name as the second user variable (User::Variable2.)
In the second SQL task (exec,) I have the SQLSourceType property set to Variable and the SourceVariable property set to User::Variable2.
Then the package is able to copy the data in the source object to the destination, and whether it fails or not, enter a row in a table with the timestamp and number of rows copied, along with the table name and anything else you want to track.
Also, when debugging, you have to run the whole package, not just one task in the event. The variables won't be set correctly otherwise.
HTH, it took me forever to figure all this stuff out, working from examples on several web sites. I'm using code to generate the SQL in the execSQL field for each of the 42 control flow tasks, meaning I created 84 user variables.
-Beth
The easy solution will be:
1) drag the OLE DB Command from the tool box after the Fatfile destination.
2) Update Script to update table with current date when Flat file destination is successful.
3) You can create a variable (scope is project) with value systemdatetime.
4) You might have to create another variable depending on your package construct if Success or fail