Processing records in a SQL task - SSIS

Using SSIS, I have retrieved a recordset from a query in a dataflow task.
I have a foreach loop that iterates through the records one at a time.
What I need to accomplish is to perform an update on each record.
The two variables are named CUSTOMER_NUMBER and DECEASED_DATE; CUSTOMER_NUMBER is a varchar and DECEASED_DATE is a date type.
update set deceased = 'T', deceased_date = @deceased_date where customer_no = @customer_number
What do I need to change in the above query to use the variables?
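In an Execute SQL Task with an OLE DB connection manager, parameters are positional ? markers that you map by ordinal on the Parameter Mapping page (with an ADO.NET connection you would use named @ parameters instead). A minimal sketch, assuming an OLE DB connection and a target table named CUSTOMER (the original query omits the table name, so that name is hypothetical):

-- Execute SQL Task, OLE DB connection: ? markers, mapped by ordinal in Parameter Mapping
-- ordinal 0 -> User::DECEASED_DATE, ordinal 1 -> User::CUSTOMER_NUMBER
UPDATE CUSTOMER                -- hypothetical table name
SET deceased = 'T',
    deceased_date = ?          -- maps to User::DECEASED_DATE
WHERE customer_no = ?          -- maps to User::CUSTOMER_NUMBER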

Related

In a VS Report Server Project can I retrieve and save Parameter values?

We have a company website where BI reports are hosted. For one particular report (and possibly for others, if this can be made to work), there is a requirement to:
a) retrieve saved values for report parameters
and
b) to save any changed values for report parameters
I know that parameter values can be retrieved from data by setting the Default Values to "Get values from a query".
However, what I would like is that when the user presses View Report, the values [s]he has selected are saved to a database, so that these then form the default values for the next user.
Can this be done? There doesn't seem to be any way "out of the box".
This is quite simple.
Let's assume you have a table of countries that drives your parameter's available values, and that this table myCountryTable has two columns, CountryID and CountryName.
Your available-values dataset would be something simple like
SELECT * FROM myCountryTable
CountryID would be the parameter value and CountryName would be the parameter label.
OK so you will have probably done all the above already.
Now, simply add an INSERT statement to your main dataset query, before the main query runs.
So, if your dataset query looks like this...
SELECT * FROM SomeBigTable WHERE CountryID IN (@CountryID)
you would change it to something like
INSERT INTO myLogTable
SELECT CountryID, CountryName FROM myCountryTable WHERE CountryID IN (@CountryID)
-- original query follows
SELECT * FROM SomeBigTable WHERE CountryID IN (@CountryID)
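For reference, myLogTable is not defined in the original answer; a minimal sketch of a shape that satisfies the INSERT above (two columns, no column list needed) would be:

-- hypothetical definition of the log table used above
CREATE TABLE myLogTable (
    CountryID int NOT NULL,
    CountryName varchar(100) NOT NULL
);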
Note: If you cannot change your main dataset query for whatever reason, you can do this in a separate dataset but there are a few things you will have to do
First: Change the SQL so that it returns a value at the end; anything will do, e.g.
INSERT INTO myLogTable
SELECT CountryID, CountryName FROM myCountryTable WHERE CountryID IN (@CountryID)
SELECT 1 AS myReturnValue
Second: You must bind this dataset to something on the report, such as a table or list; this makes sure the query only executes when the report is executed, not when parameters are changed.
You could store parameters and their values every time the report is executed.
Note: Some of the built-in SQL functions used below (such as STRING_SPLIT and TRIM) may not exist on your server, depending on its version. If that is the case, it is easy to find an alternative or even create your own function.
For example, at the end of every stored procedure used by the report, place this piece of SQL, which writes to a newly created table dbo.ReportParameterValuePairs:
INSERT INTO dbo.ReportParameterValuePairs
    (ReportName, ParameterValuePair, ExecutionDateTime)
VALUES(
    'MyReport',
    '$$$parameter1$$$: ' + @parameter1 + ',' +
    '$$$parameter2$$$: ' + @parameter2,
    GETDATE())
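The logging table itself is not defined in the original; a minimal sketch consistent with the INSERT above:

CREATE TABLE dbo.ReportParameterValuePairs (
    ReportName nvarchar(100) NOT NULL,
    ParameterValuePair nvarchar(max) NOT NULL,
    ExecutionDateTime datetime NOT NULL
);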
It will become clear later why these data are stored, and why in this way.
The next step is to create a procedure that retrieves the value a parameter had during the last execution of the report:
CREATE PROCEDURE spRetrieveReportParameterValue
    @parameter NVARCHAR(100),
    @report NVARCHAR(100)
AS
BEGIN
    -- this is an example
    DECLARE @parameters NVARCHAR(MAX) = '$$$parameter1$$$: value1, $$$parameter2$$$: value2'
    -- in reality the parameter-value pairs will be retrieved from the database:
    --DECLARE @parameters NVARCHAR(MAX) =
    --    (SELECT TOP 1 ParameterValuePair
    --     FROM dbo.ReportParameterValuePairs
    --     WHERE ReportName = @report
    --     ORDER BY ExecutionDateTime DESC)
    --SELECT @parameters
    DECLARE @parameterValuePair NVARCHAR(200) =
        (SELECT value FROM STRING_SPLIT(@parameters, ',')
         WHERE value LIKE '%$$$' + @parameter + '$$$%')
    --SELECT @parameterValuePair
    DECLARE @value NVARCHAR(100) =
        (SELECT value FROM STRING_SPLIT(@parameterValuePair, ':')
         WHERE value NOT LIKE '%$$$%')
    SELECT TRIM(@value) AS ParameterValue
END
The procedure's parameters are the parameter whose value is needed and the report that is executing.
Parameter-value pairs are stored in a single string. To access them, search dbo.ReportParameterValuePairs for the currently executing report and order the rows by date and time of execution, latest first.
The parameter-value pairs string is split on ','. The result of this split is a table of individual parameter-value pairs. Parameter names are distinguished from their values by the $$$ markers; because of that, the condition in the query is value LIKE '%$$$' + @parameter + '$$$%'.
The variable @parameterValuePair now stores the desired parameter and its value.
After one more split, this time on ':' (since it separates the value from the parameter name), the result is two rows: one contains the parameter name with its $$$ markers ($$$[parameter]$$$) and the other contains the value. Using the condition WHERE value NOT LIKE '%$$$%', the parameter's value is stored in the @value variable.
The last step of the procedure is to trim the value, in case there are spaces at the beginning or end of @value, and return it as ParameterValue.
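With the hard-coded example string above, a call like this returns value1:

EXEC spRetrieveReportParameterValue 'parameter1', 'MyReport'
-- ParameterValue
-- --------------
-- value1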
To get this value into the report, create a dataset for every report parameter. This dataset will supply the parameter with its default value:
right click on DataSets
choose Add Dataset
choose tab/card Query
name DataSet
select Data source
for query type choose Text
enter spRetrieveReportParameterValue 'parameter1', 'MyReport', where parameter1 is the name of the parameter whose last value will be retrieved
click Refresh Fields
The last step is to set default value to the parameter:
right click on parameter
select Parameter Properties
choose card/tab Default Values
choose option Get values from a query
for Dataset choose newly created dataset
for Value field choose ParameterValue
This should be the result (shown as a screenshot in the original answer).

How to append an auto-incrementing value to a duplicate value?

I have access to a reporting dataset (that I don't control) that we retrieve daily from a cloud service and store in a MySQL db to run advanced reporting and report combining locally with 3rd-party data visualization software.
The data often has duplicate values on an id field that create problems when joining with other tables for data analysis.
For example:
+-------------+----------+------------+----------+
| workfile_id | zip_code | date       | total    |
+-------------+----------+------------+----------+
|       78002 |    90210 | 2016-11-11 | 2010.023 |
|       78002 |    90210 | 2016-12-22 |  427.132 |
+-------------+----------+------------+----------+
Workfile_id is duplicated because this is the same job: additional work on the job was performed in a different month than the original work, and instead of creating another workfile_id for the job, the software reuses the same one.
Doing joins with other tables on workfile_id is problematic when more than one of the same id is present, so I was wondering if it is possible to do one of two things:
Make duplicate workfile_ids unique. Have SQL append a number to the workfile_id when a duplicate is found. The first duplicate (i.e. the second occurrence of the same workfile_id) would get a .01 appended to the end of the workfile_id. If another duplicate is later inserted, the appended number would auto-increment to .02, and so on for any subsequent duplicate workfile_id. This method would work best with our data, but I'm curious how difficult it would be for the server from a performance perspective. If I could schedule the alteration to take place after the data is inserted, to speed up the initial data insert, that would be ideal.
Sum total columns and remove the duplicate workfile_id row. Have a task that identifies duplicate workfile_ids and sums the financial columns of the duplicates, replacing the original total with the new sum and deleting the 'new row' after the columns have been added together.
This is more messy from a data preservation perspective, but is acceptable if the first solution isn't possible.
My assumption is that there will be significant overhead in having the server compare new workfile_id values to all existing workfile_id values each time data is inserted, but our dataset is small and new data is only inserted once daily, at 1:30am. It should also be feasible to restrict the duplicate workfile_id search to rows inserted within the last 6 months.
Is finding duplicates in a column (workfile_id) and appending an auto-incrementing value onto the workfile_id possible?
EDIT:
I'm having trouble getting my trigger to work based on sdsc81's answer below.
Any ideas?
DELIMITER //
CREATE TRIGGER append_subID_to_workfile_ID_salesjournal
AFTER INSERT
ON salesjournal FOR EACH ROW
BEGIN
    SET @COUNTER = ( SELECT (COUNT(*)-1) FROM salesjournal WHERE workfile_id = NEW.workfile_id );
    IF @COUNTER > 1 THEN
        UPDATE salesjournal SET workfile_id = CONCAT(workfile_id, @COUNTER) WHERE id = NEW.id;
    END IF;
END;//
DELIMITER ;
It's hard to know if the trigger isn't working at all, or if just the code in the trigger isn't working. I get no errors on insert. Is there any way to debug trigger errors?
Well, everything is possible ;)
You don't control the dataset, but you can modify the database, right?
Then you could use a trigger after every insert of a new value, and update it if it's a duplicate. Something like:
SET @COUNTER = ( SELECT (COUNT(*)-1) FROM *your_table* WHERE workfile_id = NEW.workfile_id );
IF @COUNTER > 1 THEN
    UPDATE *your_table* SET workfile_id = CONCAT(workfile_id, @COUNTER) WHERE some_unique_id = NEW.some_unique_id;
END IF;
If there is only one insert a day, and an index is defined over the workfile_id column, then it shouldn't be any problem for your server at all.
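For reference, that index is a single statement (the index name is hypothetical):

CREATE INDEX idx_salesjournal_workfile_id ON salesjournal (workfile_id);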
Also, you could implement the second solution, doing:
DELIMITER //
CREATE TRIGGER append_subID_to_workfile_ID_salesjournal
AFTER INSERT ON salesjournal FOR EACH ROW
BEGIN
    SET @COUNTER = ( SELECT (COUNT(*)-1) FROM salesjournal WHERE workfile_id = NEW.workfile_id );
    IF @COUNTER > 1 THEN
        UPDATE salesjournal SET total = total + NEW.total WHERE workfile_id = NEW.workfile_id AND id <> NEW.id;
        DELETE FROM salesjournal WHERE id = NEW.id;
    END IF;
END;//
DELIMITER ;
Hope this helps.
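Regarding the EDIT: MySQL does not allow a trigger to modify the table it is defined on (error 1442), so an AFTER INSERT trigger that UPDATEs or DELETEs from salesjournal cannot work as written. A BEFORE INSERT trigger that rewrites NEW.workfile_id avoids this restriction; a minimal sketch, assuming workfile_id is stored as a string column and id is the table's unique key:

DELIMITER //
CREATE TRIGGER salesjournal_dedupe_workfile_id
BEFORE INSERT ON salesjournal FOR EACH ROW
BEGIN
    DECLARE dup_count INT;
    -- count the base id plus any already-suffixed duplicates ('78002', '78002.01', ...)
    SELECT COUNT(*) INTO dup_count
    FROM salesjournal
    WHERE workfile_id = NEW.workfile_id
       OR workfile_id LIKE CONCAT(NEW.workfile_id, '.%');
    IF dup_count > 0 THEN
        -- second occurrence gets .01, third gets .02, and so on
        SET NEW.workfile_id = CONCAT(NEW.workfile_id, '.', LPAD(dup_count, 2, '0'));
    END IF;
END;//
DELIMITER ;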

Merge stored procedure with datatype conversions

I am able to execute my stored procedure. When I execute it a second time, instead of updating the existing values, the same values from the source are inserted as new rows.
i.e. my target has
1
2
3
When I run the stored procedure a second time, instead of updating 1, 2, 3, it inserts the same values again:
1
2
3
1
2
3
My WHEN MATCHED condition (SELECT S.REPORT_TEST1 EXCEPT SELECT T.REPORT_TEST1) is not working.
When I use the same code on a different table which doesn't have data-type conversions, I am able to update.
Can anyone tell me where I am going wrong?
CREATE PROCEDURE [dbo].[Merge]
INSERT INTO [dbo].[TARGET] (REPORT_TEST1, REPORT_TEST2, REPOST_TEST3)
FROM (MERGE [dbo].[TARGET] T
      USING (SELECT
                 CAST([REPORT TEST1] AS int) [REPORT_TEST1],
                 CAST([REPORT TEST2] AS int) [REPORT_TEST2],
                 CAST([REPORT TEST3] AS int) [REPORT_TEST3]
             FROM [dbo].[SOURCE]) S ON (T.[REPORT_TEST1] = S.[REPORT_TEST1])
      WHEN NOT MATCHED BY TARGET
          THEN INSERT VALUES (S.REPORT_TEST1, S.REPORT_TEST2, S.REPOST_TEST3)
      WHEN MATCHED
          AND EXISTS (SELECT S.REPORT_TEST1, S.REPORT_TEST2, S.REPOST_TEST3
                      EXCEPT
                      SELECT T.REPORT_TEST1, T.REPORT_TEST2, T.REPOST_TEST3)
      OUTPUT $ACTION ACTION_OUT,
             S.REPORT_TEST1, S.REPORT_TEST2, S.REPOST_TEST3);
Thanks
Would it not suffice to rewrite your WHEN MATCHED statement thus:
WHEN MATCHED
AND S.REPORT_TEST2 <> T.REPORT_TEST2
AND S.REPORT_TEST3 <> T.REPORT_TEST3
(
SELECT
S.REPORT_TEST1
,S.REPORT_TEST2
,S.REPOST_TEST3
)
I think I understand what you're trying to do, but inside the MERGE context you're only comparing this row with that row, not the source row against the whole target table. You could modify the subselect as follows if you're trying to query "this source is not at all in the target":
WHEN MATCHED AND EXISTS
(
SELECT
S.REPORT_TEST1
,S.REPORT_TEST2
,S.REPOST_TEST3
EXCEPT SELECT
T2.REPORT_TEST1
,T2.REPORT_TEST2
,T2.REPOST_TEST3
FROM
[dbo].[TARGET] T2
)
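For completeness, here is a sketch of the full pattern being discussed; note that WHEN MATCHED needs a THEN UPDATE clause (absent from the question's paste) for the EXCEPT test to have any effect. Column names follow the question, with the REPOST/REPORT spelling normalized:

MERGE [dbo].[TARGET] AS T
USING (SELECT CAST([REPORT TEST1] AS int) AS REPORT_TEST1,
              CAST([REPORT TEST2] AS int) AS REPORT_TEST2,
              CAST([REPORT TEST3] AS int) AS REPORT_TEST3
       FROM [dbo].[SOURCE]) AS S
    ON T.REPORT_TEST1 = S.REPORT_TEST1
WHEN MATCHED AND EXISTS
    (SELECT S.REPORT_TEST2, S.REPORT_TEST3
     EXCEPT
     SELECT T.REPORT_TEST2, T.REPORT_TEST3)
    THEN UPDATE SET T.REPORT_TEST2 = S.REPORT_TEST2,
                    T.REPORT_TEST3 = S.REPORT_TEST3
WHEN NOT MATCHED BY TARGET
    THEN INSERT (REPORT_TEST1, REPORT_TEST2, REPORT_TEST3)
         VALUES (S.REPORT_TEST1, S.REPORT_TEST2, S.REPORT_TEST3);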

Flagging records on large mysql file

We are currently importing very large CSV files into a MySQL data warehouse. A key part of the processing is to flag whether a record in the CSV file matches an existing record in the warehouse. The "match" is done by comparing specific fields in the new data against the previous version of the table. If the record is "new", or if there have been updates, we want to add it to the warehouse.
At the moment the processing plan is as follows :
~ read the CSV file into MySQL table A
~ is the primary key of A present in old-A? If it isn't, set record status to "NEW"
~ if the key is in old-A, issue an update statement, JOINing old-A to A
~ if A.field1 <> old-A.field1 OR A.field2 <> old-A.field2 OR A.field3 <> old-A.field3 THEN flag record status as "UPDATE"
~ process NEW or UPDATEd records according to record status
A and old-A currently hold on the order of 50M records each. We would expect new records to be around 1M, and updates 5-10M.
Although we are currently using MySQL for this processing, I am wondering whether it would simply be better to do this using a scripting language. We are finding in particular that the step to flag the updates is very time-consuming; essentially we have an UPDATE statement that is unable to use any index.
so
CREATE TABLE A (key1 bigint,
    field1 varchar(50),
    field2 varchar(50),
    field3 varchar(50));
LOAD DATA ...
... add field rec_status to table A
... then
UPDATE A
LEFT JOIN `old-A` ON A.key1 = `old-A`.key1
SET rec_status = 'NEW'
WHERE `old-A`.key1 IS NULL;
UPDATE A
JOIN `old-A` ON A.key1 = `old-A`.key1
SET rec_status = 'UPDATED'
WHERE A.field1 <> `old-A`.field1
   OR A.field2 <> `old-A`.field2
   OR A.field3 <> `old-A`.field3;
...
I would consider skipping the "flag" step. Process the CSV file with a script, or table A with MySQL statements: select a record from the old-A table based on whatever criteria (such as field1 and/or field2 of table A); if found, lock and update the old-A record and delete the processed record from the CSV or table A. If not found, create the record in old-A with the new data.
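A set-based way to express that find-or-create step in MySQL, assuming key1 is old-A's primary key (a sketch of the idea, not the answer's script approach):

-- upsert rows from A into old-A in one pass, driven by the primary-key index
INSERT INTO `old-A` (key1, field1, field2, field3)
SELECT key1, field1, field2, field3
FROM A
ON DUPLICATE KEY UPDATE
    field1 = VALUES(field1),
    field2 = VALUES(field2),
    field3 = VALUES(field3);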

Inserting rows to other tables while importing in SSIS

I have a transactions table in a flat file like
ItemID ,ItemName ,CustomerID ,CustomerName ,Qty ,Price ,TotalValue
and target transaction table will have
ItemID,CustomerID,Qty,Price,TotalValue
Now I have to import it into the transactions table using an SSIS package.
But before importing, I should look up ItemID and CustomerID in the lookup tables ItemMaster and CustomerMaster; if they are not there, I have to insert new tuples into those tables, take the new ItemID or CustomerID, and then import the transaction into the transactions table. This can be done using Lookup transformations in SSIS.
Or is it better to import the transactions into a temporary table using an SSIS package, update the new ItemIDs and CustomerIDs in the temporary table, and then insert the transactions from the temp table into the main transactions table?
Which option is better performance-wise?
There are several ways of doing it.
1. Using Staging Table
2. Using Lookup
3. Transforming the stored procedure logic in SSIS
1. Using Staging Table
Dump all the flat file data into a staging table; let's name it StgTransaction. Create a procedure to perform the tasks.
Merge ItemMaster target
using StgTransaction src
on target.ItemName = src.ItemName
WHEN NOT MATCHED THEN
    INSERT (ItemName)
    values (src.ItemName);

Merge CustomerMaster target
using StgTransaction src
on target.CustomerName = src.CustomerName
WHEN NOT MATCHED THEN
    INSERT (CustomerName)
    values (src.CustomerName);
with cte (ItemID, CustomerID, Qty, Price, TotalValue) as
(
    Select I.ItemID,
           C.CustomerID,
           f.Qty, f.Price, f.TotalValue
    from ItemMaster I
    inner join StgTransaction f on I.ItemName = f.ItemName
    inner join CustomerMaster C on C.CustomerName = f.CustomerName
)
Insert into Transactions (ItemID, CustomerID, Qty, Price, TotalValue)
Select ItemID, CustomerID, Qty, Price, TotalValue
from cte;
Basically, I'm inserting all the missing values into the two master tables using MERGE syntax. Instead of MERGE you can use NOT EXISTS:
Insert into ItemMaster (ItemName)
Select distinct ItemName from StgTransaction s
where not exists
    (Select 1 from ItemMaster im
     where im.ItemName = s.ItemName);
Once the missing values are inserted, just join the staging table with the two master tables and insert the result into the target.
Wrap the above query into a procedure and call the procedure after the Data Flow Task (which loads the data from the flat file into the staging table), as sketched below.
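A minimal sketch of that wrapper, using the NOT EXISTS form (the procedure name is hypothetical):

CREATE PROCEDURE dbo.usp_LoadTransactionsFromStaging
AS
BEGIN
    SET NOCOUNT ON;

    -- 1. add any missing master rows
    INSERT INTO ItemMaster (ItemName)
    SELECT DISTINCT s.ItemName FROM StgTransaction s
    WHERE NOT EXISTS (SELECT 1 FROM ItemMaster im WHERE im.ItemName = s.ItemName);

    INSERT INTO CustomerMaster (CustomerName)
    SELECT DISTINCT s.CustomerName FROM StgTransaction s
    WHERE NOT EXISTS (SELECT 1 FROM CustomerMaster cm WHERE cm.CustomerName = s.CustomerName);

    -- 2. resolve the IDs and load the target table
    INSERT INTO Transactions (ItemID, CustomerID, Qty, Price, TotalValue)
    SELECT i.ItemID, c.CustomerID, s.Qty, s.Price, s.TotalValue
    FROM StgTransaction s
    JOIN ItemMaster i ON i.ItemName = s.ItemName
    JOIN CustomerMaster c ON c.CustomerName = s.CustomerName;
END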
2. Using Lookup
The package design will look like the screenshot in the original answer.
You should go with this approach if you are not allowed to create a staging table in your database. It will be slower because of the blocking component (Union All) and the OLE DB Command (which suffers from the RBAR, row-by-agonizing-row, problem).
Steps:
1. Use a Lookup against the ItemMaster table.
2. Create an ItemID column (name it NewItemID) using a Derived Column transformation; it will store the new ItemID generated from the ItemMaster table when the data is loaded. Connect the Lookup to the Derived Column transformation using the Lookup No Match Output.
3. The no-match values should be inserted into the ItemMaster table. For this, let's create a procedure which inserts the data and returns the ItemID value as an output:
ALTER PROCEDURE usp_InsertMaster
    @ItemName AS varchar(20),
    @id AS INT OUTPUT
AS
INSERT INTO ItemMaster
    (ItemName)
VALUES
    (@ItemName)

SET @id = SCOPE_IDENTITY()
-- works if the ID is an identity column; otherwise use the OUTPUT clause to retrieve the ID (see the sketch after these steps)
4. Call this procedure in the OLE DB Command and map the output to the column created in the Derived Column transformation.
After the OLE DB Command, use a Union All to combine the rows from the matched and no-match paths, and then follow the same procedure with the CustomerMaster table.
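For the non-identity case mentioned in the comment above, a sketch of the OUTPUT-clause variant (assuming the key column is named ItemID):

ALTER PROCEDURE usp_InsertMaster
    @ItemName AS varchar(20),
    @id AS INT OUTPUT
AS
DECLARE @new TABLE (id int);

INSERT INTO ItemMaster (ItemName)
OUTPUT inserted.ItemID INTO @new (id)   -- capture the generated key
VALUES (@ItemName);

SELECT @id = id FROM @new;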
3. Transforming the stored procedure logic in SSIS
The package design is:
1. Load the data into staging.
2. Use MERGE or NOT EXISTS to load the missing values into the two master tables, using an Execute SQL Task.
3. Use a Data Flow Task with the staging table as the source and two Lookups against the master tables. Since all the missing values are already inserted into the master tables, there won't be any Lookup no-match output; just connect the Lookup match output to an OLE DB Destination (the transactions table).
IMHO the 1st approach will be the fastest. The complication arises only because there are two master tables which need to be updated, after which the inserted IDs have to be fetched and loaded into the target table, so doing it all synchronously is difficult.