SSIS Type 2 SCD with batch ID

I want to modify the standard SSIS SCD behavior.
EmployeeID is my business key, and title, firstname and lastname are type 2 attributes.
I want BatchLogID to change only when a change is detected; otherwise it should remain unchanged.
BatchLogID is passed to the data flow as an int.
EmployeeID,title,firstname,lastname,BatchLogID,startdate,enddate
source data
101,Miss,Jane,Smith,101 -- inserted for first time
101,Miss,Jane,Smith,102 process runs
101,Miss,Jane,Smith,103 process runs
101,Miss,Jane,Smith,104 process runs
101,Mrs, Jane,Brown,105 process runs -- only when data has changed do I want the batch number in the target updated
target data
101,Miss,Jane,Smith,101,101,1 jan 2000,null -- inserted for the first time
101,Miss,Jane,Smith,105,105,1 jan 2000,5 Jan 2000 -- when a change is detected, the existing row is updated
101,Mrs, Jane,Brown,105,105 jan 2000,null -- only when data has changed do I want the batch number updated
Any thoughts?

You may need to perform a delta load using one of the following:
Lookup and Derived Column
Merge Join, Conditional Split and Derived Column
Change Data Capture
Temporal tables (if the RDBMS supports them)
To learn more:
https://www.c-sharpcorner.com/article/design-the-full-load-and-delta-load-patterns-in-ssis/

Used a SQL MERGE command - had to wash through twice for
DECLARE @batchLogID int = 1;
-- Pass 1 end-dates the open row of any key whose value changed. That key still
-- matches its open row, so the new version is only inserted on pass 2, once no
-- open row is left for it.
MERGE dbo.targetTable AS t
USING dbo.sourceTable AS s
ON (t.[key] = s.[key] AND t.endDate IS NULL)
WHEN MATCHED AND s.[value] <> t.[value]
THEN UPDATE SET t.enddate = DATEADD(ss, -1, CAST(CAST(GETDATE() AS date) AS datetime))
WHEN NOT MATCHED
THEN INSERT ([key], [col1], [col2], [value], [col3], startdate, BatchLogID)
VALUES (s.[key], s.[col1], s.[col2], s.[value], s.[col3], CAST(GETDATE() AS date), @batchLogID);
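A possible single-pass variant, as a rough sketch rather than something tested against this schema: SQL Server lets a MERGE with an OUTPUT clause act as the source of an outer INSERT, so the rows that were just end-dated can be re-inserted as new versions in the same statement. Table and column names are taken from the MERGE above; the assumption that dbo.targetTable has no enabled triggers and is not involved in any foreign key relationship is mine (the composable form is rejected otherwise).

```sql
DECLARE @batchLogID int = 1;
DECLARE @today date = CAST(GETDATE() AS date);

-- Outer INSERT: adds the new version of every row the inner MERGE end-dated.
INSERT INTO dbo.targetTable ([key], [col1], [col2], [value], [col3], startdate, BatchLogID)
SELECT c.[key], c.[col1], c.[col2], c.[value], c.[col3], @today, @batchLogID
FROM
(
    MERGE dbo.targetTable AS t
    USING dbo.sourceTable AS s
        ON (t.[key] = s.[key] AND t.enddate IS NULL)
    -- Changed rows: close the current version.
    WHEN MATCHED AND s.[value] <> t.[value] THEN
        UPDATE SET t.enddate = DATEADD(ss, -1, CAST(@today AS datetime))
    -- Brand-new keys: insert directly.
    WHEN NOT MATCHED BY TARGET THEN
        INSERT ([key], [col1], [col2], [value], [col3], startdate, BatchLogID)
        VALUES (s.[key], s.[col1], s.[col2], s.[value], s.[col3], @today, @batchLogID)
    OUTPUT $action AS merge_action, s.[key], s.[col1], s.[col2], s.[value], s.[col3]
) AS c
WHERE c.merge_action = 'UPDATE';
```

Unchanged rows match but fail the `<>` test, so they are left alone and keep their original BatchLogID. Note that `<>` does not detect a change to or from NULL.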

Related

How to use the same SSIS Data Flow with different Date Values?

I have a very straightforward SSIS package containing one data flow which is comprised of an OLEDB source and a flat file destination. The OLEDB source calls a query that takes 2 sets of parameters. I've mapped the parameters to Date/Time variables.
I would like to know how best to pass 4 different sets of dates to the variables and use those values in my query?
I've experimented with the For Each Loop Container using an item enumerator. However, that does not seem to work and the package throws a System.IO.IOException error.
My container is configured as follows:
Note that both variables are of the Date/Time data type.
How can I pass 4 separate value sets to the same variables and use each variable pair to run my data flow?
Setup
I created a table and populated it with contiguous data for your sample set
DROP TABLE IF EXISTS dbo.SO_67439692;
CREATE TABLE dbo.SO_67439692
(
SurrogateKey int IDENTITY(1,1) NOT NULL
, ActionDate date
);
INSERT INTO
dbo.SO_67439692
(
ActionDate
)
SELECT
TOP (DATEDIFF(DAY, '2017-12-31', '2021-04-30'))
DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), '2017-12-31') AS ActionDate
FROM
sys.all_columns AS AC;
In my SSIS package, I added two variables, startDate and endDAte2018, both of type DateTime. I added an OLE DB connection manager pointed to the database where I made the above table.
I added a Foreach Loop Container, configured it to use the Foreach Item Enumerator, and defined the columns there as datetime as well.
I populated it (what a clunky editor) with the year ranges from 2018 to 2020 as shown and 2021-01-01 to 2021-04-30.
I wired the variables up as shown in the problem definition and ran it as is. No IO error reported.
Once I knew my foreach container was working, the data flow was trivial.
I added a data flow inside the foreach loop with an OLE DB Source using a parameterized query like so
DECLARE @StartDate date, @EndDate date;
SELECT @StartDate = ?, @EndDate = ?;
SELECT *
FROM
dbo.SO_67439692 AS S
WHERE
S.ActionDate >= @StartDate AND S.ActionDate <= @EndDate;
I mapped my two variables in as parameter names of 0 and 1 and ran it.
The setup you described works great. Either there is more to your problem than stated, or something else is misaligned. Follow along with my repro, compare it to what you've built, and you should see where things are "off".

How to compare the row counts of two tables and restart the SSIS package if they do not match

I have built an SSIS package containing a data flow for loading incremental data. The source and destination are on different servers. The diagrams of my package are below.
[Control flow diagram]
[Data flow diagram]
The package is working fine.
The first Execute SQL Task maintains the log table and starts the incremental task.
The query I used is:
insert into audit_log (
Packagename,
process_date,
start_datetime,
end_datetime,
Record_processed,
status
)values('CRM-TO-TRANSORGDB',null,GETDATE(),null,null,null);
select MAX(ID) as ID,MAX(process_date) as proc_date from audit_log where Packagename ='CRM-TO-TRANSORGDB' ;
I store the ID and proc_date in variables.
Execute SQL Task 1 just updates the log table:
UPDATE audit_log
SET
process_date=?,
end_datetime = GETDATE(),
status='SUCCESS',
record_processed=?
WHERE (packagename = 'CRM-TO-TRANSORGDB') AND ID=? ;
This is the query used to update the log table.
The data flow simply fetches all the records and puts them into the destination table.
That is all I have done.
My questions are:
1) How do I compare the total row count of the source table with that of the destination table in the SSIS package?
2) If the counts do not match, how do I restart the task automatically?
@thomas, as per your instructions I have done the following:
1) I created Execute SQL Tasks for the source and destination counts.
2) I added an Execute Package Task with a precedence constraint for when the counts do not match,
using the expression row_count_src != row_count_dest.
In Source_table_count I used the query below:
select count(SubOrderID) as row_count_src from fact_suborder_journey
WHERE Suborderdate between '2016-06-01' and GETDATE()-1 ;
In dest_table_count I used the query below:
select count(SubOrderID) as row_count_dest from fact_suborder_journey
WHERE Suborderdate between '2016-06-01' and GETDATE()-1 ;
I added two variables of type Int64 to the SSIS package and mapped them in the result set (see the screenshot of what I have done).
After doing all this, I get this error:
[Execute SQL Task] Error: An error occurred while assigning a value to variable "row_count_src": "The type of the value being assigned to variable "User::row_count_src" differs from the current variable type. Variables may not change type during execution. Variable types are strict, except for variables of type Object.
".
I haven't tested this completely, but you might be able to do something like this. It creates a loop over your package that keeps executing as long as your count variables differ from each other.
What have I done?
First, I have a Data Flow Task which moves data from source to destination.
Then I have an Execute SQL Task which counts all rows from TableA (the source table) and maps the result to the variable count1.
Then I have an Execute SQL Task which counts all rows from TableB (the destination table) and maps the result to the variable count2.
Then I create an Execute Package Task that references the package itself, and add a precedence constraint with the expression count1 != count2.
If the counts differ, you want to restart the task. If they are equal, the final Execute Package Task is never executed.
I hope that is what you are after.
If I understand your challenge correctly...
In the Data Flow Task, use a Row Count transformation between source and destination to capture the rows written to the destination. This will be stored in a variable.
In the control flow, get the max row count available from the log table and store that in a variable.
Create an Execute Package Task that executes this same package, and put a precedence constraint before it that checks whether the variable from step 1 <> the variable from step 2.

Generate 10 queries to run in SSIS

I have a driver table, date_driver_table, that contains 10 dates: Jan 2014, Feb 2014, ... Nov 2014.
I need to run a query:
select * from records_Jan2014 where recdate='Jan 2014'
This is query 1. After this runs and puts the result set in a SQL Server table, query 2,
select * from records_Feb2014 where recdate='Feb 2014'
will then run and do the same insert into the SQL Server table, and then query 3, and so forth until no dates are left in the driver table.
So in SSIS I have an Execute SQL Task with Full result set enabled that puts all the dates from the date driver table into a variable called date of type Object. That feeds a Foreach Loop with a variable called single date of type String. Inside the loop is a data flow with a source and a SQL Server table destination. The problem is how to set up the source so that it runs query 1, puts the results in the table, then runs query 2, and so on.
I was thinking of creating 10 files containing the SQL and then using the OLE DB Source with a file as the SQL to run, but I am sure there is a way to do this with the Foreach Loop. Can anyone point me to how to do this? The question is how to set up the Foreach Loop so that it runs query 1, puts the results into the table, then runs query 2 and puts those into the table, and so on until all the records are done.
I used a SQL command expression pointing to a variable on the ADO.NET Source. The variable was fed from an Execute SQL Task which supplied the list to process.

T-SQL Change Data Capture log cleanup

I have enabled CDC on a few tables in my SQL Server 2008 database. I want to change the number of days for which the change history is kept.
I have read that by default change logs are kept for 3 days before they are deleted by the sys.sp_cdc_cleanup_change_table stored procedure.
Does anyone know how I can change this default value so that I can keep the logs for longer?
Thanks
You need to update the cdc_jobs.retention field for your database. The record in the cdc_jobs table won't exist until at least one table has been enabled for CDC.
-- modify msdb.dbo.cdc_jobs.retention value (in minutes) to be the length of time to keep change-tracked data
update
j
set
[retention] = 3679200 -- 7 years
from
sys.databases d
inner join
msdb.dbo.cdc_jobs j
on j.database_id = d.database_id
and j.job_type = 'cleanup'
and d.name = '<Database Name, sysname, DatabaseName>';
Replace <Database Name, sysname, DatabaseName> with your database name.
Two alternative solutions:
Drop the cleanup job:
EXEC sys.sp_cdc_drop_job @job_type = N'cleanup';
Change the job via the stored procedure:
EXEC sys.sp_cdc_change_job
@job_type = N'cleanup',
@retention = 2880;
The retention time is in minutes, with a maximum of 52494800 (100 years). If you drop the job, data is never cleaned up, because nothing is left to check whether there is data to clean up. If you want to keep data indefinitely, I'd prefer dropping the job.
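As a rough sketch of the sp_cdc_change_job route end to end: changed job parameters only take effect once the cleanup job has been stopped and started again, and sys.sp_cdc_help_jobs can be used to check the stored retention. The database name MyDatabase and the 30-day retention are placeholders chosen for illustration, not values from the question.

```sql
USE MyDatabase;  -- hypothetical database name; CDC must already be enabled on it

-- Retention is expressed in minutes: 30 days * 1440 minutes per day = 43200.
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 43200;

-- Restart the cleanup job so the new retention is picked up.
EXEC sys.sp_cdc_stop_job  @job_type = N'cleanup';
EXEC sys.sp_cdc_start_job @job_type = N'cleanup';

-- Inspect the CDC jobs for this database, including the retention column.
EXEC sys.sp_cdc_help_jobs;
```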

Query not working in execute SQL task in the ssis package

This query works fine in the query window of SQL Server 2005, but throws an error when I run it in an Execute SQL Task in the SSIS package.
declare @VarExpiredDays int
Select @VarExpiredDays= Value1 From dbo.Configuration(nolock) where Type=11
DECLARE @VarENDDateTime datetime,@VarStartDateTime datetime
SET @VarStartDateTime= GETDATE()- @VarExpiredDays
SET @VarENDDateTime=GETDATE();
select @VarStartDateTime
select @VarENDDateTime
SELECT * FROM
(SELECT CONVERT(Varchar(11),@VarStartDateTime,106) AS VarStartDateTime) A,
(SELECT CONVERT(Varchar(11),@VarENDDateTime,106) AS VarENDDateTime) B
What is the issue here?
Your intention is to retrieve the values of start and end and assign those into SSIS variables.
As @Diego noted above, those two SELECTs are going to cause trouble. With the Execute SQL Task, your result set options are None, Single row, Full result set and XML. Discarding the XML option because I don't want to deal with it and None because we want rows back, our options are Single or Full. We could use Full, but then we'd need to return values of the same data type and then the processing gets much more complicated.
By process of elimination, that leads us to using a resultset of Single Row.
Query aka SQLStatement
I corrected the supplied query by simply removing the two aforementioned SELECTs. The final SELECT can be simplified to the following (no need to put the values into derived tables):
SELECT
CONVERT(Varchar(11),@VarStartDateTime,106) AS VarStartDateTime
, CONVERT(Varchar(11),@VarENDDateTime,106) AS VarENDDateTime
Full query used below
declare @VarExpiredDays int
-- I HARDCODED THIS
Select @VarExpiredDays= 10
DECLARE @VarENDDateTime datetime,@VarStartDateTime datetime
SET @VarStartDateTime= GETDATE()- @VarExpiredDays
SET @VarENDDateTime=GETDATE();
/*
select @VarStartDateTime
select @VarENDDateTime
*/
SELECT * FROM
(SELECT CONVERT(Varchar(11),@VarStartDateTime,106) AS VarStartDateTime) A,
(SELECT CONVERT(Varchar(11),@VarENDDateTime,106) AS VarENDDateTime) B
Verify the Execute SQL Task runs as expected. At this point, it simply becomes a matter of wiring up the outputs to SSIS variables. As you can see in the results window below, I created two package-level variables, StartDateText and EndDateText, of type String with default values of an empty string. You can see in the Locals window that they have values assigned that correspond to @VarExpiredDays = 10 in the supplied source query.
Getting there is simply a matter of configuring the Result Set tab of the Execute SQL Task. The hardest part of this is ensuring you have a correct mapping between source system type and SSIS type. With an OLE DB connection, the Result Name has no bearing on what the column is called in the query. It is simply a matter of referencing columns by their ordinal position (0 based counting).
Final thought, I find it better to keep things in their base type, like a datetime data type and let the interface format it into a pretty, localized value.
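To illustrate that final thought, here is a minimal sketch of the same query returning datetime values instead of formatted text. It assumes the two result columns are mapped by ordinal (0 and 1) to SSIS variables of type DateTime rather than String; those variables are not part of the setup described above. DECLARE and SET are kept separate so the batch also runs on SQL Server 2005.

```sql
DECLARE @VarExpiredDays int;
SET @VarExpiredDays = 10;  -- hardcoded as in the repro; the question reads it from dbo.Configuration

-- One single-row result set with two datetime columns, left unformatted.
SELECT
    DATEADD(DAY, -@VarExpiredDays, GETDATE()) AS VarStartDateTime
,   GETDATE() AS VarENDDateTime;
```

Any display formatting (such as style 106) can then be applied wherever the value is finally shown.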
You have more than one output. You have two variables and one query.
You need to select only one in the "ResultSet" property.
Are you mapping these to the output parameters?
select @VarStartDateTime
select @VarENDDateTime