SSIS Balanced Data Distributor with Script Component

We have a small Data Flow Task which exports rows from a table to a flat file.
We added a Script Component for a transformation operation (converting varbinary to string).
Since the Script Component takes a while, we decided to use the new Integration Services
Balanced Data Distributor and divided the export task across two more flat files.
While executing the task, it seems that the BDD isn't dividing the workload and doesn't
work in parallel mode.
Do you have any idea why?

Have you tried using NTILE and creating multiple OLE DB sources in your Data Flow?
Example below for how to do that for 2 groups. You could of course split your source into as many as you need:
-- SQL Command text for OLE DB Source #1 named "MyGroup NTILE 1"
SELECT v.*
FROM
(
    SELECT t.*,
        NTILE(2) OVER (ORDER BY t.my_key) AS MyGroup
    FROM my_schema.my_table t
) v
WHERE v.MyGroup = 1;
-- SQL Command text for OLE DB Source #2 named "MyGroup NTILE 2"
SELECT v.*
FROM
(
    SELECT t.*,
        NTILE(2) OVER (ORDER BY t.my_key) AS MyGroup
    FROM my_schema.my_table t
) v
WHERE v.MyGroup = 2;
If you have a good idea in advance about the maximum number of NTILEs you need (say 10), then you could create 10 OLE DB Sources in advance.

Related

How to use the same SSIS Data Flow with different Date Values?

I have a very straightforward SSIS package containing one data flow which is comprised of an OLEDB source and a flat file destination. The OLEDB source calls a query that takes 2 sets of parameters. I've mapped the parameters to Date/Time variables.
I would like to know how best to pass 4 different sets of dates to the variables and use those values in my query.
I've experimented with the For Each Loop Container using an item enumerator. However, that does not seem to work and the package throws a System.IO.IOException error.
My container is configured as follows (screenshot omitted); note that both variables are of the Date/Time data type.
How can I pass 4 separate value sets to the same variables and use each variable pair to run my data flow?
Setup
I created a table and populated it with contiguous data for your sample set
DROP TABLE IF EXISTS dbo.SO_67439692;
CREATE TABLE dbo.SO_67439692
(
SurrogateKey int IDENTITY(1,1) NOT NULL
, ActionDate date
);
INSERT INTO
dbo.SO_67439692
(
ActionDate
)
SELECT
TOP (DATEDIFF(DAY, '2017-12-31', '2021-04-30'))
DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), '2017-12-31') AS ActionDate
FROM
sys.all_columns AS AC;
In my SSIS package, I added two variables, startDate and endDate2018, both of type DateTime. I added an OLE DB connection manager pointed to the database where I made the above table.
I added a Foreach Loop Container, configured it to use the Foreach Item Enumerator, and defined the columns there as datetime as well.
I populated it (what a clunky editor) with the year ranges from 2018 to 2020 and with 2021-01-01 to 2021-04-30, as sketched below.
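The item collection presumably ends up holding value pairs like these (the exact boundary dates are an assumption based on the ranges described):
Column 0 (startDate)    Column 1 (endDate2018)
2018-01-01              2018-12-31
2019-01-01              2019-12-31
2020-01-01              2020-12-31
2021-01-01              2021-04-30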
I wired the variables up as shown in the problem definition and ran it as is. No IO error reported.
Once I knew my foreach container was working, the data flow was trivial.
I added a data flow inside the foreach loop with an OLE DB Source using a parameterized query like so:
DECLARE @StartDate date, @EndDate date;
SELECT @StartDate = ?, @EndDate = ?;
SELECT *
FROM
    dbo.SO_67439692 AS S
WHERE
    S.ActionDate >= @StartDate AND S.ActionDate <= @EndDate;
I mapped my two variables in as parameter names of 0 and 1 and ran it.
The setup you described works great. Either there is more to your problem than stated or something else is misaligned. Follow along with my repro, compare it to what you've built, and you should see where things are "off".

How to compare two tables' row counts; if the counts match, OK, if not, restart the SSIS package

I have made an SSIS package in which I made the data flow for incremental data. The source and destination server IPs are different. Below you can find the flow diagrams of my package (control flow and data flow screenshots omitted).
The package is working fine.
In the Execute SQL Task: it controls the log table and starts the incremental task.
The query which I used is:
insert into audit_log (
Packagename,
process_date,
start_datetime,
end_datetime,
Record_processed,
status
)values('CRM-TO-TRANSORGDB',null,GETDATE(),null,null,null);
select MAX(ID) as ID,MAX(process_date) as proc_date from audit_log where Packagename ='CRM-TO-TRANSORGDB' ;
We store the ID and proc_date in variables.
In the Execute SQL Task 1: it just updates the log table.
UPDATE audit_log
SET
    process_date = ?,
    end_datetime = GETDATE(),
    status = 'SUCCESS',
    record_processed = ?
WHERE (packagename = 'CRM-TO-TRANSORGDB') AND ID = ?;
This is the query we have used to update the log table.
In the Data Flow we simply fetch all the records and put them into the destination table.
That is all I have done.
But my questions are:
1) How do I compare the total row counts of the source table and the destination table in the SSIS package?
2) If they don't match, how do I restart my task automatically?
@thomas, as per your instructions I have done the following:
1) I have made an Execute SQL Task each for the source and the destination.
2) I added an Execute Package Task and added the condition for the counts not matching,
with a precedence expression checking row_count_src != row_count_dest.
In Source_table_count I have used the below query:
select count(SubOrderID) as row_count_src from fact_suborder_journey
WHERE Suborderdate between '2016-06-01' and GETDATE()-1 ;
In dest_table_count I have used the below query:
select count(SubOrderID) as row_count_dest from fact_suborder_journey
WHERE Suborderdate between '2016-06-01' and GETDATE()-1 ;
I have added the two variables as Int64 in the SSIS package and mapped them in the result sets (screenshot omitted).
But after doing all this I am getting this error:
[Execute SQL Task] Error: An error occurred while assigning a value to variable "row_count_src": "The type of the value being assigned to variable "User::row_count_src" differs from the current variable type. Variables may not change type during execution. Variable types are strict, except for variables of type Object."
I haven't tested this completely, but you might be able to do something like this. It creates a loop of your package and will execute as long as your count variables differ from each other.
What have I done?
First I have a Data Flow Task which moves data from source to destination.
Then I have an Execute SQL Task which counts all rows from TableA (the source table) and maps the result to variable count1.
Then I have another Execute SQL Task which counts all rows from TableB (the destination table) and maps the result to variable count2.
Then I create an Execute Package Task which references the package itself, and I add a precedence constraint with an expression saying count1 != count2.
Because if they are different, you want to restart the task; if they are equal, the last task, the Execute Package Task, will never be executed.
Hope that is something like what you need; see the sketch below.
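A minimal sketch of those pieces, with TableA/TableB and the count1/count2 names as placeholders:
-- Execute SQL Task (source count); single-row result set mapped to User::count1
SELECT CAST(COUNT(*) AS bigint) AS count1 FROM TableA;
-- Execute SQL Task (destination count); single-row result set mapped to User::count2
SELECT CAST(COUNT(*) AS bigint) AS count2 FROM TableB;
-- Precedence constraint expression on the connector to the Execute Package Task:
-- @[User::count1] != @[User::count2]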
If I understand your challenge correctly...
In the data flow task, use a RowCount transformation between the source and the destination to capture the number of rows written to the destination. This will be stored in a variable.
In the control flow, get the max row count available from the log table and store that in a variable.
Create an Execute Package Task that executes this same package, and put a precedence constraint before it that checks whether the variable from step 1 <> the variable from step 2; a sketch follows.
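A rough sketch of steps 2 and 3, reusing the audit_log table from the question (the variable names are placeholders):
-- Step 2: Execute SQL Task; result set mapped to, say, User::logCount
SELECT CAST(MAX(Record_processed) AS bigint) AS logCount
FROM audit_log
WHERE Packagename = 'CRM-TO-TRANSORGDB';
-- Step 3: precedence constraint expression before the Execute Package Task:
-- @[User::rowsWritten] != @[User::logCount]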

Expression Builder SSIS

I am trying to write a query dynamically in the SSIS Expression Builder, but I am stuck with an error and would really appreciate your help.
My source here is DB2.
My query is:
Select * FROM schema.table_name
WHERE column_a < 100
OR (column_a >= 100 AND column_b = @[User::days])
FOR FETCH ONLY WITH UR
Note: @[User::days] = current date - x days
That's not how it works.
You can either use an SSIS Variable with an Expression to satisfy this requirement or, assuming your source supports it, parameterize the query.
Expression
Add a Variable to the SSIS package. Call it QuerySource, type is String.
If 2012+, in the Expression, not the Value, use the following formula:
"Select * FROM schema.table_name WHERE column_a < 100 OR (column_a >= 100 AND column_b = "
+ (DT_WSTR, 5) @[User::days]
+ ") FOR FETCH ONLY WITH UR"
If 2005/2008, you will then need to right click on the row in the Variables window and select Properties. In the resulting window, you will need to set EvaluateAsExpression to True as well as copy the above into the Expression property.
The carriage returns above are for readability. They may or may not paste well into your version of BIDS/SSDT.
Now that you've created your Variable, you'll need to use it in the source. Assuming OLE DB, you will want to select the Data Access Mode of "SQL command from variable". If you're using an ADO.NET source, then you'll need to go to the Control Flow, single-click the Data Flow Task, right-click and select Properties. From the Properties window, find Expressions and click the ellipses. Select the ADO.NET source and assign the Variable as its source.
Parameterization
Certain sources, like OLEDB, support parameterization. Set your Data Access Mode to SQL Command.
Select * FROM schema.table_name
WHERE column_a < 100
OR (column_a >= 100 AND column_b = ?)
FOR FETCH ONLY WITH UR
The ? is an ordinal-based placeholder for OLE DB connections. Click the Parameters button and assign your @[User::days] variable to it.
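As an aside, the note says @[User::days] = current date - x days. If that variable is meant to hold the actual cutoff date, a variable expression along these lines could compute it (x = 7 is a placeholder, and the (DT_WSTR, 5) cast above would then need widening to fit a date string):
DATEADD("dd", -7, GETDATE())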

An item with the same key has already been added - SQL Server 2012 Subquery Issue

I am writing code within SQL server 2012 to transfer into the query designer of Report Builder 3.0.
My code works perfectly within Management Studio, and it works within the actual query designer, but once I press OK within the query designer, it throws me the error:
"Could not update a list of fields for the query. Verify that you can connect to the data source and that your query syntax is correct"
Under details:
"An item with the same key has already been added"
This is the code I am using:
Select *
from
    (Select distinct srt.Name,
        percentile_disc(.5) WITHIN GROUP (ORDER BY sr.price) OVER (PARTITION BY srt.Name) AS MedianSpend
    from ServiceReq sr inner join ServiceReqTemplate srt
        on srt.RecId = sr.SvcReqTmplLink_RecID
    Where Name like '%') medQuery
inner join
    (select distinct srt.Name,
        cast(sum(sr.price) as int) as AvgCost,
        cast(sum(sr.cost) as int) as AvgTransCost,
        cast(avg(sr.TotalTimeSpent) as int) as TotalTimeSpent
    from ServiceReq sr, ServiceReqTemplate srt
    where sr.SvcReqTmplLink_RecID = srt.RecId
    group by srt.Name) avgQuery
on medQuery.Name LIKE avgQuery.Name
I think the problem is that there would be two columns both called "Name" in one result set, which is not allowed. I was thinking I could add another column to the table, call it "Name_2", copy all the data from the "Name" column into "Name_2", and then use that. Would this be the easiest way of successfully implementing this code in Report Builder?
Add AS Name2 to the Name column in the second query, and then reference avgQuery.Name2 in the join at the end.
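That is, a sketch of the fix applied to the query above; only the alias and the join condition change:
Select *
from
    (Select distinct srt.Name,
        percentile_disc(.5) WITHIN GROUP (ORDER BY sr.price) OVER (PARTITION BY srt.Name) AS MedianSpend
    from ServiceReq sr inner join ServiceReqTemplate srt
        on srt.RecId = sr.SvcReqTmplLink_RecID
    Where Name like '%') medQuery
inner join
    (select distinct srt.Name AS Name2, -- alias so the two Name columns no longer collide
        cast(sum(sr.price) as int) as AvgCost,
        cast(sum(sr.cost) as int) as AvgTransCost,
        cast(avg(sr.TotalTimeSpent) as int) as TotalTimeSpent
    from ServiceReq sr, ServiceReqTemplate srt
    where sr.SvcReqTmplLink_RecID = srt.RecId
    group by srt.Name) avgQuery
on medQuery.Name LIKE avgQuery.Name2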

Read SQL Server transaction log

How can we read SQL Server transaction logs? I know about using DBCC LOG(database, 4); it generates log output, and now I want to decode the log record, which is in hex format:
0x00003E001C000000A500000001000200BE040000000006021D0000000100000018000000 (only a part of the data)
Is there any method to read it in text format, or to convert the hex data to text? I want to make a tool that can read logs. Third-party tools are available, e.g. ApexSQL, but they are paid tools.
You can use sys.fn_dblog to read the transaction log. Example below.
SELECT [RowLog Contents 0],
[RowLog Contents 1],
[Current LSN],
Operation,
Context,
[Transaction ID],
AllocUnitId,
AllocUnitName,
[Page ID],
[Slot ID]
FROM sys.fn_dblog(NULL,NULL)
WHERE Context IN ('LCX_MARK_AS_GHOST', 'LCX_HEAP', 'LCX_CLUSTERED')
AND Operation IN ('LOP_DELETE_ROWS', 'LOP_INSERT_ROWS')
For delete and insert operations, IIRC, [RowLog Contents 0] contains the whole row inserted or deleted. Updates are a bit more complicated in that only a partial row may be logged.
To decode this row format you need to understand how rows are stored internally in SQL Server. The book Microsoft SQL Server 2008 Internals covers this in some detail. You can also download the SQL Server Internals Viewer to help in this regard (and I believe the source code for Mark Rasmussen's OrcaMDF is available too, which presumably has some code to decode the internal row format).
For an example of doing this in TSQL see this blog post which demonstrates that it is perfectly possible to extract useful information from the log as long as the aim of the project is limited. Writing a full blown log reader that could cope with schema changes in the objects and things like sparse columns (and column store indexes in next version) would likely be a huge amount of work though.
There are several SQL Server functions and commands (e.g. fn_dblog, fn_dump_dblog, and DBCC PAGE) that potentially provide a way to view LDF file content. However, significant knowledge of T-SQL is required to use them, some are undocumented, and the results they provide are difficult to convert to a human-readable format. The following are examples of viewing LDF file content using SQL Server functions and commands:
1 - Here is an example using fn_dblog to read an online transaction log, with a result of 129 columns (only 7 shown in the original screenshot, omitted here).
2 - The fn_dump_dblog function is used to read transaction log native or natively compressed backups. The result is similar (screenshot omitted).
Unfortunately, no official documentation is available for the fn_dblog and fn_dump_dblog functions. To translate the columns, you need to be familiar with the internal structure and data format, the flags and their total number in a row of data.
3 - DBCC PAGE is used to read the content of database online files (MDF and LDF). The result is a hexadecimal output which, unless you have a hex editor, will be difficult to interpret.
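For example, a sketch of point 3 (MyDatabase and the file/page numbers are placeholders; DBCC PAGE is undocumented):
-- Trace flag 3604 routes DBCC output to the client instead of the error log
DBCC TRACEON (3604);
-- Arguments: database name, file id, page number, print option (3 = most detail)
DBCC PAGE ('MyDatabase', 1, 1, 3);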
Select * from sys.fn_dblog(NULL,NULL)
WHERE Context IN ('LCX_MARK_AS_GHOST', 'LCX_HEAP', 'LCX_CLUSTERED')
AND Operation IN ('LOP_DELETE_ROWS', 'LOP_INSERT_ROWS')
You get all the transaction-related information using the above query; the row log contents columns display your actual record, which is in hexadecimal format. There are write-ups available that show how to decode that data into a human-readable format.
Try this.
Select
b.Description,
d.AllocUnitName,
b.[Transaction ID],
d.name,
d.Operation,
b.[Transaction Name],
b.[Begin Time],
c.[End Time]
from (
Select
Description,
[Transaction Name],
Operation,
[Transaction ID],
[Begin Time]
FROM sys.fn_dblog(NULL,NULL)
where Operation like 'LOP_begin_XACT'
) as b
inner join (
Select
Operation,
[Transaction ID],
[End Time]
FROM sys.fn_dblog(NULL,NULL)
where Operation like 'LOP_commit_XACT'
) as c
on c.[Transaction ID] = b.[Transaction ID]
inner join (
select
x.AllocUnitName,
x.Operation,
x.[Transaction ID],
z.name
FROM sys.fn_dblog(NULL,NULL) x
inner join sys.partitions y
on x.PartitionId = y.partition_id
inner join sys.objects z
on z.object_id = y.object_id
where z.type != 'S'
)as d
on d.[Transaction ID] = b.[Transaction ID]
order by b.[Begin Time] ASC
That can get the database transactions (insert, update, delete), the transaction times, and the object names.
Hope that can help.
Step 1.
CREATE TABLE #hex(
    [hex_Value] varbinary(max) NULL  -- (max), since bare varbinary defaults to 1 byte
);
Step 2.
Insert data into the table, for example:
insert into #hex values(0x300008000F000000030000020015001B00536976754D79736F7265)
Step 3.
SELECT LTRIM(RTRIM(CONVERT(VARCHAR(max),REPLACE(hex_Value, 0x00, 0x20))))
FROM #hex
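For the sample value above, the result ends with the readable string "SivuMysore" (bytes 0x53 0x69 0x76 0x75 0x4D 0x79 0x73 0x6F 0x72 0x65); the leading header bytes are non-printable and come through as spaces and stray characters.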
I may not fully understand your needs, but the data from your log can be extracted by tools like Lumigent Log Explorer. I don't know of any other way to do what you want.