How to find/rip out all dimension/measures on an SSAS cube (Extended Events) - sql-server-2008

I am using Extended Events to analyse dimension/measure usage in an SSAS cube. I used an XMLA template (from Chris Webb's blog) to create the event session. I then parsed the data into a staging table, stripping the required field out of the event data XML:
select TraceFileName
, TraceEvent
, e.EventDataXML.value('(/event/data[@name="TextData"]/value)[1]', 'varchar(max)') as TextData
into #List
from
(
select [file_name] as TraceFileName
, object_name as TraceEvent
, convert(xml, event_data) as EventDataXML
from sys.fn_xe_file_target_read_file('*path*', null, null, null)
) e;
I plan to then use CHARINDEX to find measure/dimension calls within the TextData field. However, to do this I will need a list of all dimensions/measures in the SSAS cube. Is there a way to rip this out?
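For reference, here is a minimal sketch of the CHARINDEX matching step described above, assuming the object names have already been loaded into a hypothetical #CubeObjects table (one common source for such a list is the SSAS DMVs $SYSTEM.MDSCHEMA_MEASURES and $SYSTEM.MDSCHEMA_DIMENSIONS, queried against the Analysis Services instance):
-- Sketch only: #CubeObjects(ObjectName, ObjectType) is an assumed table of dimension/measure names
SELECT l.TraceFileName
, l.TraceEvent
, c.ObjectType
, c.ObjectName
FROM #List AS l
INNER JOIN #CubeObjects AS c
ON CHARINDEX(c.ObjectName, l.TextData) > 0;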

Related

How to use the same SSIS Data Flow with different Date Values?

I have a very straightforward SSIS package containing one data flow which is comprised of an OLEDB source and a flat file destination. The OLEDB source calls a query that takes 2 sets of parameters. I've mapped the parameters to Date/Time variables.
I would like to know how best to pass 4 different sets of dates to the variables and use those values in my query?
I've experimented with the For Each Loop Container using an item enumerator. However, that does not seem to work and the package throws a System.IO.IOException error.
My container is configured as follows:
Note that both variables are of the Date/Time data type.
How can I pass 4 separate value sets to the same variables and use each variable pair to run my data flow?
Setup
I created a table and populated it with contiguous data for your sample set
DROP TABLE IF EXISTS dbo.SO_67439692;
CREATE TABLE dbo.SO_67439692
(
SurrogateKey int IDENTITY(1,1) NOT NULL
, ActionDate date
);
INSERT INTO
dbo.SO_67439692
(
ActionDate
)
SELECT
TOP (DATEDIFF(DAY, '2017-12-31', '2021-04-30'))
DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), '2017-12-31') AS ActionDate
FROM
sys.all_columns AS AC;
In my SSIS package, I added two variables, startDate and endDAte2018, both of type DateTime. I added an OLE DB connection manager pointed to the database where I created the above table.
I added a Foreach Loop Container, configured it to use the Foreach Item Enumerator, and defined the two columns there as DateTime as well.
I populated it (what a clunky editor) with the year ranges for 2018 through 2020 as shown, plus 2021-01-01 to 2021-04-30.
I wired the variables up as shown in the problem definition and ran it as is. No IO error reported.
Once I knew my foreach container was working, the data flow was trivial.
I added a data flow inside the foreach loop with an OLE DB Source using a parameterized query like so
DECLARE @StartDate date, @EndDate date;
SELECT @StartDate = ?, @EndDate = ?;
SELECT *
FROM
dbo.SO_67439692 AS S
WHERE
S.ActionDate >= @StartDate AND S.ActionDate <= @EndDate;
I mapped my two variables in as parameter names of 0 and 1 and ran it.
The setup you described works great. Either there is more to your problem than stated or there's something else misaligned. Follow along with my repro and compare it to what you've built, and you should see where things are "off".

How to Map JSON data from a REST API to Azure SQL using Data Factory

I have a new pipeline in Azure Data Factory.
I created the datasets, one of them from the REST API (a public one):
https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=MSFT&apikey=demo
and then I created an Azure SQL table with the columns shown in the screenshot.
The problem is that I don't know how to do the mapping; as this is a complex JSON object, I am limited by the Mapping Designer:
How do I map the date?
I tend to use an ELT approach for these, calling the REST API with a Web task and storing the JSON in a SQL table and then shredding the JSON using SQL functions like OPENJSON.
Example pipeline:
The key to getting this approach to work is the expression on the stored procedure parameter. This takes the whole JSON output from the Web task and passes it in to the proc. This is a simple logging proc which inserts the record into a logging table:
@string(activity('Web1').output)
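As a rough sketch of what that logging proc might look like (the table and proc names here are assumptions chosen to line up with the OPENJSON query below, not the exact objects used in the pipeline):
-- Hypothetical logging table and proc; adjust names and types to your schema
CREATE TABLE dbo.myLog
(
logId int IDENTITY(1,1) PRIMARY KEY
, logDetails nvarchar(max) NOT NULL
, loggedAt datetime2(3) NOT NULL DEFAULT (SYSUTCDATETIME())
);
GO
CREATE PROCEDURE dbo.LogPipelineOutput
@logDetails nvarchar(max)
AS
BEGIN
SET NOCOUNT ON;
-- Store the raw JSON passed in from the Web task for later shredding
INSERT INTO dbo.myLog (logDetails) VALUES (@logDetails);
END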
I log to a table and then shred the JSON, or you could use OPENJSON directly on the stored proc parameter, e.g.
--INSERT INTO ...
SELECT
CAST( [key] AS DATE ) AS timeSeriesDate,
JSON_VALUE ( x.[value], '$."1. open"' ) AS [open],
JSON_VALUE ( x.[value], '$."2. high"' ) AS [high],
JSON_VALUE ( x.[value], '$."3. low"' ) AS [low],
JSON_VALUE ( x.[value], '$."4. close"' ) AS [close],
JSON_VALUE ( x.[value], '$."5. volume"' ) AS [volume]
FROM dbo.myLog
CROSS APPLY OPENJSON(logDetails , '$."Time Series (Daily)"' ) x
--WHERE logId = 23333;
My results:
Does the data have a structure? If so, you can generate a dummy file, place it in the sink and do a one-time mapping. If not, you can run a Lookup on the file, iterate over the content in a ForEach activity and insert the details into a SQL table.
E.g.
insert <<your table>>
select '@item().name', '@item().address.city', @item().value
The important thing to remember is to iterate at the correct array. Let me know if it's not clear. Not in front of a system right now, so can't add screenshots.

Finding updated records in SSIS -- to hash or not to hash?

I'm working on migrating data from a table in a DB2 database to our SQL Server database using SSIS. The table that I am pulling data from contains a respectable amount of data--a little less than 100,000 records--but it also has 46 columns.
I only want to update the rows that NEED to be updated, so I came to the conclusion that I could either use a Lookup Transformation, check all 46 columns and redirect the "no matches" to be updated on the SQL table, or I could hash each row in the datasets after reading the data in at the beginning of my data flow task and then use the hash values as a comparison later on when determining whether the rows are equal.
My question would be: Which is the better route to take? I like hashing them, but I'm not sure if that is the best route to take. Does anyone have any pearls of wisdom they'd like to share?
Why not both?
Generally speaking, there are two things we look for when doing an incremental load: does this row exist, and if it exists, has it changed? If there's a single column, it's trivial. When there are many columns to check, it becomes quite the pain, especially if you're using SSIS to map all those columns and/or have to deal with worrying about NULLs.
I solve the multi-column problem by cheating - I create two columns in all my tables: HistoricalHashKey and ChangeHashKey. The historical hash key covers all the business keys. The change hash key covers all the rest of the material columns (I'd exclude things like audit columns). We are not storing the concatenated values directly in our hash columns. Instead, we're going to "math the stuff out of it" and apply a hashing algorithm, SHA-1, which takes all the input columns and returns a 20-byte output.
There are three caveats to using this approach. You must concatenate the columns in the same order every time. These will be case sensitive. Trailing space is significant. That's it.
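A quick throwaway query (not part of the package) to illustrate the case-sensitivity and trailing-space caveats:
-- All three expressions return different hashes: case and trailing whitespace both change the output
SELECT
HASHBYTES('SHA1', 'Red') AS hash_original
, HASHBYTES('SHA1', 'red') AS hash_lowercase
, HASHBYTES('SHA1', 'Red ') AS hash_trailing_space;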
In your tables, you would add those two columns as binary(20) NOT NULL.
Set up
Your control flow would look something like this
and your data flow something like this
OLESRC Incremental Data
(Assume I'm sourcing from AdventureWorks2014, Production.Product.) I'm going to use the CONCAT function from SQL Server 2012+ as it promotes all data types to string and is NULL-safe.
SELECT
P.ProductID
, P.Name
, P.ProductNumber
, P.MakeFlag
, P.FinishedGoodsFlag
, P.Color
, P.SafetyStockLevel
, P.ReorderPoint
, P.StandardCost
, P.ListPrice
, P.Size
, P.SizeUnitMeasureCode
, P.WeightUnitMeasureCode
, P.Weight
, P.DaysToManufacture
, P.ProductLine
, P.Class
, P.Style
, P.ProductSubcategoryID
, P.ProductModelID
, P.SellStartDate
, P.SellEndDate
, P.DiscontinuedDate
, P.rowguid
, P.ModifiedDate
-- Hash my business key(s)
, CONVERT(binary(20), HASHBYTES('SHA1',
CONCAT
(
-- Having an empty string as the first argument
-- allows me to simplify building of column list
''
, P.ProductID
)
)
) AS HistoricalHashKey
-- Hash the remaining columns
, CONVERT(binary(20), HASHBYTES('SHA1',
CONCAT
(
''
, P.Name
, P.ProductNumber
, P.MakeFlag
, P.FinishedGoodsFlag
, P.Color
, P.SafetyStockLevel
, P.ReorderPoint
, P.StandardCost
, P.ListPrice
, P.Size
, P.SizeUnitMeasureCode
, P.WeightUnitMeasureCode
, P.Weight
, P.DaysToManufacture
, P.ProductLine
, P.Class
, P.Style
, P.ProductSubcategoryID
, P.ProductModelID
, P.SellStartDate
, P.SellEndDate
, P.DiscontinuedDate
)
)
) AS ChangeHashKey
FROM
Production.Product AS P;
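As an aside on why CONCAT is used above rather than the + operator, a quick illustration of my own (not from the original answer):
-- CONCAT treats NULL as an empty string and implicitly converts non-string types;
-- plain string concatenation with + returns NULL as soon as any operand is NULL
SELECT
CONCAT('', 'Adjustable Race', NULL, 1000) AS with_concat -- 'Adjustable Race1000'
, 'Adjustable Race' + NULL AS with_plus; -- NULL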
LKP Check Existence
This query will pull back the stored HistoricalHashKey and ChangeHashKey from our reference table.
SELECT
DP.HistoricalHashKey
, DP.ChangeHashKey
FROM
dbo.DimProduct AS DP;
At this point, it's a simple matter to compare the HistoricalHashKeys to determine whether the row exists. If we match, we want to pull back the ChangeHashKey into our Data Flow. By convention, I name this lkp_ChangeHashKey to differentiate from the source ChangeHashKey.
CSPL Change Detection
The conditional split is also simplified. Either the two Change Hash keys match (no change) or they don’t (changed). That expression would be
ChangeHashKey == lkp_ChangeHashKey
OLE_DST StagedUpdates
Rather than use the OLE DB Command, create a dedicated table for holding the rows that need to be updated. OLE DB Command does not scale well as behind the scenes it issues singleton update commands.
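A minimal sketch of what that staging table might look like, assuming it mirrors the updatable columns of the dimension plus the two hash keys (column list abridged here; use whatever your dimension actually needs):
-- Hypothetical staging table; widen the column list to match dbo.DimProduct
CREATE TABLE Stage.DimProduct
(
ProductID int NOT NULL
, Name nvarchar(50) NOT NULL
, ProductNumber nvarchar(25) NOT NULL
-- ... remaining material columns ...
, HistoricalHashKey binary(20) NOT NULL
, ChangeHashKey binary(20) NOT NULL
);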
SQL Perform Set Based Updates
After the data flow is complete, all the data that needs updating will be in our staging table. This Execute SQL Task simply updates the existing data matching on our business keys.
UPDATE
TGT
SET
Name = SRC.name
, ProductNumber = SRC.ProductNumber
FROM
dbo.DimProduct AS TGT
INNER JOIN
Stage.DimProduct AS SRC
ON SRC.HistoricalHashKey = TGT.HistoricalHashKey;
-- If clustered on a single column and table is large, this will yield better performance
-- ON SRC.DimProductSK = TGT.DimProductSK;
From the comments
Why do I use dedicated INSERT and UPDATE statements when we have the shiny MERGE? Besides not remembering the syntax as easily, the SQL Server implementation can have some ... unintended consequences. They may be corner cases, but I'd rather not run into them with the solutions I deliver. Explicit INSERT and UPDATE statements give me the fine-grained control I want and need in my solutions. I love SQL Server and think it's a fantastic product, but the weird syntax coupled with known bugs keeps me from using MERGE anywhere but a certification exam.

Table valued parameters for SSRS 2008

We have a requirement to generate SSRS reports where we need to convert multi-valued string and integer parameters to a data table and pass it to a stored procedure. The stored procedure contains multiple table-type parameters. Earlier we used varchar(8000), but we were exceeding that datatype's limit, so we decided to introduce a data table instead. However, we were not sure how to pass the values from SSRS.
We found a solution from GruffCode on Using Table-Valued Parameters With SQL Server Reporting Services.
The solution solved my problem, and we're able to generate reports. However, sometimes SSRS returns the two following errors:
An error has occurred during report processing.
Query execution failed for dataset 'DSOutput'.
String or binary data would be truncated. The statement has been terminated.
And
An unexpected error occurred in Report Processing.
Exception of type 'System.OutOfMemoryException' was thrown.
I'm not sure when and where it's causing the issue.
The approach outlined in that blog post relies on building an enormous string in memory in order to load all of the selected parameter values into the table-valued parameter instance. If you are selecting a very large number of values to pass into the query I could see it potentially causing the 'System.OutOfMemoryException' while trying to build the string containing the insert statements that will load the parameter.
As for the 'string or binary data would be truncated' error that sounds like it's originating within the query or stored procedure that the report is using to gather its data. Without seeing what that t-sql looks like I couldn't say why that's happening, but I'd guess that it's also somehow related to selecting a very large number of parameter values.
Unfortunately I'm not sure that there's a workaround for this, other than trying to see if you could figure out a way to select fewer parameter values. Here's a couple of rough ideas:
If you have a situation where users might select a handful of parameter values or all parameter values then you could have the query simply take a very simple boolean value indicating that all values were selected rather than making the report send all of the values in through a parameter.
You could also consider "zooming out" of your parameter values a bit and grouping them together somehow if they lend themselves to that. That way users would be selecting from a smaller number of parameter values that represent a group of the individual values all rolled up.
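A rough sketch of the first idea above, assuming a hypothetical bit parameter @AllValuesSelected that the report sets to 1 when every value is chosen, alongside a hypothetical @SelectedCustomerIDs table parameter that is only populated otherwise:
-- Illustrative names only; wire these up to your own report parameters and tables
SELECT t.*
FROM dbo.CustomerTransactions AS t
WHERE @AllValuesSelected = 1
OR t.CustomerID IN (SELECT s.CustomerID FROM @SelectedCustomerIDs AS s);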
I'm not a fan of using a Text parameter and EXEC in the SQL statement as the article you referenced describes, because doing so is subject to SQL injection. The default SSRS behavior with a multi-value parameter substitutes a comma-separated list of the values directly in place of the parameter when the query is sent to the SQL server. That works great for simple IN queries, but can be undesirable elsewhere. This behavior can be bypassed by setting the Parameter Value on the DataSet to an expression of =Join(Parameters!CustomerIDs.Value, ", "). Once you have done that, you can get a table variable loaded by using the following SQL:
DECLARE @CustomerIDsTable TABLE (CustomerID int NOT NULL PRIMARY KEY)
INSERT INTO @CustomerIDsTable (CustomerID)
SELECT DISTINCT TextNodes.Node.value(N'.', N'int') AS CustomerID
FROM (
SELECT CONVERT(XML, N'<A>' + COALESCE(N'<e>' + REPLACE(@CustomerIDs, N',', N'</e><e>') + N'</e>', '') + N'</A>') AS pNode
) AS xmlDocs
CROSS APPLY xmlDocs.pNode.nodes(N'/A/e') AS TextNodes(Node)
-- Do whatever with the resulting table variable, i.e.,
EXEC rpt_CustomerTransactionSummary @StartDate, @EndDate, @CustomerIDsTable
If using text instead of integers then a couple of lines get changed like so:
DECLARE @CustomerIDsTable TABLE (CustomerID nvarchar(450) NOT NULL PRIMARY KEY) -- nvarchar(MAX) cannot be used as a key column
INSERT INTO @CustomerIDsTable (CustomerID)
SELECT DISTINCT TextNodes.Node.value(N'.', N'nvarchar(MAX)') AS CustomerID
FROM (
SELECT CONVERT(XML, N'<A>' + COALESCE(N'<e>' + REPLACE(@CustomerIDs, N',', N'</e><e>') + N'</e>', '') + N'</A>') AS pNode
) AS xmlDocs
CROSS APPLY xmlDocs.pNode.nodes(N'/A/e') AS TextNodes(Node)
-- Do whatever with the resulting table variable, i.e.,
EXEC rpt_CustomerTransactionSummary @StartDate, @EndDate, @CustomerIDsTable
This approach also works well for handling user-entered strings of comma-separated items.

SSIS Balanced Data Distributor with Script Component

We have a small Data Flow Task which exports rows from a table to a flat file.
We added a Script Component for a transformation operation (converting varbinary to string).
Since the Script Component takes a while, we decided to use the new Integration Services Balanced Data Distributor and divided the export task into two more flat files.
While executing the task, it seems that the BDD isn't dividing the workload and doesn't work in parallel mode.
Do you have any idea why?
Have you tried using NTILE and creating multiple OLE DB sources in your Data Flow?
Example below for how to do that for 2 groups. You could of course split your source into as many as you need:
-- SQL Command text for OLE DB Source #1 named "MyGroup NTILE 1"
SELECT v.*
FROM
(
SELECT t.*
, NTILE(2) OVER (ORDER BY t.my_key) AS MyGroup
FROM my_schema.my_table AS t
) AS v
WHERE v.MyGroup = 1;
-- SQL Command text for OLE DB Source #2 named "MyGroup NTILE 2"
SELECT v.*
FROM
(
SELECT t.*
, NTILE(2) OVER (ORDER BY t.my_key) AS MyGroup
FROM my_schema.my_table AS t
) AS v
WHERE v.MyGroup = 2;
If you have a good idea in advance about the maximum number of NTILEs you need (say 10), then you could create 10 OLE DB Sources in advance.
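If the group count needs to stay flexible, a variation (a sketch of my own, not from the answer above) is to keep a single parameterized SQL command in each OLE DB Source and map an SSIS variable to the group number:
-- Sketch only: the ? placeholder is mapped to an SSIS variable holding the group number for this source
SELECT v.*
FROM
(
SELECT t.*
, NTILE(2) OVER (ORDER BY t.my_key) AS MyGroup
FROM my_schema.my_table AS t
) AS v
WHERE v.MyGroup = ?;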