How to Map JSON data from a REST API to Azure SQL using Data Factory

I have a new pipeline in Azure Data Factory.
I created the datasets, one of them from a REST API (a public one):
https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=MSFT&apikey=demo
and then I created an Azure SQL table with columns for the date and the open/high/low/close/volume values (originally shown in a screenshot).
The problem is that I don't know how to do the mapping: this is a complex JSON object, and I am limited by the Mapping Designer.
How do I map the date?

I tend to use an ELT approach for these: call the REST API with a Web activity, store the raw JSON in a SQL table, and then shred the JSON using SQL functions like OPENJSON.
The example pipeline is simply a Web activity followed by a Stored Procedure activity (screenshot omitted).
The key to getting this approach to work is the expression on the stored procedure parameter. This takes the whole JSON output from the Web activity and passes it into the proc. Mine is a simple logging proc which inserts the record into a logging table:
@string(activity('Web1').output)
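A minimal sketch of such a logging table and proc, assuming the dbo.myLog, logDetails and logId names used in the shredding query below (the real objects may differ):
-- hypothetical logging objects; names chosen to match the query below
CREATE TABLE dbo.myLog
(
logId int IDENTITY(1,1) PRIMARY KEY,
logDate datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
logDetails nvarchar(max) NULL -- raw JSON from the Web activity
);
GO
CREATE PROC dbo.LogWebOutput @logDetails nvarchar(max)
AS
INSERT dbo.myLog ( logDetails ) VALUES ( @logDetails );
GO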
I log to a table and then shred the JSON, but you could use OPENJSON directly on the stored proc parameter, e.g.
--INSERT INTO ...
SELECT
CAST( [key] AS DATE ) AS timeSeriesDate,
JSON_VALUE ( x.[value], '$."1. open"' ) AS [open],
JSON_VALUE ( x.[value], '$."2. high"' ) AS [high],
JSON_VALUE ( x.[value], '$."3. low"' ) AS [low],
JSON_VALUE ( x.[value], '$."4. close"' ) AS [close],
JSON_VALUE ( x.[value], '$."5. volume"' ) AS [volume]
FROM dbo.myLog
CROSS APPLY OPENJSON(logDetails , '$."Time Series (Daily)"' ) x
--WHERE logId = 23333;
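A variant sketch, if you prefer an explicit schema over per-column JSON_VALUE calls (same dbo.myLog and logDetails assumed; the data types are my guesses):
SELECT
CAST( d.[key] AS DATE ) AS timeSeriesDate,
v.*
FROM dbo.myLog
CROSS APPLY OPENJSON( logDetails, '$."Time Series (Daily)"' ) d
CROSS APPLY OPENJSON( d.[value] )
WITH (
[open] decimal(18,4) '$."1. open"',
[high] decimal(18,4) '$."2. high"',
[low] decimal(18,4) '$."3. low"',
[close] decimal(18,4) '$."4. close"',
[volume] bigint '$."5. volume"'
) v;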

Does the data have a structure? If so, you can generate a dummy file, place it in the sink, and do a one-time mapping. If not, you can run a Lookup on the file, iterate over the content in a ForEach activity, and insert the details into a SQL table.
E.g.
insert <<your table>>
select '@{item().name}', '@{item().address.city}', @{item().value}
The important thing to remember is to iterate over the correct array. Let me know if it's not clear. Not in front of a system right now, so I can't add screenshots.

Related

How to reuse JSON arguments within PostgreSQL stored procedure

I am using a stored procedure to INSERT into and UPDATE a number of tables. Some of the data is derived from a JSON parameter.
Although I have successfully used json_to_recordset() to extract named data from the JSON parameter, I cannot figure out how to use it in an UPDATE statement. Also, I need to use some items of data from the JSON parameter several times.
Q: Is there a way to use json_to_recordset() to extract named data to a temporary table to allow me to reuse the data items throughout my stored procedure? Maybe I should SELECT INTO variables within the stored procedure?
Q: Failing that can anyone please provide a simple example of how to update a table using data returned from json_to_recordset(). I must also include data not from the JSON parameter such as now()::timestamp(0).
This is how I have used json_to_recordset() so far:
INSERT INTO myRealTable (
rec_timestamp,
rec_data1,
rec_data2
)
SELECT
now()::timestamp(0),
x.json_data1,
x.json_data2
FROM json_to_recordset(json_parameter) x
(
json_data1 int,
json_data2 boolean
);
Thank you.
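For reference, one pattern that fits both questions (a sketch on my part, not from the original post) is to materialize the recordset into a temp table once and then reuse it across statements, including an UPDATE that mixes in non-JSON values such as now()::timestamp(0). The join key is hypothetical:
-- hypothetical sketch: run inside the stored procedure
CREATE TEMP TABLE tmp_recs ON COMMIT DROP AS
SELECT x.json_data1, x.json_data2
FROM json_to_recordset(json_parameter) AS x
(
json_data1 int,
json_data2 boolean
);
-- reuse tmp_recs as often as needed, e.g. in an UPDATE:
UPDATE myRealTable AS t
SET rec_data2 = r.json_data2,
rec_timestamp = now()::timestamp(0)
FROM tmp_recs AS r
WHERE t.rec_data1 = r.json_data1; -- hypothetical join key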

PyFlink Error/Exception: "Hive Table doesn't support consuming update changes which is produced by node PythonGroupAggregate"

Using Flink 1.13.1 with PyFlink and a user-defined table aggregate function (UDTAGG), with Hive tables as source and sinks, I've been encountering this error:
pyflink.util.exceptions.TableException: org.apache.flink.table.api.TableException:
Table sink 'myhive.mydb.flink_tmp_model' doesn't support consuming update changes
which is produced by node PythonGroupAggregate
This is the SQL CREATE TABLE statement for the sink:
table_env.execute_sql(
"""
CREATE TABLE IF NOT EXISTS flink_tmp_model (
run_id STRING,
model_blob BINARY,
roc_auc FLOAT
) PARTITIONED BY (dt STRING) STORED AS parquet TBLPROPERTIES (
'sink.partition-commit.delay'='1 s',
'sink.partition-commit.policy.kind'='success-file'
)
"""
)
What's wrong here?
I imagine you are executing a streaming query that is doing some sort of aggregation that requires updating previously emitted results. The parquet/hive sink does not support this -- once results are written, they are final.
One solution would be to execute the query in batch mode. Another would be to use a sink (or a format) that can handle updates. Or modify the query so that it only produces final results -- e.g., a time-windowed aggregation rather than an unbounded one, as sketched below.
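A minimal sketch of that last option in Flink SQL (run_id, roc_auc and the dt partition column come from the sink above; my_source and its processing-time attribute proc_time are assumptions on my part):
-- each window emits exactly one final row, so an append-only Hive/parquet sink can consume it
SELECT
run_id,
MAX(roc_auc) AS roc_auc,
DATE_FORMAT(TUMBLE_END(proc_time, INTERVAL '1' HOUR), 'yyyy-MM-dd') AS dt
FROM my_source
GROUP BY run_id, TUMBLE(proc_time, INTERVAL '1' HOUR);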

How to use the same SSIS Data Flow with different Date Values?

I have a very straightforward SSIS package containing one data flow, which is comprised of an OLE DB source and a flat file destination. The OLE DB source calls a query that takes two parameters, which I've mapped to Date/Time variables.
I would like to know how best to pass 4 different sets of dates to the variables and use those values in my query.
I've experimented with the Foreach Loop Container using an item enumerator. However, that does not seem to work, and the package throws a System.IO.IOException error.
My container is configured with both variables mapped (screenshot omitted).
Note that both variables are of the Date/Time data type.
How can I pass 4 separate value sets to the same variables and use each variable pair to run my data flow?
Setup
I created a table and populated it with contiguous dates covering your sample set:
DROP TABLE IF EXISTS dbo.SO_67439692;
CREATE TABLE dbo.SO_67439692
(
SurrogateKey int IDENTITY(1,1) NOT NULL
, ActionDate date
);
INSERT INTO
dbo.SO_67439692
(
ActionDate
)
SELECT
TOP (DATEDIFF(DAY, '2017-12-31', '2021-04-30'))
DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), '2017-12-31') AS ActionDate
FROM
sys.all_columns AS AC;
In my SSIS package, I added two variables, startDate and endDate2018, both of type DateTime. I added an OLE DB connection manager pointed to the database where I made the above table.
I added a Foreach Loop Container, configured it to use the Item Enumerator, and defined the columns there as datetime as well.
I populated it (what a clunky editor) with the year ranges 2018-01-01 to 2018-12-31, 2019-01-01 to 2019-12-31, and 2020-01-01 to 2020-12-31, plus 2021-01-01 to 2021-04-30.
I wired the variables up as shown in the problem definition and ran it as is. No IO error reported.
Once I knew my foreach container was working, the data flow was trivial.
I added a data flow inside the foreach loop, with an OLE DB Source using a parameterized query like so:
DECLARE @StartDate date, @EndDate date;
SELECT @StartDate = ?, @EndDate = ?;
SELECT *
FROM
dbo.SO_67439692 AS S
WHERE
S.ActionDate >= @StartDate AND S.ActionDate <= @EndDate;
I mapped my two variables in as parameter names of 0 and 1 and ran it.
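To sanity-check the parameterized query outside SSIS, here is a quick snippet against the sample table above (one of the four date pairs hard-coded; the values are illustrative):
DECLARE @StartDate date = '2018-01-01', @EndDate date = '2018-12-31';
SELECT COUNT(*) AS RowsInRange
FROM dbo.SO_67439692 AS S
WHERE S.ActionDate >= @StartDate AND S.ActionDate <= @EndDate;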
The setup you described works great. Either there is more to your problem than stated, or something else is misaligned. Follow along with my repro and compare it to what you've built, and you should see where things are "off".

How to find/rip out all dimension/measures on an SSAS cube (Extended Events)

I am using Extended Events to analyse dimension/measure usage in an SSAS cube. I used an XMLA template (from Chris Webb's blog) to create the event session. I then parsed the data into a staging table, where I stripped the required field out of the event data XML field.
select TraceFileName
, TraceEvent
, e.EventDataXML.value('(/event/data[@name="TextData"]/value)[1]', 'varchar(max)') as TextData
into #List
from
(
select [file_name] as TraceFileName
, object_name as TraceEvent
, convert(xml, event_data) as EventDataXML
from sys.fn_xe_file_target_read_file('*path*', null, null, null)
) e;
I plan to then use CHARINDEX to find measure/dimension calls within the TextData field. However, to do this I will need a list of all dimensions and measures in the SSAS cube. Is there a way to rip this out?
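One way to get that list (my suggestion, not from the original post) is to query the SSAS Dynamic Management Views from an MDX query window connected to the cube, then land the results in SQL Server and match them against #List with CHARINDEX. 'YourCube' is a placeholder:
SELECT MEASURE_UNIQUE_NAME, MEASURE_NAME
FROM $SYSTEM.MDSCHEMA_MEASURES
WHERE CUBE_NAME = 'YourCube';
SELECT DIMENSION_UNIQUE_NAME, DIMENSION_NAME
FROM $SYSTEM.MDSCHEMA_DIMENSIONS
WHERE CUBE_NAME = 'YourCube';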

Parse an object as string in output in Stream Azure Analytics

This question is regarding Stream Analytics. I want to export a blob into SQL. I know the process; my question is about the query I have to use.
{"performanceCounter":[{"available_bytes":{"value":994164736.0},"categoryName":"Memory","instanceName":""}],"internal":{"data":{"id":"459bf840-d259-11e5-a640-1df0b6342362","documentVersion":"1.61"}},"context":{"device":{"type":"PC","network":"Ethernet","screenResolution":{},"locale":"en-US","id":"RD0003FF73B748","roleName":"Sdm.MyGovId.Static.Web","roleInstance":"Sdm.MyGovId.Static.Web_IN_1","oemName":"Microsoft Corporation","deviceName":"Virtual Machine","deviceModel":"Virtual Machine"},"application":{"version":"R2.0_20160205.5"},"location":{"continent":"North America","country":"United States","clientip":"104.41.209.0","province":"Washington","city":"Redmond"},"data":{"isSynthetic":false,"samplingRate":100.0,"eventTime":"2016-02-13T13:53:44.2667669Z"},"user":{"isAuthenticated":false,"anonAcquisitionDate":"0001-01-01T00:00:00Z","authAcquisitionDate":"0001-01-01T00:00:00Z","accountAcquisitionDate":"0001-01-01T00:00:00Z"},"operation":{},"cloud":{},"serverDevice":{},"custom":{"dimensions":[],"metrics":[]},"session":{}}}
{"performanceCounter":[{"percentage_processor_total":{"value":0.0123466420918703},"categoryName":"Processor","instanceName":"_Total"}],"internal":{"data":{"id":"459bf841-d259-11e5-a640-1df0b6342362","documentVersion":"1.61"}},"context":{"device":{"type":"PC","network":"Ethernet","screenResolution":{},"locale":"en-US","id":"RD0003FF73B748","roleName":"Sdm.MyGovId.Static.Web","roleInstance":"Sdm.MyGovId.Static.Web_IN_1","oemName":"Microsoft Corporation","deviceName":"Virtual Machine","deviceModel":"Virtual Machine"},"application":{"version":"R2.0_20160205.5"},"location":{"continent":"North America","country":"United States","clientip":"104.41.209.0","province":"Washington","city":"Redmond"},"data":{"isSynthetic":false,"samplingRate":100.0,"eventTime":"2016-02-13T13:53:44.2668221Z"},"user":{"isAuthenticated":false,"anonAcquisitionDate":"0001-01-01T00:00:00Z","authAcquisitionDate":"0001-01-01T00:00:00Z","accountAcquisitionDate":"0001-01-01T00:00:00Z"},"operation":{},"cloud":{},"serverDevice":{},"custom":{"dimensions":[],"metrics":[]},"session":{}}}
{"performanceCounter":[{"percentage_processor_time":{"value":0.0},"categoryName":"Process","instanceName":"w3wp"}],"internal":{"data":{"id":"459bf842-d259-11e5-a640-1df0b6342362","documentVersion":"1.61"}},"context":{"device":{"type":"PC","network":"Ethernet","screenResolution":{},"locale":"en-US","id":"RD0003FF73B748","roleName":"Sdm.MyGovId.Static.Web","roleInstance":"Sdm.MyGovId.Static.Web_IN_1","oemName":"Microsoft Corporation","deviceName":"Virtual Machine","deviceModel":"Virtual Machine"},"application":{"version":"R2.0_20160205.5"},"location":{"continent":"North America","country":"United States","clientip":"104.41.209.0","province":"Washington","city":"Redmond"},"data":{"isSynthetic":false,"samplingRate":100.0,"eventTime":"2016-02-13T13:53:44.2668342Z"},"user":{"isAuthenticated":false,"anonAcquisitionDate":"0001-01-01T00:00:00Z","authAcquisitionDate":"0001-01-01T00:00:00Z","accountAcquisitionDate":"0001-01-01T00:00:00Z"},"operation":{},"cloud":{},"serverDevice":{},"custom":{"dimensions":[],"metrics":[]},"session":{}}}
You can see three JSON objects, each of which has a different field for the first object in the performanceCounter array: available_bytes in the first, percentage_processor_total in the second, and percentage_processor_time in the third.
Because I'm exporting this to a SQL table called performanceCounter, I would otherwise need a different column for every different object, so I would like to save this as a string and then parse it in my app.
As a starting point I have this query, which reads from an input (the blob) and writes into an output (SQL):
Select GetArrayElement(A.performanceCounter,0) as a
INTO
PerformanceCounterOutput
FROM PerformanceCounterInput A
GetArrayElement takes index 0 of the performanceCounter array, but it then writes a different column for each field it finds in every object. That would give me a column per counter, but my idea is more like a single column called performanceCounterData that saves a string like
'"available_bytes":"value":994164736.0},"categoryName":"Memory","instanceName":""'
or this
"{"percentage_processor_total":{"value":0.0123466420918703},"categoryName":"Processor","instanceName":"_Total"}"
or
"{"percentage_processor_time":"value":0.0},"categoryName":"Process","instanceName":"w3wp"}"
How can I cast an array element like this to a string?
I tried CAST(GetArrayElement(A.performanceCounter,0) AS nvarchar(max)) but that is not allowed.
Any good help will be rewarded.
With the following solution I get two columns, one with the name of the property and another with the value of the property, which was my initial purpose:
With pc as
(
Select
GetArrayElement(A.[performanceCounter],0) as counter
,A.context.data.eventTime as eventTime
,A.context.location.clientip as clientIp
,A.context.location.continent as continent
,A.context.location.country as country
,A.context.location.province as province
,A.context.location.city as city
FROM PerformanceCounterInput A
)
select
props.propertyName,
props.propertyValue,
pc.counter.categoryName,
pc.counter.instanceName,
pc.eventTime,
pc.clientIp,
pc.continent,
pc.country,
pc.province,
pc.city
from pc
cross apply GetRecordProperties(pc.counter) as props
where props.propertyname<>'categoryname' and props.propertyname<>'instancename'
Anyway, if somebody finds out how to write an object as plain text in Stream Analytics, it will still be rewarded and appreciated.
You can do something like below; this gives the counters as (categoryName, Value) pairs.
with T1 as
(
select
GetArrayElement(iotInput.performanceCounter, 0) Counter,
System.Timestamp [EventTime]
from
iotInput timestamp by context.data.eventTime
)
select
[EventTime],
Counter.categoryName,
Counter.available_bytes [Value]
from
T1
where
Counter.categoryName = 'Memory'
union all
select
[EventTime],
Counter.categoryName,
Counter.percentage_processor_time [Value]
from
T1
where
Counter.categoryName = 'Process'
A query that gives one column per counter type can also be done; you would have to either do a join, or a GROUP BY with CASE statements for every counter, as sketched below.
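A hedged sketch of that one-column-per-counter variant, reusing the T1 pattern from above (the nested .value access and the TumblingWindow size are assumptions; a streaming GROUP BY needs a window):
with T1 as
(
select
GetArrayElement(iotInput.performanceCounter, 0) Counter
from
iotInput timestamp by context.data.eventTime
)
select
System.Timestamp [EventTime],
max(case when Counter.categoryName = 'Memory' then Counter.available_bytes.value end) as available_bytes,
max(case when Counter.categoryName = 'Processor' then Counter.percentage_processor_total.value end) as percentage_processor_total,
max(case when Counter.categoryName = 'Process' then Counter.percentage_processor_time.value end) as percentage_processor_time
from T1
group by TumblingWindow(second, 1)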