Parse JSON content in Azure Stream Analytics

My SQL skills are very limited. Please help.
The Modbus module on Azure IoT Edge sends JSON content to the Stream Analytics job in this format (from downloaded sample data):
[
{
"PublishTimestamp": "2021-07-28 19:28:15",
"Content": [
{
"HwId": "XY-MOD2-1",
"Data": [
{
"CorrelationId": "DefaultCorrelationId",
"SourceTimestamp": "2021-07-28 19:28:15",
"Values": [
{
"DisplayName": "Temperature",
"Address": "30002",
"Value": "210"
},
{
"DisplayName": "Temperature",
"Address": "30003",
"Value": "538"
}
]
}
]
}
],
"EventProcessedUtcTime": "2021-07-28T20:26:23.9127084Z",
"PartitionId": 0,
"EventEnqueuedUtcTime": "2021-07-28T19:28:15.9460000Z",
"IoTHub": {
"MessageId": null,
"CorrelationId": null,
"ConnectionDeviceId": "rp4linuxedge1",
"ConnectionDeviceGenerationId": "637630846187016425",
"EnqueuedTime": "2021-07-28T19:28:15.9550000Z",
"StreamId": null
}
},
...
]
I can't figure out which SQL syntax to use to get this output:
SourceTimestamp | Address | Value
Time1           | 30002   | 210
Time1           | 30003   | 538
Time2           | 30002   | 215
Time2           | 30003   | 540

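The nested arrays within the JSON object can be flattened by chaining one CROSS APPLY GetArrayElements call per array level:
SELECT
    CAST(dataArr.ArrayValue.SourceTimestamp AS datetime) AS SourceTimestamp,
    CAST(valuesArr.ArrayValue.Address AS bigint) AS Address,
    CAST(valuesArr.ArrayValue.Value AS float) AS Value
INTO powerbioutput
FROM iotinput i
-- One row per element of the top-level Content array (one per HwId)
CROSS APPLY GetArrayElements(i.Content) AS contentArr
-- One row per element of the Data array inside each Content element
CROSS APPLY GetArrayElements(contentArr.ArrayValue.Data) AS dataArr
-- One row per element of Values; [Values] is bracketed because VALUES is a keyword
CROSS APPLY GetArrayElements(dataArr.ArrayValue.[Values]) AS valuesArr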
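To make the three CROSS APPLY steps concrete, here is a small JavaScript sketch (not Stream Analytics code) that flattens the sample event the same way: one loop per GetArrayElements call, and one output row per element of the innermost Values array. The helper name flattenModbus is hypothetical, and Number() only stands in for the CAST calls.

```javascript
// Hypothetical helper that mirrors the three CROSS APPLY GetArrayElements steps.
function flattenModbus(events) {
  const rows = [];
  for (const event of events) {               // FROM iotinput i
    for (const content of event.Content) {    // CROSS APPLY GetArrayElements(i.Content)
      for (const data of content.Data) {      // CROSS APPLY GetArrayElements(... .Data)
        for (const v of data.Values) {        // CROSS APPLY GetArrayElements(... .[Values])
          rows.push({
            SourceTimestamp: data.SourceTimestamp,
            Address: Number(v.Address),       // stands in for CAST(... AS bigint)
            Value: Number(v.Value),           // stands in for CAST(... AS float)
          });
        }
      }
    }
  }
  return rows;
}

// The sample event from the question, trimmed to the fields the query reads.
const sample = [{
  PublishTimestamp: "2021-07-28 19:28:15",
  Content: [{
    HwId: "XY-MOD2-1",
    Data: [{
      CorrelationId: "DefaultCorrelationId",
      SourceTimestamp: "2021-07-28 19:28:15",
      Values: [
        { DisplayName: "Temperature", Address: "30002", Value: "210" },
        { DisplayName: "Temperature", Address: "30003", Value: "538" },
      ],
    }],
  }],
}];

console.log(flattenModbus(sample)); // two rows, one per element of Values
```

Each additional event (the Time2 readings in the desired output) simply adds more rows, because every level of nesting multiplies out independently.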

Related

Extract value of Tags from CloudTrail logs using Athena

I am trying to query CloudTrail logs using Athena. My goal is to find specific instances and extract them together with their tags.
The query I am using is:
SELECT eventTime,
       awsRegion,
       json_extract(responseelements, '$.instancesSet.items[0].instanceId') AS instanceId,
       json_extract(responseelements, '$.instancesSet.items[0].tagSet.items') AS TAGS
FROM cloudtrail_logs_PP
WHERE (eventName = 'RunInstances' OR eventName = 'StartInstances')
  AND requestparameters LIKE '%mytest1%'
  AND "timestamp" BETWEEN '2021/09/01' AND '2021/10/01'
ORDER BY eventTime;
With this query, I get all tags in a single column.
[Screenshot: output of query]
I want to extract only specific tags. How can I extract a single tag?
I tried extending my query with json_extract(responseelements, '$.instancesSet.items[0].tagSet.items[0]'), but the order of the tags differs from log to log, so I can't rely on a fixed index.
My JSON file in S3 looks something like this:
{
"eventVersion": "1",
"eventTime": "2022-05-27T18:44:29Z",
"eventName": "RunInstances",
"awsRegion": "us-east-1",
"requestParameters": {
"instancesSet": {
"items": [{
"imageId": "ami-1234545",
"keyName": "DDKJKD"
}]
},
"instanceType": "m5.2xlarge",
"monitoring": {
"enabled": false
},
"hibernationOptions": {
"configured": false
}
},
"responseElements": {
"instancesSet": {
"items": [{
"tagSet": {
"items": [ {
"key": "11",
"value": "DS"
}, {
"key": "1",
"value": "A"
}]
}
}]
}
}
}
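The key point is to match a tag by its key instead of its position in the array. The JavaScript sketch below (not Athena SQL) shows that lookup on the sample record; tagValue is a hypothetical helper that returns the value of the first tag with the given key across all instances in responseElements. In Athena itself, the same idea means filtering the tagSet.items array by key rather than indexing into it.

```javascript
// Hypothetical helper: value of the tag whose key matches, wherever it sits
// in tagSet.items. Returns null if no instance carries that key.
function tagValue(record, key) {
  const instances = record.responseElements?.instancesSet?.items ?? [];
  for (const instance of instances) {
    const tags = instance.tagSet?.items ?? [];
    const match = tags.find((t) => t.key === key);
    if (match) return match.value;
  }
  return null;
}

// The responseElements part of the sample record from the question.
const record = {
  eventName: "RunInstances",
  responseElements: {
    instancesSet: {
      items: [{
        tagSet: { items: [{ key: "11", value: "DS" }, { key: "1", value: "A" }] },
      }],
    },
  },
};

console.log(tagValue(record, "1"));  // A
console.log(tagValue(record, "11")); // DS
```

Because the lookup compares keys, swapping the order of the two tags in the record does not change either result.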

Azure ADF - Array elements can only be selected using an integer index

Hi, I am trying to select status from a JSON array in Azure Data Factory:
{
"dataRead": 2997,
"dataWritten": 2714,
"filesWritten": 1,
"sourcePeakConnections": 1,
"sinkPeakConnections": 1,
"rowsRead": 11,
"rowsCopied": 11,
"copyDuration": 3,
"throughput": 0.976,
"errors": [],
"effectiveIntegrationRuntime": "DefaultIntegrationRuntime (East US)",
"usedDataIntegrationUnits": 4,
"billingReference": {
"activityType": "DataMovement",
"billableDuration": [
{
"meterType": "AzureIR",
"duration": 0.06666666666666667,
"unit": "DIUHours"
}
]
},
"usedParallelCopies": 1,
"executionDetails": [
{
"source": {
"type": "AzureSqlDatabase",
"region": "East US"
},
"sink": {
"type": "AzureBlobStorage",
"region": "East US"
},
"status": "Succeeded",
"start": "2020-03-19T06:24:39.0666585Z",
"duration": 3,
"usedDataIntegrationUnits": 4,
"usedParallelCopies": 1,
...
I have tried selecting @activity('Copy data From CCP TO Blob').output.executionDetails.status. It throws an error:
'Array elements can only be selected using an integer index'.
Is there any way to resolve this?
executionDetails is an array, so you have to use an index to refer to its elements.
Please try:
@activity('Copy data From CCP TO Blob').output.executionDetails[0].status
Thank you for the reply.
Yes, we have to index into the lists and dictionaries.
I have tried Dispensing_Unit_Master_Dim
@activity('Copy data From CCP TO Blob').output.executionDetails[0]['status'] and it works.
Note that there is no dot between [0] and ['status'].
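ADF expressions are not JavaScript, but they follow the same indexing rule that JavaScript applies to parsed JSON, which makes the rule easy to try out. In this sketch, output is a trimmed, hand-written copy of the Copy activity output shown above; the map() line is an optional extra for the case where executionDetails holds more than one entry.

```javascript
// Trimmed copy of the Copy activity output (only the fields used here).
const output = {
  rowsCopied: 11,
  executionDetails: [
    { source: { type: "AzureSqlDatabase" }, sink: { type: "AzureBlobStorage" }, status: "Succeeded" },
  ],
};

// executionDetails is an array, so a property lookup directly on it finds nothing.
console.log(output.executionDetails.status);       // undefined
// Index into the array first; dot and bracket notation are equivalent afterwards.
console.log(output.executionDetails[0].status);    // Succeeded
console.log(output.executionDetails[0]["status"]); // Succeeded

// With several entries, collect every status instead of assuming there is one.
const statuses = output.executionDetails.map((d) => d.status);
console.log(statuses);
```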

How do I define this variable from a JSON response?

I am calling a weather provider's API and am trying to define a variable mtwnsd24 with the following code:
var mtwnsd24 = data.data.coordinates.dates.value[2];
$(".mtwnsd24").append(mtwnsd24);
}
);
When I run the request in Postman, the response is the following JSON, and I want to get the value 42.4:
"status": "OK",
"data": [
{
"parameter": "wind_speed_10m:kmh",
"coordinates": [
{
"lat": 40.014994,
"lon": -73.811646,
"dates": [
{
"date": "2020-01-04T05:00:00Z",
"value": 5.0
},
{
"date": "2020-01-05T05:00:00Z",
"value": 42.4
},
{
"date": "2020-01-06T05:00:00Z",
"value": 17.7
}
]
}
]
},
Neither this definition nor any variation of it seems to work.
This should do the trick:
data.data[0].coordinates[0].dates[1].value
The result is:
42.4
Note that JSON array indexes are zero-based, so to get the second element you need to use index 1.
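Here is a self-contained JavaScript version that uses the response from the question, wrapped in a complete object (in the real code, data comes from the API callback). It also shows a lookup by date, an alternative added here that does not depend on the element's position; the date string is simply taken from the sample.

```javascript
// The Postman response from the question, as a complete object.
const data = {
  status: "OK",
  data: [{
    parameter: "wind_speed_10m:kmh",
    coordinates: [{
      lat: 40.014994,
      lon: -73.811646,
      dates: [
        { date: "2020-01-04T05:00:00Z", value: 5.0 },
        { date: "2020-01-05T05:00:00Z", value: 42.4 },
        { date: "2020-01-06T05:00:00Z", value: 17.7 },
      ],
    }],
  }],
};

// data.data and coordinates are arrays, so each needs an index before the next key.
// (data.data.coordinates is undefined, which is why the original code fails.)
const mtwnsd24 = data.data[0].coordinates[0].dates[1].value;
console.log(mtwnsd24); // 42.4

// Position-independent alternative: find the entry by its date.
const byDate = data.data[0].coordinates[0].dates.find(
  (d) => d.date === "2020-01-05T05:00:00Z"
);
console.log(byDate.value); // 42.4
```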

AWS Athena - Querying JSON - Searching for Values

I have nested JSON files on S3 and am trying to query them with Athena.
However, I am having trouble querying the nested JSON values.
My JSON file looks like this:
{
"id": "17842007980192959",
"acount_id": "17841401243773780",
"stats": [
{
"name": "engagement",
"period": "lifetime",
"values": [
{
"value": 374
}
],
"title": "Engagement",
"description": "Total number of likes and comments on the media object",
"id": "17842007980192959/insights/engagement/lifetime"
},
{
"name": "impressions",
"period": "lifetime",
"values": [
{
"value": 11125
}
],
"title": "Impressions",
"description": "Total number of times the media object has been seen",
"id": "17842007980192959/insights/impressions/lifetime"
},
{
"name": "reach",
"period": "lifetime",
"values": [
{
"value": 8223
}
],
"title": "Reach",
"description": "Total number of unique accounts that have seen the media object",
"id": "17842007980192959/insights/reach/lifetime"
},
{
"name": "saved",
"period": "lifetime",
"values": [
{
"value": 0
}
],
"title": "Saved",
"description": "Total number of unique accounts that have saved the media object",
"id": "17842007980192959/insights/saved/lifetime"
}
],
"import_date": "2017-12-04"
}
What I'm trying to do is query the value inside "stats" where name = impressions.
So ideally something like:
SELECT id, account_id, stats.values.value WHERE stats.name='engagement'
AWS example: https://docs.aws.amazon.com/athena/latest/ug/searching-for-values.html
Any help would be appreciated.
You can query the JSON with the following table definition:
CREATE EXTERNAL TABLE test(
  id string,
  acount_id string,
  stats array<
    struct<
      name:string,
      period:string,
      values:array<
        struct<value:string>>,
      title:string
    >
  >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://bucket/';
Now, the value column is available through the following unnesting:
select id, acount_id, stat.name,x.value
from test
cross join UNNEST(test.stats) as st(stat)
cross join UNNEST(stat."values") as valx(x)
WHERE stat.name='engagement';
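Conceptually, each CROSS JOIN UNNEST turns one array element into one row. Below is a JavaScript sketch of the same flattening over a trimmed copy of the sample record (statRows is a hypothetical name). Here the name filter runs before the inner unnest, which produces the same rows as filtering afterwards.

```javascript
// A trimmed copy of the sample record (acount_id is spelled as in the data).
const record = {
  id: "17842007980192959",
  acount_id: "17841401243773780",
  stats: [
    { name: "engagement", values: [{ value: 374 }] },
    { name: "impressions", values: [{ value: 11125 }] },
    { name: "reach", values: [{ value: 8223 }] },
  ],
};

// UNNEST(stats) gives one row per stat; UNNEST(stat."values") one row per value.
function statRows(rec, wanted) {
  return rec.stats
    .filter((stat) => stat.name === wanted)        // WHERE stat.name = ...
    .flatMap((stat) =>
      stat.values.map((x) => ({
        id: rec.id,
        acount_id: rec.acount_id,
        name: stat.name,
        value: x.value,
      }))
    );
}

console.log(statRows(record, "engagement")); // one row, value 374
```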

DocumentDB: get the minimum of a subarray

My JSON data:
[
{
"Code": "GB-00001",
"BasicInformation": {
"WGS84Longitude": -4.670000,
"WGS84Latitude": 50.340000
},
"Availability": [{
"ArrivalDate": "2017-04-21",
"Price": 689
},
{
"ArrivalDate": "2017-04-28",
"Price": 1341
}
]},
{
"Code": "GB-00002",
"BasicInformation": {
"WGS84Longitude": -4.680000,
"WGS84Latitude": 50.350000
},
"Availability": [{
"ArrivalDate": "2017-04-21",
"Price": 659
},
{
"ArrivalDate": "2017-04-28",
"Price": 1440
}
]}
]
I'd like the result to look like this:
[
{
"HouseCode": "GB-00001",
"Country": "GB",
"location": {
"type": "Point",
"coordinates": [
50.340000,
-4.670000
]
}, "lowestPrice": 689
},
{
"HouseCode": "GB-00002",
"Country": "GB",
"location": {
"type": "Point",
"coordinates": [
50.350000,
-4.680000
]
}, "lowestPrice" : 659
}
]
My problem is how to compute min(c.Availability.Price).
This is my current query, which converts the latitude and longitude to a point, but I have no idea how to get the lowest price:
SELECT c.Code, c.BasicInformation.Country ,
{"type":"Point","coordinates": [c.BasicInformation.Latitude, c.BasicInformation.Longitude]} as location
FROM c
I have already tried JOIN c.Availability a with , min(a.Price).
Edit: perhaps I am too early? https://feedback.azure.com/forums/263030-documentdb/suggestions/18561901-add-group-by-support-for-aggregate-functions
I found that URL in https://stackoverflow.com/a/42697673/169714
This is close to an ideal situation for a user-defined function (UDF).
Here is one that should do the trick:
function lowestPrice(availability) {
    // Start from Infinity so any real price is lower (2e308 also overflows to Infinity).
    var lowest = Infinity;
    for (var i = 0; i < availability.length; i++) {
        lowest = Math.min(lowest, availability[i].Price);
    }
    // An empty Availability array yields Infinity.
    return lowest;
}
You call it like this:
SELECT
c.Code,
c.BasicInformation.Country,
{"type":"Point","coordinates": [
c.BasicInformation.Latitude, c.BasicInformation.Longitude
]} as location,
udf.lowestPrice(c.Availability) as lowestPrice
FROM c
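Because a UDF is plain JavaScript, it can be sanity-checked locally before it is registered in the collection. This sketch reimplements lowestPrice with Infinity as the starting value (in JavaScript, 2e308 overflows to Infinity, so the behavior is the same) and runs it over the two sample documents, trimmed to Code and Availability. The mapping only imitates the Code and lowestPrice part of the query's projection.

```javascript
// Local copy of the lowestPrice UDF logic.
function lowestPrice(availability) {
  var lowest = Infinity;
  for (var i = 0; i < availability.length; i++) {
    lowest = Math.min(lowest, availability[i].Price);
  }
  return lowest;
}

// The two documents from the question, trimmed to the fields used here.
const docs = [
  { Code: "GB-00001", Availability: [{ ArrivalDate: "2017-04-21", Price: 689 }, { ArrivalDate: "2017-04-28", Price: 1341 }] },
  { Code: "GB-00002", Availability: [{ ArrivalDate: "2017-04-21", Price: 659 }, { ArrivalDate: "2017-04-28", Price: 1440 }] },
];

// Roughly what "SELECT c.Code, udf.lowestPrice(c.Availability) AS lowestPrice FROM c" yields.
const results = docs.map((c) => ({ Code: c.Code, lowestPrice: lowestPrice(c.Availability) }));
console.log(results);
```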
AFAIK, a UDF is currently the only way to meet your requirement. I have also checked the code provided by Larry Maccherone, and it works both on the Azure DocumentDB service and on my DocumentDB Emulator (version 1.11.136.2).
DocumentDB.GatewayService.exe has stopped working
For the DocumentDB.GatewayService crash, I assume you need to collect the dump files and email them to askdocdb@microsoft.com. For more details, refer to the DocumentDB Emulator troubleshooting documentation.