Reading multi-dimensional array output from an API (JSON)

I have an output from an API that extracts a store's details as follows.
Here, I have a store with three groups of employee information.
Output:
"store": {
"group ID":"123456"
"group name":"Group A"
{
"employees": [
{
"name": "bob",
"hiredate": "2015-01-01"
},
{
"name": "sam",
"hiredate": "2015-01-02"
},
{
"name": "ken",
"hiredate": "2015-01-03"
}
]
},
"group ID":"123457"
"group name":"Group B"
{
"employees": [
{
"name": "bob1",
"hiredate": "2015-01-01"
},
{
"name": "sam1",
"hiredate": "2015-01-02"
},
{
"name": "ken1",
"hiredate": "2015-01-03"
}
]
},
"group ID":"123458"
"group name":"Group C"
{
"employees": [
{
"name": "bob2",
"hiredate": "2015-01-01"
},
{
"name": "sam2",
"hiredate": "2015-01-02"
},
{
"name": "ken2",
"hiredate": "2015-01-03"
}
]
}
}
Query:
Now, I would like to write the property mapping in my XML script in a way that helps me fetch the above data into a workbook table as follows.
Group A 123456 bob
Group A 123456 sam
Group A 123456 ken
Group B 123457 bob1
Group B 123457 sam1
Group C 123458 bob2
Group C 123458 sam2
What should the hierarchy be in my extraction instruction to help me achieve this?
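The XML mapping syntax depends on the extraction tool, which is not named here, so the following is only a sketch of the traversal such a mapping has to express: an outer loop over groups and an inner loop over each group's employees. It assumes the API actually returns valid JSON in which `store` holds a list of group objects. The `groups` key and the trimmed sample data are hypothetical, since the output pasted above is not well-formed JSON.

```python
import json

# Hypothetical, well-formed version of the response. The "groups" list is an
# assumption about the real shape of the API output, not a documented fact.
response = json.loads("""
{
  "store": {
    "groups": [
      {"group ID": "123456", "group name": "Group A",
       "employees": [{"name": "bob", "hiredate": "2015-01-01"},
                     {"name": "sam", "hiredate": "2015-01-02"}]},
      {"group ID": "123457", "group name": "Group B",
       "employees": [{"name": "bob1", "hiredate": "2015-01-01"}]}
    ]
  }
}
""")


def flatten_groups(doc):
    """Return one (group name, group ID, employee name) row per employee."""
    rows = []
    for group in doc["store"]["groups"]:        # outer level: groups
        for employee in group["employees"]:     # inner level: employees
            rows.append((group["group name"], group["group ID"], employee["name"]))
    return rows


for row in flatten_groups(response):
    print(*row)
```

In mapping terms, the repeating element would be the employee, with the group name and group ID read from its parent group.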

Related

Pyspark transform json into multiple dataframes

I have multiple JSON files with this structure (Association can contain one or more objects, and Characteristic doesn't always have the same number of key-value pairs):
{
"vl:VNETList": {
"Template": {
"ID": "SomeId",
"Object": [
{
"ID": "my_first_id",
"Context": {
"ID": "Avngate"
},
"Name": "Model Description",
"ClassID": "PID",
"Association": [
{
"Object": {
"ID": "test.svg",
"Context": {
"ID": "Avngate"
}
},
"#type": "is fulfilled by"
},
{
"Object": {
"ID": "Project Description",
"Context": {
"ID": "Avngate"
}
},
"#type": "is an element of"
}
],
"Characteristic": [
{
"Name": "InfoType",
"Value": "image/svg+xml"
},
{
"Name": "LOCK",
"Value": false
},
{
"Name": "EXFI",
"Value": 10000
}
]
},
{
"ID": "my_second_id",
"Context": {
"ID": "Avngate2"
},
"Name": "Model Description2",
"ClassID": "PID2",
"Association": [
{
"Object": {
"ID": "test2.svg",
"Context": {
"ID": "Avngate"
}
},
"#type": "is fulfilled by"
}
],
"Characteristic": [
{
"Name": "Dbtencoding",
"Value": "unicode"
}
]
}
]
}
}
I would like to build two dataframes like this:
and the second dataframe like this:
What's the best approach? If that is too complex, I could also save the characteristics as a separate table that references the object ID, as with the associations.
Read the JSON and explode the Object array. For the first dataframe, explode Characteristic and use groupBy with pivot; for the second, explode Association and select the fields you need.
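from pyspark.sql import functions as f

# Assumes an existing SparkSession named `spark`.
df1 = spark.read.json('test.json', multiLine=True)
# One row per element of the Object array, with its fields as columns.
df2 = df1.select(f.explode('vl:VNETList.Template.Object').alias('value')) \
    .select('value.*')
# First dataframe: pivot the Characteristic name/value pairs into columns.
df_f1 = df2.withColumn('Characteristic', f.explode('Characteristic')) \
    .groupBy('ID', 'Name', 'ClassId') \
    .pivot('Characteristic.Name') \
    .agg(f.first('Characteristic.Value'))
# Second dataframe: one row per association of each object.
df_f2 = df2.withColumn('Association', f.explode('Association')) \
    .select('ID', 'Association.Object.ID', 'Association.#Type') \
    .toDF('ID', 'AssociationId', 'AssociationType')
df_f1.show()
df_f2.show()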
+------------+------------------+-------+-----------+-----+-------------+-----+
|          ID|              Name|ClassId|Dbtencoding| EXFI|     InfoType| LOCK|
+------------+------------------+-------+-----------+-----+-------------+-----+
| my_first_id| Model Description|    PID|       null|10000|image/svg+xml|false|
|my_second_id|Model Description2|   PID2|    unicode| null|         null| null|
+------------+------------------+-------+-----------+-----+-------------+-----+
+------------+-------------------+----------------+
|          ID|      AssociationId| AssociationType|
+------------+-------------------+----------------+
| my_first_id|           test.svg| is fulfilled by|
| my_first_id|Project Description|is an element of|
|my_second_id|          test2.svg| is fulfilled by|
+------------+-------------------+----------------+

MySQL: get all rows that satisfy the GROUP BY condition

My rows look like this:
[
{
"id": 1254,
"receive_date": "2022-11-03T18:30:00.000Z",
"receive_time": "19:29:00",
"machineId": "TEST",
"attributeId": "connection_status",
"hardwareId": "5"
},
{
"id": 1255,
"receive_date": "2022-11-03T18:30:00.000Z",
"receive_time": "19:29:00",
"machineId": "TEST",
"attributeId": "connection_status",
"hardwareId": "6"
},
{
"id": 1256,
"receive_date": "2022-11-03T18:30:00.000Z",
"receive_time": "19:30:00",
"machineId": "TEST",
"attributeId": "connection_status",
"hardwareId": "6"
}
]
First, I want to group the data by the following fields:
day of year - column receive_date - datatype is DATE
minutes - column receive_time - datatype is TIME
After grouping, I want the rows that fall into each group to be collected under a single attribute.
I want something similar to the following as the query result.
[
{
"receive_date": "2022-11-03T18:30:00.000Z",
"receive_time": "19:29:00",
"data": [
{
"attributeId": "connection_status",
"hardwareId": "5",
"value": 478
},
{
"attributeId": "connection_status",
"hardwareId": "6",
"value": 344
}
]
},
{
"receive_date": "2022-11-03T18:30:00.000Z",
"receive_time": "19:30:00",
"data": [
{
"attributeId": "connection_status",
"hardwareId": "6",
"value": 789
}
]
}
]
Any help/suggestions would be appreciated.
Thank you.
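In MySQL itself, this kind of nesting is usually built with GROUP BY together with JSON_ARRAYAGG(JSON_OBJECT(...)). As a rough illustration of the target shape only, here is a minimal Python sketch that groups already-fetched rows on the application side. It assumes the grouping key is the exact (receive_date, receive_time) pair, which matches the sample because the times already fall on whole minutes. The `value` field from the desired output is left out because the sample rows do not contain it. `group_rows` is a hypothetical helper name.

```python
from collections import defaultdict

# The three sample rows from the question (machineId omitted for brevity).
rows = [
    {"id": 1254, "receive_date": "2022-11-03T18:30:00.000Z", "receive_time": "19:29:00",
     "attributeId": "connection_status", "hardwareId": "5"},
    {"id": 1255, "receive_date": "2022-11-03T18:30:00.000Z", "receive_time": "19:29:00",
     "attributeId": "connection_status", "hardwareId": "6"},
    {"id": 1256, "receive_date": "2022-11-03T18:30:00.000Z", "receive_time": "19:30:00",
     "attributeId": "connection_status", "hardwareId": "6"},
]


def group_rows(rows):
    """Collect rows sharing receive_date and receive_time under one 'data' list."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["receive_date"], row["receive_time"])
        grouped[key].append(
            {"attributeId": row["attributeId"], "hardwareId": row["hardwareId"]}
        )
    # Dicts keep insertion order, so groups appear in first-seen order.
    return [
        {"receive_date": date, "receive_time": time, "data": items}
        for (date, time), items in grouped.items()
    ]


result = group_rows(rows)
```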

extract values from JSON using jq

I have a JSON object that looks like this:
{
"Accounts": [
{
"Id": "1",
"Name": "Joe",
"Zip": "11111"
},
{
"Id": "2",
"Name": "Jack",
"Zip": "22222"
}
]
}
I am trying to write a jq query that gives me this:
[
{
"Id": "1",
"Name": "Joe"
},
{
"Id": "2",
"Name": "Jack"
}
]
How can I do that? Thanks.
jq '.Accounts | map({ Id, Name })'
Will produce
[
{
"Id": "1",
"Name": "Joe"
},
{
"Id": "2",
"Name": "Jack"
}
]
as you can try online using the demo linked below.
.Accounts selects the value of the Accounts key
map(...) applies the given filter to each element of that array
{ Id, Name } creates an object containing only the Id and Name keys
Demo: https://jqplay.org/s/v01P2gDVc8
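For comparison, the same three steps (select the array, visit each element, keep only two keys) can be sketched in plain Python. This is only an illustration of what the jq filter does, not a replacement for it. The variable names are arbitrary.

```python
import json

# The input document from the question.
doc = json.loads("""
{
  "Accounts": [
    {"Id": "1", "Name": "Joe", "Zip": "11111"},
    {"Id": "2", "Name": "Jack", "Zip": "22222"}
  ]
}
""")

# doc["Accounts"] plays the role of .Accounts, the comprehension plays the
# role of map(...), and the dict literal corresponds to { Id, Name }.
projected = [{"Id": account["Id"], "Name": account["Name"]} for account in doc["Accounts"]]

print(json.dumps(projected, indent=2))
```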
You can do
[.Accounts[] | {Id, Name}]

2 nested dictionaries in JSON format using json_normalize in Python

Hi, I am trying to retrieve this data, which contains two nested dictionaries:
{
"metadata": {
"stations": [
{
"id": "S108",
"device_id": "S108",
"name": "Kuala Lumpur",
"location": {
"latitude": 3.1390,
"longitude": 101.6869
}
},
{
"id": "S118",
"device_id": "S118",
"name": "Bukit Bintang",
"location": {
"latitude":3.1468,
"longitude": 101.7113
}
}
],
"reading_type": "DBT 1M F",
"reading_unit": "deg C"
},
"items": [
{
"timestamp": "2021-06-20T15:05:00+08:00",
"readings": [
{
"station_id": "S108",
"value": 32.6
},
{
"station_id": "S118",
"value": 30.3
}
]
}
]
}
I would like to get a result like this:
[image: Result]
I have tried a few approaches:
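import pandas as pd

# airtemp is the API response object (its creation is not shown in the post).
data = airtemp.json()
# Attempt 1: one row per station
df = pd.json_normalize(data, record_path=['metadata', 'stations'])
df
# Attempt 2: one row per reading
df1 = pd.json_normalize(data, record_path=['items', 'readings'])
df1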
Is there a way to use json_normalize to build a single table with station_id, name, latitude, longitude, timestamp and value, instead of splitting the data into two tables?
Many thanks!
You can merge the two dataframes you created:
import pandas as pd
data = {
"metadata": {
"stations": [
{
"id": "S108",
"device_id": "S108",
"name": "Kuala Lumpur",
"location": {
"latitude": 3.1390,
"longitude": 101.6869
}
},
{
"id": "S118",
"device_id": "S118",
"name": "Bukit Bintang",
"location": {
"latitude": 3.1468,
"longitude": 101.7113
}
}
],
"reading_type": "DBT 1M F",
"reading_unit": "deg C"
},
"items": [
{
"timestamp": "2021-06-20T15:05:00+08:00",
"readings": [
{
"station_id": "S108",
"value": 32.6
},
{
"station_id": "S118",
"value": 30.3
}
]
}
]
}
stations = pd.json_normalize(data, record_path=['metadata', 'stations'])
readings = pd.json_normalize(data, record_path=['items', 'readings'])
result = stations.merge(readings, left_on='id', right_on='station_id')
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(result)
Outputs:
     id device_id           name  location.latitude  location.longitude  \
0  S108      S108   Kuala Lumpur             3.1390            101.6869
1  S118      S118  Bukit Bintang             3.1468            101.7113

  station_id  value
0       S108   32.6
1       S118   30.3
There is only one timestamp in the data you provided so you will have to fetch that separately.
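One possible way to carry the timestamp along is the `meta` argument of `json_normalize`, which copies a field from each parent record onto every row produced from its nested list. The sketch below assumes the same `data` structure as in the question and normalizes `data["items"]` directly so that `timestamp` sits at the top level of each record. The final column selection is just one choice of layout.

```python
import pandas as pd

data = {
    "metadata": {
        "stations": [
            {"id": "S108", "device_id": "S108", "name": "Kuala Lumpur",
             "location": {"latitude": 3.1390, "longitude": 101.6869}},
            {"id": "S118", "device_id": "S118", "name": "Bukit Bintang",
             "location": {"latitude": 3.1468, "longitude": 101.7113}},
        ],
        "reading_type": "DBT 1M F",
        "reading_unit": "deg C",
    },
    "items": [
        {"timestamp": "2021-06-20T15:05:00+08:00",
         "readings": [{"station_id": "S108", "value": 32.6},
                      {"station_id": "S118", "value": 30.3}]},
    ],
}

# One row per station; nested "location" becomes location.latitude / location.longitude.
stations = pd.json_normalize(data["metadata"]["stations"])

# One row per reading, with the parent item's timestamp repeated on each row.
readings = pd.json_normalize(data["items"], record_path=["readings"], meta=["timestamp"])

result = stations.merge(readings, left_on="id", right_on="station_id")
result = result[["station_id", "name", "location.latitude",
                 "location.longitude", "timestamp", "value"]]
print(result)
```

If the API returns several items, each reading keeps the timestamp of the item it came from, so the merge with the stations table still works row by row.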

How to Query Sublist in CosmosDB to desired output

{
"Name": "Sam",
"Car": [
{
"Brand": "BMW",
"Category": "HunchBack",
"Type": "Gas"
},
{
"Brand": "Tesla",
"Category": "Sedan",
"Type": "Electric"
}
]
}
I want a Cosmos DB SQL query that filters the Car sublist on Brand, returning only the documents that match and, within them, only the matching Car entries.
Select * from c JOIN t IN c.Car
where t.BRAND = 'Tesla'
I tried this, but it only works partially: the result still includes the BMW entry in the sublist.
The expected output is:
{
"Name": "Sam",
"Car": [
{
"Brand": "Tesla",
"Category": "Sedan",
"Type": "Electric"
}
]
}
Try this:
SELECT DISTINCT c.Name,
  ARRAY(SELECT n.Brand, n.Category, n.Type FROM n IN c.Car WHERE n.Brand = 'Tesla') AS Car
FROM c JOIN t IN c.Car
WHERE t.Brand = 'Tesla'
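To make the intended semantics concrete, here is a small Python sketch of what the query above is meant to compute: keep only documents that have at least one matching car, and inside those documents keep only the matching Car entries. It runs on in-memory dicts rather than against Cosmos DB. The second document ("Ann") and the `filter_by_brand` helper are hypothetical additions for illustration.

```python
documents = [
    {"Name": "Sam", "Car": [
        {"Brand": "BMW", "Category": "HunchBack", "Type": "Gas"},
        {"Brand": "Tesla", "Category": "Sedan", "Type": "Electric"},
    ]},
    # Hypothetical document with no matching car; it should be dropped entirely.
    {"Name": "Ann", "Car": [
        {"Brand": "BMW", "Category": "Sedan", "Type": "Gas"},
    ]},
]


def filter_by_brand(docs, brand):
    """Return documents with a matching car, keeping only the matching Car entries."""
    filtered = []
    for doc in docs:
        # Corresponds to the ARRAY(...) subquery: filter the sublist itself.
        matching = [car for car in doc["Car"] if car["Brand"] == brand]
        # Corresponds to the outer WHERE: skip documents without any match.
        if matching:
            filtered.append({"Name": doc["Name"], "Car": matching})
    return filtered


print(filter_by_brand(documents, "Tesla"))
```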