Query All Elements in Nested JSON Array PostgreSQL - json

I am trying to create a SQL query to retrieve DNS answer information so that I can visualize it in Grafana with the aid of TimescaleDB. Right now, I am struggling to get Postgres to query more than one element at a time. The structure of the JSON that I am trying to query looks like this:
{
  "Z": 0,
  "AA": 0,
  "ID": 56559,
  "QR": 1,
  "RA": 1,
  "RD": 1,
  "TC": 0,
  "RCode": 0,
  "OpCode": 0,
  "answer": [
    {
      "ttl": 19046,
      "name": "i.stack.imgur.com",
      "type": 5,
      "class": 1,
      "rdata": "i.stack.imgur.com.cdn.cloudflare.net"
    },
    {
      "ttl": 220,
      "name": "i.stack.imgur.com.cdn.cloudflare.net",
      "type": 1,
      "class": 1,
      "rdata": "104.16.30.34"
    },
    {
      "ttl": 220,
      "name": "i.stack.imgur.com.cdn.cloudflare.net",
      "type": 1,
      "class": 1,
      "rdata": "104.16.31.34"
    },
    {
      "ttl": 220,
      "name": "i.stack.imgur.com.cdn.cloudflare.net",
      "type": 1,
      "class": 1,
      "rdata": "104.16.0.35"
    }
  ],
  "ANCount": 13,
  "ARCount": 0,
  "QDCount": 1,
  "question": [
    {
      "name": "i.stack.imgur.com",
      "qtype": 1,
      "qclass": 1
    }
  ]
}
There can be any number of answers, including zero, so I would like to figure out a way to query all of them. For example, I am trying to retrieve the ttl field from every element of the answer array; I can query a specific index, but I have trouble querying all occurrences.
This works for querying a single index:
SELECT (data->'answer'->>0)::json->'ttl'
FROM dns;
When I looked around, I found the following as a potential solution for querying all indices within the array, but it did not work and failed with "cannot extract elements from a scalar":
SELECT answer->>'ttl' ttl
FROM dns, jsonb_array_elements(data->'answer') answer, jsonb_array_elements(answer->'ttl') ttl

Using jsonb_array_elements() will give you a row for every object in the answer array. You can then dereference that object with -> / ->>. (The "cannot extract elements from a scalar" error in your attempt comes from the second call, jsonb_array_elements(answer->'ttl'): ttl is a scalar, not an array, so there is nothing to unnest.)
select a.obj->>'ttl'   as ttl,
       a.obj->>'name'  as name,
       a.obj->>'rdata' as rdata
from dns d
cross join lateral jsonb_array_elements(d.data->'answer') as a(obj);
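Note that ->> returns text, so cast when you need numbers. A minimal extension of the above (a sketch only; it assumes dns has an id column, which is not shown in the question):
select d.id,
       min((a.obj->>'ttl')::int) as min_ttl,  -- cast text to integer before aggregating
       count(*)                  as answers
from dns d
cross join lateral jsonb_array_elements(d.data->'answer') as a(obj)
group by d.id;
Rows whose answer array is empty or missing simply produce no output here; use left join lateral ... on true instead of the cross join if you need to keep them.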


pandas json_normalize columns created as dtype object

I have a JSON object served from an API as follows:
{
  "workouts": [
    {
      "id": 92527291,
      "starts": "2021-06-28T15:42:44.000Z",
      "minutes": 30,
      "name": "Indoor Cycling",
      "created_at": "2021-06-28T16:12:57.000Z",
      "updated_at": "2021-06-28T16:12:57.000Z",
      "plan_id": null,
      "workout_token": "ELEMNT BOLT A1B3:59",
      "workout_type_id": 12,
      "workout_summary": {
        "id": 87540207,
        "heart_rate_avg": "152.0",
        "calories_accum": "332.0",
        "created_at": "2021-06-28T16:12:58.000Z",
        "updated_at": "2021-06-28T16:12:58.000Z",
        "power_avg": "185.0",
        "distance_accum": "17520.21",
        "cadence_avg": "87.0",
        "ascent_accum": "0.0",
        "duration_active_accum": "1801.0",
        "duration_paused_accum": "0.0",
        "duration_total_accum": "1801.0",
        "power_bike_np_last": "186.0",
        "power_bike_tss_last": "27.6",
        "speed_avg": "9.73",
        "work_accum": "332109.0",
        "file": {
          "url": "https://cdn.wahooligan.com/wahoo-cloud/production/uploads/workout_file/file/FPoJBPZo17BvTmSomq5Y_Q/2021-06-28-154244-ELEMNT_BOLT_A1B3-59-0.fit"
        }
      }
    }
  ],
  "total": 55,
  "page": 1,
  "per_page": 1,
  "order": "descending",
  "sort": "starts"
}
I want to get the data into a dataframe. However, lots of the columns seem to have a dtype of object. I assume this is because some of the numeric values in the JSON are double-quoted. What is the best and most efficient way to avoid this (the JSON potentially has many workouts elements)?
Is the fix to repair the returned JSON, or to iterate through the dataframe columns and convert the objects to floats?
If I understand correctly, you can try:
df = pd.json_normalize(
    json_data,
    record_path='workouts',
    meta=['total', 'page', 'per_page', 'order', 'sort'],
).convert_dtypes()
Try using pandas.to_numeric. Here are the docs.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html
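Putting the two suggestions together, a minimal sketch (it assumes the parsed response is in json_data; the column list is taken from the sample above and is not exhaustive):
import pandas as pd

df = pd.json_normalize(
    json_data,
    record_path='workouts',
    meta=['total', 'page', 'per_page', 'order', 'sort'],
)

# json_normalize flattens the nested workout_summary dict into dotted
# column names; the quoted numbers arrive as strings, so convert them.
num_cols = [
    'workout_summary.heart_rate_avg',
    'workout_summary.calories_accum',
    'workout_summary.power_avg',
    'workout_summary.distance_accum',
    'workout_summary.speed_avg',
]
df[num_cols] = df[num_cols].apply(pd.to_numeric)
Converting only explicitly listed columns avoids mangling the timestamp strings, which you would parse with pd.to_datetime instead.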

JSON parse in SQL with hash key

I have a table with a JSON column. In this column I have blockchain hash data.
For example, this JSON:
{ "017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894": {
"transaction": {
"block_id": 648895,
"id": 568135560,
"hash": "017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894",
"date": "2020-01-14",
"time": "2020-01-14 11:37:37",
"size": 198,
},
"inputs": [
{
"block_id": 648859,
"transaction_id": 567456558,
"index": 4,
"transaction_hash": "8aa2c6c9a804mate29790e03fac462782d99f16614732f82a5214786926e1397",
"date": "2020-01-13",
"time": "2020-01-13 23:15:37",
"value": 300830,
"value_usd": 33.2264,
"recipient": "1LcrmomE74BPzBTdduE8WHU2ox4QAFEpQi",
}
],
"outputs": [
{
"block_id": 648445,
"transaction_id": 568146680,
"index": 0,
"transaction_hash": "017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894",
"date": "2020-01-14",
"time": "2020-01-14 11:37:37",
"value": 300048,
"value_usd": 31.9397,
"recipient": "12UJZqf4sDGRNb9uYBABJkMyX91iLjDViT",
}
]}}
I used the query below:
SELECT *, JSON_VALUE(d.json_data,'$.017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894.transaction.size') as jj
FROM BlockChain as d
but I get this error:
Msg 13607, Level 16, State 4, Line 39
JSON path is not properly formatted. Unexpected character '0' is found at position 2.
Does anyone have any idea?
The path
'$.017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894.transaction.size'
cannot have a node that starts with a digit, so enclose that node in double quotes:
'$."017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894".transaction.size'
You also have a problem with your actual JSON: it contains trailing commas, which SQL Server does not support (nor do the vast majority of parsers and browsers, as they are against the spec).
If you have different key names for each value, then you need to break out the JSON with OPENJSON:
SELECT d.*, j.[key] AS hash, JSON_VALUE(j.value, '$.transaction.size') AS jj
FROM BlockChain AS d
CROSS APPLY OPENJSON(d.json_data) AS j
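If you also need fields from the nested arrays, you can chain another OPENJSON over them. A sketch only, using the column names from the sample JSON:
SELECT d.*,
       j.[key] AS hash,
       JSON_VALUE(j.value, '$.transaction.size') AS tx_size,
       JSON_VALUE(i.value, '$.value_usd')        AS input_value_usd
FROM BlockChain AS d
CROSS APPLY OPENJSON(d.json_data) AS j            -- one row per top-level hash key
CROSS APPLY OPENJSON(j.value, '$.inputs') AS i;   -- one row per element of inputs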

Jq convert an object into an array

I have the following file, "Pokemon.json"; it's a stripped-down list of Pokémon, listing their Pokédex ID, name, and an array of type objects.
[
  {
    "name": "onix",
    "id": 95,
    "types": [
      {
        "slot": 2,
        "type": {
          "name": "ground"
        }
      },
      {
        "slot": 1,
        "type": {
          "name": "rock"
        }
      }
    ]
  },
  {
    "name": "drowzee",
    "id": 96,
    "types": [
      {
        "slot": 1,
        "type": {
          "name": "psychic"
        }
      }
    ]
  }
]
The output I'm trying to achieve is this: extract the name value of each type object and collect the names into an array. I can easily get a flat list of all the type names with
jq -r '.[].types[].type.name' pokemon.json
But I'm missing the key part: transforming the name fields into their own array per Pokémon:
[
  {
    "name": "onix",
    "id": 95,
    "types": ["rock", "ground"]
  },
  {
    "name": "drowzee",
    "id": 96,
    "types": ["psychic"]
  }
]
Any help appreciated, thank you!
The manual states that you have the option to use map, which essentially means walking over each element and returning something (in our case the same data, constructed differently). This means that for each Pokémon you create a new object and put some values inside. Pay attention: you do need another map within, since the inner types array has to be rebuilt as well. So the solution might look like this:
jq 'map({name, id, types: (.types | sort_by(.slot) | map(.type.name))})' pokemon.json
The sort_by(.slot) puts the primary type first, matching the order in your expected output.
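Run against the sample file, that prints the target structure (jq pretty-prints each array element on its own line):
[
  {
    "name": "onix",
    "id": 95,
    "types": [
      "rock",
      "ground"
    ]
  },
  {
    "name": "drowzee",
    "id": 96,
    "types": [
      "psychic"
    ]
  }
]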

How can I insert following JSON in MongoDB collection as different documents

I need to insert data into Mongo, but the JSON I am getting has multiple values in every field and I don't know how I can split them to insert in different documents.
I want to insert the array data as separate objects in MongoDB:
{
  "activity_template_id": [1, 2, 3, 4, 5, 7],
  "done_date": ["2019-08-10", "2019-08-10", "2019-08-10", "0000-01-01", "0000-01-01", "0000-01-01"],
  "is_prescribed": ["N", "N", "N", "N", "N", "Y"],
  "material_id": [1, 5, 21, 10, 14, 0],
  "qty": ["1", "1", "1", "0", "0", "0"],
  "unit_id": [1, 1, 25, 0, 0, 0]
}
(As far as I know) there is no feature in MongoDB itself that would reshape input data like that; you would do it in application code before calling MongoDB.
If there is no separate application, you can use standard JavaScript functions within the mongo shell to do it.
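For example, a minimal mongo-shell sketch (the payload variable doc and the collection name activities are hypothetical):
// Pivot the parallel arrays into one document per index.
const doc = { /* the JSON payload from the question */ };
const keys = Object.keys(doc);
const docs = [];
for (let i = 0; i < doc[keys[0]].length; i++) {
  const d = {};
  keys.forEach(k => { d[k] = doc[k][i]; });  // take the i-th value of every field
  docs.push(d);
}
db.activities.insertMany(docs);  // six documents, one per array index
This assumes every array has the same length, which the sample data satisfies.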

Ansible trouble parsing JSON to get correct UUIDs to poweron VMs

Making an API GET call, I get the following JSON structure:
{
  "metadata": {
    "grand_total_entities": 231,
    "total_entities": 0,
    "count": 231
  },
  "entities": [
    {
      "allow_live_migrate": true,
      "gpus_assigned": false,
      "ha_priority": 0,
      "memory_mb": 1024,
      "name": "test-ansible2",
      "num_cores_per_vcpu": 2,
      "num_vcpus": 1,
      "power_state": "off",
      "timezone": "UTC",
      "uuid": "e1aff9d4-c834-4515-8c08-235d1674a47b",
      "vm_features": {
        "AGENT_VM": false
      },
      "vm_logical_timestamp": 1
    },
    {
      "allow_live_migrate": true,
      "gpus_assigned": false,
      "ha_priority": 0,
      "memory_mb": 1024,
      "name": "test-ansible1",
      "num_cores_per_vcpu": 1,
      "num_vcpus": 1,
      "power_state": "off",
      "timezone": "UTC",
      "uuid": "4b3b315e-f313-43bb-941b-03c298937b4d",
      "vm_features": {
        "AGENT_VM": false
      },
      "vm_logical_timestamp": 1
    },
    {
      "allow_live_migrate": true,
      "gpus_assigned": false,
      "ha_priority": 0,
      "memory_mb": 4096,
      "name": "test",
      "num_cores_per_vcpu": 1,
      "num_vcpus": 2,
      "power_state": "off",
      "timezone": "UTC",
      "uuid": "fbe9a1ac-cf45-4efa-9d65-b3257548a9f4",
      "vm_features": {
        "AGENT_VM": false
      },
      "vm_logical_timestamp": 17
    }
  ]
}
In my Ansible playbook I register a variable holding this content.
I need to get the list of UUIDs of "test-ansible1" and "test-ansible2", but I'm having a hard time finding the best way to do this.
Note that I have another variable holding the list of names for which I need to look up the UUIDs.
The goal is to use those UUIDs to fire a power-on command for all UUIDs corresponding to specific names.
How would you do that?
I've taken a number of approaches, but I can't seem to get what I want, so I'd prefer an uninfluenced opinion.
P.S.: This is what Nutanix AHV returns from a GET of all VMs through its API. There seems to be no way to get JSON for only specific VMs, only for all VMs.
Thanks.
Here is some Jinja2 magic for you:
- debug:
    msg: "{{ mynames | map('extract', dict(test_json | json_query('entities[].[name,uuid]'))) | list }}"
  vars:
    mynames:
      - test-ansible1
      - test-ansible2
Explanation:
test_json | json_query('entities[].[name,uuid]') reduces your original JSON data to a list of two-item lists, each holding a name value and a uuid value:
[
  ["test-ansible2", "e1aff9d4-c834-4515-8c08-235d1674a47b"],
  ["test-ansible1", "4b3b315e-f313-43bb-941b-03c298937b4d"],
  ["test", "fbe9a1ac-cf45-4efa-9d65-b3257548a9f4"]
]
BTW you can use http://jmespath.org/ to test query statements.
dict(...) applied to such a structure (a list of tuples) generates a dictionary:
{
  "test": "fbe9a1ac-cf45-4efa-9d65-b3257548a9f4",
  "test-ansible1": "4b3b315e-f313-43bb-941b-03c298937b4d",
  "test-ansible2": "e1aff9d4-c834-4515-8c08-235d1674a47b"
}
Then we apply the extract filter, as per the documentation, to fetch only the required elements:
[
  "4b3b315e-f313-43bb-941b-03c298937b4d",
  "e1aff9d4-c834-4515-8c08-235d1674a47b"
]
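To actually fire the power-on calls, you could register that expression as a fact and loop over it. A sketch only; the uri parameters below are placeholders, not the real Nutanix AHV endpoint:
- set_fact:
    vm_uuids: "{{ mynames | map('extract', dict(test_json | json_query('entities[].[name,uuid]'))) | list }}"

- name: Power on the selected VMs
  uri:
    url: "https://{{ cluster_host }}/api/vms/{{ item }}/power_on"  # hypothetical endpoint
    method: POST
  loop: "{{ vm_uuids }}"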