JSON parse in SQL with hash key - json

I have a table with a JSON column. The column holds blockchain transaction data keyed by the transaction hash.
For example, this JSON:
{ "017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894": {
"transaction": {
"block_id": 648895,
"id": 568135560,
"hash": "017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894",
"date": "2020-01-14",
"time": "2020-01-14 11:37:37",
"size": 198,
},
"inputs": [
{
"block_id": 648859,
"transaction_id": 567456558,
"index": 4,
"transaction_hash": "8aa2c6c9a804mate29790e03fac462782d99f16614732f82a5214786926e1397",
"date": "2020-01-13",
"time": "2020-01-13 23:15:37",
"value": 300830,
"value_usd": 33.2264,
"recipient": "1LcrmomE74BPzBTdduE8WHU2ox4QAFEpQi",
}
],
"outputs": [
{
"block_id": 648445,
"transaction_id": 568146680,
"index": 0,
"transaction_hash": "017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894",
"date": "2020-01-14",
"time": "2020-01-14 11:37:37",
"value": 300048,
"value_usd": 31.9397,
"recipient": "12UJZqf4sDGRNb9uYBABJkMyX91iLjDViT",
}
]}}
I used the query below:
SELECT *, JSON_VALUE(d.json_data,'$.017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894.transaction.size') as jj
FROM BlockChain as d
but I get an error:
Msg 13607, Level 16, State 4, Line 39
JSON path is not properly formatted. Unexpected character '0' is found at position 2.
Does anyone have any idea?

The path
'$.017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894.transaction.size'
cannot contain an unquoted key that starts with a digit (here, 0). Enclose the key in double quotes:
'$."017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894".transaction.size'.
You also have a problem with your actual JSON, in that it has trailing commas, which is not supported in SQL Server, nor in the vast majority of parsers and browsers, as it is against the spec.
If the key name differs from row to row, you need to break out the JSON with OPENJSON, which returns each top-level key and its value:
SELECT d.*, j.[key] AS hash, JSON_VALUE(j.value,'$.transaction.size') as jj
FROM BlockChain as d
CROSS APPLY OPENJSON(d.json_data) AS j
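Both points above can be checked outside the database. The following is a minimal Python sketch, not part of the original answer: it uses a trimmed copy of the question's document, and the helper names is_valid_json and sizes_by_hash are made up for illustration. It shows that a strict parser rejects the trailing comma, and that iterating over the top-level keys (roughly what OPENJSON does) avoids hard-coding the hash.

```python
import json

HASH = "017989w06d3f902f1f362dfg48f862dba6a605229e99859a91d854f93ac13894"

# Trimmed version of the question's document, trailing comma kept after "size".
invalid_doc = '{"%s": {"transaction": {"block_id": 648895, "size": 198,}}}' % HASH
# The same document with the trailing comma removed.
valid_doc = '{"%s": {"transaction": {"block_id": 648895, "size": 198}}}' % HASH


def is_valid_json(text):
    """Return True if Python's strict JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def sizes_by_hash(text):
    """Map each top-level key (the hash) to its transaction size."""
    doc = json.loads(text)
    return {key: value["transaction"]["size"] for key, value in doc.items()}


print(is_valid_json(invalid_doc))
print(is_valid_json(valid_doc))
print(sizes_by_hash(valid_doc))
```

The dictionary returned by sizes_by_hash plays the role of the hash and jj columns in the OPENJSON query.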

pandas json_normalize columns created as dtype object

I have a json object served from an api as follows:
{
"workouts": [
{
"id": 92527291,
"starts": "2021-06-28T15:42:44.000Z",
"minutes": 30,
"name": "Indoor Cycling",
"created_at": "2021-06-28T16:12:57.000Z",
"updated_at": "2021-06-28T16:12:57.000Z",
"plan_id": null,
"workout_token": "ELEMNT BOLT A1B3:59",
"workout_type_id": 12,
"workout_summary": {
"id": 87540207,
"heart_rate_avg": "152.0",
"calories_accum": "332.0",
"created_at": "2021-06-28T16:12:58.000Z",
"updated_at": "2021-06-28T16:12:58.000Z",
"power_avg": "185.0",
"distance_accum": "17520.21",
"cadence_avg": "87.0",
"ascent_accum": "0.0",
"duration_active_accum": "1801.0",
"duration_paused_accum": "0.0",
"duration_total_accum": "1801.0",
"power_bike_np_last": "186.0",
"power_bike_tss_last": "27.6",
"speed_avg": "9.73",
"work_accum": "332109.0",
"file": {
"url": "https://cdn.wahooligan.com/wahoo-cloud/production/uploads/workout_file/file/FPoJBPZo17BvTmSomq5Y_Q/2021-06-28-154244-ELEMNT_BOLT_A1B3-59-0.fit"
}
}
}
],
"total": 55,
"page": 1,
"per_page": 1,
"order": "descending",
"sort": "starts"
}
I want to get the data into a dataframe. However, lots of the columns seem to have a dtype of object. I assume that this is because some of the numeric values in the json are double quoted. What is the best and most efficient way to avoid this (the json potentially has many workouts elements)?
Is it to fix the returned json? Or to iterate through the dataframe columns and convert the objects to floats?
Thank you
Martyn
IIUC, you can try:
df = pd.json_normalize(json_data, meta=[
'total', 'page', 'per_page', 'order', 'sort'], record_path='workouts').convert_dtypes()
Try using pandas.to_numeric. Here are the docs.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html
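As far as I know, convert_dtypes() does not parse quoted numbers (string columns stay strings), so the to_numeric suggestion is the one that addresses the quoted values. Here is a minimal sketch of that approach, assuming a trimmed copy of the payload from the question; the helper name coerce_numeric_strings is made up for illustration. It tries pandas.to_numeric on every non-numeric column and leaves the column alone when the conversion fails.

```python
import pandas as pd

# Trimmed copy of the API response from the question.
payload = {
    "workouts": [
        {
            "id": 92527291,
            "starts": "2021-06-28T15:42:44.000Z",
            "minutes": 30,
            "name": "Indoor Cycling",
            "workout_summary": {
                "heart_rate_avg": "152.0",
                "power_avg": "185.0",
                "distance_accum": "17520.21",
            },
        }
    ],
    "total": 55,
}


def coerce_numeric_strings(frame):
    """Convert columns of numeric strings to numbers; leave other columns as-is."""
    out = frame.copy()
    for col in out.select_dtypes(exclude="number").columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # Not a numeric column (e.g. names, timestamps): keep the original.
            pass
    return out


df = pd.json_normalize(payload, record_path="workouts")
clean = coerce_numeric_strings(df)
print(clean.dtypes)
```

Converting after normalizing keeps the API response untouched, and the loop scales to any number of workouts because it works column by column rather than row by row.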

JMESPath how to write a query with multi-level filter?

I have been studying the official JMESPath documentation and a few other resources. However, I was not successful with the following task.
My data structure is JSON from the Vimeo API (a video list).
The data array contains many objects; each object is an uploaded file with many attributes and various options.
"data": [
{
"uri": "/videos/00001",
"name": "Video will be added.mp4",
"description": null,
"type": "video",
"link": "https://vimeo.com/00001",
"duration": 9,
"files":[
{
"quality": "hd",
"type": "video/mp4",
"width": 1440,
"height": 1440,
"link": "https://player.vimeo.com/external/4443333.sd.mp4",
"created_time": "2020-09-01T19:10:01+00:00",
"fps": 30,
"size": 10807854,
"md5": "643d9f18e0a63e0630da4ad85eecc7cb",
"public_name": "UHD 1440p",
"size_short": "10.31MB"
},
{
"quality": "sd",
"type": "video/mp4",
"width": 540,
"height": 540,
"link": "https://player.vimeo.com/external/44444444.sd.mp4",
"created_time": "2020-09-01T19:10:01+00:00",
"fps": 30,
"size": 1345793,
"md5": "cb568939bb7b276eb468d9474c1f63f6",
"public_name": "SD 540p",
"size_short": "1.28MB"
},
... other data
]
},
... other uploaded files
]
The filter I need to apply is: duration less than 10 and file width equal to 540. The result needs to contain the link (URL) from files.
I have only managed to get one level of the structure working:
data[].files[?width == '540'].link
I need to extract this kind of list:
[
{
"uri": "/videos/111111",
"link": "https://player.vimeo.com/external/4123112312.sd.mp4"
},
{
"uri": "/videos/22222",
"link": "https://player.vimeo.com/external/1231231231.sd.mp4"
},
...other data
]
Since the duration is in your data array, you will have to add this filter at that level.
You will also have to use what the documentation describes under "filtering and selecting nested data", because you only care about one specific type of file in the files array. You can use the same kind of pipe expression, | [0], to pull only the first element of the filtered files array. Note that numbers must be written as JSON literals in backticks (`540`); '540' in single quotes is a string and will not match the numeric width.
So on your reduced example, the query:
data[?duration < `10`].{ uri: uri, link: files[?width == `540`].link | [0] }
Would yield the expected:
[
{
"uri": "/videos/00001",
"link": "https://player.vimeo.com/external/44444444.sd.mp4"
}
]
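To check the logic of that query without a JMESPath library, here is a plain-Python equivalent. It is only a sketch: the data is a trimmed copy of the question's sample, the second video (/videos/00002, with an example.invalid link) is made up to show the duration filter excluding something, and first_link_with_width is a hypothetical helper name.

```python
# Trimmed copy of the sample; the second video is invented for illustration.
payload = {
    "data": [
        {
            "uri": "/videos/00001",
            "duration": 9,
            "files": [
                {"width": 1440, "link": "https://player.vimeo.com/external/4443333.sd.mp4"},
                {"width": 540, "link": "https://player.vimeo.com/external/44444444.sd.mp4"},
            ],
        },
        {
            "uri": "/videos/00002",
            "duration": 42,
            "files": [
                {"width": 540, "link": "https://example.invalid/too-long.sd.mp4"},
            ],
        },
    ]
}


def first_link_with_width(files, width):
    """Mirror files[?width == `540`].link | [0]: first matching link, or None."""
    return next((f["link"] for f in files if f.get("width") == width), None)


# Mirror data[?duration < `10`].{uri: uri, link: ...}
result = [
    {"uri": video["uri"], "link": first_link_with_width(video.get("files", []), 540)}
    for video in payload["data"]
    if video.get("duration") is not None and video["duration"] < 10
]

print(result)
```

As with the | [0] pipe, a video that passes the duration filter but has no 540-wide file still appears in the result, with a null (None) link.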

Query All Elements in Nested JSON Array PostgreSQL

I am trying to create a query in SQL to retrieve DNS answer information so that I can visualize it in Grafana with the aid of TimescaleDB. Right now, I am struggling to get Postgres to query more than one element at a time. The structure of the JSON that I am trying to query looks like this:
{
"Z": 0,
"AA": 0,
"ID": 56559,
"QR": 1,
"RA": 1,
"RD": 1,
"TC": 0,
"RCode": 0,
"OpCode": 0,
"answer": [
{
"ttl": 19046,
"name": "i.stack.imgur.com",
"type": 5,
"class": 1,
"rdata": "i.stack.imgur.com.cdn.cloudflare.net"
},
{
"ttl": 220,
"name": "i.stack.imgur.com.cdn.cloudflare.net",
"type": 1,
"class": 1,
"rdata": "104.16.30.34"
},
{
"ttl": 220,
"name": "i.stack.imgur.com.cdn.cloudflare.net",
"type": 1,
"class": 1,
"rdata": "104.16.31.34"
},
{
"ttl": 220,
"name": "i.stack.imgur.com.cdn.cloudflare.net",
"type": 1,
"class": 1,
"rdata": "104.16.0.35"
}
],
"ANCount": 13,
"ARCount": 0,
"QDCount": 1,
"question": [
{
"name": "i.stack.imgur.com",
"qtype": 1,
"qclass": 1
}
]
}
There can be any number of answers, including zero, so I would like to figure out a way to query all answers. For example, I am trying to retrieve the ttl field from every element of the answer array. I can query a specific index, but have trouble querying all occurrences.
This works for querying a single index:
SELECT (data->'answer'->>0)::json->'ttl'
FROM dns;
When I looked around, I found this as a potential solution for querying all indices within the array, but it did not seem to work and told me "cannot extract elements from a scalar":
SELECT answer->>'ttl' ttl
FROM dns, jsonb_array_elements(data->'answer') answer, jsonb_array_elements(answer->'ttl') ttl
Using jsonb_array_elements() will give you a row for every object in the answer array. You can then dereference that object:
select a.obj->>'ttl' as ttl, a.obj->>'name' as name, a.obj->>'rdata' as rdata
from dns d
cross join lateral jsonb_array_elements(data->'answer') as a(obj)
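For readers more at home in Python, the row-per-element behaviour of the lateral join can be sketched as a nested loop. This is only an illustration of the idea, not how Postgres executes it; the rows list stands in for the dns table and answer_rows is a made-up helper name. The second row has an empty answer array and, as with the cross join, contributes no output rows.

```python
# Stand-in for the dns table: each dict is one row with a "data" JSON column.
rows = [
    {
        "data": {
            "ID": 56559,
            "answer": [
                {"ttl": 19046, "name": "i.stack.imgur.com",
                 "rdata": "i.stack.imgur.com.cdn.cloudflare.net"},
                {"ttl": 220, "name": "i.stack.imgur.com.cdn.cloudflare.net",
                 "rdata": "104.16.30.34"},
            ],
        }
    },
    {"data": {"ID": 1, "answer": []}},  # zero answers: produces no rows
]


def answer_rows(table):
    """Yield one (ttl, name, rdata) tuple per element of each answer array."""
    for row in table:
        for obj in row["data"].get("answer", []):
            yield obj.get("ttl"), obj.get("name"), obj.get("rdata")


flattened = list(answer_rows(rows))
for ttl, name, rdata in flattened:
    print(ttl, name, rdata)
```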

How can I improve raw JSON data in order to use it?

I'm trying to use some results exported as JSON by a script called "Mixed Content Scan" (a script that checks a website for mixed HTTP/HTTPS content and whether all of its pages are served correctly over HTTPS).
I'm a beginner with JSON. I have read and watched a lot of tutorials to understand how to structure JSON data, but I'm stumbling on something.
Here is a sample of my data (first 3 lines):
{"message":"Scanning https://mywebsite.com/","context":[],"level":250,"level_name":"NOTICE","channel":"MCS","datetime":{"date":"2018-10-05 23:48:50.268196","timezone_type":3,"timezone":"America/New_York"},"extra":[]}
{"message":"00000 - https://mywebsite.com/","context":[],"level":400,"level_name":"ERROR","channel":"MCS","datetime":{"date":"2018-10-05 23:48:50.760948","timezone_type":3,"timezone":"America/New_York"},"extra":[]}
{"message":"http://mywebsite.com/wp-content/uploads/2015/03/image.jpg","context":[],"level":300,"level_name":"WARNING","channel":"MCS","datetime":{"date":"2018-10-05 23:48:50.761082","timezone_type":3,"timezone":"America/New_York"},"extra":[]}
I know I need to wrap my data in {} or [] (I tried both), but I think I'm missing something. For example, every JSON validator website tells me that I have an error between two lines when I add a "," to hold multiple results.
How can I change this raw data so that a JSON validator accepts it?
Thanks!
How's this
[{
"message": "Scanning https://mywebsite.com/",
"context": [],
"level": 250,
"level_name": "NOTICE",
"channel": "MCS",
"datetime": {
"date": "2018-10-05 23:48:50.268196",
"timezone_type": 3,
"timezone": "America/New_York"
},
"extra": []
}, {
"message": "00000 - https://mywebsite.com/",
"context": [],
"level": 400,
"level_name": "ERROR",
"channel": "MCS",
"datetime": {
"date": "2018-10-05 23:48:50.760948",
"timezone_type": 3,
"timezone": "America/New_York"
},
"extra": []
}, {
"message": "http://mywebsite.com/wp-content/uploads/2015/03/image.jpg",
"context": [],
"level": 300,
"level_name": "WARNING",
"channel": "MCS",
"datetime": {
"date": "2018-10-05 23:48:50.761082",
"timezone_type": 3,
"timezone": "America/New_York"
},
"extra": []
}]
Your raw data is one JSON object per line. To make it a single valid JSON document, wrap the objects in [ ] and separate the entries with commas.
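If the export is large, the conversion does not have to be done by hand. A minimal sketch, assuming the file really is one complete JSON object per line as in the sample (the records below are shortened copies of the question's lines, and parse_json_lines is a made-up helper name): parse each line on its own, then serialize the resulting list as one array.

```python
import json

# Shortened copies of the three sample lines from the question.
raw = """{"message":"Scanning https://mywebsite.com/","level":250,"level_name":"NOTICE"}
{"message":"00000 - https://mywebsite.com/","level":400,"level_name":"ERROR"}
{"message":"http://mywebsite.com/wp-content/uploads/2015/03/image.jpg","level":300,"level_name":"WARNING"}
"""


def parse_json_lines(text):
    """Parse text holding one JSON object per line into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_json_lines(raw)

# A single JSON array that a validator will accept.
as_array = json.dumps(records, indent=2)
print(as_array)
```

Parsing line by line also means the original file can be used directly in code, without rewriting it into an array first.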

Python json error list indices must be integers

I have json data:
[{
"dataType": "detox",
"hLogging": 0.5,
"reading": 63.9,
"minValue": 25,
"dataValue": [{
"time": 143221019,
"value": 44
}, {
"time": 1433521119,
"value": 66
}, {
"time": 1433521319,
"value": 22
}]
}, {
"dataType": "epox",
"hLogging": 3,
"reading": 61.0,
"minValue": 0,
"dataValue": [{
"time": 1433521019,
"value": 55
}, {
"time": 1433521119,
"value": 66
}, {
"time": 1433521219,
"value": 77
}, {
"time": 1433521319,
"value": 88
}]
}]
There are two data types, each with its own dataValue list. Each entry in dataValue contains a time in Unix epoch format, which I need to convert into a normal date-time value. I started by parsing the data with a for loop:
for item in range(len(json_data['dataValue'])):
print(json_data['dataValue'][item]['time'])
But this throws error:
string indices must be integers
This is probably because the JSON data has string values, but then how can I get the time values and convert them into a normal date-time format? Also, the number of entries in dataValue is not fixed; there can be 3, 4, 5, ... n items, so I need to loop over a range. Please suggest a good way.
The top level of your JSON is a list of objects, not a single object, so you have to index into the list before accessing keys such as 'dataValue'.
for item in range(len(json_data[0]['dataValue'])):
print(json_data[0]['dataValue'][item]['time'])
Hope this helps! Cheers!
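The snippet above only reads the first element of the list. A minimal sketch that walks every element and also does the epoch conversion the question asks for, assuming the timestamps are seconds since the Unix epoch and that UTC is an acceptable time zone (both are assumptions; the sample is a trimmed copy of the question's data and readings is a made-up helper name):

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the data from the question.
json_text = """
[
  {"dataType": "detox", "dataValue": [{"time": 1433521119, "value": 66}]},
  {"dataType": "epox",  "dataValue": [{"time": 1433521019, "value": 55},
                                      {"time": 1433521119, "value": 66}]}
]
"""

json_data = json.loads(json_text)


def readings(data):
    """Yield (dataType, datetime, value) for every entry of every dataValue list."""
    for item in data:                     # each top-level object in the list
        for point in item["dataValue"]:   # any number of readings per object
            when = datetime.fromtimestamp(point["time"], tz=timezone.utc)
            yield item["dataType"], when, point["value"]


rows = list(readings(json_data))
for data_type, when, value in rows:
    print(data_type, when.isoformat(), value)
```

Iterating over the lists directly, rather than over range(len(...)), handles any number of items without index bookkeeping.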