Using Power Query to extract data from nested arrays in JSON - json

I'm relatively new to Power Query, but I'm pulling in this basic structure of JSON from a web api
{
"report": "Cost History",
"dimensions": [
{
"time": [
{
"name": "2019-11",
"label": "2019-11",
…
},
{
"name": "2019-12",
"label": "2019-12",
…
},
{
"name": "2020-01",
"label": "2020-01",
…
},
…
]
},
{
"Category": [
{
"name": "category1",
"label": "Category 1",
…
},
{
"name": "category2",
"label": "Category 2",
…
},
…
]
}
],
"data": [
[
[
40419.6393798211
],
[
191.44
],
…
],
[
[
2299.652439184997
],
[
0.0
],
…
]
]
}
I actually have 112 categories and 13 "times". I figured out how to do multiple queries to turn the times into column headers and the categories into row labels (I think). But the data section is eluding me. Because each item is a list within a list, I'm not sure how to expand it all out. Each object in the data array will have 112 numbers, and there will be 13 objects, if that all makes sense.
So ultimately I want to make it look like
2019-11 2019-12 2020-01 ...
Category 1 40419 2299
Category 2 191 0
...
First time asking a question on here, so hopefully this all makes sense and is clear. Thanks in advance for any help!
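For anyone following along, the reshaping described above can be sketched outside Power Query. This is only a minimal Python illustration, not a Power Query solution. It assumes (from the desired table in the question) that the outer "data" list is ordered by time and each inner list by category, and it uses a trimmed-down stand-in for the response with two times and two categories. The names `report` and `pivot` are made up for the example.

```python
# Trimmed-down stand-in for the API response: 2 time periods, 2 categories.
report = {
    "dimensions": [
        {"time": [{"name": "2019-11", "label": "2019-11"},
                  {"name": "2019-12", "label": "2019-12"}]},
        {"Category": [{"name": "category1", "label": "Category 1"},
                      {"name": "category2", "label": "Category 2"}]},
    ],
    "data": [
        [[40419.6393798211], [191.44]],   # assumed: values for 2019-11, one per category
        [[2299.652439184997], [0.0]],     # assumed: values for 2019-12, one per category
    ],
}


def pivot(report):
    """Return (header, rows): one row per category, one column per time."""
    times = [t["label"] for t in report["dimensions"][0]["time"]]
    categories = [c["label"] for c in report["dimensions"][1]["Category"]]
    header = ["Category"] + times
    rows = []
    for c_idx, category in enumerate(categories):
        # data[t_idx][c_idx] is a one-element list, so [0] unwraps the number.
        values = [report["data"][t_idx][c_idx][0] for t_idx in range(len(times))]
        rows.append([category] + values)
    return header, rows


header, rows = pivot(report)
for line in [header] + rows:
    print(line)
```

The key step is the double index `data[t_idx][c_idx][0]`: the first index picks the time, the second the category, and the last unwraps the single-element list.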

I am also researching this exact thing and looking for a solution. Power Query displays a nested array as a List, and there is a function to extract the values, joining them with a separator character of your choice. The generated step looks like this:
= Table.TransformColumns(#"Filtered Rows", {"aligned_to_ids", each Text.Combine(List.Transform(_, Text.From), ","), type text})
However, the problem I'm trying to solve is when the nested JSON has multiple values in each list item. When these lists are extracted, the step raises an error:
= Table.TransformColumns(#"Extracted Values1", {"collaborators", each Text.Combine(List.Transform(_, Text.From), ","), type text})
Expression.Error: We cannot convert a value of type Record to type Text.
Details:
Value=
id=15890
goal_id=323
role_id=15
Type=[Type]
It seems the multiple values are not handled and PQ does not recognise the underlying structure to enable the columns to be expanded.


tExtractJsonFields getting data from 3 levels

I am trying to extract 3 levels of data from JSON with tExtractJsonFields.
I know tHMap can do this but I am having trouble with that approach so I am pursuing a simpler approach for now.
I am working with a Smartsheet JSON response describing a sheet within Smartsheet.
There are 3 levels
Lvl 1 - Sheet info[]
Lvl 2 - Column Info[]
Lvl 2 - Row info[]
Lvl 3 - cell info[]
Using tExtractJsonFields, I am able to retrieve information from Level 1 and Level 3.
I do not know the correct JsonQuery to correctly retrieve level 2.
My problem: I would like to extract information from Level 2 (Row.Id, Row.Value) in the same tExtractJsonFields component. Any help would be appreciated.
tExtractJsonFields configuration
tLogRow Output
Fields 2 and 3 are null.
Clearly, I am doing something wrong.
Sample JSON
{ "id": 8566480355780484,
"columns": [
{ "id": 7605383392978820,
"title": "Item #"
},
{ "id": 1975883858765700,
"title": "Indicator"
}
],
"rows": [
{ "id": 4808422210070404,
"rowNumber": 1,
"cells": [
{
"columnId": 7605383392978820,
"value": "0002",
"displayValue": "0002"
},
{
"columnId": 1975883858765700,
"value": "Draft",
"displayValue": "Draft"
}
]
},
{ "id": 2556622396385156,
"rowNumber": 2,
"cells": [
{ "columnId": 7605383392978820,
"value": "0003",
"displayValue": "0003"
}
]
}
]
}
Not sure if there is another way, but I did find a way using an approach Talend outlines in their documentation here.
The trick is to parse the higher levels in prior tExtractJsonFields components and then let that information flow through by simply leaving those JSON queries blank in the subsequent components.
The tFilterRow component is simply to exclude items that have only null values.
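To make the idea of letting the higher levels "flow through" concrete, here is a rough Python sketch of the same staged extraction. It is not Talend code, and the function and field names (`flatten_cells`, `sheet_id`, `row_id`, `column_id`) are invented for the illustration. The input is a shortened copy of the sample JSON from the question.

```python
# Shortened copy of the sample sheet from the question.
sheet = {
    "id": 8566480355780484,
    "rows": [
        {"id": 4808422210070404, "rowNumber": 1, "cells": [
            {"columnId": 7605383392978820, "value": "0002"},
            {"columnId": 1975883858765700, "value": "Draft"},
        ]},
        {"id": 2556622396385156, "rowNumber": 2, "cells": [
            {"columnId": 7605383392978820, "value": "0003"},
        ]},
    ],
}


def flatten_cells(sheet):
    """Emit one flat record per cell, carrying the level 1 and level 2 ids along."""
    records = []
    sheet_id = sheet["id"]                    # level 1: sheet info
    for row in sheet.get("rows", []):         # level 2: row info
        for cell in row.get("cells", []):     # level 3: cell info
            records.append({
                "sheet_id": sheet_id,
                "row_id": row["id"],
                "column_id": cell["columnId"],
                "value": cell.get("value"),
            })
    return records


records = flatten_cells(sheet)
for record in records:
    print(record)
```

Each nested loop plays the role of one extraction stage: values read at an outer level are simply kept in scope and copied into every record produced at the inner level.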

Retrieve specific value from a JSON blob in MS SQL Server, using a property value?

In my DB I have a column storing JSON. The JSON looks like this:
{
"views": [
{
"id": "1",
"sections": [
{
"id": "1",
"isToggleActive": false,
"components": [
{
"id": "1",
"values": [
"02/24/2021"
]
},
{
"id": "2",
"values": []
},
{
"id": "3",
"values": [
"5393",
"02/26/2021 - Weekly"
]
},
{
"id": "5",
"values": [
""
]
}
]
}
]
}
]
}
I want to create a migration script that will extract a value from this JSON and store it in its own column.
In the JSON above, in that components array, I want to extract the second value from the component with an ID of "3" (among other things, but this is a good example). So, I want to extract the value "02/26/2021 - Weekly" to store in its own column.
I was looking at the JSON_VALUE docs, but I only see examples for specifying indexes for the JSON properties. I can't figure out what kind of JSON path I'd need. Is this even possible to do with JSON_VALUE?
EDIT: To clarify, the views and sections components can have static array indexes, so I can use views[0].sections[0] for them. Currently, this is all I have with my SQL query:
SELECT
*
FROM OPENJSON(@jsonInfo, '$.views[0].sections[0]')
You need to use OPENJSON to break out the inner array, then filter it with a WHERE and finally select the correct value with JSON_VALUE
SELECT
JSON_VALUE(components.value, '$.values[1]')
FROM OPENJSON (@jsonInfo, '$.views[0].sections[0].components') components
WHERE JSON_VALUE(components.value, '$.id') = '3'
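If it helps to check the logic before running the migration, the same "break out the array, filter by id, pick an index" steps can be sketched in Python. This is only an illustration of what the query does, not a replacement for it. The helper name `component_value` is made up, and returning `None` for a missing component or index is an assumption chosen to resemble a NULL result.

```python
import json

# Same document as in the question.
json_info = """
{
  "views": [
    {
      "id": "1",
      "sections": [
        {
          "id": "1",
          "isToggleActive": false,
          "components": [
            {"id": "1", "values": ["02/24/2021"]},
            {"id": "2", "values": []},
            {"id": "3", "values": ["5393", "02/26/2021 - Weekly"]},
            {"id": "5", "values": [""]}
          ]
        }
      ]
    }
  ]
}
"""


def component_value(doc, component_id, index):
    """Return values[index] of the component with the given id, or None."""
    components = doc["views"][0]["sections"][0]["components"]  # break out the inner array
    for component in components:
        if component["id"] == component_id:                    # the WHERE filter
            values = component["values"]
            return values[index] if index < len(values) else None
    return None


doc = json.loads(json_info)
print(component_value(doc, "3", 1))
```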

How to get the length of a JSON array using JSONPath?

If I have JSON document like this
[
{
"number" : "650-462-9154",
"type" : "main"
},
{
"number" : "650-462-1252",
"type" : "fax"
}
]
What JSONPath can I use to get the array length (which is 2), without hardcoding any property values?
Using the tool I have, here are some examples they gave, which don't help me figure out what value I need.
[
{
"type": "add",
"id": "tt0484562",
"version": 1,
"lang": "en",
"fields": {
"title": "The Seeker: The Dark Is Rising",
"director": "Cunningham, David L.",
"genre": ["Adventure","Drama","Fantasy","Thriller"],
"actor": ["McShane, Ian","Eccleston, Christopher","Conroy, Frances",
"Crewson, Wendy","Ludwig, Alexander","Cosmo, James",
"Warner, Amelia","Hickey, John Benjamin","Piddock, Jim",
"Lockhart, Emma"]
}
},
{
"type": "delete",
"id": "tt0484575",
"link_ref": null,
"version": 2
}
]
$.[0].genre ---> 0
$.[0].fields.genre ---> 1
$.[0].fields.genre[*] ---> 4
$.[*].type ---> 2
$.[1].link_ref ---> 1
You can get the length of the array from the first example by using the .length() function like this:
$.length
That should return something like this:
[
2
]
As for the second example, you can access any array keys using regular dot notation ($.node1.node2.node3) and call the length() function on the node key you wish to get the length of. For example, if you wanted to get the number of values in the actor array you could do something like:
$..actor.length
Which will return something like this:
[
10
]
Tested on https://codebeautify.org/jsonpath-tester and http://jsonpath.com/ .
You can find more functions in the jsonpath github repo.
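Support for length varies between JSONPath implementations, so if your tool does not accept it, one fallback is to count the elements in code after parsing. A minimal Python sketch, using the two documents from the question (the second one shortened to the actor array); the variable names are just for the example:

```python
import json

# First example from the question: a top-level array of phone numbers.
phones = json.loads("""
[
  {"number": "650-462-9154", "type": "main"},
  {"number": "650-462-1252", "type": "fax"}
]
""")
phone_count = len(phones)

# Second example, reduced to the part that matters: the nested actor array.
movie = {
    "fields": {
        "actor": ["McShane, Ian", "Eccleston, Christopher", "Conroy, Frances",
                  "Crewson, Wendy", "Ludwig, Alexander", "Cosmo, James",
                  "Warner, Amelia", "Hickey, John Benjamin", "Piddock, Jim",
                  "Lockhart, Emma"]
    }
}
actor_count = len(movie["fields"]["actor"])

print(phone_count, actor_count)
```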

Use jq to collect recursive-descent results into a single array

Is it possible to collect recursive-descent results into a single array with jq?
Would flatten help? It looks so to me, but I just cannot get it working. Take a look at how far I've got at https://jqplay.org/s/6bxD-Wq0QE. Can anyone get it working?
BTW,
.data.search.edges[].node | {name, topics: ..|.topics?} works, but I want all topics from the same node to be in one array, instead of having same name in all different returned results.
flatten alone will give me Cannot iterate over null, and
that's why I'm trying to use map(select(.? != null)) to filter the nulls out. However, I'd get Cannot iterate over null as well for my map-select.
So now it all comes down to how to filter out those nulls?
UPDATE: by "collect into a single array" I meant to get something like this:
[
{
"name": "leumi-leumicard-bank-data-scraper",
"topics": ["banking", "leumi", "api", "puppeteer", "scraper", "open-api"]
}
]
instead of having the same name duplicated across all the returned results. Recursive descent therefore seems to me to be the option, but I'm open to any solution as long as I can get a result like the one above. Is that possible? Thx.
One way to collect the non-falsey values:
.data.search.edges[].node
| {name, topics: [.. | .topics? | select(.)]}
The result would be:
{
"name": "leumi-leumicard-bank-data-scraper",
"topics": [
"banking",
"leumi",
"api",
"puppeteer",
"scraper",
"open-api"
]
}
{
"name": "echarts-scrappeteer",
"topics": []
}
Not sure what you're expecting to get in your results... but it seems like you're trying to get all the repositories and their topics in a flat array. I don't see any reason why you should use recurse here, you're only selecting from one class of objects. Just reference them directly.
[.data.search.edges[].node | {name,topic:(.repositoryTopics.nodes[].topic.topics)}]
For your particular input produces:
[
{
"name": "leumi-leumicard-bank-data-scraper",
"topic": "banking"
},
{
"name": "leumi-leumicard-bank-data-scraper",
"topic": "leumi"
},
{
"name": "leumi-leumicard-bank-data-scraper",
"topic": "api"
},
{
"name": "leumi-leumicard-bank-data-scraper",
"topic": "puppeteer"
},
{
"name": "leumi-leumicard-bank-data-scraper",
"topic": "scraper"
},
{
"name": "leumi-leumicard-bank-data-scraper",
"topic": "open-api"
}
]
https://jqplay.org/s/G2inYAJNLS
If you wanted to have an array of topics within the nodes instead, just collect them in an array by putting the filter that selects the topics within [].
[.data.search.edges[].node | {name,topic:[.repositoryTopics.nodes[].topic.topics]}]
[
{
"name": "leumi-leumicard-bank-data-scraper",
"topic": [
"banking",
"leumi",
"api",
"puppeteer",
"scraper",
"open-api"
]
},
{
"name": "echarts-scrappeteer",
"topic": []
}
]
https://jqplay.org/s/0AFneNK89i
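For comparison, the "recurse, pick .topics, drop the nulls" idea from the first answer can be sketched in Python. The original input is only available behind the jqplay links, so the `response` below is a hypothetical stand-in whose shape is guessed from the paths used in the answers. `walk` and `collect_topics` are made-up helper names.

```python
def walk(value):
    """Yield value and every nested value, roughly like jq's `..`."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from walk(child)
    elif isinstance(value, list):
        for child in value:
            yield from walk(child)


def collect_topics(node):
    """Collect every non-null, non-false "topics" value found anywhere under node."""
    found = []
    for value in walk(node):
        if isinstance(value, dict):            # like `.topics?`: skip non-objects
            topics = value.get("topics")
            if topics is not None and topics is not False:  # like `select(.)`
                found.append(topics)
    return found


# Hypothetical input, shaped after the paths used in the answers above.
response = {"data": {"search": {"edges": [
    {"node": {"name": "leumi-leumicard-bank-data-scraper",
              "repositoryTopics": {"nodes": [
                  {"topic": {"topics": "banking"}},
                  {"topic": {"topics": "leumi"}},
              ]}}},
    {"node": {"name": "echarts-scrappeteer",
              "repositoryTopics": {"nodes": []}}},
]}}}

result = [
    {"name": edge["node"]["name"], "topics": collect_topics(edge["node"])}
    for edge in response["data"]["search"]["edges"]
]
print(result)
```

Filtering inside the collection step is what avoids the "Cannot iterate over null" problem: the nulls are never put into the array in the first place.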

Access deeper elements of a JSON using postgresql 9.4

I want to be able to access deeper elements stored in a json in the field json, stored in a postgresql database. For example, I would like to be able to access the elements that traverse the path states->events->time from the json provided below. Here is the postgreSQL query I'm using:
SELECT
data#>> '{userId}' as user,
data#>> '{region}' as region,
data#>>'{priorTimeSpentInApp}' as priotTimeSpentInApp,
data#>>'{userAttributes, "Total Friends"}' as totalFriends
from game_json
WHERE game_name LIKE 'myNewGame'
LIMIT 1000
and here is an example record from the json field
{
"region": "oh",
"deviceModel": "inHouseDevice",
"states": [
{
"events": [
{
"time": 1430247045.176,
"name": "Session Start",
"value": 0,
"parameters": {
"Balance": "40"
},
"info": ""
},
{
"time": 1430247293.501,
"name": "Mission1",
"value": 1,
"parameters": {
"Result": "Win ",
"Replay": "no",
"Attempt Number": "1"
},
"info": ""
}
]
}
],
"priorTimeSpentInApp": 28989.41467999999,
"country": "CA",
"city": "vancouver",
"isDeveloper": true,
"time": 1430247044.414,
"duration": 411.53,
"timezone": "America/Cleveland",
"priorSessions": 47,
"experiments": [],
"systemVersion": "3.8.1",
"appVersion": "14312",
"userId": "ef617d7ad4c6982e2cb7f6902801eb8a",
"isSession": true,
"firstRun": 1429572011.15,
"priorEvents": 69,
"userAttributes": {
"Total Friends": "0",
"Device Type": "Tablet",
"Social Connection": "None",
"Item Slots Owned": "12",
"Total Levels Played": "0",
"Retention Cohort": "Day 0",
"Player Progression": "0",
"Characters Owned": "1"
},
"deviceId": "ef617d7ad4c6982e2cb7f6902801eb8a"
}
That SQL query works, except that it doesn't give me any return values for totalFriends (e.g. data#>>'{userAttributes, "Total Friends"}' as totalFriends). I assume that part of the problem is that events falls within a square bracket (I don't know what that indicates in the json format) as opposed to a curly brace, but I'm also unable to extract values from the userAttributes key.
I would appreciate it if anyone could help me.
I'm sorry if this question has been asked elsewhere. I'm so new to postgresql and even json that I'm having trouble coming up with the proper terminology to find the answers to this (and related) questions.
You should definitely familiarize yourself with the basics of json
and json functions and operators in Postgres.
In the second source pay attention to the operators -> and ->>.
General rule: use -> to get a json object, ->> to get a json value as text.
Using these operators you can rewrite your query in the way which returns correct value of 'Total Friends':
select
data->>'userId' as user,
data->>'region' as region,
data->>'priorTimeSpentInApp' as priotTimeSpentInApp,
data->'userAttributes'->>'Total Friends' as totalFriends
from game_json
where game_name like 'myNewGame';
Json objects in square brackets are elements of a json array.
Json arrays may have many elements.
The elements are accessed by an index.
Json arrays are indexed from 0 (the first element of an array has an index 0).
Example:
select
data->'states'->0->'events'->1->>'name'
from game_json
where game_name like 'myNewGame';
-- returns "Mission1"
This did help me
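If you are new to JSON, it may help to see the same key and index navigation outside the database. The Python sketch below is only a loose analogue of chaining -> in Postgres. The helper `extract_path` is a made-up name, and returning `None` for a missing key or index is an assumption meant to resemble getting NULL back. The record is a shortened copy of the one in the question.

```python
# Shortened copy of the example record from the question.
record = {
    "region": "oh",
    "states": [
        {
            "events": [
                {"time": 1430247045.176, "name": "Session Start", "value": 0},
                {"time": 1430247293.501, "name": "Mission1", "value": 1},
            ]
        }
    ],
    "userAttributes": {"Total Friends": "0", "Device Type": "Tablet"},
}


def extract_path(value, *path):
    """Follow string keys into objects and integer indexes into arrays."""
    for step in path:
        if isinstance(step, str) and isinstance(value, dict) and step in value:
            value = value[step]       # curly braces: look up by key
        elif isinstance(step, int) and isinstance(value, list) and 0 <= step < len(value):
            value = value[step]       # square brackets: look up by 0-based index
        else:
            return None               # path does not exist
    return value


print(extract_path(record, "states", 0, "events", 1, "name"))
print(extract_path(record, "userAttributes", "Total Friends"))
```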