Convert JSON categories tree to database table - json

Imagine I have a categories tree like this JSON file:
[
{
"id": "1",
"text": "engine",
"children": [
{
"id": "2",
"text": "exhaust",
"children": []
},
{
"id": "3",
"text": "cooling",
"children": [
{
"id": "4",
"text": "cooling fan",
"children": []
},
{
"id": "5",
"text": "water pump",
"children": []
}
]
}
]
},
{
"id": "6",
"text": "frame",
"children": [
{
"id": "7",
"text": "wheels",
"children": []
},
{
"id": "8",
"text": "brakes",
"children": [
{
"id": "9",
"text": "brake calipers",
"children": []
}
]
},
{
"id": "10",
"text": "cables",
"children": []
}
]
}
]
How can I convert it to this flat table?
id parent_id text
1 NULL engine
2 1 exhaust
3 1 cooling
4 3 cooling fan
5 3 water pump
6 NULL frame
7 6 wheels
8 6 brakes
9 8 brake calipers
10 6 cables
I found similar questions, and inverse ones (from table to JSON), but I can't figure it out with jq and its @tsv filter. I also noticed that the "flatten" filter is rarely mentioned in the answers, even though it looks like exactly the tool I need; perhaps that is because it was only introduced in recent versions of jq.

Here is another solution which uses jq's recurse builtin:
["id","parent_id","text"]
, (
.[]
| recurse(.id as $p| .children[] | .parent=$p )
| [.id, .parent, .text]
)
| @tsv
Sample Run (assumes filter in filter.jq and sample data in data.json)
$ jq -Mr -f filter.jq data.json
id parent_id text
1 engine
2 1 exhaust
3 1 cooling
4 3 cooling fan
5 3 water pump
6 frame
7 6 wheels
8 6 brakes
9 8 brake calipers
10 6 cables

The key here is to define a recursive function, like so:
def children($parent_id):
.id as $id
| [$id, $parent_id, .text],
(.children[] | children($id)) ;
With your data, the filter:
.[]
| children("NULL")
| @tsv
produces the tab-separated values shown below. It is now easy to add headers, convert to fixed-width format if desired, etc.
1 NULL engine
2 1 exhaust
3 1 cooling
4 3 cooling fan
5 3 water pump
6 NULL frame
7 6 wheels
8 6 brakes
9 8 brake calipers
10 6 cables
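For readers who want to check the traversal outside jq, here is a minimal Python sketch of the same idea: emit the current node's row, then recurse into its children with the current id as their parent. The function name flatten_tree and the abbreviated sample tree are assumptions made for illustration; they are not part of the answer above.

```python
import json

def flatten_tree(nodes, parent_id=None):
    """Yield (id, parent_id, text) rows, each parent before its children."""
    for node in nodes:
        yield node["id"], parent_id, node["text"]
        # Recurse into the children, passing this node's id down as their parent.
        yield from flatten_tree(node.get("children", []), node["id"])

# Abbreviated version of the question's tree (assumed sample data).
tree = json.loads("""
[
  {"id": "1", "text": "engine", "children": [
    {"id": "2", "text": "exhaust", "children": []},
    {"id": "3", "text": "cooling", "children": [
      {"id": "4", "text": "cooling fan", "children": []}
    ]}
  ]},
  {"id": "6", "text": "frame", "children": []}
]
""")

rows = list(flatten_tree(tree))
for row_id, parent, text in rows:
    # Print top-level nodes with a literal NULL parent, as in the desired table.
    print("\t".join([row_id, "NULL" if parent is None else parent, text]))
```

Keeping the parent as None until the final print makes it easy to switch between a literal NULL and an empty field.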

Here is a solution which uses a recursive function:
def details($parent):
[.id, $parent, .text], # details for current node
(.id as $p | .children[] | details($p)) # details for children
;
["id","parent_id","text"] # header
, (.[] | details(null)) # details
| @tsv # convert to tsv
Sample Run (assumes filter in filter.jq and sample data in data.json)
$ jq -Mr -f filter.jq data.json
id parent_id text
1 engine
2 1 exhaust
3 1 cooling
4 3 cooling fan
5 3 water pump
6 frame
7 6 wheels
8 6 brakes
9 8 brake calipers
10 6 cables

Related

how to format the result with jq

I have the following printout,
{
"metric": {
"container": "container1",
"namespace": "namespace1",
"pod": "pod1"
},
"values": [
[
1664418600,
"1"
],
[
1664418900,
"2"
],
[
1664419200,
"6"
],
[
1664419500,
"8"
],
[
1664419800,
"7"
],
[
1664420100,
"9"
]
]
}
{
"metric": {
"container": "container2",
"namespace": "namespace2",
"pod": "pod2"
},
"values": [
[
1664420100,
"1"
]
]
}
What I want:
container=container1,namespace=namespace1,pod=pod1
1 1664418600
2 1664418900
6 1664419200
8 1664419500
7 1664419800
9 1664420100
container=container2,namespace=namespace2,pod=pod2
1 1664420100
Build it from two jq programs:
1. Header lines: .metric | to_entries | map(join("=")) | join(",")
1.1. Get the metric object: .metric
1.2. Convert it to an array of key-value pair objects: to_entries
1.3. Map each key-value pair object to a string "key=value": map(join("="))
1.4. Join all pairs with commas: join(",")
2. Value lines: .values[] | [last,first] | join(" ")
2.1. Stream the values: .values[]
2.2. Reverse each two-element array: [last,first]
2.3. Join the items with a space: join(" ")
An alternative for 2.2. and 2.3. is string interpolation, "\(last) \(first)", i.e. .values[] | "\(last) \(first)". Or [last,first] could be replaced with reverse: .values[] | reverse | join(" ").
Putting the two programs together:
(.metric | to_entries | map(join("=")) | join(",")),
(.values[] | [last,first] | join(" "))
And then execute with raw output enabled: jq -r (.metrics|to_entries…
Output:
container=container1,namespace=namespace1,pod=pod1
1 1664418600
2 1664418900
6 1664419200
8 1664419500
7 1664419800
9 1664420100
container=container2,namespace=namespace2,pod=pod2
1 1664420100
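The two steps (one header line, then one "value timestamp" line per sample) can also be sketched in Python to make the data flow explicit. This is only an illustration of the logic, not a replacement for the jq program; the function name format_series is hypothetical, and the sketch handles a single object, whereas the question's input is a stream of such objects.

```python
import json

def format_series(series):
    """Return the header line followed by one 'value timestamp' line per sample."""
    # Header: key=value pairs of the metric object, joined by commas.
    header = ",".join(f"{key}={value}" for key, value in series["metric"].items())
    # Each sample is [timestamp, value]; the output wants the value first.
    lines = [f"{value} {timestamp}" for timestamp, value in series["values"]]
    return [header] + lines

# Shortened sample taken from the question's first object.
sample = json.loads("""
{"metric": {"container": "container1", "namespace": "namespace1", "pod": "pod1"},
 "values": [[1664418600, "1"], [1664418900, "2"]]}
""")

output = format_series(sample)
print("\n".join(output))
```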

How to create a CSV File that will look Like This JSON File

I basically want to update multiple scholars for an NFT game (Axie Infinity). It requires a JSON file that looks like this:
{
"name": "Scholar 1",
"ronin": "ronin:<account_s1_address>",
"splits": [
{
"persona": "Manager",
"percentage": 44,
"ronin": "ronin:<manager_address>"
},
{
"persona": "Scholar",
"percentage": 40,
"ronin": "ronin:<scholar_1_address>"
},
{
"persona": "Other Person",
"percentage": 6,
"ronin": "ronin:<other_person_address>"
},
{
"persona": "Trainer",
"percentage": 10,
"ronin": "ronin:<trainer_address>"
}
]
}
But since there are multiple scholars/players, I wanted to know if there is any way to lay this out in a CSV file so that, when I convert or import it using a JSON tool, it will look like the JSON above?
Your help is much appreciated.. Thank you!
PS:
The first lines:
"name": "Scholar 1",
"ronin": "ronin:<account_s1_address>",
"splits":
Would need to be repeated since again there are multiple scholars, i.e. Scholar 1, Scholar 2, Scholar 3...
A CSV file is column-based. If Axie Infinity requires a JSON file, you can create a CSV file in Excel or Google Sheets and convert it to JSON.
There is a similar answer on converting CSV to JSON.
Starting from a CSV with this structure
name,ronin,id_persona,persona,percentage,split_ronin
Scholar 1,ronin:<account_s1_address>,1,Manager,44,ronin:<manager_address>
Scholar 1,ronin:<account_s1_address>,2,Scholar,40,ronin:<scholar_1_address>
Scholar 1,ronin:<account_s1_address>,3,Other Person,6,ronin:<other_person_address>
Scholar 1,ronin:<account_s1_address>,4,Trainer,10,ronin:<trainer_address>
you can run this Miller command
mlr --c2j reshape -r "^(p|s)" -o k,v then \
put '$k="splits".".".${id_persona}.".".$k' then \
cut -x -f id_persona then \
reshape -s k,v out.csv
to have
[
{
"name": "Scholar 1",
"ronin": "ronin:<account_s1_address>",
"splits": [
{
"persona": "Manager",
"percentage": 44,
"split_ronin": "ronin:<manager_address>"
},
{
"persona": "Scholar",
"percentage": 40,
"split_ronin": "ronin:<scholar_1_address>"
},
{
"persona": "Other Person",
"percentage": 6,
"split_ronin": "ronin:<other_person_address>"
},
{
"persona": "Trainer",
"percentage": 10,
"split_ronin": "ronin:<trainer_address>"
}
]
}
]
Some notes:
reshape -r "^(p|s)" -o k,v, to transform the input from wide to long;
put '$k="splits".".".${id_persona}.".".$k', to create values that I will use as field names (splits.1.persona, splits.1.percentage, splits.1.split_ronin, splits.2.persona, splits.2.percentage, ...);
cut -x -f id_persona, to remove the field id_persona;
reshape -s k,v, to transform all from long to wide.
The real goal is to build, starting from that input, this kind of CSV
+-----------+----------------------------+------------------+---------------------+-------------------------+------------------+---------------------+---------------------------+------------------+---------------------+------------------------------+------------------+---------------------+-------------------------+
| name | ronin | splits.1.persona | splits.1.percentage | splits.1.split_ronin | splits.2.persona | splits.2.percentage | splits.2.split_ronin | splits.3.persona | splits.3.percentage | splits.3.split_ronin | splits.4.persona | splits.4.percentage | splits.4.split_ronin |
+-----------+----------------------------+------------------+---------------------+-------------------------+------------------+---------------------+---------------------------+------------------+---------------------+------------------------------+------------------+---------------------+-------------------------+
| Scholar 1 | ronin:<account_s1_address> | Manager | 44 | ronin:<manager_address> | Scholar | 40 | ronin:<scholar_1_address> | Other Person | 6 | ronin:<other_person_address> | Trainer | 10 | ronin:<trainer_address> |
+-----------+----------------------------+------------------+---------------------+-------------------------+------------------+---------------------+---------------------------+------------------+---------------------+------------------------------+------------------+---------------------+-------------------------+
and then use it to create the final JSON output
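If Miller is not available, the same long-format CSV can be grouped into the nested structure with a short script. The following is a minimal Python sketch under a few assumptions: the column names match the CSV above, the split address is written back under the key "ronin" as in the question's target JSON (rather than "split_ronin"), and the "Scholar 2" row is hypothetical data added only to show that several scholars are handled.

```python
import csv
import io
import json

# Long-format input: one row per (scholar, split). The "Scholar 2" row is made up.
CSV_TEXT = """name,ronin,id_persona,persona,percentage,split_ronin
Scholar 1,ronin:<account_s1_address>,1,Manager,44,ronin:<manager_address>
Scholar 1,ronin:<account_s1_address>,2,Scholar,40,ronin:<scholar_1_address>
Scholar 2,ronin:<account_s2_address>,1,Manager,50,ronin:<manager_address>
"""

def rows_to_scholars(csv_text):
    """Group CSV rows by scholar and collect each row as one entry of 'splits'."""
    scholars = {}  # dicts preserve insertion order, so scholars keep CSV order
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["name"], row["ronin"])
        scholar = scholars.setdefault(
            key, {"name": row["name"], "ronin": row["ronin"], "splits": []}
        )
        scholar["splits"].append({
            "persona": row["persona"],
            "percentage": int(row["percentage"]),  # emit a JSON number, not a string
            "ronin": row["split_ronin"],
        })
    return list(scholars.values())

scholars = rows_to_scholars(CSV_TEXT)
print(json.dumps(scholars, indent=2))
```

The id_persona column is only read implicitly here: the splits keep the order in which their rows appear in the CSV.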

jq cannot iterate over number with join

I would like to output some values from a JSON file, separated by a pipe. It works well until a value is a number rather than a string.
Here what I've done so far: jq -r '.content[] | {seasonTitle, number, name} | join("|")' file.json
I've tried to convert number to string without any success jq -r '.content[] | {seasonTitle, "episodeNumber|tostring", name} | join("|")' file.json
Actual Result:
Top Master||Last Chance / Season 12
Top Master||Épisode 8 / Season 12
Top Master||Épisode 7 / Season 12
Expected Result:
Top Master|236|Last Chance / Season 12
Top Master|235|Épisode 8 / Season 12
Top Master|234|Épisode 7 / Season 12
Here is file.json:
{
"page": 0,
"size": 3,
"count": 3,
"content": [
{
"name": "Last Chance / Season 12",
"releaseDate": "2008",
"duration": 2100,
"episodeNumber": 236,
"title": "Last Chance / Season 12",
"seasonTitle": "Top Master"
},
{
"name": "Épisode 8 / Season 12",
"releaseDate": "2008",
"duration": 7320,
"episodeNumber": 235,
"title": "Épisode 8 / Season 12",
"seasonTitle": "Top Master"
},
{
"name": "Épisode 7 / Season 12",
"releaseDate": "2008",
"duration": 7200,
"episodeNumber": 234,
"title": "Épisode 7 / Season 12",
"seasonTitle": "Top Master"
}
]
}
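You are using join to concatenate values of different types, which works fine under jq v1.6, provided you reference the field by its actual name, episodeNumber (a key that does not exist in the input, such as number, yields null, which join renders as an empty string):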
.content[] | {seasonTitle, episodeNumber, name} | join("|")
Top Master|236|Last Chance / Season 12
Top Master|235|Épisode 8 / Season 12
Top Master|234|Épisode 7 / Season 12
Demo
However, with jq v1.5 it doesn't, and you need to convert non-strings to strings using tostring. As you are using a shortcut to create an object for join, introducing this conversion sacrifices the conciseness of your solution. So either stick with it:
.content[] | {seasonTitle, episodeNumber: (.episodeNumber | tostring), name} | join("|")
Or use an array instead, as you are going for the values only anyway:
.content[] | [.seasonTitle, (.episodeNumber | tostring), .name] | join("|")
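As a cross-check of the logic, here is a small Python sketch of the same join, where str() plays the role of jq's tostring. The helper name to_line and its parameters are hypothetical. It also shows that asking for a key that is absent from the data (number instead of episodeNumber) yields an empty field, which appears to be what produced the "||" in the question's actual result.

```python
# Two entries copied from the question's file.json (other fields omitted).
episodes = [
    {"name": "Last Chance / Season 12", "episodeNumber": 236, "seasonTitle": "Top Master"},
    {"name": "Épisode 8 / Season 12", "episodeNumber": 235, "seasonTitle": "Top Master"},
]

def to_line(episode, fields=("seasonTitle", "episodeNumber", "name"), sep="|"):
    """Join the selected fields with sep, converting non-strings explicitly."""
    # A missing field becomes an empty string; everything else goes through str().
    return sep.join(
        "" if episode.get(field) is None else str(episode[field]) for field in fields
    )

lines = [to_line(episode) for episode in episodes]
print("\n".join(lines))

# Requesting a key that does not exist reproduces the empty middle column.
print(to_line(episodes[0], fields=("seasonTitle", "number", "name")))
```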

How to convert object keys to arrays with jq

I am trying to convert a csv where the headers are keys and the values in the column are a list.
For example I have the following csv
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4
Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
I would like the following format.
{
"field1" : ["Mazda RX4","Mazda RX4 Wag","Datsun 710"],
"mpg" : [21,21,22.8],
"cyl" : [6,6,4],
"disp" : [160,160,108],
...
}
Note that the numerical values are not quoted. I am assuming that the columns all have the same type.
I am using the following jq command.
curl https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv | head -n4 | csvtojson | jq '.'
[
{
"field1": "Mazda RX4",
"mpg": "21",
"cyl": "6",
"disp": "160",
"hp": "110",
"drat": "3.9",
"wt": "2.62",
"qsec": "16.46",
"vs": "0",
"am": "1",
"gear": "4",
"carb": "4"
},
{
"field1": "Mazda RX4 Wag",
"mpg": "21",
"cyl": "6",
"disp": "160",
"hp": "110",
"drat": "3.9",
"wt": "2.875",
"qsec": "17.02",
"vs": "0",
"am": "1",
"gear": "4",
"carb": "4"
},
{
"field1": "Datsun 710",
"mpg": "22.8",
"cyl": "4",
"disp": "108",
"hp": "93",
"drat": "3.85",
"wt": "2.32",
"qsec": "18.61",
"vs": "1",
"am": "1",
"gear": "4",
"carb": "1"
}
]
Complete working solution
cat <csv_data> | csvtojson | jq '. as $in | reduce (.[0] | keys_unsorted[]) as $k ( {}; .[$k] = ($in|map(.[$k])))'
jq play - Converting all numbers to strings
https://jqplay.org/s/HKjHLVp9KZ
Here's a concise, efficient, and conceptually simple solution based on just map and reduce:
. as $in
| reduce (.[0] | keys_unsorted[]) as $k ( {}; .[$k] = ($in|map(.[$k])))
Converting all number-valued strings to numbers
. as $in
| reduce (.[0] | keys_unsorted[]) as $k ( {};
.[$k] = ($in|map(.[$k] | (tonumber? // .))))
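The reduce above turns an array of row objects into one object of column arrays, taking the keys from the first row. A minimal Python sketch of the same transformation follows; the names to_number and rows_to_columns are hypothetical, and the conversion rule (try int, then float, else keep the value) is an approximation of tonumber? // . rather than an exact equivalent.

```python
def to_number(value):
    """Convert numeric strings to int or float; leave other values unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            continue
    return value

def rows_to_columns(rows):
    """Turn a list of row dicts into a dict of column lists, keyed by the first row's keys."""
    if not rows:
        return {}
    return {key: [to_number(row.get(key)) for row in rows] for key in rows[0]}

# First three columns of the question's csvtojson output.
rows = [
    {"field1": "Mazda RX4", "mpg": "21", "cyl": "6"},
    {"field1": "Mazda RX4 Wag", "mpg": "21", "cyl": "6"},
    {"field1": "Datsun 710", "mpg": "22.8", "cyl": "4"},
]

columns = rows_to_columns(rows)
print(columns)
```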

Unnesting nested JSON structures in Apache Drill

I have the following JSON (roughly) and I'd like to extract the information from the header and defects fields separately:
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890"
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
I have tried to access the individual elements with file.header.timeStamp etc but that returns null. I have tried using flatten(file) but that gives me
Cannot cast org.apache.drill.exec.vector.complex.MapVector to org.apache.drill.exec.vector.complex.RepeatedValueVector
I've looked into kvgen() but don't see how that fits in my case. I tried kvgen(file.header) but that gets me
kvgen function only supports Simple maps as input
which is what I had expected anyway.
Does anyone know how I can get header and defects, so I can process the information contained in them. Ideally, I'd just select the information from header because it contains no arrays or maps, so I can take individual records as they are. For defects I'd simply use FLATTEN(defectParts) to obtain a table of the defective parts.
Any help would be appreciated.
What version of Drill are you using? I tried querying the following file on the latest master (1.7.0-SNAPSHOT):
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890"
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890"
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
And the following queries are working fine:
1.
select t.file.header.serialno as serialno from `parts.json` t;
+-----------+
| serialno |
+-----------+
| 3456 |
| 3456 |
+-----------+
2 rows selected (0.098 seconds)
2.
select flatten(t.file.defects) defects from `parts.json` t;
+---------------------------------------------------------------------------------------+
| defects |
+---------------------------------------------------------------------------------------+
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}} |
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}} |
+---------------------------------------------------------------------------------------+
3.
select q.h.serialno as serialno, q.d.info.defectParts as defectParts from (select t.file.header h, flatten(t.file.defects) d from `parts.json` t) q;
+-----------+----------------------+
| serialno | defectParts |
+-----------+----------------------+
| 3456 | ["003","006","008"] |
| 3456 | ["003","006","008"] |
+-----------+----------------------+
2 rows selected (0.126 seconds)
PS: This should've been a comment but I don't have enough rep yet!
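Conceptually, query 3 produces one output row per element of file.defects and repeats the header fields on each row. The plain-Python sketch below illustrates that result shape only; it says nothing about how Drill executes the query, and the function name flatten_defects is hypothetical.

```python
# One record in the shape of parts.json (header trimmed to the field being selected).
records = [
    {
        "file": {
            "header": {"serialNo": "3456"},
            "defects": [
                {"info": {"systemId": "DEFCHK123", "defectParts": ["003", "006", "008"]}}
            ],
        }
    },
]

def flatten_defects(records):
    """Yield one row per element of file.defects, repeating the header field."""
    for record in records:
        header = record["file"]["header"]
        for defect in record["file"]["defects"]:
            yield {
                "serialNo": header["serialNo"],
                "defectParts": defect["info"]["defectParts"],
            }

rows = list(flatten_defects(records))
print(rows)
```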
I don't have experience with Apache Drill, but checked the manual. Isn't this what you're looking for?
https://drill.apache.org/docs/selecting-multiple-columns-within-nested-data/
https://drill.apache.org/docs/selecting-nested-data-for-a-column/