Generate csv files from a JSON - json

Unfortunately I'm having considerable difficulty generating three CSV files from one JSON document. Maybe someone has a good hint on how I could do this. Thanks.
Here is the JSON. Within dropped1 and dropped2 there can be several different addresses.
{
  "result": {
    "found": 0,
    "dropped1": {
      "address10": 1140
    },
    "rates": {
      "total": {
        "1min": 3579,
        "5min": 1593,
        "15min": 5312,
        "60min": 1328
      },
      "dropped2": {
        "address20": {
          "1min": 9139,
          "5min": 8355,
          "15min": 2785,
          "60min": 8196
        }
      }
    },
    "connections": 1
  },
  "id": "whatever",
  "jsonrpc": "2.0"
}
The three CSV files should look like this:
address10,1140
total,3579,1593,5312,1328
address20,9139,8355,2785,8196

If you decide to use jq, then unless there is some specific reason not to, I'd suggest invoking jq once for each of the three output files. The three invocations would then look like these:
jq -r '.result.dropped1 | [to_entries[][]] | @csv' > 1.csv
jq -r '.result.rates.total | ["total", .["1min"], .["5min"], .["15min"], .["60min"]] | @csv' > 2.csv
jq -r '.result.rates.dropped2
  | to_entries[]
  | [.key] + ( .value | [ .["1min"], .["5min"], .["15min"], .["60min"]] )
  | @csv
' > 3.csv
If you can be sure the ordering of keys within the total and address20 objects is fixed and in the correct order, then the last two invocations can be simplified.
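For example, relying on jq preserving the original key order and on the keys already appearing as 1min, 5min, 15min, 60min, the simplified invocations might look like this (a sketch, not tested against your real data):
jq -r '.result.rates.total | ["total"] + [.[]] | @csv' > 2.csv
jq -r '.result.rates.dropped2 | to_entries[] | [.key] + [.value[]] | @csv' > 3.csv
Here [.[]] simply collects the values of the object in their original order, so no key has to be named explicitly.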

Did you try using this library?
https://www.npmjs.com/package/json-to-csv-stream
npm i json-to-csv-stream

Related

Selecting 'name' with highest build number from json list

Here is my sample data, it's a list of objects in a storage bucket on Oracle cloud:
{
"objects": [
{
"name": "rhel/"
},
{
"name": "rhel/app-3.9.6.629089.txt"
},
{
"name": "rhel/app-3.11.4.629600.txt"
}
]
}
The part of the value before the '/' is a folder name; the part after it is a filename. The last number in the filename is a build number. The desired output is the name of the object with the highest build number in the rhel folder:
$ jq -r 'some_program' file.json
rhel/app-3.11.4.629600.txt
I can somewhat process the data to exclude the bare "rhel/" folder as follows:
$ jq -r '.objects[] | select(.name|test("rhel/."))' file.json
{
  "name": "rhel/app-3.9.6.629089.txt"
}
{
  "name": "rhel/app-3.11.4.629600.txt"
}
When I try to split this on the period jq throws an error:
$ jq -r '.objects[] | select(.name|test("rhel/.")) | split(".")' file.json
jq: error (at file.json:1): split input and separator must be strings
I was expecting to use 'map(tonumber)[-2]' on the result of the split and wrap the entirety in 'max_by()'.
How can I get closer to the desired output with jq?
[.objects[]
| select(.name|test("rhel/."))]
| max_by(.name|split(".")[-2]|tonumber)
produces:
{
  "name": "rhel/app-3.11.4.629600.txt"
}
If you only want the names you could begin by extracting them:
[.objects[].name|select(test("rhel/."))]
| max_by(split(".")[-2]|tonumber)
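With the -r flag from your desired invocation, this prints just the raw name:
$ jq -r '[.objects[].name|select(test("rhel/."))] | max_by(split(".")[-2]|tonumber)' file.json
rhel/app-3.11.4.629600.txt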

jq: from one json input, construct multiple rows of tsv using an expression against the keys?

Using jq I can extract the data in this simple way as follows:
find . -name '*.jsonl' | xargs -I {} jq -r '[.data.Item_A_Foo.value, .data.Item_A_Bar.value] | @tsv' {} >> foobar.tsv
find . -name '*.jsonl' | xargs -I {} jq -r '[.data.Item_B_Foo.value, .data.Item_B_Bar.value] | @tsv' {} >> foobar.tsv
find . -name '*.jsonl' | xargs -I {} jq -r '[.data.Item_C_Foo.value, .data.Item_C_Bar.value] | @tsv' {} >> foobar.tsv
...
# and so on
But this seems pretty wasteful. Is there a more advanced way to use jq, and perhaps:
Filter for .data.Item_*_Foo.value, .data.Item_*_Bar.value
OR chain these rows in a single jq expression (reasonably readable, compact)
# Here is a made up JSON file that can motivate this question.
# Imagine there are 100,000 of these and they are larger.
{
  "data": {
    "Item_A_Foo": {
      "adj": "wild",
      "adv": "unruly",
      "value": "unknown"
    },
    "Item_A_Bar": {
      "adj": "rotund",
      "quality": "mighty",
      "value": "swing"
    },
    "Item_B_Foo": {
      "adj": "nice",
      "adv": "heroically",
      "value": "medium"
    },
    ... etc. for many Foo's and Bar's of A, B, C, ..., Z types
    "Not_an_Item": {
      "value": "doesn't matter"
    }
  }
}
And the goal is:
unknown, swing # data.Item_A_Foo.value, data.Item_A_Bar.value
medium, hit # data.Item_B_Foo.value, data.Item_B_Bar.value
whatever, etc. # data.Item_C_Foo.value, data.Item_C_Bar.value
The details of your requirements are unclear, but you could proceed along the lines suggested by this jq filter:
.data
| (keys_unsorted|map(select(test("^Item_[^_]*_Foo$")))) as $foos
| ($foos | map(sub("_Foo$"; "_Bar"))) as $bars
| [ .[$foos[]].value, .[$bars[]].value]
| @tsv
The idea is to determine dynamically which keys to select.
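If you need one row per Foo/Bar pair rather than one wide row, a variation along these lines could work (a sketch that assumes every Item_*_Foo has a matching Item_*_Bar; the file name extract.jq is just for illustration):
# extract.jq -- emit one TSV row per Item_*_Foo / Item_*_Bar pair
.data
| . as $d
| ($d | keys_unsorted | map(select(test("^Item_[^_]*_Foo$"))))[]
| . as $foo
| ($foo | sub("_Foo$"; "_Bar")) as $bar
| [ $d[$foo].value, $d[$bar].value ]
| @tsv
It could then be run over all files without restarting jq for every expression, e.g.:
find . -name '*.jsonl' -exec jq -r -f extract.jq {} + >> foobar.tsv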

Output paths to all keys named "id" where the type of value is "string"

Given a huge (15GB) deeply nested (12+ object layers) JSON file how can I find the paths to all the keys named id whose values are type string?
A massively simplified example file:
{
  "a": [
    {
      "id": 3,
      "foo": "red"
    }
  ],
  "b": [
    {
      "id": "7",
      "bar": "orange",
      "baz": {
        "id": 13
      },
      "bax": {
        "id": "12"
      }
    }
  ]
}
Looking for a less ugly solution where I don't run out of RAM and have to punt to grep at the end (sigh). (I failed to figure out how to chain to_entries into this usefully. If that's even something I should be trying to do.)
Ugly solution 1:
$ cat huge.json | jq 'path(..|select(type=="string")) | join(".")' | grep -E '\.id"$'
"b.0.id"
"b.0.bax.id"
Ugly solution 2:
$ cat huge.json | jq --stream -c | grep -E '"id"],"'
[["b",0,"id"],"7"]
[["b",0,"bax","id"],"12"]
Something like this should do that.
jq --stream 'select(.[0][-1] == "id" and (.[1] | strings)) | .[0]' file
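On the simplified example above, and with -c added so each path stays on one line, this should print:
$ jq --stream -c 'select(.[0][-1] == "id" and (.[1] | strings)) | .[0]' huge.json
["b",0,"id"]
["b",0,"bax","id"]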
And by the way, your first ugly solution can be simplified to this:
jq 'path(.. .id? | strings)' file
Stream the input in as you started with your second solution, but add some filtering. You do not want to read the entire contents into memory. And also... UUOC (useless use of cat).
$ jq --stream '
select(.[0][-1] == "id" and (.[1]|type) == "string")[0]
| join(".")
' huge.json
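On the simplified example this should produce the same dotted paths as your first attempt:
"b.0.id"
"b.0.bax.id"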
Thank you both oguz and Jeff! Beautiful! This runs in 6.5 minutes (on my old laptop), never uses more than 21MB of RAM, and gives me exactly what I need. <3
$ jq --stream -c 'select(.[0][-1] == "id" and (.[1]|type) == "string")' huge.json

Fine tuning jq filters to reduce repetition in filter string

I have a complex JSON object produced from an API call (full JSON found in this gist). It's describing attributes of an entity (fields, parameters, child relationships, etc.). Using jq, I'm trying to extract just one child field array and convert it to CSV where field keys are a single header row and values of each array item form the subsequent rows. (NOTE: fields are uniform across all items in the array.)
So far I'm successful, but I feel as if my jq filter string could be better as there is a repetition of unpacking this array in two separate filters.
Here is a redacted version of the JSON for reference:
{
  ...
  "result": {
    ...
    "fields": [
      {
        "aggregatable": true,
        "aiPredictionField": false,
        "autoNumber": false,
        "byteLength": 18,
        "name": "Id",
        ...
      },
      {
        "aggregatable": true,
        "aiPredictionField": false,
        "autoNumber": false,
        "byteLength": 18,
        "name": "OwnerId",
        ...
      },
      {
        "aggregatable": false,
        "aiPredictionField": false,
        "autoNumber": false,
        "byteLength": 0,
        "name": "IsDeleted",
        ...
      },
      ...
    ],
    ...
  }
}
So far, here is the working command:
jq -r '.result.fields | (.[0] | keys) , .[] | [.[] | tostring] | @csv'
repeated array unpacking---^-------------^
I could be happy with this, but I would prefer to unpack the result.fields array in the first filter so that it starts out like this:
jq -r '.result.fields[] | ...
Only then there is no longer an array, just a stream of objects. I tried several things but none of them gave me what I wanted. Here are two things I tried before I realized that unpacking .result.fields[] destroyed anything array-like for me to work with (yep...slow learner here, and can be a bit thick):
jq -r '.result.fields[] | ( keys | .[0] ) , [.[] | tostring] | @csv'
jq -r '.result.fields[] | keys[0] , [.[] | tostring] | @csv'
So the real question is: can I unpack result.fields once and then work with what that gives me? And if not, is there a more efficient way to arrive at the CSV structure I'm looking for?
Your code is buggy, because keys sorts the keys. What's needed here is keys_unsorted.
If you want to accomplish everything in a single invocation of jq, you cannot start the pipeline with result.fields[].
The following does avoid one very small inefficiency of your approach:
.result.fields
| (.[0] | keys_unsorted),
(.[] | [.[] | tostring])
| @csv
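Assuming the API response is saved in a file (response.json is just a placeholder name here), it could be run as a one-liner like this:
jq -r '.result.fields | (.[0] | keys_unsorted), (.[] | [.[] | tostring]) | @csv' response.json > fields.csv
The first line of fields.csv is then the header row built from the keys of the first field object, followed by one CSV row per field.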

Parsing nested json with jq

I am parsing a nested json to get specific values from the json response. The json response is as follows:
{
  "custom_classes": 2,
  "images": [
    {
      "classifiers": [
        {
          "classes": [
            {
              "class": "football",
              "score": 0.867376
            }
          ],
          "classifier_id": "players_367677167",
          "name": "players"
        }
      ],
      "image": "1496A400EDC351FD.jpg"
    }
  ],
  "images_processed": 1
}
From images => classifiers => classes, "class" and "score" are the values that I want to save in a CSV file. I have found out how to save the result in a CSV file, but I am unable to parse the images. I can get custom_classes and images_processed.
I am using jq-1.5.
The different commands I have tried :
curl "Some address"| jq '.["images"]'
curl "Some address"| jq '.[.images]'
curl "Some address"| jq '.[.images["image"]]'
Most of the time the error is about not being able to index the array images.
Any hints?
I must say, I'm not terribly good at jq, so probably all those array iterations could be shorthanded somehow, but this yields the values you mentioned:
cat foo.json | jq ".[] | .images | .[] | .classifiers | .[] | .classes | .[] | .[]"
If you want the keys, too, just omit that last .[].`
Edit
As @chepner pointed out in the comments, this can indeed be shortened to
cat foo.json | jq -r ".images[].classifiers[].classes[] | [.class, .score] | @csv"
Depending on the data, this filter, which uses recursive descent (..), objects, and has, may work:
.. | objects | select(has("class")) | [.class,.score] | @csv
Sample Run (assuming data in data.json)
$ jq -Mr '.. | objects | select(has("class")) | [.class,.score] | @csv' data.json
"football",0.867376
Try it online at jqplay.org
Here is another variation which uses paths and getpath
getpath( paths(has("class")?) ) | [.class,.score] | #csv
Try it online at jqplay.org
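A sample run analogous to the one above (again assuming the data is in data.json) should yield the same record:
$ jq -Mr 'getpath( paths(has("class")?) ) | [.class,.score] | @csv' data.json
"football",0.867376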
jq solution to obtain a prepared csv record:
jq -r '.images[0].classifiers[0].classes[0] | [.class, .score] | @csv' input.json
The output:
"football",0.867376