Export JSON to CSV via jq

I have this output as test.json (it's an AWS extract, but I have changed the names):
[
{
"InstanceId": "I-1234",
"Vol": "vol-5678",
"Delete": false,
"State": "in-use",
"Tags": [
{
"Key": "Size",
"Value": "large"
},
{
"Key": "Colour",
"Value": "red"
},
{
"Key": "Shape",
"Value": "square"
},
{
"Key": "Weight",
"Value": "light"
}
]
}
]
I want to export specific fields, including all tags to a csv, so it looks like this:
id,vol,state,size,colour,shape,weight
value,value,value,value,value,value,value
I have run this:
cat test.json | jq -c ' { id: .[].InstanceId, vol: .[].Vol, tags: .[].Tags | map ( [ .Key, .Value] | join (":")) | @csv } ' >> test.csv
And it looks like this:
cat test.csv
{"id":"I-1234","vol":"vol-5678","tags":"\"Size:large\",\"Colour:red\",\"Shape:square\",\"Weight:25kg\""}
If I open it in Excel, it looks like:
{"id":"I-1234" vol:"vol-5678" tags:"\"Size:large\" \"Colour:red\" \"Shape:square\" \"Weight:25kg\""}
I will be looping this over many AWS resources and would like to keep appending to the CSV.
I want to remove the { } at the beginning and end.
I would also like the key names at the top as a header, rather than to the left of each value.
so for: "id":"I-1234" vol:"vol-5678"
I would like
id, vol
I-1234, vol-5678
and the same with the tags:
remove the array name "tags:" (I think it's the array name; I'm not a developer, I'm an infrastructure dude!) and just leave
Size,Colour,Shape,Weight, ...
large,red,square,25kg, ...
Can anyone help or point me in the right direction?
Thanks :)

jq -r '
  ["Size","Colour","Shape","Weight"] as $Keys
  | (["id", "vol"] + ($Keys|map(ascii_downcase))),
    ( .[]
      | (.Tags|from_entries) as $dict
      | [.InstanceId, .Vol, $dict[$Keys[]]] )
  | @csv
'
This will produce valid CSV, with the columns in the desired order, irrespective of the ordering of the items in the .Tags array.
If you don't want the strings in the rows to be quoted, then (at the risk of not having valid CSV) one option would be to replace @csv above with join(","). Alternatively, you might consider using @tsv and then replacing the tabs with commas (e.g. using sed or tr, or even jq :-).
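For cross-checking the jq result, here is a minimal Python sketch of the same idea: turn the Tags array into a dictionary, then pick the tag values in a fixed column order. The names TAG_KEYS and to_rows are made up for this illustration, and the sample data is copied from the question. Note that Python's csv module quotes only where needed, whereas jq's @csv quotes every string.

```python
import csv
import io

# Sample input copied from the question.
instances = [
    {
        "InstanceId": "I-1234",
        "Vol": "vol-5678",
        "Delete": False,
        "State": "in-use",
        "Tags": [
            {"Key": "Size", "Value": "large"},
            {"Key": "Colour", "Value": "red"},
            {"Key": "Shape", "Value": "square"},
            {"Key": "Weight", "Value": "light"},
        ],
    }
]

# Column order for the tags (hypothetical name, mirrors $Keys in the jq filter).
TAG_KEYS = ["Size", "Colour", "Shape", "Weight"]


def to_rows(items, tag_keys):
    """Yield a header row, then one row per instance."""
    yield ["id", "vol"] + [key.lower() for key in tag_keys]
    for item in items:
        # Same role as from_entries: a Key -> Value lookup table.
        tags = {tag["Key"]: tag["Value"] for tag in item.get("Tags", [])}
        # A missing tag becomes None, which csv.writer emits as an empty field.
        yield [item["InstanceId"], item["Vol"]] + [tags.get(key) for key in tag_keys]


buffer = io.StringIO()
csv.writer(buffer).writerows(to_rows(instances, TAG_KEYS))
print(buffer.getvalue(), end="")
```

Because the columns come from TAG_KEYS rather than from the order of the Tags array, the output stays aligned even when resources list their tags in a different order.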

jq with multiple select statements and an array

I've got some JSON like the following (I've filtered the output here):
[
{
"Tags": [
{
"Key": "Name",
"Value": "example1"
},
{
"Key": "Irrelevant",
"Value": "irrelevant"
}
],
"c7n:MatchedFilters": [
"tag: example_tag_rule"
],
"another_key": "another_value_I_dont_want"
},
{
"Tags": [
{
"Key": "Name",
"Value": "example2"
}
],
"c7n:MatchedFilters": [
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
}
]
I'd like to create a csv file with the value within the Name key and all of the "c7n:MatchedFilters" in the array. I've made a few attempts but still can't get quite the output I expect. There's some example code and the output below:
#Prints the key that I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(.Key=="Name")|.Value'
"example1"
"example2"
#Prints all the filters in an array I'm after.
cat new.jq | jq -r '.[] | [.Tags[], {"c7n:MatchedFilters"}] | .[] | select(."c7n:MatchedFilters") | .[]'
[
"tag: example_tag_rule"
]
[
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
#Prints *all* the tags (including ones I don't want) and all the filters in the array I'm after.
cat new.jq | jq '.[] | [.Tags[], {"c7n:MatchedFilters"}] | select((.[].Key=="Name") and (.[]."c7n:MatchedFilters"))'
[
{
"Key": "Name",
"Value": "example1"
},
{
"Key": "Irrelevant",
"Value": "irrelevant"
},
{
"c7n:MatchedFilters": [
"tag: example_tag_rule"
]
}
]
[
{
"Key": "Name",
"Value": "example2"
},
{
"c7n:MatchedFilters": [
"tag:example_tag_rule",
"tag: example_tag_rule2"
]
}
]
I hope this makes sense, let me know if I've missed anything.
Your attempts are not working because you start out with [.Tags[], {"c7n:MatchedFilters"}] to construct one array containing all the tags and an object containing the filters. You are then struggling to find a way to process this entire array at once because it jumbles together these unrelated things without any distinction. You will find it much easier if you don't combine them in the first place!
You want to find the single tag with a Key of "Name". Here's one way to find that:
first(
.Tags[]|
select(.Key=="Name")
).Value as $name
By using a variable binding we can save it for later and worry about constructing the array separately.
You say (in the comments) that you just want to concatenate the filters with spaces. You can do that easily enough:
(
."c7n:MatchedFilters"|
join(" ")
) as $filters
You can combine all of this as follows. Note that each variable binding leaves the input stream unchanged, so it's easy to compose everything.
jq --raw-output '
  .[]|
  first(
    .Tags[]|
    select(.Key=="Name")
  ).Value as $name|
  (
    ."c7n:MatchedFilters"|
    join(" ")
  ) as $filters|
  [$name, $filters]|
  @csv
'
Hopefully that's easy enough to read and separates out each concept. We break up the array into a stream of objects. For each object, we find the name and bind it to $name, we concatenate the filters and bind them to $filters, then we construct an array containing both, then we convert the array to a CSV string.
We don't need to use variables. We could just have a big array constructor wrapped around the expression to find the name and the expression to find the filters. But I hope you can see the variables make things a bit flatter and easier to understand.
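To make the separation of concerns concrete, here is a rough Python equivalent of the filter above. It is only a sketch: name_and_filters is a name invented for this example, and the records are the sample data from the question.

```python
# Sample records copied from the question.
records = [
    {
        "Tags": [
            {"Key": "Name", "Value": "example1"},
            {"Key": "Irrelevant", "Value": "irrelevant"},
        ],
        "c7n:MatchedFilters": ["tag: example_tag_rule"],
        "another_key": "another_value_I_dont_want",
    },
    {
        "Tags": [{"Key": "Name", "Value": "example2"}],
        "c7n:MatchedFilters": ["tag:example_tag_rule", "tag: example_tag_rule2"],
    },
]


def name_and_filters(record):
    """Return [name, filters] for one record, handling each part separately."""
    # Counterpart of first(.Tags[] | select(.Key=="Name")).Value.
    # Yields None if there is no Name tag (the jq filter would skip such a record).
    name = next(
        (tag["Value"] for tag in record["Tags"] if tag["Key"] == "Name"),
        None,
    )
    # Counterpart of ."c7n:MatchedFilters" | join(" ").
    filters = " ".join(record["c7n:MatchedFilters"])
    return [name, filters]


rows = [name_and_filters(record) for record in records]
for row in rows:
    print(row)
```

As in the jq version, the name and the filters are computed independently and only combined at the very end.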

JQ: key selection from numeric objects

I use jq 1.6 in a Windows 10 PowerShell environment and am trying to select keys from JSON objects whose names happen to be numeric.
JSON example:
{
"alliances_info":{
"744085325458334213":{
"emblem":3,
"name":"wellwell",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"MELL",
"slogan":"",
"id":744085325458334213
},
"744128593839677958":{
"emblem":0,
"name":"Brave",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"GABA",
"slogan":"",
"id":744128593839677958
},
"746034084459209223":{
"emblem":0,
"name":"Queen",
"member_count":1,
"level":1,
"military_might":1035,
"public":false,
"tag":"QUE",
"slogan":"",
"id":746034084459209223
},
"750446471312466445":{
"emblem":0,
"name":"Phoenix Inc",
"member_count":35,
"level":6,
"military_might":453369,
"public":true,
"tag":"PHOI",
"slogan":"",
"id":750446471312466445
},
"750446518934594062":{
"emblem":11,
"name":"Australia",
"member_count":44,
"level":8,
"military_might":957211,
"public":true,
"tag":"AUST",
"slogan":"Go Australia",
"id":750446518934594062
}
},
"server_version":"v7.190.4-master.000000006"
}
I tried several jq commands:
.alliances_info | .[] | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]
or
.alliances_info | .. | objects | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]
But I always get a jq error: parse error: Invalid numeric literal at line 1, column 3
If I drop the object construction in the first command (and build only an array), it works. But I need those objects. Any tips?
BR
Timo
Your first query works perfectly well with the given JSON sample. Perhaps you're invoking jq incorrectly. If you have the jq program in a file, say select.jq, you'd invoke jq like so:
jq -f select.jq sample.json
If that doesn't help, then try:
jq empty sample.json
If that fails, there might be something wrong with the encoding of the JSON.
I'm not sure I understand what you want.
Your first attempt works for me, but generates one output for each JSON value in the input. That is, I created a file named so.json and put in it your JSON from above:
{
"alliances_info": {
"744085325458334213": {
"emblem": 3,
⋮
}
When I run your program, I get:
$ jq '.alliances_info | .[] | [{alliance_name: .name, alliance_count: .member_count, alliance_level: .level, alliance_power: .military_might, alliance_tag: .tag, alliance_slogan: .slogan, alliance_id: .id}]' so.json
[
{
"alliance_name": "wellwell",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"alliance_tag": "MELL",
"alliance_slogan": "",
"alliance_id": 744085325458334200
}
]
[
{
"alliance_name": "Brave",
⋮
]
If you want an array at all, you probably want one array containing all the alliances like this:
$ jq '.alliances_info | [ .[] | { alliance_name: .name, alliance_id: .id } ]' so.json
[
{
"alliance_name": "wellwell",
"alliance_id": 744085325458334200
},
{
"alliance_name": "Brave",
"alliance_id": 744128593839678000
},
{
"alliance_name": "Queen",
"alliance_id": 746034084459209200
},
{
"alliance_name": "Phoenix Inc",
"alliance_id": 750446471312466400
},
{
"alliance_name": "Australia",
"alliance_id": 750446518934594000
}
]
Starting from the left,
- .alliances_info looks in its input object for the field named "alliances_info" and outputs its value
- the | next says take the output from the left-hand side and pass those as inputs to the right-hand side.
- right after that first |, I have a [ «jq expressions» ] which tells jq to create one JSON array output for each input; the elements of that array are the outputs of that inner «jq expressions»
- that inner expression starts with .[] which means to produce one output for each JSON value (ignoring the keys) in the input object. For us, that will be the objects named "744085325458334213", "744128593839677958", …
- The next | uses those objects as input and for each, generates a JSON object { alliance_name: .name, alliance_id: .id }
That's why I end up with one JSON array containing 5 JSON objects.
As far as I can tell, you are mostly just renaming a bunch of the fields. For that, you could just do something like this:
$ jq --argjson renameMap '{ "name": "alliance_name", "member_count": "alliance_count", "level": "alliance_level", "military_might": "alliance_power", "tag": "alliance_tag", "slogan": "alliance_slogan"}' '.alliances_info |= ( . | [ to_entries[] | ( .value |= ( . | [ to_entries[] | ( .key |= ( if $renameMap[.] then $renameMap[.] else . end ) ) ] | from_entries ) ) ] | from_entries )' so.json
{
"alliances_info": {
"744085325458334213": {
"emblem": 3,
"alliance_name": "wellwell",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"public": false,
"alliance_tag": "MELL",
"alliance_slogan": "",
"id": 744085325458334200
},
"744128593839677958": {
"emblem": 0,
"alliance_name": "Brave",
"alliance_count": 1,
"alliance_level": 1,
"alliance_power": 1035,
"public": false,
"alliance_tag": "GABA",
"alliance_slogan": "",
"id": 744128593839678000
},
⋮
},
"server_version": "v7.190.4-master.000000006"
}
Well, I am an idiot (to be totally clear here). I found the reason (and this is normally a no-brainer...). I read the input from a file, and the funny thing is that the file is Unicode but not UTF-8. After re-encoding the file, the command works fine. Thanks for the help.
BR
Timo
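For anyone who hits the same parse error: the re-encoding step can be sketched in Python as below. This assumes the file was saved as UTF-16 (what Windows tools usually label "Unicode"); the function name reencode_to_utf8 and the file names are made up for the demonstration.

```python
import json
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Read a text file in src_encoding and rewrite it as UTF-8 without a BOM."""
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demonstration with a throwaway UTF-16 file (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    utf16_path = os.path.join(tmp, "sample-utf16.json")
    utf8_path = os.path.join(tmp, "sample-utf8.json")

    # The "utf-16" codec writes a byte order mark, like many Windows tools do.
    with open(utf16_path, "w", encoding="utf-16") as handle:
        handle.write('{"alliances_info": {}}')

    reencode_to_utf8(utf16_path, utf8_path)

    with open(utf8_path, "rb") as handle:
        converted = handle.read()

print(converted)
```

After the conversion, the file contains plain UTF-8 bytes, which is the encoding jq expects for its input.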

How to use jq @csv filter to generate a csv file with values from different json levels

I have a JSON file from the Spotify API that lists all the songs on a specific album. The file is organized as follows:
.
.name
.tracks.items
.tracks.items[]
.tracks.items[].artists
.tracks.items[].artists[].name
.tracks.items[].duration_ms
.tracks.items[].name
I'm using jq to create a csv with the following information: song's artist, song's title, and song's duration. I can do this using the following syntax:
jq -r '.tracks.items[] | [.artists[].name, .name, .duration_ms] | @csv' myfile.json
Output:
"Michael Jackson","Wanna Be Startin' Somethin'",363400
"Michael Jackson","Baby Be Mine",260666
...
However, I would like to also add the value under .name (which represents the name of the album the songs are from) to every row of my csv file. Something that would look like this:
"Thriller","Michael Jackson","Wanna Be Startin' Somethin'",363400
"Thriller","Michael Jackson","Baby Be Mine",260666
...
Is it possible to do this using the @csv filter? I can do it by hand by hardcoding the name of the album like this:
jq -r '.tracks.items[] | ["Thriller", .artists[].name, .name, .duration_ms] | @csv' myfile.json
But I was hoping there might be a nicer way to do it.
EDIT:
Here's what the file looks like:
{
"name": "Thriller",
"tracks": {
"items": [
{
"artists": [
{
"name": "Michael Jackson"
}
],
"duration_ms": 363400,
"name": "Wanna Be Startin' Somethin'"
},
{
"artists": [
{
"name": "Michael Jackson"
}
],
"duration_ms": 260666,
"name": "Baby Be Mine"
}
]
}
}
See the "Variable / Symbolic Binding Operator" section in jq's documentation:
jq -r '
.name as $album_name ### <- THIS RIGHT HERE
| .tracks.items[]
| [$album_name, .artists[].name, .name, .duration_ms]
| @csv
' myfile.json
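The same pattern, sketched in Python for comparison: remember the top-level album name first, then reuse it for every track. The function name album_rows is invented for this example; the data is the sample from the question.

```python
# Sample document copied from the question.
album = {
    "name": "Thriller",
    "tracks": {
        "items": [
            {
                "artists": [{"name": "Michael Jackson"}],
                "duration_ms": 363400,
                "name": "Wanna Be Startin' Somethin'",
            },
            {
                "artists": [{"name": "Michael Jackson"}],
                "duration_ms": 260666,
                "name": "Baby Be Mine",
            },
        ]
    },
}


def album_rows(document):
    """Yield one row per track, each prefixed with the album name."""
    album_name = document["name"]  # counterpart of `.name as $album_name`
    for track in document["tracks"]["items"]:
        # Like `.artists[].name` inside the jq array constructor, every
        # artist name becomes its own column in the row.
        artist_names = [artist["name"] for artist in track["artists"]]
        yield [album_name, *artist_names, track["name"], track["duration_ms"]]


for row in album_rows(album):
    print(row)
```

One thing this makes visible: a track with several artists produces a longer row, which is also what the jq filter does.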

Use JQ to select specific, arbitrarily nested objects from JSON

I'm looking for an efficient means to search through a large JSON object for "sub-objects" that match a filter (via select(), I imagine). However, the top-level JSON is an object with arbitrary nesting contained within, including simple values, objects and arrays of objects. For example:
{
"name": "foo",
"class": "system",
"description": "top-level-thing",
"configuration": {
"status": "normal",
"uuid": "id"
},
"children": [
{
"id": "c1",
"class": "c1",
"children": [
{
"id": "c1.1",
"class": "c1.1"
},
{
"id": "c1.1",
"class": "FINDME"
}
]
},
{
"id": "c2",
"class": "FINDME"
}
],
"thing": {
"id": "c3",
"class": "FINDME"
}
}
I have a solution which does part of what I want (and is understandable):
jq -r '.. | arrays | .[] | select(.class=="FINDME"?) | .id'
which returns:
c2
c1.1
... however, it misses c3, and it changes the order of the items output. Additionally, since I expect this to operate on potentially very large JSON structures, I would like to make sure I find an efficient solution. Bonus points for something that remains readable by jq neophytes (myself included).
FWIW, references I was using to help me on the way, in case they help others:
Select objects based on value of variable in object using jq
How to use jq to find all paths to a certain key
Recursive search values by key
For small to modest-sized JSON input, you're on the right track with .. but it seems you want to select objects, like so:
.. | objects | select(.class=="FINDME"?) | .id
For JSON documents that are very large, this might require too much memory, so it may be worth knowing about jq's streaming parser. Unfortunately it's much more difficult to use, so I'd suggest trying the above, and if you're interested, look in the usual places for documentation about the --stream option.
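To show what the non-streaming `.. | objects | select(...)` approach does, here is a minimal Python sketch of the same recursive walk. The name find_ids is made up for this illustration, and the document is the sample from the question. Like the non-streaming jq filter, it needs the whole document in memory.

```python
# Sample document copied from the question.
document = {
    "name": "foo",
    "class": "system",
    "description": "top-level-thing",
    "configuration": {"status": "normal", "uuid": "id"},
    "children": [
        {
            "id": "c1",
            "class": "c1",
            "children": [
                {"id": "c1.1", "class": "c1.1"},
                {"id": "c1.1", "class": "FINDME"},
            ],
        },
        {"id": "c2", "class": "FINDME"},
    ],
    "thing": {"id": "c3", "class": "FINDME"},
}


def find_ids(node, wanted="FINDME"):
    """Walk the value depth-first and yield the id of every matching object."""
    if isinstance(node, dict):
        # Check the object itself before descending into its values.
        if node.get("class") == wanted:
            yield node.get("id")
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return  # scalars have nothing to descend into
    for child in children:
        yield from find_ids(child, wanted)


print(list(find_ids(document)))
```

Because every object is tested, not just those inside arrays, the object under "thing" is found as well, and the results come out in document order.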
Here's a streaming-parser solution. To make sense of it, you'll need to read up on the --stream option, but the key is that the output includes lines of the form: [PATH, VALUE]
program.jq
foreach inputs as $in (null;
   if has("id") and has("class") then null
   else . as $x
   | $in
   | if length != 2 then null
     elif .[0][-1] == "id" then ($x + {id: .[-1]})
     elif .[0][-1] == "class"
          and .[-1] == "FINDME" then ($x + {class: .[-1]})
     else $x
     end
   end;
   select(has("id") and has("class")) | .id )
Invocation
jq -n --stream -f program.jq input.json
Output with sample input
"c1.1"
"c2"
"c3"

Deep JSON merge

I have multiple JSON files that I'd like to merge into one.
Some have the same root element but different children. I don't want to overwrite the children, but to extend them if they have the same parent element.
I've tried this answer, but it doesn't work:
jq: error (at file2.json:0): array ([{"title":"...) and array ([{"title":"...) cannot be multiplied
Sample files and wanted result (Gist)
Thank you in advance.
Here is a recursive solution which uses group_by(.key) to decide which objects to combine. This could be a little simpler if .children were more uniform. Sometimes it's absent in the sample data and sometimes it's the unusual value [{}].
def merge:
  def kids:
      map(
        .children
        | if length<1 then empty else .[] end
      )
    | if length<1 then {} else {children:merge} end
  ;
  def mergegroup:
      {
        title: .[0].title
      , key:   .[0].key
      } + kids
  ;
  if .==[{}] then .
  else group_by(.key) | map(mergegroup)
  end
;
[ .[] | .[] ] | merge
When run with the -s option as follows
jq -M -s -f filter.jq file1.json file2.json
It produces the following output.
[
{
"title": "Title1",
"key": "12345678",
"children": [
{
"title": "SubTitle2",
"key": "123456713",
"children": [
{}
]
},
{
"title": "SubTitle1",
"key": "12345679",
"children": [
{
"title": "SubSubTitle1",
"key": "12345610"
},
{
"title": "SubSubTitle2",
"key": "12345611"
},
{
"title": "DifferentSubSubTitle1",
"key": "12345612"
}
]
}
]
}
]
If the ordering of the objects within .children matters, then a sort_by can be added to the {children:merge} expression, e.g. {children:merge|sort_by(.key)}
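For readers who find the nested jq definitions hard to follow, here is a simplified Python sketch of the same grouping idea: nodes with the same key are combined and their children are merged recursively. It is not a faithful port. In particular, it drops empty {} placeholder children instead of preserving them, and the sample data below is hypothetical rather than taken from the Gist.

```python
from itertools import groupby


def merge(nodes):
    """Combine nodes that share a key, merging their children recursively."""
    merged = []
    # groupby needs sorted input; sorted() is stable, so within a group
    # the first node seen in the input stays first.
    ordered = sorted(nodes, key=lambda node: node["key"])
    for _, group in groupby(ordered, key=lambda node: node["key"]):
        group = list(group)
        node = {"title": group[0]["title"], "key": group[0]["key"]}
        # Collect the children of every node in the group.
        # Simplification: empty {} placeholders are skipped.
        kids = [
            child
            for member in group
            for child in member.get("children", [])
            if child
        ]
        if kids:
            node["children"] = merge(kids)
        merged.append(node)
    return merged


# Hypothetical stand-ins for the contents of file1.json and file2.json.
file1 = [{"title": "Title1", "key": "1", "children": [
    {"title": "Sub1", "key": "2", "children": [{"title": "A", "key": "3"}]},
]}]
file2 = [{"title": "Title1", "key": "1", "children": [
    {"title": "Sub1", "key": "2", "children": [{"title": "B", "key": "4"}]},
    {"title": "Sub2", "key": "5", "children": [{}]},
]}]

result = merge(file1 + file2)
print(result)
```

Concatenating the two lists before merging plays the role of `[ .[] | .[] ]` on the slurped input in the jq version.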
Here is something that will reproduce your desired result. It's by no means automatic; it's really a proof of concept at this stage.
One liner:
jq -s '. as $in | ($in[0][].children[].children + $in[1][].children[0].children | unique) as $a1 | $in[1][].children[1] as $s1 | $in[0] | .[0].children[0].children = ($a1) | .[0].children += [$s1]' file1.json file2.json
Multi line breakdown (Copy/Paste):
jq -s '. as $in
| ($in[0][].children[].children + $in[1][].children[0].children
| unique) as $a1
| $in[1][].children[1] as $s1
| $in[0]
| .[0].children[0].children = ($a1)
| .[0].children += [$s1]' file1.json file2.json
Where:
$in : file1.json and file2.json combined input
$a1: merged "SubSubTitle" array
$s1: second subtitle object
I suspect the reason this didn't work is that your schema is different and has nested arrays.
I find it quite hypnotic looking at this. It would be good if you could elaborate a bit on how fixed the structure is and what the requirements are.