So I have three files:
cats.json
{
"cats": [
{
"name": "fluffles",
"age": 10,
"color": "white"
}
]
}
dogs.json
{
"dogs": [
{
"name": "sam",
"age": 5,
"color": "black and white"
},
{
"name": "rover",
"age": 2,
"color": "brown and white"
}
]
}
snakes.json
{
"snakes": [
{
"name": "noodles",
"age": 10,
"color": "green"
}
]
}
I wanted to merge these together under an "animals" object. I've found that this will merge the files:
jq -s '{"animals": .} ' cats.json dogs.json snakes.json > animals.json
{
"animals": [
{
"cats": [
{
"name": "fluffles",
"age": 10,
"color": "white"
}
]
},
{
"dogs": [
{
"name": "sam",
"age": 5,
"color": "black and white"
},
{
"name": "rover",
"age": 2,
"color": "brown and white"
}
]
},
{
"snakes": [
{
"name": "noodles",
"age": 10,
"color": "green"
}
]
}
]
}
Now I have an additional object:
owners.json
{
"owners": [
"peter",
"william",
"sally"
]
}
which I want to merge into the same file using
jq -s '.[0] + .[1]' animals.json owners.json
Can I do both of these operations with just one jq command?
jq -s '{"animals": .} ' cats.json dogs.json snakes.json > animals.json
jq -s '.[0] + .[1]' animals.json owners.json
The result would look like this:
{
"animals": [
{
"cats": [
{
"name": "fluffles",
"age": 10,
"color": "white"
}
]
},
{
"dogs": [
{
"name": "sam",
"age": 5,
"color": "black and white"
},
{
"name": "rover",
"age": 2,
"color": "brown and white"
}
]
},
{
"snakes": [
{
"name": "noodles",
"age": 10,
"color": "green"
}
]
}
],
"owners": [
"peter",
"william",
"sally"
]
}
Suppose you had an (a priori) indeterminate or large number of animal types, and just one owners file. In such cases, it would be better (to save memory) not to use the -s option, and it would be easier to invoke jq with the owners file as the first data file, e.g. along the lines of:
jq -n -f program.jq owners.json $(ls *.json | grep -v owners.json)
where program.jq contains a program such as:
input.owners as $owners | {$owners, animals: [inputs]}
(Notice how {"owners": $owners} can be abbreviated.)
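For the four concrete files in the question, the same program can also be passed inline instead of via program.jq. A minimal sketch (owners.json must come first so that input picks it up; the key order of the result differs from the example above only cosmetically):
jq -n 'input.owners as $owners | {$owners, animals: [inputs]}' owners.json cats.json dogs.json snakes.json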
Not sure if this is the way to go, but it produces a combined result (with the three animal objects merged into one) by:
Using --slurp:
Catching the first 3 files as a single array variable
[ .[0] * .[1] * .[2] ] as $all
Catching owners object as a single variable
.[3].owners as $owners
Creating the object as desired
{ "animals": $all, "owners": $owners }
jq \
--slurp \
'[ .[0] * .[1] * .[2] ] as $all | .[3].owners as $owners | { "animals": $all, "owners": $owners }' cats.json dogs.json snakes.json owners.json
Will produce:
{
"animals": [
{
"cats": [
{
"name": "fluffles",
"age": 10,
"color": "white"
}
],
"dogs": [
{
"name": "sam",
"age": 5,
"color": "black and white"
},
{
"name": "rover",
"age": 2,
"color": "brown and white"
}
],
"snakes": [
{
"name": "noodles",
"age": 10,
"color": "green"
}
]
}
],
"owners": [
"peter",
"william",
"sally"
]
}
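If the number of animal files is not fixed at three, the same idea can be written without hard-coding the indices. A sketch, assuming owners.json is always passed as the last file:
jq --slurp '{ animals: [ .[:-1] | add ], owners: .[-1].owners }' cats.json dogs.json snakes.json owners.json
Here .[:-1] selects every slurped document except the last one, add merges the animal objects into one, and .[-1].owners picks the owners array. Using .[:-1] on its own (without the wrapping brackets and add) would instead keep the animal objects as separate array elements, matching the output shown in the question.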
Trying to convert the JSON data below into CSV, using jq and/or awk or python or perl or anything from the Linux shell.
Will appreciate your scripting help here.
{
"inventory": [
{
"profile": "Earth",
"invState": [
{
"count": 6,
"Status": "ONLINE"
},
{
"count": 8,
"Status": "EXIST"
},
{
"count": 1,
"Status": "GIVEN"
},
{
"count": 4,
"Status": "ERROR"
},
{
"count": 49,
"Status": "INSTOCK"
},
{
"count": 389,
"Status": "RELEASED"
},
{
"count": 68,
"Status": "DELETED"
},
{
"count": 280,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mars",
"invState": [
{
"count": 7,
"Status": "EXIST"
},
{
"count": 20,
"Status": "INSTOCK"
},
{
"count": 110,
"Status": "RELEASED"
},
{
"count": 16,
"Status": "DELETED"
},
{
"count": 41,
"Status": "CONNECTED"
},
{
"count": 1,
"Status": "UNINSTOCK"
}
]
},
{
"profile": "Mercury",
"invState": [
{
"count": 4,
"Status": "EXIST"
},
{
"count": 1224,
"Status": "INSTOCK"
},
{
"count": 3,
"Status": "CONNECTED"
},
{
"count": 18,
"Status": "RELEASED"
},
{
"count": 5,
"Status": "DELETED"
}
]
}
]
}
The csv output should look like this:
Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
etc
Will appreciate any advice here.
I have tried using jq and awk but am not getting the right result.
Here is one using GNU awk's JSON extension:
$ gawk '
@load "json"
BEGIN {
OFS=","
}
{
lines=lines $0 # keep appending lines
if(json_fromJSON(lines,data)!=0) { # until you have a valid object
for(inventory in data["inventory"]) # then we iterate the arrays and output
for(invState in data["inventory"][inventory]["invState"])
print data["inventory"][inventory]["profile"],
data["inventory"][inventory]["invState"][invState]["count"],
data["inventory"][inventory]["invState"][invState]["Status"]
lines="" # reset the object array for next round
}
}' file.json
Parts of the output:
Earth,6,ONLINE
Earth,8,EXIST
...
Mars,7,EXIST
Mars,20,INSTOCK
...
Mercury,4,EXIST
Mercury,1224,INSTOCK
...
The following will produce the output as shown if jq is invoked with the -r option:
.inventory[]
| .profile as $profile
| .invState[]
| [$profile] + [.count, .Status]
| join(",")
Note, however, that if CSV output is desired, then it might be better to replace the join in the last line by @csv.
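For example, the @csv variant of the same filter would be (a sketch):
.inventory[]
| .profile as $profile
| .invState[]
| [$profile] + [.count, .Status]
| @csv
Note that @csv puts double quotes around string fields, so with -r the output lines look like "Earth",6,"ONLINE". Assuming either filter is saved in a file, say tocsv.jq (a hypothetical name), and the JSON from the question is in input.json, it can be run as:
jq -r -f tocsv.jq input.json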
Shorter versions
The following is equivalent to the above:
.inventory[]
| [.profile] + (.invState[] | [.count, .Status])
| join(",")
If the order of the "count" and "Status" keys is fixed, you could get away with:
.inventory[]
| [.profile] + (.invState[] | [.[]])
| join(",")
I produced quite a few CSV files out of JSON files with jq. I find jq quite suited for this.
Breaking out of an array
From:
["Earth","Mars","Mercury"]
To:
"Earth"
"Mars"
"Mercury"
Is achieved with this filter: .[] which iterates over the array. As the documentation puts it:
Running .[] with the input [1,2,3] will produce the numbers as three separate results, rather than as a single array.
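A quick way to try this from a bash shell (a sketch using a here-string):
jq '.[]' <<< '["Earth","Mars","Mercury"]'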
From string to text
From:
["Earth","Mars","Mercury"]
To:
Earth
Mars
Mercury
Is achieved with the --raw-output parameter on the CLI combined with the .[] filter, e.g.:
jq --raw-output '.[]' input.json
Saving to variables
You'll need to hold a reference to .profile while you process the rest of the inventory. Here's a contrived example:
From:
[ {"x": "Earth", "y": ["1", "2", "3"]}
, {"x": "Mars", "y": ["1", "2", "3"]}
]
To:
"Earth1"
"Earth2"
"Earth3"
"Mars1"
"Mars2"
"Mars3"
Is achieved with this filter: .[] | .x as $x | .y[] | $x + .
(You save .x into var $x that you can refer to in your filter.)
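Putting it together, the contrived example can be reproduced like this (a sketch; contrived.json is a hypothetical file holding the input shown above):
jq '.[] | .x as $x | .y[] | $x + .' contrived.json   # prints "Earth1", "Earth2", ... as in the example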
To answer your question, using the same input.json as shown above:
The following invocation of jq should do the trick:
jq --raw-output '.inventory[] | .profile as $p | .invState[] | "\($p),\(.count),\(.Status)"' input.json
Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
Mars,110,RELEASED
Mars,16,DELETED
Mars,41,CONNECTED
Mars,1,UNINSTOCK
Mercury,4,EXIST
Mercury,1224,INSTOCK
Mercury,3,CONNECTED
Mercury,18,RELEASED
Mercury,5,DELETED
If you don't have jq or gawk's json extension (which requires gawkextlib), and your input is always as simple and regular as in your example, then this will do what you want using GNU awk for the 3rd arg to match() and for gensub():
$ cat tst.awk
BEGIN { OFS="," }
match($0,/"([^"]+)": *("[^"]*"|[0-9]+)/,a) {     # capture each "key": value pair
    tag = a[1]
    val = gensub(/^"|"$/,"","g",a[2])            # strip the surrounding quotes from string values
    f[tag] = val
    if ( tag == "Status" ) {                     # "Status" is the last key of each record
        print f["profile"], f["count"], f["Status"]
    }
}
$ awk -f tst.awk file
Earth,6,ONLINE
Earth,8,EXIST
Earth,1,GIVEN
Earth,4,ERROR
Earth,49,INSTOCK
Earth,389,RELEASED
Earth,68,DELETED
Earth,280,CONNECTED
Earth,1,UNINSTOCK
Mars,7,EXIST
Mars,20,INSTOCK
Mars,110,RELEASED
Mars,16,DELETED
Mars,41,CONNECTED
Mars,1,UNINSTOCK
Mercury,4,EXIST
Mercury,1224,INSTOCK
Mercury,3,CONNECTED
Mercury,18,RELEASED
Mercury,5,DELETED
This might work for you (GNU sed):
sed -nE '/profile/{s/.*"(\S+)".*/\1/;h};
/count/{s/.* (\S+),.*/\1/;H};
/Status/{s/.*"(\S+)".*/\1/;H;g;s/\n/,/gp;g;s/\n.*\n.*//;h}' file
Stuff the profile, count and Status info into the hold space; after doing so for Status, retrieve the hold space, replace the newlines with commas, print, and then remove the count and Status details, ready for the next time.
N.B. As this is JSON, it is better to use jq, as that will always be a more robust solution.
awk -F: 'BEGIN{ OFS=""; p=c=s=""; }
/"profile"/{ p=$2 }
/"count"/{ c=$2 }
/"Status"/{ s=$2 }
{ if(s!="") { print p,c,s; s="" }}' file.json
output:
"Earth", 6, "ONLINE"
"Earth", 8, "EXIST"
"Earth", 1, "GIVEN"
"Earth", 4, "ERROR"
"Earth", 49, "INSTOCK"
"Earth", 389, "RELEASED"
"Earth", 68, "DELETED"
"Earth", 280, "CONNECTED"
"Earth", 1, "UNINSTOCK"
"Mars", 7, "EXIST"
"Mars", 20, "INSTOCK"
"Mars", 110, "RELEASED"
"Mars", 16, "DELETED"
"Mars", 41, "CONNECTED"
"Mars", 1, "UNINSTOCK"
"Mercury", 4, "EXIST"
"Mercury", 1224, "INSTOCK"
"Mercury", 3, "CONNECTED"
"Mercury", 18, "RELEASED"
"Mercury", 5, "DELETED"
It is CSV, that's why text fields are surrounded by double quotes. 😁😎
If your JSON is not pretty-printed, you might have to do something like:
cat file.json | json_pp | awk .....
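If you prefer the unquoted Earth,6,ONLINE format from the question, a variation of the same idea could strip the quotes, spaces and trailing commas. A sketch:
awk -F: 'BEGIN{ OFS="," }
/"profile"/{ p=$2 }
/"count"/  { c=$2 }
/"Status"/ { s=$2 }
{ if (s!="") { gsub(/[" ,]/,"",p); gsub(/[" ,]/,"",c); gsub(/[" ,]/,"",s); print p,c,s; s="" } }' file.json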
I want to convert a complex JSON file into a simple JSON file using JQ. However, the query I'm using generates an incorrect output.
My (cut down) JSON file:
[
{
"id": 100,
"foo": [
{
"bar": [
{"type": "read"},
{"type": "write"}
],
"users": ["admin_1"],
"groups": []
},
{
"bar": [
{"type": "execute"},
{ "type": "read"}
],
"users": [],
"groups": ["admin_2"]
}
]
},
{
"id": 101,
"foo": [
{
"bar": [
{"type": "read"}
],
"users": [
"admin_3"
],
"groups": []
}
]
}
]
I need to generate a flatter JSON file and combine the users and groups into one field, similar to this:
[
{
"id": 100,
"users_groups": [
"admin_1",
"admin_2"
],
"bar": ["read"]
},
{
"id": 100,
"users_groups": ["admin_1"],
"bar": ["write"]
},
{
"id": 100,
"users_groups": ["admin_2"],
"bar": ["execute"]
},
{
"id": 101,
"users_groups": ["admin_3"],
"bar": ["read"]
}
]
Everything I try in JQ results in me getting an incorrect output (where admin_1 incorrectly has bar=execute and admin_2 incorrectly has bar=write), similar to the following:
[
{
"id": 100,
"users_groups": [
"admin_1",
"admin_2"
],
"bar": ["read", "write", "execute"]
},
{
"id": 101,
"users_groups": ["admin_3"],
"bar": ["read"]
}
]
I have tried many variants of this query - any idea what I should be doing instead?
cat file.json | jq -r '[.[] | select(has("foo")) |{"id", "users":(.foo[] | .users), "groups":(.foo[] | .groups), "bar":([.foo[].bar[] | .type])} ] '
The following filter groups by "type" as the question seems to require:
map(.id as $id
| [.foo[]
| {id: $id, bar: .bar[].type} +
{"users_groups": (.users + .groups)[]} ]
| group_by(.bar)
| map(.[0] + {"users_groups": [.[].users_groups]}) )
Output
[
[
{
"id": 100,
"bar": "execute",
"users_groups": [
"admin_2"
]
},
{
"id": 100,
"bar": "read",
"users_groups": [
"admin_1",
"admin_2"
]
},
{
"id": 100,
"bar": "write",
"users_groups": [
"admin_1"
]
}
],
[
{
"id": 101,
"bar": "read",
"users_groups": [
"admin_3"
]
}
]
]
Variations
To achieve the array-of-objects output format, simply tack on | [.[][]];
it would similarly be trivially easy to ensure that .bar is array-valued, though that might be pointless given that the grouping is by .type.
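For instance, with that suffix tacked on, the complete filter would read as follows (a sketch; .bar here remains a string rather than an array):
map(.id as $id
  | [.foo[]
     | {id: $id, bar: .bar[].type} +
       {"users_groups": (.users + .groups)[]} ]
  | group_by(.bar)
  | map(.[0] + {"users_groups": [.[].users_groups]}) )
| [.[][]]                       # flatten the array of arrays into a single array of objects
This yields a flat array of four objects: execute, read and write for id 100, and read for id 101.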