How can I map items together in a json structure with jq?

I have a json structure that looks like this:
{
"lorry1": {
"box1": [
{"item": "shoes", "state": "new"},
{"item": "snacks", "state": "new"}
],
"box2": [
{"item": "beer", "state": "cold"},
{"item": "potatoes"}
]
},
"lorry2": {
"box1": [
{"item": "shoes", "state": "new"},
{"item": "snacks", "state": "new"}
],
"box2": [
{"item": "beer", "state": "lukewarm"}
]
}
}
Now I want to know where I can find shoes:
I could come up with this jq query:
to_entries | select(.[].value | .[][].item=="shoes") | map({"lorry": "\(.key)" })
But that only gives me the lorries. Useful, but not quite there yet. I'd like to know the box they're in as well.
I came up with this, but it obviously is not correct:
to_entries | select(.[].value | .[][].item=="shoes") | keys as $box |map({"lorry": "\(.key)", "box": $box })
The answer I'd like to get is lorry1, box1 and lorry2, box1.
Even better yet: I'd like to find all items and provide the information, like this:
"shoes": [ {"lorry1", "box1"}, {"lorry2", "box1" } ],
"snacks": [ {"lorry1", "box1"}, {"lorry2", "box1"} ],
"beer": [ {"lorry1", "box2"}, {"lorry2", "box2"} ],
"potatoes": [ {"lorry1", "box2"} ]
but that may be asking a bit too much :)

This looks like overkill to me, but it does the job.
[path(.[][][].item) as $p | [$p, getpath($p)]] |
group_by( .[1] ) |
map({(.[0][1]): (. | map([.[0][0,1]]))})|
add
Save the above jq filter in a file item_in.jq and run it as jq --from-file item_in.jq (or jq -f item_in.jq), passing your input file as the final argument. This gives the following output:
{
"beer": [
[
"lorry1",
"box2"
],
[
"lorry2",
"box2"
]
],
"potatoes": [
[
"lorry1",
"box2"
]
],
"shoes": [
[
"lorry1",
"box1"
],
[
"lorry2",
"box1"
]
],
"snacks": [
[
"lorry1",
"box1"
],
[
"lorry2",
"box1"
]
]
}
The initial transformation was to dump leaves and their paths from the input JSON tree.
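For the sample input, that first bracketed expression on its own produces the following list of [path, value] pairs (output wrapped here for readability); the rest of the filter only regroups it:
[[["lorry1","box1",0,"item"],"shoes"],
 [["lorry1","box1",1,"item"],"snacks"],
 [["lorry1","box2",0,"item"],"beer"],
 [["lorry1","box2",1,"item"],"potatoes"],
 [["lorry2","box1",0,"item"],"shoes"],
 [["lorry2","box1",1,"item"],"snacks"],
 [["lorry2","box2",0,"item"],"beer"]]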
See
https://github.com/stedolan/jq/issues/78#issuecomment-17819519

The answer I'd like to get is lorry1, box1 and lorry2, box1
In this case, you can get it with:
path(.. | select(.item? == "shoes"))
Which returns:
["lorry1","box1",0]
["lorry2","box1",0]
These are the paths in your object that lead to an object whose .item property is set to "shoes".
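If you only want the lorry and box, you can drop the trailing array index by slicing each path (a small extension of the above, using the same sample input):
jq -c 'path(.. | select(.item? == "shoes")) | .[:2]'
["lorry1","box1"]
["lorry2","box1"]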

Here's a generic solution that does not assume the "items" are in arrays, or even that the values associated with the "item" keys are always strings:
jq -c '. as $in
| [paths as $p | select($p[-1] == "item") | $p]
| group_by(. as $p | $in|getpath($p))
| .[]
| (.[0] as $p | $in | getpath($p)) as $v
| {($v|tostring): ( map(.[:-1] | if .[-1] | type == "number" then .[:-1] else . end)) }
'
Output
With your input:
{"beer":[["lorry1","box2"],["lorry2","box2"]]}
{"potatoes":[["lorry1","box2"]]}
{"shoes":[["lorry1","box1"],["lorry2","box1"]]}
{"snacks":[["lorry1","box1"],["lorry2","box1"]]}
If you want the output as a single JSON object, then collect the above into an array and use add.
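For example, wrapping the whole filter in an array and applying add (the same filter as above, just collected and merged):
jq '[ . as $in
      | [paths as $p | select($p[-1] == "item") | $p]
      | group_by(. as $p | $in | getpath($p))
      | .[]
      | (.[0] as $p | $in | getpath($p)) as $v
      | {($v|tostring): map(.[:-1] | if .[-1] | type == "number" then .[:-1] else . end)}
    ] | add'
This yields a single object with one key per item, like the combined result shown in the first answer above.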

Related

How to parse nested json to csv using command line

I want to parse a nested json to csv. The data looks similar to this.
{"tables":[{"name":"PrimaryResult","columns":[{"name":"name","type":"string"},{"name":"id","type":"string"},{"name":"custom","type":"dynamic"}]"rows":[["Alpha","1","{\"age\":\"23\",\"number\":\"xyz\"}]]]}
I want csv file as:
name id age number
alpha 1 23 xyz
I tried:
jq -r ".tables | .[] | .columns | map(.name)|#csv" demo.json > demo.csv
jq -r ".tables | .[] | .rows |.[]|#csv" demo.json >> demo.csv
But I am not getting the expected result.
Output:
name id custom
alpha 1 {"age":"23","number":"xyz}
Expected:
name id age number
alpha 1 23 xyz
Assuming valid JSON input:
{
"tables": [
{
"name": "PrimaryResult",
"columns": [
{ "name": "name", "type": "string" },
{ "name": "id", "type": "string" },
{ "name": "custom", "type": "dynamic" }
],
"rows": [
"Alpha",
"1",
"{\"age\":\"23\",\"number\":\"xyz\"}"
]
}
]
}
And assuming fixed headers:
jq -r '["name", "id", "age", "number"],
(.tables[].rows | [.[0,1], (.[2] | fromjson | .age, .number)])
| #csv' input.json
Output:
"name","id","age","number"
"Alpha","1","23","xyz"
If any of the assumptions is wrong, you need to clarify your requirements, e.g.
How are column names determined?
What happens if the input contains multiple tables?
As the "dynamic" object always of the same shape? Or can it sometimes contain fewer, more, or different columns?
Assuming that the .rows array is a 2D array of rows and fields, and that a column of type "dynamic" always expects a JSON-encoded object whose fields represent further columns, which may or may not be present in every row:
You could then transpose the headers array and the rows array in order to process each column according to its type, collecting all keys of the "dynamic" columns on the fly, and then transpose back to get the row-based CSV output.
Input (I have added another row for illustration):
{
"tables": [
{
"name": "PrimaryResult",
"columns": [
{
"name": "name",
"type": "string"
},
{
"name": "id",
"type": "string"
},
{
"name": "custom",
"type": "dynamic"
}
],
"rows": [
[
"Alpha",
"1",
"{\"age\":\"23\",\"number\":\"123\"}"
],
[
"Beta",
"2",
"{\"age\":\"45\",\"word\":\"xyz\"}"
]
]
}
]
}
Filter:
jq -r '
.tables[] | [.columns, .rows[]] | transpose | map(
if first.type == "string" then first |= .name
elif first.type == "dynamic" then
.[1:] | map(fromjson)
| (map(keys[]) | unique) as $keys
| [$keys, (.[] | [.[$keys[]]])] | transpose[]
else empty end
)
| transpose[] | @csv
'
Output:
"name","id","age","number","word"
"Alpha","1","23","123",
"Beta","2","45",,"xyz"

Reuse the hash-key and generate csv ready format

I'm trying to create a CSV-ready output with jq, and want to reuse the nested hash key along the way.
In this example, http and https should be reused to generate the CSV-ready format ([][]...).
Original
{
"google.com":{
"https":{
"dest_url":"http://aaa.com"
}
},
"microsoft.com":{
"http":{
"dest_url":"http://bbb.com"
},
"https":{
"dest_url":"http://ccc.com"
}
}
}
Expected
[
"https://google.com",
"http://aaa.com"
]
[
"http://microsoft.com",
"http://bbb.com",
]
[
"https://microsoft.com",
"http://ccc.com"
]
What I tried
to_entries[] | [.key, .value[].dest_url]
[
"google.com",
"http://aaa.com"
]
[
"microsoft.com",
"http://bbb.com",
"http://ccc.com"
]
If you separate accessing .key from the iteration over .value[], you'll get the Cartesian product:
jq 'to_entries[] | [.key] + (.value[] | [.dest_url])'
[
"google.com",
"http://aaa.com"
]
[
"microsoft.com",
"http://bbb.com"
]
[
"microsoft.com",
"http://ccc.com"
]
To include the nested keys, it's easier to save the outer .key before descending with the inner to_entries:
jq 'to_entries[] | .key as $key | .value | to_entries[] | [.key + "://" + $key, .value.dest_url]'
[
"https://google.com",
"http://aaa.com"
]
[
"http://microsoft.com",
"http://bbb.com"
]
[
"https://microsoft.com",
"http://ccc.com"
]
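And if you need actual CSV lines rather than JSON arrays, you can pipe each array through @csv with the -r flag (a small extension of the filter above):
jq -r 'to_entries[] | .key as $key | .value | to_entries[] | [.key + "://" + $key, .value.dest_url] | @csv'
"https://google.com","http://aaa.com"
"http://microsoft.com","http://bbb.com"
"https://microsoft.com","http://ccc.com"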

Json to CSV issues

I am using pandas to normalize some json data. I am getting stuck when more than one field is either an object or an array.
If I use the record_path on Car, it breaks on the second record.
Any pointers on how to get something like this to create a line in the csv per Car and per Location?
[
{
"Name": "John Doe",
"Car": [
"Car1",
"Car2"
],
"Location": "Texas"
},
{
"Name": "Jane Roe",
"Car": "Car1",
"Location": [
"Illinois",
"Kansas"
]
}
]
Here is the output
Name,Car,Location
John Doe,"['Car1', 'Car2']",Texas
Jane Roe,Car1,"['Illinois', 'Kansas']"
Here is the code:
import json
import pandas as pd

with open('file.json') as data_file:
    data = json.load(data_file)
df = pd.io.json.json_normalize(data, errors='ignore')
Would like it to end up like this:
Name,Car,Location
John Doe,Car1,Texas
John Doe,Car2,Texas
Jane Roe,Car1,Illinois
Jane Roe,Car1,Kansas
The answers worked great until I ran into a json file with extra data. This is what a file looks like with the extra values.
{
"Customers": [
{
"Name": "John Doe",
"Car": [
"Car1",
"Car2"
],
"Location": "Texas",
"Repairs: {
"RepairLocations": {
"RepairsCompleted":[
"Fix1",
"Fix2"
]
}
}
},
{
"Name": "Jane Roe",
"Car": "Car1",
"Location": [
"Illinois",
"Kansas"
]
}
]
}
Here is what I am going for. I think it's the most readable in this format, but anything that at least shows all the keys would do.
Name,Car,Location,Repairs:RepairLocation
John Doe,Car1,Texas,RepairsCompleted:Fix1
John Doe,Car1,Texas,RepairsCompleted:Fix2
John Doe,Car2,Texas,RepairsCompleted:Fix1
John Doe,Car2,Texas,RepairsCompleted:Fix2
Jane Roe,Car1,Illinois,
Jane Roe,Car1,Kansas,
Any suggestions on getting this second part?
A simple jq solution which is also a bit more generic than needed here:
["Name", "Car", "Location"],
(.[]
| [.Name] + (.Car|..|scalars|[.]) + (.Location|..|scalars|[.]))
| @csv
You're looking for something like this:
def expand($keys):
. as $in
| reduce $keys[] as $k ( [{}];
map(. + {
($k): ($in[$k] | if type == "array" then .[] else . end)
})
) | .[];
(.[0] | keys_unsorted) as $h
| $h, (.[] | expand($h) | [.[$h[]]]) | @csv
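For reference, with the filter saved in (say) expand.jq and the first sample array in input.json, this prints the requested rows (the filenames here are just placeholders):
$ jq -r -f expand.jq input.json
"Name","Car","Location"
"John Doe","Car1","Texas"
"John Doe","Car2","Texas"
"Jane Roe","Car1","Illinois"
"Jane Roe","Car1","Kansas"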

Extract inner array matching values using jq

My json is as shown below:
[
[
{
"id": "abcd"
},
{
"address": [
"140 Deco st"
]
}
],
[
{
"id": "xyz"
},
{
"dummy": "This is dummy"
}
],
[
{
"id": "12356"
},
{
"address": [
"140 Deco st"
]
}
]]
Now, I want to capture only those ids that have a dummy value of "This is dummy". Some of the entries may or may not have the dummy and address fields.
I tried the following, but it gave me the error "... cannot have their containment checked":
jq -c '.[] | .[] | select(.dummy | contains("This is dummy")) | .[] | .id'
Any help is much appreciated!
contains is quite tricky to use correctly. Since the requirement is:
to capture only those ids who have dummy value of "This is dummy"
I would suggest:
.[]
| select( any(.[]; .dummy == "This is dummy") )
| add
| .id
or perhaps (depending on your detailed requirements):
.[]
| select( any(.[]; .dummy == "This is dummy") )
| .[]
| .id? // empty
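With the sample input, either variant selects only the group containing the dummy object and prints its id (a quick check; input.json is just a placeholder name for the data above):
$ jq '.[] | select( any(.[]; .dummy == "This is dummy") ) | add | .id' input.json
"xyz"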

Parsing multiple key/values in json tree with jq

Using jq, I'd like to cherry-pick key/value pairs from the following json:
{
"project": "Project X",
"description": "This is a description of Project X",
"nodes": [
{
"name": "server001",
"detail001": "foo",
"detail002": "bar",
"networks": [
{
"net_tier": "network_tier_001",
"ip_address": "10.1.1.10",
"gateway": "10.1.1.1",
"subnet_mask": "255.255.255.0",
"mac_address": "00:11:22:aa:bb:cc"
}
],
"hardware": {
"vcpu": 1,
"mem": 1024,
"disks": [
{
"disk001": 40,
"detail001": "foo"
},
{
"disk002": 20,
"detail001": "bar"
}
]
},
"os": "debian8",
"geo": {
"region": "001",
"country": "Sweden",
"datacentre": "Malmo"
},
"detail003": "baz"
}
],
"detail001": "foo"
}
For the sake of an example, I'd like to parse the following keys and their values: "Project", "name", "net_tier", "vcpu", "mem", "disk001", "disk002".
I'm able to parse individual elements without much issue, but due to the hierarchical nature of the full parse, I've not had much luck parsing down different branches (i.e. both networks and hardware > disks).
Any help appreciated.
Edit:
For clarity, the output I'm going for is a comma-separated CSV. In terms of parsing all combinations, covering the sample data in the example will do for now. I will hopefully be able to expand on any suggestions.
Here is a different filter which computes the unique set of network tier and disk names and then generates a result with columns appropriate to the data.
{
tiers: [ .nodes[].networks[].net_tier ] | unique
, disks: [ .nodes[].hardware.disks[] | keys[] | select(startswith("disk")) ] | unique
} as $n
| def column_names($n): [ "project", "name" ] + $n.tiers + ["vcpu", "mem"] + $n.disks ;
def tiers($n): [ $n.tiers[] as $t | .networks[] | if .net_tier==$t then $t else null end ] ;
def disks($n): [ $n.disks[] as $d | map(select(.[$d]!=null)|.[$d])[0] ] ;
def rows($n):
.project as $project
| .nodes[]
| .name as $name
| tiers($n) as $tier_values
| .hardware
| .vcpu as $vcpu
| .mem as $mem
| .disks
| disks($n) as $disk_values
| [$project, $name] + $tier_values + [$vcpu, $mem] + $disk_values
;
column_names($n), rows($n)
| @csv
The benefit of this approach becomes apparent if we add another node to the sample data:
{
"name": "server002",
"networks": [
{
"net_tier": "network_tier_002"
}
],
"hardware": {
"vcpu": 1,
"mem": 1024,
"disks": [
{
"disk002": 40,
"detail001": "foo"
}
]
}
}
Sample Run (assuming filter in filter.jq and amended data in data.json)
$ jq -Mr -f filter.jq data.json
"project","name","network_tier_001","network_tier_002","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001","",1,1024,40,20
"Project X","server002",,"network_tier_002",1,1024,,40
Here's one way you could achieve the desired output.
program.jq:
["project","name","net_tier","vcpu","mem","disk001","disk002"],
[.project]
+ (.nodes[] | .networks[] as $n |
[
.name,
$n.net_tier,
(.hardware |
.vcpu,
.mem,
(.disks | add["disk001","disk002"])
)
]
)
| @csv
$ jq -r -f program.jq input.json
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
Basically, you'll want to project the fields that you want into arrays so you can convert those arrays to CSV rows. Your input makes it seem like there could be multiple networks for a given node; if you wanted to output all combinations, each network would have to produce its own row.
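For instance, if server001 had a second entry in its networks array with "net_tier": "network_tier_002", the filter above would emit one row per network, repeating the node's other fields (a hypothetical illustration, not part of the original data):
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
"Project X","server001","network_tier_002",1,1024,40,20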
Here's another approach that is short enough to speak for itself:
def s(f): first(.. | f? // empty) // null;
[s(.project), s(.name), s(.net_tier), s(.vcpu), s(.mem), s(.disk001), s(.disk002)]
| @csv
Invocation:
$ jq -r -f value-pairs.jq input.json
Result:
"Project X","server001","network_tier_001",1,1024,40,20
With headers
Using the same s/1 as above:
. as $d
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"]
| (., map( . as $v | $d | s(.[$v])))
| @csv
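With the single-node sample input this prints the header followed by the same row as before:
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20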
With multiple nodes
Again with s/1 as above:
.project as $p
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"] as $h
| ($h,
(.nodes[] as $d
| $h
| map( . as $v | $d | s(.[$v]) )
| .[0] = $p)
) | @csv
Output with the illustrative multi-node data:
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
"Project X","server002","network_tier_002",1,1024,,40