Extract certain array elements with their parent keys using jq

I would like to know how to use jq to extract patterns from a .json file
echo '{"parts": [{"name":"core","items":"garbage with ITEM1 ITEM2 and more"},{"name":"misc","items":"ITEM3 ITEM4 ITEM5 bla bla"} ]}' | jq '.parts | .[] | .items |=split(" ")'
{
"name": "core",
"items": [
"garbage",
"with",
"ITEM1",
"ITEM2",
"and",
"more"
]
}
{
"name": "misc",
"items": [
"ITEM3",
"ITEM4",
"ITEM5",
"bla",
"bla"
]
}
I thought of splitting the items, but I don't know how to extract each ITEMx.
I want to obtain this output:
{ "core","ITEM1" }
{ "core","ITEM2" }
{ "misc","ITEM3" }
{ "misc","ITEM4" }
{ "misc","ITEM5" }

Your desired output is not valid JSON.
Do you want the words to form an array under the value of the .name field?
jq '.parts[] | {(.name): (.items | split(" "))}'
{
"core": [
"garbage",
"with",
"ITEM1",
"ITEM2",
"and",
"more"
]
}
{
"misc": [
"ITEM3",
"ITEM4",
"ITEM5",
"bla",
"bla"
]
}
Or do you want each word to form a separate object?
jq '.parts[] | (.items | split(" "))[] as $word | {(.name): $word}'
{"core":"garbage"}
{"core":"with"}
{"core":"ITEM1"}
{"core":"ITEM2"}
{"core":"and"}
{"core":"more"}
{"misc":"ITEM3"}
{"misc":"ITEM4"}
{"misc":"ITEM5"}
{"misc":"bla"}
{"misc":"bla"}
To only capture words that match the regex ITEM\d+, you could employ the scan function instead of splitting:
jq '.parts[] | {(.name): .items | scan("ITEM\\d+")}'
{"core":"ITEM1"}
{"core":"ITEM2"}
{"misc":"ITEM3"}
{"misc":"ITEM4"}
{"misc":"ITEM5"}

Building on your attempt, we could try:
.parts
| .[]
| .items |= (split(" ") | map(select(test("ITEM"))))
| {(.name): .items[]}
This produces a stream of objects such as {"core":"ITEM1"}. If you really want the non-JSON output shown in the Q, it's easy enough to add the additional step.
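For what it's worth, the additional step could look something like the sketch below. It assumes the brace-and-comma layout from the Q is wanted literally, uses scan to pick out the ITEMx words, and relies on -r so the strings are printed without JSON quoting. The shell variables input and result are only there for illustration.

```shell
# Sample input from the question.
input='{"parts": [{"name":"core","items":"garbage with ITEM1 ITEM2 and more"},{"name":"misc","items":"ITEM3 ITEM4 ITEM5 bla bla"}]}'

# For each part: remember the name, scan the items string for ITEM<digits>,
# then build the non-JSON line by string interpolation (tojson re-adds the quotes).
result=$(printf '%s' "$input" | jq -r '
  .parts[]
  | .name as $name
  | .items
  | scan("ITEM\\d+")
  | "{ \($name | tojson),\(tojson) }"
')

printf '%s\n' "$result"
```

Using tojson inside the interpolation avoids hand-escaping the double quotes, and it also escapes any awkward characters in the name.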

Related

use jq to format json data into csv data

{
"Users": [
{
"Attributes": [
{
"Name": "sub",
"Value": "1"
},
{
"Name": "phone_number",
"Value": "1234"
},
{
"Name": "referral_code",
"Value": "abc"
}
]
},
{
"Attributes": [
{
"Name": "sub",
"Value": "2"
},
{
"Name": "phone_number",
"Value": "5678"
},
{
"Name": "referral_code",
"Value": "def"
}
]
}
]
}
How can I produce output like below ?
1,1234,abc
2,5678,def
jq '.Users[] .Attributes[] .Value' test.json
produces
1
1234
abc
2
5678
def
Not sure this is the cleanest way to handle this, but the following will get the desired output:
.Users[].Attributes | map(.Value) | @csv
Loop through each user's Attributes array with .Users[].Attributes
Use map(.Value) to collect all the Values
Convert the resulting array to CSV with @csv (run jq with -r to get raw output)
If you don't need the output to be guaranteed to be CSV, and if you're sure the "Name" values are presented in the same order, you could go with:
.Users[].Attributes
| from_entries
| [.[]]
| join(",")
To be safe though it would be better to ensure consistency of ordering:
(.Users[0] | [.Attributes[] | .Name]) as $keys
| .Users[]
| .Attributes
| from_entries
| [.[ $keys[] ]]
| join(",")
Using join(",") will produce the comma-separated values as shown in the Q (without the quotation marks), but is not guaranteed to produce the expected CSV for all valid values of the input. If you don't mind the pesky quotation marks, you could use @csv, or if you want to skip the quotation marks around all numeric values:
map(tonumber? // .) | @csv
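To make the difference concrete, here is a small sketch with a made-up row (not from the Q) whose middle value contains a comma. It compares the three variants discussed above; the variable names are illustrative.

```shell
# One hypothetical row whose middle value contains a comma.
row='["1","12,34","abc"]'

# join(",") simply concatenates, so the embedded comma looks like a separator.
joined=$(printf '%s' "$row" | jq -r 'join(",")')

# @csv quotes every string, so the row still reads as three fields.
quoted=$(printf '%s' "$row" | jq -r '@csv')

# Converting numeric-looking strings first leaves those values unquoted.
mixed=$(printf '%s' "$row" | jq -r 'map(tonumber? // .) | @csv')

printf '%s\n' "$joined" "$quoted" "$mixed"
```

With this row, the join(",") line ends up with four comma-separated pieces, while both @csv variants keep "12,34" together as a single quoted field.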

Extract inner array matching values using jq

My json is as shown below:
[
[
{
"id": "abcd"
},
{
"address": [
"140 Deco st"
]
}
],
[
{
"id": "xyz"
},
{
"dummy": "This is dummy"
}
],
[
{
"id": "12356"
},
{
"address": [
"140 Deco st"
]
}
]]
Now, I want to capture only those ids who have dummy value of "This is dummy". Some of the data may or may not have dummy and address fields.
I tried the command below, but it gave me the error "... cannot have their containment checked"
jq -c '.[] | .[] | select(.dummy | contains("This is dummy")) | .[] | .id'
Any help is much appreciated!
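contains is quite tricky to use correctly. Here it fails because .dummy is null for every object that lacks that key, and contains cannot compare null with a string. Since the requirement is: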
to capture only those ids who have dummy value of "This is dummy"
I would suggest:
.[]
| select( any(.[]; .dummy == "This is dummy") )
| add
| .id
or perhaps (depending on your detailed requirements):
.[]
| select( any(.[]; .dummy == "This is dummy") )
| .[]
| .id? // empty
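As a quick check, the first filter can be run against the sample data from the Q roughly as follows. This is only a sketch; the shell variables input and result are illustrative. Comparing with == never raises an error when .dummy is missing, because a missing key just yields null.

```shell
# Sample data from the question, compacted onto one line.
input='[[{"id":"abcd"},{"address":["140 Deco st"]}],[{"id":"xyz"},{"dummy":"This is dummy"}],[{"id":"12356"},{"address":["140 Deco st"]}]]'

# Keep only the inner arrays in which some object has the wanted .dummy value,
# merge that array's objects into one with add, and print its id.
result=$(printf '%s' "$input" | jq -r '
  .[]
  | select(any(.[]; .dummy == "This is dummy"))
  | add
  | .id
')

printf '%s\n' "$result"
```

Only the second inner array has a matching dummy value, so only its id is printed.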

jq: Insert values according to mappings from external file

I was wondering how I can complete this task with command-line jq. I made up a file with a similar nested structure, as follows:
{
"item": "item1",
"features": [
{
"feature": "feature_a",
"value": ""
},
{
"feature": "feature_b",
"value": ""
}
]
}
Now I have another file that maps the feature to value:
feature_a value_1
feature_b value_2
So I would like to insert the values into the first JSON file according to the mapping, resulting in the following output:
{
"item": "item1",
"features": [
{
"feature": "feature_a",
"value": "value_1"
},
{
"feature": "feature_b",
"value": "value_2"
}
]
}
How can I achieve the above operation with jq?
Thanks in advance!
Assuming the text file is in dict.txt and the JSON file is in source.json, the invocation
jq -Rs --argfile target source.json dict.txt '
([ split("\n")[]
| select(length>0)
| split(" ")
| { (.[0]): .[1]} ]
| add) as $dict
| $target
| .features |= map(.value = $dict[.feature])'
would yield the desired output.
The main reason for including select(length>0) is to skip any empty strings that might result from using split("\n") to split an entire file.
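An alternative sketch, assuming jq 1.6 or later (where --rawfile is available): read dict.txt into a string variable and keep source.json as the regular input. The temporary directory and the names $dict and $map are just for illustration; the dictionary-building logic is the same as above.

```shell
# Recreate the two files from the question in a scratch directory.
tmp=$(mktemp -d)
printf 'feature_a value_1\nfeature_b value_2\n' > "$tmp/dict.txt"
printf '%s\n' '{"item":"item1","features":[{"feature":"feature_a","value":""},{"feature":"feature_b","value":""}]}' > "$tmp/source.json"

# --rawfile binds the whole text file to $dict as a single string.
# Split it into lines, drop empty ones, and turn each "key value" line
# into a one-entry object; add merges those into a lookup table.
result=$(jq -c --rawfile dict "$tmp/dict.txt" '
  ( $dict
    | split("\n")
    | map(select(length > 0) | split(" ") | {(.[0]): .[1]})
    | add ) as $map
  | .features |= map(.value = $map[.feature])
' "$tmp/source.json")

printf '%s\n' "$result"
rm -rf "$tmp"
```

This variant avoids -R and -s on the main input, so the JSON document is parsed in the usual way and only the mapping file is treated as raw text.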

jq: translate array of objects to object

I have a response from curl in a format like this:
[
{
"list": [
{
"value": 1,
"id": 12
},
{
"value": 15,
"id": 13
},
{
"value": -4,
"id": 14
}
]
},
...
]
Given a mapping between ids like this:
{
"12": "newId1",
"13": "newId2",
"14": "newId3"
}
I want to make this:
[
{
"list": {
"newId1": 1,
"newId2": 15,
"newId3": -4
}
},
...
]
Such that I get a mapping from ids to values (and along the way I'd like to remap the ids).
I've been working at this for a while and every time I hit a dead end.
Note: I can use Shell or the like to perform loops if necessary.
edit: Here's one version of what I've developed so far:
jq '[].list.id = ($mapping.[] | select(.id == key)) | del(.id)' -M --argjson "mapping" "$mapping"
I don't think it's the best one, but I'm looking to see if I can find an old version that was closer to what I need.
[EDIT: The following response was in answer to the question when it described (a) the mapping as shown below, and (b) the input data as having the form:
[
{
"list": [
{
"value": 1,
"id1": 12
},
{
"value": 15,
"id2": 13
},
{
"value": -4,
"id3": 14
}
]
}
]
END OF EDIT]
In the following I'll assume that the mapping is available via the following function, but that is an inessential assumption:
def mapping: {
"id1": "newId1",
"id2": "newId2",
"id3": "newId3"
} ;
The following jq filter will then produce the desired output:
map( .list
|= (map( to_entries[]
| (mapping[.key]) as $mapped
| select($mapped)
| {($mapped|tostring): .value} )
| add) )
There are plenty of ways to skin a cat. I'd do it like this:
.[].list |= reduce .[] as $i ({};
($i.id|tostring) as $k
| (select($mapping | has($k))[$mapping[$k]] = $i.value) // .
)
You would just provide the mapping through a separate file or argument.
$ cat program.jq
.[].list |= reduce .[] as $i ({};
($i.id|tostring) as $k
| (select($mapping | has($k))[$mapping[$k]] = $i.value) // .
)
$ cat mapping.json
{
"12": "newId1",
"13": "newId2",
"14": "newId3"
}
$ jq --argfile mapping mapping.json -f program.jq input.json
[
{
"list": {
"newId1": 1,
"newId2": 15,
"newId3": -4
}
}
]
Here is a reduce-free solution to the revised problem.
In the following I'll assume that the mapping is available via the following function, but that is an inessential assumption:
def mapping:
{
"12": "newId1",
"13": "newId2",
"14": "newId3"
} ;
map( .list
|= (map( mapping[.id|tostring] as $mapped
| select($mapped)
| {($mapped): .value} )
| add) )
The "select" is for safety (i.e., it checks that the .id under consideration is indeed mapped). It might also be appropriate to ensure that $mapped is a string by writing {($mapped|tostring): .value}.
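If you'd rather pass the mapping on the command line, as in the Q's own attempt with --argjson, the same filter can be sketched like this. The extra element with id 99 is made up here to show the select at work; the shell variable names are illustrative.

```shell
# Mapping from the question, plus sample input with one unmapped id (99, hypothetical).
mapping='{"12":"newId1","13":"newId2","14":"newId3"}'
input='[{"list":[{"value":1,"id":12},{"value":15,"id":13},{"value":-4,"id":14},{"value":7,"id":99}]}]'

# For each list element, look up its id (as a string) in $mapping,
# skip elements with no mapping, emit {newId: value}, and merge with add.
result=$(printf '%s' "$input" | jq -c --argjson mapping "$mapping" '
  map( .list
       |= ( map( $mapping[.id | tostring] as $mapped
                 | select($mapped)
                 | {($mapped): .value} )
            | add ) )
')

printf '%s\n' "$result"
```

The element with id 99 has no entry in the mapping, so select drops it and it does not appear in the result.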

Parsing multiple key/values in json tree with jq

Using jq, I'd like to cherry-pick key/value pairs from the following json:
{
"project": "Project X",
"description": "This is a description of Project X",
"nodes": [
{
"name": "server001",
"detail001": "foo",
"detail002": "bar",
"networks": [
{
"net_tier": "network_tier_001",
"ip_address": "10.1.1.10",
"gateway": "10.1.1.1",
"subnet_mask": "255.255.255.0",
"mac_address": "00:11:22:aa:bb:cc"
}
],
"hardware": {
"vcpu": 1,
"mem": 1024,
"disks": [
{
"disk001": 40,
"detail001": "foo"
},
{
"disk002": 20,
"detail001": "bar"
}
]
},
"os": "debian8",
"geo": {
"region": "001",
"country": "Sweden",
"datacentre": "Malmo"
},
"detail003": "baz"
}
],
"detail001": "foo"
}
For the sake of an example, I'd like to parse the following keys and their values: "project", "name", "net_tier", "vcpu", "mem", "disk001", "disk002".
I'm able to parse individual elements without much issue, but due to the hierarchical nature of the full parse, I've not had much luck parsing down different branches (i.e. both networks and hardware > disks).
Any help appreciated.
Edit:
For clarity, the output I'm going for is a comma-separated CSV. In terms of parsing all combinations, covering the sample data in the example will do for now. I will hopefully be able to expand on any suggestions.
Here is a different filter which computes the unique set of network tier and disk names and then generates a result with columns appropriate to the data.
{
tiers: [ .nodes[].networks[].net_tier ] | unique
, disks: [ .nodes[].hardware.disks[] | keys[] | select(startswith("disk")) ] | unique
} as $n
| def column_names($n): [ "project", "name" ] + $n.tiers + ["vcpu", "mem"] + $n.disks ;
def tiers($n): [ $n.tiers[] as $t | .networks[] | if .net_tier==$t then $t else null end ] ;
def disks($n): [ $n.disks[] as $d | map(select(.[$d]!=null)|.[$d])[0] ] ;
def rows($n):
.project as $project
| .nodes[]
| .name as $name
| tiers($n) as $tier_values
| .hardware
| .vcpu as $vcpu
| .mem as $mem
| .disks
| disks($n) as $disk_values
| [$project, $name] + $tier_values + [$vcpu, $mem] + $disk_values
;
column_names($n), rows($n)
| @csv
The benefit of this approach becomes apparent if we add another node to the sample data:
{
"name": "server002",
"networks": [
{
"net_tier": "network_tier_002"
}
],
"hardware": {
"vcpu": 1,
"mem": 1024,
"disks": [
{
"disk002": 40,
"detail001": "foo"
}
]
}
}
Sample Run (assuming filter in filter.jq and amended data in data.json)
$ jq -Mr -f filter.jq data.json
"project","name","network_tier_001","network_tier_002","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001","",1,1024,40,20
"Project X","server002",,"network_tier_002",1,1024,,40
Here's one way you could achieve the desired output.
program.jq:
["project","name","net_tier","vcpu","mem","disk001","disk002"],
[.project]
+ (.nodes[] | .networks[] as $n |
[
.name,
$n.net_tier,
(.hardware |
.vcpu,
.mem,
(.disks | add["disk001","disk002"])
)
]
)
| @csv
$ jq -r -f program.jq input.json
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
Basically, you'll want to project the fields that you want into arrays so you may convert those arrays to csv rows. Your input makes it seem like there could potentially be multiple networks for a given node. So if you wanted to output all combinations, that would have to be flattened out.
Here's another approach, that is short enough to speak for itself:
def s(f): first(.. | f? // empty) // null;
[s(.project), s(.name), s(.net_tier), s(.vcpu), s(.mem), s(.disk001), s(.disk002)]
| @csv
Invocation:
$ jq -r -f value-pairs.jq input.json
Result:
"Project X","server001","network_tier_001",1,1024,40,20
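To see what s/1 does on its own, here is a sketch on a cut-down document (made-up data, not the full sample). s(f) walks every value with .. and returns the first truthy result of f, or null if there is none. Note that because of //, a value of false or null is treated the same as a missing key.

```shell
# A reduced, hypothetical version of the sample data.
input='{"project":"P","nodes":[{"name":"n1","hardware":{"vcpu":2,"disks":[{"disk001":40}]}}]}'

result=$(printf '%s' "$input" | jq -r '
  # s(f): first truthy value of f found anywhere in the input, else null.
  def s(f): first(.. | f? // empty) // null;
  [s(.project), s(.name), s(.vcpu), s(.disk001), s(.missing)]
  | @csv
')

printf '%s\n' "$result"
```

The last column is empty because .missing is found nowhere and @csv renders null as an empty field. If a key occurs at several depths (like detail001 in the sample data), only the first occurrence encountered by .. is used.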
With headers
Using the same s/1 as above:
. as $d
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"]
| (., map( . as $v | $d | s(.[$v])))
| @csv
With multiple nodes
Again with s/1 as above:
.project as $p
| ["project", "name", "net_tier", "vcpu", "mem", "disk001","disk002"] as $h
| ($h,
(.nodes[] as $d
| $h
| map( . as $v | $d | s(.[$v]) )
| .[0] = $p)
) | @csv
Output with the illustrative multi-node data:
"project","name","net_tier","vcpu","mem","disk001","disk002"
"Project X","server001","network_tier_001",1,1024,40,20
"Project X","server002","network_tier_002",1,1024,,40