jq - merge two JSON files on a value

I have two JSON files structured like this:
file 1
[
{
"id": 25422,
"location": "Hotel X",
"suppliers": [
12
]
},
{
"id": 25423,
"location": "Hotel Y",
"suppliers": [
13
]
}]
file 2
[
{
"id": 12,
"vatNumber": "0000000000"
},
{
"id": 14,
"vatNumber": "0000000001"
}]
and I'd like a result like this:
[
{
"id": 25422,
"location": "Hotel X",
"suppliers": [
12
],
"vatNumber": "0000000000"
},
{
"id": 25423,
"location": "Hotel Y",
"suppliers": [
13
]
}]
The important thing to me is that the matching vatNumbers are set in the objects from the first file. The suppliers arrays are no longer required after the merge, if dropping them simplifies the job.
Also, jq is not essential, but I need something I can run from the terminal in a script.
Thank you in advance.

Here's one of many possible solutions. If your jq does not have INDEX/2, then either upgrade your jq or include its def (available e.g. from https://github.com/stedolan/jq/blob/master/src/builtin.jq):
Invocation:
jq -n --argfile f1 file1.json --argfile f2 file2.json -f merge.jq
merge.jq:
INDEX($f2[] ; .id) as $dict
| $f1
| map( ($dict[.suppliers[0]|tostring]|.vatNumber) as $vn
| if $vn then .vatNumber = $vn else . end)
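Since the question says jq is not essential, the same lookup-table idea can also be sketched in Python. This is only an illustration under assumptions: the function name merge_vat is made up, the sample data is copied inline from the question instead of being read from file1.json and file2.json, and, like the jq program, it only looks at the first supplier id.

```python
import json

def merge_vat(locations, suppliers):
    """Copy the vatNumber of each location's first supplier, when one matches."""
    # Lookup table keyed by supplier id (the role INDEX plays in the jq program).
    vat_by_id = {s["id"]: s.get("vatNumber") for s in suppliers}
    merged = []
    for loc in locations:
        out = dict(loc)  # shallow copy so the input list is left untouched
        supplier_ids = loc.get("suppliers") or []
        vat = vat_by_id.get(supplier_ids[0]) if supplier_ids else None
        if vat:  # only set the key when a matching vatNumber exists
            out["vatNumber"] = vat
        merged.append(out)
    return merged

# Sample data copied from the question.
file1 = [
    {"id": 25422, "location": "Hotel X", "suppliers": [12]},
    {"id": 25423, "location": "Hotel Y", "suppliers": [13]},
]
file2 = [
    {"id": 12, "vatNumber": "0000000000"},
    {"id": 14, "vatNumber": "0000000001"},
]

merged = merge_vat(file1, file2)
print(json.dumps(merged, indent=2))
```

In a real script the two lists would come from json.load on the two files.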

Related

iterating through JSON files adding properties to each with jq

I am attempting to iterate through all my JSON files and add properties, but I am relatively new to jq.
here is what I am attempting:
find hashlips_art_engine/build -type f -name '*.json' | jq '. + {
"creators": [
{
"address": "4iUFmB3H3RZGRrtuWhCMtkXBT51iCUnX8UV7R8rChJsU",
"share": 10
},
{
"address": "2JApg1AXvo1Xvrk3vs4vp3AwamxQ1DHmqwKwWZTikS9w",
"share": 45
},
{
"address": "Zdda4JtApaPs47Lxs1TBKTjh1ZH2cptjxXMwrbx1CWW",
"share": 45
}
]
}'
However this is returning an error:
parse error: Invalid numeric literal at line 2, column 0
I have around 10,000 JSON files that I need to iterate over and add
{
"creators": [
{
"address": "4iUFmB3H3RZGRrtuWhCMtkXBT51iCUnX8UV7R8rChJsU",
"share": 10
},
{
"address": "2JApg1AXvo1Xvrk3vs4vp3AwamxQ1DHmqwKwWZTikS9w",
"share": 45
},
{
"address": "Zdda4JtApaPs47Lxs1TBKTjh1ZH2cptjxXMwrbx1CWW",
"share": 45
}
]
}
to. Is this possible, or am I barking up the wrong tree?
Thanks for your assistance with this. I have been searching the web for several hours now, but either my terminology is incorrect or there isn't much out there regarding this issue.
The problem is that you are piping the filenames to jq rather than making the contents available to jq.
Most likely you could use the following approach, e.g. if you want the augmented contents of each file to be handled separately:
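find ... | while IFS= read -r f ; do jq ... "$f" ; done
An alternative that might be relevant, provided the filenames contain no whitespace and the list fits within the shell's argument-length limit, would be: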
jq ... $(find ...)
If you have 2 files:
file01.json :
{"a":"1","b":"2"}
file02.json :
{"x":"10","y":"12","z":"15"}
you can:
for f in file*.json ; do jq '. + { creators: [{address: "xxx", share: 1}] }' "$f" ; done
result:
{
"a": "1",
"b": "2",
"creators": [
{
"address": "xxx",
"share": 1
}
]
}
{
"x": "10",
"y": "12",
"z": "15",
"creators": [
{
"address": "xxx",
"share": 1
}
]
}
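For comparison, here is a rough Python sketch of the same per-file loop. The helper name augment_files and the placeholder creators value are assumptions for illustration, and the demo writes two throwaway files into a temporary directory rather than touching hashlips_art_engine/build.

```python
import json
import tempfile
from pathlib import Path

def augment_files(root, extra):
    """Yield (path, augmented_object) for every *.json file under root."""
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            doc = json.load(fh)
        # Same effect as jq's `. + {...}`: keys in `extra` win on conflict.
        yield path, {**doc, **extra}

# Placeholder standing in for the real "creators" list from the question.
extra = {"creators": [{"address": "xxx", "share": 1}]}

with tempfile.TemporaryDirectory() as root:
    # Two small demo files, matching file01.json and file02.json above.
    Path(root, "file01.json").write_text('{"a":"1","b":"2"}', encoding="utf-8")
    Path(root, "file02.json").write_text('{"x":"10","y":"12","z":"15"}', encoding="utf-8")
    results = [(path.name, doc) for path, doc in augment_files(root, extra)]

for name, doc in results:
    print(name, json.dumps(doc))
```

To update the files in place, each augmented object could be written back with json.dump instead of being printed.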

Print only one property of an object that is within an array attribute as well as a property that is a sibling to the array property in jq

I have a json file that looks like so:
[
{
"code": "1234",
"files": [
{
"fileType": "pdf",
"url": "http://.../a.pdf"
},
{
"fileType": "video",
"url": "http://.../b.mp4"
}
]
},
{
"code": "4321",
"files": [
{
"fileType": "pdf",
"url": "http://.../c.pdf"
},
{
"fileType": "video",
"url": "http://.../d.mp4"
}
]
},
{
"code": "9999",
"files": [
{
"fileType": "pdf",
"url": "http://.../e.pdf"
}
]
}
]
I would like to print out only the files that are of fileType == video in the files array such that I end up with output that looks like so:
1234, "http://.../b.mp4"
4321, "http://.../d.mp4"
So far I am only able to output something that looks like this:
1234, "http://.../a.pdf", "http://.../b.mp4",
4321, "http://.../c.pdf", "http://.../d.mp4"
Using the following:
jq -r '.[] | select(.files[]?.fileType == "video") | [.code, .files[].url] | @csv'
I was wondering how I can filter the .files[] based on the fileType as I am outputting them?
The following pipeline makes the solution fairly self-explanatory, assuming one understands the basic syntax and the -r command-line option:
< input.json jq -r '
.[]
| .code as $code
| .files[]
| select(.fileType == "video")
| "\($code), \"\(.url)\""
'
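The key step in the pipeline above is remembering .code before descending into .files[]. A minimal Python sketch of the same nesting, with a hypothetical function name video_lines and the sample data from the question inlined, might look like this:

```python
def video_lines(items):
    """Return one 'code, "url"' line per file whose fileType is video."""
    lines = []
    for item in items:
        code = item["code"]  # remembered before descending, like `.code as $code`
        for entry in item.get("files", []):
            if entry.get("fileType") == "video":
                lines.append(f'{code}, "{entry["url"]}"')
    return lines

# Sample data copied from the question.
data = [
    {"code": "1234", "files": [
        {"fileType": "pdf", "url": "http://.../a.pdf"},
        {"fileType": "video", "url": "http://.../b.mp4"},
    ]},
    {"code": "4321", "files": [
        {"fileType": "pdf", "url": "http://.../c.pdf"},
        {"fileType": "video", "url": "http://.../d.mp4"},
    ]},
    {"code": "9999", "files": [
        {"fileType": "pdf", "url": "http://.../e.pdf"},
    ]},
]

for line in video_lines(data):
    print(line)
```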

Columnar CSV Output from nested JSON

[
{
"name": "Metadata:MER-2.0-ver AGYW_PREV-Results (Semi Annual)",
"id": "XOPEXepA7zg",
"categoryOptions.name": [
"0 -2 month",
">2months-<1 year",
"< 1 year",
"(1 - 4) Years",
"(1-9) Years"
],
"categoryOptions.id": [
"wfvXckoyaE9",
"Yi2K2FUDa3B",
"kKt6hryCX75",
"A0B8w6HoZvV",
"upbvx1IvICR"
]
},
{
"name": "Metadata:MER-2.0-ver KP-Results (Semi Annual)",
"id": "k9p3Ghbi6eW",
"categoryOptions.name": [
"Sex Workers",
"People in prisons and other enclosed settings (Incarcerated Population) ",
"PWID..",
"MSM",
"Transgender"
],
"categoryOptions.id": [
"mwTwhESK21T",
"eQjIwsDqbPy",
"zYaPQA3uTiH",
"vu0dG7psM5W",
"Jyo9XWumVtZ"
]
},
{
"name": "Metadata:MER-2.0-ver PP-Results (Semi Annual)",
"id": "rkExsSSc3yI",
"categoryOptions.name": [
"Adolescents (10-24)",
"Clients of Sex Workers",
"Displaced Persons",
"Fishing communities",
"Military and other Uniform Services"
],
"categoryOptions.id": [
"yWwp6xnt0pw",
"jlKwW6DC023",
"wF42hb47Z7J",
"qkIUghy30Vl",
"Vcuw6LkdAkk"
]
},
{
"name": "Metadata:MER-2.0-ver PREP_CURR-and-TX_ML (Semi Annual)",
"id": "ZYdO3FqQgo1",
"categoryOptions.name": [
"Adolescents (10-24)",
"Clients of Sex Workers",
"Displaced Persons",
"Fishing communities",
"Military and other Uniform Services"
],
"categoryOptions.id": [
"yWwp6xnt0pw",
"jlKwW6DC023",
"wF42hb47Z7J",
"qkIUghy30Vl",
"Vcuw6LkdAkk"
]
},
{
"name": "Metadata:MER-2.0-ver SupplyChain-Results (Semi Annual)",
"id": "Cub0DEVWs3P",
"categoryOptions.name": [
"TLD 30-count bottles",
"TLD 90-count bottles",
"TLD 180-count bottles",
"TLE/400 30-count bottles",
"TLE/400 90-count bottles"
],
"categoryOptions.id": [
"dtmTsLvH2dk",
"sOLj1z1XRxh",
"SnkZTF4kThV",
"sNnXSKiPvb5",
"t3iPChPFIcd"
]
}
]
Expected Output should be in csv format as below:
key,name,id,"categoryOptions.name","categoryOptions.id"
0,Metadata:MER-2.0-ver AGYW_PREV-Results (Semi Annual),XOPEXepA7zg,0 -2 month,wfvXckoyaE9
0,Metadata:MER-2.0-ver AGYW_PREV-Results (Semi Annual),XOPEXepA7zg,>2months-<1 year,Yi2K2FUDa3B
1,Metadata:MER-2.0-ver KP-Results (Semi Annual),k9p3Ghbi6eW,Sex Workers,mwTwhESK21T
1,Metadata:MER-2.0-ver KP-Results (Semi Annual),k9p3Ghbi6eW,People in prisons and other enclosed settings (Incarcerated Population),eQjIwsDqbPy
2,Metadata:MER-2.0-ver PP-Results (Semi Annual),rkExsSSc3yI,Adolescents (10-24),yWwp6xnt0pw
2,Metadata:MER-2.0-ver PP-Results (Semi Annual),rkExsSSc3yI,Clients of Sex Workers,jlKwW6DC023
and so on, up to key 4.
The input JSON above was produced by the following command:
jq '[.dataSets[]
| {name: .name, id: .id,
"categoryOptions.name": [.dataSetElements[].dataElement.categoryCombo.categories[].categoryOptions[].name],
"categoryOptions.id": [.dataSetElements[].dataElement.categoryCombo.categories[].categoryOptions[].id]}]' /home/fred/Downloads/metadata/multiple-dataset-metadata.json
Here is one solution to the problem as I understand it:
range(0;length) as $i
| .[$i]
| [$i, .name, .id] +
( range(0; .["categoryOptions.name"]|length) as $j
| [ .["categoryOptions.name"][$j], .["categoryOptions.id"][$j] ] )
| @csv
This produces everything except the header row, the production of which is left as an exercise.
Invocation
... would be along the lines of:
jq -r -f program.jq input.json
To add onto @peak's solution:
The final invocation (with CSV header) may look like this:
jq -r -f program.jq input.json > output.csv && sed -i '1i "key","name","id","categoryOptions.name","categoryOptions.id"' output.csv
The sed solution is picked from here
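As a cross-check of the row layout, including the header, here is a hedged Python sketch using the standard csv module. The function name to_csv is an assumption, and the sample is trimmed to two records with two options each. Unlike jq's @csv, csv.writer quotes a field only when it has to.

```python
import csv
import io

def to_csv(records):
    """Emit one row per (name, id) option pair, prefixed by the record index."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["key", "name", "id", "categoryOptions.name", "categoryOptions.id"])
    for key, rec in enumerate(records):
        # zip pairs the two parallel arrays and stops at the shorter one.
        pairs = zip(rec["categoryOptions.name"], rec["categoryOptions.id"])
        for opt_name, opt_id in pairs:
            writer.writerow([key, rec["name"], rec["id"], opt_name, opt_id])
    return buf.getvalue()

# Trimmed sample taken from the input shown in the question.
records = [
    {
        "name": "Metadata:MER-2.0-ver AGYW_PREV-Results (Semi Annual)",
        "id": "XOPEXepA7zg",
        "categoryOptions.name": ["0 -2 month", ">2months-<1 year"],
        "categoryOptions.id": ["wfvXckoyaE9", "Yi2K2FUDa3B"],
    },
    {
        "name": "Metadata:MER-2.0-ver KP-Results (Semi Annual)",
        "id": "k9p3Ghbi6eW",
        "categoryOptions.name": ["Sex Workers", "MSM"],
        "categoryOptions.id": ["mwTwhESK21T", "vu0dG7psM5W"],
    },
]

output = to_csv(records)
print(output, end="")
```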

jq - Find a JSON object based on one of its values and get another value from it

I've started using jq just very recently and I would like to know if something like this is even possible.
Example:
{
"name": "device",
"version": "1.0.0",
"address": [
{
"address": "10.1.2.3",
"interface": "wlan1_wifi"
},
{
"address": "10.1.2.5",
"interface": "wlan2_link"
},
{
"address": "10.1.2.4",
"interface": "ether1"
}
],
"wireless": [
{
"name": "wlan1_wifi",
"type": "5Ghz",
"ssid": "wifi"
},
{
"name": "wlan2_link",
"type": "2Ghz",
"ssid": "link"
}
]
}
Firstly let's transform the example to this json object:
cat json | jq '. | {"name": ."name", "version": ."version", "wireless": [."wireless"[] | {"name": ."name", "type": ."type", "ssid": ."ssid"}]}'
{
"name": "device",
"version": "1.0.0",
"wireless": [
{
"name": "wlan1_wifi",
"type": "5Ghz",
"ssid": "wifi"
},
{
"name": "wlan2_link",
"type": "2Ghz",
"ssid": "link"
}
]
}
Now there's a problem. I need to assign an address to the "wireless" array. The address is stored in "address" array.
So the question: is there a way of finding the right json object in "address" based on "name" (in wireless array) and "interface" (in address array) for every json object in "wireless" array and then assigning "address" to it?
The final result should look like this:
{
"name": "device",
"version": "1.0.0",
"wireless": [
{
"name": "wlan1_wifi",
"type": "5Ghz",
"ssid": "wifi",
"address": "10.1.2.3"
},
{
"name": "wlan2_link",
"type": "2Ghz",
"ssid": "link",
"address": "10.1.2.5"
}
]
}
Answer:
Here's my answer, based on the answer from @peak. Instead of copying the content of .wireless and then using map, I'm cherry-picking only the keys that I want to include. This also allows me to position "address" however I want.
(INDEX(.address[]; .interface)) as $dict
| {name: .name, version: .version,
wireless: [.wireless[] | {name, address: ($dict[.name]|.address), type, ssid}]}
The following produces the output as originally requested:
(.wireless[].name) as $name
| .address[]
| select(.interface == $name)
| { wireless: {name: $name, address}}
However the above filter could potentially produce more than one result, so you might want to make modifications accordingly.
Revised revised requirements
If your jq has INDEX/2 (which was only made available AFTER jq 1.5 was released), you can simply use it to create a lookup table:
(INDEX(.address[]; .interface)) as $dict
| {name,
version,
wireless: (.wireless
| map(. + {address: ($dict[.name]|.address) }) ) }
Or (depending perhaps on the exact requirements):
(INDEX(.address[]; .interface)) as $dict
| del(.address)
| .wireless |= map(. + {address: ($dict[.name]|.address) })
If your jq does not have INDEX/2, then you could easily adapt the above (using reduce), or even more easily snarf the def of INDEX/2 from https://github.com/stedolan/jq/blob/master/src/builtin.jq
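To make the lookup-table idea concrete outside jq, here is a small Python sketch of the same join. The function name attach_addresses is made up for illustration. As with INDEX, a later entry with the same interface overwrites an earlier one, and a wireless entry with no matching interface gets a null address (None in Python).

```python
import json

def attach_addresses(device):
    """Join .wireless[] to .address[] on name == interface."""
    # Lookup table keyed by interface name (what INDEX builds in jq).
    by_interface = {a["interface"]: a["address"] for a in device["address"]}
    return {
        "name": device["name"],
        "version": device["version"],
        "wireless": [
            {**w, "address": by_interface.get(w["name"])}
            for w in device["wireless"]
        ],
    }

# Sample data copied from the question.
device = {
    "name": "device",
    "version": "1.0.0",
    "address": [
        {"address": "10.1.2.3", "interface": "wlan1_wifi"},
        {"address": "10.1.2.5", "interface": "wlan2_link"},
        {"address": "10.1.2.4", "interface": "ether1"},
    ],
    "wireless": [
        {"name": "wlan1_wifi", "type": "5Ghz", "ssid": "wifi"},
        {"name": "wlan2_link", "type": "2Ghz", "ssid": "link"},
    ],
}

result = attach_addresses(device)
print(json.dumps(result, indent=2))
```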

Select or exclude multiple objects with an array of IDs

I have the following JSON :
[
{
"id": "1",
"foo": "bar-a",
"hello": "world-a"
},
{
"id": "2",
"foo": "bar-b",
"hello": "world-b"
},
{
"id": "10",
"foo": "bar-c",
"hello": "world-c"
},
{
"id": "42",
"foo": "bar-d",
"hello": "world-d"
}
]
And I have the following array stored in a variable: ["1", "2", "56", "1337"] (note that the IDs are strings and may contain any regular character).
Thanks to this SO answer, I found a way to filter my original data. jq '[.[] | select(.id == ("1", "2", "56", "1337"))]' ./data.json (note that the IDs are surrounded by parentheses, not brackets) produces:
[
{
"id": "1",
"foo": "bar-a",
"hello": "world-a"
},
{
"id": "2",
"foo": "bar-b",
"hello": "world-b"
}
]
But I would also like to do the opposite (basically excluding IDs instead of selecting them). Using select(.id != ("1", "2", "56", "1337")) doesn't work, and using jq '[. - [.[] | select(.id == ("1", "2", "56", "1337"))]]' ./data.json seems very ugly and doesn't work with my actual data (the output of aws ec2 describe-instances).
Do you have any idea how to do that? Thank you!
To include them, you need to verify that the id is any of the values in the keep set.
$ jq --argjson include '["1", "2", "56", "1337"]' 'map(select(.id == $include[]))' ...
To exclude them, you need to verify that all values are not in your excluded set. But it might just be easier to take the original set and remove the items that are in the excluded set.
$ jq --argjson exclude '["1", "2", "56", "1337"]' '. - map(select(.id == $exclude[]))' ...
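Here is a solution that uses inside. Note that inside relies on contains, which matches strings by substring, so an id such as "3" would also match "1337"; if exact matching matters, the equality-based filters above are safer. Assuming you run jq as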
jq -M --argjson IDS '["1","2","56","1337"]' -f filter.jq data.json
This filter.jq
map( select([.id] | inside($IDS)) )
produces the objects from data.json whose ids are in the $IDS array:
[
{
"id": "1",
"foo": "bar-a",
"hello": "world-a"
},
{
"id": "2",
"foo": "bar-b",
"hello": "world-b"
}
]
and this filter.jq
map( select([.id] | inside($IDS) | not) )
produces the objects from data.json whose ids are not in the $IDS array:
[
{
"id": "10",
"foo": "bar-c",
"hello": "world-c"
},
{
"id": "42",
"foo": "bar-d",
"hello": "world-d"
}
]
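The include/exclude split can also be sketched with exact set membership, shown here in Python as an illustration only. The function name partition_by_id is an assumption, and the sample objects are abbreviated from the question.

```python
def partition_by_id(items, ids):
    """Split items into (kept, dropped) by exact membership of .id in ids."""
    wanted = set(ids)  # exact string equality, no substring matching
    kept = [item for item in items if item["id"] in wanted]
    dropped = [item for item in items if item["id"] not in wanted]
    return kept, dropped

# Sample data abbreviated from the question.
data = [
    {"id": "1", "foo": "bar-a"},
    {"id": "2", "foo": "bar-b"},
    {"id": "10", "foo": "bar-c"},
    {"id": "42", "foo": "bar-d"},
]
ids = ["1", "2", "56", "1337"]

kept, dropped = partition_by_id(data, ids)
print([item["id"] for item in kept])
print([item["id"] for item in dropped])
```

Because membership is tested with a set, an id such as "3" is not treated as matching "1337".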