Extract data from JSON and insert it as new fields using jq

I have a database in a JSON file. I have already sorted it and removed some data from the objects using jq,
but I'm stuck at adding new fields to the objects.
Here is part of my JSON file:
{
"Name": "Forrest.Gump.1994.MULTi.1080p.AMZN.WEB-DL.DDP5.1.H264-Ao",
"ID": "SMwIkBoC2blXeWnBa9Hjge9YPs90"
},
{
"Name": "Point.Blank.2019.MULTi.1080p.NF.WEB-DL.DDP5.1.x264-Ao",
"ID": "OZI4mOuBXuJ7b89FLgXJoozyhHe9"
},
{
"Name": "The.Incredible.Hulk.2008.MULTi.2160p.UHD.BluRay.REMUX.HDR.HEVC.DTS-HD.MA.7.1",
"ID": "jZzR4_B_vjm593cYKR7j97XAMv6d"
},
Is it possible, using jq and for example a regular expression, to extract some data and insert it as new fields in each object? I would like to achieve something like this:
{
"Name": "Forrest.Gump.1994.MULTi.1080p.AMZN.WEB-DL.DDP5.1.H264-Ao",
"ID": "SMwIkBoC2blXeWnBa9Hjge9YPs90",
"Year": "1994",
"Res": "1080p"
},
{
"Name": "Point.Blank.2019.MULTi.1080p.NF.WEB-DL.DDP5.1.x264-Ao",
"ID": "OZI4mOuBXuJ7b89FLgXJoozyhHe9",
"Year": "2019",
"Res": "1080p"
},
{
"Name": "The.Incredible.Hulk.2008.MULTi.2160p.UHD.BluRay.REMUX.HDR.HEVC.DTS-HD.MA.7.1",
"ID": "jZzR4_B_vjm593cYKR7j97XAMv6d",
"Year": "2008",
"Res": "2160p"
},
Thanks in advance

Here's one solution that assumes for simplicity that the fragment you've shown comes from an array:
map( . as $in
| .Name | capture(".*[.](?<year>[12][0-9]{3})[.](?<rest>.*)")
| .year as $year
| (.rest | split(".") | .[1]) as $res
| $in + {Year: $year, Res: $res} )
Hopefully, once you're familiar with some jq basics, such as map, capture, and the EXP as $var syntax, the above will be more-or-less self-explanatory.
As a one-liner
Here's the same thing written as a single expression:
map(. + (.Name | capture(".*[.](?<Year>[12][0-9]{3})[.](?<Res>.*)") | {Year, Res: (.Res | split(".")[1])}))
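For anyone who wants to check the pattern outside jq, here is a minimal Python sketch of the same idea. It makes the same assumptions as the answer: the year is a four-digit token starting with 1 or 2, and the resolution is the second dot-separated token after the year. The function name add_year_and_res is hypothetical, and unlike the jq version (where capture emits nothing on a non-match, so the item is dropped) this sketch keeps non-matching items unchanged.

```python
import re

# Same pattern as the jq answer: greedy prefix, a 4-digit year between dots,
# then everything after the year.
PATTERN = re.compile(r".*[.](?P<year>[12][0-9]{3})[.](?P<rest>.*)")


def add_year_and_res(item):
    """Return a copy of item with Year and Res derived from its Name."""
    match = PATTERN.match(item["Name"])
    if match is None:
        # Deviation from the jq version: keep the item instead of dropping it.
        return dict(item)
    rest_tokens = match.group("rest").split(".")
    # Assumption: the first token after the year is a tag such as MULTi,
    # and the second one is the resolution.
    res = rest_tokens[1] if len(rest_tokens) > 1 else None
    return {**item, "Year": match.group("year"), "Res": res}


movies = [
    {"Name": "Forrest.Gump.1994.MULTi.1080p.AMZN.WEB-DL.DDP5.1.H264-Ao",
     "ID": "SMwIkBoC2blXeWnBa9Hjge9YPs90"},
    {"Name": "The.Incredible.Hulk.2008.MULTi.2160p.UHD.BluRay.REMUX.HDR.HEVC.DTS-HD.MA.7.1",
     "ID": "jZzR4_B_vjm593cYKR7j97XAMv6d"},
]
enriched = [add_year_and_res(m) for m in movies]
```

Because the prefix is greedy, the last dot-delimited four-digit token wins, which is worth keeping in mind for titles that themselves contain a year-like number.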

Related

Using jq: how can I reuse the same search in another field without duplicating code?

I have the following JSON file, for example:
{
"FOO": {
"name": "Donald",
"location": "Stockholm"
},
"BAR": {
"name": "Walt",
"location": "Stockholm"
},
"BAZ": {
"name": "Jack",
"location": "Whereever"
}
}
and I have this jq command:
cat json | jq '.[] | {newname : select(.location=="Stockholm") | .name , contains_w : select(.location=="Stockholm") | .name | startswith("W")}'
which gives this result:
{
"newname": "Donald",
"contains_w": false
}
{
"newname": "Walt",
"contains_w": true
}
My question is: is there any way to DRY up my command?
That is, how can I get the same result without duplicating this part:
select(.location=="Stockholm") | .name
How can I reuse the result of the newname field?
I have a really big file to work with, so I don't want to waste time and resources.
You are filtering multiple times during object construction. You could filter first and then do the construction on the filtered list, e.g.:
map(select(.location=="Stockholm"))
| map({newname: .name, contains_w: (.name | startswith("W"))})
https://jqplay.org/s/aXjlgOEDnb
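The same "filter once, then build" shape can be sketched in Python, which may help if the two-stage pipeline is unfamiliar. This is only an illustration of the idea using the sample data from the question; the variable names are my own.

```python
data = {
    "FOO": {"name": "Donald", "location": "Stockholm"},
    "BAR": {"name": "Walt", "location": "Stockholm"},
    "BAZ": {"name": "Jack", "location": "Whereever"},
}

# Step 1: apply the location test exactly once per entry.
in_stockholm = [v for v in data.values() if v["location"] == "Stockholm"]

# Step 2: build the output objects from the already-filtered list,
# reusing the name instead of repeating the selection.
result = [
    {"newname": v["name"], "contains_w": v["name"].startswith("W")}
    for v in in_stockholm
]
```

Note that, like the two map calls above, this collects the results into one list rather than emitting them as separate values.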

How to output all the keys and values from JSON using jq?

I am trying to output all the data from my JSON file that matches the value "data10=true". It does that, but it only grabs the names. How can I make it output everything in my JSON file for anything that matches "data10=true"?
This is what I've got:
data=$(jq -c 'to_entries[] | select (.value.data10 == "true")| [.key, .value.name]' data.json )
This is in my YAML template, by the way; I'm running it as a pipeline in DevOps.
The detailed requirements are unclear, but hopefully you'll be able to use the following jq program as a guide:
..
| objects
| select( .data10 == "true" )
| to_entries[]
| select(.key != "data10")
| [.key, .value]
This will recursively (thanks to the initial ..) examine all the JSON objects in the input.
p.s.
If you want to make the selection based on whether .data10 is "true" or true, you could change the criterion to .data10 | . == true or . == "true".
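To make the recursive behaviour concrete, here is a rough Python equivalent of the program above, including the relaxed test that accepts both "true" and true. The helper names iter_objects and matching_pairs are hypothetical, and the BAZ entry with a real boolean is my own variation on the sample data.

```python
def iter_objects(node):
    """Yield every dict found anywhere inside node, like jq's `.. | objects`."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_objects(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_objects(value)


def matching_pairs(doc):
    """Collect [key, value] pairs (except data10) from objects whose data10 is "true" or True."""
    pairs = []
    for obj in iter_objects(doc):
        flag = obj.get("data10")
        if flag is True or flag == "true":
            pairs.extend([k, v] for k, v in obj.items() if k != "data10")
    return pairs


doc = {
    "FOO": {"data10": "false", "name": "Donald", "location": "Stockholm"},
    "BAR": {"data10": "true", "name": "Walt", "location": "Stockholm"},
    # Hypothetical variant with a real boolean, to exercise the relaxed test.
    "BAZ": {"data10": True, "name": "Jack", "location": "Whereever"},
}
pairs = matching_pairs(doc)
```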
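Alternatively, to keep the matching entries together with their keys:
jq 'to_entries | map(select(.value.data10=="true")) | from_entries' data.json
Input data.json (note that one entry has data10 set to "false"):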
{
"FOO": {
"data10": "false",
"name": "Donald",
"location": "Stockholm"
},
"BAR": {
"data10": "true",
"name": "Walt",
"location": "Stockholm"
},
"BAZ": {
"data10": "true",
"name": "Jack",
"location": "Whereever"
}
}
output:
{
"BAR": {
"data10": "true",
"name": "Walt",
"location": "Stockholm"
},
"BAZ": {
"data10": "true",
"name": "Jack",
"location": "Whereever"
}
}
based on: https://stackoverflow.com/a/37843822/983325
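For comparison, the to_entries / select / from_entries round trip corresponds to a plain dictionary filter. A small Python sketch using the same input, with variable names of my own choosing:

```python
data = {
    "FOO": {"data10": "false", "name": "Donald", "location": "Stockholm"},
    "BAR": {"data10": "true", "name": "Walt", "location": "Stockholm"},
    "BAZ": {"data10": "true", "name": "Jack", "location": "Whereever"},
}

# Keep each key together with its whole value when data10 is the string "true".
matching = {
    key: value
    for key, value in data.items()
    if value.get("data10") == "true"
}
```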

jq - Find a JSON object based on one of its values and get another value from it

I've started using jq just very recently and I would like to know if something like this is even possible.
Example:
{
"name": "device",
"version": "1.0.0",
"address": [
{
"address": "10.1.2.3",
"interface": "wlan1_wifi"
},
{
"address": "10.1.2.5",
"interface": "wlan2_link"
},
{
"address": "10.1.2.4",
"interface": "ether1"
}
],
"wireless": [
{
"name": "wlan1_wifi",
"type": "5Ghz",
"ssid": "wifi"
},
{
"name": "wlan2_link",
"type": "2Ghz",
"ssid": "link"
}
]
}
Firstly let's transform the example to this json object:
cat json | jq '. | {"name": ."name", "version": ."version", "wireless": [."wireless"[] | {"name": ."name", "type": ."type", "ssid": ."ssid"}]}'
{
"name": "device",
"version": "1.0.0",
"wireless": [
{
"name": "wlan1_wifi",
"type": "5Ghz",
"ssid": "wifi"
},
{
"name": "wlan2_link",
"type": "2Ghz",
"ssid": "link"
}
]
}
Now there's a problem. I need to assign an address to the "wireless" array. The address is stored in "address" array.
So the question: is there a way of finding the right json object in "address" based on "name" (in wireless array) and "interface" (in address array) for every json object in "wireless" array and then assigning "address" to it?
The final result should look like this:
{
"name": "device",
"version": "1.0.0",
"wireless": [
{
"name": "wlan1_wifi",
"type": "5Ghz",
"ssid": "wifi",
"address": "10.1.2.3"
},
{
"name": "wlan2_link",
"type": "2Ghz",
"ssid": "link",
"address": "10.1.2.5"
}
]
}
Answer:
Here's my answer, based on the answer from @peak. Instead of copying the content of .wireless and then using map, I'm cherry-picking only the keys that I want to include. This also allows me to position "address" however I want.
(INDEX(.address[]; .interface)) as $dict
| {name: .name, version: .version,
wireless: [.wireless[] | {name, address: ($dict[.name]|.address), type, ssid}]}
The following produces the output as originally requested:
(.wireless[].name) as $name
| .address[]
| select(.interface == $name)
| { wireless: {name: $name, address}}
However the above filter could potentially produce more than one result, so you might want to make modifications accordingly.
Revised revised requirements
If your jq has INDEX/2 (which was only made available AFTER jq 1.5 was released), you can simply use it to create a lookup table:
(INDEX(.address[]; .interface)) as $dict
| {name,
version,
wireless: (.wireless
| map(. + {address: ($dict[.name]|.address) }) ) }
Or (depending perhaps on the exact requirements):
(INDEX(.address[]; .interface)) as $dict
| del(.address)
| .wireless |= map(. + {address: ($dict[.name]|.address) })
If your jq does not have INDEX/2, then you could easily adapt the above (using reduce), or even more easily snarf the def of INDEX/2 from https://github.com/stedolan/jq/blob/master/src/builtin.jq
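If the lookup-table idea is new, it may help to see it outside jq. The following Python sketch builds the same kind of dictionary that INDEX(.address[]; .interface) produces and then merges the address into each wireless entry. It is only an illustration; by_interface and the other variable names are my own.

```python
device = {
    "name": "device",
    "version": "1.0.0",
    "address": [
        {"address": "10.1.2.3", "interface": "wlan1_wifi"},
        {"address": "10.1.2.5", "interface": "wlan2_link"},
        {"address": "10.1.2.4", "interface": "ether1"},
    ],
    "wireless": [
        {"name": "wlan1_wifi", "type": "5Ghz", "ssid": "wifi"},
        {"name": "wlan2_link", "type": "2Ghz", "ssid": "link"},
    ],
}

# Lookup table: interface name -> address entry.
# If two entries share an interface, the later one overwrites the earlier one.
by_interface = {entry["interface"]: entry for entry in device["address"]}

result = {
    "name": device["name"],
    "version": device["version"],
    "wireless": [
        # A wireless entry without a matching interface gets address None,
        # comparable to the null jq would produce.
        {**w, "address": (by_interface.get(w["name"]) or {}).get("address")}
        for w in device["wireless"]
    ],
}
```

Building the table once and then doing one lookup per wireless entry avoids rescanning the address array for every entry.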

How can I filter by a numeric field using jq?

I am writing a script to query the Bitbucket API and delete SNAPSHOT artifacts that have never been downloaded. The script is failing because it gets ALL snapshot artifacts; the select on the number of downloads does not appear to be working.
What is wrong with my select statement to filter objects by the number of downloads?
Of course the more direct solution here would be if I could just query the Bitbucket API with a filter. To the best of my knowledge the API does not support filtering by downloads.
My script is:
#!/usr/bin/env bash
curl -X GET --user "me:mykey" "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads?pagelen=100" > downloads.json
# get all values | reduce the set to just be name and downloads | select entries where downloads is zero | select entries where name contains SNAPSHOT | just get the name
#TODO i screwed up the selection somewhere its returning files that contain SNAPSHOT regardless of number of downloads
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads==0) | select(.name | contains("SNAPSHOT")) | .name' downloads.json > snapshots_without_any_downloads.js
#unique sort, not sure why jq gives me multiple values
sort -u snapshots_without_any_downloads.js | tr -d '"' > unique_snapshots_without_downloads.js
cat unique_snapshots_without_downloads.js | xargs -t -I % curl -Ss -X DELETE --user "me:mykey" "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/%" > deleted_files.txt
A deidentified sample of the raw input from the API is:
{
"pagelen": 10,
"size": 40,
"values": [
{
"name": "myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip"
}
},
"downloads": 2,
"created_on": "2018-03-15T17:50:00.157310+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430894
},
{
"name": "myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip"
}
},
"downloads": 0,
"created_on": "2018-03-15T17:50:00.157310+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430894
},
{
"name": "myproject_1.0_mc_3.5.1.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.1.zip"
}
},
"downloads": 5,
"created_on": "2018-03-15T17:49:14.885544+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430934
}
],
"page": 1,
"next": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads?pagelen=10&page=2"
}
The output I want from this snippet is myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip - that artifact is a SNAPSHOT and has zero downloads.
I have used this intermediate step to do some debugging:
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads>0) | select(.name | contains("SNAPSHOT")) | unique' downloads.json > snapshots_with_downloads.js
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads==0) | select(.name | contains("SNAPSHOT")) | .name' downloads.json > snapshots_without_any_downloads.js
#this returns the same values for each list!
diff unique_snapshots_with_downloads.js unique_snapshots_without_downloads.js
This adjustment gives a cleaner and unique structure, which suggests that there's some sort of splitting or streaming aspect of jq that I do not fully understand:
#this returns a "unique" array like I expect, adding select to this still does not produce the desired outcome
jq '.values | [{name: .[].name, downloads: .[].downloads}] | unique' downloads.json
The data after this step looks like this. It just removed the cruft I didn't need from the raw API response:
[
{
"name": "myproject_1.0_2400a51_mc_3.4.0.zip",
"downloads": 0
},
{
"name": "myproject_1.0_2400a51_mc_3.4.1.zip",
"downloads": 2
},
{
"name": "myproject_1.1-SNAPSHOT_391f4d5_mc_3.5.0.zip",
"downloads": 0
},
{
"name": "myproject_1.1-SNAPSHOT_391f4d5_mc_3.5.1.zip",
"downloads": 2
}
]
As I understand it:
You want globally unique outputs
You want only items with downloads==0
You want only items whose name contains "SNAPSHOT"
The following will accomplish that:
jq -r '
[.values[] | {(.name): .downloads}]
| add
| to_entries[]
| select(.value == 0)
| .key | select(contains("SNAPSHOT"))'
Rather than making unique an explicit step, this version generates a map from names to download counters. The single-key objects are merged with add, which means that if a name occurs more than once, the last value wins. Since object keys cannot repeat, this also ensures that the outputs are unique.
Given your test JSON, output is:
myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip
Applied to the overall problem context, this strategy can be used to simplify the overall process:
jq -r '[.values[] | {(.links.self.href): .downloads}] | add | to_entries[] | select(.value == 0) | .key | select(contains("SNAPSHOT"))'
It simplifies the overall process by acting on the URL to the file rather than the name only. This simplifies the subsequent DELETE call. The sort and tr calls can also be removed.
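The "merge into one map, last value wins" step is easy to misread as a sum, so here is a short Python sketch of what it does. The records are trimmed copies of the sample input (name and downloads only), and the variable names are my own.

```python
values = [
    {"name": "myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip", "downloads": 2},
    {"name": "myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip", "downloads": 0},
    {"name": "myproject_1.0_mc_3.5.1.zip", "downloads": 5},
]

# Merging the single-key objects: a repeated name overwrites the earlier
# value instead of adding to it, and each name can appear only once.
counts = {}
for v in values:
    counts[v["name"]] = v["downloads"]

unused_snapshots = [
    name for name, downloads in counts.items()
    if downloads == 0 and "SNAPSHOT" in name
]
```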
Here's a solution which sums up the .downloads values per .name before making the selection based on the total number of downloads:
reduce (.values[] | select(.name | contains("SNAPSHOT"))) as $v
({}; .[$v.name] += $v.downloads)
| with_entries(select(.value == 0))
| keys_unsorted[]
Example:
$ jq -r -f program.jq input.json
myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip
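A Python sketch of the same summing approach, in case the reduce syntax is hard to follow. The first three records are trimmed from the sample input; the fourth is a hypothetical repeat that I added only to show that totals are accumulated rather than overwritten.

```python
from collections import defaultdict

values = [
    {"name": "myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip", "downloads": 2},
    {"name": "myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip", "downloads": 0},
    {"name": "myproject_1.0_mc_3.5.1.zip", "downloads": 5},
    # Hypothetical repeat of the first record, not part of the original sample.
    {"name": "myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip", "downloads": 0},
]

# Accumulate downloads per name, looking only at SNAPSHOT artifacts.
totals = defaultdict(int)
for v in values:
    if "SNAPSHOT" in v["name"]:
        totals[v["name"]] += v["downloads"]

# Keep the names whose total is still zero.
never_downloaded = [name for name, total in totals.items() if total == 0]
```

With summing, a later record showing 0 downloads cannot hide an earlier record showing 2, which is the practical difference from a last-value-wins merge.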
p.s.
What is wrong with my select statement ...?
The problem that jumps out is the bit of the pipeline just before the "select" filter:
.values | {name: .[].name, downloads: .[].downloads}
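The use of .[] in this manner results in the Cartesian product being formed: the above expression will emit n*n JSON objects, where n is the length of .values. Every name is therefore paired with every downloads value, including the zeros that belong to other files, which is why the select appears to have no effect. You evidently intended to write: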
.values[] | {name: .name, downloads: .downloads}
which can be abbreviated to:
.values[] | {name, downloads}
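The difference between the two forms can be reproduced in a few lines of Python. This is a sketch with two placeholder records ("a" and "b" are made up for brevity); it shows how the crossed form pairs a downloaded file with another file's zero.

```python
from itertools import product

values = [
    {"name": "a", "downloads": 0},
    {"name": "b", "downloads": 2},
]

# Counterpart of {name: .[].name, downloads: .[].downloads}:
# every name is combined with every downloads value.
crossed = [
    {"name": x["name"], "downloads": y["downloads"]}
    for x, y in product(values, values)
]

# Counterpart of .values[] | {name, downloads}:
# one output per input record, fields taken from the same record.
per_item = [{"name": v["name"], "downloads": v["downloads"]} for v in values]
```

Here crossed contains an object pairing "b" with 0 downloads even though "b" was downloaded twice, while per_item does not.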

create an object from an existing json file using 'jq'

I have a messages.json file
[
{
"id": "title",
"description": "This is the Title",
"defaultMessage": "title",
"filepath": "src/title.js"
},
{
"id": "title1",
"description": "This is the Title1",
"defaultMessage": "title1",
"filepath": "src/title1.js"
},
{
"id": "title2",
"description": "This is the Title2",
"defaultMessage": "title2",
"filepath": "src/title2.js"
},
{
"id": "title2",
"description": "This is the Title2",
"defaultMessage": "title2",
"filepath": "src/title2.js"
}
]
I want to create an object
{
"title": "Dummy1",
"title1": "Dummy2",
"title2": "Dummy3",
"title3": "Dummy4"
}
from the array above.
So far I have
jq '.[] | .id' src/messages.json;
And it does give me the IDs.
How do I add some random text and make the new object as above?
Can we also create a new JSON file and write the newly created object onto it using jq?
Your output included "title3", so I'll assume that the second occurrence of "title2" in the input was meant to be "title3".
With this assumption, the following jq program seems to do what you want:
map( .id )
| . as $in
| reduce range(0;length) as $i ({};
. + {($in[$i]): "dummy\(1+$i)"})
In words, extract the values of .id, and then turn each into an object of the form: {(.id) : "dummy\(1+$i)"}
This uses string interpolation, and produces:
{
"title": "dummy1",
"title1": "dummy2",
"title2": "dummy3",
"title3": "dummy4"
}
reduce-free solution
map(.id )
| [., [range(0;length)]]
| transpose
| map( {(.[0]): "dummy\(.[1]+1)"})
| add
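Both programs pair each id with its position and build one object from the pairs. A compact Python sketch of that pairing, under the same assumption that the fourth id is "title3" (only the id field is kept here for brevity):

```python
messages = [
    {"id": "title"},
    {"id": "title1"},
    {"id": "title2"},
    {"id": "title3"},  # assumed correction of the duplicated "title2"
]

# enumerate(..., start=1) plays the role of range(0;length) plus the "+1".
result = {m["id"]: f"dummy{i}" for i, m in enumerate(messages, start=1)}
```

If the duplicated "title2" were left in place, the later entry would overwrite the earlier one and the result would have only three keys.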
Output
Can we also create a new json file and write the newly created object onto it using jq?
Yes, just use output redirection:
jq -f program.jq messages.json > output.json
Addendum
I want a parent object "de" to the already created json file objects
You could just pipe either of the above solutions to: {de: .}
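For completeness, here is what wrapping the result in a parent "de" object and writing it to a file looks like as a Python sketch. The file name output.json comes from the answer above; writing into a temporary directory is my own choice so that the example does not overwrite anything.

```python
import json
import os
import tempfile

result = {
    "title": "dummy1",
    "title1": "dummy2",
    "title2": "dummy3",
    "title3": "dummy4",
}

# Counterpart of piping the result to {de: .}
wrapped = {"de": result}

# Write the wrapped object to output.json inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "output.json")
with open(out_path, "w", encoding="utf-8") as fh:
    json.dump(wrapped, fh, indent=2)

# Read it back to confirm that the file holds the expected structure.
with open(out_path, encoding="utf-8") as fh:
    round_tripped = json.load(fh)
```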