Compare 2 JSON-files and create a new key if values match - json

I have 2 sets of JSON-files looking like below, data-A.json and data-B.json.
I need to somehow compare the key URL in data-A.json with the same key in data-B.json. Where there is a match, take the value of the key Position in data-A.json and write it to a new key PreviousPosition in data-B.json. If there is no matching URL, write a null value for this new key in data-B.json.
Please see examples:
data-A.json
[
{
"Position": "1",
"TrackName": "One hit wonder",
"URL": "https://domain.local/xyz123"
},
{
"Position": "2",
"TrackName": "Random song",
"URL": "https://domain.local/123qwe"
},
{
"Position": "3",
"TrackName": "Dueling banjos",
"URL": "https://domain.local/asd456"
}
]
data-B.json
[
{
"Position": "1",
"TrackName": "Rocket",
"URL": "https://domain.local/nbs678"
},
{
"Position": "2",
"TrackName": "Dueling banjos",
"URL": "https://domain.local/asd456"
},
{
"Position": "3",
"TrackName": "One hit wonder",
"URL": "https://domain.local/xyz123"
}
]
(desired) data-B.json
[
{
"Position": "1",
"TrackName": "Rocket",
"URL": "https://domain.local/nbs678",
"PreviousPosition": null
},
{
"Position": "2",
"TrackName": "Dueling banjos",
"URL": "https://domain.local/asd456",
"PreviousPosition": "3"
},
{
"Position": "3",
"TrackName": "One hit wonder",
"URL": "https://domain.local/xyz123",
"PreviousPosition": "1"
}
]
I have made some mediocre attempts to solve this using jq, with no luck. I also tried some PowerShell and Python, but I just can't figure it out.
Any suggestions?

If a straightforward, two-line solution is what you're looking for, then jq is a good choice:
(INDEX($A[]; .URL) | map_values(.Position)) as $dict
| map( .PreviousPosition = $dict[ .URL ] )
This is perhaps more straightforward than it looks, as the expression in the first line is a commonly found idiom (namely INDEX(...) | map_values(...)) for creating a dictionary. In the first line, it is assumed that $A holds the JSON in data-A.json.
The second line just applies the lookup rule specified in the question.
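The only tricky bit here is getting the command-line invocation right. The following will suffice:
jq --argfile A data-A.json -f program.jq data-B.json
where program.jq contains the above two-line program. Note that the jq manual discourages --argfile in favour of --slurpfile. With --slurpfile A data-A.json, $A is an array wrapping the file's contents, so the first line would start with INDEX($A[0][]; .URL) instead.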
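Since the question also mentions Python, here is a minimal sketch of the same lookup in Python, for comparison only. It assumes both files hold a JSON array of objects and that URL values in data-A.json are unique; the function name add_previous_position is made up for illustration.

```python
import json

def add_previous_position(a_items, b_items):
    """Return copies of b_items with PreviousPosition looked up by URL in a_items."""
    # Build a URL -> Position dictionary, much like INDEX(...) | map_values(.Position) in jq.
    position_by_url = {item["URL"]: item["Position"] for item in a_items}
    # dict.get returns None for unknown URLs, which json.dumps writes as null.
    return [{**item, "PreviousPosition": position_by_url.get(item["URL"])}
            for item in b_items]

if __name__ == "__main__":
    # Sample data taken from the question (TrackName omitted for brevity).
    data_a = json.loads('[{"Position": "1", "URL": "https://domain.local/xyz123"},'
                        ' {"Position": "3", "URL": "https://domain.local/asd456"}]')
    data_b = json.loads('[{"Position": "1", "URL": "https://domain.local/nbs678"},'
                        ' {"Position": "2", "URL": "https://domain.local/asd456"}]')
    print(json.dumps(add_previous_position(data_a, data_b), indent=2))
```

To work on the real files, the two lists would come from json.load on data-A.json and data-B.json, and the result would be written back with json.dump.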


How to get output with unique values but only with last instance of a duplicate key - unique_by

I am currently working with jq to parse through some json. I would like to retrieve unique values based on a certain key. I came across unique_by. It does just that, getting unique values for the key name, but I am still not getting my desired output. From my understanding, unique_by looks at the value of the key name, uses the first instance, and removes the duplicates that follow from the final output. However, I would like to grab the last duplicate for each name value and display that in the final output.
Below is an example of my desired output. Is it possible to do this with unique_by or what would be the best approach?
cat file.json
Original json:
[
{
"name": "app-fastly",
"tag": "20210825-95-448f024",
"image": "docker.io/repoxy/app-fastly:20210825-95-448f024"
},
{
"name": "app-lovely",
"tag": "20211004-2101-b6a256c",
"image": "ghcr.io/repox/app-lovely:20211004-2101-b6a256c"
},
{
"name": "app-lovely",
"tag": "20211007-6622-b3fooba",
"image": "ghcr.io/repoxy/app-lovely:20211007-6622-b3fooba"
},
{
"name": "app-dogwood",
"tag": "20210325-36-2a349e9",
"image": "docker.io/repoxy/app-dogwood:20210325-36-2a349e9"
}
]
Jq Command:
cat file.json | jq 'unique_by( {name} )'
Current Output:
[
{
"name": "app-dogwood",
"tag": "20210325-36-2a349e9",
"image": "docker.io/repoxy/app-dogwood:20210325-36-2a349e9"
},
{
"name": "app-fastly",
"tag": "20210825-95-448f024",
"image": "docker.io/repoxy/app-fastly:20210825-95-448f024"
},
{
"name": "app-lovely",
"tag": "20211004-2101-b6a256c",
"image": "ghcr.io/repox/app-lovely:20211004-2101-b6a256c"
}
]
Desired Output:
[
{
"name": "app-dogwood",
"tag": "20210325-36-2a349e9",
"image": "docker.io/repoxy/app-dogwood:20210325-36-2a349e9"
},
{
"name": "app-fastly",
"tag": "20210825-95-448f024",
"image": "docker.io/repoxy/app-fastly:20210825-95-448f024"
},
{
"name": "app-lovely",
"tag": "20211007-6622-b3fooba",
"image": "ghcr.io/repoxy/app-lovely:20211007-6622-b3fooba"
}
]
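If you want the last item for each name rather than the first, simply reverse the array first:
jq 'reverse | unique_by( {name} )'
Note that unique_by returns its result sorted by the key, so the original array order is not preserved either way. Reversing again afterwards gives the same items in descending key order:
jq 'reverse | unique_by( {name} ) | reverse'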
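For comparison outside jq, here is a minimal Python sketch of "keep the last duplicate" that keeps the surviving items in input order, whereas unique_by returns them sorted by the key. The helper name keep_last_by is hypothetical, and the sketch assumes every item has the key in question.

```python
def keep_last_by(items, key):
    """Keep only the last item for each key value, in order of those last occurrences."""
    last_seen = {}
    for item in items:
        value = item[key]
        # Drop any earlier entry so the re-inserted item moves to the end
        # (Python dicts preserve insertion order).
        last_seen.pop(value, None)
        last_seen[value] = item
    return list(last_seen.values())

if __name__ == "__main__":
    # Trimmed-down version of the sample data from the question.
    images = [
        {"name": "app-fastly", "tag": "20210825-95-448f024"},
        {"name": "app-lovely", "tag": "20211004-2101-b6a256c"},
        {"name": "app-lovely", "tag": "20211007-6622-b3fooba"},
        {"name": "app-dogwood", "tag": "20210325-36-2a349e9"},
    ]
    for image in keep_last_by(images, "name"):
        print(image["name"], image["tag"])
```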

Using `jq` to add key/value to a json file using another json file as a source

Been struggling with this for a while and I'm no closer to a solution. I'm not very experienced using jq.
I'd like to take the values from one json file and add them to another file when other values in the dict match. The example files below demonstrate what I'd like more clearly than an explanation.
hosts.json:
{
"hosts": [
{
"host": "hosta.example.com",
"hostid": "101",
"proxy_hostid": "1"
},
{
"host": "hostb.example.com",
"hostid": "102",
"proxy_hostid": "1"
},
{
"host": "hostc.example.com",
"hostid": "103",
"proxy_hostid": "2"
}
]
}
proxies.json:
{
"proxies": [
{
"host": "proxy1.example.com",
"proxyid": "1"
},
{
"host": "proxy2.example.com",
"proxyid": "2"
}
]
}
I also have the above file available with proxyid as the key, if this makes it easier:
{
"proxies": {
"1": {
"host": "proxy1.example.com",
"proxyid": "1"
},
"2": {
"host": "proxy2.example.com",
"proxyid": "2"
}
}
}
Using these json files above (from the Zabbix API), I'd like to add the value of .proxies[].host (from proxies.json) as .hosts[].proxy_host (to hosts.json).
This would only be when .hosts[].proxy_hostid equals .proxies[].proxyid
Desired output:
{
"hosts": [
{
"host": "hosta.example.com",
"hostid": "101",
"proxy_hostid": "1",
"proxy_host": "proxy1.example.com"
},
{
"host": "hostb.example.com",
"hostid": "102",
"proxy_hostid": "1",
"proxy_host": "proxy1.example.com"
},
{
"host": "hostc.example.com",
"hostid": "103",
"proxy_hostid": "2",
"proxy_host": "proxy2.example.com"
}
]
}
I've tried many different ways of doing this, and think I need to use jq -s or jq --slurpfile, but I've reached a lot of dead-ends and can't find a solution.
jq 'input as $p | map(.[].proxy_host = $p.proxies[].proxyid)' hosts.json proxies.json
I think I would need something like this as well, but not sure how to use it.
if .hosts[].proxy_hostid == .proxies[].proxyid then .hosts[].proxy_host = .proxies[].host else empty end'
I've found these questions but they haven't helped :(
How do I use a value as a key reference in jq? <- I think this one is the closest
Lookup values from one JSON file and replace in another
Using jq find key/value pair based on another key/value pair
This indeed is easier with the alternative version of your proxies.json. All you need is to store proxies in a variable as reference, and retrieve proxy hosts from it while updating hosts.
jq 'input as { $proxies } | .hosts[] |= . + { proxy_host: $proxies[.proxy_hostid].host }' hosts.json proxies.json
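The same lookup can be sketched in Python for comparison. This version is only an illustration: it assumes the original list-shaped proxies.json from the question, the function name add_proxy_host is made up, and hosts without a matching proxy get a null proxy_host, which is an assumption about the desired behaviour.

```python
def add_proxy_host(hosts_doc, proxies_doc):
    """Return a copy of hosts_doc where each host has a proxy_host looked up by proxy_hostid."""
    # Build a proxyid -> host lookup from the list form of proxies.json.
    host_by_proxyid = {p["proxyid"]: p["host"] for p in proxies_doc["proxies"]}
    return {
        **hosts_doc,
        "hosts": [
            # dict.get yields None (null in JSON) when no proxy matches.
            {**h, "proxy_host": host_by_proxyid.get(h["proxy_hostid"])}
            for h in hosts_doc["hosts"]
        ],
    }

if __name__ == "__main__":
    hosts = {"hosts": [
        {"host": "hosta.example.com", "hostid": "101", "proxy_hostid": "1"},
        {"host": "hostc.example.com", "hostid": "103", "proxy_hostid": "2"},
    ]}
    proxies = {"proxies": [
        {"host": "proxy1.example.com", "proxyid": "1"},
        {"host": "proxy2.example.com", "proxyid": "2"},
    ]}
    for host in add_proxy_host(hosts, proxies)["hosts"]:
        print(host["host"], "->", host["proxy_host"])
```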

Elegant way to select nested objects with the associated key based on a specific criteria

Given an example document in JSON similar to this:
{
"id": "post-1",
"type": "blog-post",
"tags": [
{
"id": "tag-1",
"name": "Tag 1"
},
{
"id": "tag-2",
"name": "Tag 2"
}
],
"heading": "Post 1",
"body": "this is my first blog post",
"links": [
{
"id": "post-2",
"heading": "Post 2",
"tags": [
{
"id": "tag-1",
"name": "Tag 1"
},
{
"id": "tag-3",
"name": "Tag 3"
}
]
}
],
"metadata": {
"user": {
"social": [
{
"id": "twitter",
"handle": "#user"
},
{
"id": "facebook",
"handle": "123456"
},
{
"id": "youtube",
"handle": "ABC123xyz"
}
]
},
"categories": [
{
"name": "Category 1"
},
{
"name": "Category 2"
}
]
}
}
I would like to select any object (regardless of depth) that has an attribute "id", as well as the attribute name of the parent object. The above example should be taken as just that, an example. The actual data, that I'm not at liberty to share, can have any depth and just about any structure. Attributes can be introduced and removed at any time. Using the Blog Post style is just because it is quite popular for examples and I have very limited imagination.
The attribute signifies a particular type within the domain, that might also be (but is not necessarily) coded into the value of the attribute.
If an object does not have the "id" attribute it is not interesting and should not be selected.
A very important special case is when the value of an attribute is an array of objects, in that case I need to keep the attribute name and associate it with each element in the array.
An example of the desired output would be:
[
{
"type": "tags",
"node": {
"id": "tag-1",
"name": "Tag 1"
}
},
{
"type": "tags",
"node": {
"id": "tag-2",
"name": "Tag 2"
}
},
{
"type": "links",
"node": {
"id": "post-2",
"heading": "Post 2",
"tags": [
{
"id": "tag-1",
"name": "Tag 1"
},
{
"id": "tag-3",
"name": "Tag 3"
}
]
}
},
{
"type": "tags",
"node": {
"id": "tag-1",
"name": "Tag 1"
}
},
{
"type": "tags",
"node": {
"id": "tag-3",
"name": "Tag 3"
}
},
{
"type": "social",
"node": {
"id": "twitter",
"handle": "#user"
}
},
{
"type": "social",
"node": {
"id": "facebook",
"handle": "123456"
}
},
{
"type": "social",
"node": {
"id": "youtube",
"handle": "ABC123xyz"
}
}
]
It isn't strictly necessary that the output is identical. Order, for instance, is irrelevant for my use case, and the output could be grouped as well. Since the top-level object has an attribute "id", it could be included with a special name, but I'd prefer that it not be included at all.
I've tried to use walk, reduce and recurse to no avail; I'm afraid my jq skills are too limited. But I imagine that a good solution would make use of at least one of them.
I would like an expression something like
to_entries[] | .value | .. | select(has("id")?)
which would select the correct objects, but with .. I'm no longer able to keep the associated attribute name.
The best I've come up with is
. as $document
| [paths | if length > 1 and .[-1] == "id" then .[0:-1] else empty end]
| map(. as $path
| $document
| { "type": [$path[] | if type == "string" then . else empty end][-1],
"node": getpath($path) })
Which works, but feels quite complicated and involves first extracting all paths, ignoring any path that does not have "id" as the last element, then remove the "id" segment to get the path to the actual object and storing the (by now last) segment that is a string, which corresponds to the parent objects attribute containing the interesting object. Finally the actual object is selected through getpath.
Is there a more elegant, or at the least shorter way to express this?
I should note that I'd like to use jq for the convenience of having bindings to other languages as well as being able to run the program on the command line.
For the scope of this question, I'm not really interested in alternatives to jq as I can imagine how to solve this differently using other tooling, but I would really like to "just" use jq.
Since the actual requirements aren’t clear to me, I’ll assume that the given implementation defines the functional requirements, and propose a shorter and hopefully sleeker version:
. as $document
| paths
| select(length > 1 and .[-1] == "id")
| .[0:-1] as $path
| { "type": last($path[] | strings),
"node": $document | getpath($path) }
This produces a stream, so if you want an array, you could simply enclose the above in square brackets.
last(stream) emits null if the stream is empty, which accords with the behavior of .[-1].
This works:
[
foreach (paths | select(.[-1] == "id" and length > 1)[:-1]) as $path ({i:.};
.o = {
type: last($path[] | strings),
node: (.i | getpath($path))
};
.o
)
]
The trick is to know that any number in the path indicates that the value is part of an array, so the path has to be adjusted to get the parent's name. Using last/1 with a strings filter makes that simple.

jq only show when object doesnt match

I'm trying to set up an alert for when the following JSON object state says anything but started. I'm beginning to play around with conditional jq but I'm unsure how to implement regex into this.
{
"page": 0,
"page_size": 100,
"total_pages": 10,
"total_rows": 929,
"headers": [
"*"
],
"rows": [
{
"id": "168",
"state": "STARTED"
},
{
"id": "169",
"state": "FAILED"
},
{
"id": "170",
"state": "STARTED"
}
]
}
I only want to display the id and state of the failed object. This is what I tried:
jq '.rows[] | .id, select(.state | contains("!STARTED"))' test.json
I'd like my output to be something like
{
"id": "169",
"state": "FAILED"
}
If you simply want to print out the objects for which .state is NOT "STARTED", just use negation:
.rows[] | select(.state != "STARTED")
If the "started" state is associated with multiple values, please give further details. There might not be any need to use regular expressions. If you really do need to use regular expressions, then you will probably want to use test.
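If several values should count as "started", a pattern test is one option. The sketch below illustrates the idea in Python rather than jq; the extra state STARTING and the names STARTED_PATTERN and rows_not_started are made up for illustration.

```python
import re

# Hypothetical pattern: states that should NOT trigger an alert.
STARTED_PATTERN = re.compile(r"STARTED|STARTING")

def rows_not_started(doc, pattern=STARTED_PATTERN):
    """Return the rows whose state does not fully match the pattern."""
    return [row for row in doc["rows"] if not pattern.fullmatch(row["state"])]

if __name__ == "__main__":
    response = {"rows": [
        {"id": "168", "state": "STARTED"},
        {"id": "169", "state": "FAILED"},
        {"id": "170", "state": "STARTED"},
    ]}
    for row in rows_not_started(response):
        print(row["id"], row["state"])
```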

How to lift the value of a JSON object that is nested two levels deep?

Given the following test.json that I received as a response from the Pocket API,
{
"complete": 1,
"error": null,
"list": {
"1000055792": {
"excerpt": "Some Text",
"favorite": "0",
"given_title": "Some Title",
"given_url": "Some URL",
"has_image": "0",
"has_video": "0",
"is_article": "1",
"is_index": "0",
"item_id": "1000055792",
"resolved_id": "1000055792",
"resolved_title": "Title",
"resolved_url": "Some URL",
"sort_id": 700,
"status": "1",
"time_added": "1438646514",
"time_favorited": "0",
"time_read": "1439025088",
"time_updated": "1439025090",
"word_count": "10549"
},
"1000102810": {
"excerpt": "Some Text",
"favorite": "0",
"given_title": "Title",
"given_url": "Some URL",
"has_image": "1",
"has_video": "0",
"is_article": "1",
"is_index": "0",
"item_id": "1000102810",
"resolved_id": "1000102810",
"resolved_title": "Title",
"resolved_url": "Resolved URL",
"sort_id": 650,
"status": "1",
"time_added": "1440303789",
"time_favorited": "0",
"time_read": "1440320729",
"time_updated": "1440320731",
"word_count": "3219"
}
}
}
How can I access the values of keys like resolved_title and word_count? They are nested inside an object whose key is a number, the same as the id, which is itself nested inside list. I've searched and found a way to access nested objects using jq. But how can I access the values that are nested inside another object within the main list object?
Also, the IDs are different and not sequential, so I don't think recursion is possible, but I could be wrong. What I'm intending to do with this data is to only extract the resolved_title and word_count values for each item and save them to a two-column spreadsheet.
Thanks in advance!
The following can easily be extended and/or adapted:
> jq ".list[] | {resolved_title, word_count}" input.json
Output:
{
"resolved_title": "Title",
"word_count": "10549"
}
{
"resolved_title": "Title",
"word_count": "3219"
}
You can use the .[] operator to iterate over all elements of an array or, as in this case, all the values of an object. The following will give you output with each field on a separate line:
cat <file_with_json> | jq '.list | .[] | .resolved_title, .word_count'
The first filter selects the list object, the second iterates over each of its values, and the last outputs just the resolved_title and word_count fields of each. This produces the following:
"Title"
"10549"
"Title"
"3219"
Try map():
$ cat myfile.json | jq '.list | map({resolved_title: .resolved_title, word_count: .word_count})'
[
{
"resolved_title": "Title",
"word_count": "10549"
},
{
"resolved_title": "Title",
"word_count": "3219"
}
]
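Since the stated goal is a two-column spreadsheet, here is a minimal Python sketch that turns the same structure into CSV text. It is only an illustration: the function name items_to_csv is hypothetical, and it assumes every item under list has both fields.

```python
import csv
import io

def items_to_csv(doc, fields=("resolved_title", "word_count")):
    """Return CSV text with one row per item under doc["list"]."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)  # header row
    # The item IDs are just dictionary keys, so iterating over the values is enough.
    for item in doc["list"].values():
        writer.writerow([item[field] for field in fields])
    return buffer.getvalue()

if __name__ == "__main__":
    # Trimmed-down version of the response from the question.
    response = {"list": {
        "1000055792": {"resolved_title": "Title", "word_count": "10549"},
        "1000102810": {"resolved_title": "Title", "word_count": "3219"},
    }}
    print(items_to_csv(response), end="")
```

The csv module takes care of quoting titles that contain commas or quotes, which is easy to get wrong when joining strings by hand.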