JQ, how to count depending on conditions? - json

Using jq, I need to count the elements of an array that meet two criteria: each MUST have status === 'skipped' && ref.includes(version)
[
{
"id": 15484,
"sha": "52606c8da57984d1243f436e5d12e275db29a6e0",
"ref": "v1.4.15",
"status": "canceled"
},
{
"id": 15483,
"sha": "52606c8da57984d1243f436e5d12e275db29a6e0",
"ref": "v1.4.15",
"status": "canceled"
},
{
"id": 15482,
"sha": "1b4ccc1dc17e9b8ddb24550c5566d2be6b03465e",
"ref": "dev",
"status": "success"
},
{
"id": 15481,
"sha": "5b6ec939739c5a1513634f3b58bf96522917571d",
"ref": "dev",
"status": "failed"
},
{
"id": 15480,
"sha": "ec18d46f491a4645c68388df91fc41455b421e71",
"ref": "dev",
"status": "failed"
},
{
"id": 15479,
"sha": "dd83a6d6e58cc5114aed8016341ab3c5b3ebb702",
"ref": "dev",
"status": "failed"
},
{
"id": 15478,
"sha": "18ccaf4bc37bf65470b2c6ddaa69e5b4018354a7",
"ref": "dev",
"status": "success"
},
{
"id": 15477,
"sha": "f90900d733bce2be3d9ba9db25f8b51296bc6f3f",
"ref": "dev",
"status": "failed"
},
{
"id": 15476,
"sha": "3cf0431a161e6c9ca90e8248af7b4ec39c54bfb1",
"ref": "dev",
"status": "failed"
},
{
"id": 15285,
"sha": "d24b46edc75d8f7308dbef37d7b27625ef70c845",
"ref": "dev",
"status": "success"
},
{
"id": 15265,
"sha": "52606c8da57984d1243f436e5d12e275db29a6e0",
"ref": "v1.4.15",
"status": "success"
},
{
"id": 15264,
"sha": "9a15f8d4c950047f88c642abda506110b9b0bbd7",
"ref": "v1.4.15-static",
"status": "skipped"
},
{
"id": 15263,
"sha": "9a15f8d4c950047f88c642abda506110b9b0bbd7",
"ref": "v1.4.15-static",
"status": "skipped"
},
{
"id": 15262,
"sha": "76451d2401001c4c51b9800d3cdf62e4cdcc86ba",
"ref": "v1.4.15-no-js",
"status": "skipped"
},
{
"id": 15261,
"sha": "76451d2401001c4c51b9800d3cdf62e4cdcc86ba",
"ref": "v1.4.15-no-js",
"status": "skipped"
},
{
"id": 15260,
"sha": "515cd1b00062e9cbce05420036f5ecc7a898a4bd",
"ref": "v1.4.15-cli",
"status": "skipped"
},
{
"id": 15259,
"sha": "515cd1b00062e9cbce05420036f5ecc7a898a4bd",
"ref": "v1.4.15-cli",
"status": "skipped"
},
{
"id": 15258,
"sha": "b67acd3082da795f022fafc304d267d3afd6b736",
"ref": "v1.4.15-node",
"status": "skipped"
},
{
"id": 15257,
"sha": "b67acd3082da795f022fafc304d267d3afd6b736",
"ref": "v1.4.15-node",
"status": "skipped"
},
{
"id": 15256,
"sha": "4da4a788a85d82527ea568fed4f03da193842a80",
"ref": "v1.4.15-bs-redux-saga-router-dom-intl",
"status": "skipped"
}
]
We would also like to pass the query values in as environment variables:
status=skipped
ref=v1.4.15
This works, but without using the environment variables:
cat test.json | jq '[.[] | select(.status=="skipped") | select(.ref | startswith("v1.4.15"))] | length'
How can the same be done with the variables?
Answer:
status=skipped; ref=v1.4.15; cat test.json | jq --arg REF "$ref" --arg STATUS "$status" -r '[.[] | select(.status==$STATUS) | select(.ref | startswith($REF))] | length'
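As a cross-check outside jq, the same count can be sketched in Python. This is only a minimal sketch and not part of the original answer: the function name count_matching and the shortened sample list are made up for illustration, and the shell variables status and ref are assumed to be exported so that os.environ can see them.

```python
import json
import os


def count_matching(records, status, ref_prefix):
    """Count records with the given status whose ref starts with ref_prefix."""
    # Build the list of matches first, then take its length,
    # mirroring the jq pattern `[ ... ] | length`.
    matches = [
        rec
        for rec in records
        if rec.get("status") == status and rec.get("ref", "").startswith(ref_prefix)
    ]
    return len(matches)


# Hypothetical, shortened stand-in for test.json.
sample = json.loads("""
[
  {"id": 15484, "ref": "v1.4.15", "status": "canceled"},
  {"id": 15482, "ref": "dev", "status": "success"},
  {"id": 15264, "ref": "v1.4.15-static", "status": "skipped"},
  {"id": 15262, "ref": "v1.4.15-no-js", "status": "skipped"}
]
""")

# Read the criteria from the environment, falling back to the question's values.
status = os.environ.get("status", "skipped")
ref = os.environ.get("ref", "v1.4.15")
print(count_matching(sample, status, ref))
```

As with --arg in jq, the criteria reach the program as plain strings, so no quoting tricks are needed inside the filter logic.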

Use length at the end of the filter, after collecting the matching objects into an array:
jq '[.[] | select(.status == "skipped") | select(.ref | test("1\\.4\\.15"))] | length'
To return the objects themselves, leave out the final length:
jq '[.[] | select(.status == "skipped") | select(.ref | test("1\\.4\\.15"))]'
test() is a more flexible way to match, because it takes a regex and can match anywhere in the string. startswith() and endswith() cannot match a substring in the middle.
Using variables:
ref="1\.4\.15"
jq --arg status "$status" --arg ref "$ref" \
'[.[] | select(.status == $status) | select(.ref | test($ref))]|length' json
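The difference between a prefix match and a regex match, and why the dots are escaped, can be illustrated with a small Python sketch. The list of refs is invented for this example, and re.search is used here as a rough analogue of jq's test(), not as an exact equivalent.

```python
import re

# Hypothetical refs, chosen to show where the three checks disagree.
refs = ["v1.4.15", "v1.4.15-static", "release-1.4.15", "v1x4x15"]

# Prefix match, like startswith("v1.4.15") in jq.
prefix_hits = [r for r in refs if r.startswith("v1.4.15")]

# Regex match anywhere in the string, with the dots escaped.
regex_hits = [r for r in refs if re.search(r"1\.4\.15", r)]

# Unescaped dots match any character, so this is looser than intended.
unescaped_hits = [r for r in refs if re.search("1.4.15", r)]

print(prefix_hits)
print(regex_hits)
print(unescaped_hits)
```

If only the start of the string matters, the prefix check is the stricter and simpler choice; the regex form is useful when the version may appear in the middle of the ref.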

By using map(select(...)) or an equivalent, you could use length, but it is generally more efficient to use a generic counting function that avoids building an intermediate array, such as:
def sigma(s): reduce s as $s (null; .+$s);
sigma(.[] | select(.status=="skipped" and (.ref | startswith("v1.4.15") )) | 1)
Using shell and environment variables
Using shell and environment variables is covered in the jq manual, but in brief, one way to pass in string values is using the command-line option --arg, e.g. along the lines of:
jq --arg status "$status" --arg ref "$ref" -f program.jq test.json
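The idea behind sigma, adding up a stream of 1s instead of building an array and measuring it, can be sketched in Python with functools.reduce over a generator. This is an illustrative sketch only: the sample records are invented, and unlike the jq definition above, which starts from null, this version starts from 0 so that an empty stream yields 0.

```python
from functools import reduce


def sigma(stream):
    """Add up the items of an iterable one at a time, in the style of jq's reduce."""
    return reduce(lambda acc, item: acc + item, stream, 0)


# Hypothetical records in the shape used by the question.
records = [
    {"ref": "v1.4.15", "status": "canceled"},
    {"ref": "v1.4.15-cli", "status": "skipped"},
    {"ref": "v1.4.15-node", "status": "skipped"},
    {"ref": "dev", "status": "skipped"},
]

# A generator expression emits one 1 per match; no intermediate list is built.
ones = (
    1
    for rec in records
    if rec["status"] == "skipped" and rec["ref"].startswith("v1.4.15")
)
total = sigma(ones)
print(total)
```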

I know jq is popular around here, but may I suggest xidel? See http://videlibri.sourceforge.net/xidel.html.
Just like jq it's a JSON interpreter, but besides JSONiq you can also use XPath/Xquery functions to do all sorts of cool stuff.
This would list all objects with the 2 criteria:
xidel -s test.json -e '$json()[status="skipped" and starts-with(ref,"v1.4.15")]'
To count them, simply enclose the query with the count() function:
xidel -s test.json -e 'count($json()[status="skipped" and starts-with(ref,"v1.4.15")])'
This returns 9.
With variables:
status=skipped
ref=v1.4.15
xidel -s test.json -e 'count($json()[status="'$status'" and starts-with(ref,"'$ref'")])'

For the sake of completeness, this would be an equivalent JSONiq query:
let $a := [
(: copy-paste the entire array here in plain JSON syntax --
omitted for the sake of brevity :)
]
return count(
for $obj in $a[]
where $obj.status eq "skipped"
and
matches($obj.ref, "^v")
return $obj
)

Related

Can we use dynamic values in static json file

I have a JSON file for application configuration like the one below.
[
{
"name": "environment",
"value": "prod"
},
{
"name": "deployment_date",
"value": "2022-12-21"
}
]
I want the value of deployment_date to be dynamic: the current UTC date. Can we use any programming language to achieve this, with something like getUTCDate().toString() instead of "2022-12-21"?
Using jq:
jq '(.[] | select(.name == "deployment_date")).value |= (now | todate)' file.json
Output
[
{
"name": "environment",
"value": "prod"
},
{
"name": "deployment_date",
"value": "2022-12-21T12:46:11Z"
}
]
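To get only the date, format it explicitly. Note that strflocaltime uses the local time zone; use strftime with the same format string if the date must be in UTC:
jq '(.[] | select(.name == "deployment_date")).value |= (now | strflocaltime("%Y-%m-%d"))' file.json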
Output
[
{
"name": "environment",
"value": "prod"
},
{
"name": "deployment_date",
"value": "2022-12-21"
}
]
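Since the question asks whether any programming language could do this, here is a hedged Python sketch of the same update. The function name set_deployment_date and its optional now parameter are assumptions made for this example; the parameter exists only so that the date can be pinned when trying the code out.

```python
import json
from datetime import datetime, timezone


def set_deployment_date(config, now=None):
    """Set the value of the "deployment_date" entry to the UTC date (YYYY-MM-DD)."""
    if now is None:
        now = datetime.now(timezone.utc)
    for entry in config:
        if entry.get("name") == "deployment_date":
            entry["value"] = now.strftime("%Y-%m-%d")
    return config


config = json.loads("""
[
  {"name": "environment", "value": "prod"},
  {"name": "deployment_date", "value": "2022-12-21"}
]
""")

# Uses the current UTC date; other entries are left untouched.
print(json.dumps(set_deployment_date(config), indent=2))
```

Either way, the JSON file itself stays static; the value is rewritten by a tool at deployment time.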

How to use JQ to merge corresponding elements of two arrays

I have two arrays with the same number of elements but with different keys/values. I want to merge the key/value pairs of the second array into the first for each index/position.
1.json
[
{
"name": "xxx",
"url": "yyy",
"thumbnail": "nnn"
},
{
"name": "bla bla",
"url": "some-url",
"thumbnail": "another-pic"
}
]
2.json
[
{
"spotifyUrl": "first-spotify-url"
},
{
"spotifyUrl": "second-spotify-url"
}
]
The result I would like to achieve:
[
{
"name": "xxx",
"url": "yyy",
"thumbnail": "nnn",
"spotifyUrl": "first-spotify-url"
},
{
"name": "bla bla",
"url": "some-url",
"thumbnail": "another-pic",
"spotifyUrl": "second-spotify-url"
}
]
I already tried different things but couldn't find the result I wanted. For example this one here:
jq -n '
(input | map_values([.])) as $one
| input as $two
| reduce ($two|keys_unsorted[]) as $k2 ( $one;
.[$k2] += [$two[$k2]] )
' 1.json 2.json
is almost what I want, except that the spotify-url is nested into its own object and looks like this:
[
{
"name": "xxx",
"url": "yyy",
"thumbnail": "nnn"
},
{
"spotifyUrl": "first-spotify-url"
}
]
I appreciate any help and bet it's a lot simpler than I can think of. Thanks in advance.
It's easier than that.
jq -s 'transpose | map(add)' 1.json 2.json
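What transpose | map(add) does, pairing up the elements at each position and then merging each pair, can be sketched in Python with zip. The function name merge_by_position is made up for this example, and the sketch assumes the arrays have equal length (zip stops at the shortest one, whereas jq's transpose pads with null).

```python
def merge_by_position(*arrays):
    """Merge the objects found at the same index of each array into one object."""
    merged = []
    for group in zip(*arrays):  # one tuple per index, like jq's transpose
        combined = {}
        for obj in group:       # later objects win on duplicate keys, like jq's add
            combined.update(obj)
        merged.append(combined)
    return merged


one = [
    {"name": "xxx", "url": "yyy", "thumbnail": "nnn"},
    {"name": "bla bla", "url": "some-url", "thumbnail": "another-pic"},
]
two = [
    {"spotifyUrl": "first-spotify-url"},
    {"spotifyUrl": "second-spotify-url"},
]

for item in merge_by_position(one, two):
    print(item)
```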

How to select max value by condition and then compare it with others?

I have a backup set that is described by json. Sample is below.
I want to count how many incremental backups were added since the last full backup.
I am trying to select the maximum timestamp among the records of type "full", so that I can then count how many records of type "incr" have a larger timestamp.
{
"archive": [
{
"database": {
"id": 1
},
"id": "11-1",
"max": "0000000A000018B90000006A",
"min": "0000000A0000167D000000C7"
}
],
"backup": [
{
"archive": {
"start": "0000000A0000181600000030",
"stop": "0000000A0000181C00000083"
},
"backrest": {
"format": 5,
"version": "2.28"
},
"database": {
"id": 1
},
"info": {
"delta": 417875448942,
"repository": {
"delta": 67466720725,
"size": 67466720725
},
"size": 417875448942
},
"label": "20201213-200009F",
"prior": null,
"reference": null,
"timestamp": {
"start": 1607878809,
"stop": 1607896232
},
"type": "full"
},
{
"archive": {
"start": "0000000A0000182900000065",
"stop": "0000000A0000182F00000069"
},
"backrest": {
"format": 5,
"version": "2.28"
},
"database": {
"id": 1
},
"info": {
"delta": 122520170241,
"repository": {
"delta": 19316550760,
"size": 67786280115
},
"size": 416998156028
},
"label": "20201213-200009F_20201214-200009I",
"prior": "20201213-200009F",
"reference": [
"20201213-200009F"
],
"timestamp": {
"start": 1607965209,
"stop": 1607974161
},
"type": "incr"
},
{
"archive": {
"start": "0000000A0000185B000000DD",
"stop": "0000000A0000185B000000F4"
},
"backrest": {
"format": 5,
"version": "2.28"
},
"database": {
"id": 1
},
"info": {
"delta": 126982395984,
"repository": {
"delta": 19541379733,
"size": 67993072945
},
"size": 421395153101
},
"label": "20201213-200009F_20201217-200105I",
"prior": "20201213-200009F_20201214-200009I",
"reference": [
"20201213-200009F",
"20201213-200009F_20201214-200009I"
],
"timestamp": {
"start": 1608224465,
"stop": 1608233408
},
"type": "incr"
}
]
}
I tried to do the first part with this command, but it says that "number (1607896232) and number (1607896232) cannot be iterated over":
.[0] |.backup[] | select(.type=="full").timestamp.stop|max
I tried sort_by but had no luck. What am I doing wrong here?
With the aid of a generic helper-function for counting, here's a complete solution, assuming you want to count based on .timestamp.start:
def count(s): reduce s as $x (0; .+1);
.backup
| (map( select( .type == "full" ).timestamp.stop) | max) as $max
| count(.[] | select( .type == "incr" and .timestamp.start > $max))
Using max/1
For large arrays, it would probably be more efficient to use a streaming version of max:
def count(s): reduce s as $x (0; .+1);
# Note: max(empty) #=> null
def max(s):
reduce s as $s (null; if $s > . then $s else . end);
.backup
| max(.[] | select( .type == "full" ).timestamp.stop) as $max
| count(.[] | select( .type == "incr" and .timestamp.start > $max))
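The two-step logic, first finding the latest stop time of a full backup and then counting the incremental backups that started after it, can be sketched in Python as follows. The function name is invented, and the sample keeps only the type and timestamp fields from the question's data.

```python
def count_incr_since_last_full(backups):
    """Count "incr" backups that started after the latest "full" backup stopped."""
    full_stops = [b["timestamp"]["stop"] for b in backups if b["type"] == "full"]
    if full_stops:
        last_full = max(full_stops)
    else:
        # In jq, null sorts below every number, so with no full backup
        # every "incr" record would be counted; -inf mimics that here.
        last_full = float("-inf")
    return sum(
        1
        for b in backups
        if b["type"] == "incr" and b["timestamp"]["start"] > last_full
    )


# Trimmed-down version of the "backup" array from the question.
backups = [
    {"type": "full", "timestamp": {"start": 1607878809, "stop": 1607896232}},
    {"type": "incr", "timestamp": {"start": 1607965209, "stop": 1607974161}},
    {"type": "incr", "timestamp": {"start": 1608224465, "stop": 1608233408}},
]

print(count_incr_since_last_full(backups))
```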
max expects an array.
[ .backup[] | select( .type == "full" ).timestamp.stop ] | max
or
.backup | map( select( .type == "full" ).timestamp.stop ) | max
So, after solving the problem of getting a concrete value (thanks @ikegami), I solved my entire question this way:
jq '(.[0] |[.backup[] | select(.type=="full").timestamp.stop]|max) as $i| [.[0] |.backup[] | select(.type=="incr" and .timestamp.stop>$i)]|length'
Not sure if it is optimal, but it works.
Here's also an alternative (non-jq) solution how to achieve the same JSON query with jtc tool:
bash $ <input.json jtc -jw'[timestamp]:<>G:[-1][type]' / -w'<full><>k'
2
PS. I'm a developer of jtc unix JSON processor
PPS. the above disclaimer is required by SO.

Building json path from JQ using some keyword

I have a deeply nested JSON document. Sometimes I need to find the JSON path of a key containing a certain word.
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"creationTimestamp": "2019-03-28T21:09:42Z",
"labels": {
"bu": "finance",
"env": "prod"
},
"name": "auth",
"namespace": "default",
"resourceVersion": "2786",
"selfLink": "/api/v1/namespaces/default/pods/auth",
"uid": "ce73565a-519d-11e9-bcb7-0242ac110009"
},
"spec": {
"containers": [
{
"command": [
"sleep",
"4800"
],
"image": "busybox",
"imagePullPolicy": "Always",
"name": "busybox",
"resources": {},
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"volumeMounts": [
{
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
"name": "default-token-dbpcm",
"readOnly": true
}
]
}
],
"dnsPolicy": "ClusterFirst",
"nodeName": "node01",
"priority": 0,
"restartPolicy": "Always",
"schedulerName": "default-scheduler",
"securityContext": {},
"serviceAccount": "default",
"serviceAccountName": "default",
"terminationGracePeriodSeconds": 30,
"tolerations": [
{
"effect": "NoExecute",
"key": "node.kubernetes.io/not-ready",
"operator": "Exists",
"tolerationSeconds": 300
},
{
"effect": "NoExecute",
"key": "node.kubernetes.io/unreachable",
"operator": "Exists",
"tolerationSeconds": 300
}
],
"volumes": [
{
"name": "default-token-dbpcm",
"secret": {
"defaultMode": 420,
"secretName": "default-token-dbpcm"
}
}
]
},
"status": {
"conditions": [
{
"lastProbeTime": null,
"lastTransitionTime": "2019-03-28T21:09:42Z",
"status": "True",
"type": "Initialized"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2019-03-28T21:09:50Z",
"status": "True",
"type": "Ready"
},
{
"lastProbeTime": null,
"lastTransitionTime": null,
"status": "True",
"type": "ContainersReady"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2019-03-28T21:09:42Z",
"status": "True",
"type": "PodScheduled"
}
],
"containerStatuses": [
{
"containerID": "docker://b5be8275555ad70939401d658bb4e504b52215b70618ad43c2d0d02c35e1ae27",
"image": "busybox:latest",
"imageID": "docker-pullable://busybox@sha256:061ca9704a714ee3e8b80523ec720c64f6209ad3f97c0ff7cb9ec7d19f15149f",
"lastState": {},
"name": "busybox",
"ready": true,
"restartCount": 0,
"state": {
"running": {
"startedAt": "2019-03-28T21:09:49Z"
}
}
}
],
"hostIP": "172.17.0.37",
"phase": "Running",
"podIP": "10.32.0.4",
"qosClass": "BestEffort",
"startTime": "2019-03-28T21:09:42Z"
}
}
Currently, if I need the podIP, I find the object that contains the search keyword this way and then build the path by hand:
curl myson | jq "[paths]" | grep "IP" --context=10
Is there a nicer shortcut to simplify this? What I really need is all the paths that contain a matching key, e.g.:
status.podIP
status.hostIP
Select the paths whose last element contains the keyword, and use join(".") to generate your desired output.
paths
| select(.[-1] | type == "string" and contains("keyword"))
| join(".")
.[-1] returns the last element of an array.
type == "string" is required because a path element can also be an array index, which is a number, and contains raises an error when asked to compare a number with a string.
You may want to specify the -r option to get raw (unquoted) output.
As @JeffMercado implicitly suggested, you can pass the keyword on the command line without touching the script:
jq -r 'paths
| select(.[-1] | type == "string" and contains($q))
| join(".")' file.json --arg q 'keyword'
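To make the mechanics concrete, here is a hedged Python sketch of the same idea: enumerate every path in the document, keep those whose last element is a string containing the keyword, and join them with dots. The helper names paths and paths_with_key are made up for this example, and the sample document is a small invented excerpt in the shape of the pod JSON.

```python
def paths(node, prefix=()):
    """Yield the path (a tuple of keys and indexes) to every value below node."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return  # scalars have no children
    for key, child in items:
        path = prefix + (key,)
        yield path
        yield from paths(child, path)


def paths_with_key(doc, keyword):
    """Return dotted paths whose last element is a string containing keyword."""
    return [
        ".".join(str(part) for part in path)
        for path in paths(doc)
        # Array indexes are ints, so only string elements are checked.
        if isinstance(path[-1], str) and keyword in path[-1]
    ]


# Hypothetical excerpt of the pod document.
pod = {
    "spec": {"nodeName": "node01", "containers": [{"image": "busybox"}]},
    "status": {
        "conditions": [{"type": "Ready"}],
        "hostIP": "172.17.0.37",
        "podIP": "10.32.0.4",
    },
}

for dotted in paths_with_key(pod, "IP"):
    print(dotted)
```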
You can stream the input in, which provides paths and values. You could then inspect the paths and optionally output the values.
$ jq --stream --arg pattern 'IP' '
select(length == 2 and any(.[0][] | strings; test($pattern)))
| "\(.[0] | join(".")): \(.[1])"
' input.json
"status.hostIP: 172.17.0.37"
"status.podIP: 10.32.0.4"
shameless plug
https://github.com/TomConlin/json_to_paths
because sometimes you do not even know the component you want to filter for before you see what is there.
json2jqpath.jq file.json
.
.apiVersion
.kind
.metadata
.metadata|.creationTimestamp
.metadata|.labels
.metadata|.labels|.bu
.metadata|.labels|.env
.metadata|.name
.metadata|.namespace
.metadata|.resourceVersion
.metadata|.selfLink
.metadata|.uid
.spec
.spec|.containers
.spec|.containers|.[]
.spec|.containers|.[]|.command
.spec|.containers|.[]|.command|.[]
.spec|.containers|.[]|.image
.spec|.containers|.[]|.imagePullPolicy
.spec|.containers|.[]|.name
.spec|.containers|.[]|.resources
.spec|.containers|.[]|.terminationMessagePath
.spec|.containers|.[]|.terminationMessagePolicy
.spec|.containers|.[]|.volumeMounts
.spec|.containers|.[]|.volumeMounts|.[]
.spec|.containers|.[]|.volumeMounts|.[]|.mountPath
.spec|.containers|.[]|.volumeMounts|.[]|.name
.spec|.containers|.[]|.volumeMounts|.[]|.readOnly
.spec|.dnsPolicy
.spec|.nodeName
.spec|.priority
.spec|.restartPolicy
.spec|.schedulerName
.spec|.securityContext
.spec|.serviceAccount
.spec|.serviceAccountName
.spec|.terminationGracePeriodSeconds
.spec|.tolerations
.spec|.tolerations|.[]
.spec|.tolerations|.[]|.effect
.spec|.tolerations|.[]|.key
.spec|.tolerations|.[]|.operator
.spec|.tolerations|.[]|.tolerationSeconds
.spec|.volumes
.spec|.volumes|.[]
.spec|.volumes|.[]|.name
.spec|.volumes|.[]|.secret
.spec|.volumes|.[]|.secret|.defaultMode
.spec|.volumes|.[]|.secret|.secretName
.status
.status|.conditions
.status|.conditions|.[]
.status|.conditions|.[]|.lastProbeTime
.status|.conditions|.[]|.lastTransitionTime
.status|.conditions|.[]|.status
.status|.conditions|.[]|.type
.status|.containerStatuses
.status|.containerStatuses|.[]
.status|.containerStatuses|.[]|.containerID
.status|.containerStatuses|.[]|.image
.status|.containerStatuses|.[]|.imageID
.status|.containerStatuses|.[]|.lastState
.status|.containerStatuses|.[]|.name
.status|.containerStatuses|.[]|.ready
.status|.containerStatuses|.[]|.restartCount
.status|.containerStatuses|.[]|.state
.status|.containerStatuses|.[]|.state|.running
.status|.containerStatuses|.[]|.state|.running|.startedAt
.status|.hostIP
.status|.phase
.status|.podIP
.status|.qosClass
.status|.startTime

jq streaming of large json files to get only objects whose properties have a specific value

I have some rather large JSON files (~500 MB to 4 GB compressed) that I cannot load into memory for manipulation, so I am using the --stream option with jq.
For example my json might look like this - only bigger:
[{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters": {
"batter": [{
"id": "1001",
"type": "Regular"
}, {
"id": "1002",
"type": "Chocolate"
}, {
"id": "1003",
"type": "Blueberry"
}, {
"id": "1004",
"type": "Devil's Food"
}]
},
"topping": [{
"id": "5001",
"type": "None"
}, {
"id": "5002",
"type": "Glazed"
}, {
"id": "5005",
"type": "Sugar"
}, {
"id": "5007",
"type": "Powdered Sugar"
}, {
"id": "5006",
"type": "Chocolate with Sprinkles"
}, {
"id": "5003",
"type": "Chocolate"
}, {
"id": "5004",
"type": "Maple"
}]
}, {
"id": "0002",
"type": "donut",
"name": "Raised",
"ppu": 0.55,
"batters": {
"batter": [{
"id": "1001",
"type": "Regular"
}]
},
"topping": [{
"id": "5001",
"type": "None"
}, {
"id": "5002",
"type": "Glazed"
}, {
"id": "5005",
"type": "Sugar"
}, {
"id": "5003",
"type": "Chocolate"
}, {
"id": "5004",
"type": "Maple"
}]
}, {
"id": "0003",
"type": "donut",
"name": "Old Fashioned",
"ppu": 0.55,
"batters": {
"batter": [{
"id": "1001",
"type": "Regular"
}, {
"id": "1002",
"type": "Chocolate"
}]
},
"topping": [{
"id": "5001",
"type": "None"
}, {
"id": "5002",
"type": "Glazed"
}, {
"id": "5003",
"type": "Chocolate"
}, {
"id": "5004",
"type": "Maple"
}]
}]
If this were the type of file I could hold in memory, and I wanted to select objects that only have batter type "Chocolate", I could use:
cat sample.json | jq '.[] | select(.batters.batter[].type == "Chocolate")'
And I would only get back the full objects with ids "0001" and "0003"
But with streaming I know it's different.
I am reading through the jq documentation on streaming, but I am still quite confused, as the examples don't really demonstrate real-world problems with JSON.
Namely, is it even possible to select whole objects after streaming through their paths and identifying a notable event, or in this case a property value that matches a certain string?
I know that I can use:
cat sample.json | jq --stream 'select(.[0][1] == "batters" and .[0][2] == "batter" and .[0][4] == "type") | .[1]'
to give me all of the batter types. But is there a way to say: "If it's Chocolate, grab the object this leaf is a part of"?
Command:
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs))' array_of_objects.json |
jq 'select(.batters.batter[].type == "Chocolate") | .id'
Output:
"0001"
"0003"
The first invocation of jq converts the array of objects into a stream of objects. The second is based on your invocation and can be tailored further to your needs.
Of course the two invocations can (and probably should) be combined into one, but you might want to use the first invocation to save the big file as a file containing the stream of objects.
By the way, it would probably be better to use the following select:
select( any(.batters.batter[]; .type == "Chocolate") )
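The reason the any form is preferable is that select(.batters.batter[].type == "Chocolate") emits the object once per matching batter, while any emits it at most once. The following Python sketch illustrates that difference; the data is invented (the first donut deliberately has two "Chocolate" batters) and the variable names are assumptions for this example.

```python
# Hypothetical data: the first donut has two "Chocolate" batters.
donuts = [
    {"id": "0001", "batters": {"batter": [
        {"type": "Regular"}, {"type": "Chocolate"}, {"type": "Chocolate"}]}},
    {"id": "0002", "batters": {"batter": [{"type": "Regular"}]}},
    {"id": "0003", "batters": {"batter": [
        {"type": "Regular"}, {"type": "Chocolate"}]}},
]

# One result per matching batter, like select(.batters.batter[].type == ...).
per_match = [
    d["id"]
    for d in donuts
    for b in d["batters"]["batter"]
    if b["type"] == "Chocolate"
]

# One result per object, like select(any(.batters.batter[]; .type == ...)).
once = [
    d["id"]
    for d in donuts
    if any(b["type"] == "Chocolate" for b in d["batters"]["batter"])
]

print(per_match)
print(once)
```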
Here is another approach. Start with a streaming filter filter1.jq that extracts the record number and the minimum set of attributes you need to process. E.g.
select(length==2)
| . as [$p, $v]
| {r:$p[0]}
| if $p[1] == "id" then .id = $v
elif $p[1] == "batters" and $p[-1] == "type" then .type = $v
else empty
end
Running this with
jq -M -c --stream -f filter1.jq bigdata.json
produces values like
{"r":0,"id":"0001"}
{"r":0,"type":"Regular"}
{"r":0,"type":"Chocolate"}
{"r":0,"type":"Blueberry"}
{"r":0,"type":"Devil's Food"}
{"r":1,"id":"0002"}
{"r":1,"type":"Regular"}
{"r":2,"id":"0003"}
{"r":2,"type":"Regular"}
{"r":2,"type":"Chocolate"}
Now pipe this into a second filter, filter2.jq, which does the processing you want on those attributes for each record:
foreach .[] as $i (
{c: null, r:null, id:null, type:null}
; .c = $i
| if .r != .c.r then .id=null | .type=null | .r=.c.r else . end # control break
| .id = if .c.id == null then .id else .c.id end
| .type = if .c.type == null then .type else .c.type end
; if ([.id, .type] | contains([null])) then empty else . end
)
| select(.type == "Chocolate").id
with a command like
jq -M -c --stream -f filter1.jq bigdata.json | jq -M -s -r -f filter2.jq
to produce
0001
0003
filter1.jq and filter2.jq do a little more than what you need for this specific problem but they can be generalized easily.
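The control-break step in filter2.jq, which gathers consecutive events with the same record number before deciding what to output, can be sketched in Python with itertools.groupby. This is an illustrative sketch rather than a translation of the filter: the function name ids_with_type is made up, and the events are the sample values shown above as the output of filter1.jq.

```python
from itertools import groupby

# The events produced by the first filter, one dict per line of its output.
events = [
    {"r": 0, "id": "0001"},
    {"r": 0, "type": "Regular"},
    {"r": 0, "type": "Chocolate"},
    {"r": 0, "type": "Blueberry"},
    {"r": 0, "type": "Devil's Food"},
    {"r": 1, "id": "0002"},
    {"r": 1, "type": "Regular"},
    {"r": 2, "id": "0003"},
    {"r": 2, "type": "Regular"},
    {"r": 2, "type": "Chocolate"},
]


def ids_with_type(events, wanted):
    """Return the ids of records that have at least one event of the wanted type."""
    result = []
    # groupby starts a new group whenever the record number changes,
    # which is the "control break" of the jq version.
    for _, group in groupby(events, key=lambda e: e["r"]):
        record_id = None
        types = []
        for event in group:
            if "id" in event:
                record_id = event["id"]
            if "type" in event:
                types.append(event["type"])
        if record_id is not None and wanted in types:
            result.append(record_id)
    return result


for found in ids_with_type(events, "Chocolate"):
    print(found)
```

Because groupby only looks at consecutive items, the events can be consumed one at a time, which suits the streaming setting.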