jq select error: "Cannot index string with string <object>" - json

command:
cat test.json | jq -r '.[] | select(.input[] | .["$link"] | contains("randomtext1")) | .id'
I was expecting both entries (a and b) to show up since they both contain randomtext1.
Instead, I got the following output:
a
jq: error (at <stdin>:22): Cannot index string with string "$link"
From some digging I understand that the issue is likely caused by the following key/value pair in the a entry:
"someotherobj": "123"
because its value is a string rather than an object containing $link, while the filter in the command expects to find $link in everything under input, so it errors out before the command has a chance to search the b entry.
What I really want is to be able to search for any entries that have at least one "$link": "randomtext1" pair under input. Is there a fuzzier search feature allowing me to achieve this?
I tried to use two contains, hoping it would just pipe things through:
jq -r '.[] | select(.input[] | contains(["$link"]) | contains("randomtext1")) | .id'
but jq did not like that at all.
the test.json file:
[
  {
    "input": {
      "obj1": {
        "$link": "randomtext1"
      },
      "obj2": {
        "$link": "randomtext2"
      },
      "someotherobj": "123"
    },
    "id": "a"
  },
  {
    "input": {
      "obj3": {
        "$link": "randomtext1"
      },
      "obj4": {
        "$link": "randomtext2"
      }
    },
    "id": "b"
  }
]

What I really want is to be able to search for any entries that have at least one "$link": "randomtext1" pair under input.
The key word here, both in the question and the following answer, is any:
.[]
| select( any(.input[];
            type=="object" and has("$link") and (.["$link"] | index("randomtext1"))))
| .id
Of course, if you require the key's value to be exactly "randomtext1", you'd write .["$link"] == "randomtext1".
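For reference, a minimal invocation sketch against the test.json above (both entries contain at least one "$link": "randomtext1" pair, so both ids are printed):
jq -r '.[] | select(any(.input[]; type=="object" and has("$link") and (.["$link"] | index("randomtext1")))) | .id' test.json
a
b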

Related

How do I print a specific value of an array given a condition in jq if there is no key specified

I am trying to output the value for .metadata.name followed by the student's name in .spec.template.spec.containers[].students[] array using the regex test() function in jq.
I am having trouble retrieving the individual array value since there is no key specified for the students[] array.
For example, if I check the students[] array if it contains the word "Jeff", I would like the output to display as below:
student-deployment: Jefferson
What I have tried:
I've tried the command below, which somewhat works, but I am not sure how to get only the "Jefferson" value: it prints out all of the students[] array values, which is not what I want. I am using PowerShell to run it.
kubectl get deployments -o json | jq -r '.items[] | select(.spec.template.spec.containers[].students[]?|test("\"^Jeff.\"")) | .metadata.name, "\":\t\"", .spec.template.spec.containers[].students'
Is there a way to print a specific value of an array given a condition in jq if there is no key specified? Also, would the solution work if there are multiple deployments?
The deployment template below is in json and I shortened it to only the relevant parts.
{
  "apiVersion": "v1",
  "items": [
    {
      "apiVersion": "apps/v1",
      "kind": "Deployment",
      "metadata": {
        "name": "student-deployment",
        "namespace": "default"
      },
      "spec": {
        "template": {
          "spec": {
            "containers": [
              {
                "students": [
                  "Alice",
                  "Bob",
                  "Peter",
                  "Sally",
                  "Jefferson"
                ]
              }
            ]
          }
        }
      }
    }
  ]
}
For this approach, we introduce a variable $pattern. You may set it with --arg pattern to your regex, e.g. "Jeff" or "^Al" or "e$", to have the student list filtered by test, or leave it empty to see all students.
Now, we iterate over all .items[] elements (i.e. over all deployments). For each deployment found, we output the content of .metadata.name followed by a literal colon and a space. Then we iterate over all .spec.template.spec.containers[].students[], perform the pattern test, and concatenate the outcome.
To print out raw strings instead of JSON, we use the -r option when calling jq.
kubectl get deployments -o json \
| jq --arg pattern "Jeff" -r '
    .items[]
    | .metadata.name + ": " + (
        .spec.template.spec.containers[].students[]
        | select(test($pattern))
      )
  '
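Assuming the sample deployment above, running this with --arg pattern "Jeff" prints exactly the line asked for:
student-deployment: Jefferson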
To retrieve the "students" array(s) in the input, you could use this filter:
.items[]
| paths(objects) as $p
| getpath($p)
| select( objects | has("students") )
| .students
You can then add additional filters to select the particular student(s) of interest, e.g.
| .[]
| select(test("Jeff"))
And then add any postprocessing filters, e.g.
| "student-deployment: \(.)"
Of course you can obtain the students array in numerous other ways.
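Putting those pieces together, a complete pipeline could look like the sketch below (deployments.json is a hypothetical file holding the kubectl get deployments -o json output); for the sample input it prints student-deployment: Jefferson:
jq -r '
  .items[]
  | paths(objects) as $p
  | getpath($p)
  | select( objects | has("students") )
  | .students[]
  | select(test("Jeff"))
  | "student-deployment: \(.)"
' deployments.json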

jq: How can I get array values based on superordinate key name

I'm trying to use jq to parse the output of https://ssl-config.mozilla.org/guidelines/5.6.json, a pretty simple JSON structure.
How can I get the "openssl" values if "configurations" is "modern" or "intermediate"?
The basic JSON structure would be:
{
  "configurations": {
    "intermediate": {
      "ciphers": {
        "openssl": [
          "ECDHE-ECDSA-AES128-GCM-SHA256",
          "ECDHE-RSA-AES128-GCM-SHA256",
          "ECDHE-ECDSA-AES256-GCM-SHA384",
          "ECDHE-RSA-AES256-GCM-SHA384",
          "ECDHE-ECDSA-CHACHA20-POLY1305",
          "ECDHE-RSA-CHACHA20-POLY1305",
          "DHE-RSA-AES128-GCM-SHA256",
          "DHE-RSA-AES256-GCM-SHA384"
        ]
      }
    }
  }
}
I had to shorten it in order to avoid the "It looks like your post is mostly code; please add some more detail" error message.
To get both the modern and intermediate openssl arrays, we can use:
jq '.configurations | with_entries(select([.key] | inside([ "modern", "intermediate" ])))[] | .ciphers.openssl' input
This will show:
[]
[
  "ECDHE-ECDSA-AES128-GCM-SHA256",
  "ECDHE-RSA-AES128-GCM-SHA256",
  "ECDHE-ECDSA-AES256-GCM-SHA384",
  "ECDHE-RSA-AES256-GCM-SHA384",
  "ECDHE-ECDSA-CHACHA20-POLY1305",
  "ECDHE-RSA-CHACHA20-POLY1305",
  "DHE-RSA-AES128-GCM-SHA256",
  "DHE-RSA-AES256-GCM-SHA384"
]
To get a result as an object, so we can see under which key those openssl cipher lists are found, use something like:
jq '.configurations | to_entries | map(select([.key] | inside([ "modern", "intermediate" ])) | { "\(.key)": .value.ciphers.openssl }) | add' input
This will produce:
{
  "modern": [],
  "intermediate": [
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-CHACHA20-POLY1305",
    "ECDHE-RSA-CHACHA20-POLY1305",
    "DHE-RSA-AES128-GCM-SHA256",
    "DHE-RSA-AES256-GCM-SHA384"
  ]
}
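As an aside, a shorter sketch (not taken from either answer above) that relies on only these two fixed keys being wanted uses jq's object-construction shorthand together with map_values:
jq '.configurations | {modern, intermediate} | map_values(.ciphers.openssl)' input
For the full 5.6.json this should yield the same object as shown above.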

Output paths to all keys named "id" where the type of value is "string"

Given a huge (15 GB), deeply nested (12+ object layers) JSON file, how can I find the paths to all the keys named id whose values are of type string?
A massively simplified example file:
{
  "a": [
    {
      "id": 3,
      "foo": "red"
    }
  ],
  "b": [
    {
      "id": "7",
      "bar": "orange",
      "baz": {
        "id": 13
      },
      "bax": {
        "id": "12"
      }
    }
  ]
}
Looking for a less ugly solution where I don't run out of RAM and have to punt to grep at the end (sigh). (I failed to figure out how to chain to_entries into this usefully. If that's even something I should be trying to do.)
Ugly solution 1:
$ cat huge.json | jq 'path(..|select(type=="string")) | join(".")' | grep -E '\.id"$'
"b.0.id"
"b.0.bax.id"
Ugly solution 2:
$ cat huge.json | jq --stream -c '.' | grep -E '"id"],"'
[["b",0,"id"],"7"]
[["b",0,"bax","id"],"12"]
Something like this should do that.
jq --stream 'select(.[0][-1] == "id" and (.[1] | strings)) | .[0]' file
And by the way, your first ugly solution can be simplified to this:
jq 'path(.. .id? | strings)' file
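For the simplified example file above, both the streaming filter and this path-based one yield the same two path arrays (shown here with -c for compact output):
["b",0,"id"]
["b",0,"bax","id"]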
Stream the input in as you started to do with your second solution, but add some filtering. You do not want to read the entire contents into memory. And also... UUOC (useless use of cat).
$ jq --stream '
    select(.[0][-1] == "id" and (.[1]|type) == "string")[0]
    | join(".")
  ' huge.json
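Against the same simplified example file, this joined form prints:
"b.0.id"
"b.0.bax.id"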
Thank you both oguz and Jeff! Beautiful! This runs in 6.5 minutes (on my old laptop), never uses more than 21MB of RAM, and gives me exactly what I need. <3
$ jq --stream -c 'select(.[0][-1] == "id" and (.[1]|type) == "string")' huge.json

Filtering one key with the value from another key in jq

I've got a list of package data in JSON format that looks like:
[
  {
    "Package": "pyasn1",
    "Version": "0.4.6",
    "DownloadURL": "https://files.pythonhosted.org/packages/3d/50/5ce5dbe42eaf016cb9b062caf6d0f38018454756d4feb467de3e29431dae/pyasn1-0.4.6-py2.4.egg"
  },
  {
    "Package": "cachetools",
    "Version": "3.1.1",
    "DownloadURL": "https://files.pythonhosted.org/packages/08/6a/abf83cb951617793fd49c98cb9456860f5df66ff89883c8660aa0672d425/cachetools-4.0.0-py3-none-any.whl"
  }
]
And I'd like to generate a list of items where .DownloadURL doesn't contain the string in .Version.
I've tried this but it doesn't seem to work:
jq '.[]|select(.DownloadURL | contains(.Version)| not)'
I get the following error:
jq: error (at <stdin>:11): Cannot index string with string "Version"
exit status 5
I was able to do the following...
jq '.[]|select(.DownloadURL | contains("0.4.6")| not)'
...and it gave the results I would expect:
{
  "Package": "cachetools",
  "Version": "3.1.1",
  "DownloadURL": "https://files.pythonhosted.org/packages/08/6a/abf83cb951617793fd49c98cb9456860f5df66ff89883c8660aa0672d425/cachetools-4.0.0-py3-none-any.whl"
}
Is there a way to use the contents of .Version with the contains() function or is there a better way to do this? I've set up a playground with the data here.
Thanks!
The following has worked for me:
jq '.[]|.Version as $v|select(.DownloadURL | contains($v)| not)'
The result is:
{
  "Package": "cachetools",
  "Version": "3.1.1",
  "DownloadURL": "https://files.pythonhosted.org/packages/08/6a/abf83cb951617793fd49c98cb9456860f5df66ff89883c8660aa0672d425/cachetools-4.0.0-py3-none-any.whl"
}
I'd like to generate a list of items where .DownloadURL doesn't contain the string in .Version.
If as you say you want the result as a list, you could use map:
map(select( .Version as $v | .DownloadURL | contains($v) | not))
However, the semantics of contains are quite complex, so you might wish to consider using index instead:
map(select( .Version as $v | .DownloadURL | index($v) | not))
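As a quick check, a usage sketch (assuming the sample above is saved as packages.json, a hypothetical filename) with the index-based map form returns a proper JSON list:
jq 'map(select(.Version as $v | .DownloadURL | index($v) | not))' packages.json
[
  {
    "Package": "cachetools",
    "Version": "3.1.1",
    "DownloadURL": "https://files.pythonhosted.org/packages/08/6a/abf83cb951617793fd49c98cb9456860f5df66ff89883c8660aa0672d425/cachetools-4.0.0-py3-none-any.whl"
  }
]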

Using jq to extract common prefixes in a JSON data structure

I have a JSON data set with around 8.7 million key value pairs extracted from a Redis store, where each key is guaranteed to be an 8-digit number and each value is an 8-character alphanumeric string, i.e.
[{
  "91201544": "INXX0019",
  "90429396": "THXX0020",
  "20140367": "ITXX0043",
  ...
}]
To reduce Redis memory usage, I want to transform this into a hash of hashes, where the hash prefix key is the first 6 characters of the key (see this link) and then store this back into Redis.
Specifically, I want my resulting JSON data structure (which I'll then parse with some code to create a Redis command file consisting of HSET commands, etc.) to look more like
[{
  "000000": { "00000023": "INCD1234",
              "00000027": "INCF1423",
              ....
            },
  ....
  "904293": { "90429300": "THXX0020",
              "90429302": "THXX0024",
              "90429305": "THXY0013" }
}]
Since I've been impressed by jq and I'm trying to be more proficient at functional style programming, I wanted to use jq for this task. So far I've come up with the following:
% jq '.[0] | to_entries | map({key: .key, pfx: .key[0:6], value: .value}) | group_by(.pfx)'
This gives me something like
[
  [
    {
      "key": "00000130",
      "pfx": "000001",
      "value": "CAXX3231"
    },
    {
      "key": "00000162",
      "pfx": "000001",
      "value": "CAXX4606"
    }
  ],
  [
    {
      "key": "00000238",
      "pfx": "000002",
      "value": "CAXX1967"
    },
    {
      "key": "00000256",
      "pfx": "000002",
      "value": "CAXX0727"
    }
  ],
  ....
]
I've tried the following:
% jq 'map(map({key: .pfx, value: {key, value}}))
| map(reduce .[] as $item ({}; {key: $item.key, value: [.value[], $item.value]} ))
| map( {key, value: .value | from_entries} )
| from_entries'
which does give me the correct result, but also prints out an error for every reduce (I believe) of
jq: error: Cannot iterate over null
The end result is
{
  "000001": {
    "00000130": "CAXX3231",
    "00000162": "CAXX4606"
  },
  "000002": {
    "00000238": "CAXX1967",
    "00000256": "CAXX0727"
  },
  ...
}
which is correct, but how can I avoid getting this stderr warning thrown as well?
I'm not sure there's enough data here to assess what the source of the problem is. I find it hard to believe that what you tried results in that. I'm getting errors with that all the way.
Try this filter instead:
.[0]
| to_entries
| group_by(.key[0:6])
| map({
    key: .[0].key[0:6],
    value: map(.key = .key[6:8]) | from_entries
  })
| from_entries
Given data that looks like this:
[{
  "91201544": "INXX0019",
  "90429396": "THXX0020",
  "20140367": "ITXX0043",
  "00000023": "INCD1234",
  "00000027": "INCF1423",
  "90429300": "THXX0020",
  "90429302": "THXX0024",
  "90429305": "THXY0013"
}]
Results in this:
{
  "000000": {
    "23": "INCD1234",
    "27": "INCF1423"
  },
  "201403": {
    "67": "ITXX0043"
  },
  "904293": {
    "00": "THXX0020",
    "02": "THXX0024",
    "05": "THXY0013",
    "96": "THXX0020"
  },
  "912015": {
    "44": "INXX0019"
  }
}
I understand that this is not what you are asking for but, just for reference, I think it will be MUCH faster to do this with Redis's built-in Lua scripting.
And it turns out that it is a bit more straightforward:
for _, key in pairs(redis.call('keys', '*')) do
  local val = redis.call('get', key)
  -- use the first 6 characters of the 8-digit key as the hash key
  local short_key = string.sub(key, 1, 6)
  redis.call('hset', short_key, key, val)
  redis.call('del', key)
end
This will be done in place without transferring from/to Redis and converting to/from JSON.
Run it from console as:
$ redis-cli eval "$(cat script.lua)" 0
For the record, jq's group_by relies on sorting, which of course will slow things down noticeably when the input is sufficiently large. The following is about 40% faster even when the input array has just 100,000 items:
def compress:
  . as $in
  | reduce keys[] as $key ({};
      $key[0:6] as $k6
      | $key[6:] as $k2
      | .[$k6] += {($k2): $in[$key]} );

.[0] | compress
Given Jeff's input, the output is identical.
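To run this from the command line, one option (a sketch; compress.jq and input.json are hypothetical filenames) is to save the whole program above into compress.jq and read it with jq's -f/--from-file flag:
jq -f compress.jq input.json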