ID lookup from an external file in JQ - json

I have a lookup file that maps IDs from one system onto another:
[
{
"idA": 2547,
"idB": "5d0bf91d191c6554d14572a6"
},
{
"idA": 2549,
"idB": "5b0473f93d4e53db19f8c249"
},
{
"idA": 2550,
"idB": "5d0bfabc8f20917b92ff07dc"
},
...
And I have a data file with values and an ID from one of these systems:
[
{
"idB": "5d0bf91d191c6554d14572a6",
"description": "Description for 5d0bf91d191c6554d14572a6"
},
{
"idB": "5d0bf49e9236c57281811cfc",
"description": "Description for 5d0bf49e9236c57281811cfc"
},
{
"idB": "5d0bfabc8f20917b92ff07dc",
"description": "Description for 5d0bfabc8f20917b92ff07dc"
},
...
I want to produce a new file of the descriptions with their IDs converted to the idA values in the lookup file. I tried this:
jq --slurpfile idmap ids.json 'map( {"description":.description, "id": (.idB as $b|$idmap[][]|select(.idB==$b)|.idA) } )' descriptions.json
But it produces only an empty array.
I have to double-dereference $idmap because slurping a file "binds an array of the parsed JSON values to the given global variable" -- so just doing $idmap[] throws an error, jq: error (at descriptions.json:70): Cannot index array with string "idB".
Can anyone explain what I'm doing wrong here?

Here's a concise and straightforward solution to the stated problem.
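(Incidentally, the reason the original attempt yields an empty array rather than nulls: when select(.idB==$b) finds no match for an element, the enclosing object construction produces no output at all, so that element simply vanishes from map's result; if nothing matches, the result is [].)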
For simplicity, we'll begin by constructing a dictionary containing the relevant mapping using INDEX/2:
INDEX($idmap[]; .idB) | map_values(.idA)
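For the entries shown in the sample lookup file, this dictionary is:
{
  "5d0bf91d191c6554d14572a6": 2547,
  "5b0473f93d4e53db19f8c249": 2549,
  "5d0bfabc8f20917b92ff07dc": 2550
}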
Now the task is easy:
(INDEX($idmap[]; .idB) | map_values(.idA)) as $dict
| map( {description, "idA": $dict[.idB] } )
This assumes an invocation that uses --argfile idmap ids.json to avoid
the unwanted "slurping" caused by --slurpfile, but if the latter is used, then you would use $idmap[][] instead as noted in the original question.
With the sample snippets above, the first and third descriptions have matching "idB" values in the lookup file (yielding idA values 2547 and 2550), while the unmatched second one yields null.
Variation
If the objects in descriptions.json had other keys that should be retained, then the following variant would probably be a more useful guide:
(INDEX($idmap[]; .idB) | map_values(.idA)) as $dict # or $idmap[][] as above
| map( .idA = $dict[.idB] | del(.idB) )

How do I print a specific value of an array given a condition in jq if there is no key specified

I am trying to output the value for .metadata.name followed by the student's name in .spec.template.spec.containers[].students[] array using the regex test() function in jq.
I am having trouble retrieving the individual array value since there is no key specified for the students[] array.
For example, if I check the students[] array if it contains the word "Jeff", I would like the output to display as below:
student-deployment: Jefferson
What I have tried:
I've tried the command below, which somewhat works, but I am not sure how to get only the "Jefferson" value. The command below prints out all of the students[] array values, which is not what I want. I am using PowerShell to run the command below.
kubectl get deployments -o json | jq -r '.items[] | select(.spec.template.spec.containers[].students[]?|test("\"^Jeff.\"")) | .metadata.name, "\":\t\"", .spec.template.spec.containers[].students'
Is there a way to print a specific value of an array given a condition in jq if there is no key specified? Also, would the solution work if there are multiple deployments?
The deployment template below is in json and I shortened it to only the relevant parts.
{
"apiVersion": "v1",
"items": [
{
"apiVersion": "apps/v1",
"kind": "Deployment",
"metadata": {
"name": "student-deployment",
"namespace": "default"
},
"spec": {
"template": {
"spec": {
"containers": [
{
"students": [
"Alice",
"Bob",
"Peter",
"Sally",
"Jefferson"
]
}
]
}
}
}
}
]
}
For this approach, we introduce a variable $pattern. You may set it with --arg pattern to your regex, e.g. "Jeff" or "^Al" or "e$" to have the student list filtered by test, or leave it empty to see all students.
Now, we iterate over all .items[] elements (i.e. over "all deployments"). For each one found, we output the content of .metadata.name followed by a literal colon and a space. Then we iterate over all .spec.template.spec.containers[].students[], perform the pattern test, and concatenate the outcome.
To print out raw strings instead of JSON, we use the -r option when calling jq.
kubectl get deployments -o json \
| jq --arg pattern "Jeff" -r '
.items[]
| .metadata.name + ": " + (
.spec.template.spec.containers[].students[]
| select(test($pattern))
)
'
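With the sample deployment above, this prints:
student-deployment: Jefferson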
To retrieve the "students" array(s) in the input, you could use this filter:
.items[]
| paths(objects) as $p
| getpath($p)
| select( objects | has("students") )
| .students
You can then add additional filters to select the particular student(s) of interest, e.g.
| .[]
| select(test("Jeff"))
And then add any postprocessing filters, e.g.
| "student-deployment: \(.)"
Of course you can obtain the students array in numerous other ways.

jq: Conditionally update/replace/add json elements using an input file

I receive the following input file:
input.json:
[
{"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":18,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:30:00","VALUE":160,"FLAG":"0"},
{"ID":"bbb_0021122","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
{"ID":"bbb_0021122","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"bbb_0021122","time_CET":"00:30:00","VALUE":22,"FLAG":"0"},
{"ID":"ccc_0021122","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
{"ID":"ccc_0021122","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"ccc_0021122","time_CET":"00:30:00","VALUE":20,"FLAG":"0"},
{"ID":"ddd_122455","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
{"ID":"ddd_122455","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"ddd_122455","time_CET":"00:30:00","VALUE":null,"FLAG":"?"},
]
As you can see there are some valid values (FLAG: 0) and some invalid values (FLAG: "?").
Now I got a file looking like this (one for each ID):
aaa.json:
[
{"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
{"ID":"aaa_12301248","time_CET":"00:55:00","VALUE":45,"FLAG":"0"}
]
As you can see, object one is the same as in input.json but object two is invalid (FLAG: "?"). That's why object two has to be replaced by the correct object from input.json (with VALUE:18).
Objects can be identified by "time_CET" and "ID" element.
Additionally, there will be new objects in input.json that have not been part of aaa.json etc. These objects should be added to the array, and valid objects from aaa.json should be kept.
In the end, aaa.json should look like this:
[
{"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":18,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:30:00","VALUE":160,"FLAG":"0"},
{"ID":"aaa_12301248","time_CET":"00:55:00","VALUE":45,"FLAG":"0"}
]
So, to summarize:
- look for FLAG: "?" in aaa.json
- replace this object with the matching object from input.json, using "ID" and "time_CET" for mapping
- keep existing valid objects, and add objects from input.json that did not exist in aaa.json before (this means only objects starting with "aaa" in the "ID" field)
- repeat this for bbb.json, ccc.json and ddd.json
I am not sure if it's possible to get this done all at once with a command like this, because the output has to go back to the correct ID files (aaa.json, bbb.json, ccc.json):
jq --argfile aaa aaa.json --argfile bbb bbb.json .... -f prog.jq input.json
The problem is that the number after the identifier (aaa, bbb, ccc, etc.) may change. So to make sure objects are added to the correct file/array, a statement like this would be required:
if (."ID"|contains("aaa")) then ....
Or is it better to run the program several times with different input parameters? I am not sure.
Thank you in advance!!
Here is one approach
#!/bin/bash
# usage: update.sh input.json aaa.json bbb.json....
# updates each of aaa.json bbb.json....
input_json="$1"
shift
for i in "$#"; do
jq -M --argfile input_json "$input_json" '
# functions to restrict input.json to keys of current xxx.json file
def prefix: input_filename | split(".")[0];
def selectprefix: select(.ID | startswith(prefix));
# functions to build and probe a lookup table
def pk: [.ID, .time_CET];
def lookup($t;$k): $t | getpath($k);
def lookup($t): lookup($t;pk);
def organize(s): reduce s as $r ({}; setpath($r|pk; $r));
# functions to identify objects in input.json missing from xxx.json
def pks: paths | select(length==2);
def missing($t1;$t2): [$t1|pks] - [$t2|pks] | .[];
def getmissing($t1;$t2): [ missing($t1;$t2) as $p | lookup($t1;$p)];
# main routine
organize(.[]) as $xxx
| organize($input_json[] | selectprefix) as $inp
| map(if .FLAG != "?" then . else . += lookup($inp) end)
| . + getmissing($inp;$xxx)
' "$i" | sponge "$i"
done
The script uses jq in a loop to read and update each aaa.json... file.
The filter creates temporary objects to facilitate looking up values by [ID,time_CET], updates any values in the aaa.json with a FLAG=="?" and finally adds any values from input.json that are missing in aaa.json.
The temporary lookup table for input.json uses input_filename so that only keys starting with a prefix matching the name of the currently processed file will be included.
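To illustrate, organize(.[]) applied to the sample aaa.json builds this lookup table:
{
  "aaa_12301248": {
    "00:00:00": {"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
    "00:15:00": {"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
    "00:55:00": {"ID":"aaa_12301248","time_CET":"00:55:00","VALUE":45,"FLAG":"0"}
  }
}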
Sample Run:
$ ./update.sh input.json aaa.json
aaa.json after run:
[
{
"ID": "aaa_12301248",
"time_CET": "00:00:00",
"VALUE": 10,
"FLAG": "0"
},
{
"ID": "aaa_12301248",
"time_CET": "00:15:00",
"VALUE": 18,
"FLAG": "0"
},
{
"ID": "aaa_12301248",
"time_CET": "00:55:00",
"VALUE": 45,
"FLAG": "0"
},
{
"ID": "aaa_12301248",
"time_CET": "00:30:00",
"VALUE": 160,
"FLAG": "0"
}
]

Extract schema of nested JSON object

Let's assume this is the source json file:
{
"name": "tom",
"age": 12,
"visits": {
"2017-01-25": 3,
"2016-07-26": 4,
"2016-01-24": 1
}
}
I want to get:
[
"age",
"name",
"visits.2017-01-25",
"visits.2016-07-26",
"visits.2016-01-24"
]
I am able to extract the top-level keys using jq '. | keys' file.json, but this skips nested fields. How can I include those?
With your input, the invocation:
jq 'leaf_paths | join(".")'
produces:
"name"
"age"
"visits.2017-01-25"
"visits.2016-07-26"
"visits.2016-01-24"
If you want to include "visits", use paths. If you want the result as a JSON array, enclose the filter with square brackets: [ ... ]
If your input might include arrays, then unless you are using jq 1.6 or later, you will need to convert the integer indices to strings explicitly; also, since leaf_paths is now deprecated, you might want to use its definition in terms of paths(scalars). The resulting filter:
jq 'paths(scalars) | map(tostring) | join(".")'
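For example, given the input {"a": [{"b": 1}]}, this emits "a.0.b".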
allpaths
To include paths to null, you could use allpaths defined as follows:
def allpaths:
def conditional_recurse(f): def r: ., (select(.!=null) | f | r); r;
path(conditional_recurse(.[]?)) | select(length > 0);
Example:
{"a": null, "b": false} | allpaths | join(".")
produces:
"a"
"b"
all_leaf_paths
Assuming jq version 1.5 or higher, we can get to all_leaf_paths by following the strategy used in builtins.jq, that is, by adding these definitions:
def allpaths(f):
. as $in | allpaths | select(. as $p|$in|getpath($p)|f);
def isscalar:
. == null or . == true or . == false or type == "number" or type == "string";
def all_leaf_paths: allpaths(isscalar);
Example:
{"a": null, "b": false, "object":{"x":0} } | all_leaf_paths | join(".")
produces:
"a"
"b"
"object.x"
Some time ago, I wrote a structural-schema inference engine that
produces simple structural schemas that mirror the JSON documents under consideration,
e.g. for the sample JSON given here, the inferred schema is:
{
"name": "string",
"age": "number",
"visits": {
"2017-01-25": "number",
"2016-07-26": "number",
"2016-01-24": "number"
}
}
This is not exactly the format requested in the original posting, but
for large collections of objects, it does provide a useful overview.
More importantly, there is now a complementary validator for
checking whether a collection of JSON documents matches a structural
schema. The validator checks against schemas written in
JESS (JSON Extended Structural Schemas), a superset of the simple
structural schemas (SSS) produced by the schema inference engine.
(The idea is that one can use the SSS as a starting point to add
more elaborate constraints, including recursive constraints,
within-document referential integrity constraints, etc.)
For reference, here is how the SSS for your source.json would be produced using the "schema" module:
jq 'include "schema"; schema' source.json > source.schema.json
And to validate source.json against an SSS or a JESS schema:
JESS --schema source.schema.json source.json
This does what you want, though it doesn't return the data in an array; that should be an easy modification:
https://github.com/ilyash/show-struct
you can also check out this page:
https://ilya-sher.org/2016/05/11/most-jq-you-will-ever-need/

jq trying to iterate over two sets of values

Update: I have experimented with extracting the paths I desire to update and using those paths relative to the local and returned objects (read below) in a setpath(paths;getpath(paths)) construction. I can now iterate over the $paths array and make the desired updates to the local json object.
Using the thermostats2.json file below and a ret.json that differs from thermostats2.json only in:
{"location":{"livingroom":{"setpoints":{"day":"25000"}}}} #vs{"day":"23000"}
my script now looks like:
. as $obj |
# obtain location keys from $obj as they may have changed locally prior to $retobj being processed
($obj.location | keys? ) as $locs |
# setpoints are fixed in this code as ["day","night","away"]
["day","night","away"] as $setpoints |
[path($obj.location[($locs[])].setpoints[($setpoints[])])] as $paths |
reduce
range(0; $paths|length) as $i
(.; . | setpath($paths[$i];( $retobj[0] | getpath($paths[$i]))) ) | .
I don't need the $obj variable at this time but I have not cleaned that up yet. Please comment if you see problems with this approach or if this looks like a good solution. I will answer this question if the comments indicate it should be.
I have a json object that contains several location objects each, in turn, containing several setpoint objects, among other data. A remote application is provided this json object and returns updates to the values of the setpoint objects, if required. I would like to update the local json object rather than replace it with the returned object.
I do not want to assume the returned object's location keys are identical to those of the local object as the local object may have been maintained while the remote object was being modified.
I have figured out how to extract the location keys from the local file and create an array containing the setpoint keys whose values I am interested in updating. I have also been able to figure out how to reduce the updated values from the returned object into an array.
What I have not figured out is how to iterate over the locations and the setpoints together in order to update the values in the local json object.
I invoke jq with:
# usage : jq --slurpfile retobj ret.json --from-file query.jq thermostats2.json
query.jq contains:
# use $obj as the local object to be updated with values returned in $retobj
# $retobj is not permitted to modify the structure of $obj
. as $obj |
# obtain location keys from $obj as they may have changed locally prior to $retobj being processed
($obj.location | keys? ) as $locs |
# setpoints are fixed in this code as ["day","night","away"]
["day","night","away"] as $setpoints |
reduce $retobj[0].location[($locs[])].setpoints[($setpoints[])] as $item
( []; . + [$item] )
| . as $vals |
$vals
thermostats2.json:
{ "mode":"Home",
"location": {
"livingroom": {
"scale":"Celcius",
"current": {
"valid":"YES",
"reading":"23000",
"time":"000000"
},
"previous": {
"reading":"23000",
"time":"000000"
},
"setpoints": {
"schedule": {
"weekday": {"day":"0600",
"night":"2100"
},
"weekend": {"day":"0630",
"night":"2200"
}
},
"active":"day",
"day":"23000",
"night":"15556",
"away":"12778"
}
},
"familyroom": {
"scale":"Celcius",
"current": {
"valid":"YES",
"reading":"23000",
"time":"000000"
},
"previous": {
"reading":"23000",
"time":"000000"
},
"setpoints": {
"schedule": {
"weekday": {"day":"0600",
"night":"2100"
},
"weekend": {"day":"0630",
"night":"2200"
}
},
"active":"day",
"day":"23000",
"night":"15556",
"away":"12778"
}
},
"28-000005e2fdef": {
"scale":"Celcius",
"current": {
"valid":"YES",
"reading":"23000",
"time":"000000"
},
"previous": {
"reading":"23000",
"time":"000000"
},
"setpoints": {
"schedule": {
"weekday": {"day":"0600",
"night":"2100"
},
"weekend": {"day":"0630",
"night":"2200"
}
},
"active":"day",
"day":"23000",
"night":"15556",
"away":"12778"
}
}
}
}
What I cannot find is any means to set the values for the same objects in $obj, i.e. effectively:
$obj[0].location[($locs[])].setpoints[($setpoints[])] = $vals
I understand that, as a novice, I am not likely choosing the preferred approach for solving this type of problem. I am also struggling with embracing the filter paradigm in some of the built-in functions, particularly foreach.
To recap my goal, I wish to:
get the proper object values in $retobj via location keys derived from the local obj and the setpoint keys defined in the filter, and set the same paths in local object to those values.
Based on my understanding of the question, I believe that using --argfile rather than --slurpfile would simplify things.
The following filter will update the local object (thermostats2.json) with the corresponding setpoint values from $retobj, using location keys taken from the local object:
. as $in
| reduce path(.location[].setpoints["day","night","away"]) as $path
    ( $in;
      setpath( $path; $retobj | getpath($path) ) )
Invocation:
jq --argfile retobj ret.json -f query.jq thermostats2.json
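Given the sample ret.json (which differs from thermostats2.json only in the livingroom day setpoint), the result is the content of thermostats2.json with .location.livingroom.setpoints.day set to "25000".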

Using jq to extract common prefixes in a JSON data structure

I have a JSON data set with around 8.7 million key-value pairs extracted from a Redis store, where each key is guaranteed to be an 8-digit number and each value is an 8-character alphanumeric string, i.e.
[{
"91201544":"INXX0019",
"90429396":"THXX0020",
"20140367":"ITXX0043",
...
}]
To reduce Redis memory usage, I want to transform this into a hash of hashes, where the hash prefix key is the first 6 characters of the key, and then store this back into Redis.
Specifically, I want my resulting JSON data structure (that I'll then write some code to parse this JSON structure and create a Redis command file consisting of HSET, etc) to look more like
[{
"000000": { "00000023": "INCD1234",
"00000027": "INCF1423",
....
},
....
"904293": { "90429300": "THXX0020",
"90429302": "THXX0024",
"90429305": "THXY0013"}
}]
Since I've been impressed by jq and I'm trying to be more proficient at functional style programming, I wanted to use jq for this task. So far I've come up with the following:
% jq '.[0] | to_entries | map({key: .key, pfx: .key[0:6], value: .value}) | group_by(.pfx)'
This gives me something like
[
[
{
"key": "00000130",
"pfx": "000001",
"value": "CAXX3231"
},
{
"key": "00000162",
"pfx": "000001",
"value": "CAXX4606"
}
],
[
{
"key": "00000238",
"pfx": "000002",
"value": "CAXX1967"
},
{
"key": "00000256",
"pfx": "000002",
"value": "CAXX0727"
}
],
....
]
I've tried the following:
% jq 'map(map({key: .pfx, value: {key, value}}))
| map(reduce .[] as $item ({}; {key: $item.key, value: [.value[], $item.value]} ))
| map( {key, value: .value | from_entries} )
| from_entries'
which does give me the correct result, but also prints out an error for every reduce (I believe) of
jq: error: Cannot iterate over null
The end result is
{
"000001": {
"00000130": "CAXX3231",
"00000162": "CAXX4606"
},
"000002": {
"00000238": "CAXX1967",
"00000256": "CAXX0727"
},
...
}
which is correct, but how can I avoid getting this stderr warning thrown as well?
I'm not sure there's enough data here to assess what the source of the problem is. I find it hard to believe that what you tried results in that. I'm getting errors with that all the way.
Try this filter instead:
.[0]
| to_entries
| group_by(.key[0:6])
| map({
key: .[0].key[0:6],
value: map(.key=.key[6:8]) | from_entries
})
| from_entries
Given data that looks like this:
[{
"91201544":"INXX0019",
"90429396":"THXX0020",
"20140367":"ITXX0043",
"00000023":"INCD1234",
"00000027":"INCF1423",
"90429300":"THXX0020",
"90429302":"THXX0024",
"90429305":"THXY0013"
}]
Results in this:
{
"000000": {
"23": "INCD1234",
"27": "INCF1423"
},
"201403": {
"67": "ITXX0043"
},
"904293": {
"00": "THXX0020",
"02": "THXX0024",
"05": "THXY0013",
"96": "THXX0020"
},
"912015": {
"44": "INXX0019"
}
}
I understand that this is not what you are asking for but, just for reference, I think it will be MUCH faster to do this with Redis's built-in Lua scripting.
And it turns out that it is a bit more straightforward:
for _,key in pairs(redis.call('keys', '*')) do
local val = redis.call('get', key)
local short_key = string.sub(key, 1, 6) -- first 6 characters of the 8-digit key
redis.call('hset', short_key, key, val)
redis.call('del', key)
end
This will be done in place without transferring from/to Redis and converting to/from JSON.
Run it from console as:
$ redis-cli eval "$(cat script.lua)" 0
For the record, jq's group_by relies on sorting, which of course will slow things down noticeably when the input is sufficiently large. The following is about 40% faster even when the input array has just 100,000 items:
def compress:
. as $in
| reduce keys[] as $key ({};
$key[0:6] as $k6
| $key[6:] as $k2
| .[$k6] += {($k2): $in[$key]} );
.[0] | compress
Given Jeff's input, the output is identical.
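For reference, if the def together with the final line .[0] | compress is saved in a file, say compress.jq, it can be invoked as:
jq -f compress.jq data.json
where data.json stands for the actual input file.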