I frequently need to create a reusable function that transforms a given field of the input, for example:
def keep_field_only(field):
{field}
;
or
def count_by(field):
group_by(field) |
map(
{
field: .[0].field,
count: length
}
)
;
While group_by works fine with the key passed as an argument, using the argument to construct an object (e.g. to keep only that key in the object) doesn't work.
I believe this can always be worked around using path/1, but in my experience that significantly complicates the code.
Another workaround I have used is to copy the field with + {new_field: field} at the beginning of the function and delete it at the end, but that doesn't look very efficient or readable either.
Is there a shorter and more readable way?
Update:
Sample input:
[
{"type":1, "name": "foo"},
{"type":1, "name": "bar"},
{"type":2, "name": "joe"}
]
Preferred function invocation and expected results:
.[] | keep_field_only(.type):
{"type": 1}
{"type": 1}
{"type": 2}
count_by(.type):
[
{"type":1, "count": 2},
{"type":2, "count": 1}
]
You can define pick/1 as below,
def pick(paths):
. as $in
| reduce path(paths) as $path (null;
setpath($path; $in | getpath($path))
);
and use it like so:
.[] | pick(.type)
def count_by(paths; filter):
group_by(paths | filter) | map(
(.[0] | pick(paths)) + {count: length}
);
def count_by(paths):
count_by(paths; .);
count_by(.type)
I don't think there's a shorter and more readable way.
As you say, you can use path/1 to define your keep_field_only and count_by, but it can be done in a very simple way:
def keep_field_only(field):
(null | path(field)[0]) as $field
| {($field): field} ;
def count_by(field):
(null | path(field)[0]) as $field
| group_by(field)
| map(
{
($field): .[0][$field],
count: length
}
);
Of course this is only intended to work in examples like yours, e.g. with invocations like keep_field_only(.type) or count_by(.type).
However, thanks to setpath, the same technique can be used in more complex cases.
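For readers who want to check the expected results outside jq, here is a minimal Python sketch of what the two functions compute. It is not jq: the field is passed as a plain string, which is what (null | path(field)[0]) recovers in the answer above, and the function names simply mirror the ones in the question.

```python
from itertools import groupby


def keep_field_only(record, field):
    """Return a new dict holding only `field` from `record`."""
    return {field: record[field]}


def count_by(records, field):
    """Group records by `field` and report how many fall in each group."""
    # Sort first so that equal keys are adjacent, as groupby requires.
    ordered = sorted(records, key=lambda r: r[field])
    return [
        {field: key, "count": sum(1 for _ in group)}
        for key, group in groupby(ordered, key=lambda r: r[field])
    ]


# Sample input from the question.
sample = [
    {"type": 1, "name": "foo"},
    {"type": 1, "name": "bar"},
    {"type": 2, "name": "joe"},
]

kept = [keep_field_only(r, "type") for r in sample]
counts = count_by(sample, "type")
print(kept)
print(counts)
```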
I have the following JSON. From it I'd like to count how many objects have a type attribute that is either "null" or an array containing the value "null". In the following example, the answer would be two. Note that the JSON could also be deeply nested.
{
"A": {
"type": "string"
},
"B": {
"type": "null"
},
"C": {
"type": [
"null",
"string"
]
}
}
I came up with the following, but obviously this doesn't work since it misses the arrays. Any hints how to solve this?
jq '[..|select(.type?=="null")] | length'
This answer focuses on efficiency, straightforwardness, and generality.
In brief, the following jq program produces 2 for the given example.
def count(s): reduce s as $x (0; .+1);
def hasValue($value):
has("type") and
(.type | . == $value or (type == "array" and any(. == $value)));
count(.. | objects | select(hasValue("null")))
Notice that using this approach, it would be easy to count the number of objects having null or "null":
count(.. | objects | select(hasValue("null") or hasValue(null)))
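As a cross-check of the logic (not a replacement for the jq program), the same traversal can be sketched in Python. The helper names walk and has_value are made up for this illustration: walk plays the role of jq's .. and has_value mirrors hasValue above.

```python
import json


def walk(node):
    """Yield the node itself and every nested value, depth first."""
    yield node
    if isinstance(node, dict):
        for child in node.values():
            yield from walk(child)
    elif isinstance(node, list):
        for child in node:
            yield from walk(child)


def has_value(obj, value):
    """True if obj has a "type" key that equals `value` or is a list containing it."""
    if "type" not in obj:
        return False
    t = obj["type"]
    return t == value or (isinstance(t, list) and value in t)


# Sample document from the question.
doc = json.loads("""
{
  "A": {"type": "string"},
  "B": {"type": "null"},
  "C": {"type": ["null", "string"]}
}
""")

# Count matching objects without building an intermediate list.
matches = sum(1 for n in walk(doc) if isinstance(n, dict) and has_value(n, "null"))
print(matches)
```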
You were almost there. For arrays you could use IN. I also used objects, strings and arrays, which are shortcuts for a select on the corresponding types.
jq '[.. | objects.type | select(strings == "null", IN(arrays[]; "null"))] | length'
2
On larger structures you could also improve performance by not creating that array of which you would only calculate the length, but by instead just iterating over the matching items (e.g. using reduce) and counting on the go.
jq 'reduce (.. | objects.type | select(strings == "null", IN(arrays[]; "null"))) as $_ (0; .+1)'
2
I would like to "transpose" (not sure that's the right word) JSON elements.
For example, I have a JSON file like this:
{
"name": {
"0": "fred",
"1": "barney"
},
"loudness": {
"0": "extreme",
"1": "not so loud"
}
}
... and I would like to generate a JSON array like this:
[
{
"name": "fred",
"loudness": "extreme"
},
{
"name": "barney",
"loudness": "not so loud"
}
]
My original JSON has many more first level elements than just "name" and "loudness", and many more names, features, etc.
For this simple example I could fully specify the transformation like this:
$ echo '{"name":{"0":"fred","1":"barney"},"loudness":{"0":"extreme","1":"not so loud"}}'| \
> jq '[{"name":.name."0", "loudness":.loudness."0"},{"name":.name."1", "loudness":.loudness."1"}]'
[
{
"name": "fred",
"loudness": "extreme"
},
{
"name": "barney",
"loudness": "not so loud"
}
]
... but this isn't feasible for the original JSON.
How can jq create the desired output while being key-agnostic for my much larger JSON file?
Yes, transpose is an appropriate word, as the following makes explicit.
The following generic helper function makes for a simple solution that is completely agnostic about the key names, both of the enclosing object and the inner objects:
# Input: an array of values
def objectify($keys):
. as $in | reduce range(0;length) as $i ({}; .[$keys[$i]] = $in[$i]);
Assuming consistency of the ordering of the inner keys
Assuming the key names in the inner objects are given in a consistent order, a solution can now be obtained as follows:
keys_unsorted as $keys
| [.[] | [.[]]] | transpose
| map(objectify($keys))
Without assuming consistency of the ordering of the inner keys
If the ordering of the inner keys cannot be assumed to be consistent, then one approach would be to order them, e.g. using this generic helper function:
def reorder($keys):
. as $in | reduce $keys[] as $k ({}; .[$k] = $in[$k]);
or if you prefer a reduce-free def:
def reorder($keys): [$keys[] as $k | {($k): .[$k]}] | add;
The "main" program above can then be modified as follows:
keys_unsorted as $keys
| (.[$keys[0]]|keys_unsorted) as $inner
| map_values(reorder($inner))
| [.[] | [.[]]] | transpose
| map(objectify($keys))
Caveat
The preceding solution only considers the key names in the first inner object.
Building upon Peak's solution, here is an alternative based on group_by to deal with arbitrary orders of inner keys.
keys_unsorted as $keys
| map(to_entries[])
| group_by(.key)
| map(with_entries(.key = $keys[.key] | .value |= .value))
Using paths is a good idea, as pointed out by Hobbs. You could also do something like this:
[ path(.[][]) as $p | { key: $p[0], value: getpath($p), id: $p[1] } ]
| group_by(.id)
| map(from_entries)
This is a bit hairy, but it works:
. as $data |
reduce paths(scalars) as $p (
[];
setpath(
[ $p[1] | tonumber, $p[0] ];
( $data | getpath($p) )
)
)
First, capture the top level as $data because . is about to get a new value in the reduce block.
Then, call paths(scalars) which gives a key path to all of the leaf nodes in the input. e.g. for your sample it would give ["name", "0"] then ["name", "1"], then ["loudness", "0"], then ["loudness", "1"].
Run a reduce on each of those paths, starting the reduction with an empty array.
For each path, construct a new path, in the opposite order, with numbers-in-strings turned into real numbers that can be used as array indices, e.g. ["name", "0"] becomes [0, "name"].
Then use getpath to get the value at the old path in $data and setpath to set a value at the new path in . and return it as the next . for the reduce.
At the end, the result will be
[
{
"name": "fred",
"loudness": "extreme"
},
{
"name": "barney",
"loudness": "not so loud"
}
]
If your real data structure might be more than two levels deep, then you would need to replace [ $p[1] | tonumber, $p[0] ] with a more appropriate expression to transform the path. Or maybe some of your "values" are objects/arrays that you want to leave alone, in which case you probably need to replace paths(scalars) with something like paths | select(length == 2).
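The path-swapping idea behind the answers above can also be sketched in Python, which may help when checking results on a larger file. This is an illustrative sketch only: the function name transpose is an assumption, and unlike jq's setpath it does not pad missing indices with null, it simply skips them.

```python
def transpose(columns):
    """Turn {field: {index: value}} into a list of {field: value} rows."""
    rows = {}
    for field, by_index in columns.items():
        for index, value in by_index.items():
            # Swap the two path components: (field, index) -> (index, field).
            rows.setdefault(int(index), {})[field] = value
    # Emit the rows in numeric index order.
    return [rows[i] for i in sorted(rows)]


# Sample input from the question.
data = {
    "name": {"0": "fred", "1": "barney"},
    "loudness": {"0": "extreme", "1": "not so loud"},
}

result = transpose(data)
print(result)
```

Because each value is placed by its own (index, field) pair, the order of the inner keys does not matter here.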
Below is a sample output that is returned when calling an API:
curl "https://mywebsite.com/api/cars.json&page=1" | jq '.'
Using jq, how would one count the number of records where the charge key is missing? I understand that the first bit of code would include jq '. | length', but how would one filter objects by whether or not they contain a certain key?
If applied to the sample below, the output would be 1
{
"current_page": 1,
"items": [
{
"id": 1,
"name": "vehicleA",
"state": "available",
"charge": 100
},
{
"id": 2,
"name": "vehicleB",
"state": "available",
},
{
"id": 3,
"name": "vehicleB",
"state": "available",
"charge": 50
}
]
}
Here is a solution using map and length:
.items | map(select(.charge == null)) | length
Here is a more efficient solution using reduce:
reduce (.items[] | select(.charge == null)) as $i (0;.+=1)
Sample Run (assuming corrected JSON data in data.json)
$ jq -M 'reduce (.items[] | select(.charge == null)) as $i (0;.+=1)' data.json
1
Note that each of the above takes a minor shortcut assuming that the items won't have a "charge":null member. If some items could have a null charge then the test for == null won't distinguish between those items and items without the charge key. If this is a concern the following forms of the above filters which use has are better:
.items | map(select(has("charge")|not)) | length
reduce (.items[] | select(has("charge")|not)) as $i (0;.+=1)
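The difference between the two tests is easy to see on a small case. The following Python sketch uses made-up data (the third item with an explicit null charge is an assumption added for illustration, not part of the question) and contrasts a null comparison with a key-presence check.

```python
# Hypothetical data: item 2 has no "charge" key, item 3 has an explicit null.
doc = {
    "items": [
        {"id": 1, "charge": 100},
        {"id": 2},
        {"id": 3, "charge": None},
    ]
}

# Analogous to select(.charge == null): absent and null look the same.
null_or_missing = sum(1 for item in doc["items"] if item.get("charge") is None)

# Analogous to select(has("charge") | not): only truly absent keys count.
missing_only = sum(1 for item in doc["items"] if "charge" not in item)

print(null_or_missing, missing_only)
```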
Here is a solution that uses a simple but powerful utility function worthy perhaps of your standard library:
def sigma(stream): reduce stream as $s (null; . + $s);
The filter you'd use with this would be:
sigma(.items[] | select(has("charge") == false) | 1)
This is very efficient as no intermediate array is required, and no useless additions of 0 are involved. Also, as mentioned elsewhere, using has is more robust than making assumptions about the value of .charge.
Startup file
If you have no plans to use jq's module system, you can simply add the above definition of sigma to the file ~/.jq and invoke jq like so:
jq 'sigma(.items[] | select(has("charge") == false) | 1)'
Better yet, if you also add def count(s): sigma(s|1); to the file, the invocation would simply be:
jq 'count(.items[] | select(has("charge") | not))'
Standard Library
If for example ~/.jq/jq/jq.jq is your standard library, then assuming count/1 is included in this file, you could invoke jq like so:
jq 'include "jq"; count(.items[] | select(has("charge") == false))'
I have the following JSON:
{
"vertices": [
{
"__cp": "foo",
"__type": "metric",
"__eid": "foobar",
"name": "Undertow Metrics~Sessions Created",
"_id": 45056,
"_type": "vertex"
},
...
]
"edges": [
...
and I would like to achieve this format:
{
"nodes": [
{
"cp": "foo",
"type": "metric",
"label": "metric: Undertow Metrics~Sessions Created",
"name": "Undertow Metrics~Sessions Created",
"id": 45056
},
...
]
"edges": [
...
So far I was able to create this expression:
jq '{nodes: .vertices} | del(.nodes[]."_type", .nodes[]."__eid")'
I.e. rename 'vertices' to 'nodes' and remove '_type' and '__eid', how can I rename a key nested deeper in the JSON?
You can change the names of properties of objects if you use with_entries(filter). This converts an object to an array of key/value pairs and applies a filter to the pairs and converts back to an object. So you would just want to update the key of those objects to your new names.
Depending on which version of jq you're using, the next part can be tricky. String replacement wasn't introduced until jq 1.5. If it is available, you can do this:
{
nodes: .vertices | map(with_entries(
.key |= sub("^_+"; "")
)),
edges
}
Otherwise if you're using jq 1.4, then you'll have to remove them manually. A recursive function can help with that since the number of underscores varies.
def ltrimall(str): str as $str |
if startswith($str)
then ltrimstr($str) | ltrimall(str)
else .
end;
{
nodes: .vertices | map(with_entries(
.key |= ltrimall("_")
)),
edges
}
The following program works with jq 1.4 or jq 1.5.
It uses walk/1 to remove leading underscores from any key, no matter where it occurs in the input JSON.
The version of ltrim provided here uses recurse/1 for efficiency and portability, but any suitable substitute may be used.
def ltrim(c):
reduce recurse( if .[0:1] == c then .[1:] else null end) as $x
(null; $x);
# Apply f to composite entities recursively, and to atoms
def walk(f):
. as $in
| if type == "object" then
reduce keys[] as $key
( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
elif type == "array" then map( walk(f) ) | f
else f
end;
.nodes = .vertices
| del(.vertices)
| (.nodes |= walk(
if type == "object"
then with_entries( .key |= ltrim("_") )
else .
end ))
From your example data it looks like you intend lots of little manipulations so I'd break things out into stages like this:
  .nodes = .vertices                  # \ first take care of renaming
| del(.vertices)                      # / .vertices to .nodes
| .nodes = [
    .nodes[]                          #   then scan each node
  | del(._type, .__eid)               # \ whatever key-specific tweaks like
  | .label = "metric: \(.name)"       # / calculating .label you want can go here
  | . as $n                           #   remember the tweaked node (including .label)
  | reduce keys[] as $k (             # \
      {};                             # | final reduce to handle renaming
      .[$k | sub("^_+";"")] = $n[$k]  # | any keys that start with _
    )                                 # /
  ]
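One detail worth checking in any of these approaches is the order of operations: _type and __type both become type once the underscores are stripped, so the unwanted keys have to be deleted before renaming. A small Python sketch of the same staged transformation shows this. The function name to_node is made up for illustration, and the hard-coded "metric: " label prefix simply follows the jq answer above.

```python
def to_node(vertex):
    """Drop unwanted keys, strip leading underscores, then add a label."""
    # Delete first: "_type" would otherwise collide with "__type" after renaming.
    node = {k: v for k, v in vertex.items() if k not in ("_type", "__eid")}
    # Remove any number of leading underscores from the remaining keys.
    node = {k.lstrip("_"): v for k, v in node.items()}
    node["label"] = "metric: " + node["name"]
    return node


# The single vertex shown in the question.
doc = {
    "vertices": [
        {
            "__cp": "foo",
            "__type": "metric",
            "__eid": "foobar",
            "name": "Undertow Metrics~Sessions Created",
            "_id": 45056,
            "_type": "vertex",
        }
    ]
}

result = {"nodes": [to_node(v) for v in doc["vertices"]]}
print(result)
```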
I have a JSON data set with around 8.7 million key-value pairs extracted from a Redis store, where each key is guaranteed to be an 8-digit number and each value is an 8-character alphanumeric string, i.e.
[{
"91201544":"INXX0019",
"90429396":"THXX0020",
"20140367":"ITXX0043",
...
}]
To reduce Redis memory usage, I want to transform this into a hash of hashes, where the hash prefix key is the first 6 characters of the key (see this link) and then store this back into Redis.
Specifically, I want the resulting JSON data structure (which I'll then parse with some code to create a Redis command file consisting of HSET, etc.) to look more like
[{
"000000": { "00000023": "INCD1234",
"00000027": "INCF1423",
....
},
....
"904293": { "90429300": "THXX0020",
"90429302": "THXX0024",
"90429305": "THXY0013"}
}]
Since I've been impressed by jq and I'm trying to be more proficient at functional style programming, I wanted to use jq for this task. So far I've come up with the following:
% jq '.[0] | to_entries | map({key: .key, pfx: .key[0:6], value: .value}) | group_by(.pfx)'
This gives me something like
[
[
{
"key": "00000130",
"pfx": "000001",
"value": "CAXX3231"
},
{
"key": "00000162",
"pfx": "000001",
"value": "CAXX4606"
}
],
[
{
"key": "00000238",
"pfx": "000002",
"value": "CAXX1967"
},
{
"key": "00000256",
"pfx": "000002",
"value": "CAXX0727"
}
],
....
]
I've tried the following:
% jq 'map(map({key: .pfx, value: {key, value}}))
| map(reduce .[] as $item ({}; {key: $item.key, value: [.value[], $item.value]} ))
| map( {key, value: .value | from_entries} )
| from_entries'
which does give me the correct result, but also prints out an error for every reduce (I believe) of
jq: error: Cannot iterate over null
The end result is
{
"000001": {
"00000130": "CAXX3231",
"00000162": "CAXX4606"
},
"000002": {
"00000238": "CAXX1967",
"00000256": "CAXX0727"
},
...
}
which is correct, but how can I avoid getting this stderr warning thrown as well?
I'm not sure there's enough data here to assess what the source of the problem is. I find it hard to believe that what you tried results in that. I'm getting errors with that all the way.
Try this filter instead:
.[0]
| to_entries
| group_by(.key[0:6])
| map({
key: .[0].key[0:6],
value: map(.key=.key[6:8]) | from_entries
})
| from_entries
Given data that looks like this:
[{
"91201544":"INXX0019",
"90429396":"THXX0020",
"20140367":"ITXX0043",
"00000023":"INCD1234",
"00000027":"INCF1423",
"90429300":"THXX0020",
"90429302":"THXX0024",
"90429305":"THXY0013"
}]
Results in this:
{
"000000": {
"23": "INCD1234",
"27": "INCF1423"
},
"201403": {
"67": "ITXX0043"
},
"904293": {
"00": "THXX0020",
"02": "THXX0024",
"05": "THXY0013",
"96": "THXX0020"
},
"912015": {
"44": "INXX0019"
}
}
I understand that this is not what you are asking for but, just for reference, I think it will be MUCH faster to do this with Redis's built-in Lua scripting.
And it turns out that it is a bit more straightforward:
for _,key in pairs(redis.call('keys', '*')) do
local val = redis.call('get', key)
local short_key = string.sub(key, 1, 6) -- first 6 characters, as in the question
redis.call('hset', short_key, key, val)
redis.call('del', key)
end
This will be done in place without transferring from/to Redis and converting to/from JSON.
Run it from console as:
$ redis-cli eval "$(cat script.lua)" 0
For the record, jq's group_by relies on sorting, which of course will slow things down noticeably when the input is sufficiently large. The following is about 40% faster even when the input array has just 100,000 items:
def compress:
. as $in
| reduce keys[] as $key ({};
$key[0:6] as $k6
| $key[6:] as $k2
| .[$k6] += {($k2): $in[$key]} );
.[0] | compress
Given Jeff's input, the output is identical.
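The single-pass bucketing that compress performs can be sketched in Python as well, for instance to verify the jq output on a sample. This is only an illustration under stated assumptions: the prefix_len parameter is made up here, and like the jq filters above it keeps just the remaining characters as the inner key.

```python
def compress(flat, prefix_len=6):
    """Bucket each key under its first `prefix_len` characters in one pass."""
    nested = {}
    for key, value in flat.items():
        prefix, rest = key[:prefix_len], key[prefix_len:]
        # Create the bucket on first use, then store the value under the suffix.
        nested.setdefault(prefix, {})[rest] = value
    return nested


# Same sample data as in Jeff's answer.
flat = {
    "91201544": "INXX0019",
    "90429396": "THXX0020",
    "20140367": "ITXX0043",
    "00000023": "INCD1234",
    "00000027": "INCF1423",
    "90429300": "THXX0020",
    "90429302": "THXX0024",
    "90429305": "THXY0013",
}

nested = compress(flat)
print(nested)
```

To keep the full 8-digit key inside each bucket instead, as in the structure the question asks for, store the value under key rather than rest.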