Count number of objects whose attribute are "null" or contain "null" - json

I have the following JSON. From there I'd like to count how many objects I have which type attribute is either "null" or has an array that contains the value "null". In the following example, the answer would be two. Note that the JSON could also be deeply nested.
{
"A": {
"type": "string"
},
"B": {
"type": "null"
},
"C": {
"type": [
"null",
"string"
]
}
}
I came up with the following, but obviously this doesn't work since it misses the arrays. Any hints how to solve this?
jq '[..|select(.type?=="null")] | length'

This answer focuses on efficiency, straightforwardness, and generality.
In brief, the following jq program produces 2 for the given example.
def count(s): reduce s as $x (0; .+1);
def hasValue($value):
has("type") and
(.type | . == $value or (type == "array" and any(. == $value)));
count(.. | objects | select(hasValue("null")))
Notice that using this approach, it would be easy to count the number of objects having null or "null":
count(.. | objects | select(hasValue("null") or hasValue(null)))

You were almost there. For arrays you could use IN. I also used objects, strings and arrays which are shortcuts to a select of the according types.
jq '[.. | objects.type | select(strings == "null", IN(arrays[]; "null"))] | length'
2
Demo
On larger structures you could also improve performance by not creating that array of which you would only calculate the length, but by instead just iterating over the matching items (e.g. using reduce) and counting on the go.
jq 'reduce (.. | objects.type | select(strings == "null", IN(arrays[]; "null"))) as $_ (0; .+1)'
2
Demo

Related

How to construct object in function using key passed as argument

I frequently need to create reusable function that performs transformations for a given field of input, for example:
def keep_field_only(field):
{field}
;
or
def count_by(field):
group_by(field) |
map(
{
field: .[0].field,
count: length
}
)
;
While group_by works fine with key passed as an argument, using it to construct object (eg. to keep only key in the object) doesn't work.
I believe it can be always worked around using path/1, but in my experience it significantly complicates code.
Other workaround I used is copying field +{new_field: field} at beginning of function, deleting it in the end, but it doesn't look very efficient or readable either.
Is there a shorter and more readable way?
Update:
Sample input:
[
{"type":1, "name": "foo"},
{"type":1, "name": "bar"},
{"type":2, "name": "joe"}
]
Preferred function invocation and expected results:
.[] | keep_field_only(.type):
{"type": 1}
{"type": 1}
{"type": 2}
count_by(.type):
[
{"type":1, "count": 2},
{"type":2, "count": 1}
]
You can define pick/1 as below,
def pick(paths):
. as $in
| reduce path(paths) as $path (null;
setpath($path; $in | getpath($path))
);
and use it like so:
.[] | pick(.type)
Online demo
def count_by(paths; filter):
group_by(paths | filter) | map(
(.[0] | pick(paths)) + {count: length}
);
def count_by(paths):
count_by(paths; .);
count_by(.type)
Online demo
I don't think there's a shorter and more readable way.
As you say, you can use path/1 to define your keep_field_only and count_by, but it can be done in a very simple way:
def keep_field_only(field):
(null | path(field)[0]) as $field
| {($field): field} ;
def count_by(field):
(null | path(field)[0]) as $field
| group_by(field)
| map(
{
($field): .[0][$field],
count: length
}
);
Of course this is only intended to work in examples like yours, e.g. with invocations like keep_field_only(.type) or count_by(.type).
However, thanks to setpath, the same technique can be used in more complex cases.

how to alter values in a field of a json array with jq?

I would like to assign different values to a same field in a json array using jq command line utility,
for example let's take this:
myjson='[{"a":1,"b":"john"}, {"a":2, "b":"mark"}]'
with the following I can assign all fields a same value
echo $myjson | jq '[.[] | .a=5 ]'
I can also increment it:
echo $myjson | jq '[.[] | .a+=2]'
but how can I assign two new different values like 7 and 3 to the two a fields in the objects in the array?
using the following, or similar assigns a whole array to each variable
echo $myjson | jq '[.[] | .a=[7,3]]'
Following your scheme, there is no such solution because all of your filters address all items "at once", meaning independently from their position in the array. That's why you can only set them to the same value or otherwise treat them in the same way, without using their index, which would be needed to address the specific new value to be used.
To achieve your goal, you will have to iterate over either array indices or the aligned elements.
The former could be implemented by using reduce to iterate over the new values in combination with to_entries to access their position (key), and successively update the input array's items using key and value.
reduce ([3,7] | to_entries[]) as $i (.; .[$i.key].a = $i.value)
[
{
"a": 3,
"b": "john"
},
{
"a": 7,
"b": "mark"
}
]
Demo
The latter could be implemented using the transpose builtin, which aligns elements of two (or any number of) arrays by matching their indices, and outputs them as a combined array.
[[3,7], .] | transpose | map(.[0] as $a | .[1] | .a = $a)
[
{
"a": 3,
"b": "john"
},
{
"a": 7,
"b": "mark"
}
]
Demo
Note: Your sample filters could be condensed to .[].a |= 5 (Demo) and .[].a += 2 (Demo), respectively.

"Transpose"/"Rotate"/"Flip" JSON elements

I would like to "transpose" (not sure that's the right word) JSON elements.
For example, I have a JSON file like this:
{
"name": {
"0": "fred",
"1": "barney"
},
"loudness": {
"0": "extreme",
"1": "not so loud"
}
}
... and I would like to generate a JSON array like this:
[
{
"name": "fred",
"loudness": "extreme"
},
{
"name": "barney",
"loudness": "not so loud"
}
]
My original JSON has many more first level elements than just "name" and "loudness", and many more names, features, etc.
For this simple example I could fully specify the transformation like this:
$ echo '{"name":{"0":"fred","1":"barney"},"loudness":{"0":"extreme","1":"not so loud"}}'| \
> jq '[{"name":.name."0", "loudness":.loudness."0"},{"name":.name."1", "loudness":.loudness."1"}]'
[
{
"name": "fred",
"loudness": "extreme"
},
{
"name": "barney",
"loudness": "not so loud"
}
]
... but this isn't feasible for the original JSON.
How can jq create the desired output while being key-agnostic for my much larger JSON file?
Yes, transpose is an appropriate word, as the following makes explicit.
The following generic helper function makes for a simple solution that is completely agnostic about the key names, both of the enclosing object and the inner objects:
# Input: an array of values
def objectify($keys):
. as $in | reduce range(0;length) as $i ({}; .[$keys[$i]] = $in[$i]);
Assuming consistency of the ordering of the inner keys
Assuming the key names in the inner objects are given in a consistent order, a solution can now obtained as follows:
keys_unsorted as $keys
| [.[] | [.[]]] | transpose
| map(objectify($keys))
Without assuming consistency of the ordering of the inner keys
If the ordering of the inner keys cannot be assumed to be consistent, then one approach would be to order them, e.g. using this generic helper function:
def reorder($keys):
. as $in | reduce $keys[] as $k ({}; .[$k] = $in[$k]);
or if you prefer a reduce-free def:
def reorder($keys): [$keys[] as $k | {($k): .[$k]}] | add;
The "main" program above can then be modified as follows:
keys_unsorted as $keys
| (.[$keys[0]]|keys_unsorted) as $inner
| map_values(reorder($inner))
| [.[] | [.[]]] | transpose
| map(objectify($keys))
Caveat
The preceding solution only considers the key names in the first inner object.
Building upon Peak's solution, here is an alternative based on group_by to deal with arbitrary orders of inner keys.
keys_unsorted as $keys
| map(to_entries[])
| group_by(.key)
| map(with_entries(.key = $keys[.key] | .value |= .value))
Using paths is a good idea as pointed out by Hobbs. You could also do something like this :
[ path(.[][]) as $p | { key: $p[0], value: getpath($p), id: $p[1] } ]
| group_by(.id)
| map(from_entries)
This is a bit hairy, but it works:
. as $data |
reduce paths(scalars) as $p (
[];
setpath(
[ $p[1] | tonumber, $p[0] ];
( $data | getpath($p) )
)
)
First, capture the top level as $data because . is about to get a new value in the reduce block.
Then, call paths(scalars) which gives a key path to all of the leaf nodes in the input. e.g. for your sample it would give ["name", "0"] then ["name", "1"], then ["loudness", "0"], then ["loudness", "1"].
Run a reduce on each of those paths, starting the reduction with an empty array.
For each path, construct a new path, in the opposite order, with numbers-in-strings turned into real numbers that can be used as array indices, e.g. ["name", "0"] becomes [0, "name"].
Then use getpath to get the value at the old path in $data and setpath to set a value at the new path in . and return it as the next . for the reduce.
At the end, the result will be
[
{
"name": "fred",
"loudness": "extreme"
},
{
"name": "barney",
"loudness": "not so loud"
}
]
If your real data structure might be two levels deep then you would need to replace [ $p[1] | tonumber, $p[0] ] with a more appropriate expression to transform the path. Or maybe some of your "values" are objects/arrays that you want to leave alone, in which case you probably need to replace paths(scalars) with something like paths | select(length == 2).

jq get number of jsons in an array containing a specific value

I've got an array of multiple JSON. I would like to get the number of of JSON which contain a specific value.
Example:
[
{
"key": "value1",
"2ndKey":"2ndValue1"
},
{
"key": "value2",
"2ndKey":"2ndValue2"
},
{
"key": "value1",
"2ndKey":"2ndValue3"
}
]
So in case I'm looking for value1 in key, the result should be 2.
I would like to get an solution using jq. I had already some tries, however they did not fully work. The best one yet was the following:
cat /tmp/tmp.txt | jq ' select(.[].key == "value1" ) | length '
I get the correct results but it is shown multiple times.
Can anybody help me to further improve my code. Thanks in advance!
You are pretty close. Try this
map(select(.key == "value1")) | length
or the equivalent
[ .[] | select(.key == "value1") ] | length
An efficient and convenient way to count is to use 'count' as defined below:
def count(s; cond): reduce s as $x (0; if ($x|cond) then .+1 else . end);
count(.[]; .key == "value1")

Extract schema of nested JSON object

Let's assume this is the source json file:
{
"name": "tom",
"age": 12,
"visits": {
"2017-01-25": 3,
"2016-07-26": 4,
"2016-01-24": 1
}
}
I want to get:
[
"age",
"name",
"visits.2017-01-25",
"visits.2016-07-26",
"visits.2016-01-24"
]
I am able to extract the keys using: jq '. | keys' file.json, but this skips nested fields. How to include those?
With your input, the invocation:
jq 'leaf_paths | join(".")'
produces:
"name"
"age"
"visits.2017-01-25"
"visits.2016-07-26"
"visits.2016-01-24"
If you want to include "visits", use paths. If you want the result as a JSON array, enclose the filter with square brackets: [ ... ]
If your input might include arrays, then unless you are using jq 1.6 or later, you will need to convert the integer indices to strings explicitly; also, since leaf_paths is now deprecated, you might want to use its def. The result:
jq 'paths(scalars) | map(tostring) | join(".")'
allpaths
To include paths to null, you could use allpaths defined as follows:
def allpaths:
def conditional_recurse(f): def r: ., (select(.!=null) | f | r); r;
path(conditional_recurse(.[]?)) | select(length > 0);
Example:
{"a": null, "b": false} | allpaths | join(".")
produces:
"a"
"b"
all_leaf_paths
Assuming jq version 1.5 or higher, we can get to all_leaf_paths by following the strategy used in builtins.jq, that is, by adding these definitions:
def allpaths(f):
. as $in | allpaths | select(. as $p|$in|getpath($p)|f);
def isscalar:
. == null or . == true or . == false or type == "number" or type == "string";
def all_leaf_paths: allpaths(isscalar);
Example:
{"a": null, "b": false, "object":{"x":0} } | all_leaf_paths | join(".")
produces:
"a"
"b"
"object.x"
Some time ago, I wrote a structural-schema inference engine that
produces simple structural schemas that mirror the JSON documents under consideration,
e.g. for the sample JSON given here, the inferred schema is:
{
"name": "string",
"age": "number",
"visits": {
"2017-01-25": "number",
"2016-07-26": "number",
"2016-01-24": "number"
}
}
This is not exactly the format requested in the original posting, but
for large collections of objects, it does provide a useful overview.
More importantly, there is now a complementary validator for
checking whether a collection of JSON documents matches a structural
schema. The validator checks against schemas written in
JESS (JSON Extended Structural Schemas), a superset of the simple
structural schemas (SSS) produced by the schema inference engine .
(The idea is that one can use the SSS as a starting point to add
more elaborate constraints, including recursive constraints,
within-document referential integrity constraints, etc.)
For reference, here is how one the SSS for your sample.json
would be produced using the "schema" module:
jq 'include "schema"; schema' source.json > source.schema.json
And to validate source.json against a SSS or ESS:
JESS --schema source.schema.json source.json
This does what you want but it doesn't return the data in an array, but it should be an easy modification:
https://github.com/ilyash/show-struct
you can also check out this page:
https://ilya-sher.org/2016/05/11/most-jq-you-will-ever-need/