Count records with missing keys using jq

Below is a sample output that is returned when calling an API:
curl "https://mywebsite.com/api/cars.json&page=1" | jq '.'
Using jq, how would one count the number of records where the charge key is missing? I understand that the first bit of code would involve jq '. | length', but how would one filter out objects that contain, or don't contain, a certain key?
If applied to the sample below, the output would be 1
{
  "current_page": 1,
  "items": [
    {
      "id": 1,
      "name": "vehicleA",
      "state": "available",
      "charge": 100
    },
    {
      "id": 2,
      "name": "vehicleB",
      "state": "available",
    },
    {
      "id": 3,
      "name": "vehicleB",
      "state": "available",
      "charge": 50
    }
  ]
}

Here is a solution using map and length:
.items | map(select(.charge == null)) | length
Try it online at jqplay.org
Here is a more efficient solution using reduce:
reduce (.items[] | select(.charge == null)) as $i (0;.+=1)
Try it online at jqplay.org
Sample Run (assuming corrected JSON data in data.json)
$ jq -M 'reduce (.items[] | select(.charge == null)) as $i (0;.+=1)' data.json
1
Note that each of the above takes a minor shortcut, assuming that the items won't have a "charge": null member. If some items could have a null charge, then the == null test won't distinguish between those items and items without the charge key. If this is a concern, the following forms of the above filters, which use has, are better:
.items | map(select(has("charge")|not)) | length
reduce (.items[] | select(has("charge")|not)) as $i (0;.+=1)
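To see the difference, consider a small made-up input (not part of the original sample) in which one item has an explicit "charge": null member:
$ echo '{"items":[{"id":1,"charge":null},{"id":2}]}' |
>   jq '{eq_null: (.items | map(select(.charge == null)) | length),
>        has_test: (.items | map(select(has("charge")|not)) | length)}'
{
  "eq_null": 2,
  "has_test": 1
}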

Here is a solution that uses a simple but powerful utility function worthy perhaps of your standard library:
def sigma(stream): reduce stream as $s (null; . + $s);
The filter you'd use with this would be:
sigma(.items[] | select(has("charge") == false) | 1)
This is very efficient as no intermediate array is required, and no useless additions of 0 are involved. Also, as mentioned elsewhere, using has is more robust than making assumptions about the value of .charge.
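For instance, a self-contained run against the same data.json, with sigma defined inline rather than in a library, would be:
$ jq 'def sigma(stream): reduce stream as $s (null; . + $s);
>       sigma(.items[] | select(has("charge") | not) | 1)' data.json
1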
Startup file
If you have no plans to use jq's module system, you can simply add the above definition of sigma to the file ~/.jq and invoke jq like so:
jq 'sigma(.items[] | select(has("charge") == false) | 1)'
Better yet, if you also add def count(s): sigma(s|1); to the file, the invocation would simply be:
jq 'count(.items[] | select(has("charge") | not))'
Standard Library
If for example ~/.jq/jq/jq.jq is your standard library, then assuming count/1 is included in this file, you could invoke jq like so:
jq 'include "jq"; count(.items[] | select(has("charge") == false))'

How to construct object in function using key passed as argument

I frequently need to create a reusable function that performs transformations on a given field of the input, for example:
def keep_field_only(field):
  {field}
;
or
def count_by(field):
  group_by(field) |
  map(
    {
      field: .[0].field,
      count: length
    }
  )
;
While group_by works fine with a key passed as an argument, using the argument to construct an object (e.g. to keep only that key in the object) doesn't work.
I believe it can always be worked around using path/1, but in my experience it significantly complicates the code.
Another workaround I have used is copying the field with +{new_field: field} at the beginning of the function and deleting it at the end, but that doesn't look very efficient or readable either.
Is there a shorter and more readable way?
Update:
Sample input:
[
{"type":1, "name": "foo"},
{"type":1, "name": "bar"},
{"type":2, "name": "joe"}
]
Preferred function invocation and expected results:
.[] | keep_field_only(.type):
{"type": 1}
{"type": 1}
{"type": 2}
count_by(.type):
[
{"type":1, "count": 2},
{"type":2, "count": 1}
]
You can define pick/1 as below,
def pick(paths):
  . as $in
  | reduce path(paths) as $path (null;
      setpath($path; $in | getpath($path))
    );
and use it like so:
.[] | pick(.type)
Online demo
def count_by(paths; filter):
  group_by(paths | filter) | map(
    (.[0] | pick(paths)) + {count: length}
  );

def count_by(paths):
  count_by(paths; .);
count_by(.type)
Online demo
I don't think there's a shorter and more readable way.
As you say, you can use path/1 to define your keep_field_only and count_by, but it can be done in a very simple way:
def keep_field_only(field):
  (null | path(field)[0]) as $field
  | {($field): field};

def count_by(field):
  (null | path(field)[0]) as $field
  | group_by(field)
  | map(
      {
        ($field): .[0][$field],
        count: length
      }
    );
Of course this is only intended to work in examples like yours, e.g. with invocations like keep_field_only(.type) or count_by(.type).
However, thanks to setpath, the same technique can be used in more complex cases.
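As a quick check (assuming the sample input above is saved in input.json), the path-based keep_field_only reproduces the expected results:
$ jq -c 'def keep_field_only(field):
>          (null | path(field)[0]) as $field
>          | {($field): field};
>        .[] | keep_field_only(.type)' input.json
{"type":1}
{"type":1}
{"type":2}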

"Transpose"/"Rotate"/"Flip" JSON elements

I would like to "transpose" (not sure that's the right word) JSON elements.
For example, I have a JSON file like this:
{
  "name": {
    "0": "fred",
    "1": "barney"
  },
  "loudness": {
    "0": "extreme",
    "1": "not so loud"
  }
}
... and I would like to generate a JSON array like this:
[
  {
    "name": "fred",
    "loudness": "extreme"
  },
  {
    "name": "barney",
    "loudness": "not so loud"
  }
]
My original JSON has many more first level elements than just "name" and "loudness", and many more names, features, etc.
For this simple example I could fully specify the transformation like this:
$ echo '{"name":{"0":"fred","1":"barney"},"loudness":{"0":"extreme","1":"not so loud"}}'| \
> jq '[{"name":.name."0", "loudness":.loudness."0"},{"name":.name."1", "loudness":.loudness."1"}]'
[
  {
    "name": "fred",
    "loudness": "extreme"
  },
  {
    "name": "barney",
    "loudness": "not so loud"
  }
]
... but this isn't feasible for the original JSON.
How can jq create the desired output while being key-agnostic for my much larger JSON file?
Yes, transpose is an appropriate word, as the following makes explicit.
The following generic helper function makes for a simple solution that is completely agnostic about the key names, both of the enclosing object and the inner objects:
# Input: an array of values
def objectify($keys):
  . as $in | reduce range(0;length) as $i ({}; .[$keys[$i]] = $in[$i]);
Assuming consistency of the ordering of the inner keys
Assuming the key names in the inner objects are given in a consistent order, a solution can now be obtained as follows:
keys_unsorted as $keys
| [.[] | [.[]]] | transpose
| map(objectify($keys))
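Putting the helper and the main program together (assuming the input shown above is in input.json, a name chosen here for illustration):
$ jq 'def objectify($keys):
>       . as $in | reduce range(0;length) as $i ({}; .[$keys[$i]] = $in[$i]);
>     keys_unsorted as $keys
>     | [.[] | [.[]]] | transpose
>     | map(objectify($keys))' input.json
[
  {
    "name": "fred",
    "loudness": "extreme"
  },
  {
    "name": "barney",
    "loudness": "not so loud"
  }
]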
Without assuming consistency of the ordering of the inner keys
If the ordering of the inner keys cannot be assumed to be consistent, then one approach would be to order them, e.g. using this generic helper function:
def reorder($keys):
  . as $in | reduce $keys[] as $k ({}; .[$k] = $in[$k]);
or if you prefer a reduce-free def:
def reorder($keys): [$keys[] as $k | {($k): .[$k]}] | add;
The "main" program above can then be modified as follows:
keys_unsorted as $keys
| (.[$keys[0]]|keys_unsorted) as $inner
| map_values(reorder($inner))
| [.[] | [.[]]] | transpose
| map(objectify($keys))
Caveat
The preceding solution only considers the key names in the first inner object.
Building upon Peak's solution, here is an alternative based on group_by to deal with arbitrary orders of inner keys.
keys_unsorted as $keys
| map(to_entries[])
| group_by(.key)
| map(with_entries(.key = $keys[.key] | .value |= .value))
Using paths is a good idea, as pointed out by Hobbs. You could also do something like this:
[ path(.[][]) as $p | { key: $p[0], value: getpath($p), id: $p[1] } ]
| group_by(.id)
| map(from_entries)
This is a bit hairy, but it works:
. as $data |
reduce paths(scalars) as $p (
  [];
  setpath(
    [ $p[1] | tonumber, $p[0] ];
    ( $data | getpath($p) )
  )
)
First, capture the top level as $data because . is about to get a new value in the reduce block.
Then, call paths(scalars) which gives a key path to all of the leaf nodes in the input. e.g. for your sample it would give ["name", "0"] then ["name", "1"], then ["loudness", "0"], then ["loudness", "1"].
Run a reduce on each of those paths, starting the reduction with an empty array.
For each path, construct a new path, in the opposite order, with numbers-in-strings turned into real numbers that can be used as array indices, e.g. ["name", "0"] becomes [0, "name"].
Then use getpath to get the value at the old path in $data and setpath to set a value at the new path in . and return it as the next . for the reduce.
At the end, the result will be
[
  {
    "name": "fred",
    "loudness": "extreme"
  },
  {
    "name": "barney",
    "loudness": "not so loud"
  }
]
If your real data structure might be more than two levels deep then you would need to replace [ $p[1] | tonumber, $p[0] ] with a more appropriate expression to transform the path. Or maybe some of your "values" are objects/arrays that you want to leave alone, in which case you probably need to replace paths(scalars) with something like paths | select(length == 2).

Fine tuning jq filters to reduce repetition in filter string

I have a complex JSON object produced from an API call (full JSON found in this gist). It's describing attributes of an entity (fields, parameters, child relationships, etc.). Using jq, I'm trying to extract just one child field array and convert it to CSV where field keys are a single header row and values of each array item form the subsequent rows. (NOTE: fields are uniform across all items in the array.)
So far I'm successful, but I feel as if my jq filter string could be better as there is a repetition of unpacking this array in two separate filters.
Here is a redacted version of the JSON for reference:
{
  ...
  "result": {
    ...
    "fields": [
      {
        "aggregatable": true,
        "aiPredictionField": false,
        "autoNumber": false,
        "byteLength": 18,
        "name": "Id",
        ...
      },
      {
        "aggregatable": true,
        "aiPredictionField": false,
        "autoNumber": false,
        "byteLength": 18,
        "name": "OwnerId",
        ...
      },
      {
        "aggregatable": false,
        "aiPredictionField": false,
        "autoNumber": false,
        "byteLength": 0,
        "name": "IsDeleted",
        ...
      },
      ...
    ],
    ...
  }
}
So far, here is the working command:
jq -r '.result.fields | (.[0] | keys) , .[] | [.[] | tostring] | @csv'
repeated array unpacking---^-------------^
I could be happy with this, but I would prefer to unpack the result.fields array in the first filter so that it starts out like this:
jq -r '.result.fields[] | ...
Only then there is no longer an array, just a set of objects. I tried several things, but none of them gave me what I wanted. Here are two things I tried before I realized that unpacking .result.fields[] destroyed anything array-like for me to work with (yep... slow learner here, and can be a bit thick):
jq -r '.result.fields[] | ( keys | .[0] ) , [.[] | tostring] | @csv'
jq -r '.result.fields[] | keys[0] , [.[] | tostring] | @csv'
So the real question is: can I unpack result.fields once and then work with what that gives me? And if not, is there a more efficient way to arrive at the CSV structure I'm looking for?
Your code is buggy, because keys sorts the keys. What's needed here is keys_unsorted.
If you want to accomplish everything in a single invocation of jq, you cannot start the pipeline with result.fields[].
The following does avoid one very small inefficiency of your approach:
.result.fields
| (.[0] | keys_unsorted),
(.[] | [.[] | tostring])
| @csv
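For illustration, against a trimmed version of the sample above (only the keys actually shown, ellipses removed, saved here as a hypothetical trimmed.json), this would produce something like:
$ jq -r '.result.fields
>        | (.[0] | keys_unsorted),
>          (.[] | [.[] | tostring])
>        | @csv' trimmed.json
"aggregatable","aiPredictionField","autoNumber","byteLength","name"
"true","false","false","18","Id"
"true","false","false","18","OwnerId"
"false","false","false","0","IsDeleted"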

Select entries based on multiple values in jq

I'm working with JQ and I absolutely love it so far. I'm running into an issue I've yet to find a solution to anywhere else, though, and wanted to see if the community had a way to do this.
Let's presume we have a JSON file that looks like so:
{"author": "Gary", "text": "Blah"}
{"author": "Larry", "text": "More Blah"}
{"author": "Jerry", "text": "Yet more Blah"}
{"author": "Barry", "text": "Even more Blah"}
{"author": "Teri", "text": "Text on text on text"}
{"author": "Bob", "text": "Another thing to say"}
Now, we want to select rows where the value of author is equal to either "Gary" OR "Larry", but no other case. In reality, I have several thousand names I'm checking against, so simply writing out the conditions directly (e.g. cat blah.json | jq -r 'select(.author == "Gary" or .author == "Larry")') isn't sufficient. I'm trying to do this via the inside function like so, but get an error:
cat blah.json | jq -r 'select(.author | inside(["Gary", "Larry"]))'
jq: error (at <stdin>:1): array (["Gary","La...) and string ("Gary") cannot have their containment checked
What would be the best method for doing something like this?
inside and contains are a bit weird. Here are some more straightforward solutions:
index/1
select( .author as $a | ["Gary", "Larry"] | index($a) )
any/2
["Gary", "Larry"] as $whitelist
| select( .author as $a | any( $whitelist[]; . == $a) )
Using a dictionary
If performance is an issue and if "author" is always a string, then a solution along the lines suggested by @JeffMercado should be considered. Here is a variant (to be used with the -n command-line option):
["Gary", "Larry"] as $whitelist
| ($whitelist | map( {(.): true} ) | add) as $dictionary
| inputs
| select($dictionary[.author])
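A complete invocation against the sample file (here assumed to be blah.json) would be:
$ jq -cn '["Gary", "Larry"] as $whitelist
>         | ($whitelist | map( {(.): true} ) | add) as $dictionary
>         | inputs
>         | select($dictionary[.author])' blah.json
{"author":"Gary","text":"Blah"}
{"author":"Larry","text":"More Blah"}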
IRC user gnomon answered this on the jq channel as follows:
jq 'select([.author] | inside(["Larry", "Garry", "Jerry"]))'
The intuition behind this approach, as stated by the user was: "Literally your idea, only wrapping .author as [.author] to coerce it into being a single-item array so inside() will work on it." This answer produces the desired result of filtering for a series of names provided in a list as the original question desired.
You can use objects as if they're sets to test for membership. Methods operating on arrays will be inefficient, especially if the array may be huge.
You can build up a set of values prior to reading your input, then use the set to filter your inputs.
$ jq -n --argjson names '["Larry","Garry","Jerry"]' '
(reduce $names[] as $name ({}; .[$name] = true)) as $set
| inputs | select($set[.author])
' blah.json

Extract schema of nested JSON object

Let's assume this is the source json file:
{
  "name": "tom",
  "age": 12,
  "visits": {
    "2017-01-25": 3,
    "2016-07-26": 4,
    "2016-01-24": 1
  }
}
I want to get:
[
  "age",
  "name",
  "visits.2017-01-25",
  "visits.2016-07-26",
  "visits.2016-01-24"
]
I am able to extract the keys using: jq '. | keys' file.json, but this skips nested fields. How to include those?
With your input, the invocation:
jq 'leaf_paths | join(".")'
produces:
"name"
"age"
"visits.2017-01-25"
"visits.2016-07-26"
"visits.2016-01-24"
If you want to include "visits", use paths. If you want the result as a JSON array, enclose the filter with square brackets: [ ... ]
If your input might include arrays, then unless you are using jq 1.6 or later, you will need to convert the integer indices to strings explicitly; also, since leaf_paths is now deprecated, you might want to use its definition, paths(scalars), directly. The resulting filter:
jq 'paths(scalars) | map(tostring) | join(".")'
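For example, with a small made-up input that contains an array, the numeric indices are stringified so that join/1 works regardless of jq version:
$ echo '{"a": [10, 20], "b": {"c": "x"}}' |
>   jq 'paths(scalars) | map(tostring) | join(".")'
"a.0"
"a.1"
"b.c"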
allpaths
To include paths to null, you could use allpaths defined as follows:
def allpaths:
  def conditional_recurse(f): def r: ., (select(.!=null) | f | r); r;
  path(conditional_recurse(.[]?)) | select(length > 0);
Example:
{"a": null, "b": false} | allpaths | join(".")
produces:
"a"
"b"
all_leaf_paths
Assuming jq version 1.5 or higher, we can get to all_leaf_paths by following the strategy used in builtins.jq, that is, by adding these definitions:
def allpaths(f):
  . as $in | allpaths | select(. as $p|$in|getpath($p)|f);
def isscalar:
  . == null or . == true or . == false or type == "number" or type == "string";
def all_leaf_paths: allpaths(isscalar);
Example:
{"a": null, "b": false, "object":{"x":0} } | all_leaf_paths | join(".")
produces:
"a"
"b"
"object.x"
Some time ago, I wrote a structural-schema inference engine that produces simple structural schemas mirroring the JSON documents under consideration; e.g. for the sample JSON given here, the inferred schema is:
{
  "name": "string",
  "age": "number",
  "visits": {
    "2017-01-25": "number",
    "2016-07-26": "number",
    "2016-01-24": "number"
  }
}
This is not exactly the format requested in the original posting, but for large collections of objects, it does provide a useful overview.
More importantly, there is now a complementary validator for checking whether a collection of JSON documents matches a structural schema. The validator checks against schemas written in JESS (JSON Extended Structural Schemas), a superset of the simple structural schemas (SSS) produced by the schema inference engine. (The idea is that one can use the SSS as a starting point to add more elaborate constraints, including recursive constraints, within-document referential integrity constraints, etc.)
For reference, here is how the SSS for your sample.json would be produced using the "schema" module:
jq 'include "schema"; schema' source.json > source.schema.json
And to validate source.json against a SSS or ESS:
JESS --schema source.schema.json source.json
This does what you want, though it doesn't return the data in an array; that should be an easy modification:
https://github.com/ilyash/show-struct
you can also check out this page:
https://ilya-sher.org/2016/05/11/most-jq-you-will-ever-need/