jq - count number of items matching select - json

I am trying to count the number of records that match some property.
If I have JSON like:
[
{
"id": 0,
"count": 1
},
{
"id": 1,
"count": 1
},
{
"id": 2,
"count": 0
}
]
I am trying to get the number of records with a count of 1.
I can get the matching records with:
$ jq '.[] | select(.count == 1)' in.json
{
"id": 0,
"count": 1
}
{
"id": 1,
"count": 1
}
But select emits the two matching objects as separate results rather than as a single array, so I cannot apply length to them directly. Piping them into length instead gives the length of each object (its number of keys):
$ jq '.[] | select(.count == 1) | length' in.json
2
2
How can I count how many records were matched by select?

For efficiency, avoid constructing an array only to take its length; a stream-oriented approach is preferable.
Here's one efficient solution, which, for convenience, uses the generic count function, defined as:
def count(stream): reduce stream as $i (0; .+1);
With this def, the solution is simply:
count(.[] | select(.count==1))
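For comparison, the same distinction between counting a stream and measuring a constructed array can be sketched in Python. This is only an analogy, not jq code; the helper name count_matching and the inline sample data are illustrative assumptions.

```python
import json

# Sample input mirroring in.json from the question.
records = json.loads('[{"id": 0, "count": 1}, {"id": 1, "count": 1}, {"id": 2, "count": 0}]')


def count_matching(items, pred):
    """Count items satisfying pred without building an intermediate list.

    Roughly analogous to jq's: reduce stream as $i (0; .+1)
    """
    n = 0
    for item in items:
        if pred(item):
            n += 1
    return n


# Stream-style count, like count(.[] | select(.count == 1)).
streamed = count_matching(records, lambda r: r["count"] == 1)

# Array-style count, like map(select(.count == 1)) | length:
# the filtered list is built first, then measured.
materialized = len([r for r in records if r["count"] == 1])

print(streamed, materialized)  # 2 2
```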

Use jq 'map(select(.count==1))|length' in.json.
See the select example in the jq manual: https://stedolan.github.io/jq/manual/#select(boolean_expression)


Is there a way to filter a JSON object using jq to only include those with a key matching a value from a known list?

I have a JSON array, and another text file that contains a list of values.
[
{
"key": "foo",
"detail": "bar"
},
...
]
I need to filter the array elements to only those that have a "key" value that is found in the list of values.
The list of values is a text file containing a single item per line.
foo
baz
Is this possible to do using jq?
You can use the following:
jq --rawfile to_keep_file to_keep.txt '
( [ $to_keep_file | match(".+"; "g").string | { (.): true } ] | add ) as $to_keep_lkup |
map(select($to_keep_lkup[.key]))
' to_filter.json
or
(
jq -sR . to_keep.txt
cat to_filter.json
) | jq -n '
( [ input | match(".+"; "g").string | { (.): true } ] | add ) as $to_keep_lkup |
inputs | map(select($to_keep_lkup[.key]))
'
The former requires jq v1.6, the first version to provide --rawfile.
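The lookup-table idea (turn each line of the text file into a key, then keep only the objects whose .key is in that table) can be sketched in Python as follows. The helper names load_keys and filter_by_key and the inline sample data are hypothetical stand-ins for to_keep.txt and to_filter.json.

```python
import json


def load_keys(text):
    """Build a lookup set from newline-separated text, skipping empty lines.

    Plays the role of the {(.): true} dictionary built in the jq version.
    """
    return {line for line in text.splitlines() if line}


def filter_by_key(items, keys):
    """Keep only the objects whose "key" value appears in keys."""
    return [item for item in items if item.get("key") in keys]


# Inline stand-ins for to_keep.txt and to_filter.json (hypothetical data).
to_keep_txt = "foo\nbaz\n"
to_filter = json.loads(
    '[{"key": "foo", "detail": "bar"},'
    ' {"key": "qux", "detail": "x"},'
    ' {"key": "baz", "detail": "y"}]'
)

result = filter_by_key(to_filter, load_keys(to_keep_txt))
print(json.dumps(result))
```

A set gives constant-time membership tests, which is the same reason the jq answer builds an object rather than searching an array for every element.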

jq - Get a higher level key after a selection

Given JSON like the following:
{
"data": [{
"id": "1a2b3c",
"info": {
"a": {
"number": 0
},
"b": {
"number": 1
},
"c": {
"number": 2
}
}
}]
}
I want to select the entries whose number is greater than or equal to 2, and for each selected entry return the values of id and number. I did this like so:
$ jq -r '.data[] | .id as $ID | .info[] | select(.number >= 2) | [$ID, .number]' in.json
[
"1a2b3c",
2
]
Now I would also like to return a higher-level key for my selection; in my case, I need to return c. How can I accomplish this?
Assuming you want the string "c" instead of 2 in the output, this will work:
$ jq '.data[] | .id as $ID | .info | to_entries[] | select(.value.number >= 2) | [$ID, .key]' in.json
[
"1a2b3c",
"c"
]
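The reason to_entries helps is that it exposes each object key alongside its value. A rough Python analogue iterates over dict.items(); the function name select_keys is an illustrative assumption.

```python
import json

doc = json.loads("""
{"data": [{"id": "1a2b3c",
           "info": {"a": {"number": 0},
                    "b": {"number": 1},
                    "c": {"number": 2}}}]}
""")


def select_keys(doc, threshold):
    """Yield [id, key] pairs for info entries whose number >= threshold.

    dict.items() pairs each key with its value, much as jq's to_entries
    turns {"c": {...}} into {"key": "c", "value": {...}}.
    """
    for record in doc["data"]:
        record_id = record["id"]  # like `.id as $ID`
        for key, value in record["info"].items():
            if value["number"] >= threshold:
                yield [record_id, key]


print(list(select_keys(doc, 2)))  # [['1a2b3c', 'c']]
```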

JQ - Denormalize nested object

I've been trying to convert some JSON to CSV and I have the following problem:
I have the following input json:
{"id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }
{"id": 200, "a": [{"t": 2, "c" : 3 }] }
{"id": 300, "a": [{"t": 1, "c" : 3 }] }
And I expect the following CSV output:
id,t1,t2
100,2,3
200,,3
300,3,
Unfortunately, jq produces no output at all if one of the select expressions has no match.
Example:
echo '{ "id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }' | jq '{t1: (.a[] | select(.t==1)).c , t2: (.a[] | select(.t==2)).c }'
output:
{ "t1": 2, "t2": 3 }
But if the select for one of the keys finds no match, jq produces no output at all.
Example:
echo '{ "id": 100, "a": [{"t" : 1,"c" : 2 }] }' | jq '{t1: (.a[] | select(.t==1)).c , t2: (.a[] | select(.t==2)).c }'
Expected output:
{ "t1": 2, "t2": null }
Does anyone know how to achieve this with JQ?
EDIT:
Based on a comment by @peak, I found the solution I was looking for:
jq -r '["id","t1","t2"],[.id, (.a[] | select(.t==1)).c//null, (.a[] | select(.t==2)).c//null ]|@csv'
The alternative operator (//) does exactly what I was looking for.
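A rough Python analogue of falling back to null when select finds nothing is next() with a default. This sketch (the helper first_c is hypothetical) returns only the first match, whereas the jq expression emits every matching value, and jq's // also replaces results that are false or null.

```python
import json


def first_c(a, t):
    """Return .c of the first element whose .t equals t, or None if none does.

    Loosely mirrors (.a[] | select(.t == t)).c // null in jq.
    """
    return next((item["c"] for item in a if item["t"] == t), None)


row = json.loads('{"id": 100, "a": [{"t": 1, "c": 2}]}')
print(json.dumps({"t1": first_c(row["a"], 1), "t2": first_c(row["a"], 2)}))
# {"t1": 2, "t2": null}
```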
Here's a simple solution that does not assume anything about the ordering of the items in the .a array, and easily generalizes to arbitrarily many .t values:
# Convert an array of {t, c} to a dictionary:
def tod: map({(.t|tostring): .c}) | add;
["id", "t1", "t2"], # header
(inputs
| (.a | tod) as $dict
| [.id, (range(1;3) as $i | $dict[$i|tostring]) ])
| @csv
Command-line options
Use the -n option (because inputs is being used), and the -r option (to produce CSV).
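The same dictionary-based approach can be sketched in Python with the standard csv module. The function names tod (borrowed from the jq answer) and to_csv, and the ts parameter listing which t values to emit, are assumptions made for this illustration. Unlike jq's @csv, csv.writer does not quote the header strings by default.

```python
import csv
import io
import json

# Newline-delimited input, as in the question.
NDJSON = """\
{"id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }
{"id": 200, "a": [{"t": 2, "c" : 3 }] }
{"id": 300, "a": [{"t": 1, "c" : 3 }] }
"""


def tod(pairs):
    """Convert a list of {"t": ..., "c": ...} objects into a {t: c} dict."""
    return {p["t"]: p["c"] for p in pairs}


def to_csv(lines, ts=(1, 2)):
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id"] + [f"t{t}" for t in ts])  # header
    for line in lines:
        obj = json.loads(line)
        d = tod(obj["a"])
        # A missing t value becomes an empty cell, as null does with @csv.
        writer.writerow([obj["id"]] + [d.get(t, "") for t in ts])
    return out.getvalue()


print(to_csv(NDJSON.splitlines()), end="")
```

Because lookups go through the dictionary, the order of items inside each "a" array does not matter, and extending ts handles more t values.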
This is an absolute mess, but it works:
$ cat tmp.json
{"id": 100, "a": [{"t" : 1,"c" : 2 }, {"t": 2, "c" : 3 }] }
{"id": 200, "a": [{"t": 2, "c" : 3 }] }
{"id": 300, "a": [{"t": 1, "c" : 3 }] }
$ cat filter.jq
def t(id):
.a |
map({key: "t\(.t)", value: .c}) |
({t1:null, t2:null, id:id} | to_entries) + . | from_entries
;
inputs |
map(.id as $id | t($id)) |
(.[0] | keys) as $hdr |
# Emit each row's values in header order so the columns line up.
([$hdr] + map([.[$hdr[]]]))[] |
@csv
$ jq -rn --slurp -f filter.jq tmp.json
"id","t1","t2"
100,2,3
200,,3
300,3,
In short, you build an object directly from the values in your input, then add it to a "default" object that fills in the missing keys.
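The default-object technique, plus picking values by header name so that each value stays in its own column, looks roughly like this in Python. The helpers fill_defaults and rows_in_header_order are hypothetical; {**defaults, **obj} lets keys from obj win, much as the right-hand object wins in jq's object addition.

```python
def fill_defaults(obj, defaults):
    """Merge obj onto defaults so every expected key is present.

    Keys from obj override those in defaults.
    """
    return {**defaults, **obj}


def rows_in_header_order(objs, header):
    # Look values up by header name instead of relying on key order,
    # so each value lands under the matching header.
    return [[o[h] for h in header] for o in objs]


defaults = {"t1": None, "t2": None}
found = [
    {"id": 100, "t1": 2, "t2": 3},
    {"id": 200, "t2": 3},
    {"id": 300, "t1": 3},
]
merged = [fill_defaults(o, defaults) for o in found]
print(rows_in_header_order(merged, ["id", "t1", "t2"]))
# [[100, 2, 3], [200, None, 3], [300, 3, None]]
```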

How to get latest date key-value pair from json array including parent keys

1) I am trying to generate a CSV file from JSON using jq.
2) I need the parent keys along with one key-value pair from the child array.
3) Whichever value has the latest date will be the resulting key-value pair.
4) I need to generate a CSV from that result.
This is my JSON:
{
"students": [
{
"name": "Name1",
"class": "parentClass1",
"teacher": "teacher1",
"attendance": [
{
"key": "class1",
"value": "01-DEC-2018"
},
{
"key": "class1",
"value": "28-Nov-2018"
},
{
"key": "class1",
"value": "26-Oct-2018"
}
]
},
{
"name": "Name2",
"class": "parentClass2",
"teacher": "teacher2",
"attendance": [
{
"key": "class2",
"value": "05-DEC-2018"
},
{
"key": "class2",
"value": "25-Nov-2018"
},
{
"key": "class2",
"value": "20-Oct-2018"
}
]
}
]
}
I have not made much progress. I am trying to create the CSV like this:
jq '.students[] | [.name, .class, attendance[].key,.properties[].value] | @csv ' main.json
Below is the expected CSV for that JSON:
Name ParentClass key dateValue Summary
Name1 parentClass1 class1 150 days ago (difference between today's date and the latest date, i.e. 01-DEC-2018) Teacher1.parentClass1
Name2 parentClass2 class2 150 days ago (difference between today's date and the latest date, i.e. 05-DEC-2018) Teacher2.parentClass2
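Parse the dates with strptime and assign the results back to .value; you can then get the latest attendance with max_by. This works because strptime produces a "broken down time" array whose first element is the year, followed by the month and day, so the arrays compare chronologically. Convert that value to seconds since the Epoch with mktime, subtract it from now, and divide by 24 * 60 * 60 (86400) to get the number of days since.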
$ jq -r '
def days_since:
(now - .) / 86400 | floor;
.students[]
| [ .name, .class ] +
( .attendance
| map(.value |= strptime("%d-%b-%Y"))
| max_by(.value)
| [ .key, "\(.value | mktime | days_since) days ago" ]
) +
[ .teacher + "." + .class ]
| @tsv' file
Name1 parentClass1 class1 148 days ago teacher1.parentClass1
Name2 parentClass2 class2 144 days ago teacher2.parentClass2
Note that this solution doesn't deal with daylight saving time changes.
For production purposes, jq can't be used here, because it doesn't support date calculations that are safe across daylight saving time changes.
I would use Python instead: it supports daylight-saving-safe date calculations, includes JSON support in its standard library, and is installed on most, if not all, Unix derivatives.
#!/usr/bin/env python3
import argparse
import json
from datetime import datetime
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('filename')
    return parser.parse_args()
def main():
    args = parse_args()
    with open(args.filename) as file_desc:
        data = json.load(file_desc)
    # Header includes the Summary column that every row prints.
    print('Name\tParentClass\tKey\tDateValue\tSummary')
    today = datetime.today()
    for record in data['students']:
        for a in record['attendance']:
            date = datetime.strptime(a['value'], '%d-%b-%Y')
            a['since'] = (today - date).days
        # The latest date is the entry with the fewest days since today.
        last = min(record['attendance'], key=lambda x: x['since'])
        print('\t'.join([
            record['name'],
            record['class'],
            last['key'],
            '{} days ago'.format(last['since']),
            '{}.{}'.format(record['teacher'], record['class']),
        ]))
if __name__ == '__main__':
    main()
Output (on the day when this answer was written):
Name ParentClass Key DateValue Summary
Name1 parentClass1 class1 148 days ago teacher1.parentClass1
Name2 parentClass2 class2 144 days ago teacher2.parentClass2
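To make the day count reproducible (for example in tests), one option is to pass the reference date in rather than calling datetime.today() inside the loop. A minimal sketch, assuming a hypothetical helper days_since_latest and a fixed example reference date:

```python
from datetime import date, datetime


def parse(value):
    """Parse dates such as '01-DEC-2018' into datetime.date objects."""
    return datetime.strptime(value, "%d-%b-%Y").date()


def days_since_latest(attendance, today):
    """Return (key, days) for the most recent attendance entry.

    Subtracting date objects counts whole calendar days.
    """
    latest = max(attendance, key=lambda a: parse(a["value"]))
    return latest["key"], (today - parse(latest["value"])).days


attendance = [
    {"key": "class1", "value": "01-DEC-2018"},
    {"key": "class1", "value": "28-Nov-2018"},
    {"key": "class1", "value": "26-Oct-2018"},
]
# A fixed reference date keeps the output independent of when it runs.
print(days_since_latest(attendance, date(2019, 4, 28)))  # ('class1', 148)
```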

Sort descending by multiple keys in jq

I have the following array:
[{
"name": "Object 1",
"prop1": 5,
"prop2": 2
}, {
"name": "Object 2",
"prop1": 6,
"prop2": 4
}, {
"name": "Object 3",
"prop1": 5,
"prop2": 3
}]
I want to sort this array analogously to the SQL clause ORDER BY prop1 DESC, prop2 ASC, so that I get this result:
[{
"name": "Object 2",
"prop1": 6,
"prop2": 4
}, {
"name": "Object 1",
"prop1": 5,
"prop2": 2
}, {
"name": "Object 3",
"prop1": 5,
"prop2": 3
}]
How can I sort an array a) descending by a key and b) by multiple keys?
Version: jq 1.5
In jq, arrays are compared element by element, in order (lexicographically). That is:
$ jq -n '[1, 2] < [1, 3], [1, 2] < [2, 1]'
true
true
The sort_by filter sorts an array, taking an expression as argument that will be evaluated for each member of the array. For example, if you wanted to sort a list of words by length:
$ jq -n '["prop", "leo", "column", "blast"] | sort_by(length)'
[
"leo",
"prop",
"blast",
"column"
]
If the expression given to sort_by as argument returns more than one value, the return values will be implicitly wrapped in an array, which will be subject to the array sorting rules referred to above. For example, if you wanted to sort a list of words by length, and then alphabetically:
$ jq -n '["pro", "leo", "column", "ablast"] | sort_by(length, .)'
[
"leo",
"pro",
"ablast",
"column"
]
Knowing this, and taking into account that the values in your example are numeric, you can just do the following:
$ jq 'sort_by(-.prop1, .prop2)'
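Python's sorted follows the same idea: tuples compare element by element, just as jq arrays do, so a tuple key corresponds to passing several expressions to sort_by. The sample data below is copied from the question; the variable names are illustrative.

```python
words = ["pro", "leo", "column", "ablast"]
objects = [
    {"name": "Object 1", "prop1": 5, "prop2": 2},
    {"name": "Object 2", "prop1": 6, "prop2": 4},
    {"name": "Object 3", "prop1": 5, "prop2": 3},
]

# A tuple key plays the role of sort_by(length, .).
by_len_then_alpha = sorted(words, key=lambda w: (len(w), w))
print(by_len_then_alpha)  # ['leo', 'pro', 'ablast', 'column']

# Negating the numeric prop1 reverses it (DESC) while prop2 stays ASC,
# like sort_by(-.prop1, .prop2). Negation only works for numeric keys.
ordered = sorted(objects, key=lambda o: (-o["prop1"], o["prop2"]))
print([o["name"] for o in ordered])  # ['Object 2', 'Object 1', 'Object 3']
```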
One may use the reverse function (documented in the jq 1.5 manual) like this:
$ jq -n '["pro", "leo", "column", "ablast"] | sort | reverse'
[
"pro",
"leo",
"column",
"ablast"
]
So your specific example may become:
$ jq -n '[{...}, {...}, {...}] | sort_by(.prop2) | reverse | sort_by(.prop1) | reverse'
[
{
"name": "Object 2",
"prop1": 6,
"prop2": 4
},
{
"name": "Object 1",
"prop1": 5,
"prop2": 2
},
{
"name": "Object 3",
"prop1": 5,
"prop2": 3
}
]
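SQL ORDER BY prop1 DESC, prop2 ASC translates to jq | sort_by(.prop2) | reverse | sort_by(.prop1) | reverse. Note that the properties are listed in reverse order and reverse is used twice; this relies on sort_by being a stable sort, so ties on prop1 keep the order established by the earlier passes.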
Given that prop1 and prop2 are numeric, the accepted answer (sort_by(-.prop1, .prop2)) is much simpler and better.
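For non-numeric keys, where negation is not an option, Python offers a variant of the multi-pass trick: because list sorting is stable and reverse=True keeps equal elements in their original order, only one reversal is needed. A sketch under those assumptions, reusing the question's data:

```python
from operator import itemgetter

objects = [
    {"name": "Object 1", "prop1": 5, "prop2": 2},
    {"name": "Object 2", "prop1": 6, "prop2": 4},
    {"name": "Object 3", "prop1": 5, "prop2": 3},
]

# Sort by the secondary key first, then by the primary key.
step1 = sorted(objects, key=itemgetter("prop2"))               # prop2 ASC
result = sorted(step1, key=itemgetter("prop1"), reverse=True)  # prop1 DESC
print([o["name"] for o in result])  # ['Object 2', 'Object 1', 'Object 3']
```

Unlike jq's reverse, which also flips the order of tied elements, Python's reverse=True leaves ties alone, which is why this version needs no extra reversal pass.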