date/numeric filter with jq - json

I have JSON that looks like this (it's the result of a simple jq filter):
[{
"name": "Corrine",
"firstname": "Odile",
"uid": "PA2685",
"roles": [{
"role_name": "GQ",
"start": "2012-06-20"
},
{
"role_name": "HOUSE",
"start": "2012-06-26"
},
{
"role_name": "HOUSE",
"start": "2017-06-28"
}
]
},
{
"name": "Blanche",
"firstname": "Matthieu",
"uid": "PA2685",
"roles": [{
"role_name": "SENATE",
"start": "2014-06-20"
},
{
"role_name": "SENATE",
"start": "2012-06-26"
},
{
"role_name": "SENATE",
"start": "2012-06-28"
}
]
}
]
I would like to filter in two ways:
select only the first-level objects that have at least one role_name (inside the roles array) whose value is HOUSE;
and from this group select only the ones that have at least one start whose date is in 2017 or after.
In the json above "Corrine Odile" would be the only one selected.
I tried some with_entries(select(.value …)) expressions, but I'm confused about how to deal with the dates as well as the "at least one" requirement.

Requirements of the form "at least one" can usually be satisfied efficiently using any/2, e.g.
any(.roles[]; .role_name == "HOUSE")
The check for the year can (apparently) be accomplished by:
.start | .[:4] | tonumber >= 2017
Solution
To produce an array of objects satisfying the two conditions:
map(select(
    any(.roles[]; .role_name == "HOUSE") and
    any(.roles[]; .start[:4] | tonumber >= 2017) ))
.start[:4] is short for .start|.[0:4]
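As a sanity check, the combined filter can be run on a pared-down version of the sample input (fields and roles abbreviated here for brevity):

```shell
# Pared-down input: only the first object has a HOUSE role starting in 2017.
echo '[{"name":"Corrine","roles":[{"role_name":"HOUSE","start":"2017-06-28"}]},
       {"name":"Blanche","roles":[{"role_name":"SENATE","start":"2014-06-20"}]}]' |
jq -c 'map(select(
         any(.roles[]; .role_name == "HOUSE") and
         any(.roles[]; .start[:4] | tonumber >= 2017) ))'
# → [{"name":"Corrine","roles":[{"role_name":"HOUSE","start":"2017-06-28"}]}]
```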

Related

JQ Sorting Issue with Decimal Values?

I have the following test data and am trying to use jq to sort it in descending order by the highest weight, but I'm having trouble with the decimal points, which the sort seems to ignore.
[
{
"name": "train",
"amount": "4",
"weight": "89129.70000000"
},
{
"name": "plane",
"amount": "200",
"weight": "819002.68900000"
},
{
"name": "car",
"amount": "27",
"weight": "527561695272.42000000"
},
{
"name": "bike",
"amount": "14",
"weight": "9914795.00000000"
},
{
"name": "truck",
"amount": "92",
"weight": "999147.00000000"
}
]
jq -r 'map(select(.weight)) | sort_by(.weight)[] | [.name,.weight]'
will output it incorrectly; it seems to ignore the decimal point, e.g.
[
"car",
"527561695272.42000000"
]
[
"plane",
"819002.68900000"
]
[
"train",
"89129.70000000"
]
[
"bike",
"9914795.00000000"
]
[
"truck",
"999147.00000000"
]
I've tried a few things and have managed to sort it via:
jq '[.[].weight] | sort_by( split(".") | map(tonumber) ) | reverse'
which will sort output correctly:
[
"527561695272.42000000",
"9914795.00000000",
"999147.00000000",
"819002.68900000",
"89129.70000000"
]
I think it's because all the values come through as strings, and so I need to use the tonumber function. However, from what I'm doing there, I'm not sure how to get jq to output the full objects, or other fields like .name as in the first example. I keep getting errors and seem to have painted myself into a corner.
Ideally I just want the full objects, correctly sorted descending by weight, i.e. the full data, just sorted.
Introduction
This answer assumes jq 1.6.
Number representation: Recommendation
but having trouble with these decimal points, which the sort seems to ignore?
Note that, for some reason, you are using string values rather than numeric values to represent numbers.
For example:
"weight": "89129.70000000"
The number itself is in double quotes, which makes it a string value. I would recommend correcting the input data so that numbers are represented by numeric values.
For example:
"weight": 89129.7
Solution
Consider using the sort_by() function with an appropriate path expression.
jq 1.6 Manual:
sort_by(foo) compares two elements by comparing the result of foo on each element.
When numbers are represented by string values
Since the weight values are strings (i.e. if the recommendation above is not applied), a tonumber conversion is necessary in order to sort them as numbers.
jq -r 'sort_by(.weight | tonumber) | reverse' input.json
Output
[
{
"name": "car",
"amount": "27",
"weight": "527561695272.42000000"
},
{
"name": "bike",
"amount": "14",
"weight": "9914795.00000000"
},
{
"name": "truck",
"amount": "92",
"weight": "999147.00000000"
},
{
"name": "plane",
"amount": "200",
"weight": "819002.68900000"
},
{
"name": "train",
"amount": "4",
"weight": "89129.70000000"
}
]
When numbers are represented by numeric values
If the weight values were numeric (i.e. with the recommendation applied), the tonumber conversion would not be necessary:
jq -r 'sort_by(.weight) | reverse' input.json
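If you would also like the weights to appear as numbers in the output (a variation not asked for in the question), you could convert them in place before sorting; a minimal sketch using made-up data:

```shell
# Hypothetical data; |= updates each .weight in place before sorting.
echo '[{"name":"train","weight":"12.5"},{"name":"car","weight":"100.25"}]' |
jq -c 'map(.weight |= tonumber) | sort_by(.weight) | reverse'
# → [{"name":"car","weight":100.25},{"name":"train","weight":12.5}]
```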

Remove matching/non-matching elements of a nested array using jq

I need to split the results of a sonarqube analysis history into individual files. Assuming a starting input below,
{
"paging": {
"pageIndex": 1,
"pageSize": 100,
"total": 3
},
"measures": [
{
"metric": "coverage",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "100.0"
},
{
"date": "2018-11-21T12:22:39+0000",
"value": "100.0"
},
{
"date": "2018-11-21T13:09:02+0000",
"value": "100.0"
}
]
},
{
"metric": "bugs",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
},
{
"date": "2018-11-21T12:22:39+0000",
"value": "0"
},
{
"date": "2018-11-21T13:09:02+0000",
"value": "0"
}
]
},
{
"metric": "vulnerabilities",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
},
{
"date": "2018-11-21T12:22:39+0000",
"value": "0"
},
{
"date": "2018-11-21T13:09:02+0000",
"value": "0"
}
]
}
]
}
How do I use jq to clean the results so it only retains the history array entries for each element? The desired output is something like this (output-20181118123808.json for analysis done on "2018-11-18T12:37:08+0000"):
{
"paging": {
"pageIndex": 1,
"pageSize": 100,
"total": 3
},
"measures": [
{
"metric": "coverage",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "100.0"
}
]
},
{
"metric": "bugs",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
}
]
},
{
"metric": "vulnerabilities",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
}
]
}
]
}
I am lost on how to operate only on the sub-elements while leaving the parent structure intact. The naming of the JSON files will be handled outside of jq. The sample data provided will be split into 3 files; other inputs can have a variable number of entries, some up to 10000. Thanks.
Here is a solution which uses awk to write the distinct files. The solution assumes that the dates for each measure are the same and in the same order, but imposes no limit on the number of distinct dates, or the number of distinct measures.
jq -c 'range(0; .measures[0].history|length) as $i
| (.measures[0].history[$i].date|gsub("[^0-9]";"")), # basis of filename
reduce range(0; .measures|length) as $j (.;
.measures[$j].history |= [.[$i]])' input.json |
awk -F\\t 'fn {print >> fn; fn="";next}{fn="output-" $1 ".json"}'
Comments
The choice of awk here is just for convenience.
The disadvantage of this approach is that if each file is to be neatly formatted, an additional run of a pretty-printer (such as jq) would be required for each file. Thus, if the output in each file is required to be neat, a case could be made for running jq once for each date, thus obviating the need for the post-processing (awk) step.
If the dates of the measures are not in lock-step, then the same approach as above could still be used, but of course the gathering of the dates and the corresponding measures would have to be done differently.
Output
The first two lines produced by the invocation of jq above are as follows:
"201811181237080000"
{"paging":{"pageIndex":1,"pageSize":100,"total":3},"measures":[{"metric":"coverage","history":[{"date":"2018-11-18T12:37:08+0000","value":"100.0"}]},{"metric":"bugs","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]},{"metric":"vulnerabilities","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]}]}
In the comments, the following addendum to the original question appeared:
is there a variation wherein the filtering is based on the date value and not the position? It is not guaranteed that the order will be the same or the number of elements in each metric is going to be the same (i.e. some dates may be missing "bugs", some might have additional metric such as "complexity").
The following will produce a stream of JSON objects, one per date. This stream can be annotated with the date as per my previous answer, which shows how to use these annotations to create the various files. For ease of understanding, we use two helper functions:
def dates:
INDEX(.measures[].history[].date; .)
| keys;
def gather($date): map(select(.date==$date));
dates[] as $date
| .measures |= map( .history |= gather($date) )
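To see what dates produces, here is the helper run against a tiny stand-in for the sample input (with abbreviated, made-up date strings):

```shell
# INDEX builds an object keyed by date, so keys yields the distinct (sorted) dates.
echo '{"measures":[{"metric":"coverage","history":[{"date":"d1"},{"date":"d2"}]},
                   {"metric":"bugs","history":[{"date":"d2"}]}]}' |
jq -c 'def dates: INDEX(.measures[].history[].date; .) | keys; dates'
# → ["d1","d2"]
```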
INDEX/2
If your jq does not have INDEX/2, now would be an excellent time to upgrade, but in case that's not feasible, here is its def:
def INDEX(stream; idx_expr):
reduce stream as $row ({};
.[$row|idx_expr|
if type != "string" then tojson
else .
end] |= $row);

Parsing JIRA Insights API JSON using jq

So I basically have JSON output from the JIRA Insights API; I've been digging around and found jq for parsing the JSON. I'm struggling to wrap my head around how to parse the following to return only the values for the objectTypeAttributeIds I am interested in.
For example, I'm only interested in the value of objectTypeAttributeId 887 provided that objectTypeAttributeId 911's status name is "Active", but then I would also like to return the name value of another objectTypeAttributeId.
Can this be achieved using jq only? Or should I be using something else?
I can filter down to this level, which is the 'attributes' section of the JSON output, and print each value, but I'm struggling to find an example catering for my situation.
{
"id": 137127,
"objectTypeAttributeId": 887,
"objectAttributeValues": [
{
"value": "false"
}
],
"objectId": 9036,
"position": 16
},
{
"id": 137128,
"objectTypeAttributeId": 888,
"objectAttributeValues": [
{
"value": "false"
}
],
"objectId": 9036,
"position": 17
},
{
"id": 137296,
"objectTypeAttributeId": 911,
"objectAttributeValues": [
{
"status": {
"id": 1,
"name": "Active",
"category": 1
}
}
],
"objectId": 9036,
"position": 18
},
Can this be achieved using jq only?
Yes, jq was designed precisely for this kind of query. In your case, you could use any, select and if ... then ... else ... end, along the lines of:
if any(.[]; .objectTypeAttributeId == 911 and
any(.objectAttributeValues[]; .status.name == "Active"))
then map(select(.objectTypeAttributeId == 887))
else "whatever"
end
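Here is the filter applied to a heavily trimmed, hypothetical version of the attributes array (the real input has more fields):

```shell
# Trimmed, hypothetical input: 911 is "Active", so the 887 attribute is selected.
echo '[{"objectTypeAttributeId":887,"objectAttributeValues":[{"value":"false"}]},
       {"objectTypeAttributeId":911,"objectAttributeValues":[{"status":{"name":"Active"}}]}]' |
jq -c 'if any(.[]; .objectTypeAttributeId == 911 and
              any(.objectAttributeValues[]; .status.name == "Active"))
       then map(select(.objectTypeAttributeId == 887))
       else "whatever" end'
# → [{"objectTypeAttributeId":887,"objectAttributeValues":[{"value":"false"}]}]
```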

jq: only show when object doesn't match

I'm trying to set up an alert for when the following JSON object's state says anything but STARTED. I'm beginning to play around with conditional jq, but I'm unsure how to implement regex into this.
{
"page": 0,
"page_size": 100,
"total_pages": 10,
"total_rows": 929,
"headers": [
"*"
],
"rows": [
{
"id": "168",
"state": "STARTED"
},
{
"id": "169",
"state": "FAILED"
},
{
"id": "170",
"state": "STARTED"
}
]
}
I only want to display the id and state of the failed object. This is what I tried:
jq '.rows[] | .id, select(.state | contains("!STARTED"))' test.json
I'd like my output to be something like
{
"id": "169",
"state": "FAILED"
}
If you simply want to print out the objects for which .state is NOT "STARTED", just use negation:
.rows[] | select(.state != "STARTED")
If the "started" state is associated with multiple values, please give further details. There might not be any need to use regular expressions. If you really do need to use regular expressions, then you will probably want to use test.
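For completeness, if a regex really were required, test with negation would look something like this (a sketch; the anchored pattern ^STARTED$ is an assumption about what should count as "started"):

```shell
# select keeps rows whose state does NOT match the regex.
echo '{"rows":[{"id":"168","state":"STARTED"},{"id":"169","state":"FAILED"}]}' |
jq -c '.rows[] | select(.state | test("^STARTED$") | not)'
# → {"id":"169","state":"FAILED"}
```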

parsing JSON with jq to return value of element where another element has a certain value

I have some JSON output I am trying to parse with jq. I read some examples on filtering, but I don't really understand them, and my output is more complicated than the examples. I have no idea where to even begin beyond jq '.[]', as I don't understand jq syntax beyond that, and the hierarchy and terminology are challenging as well. My JSON output is below. I want to return the value of Valid where ItemName equals Item_2. How can I do this?
"1"
[
{
"GroupId": "1569",
"Title": "My_title",
"Logo": "logo.jpg",
"Tags": [
"tag1",
"tag2",
"tag3"
],
"Owner": [
{
"Name": "John Doe",
"Id": "53335"
}
],
"ItemId": "209766",
"Item": [
{
"Id": 47744,
"ItemName": "Item_1",
"Valid": false
},
{
"Id": 47872,
"ItemName": "Item_2",
"Valid": true
},
{
"Id": 47872,
"ItemName": "Item_3",
"Valid": false
}
]
}
]
"Browse"
"8fj9438jgge9hdfv0jj0en34ijnd9nnf"
"v9er84n9ogjuwheofn9gerinneorheoj"
Except for the initial and trailing JSON scalars, you'd simply write:
.[] | .Item[] | select( .ItemName == "Item_2" ) | .Valid
In your particular case, to ensure the top-level JSON scalars are ignored, you could prefix the above with:
arrays |
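Putting it together, a sketch on a trimmed version of the input stream (the leading and trailing scalars produce no output, since only the array input passes through arrays):

```shell
# Three top-level JSON inputs; `arrays` discards the two scalar ones.
printf '%s\n' '"1"' \
  '[{"Item":[{"ItemName":"Item_1","Valid":false},{"ItemName":"Item_2","Valid":true}]}]' \
  '"Browse"' |
jq 'arrays | .[] | .Item[] | select( .ItemName == "Item_2" ) | .Valid'
# → true
```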