jq: reduce JSON arrays with a math function

I wrote this JSONProcessor bash function to sum, average, get min and get max values from arrays based on the name values. The average and min results are not correct and I can't for the life of me figure out what I am doing wrong.
function JSONProccessor {
jq '
def myMathFunc:
if (.name | test("^sum", "")) then
{"\(.name)": (.values | add)}
elif (.name | test("^avg|^global-avg", "")) then
{"\(.name)": ((.values | add) / (.values | length)) }
elif (.name | test("^max", "")) then
{"\(.name)": (.values | max) }
elif (.name | test("^min", "")) then
{"\(.name)": (.values | min) }
else
{"\(.name)": .values[]}
end;
[
.Response.stats.data[] |
.identifier.names[] as $name |
.identifier.values[] as $val |
{"\($name)": "\($val)"} + ([
.metric[] | myMathFunc
] | add)
]
' < ${1} > ${2}
}
Input JSON
{
"Response" : {
"TimeUnit" : [ 1588153140000, 1588153200000 ],
"metaData" : {
"errors" : [ ]
},
"resultTruncated" : false,
"stats" : {
"data" : [ {
"identifier" : {
"names" : [ "apiproxy" ],
"values" : [ "authn" ]
},
"metric" : [ {
"env" : "prod",
"name" : "min(request_processing_latency)",
"values" : [ 917.0, 6.0 ]
}, {
"env" : "prod",
"name" : "avg(total_response_time)",
"values" : [ 2203.5, 13.0 ]
}, {
"env" : "prod",
"name" : "max(request_processing_latency)",
"values" : [ 1286.0, 6.0 ]
}, {
"env" : "prod",
"name" : "global-avg-total_response_time",
"values" : [ 1473.3333333333333 ]
}, {
"env" : "prod",
"name" : "sum(message_count)",
"values" : [ 2.0, 1.0 ]
}
]
}
]
}
}
}
Output
[
{
"apiproxy": "authn",
"min(request_processing_latency)": 923,
"avg(total_response_time)": 2216.5,
"max(request_processing_latency)": 1292,
"global-avg-total_response_time": 1473.3333333333333,
"sum(message_count)": 3
}
]
Can you please review and let me know if I am missing anything obvious?

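Your idea is right, but test() should not be given the second argument "" the way you have written it. You only need test() to return a boolean for your match, and removing that argument makes the function work as expected.
In jq the comma is not an argument separator; it is the operator that emits several values, so test("^sum", "") runs the match once with "^sum" and once with "". The empty regex matches every name, so every metric also goes through the sum branch, and once the objects are merged with add the sum is the value that survives. That is why your min, avg and max results are all sums (923 = 917 + 6, 1292 = 1286 + 6). test() does have a two-argument form, test(REGEX; FLAGS), but its arguments are separated by ; rather than , and you do not need FLAGS for this logic.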
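To see the difference directly, here is a minimal sketch that runs one metric from the question through both forms of the call. The shell variable names (metric, comma_form, single_form) are made up for illustration, and the integer values stand in for the 917.0 and 6.0 of the original input.

```shell
# One metric from the question's input (values written as integers for brevity).
metric='{"name":"min(request_processing_latency)","values":[917,6]}'

# Comma form: test() receives a stream of two regexes and emits one boolean per regex.
comma_form=$(printf '%s' "$metric" | jq -c '[.name | test("^sum", "")]')

# Single-argument form: exactly one boolean comes back.
single_form=$(printf '%s' "$metric" | jq -c '[.name | test("^sum")]')

echo "comma form:  $comma_form"
echo "single form: $single_form"
```

The comma form should report two booleans, the second of which is true because the empty regex matches anything, while the single-argument form reports only false for this metric.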

Related

Import JSON from CSV, grouping by multiple fields

I would like to create a JSON with array of nested objects with a grouping for different fields.
This is the CSV, and I would like to group it by sid, year and quarter (the first three fields):
S4446B3,2020,202001,2,345.45
S4446B3,2020,202001,4,24.44
S4446B3,2021,202102,5,314.55
S6506LK,2020,202002,3,376.55
S6506LK,2020,202003,3,76.23
After splitting the CSV with the following I get an object for each record.
split("\n")
| map(split(","))
| .[0:]
| map({"sid" : .[0], "year" : .[1], "quarter" : .[2], "customer_type" : .[3], "obj" : .[4]})
But for each sid I would like to get an array of objects nested like this :
[
{
"sid" : "S4446B3",
"years" : [
{
"year" : 2020,
"quarters" : [
{
"quarter" : 202001,
"customer_type" : [
{
"type" : 2,
"obj" : "345.45"
},
{
"type" : 4,
"obj" : "24.44"
}
]
}
]
},
{
"year" : 2021,
"quarters" : [
{
"quarter" : 202102,
"customer_type" : [
{
"type" : 5,
"obj" : "314.55"
}
]
}
]
}
]
},
{
"sid" : "S6506LK",
"years" : [
{
"year" : 2020,
"quarters" : [
{
"quarter" : 202002,
"customer_type" : [
{
"type" : 3,
"obj" : "376.55"
}
]
},
{
"quarter" : 202003,
"customer_type" : [
{
"type" : 3,
"obj" : "76.23"
}
]
}
]
}
]
}
]
It'd be more intuitive if sid, year, quarter, etc. were to be key names. With -R/--raw-input and -n/--null-input options on the command line, this will do that:
reduce (inputs / ",")
as [$sid, $year, $quarter, $type, $obj]
(.; .[$sid][$year][$quarter] += [{$type, $obj}])
And, to get your expected output you can append these lines to the above program.
| .[][] |= (to_entries | map({quarter: .key, customer_type: .value}))
| .[] |= (to_entries | map({year: .key, quarters: .value}))
| . |= (to_entries | map({sid: .key, years: .value}))
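Put together, the whole pipeline might be run as in the sketch below. The shell variables csv and nested are illustrative names, and the CSV is inlined here only so the example is self-contained; in practice you would pass your file to jq instead.

```shell
# Sample rows from the question, inlined for the sake of a self-contained example.
csv='S4446B3,2020,202001,2,345.45
S4446B3,2020,202001,4,24.44
S4446B3,2021,202102,5,314.55
S6506LK,2020,202002,3,376.55
S6506LK,2020,202003,3,76.23'

# -R reads each line as a raw string, -n starts from null so that `inputs` sees every line.
nested=$(printf '%s\n' "$csv" | jq -R -n -c '
  # Build {sid: {year: {quarter: [ {type, obj}, ... ]}}}
  reduce (inputs / ",") as [$sid, $year, $quarter, $type, $obj]
    (.; .[$sid][$year][$quarter] += [{$type, $obj}])
  # Turn each level of keys into an array of objects, innermost level first.
  | .[][] |= (to_entries | map({quarter: .key, customer_type: .value}))
  | .[]   |= (to_entries | map({year: .key, quarters: .value}))
  | .     |= (to_entries | map({sid: .key, years: .value}))
')

echo "$nested"
```

Note that year, quarter and type stay strings in this sketch because they come from splitting text; if you need numbers as in the expected output, tonumber can be applied to those fields.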

Transform JSON to a compact format

I'm trying to transform the following JSON
{ "application" : [
{ "name" : "app1",
"policies" : [
{ "name" : "pol_1",
"orderNumber" : "10"
},
{ "name" : "pol_2",
"orderNumber" : "20"
}
]
},
{ "name" : "app2",
"policies" : [
{ "name" : "pol_A",
"orderNumber" : "10"
},
{ "name" : "pol_B",
"orderNumber" : "20"
}
]
}
]
}
To the following
{ "pol_1":"10", "pol_2":"20" }
Using
jq -r ".application[] | select(.name==\"app1\") | .policies[] | {\".name\" : .orderNumber}"
I was able to get
{
"pol_1":"10"
}
{
"pol_2":"20"
}
Any idea how I can merge them? Am I missing something, or am I doing it the wrong way?
You were almost there. Use map to create a single array instead of two independent objects, then use add to merge its contents.
jq '.application[]
| select(.name == "app1")
| .policies
| map({ (.name) : .orderNumber } )
| add' file.json
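As an alternative sketch, the same object can be built with from_entries instead of add, assuming policy names are unique within an application. The input is inlined in a shell variable (input, an illustrative name) rather than read from file.json.

```shell
# The question's input, inlined so the example runs on its own.
input='{"application":[
  {"name":"app1","policies":[{"name":"pol_1","orderNumber":"10"},{"name":"pol_2","orderNumber":"20"}]},
  {"name":"app2","policies":[{"name":"pol_A","orderNumber":"10"},{"name":"pol_B","orderNumber":"20"}]}
]}'

merged=$(printf '%s' "$input" | jq -c '
  .application[]
  | select(.name == "app1")
  | .policies
  # from_entries expects an array of {key, value} objects and returns one object.
  | map({key: .name, value: .orderNumber})
  | from_entries
')

echo "$merged"
```

Either form yields a single object; map plus add is shorter, while from_entries makes the key/value intent explicit.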

Parse and Map 2 Arrays with jq

I am working with a JSON file similar to the one below:
{ "Response" : {
"TimeUnit" : [ 1576126800000 ],
"metaData" : {
"errors" : [ ],
"notices" : [ "query served by:1"]
},
"stats" : {
"data" : [ {
"identifier" : {
"names" : [ "apiproxy", "response_status_code", "target_response_code", "target_ip" ],
"values" : [ "IO", "502", "502", "7.1.143.6" ]
},
"metric" : [ {
"env" : "dev",
"name" : "sum(message_count)",
"values" : [ 0.0]
} ]
} ]
} } }
My objective is to display a mapping of the identifier names and values like:
apiproxy=IO
response_status_code=502
target_response_code=502
target_ip=7.1.143.6
I have been able to parse both names and values with
.[].stats.data[] | (.identifier.names[]) and .[].stats.data[] | (.identifier.values[])
but I need help with the jq way to map the values.
The whole thing can be done in jq using the -r command-line option:
.[].stats.data[]
| [.identifier.names, .identifier.values]
| transpose[]
| "\(.[0])=\(.[1])"
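For reference, a complete invocation could look like the following sketch. The input is a trimmed copy of the question's JSON held in a shell variable (input and pairs are illustrative names), and -r is what strips the quotes from the resulting strings.

```shell
# Trimmed copy of the question's input: only the identifier block is kept.
input='{"Response":{"stats":{"data":[{"identifier":{
  "names":["apiproxy","response_status_code","target_response_code","target_ip"],
  "values":["IO","502","502","7.1.143.6"]}}]}}}'

pairs=$(printf '%s' "$input" | jq -r '
  .[].stats.data[]
  # transpose turns [names, values] into [[name1, value1], [name2, value2], ...]
  | [.identifier.names, .identifier.values]
  | transpose[]
  | "\(.[0])=\(.[1])"
')

echo "$pairs"
```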

Moving a json nested key-value pair up one level with jq

I want to use jq to move a nested key:value pair up one level. So given a geojson array of objects like this:
{
"type" : "FeatureCollection",
"features" : [ {
"type" : "Feature",
"geometry" : {
"type" : "MultiLineString",
"coordinates" : [ [ [ -74, 40 ], [ -73, 40 ] ] ]
},
"properties" : {
"startTime" : "20160123T162547-0500",
"endTime" : "20160123T164227-0500",
"activities" : [ {
"activity" : "car",
"group" : "car"
} ]
}
} ]
}
I want to return the exact same object, but with "group": "car" in the features object. So the result would look something like this:
{
"type" : "FeatureCollection",
"features" : [ {
"type" : "Feature",
"geometry" : {
"type" : "MultiLineString",
"coordinates" : [ [ [ -74, 40 ], [ -73, 40 ] ] ]
},
"properties" : {
"type" : "move",
"startTime" : "20160123T162547-0500",
"endTime" : "20160123T164227-0500",
"group" : "car",
"activities" : [ {
"activity" : "car"
} ]
}
} ]
}
This seems simple, but somehow I'm struggling to figure out how to do it with jq. Help appreciated!
jq solution:
jq '(.features[0].properties.group = .features[0].properties.activities[0].group)
| del(.features[0].properties.activities[0].group)' input.json
The output:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "MultiLineString",
"coordinates": [
[
[
-74,
40
],
[
-73,
40
]
]
]
},
"properties": {
"startTime": "20160123T162547-0500",
"endTime": "20160123T164227-0500",
"activities": [
{
"activity": "car"
}
],
"group": "car"
}
}
]
}
In two steps (first add, then delete):
.features[0].properties |= (.group = .activities[0].group)
| del(.features[0].properties.activities[0].group)
Or still more succinctly:
.features[0].properties |=
((.group = .activities[0].group) | del(.activities[0].group))
The problem doesn't discuss what should be done if there are no activities or
if there is more than one activity, so the following filter encapsulates the
property change in a function:
def update_activity:
if .activities|length<1 then .
else
.group = .activities[0].group
| del(.activities[0].group)
end
;
.features[].properties |= update_activity
.properties is left unmodified when there are no activities; otherwise the group
of the first activity is moved up to the properties object, leaving the other activities unmodified.
So if the sample input (slightly abbreviated) were instead
{
"type" : "FeatureCollection",
"features" : [ {
"properties" : {
"activities" : [ {
"activity" : "car",
"group" : "car"
}, {
"activity" : "bike",
"group" : "bike"
} ]
}
} ]
}
the result would be
{
"type": "FeatureCollection",
"features" : [ {
"properties" : {
"group": "car",
"activities": [ {
"activity": "car"
}, {
"activity": "bike",
"group": "bike"
} ]
}
} ]
}
This approach offers a specific place to put the logic dealing with other
variations. E.g. this version of update_activity removes the .group of
all activities:
def update_activity:
if .activities|length<1 then .
else
.group = .activities[0].group
| del(.activities[].group)
end
;
and this version also assigns .group to null in the event there are no activities:
def update_activity:
if .activities|length<1 then
.group = null
else
.group = .activities[0].group
| del(.activities[].group)
end
;
Here is a generalized solution:
# move the key/value specified by apath up to the containing JSON object:
def up(apath):
def trim:
if .[-1] | type == "number" then .[0:-2] | trim
else .
end;
. as $in
| (null | path(apath)) as $p
| ($p|last) as $last
| $in
| getpath($p) as $v
| setpath(($p[0:-1]|trim) + [$last]; $v)
| del(apath)
;
With this definition, the solution is simply:
up( .features[0].properties.activities[0].group )
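A quick way to try the generalized filter is sketched below on a trimmed version of the sample input. The definition of up is copied from above; the shell variable names (input, moved) are only for illustration.

```shell
# Trimmed sample: one feature with a single activity.
input='{"type":"FeatureCollection","features":[{"properties":{"activities":[{"activity":"car","group":"car"}]}}]}'

moved=$(printf '%s' "$input" | jq -c '
  # move the key/value specified by apath up to the containing JSON object:
  def up(apath):
    def trim:
      # drop trailing [key, index] pairs until the path ends in an object key
      if .[-1] | type == "number" then .[0:-2] | trim
      else .
      end;
    . as $in
    | (null | path(apath)) as $p
    | ($p|last) as $last
    | $in
    | getpath($p) as $v
    | setpath(($p[0:-1]|trim) + [$last]; $v)
    | del(apath)
  ;
  up( .features[0].properties.activities[0].group )
')

echo "$moved"
```

After the call, group should sit directly under properties and no longer appear inside the first activity.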

jq: group and key by property

I have a list of objects that look like this:
[
{
"ip": "1.1.1.1",
"component": "name1"
},
{
"ip": "1.1.1.2",
"component": "name1"
},
{
"ip": "1.1.1.3",
"component": "name2"
},
{
"ip": "1.1.1.4",
"component": "name2"
}
]
Now I'd like to group and key that by the component and assign a list of ips to each of the components:
{
"name1": [
"1.1.1.1",
"1.1.1.2"
]
},{
"name2": [
"1.1.1.3",
"1.1.1.4"
]
}
I figured it out myself. I first group by .component and then just create new lists of ips that are indexed by the component of the first object of each group:
jq ' group_by(.component)[] | {(.[0].component): [.[] | .ip]}'
The accepted answer doesn't produce valid json, but:
{
"name1": [
"1.1.1.1",
"1.1.1.2"
]
}
{
"name2": [
"1.1.1.3",
"1.1.1.4"
]
}
name1 as well as name2 are valid json objects, but the output as a whole isn't.
The following jq statement results in the desired output as specified in the question:
group_by(.component) | map({ key: (.[0].component), value: [.[] | .ip] }) | from_entries
Output:
{
"name1": [
"1.1.1.1",
"1.1.1.2"
],
"name2": [
"1.1.1.3",
"1.1.1.4"
]
}
Suggestions for simpler approaches are welcome.
If human readability is preferred over valid json, I'd suggest something like ...
jq -r 'group_by(.component)[] | "IPs for " + .[0].component + ": " + (map(.ip) | tostring)'
... which results in ...
IPs for name1: ["1.1.1.1","1.1.1.2"]
IPs for name2: ["1.1.1.3","1.1.1.4"]
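On the request for simpler approaches to the single keyed object: one possibility, offered as a sketch rather than a definitive answer, is to skip group_by and accumulate the IPs with reduce. The shell variable names (input, grouped) are illustrative.

```shell
# The question's list of objects, inlined for a self-contained example.
input='[
  {"ip":"1.1.1.1","component":"name1"},
  {"ip":"1.1.1.2","component":"name1"},
  {"ip":"1.1.1.3","component":"name2"},
  {"ip":"1.1.1.4","component":"name2"}
]'

grouped=$(printf '%s' "$input" | jq -c -S '
  # Start from an empty object and append each ip under its component key.
  # A missing key is null, and null + [x] gives [x], so no initialization is needed.
  reduce .[] as $o ({}; .[$o.component] += [$o.ip])
')

echo "$grouped"
```

Unlike group_by, this keeps the IPs in input order and does not sort the groups; the -S flag is used here only to make the key order of the printed object predictable.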
As a further example of @replay's technique, after many failures using other methods, I finally built a filter that condenses this Wazuh report (excerpted for brevity):
{
"took" : 228,
"timed_out" : false,
"hits" : {
"total" : {
"value" : 2806,
"relation" : "eq"
},
"hits" : [
{
"_source" : {
"agent" : {
"name" : "100360xx"
},
"data" : {
"vulnerability" : {
"severity" : "High",
"package" : {
"condition" : "less than 78.0",
"name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
}
}
}
}
},
{
"_source" : {
"agent" : {
"name" : "100360xx"
},
"data" : {
"vulnerability" : {
"severity" : "High",
"package" : {
"condition" : "less than 78.0",
"name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
}
}
}
}
},
...
Here is the jq filter I use to provide an array of objects, each consisting of an agent name followed by an array of names of the agent's vulnerable packages:
jq ' .hits.hits |= unique_by(._source.agent.name, ._source.data.vulnerability.package.name)
| .hits.hits
| group_by(._source.agent.name)[]
| { (.[0]._source.agent.name): [.[]._source.data.vulnerability.package | .name ]}'
Here is an excerpt of the output produced by the filter:
{
"100360xx": [
"Mozilla Firefox 68.11.0 ESR (x64 en-US)",
"VLC media player",
"Windows 10"
]
}
{
"WIN-KD5C4xxx": [
"Windows Server 2019"
]
}
{
"fridxxx": [
"java-1.8.0-openjdk",
"kernel",
"kernel-headers",
"kernel-tools",
"kernel-tools-libs",
"python-perf"
]
}
{
"mcd-xxx-xxx": [
"dbus",
"fribidi",
"gnupg2",
"graphite2",
...