Import JSON from CSV, grouping by multiple fields

I would like to create JSON containing an array of nested objects, grouped by several fields.
This is the CSV, and I would like to group it by sid, year and quarter (the first three fields):
S4446B3,2020,202001,2,345.45
S4446B3,2020,202001,4,24.44
S4446B3,2021,202102,5,314.55
S6506LK,2020,202002,3,376.55
S6506LK,2020,202003,3,76.23
After splitting the CSV with the following filter, I get one object per record:
split("\n")
| map(split(","))
| .[0:]
| map({"sid" : .[0], "year" : .[1], "quarter" : .[2], "customer_type" : .[3], "obj" : .[4]})
But for each sid I would like to get an array of nested objects, like this:
[
{
"sid" : "S4446B3",
"years" : [
{
"year" : 2020,
"quarters" : [
{
"quarter" : 202001,
"customer_type" : [
{
"type" : 2,
"obj" : "345.45"
},
{
"type" : 4,
"obj" : "24.44"
}
]
}
]
},
{
"year" : 2021,
"quarters" : [
{
"quarter" : 202102,
"customer_type" : [
{
"type" : 5,
"obj" : "314.55"
}
]
}
]
}
]
},
{
"sid" : "S6506LK",
"years" : [
{
"year" : 2020,
"quarters" : [
{
"quarter" : 202002,
"customer_type" : [
{
"type" : 3,
"obj" : "376.55"
}
]
},
{
"quarter" : 202003,
"customer_type" : [
{
"type" : 3,
"obj" : "76.23"
}
]
}
]
}
]
}
]

It'd be more intuitive if sid, year, quarter, etc. were to be key names. With -R/--raw-input and -n/--null-input options on the command line, this will do that:
reduce (inputs / ",")
as [$sid, $year, $quarter, $type, $obj]
(.; .[$sid][$year][$quarter] += [{$type, $obj}])
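And, to get your expected output, you can append these lines to the above program. Note that every value, including year, quarter and type, stays a string, because the fields come from splitting a raw text line; apply tonumber where numbers are needed.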
| .[][] |= (to_entries | map({quarter: .key, customer_type: .value}))
| .[] |= (to_entries | map({year: .key, quarters: .value}))
| . |= (to_entries | map({sid: .key, years: .value}))
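For cross-checking the jq result, the same two-phase idea (first build a tree keyed by sid, year and quarter, then turn each level into an array of objects) can be sketched in plain Python. This is only an illustration under assumptions: the names CSV_TEXT, group_rows and to_nested_arrays are made up here, and unlike the jq program it converts year, quarter and type to integers to match the expected output in the question.

```python
import csv
import io

# Sample data copied from the question.
CSV_TEXT = """\
S4446B3,2020,202001,2,345.45
S4446B3,2020,202001,4,24.44
S4446B3,2021,202102,5,314.55
S6506LK,2020,202002,3,376.55
S6506LK,2020,202003,3,76.23
"""


def group_rows(text):
    """Build a tree keyed by sid, then year, then quarter (insertion order is kept)."""
    tree = {}
    for sid, year, quarter, ctype, obj in csv.reader(io.StringIO(text)):
        quarters = tree.setdefault(sid, {}).setdefault(int(year), {})
        # "obj" stays a string, as in the expected output of the question.
        quarters.setdefault(int(quarter), []).append({"type": int(ctype), "obj": obj})
    return tree


def to_nested_arrays(tree):
    """Turn the keyed tree into the array-of-objects shape asked for."""
    return [
        {
            "sid": sid,
            "years": [
                {
                    "year": year,
                    "quarters": [
                        {"quarter": quarter, "customer_type": types}
                        for quarter, types in quarters.items()
                    ],
                }
                for year, quarters in years.items()
            ],
        }
        for sid, years in tree.items()
    ]


result = to_nested_arrays(group_rows(CSV_TEXT))
```

Keeping the keyed tree as an intermediate step mirrors the reduce in the jq answer: grouping is a matter of indexing, and the array shape is produced afterwards, one level at a time.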

Related

Transform JSON to a compact format

I'm trying to transform the following JSON:
{ "application" : [
{ "name" : "app1",
"policies" : [
{ "name" : "pol_1",
"orderNumber" : "10"
},
{ "name" : "pol_2",
"orderNumber" : "20"
}
]
},
{ "name" : "app2",
"policies" : [
{ "name" : "pol_A",
"orderNumber" : "10"
},
{ "name" : "pol_B",
"orderNumber" : "20"
}
]
}
]
}
To the following
{ "pol_1":"10", "pol_2":"20" }
Using
jq -r ".application[] | select(.name==\"app1\") | .policies[] | {\".name\" : .orderNumber}"
I was able to get
{
"pol_1":"10"
}
{
"pol_2":"20"
}
Any idea how I can merge them? Am I missing something, or am I doing it the wrong way?
You were almost there. Use map to create a single array instead of two independent objects, then use add to merge its contents.
jq '.application[]
| select(.name == "app1")
| .policies
| map({ (.name) : .orderNumber } )
| add' file.json
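If it helps to see what map followed by add amounts to, here is a rough Python equivalent: one small name-to-orderNumber pair per policy, merged into a single dictionary. The names DOC and policies_for are invented for this sketch, and the data is the sample from the question.

```python
# Sample document from the question.
DOC = {
    "application": [
        {
            "name": "app1",
            "policies": [
                {"name": "pol_1", "orderNumber": "10"},
                {"name": "pol_2", "orderNumber": "20"},
            ],
        },
        {
            "name": "app2",
            "policies": [
                {"name": "pol_A", "orderNumber": "10"},
                {"name": "pol_B", "orderNumber": "20"},
            ],
        },
    ]
}


def policies_for(doc, app_name):
    """Merge the policies of the named application into one name -> orderNumber dict."""
    merged = {}
    for app in doc["application"]:
        if app["name"] == app_name:
            for policy in app["policies"]:
                # Same effect as adding {(.name): .orderNumber} objects together in jq.
                merged[policy["name"]] = policy["orderNumber"]
    return merged


app1_policies = policies_for(DOC, "app1")
```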

jq: reduce json arrays with a Math function

I wrote this JSONProcessor bash function to sum, average, get min and get max values from arrays based on the name values. The average and min results are not correct and I can't for the life of me figure out what I am doing wrong.
function JSONProccessor {
jq '
def myMathFunc:
if (.name | test("^sum", "")) then
{"\(.name)": (.values | add)}
elif (.name | test("^avg|^global-avg", "")) then
{"\(.name)": ((.values | add) / (.values | length)) }
elif (.name | test("^max", "")) then
{"\(.name)": (.values | max) }
elif (.name | test("^min", "")) then
{"\(.name)": (.values | min) }
else
{"\(.name)": .values[]}
end;
[
.Response.stats.data[] |
.identifier.names[] as $name |
.identifier.values[] as $val |
{"\($name)": "\($val)"} + ([
.metric[] | myMathFunc
] | add)
]
' < ${1} > ${2}
}
Input JSON
{
"Response" : {
"TimeUnit" : [ 1588153140000, 1588153200000 ],
"metaData" : {
"errors" : [ ]
},
"resultTruncated" : false,
"stats" : {
"data" : [ {
"identifier" : {
"names" : [ "apiproxy" ],
"values" : [ "authn" ]
},
"metric" : [ {
"env" : "prod",
"name" : "min(request_processing_latency)",
"values" : [ 917.0, 6.0 ]
}, {
"env" : "prod",
"name" : "avg(total_response_time)",
"values" : [ 2203.5, 13.0 ]
}, {
"env" : "prod",
"name" : "max(request_processing_latency)",
"values" : [ 1286.0, 6.0 ]
}, {
"env" : "prod",
"name" : "global-avg-total_response_time",
"values" : [ 1473.3333333333333 ]
}, {
"env" : "prod",
"name" : "sum(message_count)",
"values" : [ 2.0, 1.0 ]
}
]
}
]
}
}
}
Output
[
{
"apiproxy": "authn",
"min(request_processing_latency)": 923,
"avg(total_response_time)": 2216.5,
"max(request_processing_latency)": 1292,
"global-avg-total_response_time": 1473.3333333333333,
"sum(message_count)": 3
}
]
Can you please review and let me know if I am missing anything obvious?
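Your idea is right, but test() does not need the second argument "" that you are passing. All you need is for test() to return a boolean for the match, so removing that argument makes your function work as expected.
test() does have a two-argument form, test(REGEX; FLAGS), but the flags do not matter for your logic, and jq separates arguments with ; and not with ,. Written as test("^sum", ""), the comma is the stream operator: test runs once with "^sum" and once with "", and the empty regex matches every name. Each condition therefore also fires its own branch, and because add lets later objects overwrite earlier ones, every metric in your output ends up as the sum of its values (923 = 917 + 6, 2216.5 = 2203.5 + 13, 1292 = 1286 + 6).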
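As a sanity check on the numbers the corrected function should produce, here is the intended dispatch (sum, average, max or min depending on the name prefix) sketched in Python on the sample metrics. METRICS, reduce_metric and summary are names made up for this sketch; returning the raw list for unknown names is an assumption and differs from the jq else branch, which emits one object per value.

```python
import re

# Name/values pairs copied from the sample input ("env" omitted for brevity).
METRICS = [
    {"name": "min(request_processing_latency)", "values": [917.0, 6.0]},
    {"name": "avg(total_response_time)", "values": [2203.5, 13.0]},
    {"name": "max(request_processing_latency)", "values": [1286.0, 6.0]},
    {"name": "global-avg-total_response_time", "values": [1473.3333333333333]},
    {"name": "sum(message_count)", "values": [2.0, 1.0]},
]


def reduce_metric(metric):
    """Pick the aggregate from the name prefix; re.match only matches at the start."""
    name, values = metric["name"], metric["values"]
    if re.match(r"sum", name):
        return sum(values)
    if re.match(r"avg|global-avg", name):
        return sum(values) / len(values)
    if re.match(r"max", name):
        return max(values)
    if re.match(r"min", name):
        return min(values)
    # Assumption: unknown names keep their raw values.
    return values


summary = {"apiproxy": "authn"}
summary.update((m["name"], reduce_metric(m)) for m in METRICS)
```

With exactly one condition evaluated per metric, the minimum is 6.0, the average 1108.25 and the maximum 1286.0, which is what the jq function should also report once the stray second argument is gone.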

Parse and Map 2 Arrays with jq

I am working with a JSON file similar to the one below:
{ "Response" : {
"TimeUnit" : [ 1576126800000 ],
"metaData" : {
"errors" : [ ],
"notices" : [ "query served by:1"]
},
"stats" : {
"data" : [ {
"identifier" : {
"names" : [ "apiproxy", "response_status_code", "target_response_code", "target_ip" ],
"values" : [ "IO", "502", "502", "7.1.143.6" ]
},
"metric" : [ {
"env" : "dev",
"name" : "sum(message_count)",
"values" : [ 0.0]
} ]
} ]
} } }
My objective is to display a mapping of the identifier names and values, like:
apiproxy=IO
response_status_code=502
target_response_code=502
target_ip=7.1.143.6
I have been able to parse both names and values with
.[].stats.data[] | (.identifier.names[]) and .[].stats.data[] | (.identifier.values[])
but I need help with the jq way to map the values.
The whole thing can be done in jq using the -r command-line option:
.[].stats.data[]
| [.identifier.names, .identifier.values]
| transpose[]
| "\(.[0])=\(.[1])"
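The transpose step simply pairs the n-th name with the n-th value. The same pairing, sketched in Python with zip on the sample identifier (IDENTIFIER and name_value_lines are names chosen for this illustration). One difference to keep in mind: jq's transpose pads the shorter array with null, while zip stops at the shorter one.

```python
# Identifier block copied from the sample input.
IDENTIFIER = {
    "names": ["apiproxy", "response_status_code", "target_response_code", "target_ip"],
    "values": ["IO", "502", "502", "7.1.143.6"],
}


def name_value_lines(identifier):
    """Pair each name with the value at the same position and format as name=value."""
    return [
        f"{name}={value}"
        for name, value in zip(identifier["names"], identifier["values"])
    ]


lines = name_value_lines(IDENTIFIER)
print("\n".join(lines))
```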

Moving a json nested key-value pair up one level with jq

I want to use jq to move a nested key:value pair up one level. So given a geojson array of objects like this:
{
"type" : "FeatureCollection",
"features" : [ {
"type" : "Feature",
"geometry" : {
"type" : "MultiLineString",
"coordinates" : [ [ [ -74, 40 ], [ -73, 40 ] ] ]
},
"properties" : {
"startTime" : "20160123T162547-0500",
"endTime" : "20160123T164227-0500",
"activities" : [ {
"activity" : "car",
"group" : "car"
} ]
}
} ]
}
I want to return the exact same object, but with "group": "car" in the features object. So the result would look something like this:
{
"type" : "FeatureCollection",
"features" : [ {
"type" : "Feature",
"geometry" : {
"type" : "MultiLineString",
"coordinates" : [ [ [ -74, 40 ], [ -73, 40 ] ] ]
},
"properties" : {
"type" : "move",
"startTime" : "20160123T162547-0500",
"endTime" : "20160123T164227-0500",
"group" : "car",
"activities" : [ {
"activity" : "car"
} ]
}
} ]
}
This seems simple, but somehow I'm struggling to figure out how to do it with jq. Help appreciated!
jq solution:
jq '(.features[0].properties.group = .features[0].properties.activities[0].group)
| del(.features[0].properties.activities[0].group)' input.json
The output:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "MultiLineString",
"coordinates": [
[
[
-74,
40
],
[
-73,
40
]
]
]
},
"properties": {
"startTime": "20160123T162547-0500",
"endTime": "20160123T164227-0500",
"activities": [
{
"activity": "car"
}
],
"group": "car"
}
}
]
}
In two steps (first add, then delete):
.features[0].properties |= (.group = .activities[0].group)
| del(.features[0].properties.activities[0].group)
Or still more succinctly:
.features[0].properties |=
((.group = .activities[0].group) | del(.activities[0].group))
The question doesn't say what should happen if there are no activities or
if there is more than one, so the following filter encapsulates the
property change in a function:
def update_activity:
  if .activities|length<1 then .
  else
    .group = .activities[0].group
    | del(.activities[0].group)
  end
;
.features[].properties |= update_activity
.properties is left unmodified when there are no activities. Otherwise, the group
of the first activity is moved up to the properties object, leaving the other activities unmodified.
So if the sample input (slightly abbreviated) were instead
{
"type" : "FeatureCollection",
"features" : [ {
"properties" : {
"activities" : [ {
"activity" : "car",
"group" : "car"
}, {
"activity" : "bike",
"group" : "bike"
} ]
}
} ]
}
the result would be
{
"type": "FeatureCollection",
"features" : [ {
"properties" : {
"group": "car",
"activities": [ {
"activity": "car"
}, {
"activity": "bike",
"group": "bike"
} ]
}
} ]
}
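For comparison, the first update_activity variant could look roughly like this in Python, applied to one properties object. This is a sketch under assumptions: SAMPLE_PROPERTIES is the two-activity example above, the function returns a modified copy instead of updating in place, and a missing group on the first activity becomes None, mirroring the null that jq would assign.

```python
import copy

# The properties object from the two-activity example.
SAMPLE_PROPERTIES = {
    "activities": [
        {"activity": "car", "group": "car"},
        {"activity": "bike", "group": "bike"},
    ]
}


def update_activity(properties):
    """Move the first activity's group up to the properties object (on a copy)."""
    props = copy.deepcopy(properties)
    activities = props.get("activities") or []
    if not activities:
        # No activities: leave the properties untouched.
        return props
    # pop removes the key from the first activity and returns its value
    # (None if the key is absent); the other activities are left alone.
    props["group"] = activities[0].pop("group", None)
    return props


updated = update_activity(SAMPLE_PROPERTIES)
```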
This approach offers a specific place to put the logic dealing with other
variations. E.g. this version of update_activity removes the .group of
all activities:
def update_activity:
  if .activities|length<1 then .
  else
    .group = .activities[0].group
    | del(.activities[].group)
  end
;
and this version also assigns null to .group in the event there are no activities:
def update_activity:
  if .activities|length<1 then
    .group = null
  else
    .group = .activities[0].group
    | del(.activities[].group)
  end
;
Here is a generalized solution:
# move the key/value specified by apath up to the containing JSON object:
def up(apath):
  def trim:
    if .[-1] | type == "number" then .[0:-2] | trim
    else .
    end;
  . as $in
  | (null | path(apath)) as $p
  | ($p|last) as $last
  | $in
  | getpath($p) as $v
  | setpath(($p[0:-1]|trim) + [$last]; $v)
  | del(apath)
;
With this definition, the solution is simply:
up( .features[0].properties.activities[0].group )
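To make the path arithmetic in up easier to follow, here is the same idea sketched in Python, with the path given as an explicit list of keys and indexes. The names move_up and GEOJSON are invented for this sketch, GEOJSON is an abbreviated version of the sample input, and the function works on a copy. It is only meant for the case handled above, where the key sits in an object that is an element of an array.

```python
import copy

# Abbreviated version of the sample input.
GEOJSON = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "startTime": "20160123T162547-0500",
                "activities": [{"activity": "car", "group": "car"}],
            },
        }
    ],
}


def move_up(doc, path):
    """Move the key at `path` to the object that contains the enclosing array."""
    result = copy.deepcopy(doc)

    # Walk to the object that directly holds the key and take the value out.
    parent = result
    for step in path[:-1]:
        parent = parent[step]
    value = parent.pop(path[-1])

    # Like trim in the jq version: while the path ends in an array index,
    # drop the index together with the key of the array itself.
    target_path = list(path[:-1])
    while target_path and isinstance(target_path[-1], int):
        target_path = target_path[:-2]

    # Walk to the target object and store the value under the same key.
    target = result
    for step in target_path:
        target = target[step]
    target[path[-1]] = value
    return result


moved = move_up(GEOJSON, ["features", 0, "properties", "activities", 0, "group"])
```

For the sample path, the trimmed target path is ["features", 0, "properties"], so group lands next to activities.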

MongoDB aggregate and count json paths

I have a MongoDB Collection which contains data elements like this:
{
"_id" : "9878jr23geg",
"element" : {
"name" : "element7",
"Set" : [
{
"SubListA" : [
{
"name" : "AlbertEinstein",
"value" : "45"
},
{
"name" : "JohnDoe",
"value" : "34"
}
]
},
{
"MoreNames" : [
{
"name" : "TimMcGraw",
"value" : "39"
}
]
}
]
}
}
{
"_id" : "275678hfvd",
"element" : {
"name" : "element8",
"Set" : [
{
"SubListA" : [
{
"name" : "AlbertEinstein",
"value" : "45"
},
{
"name" : "JimmyKimmel",
"value" : "41"
}
]
}
]
}
}
I'm trying to count the occurrences of each unique name, grouped by the element of Set to which they belong. For example, both objects in my example above have an object with name: "AlbertEinstein" inside element.Set.SubListA; therefore I'd expect a return value something along the lines of:
element.Set.SubListA.AlbertEinstein | 2
Essentially, I'd like a count for each of the distinct names when the data is grouped by objects within element.Set.
Ideally, for the example given, I'd like all of:
element.Set.SubListA.AlbertEinstein | 2
element.Set.SubListA.JohnDoe | 1
element.Set.MoreNames.TimMcGraw | 1
element.Set.SubListA.JimmyKimmel | 1
I've tried several aggregate queries but none seems to achieve what I'm trying to do.
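To pin down the counts I'm after, here is the logic written in plain Python over the two sample documents. This is not a MongoDB aggregation, only a sketch of the expected result; DOCS and count_names are names made up for it.

```python
from collections import Counter

# The two sample documents, as they would look once fetched from the collection.
DOCS = [
    {
        "_id": "9878jr23geg",
        "element": {
            "name": "element7",
            "Set": [
                {
                    "SubListA": [
                        {"name": "AlbertEinstein", "value": "45"},
                        {"name": "JohnDoe", "value": "34"},
                    ]
                },
                {"MoreNames": [{"name": "TimMcGraw", "value": "39"}]},
            ],
        },
    },
    {
        "_id": "275678hfvd",
        "element": {
            "name": "element8",
            "Set": [
                {
                    "SubListA": [
                        {"name": "AlbertEinstein", "value": "45"},
                        {"name": "JimmyKimmel", "value": "41"},
                    ]
                }
            ],
        },
    },
]


def count_names(docs):
    """Count each name per list key found inside element.Set."""
    counts = Counter()
    for doc in docs:
        for entry in doc["element"]["Set"]:
            for list_name, items in entry.items():
                for item in items:
                    counts[f"element.Set.{list_name}.{item['name']}"] += 1
    return counts


counts = count_names(DOCS)
for path, count in counts.items():
    print(f"{path} | {count}")
```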