JQ to CSV issues - json

I previously got some help here with a jq-to-CSV problem. Now I've run into an issue where a few JSON files have some extra values that break the jq command.
Here is the JSON data; the Repairs section is what breaks the jq command:
[
{
"Name": "John Doe",
"Car": [
"Car1",
"Car2"
],
"Location": "Texas",
"Repairs: {
"RepairLocations": {
"RepairsCompleted":[
"Fix1",
"Fix2"
]
}
}
},
{
"Name": "Jane Roe",
"Car": "Car1",
"Location": [
"Illinois",
"Kansas"
]
}
]
Here is the command
def expand($keys):
. as $in
| reduce $keys[] as $k ( [{}];
map(. + {
($k): ($in[$k] | if type == "array" then .[] else . end)
})
) | .[];
(.[0] | keys_unsorted) as $h
| $h, (.[] | expand($h) | [.[$h[]]]) | @csv
This is the end result I am trying to get (this isn't actual data):
Name,Car,Location,Repairs:RepairLocation
John Doe,Car1,Texas,RepairsCompleted:Fix1
John Doe,Car1,Texas,RepairsCompleted:Fix2
John Doe,Car2,Texas,RepairsCompleted:Fix1
John Doe,Car2,Texas,RepairsCompleted:Fix2
Jane Roe,Car1,Illinois,
Jane Roe,Car1,Kansas,
Any advice on this would be great; I am struggling to figure jq out.

A simple solution can be obtained using the same technique shown in one of the answers to the similar question you already asked. The only difference is fulfilling your requirements in the case where the "Repairs" key does not exist:
["Name", "Car", "Location", "Repairs:RepairLocation"],
(.[]
| [.Name]
+ (.Car|..|scalars|[.])
+ (.Location|..|scalars|[.])
+ (.Repairs|..|scalars
| [if . == null then . else "RepairsCompleted:\(.)" end]) )
| @csv
Avoiding the repetition with a helper function:
def s: .. | scalars | [.];
["Name", "Car", "Location", "Repairs:RepairLocation"],
(.[]
| [.Name]
+ (.Car|s)
+ (.Location|s)
+ (.Repairs|s|map(if . == null then . else "RepairsCompleted:\(.)" end)))
| @csv
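A minimal invocation sketch (the file names are placeholders): save either filter as repairs.jq and run it with the raw-output flag so the CSV lines are not themselves JSON-encoded:
jq -r -f repairs.jq input.json
Note that @csv double-quotes string fields, so the cells will come out as "John Doe","Car1", and so on, rather than as bare words.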

How to spare & group objects according to key-value

I'm new at this, these are my first steps, and I guess I've started with a case that isn't simple.
Let's see:
I have objects with an ID (name) and a resource group (rgs). Each object may be part of several groups, and what I need is to get the intersections of the groups.
It is important to say that an object may be part of several groups, which are parent-child groups, and I just need the parent group. It is easy to identify the parent-child relationships, as they share prefixes.
e.g. Group PROM_FD_ARCNA contains the child groups PROM_FD_ARCNA_TGM and PROM_FD_ARCNA_TGM_TGA.
The child groups contain the objects themselves, but as long as I can get the information from the object, that is enough.
The parent groups are PROM_FD_ARCNA, PROM_JOB_ICMP and PROM_JOB_WIN. That is to say, I need to get those objects which belong to the intersections of those groups.
The JSON file looks like this:
[
{
"id_ci": "487006",
"name": "LABTNSARWID625",
"id_ci_class": "host",
"rgs": "PROM_FD_ARCNA, PROM_FD_ARCNA_TGM, PROM_FD_ARCNA_TGM_TGA"
},
{
"id_ci": "5706",
"name": "HCCQ2001",
"id_ci_class": "host",
"rgs": "PROM_JOB_ICMP"
},
{
"id_ci": "9106",
"name": "HCC02155",
"id_ci_class": "host",
"rgs": "PROM_FD_ARCNA, PROM_FD_ARCNA_TGA, PROM_JOB_ICMP"
},
{
"id_ci": "2306",
"name": "VM00006",
"id_ci_class": "host",
"rgs": "PROM_FD_ARCNA, PROM_FD_ARCNA_TGA, PROM_JOB_WIN, PROM_JOB_WIN_TGA"
}
]
In case my explanation was not good, I need to get something like this:
PROM_FD_ARCNA, PROM_JOB_ICMP
{
"HCC02155"
}
PROM_FD_ARCNA, PROM_JOB_WIN
{
"VM00006"
}
As those are the intersections.
So far, I have tried this:
jq '[.[] | select(.id_ci_class == "host") | select (.rgs | startswith("PROM_FD_ARCNA")) | .rgs = "PROM_FD_ARCNA"]
| group_by(.rgs) | map({"rgs": .[0].rgs, "Hosts": map(.name)}) ' ./prom_jobs.json >> Step0A.json
jq '[.[] | select(.id_ci_class == "host") | select (.rgs | startswith("PROM_JOB_WIN")) | .rgs = "PROM_JOB_WIN"]
| group_by(.rgs) | map({"rgs": .[0].rgs, "Hosts": map(.name)}) ' ./prom_jobs.json >> Step0A.json
jq '[.[] | select(.id_ci_class == "host") | select (.rgs | startswith("PROM_JOB_ICMP")) | .rgs = "PROM_JOB_ICMP"]
| group_by(.rgs) | map({"rgs": .[0].rgs, "Hosts": map(.name)}) ' ./prom_jobs.json >> Step0A.json
And the result is:
[
{
"rgs": "PROM_FD_ARCNA",
"Hosts": [
"LABTNSARWID625",
"HCC02155",
"VM00006"
]
}
]
[
{
"rgs": "PROM_JOB_WIN",
"Hosts": [
"VM00006"
]
}
]
[
{
"rgs": "PROM_JOB_ICMP",
"Hosts": [
"HCCQ2001",
"HCC02155"
]
}
]
Of course, the full JSON is quite long and I need to process it as efficiently as possible. I don't know whether I've started well or badly.
def to_set(s): reduce s as $_ ( {}; .[ $_ ] = true );
[ "PROM_FD_ARCNA", "PROM_JOB_ICMP", "PROM_JOB_WIN" ] as $roots |
map(
{
name,
has_rg: to_set( .rgs | split( ", " )[] )
}
) as $hosts |
[
range( 0; $roots | length ) as $i | $roots[ $i ] as $g1 |
range( $i+1; $roots | length ) as $j | $roots[ $j ] as $g2 |
{
root_rgs: [ $g1, $g2 ],
names: [
$hosts[] |
select( .has_rg[ $g1 ] and .has_rg[ $g2 ] ) |
.name
]
} |
select( .names | length > 0 )
]
produces
[
{
"root_rgs": [
"PROM_FD_ARCNA",
"PROM_JOB_ICMP"
],
"names": [
"HCC02155"
]
},
{
"root_rgs": [
"PROM_FD_ARCNA",
"PROM_JOB_WIN"
],
"names": [
"VM00006"
]
}
]
Demo on jqplay
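For clarity: to_set turns a stream of strings into an object used as a set, so the select in the inner loop becomes a cheap key lookup rather than an array scan. A quick illustration, using values taken from the sample data:
jq -n 'def to_set(s): reduce s as $_ ( {}; .[ $_ ] = true ); to_set("PROM_FD_ARCNA", "PROM_JOB_ICMP")'
produces {"PROM_FD_ARCNA":true,"PROM_JOB_ICMP":true}.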

Using jq to convert json to csv

I'm trying to come up with the correct jq syntax to convert JSON to CSV.
Desired results:
<email>,<id>,<name>
e.g.
user1@whatever.nevermind.no,0,general
user2@whatever.nevermind.no,0,general
user1@whatever.nevermind.no,1,local
...
Note that I also need to ignore objects with empty "agent_priorities".
Input
[
{
"id": 0,
"name": "General",
"agent_priorities": {
"user1#whatever.nevermind.no": "normal",
"user2#whatever.nevermind.no": "normal"
}
},
{
"id": 1,
"name": "local",
"agent_priorities": {
"user1#whatever.nevermind.no": "normal"
}
},
{
"id": 2,
"name": "Engineering",
}
]
The following variant of the accepted answer checks for the existence of the "agent_priorities" key as per the requirements, and uses keys_unsorted to preserve the order of the keys:
jq -r '
.[]
| select(has("agent_priorities"))
| .id as $id
| .name as $name
| .agent_priorities
| keys_unsorted[]
| [., $id, $name ]
| @csv
' file.json
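Against the sample input, this should emit the rows in input order (keys_unsorted keeps the key order of each agent_priorities object); note that @csv quotes string fields:
"user1@whatever.nevermind.no",0,"General"
"user2@whatever.nevermind.no",0,"General"
"user1@whatever.nevermind.no",1,"local"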
Store the id and name in variables, then iterate over the keys of agent_priorities:
jq -r '.[]
| .id as $id
| .name as $name
| .agent_priorities
| keys
| .[]
| [., $id, $name ]
| @csv
' file.json

Convert JSON to vertical table

I have the payload below, and I am trying to produce column-aligned output like the following, with a newline between entries. Does someone know how this can be achieved, either in jq directly or with some interesting bash?
StackId : arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983
EventId : 97b2fbf0-75a3-11ea-bb77-0e8a861a6983
StackName : cbongiorno-30800-bb-lambda
LogicalResourceId : cbongiorno-30800-bb-lambda
PhysicalResourceId : arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983
ResourceType : AWS::CloudFormation::Stack
Timestamp : 2020-04-03T12:06:47.501Z
ResourceStatus : CREATE_IN_PROGRESS
ResourceStatusReason : User Initiated
EventId : BBPassword-CREATE_IN_PROGRESS-2020-04-03T12:06:51.336Z
StackName : cbongiorno-30800-bb-lambda
LogicalResourceId : BBPassword
PhysicalResourceId :
ResourceType : AWS::SSM::Parameter
Timestamp : 2020-04-03T12:06:51.336Z
ResourceStatus : CREATE_IN_PROGRESS
Here are the two commands I am using to produce output, but neither is ideal.
I have deleted a key that is usually filled with embedded JSON, because it messes everything up.
In the first example I insert a delimiter that I am hoping I can strip out later.
The second example gives me the error xargs: unterminated quote:
In both cases I hardcoded the format length. But for the curious, it can be computed as follows: jq -re '.StackEvents | map(to_entries | map(.key | length) | max) | max'
jq -re '.StackEvents | .[] | del(.ResourceProperties) | . * {"entry":"---"} | to_entries | .[] | "\(.key) \"\(.value?)\""' bin/logs/3.json | xargs -n 2 printf "%-21s: %s\n"
jq -re '.StackEvents | .[] | del(.ResourceProperties) | . * {"":"\n"} | to_entries | .[] | "\(.key) \"\(.value?)\""' bin/logs/3.json | xargs -n 2 printf "%-21s: %s\n"
Here is the payload:
{
"StackEvents": [
{
"StackId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
"EventId": "BBWebhookLogGroup-CREATE_IN_PROGRESS-2020-04-03T12:06:51.884Z",
"StackName": "cbongiorno-30800-bb-lambda",
"LogicalResourceId": "BBWebhookLogGroup",
"PhysicalResourceId": "cbongiorno-30800-bb-lambda",
"ResourceType": "AWS::Logs::LogGroup",
"Timestamp": "2020-04-03T12:06:51.884Z",
"ResourceStatus": "CREATE_IN_PROGRESS",
"ResourceStatusReason": "Resource creation Initiated",
"ResourceProperties": "{\"RetentionInDays\":\"7\",\"LogGroupName\":\"cbongiorno-30800-bb-lambda\"}"
},
{
"StackId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
"EventId": "BBUserName-CREATE_IN_PROGRESS-2020-04-03T12:06:51.509Z",
"StackName": "cbongiorno-30800-bb-lambda",
"LogicalResourceId": "BBUserName",
"PhysicalResourceId": "",
"ResourceType": "AWS::SSM::Parameter",
"Timestamp": "2020-04-03T12:06:51.509Z",
"ResourceStatus": "CREATE_IN_PROGRESS",
"ResourceProperties": "{\"Type\":\"String\",\"Description\":\"The username for this lambda to operate under\",\"Value\":\"chb0bitbucket\",\"Name\":\"/bb-webhooks/authorization/username\"}"
},
{
"StackId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
"EventId": "BBWebhookLogGroup-CREATE_IN_PROGRESS-2020-04-03T12:06:51.409Z",
"StackName": "cbongiorno-30800-bb-lambda",
"LogicalResourceId": "BBWebhookLogGroup",
"PhysicalResourceId": "",
"ResourceType": "AWS::Logs::LogGroup",
"Timestamp": "2020-04-03T12:06:51.409Z",
"ResourceStatus": "CREATE_IN_PROGRESS",
"ResourceProperties": "{\"RetentionInDays\":\"7\",\"LogGroupName\":\"cbongiorno-30800-bb-lambda\"}"
},
{
"StackId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
"EventId": "BBPassword-CREATE_IN_PROGRESS-2020-04-03T12:06:51.336Z",
"StackName": "cbongiorno-30800-bb-lambda",
"LogicalResourceId": "BBPassword",
"PhysicalResourceId": "",
"ResourceType": "AWS::SSM::Parameter",
"Timestamp": "2020-04-03T12:06:51.336Z",
"ResourceStatus": "CREATE_IN_PROGRESS",
"ResourceProperties": "{\"Type\":\"String\",\"Description\":\"The password for this lambda to operate under with BB. Unfortunately, using an encrypted password is currently not possible\",\"Value\":\"****\",\"Name\":\"/bb-webhooks/authorization/password\"}"
},
{
"StackId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
"EventId": "97b2fbf0-75a3-11ea-bb77-0e8a861a6983",
"StackName": "cbongiorno-30800-bb-lambda",
"LogicalResourceId": "cbongiorno-30800-bb-lambda",
"PhysicalResourceId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
"ResourceType": "AWS::CloudFormation::Stack",
"Timestamp": "2020-04-03T12:06:47.501Z",
"ResourceStatus": "CREATE_IN_PROGRESS",
"ResourceStatusReason": "User Initiated"
}
]
}
Based on input from others, I have put together a simple bash script illustrating a tiny anomaly (the column width isn't uniform):
#!/usr/bin/env bash
set -e
set -o pipefail
fileCount=$(( $( ls -1 logs/*.json | wc -l) - 1))
for i in $(seq 1 $fileCount); do
jq -rs '
def width: map(keys_unsorted | map(length) | max) | max ;
def pad($w): . + (($w-length)*" ") ;
.[1].StackEvents - .[0].StackEvents | sort_by (.Timestamp)
| width as $w | map(to_entries | map("\(.key|pad($w)) : \(.value)"), [""])
| .[][]
' "logs/$((i - 1)).json" "logs/$i.json"
done
Yields:
StackId : arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983
EventId : ApiKey-CREATE_COMPLETE-2020-04-03T12:07:47.382Z
StackName : cbongiorno-30800-bb-lambda
LogicalResourceId : ApiKey
PhysicalResourceId : KYgzCNAzPw5Tsy3dKBdoTaHlxywijTSrb1d2UIQ2
ResourceType : AWS::ApiGateway::ApiKey
Timestamp : 2020-04-03T12:07:47.382Z
ResourceStatus : CREATE_COMPLETE
ResourceProperties : {"StageKeys":[{"StageName":"beta","RestApiId":"8n6tijwaib"}]}
StackId : arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983
EventId : bc9371c0-75a3-11ea-b442-1217092af407
StackName : cbongiorno-30800-bb-lambda
LogicalResourceId : cbongiorno-30800-bb-lambda
PhysicalResourceId : arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983
ResourceType : AWS::CloudFormation::Stack
Timestamp : 2020-04-03T12:07:49.203Z
ResourceStatus : CREATE_COMPLETE
Here is a solution with some helper functions that can be generalized for other uses.
def width: map(keys | map(length) | max) | max ;
def pad($w): . + (($w-length)*" ") ;
.StackEvents
| width as $w
| map(del(.ResourceProperties) | to_entries | map("\(.key|pad($w)) : \(.value)"), [""])
| .[][]
It should produce the desired output if jq is passed the -r option.
Try it online!
EDIT: As peak and oguz ismail point out in the comments, this solution could be improved by using keys_unsorted, and .ResourceProperties should be excluded from the width calculation.
Here is a version with those improvements:
def width: map(keys_unsorted | map(length) | max) | max ;
def pad($w): . + (($w-length)*" ") ;
.StackEvents
| map(del(.ResourceProperties))
| width as $w
| map(to_entries | map("\(.key|pad($w)) : \(.value)"), [""])
| .[][]
Try it online!
jq doesn't have a builtin for padding strings, but it's not hard to implement that functionality. Given the -r/--raw-output option on the command line, the script below will produce your desired output.
.StackEvents
| map(del(.ResourceProperties))
| ( [ .[] | keys_unsorted[] ]
| map(length)
| max + 1
) as $max
| .[]
| ( keys_unsorted as $keys
| [ $keys,
( $keys
| map(length)
| map($max - .)
| map(. * " " + ": ")
),
map(.)
]
| transpose[]
| add
),
""
Online demo
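If the transpose step is hard to follow: the script builds three parallel arrays (the keys, the padded ": " separators, and the values), zips them back together with transpose, and concatenates each triple with add. A minimal illustration with made-up values:
jq -n '[["a","bb"], ["  : ", " : "], ["1","2"]] | transpose | map(add)'
yields ["a  : 1", "bb : 2"].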
Here is a solution that uses max/1 for efficiency and addresses some of the issues with calculating the "width" of Unicode strings, e.g. so that the "width" of "J̲o̲s̲é̲" is computed as 4. Note that the jq filter grapheme_length as defined here ignores issues with control characters and zero-width spaces.
Generic functions
def max(stream):
reduce stream as $x (null; if . == null then $x elif $x > . then $x else . end);
# Grapheme Length ignoring issues with control characters
# Mn = non-spacing mark
# Mc = combining
# Cf = soft-hyphen, bidi control characters, and language tag characters
def grapheme_length:
gsub("\\p{Mn}";"") | gsub("\\p{Mc}";"") | gsub("\\p{Cf}";"")
| length;
def pad($w): tostring + (($w - grapheme_length)*" ") ;
Main program
.StackEvents
| max(.[]
| keys_unsorted[]
| select(. != "ResourceProperties")
| grapheme_length) as $w
| map(del(.ResourceProperties)
| to_entries
| map("\(.key|pad($w)) : \(.value)"), [""])
| .[][]
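Assuming the generic functions and the main program are saved together in one file (the file names below are only placeholders), a typical invocation would be:
jq -r -f vertical.jq stack-events.json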

Json to CSV conversion with value as headers

I have the JSON file below and need to convert it to a CSV file in which some of the values become headers and the corresponding values are populated beneath them. Here is the sample JSON:
{
"environments" : [ {
"dimensions" : [ {
"metrics" : [ {
"name" : "count",
"values" : [ "123" ]
}, {
"name" : "response_time",
"values" : [ "15.7" ]
}],
"name" : "abcd"
}, {
"metrics" : [ {
"name" : "count",
"values" : [ "456" ]
}, {
"name" : "response_time",
"values" : [ "18.7" ]
}],
"name" : "xyzz"
} ]
} ]
}
This is what I have tried already:
jq -r '.environments[].dimensions[] | .name as $p_name | .metrics[] | .name as $val_name | if $val_name == "response_time" then ($p_name,$val_name, .values[])' input.json
Expected output:
name,count,response_time
abcd, 123, 15.7
xyzz, 456, 18.7
If the goal is to rely on the JSON itself to supply the header names in whatever order the "metrics" arrays present them,
then consider:
.environments[].dimensions
| ["name", (.[0] | .metrics[] | .name)], # first emit the headers
( .[] | [.name, (.metrics[].values[0])] ) # ... and then the data rows
| @csv
Generating the headers is easy, so I'll focus on generating the rest of the CSV.
The following has the advantage of being straightforward and will hopefully be more-or-less self-explanatory, at least with the jq manual at the ready. A tweak with an eye to efficiency follows.
jq -r '
# name,count,response_time
.environments[].dimensions[]
| .name as $p_name
| .metrics
| [$p_name]
+ map(select(.name == "count") | .values[0] )
+ map(select(.name == "response_time") | .values[0] )
| @csv
'
Efficiency
Here's a variant of the above which would be appropriate if the .metrics array had a large number of items:
jq -r '
# name,count,response_time
.environments[].dimensions[]
| .name as $p_name
| INDEX(.metrics[]; .name) as $dict
| [$p_name, $dict["count"].values[0], $dict["response_time"].values[0]]
| @csv
'
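Against the sample input above (with its closing brackets restored), either variant should produce the two data rows; note that @csv quotes the string values:
"abcd","123","15.7"
"xyzz","456","18.7"
Prepend the header array from the first snippet if you also need the name,count,response_time line.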

How to convert nested JSON to CSV using only jq

I have the following JSON:
{
"A": {
"C": {
"D": "T1",
"E": 1
},
"F": {
"D": "T2",
"E": 2
}
},
"B": {
"C": {
"D": "T3",
"E": 3
}
}
}
I want to convert it into CSV as follows:
A,C,T1,1
A,F,T2,2
B,C,T3,3
Description of output: the parent keys are printed until a leaf child is reached; once a leaf child is reached, its values are printed.
I've tried the following without success:
cat my.json | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $rows[] | @csv'
and it throws an error.
I can't hardcode the parent keys, as the actual JSON has far too many records, but its structure is the same. What am I missing?
Some of the requirements are unclear, but the following solves one interpretation of the problem:
paths as $path
| {path: $path, value: getpath($path)}
| select(.value|type == "object" )
| select( [.value[]][0] | type != "object")
| .path + ([.value[]])
| @csv
(This program could be optimized but the presentation here is intended to make the separate steps clear.)
Invocation:
jq -r -f leaves-to-csv.jq input.json
Output:
"A","C","T1",1
"A","F","T2",2
"B","C","T3",3
Unquoted strings
To avoid the quotation marks around strings, you could replace the last component of the pipeline above with:
join(",")
Here is a solution using tostream and group_by:
[
tostream
| select(length == 2) # e.g. [["A","C","D"],"T1"]
| .[0][:-1] + [.[1]] # ["A","C","T1"]
]
| group_by(.[:-1]) # [[["A","C","T1"],["A","C",1]],...
| .[] # [["A","C","T1"],["A","C",1]]
| .[0][0:2] + map(.[-1]|tostring) # ["A","C","T1","1"]
| join(",") # "A,C,T1,1"