Convert JSON to vertical table

I have the payload below, and what I am trying to produce is a vertical, column-aligned output like the following, with a blank line between entries. Does someone know how this can be achieved, either in jq directly or with some interesting bash?
StackId              : arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983
EventId              : 97b2fbf0-75a3-11ea-bb77-0e8a861a6983
StackName            : cbongiorno-30800-bb-lambda
LogicalResourceId    : cbongiorno-30800-bb-lambda
PhysicalResourceId   : arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983
ResourceType         : AWS::CloudFormation::Stack
Timestamp            : 2020-04-03T12:06:47.501Z
ResourceStatus       : CREATE_IN_PROGRESS
ResourceStatusReason : User Initiated

EventId              : BBPassword-CREATE_IN_PROGRESS-2020-04-03T12:06:51.336Z
StackName            : cbongiorno-30800-bb-lambda
LogicalResourceId    : BBPassword
PhysicalResourceId   :
ResourceType         : AWS::SSM::Parameter
Timestamp            : 2020-04-03T12:06:51.336Z
ResourceStatus       : CREATE_IN_PROGRESS
Here are the two commands I am using to produce output, but neither is ideal.
I delete a key (ResourceProperties) that is usually filled with embedded JSON, because it messes everything up.
In the first example I insert a delimiter that I am hoping to strip out later.
The second example gives me the error xargs: unterminated quote.
In both cases I hard-coded the format width. For the curious, it can be computed as such: jq -re '.StackEvents | map(to_entries | map(.key | length) | max) | max'
jq -re '.StackEvents | .[] | del(.ResourceProperties) | . * {"entry":"---"} | to_entries | .[] | "\(.key) \"\(.value?)\""' bin/logs/3.json | xargs -n 2 printf "%-21s: %s\n"
jq -re '.StackEvents | .[] | del(.ResourceProperties) | . * {"":"\n"} | to_entries | .[] | "\(.key) \"\(.value?)\""' bin/logs/3.json | xargs -n 2 printf "%-21s: %s\n"
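Incidentally, the hard-coded width in the printf format can be computed at run time by capturing that jq expression in a shell variable. A rough sketch based on the first command above (the delimiter and xargs quoting problems remain):

w=$(jq -re '.StackEvents | map(to_entries | map(.key | length) | max) | max' bin/logs/3.json)
jq -re '.StackEvents | .[] | del(.ResourceProperties) | . * {"entry":"---"} | to_entries | .[] | "\(.key) \"\(.value?)\""' bin/logs/3.json \
  | xargs -n 2 printf "%-${w}s: %s\n"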
Here is the payload:
{
  "StackEvents": [
    {
      "StackId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
      "EventId": "BBWebhookLogGroup-CREATE_IN_PROGRESS-2020-04-03T12:06:51.884Z",
      "StackName": "cbongiorno-30800-bb-lambda",
      "LogicalResourceId": "BBWebhookLogGroup",
      "PhysicalResourceId": "cbongiorno-30800-bb-lambda",
      "ResourceType": "AWS::Logs::LogGroup",
      "Timestamp": "2020-04-03T12:06:51.884Z",
      "ResourceStatus": "CREATE_IN_PROGRESS",
      "ResourceStatusReason": "Resource creation Initiated",
      "ResourceProperties": "{\"RetentionInDays\":\"7\",\"LogGroupName\":\"cbongiorno-30800-bb-lambda\"}"
    },
    {
      "StackId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
      "EventId": "BBUserName-CREATE_IN_PROGRESS-2020-04-03T12:06:51.509Z",
      "StackName": "cbongiorno-30800-bb-lambda",
      "LogicalResourceId": "BBUserName",
      "PhysicalResourceId": "",
      "ResourceType": "AWS::SSM::Parameter",
      "Timestamp": "2020-04-03T12:06:51.509Z",
      "ResourceStatus": "CREATE_IN_PROGRESS",
      "ResourceProperties": "{\"Type\":\"String\",\"Description\":\"The username for this lambda to operate under\",\"Value\":\"chb0bitbucket\",\"Name\":\"/bb-webhooks/authorization/username\"}"
    },
    {
      "StackId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
      "EventId": "BBWebhookLogGroup-CREATE_IN_PROGRESS-2020-04-03T12:06:51.409Z",
      "StackName": "cbongiorno-30800-bb-lambda",
      "LogicalResourceId": "BBWebhookLogGroup",
      "PhysicalResourceId": "",
      "ResourceType": "AWS::Logs::LogGroup",
      "Timestamp": "2020-04-03T12:06:51.409Z",
      "ResourceStatus": "CREATE_IN_PROGRESS",
      "ResourceProperties": "{\"RetentionInDays\":\"7\",\"LogGroupName\":\"cbongiorno-30800-bb-lambda\"}"
    },
    {
      "StackId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
      "EventId": "BBPassword-CREATE_IN_PROGRESS-2020-04-03T12:06:51.336Z",
      "StackName": "cbongiorno-30800-bb-lambda",
      "LogicalResourceId": "BBPassword",
      "PhysicalResourceId": "",
      "ResourceType": "AWS::SSM::Parameter",
      "Timestamp": "2020-04-03T12:06:51.336Z",
      "ResourceStatus": "CREATE_IN_PROGRESS",
      "ResourceProperties": "{\"Type\":\"String\",\"Description\":\"The password for this lambda to operate under with BB. Unfortunately, using an encrypted password is currently not possible\",\"Value\":\"****\",\"Name\":\"/bb-webhooks/authorization/password\"}"
    },
    {
      "StackId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
      "EventId": "97b2fbf0-75a3-11ea-bb77-0e8a861a6983",
      "StackName": "cbongiorno-30800-bb-lambda",
      "LogicalResourceId": "cbongiorno-30800-bb-lambda",
      "PhysicalResourceId": "arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983",
      "ResourceType": "AWS::CloudFormation::Stack",
      "Timestamp": "2020-04-03T12:06:47.501Z",
      "ResourceStatus": "CREATE_IN_PROGRESS",
      "ResourceStatusReason": "User Initiated"
    }
  ]
}
Based on input from others, I have put together a simple bash script that illustrates a tiny anomaly (the column width isn't uniform across files):
#!/usr/bin/env bash
set -e
set -o pipefail

fileCount=$(( $(ls -1 logs/*.json | wc -l) - 1 ))

for i in $(seq 1 $fileCount); do
  jq -rs '
    def width: map(keys_unsorted | map(length) | max) | max ;
    def pad($w): . + (($w-length)*" ") ;

    .[1].StackEvents - .[0].StackEvents | sort_by(.Timestamp)
    | width as $w | map(to_entries | map("\(.key|pad($w)) : \(.value)"), [""])
    | .[][]
  ' "logs/$((i - 1)).json" "logs/$i.json"
done
Yields:
StackId : arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983
EventId : ApiKey-CREATE_COMPLETE-2020-04-03T12:07:47.382Z
StackName : cbongiorno-30800-bb-lambda
LogicalResourceId : ApiKey
PhysicalResourceId : KYgzCNAzPw5Tsy3dKBdoTaHlxywijTSrb1d2UIQ2
ResourceType : AWS::ApiGateway::ApiKey
Timestamp : 2020-04-03T12:07:47.382Z
ResourceStatus : CREATE_COMPLETE
ResourceProperties : {"StageKeys":[{"StageName":"beta","RestApiId":"8n6tijwaib"}]}
StackId : arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983
EventId : bc9371c0-75a3-11ea-b442-1217092af407
StackName : cbongiorno-30800-bb-lambda
LogicalResourceId : cbongiorno-30800-bb-lambda
PhysicalResourceId : arn:aws:cloudformation:us-east-1:882038671278:stack/cbongiorno-30800-bb-lambda/97b14e40-75a3-11ea-bb77-0e8a861a6983
ResourceType : AWS::CloudFormation::Stack
Timestamp : 2020-04-03T12:07:49.203Z
ResourceStatus : CREATE_COMPLETE
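The non-uniform width is presumably because the width is recomputed for each pair of files. One possible workaround (a rough sketch, untested, assuming the same logs/*.json layout) is to compute a single width across all files up front and pass it in with --argjson:

w=$(jq -s '[.[].StackEvents[] | keys_unsorted[] | length] | max' logs/*.json)
for i in $(seq 1 $fileCount); do
  jq -rs --argjson w "$w" '
    def pad($w): . + (($w - length) * " ");
    .[1].StackEvents - .[0].StackEvents | sort_by(.Timestamp)
    | map(to_entries | map("\(.key|pad($w)) : \(.value)"), [""])
    | .[][]
  ' "logs/$((i - 1)).json" "logs/$i.json"
done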

Here is a solution with some helper functions that can be generalized for other uses.
def width: map(keys | map(length) | max) | max ;
def pad($w): . + (($w-length)*" ") ;
.StackEvents
| width as $w
| map(del(.ResourceProperties) | to_entries | map("\(.key|pad($w)) : \(.value)"), [""])
| .[][]
It should produce the desired output if jq is invoked with the -r option.
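For example, assuming the filter is saved as vtable.jq (a name made up here), the invocation would be:

jq -r -f vtable.jq bin/logs/3.json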
EDIT: As peak and oguz ismail point out in the comments, this solution can be improved by using keys_unsorted and by excluding .ResourceProperties from the width calculation.
Here is a version with those improvements:
def width: map(keys_unsorted | map(length) | max) | max ;
def pad($w): . + (($w-length)*" ") ;
.StackEvents
| map(del(.ResourceProperties))
| width as $w
| map(to_entries | map("\(.key|pad($w)) : \(.value)"), [""])
| .[][]

jq doesn't have a builtin for padding strings, but it's not hard to implement that functionality. Given the -r/--raw-output option on the command line, the script below will produce your desired output.
.StackEvents
| map(del(.ResourceProperties))
| ( [ .[] | keys_unsorted[] ]
    | map(length)
    | max + 1
  ) as $max
| .[]
| ( keys_unsorted as $keys
    | [ $keys,
        ( $keys
          | map(length)
          | map($max - .)
          | map(. * " " + ": ")
        ),
        map(.)
      ]
    | transpose[]
    | add
  ),
  ""

Here is a solution that:
- uses max/1 for efficiency
- addresses some of the issues with calculating the "width" of Unicode strings, e.g. if we want the "width" of "J̲o̲s̲é̲" to be calculated as 4

Note that the jq filter grapheme_length as defined here ignores issues with control characters and zero-width spaces.
Generic functions
def max(stream):
  reduce stream as $x (null; if . == null then $x elif $x > . then $x else . end);

# Grapheme Length ignoring issues with control characters
# Mn = non-spacing mark
# Mc = combining
# Cf = soft-hyphen, bidi control characters, and language tag characters
def grapheme_length:
  gsub("\\p{Mn}";"") | gsub("\\p{Mc}";"") | gsub("\\p{Cf}";"")
  | length;

def pad($w): tostring + (($w - grapheme_length)*" ") ;
Main program
.StackEvents
| max(.[]
      | keys_unsorted[]
      | select(. != "ResourceProperties")
      | grapheme_length) as $w
| map(del(.ResourceProperties)
      | to_entries
      | map("\(.key|pad($w)) : \(.value)"), [""])
| .[][]
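As a quick sanity check of grapheme_length (a made-up test string; this relies on jq's regex support for \p{...} classes):

jq -nc 'def grapheme_length: gsub("\\p{Mn}";"") | gsub("\\p{Mc}";"") | gsub("\\p{Cf}";"") | length;
        "Jose\u0301" | [length, grapheme_length]'

which prints [5,4]: five codepoints, but four "graphemes" once the combining acute accent is stripped.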

Related

Windows version fails where jqplay.org works

I've been using jq to parse the output from the AWS CLI.
The output looks something like this:
{
  "Vpcs": [
    {
      "CidrBlock": "10.29.19.64/26",
      "State": "available",
      "VpcId": "vpc-0ba51bd29c41d41",
      "IsDefault": false,
      "Tags": [
        {
          "Key": "Name",
          "Value": "CloudEndure-Europe-Development"
        }
      ]
    }
  ]
}
and the script I am using looks like this:
.Vpcs[] | [.VpcId, .CidrBlock, (.Tags[]|select(.Key=="Name")|.Value)]
If I run it under Windows, it fails like this:
jq: error: Name/0 is not defined at , line 1:
.Vpcs[] | [.VpcId, .CidrBlock, (.Tags[]|select(.Key==Name)|.Value)]
jq: 1 compile error
But it works fine on jqplay.org.
Any ideas? On Windows I'm using jq 1.6.
Thanks,
Bruce.
The correct jq program is
.Vpcs[] | [.VpcId, .CidrBlock, ( .Tags[] | select( .Key == "Name" ) | .Value ) ]
You didn't show the command you used, but you provided the following to jq:
.Vpcs[] | [.VpcId, .CidrBlock, ( .Tags[] | select( .Key == Name ) | .Value ) ]
That's incorrect. (Notice the missing quotes.)
Not only did you not provide the command you used, you didn't specify whether it was being provided to the Windows API (CreateProcess), the Windows shell (cmd), or PowerShell.
I'm guessing cmd. In order to provide the above program to jq, you can use the following cmd command:
jq ".Vpcs[] | [.VpcId, .CidrBlock, ( .Tags[] | select( .Key == \"Name\" ) | .Value ) ]" file.json
I don't agree with ikegami about the CMD command they provided, because the character used for CMD escaping is ^, not \ as in Assembly/C/C++. I hope this will work (I don't want to test this on my potato of a machine):
jq .Vpcs[] | [.VpcId, .CidrBlock, ( .Tags[] | select( .Key == "Name" ) | .Value ) ] file.json
or this:
jq .Vpcs[] | [.VpcId, .CidrBlock, ( .Tags[] | select( .Key == ^"Name^" ) | .Value ) ] file.json

JQ to CSV issues

I previously got some help on here for some jq-to-CSV issues. I ran into an issue where a few JSON files had some extra values that break the jq command.
Here is the JSON data; the Repairs section is what breaks the jq command:
[
  {
    "Name": "John Doe",
    "Car": [
      "Car1",
      "Car2"
    ],
    "Location": "Texas",
    "Repairs": {
      "RepairLocations": {
        "RepairsCompleted": [
          "Fix1",
          "Fix2"
        ]
      }
    }
  },
  {
    "Name": "Jane Roe",
    "Car": "Car1",
    "Location": [
      "Illinois",
      "Kansas"
    ]
  }
]
Here is the command:
def expand($keys):
  . as $in
  | reduce $keys[] as $k ( [{}];
      map(. + {
        ($k): ($in[$k] | if type == "array" then .[] else . end)
      })
    ) | .[];

(.[0] | keys_unsorted) as $h
| $h, (.[] | expand($h) | [.[$h[]]]) | @csv
This is the end result I am trying to get. (This isn't actual data.)
Name,Car,Location,Repairs:RepairLocation
John Doe,Car1,Texas,RepairsCompleted:Fix1
John Doe,Car1,Texas,RepairsCompleted:Fix2
John Doe,Car2,Texas,RepairsCompleted:Fix1
John Doe,Car2,Texas,RepairsCompleted:Fix2
Jane Roe,Car1,Illinois,
Jane Roe,Car1,Kansas,
Any advice on this would be great. I am struggling to figure jq out.
A simple solution can be obtained using the same technique shown in one of the answers to the similar question you already asked. The only difference is fulfilling your requirements in the case where the "Repairs" key does not exist:
["Name", "Car", "Location", "Repairs:RepairLocation"],
(.[]
| [.Name]
+ (.Car|..|scalars|[.])
+ (.Location|..|scalars|[.])
+ (.Repairs|..|scalars
| [if . == null then . else "RepairsCompleted:\(.)" end]) )
| #csv
Avoiding the repetition with a helper function
def s: .. | scalars | [.];

["Name", "Car", "Location", "Repairs:RepairLocation"],
(.[]
 | [.Name]
   + (.Car|s)
   + (.Location|s)
   + (.Repairs|s|map(if . == null then . else "RepairsCompleted:\(.)" end)))
| @csv

Json to CSV conversion with value as headers

I have the JSON file below and need to convert it to a CSV where some of the values become headers and the corresponding values are populated under them. Below is the sample JSON:
{
  "environments" : [ {
    "dimensions" : [ {
      "metrics" : [ {
        "name" : "count",
        "values" : [ "123" ]
      }, {
        "name" : "response_time",
        "values" : [ "15.7" ]
      } ],
      "name" : "abcd"
    }, {
      "metrics" : [ {
        "name" : "count",
        "values" : [ "456" ]
      }, {
        "name" : "response_time",
        "values" : [ "18.7" ]
      } ],
      "name" : "xyzz"
    } ]
  } ]
}
This is what I have tried already:
jq -r '.environments[].dimensions[] | .name as $p_name | .metrics[] | .name as $val_name | if $val_name == "response_time" then ($p_name,$val_name, .values[])' input.json
Expected output:
name,count,response_time
abcd, 123, 15.7
xyzz, 456, 18.7
If the goal is to rely on the JSON itself to supply the header names in whatever order the "metrics" arrays present them, then consider:
.environments[].dimensions
| ["name", (.[0] | .metrics[] | .name)], # first emit the headers
( .[] | [.name, (.metrics[].values[0])] ) # ... and then the data rows
| @csv
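With -r and the sample input above, this should emit something close to the requested layout, though @csv double-quotes the string fields:

"name","count","response_time"
"abcd","123","15.7"
"xyzz","456","18.7"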
Generating the headers is easy, so I'll focus on generating the rest of the CSV.
The following has the advantage of being straightforward and will hopefully be more-or-less self-explanatory, at least with the jq manual at the ready. A tweak with an eye to efficiency follows.
jq -r '
# name,count,response_time
.environments[].dimensions[]
| .name as $p_name
| .metrics
| [$p_name]
+ map(select(.name == "count") | .values[0] )
+ map(select(.name == "response_time") | .values[0] )
| @csv
'
Efficiency
Here's a variant of the above which would be appropriate if the .metrics array had a large number of items:
jq -r '
# name,count,response_time
.environments[].dimensions[]
| .name as $p_name
| INDEX(.metrics[]; .name) as $dict
| [$p_name, $dict["count"].values[0], $dict["response_time"].values[0]]
| @csv
'
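For reference, the INDEX/2 builtin (available in jq 1.6) builds a lookup object keyed by the given expression; on one metrics array it yields roughly:

jq -nc '[{name:"count",values:["123"]},{name:"response_time",values:["15.7"]}] | INDEX(.[]; .name)'
{"count":{"name":"count","values":["123"]},"response_time":{"name":"response_time","values":["15.7"]}}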

Pad JSON array with JQ to obtain rectangular result

I have JSON that looks like the reproducible sample at the bottom, and I want to build a CSV that in the end looks like this:
"SO302993",items1,item2,item3.1,item3.2,item3.3, item3.4,...
"SO302994",items1,item2,item3.1,item3.2, , ,...
"SO302995",items1,item2,item3.1,item3.2,item3.3, ,...
The item3 elements are in an array, and my current solution:
.[] | [.number, .item1, .item2, .item3[]?]
gives me this:
"SO302993",items1,item2,item3.1,item3.2,item3.3, item3.4,...
"SO302994",items1,item2,item3.1,item3.2,...
"SO302995",items1,item2,item3.1,item3.2,item3.3,...
which will create an uneven number of columns in the csv.
I tried adding .item3[:]? in a Python-slice style, but it didn't work.
Any help would be much appreciated! And if I wasn't clear, do ask for clarification. My snippet and toy data are below.
{
  "items": [
    {
      "name": "Mr Simon Mackin",
      "country_of_residence": "Scotland",
      "natures_of_control": [
        "voting-rights-25-to-50-percent-limited-liability-partnership",
        "significant-influence-or-control-limited-liability-partnership"
      ],
      "premises": "4"
    }
  ]
}
{
  "items": [
    {
      "name": "Mrs Simonne Mackinni",
      "country_of_residence": "France",
      "natures_of_control": [
        "significant-influence-or-control-limited-liability-partnership"
      ],
      "premises": "4"
    }
  ]
}
with this query:
.items[] | [.name, .country_of_residence, .natures_of_control[]?, .premises] | @csv
I get these results:
"Mr Simon Mackin","Scotland","voting-rights","significant-influence","4"
"Mrs Simonne Mackinni","France","significant-influence","4"
But I'd like to get this (the second line has an extra comma after "significant-influence"):
"Mr Simon Mackin","Scotland","voting-rights","significant-influence","4"
"Mrs Simonne Mackinni","France","significant-influence",,"4"
Since you want a rectangular result, you will have to "pad" the "natures_of_control" array. Based on the sample input, you will need to "slurp" the input in order to obtain a global maximum.
To pad the array, you could use the helper function:
# emit a stream of exactly $n items
def pad($n): range(0;$n) as $i | .[$i];
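As a toy check of pad (made-up input): indexing past the end of an array yields null, and @csv renders null as an empty field, which produces the extra commas you want:

jq -nc 'def pad($n): range(0;$n) as $i | .[$i]; [["a","b"] | pad(3)]'
["a","b",null]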
The solution to the problem as posted on jqplay then becomes:
([.[] | .items[] | .natures_of_control | length] | max) as $mx
| .[]
| (.active_count) as $active_count
| (.ceased_count) as $ceased_count
| (.links.self | split("/")[2]) as $companyCode
| .items[]
| [$companyCode, $active_count, $ceased_count, .name, .country_of_residence, .nationality, .notified_on, (.natures_of_control | pad($mx))]
| @csv
Invocation
The appropriate invocation would look like this:
jq -sr -f program.jq input.json
Handling missing data
To ignore objects that have no "items" you could tweak the above, e.g. as follows:
([.[] | .items[]? | .natures_of_control | length] | max) as $mx
| .[]
| select(.items)
| (.active_count) as $active_count
| (.ceased_count) as $ceased_count
| (.links.self | split("/")[2]) as $companyCode
| .items[]
| [$companyCode, $active_count, $ceased_count, .name, .country_of_residence, .nationality, .notified_on, (.natures_of_control | pad($mx))]
| @csv

How to convert nested JSON to CSV using only jq

I have the following JSON:
{
  "A": {
    "C": {
      "D": "T1",
      "E": 1
    },
    "F": {
      "D": "T2",
      "E": 2
    }
  },
  "B": {
    "C": {
      "D": "T3",
      "E": 3
    }
  }
}
I want to convert it into CSV as follows:
A,C,T1,1
A,F,T2,2
B,C,T3,3
Description of the output: the parent keys are printed until a leaf child is reached; once a leaf child is reached, its values are printed.
I've tried the following and couldn't succeed:
cat my.json | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $rows[] | @csv'
and it throws an error.
I can't hardcode the parent keys, as the actual JSON has too many records, but the structure of the JSON is similar. What am I missing?
Some of the requirements are unclear, but the following solves one interpretation of the problem:
paths as $path
| {path: $path, value: getpath($path)}
| select(.value|type == "object" )
| select( [.value[]][0] | type != "object")
| .path + ([.value[]])
| @csv
(This program could be optimized but the presentation here is intended to make the separate steps clear.)
Invocation:
jq -r -f leaves-to-csv.jq input.json
Output:
"A","C","T1",1
"A","F","T2",2
"B","C","T3",3
Unquoted strings
To avoid the quotation marks around strings, you could replace the last component of the pipeline above with:
join(",")
Here is a solution using tostream and group_by:
[ tostream
  | select(length == 2)              # e.g. [["A","C","D"],"T1"]
  | .[0][:-1] + [.[1]]               # ["A","C","T1"]
]
| group_by(.[:-1])                   # [ [["A","C","T1"],["A","C",1]], ... ]
| .[]                                # [["A","C","T1"],["A","C",1]]
| .[0][0:2] + map(.[-1]|tostring)    # ["A","C","T1","1"]
| join(",")                          # "A,C,T1,1"