Convert file paths into JSON structure using bash / jq - json

Can the conversion shown below be done in bash using jq (https://stedolan.github.io/jq/)?
The requirement is to convert a list of file paths into a nested JSON structure, as this JavaScript example does:
const data = [
"/parent/child1/grandchild1",
"/parent/child1/grandchild2",
"/parent/child2/grandchild1"
];
const output = {};
let current;
for (const path of data) {
current = output;
for (const segment of path.split('/')) {
if (segment !== '') {
if (!(segment in current)) {
current[segment] = {};
}
current = current[segment];
}
}
}
console.log(output);

The following assumes:
the input is a valid JSON array of "/"-style pathnames of files;
pathnames are all absolute (i.e., begin with "/").
reduce .[] as $entry ({};
($entry | split("/") ) as $names
| $names[1:-1] as $p
| setpath($p; getpath($p) + [$names[-1]]) )
Example
Input
[
"/parent/child1/grandchild1",
"/parent/child1/grandchild2",
"/parent/child2/grandchild3",
"/parent/child2/grandchild4",
"/parent2/child2/grandchild5"
]
Output
{
"parent": {
"child1": [
"grandchild1",
"grandchild2"
],
"child2": [
"grandchild3",
"grandchild4"
]
},
"parent2": {
"child2": [
"grandchild5"
]
}
}
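For readers who want to compare against a general-purpose language, the same reduction can be sketched in Python. This is only an illustration under the same assumptions as the jq filter above (absolute paths), plus the assumption that every path has at least one directory component; the name paths_to_tree is made up for this example.

```python
import json


def paths_to_tree(paths):
    """Group file names under nested directory keys, like the jq reduce above."""
    tree = {}
    for path in paths:
        # "/a/b/c".split("/") gives ["", "a", "b", "c"]; drop the leading empty string
        *dirs, leaf = path.split("/")[1:]
        node = tree
        for name in dirs[:-1]:
            node = node.setdefault(name, {})
        # The innermost directory maps to the list of files found in it
        node.setdefault(dirs[-1], []).append(leaf)
    return tree


paths = [
    "/parent/child1/grandchild1",
    "/parent/child1/grandchild2",
    "/parent/child2/grandchild3",
    "/parent/child2/grandchild4",
    "/parent2/child2/grandchild5",
]
print(json.dumps(paths_to_tree(paths), indent=2))
```

Like the jq version, this sketch does not handle a name that is a directory in one path and a file in another.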

Related

Terraform: Iterate over keys in JSON file

Using Terraform I need to loop over some JSON and create some files.
This is the file I'm reading in:
{
"files": {
"file1": {
"a": {
"unusedValue": "val"
}
},
"file2": {
"a": {
"unusedValue": "val"
},
"b": {
"unusedValue": "val"
}
}
}
}
I can't change the format of this file, and I need to use it to create 3 files:
file1a
file2a
file2b
At the minute, I've got this:
locals {
json = jsondecode(file("files.json"))
files = flatten([ for v in local.json.files:
[ for file, fileLetter in v:
{ "file" = file,
"fileLetter" = fileLetter}
]
])
}
# resource local_file file {
# for_each = { for idx, v in local.files: idx => v }
# content = "Temp content"
# filename = "${path.module}/${each.value.file}-${each.value.fileLetter}"
#}
output myout {
value = local.files
}
But it's giving the wrong output: it's taking the contents of the second object rather than its name, and the first part isn't using the file names (file1, file2, etc.).
[
+ {
+ file = "a"
+ fileLetter = {
+ unusedValue = "val"
}
},
+ {
+ file = "a"
+ fileLetter = {
+ unusedValue = "val"
}
},
+ {
+ file = "b"
+ fileLetter = {
+ unusedValue = "val"
}
},
]
It should be:
[
+ {
+ file = "file1"
+ fileLetter = "a"
},
+ {
+ file = "file2",
+ fileLetter = "a"
},
+ {
+ file = "file2",
+ fileLetter = "b"
},
]
I think this should work:
files = flatten([ for file, fileObject in local.json.files:
[ for fileLetter, _ in fileObject:
{ "file" = file,
"fileLetter" = fileLetter}
]
])
With the original for v in local.json.files, you're iterating over only the inner objects, e.g.
{
"a": {
"unusedValue": "val"
}
}
Instead, you want to read both the key and the value from the JSON object in files. This can be done by using two temporary variables, file, fileObject. This is the same syntax you originally used in your file, fileLetter.
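The key/value iteration described above is not specific to Terraform. As a rough analogy only, here is the same double loop written as a Python comprehension; the variable names mirror the ones in the answer, and the JSON is the file from the question. Note that Python keeps the key order of the parsed JSON, which may differ from the order Terraform uses for maps.

```python
import json

doc = json.loads("""
{
  "files": {
    "file1": {"a": {"unusedValue": "val"}},
    "file2": {"a": {"unusedValue": "val"}, "b": {"unusedValue": "val"}}
  }
}
""")

# Outer loop reads both key (file) and value (file_object);
# inner loop only needs the keys of file_object.
files = [
    {"file": file, "fileLetter": file_letter}
    for file, file_object in doc["files"].items()
    for file_letter in file_object
]
print(files)
```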

How to iterate through a unknown JSON data/object?

Given the two JSON examples
{
"A": {
"name": "noname",
"key": "nokey"
}
}
and then
{
"B": {
"property1": "value3",
"country": "australia"
}
}
Is it possible to create a Powershell script that could take either one of the JSON examples and loop through them without knowing any names/keys upfront? (Using Windows 2016)
Something similar to what is posted as an answer here How do I loop through or enumerate a JavaScript object?
var p = {
"p1": "value1",
"p2": "value2",
"p3": "value3"
};
for (var key in p) {
if (p.hasOwnProperty(key)) {
console.log(key + " -> " + p[key]);
}
}
Not working
Did try this, but it works only for the first level
$json = @"
{"A": {"property1": "value1", "property2": "value2"}, "B": {"property1":
"value3", "property2": "value4"}}
"@
$parsed = $json | ConvertFrom-Json
$parsed.PSObject.Properties | ForEach-Object {
$next = $_
$name = $_.Name
$value = $_.value
echo "$name = $value"
$next.PSObject.Properties | ForEach-Object {
$name = $_.Name
$value = $_.value
echo "Second level: $name = $value"
}
}
ConvertFrom-Json uses a nested structure of PSCustomObjects to represent the data - you can use $x.psobject.properties to get a collection of json property names and values for each item in the structure, and then you can loop over them however you want.
For example:
$x = ConvertFrom-Json "{ 'A': {'name': 'noname', 'key': 'nokey'} }"
foreach( $rootProperty in @($x.psobject.properties | where-object {$_.MemberType -eq "NoteProperty"}) )
{
write-host "'$($rootProperty.Name)' = '$($rootProperty.Value)'"
foreach( $childProperty in @($rootProperty.Value.psobject.properties | where-object {$_.MemberType -eq "NoteProperty"}) )
{
write-host "'$($childProperty.Name)' = '$($childProperty.Value)'"
}
}
outputs the following:
'A' = '@{name=noname; key=nokey}'
'name' = 'noname'
'key' = 'nokey'
You could also walk the structure recursively to an arbitrary depth if you convert the above into a function that calls itself on nested PSCustomObjects, but I'm not sure if you need to be able to do that or just go 2 levels like in your example json documents.
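To illustrate the recursive idea mentioned above, here is a minimal sketch in Python rather than PowerShell; a PowerShell version would have the same shape, calling itself on each nested PSCustomObject. The function name walk_properties is made up for this example.

```python
import json


def walk_properties(node, depth=0):
    """Yield (depth, name, value) for every property at every nesting level."""
    if isinstance(node, dict):
        for name, value in node.items():
            yield depth, name, value
            # Recurse into the value; scalars simply yield nothing further
            yield from walk_properties(value, depth + 1)
    elif isinstance(node, list):
        for item in node:
            yield from walk_properties(item, depth)


doc = json.loads('{"A": {"name": "noname", "key": "nokey"}}')
for depth, name, value in walk_properties(doc):
    print("  " * depth + f"{name} = {value}")
```

No property names are hard-coded, so the same function works for either of the two JSON examples in the question.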

How do I add dynamically generated arrays with jq?

I'm using jq 1.6 on Windows 7 and want to add a dynamically generated array to a json file.
That array doesn't yet exist in this file.
I've got the following JSON structure (reduced for reasons of clarity):
{
"policies": {
"SearchBar": "separate",
"SearchEngines": {
"PreventInstalls": false
}
}
}
I'd like to generate an array based on dynamic values and finally create the following output:
{
"policies": {
"SearchBar": "separate",
"SearchEngines": {
"PreventInstalls": false,
"Remove": [
"Twitter",
"Wikipedia (en)"
]
}
}
}
The Remove array's content is stored in a (cmd) %variable%.
I found that the line
jq -n --arg items "%variable%" "{ Remove: $items | split(\",\") }"
produces the array I want:
{
"Remove": [
"Twitter",
"Wikipedia (en)"
]
}
What is the best way to insert this array into the original file?
Given the input string Twitter,Wikipedia (en), you can use jq to update the JSON data:
<file jq --arg i 'Twitter,Wikipedia (en)' '.policies.SearchEngines += ({ Remove: $i | split(",") })'
{
"policies": {
"SearchBar": "separate",
"SearchEngines": {
"PreventInstalls": false,
"Remove": [
"Twitter",
"Wikipedia (en)"
]
}
}
}
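For comparison, the same update can be sketched in Python. This is an illustration only, not part of the jq answer; the function name add_remove_list is made up, and it assumes the policies.SearchEngines object already exists, as it does in the question.

```python
import json


def add_remove_list(config, items):
    """Set policies.SearchEngines.Remove to the comma-separated items."""
    engines = config["policies"]["SearchEngines"]
    # Like the jq "+=" with an object, this adds the key or overwrites it
    engines["Remove"] = items.split(",")
    return config


config = json.loads("""
{
  "policies": {
    "SearchBar": "separate",
    "SearchEngines": {"PreventInstalls": false}
  }
}
""")
print(json.dumps(add_remove_list(config, "Twitter,Wikipedia (en)"), indent=2))
```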

I have a messy JSON that I am trying to clean up using jq

I have some messy JSON.
Some nodes are not consistent across rows. In some rows these nodes are arrays and in some these are objects or strings.
The example here is only two levels, but the actual data is nested many more levels.
Example:
[
{
"id": 1,
"person": {
"addresses": {
"address": {
"city": "FL"
}
},
"phones": [
{
"type": "mobile",
"number": "555-555-5555"
}
],
"email": [
{
"type": "work",
"email": "john.doe@gmail.com"
},
{
"type": "work",
"email": "john.doe@work.com"
}
]
}
},
{
"id": 2,
"person": {
"addresses": [
{
"type": "home",
"address": {
"city": "FL"
}
}
],
"phones": {
"type": "mobile",
"number": "555-555-5555"
},
"email": {
"type": "work",
"email": "jane.doe@gmail.com"
}
}
}
]
I would like to make the nodes consistent so that if any node is an array in any of the nodes, then the remaining nodes should be converted into arrays.
Once the data is consistent, it would be easier to analyze and restructure the data.
Expected result:
[
{
"id": 1,
"person": {
"addresses": [
{
"address": {
"city": "FL"
}
}
],
"phones": [
{
"type": "mobile",
"number": "555-555-5555"
}
],
"email": [
{
"type": "work",
"email": "john.doe@gmail.com"
},
{
"type": "work",
"email": "john.doe@work.com"
}
]
}
},
{
"id": 2,
"person": {
"addresses": [
{
"type": "home",
"address": {
"city": "FL"
}
}
],
"phones": [
{
"type": "mobile",
"number": "555-555-5555"
}
],
"email": [
{
"type": "work",
"email": "jane.doe@gmail.com"
}
]
}
}
]
After making the arrays consistent, I would like to flatten the data so that objects are flattened out but the arrays remain arrays.
Expected result
[
{
"id": 1,
"person.addresses": [
{
"address": {
"city": "FL"
}
}
],
"person.phones": [
{
"type": "mobile",
"number": "555-555-5555"
}
],
"person.email": [
{
"type": "work",
"email": "john.doe@gmail.com"
},
{
"type": "work",
"email": "john.doe@work.com"
}
]
},
{
"id": 2,
"person.addresses": [
{
"type": "home",
"address": {
"city": "FL"
}
}
],
"person.phones": [
{
"type": "mobile",
"number": "555-555-5555"
}
],
"person.email": [
{
"type": "work",
"email": "jane.doe@gmail.com"
}
]
}
]
I was able to do this partially using jq. It works when there are one or two paths to be fixed, but when there are more than two it seems to break.
The approach I took
Identify all possible paths
Group and count the datatypes for each path
Identify cases where there are mixed datatypes
Sort the paths by decreasing depth
Exclude paths that do not have mixed types
Exclude paths where one of the mixed types is not an array
For each path apply the fix on the original data
This generates a stream containing N copies one for each N transformation
Extract the last copy which should contain the cleaned result
My Experiment so far
def fix(data; path):
data |= map(. | getpath(path)?=([getpath(path)?]|flatten));
def hist:
length as $l
| group_by (.)
| map( .
| (.|length) as $c
| {(.[0]):{
"count": $c,
"diff": ($l - $c)
}} )
| (length>1) as $mixed
| {
"types": .[],
"count": $l,
"mixed":$mixed
};
def summary:
map( .
| path(..) as $p
| {
path:$p,
type: getpath($p)|type,
key:$p|join(".")
}
)
| flatten
| group_by(.key)
| map( .
| {
key: .[0].key,
path: .[0].path,
depth: (.[0].path|length),
type:([(.[] | .type)]|hist)
}
)
| sort_by(.depth)
| reverse;
. as $data
| .
| summary
| map( .
| select(.type.mixed)
| select(.type.types| keys| contains(["array"]))
| .path)
| map(. as $path | $data | fix($data;$path))
| length as $l
| .[$l-1]
Only the last conversion is present. I think the $data is not getting updated by my fix and this is probably the root cause, or maybe I am just doing this wrong.
Here is an example where this doesn't work
The following response first solves the first task, namely:
make the nodes consistent so that if any ... node is an array in any of the nodes, then the remaining nodes should be converted into arrays.
in a generic way:
def paths_to_array:
[paths as $path
| select( any(.[]; (getpath($path[1:] )? | type) == "array"))
| $path] ;
# If a path to a value in .[] is an array,
# then ensure all corresponding values are also arrays
def make_uniform:
reduce (paths_to_array[][1:]) as $path (.;
map( (getpath($path)? // null) as $value
| if $value and ($value|type != "array")
then setpath($path; [$value])
else . end ) ) ;
make_uniform
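As a cross-check of what make_uniform is meant to do, here is a loose Python sketch of the same idea. It is not a translation of the jq above: it mutates the rows in place, records only paths made of object keys (it does not descend into arrays), and wraps any non-null value, whereas the jq version also skips false. The helper name array_paths and the shortened sample rows are made up for this example.

```python
def array_paths(rows):
    """Return the key paths (tuples of dict keys) that hold a list in at least one row."""
    found = set()

    def visit(node, path):
        if isinstance(node, list):
            if path:              # ignore a row that is itself a list
                found.add(path)   # arrays are recorded but not descended into
        elif isinstance(node, dict):
            for key, value in node.items():
                visit(value, path + (key,))

    for row in rows:
        visit(row, ())
    return found


def make_uniform(rows):
    """Wrap a value in a one-element list wherever another row has a list at that path."""
    paths = array_paths(rows)
    for row in rows:
        for path in paths:
            parent = row
            for key in path[:-1]:
                parent = parent.get(key) if isinstance(parent, dict) else None
            if not isinstance(parent, dict) or path[-1] not in parent:
                continue
            value = parent[path[-1]]
            if value is not None and not isinstance(value, list):
                parent[path[-1]] = [value]
    return rows


rows = [
    {"id": 1, "person": {"addresses": {"address": {"city": "FL"}},
                         "phones": [{"type": "mobile"}]}},
    {"id": 2, "person": {"addresses": [{"type": "home"}],
                         "phones": {"type": "mobile"}}},
]
make_uniform(rows)
print(rows)
```

Collecting the paths into a set up front means each distinct path is applied once per row, which is the same concern the Efficiency section raises for the jq version.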
For the second task, let's define a utility function:
# Input is assumed to be an object:
def flatten_top_level_keys:
[ to_entries[]
| if (.value|type) == "object"
then .key as $k
| (.value|to_entries)[] as $kv
| {key: ($k + "." + $kv.key), value: $kv.value}
else .
end ]
| from_entries;
This can be used in conjunction with walk/1 to achieve recursive
flattening.
In other words, the solution to the combined problem can be obtained
by:
make_uniform
| walk( if type == "object" then flatten_top_level_keys else . end )
Efficiency
The above def of make_uniform suffers from an obvious efficiency issue in the line:
reduce (paths_to_array[][1:]) as $path (.;
Using jq's unique would be one way to resolve it, but unique is implemented using a sort, which in this case introduces another inefficiency. So let's use this old chestnut:
# bag of words
def bow(stream):
reduce stream as $word ({}; .[$word|tostring] += 1);
Now we can define make_uniform more efficiently:
def make_uniform:
def uniques(s): bow(s) | keys_unsorted[] | fromjson;
reduce uniques(paths_to_array[][1:]) as $path (.;
map( (getpath($path)? // null) as $value
| if $value and ($value|type != "array")
then setpath($path; [$value])
else . end ) ) ;
Using a bit of python along with the JQ scripts that peak had given in the solution above, I was able to clean up my messy data.
I still think that the answer given by peak is the right answer given the question I had asked. Although the solution is very good and works well, it took a lot of time to complete. The time taken depended on the number of nodes, the depth of the nodes and the number of arrays it found.
I had two different files that I needed to fix and both had around 5000 rows of data. On one of them, the jq script took about 6 hours to complete and I had to terminate the other one after 16 hours.
The solution below builds on the original solution by using python and jq together to process some of the steps in parallel. Finding paths to arrays is still the most time-consuming part.
setup
I split the scripts into the following
# paths_to_array.jq
def paths_to_array:
[paths as $path
| select( any(.[]; (getpath($path[1:] )? | type) == "array"))
| $path[1:]]
| unique
| map(. | select([.[]|type]|contains(["number"])|not));
paths_to_array
Minor adjustment to exclude any paths that had arrays in between. I just wanted all paths that end with arrays.
I also excluded the topmost array indices from the path to reduce the number of paths
# flatten.jq
def update_array($path):
(getpath($path)? // null) as $value
| (if $value and ($value|type != "array")
then . as $data | (try (setpath($path; [$value]))
catch $data)
else . end);
def make_uniform($paths):
map( .
| reduce($paths[]) as $path (
. ; update_array($path)
)
);
# Input is assumed to be an object:
def flatten_top_level_keys:
[ to_entries[]
| if (.value|type) == "object"
then .key as $k
| (.value|to_entries)[] as $kv
| {key: ($k + "." + $kv.key), value: $kv.value}
else .
end ]
| from_entries;
I had to add the walk function from jq builtins because the jq library for Python didn't include it.
I split the make_uniform function so that I could understand the script better and I added a try catch because of an issue I encountered when the path included array indices in between. Otherwise this is pretty much same as the code from the original solution
# apply.jq
make_uniform({path})
| map( .
| walk( if type == "object" then
flatten_top_level_keys
else . end ))
I had to split this because I was injecting the data for the path using the {path} and when this was in the full script I got an error when using .format() within python.
import math
import os
import json
from jq import jq
import multiprocessing as mp


def get_script(filename):
    """Utility function to read the jq script"""
    with open(filename, "r") as f:
        script = f.read()
    return script


def get_data(filename):
    """Utility function to read json data from file"""
    with open(filename, 'r') as f:
        data = json.load(f)
    return data


def transform(script, data):
    """Wrapper to be used by the parallel processor"""
    return jq(script).transform(data)


def parallel_jq(script, data, rows=100, processes=8):
    """Executes the JQ script on data in parallel chunks specified by rows"""
    pool = mp.Pool(processes=processes)
    size = math.ceil(len(data) / rows)
    segments = [pool.apply_async(transform,
                                 args=(script,
                                       data[index*rows:(index+1)*rows]))
                for index in range(size)]
    result = []
    for seg in segments:
        result.extend(seg.get())
    return result


def get_paths_to_arrays(data, dest="data"):
    """Obtain the paths to arrays"""
    filename = os.path.join(dest, "paths_to_arrays.json")
    if os.path.isfile(filename):
        # Reuse the cached result of an earlier run
        paths = get_data(filename)
    else:
        script = get_script('jq/paths_to_array.jq')
        paths = parallel_jq(script, data)
        paths = jq("unique|sort_by(length)|reverse").transform(paths)
        with open(filename, 'w') as f:
            json.dump(paths, f, indent=2)
    return paths


def flatten(data, paths, dest="data"):
    """Make the arrays uniform and flatten the result"""
    filename = os.path.join(dest, "uniform_flat.json")
    script = get_script('jq/flatten.jq')
    script += "\n" + get_script('jq/apply.jq').format(path=json.dumps(paths))
    data = parallel_jq(script, data)
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)


if __name__ == '__main__':
    entity = 'messy_data'
    sourcefile = os.path.join('data', entity+'.json')
    dest = os.path.join('data', entity)
    data = get_data(sourcefile)
    # Finding paths with arrays
    paths = get_paths_to_arrays(data, dest)
    # Fixing array paths and flattening
    flatten(data, paths, dest)
As I mentioned before the get_paths_to_arrays does take quite long even with parallel processing.
get_paths_to_arrays took 3811.834 seconds => Just over an hour.
flatten took 38 seconds

How to simplify aws DynamoDB query JSON output from the command line?

I'm working with The AWS Command Line Interface for DynamoDB.
When we query an item, we get a very detailed JSON output. You get something like this (it has been built from the get-item section of the aws command line help in order to be almost exhaustive; the NULL type has been omitted):
{
"Count": 1,
"Items": [
{
"Id": {
"S": "app1"
},
"Parameters": {
"M": {
"nfs": {
"M": {
"IP" : {
"S" : "172.16.0.178"
},
"defaultPath": {
"S": "/mnt/ebs/"
},
"key": {
"B": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1lbmNvZGVk"
},
"activated": {
"BOOL": true
}
}
},
"ws" : {
"M" : {
"number" : {
"N" : "5"
},
"values" : {
"L" : [
{ "S" : "12253456346346"},
{ "S" : "23452353463464"},
{ "S" : "23523453461232"},
{ "S" : "34645745675675"},
{ "S" : "46456745757575"}
]
}
}
}
}
},
"Oldtypes": {
"typeSS" : {"SS" : ["foo", "bar", "baz"]},
"typeNS" : {"NS" : ["0", "1", "2", "3", "4", "5"]},
"typeBS" : {"BS" : ["VGVybWluYXRvcgo=", "VGVybWluYXRvciAyOiBKdWRnbWVudCBEYXkK", "VGVybWluYXRvciAzOiBSaXNlIG9mIHRoZSBNYWNoaW5lcwo=", "VGVybWluYXRvciA0OiBTYWx2YXRpb24K","VGVybWluYXRvciA1OiBHZW5lc2lzCg=="]}
}
}
],
"ScannedCount": 1,
"ConsumedCapacity": null
}
Is there any way to get a simpler output for the Items part? Like this:
{
"ConsumedCapacity": null,
"Count": 1,
"Items": [
{
"Id": "app1",
"Parameters": {
"nfs": {
"IP": "172.16.0.178",
"activated": true,
"defaultPath": "/mnt/ebs/",
"key": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1lbmNvZGVk"
},
"ws": {
"number": 5,
"values": ["12253456346346","23452353463464","23523453461232","34645745675675","46456745757575"]
}
},
"Oldtypes": {
"typeBS": ["VGVybWluYXRvcgo=", "VGVybWluYXRvciAyOiBKdWRnbWVudCBEYXkK", "VGVybWluYXRvciAzOiBSaXNlIG9mIHRoZSBNYWNoaW5lcwo=", "VGVybWluYXRvciA0OiBTYWx2YXRpb24K", "VGVybWluYXRvciA1OiBHZW5lc2lzCg=="],
"typeNS": [0, 1, 2, 3, 4, 5],
"typeSS": ["foo","bar","baz"]
}
}
],
"ScannedCount": 1
}
There is nothing helpful in the dynamodb - AWS CLI 1.7.10 documentation.
We must get the result from the command line. I'm willing to use other command line tools like jq if necessary, but such a jq mapping appears too complicated to me.
Update 1: jq based solution (with help from DanielH's answer)
With jq it is easy, but not quite pretty, you can do something like:
$> aws dynamodb query --table-name ConfigCatalog --key-conditions '{ "Id" : {"AttributeValueList": [{"S":"app1"}], "ComparisonOperator": "EQ"}}' | jq -r '.Items[0].Parameters.M."nfs#IP".S'
Result will be: 172.16.0.178
The jq -r option gives you a raw output.
Update 2: jq based solution (with help from @jeff-mercado)
Here is an updated and commented version of Jeff Mercado jq function to unmarshall DynamoDB output. It will give you the expected output:
$> cat unmarshal_dynamodb.jq
def unmarshal_dynamodb:
# DynamoDB string type
(objects | .S)
# DynamoDB blob type
// (objects | .B)
# DynamoDB number type
// (objects | .N | strings | tonumber)
# DynamoDB boolean type
// (objects | .BOOL)
# DynamoDB map type, recursion on each item
// (objects | .M | objects | with_entries(.value |= unmarshal_dynamodb))
# DynamoDB list type, recursion on each item
// (objects | .L | arrays | map(unmarshal_dynamodb))
# DynamoDB typed list type SS, string set
// (objects | .SS | arrays | map(unmarshal_dynamodb))
# DynamoDB typed list type NS, number set
// (objects | .NS | arrays | map(tonumber))
# DynamoDB typed list type BS, blob set
// (objects | .BS | arrays | map(unmarshal_dynamodb))
# managing other DynamoDB output entries: "Count", "Items", "ScannedCount" and "ConsumedCapacity"
// (objects | with_entries(.value |= unmarshal_dynamodb))
// (arrays | map(unmarshal_dynamodb))
# leaves values
// .
;
unmarshal_dynamodb
If you save the DynamoDB query output to a file, let's say ddb-query-result.json, you can run the following to get the desired result:
$> jq -f unmarshal_dynamodb.jq ddb-query-result.json
You can decode the values recursively with a well crafted function. It looks like the key names correspond to a type:
S -> string
N -> number
M -> map
Handle each of the cases you want to decode if possible, otherwise filter it out. You can make use of the various type filters and the alternative operator to do so.
$ cat input.json
{
"Count": 1,
"Items": [
{
"Id": { "S": "app1" },
"Parameters": {
"M": {
"nfs#IP": { "S": "192.17.0.13" },
"maxCount": { "N": "1" },
"nfs#defaultPath": { "S": "/mnt/ebs/" }
}
}
}
],
"ScannedCount": 1,
"ConsumedCapacity": null
}
$ cat ~/.jq
def decode_ddb:
def _sprop($key): select(keys == [$key])[$key]; # single property objects only
((objects | { value: _sprop("S") }) # string (from string)
// (objects | { value: _sprop("NULL") | null }) # null (from boolean)
// (objects | { value: _sprop("B") }) # blob (from string)
// (objects | { value: _sprop("N") | tonumber }) # number (from string)
// (objects | { value: _sprop("BOOL") }) # boolean (from boolean)
// (objects | { value: _sprop("M") | map_values(decode_ddb) }) # map (from object)
// (objects | { value: _sprop("L") | map(decode_ddb) }) # list (from encoded array)
// (objects | { value: _sprop("SS") }) # string set (from string array)
// (objects | { value: _sprop("NS") | map(tonumber) }) # number set (from string array)
// (objects | { value: _sprop("BS") }) # blob set (from string array)
// (objects | { value: map_values(decode_ddb) }) # all other non-conforming objects
// (arrays | { value: map(decode_ddb) }) # all other non-conforming arrays
// { value: . }).value # everything else
;
$ jq 'decode_ddb' input.json
{
"Count": 1,
"Items": [
{
"Id": "app1",
"Parameters": {
"nfs#IP": "192.17.0.13",
"maxCount": 1,
"nfs#defaultPath": "/mnt/ebs/"
}
}
],
"ScannedCount": 1,
"ConsumedCapacity": null
}
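The same recursive decoding can be sketched in Python for comparison. This is an illustration under assumptions, not a port of the jq above: the names decode_ddb and to_number are reused or made up for this example, and the type checks are one possible way to avoid mistaking an ordinary attribute called "S" or "M" for a type tag. It expects the parsed JSON response as returned by json.load.

```python
def to_number(text):
    """DynamoDB sends numbers as strings; prefer int, fall back to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def decode_ddb(value):
    """Recursively strip DynamoDB type tags (S, N, M, L, ...) from a parsed response."""
    if isinstance(value, list):
        return [decode_ddb(item) for item in value]
    if not isinstance(value, dict):
        return value
    if len(value) == 1:  # single-property objects are candidate type wrappers
        (tag, inner), = value.items()
        if tag in ("S", "B") and isinstance(inner, str):
            return inner
        if tag == "BOOL" and isinstance(inner, bool):
            return inner
        if tag == "NULL" and inner is True:
            return None
        if tag == "N" and isinstance(inner, str):
            return to_number(inner)
        if tag == "M" and isinstance(inner, dict):
            return {key: decode_ddb(item) for key, item in inner.items()}
        if tag == "L" and isinstance(inner, list):
            return [decode_ddb(item) for item in inner]
        if tag in ("SS", "BS") and isinstance(inner, list):
            return list(inner)
        if tag == "NS" and isinstance(inner, list):
            return [to_number(item) for item in inner]
    # Anything else (response envelope, untagged objects) is decoded key by key
    return {key: decode_ddb(item) for key, item in value.items()}


response = {
    "Count": 1,
    "Items": [
        {
            "Id": {"S": "app1"},
            "Parameters": {
                "M": {
                    "nfs#IP": {"S": "192.17.0.13"},
                    "maxCount": {"N": "1"},
                    "nfs#defaultPath": {"S": "/mnt/ebs/"},
                }
            },
        }
    ],
    "ScannedCount": 1,
    "ConsumedCapacity": None,
}
decoded = decode_ddb(response)
print(decoded)
```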
Another way to achieve the post's goal would be to use a node.js extension like node-dynamodb or dynamodb-marshaler and build a node command line tool.
Interesting tutorial to build a node.js command line application with commander package: Creating Your First Node.js Command-line Application
Here's a quick and dirty oneliner that reads one record from stdin and prints it in simplified form:
node -e 'console.log(JSON.stringify(require("aws-sdk").DynamoDB.Converter.unmarshall(JSON.parse(require("fs").readFileSync(0, "utf-8")))))'
Here's an updated version of the jq solution that can handle null values.
$> cat unmarshal_dynamodb.jq
def unmarshal_dynamodb:
# null
walk( if type == "object" and .NULL then . |= null else . end ) |
# DynamoDB string type
(objects | .S)
# DynamoDB blob type
// (objects | .B)
# DynamoDB number type
// (objects | .N | strings | tonumber)
# DynamoDB boolean type
// (objects | .BOOL)
# DynamoDB map type, recursion on each item
// (objects | .M | objects | with_entries(.value |= unmarshal_dynamodb))
# DynamoDB list type, recursion on each item
// (objects | .L | arrays | map(unmarshal_dynamodb))
# DynamoDB typed list type SS, string set
// (objects | .SS | arrays | map(unmarshal_dynamodb))
# DynamoDB typed list type NS, number set
// (objects | .NS | arrays | map(tonumber))
# DynamoDB typed list type BS, blob set
// (objects | .BS | arrays | map(unmarshal_dynamodb))
# managing other DynamoDB output entries: "Count", "Items", "ScannedCount" and "ConsumedCapacity"
// (objects | with_entries(.value |= unmarshal_dynamodb))
// (arrays | map(unmarshal_dynamodb))
# leaves values
// .
;
unmarshal_dynamodb
$> jq -f unmarshal_dynamodb.jq ddb-query-result.json
Credit to @jeff-mercado and @herve for the original version.
As far as I know, there is no other output like the "verbose" one you've posted. Therefore I think you can't avoid intermediate tools like jq or sed.
There are several proposals in this post for converting the raw dynamo data:
Export data from DynamoDB
Maybe you can adapt one of these scripts in conjunction with jq or sed
Here is another approach. This may be a little brutal but it shows the basic idea.
def unwanted: ["B","BOOL","M","S","L","BS","SS"];
def fixpath(p): [ p[] | select( unwanted[[.]]==[] ) ];
def fixnum(p;v):
if p[-2]=="NS" then [p[:-2]+p[-1:],(v|tonumber)]
elif p[-1]=="N" then [p[:-1], (v|tonumber)]
else [p,v] end;
reduce (tostream|select(length==2)) as [$p,$v] (
{}
; fixnum(fixpath($p);$v) as [$fp,$fv]
| setpath($fp;$fv)
)
Sample Run (assuming filter in filter.jq and data in data.json)
$ jq -M -f filter.jq data.json
{
"ConsumedCapacity": null,
"Count": 1,
"Items": [
{
"Id": "app1",
"Oldtypes": {
"typeBS": [
"VGVybWluYXRvcgo=",
"VGVybWluYXRvciAyOiBKdWRnbWVudCBEYXkK",
"VGVybWluYXRvciAzOiBSaXNlIG9mIHRoZSBNYWNoaW5lcwo=",
"VGVybWluYXRvciA0OiBTYWx2YXRpb24K",
"VGVybWluYXRvciA1OiBHZW5lc2lzCg=="
],
"typeNS": [
0,
1,
2,
3,
4,
5
],
"typeSS": [
"foo",
"bar",
"baz"
]
},
"Parameters": {
"nfs": {
"IP": "172.16.0.178",
"activated": true,
"defaultPath": "/mnt/ebs/",
"key": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1lbmNvZGVk"
},
"ws": {
"number": 5,
"values": [
"12253456346346",
"23452353463464",
"23523453461232",
"34645745675675",
"46456745757575"
]
}
}
}
],
"ScannedCount": 1
}
Here is a script in node to do this.
I named the file reformat.js but you can call it whatever you want
'use strict';
/**
* This script will parse the AWS dynamo CLI JSON response into JS.
* This parses out the type keys in the objects.
*/
const fs = require('fs');
const rawData = fs.readFileSync('response.json'); // Import the raw response from the dynamoDB CLI query
const response = JSON.parse(rawData); // Parse to JS to make it easier to work with.
function shallowFormatData(data){
// Loop through the object and replace the Type key with the value.
for(const key in data){
const innerRawObject = data[key]
const innerKeys = Object.keys(innerRawObject)
innerKeys.forEach(innerKey => {
const innerFormattedObject = innerRawObject[innerKey]
if(typeof innerFormattedObject == 'object'){
data[key] = shallowFormatData(innerFormattedObject) // Recursively call formatData if there are nested objects
}else{
// Null items come back with a type of "NULL" and value of true. we want to set the value to null if the type is "NULL"
data[key] = innerKey == 'NULL' ? null : innerFormattedObject
}
})
}
return data
}
// this only gets the Items and not the meta data.
const result = response.Items.map(item => {
return shallowFormatData(item)
})
console.dir(result, {'maxArrayLength': null}); // There is a default limit on how big a console.log can be, this removes that limit.
Step 1) run your dynamoDB query via the CLI and save it to a JSON file. To save the response from the CLI just add > somefile.json. For convenience, I saved this in the same directory as my reformat file
// Example: Run in CLI
$ aws dynamodb query --table-name stage_requests-service_FoxEvents \
--key-condition-expression "PK = :v1" \
--expression-attribute-values file://expression-attributes.json > response.json
expression-attributes.json
{
":v1": {"S": "SOMEVAL"}
}
If you need more information on how I queried DynamoDB look at these examples in the documentation https://docs.aws.amazon.com/cli/latest/reference/dynamodb/query.html#examples
Now that you have a JSON file of the data you need to reformat, run the reformat.js script from your terminal
Step 2)
// Run this in your terminal
$ node reformat.js > formatted.js
You should now have a clean JS object as output. If you want JSON output instead, pass JSON.stringify(result) to the console.dir call at the end of the script.