AWS Glue classifier for extracting JSON array values - json

I have files in S3 with inline JSON per line of the structure:
{ "resources": [{"resourceType":"A","id":"A",...},{...}] }
If I run Glue over it, I get "resources: array" as the top-level element. However, I want the elements of the array to be inspected and used as the top-level table elements. All the elements in a resources array have the same schema, so I expect:
resourceType: string
id: string
....
Theoretically, a custom JSON classifier should handle this:
$.resources[*]
However, the path is not picked up. So I still get the resources:array as the top level element.
I could now run some pre-processing to extract the array elements myself and write them line per line. However, I want to understand why my path is not working.
UPDATE 1:
It might be something about the JSON that I do not understand (it's valid JSON produced via Java Jackson). If I remove the outer object with the resources attribute and change the structure to
[{"resourceType":"A","id":"A",...},{...}]
the classifier $[*] should pick the sub-objects up. But I still get array:array as top level element.
UPDATE 2:
It's indeed a formatting issue. If I change the JSON files to
[
{"resourceType":"A","id":"A",...},{...}
]
$[*] starts to work.
UPDATE 3:
However, reformatting to the following does not fix the issue with $.resources[*]:
{
"resources": [
{"resourceType":"A","id":"A",...},{...}
]
}
UPDATE 4:
If I take my file and run it through an IntelliJ reformat, which produces a JSON object where all nested elements have line breaks, $.resources[*] also starts working. Basically, it is like UPDATE 3, just applied all the way down the structure.
{
"resources": [
{
"resourceType":"A",
"id":"A"
},
{
...
}
]
}
What bothers me is that the requirements regarding the structure are still not clear to me, since UPDATE 2 worked but UPDATE 3 did not. I also cannot find a formal requirement regarding the JSON structure anywhere in the documentation.
In this sense, I think I have answered my own question, but the underlying rules remain a bit unclear.

To conclude here:
The issue is related to Glue's poorly documented JSON formatting requirements.
Normalising with json.dumps(my_json, separators=(',',':')) produces compact JSON that works for my use case.
I have now normalised the content via a Lambda.
Lambda code for reference, for whomever it may help:
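import json
import boto3

my_bucket = 'my-bucket'  # placeholder: replace with your bucket name
prefix = 'nrmlzd/'  # normalised copies are written under this prefix

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket=my_bucket)
for page in pages:
    try:
        contents = page["Contents"]
    except KeyError:
        break  # no objects in this listing
    for item in contents:
        key = item["Key"]
        if key.startswith(prefix):
            continue  # skip files that were already normalised
        obj = s3.get_object(Bucket=my_bucket, Key=key)
        j = json.loads(obj['Body'].read().decode('utf-8'))
        # compact separators: no whitespace, whole document on one line
        new_json = json.dumps(j, separators=(',', ':'))
        target = prefix + key
        s3.put_object(
            Body=new_json,
            Bucket=my_bucket,
            Key=target
        )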
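The question also mentions the alternative of pre-processing the files so that each array element is written on its own line. A minimal sketch of that idea follows; the helper name resources_to_json_lines is made up for illustration, and whether Glue then infers the desired schema from such output has not been verified here.

```python
import json


def resources_to_json_lines(text, array_key="resources"):
    """Return one compact JSON object per line for each element of the array."""
    document = json.loads(text)
    return "\n".join(
        json.dumps(element, separators=(",", ":"))
        for element in document[array_key]
    )


# Sample shaped like the structure in the question (values are illustrative).
sample = '{"resources": [{"resourceType": "A", "id": "A"}, {"resourceType": "B", "id": "B"}]}'
print(resources_to_json_lines(sample))
```

The returned string could be uploaded with s3.put_object in place of new_json in the Lambda above.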

Related

deleting JSON records using JsonDataObjects

I'm using the TJsonDataObjects Delphi component (https://github.com/ahausladen/JsonDataObjects) as the data store for what is displayed in an editable TreeView. In the TreeView I store the "path" of each item as a JsonPath string. When the user modifies a value in the TreeView, the path lets me locate the record and modify it via the component's Path property.
My issue is that when a user wants to delete a record, I need to remove it from the JSON file, and there does not seem to be a simple way to do this via its "path". I expect I could trim the last item off the path to get its parent and then delete the item by "name", or by "index" if the parent is an array. I was hoping there might be an easier way before I start to code this up.
On a similar note, I didn't find any way to extract the text path of a given item. While the component can locate or modify a node by path, there does not seem to be a way to get the actual path of a node, so I'm building it manually as I parse the JSON file (yuck). Does anyone have a better solution?
For example, this is the path of the "value" property in the JSON below: Level1.Level2.Level3
{
"Level1": {
"Level2": {
"Level3": "value"
}
}
}
In TJsonDataObjects you can set a value by its path with:
Json.Path['Level1.Level2.Level3'] := 'value';
// or
Json['Level1']['Level2']['Level3'] := 'value';
Or retrieve it with:
prop := Json.Path['Level1.Level2.Level3'];
// or
prop := Json['Level1']['Level2']['Level3'];
So if you want to remove Level3, it would be nice if there was some simple function like Json.DeletePath('Level1.Level2.Level3');. As far as I can tell, there is nothing that does this. Since this is a very complex unit, I thought someone might have an easy answer that I overlooked. I have coded a way around this (as described above).
As to the second question: while you can access a value by its path, there is no function to "return" the path of a given node. Yes, I can and do build it as I go along, but a built-in function would be handy because the path would then stay consistent with the JsonPath format.
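The workaround described above (trim the last path segment, locate the parent, then delete by name or by index) and the manual path building do not depend on Delphi. Here is a minimal, language-neutral sketch in Python on plain dicts and lists. split_path, delete_path and iter_paths are hypothetical helpers, not part of JsonDataObjects, and the a.b[2] path syntax is an assumption.

```python
import re

# A path token is either a name (no dots or brackets) or a [number] index.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def split_path(path):
    """Split 'a.b[2].c' into ['a', 'b', 2, 'c']."""
    return [name if name else int(index) for name, index in _TOKEN.findall(path)]


def delete_path(root, path):
    """Remove the node addressed by path and return the removed value."""
    *parent_tokens, last = split_path(path)
    parent = root
    for token in parent_tokens:
        parent = parent[token]  # dict lookup by name or list lookup by index
    return parent.pop(last)  # works for both dict keys and list indices


def iter_paths(node, prefix=""):
    """Yield the path of every node below 'node', in document order."""
    if isinstance(node, dict):
        for key, child in node.items():
            child_path = f"{prefix}.{key}" if prefix else key
            yield child_path
            yield from iter_paths(child, child_path)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            child_path = f"{prefix}[{i}]"
            yield child_path
            yield from iter_paths(child, child_path)


doc = {"Level1": {"Level2": {"Level3": "value"}}}
print(list(iter_paths(doc)))
print(delete_path(doc, "Level1.Level2.Level3"))
print(doc)
```

The same two-step shape (resolve the parent, then remove the child) would carry over to a Delphi helper built on whatever removal routines the unit exposes.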

Invalid JSON syntax error in configuration file on homebridge

{
"bridge":{
"name":"Homebridge F8F5",
"username":"0E:8F:12:8D:F8:F5",
"port":51739,
"pin":"670-48-238"
},
"accessories":[
],
"platforms":[
{
"name":"Config",
"port":8581,
"platform":"config"
}
]
}{
"accessories":[
{
"name":"Roku",
"accessory":"Roku",
"ip":"http://10.204.1.238:8060",
}
I am getting an error when I try to run this config file in homebridge. What am I doing wrong? When I try to submit it through the web interface it will not allow me to and says “Config JSON error: invalid json syntax” Any help will be welcome! I have tried to put it through an online json error finder and it narrowed it down to this snippet.
Ummm... looks like you tried to edit this file without knowing the basic concepts of JSON.
Start by reading JSON - Introduction on W3Schools.com
Also, if you're not sure, use an online JSON validator. Use your fav. search engine to look for "JSON cleaner". (I use JSON Formatter & Validator at Curious Concept.)
Off the bat I can see a few issues with the JSON you provided.
the "}{" string ... what's that for? JSON cannot parse that ... either add "," between them (if you wanted a new set) or (in this case) remove it.
you have two "accessories". JSON usually gets parsed into an object or array ... one cannot have duplicate key names. (In this case) remove the first one.
the second "accessories" array (denoted by "[") has no end (no "]")
the whole set (started with "{") has no end (no "}")
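The points above can be checked locally with Python's standard json module, which reports where parsing stops. The sketch below uses a hypothetical helper, check_json. The broken string is a shortened stand-in for the posted file, and the fixed config is only an assumption about what was intended: a single object with the Roku entry moved into the one "accessories" array. It also drops the trailing comma after the "ip" value, which JSON does not allow.

```python
import json


def check_json(text):
    """Return None if text is valid JSON, else a message with the error position."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Shortened reproduction of the problems: "}{", unclosed "[" and "{", trailing comma.
broken = '{"accessories": []}{"accessories": [{"name": "Roku", "ip": "http://10.204.1.238:8060",}'

# Assumed intent: one object, one "accessories" array, everything closed.
fixed = """
{
  "bridge": {
    "name": "Homebridge F8F5",
    "username": "0E:8F:12:8D:F8:F5",
    "port": 51739,
    "pin": "670-48-238"
  },
  "accessories": [
    {
      "name": "Roku",
      "accessory": "Roku",
      "ip": "http://10.204.1.238:8060"
    }
  ],
  "platforms": [
    {
      "name": "Config",
      "port": 8581,
      "platform": "config"
    }
  ]
}
"""

print(check_json(broken))  # a message describing the first error found
print(check_json(fixed))   # None, because the text parses
```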

How to add entries to a JSON array/list

I'm trying to set up a Discord bot that only lets people on a list in a JSON file use it. I am wondering how to add data to the JSON array/list, but I'm not sure how to move forward and I have had no real luck looking for answers elsewhere.
This is an example of how the JSON file looks:
{
IDs: [
"2359835092385",
"4634637576835",
"3454574836835"
]
}
Now, what I am looking to do is add a new ID to "IDs" without completely breaking the file. I also want to be able to have other entries in the JSON file, so I can make something like "AdminIDs" for people who can do more with the bot.
Yes. I know I can do this stuff role based in guilds/servers, but I would like to be able to use the bot in DMs as well as on guilds/server.
What I want/need is a short and simple to manipulate script that I can easily put in to a new command so I can add new people to the bot without having to open and edit the JSON file manually.
If you haven't already parsed your data with the json package, you can parse the JSON text like this:
import json
json_code = '{"IDs": ["2359835092385", "4634637576835", "3454574836835"]}'  # the JSON text, e.g. read from your file
parsed_json = json.loads(json_code)  # loads() turns JSON text into Python objects
print(parsed_json['IDs'])
Then you can use parsed_json['IDs'] like a normal list and append data to it.
All keys must be quoted strings.
In this case the key is "IDs" and its value is the list, whose values are the items inside it.
import json
data={
"IDs":[
"2359835092385",
"4634637576835",
"3454574836835"
]
}
Let's say your JSON data comes from a file. To load it so that you can manipulate it, do the following:
with open('filename.json', encoding='utf-8') as raw_json_data:
    j_data = json.load(raw_json_data)  # now j_data holds the same content as data above
print(j_data)
# >> {'IDs': ['2359835092385', '4634637576835', '3454574836835']}
To add things inside the list IDs you use the append method
data['IDs'].append('adding something') #or j_data['IDs'].append("SOMEthing")
print(data)
# >> {'IDs': ['2359835092385', '4634637576835', '3454574836835', 'adding something']}
To add a new key
data['Names']=['Jack','Nick','Alice','Nancy']
print(data)
# >> {'IDs': ['2359835092385', '4634637576835', '3454574836835', 'adding something'], 'Names': ['Jack', 'Nick', 'Alice', 'Nancy']}
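The steps above change the data only in memory. To keep the change, the file has to be written back. A minimal sketch that combines loading, appending and saving in one function that a bot command could call; the name add_id and its parameters are assumptions for illustration, not part of any Discord library.

```python
import json


def add_id(path, new_id, key="IDs"):
    """Append new_id to the list stored under key in the JSON file at path."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    ids = data.setdefault(key, [])  # creates e.g. "AdminIDs" if it is missing
    if new_id not in ids:  # avoid duplicate entries
        ids.append(new_id)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)  # write the whole document back
    return ids
```

For example, add_id('filename.json', '1234567890') would add a user, and add_id('filename.json', '1234567890', key='AdminIDs') would add the same ID to a separate admin list while leaving the other entries in the file untouched.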

append data to an existing json file

I'd appreciate it if someone could point me in the right direction here, I'm a bit new to Python :)
I have a json file that looks like this:
[
{
"user":"user5",
"games":"game1"
},
{
"user":"user6",
"games":"game2"
},
{
"user":"user5",
"games":"game3"
},
{
"user":"user6",
"games":"game4"
}
]
And i have a small csv file that looks like this:
module_a,module_b
10,20
15,16
1,11
2,6
I am trying to merge the CSV data into the JSON above so that it looks like this, keeping the order as it is:
[
{
"user":"user5",
"module_a":"10",
"games":"game1",
"module_b":"20"
},
{
"user":"user6",
"module_a":"15",
"games":"game2",
"module_b":"16"
},
{
"user":"user5",
"module_a":"1",
"games":"game3",
"module_b":"11"
},
{
"user":"user6",
"module_a":"2",
"games":"game4",
"module_b":"6"
}
]
What would be the best approach to achieve this while keeping the output order as it is?
Appreciate any guidance.
The JSON specification doesn't prescribe member order, and no JSON parser will enforce it (unless that happens to be the default behaviour of the underlying platform), so going out of your way to keep the order when processing JSON files is usually pointless. To quote:
An object is an unordered collection of zero or more name/value
pairs, where a name is a string and a value is a string, number,
boolean, null, object, or array.
...
JSON parsing libraries have been observed to differ as to whether or
not they make the ordering of object members visible to calling
software. Implementations whose behavior does not depend on member
ordering will be interoperable in the sense that they will not be
affected by these differences.
That being said, if you really insist on order, you can parse your JSON into a collections.OrderedDict (and write it back from it) which will allow you to inject data at specific places while keeping the overall order. So, first load your JSON as:
import json
from collections import OrderedDict

with open("input_file.json", "r") as f:  # open the JSON file for reading
    json_data = json.load(f, object_pairs_hook=OrderedDict)  # read & parse it
Now that you have your JSON, you can go ahead and load up your CSV, and since there's not much else to do with the data you can immediately apply it to json_data. One caveat, though: since there is no direct map between the CSV and the JSON, one has to assume the index is the map (i.e. the first CSV row is applied to the first JSON element, etc.), so we'll use enumerate() to track the current index. There is also no info on where to insert individual values, so we'll assume that the first column goes after the first JSON object entry, the second goes after the second entry and so on. Since the two can have different lengths, we'll use itertools.zip_longest() (izip_longest() on Python 2.x) to interleave them. So:
import csv
from itertools import zip_longest  # izip_longest on Python 2.x

with open("input_file.csv", "r", newline="") as f:  # open the CSV file for reading
    reader = csv.reader(f)  # build a CSV reader
    header = next(reader)  # store the header so we can get the key names later
    for index, row in enumerate(reader):  # enumerate and iterate over the rest
        if index >= len(json_data):  # there are more CSV rows than we have elements in JSON
            break
        row = [(header[i], v) for i, v in enumerate(row)]  # turn the row into (key, value) tuples
        # since collections.OrderedDict doesn't support inserting at a given position we'll
        # have to rebuild it by mixing in the CSV elements with the existing JSON elements;
        # zip_longest pads the shorter side with None, so those placeholders are skipped
        # (use json_data[index].iteritems() on Python 2.x)
        data = (v for p in zip_longest(json_data[index].items(), row) for v in p if v is not None)
        # then finally overwrite the current element in json_data with a new OrderedDict
        json_data[index] = OrderedDict(data)
And with our CSV data nicely inserted into the json_data, all that's left is to write back the JSON (you may overwrite the original file if you wish):
with open("output_file.json", "w") as f:  # open the output JSON file for writing
    json.dump(json_data, f, indent=2)  # finally, write back the modified JSON
This will produce the result you're after. It even respects the names in the CSV header so you can replace them with bob and fred and it will insert those keys in your JSON. You can even add more of them if you need more elements added to your JSON.
Still, just because it's possible, you really shouldn't rely on JSON member order. If it's human readability you're after, there are far more suitable formats with optional ordering, like YAML.

Verify whole json response in jmeter by value or sort Json

I don't use JMeter very often, and I've run into a very specific issue.
My REST response is always "the same", but the nodes are not in the same order, for various reasons. I can't post the whole response here because of sensitive data, so let's use this dummy one:
First time response might be:
{
"properties":{
"prop1":false,
"prop2":false,
"prop3":165,
"prop4":"Audi",
"prop5":true,
"prop6":true,
"prop7":false,
"prop8":"1",
"prop9":"2.0",
"prop10":0
}
}
Then other time it might be like this:
{
"properties":{
"prop2":false,
"prop1":false,
"prop10":0,
"prop3":165,
"prop7":false,
"prop5":true,
"prop6":true,
"prop8":"1",
"prop9":"2.0",
"prop4":"Audi"
}
}
As you can see, the content itself is the same, but the order of the nodes is not. I have 160+ nodes and thousands of possible response orders.
Is there an easy way to compare two JSON responses by matching keys and values, or at least to sort the response and then compare it with a sorted one in assertion patterns?
I'm not using any plugins, just basic Apache JMeter.
Thanks
I've checked this using Jython; you need to download the Jython library and save it to your JMeter lib directory.
I compared 2 JSONs with Sampler1 and Sampler2. On Sampler1 I've added a BeanShell PostProcessor with this code:
vars.put("jsonSampler1",prev.getResponseDataAsString());
On Sampler2 I've added a BSF Assertion, specifying jython as the language, with the following code:
import json
jsonSampler1 = vars.get("jsonSampler1")
jsonSampler2 = prev.getResponseDataAsString()
objectSampler1 = json.loads(jsonSampler1)
objectSampler2 = json.loads(jsonSampler2)
if objectSampler1 != objectSampler2:
    AssertionResult.setFailure(True)
    AssertionResult.setFailureMessage("JSON data didn't match")
You can find the whole jmx in this Gist
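The comparison above works because json.loads turns JSON objects into dictionaries, and dictionary equality ignores key order. A standalone sketch of that behaviour in plain Python, outside JMeter; same_json and canonical are illustrative names, and the sample strings are a shortened version of the dummy responses from the question.

```python
import json


def same_json(text_a, text_b):
    """Compare two JSON documents by content, ignoring object key order."""
    return json.loads(text_a) == json.loads(text_b)


def canonical(text):
    """Re-serialise with sorted keys, for comparing or storing as plain text."""
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))


first = '{"properties": {"prop1": false, "prop2": false, "prop3": 165, "prop4": "Audi"}}'
second = '{"properties": {"prop4": "Audi", "prop2": false, "prop3": 165, "prop1": false}}'

print(same_json(first, second))
print(canonical(first) == canonical(second))
```

Note that only object members are order-independent. Arrays are still compared element by element, in order.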
You will most probably have to do this with a JSR223 Assertion and Groovy.
http://jmeter.apache.org/usermanual/component_reference.html#JSR223_Assertion
http://docs.groovy-lang.org/latest/html/api/groovy/json/JsonSlurper.html
Note that if you know Python, you might look at using Jython + JSR223.
I would just set up 10 jp@gc - JSON Path Assertions. Documentation for figuring out JSON Path format is here and you can test how it would work here.
For your example you would add the assertion (Add > Assertion > jp@gc - JSON Path Assertion), then to test prop1 put:
$.properties.prop1
in the JSON Path field, click the Validate Against Expected Value checkbox, and put
false
in the expected value field. Repeat those steps for the other 9, changing the last part of the path to each key and putting the value you expect back in the expected value field.
This assertion is a JMeter add-on found here.