Parsing strange JSON-like format

I got some strange JSON-like output from Google Cloud OCR. The keys aren't quoted, there are no commas, and the nested blocks have no colons after their keys:
text_annotations {
  description: ","
  bounding_poly {
    vertices {
      x: 485
      y: 237
    }
    vertices {
      x: 492
      y: 237
    }
    vertices {
      x: 492
      y: 266
    }
    vertices {
      x: 485
      y: 266
    }
  }
}
Is there a simple way to parse it, or to reformat it as JSON?
I've tried adding the quotes, colons, and commas by hand, but that isn't a workable approach.
I got this data using the following Python code:
from google.cloud import vision
import io

client = vision.ImageAnnotatorClient()
feature = vision.Feature(
    type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)

with io.open(path, 'rb') as image_file:
    content = image_file.read()

image = vision.Image(content=content)
response = client.text_detection(image=image)
print(response)

In the end I just did what people proposed in the comments and used the API to print the data in a more useful format.
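For reference, the block-style output above is protobuf's text format, and the response object can be serialized to real JSON instead of being hand-edited. A minimal sketch, assuming a recent google-cloud-vision where the response wraps a protobuf message (on older versions you would pass response directly):

from google.protobuf.json_format import MessageToJson

# google-cloud-vision >= 2.0 wraps the raw protobuf message in a proto-plus
# object and exposes it as response._pb; older versions return the protobuf
# message itself, so MessageToJson(response) would be the call there
json_str = MessageToJson(response._pb)
print(json_str)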

OK, writing a custom parser is probably overkill, but here is a potentially easy, quick-and-dirty hack. If the line breaks and indentation are stable, you could do a search-and-replace on them (via a tool, so it can be automated and can perform multiple replacements). For example, replacing "\n    bounding_poly" with ",\n    \"bounding_poly\": [" adds the comma to the previous line, adds the colon and the quotes to the key, and turns its contents into an array (you then also drop the "vertices" keys so they become anonymous objects inside the array, as the array's elements). Be aware of newline-encoding differences (\n, \r\n, \r).
Ideally, also detect when one of the search strings matches zero times, or try to read the final result as JSON to check for error messages.
Not great, but maybe sufficient? Or does the source have more variance, or is it more dynamic/unstable?
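If the search-and-replace route feels too fragile, a tiny stack-based converter is not much code either. This is only a sketch under the assumptions the sample suggests: one `key {`, `}`, or `key: value` token per line and balanced braces; prototext_to_dict and _store are hypothetical names, not part of any library:

import json
import re

def _store(obj, key, value):
    # repeated keys (e.g. several `vertices` blocks) are promoted to lists
    if key in obj:
        if not isinstance(obj[key], list):
            obj[key] = [obj[key]]
        obj[key].append(value)
    else:
        obj[key] = value

def prototext_to_dict(text):
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == '}':
            stack.pop()                            # close the current block
        elif re.match(r'\w+\s*{$', line):
            key = line.rstrip('{ ').strip()
            child = {}
            _store(stack[-1], key, child)          # open a nested block
            stack.append(child)
        else:
            key, _, value = line.partition(':')
            try:
                value = json.loads(value.strip())  # numbers, quoted strings
            except ValueError:
                value = value.strip()              # bare enum names stay strings
            _store(stack[-1], key.strip(), value)
    return root

print(json.dumps(prototext_to_dict(open('response.txt').read()), indent=2))

json.dumps then gives you ordinary JSON, with repeated keys such as vertices collected into arrays.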


rl_json : Insert function

Below is my JSON:
"temperature": {
"level": 8,
"format": function (value) {
return value + ' (C°)';
},
"minimum": null,
...
...
}
My goal is to write the key format. I have tried this, but without much conviction...
package require rl_json
namespace import rl_json::json

set rj "{}"
set s1 {function (value) {
    return value + ' (C°)';
}
}
json set rj temperature [json object [list level {number 8} format [json template {{"~L:s1"}}] minimum {null}]]
Error parsing JSON value: Expecting : after object key at offset 8
What you are trying to produce is not (partial) JSON, as defined, but rather general Javascript. As such, you cannot reasonably expect JSON-specific tooling (such as rl_json) to work with it. It is usually better to keep your JSON and your Javascript separated, with the data in JSON (including any dynamic generation) and your Javascript as a static artefact (or generated from something like Typescript). Mixing code and dynamic generation requires great care to get right; it is usually better to write things so you don't need to be that careful!
If you must do this definitely-not-advised thing, use something like subst or format to inject the variable parts (which you can generate with rl_json) into the overall string. No code for that from me: my advice is don't do that.
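For illustration only (the answer above deliberately gives no code), the string-assembly idea amounts to generating the JSON half with a JSON library and splicing the static JavaScript in as plain text. Sketched here in Python rather than Tcl:

import json

# the data half: real JSON values, generated programmatically
data = {"level": 8, "minimum": None}

# the code half: a static JavaScript fragment, kept out of the JSON layer
js_format = "function (value) { return value + ' (C°)'; }"

# stitch the two together as plain text, because the result is JavaScript, not JSON
fields = ', '.join('"%s": %s' % (k, json.dumps(v)) for k, v in data.items())
print('"temperature": {%s, "format": %s}' % (fields, js_format))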

Micropython: bytearray in json-file

I'm using MicroPython in the newest version, with a DS18B20 temperature sensor. The address of such a sensor is, e.g., b'(b\xe5V\xb5\x01<:'. This is the string representation of a bytearray. If I use it to save the address in a JSON file, I run into problems:
If I store "b'(b\xe5V\xb5\x01<:'" directly, after reading the JSON file there are no single backslashes, and I get b'(bxe5Vxb5x01<:' inside Python.
If I escape the backslashes, like "b'(b\\xe5V\\xb5\\x01<:'", I get double backslashes in Python: b'(b\\xe5V\\xb5\\x01<:'.
How do I get a single backslash?
Thank you
You can't save bytes in JSON with MicroPython. As far as JSON is concerned, that's just some string. Even if you got it to give you what you think you want (i.e. single backslashes), it still wouldn't be bytes. So you are faced with making some form of conversion, no matter what.
One idea is that you could convert it to an int, and then convert it back when you open it. Below is a simple example. Of course you don't have to use a class and static methods to do this; it just seemed like a good way to wrap it all into one without even needing an instance of it hanging around. You can put the entire class in some other file, import it where necessary, and just call its methods as you need them.
import math, ujson, utime

class JSON(object):

    @staticmethod
    def convert(data:dict, convert_keys=None) -> dict:
        if isinstance(convert_keys, (tuple, list)):
            for key in convert_keys:
                if isinstance(data[key], (bytes, bytearray)):
                    # bytes -> int, so the value survives the trip through JSON
                    data[key] = int.from_bytes(data[key], 'big')
                elif isinstance(data[key], int):
                    # int -> bytes; log base 256 recovers the byte length
                    data[key] = data[key].to_bytes(1 if not data[key] else int(math.log(data[key], 256)) + 1, 'big')
        return data

    @staticmethod
    def save(filename:str, data:dict, convert_keys=None) -> None:
        # dump doesn't seem to like working directly with open
        with open(filename, 'w') as doc:
            ujson.dump(JSON.convert(data, convert_keys), doc)

    @staticmethod
    def open(filename:str, convert_keys=None) -> dict:
        return JSON.convert(ujson.load(open(filename, 'r')), convert_keys)

# example with both styles of bytes for the sake of being thorough
json_data = dict(address=bytearray(b'\xFF\xEE\xDD\xCC'), data=b'\x00\x01\x02\x03', date=utime.mktime(utime.localtime()))
keys = ['address', 'data']  # list of keys to convert to int/bytes
JSON.save('test.json', json_data, keys)
json_data = JSON.open('test.json', keys)
print(json_data)  # {'date': 1621035727, 'data': b'\x00\x01\x02\x03', 'address': b'\xff\xee\xdd\xcc'}
You may also want to note that with this method you never actually touch any JSON: you put in a dict, you get out a dict, and all the JSON is managed behind the scenes. Regardless, I would say using struct would be a better option. You said JSON, though, so my answer is about JSON.
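As an alternative to the int round-trip (not from the answer above, just a sketch): store the address as a hex string, which sidesteps the byte-length arithmetic entirely. ubinascii is MicroPython's binascii:

import ubinascii, ujson  # binascii / json on CPython

addr = b'(b\xe5V\xb5\x01<:'

# bytes -> hex string, which is safe to embed in JSON
stored = ujson.dumps({'address': ubinascii.hexlify(addr).decode()})

# hex string -> bytes again after loading
addr_again = ubinascii.unhexlify(ujson.loads(stored)['address'])
assert addr_again == addr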

How to convert a multi-dimensional dictionary to json file?

I have loaded a *.mat file that contains a 'struct' into my JupyterLab using:
from pymatreader import read_mat
data = read_mat(mat_file)
Now I have a multi-dimensional dictionary, for example:
data['Forces']['Ss1']['flap'].keys()
Gives the output:
dict_keys(['lf', 'rf', 'lh', 'rh'])
I want to convert this into a JSON file, keyed exactly by the keys that already exist, without doing it manually, because I want to apply this to many *.mat files with varying numbers of keys.
EDIT:
Unfortunately, I no longer have access to MATLAB.
An example for desired output would look something like this:
json_format = {
    "Forces": {
        "Ss1": {
            "flap": {
                "lf": [1, 2, 3, 4],
                "rf": [4, 5, 6, 7],
                "lh": [23, 5, 6, 654, 4],
                "rh": [4, 34, 35, 56, 66]
            }
        }
    }
}
ANOTHER EDIT:
So after making lists of the subkeys (I won't elaborate on it), I did this:
FORCES = []
for ind in individuals:
    for force in forces:
        for wing in wings:
            FORCES.append({
                ind: {
                    force: {
                        wing: data['Forces'][ind][force][wing].tolist()
                    }
                }
            })
Then, to save:
with open(f'{ROOT_PATH}/Forces.json', 'w') as f:
    json.dump(FORCES, f)
That worked, but only because I looked up all of the keys manually... Also, for some reason, I have square brackets at the beginning and at the end of this JSON file.
The json package will output dictionaries to JSON:
import json

with open('filename.json', 'w') as f:
    json.dump(data, f)
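For the question's case there is no need to enumerate keys at all: dump the whole nested dict in one call and convert the leaves on the fly via json.dump's default hook. A sketch, assuming (as your .tolist() call suggests) that the leaves are numpy arrays; the square brackets you saw came from dumping a list (FORCES) rather than a dict:

import json

def to_serializable(obj):
    # pymatreader returns numpy arrays as leaves; JSON can't encode those directly
    if hasattr(obj, 'tolist'):
        return obj.tolist()
    raise TypeError('%r is not JSON serializable' % type(obj))

with open('Forces.json', 'w') as f:
    # keeps every existing key at every depth, no loops needed
    json.dump(data, f, default=to_serializable)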
If you are using MATLAB R2016b or later and want to go straight from MATLAB to JSON, check out jsonencode and jsondecode. For your purposes, jsonencode
encodes data and returns a character vector in JSON format.
(MathWorks Docs)
Here is a quick example that assumes your data is in the MATLAB variable test_data and writes it to the file specified in the variable json_file:
json_data = jsonencode(test_data);
writematrix(json_data, json_file);
Note: some MATLAB data formats cannot be translated into JSON due to limitations of the JSON specification. However, it sounds like your data fits the JSON specification well.

Lua: access indices in a table generated from JSON

So, I am bound to use Lua to get weather data from the OpenWeatherMap API.
I managed to send an HTTP request that returns and stores all the data, but now I am stuck with a Lua table I don't know how to work with. I am very new to Lua, and I didn't find any guide or similar resource covering such deeply nested tables in Lua.
In particular, I am only interested in the field called temp inside main. Here is a sample response from the API: Sample request response
The dependencies are Lua's socket.http and this JSON-to-Lua-table formatter.
Here is my basic code structure:
json = require("json")
web = require("socket.http")

local get = json.decode(web.request(<API Link>))
"get" now stores a table I don't know how to work with.
With the help of https://www.json2yaml.com/, the structure is:
cod: '200'
message: 0.0036
cnt: 40
list:
- dt: 1485799200
  main:
    temp: 261.45
    temp_min: 259.086
    temp_max: 261.45
    pressure: 1023.48
    sea_level: 1045.39
    grnd_level: 1023.48
    humidity: 79
    temp_kf: 2.37
  weather:
  - id: 800
    main: Clear
    description: clear sky
    icon: 02n
  clouds:
    all: 8
  wind:
    speed: 4.77
    deg: 232.505
  snow: {}
  sys:
    pod: n
  dt_txt: '2017-01-30 18:00:00'
  …
- dt: 1486220400
  …
city:
  id: 524901
  name: Moscow
  coord:
    lat: 55.7522
    lon: 37.6156
  country: none
So,
for index, entry in ipairs(get.list) do
    print(index, entry.dt, entry.main.temp)
end
ipairs iterates over the positive integer keys of the table, up to but not including the first integer that has no value. That appears to be how the JSON library represents a JSON array.
If you don't know how to work with Lua tables, you should probably learn the very basics of Lua first. Refer to https://www.lua.org/start.html
The JSON string encodes a Lua table with all its keys and values.
You can either read up on how the encoder encodes a table, or simply encode your own table and analyze the resulting JSON string.
print(json.encode({1,2,3}))
[1,2,3]
print(json.encode({a=1, b={1,2}, [3]="test"}))
{"3":"test","b":[1,2],"a":1}
and so on...
There are always table keys and values, separated by a colon.
Values can be numbers, strings, tables...
If a table only has numeric keys starting from one, it is encoded as a list of its values in square brackets. If a table has other kinds of keys, it is encapsulated in curly brackets...
So let's have a look at your results. I'll remove 39 of the 40 entries to shorten it. I'll also indent to make the structure a bit more readable.
{
  "cod": "200",
  "message": 0.0036,
  "cnt": 40,
  "list": [{
    "dt": 1485799200,
    "main": {
      "temp": 261.45,
      "temp_min": 259.086,
      "temp_max": 261.45,
      "pressure": 1023.48,
      "sea_level": 1045.39,
      "grnd_level": 1023.48,
      "humidity": 79,
      "temp_kf": 2.37},
    "weather": [
      {
        "id": 800,
        "main": "Clear",
        "description": "clear sky",
        "icon": "02n"
      }],
    "clouds": {"all": 8},
    "wind": {"speed": 4.77, "deg": 232.505},
    "snow": {},
    "sys": {"pod": "n"},
    "dt_txt": "2017-01-30 18:00:00"}
  ],
  "city": {
    "id": 524901,
    "name": "Moscow",
    "coord": {
      "lat": 55.7522,
      "lon": 37.6156
    },
    "country": "none"
  }
}
That sample response appears to have many subtables that have a main in them. Try this: get.list[1].main.temp.
After 2 days I finally found the error. I was working in a Minecraft mod called OpenComputers, which embeds Lua. It seems the mod uses its own version of socket.http, and every time I wanted to print the response it returned two functions to use with request. I found out that if I put "()" after the variable, it returned the response as a string, and with the JSON library I could decode that into a workable table.
Side note: I could access the weather like this: json_table["weather"]["temp"]
The mod's HTTP requests are pretty poorly documented, so I had to figure this out by myself. Thanks for your responses; in the end the error was, as always, very unexpected!

Parsing CSV in groovy with exception tolerance

I've been trying to parse a CSV file in Groovy, currently using the library org.apache.commons.csv 2.4. The requirement is that some CSV cells contain invalid data, such as invalid characters, and instead of throwing an exception on the first invalid row/cell, I want to collect these exceptions and keep iterating to the end of the CSV file; at that point I will have a full list of the invalid data the file contains.
For that purpose I've tried multiple ways of using this Apache library, but unfortunately, as long as it uses CSVParser.getNextRecord() for iteration, the iterator just aborts.
Put into code, something like this:
def records = new CSVParser(reader, CSVFormat.EXCEL.withHeader().withIgnoreSurroundingSpaces())
// the iterator() inside CSVParser always uses getNextRecord() for its next()
// implementation, and it may throw an exception on an invalid char
records.each { record ->
    try {
        // if the exception is thrown from .each itself, this try/catch is in vain
    } catch (e) {
        // want to collect errors here
    }
}
So, is there anything else I should dig into in this library? Or could anybody point me to another, more viable solution? Many thanks to all!
Update:
Sample CSV
"Company code for WBS element","WBS Element","PS: Short description (1st text line)","Responsible Cost Center for WBS Element","OBJNR","WBS Status"
"1001","RE-01768-011","Opex - To present a paper on Career con","0000016400","PR00031497","X"
"1001","RE-01768-011","Opex - To present a paper on "Career con","0000016400","PR00031497","X"
The second data row has an invalid character (") that makes the parser throw an exception.
The problem you have is that one of the characters in one cell is the quote character the parser uses, according to the selected format: CSVFormat.EXCEL.
The quote character is
the character used to encapsulate values containing special characters
so in your example the quote is misused and the parser complains about it.
You can work around that by using a different CSVFormat, for example one without a quote character:
@Grapes(
    @Grab(group='org.apache.commons', module='commons-csv', version='1.2')
)
import java.nio.charset.*
import org.apache.commons.csv.*

def text = '''"Company code for WBS element","WBS Element","PS: Short description (1st text line)","Responsible Cost Center for WBS Element","OBJNR","WBS Status"
"1001","RE-01768-011","Opex - To present a paper on Career con","0000016400","PR00031497","X"
"1002","RE-01768-011","Opex - To present a paper on "Career con","0000016400","PR00031497","X"
"1003","RE-01768-011","Opex - To present a paper on Career con","0000016400","PR00031497","X"'''

def parsed = CSVParser.parse(text, CSVFormat.EXCEL.withHeader().withIgnoreSurroundingSpaces().withQuote(null))
parsed.getRecords().each {
    println it.toMap().values()
}
And the above yields:
[]
["0000016400", "1001", "RE-01768-011", "Opex - To present a paper on Career con", "X", "PR00031497"]
["0000016400", "1002", "RE-01768-011", "Opex - To present a paper on "Career con", "X", "PR00031497"]
["0000016400", "1003", "RE-01768-011", "Opex - To present a paper on Career con", "X", "PR00031497"]
Of course, with the above workaround, you have the quotes (") included in each field.
You can replace them all if you want:
parsed.getRecords().each {
    println it.toMap().values().collect({ it.replace('"', '') })
}
The problem is that if the CSV file has invalid data, meaning data that breaks the rules of the CSV format, then the parser cannot... parse. That's why it cannot reliably continue past the first error it encounters.
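If all you need is the error inventory, and you can assume no embedded newlines inside quoted fields, one pattern is to split the input into physical lines yourself and parse each row in isolation, so a malformed row fails alone. A sketch of that strategy, written in Python for brevity (the same idea ports to Groovy by looping over the lines and calling CSVParser.parse per line):

import csv

# collect bad rows instead of aborting on the first one
# caveat: per-line parsing breaks if quoted fields can contain newlines
rows, errors = [], []
with open('data.csv', newline='') as f:
    header = next(csv.reader(f))  # consume the header line
    for lineno, line in enumerate(f, start=2):
        if not line.strip():
            continue
        try:
            # strict=True makes the reader raise csv.Error on bad quoting
            row = next(csv.reader([line], strict=True))
            rows.append(dict(zip(header, row)))
        except csv.Error as exc:
            errors.append((lineno, str(exc)))

print('%d good rows, %d bad rows: %s' % (len(rows), len(errors), errors))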