I have a JSON file that, for now, is validated by hand before being placed into production. Ideally this would be an automated process, but for now this is the constraint.
One thing I found helpful in Eclipse was the JSON tooling that highlights duplicate keys in JSON files. Is there similar functionality in Sublime Text, either built in or through a plugin?
The following JSON, for example, should produce a warning about duplicate keys.
{
"a": 1,
"b": 2,
"c": 3,
"a": 4,
"d": 5
}
Thanks!
There are plenty of JSON validators available online. I just tried this one and it picked out the duplicate key right away. The problem with using Sublime-based JSON linters like JSONLint is that they use Python's json module, which does not raise an error on duplicate keys:
import json
json_str = """
{
"a": 1,
"b": 2,
"c": 3,
"a": 4,
"d": 5
}"""
py_data = json.loads(json_str) # parses the JSON into a Python dict,
# which is unordered before Python 3.7
print(py_data)
yields
{'c': 3, 'b': 2, 'a': 4, 'd': 5}
showing that the first a key is overwritten by the second. So, you'll need another, non-Python-based, tool.
Even the Python documentation says so:
The RFC specifies that the names within a JSON object should be
unique, but does not mandate how repeated names in JSON objects should
be handled. By default, this module does not raise an exception;
instead, it ignores all but the last name-value pair for a given name:
>>> weird_json = '{"x": 1, "x": 2, "x": 3}'
>>> json.loads(weird_json)
{'x': 3}
The object_pairs_hook parameter can be used to alter this behavior.
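So, as the docs point out, an object_pairs_hook can do the check:
class JsonUniqueKeysChecker:
    def check(self, pairs):
        # Called once per JSON object, so keys are tracked per object;
        # a list shared across calls would flag the same key in sibling objects.
        seen = set()
        for key, _value in pairs:
            if key in seen:
                raise ValueError("Non unique Json key: '%s'" % key)
            seen.add(key)
        # Return a dict so the parsed result keeps its usual shape.
        return dict(pairs)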
And then:
c = JsonUniqueKeysChecker()
print(json.loads(json_str, object_pairs_hook=c.check)) # raises
JSON is a very simple format and its specification leaves details like this open, so things like that can be painful. Detecting duplicate keys is easy, but I bet it would be quite a lot of work to turn that into a plugin.
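A standalone check outside the editor can be short. The sketch below is one possible way to list every duplicate key using the object_pairs_hook parameter quoted above, assuming the whole file fits in memory; find_duplicate_keys is a hypothetical helper name, not part of any library.

```python
import json


def find_duplicate_keys(text):
    """Return the duplicate key names found in any object of a JSON document.

    find_duplicate_keys is a hypothetical helper, not a library function.
    """
    duplicates = []

    def hook(pairs):
        # json calls this once per JSON object with its (key, value) pairs,
        # so `seen` starts empty for every object, nested ones included.
        seen = set()
        for key, _value in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates


sample = '{"a": 1, "b": 2, "c": 3, "a": 4, "d": 5}'
print(find_duplicate_keys(sample))
```

Collecting the names instead of raising on the first hit means a single run reports all duplicates, which is closer to what an editor warning would show.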
I have found that sometimes a jsonb object:
{"a": 1, "b": 2}
will get re-encoded and stored as a jsonb string:
"{\"a\": 1, \"b\": 2}"
Is there a way to write a function that will reparse the string when the input is not a jsonb object?
The #>> operator (Extracts JSON sub-object at the specified path as text) does the job:
select ('"{\"a\": 1, \"b\": 2}"'::jsonb #>> '{}')::jsonb
This behavior of the operator is not officially documented. It appears to be a side effect of its underlying function: with the empty path '{}' the operator returns the top-level value as text, so a jsonb string comes back without its outer quotes and escapes. Oddly enough, its twin operator #> doesn't work that way, though it would be even more logical. It's probably worth asking the Postgres developers to solve this, preferably by adding a new decoding function. While waiting for a system solution, you can define a simple SQL function to make queries clearer in cases where the problem occurs frequently.
create or replace function jsonb_unescape(text)
returns jsonb language sql immutable as $$
select ($1::jsonb #>> '{}')::jsonb
$$;
Note that the function works well both on escaped and plain strings:
with my_data(str) as (
values
('{"a": 1, "b": 2}'),
('"{\"a\": 1, \"b\": 2}"')
)
select str, jsonb_unescape(str)
from my_data;
str | jsonb_unescape
------------------------+------------------
{"a": 1, "b": 2} | {"a": 1, "b": 2}
"{\"a\": 1, \"b\": 2}" | {"a": 1, "b": 2}
(2 rows)
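If the double encoding has to be handled on the client side instead, the same idea (parse once, and parse again when the result is still a string) can be sketched in Python. This is only an illustration under that assumption; unescape_json is a hypothetical name and not a Postgres or driver API.

```python
import json


def unescape_json(text):
    """Parse JSON text; if the result is itself a string, parse that as well.

    Hypothetical client-side helper that mirrors the SQL function above in spirit.
    """
    value = json.loads(text)
    if isinstance(value, str):
        # The document was a JSON string wrapping encoded JSON, so decode once more.
        value = json.loads(value)
    return value


plain = '{"a": 1, "b": 2}'
double_encoded = json.dumps(plain)  # a JSON string containing the text above

print(unescape_json(plain))
print(unescape_json(double_encoded))
```

Like the SQL version, this sketch raises an error for a JSON string whose content is not itself valid JSON.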
I'm trying to open a bunch of JSON files using read_json in order to get a DataFrame as follows:
ddf.compute()
id owner pet_id
0 1 "Charlie" "pet_1"
1 2 "Charlie" "pet_2"
3 4 "Buddy" "pet_3"
but I'm getting the following error
_meta = pd.DataFrame(
    columns=["id", "owner", "pet_id"]
).astype({
    "id": int,
    "owner": "object",
    "pet_id": "object"
})
ddf = dd.read_json("mypets/*.json", meta=_meta)
ddf.compute()
*** ValueError: Metadata mismatch found in `from_delayed`.
My JSON files look like this:
[
{
"id": 1,
"owner": "Charlie",
"pet_id": "pet_1"
},
{
"id": 2,
"owner": "Charlie",
"pet_id": "pet_2"
}
]
As far as I understand, the problem is that each file contains a list of dicts, so I'm looking for the right way to specify that in the meta= argument.
PS:
I also tried structuring the files in the following way:
{
"id": [1, 2],
"owner": ["Charlie", "Charlie"],
"pet_id": ["pet_1", "pet_2"]
}
But Dask is wrongly interpreting the data
ddf.compute()
id owner pet_id
0 [1, 2] ["Charlie", "Charlie"] ["pet_1", "pet_2"]
1 [4] ["Buddy"] ["pet_3"]
The invocation you want is the following:
dd.read_json("data.json", meta=meta,
blocksize=None, orient="records",
lines=False)
which can be largely gleaned from the docstring.
meta looks OK in your code.
blocksize must be None, since each file holds one whole JSON document and cannot be split.
orient="records" means the file is a list of objects.
lines=False means this is not a line-delimited JSON file, which is the more common case for Dask; a newline character is not assumed to mean a new record.
So why the error? Probably Dask split your file on some newline character, and so a partial record got parsed, which therefore did not match your given meta.
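The difference between the two layouts can be seen without Dask at all. The following stdlib-only sketch illustrates the guess above and makes no claim about Dask's internals: a multi-line JSON array cannot be parsed one line at a time, while line-delimited JSON can. The sample data is taken from the question; parses_line_by_line is a hypothetical helper.

```python
import json

# One JSON array spread over several lines (orient="records", lines=False).
records_file = """[
  {"id": 1, "owner": "Charlie", "pet_id": "pet_1"},
  {"id": 2, "owner": "Charlie", "pet_id": "pet_2"}
]"""

# Line-delimited JSON: one complete record per line.
lines_file = (
    '{"id": 1, "owner": "Charlie", "pet_id": "pet_1"}\n'
    '{"id": 2, "owner": "Charlie", "pet_id": "pet_2"}\n'
)


def parses_line_by_line(text):
    """Return True if every non-empty line is a complete JSON document."""
    try:
        for line in text.splitlines():
            if line.strip():
                json.loads(line)
    except json.JSONDecodeError:
        return False
    return True


print(parses_line_by_line(lines_file))    # each line stands on its own
print(parses_line_by_line(records_file))  # the first line is just "["
print(len(json.loads(records_file)))      # parsed whole, the array is fine
```

Any reader that cuts the array file at a newline ends up with fragments like the ones rejected here, which is why the file has to be read as a single block.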
I have a data structure that I want to convert to json and preserve the key order.
For example:
%{ x: 1, a: 5} should be converted to "{\"x\": 1, \"a\": 5}"
Poison does it without any problem. But when I upgrade to Jason, it changes to "{\"a\": 5, \"x\": 1}".
So I use Jason.Helpers.json_map to preserve the order like this:
Jason.Helpers.json_map([x: 1, a: 5])
It creates a fragment with correct order.
However, when I use a variable to do this:
list = [x: 1, a: 5]
Jason.Helpers.json_map(list)
I get an error:
** (Protocol.UndefinedError) protocol Enumerable not implemented for {:list, [line: 15], nil} of type Tuple.
....
QUESTION: How can I pass a pre-calculated list into Jason.Helpers.json_map ?
The calculation is complicated, so I don't want to repeat the code just to use json_map, but use the function that returns a list.
json_map/1 is a macro, from its docs:
Encodes a JSON map from a compile-time keyword.
It is designed for compiling JSON at compile-time, which is why it doesn't work with your runtime variable.
Support for encoding keyword lists was added to the Jason library a year ago, but it looks like it hasn't been pushed to hex yet. I managed to get it to work by pulling the latest code from GitHub:
defp deps do
  [{:jason, git: "https://github.com/michalmuskala/jason.git"}]
end
Then by creating a struct that implements Jason.Encoder (adapted from this solution by the Jason author):
defmodule OrderedObject do
  defstruct [:value]

  def new(value), do: %__MODULE__{value: value}

  defimpl Jason.Encoder do
    def encode(%{value: value}, opts) do
      Jason.Encode.keyword(value, opts)
    end
  end
end
Now we can encode objects with ordered keys:
iex(1)> Jason.encode!(OrderedObject.new([x: 1, a: 5]))
"{\"x\":1,\"a\":5}"
I don't know if this is part of the public API or just an implementation detail, but it appears you have some control of the order when implementing the Jason.Encoder protocol for a struct.
Let's say you've defined an Ordered struct:
defmodule Ordered do
  @derive {Jason.Encoder, only: [:a, :x]}
  defstruct [:a, :x]
end
If you encode the struct, the "a" key will be before the "x" key:
iex> Jason.encode!(%Ordered{a: 5, x: 1})
"{\"a\":5,\"x\":1}"
Let's reorder the keys we pass in to the :only option:
defmodule Ordered do
  @derive {Jason.Encoder, only: [:x, :a]}
  defstruct [:a, :x]
end
If we now encode the struct, the "x" key will be before the "a" key:
iex> Jason.encode!(%Ordered{a: 5, x: 1})
"{\"x\":1,\"a\":5}"
I've run into several scenarios where I have lists and dictionaries of data in vim, with arbitrarily nested data structures, e.g.:
a = [ 'somedata', d : { 'one': 'x', 'two': 'y', 'three': 'z' }, 'moredata' ]
b = { 'one': '1', 'two': '2', 'three': [ 'x', 'y', 'z' ] }
I'd really like to have a way to 'pretty print' them in a tabular format. It would be especially helpful to simply treat them as JSON directly in vim. Any suggestions?
You may want to look at Tim Pope's Scriptease.vim which provides many niceties for vim scripting and plugin development.
Although I am not sure how pretty :PP is I have found it pretty enough for my uses.
It should also be noted that vim script dictionaries and arrays are very similar to JSON, so you could in theory use any JSON tools after some clean up.
If your text is valid JSON, you can turn to the external tool python -m json.tool, so in vim you just execute :%!python -m json.tool.
Unfortunately your example won't work, because it is not valid JSON; it does work if you take a valid JSON example with nested dicts/lists.
Note that in the screencast I have ft=json, so some quotes cannot be seen in normal mode. The text I used:
[{"test1": 1, "test2": "win", "t3":{"nest1":"foo","nest2":"bar"}}, {"test1": 1, "test2": "win", "t3":{"nest1":"foo","nest2":"bar"}}, {"test1": 1, "test2": "win", "t3":{"nest1":"foo","nest2":"bar"}}, {"test1": 1, "test2": "win", "t3":{"nest1":"foo","nest2":"bar"}}]
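For reference, the filter roughly amounts to parsing the text and serializing it again with indentation. The sketch below shows that, plus one possible clean-up step for single-quoted data: it assumes the text also happens to be a valid Python literal, which is true for the b example in the question but not for vim script in general. Both function names are hypothetical.

```python
import ast
import json


def pretty_json(text, indent=4):
    """Re-serialize valid JSON text with indentation."""
    return json.dumps(json.loads(text), indent=indent)


def pretty_literal(text, indent=4):
    """Pretty-print single-quoted data, ASSUMING it is a valid Python literal."""
    # ast.literal_eval only evaluates literals (strings, numbers, lists, dicts, ...),
    # so it does not execute arbitrary code from the buffer.
    return json.dumps(ast.literal_eval(text), indent=indent)


print(pretty_json('{"test1": 1, "t3": {"nest1": "foo", "nest2": "bar"}}'))
print(pretty_literal("{ 'one': '1', 'two': '2', 'three': [ 'x', 'y', 'z' ] }"))
```

Saved as a script that reads stdin and prints the result, something like this could be used as a vim filter in the same way as json.tool.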
In the following JSON response, what's the proper way to check if the nested key "C" exists in python 2.7?
{
"A": {
"B": {
"C": {"D": "yes"}
}
}
}
one line JSON
{ "A": { "B": { "C": {"D": "yes"} } } }
This is an old question with an accepted answer, but I would do this using nested if statements instead.
import json
# named "data" so the json module is not shadowed
data = json.loads('{ "A": { "B": { "C": {"D": "yes"} } } }')
if 'A' in data:
    if 'B' in data['A']:
        if 'C' in data['A']['B']:
            print(data['A']['B']['C'])  # or whatever you want to do
or if you know that you always have 'A' and 'B':
import json
data = json.loads('{ "A": { "B": { "C": {"D": "yes"} } } }')
if 'C' in data['A']['B']:
    print(data['A']['B']['C'])  # or whatever
Use the json module to parse the input. Then, within a try statement, try to retrieve key "A" from the parsed input, then key "B" from the result, and then key "C" from that result. If an error gets thrown, the nested key "C" does not exist.
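The try-based approach could look like the following minimal sketch. has_nested_key is a hypothetical helper name, and catching TypeError and IndexError in addition to KeyError is a choice made here so that a non-dict value along the path counts as "missing".

```python
import json


def has_nested_key(data, *keys):
    """Return True if data[keys[0]][keys[1]]... can be looked up without error."""
    try:
        for key in keys:
            data = data[key]
    except (KeyError, TypeError, IndexError):
        # KeyError: the key is absent; TypeError/IndexError: the value at this
        # level is not a dict (for example a string, a number or a list).
        return False
    return True


parsed = json.loads('{ "A": { "B": { "C": {"D": "yes"} } } }')
print(has_nested_key(parsed, "A", "B", "C"))
print(has_nested_key(parsed, "A", "B", "X"))
```

The loop stops at the first failing lookup, so the depth of the path does not matter.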
A quite easy and comfortable way is to use the package python-benedict, which has full keypath support. To use it, cast your existing dict d with the function benedict():
from benedict import benedict
d = benedict(d)
Now your dict has full key path support and you can check if the key exists in the pythonic way, using the in operator:
if 'mainsnak.datavalue.value.numeric-id' in d:
    # do something
Please find here the complete documentation.
I used a simple recursive solution:
def check_exists(exp, value):
    # For the case that we have an empty element
    if exp is None:
        return False
    # Check existence of the first key
    if value[0] in exp:
        # if this is the last key in the list, then no need to look further
        if len(value) == 1:
            return True
        else:
            next_value = value[1:]
            return check_exists(exp[value[0]], next_value)
    else:
        return False
To use this code, pass the parsed JSON and the nested keys as a list of strings, for example:
rc = check_exists(data, ["A", "B", "C", "D"])  # data is the dict returned by json.loads