Pretty Printing Arbitrarily Nested Dictionaries & Lists in Vim - json

I've run into several scenarios where I have lists & dictionaries of data in Vim, with arbitrarily nested data structures, e.g.:
a = [ 'somedata', d : { 'one': 'x', 'two': 'y', 'three': 'z' }, 'moredata' ]
b = { 'one': '1', 'two': '2', 'three': [ 'x', 'y', 'z' ] }
I'd really like to have a way to 'pretty print' them in a tabular format. It would be especially helpful to simply treat them as JSON directly in vim. Any suggestions?

You may want to look at Tim Pope's Scriptease.vim which provides many niceties for vim scripting and plugin development.
Although I am not sure how pretty :PP is, I have found it pretty enough for my uses.
It should also be noted that vim script dictionaries and arrays are very similar to JSON, so you could in theory use any JSON tools after some clean up.
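As a rough illustration of that clean-up step, here is a minimal sketch (an assumption on my part, not something Scriptease provides) that parses a Vim-style literal with Python's ast.literal_eval, which happens to accept the same single-quoted strings, numbers, lists and dicts, and re-emits it as indented JSON. It only works for data that is also a valid Python literal, so not for your `a` example (the `d :` inside the list is invalid in both languages). Vim's `''` escape inside single-quoted strings is not handled and would be silently misread as string concatenation. `vim_literal_to_json` is a hypothetical helper name.

```python
import ast
import json


def vim_literal_to_json(text: str, indent: int = 4) -> str:
    """Pretty-print a Vim list/dict literal as JSON.

    Assumes the literal is also a valid Python literal (quoted strings,
    numbers, lists, dicts); anything else raises ValueError or SyntaxError.
    """
    data = ast.literal_eval(text.strip())
    return json.dumps(data, indent=indent)


print(vim_literal_to_json("{ 'one': '1', 'two': '2', 'three': [ 'x', 'y', 'z' ] }"))
```

To use it as a Vim filter (e.g. `:%!python3 vim2json.py`, file name hypothetical), wrap it in a small script that reads `sys.stdin` and prints the result.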

If your text is valid JSON, you can turn to the external python -m json.tool filter: in Vim, just execute :%!python -m json.tool.
Unfortunately your example won't work, because it isn't valid JSON, but it does work if you take a valid JSON example with nested dicts/lists.
Note that in the screencast I have ft=json, so some quotes are concealed in normal mode. The text I used:
[{"test1": 1, "test2": "win", "t3":{"nest1":"foo","nest2":"bar"}}, {"test1": 1, "test2": "win", "t3":{"nest1":"foo","nest2":"bar"}}, {"test1": 1, "test2": "win", "t3":{"nest1":"foo","nest2":"bar"}}, {"test1": 1, "test2": "win", "t3":{"nest1":"foo","nest2":"bar"}}]
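Roughly speaking, json.tool parses its input with json.loads and writes it back out with json.dumps using 4-space indentation. The sketch below (the helper name `pretty` is hypothetical) reproduces that in-process and shows why a Vim-style literal such as your `b` example is rejected: JSON requires double-quoted strings.

```python
import json


def pretty(text: str) -> str:
    """Approximate what `python -m json.tool` does by default."""
    return json.dumps(json.loads(text), indent=4)


valid = '[{"test1": 1, "test2": "win", "t3": {"nest1": "foo", "nest2": "bar"}}]'
print(pretty(valid))

invalid = "{ 'one': '1', 'two': '2', 'three': [ 'x', 'y', 'z' ] }"
try:
    pretty(invalid)
except json.JSONDecodeError as err:
    # Single-quoted strings are not JSON, so json.tool rejects them as well.
    print("not JSON:", err.msg)
```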

Related

JSONiq - how do you convert an array to a sequence?

Using the JSONiq to JavaScript implementation of JSONiq, say I have an array
let $a := [1,2,3]
I'd like to get the elements as a sequence, but all of these return the array itself:
return $a()
return $a[]
return members($a)
What is the correct way to extract the members of the array?
My ultimate goal is to convert objects in an array to strings, like so -
let $updates := [
{"address": "%Q0.1", "keys": ["OUT2", "output.2"], "value": 0},
{"address": "%Q0.7", "keys": ["OUT8", "output.8"], "value": 1}
]
for $update in $updates()
return "<timestamp>|address|" || $update.address
in order to convert an array of JSON objects to a set of strings like <timestamp>|address|%Q0.7, etc
Edit: Using Zorba, the $a() syntax seems to work okay. Is it an issue with the node JSONiq parser?
e.g.
jsoniq version "1.0";
let $updates := [
{"address": "%Q0.1", "keys": ["OUT2", "output.2"], "value": 0},
{"address": "%Q0.7", "keys": ["OUT8", "output.8"], "value": 1}
]
for $update in $updates()
return current-dateTime() || "|address|" || $update.address
returns
2021-02-19T23:10:13.434273Z|address|%Q0.1 2021-02-19T23:10:13.434273Z|address|%Q0.7
In the core JSONiq syntax, an array is turned into a sequence (i.e., its members are extracted) with an empty pair of square brackets, like so:
$array[]
Example:
[1, 2, 3, 4][]
returns the sequence:
(1, 2, 3, 4)
This means that the query would be:
let $updates := [
{"address": "%Q0.1", "keys": ["OUT2", "output.2"], "value": 0},
{"address": "%Q0.7", "keys": ["OUT8", "output.8"], "value": 1}
]
for $update in $updates[]
return "<timestamp>|address|" || $update.address
The function-call-like notation with an empty pair of parentheses dates back to JSONiq's early days, when it was primarily designed as an extension to XQuery and maps and arrays were navigated with function calls ($object("foo"), $array(), $array(2)). As JSONiq started having its own life, though, a more user-friendly and intuitive syntax for JSON navigation was introduced:
$array[[1]]
for array member lookup given a position
$object.foo
for object lookup given a key and
$array[]
for array unboxing.
While the JSONiq extension to XQuery still exists for scenarios in which users need both JSON and XML support (and is supported by Zorba 3.0, IBM WebSphere, etc.), the core JSONiq syntax is the main one for all engines that specifically support JSON, like Rumble.
Some engines (including Zorba 3.0) support both the core JSONiq syntax and the JSONiq extension to XQuery, and you can pick the one you want with a language version declaration:
jsoniq version "1.0";
[1, 2, 3, 4][]
vs.
xquery version "3.0";
[1, 2, 3, 4]()
Zorba is relatively lenient and will probably even accept both () and [] in its core JSONiq implementation.
(Warning: Zorba 2.9 doesn't support the latest core JSONiq syntax, in particular the try.zorba.io page still runs on Zorba 2.9. You need to download Zorba 3.0 and run it locally if you want to use it).
A final note: JSON navigation works in parallel, on sequences of arrays and objects, too:
(
{"foo":1},
{"foo":2},
{"foo":3},
{"foo":4}
).foo
returns
(1, 2, 3, 4)
while
(
[1, 2],
[3, 4, 5],
[6, 7, 8]
)[]
returns
(1, 2, 3, 4, 5, 6, 7, 8)
This makes it very easy and compact to navigate large sequences:
$collection.foo[].bar[[1]].foobar[].foo
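To make the sequence semantics concrete, here is a small Python model (not a JSONiq engine; the helper names `lookup`, `unbox` and `member` are hypothetical) that treats a sequence as a flat Python list. Each navigation step is applied to every item and the results are concatenated; as I understand the JSONiq spec, items of the wrong kind (e.g. a non-object under `.foo`) simply contribute nothing, and this model assumes that behaviour.

```python
from typing import Any, List

Sequence = List[Any]  # a JSONiq sequence, modelled as a flat Python list


def lookup(seq: Sequence, key: str) -> Sequence:
    """Model of $seq.key: object lookup applied to every object in the sequence."""
    return [item[key] for item in seq if isinstance(item, dict) and key in item]


def unbox(seq: Sequence) -> Sequence:
    """Model of $seq[]: concatenate the members of every array in the sequence."""
    return [m for item in seq if isinstance(item, list) for m in item]


def member(seq: Sequence, position: int) -> Sequence:
    """Model of $seq[[n]]: 1-based member lookup applied to every array."""
    return [
        item[position - 1]
        for item in seq
        if isinstance(item, list) and 1 <= position <= len(item)
    ]


print(lookup([{"foo": 1}, {"foo": 2}, {"foo": 3}, {"foo": 4}], "foo"))
print(unbox([[1, 2], [3, 4, 5], [6, 7, 8]]))

# Models $collection.foo[].bar[[1]] on some made-up data.
collection = [{"foo": [{"bar": [10, 11]}, {"bar": [20]}]}, {"foo": []}]
print(member(lookup(unbox(lookup(collection, "foo")), "bar"), 1))
```

Reading a chained expression from left to right corresponds to nesting the calls from the inside out.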

Python dictionary has json in it. How do I convert that dictionary to json?

I have this dictionary (or so type() tells me):
{'uploadedby': 'fred',
'return_url': '',
'id': '2200',
'question_json': '{"ops":[{"insert":"What metal is responsible for a Vulcan\'s green blood?\\n"}]}'}
When I use json.dumps on it, I get this:
{"uploadedby": "fred",
"return_url": "",
"id": "2200",
"question_json": "{\"ops\":[{\"insert\":\"What metal is responsible for a Vulcan's green blood?\\n\"}]}", "question": "What metal is responsible for a Vulcan's green blood?\r\n"}
I don't want all the escaping that's going on. Is there something I can do to correct this?
You can do something like the following to convert question_json into a python dict, and then dump the entire dict:
import json
test = {'uploadedby': 'fred',
'return_url': '',
'id': '2200',
'question_json': '{"ops":[{"insert":"What metal is responsible for a Vulcan\'s green blood?\\n"}]}'}
json.dumps(
{k: json.loads(v) if k == 'question_json' else v for k,v in test.items()}
)
'{"question_json": {"ops": [{"insert": "What metal is responsible for a Vulcan\'s green blood?\\n"}]}, "uploadedby": "fred", "return_url": "", "id": "2200"}'
You could try the following, which has the added benefit of not needing to specify which key contains the offending value. Here we're checking to see if we can effectively load a JSON string from any of the key-value pairs and leaving them alone if that fails.
import json
mydict = {'uploadedby': 'fred',
'return_url': '',
'id': '2200',
'question_json': '{"ops":[{"insert":"What metal is responsible for a Vulcan\'s green blood?\\n"}]}'}
for key in mydict:
    try:
        mydict[key] = json.loads(mydict[key])
    except (TypeError, ValueError):  # not a JSON string: leave the value alone
        pass
Now mydict looks like this, so when we do a json.dumps(mydict), the offending key is no longer double-encoded and the others are as they were:
{'uploadedby': 'fred',
'return_url': '',
'id': 2200,
'question_json': {'ops': [{'insert': "What metal is responsible for a Vulcan's green blood?\n"}]}}
Note that the id value has been converted to an int, which may or may not be your intent. It's hard to tell from the original question.
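If you would rather keep id as the string "2200", one option (a sketch, with `decode_embedded_json` as a hypothetical helper name) is to accept a decoded value only when it is a JSON object or array, and to recurse so that JSON strings nested deeper in the structure are unpacked too.

```python
import json
from typing import Any


def decode_embedded_json(value: Any) -> Any:
    """Recursively replace strings holding JSON objects/arrays with parsed data.

    Strings that are not JSON, or that decode to a scalar (e.g. "2200"),
    are left unchanged.
    """
    if isinstance(value, dict):
        return {k: decode_embedded_json(v) for k, v in value.items()}
    if isinstance(value, list):
        return [decode_embedded_json(v) for v in value]
    if isinstance(value, str):
        try:
            decoded = json.loads(value)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            return value
        if isinstance(decoded, (dict, list)):
            return decode_embedded_json(decoded)
    return value


mydict = {
    "uploadedby": "fred",
    "return_url": "",
    "id": "2200",
    "question_json": '{"ops":[{"insert":"What metal is responsible for a Vulcan\'s green blood?\\n"}]}',
}
print(json.dumps(decode_embedded_json(mydict)))
```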

How Do I Serialize spaCy Custom Span Extensions as JSON?

I am using spaCy 2.1.6 to define a custom extension on a span.
>>> from spacy import load
>>> nlp = load("en_core_web_lg")
>>> from spacy.tokens import Span
>>> Span.set_extension('my_label', default=None)
>>> d = nlp("The fox jumped.")
>>> d[0:2]._.my_label = "ANIMAL"
>>> d[0:2]._.my_label
'ANIMAL'
The custom span extension does not appear when I serialize the document to JSON.
>>> d.to_json()
{'text': 'The fox jumped.',
'ents': [],
'sents': [{'start': 0, 'end': 15}],
'tokens': [{'id': 0,
'start': 0,
'end': 3,
'pos': 'DET',
'tag': 'DT',
'dep': 'det',
'head': 1},
{'id': 1,
'start': 4,
'end': 7,
'pos': 'NOUN',
'tag': 'NN',
'dep': 'nsubj',
'head': 2},
{'id': 2,
'start': 8,
'end': 14,
'pos': 'VERB',
'tag': 'VBD',
'dep': 'ROOT',
'head': 2},
{'id': 3,
'start': 14,
'end': 15,
'pos': 'PUNCT',
'tag': '.',
'dep': 'punct',
'head': 2}]}
(I'm specifically interested in custom annotation of Spans, but the same appears to be true of the JSON serialization of Doc object.)
Pickling and unpickling the document does preserve the custom extension.
How do I get the custom span extensions into the JSON serialization, or is that not supported?
Use this function and add your custom extensions any way you want:
from spacy.tokens import Doc


def doc2json(doc: Doc, model: str):
    # Written against spaCy v2 (is_tagged, is_parsed, is_nered and is_sentenced
    # were replaced by Doc.has_annotation() in v3). Add your own custom
    # extensions, e.g. doc._.my_attr, to the dicts below as needed.
    json_doc = {
        "text": doc.text,
        "text_with_ws": doc.text_with_ws,
        "cats": doc.cats,
        "is_tagged": doc.is_tagged,
        "is_parsed": doc.is_parsed,
        "is_nered": doc.is_nered,
        "is_sentenced": doc.is_sentenced,
    }
    ents = [
        {"start": ent.start, "end": ent.end, "label": ent.label_} for ent in doc.ents
    ]
    if doc.is_sentenced:
        sents = [{"start": sent.start, "end": sent.end} for sent in doc.sents]
    else:
        sents = []
    if doc.is_tagged and doc.is_parsed:
        noun_chunks = [
            {"start": chunk.start, "end": chunk.end} for chunk in doc.noun_chunks
        ]
    else:
        noun_chunks = []
    tokens = [
        {
            "text": token.text,
            "text_with_ws": token.text_with_ws,
            "whitespace": token.whitespace_,
            "orth": token.orth,
            "i": token.i,
            "ent_type": token.ent_type_,
            "ent_iob": token.ent_iob_,
            "lemma": token.lemma_,
            "norm": token.norm_,
            "lower": token.lower_,
            "shape": token.shape_,
            "prefix": token.prefix_,
            "suffix": token.suffix_,
            "pos": token.pos_,
            "tag": token.tag_,
            "dep": token.dep_,
            "is_alpha": token.is_alpha,
            "is_ascii": token.is_ascii,
            "is_digit": token.is_digit,
            "is_lower": token.is_lower,
            "is_upper": token.is_upper,
            "is_title": token.is_title,
            "is_punct": token.is_punct,
            "is_left_punct": token.is_left_punct,
            "is_right_punct": token.is_right_punct,
            "is_space": token.is_space,
            "is_bracket": token.is_bracket,
            "is_currency": token.is_currency,
            "like_url": token.like_url,
            "like_num": token.like_num,
            "like_email": token.like_email,
            "is_oov": token.is_oov,
            "is_stop": token.is_stop,
            "is_sent_start": token.is_sent_start,
            "head": token.head.i,
        }
        for token in doc
    ]
    return {
        "model": model,
        "doc": json_doc,
        "ents": ents,
        "sents": sents,
        "noun_chunks": noun_chunks,
        "tokens": tokens,
    }
Since I ran into the same issue and the only other answer didn't really help me, I thought I might as well give others looking into this some hints.
Since spaCy 2.1, print_tree has been removed and to_json() has been added. to_json() does not return custom extensions, because "this method will output the same format as the JSON training data expected by spacy train" (https://spacy.io/usage/v2-1).
If you want to output your custom extensions, you need to write your own to_json function.
To do this, I recommend extending the to_json() provided by spaCy.
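For span-level extensions like the my_label in the question, a minimal sketch of such an extension might call Doc.to_json() and then append the extension value for the spans you care about. The function name `to_json_with_span_ext` and the output key "span_ext" are hypothetical; the caller passes the spans explicitly, because to_json() has no way of knowing which spans you annotated.

```python
from typing import Any, Dict, Iterable


def to_json_with_span_ext(doc: Any, spans: Iterable[Any], ext_name: str) -> Dict[str, Any]:
    """Return doc.to_json() plus the custom extension value of each given span.

    Assumes spaCy 2.1+ (Doc.to_json) and that `ext_name` was registered with
    Span.set_extension. Span positions are character offsets, like the
    "sents" entries of to_json().
    """
    data = doc.to_json()
    data["span_ext"] = [
        {
            "start": span.start_char,
            "end": span.end_char,
            ext_name: span._.get(ext_name),
        }
        for span in spans
    ]
    return data
```

With the document from the question, this would be called as to_json_with_span_ext(d, [d[0:2]], "my_label").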
I'm not really a fan of the other two answers here, since they seem a bit overkill (extending the Doc object by @Chooklii, or the custom but flaky doc2json solution by @Laksh), so I'll just drop what I did for one of my projects here in case it's useful to someone.
import spacy
nlp = spacy.load("en_core_web_lg")
doc = nlp("The fox jumped.")  # replace with your own Doc object
extra_fields = [field for field in dir(doc._) if field not in ('get', 'set', 'has')]
doc_json = doc.to_json()
doc_json.update({field: doc._.get(field) for field in extra_fields})
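The doc_json should now have all the fields that you set on the Doc via the extensions interface provided by spaCy, along with the fields set by other spaCy pipeline components. Note that dir(doc._) only lists Doc extensions, so Span extensions such as my_label still need to be added separately.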

Importing JSON file into Firebase error

I keep getting an error when uploading/importing my JSON file into Firebase. I initially had an Excel spreadsheet that I saved as a CSV file, then I used a CSV to JSON converter.
I validated the JSON file (which has the .json extension) with a couple of online tools.
Though, I'm still getting an error.
Here is an example of my JSON:
{
"Rk": 1,
"Tm": "SEA",
"H/A": "H",
"DOW": "Sun",
"Opp": "CLE",
"QB": "Russell Wilson",
"Grade": "BLUE",
"Def mu pts": 4,
"Inj status": 0,
"Notes": "Got to wonder if not having a proven power RB under center will negatively impact Wilson's production.",
"TFS $50K": "$8,300",
"Init sal": "$8,300",
"Var": "$0",
"WC": 0
}
The issue is your keys.
Firebase keys must be UTF-8 encoded and cannot contain . $ # [ ] / or ASCII control characters 0-31 or 127.
Your "TFS $50K" key and your "H/A" key are the issues.
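If you want to fix the converted file before importing, a small sketch like the following could rewrite the keys recursively. Replacing each forbidden character with "_" is an arbitrary choice (so "H/A" becomes "H_A"), and `sanitize_keys` is a hypothetical helper name; the character set is the one listed above.

```python
import json
import re
from typing import Any

# Characters forbidden in keys: . $ # [ ] / and ASCII control characters 0-31, 127.
_FORBIDDEN = re.compile(r"[.$#\[\]/\x00-\x1f\x7f]")


def sanitize_keys(value: Any, replacement: str = "_") -> Any:
    """Return a copy of `value` with forbidden characters in dict keys replaced."""
    if isinstance(value, dict):
        return {
            _FORBIDDEN.sub(replacement, key): sanitize_keys(val, replacement)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [sanitize_keys(item, replacement) for item in value]
    return value  # values such as "$8,300" are allowed and left untouched


row = {"Rk": 1, "H/A": "H", "TFS $50K": "$8,300", "Var": "$0"}
print(json.dumps(sanitize_keys(row)))
```

Be aware that two keys which differ only in forbidden characters would collapse into one, with the later value winning.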

Find Duplicate JSON Keys in Sublime Text 3

I have a JSON file that, for now, is validated by hand prior to being placed into production. Ideally, this is an automated process, but for now this is the constraint.
One thing I found helpful in Eclipse was the JSON tooling, which would highlight duplicate keys in JSON files. Is there similar functionality in Sublime Text or through a plugin?
The following JSON, for example, could produce a warning about duplicate keys.
{
"a": 1,
"b": 2,
"c": 3,
"a": 4,
"d": 5
}
Thanks!
There are plenty of JSON validators available online. I just tried one and it picked out the duplicate key right away. The problem with using Sublime-based JSON linters like JSONLint is that they use Python's json module, which does not raise an error on duplicate keys:
import json
json_str = """
{
"a": 1,
"b": 2,
"c": 3,
"a": 4,
"d": 5
}"""
py_data = json.loads(json_str)  # parse the JSON into a Python dict
print(py_data)
yields (on Python 3.7+, where dicts keep insertion order)
{'a': 4, 'b': 2, 'c': 3, 'd': 5}
showing that the value of the first "a" key is overwritten by the second. So you'll need another, non-Python-based tool.
Even Python documentation says that:
The RFC specifies that the names within a JSON object should be
unique, but does not mandate how repeated names in JSON objects should
be handled. By default, this module does not raise an exception;
instead, it ignores all but the last name-value pair for a given name:
>>> weird_json = '{"x": 1, "x": 2, "x": 3}'
>>> json.loads(weird_json)
{'x': 3}
The object_pairs_hook parameter can be used to alter this behavior.
So, as the docs point out, you can use object_pairs_hook:
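import json


class JsonUniqueKeysChecker:
    """object_pairs_hook that rejects duplicate keys within one JSON object."""

    def check(self, pairs):
        seen = set()  # keys of the current object only, so nested objects don't clash
        for key, _value in pairs:
            if key in seen:
                raise ValueError("Non unique Json key: '%s'" % key)
            seen.add(key)
        return dict(pairs)  # the hook's return value replaces the decoded dict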
And then:
c = JsonUniqueKeysChecker()
print(json.loads(json_str, object_pairs_hook=c.check)) # raises
JSON is a very simple format and not very detailed, so things like that can be painful. Detecting duplicate keys is easy, but I bet it's quite a lot of work to turn that into a plugin.
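Building on the same object_pairs_hook idea, here is a sketch that collects every duplicated key instead of stopping at the first one, which is closer to what an editor warning would show. `find_duplicate_keys` is a hypothetical name, and it reports key names only, not line numbers.

```python
import json
from typing import Any, List, Tuple


def find_duplicate_keys(json_text: str) -> List[str]:
    """Return one entry per repeated occurrence of a key within a single JSON object."""
    duplicates: List[str] = []

    def hook(pairs: List[Tuple[str, Any]]) -> dict:
        seen = set()  # reset for every object, so nested objects are checked separately
        for key, _value in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)  # keep normal json.loads behaviour (last value wins)

    json.loads(json_text, object_pairs_hook=hook)
    return duplicates


print(find_duplicate_keys('{"a": 1, "b": 2, "c": 3, "a": 4, "d": 5}'))
```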