Nest and merge JSON based on namespaces - json

I've got the following JSON document:
(sorted by namespace; any namespace can appear multiple times)
[ {"namespace": "/" , "exports": {"a": 10, "b": 11}}
, {"namespace": "/" , "exports": {"c": 12, "d": 13}}
, {"namespace": "/bar" , "exports": {"e": 14, "f": 15}}
, {"namespace": "/bar/baz", "exports": {"g": 16, "h": 17}}
]
that I need to convert into this JSON document:
(the risk of key collisions can be ignored)
{ "a": 10
, "b": 11
, "c": 12
, "d": 13
, "bar": { "e": 14
, "f": 15
, "baz": { "g": 16
, "h": 17}}}
Note that whilst nesting namespaces we only keep their basename e.g.,
/
/bar
/bar/baz
/bar/baz/bat
becomes:
{"bar": {"baz": {"bat": {}}}}
Members must be nested under their corresponding namespace object, with the only exception being members of the root "/" namespace, which become top-level properties.
I've scratched my head a few times over this problem as I'd like to get this done in one pass (ideally) but I'm open to any other suggestions.

Convert namespaces to paths and you can build the desired output using getpath/setpath built-ins.
reduce .[] as {$namespace, $exports} ({};
  ($namespace | ltrimstr("/") | split("/")) as $path
  | setpath($path; getpath($path) + $exports)
)
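For readers who want to check the logic outside jq, the same one-pass idea can be sketched in Python. This is only an illustration under the question's assumptions (no key collisions); the function name nest_by_namespace is made up for this example.

```python
def nest_by_namespace(records):
    """Merge each record's exports into a nested dict keyed by namespace segments."""
    root = {}
    for record in records:
        # "/" -> [], "/bar/baz" -> ["bar", "baz"]
        segments = [s for s in record["namespace"].split("/") if s]
        node = root
        for segment in segments:
            # Walk down, creating intermediate namespace objects on demand.
            node = node.setdefault(segment, {})
        # Collisions are ignored, as stated in the question.
        node.update(record["exports"])
    return root


records = [
    {"namespace": "/", "exports": {"a": 10, "b": 11}},
    {"namespace": "/", "exports": {"c": 12, "d": 13}},
    {"namespace": "/bar", "exports": {"e": 14, "f": 15}},
    {"namespace": "/bar/baz", "exports": {"g": 16, "h": 17}},
]
result = nest_by_namespace(records)
```

As with the setpath version, the root namespace maps to an empty path, so its members land directly on the top-level object.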

Related

Error while reading JSON file in chunksizes with python

I have a large json file, so I want to read the file in chunks while testing. I have implemented the code below:
if fpath.endswith('.json'):
    with open(fpath, 'r') as f:
        read_query = pd.read_json(f, lines=True, chunksize=100)
        for chunk in read_query:
            print(chunk)
I get the error:
  File "nameoffile.py", line 168, in read_queries_func
    for chunk in read_query:
  File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 798, in __next__
    obj = self._get_object_parser(lines_json)
  File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 770, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 885, in parse
    self._parse_no_numpy()
  File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 1159, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value
Why am I getting an error?
The JSON file looks like this:
[
{
"a": "13",
"b": "55"
},
{
"a": "15",
"b": "16"
},
{
"a": "18",
"b": "45"
},
{
"a": "1650",
"b": "26"
},
.
.
.
{
"a": "214",
"b": "23"
}
]
Also, is there a way to extract just the 'a' attribute's values while reading the file? Or can that only be done after I've read the file?
Your JSON file contains just one top-level value: a single array spread over many lines. As per the line-delimited JSON documentation, to which the documentation of the chunksize argument points:
pandas is able to read and write line-delimited json files that are common in data processing pipelines using Hadoop or Spark.
For line-delimited json files, pandas can also return an iterator which reads in chunksize lines at a time. This can be useful for large files or to read from a stream.
Using chunksize also requires lines=True, and the doc for lines says:
Read the file as a json object per line.
This means that files like this work:
{"a": 1, "b": 2}
{"a": 3, "b": 4}
{"a": 5, "b": 6}
{"a": 7, "b": 8}
{"a": 9, "b": 10}
These don’t:
[
{"a": 1, "b": 2},
{"a": 3, "b": 4},
{"a": 5, "b": 6},
{"a": 7, "b": 8},
{"a": 9, "b": 10}
]
So you either have to read the file in one go, or convert it first so that it has one object per line.
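One possible way to do that conversion uses only the standard library. This is a rough sketch: the helper names array_to_json_lines and iter_key are invented for this example, and the conversion step itself still loads the whole array once.

```python
import json


def array_to_json_lines(src_path, dst_path):
    """Rewrite a file holding one JSON array as one JSON object per line."""
    with open(src_path, "r", encoding="utf-8") as src:
        records = json.load(src)  # reads the whole array into memory once
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            dst.write(json.dumps(record) + "\n")


def iter_key(jsonl_path, key):
    """Yield one key's value per line, parsing a single line at a time."""
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)[key]
```

After the conversion, pd.read_json(dst_path, lines=True, chunksize=100) should iterate over the new file in chunks, and iter_key(dst_path, "a") addresses the second part of the question by pulling out just the 'a' values while reading.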

How can I emit delimited text (like CSV) from Jq?

When using Jq for data processing, it's often more convenient to emit the processed text in some kind of "delimited" form that other CLI tools can consume, such as Awk, Cut, and the read builtin in Bash.
Is there a straightforward way to achieve this?
Sample data:
[
{"a": 11, "b": 12, "c": 13},
{"a": 21, "b": 22, "c": 23},
{"a": 31, "b": 32, "c": 33},
{"a": 41, "b": 42, "c": 43}
]
Desired output:
a,c
11,13
21,23
31,33
41,43
jq --raw-output 'map({ a, c }) | (.[0] | keys_unsorted), (.[] | [.[]]) | @csv'
Will produce:
"a","c"
11,13
21,23
31,33
41,43
If you can assume that the attribute names are the same in all array elements, you can use the @csv formatter along with --raw-output.
Put this in a script like json-records-to-csv.jq, adjusting the shebang as needed:
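#!/usr/bin/jq --raw-output -f
# Like `keys_unsorted`; extracts object values as an array, in insertion order.
# Note: this shadows jq's builtin `values` within this script.
def values:
  to_entries | map(.value)
;
# Get the column names from the first array element's keys.
# keys_unsorted keeps insertion order, so the header lines up with `values`.
(.[0] | keys_unsorted | @csv)
,
# Get the values from every array element
(.[] | values | @csv)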
Usage example:
json-records-to-csv.jq <<'JSON'
[
{"a": 11, "b": 12, "c": 13},
{"a": 21, "b": 22, "c": 23},
{"a": 31, "b": 32, "c": 33},
{"a": 41, "b": 42, "c": 43}
]
JSON
Output:
"a","b","c"
11,12,13
21,22,23
31,32,33
41,42,43
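If the next step in the pipeline is a Python script rather than Awk or Cut, the same records-to-CSV step can be sketched with the standard csv module. The function name records_to_csv and the explicit column list are assumptions made for this example; like the jq answers, it expects every record to carry the requested keys.

```python
import csv
import io


def records_to_csv(records, columns):
    """Render a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for record in records:
        # Pick the columns explicitly so header and values always line up.
        writer.writerow([record[column] for column in columns])
    return buffer.getvalue()


records = [
    {"a": 11, "b": 12, "c": 13},
    {"a": 21, "b": 22, "c": 23},
    {"a": 31, "b": 32, "c": 33},
    {"a": 41, "b": 42, "c": 43},
]
output = records_to_csv(records, ["a", "c"])
```

Unlike @csv, the csv module only quotes fields that need it, so the header comes out as a,c rather than "a","c".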

JQ: how can I remove keys based on regex?

I would like to remove all keys that start with "hide". Important to note that the keys may be nested at many levels. I'd like to see the answer using a regex, although I recognise that in my example a simple contains would suffice. (I don't know how to do this with contains, either, BTW.)
Input JSON 1:
{
"a": 1,
"b": 2,
"hideA": 3,
"c": {
"d": 4,
"hide4": 5
}
}
Desired output JSON:
{
"a": 1,
"b": 2,
"c": {
"d": 4
}
}
Input JSON 2:
{
"a": 1,
"b": 2,
"hideA": 3,
"c": {
"d": 4,
"hide4": 5
},
"e": null,
"f": "hiya",
"g": false,
"h": [{
"i": 343.232,
"hide9": "private",
"so_smart": true
}]
}
Thanks!
Since you're just checking the start of the keys, you could use startswith/1 instead in this case, otherwise you could use test/1 or test/2. Then you could pass those paths to be removed to delpaths/1.
You might want to filter the key by strings (or convert to strings) beforehand to account for arrays in your tree.
delpaths([paths | select(.[-1] | strings | startswith("hide"))])
delpaths([paths | select(.[-1] | strings | test("^hide"; "i"))])
A straightforward approach to the problem is to use walk in conjunction with with_entries, e.g.
walk(if type == "object"
     then with_entries(select(.key | test("^hide") | not))
     else . end)
If your jq does not have walk/1 simply include its def (available e.g. from https://raw.githubusercontent.com/stedolan/jq/master/src/builtin.jq) before invoking it.
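The recursive idea behind the walk approach can also be written out by hand. The following Python sketch is only meant to illustrate the traversal; drop_keys and HIDE are names chosen for this example, and the input is "Input JSON 2" from the question.

```python
import re


def drop_keys(value, pattern):
    """Return a copy of value without any object keys matching pattern, at any depth."""
    if isinstance(value, dict):
        return {
            key: drop_keys(child, pattern)
            for key, child in value.items()
            if not pattern.search(key)
        }
    if isinstance(value, list):
        # Arrays have no keys of their own, but may contain objects.
        return [drop_keys(child, pattern) for child in value]
    return value  # scalars are returned unchanged


HIDE = re.compile(r"^hide")

doc = {
    "a": 1,
    "b": 2,
    "hideA": 3,
    "c": {"d": 4, "hide4": 5},
    "e": None,
    "f": "hiya",
    "g": False,
    "h": [{"i": 343.232, "hide9": "private", "so_smart": True}],
}
cleaned = drop_keys(doc, HIDE)
```

Because the function builds new containers instead of deleting in place, the original document is left untouched.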

Almost automatically sorting keys with `jq`, but keep "id" key, if present, on top

Is there a way to sort the keys of a JSON using jq but keeping keys named "id" as first descendants on all trees? It's nice to have a way to easily compare JSON files to one another and normalizing key order and formatting is a great way to ensure they are easy to match, but sometimes the "id" key is the one we are looking for and it's not always easy to find if it's buried in the middle of the tree.
As an example, this:
{
"z-displacement": 3,
"absorption": 0.4,
"collections": [
{
"b": 12,
"a": 18,
"id": 190
},
{
"m": 22,
"id": 169,
"n": 3
}
],
"id": 256767
}
Would become something like:
{
"id": 256767,
"absorption": 0.4,
"collections": [
{
"id": 190,
"a": 18,
"b": 12
},
{
"id": 169,
"m": 22,
"n": 3
}
],
"z-displacement": 3
}
Assuming you are using jq 1.4 or later, the following will do what is requested for all the JSON objects in the input, not only those at the top level:
def reorder:
  (if has("id") then {id} else null end) + (to_entries | sort | from_entries);
walk(if type == "object" then reorder else . end)
If your jq does not have walk/1, you can snarf its def from the jq FAQ https://github.com/stedolan/jq/wiki/FAQ or from the "master" version of builtin.jq
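I have no idea how robust this is, but it puts the top-level "id" first. Note that nested objects are only sorted alphabetically, and an input without an "id" key gains "id": null.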
jq -S '.' | jq '{id} + .'
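For comparison, here is a small Python sketch of the recursive variant, which puts "id" first at every level and sorts the remaining keys. The function name reorder mirrors the jq definition above but is otherwise just an illustration; it relies on Python dicts preserving insertion order (3.7+).

```python
import json


def reorder(value):
    """Sort object keys at every depth, keeping "id" (if present) first."""
    if isinstance(value, dict):
        # False sorts before True, so "id" wins; the rest sort by name.
        keys = sorted(value, key=lambda key: (key != "id", key))
        return {key: reorder(value[key]) for key in keys}
    if isinstance(value, list):
        return [reorder(item) for item in value]
    return value


doc = {
    "z-displacement": 3,
    "absorption": 0.4,
    "collections": [
        {"b": 12, "a": 18, "id": 190},
        {"m": 22, "id": 169, "n": 3},
    ],
    "id": 256767,
}
result = reorder(doc)
text = json.dumps(result, indent=2)
```

Objects without an "id" key are simply sorted; no "id": null is introduced.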

How to convert a regular dataframe into JSON?

I've seen different conversions done on Stack, and none of them have the results I need. I have a data frame that I imported from an Excel file, manipulated, and want to export as a JSON file. I have tried this:
exportJson <- toJSON(data)
print(exportJson)
write(exportJson, "test.json")
json_data <- fromJSON(file="test.json")
My data looks like this:
Jill Jimmie Alex Jane
Jill Jill 0 Jill Jill
Jimmie 0 Jimmie Jimmie 0
Alex 0 Alex Alex 0
Jane Jane Jane Jane 0
My output looks like this:
{
"Jill": ["Jill",
"0",
"0",
"Jane",
"0",
"0",
"0",
"0",
"0",
"0",
...
when I need it to look like this format:
{
"nodes": [
{
"id": "id1",
"name": "Jill",
"val": 1
},
{
"id": "id2",
"name": "Jill",
"val": 10
},
(...)
],
"links": [
{
"source": "id1",
"target": "id2"
},
(...)
]
}
I've seen ways of converting JSON to a dataframe, and I am aware of RJSONIO, jsonlite, rjson, etc. I've googled it, and maybe I am just missing an obvious answer.
The '.' command in jq will reformat the JSON data. Using the jqr package:
library(jqr)
# Unformatted (no whitespace)
x <- '{"a":1,"b":2,"c":[1,2,3],"d":{"e":1,"f":2}}'
jq(x, '.')
Output reformatted (with whitespace)
{
"a": 1,
"b": 2,
"c": [
1,
2,
3
],
"d": {
"e": 1,
"f": 2
}
}
jq is also available as a standalone utility: https://stedolan.github.io/jq/