How can I emit delimited text (like CSV) from Jq?

When using Jq for data processing, it's often more convenient to emit the processed text in some kind of "delimited" form that other CLI tools can consume, such as Awk, Cut, and the read builtin in Bash.
Is there a straightforward way to achieve this?
Sample data:
[
{"a": 11, "b": 12, "c": 13},
{"a": 21, "b": 22, "c": 23},
{"a": 31, "b": 32, "c": 33},
{"a": 41, "b": 42, "c": 43}
]
Desired output:
a,c
11,13
21,23
31,33
41,43

If you can assume that the attribute names are the same in all array elements, you can use the @csv formatter along with --raw-output:
jq --raw-output 'map({ a, c }) | (.[0] | keys_unsorted), (.[] | [.[]]) | @csv'
This will produce:
"a","c"
11,13
21,23
31,33
41,43

Put this in a script like json-records-to-csv.jq, adjusting the shebang as needed:
#!/usr/bin/jq --raw-output -f
# Like `keys`, but extracts the object's values as an array.
def values:
  to_entries | map(.value)
;
# Get the column names from the first array element's keys
# (keys_unsorted keeps them in the same order as the values)
(.[0] | keys_unsorted | @csv)
,
# Get the values from every array element
(.[] | values | @csv)
Usage example:
json-records-to-csv.jq <<'JSON'
[
{"a": 11, "b": 12, "c": 13},
{"a": 21, "b": 22, "c": 23},
{"a": 31, "b": 32, "c": 33},
{"a": 41, "b": 42, "c": 43}
]
JSON
Output:
"a","b","c"
11,12,13
21,22,23
31,32,33
41,42,43

Related

Error while reading JSON file in chunksizes with python

I have a large json file, so I want to read the file in chunks while testing. I have implemented the code below:
if fpath.endswith('.json'):
    with open(fpath, 'r') as f:
        read_query = pd.read_json(f, lines=True, chunksize=100)
        for chunk in read_query:
            print(chunk)
I get the error:
File "nameoffile.py", line 168, in read_queries_func
for chunk in read_query:
File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 798, in __next__
obj = self._get_object_parser(lines_json)
File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 770, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 885, in parse
self._parse_no_numpy()
File "C:\Users\Me\Python38\lib\site-packages\pandas\io\json\_json.py", line 1159, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value
Why am I getting an error?
The JSON file looks like this:
[
{
"a": "13",
"b": "55"
},
{
"a": "15",
"b": "16"
},
{
"a": "18",
"b": "45"
},
{
"a": "1650",
"b": "26"
},
.
.
.
{
"a": "214",
"b": "23"
}
]
Also, is there a way to extract just the 'a' attribute's values while reading the file? Or can that only be done after I've read the file?
Your JSON file contains a single top-level value (one array spread over many lines), not one object per line. As per the line-delimited json doc to which the doc of the chunksize argument points:
pandas is able to read and write line-delimited json files that are common in data processing pipelines using Hadoop or Spark.
For line-delimited json files, pandas can also return an iterator which reads in chunksize lines at a time. This can be useful for large files or to read from a stream.
The chunksize argument can only be used together with lines=True, and the doc for lines says:
Read the file as a json object per line.
This means that files like this work:
{"a": 1, "b": 2}
{"a": 3, "b": 4}
{"a": 5, "b": 6}
{"a": 7, "b": 8}
{"a": 9, "b": 10}
These don’t:
[
{"a": 1, "b": 2},
{"a": 3, "b": 4},
{"a": 5, "b": 6},
{"a": 7, "b": 8},
{"a": 9, "b": 10}
]
So you have to read the file in one go, or modify it as you go to have one object per line.
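One way to follow the second suggestion is to rewrite the array as line-delimited JSON once and read that file in chunks afterwards. The sketch below uses only the standard library and assumes the array fits in memory for the one-off conversion; the names `json_array_to_lines` and `values_of` are illustrative, not part of pandas.

```python
import json


def json_array_to_lines(array_text):
    """Convert the text of a JSON array into line-delimited JSON (one record per line)."""
    records = json.loads(array_text)  # parses the whole array once
    return "\n".join(json.dumps(record) for record in records) + "\n"


def values_of(array_text, key):
    """Return the value stored under `key` in every record of a JSON array."""
    return [record[key] for record in json.loads(array_text)]


# Small stand-in for the file contents shown in the question.
sample = '[{"a": "13", "b": "55"}, {"a": "15", "b": "16"}, {"a": "18", "b": "45"}]'

json_lines = json_array_to_lines(sample)
print(json_lines, end="")
print(values_of(sample, "a"))
```

Once text in this form has been written to a file, that file can be read with pd.read_json(path, lines=True, chunksize=100). `values_of` covers the follow-up question: with the standard json module, the 'a' values of a single top-level array are picked out after the array has been parsed.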

Nest and merge JSON based on namespaces

I've got the following JSON document:
(sorted by namespace; any namespace can appear multiple times)
[ {"namespace": "/" , "exports": {"a": 10, "b": 11}}
, {"namespace": "/" , "exports": {"c": 12, "d": 13}}
, {"namespace": "/bar" , "exports": {"e": 14, "f": 15}}
, {"namespace": "/bar/baz", "exports": {"g": 16, "h": 17}}
]
that I need to convert into this JSON document:
(the risk of key collisions can be ignored)
{ "a": 10
, "b": 11
, "c": 12
, "d": 13
, "bar": { "e": 14
, "f": 15
, "baz": { "g": 16
, "h": 17}}}
Note that whilst nesting namespaces we only keep their basename e.g.,
/
/bar
/bar/baz
/bar/baz/bat
becomes:
{"bar": {"baz": {"bat": {}}}}
Members must be nested under their corresponding namespace object, with the only exception of members of the root "/" namespace, which become top-level properties.
I've scratched my head a few times over this problem as I'd like to get this done in one pass (ideally) but I'm open to any other suggestions.
Convert namespaces to paths and you can build the desired output using getpath/setpath built-ins.
reduce .[] as {$namespace, $exports} ({};
($namespace | ltrimstr("/") | split("/")) as $path
| setpath($path; getpath($path) + $exports)
)
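For readers who find the reduce easier to follow in an imperative form, here is a rough Python sketch of the same idea: split each namespace into path segments, walk (or create) the nested objects along that path, and merge the exports there. The function name `nest_by_namespace` is hypothetical, and key collisions are ignored as in the question.

```python
def nest_by_namespace(entries):
    """Merge each entry's exports into a nested dict addressed by its namespace."""
    root = {}
    for entry in entries:
        # "/" -> [], "/bar/baz" -> ["bar", "baz"]
        path = [part for part in entry["namespace"].split("/") if part]
        node = root
        for name in path:
            # Create intermediate namespace objects on demand.
            node = node.setdefault(name, {})
        node.update(entry["exports"])
    return root


document = [
    {"namespace": "/", "exports": {"a": 10, "b": 11}},
    {"namespace": "/", "exports": {"c": 12, "d": 13}},
    {"namespace": "/bar", "exports": {"e": 14, "f": 15}},
    {"namespace": "/bar/baz", "exports": {"g": 16, "h": 17}},
]

nested = nest_by_namespace(document)
print(nested)
```

Like the jq filter, this makes a single pass over the input and does not rely on the entries being sorted by namespace.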

How do I transform this JSON data using JQ to extract each nested array element to the top level in turn?

Given input of the form
[
{"a": 1, "b": [{"c": 1}, {"c": 2}]},
{"a": 2, "b": [{"c": 4}, {"c": 5}]}
]
I'm trying to transform to look like:
[
{"a": 1, "b": [{"c": 1}],
{"a": 1, "b": [{"c": 2}],
{"a": 2, "b": [{"c": 3}],
{"a": 2, "b": [{"c": 4}]
]
I have [map(.b) ] | flatten, however any further operation using the parent context does not seem to be possible. I'm really stuck and would appreciate any help.
Thanks
Here's a straightforward solution that makes no mention of any keys besides "b":
map(. + (.b[] | {b: [.]}))
You can try this filter:
jq 'map({a,"b":.b[]|[.]})' file
It updates the content of b with each value of c separately.

JQ: how can I remove keys based on regex?

I would like to remove all keys that start with "hide". Important to note that the keys may be nested at many levels. I'd like to see the answer using a regex, although I recognise that in my example a simple contains would suffice. (I don't know how to do this with contains, either, BTW.)
Input JSON 1:
{
"a": 1,
"b": 2,
"hideA": 3,
"c": {
"d": 4,
"hide4": 5
}
}
Desired output JSON:
{
"a": 1,
"b": 2,
"c": {
"d": 4
}
}
Input JSON 2:
{
"a": 1,
"b": 2,
"hideA": 3,
"c": {
"d": 4,
"hide4": 5
},
"e": null,
"f": "hiya",
"g": false,
"h": [{
"i": 343.232,
"hide9": "private",
"so_smart": true
}]
}
Thanks!
Since you're just checking the start of the keys, you could use startswith/1 instead in this case, otherwise you could use test/1 or test/2. Then you could pass those paths to be removed to delpaths/1.
You might want to filter the key by strings (or convert to strings) beforehand to account for arrays in your tree.
delpaths([paths | select(.[-1] | strings | startswith("hide"))])
delpaths([paths | select(.[-1] | strings | test("^hide"; "i"))])
A straightforward approach to the problem is to use walk in conjunction with with_entries, e.g.
walk(if type == "object"
then with_entries(select(.key | test("^hide") | not))
else . end)
If your jq does not have walk/1 simply include its def (available e.g. from https://raw.githubusercontent.com/stedolan/jq/master/src/builtin.jq) before invoking it.
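To make the recursion behind the walk-based answer explicit, the following Python sketch does the same thing by hand: it rebuilds objects without the matching keys and descends into arrays. The helper name `drop_keys` is made up for illustration; the pattern "^hide" is the one used in the jq filter.

```python
import re


def drop_keys(value, pattern):
    """Return a copy of `value` with every object key matching `pattern` removed, at any depth."""
    if isinstance(value, dict):
        return {
            key: drop_keys(child, pattern)
            for key, child in value.items()
            if not re.search(pattern, key)
        }
    if isinstance(value, list):
        return [drop_keys(item, pattern) for item in value]
    return value  # scalars (numbers, strings, booleans, null) are kept as-is


data = {
    "a": 1,
    "b": 2,
    "hideA": 3,
    "c": {"d": 4, "hide4": 5},
    "e": None,
    "f": "hiya",
    "g": False,
    "h": [{"i": 343.232, "hide9": "private", "so_smart": True}],
}

cleaned = drop_keys(data, r"^hide")
print(cleaned)
```

Only keys are tested against the pattern, so a string value such as "hiya" is left alone.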

Assigning parent keys in innermost object using JQ

I would like to turn this:
{
"a": 1,
"b": [1,2,3,4]
}
into this
[
{"a": 1, "b": 1},
{"a": 1, "b": 2},
...
]
This is sort of like python's zip but with unequally shaped objects.
Thanks!
Here is a solution:
$ jq -Mc '[.b=.b[]]' data.json
If data.json contains the sample data the output is
[{"a":1,"b":1},{"a":1,"b":2},{"a":1,"b":3},{"a":1,"b":4}]
You can use cat ab.json|jq '[{"a": .a, "b": .b[]}]' to get the answer.
If minimizing keystrokes is the goal, then consider:
jq '.+{b:.b[]}' <<< "$j"
{
"a": 1,
"b": 1
}
{
"a": 1,
"b": 2
}
{
"a": 1,
"b": 3
}
{
"a": 1,
"b": 4
}
Using . here ensures that all keys other than "b" will be preserved. By contrast, if one wants to ignore all the keys other than "a" and "b", then one could use the jq filter:
{a,b:.b[]}
To turn the stream into an array, just wrap the expression in square brackets: [ ... ]
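Since the question compares this to Python's zip, here is a small Python sketch of the key-preserving variant (the counterpart of `.+{b:.b[]}` wrapped in square brackets). The function name `expand_over` is hypothetical.

```python
def expand_over(record, key):
    """Return one copy of `record` per element of record[key], with that element stored under `key`."""
    return [{**record, key: item} for item in record[key]]


data = {"a": 1, "b": [1, 2, 3, 4]}
expanded = expand_over(data, "b")
print(expanded)
```

All keys other than the expanded one are copied unchanged into every result, which mirrors the use of `.` in the jq filter.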