How to explode/unwrap all documents using jq?

I have a very large file that looks like this:
[
{a: 4, b: [1,2,3]},
{a: 6, b: [7,8,9]},
]
and I would like to transform it to
{a: 4, b: 1},
{a: 4, b: 2},
{a: 4, b: 3},
{a: 6, b: 7},
{a: 6, b: 8},
{a: 6, b: 9}
using jq. The filter .[] | {a: .a, b: .b[]} would work for a smaller input. Given the size of the file, I want to use --stream. Could anyone give a pointer on how to use streaming to solve this problem?

If the "very large file" fits into your memory, just decompose the array .[], and create your objects as needed using iterations {a, b: .b[]}:
jq -c '.[] | {a, b: .b[]}'
{"a":4,"b":1}
{"a":4,"b":2}
{"a":4,"b":3}
{"a":6,"b":7}
{"a":6,"b":8}
{"a":6,"b":9}
If the whole file does not fit into memory but a single array item does, use the --stream flag to read the file in parts, keep only the item level by applying truncate_stream with depth 1, re-compose the array items using fromstream, and create the final objects as above:
jq --stream -cn 'fromstream(1 | truncate_stream(inputs)) | {a, b: .b[]}'
{"a":4,"b":1}
{"a":4,"b":2}
{"a":4,"b":3}
{"a":6,"b":7}
{"a":6,"b":8}
{"a":6,"b":9}
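To try both variants without the large file, the sample from the question can be rewritten as valid JSON (quoted keys, no trailing comma) and fed to each filter. This is a minimal sketch assuming jq 1.5 or later is on the PATH; the variable names are only illustrative.

```shell
# Sample from the question, rewritten as valid JSON
# (assumption: the real file is valid JSON as well).
sample='[{"a":4,"b":[1,2,3]},{"a":6,"b":[7,8,9]}]'

# In-memory variant: the whole array is parsed before the filter runs.
in_memory=$(printf '%s' "$sample" | jq -c '.[] | {a, b: .b[]}')

# Streaming variant: one top-level array item is rebuilt at a time.
streamed=$(printf '%s' "$sample" | jq --stream -cn 'fromstream(1 | truncate_stream(inputs)) | {a, b: .b[]}')

printf '%s\n' "$streamed"
```

Both variables should hold the same six lines. The difference lies in how much of the input jq has to keep in memory at once, not in the output.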

Related

Count elements in nested JSON with jq

I am trying to count all elements in a nested JSON document with jq.
Given the following JSON-document
{"a": true, "b": [1, 2], "c": {"a": {"aa":1, "bb": 2}, "b": "blue"}}
I want to calculate the result 6.
In order to do this, I tried the following:
echo '{"a": true, "b": [1, 2], "c": {"a": {"aa":1, "bb": 2}, "b": "blue"}}' \
| jq 'reduce (.. | if (type == "object" or type == "array")
then length else 0 end) as $counts
(1; . + $counts)'
# Actual output: 10
# Desired output: 6
However, this counts the encountered objects and arrays as well and therefore yields 10 as opposed to the desired output of 6.
So, how can I only count the document's elements/leaf-nodes?
Thanks in advance for your help!
Edit: What would be an efficient approach to count empty arrays and objects as well?
You can use the scalars filter to find leaf nodes. Scalars are all "simple" JSON values, i.e. null, true, false, numbers and strings. Alternatively you can compare the type of each item and use length to determine if an object or array has children.
I've expanded your input data a little to distinguish a few more corner cases:
Input:
{
"a": true,
"b": [1, 2],
"c": {
"a": {
"aa": 1,
"bb": 2
},
"b": "blue"
},
"d": [],
"e": [[], []],
"f": {}
}
This has 15 JSON entities:
5 of them are arrays or objects with children.
4 of them are empty arrays or objects.
6 of them are scalars.
Depending on what you're trying to do, you might consider only scalars to be "leaf nodes", or you might consider both scalars and empty arrays and objects to be leaf nodes.
Here's a filter that counts scalars:
[..|scalars]|length
Output:
6
And here's a filter that counts all entities which have no children. It just checks for all the scalar types explicitly (there are only six possible types for a JSON value) and if it's not one of those it must be an array or object, where we can check how many children it has with length.
[
..|
select(
(type|IN("boolean","number","string","null")) or
length==0
)
]|
length
Output:
10
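The two filters can be checked against the expanded input from the shell. The sketch below assumes jq 1.6 or later (IN is not available in older releases). The reduce variant is an addition that is not part of the answer above: it counts scalars without first collecting them into an array, which may matter for large documents.

```shell
# Expanded input from the answer above, on one line.
doc='{"a":true,"b":[1,2],"c":{"a":{"aa":1,"bb":2},"b":"blue"},"d":[],"e":[[],[]],"f":{}}'

# Count scalars only.
scalar_count=$(printf '%s' "$doc" | jq '[.. | scalars] | length')

# Count every entity without children: scalars plus empty arrays and objects.
leaf_count=$(printf '%s' "$doc" | jq '[.. | select((type | IN("boolean","number","string","null")) or length == 0)] | length')

# Same scalar count via reduce, avoiding the intermediate array.
scalar_count_reduce=$(printf '%s' "$doc" | jq 'reduce (.. | scalars) as $x (0; . + 1)')

echo "$scalar_count $leaf_count $scalar_count_reduce"
```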

How can I emit delimited text (like CSV) from Jq?

When using Jq for data processing, it's often more convenient to emit the processed text in some kind of "delimited" form that other CLI tools can consume, such as Awk, Cut, and the read builtin in Bash.
Is there a straightforward way to achieve this?
Sample data:
[
{"a": 11, "b": 12, "c": 13},
{"a": 21, "b": 22, "c": 23},
{"a": 31, "b": 32, "c": 33},
{"a": 41, "b": 42, "c": 43}
]
Desired output:
a,c
11,13
21,23
31,33
41,43
jq --raw-output 'map({ a, c }) | ( .[0] | keys_unsorted), (.[] | [.[]]) | @csv'
Will produce:
"a","c"
11,13
21,23
31,33
41,43
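The header line is quoted because @csv quotes every string. If the unquoted header from the question is required, and the values are known to contain no commas, quotes, or newlines, join(",") is one possible substitute. This is a sketch under that assumption (and assuming jq 1.5 or later, where join accepts numbers); join does no CSV escaping at all.

```shell
# Sample data from the question, on one line.
records='[{"a":11,"b":12,"c":13},{"a":21,"b":22,"c":23},{"a":31,"b":32,"c":33},{"a":41,"b":42,"c":43}]'

# join(",") concatenates the fields as-is; unlike @csv it neither quotes nor escapes.
plain_csv=$(printf '%s' "$records" | jq --raw-output 'map({a, c}) | (.[0] | keys_unsorted), (.[] | [.[]]) | join(",")')

printf '%s\n' "$plain_csv"
```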
If you can assume that the attribute names are the same in all array elements, you can use the @csv formatter along with --raw-output:
Put this in a script like json-records-to-csv.jq, adjusting the shebang as needed:
#!/usr/bin/jq --raw-output -f
# Like `keys`; extracts object values as an array.
def values:
to_entries | map(.value)
;
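# Get the column names from the first array element's keys, in insertion order
# (plain `keys` would sort them, which can disagree with the order of the values below)
(.[0] | keys_unsorted | @csv)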
,
# Get the values from every array element values
(.[] | values | @csv)
Usage example:
json-records-to-csv.jq <<'JSON'
[
{"a": 11, "b": 12, "c": 13},
{"a": 21, "b": 22, "c": 23},
{"a": 31, "b": 32, "c": 33},
{"a": 41, "b": 42, "c": 43}
]
JSON
Output:
"a","b","c"
11,12,13
21,22,23
31,32,33
41,42,43
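For the tools named in the question (cut, awk), tab-separated output is often easier to consume than CSV. The sketch below uses jq's @tsv formatter, assuming jq 1.5 or later; the choice of columns and the variable names are only illustrative.

```shell
# Sample data from the question, on one line.
records='[{"a":11,"b":12,"c":13},{"a":21,"b":22,"c":23},{"a":31,"b":32,"c":33},{"a":41,"b":42,"c":43}]'

# @tsv emits one tab-separated line per array; cut splits on tabs by default.
second_column=$(printf '%s' "$records" | jq --raw-output '.[] | [.a, .c] | @tsv' | cut -f 2)

# awk splits on whitespace by default, so the same output can be summed directly.
column_sum=$(printf '%s' "$records" | jq --raw-output '.[] | [.a, .c] | @tsv' | awk '{ total += $2 } END { print total }')

printf '%s\n' "$second_column"
echo "$column_sum"
```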

jq produces memory overflow

I have a JSON file where a time series is stored under the data key and an object id is stored under the info key:
{info:
{id: abc},
data:[
[10, 5, 3],
[12, 6, 4],
# 5000 list items
]
}
I would like to flatten the json and produce something similar to:
[
{id: abc, time: 10, x: 5, y: 3},
{id: abc, time: 12, x: 6, y: 4},
# the rest of 5000 points
]
I'm running a jq query that seems to work well and produces a series of items:
"{time: .data[][0], x: .data[][2], y: .data[][1], item: .info.id}"
When I try to put the same expression into a list to create a list of dicts, I'm hitting a memory overflow limit:
"[{time: .data[][0], x: .data[][2], y: .data[][1], item: .info.id}]"
Is there anything I can do differently? Many thanks in advance.
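@peak has already pointed out the problem with your query. Each .data[] inside the object constructor iterates independently, so the query emits every combination of rows instead of one object per row. Here is a solution based on that insight: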
[ (.data[] | {time: .[0], x: .[1], y: .[2]}) + {id: .info.id} ]
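The size of the blow-up is easy to see on a miniature input. The sketch below uses a hypothetical two-row version of the file; the second filter is an alternative formulation (binding the id to a variable first), not the filter given above, and should produce the same kind of result.

```shell
# Two data rows instead of 5000 (hypothetical miniature of the input).
small='{"info":{"id":"abc"},"data":[[10,5,3],[12,6,4]]}'

# Each .data[] in the object constructor iterates on its own, so N rows
# yield N*N*N objects: 8 here, 125 billion for N = 5000.
product_size=$(printf '%s' "$small" | jq '[{time: .data[][0], x: .data[][2], y: .data[][1], item: .info.id}] | length')

# Iterating over .data once yields exactly one object per row.
flattened=$(printf '%s' "$small" | jq -c '.info.id as $id | [.data[] | {id: $id, time: .[0], x: .[1], y: .[2]}]')

echo "$product_size"
printf '%s\n' "$flattened"
```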

JQ: how can I remove keys based on regex?

I would like to remove all keys that start with "hide". Important to note that the keys may be nested at many levels. I'd like to see the answer using a regex, although I recognise that in my example a simple contains would suffice. (I don't know how to do this with contains, either, BTW.)
Input JSON 1:
{
"a": 1,
"b": 2,
"hideA": 3,
"c": {
"d": 4,
"hide4": 5
}
}
Desired output JSON:
{
"a": 1,
"b": 2,
"c": {
"d": 4
}
}
Input JSON 2:
{
"a": 1,
"b": 2,
"hideA": 3,
"c": {
"d": 4,
"hide4": 5
},
"e": null,
"f": "hiya",
"g": false,
"h": [{
"i": 343.232,
"hide9": "private",
"so_smart": true
}]
}
Thanks!
Since you're just checking the start of the keys, you could use startswith/1 instead in this case, otherwise you could use test/1 or test/2. Then you could pass those paths to be removed to delpaths/1.
You might want to filter the key by strings (or convert to strings) beforehand to account for arrays in your tree.
delpaths([paths | select(.[-1] | strings | startswith("hide"))])
delpaths([paths | select(.[-1] | strings | test("^hide"; "i"))])
A straightforward approach to the problem is to use walk in conjunction with with_entries, e.g.
walk(if type == "object"
then with_entries(select(.key | test("^hide") | not))
else . end)
If your jq does not have walk/1 simply include its def (available e.g. from https://raw.githubusercontent.com/stedolan/jq/master/src/builtin.jq) before invoking it.
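To confirm that the delpaths and walk approaches agree, both can be run on Input JSON 2 from the question. This is a sketch assuming jq 1.6 or later (for the built-in walk) built with regex support (for test); -S sorts keys so the two outputs can be compared as strings.

```shell
# Input JSON 2 from the question, on one line.
input='{"a":1,"b":2,"hideA":3,"c":{"d":4,"hide4":5},"e":null,"f":"hiya","g":false,"h":[{"i":343.232,"hide9":"private","so_smart":true}]}'

# Approach 1: collect every path whose last component is a string starting with "hide", then delete them.
via_delpaths=$(printf '%s' "$input" | jq -S -c 'delpaths([paths | select(.[-1] | strings | test("^hide"))])')

# Approach 2: rebuild every object without the matching keys.
via_walk=$(printf '%s' "$input" | jq -S -c 'walk(if type == "object" then with_entries(select(.key | test("^hide") | not)) else . end)')

printf '%s\n' "$via_walk"
```

Both variables should hold the same document, with hideA, hide4, and hide9 gone and every other key (including the one inside the array under h) intact.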

Merge several json arrays in circe

Let's say we have 2 json arrays. How to merge them into a single array with circe? Example:
Array 1:
[{"id": 1}, {"id": 2}, {"id": 3}]
Array 2:
[{"id": 4}, {"id": 5}, {"id": 6}]
Needed:
[{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}, {"id": 6}]
I've tried deepMerge, but it only keeps the contents of the argument, not of the calling object.
Suppose we've got the following set-up (I'm using circe-literal for convenience, but your Json values could come from anywhere):
import io.circe.Json, io.circe.literal._
val a1: Json = json"""[{"id": 1}, {"id": 2}, {"id": 3}]"""
val a2: Json = json"""[{"id": 4}, {"id": 5}, {"id": 6}]"""
Now we can combine them like this:
for { a1s <- a1.asArray; a2s <- a2.asArray } yield Json.fromValues(a1s ++ a2s)
Or:
import cats.std.option._, cats.syntax.cartesian._
(a1.asArray |@| a2.asArray).map(_ ++ _).map(Json.fromValues)
Both of these approaches are going to give you an Option[Json] that will be None if either a1 or a2 doesn't represent a JSON array. It's up to you to decide what you want to happen in that situation: .getOrElse(a2) or .getOrElse(a1.deepMerge(a2)) might be reasonable choices, for example.
As a side note, the current contract of deepMerge says the following:
Null, Array, Boolean, String and Number are treated as values, and values from the argument JSON completely replace values from this JSON.
This isn't set in stone, though, and it might not be unreasonable to have deepMerge concatenate JSON arrays—if you want to open an issue we can do some more thinking about it.