Parse tab-indented list to JSON with jq

I have a legacy CLI tool that outputs a structured list whose sub-items are indented with a tab (Stack Overflow won't let me put tabs here, so I replaced them with 4 spaces in this example).
Heading One:
    Sub One: 'Value 1'
    Sub Two: 'Value 2'
Heading Two:
    Sub Three: 'Value 3'
    Sub Four: 'Value 4'
Key One: 'This key has no heading'
I am trying to produce JSON output like this:
{
  "Heading One": {
    "Sub One": "Value 1",
    "Sub Two": "Value 2"
  },
  "Heading Two": {
    "Sub Three": "Value 3",
    "Sub Four": "Value 4"
  },
  "Key One": "This key has no heading"
}
Is this possible with jq, or do I need to write a more complex Python script?

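This approach handles arbitrarily deep nesting. It splits the input into top-level items at every newline that is not followed by a tab, using a regex with a negative look-ahead. It then separates each item's head (the text before the first colon) from the rest. The rest is "unindented" by removing one tab after each newline. If the result still spans several lines, it is fed to a recursive call. Otherwise, the value between the single quotes is extracted.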
jq -Rs '
  def comp:
    reduce (splits("\n(?!\\t)") | select(length > 0)) as $item ({};
      ($item | index(":")) as $hpos
      | .[$item[:$hpos]] = (
          $item[$hpos + 1:] | gsub("\n\t"; "\n")
          | if test("\n") then comp else .[index("'\''") + 1: rindex("'\''")] end
        )
    );
  comp
'
Output:
{
  "Heading One": {
    "Sub One": "Value 1",
    "Sub Two": "Value 2"
  },
  "Heading Two": {
    "Sub Three": "Value 3",
    "Sub Four": "Value 4"
  },
  "Key One": "This key has no heading"
}
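The question mentions a Python script as the alternative, so here is a minimal Python sketch of the same recursive idea for comparison. It splits on newlines not followed by a tab and takes the text before the first colon as the key. It then either unindents the rest and recurses, or strips the quotes from a leaf. The function name parse_block is hypothetical. The sketch assumes, like the jq filter, that every leaf value is wrapped in single quotes and that indentation uses real tabs.

```python
import json
import re


def parse_block(text):
    """Convert a tab-indented "Key: 'value'" listing into nested dicts.

    Mirrors the jq comp approach: split on newlines NOT followed by a tab,
    use the text before the first ':' as the key, then either unindent the
    remainder by one tab and recurse, or keep what lies between the quotes.
    """
    result = {}
    for item in re.split(r"\n(?!\t)", text):
        if not item:
            continue  # empty pieces come from leading/trailing newlines
        key, rest = item.split(":", 1)
        rest = rest.replace("\n\t", "\n")  # remove one level of indentation
        if "\n" in rest:
            result[key] = parse_block(rest)  # a heading: recurse into its block
        else:
            # a leaf: keep the text between the first and last single quote
            result[key] = rest[rest.index("'") + 1 : rest.rindex("'")]
    return result


# Example input with real tabs, as the legacy tool is said to produce
sample = (
    "Heading One:\n"
    "\tSub One: 'Value 1'\n"
    "\tSub Two: 'Value 2'\n"
    "Heading Two:\n"
    "\tSub Three: 'Value 3'\n"
    "\tSub Four: 'Value 4'\n"
    "Key One: 'This key has no heading'\n"
)

print(json.dumps(parse_block(sample), indent=2))
```

Deeper nesting works the same way: each recursive call strips one more tab level before splitting again.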

JSON: key-value pairs in groups to "flat" key-value pairs

The keys of each nested JSON object should be combined with _ (or any other suitable separator) to give a simple key-value list.
I have the following structure: a few JSON groups (no arrays), and inside the groups there are key-value pairs. I need to flatten them into a single key-value list. I tried jq, but I only found things about "nested"/"unnested"; I could not find anything about flattening or combining the keys.
So the result should look like "key_subkey_subsubkey": "value"
{
  "welcome": {
    "title": "Hello World"
  },
  "block1": {
    "header": "My Header",
    "body": "My BODY of block 1",
    "footer": "My Footer"
  },
  "multi": {
    "level-01-A": {
      "head": "Head Section",
      "foot": "Foot Section",
      "level-02-A": {
        "head": "Head Section Level 2 A",
        "fead": "Foot Section Level 2 A"
      },
      "level-02-B": {
        "head": "Head Section Level 2 B",
        "fead": "Foot Section Level 2 B"
      }
    },
    "level-01-B": {
      "head": "Head Section",
      "foot": "Foot Section"
    },
    "no-level": "Foo Bar"
  }
}
and I want to get:
{
  "welcome_title": "Hello World",
  "block1_header": "My Header",
  "block1_body": "My BODY of block 1",
  "block1_footer": "My Footer",
  "multi_level-01-A_head": "Head Section",
  "multi_level-01-A_foot": "Foot Section",
  "multi_level-01-A_level-02-A_head": "Head Section Level 2 A",
  "multi_level-01-A_level-02-A_fead": "Foot Section Level 2 A",
  "multi_level-01-A_level-02-B_head": "Head Section Level 2 B",
  "multi_level-01-A_level-02-B_fead": "Foot Section Level 2 B",
  "multi_level-01-B_head": "Head Section",
  "multi_level-01-B_foot": "Foot Section",
  "multi_no-level": "Foo Bar"
}
Any idea what tool I can use?
[ paths(scalars) as $p | { "key": $p | join("_"), "value": getpath($p) } ] | from_entries
This will generate:
{
  "welcome_title": "Hello World",
  "block1_header": "My Header",
  "block1_body": "My BODY of block 1",
  "block1_footer": "My Footer",
  "multi_level-01-A_head": "Head Section",
  "multi_level-01-A_foot": "Foot Section",
  "multi_level-01-A_level-02-A_head": "Head Section Level 2 A",
  "multi_level-01-A_level-02-A_fead": "Foot Section Level 2 A",
  "multi_level-01-A_level-02-B_head": "Head Section Level 2 B",
  "multi_level-01-A_level-02-B_fead": "Foot Section Level 2 B",
  "multi_level-01-B_head": "Head Section",
  "multi_level-01-B_foot": "Foot Section",
  "multi_no-level": "Foo Bar"
}
You can test this in an online jq playground such as jqplay.
paths (jq manual):
paths outputs the paths to all the elements in its input (except it does not output the empty list, representing . itself). With an argument, paths(f) outputs the paths to any values for which f is true, so paths(scalars) yields only the paths to scalar (non-object, non-array) values.
getpath (jq manual):
The builtin function getpath outputs the values in . found at each path in PATHS.
from_entries (jq manual):
These functions convert between an object and an array of key-value pairs. If to_entries is passed an object, then for each k: v entry in the input, the output array includes {"key": k, "value": v}. from_entries does the opposite conversion.
So the steps I took are:
Get every available (nested) path to a scalar value and bind it to a variable:
paths(scalars) as $p
Create an object holding the key and value. We can retrieve the value using getpath:
{ "key": $p | join("_"), "value": getpath($p) }
Collect these objects in an array and use from_entries to convert it into a single object:
| from_entries
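For readers who need the same flattening outside jq, here is a Python sketch of it. This is an illustration, not part of the answer. The hypothetical flatten generator walks dicts and lists and yields one (joined key, value) pair per scalar leaf, as paths(scalars) does. Calling dict() on the result plays the role of from_entries. Empty objects and arrays produce no entries, and list indices become part of the key as strings.

```python
import json


def flatten(node, sep="_", prefix=()):
    """Yield (joined_key, value) for every scalar leaf, like jq's paths(scalars).

    Dicts and lists are descended into; anything else (string, number,
    bool, None) is a leaf. Empty dicts/lists yield nothing, because they
    are not scalars.
    """
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        yield sep.join(str(part) for part in prefix), node
        return
    for key, value in children:
        yield from flatten(value, sep, prefix + (key,))


example = {
    "welcome": {"title": "Hello World"},
    "multi": {"level-01-A": {"head": "Head Section"}, "no-level": "Foo Bar"},
}
# dict() plays the role of from_entries
print(json.dumps(dict(flatten(example)), indent=2))
```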

Remember previous JSON value in array

I have a sorted array, and I want to make a formatting decision based on the value of a JSON boolean in the previous element of the array. The code below uses the current value of igroup. However, I need a MARK on the first line of each run of igroup=true, and SPACE on all subsequent igroup=true lines in that run. To do this, I need to know the previous value of igroup. Using a variable to remember the previous value doesn't seem to be possible, so I am at a loss as to how to make this happen.
Code:
.result
| keys[] as $c
| (
    (.[$c].segments[0].lines | keys[]) as $l
    | [
        "line format",
        (if .[$c].segments[0].lines[$l].igroup then "SPACE" else "MARK" end),
        ([.[$c].segments[0].lines[$l].products[].name] | join(",")),
        "end line format"
      ]
  ),
  ["END","0"]
| join("|")
Example data:
{
  "result": [
    {
      "cn": "abc",
      "segments": [
        {
          "lines": [
            {
              "igroup": false,
              "products": [
                {
                  "name": "Should be MARK"
                }
              ]
            },
            {
              "igroup": false,
              "products": [
                {
                  "name": "Should be MARK"
                },
                {
                  "name": "Addon to MARK"
                }
              ]
            },
            {
              "igroup": true,
              "products": [
                {
                  "name": "Should be MARK First igroup=true BROKEN!!!"
                }
              ]
            },
            {
              "igroup": true,
              "products": [
                {
                  "name": "Should be SPACE !! After first igroup=true"
                }
              ]
            },
            {
              "igroup": true,
              "products": [
                {
                  "name": "Should be SPACE until next igroup=false"
                }
              ]
            },
            {
              "igroup": false,
              "products": [
                {
                  "name": "Should be MARK"
                }
              ]
            }
          ]
        }
      ]
    }
  ],
  "id": 1
}
The current output:
"line format|MARK|Should be MARK|end line format"
"line format|MARK|Should be MARK,Addon to MARK|end line format"
"line format|SPACE|Should be MARK First igroup=true BROKEN!!!|end line format"
"line format|SPACE|Should be SPACE !! After first igroup=true|end line format"
"line format|SPACE|Should be SPACE until next igroup=false|end line format"
"line format|MARK|Should be MARK|end line format"
"END|0"
There are a couple of approaches you could take. You could reduce over the indices so that you can index into the array to check neighboring values, but this requires keeping a separate reference to the original input.
Alternatively, foreach lets you keep track of the previous igroup value in the accumulator:
foreach .result[].segments[].lines[] as {$igroup, $products} (
  [null, false];      # init: pair of previous and current value
  [.[1], $igroup];    # update: current becomes previous, store the new value
  [
    "line format",
    if .[0] and .[0] == $igroup then "SPACE" else "MARK" end,
    ([$products[].name] | join(",")),
    "end line format"
  ]                   # extract: the collection of values to output
), ["END", 0] | join("|")
"line format|MARK|Should be MARK|end line format"
"line format|MARK|Should be MARK,Addon to MARK|end line format"
"line format|MARK|Should be MARK First igroup=true BROKEN!!!|end line format"
"line format|SPACE|Should be SPACE !! After first igroup=true|end line format"
"line format|SPACE|Should be SPACE until next igroup=false|end line format"
"line format|MARK|Should be MARK|end line format"
"END|0"
Tweak the output as needed.
You can try this on jqplay.
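To make the accumulator logic concrete, here is a minimal Python sketch of the same pattern. It uses a hypothetical mark_lines function, which takes a flat list of line objects like those produced by .result[].segments[].lines[]. The sketch carries the previous igroup flag forward while iterating, which is what the [previous, current] pair does in the foreach. The condition `.[0] and .[0] == $igroup` amounts to "previous and current are both true".

```python
def mark_lines(lines):
    """Build the MARK/SPACE rows, remembering the previous igroup flag.

    `previous` plays the role of .[0] in the jq foreach accumulator.
    """
    previous = False  # nothing precedes the first line
    rows = []
    for line in lines:
        current = line["igroup"]
        # SPACE only when this line continues a run of igroup=true lines
        flag = "SPACE" if previous and current else "MARK"
        names = ",".join(product["name"] for product in line["products"])
        rows.append("|".join(["line format", flag, names, "end line format"]))
        previous = current  # shift: current becomes previous
    rows.append("END|0")
    return rows


example = [
    {"igroup": False, "products": [{"name": "a"}]},
    {"igroup": True, "products": [{"name": "b"}]},
    {"igroup": True, "products": [{"name": "c"}]},
]
for row in mark_lines(example):
    print(row)
```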
I was able to make some progress on my own, but I am still not happy with the result. I was able to use the index to check the previous array element's value. I don't like the three nested if-then-else-end blocks, but at least it works. What it does do is give me access to the many other fields in the segments and lines arrays that the test data doesn't show.
.result
| keys[] as $c
| (
    (.[$c].segments[0].lines | keys[]) as $l
    | [
        $l,
        (if $l > 0 then
           (if .[$c].segments[0].lines[$l].igroup then
              (if .[$c].segments[0].lines[$l - 1].igroup then "SPACE" else "MARK" end)
            else "MARK" end)
         else "MARK" end),
        "line format",
        ([.[$c].segments[0].lines[$l].products[].name] | join(",")),
        "end line format"
      ]
  ),
  ["END","0"]
| join("|")
I will look into the solution above as an alternative.
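The three nested if-then-else blocks collapse into a single condition: the index is greater than 0, and both the current and the previous line have igroup set. In jq, that could be written as one expression joined with `and`, for example `$l > 0 and .[$c].segments[0].lines[$l].igroup and .[$c].segments[0].lines[$l - 1].igroup`. The Python sketch below checks the same index-based logic. The function flags_by_index is hypothetical and only returns the flags.

```python
def flags_by_index(lines):
    """Index-based variant: compare lines[i] with lines[i - 1] directly.

    The three nested if-then-else blocks reduce to one condition.
    """
    return [
        "SPACE" if i > 0 and lines[i]["igroup"] and lines[i - 1]["igroup"] else "MARK"
        for i in range(len(lines))
    ]


print(flags_by_index([{"igroup": g} for g in (False, True, True, False)]))
```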

Validate JSON schema for expected values

I have a JSON document that I would like to validate.
It contains an array of objects, and each object has a property called name.
First, I want to validate that there are exactly 3 objects.
Second, I want to validate the value of that property in each object.
{
  "hello": [
    {
      "world": "value 1"
    },
    {
      "world": "value 2"
    },
    {
      "world": "value 3"
    }
  ]
}
I want to use a JSON Schema to validate that the JSON contains value 1, value 2, and value 3.
Using the language of JSON Extended Structural Schemas (JESS), the three requirements could be written in JSON as follows (assuming that you meant world rather than name):
["&",
{ "hello": [ {"world": "string"} ] },
{"forall": ".[hello]|length", "equal": 3 },
{"setof": ".[hello][]|.[world]", "supersetof": ["value 1", "value 2", "value 3" ]}
]
This may not be exactly what you want, e.g. perhaps you want the constraints to be written without reference to the name of the top-level key. This could be accomplished as follows:
["&",
{"forall": ".[]", "schema": [ {"world": "string"} ] },
{"forall": ".[]|length", "equal": 3 },
{"setof": ".[][]|.[world]", "supersetof": ["value 1", "value 2", "value 3" ]}
]
Also you could modify the above to express the requirements without preventing the objects from having additional keys. It all depends on what you really want.
Note that the JESS checker requires jq to run. There is also a Ruby gem for jq.
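Since JESS itself needs jq, here is a dependency-free plain-Python sketch of the same three requirements as a cross-check. It checks for exactly three objects, a string "world" in each, and all of value 1 to value 3 present. It is neither a JSON Schema nor a JESS implementation, and the function name check_hello is hypothetical.

```python
def check_hello(doc, expected=("value 1", "value 2", "value 3")):
    """Check the three requirements from the question in plain Python.

    Not JSON Schema and not JESS: just the same constraints spelled out.
    """
    items = doc.get("hello")
    # 1. "hello" must be an array of exactly three items
    if not isinstance(items, list) or len(items) != 3:
        return False
    # 2. every item must be an object with a string "world" property
    if not all(isinstance(o, dict) and isinstance(o.get("world"), str) for o in items):
        return False
    # 3. the "world" values must include every expected value
    return set(expected) <= {o["world"] for o in items}


print(check_hello({"hello": [{"world": "value 1"}, {"world": "value 2"}, {"world": "value 3"}]}))
```

Like the first JESS variant, this sketch allows the objects to carry extra keys.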

jq: replace null values at any level without touching non-null or nonexistent fields

Please help a jq newbie. :)
I need to update a field with a specific name that might occur at any level of the JSON structure, or might not occur at all. For example, all the *.description fields in the JSON below:
{
  "a": {
    "b": [
      {
        "name": "b0",
        "description": "b0 has description"
      },
      {
        "name": "b1",
        "description": null
      },
      {
        "name": "b2"
      }
    ],
    "description": null
  },
  "s": "Some string value"
}
I need to set the "description" value to a dummy value only if it is null. Existing non-null values must not be touched, and the field must not be created where it does not exist. So the desired result in this case is:
{
  "a": {
    "b": [
      {
        "name": "b0",
        "description": "b0 has description"
      },
      {
        "name": "b1",
        "description": "DUMMY DESCRIPTION"
      },
      {
        "name": "b2"
      }
    ],
    "description": "DUMMY DESCRIPTION"
  },
  "s": "Some string value"
}
Here, .a.b[0].description is left untouched because it exists and is not null. .a.b[1].description and .a.description are set to "DUMMY DESCRIPTION" because these fields exist and are null. Finally, .a.b[2] and the root level are left untouched because they have no description field at all.
If, for example, I use a command on known paths like this:
jq '.known.level.description //= "DUMMY DESCRIPTION"' ........
it does not skip non-existent fields such as .a.b[2].description (it creates them), and of course it only works on known positions in the JSON. And if I try a recursive search like this:
jq '.. | .description? //= "DUMMY DESCRIPTION"' ........
it does not seem to work correctly on arrays.
What's the correct approach to walk through entire JSON in this case? Thanks!
What's the correct approach to walk through entire JSON in this case?
The answer is walk!
If your jq does not already have walk/1, you can easily find its definition online (search for jq "def walk"). Include that def before using it, for example:
walk(if type == "object" and has("description") and .description == null
then .description = "DUMMY DESCRIPTION"
else . end)
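walk applies its filter bottom-up to every value in the document. Here is a minimal Python sketch of the same idea. It assumes the document has been loaded with json.loads, so JSON null arrives as None. The function name fill_null_descriptions is hypothetical.

```python
import json


def fill_null_descriptions(node, dummy="DUMMY DESCRIPTION"):
    """Bottom-up walk: set description to `dummy` only where it exists and is null."""
    if isinstance(node, dict):
        out = {key: fill_null_descriptions(value, dummy) for key, value in node.items()}
        if "description" in out and out["description"] is None:
            out["description"] = dummy
        return out
    if isinstance(node, list):
        return [fill_null_descriptions(value, dummy) for value in node]
    return node  # strings, numbers, booleans and None pass through unchanged


doc = json.loads('{"a": {"b": [{"name": "b1", "description": null}, {"name": "b2"}], "description": null}}')
print(json.dumps(fill_null_descriptions(doc), indent=2))
```

As in the jq filter, objects without a description key are returned as they are, so no new fields appear.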
Another option is to use streams. tostream gives you the path and value of every leaf in the input, so you can look for pairs whose path ends in "description" and update the value. Note that // replaces false as well as null, so this variant would also overwrite a "description": false.
$ jq --arg replacement "DUMMY DESCRIPTION" '
fromstream(tostream | if length == 2 and .[0][-1] == "description"
then .[1] |= (. // $replacement)
else .
end)
' input.json
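The stream idea can also be sketched in Python. It enumerates a (path, value) pair for every leaf, roughly like the length-2 events of tostream, with empty objects and arrays counting as leaves. It then sets the value at each matching path. Unlike the `//` version, this sketch tests for None explicitly, so a false description is left alone. The helper names leaf_paths and replace_null_leaves are hypothetical.

```python
import copy


def leaf_paths(node, path=()):
    """Yield (path, value) for every leaf, similar to the length-2 events of tostream.

    Non-empty dicts and lists are descended into; everything else, including
    empty dicts and lists, is treated as a leaf.
    """
    if isinstance(node, dict) and node:
        for key, value in node.items():
            yield from leaf_paths(value, path + (key,))
    elif isinstance(node, list) and node:
        for index, value in enumerate(node):
            yield from leaf_paths(value, path + (index,))
    else:
        yield path, node


def replace_null_leaves(doc, name, replacement):
    """Return a copy of doc where leaves named `name` that are None get `replacement`."""
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    for path, value in list(leaf_paths(result)):
        if path and path[-1] == name and value is None:
            parent = result
            for step in path[:-1]:  # follow the path down to the parent container
                parent = parent[step]
            parent[path[-1]] = replacement
    return result


doc = {"a": {"b": [{"name": "b1", "description": None}, {"name": "b2"}], "description": None}}
print(replace_null_leaves(doc, "description", "DUMMY DESCRIPTION"))
```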

jq filter to remove duplicates by comparing objects inside

I am trying to convert JSON lines to a JSON array, and in the process find and remove duplicates by comparing a value inside each object.
For example:
{"headline": "sample headline 1", "title": "sample title 1", "href": "sample link 1", "day": " Fri, 7 Jul 2017 , 8:30PM ", "tags": "tag1"}
{"headline": "sample headline 2", "title": "sample title 2", "href": "sample link 2", "day": " Fri, 7 Jul 2017 , 8:30PM ", "tags": "tag2"}
{"headline": "sample headline 3", "title": "sample title 3", "href": "sample link ", "day": " Fri, 7 Jul 2017 , 8:30PM ", "tags": "tag3"}
{"headline": "sample headline 4", "title": "sample title 1", "href": "sample link 4", "day": " Fri, 7 Jul 2017 , 8:30PM ", "tags": "tag4"}
Now I want to compare title from the first JSON line and the fourth JSON line, and if the title is the same I want to omit one of the entries.
I have only been able to convert it to JSON and remove duplicates by comparing all objects:
jq --slurp '[.[]] | unique'
but this compares whole objects, whereas I want to compare only one field (title) and remove the entire line. How can I do that?
In words:
.. to compare title from the first JSON line and the fourth JSON line, and if the title is the same I want to omit one of the entries.
In jq:
jq -s 'if .[0].title == .[3].title then del(.[0]) else . end'
In words:
find and remove duplicates [based on .title]
In jq:
INDEX(.title) | [.[]]
Apart from brevity, the big advantage of using INDEX/1 here (e.g. vs unique or group_by) is that it does NOT incur the cost of sorting.
(If your jq does not have INDEX then simply copy its definition from https://github.com/stedolan/jq/blob/master/src/builtin.jq )
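In Python terms, INDEX(.title) | [.[]] behaves like building a dict keyed on the title and keeping its values. The sketch below assumes two things. First, later rows overwrite earlier ones, because INDEX uses |= $row. Second, each key stays where it was first inserted. Python dicts guarantee that ordering; that jq objects keep insertion order the same way is an assumption based on observed behaviour. The function name dedupe_by is hypothetical, and json.dumps only stands in for tojson.

```python
import json


def dedupe_by(rows, field):
    """Keep one row per value of `field`, like jq's INDEX(.field) | [.[]].

    Later rows overwrite earlier ones with the same key, and each key keeps
    the position where it was first inserted.
    """
    index = {}
    for row in rows:
        key = row[field]
        # INDEX uses strings as-is and tojson for other values; json.dumps stands in for tojson
        index[key if isinstance(key, str) else json.dumps(key)] = row
    return list(index.values())


lines = """\
{"headline": "sample headline 1", "title": "sample title 1"}
{"headline": "sample headline 2", "title": "sample title 2"}
{"headline": "sample headline 4", "title": "sample title 1"}
"""
rows = [json.loads(line) for line in lines.splitlines()]  # like jq -s on JSON lines
print(json.dumps(dedupe_by(rows, "title"), indent=2))
```

With the question's data, the two "sample title 1" rows collapse into one entry holding headline 4, at the position where that title first appeared. As with jq's INDEX, a number 1 and a string "1" map to the same key.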
Using the -f option
Assuming you have jq 1.5 and that the file named program.jq contains the following text:
def INDEX(stream; idx_expr):
  reduce stream as $row ({};
    .[$row | idx_expr | if type != "string" then tojson else . end] |= $row);
def INDEX(idx_expr): INDEX(.[]; idx_expr);
INDEX(.title) | [.[]]
you can invoke jq as follows:
jq -s -f program.jq input
where "input" is the name of the file containing the JSON lines (or JSON stream).
jq 1.4
If you only have access to jq 1.4, then you could use this variant:
def INDEX(stream; idx_expr):
  reduce stream as $row ({};
    .[$row | idx_expr | if type != "string" then tojson else . end] |= $row);
INDEX(.[]; .title) | [.[]]
jq 1.3
jq 1.3 is very out-of-date but if you cannot upgrade, then for present purposes, it will suffice to use the version immediately above, replacing tojson with tostring. Or even just:
def INDEX(f): map( {(f|tostring): . } ) | add;