Add an empty line prior to another matched line in jq? - json

Say I have a raw input like the following:
"```"
"include <stdio.h>"
"..."
"```"
"''some example''"
"*bob"
"**bob"
"*bob"
And I'd like to add a blank line right before the "*bob":
"```"
"include <stdio.h>"
"..."
"```"
"''some example''"
""
"*bob"
"**bob"
"*bob"
Can this be done with jq?

Yes, but to do so efficiently you'd effectively need jq 1.5 or higher:
foreach inputs as $line (0;
if $line == "*bob" then . + 1 else . end;
if . == 1 then "" else empty end,
$line)
Don't forget to use the -n command-line option!

Here is another solution which uses the -s (slurp) option
.[: .[["*bob"]][0]] + ["\n"] + .[.[["*bob"]][0]:] | .[]
that's a little unreadable but we can make it better with a few functions:
def firstbob: .[["*bob"]][0] ;
def beforebob: .[: firstbob ] ;
def afterbob: .[ firstbob :] ;
beforebob + ["\n"] + afterbob
| .[]
if the above filter is in filter.jq and the sample data is in data then
$ jq -Ms -f filter.jq data
produces
"```"
"include <stdio.h>"
"..."
"```"
"''some example''"
"\n"
"*bob"
"**bob"
"*bob"
One issue with this approach is that beforebob and afterbob won't quite work as you probably want if "*bob" is not in the input. The easiest way to address that is with an if guard:
if firstbob then beforebob + ["\n"] + afterbob else . end
| .[]
with that the input will be unaltered if "*bob" is not present.

Related

jq ignores else clause

I would like to search and replace a specific value in a json line-based data using jq and keep the object unmodified if the match is false.
Considering the following input.json:
{"foo":{"bar":"one"}}
{"foo":{"bar":"two"}}
I tried the following statement, but the else clause is ignored, and therefore lines that don't match are lost:
jq -c '. | if (select(.foo.bar == "one")) then .foo.bar |= "ok" else . end' input.json
produces result:
{"foo":{"bar":"ok"}}
The following command produces the right output:
jq -c '(. | select(.foo.bar == "one")).foo.bar = "ok"' input.json
desired output:
{"foo":{"bar":"ok"}}
{"foo":{"bar":"two"}}
Why the first command fails to output the object . in the else clause?
Don't use select:
. | if .foo.bar == "one" then .foo.bar |= "ok" else . end
select is useful for filtering, but it doesn't return a false value if there's nothing to select, it doesn't return anything.

JSON to CSV: variable number of columns per row

I need to convert JSON to CSV where JSON has arrays of variable length, for example:
JSON objects:
{"labels": ["label1"]}
{"labels": ["label2", "label3"]}
{"labels": ["label1", "label4", "label5"]}
Resulting CSV:
labels,labels,labels
"label1",,
"label2","label3",
"label1","label4","label5"
There are many other properties in the source JSON, this is just an exсerpt for the sake of simplicity.
Also, I need to say that the process has to work with JSON as a stream because source JSON could be very large (>1GB).
I wanted to use jq with two passes, the first pass would collect the maximum length of the 'labels' array, the second pass would create CSV as the number of the resulting columns is known by this time. But jq doesn't have a concept of global variables, so I don't know where I can store the running total.
I'd like to be able to do that on Windows via CLI.
Thank you in advance.
The question shows a stream of JSON objects, so the following solutions assume that the input file is already a sequence as shown. These solutions can also easily be adapted to cover the case where the input file contains a huge array of objects, e.g. as discussed in the epilog.
A two-invocation solution
Here's a two-pass solution using two invocations of jq. The presentation assumes a bash-like environment, in case you have wsl:
n=$(jq -n 'reduce (inputs|.labels|length) as $i (-1;
if $i > . then $i else . end)' stream.json)
jq -nr --argjson n $n '
def fill($n): . + [range(length;$n)|null];
[range(0;$n)|"labels"],
(inputs | .labels | fill($n))
| #csv' stream.json
Assuming the input is as described, this is guaranteed to produce valid CSV. Hopefully you can adapt the above to your shell as necessary -- maybe this link will help:
Assign output of a program to a variable using a MS batch file
Using input_filename and a single invocation of jq
Unfortunately, jq does not have a "rewind" facility, but
there is an alternative: read the file twice within a single invocation of jq. This is more cumbersome than the two-invocation solution above but avoids any difficulties associated with the latter.
cat sample.json | jq -nr '
def fill($n): . + [range(length;$n)|null];
def max($x): if . < $x then $x else . end;
foreach (inputs|.labels) as $in ( {n:0};
if input_filename == "<stdin>"
then .n |= max($in|length)
else .printed+=1
end;
if .printed == null then empty
else .n as $n
| (if .printed == 1 then [range(0;$n)|"labels"] else empty end),
($in | fill($n))
end)
| #csv' - sample.json
Another single-invocation solution
The following solution uses a special value (here null) to delineate the two streams:
(cat stream.json; echo null; cat stream.json) | jq -nr '
def fill($n): . + [range(length; $n) | null];
def max($x): if . < $x then $x else . end;
(label $loop | foreach inputs as $in (0;
if $in == null then . else max($in|.labels|length) end;
if $in == null then ., break $loop else empty end)) as $n
| [range(0;$n)|"labels"],
(inputs | .labels | fill($n))
| #csv '
Epilog
A file with a top-level JSON array that is too large to fit into memory can be converted into a stream of the array's items by invoking jq with the --stream option, e.g. as follows:
jq -cn --stream 'fromstream(1|truncate_stream(inputs))'
For such a large file, you will probably want to do this in two separate invocations, one to get the count, then another to actually output the csv. If you wanted to read the whole file into memory, you could do this in one, but we definitely don't want to do that, we'll want to stream it in where possible.
Things get a little ugly when it comes to storing the result of commands to a variable, writing to a file might be simpler. But I'd rather not use temp files if we don't have to.
REM assuming in a batch file
for /f "usebackq delims=" %%i in (`jq -n --stream "reduce (inputs | .[0][1] + 1) as $l (0; if $l > . then $l else . end)" input.json`) do set cols=%%i
jq -rn --stream --argjson cols "%cols%" "[range($cols)|\"labels\"],(fromstream(1|truncate_stream(inputs))|[.[],(range($cols-length)|null)])|#csv" input.json
> jq -n --stream "reduce (inputs | .[0][1] + 1) as $l (0; if $l > . then $l else . end)" input.json
For the first invocation to get the count of columns, we're just taking advantage of the fact that the paths to the array values could be used to indicate the lengths of the arrays. We'll just want to take the max across all items.
> jq -rn --stream --argjson cols "%cols%" ^
"[range($cols)|\"labels\"],(fromstream(1|truncate_stream(inputs))|[.[],(range($cols-length)|null)])|#csv" input.json
Then to output the rest, we're just taking the labels array (assuming it's the only property on the objects) and padding them out with null up to the $cols count. Then output as csv.
If the labels are in a different, deeply nested path than what's in your example here, you'll need to select based on the appropriate paths.
set labelspath=foo.bar.labels
jq -rn --stream --argjson cols "%cols%" --arg labelspath "%labelspath%" ^
"($labelspath|split(\".\")|[.,length]) as [$path,$depth] | [range($cols)|\"labels\"],(fromstream($depth|truncate_stream(inputs|select(.[0][:$depth] == $path)))|[.[],(range($cols-length)|null)])|#csv" input.json

Replace the placeholder in the String with actual value using jq

It is not working. How to check the part of the String (not the whole String)?
test.json
{
"url": "https://<part1>.test/hai/<part1>"
}
jq --arg input "$arg" \
'if .url == "<part1>"
then . + {"url" : ("https://" + $input + ".test/hai/" + $input) }
else . end' test.json > test123.json
In your test.json, the value of .url is "https://<part1>.test/hai/<part1>" so evidently you don't want to check that its value is "<part1>".
Perhaps you meant to test the condition: .url | contains("<part1>") or maybe .url | endswith("<part1>") -- the problem description is unclear.
If your jq has support for regular expressions, you could use test/1, e.g.
.url | test("//<part1>.*<part1>$")
Another possibility would be to use gsub.

How to Flatten JSON using jq and Bash into Bash Associative Array where Key=Selector?

As a follow-up to Flatten Arbitrary JSON, I'm looking to take the flattened results and make them suitable for doing queries and updates back to the original JSON file.
Motivation: I'm writing Bash (4.2+) scripts (on CentOS 7) that read JSON into a Bash associative array using the JSON selector/filter as the key. I do processing on the associative arrays, and in the end I want to update the JSON with those changes.
The preceding solution gets me close to this goal. I think there are two things that it doesn't do:
It doesn't quote keys that require quoting. For example, the key com.acme would need to be quoted because it contains a special character.
Array indexes are not represented in a form that can be used to query the original JSON.
Existing Solution
The solution from the above is:
$ jq --stream -n --arg delim '.' 'reduce (inputs|select(length==2)) as $i ({};
[$i[0][]|tostring] as $path_as_strings
| ($path_as_strings|join($delim)) as $key
| $i[1] as $value
| .[$key] = $value
)' input.json
For example, if input.json contains:
{
"a.b":
[
"value"
]
}
then the output is:
{
"a.b.0": "value"
}
What is Really Wanted
An improvement would have been:
{
"\"a.b\"[0]": "value"
}
But what I really want is output formatted so that it could be sourced directly in a Bash program (implying the array name is passed to jq as an argument):
ArrayName['"a.b"[0]']='value' # Note 'value' might need escapes for Bash
I'm looking to have the more human-readable syntax above as opposed to the more general:
ArrayName['.["a.b"][0]']='value'
I don't know if jq can handle all of this. My present solution is to take the output from the preceding solution and to post-process it to the form that I want. Here's the work in process:
#!/bin/bash
Flatten()
{
local -r OPTIONS=$(getopt -o d:m:f: -l "delimiter:,mapname:,file:" -n "${FUNCNAME[0]}" -- "$#")
eval set -- "$OPTIONS"
local Delimiter='.' MapName=map File=
while true ; do
case "$1" in
-d|--delimiter) Delimiter="$2"; shift 2;;
-m|--mapname) MapName="$2"; shift 2;;
-f|--file) File="$2"; shift 2;;
--) shift; break;;
esac
done
local -a Array=()
readarray -t Array <<<"$(
jq -c -S --stream -n --arg delim "$Delimiter" 'reduce (inputs|select(length==2)) as $i ({}; .[[$i[0][]|tostring]|join($delim)] = $i[1])' <<<"$(sed 's|^\s*[#%].*||' "$File")" |
jq -c "to_entries|map(\"\(.key)=\(.value|tostring)\")|.[]" |
sed -e 's|^"||' -e 's|"$||' -e 's|=|\t|')"
if [[ ! -v $MapName ]]; then
local -gA $MapName
fi
. <(
IFS=$'\t'
while read -r Key Value; do
printf "$MapName[\"%s\"]=%q\n" "$Key" "$Value"
done <<<"$(printf "%s\n" "${Array[#]}")"
)
}
declare -A Map
Flatten -m Map -f "$1"
declare -p Map
With the output:
$ ./Flatten.sh <(echo '{"a.b":["value"]}')
declare -A Map='([a.b.0]="value" )'
1) jq is Turing complete, so it's all just a question of which hammer to use.
2)
An improvement would have been:
{
"\"a.b\"[0]": "value"
}
That is easily accomplished using a helper function along these lines:
def flattenPath(delim):
reduce .[] as $s ("";
if $s|type == "number"
then ((if . == "" then "." else . end) + "[\($s)]")
else . + ($s | tostring | if index(delim) then "\"\(.)\"" else . end)
end );
3)
I do processing on the associative arrays, and in the end I want to update the JSON with those changes.
This suggests you might have posed an xy-problem. However, if you really do want to serialize and unserialize some JSON text, then the natural way to do so using jq is using leaf_paths, as illustrated by the following serialization/deserialization functions:
# Emit (path, value) pairs
# Usage: jq -c -f serialize.jq input.json > serialized.json
def serialize: leaf_paths as $p | ($p, getpath($p));
# Usage: jq -n -f unserialize.jq serialized.json
def unserialize:
def pairwise(s):
foreach s as $i ([];
if length == 1 then . + [$i] else [$i] end;
select(length == 2));
reduce pairwise(inputs) as $p (null; setpath($p[0]; $p[1]));
If using bash, you could use readarray (mapfile) to read the paths and values into a single array, or if you want to distinguish between the paths and values more easily, you could (for example) use the approach illustrated by the following:
i=0
while read -r line ; do
path[$i]="$line"; read -r line; value[$i]="$line"
i=$((i + 1))
done < serialized.json
But there are many other alternatives.

Bash sqlite3 -line | How to convert to JSON format

I want to convert my sqlite data from my database to JSON format.
I would like to use this syntax:
sqlite3 -line members.db "SELECT * FROM members LIMIT 3" > members.txt
OUTPUT:
id = 1
fname = Leif
gname = Håkansson
genderid = 1
id = 2
fname = Yvonne
gname = Bergman
genderid = 2
id = 3
fname = Roger
gname = Sjöberg
genderid = 1
How to do this with nice and structur code in a for loop?
(Only in Bash)
I have tried some awk and grep but not with a great succes yet.
Would be nice with some tips.
I want a result similar to this:
[
{
"id":1,
"fname":"Leif",
"gname":"Hakansson",
"genderid":1
},
{
"id":2,
"fname":"Yvonne",
"gname":"Bergman",
"genderid":2
},
{
"id":3,
"fname":"Roger",
"gname":"Sjberg",
"genderid":1
}
}
If your sqlite3 is compiled with the json1 extension (or if you can obtain a version of sqlite3 with the json1 extension), then you can use it to generate JSON objects (one JSON object per row). For example:
select json_object('id', id, 'fname', fname, 'gname', gname, 'genderid', genderid) ...
You can then use a tool such as jq to convert the stream of objects into an array of objects, e.g. pipe the output of the sqlite3 to jq -s ..
(A less tiresome alternative might be to use the sqlite3 function json_array(), which produces an array, which you can reassemble into an object using jq.)
If the json1 extension is unavailable, then you could use the following as a starting point:
awk 'BEGIN { print "["; }
function out() {if (n++) {print ","}; if (line) {print "{" line "}"}; line="";}
function trim(x) { sub(/^ */, "", x); sub(/ *$/, "", x); return x; }
NF==0 { out(); next};
{if (line) {line = line ", " }
i=index($0,"=");
line = line "\"" trim(substr($0,1,i-1)) ": \"" substr($0, i+2) "\""}
END {out(); print "]"} '
Alternatively, you could use the following jq script, which converts numeric strings that occur on the RHS of "=" to numbers:
def trim: sub("^ *"; "") | sub(" *$"; "");
def keyvalue: index("=") as $i
| {(.[0:$i] | trim): (.[$i+2:] | (tonumber? // .))};
[foreach (inputs, "") as $line ({object: false, seed: {} };
if ($line|trim) == "" then { object: .seed, seed : {} }
else {object: false,
seed: (.seed + ($line | keyvalue)) }
end;
.object | if . and (. != {}) then . else empty end ) ]
Just type -json argument with SQLite 3.33.0 or higher and get json output:
$ sqlite3 -json database.db "select * from TABLE_NAME"
from SQLite Release 3.33.0 note:
...
CLI enhancements:
Added four new output modes: "box", "json", "markdown", and "table".
The "column" output mode automatically expands columns to contain the longest output row and automatically turns ".header" on if it has
not been previously set.
The "quote" output mode honors ".separator"
The decimal extension and the ieee754 extension are built-in to the CLI
...
I think I would prefer to parse sqlite output with a single line per record rather than the very wordy output format you suggested with sqlite3 -line. So, I would go with this:
sqlite3 members.db "SELECT * FROM members LIMIT 3"
which gives me this to parse:
1|Leif|Hakansson|1
2|Yvonne|Bergman|2
3|Roger|Sjoberg|1
I can now parse that with awk if I set the input separator to | with
awk -F '|'
and pick up the 4 fields on each line with the following and save them in an array like this:
{ id[++i]=$1; fname[i]=$2; gname[i]=$3; genderid[i]=$4 }
Then all I need to do is print the output format you need at the end. However, you have double quotes in your output and they are a pain to quote in awk, so I temporarily use another pipe symbol (|) as a double quote and then, at the very end, I get tr to replace all the pipe symbols with double quotes - just to make the code easier on the eye. So the total solution looks like this:
sqlite3 members.db "SELECT * FROM members LIMIT 3" | awk -F'|' '
# sqlite output line - pick up fields and store in arrays
{ id[++i]=$1; fname[i]=$2; gname[i]=$3; genderid[i]=$4 }
END {
printf "[\n";
for(j=1;j<=i;j++){
printf " {\n"
printf " |id|:%d,\n",id[j]
printf " |fname|:|%s|,\n",fname[j]
printf " |gname|:|%s|,\n",gname[j]
printf " |genderid|:%d\n",genderid[j]
closing=" },\n"
if(j==i){closing=" }\n"}
printf closing;
}
printf "]\n";
}' | tr '|' '"'
Sqlite-utils does exactly what you're looking for. By default, the output will be JSON.
Better late than never to plug jo.
Save sqlite3 to a text file.
Get jo (jo's also available in distro repos)
and use this bash script.
while read line
do
id=`echo $line | cut -d"|" -f1`
fname=`echo $line | cut -d"|" -f2`
gname=`echo $line | cut -d"|" -f3`
genderid=`echo $line | cut -d"|" -f4`
jsonline=`jo id="$id" fname="$fname" gname="$gname" genderid="$genderid"`
json="$json $jsonline"
done < "$1"
jo -a $json
Please don't create (or parse) json with awk. There are dedicated tools for this. Tools like xidel.
While first and foremost a html, xml and json parser, xidel can also parse plain text.
I'd like to offer a very elegant solution using this tool (with much less code than jq).
I'll assume your 'members.txt'.
First to create a sequence of each json object to-be:
xidel -s members.txt --xquery 'tokenize($raw,"\n\n")'
Or...
xidel -s members.txt --xquery 'tokenize($raw,"\n\n") ! (position(),.)'
1
id = 1
fname = Leif
gname = Håkansson
genderid = 1
2
id = 2
fname = Yvonne
gname = Bergman
genderid = 2
3
id = 3
fname = Roger
gname = Sjöberg
genderid = 1
...to better show you the individual items in the sequence.
Now you have 3 multi-line strings. To turn each item/string into another sequence where each item is a new line:
xidel -s members.txt --xquery 'tokenize($raw,"\n\n") ! x:lines(.)'
(x:lines(.) is a shorthand for tokenize(.,'\r\n?|\n'))
Now for each line tokenize on the " = " (which creates yet another sequence) and save it to a variable. For the first line for example this sequence is ("id","1"), for the second line ("fname","Leif"), etc.:
xidel -s members.txt --xquery 'tokenize($raw,"\n\n") ! (for $x in x:lines(.) let $a:=tokenize($x," = ") return ($a[1],$a[2]))'
Finally remove leading whitespace (normalize-space()), create a json object ({| {key-value-pair} |}) and put all json objects in an array ([ ... ]):
xidel -s members.txt --xquery '[tokenize($raw,"\n\n") ! {|for $x in x:lines(.) let $a:=tokenize($x," = ") return {normalize-space($a[1]):$a[2]}|}]'
Prettified + output:
xidel -s members.txt --xquery '
[
tokenize($raw,"\n\n") ! {|
for $x in x:lines(.)
let $a:=tokenize($x," = ")
return {
normalize-space($a[1]):$a[2]
}
|}
]
'
[
{
"id": "1",
"fname": "Leif",
"gname": "Håkansson",
"genderid": "1"
},
{
"id": "2",
"fname": "Yvonne",
"gname": "Bergman",
"genderid": "2"
},
{
"id": "3",
"fname": "Roger",
"gname": "Sjöberg",
"genderid": "1"
}
]
Note: For xidel-0.9.9.7173 and newer --json-mode=deprecated is needed to create a json array with [ ]. The new (XQuery 3.1) way to create a json array is to use array{ }.