Bash sqlite3 -line | How to convert to JSON format

I want to convert my sqlite data from my database to JSON format.
I would like to use this syntax:
sqlite3 -line members.db "SELECT * FROM members LIMIT 3" > members.txt
OUTPUT:
id = 1
fname = Leif
gname = Håkansson
genderid = 1
id = 2
fname = Yvonne
gname = Bergman
genderid = 2
id = 3
fname = Roger
gname = Sjöberg
genderid = 1
How can I do this with nice, structured code in a for loop?
(Only in Bash.)
I have tried some awk and grep, but without great success yet.
Some tips would be nice.
I want a result similar to this:
[
  {
    "id":1,
    "fname":"Leif",
    "gname":"Hakansson",
    "genderid":1
  },
  {
    "id":2,
    "fname":"Yvonne",
    "gname":"Bergman",
    "genderid":2
  },
  {
    "id":3,
    "fname":"Roger",
    "gname":"Sjoberg",
    "genderid":1
  }
]

If your sqlite3 is compiled with the json1 extension (or if you can obtain a version of sqlite3 with the json1 extension), then you can use it to generate JSON objects (one JSON object per row). For example:
select json_object('id', id, 'fname', fname, 'gname', gname, 'genderid', genderid) ...
You can then use a tool such as jq to convert the stream of objects into an array of objects, e.g. by piping the output of sqlite3 to jq -s .
(A less tiresome alternative might be to use the sqlite3 function json_array(), which produces an array, which you can reassemble into an object using jq.)
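For example, a complete pipeline might look like this (a sketch assuming the members table from the question):
sqlite3 members.db "SELECT json_object('id', id, 'fname', fname, 'gname', gname, 'genderid', genderid) FROM members LIMIT 3" | jq -s .
Each row comes out of sqlite3 as one JSON object on its own line, and jq -s slurps that stream into a single array.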
If the json1 extension is unavailable, then you could use the following as a starting point:
awk 'BEGIN { print "["; }
function out() {if (n++) {print ","}; if (line) {print "{" line "}"}; line="";}
function trim(x) { sub(/^ */, "", x); sub(/ *$/, "", x); return x; }
NF==0 { out(); next};
{if (line) {line = line ", " }
i=index($0,"=");
line = line "\"" trim(substr($0,1,i-1)) "\": \"" substr($0, i+2) "\""}
END {out(); print "]"} '
Alternatively, you could use the following jq script, which converts numeric strings that occur on the RHS of "=" to numbers:
def trim: sub("^ *"; "") | sub(" *$"; "");
def keyvalue: index("=") as $i
| {(.[0:$i] | trim): (.[$i+2:] | (tonumber? // .))};
[foreach (inputs, "") as $line ({object: false, seed: {} };
if ($line|trim) == "" then { object: .seed, seed : {} }
else {object: false,
seed: (.seed + ($line | keyvalue)) }
end;
.object | if . and (. != {}) then . else empty end ) ]
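To run this script, jq needs raw input and a null initial input; assuming it is saved as (say) members2json.jq, the invocation would be:
jq -nR -f members2json.jq members.txt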

Just pass the -json argument with SQLite 3.33.0 or higher to get JSON output:
$ sqlite3 -json database.db "select * from TABLE_NAME"
From the SQLite Release 3.33.0 notes:
...
CLI enhancements:
Added four new output modes: "box", "json", "markdown", and "table".
The "column" output mode automatically expands columns to contain the longest output row and automatically turns ".header" on if it has
not been previously set.
The "quote" output mode honors ".separator"
The decimal extension and the ieee754 extension are built-in to the CLI
...
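Applied to the question's database, this gives output along these lines (a sketch; exact whitespace may differ):
$ sqlite3 -json members.db "SELECT * FROM members LIMIT 3"
[{"id":1,"fname":"Leif","gname":"Håkansson","genderid":1},
{"id":2,"fname":"Yvonne","gname":"Bergman","genderid":2},
{"id":3,"fname":"Roger","gname":"Sjöberg","genderid":1}]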

I think I would prefer to parse sqlite output with a single line per record rather than the very wordy output format you suggested with sqlite3 -line. So, I would go with this:
sqlite3 members.db "SELECT * FROM members LIMIT 3"
which gives me this to parse:
1|Leif|Hakansson|1
2|Yvonne|Bergman|2
3|Roger|Sjoberg|1
I can now parse that with awk if I set the input separator to | with
awk -F '|'
and pick up the 4 fields on each line with the following and save them in an array like this:
{ id[++i]=$1; fname[i]=$2; gname[i]=$3; genderid[i]=$4 }
Then all I need to do is print the output format you need at the end. However, you have double quotes in your output, and they are a pain to quote in awk, so I temporarily use another pipe symbol (|) in place of each double quote and then, at the very end, I get tr to replace all the pipe symbols with double quotes - just to make the code easier on the eye. So the total solution looks like this:
sqlite3 members.db "SELECT * FROM members LIMIT 3" | awk -F'|' '
# sqlite output line - pick up fields and store in arrays
{ id[++i]=$1; fname[i]=$2; gname[i]=$3; genderid[i]=$4 }
END {
printf "[\n";
for(j=1;j<=i;j++){
printf " {\n"
printf " |id|:%d,\n",id[j]
printf " |fname|:|%s|,\n",fname[j]
printf " |gname|:|%s|,\n",gname[j]
printf " |genderid|:%d\n",genderid[j]
closing=" },\n"
if(j==i){closing=" }\n"}
printf closing;
}
printf "]\n";
}' | tr '|' '"'

Sqlite-utils does exactly what you're looking for. By default, the output will be JSON.
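A minimal sketch, assuming sqlite-utils has been installed (e.g. with pip install sqlite-utils):
sqlite-utils query members.db "SELECT * FROM members LIMIT 3"
This prints the rows as a JSON array of objects, much like the desired output above.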

Better late than never to plug jo.
Save the sqlite3 output to a text file, get jo (it's also available in distro repos), and use this bash script:
while read line
do
    id=`echo $line | cut -d"|" -f1`
    fname=`echo $line | cut -d"|" -f2`
    gname=`echo $line | cut -d"|" -f3`
    genderid=`echo $line | cut -d"|" -f4`
    jsonline=`jo id="$id" fname="$fname" gname="$gname" genderid="$genderid"`
    json="$json $jsonline"
done < "$1"
jo -a $json
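Usage would then be something like this (assuming the script is saved as tojson.sh):
sqlite3 members.db "SELECT * FROM members LIMIT 3" > members.txt
bash tojson.sh members.txt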

Please don't create (or parse) JSON with awk. There are dedicated tools for this, like xidel.
While first and foremost an HTML, XML and JSON parser, xidel can also parse plain text.
I'd like to offer a very elegant solution using this tool (with much less code than jq).
I'll assume your 'members.txt' as input.
First, create a sequence of the JSON-objects-to-be:
xidel -s members.txt --xquery 'tokenize($raw,"\n\n")'
Or...
xidel -s members.txt --xquery 'tokenize($raw,"\n\n") ! (position(),.)'
1
id = 1
fname = Leif
gname = Håkansson
genderid = 1
2
id = 2
fname = Yvonne
gname = Bergman
genderid = 2
3
id = 3
fname = Roger
gname = Sjöberg
genderid = 1
...to better show you the individual items in the sequence.
Now you have 3 multi-line strings. To turn each item/string into another sequence where each item is a new line:
xidel -s members.txt --xquery 'tokenize($raw,"\n\n") ! x:lines(.)'
(x:lines(.) is a shorthand for tokenize(.,'\r\n?|\n'))
Now for each line tokenize on the " = " (which creates yet another sequence) and save it to a variable. For the first line for example this sequence is ("id","1"), for the second line ("fname","Leif"), etc.:
xidel -s members.txt --xquery 'tokenize($raw,"\n\n") ! (for $x in x:lines(.) let $a:=tokenize($x," = ") return ($a[1],$a[2]))'
Finally remove leading whitespace (normalize-space()), create a json object ({| {key-value-pair} |}) and put all json objects in an array ([ ... ]):
xidel -s members.txt --xquery '[tokenize($raw,"\n\n") ! {|for $x in x:lines(.) let $a:=tokenize($x," = ") return {normalize-space($a[1]):$a[2]}|}]'
Prettified + output:
xidel -s members.txt --xquery '
[
tokenize($raw,"\n\n") ! {|
for $x in x:lines(.)
let $a:=tokenize($x," = ")
return {
normalize-space($a[1]):$a[2]
}
|}
]
'
[
  {
    "id": "1",
    "fname": "Leif",
    "gname": "Håkansson",
    "genderid": "1"
  },
  {
    "id": "2",
    "fname": "Yvonne",
    "gname": "Bergman",
    "genderid": "2"
  },
  {
    "id": "3",
    "fname": "Roger",
    "gname": "Sjöberg",
    "genderid": "1"
  }
]
Note: For xidel-0.9.9.7173 and newer --json-mode=deprecated is needed to create a json array with [ ]. The new (XQuery 3.1) way to create a json array is to use array{ }.
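With that newer syntax the query would become, e.g.:
xidel -s members.txt --xquery 'array{tokenize($raw,"\n\n") ! {|for $x in x:lines(.) let $a:=tokenize($x," = ") return {normalize-space($a[1]):$a[2]}|}}'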

Looping over a list of keys to extract from a JSON file with jq

I'm trying to extract a series of properties (named in an input file) with jq, and I get an error when I feed them in from bash via a loop:
while read line; do echo $line; cat big.json | jq ".$line"; sleep 1; done < big.properties.service
cfg.keyload.service.count
jq: error: syntax error, unexpected INVALID_CHARACTER, expecting $end (Unix shell quoting issues?) at <top-level>, line 1:
When I try to do it manually, it works:
$ line=cfg.keyload.service.count
$ echo $line
cfg.keyload.service.count
$ cat big.json | jq ".$line"
1
Is there any way to get it to work in a loop?
Here is example
cat >big.json <<EOF
{
  "cfg": {
    "keyload": {
      "backend": {
        "app": {
          "shutdown": {
            "timeout": "5s"
          },
          "jmx": {
            "enable": true
          }
        }
      }
    }
  }
}
EOF
cat >big.properties.service <<EOF
cfg.keyload.backend.app.shutdown.timeout
cfg.keyload.backend.app.jmx.enable
cfg.keyload.backend.app.jmx.nonexistent
cfg.nonexistent
EOF
...output should be:
cfg.keyload.backend.app.shutdown.timeout
"5s"
cfg.keyload.backend.app.jmx.enable
true
cfg.keyload.backend.app.jmx.nonexistent
null
cfg.nonexistent
null
Immediate Issue - Invalid Input
The "invalid character" at hand here is almost certainly a carriage return. Use dos2unix to convert your input file to a proper UNIX text file, and your original code will work (albeit very inefficiently, rereading your whole big.json every time it wants to extract a single property).
Performant Implementation - Loop In JQ, Not Bash
Don't use a bash loop for this at all -- it's much more efficient to have jq do the looping.
Note the sub("\r$"; "") used in this code to remove trailing carriage returns so it can accept input in DOS format.
jq -rR --argfile infile big.json '
sub("\r$"; "") as $keyname
| ($keyname | split(".")) as $pieces
| (reduce $pieces[] as $piece ($infile; .[$piece]?)) as $value
| ($keyname, ($value | tojson))
' <big.properties.service
properly emits as output, when given the inputs in the question:
cfg.keyload.backend.app.shutdown.timeout
"5s"
cfg.keyload.backend.app.jmx.enable
true
cfg.keyload.backend.app.jmx.nonexistent
null
cfg.nonexistent
null
Your properties file is effectively paths in the json that you want to retrieve values from. Convert them to paths that jq recognizes so you can get those values. Just make an array of keys that would need to be traversed. Be sure to read your properties file as raw input (-R) since it's not json, and use raw output (-r) to be able to output the paths as you want.
$ jq --argfile big big.json '
., (split(".") as $p | $big | getpath($p) | tojson)
' -Rr big.properties.service
cfg.keyload.backend.app.shutdown.timeout
"5s"
cfg.keyload.backend.app.jmx.enable
true
cfg.keyload.backend.app.jmx.nonexistent
null
cfg.nonexistent
null

Using jq, Flatten Arbitrary JSON to Delimiter-Separated Flat Dictionary

I'm looking to transform JSON using jq to a delimiter-separated and flattened structure.
There have been attempts at this. For example, Flatten nested JSON using jq.
However the solutions on that page fail if the JSON contains arrays. For example, if the JSON is:
{"a":{"b":[1]},"x":[{"y":2},{"z":3}]}
The solution above will fail to transform the above to:
{"a.b.0":1,"x.0.y":2,"x.1.z":3}
In addition, I'm looking for a solution that will also allow for an arbitrary delimiter. For example, suppose the space character is the delimiter. In this case, the result would be:
{"a b 0":1,"x 0 y":2,"x 1 z":3}
I'm looking to have this functionality accessed via a Bash (4.2+) function as is found in CentOS 7, something like this:
flatten_json()
{
    local JSONData="$1"
    # jq command to flatten $JSONData, putting the result to stdout
    jq ... <<<"$JSONData"
}
The solution should work with all JSON data types, including null and boolean. For example, consider the following input:
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}
It should produce:
{"a b 0":"p q r","w 0 x":null,"w 1 y":false,"w 2 z":3}
If you stream the data in, you'll get pairs of paths and values for all leaf values; anything that isn't a pair is a path marking the end of the definition of an object/array at that path. Using leaf_paths as you found would only give you paths to truthy leaf values, so you'd miss out on null or even false values. As a stream, you won't have this problem.
There are many ways this could be combined to an object, I'm partial to using reduce and assignment in these situations.
$ cat input.json
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}
$ jq --arg delim '.' 'reduce (tostream|select(length==2)) as $i ({};
.[[$i[0][]|tostring]|join($delim)] = $i[1]
)' input.json
{
"a.b.0": "p q r",
"w.0.x": null,
"w.1.y": false,
"w.2.z": 3
}
Here's the same solution broken up a bit to allow room for explanation of what's going on.
$ jq --arg delim '.' 'reduce (tostream|select(length==2)) as $i ({};
[$i[0][]|tostring] as $path_as_strings
| ($path_as_strings|join($delim)) as $key
| $i[1] as $value
| .[$key] = $value
)' input.json
Converting the input to a stream with tostream, we'll receive multiple values of pairs/paths as input to our filter. With this, we can pass those multiple values into reduce which is designed to accept multiple values and do something with them. But before we do, we want to filter those pairs/paths by only the pairs (select(length==2)).
Then in the reduce call, we're starting with a clean object and assigning new values using a key derived from the path and the corresponding value. Remember that every value produced in the reduce call is used for the next value in the iteration. Binding values to variables doesn't change the current context and assignments effectively "modify" the current value (the initial object) and passes it along.
$path_as_strings is just the path, which is an array of strings and numbers, converted to all strings. [$i[0][]|tostring] is a shorthand I use as an alternative to map when the array I want to map is not the current array. It is more compact since the mapping is done as a single expression, instead of having to write ($i[0]|map(tostring)) to get the same result. The outer parentheses might not be necessary in general, but otherwise it's two separate filter expressions vs one (and more text).
Then from there we convert that array of strings to the desired key using the provided delimiter. Then assign the appropriate values to the current object.
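For reference, here is what tostream emits for a small document, so you can see the [path, value] pairs (length 2) that the select keeps and the closing paths (length 1) that it drops:
$ echo '{"a":[0,[1]]}' | jq -c 'tostream'
[["a",0],0]
[["a",1,0],1]
[["a",1,0]]
[["a",1]]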
The following has been tested with jq 1.4, jq 1.5 and the current "master" version. The requirement about including paths to null and false is the reason for "allpaths" and "all_leaf_paths".
# all paths, including paths to null
def allpaths:
def conditional_recurse(f): def r: ., (select(.!=null) | f | r); r;
path(conditional_recurse(.[]?)) | select(length > 0);
def all_leaf_paths:
def isscalar: type | (. != "object" and . != "array");
allpaths as $p
| select(getpath($p)|isscalar)
| $p ;
. as $in
| reduce all_leaf_paths as $path ({};
. + { ($path | map(tostring) | join($delim)): $in | getpath($path) })
With this jq program in flatten.jq:
$ cat input.json
{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}
$ jq --arg delim . -f flatten.jq input.json
{
"a.b.0": "p q r",
"w.0.x": null,
"w.1.y": false,
"w.2.z": 3
}
Collisions
Here is a helper function that illustrates an alternative path-flattening algorithm. It converts keys that contain the delimiter to quoted strings, and array elements are presented in square brackets (see the example below):
def flattenPath(delim):
reduce .[] as $s ("";
if $s|type == "number"
then ((if . == "" then "." else . end) + "[\($s)]")
else . + ($s | tostring | if index(delim) then "\"\(.)\"" else . end)
end );
Example: Using flattenPath instead of map(tostring) | join($delim), the object:
{"a.b": [1]}
would become:
{
"\"a.b\"[0]": 1
}
To add a new option to the solutions already given, jqg is a script I wrote to flatten any JSON file and then search it using a regex. For your purposes your regex would simply be '.' which would match everything.
$ echo '{"a":{"b":[1]},"x":[{"y":2},{"z":3}]}' | jqg .
{
"a.b.0": 1,
"x.0.y": 2,
"x.1.z": 3
}
and can produce compact output:
$ echo '{"a":{"b":[1]},"x":[{"y":2},{"z":3}]}' | jqg -q -c .
{"a.b.0":1,"x.0.y":2,"x.1.z":3}
It also handles the more complicated example that @peak used:
$ echo '{"a":{"b":["p q r"]},"w":[{"x":null},{"y":false},{"z":3}]}' | jqg .
{
"a.b.0": "p q r",
"w.0.x": null,
"w.1.y": false,
"w.2.z": 3
}
as well as empty arrays and objects (and a few other edge-case values):
$ jqg . test/odd-values.json
{
"one.start-string": "foo",
"one.null-value": null,
"one.integer-number": 101,
"two.two-a.non-integer-number": 101.75,
"two.two-a.number-zero": 0,
"two.true-boolean": true,
"two.two-b.false-boolean": false,
"three.empty-string": "",
"three.empty-object": {},
"three.empty-array": [],
"end-string": "bar"
}
(reporting empty arrays & objects can be turned off with the -E option).
jqg was tested with jq 1.6
Note: I am the author of the jqg script.

Create JSON using jq from pipe-separated keys and values in bash

I am trying to create a json object from a string in bash. The string is as follows.
CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0
The output is from docker stats command and my end goal is to publish custom metrics to aws cloudwatch. I would like to format this string as json.
{
"CONTAINER":"nginx_container",
"CPU%":"0.02%",
....
}
I have used the jq command before and it seems like it should work well in this case, but I have not been able to come up with a good solution yet, other than hardcoding variable names, indexing using sed or awk, and then creating the JSON from scratch. Any suggestions would be appreciated. Thanks.
Prerequisite
For all of the below, it's assumed that your content is in a shell variable named s:
s='CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0'
What (modern jq)
# thanks to @JeffMercado and @chepner for refinements, see comments
jq -Rn '
( input | split("|") ) as $keys |
( inputs | split("|") ) as $vals |
[[$keys, $vals] | transpose[] | {key:.[0],value:.[1]}] | from_entries
' <<<"$s"
How (modern jq)
This requires very new (probably 1.5?) jq to work, and is a dense chunk of code. To break it down:
Using -n prevents jq from reading stdin on its own, leaving the entirety of the input stream available to be read by input and inputs -- the former to read a single line, and the latter to read all remaining lines. (-R, for raw input, causes textual lines rather than JSON objects to be read).
With [$keys, $vals] | transpose[], we're generating [key, value] pairs (in Python terms, zipping the two lists).
With {key:.[0],value:.[1]}, we're making each [key, value] pair into an object of the form {"key": key, "value": value}
With from_entries, we're combining those pairs into objects containing those keys and values.
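To see how input and inputs divide the stream, a quick sanity check on a hypothetical three-line input:
$ printf 'a\nb\nc\n' | jq -Rnc '[input, [inputs]]'
["a",["b","c"]]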
What (shell-assisted)
This will work with a significantly older jq than the above, and is an easily adopted approach for scenarios where a native-jq solution can be harder to wrangle:
{
    IFS='|' read -r -a keys    # read first line into an array of strings
    ## read each subsequent line into an array named "values"
    while IFS='|' read -r -a values; do
        # setup: positional arguments to pass in literal variables, query with code
        jq_args=( )
        jq_query='.'
        # copy values into the arguments, reference them from the generated code
        for idx in "${!values[@]}"; do
            [[ ${keys[$idx]} ]] || continue    # skip values with no corresponding key
            jq_args+=( --arg "key$idx" "${keys[$idx]}" )
            jq_args+=( --arg "value$idx" "${values[$idx]}" )
            jq_query+=" | .[\$key${idx}]=\$value${idx}"
        done
        # run the generated command
        jq "${jq_args[@]}" "$jq_query" <<<'{}'
    done
} <<<"$s"
How (shell-assisted)
The invoked jq command from the above is similar to:
jq --arg key0 'CONTAINER' \
   --arg value0 'nginx_container' \
   --arg key1 'CPU%' \
   --arg value1 '0.02%' \
   --arg key2 'MEMUSAGE/LIMIT' \
   --arg value2 '25.09MiB/15.26GiB' \
   '. | .[$key0]=$value0 | .[$key1]=$value1 | .[$key2]=$value2' \
   <<<'{}'
...passing each key and value out-of-band (such that it's treated as a literal string rather than parsed as JSON), then referring to them individually.
Result
Either of the above will emit:
{
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0"
}
Why
In short: Because it's guaranteed to generate valid JSON as output.
Consider the following as an example that would break more naive approaches:
s='key ending in a backslash\
value "with quotes"'
Sure, these are unexpected scenarios, but jq knows how to deal with them:
{
"key ending in a backslash\\": "value \"with quotes\""
}
...whereas an implementation that didn't understand JSON strings could easily end up emitting:
{
"key ending in a backslash\": "value "with quotes""
}
I know this is an old post, but the tool you seek is called jo: https://github.com/jpmens/jo
A quick and easy example:
$ jo my_variable="simple"
{"my_variable":"simple"}
A little more complex
$ jo -p name=jo n=17 parser=false
{
   "name": "jo",
   "n": 17,
   "parser": false
}
Add an array
$ jo -p name=jo n=17 parser=false my_array=$(jo -a {1..5})
{
   "name": "jo",
   "n": 17,
   "parser": false,
   "my_array": [
      1,
      2,
      3,
      4,
      5
   ]
}
I've made some pretty complex stuff with jo, and the nice thing is that you don't have to worry about rolling your own solution and possibly producing invalid JSON.
You can ask docker to give you JSON data in the first place
docker stats --format "{{json .}}"
For more on this, see: https://docs.docker.com/config/formatting/
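Combined with jq for pretty-printing or further processing, e.g. (--no-stream takes a single snapshot instead of streaming updates):
docker stats --no-stream --format '{{json .}}' | jq .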
JSONSTR=""
declare -a JSONNAMES=()
declare -A JSONARRAY=()
LOOPNUM=0
# split each pipe-separated line into fields
cat ~/newfile | while IFS='|' read CONTAINER CPU MEMUSE MEMPC NETIO BLKIO PIDS; do
    if [[ "$LOOPNUM" = 0 ]]; then
        JSONNAMES=("$CONTAINER" "$CPU" "$MEMUSE" "$MEMPC" "$NETIO" "$BLKIO" "$PIDS")
        LOOPNUM=$(( LOOPNUM+1 ))
    else
        echo "{ \"${JSONNAMES[0]}\": \"${CONTAINER}\", \"${JSONNAMES[1]}\": \"${CPU}\", \"${JSONNAMES[2]}\": \"${MEMUSE}\", \"${JSONNAMES[3]}\": \"${MEMPC}\", \"${JSONNAMES[4]}\": \"${NETIO}\", \"${JSONNAMES[5]}\": \"${BLKIO}\", \"${JSONNAMES[6]}\": \"${PIDS}\" }"
    fi
done
Returns:
{ "CONTAINER": "nginx_container", "CPU%": "0.02%", "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB", "MEM%": "0.16%", "NETI/O": "0B/0B", "BLOCKI/O": "22.09MB/4.096kB", "PIDS": "0" }
Here is a solution which uses the -R and -s options along with transpose:
split("\n") # [ "CONTAINER...", "nginx_container|0.02%...", ...]
| (.[0] | split("|")) as $keys # [ "CONTAINER", "CPU%", "MEMUSAGE/LIMIT", ... ]
| (.[1:][] | split("|")) # [ "nginx_container", "0.02%", ... ] [ ... ] ...
| select(length > 0) # (remove empty [] caused by trailing newline)
| [$keys, .] # [ ["CONTAINER", ...], ["nginx_container", ...] ] ...
| [ transpose[] | {(.[0]):.[1]} ] # [ {"CONTAINER": "nginx_container"}, ... ] ...
| add # {"CONTAINER": "nginx_container", "CPU%": "0.02%" ...
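Assuming the filter above is saved in a file to_object.jq, and with s still holding the input from the Prerequisite section, it would be invoked as:
jq -R -s -f to_object.jq <<<"$s"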
json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s","MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "nginx_container" "0.02%" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
This doesn't use jq, but it is possible to use arguments and environment variables in the values. Note that a literal % in a printf format string has to be written as %%.
CONTAINER=nginx_container
json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s","MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "$CONTAINER" "$1" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
If you're starting with tabular data, I think it makes more sense to use something that works with tabular data natively, like sqawk, to turn it into JSON, and then use jq to work with it further.
echo 'CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0' \
| sqawk -FS '[|]' -RS '\n' -output json 'select * from a' header=1 \
| jq '.[] | with_entries(select(.key|test("^a.*")|not))'
{
"CONTAINER": "nginx_container",
"CPU%": "0.02%",
"MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
"MEM%": "0.16%",
"NETI/O": "0B/0B",
"BLOCKI/O": "22.09MB/4.096kB",
"PIDS": "0"
}
Without jq, sqawk gives a bit too much:
[
  {
    "anr": "1",
    "anf": "7",
    "a0": "nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0",
    "CONTAINER": "nginx_container",
    "CPU%": "0.02%",
    "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
    "MEM%": "0.16%",
    "NETI/O": "0B/0B",
    "BLOCKI/O": "22.09MB/4.096kB",
    "PIDS": "0",
    "a8": "",
    "a9": "",
    "a10": ""
  }
]

Perl insert new line between every }{ match

I have a text file which contains a large number of JSON objects, and it wasn't created with newlines or any separator between the objects.
Currently I am using:
perl -e '$/ = "}{"; print "$_\n" while <>' file.txt > out.txt
But this produces malformed data: when the file gets split, each JSON object after the first is missing its opening {, because the newline gets placed after the { character.
Is there a way to insert the newline between the } and the { of each }{ match, i.e. as }\n{?
The file is quite large, so I can't do it manually.
It doesn't have to be in Perl; it can be something more suited to the task.
Don't just print. Substitute the newline in between the } and {. The while needs a block now because the last s/// fails, so doing s/// && print while <> doesn't work.
$ cat json.json
{"foo":"bar"}{"bar":"baz"}{"bo":"shizzle"}
$ perl -e '$/ = "}{"; while (<>) { s/\}\{$/}\n{/; print; }' json.json
{"foo":"bar"}
{"bar":"baz"}
{"bo":"shizzle"}
$ cat in.json
{"a":"b","c":"d"}{"e":"f","g":"h"}
$ perl -MJSON::XS -0777ne'
    my $parser = JSON::XS->new->utf8;
    $parser->incr_parse($_);
    while ( my $obj = $parser->incr_parse() ) {
        print( $parser->encode($obj), "\n" );
    }
' in.json
{"c":"d","a":"b"}
{"e":"f","g":"h"}
$ echo '{"a", "b", "c"}{42, "omg", "nyan"}{"no", "please", "stop"}' | perl -e '$/ = "}"; $\ = "}\n"; chomp and print while <>'
{"a", "b", "c"}
{42, "omg", "nyan"}
{"no", "please", "stop"}
You can do it with an editor's search-and-replace tool.
Example:
Search for: }{
Replace with: }^p{ (in Word, ^p is a new line)

Changing values in a JSON data file from shell

I have created a JSON file which in this case contains:
{"ipaddr":"10.1.1.2","hostname":"host2","role":"http","status":"active"},
{"ipaddr":"10.1.1.3","hostname":"host3","role":"sql","status":"active"},
{"ipaddr":"10.1.1.4","hostname":"host4","role":"quad","status":"active"},
On the other side I have a variable with values, for example:
arr="10.1.1.2 10.1.1.3"
which comes from a subsequent check of the server status, for example. For those values I want to change the status field to "inactive". In other words, to grep the host and change its "status" value.
Expected output:
{"ipaddr":"10.1.1.2","hostname":"host2","role":"http","status":"inactive"},
{"ipaddr":"10.1.1.3","hostname":"host3","role":"sql","status":"inactive"},
{"ipaddr":"10.1.1.4","hostname":"host4","role":"quad","status":"active"},
$ arr="10.1.1.2 10.1.1.3"
$ awk -v arr="$arr" -F, 'BEGIN { gsub(/\./,"\\.",arr); gsub(/ /,"|",arr) }
$1 ~ "\"(" arr ")\"" { sub(/active/,"in&") } 1' file
{"ipaddr":"10.1.1.2","hostname":"host2","role":"http","status":"inactive"},
{"ipaddr":"10.1.1.3","hostname":"host3","role":"sql","status":"inactive"},
{"ipaddr":"10.1.1.4","hostname":"host4","role":"quad","status":"active"},
Here is a quick perl "wrap-around one-liner" that uses the JSON module and slurps with the -0 switch:
perl -MJSON -n0E '$j = decode_json($_);
for (@{$j->{hosts}}){$_->{status}="inactive" if $_->{ipaddr}=~/2|3/} ;
say to_json( $j->{hosts}, {pretty=>1} )' status_data.json
might be nicer or might violate PBP recommendations for map:
perl -MJSON -n0E '$j = decode_json($_);
map { $_->{status}="inactive" if $_->{ipaddr}=~/2|3/ } @{ $j->{hosts} } ;
say to_json( $j->{hosts} )' status_data.json
A shell script that resets status using jq would also be possible. Here's a quick way to parse and output changes to JSON using jq:
cat status_data.json | jq -r '.hosts | .[] |
select(.ipaddr == "10.1.1.2" // .ipaddr == "10.1.1.3")' | jq '.status = "inactive"'
EDIT In an earlier comment I was uncertain whether the OP was more interested in an application than a quick search and replace (something about the phrases "On other side..." and "check on the server status"). Here is a (still simple) perl approach in script form:
use v5.16; # strict, warnings, say
use JSON;
use IO::All;

my $status_data < io 'status_data.json';
my $network = JSON->new->utf8->decode($status_data);
my @changed_hosts = qw/10.1.1.2 10.1.1.3/;

sub status_report {
    foreach my $host ( @{ $network->{hosts} } ) {
        say "$host->{hostname} is $host->{status}";
    }
}

sub change_status {
    foreach my $host ( @{ $network->{hosts} } ) {
        foreach (@changed_hosts) {
            $host->{status} = "inactive" if $host->{ipaddr} eq $_;
        }
    }
    status_report;
}

defined $ENV{CHANGE_HAPPENED} ? change_status : status_report;
The script reads the JSON file status_data.json (using IO::All, which is great fun), then decodes it with JSON into a hash. It is hard to tell if this is a complete solution, because if you are "monitoring" host status then we should check the JSON data file periodically, compare it to our hash, and run the main body of the script only when changes have occurred.
To simulate changes occurring, you can define/undefine CHANGE_HAPPENED in your environment with export CHANGE_HAPPENED=1 (or setenv if in tcsh) and unset CHANGE_HAPPENED, and the script will then either update the messages and the hash, or "report". For this to be complete, the data in our hash should be updated to match the data file, either periodically or when an event occurs. The status_report() subroutine could be changed so that it builds arrays of @inactive_hosts and @active_hosts when update_status() told it to do so: if ( something_happened() ) { update_status() }, etc.
Hope that helps.
status_data.json
{
  "hosts": [
    {"ipaddr":"10.1.1.2","hostname":"host2","role":"http","status":"active"},
    {"ipaddr":"10.1.1.3","hostname":"host3","role":"sql","status":"active"},
    {"ipaddr":"10.1.1.4","hostname":"host4","role":"quad","status":"active"}
  ]
}
output:
~/ % perl network_status_json.pl
host2 is active
host3 is active
host4 is active
~/ % export CHANGE_HAPPENED=1
~/ % perl network_status_json.pl
host2 is inactive
host3 is inactive
host4 is active
Version 1:
Using a simple regex-based transformation. This can be done in several ways. From the initial question, the list of ipaddr values is in the variable arr. Example using a Bash env variable:
$ export arr="... ..."
It would also be possible to provide this information via command-line parameters.
#!/usr/bin/perl
my %inact;              # ipaddr to inactivate
my $arr = $ENV{arr};    # from external var (export arr=...)
## $arr = shift;        # from command line arg
for ( split(/\s+/, $arr) ) { $inact{$_} = 1 }
while (<>) {            # one "json" line at a time
    if ( /"ipaddr":"(.*?)"/ and $inact{$1} ) {
        s/"active"/"inactive"/;
    }
    print $_;
}
Version 2:
Using a JSON parser we can do more complex transformations; as the input is not real JSON, we will process one line of "almost JSON" at a time:
use JSON;
use strict;
my ($line, %inact);
my $arr = $ENV{arr};
for ( split(/\s+/, $arr) ) { $inact{$_} = 1 }
while (<>) {    # one "json" line at a time
    if (/^\{.*\},/) {
        s/,\n//;
        $line = from_json($_);
        if ( $inact{$line->{ipaddr}} ) {
            $line->{status} = "inactive";
        }
        print to_json($line), ",\n";
    }
    else { print $_; }
}
#!/bin/ksh
# your "array" of IPs
arr="10.1.1.2 10.1.1.3"

# create and prepare a temporary file for the sed actions
SedAction=/tmp/Action.sed

# --- for/do generating SedAction --------
echo "#sed action" > ${SedAction}
# take each IP from the arr variable, one by one
for IP in ${arr}
do
   # prepare the IP for use in a search pattern
   IP_RE="$( echo "${IP}" | sed 's/\./\\./g' )"
   # generate a sed action in the temporary file.
   # the final action will look like:
   #   s/\("ipaddr":"10\.1\.1\.2".*\)"active"}/\1"inactive"}/;t
   # escape (double) \ for the in-file escape, escape (single) " for this line's interpretation
   echo "s/\\\(\"ipaddr\":\"${IP_RE}\".*\\\)\"active\"}/\\\1\"inactive\"}/;t" >> ${SedAction}
done

# --- sed generating sed action ---------------
echo "${arr}" \
 | tr " " "\n" \
 | sed 's/\./\\./g
s#.*#s/\\("ipaddr":"&".*\\)"active"}/\\1"inactive"}/;t#
' \
 > ${SedAction}

# core of the process (use -i for inline editing, or "double" redirection for non-GNU sed)
sed -f ${SedAction} YourFile

# clean up temporary file
rm ${SedAction}
Self-commented, tested in ksh on AIX.
There are two ways shown to generate the SedAction file, depending on what you want to do (if anything); you only need one of them, and I prefer the second.
This is very simple indeed in Perl, using the JSON module.
use strict;
use warnings;
use JSON qw/ from_json to_json /;
my $json = JSON->new;
my $data = from_json(do { local $/; <DATA> });
my $arr = "10.1.1.2 10.1.1.3";
my %arr = map { $_ => 1 } split ' ', $arr;
for my $item (@$data) {
    $item->{status} = 'inactive' if $arr{$item->{ipaddr}};
}
print to_json($data, { pretty => 1 }), "\n";
__DATA__
[
{"ipaddr":"10.1.1.2","hostname":"host2","role":"http","status":"active"},
{"ipaddr":"10.1.1.3","hostname":"host3","role":"sql","status":"active"},
{"ipaddr":"10.1.1.4","hostname":"host4","role":"quad","status":"active"}
]
output
[
   {
      "role" : "http",
      "hostname" : "host2",
      "status" : "inactive",
      "ipaddr" : "10.1.1.2"
   },
   {
      "hostname" : "host3",
      "role" : "sql",
      "ipaddr" : "10.1.1.3",
      "status" : "inactive"
   },
   {
      "ipaddr" : "10.1.1.4",
      "status" : "active",
      "hostname" : "host4",
      "role" : "quad"
   }
]