Process huge GeoJSON file with jq

Given a GeoJSON file as follows:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "FEATCODE": 15014
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          .....
I want to end up with the following:
{
  "type": "FeatureCollection",
  "features": [
    {
      "tippecanoe": {"minzoom": 13},
      "type": "Feature",
      "properties": {
        "FEATCODE": 15014
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          .....
i.e., I have added the tippecanoe object to each feature in the features array.
I can make this work with:
jq '.features[].tippecanoe.minzoom = 13' <GEOJSON FILE> > <OUTPUT FILE>
This is fine for small files, but processing a large 414 MB file seems to take forever, with the processor maxing out and nothing being written to the output file.
Reading further into jq, it appears that the --stream command-line option may help, but I am completely confused about how to use it for my purposes.
I would be grateful for an example command line that serves my purposes along with an explanation as to what --stream is doing.

A one-pass jq-only approach may require more RAM than is available. If that is the case, then a simple all-jq approach is shown below, together with a more economical approach based on using jq along with awk.
The two approaches are the same except for the reconstitution of the stream of objects into a single JSON document. This step can be accomplished very economically using awk.
In both cases, the large JSON input file with objects of the required form is assumed to be named input.json.
jq-only
jq -c '.features[]' input.json |
jq -c '.tippecanoe.minzoom = 13' |
jq -c -s '{type: "FeatureCollection", features: .}'
jq and awk
jq -c '.features[]' input.json |
jq -c '.tippecanoe.minzoom = 13' | awk '
BEGIN {print "{\"type\": \"FeatureCollection\", \"features\": ["; }
NR==1 { print; next }
{print ","; print}
END {print "] }";}'
Performance comparison
For comparison, an input file with 10,000,000 objects in .features[] was used. Its size is about 1 GB.
User+system (u+s) times:
jq-only: 15m 15s
jq-awk: 7m 40s
jq one-pass using map: 6m 53s
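For reference, the "one-pass using map" variant timed above is the approach shown in a later answer on this page; against this input it would be invoked along the lines of:
jq '.features |= map(.tippecanoe.minzoom = 13)' input.json > output.json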

An alternative solution could be, for example:
jq '.features |= map_values(.tippecanoe.minzoom = 13)'
To test this, I created a sample JSON as
d = {'features': [{"type":"Feature", "properties":{"FEATCODE": 15014}} for i in range(0,N)]}
and inspected the execution time as a function of N. Interestingly, while the map_values approach seems to have linear complexity in N, .features[].tippecanoe.minzoom = 13 exhibits quadratic behavior: already for N=50000, the former method finishes in about 0.8 seconds, while the latter needs around 47 seconds.
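One way to reproduce such measurements is sketched below; sample.json is a scratch file, and here the test document is generated with jq itself rather than Python:
for N in 10000 50000 100000; do
  # build a sample document with N features
  jq -n --argjson N "$N" \
    '{features: [range(0; $N) | {type: "Feature", properties: {FEATCODE: 15014}}]}' > sample.json
  echo "N=$N, map_values:"
  time jq '.features |= map_values(.tippecanoe.minzoom = 13)' sample.json > /dev/null
  echo "N=$N, direct update:"
  time jq '.features[].tippecanoe.minzoom = 13' sample.json > /dev/null
done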
Alternatively, one might just do it manually with, e.g., Python:
import json
import sys

data = {}
with open(sys.argv[1], 'r') as F:
    data = json.load(F)

extra_item = {"minzoom": 13}

for feature in data['features']:
    feature["tippecanoe"] = extra_item

with open(sys.argv[2], 'w') as F:
    F.write(json.dumps(data))
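It could be invoked as, for example (assuming the script above is saved as add_tippecanoe.py; the name is just for illustration):
python3 add_tippecanoe.py input.json output.json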

In this case, map rather than map_values is far faster (*):
.features |= map(.tippecanoe.minzoom = 13)
However, using this approach will still require enough RAM.
p.s. If you want to use jq to generate a large file for timing, consider:
def N: 1000000;
def data:
{"features": [range(0;N) | {"type":"Feature", "properties": {"FEATCODE": 15014}}] };
(*) Using map, 20s for 100MB, and approximately linear.
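One way to run this generator (a sketch: it simply evaluates data and writes the result to a file):
jq -n 'def N: 1000000;
       def data: {"features": [range(0;N) | {"type":"Feature", "properties": {"FEATCODE": 15014}}]};
       data' > input.json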

Here, based on the work of @nicowilliams at GitHub, is a solution that uses the streaming parser available with jq. The solution is very economical with memory, but it is currently quite slow if the input is large.
The solution has two parts: a function for injecting the update into the stream produced using the --stream command-line option; and a function for converting the stream back to JSON in the original form.
Invocation:
jq -cnr --stream -f program.jq input.json
program.jq
# inject the given object into the stream produced from "inputs" with the --stream option
def inject(object):
  [object|tostream] as $object
  | 2
  | truncate_stream(inputs)
  | if (.[0]|length == 1) and length == 1
    then $object[]
    else .
    end ;
# Input: the object to be added
# Output: text
def output:
  . as $object
  | ( "[",
      foreach fromstream( inject($object) ) as $o
        (0;
         if . == 0 then 1 else 2 end;
         if . == 1 then $o else ",", $o end),
      "]" ) ;
{}
| .tippecanoe.minzoom = 13
| output
Generation of test data
def data(N):
  {"features":
    [range(0;N) | {"type":"Feature", "properties": {"FEATCODE": 15014}}] };
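For example, to generate a two-feature input and run the streaming program against it:
jq -n 'def data(N): {"features": [range(0;N) | {"type":"Feature", "properties": {"FEATCODE": 15014}}]}; data(2)' > input.json
jq -cnr --stream -f program.jq input.json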
Example output
With N=2:
[
{"type":"Feature","properties":{"FEATCODE":15014},"tippecanoe":{"minzoom":13}}
,
{"type":"Feature","properties":{"FEATCODE":15014},"tippecanoe":{"minzoom":13}}
]

Related

jq: filter result by value (contains) is very slow

I am trying to use jq to filter a large number of JSON files and extract the ids of each object who belong to a specific domain, as well as the full URL within that domain. Here's a sample of the data:
{
  "items": [
    {
      "completeness": 5,
      "dcLanguageLangAware": {
        "def": [
          "de"
        ]
      },
      "edmIsShownBy": [
        "https://gallica.example/image/2IC6BQAEGWUEG4OP7AYBDGIGYAX62KZ6H366KXP2IKVAF4LKY37Q/presentation_images/5591be60-01fc-11e6-8e10-fa163e091926/node-3/image/SBB/Berliner_Börsenzeitung/1920/02/27/F_065_098_0/F_SBB_00007_19200227_065_098_0_001/full/full/0/default.jpg"
      ],
      "id": "/9200355/BibliographicResource_3000117730632",
      "type": "TEXT",
      "ugc": [
        false
      ]
    }
  ]
}
Bigger sample here: https://www.dropbox.com/s/0s0zjtxe01mecjc/AoQhRn%2B56KDm5AJJPwEvOTIwMDUyMC9hcmtfXzEyMTQ4X2JwdDZrMTAyNzY2Nw%3D%3D.json?dl=0
I can extract both the ids and the URLs that contain the string "gallica" using the following command:
jq '[ .items[] | select(.edmIsShownBy[] | contains ("gallica")) | {id: .id, link: .edmIsShownBy[] }]'
However, I have more than 28,000 JSON files to process and it is taking a large amount of time (around 1 file per minute). I am processing the files using bash with the command:
find . -name "*.json" -exec cat '{}' ';' | jq '[ .items[] | select(.edmIsShownBy[] | contains ("gallica")) | {id: .id, link: .edmIsShownBy[] }]'
I was wondering if the slowness is due to the instruction given to jq, and if it is, is there a faster way to filter for a string contained in a chosen value? Any ideas?
It would probably be wise not to attempt to cat all the files at once; indeed, it would probably be best to avoid cat altogether.
For example, assuming program.jq contains whichever jq program you decide on (and there is nothing wrong with using contains here), you could try:
find . -name "*.json" -exec jq -f program.jq '{}' +
Using the non-standard + instead of ';' minimizes the number of times jq must be called, though the overhead of invoking jq is actually quite small. If your find does not support + and you wish to avoid calling jq once per file, then consider using xargs, or GNU parallel with the --xargs option.
If you know the JSON files of interest are in the pwd, you could also speed up find by specifying -maxdepth 1.
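For instance, the xargs route mentioned above might look like this (a sketch; -print0 and -0 guard against unusual file names):
find . -maxdepth 1 -name "*.json" -print0 | xargs -0 jq -f program.jq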

Replacing a value in a JSON structure with randomly-chosen strings from a list using jq

There are lots of similar questions but none for dynamically joining 2 files.
What I'm trying to do is to dynamically edit the following structure:
{
  "features": [
    {
      "type": "Feature",
      "properties": {
        "name": "0",
        "height": 0.7
      }
    },
    {
      "type": "Feature",
      "properties": {
        "name": "1",
        "height": 0
      }
    }
  ]
}
I want to replace only the one field .features[].properties.name with a random value from a 1d-array inside another txt file. There are 8,000 features and around 100 names I've prepared.
This is what I've got now failing with errors:
#!/bin/bash
declare -a names=("name1" "name2" "name3")
jq '{
  "features" : [
    "type" : "Feature",
    "properties" : {
      "name" : `$names[seq 0 100]`,
      "height" : .features[].properties.height
    },
    .features[].geometry
  ]
}' < areas.json
Is it even possible to do in a single command or I should use python or js for such tasks?
Your document (https://echarts.baidu.com/examples/data-gl/asset/data/buildings.json) is actually small enough that we don't need to do any crazy memory-conservation tricks to make it work; the following functions as-is:
# create sample data
[[ -e words.txt ]] || printf '%s\n' 'First Word' 'Second Word' 'Third Word' >words.txt
# actually run the replacements
jq -n --slurpfile buildings buildings.json '
# define a jq function that changes the current property name with the next input
def replaceName: (.properties.name |= input);
# now, for each document in buildings.json, replace each name it contains
$buildings[] | (.features |= map(replaceName))
' < <(shuf -r words.txt | jq -R .)
This works because shuf -r words.txt creates an unending stream of words randomly chosen from words.txt, and the jq -R . inside the process substitution quotes those as strings. (Because we only call input once per item in buildings.json, we don't try to keep running after that file's contents have been completely consumed).
For the tiny two-record document given in the question, the output looks like:
{
  "features": [
    {
      "type": "Feature",
      "properties": {
        "name": "Third Word",
        "height": 0.7
      }
    },
    {
      "type": "Feature",
      "properties": {
        "name": "Second Word",
        "height": 0
      }
    }
  ]
}
...with the actual words varying each run; it has similarly been smoke-tested with the full externally-hosted file.
Here's a solution to the problem of choosing the names randomly with replacement, using the very simple PRNG written in jq
copied from https://rosettacode.org/wiki/Random_numbers#jq
Invocation:
jq --argjson names '["name1","name2","name3","name4"]' \
-f areas.jq areas.json
areas.jq
# The random numbers are in [0 -- 32767] inclusive.
# Input: an array of length at least 2 interpreted as [count, state, ...]
# Output: [count+1, newstate, r] where r is the next pseudo-random number.
def next_rand_Microsoft:
  .[0] as $count | .[1] as $state
  | ((214013 * $state) + 2531011) % 2147483648   # mod 2^31
  | [$count+1, ., (. / 65536 | floor)] ;

# generate a stream of random integers < $n
def randoms($n):
  def r: next_rand_Microsoft
         | (.[2] % $n), r;
  [0,11] | r ;
. as $in
| ($names|length) as $count
| (.features|length) as $n
| [limit($n; randoms($count))] as $randoms
| reduce range(0; $n) as $i (.;
.features[$i].properties.name = $names[$randoms[$i]] )
Assuming your areas.json is valid JSON, then I believe the following would come close to accomplishing your intended edit:
names='["name1","name2","name3","name4"]'
jq --argjson names "$names" '.features[].properties.name = $names
' < areas.json
However, given your proposed solution, it's not clear to me what you mean by a "random value from a 1d-array". If you mean that the index should be randomly chosen (as by a PRNG), then I would suggest computing it using your favorite PRNG and passing in that random value as another argument to jq, as illustrated in the following section.
So the question becomes how to transform the text
['name1','name2','name3','name4']
into a valid JSON array. There are numerous ways this can be done, whether using jq or not, but I believe that is best left as a separate question or as an exercise, because the selection of the method will probably depend on specific details which are not mentioned in this Q. Personally, I'd use sed if possible; you might also consider using hjson, as also illustrated in the following section.
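For example, a minimal sed sketch (safe only so long as the names themselves contain no quote characters):
sed "s/'/\"/g" <<< "['name1','name2','name3','name4']"
# => ["name1","name2","name3","name4"]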
Illustration using hjson and awk
hjson -j <<< "['name1','name2','name3','name4']" > names.json.tmp
function randint {
awk -v n="$(jq length names.json.tmp)" '
function randint(n) {return int(n * rand())}
BEGIN {srand(); print randint(n)}'
}
jq --argfile names names.json.tmp --argjson n $(randint) '
.features[].properties.name = $names[$n]
' < areas.json
Addendum
Currently, jq does not have a builtin PRNG, but if you want to use jq and if you want a value from the "names" array to be chosen at random (with replacement?) for each occurrence of the .name field, then one option would be to pre-compute an array of the randomly selected names (an array of length features | length) using your favorite PRNG, and passing that array into jq:
jq --argjson randomnames "$randomnames" '
  reduce range(0; .features|length) as $i (.;
    .features[$i].properties.name = $randomnames[$i])
' < areas.json
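One way to precompute that $randomnames array in the shell (a sketch; it assumes the names live one per line in names.txt and that shuf is available; shuf -r samples with replacement):
n=$(jq '.features|length' areas.json)
randomnames=$(shuf -r -n "$n" names.txt | jq -R . | jq -s .)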
Another option would be to use a PRNG written in jq, as illustrated elsewhere on this page.

jq streaming - filter nested list and retain global structure

In a large JSON file, I want to remove some elements from a nested list, but keep the overall structure of the document.
My example input is this (but the real one is large enough to demand streaming):
{
  "keep_untouched": {
    "keep_this": [
      "this",
      "list"
    ]
  },
  "filter_this": [
    {"keep": "true"},
    {
      "keep": "true",
      "extra": "keeper"
    },
    {
      "keep": "false",
      "extra": "non-keeper"
    }
  ]
}
The required output just has one element of the 'filter_this' block removed:
{
  "keep_untouched": {
    "keep_this": [
      "this",
      "list"
    ]
  },
  "filter_this": [
    {"keep": "true"},
    {
      "keep": "true",
      "extra": "keeper"
    }
  ]
}
The standard way to handle such cases appears to be using 'truncate_stream' to reconstitute streamed objects, before filtering those in the usual jq way. Specifically, the command:
jq -nc --stream 'fromstream(1|truncate_stream(inputs))'
gives access to a stream of objects:
{"keep_this":["this","list"]}
[{"keep":"true"},{"keep":"true","extra":"keeper"},
{"keep":"false","extra":"non-keeper"}]
at which point it is easy to filter for the required objects. However, this strips the results from the context of their parent object, which is not what I want.
Looking at the streaming structure:
[["keep_untouched","keep_this",0],"this"]
[["keep_untouched","keep_this",1],"list"]
[["keep_untouched","keep_this",1]]
[["keep_untouched","keep_this"]]
[["filter_this",0,"keep"],"true"]
[["filter_this",0,"keep"]]
[["filter_this",1,"keep"],"true"]
[["filter_this",1,"extra"],"keeper"]
[["filter_this",1,"extra"]]
[["filter_this",2,"keep"],"false"]
[["filter_this",2,"extra"],"non-keeper"]
[["filter_this",2,"extra"]]
[["filter_this",2]]
[["filter_this"]]
it seems I need to select all the 'filter_this' rows, truncate those rows only (using 'truncate_stream'), rebuild those rows as objects (using 'fromstream'), filter them, and turn the objects back into the stream data format (using 'tostream') to rejoin the stream of 'keep_untouched' rows, which are still in the streaming format. At that point it would be possible to rebuild the whole JSON document. If that is the right approach - which seems overly convoluted to me - how do I do that? Or is there a better way?
If your input file consists of a single very large JSON entity that is too big for the regular jq parser to handle in your environment, then there is the distinct possibility that you won't have enough memory to reconstitute the JSON document.
With that caveat, the following may be worth a try. The key insight is that reconstruction can be accomplished using reduce.
The following uses a bunch of temporary files for the sake of clarity:
TMP=/tmp/$$
jq -c --stream 'select(length==2)' input.json > $TMP.streamed
jq -c 'select(.[0][0] != "filter_this")' $TMP.streamed > $TMP.1
jq -c 'select(.[0][0] == "filter_this")' $TMP.streamed |
jq -nc 'reduce inputs as [$p,$x] (null; setpath($p;$x))
| .filter_this |= map(select(.keep=="true"))
| tostream
| select(length==2)' > $TMP.2
# Reconstruction
jq -n 'reduce inputs as [$p,$x] (null; setpath($p;$x))' $TMP.1 $TMP.2
Output
{
  "keep_untouched": {
    "keep_this": [
      "this",
      "list"
    ]
  },
  "filter_this": [
    {
      "keep": "true"
    },
    {
      "keep": "true",
      "extra": "keeper"
    }
  ]
}
Many thanks to @peak. I found his approach really useful, but unrealistic in terms of performance. Stealing some of @peak's ideas, though, I came up with the following:
Extract the 'parent' object:
jq -c --stream 'select(length==2)' input.json |
jq -c 'select(.[0][0] != "filter_this")' |
jq -n 'reduce inputs as [$p,$x] (null; setpath($p;$x))' > $TMP.parent
Extract the 'keepers' - though this means reading the file twice (:-<):
jq -nc --stream '[fromstream(2|truncate_stream(inputs))
| select(type == "object" and .keep == "true")]
' input.json > $TMP.keepers
Insert the filtered list into the parent object:
jq -nc -s 'inputs as $items
| $items[0] as $parent
| $parent
| .filter_this |= $items[1]
' $TMP.parent $TMP.keepers > result.json
Here is a simplified version of @PeteC's script. It requires one fewer invocation of jq.
In both cases, please note that the invocation of jq that uses "2|truncate_stream(_)" requires a more recent version of jq than 1.5.
TMP=/tmp/$$
INPUT=input.json
# Extract all but .filter_this
< $INPUT jq -c --stream 'select(length==2 and .[0][0] != "filter_this")' |
jq -nc 'reduce inputs as [$p,$x] (null; setpath($p;$x))
' > $TMP.parent
# Need jq > 1.5
# Extract the 'keepers'
jq -nc --stream '
  [fromstream(2|truncate_stream(inputs))
   | select(type == "object" and .keep == "true")]
' $INPUT > $TMP.keepers
# Insert the filtered list into the parent object:
jq -s '. as $in | .[0] | (.filter_this |= $in[1])
' $TMP.parent $TMP.keepers > result.json

Parsing JSON record-per-line with jq?

I've got a tool that outputs a JSON record on each line, and I'd like to process it with jq.
The output looks something like this:
{"ts":"2017-08-15T21:20:47.029Z","id":"123","elapsed_ms":10}
{"ts":"2017-08-15T21:20:47.044Z","id":"456","elapsed_ms":13}
When I pass this to jq as follows:
./tool | jq 'group_by(.id)'
...it outputs an error:
jq: error (at <stdin>:1): Cannot index string with string "id"
How do I get jq to handle JSON-record-per-line data?
Use the --slurp (or -s) switch:
./tool | jq --slurp 'group_by(.id)'
It outputs the following:
[
  [
    {
      "ts": "2017-08-15T21:20:47.029Z",
      "id": "123",
      "elapsed_ms": 10
    }
  ],
  [
    {
      "ts": "2017-08-15T21:20:47.044Z",
      "id": "456",
      "elapsed_ms": 13
    }
  ]
]
...which you can then process further. For example:
./tool | jq -s 'group_by(.id) | map({id: .[0].id, count: length})'
As @JeffMercado pointed out, jq handles streams of JSON just fine, but if you use group_by, then you'd have to ensure its input is an array. That could be done in this case using the -s command-line option; if your jq has the inputs filter, then it can also be done using that filter in conjunction with the -n option.
If you have a version of jq with inputs (which is available in jq 1.5), however, then a better approach would be to use the following streaming variant of group_by:
# sort-free stream-oriented variant of group_by/1
# f should always evaluate to a string.
# Output: a stream of arrays, one array per group
def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x] ) | .[] ;
Usage example: GROUPS_BY(inputs; .id)
Note that you will want to use this with the -n command line option.
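Putting the pieces together, a complete invocation might look like this (a sketch, reusing ./tool from the question):
./tool | jq -cn '
  def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x]) | .[];
  GROUPS_BY(inputs; .id)
'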
Such a streaming variant has two main advantages:
it generally requires less memory in that it does not require a copy of the entire input stream to be kept in memory while it is being processed;
it is potentially faster because it does not require any sort operation, unlike group_by/1.
Please note that the above definition of GROUPS_BY/2 follows the convention for such streaming filters in that it produces a stream. Other variants are of course possible.
Handling a large amount of data
The following illustrates how to economize on memory. Suppose the task is to produce a frequency count of .id values. The humdrum solution would be:
GROUPS_BY(inputs; .id) | [(.[0]|.id), length]
A more economical and indeed far better solution would be:
GROUPS_BY(inputs|.id; .) | [.[0], length]
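As a complete command, the more economical variant might be run as (again a sketch):
./tool | jq -cn '
  def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x]) | .[];
  GROUPS_BY(inputs|.id; .) | [.[0], length]
'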

Create JSON using jq from pipe-separated keys and values in bash

I am trying to create a JSON object from a string in bash. The string is as follows:
CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0
The output is from the docker stats command, and my end goal is to publish custom metrics to AWS CloudWatch. I would like to format this string as JSON:
{
  "CONTAINER": "nginx_container",
  "CPU%": "0.02%",
  ....
}
I have used the jq command before and it seems like it should work well in this case, but I have not been able to come up with a good solution yet, other than hardcoding variable names and indexing using sed or awk and then creating the JSON from scratch. Any suggestions would be appreciated. Thanks.
Prerequisite
For all of the below, it's assumed that your content is in a shell variable named s:
s='CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0'
What (modern jq)
# thanks to @JeffMercado and @chepner for refinements, see comments
jq -Rn '
( input | split("|") ) as $keys |
( inputs | split("|") ) as $vals |
[[$keys, $vals] | transpose[] | {key:.[0],value:.[1]}] | from_entries
' <<<"$s"
How (modern jq)
This requires very new (probably 1.5?) jq to work, and is a dense chunk of code. To break it down:
Using -n prevents jq from reading stdin on its own, leaving the entirety of the input stream available to be read by input and inputs -- the former to read a single line, and the latter to read all remaining lines. (-R, for raw input, causes textual lines rather than JSON objects to be read).
With [$keys, $vals] | transpose[], we're generating [key, value] pairs (in Python terms, zipping the two lists).
With {key:.[0],value:.[1]}, we're making each [key, value] pair into an object of the form {"key": key, "value": value}
With from_entries, we're combining those pairs into objects containing those keys and values.
What (shell-assisted)
This will work with a significantly older jq than the above, and is an easily adopted approach for scenarios where a native-jq solution can be harder to wrangle:
{
  IFS='|' read -r -a keys        # read first line into an array of strings

  ## read each subsequent line into an array named "values"
  while IFS='|' read -r -a values; do

    # setup: positional arguments to pass in literal variables, query with code
    jq_args=( )
    jq_query='.'

    # copy values into the arguments, reference them from the generated code
    for idx in "${!values[@]}"; do
      [[ ${keys[$idx]} ]] || continue   # skip values with no corresponding key
      jq_args+=( --arg "key$idx" "${keys[$idx]}" )
      jq_args+=( --arg "value$idx" "${values[$idx]}" )
      jq_query+=" | .[\$key${idx}]=\$value${idx}"
    done

    # run the generated command
    jq "${jq_args[@]}" "$jq_query" <<<'{}'
  done
} <<<"$s"
How (shell-assisted)
The invoked jq command from the above is similar to:
jq --arg key0   'CONTAINER' \
   --arg value0 'nginx_container' \
   --arg key1   'CPU%' \
   --arg value1 '0.02%' \
   --arg key2   'MEMUSAGE/LIMIT' \
   --arg value2 '25.09MiB/15.26GiB' \
   '. | .[$key0]=$value0 | .[$key1]=$value1 | .[$key2]=$value2' \
   <<<'{}'
...passing each key and value out-of-band (such that it's treated as a literal string rather than parsed as JSON), then referring to them individually.
Result
Either of the above will emit:
{
  "CONTAINER": "nginx_container",
  "CPU%": "0.02%",
  "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
  "MEM%": "0.16%",
  "NETI/O": "0B/0B",
  "BLOCKI/O": "22.09MB/4.096kB",
  "PIDS": "0"
}
Why
In short: Because it's guaranteed to generate valid JSON as output.
Consider the following as an example that would break more naive approaches:
s='key ending in a backslash\
value "with quotes"'
Sure, these are unexpected scenarios, but jq knows how to deal with them:
{
"key ending in a backslash\\": "value \"with quotes\""
}
...whereas an implementation that didn't understand JSON strings could easily end up emitting:
{
"key ending in a backslash\": "value "with quotes""
}
I know this is an old post, but the tool you seek is called jo: https://github.com/jpmens/jo
A quick and easy example:
$ jo my_variable="simple"
{"my_variable":"simple"}
A little more complex
$ jo -p name=jo n=17 parser=false
{
  "name": "jo",
  "n": 17,
  "parser": false
}
Add an array
$ jo -p name=jo n=17 parser=false my_array=$(jo -a {1..5})
{
  "name": "jo",
  "n": 17,
  "parser": false,
  "my_array": [
    1,
    2,
    3,
    4,
    5
  ]
}
I've made some pretty complex stuff with jo, and the nice thing is that you don't have to worry about rolling your own solution and the possibility of generating invalid JSON.
You can ask docker to give you JSON data in the first place:
docker stats --format "{{json .}}"
For more on this, see: https://docs.docker.com/config/formatting/
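For example, to take a single snapshot per container and reshape a few fields with jq (a sketch; .Name and .CPUPerc are Docker's template keys, but check the documentation linked above for your version):
docker stats --no-stream --format "{{json .}}" | jq '{CONTAINER: .Name, "CPU%": .CPUPerc}'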
JSONSTR=""
declare -a JSONNAMES=()
declare -A JSONARRAY=()
LOOPNUM=0
cat ~/newfile | while IFS=: read CONTAINER CPU MEMUSE MEMPC NETIO BLKIO PIDS; do
if [[ "$LOOPNUM" = 0 ]]; then
JSONNAMES=("$CONTAINER" "$CPU" "$MEMUSE" "$MEMPC" "$NETIO" "$BLKIO" "$PIDS")
LOOPNUM=$(( LOOPNUM+1 ))
else
echo "{ \"${JSONNAMES[0]}\": \"${CONTAINER}\", \"${JSONNAMES[1]}\": \"${CPU}\", \"${JSONNAMES[2]}\": \"${MEMUSE}\", \"${JSONNAMES[3]}\": \"${MEMPC}\", \"${JSONNAMES[4]}\": \"${NETIO}\", \"${JSONNAMES[5]}\": \"${BLKIO}\", \"${JSONNAMES[6]}\": \"${PIDS}\" }"
fi
done
Returns:
{ "CONTAINER": "nginx_container", "CPU%": "0.02%", "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB", "MEM%": "0.16%", "NETI/O": "0B/0B", "BLOCKI/O": "22.09MB/4.096kB", "PIDS": "0" }
Here is a solution which uses the -R and -s options along with transpose:
split("\n") # [ "CONTAINER...", "nginx_container|0.02%...", ...]
| (.[0] | split("|")) as $keys # [ "CONTAINER", "CPU%", "MEMUSAGE/LIMIT", ... ]
| (.[1:][] | split("|")) # [ "nginx_container", "0.02%", ... ] [ ... ] ...
| select(length > 0) # (remove empty [] caused by trailing newline)
| [$keys, .] # [ ["CONTAINER", ...], ["nginx_container", ...] ] ...
| [ transpose[] | {(.[0]):.[1]} ] # [ {"CONTAINER": "nginx_container"}, ... ] ...
| add # {"CONTAINER": "nginx_container", "CPU%": "0.02%" ...
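Assuming the program above is saved as parse.jq (a name chosen here for illustration), it can be invoked as:
jq -R -s -f parse.jq <<<"$s"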
json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s","MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "nginx_container" "0.02%" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
This does not use jq, but it is possible to use arguments and environment variables in the values:
CONTAINER=nginx_container
json_template='{"CONTAINER":"%s","CPU%%":"%s","MEMUSAGE/LIMIT":"%s","MEM%%":"%s","NETI/O":"%s","BLOCKI/O":"%s","PIDS":"%s"}'
json_string=$(printf "$json_template" "$CONTAINER" "$1" "25.09MiB/15.26GiB" "0.16%" "0B/0B" "22.09MB/4.096kB" "0")
echo "$json_string"
If you're starting with tabular data, I think it makes more sense to use something that works with tabular data natively, like sqawk, to make it into JSON, and then use jq to work with it further.
echo 'CONTAINER|CPU%|MEMUSAGE/LIMIT|MEM%|NETI/O|BLOCKI/O|PIDS
nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0' \
| sqawk -FS '[|]' -RS '\n' -output json 'select * from a' header=1 \
| jq '.[] | with_entries(select(.key|test("^a.*")|not))'
{
  "CONTAINER": "nginx_container",
  "CPU%": "0.02%",
  "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
  "MEM%": "0.16%",
  "NETI/O": "0B/0B",
  "BLOCKI/O": "22.09MB/4.096kB",
  "PIDS": "0"
}
Without jq, sqawk gives a bit too much:
[
  {
    "anr": "1",
    "anf": "7",
    "a0": "nginx_container|0.02%|25.09MiB/15.26GiB|0.16%|0B/0B|22.09MB/4.096kB|0",
    "CONTAINER": "nginx_container",
    "CPU%": "0.02%",
    "MEMUSAGE/LIMIT": "25.09MiB/15.26GiB",
    "MEM%": "0.16%",
    "NETI/O": "0B/0B",
    "BLOCKI/O": "22.09MB/4.096kB",
    "PIDS": "0",
    "a8": "",
    "a9": "",
    "a10": ""
  }
]