Retrieving required set-off json objects using through "Jq" method - json

I have below mentioned Json file. I wanted to do the below checks.
Get 1st 5 objects from the whole list and save them in a separate file (i.e FirstTopObject.json)
Get another set-off 5 objects and store them into another file (i.e SecondTopObject.json)
Get the last 5 objects and store them into another file (i.e ThirdTopObject.json)
Basically, wanted to split the Objects based on the Numbers and save them into a separate file.
Is there any solution is available to achieve through the “jq” function/method?
Input File:
{
"storeId": "0001"
}
{
"storeId": "0002"
}
{
"storeId": "0003"
}
{
"storeId": "0004"
}
{
"storeId": "0005"
}
{
"storeId": "0006"
}
{
"storeId": "0007"
}
{
"storeId": "0008"
}
{
"storeId": "0009"
}
{
"storeId": "00010"
}
{
"storeId": "00011"
}
{
"storeId": "00012"
}
{
"storeId": "00013"
}
{
"storeId": "00014"
}
{
"storeId": "00015"
}
enter code here
enter code here
Expecting output:
FirstTopObject.json should have the below set.
{
"storeId": "0001"
}
{
"storeId": "0002"
}
{
"storeId": "0003"
}
{
"storeId": "0004"
}
{
"storeId": "0005"
}
SecondTopObject.json - shold contain below setoff objects.
{
"storeId": "0006"
}
{
"storeId": "0007"
}
{
"storeId": "0008"
}
{
"storeId": "0009"
}
{
"storeId": "00010"
}
Like wise for other set.
It Would be more helpful if some help me.
Thanks in advance!

You could use JQ to process the input file - reformat it using -c (compact) and then use standard unix tools to split the files, i.e.
cat input | jq --slurp -c .[] | head -5 | jq . > FirstTopObject.json
cat input | jq --slurp -c .[] | sed '6,10!d' | jq . > SecondTopObject.json
cat input | jq --slurp -c .[] | tail -5 | jq . > ThirdTopObject.json

You can use a combination of the jq functions to_entries and group_by for this, along with a little bash.
This snippet will create 25 strings ("line 0", "line 1", etc.), group them by 5s, and write them into files 0.json, 1.json, etc. Everything before to_entries can be replaced with any list. In your case, you can use the slurp flag -s to get all your JSON objects in your input file into a list.
FILE_NUM=0
jq -nc '
# create input
["line " + (range(25) | tostring)] |
# process input
to_entries | group_by(.key / 5 | floor)[] | map(.value)
' | while read LINE; do echo "$LINE" > "/tmp/$((FILE_NUM++)).json"; done

There is no need to slurp the input file! Even if the output must be pretty-printed, there is no need for more than four invocations of jq altogether.
Handling a small input file
If the input is not so big, you can simply run
jq -c . input
directing the output to a temporary file, and then split that file into three using whichever standard command-line tools you find most convenient (a single invocation of awk might be worth considering ...).
Handling a very large input file
If the input file is very large, then it would make sense to use jq just to copy the (15) items of interest into a temporary file, and then process that file:
Step 1
Invoke the following program with jq -cn:
def echo($n1; $n2; $last):
foreach (inputs,null) as $in ({ix:-1, first:[], second:[], last:[]};
if $in then
.ix += 1
| if .ix < $n1
then .first += [$in]
elif .ix < $n1+n2 then .second += [$in]
else .last += [$in]
| .last = (.last[ - $last: ])
end
else . end;
if $in == null then del(.ix) else empty end
)
| .[];
echo(5;5;5)
(This program is somewhat complex because it makes no assumptions about the relative sizes of the three blocks.)
Step 2
Assuming the output from Step 1 is in input.tmp, then run:
sed -n 1p input.tmp | jq .[] > FirstTopObject.json
sed -n 2p input.tmp | jq .[] > SecondTopObject.json
sed -n 3p input.tmp | jq .[] > ThirdTopObject.json

Related

Generate csv files from a JSON

Unfortunately I have considerable difficulties to generate three csv files from one json format. Maybe someone has a good hint how I could do this. Thanks
Here is the output. Within dropped1 and dropped2 can be several different and multiple addresses.
{
"result": {
"found": 0,
"dropped1": {
"address10": 1140
},
"rates": {
"total": {
"1min": 3579,
"5min": 1593,
"15min": 5312,
"60min": 1328
},
"dropped2": {
"address20": {
"1min": 9139,
"5min": 8355,
"15min": 2785,
"60min": 8196
}
}
},
"connections": 1
},
"id": "whatever",
"jsonrpc": "2.0"
}
The 3 csv files should be displayed in this form.
address10,1140
total,3579,1593,5312,1328
address20,9139,8355,2785,8196
If you decide to use jq, then unless there is some specific reason not to, I'd suggest invoking jq once for each of the three output files. The three invocations would then look like these:
jq -r '.result.dropped1 | [to_entries[][]] | #csv' > 1.csv
jq -r '.result.rates.total | ["total", .["1min"], .["5min"], .["15min"], .["60min"]] | #csv' > 2.csv
jq -r '.result.rates.dropped2
| to_entries[]
| [.key] + ( .value | [ .["1min"], .["5min"], .["15min"], .["60min"]] )
| #csv
' > 3.csv
If you can be sure the ordering of keys within the total and address20 objects is fixed and in the correct order, then the last two invocations can be simplified.
Did you try using this library?
https://www.npmjs.com/package/json-to-csv-stream
npm i json-to-csv-stream

jq: from one json input, construct multiple rows of tsv using an expression against the keys?

Using jq I can extract the data in this simple way as follows:
find . -name '*.jsonl' | xargs -I {} jq '[.data.Item_A_Foo.value, .data.Item_A_Bar.value] | #tsv' >> foobar.tsv
find . -name '*.jsonl' | xargs -I {} jq '[.data.Item_B_Foo.value, .data.Item_B_Bar.value] | #tsv' >> foobar.tsv
find . -name '*.jsonl' | xargs -I {} jq '[.data.Item_B_Foo.value, .data.Item_B_Bar.value] | #tsv' >> foobar.tsv
...
# and so on
But this seems pretty wasteful. Is there a more advanced way to use JQ, and perhaps:
Filter for .data.Item_*_Foo.value, .data.Item_*_Bar.value
OR chain these rows in a single jq expression (reasonably readable, compact)
# Here is a made up JSON file that can motivate this question.
# Imagine there are 100,000 of these and they are larger.
{
"data":
{
"Item_A_Foo": {
"adj": "wild",
"adv": "unruly",
"value": "unknown"
},
"Item_A_Bar": {
"adj": "rotund",
"quality": "mighty",
"value": "swing"
},
"Item_B_Foo": {
"adj": "nice",
"adv": "heroically",
"value": "medium"
},
... etc. for many Foo's and Bar's of A, B, C, ..., Z types
"Not_an_Item": {
"value": "doesn't matter"
}
}
And the goal is:
unknown, swing # data.Item_A_Foo.value, data.Item_A_Bar.value
medium, hit # data.Item_B_Foo.value, data.Item_B_Bar.value
whatever, etc. # data.Item_C_Foo.value, data.Item_C_Bar.value
The details of your requirements are unclear, but you could proceed along the lines suggested by this jq filter:
.data
| (keys_unsorted|map(select(test("^Item_[^_]*_Foo$")))) as $foos
| ($foos | map(sub("_Foo$"; "_Bar"))) as $bars
| [ .[$foos[]].value, .[$bars[]].value]
| #tsv
The idea is to determine dynamically which keys to select.

Building new JSON with JQ and bash

I am trying to create JSON from scratch using bash.
The final structure needs to be like:
{
"hosts": {
"a_hostname" : {
"ips" : [
1,
2,
3
]
},
{...}
}
}
First I'm creating an input file with the format:
hostname ["1.1.1.1","2.2.2.2"]
host-name2 ["3.3.3.3","4.4.4.4"]
This is being created by:
for host in $( ansible -i hosts all --list-hosts ) ; \
do echo -n "${host} " ; \
ansible -i hosts $host -m setup | sed '1c {' | jq -r -c '.ansible_facts.ansible_all_ipv4_addresses' ; \
done > hosts.txt
The key point here is that the IP list/array, is coming from a JSON file and being extracted by jq. This extraction outputs an already valid / quoted JSON array, but as a string in a txt file.
Next I'm using jq to parse the whole text file into the desired JSON:
jq -Rn '
{ "hosts": [inputs |
split("\\s+"; "g") |
select(length > 0 and .[0] != "") |
{(.[0]):
{ips:.[1]}
}
] | add }
' < ~/hosts.txt
This is almost correct, everything except for the IPs value which is treated as a string and quoted leading to:
{
"hosts": {
"hostname1": {
"ips": "[\"1.1.1.1\",\"2.2.2.2\"]"
},
"host-name2": {
"ips": "[\"3.3.3.3\",\"4.4.4.4\"]"
}
}
}
I'm now stuck at this final hurdle - how to insert the IPs without causing them to be quoted again.
Edit - quoting solved by using {ips: .[1] | fromjson }} instead of {ips:.[1]}.
However this was completely negated by #CharlesDuffy's help suggesting converting to TSV.
Original Q body:
So far I've got to
jq -n {hosts:{}} | \
for host in $( ansible -i hosts all --list-hosts ) ; \
do jq ".hosts += {$host:{}}" | \
jq ".hosts.$host += {ips:[1,2,3]}" ; \
done ;
([1,2,3] is actually coming from a subshell but including it seemed unnecessary as that part works, and made it harder to read)
This sort of works, but there seems to be 2 problems.
1) Final output only has a single host in it containg data from the first host in the list (this persists even if the second problem is bypassed):
{
"hosts": {
"host_1": {
"ips": [
1,
2,
3
]
}
}
}
2) One of the hostnames has a - in it, which causes syntax and compiler errors from jq. I'm stuck going around quote hell trying to get it to be interpreted but also quoted. Help!
Thanks for any input.
Let's say your input format is:
host_1 1 2 3
host_2 2 3 4
host-with-dashes 3 4 5
host-with-no-addresses
...re: edit specifying a different format: Add #tsv onto the JQ command producing the existing format to generate this one instead.
If you want to transform that to the format in question, it might look like:
jq -Rn '
{ "hosts": [inputs |
split("\\s+"; "g") |
select(length > 0 and .[0] != "") |
{(.[0]): .[1:]}
] | add
}' <input.txt
Which yields as output:
{
"hosts": {
"host_1": [
"1",
"2",
"3"
],
"host_2": [
"2",
"3",
"4"
],
"host-with-dashes": [
"3",
"4",
"5"
],
"host-with-no-addresses": []
}
}

jq streaming - filter nested list and retain global structure

In a large json file, I want to remove some elements from a nested list, but keep the overall structure of the document.
My example input it this (but the real one is large enough to demand streaming).
{
"keep_untouched": {
"keep_this": [
"this",
"list"
]
},
"filter_this":
[
{"keep" : "true"},
{
"keep": "true",
"extra": "keeper"
} ,
{
"keep": "false",
"extra": "non-keeper"
}
]
}
The required output just has one element of the 'filter_this' block removed:
{
"keep_untouched": {
"keep_this": [
"this",
"list"
]
},
"filter_this":
[
{"keep" : "true"},
{
"keep": "true",
"extra": "keeper"
} ,
]
}
The standard way to handle such cases appears to be using 'truncate_stream' to reconstitute streamed objects, before filtering those in the usual jq way. Specifically, the command:
jq -nc --stream 'fromstream(1|truncate_stream(inputs))'
gives access to a stream of objects:
{"keep_this":["this","list"]}
[{"keep":"true"},{"keep":"true","extra":"keeper"},
{"keep":"false","extra":"non-keeper"}]
at which point it is easy to filter for the required objects. However, this strips the results from the context of their parent object, which is not what I want.
Looking at the streaming structure:
[["keep_untouched","keep_this",0],"this"]
[["keep_untouched","keep_this",1],"list"]
[["keep_untouched","keep_this",1]]
[["keep_untouched","keep_this"]]
[["filter_this",0,"keep"],"true"]
[["filter_this",0,"keep"]]
[["filter_this",1,"keep"],"true"]
[["filter_this",1,"extra"],"keeper"]
[["filter_this",1,"extra"]]
[["filter_this",2,"keep"],"false"]
[["filter_this",2,"extra"],"non-keeper"]
[["filter_this",2,"extra"]]
[["filter_this",2]]
[["filter_this"]]
it seems I need to select all the 'filter_this' rows, truncate those rows only (using 'truncate_stream'), rebuild these rows as objects (using 'from_stream'), filter them, and turn the objects back into the stream data format (using 'tostream') to join the stream of 'keep untouched' rows, which are still in the streaming format. At that point it would be possible to re-build the whole json. If that is the right approach - which seems overly converluted to me - how do I do that? Or is there a better way?
If your input file consists of a single very large JSON entity that is too big for the regular jq parser to handle in your environment, then there is the distinct possibility that you won't have enough memory to reconstitute the JSON document.
With that caveat, the following may be worth a try. The key insight is that reconstruction can be accomplished using reduce.
The following uses a bunch of temporary files for the sake of clarity:
TMP=/tmp/$$
jq -c --stream 'select(length==2)' input.json > $TMP.streamed
jq -c 'select(.[0][0] != "filter_this")' $TMP.streamed > $TMP.1
jq -c 'select(.[0][0] == "filter_this")' $TMP.streamed |
jq -nc 'reduce inputs as [$p,$x] (null; setpath($p;$x))
| .filter_this |= map(select(.keep=="true"))
| tostream
| select(length==2)' > $TMP.2
# Reconstruction
jq -n 'reduce inputs as [$p,$x] (null; setpath($p;$x))' $TMP.1 $TMP.2
Output
{
"keep_untouched": {
"keep_this": [
"this",
"list"
]
},
"filter_this": [
{
"keep": "true"
},
{
"keep": "true",
"extra": "keeper"
}
]
}
Many thanks to #peak. I found his approach really useful, but unrealistic in terms of performance. Stealing some of #peak's ideas, though, I came up with the following:
Extract the 'parent' object:
jq -c --stream 'select(length==2)' input.json |
jq -c 'select(.[0][0] != "filter_this")' |
jq -n 'reduce inputs as [$p,$x] (null; setpath($p;$x))' > $TMP.parent
Extract the 'keepers' - though this means reading the file twice (:-<):
jq -nc --stream '[fromstream(2|truncate_stream(inputs))
| select(type == "object" and .keep == "true")]
' input.json > $TMP.keepers
Insert the filtered list into the parent object.
jq -nc -s 'inputs as $items
| $items[0] as $parent
| $parent
| .filter_this |= $items[1]
' $TMP.parent $TMP.keepers > result.json
Here is a simplified version of #PeteC's script. It requires one fewer invocations of jq.
In both cases, please note that the invocation of jq that uses "2|truncate_stream(_)" requires a more recent version of jq than 1.5.
TMP=/tmp/$$
INPUT=input.json
# Extract all but .filter_this
< $INPUT jq -c --stream 'select(length==2 and .[0][0] != "filter_this")' |
jq -nc 'reduce inputs as [$p,$x] (null; setpath($p;$x))
' > $TMP.parent
# Need jq > 1.5
# Extract the 'keepers'
< $INPUT jq -n -c --stream '
[fromstream(2|truncate_stream(inputs))
| select(type == "object" and .keep == "true")]
' $INPUT > $TMP.keepers
# Insert the filtered list into the parent object:
jq -s '. as $in | .[0] | (.filter_this |= $in[1])
' $TMP.parent $TMP.keepers > result.json

How to use jq to find all paths to a certain key

In a very large nested json structure I'm trying to find all of the paths that end in a key.
ex:
{
"A": {
"A1": {
"foo": {
"_": "_"
}
},
"A2": {
"_": "_"
}
},
"B": {
"B1": {}
},
"foo": {
"_": "_"
}
}
would print something along the lines of:
["A","A1","foo"], ["foo"]
Unfortunately I don't know at what level of nesting the keys will appear, so I haven't been able to figure it out with a simple select. I've gotten close with jq '[paths] | .[] | select(contains(["foo"]))', but the output contains all the permutations of any tree that contains foo.
output: ["A", "A1", "foo"]["A", "A1", "foo", "_"]["foo"][ "foo", "_"]
Bonus points if I could keep the original data structure format but simply filter out all paths that don't contain the key (in this case the sub trees under "foo" wouldn't need to be hidden).
With your input:
$ jq -c 'paths | select(.[-1] == "foo")'
["A","A1","foo"]
["foo"]
Bonus points:
(1) If your jq has tostream:
$ jq 'fromstream(tostream| select(.[0]|index("foo")))'
Or better yet, since your input is large, you can use the streaming parser (jq -n --stream) with this filter:
fromstream( inputs|select( (.[0]|index("foo"))))
(2) Whether or not your jq has tostream:
. as $in
| reduce (paths(scalars) | select(index("foo"))) as $p
(null; setpath($p; $in|getpath($p)))
In all three cases, the output is:
{
"A": {
"A1": {
"foo": {
"_": "_"
}
}
},
"foo": {
"_": "_"
}
}
I had the same fundamental problem.
With (yaml) input like:
developer:
android:
members:
- alice
- bob
oncall:
- bob
hr:
members:
- charlie
- doug
this:
is:
really:
deep:
nesting:
members:
- example deep nesting
I wanted to find all arbitrarily nested groups and get their members.
Using this:
yq . | # convert yaml to json using python-yq
jq '
. as $input | # Save the input for later
. | paths | # Get the list of paths
select(.[-1] | tostring | test("^(members|oncall|priv)$"; "ix")) | # Only find paths which end with members, oncall, and priv
. as $path | # save each path in the $path variable
( $input | getpath($path) ) as $members | # Get the value of each path from the original input
{
"key": ( $path | join("-") ), # The key is the join of all path keys
"value": $members # The value is the list of members
}
' |
jq -s 'from_entries' | # collect kv pairs into a full object using slurp
yq --sort-keys -y . # Convert back to yaml using python-yq
I get output like this:
developer-android-members:
- alice
- bob
developer-android-oncall:
- bob
hr-members:
- charlie
- doug
this-is-really-deep-nesting-members:
- example deep nesting

Categories