Using JQ to merge two JSON snippets from one file - json

I've got output from a script that outputs two structurally identical JSON snippets into one file:
{
"Objects": [
{
"Key": "somevalue",
"VersionId": "someversion"
}
],
"Quiet": false
}
{
"Objects": [
{
"Key": "someothervalue",
"VersionId": "someotherversion"
}
],
"Quiet": false
}
I would like to pass this output through JQ to have one Objects[] list, concatenating all of the objects within the two lists, and outputting the same overall structure. I can accomplish it with piping between two separate JQ commands:
jq '.Objects[]' inputfile | jq -s '{"Objects":., "Quiet":false}' -
But I'm wondering if there is a more elegant way to do so using only one invocation of JQ.
I'm currently using JQ version 1.5 but can update if needed.

You don't need to invoke JQ twice there. The second object can be fetched using the input keyword.
.Objects += input.Objects
Online demo

You can use reduce:
jq -s 'reduce .[] as $item ({ Quiet: false }; .Objects += $item.Objects)'
See it in action.
As #oguz-ismail suggested in a comment, the -s (slurp) flag can be removed by using inputs to get the rest of the entries after the first one:
jq 'reduce inputs as $item (.; .Objects += $item.Objects)'
See it in action.
Both versions work with any number of entries in the input (the second version requires at least one).

Related

JQ write each object to subdirectory file

I'm new to jq (around 24 hours). I'm getting the filtering/selection already, but I'm wondering about advanced I/O features. Let's say I have an existing jq query that works fine, producing a stream (not a list) of objects. That is, if I pipe them to a file, it produces:
{
"id": "foo"
"value": "123"
}
{
"id": "bar"
"value": "456"
}
Is there some fancy expression I can add to my jq query to output each object individually in a subdirectory, keyed by the id, in the form id/id.json? For example current-directory/foo/foo.json and current-directory/bar/bar.json?
As #pmf has pointed out, an "only-jq" solution is not possible. A solution using jq and awk is as follows, though it is far from robust:
<input.json jq -rc '.id, .' | awk '
id=="" {id=$0; next;}
{ path=id; gsub(/[/]/, "_", path);
system("mkdir -p " path);
print >> path "/" id ".json";
id="";
}
'
As you will need help from outside jq anyway (see #peak's answer using awk), you also might want to consider using another JSON processor instead which offers more I/O features. One that comes to my mind is mikefarah/yq, a jq-inspired processor for YAML, JSON, and other formats. It can split documents into multiple files, and since its v4.27.2 release it also supports reading multiple JSON documents from a single input source.
$ yq -p=json -o=json input.json -s '.id'
$ cat foo.json
{
"id": "foo",
"value": "123"
}
$ cat bar.json
{
"id": "bar",
"value": "456"
}
The argument following -s defines the evaluation filter for each output file's name, .id in this case (the .json suffix is added automatically), and can be manipulated to further needs, e.g. -s '"file_with_id_" + .id'. However, adding slashes will not result in subdirectories being created, so this (from here on comparatively easy) part will be left over for post-processing in the shell.

Use jq to concatenate JSON arrays in multiple files

I have a series of JSON files containing an array of records, e.g.
$ cat f1.json
{
"records": [
{"a": 1},
{"a": 3}
]
}
$ cat f2.json
{
"records": [
{"a": 2}
]
}
I want to 1) extract a single field from each record and 2) output a single array containing all the field values from all input files.
The first part is easy:
jq '.records | map(.a)' f?.json
[
1,
3
]
[
2
]
But I cannot figure out how to get jq to concatenate those output arrays into a single array!
I'm not married to jq; I'll happily use another tool if necessary. But I would love to know how to do this with jq, because it's something I have been trying to figure out for years.
Assuming your jq has inputs (which is true of jq 1.5 and later), it would be most efficient to use it, e.g. along the lines of:
jq -n '[inputs.records[].a]' f*.json
Use -s (or --slurp):
jq -s 'map(.records[].a)' f?.json
You need to use --slurp so that jq will apply its filter on the aggregation of all inputs rather than on each input. When using this option, jq's input will be an array of the inputs which you need to account for.
I would use the following :
jq --slurp 'map(.records | map(.a)) | add' f?.json
We apply your current transformation to each elements of the slurped array of inputs (your previous individual inputs), then we merge those transformed arrays into one with add.
If your input files are large, slurping the file could eat up lots of memory in which you case you can reduce which works in iterative manner, appending the contents of the array .a one object at a time
jq -n 'reduce inputs.records[].a as $d (.; . += [$d])' f?.json
The -n flag is to ensure to construct the output JSON from scratch with the data available from inputs. The reduce function takes the initial value of . which because of the null input would be just null. Then for each of the input objects . += [$d] ensures that the array contents of .a are appended together.
As a compromise between the readability of --slurp and the efficiency of reduce, you can run jq twice. The first is a slightly altered version of your original command, the second slurps the undifferentiated output into a single array.
$ jq '.records[] | .a' f?.json | jq -s
[
1,
3,
2
]
--slurp (-s) key is needed and map() to do so in one shot
$ cat f1.json
{
"records": [
{"a": 1},
{"a": 3}
]
}
$ cat f2.json
{
"records": [
{"a": 2}
]
}
$ jq -s 'map(.records[].a)' f?.json
[
1,
3,
2
]

jq streaming - filter nested list and retain global structure

In a large json file, I want to remove some elements from a nested list, but keep the overall structure of the document.
My example input it this (but the real one is large enough to demand streaming).
{
"keep_untouched": {
"keep_this": [
"this",
"list"
]
},
"filter_this":
[
{"keep" : "true"},
{
"keep": "true",
"extra": "keeper"
} ,
{
"keep": "false",
"extra": "non-keeper"
}
]
}
The required output just has one element of the 'filter_this' block removed:
{
"keep_untouched": {
"keep_this": [
"this",
"list"
]
},
"filter_this":
[
{"keep" : "true"},
{
"keep": "true",
"extra": "keeper"
} ,
]
}
The standard way to handle such cases appears to be using 'truncate_stream' to reconstitute streamed objects, before filtering those in the usual jq way. Specifically, the command:
jq -nc --stream 'fromstream(1|truncate_stream(inputs))'
gives access to a stream of objects:
{"keep_this":["this","list"]}
[{"keep":"true"},{"keep":"true","extra":"keeper"},
{"keep":"false","extra":"non-keeper"}]
at which point it is easy to filter for the required objects. However, this strips the results from the context of their parent object, which is not what I want.
Looking at the streaming structure:
[["keep_untouched","keep_this",0],"this"]
[["keep_untouched","keep_this",1],"list"]
[["keep_untouched","keep_this",1]]
[["keep_untouched","keep_this"]]
[["filter_this",0,"keep"],"true"]
[["filter_this",0,"keep"]]
[["filter_this",1,"keep"],"true"]
[["filter_this",1,"extra"],"keeper"]
[["filter_this",1,"extra"]]
[["filter_this",2,"keep"],"false"]
[["filter_this",2,"extra"],"non-keeper"]
[["filter_this",2,"extra"]]
[["filter_this",2]]
[["filter_this"]]
it seems I need to select all the 'filter_this' rows, truncate those rows only (using 'truncate_stream'), rebuild these rows as objects (using 'from_stream'), filter them, and turn the objects back into the stream data format (using 'tostream') to join the stream of 'keep untouched' rows, which are still in the streaming format. At that point it would be possible to re-build the whole json. If that is the right approach - which seems overly converluted to me - how do I do that? Or is there a better way?
If your input file consists of a single very large JSON entity that is too big for the regular jq parser to handle in your environment, then there is the distinct possibility that you won't have enough memory to reconstitute the JSON document.
With that caveat, the following may be worth a try. The key insight is that reconstruction can be accomplished using reduce.
The following uses a bunch of temporary files for the sake of clarity:
TMP=/tmp/$$
jq -c --stream 'select(length==2)' input.json > $TMP.streamed
jq -c 'select(.[0][0] != "filter_this")' $TMP.streamed > $TMP.1
jq -c 'select(.[0][0] == "filter_this")' $TMP.streamed |
jq -nc 'reduce inputs as [$p,$x] (null; setpath($p;$x))
| .filter_this |= map(select(.keep=="true"))
| tostream
| select(length==2)' > $TMP.2
# Reconstruction
jq -n 'reduce inputs as [$p,$x] (null; setpath($p;$x))' $TMP.1 $TMP.2
Output
{
"keep_untouched": {
"keep_this": [
"this",
"list"
]
},
"filter_this": [
{
"keep": "true"
},
{
"keep": "true",
"extra": "keeper"
}
]
}
Many thanks to #peak. I found his approach really useful, but unrealistic in terms of performance. Stealing some of #peak's ideas, though, I came up with the following:
Extract the 'parent' object:
jq -c --stream 'select(length==2)' input.json |
jq -c 'select(.[0][0] != "filter_this")' |
jq -n 'reduce inputs as [$p,$x] (null; setpath($p;$x))' > $TMP.parent
Extract the 'keepers' - though this means reading the file twice (:-<):
jq -nc --stream '[fromstream(2|truncate_stream(inputs))
| select(type == "object" and .keep == "true")]
' input.json > $TMP.keepers
Insert the filtered list into the parent object.
jq -nc -s 'inputs as $items
| $items[0] as $parent
| $parent
| .filter_this |= $items[1]
' $TMP.parent $TMP.keepers > result.json
Here is a simplified version of #PeteC's script. It requires one fewer invocations of jq.
In both cases, please note that the invocation of jq that uses "2|truncate_stream(_)" requires a more recent version of jq than 1.5.
TMP=/tmp/$$
INPUT=input.json
# Extract all but .filter_this
< $INPUT jq -c --stream 'select(length==2 and .[0][0] != "filter_this")' |
jq -nc 'reduce inputs as [$p,$x] (null; setpath($p;$x))
' > $TMP.parent
# Need jq > 1.5
# Extract the 'keepers'
< $INPUT jq -n -c --stream '
[fromstream(2|truncate_stream(inputs))
| select(type == "object" and .keep == "true")]
' $INPUT > $TMP.keepers
# Insert the filtered list into the parent object:
jq -s '. as $in | .[0] | (.filter_this |= $in[1])
' $TMP.parent $TMP.keepers > result.json

jq - parsing& replacement based on key-value pairs within json

I have a json file in the form of a key-value map. For example:
{
"users":[
{
"key1":"user1",
"key2":"user2"
}
]
}
I have another json file. The values in the second file has to be replaced based on the keys in first file.
For example 2nd file is:
{
"info":
{
"users":["key1","key2","key3","key4"]
}
}
This second file should be replaced with
{
"info":
{
"users":["user1","user2","key3","key4"]
}
}
Because the value of key1 in first file is user1. this could be done with any python program, but I am learning jq and would like to try if it is possible with jq itself. I tried different combinations with reading file using slurpfile, then select & walk etc. But couldn't arrive at the required solution.
Any suggestions for the same will be appreciated.
Since .users[0] is a JSON dictionary, it would make sense to use it as such (e.g. for efficiency):
Invocation:
jq -c --slurpfile users users.json -f program.jq input.json
program.jq:
$users[0].users[0] as $dict
| .info.users |= map($dict[.] // .)
Output:
{"info":{"users":["user1","user2","key3","key4"]}}
Note: the above assumes that the dictionary contains no null or false values, or rather that any such values in the dictionary should be ignored. This avoids the double lookup that would otherwise be required. If this assumption is invalid, then a solution using has or in (e.g. as provided by RomanPerekhrest) would be appropriate.
Solution to supplemental problem
(See "comments".)
$users[0].users[0] as $dict
| second
| .info.users |= (map($dict[.] | select(. != null)))
sponge
It is highly inadvisable to use redirection to overwrite an input file.
If you have or can install sponge, then it would be far better to use it. For further details, see e.g. "What is jq's equivalent of sed -i?" in the jq FAQ.
jq solution:
jq --slurpfile users 1st.json '$users[0].users[0] as $users
| .info.users |= map(if in($users) then $users[.] else . end)' 2nd.json
The output:
{
"info": {
"users": [
"user1",
"user2",
"key3",
"key4"
]
}
}

How to get a subobject out of JSON using jq, keeping final key in the result without Bash processing?

I'm writing a Bash function to get a portion of a JSON object. The API for the function is:
GetSubobject()
{
local Filter="$1" # Filter is of the form .<key>.<key> ... .<key>
local File="$2" # File is the JSON to get the subobject
# Code to get subobject using jq
# ...
}
To illustrate what I mean by a subobject, consider the Bash function call:
GetSubobject .b.x.y example.json
where the file example.json contains:
{
"a": { "p": 1, "q": 2 },
"b":
{
"x":
{
"y": { "j": true, "k": [1,2,3] },
"z": [4,5,6]
}
}
}
The result from the function call would be emitted to stdout:
{
"y": {
"j": true,
"k": [
1,
2,
3
]
}
}
Note that the code jq -r "$Filter" "$File" would not give the desired answer. It would give:
{ "j": true, "k": [1,2,3] }
Please note that the answer I'm looking for needs to be something I can use in the Bash function API above. So, the answer should use the Filter and File variables as show above and not be specific to the example above.
I have come up with a solution; however, it relies on Bash to do part of the job. I am hoping that the solution can be pure jq without reliance on Bash processing.
#!/bin/bash
GetSubobject()
{
local Filter="$1"
local File="$2"
# General case: separate:
# .<key1>.<key2> ... .<keyN-1>.<keyN>
# into:
# Prefix=.<key1>.<key2> ... .<keyN-1>
# Suffix=<keyN>
local Suffix="${Filter##*.}"
local Prefix="${Filter%.$Suffix}"
# Edge case: where Filter = .<key>
# Set:
# Prefix=.
# Suffix=<key>
if [[ -z $Prefix ]]; then
Prefix='.'
Suffix="${Filter#.}"
fi
jq -r "$Prefix|to_entries|map(select(.key==\"$Suffix\"))|from_entries" "$File"
}
GetSubobject "$#"
How would I complete the above Bash function using jq to obtain the desired result, hopefully in a less brute-force way that takes advantage of jq's capabilities without having to do pre-processing in Bash?
Somewhat further simplifying the jq part but with the same general constraints as JawguyChooser's answer, how about the much simpler Bash function
GetSubject () {
local newroot=${1##*.}
jq -r "{$newroot: $1}" "$2"
}
I may be overlooking some nuances of your more-complex Bash processing, but this seems to work for the example you provided.
If I understand what you're trying to do correctly, it doesn't seem possible to me to do it "pure jq" having read the docs (and being a regular jq user myself). The closest I could come to helping here was to simplify the jq part itself:
jq -r "$Prefix| { $Suffix }" "$File"
This has the same behavior as your example (on this limited set of cases):
GetSubobject '.b.x.y' example.json
{
"y": {
"j": true,
"k": [
1,
2,
3
]
}
}
This is really a case of metaprogramming, you want to programmatically operate on a jq program. Well, it makes sense (to me) that jq takes its program as input but doesn't allow you to alter the program itself. bash seems like an appropriate choice for doing the metaprogramming here: to convert a jq program into another one and then run jq using that.
If the goal is to do as little as possible in bash, then maybe the following bash function will fill the bill:
function GetSubobject {
local Filter="$1" # Filter is of the form .<key>.<key> ... .<key>
local File="$2" # File is the JSON to get the subobject
jq '(null|path('"$Filter"')) as $path
| {($path[-1]): '"$Filter"'}' "$File"
}
An alternative would be to pass $Filter in as a string (e.g. --arg filter "$Filter"), have jq do the parsing, and then use getpath.
It would of course be simplest if GetSubobject could be called with the path separated from the field of interest, like this:
GetSubobject .b.x y filename