Use jq to concatenate JSON arrays in multiple files

I have a series of JSON files containing an array of records, e.g.
$ cat f1.json
{
  "records": [
    {"a": 1},
    {"a": 3}
  ]
}
$ cat f2.json
{
  "records": [
    {"a": 2}
  ]
}
I want to 1) extract a single field from each record and 2) output a single array containing all the field values from all input files.
The first part is easy:
jq '.records | map(.a)' f?.json
[
  1,
  3
]
[
  2
]
But I cannot figure out how to get jq to concatenate those output arrays into a single array!
I'm not married to jq; I'll happily use another tool if necessary. But I would love to know how to do this with jq, because it's something I have been trying to figure out for years.

Assuming your jq has inputs (which is true of jq 1.5 and later), it would be most efficient to use it, e.g. along the lines of:
jq -n '[inputs.records[].a]' f*.json
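Given the two sample files above, this produces a single array directly:
[
  1,
  3,
  2
]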

Use -s (or --slurp):
jq -s 'map(.records[].a)' f?.json

You need to use --slurp so that jq applies its filter to the aggregation of all inputs rather than to each input individually. With this option, jq's input becomes an array of the original inputs, which your filter needs to account for.
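For example, with the two sample files above, the slurped input that the filter sees is the single array:
[
  {"records": [{"a": 1}, {"a": 3}]},
  {"records": [{"a": 2}]}
]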
I would use the following :
jq --slurp 'map(.records | map(.a)) | add' f?.json
We apply your current transformation to each element of the slurped array of inputs (your previous individual inputs), then merge the transformed arrays into one with add.
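Tracing the data flow with the sample files:
[{"records":[{"a":1},{"a":3}]},{"records":[{"a":2}]}]  # slurped input
[[1,3],[2]]                                            # after map(.records | map(.a))
[1,3,2]                                                # after add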

If your input files are large, slurping them could eat up lots of memory, in which case you can use reduce, which works iteratively, appending the .a values one object at a time:
jq -n 'reduce inputs.records[].a as $d (.; . += [$d])' f?.json
The -n flag ensures that the output JSON is constructed from scratch using only the data available from inputs. reduce starts from ., which, because of the null input, is just null. Then, for each input value, . += [$d] appends that value to the accumulating array.
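Tracing the accumulator (recall that in jq, null + [x] is [x], so no explicit initial array is needed):
null  += [1]  =>  [1]
[1]   += [3]  =>  [1,3]
[1,3] += [2]  =>  [1,3,2]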

As a compromise between the readability of --slurp and the efficiency of reduce, you can run jq twice. The first is a slightly altered version of your original command, the second slurps the undifferentiated output into a single array.
$ jq '.records[] | .a' f?.json | jq -s .
[
  1,
  3,
  2
]

The --slurp (-s) flag is needed, together with map(), to do it in one shot:
$ cat f1.json
{
  "records": [
    {"a": 1},
    {"a": 3}
  ]
}
$ cat f2.json
{
  "records": [
    {"a": 2}
  ]
}
$ jq -s 'map(.records[].a)' f?.json
[
  1,
  3,
  2
]

Related

Merge 2 files which have Json objects using Jq

I have a requirement wherein two parameter files need to be merged into one using jq.
param1.json
[
  "name=xyz",
  "age=40",
  "email=qqqq"
]
param2.json
[
  "name=xyz",
  "age=42",
  "drivingLicense=2761"
]
I need the resultant value to be
[
  "name=xyz",
  "age=42",
  "email=qqqq",
  "drivingLicense=2761"
]
When I try jq's add approach, jq -s '.[0] + .[1]' param1.json param2.json, the result is
[
  "name=xyz",
  "age=40",
  "email=qqqq",
  "name=xyz",
  "age=42",
  "drivingLicense=2761"
]
I tried using jq '. * input' param1.json param2.json, but that does not work either.
What is the best way to merge them?
TIA
This approach makes use of the fact that object field names are unique; on collision, later items overwrite earlier ones.
jq -s '[add | with_entries(.key = (.value | .[:index("=")]))[]]'
[
  "name=xyz",
  "age=42",
  "email=qqqq",
  "drivingLicense=2761"
]
Note: Instead of add you can, of course, still use .[0] + .[1] or . + input (the latter without -s).
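For reference, with the two param files the intermediate object right after with_entries, before the outer [...[]] collects its values back into an array, is:
{
  "name": "name=xyz",
  "age": "age=42",
  "email": "email=qqqq",
  "drivingLicense": "drivingLicense=2761"
}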
You can first convert your arrays into objects, then add those objects together; then convert to an array again:
$ jq -s 'map(map(./"="|{(first):.[1:]|join("=")})|add)|add|to_entries|map(join("="))' param1.json param2.json
[
  "name=xyz",
  "age=42",
  "email=qqqq",
  "drivingLicense=2761"
]
If your values cannot contain an equal sign, then {(first):.[1:]|join("=")} can be simplified to {(first):last}.
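With that simplification, the whole invocation would read:
$ jq -s 'map(map(./"="|{(first):last})|add)|add|to_entries|map(join("="))' param1.json param2.json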
Or merging the arrays to one big array before converting to objects:
add
|map(./"="|{(first):.[1:]|join("=")})
|add
|to_entries
|map(join("="))
Leveraging the fact that this can be reformulated as a grouping problem, you can group by the "key" of each string, then select the last item in each group (a reusable function to build group objects can help but is not required). Note that group_by sorts by the grouping key, which is why the output below is ordered differently.
$ jq -s 'add | map(./"=") | group_by(first) | map(last|join("="))' param1.json param2.json
[
  "age=42",
  "drivingLicense=2761",
  "email=qqqq",
  "name=xyz"
]

Using JQ to merge two JSON snippets from one file

I've got output from a script that outputs two structurally identical JSON snippets into one file:
{
  "Objects": [
    {
      "Key": "somevalue",
      "VersionId": "someversion"
    }
  ],
  "Quiet": false
}
{
  "Objects": [
    {
      "Key": "someothervalue",
      "VersionId": "someotherversion"
    }
  ],
  "Quiet": false
}
I would like to pass this output through JQ to have one Objects[] list, concatenating all of the objects within the two lists, and outputting the same overall structure. I can accomplish it with piping between two separate JQ commands:
jq '.Objects[]' inputfile | jq -s '{"Objects":., "Quiet":false}' -
But I'm wondering if there is a more elegant way to do so using only one invocation of JQ.
I'm currently using JQ version 1.5 but can update if needed.
You don't need to invoke JQ twice there. The second object can be fetched using the input keyword.
.Objects += input.Objects
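Run against the file from the question (assumed here to be named inputfile, as in your two-step version), this gives:
$ jq '.Objects += input.Objects' inputfile
{
  "Objects": [
    {
      "Key": "somevalue",
      "VersionId": "someversion"
    },
    {
      "Key": "someothervalue",
      "VersionId": "someotherversion"
    }
  ],
  "Quiet": false
}
Here . is bound to the first JSON entity in the file, and input consumes the next one.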
You can use reduce:
jq -s 'reduce .[] as $item ({ Quiet: false }; .Objects += $item.Objects)'
As @oguz-ismail suggested in a comment, the -s (slurp) flag can be removed by using inputs to get the rest of the entries after the first one:
jq 'reduce inputs as $item (.; .Objects += $item.Objects)'
Both versions work with any number of entries in the input (the second version requires at least one).
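If the zero-entry case ever matters, a sketch combining -n with the explicit initial value from the reduce answer above would cover it:
jq -n 'reduce inputs as $item ({ Objects: [], Quiet: false }; .Objects += $item.Objects)'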

jq streaming - filter nested list and retain global structure

In a large json file, I want to remove some elements from a nested list, but keep the overall structure of the document.
My example input is this (but the real one is large enough to demand streaming).
{
  "keep_untouched": {
    "keep_this": [
      "this",
      "list"
    ]
  },
  "filter_this": [
    {"keep": "true"},
    {
      "keep": "true",
      "extra": "keeper"
    },
    {
      "keep": "false",
      "extra": "non-keeper"
    }
  ]
}
The required output just has one element of the 'filter_this' block removed:
{
  "keep_untouched": {
    "keep_this": [
      "this",
      "list"
    ]
  },
  "filter_this": [
    {"keep": "true"},
    {
      "keep": "true",
      "extra": "keeper"
    }
  ]
}
The standard way to handle such cases appears to be using 'truncate_stream' to reconstitute streamed objects, before filtering those in the usual jq way. Specifically, the command:
jq -nc --stream 'fromstream(1|truncate_stream(inputs))'
gives access to a stream of objects:
{"keep_this":["this","list"]}
[{"keep":"true"},{"keep":"true","extra":"keeper"},
{"keep":"false","extra":"non-keeper"}]
at which point it is easy to filter for the required objects. However, this strips the results from the context of their parent object, which is not what I want.
Looking at the streaming structure:
[["keep_untouched","keep_this",0],"this"]
[["keep_untouched","keep_this",1],"list"]
[["keep_untouched","keep_this",1]]
[["keep_untouched","keep_this"]]
[["filter_this",0,"keep"],"true"]
[["filter_this",0,"keep"]]
[["filter_this",1,"keep"],"true"]
[["filter_this",1,"extra"],"keeper"]
[["filter_this",1,"extra"]]
[["filter_this",2,"keep"],"false"]
[["filter_this",2,"extra"],"non-keeper"]
[["filter_this",2,"extra"]]
[["filter_this",2]]
[["filter_this"]]
it seems I need to select all the 'filter_this' rows, truncate those rows only (using truncate_stream), rebuild them as objects (using fromstream), filter those objects, and turn them back into the stream data format (using tostream) to rejoin the stream of 'keep_untouched' rows, which are still in streaming format. At that point it would be possible to rebuild the whole JSON. If that is the right approach, which seems overly convoluted to me, how do I do that? Or is there a better way?
If your input file consists of a single very large JSON entity that is too big for the regular jq parser to handle in your environment, then there is the distinct possibility that you won't have enough memory to reconstitute the JSON document.
With that caveat, the following may be worth a try. The key insight is that reconstruction can be accomplished using reduce.
The following uses a bunch of temporary files for the sake of clarity:
TMP=/tmp/$$
jq -c --stream 'select(length==2)' input.json > $TMP.streamed
jq -c 'select(.[0][0] != "filter_this")' $TMP.streamed > $TMP.1
jq -c 'select(.[0][0] == "filter_this")' $TMP.streamed |
jq -nc 'reduce inputs as [$p,$x] (null; setpath($p;$x))
| .filter_this |= map(select(.keep=="true"))
| tostream
| select(length==2)' > $TMP.2
# Reconstruction
jq -n 'reduce inputs as [$p,$x] (null; setpath($p;$x))' $TMP.1 $TMP.2
Output
{
  "keep_untouched": {
    "keep_this": [
      "this",
      "list"
    ]
  },
  "filter_this": [
    {
      "keep": "true"
    },
    {
      "keep": "true",
      "extra": "keeper"
    }
  ]
}
Many thanks to @peak. I found his approach really useful, but unrealistic in terms of performance. Stealing some of @peak's ideas, though, I came up with the following:
Extract the 'parent' object:
jq -c --stream 'select(length==2)' input.json |
jq -c 'select(.[0][0] != "filter_this")' |
jq -n 'reduce inputs as [$p,$x] (null; setpath($p;$x))' > $TMP.parent
Extract the 'keepers' - though this means reading the file twice (:-<):
jq -nc --stream '[fromstream(2|truncate_stream(inputs))
| select(type == "object" and .keep == "true")]
' input.json > $TMP.keepers
Insert the filtered list into the parent object.
jq -nc -s 'inputs as $items
| $items[0] as $parent
| $parent
| .filter_this |= $items[1]
' $TMP.parent $TMP.keepers > result.json
Here is a simplified version of @PeteC's script. It requires one fewer invocation of jq.
In both cases, please note that the invocation of jq that uses "2|truncate_stream(_)" requires a version of jq more recent than 1.5.
TMP=/tmp/$$
INPUT=input.json
# Extract all but .filter_this
< $INPUT jq -c --stream 'select(length==2 and .[0][0] != "filter_this")' |
jq -nc 'reduce inputs as [$p,$x] (null; setpath($p;$x))
' > $TMP.parent
# Need jq > 1.5
# Extract the 'keepers'
< $INPUT jq -n -c --stream '
[fromstream(2|truncate_stream(inputs))
| select(type == "object" and .keep == "true")]
' $INPUT > $TMP.keepers
# Insert the filtered list into the parent object:
jq -s '. as $in | .[0] | (.filter_this |= $in[1])
' $TMP.parent $TMP.keepers > result.json

convert output of jq from list of lists to delimited string

I have a JSON file from which I'm outputting two numbers (lat/lon) at a time. The output now is
[
  2.294891,
  48.875284
]
[
  -2.14908,
  53.281214
]
[
  1.963667,
  48.768891
]
[
  -3.739434,
  40.390413
]
what I want is the numbers to become strings and be concatenated like
2.294891,48.875284
-2.14908,53.281214
...
but I don't know how to do it with jq.
Update:
I could convert the output to
[2.294891,48.875284]
[-2.14908,53.281214]
[1.963667,48.768891]
with the -c argument, and use tr -d '[' | tr -d ']' in the pipe to remove the brackets, but I'm sure there is a more elegant way of doing it.
Easy!
$ jq -r '@csv' input.json
2.294891,48.875284
-2.14908,53.281214
1.963667,48.768891
-3.739434,40.390413
Beware, though, that the precision may differ (or in general, be lost).
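If you'd rather avoid @csv, a minimal alternative sketch is to stringify the numbers yourself and join them:
$ jq -r 'map(tostring) | join(",")' input.json
2.294891,48.875284
-2.14908,53.281214
...
The same precision caveat applies, since tostring also reformats the numbers.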

Filter only specific keys from an external file in jq

I have a JSON file with the following format:
[
  {
    "id": "00001",
    "attr": {
      "a": "foo",
      "b": "bar",
      ...
    }
  },
  {
    "id": "00002",
    "attr": {
      ...
    },
    ...
  },
  ...
]
and a text file with a list of ids, one per line. I'd like to use jq to filter only the records whose ids are mentioned in the text file. I.e. if the list contains "00001", only the first one should be printed.
Note, that I can't simply grep since each record may have an arbitrary number of attributes and sub-attributes.
There are basically two ways to proceed:
1. read the file of ids from STDIN
2. read the JSON from STDIN
Both are feasible, but here we illustrate (2) as it leads to a simple but efficient solution.
Suppose the JSON file is named in.json and the list of ids is in a file named ids.txt like so:
00001
00010
Notice that this file has no quotation marks. If it did, the following could be significantly simplified, as shown in the postscript.
The trick is to convert ids.txt into a JSON array. With the above assumption about quotation marks, this can be done by:
jq -R . ids.txt | jq -s .
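Given the two-line ids.txt above, this produces:
[
  "00001",
  "00010"
]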
Assuming a reasonable shell, a simple solution is now at hand:
jq --argjson ids "$(jq -R . ids.txt | jq -s .)" '
map( select( .id as $id | $ids | index($id) ))' in.json
Faster
Assuming your jq has any/2, a simpler and more efficient solution can be obtained by defining:
def isin($a): . as $in | any($a[]; $in == .);
The required jq filter is then just:
map( select( .id | isin($ids) ) )
If these two lines of jq are put into a file named select.jq, the required incantation is simply:
jq --argjson ids "$(jq -R . ids.txt | jq -s .)" -f select.jq in.json
Postscript
If the index file consists of a stream of valid JSON texts (e.g., strings with quotation marks) and if your jq supports the --slurpfile option, the invocation can be further simplified to:
jq --slurpfile ids ids.txt -f select.jq in.json
Or if you want everything as a one-liner:
jq --slurpfile ids ids.txt 'map(select(.id as $id|any($ids[];$id==.)))' in.json