In practice, keys have to be unique within a JSON object (see, e.g., Does JSON syntax allow duplicate keys in an object?). However, suppose I have a file with the following contents:
{
"a" : "1",
"b" : "2",
"a" : "3"
}
Is there a simple way of converting the repeated keys to an array, so that the file becomes:
{
"a" : [ {"key": "1"}, {"key": "3"}],
"b" : "2"
}
Or something similar that combines the repeated keys into an array (or finds an alternative way to extract the repeated key values).
Here's a solution in Java: Convert JSON object with duplicate keys to JSON array
Is there any way to do it with awk/bash/python?
If your input is really a flat JSON object with primitives as values, this should work (the initial map(select(length==2)) step drops the closing events that --stream emits alongside the [path, value] pairs):
jq -s --stream 'map(select(length==2)) | group_by(.[0]) | map({"key": .[0][0][0], "value": map(.[1])}) | from_entries'
{
"a": [
"1",
"3"
],
"b": [
"2"
]
}
Handling more complex inputs would require actually understanding how --stream is supposed to be used, which is beyond me.
Building on Santiago's answer using -s --stream, the following filter builds up the object one step at a time, thus preserving the order of the keys and of the values for a specific key:
reduce (.[] | select(length==2)) as $kv ({};
$kv[0][0] as $k
|$kv[1] as $v
| (.[$k]|type) as $t
| if $t == "null" then .[$k] = $v
elif $t == "array" then .[$k] += [$v]
else .[$k] = [ .[$k], $v ]
end)
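Assuming the filter above is saved in a file, say merge.jq (the filename is just for illustration), it can be invoked as follows:
jq -s --stream -f merge.jq input.json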
For the given input, the result is:
{
"a": [
"1",
"3"
],
"b": "2"
}
To illustrate that the ordering of values for each key is preserved, consider the following input:
{
"c" : "C",
"a" : "1",
"b" : "2",
"a" : "3",
"b" : "1"
}
The output produced by the filter above is:
{
"c": "C",
"a": [
"1",
"3"
],
"b": [
"2",
"1"
]
}
Building on peak's answer, the following filter also works on multi-object input, with nested objects, and without the slurp option (-s).
This is not an answer to the initial question, but since the jq FAQ links here, it might be useful to some visitors.
File jqmergekeys.txt
def consumestream($arr): # Reads stream elements from stdin until we have enough elements to build one object and returns them as array
input as $inp
| if $inp|has(1) then consumestream($arr+[$inp]) # input=keyvalue pair => Add to array and consume more
elif ($inp[0]|has(1)) then consumestream($arr) # input=closing subkey => Skip and consume more
else $arr end; # input=closing root object => return array
def convert2obj($stream): # Converts an object in stream notation into an object, and merges the values of duplicate keys into arrays
reduce ($stream[]) as $kv ({}; # This function is based on http://stackoverflow.com/a/36974355/2606757
$kv[0] as $k
| $kv[1] as $v
| (getpath($k)|type) as $t # type of existing value under the given key
| if $t == "null" then setpath($k;$v) # value not existing => set value
elif $t == "array" then setpath($k; getpath($k) + [$v] ) # value is already an array => add value to array
else setpath($k; [getpath($k), $v ]) # single value => put existing and new value into an array
end);
def mainloop(f): (convert2obj(consumestream([input]))|f),mainloop(f); # Consumes streams forever, converts them into an object and applies the user provided filter
def mergeduplicates(f): try mainloop(f) catch if .=="break" then empty else error end; # Catches the "break" thrown by jq if there's no more input
#---------------- User code below --------------------------
mergeduplicates(.) # merge duplicate keys in input, without any additional filters
#mergeduplicates(select(.layers)|.layers.frame) # merge duplicate keys in input and apply some filter afterwards
Example:
tshark -T ek | jq -nc --stream -f ./jqmergekeys.txt
Here's a simple alternative that generalizes well:
reshape.jq
def augmentpath($path; $value):
getpath($path) as $v
| setpath($path; $v + [$value]);
reduce (inputs | select(length==2)) as $pv
({}; augmentpath($pv[0]; $pv[1]) )
Usage
jq -n --stream -f reshape.jq input.json
Output
With the given input:
{
"a": [
"1",
"3"
],
"b": [
"2"
]
}
Postscript
If it's important to avoid arrays of singletons, either the def of augmentpath could be modified, or a postprocessing step could be added.
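For example, a minimal postprocessing sketch, assuming all the original values were scalars, would unwrap the singleton arrays like so:
map_values(if type == "array" and length == 1 then .[0] else . end)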
Let's say I have the following JSON
[
{
name : "A",
value : "1"
},
{
name : "B",
value : "5"
},
{
name : "E",
value : "8"
}
]
and I simply want it to be like
{
name : "A",
value : "1"
},
{
name : "B",
value : "5"
},
{
name : "E",
value : "8"
}
I used the normal jq filter jq '.[]'; however, I get a list of objects separated by newlines, as such:
{
name : "A",
value : "1"
}
{
name : "B",
value : "5"
}
{
name : "E",
value : "8"
}
Notice that the commas between the objects have magically vanished. Using reduce would work only if the objects were indexed by, say, the name; I used the following:
jq 'reduce .[] as $i ({}; .[$i.name] = $i)'
Did anybody run into a similar situation?
Neither the input as shown nor the desired output is valid as JSON or as a JSON stream, so the question seems questionable, and the following responses are accordingly offered with the caveat that they probably should be avoided.
It should also be noted that, except for the sed-only approach, the solutions offered here produce comma-separated-JSON, which may not be what is desired.
They assume that the quasi-JSON input is in a file qjson.txt.
sed-only
< qjson.txt sed -e '1d;$d; s/^ //'
hjson, jq, and sed
< qjson.txt hjson -j | jq -r '.[] | (.,",")' | sed '$d'
hjson and jq
< qjson.txt hjson -j | jq -r '
foreach .[] as $x (-1; .+1;
if . == 0 then $x else ",", $x end)'
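Along the same lines, one more variant (a sketch; note that tojson renders each object on a single line, unlike the pretty-printed output above) is:
< qjson.txt hjson -j | jq -r 'map(tojson) | join(",\n")'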
Consider a file 'b.json':
[
{
"id": 3,
"foo": "cannot be replaced, id isn't in a.json, stay untouched",
"baz": "do not touch3"
},
{
"id": 2,
"foo": "should be replaced with 'foo new2'",
"baz": "do not touch2"
}
]
and 'a.json':
[
{
"id": 2,
"foo": "foo new2",
"baz": "don't care"
}
]
I want to update the key "foo" in b.json using jq with the matching value from a.json. It should also work with more than one entry in a.json.
Thus the desired output is:
[
{
"id": 3,
"foo": "cannot be replaced, id isn't in a.json, stay untouched",
"baz": "do not touch3"
},
{
"id": 2,
"foo": "foo new2",
"baz": "do not touch2"
}
]
Here's one of several possibilities that use INDEX/2. If your jq does not have this as a built-in, see below.
jq --argfile a a.json '
INDEX($a[]; .id) as $dict
| map( (.id|tostring) as $id
| if ($dict|has($id)) then .foo = $dict[$id].foo
else . end)' b.json
There are other ways to pass in the contents of a.json and b.json.
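For example, since --argfile has been deprecated, a sketch along the same lines using --slurpfile (which binds $a to an array containing the parsed contents of a.json) would be:
jq --slurpfile a a.json '
  INDEX($a[0][]; .id) as $dict
  | map( (.id|tostring) as $id
         | if ($dict|has($id)) then .foo = $dict[$id].foo
           else . end)' b.json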
Caveat
The above use of INDEX assumes there are no "collisions", which would happen if, for example, one of the objects has .id equal to 1 and another has .id equal to "1". If there is a possibility of such a collision, then a more complex definition of INDEX could be used.
INDEX/2
Straight from builtin.jq:
def INDEX(stream; idx_expr):
reduce stream as $row ({}; .[$row|idx_expr|tostring] = $row);
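To illustrate, given the a.json above, INDEX($a[]; .id) evaluates to a dictionary keyed by the stringified .id values:
{
  "2": {
    "id": 2,
    "foo": "foo new2",
    "baz": "don't care"
  }
}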
Here's a generic answer that makes no assumptions about the values of the .id keys except that they are distinct JSON values.
Generalization of INDEX/2
def type2: [type, if type == "string" then . else tojson end];
def dictionary(stream; f):
reduce stream as $s ({}; setpath($s|f|type2; $s));
def lookup(value):
getpath(value|type2);
def indictionary(value):
(value|type2) as $t
| has($t[0]) and (.[$t[0]] | has($t[1]));
Invocation
jq --argfile a a.json -f program.jq b.json
main
dictionary($a[]; .id) as $dict
| map( .id as $id
| if ($dict|indictionary($id))
then .foo = ($dict|lookup($id).foo)
else . end)
I have a file which has multiple individual JSON arrays, which I want to combine (removing empty arrays) into a single JSON array.
Input
[]
[]
[
[
[
"asdfsdfsdf",
"CCsdfnceR1",
"running",
"us-east-1a",
"34.6X.7X.2X",
"10.75.170.118"
]
]
]
[]
[]
[
[
[
"tyutyut",
"CENTOS-BASE",
"stopped",
"us-west-2b",
null,
"10.87.159.249"
]
],
[
[
"tyutyut",
"dfgdfg-TEST",
"stopped",
"us-west-2b",
"54.2X.8.X8",
"10.87.159.247"
]
]
]
Required output
[
[
"asdfsdfsdf",
"CCsdfnceR1",
"running",
"us-east-1a",
"34.6X.7X.2X",
"10.75.170.118"
],
[
"tyutyut",
"CENTOS-BASE",
"stopped",
"us-west-2b",
null,
"10.87.159.249"
],
[
"tyutyut",
"dfgdfg-TEST",
"stopped",
"us-west-2b",
"54.2X.8.X8",
"10.87.159.247"
]
]
Thanks in advance
This selects only non-empty arrays none of whose elements is an array, and puts them into an array:
jq -n '[ inputs | .. | select(type=="array" and .!=[] and all(.[]; type!="array")) ]' file
The exact requirements aren't too clear to me, but the following def produces the expected result and might be of interest as it is recursive:
def peel:
if type == "array"
then if length == 0 then empty
elif length == 1 and (.[0] | type) == "array" then .[0] | peel
elif all(.[]; type=="array") then .[] | peel
else [.[] | peel]
end
else .
end;
With this def, and the following "main" program:
[inputs | peel]
an invocation of jq using the -n option produces the expected result.
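Concretely, assuming both the def and the "main" program are saved in a file, say peel.jq (the filename is just for illustration):
jq -n -f peel.jq file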
I'm looking for an efficient means to search through a large JSON object for "sub-objects" that match a filter (via select(), I imagine). However, the top-level JSON is an object with arbitrary nesting contained within, including simple values, objects, and arrays of objects. For example:
{
"name": "foo",
"class": "system",
"description": "top-level-thing",
"configuration": {
"status": "normal",
"uuid": "id"
},
"children": [
{
"id": "c1",
"class": "c1",
"children": [
{
"id": "c1.1",
"class": "c1.1"
},
{
"id": "c1.1",
"class": "FINDME"
}
]
},
{
"id": "c2",
"class": "FINDME"
}
],
"thing": {
"id": "c3",
"class": "FINDME"
}
}
I have a solution which does part of what I want (and is understandable):
jq -r '.. | arrays | .[] | select(.class=="FINDME"?) | .id'
which returns:
c2
c1.1
... however, it misses c3, and it changes the order of the items output. Additionally, since I expect this to operate on potentially very large JSON structures, I would like to make sure I find an efficient solution. Bonus points for something that remains readable by jq neophytes (myself included).
FWIW, references I was using to help me on the way, in case they help others:
Select objects based on value of variable in object using jq
How to use jq to find all paths to a certain key
Recursive search values by key
For small to modest-sized JSON input, you're on the right track with .., but it seems you want to select objects, like so:
.. | objects | select(.class=="FINDME"?) | .id
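With the sample input, and using the -r option as in the question, this produces the ids in document order:
c1.1
c2
c3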
For JSON documents that are very large, this might require too much memory, so it may be worth knowing about jq's streaming parser. Unfortunately it's much more difficult to use, so I'd suggest trying the above, and if you're interested, look in the usual places for documentation about the --stream option.
Here's a streaming-parser solution. To make sense of it, you'll need to read up on the --stream option, but the key is that the output includes lines of the form: [PATH, VALUE]
program.jq
foreach inputs as $in ({}; # the state must be an object, as has/1 errors on null
if has("id") and has("class") then {} # previous match already emitted; start a fresh state
else . as $x
| $in
| if length != 2 then null
elif .[0][-1] == "id" then ($x + {id: .[-1]})
elif .[0][-1] == "class"
and .[-1] == "FINDME" then ($x + {class: .[-1]})
else $x
end
end;
select(has("id") and has("class")) | .id )
Invocation
jq -n --stream -f program.jq input.json
Output with sample input
"c1.1"
"c2"
"c3"
I'm unsure if "transpose" is the correct term here, but I'm looking to use jq to transpose a 2-dimensional object such as this:
[
{
"name": "A",
"keys": ["k1", "k2", "k3"]
},
{
"name": "B",
"keys": ["k2", "k3", "k4"]
}
]
I'd like to transform it to:
{
"k1": ["A"],
"k2": ["A", "B"],
"k3": ["A", "B"],
"k4": ["A"],
}
I can split out the object with .[] | {key: .keys[], name} to get a list of keys and names, or I could use .[] | {(.keys[]): [.name]} to get a collection of key–value pairs {"k1": ["A"]} and so on, but I'm unsure of the final concatenation step for either approach.
Are either of these approaches heading in the right direction? Is there a better way?
This should work:
map({ name, key: .keys[] })
| group_by(.key)
| map({ key: .[0].key, value: map(.name) })
| from_entries
The basic approach is to convert each object to name/key pairs, regroup them by key, then map them out to entries of an object.
This produces the following output:
{
"k1": [ "A" ],
"k2": [ "A", "B" ],
"k3": [ "A", "B" ],
"k4": [ "B" ]
}
Here is a simple solution that may also be easier to understand. It is based on the idea that a dictionary (a JSON object) can be extended by adding details about additional (key -> value) pairs:
# input: a dictionary to be extended by key -> value
# for each key in keys
def extend_dictionary(keys; value):
reduce keys[] as $key (.; .[$key] += [value]);
reduce .[] as $o ({}; extend_dictionary($o.keys; $o.name) )
$ jq -c -f transpose-object.jq input.json
{"k1":["A"],"k2":["A","B"],"k3":["A","B"],"k4":["B"]}
Here is a better solution for the case that all the values of "name" are distinct. It is better because it uses a completely generic filter, invertMapping; that is, invertMapping could be a built-in or library function. With the help of this function, the solution becomes a simple three-liner.
Furthermore, if the values of "name" are not all unique, then the solution below can easily be tweaked by modifying the initial reduction of the input (i.e. the line immediately above the invocation of invertMapping).
# input: a JSON object of (key, values) pairs, in which "values" is an array of strings;
# output: a JSON object representing the inverse relation
def invertMapping:
reduce to_entries[] as $pair
({}; reduce $pair.value[] as $v (.; .[$v] += [$pair.key] ));
map( { (.name) : .keys} )
| add
| invertMapping
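Assuming the three-liner and the defs above are saved in a file, say invert.jq (the filename is just for illustration), an invocation such as
jq -c -f invert.jq input.json
produces:
{"k1":["A"],"k2":["A","B"],"k3":["A","B"],"k4":["B"]}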