How to merge arrays from two files into one array with jq? - json

I would like to merge two files containing JSON. They each contain an array of JSON objects.
registration.json
[
{ "name": "User1", "registration": "2009-04-18T21:55:40Z" },
{ "name": "User2", "registration": "2010-11-17T15:09:43Z" }
]
useredits.json
[
{ "name": "User1", "editcount": 164 },
{ "name": "User2", "editcount": 150 },
{ "name": "User3", "editcount": 10 }
]
In the ideal scenario, I would like to have the following as a result of the merge operation:
[
{ "name": "User1", "editcount": 164, "registration": "2009-04-18T21:55:40Z" },
{ "name": "User2", "editcount": 150, "registration": "2010-11-17T15:09:43Z" }
]
I have found https://github.com/stedolan/jq/issues/1247#issuecomment-348817802 but I get
jq: error: module not found: jq

jq solution:
jq -s '[ .[0] + .[1] | group_by(.name)[]
| select(length > 1) | add ]' registration.json useredits.json
The output:
[
{
"name": "User1",
"registration": "2009-04-18T21:55:40Z",
"editcount": 164
},
{
"name": "User2",
"registration": "2010-11-17T15:09:43Z",
"editcount": 150
}
]
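For readers who want to check the logic outside jq, here is a minimal Python sketch of the same group-and-merge idea. The function name `merge_by_key` and its `require_match` parameter are hypothetical names chosen for illustration; they are not part of jq or of the answer above.

```python
import json
from collections import defaultdict


def merge_by_key(arrays, key="name", require_match=True):
    """Combine objects sharing the same `key` value across all arrays."""
    groups = defaultdict(list)
    for array in arrays:                # like `.[0] + .[1]` on the slurped input
        for obj in array:
            groups[obj[key]].append(obj)

    merged = []
    for value in sorted(groups):        # group_by orders the groups by the key
        members = groups[value]
        if require_match and len(members) < 2:   # like select(length > 1)
            continue
        combined = {}
        for obj in members:             # like `add`: later fields overwrite earlier ones
            combined.update(obj)
        merged.append(combined)
    return merged


registration = [
    {"name": "User1", "registration": "2009-04-18T21:55:40Z"},
    {"name": "User2", "registration": "2010-11-17T15:09:43Z"},
]
useredits = [
    {"name": "User1", "editcount": 164},
    {"name": "User2", "editcount": 150},
    {"name": "User3", "editcount": 10},
]

print(json.dumps(merge_by_key([registration, useredits]), indent=2))
```

Passing `require_match=False` keeps names that occur in only one file, so User3 would be retained as well.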

Although it does not strictly answer the question (it also keeps users that appear in only one file), the command below
jq -s 'flatten | group_by(.name) | map(reduce .[] as $x ({}; . * $x))' registration.json useredits.json
generates this output:
[
{ "name": "User1", "editcount": 164, "registration": "2009-04-18T21:55:40Z" },
{ "name": "User2", "editcount": 150, "registration": "2010-11-17T15:09:43Z" },
{ "name": "User3", "editcount": 10 }
]
Source:
jq - error when merging two JSON files "cannot be multiplied"

The following assumes you have jq 1.5 or later, and that:
- joins.jq as shown below is in the directory ~/.jq/ or the directory ~/.jq/joins/
- there is no file named joins.jq in the pwd
- registration.json has been fixed to make it valid JSON (btw, this can be done by jq itself).
The invocation to use would then be:
jq -s 'include "joins"; joins(.name)' registration.json useredits.json
joins.jq
# joins.jq Version 1 (12-12-2017)
def distinct(s):
reduce s as $x ({}; .[$x | (type[0:1] + tostring)] = $x)
|.[];
# Relational Join
# joins/6 provides similar functionality to the SQL INNER JOIN statement:
# SELECT (Table1|p1), (Table2|p2)
# FROM Table1
# INNER JOIN Table2 ON (Table1|filter1) = (Table2|filter2)
# where filter1, filter2, p1 and p2 are filters.
# joins(s1; s2; filter1; filter2; p1; p2)
# s1 and s2 are streams of objects corresponding to rows in Table1 and Table2;
# filter1 and filter2 determine the join criteria;
# p1 and p2 are filters determining the final results.
# Input: ignored
# Output: a stream of distinct pairs [p1, p2]
# Note: items in s1 for which filter1 == null are ignored, otherwise all rows are considered.
#
def joins(s1; s2; filter1; filter2; p1; p2):
def it: type[0:1] + tostring;
def ix(s;f):
reduce s as $x ({}; ($x|f) as $y | if $y == null then . else .[$y|it] += [$x] end);
# combine two dictionaries using the cartesian product of distinct elements
def merge:
.[0] as $d1 | .[1] as $d2
| ($d1|keys_unsorted[]) as $k
| if $d2[$k] then distinct($d1[$k][]|p1) as $a | distinct($d2[$k][]|p2) as $b | [$a,$b]
else empty end;
[ix(s1; filter1), ix(s2; filter2)] | merge;
def joins(s1; s2; filter1; filter2):
joins(s1; s2; filter1; filter2; .; .) | add ;
# Input: an array of two arrays of objects
# Output: a stream of the joined objects
def joins(filter1; filter2):
joins(.[0][]; .[1][]; filter1; filter2);
# Input: an array of arrays of objects.
# Output: a stream of the joined objects where f defines the join criterion.
def joins(f):
# j/0 is defined so TCO is applicable
def j:
if length < 2 then .[][]
else [[ joins(.[0][]; .[1][]; f; f)]] + .[2:] | j
end;
j ;

Related

Read complex JSON to extract key values

I have a JSON and I'm trying to read part of it to extract keys and values.
Assuming response is my JSON data, here is my code:
data_dump = json.dumps(response)
data = json.loads(data_dump)
Here my data object becomes a list and I'm trying to get the keys as below
id = [key for key in data.keys()]
This fails with the error:
A list object does not have an attribute keys. How can I get around this and produce the output below?
Here is my JSON:
{
"1": {
"task": [
"wakeup",
"getready"
]
},
"2": {
"task": [
"brush",
"shower"
]
},
"3": {
"task": [
"brush",
"shower"
]
},
"activites": ["standup", "play", "sitdown"],
"statuscheck": {
"time": 60,
"color": 1002,
"change(me)": 9898
},
"action": ["1", "2", "3", "4"]
}
The output I need is as below. I do not need data from the rest of JSON.
id   task
1    wakeup, getready
2    brush, shower
If you know that the keys you need are "1" and "2", you could try reading the JSON string as a dataframe, unpivoting it, exploding and grouping:
from pyspark.sql import functions as F
df = (spark.read.json(sc.parallelize([data_dump]))
.selectExpr("stack(2, '1', `1`, '2', `2`) (id, task)")
.withColumn('task', F.explode('task.task'))
.groupBy('id').agg(F.collect_list('task').alias('task'))
)
df.show()
# +---+------------------+
# | id| task|
# +---+------------------+
# | 1|[wakeup, getready]|
# | 2| [brush, shower]|
# +---+------------------+
However, it may be easier to deal with it in Python:
import json

data = json.loads(data_dump)
data2 = [(k, v['task']) for k, v in data.items() if k in ['1', '2']]
df = spark.createDataFrame(data2, ['id', 'task'])
df.show()
# +---+------------------+
# | id| task|
# +---+------------------+
# | 1|[wakeup, getready]|
# | 2| [brush, shower]|
# +---+------------------+
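If Spark is not needed, the same rows can be produced with the standard library alone. The sketch below assumes `response` is a dict shaped like the JSON in the question (abbreviated here); in that case the `json.dumps`/`json.loads` round trip returns a dict, so `.items()` works. If `data` really is a list, the object would first have to be taken out of the list, which is not shown here.

```python
import json

# Abbreviated copy of the JSON from the question (assumed shape of `response`).
response = {
    "1": {"task": ["wakeup", "getready"]},
    "2": {"task": ["brush", "shower"]},
    "3": {"task": ["brush", "shower"]},
    "action": ["1", "2", "3", "4"],
}

data = json.loads(json.dumps(response))  # a dict in, a dict out

# Keep only the wanted ids and join each task list into one string.
rows = [
    (key, ", ".join(value["task"]))
    for key, value in data.items()
    if key in ("1", "2")
]

print("id", "task")
for row_id, tasks in rows:
    print(row_id, tasks)
```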

jq: change multiple values within a select object

I have an array of JSON objects, and I am trying to change the name and version of the object with a given #type. Given the following input:
[
{
"name": "oldname",
"version": "oldversion",
"#type": "Project"
},
{
"name": "bomname",
"version": "bomversion",
"#type": "BOM"
},
{
"name": "componentname",
"version": "componentversion",
"#type": "Component"
}
]
I found many examples for changing a single value, and I can successfully do this by chaining multiple select statements together.
$ cat original.json | jq '[ .[] | (select(.["#type"] == "Project") | .name ) = "newname" | (select(.["#type"] == "Project") | .version ) = "newversion" ] ' > renamed.json
But I was hoping I could condense this so that I only have to perform the select once to change both values.
Using your approach:
[ .[]
| if .["#type"] == "Project"
then .name = "newname" | .version = "newversion"
else . end ]
or if you want to use select, you could write:
map( (select(.["#type"] == "Project") | .name = "newname" | .version = "newversion" ) // .)
or more exotically:
(.[] | select(.["#type"] == "Project"))
|= (.name = "newname" | .version = "newversion" )
Alternatively, merge in an object holding the new values:
map(select(.["#type"] == "Project") * {name: "newname", version: "newversion"} // .)
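As a cross-check of what these filters do, here is a small Python sketch of the same "test once, update both fields" idea. The function name `rename_project` and its parameters are hypothetical and only serve the illustration.

```python
def rename_project(objects, new_name, new_version, type_key="#type"):
    """Return a copy of `objects` with name and version replaced on Project entries."""
    return [
        # The type is tested once per object; both fields are set together.
        {**obj, "name": new_name, "version": new_version}
        if obj.get(type_key) == "Project"
        else obj
        for obj in objects
    ]


original = [
    {"name": "oldname", "version": "oldversion", "#type": "Project"},
    {"name": "bomname", "version": "bomversion", "#type": "BOM"},
    {"name": "componentname", "version": "componentversion", "#type": "Component"},
]

renamed = rename_project(original, "newname", "newversion")
print(renamed[0])
```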

How to create key:value list from JSON? Key name should contain some values from object itself

I'm trying to parse JSON and store certain values as metrics in Graphite.
To make my Graphite setup more user-friendly, I have to build a metric name that contains some values from the object itself.
I have a working solution using bash loops and jq, but it's really slow. So I'm asking for help :)
Here is my input:
{
...
},
"Johnny Cage": {
"firstname": "Johnny",
"lastname": "Cage",
"height": 183,
"weight": 82,
"hands": 2,
"legs": 2,
...
},
...
}
Desired output:
mk.fighter.Johnny.Cage.firstname Johnny
mk.fighter.Johnny.Cage.lastname Cage
mk.fighter.Johnny.Cage.height 183
mk.fighter.Johnny.Cage.weight 82
mk.fighter.Johnny.Cage.hands 2
mk.fighter.Johnny.Cage.legs 2
...
With single jq command:
Sample input.json:
{
"Johnny Cage": {
"firstname": "Johnny",
"lastname": "Cage",
"height": 183,
"weight": 82,
"hands": 2,
"legs": 2
}
}
jq -r 'to_entries[] | (.key | sub(" "; ".")) as $name
| .value | to_entries[]
| "mk.fighter.\($name).\(.key) \(.value)"' input.json
To build $name from the inner firstname and lastname keys instead, replace (.key | sub(" "; ".")) as $name with "\(.value.firstname).\(.value.lastname)" as $name.
The output:
mk.fighter.Johnny.Cage.firstname Johnny
mk.fighter.Johnny.Cage.lastname Cage
mk.fighter.Johnny.Cage.height 183
mk.fighter.Johnny.Cage.weight 82
mk.fighter.Johnny.Cage.hands 2
mk.fighter.Johnny.Cage.legs 2
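The same transformation can be sketched in Python, which may help when comparing against the slow bash loop. The generator name `graphite_lines` and its `prefix` parameter are hypothetical; the sketch assumes the input has the shape of the sample input.json.

```python
def graphite_lines(fighters, prefix="mk.fighter"):
    """Yield one 'metric.name value' line per attribute of every fighter."""
    for full_name, stats in fighters.items():
        # Like jq's sub(" "; "."), replace only the first space.
        name = full_name.replace(" ", ".", 1)
        for key, value in stats.items():
            yield f"{prefix}.{name}.{key} {value}"


sample = {
    "Johnny Cage": {
        "firstname": "Johnny",
        "lastname": "Cage",
        "height": 183,
        "weight": 82,
        "hands": 2,
        "legs": 2,
    }
}

for line in graphite_lines(sample):
    print(line)
```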

Compare two json files using jq or any other tools in bash

I want to compare two json files to see if one can be extracted from the other one.
P1 (p1.json)
{
"id": 12,
"keys": ["key1","key2"],
"body": {
"height": "180cm",
"wight": "70kg"
},
"name": "Alex"
}
P2 (p2.json)
{
"id": 12,
"keys": ["key2","key1"],
"body": {
"height": "180cm"
}
}
As can be seen, P2 is not completely equal to P1, but it can be extracted from P1 (it provides less data about the same person, but the data is correct).
Expected behavior:
p1 extends p2 --> true
p2 extends p1 --> false
Notes
- An array cannot be extracted from the same array with some additional elements
The following definition of extends/1 uses a purely object-based definition of extension (in particular, it does not sort arrays). The OP requirements regarding arrays are unclear to me, but a variant definition is offered in the following section.
# Usage: $in | extends($b) iff $in contains $b in an object-based sense
def extends($b):
# Handle the case that both are objects:
def objextends($x):
. as $in | all($x|keys[]; . as $k | $in[$k] | extends($x[$k]));
# Handle the case that both are arrays:
def arrayextends($x):
. as $in
| length == ($x|length) and
all( range(0;length); . as $i | $in[$i] | extends($x[$i]));
if . == $b then true
else . as $in
| type as $intype
| ($intype == ($b|type)) and
(($intype == "object" and objextends($b)) or
($intype == "array" and arrayextends($b)))
end;
Examples:
{a:{a:1,b:2}, b:2} | extends({a:{a:1}}) # true
{a:{a:1,b:2}, b:2} | extends({a:{a:2}}) # false
{a:{a:1,b:2}, b:[{x:1,y:2}]} | extends({a:{a:1}, b:[{x:1}]}) # true
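To make the recursion easier to follow, here is a rough Python rendering of the object-based definition. It is a sketch, not a drop-in replacement: Python treats `True == 1` as equal whereas jq does not, and a missing key is modeled with `None` to imitate jq returning null for `$in[$k]`.

```python
def extends(a, b):
    """Return True if `a` contains `b` in an object-based sense."""
    if a == b:
        return True
    if isinstance(a, dict) and isinstance(b, dict):
        # Every key of b must be extended by the value under the same key in a.
        return all(extends(a.get(k), v) for k, v in b.items())
    if isinstance(a, list) and isinstance(b, list):
        # Arrays are compared position by position and must have equal length.
        return len(a) == len(b) and all(extends(x, y) for x, y in zip(a, b))
    return False


P1 = {"id": 12, "keys": ["key1", "key2"],
      "body": {"height": "180cm", "wight": "70kg"}, "name": "Alex"}
P2 = {"id": 12, "keys": ["key2", "key1"], "body": {"height": "180cm"}}

print(extends({"a": {"a": 1, "b": 2}, "b": 2}, {"a": {"a": 1}}))  # True
print(extends(P1, P2))  # False: the "keys" arrays differ in order
```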
Alternative definition
The following definition sorts arrays and is sufficiently generous to handle the given example:
# Usage: $in | extends2($b) iff $in contains $b in a way which ignores the order of array elements
def extends2($b):
# Both are objects
def objextends($x):
. as $in | all($x|keys[]; . as $k | $in[$k] | extends2($x[$k]));
def arrayextends($x): ($x|sort) - sort == [];
if . == $b then true
else . as $in
| type as $intype
| ($intype == ($b|type)) and
(($intype == "object" and objextends($b)) or
($intype == "array" and arrayextends($b)))
end;
With $P1 and $P2 as shown:
$P1 | extends2($P2) # yields true
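A Python sketch of the order-insensitive variant follows. As in the jq version, an array extends another when every element of the other array occurs in it; array elements are compared by plain equality rather than recursively. The name `extends2` simply mirrors the jq function.

```python
def extends2(a, b):
    """Like an object-based 'contains' test, but ignoring array element order."""
    if a == b:
        return True
    if isinstance(a, dict) and isinstance(b, dict):
        # Recurse with extends2 so nested arrays are also compared without order.
        return all(extends2(a.get(k), v) for k, v in b.items())
    if isinstance(a, list) and isinstance(b, list):
        # Mirrors ($x|sort) - sort == []: nothing of b is left after removing a's elements.
        return all(item in a for item in b)
    return False


P1 = {"id": 12, "keys": ["key1", "key2"],
      "body": {"height": "180cm", "wight": "70kg"}, "name": "Alex"}
P2 = {"id": 12, "keys": ["key2", "key1"], "body": {"height": "180cm"}}

print(extends2(P1, P2))  # True
print(extends2(P2, P1))  # False
```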
If you know there are no duplicates in any subarrays, you could use the following approach. It computes the difference between the sets of [path,value] pairs returned from tostream, after replacing array indices with null:
def details:[
tostream
| select(length==2) as [$p,$v]
| [$p|map(if type=="number" then null else . end),$v]
];
def extends(a;b): (b|details) - (a|details) == [];
If P1 and P2 are functions returning the sample data
def P1: {
"id": 12,
"keys": ["key1","key2"],
"body": {
"height": "180cm",
"wight": "70kg"
},
"name": "Alex"
}
;
def P2: {
"id": 12,
"keys": ["key2","key1"],
"body": {
"height": "180cm"
}
}
;
then
extends(P1;P2) # returns true
, extends(P2;P1) # returns false
In the presence of duplicates the result is less clear. e.g.
extends(["a","b","b"];["a","a","b"]) # returns true
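The path-based idea can be sketched in Python as well. The helper below imitates `tostream` by walking the value and emitting (path, leaf) pairs, with every array index replaced by `None`. The names `details` and `extends_by_paths` are hypothetical, and the sketch assumes plain JSON-like data (dicts, lists, scalars).

```python
def details(value, path=()):
    """Yield (path, leaf) pairs; array positions are recorded as None."""
    if isinstance(value, dict) and value:
        for key, item in value.items():
            yield from details(item, path + (key,))
    elif isinstance(value, list) and value:
        for item in value:
            yield from details(item, path + (None,))
    else:
        # Scalars, and empty containers, are treated as leaves.
        yield (path, value)


def extends_by_paths(a, b):
    """True if every (path, leaf) pair of b also occurs in a."""
    pairs_of_a = list(details(a))
    return all(pair in pairs_of_a for pair in details(b))


P1 = {"id": 12, "keys": ["key1", "key2"],
      "body": {"height": "180cm", "wight": "70kg"}, "name": "Alex"}
P2 = {"id": 12, "keys": ["key2", "key1"], "body": {"height": "180cm"}}

print(extends_by_paths(P1, P2))  # True
print(extends_by_paths(P2, P1))  # False
print(extends_by_paths(["a", "b", "b"], ["a", "a", "b"]))  # True
```

Because the comparison is set-like, duplicates are not counted, which is why the last call returns True.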

How to generate continuing indices for multiple objects in nested arrays that are in an array

Given
[{
"objects": [{
"key": "value"
},{
"key": "value"
}]
}, {
"objects": [{
"key": "value"
}, {
"key": "value"
}]
}]
How do I generate
[{
"objects": [{
"id": 0,
"key": "value"
},{
"id": 1,
"key": "value"
}]
}, {
"objects": [{
"id": 2,
"key": "value"
}, {
"id": 3,
"key": "value"
}]
}]
Using jq?
I tried to use this one, but ids are all 0:
jq '[(-1) as $i | .[] | {objects: [.objects[] | {id: ($i + 1 as $i | $i), key}]}]'
The key to a simple solution here is to break the problem down into easy pieces. This can be accomplished by defining a helper function, addId/1. Once that is done, the rest is straightforward:
# starting at start, add {id: ID} to each object in the input array
def addId(start):
reduce .[] as $o
([];
length as $l
| .[length] = ($o | (.id = start + $l)));
reduce .[] as $o
( {start: -1, answer: []};
(.start + 1) as $next
| .answer += [$o | (.objects |= addId($next))]
| .start += ($o.objects | length) )
| .answer
Inspired by @peak's answer, I came up with this solution. There is not much difference: it uses a shorter way to generate the IDs and opts for foreach instead of reduce, since an intermediate result is involved.
def addIdsStartWith($start):
to_entries | map((.value.id = .key + $start) | .value);
[foreach .[] as $set (
{start: 0};
.set = $set |
.start as $start | .set.objects |= addIdsStartWith($start) |
.start += ($set.objects | length);
.set
)]
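For comparison, the running counter that both jq answers thread through the outer array can be sketched in Python with a shared iterator. The function name `add_running_ids` is hypothetical; the sketch assumes each outer element has an "objects" array as in the question.

```python
import itertools


def add_running_ids(groups, start=0):
    """Return a copy of `groups` where every nested object gets a continuing id."""
    counter = itertools.count(start)  # shared across all groups, so ids keep increasing
    return [
        {**group, "objects": [{**obj, "id": next(counter)} for obj in group["objects"]]}
        for group in groups
    ]


data = [
    {"objects": [{"key": "value"}, {"key": "value"}]},
    {"objects": [{"key": "value"}, {"key": "value"}]},
]

for group in add_running_ids(data):
    print(group)
```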