jq - create new object based on existing objects - json

This is my very first post, so please forgive me if it is not perfect.
I have JSON like the sample below. It consists of two parts. The first part contains information about each user (id, fullName and email), and the second part contains information about which teams the user belongs to (id, team, role).
What I want to get is one object per team membership containing id, fullName, email, team and role. I can do it, but only when a user belongs to one team. If a user belongs to more than one team, I can't handle it.
Below my json:
[
{
"id": "user1",
"fullName": "User One",
"email": "user.one#my.mail.com"
},
{
"id": "user2",
"fullName": "User Two",
"email": "user.two#my.mail.com"
},
{
"id": "user1",
"team": "Team_A",
"role": "TEAM_MEMBER"
},
{
"id": "user1",
"team": "Team_B",
"role": "TEAM_ADMIN"
},
{
"id": "user2",
"team": "Team_B",
"role": "TEAM_ADMIN"
}
]
When I use: group_by(.id)[] | add
I get:
{
"id": "user1",
"fullName": "User One",
"email": "user.one#my.mail.com",
"team": "Team_B",
"role": "TEAM_ADMIN"
},
{
"id": "user2",
"fullName": "User Two",
"email": "user.two#my.mail.com",
"team": "Team_B",
"role": "TEAM_ADMIN"
}
and it is almost what I want to achieve. My goal is:
{
"id": "user1",
"fullName": "User One",
"email": "user.one#my.mail.com",
"team": "Team_A",
"role": "TEAM_MEMBER"
},
{
"id": "user1",
"fullName": "User One",
"email": "user.one#my.mail.com",
"team": "Team_B",
"role": "TEAM_ADMIN"
},
{
"id": "user2",
"fullName": "User Two",
"email": "user.two#my.mail.com",
"team": "Team_B",
"role": "TEAM_ADMIN"
}
I tried reduce also, but with no success.
I have made many attempts, but none of them gave the desired effect.
How is it possible using jq?
Thanks in advance,
krzyhon

It looks like you have a collection of bits of information about users. You can't just flatten them by merging everything that has the same id, because add keeps only the last team and role it sees for each id; the team records need to stay separate.
You have "user info" (full name and email) and "team info". You need to group by the id, then by the type, then distribute the "user info".
Here's one approach you could take.
# partition the data by "user/team type"
reduce .[] as $i ({}; if "fullName" | in($i) then .user += [$i] else .team += [$i] end)
# create a lookup of "user" data
| (.user | INDEX(.id)) as $user
# group the "team" objects by team
| .team | group_by(.team)
# merge corresponding "user info" with all team objects
| map(map(. + $user[.id]))
[
[
{
"id": "user1",
"team": "Team_A",
"role": "TEAM_MEMBER",
"fullName": "User One",
"email": "user.one#my.mail.com"
}
],
[
{
"id": "user1",
"team": "Team_B",
"role": "TEAM_ADMIN",
"fullName": "User One",
"email": "user.one#my.mail.com"
},
{
"id": "user2",
"team": "Team_B",
"role": "TEAM_ADMIN",
"fullName": "User Two",
"email": "user.two#my.mail.com"
}
]
]
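For readers who want to check the logic outside jq, here is a minimal Python sketch of the same partition, lookup and merge steps. The function name merge_user_into_teams is made up for illustration, and unlike the jq filter above it returns one flat list (the shape the question asks for) rather than grouping by team.

```python
# Sample records copied from the question.
records = [
    {"id": "user1", "fullName": "User One", "email": "user.one#my.mail.com"},
    {"id": "user2", "fullName": "User Two", "email": "user.two#my.mail.com"},
    {"id": "user1", "team": "Team_A", "role": "TEAM_MEMBER"},
    {"id": "user1", "team": "Team_B", "role": "TEAM_ADMIN"},
    {"id": "user2", "team": "Team_B", "role": "TEAM_ADMIN"},
]


def merge_user_into_teams(records):
    """Hypothetical helper: attach user info to every team record."""
    # Partition: records carrying "fullName" are user info, the rest are team info.
    users = {r["id"]: r for r in records if "fullName" in r}
    teams = [r for r in records if "fullName" not in r]
    # Merge the matching user info into each team record (like `. + $user[.id]`).
    return [{**t, **users.get(t["id"], {})} for t in teams]


merged = merge_user_into_teams(records)
for row in merged:
    print(row)
```

A dictionary keyed by id plays the role of INDEX(.id), so each team record finds its user in constant time.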
Here's another more concise solution, assuming there will only be one "user info" per id.
# group by id
group_by(.id) | map(
# for each group, partition by "type"
group_by(.fullName)
# create combinations of all the info and team objects and merge them
| combinations | add
)
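The combinations idea is a Cartesian product of "user info" records and "team info" records within each id group. A rough Python equivalent, assuming the same sample data and using a hypothetical function name expand, might look like this:

```python
from itertools import groupby, product

# Sample records copied from the question.
records = [
    {"id": "user1", "fullName": "User One", "email": "user.one#my.mail.com"},
    {"id": "user2", "fullName": "User Two", "email": "user.two#my.mail.com"},
    {"id": "user1", "team": "Team_A", "role": "TEAM_MEMBER"},
    {"id": "user1", "team": "Team_B", "role": "TEAM_ADMIN"},
    {"id": "user2", "team": "Team_B", "role": "TEAM_ADMIN"},
]


def expand(records):
    """Hypothetical helper: one merged object per (user info, team info) pair."""
    out = []
    # groupby needs sorted input; sorted() is stable, so order within an id is kept.
    by_id = sorted(records, key=lambda r: r["id"])
    for _, group in groupby(by_id, key=lambda r: r["id"]):
        group = list(group)
        info = [r for r in group if "fullName" in r]
        teams = [r for r in group if "fullName" not in r]
        # Cartesian product of the two partitions, merged pairwise.
        for user, team in product(info, teams):
            out.append({**user, **team})
    return out


result = expand(records)
for row in result:
    print(row)
```

With exactly one "user info" record per id, the product yields one object per team membership.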

This solution uses a variant of @JeffMercado's approach based on combinations, but only uses group_by once (which means that for large datasets, it should be more efficient, since group_by is a relatively expensive operation).
The proposed solution here produces an array of the JSON objects you indicate you want, but if you want a stream of the JSON objects, simply omit the outer square brackets.
[group_by(.id)[]
| [map(select(has("team")|not)), map(select(has("team")))]
| combinations
| add]
More efficiently ...
To avoid using group_by or select at all, we could use the following generic variant of group_by:
def aggregate_by(s; f):
reduce s as $x (null; .[$x|f] += [$x]);
The solution can now be written as follows:
[ aggregate_by(.[]; .id)[]
| aggregate_by(.[]; .team == null | tostring)
| [.[]]
| combinations
| add ]
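The point of aggregate_by is that it buckets items in a single pass instead of sorting them. A small Python sketch of that idea follows; the Python function of the same name is an illustration only, not a translation of jq internals.

```python
from collections import defaultdict

# Sample records copied from the question (emails omitted for brevity).
records = [
    {"id": "user1", "fullName": "User One"},
    {"id": "user2", "fullName": "User Two"},
    {"id": "user1", "team": "Team_A", "role": "TEAM_MEMBER"},
    {"id": "user1", "team": "Team_B", "role": "TEAM_ADMIN"},
    {"id": "user2", "team": "Team_B", "role": "TEAM_ADMIN"},
]


def aggregate_by(items, key):
    """Bucket items by key(item) in one pass, keeping first-seen key order."""
    buckets = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)
    return dict(buckets)


groups = aggregate_by(records, lambda r: r["id"])
for user_id, members in groups.items():
    print(user_id, len(members))
```

In the jq version the bucket key is used as an object key, so it has to be a string; that is why the second call applies tostring to the boolean.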

Related

Compare 2 JSON and retrieve subset from one of them based on condition in Powershell

I have two JSON files abc.json and xyz.json.
Content in abc.json is:
[{"id": "121",
"name": "John",
"location": "europe"
},
{"id": "100",
"name": "Jane",
"location": "asia"
},
{"id": "202",
"name": "Doe",
"location": "america"
}
]
Updated -> Content in xyz.json is:
{
"value": [
{
"id": "111",
"city": "sydney",
"profession": "painter"
},
{
"id": "200",
"city": "istanbul",
"profession": "actor"
},
{
"id": "202",
"city": "seattle",
"profession": "doctor"
}
],
"count": {
"type": "Total",
"value": 3
}
}
I want to get those records of abc.json whose id also appears in xyz.json. In this case:
{"id": "202",
"name": "Doe",
"location": "america"
}
I need to do this in PowerShell, and the version I am using is 5.1. This is what I have tried:
$OutputList = @{}
$abcHash = Get-Content 'path\to\abc.json' | Out-String | ConvertFrom-Json
$xyzHash = Get-Content 'path\to\xyz.json' | Out-String | ConvertFrom-Json
$xyzResp = $xyzHash.value
foreach($item in $xyzResp){
foreach ($record in $abcHash){
if ($item.id -eq $record.id){
$OutputList.Add($record, $null)
}
}
}
Write-Output $OutputList
But when I print $OutputList, I get this:
Key: @{"id": "202",
"name": "Doe",
"location": "america"
}
Value:
Name:@{"id": "202",
"name": "Doe",
"location": "america"
}
What I require is more of a PSObject like:
id: 202
name:Doe
location:america
I tried using Get-Member cmdlet but could not quite reach there.
Is there any suggestion I could use?
I have corrected your example xyz.json because it contained an extra comma that should not be there. Also, the example did not have an item with id 202, so there would be no match at all.
xyz.json
{
"value": [
{
"id": "111",
"city": "sydney",
"profession": "painter"
},
{
"id": "202",
"city": "denver",
"profession": "painter"
},
{
"id": "111",
"city": "sydney",
"profession": "painter"
}
],
"count": {
"type": "Total",
"value": 3
}
}
That said, you can use a simple Where-Object{...} to get the item(s) with matching id's like this:
$abc = Get-Content 'path\to\abc.json' -Raw | ConvertFrom-Json
$xyz = Get-Content 'path\to\xyz.json' -Raw | ConvertFrom-Json
# get the items with matching id's as object(s)
$abc | Where-Object { $xyz.value.id -contains $_.id}
Output:
id name location
-- ---- --------
202 Doe america
Of course, you can capture the output in a variable first and then display it as a list, save it to CSV, or convert it back to JSON and save that.
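The filter itself is just a membership test on id. As a language-neutral cross-check, here is the same logic as a short Python sketch, with the two files inlined as strings (an assumption made to keep the example self-contained; only the id fields of xyz.json are reproduced).

```python
import json

# Contents of abc.json and the corrected xyz.json, inlined for the example.
abc = json.loads("""
[{"id": "121", "name": "John", "location": "europe"},
 {"id": "100", "name": "Jane", "location": "asia"},
 {"id": "202", "name": "Doe", "location": "america"}]
""")
xyz = json.loads("""
{"value": [{"id": "111"}, {"id": "202"}, {"id": "111"}],
 "count": {"type": "Total", "value": 3}}
""")

# Collect the ids from xyz once, then keep the abc records whose id is in that set.
wanted_ids = {item["id"] for item in xyz["value"]}
matches = [record for record in abc if record["id"] in wanted_ids]
print(matches)
```

Building the set once avoids the nested loop from the question.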

jq - Copy values of first array element if other values are null

I have an object which contains an array of objects. I want to replace the null values of specific keys with the values from the array element at index 0. In the example below, I want element 1 to take both "website" and "email" from element 0, because both are null. For element 2, I only expect "website" to be taken from element 0, as only that is null.
The json that I have is
{
"id": 123,
"offices": [
{
"officeId": 12345,
"name": "Name LLP",
"website": "www.example.com",
"email": "website#example.com",
"officeType": "HO"
},
{
"officeId": 123456,
"name": "Name",
"website": null,
"email": null,
"officeType": "BRANCH"
},
{
"officeId": 1234567,
"name": "Name",
"website": null,
"email": "example#website.com",
"officeType": "BRANCH"
},
],
}
My expected json output would be
{
"id": 123,
"offices": [
{
"officeId": 12345,
"name": "Name LLP",
"website": "www.example.com",
"email": "website#example.com",
"officeType": "HO"
},
{
"officeId": 123456,
"name": "Name",
"website": "www.example.com",
"email": "website#example.com",
"officeType": "BRANCH"
},
{
"officeId": 1234567,
"name": "Name",
"website": "www.example.com",
"email": "example#website.com",
"officeType": "BRANCH"
},
],
}
I have attempted to solve this using map and walk, but cannot find the correct way to do it.
After fixing the errors in your JSON file so it's valid:
jq '. as $orig |
.offices |= map(.website //= $orig.offices[0].website |
.email //= $orig.offices[0].email)' input.json
{
"id": 123,
"offices": [
{
"officeId": 12345,
"name": "Name LLP",
"website": "www.example.com",
"email": "website#example.com",
"officeType": "HO"
},
{
"officeId": 123456,
"name": "Name",
"website": "www.example.com",
"email": "website#example.com",
"officeType": "BRANCH"
},
{
"officeId": 1234567,
"name": "Name",
"website": "www.example.com",
"email": "example#website.com",
"officeType": "BRANCH"
}
]
}
Since you have not given any details about your attempts, let me, in the spirit of "How to Solve It", suggest a strategy for doing so.
Specifically, let's formulate and solve the obvious "subproblem" which should make the solution to the original problem easy to the point of almost being trivial. The obvious "subproblem" is: given a reference object, $ref, how to update another object so that null-valued keys in the latter are taken from $ref if available?
def infer($ref):
with_entries( if .value == null then .value = $ref[.key] else . end);
Now the original problem becomes much easier, right?
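To make the subproblem concrete, here is a minimal Python sketch of the same idea: a helper that fills null (None) values from a reference object, applied to every office using the first office as the reference. The function fill_offices is a hypothetical name, and the data is a trimmed copy of the question's input.

```python
def infer(obj, ref):
    """Return a copy of obj where None values are taken from ref, if available."""
    return {k: (ref.get(k) if v is None else v) for k, v in obj.items()}


def fill_offices(doc):
    """Hypothetical helper: use offices[0] as the reference for all offices."""
    head = doc["offices"][0]
    return {**doc, "offices": [infer(office, head) for office in doc["offices"]]}


doc = {
    "id": 123,
    "offices": [
        {"officeId": 12345, "website": "www.example.com",
         "email": "website#example.com", "officeType": "HO"},
        {"officeId": 123456, "website": None,
         "email": None, "officeType": "BRANCH"},
        {"officeId": 1234567, "website": None,
         "email": "example#website.com", "officeType": "BRANCH"},
    ],
}

filled = fill_offices(doc)
for office in filled["offices"]:
    print(office["officeId"], office["website"], office["email"])
```

Only keys whose value is null are touched, so values such as the third office's own email are preserved.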

Match and merge from multiple files with jq

I have a directory with a bunch of "logins" json files like this:
[
{
"user_id": "5ce722b803b54f03f745cdf45d579920",
"time": "2019-10-29T20:03:18.894006Z"
},
{
"user_id": "5ce722b858f3e80e6e85aad3113a1665",
"time": "2019-10-29T20:11:32.4843Z"
}
]
In another directory I have a bunch of "users" json files like this:
[
{
"id": "5ce722b803b54f03f745cdf45d579920",
"email": "foo#gmail.com",
"first_name": "John",
"last_name": "Doe",
"enabled": true,
"created_at": "2019-06-13T17:07:17.2925Z",
"updated_at": "2019-06-13T17:15:20.903085Z",
"groups": {
"count": 1,
"shortlist": [
{
"id": "5d0282c5d5d6063286140e864a0c6506",
"name": "cool users",
"description": "cool users",
"locked": true
}
]
},
"avatar": "",
"role_id": "5d0282c488bba9ebc62df8b3c38571a9",
"company_uid": ""
},
{
"id": "5d0284fdec62d47039e7119013b0aa2c",
"email": "bar#gmail.com",
"first_name": "Jane",
"last_name": "Doe",
"enabled": true,
"created_at": "2019-06-13T17:16:45.210018Z",
"updated_at": "2019-06-13T17:16:45.210018Z",
"groups": {
"count": 1,
"shortlist": [
{
"id": "5d0282c5d5d6063286140e864a0c6506",
"name": "cool users",
"description": "cool users",
"locked": true
}
]
},
"avatar": "",
"role_id": "5d0282c488bba9ebc62df8b3c38571a9",
"company_uid": ""
}
]
What I'm trying to do with jq is:
For each user_id in the "Logins" files, I want to find a matching id in the "Users" files.
I want to merge those two object together.
The intended outcome is one or more JSON files containing each login together with the corresponding user data. As a bonus, I only want the email, first name and last name from "Users".
End result would be something like this:
[
{
"user_id": "5ce722b803b54f03f745cdf45d579920",
"time": "2019-10-29T20:03:18.894006Z",
"email": "foo#gmail.com",
"first_name": "John",
"last_name": "Doe"
}
]
I've tried variations of the below, but end up with what looks like an infinite loop or something. I know my for loops are wrong, just not sure how to work with multiple files like this.
lastlogins="/last10/*.json"
users="/users/*.json"
for ll in $lastlogins; do
for user in $users; do
userid=$(jq -r '.[].user_id' $ll)
jq -c --arg userid "$userid" '.[] | select(.id == $userid)' $user
done
done
There is no need to use shell looping. That is, everything can be done using jq.
For example, using the invocation:
jq -f users-logins.jq --argfile users users.json logins.json
where users-logins.jq contains:
INDEX($users[]; .id) as $udict
| map( if $udict[.user_id] then . + $udict[.user_id] else empty end)
| map( {user_id, time, email, first_name, last_name} )
the output using the sample inputs would be:
[
{
"user_id": "5ce722b803b54f03f745cdf45d579920",
"time": "2019-10-29T20:03:18.894006Z",
"email": "foo#gmail.com",
"first_name": "John",
"last_name": "Doe"
}
]
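The jq program above is a lookup join: build a dictionary of users keyed by id, then extend each login that has a match. A Python sketch of the same steps, with trimmed sample data and a hypothetical function name join_logins, could be:

```python
def join_logins(logins, users, fields=("email", "first_name", "last_name")):
    """Hypothetical helper: add selected user fields to each matching login."""
    by_id = {user["id"]: user for user in users}  # plays the role of INDEX
    joined = []
    for login in logins:
        user = by_id.get(login["user_id"])
        if user is None:
            continue  # no matching user: drop the login, like `empty` in jq
        joined.append({**login, **{f: user[f] for f in fields}})
    return joined


# Trimmed copies of the sample inputs from the question.
logins = [
    {"user_id": "5ce722b803b54f03f745cdf45d579920", "time": "2019-10-29T20:03:18.894006Z"},
    {"user_id": "5ce722b858f3e80e6e85aad3113a1665", "time": "2019-10-29T20:11:32.4843Z"},
]
users = [
    {"id": "5ce722b803b54f03f745cdf45d579920", "email": "foo#gmail.com",
     "first_name": "John", "last_name": "Doe", "enabled": True},
    {"id": "5d0284fdec62d47039e7119013b0aa2c", "email": "bar#gmail.com",
     "first_name": "Jane", "last_name": "Doe", "enabled": True},
]

joined = join_logins(logins, users)
print(joined)
```

The second login has no matching user in the sample, which is why only one object comes out.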

Splitting nested arrays as separate entities

I have some JSON data which contains attributes and some array elements. I would like to push a given set of fields into the array elements and then separate the arrays as separate entities.
Source data looks like this
[
{
"phones": [
{
"phone": "555-555-1234",
"type": "home"
},
{
"phone": "555-555-5678",
"type": "mobile"
}
],
"email": [
{
"email": "a#b.com",
"type": "work"
},
{
"email": "x#c.com",
"type": "home"
}
],
"name": "john doe",
"year": "2012",
"city": "cupertino",
"zip": "555004"
},
{
"phones": [
{
"phone": "555-666-1234",
"type": "home"
},
{
"phone": "555-666-5678",
"type": "mobile"
}
],
"email": [
{
"email": "a#b.com",
"type": "work"
},
{
"email": "x#c.com",
"type": "home"
}
],
"name": "jane doe",
"year": "2000",
"city": "los angeles",
"zip": "555004"
}
]
I expect a result like this
{
"person": [
{
"name": "john doe",
"year": "2012",
"city": "cupertino",
"zip": "555004"
},
{
"name": "jane doe",
"year": "2000",
"city": "los angeles",
"zip": "555004"
}
],
"phones": [
{
"name": "john doe",
"year": "2012",
"phone": "555-555-1234",
"type": "home"
},
{
"name": "john doe",
"year": "2012",
"phone": "555-555-5678",
"type": "mobile"
},
{
"name": "jane doe",
"year": "2000",
"phone": "555-666-1234",
"type": "home"
},
{
"name": "jane doe",
"year": "2000",
"phone": "555-666-5678",
"type": "mobile"
}
],
"email": [
{
"name": "john doe",
"year": "2012",
"email": "a#b.com",
"type": "work"
},
{
"name": "john doe",
"year": "2012",
"email": "x#c.com",
"type": "home"
},
{
"name": "jane doe",
"year": "2000",
"email": "a#b.com",
"type": "work"
},
{
"name": "jane doe",
"year": "2000",
"email": "x#c.com",
"type": "home"
}
]
}
I have been able to get the desired result, but I can't make it work in a generic way.
The code below achieves the job, but I would like to pass the array of columns to be injected into the child arrays, the name of the primary result and an array containing the array field names.
["phones", "email"] as $children
| ["name", "year"] as $ids
|{person: map(with_entries(
. as $data | select($children|contains([$data.key])|not)
))}
+ {"phones": split_child($children[0];$ids)}
+ {"email": split_child($children[1];$ids)}
It's a lot easier to achieve this using multiple reduces, like:
def split_data($parent; $ids; $arr_cols):
($arr_cols | map([.])) as $p
| reduce .[] as $in ({}; .[$parent] += [$in | delpaths($p)]
| (reduce $ids[] as $k ({}; . + {($k): $in[$k]})) as $s
| reduce $arr_cols[] as $k (.; .[$k] += [$in[$k][] + $s])
);
split_data("person"; ["name", "year"]; ["phones", "email"])
Here's a straightforward solution to the generic problem (it uses reduce only once, in a helper function). To understand it, it might be helpful to see it as an abstraction of this concrete solution:
{ person: [.[] | {name, year, city, zip} ]}
+ { phones: [.[] | ({name, year} + .phones[]) ]}
+ { email: [.[] | ({name, year} + .email[]) ]}
Helper function
Let's first define a helper function for constructing an object by selecting a set of keys:
def pick($ary):
. as $in
| reduce $ary[] as $k ({};
. + {($k): $in[$k]});
split_data
Here finally is the function that takes as arguments the $parent, $ids, and columns of interest. The main complication is ensuring that the supplemental keys ("city" and "zip") are dealt with in the proper order.
def split_data($parent; $ids; $arr_cols):
(.[0]|keys_unsorted - $arr_cols - $ids) as $extra
| { ($parent): [.[] | pick($ids + $extra)] }
+ ([$arr_cols[] as $k
| {($k): [.[] | pick($ids) + .[$k][]] }] | add) ;
The invocation:
split_data("person"; ["name", "year"]; ["phones", "email"])
produces the desired result.
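For comparison, the same generic split can be sketched in Python. This is an illustration under the assumption that every row has all the id columns and every array column; the data is a trimmed copy of the question's input.

```python
def split_data(rows, parent, ids, arr_cols):
    """Split rows into a parent list plus one flattened list per array column."""
    # Parent entries: every key except the array columns.
    result = {parent: [{k: v for k, v in row.items() if k not in arr_cols}
                       for row in rows]}
    # Child entries: the id columns first, then the child's own fields.
    for col in arr_cols:
        result[col] = [{**{k: row[k] for k in ids}, **child}
                       for row in rows for child in row[col]]
    return result


rows = [
    {"phones": [{"phone": "555-555-1234", "type": "home"},
                {"phone": "555-555-5678", "type": "mobile"}],
     "email": [{"email": "a#b.com", "type": "work"},
               {"email": "x#c.com", "type": "home"}],
     "name": "john doe", "year": "2012", "city": "cupertino", "zip": "555004"},
    {"phones": [{"phone": "555-666-1234", "type": "home"},
                {"phone": "555-666-5678", "type": "mobile"}],
     "email": [{"email": "a#b.com", "type": "work"},
               {"email": "x#c.com", "type": "home"}],
     "name": "jane doe", "year": "2000", "city": "los angeles", "zip": "555004"},
]

result = split_data(rows, "person", ["name", "year"], ["phones", "email"])
print(result["person"])
print(result["phones"])
```

Putting the id columns before the child fields mirrors pick($ids) + .[$k][] in the jq version.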

Using jq to return specific information in JSON object

I wish to parse individual elements of inner JSON object to build / load in the database.
The following is the JSON object. How can I extract elements like id, name, queue, etc.? I will iterate over them in a loop and build the insert query.
{
"apps": {
"app": [
{
"id": "application_1540378900448_18838",
"user": "hive",
"name": "insert overwrite tabl...summary_view_stg_etl(Stage-2)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
},
{
"id": "application_1540378900448_18833",
"user": "hive",
"name": "insert into SNOW_WORK...metric_definitions')(Stage-13)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
}
]
}
}
You're better off converting the data to a format easily consumed by a database processor, like csv, then do something about it.
$ jq -r '(.apps.app[0] | keys_unsorted) as $k
| $k, (.apps.app[] | [.[$k[]]])
| @csv
' input.json
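The jq filter emits a header row from the keys of the first app, then one row of values per app. A Python sketch of the same conversion using the standard csv module is shown below; the function name apps_to_csv is hypothetical, and it assumes every app object has the same keys as the first one.

```python
import csv
import io


def apps_to_csv(doc):
    """Hypothetical helper: header from the first app's keys, then one row per app."""
    apps = doc["apps"]["app"]
    header = list(apps[0].keys())
    buf = io.StringIO()
    # Quote every string and leave numbers bare.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(header)
    for app in apps:
        writer.writerow([app[k] for k in header])
    return buf.getvalue()


doc = {"apps": {"app": [
    {"id": "application_1540378900448_18838", "user": "hive",
     "name": "insert overwrite tabl...summary_view_stg_etl(Stage-2)",
     "queue": "Data_Ingestion", "state": "FINISHED",
     "finalStatus": "SUCCEEDED", "progress": 100},
    {"id": "application_1540378900448_18833", "user": "hive",
     "name": "insert into SNOW_WORK...metric_definitions')(Stage-13)",
     "queue": "Data_Ingestion", "state": "FINISHED",
     "finalStatus": "SUCCEEDED", "progress": 100},
]}}

csv_text = apps_to_csv(doc)
print(csv_text)
```

The resulting text can then be bulk-loaded with whatever import tool the target database provides.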
It's pretty simple: just fetch the element that holds the array of values and iterate over it.
var JSONOBJ={
"apps": {
"app": [
{
"id": "application_1540378900448_18838",
"user": "hive",
"name": "insert overwrite tabl...summary_view_stg_etl(Stage-2)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
},
{
"id": "application_1540378900448_18833",
"user": "hive",
"name": "insert into SNOW_WORK...metric_definitions')(Stage-13)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
}
]
}
}
JSONOBJ.apps.app.forEach(function(o){console.log(o.id);console.log(o.user);console.log(o.name);})