Match and merge from multiple files with jq - json

I have a directory with a bunch of "logins" json files like this:
[
{
"user_id": "5ce722b803b54f03f745cdf45d579920",
"time": "2019-10-29T20:03:18.894006Z"
},
{
"user_id": "5ce722b858f3e80e6e85aad3113a1665",
"time": "2019-10-29T20:11:32.4843Z"
}
]
In another directory I have a bunch of "users" json files like this:
[
{
"id": "5ce722b803b54f03f745cdf45d579920",
"email": "foo#gmail.com",
"first_name": "John",
"last_name": "Doe",
"enabled": true,
"created_at": "2019-06-13T17:07:17.2925Z",
"updated_at": "2019-06-13T17:15:20.903085Z",
"groups": {
"count": 1,
"shortlist": [
{
"id": "5d0282c5d5d6063286140e864a0c6506",
"name": "cool users",
"description": "cool users",
"locked": true
}
]
},
"avatar": "",
"role_id": "5d0282c488bba9ebc62df8b3c38571a9",
"company_uid": ""
},
{
"id": "5d0284fdec62d47039e7119013b0aa2c",
"email": "bar#gmail.com",
"first_name": "Jane",
"last_name": "Doe",
"enabled": true,
"created_at": "2019-06-13T17:16:45.210018Z",
"updated_at": "2019-06-13T17:16:45.210018Z",
"groups": {
"count": 1,
"shortlist": [
{
"id": "5d0282c5d5d6063286140e864a0c6506",
"name": "cool users",
"description": "cool users",
"locked": true
}
]
},
"avatar": "",
"role_id": "5d0282c488bba9ebc62df8b3c38571a9",
"company_uid": ""
}
]
What I'm trying to do with jq is:
For each user_id in the "Logins" files, I want to find the matching id in the "Users" files.
I want to merge those two objects together.
The intended outcome is another JSON file (or files) containing each login and the corresponding user data. As a bonus, I only want the email, first name and last name from "Users".
End result would be something like this:
[
{
"user_id": "5ce722b803b54f03f745cdf45d579920",
"time": "2019-10-29T20:03:18.894006Z",
"email": "foo#gmail.com",
"first_name": "John",
"last_name": "Doe"
}
]
I've tried variations of the below, but end up with what looks like an infinite loop or something. I know my for loops are wrong, just not sure how to work with multiple files like this.
lastlogins="/last10/*.json"
users="/users/*.json"
for ll in $lastlogins; do
for user in $users; do
userid=$(jq -r '.[].user_id' $ll)
jq -c --arg userid "$userid" '.[] | select(.id == $userid)' $user
done
done

There is no need to use shell looping. That is, everything can be done using jq.
For example, using the invocation:
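jq -f users-logins.jq --slurpfile users users.json logins.json
where users-logins.jq contains the following (with --slurpfile, $users is an array of all the JSON values in users.json, hence $users[][]):
INDEX($users[][]; .id) as $udict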
| map( if $udict[.user_id] then . + $udict[.user_id] else empty end)
| map( {user_id, time, email, first_name, last_name} )
the output using the sample inputs would be:
[
{
"user_id": "5ce722b803b54f03f745cdf45d579920",
"time": "2019-10-29T20:03:18.894006Z",
"email": "foo#gmail.com",
"first_name": "John",
"last_name": "Doe"
}
]
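For readers who want the same join outside jq, here is a minimal Python sketch of the INDEX-then-lookup idea, extended to the directory layout from the question. The globs come from the question; the helper names load_all and merge_logins are hypothetical, and each file is assumed to hold a single JSON array.

```python
import glob
import json


def load_all(pattern):
    """Concatenate the top-level arrays of every JSON file matching `pattern`."""
    items = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            items.extend(json.load(fh))  # assumes each file holds one JSON array
    return items


def merge_logins(logins, users, fields=("email", "first_name", "last_name")):
    """Attach the chosen user fields to every login whose user_id matches a user id.

    Logins with no matching user are dropped, like `else empty` in the jq filter.
    """
    by_id = {user["id"]: user for user in users}  # analogue of INDEX(...; .id)
    merged = []
    for login in logins:
        user = by_id.get(login["user_id"])
        if user is None:
            continue
        row = {"user_id": login["user_id"], "time": login["time"]}
        row.update({field: user.get(field) for field in fields})
        merged.append(row)
    return merged


if __name__ == "__main__":
    # Directory globs taken from the question
    logins = load_all("/last10/*.json")
    users = load_all("/users/*.json")
    print(json.dumps(merge_logins(logins, users), indent=2))
```

Building the dictionary once means each login is matched with a single lookup, instead of rescanning every users file for every login as the nested shell loops do.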

Related

How to extract a particular key from the JSON

I am trying to extract values from JSON that I obtained with curl while testing an API. My JSON looks like this. How can I extract the value "20456" from it?
{
"meta": {
"status": "OK",
"timestamp": "2022-09-16T14:45:55.076+0000"
},
"links": {},
"data": {
"id": 24843,
"username": "abcd",
"firstName": "abc",
"lastName": "xyz",
"email": "abc#abc.com",
"phone": "",
"title": "",
"location": "",
"licenseType": "FLOATING",
"active": true,
"uid": "u24843",
"type": "users"
}
}
{
"meta": {
"status": "OK",
"timestamp": "2022-09-16T14:45:55.282+0000",
"pageInfo": {
"startIndex": 0,
"resultCount": 1,
"totalResults": 1
}
},
"links": {
"data.createdBy": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.createdBy}"
},
"data.fields.user1": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.fields.user1}"
},
"data.modifiedBy": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.modifiedBy}"
},
"data.fields.projectManager": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.fields.projectManager}"
},
"data.parent": {
"type": "projects",
"href": "https://abc#abc.com/rest/v1/projects/{data.parent}"
}
},
"data": [
{
"id": 20456,
"projectKey": "Stratus",
"parent": 20303,
"isFolder": false,
"createdDate": "2018-03-12T23:46:59.000+0000",
"modifiedDate": "2020-04-28T22:14:35.000+0000",
"createdBy": 18994,
"modifiedBy": 18865,
"fields": {
"projectManager": 18373,
"user1": 18628,
"projectKey": "Stratus",
"text1": "",
"name": "Stratus",
"description": "",
"date2": "2019-03-12",
"date1": "2018-03-12"
},
"type": "projects"
}
]
}
I have tried the following, but I end up getting an error:
▶ cat jqTrial.txt | jq '.data[].id'
jq: error (at <stdin>:21): Cannot index number with string "id"
20456
Also tried this but I get strings outside the object that I am not sure how to remove:
cat jqTrial.txt | jq '.data[]'
Assuming you want the project id, not the user id:
jq '
.data
| if type == "object" then . else .[] end
| select(.type == "projects")
| .id
' file.json
There's probably a better way to write the second expression.
Indeed there is, thanks to @pmf:
.data | objects // arrays[] | select(.type == "projects").id
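The same "object or array" normalization can be sketched in Python for comparison. This is an illustration rather than part of the answer: the function name project_ids is made up, and it takes one already-parsed document at a time.

```python
def project_ids(doc):
    """Yield the id of every record under doc["data"] whose type is "projects".

    Mirrors `.data | objects // arrays[] | select(.type == "projects").id`:
    "data" may be a single object or an array of objects.
    """
    data = doc.get("data")
    records = [data] if isinstance(data, dict) else (data or [])
    for record in records:
        if isinstance(record, dict) and record.get("type") == "projects":
            yield record.get("id")
```

Applied to both documents in turn, only the second one contributes an id, because the first document's data object has type "users".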
Your input consists of two JSON documents, and both have a data field at the top level. But while in the first one .data is itself an object with an .id field, in the second one .data is an array with one object item, which in turn has an .id field.
To retrieve both, you could use the --slurp (or -s) option, which wraps both top-level objects into an array; then you can address them separately by index:
jq --slurp '.[0].data.id, .[1].data[].id' jqTrial.txt
24843
20456
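Python's json.load rejects a file that contains two concatenated documents like jqTrial.txt ("Extra data"), so a rough equivalent of --slurp has to decode them one at a time. A minimal sketch, assuming the whole file fits in memory; slurp is a hypothetical helper built on json.JSONDecoder.raw_decode.

```python
import json


def slurp(text):
    """Parse a stream of concatenated JSON documents into a list, like `jq --slurp`."""
    decoder = json.JSONDecoder()
    docs, pos = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return docs
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)
```

With the question's file, docs = slurp(open("jqTrial.txt").read()) gives a two-element list, and docs[0]["data"]["id"] and docs[1]["data"][0]["id"] are 24843 and 20456, matching the jq output.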

jq - Copy values of first array element if other values are null

I have an object which contains an array of objects. I want to replace the null values of specific keys with the values from the element at index 0. In the example below, I want array element 1 to take both "website" and "email" from element 0, because both are null. For element 2, I only expect "website" to be set from element 0, since only that one is null.
The json that I have is
{
"id": 123,
"offices": [
{
"officeId": 12345,
"name": "Name LLP",
"website": "www.example.com",
"email": "website#example.com",
"officeType": "HO"
},
{
"officeId": 123456,
"name": "Name",
"website": null,
"email": null,
"officeType": "BRANCH"
},
{
"officeId": 1234567,
"name": "Name",
"website": null,
"email": "example#website.com",
"officeType": "BRANCH"
},
],
}
My expected json output would be
{
"id": 123,
"offices": [
{
"officeId": 12345,
"name": "Name LLP",
"website": "www.example.com",
"email": "website#example.com",
"officeType": "HO"
},
{
"officeId": 123456,
"name": "Name",
"website": "www.example.com",
"email": "website#example.com",
"officeType": "BRANCH"
},
{
"officeId": 1234567,
"name": "Name",
"website": "www.example.com",
"email": "example#website.com",
"officeType": "BRANCH"
},
],
}
I have attempted to solve this using map and walk, but cannot seem to find the correct way to do it.
After fixing the errors in your JSON file so it's valid:
jq '. as $orig |
.offices |= map(.website //= $orig.offices[0].website |
.email //= $orig.offices[0].email)' input.json
{
"id": 123,
"offices": [
{
"officeId": 12345,
"name": "Name LLP",
"website": "www.example.com",
"email": "website#example.com",
"officeType": "HO"
},
{
"officeId": 123456,
"name": "Name",
"website": "www.example.com",
"email": "website#example.com",
"officeType": "BRANCH"
},
{
"officeId": 1234567,
"name": "Name",
"website": "www.example.com",
"email": "example#website.com",
"officeType": "BRANCH"
}
]
}
Since you have not given any details about your attempts, let me, in the spirit of "How to Solve It", suggest a strategy for solving the problem.
Specifically, let's formulate and solve the obvious "subproblem" which should make the solution to the original problem easy to the point of almost being trivial. The obvious "subproblem" is: given a reference object, $ref, how to update another object so that null-valued keys in the latter are taken from $ref if available?
def infer($ref):
with_entries( if .value == null then .value = $ref[.key] else . end);
Now the original problem becomes much easier, right?
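To make that last step concrete, here is a Python sketch of the same strategy: infer fills only null (None) values from a reference object, and the hypothetical helper fill_offices uses the first office as that reference. One difference from the //= answer: jq's // also treats false as missing, whereas this sketch, like infer, replaces only null.

```python
def infer(obj, ref):
    """Return a copy of obj whose None-valued keys are taken from ref.

    If ref lacks the key, the value stays None (like $ref[.key] yielding null).
    """
    return {key: (ref.get(key) if value is None else value) for key, value in obj.items()}


def fill_offices(doc):
    """Use the first office as the reference for every office in doc["offices"]."""
    offices = doc["offices"]
    if not offices:
        return dict(doc)
    ref = offices[0]
    return {**doc, "offices": [infer(office, ref) for office in offices]}
```

Solving the subproblem first keeps the outer step down to a single map over the offices.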

jq - create new object based on existing objects

This is my very first post, so please forgive me if it is not perfect.
I have JSON like the one below. It consists of two parts: the first contains information about users (id, fullName and email) and the second contains information about which teams each user belongs to (id, team, role).
What I want is an object which contains id, fullName, email, team and role. I can do it, but only when a user belongs to one team. If a user belongs to more than one team, I can't handle it.
Below is my JSON:
[
{
"id": "user1",
"fullName": "User One",
"email": "user.one#my.mail.com"
},
{
"id": "user2",
"fullName": "User Two",
"email": "user.two#my.mail.com"
},
{
"id": "user1",
"team": "Team_A",
"role": "TEAM_MEMBER"
},
{
"id": "user1",
"team": "Team_B",
"role": "TEAM_ADMIN"
},
{
"id": "user2",
"team": "Team_B",
"role": "TEAM_ADMIN"
}
]
When I use: group_by(.id)[] | add
I get:
{
"id": "user1",
"fullName": "User One",
"email": "user.one#my.mail.com",
"team": "Team_B",
"role": "TEAM_ADMIN"
},
{
"id": "user2",
"fullName": "User Two",
"email": "user.two#my.mail.com",
"team": "Team_B",
"role": "TEAM_ADMIN"
}
and it is almost what I want to achieve. My goal is:
{
"id": "user1",
"fullName": "User One",
"email": "user.one#my.mail.com",
"team": "Team_A",
"role": "TEAM_MEMBER"
},
{
"id": "user1",
"fullName": "User One",
"email": "user.one#my.mail.com",
"team": "Team_B",
"role": "TEAM_ADMIN"
},
{
"id": "user2",
"fullName": "User Two",
"email": "user.two#my.mail.com",
"team": "Team_B",
"role": "TEAM_ADMIN"
}
I tried reduce also, but with no success.
I have made many attempts, but none of them gave the desired effect.
How is it possible using jq?
Thanks in advance,
krzyhon
It looks like you have a collection of bits of information about users. You can't just flatten them by merging everything that has the same id; they need to stay separate.
You have "user info" (full name and email) and "team info". You need to group by the id, then by the type, then distribute the "user info".
Here's one approach you could take.
# partition the data by "user/team type"
reduce .[] as $i ({}; if "fullName" | in($i) then .user += [$i] else .team += [$i] end)
# create a lookup of "user" data
| (.user | INDEX(.id)) as $user
# group the "team" objects by team
| .team | group_by(.team)
# merge corresponding "user info" with all team objects
| map(map(. + $user[.id]))
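For comparison, the partition, index and group-by-team steps above can be sketched in Python. This is an illustrative translation, not the answer's code; team_groups is a made-up name, and like the jq version it treats any object with a fullName key as user info.

```python
from itertools import groupby


def team_groups(items):
    """Return one list per team, each team record merged with its user's info."""
    # Partition: user-info objects are indexed by id (like INDEX(.id))
    users = {item["id"]: item for item in items if "fullName" in item}
    teams = [item for item in items if "fullName" not in item]
    # group_by sorts before grouping; itertools.groupby needs sorted input too
    teams.sort(key=lambda t: t["team"])
    return [
        # {**t, **info} mirrors `. + $user[.id]`; a missing user adds nothing
        [{**t, **users.get(t["id"], {})} for t in group]
        for _, group in groupby(teams, key=lambda t: t["team"])
    ]
```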
[
[
{
"id": "user1",
"team": "Team_A",
"role": "TEAM_MEMBER",
"fullName": "User One",
"email": "user.one#my.mail.com"
}
],
[
{
"id": "user1",
"team": "Team_B",
"role": "TEAM_ADMIN",
"fullName": "User One",
"email": "user.one#my.mail.com"
},
{
"id": "user2",
"team": "Team_B",
"role": "TEAM_ADMIN",
"fullName": "User Two",
"email": "user.two#my.mail.com"
}
]
]
Here's another more concise solution, assuming there will only be one "user info" per id.
# group by id
group_by(.id) | map(
# for each group, partition by "type"
group_by(.fullName)
# create combinations of all the info and team objects and merge them
| combinations | add
)
This solution uses a variant of @JeffMercado's approach based on combinations, but only uses group_by once (which means that for large datasets, it should be more efficient, since group_by is a relatively expensive operation).
The proposed solution here produces an array of the JSON objects you indicate you want, but if you want a stream of the JSON objects, simply omit the outer square brackets.
[group_by(.id)[]
| [map(select(has("team")|not)), map(select(has("team")))]
| combinations
| add]
More efficiently ...
To avoid using group_by or select at all, we could use the following generic variant of group_by:
def aggregate_by(s; f):
reduce s as $x (null; .[$x|f] += [$x]);
The solution can now be written as follows:
[ aggregate_by(.[]; .id)[]
| aggregate_by(.[]; .team == null | tostring)
| [.[]]
| combinations
| add ]
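The aggregate_by idea maps directly onto a dict of lists, and combinations corresponds to itertools.product. A hedged Python sketch with hypothetical function names; unlike the jq version, which buckets on `.team == null`, it buckets on whether the object has a "team" key at all.

```python
from itertools import product


def aggregate_by(items, key):
    """Group items into a dict of lists keyed by key(item), in first-seen key order."""
    buckets = {}
    for item in items:
        buckets.setdefault(key(item), []).append(item)
    return buckets


def user_team_rows(items):
    """Merge each id's user-info object with each of its team objects."""
    rows = []
    for group in aggregate_by(items, lambda obj: obj["id"]).values():
        # Split this id's objects into "info" and "membership" buckets
        parts = list(aggregate_by(group, lambda obj: "team" in obj).values())
        # product(*parts) plays the role of jq's `combinations`
        for combo in product(*parts):
            merged = {}
            for obj in combo:
                merged.update(obj)  # like jq's `add` over one combination
            rows.append(merged)
    return rows
```

As with the jq version, a user with no team entries still produces their info object on its own, because product over a single bucket yields one-element tuples.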

Using jq to return specific information in JSON object

I want to extract individual elements of the inner JSON objects to build rows and load them into a database.
The following is the JSON object. How can I extract elements like id, name, queue, etc.? I will iterate over them in a loop and build the insert query.
{
"apps": {
"app": [
{
"id": "application_1540378900448_18838",
"user": "hive",
"name": "insert overwrite tabl...summary_view_stg_etl(Stage-2)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
},
{
"id": "application_1540378900448_18833",
"user": "hive",
"name": "insert into SNOW_WORK...metric_definitions')(Stage-13)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
}
]
}
}
You're better off converting the data to a format easily consumed by a database loader, like CSV, and then importing that.
$ jq -r '(.apps.app[0] | keys_unsorted) as $k
| $k, (.apps.app[] | [.[$k[]]])
| @csv
' input.json
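If you would rather produce the CSV from Python before loading it, here is a rough equivalent of that jq program using the standard csv module. It is a sketch under the same assumption as the jq version, namely that every app object has the same keys as the first one; apps_to_csv is a hypothetical name, and QUOTE_NONNUMERIC approximates jq's @csv by quoting strings and leaving numbers bare.

```python
import csv
import io


def apps_to_csv(doc):
    """Render doc["apps"]["app"] as CSV: a header row from the first object's keys,
    then one row per app with its values in that same key order."""
    apps = doc["apps"]["app"]
    if not apps:
        return ""
    keys = list(apps[0].keys())  # like keys_unsorted on .apps.app[0]
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(keys)
    for app in apps:
        writer.writerow([app.get(key) for key in keys])
    return buf.getvalue()
```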
It's pretty simple: just fetch the element that holds the array of values and iterate over it. In JavaScript:
var JSONOBJ={
"apps": {
"app": [
{
"id": "application_1540378900448_18838",
"user": "hive",
"name": "insert overwrite tabl...summary_view_stg_etl(Stage-2)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
},
{
"id": "application_1540378900448_18833",
"user": "hive",
"name": "insert into SNOW_WORK...metric_definitions')(Stage-13)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
}
]
}
}
JSONOBJ.apps.app.forEach(function(o){console.log(o.id);console.log(o.user);console.log(o.name);})

insert the array into the database

I am getting this result when I use the Graph API. It is in array format:
{
"id": "216805086",
"name": "raj sharma",
"first_name": "raj ",
"last_name": "sharma",
"link": "https://www.facebook.com/raj.sharma.5",
"username": "raj .sharma.5",
"favorite_teams": [
{
"id": "198358615428",
"name": "Mumbai Indians"
},
{
"id": "190313434323691",
"name": "Indian Cricket Team"
}
],
"favorite_athletes": [
{
"id": "100787839962234",
"name": "Saina Nehwal"
}
],
"gender": "male",
"email": "raj.discoverme#gmail.com",
"timezone": 5.5,
"locale": "en_GB",
"verified": true,
"updated_time": "2013-08-13T06:01:17+0000"
}
I am working in PHP with a phpMyAdmin database. Now I want to insert the array into my database. Should I make a column for each of id, name, first_name, last_name, link, favorite_teams, etc., or should I make one column for all of this?
How do I insert the array into the database?
Actually, this is not an array. This is JSON. JSON has two kinds of structure:
JSONArray [ ]
JSONObject { }
You are getting a JSON object as your output. There is a PHP function called json_decode() for this.
Go through its documentation and you will get the idea.
Storing Facebook app data in a database is against Facebook policy: http://developers.facebook.com/policy/
$data = '{
"id": "216805086",
"name": "raj sharma",
"first_name": "raj ",
"last_name": "sharma",
"link": "https://www.facebook.com/raj.sharma.5",
"username": "raj .sharma.5",
"favorite_teams": [
{
"id": "198358615428",
"name": "Mumbai Indians"
},
{
"id": "190313434323691",
"name": "Indian Cricket Team"
}
],
"favorite_athletes": [
{
"id": "100787839962234",
"name": "Saina Nehwal"
}
],
"gender": "male",
"email": "raj.discoverme#gmail.com",
"timezone": 5.5,
"locale": "en_GB",
"verified": true,
"updated_time": "2013-08-13T06:01:17+0000"
}';
//decode to get as php variable
$values = json_decode($data,true); //true to decode as a array not an object
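// $pdo is assumed to be an already-open PDO connection to your database;
// placeholders avoid breaking the query (or SQL injection) on quotes in the data
$stmt = $pdo->prepare("INSERT INTO TableName (id,name,first_name,last_name,link,username)
VALUES (?, ?, ?, ?, ?, ?)");
$stmt->execute([$values['id'], $values['name'], $values['first_name'], $values['last_name'], $values['link'], $values['username']]);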
json_decode() takes a JSON-encoded string and converts it into a PHP variable.
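On the column question, one common layout, offered here as a sketch rather than the answer's own advice, is a column per scalar field plus a separate table for list-valued fields such as favorite_teams. The example uses Python's built-in sqlite3 in-memory database as a stand-in for MySQL; the table names users and user_teams and the helper store_profile are hypothetical.

```python
import sqlite3


def store_profile(conn, profile):
    """Insert scalar fields into `users` and each favorite team into `user_teams`.

    The ? placeholders keep the values out of the SQL text.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id TEXT PRIMARY KEY, name TEXT, first_name TEXT, last_name TEXT, link TEXT, username TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_teams (user_id TEXT, team_id TEXT, team_name TEXT)"
    )
    columns = ("id", "name", "first_name", "last_name", "link", "username")
    conn.execute(
        "INSERT INTO users (id, name, first_name, last_name, link, username) VALUES (?, ?, ?, ?, ?, ?)",
        tuple(profile.get(col) for col in columns),
    )
    # One row per list element, linked back to the user by id
    conn.executemany(
        "INSERT INTO user_teams (user_id, team_id, team_name) VALUES (?, ?, ?)",
        [(profile["id"], team["id"], team["name"]) for team in profile.get("favorite_teams", [])],
    )
    conn.commit()
```

favorite_athletes could get the same treatment with its own child table.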