Reshape JSON with jq to expand each object into multiple rows

I have a database of resumes in JSON format that I want to reshape so that each row corresponds to a person's employment history at a given company:
personid, company_name, start_date, end_date
However, running the following jq command
{personid:.personid, company_name: .experience[].company.name, sdate: .experience[].start_date, edate: .experience[].end_date}
produces the Cartesian product of all the iterated fields (3 iterated fields with 3 values each, so 3 × 3 × 3 = 27 rows per person). For example, a person who has held 3 jobs at 3 different companies looks like this after running the jq command above:
{"id":"abc123","companyname":"companyA","sdate":"2020-06","edate":null}
{"id":"abc123","companyname":"companyA","sdate":"2020-06","edate":null}
{"id":"abc123","companyname":"companyA","sdate":"2020-06","edate":"2017-07"}
{"id":"abc123","companyname":"companyA","sdate":"2016-10","edate":null}
{"id":"abc123","companyname":"companyA","sdate":"2016-10","edate":null}
{"id":"abc123","companyname":"companyA","sdate":"2016-10","edate":"2017-07"}
{"id":"abc123","companyname":"companyA","sdate":"2017-05","edate":null}
{"id":"abc123","companyname":"companyA","sdate":"2017-05","edate":null}
{"id":"abc123","companyname":"companyA","sdate":"2017-05","edate":"2017-07"}
There are also 9 entries each for companyB and companyC, but I truncated the output above for brevity.
I think I need to use the group_by() function, but I've been unsuccessful.
Thanks in advance.

Without seeing the original data, my guess is that you get the Cartesian product because you are iterating three times (.experience[]) within the object construction. You might want to pull the iteration out, maybe save it in a variable, and reference that instead:
.experience[] as $experience | {
personid: .personid,
company_name: $experience.company.name,
sdate: $experience.start_date,
edate: $experience.end_date
}
Depending on the outer structure of your data, the other way around may also be appropriate, i.e. storing the .personid field in a variable instead:
.personid as $id | .experience[] | {
personid: $id,
company_name: .company.name,
sdate: .start_date,
edate: .end_date
}
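To see why the original filter multiplies rows, here is a rough Python analogue. It assumes each input record has the shape implied by the filters above (personid plus an experience array of objects with company.name, start_date and end_date); the sample values and the helper names cartesian_rows and per_job_rows are made up for illustration. Each .experience[] inside the object constructor behaves like an independent nested loop, whereas iterating once yields one row per job.

```python
import itertools

# Hypothetical sample record; field names follow the jq filters above.
person = {
    "personid": "abc123",
    "experience": [
        {"company": {"name": "companyA"}, "start_date": "2020-06", "end_date": None},
        {"company": {"name": "companyB"}, "start_date": "2016-10", "end_date": None},
        {"company": {"name": "companyC"}, "start_date": "2017-05", "end_date": "2017-07"},
    ],
}

def cartesian_rows(p):
    """Like {company_name: .experience[].company.name, sdate: .experience[].start_date, ...}:
    every iterated field is independent, so every combination is emitted."""
    exp = p["experience"]
    for name, sdate, edate in itertools.product(
        (e["company"]["name"] for e in exp),
        (e["start_date"] for e in exp),
        (e["end_date"] for e in exp),
    ):
        yield {"personid": p["personid"], "company_name": name, "sdate": sdate, "edate": edate}

def per_job_rows(p):
    """Like `.personid as $id | .experience[] | {...}`: iterate once, one row per job."""
    for e in p["experience"]:
        yield {
            "personid": p["personid"],
            "company_name": e["company"]["name"],
            "sdate": e["start_date"],
            "edate": e["end_date"],
        }

print(len(list(cartesian_rows(person))), len(list(per_job_rows(person))))  # 27 3
```

The 27 rows (9 per company) match what the question describes, while the single-iteration version returns exactly one row per job.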

Related

How to select an element in an array based on two conditions in JMESPath?

I'm trying to select the SerialNumber of a specific AWS MFADevice for different profiles.
This command returns the list of MFADevices for a certain profile:
aws iam list-mfa-devices --profile xxx
and this is a sample JSON output:
{
"MFADevices": [
{
"UserName": "foobar#example.com",
"SerialNumber": "arn:aws:iam::000000000000:mfa/foo",
"EnableDate": "2022-12-06T16:23:41+00:00"
},
{
"UserName": "barfoo#example.com",
"SerialNumber": "arn:aws:iam::111111111111:mfa/bar_cli",
"EnableDate": "2022-12-12T09:13:10+00:00"
}
]
}
I would like to select the SerialNumber of the device containing the string cli. But in case there is only one device in the list (regardless of the presence or absence of the string cli), I'd like to get its SerialNumber.
I have this expression which already filters for the first condition, namely the desired string:
aws iam list-mfa-devices --profile xxx --query 'MFADevices[].SerialNumber | [?contains(@,`cli`)] | [0]'
However, I still haven't been able to figure out how to add the condition "if number_of_devices == 1, return the serial of that single device".
I can get the number of MFADevices with this command:
aws iam list-mfa-devices --profile yyy --query 'length(MFADevices)'
As a first step towards my final solution, I wanted to get the SerialNumber only when the list has exactly one element, so I tried something like this:
aws iam list-mfa-devices --profile yyy --query 'MFADevices[].SerialNumber | [?length(MFADevices) ==`1`]'
but already at this stage I get the error below (let alone the fact that I still need to combine it with the cli part):
In function length(), invalid type for value: None, expected one of: ['string', 'array', 'object'], received: "null"
Does anybody know how to achieve what I want?
I know that I could just pipe the raw output to jq and do the filtering there, but I was wondering if there is a way to do it directly in the command using some JMESPath expression.
To express this kind of condition in JMESPath, you have to rely on logical or (||) and logical and (&&), because the language has no conditional keyword per se.
So, in pseudo-code, instead of doing:
if length(MFADevices) == 1
MFADevices[0]
else
MFADevices[?someFilter]
Instead, you have to chain the logical operators, much like cond && a || b in bash:
length(MFADevices) == 1 and MFADevices[0] or MFADevices[?someFilter]
So, in JMESPath:
length(MFADevices) == `1`
&& MFADevices[0].SerialNumber
|| (MFADevices[?contains(SerialNumber, `cli`)] | [0]).SerialNumber
Note: this assumes that, if there is more than one element but none contains cli, the result should be null.
If you want the first element even when there are multiple devices and no SerialNumber contains cli, you can simplify it further to a plain logical or, which falls back when the contains filter returns nothing (a null result evaluates to false):
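Python's and/or short-circuit and return one of their operands, just like JMESPath's && and ||, so the expression above can be mirrored roughly as follows. This is only an illustration of the logic, not a JMESPath evaluation, and pick_serial_strict is a made-up name. One difference to keep in mind: JMESPath treats 0 as truthy while Python treats it as falsy, which does not matter here because only strings, lists and null are involved.

```python
def pick_serial_strict(doc):
    """Python mirror (hypothetical helper) of:
    length(MFADevices) == `1` && MFADevices[0].SerialNumber
      || (MFADevices[?contains(SerialNumber, `cli`)] | [0]).SerialNumber
    """
    devices = doc["MFADevices"]
    # Filter projection followed by `| [0]`: the first match, or None if nothing matched.
    matches = [d for d in devices if "cli" in d["SerialNumber"]]
    first_match = matches[0] if matches else None
    return (
        # `cond && value`: False when cond fails, otherwise the (non-empty) serial
        (len(devices) == 1 and devices[0]["SerialNumber"])
        # `|| fallback`: null.SerialNumber is null in JMESPath, None here
        or (first_match["SerialNumber"] if first_match else None)
    )
```

With the two-device sample from the question this returns arn:aws:iam::111111111111:mfa/bar_cli; with a single device it returns that device's serial whatever its name; with several devices and no cli it returns None, as the note says.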
(MFADevices[?contains(SerialNumber, `cli`)] | [0]).SerialNumber
|| MFADevices[0].SerialNumber
With stedolan/jq, you can filter for the substring and unconditionally append the first serial number, then take the first of the resulting values:
.MFADevices | map(.SerialNumber) | first((.[] | select(contains("cli"))), first)
or
[.MFADevices[].SerialNumber] | map(select(contains("cli"))) + .[:1] | first
Output:
arn:aws:iam::111111111111:mfa/bar_cli
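Both jq filters rely on the same trick: produce the matching serials followed by the first serial, then keep only the first item. A rough Python sketch of that idea, assuming the list-mfa-devices JSON shape shown in the question (pick_serial_fallback is a hypothetical name). Note that this behaves like the simplified JMESPath variant, falling back to the first device, not like the strict one.

```python
import itertools
import json

def pick_serial_fallback(doc):
    serials = [d["SerialNumber"] for d in doc["MFADevices"]]
    # Matches first, then the first serial as an unconditional fallback,
    # similar to `map(select(contains("cli"))) + .[:1]` in jq.
    candidates = itertools.chain((s for s in serials if "cli" in s), serials[:1])
    # `first`: take the first candidate; None (like jq's null) for an empty device list.
    return next(candidates, None)

sample = json.loads("""
{"MFADevices": [
  {"UserName": "foobar#example.com", "SerialNumber": "arn:aws:iam::000000000000:mfa/foo", "EnableDate": "2022-12-06T16:23:41+00:00"},
  {"UserName": "barfoo#example.com", "SerialNumber": "arn:aws:iam::111111111111:mfa/bar_cli", "EnableDate": "2022-12-12T09:13:10+00:00"}
]}
""")
print(pick_serial_fallback(sample))  # arn:aws:iam::111111111111:mfa/bar_cli
```

Because the chain is lazy, nothing past the first candidate is examined, which is the same reason jq's first(...) stops after one output.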

Kusto KQL: reference the first object in a JSON array

I need to grab the value of the first entry in a JSON array with Kusto KQL in Microsoft Defender ATP.
The data format looks like this (anonymized), and I want the value of "UserName":
[{"UserName":"xyz","DomainName":"xyz","Sid":"xyz"}]
How do I split or in any other way get the "UserName" value?
In WDATP/MSTAP, for arrays like "LoggedOnUsers", you want "mv-expand" (multi-value expand) in conjunction with "parsejson".
"parsejson" will turn the string into JSON, and mv-expand will expand the array into one row per entry, whose fields you can then reference as LoggedOnUsers.UserName, LoggedOnUsers.DomainName, and LoggedOnUsers.Sid:
DeviceInfo
| mv-expand parsejson(LoggedOnUsers)
| project DeviceName, LoggedOnUsers.UserName, LoggedOnUsers.DomainName
Keep in mind that if the packed field has multiple entries (as DeviceNetworkInfo's IPAddresses field often does), the entire row will be expanded once per entry. So a row for a machine with 3 entries in IPAddresses will be duplicated 3 times, once for each entry in IPAddresses:
DeviceNetworkInfo
| where Timestamp > ago(1h)
| mv-expand parsejson(IPAddresses)
| project DeviceName, IPAddresses.IPAddress
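Conceptually, mv-expand replaces one row holding an n-element array with n rows, each carrying a copy of the other columns plus one array element. A minimal Python sketch of that behavior (not of Kusto itself); the row data and the helper name mv_expand are hypothetical:

```python
import json

def mv_expand(rows, column):
    """Yield one output row per element of the JSON array stored in `column`,
    copying all other columns (roughly what `mv-expand parsejson(column)` does).
    In this sketch, a row whose array is empty produces no output rows."""
    for row in rows:
        for element in json.loads(row[column]):
            expanded = dict(row)          # copy the other columns unchanged
            expanded[column] = element    # replace the array with a single element
            yield expanded

rows = [
    {"DeviceName": "pc-1", "IPAddresses": '[{"IPAddress": "10.0.0.1"}, {"IPAddress": "10.0.0.2"}, {"IPAddress": "fe80::1"}]'},
    {"DeviceName": "pc-2", "IPAddresses": '[{"IPAddress": "10.0.0.9"}]'},
]
for r in mv_expand(rows, "IPAddresses"):
    print(r["DeviceName"], r["IPAddresses"]["IPAddress"])
```

Here pc-1 appears three times, once per address, which is the duplication the warning above describes.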
To access the first entry's UserName property, you can do the following:
print d = dynamic([{"UserName":"xyz","DomainName":"xyz","Sid":"xyz"}])
| extend result = d[0].UserName
To get the UserName for all entries, you can use mv-expand/mv-apply:
print d = dynamic([{"UserName":"xyz","DomainName":"xyz","Sid":"xyz"}])
| mv-apply d on (
project d.UserName
)
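For comparison, the same two lookups on the sample string from the question look like this in plain Python, assuming LoggedOnUsers arrives as a JSON string (which is why the KQL answers call parsejson or dynamic first): index [0] for the first entry, iterate for all entries. Parsing the JSON this way does not depend on where the value sits in the string or how long it is.

```python
import json

logged_on_users = '[{"UserName":"xyz","DomainName":"xyz","Sid":"xyz"}]'

users = json.loads(logged_on_users)                    # like parsejson(): string -> array of objects
first_user = users[0]["UserName"] if users else None   # like d[0].UserName
all_users = [u.get("UserName") for u in users]         # like mv-apply ... project d.UserName
print(first_user, all_users)  # xyz ['xyz']
```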
Thanks for the reply, but the proposed solution didn't work for me. Instead, I found the following solution:
project substring(split(split(LoggedOnUsers,',',0),'"',4),2,9)
The output of this is: UserName

Is there a way to enrich JSON field in MySQL?

Let's take a simple schema with two tables. One describes a simple entity, item (id, name):
id | name
------------
1 | foo
2 | bar
and another, let's call it collection, that references an item, but inside a JSON object, something like:
{
items: [
{
id: 1,
quantity: 2
}
]
}
I'm looking for a way to enrich this field in the collection with the referenced item element (kind of like populate in Mongo), to retrieve something like:
{
...
items: [
{
item: {
id: 1,
name: foo
},
quantity: 2
}
]
...
}
If you have a solution with PostgreSQL, I'll take it as well.
If I understood correctly, your requirement is to turn JSON input data into a MySQL table, so that you can keep working with JSON but leverage the power of SQL.
MySQL 8.0 introduced the JSON_TABLE function, which extracts JSON data into relational (tabular) form. With it, you can store your JSON directly in a table column and still query it like any other SQL table.
It should serve your immediate case, but it means your table schema will have a JSON column instead of traditional columns. You will need to check whether that serves your purpose.
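Whatever ends up running the join (JSON_TABLE in MySQL 8.0, or application code), the transformation the question asks for is a lookup of each items[].id in the item table. A minimal Python sketch of that shape change with both tables as in-memory data; it is an application-side illustration, not JSON_TABLE, and the enrich helper is hypothetical:

```python
import json

# The `item` table from the question, keyed by id.
items_by_id = {1: {"id": 1, "name": "foo"}, 2: {"id": 2, "name": "bar"}}

def enrich(collection_json, items_by_id):
    """Replace each {"id": ..., "quantity": ...} entry with
    {"item": <full item row>, "quantity": ...}; unknown ids map to None."""
    doc = json.loads(collection_json)
    doc["items"] = [
        {"item": items_by_id.get(entry["id"]), "quantity": entry["quantity"]}
        for entry in doc.get("items", [])
    ]
    return doc

print(json.dumps(enrich('{"items": [{"id": 1, "quantity": 2}]}', items_by_id)))
# {"items": [{"item": {"id": 1, "name": "foo"}, "quantity": 2}]}
```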

How can I shape a joined-table query result into the needed JSON format with CodeIgniter

Suppose we have a table person and a table phone, and the relationship between them is one-to-many.
I need to retrieve a result like this with one query:
[
{
name:"abc",
lname:"def",
phones:[
{
dial_code:"+1",
number:"12345667"
},
{
dial_code:"+1",
number:"12345667"
}
]
},
{
name:"xyz",
lname:"lmn",
phones:[
{
dial_code:"+2",
number:"2643525"
}
]
},
{...}
]
I can do this with multiple queries, first getting all persons and then getting their phones one by one, but that seems clumsy, takes a lot of time, and reduces performance. And if I get all the data by joining the tables, the result isn't in this JSON format.
Any idea will be appreciated.
Sorry for my bad English.
First things first: you cannot retrieve the desired result, with multiple phones nested inside each person, with one single query.
Running a query inside the person loop will hugely affect the performance of the script if there is a lot of data. With that approach, you first execute a query to fetch all persons (say n persons), and then loop over all n persons to fetch their respective phones.
So you need to run something like the following inside the $persons loop, n times:
SELECT * FROM phone WHERE person_id = [$person_id]
Therefore, this approach executes n+1 queries.
To overcome this n+1 query problem, we can apply a technique called eager loading. Here you still execute the first query to retrieve all persons, and then a single query to fetch all phones that belong to those persons:
SELECT * FROM person
Result($persons):
id | name
------------
5 | John
10 | Bob
20 | Jenna
SELECT * FROM phone WHERE person_id IN (5,10,20)
Result($phones):
id | person_id | dial_code | number
-----------------------------------
1 | 5 | +2 | 12345
2 | 10 | +1 | 12312
3 | 20 | +1 | 98765
Now we combine these two results in PHP to produce the desired array. This way, we run only two queries instead of n+1.
You can write a PHP script like following to combine the two result sets:
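// Create an array of phones keyed by person_id.
// Append with [] so each person's phones form a 0-indexed list; reusing the
// result's $key here would keep gaps in the indexes, and json_encode() would
// then emit an object instead of an array for those persons.
$phones_with_person_id_as_key = [];
foreach($phones as $phone) {
$phones_with_person_id_as_key[$phone->person_id][] = $phone;
}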
// Loop $persons array and add phones to person object
foreach($persons as $key => $person) {
// create phones key and add data
if (!empty($phones_with_person_id_as_key[$person->id])) {
$person->phones = $phones_with_person_id_as_key[$person->id];
}
else {
$person->phones = [];
}
}
Now $persons contains the formatted desired output.
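The same eager-loading idea (two queries, then grouping in memory) can be sketched outside CodeIgniter with Python's built-in sqlite3 module standing in for the real database. Only the person and phone table names come from the question; the columns and rows are the sample results shown above. Parameter placeholders are used for the IN (...) list instead of string interpolation of the ids.

```python
import json
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phone (id INTEGER PRIMARY KEY, person_id INTEGER, dial_code TEXT, number TEXT);
    INSERT INTO person VALUES (5, 'John'), (10, 'Bob'), (20, 'Jenna');
    INSERT INTO phone VALUES (1, 5, '+2', '12345'), (2, 10, '+1', '12312'), (3, 20, '+1', '98765');
""")

# Query 1: all persons.
persons = [dict(r) for r in conn.execute("SELECT id, name FROM person ORDER BY id")]

# Query 2: every phone belonging to those persons, in a single round trip.
ids = [p["id"] for p in persons]
placeholders = ",".join("?" * len(ids))
phones = conn.execute(
    f"SELECT person_id, dial_code, number FROM phone WHERE person_id IN ({placeholders})", ids
).fetchall()

# Group phones by person_id into lists, so each serialises as a JSON array.
phones_by_person = defaultdict(list)
for ph in phones:
    phones_by_person[ph["person_id"]].append({"dial_code": ph["dial_code"], "number": ph["number"]})

for p in persons:
    p["phones"] = phones_by_person.get(p["id"], [])  # empty list when a person has no phones

print(json.dumps(persons))
```

Exactly two SELECT statements run no matter how many persons there are, which is the point of eager loading compared with the n+1 loop.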

JSON path parent object, or equivalent MongoDB query

I am selecting nodes in a JSON input but can't find a way to include parent object details for each array entry that I am querying. I am using Pentaho Data Integration to query the data, with a JSON Input step reading from a MongoDB Input step.
I have also tried to write a MongoDB query to achieve the same, but cannot seem to do that either.
Here are the two fields/paths that display the data:
$.size_break_costs[*].size
$.size_break_costs[*].quantity
Here is the JSON source format:
{
"_id" : ObjectId("4f1f74ecde074f383a00000f"),
"colour" : "RAVEN-SMOKE",
"name" : "Authority",
"size_break_costs" : [
{
"quantity" : NumberLong("80"),
"_id" : ObjectId("518ffc0697eee36ff3000002"),
"size" : "S"
},
{
"quantity" : NumberLong("14"),
"_id" : ObjectId("518ffc0697eee36ff3000003"),
"size" : "M"
},
{
"quantity" : NumberLong("55"),
"_id" : ObjectId("518ffc0697eee36ff3000004"),
"size" : "L"
}
],
"sku" : "SK3579"
}
I currently get the following results:
S,80
M,14
L,55
I would like to get the SKU and name as well, since my source will have multiple products (SKU/description):
SK3579,Authority,S,80
SK3579,Authority,M,14
SK3579,Authority,L,55
When I try to include $.sku, the process errors.
The end result I'm after is a report of all products and the available quantities of their various sizes. Possibly there's an alternative MongoDB query that provides this.
EDIT:
It seems the issue may be that not all documents have the same structure. For example, the one above contains 3 sizes (S, M, L). Some products come in one size, PACK. Others come in multiple sizes: 28, 30, 32, 33, 34, 36, 38, etc.
The error produced is:
The data structure is not the same inside the resource! We found 1 values for json path [$.sku], which is different that the number retourned for path [$.size_break_costs[].quantity] (7 values). We MUST have the same number of values for all paths.
I have tried the following MongoDB query separately, which gives the correct results, but the corresponding export doesn't work: no values are returned for size and quantity.
Query:
db.product_details.find( {}, {sku: true, "size_break_costs.size": true, "size_break_costs.quantity": true}).pretty();
Export:
mongoexport --db brandscope_production --collection product_details --csv --out Test01.csv --fields sku,"size_break_costs.size","size_break_costs.quantity" --query '{}';
Shortly after I added my own bounty, I figured out the solution. My problem has the same basic structure: a parent identifier and some number N of child key/value pairs for ratings (quality, value, etc.).
First, you'll need a JSON Input step that gets the SKU, Name, and size_break_costs array, all as Strings. The important part is that size_break_costs is a String, basically just a stringified JSON array. Under the Content tab of the JSON Input step, make sure "Ignore missing path" is checked, in case a document has an empty array or the field is missing for some reason.
For your fields, use:
Name | Path | Type
ProductSKU | $.sku | String
ProductName | $.name | String
SizeBreakCosts | $.size_break_costs | String
I added a "Filter rows" block after this step, with the condition "SizeBreakCosts IS NOT NULL", which then passes rows to a second JSON Input block. In this second JSON Input block, check "Source is defined in a field?" and set "Get source from field" to "SizeBreakCosts", or whatever you named it in the first JSON Input block.
Again, make sure "Ignore missing path" is checked, as well as "Ignore empty file". From this block, we'll want two fields. Each row passed in already carries ProductSKU and ProductName, and this second JSON Input step will split it further into as many rows as there are entries in the SizeBreakCosts JSON. For fields, use:
Name | Path | Type
Quantity | $.[*].quantity | Integer
Size | $.[*].size | String
As you can see, these paths use "$.[*].FieldName", because the JSON string we passed in has an array as the root item, so we're getting every item in that array, and parsing out its quantity and size.
Now every row should have the SKU and name from the parent object, and the quantity and size from each child object. Dumping this example to a text file, I got:
ProductSKU;ProductName;Size;Quantity
SK3579;Authority;S; 80
SK3579;Authority;M; 14
SK3579;Authority;L; 55
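Outside of Pentaho, the same two-stage idea (keep the child array as a string next to the parent fields, then parse that string into one row per child) can be sketched in Python. The output field names follow the answer; the input document is the question's sample with the Mongo-specific ObjectId/NumberLong wrappers replaced by plain values, and stage_one/stage_two are hypothetical names for the two JSON Input steps.

```python
import csv
import io
import json

def stage_one(doc):
    """First JSON Input step: parent fields plus the child array as a JSON string."""
    costs = doc.get("size_break_costs")
    return {
        "ProductSKU": doc.get("sku"),
        "ProductName": doc.get("name"),
        # None when the field is missing, roughly like "Ignore missing path".
        "SizeBreakCosts": json.dumps(costs) if costs is not None else None,
    }

def stage_two(row):
    """Second JSON Input step: one output row per element of $.[*], parent fields copied."""
    for child in json.loads(row["SizeBreakCosts"]):
        yield {
            "ProductSKU": row["ProductSKU"],
            "ProductName": row["ProductName"],
            "Size": child.get("size"),
            "Quantity": int(child["quantity"]),
        }

docs = [{
    "sku": "SK3579",
    "name": "Authority",
    "size_break_costs": [
        {"quantity": 80, "size": "S"},
        {"quantity": 14, "size": "M"},
        {"quantity": 55, "size": "L"},
    ],
}]

rows = []
for parent in map(stage_one, docs):
    if parent["SizeBreakCosts"] is not None:   # the "Filter rows" step
        rows.extend(stage_two(parent))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["ProductSKU", "ProductName", "Size", "Quantity"], delimiter=";")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Because each product's array is parsed separately, products with 1 size or 7 sizes no longer need to line up with a single $.sku value, which is what triggered the "same number of values for all paths" error.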