Need to extract a specific string with jq - JSON

I have a JSON file (see below) and, with jq, I need to extract the resourceName value of the entry whose value is mail#mail1.com.
So in my case, the result should be name_1.
Any idea how to do that?
This does not work:
jq '.connections[] | select(.emailAddresses.value | test("mail#mail1.com"; "i")) | .resourceName' file.json
{
"connections": [
{
"resourceName": "name_1",
"etag": "123456789",
"emailAddresses": [
{
"metadata": {
"primary": true,
"source": {
"type": "CONTACT",
"id": "123456"
}
},
"value": "mail#mail1.com"
}
]
},
{
"resourceName": "name_2",
"etag": "987654321",
"emailAddresses": [
{
"metadata": {
"primary": true,
"source": {
"type": "CONTACT",
"id": "654321"
},
"sourcePrimary": true
},
"value": "mail#mail2.com"
}
]
}
],
"totalPeople": 187,
"totalItems": 187
}

One solution is to store the parent object while selecting on the child array:
jq '.connections[] | . as $parent | .emailAddresses // empty | .[] | select(.value == "mail#mail1.com") | $parent.resourceName' file.json

emailAddresses is an array, so .emailAddresses.value fails. Use any if it is enough that one element matches:
.connections[] | select(any(.emailAddresses[];.value == "mail#mail1.com")).resourceName
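To show the same matching logic outside jq, here is a minimal Python sketch, assuming the document has already been parsed with json.loads. The function name resource_names_for_email is hypothetical. As with the any filter above, a connection is kept if at least one of its emailAddresses entries has the given value; a missing emailAddresses array is treated as empty, like the `// empty` guard in the first answer.

```python
import json

def resource_names_for_email(doc, email):
    """Return the resourceName of every connection that has an email entry
    whose "value" equals `email` (hypothetical helper, not part of jq)."""
    names = []
    for conn in doc.get("connections", []):
        # Like `.emailAddresses // empty`: a missing or null array matches nothing
        addresses = conn.get("emailAddresses") or []
        # Like jq's any/2: one matching element is enough
        if any(addr.get("value") == email for addr in addresses):
            names.append(conn["resourceName"])
    return names

# Trimmed-down version of the JSON from the question
sample = json.loads("""
{"connections": [
  {"resourceName": "name_1", "emailAddresses": [{"value": "mail#mail1.com"}]},
  {"resourceName": "name_2", "emailAddresses": [{"value": "mail#mail2.com"}]}
]}
""")

print(resource_names_for_email(sample, "mail#mail1.com"))  # ['name_1']
```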

PySpark: transform JSON into multiple DataFrames

I have multiple JSON files with this structure (Association can have one or multiple objects, and Characteristic doesn't always have the same number of key/value pairs):
{
"vl:VNETList": {
"Template": {
"ID": "SomeId",
"Object": [
{
"ID": "my_first_id",
"Context": {
"ID": "Avngate"
},
"Name": "Model Description",
"ClassID": "PID",
"Association": [
{
"Object": {
"ID": "test.svg",
"Context": {
"ID": "Avngate"
}
},
"#type": "is fulfilled by"
},
{
"Object": {
"ID": "Project Description",
"Context": {
"ID": "Avngate"
}
},
"#type": "is an element of"
}
],
"Characteristic": [
{
"Name": "InfoType",
"Value": "image/svg+xml"
},
{
"Name": "LOCK",
"Value": false
},
{
"Name": "EXFI",
"Value": 10000
}
]
},
{
"ID": "my_second_id",
"Context": {
"ID": "Avngate2"
},
"Name": "Model Description2",
"ClassID": "PID2",
"Association": [
{
"Object": {
"ID": "test2.svg",
"Context": {
"ID": "Avngate"
}
},
"#type": "is fulfilled by"
}
],
"Characteristic": [
{
"Name": "Dbtencoding",
"Value": "unicode"
}
]
}
]
}
}
I would like to build two dataframes like this:
and the second dataframe like this:
What's the best approach? If that is too complex, I could also save the characteristics as a separate table referencing the object ID, as with the associations.
Read the JSON, then use explode followed by groupBy/pivot for the first DataFrame; for the second one, explode and then just select:
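from pyspark.sql import SparkSession
from pyspark.sql import functions as f

spark = SparkSession.builder.getOrCreate()

# multiLine=True because each file holds one JSON document spread over many lines
df1 = spark.read.json('test.json', multiLine=True)
# One row per element of the Object array, with its fields as columns
df2 = df1.select(f.explode('vl:VNETList.Template.Object').alias('value')) \
    .select('value.*')
# First DataFrame: one row per object, one column per characteristic name
df_f1 = df2.withColumn('Characteristic', f.explode('Characteristic')) \
    .groupBy('ID', 'Name', 'ClassId') \
    .pivot('Characteristic.Name') \
    .agg(f.first('Characteristic.Value'))
# Second DataFrame: one row per association, renamed positionally by toDF
df_f2 = df2.withColumn('Association', f.explode('Association')) \
    .select('ID', 'Association.Object.ID', 'Association.#Type') \
    .toDF('ID', 'AssociationId', 'AssociationType')
df_f1.show()
df_f2.show()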
+------------+------------------+-------+-----------+-----+-------------+-----+
| ID| Name|ClassId|Dbtencoding| EXFI| InfoType| LOCK|
+------------+------------------+-------+-----------+-----+-------------+-----+
| my_first_id| Model Description| PID| null|10000|image/svg+xml|false|
|my_second_id|Model Description2| PID2| unicode| null| null| null|
+------------+------------------+-------+-----------+-----+-------------+-----+
+------------+-------------------+----------------+
| ID| AssociationId| AssociationType|
+------------+-------------------+----------------+
| my_first_id| test.svg| is fulfilled by|
| my_first_id|Project Description|is an element of|
|my_second_id| test2.svg| is fulfilled by|
+------------+-------------------+----------------+
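To make it clearer what explode plus groupBy/pivot and explode plus select compute, here is a plain-Python sketch over already-parsed dicts. It is an illustration only, not PySpark; the functions pivot_characteristics and explode_associations are hypothetical. Characteristic names become sorted columns (matching the column order in the first table above), missing characteristics become None (the nulls), and the first value seen wins, similar to f.first.

```python
def pivot_characteristics(objects):
    """One row per object, one column per distinct Characteristic name."""
    names = sorted({c["Name"] for o in objects for c in o.get("Characteristic", [])})
    rows = []
    for o in objects:
        row = {"ID": o["ID"], "Name": o["Name"], "ClassID": o["ClassID"]}
        values = {}
        for c in o.get("Characteristic", []):
            # Keep the first value encountered for each name
            values.setdefault(c["Name"], c["Value"])
        for n in names:
            # Absent characteristic -> None, like the nulls in the pivot
            row[n] = values.get(n)
        rows.append(row)
    return rows


def explode_associations(objects):
    """One row per (object, association) pair."""
    return [
        {"ID": o["ID"],
         "AssociationId": a["Object"]["ID"],
         "AssociationType": a["#type"]}
        for o in objects
        for a in o.get("Association", [])
    ]


# The two objects from the question, with the Context fields left out
objects = [
    {"ID": "my_first_id", "Name": "Model Description", "ClassID": "PID",
     "Association": [
         {"Object": {"ID": "test.svg"}, "#type": "is fulfilled by"},
         {"Object": {"ID": "Project Description"}, "#type": "is an element of"}],
     "Characteristic": [
         {"Name": "InfoType", "Value": "image/svg+xml"},
         {"Name": "LOCK", "Value": False},
         {"Name": "EXFI", "Value": 10000}]},
    {"ID": "my_second_id", "Name": "Model Description2", "ClassID": "PID2",
     "Association": [{"Object": {"ID": "test2.svg"}, "#type": "is fulfilled by"}],
     "Characteristic": [{"Name": "Dbtencoding", "Value": "unicode"}]},
]

for row in pivot_characteristics(objects):
    print(row)
for row in explode_associations(objects):
    print(row)
```

Unlike Spark, which infers a single string type for the mixed Value field, this sketch keeps the original Python types.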

Check if a JSON field exists and give a default value with jq (select)

I have some JSON data and I want to push some of it to a DB, but for some devices certain values do not exist.
From all of the following data I only want to pull "ICCID","MDN","MSISDN","MCC","MNC","FeatureTypes","RatePlanCode","RatePlanDescription","DeviceState","BillingCycleStartDate","BillingCycleEndDate","CurrentBillCycleDataUnRatedUsage",
and if any of them does not exist, print not-exist.
{
"categories": [{
"categoryName": "DeviceIdentifier",
"extendedAttributes": [{
"key": "ICCID",
"value": "89148000"
},
{
"key": "IMSI",
"value": "31148094"
},
{
"key": "MDN",
"value": "5514048068"
},
{
"key": "MEID",
"value": "A0000000005006"
},
{
"key": "MIN",
"value": "5514041185"
}
]
},
{
"categoryName": "DeviceAttributes",
"extendedAttributes": [{
"key": "MCC",
"value": "311"
},
{
"key": "MNC",
"value": "480"
},
{
"key": "FeatureCodes",
"value": "75802,84777,54307"
},
{
"key": "FeatureNames",
"value": "75802,84777,54307"
},
{
"key": "FeatureTypes",
"value": "4G Public Dynamic"
},
{
"key": "RatePlanCode",
"value": "4G5G "
},
{
"key": "RatePlanDescription",
"value": "4G5G"
},
{
"key": "Services",
"value": "null"
}
]
},
{
"categoryName": "Provisioning",
"extendedAttributes": [{
"key": "LastActivationDate",
"value": "2022-03-01T19:38:52Z"
},
{
"key": "CreatedAt",
"value": "2021-12-01T21:22:55Z"
},
{
"key": "DeviceState",
"value": "active"
},
{
"key": "LastDeactivationDate",
"value": "2021-12-01T21:22:55Z"
}
]
},
{
"categoryName": "Connectivity",
"extendedAttributes": [{
"key": "Connected",
"value": "true"
},
{
"key": "LastConnectionDate",
"value": "2022-09-08T03:38:55Z"
},
{
"key": "LastDisconnectDate",
"value": "2022-09-08T03:25:15Z"
}
]
},
{
"categoryName": "Billing",
"extendedAttributes": [{
"key": "BillingCycleStartDate",
"value": "2022-09-02T00:00:00Z"
},
{
"key": "BillingCycleEndDate",
"value": "2022-10-01T00:00:00Z"
},
{
"key": "DefaultRatePlan",
"value": "0"
}
]
},
{
"categoryName": "Usage",
"extendedAttributes": [{
"key": "CurrentRatedUsageRecordDate",
"value": "2022-09-04T00:00:00Z"
}, {
"key": "CurrentUnRatedUsageRecordDate",
"value": "2022-09-08T01:25:15Z"
},
{
"key": "CurrentBillCycleDataUnRatedUsage",
"value": "1698414605"
}
]
}
]
}
I'm not pushing all fields to the DB, so I'm selecting specific keys from it.
The set of keys I select is fixed, so the selection will always be the same.
Expected output:
"89148000"
"5514048068"
"not-exist"
"4G Public Dynamic"
"4G5G"
"4G5G"
"active"
"2022-09-02T00:00:00Z"
"2022-10-01T00:00:00Z"
"2022-09-08T01:25:15Z"
I would like to check whether a key is missing (in this case "MSISDN") and, if it is, print not-exist or null.
Any help?
.categories[].Attributes[] |
if (.key | IN(["AAA","BBB","DDD","EEE"][]))
then .value
else "NOT-EXIST"
end
This gives the following output:
"111"
"222"
"NOT-EXIST"
"444"
"555"
First we loop over the Attributes.
Then we use an if to check whether the key exists in ["AAA","BBB","DDD","EEE"]:
TRUE: use .value
FALSE: use "NOT-EXIST" as the value
Another approach, which first uses an update assignment to set .value to "NOT-EXIST" on every attribute whose key is not in the list and then outputs just the values, gives the same output as above:
.categories[].Attributes[]
| select(.key | IN(["AAA","BBB","DDD","EEE"][]) | not).value = "NOT-EXIST"
| .value
I hope I understood your requirements correctly, but here is a solution that looks simple enough to understand and should be somewhat efficient. If you always expect the same 5 keys in your input, you can try:
.categories[].Attributes | from_entries as $attr
| ["AAA", "BBB", "CCC", "DDD", "EEE"]
| map($attr[.] // "NOT-EXIST")
Input:
{"categories": [
{
"categoryName": "Device",
"Attributes": [
{
"key": "AAA",
"value": "111"
},
{
"key": "BBB",
"value": "222"
},
{
"key": "DDD",
"value": "444"
},
{
"key": "EEE",
"value": "555"
}
]
}]}
Output:
[
"111",
"222",
"NOT-EXIST",
"444",
"555"
]
If you only need the values, add [] or | .[] at the end of the script, or rewrite it as:
.categories[].Attributes | from_entries as $attr
| "AAA", "BBB", "CCC", "DDD", "EEE"
| $attr[.] // "NOT-EXIST"
With the input from the updated question, you want to first merge all extendedAttributes arrays into one big array, convert that to an object, and then use this complete object to look up your values:
.categories | map(.extendedAttributes[]) | from_entries as $attr
| "ICCID", "MDN", "MSISDN", "MCC", "MNC", "FeatureTypes", "RatePlanCode", "RatePlanDescription", "DeviceState", "BillingCycleStartDate", "BillingCycleEndDate", "CurrentBillCycleDataUnRatedUsage"
| $attr[.] // "NOT-EXIST"
.categories | map(.extendedAttributes[]) can be rewritten as [.categories[].extendedAttributes[]] or .categories | map(.extendedAttributes) | add, which might be easier to grok.
Output:
"89148000"
"5514048068"
"NOT-EXIST"
"311"
"480"
"4G Public Dynamic"
"4G5G "
"4G5G"
"active"
"2022-09-02T00:00:00Z"
"2022-10-01T00:00:00Z"
"1698414605"
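The merge-then-look-up idea can be sketched in plain Python as well. This is a minimal illustration under the assumption that the input is already parsed into dicts; lookup_attributes and KEYS are hypothetical names. As with from_entries, a later entry with the same key overwrites an earlier one, and the fallback mimics jq's //, which substitutes the default not only for a missing key but also for null and false.

```python
KEYS = ["ICCID", "MDN", "MSISDN", "MCC", "MNC", "FeatureTypes",
        "RatePlanCode", "RatePlanDescription", "DeviceState",
        "BillingCycleStartDate", "BillingCycleEndDate",
        "CurrentBillCycleDataUnRatedUsage"]


def lookup_attributes(doc, keys, default="NOT-EXIST"):
    """Merge every extendedAttributes array into one dict, then look up
    `keys` in order (hypothetical helper)."""
    # Same idea as [.categories[].extendedAttributes[]] | from_entries
    attrs = {}
    for category in doc.get("categories", []):
        for entry in category.get("extendedAttributes", []):
            attrs[entry["key"]] = entry["value"]
    result = []
    for key in keys:
        value = attrs.get(key)
        # jq's `//` falls back on null and false, not only on a missing key
        result.append(default if value is None or value is False else value)
    return result


# Trimmed-down version of the question's input
sample = {
    "categories": [
        {"categoryName": "DeviceIdentifier",
         "extendedAttributes": [{"key": "ICCID", "value": "89148000"},
                                {"key": "MDN", "value": "5514048068"}]},
        {"categoryName": "DeviceAttributes",
         "extendedAttributes": [{"key": "MCC", "value": "311"},
                                {"key": "Services", "value": "null"}]},
    ]
}
print(lookup_attributes(sample, ["ICCID", "MDN", "MSISDN", "MCC", "Services"]))
# ['89148000', '5514048068', 'NOT-EXIST', '311', 'null']
```

Note that "Services" holds the string "null", not a JSON null, so neither this sketch nor `//` replaces it. With the full document, pass KEYS as the key list.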
A version without an intermediate variable is also possible:
[
.categories | map(.extendedAttributes[]) | from_entries[
"ICCID",
"MDN",
"MSISDN",
"MCC",
"MNC",
"FeatureTypes",
"RatePlanCode",
"RatePlanDescription",
"DeviceState",
"BillingCycleStartDate",
"BillingCycleEndDate",
"CurrentBillCycleDataUnRatedUsage"
]
]
| map(. // "NOT-EXIST")
or
[
.categories | map(.extendedAttributes[]) | from_entries
| .["ICCID", "MDN", "MSISDN", "MCC", "MNC", "FeatureTypes", "RatePlanCode", "RatePlanDescription", "DeviceState", "BillingCycleStartDate", "BillingCycleEndDate", "CurrentBillCycleDataUnRatedUsage"]
]
| map(. // "NOT-EXIST")

How to make jq pick name/value pairs

This might be more or less the same ask as "How to get JQ name/value pair from nested (array?) response?", but that question and its example are far more convoluted than what I'm asking.
Given the input JSON as in https://jqplay.org/s/jyKBnpx9NYX,
pick out all the name/value pairs under .QueryString and .Params into the same, unnested array.
E.g., for an input of
{
"Some": "Random stuff",
"One": {
"QueryString": [
{ "Name": "IsOrdered", "Value": "1" },
{ "Name": "TimeStamp", "Value": "11654116426247" }
]
},
"Two": {
"QueryString": [
{ "Name": "IsOrdered", "Value": "1" },
{ "Name": "TimeStamp", "Value": "11654116426247" }
]
},
"Params": [
{ "Name": "ClassName", "Value": "PRODUCT" },
{ "Name": "ListID", "Value": "Products" },
{ "Name": "Mode ", "Value": "1" },
{ "Name": "Dept" , "Value": "5" },
{ "Name": "HasPrevOrder", "Value": "" }
],
"And": {
"QueryString":[]
},
"More": "like",
"More+": "this"
}
The output would be:
[
{
"Name": "IsOrdered",
"Value": "1"
},
{
"Name": "TimeStamp",
"Value": "11654116426247"
},
{
"Name": "IsOrdered",
"Value": "1"
},
{
"Name": "TimeStamp",
"Value": "11654116426247"
},
{
"Name": "ClassName",
"Value": "PRODUCT"
},
{
"Name": "ListID",
"Value": "Products"
},
...
],
without any empty arrays ([]) in the output, while keeping the repeated values in the array.
I tried to remove empty arrays output ([]) by changing the jq expression from
[( .. | objects | ( .QueryString, .Params ) | select( . != null) )]
to
[( .. | objects | ( .QueryString, .Params ) | select( . != null && . != []) )]
but it failed.
The final output also needs to be unnested into a single array.
Bonus Q: Would it be possible to output each name/value pair on its own line, like the following?
{ "Name": "IsOrdered", "Value": "1" },
{ "Name": "TimeStamp", "Value": "11654116426247" },
{ "Name": "IsOrdered", "Value": "1" },
{ "Name": "TimeStamp", "Value": "11654116426247" },
To get the Name/Value objects, one per line, you could go with:
jq -c '.. | objects | (.QueryString, .Params) | .. | objects | select( .Name and .Value)'
or more cavalierly:
jq -c '.. | objects | select( .Name and .Value)'
In jq, && must be replaced with and. On the result you can then use | flatten to convert an "array of arrays of objects" into just an "array of objects".
Bonus A: Use the -c/--compact-output flag of jq together with | flatten[] instead of just | flatten.
Together:
jq -c '
[
..
| objects
| ( .QueryString, .Params )
| select(. != null and . != [])
]
| flatten[]' input.json
That said, the whole expression can be simplified to .. | objects | .QueryString[]?, .Params[]?
The output is:
{"Name":"ClassName","Value":"PRODUCT"}
{"Name":"ListID","Value":"Products"}
{"Name":"Mode ","Value":"1"}
{"Name":"Dept","Value":"5"}
{"Name":"HasPrevOrder","Value":""}
{"Name":"IsOrdered","Value":"1"}
{"Name":"TimeStamp","Value":"11654116426247"}
{"Name":"IsOrdered","Value":"1"}
{"Name":"TimeStamp","Value":"11654116426247"}
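The order above comes from `..` being a pre-order walk: each object is visited before its children, so the root's Params are printed before the nested QueryString entries. A minimal Python sketch of that simplified expression, with the hypothetical generator name_value_pairs, reproduces the same order and compact output:

```python
import json

SAMPLE = """
{
  "Some": "Random stuff",
  "One": {"QueryString": [{"Name": "IsOrdered", "Value": "1"},
                          {"Name": "TimeStamp", "Value": "11654116426247"}]},
  "Two": {"QueryString": [{"Name": "IsOrdered", "Value": "1"},
                          {"Name": "TimeStamp", "Value": "11654116426247"}]},
  "Params": [{"Name": "ClassName", "Value": "PRODUCT"},
             {"Name": "ListID", "Value": "Products"},
             {"Name": "Mode ", "Value": "1"},
             {"Name": "Dept", "Value": "5"},
             {"Name": "HasPrevOrder", "Value": ""}],
  "And": {"QueryString": []},
  "More": "like",
  "More+": "this"
}
"""


def name_value_pairs(node):
    """Yield items the way `.. | objects | .QueryString[]?, .Params[]?` does
    (hypothetical helper)."""
    if isinstance(node, dict):
        # The node itself comes first in a pre-order walk
        for field in ("QueryString", "Params"):
            items = node.get(field)
            # `[]?` iterates arrays (and object values) and skips null/scalars
            if isinstance(items, list):
                yield from items
            elif isinstance(items, dict):
                yield from items.values()
        # ...then its children, in document order
        for child in node.values():
            yield from name_value_pairs(child)
    elif isinstance(node, list):
        for child in node:
            yield from name_value_pairs(child)


doc = json.loads(SAMPLE)
for pair in name_value_pairs(doc):
    # separators=(",", ":") matches the compact style of jq -c
    print(json.dumps(pair, separators=(",", ":")))
```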

jq sort using the value of a nested array element

I need some help using jq to sort an array of elements where each element contains a nested tags array of elements. My input JSON looks like this:
{
"result": [
{
"name": "ct-1",
"tags": [
{
"key": "service_name",
"value": "BaseCT"
},
{
"key": "sequence",
"value": "bb"
}
]
},
{
"name": "ct-2",
"tags": [
{
"key": "service_name",
"value": "BaseCT"
},
{
"key": "sequence",
"value": "aa"
}
]
}
]
}
I would like to sort using the value of the sequence tag in the nested tags array so that the output looks like this:
{
"result": [
{
"name": "ct-2",
"tags": [
{
"key": "service_name",
"value": "BaseCT"
},
{
"key": "sequence",
"value": "aa"
}
]
},
{
"name": "ct-1",
"tags": [
{
"key": "service_name",
"value": "BaseCT"
},
{
"key": "sequence",
"value": "bb"
}
]
}
]
}
I have tried the following jq command:
$ jq '.result |= ([.[] | .tags[] | select(.key == "sequence") | .value] | sort_by(.))' input.json
but I get the following result:
{
"result": [
"aa",
"bb"
]
}
Please let me know if you know how to deal with this scenario.
from_entries converts an array of key/value pairs into an object; you can use it with sort_by like this:
.result |= sort_by(.tags | from_entries | .sequence)
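The same idea in a short Python sketch: turn each element's tags into a lookup dict (the role from_entries plays) and sort on the chosen key, replacing only the result array as |= does. The helper sort_by_tag is hypothetical, and the sketch assumes every element has the tag; jq would sort a missing one (null) first, which this does not reproduce.

```python
def sort_by_tag(elements, tag_key):
    """Sort elements by the value of the tag named `tag_key`
    (hypothetical helper; every element is assumed to have that tag)."""
    def tag_value(element):
        # Same idea as `.tags | from_entries`: {key: value} from the pairs
        return {t["key"]: t["value"] for t in element["tags"]}[tag_key]
    return sorted(elements, key=tag_value)


doc = {
    "result": [
        {"name": "ct-1", "tags": [{"key": "service_name", "value": "BaseCT"},
                                  {"key": "sequence", "value": "bb"}]},
        {"name": "ct-2", "tags": [{"key": "service_name", "value": "BaseCT"},
                                  {"key": "sequence", "value": "aa"}]},
    ]
}
# Like `.result |= ...`: replace only the array, keep the rest of the document
doc["result"] = sort_by_tag(doc["result"], "sequence")
print([e["name"] for e in doc["result"]])  # ['ct-2', 'ct-1']
```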

Query parent data (multi-level) based on a child value in a JSON file, using jq

I have a ksh script that retrieves (using curl) a JSON file similar to the one below:
{
"Type1": {
"dev": {
"server": [
{ "group": "APP1", "name": "DAPP1002", "ip": "10.1.1.1" },
{ "group": "APP2", "name": "DAPP2001", "ip": "10.1.1.2" }
]
},
"qa": {
"server": [
{ "group": "APP1", "name": "QAPP1002", "ip": "10.1.2.1" },
{ "group": "APP2", "name": "QAPP2001", "ip": "10.1.2.2" }
]
},
"prod": {
"proxy": "type1.prod.proxy.mydomain.com",
"server": [
{ "group": "APP1", "name": "PAPP1001", "ip": "10.1.3.1" },
{ "group": "APP1", "name": "PAPP1002", "ip": "10.1.3.2" },
{ "group": "APP2", "name": "PAPP2001", "ip": "10.1.3.3" }
]
}
},
"Type2": {
"dev": {
"server": [
{ "group": "APP8", "name": "DAPP8002", "ip": "10.2.1.1" },
{ "group": "APP9", "name": "DAPP9001", "ip": "10.2.1.2" }
]
},
"qa": {
"server": [
{ "group": "APP8", "name": "QAPP8002", "ip": "10.2.2.1" },
{ "group": "APP9", "name": "QAPP9001", "ip": "10.2.2.2" }
]
},
"prod": {
"proxy": "type2.prod.proxy.mydomain.com",
"server": [
{ "group": "APP8", "name": "PAPP8001", "ip": "10.2.3.1" },
{ "group": "APP9", "name": "PAPP9001", "ip": "10.2.3.2" },
{ "group": "APP9", "name": "PAPP9002", "ip": "10.2.3.3" }
]
}
}
}
... based on a server name (field "name"), I need to collect the following info to pass to a function:
"Type", "name", "ip", "proxy"
(Note that the "proxy" info is optional)
I am new to JSON and am trying to filter this with jq, but so far I've been out of luck.
What I have accomplished so far is the following jq query, when searching for "PAPP9001":
jq '.[] | .[] | select(.server[].name=="PAPP9001") | .proxy as $proxy | .server[] | {proxy: $proxy, name: .name, ip: .ip} | select(.name=="PAPP9001")' curlreturn.json
which returns me:
{
"proxy": "type2.prod.proxy.mydomain.com",
"name": "PAPP9001",
"ip": "10.2.3.2"
}
but:
I could not get the "Type" info at the top level.
Considering the number of pipes and the 2 selects, I doubt that this is the most efficient way.
One way to retrieve the key names programmatically is using to_entries. For example, given your input, this jq filter:
to_entries[]
| .key as $type
| .value[]
| .proxy as $proxy
| .server[]
| select(.name == "PAPP9001")
| { Type: $type, name, ip, proxy: $proxy }
yields:
{
"Type": "Type2",
"name": "PAPP9001",
"ip": "10.2.3.2",
"proxy": "type2.prod.proxy.mydomain.com"
}
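For comparison, here is a minimal Python sketch of the same single-pass walk: iterate the top-level keys (the Type), then each environment, remember its optional proxy, and test each server once. find_server is a hypothetical helper and the input is a trimmed version of the question's JSON; where an environment has no proxy, it returns None, as jq would output null.

```python
def find_server(doc, name):
    """Return Type/name/ip/proxy for every server called `name`
    (hypothetical helper)."""
    matches = []
    for type_name, environments in doc.items():    # to_entries[] | .key as $type
        for env in environments.values():          # .value[]
            proxy = env.get("proxy")               # None where jq yields null
            for server in env.get("server", []):   # .server[] (jq errors if absent)
                if server["name"] == name:         # select(.name == "...")
                    matches.append({"Type": type_name, "name": server["name"],
                                    "ip": server["ip"], "proxy": proxy})
    return matches


doc = {
    "Type1": {"dev": {"server": [
        {"group": "APP1", "name": "DAPP1002", "ip": "10.1.1.1"}]}},
    "Type2": {"prod": {"proxy": "type2.prod.proxy.mydomain.com", "server": [
        {"group": "APP8", "name": "PAPP8001", "ip": "10.2.3.1"},
        {"group": "APP9", "name": "PAPP9001", "ip": "10.2.3.2"}]}},
}
print(find_server(doc, "PAPP9001"))
```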
Variations
If, for example, you wanted these four fields as a CSV row, you could replace the last line of the filter above with the following (and run jq with -r so the row is printed as raw text rather than as a JSON string):
| [$type, .name, .ip, $proxy] | @csv
See the jq manual for how to use string interpolation.