Remove duplicate JSON blocks from a file using jq

I have a JSON file that contains thousands of entries, and I need to remove the duplicate blocks.
Here is an example of the file:
{ "signatures": [
{
"signatureId": 0050,
"mode": 0
},
{
"signatureId": 0012,
"mode": 0
},
{
"signatureId": 0012,
"mode": 1
}
]}
Here is the target result to achieve:
{ "signatures": [
{
"signatureId": 0050,
"mode": 0
},
{
"signatureId": 0012,
"mode": 0
}
]}
As you can see, the "mode" value doesn't matter; what really matters is that "signatureId" must not be duplicated, so when we remove the whole block, whichever "mode" remains is not a problem.
I can only use Shell and/or JQ.

Use unique_by with the field to be checked for duplicates as its argument. It will always take the first of a kind (here, the one with "mode": 0). Note that unique_by also sorts the result by that field, which is why signatureId 12 precedes 50 in the output below.
jq '.signatures |= unique_by(.signatureId)'
{
"signatures": [
{
"signatureId": 12,
"mode": 0
},
{
"signatureId": 50,
"mode": 0
}
]
}
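If the deduplicated result needs to replace the original file, jq cannot edit a file in place, so a minimal shell sketch (the file name signatures.json is an assumption) is to write to a temporary file first:
# file name is assumed; adjust to your actual file
jq '.signatures |= unique_by(.signatureId)' signatures.json > signatures.tmp \
  && mv signatures.tmp signatures.json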

Delete duplications in JSON file

I am trying to re-edit a JSON file to print only the subgroups that have any attribute marked as "change": false.
JSON below:
{"group":{
"subgroup1":{
"attributes":[
{
"change":false,
"name":"Name"},
{
"change":false,
"name":"SecondName"},
],
"id":1,
"name":"MasterTest"},
"subgroup2":{
"attributes":[
{
"change":true,
"name":"Name"
},
{
"change":false,
"name":"Newname"
}
],
"id":2,
"name":"MasterSet"},
}}
I was trying to use command:
cat test.json | jq '.group[] | select(.attributes[].change==false)'
which produces the needed output, but with duplicates. Can anyone help here? Or should I use a different command to achieve that result?
.attributes[] iterates over the attributes, and each iteration step produces its own result. Use the any filter which aggregates multiple values into one, in this case a boolean with the meaning of "at least one":
.group[] | select(any(.attributes[]; .change==false))
{
"attributes": [
{
"change": false,
"name": "Name"
},
{
"change": false,
"name": "SecondName"
}
],
"id": 1,
"name": "MasterTest"
}
{
"attributes": [
{
"change": true,
"name": "Name"
},
{
"change": false,
"name": "Newname"
}
],
"id": 2,
"name": "MasterSet"
}
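The full command against the file from the question would then be (jq reads the file directly, so the cat pipeline is not needed):
jq '.group[] | select(any(.attributes[]; .change==false))' test.json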
Looks to me like the duplicate is NOT a duplicate, but a condition arising from a nested sub-grouping, which gives the appearance of a duplicate. You should look to see if there is a switch to skip processing sub-groups when the upper-level meets the condition, thereby avoiding the perceived duplication.

How to search within sections of a JSON file?

So, let's say I had a JSON file like this:
{
"content": [
{
"word": "cat",
"adjectives": [
{
"type": "textile",
"adjective": "fluffy"
},
{
"type": "visual",
"adjective": "small"
}
]
},
{
"word": "dog",
"adjectives": [
{
"type": "textile",
"adjective": "fluffy"
},
{
"type": "visual",
"adjective": "big"
}
]
},
{
"word": "chocolate",
"adjectives": [
{
"type": "visual",
"adjective": "small"
},
{
"type": "gustatory",
"adjective": "sweet"
}
]
}
]
}
Now, say I wanted to search for two adjectives, for example "fluffy" and "small". The problem is that more than one word's adjectives contain "small", so I would have to manually check which of them also contains "fluffy". How would I do this in a quicker manner?
In other words, how would I find the word(s) that have both "fluffy" and "small"?
EDIT: Sorry, new asker. Anything that works in a terminal is fair game. jq is a really great JSON searcher, so that is preferred; sorry for the confusion. I also fixed the JSON.
A command-line solution would be to use jq:
jq -r '.content[] | select(.adjectives[].adjective == "fluffy") | .word' /pathToJsonFile.json
Output:
cat
dog
Are you looking for something like this? Do you need a solution that uses other programming languages?
(P.S. your JSON example appears to be invalid)
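One possible extension of this approach, as a sketch rather than part of the original answer: chain two selects, each wrapped in any to avoid duplicate outputs, so that only words carrying both adjectives remain:
jq -r '.content[]
  | select(any(.adjectives[]; .adjective == "fluffy"))
  | select(any(.adjectives[]; .adjective == "small"))
  | .word' /pathToJsonFile.json
# expected result for the sample data: cat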
Since jq is now fair game (this was only clarified later in the comments), here is one solution using jq.
First, fix the JSON to be actually valid (the corrected version is the one now shown in the question above).
Then, the following jq filter returns an array containing the words which contain both adjectives:
.content
| map(
select(
.adjectives as $adj
| all("small","fluffy"; IN($adj[].adjective))
)
| .word
)
If a non-array output is required, and only one word per line, use .[] instead of map (either after content or as a final filter), e.g.:
jq -r '.content[]
| select(
.adjectives as $adj
| all("small","fluffy"; IN($adj[].adjective))
)
| .word'
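Given the sample data, only "cat" has both adjectives, so the first variant returns ["cat"] and the -r variant prints:
cat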

JSON schema help, array of objects

I am trying to write a JSON object where the key "pStock" is the total stock of an array of bike sizes ('size'). Each size has an inventory or 'count'. I have two versions of the same code. The first one returns an error message even though the syntax looks correct to my eye.
"pStock": [
{
"size": {
"type": "string",
"count": {
"type": "number"
}
}
}
}
]
Here is the second version, which returns no errors, but I'm not quite sure it's saying what I want it to say.
"pStock": {
"type": ["object"],
"size": {
"type": "string",
"count": {
"type": "number"
}
}
}
EDIT 1
I appreciate all of these responses. I made a silly error in posting. Below is the correct "wrong" code that isn't working; it still fails with the error 'Error, schema is invalid: data/properties/pStock should be object,boolean at Ajv.validateSchema'.
"pStock": [
{
"size": {
"type": "string",
"count": {
"type": "number"
}
}
}
]
Any help would be greatly appreciated.
Count the opening and closing curly braces on your first JSON. It has 3 opening and 4 closing.
"pStock": [
{ // Open 1
"size": { // Open 2
"type": "string",
"count": { // Open 3
"type": "number"
} // Close 3
} // Close 2
} // Close 1
} // Close what?
]
Just remove the last one and it will work.
You have an extra brace } just before the closing square bracket ] of the pStock array, i.e.
"pStock": [
{
"size": {
"type": "string",
"count": {
"type": "number"
}
}
}
} <--- this is wrong
]
should be
{
"pStock":[
{
"size":{
"type":"string",
"count":{
"type":"number"
}
}
}
]
}
The first version should look like this:
"pStock": [
{
"size": {
"type": "string",
"count": {
"type": "number"
}
}
}
]
You had one } too many (line 7).
The second version does not represent what you wanted; it does not contain the array of sizes.
But you can create something like this (pStock with one key per size, where each size records the inventory/count):
"pStock": {
"size1": {
inventory: "5",
count: 4
},
"size2": {
inventory: "5",
count: 4
}
}
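If the goal is a schema that describes pStock as an array of size/count objects, a sketch in standard JSON Schema terms (property names taken from the question) would be:
"pStock": {
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "size": { "type": "string" },
      "count": { "type": "number" }
    }
  }
}
This also explains the Ajv error: every schema value must itself be an object (or boolean), so pStock cannot be declared directly as a JSON array.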

In Logic Apps, parsing a JSON array throws an error for a single object but works fine for multiple objects

While parsing JSON in an Azure Logic App, an array in my payload can contain either a single object or multiple objects ("Box", as shown in the examples below).
Both types of input are valid, but when only a single object arrives, parsing throws the error "Invalid type. Expected Array but got Object".
Input 1 (throws an error):
{
"MyBoxCollection":
{
"Box":{
"BoxName": "Box 1"
}
}
}
Input 2 (works fine):
{
"MyBoxCollection":
[
{
"Box":{
"BoxName": "Box 1"
},
"Box":{
"BoxName": "Box 2"
}
}]
}
JSON Schema:
"MyBoxCollection": {
"type": "object",
"properties": {
"box": {
"type": "array",
"items": {
"type": "object",
"properties": {
"BoxName": {
"type": "string"
},
......
.....
..
}
Error details:
[
{
"message": "Invalid type. Expected Array but got Object .",
"lineNumber": 0,
"linePosition": 0,
"path": "Order.MyBoxCollection.Box",
"schemaId": "#/properties/Root/properties/MyBoxCollection/properties/Box",
"errorType": "type",
"childErrors": []
}
]
I used to use the trick of injecting a couple of dummy rows in the resultset as suggested by the other posts, but I recently found a better way. Kudos to Thomas Prokov for providing the inspiration in his NETWORG blog post.
The JSON parse schema accepts multiple choices as type, so simply replace
"type": "array"
with
"type": ["array","object"]
and your parse step will happily parse either an array or a single value (or no value at all).
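Applied to the schema excerpt in the question, the relevant fragment would then look roughly like this (a sketch; only the properties shown in the question are included):
"Box": {
  "type": ["array", "object"],
  "items": {
    "type": "object",
    "properties": {
      "BoxName": {
        "type": "string"
      }
    }
  }
}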
You may then need to identify which scenario you're in: 0, 1 or multiple records in the resultset? I'm pasting below how you can create a variable (ResultsetSize) which takes one of 3 values (rs_0, rs_1 or rs_n) for your switch:
"Initialize_ResultsetSize": {
"inputs": {
"variables": [
{
"name": "ResultsetSize",
"type": "string",
"value": "rs_n"
}
]
},
"runAfter": {
"<replace_with_name_of_previous_action>": [
"Succeeded"
]
},
"type": "InitializeVariable"
},
"Check_if_resultset_is_0_or_1_records": {
"actions": {
"Set_ResultsetSize_to_0": {
"inputs": {
"name": "ResultsetSize",
"value": "rs_0"
},
"runAfter": {},
"type": "SetVariable"
}
},
"else": {
"actions": {
"Set_ResultsetSize_to_1": {
"inputs": {
"name": "ResultsetSize",
"value": "rs_1"
},
"runAfter": {},
"type": "SetVariable"
}
}
},
"expression": {
"and": [
{
"equals": [
"#string(body('<replace_with_name_of_Parse_JSON_action>')?['<replace_with_name_of_root_element>']?['<replace_with_name_of_list_container_element>']?['<replace_with_name_of_item_element>']?['<replace_with_non_null_element_or_attribute>'])",
""
]
}
]
},
"runAfter": {
"Initialize_ResultsetSize": [
"Succeeded"
]
},
"type": "If"
},
"Process_resultset_depending_on_ResultsetSize": {
"cases": {
"Case_no_record": {
"actions": {
},
"case": "rs_0"
},
"Case_one_record_only": {
"actions": {
},
"case": "rs_1"
}
},
"default": {
"actions": {
}
},
"expression": "#variables('ResultsetSize')",
"runAfter": {
"Check_if_resultset_is_0_or_1_records": [
"Succeeded",
"Failed",
"Skipped",
"TimedOut"
]
},
"type": "Switch"
}
For this problem, I found another Stack Overflow post describing a similar issue. When there is only one "Box", it is converted to JSON as an object (a key/value pair) rather than an array. I think this is by design, so one workaround is to add a dummy "Box" record at the source of your XML data, such as:
<Box>specific_test</Box>
Then filter out the "specific_test" placeholder in the subsequent steps.
Another workaround for your reference:
If your JSON data has only one array, we can use that to make a decision: check whether the data contains the "[" character. If it does, indexOf returns the position of "["; if not, it returns -1.
The expression looks like this:
indexOf('{"MyBoxCollection":{"Box":[aaa,bbb]}}', '[')
When the data does not contain "[", the expression returns -1.
Then we can add an "If" condition: if the result is greater than 0, run "Parse JSON" with one schema; if it is -1, run "Parse JSON" with the other schema.
Hope it would be helpful to your problem~
We faced a similar issue. The only solution we found was to manipulate the XML before conversion: we used an Azure Function to update the XML nodes that need to be arrays even when they contain a single element, and then returned the XML to Logic Apps for conversion. Hope this helps someone.

JMESPath: filtering out by nested attributes

I am trying to apply a filter using the JMESPath jp utility (https://github.com/jmespath/jp).
My goal is to extract only the flow entries whose state is 'ADDED' and that have a specific device id (e.g. 0000debf17cff54b).
I am trying something like this:
cat test | ./jp '[][?id=="of:00002259146f7743" && state=="ADDED"]'
but the result is []. Here is the input data:
[
{
"flow": [
{
"ethType": "0x86dd",
"type": "ETH_TYPE"
},
{
"protocol": 58,
"type": "IP_PROTO"
},
{
"icmpv6Type": 135,
"type": "ICMPV6_TYPE"
}
],
"id": "of:00001aced404664b",
"state": "ADDED"
},
{
"flow": [
{
"ethType": "0x86dd",
"type": "ETH_TYPE"
},
{
"protocol": 58,
"type": "IP_PROTO"
},
{
"icmpv6Type": 136,
"type": "ICMPV6_TYPE"
}
],
"id": "of:0000debf17cff54b",
"state": "ADDED"
}
]
No need to use the first []; [?id=='of:0000debf17cff54b' && state=='ADDED'] works fine.
Using the first [] flattens the array into a projection, so the filter is then applied to each element individually; those elements are objects, not arrays, so the filter produces no results.
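A complete invocation in the same style as the question (the file name test comes from the question; the expression is wrapped in double quotes so the shell leaves the inner single quotes intact) could look like:
cat test | ./jp "[?id=='of:0000debf17cff54b' && state=='ADDED']"
# returns a one-element array containing the of:0000debf17cff54b flow entry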