JQ - removing duplicates using unique_by - json

I am trying to remove duplicates from the following JSON, keyed by id.
Here is the JSON:
{
"Result": [
{
"name": "validation-of-art",
"id": "12",
"status": "passed",
"duration": 4740302
},
{
"name": "validation-of-art",
"id": "12",
"status": "passed",
"duration": 272320
},
{
"name": "validation-of-art",
"id": "13",
"status": "passed",
"duration": 272320
}
]
}
Here is what I have tried:
jq -r 'unique_by(.Result.name)'
and also:
jq 'unique_by(.Result[].name)'
Both give the error: Cannot index array with string "Result"
Any help would be appreciated.

Here is an example that keeps only one .Result object per name, using unique_by(.name):
$ jq -M '.Result |= unique_by(.name)' data.json
{
"Result": [
{
"name": "validation-of-art",
"id": "12",
"status": "passed",
"duration": 4740302
}
]
}
If this isn't quite what you want, it generalizes easily. For example, to keep one object for each unique {name, id} pair you could use:
$ jq -M '.Result |= unique_by({name, id})' data.json
{
"Result": [
{
"name": "validation-of-art",
"id": "12",
"status": "passed",
"duration": 4740302
},
{
"name": "validation-of-art",
"id": "13",
"status": "passed",
"duration": 272320
}
]
}
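The original attempts probably fail because unique_by expects an array: given the top-level object, it appears to iterate over the object's values, so .Result ends up being applied to the Result array itself, which would explain the "Cannot index array with string" message. Since the question asks about id, here is a minimal sketch of the same update keyed on .id. The inline sample is copied from the question, and the shell variable names are only illustrative.

```shell
# Copy of the question's input (variable names here are illustrative).
json='{"Result":[
  {"name":"validation-of-art","id":"12","status":"passed","duration":4740302},
  {"name":"validation-of-art","id":"12","status":"passed","duration":272320},
  {"name":"validation-of-art","id":"13","status":"passed","duration":272320}]}'

# Update .Result in place, keeping one object per distinct id.
dedup_by_id=$(printf '%s' "$json" | jq -c '.Result |= unique_by(.id)')
printf '%s\n' "$dedup_by_id"
```

Note that unique_by sorts its output by the key, so the surviving objects may come out in a different order than they had in the input.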

Related

Cannot get jq to query json object [duplicate]

I have a JSON file that I am trying to query with jq, but I am unable to retrieve the observations. I am trying to retrieve each of the "observations" values with the following command, without success:
cat sample3.json | jq .dataSets[0].series.0:0:0:0:0.observations.0[0]
I am able to retrieve everything up to the series using:
cat sample3.json | jq .dataSets[0].series
But once I try to drill down further I am getting a compile error:
$ cat sample3.json | jq .dataSets[0].series.0:0:0:0:0
jq: error: syntax error, unexpected LITERAL, expecting end of file (Unix shell quoting issues?) at <top-level>, line 1:
.dataSets[0].series.0:0:0:0:0
jq: 1 compile error
I am not sure what I am doing wrong here....
The input file is:
{
"header": {
"id": "b8be2cd5-33bf-4687-9e81-eb032f6f8a71",
"test": false,
"prepared": "2022-09-01T13:30:57.013+02:00",
"sender": {
"id": "ECB"
}
},
"dataSets": [
{
"action": "Replace",
"validFrom": "2022-09-01T13:30:57.013+02:00",
"series": {
"0:0:0:0:0": {
"attributes": [
0,
null,
0,
null,
null,
null,
null,
null,
null,
null,
null,
null,
0,
null,
0,
null,
0,
0,
0,
0
],
"observations": {
"0": [
1.4529,
0,
0,
null,
null
],
"1": [
1.4472,
0,
0,
null,
null
],
"2": [
1.4591,
0,
0,
null,
null
]
}
}
}
}
],
"structure": {
"links": [
{
"title": "Exchange Rates",
"rel": "dataflow",
"href": "https://sdw-wsrest.ecb.europa.eu:443/service/dataflow/ECB/EXR/1.0"
}
],
"name": "Exchange Rates",
"dimensions": {
"series": [
{
"id": "FREQ",
"name": "Frequency",
"values": [
{
"id": "D",
"name": "Daily"
}
]
},
{
"id": "CURRENCY",
"name": "Currency",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "CURRENCY_DENOM",
"name": "Currency denominator",
"values": [
{
"id": "EUR",
"name": "Euro"
}
]
},
{
"id": "EXR_TYPE",
"name": "Exchange rate type",
"values": [
{
"id": "SP00",
"name": "Spot"
}
]
},
{
"id": "EXR_SUFFIX",
"name": "Series variation - EXR context",
"values": [
{
"id": "A",
"name": "Average"
}
]
}
],
"observation": [
{
"id": "TIME_PERIOD",
"name": "Time period or range",
"role": "time",
"values": [
{
"id": "2022-08-29",
"name": "2022-08-29",
"start": "2022-08-29T00:00:00.000+02:00",
"end": "2022-08-29T23:59:59.999+02:00"
},
{
"id": "2022-08-30",
"name": "2022-08-30",
"start": "2022-08-30T00:00:00.000+02:00",
"end": "2022-08-30T23:59:59.999+02:00"
},
{
"id": "2022-08-31",
"name": "2022-08-31",
"start": "2022-08-31T00:00:00.000+02:00",
"end": "2022-08-31T23:59:59.999+02:00"
}
]
}
]
},
"attributes": {
"series": [
{
"id": "TIME_FORMAT",
"name": "Time format code",
"values": [
{
"name": "P1D"
}
]
},
{
"id": "BREAKS",
"name": "Breaks",
"values": []
},
{
"id": "COLLECTION",
"name": "Collection indicator",
"values": [
{
"id": "A",
"name": "Average of observations through period"
}
]
},
{
"id": "COMPILING_ORG",
"name": "Compiling organisation",
"values": []
},
{
"id": "DISS_ORG",
"name": "Data dissemination organisation",
"values": []
},
{
"id": "DOM_SER_IDS",
"name": "Domestic series ids",
"values": []
},
{
"id": "PUBL_ECB",
"name": "Source publication (ECB only)",
"values": []
},
{
"id": "PUBL_MU",
"name": "Source publication (Euro area only)",
"values": []
},
{
"id": "PUBL_PUBLIC",
"name": "Source publication (public)",
"values": []
},
{
"id": "UNIT_INDEX_BASE",
"name": "Unit index base",
"values": []
},
{
"id": "COMPILATION",
"name": "Compilation",
"values": []
},
{
"id": "COVERAGE",
"name": "Coverage",
"values": []
},
{
"id": "DECIMALS",
"name": "Decimals",
"values": [
{
"id": "4",
"name": "Four"
}
]
},
{
"id": "NAT_TITLE",
"name": "National language title",
"values": []
},
{
"id": "SOURCE_AGENCY",
"name": "Source agency",
"values": [
{
"id": "4F0",
"name": "European Central Bank (ECB)"
}
]
},
{
"id": "SOURCE_PUB",
"name": "Publication source",
"values": []
},
{
"id": "TITLE",
"name": "Title",
"values": [
{
"name": "Australian dollar/Euro"
}
]
},
{
"id": "TITLE_COMPL",
"name": "Title complement",
"values": [
{
"name": "ECB reference exchange rate, Australian dollar/Euro, 2:15 pm (C.E.T.)"
}
]
},
{
"id": "UNIT",
"name": "Unit",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "UNIT_MULT",
"name": "Unit multiplier",
"values": [
{
"id": "0",
"name": "Units"
}
]
}
],
"observation": [
{
"id": "OBS_STATUS",
"name": "Observation status",
"values": [
{
"id": "A",
"name": "Normal value"
}
]
},
{
"id": "OBS_CONF",
"name": "Observation confidentiality",
"values": [
{
"id": "F",
"name": "Free"
}
]
},
{
"id": "OBS_PRE_BREAK",
"name": "Pre-break observation value",
"values": []
},
{
"id": "OBS_COM",
"name": "Observation comment",
"values": []
}
]
}
}
}
The .foo syntax cannot be used if the key name contains anything other than alphanumeric characters or the underscore, or if its first character is a digit.
Assuming you are using a sufficiently recent version of jq, you can always use the form ."foo", which is an abbreviation of the basic form .["foo"].
So your query could begin with:
.dataSets[0].series."0:0:0:0:0"
If you are passing the jq query on a command line, you may have to protect the double quotes from the shell, e.g. in bash by enclosing the whole jq query in single quotes.
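As a sketch of how the quoted-key form extends to the full path from the question, the snippet below reads the first number of observation "0" and then the first number of every observation. The input is a trimmed copy of the question's file held in a shell variable; the variable names are illustrative, and map_values assumes jq 1.5 or later.

```shell
# Trimmed copy of the question's input; only the fields on the queried path are kept.
json='{"dataSets":[{"series":{"0:0:0:0:0":{"observations":{
  "0":[1.4529,0,0,null,null],
  "1":[1.4472,0,0,null,null],
  "2":[1.4591,0,0,null,null]}}}}]}'

# Single quotes keep the shell from touching the double quotes inside the jq program.
first_value=$(printf '%s' "$json" | jq '.dataSets[0].series."0:0:0:0:0".observations."0"[0]')
printf '%s\n' "$first_value"

# Bracket form of the same lookup, taking the first number of every observation.
all_values=$(printf '%s' "$json" | jq -c '.dataSets[0].series["0:0:0:0:0"].observations | map_values(.[0])')
printf '%s\n' "$all_values"
```

The key "0" under observations needs the same quoting as "0:0:0:0:0", because it starts with a digit.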

How to solve JQ processing as multiple dictionaries if document has an array and JQ filter uses group_by?

I have the following JSON document
[
{
"id": 6,
"description": "Component 1",
"due": "20211122T183000Z",
"entry": "20211119T181735Z",
"modified": "20211119T181735Z",
"project": "product1",
"status": "pending",
"uuid": "55bf0497-208c-492a-8f76-bb692d48afaa",
"tags": [
"abc",
"123"
],
"urgency": 13.9699
},
{
"id": 10,
"description": "Component 2",
"due": "20211129T183000Z",
"entry": "20211130T045620Z",
"modified": "20211130T045620Z",
"project": "product2",
"status": "pending",
"uuid": "d57eb8f7-e5ec-497c-ac47-f1cf34b005db",
"tags": [
"foo",
"bar"
],
"urgency": 14.0151
},
{
"id": 11,
"description": "Component 3",
"due": "20211202T183000Z",
"entry": "20211130T121529Z",
"completed": "20211130T123915Z",
"project": "product3",
"status": "pending",
"uuid": "9f15e6a4-5cef-4b0f-915b-fc916ab152c7",
"tags": [
"xyz",
"676"
],
"urgency": 14.0096
},
{
"id": 12,
"description": "Component 4",
"due": "20211202T183000Z",
"entry": "20211130T122537Z",
"pending": "20211130T122537Z",
"project": "product1",
"status": "pending",
"uuid": "91c9ec76-42a7-4ebc-9649-b3a12027feb1",
"tags": [
"def"
],
"urgency": 13.9096
}
]
I have written the jq filter below to process this JSON. I expect one dictionary per description, not several.
group_by(.project,.status)
| .[]
| { project: .[0].project , status: .[0].status ,
description: [{"\(.[].description)" : (.[].tags | join(";"))}] }
After applying the filter, I get the output below, with multiple dictionaries per description because of the tags array:
{
"project": "product1",
"status": "pending",
"description": [
{
"Component 1": "abc;123"
},
{
"Component 1": "def"
},
{
"Component 4": "abc;123"
},
{
"Component 4": "def"
}
]
}
{
"project": "product2",
"status": "completed",
"description": [
{
"Component 2": "foo;bar"
}
]
}
{
"project": "product3",
"status": "completed",
"description": [
{
"Component 3": "xyz;676"
}
]
}
The output I expect has each description paired only with its own tags, as below:
{
"project": "product1",
"status": "pending",
"description": [
{
"Component 1": "abc;123"
},
{
"Component 4": "def"
}
]
}
{
"project": "product2",
"status": "completed",
"description": [
{
"Component 2": "foo;bar"
}
]
}
{
"project": "product3",
"status": "completed",
"description": [
{
"Component 3": "xyz;676"
}
]
}
How can I generate the above-expected output using JQ?
One option close to your original filter would be:
jq 'group_by(.project)[]
| { project: .[0].project, status:.[0].status, "description": [.[]
| { (.description) : .tags|join(";") } ] }'
To just merge .description and .tags into a single key-value pair, use:
jq '.[] | del(.description, .tags) + ({(.description): .tags | join(";")})'
To also group by .project, keeping only .project, .status, and an array of the .description/.tags pairs from above, use:
jq '
group_by(.project)[]
| (first | {project, status})
+ {description: map({(.description): .tags | join(";")})}
'
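The extra dictionaries in the question's output seem to come from using .[] twice inside one object constructor: each .[] iterates independently, so every description is paired with every tags array. A small sketch of the difference follows, using one group reduced to the two relevant fields; the variable names are illustrative.

```shell
# One group as group_by would produce it, reduced to the two fields involved.
group='[{"description":"Component 1","tags":["abc","123"]},
        {"description":"Component 4","tags":["def"]}]'

# Two independent .[] iterations: 2 descriptions x 2 tag arrays = 4 objects.
crossed=$(printf '%s' "$group" | jq -c '[{(.[].description): (.[].tags | join(";"))}]')
printf '%s\n' "$crossed"

# A single iteration via map: each element contributes exactly one object.
paired=$(printf '%s' "$group" | jq -c 'map({(.description): (.tags | join(";"))})')
printf '%s\n' "$paired"
```

The map form is what the grouped answers above rely on: .description and .tags are both read from the same element.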

How to avoid generating all combinations of selected data while constructing an object?

My original JSON is given below.
[
{
"id": "1",
"name": "AA_1",
"total": "100002",
"files": [
{
"filename": "8665b987ab48511eda9e458046fbc42e.csv",
"filename_original": "some.csv",
"status": "3",
"total": "100002",
"time": "2020-08-24 23:25:49"
}
],
"status": "3",
"created": "2020-08-24 23:25:49",
"filenames": "8665b987ab48511eda9e458046fbc42e.csv",
"is_append": "0",
"is_deleted": "0",
"comment": null
},
{
"id": "4",
"name": "AA_2",
"total": "43806503",
"files": [
{
"filename": "1b4812fe634938928953dd40db1f70b2.csv",
"filename_original": "other.csv",
"status": "3",
"total": "21903252",
"time": "2020-08-24 23:33:43"
},
{
"filename": "63ab85fef2412ce80ae8bd018497d8bf.csv",
"filename_original": "some.csv",
"status": "2",
"total": 0,
"time": "2020-08-24 23:29:30"
}
],
"status": "2",
"created": "2020-08-24 23:35:51",
"filenames": "1b4812fe634938928953dd40db1f70b2.csv&&63ab85fef2412ce80ae8bd018497d8bf.csv",
"is_append": "0",
"is_deleted": "0",
"comment": null
}
]
From this JSON I want to create new objects by combining fields from objects which have status: 2 and their files which also have the same pair, status: 2.
So, I am expecting a JSON array as below.
[
{
"id": "4",
"name": "AA_2",
"file_filename": "63ab85fef2412ce80ae8bd018497d8bf.csv",
"file_status": 2
}
]
So far I have tried this jq filter:
.[]|select(.status=="2")|[{id:.id,file_filename:.files[].filename,file_status:.files[].status}]
But this produces some invalid data.
[
{
"id": "4", # want to remove this as file.status != 2
"file_filename": "1b4812fe634938928953dd40db1f70b2.csv",
"file_status": "3"
},
{
"id": "4",
"file_filename": "1b4812fe634938928953dd40db1f70b2.csv",
"file_status": "2"
},
{
"id": "4", # Repeat
"file_filename": "63ab85fef2412ce80ae8bd018497d8bf.csv",
"file_status": "3"
},
{
"id": "4", # Repeat
"file_filename": "63ab85fef2412ce80ae8bd018497d8bf.csv",
"file_status": "2"
}
]
How do I filter the new JSON using JQ and remove these duplicate objects?
By applying the [] operator to .files twice, you pair every filename with every status, which is a combinatorial explosion. Iterate over .files only once instead, for example:
[ .[] | select(.status == "2") | {id, name} + (.files[] | select(.status == "2") | {file_filename: .filename, file_status: .status}) ]
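A runnable sketch of that filter on a cut-down input follows. The short file names (a.csv, b.csv, c.csv) and the variable names are placeholders rather than the question's real values, and tonumber is added only because the expected output in the question shows file_status as a number.

```shell
# Cut-down input: placeholder file names, only the fields the filter reads.
json='[
  {"id":"1","name":"AA_1","status":"3","files":[{"filename":"a.csv","status":"3"}]},
  {"id":"4","name":"AA_2","status":"2","files":[
    {"filename":"b.csv","status":"3"},
    {"filename":"c.csv","status":"2"}]}]'

# .files[] appears once, so each surviving file yields exactly one output object.
matches=$(printf '%s' "$json" | jq -c '
  [ .[]
    | select(.status == "2")
    | {id, name} + (.files[]
                    | select(.status == "2")
                    | {file_filename: .filename, file_status: (.status | tonumber)}) ]')
printf '%s\n' "$matches"
```

Only the AA_2 object and its status "2" file survive both select calls, so the result is a one-element array.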

Passing JSON value using jq command to a new JSON file

I ran a curl command and then extracted the "id" value from the response with jq.
request:
curl "http://192.168.22.22/test/index/limit:1/page:1/sort:id/pag1.json" | jq -r '.[0].id'
curl response:
[
{
"id": "381",
"org_id": "9",
"date": "2018-10-10",
"info": "THIS IS TEST",
"uuid": "5bbd1b41bc",
"published": 1,
"an": "2",
"attribute_count": "4",
"orgc_id": "8",
"timestamp": "1",
"dEST": "0",
"sharing": "0",
"proposal": false,
"locked": false,
"level_id": "1",
"publish_timestamp": "0",
"disable_correlation": false,
"extends_uuid": "",
"Org": {
"id": "5",
"name": "test",
"uuid": "5b9bc"
},
"Orgc": {
"id": "1",
"name": "test",
"uuid": "5b9f93bdeac1b41bc"
},
"ETag": []
}
]
jq response:
381
Now I want to take the "id" value, 381, and create a new JSON file on disk with that value in the right place.
The new JSON file for example:
{
"request": {
"Event": {
"id": "381",
"task": "new"
}
}
}
Given your input, this works when the curl output is piped into it:
jq -r '{"request": {"Event": {"id": .[0].id, "task": "new"}}}' > file
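For a sketch of the whole step end to end, the snippet below feeds a stand-in for the curl response into the same kind of object constructor and writes the result to disk. The response variable, the trimmed field list, and the temporary output file are assumptions made for illustration.

```shell
# Stand-in for the curl response, trimmed to a few fields.
response='[{"id":"381","org_id":"9","info":"THIS IS TEST"}]'

# Hypothetical output location; mktemp keeps the sketch from overwriting anything.
out_file=$(mktemp)

# Build the new document from the first element's id and write it to disk.
printf '%s' "$response" | jq '{request: {Event: {id: .[0].id, task: "new"}}}' > "$out_file"
cat "$out_file"
```

With the real service, the printf line would be replaced by the curl command from the question, piped into the same jq program.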

Add the same element of array in a existing JSON using jq

I have a JSON file, and I want to copy a value from the top level into another place in the same JSON.
I am trying to do this with the jq command line tool.
{
"channel": "mychannel",
"videos": [
{
"id": "10",
"url": "youtube.com"
},
{
"id": "20",
"url": "youtube.com"
}
]
}
The output would be:
{
"channel": "mychannel",
"videos": [
{
"channel": "mychannel",
"id": "10",
"url": "youtube.com"
},
{
"channel": "mychannel",
"id": "20",
"url": "youtube.com"
}
]
}
In my JSON the "channel" value is static, always the same. I need a way to add it to every element of the videos array.
Can someone help me?
I tried:
jq .videos + channel
Use a variable to remember .channel in the later stages of the pipeline.
$ jq '.channel as $ch | .videos[].channel = $ch' tmp.json
{
"channel": "mychannel",
"videos": [
{
"id": "10",
"url": "youtube.com",
"channel": "mychannel"
},
{
"id": "20",
"url": "youtube.com",
"channel": "mychannel"
}
]
}
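Two variations on the same idea, as a sketch: plain assignment (=) evaluates its right-hand side against the original input, so .channel can be read directly, while update-assignment (|=) rebinds . to each video and therefore needs the variable. The second form merges onto a new object, which should also list channel first as in the question's expected output. The variable names are illustrative.

```shell
json='{"channel":"mychannel","videos":[
  {"id":"10","url":"youtube.com"},
  {"id":"20","url":"youtube.com"}]}'

# Plain assignment: the right-hand side still sees the top-level object.
appended=$(printf '%s' "$json" | jq -c '.videos[].channel = .channel')
printf '%s\n' "$appended"

# Update-assignment: . is each video here, so the channel is captured in $ch first.
leading=$(printf '%s' "$json" | jq -c '.channel as $ch | .videos[] |= ({channel: $ch} + .)')
printf '%s\n' "$leading"
```

Both forms produce the same data; they differ only in the order in which the keys of each video are printed.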