Conditionally add a field in JSON transformation using JQ - json

I am trying to transform my JSON to different structure using JQ. I am able to achieve my new structure, however i am getting feilds with Null objects if they are not present in Source, my client wants to remove the fields if they are having null values..
As i iterate i am able to get the structure in new format. But additional structures are coming.
Code Snippet- https://jqplay.org/s/w2N_Ozg9Ag
JSON
{
"amazon": {
"activeitem": 2,
"createdDate": "2019-01-15T17:36:31.588Z",
"lastModifiedDate": "2019-01-15T17:36:31.588Z",
"user": "net",
"userType": "new",
"items": [
{
"id": 1,
"name": "harry potter",
"state": "sold",
"type": {
"branded": false,
"description": "artwork",
"contentLevel": "season"
}
},
{
"id": 2,
"name": "adidas shoes",
"state": null ,
"type": {
"branded": false,
"description": "Spprts",
"contentLevel": "season"
}
},
{
"id": 3,
"name": "watch",
"type": {
"branded": false,
"description": "walking",
"contentLevel": "special"
}
},
{
"id": 4,
"name": "adidas shoes",
"state": "in inventory",
"type": {
"branded": false,
"description": "running",
"contentLevel": "winter"
}
}
],
"product": {
"id": 4,
"name": "adidas shoes",
"source": "dealer",
"destination": "resident"
}
}
}
JQ Query:
.amazon | { userType: .userType, userName: .user, itemCatalog: (.items | map({ itemId: .id, name, state} )) }
Expected Response:
{
"userType": "new",
"userName": "net",
"itemCatalog": [
{
"itemId": 1,
"name": "harry potter",
"state": "sold"
},
{
"itemId": 2,
"name": "adidas shoes"
},
{
"itemId": 3,
"name": "watch"
},
{
"itemId": 4,
"name": "adidas shoes",
"state": "in inventory"
}
]
}
With the query i have, i am getting state : null for the entries which has empty or null values. I want to hide the field itself in these cases.

Horrible solution, delete them after the query. There must be a neater way? (I started using jq today)
.amazon | { userType: .userType, userName: .user, itemCatalog: (.items | map({ itemId: .id, name, state} )) }| del(.itemCatalog[].state |select(. == null))
https://jqplay.org/s/jlNYmJNi25

Related

Check if a key exists and return another key

I need help with jq syntax on how to return the Gitlab job ID if it contains an artifact. The JSON output looks like this (removed a lot of unrelated info from it and added [...]):
[{
"id": 3219589880,
"status": "success",
"stage": "test",
"name": "job_with_no_artifact",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.119Z",
"started_at": "2022-10-24T18:21:25.986Z",
"finished_at": "2022-10-24T18:21:38.464Z",
"duration": 12.478682,
"queued_duration": 0.499786,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
"pipeline": {
"id": 123456789,
[...]
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts": [],
"runner": {
"id": 12270859,
[...]
},
"artifacts_expire_at": null,
"tag_list": []
}, {
"id": 3219589878,
"status": "success",
"stage": "test",
"name": "create_artifact_job_2",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.111Z",
"started_at": "2022-10-24T18:21:25.922Z",
"finished_at": "2022-10-24T18:21:39.090Z",
"duration": 13.168405,
"queued_duration": 0.464364,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
},
"pipeline": {
"id": 675641982,
[...],
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts_file": {
"filename": "artifacts.zip",
"size": 223
},
"artifacts": [{
"file_type": "archive",
"size": 223,
"filename": "artifacts.zip",
"file_format": "zip"
}, {
"file_type": "metadata",
"size": 153,
"filename": "metadata.gz",
"file_format": "gzip"
}],
"runner": {
"id": 12270845,
[...]
},
"artifacts_expire_at": "2022-10-25T18:21:35.859Z",
"tag_list": []
}, {
"id": 3219589876,
"status": "success",
"stage": "test",
"name": "create_artifact_job_1",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.103Z",
"started_at": "2022-10-24T18:21:25.503Z",
"finished_at": "2022-10-24T18:21:41.407Z",
"duration": 15.904028,
"queued_duration": 0.098837,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
},
"pipeline": {
"id": 123456789,
[...]
},
"web_url": "WEB_URL",
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts_file": {
"filename": "artifacts.zip",
"size": 217
},
"artifacts": [{
"file_type": "archive",
"size": 217,
"filename": "artifacts.zip",
"file_format": "zip"
}, {
"file_type": "metadata",
"size": 152,
"filename": "metadata.gz",
"file_format": "gzip"
}],
"runner": {
"id": 12270857,
},
"artifacts_expire_at": "2022-10-25T18:21:37.808Z",
"tag_list": []
}]
I've been trying to do either of the following using jQ:
Either:
Check if artifacts_file key exists in each iteration and if it does return the (job) id (so .[].id)
Check if artifacts array is empty in each iteration and if it is empty return the (job) id.
In both cases I'm able to do the first part but I am not sure how to return the .id key.
Related stackoverflow questions that I've been trying to utilize and adapt to my case:
jq - return array value if its length is not null
How to check for presence of 'key' in jq before iterating over the values
What I have so far: jq '[.[].artifacts[]|select(length > 0)] | .[]' which returns all the artifacts found (but it doesn't contain the .id of the job).
Checking the existence of a field using has:
.[] | select(has("artifacts_file")).id
3219589878
3219589876
Demo
Checking if a field is an empty array by comparing it to []:
.[] | select(.artifacts == []).id
3219589880
Demo

Cannot get jq to query json object [duplicate]

This question already has answers here:
How to use jq when the variable has reserved characters?
(3 answers)
Closed 6 months ago.
I have a JSON file that I am trying to query with jq. I am unable to retrieve the observations. I am trying to retieve each of the "observations using the following command and not able to get to the result:
cat sample3.json | jq .dataSets[0].series.0:0:0:0:0.observations.0[0]
I am able to retieve up to the series using:
cat sample3.json | jq .dataSets[0].series
But once I try to drill down further I am getting a compile error:
$ cat sample3.json | jq .dataSets[0].series.0:0:0:0:0
jq: error: syntax error, unexpected LITERAL, expecting end of file (Unix shell quoting issues?) at <top-level>, line 1:
.dataSets[0].series.0:0:0:0:0
jq: 1 compile error
I am not sure what I am doing wrong here....
The input file is:
{
"header": {
"id": "b8be2cd5-33bf-4687-9e81-eb032f6f8a71",
"test": false,
"prepared": "2022-09-01T13:30:57.013+02:00",
"sender": {
"id": "ECB"
}
},
"dataSets": [
{
"action": "Replace",
"validFrom": "2022-09-01T13:30:57.013+02:00",
"series": {
"0:0:0:0:0": {
"attributes": [
0,
null,
0,
null,
null,
null,
null,
null,
null,
null,
null,
null,
0,
null,
0,
null,
0,
0,
0,
0
],
"observations": {
"0": [
1.4529,
0,
0,
null,
null
],
"1": [
1.4472,
0,
0,
null,
null
],
"2": [
1.4591,
0,
0,
null,
null
]
}
}
}
}
],
"structure": {
"links": [
{
"title": "Exchange Rates",
"rel": "dataflow",
"href": "https://sdw-wsrest.ecb.europa.eu:443/service/dataflow/ECB/EXR/1.0"
}
],
"name": "Exchange Rates",
"dimensions": {
"series": [
{
"id": "FREQ",
"name": "Frequency",
"values": [
{
"id": "D",
"name": "Daily"
}
]
},
{
"id": "CURRENCY",
"name": "Currency",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "CURRENCY_DENOM",
"name": "Currency denominator",
"values": [
{
"id": "EUR",
"name": "Euro"
}
]
},
{
"id": "EXR_TYPE",
"name": "Exchange rate type",
"values": [
{
"id": "SP00",
"name": "Spot"
}
]
},
{
"id": "EXR_SUFFIX",
"name": "Series variation - EXR context",
"values": [
{
"id": "A",
"name": "Average"
}
]
}
],
"observation": [
{
"id": "TIME_PERIOD",
"name": "Time period or range",
"role": "time",
"values": [
{
"id": "2022-08-29",
"name": "2022-08-29",
"start": "2022-08-29T00:00:00.000+02:00",
"end": "2022-08-29T23:59:59.999+02:00"
},
{
"id": "2022-08-30",
"name": "2022-08-30",
"start": "2022-08-30T00:00:00.000+02:00",
"end": "2022-08-30T23:59:59.999+02:00"
},
{
"id": "2022-08-31",
"name": "2022-08-31",
"start": "2022-08-31T00:00:00.000+02:00",
"end": "2022-08-31T23:59:59.999+02:00"
}
]
}
]
},
"attributes": {
"series": [
{
"id": "TIME_FORMAT",
"name": "Time format code",
"values": [
{
"name": "P1D"
}
]
},
{
"id": "BREAKS",
"name": "Breaks",
"values": []
},
{
"id": "COLLECTION",
"name": "Collection indicator",
"values": [
{
"id": "A",
"name": "Average of observations through period"
}
]
},
{
"id": "COMPILING_ORG",
"name": "Compiling organisation",
"values": []
},
{
"id": "DISS_ORG",
"name": "Data dissemination organisation",
"values": []
},
{
"id": "DOM_SER_IDS",
"name": "Domestic series ids",
"values": []
},
{
"id": "PUBL_ECB",
"name": "Source publication (ECB only)",
"values": []
},
{
"id": "PUBL_MU",
"name": "Source publication (Euro area only)",
"values": []
},
{
"id": "PUBL_PUBLIC",
"name": "Source publication (public)",
"values": []
},
{
"id": "UNIT_INDEX_BASE",
"name": "Unit index base",
"values": []
},
{
"id": "COMPILATION",
"name": "Compilation",
"values": []
},
{
"id": "COVERAGE",
"name": "Coverage",
"values": []
},
{
"id": "DECIMALS",
"name": "Decimals",
"values": [
{
"id": "4",
"name": "Four"
}
]
},
{
"id": "NAT_TITLE",
"name": "National language title",
"values": []
},
{
"id": "SOURCE_AGENCY",
"name": "Source agency",
"values": [
{
"id": "4F0",
"name": "European Central Bank (ECB)"
}
]
},
{
"id": "SOURCE_PUB",
"name": "Publication source",
"values": []
},
{
"id": "TITLE",
"name": "Title",
"values": [
{
"name": "Australian dollar/Euro"
}
]
},
{
"id": "TITLE_COMPL",
"name": "Title complement",
"values": [
{
"name": "ECB reference exchange rate, Australian dollar/Euro, 2:15 pm (C.E.T.)"
}
]
},
{
"id": "UNIT",
"name": "Unit",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "UNIT_MULT",
"name": "Unit multiplier",
"values": [
{
"id": "0",
"name": "Units"
}
]
}
],
"observation": [
{
"id": "OBS_STATUS",
"name": "Observation status",
"values": [
{
"id": "A",
"name": "Normal value"
}
]
},
{
"id": "OBS_CONF",
"name": "Observation confidentiality",
"values": [
{
"id": "F",
"name": "Free"
}
]
},
{
"id": "OBS_PRE_BREAK",
"name": "Pre-break observation value",
"values": []
},
{
"id": "OBS_COM",
"name": "Observation comment",
"values": []
}
]
}
}
}
The .foo syntax cannot be used if the key name has anything but alphanumeric characters or the underscore, or if the first character of the key name is numeric.
Assuming you are using a recent version of jq,
you can always use the form: ."foo", which is actually an abbreviation of the basic form, .["foo"].
So assuming you're using a sufficiently recent version of jq, your query could begin with:
.dataSets[0].series."0:0:0:0:0"
If you are presenting the jq query on a command line, then you may have to escape the double-quotes appropriately, e.g. in a bash shell, by enclosing the jq query in single-quotes.

Nested json - store values in csv

I am trying to convert a nested json file into csv. It's data from a darts API and the structure is always the same. Nevertheless I got some problems flattening and storing the values in a csv because of the nested structure.
json:
{
"summaries": [{
"sport_event": {
"id": "sr:sport_event:12967512",
"start_time": "2017-11-11T13:15:00+00:00",
"start_time_confirmed": true,
"sport_event_context": {
"sport": {
"id": "sr:sport:22",
"name": "Darts"
},
"category": {
"id": "sr:category:104",
"name": "International"
},
"competition": {
"id": "sr:competition:597",
"name": "Grand Slam of Darts"
},
"season": {
"id": "sr:season:47332",
"name": "Grand Slam of Darts 2017",
"start_date": "2017-11-11",
"end_date": "2017-11-20",
"year": "2017",
"competition_id": "sr:competition:597"
},
"stage": {
"order": 1,
"type": "league",
"phase": "stage_1",
"start_date": "2017-11-11",
"end_date": "2017-11-15",
"year": "2017"
},
"round": {
"number": 1
},
"groups": [{
"id": "sr:league:29766",
"name": "Grand Slam of Darts 2017, Group G",
"group_name": "G"
}]
},
"coverage": {
"live": true
},
"competitors": [{
"id": "sr:competitor:35936",
"name": "Smith, Michael",
"abbreviation": "SMI",
"qualifier": "home"
}, {
"id": "sr:competitor:83895",
"name": "Wilson, James",
"abbreviation": "WIL",
"qualifier": "away"
}]
},
"sport_event_status": {
"status": "closed",
"match_status": "ended",
"home_score": 5,
"away_score": 3,
"winner_id": "sr:competitor:35936"
}
}, {
"sport_event": {
"id": "sr:sport_event:12967508",
"start_time": "2017-11-11T13:40:00+00:00",
"start_time_confirmed": true,
"sport_event_context": {
"sport": {
"id": "sr:sport:22",
"name": "Darts"
},
"category": {
"id": "sr:category:104",
"name": "International"
},
"competition": {
"id": "sr:competition:597",
"name": "Grand Slam of Darts"
},
"season": {
"id": "sr:season:47332",
"name": "Grand Slam of Darts 2017",
"start_date": "2017-11-11",
"end_date": "2017-11-20",
"year": "2017",
"competition_id": "sr:competition:597"
},
"stage": {
"order": 1,
"type": "league",
"phase": "stage_1",
"start_date": "2017-11-11",
"end_date": "2017-11-15",
"year": "2017"
},
"round": {
"number": 1
},
"groups": [{
"id": "sr:league:29764",
"name": "Grand Slam of Darts 2017, Group F",
"group_name": "F"
}]
},
"coverage": {
"live": true
},
"competitors": [{
"id": "sr:competitor:70916",
"name": "Bunting, Stephen",
"abbreviation": "BUN",
"qualifier": "home"
}, {
"id": "sr:competitor:191262",
"name": "de Zwaan, Jeffrey",
"abbreviation": "DEZ",
"qualifier": "away"
}]
},
"sport_event_status": {
"status": "closed",
"match_status": "ended",
"home_score": 5,
"away_score": 4,
"winner_id": "sr:competitor:70916"
}
}
So for each sport_event I would like to store the variables:
"start_time"
from "season" the variable "name"
from "competitors" both "id" and "name"
from "sport_event_status" the "winner_id"
I have already tried to flatten the json file with this code:
import json
f = open(r'path of file.json')
data = json.load(f)
def flatten(data):
for key,value in data.items():
print (str(key)+'->'+str(value))
if type(value) == type(dict()):
flatten(value)
elif type(value) == type(list()):
for val in value:
if type(val) == type(str()):
pass
elif type(val) == type(list()):
pass
else:
flatten(val)
flatten(data)
print(data)
This actually prints out the following:
id->sr:season:47332
name->Grand Slam of Darts 2017
start_date->2017-11-11
end_date->2017-11-20
year->2017
competition_id->sr:competition:597
Now my question is how to store the values I mentioned above in a csv file.
Thanks in advance for your support.
Using jq, you basically just have to transcribe your specification, adding a bit of context and taking care of an embedded array:
.summaries[]
| .sport_event # Your specification:
| [.start_time, # start_time
.sport_event_context.season.name] # from "season" the variable "name"
+ [.competitors[] | .id, .name] # from "competitors" both "id" and "name"
+ [.sport_event_status.winner_id] # from "sport_event_status" the "winner_id"
| #csv
Invocation
E.g.
jq -rf program.jq my.json

Merge objects in same array on single key

I have an array of JSON objects formatted as follows:
[
{
"id": 1,
"names": [
{
"name": "Bulbasaur",
"language": {
"name": "en",
"url": "http://myserver.com:8000/api/v2/language/9/"
}
},
],
},
{
"id": 1,
"types": [
{
"slot": 1,
"type": {
"name": "grass",
"url": "http://myserver.com:8000/api/v2/type/12/"
}
},
{
"slot": 2,
"type": {
"name": "poison",
"url": "http://myserver.com:8000/api/v2/type/4/"
}
}
]
},
{
"id": 2,
"names": [
{
"name": "Ivysaur",
"language": {
"name": "en",
"url": "http://myserver.com:8000/api/v2/language/9/"
}
},
],
},
{
"id": 2,
"types": [
{
"slot": 1,
"type": {
"name": "ice",
"url": "http://myserver.com:8000/api/v2/type/10/"
}
},
{
"slot": 2,
"type": {
"name": "electric",
"url": "http://myserver.com:8000/api/v2/type/8/"
}
}
]
},
{
"id": 3,
"names": [
{
"name": "Venusaur",
"language": {
"name": "en",
"url": "http://myserver.com:8000/api/v2/language/9/"
}
},
],
},
{
"id": 3,
"types": [
{
"slot": 1,
"type": {
"name": "ground",
"url": "http://myserver.com:8000/api/v2/type/2/"
}
},
{
"slot": 2,
"type": {
"name": "rock",
"url": "http://myserver.com:8000/api/v2/type/3/"
}
}
]
}
]
Note that these are pairs of separate objects that appear sequentially in a JSON array, with each pair sharing an id field. This pattern repeats several hundred times in the array. What I need to accomplish is to "merge" each id-sharing pair into one object. So, the resultant output would be
[
{
"id": 1,
"names": [
{
"name": "Bulbasaur",
"language": {
"name": "en",
"url": "http://myserver.com:8000/api/v2/language/9/"
}
},
],
"types": [
{
"slot": 1,
"type": {
"name": "grass",
"url": "http://myserver.com:8000/api/v2/type/12/"
}
},
{
"slot": 2,
"type": {
"name": "poison",
"url": "http://myserver.com:8000/api/v2/type/4/"
}
}
]
},
{
"id": 2,
"names": [
{
"name": "Ivysaur",
"language": {
"name": "en",
"url": "http://myserver.com:8000/api/v2/language/9/"
}
},
],
"types": [
{
"slot": 1,
"type": {
"name": "ice",
"url": "http://myserver.com:8000/api/v2/type/10/"
}
},
{
"slot": 2,
"type": {
"name": "electric",
"url": "http://myserver.com:8000/api/v2/type/8/"
}
}
]
},
{
"id": 3,
"names": [
{
"name": "Venusaur",
"language": {
"name": "en",
"url": "http://myserver.com:8000/api/v2/language/9/"
}
},
],
"types": [
{
"slot": 1,
"type": {
"name": "ground",
"url": "http://myserver.com:8000/api/v2/type/2/"
}
},
{
"slot": 2,
"type": {
"name": "rock",
"url": "http://myserver.com:8000/api/v2/type/3/"
}
}
]
}
]
I've gotten these objects to appear next to each other via the group_by(.id) command, but I'm at a loss as to how I should actually combine them. I'm very much still a novice with jq so I'm a bit overwhelmed with the amount of possible solutions.
[Note: The following assumes that the data shown in the Q have been corrected so that they are valid JSON.]
The merging you want can be achieved by object addition (x + y). For example, given the two JSON objects as shown in the question (i.e., as a stream), you could write:
jq -s '.[0] + .[1]'
However, since the question also indicates these objects are actually in an array, let's next consider the case of an array with two objects. In that case, you could simply write:
jq add
Finally, if you have an array of arrays each of which is an array of objects, you could use map(add). Since you don't have a very large array, you could simply write:
group_by(.id) | map(add)
Please note that jq defines object addition in a non-commutative way. Specifically, there is a bias towards the right-most key.

JSON Transformation using JQ

I am trying to transform my JSON to different structure using JQ. I am able to partially achieve my new structure, however i get a additional blocks of data.
As i iterate i am able to get the structure in new format. But additional structures are coming.
Code Snippet- https://jqplay.org/s/3gulSlZiWz
JSON
{
"amazon": {
"activeitem": 2,
"createdDate": "2019-01-15T17:36:31.588Z",
"lastModifiedDate": "2019-01-15T17:36:31.588Z",
"user": "net",
"userType": "new",
"items": [
{
"id": 1,
"name": "harry potter",
"state": "sold",
"type": {
"branded": false,
"description": "artwork",
"contentLevel": "season"
}
},
{
"id": 2,
"name": "adidas shoes",
"state": "in inventory",
"type": {
"branded": false,
"description": "Spprts",
"contentLevel": "season"
}
},
{
"id": 3,
"name": "watch",
"state": "returned",
"type": {
"branded": false,
"description": "walking",
"contentLevel": "special"
}
},
{
"id": 4,
"name": "adidas shoes",
"state": "in inventory",
"type": {
"branded": false,
"description": "running",
"contentLevel": "winter"
}
}
],
"product": {
"id": 4,
"name": "adidas shoes",
"source": "dealer",
"destination": "resident"
}
}
}
JQ Query:
.amazon|
{
userType: .userType,
userName: .user,
itemCatalog: {
itemId: .items[].id,
name: .items[].name
}
}
Expected Response:
{
"userType": "new",
"userName": "net",
"itemCatalog": {
"itemId": 1,
"name": "harry potter"
},{
"itemId": 2,
"name": "adidas shoes"
}, {
"itemId": 3,
"name": "watch"
},{
"itemId": 4,
"name": "adidas shoes"
}
}
But getting something weird long duplicated response.
As already pointed out in a comment, the "expected response" is not JSON and probably not what you want anyway. The following would make sense and in any case illustrates how to iterate appropriately:
.amazon
| { userType: .userType,
userName: .user,
itemCatalog: (.items | map({ itemId: .id, name} ))
}
Output
{
"userType": "new",
"userName": "net",
"itemCatalog": [
{
"itemId": 1,
"name": "harry potter"
},
{
"itemId": 2,
"name": "adidas shoes"
},
{
"itemId": 3,
"name": "watch"
},
{
"itemId": 4,
"name": "adidas shoes"
}
]
}