Related
I need help with jq syntax on how to return the Gitlab job ID if it contains an artifact. The JSON output looks like this (removed a lot of unrelated info from it and added [...]):
[{
"id": 3219589880,
"status": "success",
"stage": "test",
"name": "job_with_no_artifact",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.119Z",
"started_at": "2022-10-24T18:21:25.986Z",
"finished_at": "2022-10-24T18:21:38.464Z",
"duration": 12.478682,
"queued_duration": 0.499786,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
"pipeline": {
"id": 123456789,
[...]
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts": [],
"runner": {
"id": 12270859,
[...]
},
"artifacts_expire_at": null,
"tag_list": []
}, {
"id": 3219589878,
"status": "success",
"stage": "test",
"name": "create_artifact_job_2",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.111Z",
"started_at": "2022-10-24T18:21:25.922Z",
"finished_at": "2022-10-24T18:21:39.090Z",
"duration": 13.168405,
"queued_duration": 0.464364,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
},
"pipeline": {
"id": 675641982,
[...],
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts_file": {
"filename": "artifacts.zip",
"size": 223
},
"artifacts": [{
"file_type": "archive",
"size": 223,
"filename": "artifacts.zip",
"file_format": "zip"
}, {
"file_type": "metadata",
"size": 153,
"filename": "metadata.gz",
"file_format": "gzip"
}],
"runner": {
"id": 12270845,
[...]
},
"artifacts_expire_at": "2022-10-25T18:21:35.859Z",
"tag_list": []
}, {
"id": 3219589876,
"status": "success",
"stage": "test",
"name": "create_artifact_job_1",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.103Z",
"started_at": "2022-10-24T18:21:25.503Z",
"finished_at": "2022-10-24T18:21:41.407Z",
"duration": 15.904028,
"queued_duration": 0.098837,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
},
"pipeline": {
"id": 123456789,
[...]
},
"web_url": "WEB_URL",
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts_file": {
"filename": "artifacts.zip",
"size": 217
},
"artifacts": [{
"file_type": "archive",
"size": 217,
"filename": "artifacts.zip",
"file_format": "zip"
}, {
"file_type": "metadata",
"size": 152,
"filename": "metadata.gz",
"file_format": "gzip"
}],
"runner": {
"id": 12270857,
},
"artifacts_expire_at": "2022-10-25T18:21:37.808Z",
"tag_list": []
}]
I've been trying to do either of the following using jQ:
Either:
Check if artifacts_file key exists in each iteration and if it does return the (job) id (so .[].id)
Check if artifacts array is empty in each iteration and if it is empty return the (job) id.
In both cases I'm able to do the first part but I am not sure how to return the .id key.
Related stackoverflow questions that I've been trying to utilize and adapt to my case:
jq - return array value if its length is not null
How to check for presence of 'key' in jq before iterating over the values
What I have so far: jq '[.[].artifacts[]|select(length > 0)] | .[]' which returns all the artifacts found (but it doesn't contain the .id of the job).
Checking the existence of a field using has:
.[] | select(has("artifacts_file")).id
3219589878
3219589876
Demo
Checking if a field is an empty array by comparing it to []:
.[] | select(.artifacts == []).id
3219589880
Demo
I have the following data frame, df1:
A B C
123 B1 C1
456 B2 C2
And data frame df2:
A
[
{
"id": "123",
"details": {
"id": "123",
"color": null,
"param_1": {
"name": "mike"
},
"location": "US",
"items": [
{
"item_1": "#227858",
"offer_id": null,
"item_details": {
"detials_1": [{ "notes": "other:", "quantity": 1 }]
}
}
],
"version": 1,
}
}
]
[
{
"id": "456",
"details": {
"id": "456",
"color": null,
"param_1": {
"name": "james"
},
"location": "KR",
"items": [
{
"item_1": "#2221",
"offer_id": null,
"item_details": {
"detials_1": [{ "notes": "other", "quantity": 1 }]
}
}
],
"version": 2,
}
}
]
I want to find all values in df1[A] inside the JSON found inside df2[A] under the first instance of the id parameter. Once found, I want to replace the NULL values inside the color parameter with the df1[B] and offer_id with df1[C].
The output should create a new column with the appended values:
df2[B]:
[
{
"id": "123",
"details": {
"id": "123",
"color": B1,
"param_1": {
"name": "mike"
},
"location": "US",
"items": [
{
"item_1": "#227858",
"offer_id": C1,
"item_details": {
"detials_1": [{ "notes": "other:", "quantity": 1 }]
}
}
],
"version": 1,
}
}
]
[
{
"id": "456",
"details": {
"id": "456",
"color": B2,
"param_1": {
"name": "james"
},
"location": "KR",
"items": [
{
"item_1": "#2221",
"offer_id": C2,
"item_details": {
"detials_1": [{ "notes": "other", "quantity": 1 }]
}
}
],
"version": 2,
}
}
]
I just started researching how to approach this, but I need guidance on the most efficient way. Any insight would be greatly appreciated.
This question already has answers here:
How to use jq when the variable has reserved characters?
(3 answers)
Closed 6 months ago.
I have a JSON file that I am trying to query with jq. I am unable to retrieve the observations. I am trying to retieve each of the "observations using the following command and not able to get to the result:
cat sample3.json | jq .dataSets[0].series.0:0:0:0:0.observations.0[0]
I am able to retieve up to the series using:
cat sample3.json | jq .dataSets[0].series
But once I try to drill down further I am getting a compile error:
$ cat sample3.json | jq .dataSets[0].series.0:0:0:0:0
jq: error: syntax error, unexpected LITERAL, expecting end of file (Unix shell quoting issues?) at <top-level>, line 1:
.dataSets[0].series.0:0:0:0:0
jq: 1 compile error
I am not sure what I am doing wrong here....
The input file is:
{
"header": {
"id": "b8be2cd5-33bf-4687-9e81-eb032f6f8a71",
"test": false,
"prepared": "2022-09-01T13:30:57.013+02:00",
"sender": {
"id": "ECB"
}
},
"dataSets": [
{
"action": "Replace",
"validFrom": "2022-09-01T13:30:57.013+02:00",
"series": {
"0:0:0:0:0": {
"attributes": [
0,
null,
0,
null,
null,
null,
null,
null,
null,
null,
null,
null,
0,
null,
0,
null,
0,
0,
0,
0
],
"observations": {
"0": [
1.4529,
0,
0,
null,
null
],
"1": [
1.4472,
0,
0,
null,
null
],
"2": [
1.4591,
0,
0,
null,
null
]
}
}
}
}
],
"structure": {
"links": [
{
"title": "Exchange Rates",
"rel": "dataflow",
"href": "https://sdw-wsrest.ecb.europa.eu:443/service/dataflow/ECB/EXR/1.0"
}
],
"name": "Exchange Rates",
"dimensions": {
"series": [
{
"id": "FREQ",
"name": "Frequency",
"values": [
{
"id": "D",
"name": "Daily"
}
]
},
{
"id": "CURRENCY",
"name": "Currency",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "CURRENCY_DENOM",
"name": "Currency denominator",
"values": [
{
"id": "EUR",
"name": "Euro"
}
]
},
{
"id": "EXR_TYPE",
"name": "Exchange rate type",
"values": [
{
"id": "SP00",
"name": "Spot"
}
]
},
{
"id": "EXR_SUFFIX",
"name": "Series variation - EXR context",
"values": [
{
"id": "A",
"name": "Average"
}
]
}
],
"observation": [
{
"id": "TIME_PERIOD",
"name": "Time period or range",
"role": "time",
"values": [
{
"id": "2022-08-29",
"name": "2022-08-29",
"start": "2022-08-29T00:00:00.000+02:00",
"end": "2022-08-29T23:59:59.999+02:00"
},
{
"id": "2022-08-30",
"name": "2022-08-30",
"start": "2022-08-30T00:00:00.000+02:00",
"end": "2022-08-30T23:59:59.999+02:00"
},
{
"id": "2022-08-31",
"name": "2022-08-31",
"start": "2022-08-31T00:00:00.000+02:00",
"end": "2022-08-31T23:59:59.999+02:00"
}
]
}
]
},
"attributes": {
"series": [
{
"id": "TIME_FORMAT",
"name": "Time format code",
"values": [
{
"name": "P1D"
}
]
},
{
"id": "BREAKS",
"name": "Breaks",
"values": []
},
{
"id": "COLLECTION",
"name": "Collection indicator",
"values": [
{
"id": "A",
"name": "Average of observations through period"
}
]
},
{
"id": "COMPILING_ORG",
"name": "Compiling organisation",
"values": []
},
{
"id": "DISS_ORG",
"name": "Data dissemination organisation",
"values": []
},
{
"id": "DOM_SER_IDS",
"name": "Domestic series ids",
"values": []
},
{
"id": "PUBL_ECB",
"name": "Source publication (ECB only)",
"values": []
},
{
"id": "PUBL_MU",
"name": "Source publication (Euro area only)",
"values": []
},
{
"id": "PUBL_PUBLIC",
"name": "Source publication (public)",
"values": []
},
{
"id": "UNIT_INDEX_BASE",
"name": "Unit index base",
"values": []
},
{
"id": "COMPILATION",
"name": "Compilation",
"values": []
},
{
"id": "COVERAGE",
"name": "Coverage",
"values": []
},
{
"id": "DECIMALS",
"name": "Decimals",
"values": [
{
"id": "4",
"name": "Four"
}
]
},
{
"id": "NAT_TITLE",
"name": "National language title",
"values": []
},
{
"id": "SOURCE_AGENCY",
"name": "Source agency",
"values": [
{
"id": "4F0",
"name": "European Central Bank (ECB)"
}
]
},
{
"id": "SOURCE_PUB",
"name": "Publication source",
"values": []
},
{
"id": "TITLE",
"name": "Title",
"values": [
{
"name": "Australian dollar/Euro"
}
]
},
{
"id": "TITLE_COMPL",
"name": "Title complement",
"values": [
{
"name": "ECB reference exchange rate, Australian dollar/Euro, 2:15 pm (C.E.T.)"
}
]
},
{
"id": "UNIT",
"name": "Unit",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "UNIT_MULT",
"name": "Unit multiplier",
"values": [
{
"id": "0",
"name": "Units"
}
]
}
],
"observation": [
{
"id": "OBS_STATUS",
"name": "Observation status",
"values": [
{
"id": "A",
"name": "Normal value"
}
]
},
{
"id": "OBS_CONF",
"name": "Observation confidentiality",
"values": [
{
"id": "F",
"name": "Free"
}
]
},
{
"id": "OBS_PRE_BREAK",
"name": "Pre-break observation value",
"values": []
},
{
"id": "OBS_COM",
"name": "Observation comment",
"values": []
}
]
}
}
}
The .foo syntax cannot be used if the key name has anything but alphanumeric characters or the underscore, or if the first character of the key name is numeric.
Assuming you are using a recent version of jq,
you can always use the form: ."foo", which is actually an abbreviation of the basic form, .["foo"].
So assuming you're using a sufficiently recent version of jq, your query could begin with:
.dataSets[0].series."0:0:0:0:0"
If you are presenting the jq query on a command line, then you may have to escape the double-quotes appropriately, e.g. in a bash shell, by enclosing the jq query in single-quotes.
I am trying to convert a nested json file into csv. It's data from a darts API and the structure is always the same. Nevertheless I got some problems flattening and storing the values in a csv because of the nested structure.
json:
{
"summaries": [{
"sport_event": {
"id": "sr:sport_event:12967512",
"start_time": "2017-11-11T13:15:00+00:00",
"start_time_confirmed": true,
"sport_event_context": {
"sport": {
"id": "sr:sport:22",
"name": "Darts"
},
"category": {
"id": "sr:category:104",
"name": "International"
},
"competition": {
"id": "sr:competition:597",
"name": "Grand Slam of Darts"
},
"season": {
"id": "sr:season:47332",
"name": "Grand Slam of Darts 2017",
"start_date": "2017-11-11",
"end_date": "2017-11-20",
"year": "2017",
"competition_id": "sr:competition:597"
},
"stage": {
"order": 1,
"type": "league",
"phase": "stage_1",
"start_date": "2017-11-11",
"end_date": "2017-11-15",
"year": "2017"
},
"round": {
"number": 1
},
"groups": [{
"id": "sr:league:29766",
"name": "Grand Slam of Darts 2017, Group G",
"group_name": "G"
}]
},
"coverage": {
"live": true
},
"competitors": [{
"id": "sr:competitor:35936",
"name": "Smith, Michael",
"abbreviation": "SMI",
"qualifier": "home"
}, {
"id": "sr:competitor:83895",
"name": "Wilson, James",
"abbreviation": "WIL",
"qualifier": "away"
}]
},
"sport_event_status": {
"status": "closed",
"match_status": "ended",
"home_score": 5,
"away_score": 3,
"winner_id": "sr:competitor:35936"
}
}, {
"sport_event": {
"id": "sr:sport_event:12967508",
"start_time": "2017-11-11T13:40:00+00:00",
"start_time_confirmed": true,
"sport_event_context": {
"sport": {
"id": "sr:sport:22",
"name": "Darts"
},
"category": {
"id": "sr:category:104",
"name": "International"
},
"competition": {
"id": "sr:competition:597",
"name": "Grand Slam of Darts"
},
"season": {
"id": "sr:season:47332",
"name": "Grand Slam of Darts 2017",
"start_date": "2017-11-11",
"end_date": "2017-11-20",
"year": "2017",
"competition_id": "sr:competition:597"
},
"stage": {
"order": 1,
"type": "league",
"phase": "stage_1",
"start_date": "2017-11-11",
"end_date": "2017-11-15",
"year": "2017"
},
"round": {
"number": 1
},
"groups": [{
"id": "sr:league:29764",
"name": "Grand Slam of Darts 2017, Group F",
"group_name": "F"
}]
},
"coverage": {
"live": true
},
"competitors": [{
"id": "sr:competitor:70916",
"name": "Bunting, Stephen",
"abbreviation": "BUN",
"qualifier": "home"
}, {
"id": "sr:competitor:191262",
"name": "de Zwaan, Jeffrey",
"abbreviation": "DEZ",
"qualifier": "away"
}]
},
"sport_event_status": {
"status": "closed",
"match_status": "ended",
"home_score": 5,
"away_score": 4,
"winner_id": "sr:competitor:70916"
}
}
So for each sport_event I would like to store the variables:
"start_time"
from "season" the variable "name"
from "competitors" both "id" and "name"
from "sport_event_status" the "winner_id"
I have already tried to flatten the json file with this code:
import json
f = open(r'path of file.json')
data = json.load(f)
def flatten(data):
for key,value in data.items():
print (str(key)+'->'+str(value))
if type(value) == type(dict()):
flatten(value)
elif type(value) == type(list()):
for val in value:
if type(val) == type(str()):
pass
elif type(val) == type(list()):
pass
else:
flatten(val)
flatten(data)
print(data)
This actually prints out the following:
id->sr:season:47332
name->Grand Slam of Darts 2017
start_date->2017-11-11
end_date->2017-11-20
year->2017
competition_id->sr:competition:597
Now my question is how to store the values I mentioned above in a csv file.
Thanks in advance for your support.
Using jq, you basically just have to transcribe your specification, adding a bit of context and taking care of an embedded array:
.summaries[]
| .sport_event # Your specification:
| [.start_time, # start_time
.sport_event_context.season.name] # from "season" the variable "name"
+ [.competitors[] | .id, .name] # from "competitors" both "id" and "name"
+ [.sport_event_status.winner_id] # from "sport_event_status" the "winner_id"
| #csv
Invocation
E.g.
jq -rf program.jq my.json
My original JSON is given below.
[
{
"id": "1",
"name": "AA_1",
"total": "100002",
"files": [
{
"filename": "8665b987ab48511eda9e458046fbc42e.csv",
"filename_original": "some.csv",
"status": "3",
"total": "100002",
"time": "2020-08-24 23:25:49"
}
],
"status": "3",
"created": "2020-08-24 23:25:49",
"filenames": "8665b987ab48511eda9e458046fbc42e.csv",
"is_append": "0",
"is_deleted": "0",
"comment": null
},
{
"id": "4",
"name": "AA_2",
"total": "43806503",
"files": [
{
"filename": "1b4812fe634938928953dd40db1f70b2.csv",
"filename_original": "other.csv",
"status": "3",
"total": "21903252",
"time": "2020-08-24 23:33:43"
},
{
"filename": "63ab85fef2412ce80ae8bd018497d8bf.csv",
"filename_original": "some.csv",
"status": "2",
"total": 0,
"time": "2020-08-24 23:29:30"
}
],
"status": "2",
"created": "2020-08-24 23:35:51",
"filenames": "1b4812fe634938928953dd40db1f70b2.csv&&63ab85fef2412ce80ae8bd018497d8bf.csv",
"is_append": "0",
"is_deleted": "0",
"comment": null
}
]
From this JSON I want to create new objects by combining fields from objects which have status: 2 and their files which also have the same pair, status: 2.
So, I am expecting a JSON array as below.
[
{
"id": "4",
"name": "AA_2",
"file_filename": "63ab85fef2412ce80ae8bd018497d8bf.csv",
"file_status": 2
}
]
So far I tried with this JQ filter:
.[]|select(.status=="2")|[{id:.id,file_filename:.files[].filename,file_status:.files[].status}]
But this produces some invalid data.
[
{
"id": "4", # want to remove this as file.status != 2
"file_filename": "1b4812fe634938928953dd40db1f70b2.csv",
"file_status": "3"
},
{
"id": "4",
"file_filename": "1b4812fe634938928953dd40db1f70b2.csv",
"file_status": "2"
},
{
"id": "4", # Repeat
"file_filename": "63ab85fef2412ce80ae8bd018497d8bf.csv",
"file_status": "3"
},
{
"id": "4", # Repeat
"file_filename": "63ab85fef2412ce80ae8bd018497d8bf.csv",
"file_status": "2"
}
]
How do I filter the new JSON using JQ and remove these duplicate objects?
By applying [] operator to files twice, you're running into a combinatorial explosion. That needs to be avoided, for example:
[ .[] | select(.status == "2") | {id, name} + (.files[] | select(.status == "2") | {file_filename: .filename, file_status: .status}) ]
Online demo