This question already has answers here:
How to use jq when the variable has reserved characters?
(3 answers)
Closed 6 months ago.
I have a JSON file that I am trying to query with jq. I am unable to retrieve the observations. I am trying to retieve each of the "observations using the following command and not able to get to the result:
cat sample3.json | jq .dataSets[0].series.0:0:0:0:0.observations.0[0]
I am able to retieve up to the series using:
cat sample3.json | jq .dataSets[0].series
But once I try to drill down further I am getting a compile error:
$ cat sample3.json | jq .dataSets[0].series.0:0:0:0:0
jq: error: syntax error, unexpected LITERAL, expecting end of file (Unix shell quoting issues?) at <top-level>, line 1:
.dataSets[0].series.0:0:0:0:0
jq: 1 compile error
I am not sure what I am doing wrong here....
The input file is:
{
"header": {
"id": "b8be2cd5-33bf-4687-9e81-eb032f6f8a71",
"test": false,
"prepared": "2022-09-01T13:30:57.013+02:00",
"sender": {
"id": "ECB"
}
},
"dataSets": [
{
"action": "Replace",
"validFrom": "2022-09-01T13:30:57.013+02:00",
"series": {
"0:0:0:0:0": {
"attributes": [
0,
null,
0,
null,
null,
null,
null,
null,
null,
null,
null,
null,
0,
null,
0,
null,
0,
0,
0,
0
],
"observations": {
"0": [
1.4529,
0,
0,
null,
null
],
"1": [
1.4472,
0,
0,
null,
null
],
"2": [
1.4591,
0,
0,
null,
null
]
}
}
}
}
],
"structure": {
"links": [
{
"title": "Exchange Rates",
"rel": "dataflow",
"href": "https://sdw-wsrest.ecb.europa.eu:443/service/dataflow/ECB/EXR/1.0"
}
],
"name": "Exchange Rates",
"dimensions": {
"series": [
{
"id": "FREQ",
"name": "Frequency",
"values": [
{
"id": "D",
"name": "Daily"
}
]
},
{
"id": "CURRENCY",
"name": "Currency",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "CURRENCY_DENOM",
"name": "Currency denominator",
"values": [
{
"id": "EUR",
"name": "Euro"
}
]
},
{
"id": "EXR_TYPE",
"name": "Exchange rate type",
"values": [
{
"id": "SP00",
"name": "Spot"
}
]
},
{
"id": "EXR_SUFFIX",
"name": "Series variation - EXR context",
"values": [
{
"id": "A",
"name": "Average"
}
]
}
],
"observation": [
{
"id": "TIME_PERIOD",
"name": "Time period or range",
"role": "time",
"values": [
{
"id": "2022-08-29",
"name": "2022-08-29",
"start": "2022-08-29T00:00:00.000+02:00",
"end": "2022-08-29T23:59:59.999+02:00"
},
{
"id": "2022-08-30",
"name": "2022-08-30",
"start": "2022-08-30T00:00:00.000+02:00",
"end": "2022-08-30T23:59:59.999+02:00"
},
{
"id": "2022-08-31",
"name": "2022-08-31",
"start": "2022-08-31T00:00:00.000+02:00",
"end": "2022-08-31T23:59:59.999+02:00"
}
]
}
]
},
"attributes": {
"series": [
{
"id": "TIME_FORMAT",
"name": "Time format code",
"values": [
{
"name": "P1D"
}
]
},
{
"id": "BREAKS",
"name": "Breaks",
"values": []
},
{
"id": "COLLECTION",
"name": "Collection indicator",
"values": [
{
"id": "A",
"name": "Average of observations through period"
}
]
},
{
"id": "COMPILING_ORG",
"name": "Compiling organisation",
"values": []
},
{
"id": "DISS_ORG",
"name": "Data dissemination organisation",
"values": []
},
{
"id": "DOM_SER_IDS",
"name": "Domestic series ids",
"values": []
},
{
"id": "PUBL_ECB",
"name": "Source publication (ECB only)",
"values": []
},
{
"id": "PUBL_MU",
"name": "Source publication (Euro area only)",
"values": []
},
{
"id": "PUBL_PUBLIC",
"name": "Source publication (public)",
"values": []
},
{
"id": "UNIT_INDEX_BASE",
"name": "Unit index base",
"values": []
},
{
"id": "COMPILATION",
"name": "Compilation",
"values": []
},
{
"id": "COVERAGE",
"name": "Coverage",
"values": []
},
{
"id": "DECIMALS",
"name": "Decimals",
"values": [
{
"id": "4",
"name": "Four"
}
]
},
{
"id": "NAT_TITLE",
"name": "National language title",
"values": []
},
{
"id": "SOURCE_AGENCY",
"name": "Source agency",
"values": [
{
"id": "4F0",
"name": "European Central Bank (ECB)"
}
]
},
{
"id": "SOURCE_PUB",
"name": "Publication source",
"values": []
},
{
"id": "TITLE",
"name": "Title",
"values": [
{
"name": "Australian dollar/Euro"
}
]
},
{
"id": "TITLE_COMPL",
"name": "Title complement",
"values": [
{
"name": "ECB reference exchange rate, Australian dollar/Euro, 2:15 pm (C.E.T.)"
}
]
},
{
"id": "UNIT",
"name": "Unit",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "UNIT_MULT",
"name": "Unit multiplier",
"values": [
{
"id": "0",
"name": "Units"
}
]
}
],
"observation": [
{
"id": "OBS_STATUS",
"name": "Observation status",
"values": [
{
"id": "A",
"name": "Normal value"
}
]
},
{
"id": "OBS_CONF",
"name": "Observation confidentiality",
"values": [
{
"id": "F",
"name": "Free"
}
]
},
{
"id": "OBS_PRE_BREAK",
"name": "Pre-break observation value",
"values": []
},
{
"id": "OBS_COM",
"name": "Observation comment",
"values": []
}
]
}
}
}
The .foo syntax cannot be used if the key name has anything but alphanumeric characters or the underscore, or if the first character of the key name is numeric.
Assuming you are using a recent version of jq,
you can always use the form: ."foo", which is actually an abbreviation of the basic form, .["foo"].
So assuming you're using a sufficiently recent version of jq, your query could begin with:
.dataSets[0].series."0:0:0:0:0"
If you are presenting the jq query on a command line, then you may have to escape the double-quotes appropriately, e.g. in a bash shell, by enclosing the jq query in single-quotes.
Related
I have the following data frame, df1:
A B C
123 B1 C1
456 B2 C2
And data frame df2:
A
[
{
"id": "123",
"details": {
"id": "123",
"color": null,
"param_1": {
"name": "mike"
},
"location": "US",
"items": [
{
"item_1": "#227858",
"offer_id": null,
"item_details": {
"detials_1": [{ "notes": "other:", "quantity": 1 }]
}
}
],
"version": 1,
}
}
]
[
{
"id": "456",
"details": {
"id": "456",
"color": null,
"param_1": {
"name": "james"
},
"location": "KR",
"items": [
{
"item_1": "#2221",
"offer_id": null,
"item_details": {
"detials_1": [{ "notes": "other", "quantity": 1 }]
}
}
],
"version": 2,
}
}
]
I want to find all values in df1[A] inside the JSON found inside df2[A] under the first instance of the id parameter. Once found, I want to replace the NULL values inside the color parameter with the df1[B] and offer_id with df1[C].
The output should create a new column with the appended values:
df2[B]:
[
{
"id": "123",
"details": {
"id": "123",
"color": B1,
"param_1": {
"name": "mike"
},
"location": "US",
"items": [
{
"item_1": "#227858",
"offer_id": C1,
"item_details": {
"detials_1": [{ "notes": "other:", "quantity": 1 }]
}
}
],
"version": 1,
}
}
]
[
{
"id": "456",
"details": {
"id": "456",
"color": B2,
"param_1": {
"name": "james"
},
"location": "KR",
"items": [
{
"item_1": "#2221",
"offer_id": C2,
"item_details": {
"detials_1": [{ "notes": "other", "quantity": 1 }]
}
}
],
"version": 2,
}
}
]
I just started researching how to approach this, but I need guidance on the most efficient way. Any insight would be greatly appreciated.
I have a long list of json file . I want to extract the subdomain of harvard.edu which is in the variable host in "host": "ceonlineb2b.hms.harvard.edu using bash . I would be happy if anyone can help out .Below is only a snippet of json file.
{
"data": {
"total_items": 3,
"offset": 0,
"limit": 1,
"items": [
{
"name": "ceonlineb2b.hms.harvard.edu",
"alexa": null,
"cert_summary": null,
"dns_records": {
"A": [
"3.221.168.206",
"54.174.253.3"
],
"AAAA": null,
"CAA": null,
"CNAME": [
"hms-moodleb2b-prod.cabem.com"
],
"MX": null,
"NS": null,
"SOA": null,
"TXT": null,
"SPF": null,
"updated_at": "2021-05-14T23:12:43.332816923Z"
},
"hosts_enrichment": [
{
"ip": "3.221.168.206",
"as_num": 14618,
"as_org": "amazon-aes",
"isp": "amazon.com",
"city_name": "ashburn",
"country": "united states",
"country_iso_code": "us",
"location": {
"lat": 39.0481,
"lon": -77.4728
}
},
{
"ip": "54.174.253.3",
"as_num": 14618,
"as_org": "amazon-aes",
"isp": "amazon.com",
"city_name": "ashburn",
"country": "united states",
"country_iso_code": "us",
"location": {
"lat": 39.0481,
"lon": -77.4728
}
}
],
"http_extract": {
"cookies": [
{
"domain": "",
"expire": "0001-01-01T00:00:00Z",
"http_only": true,
"key": "MoodleSession",
"max_age": 0,
"path": "/",
"security": true,
"value": "tqhmqc4muk513sad1bmnl3kocj"
}
],
"description": "",
"emails": null,
"final_redirect_url": {
"full_uri": "https://ceonlineb2b.hms.harvard.edu/login/index.php",
"host": "ceonlineb2b.hms.harvard.edu",
"path": "/login/index.php"
},
"extracted_at": "2020-10-04T20:55:26.043777194Z",
"favicon_sha256": "",
"http_headers": [
{
"name": "date",
"value": "Sun, 04 Oct 2020 20:55:25 GMT"
},
{
"name": "content-type",
"value": "text/html; charset=utf-8"
},
{
"name": "server",
"value": "Apache/2.4.46 () OpenSSL/1.0.2k-fips"
},
{
"name": "x-powered-by",
"value": "PHP/7.2.24"
},
{
"name": "content-language",
"value": "en"
},
{
"name": "content-script-type",
"value": "text/javascript"
},
{
"name": "content-style-type",
"value": "text/css"
},
{
"name": "x-ua-compatible",
"value": "IE=edge"
},
{
"name": "cache-control",
"value": "private, pre-check=0, post-check=0, max-age=0, no-transform"
},
{
"name": "pragma",
"value": "no-cache"
},
{
"name": "expires",
"value": ""
},
{
"name": "accept-ranges",
"value": "none"
},
{
"name": "set-cookie",
"value": "MoodleSession=tqhmqc4muk513sad1bmnl3kocj; path=/; secure;HttpOnly;Secure;SameSite=None"
}
],
"http_status_code": 200,
"links": [
{
"anchor": "Forgotten your username or password?",
"url": "https://ceonlineb2b.hms.harvard.edu/login/forgot_password.php",
"url_host": "ceonlineb2b.hms.harvard.edu"
},
{
"anchor": "Privacy Statement",
"url": "/local/staticpage/view.php?page=privacy-statement",
"url_host": ""
},
{
"anchor": "Terms of Service",
"url": "/local/staticpage/view.php?page=terms-of-service",
"url_host": ""
},
{
"anchor": "Copyright Information",
"url": "/local/staticpage/view.php?page=copyright-information",
"url_host": ""
}
],
"meta_tags": [
{
"name": "keywords",
"value": "moodle, HMS Postgraduate Courses: Log in to the site"
},
{
"name": "format-detection",
"value": "telephone=no"
},
{
"name": "robots",
"value": "noindex"
},
{
"name": "viewport",
"value": "width=device-width, initial-scale=1.0"
}
],
"robots_txt": "",
"scripts": [
"https://ceonlineb2b.hms.harvard.edu/theme/yui_combo.php?rollup/3.17.2/yui-moodlesimple-min.js",
"https://ceonlineb2b.hms.harvard.edu/lib/javascript.php/1589465014/lib/javascript-static.js",
"https://ceonlineb2b.hms.harvard.edu/lib/javascript.php/1589465014/lib/requirejs/require.min.js",
"https://ceonlineb2b.hms.harvard.edu/theme/javascript.php/hms/1589465013/footer"
],
"styles": [
"https://ceonlineb2b.hms.harvard.edu/theme/yui_combo.php?rollup/3.17.2/yui-moodlesimple-min.css",
"https://ceonlineb2b.hms.harvard.edu/theme/styles.php/hms/1589465013_1/all"
],
"title": "HMS Postgraduate Courses: Log in to the site"
},
"is_CNAME": null,
"is_MX": null,
"is_NS": null,
"is_PTR": null,
"is_subdomain": true,
"name_without_suffix": "ceonlineb2b.hms.harvard",
"updated_at": "2021-05-16T10:25:01.59086376Z",
"user_scan_at": null,
"whois_parsed": null,
"security_score": {
"score": 100
},
"cve_list": null,
"technologies": [
{
"name": "Moodle",
"version": ""
},
{
"name": "RequireJS",
"version": ""
}
],
"trackers": null,
"organizations": null
}
]
}
}
For json parsing on bash, I recommend checking out jq. It's lightweight and versatile.
We can use the -r flag to output only values.
Output the fields of each object with the keys in sorted order.
--raw-output / -r:
The structure of the JSON you provided has the subdomain at .data.items[].http_extract.final_redirect_url.host
{
"data": {
"items": [
{
"http_extract": {
"final_redirect_url": {
"full_uri": "https://ceonlineb2b.hms.harvard.edu/login/index.php",
"host": "ceonlineb2b.hms.harvard.edu",
"path": "/login/index.php"
},
...
I've saved your json to a file, se.json
Example extracting full domain with jq
jq -r '.data.items[].http_extract.final_redirect_url.host' se.json
Output
ceonlineb2b.hms.harvard.edu
To extract the subdomain, just perform a search/replace using sub().
sub(regex; tostring) sub(regex; string; flags)
Emit the string obtained by replacing the first match of regex in the input string with tostring, after interpolation. tostring should be a jq string, and may contain references to named captures. The named captures are, in effect, presented as a JSON object (as constructed by capture) to tostring, so a reference to a captured variable named "x" would take the form: "(.x)".
Extracting subdomain using jq
jq -r '.data.items[].http_extract.final_redirect_url.host | sub(".hms.harvard.edu";"")' se.json
Output
ceonlineb2b
Im trying to use jq on kubernetes json output, to create new json object containing list of objects - container and image per pod, however im getting cartesian product.
my input data (truncated from sensitive info):
{
"apiVersion": "v1",
"items": [
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"creationTimestamp": "2021-06-30T12:45:40Z",
"name": "pod-1",
"namespace": "default",
"resourceVersion": "757679286",
"selfLink": "/api/v1/namespaces/default/pods/pod-1"
},
"spec": {
"containers": [
{
"image": "image-1",
"imagePullPolicy": "Always",
"name": "container-1",
"resources": {},
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"volumeMounts": [
{
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
"readOnly": true
}
]
},
{
"image": "image-2",
"imagePullPolicy": "Always",
"name": "container-2",
"resources": {},
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"volumeMounts": [
{
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
"readOnly": true
}
]
}
],
"dnsPolicy": "ClusterFirst",
"enableServiceLinks": true,
"priority": 0,
"restartPolicy": "Always",
"schedulerName": "default-scheduler",
"securityContext": {},
"serviceAccount": "default",
"serviceAccountName": "default",
"terminationGracePeriodSeconds": 30,
"tolerations": [
{
"effect": "NoExecute",
"key": "node.kubernetes.io/not-ready",
"operator": "Exists",
"tolerationSeconds": 300
},
{
"effect": "NoExecute",
"key": "node.kubernetes.io/unreachable",
"operator": "Exists",
"tolerationSeconds": 300
}
],
"volumes": [
{
"name": "default-token-b954f",
"secret": {
"defaultMode": 420,
"secretName": "default-token-b954f"
}
}
]
},
"status": {
"conditions": [
{
"lastProbeTime": null,
"lastTransitionTime": "2021-06-30T12:45:40Z",
"status": "True",
"type": "Initialized"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2021-06-30T12:45:40Z",
"message": "containers with unready status: [container-1 container-2]",
"reason": "ContainersNotReady",
"status": "False",
"type": "Ready"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2021-06-30T12:45:40Z",
"message": "containers with unready status: [container-1 container-2]",
"reason": "ContainersNotReady",
"status": "False",
"type": "ContainersReady"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2021-06-30T12:45:40Z",
"status": "True",
"type": "PodScheduled"
}
],
"containerStatuses": [
{
"image": "image-1",
"imageID": "",
"lastState": {},
"name": "container-1",
"ready": false,
"restartCount": 0,
"started": false,
"state": {
"waiting": {
"message": "Back-off pulling image \"image-1\"",
"reason": "ImagePullBackOff"
}
}
},
{
"image": "image-2",
"imageID": "",
"lastState": {},
"name": "container-2",
"ready": false,
"restartCount": 0,
"started": false,
"state": {
"waiting": {
"message": "Back-off pulling image \"image-2\"",
"reason": "ImagePullBackOff"
}
}
}
],
"qosClass": "BestEffort",
"startTime": "2021-06-30T12:45:40Z"
}
}
],
"kind": "List",
"metadata": {
"resourceVersion": "",
"selfLink": ""
}
}
my command:
jq '.items[] | { "name": .metadata.name, "containers": [{ "name": .spec.containers[].name, "image": .spec.containers[].image }]} '
desired output:
{
"name": "pod_1",
"containers": [
{
"name": "container_1",
"image": "image_1"
},
{
"name": "container_2",
"image": "image_2"
}
]
}
output I get:
{
"name": "pod-1",
"containers": [
{
"name": "container-1",
"image": "image-1"
},
{
"name": "container-1",
"image": "image-2"
},
{
"name": "container-2",
"image": "image-1"
},
{
"name": "container-2",
"image": "image-2"
}
]
}
Could anyone explain what am I doing wrong?
Best Regards, Piotr.
The problem is "name": .spec.containers[].name and "image": .spec.containers[].image:
Both expressions generate a sequence of each value for name and image which will than be combined.
Simplified example of why you get a Cartesian product:
jq -c -n '{name: ("A", "B"), value: ("C", "D")}'
outputs:
{"name":"A","value":"C"}
{"name":"A","value":"D"}
{"name":"B","value":"C"}
{"name":"B","value":"D"}
You get the desired output using this jq filter on your input:
jq '
.items[]
| {
"name": .metadata.name,
"containers": .spec.containers
| map({name, image})
}'
output:
{
"name": "pod-1",
"containers": [
{
"name": "container-1",
"image": "image-1"
},
{
"name": "container-2",
"image": "image-2"
}
]
}
can someone please send me solution or link for PowerShell 5 and 7 how can I access child elements if specific condition is fulfilled for JSON file which I have as output.json. I haven't find it on the net.
I want to retrieve value of the "children" elements if type element has value FILE and to put that into some list. So final result should be [test1.txt,test2.txt]
Thank you!!!
{
"path": {
"components": [
"Packages"
],
"parent": "",
"name": "Packages",
},
"children": {
"values": [
{
"path": {
"components": [
"test1.txt"
],
"parent": "",
"name": "test1.txt",
},
"type": "FILE",
"size": 405
},
{
"path": {
"components": [
"test2.txt"
],
"parent": "",
"name": "test2.txt",
},
"type": "FILE",
"size": 409
},
{
"path": {
"components": [
"FOLDER"
],
"parent": "",
"name": "FOLDER",
},
"type": "DIRECTORY",
"size": 1625
}
]
"start": 0
}
}
1.) The json is incorrect, I assumt that this one is the correct one:
{
"path": {
"components": [
"Packages"
],
"parent": "",
"name": "Packages"
},
"children": {
"values": [
{
"path": {
"components": [
"test1.txt"
],
"parent": "",
"name": "test1.txt"
},
"type": "FILE",
"size": 405
},
{
"path": {
"components": [
"test2.txt"
],
"parent": "",
"name": "test2.txt"
},
"type": "FILE",
"size": 409
},
{
"path": {
"components": [
"FOLDER"
],
"parent": "",
"name": "FOLDER"
},
"type": "DIRECTORY",
"size": 1625
}
],
"start": 0
}
}
2.) The structure is not absolute clear, but for your example this seems to me to be the correct solution:
$element = $json | ConvertFrom-Json
$result = #()
$element.children.values | foreach {
if ($_.type -eq 'FILE') { $result += $_.path.name }
}
$result | ConvertTo-Json
Be aware, that the used construct $result += $_.path.name is fine if you have up to ~10k items, but for very large items its getting very slow and you need to use an arraylist. https://adamtheautomator.com/powershell-arraylist/
How to convert below json to valid json
{
"attribute_id":"r2d2",
"attribute_values":[
WrappedArray( [
5 b1ed5df4a0330a13a3b6f4c,
{
"L0":{
"name":"ENTERTAINMENT",
"id":"20000"
},
"L1":{
"name":"VIDEO GAMES BOOKS AND OTHER MEDIA",
"id":"26000"
},
"L2":{
"name":"MEDIA",
"id":"26001"
},
"L3":{
"name":"BOOKS",
"id":"26100"
},
"L4":{
"name":"BOOKS MISC L4",
"id":"26800"
}
},
PRIORITY_OTHERS,
1536662873016,
26800
] )
],
"bu":"0",
"item_id":"705024754",
"last_updated_by":"QARTH",
"mart":"0",
"published_at":"1536662873017",
"source":"RAMP",
"timestamp":"1536662873016",
"vertical":"0",
"wpid":"7HX1KF1N9W9Y"
}
Valid json should be this
{
"attribute_id": "r2d2",
"attribute_values": [
[{
"L0": {
"name": "ENTERTAINMENT",
"id": "20000"
},
"L1": {
"name": "VIDEO GAMES BOOKS AND OTHER MEDIA",
"id": "26000"
},
"L2": {
"name": "MEDIA",
"id": "26001"
},
"L3": {
"name": "BOOKS",
"id": "26100"
},
"L4": {
"name": "BOOKS MISC L4",
"id": "26800"
}
},
"PRIORITY_OTHERS",
1536662873016,
26800
]
],
"bu": "0",
"item_id": "705024754",
"last_updated_by": "QARTH",
"mart": "0",
"published_at": "1536662873017",
"source": "RAMP",
"timestamp": "1536662873016",
"vertical": "0",
"wpid": "7HX1KF1N9W9Y"
}
Could you please post your sql if this does not answer your question?