How can I extract subdomains from a json file? - json

I have a long list of json file . I want to extract the subdomain of harvard.edu which is in the variable host in "host": "ceonlineb2b.hms.harvard.edu using bash . I would be happy if anyone can help out .Below is only a snippet of json file.
{
"data": {
"total_items": 3,
"offset": 0,
"limit": 1,
"items": [
{
"name": "ceonlineb2b.hms.harvard.edu",
"alexa": null,
"cert_summary": null,
"dns_records": {
"A": [
"3.221.168.206",
"54.174.253.3"
],
"AAAA": null,
"CAA": null,
"CNAME": [
"hms-moodleb2b-prod.cabem.com"
],
"MX": null,
"NS": null,
"SOA": null,
"TXT": null,
"SPF": null,
"updated_at": "2021-05-14T23:12:43.332816923Z"
},
"hosts_enrichment": [
{
"ip": "3.221.168.206",
"as_num": 14618,
"as_org": "amazon-aes",
"isp": "amazon.com",
"city_name": "ashburn",
"country": "united states",
"country_iso_code": "us",
"location": {
"lat": 39.0481,
"lon": -77.4728
}
},
{
"ip": "54.174.253.3",
"as_num": 14618,
"as_org": "amazon-aes",
"isp": "amazon.com",
"city_name": "ashburn",
"country": "united states",
"country_iso_code": "us",
"location": {
"lat": 39.0481,
"lon": -77.4728
}
}
],
"http_extract": {
"cookies": [
{
"domain": "",
"expire": "0001-01-01T00:00:00Z",
"http_only": true,
"key": "MoodleSession",
"max_age": 0,
"path": "/",
"security": true,
"value": "tqhmqc4muk513sad1bmnl3kocj"
}
],
"description": "",
"emails": null,
"final_redirect_url": {
"full_uri": "https://ceonlineb2b.hms.harvard.edu/login/index.php",
"host": "ceonlineb2b.hms.harvard.edu",
"path": "/login/index.php"
},
"extracted_at": "2020-10-04T20:55:26.043777194Z",
"favicon_sha256": "",
"http_headers": [
{
"name": "date",
"value": "Sun, 04 Oct 2020 20:55:25 GMT"
},
{
"name": "content-type",
"value": "text/html; charset=utf-8"
},
{
"name": "server",
"value": "Apache/2.4.46 () OpenSSL/1.0.2k-fips"
},
{
"name": "x-powered-by",
"value": "PHP/7.2.24"
},
{
"name": "content-language",
"value": "en"
},
{
"name": "content-script-type",
"value": "text/javascript"
},
{
"name": "content-style-type",
"value": "text/css"
},
{
"name": "x-ua-compatible",
"value": "IE=edge"
},
{
"name": "cache-control",
"value": "private, pre-check=0, post-check=0, max-age=0, no-transform"
},
{
"name": "pragma",
"value": "no-cache"
},
{
"name": "expires",
"value": ""
},
{
"name": "accept-ranges",
"value": "none"
},
{
"name": "set-cookie",
"value": "MoodleSession=tqhmqc4muk513sad1bmnl3kocj; path=/; secure;HttpOnly;Secure;SameSite=None"
}
],
"http_status_code": 200,
"links": [
{
"anchor": "Forgotten your username or password?",
"url": "https://ceonlineb2b.hms.harvard.edu/login/forgot_password.php",
"url_host": "ceonlineb2b.hms.harvard.edu"
},
{
"anchor": "Privacy Statement",
"url": "/local/staticpage/view.php?page=privacy-statement",
"url_host": ""
},
{
"anchor": "Terms of Service",
"url": "/local/staticpage/view.php?page=terms-of-service",
"url_host": ""
},
{
"anchor": "Copyright Information",
"url": "/local/staticpage/view.php?page=copyright-information",
"url_host": ""
}
],
"meta_tags": [
{
"name": "keywords",
"value": "moodle, HMS Postgraduate Courses: Log in to the site"
},
{
"name": "format-detection",
"value": "telephone=no"
},
{
"name": "robots",
"value": "noindex"
},
{
"name": "viewport",
"value": "width=device-width, initial-scale=1.0"
}
],
"robots_txt": "",
"scripts": [
"https://ceonlineb2b.hms.harvard.edu/theme/yui_combo.php?rollup/3.17.2/yui-moodlesimple-min.js",
"https://ceonlineb2b.hms.harvard.edu/lib/javascript.php/1589465014/lib/javascript-static.js",
"https://ceonlineb2b.hms.harvard.edu/lib/javascript.php/1589465014/lib/requirejs/require.min.js",
"https://ceonlineb2b.hms.harvard.edu/theme/javascript.php/hms/1589465013/footer"
],
"styles": [
"https://ceonlineb2b.hms.harvard.edu/theme/yui_combo.php?rollup/3.17.2/yui-moodlesimple-min.css",
"https://ceonlineb2b.hms.harvard.edu/theme/styles.php/hms/1589465013_1/all"
],
"title": "HMS Postgraduate Courses: Log in to the site"
},
"is_CNAME": null,
"is_MX": null,
"is_NS": null,
"is_PTR": null,
"is_subdomain": true,
"name_without_suffix": "ceonlineb2b.hms.harvard",
"updated_at": "2021-05-16T10:25:01.59086376Z",
"user_scan_at": null,
"whois_parsed": null,
"security_score": {
"score": 100
},
"cve_list": null,
"technologies": [
{
"name": "Moodle",
"version": ""
},
{
"name": "RequireJS",
"version": ""
}
],
"trackers": null,
"organizations": null
}
]
}
}

For json parsing on bash, I recommend checking out jq. It's lightweight and versatile.
We can use the -r flag to output only values.
Output the fields of each object with the keys in sorted order.
--raw-output / -r:
The structure of the JSON you provided has the subdomain at .data.items[].http_extract.final_redirect_url.host
{
"data": {
"items": [
{
"http_extract": {
"final_redirect_url": {
"full_uri": "https://ceonlineb2b.hms.harvard.edu/login/index.php",
"host": "ceonlineb2b.hms.harvard.edu",
"path": "/login/index.php"
},
...
I've saved your json to a file, se.json
Example extracting full domain with jq
jq -r '.data.items[].http_extract.final_redirect_url.host' se.json
Output
ceonlineb2b.hms.harvard.edu
To extract the subdomain, just perform a search/replace using sub().
sub(regex; tostring) sub(regex; string; flags)
Emit the string obtained by replacing the first match of regex in the input string with tostring, after interpolation. tostring should be a jq string, and may contain references to named captures. The named captures are, in effect, presented as a JSON object (as constructed by capture) to tostring, so a reference to a captured variable named "x" would take the form: "(.x)".
Extracting subdomain using jq
jq -r '.data.items[].http_extract.final_redirect_url.host | sub(".hms.harvard.edu";"")' se.json
Output
ceonlineb2b

Related

Check if a key exists and return another key

I need help with jq syntax on how to return the Gitlab job ID if it contains an artifact. The JSON output looks like this (removed a lot of unrelated info from it and added [...]):
[{
"id": 3219589880,
"status": "success",
"stage": "test",
"name": "job_with_no_artifact",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.119Z",
"started_at": "2022-10-24T18:21:25.986Z",
"finished_at": "2022-10-24T18:21:38.464Z",
"duration": 12.478682,
"queued_duration": 0.499786,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
"pipeline": {
"id": 123456789,
[...]
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts": [],
"runner": {
"id": 12270859,
[...]
},
"artifacts_expire_at": null,
"tag_list": []
}, {
"id": 3219589878,
"status": "success",
"stage": "test",
"name": "create_artifact_job_2",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.111Z",
"started_at": "2022-10-24T18:21:25.922Z",
"finished_at": "2022-10-24T18:21:39.090Z",
"duration": 13.168405,
"queued_duration": 0.464364,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
},
"pipeline": {
"id": 675641982,
[...],
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts_file": {
"filename": "artifacts.zip",
"size": 223
},
"artifacts": [{
"file_type": "archive",
"size": 223,
"filename": "artifacts.zip",
"file_format": "zip"
}, {
"file_type": "metadata",
"size": 153,
"filename": "metadata.gz",
"file_format": "gzip"
}],
"runner": {
"id": 12270845,
[...]
},
"artifacts_expire_at": "2022-10-25T18:21:35.859Z",
"tag_list": []
}, {
"id": 3219589876,
"status": "success",
"stage": "test",
"name": "create_artifact_job_1",
"ref": "main",
"tag": false,
"coverage": null,
"allow_failure": false,
"created_at": "2022-10-24T18:21:25.103Z",
"started_at": "2022-10-24T18:21:25.503Z",
"finished_at": "2022-10-24T18:21:41.407Z",
"duration": 15.904028,
"queued_duration": 0.098837,
"user": {
"id": 123456789,
[...]
},
"commit": {
"id": "5e0e1f287d20daf2036a3ca71c656dce55999265",
[...]
},
"pipeline": {
"id": 123456789,
[...]
},
"web_url": "WEB_URL",
"project": {
"ci_job_token_scope_enabled": false
},
"artifacts_file": {
"filename": "artifacts.zip",
"size": 217
},
"artifacts": [{
"file_type": "archive",
"size": 217,
"filename": "artifacts.zip",
"file_format": "zip"
}, {
"file_type": "metadata",
"size": 152,
"filename": "metadata.gz",
"file_format": "gzip"
}],
"runner": {
"id": 12270857,
},
"artifacts_expire_at": "2022-10-25T18:21:37.808Z",
"tag_list": []
}]
I've been trying to do either of the following using jQ:
Either:
Check if artifacts_file key exists in each iteration and if it does return the (job) id (so .[].id)
Check if artifacts array is empty in each iteration and if it is empty return the (job) id.
In both cases I'm able to do the first part but I am not sure how to return the .id key.
Related stackoverflow questions that I've been trying to utilize and adapt to my case:
jq - return array value if its length is not null
How to check for presence of 'key' in jq before iterating over the values
What I have so far: jq '[.[].artifacts[]|select(length > 0)] | .[]' which returns all the artifacts found (but it doesn't contain the .id of the job).
Checking the existence of a field using has:
.[] | select(has("artifacts_file")).id
3219589878
3219589876
Demo
Checking if a field is an empty array by comparing it to []:
.[] | select(.artifacts == []).id
3219589880
Demo

Cannot get jq to query json object [duplicate]

This question already has answers here:
How to use jq when the variable has reserved characters?
(3 answers)
Closed 6 months ago.
I have a JSON file that I am trying to query with jq. I am unable to retrieve the observations. I am trying to retieve each of the "observations using the following command and not able to get to the result:
cat sample3.json | jq .dataSets[0].series.0:0:0:0:0.observations.0[0]
I am able to retieve up to the series using:
cat sample3.json | jq .dataSets[0].series
But once I try to drill down further I am getting a compile error:
$ cat sample3.json | jq .dataSets[0].series.0:0:0:0:0
jq: error: syntax error, unexpected LITERAL, expecting end of file (Unix shell quoting issues?) at <top-level>, line 1:
.dataSets[0].series.0:0:0:0:0
jq: 1 compile error
I am not sure what I am doing wrong here....
The input file is:
{
"header": {
"id": "b8be2cd5-33bf-4687-9e81-eb032f6f8a71",
"test": false,
"prepared": "2022-09-01T13:30:57.013+02:00",
"sender": {
"id": "ECB"
}
},
"dataSets": [
{
"action": "Replace",
"validFrom": "2022-09-01T13:30:57.013+02:00",
"series": {
"0:0:0:0:0": {
"attributes": [
0,
null,
0,
null,
null,
null,
null,
null,
null,
null,
null,
null,
0,
null,
0,
null,
0,
0,
0,
0
],
"observations": {
"0": [
1.4529,
0,
0,
null,
null
],
"1": [
1.4472,
0,
0,
null,
null
],
"2": [
1.4591,
0,
0,
null,
null
]
}
}
}
}
],
"structure": {
"links": [
{
"title": "Exchange Rates",
"rel": "dataflow",
"href": "https://sdw-wsrest.ecb.europa.eu:443/service/dataflow/ECB/EXR/1.0"
}
],
"name": "Exchange Rates",
"dimensions": {
"series": [
{
"id": "FREQ",
"name": "Frequency",
"values": [
{
"id": "D",
"name": "Daily"
}
]
},
{
"id": "CURRENCY",
"name": "Currency",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "CURRENCY_DENOM",
"name": "Currency denominator",
"values": [
{
"id": "EUR",
"name": "Euro"
}
]
},
{
"id": "EXR_TYPE",
"name": "Exchange rate type",
"values": [
{
"id": "SP00",
"name": "Spot"
}
]
},
{
"id": "EXR_SUFFIX",
"name": "Series variation - EXR context",
"values": [
{
"id": "A",
"name": "Average"
}
]
}
],
"observation": [
{
"id": "TIME_PERIOD",
"name": "Time period or range",
"role": "time",
"values": [
{
"id": "2022-08-29",
"name": "2022-08-29",
"start": "2022-08-29T00:00:00.000+02:00",
"end": "2022-08-29T23:59:59.999+02:00"
},
{
"id": "2022-08-30",
"name": "2022-08-30",
"start": "2022-08-30T00:00:00.000+02:00",
"end": "2022-08-30T23:59:59.999+02:00"
},
{
"id": "2022-08-31",
"name": "2022-08-31",
"start": "2022-08-31T00:00:00.000+02:00",
"end": "2022-08-31T23:59:59.999+02:00"
}
]
}
]
},
"attributes": {
"series": [
{
"id": "TIME_FORMAT",
"name": "Time format code",
"values": [
{
"name": "P1D"
}
]
},
{
"id": "BREAKS",
"name": "Breaks",
"values": []
},
{
"id": "COLLECTION",
"name": "Collection indicator",
"values": [
{
"id": "A",
"name": "Average of observations through period"
}
]
},
{
"id": "COMPILING_ORG",
"name": "Compiling organisation",
"values": []
},
{
"id": "DISS_ORG",
"name": "Data dissemination organisation",
"values": []
},
{
"id": "DOM_SER_IDS",
"name": "Domestic series ids",
"values": []
},
{
"id": "PUBL_ECB",
"name": "Source publication (ECB only)",
"values": []
},
{
"id": "PUBL_MU",
"name": "Source publication (Euro area only)",
"values": []
},
{
"id": "PUBL_PUBLIC",
"name": "Source publication (public)",
"values": []
},
{
"id": "UNIT_INDEX_BASE",
"name": "Unit index base",
"values": []
},
{
"id": "COMPILATION",
"name": "Compilation",
"values": []
},
{
"id": "COVERAGE",
"name": "Coverage",
"values": []
},
{
"id": "DECIMALS",
"name": "Decimals",
"values": [
{
"id": "4",
"name": "Four"
}
]
},
{
"id": "NAT_TITLE",
"name": "National language title",
"values": []
},
{
"id": "SOURCE_AGENCY",
"name": "Source agency",
"values": [
{
"id": "4F0",
"name": "European Central Bank (ECB)"
}
]
},
{
"id": "SOURCE_PUB",
"name": "Publication source",
"values": []
},
{
"id": "TITLE",
"name": "Title",
"values": [
{
"name": "Australian dollar/Euro"
}
]
},
{
"id": "TITLE_COMPL",
"name": "Title complement",
"values": [
{
"name": "ECB reference exchange rate, Australian dollar/Euro, 2:15 pm (C.E.T.)"
}
]
},
{
"id": "UNIT",
"name": "Unit",
"values": [
{
"id": "AUD",
"name": "Australian dollar"
}
]
},
{
"id": "UNIT_MULT",
"name": "Unit multiplier",
"values": [
{
"id": "0",
"name": "Units"
}
]
}
],
"observation": [
{
"id": "OBS_STATUS",
"name": "Observation status",
"values": [
{
"id": "A",
"name": "Normal value"
}
]
},
{
"id": "OBS_CONF",
"name": "Observation confidentiality",
"values": [
{
"id": "F",
"name": "Free"
}
]
},
{
"id": "OBS_PRE_BREAK",
"name": "Pre-break observation value",
"values": []
},
{
"id": "OBS_COM",
"name": "Observation comment",
"values": []
}
]
}
}
}
The .foo syntax cannot be used if the key name has anything but alphanumeric characters or the underscore, or if the first character of the key name is numeric.
Assuming you are using a recent version of jq,
you can always use the form: ."foo", which is actually an abbreviation of the basic form, .["foo"].
So assuming you're using a sufficiently recent version of jq, your query could begin with:
.dataSets[0].series."0:0:0:0:0"
If you are presenting the jq query on a command line, then you may have to escape the double-quotes appropriately, e.g. in a bash shell, by enclosing the jq query in single-quotes.

Kubernetes + jq - retrieving containers list per pod yields cartesian product

Im trying to use jq on kubernetes json output, to create new json object containing list of objects - container and image per pod, however im getting cartesian product.
my input data (truncated from sensitive info):
{
"apiVersion": "v1",
"items": [
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"creationTimestamp": "2021-06-30T12:45:40Z",
"name": "pod-1",
"namespace": "default",
"resourceVersion": "757679286",
"selfLink": "/api/v1/namespaces/default/pods/pod-1"
},
"spec": {
"containers": [
{
"image": "image-1",
"imagePullPolicy": "Always",
"name": "container-1",
"resources": {},
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"volumeMounts": [
{
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
"readOnly": true
}
]
},
{
"image": "image-2",
"imagePullPolicy": "Always",
"name": "container-2",
"resources": {},
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"volumeMounts": [
{
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
"readOnly": true
}
]
}
],
"dnsPolicy": "ClusterFirst",
"enableServiceLinks": true,
"priority": 0,
"restartPolicy": "Always",
"schedulerName": "default-scheduler",
"securityContext": {},
"serviceAccount": "default",
"serviceAccountName": "default",
"terminationGracePeriodSeconds": 30,
"tolerations": [
{
"effect": "NoExecute",
"key": "node.kubernetes.io/not-ready",
"operator": "Exists",
"tolerationSeconds": 300
},
{
"effect": "NoExecute",
"key": "node.kubernetes.io/unreachable",
"operator": "Exists",
"tolerationSeconds": 300
}
],
"volumes": [
{
"name": "default-token-b954f",
"secret": {
"defaultMode": 420,
"secretName": "default-token-b954f"
}
}
]
},
"status": {
"conditions": [
{
"lastProbeTime": null,
"lastTransitionTime": "2021-06-30T12:45:40Z",
"status": "True",
"type": "Initialized"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2021-06-30T12:45:40Z",
"message": "containers with unready status: [container-1 container-2]",
"reason": "ContainersNotReady",
"status": "False",
"type": "Ready"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2021-06-30T12:45:40Z",
"message": "containers with unready status: [container-1 container-2]",
"reason": "ContainersNotReady",
"status": "False",
"type": "ContainersReady"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2021-06-30T12:45:40Z",
"status": "True",
"type": "PodScheduled"
}
],
"containerStatuses": [
{
"image": "image-1",
"imageID": "",
"lastState": {},
"name": "container-1",
"ready": false,
"restartCount": 0,
"started": false,
"state": {
"waiting": {
"message": "Back-off pulling image \"image-1\"",
"reason": "ImagePullBackOff"
}
}
},
{
"image": "image-2",
"imageID": "",
"lastState": {},
"name": "container-2",
"ready": false,
"restartCount": 0,
"started": false,
"state": {
"waiting": {
"message": "Back-off pulling image \"image-2\"",
"reason": "ImagePullBackOff"
}
}
}
],
"qosClass": "BestEffort",
"startTime": "2021-06-30T12:45:40Z"
}
}
],
"kind": "List",
"metadata": {
"resourceVersion": "",
"selfLink": ""
}
}
my command:
jq '.items[] | { "name": .metadata.name, "containers": [{ "name": .spec.containers[].name, "image": .spec.containers[].image }]} '
desired output:
{
"name": "pod_1",
"containers": [
{
"name": "container_1",
"image": "image_1"
},
{
"name": "container_2",
"image": "image_2"
}
]
}
output I get:
{
"name": "pod-1",
"containers": [
{
"name": "container-1",
"image": "image-1"
},
{
"name": "container-1",
"image": "image-2"
},
{
"name": "container-2",
"image": "image-1"
},
{
"name": "container-2",
"image": "image-2"
}
]
}
Could anyone explain what am I doing wrong?
Best Regards, Piotr.
The problem is "name": .spec.containers[].name and "image": .spec.containers[].image:
Both expressions generate a sequence of each value for name and image which will than be combined.
Simplified example of why you get a Cartesian product:
jq -c -n '{name: ("A", "B"), value: ("C", "D")}'
outputs:
{"name":"A","value":"C"}
{"name":"A","value":"D"}
{"name":"B","value":"C"}
{"name":"B","value":"D"}
You get the desired output using this jq filter on your input:
jq '
.items[]
| {
"name": .metadata.name,
"containers": .spec.containers
| map({name, image})
}'
output:
{
"name": "pod-1",
"containers": [
{
"name": "container-1",
"image": "image-1"
},
{
"name": "container-2",
"image": "image-2"
}
]
}

Firebase: JSON payload not accepted

I'm trying to push JSON from a service in to Firebase, and it's not accepting it.
I cant figure out why. JSONLint validates it as valid.
The error I get is:
error: "Invalid data; couldn't parse JSON object, array, or value."
Here's the JSON payload:
{
"app_id": "e4805b8d5a9f4032b0bb8b6d9c6726b8",
"archived": false,
"attachments": {
"form.xml": {
"content_type": "text/xml",
"length": 4500,
"url": "https://someurl.com/form.xml"
}
},
"build_id": "3c5703e20346462ebcb07a3f36d5fe9b",
"domain": "my-test",
"edited_by_user_id": null,
"edited_on": null,
"form": {
"#type": "data",
"#name": "Beneficiary Registration",
"#uiVersion": "1",
"#version": "1",
"#xmlns": "http://openrosa.org/formdesigner/123456",
"beneficiary_age": {
"beneficiary_age_num": "8",
"beneficiary_exact_age_ind": "N"
},
"beneficiary_gender_cd": "M",
"beneficiary_id": "Hhr-O6I3C5L-J3G1L",
"case": {
"#case_id": "a07972bc-1a34-4c84-90ea-91250639f2a4",
"#date_modified": "2020-03-17T20:57:26.986000Z",
"#user_id": "2275c340c48ac83b6852035b0a15b5d3",
"#xmlns": "http://someurl.org/case/transaction/v2"
},
"case_name": "Hhr-O6I3C5L-J3G1L|joel galager|Katsekera|Katsekera|Mpando|Katsekera|KE",
"existing_beneficiaries": "Jane Doe\n\nJoan Doe",
"existing_beneficiaries_label": "",
"household_information": {
"beneficiary_household_information_display": "",
"beneficiary_location_hierarchy_1": "KE",
"beneficiary_location_hierarchy_1_text": "Kenya",
"beneficiary_location_hierarchy_2": "HF0001",
"beneficiary_location_hierarchy_2_text": "Katsekera",
"beneficiary_location_hierarchy_3": "TA0001",
"beneficiary_location_hierarchy_3_text": "Mpando",
"beneficiary_location_hierarchy_4": "GHV0001",
"beneficiary_location_hierarchy_4_text": "Katsekera",
"beneficiary_location_hierarchy_5": "V0001",
"beneficiary_location_hierarchy_5_text": "Katsekera",
"correct_information": "",
"hh_first_name": "John",
"hh_full_name": "John Doe",
"hh_id_fk": "Hhr-O6I3C5L",
"hh_last_name": "Doe",
"household_information": ""
},
"meta": {
"#xmlns": "http://openrosa.org/jr/xforms",
"appVersion": "Formplayer Version: 2.47",
"app_build_version": 1,
"commcare_version": null,
"deviceID": "Formplayer",
"drift": "0",
"geo_point": null,
"instanceID": "123-321-232",
"timeEnd": "2020-03-17T20:57:26.986000Z",
"timeStart": "2020-03-17T20:57:13.958000Z",
"userID": "2275c340c48ac83b6852035b0a15b5d3",
"username": "some_username"
},
"name_group": {
"beneficiary_first_name": "joel",
"beneficiary_full_name": "joel galager",
"beneficiary_last_name": "galager"
},
"subcase_0": {
"case": {
"#case_id": "2a4bfe27-a5c3-4f3a-8540-5f8ded86db85",
"#date_modified": "2020-03-17T20:57:26.986000Z",
"#user_id": "abc123",
"#xmlns": "http://commcarehq.org/case/transaction/v2",
"create": {
"case_name": "Hhr-O6I3C5L-J3G1L|joel galager|Katsekera|Katsekera|Mpando|Katsekera|KE",
"case_type": "beneficiary_case",
"owner_id": "abc123"
},
"index": {
"parent": {
"#text": "a07972bc-1a34-4c84-90ea-91250639f2a4",
"#case_type": "household_case"
}
},
"update": {
"beneficiary_age_num": "8",
"beneficiary_exact_age_ind": "N",
"beneficiary_first_name": "joel",
"beneficiary_full_name": "joel galager",
"beneficiary_gender_cd": "M",
"beneficiary_id": "Hhr-O6I3C5L-J3G1L",
"beneficiary_last_name": "galager",
"beneficiary_location_hierarchy_1": "KE",
"beneficiary_location_hierarchy_1_text": "Kenya",
"beneficiary_location_hierarchy_2": "HF0001",
"beneficiary_location_hierarchy_2_text": "Katsekera",
"beneficiary_location_hierarchy_3": "TA0001",
"beneficiary_location_hierarchy_3_text": "Mpando",
"beneficiary_location_hierarchy_4": "GHV0001",
"beneficiary_location_hierarchy_4_text": "Katsekera",
"beneficiary_location_hierarchy_5": "V0001",
"beneficiary_location_hierarchy_5_text": "Katsekera",
"hh_id_fk": "Hhr-O6I3C5L"
}
}
}
},
"id": "d547ccea-503c-4c50-b974-38ed564ae78a",
"indexed_on": "2020-03-17T21:01:04.695654",
"initial_processing_complete": true,
"is_phone_submission": true,
"metadata": {
"appVersion": "Formplayer Version: 2.47",
"app_build_version": 1,
"commcare_version": null,
"deviceID": "Formplayer",
"drift": "0",
"geo_point": null,
"instanceID": "01010101",
"location": null,
"timeEnd": "2020-03-17T20:57:26.986000Z",
"timeStart": "2020-03-17T20:57:13.958000Z",
"userID": "2275c340c48ac83b6852035b0a15b5d3",
"username": "myusername"
},
"problem": null,
"received_on": "2020-03-17T20:57:27.179986Z",
"resource_uri": "",
"server_modified_on": "2020-03-17T20:57:27.382548Z",
"type": "data",
"uiversion": "1",
"version": "1"
}
This is probably because of #type inside form.
The # character can't be used in RTDB paths.
See How data is structured from the docs.
Keys cannot contain
.
$
#
[
]
/
ASCII control characters 0-31 or 127

Extract values in json object with awk/sed, but cannot get it to work

I have a file with the return of a curl statement in it, in the form of json. Each object has a set of values, but the parameters for these values are all called the same names. See code below.
These objects are part of a larger object called workflow. The Cleaning up object is the last process that runs in our workflow. For every video that passes through the workflow, a json file in this format is created. (There are more than only these three objects, this is just for illustrative purposes)
I want to take the value of completed of the object with "description": "Cleaning up" and store it as a variable $end_time. Then I want to take the value of completed of the object with "description": "Ingest" and store it as a variable $start_time. These two values are then subtracted to give me an integer time in milliseconds so I can calculate the time it took for the video to go through this part of the process. With the maths part I am fine, and know how to do it. It is the extraction of the values that I am struggling with.
I hope this makes sense? ANY help would be appreciated. Thank you in advance!
EDIT: Had to delete original code in post, due to character limitations
Here is a proper example of the file that I have to work with:
{
"workflows": {
"count": "20",
"searchTime": "1",
"startPage": "0",
"totalCount": "1",
"workflow": {
"configurations": {
"configuration": [
{
"$": "1409750880000",
"key": "schedule.start"
},
{
"$": "1409755980000",
"key": "schedule.stop"
},
{
"$": "Capture_agent",
"key": "schedule.location"
},
{
"$": "false",
"key": "trimHold"
},
{
"$": "true",
"key": "archiveOp"
},
{
"$": "false",
"key": "captionHold"
},
{
"$": "false",
"key": "videoPreview"
}
]
},
"creator": {
"organization": "mh_default_org",
"roles": [
"76b1bdde-a080-40a4-b929-bde89af6a0a8_Instructor",
"ROLE_ADMIN",
"ROLE_ANONYMOUS",
"ROLE_USER"
],
"userName": user_name
},
"description": "This workflow definition defines the steps involved in scheduling a recording, capturing it, and\n ingesting it, after which processing operations may be added.\n ",
"errors": "",
"id": "15518",
"mediapackage": {
"attachments": "",
"creators": {
"creator": "Name"
},
"id": "2d25ed19-2978-458d-a4a0-c9c56d791c68",
"license": "Creative Commons 3.0: Attribution-NonCommercial-NoDerivs",
"media": "",
"metadata": "",
"publications": {
"publication": {
"channel": "engage-player",
"id": "b7b68f91-2c33-4673-ba7c-2e9b891788f9",
"mimetype": "text/html",
"tags": "",
"url": "http://some.url.com:80/engage/ui/watch.html?id=2d25ed19-2978-458d-a4a0-c9c56d791c68"
}
},
"series": "76b1bdde-a080-40a4-b929-bde89af6a0a8",
"seriestitle": "Recording_Title_user_name",
"start": "2014-09-03T13:28:00Z",
"title": "Recording_Title"
},
"operations": {
"operation": [
{
"abortable": "false",
"completed": 1409750882092,
"configurations": {
"configuration": [
{
"$": "1409750880000",
"key": "schedule.start"
},
{
"$": "1409755980000",
"key": "schedule.stop"
},
{
"$": "Capture_agent",
"key": "schedule.location"
}
]
},
"continuable": "false",
"description": "Scheduled",
"execution-history": "",
"execution-host": "http://some.url.com:8080",
"fail-on-error": "true",
"failed-attempts": "0",
"hold-action-title": "View schedule",
"holdurl": "/workflow/hold/org.opencastproject.workflow.handler.scheduleworkflowoperationhandler",
"id": "schedule",
"job": "15519",
"max-attempts": "1",
"retry-strategy": "none",
"started": 1409750881745,
"state": "SUCCEEDED",
"time-in-queue": 0
},
{
"abortable": "false",
"configurations": "",
"continuable": "false",
"description": "Capture",
"execution-history": "",
"execution-host": "http://some.url.com:8080",
"fail-on-error": "true",
"failed-attempts": "0",
"hold-action-title": "Monitor capture",
"holdurl": "/workflow/hold/org.opencastproject.workflow.handler.captureworkflowoperationhandler",
"id": "capture",
"job": "42894",
"max-attempts": "1",
"retry-strategy": "none",
"started": 1409750884085,
"state": "SKIPPED",
"time-in-queue": 0
},
{
"completed": 1409756171224,
"configurations": "",
"description": "Ingest",
"execution-history": "",
"fail-on-error": "true",
"failed-attempts": "0",
"id": "ingest",
"max-attempts": "1",
"retry-strategy": "none",
"state": "SUCCEEDED"
},
{
"completed": 1409854379552,
"configurations": {
"configuration": {
"key": "preserve-flavors"
}
},
"description": "Cleaning up",
"execution-history": "",
"execution-host": "http://some.url.com:8080",
"fail-on-error": "false",
"failed-attempts": "0",
"id": "cleanup",
"job": "45113",
"max-attempts": "1",
"retry-strategy": "none",
"started": 1409854378128,
"state": "SUCCEEDED",
"time-in-queue": 0
}
]
},
"organization": {
"adminRole": "ROLE_ADMIN",
"anonymousRole": "ROLE_ANONYMOUS",
"id": "mh_default_org",
"name": "Opencast Project",
"properties": {
"property": [
{
"$": "true",
"key": "adminui.i18n_tab_episode.enable"
},
{
"$": "false",
"key": "adminui.i18n_tab_users.enable"
},
{
"$": "/engage/ui/img/mh_logos/OpencastLogo.png",
"key": "logo_small"
},
{
"$": "http://opencast.org/matterhorn/",
"key": "engageui.link_mobile_redirect.url"
},
{
"$": "false",
"key": "engageui.annotations.enable"
},
{
"$": "true",
"key": "engageui.links_media_module.enable"
},
{
"$": "2024",
"key": "adminui.chunksize"
},
{
"$": "false",
"key": "adminui.series_prepopulate.enable"
},
{
"$": "true",
"key": "engageui.link_download.enable"
},
{
"$": "false",
"key": "engageui.link_mobile_redirect.enable"
},
{
"$": "For more information have a look at the official site.",
"key": "engageui.link_mobile_redirect.description"
},
{
"$": "/engage/ui/img/mh_logos/MatterhornLogo_large.png",
"key": "logo_large"
}
]
},
"servers": {
"server": {
"name": "localhost",
"port": "8080"
}
}
},
"parent": {
"nil": "true"
},
"state": "SUCCEEDED",
"template": "full",
"title": "Scheduled Workflow"
}
}
}
Here is a jq example that should point you to getting what you want:
#!/bin/bash
# Assuming the json is in a file workflow.json
end_time=$( jq '.workflows.workflow.operations.operation[] | select(.description == "Cleaning up") | .completed' < workflow.json )
start_time=$( jq '.workflows.workflow.operations.operation[] | select(.description == "Ingest") | .completed' < workflow.json )
This is assuming the input you have is in an JSON array called workflow at the top level. Here's this on the command line:
$ jq '.workflows.workflow.operations.operation[] | select(.description == "Ingest") | .completed' < workflow.json
1406051539118
$ jq '.workflows.workflow.operations.operation[] | select(.description == "Cleaning up") | .completed' < workflow.json
1406051695440