Conditionally merging two separate JSON objects in jq

This is how my input looks:
{
"text" : "Some text here"
}
{
"usage": {
"text_units": 1,
"text_characters": 101,
"features": 1
},
"language": "en",
"categories": [
{
"score": 0.655041,
"label": "/technology law, govt and politics/espionage and intelligence/surveillance"
},
{
"score": 0.639809,
"label": "/technology and computing/computer security/network security"
},
{
"score": 0.624533,
"label": "/business and industrial/business operations"
}
]
}
Using jq, if the label of the first element of the categories array in the second object contains /technology, I want to add a new field named relevant with the value 1 (which I managed), and also copy the text field from the first object.
So, the expected output is:
{
"usage": {
"text_units": 1,
"text_characters": 101,
"features": 1
},
"language": "en",
"categories": [
{
"score": 0.655041,
"label": "/technology law, govt and politics/espionage and intelligence/surveillance"
},
{
"score": 0.639809,
"label": "/technology and computing/computer security/network security"
},
{
"score": 0.624533,
"label": "/business and industrial/business operations"
}
],
"relevant": 1,
"text": "Some text here"
}
And this is what I have done so far:
if .categories[0].label | test("/technology"; "i") then . |=( . + {"relevant": 1} + {"text": .text}) else . |= . + {"relevant": 0} end

Your input consists of two separate objects. To access the first one while processing the second, save the first one (here, just its text field) in a variable, then move on to the second object with input.
. as {$text} | input | if .categories[0].label | test("/technology"; "i") then . + {relevant: 1, $text} else . + {relevant: 0} end
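To try this locally, here is a minimal sketch that feeds both objects to jq as one stream. The scratch file names and the shortened category lists are made up for illustration; the filter is the one above, run a second time on a non-matching label to show the else branch.

```shell
# Hypothetical scratch files; the category lists are shortened for brevity.
workdir=$(mktemp -d)
cat > "$workdir/match.json" <<'EOF'
{"text": "Some text here"}
{"language": "en", "categories": [{"score": 0.655041, "label": "/technology and computing/computer security"}]}
EOF
cat > "$workdir/nomatch.json" <<'EOF'
{"text": "Some text here"}
{"language": "en", "categories": [{"score": 0.624533, "label": "/business and industrial/business operations"}]}
EOF

# Save .text from the first object, then switch to the second one with input.
filter='. as {$text} | input
  | if .categories[0].label | test("/technology"; "i")
    then . + {relevant: 1, $text}
    else . + {relevant: 0}
    end'

matched=$(jq -c "$filter" "$workdir/match.json")
unmatched=$(jq -c "$filter" "$workdir/nomatch.json")
echo "$matched"
echo "$unmatched"
```

The first run should add relevant: 1 together with the text field; the second should add only relevant: 0.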

Related

Append JSON file after specific array index by using shell script

I want to append some content by using shell script.
I have a JSON file test.json as below.
{
"reference": "Json Test",
"title": {
"a": "Json Test"
},
"components": [
{
"reference": "Json Test",
"type": "panel",
"content": [
{
"link": "abc/123",
"label": {
"a": "for test 123 - a",
"b": "for test 123 - b"
}
},
{
"link": "abc/456",
"label": {
"a": "for test 456 - a",
"b": "for test 456 - b"
}
},
{
"link": "abc/789",
"label": {
"a": "for test 789 - a",
"b": "for test 789 - b"
}
}
]
}
]
}
I want to insert a new element into the content array from a shell script (*.sh), so that the output looks as follows. How can I achieve this?
{
"reference": "Json Test",
"title": {
"a": "Json Test"
},
"components": [
{
"reference": "Json Test",
"type": "panel",
"content": [
{
"link": "abc/123",
"label": {
"a": "for test 123 - a",
"b": "for test 123 - b"
}
},
{
"link": "abc/101112",
"label": {
"a": "for test 101112 - a",
"b": "for test 101112 - b"
}
},
{
"link": "abc/456",
"label": {
"a": "for test 456 - a",
"b": "for test 456 - b"
}
},
{
"link": "abc/789",
"label": {
"a": "for test 789 - a",
"b": "for test 789 - b"
}
}
]
}
]
}
I tried to access the index and add a test string, but the command below replaces the original data instead of inserting a new element:
jq '.components[].content[1] + { "link" : "test" } ' test.json
You can use the slice filter to extract the head and the tail of the array, then use + to concatenate head + the new object + the tail. Finally, use update-assignment |= to modify the array:
.components[].content |= .[0:1] + [{ link: "test" }] + .[1:]
If you are planning on using this more often, consider defining a reusable function:
def splice($at; $obj): .[0:$at] + [$obj] + .[$at:];
.components[].content |= splice(1; {link: "test"})
Alternatively, grab the empty sub-array at position 1 (slicing either by start and end position, .[1:1], or by start position and length, .[1:][:0]) and assign your insert value to it as a single-element array, [{"link": "test"}]. You are assigning to an array slice, so the value must itself be an array; add more items to it if you want to insert several at once. This looks almost like your original attempt:
jq '.components[].content[1:1] = [{"link": "test"}]' test.json
For convenience, you can also turn this into an insertAt function:
def insertAt($pos; $val): .[$pos:$pos] = [$val];
.components[].content |= insertAt(1; {"link": "test"})
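Since the question asks for a shell script, here is a minimal sketch of how the slice-concatenation splice could be wrapped in one. The scratch directory, the trimmed-down test.json, and the shell variable name new are assumptions for illustration; the object is passed in with --argjson, and because jq has no in-place editing option, the result goes to a temporary file that then replaces the original.

```shell
#!/bin/sh
# Hypothetical scratch copy of test.json, trimmed to two entries.
workdir=$(mktemp -d)
cat > "$workdir/test.json" <<'EOF'
{"components": [{"content": [
  {"link": "abc/123"},
  {"link": "abc/456"}
]}]}
EOF

# The object to insert, kept in a shell variable and handed to jq as JSON.
new='{"link": "abc/101112", "label": {"a": "for test 101112 - a", "b": "for test 101112 - b"}}'

# Write to a temp file first, then move it over the original on success.
jq --argjson new "$new" '
  def splice($at; $obj): .[0:$at] + [$obj] + .[$at:];
  .components[].content |= splice(1; $new)
' "$workdir/test.json" > "$workdir/test.json.tmp" &&
  mv "$workdir/test.json.tmp" "$workdir/test.json"

# Show the resulting order of links.
links=$(jq -c '[.components[].content[].link]' "$workdir/test.json")
echo "$links"
```

The new link should end up at index 1, between abc/123 and abc/456.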

Using jq to fetch and show key value with quotes

I have a file that looks as below:
{
"Job": {
"Name": "sample_job",
"Description": "",
"Role": "arn:aws:iam::00000000000:role/sample_role",
"CreatedOn": "2021-10-21T23:35:23.660000-03:00",
"LastModifiedOn": "2021-10-21T23:45:41.771000-03:00",
"ExecutionProperty": {
"MaxConcurrentRuns": 1
},
"Command": {
"Name": "glueetl",
"ScriptLocation": "s3://aws-sample-s3/scripts/sample.py",
"PythonVersion": "3"
},
"DefaultArguments": {
"--TempDir": "s3://aws-sample-s3/temporary/",
"--class": "GlueApp",
"--enable-continuous-cloudwatch-log": "true",
"--enable-glue-datacatalog": "true",
"--enable-metrics": "true",
"--enable-spark-ui": "true",
"--job-bookmark-option": "job-bookmark-enable",
"--job-insights-byo-rules": "",
"--job-language": "python",
"--spark-event-logs-path": "s3://aws-sample-s3/logs"
},
"MaxRetries": 0,
"AllocatedCapacity": 100,
"Timeout": 2880,
"MaxCapacity": 100.0,
"WorkerType": "G.1X",
"NumberOfWorkers": 100,
"GlueVersion": "2.0"
}
}
I want to get the key/value pairs for "Name", "--enable-continuous-cloudwatch-log" and "--enable-metrics", and show them like this:
"Name" "sample_job"
"--enable-continuous-cloudwatch-log" ""
"--enable-metrics" ""
UPDATE
Following the tips from @Inian and @0stone0, I came close:
jq -r '(.Job ) + (.Job.DefaultArguments | { "--enable-continuous-cloudwatch-log", "--enable-metrics"}) | to_entries[] | "\"\(.key)\" \"\(.value)\""'
This extracts the values I need, but it also shows all the other key/value pairs.
Since your JSON isn't valid, I've converted it into:
{
"Job": {
"Name": "sample_job",
"Role": "sample_role_job"
},
"DefaultArguments": {
"--enable-continuous-cloudwatch-log": "test_1",
"--enable-metrics": ""
},
"Timeout": 2880,
"NumberOfWorkers": 10
}
Using the following filter:
"Name \(.Job.Name)\n--enable-continuous-cloudwatch-log \(.DefaultArguments."--enable-continuous-cloudwatch-log")\n--enable-metrics \(.DefaultArguments."--enable-metrics")"
We use string interpolation to show the desired output:
Name sample_job
--enable-continuous-cloudwatch-log test_1
--enable-metrics
jq --raw-output '"Name \(.Job.Name)\n--enable-continuous-cloudwatch-log \(.DefaultArguments."--enable-continuous-cloudwatch-log")\n--enable-metrics \(.DefaultArguments."--enable-metrics")"'
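If the quotes from the question are needed and DefaultArguments stays nested inside Job as in the original file, one possible variant is to build [key, value] pairs and serialize each part with tojson, which adds the quotes. This is only a sketch: the scratch file and its reduced contents are assumptions, and the values are the "true" strings from the question's file rather than the empty placeholders in the expected output.

```shell
# Hypothetical scratch file with the original nesting, reduced to the relevant keys.
workdir=$(mktemp -d)
cat > "$workdir/job.json" <<'EOF'
{"Job": {"Name": "sample_job",
         "DefaultArguments": {"--class": "GlueApp",
                              "--enable-continuous-cloudwatch-log": "true",
                              "--enable-metrics": "true"}}}
EOF

# Emit one [key, value] pair per wanted entry, then quote both parts via tojson.
out=$(jq -r '
  .Job
  | ["Name", .Name],
    (.DefaultArguments as $args
     | ("--enable-continuous-cloudwatch-log", "--enable-metrics")
     | [., $args[.]])
  | map(tojson) | join(" ")
' "$workdir/job.json")
echo "$out"
```

Only the three listed keys are printed, so the other entries of Job and DefaultArguments no longer show up.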

Fill arrays in the first input with elements from the second based on common field

I have two files, and I need to merge the elements of the second file into the details array of the matching object in the first file, matching on the reference field.
The first file:
[
{
"reference": 25422,
"order_number": "10_1",
"details" : []
},
{
"reference": 25423,
"order_number": "10_2",
"details" : []
}
]
The second file:
[
{
"record_id" : 1,
"reference": 25422,
"row_description": "descr_1_0"
},
{
"record_id" : 2,
"reference": 25422,
"row_description": "descr_1_1"
},
{
"record_id" : 3,
"reference": 25423,
"row_description": "descr_2_0"
}
]
I would like to get:
[
{
"reference": 25422,
"order_number": "10_1",
"details" : [
{
"record_id" : 1,
"reference": 25422,
"row_description": "descr_1_0"
},
{
"record_id" : 2,
"reference": 25422,
"row_description": "descr_1_1"
}
]
},
{
"reference": 25423,
"order_number": "10_2",
"details" :[
{
"record_id" : 3,
"reference": 25423,
"row_description": "descr_2_0"
}
]
}
]
Below is my code in es_func.jq file launched by this command:
jq -n --argfile f1 es_file1.json --argfile f2 es_file2.json -f es_func.jq
INDEX($f2[] ; .reference) as $details
| $f1
| map( ($details[.reference|tostring]| .row_description) as $vn
| if $vn then .details = [{"row_description" : $vn}] else . end)
For reference 25422 I only get the last record, with "row_description": "descr_1_1"; the record with "row_description": "descr_1_0" is missing:
[
{
"reference": 25422,
"order_number": "10_1",
"details": [
{
"row_description": "descr_1_1"
}
]
},
{
"reference": 25423,
"order_number": "10_2",
"details": [
{
"row_description": "descr_2_0"
}
]
}
]
I think I'm close to the solution but something is still missing. Thank you
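Your attempt keeps only one record per reference because INDEX stores a single value per key, so later records overwrite earlier ones. This is much easier with reduce: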
jq 'reduce inputs[] as $rec (INDEX(.reference);
.[$rec.reference | tostring].details += [$rec]
) | map(.)' es_file1.json es_file2.json
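As a quick check, the reduce version can be run against the two sample files from the question. The file names orders.json and rows.json below are placeholders for es_file1.json and es_file2.json, and the extra jq call at the end is only there to show how many detail rows landed under each reference.

```shell
# Hypothetical scratch copies of the two files from the question.
workdir=$(mktemp -d)
cat > "$workdir/orders.json" <<'EOF'
[{"reference": 25422, "order_number": "10_1", "details": []},
 {"reference": 25423, "order_number": "10_2", "details": []}]
EOF
cat > "$workdir/rows.json" <<'EOF'
[{"record_id": 1, "reference": 25422, "row_description": "descr_1_0"},
 {"record_id": 2, "reference": 25422, "row_description": "descr_1_1"},
 {"record_id": 3, "reference": 25423, "row_description": "descr_2_0"}]
EOF

# Index the orders by reference, append every row to its order, then drop the keys again.
merged=$(jq -c '
  reduce inputs[] as $rec (INDEX(.reference);
    .[$rec.reference | tostring].details += [$rec])
  | map(.)
' "$workdir/orders.json" "$workdir/rows.json")

# Summarize: one line per order with the number of attached rows.
counts=$(printf '%s' "$merged" | jq -r '.[] | "\(.reference): \(.details | length)"')
echo "$counts"
```

Note that a row whose reference has no matching order would create a new entry containing only a details array, so filter such rows out first if that can happen in your data.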
Here's a straightforward, reduce-free solution:
jq '
group_by(.reference)
| INDEX(.[]; .[0]|.reference|tostring) as $dict
| input
| map_values(. + {details: $dict[.reference|tostring]})
' 2.json 1.json

Leveling select fields

I am fetching a JSON response with the following structure:
{
"data": {
"children": [
{
"data": {
"id": "abcdef",
"preview": {
"images": [
{
"source": {
"url": "https://example.com/somefiles_1.jpg"
}
}
]
},
"title": "Boring Title One"
}
},
{
"data": {
"id": "ghijkl",
"preview": {
"images": [
{
"source": {
"url": "https://example.com/somefiles_2.jpg"
}
}
]
},
"title": "Boring Title Two"
}
},
{
"data": {
"id": "mnopqr",
"preview": {
"images": [
{
"source": {
"url": "https://example.com/somefiles_3.jpg"
}
}
]
},
"title": "Boring Title Three"
}
},
{
"data": {
"id": "stuvwx",
"preview": {
"images": [
{
"source": {
"url": "https://example.com/somefiles_4.jpg"
}
}
]
},
"title": "Boring Title Four"
}
}
]
}
}
Ideally I would like to have a shortened json like this:
{
"data": [
{
"id": "abcdef",
"title": "Boring Title One",
"url": "https://example.com/somefiles_1.jpg"
},
{
"id": "ghijkl",
"title": "Boring Title Two",
"url": "https://example.com/somefiles_2.jpg"
},
{
"id": "mnopqr",
"title": "Boring Title Three",
"url": "https://example.com/somefiles_3.jpg"
},
{
"id": "stuvwx",
"title": "Boring Title Four",
"url": "https://example.com/somefiles_4.jpg"
}
]
}
If this is not possible, I can work with joining those three values into a single string and splitting it later when necessary, like this:
abcdef#Boring Title One#https://example.com/somefiles_1.jpg
ghijkl#Boring Title Two#https://example.com/somefiles_2.jpg
mnopqr#Boring Title Three#https://example.com/somefiles_3.jpg
stuvwx#Boring Title Four#https://example.com/somefiles_4.jpg
This is where I am. I was using jq with select() and then piping the results to to_entries like this:
jq -r '.data.children[] | select(.data.post_type|test("image")?) | .data | to_entries[] | [ .value.title , .value.preview.images[0].source.url ] | join("#")' ~/Documents/json/sample.json
I don't understand what goes after to_entries[]. I have tried multiple variations of .key and .value; mostly I don't get any result, but sometimes I get key pairs I did not intend to select. How can I learn the proper syntax for it?
Is creating a flat JSON out of a nested JSON like this a good idea, or is it better to create the string output? I feel the string might be error-prone, especially in the presence of spaces or special characters.
Apparently what you're looking for is the {field} syntax. You don't need to resort to string outputs.
{ data: [
.data.children[].data
| select(has("post_type") and (.post_type | index("image")))
| {id, title} + (.preview.images[].source | {url})
# or, if images array always contains one element:
# | {id, title, url: .preview.images[0].source.url}
]
}
A simple solution to the main question is:
{data: [.data.children[]
| .data
| {id, title, url: .preview.images[0].source.url} ]}
(The "post_type" seems to have disappeared, but hopefully if it's relevant, you will be able to adapt the above as required. Likewise if .images[1] and beyond are relevant.)
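Here is a small sketch of that filter on a reduced sample. The scratch file and the second entry, which has no preview at all, are invented for illustration; the point is that the path .preview.images[0].source.url yields null rather than an error when the preview is missing.

```shell
# Hypothetical scratch file: one entry from the question plus one made-up entry without a preview.
workdir=$(mktemp -d)
cat > "$workdir/sample.json" <<'EOF'
{"data": {"children": [
  {"data": {"id": "abcdef", "title": "Boring Title One",
            "preview": {"images": [{"source": {"url": "https://example.com/somefiles_1.jpg"}}]}}},
  {"data": {"id": "nopreview", "title": "Hypothetical entry without a preview"}}
]}}
EOF

# {id, title} copies those fields by name; url is pulled up from the nested path.
flat=$(jq -c '{data: [.data.children[].data
                      | {id, title, url: .preview.images[0].source.url}]}' "$workdir/sample.json")
echo "$flat"
```

If entries without an image should be dropped instead, a select(.url != null) after the object construction would be one way to do it.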
String Output
If you want linear output, you should probably consider CSV or TSV, both of which are supported by jq via @csv and @tsv respectively.
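For example, a minimal sketch of both formats on a reduced copy of the sample (the scratch file name is an assumption). Both filters expect an array per row and are used with -r so the result is printed as plain text instead of a JSON string.

```shell
# Hypothetical scratch file holding two entries from the question.
workdir=$(mktemp -d)
cat > "$workdir/sample.json" <<'EOF'
{"data": {"children": [
  {"data": {"id": "abcdef", "title": "Boring Title One",
            "preview": {"images": [{"source": {"url": "https://example.com/somefiles_1.jpg"}}]}}},
  {"data": {"id": "ghijkl", "title": "Boring Title Two",
            "preview": {"images": [{"source": {"url": "https://example.com/somefiles_2.jpg"}}]}}}
]}}
EOF

# One array per entry; @tsv joins the fields with tabs, @csv quotes strings and joins with commas.
row='.data.children[].data | [.id, .title, .preview.images[0].source.url]'
tsv=$(jq -r "$row | @tsv" "$workdir/sample.json")
csv=$(jq -r "$row | @csv" "$workdir/sample.json")
echo "$tsv"
echo "$csv"
```

Compared with joining on "#", this keeps titles that contain spaces or the separator character intact, since @csv quotes each string and @tsv escapes embedded tabs and newlines.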

Split a string and trim a known prefix from each part in a complex JSON structure

I'm dealing with a fairly complex JSON-structure in which a single entry needs to be edited in several places. For example:
[
{
"name": "test 1",
"stuff": {
"properties": {
"id": 0,
"stuff_list": [
{
"entryId": 1,
"description": "- item 1\n- item 2\n- item 3"
},
{
"entryId": 2,
"description": "- item 1\n- item 2\n- item 3"
}
]
}
}
},
{
"name": "test 2",
"stuff": {
"properties": {
"id": 1,
"stuff_list": [
{
"entryId": 1,
"description": null
},
{
"entryId": 2,
"description": "- item 1\n- item 2\n- item 3"
}
]
}
}
}
]
Here I would like to edit each "description"-element: The string needs to be split at each \n and the substrings "^\n?-\s" of each resulting array element need to be removed. So it should result in:
{
"entryId": 1,
"description": ["item 1", "item 2", "item 3"]
}
My first approach is:
jq '.[].stuff.properties.stuff_list[].description | split("\n")' the_file.json
but that doesn't work in the first place because of the null values that can occur in some places. So now I wonder: how can I achieve what I want?
One way is to split() on \n and strip the leading "- " from each part with ltrimstr, leaving null descriptions untouched:
.[].stuff.properties.stuff_list[].description |=
if . != null then
split("\n") | map(ltrimstr("- "))
else
.
end
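A minimal way to try the filter, assuming a cut-down the_file.json in a scratch directory (one string description and one null). The trailing pipe only collects the updated descriptions so the effect is easy to see; drop it to get the whole document back.

```shell
# Hypothetical scratch copy of the_file.json with one string and one null description.
workdir=$(mktemp -d)
cat > "$workdir/the_file.json" <<'EOF'
[{"name": "test 1", "stuff": {"properties": {"id": 0, "stuff_list": [
  {"entryId": 1, "description": "- item 1\n- item 2\n- item 3"},
  {"entryId": 2, "description": null}
]}}}]
EOF

# Update each description in place, then list the results for inspection.
descriptions=$(jq -c '
  .[].stuff.properties.stuff_list[].description |=
    if . != null then split("\n") | map(ltrimstr("- ")) else . end
  | [.[].stuff.properties.stuff_list[].description]
' "$workdir/the_file.json")
echo "$descriptions"
```

The string description becomes an array of three items, while the null one is passed through unchanged.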