Rename JSON key field with value in object

I have the following JSON output:
[
{
"id": "47",
"canUpdate": true,
"canDelete": true,
"canArchive": true,
"info": [
{
"key": "problem_type",
"value": "PAN",
"valueCaption": "PAN",
"keyCaption": "Category"
},
{
"key": "status",
"value": 3,
"valueCaption": "Closed",
"keyCaption": "Status"
},
{
"key": "insert_time",
"value": 1466446314000,
"valueCaption": "2016-06-20 14:11:54.0",
"keyCaption": "Request time"
}
]
}
]
As you can see, under "info" they actually label the key:value pair as "key": "problem_type" and "value": "PAN", and then "valueCaption": "PAN" and "keyCaption": "Category". What I need to do is remap the file so that, in this example, it shows as "problem_type": "PAN" and "Category": "PAN". What would be the best method to iterate through the output to remap the key:value pairs in this manner?
How it needs to be:
[
{
"id": "47",
"canUpdate": true,
"canDelete": true,
"canArchive": true,
"info": [
{
"problem_type": "PAN",
"Category": "PAN"
},
{
"status": 3,
"Status": "Closed"
},
{
"insert_time": 1466446314000,
"Request time": "2016-06-20 14:11:54.0"
}
]
}
]

Here is a jq solution which uses the update assignment operator |=
.[].info[] |= {(.key):.value, (.keyCaption):.valueCaption}
Sample Run (assumes data in data.json)
$ jq -M '.[].info[] |= {(.key):.value, (.keyCaption):.valueCaption}' data.json
[
{
"id": "47",
"canUpdate": true,
"canDelete": true,
"canArchive": true,
"info": [
{
"problem_type": "PAN",
"Category": "PAN"
},
{
"status": 3,
"Status": "Closed"
},
{
"insert_time": 1466446314000,
"Request time": "2016-06-20 14:11:54.0"
}
]
}
]
Try it online at jqplay.org
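If you prefer updating the whole info array rather than streaming through its elements, an equivalent form (a sketch, with the same data.json assumption) uses map:
jq '.[].info |= map({(.key): .value, (.keyCaption): .valueCaption})' data.json
Either way, each info element is replaced by a new two-key object, so the original key/value/keyCaption/valueCaption fields are dropped.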

Related

Map step function result and extract certain keys

I have the following output coming from a Step Functions task (ListObjectsV2):
{
"Contents": [
{
"ETag": "\"86c12c034bc6c30cb89b500b954c188f\"",
"Key": "55271f52fffe4461a2ee3228ebb97157/input/batch_1.csv",
"LastModified": "2023-02-09T13:46:20Z",
"Size": 796014,
"StorageClass": "STANDARD"
},
{
"ETag": "\"58e4a770e0f66073b00d185df500f07f\"",
"Key": "55271f52fffe4461a2ee3228ebb97157/input/batch_2.csv",
"LastModified": "2023-02-09T13:47:20Z",
"Size": 934038,
"StorageClass": "STANDARD"
},
{
"ETag": "\"460abd0de64d5cb67e8f0d46878cb1ef\"",
"Key": "55271f52fffe4461a2ee3228ebb97157/input/batch_3.csv",
"LastModified": "2023-02-09T13:46:57Z",
"Size": 794264,
"StorageClass": "STANDARD"
},
{
"ETag": "\"1bfedc3dc92e4ba8d04e24b9b5a0ed58\"",
"Key": "55271f52fffe4461a2ee3228ebb97157/input/batch_4.csv",
"LastModified": "2023-02-09T13:46:24Z",
"Size": 788756,
"StorageClass": "STANDARD"
},
{
"ETag": "\"9d6c434ce5ebdf203a790fbcf19338dc\"",
"Key": "55271f52fffe4461a2ee3228ebb97157/input/batch_5.csv",
"LastModified": "2023-02-09T13:47:07Z",
"Size": 831156,
"StorageClass": "STANDARD"
}
],
"IsTruncated": false,
"KeyCount": 5,
"MaxKeys": 1000,
"Name": "vita-internal-text-classification-dev-183576513728",
"Prefix": "55271f52fffe4461a2ee3228ebb97157"
}
I want to have an array containing only the Key key, to pass to the next state, like so:
[
{
"Key": "55271f52fffe4461a2ee3228ebb97157/input/batch_1.csv",
},
{
"Key": "55271f52fffe4461a2ee3228ebb97157/input/batch_2.csv",
},
{
"Key": "55271f52fffe4461a2ee3228ebb97157/input/batch_3.csv",
},
{
"Key": "55271f52fffe4461a2ee3228ebb97157/input/batch_4.csv",
},
{
"Key": "55271f52fffe4461a2ee3228ebb97157/input/batch_5.csv",
}
]
So far I've tried setting the ResultPath to:
$.Contents[*].Key
$.Contents[*].['Key']
What I get is:
[
"55271f52fffe4461a2ee3228ebb97157/input/batch_1.csv",
"55271f52fffe4461a2ee3228ebb97157/input/batch_2.csv",
"55271f52fffe4461a2ee3228ebb97157/input/batch_3.csv",
"55271f52fffe4461a2ee3228ebb97157/input/batch_4.csv",
"55271f52fffe4461a2ee3228ebb97157/input/batch_5.csv",
]
But that flat array isn't the output I need. Any help?
The way I've solved this is to use an Inline Map state with a Pass state to build the necessary format; a minimal version adapted to your ListObjectsV2 output follows the full definition below. You can see this pattern in an example here for how to use Step Functions Distributed Map to bulk delete objects from S3, specifically in the inner Create Object Identifier Array Map state. If you were doing this in Standard Workflows, the number of state transitions involved could be a cost concern. But since the Item Processor uses Express Workflows, which are billed by duration (and these are super fast), it works pretty well.
{
"Comment": "A state machine to bulk delete objects from S3 using Distributed Map",
"StartAt": "Confirm Bucket Provided",
"States": {
"Confirm Bucket Provided": {
"Type": "Choice",
"Choices": [
{
"Not": {
"Variable": "$.bucket",
"IsPresent": true
},
"Next": "Fail - No Bucket"
}
],
"Default": "Check for Prefix"
},
"Check for Prefix": {
"Type": "Choice",
"Choices": [
{
"Not": {
"Variable": "$.prefix",
"IsPresent": true
},
"Next": "Generate Parameters - Without Prefix"
}
],
"Default": "Generate Parameters - With Prefix"
},
"Generate Parameters - Without Prefix": {
"Type": "Pass",
"Parameters": {
"Bucket.$": "$.bucket",
"Prefix": ""
},
"ResultPath": "$.list_parameters",
"Next": "Delete Objects from S3 Bucket"
},
"Fail - No Bucket": {
"Type": "Fail",
"Error": "InsuffcientArguments",
"Cause": "No Bucket was provided"
},
"Generate Parameters - With Prefix": {
"Type": "Pass",
"Next": "Delete Objects from S3 Bucket",
"Parameters": {
"Bucket.$": "$.bucket",
"Prefix.$": "$.prefix"
},
"ResultPath": "$.list_parameters"
},
"Delete Objects from S3 Bucket": {
"Type": "Map",
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "DISTRIBUTED",
"ExecutionType": "EXPRESS"
},
"StartAt": "Create Object Identifier Array",
"States": {
"Create Object Identifier Array": {
"Type": "Map",
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "INLINE"
},
"StartAt": "Create Object Identifier",
"States": {
"Create Object Identifier": {
"Type": "Pass",
"End": true,
"Parameters": {
"Key.$": "$.Key"
}
}
}
},
"ItemsPath": "$.Items",
"ResultPath": "$.object_identifiers",
"Next": "Delete Objects"
},
"Delete Objects": {
"Type": "Task",
"Next": "Clear Output",
"Parameters": {
"Bucket.$": "$.BatchInput.bucket",
"Delete": {
"Objects.$": "$.object_identifiers"
}
},
"Resource": "arn:aws:states:::aws-sdk:s3:deleteObjects",
"Retry": [
{
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2,
"IntervalSeconds": 1,
"MaxAttempts": 6
}
],
"ResultSelector": {
"Deleted.$": "$.Deleted",
"RetryCount.$": "$$.State.RetryCount"
}
},
"Clear Output": {
"Type": "Pass",
"End": true,
"Result": {}
}
}
},
"ItemReader": {
"Resource": "arn:aws:states:::s3:listObjectsV2",
"Parameters": {
"Bucket.$": "$.list_parameters.Bucket",
"Prefix.$": "$.list_parameters.Prefix"
}
},
"MaxConcurrency": 5,
"Label": "S3objectkeys",
"ItemBatcher": {
"MaxInputBytesPerBatch": 204800,
"MaxItemsPerBatch": 1000,
"BatchInput": {
"bucket.$": "$.list_parameters.Bucket"
}
},
"ResultSelector": {},
"End": true
}
}
}
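For the ListObjectsV2 output in the question specifically, the inner pattern boils down to something like the sketch below. The state names, ItemsPath, ResultPath, and Next target are placeholders to adapt to your own machine (the Comment field repeats that inside the definition):
"Build Key Array": {
"Type": "Map",
"Comment": "Sketch only: state names and paths are placeholders",
"ItemsPath": "$.Contents",
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "INLINE"
},
"StartAt": "Pick Key",
"States": {
"Pick Key": {
"Type": "Pass",
"Parameters": {
"Key.$": "$.Key"
},
"End": true
}
}
},
"ResultPath": "$.object_keys",
"Next": "Next State"
}
This produces [{"Key": "..."}, ...] at $.object_keys while leaving the rest of the state input intact.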

JQ - unique count of each value in an array

I need to solve this with jq. I have large lists of arrays in my JSON file and need to do something like sort | uniq -c on them. Specifically, I have a rather nasty looking fruit array whose contents I need to break down. I'm aware of unique and the like, and imagine there is a simple way to do this, but I've been trying to assign things to variables and append and so on, and I can't get the most basic part working: counting the unique values in that fruit array, especially not without breaking the rest of the content (hence the variable ideas). Please tell me I'm overthinking this.
I'd like to turn this:
[
{
"uid": "123abc",
"tID": [
"T19"
],
"fruit": [
"Kiwi",
"Apple",
"",
"",
"",
"Kiwi",
"",
"Kiwi",
"",
"",
"Mango",
"Kiwi"
]
},
{
"uid": "456xyz",
"tID": [
"T15"
],
"fruit": [
"",
"Orange"
]
}
]
Into this:
[
{
"uid": "123abc",
"tID": [
"T19"
],
"metadata": [
{
"name": "fruit",
"value": "Kiwi - 3"
},
{
"name": "fruit",
"value": "Mango - 1"
},
{
"name": "fruit",
"value": "Apple - 1"
}
]
},
{
"uid": "456xyz",
"tID": [
"T15"
],
"metadata": [
{
"name": "fruit",
"value": "Orange - 1"
}
]
}
]
Using group_by and length would be one way:
jq '
map(with_entries(select(.key == "fruit") |= (
.value |= (group_by(.) | map(
{name: "fruit", value: "\(.[0] | select(. != "")) - \(length)"}
))
| .key = "metadata"
)))
'
[
{
"uid": "123abc",
"tID": [
"T19"
],
"metadata": [
{
"name": "fruit",
"value": "Apple - 1"
},
{
"name": "fruit",
"value": "Kiwi - 4"
},
{
"name": "fruit",
"value": "Mango - 1"
}
]
},
{
"uid": "456xyz",
"tID": [
"T15"
],
"metadata": [
{
"name": "fruit",
"value": "Orange - 1"
}
]
}
]
Demo
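An alternative sketch under the same assumptions, which builds the metadata array explicitly and then drops the original fruit array:
jq 'map(
.metadata = (.fruit | group_by(.) | map(
select(.[0] != "") | {name: "fruit", value: "\(.[0]) - \(length)"}
))
| del(.fruit)
)'
Because map drops results where its filter produces nothing, the group of empty strings disappears here as well, just as it does in the answer above.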

Json file editing through jq

I have this JSON:
{
"file1": [{
"username": "myname",
"groupname": "mypassword",
"environment": [{
"name": "UMASK",
"value": "022"
},
{
"name": "DEBUG",
"value": "2"
}]
}]
}
and want to change the value of DEBUG to 5.
I tried the command below:
jq .file1[0].environment sandeep.json | jq '(.[] | select(.name == "DEBUG") | .value) |= "5"'
This returns only a specific portion of the JSON:
[
{
"name": "UMASK",
"value": "022"
},
{
"name": "DEBUG",
"value": "5"
}
]
but I want to see the full JSON with the changed value:
{
"file1": [{
"username": "myname",
"groupname": "mypassword",
"environment": [{
"name": "UMASK",
"value": "022"
},
{
"name": "DEBUG",
"value": "5"
}]
}]
}
Please advise.
It should be:
jq '(.file1[].environment[]|select(.name=="DEBUG").value) |= 5' file.json
Output:
{
"file1": [
{
"username": "myname",
"groupname": "mypassword",
"environment": [
{
"name": "UMASK",
"value": "022"
},
{
"name": "DEBUG",
"value": 5
}
]
}
]
}
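Note that |= 5 writes the value as a number, which is why the output shows 5 without quotes while UMASK keeps "022" as a string. If you want DEBUG stored as a string like the other values, quote it in the filter:
jq '(.file1[].environment[]|select(.name=="DEBUG").value) |= "5"' file.json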

jq - extract additional JSON object into new array

I have some JSON. It looks like this:
{
"Volumes": [
{
"Attachments": [
{
"VolumeId": "vol-11111111",
"State": "attached",
"DeleteOnTermination": false,
"Device": "/dev/sdz"
}
],
"Tags": [
{
"Value": "volume1",
"Key": "Name"
},
{
"Value": "00:00",
"Key": "Start"
},
{
"Value": "00:20",
"Key": "Finish"
},
{
"Value": "2",
"Key": "Period"
}
],
"VolumeId": "vol-11111111"
},
{
"Attachments": [
{
"VolumeId": "vol-22222222",
"State": "attached",
"DeleteOnTermination": false,
"Device": "/dev/sdz"
}
],
"Tags": [
{
"Value": "volume2",
"Key": "Name"
},
{
"Value": "00:00",
"Key": "Start"
},
{
"Value": "00:20",
"Key": "Finish"
},
{
"Value": "2",
"Key": "Period"
}
],
"VolumeId": "vol-22222222"
},
{
"Attachments": [
{
"VolumeId": "vol-333333333",
"State": "attached",
"DeleteOnTermination": false,
"Device": "/dev/sdz"
}
],
"Tags": [
{
"Value": "volume3",
"Key": "Name"
},
{
"Value": "00:00",
"Key": "Start"
},
{
"Value": "00:20",
"Key": "Finish"
},
{
"Value": "2",
"Key": "Period"
}
],
"VolumeId": "vol-33333333"
}
]
}
Using jq, I am able to extract the following information:
VolumeId,Finish,Start,Period
using the jq command:
cat json | jq -r '[.Volumes[]|({VolumeId}+(.Tags|from_entries))|{VolumeId,Finish,Start,Period}]'
[
{
"VolumeId": "vol-11111111",
"Finish": "00:20",
"Start": "00:00",
"Period": "2"
},
{
"VolumeId": "vol-22222222",
"Finish": "00:20",
"Start": "00:00",
"Period": "2"
},
{
"VolumeId": "vol-33333333",
"Finish": "00:20",
"Start": "00:00",
"Period": "2"
}
]
All this works fine. However, I also need to extract .Attachments.Device. I am looking for output similar to this for each element:
[
{
"VolumeId": "vol-11111111",
"Finish": "00:20",
"Start": "00:00",
"Period": "2",
"DeviceId": "/dev/sdz"
},
{
"VolumeId": "vol-22222222",
"Finish": "00:20",
"Start": "00:00",
"Period": "2",
"DeviceId": "/dev/sdz"
},
{
"VolumeId": "vol-33333333",
"Finish": "00:20",
"Start": "00:00",
"Period": "2",
"DeviceId": "/dev/sdz"
}
]
However I can't figure out how to do this without getting an error. The most logical approach for me would be to do something like:
cat json | jq -r '[.Volumes[]|({VolumeId}+(.Attachments|from_entries)+(.Tags|from_entries))|{VolumeId,Finish,Start,Period,DeviceId}]'
However I get the error:
jq: error (at <stdin>:91): Cannot use null (null) as object key
Any help figuring out what I am doing wrong and how to fix it would be greatly appreciated.
Thanks.
Ultimately, the problem is that you're using from_entries on the Attachments array, where it won't work. from_entries takes an array of key/value pair objects and builds an object from those pairs. The items in Attachments aren't key/value pairs, though, they're plain objects. If you're just trying to combine them, you should use add.
Also, there is no property named DeviceId; it's Device. If you want the Device property output under the name DeviceId, you need to rename it explicitly.
.Volumes | map(
({ VolumeId } + (.Attachments | add) + (.Tags | from_entries))
| { VolumeId, Finish, Start, Period, DeviceId: .Device }
)
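For reference, run against the file from the question (assumed here to be named json, matching the cat json invocations above), the full command would look something like this:
jq '.Volumes | map(
({ VolumeId } + (.Attachments | add) + (.Tags | from_entries))
| { VolumeId, Finish, Start, Period, DeviceId: .Device }
)' json
add simply merges the objects inside Attachments (each array here holds a single attachment), which makes Device available for the final object construction.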

Extract values in json object with awk/sed, but cannot get it to work

I have a file containing the response of a curl call, in JSON form. Each object has a set of values, but the parameter names are the same across objects. See the code below.
These objects are part of a larger object called workflow. The Cleaning up object is the last process that runs in our workflow. For every video that passes through the workflow, a JSON file in this format is created. (There are more than just these three objects; this is only for illustrative purposes.)
I want to take the value of completed of the object with "description": "Cleaning up" and store it as a variable $end_time. Then I want to take the value of completed of the object with "description": "Ingest" and store it as a variable $start_time. These two values are then subtracted to give me an integer time in milliseconds, so I can calculate how long it took for the video to go through this part of the process. I am fine with the maths part and know how to do it; it is the extraction of the values that I am struggling with.
I hope this makes sense. Any help would be appreciated. Thank you in advance!
EDIT: Had to delete original code in post, due to character limitations
Here is a proper example of the file that I have to work with:
{
"workflows": {
"count": "20",
"searchTime": "1",
"startPage": "0",
"totalCount": "1",
"workflow": {
"configurations": {
"configuration": [
{
"$": "1409750880000",
"key": "schedule.start"
},
{
"$": "1409755980000",
"key": "schedule.stop"
},
{
"$": "Capture_agent",
"key": "schedule.location"
},
{
"$": "false",
"key": "trimHold"
},
{
"$": "true",
"key": "archiveOp"
},
{
"$": "false",
"key": "captionHold"
},
{
"$": "false",
"key": "videoPreview"
}
]
},
"creator": {
"organization": "mh_default_org",
"roles": [
"76b1bdde-a080-40a4-b929-bde89af6a0a8_Instructor",
"ROLE_ADMIN",
"ROLE_ANONYMOUS",
"ROLE_USER"
],
"userName": user_name
},
"description": "This workflow definition defines the steps involved in scheduling a recording, capturing it, and\n ingesting it, after which processing operations may be added.\n ",
"errors": "",
"id": "15518",
"mediapackage": {
"attachments": "",
"creators": {
"creator": "Name"
},
"id": "2d25ed19-2978-458d-a4a0-c9c56d791c68",
"license": "Creative Commons 3.0: Attribution-NonCommercial-NoDerivs",
"media": "",
"metadata": "",
"publications": {
"publication": {
"channel": "engage-player",
"id": "b7b68f91-2c33-4673-ba7c-2e9b891788f9",
"mimetype": "text/html",
"tags": "",
"url": "http://some.url.com:80/engage/ui/watch.html?id=2d25ed19-2978-458d-a4a0-c9c56d791c68"
}
},
"series": "76b1bdde-a080-40a4-b929-bde89af6a0a8",
"seriestitle": "Recording_Title_user_name",
"start": "2014-09-03T13:28:00Z",
"title": "Recording_Title"
},
"operations": {
"operation": [
{
"abortable": "false",
"completed": 1409750882092,
"configurations": {
"configuration": [
{
"$": "1409750880000",
"key": "schedule.start"
},
{
"$": "1409755980000",
"key": "schedule.stop"
},
{
"$": "Capture_agent",
"key": "schedule.location"
}
]
},
"continuable": "false",
"description": "Scheduled",
"execution-history": "",
"execution-host": "http://some.url.com:8080",
"fail-on-error": "true",
"failed-attempts": "0",
"hold-action-title": "View schedule",
"holdurl": "/workflow/hold/org.opencastproject.workflow.handler.scheduleworkflowoperationhandler",
"id": "schedule",
"job": "15519",
"max-attempts": "1",
"retry-strategy": "none",
"started": 1409750881745,
"state": "SUCCEEDED",
"time-in-queue": 0
},
{
"abortable": "false",
"configurations": "",
"continuable": "false",
"description": "Capture",
"execution-history": "",
"execution-host": "http://some.url.com:8080",
"fail-on-error": "true",
"failed-attempts": "0",
"hold-action-title": "Monitor capture",
"holdurl": "/workflow/hold/org.opencastproject.workflow.handler.captureworkflowoperationhandler",
"id": "capture",
"job": "42894",
"max-attempts": "1",
"retry-strategy": "none",
"started": 1409750884085,
"state": "SKIPPED",
"time-in-queue": 0
},
{
"completed": 1409756171224,
"configurations": "",
"description": "Ingest",
"execution-history": "",
"fail-on-error": "true",
"failed-attempts": "0",
"id": "ingest",
"max-attempts": "1",
"retry-strategy": "none",
"state": "SUCCEEDED"
},
{
"completed": 1409854379552,
"configurations": {
"configuration": {
"key": "preserve-flavors"
}
},
"description": "Cleaning up",
"execution-history": "",
"execution-host": "http://some.url.com:8080",
"fail-on-error": "false",
"failed-attempts": "0",
"id": "cleanup",
"job": "45113",
"max-attempts": "1",
"retry-strategy": "none",
"started": 1409854378128,
"state": "SUCCEEDED",
"time-in-queue": 0
}
]
},
"organization": {
"adminRole": "ROLE_ADMIN",
"anonymousRole": "ROLE_ANONYMOUS",
"id": "mh_default_org",
"name": "Opencast Project",
"properties": {
"property": [
{
"$": "true",
"key": "adminui.i18n_tab_episode.enable"
},
{
"$": "false",
"key": "adminui.i18n_tab_users.enable"
},
{
"$": "/engage/ui/img/mh_logos/OpencastLogo.png",
"key": "logo_small"
},
{
"$": "http://opencast.org/matterhorn/",
"key": "engageui.link_mobile_redirect.url"
},
{
"$": "false",
"key": "engageui.annotations.enable"
},
{
"$": "true",
"key": "engageui.links_media_module.enable"
},
{
"$": "2024",
"key": "adminui.chunksize"
},
{
"$": "false",
"key": "adminui.series_prepopulate.enable"
},
{
"$": "true",
"key": "engageui.link_download.enable"
},
{
"$": "false",
"key": "engageui.link_mobile_redirect.enable"
},
{
"$": "For more information have a look at the official site.",
"key": "engageui.link_mobile_redirect.description"
},
{
"$": "/engage/ui/img/mh_logos/MatterhornLogo_large.png",
"key": "logo_large"
}
]
},
"servers": {
"server": {
"name": "localhost",
"port": "8080"
}
}
},
"parent": {
"nil": "true"
},
"state": "SUCCEEDED",
"template": "full",
"title": "Scheduled Workflow"
}
}
}
Here is a jq example that should point you toward what you want:
#!/bin/bash
# Assuming the json is in a file workflow.json
end_time=$( jq '.workflows.workflow.operations.operation[] | select(.description == "Cleaning up") | .completed' < workflow.json )
start_time=$( jq '.workflows.workflow.operations.operation[] | select(.description == "Ingest") | .completed' < workflow.json )
This assumes the input has the structure shown above, with the operations under .workflows.workflow.operations.operation. Here it is on the command line:
$ jq '.workflows.workflow.operations.operation[] | select(.description == "Ingest") | .completed' < workflow.json
1406051539118
$ jq '.workflows.workflow.operations.operation[] | select(.description == "Cleaning up") | .completed' < workflow.json
1406051695440
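If you would rather let jq do the subtraction as well, a single invocation can produce the elapsed time directly. This is a sketch against the sample structure above; the elapsed_ms variable name is just an example:
#!/bin/bash
# Difference in milliseconds between the "Cleaning up" and "Ingest" completion times
elapsed_ms=$( jq '
.workflows.workflow.operations.operation
| (map(select(.description == "Cleaning up")) | .[0].completed)
- (map(select(.description == "Ingest")) | .[0].completed)
' < workflow.json )
echo "Ingest to cleanup took ${elapsed_ms} ms"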