Finding a substring from a JSON attribute with jq

I know how to retrieve an entire JSON attribute with jq, but I want to extract only a specific substring. Consider the following sample input:
[
{
"name": "test",
"output": "",
"error": "",
"state": "unknown",
"startTime": 1571292623936,
"endTime": 0,
"extra": {},
"warning": "************************* test Warnings *************************\n*\n* \n*****************************************************************",
"hasWarning": false
},
{
"name": "npm run test",
"output": "\n> DISPLAY was set to: \":99\"\n\nCypress will attempt to fix the problem and rerun.\n\n\n Running: consumer/oct.js... (1 of 1) \nPROCESSING JS RESOURCE FILE FROM:/PMT1469/workspace/E2EI/cypress/e2e/consumer/kindle.js\n{\"dataFile\":\"scripts/regression/transfers/card/kindle.csv\"}\nSENDING JS RESOURCE FILE FROM: /PMT-1469/workspace/E2E-UI { startedTestsAt: '2019-10-17T06:10:59.339Z',\n endedTestsAt: '2019-10-17T06:11:53.542Z',\n totalDuration: 54203,\n totalSuites: 4,\n totalTests: 2,\n totalFailed: 2,\n totalPassed: 0,\n totalPending: 0,\n totalSkipped: 0,\n\n browserPath: '',\n browserName: 'electron',\n reporter: 'mochawesome',\n taskTimeout: 60000,\n video: true,\n known: true }\n",
"error": null,
"state": "success",
"startTime": 1571292631223,
"endTime": 1571292718780,
"extra": {},
"warning": "************************* npm run test Warnings *************************\n*\n* \n*************************************************************************",
"hasWarning": false
}
]
I want to pick only the following values out of the "output" attribute of the JSON payload above.
Expected output:
totalDuration: 54203
totalSuites: 4
totalFailed: 2
totalPassed: 0
totalSkipped: 0
We can easily fetch the attribute values using jq -r '.[].output', but I'm trying to only capture substrings of the form total<something>: <number>.

The inefficient-but-easy answer is to do the bulk of the work in a separate pipeline stage. Assuming GNU tools:
jq -r '.[].output' <in.json \
| grep -Eo '^[[:space:]]+(total[[:alpha:]]+: [[:digit:]]+)' \
| sed -re 's/^[[:space:]]+//'
However, with jq 1.5 or later (built with regex support), scan does the whole job in a single step:
jq -r '.[].output | scan("total[[:alpha:]]+: [[:digit:]]+")' <in.json
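As a small illustration of the scan approach, the sketch below runs it against a trimmed-down, hypothetical sample and narrows the pattern to the five names listed in the expected output. The variable sample and the helper extract_totals are made up for this example, and a jq build with regex support is assumed.

```shell
# Hypothetical, trimmed-down version of the input from the question.
sample='[{"output":""},{"output":" totalDuration: 54203,\n totalSuites: 4,\n totalTests: 2,\n totalFailed: 2,\n totalPassed: 0,\n totalPending: 0,\n totalSkipped: 0,\n"}]'

# extract_totals is an illustrative helper name, not part of jq.
# The (?:...) group is non-capturing, so scan emits each whole match as a string.
extract_totals() {
  jq -r '.[].output
         | scan("total(?:Duration|Suites|Failed|Passed|Skipped): [[:digit:]]+")'
}

totals=$(printf '%s' "$sample" | extract_totals)
printf '%s\n' "$totals"
```

With the original pattern total[[:alpha:]]+ the totalTests and totalPending lines would be printed as well; listing the names explicitly is one way to leave them out.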

Related

How to populate value with input

From a composer.lock I am creating json output of some fields, like this:
<composer.lock jq '.packages[] | {site: "foo", name: .name, version: .version, type: .type}' | jq -s
It works fine:
{
"site": "foo",
"name": "webmozart/path-util",
"version": "2.3.0",
"type": "library"
},
{
"site": "foo",
"name": "webonyx/graphql-php",
"version": "v14.9.0",
"type": "library"
}
but now I want to add a generated value, using uuidgen, but I can't figure out how to do that.
End result would be something like:
{
"site": "foo",
"name": "webmozart/path-util",
"version": "2.3.0",
"type": "library",
"uuid": "c4e97c3c-147d-4360-a601-6b4f6f5e71bb"
},
{
"site": "foo",
"name": "webonyx/graphql-php",
"version": "v14.9.0",
"type": "library",
"uuid": "6fbe472b-49fe-4064-93f0-09a18a7e1c24"
}
I think I should use input, but everything I have tried so far has failed.

If your shell supports process substitution, you can avoid a two-pass solution as follows:
jq -nR --argfile input composer.lock '
$input
| .packages[]
| {site: "foo", name: .name, version: .version, type: .type, uuid: input}
' < <(while true; do uuidgen; done)
The point here is that by using process substitution as above, the termination of the "consumer" (jq) will also terminate the "producer" of UUIDs.
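To see how input hands out one raw line per package, here is a minimal sketch with a deterministic stand-in for uuidgen. Everything named here is hypothetical (the temporary lock file, fake_ids, the id-N values). It also uses --slurpfile, which the jq manual recommends over --argfile, and a plain pipe instead of process substitution; the producer loop stops as soon as its write fails after jq has exited.

```shell
# Hypothetical stand-in for composer.lock with two packages.
lock=$(mktemp)
printf '%s\n' '{"packages":[{"name":"a/b","version":"1.0","type":"library"},{"name":"c/d","version":"v2","type":"library"}]}' > "$lock"

# Stand-in for `uuidgen`: prints id-1, id-2, ... until a write fails.
fake_ids() {
  i=1
  while printf 'id-%d\n' "$i"; do i=$((i + 1)); done
}

# -n: do not read stdin as JSON; -R: `input` returns raw lines as strings.
# --slurpfile binds $lock to an array holding the file's JSON values.
result=$(fake_ids 2>/dev/null | jq -nRc --slurpfile lock "$lock" '
  $lock[0].packages[]
  | {site: "foo", name: .name, version: .version, type: .type, uuid: input}')
printf '%s\n' "$result"
rm -f "$lock"
```

Replacing fake_ids with a loop around uuidgen should give one fresh UUID per package.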

Here's one possibility that does not require your shell to support "process substitution" but which entails two passes, the first for determining the number of UUIDs that are required:
# Syntax: generate N
function generate {
for ((i=0; i<$1; i++)) ; do
uuidgen
done
}
generate $( <composer.lock jq '.packages|length') |
jq -nR --argfile input composer.lock '
($input
| .packages[]
| {site: "foo", name: .name, version: .version, type: .type,
uuid: input})'

Combine files in jq based on similar ID object and reform data

Preface: If the following is not possible with jq, then I completely accept that as an answer and will try to force this with bash.
I have two files that contain some IDs that, with some massaging, should be able to be combined into a single file. I have some content that I'll add to that as well (as seen in the output). Essentially "mitre_test" should get compared to "sys_id". When they match, the "mitreid" from in1.json becomes techniqueID in the output (and is generally the unifying field of each output object).
Caveats:
There are some junk "desc" values in in2.json that are there to make sure this is as programmatic as possible; the real input file I am using has numerous junk fields.
Some of the mitre_test values are comma-separated pairs rather than real arrays. I can split on those and break them out, but then I find myself losing the other information from in2.json.
Notice that the "metadata" in the output contains the "number" values from in2.json, stored in a weird way (but the way that the receiving tool requires).
in1.json
[
{
"test": "Execution",
"mitreid": "T1204.001",
"mitre_test": "90b"
},
{
"test": "Defense Evasion",
"mitreid": "T1070.001",
"mitre_test": "afa"
},
{
"test": "Credential Access",
"mitreid": "T1556.004",
"mitre_test": "14b"
},
{
"test": "Initial Access",
"mitreid": "T1200",
"mitre_test": "f22"
},
{
"test": "Impact",
"mitreid": "T1489",
"mitre_test": "fa2"
}
]
in2.json
[
{
"number": "REL0001346",
"desc": "apple",
"mitre_test": "afa"
},
{
"number": "REL0001343",
"desc": "pear",
"mitre_test": "90b"
},
{
"number": "REL0001366",
"desc": "orange",
"mitre_test": "14b,f22"
},
{
"number": "REL0001378",
"desc": "pineapple",
"mitre_test": "90b"
}
]
The output:
[{
"techniqueID": "T1070.001",
"tactic": "defense-evasion",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001346"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1204.001",
"tactic": "execution",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001343"
},
{
"name": "DET_ID",
"value": "REL0001378"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1556.004",
"tactic": "credential-access",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001366"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1200",
"tactic": "initial-access",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001366"
}],
"showSubtechniques": true
}
]
I'm assuming I have some splitting to do on mitre_test with something like .mitre_test |= split(","), and I assume some joins are needed, but doing so causes data loss or mixes up the data. You'll notice the static data in the output as well, but it is likely easy to add and as such isn't as much of an issue.
Edit: reduced some of the match IDs so that it is easier to look at while analyzing the in1 and in2 files. Also simplified the two inputs to have a similar structure so that the answer is easier to understand later.

The requirements are somewhat opaque but it's fairly clear that if the task can be done by computer, it can be done using jq.
From the description, it would appear that one of the unusual aspects of the problem is that the "dictionary" defined by in1.json must be derived by splitting the key names that are CSV (comma-separated values). Here therefore is a jq def that will do that:
# Input: a JSON dictionary for which some keys are CSV,
# Output: a JSON dictionary with the CSV keys split on the commas
def refine:
. as $in
| reduce keys_unsorted[] as $k ({};
if ($k|index(","))
then ($k/",") as $keys
| . + ($keys | map( {(.): $in[$k]}) | add)
else .[$k] = $in[$k]
end );
You can see how this works by running:
INDEX($mitre.records[]; .mitre_test) | refine
using an invocation of jq such as:
jq --argfile mitre in1.json -f program.jq in2.json
For the joining part of the problem, there are many relevant Q&As on SO, e.g.
How to join JSON objects on particular fields using jq?
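To make the effect of refine concrete, the sketch below applies it to a hypothetical two-row extract and keeps only the number for each key. The variable names rows and refined are made up for this example, and INDEX assumes jq 1.6 or later.

```shell
# Hypothetical two-row extract of the file that holds the CSV ids.
rows='[{"number":"REL0001346","mitre_test":"afa"},{"number":"REL0001366","mitre_test":"14b,f22"}]'

# -S sorts the keys so the printed result does not depend on insertion order.
refined=$(printf '%s' "$rows" | jq -cS '
  def refine:
    . as $in
    | reduce keys_unsorted[] as $k ({};
        if ($k | index(","))
        then ($k / ",") as $keys
             | . + ($keys | map({(.): $in[$k]}) | add)
        else .[$k] = $in[$k]
        end);
  # Index rows by mitre_test, split the CSV keys, keep only the number.
  INDEX(.[]; .mitre_test) | refine | map_values(.number)')
printf '%s\n' "$refined"
```

Note that INDEX keeps a single row per key, so two rows sharing the same mitre_test (such as "90b" in the sample data) would need grouping rather than indexing.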

There is probably a much more elegant way to do this, but I ended up manually walking around things and piping to new output.
Explanation:
Read in both files, pull the fields I need.
Break out the mitre_test values that were previously just a comma separated set of values with map and try.
Store the non-changing fields as a variable and then manipulate mitre_test to become an appropriately split array, removing nulls.
Group by mitre_test values, since they are the common thing that the output is based on.
Cleanup more nulls.
Sort output to look like I want it.
jq . in1.json in2.json |
jq '.[] | {number: .number, test: .test, mitreid: .mitreid, mitre_test: .mitre_test}' |
jq -s '[. | map(try(.mitre_test |= split(",")) // .)
| .[] | [.number,.test,.mitreid] as $h | .mitre_test[] | $h + [.]
| {DET_ID: .[0], tactic: .[1], techniqueID: .[2], mitre_test: .[3]}]
| del(.[][] | nulls)' |
jq '[group_by(.mitre_test)[] | {mitre_test: .[0].mitre_test, techniqueID: [.[].techniqueID], tactic: [.[].tactic], DET_ID: [.[].DET_ID]}]
| del(.[].techniqueID[] | nulls) | del(.[].tactic[] | nulls) | del(.[].DET_ID[] | nulls)' |
jq '.[] | [{techniqueID: .techniqueID[0], tactic: .tactic[0], metadata: [{name: "DET_ID", value: .DET_ID[]}]}] | .[]
| select((.metadata|length)>0)'
It was a long line, so I split it among some of the basic ideas.
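For comparison, the same join can probably be done in a single jq invocation. The following is only a sketch under the assumption that the files look like the in1.json and in2.json listings in the question; join_detections is a hypothetical helper name, and the static fields are copied from the expected output.

```shell
# join_detections TECHNIQUES_FILE DETECTIONS_FILE
# (hypothetical helper; $1 has the in1.json layout, $2 the in2.json layout)
join_detections() {
  jq -n --slurpfile tech "$1" --slurpfile det "$2" '
    # Map each mitre_test id to the list of detection numbers that mention it.
    (reduce ($det[0][] | {number, id: (.mitre_test | split(",")[])}) as $d
       ({}; .[$d.id] += [$d.number])) as $numbers
    | [ $tech[0][]
        # Drop techniques that no detection refers to.
        | select($numbers[.mitre_test])
        | { techniqueID: .mitreid,
            tactic: (.test | ascii_downcase | gsub(" "; "-")),
            score: 1, color: "", comment: "", enabled: true,
            metadata: ($numbers[.mitre_test] | map({name: "DET_ID", value: .})),
            showSubtechniques: true } ]'
}
```

It would be called as join_detections in1.json in2.json. The objects come out in the order of the first file, which may differ from the order shown in the question.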

How to parse json values stored in shell variable

#!/bin/bash
DESCRIBE_VPC=$(aws ec2 describe-vpcs --region us-west-2)
The JSON returned by aws ec2 describe-vpcs --region us-west-2 is stored in DESCRIBE_VPC and looks like this:
> echo $DESCRIBE_VPC
{
"Vpcs": [
{
"VpcId": "vpc-12345678910",
"InstanceTenancy": "default",
"Tags": [
{
"Value": "arn:aws:cloudformation:us-west-2:12345678910:stack/vpc/0123456-vpcid",
"Key": "stack-id"
},
{
"Value": "vpc-type",
"Key": "Name"
}
],
"CidrBlockAssociationSet": [
{
"AssociationId": "vpc-cidr-123456",
"CidrBlock": "123.456.89.10",
"CidrBlockState": {
"State": "associated"
}
}
],
"State": "available",
"DhcpOptionsId": "dpt-01234567",
"OwnerId": "12345678910",
"CidrBlock": "123.456.789.10",
"IsDefault": false
}
]
}
[root@ip bin]# jq '.Vpcs' $DESCRIBE_VPC
jq: error: Could not open file {: No such file or directory
jq: error: Could not open file "Vpcs":: No such file or directory
parse error: Invalid numeric literal at line 1, column 57
Any suggestions on how to parse the JSON value stored in the variable?

The shell variable holds the JSON itself, but jq treats every argument after the filter as a file name; the unquoted $DESCRIBE_VPC is also split into words, which is why jq tries to open files called { and "Vpcs":. Instead, feed the variable's content to jq on standard input. There are many ways to do this; a simple one is:
echo "$DESCRIBE_VPC" | jq '.Vpcs'
or
printf "%s" "$DESCRIBE_VPC" | jq '.Vpcs'
or this (for bash shell):
jq '.Vpcs' <<< "$DESCRIBE_VPC"
jq can also take shell values as variables, either as strings (--arg) or as parsed JSON (--argjson). For example, you could do this:
jq -n --argjson x "$DESCRIBE_VPC" '$x.Vpcs'
But piping the variable into jq, as in the first example, is usually better.
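Both routes can be tried on a cut-down, hypothetical value (the variable contents below are abbreviated from the question, and vpc_ids and name_tag are names chosen for this sketch):

```shell
# Abbreviated, hypothetical stand-in for the aws output held in the variable.
DESCRIBE_VPC='{"Vpcs":[{"VpcId":"vpc-12345678910","Tags":[{"Key":"stack-id","Value":"0123456-vpcid"},{"Key":"Name","Value":"vpc-type"}]}]}'

# Route 1: send the variable to jq on standard input (always quote it).
vpc_ids=$(printf '%s' "$DESCRIBE_VPC" | jq -r '.Vpcs[].VpcId')

# Route 2: pass it as a JSON variable; -n tells jq not to read any input.
name_tag=$(jq -rn --argjson x "$DESCRIBE_VPC" \
  '$x.Vpcs[0].Tags[] | select(.Key == "Name") | .Value')

printf '%s\n' "$vpc_ids" "$name_tag"
```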

Replace a keyword with the content of the file

I have a templatized json file called template.json as below:
{
"subject": "Some subject line",
"content": $CONTENT,
}
I have another file called sample.json with the json content as below:
{
"status": "ACTIVE",
"id": 217,
"type": "TEXT",
"name": "string",
"subject": "string",
"url": "contenttemplates/217",
"content": {
"text": "hello ${user_name}",
"variables": [{
"key": "${user_name}",
"value": null
}]
},
"content_footer": null,
"audit": {
"creator": "1000",
"timestamp": 1548613800000,
"product": "2",
"channel": "10",
"party": null,
"event": {
"type": null,
"type_id": "0",
"txn_id": "0"
},
"client_key": "pk6781gsfr5"
}
}
I want to replace $CONTENT in template.json with the value of the "content" key from sample.json. I tried the sed command below:
sed -i 's/$CONTENT/'$(jq -c '.content' sample.json)'/' template.json
I get the following error:
sed: -e expression #1, char 15: unterminated `s' command
Can someone please help me to get the right sed command (or any other alternative)?

The jq Cookbook has a section on using jq with templates: https://github.com/stedolan/jq/wiki/Cookbook#using-jq-as-a-template-engine
In the present case, the first technique ("Using jq variables as template variables") matches the already-defined template file (except for the dangling comma), so you could for example write:
jq -n --argjson CONTENT "$(jq -c .content sample.json)" '
{"subject": "Some subject line", "content": $CONTENT}'
or, with the template (minus the dangling comma) saved as template.jq:
jq -n --argjson CONTENT "$(jq -c .content sample.json)" -f template.jq
(I'd only use the .json suffix for files that hold JSON or JSON streams.)
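For $CONTENT to end up as a nested object rather than a quoted string, the value has to reach jq as JSON: --argjson parses its argument, while --arg always passes a string. A small sketch of the difference, using a hypothetical shortened sample file (the names sample, as_json and as_string are made up here):

```shell
# Hypothetical, shortened stand-in for sample.json.
sample=$(mktemp)
printf '%s\n' '{"id":217,"content":{"text":"hello ${user_name}","variables":[{"key":"${user_name}","value":null}]}}' > "$sample"

# --argjson: $CONTENT is the parsed object.
as_json=$(jq -nc --argjson CONTENT "$(jq -c .content "$sample")" \
  '{"subject": "Some subject line", "content": $CONTENT}')

# --arg: $CONTENT is a string that merely contains JSON text.
as_string=$(jq -nc --arg CONTENT "$(jq -c .content "$sample")" \
  '{"subject": "Some subject line", "content": $CONTENT}')

printf '%s\n' "$as_json" "$as_string"
rm -f "$sample"
```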

The output from jq contains spaces, so you need to quote it to prevent the shell from splitting it into separate words.
sed -i 's/$CONTENT/'"$(jq -c '.content' sample.json)/" template.json
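Note that this still breaks if the JSON contains a /, an & or a backslash, all of which are special in a sed replacement. See also: When to wrap quotes around a shell variable?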

With GNU sed:
sed '/$CONTENT/{s/.*/jq -c ".content" sample.json/e}'
This replaces the entire matching line with your command; the e flag (GNU sed only) then executes the pattern space as a command and replaces it with that command's output.

How to use `jq` to obtain the keys

My json looks like this :
{
"20160522201409-jobsv1-1": {
"vmStateDisplayName": "Ready",
"servers": {
"20160522201409 jobs_v1 1": {
"serverStateDisplayName": "Ready",
"creationDate": "2016-05-22T20:14:22.000+0000",
"state": "READY",
"provisionStatus": "PENDING",
"serverRole": "ROLE",
"serverType": "SERVER",
"serverName": "20160522201409 jobs_v1 1",
"serverId": 2902
}
},
"isAdminNode": true,
"creationDate": "2016-05-22T20:14:23.000+0000",
"totalStorage": 15360,
"shapeId": "ot1",
"state": "READY",
"vmId": 4353,
"hostName": "20160522201409-jobsv1-1",
"label": "20160522201409 jobs_v1 ADMIN_SERVER 1",
"ipAddress": "10.252.159.39",
"publicIpAddress": "10.252.159.39",
"usageType": "ADMIN_SERVER",
"role": "ADMIN_SERVER",
"componentType": "jobs_v1"
}
}
My key keeps changing from time to time, so 20160522201409-jobsv1-1 may be something else tomorrow. I may also have more than one such entry in the JSON payload.
I want to echo $KEYS and I am trying to do it using jq.
Things I have tried :
| jq .KEYS is the command I use frequently.
Is there a jq command to display all the primary keys in the json?
I only care about the hostName field, and I would like to extract it. I know how to do it using grep, but that is NOT a clean approach.

You can simply use keys:
% jq 'keys' my.json
[
"20160522201409-jobsv1-1"
]
And to get the first:
% jq -r 'keys[0]' my.json
20160522201409-jobsv1-1
-r is for raw output:
--raw-output / -r: With this option, if the filter’s result is a string then it will be written directly to standard output rather than being formatted as a JSON string with quotes. This can be useful for making jq filters talk to non-JSON-based systems.
Source: the jq manual
If you want a known value below an unknown property, e.g. xxx.hostName:
% jq -r '.[].hostName' my.json
20160522201409-jobsv1-1
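Since the payload may hold more than one entry, the following sketch collects all top-level keys into a shell variable and prints each key next to its hostName. The second entry and the variable names json, KEYS and pairs are hypothetical.

```shell
# Hypothetical payload with two entries, reduced to the fields used here.
json='{"20160522201409-jobsv1-1":{"hostName":"20160522201409-jobsv1-1","state":"READY"},"20160523101500-jobsv1-2":{"hostName":"20160523101500-jobsv1-2","state":"READY"}}'

# One key per line; -r strips the JSON quotes.
KEYS=$(printf '%s' "$json" | jq -r 'keys[]')

# to_entries turns the object into {key, value} pairs.
pairs=$(printf '%s' "$json" | jq -r 'to_entries[] | "\(.key) \(.value.hostName)"')

printf '%s\n' "$KEYS"
printf '%s\n' "$pairs"
```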
