jq: how do I update a value based on a substring match? - json

I've got a jq question. Given a file file.json containing:
[
{
"type": "A",
"name": "name 1",
"url": "http://domain.com/path/to/filenameA.zip"
},
{
"type": "B",
"name": "name 2",
"url": "http://domain.com/otherpath/to/filenameB.zip"
},
{
"type": "C",
"name": "name 3",
"url": "http://otherdomain.com/otherpath/to/filenameB.zip"
}
]
I'm looking to create another file using jq with url modified only if the url's value matches some pattern. For example, I'd want to update any url matching the pattern:
http://otherdomain.com.*filenameB.*
to some fixed string such as:
http://yetanotherdomain.com/new/path/to/filenameC.tar.gz
with the resulting json:
[
{
"type": "A",
"name": "name 1",
"url": "http://domain.com/path/to/filenameA.zip"
},
{
"type": "B",
"name": "name 2",
"url": "http://domain.com/otherpath/to/filenameB.zip"
},
{
"type": "C",
"name": "name 3",
"url": "http://yetanotherdomain.com/new/path/to/filenameC.tar.gz"
}
]
I haven't gotten far even on being able to find the url, let alone update it. This is as far as I've gotten (wrong results and doesn't help me with the update issue):
% cat file.json | jq -r '.[] | select(.url | index("filenameB")).url'
http://domain.com/otherpath/to/filenameB.zip
http://otherdomain.com/otherpath/to/filenameB.zip
%
Any ideas on how to get the path of the key that has a value matching a regex? And after that, how to update the key with some new string value? If there are multiple matches, all should be updated with the same new value.

The good news is that there's a simple solution to the problem:
map( if .url | test("http://otherdomain.com.*filenameB.*")
     then .url |= sub( "http://otherdomain.com.*filenameB.*";
                       "http://yetanotherdomain.com/new/path/to/filenameC.tar.gz")
     else .
     end)
The not-so-good news is that it is not easy to explain unless you understand the key piece of cleverness here: the "|=" update operator. There is plenty of jq documentation about it, so I'll just point out that it is similar to the += family of operators in the C family of programming languages.
Specifically, .url |= sub(A;B) is like .url = (.url|sub(A;B)). That is how the update is done "in-place".
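A quick way to convince yourself of that equivalence is to run both forms on the same object. This is only a sketch: the one-object input is trimmed from the question's data, and the variable names are made up for the illustration.

```shell
# One object taken from the question's file.json (trimmed to two keys).
input='{"type":"C","url":"http://otherdomain.com/otherpath/to/filenameB.zip"}'
from='http://otherdomain.com.*filenameB.*'
to='http://yetanotherdomain.com/new/path/to/filenameC.tar.gz'

# Update-assignment form.
with_update=$(printf '%s' "$input" |
  jq -c --arg from "$from" --arg to "$to" '.url |= sub($from; $to)')

# Plain assignment with the right-hand side spelled out.
with_assign=$(printf '%s' "$input" |
  jq -c --arg from "$from" --arg to "$to" '.url = (.url | sub($from; $to))')

echo "$with_update"
echo "$with_assign"
```

Both commands should print the same object, with only the url changed.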

Here is a solution which identifies paths to url members with tostream and select, and then updates the values using reduce and setpath:
  "http://otherdomain.com.*filenameB.*" as $from
| "http://yetanotherdomain.com/new/path/to/filenameC.tar.gz" as $to
| reduce (tostream | select(length == 2 and .[0][-1] == "url")) as $p (
    .
  ; setpath($p[0]; $p[1] | sub($from; $to))
  )
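To see what the select in that program keeps, it can help to look at the stream events on their own. The sketch below uses a one-element array cut down from the question's input; tostream emits [path, leaf] pairs for leaf values (plus shorter closing events), and the filter keeps the pairs whose path ends in "url".

```shell
# Keep only the [path, leaf] events whose last path component is "url".
url_events=$(jq -c 'tostream | select(length == 2 and .[0][-1] == "url")' <<'EOF'
[{"name":"name 3","url":"http://otherdomain.com/otherpath/to/filenameB.zip"}]
EOF
)
echo "$url_events"
# prints: [[0,"url"],"http://otherdomain.com/otherpath/to/filenameB.zip"]
```

The first element of each event is exactly the path that setpath needs, and the second is the current value.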

Related

Combine files in jq based on similar ID object and reform data

Preface: If the following is not possible with jq, then I completely accept that as an answer and will try to force this with bash.
I have two files that contain some IDs that, with some massaging, should be able to be combined into a single file. I have some content that I'll add to that as well (as seen in output). Essentially "mitre_test" should get compared to "sys_id". When compared, the "mitreid" from in2.json becomes techniqueID in the output (and is generally the unifying field of each output object).
Caveats:
There are some junk "desc" values placed in the in1.json that are there to make sure this is as programmatic as possible, and there are actually numerous junk inputs on the true input file I am using.
Some of the mitre_test values are comma-separated pairs rather than real arrays. I can split on those and break them out, but find myself losing the other information from in1.json.
Notice that the "metadata" in the output contains the "number" values from in1.json, stored in a weird way (but the way that the receiving tool requires).
in1.json
[
{
"test": "Execution",
"mitreid": "T1204.001",
"mitre_test": "90b"
},
{
"test": "Defense Evasion",
"mitreid": "T1070.001",
"mitre_test": "afa"
},
{
"test": "Credential Access",
"mitreid": "T1556.004",
"mitre_test": "14b"
},
{
"test": "Initial Access",
"mitreid": "T1200",
"mitre_test": "f22"
},
{
"test": "Impact",
"mitreid": "T1489",
"mitre_test": "fa2"
}
]
in2.json
[
{
"number": "REL0001346",
"desc": "apple",
"mitre_test": "afa"
},
{
"number": "REL0001343",
"desc": "pear",
"mitre_test": "90b"
},
{
"number": "REL0001366",
"desc": "orange",
"mitre_test": "14b,f22"
},
{
"number": "REL0001378",
"desc": "pineapple",
"mitre_test": "90b"
}
]
The output:
[{
"techniqueID": "T1070.001",
"tactic": "defense-evasion",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001346"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1204.001",
"tactic": "execution",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001343"
},
{
"name": "DET_ID",
"value": "REL0001378"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1556.004",
"tactic": "credential-access",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001366"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1200",
"tactic": "initial-access",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001366"
}],
"showSubtechniques": true
}
]
I'm assuming I have some splitting to do on mitre_test with something like .mitre_test |= split(","), and I'm assuming there are some joins, but doing so causes data loss or mixing up of the data. You'll notice the static data in the output exists as well, but it is likely easy to place in and as such isn't as much of an issue.
Edit: reduced some of the match IDs so that it is easier to look at while analyzing the in1 and in2 files. Also simplified the two inputs to have a similar structure so that the answer is easier to understand later.
The requirements are somewhat opaque but it's fairly clear that if the task can be done by computer, it can be done using jq.
From the description, it would appear that one of the unusual aspects of the problem is that the "dictionary" defined by in1.json must be derived by splitting the key names that are CSV (comma-separated values). Here therefore is a jq def that will do that:
# Input: a JSON dictionary for which some keys are CSV,
# Output: a JSON dictionary with the CSV keys split on the commas
def refine:
  . as $in
  | reduce keys_unsorted[] as $k ({};
      if ($k|index(","))
      then ($k/",") as $keys
      | . + ($keys | map( {(.): $in[$k]}) | add)
      else .[$k] = $in[$k]
      end );
You can see how this works by running:
INDEX($mitre.records[]; .mitre_test) | refine
using an invocation of jq such as:
jq --argfile mitre in1.json -f program.jq in2.json
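To see refine in isolation, here is a small sketch that applies it to an inline dictionary. The dictionary is made up for the illustration (its keys and values are borrowed from the question's data; it is not the output of INDEX), and -n is used so that no input file is needed.

```shell
result=$(jq -c -n '
  def refine:
    . as $in
    | reduce keys_unsorted[] as $k ({};
        if ($k|index(","))
        then ($k/",") as $keys
        | . + ($keys | map( {(.): $in[$k]}) | add)
        else .[$k] = $in[$k]
        end );

  # Hypothetical dictionary with one CSV key and one plain key.
  {"14b,f22": "REL0001366", "90b": "REL0001343"} | refine
')
echo "$result"
```

The CSV key should come out as two separate keys, each carrying the original value, while the plain key is copied unchanged.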
For the joining part of the problem, there are many relevant Q&As on SO, e.g.
How to join JSON objects on particular fields using jq?
There is probably a much more elegant way to do this, but I ended up manually walking around things and piping to new output.
Explanation:
Read in both files, pull the fields I need.
Break out the mitre_test values that were previously just a comma separated set of values with map and try.
Store the non-changing fields as a variable and then manipulate mitre_test to become an appropriately split array, removing nulls.
Group by mitre_test values, since they are the common thing that the output is based on.
Clean up more nulls.
Sort output to look like I want it.
jq . in1.json in2.json |
jq '.[] | {number: .number, test: .test, mitreid: .mitreid, mitre_test: .mitre_test}' |
jq -s '[ . | map(try(.mitre_test |= split(",")) // .)
  | .[] | [.number, .test, .mitreid] as $h | .mitre_test[] | $h + [.]
  | {DET_ID: .[0], tactic: .[1], techniqueID: .[2], mitre_test: .[3]} ]
  | del(.[][] | nulls)' |
jq '[ group_by(.mitre_test)[]
  | {mitre_test: .[0].mitre_test, techniqueID: [.[].techniqueID], tactic: [.[].tactic], DET_ID: [.[].DET_ID]} ]
  | del(.[].techniqueID[] | nulls) | del(.[].tactic[] | nulls) | del(.[].DET_ID[] | nulls)' |
jq '.[] | [{techniqueID: .techniqueID[0], tactic: .tactic[0], metadata: [{name: "DET_ID", value: .DET_ID[]}]}] | .[]
  | select((.metadata | length) > 0)'
It was a long line, so I split it among some of the basic ideas.
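For comparison, the same join can probably be done in a single jq invocation. The following is only a sketch under some assumptions: the file names techniques.json and detections.json are hypothetical (the first holds the objects carrying mitreid, the second the objects carrying number), the data is trimmed from the question, the static fields are copied from the desired output, and the result follows the order of the technique list rather than the order shown in the question.

```shell
# Hypothetical file names; contents are trimmed copies of the question's data.
dir=$(mktemp -d)
cat > "$dir/techniques.json" <<'EOF'
[{"test":"Execution","mitreid":"T1204.001","mitre_test":"90b"},
 {"test":"Initial Access","mitreid":"T1200","mitre_test":"f22"},
 {"test":"Impact","mitreid":"T1489","mitre_test":"fa2"}]
EOF
cat > "$dir/detections.json" <<'EOF'
[{"number":"REL0001343","desc":"pear","mitre_test":"90b"},
 {"number":"REL0001366","desc":"orange","mitre_test":"14b,f22"},
 {"number":"REL0001378","desc":"pineapple","mitre_test":"90b"}]
EOF

result=$(jq -c -n --slurpfile t "$dir/techniques.json" --slurpfile d "$dir/detections.json" '
  [ $t[0][] as $tech
    # collect every detection whose comma-separated mitre_test lists this id
    | [ $d[0][]
        | select(.mitre_test | split(",") | any(. == $tech.mitre_test))
        | {name: "DET_ID", value: .number} ] as $meta
    | select($meta | length > 0)          # drop techniques with no detection
    | { techniqueID: $tech.mitreid,
        tactic: ($tech.test | ascii_downcase | gsub(" "; "-")),
        score: 1, color: "", comment: "", enabled: true,
        metadata: $meta,
        showSubtechniques: true } ]
')
echo "$result"
rm -rf "$dir"
```

Splitting mitre_test at comparison time, instead of rewriting it up front, keeps the rest of each detection object intact, which sidesteps the data loss mentioned in the question.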

How to retrieve recursive path to a specific key (not displaying the parents' key name, but the value from a different key of each parent)

I have the following JSON
[
{
"name": "alpha"
},
{
"fields": [
{
"name": "beta_sub_1"
},
{
"name": "beta_sub_2"
}
],
"name": "beta"
},
{
"fields": [
{
"fields": [
{
"name": "gamma_sub_sub_1"
}
],
"name": "gamma_sub_1"
}
],
"name": "gamma"
}
]
and I would like, for each "name" value, the dot-separated chain of its ancestors' "name" values followed by the value itself. For the input above, I would like the following result:
"alpha"
"beta.beta_sub_1"
"beta.beta_sub_2"
"beta"
"gamma.gamma_sub_1.gamma_sub_sub_1"
"gamma.gamma_sub_1"
"gamma"
I've been searching around but I couldn't get to this result. So far, I have this:
tostream as [$p,$v] | select($p[-1] == "name" and $v != null) | "\([$p[0,1]] | join(".")).\($v)"
but this gives me the path with the key names of the parents (and doesn't keep all the intermediate parents):
"0.name.alpha"
"1.fields.beta_sub_1"
"1.fields.beta_sub_2"
"1.name.beta"
"2.fields.gamma_sub_sub_1"
"2.fields.gamma_sub_1"
"2.name.gamma"
Any ideas?
P.S.: I've been searching for very detailed docs on jq but couldn't find anything good enough. If anyone has any recommendations, I'd appreciate it.
The problem description does not seem to match the sample input and output, but the following jq program produces the required output:
def descend:
  select( type == "object" and has("name") )
  | if has("fields") then ([.name] + (.fields[] | descend)) else empty end,
    [.name] ;

.[]
| descend
| join(".")
With your input, and using the -r command-line option, this produces:
alpha
beta.beta_sub_1
beta.beta_sub_2
beta
gamma.gamma_sub_1.gamma_sub_sub_1
gamma.gamma_sub_1
gamma
Resources
Apart from the jq manual, FAQ, and Cookbook, you might find the following helpful:
"jq Language Description"
"A Stream-Oriented Introduction to jq"

jq, replace null values on any level, not touching non-null or not existing

Please assist a jq newbie. :)
I have to update a field with specific name that might occur on any level of JSON structure - and might not. Like with all *.description fields in JSON below:
{
"a": {
"b": [{
"name": "b0",
"description": "b0 has description"
},
{
"name": "b1",
"description": null
},
{
"name": "b2"
}
],
"description": null
},
"s": "Some string value"
}
I need to update the "description" value with some dummy value only if it is null, without touching existing values and without creating new fields where they do not exist. So the desired result in this case is:
{
"a": {
"b": [{
"name": "b0",
"description": "b0 has description"
},
{
"name": "b1",
"description": "DUMMY DESCRIPTION"
},
{
"name": "b2"
}
],
"description": "DUMMY DESCRIPTION"
},
"s": "Some string value"
}
Here, .a.b[0].description is left untouched because it existed and was not null; .a.b[1].description and .a.description are forced to "DUMMY DESCRIPTION" because these fields existed and were null; and .a.b[2] as well as the root level are left untouched because there was no description field at all.
If for example I try to use command on known paths like below
jq '.known.level.description //= "DUMMY DESCRIPTION"' ........
it fails to skip non-existing fields like .a.b[2].description; and, sure, it works on known positions in JSON only. And if I try to do recursive search like:
jq '.. | .description? //= "DUMMY DESCRIPTION"' ........
it does not seem to work correctly on arrays.
What's the correct approach to walk through entire JSON in this case? Thanks!
What's the correct approach to walk through entire JSON in this case?
The answer is walk!
If your jq does not already have walk/1, you can google for it easily enough (jq "def walk"), and then include its def before using it, e.g. as follows:
walk(if type == "object" and has("description") and .description == null
     then .description = "DUMMY DESCRIPTION"
     else . end)
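If you do need to supply walk yourself, the following sketch shows one way to put a definition in front of the filter. The def is written from memory of the commonly published one, so treat it as an assumption and compare it with the jq source or FAQ; on a jq that already ships walk/1, the local def simply takes precedence. The input is the question's JSON on one line.

```shell
result=$(jq -c '
  # Assumed definition of walk/1 for jq releases that lack it.
  def walk(f):
    . as $in
    | if type == "object" then
        reduce keys_unsorted[] as $key
          ( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
      elif type == "array" then map( walk(f) ) | f
      else f
      end;

  walk(if type == "object" and has("description") and .description == null
       then .description = "DUMMY DESCRIPTION"
       else . end)
' <<'EOF'
{"a":{"b":[{"name":"b0","description":"b0 has description"},{"name":"b1","description":null},{"name":"b2"}],"description":null},"s":"Some string value"}
EOF
)
echo "$result"
```

Because has("description") is checked before the comparison with null, objects without the key, such as .a.b[2], are passed through unchanged.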
One option you could consider is using streams. You'll get paths and values to every item in the input. With that you could look for name/value pairs with the name "description" and update the value.
$ jq --arg replacement "DUMMY DESCRIPTION" '
  fromstream(tostream | if length == 2 and .[0][-1] == "description"
                        then .[1] |= (. // $replacement)
                        else .
                        end)
' input.json

create an object from an existing json file using 'jq'

I have a messages.json file
[
{
"id": "title",
"description": "This is the Title",
"defaultMessage": "title",
"filepath": "src/title.js"
},
{
"id": "title1",
"description": "This is the Title1",
"defaultMessage": "title1",
"filepath": "src/title1.js"
},
{
"id": "title2",
"description": "This is the Title2",
"defaultMessage": "title2",
"filepath": "src/title2.js"
},
{
"id": "title2",
"description": "This is the Title2",
"defaultMessage": "title2",
"filepath": "src/title2.js"
}
]
I want to create an object
{
"title": "Dummy1",
"title1": "Dummy2",
"title2": "Dummy3",
"title3": "Dummy4"
}
from the top one.
So far I have
jq '.[] | .id' src/messages.json;
And it does give me the IDs
How do I add some random text and make the new object as above?
Can we also create a new JSON file and write the newly created object onto it using jq?
Your output included "title3" so I'll assume that you intended that the second occurrence of "title2" in the input was supposed to refer to "title3".
With this assumption, the following jq program seems to do what you want:
map( .id )
| . as $in
| reduce range(0;length) as $i ({};
. + {($in[$i]): "dummy\(1+$i)"})
In words, extract the values of .id, and then turn each into an object of the form: {(.id) : "dummy\(1+$i)"}
This uses string interpolation, and produces:
{
"title": "dummy1",
"title1": "dummy2",
"title2": "dummy3",
"title3": "dummy4"
}
reduce-free solution
map(.id )
| [., [range(0;length)]]
| transpose
| map( {(.[0]): "dummy\(.[1]+1)"})
| add
Output
Can we also create a new json file and write the newly created object onto it using jq?
Yes, just use output redirection:
jq -f program.jq messages.json > output.json
Addendum
I want a parent object "de" to the already created json file objects
You could just pipe either of the above solutions to: {de: .}
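As a sketch of that last step, here is the reduce-based program with {de: .} appended, run on a two-element input that is trimmed from the question's messages.json:

```shell
result=$(jq -c '
  map( .id )
  | . as $in
  | reduce range(0;length) as $i ({};
      . + {($in[$i]): "dummy\(1+$i)"})
  | {de: .}                      # wrap the finished object under "de"
' <<'EOF'
[{"id":"title"},{"id":"title1"}]
EOF
)
echo "$result"
# prints: {"de":{"title":"dummy1","title1":"dummy2"}}
```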

Converting object into array using only jq and bash

I have a JSON formatted stream, full of objects. Each object looks like this:
{
"object": "alpha",
"attributes": [
{
"type": "A",
"description": "a",
"value": 1271129046.9144535
},
{
"type": "B",
"description": "b",
"value": 6738889338.63777
},
{
"type": "C",
"description": "c",
"value": 214918692.38456276
},
{
"type": "D",
"description": "d",
"value": 140222346.75136077
},
{
"type": "E",
"description": "e",
"value": 2085635554.8128803
}
]
}
I'd like to get data out as:
alpha,A,a,1271129046.9144535
alpha,B,b,6738889338.63777
alpha,C,c,214918692.38456276
alpha,D,d,140222346.75136077
alpha,E,e,2085635554.8128803
The next object may be "beta" instead of "alpha", hence I don't want to just strip the "object" key.
My restrictions are that I want to process this stream in a bash pipeline. I'm hoping I can just use "jq" for this, rather than piping through python/ruby/perl etc which I'd rather not depend on if I can help it.
Any ideas would be most grateful!
It looks like you're building up CSV data, and the @csv filter was made for this. You just need to collect an array of the values you want to write out and pass it in to the filter. You could do this:
$ jq -r '.attributes[] as $attr | [.object, $attr.type, $attr.description, $attr.value] | @csv' input.json
Which produces this:
"alpha","A","a",1271129046.9144535
"alpha","B","b",6738889338.63777
"alpha","C","c",214918692.38456276
"alpha","D","d",140222346.75136077
"alpha","E","e",2085635554.8128803
(1) Slightly briefer than the accepted answer:
jq -r '[.object] + (.attributes[] | [.type, .description, .value]) | @csv'
(2) If you don't want the quotation marks, then one possibility would be:
jq -r '"\(.object)," + (.attributes[] | "\(.type),\(.description),\(.value)")'
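Since the question mentions a stream of objects, here is a sketch of the @csv variant applied to two objects in a row. The input is made up for the illustration (shortened values, and a second object named "beta"); jq runs the filter once per input object, so no slurping is needed.

```shell
result=$(jq -r '[.object] + (.attributes[] | [.type, .description, .value]) | @csv' <<'EOF'
{"object":"alpha","attributes":[{"type":"A","description":"a","value":1.5}]}
{"object":"beta","attributes":[{"type":"B","description":"b","value":2}]}
EOF
)
echo "$result"
# prints:
# "alpha","A","a",1.5
# "beta","B","b",2
```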