Converting object into array using only jq and bash - json

I have a JSON formatted stream, full of objects. Each object looks like this:
{
"object": "alpha",
"attributes": [
{
"type": "A",
"description": "a",
"value": 1271129046.9144535
},
{
"type": "B",
"description": "b",
"value": 6738889338.63777
},
{
"type": "C",
"description": "c",
"value": 214918692.38456276
},
{
"type": "D",
"description": "d",
"value": 140222346.75136077
},
{
"type": "E",
"description": "e",
"value": 2085635554.8128803
}
]
}
I'd like to get data out as:
alpha,A,a,1271129046.9144535
alpha,B,b,6738889338.63777
alpha,C,c,214918692.38456276
alpha,D,d,140222346.75136077
alpha,E,e,2085635554.8128803
The next object may be "beta" instead of "alpha", hence I don't want to just strip the "object" key.
My restrictions are that I want to process this stream in a bash pipeline. I'm hoping I can just use "jq" for this, rather than piping through python/ruby/perl etc which I'd rather not depend on if I can help it.
I'd be most grateful for any ideas!

It looks like you're building up CSV data; the @csv filter was made for this. You just need to collect the values you want to write out into an array and pass it to the filter. You could do this:
$ jq -r '.attributes[] as $attr | [.object, $attr.type, $attr.description, $attr.value] | @csv' input.json
Which produces this:
"alpha","A","a",1271129046.9144535
"alpha","B","b",6738889338.63777
"alpha","C","c",214918692.38456276
"alpha","D","d",140222346.75136077
"alpha","E","e",2085635554.8128803

(1) Slightly briefer than the accepted answer:
jq -r '[.object] + (.attributes[] | [.type, .description, .value]) | @csv'
(2) If you don't want the quotation marks, then one possibility would be:
jq -r '"\(.object)," + (.attributes[] | "\(.type),\(.description),\(.value)")'
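Another way to get unquoted rows is to build the array as before and finish with join(",") instead of @csv. The following is only a sketch under a few assumptions: the sample values are shortened stand-ins for the question's data, jq 1.5 or later is available (so that join accepts numbers), and no field contains a comma or quote, since join does no CSV escaping.

```shell
# Two sample objects on one stream; values are shortened stand-ins.
input='{"object":"alpha","attributes":[{"type":"A","description":"a","value":1.5},{"type":"B","description":"b","value":2}]}
{"object":"beta","attributes":[{"type":"C","description":"c","value":3}]}'

# Bind .object to a variable, iterate over the attributes,
# and join each row's fields with commas (no quoting, no escaping).
rows=$(printf '%s\n' "$input" |
  jq -r '.object as $o | .attributes[] | [$o, .type, .description, .value] | join(",")')

printf '%s\n' "$rows"
```

Because jq runs the filter once per object on the stream, a following "beta" object needs no special handling.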

Related

how to denormalise this json structure

I have a json formatted overview of backups, generated using pgbackrest. For simplicity I removed a lot of clutter so the main structures remain. The list can contain multiple backup structures, I reduced here to just 1 for simplicity.
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
],
"name": "dbname1"
}
]
Using jq I tried to generate a simpler format out of this, so far without any luck.
What I would like to see is the backup.archive, backup.info, backup.label, backup.type, name combined in one simple structure, without getting into a cartesian product. I would be very happy to get the following output:
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"name": "dbname1",
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"name": "dbname1",
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
]
}
]
where name is redundantly added to the list. How can I use jq to convert the shown input to the requested output? In the end I just want to generate a simple csv from the data. Even with the simplified structure using
'.[].backup[].name + ":" + .[].backup[].type'
I get a cartesian product:
"dbname1:full"
"dbname1:full"
"dbname1:incr"
"dbname1:incr"
How can I solve that?
So, for each object in the top-level array you want to pull in .name into each of its .backup array's elements, right? Then try
jq 'map(.backup[] += {name} | del(.name))'
Then, generating CSV output using jq is easy: there is a builtin format called @csv, which transforms an array into a string of its values, quoted (if they are strings) and separated by commas. So all you need to do is compose your desired values into arrays. At this point, removing .name is no longer necessary, as we are piecing together the array for CSV output anyway. We also give jq the -r flag to make the output raw text rather than JSON.
jq -r '.[]
| .backup[] + {name}
| [(.archive | .start, .stop), .name, .info.size, .label, .type]
| @csv
'
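With the simplified structure, where name sits inside each backup element, first navigate to backup and only then “print” the fields you’re interested in. Each separate .[].backup[] in the original expression iterates over the whole input again, which is what produces the Cartesian product.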
.[].backup[] | .name + ":" + .type
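The same idea also works on the original, non-denormalised input if the parent's name is bound to a variable before descending into backup. This is a minimal sketch; the sample is a trimmed copy of the question's data and the variable name $name is my own choice.

```shell
# Trimmed sample: name lives on the parent, not inside each backup element.
input='[{"name":"dbname1","backup":[{"label":"20220103-122051F","type":"full"},{"label":"20220103-153304F_20220104-081304I","type":"incr"}]}]'

# Iterate over the top-level array once, remember .name,
# then iterate over that object's backups only.
pairs=$(printf '%s\n' "$input" |
  jq -r '.[] | .name as $name | .backup[] | "\($name):\(.type)"')

printf '%s\n' "$pairs"
```

Each backup is visited exactly once, so the output has one line per backup rather than one per combination.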

Combine files in jq based on similar ID object and reform data

Preface: If the following is not possible with jq, then I completely accept that as an answer and will try to force this with bash.
I have two files that contain some IDs that, with some massaging, should be able to be combined into a single file. I have some content that I'll add to that as well (as seen in output). Essentially "mitre_test" should get compared to "sys_id". When compared, the "mitreid" from in2.json becomes technique_ID in the output (and is generally the unifying field of each output object).
Caveats:
There are some junk "desc" values placed in the in1.json that are there to make sure this is as programmatic as possible, and there are actually numerous junk inputs on the true input file I am using.
Some of the mitre_test values hold pairs and are not in a real array. I can split on those and break them out, but find myself losing the other information from in1.json.
Notice that the "metadata" in the output contains the "number" values from in1.json, stored in a weird way (but the way that the receiving tool requires).
in1.json
[
{
"test": "Execution",
"mitreid": "T1204.001",
"mitre_test": "90b"
},
{
"test": "Defense Evasion",
"mitreid": "T1070.001",
"mitre_test": "afa"
},
{
"test": "Credential Access",
"mitreid": "T1556.004",
"mitre_test": "14b"
},
{
"test": "Initial Access",
"mitreid": "T1200",
"mitre_test": "f22"
},
{
"test": "Impact",
"mitreid": "T1489",
"mitre_test": "fa2"
}
]
in2.json
[
{
"number": "REL0001346",
"desc": "apple",
"mitre_test": "afa"
},
{
"number": "REL0001343",
"desc": "pear",
"mitre_test": "90b"
},
{
"number": "REL0001366",
"desc": "orange",
"mitre_test": "14b,f22"
},
{
"number": "REL0001378",
"desc": "pineapple",
"mitre_test": "90b"
}
]
The output:
[{
"techniqueID": "T1070.001",
"tactic": "defense-evasion",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001346"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1204.001",
"tactic": "execution",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001343"
},
{
"name": "DET_ID",
"value": "REL0001378"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1556.004",
"tactic": "credential-access",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001366"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1200",
"tactic": "initial-access",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001366"
}],
"showSubtechniques": true
}
]
I'm assuming I have some splitting to do on mitre_test with something like .mitre_test |= split(","), and I assume some joins are needed, but doing so causes data loss or mixes up the data. You'll notice the static data in the output exists as well, but it is likely easy to add and as such isn't as much of an issue.
Edit: reduced some of the match IDs so that it is easier to look at while analyzing the in1 and in2 files. Also simplified the two inputs to have a similar structure so that the answer is easier to understand later.
The requirements are somewhat opaque but it's fairly clear that if the task can be done by computer, it can be done using jq.
From the description, it would appear that one of the unusual aspects of the problem is that the "dictionary" defined by in1.json must be derived by splitting the key names that are CSV (comma-separated values). Here therefore is a jq def that will do that:
# Input: a JSON dictionary for which some keys are CSV,
# Output: a JSON dictionary with the CSV keys split on the commas
def refine:
. as $in
| reduce keys_unsorted[] as $k ({};
if ($k|index(","))
then ($k/",") as $keys
| . + ($keys | map( {(.): $in[$k]}) | add)
else .[$k] = $in[$k]
end );
You can see how this works by running:
INDEX($mitre.records[]; .mitre_test) | refine
using an invocation of jq such as:
jq --argfile mitre in1.json -f program.jq in2.json
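To see what refine does in isolation, here is a small self-contained sketch. It makes several assumptions: the two records are a hypothetical sample shaped like the in2.json shown in the question (where the comma-separated mitre_test values appear), the records are read from standard input rather than through --argfile, and jq is 1.6 or later, since INDEX is not available before that.

```shell
# Hypothetical sample: two records, one with a comma-separated mitre_test.
records='[{"number":"REL0001366","mitre_test":"14b,f22"},{"number":"REL0001343","mitre_test":"90b"}]'

lookup=$(printf '%s\n' "$records" | jq -S -c '
  def refine:
    . as $in
    | reduce keys_unsorted[] as $k ({};
        if ($k | index(","))
        then ($k / ",") as $keys
             | . + ($keys | map({(.): $in[$k]}) | add)
        else .[$k] = $in[$k]
        end);
  # Key each record by mitre_test, split the CSV keys, keep only the number.
  INDEX(.[]; .mitre_test) | refine | map_values(.number)')

printf '%s\n' "$lookup"
```

One caveat with this approach: INDEX keeps only the last record for a given key, so two records sharing the same mitre_test would collapse into one entry.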
For the joining part of the problem, there are many relevant Q&As on SO, e.g.
How to join JSON objects on particular fields using jq?
There is probably a much more elegant way to do this, but I ended up manually walking around things and piping to new output.
Explanation:
Read in both files, pull the fields I need.
Break out the mitre_test values that were previously just a comma separated set of values with map and try.
Store the non-changing fields as a variable and then manipulate mitre_test to become an appropriately split array, removing nulls.
Group by mitre_test values, since they are the common thing that the output is based on.
Clean up more nulls.
Sort output to look like I want it.
jq . in1.json in2.json |
jq '.[] | {number: .number, test: .test, mitreid: .mitreid, mitre_test: .mitre_test}' |
jq -s '[. | map(try(.mitre_test |= split(",")) // .) |
.[] | [.number,.test,.mitreid] as $h | .mitre_test[] | $h + [.] |
{DET_ID: .[0], tactic: .[1], techniqueID: .[2], mitre_test: .[3]}] |
del(.[][] | nulls)' |
jq '[group_by(.mitre_test)[] | {mitre_test: .[0].mitre_test, techniqueID: [.[].techniqueID], tactic: [.[].tactic], DET_ID: [.[].DET_ID]}] |
del(.[].techniqueID[] | nulls) | del(.[].tactic[] | nulls) | del(.[].DET_ID[] | nulls)' |
jq '.[] | [{techniqueID: .techniqueID[0], tactic: .tactic[0], metadata: [{name: "DET_ID", value: .DET_ID[]}]}] | .[] |
select((.metadata|length)>0)'
It was a long line, so I split it among some of the basic ideas.
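For comparison, the same join can probably be done in a single jq invocation. The following is only a sketch, not the method used above. It assumes the file layout shown in the question (in1.json holds test, mitreid and mitre_test; in2.json holds number, desc and mitre_test), uses trimmed sample data, derives tactic by lowercasing test and replacing spaces with hyphens (which needs a jq build with regex support for gsub), and emits techniques in in1.json order rather than the order shown in the question.

```shell
workdir=$(mktemp -d)
cat > "$workdir/in1.json" <<'EOF'
[{"test":"Execution","mitreid":"T1204.001","mitre_test":"90b"},
 {"test":"Initial Access","mitreid":"T1200","mitre_test":"f22"},
 {"test":"Impact","mitreid":"T1489","mitre_test":"fa2"}]
EOF
cat > "$workdir/in2.json" <<'EOF'
[{"number":"REL0001343","desc":"pear","mitre_test":"90b"},
 {"number":"REL0001366","desc":"orange","mitre_test":"14b,f22"},
 {"number":"REL0001378","desc":"pineapple","mitre_test":"90b"}]
EOF

layer=$(jq -c -n --slurpfile tests "$workdir/in1.json" --slurpfile dets "$workdir/in2.json" '
  [ $tests[0][] as $t
    # Collect every detection whose comma-separated mitre_test list contains this ID.
    | [ $dets[0][]
        | select(any(.mitre_test | split(",")[]; . == $t.mitre_test))
        | {name: "DET_ID", value: .number} ] as $meta
    # Drop techniques that have no matching detection.
    | select($meta | length > 0)
    | { techniqueID: $t.mitreid,
        tactic: ($t.test | ascii_downcase | gsub(" "; "-")),
        score: 1, color: "", comment: "", enabled: true,
        metadata: $meta,
        showSubtechniques: true } ]')

printf '%s\n' "$layer"
rm -r "$workdir"
```

Splitting mitre_test only inside the select means the rest of each record is never taken apart, which avoids the data loss described in the question.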

jq find the max in quoted values

Here is my JSON file, test.json:
[
{
"name": "nodejs",
"version": "0.1.21",
"apiVersion": "v1"
},
{
"name": "nodejs",
"version": "0.1.20",
"apiVersion": "v1"
},
{
"name": "nodejs",
"version": "0.1.11",
"apiVersion": "v1"
},
{
"name": "nodejs",
"version": "0.1.9",
"apiVersion": "v1"
},
{
"name": "nodejs",
"version": "0.1.8",
"apiVersion": "v1"
}
]
When I use max_by, jq returns 0.1.9 instead of 0.1.21, probably because the value is a quoted string:
cat test.json | jq 'max_by(.version)'
{
"name": "nodejs",
"version": "0.1.9",
"apiVersion": "v1"
}
How can I get the element with version=0.1.21 ?
Semantic version comparison is not supported out of the box in jq. You need to split the version string on "." and compare the resulting fields as numbers.
jq 'sort_by(.version | split(".") | map(tonumber))[-1]'
The split(".") takes the string from .version and creates an array of fields, i.e. 0.1.21 becomes the array [ "0", "1", "21"], and map(tonumber) then converts the string elements of that array into numbers.
The sort_by() function compares the arrays generated in the previous step element by element and sorts in ascending order, so the object containing version 0.1.21 ends up last. The notation [-1] gets the last object from this sorted array.
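The difference between comparing versions as strings and as arrays of numbers can be seen side by side. This sketch uses a trimmed sample with only the version field, and max_by instead of sort_by(...)[-1], which should be equivalent for this purpose.

```shell
versions='[{"version":"0.1.21"},{"version":"0.1.9"},{"version":"0.1.20"}]'

# String comparison: "0.1.9" sorts after "0.1.21" because "9" > "2".
as_string=$(printf '%s\n' "$versions" | jq -r 'max_by(.version).version')

# Numeric comparison: [0,1,21] sorts after [0,1,9] element by element.
as_numbers=$(printf '%s\n' "$versions" |
  jq -r 'max_by(.version | split(".") | map(tonumber)).version')

printf 'string:  %s\nnumeric: %s\n' "$as_string" "$as_numbers"
```

Note that map(tonumber) fails on versions with non-numeric parts such as a pre-release suffix.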
Here's an adaptation of the more general answer using jq at
How to sort Artifactory package search result by version number with JFrog CLI?
def parse:
[splits("[-.]")]
| map(tonumber? // .) ;
max_by(.version|parse)
As a less robust one-liner:
max_by(.version | [splits("[.]")] | map(tonumber))

Get field from another json using jq

I have two .json-files.
The first is 1.json
{
"id": "107709375",
"type": "page",
"title": "SomeTitle",
"space": {
"key": "BUSINT"
},
"version": {
"number": 62
}
}
And the second one logg.json:
{
"id": "228204270",
"type": "page",
"status": "current",
"title": "test-test",
"version": {
"when": "2016-11-23T16:54:18.313+07:00",
"number": 17,
"minorEdit": false
},
"extensions": {
"position": "none"
}
}
Can I paste version.number from logg.json into version.number 1.json using jq? I need something like that (it's absolutely wrong):
jq-win64 ".version.number 1.json" = ".version.number +1" logg.json
Read logg.json as an argument file. You could then access its values to make changes to the other.
$ jq --argfile logg logg.json '.version.number = $logg.version.number + 1' 1.json
Of course you'll need to use double quotes to work in the Windows Command prompt.
> jq --argfile logg logg.json ".version.number = $logg.version.number + 1" 1.json
Although the documentation says to use --slurpfile instead, we only have a single object in the file so it would be totally appropriate to use --argfile instead.
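If --argfile is not available in your jq build (it is deprecated, and I believe newer releases may no longer accept it, so check your version), the --slurpfile form looks like this. It is a sketch with trimmed copies of the two files written to a temporary directory; the only real difference is the [0] index, because --slurpfile always binds an array.

```shell
workdir=$(mktemp -d)
printf '%s\n' '{"id":"107709375","version":{"number":62}}' > "$workdir/1.json"
printf '%s\n' '{"id":"228204270","version":{"number":17}}' > "$workdir/logg.json"

# --slurpfile binds $logg to an array of every JSON value in the file,
# so the single object is found at index 0.
updated=$(jq -c --slurpfile logg "$workdir/logg.json" \
  '.version.number = $logg[0].version.number + 1' "$workdir/1.json")

printf '%s\n' "$updated"
rm -r "$workdir"
```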

Bash JQ getting multiple values Issue in JSON file

I'm trying to parse a JSON file for getting multiple values. I know how to parse the specific values ( "A"/"B"/"C") in the array (.info.file.hashes[]).
For Example : When issuing the following command over the file b.json
jq -r '.info.file.hashes[] | select(.name == ("A","B","C")).value' b.json
Result :
f34d5f2d4577ed6d9ceec516c1f5a744
66031dad95dfe6ad10b35f06c4342faa
9df25fa4e379837e42aaf6d05d92012018d4b659
Where b.json:
{
"Finish": 1475668827,
"Start": 1475668826,
"info": {
"file": {
"Score": 4,
"file_subtype": "None",
"file_type": "Image",
"hashes": [
{
"name": "A",
"value": "f34d5f2d4577ed6d9ceec516c1f5a744"
},
{
"name": "B",
"value": "66031dad95dfe6ad10b35f06c4342faa"
},
{
"name": "C",
"value": "9df25fa4e379837e42aaf6d05d92012018d4b659"
},
{
"name": "D",
"value": "4a51cc531082d216a3cf292f4c39869b462bf6aa"
},
{
"name": "E",
"value": "e445f412f92b25f3343d5f7adc3c94bdc950601521d5b91e7ce77c21a18259c9"
}
],
"size": 500
}
}
}
Now, how can I get multiple values with "Finish", "Start" along with the hash values? I have tried issuing the command:
jq -r '.info.file.hashes[] | select(.name == ("A","B","C")).value','.Finish','.Start' b.json
and I'm getting the result as:
f34d5f2d4577ed6d9ceec516c1f5a744
null
66031dad95dfe6ad10b35f06c4342faa
null
9df25fa4e379837e42aaf6d05d92012018d4b659
null
null
null
Expected Result :
f34d5f2d4577ed6d9ceec516c1f5a744
66031dad95dfe6ad10b35f06c4342faa
9df25fa4e379837e42aaf6d05d92012018d4b659
1475668827
1475668826
Literally just downloaded and read the manual
Try
jq '(.info.file.hashes[] |select(.name == ("A","B","C")).value), .Finish, .Start' b.json
"f34d5f2d4577ed6d9ceec516c1f5a744"
"66031dad95dfe6ad10b35f06c4342faa"
"9df25fa4e379837e42aaf6d05d92012018d4b659"
1475668827
1475668826
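Note the parentheses used for grouping the pipe separately from the Finish and Start values. Without them, the comma binds more tightly than the pipe, so .Finish and .Start are evaluated against each element of hashes (where they do not exist) and come back as null.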
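The effect of the grouping can be reproduced on a smaller input. This is a sketch with a hypothetical, flattened sample (hashes sits at the top level and the values are shortened) so the two outputs are easy to compare.

```shell
input='{"Finish":2,"Start":1,"hashes":[{"name":"A","value":"aa"},{"name":"B","value":"bb"},{"name":"D","value":"dd"}]}'

# Ungrouped: everything after the pipe runs once per hash object,
# so .Finish and .Start are looked up on the hash and yield null.
ungrouped=$(printf '%s\n' "$input" |
  jq -r '.hashes[] | select(.name == ("A","B")).value, .Finish, .Start')

# Grouped: the parentheses confine the pipe, so .Finish and .Start
# are evaluated once against the top-level object.
grouped=$(printf '%s\n' "$input" |
  jq -r '(.hashes[] | select(.name == ("A","B")).value), .Finish, .Start')

printf 'ungrouped:\n%s\n\ngrouped:\n%s\n' "$ungrouped" "$grouped"
```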