Remove matching/non-matching elements of a nested array using jq

Remove matching/non-matching elements of a nested array using jq - json

I need to split the results of a sonarqube analysis history into individual files. Assuming a starting input below,
{
"paging": {
"pageIndex": 1,
"pageSize": 100,
"total": 3
},
"measures": [
{
"metric": "coverage",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "100.0"
},
{
"date": "2018-11-21T12:22:39+0000",
"value": "100.0"
},
{
"date": "2018-11-21T13:09:02+0000",
"value": "100.0"
}
]
},
{
"metric": "bugs",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
},
{
"date": "2018-11-21T12:22:39+0000",
"value": "0"
},
{
"date": "2018-11-21T13:09:02+0000",
"value": "0"
}
]
},
{
"metric": "vulnerabilities",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
},
{
"date": "2018-11-21T12:22:39+0000",
"value": "0"
},
{
"date": "2018-11-21T13:09:02+0000",
"value": "0"
}
]
}
]
}
How do I use jq to clean the results so it only retains the history array entries for each element? The desired output is something like this (output-20181118123808.json for analysis done on "2018-11-18T12:37:08+0000"):
{
"paging": {
"pageIndex": 1,
"pageSize": 100,
"total": 3
},
"measures": [
{
"metric": "coverage",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "100.0"
}
]
},
{
"metric": "bugs",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
}
]
},
{
"metric": "vulnerabilities",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
}
]
}
]
}
I am lost on how to operate only on the sub-elements while leaving the parent structure intact. The naming of the JSON file is going to be handled externally from the jq utility. The sample data provided will be split into 3 files. Some other input can have a variable number of entries, some may be up to 10000. Thanks.

Here is a solution which uses awk to write the distinct files. The solution assumes that the dates for each measure are the same and in the same order, but imposes no limit on the number of distinct dates, or the number of distinct measures.
jq -c 'range(0; .measures[0].history|length) as $i
| (.measures[0].history[$i].date|gsub("[^0-9]";"")), # basis of filename
reduce range(0; .measures|length) as $j (.;
.measures[$j].history |= [.[$i]])' input.json |
awk -F\\t 'fn {print >> fn; fn="";next}{fn="output-" $1 ".json"}'
Comments
The choice of awk here is just for convenience.
The disadvantage of this approach is that if each file is to be neatly formatted, an additional run of a pretty-printer (such as jq) would be required for each file. Thus, if the output in each file is required to be neat, a case could be made for running jq once for each date, thus obviating the need for the post-processing (awk) step.
If the dates of the measures are not in lock-step, then the same approach as above could still be used, but of course the gathering of the dates and the corresponding measures would have to be done differently.
Output
The first two lines produced by the invocation of jq above are as follows:
"201811181237080000"
{"paging":{"pageIndex":1,"pageSize":100,"total":3},"measures":[{"metric":"coverage","history":[{"date":"2018-11-18T12:37:08+0000","value":"100.0"}]},{"metric":"bugs","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]},{"metric":"vulnerabilities","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]}]}

In the comments, the following addendum to the original question appeared:
is there a variation wherein the filtering is based on the date value and not the position? It is not guaranteed that the order will be the same or the number of elements in each metric is going to be the same (i.e. some dates may be missing "bugs", some might have additional metric such as "complexity").
The following will produce a stream of JSON objects, one per date. This stream can be annotated with the date as per my previous answer, which shows how to use these annotations to create the various files. For ease of understanding, we use two helper functions:
def dates:
INDEX(.measures[].history[].date; .)
| keys;
def gather($date): map(select(.date==$date));
dates[] as $date
| .measures |= map( .history |= gather($date) )
INDEX/2
If your jq does not have INDEX/2, now would be an excellent time to upgrade, but in case that's not feasible, here is its def:
def INDEX(stream; idx_expr):
reduce stream as $row ({};
.[$row|idx_expr|
if type != "string" then tojson
else .
end] |= $row);

Related

how to denormalise this json structure

I have a json formatted overview of backups, generated using pgbackrest. For simplicity I removed a lot of clutter so the main structures remain. The list can contain multiple backup structures, I reduced here to just 1 for simplicity.
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
],
"name": "dbname1"
}
]
Using jq I tried to generate a simpeler format out of this, until now without any luck.
What I would like to see is the backup.archive, backup.info, backup.label, backup.type, name combined in one simple structure, without getting into a cartesian product. I would be very happy to get the following output:
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"name": "dbname1",
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"name": "dbname1",
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
]
}
]
where name is redundantly added to the list. How can I use jq to convert the shown input to the requested output? In the end I just want to generate a simple csv from the data. Even with the simplified structure using
'.[].backup[].name + ":" + .[].backup[].type'
I get a cartesian product:
"dbname1:full"
"dbname1:full"
"dbname1:incr"
"dbname1:incr"
how to solve that?

So, for each object in the top-level array you want to pull in .name into each of its .backup array's elements, right? Then try
jq 'map(.backup[] += {name} | del(.name))'
Demo
Then, generating a CSV output using jq is easy: There is a builtin called #csv which transforms an array into a string of its values with quotes (if they are stringy) and separated by commas. So, all you need to do is to iteratively compose your desired values into arrays. At this point, removing .name is not necessary anymore as we are piecing together the array for CSV output anyway. And we're giving the -r flag to jq in order to make the output raw text rather than JSON.
jq -r '.[]
| .backup[] + {name}
| [(.archive | .start, .stop), .name, .info.size, .label, .type]
| #csv
'
Demo

First navigate to backup and only then “print” the stuff you’re interested.
.[].backup[] | .name + ":" + .type

Combine files in jq based on similar ID object and reform data

Preface: If the following is not possible with jq, then I completely accept that as an answer and will try to force this with bash.
I have two files that contain some IDs that, with some massaging, should be able to be combined into a single file. I have some content that I'll add to that as well (as seen in output). Essentially "mitre_test" should get compared to "sys_id". When compared, the "mitreid" from in2.json becomes technique_ID in the output (and is generally the unifying field of each output object).
Caveats:
There are some junk "desc" values placed in the in1.json that are there to make sure this is as programmatic as possible, and there are actually numerous junk inputs on the true input file I am using.
some of the mitre_test values have pairs and are not in a real array. I can split on those and break them out, but find myself losing the other information from in1.json.
Notice in the "metadata" for the output that is contains the "number" values from in1.json, and stored in a weird way (but the way that the receiving tool requires).
in1.json
[
{
"test": "Execution",
"mitreid": "T1204.001",
"mitre_test": "90b"
},
{
"test": "Defense Evasion",
"mitreid": "T1070.001",
"mitre_test": "afa"
},
{
"test": "Credential Access",
"mitreid": "T1556.004",
"mitre_test": "14b"
},
{
"test": "Initial Access",
"mitreid": "T1200",
"mitre_test": "f22"
},
{
"test": "Impact",
"mitreid": "T1489",
"mitre_test": "fa2"
}
]
in2.json
[
{
"number": "REL0001346",
"desc": "apple",
"mitre_test": "afa"
},
{
"number": "REL0001343",
"desc": "pear",
"mitre_test": "90b"
},
{
"number": "REL0001366",
"desc": "orange",
"mitre_test": "14b,f22"
},
{
"number": "REL0001378",
"desc": "pineapple",
"mitre_test": "90b"
}
]
The output:
[{
"techniqueID": "T1070.001",
"tactic": "defense-evasion",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001346"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1204.001",
"tactic": "execution",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001343"
},
{
"name": "DET_ID",
"value": "REL0001378"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1556.004",
"tactic": "credential-access",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001366"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1200",
"tactic": "initial-access",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001366"
}],
"showSubtechniques": true
}
]
I'm assuming I have some splitting to do on mitre_test with something like .mitre_test |= split(",")), and there are some joins I'm assuming, but doing so causes data loss or mixing up of the data. You'll notice the static data in the output exists as well, but is likely easy to place in and as such isn't as much of an issue.
Edit: reduced some of the match IDs so that it is easier to look at while analyzing the in1 and in2 files. Also simplified the two inputs to have a similar structure so that the answer is easier to understand later.

The requirements are somewhat opaque but it's fairly clear that if the task can be done by computer, it can be done using jq.
From the description, it would appear that one of the unusual aspects of the problem is that the "dictionary" defined by in1.json must be derived by splitting the key names that are CSV (comma-separated values). Here therefore is a jq def that will do that:
# Input: a JSON dictionary for which some keys are CSV,
# Output: a JSON dictionary with the CSV keys split on the commas
def refine:
. as $in
| reduce keys_unsorted[] as $k ({};
if ($k|index(","))
then ($k/",") as $keys
| . + ($keys | map( {(.): $in[$k]}) | add)
else .[$k] = $in[$k]
end );
You can see how this works by running:
INDEX($mitre.records[]; .mitre_test) | refine
using an invocation of jq such as:
jq --argfile mitre in1.json -f program.jq in2.json
For the joining part of the problem, there are many relevant Q&As on SO, e.g.
How to join JSON objects on particular fields using jq?

There is probably a much more elegant way to do this, but I ended up manually walking around things and piping to new output.
Explanation:
Read in both files, pull the fields I need.
Break out the mitre_test values that were previously just a comma separated set of values with map and try.
Store the none-changing fields as a variable and then manipulate mitre_test to become an appropriately split array, removing nulls.
Group by mitre_test values, since they are the common thing that the output is based on.
Cleanup more nulls.
Sort output to look like I want it.
jq . in1.json in2.json | \
jq '.[] |{number: .number, test: .test, mitreid: .mitreid, mitre_test: .mitre_test}' |\
jq -s '[. |map(try(.mitre_test |= split(",")) // .)|\
.[] | [.number,.test,.mitreid] as $h | .mitre_test[] |$h + [.] | \
{DET_ID: .[0], tactic: .[1], techniqueID: .[2], mitre_test: .[3]}] |\
del(.[][] | nulls)' |jq '[group_by(.mitre_test)[]|{mitre_test: .[0].mitre_test, techniqueID: [.[].techniqueID],tactic: [.[].tactic], DET_ID: [.[].DET_ID]}]|\
del(.[].techniqueID[] | nulls) | del(.[].tactic[] | nulls) | del(.[].DET_ID[] | nulls)' | \
jq '.[]| [{techniqueID: .techniqueID[0],tactic: .tactic[0], metadata: [{name: "DET_ID",value: .DET_ID[]}]}] | .[] | \
select((.metadata|length)>0)'
It was a long line, so I split it among some of the basic ideas.

Create merged JSON array from multiple files using jq

I have multiple JSON files one.json, two.json, three.json with the below format and I want to create a consolidated array from them using jq. So, from all the files I want to extract Name and Value field inside the Parameters and use them to create an array where the id value will be constructed from the Name value and value field will be constructed using Value field value.
input:
one.json:
{
"Parameters": [
{
"Name": "id1",
"Value": "one",
"Version": 2,
"LastModifiedDate": 1581663187.36
}
]
}
two.json
{
"Parameters": [
{
"Name": "id2",
"Value": "xyz",
"Version": 2,
"LastModifiedDate": 1581663187.36
}
]
}
three.json
{
"Parameters": [
{
"Name": "id3",
"Value": "xyz",
"Version": 2,
"LastModifiedDate": 1581663187.36
}
]
}
output:
[
{
"id": "id1",
"value": "one"
},
{
"id": "id2",
"value": "xyz"
},
{
"id": "id3",
"value": "xyz"
}
]
How to achieve this using jq

You can use a reduce expression instead of slurping the whole file into memory (-s); by iterative manipulation of the input file contents and then appending the required fields one at a time.
jq -n 'reduce inputs.Parameters[] as $d (.; . + [ { id: $d.Name, value: $d.Value } ])' one.json two.json three.json
The -n flag is to ensure that we construct the output JSON data from scratch over the input file contents made available over the inputs function. Since reduce works in an iterative manner, for each of the object in the input, we create a final array, creating the KV pair as desired.

pipe in del(... | select(... | ...)) works in v1.6, how to get same result in v1.5?

I'm trying to remove some objects based on tags within an array. I can get it working fine on jqplay.org (v1.6) but is there any way to get the same result in v1.5? I just get an error Invalid path expression with result
The goal is to return the JSON stripped of the top two (content and data) levels, and with the properties of notes stripped out if there isn't a types tag starting with 'x' or 'y' for that note.
Here's the v1.6 working example: https://jqplay.org/s/AVpz_IkfJa
There's also this: https://github.com/stedolan/jq/issues/1146 but I don't know how (or if it's possible) to apply the workaround for del() rather than path(), assuming it's the same basic problem.
JQ instructions:
.content.data
| del(
.hits[].doc.notes[]
| select
( .types
| any(startswith("x") or startswith("y"))
| not
)
)
input JSON:
{
"content": { "data": {
"meta": "stuff",
"hits": [
{ "doc":
{
"id": "10",
"notes": {
"f1": {"name": "F1", "types": ["wwwa", "zzzb"] },
"f2": {"name": "F2", "types": ["xxxa", "yyya"] }
}
},
"score": "1"
},
{ "doc":
{
"id": "11",
"notes": {
"f1": {"name": "F1", "types": ["wwwa", "zzzb"] },
"f3": {"name": "F3", "types": ["qzxb", "xxxb"] }
}
},
"score": "2"
} ] } } }
Desired result:
{
"meta": "stuff",
"hits": [
{
"doc": {
"id": "10",
"notes": {
"f2": {"name": "F2", "types": ["xxxa", "yyya"] }
}
},
"score": "1"
},
{
"doc": {
"id": "11",
"notes": {
"f3": {"name": "F3", "types": ["qzxb", "xxxb"] }
}
},
"score": "2"
} ] }
Any suggestions greatly appreciated. I'm pretty much a jq novice. Even if it's not practically do-able in v1.5 at least I won't lose more hours trying to make it work.

OP back after a few hours - I found something that seems to work, still very interested in any comments / other ways to crack the problem / improvements.
.content.data
| .hits[].doc.notes |= map (
if ( .types | any(startswith("x") or startswith("y")))
then .
else empty
end
)

This is just a variation of the solution proposed by the OP. It illustrates how a complex use of del can be expressed in a more straightforward and robust way by crafting a suitable helper function.
The relevant helper function in the present case implements the stripping-out requirement:
# Input: an object some keys of which are to be removed
def prune:
to_entries
| map( select( any(.value.types[]; test("^(x|y)")) ) )
| from_entries ;
The task can now be accomplished using a one-liner:
.content.data | .hits |= map( .doc.notes |= prune )
Invocation
With the above jq program in program.jq, a suitable invocation of jq
would look like this:
jq -f program.jq input.json

parsing JSON with jq to return value of element where another element has a certain value

I have some JSON output I am trying to parse with jq. I read some examples on filtering but I don't really understand it and my output it more complicated than the examples. I have no idea where to even begin beyond jq '.[]' as I don't understand the syntax of jq beyond that and the hierarchy and terminology are challenging as well. My JSON output is below. I want to return the value for Valid where the ItemName equals Item_2. How can I do this?
"1"
[
{
"GroupId": "1569",
"Title": "My_title",
"Logo": "logo.jpg",
"Tags": [
"tag1",
"tag2",
"tag3"
],
"Owner": [
{
"Name": "John Doe",
"Id": "53335"
}
],
"ItemId": "209766",
"Item": [
{
"Id": 47744,
"ItemName": "Item_1",
"Valid": false
},
{
"Id": 47872,
"ItemName": "Item_2",
"Valid": true
},
{
"Id": 47872,
"ItemName": "Item_3",
"Valid": false
}
]
}
]
"Browse"
"8fj9438jgge9hdfv0jj0en34ijnd9nnf"
"v9er84n9ogjuwheofn9gerinneorheoj"

Except for the initial and trailing JSON scalars, you'd simply write:
.[] | .Item[] | select( .ItemName == "Item_2" ) | .Valid
In your particular case, to ensure the top-level JSON scalars are ignored, you could prefix the above with:
arrays |

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Remove matching/non-matching elements of a nested array using jq - json

Related

how to denormalise this json structure

Combine files in jq based on similar ID object and reform data

Create merged JSON array from multiple files using jq

pipe in del(... | select(... | ...)) works in v1.6, how to get same result in v1.5?

parsing JSON with jq to return value of element where another element has a certain value

Categories

Resources