iterating through JSON files, adding properties to each with jq

I am attempting to iterate through all my JSON files and add properties to each, but I am relatively new to jq.
Here is what I am attempting:
find hashlips_art_engine/build -type f -name '*.json' | jq '. + {
"creators": [
{
"address": "4iUFmB3H3RZGRrtuWhCMtkXBT51iCUnX8UV7R8rChJsU",
"share": 10
},
{
"address": "2JApg1AXvo1Xvrk3vs4vp3AwamxQ1DHmqwKwWZTikS9w",
"share": 45
},
{
"address": "Zdda4JtApaPs47Lxs1TBKTjh1ZH2cptjxXMwrbx1CWW",
"share": 45
}
]
}'
However, this is returning an error:
parse error: Invalid numeric literal at line 2, column 0
I have around 10,000 JSON files that I need to iterate over and add
{
"creators": [
{
"address": "4iUFmB3H3RZGRrtuWhCMtkXBT51iCUnX8UV7R8rChJsU",
"share": 10
},
{
"address": "2JApg1AXvo1Xvrk3vs4vp3AwamxQ1DHmqwKwWZTikS9w",
"share": 45
},
{
"address": "Zdda4JtApaPs47Lxs1TBKTjh1ZH2cptjxXMwrbx1CWW",
"share": 45
}
]
}
to. Is this possible, or am I barking up the wrong tree here?
Thanks for your assistance with this. I have been searching the web for several hours now, but either my terminology is incorrect or there isn't much out there regarding this issue.

The problem is that you are piping the filenames to jq rather than making the contents available to jq.
Most likely you could use the following approach, e.g. if you want the augmented contents of each file to be handled separately:
find ... | while read f ; do jq ... "$f" ; done
An alternative that might be relevant would be:
jq ... $(find ...)
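For instance, here is a minimal sketch of the first approach that writes each augmented document back over the original file via a temporary file (only the first creator is shown; extend the array as in the question):
find hashlips_art_engine/build -type f -name '*.json' | while read -r f; do
  jq '. + {creators: [{address: "4iUFmB3H3RZGRrtuWhCMtkXBT51iCUnX8UV7R8rChJsU", share: 10}]}' \
    "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done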

If you have 2 files:
file01.json :
{"a":"1","b":"2"}
file02.json :
{"x":"10","y":"12","z":"15"}
you can:
for f in file*.json; do jq '. + { creators: [{address: "xxx", share: 1}] }' "$f"; done
result:
{
"a": "1",
"b": "2",
"creators": [
{
"address": "xxx",
"share": 1
}
]
}
{
"x": "10",
"y": "12",
"z": "15",
"creators": [
{
"address": "xxx",
"share": 1
}
]
}
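If you want to keep the results rather than just print them, redirect each augmented document to a new file; the augmented- prefix below is just an illustrative choice:
for f in file*.json; do
  jq '. + { creators: [{address: "xxx", share: 1}] }' "$f" > "augmented-$f"
done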

how to denormalise this json structure

I have a JSON-formatted overview of backups, generated using pgbackrest. For simplicity I removed a lot of clutter so the main structures remain. The list can contain multiple backup structures; I reduced it to just one here for simplicity.
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
],
"name": "dbname1"
}
]
Using jq I tried to generate a simpler format out of this, so far without any luck.
What I would like to see is the backup.archive, backup.info, backup.label, backup.type, name combined in one simple structure, without getting into a cartesian product. I would be very happy to get the following output:
[
{
"backup": [
{
"archive": {
"start": "000000090000000200000075",
"stop": "000000090000000200000075"
},
"name": "dbname1",
"info": {
"size": 1200934840
},
"label": "20220103-122051F",
"type": "full"
},
{
"archive": {
"start": "00000009000000020000007D",
"stop": "00000009000000020000007D"
},
"name": "dbname1",
"info": {
"size": 1168586300
},
"label": "20220103-153304F_20220104-081304I",
"type": "incr"
}
]
}
]
where name is redundantly added to each backup entry. How can I use jq to convert the shown input to the requested output? In the end I just want to generate a simple CSV from the data. Even with the simplified structure, using
'.[].backup[].name + ":" + .[].backup[].type'
I get a cartesian product:
"dbname1:full"
"dbname1:full"
"dbname1:incr"
"dbname1:incr"
How do I solve that?
So, for each object in the top-level array you want to pull in .name into each of its .backup array's elements, right? Then try
jq 'map(.backup[] += {name} | del(.name))'
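Here {name} is shorthand for {name: .name}, and since += evaluates its right-hand side against the original input to the expression (the outer object), every .backup element receives that object's own name before del(.name) drops it from the top level.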
Then, generating CSV output using jq is easy: there is a builtin called @csv which transforms an array into a string of its values, quoted (if they are strings) and separated by commas. So, all you need to do is to iteratively compose your desired values into arrays. At this point, removing .name is no longer necessary, as we are piecing together the array for CSV output anyway. And we're giving the -r flag to jq in order to make the output raw text rather than JSON.
jq -r '.[]
| .backup[] + {name}
| [(.archive | .start, .stop), .name, .info.size, .label, .type]
| @csv
'
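For the sample input, this emits:
"000000090000000200000075","000000090000000200000075","dbname1",1200934840,"20220103-122051F","full"
"00000009000000020000007D","00000009000000020000007D","dbname1",1168586300,"20220103-153304F_20220104-081304I","incr"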
First navigate into each object, and only then “print” the stuff you’re interested in, so that each name is paired only with its own backups:
.[] | .name + ":" + .backup[].type
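which prints:
"dbname1:full"
"dbname1:incr"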

pipe inside del(... | select(... | ...)) works in v1.6, how to get the same result in v1.5?

I'm trying to remove some objects based on tags within an array. I can get it working fine on jqplay.org (v1.6), but is there any way to get the same result in v1.5? There I just get an error: Invalid path expression with result
The goal is to return the JSON stripped of the top two (content and data) levels, and with the properties of notes stripped out if there isn't a types tag starting with 'x' or 'y' for that note.
Here's the v1.6 working example: https://jqplay.org/s/AVpz_IkfJa
There's also this: https://github.com/stedolan/jq/issues/1146 but I don't know how (or if it's possible) to apply the workaround for del() rather than path(), assuming it's the same basic problem.
JQ instructions:
.content.data
| del(
.hits[].doc.notes[]
| select
( .types
| any(startswith("x") or startswith("y"))
| not
)
)
input JSON:
{
"content": { "data": {
"meta": "stuff",
"hits": [
{ "doc":
{
"id": "10",
"notes": {
"f1": {"name": "F1", "types": ["wwwa", "zzzb"] },
"f2": {"name": "F2", "types": ["xxxa", "yyya"] }
}
},
"score": "1"
},
{ "doc":
{
"id": "11",
"notes": {
"f1": {"name": "F1", "types": ["wwwa", "zzzb"] },
"f3": {"name": "F3", "types": ["qzxb", "xxxb"] }
}
},
"score": "2"
} ] } } }
Desired result:
{
"meta": "stuff",
"hits": [
{
"doc": {
"id": "10",
"notes": {
"f2": {"name": "F2", "types": ["xxxa", "yyya"] }
}
},
"score": "1"
},
{
"doc": {
"id": "11",
"notes": {
"f3": {"name": "F3", "types": ["qzxb", "xxxb"] }
}
},
"score": "2"
} ] }
Any suggestions greatly appreciated. I'm pretty much a jq novice. Even if it's not practically doable in v1.5, at least I won't lose more hours trying to make it work.
OP back after a few hours: I found something that seems to work, but I'm still very interested in any comments, other ways to crack the problem, or improvements.
.content.data
| .hits[].doc.notes |= map(
if ( .types | any(startswith("x") or startswith("y")))
then .
else empty
end
)
This is just a variation of the solution proposed by the OP. It illustrates how a complex use of del can be expressed in a more straightforward and robust way by crafting a suitable helper function.
The relevant helper function in the present case implements the stripping-out requirement:
# Input: an object some keys of which are to be removed
def prune:
to_entries
| map( select( any(.value.types[]; test("^(x|y)")) ) )
| from_entries ;
The task can now be accomplished using a one-liner:
.content.data | .hits |= map( .doc.notes |= prune )
Invocation
With the above jq program in program.jq, a suitable invocation of jq
would look like this:
jq -f program.jq input.json
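Equivalently, the def can be inlined into a one-shot command, e.g.:
jq 'def prune: to_entries | map(select(any(.value.types[]; test("^(x|y)")))) | from_entries; .content.data | .hits |= map(.doc.notes |= prune)' input.json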

jq - merge two JSON files on a value

I have two JSON files structured like this:
file 1
[
{
"id": 25422,
"location": "Hotel X",
"suppliers": [
12
]
},
{
"id": 25423,
"location": "Hotel Y",
"suppliers": [
13
]
}]
file 2
[
{
"id": 12,
"vatNumber": "0000000000"
},
{
"id": 14,
"vatNumber": "0000000001"
}]
and I'd like a result like this:
[
{
"id": 25422,
"location": "Hotel X",
"suppliers": [
12
],
"vatNumber": "0000000000"
},
{
"id": 25423,
"location": "Hotel Y",
"suppliers": [
13
]
}]
The important thing to me is that the matching vatNumbers are set in the first file. The supplier arrays are not required anymore after the merge, if that simplifies the job.
Also, jq is not essential, but I need something I can use from the terminal to set up a script.
Thank you in advance.
Here's one of many possible solutions. If your jq does not have INDEX/2, then either upgrade your jq or include its def (available e.g. from https://github.com/stedolan/jq/blob/master/src/builtin.jq):
Invocation:
jq -n --argfile f1 file1.json --argfile f2 file2.json -f merge.jq
merge.jq:
INDEX($f2[] ; .id) as $dict
| $f1
| map( ($dict[.suppliers[0]|tostring]|.vatNumber) as $vn
| if $vn then .vatNumber = $vn else . end)
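Applied to the two sample files, this produces:
[
  {
    "id": 25422,
    "location": "Hotel X",
    "suppliers": [
      12
    ],
    "vatNumber": "0000000000"
  },
  {
    "id": 25423,
    "location": "Hotel Y",
    "suppliers": [
      13
    ]
  }
]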

Remove matching/non-matching elements of a nested array using jq

I need to split the results of a SonarQube analysis history into individual files. Assuming the starting input below,
{
"paging": {
"pageIndex": 1,
"pageSize": 100,
"total": 3
},
"measures": [
{
"metric": "coverage",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "100.0"
},
{
"date": "2018-11-21T12:22:39+0000",
"value": "100.0"
},
{
"date": "2018-11-21T13:09:02+0000",
"value": "100.0"
}
]
},
{
"metric": "bugs",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
},
{
"date": "2018-11-21T12:22:39+0000",
"value": "0"
},
{
"date": "2018-11-21T13:09:02+0000",
"value": "0"
}
]
},
{
"metric": "vulnerabilities",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
},
{
"date": "2018-11-21T12:22:39+0000",
"value": "0"
},
{
"date": "2018-11-21T13:09:02+0000",
"value": "0"
}
]
}
]
}
How do I use jq to trim the results so that each output retains only the history entries for a single date? The desired output is something like this (output-20181118123808.json for the analysis done on "2018-11-18T12:37:08+0000"):
{
"paging": {
"pageIndex": 1,
"pageSize": 100,
"total": 3
},
"measures": [
{
"metric": "coverage",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "100.0"
}
]
},
{
"metric": "bugs",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
}
]
},
{
"metric": "vulnerabilities",
"history": [
{
"date": "2018-11-18T12:37:08+0000",
"value": "0"
}
]
}
]
}
I am lost on how to operate only on the sub-elements while leaving the parent structure intact. The naming of the JSON files will be handled externally from the jq utility. The sample data provided will be split into 3 files. Other inputs can have a variable number of entries, some with up to 10,000. Thanks.
Here is a solution which uses awk to write the distinct files. The solution assumes that the dates for each measure are the same and in the same order, but imposes no limit on the number of distinct dates, or the number of distinct measures.
jq -c 'range(0; .measures[0].history|length) as $i
| (.measures[0].history[$i].date|gsub("[^0-9]";"")), # basis of filename
reduce range(0; .measures|length) as $j (.;
.measures[$j].history |= [.[$i]])' input.json |
awk -F\\t 'fn {print >> fn; fn="";next}{fn="output-" $1 ".json"}'
Comments
The choice of awk here is just for convenience.
The disadvantage of this approach is that if each file is to be neatly formatted, an additional run of a pretty-printer (such as jq) would be required for each file. Thus, if the output in each file is required to be neat, a case could be made for running jq once for each date, thus obviating the need for the post-processing (awk) step.
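For the record, here is a sketch of that per-date variant (assuming bash, and that the dates in .measures[0].history cover all measures):
for d in $(jq -r '.measures[0].history[].date' input.json); do
  jq --arg d "$d" '.measures |= map(.history |= map(select(.date == $d)))' \
    input.json > "output-${d//[!0-9]/}.json"
done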
If the dates of the measures are not in lock-step, then the same approach as above could still be used, but of course the gathering of the dates and the corresponding measures would have to be done differently.
Output
The first two lines produced by the invocation of jq above are as follows:
"201811181237080000"
{"paging":{"pageIndex":1,"pageSize":100,"total":3},"measures":[{"metric":"coverage","history":[{"date":"2018-11-18T12:37:08+0000","value":"100.0"}]},{"metric":"bugs","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]},{"metric":"vulnerabilities","history":[{"date":"2018-11-18T12:37:08+0000","value":"0"}]}]}
In the comments, the following addendum to the original question appeared:
is there a variation wherein the filtering is based on the date value and not the position? It is not guaranteed that the order will be the same or the number of elements in each metric is going to be the same (i.e. some dates may be missing "bugs", some might have additional metric such as "complexity").
The following will produce a stream of JSON objects, one per date. This stream can be annotated with the date as per my previous answer, which shows how to use these annotations to create the various files. For ease of understanding, we use two helper functions:
def dates:
INDEX(.measures[].history[].date; .)
| keys;
def gather($date): map(select(.date==$date));
dates[] as $date
| .measures |= map( .history |= gather($date) )
INDEX/2
If your jq does not have INDEX/2, now would be an excellent time to upgrade, but in case that's not feasible, here is its def:
def INDEX(stream; idx_expr):
reduce stream as $row ({};
.[$row|idx_expr|
if type != "string" then tojson
else .
end] |= $row);
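Putting the pieces together with the same awk step as before (a sketch; prepend the above def of INDEX/2 if your jq lacks it):
jq -c 'def dates: INDEX(.measures[].history[].date; .) | keys;
       def gather($date): map(select(.date==$date));
       dates[] as $date
       | ($date | gsub("[^0-9]";"")),
         (.measures |= map(.history |= gather($date)))' input.json |
awk -F\\t 'fn {print >> fn; fn=""; next} {fn="output-" $1 ".json"}'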

How to convert complex JSON to CSV using JQ 1.4

I am using jq 1.4 on a 64-bit Windows machine.
Below are the contents of the input file IP.txt:
{
"results": [
{
"name": "Google",
"employees": [
{
"name": "Michael",
"division": "Engineering"
},
{
"name": "Laura",
"division": "HR"
},
{
"name": "Elise",
"division": "Marketing"
}
]
},
{
"name": "Microsoft",
"employees": [
{
"name": "Brett",
"division": "Engineering"
},
{
"name": "David",
"division": "HR"
}
]
}
]
}
{
"results": [
{
"name": "Amazon",
"employees": [
{
"name": "Watson",
"division": "Marketing"
}
]
}
]
}
The file contains two "results" documents: the first contains information for two companies, Google and Microsoft; the second for Amazon.
I want to convert this JSON into a CSV file with company name and employee name:
"Google","Michael"
"Google","Laura"
"Google","Elise"
"Microsoft","Brett"
"Microsoft","David"
"Amazon","Watson"
I am able to write the script below:
jq -r "[.results[0].name,.results[0].employees[0].name]|@csv" IP.txt
"Google","Michael"
"Amazon","Watson"
Can someone guide me to write the script without hardcoding the index values?
The script should be able to generate output for any number of results, each containing information for any number of companies.
I tried the script below, which didn't generate the expected output:
jq -r "[.results[].name,.results[].employees[].name]|@csv" IP.txt
"Google","Microsoft","Michael","Laura","Elise","Brett","David"
"Amazon","Watson"
You need to first flatten the results down to rows of company and employee names. With that, you can then convert each row to CSV:
map(.results | map({ cn: .name, en: .employees[].name } | [ .cn, .en ])) | add[] | @csv
Since you have a stream of inputs, you'll have to slurp (-s) it in. Since you want to output csv, you'll want to use raw output (-r).
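Putting it together (using double quotes around the program, as in the question's Windows examples):
jq -s -r "map(.results | map({ cn: .name, en: .employees[].name } | [ .cn, .en ])) | add[] | @csv" IP.txt
which yields:
"Google","Michael"
"Google","Laura"
"Google","Elise"
"Microsoft","Brett"
"Microsoft","David"
"Amazon","Watson"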