Find and edit a JSON file using bash

I have multiple files in the following format with different categories like:
{
  "id": 1,
  "flags": ["a", "b", "c"],
  "name": "test",
  "category": "video",
  "notes": ""
}
Now I want to append the string "d" to the flags of every file whose category is video, so my final file should look like the one below:
{
  "id": 1,
  "flags": ["a", "b", "c", "d"],
  "name": "test",
  "category": "video",
  "notes": ""
}
Using the following command I am able to find the files of interest, but now I need help with the editing part, which I am unable to figure out since there are hundreds of files to edit manually, e.g.
find . -name "*" | xargs grep "\"category\": \"video\"" | awk '{print $1}' | sed 's/://g'

You can do this
find . -type f | xargs grep -l '"category": "video"' | xargs sed -i -e '/flags/ s/]/, "d"]/'
This will find all the filenames that contain a line with "category": "video", and then add the "d" flag.
Details:
find . -type f
=> Will get all the filenames under your directory (recursively)
xargs grep -l '"category": "video"'
=> Will get those filenames which contain the line "category": "video"
xargs sed -i -e '/flags/ s/]/, "d"]/'
=> Will add the "d" flag to the flags line.
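For example, on the sample flags line the substitution behaves like this (a quick illustration on a hypothetical input line, not an extra step you need to run):
echo '"flags": ["a", "b", "c"],' | sed -e '/flags/ s/]/, "d"]/'
# => "flags": ["a", "b", "c", "d"],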

"TWEET!!" ... (yellow flag thown to the ground) ... Time Out!
What you have, here, is "a JSON file." You also have, at your #!shebang command, your choice of(!) full-featured programming languages ... with intimate and thoroughly-knowledgeale support for JSON ... with which you can very-speedily write your command-file.
Even if it is "theoretically possible" to do this using "bash scripts," this is roughly equivalent to "putting a beautiful stone archway over the front-entrance to a supermarket." Therefore, "waste ye no time" in such an utterly-profitless pursuit. Write a script, using a language that "honest-to-goodness knows about(!) JSON," to decode the contents of the file, then manipulate it (as a data-structure), then re-encode it again.

Here is a more appropriate approach using PHP in shell:
FILE=foo2.json php -r '$file = $_SERVER["FILE"]; $arr = json_decode(file_get_contents($file)); if ($arr->category == "video") { $arr->flags[] = "d"; file_put_contents($file,json_encode($arr)); }'
This will load the file, decode it into an object, add "d" to the flags property only when the category is video, then write it back to the file in JSON format.
To run this for every JSON file, you can use the find command, e.g.
find . -name "*.json" -print0 | while IFS= read -r -d '' file; do
  FILE=$file php -r '$file = $_SERVER["FILE"]; $arr = json_decode(file_get_contents($file)); if ($arr->category == "video") { $arr->flags[] = "d"; file_put_contents($file, json_encode($arr)); }'
done
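Note that json_encode() produces compact output by default; if you want to keep the files human-readable, one variation (assuming PHP 5.4+ for the JSON_PRETTY_PRINT flag) is:
FILE=foo2.json php -r '$file = $_SERVER["FILE"]; $arr = json_decode(file_get_contents($file)); if ($arr->category == "video") { $arr->flags[] = "d"; file_put_contents($file, json_encode($arr, JSON_PRETTY_PRINT)); }'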

If the files are in the same format, this command may help (version for a single file):
ex +':/category.*video/norm kkf]i, "d"' -scwq file1.json
or:
ex +':/flags/,/category/s/"c"/"c", "d"/' -scwq file1.json
which is basically using Ex editor (now part of Vim).
Explanation:
+ - executes Vim command (man ex)
:/pattern_or_range/cmd - find pattern, and if found execute another Vim command (:h :/)
norm kkf]i - executes keystrokes in normal mode
kk - move cursor up twice
f] - find ]
i, "d" - insert , "d"
-s - silent mode
-cwq - executes wq (write & quit)
For multiple files, use find with -execdir, or extend the above ex command to:
ex +'bufdo!:/category.*video/norm kkf]i, "d"' -scxa *.json
Where bufdo! executes command for every file, and -cxa saves every file. Add -V1 for extra verbose messages.
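For instance, one way to drive the single-file command over many files (a sketch, borrowing the grep -l pre-selection from the earlier answer so that ex only touches files that actually contain a matching category line) is:
grep -rl '"category": "video"' --include='*.json' . | while IFS= read -r f; do
  ex +':/category.*video/norm kkf]i, "d"' -scwq "$f"
done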
If the flags line is not always 2 lines above, you may perform a backward search instead, or use an approach similar to @sps's answer by replacing ] with , "d"].
See also: How to change previous line when the pattern is found? at Vim.SE.

Using jq:
find . -type f | xargs cat | jq 'select(.category=="video") | .flags |= . + ["d"]'
Explanation:
jq 'select(.category=="video") | .flags |= . + ["d"]'
# select(.category=="video") => filters by category field
# .flags |= . + ["d"] => Updates the flags array
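Note that this prints the transformed objects to stdout rather than modifying the files. If the goal is to update each file in place, a per-file loop (a sketch, assuming each file holds a single top-level object) could look like:
find . -type f -name '*.json' -print0 | while IFS= read -r -d '' f; do
  jq 'if .category == "video" then .flags += ["d"] else . end' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done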

Related

Bulk update values in json files (writing files)

I have a set of JSON files in a local folder. What I want to do is change a particular string value in it, permanently. That means, deleting or modifying the old entry, writing a new one, and saving it.
Below is the format of the file:
{
  "name": "ABC #1",
  "description": "This is the description",
  "image": "ipfs://NewUriToReplace/1.png",
  "dna": "a56c520f57ba2a861de8c78099b4691f9dad6e87",
  "edition": 1,
  "date": 1641634646966,
  "creator": "Team Dreamlabs",
  "attributes": [
    {
I want to change ABC #1 to ABC #9501 in this file, ABC #2 to ABC #9502 in the next file, and so on. How do I do that on macOS in one go?
As I understand from the example, you are adding a value of 9500 to your integers after the symbol #.
Because this replacement is a string operation, a loop with the sed command can be used:
for f in *.json; do sed -i.bak 's/\("name": "ABC #\)\([0-9]\)",/\1950\2",/' "$f"; done
It just replaces a single digit with the new composition... Although it handles the example, it obviously would not work for numbers above #9.
Then we need to use a bash function:
function add_number() { old_number=$(cat $1 | sed -n 's/[ ]*"name": "ABC #\([0-9]*\)",/\1/p'); new_number=$(($old_number+9500)); sed -i.bak "s/\(\"name\": \"ABC #\)\([0-9]*\)\",/\1${new_number}\",/" $1; }; for f in *.json; do add_number $f ; done
The function add_number extracts the integer value, adds the desired number to it, and then replaces the content of the file.
sed is used for both the extraction and the replacement.
During extraction, the -n flag suppresses sed's default output and the p flag prints only the result of the substitution; the leading [ ]* also keeps leading spaces out of the captured value.
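For example, on a hypothetical input line the extraction step yields just the number:
echo '  "name": "ABC #7",' | sed -n 's/[ ]*"name": "ABC #\([0-9]*\)",/\1/p'
# prints: 7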
For the replacement, double quotes are used so that bash can expand the variable inside the sed expression, which is why the literal quotes are escaped.
Regarding the addition from the comment below: to make the replacement in the other line with the edition tag (using the same number), just add another sed replacement with the regular expression amended to fit that line.
Finally, the overall code in a cleaner form:
function add_number() {
  old_number=$(sed -n 's/[ ]*"name": "ABC #\([0-9]*\)",/\1/p' "$1")
  new_number=$((old_number + 9500))
  sed -i.bak "s/\(\"name\": \"ABC #\)[0-9]*\",/\1${new_number}\",/" "$1"
  sed -i.bak "s/\(\"edition\": \)[0-9]*,/\1${new_number},/" "$1"
}
for f in *.json; do
  add_number "$f"
done
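As a quick sanity check (a sketch using a hypothetical sample.json), running the function on a minimal file shifts both fields by 9500:
printf '%s\n' '{' '  "name": "ABC #1",' '  "edition": 1,' '  "date": 1641634646966' '}' > sample.json
add_number sample.json
grep -E '"name"|"edition"' sample.json
# =>   "name": "ABC #9501",
# =>   "edition": 9501,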
Those previous answers helped me to write this code:
using variables inside of sed
assigning the variable
If you are going to manipulate your JSON files on more than just this one occasion, then you might want to consider using tools that are designed to accomplish such tasks with ease.
One popular choice could be jq which is a "lightweight and flexible command-line JSON processor" that "has zero runtime dependencies" and is also available for OS X. By using jq within your shell, the following would be one way to accomplish what you have asked for.
Adding the numeric value 9500 to the number sitting in the field called edition:
jq '.edition += 9500' file.json
Interpreting part of the string as a number, again adding 9500 to it, and recomposing the string:
jq '.name |= ((./"#" | .[1] |= "\(tonumber + 9500)") | join("#"))' file.json
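As a quick sanity check (the input object here is a made-up fragment of the sample file), the two filters combined give:
echo '{"name": "ABC #1", "edition": 1}' | jq -c '.edition += 9500 | .name |= ((./"#" | .[1] |= "\(tonumber + 9500)") | join("#"))'
# => {"name":"ABC #9501","edition":9501}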
On the whole, iterating over your files, making both changes at once, writing to a temporary file and replacing the original on success, while having the value to be added as an external variable:
v=9500
for f in *.json; do jq --argjson v $v '
.edition += $v | .name |= ((./"#" | .[1] |= "\(tonumber + $v)") | join("#"))
' "$f" > "$f.new" && mv "$f.new" "$f"
done
Here is an online "playground for jq", set up to simulate the application of my code from above to three imaginary files of yours. Feel free to edit the jq filter and/or the input JSON in order to see what could be possible using jq.

jq: filter result by value (contains) is very slow

I am trying to use jq to filter a large number of JSON files and extract the ids of the objects that belong to a specific domain, as well as the full URL within that domain. Here's a sample of the data:
{
  "items": [
    {
      "completeness": 5,
      "dcLanguageLangAware": {
        "def": [
          "de"
        ]
      },
      "edmIsShownBy": [
        "https://gallica.example/image/2IC6BQAEGWUEG4OP7AYBDGIGYAX62KZ6H366KXP2IKVAF4LKY37Q/presentation_images/5591be60-01fc-11e6-8e10-fa163e091926/node-3/image/SBB/Berliner_Börsenzeitung/1920/02/27/F_065_098_0/F_SBB_00007_19200227_065_098_0_001/full/full/0/default.jpg"
      ],
      "id": "/9200355/BibliographicResource_3000117730632",
      "type": "TEXT",
      "ugc": [
        false
      ]
    }
  ]
}
Bigger sample here: https://www.dropbox.com/s/0s0zjtxe01mecjc/AoQhRn%2B56KDm5AJJPwEvOTIwMDUyMC9hcmtfXzEyMTQ4X2JwdDZrMTAyNzY2Nw%3D%3D.json?dl=0
I can extract both the ids and the URLs which contain the string "gallica" using the following command:
jq '[ .items[] | select(.edmIsShownBy[] | contains ("gallica")) | {id: .id, link: .edmIsShownBy[] }]'
However, I have more than 28000 JSON files to process and it is taking a large amount of time (around 1 file per minute). I am processing the files using bash with the command:
find . -name "*.json" -exec cat '{}' ';' | jq '[ .items[] | select(.edmIsShownBy[] | contains ("gallica")) | {id: .id, link: .edmIsShownBy[] }]'
I was wondering if the slowness is due to the instructions given to jq, and if that is the case, is there a faster way to filter on a string contained in a chosen value? Any ideas?
It would probably be wise not to attempt to cat all the files at once; indeed, it would probably be best to avoid cat altogether.
For example, assuming program.jq contains whichever jq program you decide on (and there is nothing wrong with using contains here), you could try:
find . -name "*.json" -exec jq -f program.jq '{}' +
Using + instead of ';' minimizes the number of times jq must be called, though the overhead of invoking jq is actually quite small. If your find does not support + and you wish to avoid calling jq once per file, then consider using xargs, or GNU parallel with the --xargs option.
If you know the JSON files of interest are in the pwd, you could also speed up find by specifying -maxdepth 1.
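For example, an equivalent using xargs (a sketch; it assumes your find and xargs support -print0/-0, which GNU and BSD versions do):
find . -maxdepth 1 -name '*.json' -print0 | xargs -0 jq -f program.jq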

JQ Group Multiple Files

I have a set of files that all contain JSON in the following format:
File 1:
{ "host" : "127.0.0.1", "port" : "80", "data": {}}
File 2:
{ "host" : "127.0.0.2", "port" : "502", "data": {}}
File 3:
{ "host" : "127.0.0.1", "port" : "443", "data": {}}
These files can be rather large, up to several gigabytes.
I want to use jq or some other bash JSON processing tool that can merge these JSON files into one file with a grouped format like so:
[{ "host" : "127.0.0.1", "data": {"80": {}, "443" : {}}},
{ "host" : "127.0.0.2", "data": {"502": {}}}]
Is this possible with jq and if yes, how could I possibly do this? I have looked at the group_by function in jq, but it seems like I need to combine all files first and then group on this big file. However, since the files can be very large, it might make sense to stream the data and group them on the fly.
With really big files, I'd look into a primarily disk based approach instead of trying to load everything into memory. The following script leverages sqlite's JSON1 extension to load the JSON files into a database and generate the grouped results:
#!/usr/bin/env bash
DB=json.db
# Delete existing database if any.
rm -f "$DB"
# Create table. Assuming each host,port pair is unique.
sqlite3 -batch "$DB" <<'EOF'
CREATE TABLE data(host TEXT, port INTEGER, data TEXT,
PRIMARY KEY (host, port)) WITHOUT ROWID;
EOF
# Insert the objects from the files into the database.
for file in file*.json; do
sqlite3 -batch "$DB" <<EOF
INSERT INTO data(host, port, data)
SELECT json_extract(j, '\$.host'), json_extract(j, '\$.port'), json_extract(j, '\$.data')
FROM (SELECT json(readfile('$file')) AS j) as json;
EOF
done
# And display the results of joining the objects. Could use
# json_group_array() instead of this sed hackery, but we're trying to
# avoid building a giant string with the entire results. It might still
# run into sqlite maximum string length limits...
sqlite3 -batch -noheader -list "$DB" <<'EOF' | sed '1s/^/[/; $s/,$/]/'
SELECT json_object('host', host,
'data', json_group_object(port, json(data))) || ','
FROM data
GROUP BY host
ORDER BY host;
EOF
Running this on your sample data prints out:
[{"host":"127.0.0.1","data":{"80":{},"443":{}}},
{"host":"127.0.0.2","data":{"502":{}}}]
If the goal is really to produce a single ginormous JSON entity, then presumably that entity is still small enough to have a chance of fitting into the memory of some computer, say C. So there is a good chance of jq being up to the job on C. At any rate, to utilize memory efficiently, you would:
use inputs while performing the grouping operation;
avoid the built-in group_by (since it requires an in-memory sort).
Here then is a two-step candidate using jq, which assumes grouping.jq contains the following program:
# emit a stream of arrays assuming that f is always string-valued
def GROUPS_BY(stream; f):
  reduce stream as $x ({}; ($x|f) as $s | .[$s] += [$x]) | .[];

GROUPS_BY(inputs | .data = .port | del(.port); .host)
| {host: .[0].host, data: map({(.data): {}}) | add}
If the JSON files can be captured by *.json, you could then consider:
jq -n -f grouping.jq *.json | jq -s .
One advantage of this approach is that if it fails, you could try using a temporary file to hold the output of the first step, and then process it later, either by "slurping" it, or perhaps more sensibly by distributing it amongst several files, one per .host.
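For example (the file names here are purely illustrative), the intermediate stream could be kept on disk and slurped in a separate step:
jq -n -f grouping.jq *.json > groups.ndjson   # step 1: one array per group, streamed to disk
jq -s . groups.ndjson > grouped.json          # step 2: slurp the groups into a single array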
Removing extraneous data
Obviously, if the input files contain extraneous data, you might want to remove it first, e.g. by running
for f in *.json ; do
  jq '{host,port}' "$f" | sponge "$f"
done
or by performing the projection in program.jq, e.g. using:
GROUPS_BY(inputs | {host, data: .port}; .host)
| {host: .[0].host, data: map( {(.data): {}} ) | add}
Here's a script which uses jq to solve the problem without requiring more memory than is needed for the largest group. For simplicity:
it reads *.json and directs output to $OUT as defined at the top of the script.
it uses sponge
#!/usr/bin/env bash
# Requires: sponge
OUT=big.json
/bin/rm -i "$OUT"
if [ -s "$OUT" ] ; then
echo $OUT already exists
exit 1
fi
### Step 0: setup
TDIR=$(mktemp -d /tmp/grouping.XXXX)
function cleanup {
if [ -d "$TDIR" ] ; then
/bin/rm -r "$TDIR"
fi
}
trap cleanup EXIT
### Step 1: find the groups
for f in *.json ; do
host=$(jq -r '.host' "$f")
echo "$f" >> "$TDIR/$host"
done
for f in $TDIR/* ; do
echo $f ...
jq -n 'reduce (inputs | {host, data: {(.port): {} }}) as $in (null;
  .host = $in.host | .data += $in.data)' $(cat "$f") | sponge "$f"
done
### Step 2: assembly
i=0
echo "[" > $OUT
find $TDIR -type f | while read f ; do
i=$((i + 1))
if [ $i -gt 1 ] ; then echo , >> $OUT ; fi
cat "$f" >> $OUT
done
echo "]" >> $OUT
Discussion
Besides requiring enough memory to handle the largest group, the main deficiencies of the above implementation are:
it assumes that the .host string is suitable as a file name.
the resultant file is not strictly speaking pretty-printed.
These two issues could however be addressed quite easily with minor modifications to the script without requiring additional memory.
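For example, the file-name concern could be sidestepped (a sketch, not part of the original script) by keying the temporary files on a hash of the host rather than on the host string itself; since the jq reduce in Step 1 reads .host from the file contents anyway, the grouping result is unchanged:
# In Step 1, instead of: echo "$f" >> "$TDIR/$host"
# md5sum is from GNU coreutils; on macOS use `md5 -q` instead
key=$(printf '%s' "$host" | md5sum | awk '{print $1}')
echo "$f" >> "$TDIR/$key"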

Loop through JSON array shell script

I am trying to write a shell script that loops through a JSON file and does some logic based on every object's properties. The script was initially written for Windows but it does not work properly on macOS.
The initial code is as follows
documentsJson=""
jsonStrings=$(cat "$file" | jq -c '.[]')
while IFS= read -r document; do
# Get the properties from the document (json string)
currentKey=$(echo "$document" | jq -r '.Key')
encrypted=$(echo "$document" | jq -r '.IsEncrypted')
# If not encrypted then don't do anything with it
if [[ $encrypted != true ]]; then
echoComment " Skipping '$currentKey' as it's not marked for encryption"
documentsJson+="$document,"
continue
fi
# some more code
done <<< $jsonStrings
When run on macOS, the whole file is processed at once, so it does not loop through the objects.
The closest I got to making it work - after trying a lot of suggestions - is as follows:
jq -r '.[]' "$file" | while read i; do
for config in $i ; do
currentKey=$(echo "$config" | jq -r '.Key')
echo "$currentKey"
done
done
The console result is parse error: Invalid numeric literal at line 1, column 6
I just cannot find a proper way of grabbing the JSON object and reading its properties.
JSON file example
[
  {
    "Key": "PdfMargins",
    "Value": {
      "Left": 0,
      "Right": 0,
      "Top": 20,
      "Bottom": 15
    }
  },
  {
    "Key": "configUrl",
    "Value": "someUrl",
    "IsEncrypted": true
  }
]
Thank you in advance!
Try putting $jsonStrings in double quotes: done <<< "$jsonStrings"
Otherwise standard shell word splitting applies to the variable expansion, and you probably want to retain the line structure of the output of jq.
You could also use this in bash:
while IFS= read -r document; do
...
done < <(jq -c '.[]' < "$file")
That would save some resources. I am not sure about making this work on macOS, though, so test this first.
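Putting both suggestions together, a corrected version of the original loop might look like this (a sketch; echoComment from the question is replaced by a plain echo):
documentsJson=""
while IFS= read -r document; do
  currentKey=$(jq -r '.Key' <<< "$document")
  encrypted=$(jq -r '.IsEncrypted' <<< "$document")
  if [[ $encrypted != true ]]; then
    echo "Skipping '$currentKey' as it's not marked for encryption"
    documentsJson+="$document,"
    continue
  fi
  # ... more processing ...
done < <(jq -c '.[]' < "$file")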

shell script parsing issue: unexpected INVALID_CHARACTER (Windows cmd shell quoting issues?)

What I am doing?
I have one JSON file, sonar-report.json. I want to iterate over it in a shell script to read values from the JSON.
To parse the JSON file I am using jq: https://stedolan.github.io/jq/
So I was trying to execute the following code in a shell script:
alias jq=./jq-win64.exe
for key in $(jq '.issues | keys | .[]' sonar-report.json); do
echo "$key"
line=$(jq -r ".issues[$key].line" sonar-report.json)
done
Problem
When I execute this, the console gives me the error:
jq: error: syntax error, unexpected INVALID_CHARACTER (Windows cmd shell quoting issues?) at <top-level>, line 1:
If I update my script above and add a static array index, then the script works fine:
alias jq=./jq-win64.exe
for key in $(jq '.issues | keys | .[]' sonar-report.json); do
echo "$key"
line0=$(jq -r ".issues[0].line" sonar-report.json)
line1=$(jq -r ".issues[1].line" sonar-report.json)
done
So, at the end, what I want:
I want to iterate over the values and print them to the console, like:
alias jq=./jq-win64.exe
for key in $(jq '.issues | keys | .[]' sonar-report.json); do
line=$(jq -r ".issues[$key].line" sonar-report.json)
echo $line
done
so the output should be
15
This is my JSON file, sonar-report.json:
{
  "issues": [
    {
      "key": "016B7970D27939AEBD",
      "component": "bits-and-bytes:src/main/java/com/catalystone/statusreview/handler/StatusReviewDecisionLedHandler.java",
      "line": 15,
      "startLine": 15,
      "startOffset": 12,
      "endLine": 15,
      "endOffset": 14,
      "message": "Use the \"equals\" method if value comparison was intended.",
      "severity": "MAJOR",
      "rule": "squid:S4973",
      "status": "OPEN",
      "isNew": true,
      "creationDate": "2019-06-21T15:19:18+0530"
    },
    {
      "key": "AWtqCc-jtovxS8PJjBiP",
      "component": "bits-and-bytes:src/test/java/com/catalystone/statusreview/service/StatusReviewInitiationSerivceTest.java",
      "message": "Fix failing unit tests on file \"src/test/java/com/catalystone/statusreview/service/StatusReviewInitiationSerivceTest.java\".",
      "severity": "MAJOR",
      "rule": "common-java:FailedUnitTests",
      "status": "OPEN",
      "isNew": false,
      "creationDate": "2019-06-18T15:32:08+0530"
    }
  ]
}
Please help me. Thanks in advance!
This looks to me like an instance of Windows/Unix line-ending incompatibility, indicated in jq bugs 92 (for Cygwin) and 1870 (for MSYS2).
Any of the workarounds indicated in those bug reports should work, but once the fix gets into the release binary (presumably v1.7), the simplest solution is to use the new -b command-line option. (The option is available in recent jq preview builds; see the second bug report listed above):
for key in $(jq -b '.issues | keys | .[]' sonar-report.json); do
line=$(jq -rb ".issues[$key].line" sonar-report.json)
# I added quotes in the next line, because it's better style.
echo "$line"
done
Until the next version of jq is available, or if you don't want to upgrade for some reason, a good workaround is to just remove the CRs by piping the output of jq through tr -d '\r':
for key in $(jq '.issues | keys | .[]' sonar-report.json | tr -d '\r'); do
line=$(jq -r ".issues[$key].line" sonar-report.json | tr -d '\r')
echo "$line"
done
However, as pointed out in a comment by Cyrus, you probably don't need to iterate line-by-line in a shell loop, which is incredibly inefficient since it leads to reparsing the entire JSON input many times. You can use jq itself to iterate, with the much simpler:
jq '.issues[].line' sonar-report.json
which will parse the JSON file just once, and then produce each .line value in the file. (You probably still want to use the -b command-line option or other workaround, depending on what you intend to do with the output.)
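If you do need the values back in the shell (rather than just printed), one option (a sketch; requires bash 4+ for mapfile, and tr strips the CRs produced by the Windows build) is:
mapfile -t lines < <(jq '.issues[].line' sonar-report.json | tr -d '\r')
printf '%s\n' "${lines[@]}"
# note: issues without a .line field come through as null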