Best way to parse JSON-like file using awk/sed

I have a file which contents are like the following:
{application_name, [
{settings, [
{generic_1, [
{key_1, "value"},
{key_2, 1},
{key_3, [something, other]}
]},
{generic_2, [
{key_1, "value"},
{key_3, [something, other]}
]},
{{generic_2, specific_1}, [
{key_3, [entirely, different]}
]},
]}
]}
Now I'm looking for a way to parse this using awk or sed (or something else). What I need is to be able to specify a key, and then get the "blockname" returned.
e.g. if I want all settings for key_3 returned as follows:
generic_1 [something, other]
generic_2 [something, other]
specific_1 [entirely, different]
What would be the best way to approach this?

The best solution for how to parse JSON data with sed or awk is... not to do that with sed or awk. They aren't designed for it.
Use a tool that understands JSON like
perl
python
ruby
javascript
jq
Just about anything else
Using anything like sed or awk on this is going to be fragile (at best).
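Since the data is not actually JSON (it looks like Erlang terms), a generic JSON tool won't read it directly either. As a sketch of the "use a real language" route, here is a minimal recursive-descent parser in Python, invoked from the shell in the style of the other answers. The tokenizer and the block-naming rule (take the last element of a tuple key like {generic_2, specific_1}) are assumptions based only on the sample in the question:

```shell
# Recreate the sample file from the question.
cat > config.terms <<'EOF'
{application_name, [
{settings, [
{generic_1, [
{key_1, "value"},
{key_2, 1},
{key_3, [something, other]}
]},
{generic_2, [
{key_1, "value"},
{key_3, [something, other]}
]},
{{generic_2, specific_1}, [
{key_3, [entirely, different]}
]},
]}
]}
EOF

# A minimal recursive-descent parser for the {tuple, [list]} syntax.
cat > extract.py <<'PY'
import re, sys

key, path = sys.argv[1], sys.argv[2]
# Tokens: quoted strings, atoms/numbers, and the punctuation { } [ ] ,
tokens = re.findall(r'"[^"]*"|[A-Za-z0-9_]+|[{}\[\],]', open(path).read())

def parse(i):
    """Parse one term starting at tokens[i]; return (value, next index)."""
    t = tokens[i]
    if t in '{[':
        close = '}' if t == '{' else ']'
        items, i = [], i + 1
        while tokens[i] != close:
            v, i = parse(i)
            items.append(v)
            if tokens[i] == ',':        # also tolerates trailing commas
                i += 1
        return (tuple(items) if close == '}' else items), i + 1
    return t, i + 1                     # atom, number, or quoted string

def walk(term):
    # A "block" is a 2-tuple whose second element is a list of settings.
    if isinstance(term, tuple) and len(term) == 2 and isinstance(term[1], list):
        name = term[0][-1] if isinstance(term[0], tuple) else term[0]
        for item in term[1]:
            if isinstance(item, tuple) and len(item) == 2 and item[0] == key:
                print(name, '[' + ', '.join(item[1]) + ']')
            walk(item)
    elif isinstance(term, list):
        for item in term:
            walk(item)

walk(parse(0)[0])
PY

python3 extract.py key_3 config.terms
```

For the sample above this prints the three lines from the question: generic_1 [something, other], generic_2 [something, other], specific_1 [entirely, different].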

I do agree with Etan that this is a job for other tools. Here is a GNU awk approach (gawk is required because of the multi-character RS); it is not a complete solution.
awk -v RS="generic_[0-9]" 'NR==1 {f=RT;next} {$1=$1;n=split($0,a,"[][]");if (a[1]~/}/) {split(a[1],b,"[ }]");f=b[2]};printf "%s [",f;for (i=1;i<=n;i++) if (a[i]~/key_3/) print a[i+1]"]";f=RT}' file
generic_1 [something, other]
generic_2 [something, other]
specific_1 [entirely, different]
Or, more readably:
awk -v RS="generic_[0-9]" '
NR==1 {
    f=RT
    next
}
{
    $1=$1
    n=split($0,a,"[][]")
    if (a[1]~/}/) {
        split(a[1],b,"[ }]")
        f=b[2]
    }
    printf "%s [",f
    for (i=1;i<=n;i++)
        if (a[i]~/key_3/)
            print a[i+1]"]"
    f=RT
}' file


JQ write each object to subdirectory file

I'm new to jq (around 24 hours). I'm getting the filtering/selection already, but I'm wondering about advanced I/O features. Let's say I have an existing jq query that works fine, producing a stream (not a list) of objects. That is, if I pipe them to a file, it produces:
{
  "id": "foo",
  "value": "123"
}
{
  "id": "bar",
  "value": "456"
}
Is there some fancy expression I can add to my jq query to output each object individually in a subdirectory, keyed by the id, in the form id/id.json? For example current-directory/foo/foo.json and current-directory/bar/bar.json?
As @pmf has pointed out, an "only-jq" solution is not possible. A solution using jq and awk is as follows, though it is far from robust:
<input.json jq -rc '.id, .' | awk '
id=="" {id=$0; next;}
{ path=id; gsub(/[/]/, "_", path);
system("mkdir -p " path);
print >> path "/" id ".json";
id="";
}
'
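A simpler (if slower) variation is to make one jq call per object, which avoids the two-line pairing trick; this sketch assumes the ids are safe to use as directory names:

```shell
# Sample stream of objects, as produced by the original jq query.
printf '%s\n' '{"id":"foo","value":"123"}' '{"id":"bar","value":"456"}' > stream.json

jq -c '.' stream.json | while IFS= read -r obj; do
  id=$(printf '%s' "$obj" | jq -r '.id')
  mkdir -p "$id"                          # create the subdirectory
  printf '%s\n' "$obj" > "$id/$id.json"   # write id/id.json
done
```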
As you will need help from outside jq anyway (see @peak's answer using awk), you also might want to consider using another JSON processor instead which offers more I/O features. One that comes to mind is mikefarah/yq, a jq-inspired processor for YAML, JSON, and other formats. It can split documents into multiple files, and since its v4.27.2 release it also supports reading multiple JSON documents from a single input source.
$ yq -p=json -o=json input.json -s '.id'
$ cat foo.json
{
"id": "foo",
"value": "123"
}
$ cat bar.json
{
"id": "bar",
"value": "456"
}
The argument following -s defines the evaluation filter for each output file's name, .id in this case (the .json suffix is added automatically), and can be adjusted to your needs, e.g. -s '"file_with_id_" + .id'. However, adding slashes will not result in subdirectories being created, so this (from here on comparatively easy) part is left for post-processing in the shell.
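That post-processing could be as simple as the following sketch; it assumes every *.json file in the current directory other than the input is one of the split files:

```shell
# Simulate yq's split output plus the original input file.
printf '%s\n' '{"id":"foo","value":"123"}' > foo.json
printf '%s\n' '{"id":"bar","value":"456"}' > bar.json
: > input.json

# Move each split file id.json into id/id.json.
for f in *.json; do
  [ "$f" = input.json ] && continue
  id=${f%.json}
  mkdir -p "$id" && mv "$f" "$id/$id.json"
done
```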

How to use data from a JQ key to name a new JSON file

I have been trying to modify the accepted code provided by @peak in this thread: Split a JSON file into separate files. I'm very grateful for that answer, as it saved me many hours.
Both of the solutions provided in that thread produce exactly the results I expect and want within the resulting split files. However, the output files are named "$key.json". I would like the file name to be the data contained in the first field of the output file.
Each output file looks something like this:
{
"name": "Bob Smith",
"description": "(some descriptive text)",
"image": "(link to an image file)",
...
}
I have spent several hours trying to figure out how to get the output file names to be "Bob Smith.json", "Jane Doe.json" etc., instead of "0.json", "1.json", etc. I have tried many different ways of modifying the output parameters printf "%s\n" "$item" > "/tmp/$key.json" and '{ print $2 > "/tmp/" $1 ".json" }' without any success. I am completely new to JQ, so I suspect that the solution may be very simple. But, without spending many more hours learning JQ, I don't think I will be able to find it on my own.
For your convenience, here are the solutions from the previous thread:
jq -cr 'keys[] as $k | "\($k)\n\(.[$k])"' input.json |
while read -r key ; do
read -r item
printf "%s\n" "$item" > "/tmp/$key.json"
done
and
jq -cr 'keys[] as $k | "\($k)\t\(.[$k])"' input.json |
awk -F\\t '{ print $2 > "/tmp/" $1 ".json" }'
Can someone who is proficient in JQ please give me a hint? Thank you.
Blindly using .name as the basis of the filename might not be a great idea,
so please adapt the following to your needs.
Assuming the input has the form as in the previous question, i.e.
{ "item1": { "name": "Bob Smith", ...}, ...}
you could use the following pipeline:
jq -cr '.[] | "\(.name)\t\(.)"' input.json |
awk -F\\t '{ print $2 >> "/tmp/" $1 ".json" }'
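If .name can contain characters that are problematic in filenames, one variation (a sketch; the character class here is illustrative, not exhaustive) sanitizes it inside jq before it reaches awk:

```shell
# Sample input in the shape of the previous question.
printf '%s\n' '{"item1":{"name":"Bob/Smith","description":"(text)"}}' > input.json

# Replace slashes and spaces in the name, then write one file per object.
jq -cr '.[] | "\(.name | gsub("[/ ]"; "_"))\t\(.)"' input.json |
awk -F'\t' '{ print $2 >> ($1 ".json") }'
```

This writes Bob_Smith.json while leaving the name field inside the file untouched.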

Using JQ to create a JSON file from a list of files

I want to create a JSON file from a list of files like:
$ ls -r *
folder1/
file1.pdf
file2.pdf
file3.pdf
folder2/
file4.pdf
file5.pdf
file6.pdf
I want a json file that looks like this:
{
'folder1' : [ 'file1.pdf', 'file2.pdf', 'file3.pdf' ],
'folder2' : [ 'file4.pdf', 'file5.pdf', 'file6.pdf' ]
}
For now I am able to create the list of files with jq but not sure how to name them. This is what I am doing now:
$ ls folder | jq -R -s 'split("\n") - [""]'
[
"file1.pdf",
"file2.pdf",
"file3.pdf"
]
Thanks a lot for the help!
PS. Additionally, I need to include a prefix in the name of files. I can try to do it with sed but if maybe there is an easier way to do it here, that would be even better.
You shouldn't parse the output of ls. Pass folder1/file1.pdf, folder1/file2.pdf, etc. to JQ as positional arguments instead, and parse them in there.
jq -n 'reduce ($ARGS.positional[] / "/") as [$k, $v] (.; .[$k] += ["prefix-\($v)"])' --args */*
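For reference, here is what that reduce produces on the sample layout, run here with explicit paths instead of the */* glob so it works without the folders existing (output shown compact via -c):

```shell
jq -nc 'reduce ($ARGS.positional[] / "/") as [$k, $v] (.; .[$k] += ["prefix-\($v)"])' \
  --args folder1/file1.pdf folder1/file2.pdf folder2/file4.pdf
# {"folder1":["prefix-file1.pdf","prefix-file2.pdf"],"folder2":["prefix-file4.pdf"]}
```

Each path is split on "/" into a [$k, $v] pair, and the reduce appends each prefixed filename to the array under its folder key.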

How to replace parameter of a json file by a shell script?

Let's say 123.json has the following content:
{
"LINE" : {
"A_serial" : "1234",
"B_serial" : "2345",
"C_serial" : "3456",
"X_serial" : "76"
}
}
I want a shell script that changes the value of X_serial to the original number + 1, which would be 77 in this example.
I have tried the below script to take out the parameter of X_serial:
grep "X_serial" 123.json | awk '{print $3}'
which outputs "76". But then I don't know how to turn it into 77 and put it back into X_serial.
It's not a good idea to use line-oriented tools for parsing/manipulating JSON data. Use jq instead, for example:
$ jq '.LINE.X_serial |= "\(tonumber + 1)"' 123.json
{
"LINE": {
"A_serial": "1234",
"B_serial": "2345",
"C_serial": "3456",
"X_serial": "77"
}
}
This simply updates .LINE.X_serial by converting its value to a number, increasing the result by one, and converting it back to a string.
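Note that jq does not edit files in place; a common pattern is to write to a temporary file and move it over the original (a sketch):

```shell
# Recreate the sample 123.json.
cat > 123.json <<'EOF'
{
  "LINE" : {
    "A_serial" : "1234",
    "B_serial" : "2345",
    "C_serial" : "3456",
    "X_serial" : "76"
  }
}
EOF

# Increment X_serial, then replace the original file.
jq '.LINE.X_serial |= "\(tonumber + 1)"' 123.json > 123.json.tmp &&
  mv 123.json.tmp 123.json
```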
You need to install a JSON processor such as jq.
Once jq is installed, try the following command to extract the value from the JSON file:
value=$(jq -r '.LINE.X_serial' yourJsonFile.json)
You can then perform whatever operations you need on $value.
With pure JavaScript (node.js and bash):
node <<EOF
var o=$(<123.json);
o["LINE"]["X_serial"] = parseInt(o["LINE"]["X_serial"]) + 1;
console.log(o);
EOF
Output
{ LINE:
   { A_serial: '1234',
     B_serial: '2345',
     C_serial: '3456',
     X_serial: 77 } }
sed or perl, depending on whether you just need string substitution or something more sophisticated, like arithmetic.
Since you tried grep and awk, let's start with sed:
In all lines that contain TEXT, replace foo with bar
sed -n '/TEXT/ s/foo/bar/ p'
So in your case, something like:
sed -n '/X_serial/ s/\"76\"/\"77\"/ p'
or
$ cat 123.json | sed '/X_serial/ s/\"76\"/\"77\"/' > new.json
This performs a literal substitution: "76" -> "77"
If you would like to perform arithmetic, like "+1" or "+10" then use perl not sed:
$ cat 123.json | perl -pe 's/\d+/$&+10/e if /X_serial/'
{
"LINE" : {
"A_serial" : "1234",
"B_serial" : "2345",
"C_serial" : "3456",
"X_serial" : "86"
}
}
This operates on all lines containing X_serial (whether under "LINE" or under something else), as it is not a json parser.
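If you want the file rewritten rather than printed, perl's -i switch edits in place (a sketch; pass a suffix such as -i.bak if you want a backup copy kept):

```shell
# Recreate the sample 123.json.
cat > 123.json <<'EOF'
{
  "LINE" : {
    "A_serial" : "1234",
    "B_serial" : "2345",
    "C_serial" : "3456",
    "X_serial" : "76"
  }
}
EOF

# Add 1 to the number on the X_serial line, editing the file in place.
perl -i -pe 's/\d+/$&+1/e if /X_serial/' 123.json
```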

Use grep to parse a key from a json file and get the value

Can someone suggest how I can get the value 45 after parsing the example JSON text shown below:
....
"test": 12
"job": 45
"task": 11
.....
Please note that I am aware of tools like jq and others but this requires it to be installed.
I am hoping to get this executed using grep, awk or sed command.
awk -F'[[:space:]]*:[[:space:]]*' '/^[[:space:]]*"job"/{ print $2 }' file
sed -n 's/^[[:space:]]*"job"[[:space:]]*:[[:space:]]*//p' file
You can use grep -oP (PCRE):
grep -oP '"job"\s*:\s*\K\d+' file
45
\K is used for resetting the previously matched data.
Using awk, if you just want to print it:
awk -F ':[ \t]*' '/^.*"job"/ {print $2}' filename
The above command matches any line containing "job" and prints the second column of that line. The awk option -F sets the field separator to : followed by any number of spaces or tabs.
If you want to store this value in bash variable job_val:
job_val=$(awk -F ':[ \t]*' '/^.*"job"/ {print $2}' filename)
Use specialized tools like jq for the task :
Had your file looked like
[
{
"test": 12,
"job": 45,
"task": 11
}
]
below stuff would get you home
jq ".[].job" file
Had your file looked like
{
"stuff" :{
.
.
"test": 12,
"job": 45,
"task": 11
.
.
}
}
below
jq ".stuff.job" file
would get you home.
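If the nesting depth is unknown, jq's recursive descent operator .. can locate the key at any level (a sketch; it assumes the key appears only once in the document):

```shell
printf '%s\n' '{"stuff":{"inner":{"test":12,"job":45,"task":11}}}' > deep.json

# .. visits every value; .job? extracts the key where present, with errors
# on non-objects suppressed, and // empty drops the nulls.
jq '.. | .job? // empty' deep.json
# 45
```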