Grep from txt file (JSON format) - json

I have a txt file in JSON format:
{
  "items" : [ {
    "downloadUrl" : "some url",
    "path" : "yxxsf",
    "id" : "abc",
    "repository" : "example",
    "format" : "zip",
    "checksum" : {
      "sha1" : "kdhjfksjdfasdfa",
      "md5" : "skjfhkjshdfkjshfkjsdhf"
    }
  } ],
  "continuationToken" : null
}
I want to extract the downloadUrl value (in this example I want "some url") using grep and store it in another txt file. TBH I have never used grep.

Using grep:
grep -oP '"downloadUrl"\s*:\s*"\K[^"]*' myfile > urlFile.txt
(With -o, grep prints the whole match rather than a capture group, so \K is used to drop the key prefix from the output.)
See this regex in action: https://regex101.com/r/DvnXCO/1
A better way to do this is to use jq.
Download jq for Windows: https://stedolan.github.io/jq/download/
jq -r '.items[0].downloadUrl' myfile > urlFile.txt
The -r flag outputs the raw string without the surrounding quotes.

Although a JSON string may contain a double-quote character escaped by a
backslash, both double quotes and backslashes in URLs should be
percent-encoded according to RFC 3986, so matching up to the next quote is
safe here. First use tr to pre-process the JSON file, converting all blank
characters (including newlines) to spaces; the following grep then works
even if the name and the value are on separate (but consecutive) lines:
tr "[:space:]" " " < file.json | grep -Po '"downloadUrl"\s*:\s*\K"[^"]+"'
The \K operator in the regex is a variable-length look-behind that excludes
the preceding pattern from the matched result.
Note that the command above works with the provided example but may not
be robust enough for arbitrary inputs. I'd still recommend using jq
for this purpose.
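To see the pipeline in action, recreate the example from the question (the file name myfile.json is illustrative) and run the tr/grep combination; note that -P requires GNU grep built with PCRE support:

```shell
# Recreate the example JSON (file name is illustrative)
cat > myfile.json <<'EOF'
{
  "items" : [ {
    "downloadUrl" : "some url",
    "format" : "zip"
  } ],
  "continuationToken" : null
}
EOF

# Flatten all whitespace to single spaces, then match the quoted value;
# \K discards the key prefix from the reported match.
tr "[:space:]" " " < myfile.json | grep -Po '"downloadUrl"\s*:\s*\K"[^"]+"'
```

This prints "some url" with the surrounding quotes, since the pattern after \K includes them.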

If you want to use only grep (note that grep is case-sensitive, so the pattern must match the key's exact spelling, downloadUrl):
grep downloadUrl myfile > new_file.txt
If you prefer a cleaner option, add the cut command:
grep downloadUrl myfile | cut -d\" -f4 > new_file.txt
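To see why -f4 is the right field, split the matching line on double quotes: field 1 is empty, field 2 is the key, field 3 is the " : " separator, and field 4 is the URL. A quick check on a single line:

```shell
# Split on double quotes; the value sits in the 4th field.
printf '%s\n' '"downloadUrl" : "some url",' | cut -d\" -f4
```

This prints `some url` without the quotes.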
BTW the image of the JSON file shows that you are using Notepad (Windows?).

Related

JQ write each object to subdirectory file

I'm new to jq (around 24 hours). I'm getting the filtering/selection already, but I'm wondering about advanced I/O features. Let's say I have an existing jq query that works fine, producing a stream (not a list) of objects. That is, if I pipe them to a file, it produces:
{
  "id": "foo",
  "value": "123"
}
{
  "id": "bar",
  "value": "456"
}
Is there some fancy expression I can add to my jq query to output each object individually in a subdirectory, keyed by the id, in the form id/id.json? For example current-directory/foo/foo.json and current-directory/bar/bar.json?
As @pmf has pointed out, an "only-jq" solution is not possible. A solution using jq and awk is as follows, though it is far from robust:
<input.json jq -rc '.id, .' | awk '
  id=="" { id=$0; next }
  {
    path=id; gsub(/[/]/, "_", path);
    system("mkdir -p " path);
    print >> (path "/" id ".json");
    id="";
  }
'
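The awk half can be tested in isolation by simulating the `jq -rc '.id, .'` stream with printf (the ids and values are illustrative); note the parentheses around the redirection target, which strict POSIX awk implementations require:

```shell
# Simulate the `jq -rc '.id, .'` output: an id line, then the object line.
printf '%s\n' 'foo' '{"id":"foo","value":"123"}' \
              'bar' '{"id":"bar","value":"456"}' |
awk '
  id=="" { id=$0; next }
  {
    path=id; gsub("/", "_", path);     # sanitize slashes in directory names
    system("mkdir -p " path);
    print >> (path "/" id ".json");    # parentheses for POSIX awk
    close(path "/" id ".json");
    id="";
  }
'
cat foo/foo.json bar/bar.json
```

This creates foo/foo.json and bar/bar.json in the current directory, each holding one object.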
As you will need help from outside jq anyway (see @peak's answer using awk), you might also want to consider another JSON processor that offers more I/O features. One that comes to mind is mikefarah/yq, a jq-inspired processor for YAML, JSON, and other formats. It can split documents into multiple files, and since its v4.27.2 release it also supports reading multiple JSON documents from a single input source.
$ yq -p=json -o=json input.json -s '.id'
$ cat foo.json
{
"id": "foo",
"value": "123"
}
$ cat bar.json
{
"id": "bar",
"value": "456"
}
The argument following -s defines the evaluation filter for each output file's name (.id in this case; the .json suffix is added automatically) and can be adapted to further needs, e.g. -s '"file_with_id_" + .id'. However, adding slashes will not result in subdirectories being created, so this (from here on comparatively easy) part is left for post-processing in the shell.

How to replace parameter of a json file by a shell script?

Let's say 123.json with below content:
{
  "LINE" : {
    "A_serial" : "1234",
    "B_serial" : "2345",
    "C_serial" : "3456",
    "X_serial" : "76"
  }
}
I want to use a shell script to change the value of X_serial to the original number + 1, which is 77 in this example.
I have tried the script below to extract the value of X_serial:
grep "X_serial" 123.json | awk '{print $3}'
which outputs "76". But I don't know how to turn it into 77 and then put it back into X_serial.
It's not a good idea to use line-oriented tools for parsing/manipulating JSON data. Use jq instead, for example:
$ jq '.LINE.X_serial |= "\(tonumber + 1)"' 123.json
{
"LINE": {
"A_serial": "1234",
"B_serial": "2345",
"C_serial": "3456",
"X_serial": "77"
}
}
This simply updates .LINE.X_serial by converting its value to a number, increasing the result by one, and converting it back to a string.
You need to install a powerful JSON querying processor like jq; you can easily install it from here.
Once jq is installed, try the following command to extract the value from the JSON file:
value=$(jq -r '.LINE.X_serial' yourJsonFile.json)
You can then modify $value with whatever operations you prefer.
With pure JavaScript: nodejs and bash:
node <<EOF
var o=$(</tmp/file);
o["LINE"]["X_serial"] = parseInt(o["LINE"]["X_serial"]) + 1;
console.log(o);
EOF
Output
{ LINE:
{ A_serial: '1234',
B_serial: '2345',
C_serial: '3456',
X_serial: 77 }
}
Use sed or perl, depending on whether you just need string substitution or something more sophisticated, like arithmetic.
Since you tried grep and awk, let's start with sed:
In all lines that contain TEXT, replace foo with bar
sed -n '/TEXT/ s/foo/bar/ p'
So in your case, something like:
sed -n '/X_serial/ s/\"76\"/\"77\"/ p'
or
$ cat 123.json | sed '/X_serial/ s/\"76\"/\"77\"/' > new.json
This performs a literal substitution: "76" -> "77"
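The substitution can be checked on a single line (input is illustrative):

```shell
# sed replaces the quoted literal "76" with "77" on the matching line.
printf '%s\n' '"X_serial" : "76"' | sed 's/"76"/"77"/'
```

This prints `"X_serial" : "77"`.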
If you would like to perform arithmetic, like "+1" or "+10", then use perl, not sed:
$ cat 123.json | perl -pe 's/\d+/$&+10/e if /X_serial/'
{
"LINE" : {
"A_serial" : "1234",
"B_serial" : "2345",
"C_serial" : "3456",
"X_serial" : "86"
}
}
This operates on all lines containing X_serial (whether under "LINE" or under something else), as it is not a json parser.

jq injects literal newline and escape characters instead of an actual newline

I have the following JSON:
{
"overview_ui": {
"display_name": "my display name",
"long_description": "my long description",
"description": "my description"
}
}
I grab it like so:
overview_ui=$(jq -r ".overview_ui" service.json)
I then want to use it to replace content in another JSON file:
jq -r --arg updated_overview_ui_strings "${overview_ui}" '.overview_ui.${language} |= $updated_overview_ui_strings' someOtherFile.json
This works, however it also introduces visible newline \n and escape \ characters instead of actually preserving the newlines as newlines. Why does it do that?
"en": "{\n \"display_name\": \"my display name\",\n \"long_description\": \"my long description\",\n \"description\": \"my description\"\n}",
You have read the overview_ui variable in as a string (using --arg) so when you assigned it, you assigned that string (along with the formatting). You would either have to parse it as an object (using fromjson) or just use --argjson instead.
jq -r --argjson updated_overview_ui_strings "${overview_ui}" ...
Though, you don't really need to do this in multiple separate invocations; you can read the file in as an argument and do it all in one call.
$ jq --argfile service service.json --arg language en '
.overview_ui[$language] = $service.overview_ui
' someOtherFile.json

echo json over command line into file

I have a build tool which is creating a versions.json file injected with a json format string.
Initially I was thinking of just injecting the json via an echo, something like below.
json = {"commit_id": "b8f2b8b", "environment": "test", "tags_at_commit": "sometags", "project": "someproject", "current_date": "09/10/2014", "version": "someversion"}
echo -e json > versions.json
However the echo seems to escape out all of the quote marks so my file will end up something like this:
{commit_id: b8f2b8b, environment: test, tags_at_commit: somereleasetags, project: someproject, current_date: 09/10/2014, version: someproject}
This unfortunately is not valid JSON.
To preserve double quotes you need to surround your variable in single quotes, like so:
json='{"commit_id": "b8f2b8b", "environment": "test", "tags_at_commit": "sometags", "project": "someproject", "current_date": "09/10/2014", "version": "someversion"}'
echo "$json" > versions.json
Take into account that this method will not expand variables; it will print a literal $variable instead.
If you need to print variables, use the cat << EOF construct, which utilizes the Here Document redirection built into Bash. See man bash and search for "here document" for more information.
Example:
commit="b8f2b8b"
environment="test"
...etc
cat << EOF > /versions.json
{"commit_id": "$commit", "environment": "$environment", "tags_at_commit": "$tags", "project": "$project", "current_date": "$date", "version": "$version"}
EOF
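A self-contained version of the sketch above, with illustrative values filled in for the variables; the unquoted EOF delimiter is what allows expansion inside the here-document:

```shell
commit="b8f2b8b"
environment="test"
version="someversion"
# Unquoted delimiter => variables inside the heredoc are expanded.
cat << EOF > versions.json
{"commit_id": "$commit", "environment": "$environment", "version": "$version"}
EOF
cat versions.json
```

This writes the expanded JSON line to versions.json.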
If you're looking for a more advanced json processing tool that works very well with bash, I'd recommend jq
If you want variables in between, you can quote like this: '"'$variable'"'. Below is a date example.
echo {'"date"' : '"'$(date +"%d_%m_%Y")'"'} > cron_checkpoint.json

Regex to extract all Starred Items URLs from Google Reader JSON file

Sadly it was announced that Google Reader will be shutdown mid of the year.
Since I have a large amount of starred items in Google Reader I'd like to back them up.
This is possible via Google Reader takeout. It produces a file in JSON format.
Now I would like to extract all of the article urls out of this several MB large file.
At first I thought it would be best to use a generic URL regex, but it seems better to use a regex that finds just the article URLs. This prevents also extracting other URLs that are not needed.
Here is a short example how parts of the json file looks:
"published" : 1359723602,
"updated" : 1359723602,
"canonical" : [ {
"href" : "http://arstechnica.com/apple/2013/02/omni-group-unveils-omnifocus-2-omniplan-omnioutliner-4-for-mac/"
} ],
"alternate" : [ {
"href" : "http://feeds.arstechnica.com/~r/arstechnica/everything/~3/EphJmT-xTN4/",
"type" : "text/html"
} ],
I just need the urls you can find here:
"canonical" : [ {
"href" : "http://arstechnica.com/apple/2013/02/omni-group-unveils-omnifocus-2-omniplan-omnioutliner-4-for-mac/"
} ],
Would anyone be willing to show what such a regex has to look like to extract all these URLs?
The benefit would be a quick and dirty way to extract starred item URLs from Google Reader for import into services like Pocket or Evernote, once processed.
I know you asked about regex, but I think there's a better way to handle this problem. Multi-line regular expressions are a PITA, and in this case there's no need for that kind of brain damage.
I would start with grep, rather than a regex. The -A1 parameter says "return the line that matches, and one after":
grep -A1 "canonical" <file>
This will return lines like this:
"canonical" : [ {
"href" : "http://arstechnica.com/apple/2013/02/omni-group-unveils-omnifocus-2-omniplan-omnioutliner-4-for-mac/"
Then, I'd grep again for the href:
grep -A1 "canonical" <file> | grep "href"
giving
"href" : "http://arstechnica.com/apple/2013/02/omni-group-unveils-omnifocus-2-omniplan-omnioutliner-4-for-mac/"
now I can use awk to get just the url:
grep -A1 "canonical" <file> | grep "href" | awk -F'" : "' '{ print $2 }'
which strips out the first quote on the url:
http://arstechnica.com/apple/2013/02/omni-group-unveils-omnifocus-2-omniplan-omnioutliner-4-for-mac/"
Now I just need to get rid of the extra quote:
grep -A1 "canonical" <file> | grep "href" | awk -F'" : "' '{ print $2 }' | tr -d '"'
That's it!
http://arstechnica.com/apple/2013/02/omni-group-unveils-omnifocus-2-omniplan-omnioutliner-4-for-mac/
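The whole pipeline can be reproduced on the snippet from the question (the file name reader.json is illustrative):

```shell
# Recreate the snippet from the Takeout file (file name is illustrative)
cat > reader.json <<'EOF'
"canonical" : [ {
"href" : "http://arstechnica.com/apple/2013/02/omni-group-unveils-omnifocus-2-omniplan-omnioutliner-4-for-mac/"
} ],
"alternate" : [ {
"href" : "http://feeds.arstechnica.com/~r/arstechnica/everything/~3/EphJmT-xTN4/",
"type" : "text/html"
} ],
EOF

# canonical line + the line after it -> keep the href -> take the value -> drop quotes
grep -A1 '"canonical"' reader.json | grep '"href"' | awk -F'" : "' '{ print $2 }' | tr -d '"'
```

Only the canonical URL comes out; the alternate feed URL is filtered away by the first grep.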