On Windows, I am trying to do an in-place edit of simple JSON files that are malformed. My files look like this, with the first object duplicated:
{
"hello":"there"
}
{
"hello":"there"
}
My goal is to have only one object, and discard the second one. I should note that the files have linefeeds, as shown here.
The final file should look like this:
{
"hello":"there"
}
I can match the first group using a regexp like ^({.*?}).*. This is fairly simple.
Seems like a perfect job for sed in-place editing. But apparently no matter what combination of escaping I do in sed, I can't match the brackets. I am using (GNU sed) 4.9 patch by Michael M. Builov.
Some results I get:
# as per original regexp
sed.exe -E "s,^({.*?}).*,d,g" double.json
-> -e expression #1, char 16: Invalid preceding regular expression
# trying to escape parentheses
sed.exe -E "s,^\({.*?}\).*,d,g" double.json
-> -e expression #1, char 18: Invalid content of \{\}
# escaping curly brackets
sed.exe -E "s,^\(\{.*?\}\).*,d,g" double.json
-> Works, but original file is returned (no match)
Is it possible at all on Windows? According to this and this comment, it seems that sed on Windows for some reason does not like curly brackets.
Note: tested in WSL/Ubuntu, and got same result.
You can try this with GNU sed:
$ sed -Ez 's/((\{[^}]*}).*)\2/\1/' input_file
{
"hello":"there"
}
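For what it's worth, the attempts in the question would fail on any platform (the WSL test points the same way): sed reads one line at a time, so .* never spans the line breaks, and extended regular expressions have no lazy .*? quantifier. If the first object always ends with a } at the start of a line, as in the sample, no grouping is needed at all. A minimal sketch under that assumption (keep_first_object is a name made up for this example):

```shell
# Print everything up to and including the first line that starts with "}",
# then quit, which drops any later (duplicate) objects.
# Assumption: the first "}" at column 0 closes the first top-level object.
keep_first_object() {
  sed '/^}/q' "$@"
}

# Demo on the sample from the question.
printf '{\n"hello":"there"\n}\n{\n"hello":"there"\n}\n' | keep_first_object
```

With GNU sed, the same expression combined with -i (for example sed.exe -i "/^}/q" double.json, using double quotes in cmd.exe) should apply the change in place; try it on a copy first.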
The jq command-line tool is handy for working with JSON, even on Windows. You can download it from this link, for example.
Then save the command
inputs
into a file called double.jq, then open a command prompt ([Windows key], then type cmd) and run:
cd to_the_path_where_the_files_reside
jq -f double.jq <double.json
If there are more than two independent objects within the file double.json, such as
{
"hello":"there"
}
{
"hello":"there"
}
{
"hello":"there"
}
{
"hello":"there"
}
and you want to keep only one of them, then change the contents of double.jq to
[inputs] | unique | .[]
I'm trying to take the contents of a config file (JSON format), strip out extraneous new lines and spaces to be concise and then assign it to an environment variable before starting my application.
This is where I've got so far:
pwr_config=`echo "console.log(JSON.stringify(JSON.parse(require('fs').readFileSync(process.argv[2], 'utf-8'))));" | node - config.json | xargs -0 printf '%q\n'` npm run start
This pipes a short node.js app into the node runtime taking an argument of the file name and it parses and stringifies the JSON file to validate it and remove any unnecessary whitespace. So far so good.
The result of this is then piped to printf, or at least it would be but printf doesn't support input in this way, apparently, so I'm using xargs to pass it in in a way it supports.
I'm using the %q formatter to format the string escaping any characters that would be a problem as part of a command, but when calling printf through xargs, printf claims it doesn't support %q. I think this is perhaps because there is more than one version of printf but I'm not exactly sure how to resolve that.
Any help would be appreciated, even if the solution is completely different from what I've started :) Thanks!
Update
Here's the output I get on MacOS:
$ cat config.json | xargs -0 printf %q
printf: illegal format character q
My JSON file looks like this:
{
"hue_host": "192.168.1.2",
"hue_username": "myUsername",
"port": 12000,
"player_group_config": [
{
"name": "Family Room",
"player_uuid": "ATVUID",
"hue_group": "3",
"on_events": ["media.play", "media.resume"],
"off_events": ["media.stop", "media.pause"]
},
{
"name": "Lounge",
"player_uuid": "STVUID",
"hue_group": "1",
"on_events": ["media.play", "media.resume"],
"off_events": ["media.stop", "media.pause"]
}
]
}
Two ways:
Use xargs to pick up bash's printf builtin instead of the printf(1) executable, which is probably /usr/bin/printf (thanks to @GordonDavisson). The trailing _ fills $0, so the text from xargs arrives as $1:
pwr_config=`echo "console.log(JSON.stringify(JSON.parse(require('fs').readFileSync(process.argv[2], 'utf-8'))));" | node - config.json | xargs -0 bash -c 'printf "%q\n" "$1"' _` npm run start
Simpler: you don't have to escape the output of a command if you quote it. In the same way that echo "<|>" is OK in bash, this should also work:
pwr_config="$(echo "console.log(JSON.stringify(JSON.parse(require('fs').readFileSync(process.argv[2], 'utf-8'))));" | node - config.json )" npm run start
This uses the newer $(...) form instead of `...`, and so the result of the command is a single word stored as-is into the pwr_config variable.*
Even simpler: if your npm run start script cares about the whitespace in your JSON, it's fundamentally broken :) . Just do:
pwr_config="$(< config.json)" npm run start
The $(<...) returns the contents of config.json. They are all stored as a single word ("") into pwr_config, newlines and all.* If something breaks, either config.json has an error and should be fixed, or the code you're running has an error and needs to be fixed.
* You actually don't need the "" around $(). E.g., foo=$(echo a b c) and foo="$(echo a b c)" have the same effect. However, I like to include the "" to remind myself that I am specifically asking for all the text to be kept together.
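A quick way to convince yourself that the quoted form keeps the JSON intact is to let a child process print the variable back. The sketch below uses a throwaway file and sh -c in place of npm run start; both are stand-ins invented for this example, and cat is used instead of $(<...) only so that it also runs in a plain POSIX sh:

```shell
# Run a child process with pwr_config set to the file's contents and let the
# child print the variable back. "sh -c ..." stands in for "npm run start".
run_with_config() {
  pwr_config="$(cat "$1")" sh -c 'printf "%s\n" "$pwr_config"'
}

# Throwaway sample file (hypothetical content, not the real config).
cfg=$(mktemp)
printf '{\n  "port": 12000,\n  "name": "Family Room"\n}\n' > "$cfg"
run_with_config "$cfg"   # prints the file back, spaces and newlines included
rm -f "$cfg"
```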
I am trying to change the values in a text file using sed in a Bash script with the line,
sed 's/draw($prev_number;n_)/draw($number;n_)/g' file.txt > tmp
This will be in a for loop. Why is it not working?
Variables inside ' don't get substituted in Bash. To get string substitution (or interpolation, if you're familiar with Perl) you would need to change it to use double quotes " instead of the single quotes:
# Enclose the entire expression in double quotes
$ sed "s/draw($prev_number;n_)/draw($number;n_)/g" file.txt > tmp
# Or, concatenate strings with only variables inside double quotes
# This would restrict expansion to the relevant portion
# and prevent accidental expansion for !, backticks, etc.
$ sed 's/draw('"$prev_number"';n_)/draw('"$number"';n_)/g' file.txt > tmp
# A variable cannot contain arbitrary characters
# See link in the further reading section for details
$ a='foo
bar'
$ echo 'baz' | sed 's/baz/'"$a"'/g'
sed: -e expression #1, char 9: unterminated `s' command
Further Reading:
Difference between single and double quotes in Bash
Is it possible to escape regex metacharacters reliably with sed
Using different delimiters for sed substitute command
Unless you need it in a different file you can use the -i flag to change the file in place
Variables within single quotes are not expanded, but within double quotes they are. Use double quotes in this case.
sed "s/draw($prev_number;n_)/draw($number;n_)/g" file.txt > tmp
You could also make it work with eval, but don’t do that!!
This may help:
sed "s/draw($prev_number;n_)/draw($number;n_)/g"
You can use variables as shown below. Here, I wanted to write the hostname (a system value) into the file: I look for the string look.me and replace that whole line with look.me=<system_name>.
sed -i "s/.*look.me.*/look.me=`hostname`/"
You can also store the system value in another variable and use that variable for the substitution.
host_var=`hostname`
sed -i "s/.*look.me.*/look.me=$host_var/"
Input file:
look.me=demonic
Output of file (assuming my system name is prod-cfm-frontend-1-usa-central-1):
look.me=prod-cfm-frontend-1-usa-central-1
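One caveat when interpolating a value into the s command: if the value contains the delimiter, the expression breaks. A small sketch that switches the delimiter to | instead (set_look_me is a hypothetical helper, the sample value is made up, and the value is assumed not to contain |, & or backslashes):

```shell
# Replace the whole look.me line with look.me=<value>, reading from stdin.
# "|" is used as the s-command delimiter so that "/" in the value is harmless.
# The dot in look.me is escaped here so that it only matches a literal dot.
set_look_me() {
  sed "s|.*look\.me.*|look.me=$1|"
}

printf 'look.me=demonic\n' | set_look_me 'prod/cfm/frontend-1'
```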
I needed to input github tags from my release within github actions. So that on release it will automatically package up and push code to artifactory.
Here is how I did it. :)
- name: Invoke build
  run: |
    # Get the tag number from the release
    TAGNUMBER=$(echo "$GITHUB_REF" | cut -d / -f 3)
    # Set up the expression to be used by sed
    FINDANDREPLACE='s/${GITHUBACTIONSTAG}/'"$TAGNUMBER"/
    # Update the version number in setup.cfg
    sed -i "$FINDANDREPLACE" setup.cfg
    # Install prerequisites and push
    pip install -r requirements-dev.txt
    invoke build
In retrospect, I wish I had done this in Python with tests. However, it was fun to do some Bash.
Another variant, using printf:
SED_EXPR="$(printf -- 's/draw(%s;n_)/draw(%s;n_)/g' "$prev_number" "$number")"
sed "${SED_EXPR}" file.txt
or in one line:
sed "$(printf -- 's/draw(%s;n_)/draw(%s;n_)/g' "$prev_number" "$number")" file.txt
Using printf keeps the sed expression itself in single quotes, so the shell only fills in the two %s placeholders, which is why I like this variant. Note that printf does not escape characters that are special to sed (such as / or &), so the values must not contain them.
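If the values can contain characters that sed treats specially, they have to be escaped before they are spliced into the expression, whichever quoting style is used. The following is a rough sketch of that idea for single-line values and basic regular expressions; escape_pattern and escape_replacement are hypothetical helper names, and the sample values are made up:

```shell
# Escape characters that are special in a basic regular expression, plus "/".
escape_pattern() {
  printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Escape characters that are special in the replacement part: \, & and "/".
escape_replacement() {
  printf '%s\n' "$1" | sed 's|[\&/]|\\&|g'
}

prev_number='1.5'      # hypothetical values for the demo
number='a/b&c'
# Only draw(1.5;n_) is rewritten; draw(1x5;n_) no longer matches the dot.
printf 'draw(1.5;n_) draw(1x5;n_)\n' |
  sed "s/draw($(escape_pattern "$prev_number");n_)/draw($(escape_replacement "$number");n_)/g"
```

Values containing newlines are not handled by this sketch.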
I have found a ton of solutions to do what I want, with only one exception.
I need to search a .html document and pull a string.
The line containing the string will look like this (1 line, no newlines)
<script type="text/javascript">g_initHeader(0);LiveSearch.attach(ge('oh2345v5ks'));var _ = g_items;_[60]={icon:'INV_Chest_Leather_09',name_enus:'Layered Tunic'};_[6076]={icon:'INV_Pants_11',name_enus:'Tapered Pants'};_[3070]={icon:'INV_Misc_Cape_01',name_enus:'Ensign Cloak'};</script>
The text I need to get is
INV_Chest_Leather_09
When I use awk, grep, and sed, I extract the data between icon:' and ',name_
The problem is, all three of these scripts scan the entire line and use the last occurring ',name_ thus I end up with
INV_Chest_Leather_09',name_enus:'Layered Tunic'};_[6076]={icon:'INV_Pants_11',name_enus:'Tapered Pants'};_[3070]={icon:'INV_Misc_Cape_01
Here's the last one I tried
grep -Po -m 1 "(?<=]={icon:').*(?=',name_)"
I've tried awk and sed too, and I don't really have a preference of which one to use.
So basically, I need to search the entire html file, find the first occurrence of icon:', extract the text right after it until the first occurrence after icon:' of ',name_.
With GNU awk for the 3rd arg to match():
$ awk 'match($0,/icon:\047([^\047]+)/,a){print a[1]}' file
INV_Chest_Leather_09
Simple perl approach:
perl -ne 'print "$1\n" if /\bicon:\047([^\047]+)/' file
The output:
INV_Chest_Leather_09
The .* in your regular expression is a greedy matcher, so the pattern will match till the end of the string and then backtrack to match the ,name_ portion. You could try replacing the .* with something like [^,]* (i.e. match anything except comma):
grep -Po -m 1 "(?<=]={icon:')[^,]*(?=',name_)"
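If PCRE support in grep is not available, the same "stop at the first closing quote" idea can be written without any regular expression. A sketch using plain string functions in awk; first_icon is a name invented for this example, and it assumes the icon name itself never contains a single quote:

```shell
# Print the text between the first icon:' and the next single quote.
first_icon() {
  awk '{
    start = index($0, "icon:\047")          # \047 is the single quote character
    if (start == 0) next                    # no icon on this line, try the next
    rest = substr($0, start + 6)            # skip the 6 characters of icon:\047
    stop = index(rest, "\047")              # first quote after the icon name
    if (stop > 0) { print substr(rest, 1, stop - 1); exit }
  }' "$@"
}

# Demo on a shortened version of the line from the question.
first_icon <<'EOF'
LiveSearch.attach(ge('oh2345v5ks'));_[60]={icon:'INV_Chest_Leather_09',name_enus:'Layered Tunic'};_[6076]={icon:'INV_Pants_11',name_enus:'Tapered Pants'};
EOF
```

Because index() always returns the leftmost position, there is nothing greedy to work around.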
I'm creating a Bash script to parse the air pollution levels from the webpage:
http://aqicn.org/city/beijing/m/
There is a lot of stuff in the file, but this is the relevant bit:
"iaqi":[{"p":"pm25","v":[59,21,112],"i":"Beijing pm25 (fine
particulate matter) measured by U.S Embassy Beijing Air Quality
Monitor
(\u7f8e\u56fd\u9a7b\u5317\u4eac\u5927\u4f7f\u9986\u7a7a\u6c14\u8d28\u91cf\u76d1\u6d4b).
Values are converted from \u00b5g/m3 to AQI levels using the EPA
standard."},{"p":"pm10","v":[15,5,69],"i":"Beijing pm10
(respirable particulate matter) measured by Beijing Environmental
Protection Monitoring Center
I want the script to parse and display 2 numbers: current PM2.5 and PM10 levels (59 and 15 in the excerpt above).
CITY="beijing"
AQIDATA=$(wget -q http://aqicn.org/city/$CITY/m/ -O -)
PM25=$(awk -v FS="(\"p\":\"pm25\",\"v\":\\\[|,[0-9]+)" '{print $2}' <<< $AQIDATA)
PM100=$(awk -v FS="(\"p\":\"pm10\",\"v\":\\\[|,[0-9]+)" '{print $2}' <<< $AQIDATA)
echo $PM25 $PM100
Even though I can get PM2.5 levels to display correctly, I cannot get PM10 levels to display. I cannot understand why, because the strings are similar.
Anyone here able to explain?
The following approach is based on two steps:
(1) Extracting the relevant JSON;
(2) Extracting the relevant information from the JSON using a JSON-aware tool -- here jq.
(1) Ideally, the web service would provide a JSON API that would allow one to obtain the JSON directly, but as the URL you have is intended for viewing with a browser, some form of screen-scraping is needed. There is a certain amount of brittleness to such an approach, so here I'll just provide something that currently works:
wget -O - http://aqicn.org/city/beijing/m |
gawk 'BEGIN{RS="function"}
$1 ~/getAqiModel/ {
sub(/.*var model=/,"");
sub(/;return model;}/,"");
print}'
(gawk or an awk that supports multi-character RS can be used; if you have another awk, then first split on "function", using e.g.:
sed $'s/function/\\\n/g' # three backslashes )
The output of the above can be piped to the following jq command, which performs the filtering envisioned in (2) above.
(2)
jq -c '.iaqi | .[]
| select(.p? =="pm25" or .p? =="pm10") | [.p, .v[0]]'
The result:
["pm25",59]
["pm10",15]
I think your problem is that you have a single line HTML file that contains a script that contains a variable that contains the data you are looking for.
Your field delimiters are either the "p":"pm25","v":[ prefix (or its pm10 counterpart) or a comma followed by some digits.
For pm25 this works, because it is the first, and there are no occurrences of ,21 or something similar before it.
However, for pm10, there are some that are associated with pm25 ahead of it. So the second field contains the empty string between ,21 and ,112
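To see the splitting described above, it can help to print every field with its number. The sketch below runs the pm10 separator over a shortened, made-up version of the data; show_fields is a hypothetical helper, and [[] is used instead of a backslash to match the literal bracket:

```shell
# Split stdin with the same kind of field separator as in the question and
# print each field with its index.
show_fields() {
  awk -F'("p":"pm10","v":[[]|,[0-9]+)' '{
    for (i = 1; i <= NF; i++) printf "%d=<%s>\n", i, $i
  }'
}

# Field 2 is the empty string between ,21 and ,112; the pm10 value (15)
# only shows up later, in field 4.
printf '%s\n' '{"p":"pm25","v":[59,21,112]},{"p":"pm10","v":[15,5,69]}' | show_fields
```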
@karakfa has a hack that seems to work -- but he doesn't explain very well why it works.
What he does is use awk's record separator (which is usually a newline) and sets it to either of :, ,, or [. So in your case, one of the records would be "pm25", because it is preceded by a colon, which is a separator, and succeeded by a comma, also a separator.
Once it hits the matching content ("pm25") it sets a counter to 4. Then, for this and the next records, it counts this counter down. "pm25" itself, "v", the empty string between : and [, and finally reaches one when hitting the record with the number you want to output: 4 && ! 3 is false, 3 && ! 2 is false, 2 && ! 1 is false, but 1 && ! 0 is true. Since there is no execution block, awk simply prints this record, which is the value you want.
A more robust approach would probably be to use XPath to find the script and then a JSON parser or similar to get the value.
chw21's helpful answer explains why your approach didn't work.
peak's helpful answer is the most robust, because it employs proper JSON parsing.
If you don't want to or can't use third-party utility jq for JSON parsing, I suggest using sed rather than awk, because awk is not a good fit for field-based parsing of this data.
$ sed -E 's/^.*"pm25"[^[]+\[([0-9]+).+"pm10"[^[]+\[([0-9]+).*$/\1 \2/' <<< "$AQIDATA"
59 15
The above should work with both GNU and BSD/OSX sed.
To read the result into variables:
read pm25 pm10 < \
<(sed -E 's/^.*"pm25"[^[]+\[([0-9]+).+"pm10"[^[]+\[([0-9]+).*$/\1 \2/' <<< "$AQIDATA")
Note how I've chosen lowercase variable names, because it's best to avoid all upper-case variables in shell programming, so as to avoid conflicts with special shell and environment variables.
If you can't rely on the order of the values in the source string, use two separate sed commands:
pm25=$(sed -E 's/^.*"pm25"[^[]+\[([0-9]+).*$/\1/' <<< "$AQIDATA")
pm10=$(sed -E 's/^.*"pm10"[^[]+\[([0-9]+).*$/\1/' <<< "$AQIDATA")
awk to the rescue!
If you have to, you can use this hacky way using smart counters with hand-crafted delimiters. Setting RS instead of FS transfers looping through fields to awk itself. Multi-char RS is not available for all awks (gawk supports it).
$ awk -v RS='[:,[]' '$0=="\"pm25\""{c=4} c&&!--c' file
59
$ awk -v RS='[:,[]' '$0=="\"pm10\""{c=4} c&&!--c' file
15
I'm inserting a git diff of changed files into a JSON object to send using a curl request.
The problem is that the newline characters being inserted into the JSON are not accepted, and I'm not sure how to get around that. The translate tool didn't work; this perl solution I'm using is close, but it just replaces them with spaces:
changedfiles=$(git diff --name-only $3..$4 | perl -p -e 's/\n/ /')
and changing it to this didn't help:
changedfiles=$(git diff --name-only $3..$4 | perl -p -e 's/\n/\\n/')
Can anyone point me in the right direction? It doesn't need to use perl, it just needs to work
(...being simple would be nice too)
Instead of trying to do ad-hoc escaping for characters that your immediate testing finds problematic, how about using an actual JSON library that handles all of them in a solid way?
Here's an example in bash using inlined python:
python -c '
import json
import sys
print(json.dumps({"data": sys.argv[1]}))
' "$(git diff --name-only $3..$4)"
It prints the JSON object { "data": "your command output here" } with standards-compliant escaping.
This is what I think you want to do to get a quoted list of files separated by commas (i.e. for inserting into a JSON string):
git diff --name-only $3..$4 | perl -p -e 's/(.*)/"$1",/;s/\n//;s/""/","/'
This works if your files don't contain double quotes or special characters that need to be JSON escaped.
First, we put the files in quotes followed by a comma, then remove newlines, then change the "" between files to ",". Although, this is kind of a hack. Somewhat better might be:
git diff --name-only $3..$4 | perl -p -e '$/="";s/(.*)\n/"$1",/g;s/,$//'
Here we read in the whole input, newlines and all, do our substitution and remove the final comma.
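For comparison, here is a sketch of the same quoted, comma-separated list built with sed and paste only, which also escapes backslashes and double quotes. json_array_from_lines is a made-up name, the file names in the demo are hypothetical, and control characters (tabs, or newlines inside a file name) are not handled, so a real JSON library remains the safer choice:

```shell
# Turn newline-separated names on stdin into a JSON array of strings.
json_array_from_lines() {
  # 1st sed expression: put a backslash in front of every \ and "
  # 2nd sed expression: wrap each line in double quotes
  # paste: join all lines with commas
  printf '[%s]\n' "$(sed -e 's/[\"]/\\&/g' -e 's/.*/"&"/' | paste -sd, -)"
}

printf 'src/main.c\ndocs/read "me".txt\n' | json_array_from_lines
```

In the script from the question, the input would come from git diff --name-only $3..$4 instead of printf.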