Grep for a pattern - html

I have an HTML file with the following code
<html>
<body>
Test #1 '<%aaa(x,y)%>'
Test #2 '<%bbb(p)%>'
Test #3 '<%pqr(z)%>'
</body>
</html>
Please help me with the regex for a command (grep or awk) which displays the output as follows:
'<%aaa(x,y)%>'
'<%bbb(p)%>'
'<%pqr(z)%>'

I think that sed is a better choice than awk, but it is not completely clear cut.
sed -n '/ *Test #[0-9]* */s///p' <<!
<html>
<body>
Test #1 '<%aaa(x,y)%>'
Test #2 '<%bbb(p)%>'
Test #3 '<%pqr(z)%>'
</body>
</html>
!
You can't use grep; it returns lines that match a pattern, but doesn't normally edit those lines.
You could use awk:
awk '/Test #[0-9]+/ { print $3 }'
The pattern matches the test lines and prints the third field. It works because the third field itself contains no spaces. If there could be spaces there, then the sed script is easier; it already handles them, whereas the awk script would have to be modified to handle them properly.
Judging from the comments, the desired output is the material between '<%' and '%>'. So, we use sed, as before:
sed -n '/.*\(<%.*%>\).*/s//\1/p'
On lines which match 'anything-<%-anything-%>-anything', replace the whole line with the part between '<%' and '%>' (including the markers) and print the result. Note that if there are multiple patterns on the line which match, only the last will be printed. (The question and comments do not cover what to do in that case, so this is acceptable. The alternatives are tough and best handled in Perl or perhaps Python.)
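To illustrate the caveat about several markers on one line, here is a minimal sketch. The sample line with two markers is made up for illustration and does not come from the question.

```shell
# Hypothetical input line containing two <%...%> markers.
line="Test #4 '<%aaa(x)%>' and '<%bbb(y)%>'"

# The leading .* is greedy, so it swallows everything up to the last '<%'.
last=$(printf '%s\n' "$line" | sed -n '/.*\(<%.*%>\).*/s//\1/p')

printf '%s\n' "$last"
```

Only the second marker survives, which is the behaviour described above.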
If the single quotes on the lines must be preserved, then you can use either of these - I'd use the first with the double quotes surrounding the regex, but they both work and are equivalent. OTOH, if there were expressions involving $ signs or back-ticks in the regex, the single-quotes are better; there are no metacharacters within a single-quoted string at the shell level.
sed -n "/.*\('<%.*%>'\).*/s//\1/p"
sed -n '/.*\('\''<%.*%>'\''\).*/s//\1/p'
The sequence '\'' is how you embed a single quote into a single-quoted string in a shell script. The first quote terminates the current string; the backslash-quote generates a single quote, and the last quote starts a new single-quoted string.
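As a quick check that the two quoting styles really are equivalent, the following sketch runs both on one of the sample lines. The variable names are only for illustration.

```shell
# One of the sample lines from the question.
input="Test #1 '<%aaa(x,y)%>'"

# Both commands hand sed the same script; only the shell quoting differs.
with_double=$(printf '%s\n' "$input" | sed -n "/.*\('<%.*%>'\).*/s//\1/p")
with_single=$(printf '%s\n' "$input" | sed -n '/.*\('\''<%.*%>'\''\).*/s//\1/p')

printf '%s\n' "$with_double" "$with_single"
```

Both lines of output should be identical and keep the surrounding single quotes.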

The -o option for grep is what you want:
grep -o "'.*'" filename
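A small sketch of the -o approach on the sample input follows. Note that -o is not required by POSIX, although GNU and BSD grep both provide it. The tighter pattern used here is my own assumption: it avoids the greedy '.*' so that a line holding several quoted items would give one match per item.

```shell
# Sample input from the question.
html="<html>
<body>
Test #1 '<%aaa(x,y)%>'
Test #2 '<%bbb(p)%>'
Test #3 '<%pqr(z)%>'
</body>
</html>"

# -o prints only the part of each line that matches the pattern.
quoted=$(printf '%s\n' "$html" | grep -o "'<%[^']*%>'")

printf '%s\n' "$quoted"
```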

grep -P "^Test" 1.htm |awk '{print $3}'

Related

Remove specific tag with its contents using sed

I would like to remove following tag from HTML including its constantly varying contents:
<span class="the_class_name">li4tuq734g23r74r7Whatever</span>
A following BASH script
.... | sed -e :a -re 's/<span class="the_class_name"/>.*</span>//g' > "$NewFile"
ends with error
sed: -e expression #2, char XX: unknown option to `s'
I tried to escape quotes, slashes and "less than" symbols in various combinations and still get this error.
I suggest using a different sed separator than / when / is contained in the thing you want to match. Also, prefer -E to -r for extended regular expressions, to be POSIX compatible. Also note that you have a / in the first span of your regex that doesn't belong there.
Also, .* will make it overly greedy and eat up any </span> that follows the first </span> on the line. It's better to match on [^<]*. That is, any character that is not <.
sed -E 's,<span class="the_class_name">[^<]*</span>,,g'
A better option is of course to use a HTML parser for this.
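The difference between .* and [^<]* can be seen in a minimal sketch like this one. The input line is hypothetical and chosen so that an unrelated span follows the ones to be removed.

```shell
# Hypothetical line: two spans of the target class, then an unrelated span.
line='a<span class="the_class_name">x1</span>b<span class="the_class_name">y2</span>c<span>keep</span>d'

# Greedy .* runs to the last </span> on the line and removes too much.
greedy=$(printf '%s\n' "$line" | sed -E 's,<span class="the_class_name">.*</span>,,g')

# [^<]* stops at the next '<', so each span is removed on its own.
precise=$(printf '%s\n' "$line" | sed -E 's,<span class="the_class_name">[^<]*</span>,,g')

printf '%s\n' "$greedy" "$precise"
```

This still assumes the span has no nested tags and sits on one line, which is why a real HTML parser remains the safer choice.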

Find and replace text in JSON with sed [duplicate]

I am trying to change the values in a text file using sed in a Bash script with the line,
sed 's/draw($prev_number;n_)/draw($number;n_)/g' file.txt > tmp
This will be in a for loop. Why is it not working?
Variables inside ' don't get substituted in Bash. To get string substitution (or interpolation, if you're familiar with Perl) you would need to change it to use double quotes " instead of the single quotes:
# Enclose the entire expression in double quotes
$ sed "s/draw($prev_number;n_)/draw($number;n_)/g" file.txt > tmp
# Or, concatenate strings with only variables inside double quotes
# This would restrict expansion to the relevant portion
# and prevent accidental expansion for !, backticks, etc.
$ sed 's/draw('"$prev_number"';n_)/draw('"$number"';n_)/g' file.txt > tmp
# A variable cannot contain arbitrary characters
# See link in the further reading section for details
$ a='foo
bar'
$ echo 'baz' | sed 's/baz/'"$a"'/g'
sed: -e expression #1, char 9: unterminated `s' command
Further Reading:
Difference between single and double quotes in Bash
Is it possible to escape regex metacharacters reliably with sed
Using different delimiters for sed substitute command
Unless you need the result in a different file, you can use the -i flag to change the file in place.
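The effect of the two quoting styles can be checked with a short sketch. The values of prev_number and number are invented here; in the question they would come from the for loop.

```shell
# Hypothetical values; in the question these come from a for loop.
prev_number=3
number=4

# Single quotes: sed sees the literal text $prev_number, so nothing matches.
unchanged=$(printf '%s\n' 'draw(3;n_)' | sed 's/draw($prev_number;n_)/draw($number;n_)/g')

# Double quotes: the shell expands the variables before sed runs.
changed=$(printf '%s\n' 'draw(3;n_)' | sed "s/draw($prev_number;n_)/draw($number;n_)/g")

printf '%s\n' "$unchanged" "$changed"
```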
Variables within single quotes are not expanded, but within double quotes they are. Use double quotes in this case.
sed "s/draw($prev_number;n_)/draw($number;n_)/g" file.txt > tmp
You could also make it work with eval, but don’t do that!!
This may help:
sed "s/draw($prev_number;n_)/draw($number;n_)/g"
You can use variables as shown below. Here, I wanted to put the hostname (a value taken from the system) into the file. I look for the string look.me and replace that whole line with look.me=<system_name>.
sed -i "s/.*look.me.*/look.me=`hostname`/"
You can also store your system value in another variable and can use that variable for substitution.
host_var=`hostname`
sed -i "s/.*look.me.*/look.me=$host_var/"
Input file:
look.me=demonic
Output of file (assuming my system name is prod-cfm-frontend-1-usa-central-1):
look.me=prod-cfm-frontend-1-usa-central-1
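A self-contained sketch of the same idea follows. It works on a temporary file and writes to standard output instead of using -i, and it uses a fixed string in place of the real hostname; both choices are assumptions made so that the example behaves the same everywhere.

```shell
# Work on a throwaway file so the sketch does not touch real files.
tmp=$(mktemp)
printf '%s\n' 'other=1' 'look.me=demonic' > "$tmp"

# Stand-in for the output of hostname; the value is the one quoted above.
host_var='prod-cfm-frontend-1-usa-central-1'

# look\.me escapes the dot so that it matches a literal '.' only.
result=$(sed "s/.*look\.me.*/look.me=$host_var/" "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```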
I needed to pass GitHub tags from my release into GitHub Actions, so that on release it automatically packages the code and pushes it to Artifactory.
Here is how I did it. :)
- name: Invoke build
  run: |
    # Gets the Tag number from the release
    TAGNUMBER=$(echo $GITHUB_REF | cut -d / -f 3)
    # Setups a string to be used by sed
    FINDANDREPLACE='s/${GITHUBACTIONSTAG}/'$(echo $TAGNUMBER)/
    # Updates the setup.cfg file within version number
    sed -i $FINDANDREPLACE setup.cfg
    # Installs prerequisites and pushes
    pip install -r requirements-dev.txt
    invoke build
In retrospect, I wish I had done this in Python with tests. However, it was fun to do some bash.
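The substitution step can be tried outside a workflow with a sketch like the one below. The ref value and the contents of the configuration line are assumptions for illustration; only the placeholder name comes from the snippet above.

```shell
# Hypothetical ref; on a release, GITHUB_REF looks something like this.
GITHUB_REF='refs/tags/v1.2.3'

# Third /-separated field of the ref is the tag.
TAGNUMBER=$(printf '%s\n' "$GITHUB_REF" | cut -d / -f 3)

# Hypothetical line from setup.cfg containing the literal placeholder.
cfg='version = ${GITHUBACTIONSTAG}'

# The placeholder stays in single quotes so the shell leaves it alone;
# only the tag is expanded, inside double quotes.
updated=$(printf '%s\n' "$cfg" | sed 's/${GITHUBACTIONSTAG}/'"$TAGNUMBER"'/')

printf '%s\n' "$updated"
```

A tag containing / or & would still need escaping before being used in the replacement.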
Another variant, using printf:
SED_EXPR="$(printf -- 's/draw(%s;n_)/draw(%s;n_)/g' "$prev_number" "$number")"
sed "${SED_EXPR}" file.txt
or in one line:
sed "$(printf -- 's/draw(%s;n_)/draw(%s;n_)/g' "$prev_number" "$number")" file.txt
Building the expression with printf keeps the sed template in single quotes, so the shell does not touch it, which is why I like this variant. It does not escape sed metacharacters (/, & and \) inside the values, though, so those still need care.

how to use sed command to enclose a variable value in single quotes

I set the env variable like this
setenv ENV {"a":{"b":"http://c","d":"http://e"}}
I use sed like this
sed 's|(ENV)|('"$ENV"')|' aFile
It replaces (ENV) with the following
(a:b:http://c a:d:http://e)
However if I set the variable like this within single quotes
setenv ENV '{"a":{"b":"http://c","d":"http://e"}}'
It replaces like this
({"a":{"b":"http://c","d":"http://e"}})
But I would like the output of sed for (ENV) to be
('{"a":{"b":"http://c","d":"http://e"}}')
First of all, it's important to quote (or escape) your variable on definition (with setenv), as explained in this answer, since braces and double quotes (among others) are metacharacters in C shell.
You can quote the complete value with single quotes:
$ setenv ABC '{"a":{"b":"http://c","d":"http://e"}}'
$ echo "$ABC"
{"a":{"b":"http://c","d":"http://e"}}
Or, escape metacharacters with backslashes, like this:
$ setenv ABC \{\"a\":\{\"b\":\"http://c\",\"d\":\"http://e\"\}\}
$ echo "$ABC"
{"a":{"b":"http://c","d":"http://e"}}
Now, for the sed part of your question -- and how to quote the variable on replacement.
Just add single quotes in your sed replacement expression and you're done:
$ setenv ABC '{"a":{"b":"http://c","d":"http://e"}}'
$ echo '(ABC)' | sed 's|(ABC)|('"'$ABC'"')|'
('{"a":{"b":"http://c","d":"http://e"}}')
Note, in addition, that you should also escape sed replacement metacharacters (these three only: \, /, and &) in your variable. You seem aware of this, since you already use | as a delimiter in your regex, but if you are unsure of what your variable contains, it's always good to escape it. For details on reliably escaping metacharacters in sed, see this answer. In short, to escape a single-line replacement string for use with the / delimiter, run it through sed 's/[&/\]/\\&/g'; with another delimiter such as |, escape that character instead of /.
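The escaping step can be sketched as follows. This uses POSIX sh syntax rather than C shell, and the value is a made-up one that contains all three metacharacters.

```shell
# Hypothetical value containing &, / and \ (all special in a replacement).
value='R&D /tmp\dir'

# Backslash-escape the three replacement metacharacters.
escaped=$(printf '%s\n' "$value" | sed 's/[&/\]/\\&/g')

# Without the escaping, & would expand to the matched text "(ABC)".
result=$(printf '%s\n' '(ABC)' | sed "s/(ABC)/('$escaped')/")

printf '%s\n' "$result"
```

The value comes out unchanged and wrapped in single quotes, as in the desired output of the question.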

Find a string between 2 other strings in document

I have found a ton of solutions to do what I want, with only one exception.
I need to search a .html document and pull a string.
The line containing the string will look like this (1 line, no newlines)
<script type="text/javascript">g_initHeader(0);LiveSearch.attach(ge('oh2345v5ks'));var _ = g_items;_[60]={icon:'INV_Chest_Leather_09',name_enus:'Layered Tunic'};_[6076]={icon:'INV_Pants_11',name_enus:'Tapered Pants'};_[3070]={icon:'INV_Misc_Cape_01',name_enus:'Ensign Cloak'};</script>
The text I need to get is
INV_Chest_Leather_09
When I use awk, grep, and sed, I extract the data between icon:' and ',name_
The problem is, all three of these scripts scan the entire line and use the last occurring ',name_ thus I end up with
INV_Chest_Leather_09',name_enus:'Layered Tunic'};_[6076]={icon:'INV_Pants_11',name_enus:'Tapered Pants'};_[3070]={icon:'INV_Misc_Cape_01
Here's the last one I tried
grep -Po -m 1 "(?<=]={icon:').*(?=',name_)"
I've tried awk and sed too, and I don't really have a preference of which one to use.
So basically, I need to search the entire html file, find the first occurrence of icon:', extract the text right after it until the first occurrence after icon:' of ',name_.
With GNU awk for the 3rd arg to match():
$ awk 'match($0,/icon:\047([^\047]+)/,a){print a[1]}' file
INV_Chest_Leather_09
Simple perl approach:
perl -ne 'print "$1\n" if /\bicon:\047([^\047]+)/' file
The output:
INV_Chest_Leather_09
The .* in your regular expression is a greedy matcher, so the pattern will match till the end of the string and then backtrack to match the ,name_ portion. You could try replacing the .* with something like [^,]* (i.e. match anything except comma):
grep -Po -m 1 "(?<=]={icon:')[^,]*(?=',name_)"
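Where grep -P is not available, a similar result can be sketched with grep -o, head and cut. This is an alternative of my own rather than one of the answers above, the input is a shortened copy of the line from the question, and -o itself is a GNU/BSD extension.

```shell
# Shortened version of the line from the question.
line="<script>var _ = g_items;_[60]={icon:'INV_Chest_Leather_09',name_enus:'Layered Tunic'};_[6076]={icon:'INV_Pants_11',name_enus:'Tapered Pants'};</script>"

# [^']* cannot cross a quote, so each match ends right after the icon name.
# head keeps the first match; cut takes the text after the opening quote.
first_icon=$(printf '%s\n' "$line" | grep -o "icon:'[^']*" | head -n 1 | cut -d "'" -f 2)

printf '%s\n' "$first_icon"
```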

Replacing HTML tag content using sed

I'm trying to replace the content of some HTML tags in an HTML page using sed in a bash script. For some reason I'm not getting the proper result, as it's not replacing anything. It has to be something very simple/stupid I'm overlooking. Anyone care to help me out?
HTML to search/replace in:
Unlocked <span id="unlockedCount"></span>/<span id="totalCount"></span> achievements for <span id="totalPoints"></span> points.
sed command used:
cat index.html | sed -i -e "s/\<span id\=\"unlockedCount\"\>([0-9]\{0,\})\<\/span\>/${unlockedCount}/g" index.html
The point of this is to parse the HTML page and update the figures according to some external data. For a first run, the contents of the tags will be empty, after that they will be filled.
EDIT:
I ended up using a combination of the answers which resulted in the following code:
sed -i -e 's|<span id="unlockedCount">\([0-9]\{0,\}\)</span>|<span id="unlockedCount">'"${unlockedCount}"'</span>|g' index.html
Many thanks to @Sorpigal, @tripleee, @classic for the help!
Try this:
sed -i -e "s/\(<span id=\"unlockedCount\">\)\(<\/span>\)/\1${unlockedCount}\2/g" index.html
What you say you want to do is not what you're telling sed to do.
You want to insert a number into a tag, or replace it if present. What you're trying to tell sed to do is to replace a span tag and its contents, which may be empty or a number, with the value of a shell variable.
You're also employing a lot of complex, annoying and error-prone escape sequences which are just not necessary.
Here's what you want:
sed -r -i -e 's|<span id="unlockedCount">([0-9]{0,})</span>|<span id="unlockedCount">'"${unlockedCount}"'</span>|g' index.html
Note the differences:
Added -r to turn on extended expressions without which your capture pattern would not work.
Used | instead of / as the delimiter for the substitution so that escaping / would not be necessary.
Single-quoted the sed expression so that escaping things inside it from the shell would not be necessary.
Included the matched span tag in the replacement section so that it would not get deleted.
In order to expand the unlockedCount variable, closed the single-quoted expression, then later re-opened it.
Omitted cat | which was useless here.
I also used double quotes around the shell variable expansion, because this is good practice but if it contains no spaces this is not really necessary.
It was not, strictly speaking, necessary for me to add -r. Plain old sed will work if you say \([0-9]\{0,\}\), but the idea here was to simplify.
sed -i -e 's%<span id="unlockedCount">\([0-9]*\)</span>%'"${unlockedCount}%g" index.html
I removed the Useless Use of Cat, took out a bunch of unnecessary backslashes, added single quotes around the regex to protect it from shell expansion, and fixed the repetition operator. The grouping parentheses still need their backslashes in basic regular expressions; my sed, at least, wants \(...\).
Note the use of single and double quotes next to each other. Single quotes protect against shell expansion, so you can't use them around "${unlockedCount}" where you do want the shell to interpolate the variable.
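To see the quoting pattern at work on both an empty and a filled tag, here is a minimal sketch. It reads from a shell variable instead of editing index.html in place, and the counts are invented for illustration.

```shell
# Hypothetical count; in the question it comes from external data.
unlockedCount=7

html='Unlocked <span id="unlockedCount"></span>/<span id="totalCount"></span> achievements'

# First run: the span is empty. [0-9]* also matches an existing number.
first=$(printf '%s\n' "$html" |
  sed 's|<span id="unlockedCount">[0-9]*</span>|<span id="unlockedCount">'"$unlockedCount"'</span>|g')

# Second run with a new value replaces the number inserted before.
unlockedCount=12
second=$(printf '%s\n' "$first" |
  sed 's|<span id="unlockedCount">[0-9]*</span>|<span id="unlockedCount">'"$unlockedCount"'</span>|g')

printf '%s\n' "$first" "$second"
```

Because the span tags are written back in the replacement, the command can be rerun any number of times.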