I have an index.html file.
I want line 88, which looks like this: <h1>Test Page 1</h1>
to look like this: <h1>Test Page 10</h1>
I tried basic commands such as:
sed -i '88s/1/10' index.html
sed -i '88|\(.*\)| <h1>Test Page 10</h1>\1|' index.html
but it seems that HTML tags need different treatment?
I filled t.html with 100 lines of the same content (each line is just <h1>Test Page 1</h1>). For this demonstration I used nl t.html | grep 88 to show the 88th line (nl numbers each line, and grep searches for a regular expression, but here 88 simply matches a literal 88). I run that before and after the sed command to show line 88 before and after the change.
$ nl t.html | grep 88; sed -i -e '88 s/Page 1/Page 10/' t.html; nl t.html | grep 88
88 <h1>Test Page 1</h1>
88 <h1>Test Page 10</h1>
You have to be careful with regular expressions: if you just use s/1/10/, it will replace the 1 in the first h1 instead of the 1 in Page 1.
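To make that pitfall concrete, here is a small sketch on a throwaway file (the temporary directory and the name demo.html are made up for the demonstration). It also anchors the pattern on the following <, which is my own addition, so that running the command a second time does not turn Page 10 into Page 100.

```shell
# Build a 100-line sample in a temporary directory (demo.html is a hypothetical name).
tmpdir=$(mktemp -d)
sample="$tmpdir/demo.html"
awk 'BEGIN { for (i = 1; i <= 100; i++) print "<h1>Test Page 1</h1>" }' > "$sample"

# Unanchored: the first "1" on the line is the one inside "<h1>".
naive=$(sed -n '88s/1/10/p' "$sample")

# Anchored on the surrounding text: only the page number changes.
fixed=$(sed -n '88s/Page 1</Page 10</p' "$sample")

# Re-running the anchored substitution on the already-edited line changes nothing.
rerun=$(printf '%s\n' "$fixed" | sed 's/Page 1</Page 10</')

printf 'naive: %s\nfixed: %s\nrerun: %s\n' "$naive" "$fixed" "$rerun"
rm -r "$tmpdir"
```

Swap -n and the trailing p for -i once the output looks right and you want the file itself changed.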
cat > replace.ed << EOF
88s/e 1/e 10/
wq
EOF
ed -s index.html < replace.ed
rm -v ./replace.ed
Use
sed -E 's,(>[^<]*)1([^<]*<),\110\2,'
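The expression s,(>[^<]*)1([^<]*<),\110\2, matches a >, then any text other than <, then a 1, then any text other than < up to and including the next <. It replaces the match with the text before the 1, then 10, then the text after the 1. Because [^<]* is greedy, the 1 that gets replaced is the last one in the text between the tags.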
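A quick way to check this is to feed it a single line. The sketch below uses only the sample line from the question; the commented-out variant restricted to line 88 of index.html is an assumption about how you would apply it to the real file (the -i flag as used here is GNU sed).

```shell
line='<h1>Test Page 1</h1>'

# \1 is the text before the 1, the literal 10 follows, then \2 is the rest.
result=$(printf '%s\n' "$line" | sed -E 's,(>[^<]*)1([^<]*<),\110\2,')
printf '%s\n' "$result"

# Assumed in-place form, limited to line 88 (GNU sed):
# sed -E -i '88s,(>[^<]*)1([^<]*<),\110\2,' index.html
```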
Nice little one liner:
cat data.txt | awk 'NR==2'| sed 's/Test Page 1/Test Page 10/'
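The awk command uses NR, which is the number of input lines read so far, so NR==2 selects only line 2; change the 2 to the line number in question. The sed command then replaces the text between the first and second slashes with the text between the second and third. Note that this pipeline prints only the selected line; it does not modify data.txt.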
Hopefully that helps :)
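If the whole file is wanted back with just that one line changed, awk can do the selection and the substitution by itself. This is only a sketch: the three-line data.txt and the output name data.new are invented for the example.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '<h1>Test Page 1</h1>' '<h1>Test Page 1</h1>' '<h1>Test Page 1</h1>' > "$tmpdir/data.txt"

# Edit only line 2, but print every line so the rest of the file survives.
awk 'NR == 2 { sub(/Test Page 1/, "Test Page 10") } { print }' \
    "$tmpdir/data.txt" > "$tmpdir/data.new"

line_count=$(wc -l < "$tmpdir/data.new")
first_line=$(sed -n '1p' "$tmpdir/data.new")
second_line=$(sed -n '2p' "$tmpdir/data.new")
cat "$tmpdir/data.new"
rm -r "$tmpdir"
```

Move data.new over the original once you are happy with it.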
Related
I am trying to delete the text between the <pre></pre> HTML tags using:
sed -i '/<pre>/,/<\/pre>/d' file.html
But this deletes the <pre></pre> tags too. There is only one pre tag pair in the file.
How can I avoid deleting the pre tags?
Thanks.
This might work for you (GNU sed):
sed -n '/<pre>/{p;:a;n;/<\/pre>/!ba};p' file
Turn off implicit printing by using the -n option.
If a line contains <pre>, print it and fetch the next line.
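If that line does not contain </pre>, loop back and fetch the next one; nothing is printed inside the loop.
Once the </pre> line is reached, print it, and print every line outside the block as well.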
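Here is a small run of that command on an invented six-line file, assuming GNU sed and assuming that <pre> and </pre> each sit on a line of their own (if both are on one line, the loop would run on to a later </pre>).

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'before' '<pre>' 'secret 1' 'secret 2' '</pre>' 'after' > "$tmpdir/file.html"

# Keep the tag lines, drop everything between them (GNU sed syntax).
kept=$(sed -n '/<pre>/{p;:a;n;/<\/pre>/!ba};p' "$tmpdir/file.html")

printf '%s\n' "$kept"
rm -r "$tmpdir"
```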
I have tried multiple times to get digits between two html patterns.
Neither sed nor awk worked for me, since the examples on the internet were too simple to fit my task.
Here is the code I want to filter:
....class="a-size-base review-text">I WANT THIS TEXT</span></div> ....
So I need a command that outputs I WANT THIS TEXT, i.e. the text between ...review-text"> and </span>
Do you have a clue? Thanks for the effort and greetings from Germany.
Here is the plain code
Try:
tr '\n' ' ' < file.html | grep -o 'review-text">[^<>]*</span> *</div>' | cut -d'>' -f2 | cut -d'<' -f 1
It should work if there are no tags inside "I WANT THIS TEXT". (tr reads only standard input, so the file has to be redirected into it.)
I can't see the problem here, supposing the text you want to extract contains neither < nor >.
For instance, with a POSIX basic regular expression:
$ HTML_FILE=/tmp/myfile.html
$ sed -n "s/.*review-text.>\([^<]*\)<.*/\1/gp" "$HTML_FILE"
This prints the text between the HTML tags.
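One caveat with the sed form: because .* is greedy, it returns only the last match when several review spans share a line. A possible workaround, sketched here on an invented two-review line and relying on grep -o (available in GNU and BSD grep), is to pull out each match first and then strip the prefix:

```shell
html='<span class="a-size-base review-text">first review</span><span class="a-size-base review-text">second review</span>'

# grep -o prints each match on its own line; sed then drops the leading marker.
reviews=$(printf '%s\n' "$html" | grep -o 'review-text">[^<]*' | sed 's/^review-text">//')

printf '%s\n' "$reviews"
```

As with the answers above, this assumes the wanted text contains no tags of its own.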
I have an HTML text file and I want to format it so that each paragraph is always on a single line, e.g.
<p>paragraph info here</p>
instead of
<p>paragraph
info here </p>
Is there a tool that enables me to do this?
You can use sed
cat test.html |sed ':a;N;$!ba;s/\n/ /g' |sed 's/<\/p> /<\/p>\n/g'
The first sed removes all line breaks, and the second adds one back after each closing paragraph tag.
It is not pretty, but it works.
While the requirement that paragraphs always be on a single line would be met by simply joining the whole file into one line, this solution is less radical:
perl -pe 'if (/<p>/../<\/p>/) { s/\n/ / unless /<\/p>/ }' test.html
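For comparison, the same "only join inside a paragraph" idea can be sketched in awk. The sample file and the variable names buf and inpara are made up, and the sketch assumes each line opens at most one <p>.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '<p>paragraph' 'info here </p>' '<p>one line</p>' > "$tmpdir/test.html"

joined=$(awk '
    # A paragraph opens here but does not close on the same line: start buffering.
    /<p>/ && !/<\/p>/ { buf = $0; inpara = 1; next }
    # Inside an open paragraph: append the line, flush once the closing tag shows up.
    inpara { buf = buf " " $0; if (/<\/p>/) { print buf; inpara = 0 }; next }
    # Everything else passes through unchanged.
    { print }
' "$tmpdir/test.html")

printf '%s\n' "$joined"
rm -r "$tmpdir"
```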
My regex-fu is sadly lacking, and though I am reading "Mastering Regex" and some online tutorials, I am getting nowhere. I hope that a practical example for my situation will help me get started.
Input files look roughly like this:
<html>
<head>
<title>My Title</title>
</head>
<body>
<p>Various random text...</p>
<ul>
<li>One</li>
<li>Two</li>
<li>Three</li>
</ul>
<p>Various random text...</p>
</body>
</html>
My eventual goal is to output:
My Title,One,Two,Three
i.e. comma-separated values with the title and the content of the li tags.
The first step, though, is to remove everything up to and including the title tag. I decided to use sed (I have GNU sed version 4.2 running on Windows) and tried the following.
I figured I need to match "everything", including newlines, up to the title tag and replace it with nothing. That means:
match every character with a dot, and also newlines with \n, so make that a class and make it repeat with *, which gives [.\n]* followed by the title tag
replace with nothing
so
type file.html | sed "s/[.\n]*<title>//"
But this doesn't work: it just removes the title tag itself but not the things before it.
Where am I going wrong? I want to understand.
Any advice appreciated. Thanks in advance.
Using sed (and tr, and sed...):
sed -n -e '/<title>\|<li>/{s/^[ ]*<[^>]*>//;s/<[^>]*>[ ]*$//p}' input | \
tr '\n' , | sed 's/,$/\n/'
Using a single sed expression:
sed ':a;N;$!ba;s/\n//g; # loop, read-in all file, remove newlines
s/.*<title>//; # remove everything up to, including <title>
s/title>.*<ul>/title>/; # remove everything between </title> and <ul>
s!</ul>.*!!; # remove everything after </ul>, inclusive
s!</li>\|</title>!,!g; # substitute closing tags with commas
s/<li>//g; # remove <li> tags
s/,[ ]*$// # delete the trailing comma
' input
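As for why the original attempt fails: inside a bracket expression the dot is a literal dot, not "any character", and sed works on one line at a time, so the pattern space never contains a newline unless lines are joined first (for example with N). A rough sketch on an invented three-line file shows the difference:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '<html>' '<head>' '<title>My Title</title>' > "$tmpdir/file.html"

# Line by line: "[.\n]*" matches the empty string here, so only "<title>" itself goes.
line_by_line=$(sed 's/[.\n]*<title>//' "$tmpdir/file.html")

# Slurp the whole file with the N loop, flatten the newlines,
# and only then let ".*<title>" reach back to the start of the file.
slurped=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' -e 's/.*<title>//' "$tmpdir/file.html")

printf 'line by line:\n%s\n\nslurped:\n%s\n' "$line_by_line" "$slurped"
rm -r "$tmpdir"
```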
A Ruby Solution
You can do what you want in a variety of ways, some more elegant than others. Here's a quick-and-dirty way to get your expected results with a single Ruby one-liner.
ruby -ne 'BEGIN { output = "" }
output << $1 + ?, if %r{<(?:title|li)>(.*)</\1?}
END { puts output.sub(/,$/, "") }' /tmp/foo.html
This script will print the result in the format described in the original question. For example, with the sample text provided it prints:
My Title,One,Two,Three
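If you would rather stay in the shell, the same extraction can be sketched with awk alone. This assumes, like the sample, that each title or li element sits on a single line; the variable names out and sep are mine.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/input.html" << 'EOF'
<html>
<head>
<title>My Title</title>
</head>
<body>
<p>Various random text...</p>
<ul>
<li>One</li>
<li>Two</li>
<li>Three</li>
</ul>
<p>Various random text...</p>
</body>
</html>
EOF

csv=$(awk '
    /<title>|<li>/ {
        gsub(/<[^>]*>/, "")            # strip every tag on the line
        gsub(/^[ \t]+|[ \t]+$/, "")    # trim surrounding whitespace
        out = out sep $0               # append the field to the result
        sep = ","                      # every later field gets a leading comma
    }
    END { print out }
' "$tmpdir/input.html")

printf '%s\n' "$csv"
rm -r "$tmpdir"
```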
Hi, I have the following file:
<strong>Ramandand Sagar Krishna part 34</strong> Vasudev comes back
and girl disappears from Kansa's hand and the first temple she instructs Devs to make at Vindhyachal <a href="http://www.dailymotion.com/embed/video/x3p3gu?
width=320&theme=none&wmode=transparent">http://www.dailymotion.com/embed/video/x3p3gu?width=320&theme=none&wmode=transparent</a> <a
href="http://www.dailymotion.com/video/x3p3gu_krishna-part-34_shortfilms"
target="_blank">Krishna Part 34</a> <strong>Ramandand Sagar Krishna part 35</strong> Celebrations at Yashoda's house and Vasudev Devki freed from jail <a href="http://www.dailymotion.com/embed/video/x3p3sg?width=320&theme=none&wmode=transparent">
http://www.dailymotion.com/embed/video/x3p3sg?width=320&theme=none&wmode=transparent</a> Krishna Part 35 Krishna 143</em></div>
In the above file I want to replace
any HTML of the following kind:
http://www.dailymotion.com/embed/video/x5ftx3?width=320
The key point is that any HTML tag containing wmode=transparent or width=320 should be replaced with a blank space. Is there an easy way to do so? There are many HTML tags like
which do not have wmode=transparent in their lines.
The file posted above is very big, approximately 30K lines of HTML, so I have posted only the relevant lines.
I am on an Ubuntu system.
As Sorpigal has pointed out, there is no simple answer to solve this. If you're willing to destroy your line endings, you could try my ugly concoction. It might help you:
tr -d "\n" < file.txt | awk '{ for (i=1; i<=NF; i++) if ($i !~ /wmode=transparent|width=320/) printf "%s ", $i} END {print ""}' | sed -e "s%<a <a%<a%g"
Output:
<strong>Ramandand Sagar Krishna part 34</strong> Vasudev comes back and girl disappears from Kansa's hand and the first temple she instructs Devs to make at Vindhyachal Krishna Part 34 <strong>Ramandand Sagar Krishna part 35</strong> Celebrations at Yashoda's house and Vasudev Devki freed from jail Krishna Part 35 Krishna 143</em></div>
I'm sure this one-liner could be improved in some way. If you do find this useful, you may then want to split the output on a boundary to tidy things up. Sed can be good for this.
Here is a link where you can find an answer to your question.
In your case you have to create a script file for sed, like:
s/wmode=transparent//g
s/width=320//g
and then run something like this:
sed -f replace_file in.txt > out.txt
I hope it's helpful for you.
Have a nice day.
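Note that the two s commands above strip only the keywords and leave the rest of each tag in place. If the whole <a ...>...</a> element should go, one possible approach is to join the lines first and then match the element as a unit. This is only a sketch: the sample text and the example.com URLs are invented, and it assumes the link text itself contains no nested tags.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/in.txt" << 'EOF'
Vindhyachal <a href="http://example.com/embed/video/x1?
width=320&theme=none&wmode=transparent">embed link</a> <a
href="http://example.com/video/x1" target="_blank">Part 34</a>
EOF

# Join the lines so a tag split across lines can be matched in one piece,
# then replace any <a> element whose opening tag mentions either keyword.
cleaned=$(tr '\n' ' ' < "$tmpdir/in.txt" |
    sed -E 's#<a [^>]*(wmode=transparent|width=320)[^>]*>[^<]*</a># #g')

printf '%s\n' "$cleaned"
rm -r "$tmpdir"
```

Links without either keyword, such as the second one in the sample, are left alone.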