BASH-SHELL SCRIPT: Print part of the text from a file


I have a text file containing part of some HTML code:
>>nano wynik.txt
with the text:
1743: < a href="/currencies/lisk/#markets" class="price" data-usd="24.6933" data-btc= "0.00146882"
and I want to print only: 24.6933
I tried the cut command, but it does not work. Can anyone give me a solution?

With GNU grep and Perl Compatible Regular Expressions:
grep -Po '(?<=data-usd=").*?(?=")' file
Output:
24.6933
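
Where grep lacks the -P option (it is a GNU extension), a sed substitution can do a similar job. The following is only a sketch, assuming the data-usd attribute appears at most once per line; the helper name extract_usd and the temporary sample file are made up for illustration and are not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the data-usd attribute for each matching line.
extract_usd() {
    # -n suppresses the default output; the trailing p prints only lines
    # where the substitution succeeded.
    sed -n 's/.*data-usd="\([^"]*\)".*/\1/p' "$1"
}

# Sample input mirroring the line from the question (assumed file content).
sample=$(mktemp)
printf '%s\n' '1743: < a href="/currencies/lisk/#markets" class="price" data-usd="24.6933" data-btc= "0.00146882"' > "$sample"

result=$(extract_usd "$sample")
echo "$result"
rm -f "$sample"
```

Called as extract_usd wynik.txt, it should print the same value as the grep command above.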

Related

Bash: Content between two complex Patterns - html

I have tried multiple times to get digits between two html patterns.
Neither sed nor awk worked for me, since the examples on the internet were too simple to fit my task.
Here is the code I want to filter:
....class="a-size-base review-text">I WANT THIS TEXT</span></div> ....
So I need a command that outputs I WANT THIS TEXT, the part between ...review-text"> and </span>.
Do you have a clue? Thanks for the effort and greetings from Germany.
Here is the plain code

Try:
tr '\n' ' ' < file.html | grep -o 'review-text">[^<>]*</span> *</div>' | cut -d'>' -f2 | cut -d'<' -f 1
Note that tr reads only from standard input, so the file has to be redirected into it. This should work as long as there are no tags inside "I WANT THIS TEXT".

I can't see the problem here, supposing the text you want to extract contains neither < nor >.
For instance, with a POSIX basic regular expression:
$ HTML_FILE=/tmp/myfile.html
$ sed -n "s/.*review-text.>\([^<]*\)<.*/\1/gp" "$HTML_FILE"
This prints the text between the HTML tags.
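
Both answers above print at most one result per line of input. If several reviews can sit on the same line, grep -o followed by sed is one way to list them all. This is a rough sketch under the same assumption that the wanted text contains no tags; the function name extract_reviews and the sample markup are invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text of every review-text span, one per line.
extract_reviews() {
    # grep -o prints each match on its own line, so several spans on one
    # input line are handled; sed then strips the surrounding markers.
    grep -o 'review-text">[^<]*</span>' "$1" |
        sed -e 's/^review-text">//' -e 's/<\/span>$//'
}

# Assumed sample: two reviews on a single line.
sample=$(mktemp)
printf '%s\n' '<div><span class="a-size-base review-text">I WANT THIS TEXT</span></div><div><span class="a-size-base review-text">AND THIS ONE</span></div>' > "$sample"

reviews=$(extract_reviews "$sample")
echo "$reviews"
rm -f "$sample"
```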

Xidel extract data inside the tag -- raw output

Pleased to be a member of StackOverflow; I have been a long-time lurker here.
I need to parse the text between two tags, and so far I've found a wonderful tool called Xidel.
The text I need is inside:
<div class="description">
Text. <tag>Also tags.</tag> More text.
</div>
However, said text can include HTML tags in it, and I want them to be printed out in raw format. So using a command like:
xidel --xquery '//div[@class="description"]' file.html
Gets me:
Text. Also tags. More text.
And I need it to be exactly as it is, so:
Text. <tag>Also tags.</tag> More text.
How can I achieve this?
Regards, R

Can be done in a couple of ways with Xidel, which is why I love it so much.
HTML-templating:
xidel -s file.html -e "<div class='description'>{inner-html()}</div>"
XPath:
xidel -s file.html -e "//div[@class='description']/inner-html()"
CSS:
xidel -s file.html -e "inner-html(css('div.description'))"
BTW, on Linux: swap the double quotes for single and vice versa.

You can show the tags by adding the --output-format=xml option.
xidel --xquery '//div[@class="description"]' --output-format=xml file.html

How to insert HTML code with awk?

I am trying to insert some HTML code for my Telldus script page. I want to create a link-image. I have tried a lot of different alternatives for this code, but nothing works.
AWK CODE to generate off.php
# awk '{print a$1b$1d}' a='' d='<img src='OFF.png'><br> ' off1.php > off.php
Result for off.php
# more off.php
Lampor<img src=OFF.png><br>
Result I want for off.php
<img src=OFF.png>Lampor<br>
So, I want the image to be a link instead of the word "Lampor".

This should do:
awk '{print "<img src=OFF.png>"$1"<br>"}' file
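
In the command from the question, the single quotes around OFF.png end and restart the shell string, which is why they vanish from the output. One possible way around that is to pass the HTML pieces in with awk -v and use double quotes inside them. A small sketch; the variable names pre and post and the second input line are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Assumed input: one device name per line, as off1.php seems to contain.
sample=$(mktemp)
printf '%s\n' 'Lampor' 'Kitchen' > "$sample"

# -v assigns awk variables before the program starts; double quotes inside a
# single-quoted shell string need no escaping.
html=$(awk -v pre='<img src="OFF.png">' -v post='<br>' '{ print pre $1 post }' "$sample")
echo "$html"
rm -f "$sample"
```

For the first input line this prints <img src="OFF.png">Lampor<br>.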

How to copy text between 2 html tags?

I want to copy all the text in a website between tags:
<p> and </p>
using bash.
Do you have an idea how to do it?

As the comment above states: don't even try. There is no reliable way to parse HTML with Bash internals.
But when you're using a shell you may as well use third-party command line tools such as pup which are built for HTML parsing on the command line.

Yes, an HTML parser is a better choice. But if you are just trying to grab the text in between the first set of P tags quickly, you can use Perl:
perl -n0e 'if (/<p>(.*?)<\/p>/s) { print $1; }'
For example:
echo "
<p>A test
here
today</p>
<p>whatever</p>
" | perl -n0e 'if (/<p>(.*?)<\/p>/s) { print $1; }'
This will output:
A test
here
today

How to indent html with xmllint?

I'm outputting html that's all crushed together, and would like to convert it to have proper indentation. I've been trying to use xmllint for this, but with no joy. E.g. when this is in file.html:
<table><tr><td><b>Foo</b></td></tr></table>
<table><tr><td>Bar</td></tr></table>
I get:
$ xmllint --format file.html
file.html:2: parser error : Extra content at the end of the document
<table><tr><td>Bar</td></tr></table>
^
<<< exit status [1] >>>
But when file.html contains either of those lines alone, it works fine (removing the second line):
$ xmllint --format file.html
<?xml version="1.0"?>
<table>
<tr>
<td>
<b>Foo</b>
</td>
</tr>
</table>
When I include the --html option, it's more likely to run without errors, but then it doesn't indent.
Any suggestions? Are there any other (*nix) tools I can use for this? Thanks ...

As user 4M01 suggested: On the command line, append the pipe with a call to HTML tidy.
HTML output from xmllint will be repaired; tidy will wrap some reasonable ... around your html fragment.
xmllint --xpath "//tr[6]/td[7]" --html - | tidy -q

tidy -i sets the indent: auto config value. If instead of auto I set it to yes, I consistently got better indentation style:
tidy --indent yes

I think this is because the HTML you have supplied doesn't have a single root tag, which makes it invalid XML.
Try adding the body tag and running xmllint on it again.
<body><table><tr><td><b>Foo</b></td></tr></table>
<table><tr><td>Bar</td></tr></table></body>
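
The wrapping can also be done on the fly instead of editing the file. The following sketch assumes the fragment is otherwise well-formed XML; the helper name wrap_in_body is hypothetical, and the script falls back to printing the wrapped text when xmllint is not installed.

```shell
#!/usr/bin/env bash
# Recreate the two-line fragment from the question in a temporary file.
fragment=$(mktemp)
printf '%s\n' '<table><tr><td><b>Foo</b></td></tr></table>' '<table><tr><td>Bar</td></tr></table>' > "$fragment"

# Hypothetical helper: emit the file wrapped in a single <body> root element.
wrap_in_body() {
    printf '<body>\n'
    cat "$1"
    printf '</body>\n'
}

wrapped=$(wrap_in_body "$fragment")

# Format only when xmllint is available; "-" makes it read from standard input.
if command -v xmllint >/dev/null 2>&1; then
    printf '%s\n' "$wrapped" | xmllint --format -
else
    printf '%s\n' "$wrapped"
fi
rm -f "$fragment"
```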

Have you tried HTML Tidy? More information about it is available at W3 and SourceForge. There is even a GUI tool available, known as GuiTidy. These tools are great: they not only help with proper indentation but also validate the HTML code.
Hope this helps.
