Perl: Changing an HTML element's label text by ID

How can I use Perl to change an HTML element's text, looking the element up by its ID? What would the code look like?
For example:
<div id="container">
<form id="form" action="../code.cgi" method="post">
<label id="lblMessage" class="text">Message</label>
</form>
</div>
How would I change the label's text in my Perl script?

Use an HTML or XML parser: find the element (e.g., with an XPath expression), remove all its child nodes, and add a new text node with the text you want.
use strict;
use warnings;
use XML::LibXML;
my $dom = XML::LibXML->load_html(location => 'myfile.html');
my $xpath = XML::LibXML::XPathContext->new($dom);
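# In XPath, attributes are addressed with @, so the ID test is @id="..."
foreach my $label ($xpath->findnodes('//label[@id="lblMessage"]')) {
    $label->removeChildNodes();                          # drop the old content
    $label->addChild($dom->createTextNode("new text"));  # insert the new text
}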
Caveat: if there are other nodes (elements, like <b> or <span>) in your label, these get removed as well.
This only changes the document in memory. To keep the change, serialize the document (for example with $dom->toStringHTML()) and write the result back to a file.

Personally, I like HTML::TreeBuilder for this sort of task.
use HTML::TreeBuilder;
my $html = <<END;
<div id="container">
<form id="form" action="../code.cgi" method="post">
<label id="lblMessage" class="text">Message</label>
</form>
</div>
END
my $root = HTML::TreeBuilder->new_from_content( $html );
$root->elementify(); # Become a tree of HTML::Element objects.
my $message = $root->find_by_attribute( 'id', 'lblMessage' )
or die "No such element";
$message->delete_content();
$message->push_content('I am new stuff');
print $root->as_HTML();

Related

Delete an HTML element containing a pattern

How can I delete elements (from <span> to </span>) whose text contains PATTERN? The contents of the element should be deleted along with the element.
For example, I want to delete the first <span>...</span> element in the following:
<span><SPAN>some text with
with </SPAN> a PATTERNin it etc</span><span><SPAN>some text
without </SPAN> a thingIn it etc</span>
to produce the following, using sed only:
<span><SPAN>some text
without </SPAN> a thingIn it etc</span>
PS: Line endings and word boundaries should not matter; it just needs to detect any <span>...</span> that contains PATTERN.
The production server only allows basic commands such as sed.
I'm currently using the following, but it's ugly and doesn't seem to work.
sed '/<span.*\n.*PATTERN.*<\/span>/d'
If HTML:
perl -MXML::LibXML -e'
my $parser = XML::LibXML->new();
my $doc = $parser->parse_html_file($ARGV[0]);
$_->unbindNode()
for $doc->findnodes(q{//span[contains(text(), "PATTERN")]});
binmode(STDOUT);
print($doc->toString());
' in.html >out.html
If XHTML:
perl -MXML::LibXML -e'
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($ARGV[0]);
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs( h => "http://www.w3.org/1999/xhtml" );
$_->unbindNode()
for $xpc->findnodes(q{//h:span[contains(text(), "PATTERN")]}, $doc);
binmode(STDOUT);
print($doc->toString());
' in.xhtml >out.xhtml
The above both produce the following (with some implied elements vivified):
<span><SPAN>some text
without </SPAN> a thingIn it etc</span>

Finding a <div> block with an 'id' and 'class' using Nokogiri

How can I search for the following block using Nokogiri:
<div id="live_list_cat_16" class="football-block sport-block" style="display:block;">
</div>
Try this:
doc.search('div#foo.bar')
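How does this work?
The search and at methods both accept CSS queries.
div#foo finds a div with id foo
div.bar finds a div with class bar
div#foo.bar combines the two. Here foo and bar are placeholders; for the block in the question the selector would be div#live_list_cat_16.football-block.sport-block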
You can use #some_id as the CSS selector.
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<div id="foo" class="bar">text</div>
<div id="foo2" class="bar">more_text</div>
</body>
</html>
EOT
doc.search('#foo').to_html # => "<div id=\"foo\" class=\"bar\">text</div>"
doc.search('div.bar').to_html # => "<div id=\"foo\" class=\"bar\">text</div><div id=\"foo2\" class=\"bar\">more_text</div>"
Remember, a particular ID is only allowed to exist once in the document.

Make html text not transform into html

I have made a guestbook (http://guestbook.ssdfkx.ch) which has a bug I can't get rid of myself. When you submit an entry that contains HTML, it is rendered as HTML instead of being shown as plain text. This means that one person can mess up the whole website in seconds.
I have tried the <plaintext> tag. But when I do so, even if I close the tag again, everything from the tag down turns into plain text.
Help is appreciated. The following is my code:
while ($row = mysqli_fetch_object($ergebnis)) {
$message = $row->message;
$name = $row->name;
$date = $row->date;
$id = $row->id;
$datetime = new DateTime($date);
$formatteddate = $datetime->format('d.m.Y');
$formattedmessage = nl2br($message);
if ($_SESSION['logged_in'] == true) {
$entryfeld = '<article>
<div>
<main>
<div class="innerdiv">
<p>'.$formattedmessage.'</p>
</div>
</main>
<div class="innerleft">
<form method="POST">
<input name="id" type="hidden" value="'. $id . '"/>
<input name="löschen" class="deletebutton" id="deletebutton" value="Löschen" type="submit"> </form>
<br/>
<p id="innerleftp">'.$name.'</p>
</div>
<div class="innerrightdel">
<p>'.$formatteddate.'</p>
</div>
</div>
</article>';
EDIT: The variable $formattedmessage is what the user enters. If the user enters HTML, the browser renders it as markup, which should not happen. I tried putting the <plaintext> tag before and after the variable, but that turned everything after the variable into plain text, not only the user input.
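The usual remedy for the problem described above is to escape the message before embedding it in markup, so that characters such as < and > are displayed literally rather than interpreted as tags. Below is a minimal sketch in Python (not the PHP used in the question), with a hypothetical helper named render_message. In PHP the analogous step would be calling htmlspecialchars() on $message before passing it to nl2br():

```python
import html


def render_message(raw: str) -> str:
    """Escape user input, then turn newlines into <br> tags."""
    # Escape first, so any markup typed by the user becomes harmless text.
    escaped = html.escape(raw, quote=True)
    # Only then add our own line-break tags, which must stay real markup.
    return escaped.replace("\n", "<br>\n")


print(render_message('<script>alert("x")</script>\nHello'))
# prints:
# &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;<br>
# Hello
```

The order matters: escaping after the line breaks are inserted would also escape the <br> tags.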

Knockout Html binding

I've bound my model value &lt ; input type=&quot ; radio &quot ; name=&quot ;&quot ; &gt ; to the view model in a Razor engine file.
** I've separated the entity characters only to show how the HTML was saved in its original format.
When this data is displayed in the view, it appears as the literal text "< input type="radio" name="" />".
The Knockout binding that I use on the cshtml page is "<label data-bind = "html: XYZ"></label>".
Why is the data shown as a string rather than rendered as an HTML control?
If the string you have provided is correct, I would say it is because of the whitespace between < and input. Make it <input. Secondly, depending on how you are quoting within the html to be rendered, you'll need to escape some of the " or change them to '.
Viewmodel
vm={
html1: "C<input type=\"radio\" name=\"\" />",
html2: "D<input type='radio' name='' />",
html3: "Your example has whitespace causing it to be invalid HTML: < input type='radio' name=''/>"}
ko.applyBindings(vm)
Html
<body>
With escaped double quotes <label data-bind="html: html1"></label>
<br/>
With single quotes <label data-bind="html: html2"></label>
<br/>
<label data-bind="html: html3"></label>
</body>
See this fiddle for a working example of the above.

WWW::Mechanize::Firefox How do you extract the text within HTML element tags?

Good Day,
How do you print the text of an HTML tag with WWW::Mechanize::Firefox?
I have tried:
print $_->text, '/n' for $mech->selector('td.dataCell');
print $_->text(), '/n' for $mech->selector('td.dataCell');
print $_->{text}, '/n' for $mech->selector('td.dataCell');
print $_->content, '/n' for $mech->selector('td.dataCell');
Remember I do not want {innerhtml}, but that does work btw.
print $_->{text}, '/n' for $mech->selector('td.dataCell');
The above line does work, but output is just multiple /n
my $node = $mech->xpath('//td[@class="dataCell"]/text()', single => 1);
print $node->{nodeValue};
Note that if you're retrieving text interspersed with other tags, like "Test_1" and "Test_3" in this example...
<html>
<body>
<form name="input" action="demo_form_action.asp" method="get">
<input name="testRadioButton" value="test 1" type="radio">Test_1<br>
<input name="testRadioButton" value="test 3" type="radio">Test_3<br>
<input value="Submit" type="submit">
</form>
</body>
</html>
You need to refer to them by their position within the tag (taking any newlines into account):
$node = $self->{mech}->xpath("//form/text()[2]", single=>1);
print $node->{nodeValue};
Which prints "Test_1".
I would do:
print $mech->xpath('//td[@class="dataCell"]/text()');
using an XPath expression.
The only solution I have is to use:
my $element = $mech->selector('td.dataCell');
my $string = $element->{innerHTML};
and then format the HTML within each dataCell.
Either:
$element->{textContent};
or
$element->{innerText};
will work.