Convert DOMXpath to an HTML string

Is there a way to convert the DOMXpath object back to HTML? I would like to replace one section of the HTML
<div class='original'>Stuff</div>
replaced with:
<div class='replacement'>New Stuff</div>
and then turn the result back into a valid DOMXpath object. I know that the function DOMDocument::saveHTML exists, but if I do
XPATH->saveHTML();
I get an error. Any advice would be appreciated.

This looks like an XY problem. DOMXPath always works on a DOMDocument instance, so you should save the DOMDocument instead. See the working example below:
<?php
$xml = "<parent><div class='original'>Stuff</div></parent>";
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXpath($doc);
//get element to be replaced
$old = $xpath->query("/parent/div")->item(0);
//create new element for replacement
$new = $doc->createElement("div");
$new->setAttribute("class", "replacement");
$new->nodeValue = "New Stuff";
//replace old element with the new one
$old->parentNode->replaceChild($new, $old);
//TODO: save the modified HTML instead of echo
echo $doc->saveHTML();
?>
eval.in demo
Output:
<parent><div class="replacement">New Stuff</div></parent>

Json parsing fails out of the blue

So I've been struggling with JSON for a while now, but last night something weird happened: even though I have the " characters escaped, it brings up an error. Here's my JSON string:
var data = $.parseJSON('{"rows":[{"type":"row","width_class":"row new_row","column_class":"col3 column_model","columns":{"0":{"class":"column one","children":[]},"1":{"class":"column one","children":[{"type":"bullet-block","html":"<div class=\\"bullet-block-element\\"><ul><li style='padding-left:36px;background-image:url(\\"http://example.com/includes/images/bulletins/large-0.png\\");'>123</li><li style='padding-left:36px;background-image:url(\\"http://example.com/includes/images/bulletins/large-0.png\\");'>456</li><li style='padding-left:36px;background-image:url(\\"http://example.com/includes/images/bulletins/large-0.png\\");'>789</li></ul></div>","image":"http://example.com/includes/images/bulletins/large-0.png","size":"large","items":["123","456","789"]}]},"2":{"class":"column one","children":[]}}}]}');
This is generated via
var data = $.parseJSON('<?= str_replace('\\','\\\\',base64_decode($data['d'])) ?>');
Am I just being blind or have I had too much redbull? Help would be appreciated!
json_encode does the escaping, and its output can be used directly as a JavaScript literal, so you don't need $.parseJSON; calling it is double decoding.
Simply use this:
<?php
$php = array('test' => 'hi');
$data['d'] = base64_encode(json_encode($php)); // 'eyJ0ZXN0IjoiaGkifQ=='
?>
<script>
var data = <?php echo base64_decode($data['d']); ?>;
console.debug(data.test); // Prints 'hi' in the console ;-)
</script>
See the codepad: http://codepad.org/VmKGt0JD
You need to escape the single quotes (') as well. They appear in the style attributes of the HTML and end the single-quoted JavaScript string early.
So this will work:
var data = $.parseJSON('{"rows":[{"type":"row","width_class":"row new_row","column_class":"col3 column_model","columns":{"0":{"class":"column one","children":[]},"1":{"class":"column one","children":[{"type":"bullet-block","html":"<div class=\\"bullet-block-element\\"><ul><li style=\'padding-left:36px;background-image:url(\\"http://example.com/includes/images/bulletins/large-0.png\\");\'>123</li><li style=\'padding-left:36px;background-image:url(\\"http://example.com/includes/images/bulletins/large-0.png\\");\'>456</li><li style=\'padding-left:36px;background-image:url(\\"http://example.com/includes/images/bulletins/large-0.png\\");\'>789</li></ul></div>","image":"http://example.com/includes/images/bulletins/large-0.png","size":"large","items":["123","456","789"]}]},"2":{"class":"column one","children":[]}}}]}');
copied and pasted to a fiddle
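The quoting failure discussed in this thread can be reproduced without PHP. The following is a minimal sketch, assuming the payload is already available as JSON text; the helper names `toJsStringLiteral` and `evaluates` are made up for illustration. It suggests that running the JSON text through JSON.stringify once more yields a string literal in which quotes and backslashes are already escaped, so no hand-written replacing is needed.

```javascript
// JSON text containing both quote styles, like the "html" field in the question.
const jsonText = JSON.stringify({ html: "<li style='x:url(\"a.png\")'>123</li>" });

// Naive embedding: wrap the JSON text in single quotes, as the question does.
// The apostrophe in style='...' ends the literal early, so the source is invalid.
const naiveSource = "JSON.parse('" + jsonText + "')";

// Hypothetical helper: stringify the JSON text again to get a double-quoted
// JavaScript string literal with every quote and backslash escaped.
function toJsStringLiteral(text) {
  return JSON.stringify(text);
}
const safeSource = "JSON.parse(" + toJsStringLiteral(jsonText) + ")";

// Hypothetical helper: compile and run generated source, roughly the way a
// browser would handle an inline script, and report whether it succeeded.
function evaluates(source) {
  try {
    new Function("return " + source + ";")();
    return true;
  } catch (e) {
    return false;
  }
}

console.log(evaluates(naiveSource)); // false
console.log(evaluates(safeSource));  // true
```

Inside an inline script element, a literal </script> in the data would still end the script early, so that case needs separate handling.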

DOMDocument issues: Escaping attributes and removing tags from javascript

I am not a fan of DOMDocument because I believe it is not very good for real-world use. Yet in my current project I need to replace all the text in a page (whose source code I don't have access to) with other strings (a sort of translation), so I need to use it.
I tried doing this with DOMDocument and didn't get the expected result. Here is the code I use:
function Translate_DoHTML($body, $replaceArray){
if ($replaceArray && is_array($replaceArray) && count($replaceArray) > 0){
$body2 = mb_convert_encoding($body, 'HTML-ENTITIES', "UTF-8");
$doc = new DOMDocument();
$doc->resolveExternals = false;
$doc->substituteEntities = false;
$doc->strictErrorChecking = false;
if (@$doc->loadHTML($body2)){
Translate_DoHTML_Process($doc, $replaceArray);
$body = $doc->saveHTML();
}
}
return $body;
}
function Translate_DoHTML_Process($node, $replaceRules){
if($node->hasChildNodes()) {
$nodes = array();
foreach ($node->childNodes as $childNode)
$nodes[] = $childNode;
foreach ($nodes as $childNode)
if ($childNode instanceof DOMText) {
if (trim($childNode->wholeText)){
$text = str_ireplace(array_keys($replaceRules), array_values($replaceRules), $childNode->wholeText);
$node->replaceChild(new DOMText($text),$childNode);
}
}else
Translate_DoHTML_Process($childNode, $replaceRules);
}
}
And here are the problems:
Escaping attributes: There are data-X attributes in the file whose escaping gets changed. This is not a major problem, but it would be great if I could disable this behavior.
Before DOM:
data-link-content=" <a class="submenuitem" href=&quot
After DOM:
data-link-content=' <a class="submenuitem" href="
Removal of closing tags in JavaScript:
This is actually the main problem for me. I don't know why DOMDocument would see any need to remove these tags, but it does. As you can see in the example below, it removes closing tags inside a JavaScript string. It also removed the last part of the script. It seems that DOMDocument parses the JavaScript inside the script tag. Maybe because there is no CDATA section? But this is HTML, and we don't need CDATA in HTML; I thought CDATA was for XHTML. I also have no way to add CDATA here. So can I ask it not to parse script tags?
Before DOM:
<script type="text/javascript"> document.write('<video src="http://x.webm"><p>You will need to Install the latest Flash plugin to view this page properly.</p></video>'); </script>
After DOM:
<script type="text/javascript"> document.write('<video src="http://x.webm"><p>You will need to <a href="http://www.adobe.com/go/getflashplayer" target="_blank">Install the latest Flash plugin to view this page properly.</script>
If there is no way to prevent these things, is there any way I can port this code to SimpleHTMLDOM?
Thank you very much.
Try this: replace the line
$body2 = mb_convert_encoding($body, 'HTML-ENTITIES', "UTF-8");
with
$body2 = convertor($body);
and add this function to your code:
function convertor($ToConvert)
{
$FromConvert = html_entity_decode($ToConvert,ENT_QUOTES,'ISO-8859-1');
$Convert = mb_convert_encoding($FromConvert, "ISO-8859-1", "UTF-8");
return ltrim($Convert);
}
But use the right encoding for your context.
Have a nice day.
Based on my search, the reason for the second problem is what "Alex" describes in this question: DOM parser that allows HTML5-style </ in <script> tag
But based on their research, there is no good parser out there capable of understanding today's HTML. Also, html5lib's last update was 2 years ago, and it failed to work in real-world situations in my tests.
So I had only one way to solve the second problem: regex. Here is the code I use:
function Translate_DoHTML_GetScripts($body){
$res = array();
if (preg_match_all('/<script\b[^>]*>([\s\S]*?)<\/script>/m', $body, $matches) && is_array($matches) && isset($matches[0])){
foreach ($matches[0] as $key => $match)
$res["<!-- __SCRIPTBUGFIXER_PLACEHOLDER".$key."__ -->"] = $match;
$body = str_ireplace(array_values($res), array_keys($res), $body);
}
return array('Body' => $body, 'Scripts' => $res);
}
function Translate_DoHTML_SetScripts($body, $scripts){
return str_ireplace(array_keys($scripts), array_values($scripts), $body);
}
Using the two functions above, I remove every script from the HTML so I can use DOMDocument to do my work. Then, at the end, I add them back exactly where they were.
I am not sure yet whether regex is fast enough for this.
And don't tell me not to use regex for HTML. I know that HTML is not a regular language and so on, but if you read the problem yourself, you will suggest the same approach.
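For readers who want to try the stash-and-restore idea in isolation, here is a rough sketch of the same technique in JavaScript rather than PHP. It is not a port of the functions above: the names `stashScripts` and `restoreScripts` and the placeholder text are invented for this illustration, and the replacement is done by callback so each script block gets its own placeholder.

```javascript
// Swap every <script>...</script> block for an HTML comment placeholder and
// remember the original blocks by index.
function stashScripts(body) {
  const scripts = [];
  const stripped = body.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, (match) => {
    scripts.push(match);
    return "<!-- __SCRIPT_PLACEHOLDER" + (scripts.length - 1) + "__ -->";
  });
  return { body: stripped, scripts: scripts };
}

// Put the original blocks back, matched by the index stored in the placeholder.
function restoreScripts(body, scripts) {
  return body.replace(/<!-- __SCRIPT_PLACEHOLDER(\d+)__ -->/g, (whole, index) => scripts[Number(index)]);
}

const page = '<p>Hello</p><script>document.write("<b>x</b>");</script><p>Bye</p>';
const stashed = stashScripts(page);
// Stand-in for the DOM-based translation step, which now never sees the script.
const translated = stashed.body.replace("Hello", "Bonjour");
console.log(restoreScripts(translated, stashed.scripts));
// <p>Bonjour</p><script>document.write("<b>x</b>");</script><p>Bye</p>
```

This only works if the processing step in the middle keeps HTML comments intact, which is an assumption to verify for whatever parser is used.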

How to remove an HTML tag with PHPQuery?

Update1: With the full source code:
$html1 = '<div class="pubanunciomrec" style="background:#FFFFFF;"><script type="text/javascript"><!--
google_ad_slot = "9853257829";
google_ad_width = 300;
google_ad_height = 250;
//-->
</script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script></div>';
$doc = phpQuery::newDocument($html1);
$html1 = $doc->remove('script');
echo $html1;
That is the full source code above. I have also read that there is a bug, http://code.google.com/p/phpquery/issues/detail?id=150, and I don't know if it has been solved.
Any clues on how to remove the <script> from this HTML?
Best Regards,
Hi,
I need to remove all <script> tags from a HTML document using PhpQuery.
I have done the following:
$doc = phpQuery::newDocument($html);
$html = $doc['script']->remove();
echo $html;
It is not removing the <script> tags and contents. Is it possible to do this with PhpQuery?
Best Regards,
This works:
$html->find('script')->remove();
echo $html;
This doesn't work:
$html = $html->find('script')->remove();
echo $html;
From the documentation it looks like you would do this:
$doc->remove('script');
http://code.google.com/p/phpquery/wiki/Manipulation#Removing
EDIT:
Looks like there's a bug in PHPQuery, this works instead:
$doc->find('script')->remove();
I was hoping something simple like this would work
pq('td[colspan="2"]')->remove('b');
Unfortunately it did not work as I hoped.
I ran across this stackoverflow and tried what was mentioned without success.
This is what worked for me.
$doc = phpQuery::newDocumentHTML($html);
// used newDocumentHTML and stored its return value in $doc
$doc['td[colspan="2"] b']->remove();
// Used the $doc var to call remove() on the elements I did not want from the DOM
// In this instance I wanted to remove all bold text from the td with a colspan of 2
$d = pq('td[colspan="2"]');
// Created my selection from the current DOM which has the elements removed earlier
echo pq($d)->text();
// Rewrap $d in phpQuery and call whatever function you want

embedding html created by a perl method, in the place it is being called upon

I am very new to Perl coding.
I am calling a method, which in turn calls another method and then generates some HTML. I need to embed that HTML in my current code, so that it is added to the current HTML.
I am calling the method like this:
my $test = $frek->xyz();
where xyz generates the HTML.
Now I need to embed $test in my HTML, but I can't find a way to do it.
Please help.
It's not entirely clear what you want, but maybe one of these would answer your question:
heredoc content, or
something based on HTML::Mason, or
something based on HTML::Template.
What exactly are you trying to do?
Can you give a specific example?
Addendum
After reading another of your comments, I think I got what you are trying to accomplish.
Let's imagine we have a Perl class 'MyClass' that contains a method xyz():
package MyClass;
sub new {
my $class = shift;
my $self = { x => shift, y => shift, z => shift };
bless $self, $class;
return $self
}
sub xyz { # <== here we go
my ($self) = @_;
return $self->{x} * $self->{y} * $self->{z}
}
1;
If your Perl program (e.g. cgitest.pl) works as a simple CGI-script
from a cgi-bin directory, it would look like this:
#!/usr/bin/perl
use strict;
# here we have html included in source
my $html = q{
<html>
<head></head>
<body>
<h1>Test</h1>
<div id='test_results'> #{$test}# </div>
</body>
</html>
};
use MyClass; # lets hope it'll be found
my $frek = new MyClass(10,10,10); # create instance
my $test = $frek->xyz(); # get value
$html =~ s/#{(\$\w+)}#/$1/eeg; # now replace #{$test}# in html by $test
print "Content-type: text/html\n\n"; # output modified html to browser
print $html;
This would replace the marker #{$var}# by the value
of the actual $var and print the resulting html.
Note the (double) /ee after the substitution pattern.
But then, if your web site is a Mason site, your test.html simply looks like:
<h1>Test</h1>
<div id='test_results'> <% $test %> </div>
<!-- Perl initialization code goes below -->
<%init>
use MyClass;
my $frek = new MyClass(10,10,10);
my $test = $frek->xyz();
</%init>
which can be written similarly with a %perl code block:
<h1>Test</h1>
<%perl>
use MyClass;
my $frek = new MyClass(10,10,10);
my $test = $frek->xyz();
</%perl>
<div id='test_results'> <% $test %> </div>
but now you have intermingled HTML parts and Perl parts, whereas in
the example above, all Perl code goes below the HTML. If your web
server is properly configured for HTML::Mason, it will handle
either of them fine. Mason is available for Windows, Unix, and whatever
other systems there are.
Regards
rbo

Sending values through links

Here is the situation: I have 2 pages.
What I want is to have a number of text links(<a href="">) on page 1 all directing to page 2, but I want each link to send a different value.
On page 2 I want to show that value like this:
Hello you clicked {value}
Another point to take into account is that I can't use any php in this situation, just html.
Can you use any scripting, something like JavaScript? If you can, then pass the values along in the query string (just add "?ValueName=Value" to the end of your links). Then, on the target page, retrieve the query string value. The following site shows how to parse it out: Parsing the Query String.
Here's the Javascript code you would need:
var qs = new Querystring(); // Querystring comes from the linked script; it is not built into the browser
var v1 = qs.get("ValueName");
From there you should be able to work with the passed value.
Javascript can get it. Say, you're trying to get the querystring value from this url: http://foo.com/default.html?foo=bar
var tabvalue = getQueryVariable("foo");
function getQueryVariable(variable)
{
var query = window.location.search.substring(1);
var vars = query.split("&");
for (var i=0;i<vars.length;i++)
{
var pair = vars[i].split("=");
if (pair[0] == variable)
{
return pair[1];
}
}
}
** Not 100% certain if my JS code here is correct, as I didn't test it.
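For comparison, current browsers and Node.js ship a standard URLSearchParams class that does the splitting and the percent-decoding for you. The following is a small sketch, not a replacement for the answer above; the function name `getQueryValue` is made up, and it takes the query string as an argument only so it can be tried outside a browser.

```javascript
// Read one parameter from a query string such as "?foo=bar".
// On a real page you would pass window.location.search as the first argument.
function getQueryValue(search, name) {
  const params = new URLSearchParams(search);
  return params.get(name); // null when the parameter is absent
}

console.log(getQueryValue("?foo=bar", "foo"));             // bar
console.log(getQueryValue("?name=Hello%20World", "name")); // Hello World
console.log(getQueryValue("?foo=bar", "missing"));         // null
```

Note that the hand-rolled loop above returns the value still percent-encoded, so a value such as Hello%20World would need decodeURIComponent before being displayed.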
You might be able to accomplish this using HTML Anchors.
http://www.w3schools.com/HTML/html_links.asp
Append your data to the href attribute of your links and use JavaScript on the second page to parse the URL and display whatever you want.
http://java-programming.suite101.com/article.cfm/how_to_get_url_parts_in_javascript
It's not clean, but it should work.
Use document.location.search and split()
http://www.example.com/example.html?argument=value
var queryString = document.location.search.substring(1); // search is a property, not a function; drop the leading "?"
var parts = queryString.split('=');
document.write(parts[0]); // The argument name
document.write(parts[1]); // The value
Hope it helps
Well, this is pretty basic with JavaScript, but if you want more of this and more advanced stuff, you should really look into PHP, for instance. Using PHP it's easy to get variables from one page to another. Here's an example:
the url:
localhost/index.php?myvar=Hello World
You can then access myvar in index.php using this bit of code:
$myvar =$_GET['myvar'];
OK, thanks for all your replies. I'll take a look and see if I can find a way to use the scripts.
It's really annoying since I have to work around a CMS: in the CMS, all pages are created with a WYSIWYG editor, which tends to filter out unrecognized tags and scripts.
Edit: OK, it seems that the damn WYSIWYG editor only recognizes HTML tags... (as expected)
Using php
<?php
$passthis = "See you on the other side";
// Escape the value so quotes or tags in it cannot break the attribute
echo '<form action="whereyouwantittogo.php" target="_blank" method="post">'.
'<input type="text" name="passthis1" value="'.
htmlspecialchars($passthis, ENT_QUOTES) .'" /> '.
'<button type="submit" value="Submit">Submit</button>'.
'</form>';
?>
The script for the page you would like to pass the info to:
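<?php
// Fall back to an empty string if the field was not posted
$thispassed = isset($_POST['passthis1']) ? $_POST['passthis1'] : '';
$safe = htmlspecialchars($thispassed, ENT_QUOTES);
echo '<textarea>'. $safe .'</textarea>';
echo $safe;
?>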
Use these two snippets on separate pages, with the latter at whereyouwantittogo.php, and you should be in business.