PHP: Inject iframe right after body tag - html

I would like to place an iframe right after the opening body tag. This is tricky because the body tag can have various attributes and odd whitespace. My guess is that this will require regular expressions to do correctly.
EDIT: This solution has to work with PHP 4, and performance is a concern of mine. It's for this: http://drupal.org/node/586210#comment-2567398

You can use DOMDocument and friends. Assuming you have a variable $html containing the existing HTML document as a string, the basic code is:
$doc = new DOMDocument();
$doc->loadHTML($html);
$body = $doc->getElementsByTagName('body')->item(0);
$iframe = $doc->createElement('iframe');
$body->insertBefore($iframe, $body->firstChild);
To retrieve the modified HTML text, use
$html = $doc->saveHTML();
EDIT: For PHP4, you can try DOM XML.

Both PHP 4 and PHP 5 should be happy with preg_split():
/* split the string contained in $html in three parts:
* everything before the <body> tag
* the body tag with any attributes in it
* everything following the body tag
*/
$matches = preg_split('/(<body.*?>)/i', $html, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
/* assemble the HTML output back with the iframe code in it */
$injectedHTML = $matches[0] . $matches[1] . $iframeCode . $matches[2];
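One caveat with the pattern above: without the s modifier, the dot does not match newlines, so a body tag whose attributes span several lines is not found and $matches[1] is never set. A minimal sketch of an alternative, assuming PHP 4.3 or later for PREG_OFFSET_CAPTURE; the function name and the sample markup are made up for illustration:

```php
<?php
// Hypothetical helper, not part of the original answer.
function inject_after_body_tag($html, $iframeCode)
{
    // [^>]* matches newlines too, so a multi-line <body ...> tag is found;
    // \b keeps the pattern from matching a longer tag name.
    if (!preg_match('/<body\b[^>]*>/i', $html, $m, PREG_OFFSET_CAPTURE)) {
        return $html; // no <body> tag: return the input unchanged
    }
    // $m[0][0] is the matched tag, $m[0][1] its byte offset in $html.
    $insertAt = $m[0][1] + strlen($m[0][0]);
    // A length of 0 makes substr_replace() insert instead of overwrite.
    return substr_replace($html, $iframeCode, $insertAt, 0);
}

$page = "<html><BODY class=\"home\"\n  id=\"top\"><p>Hello</p></body></html>";
echo inject_after_body_tag($page, '<iframe src="about:blank"></iframe>');
```

Because the iframe markup is spliced in with substr_replace() rather than passed as a regex replacement string, any $ or backslash it contains is inserted literally.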

Using regular expressions brings up performance concerns... This is what I'm going for
<?php
$html = file_get_contents('http://www.yahoo.com/');
$start = stripos($html, '<body');
$end = stripos($html, '>', $start);
$body = substr_replace($html, '<IFRAME INSERT>', $end+1, 0);
echo htmlentities($body);
?>
Thoughts?
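Two things to watch in this approach: stripos() only exists from PHP 5 on, and both lookups return false when nothing is found, in which case substr_replace() would insert at the wrong place. A rough sketch that stays within PHP 4 string functions, with a made-up function name; it assumes that no attribute value in the body tag contains a literal > character:

```php
<?php
// Hypothetical helper, shown only to illustrate the string-function approach.
function inject_after_body($html, $snippet)
{
    // strtolower() + strpos() stands in for stripos(), which needs PHP 5.
    // strtolower() works byte by byte, so offsets still apply to $html.
    $start = strpos(strtolower($html), '<body');
    if ($start === false) {
        return $html; // no <body> tag: leave the document untouched
    }
    // Find the end of the opening tag, searching from the tag's start.
    $end = strpos($html, '>', $start);
    if ($end === false) {
        return $html; // unterminated tag: leave the document untouched
    }
    return substr_replace($html, $snippet, $end + 1, 0);
}

echo inject_after_body('<html><Body onload="init()">hi</body></html>', '<iframe></iframe>');
```

The strtolower() call copies the whole document once, which is the price of case-insensitive matching without stripos().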

Related

How to keep metadata fields on one line

I am trying to customize the metadata for my posts. Specifically, I want to change the separator from the default forward slash (/) to a vertical line (|). I also want to add the word "Updated" before the date displayed. And, I want to keep the option to display reading time. (*FYI: I know basically nothing about coding, just trying to override the metadata output from Astra)
I used this code to change the separator (found in a post about how to customize post meta in Astra theme):
add_filter('astra_single_post_meta', 'custom_post_meta');
function custom_post_meta($old_meta)
{
$post_meta = astra_get_option('blog-single-meta');
if (!$post_meta) return $old_meta;
$new_output = astra_get_post_meta($post_meta, "|");
if (!$new_output) return $old_meta;
return "<div class='entry-meta'>$new_output</div>";
}
See the output in image 1 - looks great!
[Image 1: output of new code for separator]
This is close, but I still needed to add "Updated" before the date. So, I used this code (from the same post mentioned above):
function astra_post_date()
{
$format = apply_filters('astra_post_date_format', '');
$published = esc_html(get_the_date($format));
$modified = esc_html(get_the_modified_date($format));
$output = '<p class="posted-on">';
$output .= 'Updated: <span class="published" ';
$output .= 'itemprop="datePublished">' . $published;
$output .= '</span>';
$output .= '</p>';
return apply_filters('astra_post_date', $output);
}
See the output in image 2: the content is perfect, but the formatting is wrong. I can't figure out how to edit the code to get all 3 fields on the same line.
[Image 2: output of new code for adding "Updated"]
What do I need to change in the 2nd block of code to keep everything on one line like the 1st block of code?
Thanks for any help!

as_html in HTML::TagParser

I'm working in Perl.
I would like to ask if HTML::TagParser has something like
$value->as_html()
from HTML::TreeBuilder.
I extracted the tag I needed with HTML::TagParser, but now the only option is:
$value->innerText();
which gives me only the text, without the HTML tags.
Or can I somehow combine the result from HTML::TagParser with HTML::TreeBuilder and get my HTML tags that way?
The HTML::TagParser does not only read the element content. It also keeps the element name and the attribute key/value pairs for each selected element. Therefore you can easily reproduce the complete HTML code of the element.
Actually, the HTML::TagParser CPAN page contains an example for this: The following code extracts all <a>nchor tags from a web page and reproduces them into an HTML fragment listing precisely these tags.
use HTML::TagParser;
my $url = 'http://www.kawa.net/xp/index-e.html';
my $html = HTML::TagParser->new( $url );
my @list = $html->getElementsByTagName( "a" );
foreach my $elem ( @list ) {
my $tagname = $elem->tagName;
my $attr = $elem->attributes;
my $text = $elem->innerText;
print "<$tagname";
foreach my $key ( sort keys %$attr ) {
print " $key=\"$attr->{$key}\"";
}
if ( $text eq "" ) {
print " />\n";
} else {
print ">$text</$tagname>\n";
}
}
This works pretty well for simple element scanning. For more complex tasks (e.g. mixed inner HTML content) I would prefer to work with HTML::Parser.

How to stop tags within a div from affecting elements outside a div?

So, on my website, I have user-generated content. I have a WYSIWYG editor, but it has a view-source mode. I have a few approved tags.
But it occurred to me: what if a user opens a tag without closing it? Then it affects the rest of the page.
How can I get around this?
I actually wanted to tell you to look at how WordPress does this, until I tested it and found out that WordPress does not care: I can easily break my page by opening divs and not closing them.
Anyway, I found this:
/**
 * close all open xhtml tags at the end of the string
 *
 * @param string $html
 * @return string
 * @author Milian <mail@mili.de>
 */
function closetags($html) {
    #put all opened tags into an array
    preg_match_all('#<([a-z]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
    $openedtags = $result[1];
    #put all closed tags into an array
    preg_match_all('#</([a-z]+)>#iU', $html, $result);
    $closedtags = $result[1];
    $len_opened = count($openedtags);
    # all tags are closed
    if (count($closedtags) == $len_opened) {
        return $html;
    }
    $openedtags = array_reverse($openedtags);
    # close tags
    for ($i=0; $i < $len_opened; $i++) {
        if (!in_array($openedtags[$i], $closedtags)){
            $html .= '</'.$openedtags[$i].'>';
        } else {
            unset($closedtags[array_search($openedtags[$i], $closedtags)]);
        }
    }
    return $html;
}
this should be what you are looking for.
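Note that the function above counts tags rather than tracking nesting, so it also appends a closing tag for void elements such as <br>. A small stack-based sketch of the same idea follows; the function name and the list of void elements are my own choices, and since it is still regex-based, comments, scripts, and unusual attributes can fool it:

```php
<?php
// Hypothetical helper; a sketch, not a replacement for a real HTML parser.
function close_tags_in_order($html)
{
    // Void elements never take a closing tag.
    $void = array('area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                  'input', 'link', 'meta', 'source', 'track', 'wbr');
    $stack = array();

    // Walk over opening and closing tags in document order.
    preg_match_all('#<(/?)([a-z][a-z0-9]*)\b[^>]*?(/?)>#i', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if (in_array($name, $void) || $tag[3] === '/') {
            continue; // void or self-closed: nothing to balance
        }
        if ($tag[1] === '') {
            $stack[] = $name; // opening tag
            continue;
        }
        // Closing tag: drop the nearest matching opener and anything above it.
        for ($i = count($stack) - 1; $i >= 0; $i--) {
            if ($stack[$i] === $name) {
                $stack = array_slice($stack, 0, $i);
                break;
            }
        }
    }
    // Close whatever is still open, innermost first.
    while (count($stack) > 0) {
        $html .= '</' . array_pop($stack) . '>';
    }
    return $html;
}

echo close_tags_in_order('<p>one<br>two <i>three');
// prints: <p>one<br>two <i>three</i></p>
```

Closing the innermost tag first keeps the appended closers properly nested, which a plain count of opened versus closed tags cannot guarantee.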
This is a very complex topic in general and there's no easy shortcut. Before you start reinventing the wheel, use a library that has been designed to do exactly that. Supposedly HTML Purifier is one of the few, if not the only, libraries that gets it right.

how to find all <p> tags under heading

I have to extract data from this link: http://bit.ly/l1rF5x
What I want to do is extract all <p> tags that come under the <a> tag with the attribute rel="bookmark". My only requirement is that only the <p> tags under this heading are parsed; the rest should be left as is. For example, on the page I linked, all <p> tags under the heading "IIFT question paper 2006" should be parsed.
Help please.
You can try the following:
$(function(){
var results= '';
$('a[rel="bookmark"] p').each(function(i,e){
results += $(e).html() + "\n";
});
alert(results);
});
Variable results will be alerted with the required content.
Example : http://jsfiddle.net/eGmWw/1/
Since you haven't provided any information about the language / environment you want to use to extract this information, I've gone ahead and hacked something together with jQuery.
(Updated) You can see it in action here: JS Fiddle.
If you want to use PHP, I recommend simplehtmldom.
Here is an example using simplehtmldom:
$url = 'http://school-listing.mba4india.com/page/7/';
$html = file_get_html($url);
$data = array();
// Find all anchors with the desired rel attribute
foreach ($html->find('a[rel="bookmark"]') as $a) {
$h4 = $a->parent(); // Get the anchor's parent (in this case an h4)
$p = $h4->next_sibling();
$content = '';
// Collect the consecutive p tags that follow the heading, stopping when we
// run out of siblings or reach one that isn't a p tag (this also covers the
// case where the first sibling is not a p tag)
while ($p && $p->tag == 'p') {
$content .= (string) $p;
$p = $p->next_sibling();
}
$data[] = array('h4' => $h4, 'content' => $content);
}
$br = '<br/>';
foreach ($data as $datum) {
echo $datum['h4'] . $br . $datum['content'];
echo $br.$br;
}
Refer to Simplehtmldom Documentation for more!

Ignoring unclosed tags from another <div>?

I have a website where members can input text using a limited subset of HTML. When a page is displayed that contains a user's text, if they have any unclosed tags, the formatting "bleeds" across into the next area. For example, if the user entered:
Hi, my name is <b>John
Then, the rest of the page will be bold.
Ideally, there'd be something I could do that would be this simple:
<div contained>Hi, my name is <b>John</div>
And no tags could bleed out of that div. Assuming there isn't anything this simple, how would I accomplish a similar effect? Or, is there something this easy?
Importantly, I do not want to validate the user's input and return an error if they have unclosed tags, since I want to provide the "easiest" user interface possible for my users.
Thanks!
I have a solution for PHP:
<?php
// close opened html tags
function closetags ( $html )
{
#put all opened tags into an array
preg_match_all ( "#<([a-z]+)( .*)?(?!/)>#iU", $html, $result );
$openedtags = $result[1];
#put all closed tags into an array
preg_match_all ( "#</([a-z]+)>#iU", $html, $result );
$closedtags = $result[1];
$len_opened = count ( $openedtags );
# all tags are closed
if( count ( $closedtags ) == $len_opened )
{
return $html;
}
$openedtags = array_reverse ( $openedtags );
# close tags
for( $i = 0; $i < $len_opened; $i++ )
{
if ( !in_array ( $openedtags[$i], $closedtags ) )
{
$html .= "</" . $openedtags[$i] . ">";
}
else
{
unset ( $closedtags[array_search ( $openedtags[$i], $closedtags)] );
}
}
return $html;
}
// close opened html tags
?>
You can use this function like this:
<?php echo closetags("your content <p>test test"); ?>
You can put the HTML snippet through Tidy, which will do its best to fix it. Many languages include it in some fashion; PHP, for example, has a Tidy extension.
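A minimal sketch of the Tidy route in PHP, assuming the tidy extension is installed (it is not enabled in every build); the wrapper name and the chosen options are illustrative only:

```php
<?php
// Hypothetical wrapper around PHP's tidy extension.
function repair_fragment($fragment)
{
    if (!function_exists('tidy_repair_string')) {
        return false; // extension missing: the caller has to pick a fallback
    }
    $config = array(
        'show-body-only' => true, // return the fragment, not a full document
        'wrap'           => 0,    // do not re-wrap long lines
    );
    return tidy_repair_string($fragment, $config, 'utf8');
}

$fixed = repair_fragment('Hi, my name is <b>John');
if ($fixed !== false) {
    echo $fixed;
}
```

Running the repaired string through your tag whitelist afterwards is still advisable, since Tidy only fixes the structure and does not remove unwanted tags.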
This can't be done.
Don't let users invalidate your HTML.
If you don't want to let users fix their errors, then try to clean it up automatically for them.
You can parse the data entered by the user, as an XML parser does. You may need to replace the standard HTML or XML symbols like '<', '>', '/', '&', etc. with entities such as '&lt;', '&gt;', etc.
In this way you can achieve whatever you want.
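For the escaping route in PHP, htmlspecialchars() does the replacement in one call. Note that this disables all markup, including any approved tags, so it only fits if users should not be able to format their text at all. A minimal sketch with a made-up sample string:

```php
<?php
// Sample input with an unclosed tag (illustrative only).
$input = 'Hi, my name is <b>John';

// Convert &, <, >, and both quote characters into HTML entities.
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo '<div>' . $safe . '</div>';
// prints: <div>Hi, my name is &lt;b&gt;John</div>
```

The browser then shows the literal text <b> instead of interpreting it, so nothing can bleed out of the div.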
There is a way to do this using HTML and javascript. I wouldn't recommend this method for public-facing websites; you should clean your data before it reaches the browser. But it might be useful in other situations.
The idea is to put the potentially invalid content into a noscript tag, like this:
<noscript class="contained">
<div>Hi, my name is <b>John</div>
</noscript>
... and then add javascript that will load it into the DOM. Using jQuery (but probably not necessary):
$("noscript.contained").each(function () {
$(this).replaceWith(this.innerText);
});
Note that users without javascript will still experience the "bleeding" that you are trying to avoid.