MSHTML: &nbsp; appears on editing - html

I want to add another MSHTML question. Thanks for all responses.
In Delphi we use the standard TWebBrowser component, which uses mshtml.dll internally. Additionally, we use the registry to ensure that pages render with the newer rendering engine (Web-Browser-Control-Specifying-the-IE-Version, MSDN: FEATURE_BROWSER_EMULATION). We use the IE 10 rendering, but we get the same results with IE 8 through IE 11.
Using the default rendering engine of MSHTML (IE7) works correctly, but because of the new rendering options we need the newer rendering of MSHTML.
We use the design mode of the control to enable the user to make changes in the documents:
var
mDocument: IHTMLDocument2;
begin
mDocument := ((ASender as TWebBrowser).Document as IHTMLDocument2);
mDocument.designMode := 'on';
Now we have the following problem:
We load the following (simplified) HTML into the WebBrowser via IPersistStreamInit.Load(...):
<html>
<body>
What should I do
with some of the
spaces.
</body>
</html>
In the WebBrowser the user can see the following:
Now, when selecting the word "with" in the WebBrowser in editing mode and typing a character, some spaces appear. The HTML now has &nbsp; entities in it, exactly as many as there were spaces in the HTML before editing:
The code is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv="Content-Type" content="text/html; charset=unicode">
<META name="GENERATOR" content="MSHTML 10.00.9200.16540"></HEAD>
<BODY> What should I do n some of the spaces.
</BODY></HTML>
The same effect can be achieved by replacing the word "spaces" in the WebBrowser.
This is bad behaviour for the users of our application.
Every piece of HTML with white space in front of text shows the same behaviour. The problem is that MSHTML itself generates such HTML.
We are currently considering removing all the spaces at the left of each line, but we think such workarounds could end in a bigger mess because they change the HTML, which could cause the rendering to behave differently.

Thinking about removing the spaces before each line puts you somewhere in the right direction, but nowhere near what you should be doing: convert the data into HTML before IPersistStreamInit.Load.
Since the HTML specification prescribes that any run of whitespace in the HTML code should be treated as a single whitespace (except inside <pre> tags), it's understandable that IE's design mode is confused about what to do with these extra spaces when you edit around them. You've stumbled upon an edge case.
I suggest you either don't use IPersistStreamInit.Load
but Navigate('about:<html><body></body></html>'); and document.body.innerText:=... instead,
or take care to properly format the initial HTML:
parse the text to collapse any/all consecutive whitespace,
replace all & with &amp;, < with &lt; etc...
(perhaps also #13#10 with '<br />' and #13#10#13#10 with '</p><p>'?)
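The formatting steps above can be sketched as follows. This is a minimal illustration in Python rather than Delphi, and the function name text_to_html is hypothetical; it assumes that a blank line separates paragraphs and that a single line break should become <br />, as the last suggestion hints.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Turn plain text into a small HTML document without stray whitespace."""
    # Treat CRLF (#13#10), bare CR and bare LF alike.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = []
    # Assumption: one or more blank lines separate paragraphs.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = []
        for line in block.split("\n"):
            # Collapse runs of spaces/tabs and drop leading/trailing ones.
            collapsed = re.sub(r"[ \t]+", " ", line).strip()
            # Escape &, < and > so the text cannot be read as markup.
            lines.append(html.escape(collapsed, quote=False))
        paragraphs.append("<p>" + "<br />".join(lines) + "</p>")
    return "<html><body>" + "".join(paragraphs) + "</body></html>"


if __name__ == "__main__":
    print(text_to_html("  What should I do\r\n    with some of the\r\n  spaces."))
```

The same order of operations (normalize line endings, collapse whitespace, escape, then add markup) should carry over to a Delphi implementation; escaping has to happen before the tags are added, otherwise the generated <p> and <br /> would be escaped too.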

Related

Handlebars.net not preserving white space

I have the following template:
<!DOCTYPE html>
<html>
<body>
{{>Header}}
This is a test template.
{{>Footer}}
</body>
</html>
When I compile this template, I'd expect to get this:
<!DOCTYPE html>
<html>
<body>
This is a header.
This is a test template.
This is a footer.
</body>
</html>
Instead, what I get is this:
<!DOCTYPE html>
<html>
<body>
This is a header. This is a test template.
This is a footer.</body>
</html>
The indentation before the first and third lines is gone, and the newlines before the second line and the closing body tag are gone. Is this expected, and is there a way to preserve the whitespace just as it is laid out in the base template? I should note that I'm using Handlebars.Net here, although my understanding is that it's meant to emulate the original JavaScript spec as closely as possible.
(Answer from the Github issue where this was also posted):
So there are two different things going on here that I'll summarize first and then explain: 1) what you expect is incorrect; 2) what you are getting is also incorrect, in a different way:
1) Whitespace (and line breaks) are not significant in HTML, and though Handlebars is technically a general string templating language, the design decisions and opinions it contains are heavily slanted towards using it as an HTML templating language. You should not expect it to preserve implicit line breaks, only explicit line breaks (e.g. if you put \n it will preserve that).
2) Handlebars.Net actually DOES preserve some line breaks when it's not supposed to! That's a bug, but one many users are currently relying on, so we'll keep it in 1.x but fix it in v2.
To get your desired output, put explicit line breaks in your template. Cheers!

How do I show actual HTML Code in textarea rather than rendered HTML?

I have code that saves (HTML code) plus (some text) from a textarea into MySQL.
I then take the text from MySQL and display it under the textarea. The thing is, if I save the code
<div style="color:red">Hello</div>
in mysql and then display it, I see Hello in red, but I want to see the actual
<div style="color:red">Hello</div>
to appear under the textarea. I hope you understand my problem.
So when you've grabbed the data from the database, you want to display the HTML source itself rather than have the page render it?
If so, just use the PHP function htmlentities().
You can use the xmp element; see "What was the <xmp> tag used for?". It has been in HTML since the beginning and is supported by all browsers. Specifications frown upon it, but HTML5 CR still describes it and requires browsers to support it (though it also tells authors not to use it, but it cannot really prevent you).
Everything inside xmp is taken as such; no markup (tags or character references) is recognized there, except, for apparent reason, the end tag of the element itself, </xmp>.
Otherwise xmp is rendered like pre.
When using “real XHTML”, i.e. XHTML served with an XML media type (which is rare), the special parsing rules do not apply, so xmp is treated like pre. But in “real XHTML”, you can use a CDATA section, which implies similar parsing rules. It has no special formatting, so you would probably want to wrap it inside a pre element:
<pre><![CDATA[
This is a demo, tags will
appear literally.
<div style="color:red">Hello</div>
]]></pre>
You can refer to this answer: https://stackoverflow.com/a/16785992/3000179
If you want to do it at the browser level, you can follow these steps:
Replace the & character with &amp;
Replace the < character with &lt;
Replace the > character with &gt;
Optionally surround your HTML sample with <pre> and/or <code> tags.
Hope this helps.

Would a browser ever try to parse img>

Is it likely or possible for an img tag, or any other tag, to be parsed when the < character is several characters earlier, or perhaps omitted? Would this happen in any notable HTML parsers?
For example
<div>$test</div>.
Where $test could be any string containing a >, but not a <. Such as img>, but not <img
Full disclosure: This question is specifically to see whether or not the comment I posted was correct.
You don't technically need either < or >. Load this up in IE, and it'll run a javascript alert. Not sure if it's possible without messing with the charset though.
<HTML>
<HEAD>
<META charset="UTF-7">
</HEAD>
<BODY>
<DIV>+ADw-script+AD4-alert(+ACI-XSS+ACI-)+ADw-/script+AD4-</DIV>
</BODY>
</HTML>
Source: http://securityoverride.org/articles.php?article_id=13
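To see why that page needs no literal angle brackets, the payload can be decoded by hand. The sketch below uses Python's built-in utf-7 codec; the helper name decode_payload is only for illustration. Read as ASCII, the bytes contain no < or > at all; read as UTF-7, they turn into a script element.

```python
# The bytes from the <DIV> in the example page.
PAYLOAD = b"+ADw-script+AD4-alert(+ACI-XSS+ACI-)+ADw-/script+AD4-"


def decode_payload(payload: bytes, charset: str) -> str:
    """Decode the same bytes under a given charset (illustrative helper)."""
    return payload.decode(charset)


if __name__ == "__main__":
    # As ASCII the text is harmless-looking and has no angle brackets.
    print(decode_payload(PAYLOAD, "ascii"))
    # As UTF-7, +ADw- becomes <, +AD4- becomes > and +ACI- becomes ".
    print(decode_payload(PAYLOAD, "utf-7"))  # <script>alert("XSS")</script>
```

This only shows the decoding step; whether a given browser honours a UTF-7 charset declaration is a separate matter.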
Well, out of curiosity, I changed one of my test pages so its script section began with this:
< script>
The result was completely broken and just printed all of my javascript. This happened in IE9, GC28, and Firefox. I didn't really have an image on-hand to test with, but I think we can derive from this that HTML tags are always required to have no white-space between the angle bracket and tag declaration.
If you'd like even further confirmation, I suggest you browse the W3C standardization documents to see if you can find where they declare the generic pattern for HTML element tags. Many HTML parsers probably base themselves off those documents to ease their coding.
White space is allowed after the tag name, but not between the < and the tag name:
< script> is invalid
while
<script> is valid

CDATA not rendering tags

I'm trying to display XML tags mixed in with plain text on a web page. I do this from a Python script that obtains its data from a database. I've simplified my problem to the program below.
#!/usr/local/bin/python
print """Content-type: text/html;charset=utf-8\n\n"""
print """<html><body>
start:<![CDATA[This is the <xml> tag </xml>.]]>:end
</body></html>"""
I'm expecting it to display the following:
start:This is the <xml> tag </xml>.:end
In both IE8 and Chrome15 it however displays the following:
start: tag .]]>:end
When I look at the HTML source of the page in IE, I can see the following:
<html><body>
start:<![CDATA[This is the <xml> tagxml.]]>:end
</body></html>
In Chrome I see the same when looking at the source, but it seems that the <![CDATA[This is the <xml> part is in green because it is considered a comment.
I particularly want to keep the text (instead of converting the < to &lt;) because I access the elements via JavaScript, allowing people to edit them in a separate textarea. Converting them would then save them converted, resulting in problems further down in processing. I could convert them back before saving, but this seems like the wrong approach.
Any idea what I'm doing wrong?
Thanks in advance,
Grant
CDATA is part of XML, not HTML, so the browser ignores it, and then treats any tags in it as it would any other tags - ignoring ones it doesn't recognise, and paying attention to those it does.
I think there's no alternative but to use &lt; etc., convert to tags when editing, and convert back when saving.
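The convert-and-convert-back approach can be sketched with Python's standard html module, since the question's script is Python. The function names to_display and to_storage are made up for this example; the point is only that escaping for the page and unescaping before saving are exact inverses, so the stored text keeps its raw tags.

```python
import html


def to_display(raw: str) -> str:
    """Escape &, < and > so the browser shows the tags instead of parsing them."""
    return html.escape(raw, quote=False)


def to_storage(displayed: str) -> str:
    """Turn character references back into the raw characters before saving."""
    return html.unescape(displayed)


if __name__ == "__main__":
    raw = "This is the <xml> tag </xml>."
    page_fragment = "start:" + to_display(raw) + ":end"
    print(page_fragment)
    # The round trip returns the text exactly as it came from the database.
    print(to_storage(to_display(raw)) == raw)
```

Note that a textarea's value, when read from JavaScript, is already unescaped text, so the conversion back may only be needed if the escaped form is what gets submitted.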
<!DOCTYPE html>
<html>
<body>
<div id="div1">&lt;b&gt;hi&lt;/b&gt;</div>
<textarea id="area"></textarea>
<script type="text/javascript">
var div1 = document.getElementById('div1')
var area = document.getElementById('area')
var text = div1.firstChild.nodeValue
area.value = text
</script>
</body>
</html>
Where is the problem?

double hyphen in script makes firefox render strangely

<!-- <script type="text/javascript">/*<![CDATA[*/ c-- ;//]]></script> -->
When I have the above line in the <head> section of a plain HTML page, Firefox 3.5.5 renders the trailing --> as text. If I change c-- to c- it doesn't. Any ideas what's going on here? I'm getting an artifact on my pages with this due to a very large script that's been crunched. I can change the statement to c-=1 and avoid the problem for now but.... I'd like to know what bit/byte is biting my a$$.
This is due to Firefox implementing SGML (on which HTML was based) comments strictly. This will only occur when the document is loaded in standards mode (i.e. there is a DOCTYPE).
The first <! starts a comment. The first -- enters a section in which > characters are allowed. The second -- (in your script) leaves the section in which > characters are allowed. The > at the end of </script> then ends the comment. The following --> is therefore no longer part of the comment and gets rendered as text.
See http://www.howtocreate.co.uk/SGMLComments.html for a comprehensive guide to the issue.
It's also worth noting that the HTML 4 Specification says that 'authors should avoid putting two or more adjacent hyphens inside comments' and the HTML 5 Specification says comments must not 'contain two consecutive U+002D HYPHEN-MINUS characters (--)'.
The solution, as you've found, is to not include -- in the middle of a comment.
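For a large crunched script it may help to search for the offending comments automatically. The following is a rough sketch, not a real HTML parser: it assumes comments can be found with a simple regular expression from <!-- to the next -->, and the function name comments_with_double_hyphen is hypothetical.

```python
import re
from typing import List

# Assumption: a comment runs from "<!--" to the first following "-->".
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def comments_with_double_hyphen(markup: str) -> List[str]:
    """Return the bodies of comments that contain "--" somewhere inside."""
    return [
        match.group(1)
        for match in COMMENT_RE.finditer(markup)
        if "--" in match.group(1)
    ]


if __name__ == "__main__":
    bad = '<!-- <script type="text/javascript">/*<![CDATA[*/ c-- ;//]]></script> -->'
    good = bad.replace("c-- ;", "c-=1 ;")
    print(len(comments_with_double_hyphen(bad)))   # the c-- comment is reported
    print(len(comments_with_double_hyphen(good)))  # nothing is reported
```

Any comment it reports is a candidate for the c-=1 style rewrite mentioned in the question.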
Technically you are not allowed to have a double hyphen in a comment in HTML (or XML). So even if browsers "allow" it, it is not valid and should fail an HTML validator.
See Comment section of HTML 4 Specification
I can't replicate this. Doesn't show up on 3.0.1.