Handlebars.net not preserving white space - html

I have the following template:
<!DOCTYPE html>
<html>
<body>
{{>Header}}
This is a test template.
{{>Footer}}
</body>
</html>
When I compile this template, I'd expect to get this:
<!DOCTYPE html>
<html>
<body>
This is a header.
This is a test template.
This is a footer.
</body>
</html>
Instead, what I get is this:
<!DOCTYPE html>
<html>
<body>
This is a header. This is a test template.
This is a footer.</body>
</html>
The indentation before the first and third lines is gone, and the newlines before the second line and the closing body tag are gone. Is this expected, and is there a way to preserve the whitespace just as it is laid out in the base template? I should note that I'm using Handlebars.Net here, although my understanding is that it's meant to emulate the original JavaScript spec as closely as possible.

(Answer from the GitHub issue where this was also posted):
So there are two different things going on here that I'll summarize first and then explain: 1) what you expect is incorrect; 2) what you are getting is also incorrect, in a different way:
1) Whitespace (and line breaks) are not significant in HTML, and though Handlebars is technically a general string templating language, the design decisions & opinions it contains are heavily slanted towards using it as an HTML templating language. You should not expect it to preserve implicit line breaks, only explicit line breaks (e.g. if you put \n it will preserve that).
2) Handlebars.Net actually DOES preserve some line breaks when it's not supposed to! That's a bug, but one many users are currently relying on, so we'll keep it in 1.x but fix it in v2.
To get your desired output, put explicit line breaks in your template. Cheers!

Related

Why does `<span/ >` produce an empty span?

In order to break long lines of text in the HTML source (because I prefer source code that approaches human readability) without introducing whitespace when
rendered, I have used source similar to
<! DOCTYPE HTML>
<html>
<body>
<p>
Span<span/
>in<span/
>the<span/
>place<span/
>where<span/
>you<span/
>live.
</p>
</body>
</html>
Which renders something like
Spanintheplacewhereyoulive.
However, I am not sure why this seems to work (using recent Chrome and Firefox, and a version of Konqueror). The standard seems not to cover this situation, unless I have missed it. A related post suggests to me that the above example is not valid, insofar as the <span/ > tags are concerned.
Not sure it matters, but I want to emphasize that there is whitespace between the / and > in <span/ >. This is lexically distinct from <span /> and <span/>, although I don't know if it's semantically different.
Why does <span/ > render, producing an empty span? Am I accessing some browser-specific behavior?

Can I safely replace "<ul>" tags within HTML using regexes?

I am trying to solve this issue, where users paste invalid HTML that we have to deal with, of the form <ol><ul><li>item</li></ul></ol>. We are currently parsing using lxml. In legal HTML, <ol> cannot have a (direct) child of a <ul> (it must be in an <li>) so lxml closes the ol tag too soon to try to "repair" the HTML, producing <div><ol/><ul><li>item</li></ul>.
The user-pasted text also might be invalid XML (e.g., bare <br> tag), so we can't just parse it as XML.
Thus, we can neither parse it as HTML nor XML, because it might be invalid.
To make this certain (common) case of invalid HTML into valid HTML, can we just replace all <ul> tags with <ol> tags using regexes?
If I use lxml to parse <ol><ol><li>item</li></ol></ol>, the output looks fine (does not close a tag too soon).
However, I don't want to break actual user-typed text, and I'm wondering if there are edge cases I haven't thought of (like "<ul>" within a <pre> tag or some other crazy thing that isn't actually a tag, though I've tested that particular case).
Yes, it would change unnumbered lists to numbered lists. I'm okay with that.
Yes, I have read this fun regex answer.
In general, there is no guarantee of a 'non-edge case' transform with HTML and regular expressions. HTML, more so than XML, has rules that make a direct text replacement of things that look like tags problematic.
The following text validates as HTML using w3c.org validation checker without any warnings.
<!DOCTYPE html>
<html lang="en">
<head>
<title><!--<ul>--></title>
<style lang="css">s {content: "<ul>";}</style>
<script>"<ul>"</script>
</head>
<body data-ul="<ul>"></body>
</html>
That aside, using some regular expression heuristics might solve the issue at hand - at least insofar as a reasonable scope. A streaming HTML token parser that does not attempt to apply any validation or DOM/tree building might also be useful for the initial replacement stage.
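As a rough illustration of the token-level idea, here is a minimal sketch that uses Python's standard html.parser as a stand-in for "a streaming HTML token parser". It builds no tree and does no repair; it re-emits each token and renames only real ul tags. The names UlToOl and ul_to_ol are made up for this example, and the output is not guaranteed to be byte-for-byte identical to the input in every corner case (for instance, an entity written without its semicolon gets one added).

```python
from html.parser import HTMLParser


class UlToOl(HTMLParser):
    """Re-emit the input token by token, renaming only real <ul> tags."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()  # original text of the start tag
        if tag == "ul":
            raw = "<ol" + raw[3:]  # swap the tag name, keep the attributes
        self.out.append(raw)

    def handle_startendtag(self, tag, attrs):
        # The raw text already contains the trailing "/>".
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append("</%s>" % ("ol" if tag == "ul" else tag))

    def handle_data(self, data):
        # Text, including the contents of <script> and <style>, is untouched.
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def ul_to_ol(markup):
    parser = UlToOl()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(ul_to_ol("<ol><ul><li>item</li></ul></ol>"))
```

Because the rename happens only in the start-tag and end-tag callbacks, a "<ul>" that appears inside a comment, a script string, or an attribute value passes through unchanged. The result could then be handed to lxml as before.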

Is it bad practice to put elements below the </body></html> tags?

Sure the following looks stupid:
<!--partialPage.html-->
<html>
<head>A css file linked here...</head>
<body>
First piece of content
</body>
</html>
blah after html
...and could easily be fixed by moving the blah after html into the body. Turns out the browser does that for you, and appends this to the end of the body. (Not a good reason to do this stupid thing)
I did however find a good reason why I would want to do this that would simplify partial layouts in template engines.
<!--page1.html-->
{{> partialPage}} <!--This is a partial/include-->
Content that would later be teleported into the body.
<!--page2.html-->
{{> partialPage}} <!--This is a partial/include-->
Another page using the layout
This is of course simpler than doing all of the following:
//sendPartials.js
{
partialOne: 'Content that would later be teleported into the body',
partialTwo: 'Another page using the layout'
}
<!--page1.html-->
<html>
<head>A css file linked here...</head>
<body>
First piece of content
{{> partialOne}}
</body>
</html>
<!--page2.html-->
<html>
<head>A css file linked here...</head>
<body>
First piece of content
{{> partialTwo}}
</body>
</html>
So the question comes down to... Which method should I be using?
Is it bad to create a partial that IS the layout file?
Just because a browser will let you get away with something doesn't mean you should take advantage of it. Some potential caveats:
This might not work in every browser.
This might work in every browser now, but not in six months.
This might work in every browser, but with subtle differences you might not catch.
This might make the browser turn on quirks mode or some similar mode. You really don't want your pages rendering in quirks mode, because it can cause lots of random and minor things to work subtly differently. You could be working on some completely unrelated feature six months down the line, and have an issue where the browser isn't doing what you'd expect, solely because it's running in quirks mode.
I'd say this is like putting a credit card through a paper shredder -- sure, it might get the job done, but it's really not a good practice to rely upon.
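For comparison, a layout can keep a single copy of the page skeleton and still place each page's content inside the body, so nothing depends on the browser relocating stray content. The following is only a sketch of that idea using Python's string.Template rather than the template engine from the question; LAYOUT, the $content slot, and render_page are names invented for this example.

```python
from string import Template

# One shared layout with a slot inside <body> for page-specific content.
LAYOUT = Template("""<html>
<head>A css file linked here...</head>
<body>
First piece of content
$content
</body>
</html>""")


def render_page(content):
    """Fill the layout's content slot and return the complete document."""
    return LAYOUT.substitute(content=content)


page1 = render_page("Content placed directly in the body.")
page2 = render_page("Another page using the layout")
print(page2)
```

Each page supplies only its own content, and the generated document ends at the closing html tag.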

MSHTML: &nbsp; appear on editing

I want to add another MSHTML question. Thanks to all responses.
In Delphi we use the standard TWebBrowser component, which uses mshtml.dll internally. Additionally, we use the registry to ensure that pages render with the newer rendering engine (Web-Browser-Control-Specifying-the-IE-Version, MSDN: FEATURE_BROWSER_EMULATION). So we use the rendering of IE 10, but we get the same results with IE 8 through IE 11.
Using the standard rendering engine of MSHTML (IE7) works correctly, but because of new rendering options we need the newer rendering of MSHTML.
We use the design mode of the control to enable the user to make changes in the documents:
var
mDocument: IHTMLDocument2;
begin
mDocument := ((ASender as TWebBrowser).Document as IHTMLDocument2);
mDocument.designMode := 'on';
Now we have the following problem:
We load the following (simplified) HTML via IPersistStreamInit.Load(...) into the WebBrowser:
<html>
<body>
What should I do
with some of the
spaces.
</body>
</html>
In the WebBrowser the user can see the following:
Now, when selecting the word "with" in the WebBrowser in editing mode and typing a character, some spaces appear. The HTML now has &nbsp; in it, exactly as many as there were spaces in the HTML before editing:
The code is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv="Content-Type" content="text/html; charset=unicode">
<META name="GENERATOR" content="MSHTML 10.00.9200.16540"></HEAD>
<BODY> What should I do n some of the spaces.
</BODY></HTML>
The same effect can be achieved by replacing the word "spaces" in the WebBrowser.
This is a bad behaviour for users using our application.
Any HTML code with white space in front of text shows the same behaviour. The mess is that MSHTML itself generates such HTML.
By now we think of a solution to remove all the spaces on the left of each line, but we think that such workarounds could end in a bigger mess, because they change the HTML. This could cause some different behaviour of the rendering.
Thinking about removing the spaces before each line puts you somewhere in the right direction, but nowhere near what you should be doing: convert the data into HTML before IPersistStreamInit.Load.
Since the HTML specification prescribes that any run of whitespace in the HTML code should be treated as a single instance of whitespace (except inside <pre> tags), it's understandable that IE's design mode is confused about what to do with these extra spaces when you edit around them. You've stumbled upon a border case.
I suggest you either don't use IPersistStreamInit.Load
but Navigate('about:<html><body></body></html>'); and document.body.innerText:=... instead,
or take care to properly format the initial HTML:
parse the text to collapse any/all consecutive whitespace,
replace all & with &amp;, < with &lt; etc...
(perhaps also #13#10 with '<br />' and #13#10#13#10 with '</p><p>'?)
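The formatting steps listed above could look roughly like the following. This is a sketch in Python rather than Delphi, meant only to show the order of operations; the function name text_to_html_body is hypothetical, and wrapping every block in a p element is an assumption taken from the last suggestion.

```python
import html
import re


def text_to_html_body(text):
    """Turn plain text into body markup with no stray whitespace."""
    # A blank line (two or more line breaks in a row) separates paragraphs.
    paragraphs = re.split(r"(?:\r?\n){2,}", text.strip())
    rendered = []
    for paragraph in paragraphs:
        lines = []
        for line in paragraph.splitlines():
            # Drop leading/trailing blanks and collapse runs of spaces/tabs.
            collapsed = re.sub(r"[ \t]+", " ", line.strip())
            # Escape &, < and > so the text cannot be read as markup.
            lines.append(html.escape(collapsed, quote=False))
        # A single line break inside a paragraph becomes <br />.
        rendered.append("<p>" + "<br />".join(lines) + "</p>")
    return "".join(rendered)


print(text_to_html_body("  What  should I do\r\n   with <some> & spaces."))
```

The escaping runs after the whitespace collapse, so the entities it produces are not altered by the earlier steps.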

Would a browser ever try to parse img>

Is it likely or possible for an img tag, or any other tag, to be parsed when the < is several characters prior, or perhaps omitted? Would this happen in any notable HTML parsers?
For example
<div>$test</div>.
Where $test could be any string containing a >, but not a <. Such as img>, but not <img
Full disclosure: This question is specifically to see whether or not the comment I posted was correct.
You don't technically need either < or >. Load this up in IE, and it'll run a javascript alert. Not sure if it's possible without messing with the charset though.
<HTML>
<HEAD>
<META charset="UTF-7">
</HEAD>
<BODY>
<DIV>+ADw-script+AD4-alert(+ACI-XSS+ACI-)+ADw-/script+AD4-</DIV>
</BODY>
</HTML>
Source: http://securityoverride.org/articles.php?article_id=13
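To see why that page contains a script even though no angle bracket appears in the DIV, the payload can be decoded with Python's built-in UTF-7 codec. This only shows what the bytes mean under that charset; it says nothing about which browsers still honour a UTF-7 declaration.

```python
# The DIV content from the example above, as plain ASCII bytes.
payload = b"+ADw-script+AD4-alert(+ACI-XSS+ACI-)+ADw-/script+AD4-"

# Interpreted as UTF-7, +ADw- is "<", +AD4- is ">" and +ACI- is a double quote.
decoded = payload.decode("utf-7")
print(decoded)  # <script>alert("XSS")</script>
```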
Well, out of curiosity, I changed one of my test pages so its script section began with this:
< script>
The result was completely broken and just printed all of my javascript. This happened in IE9, GC28, and Firefox. I didn't really have an image on-hand to test with, but I think we can derive from this that HTML tags are always required to have no white-space between the angle bracket and tag declaration.
If you'd like even further confirmation, I suggest you browse the W3C standardization documents to see if you can find where they declare the generic pattern for HTML element tags. Many HTML parsers probably base themselves off those documents to ease their coding.
White space is allowed after the tag name, but not before it:
< script> is invalid
while
<script > is valid
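A quick way to experiment with these cases is to feed them to a tokenizer and list which start tags it reports. The sketch below uses Python's standard html.parser, which is not a browser and may differ from one in details, so treat it as an illustration only; TagLogger and tokenize are names invented for this example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record the start tags and the text that the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def tokenize(markup):
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.start_tags, "".join(parser.text)


# "img>" with no "<" directly before the name stays plain text.
print(tokenize("<div>img></div>"))
# A space between "<" and the name means no script tag is opened.
print(tokenize("< script>x</script>")[0])
# A space after the name is still a script tag.
print(tokenize("<script >x</script>")[0])
```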