I'm looking to prevent a line break after a hyphen - on a case-by-case basis that is compatible with all browsers.
Example:
I have this text: 3-3/8" which in HTML is this: 3-3/8”
The problem is that near the end of a line, because of the hyphen, it breaks and wraps to the next line instead of treating it like a full word...
3-
3/8"
I've tried inserting the "zero width no break character", with no luck...
3-3/8”
I'm seeing this in Safari and thinking it will be the same in all browsers.
The following is my doctype and character encoding...
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
Is there any way I can prevent these from line-breaking after the hyphen? I do not need any solution that applies to the whole page... just something I can insert as needed, like a "zero width no break character", except one that works.
Here is a Demo. Simply make the frame narrower until the line breaks at the hyphen.
http://jsfiddle.net/RagKH/
Try using the non-breaking hyphen ‑. I've replaced the dash with that character in your jsfiddle, shrunk the frame down as small as it can go, and the line doesn't split there any more.
You could also wrap the relevant text with
<span style="white-space: nowrap;"></span>
IE8/9 render the non-breaking hyphen mentioned in CanSpice's answer longer than a typical hyphen. It is the length of an en-dash instead of a typical hyphen. This display difference was a deal breaker for me.
As I could not use the CSS answer specified by Deb I instead opted to use no break tags.
<nobr>e-mail</nobr>
In addition I found a specific scenario that caused IE8/9 to break on a hyphen.
A string contains words separated by non-breaking spaces -
Width is limited
Contains a dash
IE renders it like this.
The following code reproduces the problem pictured above. I had to use a meta tag to force rendering to IE9 as IE10 has fixed the issue. No fiddle because it does not support meta tags.
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=9" />
<meta charset="utf-8"/>
<style>
body { padding: 20px; }
div { width: 300px; border: 1px solid gray; }
</style>
</head>
<body>
<div>
<p>If there is a - and words are separated by the whitespace code then IE will wrap on the dash.</p>
</div>
</body>
</html>
You can also do it "the joiner way" by inserting "U+2060 Word Joiner".
If Accept-Charset permits, the unicode character itself can be inserted directly into the HTML output.
Otherwise, it can be done using entity encoding. E.g. to join the text red-brown, use:
red-brown
or (decimal equivalent):
red-brown
. Another usable character is "U+FEFF Zero Width No-break Space"[ 1 ]:
red-brown
and (decimal equivalent):
red-brown
[1]: Note that while this method still works in major browsers like Chrome, it has been deprecated since Unicode 3.2.
Comparison of "the joiner way" with "U+2011 Non-breaking Hyphen":
The word joiner can be used for all other characters, not just hyphens.
When using the word joiner, most renderers will rasterize the text identically. On Chrome, FireFox, IE, and Opera, the rendering of normal hyphens, eg:
a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z
is identical to the rendering of normal hyphens (with U+2060 Word Joiner), eg:
a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z
while the above two renders differ from the rendering of "Non-breaking Hyphen", eg:
a‑b‑c‑d‑e‑f‑g‑h‑i‑j‑k‑l‑m‑n‑o‑p‑q‑r‑s‑t‑u‑v‑w‑x‑y‑z
. (The extent of the difference is browser-dependent and font-dependent. E.g. when using a font declaration of "arial", Firefox and IE11 show relatively huge variations, while Chrome and Opera show smaller variations.)
Comparison of "the joiner way" with <span class=c1></span> (CSS .c1 {white-space:nowrap;}) and <nobr></nobr>:
The word joiner can be used for situations where usage of HTML tags is restricted, e.g. forms of websites and forums.
On the spectrum of presentation and content, majority will consider the word joiner to be closer to content, when compared to tags.
• As tested on Windows 8.1 Core 64-bit using:
• IE 11.0.9600.18205
• Firefox 43.0.4
• Chrome 48.0.2564.109 (Official Build) m (32-bit)
• Opera 35.0.2066.92
Late to the party, but I think this is actually the most elegant. Use the WORD JOINER Unicode character on either side of your hyphen, or em dash, or any character.
So, like so:
—
This will join the symbol on both ends to its neighbors (without adding a space) and prevent line breaking.
Following on from #den’s useful JSX solution, this worked for me:
<h2>{props.name.replace('-', '-\u2060')}</h2>
JSX solution example using word joiner unicode character:
<div>{`This is JSX-${'\u2060'}related example`}</div>
Related
I'm looking to prevent a line break after a hyphen - on a case-by-case basis that is compatible with all browsers.
Example:
I have this text: 3-3/8" which in HTML is this: 3-3/8”
The problem is that near the end of a line, because of the hyphen, it breaks and wraps to the next line instead of treating it like a full word...
3-
3/8"
I've tried inserting the "zero width no break character", with no luck...
3-3/8”
I'm seeing this in Safari and thinking it will be the same in all browsers.
The following is my doctype and character encoding...
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
Is there any way I can prevent these from line-breaking after the hyphen? I do not need any solution that applies to the whole page... just something I can insert as needed, like a "zero width no break character", except one that works.
Here is a Demo. Simply make the frame narrower until the line breaks at the hyphen.
http://jsfiddle.net/RagKH/
Try using the non-breaking hyphen ‑. I've replaced the dash with that character in your jsfiddle, shrunk the frame down as small as it can go, and the line doesn't split there any more.
You could also wrap the relevant text with
<span style="white-space: nowrap;"></span>
IE8/9 render the non-breaking hyphen mentioned in CanSpice's answer longer than a typical hyphen. It is the length of an en-dash instead of a typical hyphen. This display difference was a deal breaker for me.
As I could not use the CSS answer specified by Deb I instead opted to use no break tags.
<nobr>e-mail</nobr>
In addition I found a specific scenario that caused IE8/9 to break on a hyphen.
A string contains words separated by non-breaking spaces -
Width is limited
Contains a dash
IE renders it like this.
The following code reproduces the problem pictured above. I had to use a meta tag to force rendering to IE9 as IE10 has fixed the issue. No fiddle because it does not support meta tags.
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=9" />
<meta charset="utf-8"/>
<style>
body { padding: 20px; }
div { width: 300px; border: 1px solid gray; }
</style>
</head>
<body>
<div>
<p>If there is a - and words are separated by the whitespace code then IE will wrap on the dash.</p>
</div>
</body>
</html>
You can also do it "the joiner way" by inserting "U+2060 Word Joiner".
If Accept-Charset permits, the unicode character itself can be inserted directly into the HTML output.
Otherwise, it can be done using entity encoding. E.g. to join the text red-brown, use:
red-brown
or (decimal equivalent):
red-brown
. Another usable character is "U+FEFF Zero Width No-break Space"[ 1 ]:
red-brown
and (decimal equivalent):
red-brown
[1]: Note that while this method still works in major browsers like Chrome, it has been deprecated since Unicode 3.2.
Comparison of "the joiner way" with "U+2011 Non-breaking Hyphen":
The word joiner can be used for all other characters, not just hyphens.
When using the word joiner, most renderers will rasterize the text identically. On Chrome, FireFox, IE, and Opera, the rendering of normal hyphens, eg:
a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z
is identical to the rendering of normal hyphens (with U+2060 Word Joiner), eg:
a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z
while the above two renders differ from the rendering of "Non-breaking Hyphen", eg:
a‑b‑c‑d‑e‑f‑g‑h‑i‑j‑k‑l‑m‑n‑o‑p‑q‑r‑s‑t‑u‑v‑w‑x‑y‑z
. (The extent of the difference is browser-dependent and font-dependent. E.g. when using a font declaration of "arial", Firefox and IE11 show relatively huge variations, while Chrome and Opera show smaller variations.)
Comparison of "the joiner way" with <span class=c1></span> (CSS .c1 {white-space:nowrap;}) and <nobr></nobr>:
The word joiner can be used for situations where usage of HTML tags is restricted, e.g. forms of websites and forums.
On the spectrum of presentation and content, majority will consider the word joiner to be closer to content, when compared to tags.
• As tested on Windows 8.1 Core 64-bit using:
• IE 11.0.9600.18205
• Firefox 43.0.4
• Chrome 48.0.2564.109 (Official Build) m (32-bit)
• Opera 35.0.2066.92
Late to the party, but I think this is actually the most elegant. Use the WORD JOINER Unicode character on either side of your hyphen, or em dash, or any character.
So, like so:
—
This will join the symbol on both ends to its neighbors (without adding a space) and prevent line breaking.
Following on from #den’s useful JSX solution, this worked for me:
<h2>{props.name.replace('-', '-\u2060')}</h2>
JSX solution example using word joiner unicode character:
<div>{`This is JSX-${'\u2060'}related example`}</div>
I'd like to display raw HTML. We all know one has to escape each "<" and ">" like this:
<PRE> this is a test <DIV> </PRE>
However, I do not want to do this. I'd like a way to keep the HTML code as is (since it is easier to read, (inside the editor) and I might want to copy it and use it again myself as actual HTML code, and do not want to have to change it again or have two versions of the same code, one escaped and one not escaped.
Is there any other environment that is more "raw" than PRE that might allow this? So one does not have to keep editing HTML and changing everything each time they want to show some raw HTML code, maybe in HTML5?
Something like <REALLY_REALLY_VERBATIM> ...... </<REALLY_REALLY_VERBATIM>
The JavaScript solution does not work on Firefox 21, here is a screenshot:
The first solution still does not work on Firefox, here is a screenshot:
You can use the xmp element, see What was the <XMP> tag used for?. It has been in HTML since the beginning and is supported by all browsers. Specifications frown upon it, but HTML5 CR still describes it and requires browsers to support it (though it also tells authors not to use it, but it cannot really prevent you).
Everything inside xmp is taken as such, no markup (tags or character references) is recognized there, except, for apparent reason, the end tag of the element itself, </xmp>.
Otherwise xmp is rendered like pre.
When using “real XHTML”, i.e. XHTML served with an XML media type (which is rare), the special parsing rules do not apply, so xmp is treated like pre. But in “real XHTML”, you can use a CDATA section, which implies similar parsing rules. It has no special formatting, so you would probably want to wrap it inside a pre element:
<pre><![CDATA[
This is a demo, tags like <p> will
appear literally.
]]></pre>
I don’t see how you could combine xmp and CDATA section to achieve so-called polyglot markup
Essentially the original question can be broken down in 2 parts:
Main objective/challenge: embedding(/transporting) a raw formatted code-snippet
(any kind of code) in a web-page's markup (for simple copy/paste/edit due to no
encoding/escaping)
correctly displaying/rendering that code-snippet (possibly edit it) in the
browser
The short (but) ambiguous answer is: you can't, ...but you can (get very close).
(I know, that are 3 contradicting answers, so read on...)
(polyglot)(x)(ht)ml Markup-languages rely on wrapping (almost) everything between begin/opening and end/closing tags/character(sequences).
So, to embed any kind of raw code/snippet inside your markup-language, one will always have to escape/encode every instance (inside that snippet) that resembles the character(-sequence) that would close the wrapping 'container' element in the markup. (During this post I'll refer to this as rule no 1.)
Think of "some "data" here" or <i>..close italics with '</i>'-tag</i>, where it is obvious one should escape/encode (something in) </i and " (or change container's quote-character from " to ').
So, because of rule no 1, you can't 'just' embed 'any' unknown raw code-snippet inside markup.
Because, if one has to escape/encode even one character inside the raw snippet, then that snippet would no longer be the same original 'pure raw code' that anyone can copy/paste/edit in the document's markup without further thought. It would lead to malformed/illegal markup and Mojibake (mainly) because of entities.
Also, should that snippet contain such characters, you'd still need some javascript to 'translate' that character(sequence) from (and to) it's escaped/encoded representation to display the snippet correctly in the 'webpage' (for copy/paste/edit).
That brings us to (some of) the datatypes that markup-languages specify. These datatypes essentially define what are considered 'valid characters' and their meaning (per tag, property, etc.):
PCDATA (Parsed Character DATA): will expand entities and one must
escape <, & (and > depending on markup language/version).
Most tags like body, div, pre, etc, but also textarea (until
HTML5) fall under this type.
So not only do you need to encode all the container's closing character-sequences
inside the snippet, you also have to encode all <, & (,>) characters
(at minimum).
Needless to say, encoding/escaping this many characters falls outside this
objective's scope of embedding a raw snippet in the markup.
'..But a textarea seems to work...', yes, either because of the browsers
error-engine trying to make something out of it, or because HTML5:
RCDATA (Replaceable Character DATA): will not not treat tags inside the
text as markup (but are still governed by rule 1), so one doesn't need to
encode < (>). BUT entities are still expanded, so they and 'ambiguous
ampersands' (&) need special care.
The current HTML5 spec says the textarea is now a RCDATA field and (quote):
The text in raw text and RCDATA elements must not contain any
occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS)
followed by characters that case-insensitively match the tag name of
the element followed by one of U+0009 CHARACTER TABULATION (tab),
U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
(CR), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or U+002F SOLIDUS (/).
Thus no matter what, textarea needs a hefty entity translation handler or
it will eventually Mojibake on entities!
CDATA (Character Data) will not treat tags inside the text as
markup and will not expand entities.
So as long as the raw snippet code does not violate rule 1 (that one can't
have the containers closing character(sequence) inside the snippet), this
requires no other escaping/encoding.
Clearly this boils down to: how can we minimize the number of characters/character-sequences that still need to be encoded in the snippet's raw source and the number of times that character(sequence) might appear in an average snippet; something that is also of importance for the javascript that handles the translation of these characters (if they occur).
So what 'containers' have this CDATA context?
Most value properties of tags are CDATA, so one could (ab)use a hidden input's value property (proof of concept jsfiddle here).
However (conform rule 1) this creates an encoding/escape problem with nested quotes (" and ') in the raw snippet and one needs some javascript to get/translate and set the snippet in another (visible) element (or simply setting it as a text-area's value). Somehow this gave me problems with entities in FF (just like in a textarea). But it doesn't really matter, since the 'price' of having to escape/encode nested quotes is higher then a (HTML5) textarea (quotes are quite common in source code..).
What about trying to (ab)use <![CDATA[<tag>bla & bla</tag>]]>?
As Jukka points out in his extended answer, this would only work in (rare) 'real xhtml'.
I thought of using a script-tag (with or without such a CDATA wrapper inside the script-tag) together with a multi-line comment /* */ that wraps the raw snippet (script-tags can have an id and you can access them by count). But since this obviously introduces a escaping problem with */, ]]> and </script in the raw snippet, this doesn't seem like a solution either.
Please post other viable 'containers' in the comments to this answer.
By the way, encoding or counting the number of - characters and balancing them out inside a comment tag <!-- --> is just insane for this purpose (apart from rule 1).
That leaves us with Jukka K. Korpela's excellent answer: the <xmp> tag seems the best option!
The 'forgotten' <xmp> holds CDATA, is intended for this purpose AND is indeed still in the current HTML 5 spec (and has been at least since HTML3.2); exactly what we need! It's also widely supported, even in IE6 (that is.. until it suffers from the same regression as the scrolling table-body).
Note: as Jukka pointed out, this will not work in true xhtml or polyglot (that will treat it as a pre) and the xmp tag must still adhere to rule no 1. But that's the 'only' rule.
Consider the following markup:
<!-- ATTENTION: replace any occurrence of </xmp with </xmp -->
<xmp id="snippet-container">
<div>
<div>this is an example div & holds an xmp tag:<br />
<xmp>
<html><head> <!-- indentation col 0!! -->
<title>My Title</title>
</head><body>
<p>hello world !!</p>
</body></html>
</xmp> <!-- note this encoded/escaped tag -->
</div>
This line is also part of the snippet
</div>
</xmp>
The above codeblok illustrates a raw piece of markup where <xmp id="snippet-container"> contains an (almost raw) code-snippet (containing div>div>xmp>html-document).
Notice the encoded closing tag in this markup? To comply with rule no 1, this was encoded/escaped).
So embedding/transporting the (sometimes almost) raw code is/seems solved.
What about displaying/rendering the snippet (and that encoded </xmp>)?
The browser will (or it should) render the snippet (the contents inside snippet-container) exactly the way you see it in the codeblock above (with some discrepancy amongst browsers whether or not the snippet starts with a blank line).
That includes the formatting/indentation, entities (like the string &), full tags, comments AND the encoded closing tag </xmp> (just like it was encoded in the markup). And depending on browser(version) one could even try use the property contenteditable="true" to edit this snippet (all that without javascript enabled). Doing something like textarea.value=xmp.innerHTML is also a breeze.
So you can... if the snippet doesn't contain the containers closing character-sequence.
However, should a raw snippet contain the closing character-sequence </xmp (because it is an example of xmp itself or it contains some regex, etc), you must accept that you have to encode/escape that sequence in the raw snippet AND need a javascript handler to translate that encoding to display/render the encoded </xmp> like </xmp> inside a textarea (for editing/posting) or (for example) a pre just to correctly render the snippet's code (or so it seems).
A very rudimentary jsfiddle example of this here. Note that getting/embedding/displaying/retrieving-to-textarea worked perfect even in IE6. But setting the xmp's innerHTML revealed some interesting 'would-be-intelligent' behavior on IE's part. There is a more extensive note and workaround on that in the fiddle.
But now comes the important kicker (another reason why you only get very close):
Just as an over-simplified example, imagine this rabbit-hole:
Intended raw code-snippet:
<!-- remember to translate between </xmp> and </xmp> -->
<xmp>
<p>a paragraph</p>
</xmp>
Well, to comply with rule 1, we 'only' need to encode those </xmp[> \n\r\t\f\/] sequences, right?
So that gives us the following markup (using just a possible encoding):
<xmp id="container">
<!-- remember to translate between </xmp> and </xmp> -->
<xmp>
<p>a paragraph</p>
</xmp>
</xmp>
Hmm.. shalt I get my crystal ball or flip a coin? No, let the computer look at its system-clock and state that a derived number is 'random'. Yes, that should do it..
Using a regex like: xmp.innerHTML.replace(/<(?=\/xmp[> \n\r\t\f\/])/gi, '<');, would translate 'back' to this:
<!-- remember to translate between </xmp> and </xmp> -->
<xmp>
<p>a paragraph</p>
</xmp>
Hmm.. seems this random generator is broken... Houston..?
Should you have missed the joke/problem, read again starting at the 'intended raw code-snippet'.
Wait, I know, we (also) need to encode .... to ....
Ok, rewind to 'intended raw code-snippet' and read again.
Somehow this all begins to smell like the famous hilarious-but-true rexgex-answer on SO, a good read for people fluent in mojibake.
Maybe someone knows a clever algorithm or solution to fix this problem, but I assume that the embedded raw code will get more and more obscure to the point where you'd be better of properly escaping/encoding just your <, & (and >), just like the rest of the world.
Conclusion: (using the xmp tag)
it can be done with known snippets that do not contain the container's closing character-sequence,
we can get very close to the original objective with known snippets that only use 'basic first-level' escaping/encoding so we don't fall in the rabbithole,
but ultimately it seems that one can't do this reliably in a 'production-environment' where people can/should copy/paste/edit 'any unknown' raw snippets while not knowing/understanding the implications/rules/rabbithole (depending on your implementation of handling/translating for rule 1 and the rabbit-hole).
Hope this helps!
PS:
Whilst I would appreciate an upvote if you find this explanation useful, I kind of think Jukka's answer should be the accepted answer (should no better option/answer come along), since he was the one who remembered the xmp tag (that I forgot about over the years and got 'distracted' by the commonly advocated PCDATA elements like pre, textarea, etc.).
This answer originated in explaining why you can't do it (with any unknown raw snippet) and explain some obvious pitfalls that some other (now deleted) answers overlooked when advising a textarea for embedding/transport. I've expanded my existing explanation to also support and further explain Jukka's answer (since all that entity and *CDATA stuff is almost harder than code-pages).
Cheap and cheerful answer:
<textarea>Some raw content</textarea>
The textarea will handle tabs, multiple spaces, newlines, line wrapping all verbatim.
It copies and pastes nicely and its valid HTML all the way. It also allows the user to resize the code box.
You don't need any CSS, JS, escaping, encoding.
You can alter the appearance and behaviour as well.
Here's a monospace font, editing disabled, smaller font, no border:
<textarea
style="width:100%; font-family: Monospace; font-size:10px; border:0;"
rows="30" disabled
>Some raw content</textarea>
This solution is probably not semantically correct. So if you need that, it might be best to choose a more sophisticated answer.
xmp is the way to go, i.e.:
<xmp>
# your code...
</xmp>
echo '<pre>' . htmlspecialchars("<div><b>raw HTML</b></div>") . '</pre>';
I think that's what you're looking for?
In other words, use htmlspecialchars() in PHP
#GitaarLAB and #Jukka elaborate that <xmp> tag is obsolete, but still the best. When I use it like this
<xmp>
<div>Lorem ipsum</div>
<p>Hello</p>
</xmp>
then the first EOL is inserted in the code, and it looks awful.
It can be solved by removing that EOL
<xmp><div>Lorem ipsum</div>
<p>Hello</p>
</xmp>
but then it looks bad in the source. I used to solve it with wrapping <div>, but recently I figured out a nice CSS3 rule, I hope it also helps somebody:
xmp { margin: 5px 0; padding: 0 5px 5px 5px; background: #CCC; }
xmp:before { content: ""; display: block; height: 1em; margin: 0 -5px -2em -5px; }
This looks better.
If you have jQuery enabled you can use an escapeXml function and not have to worry about escaping arrows or special characters.
<pre>
${fn:escapeXml('
<!-- all your code -->
')};
</pre>
<code> tag is the good way because <xmp> and <pre> tags not support line wrapping
echo '<code>' . htmlspecialchars("<div><b>hello world</b></div>") . '</code>';
In my html page I have displayed fractions using html special character. My idea is to display 1/2, 2/2 and 3/3.
I have used ⅓ for 1/3 and ⅔ for 2/3 and the special charactera are displayed correctly. I took reference from this link HTML Special Characters
But when I tried using &frac33; for 3/3 it is not working. It is just displaying as it is, not converting to special character.
Could you someone please tell me what is the html special character for 3/3.
Thank You
<sup>3</sup>⁄<sub>3</sub>
Result: 3⁄3
Not all fractions have their own special character. For those fractions (like 3/3) which don't have slanted fraction characters, use the HTML entity ⁄:
<sup>3</sup>⁄<sub>3</sub> = 3⁄3
There is no named (or numeric) character reference for a character representing 3/3, since there simply is no such character.
In theory, the FRACTION SLASH U+2044 “⁄” character (representable as ⁄ in HTML, among other thing) can be used between digits to suggest that rendering routines present the combination as a typographic fraction. In practice, only some typesetting programs can do this, and web browsers come nowhere near.
Trying to play with HTML markup and/or CSS to construct something that looks like a typographic fraction (comparable to ½ in appearance) tend to produce messy results, including uneven line spacing.
The practical option is to use just common notations like 2/2. But if you want something like a typographic fraction, you could use MathML with MathJax. More exactly, you would use the mfrac element in MathML with the attribute bevelled="true". Sample code:
<!doctype html>
<title>Fractions with MathJax and MathML</title>
<script src=
"http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
Here we have the common fraction ½, then
a simulation with HTML and CSS:
<sup>1</sup>⁄<sub>2</sub>.
Note that this tends to create uneven line spacing.
There are some cures to that, but let us see how MathML works:
<math>
<mfrac bevelled="true">
<mn>1</mn>
<mn>2</mn>
</mfrac>
</math>.
Some text here to demonstrate that line spacing has not
been disturbed here.
Sample rendering:
I'd like to display raw HTML. We all know one has to escape each "<" and ">" like this:
<PRE> this is a test <DIV> </PRE>
However, I do not want to do this. I'd like a way to keep the HTML code as is (since it is easier to read, (inside the editor) and I might want to copy it and use it again myself as actual HTML code, and do not want to have to change it again or have two versions of the same code, one escaped and one not escaped.
Is there any other environment that is more "raw" than PRE that might allow this? So one does not have to keep editing HTML and changing everything each time they want to show some raw HTML code, maybe in HTML5?
Something like <REALLY_REALLY_VERBATIM> ...... </<REALLY_REALLY_VERBATIM>
The JavaScript solution does not work on Firefox 21, here is a screenshot:
The first solution still does not work on Firefox, here is a screenshot:
You can use the xmp element, see What was the <XMP> tag used for?. It has been in HTML since the beginning and is supported by all browsers. Specifications frown upon it, but HTML5 CR still describes it and requires browsers to support it (though it also tells authors not to use it, but it cannot really prevent you).
Everything inside xmp is taken as such, no markup (tags or character references) is recognized there, except, for apparent reason, the end tag of the element itself, </xmp>.
Otherwise xmp is rendered like pre.
When using “real XHTML”, i.e. XHTML served with an XML media type (which is rare), the special parsing rules do not apply, so xmp is treated like pre. But in “real XHTML”, you can use a CDATA section, which implies similar parsing rules. It has no special formatting, so you would probably want to wrap it inside a pre element:
<pre><![CDATA[
This is a demo, tags like <p> will
appear literally.
]]></pre>
I don’t see how you could combine xmp and CDATA section to achieve so-called polyglot markup
Essentially the original question can be broken down in 2 parts:
Main objective/challenge: embedding(/transporting) a raw formatted code-snippet
(any kind of code) in a web-page's markup (for simple copy/paste/edit due to no
encoding/escaping)
correctly displaying/rendering that code-snippet (possibly edit it) in the
browser
The short (but) ambiguous answer is: you can't, ...but you can (get very close).
(I know, that are 3 contradicting answers, so read on...)
(polyglot)(x)(ht)ml Markup-languages rely on wrapping (almost) everything between begin/opening and end/closing tags/character(sequences).
So, to embed any kind of raw code/snippet inside your markup-language, one will always have to escape/encode every instance (inside that snippet) that resembles the character(-sequence) that would close the wrapping 'container' element in the markup. (During this post I'll refer to this as rule no 1.)
Think of "some "data" here" or <i>..close italics with '</i>'-tag</i>, where it is obvious one should escape/encode (something in) </i and " (or change container's quote-character from " to ').
So, because of rule no 1, you can't 'just' embed 'any' unknown raw code-snippet inside markup.
Because, if one has to escape/encode even one character inside the raw snippet, then that snippet would no longer be the same original 'pure raw code' that anyone can copy/paste/edit in the document's markup without further thought. It would lead to malformed/illegal markup and Mojibake (mainly) because of entities.
Also, should that snippet contain such characters, you'd still need some javascript to 'translate' that character(sequence) from (and to) it's escaped/encoded representation to display the snippet correctly in the 'webpage' (for copy/paste/edit).
That brings us to (some of) the datatypes that markup-languages specify. These datatypes essentially define what are considered 'valid characters' and their meaning (per tag, property, etc.):
PCDATA (Parsed Character DATA): will expand entities and one must
escape <, & (and > depending on markup language/version).
Most tags like body, div, pre, etc, but also textarea (until
HTML5) fall under this type.
So not only do you need to encode all the container's closing character-sequences
inside the snippet, you also have to encode all <, & (,>) characters
(at minimum).
Needless to say, encoding/escaping this many characters falls outside this
objective's scope of embedding a raw snippet in the markup.
'..But a textarea seems to work...', yes, either because of the browsers
error-engine trying to make something out of it, or because HTML5:
RCDATA (Replaceable Character DATA): will not not treat tags inside the
text as markup (but are still governed by rule 1), so one doesn't need to
encode < (>). BUT entities are still expanded, so they and 'ambiguous
ampersands' (&) need special care.
The current HTML5 spec says the textarea is now a RCDATA field and (quote):
The text in raw text and RCDATA elements must not contain any
occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS)
followed by characters that case-insensitively match the tag name of
the element followed by one of U+0009 CHARACTER TABULATION (tab),
U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN
(CR), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or U+002F SOLIDUS (/).
Thus no matter what, textarea needs a hefty entity translation handler or
it will eventually Mojibake on entities!
CDATA (Character Data) will not treat tags inside the text as
markup and will not expand entities.
So as long as the raw snippet code does not violate rule 1 (that one can't
have the containers closing character(sequence) inside the snippet), this
requires no other escaping/encoding.
Clearly this boils down to: how can we minimize the number of characters/character-sequences that still need to be encoded in the snippet's raw source and the number of times that character(sequence) might appear in an average snippet; something that is also of importance for the javascript that handles the translation of these characters (if they occur).
So what 'containers' have this CDATA context?
Most value properties of tags are CDATA, so one could (ab)use a hidden input's value property (proof of concept jsfiddle here).
However (conform rule 1) this creates an encoding/escape problem with nested quotes (" and ') in the raw snippet and one needs some javascript to get/translate and set the snippet in another (visible) element (or simply setting it as a text-area's value). Somehow this gave me problems with entities in FF (just like in a textarea). But it doesn't really matter, since the 'price' of having to escape/encode nested quotes is higher then a (HTML5) textarea (quotes are quite common in source code..).
What about trying to (ab)use <![CDATA[<tag>bla & bla</tag>]]>?
As Jukka points out in his extended answer, this would only work in (rare) 'real xhtml'.
I thought of using a script-tag (with or without such a CDATA wrapper inside the script-tag) together with a multi-line comment /* */ that wraps the raw snippet (script-tags can have an id and you can access them by count). But since this obviously introduces a escaping problem with */, ]]> and </script in the raw snippet, this doesn't seem like a solution either.
Please post other viable 'containers' in the comments to this answer.
By the way, encoding or counting the number of - characters and balancing them out inside a comment tag <!-- --> is just insane for this purpose (apart from rule 1).
That leaves us with Jukka K. Korpela's excellent answer: the <xmp> tag seems the best option!
The 'forgotten' <xmp> holds CDATA, is intended for this purpose AND is indeed still in the current HTML 5 spec (and has been at least since HTML3.2); exactly what we need! It's also widely supported, even in IE6 (that is.. until it suffers from the same regression as the scrolling table-body).
Note: as Jukka pointed out, this will not work in true xhtml or polyglot (that will treat it as a pre) and the xmp tag must still adhere to rule no 1. But that's the 'only' rule.
Consider the following markup:
<!-- ATTENTION: replace any occurrence of </xmp with </xmp -->
<xmp id="snippet-container">
<div>
<div>this is an example div & holds an xmp tag:<br />
<xmp>
<html><head> <!-- indentation col 0!! -->
<title>My Title</title>
</head><body>
<p>hello world !!</p>
</body></html>
</xmp> <!-- note this encoded/escaped tag -->
</div>
This line is also part of the snippet
</div>
</xmp>
The above codeblok illustrates a raw piece of markup where <xmp id="snippet-container"> contains an (almost raw) code-snippet (containing div>div>xmp>html-document).
Notice the encoded closing tag in this markup? To comply with rule no 1, this was encoded/escaped).
So embedding/transporting the (sometimes almost) raw code is/seems solved.
What about displaying/rendering the snippet (and that encoded </xmp>)?
The browser will (or it should) render the snippet (the contents inside snippet-container) exactly the way you see it in the codeblock above (with some discrepancy amongst browsers whether or not the snippet starts with a blank line).
That includes the formatting/indentation, entities (like the string &), full tags, comments AND the encoded closing tag </xmp> (just like it was encoded in the markup). And depending on browser(version) one could even try use the property contenteditable="true" to edit this snippet (all that without javascript enabled). Doing something like textarea.value=xmp.innerHTML is also a breeze.
So you can... if the snippet doesn't contain the containers closing character-sequence.
However, should a raw snippet contain the closing character-sequence </xmp (because it is an example of xmp itself or it contains some regex, etc), you must accept that you have to encode/escape that sequence in the raw snippet AND need a javascript handler to translate that encoding to display/render the encoded </xmp> like </xmp> inside a textarea (for editing/posting) or (for example) a pre just to correctly render the snippet's code (or so it seems).
A very rudimentary jsfiddle example of this here. Note that getting/embedding/displaying/retrieving-to-textarea worked perfect even in IE6. But setting the xmp's innerHTML revealed some interesting 'would-be-intelligent' behavior on IE's part. There is a more extensive note and workaround on that in the fiddle.
But now comes the important kicker (another reason why you only get very close):
Just as an over-simplified example, imagine this rabbit-hole:
Intended raw code-snippet:
<!-- remember to translate between </xmp> and </xmp> -->
<xmp>
<p>a paragraph</p>
</xmp>
Well, to comply with rule 1, we 'only' need to encode those </xmp[> \n\r\t\f\/] sequences, right?
So that gives us the following markup (using just a possible encoding):
<xmp id="container">
<!-- remember to translate between </xmp> and </xmp> -->
<xmp>
<p>a paragraph</p>
</xmp>
</xmp>
Hmm.. shalt I get my crystal ball or flip a coin? No, let the computer look at its system-clock and state that a derived number is 'random'. Yes, that should do it..
Using a regex like: xmp.innerHTML.replace(/<(?=\/xmp[> \n\r\t\f\/])/gi, '<');, would translate 'back' to this:
<!-- remember to translate between </xmp> and </xmp> -->
<xmp>
<p>a paragraph</p>
</xmp>
Hmm.. seems this random generator is broken... Houston..?
Should you have missed the joke/problem, read again starting at the 'intended raw code-snippet'.
Wait, I know, we (also) need to encode .... to ....
Ok, rewind to 'intended raw code-snippet' and read again.
Somehow this all begins to smell like the famous hilarious-but-true rexgex-answer on SO, a good read for people fluent in mojibake.
Maybe someone knows a clever algorithm or solution to fix this problem, but I assume that the embedded raw code will get more and more obscure to the point where you'd be better of properly escaping/encoding just your <, & (and >), just like the rest of the world.
Conclusion: (using the xmp tag)
it can be done with known snippets that do not contain the container's closing character-sequence,
we can get very close to the original objective with known snippets that only use 'basic first-level' escaping/encoding so we don't fall in the rabbithole,
but ultimately it seems that one can't do this reliably in a 'production-environment' where people can/should copy/paste/edit 'any unknown' raw snippets while not knowing/understanding the implications/rules/rabbithole (depending on your implementation of handling/translating for rule 1 and the rabbit-hole).
Hope this helps!
PS:
Whilst I would appreciate an upvote if you find this explanation useful, I kind of think Jukka's answer should be the accepted answer (should no better option/answer come along), since he was the one who remembered the xmp tag (that I forgot about over the years and got 'distracted' by the commonly advocated PCDATA elements like pre, textarea, etc.).
This answer originated in explaining why you can't do it (with any unknown raw snippet) and explain some obvious pitfalls that some other (now deleted) answers overlooked when advising a textarea for embedding/transport. I've expanded my existing explanation to also support and further explain Jukka's answer (since all that entity and *CDATA stuff is almost harder than code-pages).
Cheap and cheerful answer:
<textarea>Some raw content</textarea>
The textarea will handle tabs, multiple spaces, newlines, line wrapping all verbatim.
It copies and pastes nicely and its valid HTML all the way. It also allows the user to resize the code box.
You don't need any CSS, JS, escaping, encoding.
You can alter the appearance and behaviour as well.
Here's a monospace font, editing disabled, smaller font, no border:
<textarea
style="width:100%; font-family: Monospace; font-size:10px; border:0;"
rows="30" disabled
>Some raw content</textarea>
This solution is probably not semantically correct. So if you need that, it might be best to choose a more sophisticated answer.
xmp is the way to go, i.e.:
<xmp>
# your code...
</xmp>
echo '<pre>' . htmlspecialchars("<div><b>raw HTML</b></div>") . '</pre>';
I think that's what you're looking for?
In other words, use htmlspecialchars() in PHP
#GitaarLAB and #Jukka elaborate that <xmp> tag is obsolete, but still the best. When I use it like this
<xmp>
<div>Lorem ipsum</div>
<p>Hello</p>
</xmp>
then the first EOL is inserted in the code, and it looks awful.
It can be solved by removing that EOL
<xmp><div>Lorem ipsum</div>
<p>Hello</p>
</xmp>
but then it looks bad in the source. I used to solve it with wrapping <div>, but recently I figured out a nice CSS3 rule, I hope it also helps somebody:
xmp { margin: 5px 0; padding: 0 5px 5px 5px; background: #CCC; }
xmp:before { content: ""; display: block; height: 1em; margin: 0 -5px -2em -5px; }
This looks better.
If you have jQuery enabled you can use an escapeXml function and not have to worry about escaping arrows or special characters.
<pre>
${fn:escapeXml('
<!-- all your code -->
')};
</pre>
<code> tag is the good way because <xmp> and <pre> tags not support line wrapping
echo '<code>' . htmlspecialchars("<div><b>hello world</b></div>") . '</code>';
I have a aspx page whose title has both Hebrew and English characters in it and the order of the words gets messed up.
Is there a way to style the title so the words don't get messed up, or is it an OS problem?
This is what the title should say:
This is what the title actually looks like:
#jleedev already has what seems to be the right answer. Here is a bit of background information on it:
Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts
There are some situations where you may not be able to use the markup described in the previous section. In HTML these include the title element and any attribute value.
In these situations you can use invisible Unicode characters that produce the same results.
To replicate the effect of the markup described in the example above related to nested base directions, we can use pairs of characters to surround the embedded text. The first character is one of U+202B RIGHT-TO-LEFT EMBEDDING (RLE) or U+202A LEFT-TO-RIGHT EMBEDDING (LRE). This corresponds to the markup <span dir="rtl"> or <span dir="ltr">, respectively. The second character is U+202C POP DIRECTIONAL FORMATTING (PDF). This corresponds to the in the markup. Below you can see how to apply this to the previous example.
<p>The title says "..." in Hebrew<p>
Try using the Unicode direction embedding codes:
The result looks like הוספת קובץ JS עורך ישן - Mozilla Firefox, and it's marked up as follows:
(right-to-left embedding)
הוספת קובץ
JS
עורך ישן
(pop directional formatting)
- Mozilla Firefox
(You ought to be able to write the title with <title dir="rtl">, but I couldn't get that to work.)
if you use utf-8 you shouldn't have a problem.
add the following to your <head> and make sure you are saving as utf-8
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />