Non-breaking en dash, not a hyphen - html

I know that in HTML there is a non-breaking hyphen (&#8209;), but I need a non-breaking en dash (&#8211;). They are used differently. Is there a proper way of doing this besides wrapping the text and en dash in no-break markup throughout my document, like:
<nobr> TEXT &#8211; TEXT </nobr>

Try surrounding the en dash with the word joiner character (&#8288;).
TEXT&#8288;&#8211;&#8288;TEXT

This question was driving me crazy in a recent project. I found it hard to believe that en dashes are by default breaking characters. If you're okay with inserting extra characters into your HTML, then &#8288; works well. Note that only a word joiner after the en dash is necessary, since I believe browsers don't break before it.
For a number of reasons, manually putting extra characters everywhere in my source HTML was not going to work for me, so I came up with a JavaScript solution which automatically applies a .nobreak CSS class to any en dash and the character immediately following it.
<style>
.nobreak {
white-space: nowrap;
}
</style>
<body>
<p>This is a paragraph with a number range, 1–34, and it will never break after the en dash.</p>
<script>
// Requires jQuery. Wraps each en dash and the character after it in a no-wrap span.
$("body").html(function(_, html) {
  return html.replace(/(–.)/g, '<span class="nobreak">$1</span>');
});
</script>
</body>
An even simpler solution might be to use the same script to automatically insert &#8288; after en dashes. Just change the JS to:
$("body").html(function(_, html) {
  return html.replace(/–/g, '–&#8288;');
});
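The same idea can also be applied to a plain string before it reaches the page, which avoids rewriting the whole body. This is only a minimal sketch: the helper name `joinEnDashes` is made up for illustration, and it assumes a word joiner on both sides of the dash is acceptable.

```javascript
// Hypothetical helper (not from the answers above): glue every en dash
// to its neighbours with U+2060 WORD JOINER.
const EN_DASH = "\u2013";
const WORD_JOINER = "\u2060";

function joinEnDashes(text) {
  // Also swallow word joiners that already touch the dash,
  // so calling the function twice does not stack them up.
  const pattern = new RegExp(WORD_JOINER + "*" + EN_DASH + WORD_JOINER + "*", "g");
  return text.replace(pattern, WORD_JOINER + EN_DASH + WORD_JOINER);
}

// Two invisible characters are added around the single en dash.
console.log(joinEnDashes("pages 1\u201334").length); // 12
```

Working on strings (or on individual text nodes) rather than on the body's HTML should also keep event handlers attached to existing elements intact.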

As per this documentation, and the other answer about the word joiner (&#8288;), the following HTML should produce a non-breakable en dash:
If the character does not have an HTML entity, you can use the decimal
(dec) or hexadecimal (hex) reference.
<p>
Will break :
<br />
TEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXT–TEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXT
</p>
<p>
Won't break (word joiner):
<br />
TEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXT⁠–⁠TEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXTTEXT
</p>

Difference between "&#9;" and "&nbsp;" or "&#32;"

Hello, I am trying to compile an EPUB v2.0 with HTML code extracted from InDesign. I have noticed there are a lot of "special characters" either at the beginning of a paragraph or at the end. For example:
<p class="text_indent0px font_size0_8em line_height1_325 margin_bottom1px margin_left0px margin_right0px sans_serif floatleft">E<span class="small_caps">VELYNE</span>&#9;</p>
What is this &#9;
and can I either get rid of it or replace it with a "&nbsp;"?
&#9; is the ASCII code for a tab. So I guess the paragraphs were indented with tabs.
If you want to replace them with &nbsp; then use 4 of them.
That would be a horizontal tab (i.e. the same as using the tab key).
If you want to replace it, I would suggest doing a find/replace using an ePub editor like Sigil (http://sigil-ebook.com/).
&#9; represents the horizontal tab.
Similarly, &#32; represents a space.
To replace &#9; you have to use &nbsp;.
In the HTML encoding &#{number};, {number} is the ASCII code. Therefore, &#9; is a tab, which typically collapses down to one space in HTML unless you use CSS (or the <pre> tag) to treat it as preformatted text.
Therefore, it's not safe to replace it with a non-breaking or a regular space unless you can guarantee that it's not being displayed as a tab anywhere.
div:first-child {
  white-space: pre;
}
<div>&#9;Test</div>
<div>&#9;Test</div>
<pre>&#9;Test</pre>
See https://developer.mozilla.org/en-US/docs/Web/CSS/white-space and http://ascii.cl/
&nbsp; is the entity used to represent a non-breaking space.
&#32; is the decimal character code of the space we enter with the keyboard spacebar.
&#9; is the decimal character code of the horizontal tab.
&nbsp; and &#32; both represent a space, but &nbsp; is non-breaking, which means multiple sequential occurrences will not be collapsed into one, whereas in the same case &#32; will collapse to one space.
= approx. 4 spaces and approx. 8 spaces
There are four types of character reference scheme used.
Using decimal character codes (regex-pattern: &#[0-9]+;),
Using hexadecimal character codes (regex-pattern: &#x[a-f0-9]+;),
Using named character codes (regex-pattern: &[a-z]+;),
Using the actual characters (regex-pattern: .).
All of these notations are rendered the same way; only the coding style differs. For example, if you need to display a Latin small letter e with diaeresis, you could use any of the conventions below:
&#235; (decimal notation),
&#xEB; (hexadecimal notation),
&euml; (html notation),
ë (actual character),
Likewise, as you said, what should be used: (a) &#9; (decimal notation) or (b) &nbsp; (html notation) or (c) &#32; (decimal notation).
So, from the above analogy, it can be said that (a), (b) and (c) are three different kinds of notation for three different characters.
And, for your information, (a) is a horizontal tab, (b) is the non-breaking space, which is actually &#160; in decimal notation, and (c) is the decimal notation for the normal space character.
Now, technically, a space at the end of a paragraph is meaningless, so you could discard them all. And if you still need the spacing, use it inside <pre> elements, not in <p> or <div>.
Hope this helps...
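To see that the notations really do name the same characters, here is a small sketch that decodes a single character reference. The function name `decodeCharRef` and the tiny `NAMED` table are assumptions made for illustration; a browser's real table of named references is far larger.

```javascript
// Hypothetical helper: decode one character reference into the character it names.
// Only a handful of named references are listed; this is not a complete table.
const NAMED = { nbsp: "\u00A0", euml: "\u00EB", amp: "&", lt: "<", gt: ">" };

function decodeCharRef(ref) {
  let m;
  if ((m = /^&#x([0-9a-f]+);$/i.exec(ref))) {
    return String.fromCodePoint(parseInt(m[1], 16)); // hexadecimal notation
  }
  if ((m = /^&#([0-9]+);$/.exec(ref))) {
    return String.fromCodePoint(parseInt(m[1], 10)); // decimal notation
  }
  if ((m = /^&([a-z]+);$/i.exec(ref)) && Object.prototype.hasOwnProperty.call(NAMED, m[1])) {
    return NAMED[m[1]]; // named notation
  }
  return ref; // not a reference we recognise: treat it as literal text
}

const E_DIAERESIS = "\u00EB";
const forms = ["&#235;", "&#xEB;", "&euml;", E_DIAERESIS];
console.log(forms.map(decodeCharRef).every((ch) => ch === E_DIAERESIS)); // true
console.log(decodeCharRef("&#9;") === "\t");                              // true: horizontal tab
console.log(decodeCharRef("&#32;") === " ");                              // true: ordinary space
console.log(decodeCharRef("&#160;") === decodeCharRef("&nbsp;"));         // true: non-breaking space
```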

In HTML text don't break at the existing hyphen in the text [duplicate]

We have the CKEditor in our CMS. Our end users will input some long articles via that CKEditor. We need a way to prevent line break at hyphens on those articles.
Is there a way to prevent line break at hyphens in all browsers?
Or does CKEditor have an option to prevent that?
You can use ‑ (&#8209;), which is the Unicode NON-BREAKING HYPHEN (U+2011).
HTML: &#8209; or &#x2011;
Also see: http://en.wikipedia.org/wiki/Hyphen#In_computing
One solution could be to use an extra span tag and the white-space CSS property. Just define a class like this:
.nowrap {
white-space: nowrap;
}
And then add a span with that class around your hyphenated text.
<p>This is the <span class="nowrap">anti-inflammable</span> model</p>
This approach should work just fine in all browsers - the buggy implementations listed here are for other values of the white-space property: http://reference.sitepoint.com/css/white-space#compatibilitysection
I’m afraid there’s no simpler way to do it reliably than splitting the text to “words” (sequences of non-whitespace characters separated by whitespace) and wrapping each “word” that contains a hyphen inside nobr markup. So input data like bla bla foo-bar bla bla would be turned to bla bla <nobr>foo-bar</nobr> bla bla.
You might even consider inserting nobr markup whenever the “word” contains anything but letters and digits. The reason is that some browsers may even break strings like “2/3” or “f(0)” (see my page on oddities of line breaking in browsers).
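The word-splitting approach can be sketched roughly as follows. Two assumptions: the input is plain text rather than markup, and the non-standard nobr element is swapped for the `.nowrap` span from the earlier answer. The name `wrapHyphenatedWords` is hypothetical.

```javascript
// Hypothetical helper: wrap each whitespace-delimited "word" that contains a hyphen
// in <span class="nowrap">, leaving all other text and spacing untouched.
// Assumes `text` is plain text (already HTML-escaped if needed), not markup.
function wrapHyphenatedWords(text) {
  return text.replace(/\S*-\S*/g, (word) => '<span class="nowrap">' + word + "</span>");
}

console.log(wrapHyphenatedWords("bla bla foo-bar bla bla"));
// bla bla <span class="nowrap">foo-bar</span> bla bla
```

Widening the pattern to catch strings like "2/3" or "f(0)" would only mean changing the regular expression, for example to match any word that contains a character other than a letter or digit.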
You are unable to do it without editing every HTML instance. Consequently, I wrote some JavaScript code to replace them:
jQuery:
// Replace hyphens with non-breaking ones (U+2011)
var $txt = $("#block-views-video-block h2");
$txt.text( $txt.text().replace(/-/g, '\u2011') );
Vanilla JavaScript:
function nonBrHypens(id) {
  // Note: this rewrites innerHTML, so hyphens inside tags and attributes are replaced too
  var el = document.getElementById(id);
  el.innerHTML = el.innerHTML.replace(/-/g, '\u2011');
}
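One caveat with replacing hyphens in innerHTML is that attribute values such as class="nav-item" get rewritten as well. A possible way around that, sketched here with a hypothetical helper `replaceHyphensOutsideTags`, is to touch only the text between tags. It assumes the markup contains no script or style elements and no ">" inside attribute values.

```javascript
// Hypothetical helper: swap "-" for U+2011 NON-BREAKING HYPHEN only in the text
// between tags, so hyphens inside tag names and attributes are left alone.
function replaceHyphensOutsideTags(html) {
  // Splitting on a capturing group keeps the tags in the result, at the odd indexes.
  return html
    .split(/(<[^>]*>)/)
    .map((part, i) => (i % 2 === 1 ? part : part.replace(/-/g, "\u2011")))
    .join("");
}

const out = replaceHyphensOutsideTags('<span class="a-b">x-y</span>');
console.log(out === '<span class="a-b">x\u2011y</span>'); // true
```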
Use the word joiner character (&#8288;) around the hyphen. It works in Internet Explorer as well.
Fix specific hyphens...
function fixicecream(text) {
  return text.replace(/ice-cream/g, 'ice&#8288;-&#8288;cream');
}
Or everything...
function fixhyphens(text) {
  // Join every hyphen that is followed by a non-space character to its neighbours
  return text.replace(/(\S)-(?=\S)/g, '$1&#8288;-&#8288;');
}
Try this CSS:
word-break: break-all;
-webkit-hyphens:none;
-moz-hyphens: none;
hyphens: none;

Line Break in XML? [duplicate]

I'm a beginner in web development, and I'm trying to insert line breaks in my XML file.
This is what my XML looks like:
<musicpage>
<song>
<title>Song Title</title>
<lyric>Lyrics</lyric>
</song>
<song>
<title>Song Title</title>
<lyric>Lyrics</lyric>
</song>
<song>
<title>Song Title</title>
<lyric>Lyrics</lyric>
</song>
<song>
<title>Song Title</title>
<lyric>Lyrics</lyric>
</song>
</musicpage>
I want to have line breaks in between the sentences for the lyrics. I tried everything from /n, &#xA; and other codes similar to it, PHP parsing, etc., and nothing works! I have been googling for hours and can't seem to find the answer. I'm using the XML to insert data into an HTML page using JavaScript.
Does anyone know how to solve this problem?
And this is the JS code I used to insert the XML data to the HTML page:
<script type="text/javascript">
if (window.XMLHttpRequest) {
xhttp=new XMLHttpRequest();
} else {
xhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
xhttp.open("GET","xml/musicpage_lyrics.xml",false);
xhttp.send("");
xmlDoc=xhttp.responseXML;
var x=xmlDoc.getElementsByTagName("song");
for (var i=0;i<x.length;i++) {
document.write("<p class='msg_head'>");
document.write(x[i].getElementsByTagName("title")[0].childNodes[0].nodeValue);
document.write("</p><p class='msg_body'>");
document.write(x[i].getElementsByTagName("lyric")[0].childNodes[0].nodeValue);
document.write("</p>");
}
</script>
@icktoofay was close with the CDATA:
<myxml>
<record>
<![CDATA[
Line 1 <br />
Line 2 <br />
Line 3 <br />
]]>
</record>
</myxml>
In XML a line break is a normal character. You can do this:
<xml>
<text>some text
with
three lines</text>
</xml>
and the contents of <text> will be
some text
with
three lines
If this does not work for you, you are doing something wrong. Special "workarounds" like encoding the line break are unnecessary. Stuff like \n won't work, on the other hand, because XML has no escape sequences*.
* Note that &#xA; is the character entity that represents a line break in serialized XML. "XML has no escape sequences" refers to the situation when you interact with a DOM document, setting node values through the DOM API.
This is where neither &#xA; nor things like \n will work, but an actual newline character will. How this character ends up in the serialized document (i.e. "file") is up to the API and should not concern you.
Since you seem to wonder where your line breaks go in HTML: Take a look into your source code, there they are. HTML ignores line breaks in source code. Use <br> tags to force line breaks on screen.
Here is a JavaScript function that inserts <br> into a multi-line string:
function nl2br(s) { return s.split(/\r?\n/).join("<br>"); }
Alternatively you can force line breaks at new line characters with CSS:
div.lines {
white-space: pre-line;
}
Just use <br> at the end of your lines.
At the end of your lines, simply add the following special character: &#xD;
That special character defines the carriage-return character.
In the XML: use literal line-breaks, nothing else needed there.
The newlines will be preserved for Javascript to read them [1]. Note that any indentation-spaces and preceding or trailing line-breaks are preserved too (the reason you weren't seeing them is that HTML/CSS collapses whitespace into single space-characters by default).
Then the easiest way is: In the HTML: do nothing, just use CSS to preserve the line-breaks
.msg_body {
white-space: pre-line;
}
But this also preserves your extra lines from the XML document, and doesn't work in IE 6 or 7 [2].
So clean up the whitespace yourself; this is one way to do it (linebreaks for clarity - Javascript is happy with or without them [3]) [4]
[get lyric...].nodeValue
.replace(/^[\r\n\t ]+|[\r\n\t ]+$/g, '')
.replace(/[ \t]+/g, ' ')
.replace(/ ?([\r\n]) ?/g, '$1')
and then preserve those line-breaks with
.msg_body {
  white-space: pre; /* for IE 6 and 7 */
  white-space: pre-wrap; /* or pre-line */
}
or, instead of that CSS, add a .replace(/\r?\n/g, '<br />') after the other JavaScript .replaces.
(Side note: Using document.write() like that is also not ideal and sometimes vulnerable to cross-site scripting attacks, but that's another subject. In relation to this answer, if you wanted to use the variation that replaces with <br>, you'd have to escape <,&(,>,",') before generating the <br>s.)
--
[1] reference: sections "Element White Space Handling" and "XML Schema White Space Control" http://www.usingxml.com/Basics/XmlSpace#ElementWhiteSpaceHandling
[2] http://www.quirksmode.org/css/whitespace.html
[3] except for a few places in Javascript's syntax where its semicolon insertion is particularly annoying.
[4] I wrote it and tested these regexps in Linux Node.js (which uses the same Javascript engine as Chrome, "V8"). There's a small risk some browser executes regexps differently. (My test string (in javascript syntax) "\n\nfoo bar baz\n\n\tmore lyrics \nare good\n\n")
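Putting the pieces of this answer together (clean up the whitespace, escape, then add the <br /> tags) might look like the sketch below. The function names `escapeHtml` and `lyricsToHtml` are made up for illustration; the three clean-up regexps are the ones shown above.

```javascript
// Escape the characters that are special in HTML text and attribute values.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: normalise whitespace, escape, then turn newlines into <br />.
function lyricsToHtml(raw) {
  const cleaned = raw
    .replace(/^[\r\n\t ]+|[\r\n\t ]+$/g, "") // trim leading/trailing whitespace
    .replace(/[ \t]+/g, " ")                  // collapse runs of spaces and tabs
    .replace(/ ?([\r\n]) ?/g, "$1");          // drop spaces around line breaks
  // Escaping must happen before the <br /> tags are added, or they would be escaped too.
  return escapeHtml(cleaned).replace(/\r?\n/g, "<br />");
}

console.log(lyricsToHtml("\n\nfoo bar baz\n\n\tmore lyrics \nare good\n\n"));
// foo bar baz<br /><br />more lyrics<br />are good
```

The result is meant to be assigned to an element's innerHTML; with the CSS variant instead, the cleaned string could be set as plain text and no escaping would be needed.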
<song>
<title>Song Title</title>
<lyrics>
<line>This is the very first line</line>
<line>Number two and I'm still feeling fine</line>
<line>Number three and a pattern begins</line>
<line>Add lines like this and everyone wins!</line>
</lyrics>
</song>
(Sung to the tune of Home on the Range)
If it was mine I'd wrap the choruses and verses in XML elements as well.
If you use CDATA, you could embed the line breaks directly into the XML I think. Example:
<song>
<title>Song Title</title>
<lyric><![CDATA[Line 1
Line 2
Line 3]]></lyric>
</song>
<description><![CDATA[first line<br/>second line<br/>]]></description>
If you are using CSS to style (Not recommended.) you can use display:block;, however, this will only give you line breaks before and after the styled element.

Render tab characters in HTML [duplicate]

I have to render some text to a web page. The text is coming from sources outside my control and it is formatted using newlines and tab characters.
New lines (\n) can be replaced by br tags, but what about preserving tabs? A brief search reveals there is no way to directly render tab characters in HTML.
Why not just wrap the content in a <pre> tag? This will handle the \n as well as the \t characters.
An alternative to the non-breaking space would be the em space (&emsp; or &#8195;). It is usually rendered as a longer space, if that is an advantage.
A Quick & Dirty Way
For a quick fix, you can use the xmp tag to stop the browser from collapsing whitespace. The xmp tag contains text that should be rendered uninterpreted (and in a monospaced font).
The problem is that xmp tags have been deprecated since HTML3.2, and have been dropped from the HTML5 spec altogether. In practice, browsers still support xmp tags, so they can still be useful, but not in production.
The Proper Way
Tabs are for tabulating data. The proper way to tabulate data in HTML is to use the table tag. Every line in your original string translates to a row in the table, while each tab in the original string starts a new (left-aligned) cell in the table.
Imagine you had this (tab-aligned) string to begin with:
Spam 1.99
Cheese 2.99
Translated to HTML, that string would look like this:
<table>
<tr> <td> Spam </td> <td> 1.99 </td> </tr>
<tr> <td> Cheese </td> <td> 2.99 </td> </tr>
</table>
Note: If you wrapped the tab-aligned string in xmp tags and styled the HTML table to look like plain text, the rendered results would be the same.
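The translation from tab-aligned text to table rows can be automated. A minimal sketch, assuming one row per line and one cell per tab-separated field; the function name `tabsToTable` is hypothetical:

```javascript
// Escape the characters that are special in HTML text content.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: one <tr> per non-empty line, one <td> per tab-separated field.
function tabsToTable(text) {
  const rows = text
    .split(/\r?\n/)
    .filter((line) => line.length > 0) // skip empty lines
    .map((line) => {
      const cells = line.split("\t").map((cell) => "<td>" + escapeHtml(cell) + "</td>");
      return "<tr>" + cells.join("") + "</tr>";
    });
  return "<table>" + rows.join("") + "</table>";
}

console.log(tabsToTable("Spam\t1.99\nCheese\t2.99"));
// <table><tr><td>Spam</td><td>1.99</td></tr><tr><td>Cheese</td><td>2.99</td></tr></table>
```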
Replace \t with &nbsp;.
Each space you want will be a &nbsp;.
As pointed out, this isn't completely correct, as it only pretends to be a tab: HTML doesn't actually format a tab as you would expect.
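If a fixed number of spaces per tab looks wrong because columns stop lining up, one option is to pad each tab out to the next tab stop. This is only a sketch: `expandTabs` and the default tab width of 4 are assumptions, and the alignment only holds in a monospaced font.

```javascript
const NBSP = "\u00A0"; // the character that &nbsp; stands for

// Hypothetical helper: replace each tab with enough non-breaking spaces
// to reach the next tab stop (every `tabWidth` columns).
function expandTabs(text, tabWidth = 4) {
  let out = "";
  let column = 0;
  for (const ch of text) {
    if (ch === "\t") {
      const pad = tabWidth - (column % tabWidth);
      out += NBSP.repeat(pad);
      column += pad;
    } else {
      out += ch;
      column = ch === "\n" ? 0 : column + 1; // a newline starts a fresh row
    }
  }
  return out;
}

console.log(expandTabs("a\tb") === "a" + NBSP.repeat(3) + "b"); // true
```

The result contains real U+00A0 characters, so it can be set as text content directly; to emit markup instead, each one could be written out as &nbsp;.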
If you're already replacing line breaks, why not do the same for tabs...?
$text = str_replace("\t", '&nbsp;&nbsp;&nbsp;&nbsp;', $text);
&emsp;, &ensp;, &#8195; or &#8194; can be used.
W3 says little about this...
The character entities &ensp; and &emsp; denote an en space and an em space respectively, where an en space is half the point size and an em space is equal to the point size of the current font.
Read More at W3.org for HTML3
Read More at W3.org for HTML4
Even more at Wikipedia (about spaces)