Is there an "invisible" hyphen character in Unicode / HTML?

I've found the soft hyphen character (U+00AD SHY) very useful, but I am wondering whether there is an equivalent that tells the browser where it may break long words for wrapping without adding any character at all.
For example, say you have a narrow column in HTML with newspaper-style justification, and there is a long URL written out in the text itself. You could add the soft hyphen I mentioned, but then when a user copies and pastes the URL it will contain those hyphen characters. Ideally the result would look the same but without a hyphen character, so that the user can copy and paste the long word(s) intact.
Thoughts or suggestions?
I tried searching for this, but most of what I come up with is about non-breaking space characters, and I am essentially looking for the opposite.
UPDATE: I found the ZERO WIDTH SPACE (U+200B), but it still has a problem: the character is preserved when the URL is copied and pasted into the address bar, so the result is even more confusing to the end user.

You want the HTML5 tag <wbr>, which is specified to do exactly what you are asking for.
If you can't rely on HTML5, U+200B ZERO WIDTH SPACE (&#8203;) should also work.
(The effects of copying text out of an HTML document are, unfortunately, underspecified. If <wbr> doesn't do what you want upon copy-and-paste, you might want to bring it up with the WHATWG; the easiest way to do that is probably to file a GitHub issue on the spec.)
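If the HTML is generated server-side, the <wbr> approach can be automated. A minimal Ruby sketch, assuming you want break opportunities after common URL separators; `link_with_wbr` and the chosen separator set are hypothetical, not part of any library. It keeps the unmodified URL in `href`, so following the link or copying its address is unaffected; what lands on the clipboard when the visible text is selected still depends on the browser.

```ruby
require "cgi"

# Hypothetical helper: builds a link whose visible text may wrap after common
# URL separators, using <wbr> so no extra visible character is added.
def link_with_wbr(url)
  # Escape first so characters like "&" or "<" in the URL cannot break the markup.
  escaped = CGI.escapeHTML(url)
  # Allow a break after "/", ".", "-", "_", "?", "=" and an escaped "&".
  visible = escaped.gsub(%r{([/.\-_?=]|&amp;)}, '\1<wbr>')
  # The href keeps the unmodified (escaped) URL, so the link target is unchanged.
  %(<a href="#{escaped}">#{visible}</a>)
end

puts link_with_wbr("https://example.com/some-really/long_path?x=1&y=2")
```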

Related

Is there a way to set word-breaks at suffixes and prefixes?

Is there a way to tell the html or css to word-break a word not by individual letters but rather by suffixes and prefixes?
By setting a negative word-spacing: -#px I can accomplish the look of a single word that breaks where I want. The downside is that unless the font is monospace, I have to calculate manually how much word-spacing to remove. I also don't know whether this method hurts screen readers, since they would read it as two or more words rather than as a single word.
Assuming you control the HTML output, you can insert soft hyphens (&shy;) in the correct places (after prefixes, before suffixes). Soft hyphens allow a word to break at the specified place, and show an otherwise invisible hyphen or dash when it does. There is no way to have a browser do this automatically; you need to specify where to place soft hyphens, either manually or by running the text through some scripting or programming language.
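A rough Ruby sketch of the scripted route, assuming a small hand-picked list of prefixes and suffixes. The lists and the `shy_at_affixes` name are illustrative assumptions; real hyphenation needs a dictionary or proper morphological rules.

```ruby
# Illustrative affix lists (assumptions, not a real morphological rule set).
PREFIXES = %w[un re inter].freeze
SUFFIXES = %w[able ness ment].freeze

# Hypothetical helper: inserts &shy; after a known prefix and before a known
# suffix, so the browser may only break (with a hyphen) at those points.
def shy_at_affixes(word, prefixes: PREFIXES, suffixes: SUFFIXES)
  prefix = prefixes.find { |p| word.start_with?(p) } || ""
  suffix = suffixes.find { |s| word.end_with?(s) } || ""
  # Leave the word alone if the affixes would not leave a non-empty stem.
  return word if prefix.length + suffix.length >= word.length

  stem = word[prefix.length...(word.length - suffix.length)]
  # Join the non-empty parts with soft hyphens.
  [prefix, stem, suffix].reject(&:empty?).join("&shy;")
end

puts shy_at_affixes("unbreakable")
```

For example, `shy_at_affixes("unbreakable")` returns `"un&shy;break&shy;able"`. Naive prefix matching also produces false positives such as `re&shy;ad`, which is why a manual or dictionary-backed list matters.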
If you have a fixed width and font, you could add a dash to the word to break it. There is no other built-in solution.

How to insert enter within the text of html code?

I am writing up a resume on a website, and line breaks (Enter), extra spaces, and other characters do not show up. It's a poorly coded website. Is there a way to insert a line break without accessing the HTML source or including HTML elements? If I insert a <br> element, it is escaped to &lt;br&gt; and shows up as literal text.
&#10; (the numeric character reference for a line feed)
Chava G's suggestion might work, but it really shouldn't (it would indicate a serious XSS vulnerability).
You can try something similar, though, with a no-break space Unicode character.
This is the same thing he is suggesting, except that I am suggesting you use the actual character itself rather than a character reference.
I think you can just copy it from the wiki page, right between the quotes where it says non-breaking space (" ").
https://en.wikipedia.org/wiki/Non-breaking_space
This route is a bit unreliable as well, though; non-ASCII characters sometimes get mangled along the way.
You may be better off just accepting the limitations.
HTML collapses extra whitespace. Use <br /> in place of Enter, and &nbsp; for the space character.
You can also use the <pre> element. Text inside it is shown preformatted, preserving spaces and line breaks.
However, since you are using a third-party website here, Wedstrom is right that the above shouldn't work. There is no way for you to change or add HTML code on another party's site, and there shouldn't be. Try to find a way to write up your resume using the functionality that is readily provided for you (or use your own text editor...).
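Where you do control the HTML, the <br /> and &nbsp; substitution can be done mechanically. A minimal Ruby sketch; `text_to_html` is a hypothetical name. It escapes the text first, which is presumably also what the site in the question does to a typed <br>, turning it into literal text.

```ruby
require "cgi"

# Hypothetical helper: turns plain text into HTML that keeps the author's
# line breaks (<br />) and runs of spaces (&nbsp;).
def text_to_html(text)
  html = CGI.escapeHTML(text)                               # make <, >, & and quotes literal
  html = html.gsub(/ {2,}/) { |run| "&nbsp;" * run.length } # keep runs of 2+ spaces
  html.gsub(/\r?\n/, "<br />\n")                            # Enter becomes <br />
end

puts text_to_html("Skills:  Ruby\nExperience:  5 years")
```

Single spaces are left as ordinary spaces so the text can still wrap normally; only runs of two or more become &nbsp;.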

Is there an HTML character that is blank (including no whitespace) on all browsers?

Is there an HTML character that, on all (major) browsers (plus IE8, sadly), displays nothing and doesn't add any extra space?
So, an alternative to &nbsp; that doesn't add whitespace to the page and won't ever show up as an ugly "unrecognised character" marker or ?.
Why: in my case, I'm trying to work around a problem on an old, proprietary CMS that is removing empty but necessary HTML elements that are required because other parts of the system will fill them dynamically.
Imagine something like (simplified trivial example) <span class="placeholder" data-type="username"></span> which is populated with a user's username if a user is logged in - but this old-school CMS sees it as being empty and removes it.
There seem to be two options that mostly fit the bill. They reliably show nothing inside a <span>, but they (particularly the second option) might have a minor effect on copy/paste and word breaking in some cases.
Zero-width space
&#8203; (aka &#x200B;), which behaves like the <wbr> element (now part of HTML5): it lets words break at certain points without changing how they are displayed.
<h1>This text is full<span>&#8203;</span> of spans with char<span>&#8203;</span>acte<span>&#8203;</span>rs that affe<span>&#8203;</span>ct word brea<span>&#8203;</span>king but don't show up</h1>
<h1>Especially in das super<span>&#8203;</span>douper&#8203;crazy<span>&#8203;</span>long<span>&#8203;</span>worden.</h1>
Seems to work fine on modern browsers and IE7+ (not tested on IE6).
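If you can pre-process the markup before it goes into the CMS, the zero-width space can be added automatically. A minimal Ruby sketch, assuming the placeholders are written exactly as empty `<span ...></span>` pairs; `protect_empty_spans` is a hypothetical name, and a regex is a blunt tool compared with an HTML parser. Script that later fills the span would typically replace its contents, zero-width space included, though that depends on how it writes the username.

```ruby
# Matches an opening <span ...> tag immediately followed by </span>.
EMPTY_SPAN = %r{(<span\b[^>]*>)(</span>)}

# Hypothetical pre-processing step: puts a zero-width space inside empty
# spans so a CMS that strips "empty" elements keeps them.
def protect_empty_spans(html)
  html.gsub(EMPTY_SPAN, '\1&#8203;\2')
end

puts protect_empty_spans('<span class="placeholder" data-type="username"></span>')
```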
Soft hyphen
&shy; - like a zero-width space, but (in theory) adds a hyphen when it breaks a word across a line.
<h1>This text is full<span>&shy;</span> of spans with char<span>&shy;</span>acte<span>&shy;</span>rs that affe<span>&shy;</span>ct word brea<span>&shy;</span>king but don't show up</h1>
<h1>Especially in das super<span>&shy;</span>douper&shy;crazy<span>&shy;</span>long<span>&shy;</span>worden.</h1>
<h1>Example where das super&shy;douper&shy;crazy&shy;longword contains no spans.</h1>
Fine on modern browsers and IE7+ (not tested on IE6), though, as some comments note, these can turn into regular hyphens when copied and pasted, for example when pasting from Chrome into Notepad on Windows 8.1.
Within a span, it never seems to add a hyphen (but it is still better to use zero-width spaces if possible).
Edit: I found an older SO answer discussing these as a solution to a different problem which suggests these are robust except for possible copy/paste quirks.
The only other issue with these I could find in research is that some search engines may treat words containing these characters as split (e.g. awe&shy;some might match searches for awe and some instead of awesome).
There are two characters that are graphic characters but defined to be zero width: U+200B ZERO WIDTH SPACE and U+FEFF ZERO WIDTH NO-BREAK SPACE. The former acts like a space character, so it separates words and allows line breaking in formatting, whereas the latter explicitly forbids line breaks. Which one you should use depends on the purpose and context. They can be represented in HTML as &#x200B; and &#xFEFF;.
These characters work well in most browsing situations. However, IE 6 tends to render them as small rectangles, since it does not know these characters and tries to render them as if they were graphic characters (which lack glyphs).
There are also control characters that are allowed in HTML, such as U+200E LEFT-TO-RIGHT MARK and U+200D ZERO WIDTH JOINER. They have no rendering as such, though they may affect rendering of graphic characters, e.g. by setting writing direction, affecting ligature behavior, etc. Due to the possibility of such effects, it might be risky to use them as “dummy” characters.
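On the receiving side, for example when accepting a URL a user has pasted, such characters can be removed before further processing. A minimal Ruby sketch; `strip_invisible` is a hypothetical name, and which characters to drop is an application-specific assumption.

```ruby
# Zero-width and format characters, including those discussed above, that render
# as nothing but can survive copy/paste. Dropping U+200D ZERO WIDTH JOINER also
# breaks emoji sequences, so treat this list as an adjustable assumption.
INVISIBLE_CHARS = "\u00AD\u200B\u200C\u200D\u200E\u200F\u2060\uFEFF".freeze

# Hypothetical sanitizer for pasted text, e.g. a URL.
def strip_invisible(text)
  text.delete(INVISIBLE_CHARS)
end

puts strip_invisible("https://exa\u200Bmple.com/long\u00ADpath")
```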

Optional line-breaking HTML entity that is always invisible

I want an optional line-breaking character that is always invisible that works with the word-wrap: break-word; CSS style.
Here are some specifics. My goal is to break apart long links in reasonable places. These characters are a good place to start: -, ., _, /, \. This is not a Rails-specific question, but I wanted to share some code I'm using now:
module ApplicationHelper
  def with_optional_line_breaks(text)
    text.gsub(%r{([-._/\\])}, '\1&shy;')
  end
end
Here's the problem with the code above: when &shy; takes effect (in a table with word-wrap: break-word;), it is displayed as -. I don't want to see the -; I want a line break without any character shown.
&#8203; is the HTML entity for a Unicode character called the zero-width space (ZWSP).
"In HTML pages, this space can be used as a potential line-break in long words as an alternative to the <wbr> tag."- Zero-width space - Wikipedia
The <wbr> tag also works, as mentioned in Aaron's answer. I think I prefer the HTML entity over the tag because the entity seems simpler: Unicode handles it, not the web browser.
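Applied to the question's helper, the change amounts to swapping the inserted entity. A minimal plain-Ruby sketch under a hypothetical name, `with_zwsp_line_breaks`; it HTML-escapes the input first, which is an addition to the original helper. In a Rails view the result would still need to be marked html_safe for the entity to render rather than be escaped again.

```ruby
require "cgi"

# Hypothetical variant of the question's helper: inserts a zero-width space
# (&#8203;) after -, ., _, / and \ instead of a soft hyphen, so no "-" is
# drawn where the line breaks. The input is HTML-escaped first.
def with_zwsp_line_breaks(text)
  CGI.escapeHTML(text).gsub(%r{([-._/\\])}, '\1&#8203;')
end

puts with_zwsp_line_breaks("http://example.com/some_long-path")
```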
<wbr> looks like it does what you want, but support for it, and for &shy; for that matter, looks very inconsistent. So unfortunately, there may not be a particularly good way to do what you want.
I'll post this as an answer in 2019, although its substance comes entirely from other contributions on this page: use <wbr>. It works well for letting long URLs wrap so they don't break out of their content boxes. Users being able to paste the link you show into a web browser does matter, and support for <wbr> is good in modern browsers, according to caniuse.com and my own quick tests in Chrome and Firefox for Android. I replaced each forward slash with a forward slash followed by <wbr>, and the URLs now wrap nicely.

How does copying and pasting non-breaking spaces compare in different systems/browsers?

Suppose you have an HTML document with non-breaking spaces (&nbsp;). In IE 6-8 running on Windows XP, when you select the non-breaking spaces and copy/paste them, they are pasted as "normal" spaces (U+0020).
Does anyone know of any systems, browsers, or combinations of them that do not exhibit this behavior, i.e. where non-breaking spaces copy and/or paste as non-breaking spaces (U+00A0)?
EDIT: To provide a little more context: the application I'm working on has been localized. I suspect that most North/South American and European systems will behave similarly. I'm somewhat concerned about Asian languages and systems.
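To see what a particular browser, OS, and locale combination actually puts on the clipboard, one option is to paste into a small script and inspect the code points. A minimal Ruby sketch; `space_report` and `codepoints` are hypothetical helper names, and the sample string stands in for real pasted text.

```ruby
# Hypothetical helper: counts ordinary spaces versus no-break spaces.
def space_report(text)
  { regular: text.count(" "), no_break: text.count("\u00A0") }
end

# Hypothetical helper: lists every character as a U+XXXX code point.
def codepoints(text)
  text.each_char.map { |c| format("U+%04X", c.ord) }
end

pasted = "100\u00A0km away" # stand-in for text pasted from the page
p space_report(pasted)
p codepoints(pasted)
```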
While I'm not aware of differences between browsers in how they handle copied/pasted text, I suspect it is actually the operating system's clipboard that is responsible for interpreting the character encoding of an HTML page's text (only guessing here, though).
Either way, I would suggest that your best option to ensure your copied text is interpreted correctly is to include the lang attribute on your page elements (ref: W3C Recommendations). This explicitly sets a locale for a given element if that isn't already clear from your page's content-type declaration in the <head> metadata.
Outside of making sure that your HTML is semantically correct, I can't see how else you would be able to accommodate or predict regional differences.