Moving a question from the old MSDN forum here as it appears to be still open.
https://social.msdn.microsoft.com/Forums/en-US/ce8c4eae-ad14-4835-8537-fc3870538bbe/translator-api-notranslate-trimming-leading-space?forum=microsofttranslator
Is this a known bug, or intentional for any reason?
Are there any workarounds for this issue so that the API does not strip white spaces?
Translator trims leading and trailing space, and compresses any other white space to a single space. This is by design. Translator needs to move the words around freely to form the newly composed sentence, and wouldn't know what to do with the extra white space.
A workaround would be to trim in your code before translation, and then restore the trimmed off pieces afterwards, depending on the context.
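The trim-and-restore workaround described above can be sketched in Python. This is only an illustration: `translate_preserving_edges` is a hypothetical helper, and `translate` stands in for whatever function actually calls the Translator API (it is not part of any real SDK). It restores leading and trailing whitespace only; runs of whitespace inside the text would still be compressed by the service.

```python
import re
from typing import Callable


def translate_preserving_edges(text: str, translate: Callable[[str], str]) -> str:
    """Translate `text` while keeping its leading and trailing whitespace.

    `translate` is a placeholder for the real API call (an assumption).
    """
    # Split into leading whitespace, core text, and trailing whitespace.
    match = re.fullmatch(r"(\s*)(.*?)(\s*)", text, flags=re.DOTALL)
    leading, core, trailing = match.groups()
    if not core:
        # Whitespace-only or empty input: nothing to translate.
        return text
    # Send only the trimmed core to the service, then restore the edges.
    return leading + translate(core) + trailing
```

For example, with `str.upper` standing in for the translation call, `translate_preserving_edges("  hello world \n", str.upper)` keeps the two leading spaces and the trailing space and newline.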
I've run out of ideas about why this happens and how to avoid it.
I compose HTML email, and sometimes Gmail breaks the layout.
It may insert random characters (basically a space character), which leads to unpredictable visual artefacts like those in the screenshot:
The area outlined in red contains the HTML entity ▼ with a space character artificially inserted by Gmail. The other down-arrow symbols are unchanged.
I've found the soft hyphen character (U+00AD SHY) very useful but I am wondering if there is the same thing that will tell the browser where to break long words for wrapping without adding any character at all?
For example, let's say you have a narrow column in HTML with newspaper justification and there is a long URL explicitly in the text itself. You could add the soft/shy hyphen I mentioned, but then when a user copies and pastes the URL it will contain those dash characters. An ideal solution would give the same visual result without a hyphen character, so that the user can copy and paste the long word(s).
Thoughts or suggestions?
I tried searching for this but most of what I come up with is non-breaking space characters and essentially I am looking for the opposite.
UPDATE: I found the ZERO-WIDTH SPACE (U+200B) but it still has the problem that the character is preserved during copy&paste into the address bar so the results are even more confusing to the end user.
You want the HTML5 tag <wbr>, which is specified to do exactly what you are asking for.
If you can't rely on HTML5, U+200B ZERO WIDTH SPACE (&#8203;) should also work.
(The effects of copying text out of an HTML document, unfortunately, are underspecified. If <wbr> doesn't do what you want upon copy-and-paste, you might want to bring it up with the WHATWG; the easiest way to do that is probably to file a GitHub issue on the spec.)
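As a rough sketch of how <wbr> might be applied to a long URL when generating the page, the Python helper below inserts a <wbr> after common URL delimiters. The function name `add_break_hints` and the choice of delimiters are assumptions for illustration; whether the pasted text stays clean still depends on how the browser handles <wbr> on copy.

```python
import html
import re


def add_break_hints(url: str) -> str:
    """Return HTML text for `url` with a <wbr> after each '/', '?', '&', '=' or '.'."""
    # With one capturing group, re.split returns text and delimiters
    # alternately, so the delimiters sit at the odd indices.
    pieces = re.split(r"([/?&=.])", url)
    out = []
    for index, piece in enumerate(pieces):
        # Escape every piece so that '&' and '<' are safe inside HTML text.
        out.append(html.escape(piece))
        if index % 2 == 1:
            out.append("<wbr>")
    return "".join(out)
```

Removing every `<wbr>` from the result gives back the HTML-escaped URL, so no visible character is added to the text itself.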
Is there an HTML character that, on all (major) browsers (plus IE8 sadly) displays nothing and doesn't add any extra space?
So, an alternative to &nbsp; but which doesn't add whitespace to the page, and which won't ever show up as an ugly "unrecognised character" marker or ?.
Why: in my case, I'm trying to work around a problem on an old, proprietary CMS that is removing empty but necessary HTML elements that are required because other parts of the system will fill them dynamically.
Imagine something like (simplified trivial example) <span class="placeholder" data-type="username"></span> which is populated with a user's username if a user is logged in - but this old-school CMS sees it as being empty and removes it.
There seem to be two options that mostly fit the bill. They seem to reliably not show anything when in a <span>, but they (particularly the second option) might have a minor effect on copy/paste and word breaking in some cases.
Zero-width space
aka &#8203;, which behaves the same as the (now in HTML5) <wbr> - used to make words break at certain points without changing the display of the words.
<h1>This text is full<span>&#8203;</span> of spans with char<span>&#8203;</span>acte<span>&#8203;</span>rs that affe<span>&#8203;</span>ct word brea<span>&#8203;</span>king but don't show up</h1>
<h1>Especially in das super<span>&#8203;</span>doupercrazy<span>&#8203;</span>long<span>&#8203;</span>worden.</h1>
Seems to work fine on modern browsers and IE7+ (not tested on IE6).
Soft hyphen
aka &shy; - like a zero-width space but (in theory) adds a hyphen when it breaks a word across a line.
<h1>This text is full<span>&shy;</span> of spans with char<span>&shy;</span>acte<span>&shy;</span>rs that affe<span>&shy;</span>ct word brea<span>&shy;</span>king but don't show up</h1>
<h1>Especially in das super<span>&shy;</span>doupercrazy<span>&shy;</span>long<span>&shy;</span>worden.</h1>
<h1>Example where das superdoupercrazylongword contains no spans.</h1>
Fine on modern browsers and IE7+ (not tested on IE6), though as some comments note, there are issues with these turning into regular hyphens when copied and pasted. For example, here's how it pastes from Chrome to Notepad on Windows 8.1:
Within a span, it seems to never add a hyphen (but still better to use zero-width spaces if possible).
Edit: I found an older SO answer discussing these as a solution to a different problem which suggests these are robust except for possible copy/paste quirks.
The only other issue with these I could find in research is that apparently some search engines may treat words containing these as being split (e.g. awesome might match searches for awe and some instead of awesome).
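If the copy/paste or search-matching quirks matter, one possible mitigation is to strip these invisible characters from text before indexing it or after it has been pasted into a form. The sketch below is a minimal Python illustration; `strip_break_hints` is a hypothetical helper, and it assumes the characters arrive as real Unicode code points rather than as unparsed entities. It cannot undo a soft hyphen that was already converted to a regular hyphen on paste.

```python
# Code points to delete: mapping to None removes them in str.translate.
INVISIBLE_BREAK_HINTS = {
    0x00AD: None,  # U+00AD SOFT HYPHEN
    0x200B: None,  # U+200B ZERO WIDTH SPACE
}


def strip_break_hints(text: str) -> str:
    """Remove soft hyphens and zero-width spaces from `text`."""
    return text.translate(INVISIBLE_BREAK_HINTS)
```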
There are two characters that are graphic characters but defined to be zero width: U+200B ZERO WIDTH SPACE and U+FEFF ZERO WIDTH NO-BREAK SPACE. The former acts like a space character, so that it is a separator between words and allows line breaking in formatting, whereas the latter explicitly forbids line breaks. It depends on the purpose and context which one you should use. They can be represented in HTML as &#x200b; and &#xfeff;.
These characters work well in most browsing situations. However, in IE 6, they tend to be rendered as small rectangles, since IE 6 does not know these characters and tries to render them as if they were graphic characters (which lack glyphs).
There are also control characters that are allowed in HTML, such as U+200E LEFT-TO-RIGHT MARK and U+200D ZERO WIDTH JOINER. They have no rendering as such, though they may affect rendering of graphic characters, e.g. by setting writing direction, affecting ligature behavior, etc. Due to the possibility of such effects, it might be risky to use them as “dummy” characters.
I have seen &nbsp in HTML and can't quite tell what it does other than create some whitespace. I am wondering what exactly it does and when it should be used?
&nbsp; (it should have a semicolon on the end) is an entity for a non-breaking space.
Use it between two words that should not have a line break inserted between them by word wrapping.
There is a good explanation about when this is appropriate grammar on the English StackExchange.
It is sometimes abused to create horizontal space between content in web pages (since it will not collapse like multiple regular spaces). Padding and margins should usually be used instead of this hack.
One reason for &nbsp; is to insert multiple spaces in a document.
In HTML, multiple whitespace characters are collapsed into one space. This includes tabs and newlines.
If you wanted to display the following:
three   spaces.
You could insert three &nbsp; entities instead of using spaces, like so:
three&nbsp;&nbsp;&nbsp;spaces.
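As a small illustration of that substitution done programmatically, the Python sketch below replaces every space inside a run of two or more spaces with a non-breaking space entity, leaving single spaces alone so that normal word wrapping still works. The helper name `preserve_space_runs` is made up for this example, and it assumes the input is otherwise already safe to place in HTML.

```python
import re


def preserve_space_runs(text: str) -> str:
    """Replace each space in a run of two or more spaces with &nbsp;."""
    # Single spaces are left alone; only runs that HTML would collapse change.
    return re.sub(r" {2,}", lambda m: "&nbsp;" * len(m.group()), text)
```

Called on the text "three   spaces." (with three spaces in the middle), it returns the string with three non-breaking space entities in place of the spaces.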
Edit: It's worth mentioning that &nbsp; is more of a historical artifact than anything else. Just about every use for it that is mentioned in the answers to this question has a better alternative means to accomplish that goal. However, &nbsp; is still with us, and these are some of the things people have used it for.
See also: http://www.sightspecific.com/~mosh/www_faq/nbsp.html
I don't know if this answers your question or not and certainly this answer is not of the caliber already provided by others, but the beauty of a discussion thread or Q&A site is the diversity of experience that might be found in it. So, on that note, I'll share with you what I've used &nbsp; for. (To be perfectly honest, 24 hours ago, &nbsp; was something I had never even heard of.)
Here's how I used &nbsp;. I was posting something using Markdown and I had a very simple two-item bulleted list. For the life of me I could not get the spacing before this list and after to look symmetrical. So, I did a web search and somehow ended up taking a look at this thread.
Before using &nbsp; the paragraph that followed bullet point #2 collapsed the spacing between the bulleted point and the text, making it look as if the paragraph had something to do with bullet #2, specifically (which was not the case). I tried a lot of different things that I can't even remember now, but the one thing that ultimately worked was insertion of &nbsp;.
Since then, I've been seeing all sorts of posts that indicate some controversy over its use, but for non-coders who need to wrangle out of an unsightly/misleading formatting issue, &nbsp; is a very quick and useful fix.
I'm writing note-taking software in PHP, and my notes often include code. When I fetch a note from the database, it seems to collapse all whitespace, so code blocks look ugly. (I already run nl2br() on it; I mean horizontal space.)
What would be the most efficient way to deal with this? I think the database entry keeps the spaces, so would replacing all spaces with &nbsp; be the only solution on the PHP display side? (That seems ugly for very long entries.) What are your thoughts on how I can accomplish this, bearing in mind the code may be 1-16M characters long?
It shouldn't be collapsing all whitespace. Try outputting it inside <pre> tags to see that white space.
What code are you storing in the database? HTML? PHP? This will determine the best solution to your problem.
Different column types will or won't preserve characters like new lines, carriage returns or tabs. I use Text, using a UTF-8 collation.
At a very basic level look at nl2br() - http://php.net/manual/en/function.nl2br.php
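To make the <pre> suggestion concrete, here is a minimal sketch in Python (the thread itself is about PHP, where htmlspecialchars() plays a similar escaping role). The function name `render_note` is an assumption for illustration. Inside <pre>, the browser keeps both horizontal whitespace and newlines, so on this path no space replacement is needed, and newlines do not have to be converted to <br>.

```python
import html


def render_note(note: str) -> str:
    """Return HTML that shows `note` with its whitespace preserved."""
    # Escape &, <, > and quotes so stored code is displayed, not interpreted.
    escaped = html.escape(note)
    # <pre> preserves runs of spaces, tabs, and line breaks as written.
    return "<pre>" + escaped + "</pre>"
```

Because the escaping is a single pass over the string, this approach should scale to long notes more gracefully than rewriting every space individually.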