Sentence Spacing [closed] - html

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
What is the best way to present the additional spacing that should come between sentences (using [X]HTML+CSS)?
<p>Lorem ipsum.  Dolor sit amet.</p>
               ^^ wider than word spacing
Since HTML and XML both require whitespace folding, the above two spaces must behave as a single space.
What options are there? There are a few obvious ones below; what others exist? (Anything in CSS3?) What drawbacks, if any, exist for these, including across different browsers? (How do the non-breaking spaces below interact with line wrapping?)
..ipsum.&nbsp; Dolor..
..ipsum. &nbsp;Dolor..
..ipsum.&nbsp;&nbsp;Dolor..
There's a lot of FUD on the net which claims this was invented for typewriters, but you can see it in documents such as the U.S. Declaration of Independence.  (And yes, I realize you shouldn't follow all the conventions from over two hundred years ago, the DoI is merely a handy example showing this predates typewriters and monospaced fonts.)  Or a typographer claiming that the additional space is distracting—after changing the background color so the example cannot be anything else!
To put it bluntly, while I appreciate opinions and discussion on whether additional spacing should be used or not (which isn't programming related), that is not what I'm asking. Assume this is a requirement: what is the best way to implement it?

You can use white-space: pre-wrap to preserve sequences of spaces, while still wrapping text:
<p style="white-space: pre-wrap;">Lorem ipsum.  Dolor sit amet.</p>
This is not supported in IE until IE 8 in IE 8 mode, nor in Firefox until 3.0.
You could also use &emsp; or &ensp; for spaces one em or one en wide. I do not know how widespread support for these is, but they seem to work in the latest WebKit and Firefox on Mac OS X.
A sequence of two &nbsp; characters will prevent line breaks in that space; that's what &nbsp; means, non-breaking space. The sequence A sentence. &nbsp;Another. causes the &nbsp; to appear on the second line, indenting the text slightly, which is probably undesirable. The sequence A sentence.&nbsp; Another. works fine, allowing line breaks without adding any extra indentation, though if you use it in justified text, with the &nbsp; at the end of the line, it will prevent that line from being properly justified. &nbsp; is intended for cases like writing someone's name, such as Mr. Torvalds, or an abbreviation ending with a period, where typographical convention says you shouldn't split it across lines, to avoid confusing readers into thinking the sentence has ended.
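To make the reasoning above concrete, here is a deliberately simplified JavaScript model of line-break opportunities. It assumes that only ordinary spaces (U+0020) are break points and that the character &nbsp; produces (U+00A0) never is; real browsers follow the Unicode line-breaking algorithm (UAX #14), which is more involved. The function name breakSegments is hypothetical.

```javascript
// Simplified model (assumption): a line may only break at an ordinary
// space (U+0020); a no-break space (U+00A0) glues its neighbours together.
// Returns the unbreakable chunks that could land on separate lines.
function breakSegments(text) {
  return text.split(" ").filter((chunk) => chunk.length > 0);
}

const NBSP = "\u00A0";

// "A sentence. &nbsp;Another." : the nbsp travels with "Another.",
// so if the line wraps there, the next line starts with a slight indent.
const leading = breakSegments("A sentence. " + NBSP + "Another.");

// "A sentence.&nbsp; Another." : the nbsp stays attached to "sentence.",
// so a wrapped line ends with it, which can upset justification.
const trailing = breakSegments("A sentence." + NBSP + " Another.");

// "A sentence.&nbsp;&nbsp;Another." : no break opportunity between the
// two sentences at all.
const glued = breakSegments("A sentence." + NBSP + NBSP + "Another.");

console.log(leading.length, trailing.length, glued.length); // 3 3 2
```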
So, using sequences of &nbsp; is undesirable. Since this is a stylistic effect, I'd recommend using white-space: pre-wrap and accepting that the style will be a bit less than ideal on platforms that don't support it.
edit: As pointed out in the comments, white-space: pre-wrap does not work with text-align: justify. However, I've tested a sampler of different entities using BrowserShots (obnoxious ads, and somewhat flaky and slow, but it's a pretty useful service for the price, which is free). It looks like a wide variety of browsers, on a wide variety of platforms, support &emsp; and &ensp;; a few that don't still render plain spaces, so the result isn't too bad, and only IE 6 on Windows 2000 actually renders them broken, as boxes. BrowserShots doesn't let me choose the exact browser/OS combinations I want, so I can't check whether IE 6 on XP behaves any differently. So that's a plausible answer, as long as you can live with IE 6 on Win2K (and maybe XP) being broken.
Another possible solution would be to find (or create) a font that has a kerning pair for the ". " combination of characters, to kern them more widely apart. With @font-face support in all of the major browsers at this point, including IE back to IE 5.5 (though IE uses a different font format than the other browsers), using your own font is becoming reasonable, and falling back to the user's default font where it is not supported would not break anything.
A final possibility might be to talk the CSS committee into adding a style feature that would allow you to specify that you want wider spacing at the end of sentences (which would be determined by a period followed by a space; acronyms and abbreviations would need an &nbsp; to avoid getting the wider space). The CSS committee is currently discussing adding more advanced typography support, so now might be a good time to start discussing such a feature.

For all you 'antiquated' and 'mono-space-only' naysayers: read a book. Professional publishers have used a single &emsp; between sentences for time immemorial, and THAT is where the monospace two-space standard came from. Learn from history instead of spouting rhetoric with no basis in fact. I have to admit, though, that an &ensp; looks better in most browsers: &emsp; is just too wide. What do you think of the readability of this paragraph? Stack Overflow's editor allows some HTML, and I'm using &ensp; between all sentences.

Wrap each sentence in a span, and style the span perhaps. (Not a great solution).

&nbsp; isn't the correct character to use, semantically speaking. It's a non-breaking space: a space which won't be used as a line break. Perhaps use a space plus one of the other space entities, or a single wider space entity, or (my personal recommendation) don't bother with the antiquated double-space style on your page.

Just wanted to throw out there that if your goal is to override the default browser whitespace implementation to provide "proper" sentence spacing, there is actually some debate as to what constitutes proper spacing. It seems that the double-space "standard" is most likely just a carryover from when typewriters used monospace fonts. Money quote:
The Bottomline: Professional typesetters, designers, and desktop publishers should use one space only. Save the double spaces for typewriting, email, term papers (if prescribed by the style guide you are using), or personal correspondence. For everyone else, do whatever makes you feel good.
Unless you have this as a strict requirement, it does not seem worth the effort to try and "fix." (I realize this is not an answer to your stated question per se, but wanted to make sure that you are aware of this info as it might influence your decision to spend a lot of time on it.)

&nbsp; is the worst possible method, as it disrupts justification. Pre-wrap, as suggested, gives coarse control but can't be justified. There are other space entities like &thinsp; and &ensp;, as well as a bunch of Unicode space characters, that should give somewhat better control and should not break justification. These entities are the best non-CSS solution in my opinion.
For better control you need a CSS solution. You can either span the sentences, the obvious choice, or you can span the space between sentences. The latter to me seems more incorrect, but it is easier to achieve, especially if you have the common two-space typing habit - you can simply search and replace all period-space-space with a span around a space. I have some javascript that does this on the fly for blogger.
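A minimal sketch of the search-and-replace idea, assuming the source text uses the two-space typing habit. It is not the author's Blogger script; the class name sentence-gap and the function name are hypothetical, and it works on a plain string rather than the live DOM, so it does not try to skip markup or attributes.

```javascript
// Replace sentence-ending punctuation followed by two or more spaces with
// the punctuation plus a single space wrapped in a span, so CSS can widen
// only the inter-sentence gaps. Closing quotes or brackets after the
// period are not handled by this simple pattern.
function markSentenceGaps(html) {
  return html.replace(/([.!?]) {2,}/g, '$1<span class="sentence-gap"> </span>');
}

console.log(markSentenceGaps("Lorem ipsum.  Dolor sit amet."));
// prints Lorem ipsum.<span class="sentence-gap"> </span>Dolor sit amet.
```

Pair the output with a rule such as .sentence-gap { word-spacing: 0.5em; }, following the word-spacing approach described next. A single space after an abbreviation like "Mr. Torvalds" is left alone because the pattern requires at least two spaces.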
Don't use the box model (padding-right), as it will break the right margin of fully justified text (and even if the text is not fully justified, it causes lines to wrap "early"). If you are spanning the space between sentences, you can just alter the word-spacing on these elements. If you are wrapping sentences, you can set your paragraph or other container to have bigger word-spacing and then set the sentences back to normal, or you can do it in one step with the :after pseudo-element:
.your_sentence_class:after { content:" "; word-spacing:0.5em; }

Related

What weird non-font is this text?

This sounds like the dumbest question, but what font is used on this webpage?
http://aquey.info/loaded-broccoli-potato-soup/
If it copy-pastes the same, then it's like this text here:
𝘮𝘦𝘥𝘪𝘶𝘮 𝘴𝘪𝘻𝘦𝘥 𝘤𝘢𝘳𝘳𝘰𝘵𝘴
I checked using DevTools, of course, but I don't think it's really a ... font? If I copy-paste the text into Gmail and choose "remove formatting", the text still looks like same, like Gmail doesn't see it as text. Gmail also doesn't spellcheck within the text. Notepad++ also doesn't un-format the text and View>Summary counts each letter as a word.
I'm seeing if it's possible to read this text in javascript (that's the programming bit), but right now I just want to understand what it is.
They are Unicode glyphs, specifically from the Unicode block Mathematical Alphanumeric Symbols.
As the name implies, they are intended for use within mathematics contexts but are commonly abused in places like social media where other formatting controls are not available to end users.
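For the JavaScript side of the question, a minimal sketch: these characters carry Unicode compatibility decompositions, so String.prototype.normalize("NFKC") folds them back to ordinary letters. The helper names are hypothetical, and the range check assumes the block spans U+1D400 to U+1D7FF. Because these code points lie outside the Basic Multilingual Plane, each one is a surrogate pair (length 2) in a JavaScript string, which is one reason naive tools count and spellcheck them oddly.

```javascript
// True if ch's first code point is in Mathematical Alphanumeric Symbols.
function isMathAlphanumeric(ch) {
  const cp = ch.codePointAt(0);
  return cp >= 0x1d400 && cp <= 0x1d7ff;
}

// for...of iterates by code point, not by UTF-16 code unit.
function countMathAlphanumerics(text) {
  let n = 0;
  for (const ch of text) {
    if (isMathAlphanumeric(ch)) n++;
  }
  return n;
}

// NFKC applies compatibility decompositions, mapping styled letters such
// as MATHEMATICAL SANS-SERIF ITALIC SMALL M back to a plain "m".
function toPlainText(text) {
  return text.normalize("NFKC");
}

const styled = "𝘮𝘦𝘥𝘪𝘶𝘮 𝘴𝘪𝘻𝘦𝘥 𝘤𝘢𝘳𝘳𝘰𝘵𝘴";
console.log(countMathAlphanumerics(styled)); // 18
console.log(toPlainText(styled)); // medium sized carrots
```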
This may go without saying (and, I fully admit, is outside the scope of the question), but it's worth mentioning to future readers that using such glyphs in any context other than their intended one is counterproductive, as they pose a huge accessibility problem. It's especially pointless in this particular context, since styling the text with CSS would produce a very similar visual effect while keeping it usable for screen readers.

Using `<i>` vs parentheses [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
When writing for the Web, should authors use the <i> element with a class, or should they use regular parentheses to indicate parenthetical phrases?
Example:
<p>I was walking to my car <i class="paren">on the other side of town</i>, when...</p>
<p>I was walking to my car (on the other side of town), when...</p>
Accompanied by:
.paren {
font-style: normal;
&::before {content: '(';}
&::after {content: ')';}
}
Just wondering which is the more semantically appropriate method. Of course the easier choice would be to simply type shift + 9 instead of having to mark it up with HTML, but according to The Draft, the <i> element should be used to indicate a change in voice, which is exactly what parenthetical phrases are: typically preceded by a short pause and a lowering of the volume and pitch of one's voice.
The same question applies to phrases surrounded by em dashes—which serve a similar purpose as phrases in parentheses—although em dashes can be much more flexible.
Parentheses already have meaning in English and you could argue that they are "markup" for a phrase. Parentheses indicate the text's relationship to the enclosing sentence/paragraph and provide a presentation cue (how to read/speak the text). In other words, adding HTML markup doesn't give you much semantic value that isn't already present by virtue of the plain text.
The extra markup serves a purpose if you want to differentiate the phrase from the rest of the paragraph or you need to add attributes that apply only to specific text. Putting tags around text allows you to work with that text in a clear fashion; you can style it, access it with script, parse it with a screen reader, etc. I would probably suggest a span instead of i; even though i has a broad potential usage, parenthesized text seems contrary to the common expectation.
Even though you can set it apart, parenthesized text probably does not always need to be set apart.1
Lastly, using CSS to provide the parentheses seems like excess work for the developer (and anyone reading it afterwards). It adds a level of indirection to a meaning that was already present.
1: There are cases where punctuation does not provide enough structure and should be augmented with markup. A simple example is a telephone number, as described here in 3.3.2.
http://www.w3schools.com/tags/tag_i.asp
I think this is pretty much correct. For accessibility reasons (such as software that reads out text), among others, if parentheses indicate a change in tone in the text, then the i tag is the appropriate element. As a side note, as the original poster seems to know: it no longer means "italics" in HTML5.
Just to beat this to death: most text readers that can handle the i tag probably don't make a sound that conveys de-emphasis or whatever you might be trying to indicate with parentheses, so if it's really important to get it just right, I'd suggest creating a class called de-emphasis that does not generate parentheses, and using literal parentheses as well. It sounds like overkill, but here's the reasoning.
De-emphasis can be used without parentheses in other places.
The i tag probably does not sound like de-emphasis to text readers.
Visually impaired users might want to know that actual parentheses are there.
So, the crazy suggestion I'll make is this:
<p>I was walking to my car (<i class="de-emphasis">on the other side of town</i>), when...</p>
I do not have enough reputation to comment or I would have commented, so my final recommendation is an opinion, not a fact or anything like that. Just try to keep accessibility in mind when you make decisions about this kind of stuff.

When to use &nbsp

I have seen &nbsp in html and can't quite tell what it does other than create some whitespace. I am wondering what exactly it does and when it should be used?
&nbsp; (it should have a semicolon on the end) is an entity for a non-breaking space.
Use it between two words that should not have a line break inserted between them by word wrapping.
There is a good explanation about when this is appropriate grammar on the English StackExchange.
It is sometimes abused to create horizontal space between content in web pages (since it will not collapse like multiple regular spaces). Padding and margins should usually be used instead of this hack.
One reason for &nbsp; is to insert multiple spaces in a document.
In HTML, multiple whitespace characters are collapsed into one space. This includes tabs and newlines.
If you wanted to display the following:
   three spaces.
You could insert 3 &nbsp; entities instead of using spaces, like so:
&nbsp;&nbsp;&nbsp;three spaces.
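A rough JavaScript model of the collapsing rule described above, assuming the default white-space: normal behaviour: runs of spaces, tabs and newlines shrink to one space, while U+00A0 (what &nbsp; produces) is not collapsible whitespace. The function name is hypothetical, and real CSS also trims spaces at the start and end of lines, which this ignores.

```javascript
// Collapse runs of ASCII whitespace (space, tab, LF, CR, form feed) into a
// single space. The class is written out explicitly because JavaScript's
// \s would also match U+00A0, which HTML does not collapse.
function collapseWhitespace(text) {
  return text.replace(/[ \t\n\r\f]+/g, " ");
}

const NBSP = "\u00A0";
console.log(collapseWhitespace("three   spaces.")); // three spaces.
// Three no-break spaces survive untouched, which is why &nbsp; gets used
// (and abused) to force visible extra space.
const forced = NBSP + NBSP + NBSP + "three spaces.";
console.log(collapseWhitespace(forced) === forced); // true
```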
Edit: It's worth mentioning that &nbsp; is more of a historical artifact than anything else. Just about every use for it mentioned in the answers to this question has a better alternative. However, &nbsp; is still with us, and these are some of the things people have used it for.
See also: http://www.sightspecific.com/~mosh/www_faq/nbsp.html
I don't know if this answers your question or not, and this answer is certainly not of the caliber already provided by others, but the beauty of a discussion thread or Q&A site is the diversity of experience that might be found in it. So, on that note, I'll share with you what I've used &nbsp; for. (To be perfectly honest, 24 hours ago, &nbsp; was something I had never even heard of.)
Here's how I used &nbsp;. I was posting something using Markdown and I had a very simple two-item bulleted list. For the life of me I could not get the spacing before and after this list to look symmetrical. So I did a web search and somehow ended up looking at this thread.
Before I used &nbsp;, the paragraph that followed bullet point #2 collapsed the spacing between the bullet point and the text, making it look as if the paragraph had something to do with bullet #2 specifically (which was not the case). I tried a lot of different things that I can't even remember now, but the one thing that ultimately worked was inserting &nbsp;.
Since then, I've seen all sorts of posts that indicate some controversy over its use, but for non-coders who need to wriggle out of an unsightly or misleading formatting issue, &nbsp; is a very quick and useful fix.

Why shouldn't I use weird Characters in code/HTML documents?

I'm wondering if it's a bad idea to use weird characters in my code. I recently tried using them to create little dots to indicate which slide you're on and to change slides easily:
There are tons of these types of characters, and it seems like they could be used in place of icons/images in many cases, they are style-able and scale-able, and screen readers would be able to make sense of them.
But I don't see anyone doing this, and I've got a feeling this is a bad idea; I just can't decide why. I guess it seems too easy to be true. Could someone tell me why this is or isn't okay? Here are some more examples of the characters I'm talking about:
↖ ↗ ↙ ↘ ㊣ ◎ ○ ● ⊕ ⊙ ○  △ ▲ ☆ ★ ◇ ◆ ■ □ ▽ ▼ § ¥ 〒 ¢ £ ※ ♀ ♂ &⁂ ℡ ↂ░ ▣ ▤ ▥ ▦ ▧ ✐✌✍✡✓✔✕✖ ♂ ♀ ♥ ♡ ☜ ☞ ☎ ☏ ⊙ ◎ ☺ ☻ ► ◄ ▧ ▨ ♨ ◐ ◑ ↔ ↕ ♥ ♡ ▪ ▫ ☼ ♦ ▀ ▄ █ ▌ ▐ ░ ▒ ▬ ♦ ◊
PS: I would also welcome general information about these characters, what they're called and stuff (ASCII, Unicode)?
There are three things to deal with:
1. As characters in a sentence/text:
The problem is that some fonts simply do not have them. However since CSS can control font use you probably will not run into this problem. As long as you use a web safe font, and know that that character is available in that font, you should probably be okay.
You can also use an embedded font, though be sure to fall back on a web safe font that contains the character you need, as many browsers will not support embedded fonts.
However sometimes certain devices will not have multiple fonts to choose from. If that font does not support your character you will run into problems. However depending on what your site does and the audience you are targeting this may not be a problem for you. Not to mention that devices like that are very old, and uncommon.
All in all it was probably not a good idea a handful of years ago, but now you are not likely to have problems as long as you cover all your bases.
It is important however to point out that you should never hard code those characters, instead use HTML entities. Just inserting those characters into your code can lead to unpredictable results. I recently copied some text from Word directly into my code, Word used smart quotes (quote marks that curve inwards properly). They showed up fine in Notepad++, but when I viewed the page I did not get quotes, I got some weird symbol.
I could have replaced them either with normal quotes (") or with the HTML entities &ldquo; and &rdquo; to keep the style (“ and ”).
Any Unicode character can be inserted this way (even those without special names).
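A small sketch of that idea: escape every character outside ASCII as a hexadecimal numeric character reference, which HTML accepts for any Unicode code point whether or not it has a named entity. The function name is hypothetical. Escaping this way keeps the file itself pure ASCII, which sidesteps encoding mix-ups like the smart-quote problem described above.

```javascript
// Replace each non-ASCII character with &#xHEX;. for...of walks code
// points, so a character outside the BMP (e.g. an emoji) becomes one
// reference instead of two broken surrogate halves. ASCII markup
// characters such as & and < are deliberately left as they are.
function toNumericReferences(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp < 0x80 ? ch : "&#x" + cp.toString(16).toUpperCase() + ";";
  }
  return out;
}

console.log(toNumericReferences("“smart quotes” ★"));
// prints &#x201C;smart quotes&#x201D; &#x2605;
```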
Wikipedia has a good reference:
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
2. As UI elements:
While it may be safe to use them in many cases, it is still better to use HTML elements where possible. You could simply style some div elements to be round and filled/not filled for your example.
As far as design goes they are really limiting: finding one that fits the style of your page can be a hassle, and may mean that you need to embed a font, which is still only supported by the latest browsers.
Plus many devices do not support heavy font manipulation, and will often display them poorly. It works in the flow of your text, but as a vital part of the UI there can be major problems. Any possible issue one of those characters can bring will be multiplied by the fact that it is part of your UI.
From an artistic stand point they simply limit your abilities too much.
3. What are you doing?
Finally, you need to consider this:
Text is for telling
Image is for showing
HTML is for organizing
CSS is for making things look good while you show them
JavaScript is for functionality
Those characters are text; they are for telling someone something. So ask the question: "What am I doing?" and then use what was designed for that task. If you are telling, use them; if you are showing, use an image or CSS.
I've seen this done before (the stars) and I think it's an awesome idea! It's also becoming quite popular to use a font (with @font-face) full of icons, like this one: http://fortawesome.github.com/Font-Awesome/
I can't see any downside to using a font like "Font Awesome" (only the upsides you mention, like scalability and the ability to change color with CSS). Perhaps there's a downside to using the special characters you mention, but none that I know of.
The problem with using those characters is that not all of them are available in all fonts used by all users, which means your application may look strange, or in the worst case be unusable. That said, it is becoming more common to assume the characters available in certain common fonts (Apple/Microsoft's Arial, Bitstream Vera). You can't even assume that you can download a font, as some users may capture content for offline reading with a service like Instapaper or Read It Later.
There are a number of problems:
Portability: using anything other than the 7-bit ASCII characters in code can make your code less portable, as recipients may use the wrong encoding. You can do a lot to mitigate this (e.g. use UTF-16 or at least UTF-8 encoded files). Most languages allow you to specify characters in strings using some form of escape notation (e.g. "\u1234" in C#), which avoids the problem but loses some of the advantages.
Font-dependency: user interface elements that depend on special characters being available in a font may be harder to internationalize, since those glyphs might not be in the font that you want/need to use for a particular audience.
No color, limited choice of art: while font glyphs might seem useful to a coder, they probably look pretty poor to a UI designer.
The question is very broad; it could be split into literally thousands of questions of the type “why shouldn’t I use character ... in HTML documents?” This seems to be what the question is about, not really about code. And it’s about characters seen as “weird”, “uncommon” or “special” from some perspective, not about character encodings. (None of the characters mentioned are encoded in ASCII. Some are encoded in ISO-8859-1. All are encoded in Unicode.)
The characters are used in HTML documents. There is no general reason against using them, but there are loads of specific reasons why some specific characters might not be the best approach in a specific situation.
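To check a claim like the one about ASCII, ISO-8859-1 and Unicode for a particular character, a quick sketch: ASCII covers code points below 0x80, and the printable characters of ISO-8859-1 occupy the same numbers as U+00A0 to U+00FF, so a code-point comparison is enough for a rough answer. The function name and labels are hypothetical, and the sketch does not distinguish the control codes in 0x80 to 0x9F.

```javascript
// Return the smallest of three character sets that can encode ch.
function smallestCharset(ch) {
  const cp = ch.codePointAt(0);
  if (cp < 0x80) return "ASCII";
  if (cp < 0x100) return "ISO-8859-1";
  return "Unicode only";
}

// A few characters from the question's list.
for (const ch of ["§", "¥", "¢", "£", "★", "⊕"]) {
  console.log(ch, smallestCharset(ch));
}
```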
For example, the “little dots” you mention in your example (probably not dots at all but circles or bullets), when used as control elements as you describe, would mean poor usability and poor accessibility. Making them significantly larger would improve the situation, but this more or less proves that such text characters are not suitable for controls.
Screen readers could make sense of special characters if they used a database of various properties of characters. Well, they don’t, and they often fail to read properly even the most common special characters. Just reading the Unicode name of a character can be cryptic or outright misleading. The proper reading would generally depend on meaning and context.
The main issue, however, is that people do not generally recognize characters in the meanings that you would assign to them. How many people know what the circled plus symbol “⊕” stands for? Maybe 1 out of 1,000, optimistically thinking. It might be all right to use it on a page about advanced mathematics or physics, especially if the notation is defined there. But used in general text, it would be just… a weird character, and people would read different meanings into it, or just get puzzled.
So using special characters just because they look cool isn’t a good idea. Even when there is time and place for a special character, there are technical issues with them. How many fonts do you expect to contain “⊕”? How many of those fonts do you expect Joe Q. Public to have in his computer? In this specific case, you would find the font coverage reasonably good, but you would still have to analyze it and write a longish list of font names in your CSS code to cover most platforms. In the pile of poo case (♨), it would be unrealistic to expect most people to see anything but a symbol for unrepresentable character. Regarding the methods of finding out such things, check out my Guide to using special characters in HTML.
I've run into problems using unusual characters: the tools (editor, compiler, interpreter, etc.) often complain and report errors. In the end, it wasn't worth the hassle. Darn western hegemony, or homogeneity, or, well, something!

In HTML and CSS, how do I make japanese text break lines correctly?

I'm writing a simple paragraph in both English and Japanese, using only HTML and CSS. The English text breaks lines normally (when a word doesn't fit on a line anymore, it's pushed to the next one).
With Japanese, though, not a whole word is pushed to the next line, only part of it. I've tried setting word-wrap to break-word and to normal, but nothing changes (with the Japanese text).
How do I make whole words in Japanese jump to the next line like they do in English?
English separates words with spaces, Japanese doesn't.
Whether characters in Japanese form a word or not depends on context. In many cases, looking for certain grammatical (Kana) particles could be used to separate words - but this wouldn't even be close to being reliable.
Essentially, you'd need a Japanese dictionary / understanding of the language to identify where the words start and end - a browser won't know how to do this.
Alternatively, if you know the start and end of the words, you could perhaps wrap each one in a span - then use CSS to ensure each span wraps to a new line as a whole when it doesn't fit.
Japanese has specific rules that are followed when breaking text. They are called 禁則処理 (kinsoku shori). Here is a link explaining the rules. The rules are mostly concerned with special characters. Have a look at any popular Japanese webpage and you will see that multi-character (kana and kanji) words are often split. I often see です split between lines.
Update:
I stumbled across this tool recently. I haven't tried it out yet, but the theory is solid. If someone is looking to improve the line breaks with Japanese text this could be a good solution.
I'm not an expert with Japanese specifically so it's hard for me to tell if things are wrapping correctly, but I just had to solve this problem myself and both word-break: keep-all and white-space: nowrap seemed to solve the issue for me, so those might be worth trying out.
Until browsers are smart enough to do on-the-fly semantic analysis of the language, there are only a couple of options:
1/ Understand enough of the language to be able to group semantic elements into their own unbreakable DOM elements. Something like (without the line breaks):
<span class="el">私は</span>
<span class="el">キッチンで</span>
<span class="el">パンを</span>
<span class="el">食べました。</span>
Then in CSS, use something like .el { display: inline-block; }. You probably want to do this only on headings and important pieces of text, since it could impact accessibility (i.e. how screen readers interpret the text). The other drawbacks are that 1/ you need to understand the text to know where to add the blocks, and 2/ this obviously only works for static text (and even then, it's a manual, painstaking process).
2/ Use a tool that does the grouping for you. It could be something on the client side, like TinySegmenter (which segments a bit too much for my taste), or on the server side, with things like Budou, which uses the Google Cloud Natural Language API and ML to analyze your sentences. The downsides (at least for Budou) are that 1/ you need Python (I think I saw a Node.js port somewhere), and 2/ it's not free.
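A sketch of the manual approach, assuming you already know where the phrase boundaries fall: the hypothetical helper below turns a list of phrases into the span markup shown above, escaping HTML special characters along the way. It emits no whitespace between spans, matching the "without the line breaks" note.

```javascript
// Escape the characters that are significant in HTML text content.
// & must be replaced first so the other entities are not double-escaped.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap each pre-segmented phrase in a span with class "el" (as in the
// answer); combined with .el { display: inline-block; } the browser can
// wrap between phrases but not inside them.
function wrapPhrases(phrases, className = "el") {
  return phrases
    .map((p) => `<span class="${className}">${escapeHtml(p)}</span>`)
    .join("");
}

console.log(wrapPhrases(["私は", "キッチンで", "パンを", "食べました。"]));
```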
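As a further client-side option not mentioned in the answer, modern JavaScript engines ship Intl.Segmenter, whose "word" granularity uses ICU's dictionary data for Japanese. This is a swapped-in technique, not TinySegmenter or Budou, and its word-level boundaries will not necessarily match the phrase groups shown earlier, so treat it as a rough sketch. It assumes Node 16+ or a current browser; the function name is hypothetical.

```javascript
// Split Japanese text into word segments with the built-in Intl.Segmenter.
// Exact boundaries depend on the engine's ICU data, so they can differ
// between browsers and versions; the segments always rejoin to the input.
function segmentJapanese(text) {
  const segmenter = new Intl.Segmenter("ja", { granularity: "word" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const parts = segmentJapanese("私はキッチンでパンを食べました。");
console.log(parts); // exact split depends on the ICU version
```

Each resulting segment could then be wrapped in an inline-block span as described in option 1/.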
Hope this helps!
try setting the css property
line-break:strict;
Check it out here.