Emoticons - CSS backgrounds or inline images? - html

I was thinking of replacing user text like :) and :P in comments with smilies (emoticons), using a regex. Do you think it's a good idea for the replacement to be a span element with a class, so that I can then apply the smiley image to that class?
Or should I just replace that text with <img> tags?
CSS is usually seen as not part of the content, but these image smileys are...
(if you disable the css, the text could change its meaning because emoticons are missing)

An emoticon is visual information presented using characters, so if you replace, say, “:-)” by something, the natural candidates are special characters such as “☺” (U+263A WHITE SMILING FACE) and an img tag like <img alt=":-)" src="smiley.png">.
Using an element with a background image has several drawbacks, including lack of any counterpart to the alt attribute and the common browser behavior of suppressing background images on printing.
It is somewhat risky to programmatically change anything emoticon-looking to e.g. an image. You cannot be sure that every “:-)” is an emoticon. All kinds of odd character combinations may arise in special fields. Besides, if the user was writing e.g. about emoticons, part of the content might get lost or distorted in the replacement.
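To illustrate the points above, here is a minimal server-side sketch of the img replacement. The emoticon table, the image file names and the function name replace_emoticons are assumptions for illustration only. It escapes the user text first and only replaces emoticons that stand alone between whitespace, which reduces (but does not eliminate) the false positives mentioned above.

```python
import html
import re

# Hypothetical mapping from emoticon text to image file; the file names are assumptions.
# None of these emoticons contain &, < or >, so HTML escaping does not alter them.
EMOTICONS = {
    ":-)": "smiley.png",
    ":)": "smiley.png",
    ":P": "tongue.png",
}

# Match an emoticon only when it stands alone between whitespace or the
# start/end of the string, so text such as "(see note 3:)" is left untouched.
_PATTERN = re.compile(
    r"(?<!\S)("
    + "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
    + r")(?!\S)"
)


def replace_emoticons(text: str) -> str:
    """Escape user text for HTML, then turn stand-alone emoticons into <img> tags."""
    escaped = html.escape(text, quote=False)

    def to_img(match):
        emoticon = match.group(1)
        # The original characters are kept as the alt text.
        return '<img alt="{}" src="{}">'.format(html.escape(emoticon), EMOTICONS[emoticon])

    return _PATTERN.sub(to_img, escaped)
```

Because the original characters survive in the alt attribute, the text still makes sense when images are not shown.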

If you have a lot of smileys, you are better off using the CSS sprite trick, because the browser then only has to download one image file instead of a dozen smaller ones.
This results in less overhead and better caching.

Use CSS to display the pictures. The best practice is to strip unwanted characters, invisible characters and HTML tags from user input, to avoid HTML code injection and cross-site scripting.

Have the smileys in plain-text, and display a picture instead with CSS.
You can achieve that by using a <span> or another element.
For instance, :) should be <span class="normalsmiley">:)</span> in the code.
Then the text will make sense for people not seeing images or with CSS disabled (they will see a text smiley).

Related

How to separate design and content in a dynamic website?

In the normal case I can separate the text and the style, but how should I do it when the text is dynamic (editable by the admin user)? The user of course wants to use bold, italic, etc., but if I provide a common HTML editor, I think I break the rule of separation, because there will be HTML elements in the text. (I could use BB codes, but it is the same.)
In the long term I think it can cause problems when I want to use the text in a non-HTML environment. Of course I can strip the HTML tags, but that is not the way I would like to go (not because it won't work, but because of the original theoretical issue).
In some cases I can break apart the sentences to solve this problem, but I think it's a bad way, because the parts are pointless alone, and it won't be so easily editable too.
Is there any good solution for this?
That's perfectly OK.
You give the user the opportunity to set some attributes for the text (BBCodes recommended).
That is content. Then it's part of the design to interpret the attributes and style it.
For example you may provide the feature to let the user define something like [headline]MyHeadline[/headline]. This is pure content.
How to replace [headline] with HTML and how to style the resulting text is up to the design.
Edit: I recommend BBCodes to provide a closed set of features. That may be easier to deal with. You could just use them in another context and interpret them, instead of stripping out HTML.
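As a rough sketch of what "interpreting" a closed set of codes could look like: the stored content contains only BBCodes, and each output context decides what to do with them. Apart from [headline], the code names and the HTML elements they map to are assumptions, and this sketch does not check that codes are properly nested or balanced.

```python
import html
import re

# Closed set of supported codes and the HTML element each maps to.
# Everything except [headline] is an assumption for illustration.
BBCODE_TO_HTML = {
    "headline": "h2",
    "b": "strong",
    "i": "em",
}

_TAG = re.compile(r"\[(/?)(\w+)\]")


def render_bbcode(text: str) -> str:
    """Render stored BBCode content as HTML; unknown codes stay visible as text."""
    escaped = html.escape(text)  # any raw HTML in the content is neutralised first

    def to_html(match):
        slash, name = match.group(1), match.group(2).lower()
        element = BBCODE_TO_HTML.get(name)
        if element is None:
            return match.group(0)  # not in the closed set: leave it alone
        return "<{}{}>".format(slash, element)

    return _TAG.sub(to_html, escaped)


def strip_bbcode(text: str) -> str:
    """Plain-text rendering for a non-HTML context: drop only the known codes."""
    return _TAG.sub(
        lambda m: "" if m.group(2).lower() in BBCODE_TO_HTML else m.group(0), text
    )
```

For example, render_bbcode("[headline]MyHeadline[/headline]") gives "<h2>MyHeadline</h2>", while strip_bbcode gives just "MyHeadline" for a non-HTML environment.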
If the tags entered are semantic, i.e. they use an <i> tag for italic rather than style="font-style:italic", then your design and content are still separate.
Separating design and content is about separating a site's presentation from the readable code, rather than removing the markup altogether.
I'd advise you focus on Semantic HTML.

What concerns should I have when using special character symbols on the web?

In our web app, we're using colored stars (&#9733; aka ★) to represent a rating. So the first four stars would be a solid color while the last star would be white to represent a rating of 4 out of 5. Like so:
What concerns does this raise in terms of accessibility and support?
I can't be certain that the version of the font in the user's browser supports this character, what are some methods to provide "graceful degradation" to this? Or is the coverage good enough that this is not an issue?
How is this "rendered" by a screenreader? Would wrapping the rating with a <span title="4 out of 5"> provide more accessibility?
The general question in the heading is very broad. To address it briefly, the main concern is font problems, which can be rather serious, and there is really no graceful degradation; for a longer answer, see my Guide to using special characters in HTML.
The specific question about colored stars is much simpler, and the short answer is that there are strong reasons for using five images, each with a different number of colored stars. Then you can use meaningful alt attributes, like alt="four stars", and things work rather reliably. They should be content images (via img), as there is no way to specify textual alternative to a background image.
Considering the possibility of using the BLACK STAR “★” character, its font support is not particularly bad but not very widespread either. There is no simple way to find out the percentage of computers that have some font containing it. Moreover, if the character exists in some font(s) in the system, its appearance may vary a lot. For this specific character, glyphs can be expected to look rather similar – but in different sizes.
If your context really required the use of a symbol as a text character, then you may need to take risks, but here the symbols accompany text instead of really being contained in text, so it’s OK to use images.
Using an embedded font via @font-face is possible but sounds like overkill here.
Screen readers vary a lot in their treatment of special characters. In general, they have been designed to read normal text in some human language(s), and they often fail to speak special characters meaningfully – or at all, even by simply saying the name of the character. The title attribute may be spoken, but usually as an option only, and the user may be unaware of the existence of such options.
Images cause some HTTP requests, but this is of marginal or ignorable impact. (You could use CSS sprites, though that’s hardly useful in a simple case like this.) The images typically get cached well. They can be scaled to match text size if desired, e.g. by setting the height of img elements in em units (and not setting width, so that they get scaled so that width:height ratio is preserved).
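A small sketch of the image approach, generating the markup on the server. The file naming scheme (stars-1.png to stars-5.png), the alt texts other than "four stars", and the function name rating_img are assumptions for illustration.

```python
import html

# Hypothetical alt texts; one image per rating, named stars-1.png ... stars-5.png.
ALT_TEXT = {
    1: "one star",
    2: "two stars",
    3: "three stars",
    4: "four stars",
    5: "five stars",
}


def rating_img(stars: int) -> str:
    """Return an <img> element for a rating from 1 to 5 stars."""
    if stars not in ALT_TEXT:
        raise ValueError("rating must be between 1 and 5")
    # Only the height is set, in em units, so the image scales with the text
    # and the browser preserves the width:height ratio.
    return '<img src="stars-{}.png" alt="{}" style="height: 1em">'.format(
        stars, html.escape(ALT_TEXT[stars])
    )
```

Calling rating_img(4) produces an img element with alt="four stars", which is what a screen reader or a text browser would present instead of the picture.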
For best overall compatibility use an image. You know what it will look like, the accessibility concerns are straightforward, and you don't need to worry about browser support for various features.
I think if you don't want to experiment, you should follow Jukka's advice and use an img element.
If, for whatever reason, you want to use the Unicode star symbols, you'd have to try to make it accessible. According to your description, I guess you currently do something like this:
<div class="rating">
<span>★★★★</span>★
</div>
<!-- CSS: .rating span {color:yellow;} -->
This example doesn't mark up the semantics of the rating, and therefore screen readers and other user agents can't "understand" (announce) its meaning. The rating score is only conveyed by color → this is not accessible.
A possible way might be the use of the abbr element, however, this use case could stretch the definition too far ("Are the five stars really an abbreviation of a rating score?"):
<div class="rating">
<abbr title="rating of 4 out of 5"><span>★★★★</span>★</abbr>
</div>
<!-- CSS: .rating span {color:yellow;} -->
You could use an @font-face font from Font Squirrel. If the rating will never change, then how about creating an image using Photoshop or a similar program to use as a background image? The title for the span is not a good one: if you later change the rating, you have to go and change all the title attributes. I would make a more general title like "Rating". If the rating isn't absolutely necessary for your content on the page, consider using JavaScript to display the rating. That way, you don't actually change the HTML, therefore your security concerns should be handled.

Convert HTML/CSS into plain HTML

Is it possible to convert HTML + CSS into HTML for a system that doesn't handle CSS, not even inline CSS?
What options do I have?
No. Much of what CSS does is not possible with HTML alone. Your best option is to design your site in such a way that when it loses CSS, it still renders in a nice and orderly fashion. Pay very close attention to things like Heading Tags, paragraph tags, lists, etc. Be sure to build semantically-correct sites, and they (in most cases) will degrade quite nicely.
The only thing you can do is add styles that were possible with old HTML 3+ attributes and font tags. Quite a bit of stuff is possible, but none of it is going to be automatic. You can go through pretty much everything in CSS and try to find its HTML 3+ attribute equivalent.
Things like background, font, b, i, center, width and height are examples of old attributes (or tags, in the case of font) that define style (and should generally be ignored these days). I don't envy the work ahead of you, but just find a happy medium between reasonable things and unreasonable styles. Tables might also come in handy as a replacement for floats.
Sounds like an old mobile device?
If you can't use any CSS, I would imagine you would have to resort to possibly deprecated HTML tags/attributes, like font tags and attributes like bgcolor.
This would probably be rather difficult, because to my knowledge you can't achieve everything you can do with CSS, like positioning for example. You would have to switch your layout to use tables and set align, valign, etc.
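To give an idea of how limited such a translation is, here is a rough sketch that maps a handful of simple CSS declarations to old presentational tags. The function name legacy_markup and the choice of supported properties are assumptions; everything it does not recognise (positioning, floats and so on) is silently lost, which is exactly the difficulty described above.

```python
import html


def legacy_markup(text: str, css: str) -> str:
    """Wrap text in presentational HTML tags approximating a few CSS declarations."""
    # Parse "property: value; ..." into a dict; parts without a colon are skipped.
    declarations = {}
    for part in css.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            declarations[prop.strip().lower()] = value.strip()

    result = html.escape(text)
    if declarations.get("font-weight") == "bold":
        result = "<b>{}</b>".format(result)
    if declarations.get("font-style") == "italic":
        result = "<i>{}</i>".format(result)
    if "color" in declarations:
        result = '<font color="{}">{}</font>'.format(
            html.escape(declarations["color"]), result
        )
    if declarations.get("text-align") == "center":
        result = "<center>{}</center>".format(result)
    # Any other declaration has no equivalent here and is dropped.
    return result
```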
Use this first, to inline the CSS:
http://www.mailchimp.com/labs/inlinecss.php
Then replace the CSS with deprecated HTML:
http://www.highdots.com/css-editor/articles/css_equiv.html
Two words: Image Maps :) (I've actually seen sites that, in order to "render correctly on every browser" literally just make a big fancy image the background, and add links accordingly via an image map)

Text that only exists if CSS is enabled

I have a website in which I provide tool-tips for certain things using a hidden <span> tag and JavaScript to track various mouse events. It works excellently. This site somewhat caters towards people with vision issues, so I try to make things degrade as well as possible if there is no JavaScript or CSS and generally I would say that it is successful in this regard.
So my question is, is it possible for these <span> to only exist if CSS is being used? I have thought about writing out the tool-tips in JavaScript on document load. But I was wondering if there is a better solution.
Perhaps you need to re-think the way you are providing tooltips. Could the content be contained in the title attribute of a semantically appropriate element?
EDIT: If you provide more info, someone might be able to suggest more of a solution. What sorts of elements are the tooltips popping up on? Images? Would the abbreviation tags be appropriate?
Quick Solution I just came up with: <span> has access to the core attributes, which include title, so you could include the tooltip text in the title, and use a javascript library like jQuery to display tooltips for all spans with a title.
A quick hack would be to color the text the same as the background (say, white on white) in html, and then use CSS to change the color back to something visible (black on white). Of course, this is only relevant for people able to see the text. Screen readers and such wouldn't see the text as hidden.
CSS is also used by screenreaders to help define which page elements are read or not.
Screen readers will almost always ignore elements with display:none applied to them, so not using CSS is not a valid indicator of a screenreader's presence.
I would go with Chris' idea of using javascript to generate the tooltips based on a title (or alt) attribute.
You could use JS to ensure that tooltips are only displayed when valid styles are set, so if JS is enabled and CSS disabled you can treat the extra information differently (eg footnotes).
http://juicystudio.com/article/screen-readers-display-none.php
http://www.456bereastreet.com/archive/200711/screen_readers_sometimes_ignore_displaynone/

Apart from <script> tags, what should I strip to make sure user-entered HTML is safe?

I have an app that reprocesses HTML in order to do nice typography. Now, I want to put it up on the web to let users type in their text. So here's the question: I'm pretty sure that I want to remove the SCRIPT tag, plus closing tags like </form>. But what else should I remove to make it totally safe?
Oh good lord you're screwed.
Take a look at this
Basically, there are so many things you want to strip out. Plus, there's stuff that's valid, but could be used in malicious ways. What if the user wants to set their font size smaller on a footnote? Do you care if that gets applied to your entire page? How about setting colors? Now all the words on your page are white on a white background.
I would look into the requirements phase again.
Is a markdown-like alternative possible?
Can you restrict access to the final content, reducing risk of exposure? (meaning, can you set it up so the user only screws themselves, and can't harm other people?)
You should take the white-list rather than the black-list approach: Decide which features are desired, rather than try to block any unwanted feature.
Make a list of desired typographic features that match your application. Note that there is probably no one-size-fits-all list: It depends both on the nature of the site (programming questions? teenagers' blog?) and the nature of the text box (are you leaving a comment or writing an article?). You can take a look at some good and useful text boxes in open source CMSs.
Now you have to choose between your own markup language and HTML. I would choose a markup language. The pro is better security; the con is the inability to add unexpected internet content, like YouTube videos. A good way to prevent users' rage is to add an "HTML to my-site" feature that translates the corresponding HTML tags to your markup language and deletes all other tags.
The pros for HTML are consistency with standards, extendability to new contents types and simplicity. The big con is code injection security issues. Should you pick HTML tags, try to adopt some working system for filtering HTML (I think Drupal is doing quite a good job in this case).
Instead of blacklisting some tags, it's always safer to whitelist. See what stackoverflow does: What HTML tags are allowed on Stack Overflow?
There are just too many ways to embed scripts in the markup. javascript: URLs (encoded of course)? CSS behaviors? I don't think you want to go there.
There are plenty of ways that code could be sneaked in - especially watch for situations like <img src="http://nasty/exploit/here.php"> that can feed a <script> tag to your clients. I've seen <script> blocked on sites before, but the tag got right through, which resulted in 30-40 passwords stolen.
<iframe>
<style>
<form>
<object>
<embed>
<bgsound>
Is what I can think of. But to be sure, use a whitelist instead - things like <a>, <img>† that are (mostly) harmless.
† Just make sure that any javascript:... / on*=... are filtered out too... as you can see, it can get quite complicated.
I disagree with person-b. You're forgetting about javascript attributes, like this:
<img src="xyz.jpg" onload="javascript:alert('evil');"/>
Attackers will always be more creative than you when it comes to this. Definitely go with the whitelist approach.
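For what it's worth, here is a minimal sketch of the whitelist approach using only Python's standard html.parser module. The allowed tags and attributes, the set of accepted URL schemes and the names WhitelistSanitizer and sanitize are assumptions for illustration. It is not a vetted sanitizer; for production use, a maintained filtering library is the safer choice.

```python
import html
import re
from html.parser import HTMLParser

# Whitelist: tag name -> attributes permitted on it (an assumed example set).
ALLOWED = {
    "a": {"href"},
    "img": {"src", "alt"},
    "b": set(), "i": set(), "em": set(), "strong": set(), "p": set(),
}
VOID_TAGS = {"img"}                  # elements that take no closing tag
SKIP_CONTENT = {"script", "style"}   # drop these tags and their contents
URL_ATTRS = {"href", "src"}
SAFE_SCHEMES = {"", "http", "https"}  # "" means no scheme, i.e. a relative URL


def _safe_url(value: str) -> bool:
    # Remove whitespace and control characters, which browsers may ignore
    # inside a scheme (e.g. "java\tscript:"), before looking at the scheme.
    cleaned = re.sub(r"[\x00-\x20]+", "", value)
    match = re.match(r"([A-Za-z][A-Za-z0-9+.\-]*):", cleaned)
    scheme = match.group(1).lower() if match else ""
    return scheme in SAFE_SCHEMES


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self._skip += 1
            return
        if tag not in ALLOWED:
            return  # unlisted tag: drop the tag, keep its text content
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # drops on*=..., style=... and anything else unlisted
            if name in URL_ATTRS and not _safe_url(value):
                continue  # drops javascript:, data: and other schemes
            kept.append(' {}="{}"'.format(name, html.escape(value)))
        self.out.append("<{}{}>".format(tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            if self._skip:
                self._skip -= 1
        elif tag in ALLOWED and tag not in VOID_TAGS:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self._skip:
            self.out.append(html.escape(data))  # text is always re-escaped


def sanitize(markup: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

With this sketch, the img example above comes out as <img src="xyz.jpg">: the onload attribute is not on the list, so it is simply never copied to the output.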
MediaWiki is more permissive than this site; yes, it accepts setting colors (even white on white), margins, indents and absolute positioning (including those that would put the text completely out of screen), null, clippings and "display:none", font sizes (even if they are ridiculously small or excessively large) and font names (even if this is a legacy non-Unicode Symbol font name that will not render text successfully), as opposed to this site, which strips out almost everything.
But MediaWiki successfully strips out the dangerous active scripts from CSS (i.e. the behaviors, the onEvent handlers, the active filters or javascript link targets) without completely filtering out the style attribute, and bans a few other active elements like object, embed, bgsound.
Both sites ban marquees as well (not standard HTML, and needlessly distracting).
But MediaWiki sites are patrolled by lots of users and there are policy rules to ban those users that are abusing repeatedly.
It offers support for animated images, and provides support for active extensions, such as to render TeX maths expressions, or other active extensions that have been approved (like timeline), or to create or customize a few forms.