How does the traditional "HTML is only for content" line of thought handle dynamic formatting?

For so long, I've read and understood the following truths concerning web development:
HTML is for content
CSS is for presentation
JavaScript is for behavior.
This is normally all fine and good, and I find that when I strictly follow these guidelines and use external .css and .js files, it makes my entire site much much more manageable. However, I think I found a situation that breaks this line of thought.
I have a custom forums system that I've built for one of my sites. In addition to the usual formatting for such a system (links, images, bold, italics, underline, etc.) I've allowed my users to set the formatting of their text, including color, font family, and size. All of this is saved in my database of forum messages as formatting code, and then translated to the corresponding HTML when the page is viewed. (A bit inefficient; technically I should translate before saving, but this way I can work on the system live.)
Due to the nature of this and other similar systems, I end up with a lot of presentational tags (such as <font>) floating around the resulting HTML code, which I believe are unofficially deprecated since I'm supposed to be using CSS for formatting. This breaks rules one and two, which state that HTML should not contain formatting information, preferring that information to be located in the CSS document instead.
Is there a way to achieve dynamic formatting in CSS without including that information in the markup? Is it worth the trouble? Or, considering the implied limitations of proper code, am I to limit what my users can do in order to follow the "correct" way to format my code?

It's okay to use the style attribute for elements:
This is <span style="color: red;">red text</span>.
If users are limited to only some options, you can use classes:
This is <span class="red">red text</span>.
Be sure to use semantic HTML elements:
This is <strong>strong and <em class="blue">emphasized</em></strong>
text with a link.
Common semantic elements and their user-space terms:
<p> (paragraphs)
<strong> (bold)
<em> (italic)
<blockquote> (quotes)
<ul> and <ol> with <li> (lists)
More...?
Likely less common in forum posts, but still usable semantic elements:
<h1>, <h2>, etc. (headings; be sure to start at a level that makes sense within your page)
<del>, and, to a lesser extent, <ins> (strikeout)
<sup> and <sub> (superscript and subscript, respectively)
<dl> with <dt> and <dd> (list of pairs)
<address> (contact information)
More...

This is a bit tricky. I would think about what you really want to allow visitors to do. Arbitrary colours and fonts? That seems rather useless. Emphasis, headings, links, and images? Well that you can handle easily enough by restricting to those tags / using a wikitext/forumtext markup that only provides these features.
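To make the "restricted forumtext" idea concrete, here is a minimal sketch in Python. The bracket syntax, the TAG_MAP whitelist and the function name are all assumptions made for illustration, not anything from the original system. It escapes raw HTML first and then translates only the whitelisted tags; it does not check that tags are balanced, which a real implementation would need to do.

```python
import html
import re

# Hypothetical forum markup: only these bracket tags are recognised.
TAG_MAP = {"b": "strong", "i": "em", "quote": "blockquote"}

_TAG_RE = re.compile(r"\[(/?)(\w+)\]")


def render_forum_markup(text):
    """Escape raw HTML, then translate whitelisted [tags] to HTML elements."""
    escaped = html.escape(text)

    def replace(match):
        closing, name = match.group(1), match.group(2).lower()
        element = TAG_MAP.get(name)
        if element is None:
            return match.group(0)  # unknown tag: leave it as literal text
        return f"<{closing}{element}>"

    return _TAG_RE.sub(replace, escaped)
```

Anything outside the whitelist, such as [font] or a raw <script> tag, ends up as harmless literal text.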

You could dynamically build an inline style sheet for the HTML page fed to the users: put a <style> element in the head of the page and have it target the elements the user can configure.
Alternatively, there's the notion of using external stylesheets that feature the most common adjustments, but there'd be hundreds of them to account for every possible alternative. You'd need a separate external style sheet for each specific font size, colour and so on, and you'd dynamically link to those in the head, as with any external stylesheet. This is almost unbearably complex to set up, though.
Option one would work okay though.
As an example:
<STYLE>
h1,h2,h3,h4 {font-family: Helvetica, Calibri;}
/* Populate all of these values from the Db. */
p {font-size: 1.2em;
font-weight: bold;
}
a {text-decoration: underline;
color: #f00;
}
</STYLE>
Also, it just occurred to me that you could probably create a per-user stylesheet to apply the configurable aspects. Use
<link href="/css/defaultstylesheet.css" type="text/css" rel="stylesheet" media="all" />
<link href="/css/user1245configured.css" type="text/css" rel="stylesheet" media="all" />
<!-- clearly the second is a stylesheet created for 'user 1245'. -->
The bonus of this approach is that it allows caching of the stylesheet by the browser. Though it might likely clutter up the css folder, unless you have specific user-paths to the user sheet? Wow, this could get complex... :)

This is an interesting situation because you can have an infinite number of different styles, depending on your users' tastes and their own personal styles.
There are a couple of things you can be doing to manage this situation efficiently. Probably the easiest would be to just use style overrides:
<p style="color: blue; font-size: 10pt;">Lorem Ipsum</p>
This is quick and easy. And remember, this is what style overrides are there for. But as you've said, this does not fit well with this content-presentation separation paradigm. To separate them a little more, you could build some CSS information on page load and then insert it into the <head> tag of your HTML. This still keeps the HTML and the CSS somewhat distinct, even though you're not technically separating them.
Your other option would be to build the CSS and then output that to a file. This, however, would not be efficient (in my opinion). If you were to, on every page load, build a new CSS file that accounts for your users' unique preferences, this would sort of defeat the purpose. It's the same thing as the second option, using the <head> tag, you're just making it look separated. Even if you used techniques such as caching to try to limit how often you have to build a CSS file, will the ends really justify the means?
This is a completely subjective topic and you should, in the end, choose what you're most comfortable with.

I don't know which framework or even language you are using but e.g. Django uses a certain template language to sort of represent the HTML being output. I think a nice solution would be to simply use a different "template" depending on what the user has chosen. This way you wouldn't have to care about breaking the "rules" or having a bunch of basically unused tags floating around in the DOM.
Unless I completely misunderstood...!

The easiest way to manage this is probably to emit dynamic CSS when the pages are generated, based on the user's settings. Then everything is doing the job it is supposed to be doing and the server is doing the work of converting the user's settings into the appropriate CSS.
With the CSS doing this work, you can use appropriate attributes in the HTML (id and name and class and so on) and emit CSS that will cleanly format everything the way you want.
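As a rough sketch of what "emitting dynamic CSS from the user's settings" could look like on the server, here is one possibility in Python. The setting names, the ALLOWED whitelist and the .post-by-<id> class naming are assumptions for illustration only. Values that don't match the whitelist pattern are dropped rather than passed through.

```python
import re

# Hypothetical whitelist: setting name -> (CSS property, pattern the value
# must fully match). Anything else is ignored.
ALLOWED = {
    "color": ("color", re.compile(r"#[0-9a-fA-F]{6}")),
    "font_size": ("font-size", re.compile(r"\d{1,2}px")),
}


def user_css(user_id, settings):
    """Build one CSS rule for posts written by the given user."""
    declarations = []
    for key, value in sorted(settings.items()):
        if key not in ALLOWED:
            continue
        prop, pattern = ALLOWED[key]
        if pattern.fullmatch(str(value)):
            declarations.append(f"  {prop}: {value};")
    if not declarations:
        return ""
    body = "\n".join(declarations)
    return f".post-by-{user_id} {{\n{body}\n}}"
```

The markup then only needs class="post-by-1245" on the post, and the generated rule can go in a <style> element or a per-user stylesheet.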

Consider the benefits versus the costs before you do anything. What is actually wrong with your code right now? Tag soup and combined content/presentation is to be avoided not because it makes a bad website, but because it is hard to maintain. If your HTML/CSS is being generated, who cares what the output is? If what you've got now works, then stick to it.

I assume you are allowing only a limited white list of safe options, and therefore parsing the user's HTML already.
When rendering the HTML you could convert each style declaration to a class:
<span style="font-family: SansSerif; font-size: 18px;">Hello</span>
To:
<span class="SansSerif"><span class="size_18px">Hello</span></span>
Laborious to generate (and maintain) the list. However you needn't worry about a class for each combination, which is of course your main problem.
It also has the benefit of extra security as user's CSS is less likely to slip through your filter as it's all replaced, and this should also ensure all the CSS is valid.
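A minimal sketch of that conversion, assuming a hand-maintained table of allowed declarations. The STYLE_CLASSES table and the function name are made up for illustration; the class names follow the example above.

```python
# Hypothetical whitelist: (property, value) pairs mapped to class names
# that are defined once in the site stylesheet.
STYLE_CLASSES = {
    ("font-family", "SansSerif"): "SansSerif",
    ("font-size", "18px"): "size_18px",
    ("color", "red"): "red",
}


def style_to_classes(style):
    """Return class names for whitelisted declarations; drop the rest."""
    classes = []
    for declaration in style.split(";"):
        prop, sep, value = declaration.partition(":")
        if not sep:
            continue  # empty or malformed declaration
        name = STYLE_CLASSES.get((prop.strip().lower(), value.strip()))
        if name is not None:
            classes.append(name)
    return classes
```

Since an element can carry several classes, the result could also be joined into a single attribute, e.g. <span class="SansSerif size_18px">, instead of nesting spans.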

I've allowed my users to set the
formatting of their text, including
color, font family, and size. All of
this is saved in my database of forum
messages as formatting code, and then
translated to the corresponding HTML
when the page is viewed.
So, you've done formatting through HTML, and you know that formatting is supposed to be done through CSS, and you realise this is a problem, and you got as far as asking a 300-word SO question about it ... ?
You don't see the solution, even though you can formulate the question ... ?
Here, I'll give you a hint:
All of this is saved in my database of
forum messages as formatting code, and
then translated to the corresponding
<del>HTML</del> CSS when the page is viewed.
Does that help?
Is this question a joke?

Related

Disadvantages of using consistent-behaving yet deprecated HTML tags?

When users visit my website, they don't care about how perfect or how standards-compliant the page's code is. They only care about whether it works or not.
There are tags that are deprecated but have consistent behavior throughout all major, minor, and very minor browsers. They work now and will work in the future. (I'm not talking about optional tags like <marquee> and <blink> which will probably be removed in the future since their non-existence doesn't break pages.) The tags I'm talking about are for example:
<center> (used by google.com homepage, yes and it's May 2014)
<body bgcolor=, alink=, vlink=, link= (all used by google.com)
<font size= (also used by google.com)
If my HTML generator produces tags like <body bgcolor=black>, it is guaranteed to work for near 100% of users.
If it instead produces CSS like background:black;, it will be supported by fewer users compared to <body bgcolor=black>. (Start with https://superuser.com/q/732669/78897 and https://superuser.com/q/447269/78897, though I'm sure they are not the only ones in the whole world.)
Bear with me, this is a real question based on a true problem. Exactly what are the real disadvantages of having these tags as output?
Potential disadvantages include the following:
1) Your customer might actually care about how standard the code is. Maybe not now, but in the future. Maybe for questionable reasons, but still.
2) Deprecated constructs do not always work consistently. For example, align=center attribute set on a table may have different effects depending on browser mode. This is a relatively weak argument, though, since the browser practices have been described rather well in HTML5 CR and you can manage the potential problems. (Besides, even CSS settings may work inconsistently.)
3) There is no guarantee that deprecated features will be supported by all future browsers. On the other hand, the same applies to standard features. In practice, very few features that have been defined in HTML specifications have actually been removed from browsers. (Regarding tags, I think basefont is the only case.) All the examples mentioned, and also marquee, have been described in HTML5 CR as “obsolete” but still well-defined, and according to HTML5 CR, browsers are expected, and partly required, to support them all.
4) Your colleagues (designers/developers/...) may regard your code (and you) as old-fashioned, non-semantic, and whatever.
5) Code maintenance and development may be more difficult. If you have 1,000 pages with <body bgcolor=black> and the customer says they want a somewhat different background color, you would need to edit each page. This argument is, however, weaker than it seems to be. First, how often do such things actually happen? Second, if the pages have actually been generated using suitable tools, perhaps you just need to change the value of one parameter and regenerate them (or just let servers do that, if the pages are dynamically generated). Third, if you have a link element on all pages, referring to basic style sheet for the pages, as you normally should, you just need to add one rule to that style sheet. It is easy to override presentational HTML attributes with CSS.
To summarize, the practical arguments against your approach are rather weak. The most important arguments relate to coding style and principles.
I've added some more disadvantages:
Another disadvantage of using those tags is site bandwidth. When you put center, bgcolor and similar tags in the HTML, the browser has to load them with the whole content every time, even if those tags are the same on every page and even if the user has visited the site many times. But when you place the design in a CSS file, browsers may cache that file (especially when you set the headers properly), so they only load the HTML and the images (if no cache is set for them).
Another thing is that if you decide to redesign the site or style new elements, it's much easier to make the changes only in CSS files. It's possible that in the future you won't be making those changes on your own, or that other companies or freelancers will be making them, and it will be much easier for them to change the site. So the site will be cheaper to maintain.
In addition, if the HTML/PHP code is poor (or the site is very complex) and many "visual conditions" appear in many files (for example, on one page you decide to use one colour and put it in the HTML, and on another page another colour) and something goes wrong, it will be much easier to find the problem, because you can simply cut out some CSS and check where the problem is.
The disadvantage is when one of the major browsers chooses to get rid of the deprecated tag in a future release.
The advantage of using CSS over tags is that you can change the whole web site look and feel in a simple move.
Consider people who require larger font sizes or who are colour blind. CSS also makes it easier to support screen readers.
Even those consistently behaving tags may be removed from browsers. What if you would like to create an HTML5 website? Then you will need to learn everything from scratch and change literally everything for your website to make it work, because you never know whether those tags will be supported in HTML5 in the future or only in older HTML documents.
CSS provides easier maintenance, for one; client decides they want some elements aligned left instead of center? Change your css rule and poof, you're done. But if you're using old-school valign and such? Get ready to go change every single instance of that in the file(s).

Big development teams can't handle a single CSS style sheet?

I am currently in a large development team of 5-7 people creating a really large website with lots of pages and features.
I feel like we are in a situation where a developer can change the style sheet to suit his own needs, but is unaware of the 1000 places where the change probably alters something else. I cannot blame him either, since I know it's hard to check everything.
It's a total mess.
I know that using one single style sheet file saves bandwidth and prevents duplicated code and maintenance, but I can't help wondering: is using style sheets a good idea for big sites, or should it be more object/element oriented?
Let's say you forget about the crazy large CSS and you define the CSS on each element instead. So each time you render a GreenBuyButton, it has the "style='bla bla bla'" on it. And this is pretty much done for all elements.
This will increase the bandwidth, but it will not create duplicated code.
Could this be a good idea, or how do really large teams working on a single website handle CSS to avoid it becoming a mess?
Why don't you create multiple CSS sheets depending on the area of the site?
blog.css
accounts.css
shopping.css
Then you could have a serverside script (say PHP) combine all CSS into 1 sheet which will get you the same result of 1 small file (could use a minimizer as well).
Check your overall site with a CSS checker to find duplicates (css defined) and manage it that way.
Otherwise, communication is key within your team: who develops what, so that people don't duplicate CSS definitions. A master CSS keeper would be best suited to manage the CSS styles. Besides, your team should have an agreed-upon style and not go rogue creating their own unique styles.
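The answer mentions PHP for the combining step; the same idea can be sketched in Python (used here only for illustration). The function names are made up, and minification is left out. Order matters, because rules in later sheets win over earlier ones of equal specificity, so the caller passes the file names in the order they should cascade.

```python
from pathlib import Path


def combine_sheets(sheets):
    """Join (name, css_text) pairs into one stylesheet, in the given order."""
    parts = [f"/* {name} */\n{css.strip()}" for name, css in sheets]
    return "\n\n".join(parts) + "\n"


def load_and_combine(css_dir, names):
    """Read the named files from css_dir and combine them in order."""
    root = Path(css_dir)
    return combine_sheets(
        (name, (root / name).read_text(encoding="utf-8")) for name in names
    )
```

For example, load_and_combine("css", ["blog.css", "accounts.css", "shopping.css"]) would return a single string that the server can cache and serve as one file; the "css" directory here is an assumed location.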
My recommendation would be to use the CSS rules on specificity to help you. For each CSS rule that is not global, prefix it with a selector that scopes it to its page or section, for example
.user-list p {
font-size: 11pt;
}
.login-screen p {
font-size: 12pt;
}
This will make it easy to identify which rules are for which pages, and which rules are global. That way developers can stick to their own set of styles, and not mess up anyone else's.
Change how you write CSS.
Instead of treating every area of the website like a specific piece of markup that needs styling, start defining broad classes.
Enforce some rules. Like, "All <ul> have a specific look for this project." If there are multiple ways you want to style an element, start using classes. This will keep your website looking uniform throughout. Uniformity reduces broken layout.
Create building block classes like a "framework" of sorts. This has helped me so often that I never start a project without doing this first. Take a look at the jquery-ui themeroller framework to give you the idea. Here's an example:
.icon { display:block;width:16px;height:16px;}
.icon-green { background:url(/green.png);}
.icon-blue { background:url(/blue.png);}
Then on the elements:
<span class="icon icon-green"></span>
<span class="icon icon-blue"></span>
Breaking your styles up into their building blocks like this and using multiple classes on the element will keep your team members from having to change styles to suit their needs. If a particular styling quirk is not available they can define a new set of classes.
UPDATE:
Here is an example of how I used this method: Movingcost.com. Huge website, multiple different sections and pages, and only 252 lines of uncompressed css. Actually, these days I break things down further than I did on the movingcost project. I probably would have gone through those elements at the bottom of the stylesheet and figured out how to combine some of those into classes.
Multiple CSS files and combine in code
While doing development I found out that doing it the following way seems to be reasonable and well suited to development teams:
Don't put any styling into HTML. Maintainability will suffer, and there will be lots of head scratching over why certain things don't display as expected.
Have one (or a few) global CSS files that define styles for global parts. These usually define everything in the template/master, and can be bound to the master page or to generic controls used on the majority of pages.
Have per-page/per-control CSS files when they are actually needed. Most of the pages won't need them, but developers can write them.
Have these files well structured in folders.
Use naming and formatting guidelines so everyone will be able to write/read code.
Write server-side code that will combine multiple CSS files into a single one to save bandwidth.
You can as well automate some other tasks like auto adding per-page CSS files if they're named the same as pages themselves.
Doing it this way will make it easier to develop, since single CSS files will be easier to handle due to less content and you will have less code merging conflicts, because users will be working on separate functionality most of the time.
But there's no feasible way of automating CSS unit tests that would make sure that changing an existing CSS setting won't break other parts of your site.
My favorite override trick is to assign the id attribute on the <body> of each page. It's an easy way to make page specific changes without breaking out a separate stylesheet file.
You could have the following html
<body id="home">
<h1>Home</h1>
</body>
<body id="about">
<h1>About</h1>
</body>
And use the following css overrides
h1 {color: black}
#about h1 {color: green}
The home page gets the default css while the about gets overridden.
Using style sheets on large sites is an excellent idea. However, it only really works when you apply your team standards to the style. It makes sense to have a singular template controller that links your style sheet(s). It also makes sense to appoint someone on the team as "keeper of the style" who all changes to the style sheet should go through before making substantive changes.
Once the style standards are agreed upon and defined, then all of the controls in the site should implement the styles defined. This allows developers to get out of the business of coding to style and simply coding to the standard. Inputs are inputs, paragraphs are paragraphs, and floating divs are a headache.
The key is standardization within the team and compliance by all of the developers. I currently lead a team site that has upwards of 30 style sheets to control everything for layout, fonts, data display, popups, menu and custom controls. We do not have any of these issues because the developers very rarely need to edit the style sheet directly because the standards are clearly designed and published.
The answer is in the name. The reason it's called cascading style sheets is that multiple sheets can be combined, and there are decent rules defining which one takes precedence.
First of all, doing all your styling inline is a ridiculous idea. Not only will it waste bandwidth like nothing else, it will also result in inconsistency. Think about it for a while: why would changing a line of css 'break' another page? That indicates your css selectors are poorly chosen.
Here are my suggestions:
use one css file for the basic site look. This css file is written by people doing mainly design, and as a result the site has a consistent look. It defines the basic colors, layout and such.
use another css file per 'section'. For instance, a 'shopping' section will use components that are nowhere else on the site. Use that to define section-specific stuff
put page-specific styling directly in the page (in the header). If this section becomes too big, you're doing something wrong
put exceptional styling directly on the components. If you're doing the same thing three times, abstract it out and use a class instead.
choose your classes wisely and use the semantics for naming. 'selectedSalesItem' is good 'greenBold' is bad
if a developer changes a stylerule and it breaks the rest of the site, why did he need to change it? Either it's an exceptional thing for what he's working on (and should be inlined) or it was basically broken on the rest of the site as well, and should be fixed anyway.
If your css files become too big to handle, you can split them up and merge them server-side, if you want.
You don't want to define CSS for each element because if you ever need to make a change that affects many elements one day, say the looks of all the buttons or headers, you will be doing a lot of Search/Replace. And how to check if you forgot to update one rule to keep your site consistent?
Stephen touched on a very strong point in CSS. You can assign multiple classes to an element.
You should define some basic rules that "ordinary" developers can't touch. They will provide the consistency through the site.
Then developers can assign an extra class to personalize any property. I wouldn't assign more than two classes though: a global and a personalized.
Considering you already have this huge stylesheet in your hands, I'm not sure how you will pick which one of the 7 developers will have to sit down for a month and organize it. That is probably going to be the hard part.
First off, you need to extract your website's default element styling and page structure into a separate stylesheet. That way people understand changing those rules affects the entire site's appearance/structure, not just the page they're working on.
Once you do that, all you really need to do is document / comment all of your code. A person is a lot less likely to write duplicate code in a well-documented stylesheet, and that is a fact.

Html/ css coding standards

I'm building my first website for an internship. My instructors always told me to never embed any styles on my html page. Now that I'm actually creating a site I find it annoying that, if I want to change the color of my font for a span tag - I have to I.D. it and reference it in a css file. Is there some other reason than organizational purposes for using CSS? Would embedding a single style be such a convention breaker? Thanks for reading this and I'd appreciate any feedback.
There are a couple of reasons.
Times when you want to change the style of a single element on a single page should be exceedingly rare, so it shouldn't be such a hardship. Any other time, it is going to be more efficient (from an HTTP caching perspective) and easier to maintain (from a separation of style and structure perspective) to externalize the style information.
Since there is a good chance that you'll want to style it differently for different media (e.g. screen and print), you'll need a proper stylesheet for that too.
If you embed a style in several HTML pages, and want to change it later, you have to go file by file changing it. That is one good enough reason for me.
The key word here is maintainability. Organized code is maintainable code! It is far better to add an id to a tag and reference it in the global css file than to do it inline, because if you want to change that style later, you know where to find it, and you only have to change it in one place.
The reason you want to offload the CSS into a different file is so the browser can cache it. Otherwise, the browser has to load all the CSS as well as all the markup on every page. If you keep it in a separate file, the browser only has to load the CSS once.
The basic argument for this is that HTML's purpose is to provide structure while CSS's job is to provide styling, by embedding CSS in HTML you're breaking this basic rule. Plus, you'll have a tough time in maintaining pages.
Ideally, a design should be consistent enough that you can use generic rules for such situations. If you want to emphasize something, then <em> or <strong> is likely the way to go. After styling your <em> or <strong>, you can easily add the same emphasis to other areas of the site.
It's not simply about performance or style, it's also about consistency and ease of maintenance.
Find the similar elements in your design and mark them up similarly. It's as easy as that.
Even if it's "just 1" you should still do it because it helps you get in the habit of it.
Embedded CSS has the following problems:
1. It has browser compatibility problems. For example, IE has problems understanding inbuilt styling.
2. If you want to use the same CSS style again, it is better to have a class for it.

Page-specific css rules - where to put them?

Often when I'm designing a site, I have a need for a specific style to apply to a specific element on a page and I'm absolutely certain it will only ever apply to that element on that page (such as an absolutely positioned button or something). I don't want to resort to inline styles, as I tend to agree with the philosophy that styles be kept separate from markup, so I find myself debating internally where to put the style definition.
I hate to define a specific class or ID in my base css file for a one-time use scenario, and I dread the idea of making page-specific .css files. For the current site I'm working on, I'm considering just putting the style definition at the top of the page in the head element. What would you do?
Look to see if there's a combination of classes which would give you the result that you want. You might also want to consider breaking up the CSS for that one element into a few classes that could be re-used on other elements. This would help minimize the CSS required for your site as a whole.
I would try to avoid page-specific CSS at the top of the HTML files since that leaves your CSS fragmented in the event that you want to change the appearance of the site.
For CSS which is really, truly, never to be used on anything else, I would still resort to putting a #id rule in the site-wide CSS.
Since the CSS is linked in from a different file it allows the browsers to cache that file, which reduces your server bandwidth (very) slightly for future loads.
There are four basic cases:
1. style= attribute. This is the least maintainable but easiest to code. I personally consider use of style= to be a bug.
2. <style> element at the top of the page. This is slightly better than style= because it keeps the markup clean. However, it wastes bandwidth and makes it harder to make sweeping CSS changes, because you can't look at the stylesheet(s) and know what rules exist.
3. page-specific css: This lets you have the clean HTML and clean main CSS file. However, it means your client must download lots of little CSS files, which increases bandwidth and page loading latency. It is, however, very easy to maintain.
4. one big site-wide CSS: The main advantage of one big file is that it's only one thing to download. This is much more efficient in terms of bandwidth and latency.
If you have any server-side programming going on, you might be able to just dynamically combine multiple sheets from #3 to get the effect of #4.
I would recommend one big file, whether you actually maintain it as one file or generate the file through a build process or dynamically on the server. You can specify your selectors using page-specific IDs (always include one, just in case).
As for the answer that was accepted when I wrote this, I disagree with finding a "combination of classes that gives you the result you want". This sounds to me like the classes are identifying a visual style instead of a logical concept. Your classes should be something like "titlebox" and not "red". Then if you need to change the text colour on the user info page, you can say
#userInfoPage .titlebox h1 { color : red; }
Don't start applying classes all over the place because a class currently has a certain appearance that you want. You should put high-level concepts into your page, represented by HTML with classes, and then style those concepts, not the other way around.
I would set an id for a page like
<body id="specific-page"> or <html id="specific-page">
and make use of css override mechanism in the sitewide css file.
I think you should definitely expand the thought process to include some doubt for "page specific css". This should be a very very rare thing to have. I'd say go for the global style sheets anyway, but refactor your css / html in a way that pages don't have to have super-specific styling. And if in the end there's a few lines of page-specific markup in the global css, who cares. It's better to have it in a consistent place anyway.
Defining the style in the consuming page or inlining your style are two sides of the same coin - in both cases you are using page bandwidth to get the style in there. I don't think one is necessarily better than the other.
I would advocate making an #Selector for it in your site-wide main stylesheet. The pollution is minimal and if you really have that many truly unique cases, you may want to rethink the way you mark up your sites.
I would put them in a <style /> tag at the top of the page.
It's not worth it to load a page-specific CSS file for one or two specific rules. I would place it in <style> tags in the head of the document. What I usually do is have my site-wide CSS file and then, using comments, section it up based on the pages and apply specific rules there.
As you know style-sheet files are static files and cached at client. Also they can be compressed by web server. So putting them in an external file is my choice.
For that situation, I think putting the page-specific style information in the header is probably the best solution. Polluting your site-wide style sheet seems wrong, and I agree with your take on inline styles.
In that case I typically place it at the top of the page. I have a page definition framework in PHP that I use which carries local variables for each page, one of which is page-specific CSS styles.
Put it in the place you would look if you wanted to know where the style was defined.
For me, that's exactly the same place as I would place styles that were used 2 times, 5 times, or 170 times - I see no reason to exclude styles from the main stylesheet(s) based on number of uses.

Apart from <script> tags, what should I strip to make sure user-entered HTML is safe?

I have an app that reprocesses HTML in order to do nice typography. Now, I want to put it up on the web to let users type in their text. So here's the question: I'm pretty sure that I want to remove the SCRIPT tag, plus closing tags like </form>. But what else should I remove to make it totally safe?
Oh good lord you're screwed.
Take a look at this
Basically, there are so many things you want to strip out. Plus, there's stuff that's valid, but could be used in malicious ways. What if the user wants to set their font size smaller on a footnote? Do you care if that gets applied to your entire page? How about setting colors? Now all the words on your page are white on a white background.
I would look into the requirements phase again.
Is a markdown-like alternative possible?
Can you restrict access to the final content, reducing risk of exposure? (meaning, can you set it up so the user only screws themselves, and can't harm other people?)
You should take the white-list rather than the black-list approach: Decide which features are desired, rather than try to block any unwanted feature.
Make a list of desired typographic features that match your application. Note that there is probably no one-size-fits-all list: It depends both on the nature of the site (programming questions? teenagers' blog?) and the nature of the text box (are you leaving a comment or writing an article?). You can take a look at some good and useful text boxes in open source CMSs.
Now you have to choose between your own markup language and HTML. I would choose a markup language. The pro is better security; the con is the inability to embed unanticipated web content, like YouTube videos. A good way to prevent users' rage is to add an "HTML to my-site" feature that translates the corresponding HTML tags to your markup language and deletes all other tags.
The pros for HTML are consistency with standards, extensibility to new content types, and simplicity. The big con is code injection. Should you pick HTML tags, try to adopt some working system for filtering HTML (I think Drupal is doing quite a good job in this case).
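To make the "HTML to my-site" idea concrete, here is a minimal sketch in Python using only the standard library. The markup syntax (`**` for bold, `_` for italics), the `TAG_TO_MARKUP` table, and the function name are all made up for illustration; a real site would substitute its own markup.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to an invented site markup.
TAG_TO_MARKUP = {"b": "**", "strong": "**", "i": "_", "em": "_"}


class HtmlToSiteMarkup(HTMLParser):
    """Translate known tags to site markup and silently drop every other tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Unknown tags map to the empty string, i.e. they are deleted.
        self.parts.append(TAG_TO_MARKUP.get(tag, ""))

    def handle_endtag(self, tag):
        self.parts.append(TAG_TO_MARKUP.get(tag, ""))

    def handle_data(self, data):
        # Text is kept as plain text; it still has to be escaped at render time.
        self.parts.append(data)


def html_to_site_markup(text):
    parser = HtmlToSiteMarkup()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


print(html_to_site_markup("<b>bold</b> and <i>it</i><object></object>"))
```

Note that the text inside a dropped tag is kept, so the result must still be HTML-escaped when the site markup is rendered back to a page.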
Instead of blacklisting some tags, it's always safer to whitelist. See what Stack Overflow does: What HTML tags are allowed on Stack Overflow?
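As a rough illustration of the whitelist approach, the following Python sketch keeps only tags from an allowed set, drops every attribute, and escapes all text. The `ALLOWED_TAGS` set and the `sanitize` name are assumptions for this example, not the list any particular site uses, and a production system would be better served by a maintained sanitizer library than by hand-rolled code.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; choose the tags your own site actually needs.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "ul", "ol", "li", "blockquote", "code"}
VOID_TAGS = {"br"}  # tags that never get a closing tag


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags, without attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # every attribute is dropped

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text from removed tags (including <script>) survives only as inert text.
        self.out.append(escape(data))


def sanitize(text):
    parser = WhitelistSanitizer()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="steal()">hi</b><script>alert(1)</script>'))
```

The sketch does not balance tags, so an unclosed <b> in the input stays unclosed in the output; that affects layout rather than script execution, but it is worth handling before relying on something like this.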
There are just too many ways to embed scripts in the markup. javascript: URLs (encoded of course)? CSS behaviors? I don't think you want to go there.
There are plenty of ways that code could be sneaked in. Especially watch for situations like <img src="http://nasty/exploit/here.php"> that can feed a <script> tag to your clients. I've seen <script> blocked on sites before, but the <img> tag got right through, which resulted in 30-40 passwords stolen.
<iframe>
<style>
<form>
<object>
<embed>
<bgsound>
Those are what I can think of. But to be sure, use a whitelist instead: things like <a> and <img>† that are (mostly) harmless.
† Just make sure that any javascript:... / on*=... are filtered out too... as you can see, it can get quite complicated.
I disagree with person-b. You're forgetting about javascript attributes, like this:
<img src="xyz.jpg" onload="javascript:alert('evil');"/>
Attackers will always be more creative than you when it comes to this. Definitely go with the whitelist approach.
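For what it's worth, the same whitelist idea can be applied to attributes. The Python sketch below accepts an attribute only if it is explicitly listed for that tag, which rejects onload, onclick, and every other on* handler without having to enumerate them, and it only accepts URLs whose scheme is on a short list. The tables and the `keep_attribute` name are hypothetical, and the function assumes the value has already been entity-decoded (as an HTML parser would hand it over).

```python
import re

# Hypothetical per-tag attribute whitelist and URL scheme whitelist.
ALLOWED_ATTRS = {"a": {"href", "title"}, "img": {"src", "alt"}}
URL_ATTRS = {"href", "src"}
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# Leading control characters and spaces are ignored by browsers in URLs.
_LEADING_JUNK = "".join(map(chr, range(0x21)))


def url_scheme(value):
    """Return the lowercased scheme of a URL, or None for a relative URL."""
    # Browsers also drop tabs and newlines anywhere in a URL.
    cleaned = "".join(ch for ch in value if ch not in "\t\r\n")
    cleaned = cleaned.lstrip(_LEADING_JUNK)
    head = re.split(r"[/?#]", cleaned, maxsplit=1)[0]
    if ":" in head:
        return head.split(":", 1)[0].lower()
    return None


def keep_attribute(tag, name, value):
    """Decide whether one (already decoded) attribute may be kept."""
    name = name.lower()
    if name not in ALLOWED_ATTRS.get(tag.lower(), set()):
        return False  # covers onload, onclick, style, and anything unlisted
    if name in URL_ATTRS:
        scheme = url_scheme(value or "")
        return scheme is None or scheme in ALLOWED_SCHEMES
    return True


print(keep_attribute("img", "onload", "javascript:alert('evil');"))  # False
print(keep_attribute("img", "src", "xyz.jpg"))  # True
```

Checking the scheme against a short allowed list, rather than looking for the string "javascript:", is what keeps mixed-case and whitespace-padded variants from slipping through.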
MediaWiki is more permissive than this site. It accepts colors (even white on white), margins, indents, and absolute positioning (including positions that put the text completely off screen), null clippings and "display:none", font sizes (even ridiculously small or excessively large ones), and font names (even a legacy non-Unicode Symbol font name that will not render text successfully), whereas this site strips out almost everything.
But MediaWiki successfully strips the dangerous active scripts out of CSS (i.e. behaviors, onEvent handlers, active filters, and javascript link targets) without filtering out the style attribute completely, and it bans a few other active elements like object, embed, and bgsound.
Both sites ban marquees as well (not standard HTML, and needlessly distracting).
But MediaWiki sites are patrolled by lots of users, and there are policy rules to ban users who repeatedly abuse these features.
It supports animated images and active extensions, such as one that renders TeX math expressions, other approved extensions (like timeline), and ones that create or customize a few forms.