Is it safe to allow embedding an arbitrary external stylesheet into my web-page?

I have a dynamic web-page which I want other people to embed into their web-pages, with an iframe (not necessarily with any kind of more advanced techniques like JavaScript).
Instead of providing all sorts of designs and styles myself, I'm thinking about allowing them to provide their own stylesheet for my page through an HTTP GET parameter, and embed such external stylesheet through a URL w/ <link type="text/css" rel="stylesheet" href… on my page.
Is this safe? Will it violate the security paradigm of my web-site? I'm aware that extra text could be inserted with CSS alone, and indeed elements could be removed (which is the whole point of me providing such functionality for my users), but anything else I should be aware of?
Could malicious people insert links onto my site through such a CSS, to benefit from my http referer and potentially violate some checks, or is CSS insertion limited to text?

In the general case, no, allowing third-party CSS is not safe. Some implementations allow JavaScript in CSS, which means that allowing users to modify your CSS allows them to execute arbitrary JavaScript in the context of your page.
However, if this is meant to be sort of a "white-label" page, where it appears to be part of the site it's embedded in and the fact that it's really your page is just an implementation detail, this doesn't seem like a major concern. The person specifying the "third-party" CSS is the site owner, so it's not really third-party at that point — they're not going to XSS themselves!
But nobody else should ever be putting CSS on a page that's meant to be under your control, because it's really under the control of whoever is controlling the CSS.
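One way to keep the stylesheet parameter limited to the site owners you actually trust is to check the URL against an allowlist before emitting the link tag. The following is only a minimal sketch under that assumption: the host names and the stylesheet_link helper are hypothetical, not part of the original question.

```python
from html import escape
from urllib.parse import urlsplit

# Hypothetical allowlist: hosts of the partner sites permitted to supply CSS.
ALLOWED_CSS_HOSTS = {"partner.example.com", "static.partner.example.org"}


def stylesheet_link(css_url):
    """Return a <link> tag for css_url, or an empty string if it is rejected."""
    try:
        parts = urlsplit(css_url)
    except ValueError:
        # Malformed URLs are rejected outright.
        return ""
    # Require HTTPS and a known host; this also rejects javascript: and data: URLs.
    if parts.scheme != "https" or parts.hostname not in ALLOWED_CSS_HOSTS:
        return ""
    # Escape the value so it cannot break out of the href attribute.
    href = escape(css_url, quote=True)
    return '<link rel="stylesheet" type="text/css" href="%s">' % href
```

This does not make the CSS itself safe; it only narrows who can supply it, which is the distinction the answer above draws.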

CSS cannot insert linkable content. It can only style, position and hide what's already there. Sure, people can mess up your page with :before and :after text and perhaps make things look a little confusing or change labels on existing links, but not the URLs themselves.

Related

Optimize CSS Delivery - a suggestion by Google

Google suggests inlining the most important CSS in the head and putting the remaining CSS inside <noscript><link rel="stylesheet" href="small.css"></noscript>.
This raises a few questions in my mind:
How do I prioritize CSS across two files? Everything for the page looks important: display, fonts, etc. If I move some of it to the bottom, how does that help the page render? Won't it cause a repaint?
Is that CSS only required after the document ready event? Got it from here.
How can CSS go inside <noscript></noscript>, which is meant for the case where scripts are unavailable? Will it work when JavaScript is enabled? Is it compatible across browsers?
Reference
Based on my reading of the link given in the question:
Choose which CSS declarations are inlined based on eliminating the Flash-of-Unstyled-Content effect. So, ensure that all page elements are the correct size and colour. (Of course, this will be impossible if you use web-fonts.)
Since the CSS which is not inlined is deferrable, you can load it whenever makes sense. Loading it on DOMContentLoaded, in my opinion, goes against the point of this optimisation: launching new HTTP requests before the document is completely loaded will potentially slow the rest of the page load. Also, see my next point:
The example shows the CSS in a noscript tag as a fallback. Below the example, the page states
The original small.css is loaded after onload of the page.
i.e. using javascript.
If I could add my own personal opinion to this piece:
this optimisation seems especially harmful to code readability: style sheets don't belong in noscript tags and, as pointed out in the comments, it doesn't pass validation.
It will break any potential future enhancements to HTTP (or other protocol) requests, since the network transaction is hard-coded through javascript.
Finally, under what circumstances would you get a performance gain? Perhaps if your page loads a lot of initially-hidden content; however I would hope that the browser itself is able to optimise the page load better than this hack can.
Take this with a grain of salt, however. I would hesitate to say that Google doesn't know what they're doing.
Edit: note on flash-of-unstyled-content (abbreviated FOUC)
Say you have a block of text that spans multiple lines and includes some text with custom styling, say <span class="my-class">. Now, say that your CSS will set .my-class { font-weight:bold }. If that CSS is not part of the inline style sheet, .my-class will suddenly become bold after the deferred loading has finished. The text block may reflow, and might also change size if it requires an extra line.
So, rather than a flash of totally-unstyled content, you have a flash of partly-styled content.
For this reason you should be careful when considering what CSS is deferred. A safe approach would be to only defer CSS which is used to display content which is itself deferred, for example hidden elements which are displayed after user interaction.

Making a div non-indexable?

I have a div with some sentences that I don't want to be indexed by search engines.
Is it possible to somehow hide this from Google in a way?
I thought about using frames, and having the site within the frame being blocked by robots.txt, but I've never liked the idea of using frames.
Are there other options?
Technically, you could use iframe and put <meta name=robots content=noindex> into the iframed document. Using suitable attributes and CSS, you can make the iframed document appear as part of the page, mostly, though you would still need to reserve some fixed area for it.
Or you could generate the div with JavaScript, though then it would not be seen when JavaScript is disabled. Note that search engine bots may execute JavaScript code and might thus “see” the generated content, though I would not expect that to happen now or in the near future.
If the content is text, without internal markup or images etc., you could have an empty div with a CSS rule that adds content using the :before pseudo-element and the content property. This would fail for users with CSS disabled or with an aggressive user style sheet, and search engine bots might some day start interpreting CSS.
There might be trickier methods, too, but on the whole, I think there is no good way. It’s probably more useful to consider why you would want to prevent search engines from finding the page on the basis of its content. As a tool for hiding information, it would be inefficient.
You could create images from the sentences, then the text wouldn't be indexed.

Embed sandboxed HTML on a webpage + SEO

I want to embed some HTML on my website... I would like that:
SEO: that content can be crawled and indexed
Integration: it renders nicely (does not break my DOM trees for instance, or does not inherit my styles)
Security: it remains safe for our user (javascript disabled)
Flexibility: the HTML can be completely free (don't want any BBCode or MarkDown or even TinyMCE, it's our users that are writing the HTML code...)
I saw that I might be able to use the IFrame for that, but I am not sure it is a very good solution concerning my SEO constraint.
Any answer would be greatly appreciated!!! Thanks.
For your requirements (rendering and security, primarily), IFRAME seems to be your only option, especially when we consider no rules are specified for the HTML content except the JS removal. Even some CSS + 'a' tag can bring a serious security risk, like overlaying outgoing links on your standard interface.
For the SEO part, you can use SEO maps to show the search engines the relation between the content and the container, also use html tags like link to make connection.
To make sure the user's HTML is safe, you should use HTML Purifier. As for the rest of the question, you should split it up into multiple questions.

How does the Traditional "HTML is only for content" line of thought handle dynamic formatting?

For so long, I've read and understood the following truths concerning web development:
HTML is for content
CSS is for presentation
JavaScript is for behavior.
This is normally all fine and good, and I find that when I strictly follow these guidelines and use external .css and .js files, it makes my entire site much much more manageable. However, I think I found a situation that breaks this line of thought.
I have a custom forums system that I've built for one of my sites. In addition to the usual formatting for such a system (links, images, bold, italics and underline, etc.) I've allowed my users to set the formatting of their text, including color, font family, and size. All of this is saved in my database of forum messages as formatting code, and then translated to the corresponding HTML when the page is viewed. (A bit inefficient; technically I should translate before saving, but this way I can work on the system live.)
Due to the nature of this and other similar systems, I end up with a lot of tags floating around the resulting HTML code, which I believe are unofficially deprecated since I'm supposed to be using CSS for formatting. This breaks rules one and two, which state that HTML should not contain formatting information, preferring that information to be located in the CSS document instead.
Is there a way to achieve dynamic formatting in CSS without including that information in the markup? Is it worth the trouble? Or, considering the implied limitations of proper code, am I to limit what my users can do in order to follow the "correct" way to format my code?
It's okay to use the style attribute for elements:
This is <span style="color: red;">red text</span>.
If users are limited to only some options, you can use classes:
This is <span class="red">red text</span>.
Be sure to use semantic HTML elements:
This is <strong>strong and <em class="blue">emphasized</em></strong>
text with a link.
Common semantic elements and their user-space terms:
<p> (paragraphs)
<strong> (bold)
<em> (italic)
<blockquote> (quotes)
<ul> and <ol> with <li> (lists)
More...?
Likely less common in forum posts, but still usable semantic elements:
<h1>, <h2>, etc. (headings; be sure to start at a value so your page makes sense)
<del>, and, to a lesser extent, <ins> (strikeout)
<sup> and <sub> (superscript and subscript, respectively)
<dl> with <dt> and <dd> (list of pairs)
<address> (contact information)
More...
This is a bit tricky. I would think about what you really want to allow visitors to do. Arbitrary colours and fonts? That seems rather useless. Emphasis, headings, links, and images? Well that you can handle easily enough by restricting to those tags / using a wikitext/forumtext markup that only provides these features.
You could dynamically build an inline style sheet in the head of the HTML page fed to the users. Put a style element in the head of the page and let it target those elements that the user can configure.
Alternatively, there's the notion of using external stylesheets that cover the most common adjustments, but there'd be hundreds of them to account for every possible alternative. You'd need a separate external style sheet for each specific font size, colour and so on, and you'd link to those dynamically in the header, as with any external stylesheet. This is almost unbearably complex to set up, though.
Option one would work okay though.
As an example:
<style>
h1, h2, h3, h4 { font-family: Helvetica, Calibri; }
/* Populate all of these values from the database. */
p {
  font-size: 1.2em;
  font-weight: bold;
}
a {
  text-decoration: underline;
  color: #f00;
}
</style>
Also, it just occurred to me that you could probably create a per-user stylesheet to apply the configurable aspects. Use
<link href="/css/defaultstylesheet.css" type="text/css" rel="stylesheet" media="all" />
<link href="/css/user1245configured.css" type="text/css" rel="stylesheet" media="all" />
<!-- clearly the second is a stylesheet created for 'user 1245'. -->
The bonus of this approach is that it allows caching of the stylesheet by the browser. Though it might likely clutter up the css folder, unless you have specific user-paths to the user sheet? Wow, this could get complex... :)
This is an interesting situation because you can have an infinite number of different styles, depending on your users' tastes and their own personal styles.
There are a couple of things you can be doing to manage this situation efficiently. Probably the easiest would be to just use style overrides:
<p style="color: blue; font-size: 10pt;">Lorem Ipsum</p>
This is quick and easy. And remember, this is what style overrides are there for. But as you've said, this does not fit well with this content-presentation separation paradigm. To separate them a little more, you could build some CSS information on page load and then insert it into the <head> tag of your HTML. This still keeps the HTML and the CSS somewhat distinct, even though you're not technically separating them.
Your other option would be to build the CSS and then output that to a file. This, however, would not be efficient (in my opinion). If you were to, on every page load, build a new CSS file that accounts for your users' unique preferences, this would sort of defeat the purpose. It's the same thing as the second option, using the <head> tag, you're just making it look separated. Even if you used techniques such as caching to try to limit how often you have to build a CSS file, will the ends really justify the means?
This is a completely subjective topic and you should, in the end, choose what you're most comfortable with.
I don't know which framework or even language you are using but e.g. Django uses a certain template language to sort of represent the HTML being output. I think a nice solution would be to simply use a different "template" depending on what the user has chosen. This way you wouldn't have to care about breaking the "rules" or having a bunch of basically unused tags floating around in the DOM.
Unless I completely misunderstood...!
The easiest way to manage this is probably to emit dynamic CSS when the pages are generated, based on the user's settings. Then everything is doing the job it is supposed to be doing and the server is doing the work of converting the user's settings into the appropriate CSS.
With the CSS doing this work, you can use appropriate attributes in the HTML (id and name and class and so on) and emit CSS that will cleanly format everything the way you want.
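As a rough illustration of emitting CSS from a user's settings on the server, here is a minimal sketch. The settings keys, the allowed fonts, the size limits and the .post-user-N class naming are all assumptions made for the example; the point is only that each value is validated before it is written into the stylesheet.

```python
import re

# Hypothetical allowlists; the real fonts and size limits are up to the site.
ALLOWED_FONTS = {"Helvetica", "Georgia", "Courier New"}
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def user_css(user_id, settings):
    """Build one CSS rule for a user's posts from validated settings."""
    declarations = []
    color = settings.get("color", "")
    if isinstance(color, str) and HEX_COLOR.fullmatch(color):
        declarations.append("color: %s;" % color)
    font = settings.get("font")
    if font in ALLOWED_FONTS:
        declarations.append('font-family: "%s";' % font)
    size = settings.get("size")
    if isinstance(size, int) and 8 <= size <= 32:
        declarations.append("font-size: %dpx;" % size)
    # int() ensures only a numeric id ever reaches the selector.
    return ".post-user-%d { %s }" % (int(user_id), " ".join(declarations))
```

The markup then only needs class="post-user-1245" on that user's posts, and any setting that fails validation is simply left out of the rule.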
Consider the benefits versus the costs before you do anything. What is actually wrong with your code right now? Tag soup and combined content/presentation is to be avoided not because it makes a bad website, but because it is hard to maintain. If your HTML/CSS is being generated, who cares what the output is? If what you've got now works, then stick to it.
I assume you are allowing only a limited white list of safe options, and therefore parsing the user's HTML already.
When rendering the HTML you could convert each style declaration to a class:
<span style="font-family: SansSerif; font-size: 18px;">Hello</span>
To:
<span class="SansSerif"><span class="size_18px">Hello</span></span>
Laborious to generate (and maintain) the list. However you needn't worry about a class for each combination, which is of course your main problem.
It also has the benefit of extra security as user's CSS is less likely to slip through your filter as it's all replaced, and this should also ensure all the CSS is valid.
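The style-to-class conversion could be sketched roughly as follows. The class names follow the example above, but the mapping tables and the style_to_classes helper are hypothetical; a real system would list whatever fonts and sizes it chooses to support.

```python
import re

# Hypothetical mapping from permitted declarations to predefined class names.
FONT_CLASSES = {"sansserif": "SansSerif", "serif": "Serif", "monospace": "Monospace"}
ALLOWED_SIZES = {12, 14, 18, 24}


def style_to_classes(style):
    """Translate an inline style string into a list of known class names."""
    classes = []
    for declaration in style.split(";"):
        name, _, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "font-family" and value.lower() in FONT_CLASSES:
            classes.append(FONT_CLASSES[value.lower()])
        elif name == "font-size":
            match = re.fullmatch(r"([0-9]+)px", value)
            if match and int(match.group(1)) in ALLOWED_SIZES:
                classes.append("size_%spx" % match.group(1))
        # Anything else is dropped, so unknown CSS never reaches the page.
    return classes
```

Because the output is built only from the predefined names, nothing the user typed is copied into the page verbatim.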
I've allowed my users to set the formatting of their text, including color, font family, and size. All of this is saved in my database of forum messages as formatting code, and then translated to the corresponding HTML when the page is viewed.
So, you've done formatting through HTML, and you know that formatting is supposed to be done through CSS, and you realise this is a problem, and you got as far as asking a 300-word SO question about it ... ?
You don't see the solution, even though you can formulate the question ... ?
Here, I'll give you a hint:
All of this is saved in my database of forum messages as formatting code, and then translated to the corresponding CSS (rather than HTML) when the page is viewed.
Does that help?
Is this question a joke?

Apart from <script> tags, what should I strip to make sure user-entered HTML is safe?

I have an app that reprocesses HTML in order to do nice typography. Now, I want to put it up on the web to let users type in their text. So here's the question: I'm pretty sure that I want to remove the SCRIPT tag, plus closing tags like </form>. But what else should I remove to make it totally safe?
Oh good lord you're screwed.
Take a look at this
Basically, there are so many things you want to strip out. Plus, there's stuff that's valid, but could be used in malicious ways. What if the user wants to set their font size smaller on a footnote? Do you care if that gets applied to your entire page? How about setting colors? Now all the words on your page are white on a white background.
I would look into the requirements phase again.
Is a markdown-like alternative possible?
Can you restrict access to the final content, reducing risk of exposure? (meaning, can you set it up so the user only screws themselves, and can't harm other people?)
You should take the white-list rather than the black-list approach: Decide which features are desired, rather than try to block any unwanted feature.
Make a list of desired typographic features that match your application. Note that there is probably no one-size-fits-all list: It depends both on the nature of the site (programming questions? teenagers' blog?) and the nature of the text box (are you leaving a comment or writing an article?). You can take a look at some good and useful text boxes in open source CMSs.
Now you have to choose between your own markup language and HTML. I would choose a markup language. The pro is better security; the con is the inability to add unexpected internet content, like YouTube videos. A good way to prevent users' rage is to add an "HTML to my-site" feature that translates the corresponding HTML tags to your markup language and deletes all other tags.
The pros for HTML are consistency with standards, extendability to new contents types and simplicity. The big con is code injection security issues. Should you pick HTML tags, try to adopt some working system for filtering HTML (I think Drupal is doing quite a good job in this case).
Instead of blacklisting some tags, it's always safer to whitelist. See what stackoverflow does: What HTML tags are allowed on Stack Overflow?
There are just too many ways to embed scripts in the markup. javascript: URLs (encoded of course)? CSS behaviors? I don't think you want to go there.
There are plenty of ways that code could be sneaked in. Especially watch for situations like <img src="http://nasty/exploit/here.php"> that can feed a <script> tag to your clients. I've seen <script> blocked on sites before, but the <img> tag got right through, which resulted in 30-40 passwords stolen.
<iframe>
<style>
<form>
<object>
<embed>
<bgsound>
Is what I can think of. But to be sure, use a whitelist instead - things like <a>, <img>† that are (mostly) harmless.
† Just make sure that any javascript:... / on*=... are filtered out too... as you can see, it can get quite complicated.
I disagree with person-b. You're forgetting about javascript attributes, like this:
<img src="xyz.jpg" onload="javascript:alert('evil');"/>
Attackers will always be more creative than you when it comes to this. Definitely go with the whitelist approach.
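To make the whitelist approach concrete, here is a minimal sketch built on Python's standard html.parser module. The ALLOWED table and the sanitize function are hypothetical choices for the example, and this is not a substitute for a maintained sanitizer library: it simply shows that everything not explicitly permitted (tags, on* handlers, javascript: URLs, script and style contents) is dropped rather than hunted down.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes permitted on that tag.
ALLOWED = {"p": set(), "strong": set(), "em": set(), "blockquote": set(),
           "ul": set(), "ol": set(), "li": set(), "a": {"href"}}
SAFE_URL_PREFIXES = ("http://", "https://")


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        if self.skip_depth or tag not in ALLOWED:
            return
        kept = []
        for name, value in attrs:
            # The only permitted attribute here is a URL, so require a safe scheme.
            if name not in ALLOWED[tag] or not value:
                continue
            if value.strip().lower().startswith(SAFE_URL_PREFIXES):
                kept.append(' %s="%s"' % (name, escape(value.strip(), quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is kept but re-escaped; script/style contents are discarded.
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

With this table, the onload example above disappears entirely, because img is not on the list and no on* attribute is ever copied through.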
MediaWiki is more permissive than this site; yes, it accepts setting colors (even white on white), margins, indents and absolute positioning (including positions that would put the text completely off screen), null, clippings and "display:none", font sizes (even if they are ridiculously small or excessively large) and font names (even if this is a legacy non-Unicode Symbol font name that will not render text successfully), as opposed to this site, which strips out almost everything.
But MediaWiki successfully strips out the dangerous active scripts from CSS (i.e. the behaviors, the onEvent handlers, the active filters or javascript link targets) without completely filtering out the style attribute, and bans a few other active elements like object, embed, bgsound.
Both sites ban marquees as well (not standard HTML, and needlessly distracting).
But MediaWiki sites are patrolled by lots of users and there are policy rules to ban those users that are abusing repeatedly.
It offers support for animated images, and provides support for active extensions, such as rendering TeX maths expressions, or other active extensions that have been approved (like timeline), or creating or customizing a few forms.