I keep seeing things like <div ... g_editable="true" ... >.
I've searched for anything to help me understand its purpose, but all I get back is more markup and nothing explaining it.
Can somebody please explain it?
The g_editable="true" attribute is used by Google in their Gmail email composition (and likely other Google products as well). Instead of using a textarea tag they use a source-less iframe, and everything done to the email contents (highlighting, typing, making text bold, adding line breaks, etc.) is handled by JavaScript, so that it resembles a textarea but with a lot more functionality.
The g_editable="true" attribute itself is likely there to limit the scope that different functions and events have, so that you can't just boldify text in your contacts list and that sort of thing. It's probably why they have it in an iframe as well.
It may just be a custom attribute defined by Google. It could be a hook for their JavaScript and/or CSS.
For best practice though, they should have prefixed it with data-.
So, the cite attribute is used with a URL as its value to indicate the source of a quote (for <q> and <blockquote>) or a page that can provide additional information (for <del> and <ins>).
Because the browser doesn't show this URL to the end user in any way, the only reason to put it in the document would be for non-human consumers: crawlers, bots and the like. You could also use it with a script, but that's not what I have in mind.
My question: is it, in your experience, worth bothering with this attribute, or should I skip it?
What if I also linked to that URL with an <a> element, which is far more common: would your answer change?
Yes, use cite, the semantic web can be ours today. Cite becomes more important the more people use it. If everyone used the attribute, developers could make some pretty awesome stuff.
Using extra semantics such as these is also good for SEO. Even if Google does not currently look for it, it is a safe bet they eventually will.
This is an optional attribute which you can use if you feel it has value in your case. If you are concerned that your quote's legitimacy might be in question, and you have a source you took it from, then adding the cite attribute gives inquisitive people or bots an idea of where you got the information. If you don't really care that people know where you got the quote from, then don't bother.
You might also add it for your own reference, so that when you look at it a year from now you'll have that information at hand.
What does data-component-bound="true" mean?
I've found this within a collapsed element, but adjusting the value doesn't do anything. I've tried looking for the attribute "data-component-bound" on Stack Overflow and on Google, but the results point to a limited set of jQuery articles which are over my head and which take it for granted.
[edit post some answers]
Ahh, I see now that I should have been searching for "data-" to solve this. In so doing, I found this useful article which could help the next person: http://www.sitepoint.com/use-html5-data-attributes/
data- attributes are extensible: the author of the code can make up any data attribute they like. In this case, from a quick look, it seems it is used for the internal workings of websites to know when the 'component' (i.e. DOM element) is 'bound' to something - an event, an interaction, etc.
In general, data- attributes are used for that: data. They store any data, so are often used to substitute non-standard attributes that would otherwise flag up in a validator.
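To make the "it's just data" point concrete: in a browser, a script would typically read such an attribute through element.dataset, where data-component-bound shows up as dataset.componentBound (always as a string, e.g. "true"). The helper below is a hypothetical sketch, not something from the site in question; it only mirrors the name mapping so it can run without a DOM.

```javascript
// Hypothetical helper: mirrors how browsers map a data-* attribute name
// to its key on element.dataset (data-component-bound -> componentBound).
function datasetKey(attributeName) {
  const name = attributeName.toLowerCase();
  if (!name.startsWith("data-")) {
    return null; // not a data-* attribute, so it has no dataset key
  }
  return name
    .slice("data-".length)
    // A dash followed by a lowercase letter becomes that letter in upper case.
    .replace(/-([a-z])/g, (match, letter) => letter.toUpperCase());
}

console.log(datasetKey("data-component-bound")); // componentBound
```

What the value "true" actually means is still up to whichever script set it; the browser attaches no behaviour to it.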
HTML data attributes allow you to set custom data for an element. The meaning of data-component-bound is determined by your code or by some CSS or JavaScript framework that you might be using.
Search the codebase for 'component-bound' to see if it's being used. If you don't find anything, Google it to see if it's a popular attribute from some framework. If it's not, then you're safe to remove it.
https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Using_data_attributes
I am seeing some attributes I have never seen before in a div tag. I haven't touched HTML for a while, but googling the attributes didn't return much useful info.
<div dataquery="#item_1306" comp="box.components.Flashplayer" id="box_Flashplayer_2" propertyquery="#box_Flashplaye_2" class="box_Flashplaye_style2"...
My question is, do you know what these "dataquery", "comp" and "propertyquery" attributes are?
Thanks a lot, folks.
HTML is often enhanced with custom attributes these days, and HTML5 explicitly allows for that. Normally these attributes should be prefixed with "data-", but obviously this is not the case here.
The meaning depends most probably on a script included in the page.
For example, in Twitter Bootstrap it is common to see attributes like <body data-spy='scroll'>, which is then interpreted by a script and allows for monitoring how far the user has scrolled.
When including Facebook Like buttons you may have attributes like data-style, which controls whether a box, a button, or whatever is used.
You can add your own attributes to elements. I don't think these attributes are standard attributes like class and name; they are attributes that the programmer has added for some purpose of their own.
Those are not W3C attributes. They are used to perform some task, perhaps specific to the language or framework in use, and may drive some special handling. But it's not best practice, because it gives HTML validation errors; the better approach is to use data-xxxx attributes for extra attributes.
Further reading:
http://www.javascriptkit.com/dhtmltutors/customattributes.shtml
http://ejohn.org/blog/html-5-data-attributes/
http://html5doctor.com/html5-custom-data-attributes/
We want to allow "normal" href links to other webpages, but we don't want to allow anyone to sneak in client-side scripting.
Is searching for "javascript:" within the HREF and onclick/onmouseover/etc. events good enough? Or are there other things to check?
It sounds like you're allowing users to submit content with markup. As such, I would recommend taking a look at a few articles about preventing cross-site scripting, which would cover a bit more than simply preventing JavaScript from being inserted into an href attribute. Below is one I found that might be useful:
http://weblogs.java.net/blog/gmurray71/archive/2006/09/preventing_cros.html
You'll have to use a whitelist of allowed protocols to be completely safe. If you use a blacklist, sooner or later you'll miss something like "telnet://" or "shell:" or some exploitable browser-specific thing you've never heard of...
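A minimal sketch of such a protocol whitelist, assuming a runtime that provides the standard URL class (modern browsers, Node.js). The function name, the allowed set, and the placeholder base URL are all my own assumptions; adjust them to what your site actually needs.

```javascript
// Assumed allowlist: only these schemes are accepted in user-supplied links.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

// Hypothetical helper. The base URL is a placeholder used only so that
// relative links such as "/about" can be parsed.
function isSafeHref(href, base = "https://example.invalid/") {
  let url;
  try {
    // The URL parser lowercases the scheme and strips leading whitespace
    // and embedded tabs/newlines, which defeats some simple obfuscation.
    url = new URL(href, base);
  } catch {
    return false; // unparseable input is rejected rather than guessed at
  }
  return ALLOWED_PROTOCOLS.has(url.protocol);
}

console.log(isSafeHref("https://example.com/page")); // true
console.log(isSafeHref(" JaVaScRiPt:alert(1)"));     // false
console.log(isSafeHref("telnet://host"));            // false
```

This only covers the href value itself. Event-handler attributes and the rest of the markup still have to be dealt with separately.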
Nope, there's a lot more that you need to check.
First off, the URL could be encoded (using HTML entities or URL encoding or a mixture of both).
Secondly you need to check for malformed HTML, which the browser might guess at and end up allowing some script in.
Thirdly you need to check for CSS based script, e.g. background: url(javascript:...) or width:expression(...)
There's probably more that I've missed - you need to be careful!
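To illustrate the encoding point, here is a small sketch showing how a plain search for "javascript:" misses an entity-encoded value. The decoder is a hypothetical helper that handles numeric character references only; a real sanitizer would need to handle named entities, URL encoding and nesting as well.

```javascript
// Hypothetical helper: decodes decimal (&#106;) and hex (&#x6A;) numeric
// character references only. Named entities are deliberately not handled.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);?/gi, (match, code) => {
    const isHex = code[0] === "x" || code[0] === "X";
    const codePoint = parseInt(isHex ? code.slice(1) : code, isHex ? 16 : 10);
    // Leave out-of-range references untouched instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

const encodedHref = "&#106;avascript:alert(1)";

// The naive check looks at the raw text and finds nothing suspicious.
const naiveCheckFlagsIt = encodedHref.toLowerCase().includes("javascript:");

// After decoding, the same check catches it.
const decodedCheckFlagsIt = decodeNumericEntities(encodedHref)
  .toLowerCase()
  .includes("javascript:");

console.log(naiveCheckFlagsIt, decodedCheckFlagsIt); // false true
```

The browser does this decoding before it interprets the attribute, so any check has to run on the decoded value, not on the text as submitted.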
You have to be extremely careful when taking user input. You'll want to do a whitelist as mentioned, but not just with the href. Example:
<img src="nosuchimage.blahblah" onerror="alert('Haxored!!!');" />
or
click meh
One option would be to disallow HTML altogether and use the same sort of formatting that some forums use. Just replace
[url="xxx"]yyy[/url]
with
<a href="xxx">yyy</a>
That'll get you around the issues with mouseover etc. Then just make sure the link starts with a white-listed protocol and doesn't have a quote in it (" or some encoded form that might be decoded by PHP or the browser).
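Here is a rough sketch of that forum-style approach, under my own assumptions: all HTML in the input is escaped first, and only [url="..."] tags whose target starts with http:// or https:// and contains no quote are turned back into links. The function names are made up for illustration.

```javascript
// Escape every character that has special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical renderer for [url="xxx"]yyy[/url] tags.
function renderUrlTags(input) {
  const escaped = escapeHtml(input); // quotes are now &quot;
  return escaped.replace(
    /\[url=&quot;(.*?)&quot;\](.*?)\[\/url\]/g,
    (match, href, label) => {
      const hasAllowedProtocol = /^https?:\/\//i.test(href);
      const hasQuote = href.includes("&quot;") || href.includes("&#39;");
      // Anything suspicious is shown as plain text instead of a link.
      return hasAllowedProtocol && !hasQuote
        ? `<a href="${href}">${label}</a>`
        : label;
    }
  );
}

console.log(renderUrlTags('[url="https://example.com/"]site[/url]'));
// <a href="https://example.com/">site</a>
```

Because everything is escaped before the links are rebuilt, event handlers such as onerror or onmouseover typed in by the user end up as visible text, not as attributes.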
Sounds like you're looking for the companion function to PHP's strip_tags, which is strip_attributes. Unfortunately, it hasn't been written yet. (Hint, hint.)
There is, however, an interesting-looking suggestion in the strip_tags documentation, here:
http://www.php.net/manual/en/function.strip-tags.php#85718
In theory this will strip anything that isn't an href, class, or ID from submitted links; seems like you probably want to lock it down even further and just take hrefs.
I have an app that reprocesses HTML in order to do nice typography. Now, I want to put it up on the web to let users type in their text. So here's the question: I'm pretty sure that I want to remove the SCRIPT tag, plus closing tags like </form>. But what else should I remove to make it totally safe?
Oh good lord you're screwed.
Take a look at this
Basically, there are so many things you want to strip out. Plus, there's stuff that's valid but could be used in malicious ways. What if the user wants to set their font size smaller on a footnote? Do you care if that gets applied to your entire page? How about setting colors? Now all the words on your page are white on a white background.
I would look into the requirements phase again.
Is a markdown-like alternative possible?
Can you restrict access to the final content, reducing risk of exposure? (meaning, can you set it up so the user only screws themselves, and can't harm other people?)
You should take the white-list rather than the black-list approach: Decide which features are desired, rather than try to block any unwanted feature.
Make a list of desired typographic features that match your application. Note that there is probably no one-size-fits-all list: It depends both on the nature of the site (programming questions? teenagers' blog?) and the nature of the text box (are you leaving a comment or writing an article?). You can take a look at some good and useful text boxes in open source CMSs.
Now you have to choose between your own markup language and HTML. I would choose a markup language. The pro is better security; the con is the inability to add unanticipated kinds of internet content, like YouTube videos. A good way to prevent users' rage is to add an "HTML to my-site" feature that translates the corresponding HTML tags to your markup language and deletes all other tags.
The pros for HTML are consistency with standards, extensibility to new content types, and simplicity. The big con is code-injection security issues. Should you pick HTML tags, try to adopt some working system for filtering HTML (I think Drupal is doing quite a good job in this case).
Instead of blacklisting some tags, it's always safer to whitelist. See what Stack Overflow does: What HTML tags are allowed on Stack Overflow?
There are just too many ways to embed scripts in the markup. javascript: URLs (encoded of course)? CSS behaviors? I don't think you want to go there.
There are plenty of ways that code could be sneaked in. Especially watch for situations like <img src="http://nasty/exploit/here.php"> that can feed a <script> tag to your clients. I've seen <script> blocked on sites before, but the <img> tag got right through, which resulted in 30-40 passwords stolen.
<iframe>
<style>
<form>
<object>
<embed>
<bgsound>
Those are what I can think of. But to be sure, use a whitelist instead: things like <a> and <img>† that are (mostly) harmless.
† Just make sure that any javascript:... / on*=... are filtered out too... as you can see, it can get quite complicated.
I disagree with person-b. You're forgetting about javascript attributes, like this:
<img src="xyz.jpg" onload="javascript:alert('evil');"/>
Attackers will always be more creative than you when it comes to this. Definitely go with the whitelist approach.
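One simple way to get a whitelist without writing an HTML parser is to escape everything and then restore only a short list of attribute-free tags. The sketch below assumes exactly that; the tag list and function names are my own choices, it does not check that tags are balanced, and it does not allow any attributes at all (so no links or images).

```javascript
// Assumed allowlist of harmless, attribute-free formatting tags.
const ALLOWED_TAGS = ["b", "i", "em", "strong", "p", "blockquote", "code"];

// Escape every character that has special meaning in HTML.
function escapeMarkup(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical sanitizer: everything is escaped, then bare allowlisted
// tags (no attributes) are turned back into real markup.
function sanitizeWithTagAllowlist(input) {
  const escaped = escapeMarkup(input);
  const pattern = new RegExp(`&lt;(/?)(${ALLOWED_TAGS.join("|")})&gt;`, "gi");
  return escaped.replace(
    pattern,
    (match, slash, tag) => `<${slash}${tag.toLowerCase()}>`
  );
}

console.log(sanitizeWithTagAllowlist("<b>hi</b><script>alert(1)</script>"));
// <b>hi</b>&lt;script&gt;alert(1)&lt;/script&gt;
```

Since a tag with any attribute never matches the pattern, something like the onload example above stays escaped and is displayed as text.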
MediaWiki is more permissive than this site. Yes, it accepts setting colors (even white on white), margins, indents and absolute positioning (including values that would put the text completely off screen), null, clipping and "display:none", font sizes (even if they are ridiculously small or excessively large) and font names (even a legacy non-Unicode Symbol font name that will not render text successfully), as opposed to this site, which strips out almost everything.
But MediaWiki successfully strips out the dangerous active scripts from CSS (i.e. the behaviors, the onEvent handlers, the active filters or javascript link targets) without completely filtering out the style attribute, and bans a few other active elements like object, embed, bgsound.
Both sites ban marquees as well (not standard HTML, and needlessly distracting).
But MediaWiki sites are patrolled by lots of users, and there are policy rules to ban users who repeatedly abuse these features.
It offers support for animated images, and provides support for active extensions, such as those that render TeX maths expressions, other approved extensions (like timeline), or ones that create or customize a few forms.