Link sharing - Google Caja HTML Sanitizer - html

I'm trying to tackle the classic "user input sanitisation" problem on my new web app and I chose to use Google Caja's HTML Sanitizer server-side for this task.
Although the implementation and testing went fine, I still have some questions:
1) I could only find the HTML4 definitions. Does this mean that HTML5 tags wouldn't be safe?
I ran some tests with XSS vectors that rely on HTML5-specific tags and attributes. None of them worked, but I'm not 100% sure that untested ones wouldn't.
2) Google Caja doesn't seem too active anymore. Would this constitute a security issue?
3) I want my users to be able to share links. How can I do this in a safe way that passes Google Caja's filters (like Stack Overflow does)?
4) How does Caja handle Unicode?
Thank you in advance!

1) I could only find the HTML4 definitions. Does this mean that HTML5 tags wouldn't be safe?
We have added HTML5 support in the past few months. Please let us know if anything is missing.
2) Google Caja doesn't seem too active anymore. Would this constitute a security issue?
Are you perhaps looking in the wrong place? We're quite busy, as you can see here.
3) I want my users to be able to share links. How can I do this in a safe way that passes Google Caja's filters (like Stack Overflow does)?
You can supply a URI policy which permits or rejects outgoing links.
4) How does Caja handle Unicode?
Correctly, I should hope. If things don't work, please file a bug.
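To make the idea of a URI policy from answer 3 more concrete, here is a minimal sketch of the kind of decision such a policy makes. It is written in Python purely for illustration and is not Caja's API: the function name is_link_allowed and the ALLOWED_SCHEMES set are assumptions, and a real deployment would plug the equivalent logic into whatever policy hook the sanitizer exposes.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes are acceptable in user-submitted links.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_link_allowed(uri: str) -> bool:
    """Hypothetical policy check: permit a link only if its scheme is allowlisted."""
    # Browsers ignore surrounding whitespace and embedded tabs/newlines in URLs,
    # so remove them first to catch inputs such as "java\tscript:alert(1)".
    cleaned = "".join(ch for ch in uri.strip() if ch not in "\t\r\n")
    try:
        parts = urlsplit(cleaned)
    except ValueError:
        # Malformed URLs are rejected rather than guessed at.
        return False
    scheme = parts.scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        # Also rejects relative URLs, which have no scheme at all.
        return False
    if scheme in {"http", "https"} and not parts.netloc:
        # "http:foo" style URLs without a host are rejected.
        return False
    return True


print(is_link_allowed("https://example.com/a?b=c"))  # True
print(is_link_allowed("javascript:alert(1)"))        # False
```

The sketch uses an allowlist rather than a blocklist, so schemes that nobody thought to test (data:, vbscript: and so on) are rejected by default.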

Related

Is there a Google Chrome switch (command line) to toggle url-encoding?

I have written C# code that launches Google Chrome to load a specific URL on the computer's local hard drive. The C# code already encodes the URL (it replaces space characters with %20), and this worked fine until a couple of weeks ago, when this issue started happening!
The quickest band-aid seems to be modifying the C# code so that it does not replace " " with %20. But Chrome's behavior may change again in the future, or some customers may not have the same version or settings of Chrome installed.
I think the sustainable solution is to tell Chrome explicitly either to encode the URL or not to encode it. Now my question is:
Is there a Chrome switch that forces it either to translate the URL or to leave it alone?
Does it have a setting or option for this?
I have found a couple of lists of Chrome switches Here and Here. However, both of these lists are incomplete and refer to this reference. I have Ctrl+F'ed for "URL" and "URI" but did not come across a relevant answer.
Thanks in advance.
I have already asked this question on Super User; however, I think people in this forum have more expertise in solving these types of issues. (Also, after 24 hours, the question on Super User has not received any feedback at all!)
There is no switch for it. However, as HexBolt has stated, adding the "file://" prefix tells Chrome that the URL is already encoded. Please see his extension to the answer for this question.
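As an illustration of the "file://" approach described in the answer, here is a small sketch that turns an absolute local path into an encoded file URL. The thread's code is C#; Python is used here only to show the transformation, and the helper name to_file_url is made up for this example. It assumes the input is an absolute path and says nothing about how any particular Chrome version treats the result.

```python
from urllib.parse import quote


def to_file_url(local_path: str) -> str:
    """Hypothetical helper: build an encoded file:// URL from an absolute path."""
    # Normalise Windows backslashes to forward slashes.
    # (Assumption: the path does not contain literal backslashes in file names.)
    posix_style = local_path.replace("\\", "/")
    # A Windows path such as "C:/x" needs a leading slash: "/C:/x".
    if not posix_style.startswith("/"):
        posix_style = "/" + posix_style
    # Percent-encode everything except path separators and the drive colon,
    # so a space becomes %20.
    return "file://" + quote(posix_style, safe="/:")


print(to_file_url(r"C:\Reports\My Page.html"))  # file:///C:/Reports/My%20Page.html
print(to_file_url("/home/me/My Page.html"))     # file:///home/me/My%20Page.html
```

Passing the full, already-encoded file URL to the browser leaves no ambiguity about whether "%20" is a literal sequence or an escaped space.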

Give all links to a certain domain the "no follow"-attribute automatically? (html, css)

I want to automatically give the "nofollow" attribute (rel="nofollow") to all external links that go to a certain domain (and all of its subpages). Only that one domain, not all external links. I use WordPress, so maybe there is a plugin (I didn't find any)?
Is that possible with CSS or HTML, without doing it by hand 1000 times?
Thanks a bunch!
PS: Sorry for my bad English, I am not a native speaker :(
Not possible with CSS or HTML. But there are a lot of plugins that do that job. By default, these plugins add nofollow to all outgoing links, and most of them can be configured to exclude certain domains. Unfortunately, that is the reverse of what you want: the setting excludes a domain from nofollow rather than applying nofollow to your "certain domain" only.
Try External Links or Ultimate Nofollow as a start. It's not too complicated.
But there are lots of others.
Or write a custom function in your functions.php. Take this tutorial as an example and modify the processing so that it only handles links to your chosen domain.
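The processing such a custom function needs to do can be sketched as follows. This is not WordPress code and not taken from the tutorial mentioned above: it is a Python illustration of the logic only, the name add_nofollow is made up, and treating subdomains as part of the target domain is an assumption. It uses regular expressions, which are fragile on unusual markup, so a real implementation would be better served by a proper HTML parser.

```python
import re
from urllib.parse import urlsplit

# Simple patterns for opening <a> tags and their quoted href/rel attributes.
A_TAG = re.compile(r"<a\b[^>]*>", re.IGNORECASE)
HREF = re.compile(r"""href\s*=\s*["']([^"']*)["']""", re.IGNORECASE)
REL = re.compile(r"""\brel\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def add_nofollow(html: str, domain: str) -> str:
    """Hypothetical helper: add rel="nofollow" to links pointing at `domain`."""
    domain = domain.lower()

    def targets_domain(url: str) -> bool:
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            return False
        # Assumption: subdomains such as shop.example.com count as well.
        return host == domain or host.endswith("." + domain)

    def rewrite(match: re.Match) -> str:
        tag = match.group(0)
        href = HREF.search(tag)
        if not href or not targets_domain(href.group(1)):
            return tag  # Other links are left untouched.
        rel = REL.search(tag)
        if rel is None:
            # No rel attribute yet: append one before the closing ">".
            return tag[:-1] + ' rel="nofollow">'
        values = rel.group(1).split()
        if "nofollow" in (v.lower() for v in values):
            return tag  # Already marked.
        values.append("nofollow")
        return tag[:rel.start()] + 'rel="' + " ".join(values) + '"' + tag[rel.end():]

    return A_TAG.sub(rewrite, html)


page = '<a href="https://example.com/page">A</a> <a href="https://other.org/">B</a>'
print(add_nofollow(page, "example.com"))
# <a href="https://example.com/page" rel="nofollow">A</a> <a href="https://other.org/">B</a>
```

Existing rel values (for example noopener) are kept and nofollow is appended, so the function does not clobber attributes set elsewhere.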

Good link checking tool?

Can anyone recommend a good, free link checker to check all pages within a domain? Ideally a browser add-on or a web app (otherwise something that runs on OSX).
Crucially it needs to follow links recursively within a domain. Links outside the domain should be followed to a depth of 1, but not checked recursively.
This is for the fairly common situation where you want to check all pages on your own site, but not evaluate the links on e.g. Google's homepage.
I can't find anything suitable. Am I missing something?
I've tried the Firefox LinkChecker add-on and the W3C link validator. Neither seems to have the 'follow recursively within a domain' property, or am I being dumb?
I know Xenu does this, but I don't run Windows.
The W3C offering does only check to a depth of 1 when it leaves the domain.
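The crawl rule the question asks for (recurse within the site, check external links once without following them) can be sketched in a few lines. This is an illustration of the traversal only, not a description of how any of the tools named above work. The fetch callable is a hypothetical hook that returns whether a URL is reachable plus the hrefs found on the page; downloading and HTML parsing are deliberately left out, and "same domain" is taken to mean "same hostname as the start URL".

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple
from urllib.parse import urldefrag, urljoin, urlsplit

# Hypothetical fetcher type: given a URL, return (is_reachable, hrefs_on_page).
Fetcher = Callable[[str], Tuple[bool, Iterable[str]]]


def find_broken_links(start_url: str, fetch: Fetcher) -> List[str]:
    """Breadth-first link check that only follows links on the start URL's host."""
    site_host = urlsplit(start_url).hostname
    queue = deque([start_url])
    seen: Set[str] = {start_url}
    broken: List[str] = []

    while queue:
        url = queue.popleft()
        ok, hrefs = fetch(url)
        if not ok:
            broken.append(url)
            continue
        # External pages are checked (depth 1) but their links are not followed.
        if urlsplit(url).hostname != site_host:
            continue
        for href in hrefs:
            # Resolve relative links and drop "#fragment" parts.
            target, _fragment = urldefrag(urljoin(url, href))
            if urlsplit(target).scheme in ("http", "https") and target not in seen:
                seen.add(target)
                queue.append(target)
    return broken
```

Because fetch is passed in, the traversal can be tried against a small in-memory dictionary of pages before pointing it at a real site.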

Google toolbar reporting "This page is in Filipino" when it is English

How does Google Toolbar determine the language of a page to offer translation from it?
Google is misidentifying a simple login page on our site as Filipino and offering to translate it into English. I've tried adding a lang="en" attribute to the <html> element of the page, but that seems to have made no difference.
Anyone know why this is happening?
Edit: It's a login page. The text of the page consists only of the following:
Admin
Log Out
Admin Panel Login
Username
Password
Plus a logo and some input boxes.
When I press the translate button, it doesn't seem to change anything.
One way you can fix this problem is to let Google know it made a mistake on translating your page. Not a real solution though, especially if there's a whole website dealing with this issue.
According to this article on multilingual websites from the Google Webmaster blog, Google's crawlers ignore language metadata such as the "lang" attribute and infer the language from the page content. Their explanation is that the lang attribute is sometimes auto-generated and therefore not reliable. Perhaps adding more English text to the page and ensuring that all the English is well-formed may fix the problem, although submitting a bug report to Google is a better way to fix the problem than adding random English text.
I had this problem on an aspx form I was making. By process of elimination, I was able to trace the problem to my calendar control. In my skin I was setting DayNameFormat="Shortest" on the calendar control. With this property set I had the issue; without it I did not. The property shortens the days of the week from "Mon" to "Mo". I'm speculating that Google's language inference was reading "words" like "Mo" and "Tu" and using them to guess that the page was Filipino. Since I didn't have many other words on the page, this must have carried enough weight to tip the result to Filipino.
Hope that helps!
jMo

validating HTML

I am a beginner in HTML and CSS. I just designed a web site and tried to validate it, but my HTML ends up containing some "geovisit();" code and it won't validate.
I do not know how to get rid of it.
Help me?
Thank you
Guest
A quick Google search for geovisit suggests that the non-validating code is being added by your hosting provider. It looks like this problem may actually be specific to Yahoo!, which has an option to disable that "feature". I suggest you read this forum thread on the problem.
That's usually Yahoo (or another hosting provider) sticking JavaScript on your page without your knowing. In Yahoo's case you should be able to turn it off if you dig through the settings.