If I use a data URI to construct a src attribute for an HTML element, can it in turn have another data URI inside it?
I know you can't use data URIs for iframes. (I'm actually trying to construct an OSDX document and pass it to the browser with an icon encoded in base64, but that's a really niche use case and this is more of a general question.) Assuming you could, my use case would look like:
var iframe = document.createElement('iframe');
var icon = document.createElement('img');
var iSrc = '*[REALLY LONG STRING]*/';
iframe.src = 'data:text/html,<html><body><img src="' + iSrc + '" /></body></html>';
document.body.appendChild(iframe);
Basically, what I'm after is: is there anything in a data URI that would break a parent data URI?
Yes you can. I really thought it was impossible, as did everyone I asked.
Example:
Pasting the following into your browser's URL bar should render a Gmail logo in an HTML page that says hello world.
data:text/html,<html><body><p>hello world</p><img src="" /></body></html>
or for a shorter example courtesy of Pumbaa80:
data:text/html,<script src="data:text/javascript,alert('hello world')"></script>
MSDN explicitly confirms this:
Data URIs can be nested.
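On the question of what could break the parent URI: characters such as ", # and % in the inner URI are the usual suspects, and percent-encoding the outer payload sidesteps them. Below is a minimal sketch, assuming the outer document is built as a string. The helper name toHtmlDataUri is hypothetical and not taken from the answers above.

```javascript
// Hypothetical helper: wrap an HTML string in a data: URI.
// Percent-encoding the payload keeps characters such as '#', '%' and '"'
// from the inner URI from being misread as part of the outer one.
function toHtmlDataUri(html) {
  return 'data:text/html;charset=utf-8,' + encodeURIComponent(html);
}

// Inner data URI: a tiny script, as in the shorter example above.
var innerUri = 'data:text/javascript,' + encodeURIComponent("alert('hello world')");

// Outer document that references the inner URI from a src attribute.
var outerHtml = '<script src="' + innerUri + '"></script>';
var outerUri = toHtmlDataUri(outerHtml);

console.log(outerUri);
```

Decoding the outer payload once yields the original HTML with the inner URI intact, which is what the browser does when it loads the outer document.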
An old blog entry talks a little bit more about embedding images within CSS using data: :
Neither dataURI spec nor any other mentions if dataURI’es can not be nested. So here’s the testcase where dataURI’ed CSS has dataURI’ed image embedded. IE8b1, Firefox3 and Safari applied the stylesheet and showed the image, Opera9.50 (build 9613) applies the stylesheet but doesn’t show the embedded image! So it seems that Opera9 doesn’t expect to get anything embedded inside of an already embedded resource! :D
But funny thing, as IE8b1 supports expressions and also supports nested data URI’es, it has the same potential security flaw as Firefox does (as described in the section above). See the testcase — embedded CSS has the following code: body { background: expression(a()); } which calls function a() defined in the javascript of the main page, and this function is called every time the expression is reevaluated. Though IE8b1 has limited expressions support (which is going to be explained in a separate post) you can’t use any code as the expression value, but you can only call already defined functions or use direct string values. So in order to exploit this feature we need to have a ready javascript function already located on the page and then we can just call it from the expression embedded in the stylesheet. That’s not very trivial obviously, but if you have a website that allows people to specify their own stylesheets and you want to be on the safe side, you have to either make sure you don’t have a javascript function that can cause any potential harm or filter expressions from people’s stylesheets.
Can a website's generated HTML be saved using Canopy? Looking at the documentation under 'Getting Started', I could not find anything related.
You can run arbitrary JavaScript using js, and document.documentElement.outerHTML returns the current DOM, so
let html = js "return document.documentElement.outerHTML" |> string
does the trick.
Canopy is a wrapper around Selenium that provides some useful helper functions. But it also provides access to the Selenium IWebElement instances in case you need them, via the element function (halfway down the page; there don't seem to be internal anchors in that page so I couldn't link directly to the function). Once you have the IWebElement object, your problem becomes similar to this one, where the answer seems to be elem.GetAttribute("innerHTML"), where elem is the element whose content you want (which might even be the html element). Note that innerHTML is not a standard DOM attribute, so this won't work with all Selenium drivers; it depends on which browser you're running in. But it apparently works in all major Web browsers.
See Get HTML Source of WebElement in Selenium WebDriver using Python for a related question using Python, which has more discussion about whether the innerHTML attribute will work in all browsers. If it doesn't, Canopy also has the js function, which you could leverage to run some JavaScript to get the HTML you're looking for; but if you're having trouble with that, you probably need to ask a JavaScript question rather than an F# question.
So, I was taking a look at some websites that display video on their front page, and I came across this website. I wanted to know how they had accomplished such a remarkable result, so I inspected the element that contained the video. To my surprise, I encountered html attributes I had never seen before.
My question is: if this is not standard HTML, then what is it, and how can I use it?
Here is a snippet of the code they are using:
<div class="frontpage-head-wrapper" data-has-video="1" data-video-mp4="http://d27shkkua6xyjc.cloudfront.net/videos/maaemo-film-2.mp4?mtime=20141113185431" data-video-ogv="http://d27shkkua6xyjc.cloudfront.net/videos/maaemo-film-2.ogv?mtime=20141113185421">
</div>
A point before I answer your question: data attributes are custom attributes in which the author can store any data. They were introduced in HTML5. For more information, please refer to https://developer.mozilla.org/en/docs/Web/Guide/HTML/Using_data_attributes
Now to your question.
From what I understand looking at the code, the website uses custom data attributes to specify the URLs where the video is stored. When I dug further into the code, I found that they display the video using Ajax calls.
Those are data attributes. They are useful because they are standardized and allow easy JavaScript access:
var ch = document.querySelector('div').dataset;
// 1
ch.hasVideo;
// http://d27shkkua6xyjc.cloudfront.net/videos/maaemo-film-2.mp4?mtime=20141113185431
ch.videoMp4;
// http://d27shkkua6xyjc.cloudfront.net/videos/maaemo-film-2.ogv?mtime=20141113185421
ch.videoOgv;
Using data attributes
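The property names on dataset follow from the attribute names: the data- prefix is dropped and each hyphen followed by a lowercase letter becomes that letter in upper case. A small sketch of that mapping, where datasetKey is a hypothetical helper written only to illustrate the rule (the browser does this for you):

```javascript
// Hypothetical helper: map an attribute name such as "data-video-mp4"
// to the property name exposed on element.dataset ("videoMp4").
function datasetKey(attributeName) {
  if (attributeName.indexOf('data-') !== 0) {
    throw new Error('Not a data attribute: ' + attributeName);
  }
  // Drop the "data-" prefix, then turn each "-x" into "X".
  return attributeName.slice(5).replace(/-([a-z])/g, function (match, letter) {
    return letter.toUpperCase();
  });
}

console.log(datasetKey('data-has-video')); // hasVideo
console.log(datasetKey('data-video-mp4')); // videoMp4
console.log(datasetKey('data-video-ogv')); // videoOgv
```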
I am looking for a way to replace the content of the src attribute for an iframe with a dummy variant containing the original src value (but will not actually fetch anything). I am loading the html code via Ajax so I can change the src-attribute before the code is injected into the DOM - so I don't need help with that part. What I would appreciate feedback on is what to put in the src attribute. There is a related post here discussing what can go in the src attribute, but in contrast to this post, I want to store data (namely the original src value) so that I can extract it later. It seems the alternatives are:
src="javascript:/*http://originalsrcvalue.com*/"
src="about:blank/*http://originalsrcvalue.com*/"
src="#http://originalsrcvalue.com"
I am leaning towards the last variant using bookmarks. I'm looking for feedback on potential problems or cross-browser issues that might arise or suggestions for alternative solutions.
Edit: One way of addressing the problem is to use custom attributes, and this is probably what I'll end up using in this specific case. However, I would also like feedback on ways to store data in src attributes in the fashion shown above.
You could store the actual URL in a data-your-data-name attribute and fetch it with JavaScript when you need it, using element.getAttribute('data-your-data-name') or, if you don't care much about IE users, element.dataset.yourDataName.
References:
https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Using_data_attributes
https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/dataset
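As a rough sketch of the data attribute approach, the two functions below park the real URL and later restore it. The attribute name data-original-src and the function names are assumptions made for this example, and about:blank is used as the placeholder src.

```javascript
// "data-original-src" is an assumed attribute name chosen for this sketch.
var ORIGINAL_SRC_ATTR = 'data-original-src';

// Move the real URL out of src and leave a placeholder behind.
function parkSrc(iframe) {
  var src = iframe.getAttribute('src');
  if (src !== null) {
    iframe.setAttribute(ORIGINAL_SRC_ATTR, src);
    iframe.setAttribute('src', 'about:blank');
  }
}

// Put the stored URL back into src and drop the helper attribute.
function restoreSrc(iframe) {
  var original = iframe.getAttribute(ORIGINAL_SRC_ATTR);
  if (original !== null) {
    iframe.setAttribute('src', original);
    iframe.removeAttribute(ORIGINAL_SRC_ATTR);
  }
}
```

Both functions rely only on getAttribute, setAttribute and removeAttribute, so they can be applied to elements before they are inserted into the page.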
I'm trying to understand why I need to use an XSS library when I can merely HtmlEncode data when sending it from server to client.
For example, here on Stackoverflow.com, in the editor, all the SO team needs to do is save the user input and display it HTML-encoded.
This way there is never going to be an HTML tag that gets executed.
I'm probably wrong here, but can you please contradict my statement, or explain?
For example:
I know that an IMG tag, for example, can have onmouseover or onload handlers that a user could use to run malicious scripts, but the IMG won't even run in the browser as an IMG, since it's sent as &lt;img&gt; and not <img>.
So, where is the problem?
HTML-encoding is itself one feature an “XSS library” might provide. This can be useful when the platform doesn't have a native HTML encoder (eg scriptlet-based JSP) or the native HTML encoder is inadequate (eg not escaping quotes for use in attributes, or ]]> if you're using XHTML, or #{} if you're worried about cross-origin-stylesheet-inclusion attacks).
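To make the point about quotes concrete, here is a minimal sketch of an HTML encoder that also escapes both quote characters, so the result can be placed inside a quoted attribute value. The function name escapeHtml is hypothetical, and this covers only the HTML text and quoted-attribute contexts, not the others mentioned above.

```javascript
// Hypothetical helper: escape the five characters that matter in HTML text
// and in quoted attribute values.
function escapeHtml(text) {
  var entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, function (ch) {
    return entities[ch];
  });
}

console.log(escapeHtml('<img src=x onerror="steal()">'));
// &lt;img src=x onerror=&quot;steal()&quot;&gt;
```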
There might also be other encoders for other situations, for example injecting into JavaScript strings in a <script> block or URL parameters in an href attribute, which are not provided directly by the platform/templating language.
Another useful feature an XSS library could provide might be HTML sanitisation, for when you want to allow the user to input data in HTML format, but restrict which tags and attributes they use to a safe whitelist.
Another less-useful feature an XSS library could provide might be automated scanning and filtering of input for HTML-special characters. Maybe this is the kind of feature you are objecting to? Certainly trying to handle HTML-injection (an output stage issue) at the input stage is a misguided approach that security tools should not be encouraging.
HTML encoding is only one aspect of making your output safe against XSS.
For example, if you output a string to JavaScript using this code:
<script>
var enteredName = '<%=EnteredNameVariableFromServer %>';
</script>
You will want to hex-encode the variable for proper insertion into JavaScript, not HTML-encode it. Suppose the value of EnteredNameVariableFromServer is O'leary; then the rendered code, when properly encoded, becomes:
<script>
var enteredName = 'O\x27leary';
</script>
In this case this prevents the ' character from breaking out of the string and into the JavaScript code context, and also ensures proper treatment of the variable (HTML encoding it would result in the literal value of O&#39;leary being used in JavaScript, affecting processing and display of the value).
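A minimal sketch of such an encoder for the JavaScript string context follows. The name escapeForJsString is hypothetical, and the policy of escaping every non-alphanumeric character is an assumption chosen for simplicity; in production a vetted encoding library is the safer choice.

```javascript
// Hypothetical helper: encode a value for use inside a quoted JavaScript
// string literal. Every non-alphanumeric character becomes \xHH or \uHHHH.
function escapeForJsString(value) {
  return String(value).replace(/[^A-Za-z0-9]/g, function (ch) {
    var code = ch.charCodeAt(0);
    var hex = code.toString(16).toUpperCase();
    return code < 256
      ? '\\x' + ('0' + hex).slice(-2)
      : '\\u' + ('000' + hex).slice(-4);
  });
}

console.log(escapeForJsString("O'leary")); // O\x27leary
```

Because characters such as <, / and the quote characters are all escaped, the encoded value cannot close the string or the surrounding script block.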
Side note:
Also, that's not quite true of Stack Overflow. Certain characters still have special meanings like in the <!-- language: lang-none --> tag. See this post on syntax highlighting if you're interested.
I'm trying to add localization support to a Google Chrome Web App and, while it is easy to define strings for manifest and CSS files, it is somewhat more difficult for HTML pages.
In the manifest and in CSS files I can simply define localization strings like so:
__MSG_name__
but this doesn't work with HTML pages.
I can make a JavaScript function to fire onload that does the job like so:
document.title = chrome.i18n.getMessage("name");
document.querySelector("span.name").innerHTML = chrome.i18n.getMessage("name");
but this seems awfully inefficient. Furthermore, I would like to be able to specify the page metadata (application-name and description), pulling the values from the localization files. What would be the best way of doing all this?
Thanks for your help.
Please refer to this documentation:
http://code.google.com/chrome/extensions/i18n.html
If you want to add localized content within HTML, you would need to do it via JavaScript as you mentioned before. That is the only way you can do it.
chrome.i18n.getMessage("name")
It isn't inefficient to do that. You can place your JavaScript at the end of the document (right before the closing body tag) and it will fill in the text according to the locale.
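One way to avoid writing a line of script per element is to mark up the HTML and fill it in a single pass. The sketch below assumes elements are tagged like <span data-i18n="name">Name</span>; the data-i18n attribute and the localizeElements function are conventions invented for this example, not part of the Chrome API. Only chrome.i18n.getMessage comes from the extension platform.

```javascript
// Fill every element that carries a data-i18n attribute with its message.
// "data-i18n" is an assumed naming convention, not part of the Chrome API.
function localizeElements(elements, getMessage) {
  for (var i = 0; i < elements.length; i++) {
    var key = elements[i].getAttribute('data-i18n');
    var message = key ? getMessage(key) : '';
    // Leave the fallback text in place when no message is found.
    if (message) {
      elements[i].textContent = message;
    }
  }
}

// In an extension page, run once the DOM is ready.
if (typeof document !== 'undefined' && typeof chrome !== 'undefined' && chrome.i18n) {
  document.addEventListener('DOMContentLoaded', function () {
    localizeElements(document.querySelectorAll('[data-i18n]'), function (key) {
      return chrome.i18n.getMessage(key);
    });
    document.title = chrome.i18n.getMessage('name');
  });
}
```

Setting textContent rather than innerHTML means a message containing markup characters is shown as text instead of being parsed.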
Dunno if I understand exactly what you are trying to do, but you could dynamically retrieve the LANG attribute (using .getAttribute("lang") or .lang) of the targeted tag and serve the proper values accordingly.