I want to use an HTML Entity Encoder for my HTML <div>${data}</div>
I was using the ESAPI library's ESAPI.encoder().encodeForHTML, but I am not sure it is correct because, for instance, the result of encoding test/a/2 with ESAPI.encoder().encodeForHTML is test&#x2f;a&#x2f;2 (that's what I see in the page source of my JSP, using:
<div><esapi:encodeForHTML>${deviceKey}</esapi:encodeForHTML></div>
but on this site http://www.web2generators.com/html-based-tools/online-html-entities-encoder-and-decoder the result is test/a/2 (?!). Why?
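For reference, here is a minimal snippet that I assume reproduces what the JSP tag does under the hood, using the ESAPI 2.x encoder API:
import org.owasp.esapi.ESAPI;

public class EncodeDemo {
    public static void main(String[] args) {
        // ESAPI's HTML codec encodes every character outside a small immune
        // set (letters, digits, and a few punctuation characters), so even
        // the slashes become numeric character references.
        // Note: ESAPI requires an ESAPI.properties file on the classpath.
        System.out.println(ESAPI.encoder().encodeForHTML("test/a/2"));
        // expected output: test&#x2f;a&#x2f;2
    }
}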
ESAPI is almost unsupported due to a lack of interest in maintaining the platform; we'd love to have more active developers.
If you're doing active Ajax replacement of divs, you should look into safe templating and SCE (Strict Contextual Escaping) in AngularJS as a replacement for ESAPI.
https://docs.angularjs.org/api/ngSanitize/service/$sanitize
https://docs.angularjs.org/api/ng/service/$sce
Related
I have an AngularJS 1.x app and a node.js server which needs to send emails occasionally. The emails require that horrible old-style HTML and CSS to make them render nicely in the major email clients, but they also share much data and logic that is already there in the Angular app.
I could use Jade, Pug, Mustache, doT, etc. as an Express templating engine, but it seems to me that ng-repeat and ng-if would also do the job without my having to learn another framework and syntax.
What approaches do people recommend for using Angular server side to render emails?
First off, is there an email-client rendering problem with the custom elements and attributes (e.g. <div ng-if="..." ...></div>) that will be left over in the email?
If not...
What I really want is some engine that takes a template with Angular markup as an input, processes it within the context of some Angular controller or component, and produces the final rendered HTML as an output.
Here are some approaches I'm considering:
Use Angular 1.x with jsdom or the like. Something like angular-on-server or ng-node-compile.
Use Angular 2.x. angular2-universal-starter is the closest example I've found, but I can't quickly understand how that works and whether it fits my use case.
NOTE: this is not the usual use case of running Angular server-side to make the initial page render faster or for SEO purposes. After the email is rendered, I have no need to run Angular on the "client side" (you can't even do that in emails).
I am running a Spring Boot application with Thymeleaf and ReactJS. All the HTML text is read from message.properties by using th:text in the pages, but when I have th:text in a ReactJS HTML block, React seems angry about it:
render() {
  return (
    <input type="text" th:text="#{home.welcome}">
  )
}
The error is:
Namespace tags are not supported. ReactJSX is not XML.
Is there a workaround besides using dangerouslySetInnerHTML?
Thank you!
There is no sane workaround.
You are getting this error because Thymeleaf outputs XML, and JSX parsers do not parse XML.
You did this because JSX looks very, very similar to XML. But they are very, very different, and even if you somehow hacked Thymeleaf to strip namespaced attributes and managed to get a component to render, it would be merely a fleeting moment of duct-taped-together, jury-rigged code that will fall apart under further use.
This is a really, really bad idea because JSX is Javascript. You are generating Javascript on the fly. Just to name a few reasons this will not work in the long term:
This makes your components difficult if not impossible to test.
Reasoning about application state will be a nightmare as you will struggle to figure out if the source of a certain state is coming from Thymeleaf or JS.
Your application will completely grind to a halt if Thymeleaf outputs bad JS.
These problems will all get worse with time (Thyme?) as developers abuse the ease with which they can render server-side data to the client-side, leading to an insane application architecture.
Do not do this. Just use Thymeleaf, or just use React.
Sample Alternative: I primarily work on a React application backed by a Java backend. So I understand how someone could stumble upon this hybrid and think it might be a good idea. You are likely already using Thymeleaf and are trying to figure out how you can avoid rewriting your servlets but still get the power of React.
We were in a similar boat two years ago, except with an aging JSP frontend, but the difference is negligible. What we did (and it works well) is use a JSP page to bootstrap the entire React application. There is now one JSP page that we render to the user. This JSP page outputs JSON into a single <script> tag that contains some initial startup data that we would otherwise have to fetch immediately. This contains resources, properties, and just plain data.
We then output another <script> that points to the location of a compiled JS module containing the entire standalone React application. This application loads the JSON data once when it starts up and then makes backend calls for the rest. In some places, we have to use JSP for these, which is less than ideal but still better than your solution. What we do is have the JSP pages output a single attribute containing JSON. In this way (and with some careful pruning by our XHR library) we get a poor man's data interchange layer built atop a JSP framework we don't have time to change.
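A hedged sketch of the server side of that bootstrap (the names here are mine, not the actual code): a servlet drops the startup JSON into the request, and the JSP prints it into the single script tag.
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BootstrapServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // In real code this JSON would be built with a library such as Jackson.
        String startupJson = "{\"user\":\"jdoe\",\"locale\":\"en\"}";
        req.setAttribute("startupJson", startupJson);
        // The JSP then renders something like:
        //   <script>window.__INITIAL_STATE__ = ${startupJson};</script>
        //   <script src="/static/app.bundle.js"></script>
        req.getRequestDispatcher("/WEB-INF/index.jsp").forward(req, resp);
    }
}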
It is definitely not ideal, but it works well and we have benefited vastly from the many advantages of React. When we do have issues with this peculiar implementation, they are easy to isolate and resolve.
It is possible to wrap ReactJS apps in Thymeleaf. Think of it this way: if you want a static, persistent part (like some links, or even just displayed data), you could use Thymeleaf. If you have a complicated part (something that requires DOM repaints, shared data, updates from the UI/sockets/whatever), you could use React.
If you need to pass state, you could use Redux or other methods.
You could have your backend send data via a REST API to the React part, and just render your simple parts as fragments or as whole chunks of plain HTML using Thymeleaf.
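A minimal sketch of that split, assuming a Spring Boot setup (all names are illustrative): one controller renders the static Thymeleaf shell, another exposes JSON for the React part.
import java.util.List;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@Controller
public class PageController {
    // Static, persistent part: rendered by Thymeleaf (templates/home.html).
    @GetMapping("/home")
    public String home(Model model) {
        model.addAttribute("message", "Hello from Thymeleaf");
        return "home";
    }
}

@RestController
@RequestMapping("/api")
class DataController {
    // Dynamic part: plain JSON that the React app fetches and renders.
    @GetMapping("/items")
    public List<String> items() {
        return List.of("first", "second");
    }
}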
Remember, Thymeleaf is really just HTML, and React is a virtual DOM that renders as HTML, so it's actually fairly easy to migrate one to the other. You could write anything "static", or anything that does not respond much to the UI, in Thymeleaf/HTML. You could also just render those parts in React too, but without state.
Thymeleaf 3 also allows you to render variables from your Java model into a separate JS file, so that is another option for passing data into JSX:
function showCode() {
    // Thymeleaf replaces the inlined expression with the value of ${code};
    // '12345' is just the static prototype value used when the template is
    // viewed without processing.
    var code = /*[[${code}]]*/ '12345';
    document.getElementById('code').innerHTML = code;
}
Now you can also use data- prefixed attributes (e.g. data-th-text="${message}").
https://www.thymeleaf.org/doc/tutorials/3.0/usingthymeleaf.html#support-for-html5-friendly-attribute-and-element-names
I'm trying to understand why I need to use an XSS library when I can merely HTML-encode data when sending it from server to client...?
For example, here on Stackoverflow.com, with the editor, all the SO team needs to do is save the user input and display it HTML-encoded.
This way, an HTML tag is never going to be executed.
I'm probably wrong here, but can you please contradict my statement, or explain?
For example:
I know that the IMG tag, for example, can have onmouseover and onload attributes with which a user can run malicious scripts, but the IMG won't even run in the browser as an image, since it's &lt;img&gt; and not <img>.
So, where is the problem?
HTML-encoding is itself one feature an “XSS library” might provide. This can be useful when the platform doesn't have a native HTML encoder (e.g. scriptlet-based JSP) or the native HTML encoder is inadequate (e.g. not escaping quotes for use in attributes, or ]]> if you're using XHTML, or #{} if you're worried about cross-origin-stylesheet-inclusion attacks).
There might also be other encoders for other situations, for example injecting into JavaScript strings in a <script> block or URL parameters in an href attribute, which are not provided directly by the platform/templating language.
Another useful feature an XSS library could provide might be HTML sanitisation, for when you want to allow the user to input data in HTML format, but restrict which tags and attributes they use to a safe whitelist.
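For instance, a minimal sanitisation sketch using the OWASP Java HTML Sanitizer, one library that provides this feature (the policy shown, basic formatting plus links, is just one possible whitelist):
import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

public class SanitizeExample {
    public static void main(String[] args) {
        // Allow simple formatting tags and links; everything else is stripped.
        PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);
        String untrusted =
                "<b>bold</b><img src=x onerror=alert(1)><a href=\"http://example.com\">ok</a>";
        // The img tag and its event-handler attribute do not survive the whitelist.
        System.out.println(policy.sanitize(untrusted));
    }
}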
Another less-useful feature an XSS library could provide might be automated scanning and filtering of input for HTML-special characters. Maybe this is the kind of feature you are objecting to? Certainly trying to handle HTML-injection (an output stage issue) at the input stage is a misguided approach that security tools should not be encouraging.
HTML encoding is only one aspect of making your output safe against XSS.
For example, if you output a string to JavaScript using this code:
<script>
var enteredName = '<%=EnteredNameVariableFromServer %>';
</script>
You will want to hex-entity-encode the variable for proper insertion into JavaScript, not HTML-encode it. Suppose the value of EnteredNameVariableFromServer is O'leary; then the rendered code, when properly encoded, will become:
<script>
var enteredName = 'O\x27leary';
</script>
In this case this prevents the ' character from breaking out of the string and into the JavaScript code context, and it also ensures proper treatment of the variable (HTML-encoding it would result in the literal text O&#39;leary being used in JavaScript, affecting processing and display of the value).
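To illustrate, here is what context-specific encoding might look like with the OWASP Java Encoder project (the outputs in the comments reflect my assumption of its hex-escaping style):
import org.owasp.encoder.Encode;

public class EncodingContexts {
    public static void main(String[] args) {
        String entered = "O'leary";

        // For insertion into an HTML body such as <div>...</div>
        System.out.println(Encode.forHtml(entered));       // O&#39;leary

        // For insertion into a quoted JavaScript string: var x = '...';
        System.out.println(Encode.forJavaScript(entered)); // O\x27leary
    }
}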
Side note:
Also, that's not quite true of Stack Overflow. Certain characters still have special meanings like in the <!-- language: lang-none --> tag. See this post on syntax highlighting if you're interested.
I have created a Spring application with multiple-language support, using Spring localization/internationalization and JSTL. Now I am going to remove all the JSPs and replace them with HTML. Can I make use of Spring localization/internationalization and resource bundles in pure HTML without JSTL? (I am sure there has to be a way.)
You can get rid of JSTL if that's what you are asking for. After all, Spring has its own <sp:message> tag.
However, if you want to get rid of JSP completely and only serve static HTML, I am afraid it can't be done correctly.
That is, you could possibly generate the whole page with JavaScript (e.g. jQuery), but how useful is that?
And you'll be forced to implement some means of localization for JavaScript anyway; you'll probably need to generate a file with translations on the fly.
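One hedged sketch of that on-the-fly approach in Spring: expose the resource bundle as JSON that static HTML plus JavaScript can fetch (the endpoint and bundle names are illustrative).
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.ResourceBundle;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MessagesController {
    // Serve every key/value pair from messages_<locale>.properties as JSON.
    @GetMapping("/i18n/{locale}")
    public Map<String, String> messages(@PathVariable String locale) {
        ResourceBundle bundle =
                ResourceBundle.getBundle("messages", Locale.forLanguageTag(locale));
        Map<String, String> result = new HashMap<>();
        for (String key : Collections.list(bundle.getKeys())) {
            result.put(key, bundle.getString(key));
        }
        return result;
    }
}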
It's doable, but it would be extremely easy to introduce, for example, concatenation defects (string concatenation that doesn't allow for re-ordering words in a sentence, which proper translations require).
To summarize: you probably can do that, but you probably should not.
It is possible, with minimal tweaks.
Don't remove JSP, JSTL, etc.
Convert each submit request to Ajax. A server doesn't care whether a request is a normal browser submit or an XMLHttpRequest (Ajax); the server will use JSP/JSTL to prepare the appropriate HTML. You just need Ajax to render that HTML string into the DOM:
$.ajax({
    url: '/xyz',
    success: function (htmlFromServer) {
        document.open();
        document.write(htmlFromServer);
        document.close();
    }
});
I have a couple of websites that I want to extract data from and, based on previous experience, this isn't as easy as it sounds. Why? Simply because the HTML pages I have to parse aren't properly formatted (missing closing tags, etc.).
Considering that I have no constraints regarding the technology, language, or tool I can use, what are your suggestions for easily parsing and extracting data from HTML pages? I have tried HTML Agility Pack and BeautifulSoup, and even these tools aren't perfect (HTML Agility Pack is buggy, and BeautifulSoup's parsing engine doesn't work with the pages I am passing to it).
You can use pretty much any language you like; just don't try to parse HTML with regular expressions.
So let me rephrase that and say: you can use any language you like that has an HTML parser, which is pretty much everything invented in the last 15-20 years.
If you're having issues with particular pages I suggest you look into repairing them with HTML Tidy.
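If you're on the JVM, here is a minimal sketch using JTidy, a Java port of HTML Tidy (the broken input string is illustrative):
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.w3c.tidy.Tidy;

public class TidyExample {
    public static void main(String[] args) throws Exception {
        String broken = "<html><body><p>Unclosed paragraph<div>Stray div</body>";
        Tidy tidy = new Tidy();
        tidy.setXHTML(true);          // ask for well-formed XHTML output
        tidy.setShowWarnings(false);
        ByteArrayOutputStream repaired = new ByteArrayOutputStream();
        tidy.parse(new ByteArrayInputStream(broken.getBytes(StandardCharsets.UTF_8)),
                   repaired);
        System.out.println(repaired.toString("UTF-8"));
    }
}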
I think hpricot (linked by Colin Pickard) is ace. Add scrubyt to the mix and you get a great HTML scraping and browsing interface with the text-matching power of Ruby: http://scrubyt.org/
Here is some example code from http://github.com/scrubber/scrubyt_examples/blob/7a219b58a67138da046aa7c1e221988a9e96c30e/twitter.rb:
require 'rubygems'
require 'scrubyt'

# Simple example of scraping basic
# information from a public Twitter
# account.
# Scrubyt.logger = Scrubyt::Logger.new

twitter_data = Scrubyt::Extractor.define do
  fetch 'http://www.twitter.com/scobleizer'

  profile_info '//ul[@class="about vcard entry-author"]' do
    full_name "//li//span[@class='fn']"
    location "//li//span[@class='adr']"
    website "//li//a[@class='url']/@href"
    bio "//li//span[@class='bio']"
  end
end

puts twitter_data.to_xml
If you use Java as the language, the open-source library Jsoup will be a pretty good solution for you.
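A small sketch of what that looks like; jsoup repairs malformed markup (missing closing tags and the like) while parsing, and lets you query with CSS selectors (the HTML here is illustrative):
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupExample {
    public static void main(String[] args) {
        // Malformed HTML: unclosed tags. jsoup builds a sane DOM anyway.
        String html = "<html><body><p>First<p>Second<div class=price>9.99";
        Document doc = Jsoup.parse(html);

        // Query with CSS selectors, much like jQuery.
        Element price = doc.selectFirst("div.price");
        System.out.println(price.text()); // 9.99

        for (Element p : doc.select("p")) {
            System.out.println(p.text()); // First, Second
        }
    }
}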
hpricot may be what you are looking for.
You may try PHP's DOMDocument class. It has a couple of methods for loading HTML content, and I usually make use of this class. My advice is to prepend a DOCTYPE element to the HTML in case it hasn't one, and to inspect in Firebug the HTML that results after parsing. In some cases, where invalid markup is encountered, DOMDocument does a bit of rearrangement of the HTML elements. Also, if there's a meta tag specifying the charset inside the source, be aware that it will be used internally by libxml when parsing the markup. Here's a little example:
// Fetch the page, silence libxml's warnings about invalid markup while
// parsing, then restore the previous error-handling setting.
$html = file_get_contents('http://example.com');
$dom = new DOMDocument;
$oldValue = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($oldValue);
echo $dom->saveHTML();
Any language which works with HTML at the DOM level is good.
For Perl, it is the HTML::TreeBuilder module.