How to parse links and escape HTML entities?

I have some user provided content that I want to render.
Obviously the content should be escaped, which Rails does by default. However, I also want to parse the text so that URLs are presented as links.
There is an auto_link helper which does just that. However, no matter what order I do this in, I can't get the desired result.
Take content:
content
=> "<img src=\"foo\" />\\r\\n\\r\\nhttp://google.com"
If I escape this first, auto_link will not work, because the slashes in the URL get escaped too:
Rack::Utils.escape_html(content)
=> "&lt;img src=&quot;foo&quot; &#x2F;&gt;\\r\\n\\r\\nhttp:&#x2F;&#x2F;google.com"
If I use auto_link first, the generated link markup will obviously be escaped afterwards. Additionally, auto_link strips unwanted content rather than escaping it. If a script tag is present in the input, I want it escaped, not removed.
auto_link(content)
=> "<img src=\"foo\" />\\r\\n\\r\\nhttp://google.com"
Any idea how to get the desired output?
Thanks for any help.

You could strip out all the escaped whitespace characters with content.gsub!(/\\./, ""). Then you'll be able to use auto_link.

The solution I ended up using was to ditch auto_link, let Rack escape my content server side, and then parse the links out of the text on the client side using https://github.com/gabrielizaias/urlToLink
$('p').urlToLink();

I've had success with:
auto_link(h(content))
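
The escape-first order can also be illustrated outside Rails. The following is a minimal Python sketch, not the Rails implementation: html.escape stands in for h, and URL_PATTERN and escape_then_link are hypothetical names for a deliberately naive linkifier. The order works here because html.escape leaves slashes alone (as h appears to, unlike the Rack output shown in the question), so the URL is still recognizable after escaping.

```python
import html
import re

# Hypothetical, deliberately simple pattern: an http(s) URL runs until the
# next whitespace, quote, or angle bracket. Real linkifiers cover more cases.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def escape_then_link(text: str) -> str:
    """Escape all HTML first, then wrap the surviving URLs in anchor tags."""
    # html.escape rewrites & < > " ' but not "/", so URLs stay intact.
    escaped = html.escape(text, quote=True)
    return URL_PATTERN.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped
    )


sample = '<img src="foo" />\r\n\r\nhttp://google.com'
print(escape_then_link(sample))
```

One known limitation of this sketch: a URL immediately followed by an escaped character (for example a closing quote that became &quot;) would have the entity swallowed into the link, so a production version would need a stricter pattern.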

Related

Image tag value src double white space in path

In my JSP page, in the tag (img src="upload/<%=a.getUrlimmagine()%>"), I get this error:
Bad value in "upload/ " for attribute "src" on element "img":DOUBLE_WHITE SPACE in PATH
How can I solve it?
Because URLs should not contain spaces (see: Are URLs allowed to have a space in them?), you should encode your URL so that unsafe characters are replaced with strings representing them (e.g. a space becomes %20).
So you either do this in getUrlimmagine() in your bean or do the encoding in the JSP page.
If you are sure that the only unsafe character in your image names is the space, you can use String.replace() in the backing bean or the JSTL fn:replace function in your JSP page.
Otherwise, if you want the cleanest solution, you should definitely read this article: What every web developer must know about URL encoding by Stéphane Épardaud
A cleaner solution to get the same result without using JSP expression is in this answer by BalusC
You would need to use the replaceAll method for String:
<img src="upload/<%=a.getUrlimmagine().replaceAll(" ", "%20")%>" />
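
Replacing only spaces leaves other unsafe characters untouched. As a rough illustration of the more general percent-encoding the answers recommend, here is a small Python sketch (not JSP); image_src and the "upload/" default are hypothetical names chosen to mirror the question.

```python
from urllib.parse import quote


def image_src(filename: str, base: str = "upload/") -> str:
    """Build an img src value with the file name percent-encoded."""
    # quote() encodes spaces and other unsafe characters; "/" is kept as-is
    # by default, which suits path segments.
    return base + quote(filename)


print(image_src("my photo 1.png"))  # upload/my%20photo%201.png
```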

Avoid interpreting HTML code in a QTextBrowser

I have a QTextBrowser in my Qt application. I would like to append some text but, I need part of this text not to be interpreted in HTML. How can I achieve this? May I encode the QString?
If you want the browser not to interpret only parts of your text as HTML, you will need to escape the part you want left uninterpreted (replace "<" with "&lt;" etc.). You can use the convenient Qt::escape method:
textBrowser->insertHtml(
QString("<b>this will be bold</b>") +
Qt::escape(QString("<b>this will not</b>"))
);
If you would like not to interpret the whole thing you can insert it as plain text:
textBrowser->insertPlainText ( "<b>foobar</b>" );
Finally I solved my own question using:
QString codedHtml = Qt::escape(html);

$_GET textarea losing HTML characters

This is probably a really simple one but I can't find the answer anywhere!
I have a self submitting form with a textarea field like this
<textarea name="desc" wrap="1" cols="64" rows="5"></textarea>
When I type HTML characters in to the textarea field and hit the submit button, the HTML characters are being stripped and I can't see what is doing it!
Do $_GET variables have their HTML stripped automatically?
For example, If I type '[strong]Just[/strong] a test' in to the textarea, and echo the contents of 'desc' like this
echo(print_r($_GET));
I see $_GET['desc'] contains 'Just a test' rather than '[strong]Just[/strong] a test'.
Is this normal? If so, is there a way to keep the HTML so I can store it in a database?
I am using angle '<>' brackets rather than square '[]' in my code, but this forum converts them if I use them here!
Use CDATA
A CDATA section starts with "<![CDATA[" and ends with "]]>"
Source : http://www.w3schools.com/xml/xml_cdata.asp
Where are you printing the data to? The browser will parse the HTML, and unless you look at the page source you will only see the non-HTML parts.
However, you should be using print htmlentities($_GET['desc']) to print the contents with the HTML properly encoded, so that it is displayed instead of parsed.
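
To make the point concrete, here is a short Python sketch (an analogy, not PHP) showing that query-string parsing keeps the tags and that only unescaped display hides them. The query value is a hypothetical URL-encoded form submission.

```python
import html
from urllib.parse import parse_qs

# Hypothetical query string, as a browser would send it for the textarea.
query = "desc=%3Cstrong%3EJust%3C%2Fstrong%3E+a+test"

# Decoding the query string preserves the markup exactly.
desc = parse_qs(query)["desc"][0]
print(desc)   # <strong>Just</strong> a test

# Escaping before output makes the browser show the tags instead of parsing them.
shown = html.escape(desc)
print(shown)  # &lt;strong&gt;Just&lt;/strong&gt; a test
```

The unescaped value is what would be stored in the database; escaping is applied only at display time.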

JSON escape space characters

How would I escape space characters in a JSON string? Basically my problem is that I've gotten into a situation where the program that reads the string can use HTML tags for formatting, but I need to be able to use these HTML tags without adding more spaces to the string. so things like
<u>text</u>
is fine, for adding underline formatting
but something like
<font size="14">text</font>
is not fine, because the <font> tag with the size attribute adds an extra space to the string. I know, funny criteria, but at this point that's what has happened.
My first speculative solution would be to have some kind of \escape character that JSON can put in between font and size that will solve my "space" problems, something that the HTML will ignore but leave the human readable string in the code without actual spaces.
ex. <font\&size="14">text</font>
displays as: text
kind of like &nbsp; but better?
any solutions?
You can use \u0020 to escape the ' ' character in JSON.
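
A quick way to see what this escape does and does not do is to parse such a document. The Python sketch below uses a hypothetical "label" key: the JSON source text contains no literal space, but the decoded string does, so whether this helps depends on which stage of the program objects to the space.

```python
import json

# Raw JSON text: the space between "font" and "size" is written as \u0020.
raw = r'{"label":"<font\u0020size=\"14\">text</font>"}'

print(" " in raw)            # False: no literal space in the JSON source

parsed = json.loads(raw)
print(parsed["label"])       # <font size="14">text</font>
print(" " in parsed["label"])  # True: the decoded value has a real space
```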

escaping html inside comment tags

escaping html is fine - it will remove <'s and >'s etc.
i've run into a problem where i am outputting a filename inside a comment tag, e.g. <!-- ${filename} -->
of course things can be bad if you don't escape, so it becomes:
<!-- <c:out value="${filename}"/> -->
the problem is that if the file has "--" in the name, all the html gets screwed, since you're not allowed to have <!-- -- -->.
the standard html escape doesn't escape these dashes, and i was wondering if anyone is familiar with a simple / standard way to escape them.
Definition of a HTML comment:
A comment declaration starts with <!, followed by zero or more comments, followed by >. A comment starts and ends with "--", and does not contain any occurrence of "--".
Of course the parsing of a comment is up to the browser.
Nothing strikes me as an obvious solution here, so I'd suggest you str_replace those double dashes out.
There is no good way to solve this. You can't just escape them because comments are read in plaintext. You will have to do something like put a space between the hyphens, or use some sort of code for hyphens (like [HYPHEN]).
Since it is obvious that you cannot directly display the '--'s, you can either encode them or use the fn:escapeXml or fn:replace functions for appropriate replacements.
JSTL documentation
There's no universally working way to escape those characters in HTML unless the - characters come in multiples of four: -- won't work in Firefox, but ---- will. So it all depends on the browser. For example, in Internet Explorer 8 it is not a problem; those characters are handled properly. The same goes for Google Chrome. However, Firefox, even the latest version (3.0.4), doesn't handle these characters well.
You shouldn't be trying to HTML-escape, the contents of comments are not escapable and it's fine to have a bare ‘>’ or ‘&’ inside.
‘--’ is its own, unrelated problem and is not really fixable. If you don't need to recover the exact string, just do a replacement to get rid of them (eg. replace with ‘__’).
If you do need to get a string through completely unmolested to a JavaScript that will be reading the contents of the comment, use a string literal:
<!-- 'my-string' -->
which the script can then read using eval(commentnode.data). (Yes, a valid use for eval() at last!)
Then your escaping problem becomes how to put things in JS string literals, which is fairly easily solvable by escaping the ‘'’ and ‘-’ characters:
<!-- 'Bob\x27s\x2D\x2Dstring' -->
(You should probably also escape ‘<’, ‘&’ and ‘"’, in case you ever want to use the same escaping scheme to put a JS string literal inside a <script> block or inline handler.)
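
The string-literal scheme above could be generated server side along these lines. This is a Python sketch rather than JSP; js_literal_for_comment is a hypothetical helper, and the extra escapes for the backslash, ‘>’ and line breaks are my own additions beyond the characters the answer lists.

```python
# Characters rewritten when building the JS string literal. The quote and
# hyphen come from the answer above; the rest are precautionary additions.
_JS_ESCAPES = {
    "\\": "\\\\",
    "'": "\\x27",
    "-": "\\x2D",
    "<": "\\x3C",
    ">": "\\x3E",
    "&": "\\x26",
    '"': "\\x22",
    "\n": "\\n",
    "\r": "\\r",
}


def js_literal_for_comment(value: str) -> str:
    """Return a single-quoted JS string literal that contains no hyphens."""
    # Each input character is mapped once, so escapes are never re-escaped.
    body = "".join(_JS_ESCAPES.get(ch, ch) for ch in value)
    return "'" + body + "'"


literal = js_literal_for_comment("Bob's--string")
print(f"<!-- {literal} -->")  # <!-- 'Bob\x27s\x2D\x2Dstring' -->
```

Because every hyphen is replaced, the comment body can never contain "--", whatever the file name is.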