Escaping HTML attributes - html

I'm trying to render an escaped string into an HTML attribute (using Twitter Bootstrap to render a popover of the escaped code used to generate what the user is looking at):
Something like:
<a class="btn" href="#" data-content='<pre>$escaped_code</pre>' rel="popover" data-orginal-title="$title">some cool looking thing</a>
The problem is that the browser will parse and unescape the escaped code, potentially allowing for unpleasantness.

Don't use < or > in HTML attributes, for one:
<a class="btn" href="#" data-content='&lt;pre&gt;$escaped_code&lt;/pre&gt;' rel="popover" data-orginal-title="$title">some cool looking thing</a>

I'm not sure what sort of escaped string you have there. I am assuming PHP. However, the escape characters are different for HTML:
http://www.theukwebdesigncompany.com/articles/entity-escape-characters.php
The fact that you are trying to put HTML tags into an HTML attribute suggests that you aren't using the correct HTML escape characters. Make sure everything within the HTML tag is escaped for HTML.

The answer is to of course manually escape it twice.
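A minimal sketch of what "escaping twice" could look like, written in JavaScript purely for illustration (the question's `$escaped_code` suggests a server-side template). The helper names `escapeHtml` and `decodeOnce` and the sample `code` string are assumptions, not part of the original question; `decodeOnce` is only a rough stand-in for the single decoding pass a browser applies to an attribute value.

```javascript
// Escape the five characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Rough stand-in for the one decoding pass the browser applies to an attribute value.
function decodeOnce(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // must run last
}

// Hypothetical source code that should be shown literally in the popover.
const code = '<script>alert("hi")</script>';

// Pass 1: the code becomes inert text inside the popover's <pre>.
const popoverHtml = "<pre>" + escapeHtml(code) + "</pre>";

// Pass 2: the popover markup becomes a safe attribute value.
const attributeValue = escapeHtml(popoverHtml);

const link =
  '<a class="btn" href="#" rel="popover" data-content="' +
  attributeValue +
  '">some cool looking thing</a>';

console.log(link);
// After the browser decodes the attribute once, the popover script receives
// markup in which the code is still escaped.
console.log(decodeOnce(attributeValue) === popoverHtml); // true
```

The first pass protects the code when the popover inserts the attribute value as HTML; the second protects the attribute itself while the page is parsed.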

Related

Is there a non-javascript/PHP way to write sample code that won't get evaluated? [duplicate]

I use the <pre> tag in my blog to post code. I know I have to change < to &lt; and > to &gt;. Are there any other characters I need to escape for correct HTML?
What happens if you use the <pre> tag to display HTML markup on your blog:
<pre>Use a <span style="background: yellow;">span tag with style attribute</span> to highlight words</pre>
This will pass HTML validation, but does it produce the expected result? No. The correct way is:
<pre>Use a &lt;span style="background: yellow;"&gt;span tag with style attribute&lt;/span&gt; to highlight words</pre>
Another example: if you use the pre tag to display some other language code, the HTML encoding is still required:
<pre>if (i && j) return;</pre>
This might produce the expected result but does it pass HTML validation? No. The correct way is:
<pre>if (i &amp;&amp; j) return;</pre>
Long story short, HTML-encode the content of a pre tag just the way you do with other tags.
TL;DR
PHP: htmlspecialchars($html);
JavaScript(JS): Element.innerText = "<html>...";
Note that <pre> is just for styles, so you have to escape ALL HTML.
For HTML "fossils" only: the <xmp> tag
This is not well known, but it really does exist, and even Chrome still supports it. Relying on <xmp> is NOT recommended, though: it's just for you HTML fossils, although it is a very simple way to handle personal content, e.g. docs. Even the w3.org Wiki says in its example: "No, really. don't use it."
You can put ANY HTML (excluding </xmp> end tag) inside <xmp></xmp>
<xmp>
<html> <br> just any other html tags...
</xmp>
The proper version
Proper version could be considered to be HTML stored as a STRING and displayed with the help of some escaping function/mechanism.
Just remember one thing: strings in C-like languages are usually written between single or double quotes. If you wrap your string in double quotes, escape any double quotes inside it (probably with \); if you wrap it in single quotes, escape the single quotes (probably with \).
The most frequent - Server-side language escaping (ex. in PHP)
Server-side scripting languages often have some built-in function to escape HTML.
<?php
$html = "<html> <br> or just any other HTML"; //store html
echo htmlspecialchars($html); //display escaped html
?>
Note that in PHP 8.1 there was a change so you no longer have to specify ENT_QUOTES flag:
flags changed from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.
The client-side way (example in JavaScript / JS&jQuery)
Similar approach as on server-side is achievable in client-side scripts.
Pure JavaScript
There is no function, but there is the default behavior, if you set element's innerText or node's textContent:
document.querySelector('.myTest').innerText = "<html><head>...";
document.querySelector('.myTest').textContent = "<html><head>...";
HTMLElement.innerText and Node.textContent are not the same thing! You can find out more about the difference in the MDN docs for each property.
jQuery (a JS library)
jQuery has $jqueryEl.text() for this purpose:
$('.mySomething .test').text("<html><head></head><body class=\"test\">...");
Just remember the same thing as for server-side - in C-like languages, escape the quotes you've wrapped your string in.
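Where no DOM is available (for example when building markup as a string), a small hand-written escaper can fill the gap. This is only a sketch: `HTML_ENTITIES`, `escapeHtml`, and the sample `snippet` are hypothetical names chosen for illustration, not a built-in API.

```javascript
// Characters that must be replaced before text is placed into HTML markup.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace each special character in a single pass, so "&" is never escaped twice.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

// Hypothetical snippet to display inside a <pre> element.
const snippet = '<body class="test">if (i && j) return;</body>';

console.log("<pre>" + escapeHtml(snippet) + "</pre>");
// prints: <pre>&lt;body class=&quot;test&quot;&gt;if (i &amp;&amp; j) return;&lt;/body&gt;</pre>
```

In a browser, setting textContent as shown above remains the simpler option, because the browser then never parses the string as markup at all.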
For posting code within your markup, I suggest using the <code> tag. It works the same way as pre but would be considered semantically correct.
Otherwise, <code> and <pre> only need the angle brackets and ampersands encoded.
Use this and don't worry about any of them.
<pre>
${fn:escapeXml('
<!-- all your code -->
')};
</pre>
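Note that fn:escapeXml is a JSP/JSTL function, not jQuery: you'll need the JSTL functions tag library (the fn prefix) declared in the page for it to work.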

Is it valid to escape html in a href attribute?

Assuming I have the following link:
<a href='http://google.com/bla'>http://google.com/bla</a>
Is this one also valid?
<a href='http&#58;&#47;&#47;google.com&#47;bla'>http://google.com/bla</a>
It works in Firefox, but I'm not sure if this is standardized behavior. I hope the question isn't super dumb!
Yes, it is perfectly valid to do that. In fact, the ampersand (&) character must be escaped into &amp; in order to be valid HTML, even inside the href attribute (and all attributes for that matter).
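To illustrate the ampersand point, here is a hedged sketch that builds a link whose URL has two query parameters. The parameter names and the `escapeAttribute` helper are made up for the example; only the `URL` class is a standard API.

```javascript
// Hypothetical target URL with two query parameters.
const url = new URL("http://google.com/bla");
url.searchParams.set("q", "pre tag");
url.searchParams.set("lang", "en");

// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttribute(value) {
  return value
    .replace(/&/g, "&amp;") // first, so the entities added below stay intact
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const href = escapeAttribute(url.href);
console.log('<a href="' + href + '">search</a>');
// prints: <a href="http://google.com/bla?q=pre+tag&amp;lang=en">search</a>
```

The browser decodes &amp; back to & when it parses the attribute, so the request still goes to the URL with a plain ampersand.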

Why is MVC 4 Razor escaping ampersand when using HTML.Raw in a title attribute

We recently upgraded to MVC 4 and now the titles in our links are not displaying correctly. The problem is that Html.Raw previously did not escape & in our title attributes, but now it does. Below is my sample code:
<a title="@Html.Raw("Shoe Size 6&#189;-8")">Test</a>
Which produces the following markup:
<a title="Shoe Size 6&amp;#189;-8">Test</a>
The only solution I found so far was to put the entire anchor into a string and then HTML.Raw that string.
Why is Html.Raw escaping ampersand in anchor tag in ASP.NET MVC 4?.
This is a very ugly solution and I am hoping there is a better alternative.
While it is only a slightly less ugly workaround, you can simply @Html.Raw the full attribute name and value.
<a @Html.Raw("title=\"Shoe Size 6&#189;-8\"")>Test</a>
Results in:
<a title="Shoe Size 6&#189;-8">Test</a>
If you can't do the workaround listed above, I have a patched base-class you could try injecting via web.config. Check it out at https://gist.github.com/4036121

How can I escape a sequence of HTML so it can go inside a tags title attribute?

I've been working on this for way too long. I'm trying to put HTML inside the title attribute of a tag. This is for a tooltip. Of course, if this is going to be possible, then I have to escape all of the necessary characters so it doesn't screw up the tag in which it is contained. To be specific, how can I fit the following inside the title attribute of a tag:
<a href="test">test</a>
That is, I want this:
<div title="<a href="test">test</a>">my div</div>
I feel like I've tried everything. Is this even possible?
I googled HTML Escape Characters and found a tool to do it: http://accessify.com/tools-and-wizards/developer-tools/quick-escape/default.php
It produced this string which you can use:
&lt;a href=&quot;test&quot;&gt;test&lt;/a&gt;
If you are using jQuery, you can do it like this:
$('div').attr('title','test');
If you want to escape HTML tags, and your test content is in a div, something like this:
<div id="tag">test</div>
then you can simply do:
$('div').attr('title', $("#tag").text());

regular expression to remove links [duplicate]

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
I have a HTML page with
<a class="development" href="[variable content]">X</a>
The [variable content] is different in each place, the rest is the same.
What regexp will catch all of those links?
(Although I am not writing it here, I did try...)
What about the non-greedy version:
<a class="development" href="(.*?)">X</a>
Try this regular expression:
<a class="development" href="[^"]*">X</a>
Regexes are fundamentally bad at parsing HTML (see Can you provide some examples of why it is hard to parse XML and HTML with a regex? for why). What you need is an HTML parser. See Can you provide an example of parsing HTML with your favorite parser? for examples using a variety of parsers.
Regex is generally a bad solution for HTML parsing, a topic which gets discussed every time a question like this is asked. For example, the element could wrap onto another line, either as
<a class="development"
href="[variable content]">X</a>
or
<a class="development" href="[variable content]">X
</a>
What are you trying to achieve?
Using jQuery you could disable the links with:
$("a.development").on("click", function() { return false; });
or
$("a.development").attr("href", "#");
Here's a version that'll allow all sorts of evil to be put in the href attribute.
/<a class="development" href=(?:"[^"]*"|'[^']*'|[^\s<>]+)>.*?<\/a>/m
I'm also assuming X is going to be variable, so I added a non-greedy match there to handle it, and the /m means . matches line-breaks too (that is how Ruby treats /m; in many other regex flavors the equivalent flag is /s).
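As a rough illustration of the line-wrapping problem raised above, the following JavaScript sketch runs a whitespace-tolerant pattern and the stricter single-line pattern against the three layouts. The sample hrefs (`/one`, `/two`, `/three`) and variable names are invented for the example, and this remains a regex workaround with the limitations already mentioned, not a substitute for an HTML parser.

```javascript
// Three links: one on a single line, two wrapped as described in the answers above.
const html = [
  '<a class="development" href="/one">X</a>',
  '<a class="development"',
  'href="/two">X</a>',
  '<a class="development" href="/three">X',
  '</a>',
].join("\n");

// \s+ and \s* tolerate line breaks between attributes; [\s\S]*? matches the
// link text non-greedily, including any newline before </a>.
const linkPattern = /<a\s+class="development"\s+href="([^"]*)"\s*>[\s\S]*?<\/a>/g;

const hrefs = Array.from(html.matchAll(linkPattern), (m) => m[1]);
console.log(hrefs); // [ '/one', '/two', '/three' ]

// The stricter pattern only finds the link that sits on a single line.
const strictPattern = /<a class="development" href="[^"]*">X<\/a>/g;
console.log(html.match(strictPattern).length); // 1
```

Even the tolerant pattern still assumes double-quoted attributes in exactly this order.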