Html Encoding Texts in Razor Views - html

Texts and/or markups are rendered to output as-is without any html-encoding as we already expect.
For the following, the plain text with markup must be html-encoded.(We don't care about the code output here.)
#{ var theVar = "xyz"; }
some text & other text >>#theVar
So, the html in the output;
some text & other text >>xx
So, when we want to write some static text that needs to be html-encoded we have to use constructs like;
#{ var theVar = "xyz"; }
#("some text & other text >>")#theVar
to get the following html in the output;
some text & other text >>xyz
and for clarity when viewed in browser;
some text & other text >>xyz
So, is there a simple way of doing this? Some shortcut to html encode texts instead of using #("...") for each text which will start to look nasty when there are multiples of them.
What would be the best practice? How do you do this?

So, it is not a big concern when we specify utf-8 encoding for the document. It is not required to html encode characters as entity references except special(<, >, &, ", ') characters when utf-8 encoding used for the document.
Even using & by itself is not wrong for lenient browsers but there would be ambigous cases to consider like volt&amp. So, it would be better to html encode all of these special characters.
Check the W3 Consortium articles "When to use escapes" section;
http://www.w3.org/International/questions/qa-escapes#use

Related

Why do some strings contain " " and some " ", when my input is the same(" ")?

My problem occurs when I try to use some data/strings in a p-element.
I start of with data like this:
data: function() {
return {
reportText: {
text1: "This is some subject text",
text2: "This is the conclusion",
}
}
}
I use this data as follows in my (vue-)html:
<p> {{ reportText.text1 }} </p>
<p> {{ reportText.text2 }} </p>
In my browser, when I inspect my elements I get to see the following results:
<p>This is some subject text</p>
<p>This is the conclusion</p>
As you can see, there is suddenly a difference, one p element uses and the other , even though I started of with both strings only using . I know and technically represent the same thingm, but the problem with the string is that it gets treated as a string with 1 large word instead of multiple separate words. This screws up my layout and I can't solve this by using certain css properties (word-wrap etc.)
Other things I have tried:
Tried sanitizing the strings by using .replace( , ), but that doesn't do anything. I assume this is because it basically is the same, so there is nothing to really replace. Same reason why I have to use blockcode on stackoverflow to make the destinction between and .
Logged the data from vue to see if there is any noticeable difference, but I can't see any. If I log the data/reportText I again only see string with 's
So I have the following questions:
Why does this happen? I can't seem to find any logical explanation why it sometimes uses 's and sometimes uses 's, it seems random, but I am sure I am missing something.
Any other things I could try to follow the path my string takes, so I can see where the transformation from to happens?
Per the comments, the solution devised ended up being a simple unicode character replacement targeting the \u00A0 unicode code point (i.e. replacing unicode non-breaking spaces with ordinary spaces):
str.replace(/[\\u00A0]/g, ' ')
Explanation:
JavaScript typically allows the use of unicode characters in two ways: you can input the rendered character directly, or you can use a unicode code point (i.e. in the case of JavaScript, a hexadecimal code prefixed with \u like \u00A0). It has no concept of an HTML entity (i.e. a character sequence between a & and ; like ).
The inspector tool for some browsers, however, utilizes the HTML concept of the HTML entity and will often display unicode characters using their corresponding HTML entities where applicable. If you check the same source code in Chrome's inspector vs. Firefox's inspector (as of writing this answer, anyway), you will see that Chrome uses HTML entities while Firefox uses the rendered character result. While it's a handy feature to be able to see non-printable unicode characters in the inspector, Chrome's use of HTML entities is only a convenience feature, not a reflection of the actual contents of your source code.
With that in mind, we can infer that your source code contains unicode characters in their fully rendered form. Regardless of the form of your unicode character, the fix is identical: you need to target these unicode space characters explicitly and replace them with ordinary spaces.

HTML: show > in plain text

In HTML files, if I want the browser to show a special symbol, such as >, I can use the special escape character sequence > and the browser will automatically transform it to >.
But what if I don't want it to be transferred into >? What if I happen to want the character sequence to be shown in plain text?
In order to have a character sequence not automatically rendered as a symbol, you can escape out the ampersand. This method is commonly used by instructional pages with lists of HTML symbols.
Source: >
Result: >
Source: &gt;
Result: >
Thanks to Ry-♦ for stating the obvious. I was so concerned about using raw string, I didn't realize what I was using is adequate already. Use .textContent property to render text as is. If you use something like .innerHTML, it will parse your text as HTML and apply escape sequences.
Demo
var str = ">"
document.querySelector('body').textContent = str;

Avoid interpreting HTML code in a QTextBrowser

I have a QTextBrowser in my Qt application. I would like to append some text but, I need part of this text not to be interpreted in HTML. How can I achieve this? May I encode the QString?
If you want your browser not to interpret only parts of your text as HTML you will need to quote the part you want to omit (replace "<" with "&l t;" etc.). You can use convenient escape method:
textBrowser->insertHtml(
QString("<b>this will be bold</b>") +
Qt::escape(QString("<b>this will not</b>"))
);
If you would like not to interpret the whole thing you can insert it as plain text:
textBrowser->insertPlainText ( "<b>foobar</b>" );
Finally I solved my own question using:
QString codedHtml = Qt::escape(html);

Saving NSXMLDocument escaping HTML for certain NSXMLElements

For my application I have to save an XML document containing a few elements with HTML-text.
Example as the result should be:
<gpx>
<wpt>
<elementInHTML>
<p>Sample text.</p>
</elementInHTML>
etc...
But when I add this html element to my NSXMLDocument the '<' (to <) is correctly escaped automatically, but the '>' not (to >).
In code:
NSXMLElement *newWPT = [NSXMLElement elementWithName:#"wpt"];
NSXMLElement *htmlElement = [NSXMLElement elementWithName:#"elementInHTML"];
htmlElement.stringValue = #"<Sample text>";
[newWPT addChild:htmlElement];
But this results in an XML document like this:
<gpx>
<wpt>
<elementInHTML>
<p>Sample text.</p>
</elementInHTML>
etc...
And this result is not valid for the device that has to process this xml file.
Anybody an idea how to enclose a correctly escaped html-string into a NSXMLDocument?
&The string is correctly scaped for XML, greater than is a valid character where it is: http://www.w3.org/TR/REC-xml/#syntax
It seems it's a device implementation specific problem.
Your easy option is to include your html markup in a CDATA.
...and hope the device client XML parser implementation understand it properly.
(If your html markup include also CDATA sections you'll have to find/replace ">" with ">", as stated in the link before.)
P.D.: NSXMLNode CDATA in any search engine will lead you to something closer to "copy-paste"
EDIT:
Knowing now more about the content of the string in the original question (see question comments) and depending on the nature of your string answers to this other question may also help: Objective-C and Swift URL encoding

How to stop an html TEXTAREA from decoding html entities

I have a strange problem:
In the database, I have a literal ampersand lt semicolon:
<div
whenever its printed into a html textarea tag, the source code of the page shows the > as >.
How do I stop this decoding?
You can't stop entities being decoded in a textarea since the content of a textarea is not (unlike a script or style element) intrinsic CDATA, even though error recovery may sometimes give the impression that it is.
The definition of the textarea element is:
<!ELEMENT TEXTAREA - - (#PCDATA) -- multi-line text field -->
i.e. it contains PCDATA which is described as:
Document text (indicated by the SGML construct "#PCDATA"). Text may contain character references. Recall that these begin with & and end with a semicolon (e.g., Hergé's adventures of Tintin contains the character entity reference for the e acute character).
This means that when you type (the invalid HTML of) "start of tag" (<) the browser corrects it to "less than sign" (<) but when you type "start of entity" (&), which is allowed, no error correction takes place.
You need to write what you mean. If you want to include some HTML as data then you must convert any character with special meaning to its respective character reference.
If the data is:
<div
Then the HTML must be:
<textarea>&lt;div</textarea>
You can use the standard functions for converting this (e.g. PHP's htmlspecialchars or Perl's HTML::Entities module).
NB 1: If you were using XHTML[2] (and really using it, it doesn't count if you serve it as text/html) then you could use an explicit CDATA block:
<textarea><![CDATA[<div]]></textarea>
NB 2: Or if browsers implemented HTML 4 correctly
Ok , but the question is . why it decodes them anyway ? assuming i've added & , save the textarea , ti will be saved < , but displayed as < , saving it again will convert it back to < (but it will remain < in the database) , saving again will save it a < in the database , why the textarea decodes it ?
The server sends (to the browser) data encoded as HTML.
The browser sends (to the server) data encoded as application/x-www-form-urlencoded (or multipart/form-data).
Since the browser is not sending the data as HTML, the characters are not represented as HTML entities.
If you take the data received from the client and then put it into an HTML document, then you must encode it as HTML first.
In PHP, this can be done using htmlentities(). Example below.
<?php
$content = "This string contains the TM symbol: ™";
print "<textarea>". htmlentities($content) ."</textarea>";
?>
Without htmlentities(), the textarea would interpret and display the TM symbol (™) instead of "™".
http://php.net/manual/en/function.htmlentities.php
You have to be sure that this is rendered to the browser:
<textarea name="somename">&lt;div</textarea>
Essentially, this means that the & in < has to be html encoded to &. How to do it will depend on the technologies you're using.
UPDATE: Think about it like this. If you want to display <div> inside a textarea, you'll have to encode <> because otherwise, <div> would be a normal HTML element to the browser:
<textarea name="somename"><div></textarea>
Having said this, if you want to display <div> inside a textarea, you'll have to encode & again, because the browser decodes HTML entities when rendering HTML. It has nothing to do with your database.
You can serve your DB-content from a separate page and then place it in the textarea using a Javascript (jQuery) Ajax-call:
request = $.ajax
({
type: "GET",
url: "url-with-the-troubled-content.php",
success: function(data)
{
document.getElementById('id-of-text-area').value = data;
}
});
Explained at
http://www.endtask.net/how-to-prevent-a-textarea-element-from-decoding-html-entities/
I had the same problem and I just made two replacements on the text to show from the database before letting it into the text area:
myString = Replace(myString, "&", "&")
myString = Replace(myString, "<", "<")
Replace n:o 1 to trick the textarea to show the codes.
replace n:o 2: Without this replacement you can not show the word "" inside the textarea (it would end the textarea tag).
(Asp / vbscript code above, translate to a replace method of your language choice)
I found an alternative solution for reading and working with in-browser, simply read the element's text() using jQuery, it returns the characters as display characters and allows me to write from a textarea to a div's innerHTML using the property via html()...
With only JS and HTML...
...to answer the actual question, with a bare-minimal example:
<textarea id=myta></textarea>
<script id=mytext type=text/plain>
™
</script>
<script> myta.value = mytext.innerText; </script>
Explanation:
Script tags do not render html nor entities. By storing text in a script tag, it will remain unadultered-- problem is it will try to execute as JavaScript. So we use an empty textarea and store the text in a script tag (here, the first one).
To prevent that, we change the mime-type to text/plain instead of it's default, which is text/javascript. This will prevent it from running.
Then to populate the textarea, we copy the script tag's content to it (here done in the second script tag).
The only caveats I have found with this are you have to use JavaScript and you cannot include script tags directly in it.