JSON escape space characters - html

How would I escape space characters in a JSON string? Basically my problem is that I've gotten into a situation where the program that reads the string can use HTML tags for formatting, but I need to be able to use these HTML tags without adding more spaces to the string. so things like
<u>text</u>
is fine, for adding underline formatting
but something like
<font size="14">text</font>
is not fine, because the <font> tag with the size attribute adds an extra space to the string. I know, funny criteria, but at this point that's what has happened.
My first speculative solution would be to have some kind of \escape character that JSON can put in between font and size that will solve my "space" problems, something that the HTML will ignore but leave the human readable string in the code without actual spaces.
ex. <font\&size="14">text</font>
displays as: text
kind of like &nbsp; but better?
any solutions?

You can use \u0020 to escape the ' ' character in JSON.
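As a rough sketch of what that looks like in practice, the snippet below serializes a value and then rewrites every literal space as \u0020. It relies on JSON.stringify emitting no whitespace outside of strings when called without an indent argument, so every space it finds sits inside a string. The helper name stringifyWithoutLiteralSpaces is made up for this example. Note that the space comes back once the JSON is parsed, so this only helps if the restriction applies to the raw JSON text.

```javascript
// Hypothetical helper: serialize to JSON with every literal space written as \u0020.
// JSON.stringify(value) with no indent argument produces no whitespace outside strings,
// so a plain global replace only touches spaces inside string values and keys.
function stringifyWithoutLiteralSpaces(value) {
  return JSON.stringify(value).replace(/ /g, "\\u0020");
}

const encoded = stringifyWithoutLiteralSpaces({ label: '<font size="14">text</font>' });
const decoded = JSON.parse(encoded);

console.log(encoded);       // raw JSON text, no literal space characters
console.log(decoded.label); // the parsed string contains the space again
```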

Related

Why do some strings contain "&nbsp;" and some " ", when my input is the same (" ")?

My problem occurs when I try to use some data/strings in a p-element.
I start off with data like this:
data: function() {
  return {
    reportText: {
      text1: "This is some subject text",
      text2: "This is the conclusion",
    }
  }
}
I use this data as follows in my (vue-)html:
<p> {{ reportText.text1 }} </p>
<p> {{ reportText.text2 }} </p>
In my browser, when I inspect my elements I get to see the following results:
<p>This is some subject text</p>
<p>This is the conclusion</p>
As you can see, there is suddenly a difference: one p element uses &nbsp; and the other uses regular spaces, even though I started off with both strings using only regular spaces. I know &nbsp; and a regular space technically represent the same thing, but the problem with the &nbsp; string is that it gets treated as a string with 1 large word instead of multiple separate words. This screws up my layout and I can't solve this by using certain CSS properties (word-wrap etc.)
Other things I have tried:
Tried sanitizing the strings by using .replace("&nbsp;", " "), but that doesn't do anything. I assume this is because it basically is the same, so there is nothing to really replace. Same reason why I have to use a code block on Stack Overflow to make the distinction between &nbsp; and a regular space.
Logged the data from Vue to see if there is any noticeable difference, but I can't see any. If I log the data/reportText I again only see strings with regular spaces.
So I have the following questions:
Why does this happen? I can't seem to find any logical explanation why it sometimes uses &nbsp; and sometimes uses regular spaces. It seems random, but I am sure I am missing something.
Any other things I could try to follow the path my string takes, so I can see where the transformation from a regular space to &nbsp; happens?
Per the comments, the solution devised ended up being a simple unicode character replacement targeting the \u00A0 unicode code point (i.e. replacing unicode non-breaking spaces with ordinary spaces):
str.replace(/\u00A0/g, ' ')
Explanation:
JavaScript typically allows the use of unicode characters in two ways: you can input the rendered character directly, or you can use a unicode code point (i.e. in the case of JavaScript, a hexadecimal code prefixed with \u like \u00A0). It has no concept of an HTML entity (i.e. a character sequence between a & and ; like &nbsp;).
The inspector tool for some browsers, however, utilizes the HTML concept of the HTML entity and will often display unicode characters using their corresponding HTML entities where applicable. If you check the same source code in Chrome's inspector vs. Firefox's inspector (as of writing this answer, anyway), you will see that Chrome uses HTML entities while Firefox uses the rendered character result. While it's a handy feature to be able to see non-printable unicode characters in the inspector, Chrome's use of HTML entities is only a convenience feature, not a reflection of the actual contents of your source code.
With that in mind, we can infer that your source code contains unicode characters in their fully rendered form. Regardless of the form of your unicode character, the fix is identical: you need to target these unicode space characters explicitly and replace them with ordinary spaces.
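A minimal sketch of that replacement, plus a small debugging aid for spotting where the non-breaking spaces are, might look like the following. The function names normalizeSpaces and revealSpaces are invented for this example, and the sample string is made up rather than taken from the question's data.

```javascript
// Replace every non-breaking space (U+00A0) with an ordinary space (U+0020).
function normalizeSpaces(text) {
  return text.replace(/\u00A0/g, " ");
}

// Debugging aid: make non-breaking spaces visible so they can be told apart
// from ordinary spaces when logging a string.
function revealSpaces(text) {
  return Array.from(text, (ch) => (ch === "\u00A0" ? "[NBSP]" : ch)).join("");
}

// Made-up sample: the first gap is a non-breaking space, the second is ordinary.
const sample = "This\u00A0is some";

console.log(revealSpaces(sample));
console.log(revealSpaces(normalizeSpaces(sample)));
```

Logging revealSpaces(...) at each step a string passes through is one way to narrow down where the non-breaking spaces enter.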

How do I type html in a markdown file without it rendering?

I want to type the following sentence in a markdown file: she says <h1> is large. I can do it in StackOverflow with three backticks around h1, but this doesn't work for a .md file. I've also tried a single backtick, single quote, double quote, hashtags, spacing, <code>h1</code> and everything else I could think of. Is there a way to do this?
You can escape the < characters by replacing them with &lt;, which is the HTML escape sequence for <. Your sentence would then be:
she says &lt;h1> is large
As a side note, the original Markdown "spec" has the following to say:
However, inside Markdown code spans and blocks, angle brackets and ampersands are always encoded automatically. This makes it easy to use Markdown to write about HTML code. (As opposed to raw HTML, which is a terrible format for writing about HTML syntax, because every single < and & in your example code needs to be escaped.)
...which means that, if you're still getting tags when putting them in backticks, whatever renderer you're using isn't "compliant" (to the extent that one can be compliant with that document), and you might want to file a bug.
Generally, you can surround the code in single backticks to automatically escape the characters. Otherwise just use the HTML escapes: &lt; for < and &gt; for >.
i.e.
she says &lt;h1&gt; is large or she says `<h1>` is large
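If the text is generated by a script rather than typed by hand, the entity replacement can be automated. This is only a sketch; escapeAngleBrackets is a name chosen for this example. The ampersand is replaced first so that the entities produced by the later replacements are not escaped a second time.

```javascript
// Hypothetical helper: turn &, < and > into HTML entities so a Markdown
// renderer shows them as text instead of treating them as markup.
function escapeAngleBrackets(text) {
  return text
    .replace(/&/g, "&amp;") // must come first, see note above
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(escapeAngleBrackets("she says <h1> is large"));
```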
A backslash (\) can be used to escape < and >.
Ex: she says \<h1\> is large

Issue with s:property tag in struts2, Not showing spaces in text

We are using s:property tag to display string value on struts 2.
<s:property value="stringValue"/>
If "stringValue" has multiple spaces then it is showing only 1 space instead of exact text.
Ex: String stringValue ="Hello World, Welcome";
Output: Hello World, Welcome.
Here the string has two spaces between words, but the application displays only one.
I have tried to use escapeHtml as false but same issue.
What is wrong with this tag?
Best Regards,
RKG
Nothing is wrong with the tag.
HTML treats multiple whitespaces as a single whitespace; that's just the way HTML is.
If you want to explicitly have multiple spaces you'll need to convert them to entities. There are a zillion ways to do that.
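One of those ways, sketched in JavaScript: keep the first space of each run as an ordinary space (so lines can still wrap there) and turn the rest into &nbsp; entities. The names escapeHtml and preserveSpaces are made up for this example. The result has to be written to the page without further HTML escaping, otherwise the entities show up literally.

```javascript
// Escape characters that are special in HTML text content.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// For every run of two or more spaces, keep one ordinary space and
// replace the remaining ones with &nbsp; so the browser does not collapse them.
function preserveSpaces(text) {
  return escapeHtml(text).replace(
    / {2,}/g,
    (run) => " " + "&nbsp;".repeat(run.length - 1)
  );
}

console.log(preserveSpaces("Hello  World,   Welcome"));
```

Where styling is an option, the CSS declaration white-space: pre-wrap on the containing element keeps runs of spaces without changing the string at all.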

Invisible Delimiter for Strings in HTML

I need a way to identify certain strings in HTML markup. I know what the strings are, but it is possible that they could be substrings of other strings in the document. To find them, I output a special delimiter character (currently using \032). On page load, we go through the HTML and record the location of the strings, and remove the delimiter.
Unfortunately, most browsers show the delimiter character until we can find and remove them all. I'd like to avoid that if possible. Is there a character or string that will be preserved in the HTML content (so a comment won't work) but won't be visible to the user? It also needs to be something that is fairly unlikely to appear next to a string, so something like &nbsp; wouldn't work either.
EDIT: Sorry, I forgot to mention that the strings will be in attributes, so any sort of tag won't work.
&zwnj; - zero-width non-joiner (see http://htmlhelp.org/reference/html40/entities/special.html)
On the off chance that this already appears in your text, double it up (e.g. &zwnj;&zwnj;mytext&zwnj;&zwnj;).
Edit in response to comment: works in Firefox 3. Note that you have to search for the Unicode value of the entity.
<html>
<body>
  <div id="test">
    This is a &zwnj;test
  </div>
  <script type="application/javascript">
    var myDiv = document.getElementById("test");
    var content = myDiv.innerHTML;
    var pos = content.indexOf("\u200C");
    alert(pos);
  </script>
</body>
</html>
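Building on the doubled-delimiter idea, here is a sketch of how the marked strings could be located and the delimiters stripped in one pass over a string such as an attribute value. It is not the asker's actual code: the function name findMarked and the result shape are assumptions made for this example, and it expects every marked string to be wrapped in a pair of doubled zero-width non-joiners.

```javascript
// Assumed delimiter: two zero-width non-joiners (U+200C) on each side of a marked string.
const DELIM = "\u200C\u200C";

// Returns the text with delimiters removed, plus the position (in the cleaned
// text) and value of every string that was wrapped in delimiters.
function findMarked(text) {
  const found = [];
  let clean = "";
  let pos = 0;
  while (true) {
    const start = text.indexOf(DELIM, pos);
    if (start === -1) break;
    const end = text.indexOf(DELIM, start + DELIM.length);
    if (end === -1) break; // unmatched opening delimiter: leave the rest untouched
    clean += text.slice(pos, start);
    const value = text.slice(start + DELIM.length, end);
    found.push({ index: clean.length, value });
    clean += value;
    pos = end + DELIM.length;
  }
  clean += text.slice(pos);
  return { clean, found };
}

const result = findMarked("a \u200C\u200Cfoo\u200C\u200C b");
console.log(result.clean, result.found);
```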
You could insert them into <span> elements. This will work only for in-page text (not attributes, or the like).
Otherwise, you could insert a whitespace character that your program doesn't already output as part of the HTML, like a tab character (\x09), a vertical tab (\x0b), a bare carriage return (\x0d) — without a newline beside it, ala Windows text encoding — or, just a null byte (\x00).
The best thing to insert, which is not visible in the browser, would be a pair of tags with some special id, like <span id="delimiter" class="Delimiter"></span>. This will not show up in the content, while it can still be present in the document. You don't need to remove them.
You could use left-to-right (LTR) marks. Is this for some sort of XSS testing? If so, this might be of interest: Taint support for PHP

escaping html inside comment tags

Escaping HTML is fine - it will remove <'s and >'s etc.
I've run into a problem where I am outputting a filename inside a comment tag, e.g. <!-- ${filename} -->
Of course things can be bad if you don't escape, so it becomes:
<!-- <c:out value="${filename}"/> -->
The problem is that if the file has "--" in the name, all the HTML gets screwed up, since you're not allowed to have <!-- -- -->.
The standard HTML escape doesn't escape these dashes, and I was wondering if anyone is familiar with a simple / standard way to escape them.
Definition of a HTML comment:
A comment declaration starts with <!, followed by zero or more comments, followed by >. A comment starts and ends with "--", and does not contain any occurrence of "--".
Of course the parsing of a comment is up to the browser.
Nothing strikes me as an obvious solution here, so I'd suggest you str_replace those double dashes out.
There is no good way to solve this. You can't just escape them because comments are read in plaintext. You will have to do something like put a space between the hyphens, or use some sort of code for hyphens (like [HYPHEN]).
Since it is obvious that you cannot directly display the '--'s, you can either encode them or use the fn:escapeXml or fn:replace functions for appropriate replacements.
JSTL documentation
There's no universally working way to escape those characters in HTML unless the - characters come in multiples of four: -- won't work in Firefox, but ---- will. So it all depends on the browser. For example, in Internet Explorer 8 it is not a problem; those characters are handled properly. The same goes for Google Chrome. However, Firefox, even the latest version (3.0.4), doesn't handle these characters well.
You shouldn't be trying to HTML-escape, the contents of comments are not escapable and it's fine to have a bare ‘>’ or ‘&’ inside.
‘--’ is its own, unrelated problem and is not really fixable. If you don't need to recover the exact string, just do a replacement to get rid of them (eg. replace with ‘__’).
If you do need to get a string through completely unmolested to a JavaScript that will be reading the contents of the comment, use a string literal:
<!-- 'my-string' -->
which the script can then read using eval(commentnode.data). (Yes, a valid use for eval() at last!)
Then your escaping problem becomes how to put things in JS string literals, which is fairly easily solvable by escaping the ‘'’ and ‘-’ characters:
<!-- 'Bob\x27s\x2D\x2Dstring' -->
(You should probably also escape ‘<’, ‘&’ and ‘"’, in case you ever want to use the same escaping scheme to put a JS string literal inside a <script> block or inline handler.)
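The escaping scheme described above could be sketched like this. The function name toCommentSafeLiteral is invented for this example, and the set of escaped characters (quotes, hyphen, angle brackets, ampersand, backslash, control characters and the two Unicode line separators) is one reasonable choice rather than a standard. In the original setting the literal would be produced server-side; JavaScript is used here only to illustrate the mapping.

```javascript
// Hypothetical helper: build a single-quoted JS string literal that contains no
// raw hyphens, quotes, angle brackets or ampersands, so it can sit inside <!-- ... -->.
function toCommentSafeLiteral(value) {
  const escaped = value.replace(
    /[\\'"\-<>&\u0000-\u001F\u2028\u2029]/g,
    (ch) => {
      const code = ch.charCodeAt(0);
      return code < 0x100
        ? "\\x" + code.toString(16).toUpperCase().padStart(2, "0")
        : "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
    }
  );
  return "'" + escaped + "'";
}

const literal = toCommentSafeLiteral("Bob's--string");
console.log("<!-- " + literal + " -->");
```

Because every hyphen is written as \x2D, the literal can never contain "--", and a script reading the comment can recover the exact original string by evaluating the literal.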