How to show special character in UTF-8 of Khmer symbol - html

I have got a symbol (U+17B4) that I can't display on a web page with the UTF-8 content type. The symbol has no width and can't be seen at all. How can I show it? Its code is &#6068;.
For example, at http://www.endmemo.com/unicode/khmer.php, 6068 and 6069 are not visible, but I need to show them, at least as a space.
Edited:
I'm using Arial or sans-serif, which I think are pretty common fonts. What people do: they make text UNIQUE by inserting this symbol between ordinary characters. For example, a user writes "a(invisible Khmer symbol)b(invisible Khmer symbol)" and so on. On the page I see only "ab", without any spaces. I tried putting the actual character inside the HTML to see it, but with no luck. I thought that a symbol that is not present in a font would be shown as a question mark or an empty square, but not in this case. The solution can't just be a simple replace in the text.

If your page is UTF-8 then it's better to use the actual character rather than an HTML entity.
Your requested character is not present in many fonts. You can try finding the latest version of Code2000 which appears to support it.
You can see fonts that support this particular character here:
http://www.fileformat.info/info/unicode/char/17b4/fontsupport.htm
If you can't find a font and you want to display an empty space instead you could replace it before showing it in the page or put it in a container. The page you linked uses a table cell to hold the character.
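The "replace it before showing it" option above could be sketched roughly as follows. The function name revealInvisibleKhmer and the bracketed placeholder format are assumptions made for illustration; only the two code points (6068 and 6069, i.e. U+17B4 and U+17B5) come from the question.

```javascript
// Hypothetical helper: swap the two invisible Khmer characters from the
// question (U+17B4 = 6068, U+17B5 = 6069) for a visible placeholder.
function revealInvisibleKhmer(text) {
  return text.replace(/[\u17B4\u17B5]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return "[U+" + hex + "]";
  });
}

console.log(revealInvisibleKhmer("a\u17B4b\u17B5")); // a[U+17B4]b[U+17B5]
```

Any visible marker would do in place of the bracketed code point, for example a plain space if that is all the page needs.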

Related

Strange symbol shows up on website (L SEP)?

I noticed on my website, http://www.cscc.org.sg/, there's this odd symbol that shows up.
It says L SEP. In the HTML code, it displays the same thing.
Can someone show me how to remove it?
That character is U+2028, or HTML entity code &#8232;, which is a kind of newline character. It's not actually supposed to be displayed. I'm guessing that either your server-side scripts failed to translate it into a new line or you are using a font that displays it.
But, since we know the HTML and Unicode values for the character, we can add a few lines of jQuery that should get rid of the character. Right now, I'm just replacing it with an empty space in the code below. Just add this:
$(document).ready(function() {
  // Replace every U+2028 (line separator) inside each top-level element with a space.
  $("body").children().each(function() {
    $(this).html($(this).html().replace(/\u2028/g, " "));
  });
});
This should work, though please note that I have not tested it, and it may not work, as none of my browsers will display the character.
But if it doesn't, you can always try pasting your text block onto http://www.nousphere.net/cleanspecial.php which will remove any special characters.
Some fonts render LS as L SEP. Such a glyph is designed for unformatted presentations of the character, such as when viewing the raw characters of a file in a binary editor. In a formatted presentation, actual line spacing should be displayed instead of the glyph.
The problem is that neither the web server nor the web browser is interpreting the LS as a newline. The web server could detect the LS and replace it with <br>. Such a feature would fit well with a web server that dynamically generates HTML anyway, but would add overhead and complexity to a web server that serves file contents without modification.
If an LS makes its way to the web browser, the web browser doesn't interpret it as formatting. Page formatting is based only on HTML tags. For example, LF and CR just affect formatting of the HTML source code, not the web page's formatting (except in <pre> sections). The browser could in principle interpret LS and PS (paragraph separator) as <br> and <p>, but the HTML standard doesn't tell browsers to do that. (It seems to me like it would be a good addition.)
To replace the raw LS character with the line separation that the content creator likely intended, you'll need to replace the LS characters with HTML markup such as <br>.
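A minimal sketch of that replacement, assuming the text is available as a string before it is written into the page. The function name lineSeparatorsToBr is made up for this example.

```javascript
// Hypothetical helper: turn each raw LS character (U+2028) into a <br> tag.
function lineSeparatorsToBr(text) {
  return text.replace(/\u2028/g, "<br>");
}

console.log(lineSeparatorsToBr("first line\u2028second line"));
// first line<br>second line
```

This only handles LS. A PS (U+2029) would need its own rule, for example closing and reopening a <p> element.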
This is the solution for the 'strange symbol' issue.
$(document).ready(function () {
  // One pass over the whole body is enough; there is no need to loop over its children.
  document.body.innerHTML = document.body.innerHTML.replace(/\u2028/g, ' ');
});
The jQuery/JS solutions here work to remove the character, but they broke my Revolution Slider. I ended up doing a search and replace for the character on the wp_posts table with the Better Search Replace plugin: https://wordpress.org/plugins/better-search-replace/
When you copy paste the character from a page to the plugin box, it is invisible, but it does work. Before doing DB replaces, always have a database (or full) backup ready! And be sure to uncheck the bottom checkbox to not do a dry run with the plugin.

How to display ASCII 26 (control characters) in HTML

We have a record in a SQL database which contains an ASCII 26 character:
SELECT char(26)
It looks like an arrow, which we can see in the Eclipse debugger. However, when we try to output it to the HTML front end, that character is just skipped. What's stranger is that the arrow does appear in the page source.
It seems 26 is a control character. So is it possible to display the arrow in HTML? Why can some places, like the debugging window of Eclipse, show it fine?
It's a control character, unprintable by definition. Some character sets (or fonts, not sure which determines that) do print control characters; Unicode is not one of them. See Browser Test Page for Unicode Character 'SUBSTITUTE' (U+001A).
Decide what you actually want to display, and replace this character with an actually printable Unicode character.
You could for example use &rarr; or &#8594;, the Unicode Character 'RIGHTWARDS ARROW' (U+2192).
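As a rough sketch of that substitution, assuming the database value reaches the front end as a string. The function name replaceSubstitute is invented for illustration.

```javascript
// Hypothetical helper: replace the unprintable SUBSTITUTE control character
// (ASCII 26, U+001A) with the printable RIGHTWARDS ARROW (U+2192).
function replaceSubstitute(text) {
  return text.replace(/\u001A/g, "\u2192");
}

console.log(replaceSubstitute("A\u001AB") === "A\u2192B"); // true
```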

Use specific glyph name with no Unicode value in HTML?

How can I use for example the glyph name "rcaron.terminal" which has no Unicode value in HTML? or any other such case? Is it even possible? I think it must be surely but I got no clue. It's easy for regular letters like the glyph "ß" where I would just type "&#xDF" and get that character or "&#223" (same result) but for glyphs without any Unicode value I don't know what I'm supposed to do...? I've tried also "&rcaron.terminal" but nothing, where as something like "&hearts" would work giving a heart glyph of god knows what font, probably Arial I dunno.
Do I need to state some specific encoding aside from ANSI in my HTML document?
ie. < meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-8" > or something... like Im really lost lol
All I found on the net was this http://text-symbols.com/html/unicode/ but I cant find any more info so I came here.
Please help! Thanks! :)
There are no glyphs in HTML which do not have a Unicode name.
If you really need to have a glyph which is not representable using regular Unicode, you might want to create a font of your own and define the glyphs you need in the private use area; but obviously, then, your HTML will be impossible to use without that particular font.
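To illustrate the private use area idea: the sketch below assumes a custom font in which the glyph "rcaron.terminal" has been assigned to U+E000. That assignment, the PUA_GLYPHS table, and the glyphReference function are all hypothetical. The only fixed fact is that the Basic Multilingual Plane's private use area spans U+E000 to U+F8FF.

```javascript
// Hypothetical mapping from glyph names in a custom font to the
// private use area code points chosen for them when building the font.
var PUA_GLYPHS = { "rcaron.terminal": 0xE000 };

// Build the HTML character reference for a named glyph.
function glyphReference(name) {
  var codePoint = PUA_GLYPHS[name];
  if (codePoint === undefined) {
    throw new Error("No private use code point assigned for " + name);
  }
  return "&#x" + codePoint.toString(16).toUpperCase() + ";";
}

console.log(glyphReference("rcaron.terminal")); // &#xE000;
```

The reference only renders as the intended glyph where the custom font is loaded. With any other font it shows nothing useful.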
Background links:
http://arstechnica.com/information-technology/2008/10/embedded-web-fonts/
http://www.font-face.com/
Practical guides:
http://blog.fogcreek.com/trello-uses-an-icon-font-and-so-can-you/
http://blogs.atlassian.com/2013/07/how-to-make-an-icon-font-the-8-step-guide/
First navigate to this site: https://fontdrop.info/#/?darkmode=true
Upload the file with your font
Click on the Ligatures tab.
Every Glyph should have a Components field
copy the components for the character you want to use
paste that string into HTML
You don't need any & or #, it just detects the string and converts it.

What is this INSANE space character??? (google chrome)

This is driving me absolutely, !&&%&$ insane... it defies everything that I can think of.
THIS character right here... " "
In between these quotes... open Google Chrome and inspect. You will see it's a &nbsp; ... normal, right? Now right-click and actually view the source of this Stack Overflow page. It's a regular space... (also, the character I copied was an actual space).
I could understand if it's some kind of rich text editor or something, but in the raw html source is a regular space, so what gives?
Here's just with hitting the space key (which works fine)... " ".
You can even copy it and paste it everywhere and wreak havoc and make Chrome put &nbsp; everywhere. Even though what's copied in your clipboard is just a SPACE.
I have these stupid characters show up everywhere randomly in my website and I have no idea where they come from, or WHY is google converting a SPACE into a nbsp;
I have tried inspecting the actual character code and it's a regular space from all things I can find...
Every single method I try shows it as a NORMAL space... so what gives?
If i use ruby and do " ".ord I get 32. If i do it with the broken space I also get 32.
Please help me im losing my mind.
edit: you can prove this... view source on this page and you will see two empty " " like normal. Now look in the console and only the one will be a &nbsp;, yet the raw source is identical.
Image for people not using chrome (this is looking at this very post via chrome dev tools):
Here's the HTML of the same text you see when you view source... no nbsp to be found.
When I view this page's source in Internet Explorer, or download it directly from the server and view it in a text editor, the first space character in question is formatted like this in the actual HTML:
THIS character right here... "&nbsp;"
Notice the &nbsp; entity. That is Unicode codepoint U+00A0 NO-BREAK SPACE. Chrome is just being nice and re-formatting it as &nbsp; when inspecting the HTML. But make no mistake, it is a real non-breaking space, not Unicode codepoint U+0020 SPACE like you are expecting. U+00A0 is visually displayed the same as U+0020, but they are semantically different characters.
The second space character in question is formatted like this in the actual HTML:
<p>Here's just with hitting the space key (which works fine)... <code>" "</code>.</p>
So it is Unicode codepoint U+0020 and not U+00A0. Viewing the raw hex data of this page confirms that:
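A similar check can be done in code rather than in a hex viewer. The sketch below lists the code point of every character in a string. The function name describeCodePoints is an assumption for this example.

```javascript
// Hypothetical helper: list the Unicode code point of each character,
// so that look-alike whitespace can be told apart.
function describeCodePoints(text) {
  return Array.from(text, function (ch) {
    var hex = ch.codePointAt(0).toString(16).toUpperCase();
    return "U+" + hex.padStart(4, "0");
  });
}

// A regular space followed by a non-breaking space.
console.log(describeCodePoints(" \u00A0")); // [ 'U+0020', 'U+00A0' ]
```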
It turns out the two seemingly identical whitespace characters are not the same character.
Behold:
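var characters = ["a", "b", "c", "d", "\u00A0"]; // last entry is a non-breaking space
var typedSpace = " "; // U+0020, typed with the space bar
var copiedSpace = "\u00A0"; // U+00A0, the character copied from the question
alert("Typed: " + characters.indexOf(typedSpace)); // -1
alert("Copied: " + characters.indexOf(copiedSpace)); // 4
alert(typedSpace === copiedSpace); // false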
typedSpace.charCodeAt(0) returns 32, the classic space, whereas copiedSpace.charCodeAt(0) returns 160, the &#160; character, AKA &nbsp;.
The difference between the two is that a whole bunch of &nbsp; characters repeated one after another will hold their ground and create additional space between them, whereas a whole bunch of repeated regular space characters will squish together into one space.
For instance:
A&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;B results in: A       B
A       B (seven regular spaces) results in: A B
To replace the &nbsp; character with a regular space character in a string, try this:
.replace(new RegExp(String.fromCharCode(160),"g")," ");
To the people in the future like myself that had to debug this from a high level all the way down to the character codes, I salute you.
Don't get yer knickers in a knot. It's one of those special html characters that we old-school love because we was tort rite.
For many of us, we were taught that a sentence started with a capital letter and ended with a full-stop. But the next sentence is separated from this by TWO spaces.
Good-ol'-HTML doesn't like space(s). If you enter a string of words with 5 spaces between them (using an unintelligent editor like MS Notepad), then html shows it with single spaces.
SO, to get it looking like we old-farts like, we end a sentence with '.&NbSp; Next' This puts two spaces after the full-stop, and looks like '.  Next' rather than '. Next'.
Next point is that the real space (32) works as a linebreak, so that's good.
EXCEPT for we old-farts, who HATE to see our name split across a linebreak. That annoys us NO-END.
But, of course, that's where &NbSp; comes in handy again. If you enter 'John&NbSp;Brown', then the html thinks that's a single word, and it displays it just rite for we oldies.
How do these &NbSp; thingies get there? Well, good old Word (and I suspect many intelligent editors) see two spaces and output them as a non-breaking space followed by a normal space.
And when in Word, you can insert a non-breaking space between John and Brown by the key sequence alt-ctrl-space (sorry, you apple-users)
Lesson-over (with the exception that the term &NbSp; needs to be all lowercase - THIS viewer was even converting it)
It is a non-breaking space. &nbsp; is the entity used to represent a non-breaking space. It is essentially a standard space, the primary difference being that a browser should not break (or wrap) a line of text at the point that this &nbsp; occupies.
Most likely the character is being inserted by your HTML Editor. Could you give a more specific example in context?
This is not actually an answer to the question but instead a tool that can be used to detect this special white space in the html of the pages of a website so we can proceed to locate and remove it.
What the tool basically does is:
Fetches the content of a URL
Looks for occurrences of chr(194).chr(160) in the HTML contents
Replaces and highlights the occurrences with something more visible
This way you can actually know where the spaces are and edit your page properly to remove them.
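The detect-and-highlight step can be sketched in a few lines. This is not the tool's actual code: the function name highlightNbsp is invented, the fetching step is left out, and the input is assumed to be an already-decoded string, in which the byte pair chr(194).chr(160) appears as the single character U+00A0. The {NBSP} marker mirrors the hstring parameter in the example URL.

```javascript
// Hypothetical helper: replace each non-breaking space (U+00A0, which is
// the byte pair 0xC2 0xA0 in UTF-8) with a visible marker and count them.
function highlightNbsp(html, marker) {
  marker = marker || "{NBSP}";
  var count = 0;
  var highlighted = html.replace(/\u00A0/g, function () {
    count += 1;
    return marker;
  });
  return { count: count, html: highlighted };
}

var result = highlightNbsp("<p>one\u00A0two three\u00A0four</p>");
console.log(result.count); // 2
console.log(result.html);  // <p>one{NBSP}two three{NBSP}four</p>
```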
The online version of the tool can be found here:
http://tools.heavydots.com/nbsp-space-char-detect/
A working example can be seen with the URL of this question, which contains one occurrence:
http://tools.heavydots.com/nbsp-space-char-detect/?url=http%3A%2F%2Fstackoverflow.com%2Fquestions%2F26962323%2Fwhat-is-this-insane-space-character-google-chrome&highlight=1&hstring=%7BNBSP%7D
There's a Github repo available if someone wants the code to run it locally:
https://github.com/HeavyDots/nbsp-space-char-detect
Hope someone finds it useful, for any feedback there's a comments section on the tool's page.
Updated 5th of January 2017
At our company blog we just wrote a funny post about this annoying white space. You're invited to drop by and read it! :-)
http://heavydots.com/blog/when-the-white-space-became-a-beast
As the previous answers have mentioned, it's a non-breaking space (nbsp). On Macs, this character gets inserted when you accidentally press Alt + Space (most of the time, this happens when entering code that requires Alt for special characters, e.g. [ on a German keyboard layout).
To remap this key combination to a plain ol' SPACE character, you can change your default keybinding as suggested on Apple SE
For whitespace, press "Alt+0160", which is a &nbsp; character also.

Why do symbols like apostrophes and hyphens get replaced with black diamonds on my website?

A website I've made has a few problems... On one of the pages, wherever there's an apostrophe (') or a dash (-), the symbol gets replaced with a weird black diamond with a question mark in the center of it
Here's what I mean
It seems this is happening all over the site wherever these symbols appear. I've never seen this before, can anyone explain it to me?
Suggestions on how to fix it would also be greatly appreciated.
See http://test.rfinvestments.co.za/index.php?c=team for a clear look at the problem.
It's an encoding problem. You have to set the correct encoding in the HTML head via meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Replace "ISO-8859-1" with whatever your encoding is (e.g. 'UTF-8'). You must find out what encoding your HTML files are in. If you're on a Unix system, just type file file.html and it should show you the encoding. If this is not possible, you should be able to find out somewhere what encoding your editor produces.
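To see where the black diamond comes from, here is a small sketch that decodes bytes with a declared encoding. It assumes a runtime with the standard TextDecoder API (modern browsers, recent Node.js). The helper name decodeBytes and the sample bytes are made up for illustration.

```javascript
// Hypothetical helper: decode raw bytes using the given encoding label.
function decodeBytes(bytes, encoding) {
  return new TextDecoder(encoding).decode(new Uint8Array(bytes));
}

// "it's" saved as Windows-1252, where byte 0x92 is the curly apostrophe.
// 0x92 on its own is not valid UTF-8, so a UTF-8 decoder substitutes
// U+FFFD, the replacement character drawn as a black diamond.
var asUtf8 = decodeBytes([0x69, 0x74, 0x92, 0x73], "utf-8");
console.log(asUtf8 === "it\uFFFDs"); // true
```

Passing the label "windows-1252" instead should yield the curly apostrophe on runtimes that support that encoding, which is the effect of declaring the right charset in the meta tag.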
You need to change your text to 'Plain text' before pasting into the HTML document. This looks like an error I've had before by pasting straight from MS word.
MS Word and other rich text editors often place hidden or invalid chars into your code. Try using &mdash; for your dashes, or &rsquo; for apostrophes (etc), to eliminate the need for relying on your char encoding.
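As a rough sketch of the entity approach, the function below rewrites every non-ASCII character as a numeric character reference, so the output no longer depends on the declared encoding. The name escapeNonAscii is an assumption, and numeric references are used instead of named ones to keep the example short.

```javascript
// Hypothetical helper: replace each non-ASCII character with a numeric
// character reference such as &#8217; (right single quotation mark).
function escapeNonAscii(text) {
  return Array.from(text, function (ch) {
    var codePoint = ch.codePointAt(0);
    return codePoint > 127 ? "&#" + codePoint + ";" : ch;
  }).join("");
}

console.log(escapeNonAscii("a field\u2019s label \u2014 width"));
// a field&#8217;s label &#8212; width
```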
I had the same issue in my ASP.NET web application. I solved it by following this link.
I just replaced ' with the text &rsquo; as shown below, and my site now shows the apostrophe in the browser without the rectangle around it that the question describes.
Original text in html page
Click the Edit button to change a field's label, width and type-ahead options
Replace text in html page
Click the Edit button to change a field&rsquo;s label, width and type-ahead options
Look at your actual HTML code and check that the weird symbols are not originating there. This issue came up when I started coding in Notepad++ halfway after coding in Notepad. It seems to me that the older version of Notepad I was using may have used a different encoding from Notepad++'s UTF-8 encoding. After I transferred my code from Notepad to Notepad++, the apostrophes got replaced with weird symbols, so I simply had to remove the symbols from my Notepad++ code.
If you are editing HTML in Notepad you should use "Save As" and alter the default "Encoding:" selection at the bottom of the dialog to UTF-8.
you should also include-
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
This un-ambiguously sets the correct character set and informs the browser.
I experienced the same problem when I copied a text that has an apostrophe from a Word document to my HTML code.
To resolve the issue, all I did was delete the particular word in my HTML and type it directly, including the apostrophe. This action nullified the original copy and paste action and displayed the newly typed apostrophe correctly.
What I really don't understand with this kind of problem is that the html page I ran as a local file displayed perfectly in Chromium browser, but as soon as I uploaded it to my website, it produced this error.
Even stranger, it displayed perfectly in the Vivaldi browser whether displayed from the local or remote file.
Is this something to do with the way Chromium reads the character set? But why only with a remote file?
I fixed the problem by retyping the text in a simple text editor and making sure the single quote mark was the one I used.