How to remove  extra character in my Website? - html

I see this character in Firebug .
I don't know why this is happening, there's no such character in my code. For Firefox it's OK, but in IE everything breaks. I can't even search for this character in Google.
I saved my file with utf-8 encoding without bom.

This occurs when you include files that contains the char  (ZERO WIDTH NO-BREAK SPACE).
this character is a BOM and you can only see BOMs with low level text editors (such as DOS edit).
you can debug it seeing the Element inspector, and commenting the requires or includes to see what file contains the BOM.
then , you can copy the content of the file, change the Encoding to "encode in utf8", save and you can include it again. (notepad++) can save it
success.

 is probably a UTF-8 byte order mark in one of the files Joomla uses or includes (it's only a zero-width no break space if it occurs inside a file).
You could try this tool to check for byte order marks - they're a bit of a pain to search for by hand.

just save again your php file with notepad++

I also had the same problem.
I opened the file with notepad and saved it as UTF8.
problem solved.

Related

How to find all this kind of UNICODE � in multiple html pages (or, with notepad++)? [duplicate]

I have a bizarre problem: Somewhere in my HTML/PHP code there's a hidden, invisible character that I can't seem to get rid of. By copying it from Firebug and converting it I identified it as  or 'Zero width no-break space'. It shows up as non-empty text node in my website and is causing a serious layout problem.
The problem is, I can't get rid of it. I can't see it in my files even when turning Invisibles on (duh). I can't seem to find it, no search tool seems to pick up on it. I rewrote my code around where it could be, but it seems to be somewhere deeper in one of the framework files.
How can I find characters by charcode across files or something like that? I'm open to different tools, but they have to work on Mac OS X.
You don't get the character in the editor, because you can't find it in text editors. #FEFF or #FFFE are so-called byte-order marks. They are a Microsoft invention to tell in a Unicode file, in which order multi-byte characters are stored.
To get rid of it, tell your editor to save the file either as ANSI/ISO-8859 or as Unicode without BOM. If your editor can't do so, you'll either have to switch editors (sadly) or use some kind of truncation tool like, e.g., a hex editor that allows you to see how the file really looks.
On googling, it seems, that TextWrangler has a "UTF-8, no BOM" mode. Otherwise, if you're comfortable with the terminal, you can use Vim:
:set nobomb
and save the file. Presto!
The characters are always the very first in a text file. Editors with support for the BOM will not, as I mentioned, show it to you at all.
If you are using Textmate and the problem is in a UTF-8 file:
Open the file
File > Re-open with encoding > ISO-8859-1 (Latin1)
You should be able to see and remove the first character in file
File > Save
File > Re-open with encoding > UTF8
File > Save
It works for me every time.
It's a byte-order mark. Under Mac OS X: open terminal window, go to your sources and type:
grep -rn $'\xFEFF' *
It will show you the line numbers and filenames containing BOM.
In Notepad++, there is an option to show all characters. From the top menu:
View -> Show Symbol -> Show All Characters
I'm not a Mac user, but my general advice would be: when all else fails, use a hex editor. Very useful in such cases.
See "Comparison of hex editors" in WikiPedia.
I know it is a little late to answer to this question, but I am adding how to change encoding in Visual Studio, hope it will be helpfull for someone who will be reading this sometime:
Go to File -> Save (your filename) as...
And in File Explorer window, select small arrow next to the Save button -> click Save with Encoding...
Click Yes (on Do you want to replace existing file dialog)
And finally select e.g. Unicode (UTF-8 without signature) - that removes BOM

Chinese text encoding missing characters when viewed in web browser

I have a HTML file which contains Chinese text. When I open the file in any web browser, there are characters which appear to be missing.
Here's an example copied from the browser window:
本函旨在邀請您參�� 定於
I know for a fact that all other characters seen here are correct aside from the missing ones (confirmed by a native Chinese speaker).
In the HTML header, I have a tag which signifies the file contains UTF-8 encoded characters:
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
I've already tried some other charsets in this META tag, but so far it seems any encoding method I try aside from UTF-8 ends up looking worse.
I also considered the possibility that it is a font issue, so I installed 3 different traditional Chinese fonts on my system and forced Chrome to use them. None of them made any difference - missing characters were still present.
If I open the HTML file with Notepad++, here's what I can see:
http://i.imgur.com/GoS07WX.png
If I select and copy-paste this text into regular MS Notepad, I get this:
本函旨在邀請您參劦nbsp;定於
So you can see here that the "xE5 x8A" visible in Notepad++ seems to have been replaced by 劦.
Is there any reason why the browser would be showing �� instead of 劦 in this scenario?
Look again at the HTML file.
I see the first 2 bytes of a character encoded in UTF-8, followed by ... let's imagine there was originally a \xA0, and this was mutated to when the file was created by applying global substitutions to the UTF-8-encoded data.
However, \xE5\x8A\xA0 UTF-8 decodes to U+52A0 which is not the same as the alien character which is U+52A6 ... not close enough to an answer.

utf-8 / utf-16 conversion

When I design a html page in Dreamweaver CS6 I use its validation tool (it sends the code to w3c) and I get no errors. However, when I validate the same page in UltraEdit 21 (it uses HTML Tidy) I get the warning:
"Specified input encoding (utf-8) does not match actual input encoding (utf-16)"
The page is set as html5 (with <!doctype html>), as utf-8 (with <meta charset="utf-8">) and contains greek text.
Well, the question is:
Does that problem affect the appearance of the page? I mean, when I publish it, will a user in China, Germany, or ...Tierra del Fuego see the greek text?
If yes, the rest are less important, but I'll ask them:
What makes HTML Tidy to define the document as utf-16? Is there a character, word or visible string of any kind that I can remove/delete to correct the problem?
If I use <meta charset="utf-16"> will browsers parse the code correctly (ending to greek text for the global user)?
The actual file encoding will be set in Dreamweaver properties for the file.
Dreamweaver Help / Set title and encoding properties for a page:
The Title/Encoding Page Properties options let you specify the document encoding type that is specific to the language used to author your web pages as well as specify which Unicode Normalization Form to use with that encoding type.
Select Modify > Page Properties, or click the Page Properties button in the text Property inspector.
Choose the Title/Encoding category and set the options.
...
Encoding
Specifies the encoding used for characters in the document.
If you select Unicode (UTF‑8) as the document encoding, entity encoding is not necessary because UTF‑8 can safely represent all characters. If you select another document encoding, entity encoding may be necessary to represent certain characters. For more information on character entities, see www.w3.org/TR/REC-html40/sgml/entities.html.
...
Include Unicode Signature (BOM)
Includes a Byte Order Mark (BOM) in the document. A BOM is 2 to 4 bytes at the beginning of a text file that identifies a file as Unicode, and if so, the byte order of the following bytes. Because UTF‑8 has no byte order, adding a UTF‑8 BOM is optional. For UTF‑16 and UTF‑32, it is required.
Choose UTF-8 without BOM.
UltraEdit automatically detects encoding of a file on opening and displays it at bottom in status bar. See in UltraEdit Advanced - Configuration - File Handling - Unicode/UTF-8 Detection and press button Help for some more details.
UTF-16 is displayed for a file encoded in UTF-16 Little Endian with or without BOM on using standard status bar since UE v19.00. Clicking on this list box in status bar and selecting Unicode - UTF-8 results in converting the file from UTF-16 LE to UTF-8 which then matches with the character set declaration in head of your HTML5 file.
When using basic status bar in UE v19.00 or any later version or using any UltraEdit version prior v19.00, the status bar field right to the field with line, column and clipboard number starts with U- for a file with UTF-16 LE encoding.
The UltraEdit help page about the Status Bar contains more information about information shown in standard and basic status bar in UltraEdit.
Conversion to UTF-8 can be done with UltraEdit also with command UNICODE/UTF-8 to UTF-8 (Unicode Editing) in submenu Conversions in menu File.
There are 2 configuration settings at Advanced - Configuration - File Handling - Save which define saving a UTF-8 encoded file with or without byte order mark (BOM):
Write UTF-8 BOM header to all UTF-8 files when saved
Write UTF-8 BOM on new files created within this program (if above is not set)
As UTF-8 encoded HTML files should be always without BOM, it is better to have both UTF-8 BOM settings unchecked when using UltraEdit mainly for editing HTML files.
Another possibility to convert a file with UltraEdit is using command Save As from menu File and use appropriate Encoding / Format setting. UTF-8 in Save As dialog means saving the file as UTF-8 encoded file with BOM and UTF-8 - NO BOM without BOM independent on the two configuration settings for standard Save.
For converting all files in a single folder, a folder tree, opened in UltraEdit, etc. to UTF-8 using UltraEdit, there is an UltraEdit scripting solution, see How to convert all files in a folder to UTF-8?
Unfortunately UE v21.30.0.1024 still does not recognize the short character set declaration <meta charset="utf-8"> as defined in HTML5 standard. See Short utf-8 charset declaration in HTML5 header with details about this limitation and how it can be worked around. This limitation does not matter if within first 64 KB at least one UTF-8 encoded character is found as it will be the case for your HTML5 files with Greek text.
HTML Tidy installed with UltraEdit v21.30.0.1024 is of version 25 March 2009. I'm not sure if HTML Tidy really supports short charset declaration of HTML5. But it looks so because otherwise you would not see the warning on validating the HTML5 file with HTML Tidy.
It might be useful for you to read UltraEdit power tip Unicode text and Unicode files in UltraEdit/UEStudio as it looks like you do not really know what encoding and character set really means and why it is important for applications that the declaration in the HTML5 matches with really used encoding.
I answer your questions now after all those general UltraEdit stuff.
Does that problem affect the appearance of the page?
Although the file contains the declaration that file contents is encoded with UTF-8, but is in real encoded with UTF-16 Little Endian, the browsers display the contents correct. UTF-16 detection is very easy, especially with BOM present and therefore browsers ignore wrong declaration and interpret the bytes of the HTML file from beginning right as UTF-16 encoded text file.
However, it would be much better to convert the UTF-16 encoded HTML files to UTF-8 without BOM. UTF-8 without BOM is most commonly used for HTML files worldwide and then the character set declaration in head of your HTML file would also match with really used encoding.
What makes HTML Tidy to define the document as utf-16?
The really used encoding of your HTML file is UTF-16 Little Endian and UltraEdit, HTML Tidy and the browsers detect that already after reading in the first 2 bytes of the text file - the byte order mark. That's the reason why HTML Tidy suggests to declare the encoding in head of HTML file correct as utf-16 as the file is really encoded with.
If I use <meta charset="utf-16"> will browsers parse the code correctly?
In case of keeping the file encoded in UTF-16 LE (always 2 bytes per character), it would be better to declare the character set right with <meta charset="utf-16">. But no Unicode aware text editor or browser has a problem to automatically detect UTF-16 Little Endian encoding with byte order mark.
The character set declaration becomes very important mainly for UTF-8 encoded files (1, 2, 3 or even 4 bytes per character) or files with single-byte coded characters using a code page like Windows-1252 / ISO 8859-1 (Latin 1) or Windows-1253 / ISO 8859-7 (Latin/Greek).

Why do symbols like apostrophes and hyphens get replaced with black diamonds on my website?

A website I've made has a few problems... On one of the pages, wherever there's an apostrophe (') or a dash (-), the symbol gets replaced with a weird black diamond with a question mark in the center of it
Here's what I mean
It seems this is happening all over the site wherever these symbols appear. I've never seen this before, can anyone explain it to me?
Suggestions on how to fix it would also be greatly appreciated.
See http://test.rfinvestments.co.za/index.php?c=team for a clear look at the problem.
It's an encoding problem. You have to set the correct encoding in the HTML head via meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Replace "ISO-8859-1" with whatever your encoding is (e.g. 'UTF-8'). You must find out what encoding your HTML files are. If you're on an Unix system, just type file file.html and it should show you the encoding. If this is not possible, you should be able to find out somewhere what encoding your editor produces.
You need to change your text to 'Plain text' before pasting into the HTML document. This looks like an error I've had before by pasting straight from MS word.
MS word and other rich text editors often place hidden or invalid chars into your code. Try using — for your dashes, or ’ for apostrophes (etc), to eliminate the need for relying on your char encoding.
I have the same issue in my asp.net web application. I solved by this link
I just replace ' with ’ text like below and my site in browser show apostrophe without rectangle around as in question ask.
Original text in html page
Click the Edit button to change a field's label, width and type-ahead options
Replace text in html page
Click the Edit button to change a field’s label, width and type-ahead options
Look at your actual html code and check that the weird symbols are not originating there. This issue came up when I started coding in Notepad++ halfway after coding in Notepad. It seems to me that the older version of Notepad I was using may have used different encoding to Notepad's++ UTF-8 encoding. After I transferred my code from Notepad to Notepad++, the apostrophes got replaced with weird symbols, so I simply had to remove the symbols from my Notepad++ code.
If you are editing HTML in Notepad you should use "Save As" and alter the default "Encoding:" selection at the botom of the dialog to UTF-8.
you should also include-
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
This un-ambiguously sets the correct character set and informs the browser.
I experienced the same problem when I copied a text that has an apostrophe from a Word document to my HTML code.
To resolve the issue, all I did was deleted the particular word in my HTML and typed it directly, including the apostrophe. This action nullified the original copy and paste acton and displayed the newly typed apostrophe correctly
What I really don't understand with this kind of problem is that the html page I ran as a local file displayed perfectly in Chromium browser, but as soon as I uploaded it to my website, it produced this error.
Even stranger, it displayed perfectly in the Vivaldi browser whether displayed from the local or remote file.
Is this something to do with the way Chromium reads the character set? But why only with a remote file?
I fixed the problem by retyping the text in a simple text editor and making sure the single quote mark was the one I used.

HTML encoding issues - "Â" character showing up instead of " "

I've got a legacy app just starting to misbehave, for whatever reason I'm not sure. It generates a bunch of HTML that gets turned into PDF reports by ActivePDF.
The process works like this:
Pull an HTML template from a DB with tokens in it to be replaced (e.g. "~CompanyName~", "~CustomerName~", etc.)
Replace the tokens with real data
Tidy the HTML with a simple regex function that property formats HTML tag attribute values (ensures quotation marks, etc, since ActivePDF's rendering engine hates anything but single quotes around attribute values)
Send off the HTML to a web service that creates the PDF.
Somewhere in that mess, the non-breaking spaces from the HTML template (the s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character when viewing the document in a browser (FireFox). ActivePDF pukes on these non-UTF8 characters.
My question: since I don't know where the problem stems from and don't have time to investigate it, is there an easy way to re-encode or find-and-replace the bad characters? I've tried sending it through this little function I threw together, but it turns it all into gobbledegook doesn't change anything.
Private Shared Function ConvertToUTF8(ByVal html As String) As String
Dim isoEncoding As Encoding = Encoding.GetEncoding("iso-8859-1")
Dim source As Byte() = isoEncoding.GetBytes(html)
Return Encoding.UTF8.GetString(Encoding.Convert(isoEncoding, Encoding.UTF8, source))
End Function
Any ideas?
EDIT:
I'm getting by with this for now, though it hardly seems like a good solution:
Private Shared Function ReplaceNonASCIIChars(ByVal html As String) As String
Return Regex.Replace(html, "[^\u0000-\u007F]", " ")
End Function
Somewhere in that mess, the non-breaking spaces from the HTML template (the s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character
That'd be encoding to UTF-8 then, not ISO-8859-1. The non-breaking space character is byte 0xA0 in ISO-8859-1; when encoded to UTF-8 it'd be 0xC2,0xA0, which, if you (incorrectly) view it as ISO-8859-1 comes out as " ". That includes a trailing nbsp which you might not be noticing; if that byte isn't there, then something else has mauled your document and we need to see further up to find out what.
What's the regexp, how does the templating work? There would seem to be a proper HTML parser involved somewhere if your strings are (correctly) being turned into U+00A0 NON-BREAKING SPACE characters. If so, you could just process your template natively in the DOM, and ask it to serialise using the ASCII encoding to keep non-ASCII characters as character references. That would also stop you having to do regex post-processing on the HTML itself, which is always a highly dodgy business.
Well anyway, for now you can add one of the following to your document's <head> and see if that makes it look right in the browser:
for HTML4: <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
for HTML5: <meta charset="utf-8">
If you've done that, then any remaining problem is ActivePDF's fault.
If any one had the same problem as me and the charset was already correct, simply do this:
Copy all the code inside the .html file.
Open notepad (or any basic text editor) and paste the code.
Go "File -> Save As"
Enter you file name "example.html" (Select "Save as type: All Files (.)")
Select Encoding as UTF-8
Hit Save and you can now delete your old .html file and the encoding should be fixed
Problem:
Even I was facing the problem where we were sending '£' with some string in POST request to CRM System, but when we were doing the GET call from CRM , it was returning '£' with some string content. So what we have analysed is that '£' was getting converted to '£'.
Analysis:
The glitch which we have found after doing research is that in POST call we have set HttpWebRequest ContentType as "text/xml" while in GET Call it was "text/xml; charset:utf-8".
Solution:
So as the part of solution we have included the charset:utf-8 in POST request and it works.
In my case this (a with caret) occurred in code I generated from visual studio using my own tool for generating code. It was easy to solve:
Select single spaces ( ) in the document. You should be able to see lots of single spaces that are looking different from the other single spaces, they are not selected. Select these other single spaces - they are the ones responsible for the unwanted characters in the browser. Go to Find and Replace with single space ( ). Done.
PS: It's easier to see all similar characters when you place the cursor on one or if you select it in VS2017+; I hope other IDEs may have similar features
In my case I was getting latin cross sign instead of nbsp, even that a page was correctly encoded into the UTF-8. Nothing of above helped in resolving the issue and I tried all.
In the end changing font for IE (with browser specific css) helped, I was using Helvetica-Nue as a body font changing to the Arial resolved the issue .
I was having the same sort of problem. Apparently it's simply because PHP doesn't recognise utf-8.
I was tearing my hair out at first when a '£' sign kept showing up as '£', despite it appearing ok in DreamWeaver. Eventually I remembered I had been having problems with links relative to the index file, when the pages, if viewed directly would work with slideshows, but not when used with an include (but that's beside the point. Anyway I wondered if this might be a similar problem, so instead of putting into the page that I was having problems with, I simply put it into the index.php file - problem fixed throughout.
The reason for this is PHP doesn't recognise utf-8.
Here you can check it for all Special Characters in HTML
http://www.degraeve.com/reference/specialcharacters.php
Well I got this Issue too in my few websites and all i need to do is customize the content fetler for HTML entites. before that more i delete them more i got, so just change you html fiter or parsing function for the page and it worked. Its mainly due to HTML editors in most of CMSs. the way they store parse the data caused this issue (In My case). May this would Help in your case too