Apostrophes converting to periods in HTML - html

I have a client using a CMS for a site. When they enter apostrophes, they render as periods within the HTML. I've checked the raw source, and an apostrophe (' - not a MS Word curly "smart" apostrophe) is indeed there but it renders as a period.
I've gone into the database and manually entered apostrophes thinking perhaps it was the CMS, but the problem persists. I've seen the "diamond question mark" unrecognizable character appear before, but never this... For example, the word "they're" displays as "they.re"
Any ideas? I thought it could be an encoding issue but I have
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
in place.
Any help appreciated!

As a first workaround, you could tell the content providers to use the “smart” apostrophe and, for single quotation marks, ‘smart’ single quotes (assuming they work OK; check that first, of course). After all, the ASCII "straight" apostrophe should only be used in programming and comparable contexts, not in normal human-language content.
It sounds like a CMS oddity, but first check that the data sent by the server actually contains “.” U+002E and not something else that browsers merely render as a period. Then you could submit a bug report to the CMS provider. It might be a good idea to test the entire ASCII range of characters, and why not all of Windows Latin 1 (using a page that contains them all and checking that they render correctly, with the usual < and & precautions).
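One way to check what character is really in the data is to list the code points of the suspicious text. The following is only a minimal sketch; the helper name `describe` and the sample strings are made up for illustration, and in practice you would feed it the text copied from the server response.

```python
import unicodedata

def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# Hypothetical samples: what was typed versus what seems to be displayed.
typed = describe("they're")
displayed = describe("they.re")

for ch, code_point, name in typed:
    print(ch, code_point, name)
```

If the output for the real page data shows something other than U+002E where the period appears, the problem is in rendering (for example a font) rather than in the stored text.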

Related

using european characters in html

I started learning HTML + CSS a week or two ago, and I'm facing a problem. I'm European, so I need to use special characters like á, ã, ç, etc. a lot. Is there a way to do that without using the corresponding code for each letter every time I need one? For example, something I can put at the beginning of the HTML document that would make all the special characters accepted.
Decide which encoding you want to use for your site; if you don't have any preference, use UTF-8.
Save the .html file in that encoding in your text editor. Consult the help of your specific text editor how to choose which encoding the file gets saved in.
Add <meta charset="utf-8"> to your <head> to instruct the browser to treat the page as UTF-8 encoded.
Preferably also configure your web server to output a Content-Type: text/html; charset=utf-8 HTTP header, since that takes precedence if present. Consult the manual of your web server how to do that.
Write literally any character you can input directly as is into your document and enjoy.
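The steps above can be sketched with a small script that saves a page as UTF-8 and confirms the bytes on disk. This is only an illustration: the file name `index.html`, the page content, and the helper `write_utf8_page` are assumptions, and a text editor's "save as UTF-8" option does the same job.

```python
import tempfile
from pathlib import Path

# Hypothetical page: the meta tag declares UTF-8, and the special
# characters are written literally instead of as entities.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Special characters</title>
</head>
<body><p>á, ã, ç</p></body>
</html>
"""

def write_utf8_page(path: Path, html: str = PAGE) -> bytes:
    """Save the page as UTF-8 and return the raw bytes that were written."""
    path.write_text(html, encoding="utf-8")
    return path.read_bytes()

with tempfile.TemporaryDirectory() as tmp:
    raw = write_utf8_page(Path(tmp) / "index.html")

# In UTF-8, "ç" is stored as the two bytes C3 A7.
print(b"\xc3\xa7" in raw)
```

The declared encoding and the encoding actually used to save the file have to agree; the meta tag alone does not convert anything.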
Further reading:
https://www.w3.org/International/tutorials/tutorial-char-enc/
Handling Unicode Front To Back In A Web App
What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text
UTF-8 all the way through

Issue with apostrophes in html and pound symbols

I have a problem: some email and web designs I receive have ’ instead of ' in the text. This creates problems with rendering on some email clients, and it's difficult to catch them all manually.
Is there any type of software or online script that converts these symbols (along with the £ sign) to HTML compatible text? Would notepad or anything work?
You'll need to convert your text to html characters before putting it into your email html. This is a common issue when you import from MS Word, as it uses characters like curly quotes, hellips and mdashes that need converting first.
There are a whole bunch of converters out there, here are 3:
Email on Acid
Web2Generators
Charset
Here is an example of something written in MS Word:
“Hello?” he said to ‘it’. Wait – I’m not finished…
This converts to this:
&ldquo;Hello?&rdquo; he said to &lsquo;it&rsquo;. Wait &ndash; I&rsquo;m not finished&hellip;
You should use the converted version in your email, or you could be lazy and just replace all instances of curly quotes with straight ones in your code. The grammar is not technically accurate, but most people will not mind.
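If you would rather script the conversion than paste text into an online tool, something along these lines may work. It is a minimal sketch, not one of the converters listed above: the function name `to_html_entities` is made up, and it leaves ASCII untouched (so it does not escape `<` or `&`).

```python
from html.entities import codepoint2name

def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # plain ASCII stays as it is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &rsquo;
        else:
            out.append(f"&#{cp};")                 # numeric fallback
    return "".join(out)

converted = to_html_entities("“Hello?” he said to ‘it’. Wait – I’m not finished…")
price = to_html_entities("£5")
print(converted)
print(price)
```

The numeric fallback covers characters that have no named entity, so the output is always pure ASCII.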

Multilingual site - character encoding

I know this problem is almost as old as the world and thousands of answers exist on the web, but I still cannot find what the problem is in my case and why characters show up as black question marks (�) :(
We have a multilingual site that currently supports 10 languages. Some characters are displayed incorrectly (ве��сией, 联合国���际). It can happen with regular characters in non Latin languages, and in other words on same page, the same characters are displayed correctly. In Latin languages, all special and regular characters are displayed correctly.
I tried to play with the encoding, but when it fixes the problem in one place, the problem appears in another.
Here is how my encodings are configured:
1) In MS SQL Server, we use NVARCHAR(MAX) column with SQL_Latin1_General_CP1_CI_AS collation.
2) In web application, in web.config file I have: <globalization requestEncoding="utf-8" responseEncoding="utf-8" />.
3) On page itself, we have <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />.
In response headers, Chrome shows: Content-Type:text/html; charset=utf-8.
What am I missing? Why do I still see those black question marks? What should I check or change in order to display all characters correctly?
Thanks
UPDATE
I found the problem, and it is not related to transport encoding at all. I thought the problem was in how the text passes from DB -> ASP.NET -> browser, but after a lot of debugging I found that it is in how the output is written to HttpContext.Current.Response.Filter. We have a custom filter, and the buffer (byte[]) passed to the filter's Write method sometimes contains a corrupted byte array for the Unicode string, so the last character of the string, in bytes, is translated as gibberish. I have not yet found how to solve it correctly, but for now I can disable our filter and the question marks are gone.
Thanks to all.
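The update does not say exactly how the filter corrupts the buffer. One plausible cause, and it is only an assumption here, is that each byte[] chunk is decoded on its own, so a multi-byte UTF-8 character that straddles two chunks gets broken. The sketch below reproduces that effect in Python (not ASP.NET) and shows an incremental decoder that carries the partial character over to the next chunk; the chunk boundary is chosen by hand for the demonstration.

```python
import codecs

text = "версией"
data = text.encode("utf-8")      # every Cyrillic letter takes two bytes

# Hypothetical chunking: the split lands in the middle of the third letter.
chunks = [data[:5], data[5:]]

# Naive approach: decode each chunk independently.
naive = "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)

# Incremental approach: the decoder buffers incomplete byte sequences.
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
safe = "".join(decoder.decode(chunk) for chunk in chunks)
safe += decoder.decode(b"", final=True)

print(naive)
print(safe)
```

In .NET the analogous tool would be a stateful decoder rather than a one-shot conversion per buffer, but whether that applies depends on what the custom filter actually does.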
I don't know about MS SQL server, but have you tried having it use UTF-8 encoding instead of latin-1? A quick Google search shows:
DEFAULT CHARACTER SET utf8;
DEFAULT COLLATE utf8_general_ci;
I would think that that would be a better option to use than SQL_Latin1_General_CP1_CI_AS.
If the page renders in a font which lacks those glyphs, they will be rendered with placeholders.
For example, on my phone, several of the examples you say are displaying correctly for you are shown to me with placeholders for some of the text.

HTML, XHTML validation error - can't resolve

I have been trying to validate my web page for the last two hours. I have only one error left before it validates successfully, but I keep getting a character decoding problem that I cannot get around.
The whole document is fine except it says...
Sorry, I am unable to validate this document because on line 77 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: utf8 "\x85" does not map to Unicode
The only thing on line 77 is some text inside some <p> tags, I have tried changing them to <a>, or <span> and taking the <p> away so it is just loose inside the div but the error only goes away when I delete the text inside the tags.
I am using the utf-8 encoding:
<meta http-equiv="Content-type" content="text/html;charset=utf-8" />
I am sorry if this is simple to resolve, my knowledge is extremely basic, I am only a first year computing student.
EDIT: the text inside the <p> tags is as follows:
<p>Our team thrives on the latest
political news as we do you. We work
around the clock to bring you the
latest, most important news as soon as
it happens. What do we ask in return…
nothing! This site is funded by us!
Your satisfaction is as much a pay
packet to us then a wad of untraceable
counterfeit notes.<br/><br/>
Sign up
to our newsletter to get regular
updates on news as soon as it happens
without having to navigate to our
site. For your security we only sell
the details you input to our site to
companies who “pinky promise” they
won’t be naughty with
them.<br/><br/>
StudentPolitics.Now
– Trading in satisfying others since
2011</p>
The problem is that the document only claims to be UTF-8 but isn't really.
Configure your editor to save in UTF-8 (the W3C has a guide for a number of them).
If you modify the HTML programmatically, then check that the program (and/or the database, if one is in play) isn't munging the data or storing non-UTF-8 data.
If that doesn't work, then try deleting the text and retyping it. You might have a zero width character that can't be represented properly in there.
Save your document in UTF-8. If it already is, try copying and pasting the source code into a new file and saving that as UTF-8 (sometimes the encoding can get stuck during edits in some programs).
What editor are you using?
EDIT: There are some non-ASCII characters in your text: … (three dots in a single character), “” (curly quotes), ’ (curly apostrophe), – (dash).
I guess you've copied your text from Word or a similar word processor; I get that often too. Either change those characters to their ASCII counterparts or HTML entities, or be sure to save the file with UTF-8 encoding.
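To locate the offending bytes without the validator, a short script can scan the file for bytes that are not valid UTF-8. This is a rough sketch: the function name `find_invalid_utf8` and the sample bytes are made up, and the sample assumes the file was saved as Windows-1252, where the single-character ellipsis is the byte 0x85 reported in the error message.

```python
def find_invalid_utf8(data: bytes) -> list[tuple[int, int, bytes]]:
    """Return (line number, byte offset in line, offending byte) triples."""
    problems = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break                              # rest of the line is valid
            except UnicodeDecodeError as err:
                bad = pos + err.start
                problems.append((lineno, bad, line[bad:bad + 1]))
                pos = bad + 1                      # keep scanning after it
    return problems

# Hypothetical sample: the ellipsis saved as the Windows-1252 byte 0x85.
prefix = b"What do we ask in return"
sample = prefix + b"\x85\nnothing!"

problems = find_invalid_utf8(sample)
as_cp1252 = b"\x85".decode("cp1252")
print(problems, as_cp1252)
```

For a real page you would pass in `open(path, "rb").read()`; decoding the reported byte as cp1252 shows which character the editor actually saved.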
The validator did not like the three full stops after the word "return". Three full stops in a row must have been turned into something else...
Thank you for all your help guys.
Sometimes when you query a database, the characters returned may not be encoded as UTF-8; in that case you should make sure that the values returned by the queries are valid UTF-8. Also, when taking a substring you can sometimes cut a character in half, for example Spanish accented letters and the ñ, so that the character is displayed incomplete.
For example check the source code in your browser

Can I replace %20 with &nbsp; in URLs that have spaces?

Within my HTML, can I use the character entity reference "&nbsp;" in place of "%20" in Web URLs?
They're both spaces, right?
The short answer is, they are both used to represent "spaces", but they represent different spaces.
%20 is the URL escaping for byte 32, which corresponds to plain old space in pretty much any encoding you're likely to use in a URL.
&nbsp; is an HTML character reference which actually refers to character 160 of Unicode (and also ISO-8859-1 aka Latin-1). It's a different space character entirely -- the "non-breaking space". Even though they look pretty much the same, they're different characters and it's unlikely that your server will treat them the same way.
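The difference is easy to see by decoding both forms. This is just an illustrative sketch using Python's standard library; the path `doc with spaces` is an arbitrary example.

```python
from html import unescape
from urllib.parse import quote, unquote

# %20 is URL escaping for the ordinary space, character 32.
plain_space = unquote("%20")

# &nbsp; is an HTML character reference for character 160.
nbsp = unescape("&nbsp;")

# Percent-encoding a path with ordinary spaces gives %20 ...
encoded_path = quote("doc with spaces")

# ... while a non-breaking space, encoded as UTF-8, becomes two bytes.
encoded_nbsp = quote(nbsp)

print(ord(plain_space), ord(nbsp))
print(encoded_path, encoded_nbsp)
```

So even if a non-breaking space did end up in a URL, it would be sent as a different byte sequence from %20, and the server would look for a different resource.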
No. Neither are spaces (technically). Both represent spaces in different ways though. Make every effort to NOT have spaces, or representatives of spaces, in your URLs. Many find it more elegant (me included) to replace spaces with _ or -
No. &nbsp; is an HTML non-breaking-space entity; this entity has no meaning when used in a filesystem or wherever else that a URL might point. URLs are not encoded in HTML.
No, not in the URLs. What you can do is replace spaces in the textual representation of the URL.
So instead of:
http://some.site/doc%20with%20spaces
you can have:
http://some.site/doc with spaces
%20 is what you get with URL encoding, so this is what you should use if you are going to use it in a URL.
&nbsp; is an HTML entity, which is what should be used for 'non breaking space' in an HTML document.
Most people try to avoid spaces in the filenames in their URLs altogether. They will give you a serious headache every time, so try to avoid them too.
If you want to have spaces in a URL you have to encode them with %20.
&nbsp; is used by the browser to know how to display the page. This information is only used for display. The %20 will be sent to the server that manages everything needed to transfer the web page to your visitors. The server doesn't speak HTML, so it would interpret &nbsp; as a normal part of the filename and search for a file literally called foo&nbsp;bar. This file will not be found. Much worse, the web server may think that the & begins the variable part of the URL, search only for the page foo, and then try to create a variable nbsp and a variable bar, but it won't see any values for them. All in all, the web server can't handle a URL with an &nbsp; in it.
Neither are spaces. You shouldn't be using spaces, but if for whatever reason you can't avoid it you should just be able to do...
<a href="WebSite/Web Page.aspx">Hey there</a>
...clicking on which will automatically navigate the user to
WebSite/Web%20Page.aspx