How to make latin extended work? - html

I've been googling for a while but can't figure out how to make letters like č, ć, ž, š, đ work. I tried adding <body lang="sr"> because the content actually is Serbian (sr = Serbian), but it doesn't work. I get PoÄetna instead of Početna.
I tried adding <meta charset="ISO-8859-2"> to the head section, but still nothing. What am I missing?

Pick a character encoding that supports the characters you want to use. ISO-8859-2 should do the job, but this isn't the 1990s any more. UTF-8 should be the default choice.
Ensure your editor is configured to save in that encoding.
Specify that you are using that encoding with document level meta data: <meta charset="utf-8">
Specify that you are using that encoding in your HTTP response (this takes priority over the document level): Content-Type: text/html;charset=UTF-8.
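To see why all of these steps have to agree, here is a minimal Python sketch that reproduces the garbled output from the question. It assumes (the question does not confirm this) that the file was saved as UTF-8 while the browser decoded it as ISO-8859-2; the variable names are illustrative only.

```python
# Assumption: the editor saved the page as UTF-8, but the declared charset is ISO-8859-2.
text = "Početna"
saved_bytes = text.encode("utf-8")  # the bytes the editor wrote to disk

# Decoding with the wrong charset splits "č" (two bytes in UTF-8) into two characters:
# "Ä" plus an invisible control character.
wrong = saved_bytes.decode("iso-8859-2")

# Decoding with the charset the file was really saved in restores the text.
right = saved_bytes.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

The mismatch, not the choice of characters, is what breaks the page: the same bytes read back fine once the declared encoding matches the saved one.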

Related

How do I display Unicode as text in HTML?

I can't find a way to do this. For example, I want ∞ (the infinity symbol) to display as text in an HTML document.
First, check which Content-Type header your server returns. Is it Content-Type: text/html; charset=UTF-8? See Character_encodings_in_HTML. If the server returns a charset, either use it or fix it, because it overrides the encoding declared in the document. (See also HTML entities.)
If your server does not provide a charset, then add one in the document, as early as possible (the declaration should fall entirely within the first 1024 bytes). Again, see Character_encodings_in_HTML. The following declaration should do:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
or for HTML 5:
<meta charset="utf-8">
or for XHTML (the first line):
<?xml version="1.0" encoding="UTF-8"?>
And if you do not or cannot use UTF-8 for your document, use HTML entities, as C Travel suggests.
You write the character, e.g. “∞”, in your authoring program, save the file as UTF-8 with BOM, and make sure that the fonts you have declared for the page, or for the relevant piece of text, contain the character(s) you have included. For more information, see my Guide to using special characters in HTML. If problems remain, please post the code you have tried and specify how it fails (and on which browsers).
You can use a numeric character reference of the form &#N;, where N is the character's code point (for example, &#8734; for ∞).
For codes: http://unicode-table.com/en/
You also have to save the file as UTF-8 and put a UTF-8 meta tag in the head, if you haven't already.
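As a rough illustration of the numeric-reference approach mentioned in these answers, the Python sketch below converts non-ASCII characters into decimal references and back. The helper name to_char_refs is made up for this example; the calls it relies on (the xmlcharrefreplace error handler and html.unescape) are part of the standard library.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference such as &#8734;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


encoded = to_char_refs("∞")       # safe to embed in a page of any ASCII-compatible encoding
decoded = html.unescape(encoded)  # what a browser would turn the reference back into

print(encoded)
print(decoded)
```

The output of to_char_refs is pure ASCII, so it survives even when the page encoding is declared incorrectly.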

Why does a diamond with a question mark in it � appear in my HTML?

I have an unordered list, and � often (but not always!) appears where I have two spaces between characters. What is causing this, and how do I prevent it?
This specific character � is usually the sign of an invalid (non-UTF-8) character showing up in an output (like a page) that has been declared to be UTF-8. It happens often when
a database connection is not UTF-8 encoded (even if the tables are)
an HTML or script source file is stored in the wrong encoding (e.g. Windows-1252 instead of UTF-8) - make sure it's saved as a UTF-8 file. The setting is often in the "Save as..." dialog.
an online source (like a widget or an RSS feed) is fetched that isn't serving UTF-8
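One plausible explanation for the "two spaces" symptom is a non-breaking space saved in a legacy encoding. The Python sketch below shows how that produces the diamond; it assumes the source was stored as Windows-1252 and served as UTF-8, which the question does not confirm.

```python
# Assumption: the text contains a non-breaking space and was saved as Windows-1252.
source_text = "item\u00a0one"
stored_bytes = source_text.encode("cp1252")  # the NBSP becomes the single byte 0xA0

# A UTF-8 decoder cannot interpret a lone 0xA0, so it substitutes U+FFFD,
# which browsers draw as a diamond with a question mark.
shown = stored_bytes.decode("utf-8", errors="replace")

print(repr(shown))
print("\ufffd" in shown)
```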
I had the same issue. You can fix it by adding the following line to your template:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
It's a character-set issue. Get a tool that inspects the response headers of the server (like the Firebug extension if you're using Mozilla Firefox) to see what character set the server response is sending with the content. If the server's character-set and the HTML character set of the actual content don't match up, you will see some strange looking characters like those little black diamond squares.
I had the same issue when getting an HTML output from an XSLT. Along with Pradip's solution I was also able to resolve the issue using UTF-32.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-32" />

HTML character sets & MySQL character sets

Which HTML character set would cover all these? Which character set do I need in MySQL to export and then import them?
SAINT RAPHAEL ARNÁIZ BARÓN (Spanish)
St Thérèse of the Child Jesus, Virgin, Doctor (French)
M. Orsola (Giulia) Ledóchowska, Religious (Eastern European)
In MySQL, use the UTF-8 character set. This will allow you to represent a very wide variety of data appropriately in your DBMS. If you use your MySQL collation settings correctly, MySQL will collate (sort) your data nicely as well.
To render this stuff into HTML, you probably need to entitize characters other than the basic 7-bit ASCII ones. For example, look at this web page describing the Unicode character for uppercase Ñ: http://www.fileformat.info/info/unicode/char/00D1/index.htm
In HTML this is represented by &#xD1; (ampersand, pound sign, x, D, 1, semicolon).
Your web app language (PHP? Java?) has functions built in to convert between UTF-8 strings (to stash in the DBMS) and entitized html (for display on the web). Use them.
Use MySQL's UTF-8 character set for your tables and columns, and send a SET NAMES UTF8 statement after initialising the MySQL connection in your scripting language of choice. Ensure your script also sends a HTTP header indicating that your page is in UTF-8, and you should be good to go. You may want to read this, and the links for further reading look good too.
In PHP, to send this HTTP header, you would use
header("Content-Type: text/html; charset=UTF-8");
At the top of the <head> element in your HTML page, you can also add <meta charset="UTF-8"> (in HTML5), or <meta http-equiv="Content-type" content="text/html;charset=UTF-8"> (in HTML 4.01 or HTML5; but you can't use both and still get valid HTML5).
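For comparison with the PHP call, here is a minimal Python WSGI sketch that sends the same Content-Type header. It is only an illustration, not something from the answers: the names PAGE and app are hypothetical, and the page content simply reuses names from the question.

```python
# Hypothetical page; the meta tag and the HTTP header both declare UTF-8.
PAGE = (
    "<!DOCTYPE html>\n"
    '<html><head><meta charset="utf-8"><title>Demo</title></head>'
    "<body>Thérèse, Ledóchowska, ARNÁIZ BARÓN</body></html>"
)


def app(environ, start_response):
    """Minimal WSGI application that declares UTF-8 in the HTTP header."""
    body = PAGE.encode("utf-8")  # the bytes must really be UTF-8, not just labeled so
    headers = [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

# To try it locally, the standard library can serve it, e.g.:
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, app).serve_forever()
```

The point of the sketch is that the header and the actual byte encoding are set in the same place, which makes it harder for them to drift apart.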

Characters not displaying correctly in different browsers

I used certain characters on my website, such as • — “ ” ‘ ’ º ©.
When I tested what the site looked like in different browsers (BrowserLab), I found that these characters were replaced with �.
I then changed the charset in the webpage header from:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
to
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Suddenly all the pages have the above-mentioned characters replaced with a ?.
Even more puzzling, this is not consistent across pages or even within the same page, as some sections display • and © correctly.
In particular, I need to replace • with a character that will display across browsers. Can anyone help? Thanks.
You should save your HTML source as UTF-8.
Alternatively, you can use HTML entities instead.
The source code needs to be saved in the same encoding as you're instructing the browser to parse it in. If you're saving your files in UTF-8, instruct the browser to parse it as UTF-8 by setting an appropriate HTTP header or HTML meta tag (headers preferable, your web server may be setting one without you knowing). Use a decent editor that clearly tells you what encoding you're saving the file as. If it doesn't display correctly, there's a discrepancy between what you're telling your browser the file is encoded in and what it's really encoded in.
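If your editor does not tell you which encoding a file is in, a quick check is possible from a script. The following Python sketch tests whether some bytes are valid UTF-8 and, if not, re-encodes them on the assumption that they are Windows-1252 (as in the question). Both function names are invented for this example, and the fallback encoding is a guess you would need to adjust.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def resave_as_utf8(data: bytes, assumed_encoding: str = "cp1252") -> bytes:
    """Return UTF-8 bytes; non-UTF-8 input is assumed to be in `assumed_encoding`."""
    if is_valid_utf8(data):
        return data
    # Raises UnicodeDecodeError if the bytes are not valid in the assumed encoding either.
    return data.decode(assumed_encoding).encode("utf-8")


legacy = "• © “quoted”".encode("cp1252")  # what a Windows-1252 editor would have written
fixed = resave_as_utf8(legacy)

print(is_valid_utf8(legacy), is_valid_utf8(fixed))
```

Note that pure ASCII content passes the UTF-8 check as well, which is why some pages or sections can look fine while others break.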
Check to see if Apache is set up to send the charset. Look for the directive "AddDefaultCharset" and set it to Off in .htaccess or your config file.
Most/all browsers will take what is sent in the HTTP headers over what is in the document.
If you're using Notepad++, I suggest using the EditPlus editor instead to copy the text (which has the special characters) and paste it into your file. This should work.
Yes, I had this problem too in Notepad++: copying and pasting wasn't working with some symbols.
I think SLaks is right.
The HTML entity for the copyright symbol is &#169;

UTF-8 html without BOM displays strange characters

I have some HTML which contains some foreign characters (€, ó, á). The HTML document is saved as UTF-8 without BOM. When I view the page in the browser, the foreign characters get replaced with strange character combinations (â‚¬, Ã³, Ã¡). Only when I save my HTML document as UTF-8 with BOM do the characters display properly.
I'd really rather not include a BOM in my files. Does anybody have any idea why this happens, and a way to fix it (other than including a BOM)?
You are probably not specifying the correct character set in your HTML file. The BOM (thanks @Jukka) sends the browser into UTF-8 mode; in its absence, you need to use other means to declare the document as UTF-8.
If you have access to your server configuration, you may want to make sure the server isn't sending the wrong character set info. See e.g. How to change the default encoding to UTF-8 for Apache?
If you have access only to your HTML, adding this meta tag in your document's head should do the trick:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
or, as @Mathias points out, the newer HTML5 form:
<meta charset="utf-8">
(valid only if you use a HTML 5 doctype, against which there is no good argument any more even if you don't use HTML 5 markup.)
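To check what your editor actually wrote, you can inspect the first bytes of the file. The Python sketch below detects and strips a UTF-8 BOM, then shows what happens when the BOM-less bytes are decoded with a Windows-1252 fallback, which is an assumption about what the browser did in the question. The helper names are hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


with_bom = codecs.BOM_UTF8 + "€, ó, á".encode("utf-8")
without_bom = strip_utf8_bom(with_bom)

# With no BOM and no charset declaration, a Windows-1252 fallback garbles the text.
garbled = without_bom.decode("cp1252")
# With a correct declaration, the same bytes decode fine without any BOM.
correct = without_bom.decode("utf-8")

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))
print(garbled)
print(correct)
```

The bytes of the text itself are identical in both files; the BOM only acts as an in-band encoding declaration, so a meta tag or HTTP header can replace it.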
Insert <meta charset="utf-8"> in <head>.
Or set the header Content-Type: text/html;charset=utf-8 on the server-side.
You can also add this to .htaccess: AddDefaultCharset UTF-8 (more info here: http://www.askapache.com/htaccess/setting-charset-in-htaccess.html)