Creating a web page in a non-English language - HTML

I want to use a non-English language (e.g. Bengali) on my website. I am using the following tags, which are not working.
My project encoding is windows-1252. I am using NetBeans 7.0 with the font Arial Unicode MS.
Is it mandatory to change my project encoding to UTF-8, or are there other ways?
Please help.
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<table >
<tr >
<td >বাঙালি </td>
</tr>
</table>

Is it mandatory to change my project encoding to UTF-8, or are there other ways?
You can use any encoding that supports the characters you need (windows-1252 won't do the job for you, since it is designed for the Latin alphabet). Support for Unicode (which covers just about every language) has been excellent since around the turn of the century. You should have been using it for all new projects for the last decade and a half.
Other than changing to an encoding that supports the characters you need, you can use HTML character references. Typically this will be:
Decimal numeric character reference
The ampersand must be followed by a "#" (U+0023) character, followed by one or more ASCII digits, representing a base-ten integer that corresponds to a Unicode code point that is allowed according to the definition below. The digits must then be followed by a ";" (U+003B) character.
Since character references are written entirely in ASCII characters, you can use them in any ASCII-compatible encoding, including windows-1252.
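A quick way to produce decimal references, sketched in Python (not part of the original answer; the helper name to_decimal_ncr is hypothetical), is the standard codec error handler "xmlcharrefreplace", which emits exactly the &#NNNN; form quoted above:

```python
import html

def to_decimal_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # The "xmlcharrefreplace" error handler writes &#NNNN; (base ten) for any
    # character the target codec (here ASCII) cannot represent.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

bengali = "বাঙালি"
encoded = to_decimal_ncr(bengali)
print(encoded)                            # pure ASCII, safe in a windows-1252 file
print(html.unescape(encoded) == bengali)  # True: the browser sees the same text
```

The resulting ASCII text can be pasted into a windows-1252 page unchanged, although switching the project to UTF-8 remains the simpler long-term fix.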

Related

How to make Latin Extended characters work?

I've been googling for a while but can't figure out how to make letters like č, ć, ž, š, đ work. I tried adding <body lang="sr"> because it actually is Serbian (sr = Serbian), but it doesn't work. I get PoÄetna instead of Početna.
I tried adding <meta charset="ISO-8859-2"> to the head section, but still nothing. What am I missing?
Pick a character encoding that supports the characters you want to use. ISO-8859-2 should do the job, but this isn't the 1990s any more. UTF-8 should be the default choice.
Ensure your editor is configured to save in that encoding.
Specify that you are using that encoding with document level meta data: <meta charset="utf-8">
Specify that you are using that encoding in your HTTP response (this takes priority over the document level): Content-Type: text/html;charset=UTF-8.
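To see why the declared encoding, the saved bytes and the HTTP header have to agree, here is a small Python sketch (an illustration added here, not from the answer) that encodes "Početna" both ways and reproduces the garbled form. Browsers treat an ISO-8859-1 label as windows-1252; Python's latin-1 codec is used below as a stand-in because its cp1252 codec rejects the unassigned byte 0x8D.

```python
word = "Početna"

# In UTF-8, č (U+010D) needs two bytes: 0xC4 0x8D.
utf8_bytes = word.encode("utf-8")

# In ISO-8859-2 (Latin-2), č is the single byte 0xE8.
latin2_bytes = word.encode("iso-8859-2")

# Reading the UTF-8 bytes as a single-byte Western encoding turns č into
# "Ä" (0xC4) plus an invisible C1 control character (0x8D): the "PoÄetna" symptom.
misread = utf8_bytes.decode("latin-1")

# Decoding with the encoding the file was really saved in gives the word back,
# which is why the editor setting and the declared charset must agree.
print(utf8_bytes.decode("utf-8"))         # Početna
print(latin2_bytes.decode("iso-8859-2"))  # Početna
print(repr(misread))                      # 'PoÄ\x8detna'
```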

Special character not displaying as expected

I have the following simple HTML page:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>
<body>
<div>
méywe
</div>
</body>
</html>
When displaying it in Chrome or Firefox (I did not test other browsers), I see the following:
m�ywe
What did I miss? The html file is saved in UTF-8 encoding. The server is Apache. My machine is Windows 7 pro. The text editor is UltraEdit.
Thanks!
Update
Initially, I used UltraEdit for editing this HTML file and I got the problem. Based on cmbuckley's input and after installing Notepad++ (on Heatmanofurioso's suggestion), I considered the possibility that my file was somehow corrupt (even though it looks fine in both UltraEdit and Notepad). So I saved my file with Notepad in UTF-8 encoding. I still saw the problem (maybe due to the cache?). Then I used UltraEdit to save it again. When I viewed the page in the browser, the problem was gone.
Lesson Learned
Keep two text editors if that is your tool, and try the other one if you see an unexplainable problem. No tool is perfect, even one you use every day. In my case, Notepad++ fixed the UTF-8 issue with my file that UltraEdit somehow failed to handle.
Thanks to folks for helping!!!
1 - Replace your
<meta charset="utf-8">
with
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
2 - Check that your HTML editor's encoding is set to UTF-8. This option is usually found in the menus at the top of the program, as in Notepad++.
3 - If you are importing a font, check that your browser is compatible with it. Or try adding CSS that sets your fonts to a default, widely available one, like
body
{
font-family: "Times New Roman", Times, serif;
}
Hope it helps :)
The non-ASCII character was displayed wrongly in the browsers because the file was (most likely) saved with Windows-1252 encoding instead of UTF-8. The cause was missing knowledge about how UltraEdit detects UTF-8, and perhaps also an inappropriate UTF-8 configuration.
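The "m�ywe" symptom follows directly from that mismatch. A minimal Python reproduction (added for illustration; it only assumes the file was stored as Windows-1252 and served as UTF-8):

```python
word = "méywe"

# Saved as ANSI (Windows-1252): é is stored as the single byte 0xE9.
ansi_bytes = word.encode("cp1252")

# Read as UTF-8: 0xE9 announces a three-byte sequence, but "y" is not a
# continuation byte, so the decoder substitutes U+FFFD REPLACEMENT CHARACTER,
# which browsers draw as a black diamond with a question mark.
shown = ansi_bytes.decode("utf-8", errors="replace")
print(ansi_bytes)  # b'm\xe9ywe'
print(shown)       # m�ywe
```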
How UltraEdit v22.10, the latest version at the time of writing, detects UTF-8 encoding is explained in detail in the user-to-user forum topic "UTF-8 not recognized, largish file". This forum topic also contains recommendations on how best to configure UltraEdit for HTML writers who mainly use UTF-8 encoding for their HTML files. UTF-8 detection was greatly improved in UltraEdit v24.00, which also detects UTF-8 encoded characters in very large files when scrolling to a block containing one.
Unfortunately, the regular expression search used by UltraEdit v22.10 and earlier versions to detect a UTF-8 HTML character set declaration does not work for the short HTML5 variant, as reported in the forum topic "Short UTF-8 charset declaration in HTML5 header". The reason is the double quote character between charset= and utf-8. I reported this by email to IDM Computer Solutions, Inc. when the referenced topic was created, suggesting a small change to the regular expression so that it also detects the short HTML5 UTF-8 declaration. The developers later updated the UTF-8 detection for UE v24.00 and UES v17.00, as a post in the referenced forum topic explains in detail.
However, when an HTML5 file is declared as UTF-8 encoded but UltraEdit loaded it as an ANSI file, the user can see this in the status bar at the bottom of the main window. A small (less than 64 KB) UTF-8 encoded HTML file should result in getting
either U8- and line terminator type (DOS/UNIX/MAC) displayed for users of UE < v19.00 or when using basic status bar in later versions of UE
or UTF-8 selected in encoding selector in status bar for users of UE v19.00 or later versions not using basic status bar.
If this is not the case, the UltraEdit user can use
Save As from the File menu and select UTF-8 - NO BOM for Encoding (Windows Vista or later) or Format (Windows 2000/XP) to convert the file from ANSI to UTF-8 without a byte order mark, or
ASCII to UTF-8 (Unicode editing) from submenu Conversions in menu File to convert the file from ASCII/ANSI to UTF-8 without an immediate save, or
select Unicode - UTF-8 via encoding selector in status bar (UE v19.00 or later only) resulting also in an immediate conversion from ASCII/ANSI to UTF-8 and enabling Unicode editing.
For the last two options, the UTF-8 BOM settings at Advanced - Settings or Configuration - File Handling - Save determine whether the file is saved with or without a byte order mark on the next save.
Once the word méywe is saved into the file using UTF-8 encoding, it becomes the byte stream 6D C3 A9 79 77 65 (hexadecimal), which would be displayed as mÃ©ywe if the file were opened in ASCII/ANSI mode (an option in the File - Open dialog) using Windows-1252 as the code page. UltraEdit then automatically detects the file as UTF-8 encoded on the next opening, even though <meta charset="utf-8"> is not recognized, because there is now at least one UTF-8 encoded character in the first 64 KB of the file.
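The byte stream and the ANSI view mentioned above can be checked with a few lines of Python (an added illustration; bytes.hex with a separator needs Python 3.8 or later):

```python
word = "méywe"
utf8_bytes = word.encode("utf-8")

# é (U+00E9) becomes the two bytes C3 A9; the ASCII letters stay one byte each.
print(utf8_bytes.hex(" ").upper())  # 6D C3 A9 79 77 65

# Opening those bytes as ANSI with code page Windows-1252 shows each byte as its
# own character: 0xC3 is "Ã" and 0xA9 is "©".
print(utf8_bytes.decode("cp1252"))  # mÃ©ywe
```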
To answer the question:
What did I miss?
You did not save the file as UTF-8 after opening or creating it as an ANSI file (more precisely, a text file encoded with one byte per character using a code page) and declaring it as UTF-8 encoded. This is a common problem for many users who write into an HTML file
<meta charset="utf-8">
or
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
or
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
or into an XML file
<?xml version="1.0" encoding="UTF-8"?>
or
<?xml version="1.0" encoding='utf-8'?>
and other variations depending on the use of ' or " and on spelling UTF-8, utf-8 and so on, without really knowing what this string means for the applications interpreting the bytes of the file.
The forum topic "What's the best default new file format?" contains lots of useful information and links to web pages about text encoding, which encoding to use for which file types, and how to configure UltraEdit accordingly.
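The mismatch described above (a file declaring UTF-8 while its bytes are ANSI) can be checked mechanically. A rough Python sketch, assuming an HTML file small enough to read at once; the function name and the regular expression are illustrative and are not UltraEdit's own detection logic:

```python
import re

# Matches <meta charset="utf-8"> as well as
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">,
# with or without quotes and in any letter case.
META_UTF8 = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?\s*utf-8""", re.IGNORECASE)

def declares_utf8_but_is_not(data: bytes) -> bool:
    """True if the HTML bytes declare UTF-8 but are not valid UTF-8."""
    if not META_UTF8.search(data):
        return False  # no UTF-8 declaration, so nothing to contradict
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True   # e.g. é stored as the Windows-1252 byte 0xE9
    return False

# Usage (hypothetical file name):
# with open("index.html", "rb") as f:
#     print(declares_utf8_but_is_not(f.read()))
```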
Check and see if the server is sending a charset in the Content-type header. The encoding specified in that will take precedence over what you specify with the meta element.
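One way to check, sketched in Python with the standard library (the helper name is made up for this example): the email.message header parser, which urllib also uses for HTTP response headers, extracts the charset parameter from a Content-Type value.

```python
from email.message import Message
from typing import Optional

def charset_from_content_type(value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, lowercased."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()

print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/html"))                      # None

# Against a live server (hypothetical URL), urllib exposes the same parser:
# import urllib.request
# with urllib.request.urlopen("http://localhost/page.html") as resp:
#     print(resp.headers.get_content_charset())
```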
Changing font-family to Calibri (or any other generally accepted font) worked for me.
Example:
<span style="font-family:Calibri"># My_Text</span>
I am using an MS Access .accdb database and PHP. I had a problem displaying the "±" character: it was displayed as "�".
I added the following line at the beginning of the PHP script to get it right, and my problem is now solved.
header('Content-type: text/html; charset=ASCII');
Another method is to use mb_convert_encoding($row,'UTF-8','ASCII' );
The header declaration is not required.
In my case, I converted the special characters to decimal NCRs and it worked. I had to do this because using the meta tag did not work and I did not want to change my font.
There are many online Unicode-to-decimal and Unicode-to-hex converters.
Χαίρετε -> &#935;&#945;&#943;&#961;&#949;&#964;&#949;
Replace meta charset="utf-8" with meta http-equiv="Content-Type" content="text/html; charset=utf-8". Maybe it will help.
Otherwise, what is your font?

How to fix Â£ showing in HTML, possibly through .htaccess?

I've switched hosts, and now in all my HTML files that contain the pound sign, it is displayed with an Â in front: Â£. Is there a way to overcome this problem without adding
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>
on every HTML page?
You have various alternative ways to overcome the problem, including any one of these:
In .htaccess as you asked: insert the line "AddDefaultCharset utf-8" or follow further advice from W3C or further advice from askapache.com
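Insert the HTML5 doctype "<!DOCTYPE html>" at the beginning of each HTML page. Note, however, that the doctype on its own does not switch the browser's default character encoding to UTF-8; without a charset in the HTTP header or a meta element, the fallback encoding depends on the browser and locale (commonly windows-1252).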
Store your HTML using character encoding ISO-8859-1, so that the pound sign is stored as one byte. Currently your HTML would appear to be stored using character encoding UTF-8, so that the pound sign is stored as two bytes. Here is one way to store a copy of a UTF-8 file as ISO-8859-1: iconv --from-code=UTF-8 --to-code=ISO-8859-1 inputfile.html > outputfile.html
Store your HTML using 7-bit (ASCII) characters, with the pound sign encoded as an XML numeric character reference &#163; or (hexadecimal) &#xA3; or the HTML named character entity &pound;
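The byte counts in the third option explain the symptom. A short Python illustration (added here; utf8_to_latin1 is a hypothetical helper mirroring the iconv command above):

```python
pound = "£"

# UTF-8 stores £ (U+00A3) as two bytes, ISO-8859-1 as one.
utf8_bytes = pound.encode("utf-8")         # b'\xc2\xa3'
latin1_bytes = pound.encode("iso-8859-1")  # b'\xa3'

# A server that labels UTF-8 bytes as ISO-8859-1 makes the browser show
# each byte separately: 0xC2 is "Â" and 0xA3 is "£".
print(utf8_bytes.decode("iso-8859-1"))  # Â£

def utf8_to_latin1(data: bytes) -> bytes:
    """Rough Python equivalent of: iconv --from-code=UTF-8 --to-code=ISO-8859-1"""
    # Raises UnicodeEncodeError for characters ISO-8859-1 cannot hold (e.g. €).
    return data.decode("utf-8").encode("iso-8859-1")
```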

HTML character sets & MySQL character sets

Which HTML character set would cover all these? Which character set do I need in MySQL to export and then import them?
SAINT RAPHAEL ARNÁIZ BARÓN (Spanish)
St Thérèse of the Child Jesus, Virgin, Doctor (French)
M. Orsola (Giulia) Ledóchowska, Religious (Eastern European)
In MySQL, use the UTF-8 character set (utf8mb4 where available, since MySQL's older utf8 only stores characters up to three bytes long). This will allow you to represent a very wide variety of data appropriately in your DBMS. If you set your MySQL collation correctly, MySQL will also collate (sort) your data nicely.
To render this stuff into HTML, you probably need to entitize characters other than the basic 7-bit ASCII ones. For example, look at this web page describing the Unicode character for uppercase Ñ http://www.fileformat.info/info/unicode/char/00D1/index.htm
In HTML this is represented as &#xD1;
Your web app language (PHP? Java?) has functions built in to convert between UTF-8 strings (to stash in the DBMS) and entitized html (for display on the web). Use them.
Use MySQL's UTF-8 character set for your tables and columns, and send a SET NAMES UTF8 statement after initialising the MySQL connection in your scripting language of choice. Ensure your script also sends an HTTP header indicating that your page is in UTF-8, and you should be good to go. You may want to read this, and the links for further reading look good too.
In PHP, to send this HTTP header, you would use
header("Content-Type: text/html; charset=UTF-8");. At the top of your <head> element in your HTML page, you can also add <meta charset="UTF-8"> (in HTML5), or <meta http-equiv="Content-type" content="text/html;charset=UTF-8"> (in HTML 4.01 or HTML5; but you can't use both ways and still get valid HTML5).

How can I show special characters like "e" with accent acute over it in HTML page?

I need to put the names of some universities on my web page. I have typed them as they are, but in some browsers, or maybe on some computers, they appear differently. For example, in "Universite de Moncton" the second "e" in Universite should have an acute accent. Could you please help with this?
If you’re using a character set that contains that character, you can use an appropriate character encoding and use it literally:
Université de Moncton
Don’t forget to specify the character set/encoding properly.
If not, you can use an HTML character reference, either a numeric character reference that denotes the code point of the character in the Universal Character Set (UCS):
Universit&#233; de Moncton
Universit&#xE9; de Moncton
Or using an entity reference:
Universit&eacute; de Moncton
But this entity is just a named representation of the numeric character reference (see the list of entity references that are defined in HTML 4):
<!ENTITY eacute CDATA "&#233;" -- latin small letter e with acute,
                                  U+00E9 ISOlat1 -->
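All three forms can be generated from a character's code point. A minimal Python sketch (the function name is hypothetical; named references come from the standard library's html.entities.codepoint2name table, which covers the HTML 4 entities):

```python
import html
from html.entities import codepoint2name
from typing import List

def character_references(ch: str) -> List[str]:
    """Decimal, hexadecimal and (if HTML 4 defines one) named reference for ch."""
    cp = ord(ch)
    refs = [f"&#{cp};", f"&#x{cp:X};"]
    name = codepoint2name.get(cp)  # HTML 4 entity names such as "eacute"
    if name:
        refs.append(f"&{name};")
    return refs

for ref in character_references("é"):
    print(ref, "->", html.unescape(ref))  # each line ends with "-> é"
```

Characters without an HTML 4 name, such as U+04D2, simply get the two numeric forms.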
You can use HTML named character references (entities):
è &egrave;
é &eacute;
ê &ecirc;
ë &euml;
Here's a handy search page for the UTF-8 Character Map
Since you mention that 'in some computers or browsers they appear differently', I think your problem is with the page or server encoding. You must
encode the file correctly (how to do this depends on your text editor)
assign the correct encoding in your webpage, done with a meta tag
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
force the server encoding with, for example, PHP's header() function:
header('Content-Type: text/html; charset=ISO-8859-1');
Or, yes, as everyone has pointed out, use the html entities for those characters, which is always safe, but might make a mess when you try to find-replace in code.
There are two methods. One is by using "HTML entities." You enter them as, for example, &eacute;. Here is a comprehensive reference of named entities; you can also reference the Unicode code point of a given character, using its decimal form as &#1234; or its hex form as &#x4D2;.
Perhaps more common now (ten years after this answer was originally written) is simply using Unicode characters directly. Rất dễ dàng, phải không? ("Very easy, isn't it?" in Vietnamese.) This is more acceptable and universal because most pages now use UTF-8 as their character encoding.
运气! (Chinese for "luck".)
By typing it into your HTML code: é <-- you can copy and paste this one if you want.
Microsoft Windows has a utility called Character Map for entering characters that are not on your keyboard.
http://www.starr.net/is/type/htmlcodes.html
This site shows you the HTML markup for all of those characters that you will need :)