HTML - Arabic Support - html

i have a website in which i have to put some lines in Arabic.... how to do it...
where to get the Arabic text characters... how to make the page support Arabic...
i have to put a line per page and there is a lotta lotta pages so can't go around making images and putting them...

This is the answer that was required but everybody answered only part one of many.
Step 1 - You cannot have the multilingual characters in unicode document.. convert the document to UTF-8 document
advanced editors don't make it simple for you... go low level...
use notepad to save the document as meName.html & change the encoding
type to UTF-8
Step 2 - Mention in your html page that you are going to use such characters by
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
Step 3 - When you put in some characters make sure your container tags have the following 2 properties set
dir='rtl'
lang='ar'
Step 4 - Get the characters from some specific tool\editor or online editor like i did with Arabic-Keyboard.org
example
<p dir="rtl" lang="ar" style="color:#e0e0e0;font-size:20px;">رَبٍّ زِدْنٍي عِلمًا</p>
NOTE: font type, font family, font face setting will have no effect on special characters

The W3C has a good introduction.
In short:
HTML is a text markup language. Text means any characters, not just ones in ASCII.
Save your text using a character encoding that includes the characters you want (UTF-8 is a good bet). This will probably require configuring your editor in a way that is specific to the particular editor you are using. (Obviously it also requires that you have a way to input the characters you want)
Make sure your server sends the correct character encoding in the headers (how you do this depends on the server software you us)
If the document you serve over HTTP specifies its encoding internally, then make sure that is correct too
If anything happens to the document between you saving it and it being served up (e.g. being put in a database, being munged by a server side script, etc) then make sure that the encoding isn't mucked about with on the way.
You can also represent any unicode character with ASCII

You not only have to put the meta tag, telling that it is UTF-8 but really make the document UTF-8. You can do that with good editors (like notepad++) by converting them to "unicode" or "UTF-8 without BOM". Than you can simply use arabic characters
As this page is UTF-8, here are some examples (I hope I don't write anything rude here): شغف
If you use a server side scripting language make sure that it does not output the page in a different encoding. In PHP e.g. you can set it like this:
header('Content-Type: text/html; charset=utf-8');

If you don't even know where to get Arabic characters, but you want to display them, then you're doing something wrong.
Save files containing Arabic characters with encoding UTF-8. A good editor allows you to set the character encoding.
In the HTML page, place the following after <head>:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
If you're using XHTML:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
That's it.
An alternative way (without messing with the encoding of a file), is using HTML escape sequences. This website does that jobs for you: http://www.htmlescape.net/

Won't you need the ensure the area where you display the Arabic is Right-to-Left orientated also?
e.g.
<p dir="rtl">

i edit the html page with notepad ++ ,set encoding to utf-8 and its work

As mentioned above, by default text editors will not use UTF-8 as the standard encoding for documents.
However most editors will allow you to change that in the settings. Even for each specific document.

Check you have <meta charset="utf-8"> inside head block.

Related

W3 validation error "content" "charset" [duplicate]

In order to define charset for HTML5 Doctype, which notation should I use?
Short:
<meta charset="utf-8" />
Long:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
In HTML5, they are equivalent. Use the shorter one, as it is easier to remember and type. Browser support is fine since it was designed for backwards compatibility.
Both forms of the meta charset declaration are equivalent and should work the same across browsers. But, there are a few things you need to remember when declaring your web files character-set as UTF-8:
Save your file(s) in UTF-8 encoding without the byte-order mark (BOM).
Declare the encoding in your HTML files using meta charset (like above).
Your web server must serve your files, declaring the UTF-8 encoding in the Content-Type HTTP header.
Apache servers are configured to serve files in ISO-8859-1 by default, so you need to add the following line to your .htaccess file:
AddDefaultCharset UTF-8
This will configure Apache to serve your files declaring UTF-8 encoding in the Content-Type response header, but your files must be saved in UTF-8 (without BOM) to begin with.
Notepad cannot save your files in UTF-8 without the BOM. A free editor that can is Notepad++. On the program menu bar, select "Encoding > Encode in UTF-8 without BOM". You can also open files and re-save them in UTF-8 using "Encoding > Convert to UTF-8 without BOM".
More on the Byte Order Mark (BOM) at Wikipedia.
Another reason to go with the short one is that it matches other instances where you might specify a character set in markup. For example:
<script type="javascript" charset="UTF-8" src="/script.js"></script>
<p><a charset="UTF-8" href="http://example.com/">Example Site</a></p>
Consistency helps to reduce errors and make code more readable.
Note that the charset attribute is case-insensitive. You can use UTF-8 or utf-8, however UTF-8 is clearer, more readable, more accurate.
Also, there is absolutely no reason at all to use any value other than UTF-8 in the meta charset attribute or page header. UTF-8 is the default encoding for Web documents since HTML4 in 1999 and the only practical way to make modern Web pages.
Also you should not use HTML entities in UTF-8. Characters like the copyright symbol should be typed directly. The only entities you should use are for the five reserved markup characters: less than, greater than, ampersand, prime, double prime.
Entities need an HTML parser, which you may not always want to use going forward. They introduce errors, make your code less readable, increase your file sizes, and sometimes decode incorrectly in various browsers depending on which entities you used. Learn how to type/insert copyright, trademark, open quote, close quote, apostrophe, em dash, en dash, bullet, Euro, and any other characters you encounter in your content, and use those actual characters in your code.
The Mac has a Character Viewer that you can turn on in the Keyboard System Preference, and you can find and then drag and drop the characters you need, or use the matching Keyboard Viewer to see which keys to type. For example, trademark is Option + 2. UTF-8 contains all of the characters and symbols from every written human language.
So there is no excuse for using -- instead of an em dash. It is not a bad idea to learn the rules of punctuation and typography also ... for example, knowing that a period goes inside a close quote, not outside.
Using a <meta> tag for something like content-type and encoding is highly
ironic, since without knowing those things, you couldn't parse the file
to get the value of the meta tag.
No, that is not true. The browser starts out parsing the file as the browser's default encoding, either UTF-8 or ISO-8859-1. Since US-ASCII is a subset of both ISO-8859-1 and UTF-8, the browser can read <html><head> just fine either way ... it is the same. When the browser encounters the meta charset tag, if the encoding is different than what the browser is already using, the browser reloads the page in the specified encoding.
That is why we put the meta charset tag at the top, right after the head tag, before anything else, even the title. That way you can use UTF-8 characters in your title.
You must save your file(s) in UTF-8 encoding without BOM
That is not strictly true. If you only have US-ASCII characters in your document, you can Save it as US-ASCII and serve it as UTF-8, because it is a subset. But if there are Unicode characters, you are correct, you must Save as UTF-8 without BOM.
If you want a good text editor that will save your files
in UTF-8, I recommend Notepad++.
On the Mac, use Bare Bones TextWrangler (free) from Mac App Store, or Bare Bones BBEdit which is at Mac App Store for $39.99 ... very cheap for such a great tool.
In either app, there is a menu at the bottom of the document window where you specify the document encoding and you can easily choose "UTF-8 no BOM". And of course you can set that as the default for new documents in Preferences.
But if your Webserver serves the encoding in the HTTP header,
which is recommended, both [meta tags] are needless.
That is incorrect. You should of course set the encoding in the HTTP header, but you should also set it in the meta charset attribute so that the page can be saved by the user, out of the browser onto local storage and then opened again later, in which case the only indication of the encoding that will be present is the meta charset attribute.
You should also set a base tag for the same reason ... on the server, the base tag is unnecessary, but when opened from local storage, the base tag enables the page to work as if it is on the server, with all the assets in place and so on, no broken links.
AddDefaultCharset UTF-8
Or you can just change the encoding of particular file types like so:
AddType text/html;charset=utf-8 html
A tip for serving both UTF-8 and Latin-1 (ISO-8859-1) files is to give the UTF-8 files a "text" extension and Latin-1 files "txt."
AddType text/plain;charset=iso-8859-1 txt
AddType text/plain;charset=utf-8 text
Finally, consider saving your documents with Unix line endings, not legacy DOS or (classic) Mac line endings, which don't help and may hurt, especially down the line as we get further and further from those legacy systems.
An HTML document with valid HTML5, UTF-8 encoding, and Unix line endings is a job well done. You can share and edit and store and read and recover and rely on that document in many contexts. It's lingua franca. It's digital paper.
<meta charset="utf-8"> was introduced with/for HTML5.
As mentioned in the documentation, both are valid. However, <meta charset="utf-8"> is only for HTML5 (and easier to type/remember).
In due time, the old style is bound to become deprecated in the near future. I'd stick to the new <meta charset="utf-8">. There's only one way, but up. In tech's case, that's phasing out the old (really, REALLY fast)
Documentation: HTML meta charset Attribute—W3Schools
While not contesting the other answers, I think the following is worthy of mentioning.
The “long” (http-equiv) notation and the “short” one are equal. Whichever comes first wins;
Web server headers will override all the <meta> tags;
BOM (byte order mark) will override everything, and in many cases it will affect HTML 4 (and probably other stuff, too);
If you don't declare any encoding, you will probably get your text in “fallback text encoding” that is defined your browser. Neither in Firefox nor in Chrome it's UTF-8;
In absence of other clues the browser will attempt to read your document as if it was in ASCII to get the encoding, so you can't use any weird encodings (UTF-16 with BOM should do, though);
While the specifications say that the encoding declaration must be within the first 512 bytes of the document, most browsers will try reading more than that.
You can test by running echo 'HTTP/1.1 200 OK\r\nContent-type: text/html; charset=windows-1251\r\n\r\n\xef\xbb\xbf<!DOCTYPE html><html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta charset="windows-1251"><title>привет</title></head><body>привет</body></html>' | nc -lp 4500 and pointing your browser at localhost:4500. (Of course you will want to change or remove parts. The BOM part is \xef\xbb\xbf. Be wary of the encoding of your shell.)
Please mind that it's very important that you explicitly declare the encoding. Letting browsers guess can lead to security issues.
Use <meta charset="utf-8" /> for web browsers when using HTML5.
Use <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> when using HTML4 or XHTML, or for outdated DOM parsers, like DOMDocument in PHP 5.3.
To embed a signature in an email, I would use the long version:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The reason is that not many email readers use HTML5, so it's always better use old HTML styles. Actually, it's better to use tables than divs + CSS as well.
There is some news based on Mozilla Foundation, and SitePoint:
Do not use this value (http-equiv=content-type) as it is obsolete.
Prefer the charset attribute on the <meta> element.

using european characters in html

I started learning HTML + CSS a week or two ago, and I'm facing a problem. I'm european so I need to use special characters like á, ã, ç , etc a lot. Is there any other way I can do that without using the corresponding code for each letter every time I need to use one? Like a code I can put in the beggining of the html document or something like that that would make all the special characters accepted.
Decide which encoding you want to use for your site; if you don't have any preference, use UTF-8.
Save the .html file in that encoding in your text editor. Consult the help of your specific text editor how to choose which encoding the file gets saved in.
Add <meta charset="utf-8"> to your <head> to instruct the browser to treat the page as UTF-8 encoded.
Preferably also configure your web server to output a Content-Type: text/html; charset=utf-8 HTTP header, since that takes precedence if present. Consult the manual of your web server how to do that.
Write literally any character you can input directly as is into your document and enjoy.
Further reading:
https://www.w3.org/International/tutorials/tutorial-char-enc/
Handling Unicode Front To Back In A Web App
What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text
UTF-8 all the way through

Add another language inline

I'm writing a paragraph that requires me to use a Greek word that means something else, but when I put the Greek word into my text editor and save it, it looks weird in my browser. I tried using a span but it still shows the same weird code.
<p>Music is an art form whose medium is sound and silence. Its common elements are pitch
(which governs melody and harmony), rhythm (and its associated concepts tempo, meter, and
articulation), dynamics, and the sonic qualities of timbre and texture. The word derives
from Greek <span lang="el">μουσική</span> (mousike; "art of the Muses").</p>
Perhaps your page is being interpreted using the wrong charset, try adding <meta charset="UTF-8"> inside your <head> element. This tells the browser how to interpret more complex characters like you described.
Make sure this tag is present in your head:
<meta charset="utf-8">
ANSI and ISO charsets are the default when that tag is not present. ISO (the newest of those two) only supports 256 characters. UTF-8 character set allows you to use unicode characters directly in your HTML page.
That meta tag tells the browser to interpret your HTML page with the correct character set.
Check out the wikipedia page on ISO 8859-1 for more info. Also, here's the utf-8 wikipedia page.
Edit
As Juhana pointed out in the comments, make sure your editor is set to the appropriate encoding as well (most programming/web-specific editors, like Sublime for example, should do this by default, but other multi-purpose text editors may not.)

How to make the website show signs like "č" and "ć"?

I'm making a website that is in Croatian, and I need to use signs like: "č", "ć", "ž", "đ" and "š". They are currently displayed as little boxes.
Info:
I use Notepad ++.
I set the encoding there to UTF-8.
I put the following line of HTML in: <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
However, it does not work. Even Notepad ++ can't display my characters using UTF-8, so that would suggest that I should probably use something else...
http://webdesign.maratz.com/lab/utf_table/
Use HTML entities, for example
č : č
ž : ž
This sounds more like a font issue than a character encoding issue. If it were a character encoding issue, the characters would most likely be displayed as 2+ ASCII characters. The boxes, however, typically mean the character encoding is correct, but that specific character is not available in the font being used (which is especially common with lesser-used fonts). This would explain why it's behaving incorrectly in both the website and Notepad++.
To fix the issue, simply use a different font in your editor and website.
Note: I recommend a widely used font for the best chance of it working. Specifying a generic name in the website (e.g. serif or sans-serif) will probably have even better results, as the OS/browser would decide on the best font to use.
In short, be consistent about your character encoding throughout.
Configure your editor to save in the encoding you want
If you use any server side programming, make sure it isn't transcoding your data
If you use a database, make sure it is configured to use the same encoding
Configure your server to emit a Content-Type header that specifies that encoding
Use the meta tag in your question
The W3C provides useful material on encodings that starts here.
A useful site for special characters and their ASCII-codes: CopyPaste Character
To 'type' them, use the alt codes.
However, to use them in your site, you'll better use the HTML codes like you can find on CPC
As a test, try this:
<span style="font-family:Arial Unicode MS">
č ć ž đ š
</span>
You should be able to see your characters correctly.
I've just copied and pasted a line from your question along with your meta tag, placed it into a plain text file in vi.
It works just fine - all characters are displayed fine: http://www.dusystems.com/tmp/1.html
If you can't do the same with your editor then the problem is with the editor and not character sets and encodings.
If you're on Windows you can use its built-in Notepad to edit UTF-8 files. Open Notepad, type all of your special characters, add the meta tag. When doing Save As select UTF-8 from the Encoding drop-down in the dialog. Save as something.html and open in IE. It will 100% work.

How can I properly display German characters in HTML?

My pages contain German characters and I have typed the text in between the
HTML tag, but the browser views some characters differently. Do I need to include anything in HTML to properly display German characters?
<label> ausgefüllt </label>
It seems you need some basic explanations about something that unfortunately even most programmers don't understand properly.
Files like your HTML page are saved and transmitted over the Internet as a sequence of bytes, but you want them displayed as characters. In order to translate bytes into characters, you need a set of rules called a character encoding. Unfortunately, there are many different character encodings that have historically emerged to handle different languages. Most of them are based on the American ASCII encoding, but as soon as you have characters outside of ASCII such as German umlauts, you need to be very careful about which encoding you use.
The source of your problem is that in order to correctly decode an HTML file, the browser needs to know which encoding to use. You can tell it so in several ways:
The "Content-Type" HTTP header
The HTML META tag
The XML encoding attribute if you use XHTML
So you need to pick one encoding, save the HTML file using that encoding, and make sure that you declare that encoding in at least one of the ways listed above (and if you use more than one make damn sure they agree). As for what encoding to use, Germans often use ISO/IEC 8859-15, but UTF-8 is increasingly becoming the norm, and can handle any kind of non-ASCII characters at the same time.
UTF-8 is your friend.
Try
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
and check which encoding your webserver sends in the header.
If you use PHP, you can send your own headers in this way (you have to put this before any other output):
<?php header('Content-Type: text/html; charset=utf-8'); ?>
Also doublecheck that you saved your document in UTF-8.
Try the solution in blog post German characters encoding issue (2012-05-10):
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
Have you tried ü (ü) and Ü (Ü)?
You can find how to type other letters here.
Declare <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=utf-8">
and when saving the file, for example in notepad, choose the save as to be UTF-8 and not just .txt.
This should render the characters ok.
you may try utf8_encode() or utf8_decode() functions.Check if any of these works.
For example <?php echo utf8_encode('ausgefüllt'); ?>
Hope it will work.
Sounds like a character encoding issue, in that the file is saved as a different character encoding to what the webserver is saying it is.
I don't like the use of HTML entities (like %uuml;), they are only needed when there is something wrong with your characterset.
In short:
The RIGHT way is to fix your characterset.
The EASY way is to just use entities. You may not ever see any problems with this.
Tracking down characterset error can be very difficult. If you give us an URL where we can see the problem, we can probably give you a good hint where to look.
save as your file with UTF8, and use this META:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>