Add another language inline - html

I'm writing a paragraph that requires me to use a Greek word that means something else, but when I put the Greek word into my text editor and save it, it looks weird in my browser. I tried using a span but it still shows the same weird code.
<p>Music is an art form whose medium is sound and silence. Its common elements are pitch
(which governs melody and harmony), rhythm (and its associated concepts tempo, meter, and
articulation), dynamics, and the sonic qualities of timbre and texture. The word derives
from Greek <span lang="el">μουσική</span> (mousike; "art of the Muses").</p>

Perhaps your page is being interpreted using the wrong charset. Try adding <meta charset="UTF-8"> inside your <head> element; this tells the browser how to decode characters outside ASCII, like the Greek ones you described.

Make sure this tag is present in your head:
<meta charset="utf-8">
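Without that tag, browsers typically fall back to a legacy single-byte encoding such as windows-1252 (often called "ANSI") or ISO 8859-1. Such encodings can represent at most 256 characters, and the Greek alphabet is not among them. UTF-8 can encode every Unicode character, so you can use them directly in your HTML page.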
That meta tag tells the browser to interpret your HTML page with the correct character set.
Check out the Wikipedia page on ISO 8859-1 for more info. Also, here's the UTF-8 Wikipedia page.
Edit
As Juhana pointed out in the comments, make sure your editor is set to the appropriate encoding as well (most programming/web-specific editors, like Sublime for example, should do this by default, but other multi-purpose text editors may not.)
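To check whether an editor really saved the file as UTF-8, you can look at the bytes themselves. The following is a minimal Python sketch using only the standard library; the helper name is_valid_utf8 is made up for illustration, and ISO 8859-7 is just one example of a legacy Greek encoding an editor might use instead.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


word = "μουσική"
utf8_bytes = word.encode("utf-8")          # what a UTF-8 editor writes
legacy_bytes = word.encode("iso-8859-7")   # what a legacy Greek code page writes

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(legacy_bytes))  # False
```

On a real page you would pass in the raw file contents, read in binary mode. Note that a pure-ASCII file also passes this check, so it only catches files that contain non-ASCII characters in the wrong encoding.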

Related

Why include <meta charset="" />?

I mean, if a browser is already reading the HTML file and is able to read the text <meta charset="" />, that means it already knows the encoding of the HTML file. So why does it need to be specified inside the HTML file? Isn't it redundant?
Is it because the browser starts reading the file using the smallest charset, like ASCII, which is a subset of many charsets?
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
For a web page, the original idea was that the web server would return a similar Content-Type http header along with the web page itself — not in the HTML itself, but as one of the response headers that are sent before the HTML page.
This causes problems. Suppose you have a big web server with lots of sites and hundreds of pages contributed by lots of people in lots of different languages and all using whatever encoding their copy of Microsoft FrontPage saw fit to generate. The web server itself wouldn’t really know what encoding each file was written in, so it couldn’t send the Content-Type header.
It would be convenient if you could put the Content-Type of the HTML file right in the HTML file itself, using some kind of special tag. Of course this drove purists crazy… how can you read the HTML file until you know what encoding it’s in?! Luckily, almost every encoding in common use does the same thing with characters between 32 and 127, so you can always get this far on the HTML page without starting to use funny letters:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
But that meta tag really has to be the very first thing in the <head> section because as soon as the web browser sees this tag it’s going to stop parsing the page and start over after reinterpreting the whole page using the encoding you specified.
See also W3.org:
Always declare the encoding of your document using a meta element with a charset attribute, or using the http-equiv and content attributes (called a pragma directive). The declaration should fit completely within the first 1024 bytes at the start of the file, so it's best to put it immediately after the opening head tag.
So yes. The entire premise is that until the HTML parser of your browser reads that meta tag, there should not be any bytes that can be ambiguously interpreted as other bytes; the entire text shown including the charset attribute value ("utf-8") fits into the ASCII encoding.
From Joel's article:
Internet Explorer actually does something quite interesting: it tries to guess, based on the frequency in which various bytes appear in typical text in typical encodings of various languages, what language and encoding was used. Because the various old 8 bit code pages tended to put their national letters in different ranges between 128 and 255, and because every human language has a different characteristic histogram of letter usage, this actually has a chance of working.
The average HTML parser goes like this:
1. Is there a Content-Type response header with a charset parameter? Use that to decode the bytes of the received content into a string.
2. Start reading the HTML as ASCII (or UTF-8). Is there a <meta http-equiv="Content-Type"> element with a usable charset? Use that.
3. Start parsing the bytes and use heuristics to determine the most likely encoding used.
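The decision order described above can be sketched in Python. This is a simplified illustration, not what any real browser runs: the function name pick_encoding and both regular expressions are assumptions of this sketch, the 1024-byte limit comes from the W3C advice quoted earlier, and the heuristic third step is replaced by a fixed fallback.

```python
import re

# Both patterns are simplifications written for this sketch.
HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([A-Za-z0-9_\-]+)', re.IGNORECASE)
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def pick_encoding(content_type, body, fallback="windows-1252"):
    """Choose an encoding: HTTP header first, then <meta>, then a fallback."""
    # Step 1: charset parameter of the Content-Type response header.
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # Step 2: scan the first 1024 bytes for a meta declaration.
    # This works on raw bytes because the declaration itself is pure ASCII.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # Step 3: a real parser would apply heuristics here; this sketch gives up.
    return fallback


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
print(pick_encoding(None, page))                             # utf-8
print(pick_encoding("text/html; charset=ISO-8859-2", page))  # iso-8859-2
```

The second call shows why the header matters: when both are present, the header wins over the meta element.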
It is an obsolete tag, but here is the reasoning. We have ISO 646 (since 1967), which defines a standard set of characters. ASCII fixes the few positions that ISO 646 leaves optional, so ISO 646 is the ancestor of most encodings.
Note: most systems are based on this standard, possibly with the ISO 2022 extension, which lets you switch between a few different 7-bit and 8-bit encodings (used, for example, for Asian character sets, where more than 256 characters are needed). In any case, the start of a text is compatible with ISO 646; control sequences may then change the meaning.
So a browser can read most ASCII data (really ISO 646 or ISO 2022) and from it detect exactly how to interpret all other characters.
In Western languages, the lower codes (up to 127) are mostly ASCII, but how to interpret the higher codes depends on the language (Nordic characters, Western accented characters, Greek characters, etc.). There are various encodings, and they cannot reliably be told apart without an explicit specification.
Note: this method fails on a few encodings, e.g. multi-byte ones like UCS-2, UTF-16, and UTF-32, but the W3C had some methods to detect them: the header should be mostly ASCII characters, so there should be a lot of 00 bytes. EBCDIC and other encodings not based on ISO 646 (or ASCII) were already rare. In principle you can check for some byte strings, but I do not know whether browsers did so.
In short: with heuristics (and ISO 646) you can guess how to read the ASCII characters, but to know how to interpret "special characters", e.g. accented characters, you need more information, given by META or by an HTTP header. Note: this also works with many Asian encodings (those based on ISO 2022).
Why META? It is about control. Setting an HTTP header often required webmaster intervention, but with META the author of a page could override the encoding (e.g. when writing static pages; nowadays most dynamic page generators can set HTTP headers themselves).
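The idea of checking for byte strings and 00 bytes, mentioned in the note on multi-byte encodings, can be sketched in Python. This is only an illustration of the principle and makes no claim about what browsers actually did; the function name sniff_wide_encoding is made up, and it assumes the document starts with "<".

```python
import codecs

# Longer byte order marks first: the UTF-32 LE mark begins with the UTF-16 LE one.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# How an initial "<" looks in each wide encoding when no byte order mark is present.
LEADING_LT = [
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00", "utf-16-le"),
    (b"\x00<", "utf-16-be"),
]


def sniff_wide_encoding(data: bytes):
    """Guess the encoding from the first bytes, or return None if unsure."""
    for marker, name in BOMS + LEADING_LT:
        if data.startswith(marker):
            return name
    return None  # probably ASCII-compatible: fall back to header or meta


print(sniff_wide_encoding("<html>".encode("utf-16-le")))  # utf-16-le
print(sniff_wide_encoding(b"<html>"))                     # None
```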

Print "a" with bar over it in html?

I want to display an "a" with a bar over it in HTML, as in ā. For example, I want to write āyush.
I also tried overline, but that looks ugly.
Pasting the character into the HTML gives a-.
In HTML it is &amacr; (lowercase) or &Amacr; (uppercase).
Replace it with &#257;
See an example here
Make sure you set your charset in the head of the document.
<meta http-equiv="content-type" content="text/html; charset=utf-8">
You haven't given us enough info to be certain, but this is likely to be an encoding issue. I would guess that the character set you're sending the page in is probably just the default and doesn't include any extended characters.
You need to serve the page as UTF-8.
Add this to your <head> block:
<meta charset="utf-8">
that should be sufficient to fix it.
If you can't change the character set for whatever reason, you could send the character as a HTML entity -- find out the numeric entity code for it and use the &#xxx; notation (where xxx is the character code you require).
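If you need to look up the numeric code for a character, Python can produce the reference for you. This is a small sketch using only the standard library; the helper name to_char_refs is made up for illustration.

```python
import html


def to_char_refs(text: str, hex_form: bool = False) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    if hex_form:
        return "".join(
            ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text
        )
    # The built-in "xmlcharrefreplace" error handler emits decimal references.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_char_refs("āyush"))                 # &#257;yush
print(to_char_refs("āyush", hex_form=True))  # &#x101;yush
print(html.unescape("&#257;yush"))           # āyush
```

The output is pure ASCII, so it survives whatever character set the page ends up being served in.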
You have two main options: use character references like &#x101;, or insert the character “ā” using a tool that does not munge it. In the former case, you need not worry about character encodings, but some other characters may have similar issues without your noticing it. In the latter case, you need to make sure that the character encoding is properly set; see the W3C document Character encodings. Note that setting a meta tag may or may not be sufficient, depending on the server.
Either way, there can be font problems. For example, a browser might pick up a glyph for “ā” from a font that is very different from the one used for “a”, causing typographic mess. To avoid this, use a font-family list containing a good selection of fonts containing all the characters you need. More info: Guide to using special characters in HTML.

How to make the website show signs like "č" and "ć"?

I'm making a website that is in Croatian, and I need to use signs like: "č", "ć", "ž", "đ" and "š". They are currently displayed as little boxes.
Info:
I use Notepad++.
I set the encoding there to UTF-8.
I put the following line of HTML in: <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
However, it does not work. Even Notepad++ can't display my characters using UTF-8, so that would suggest that I should probably use something else...
http://webdesign.maratz.com/lab/utf_table/
Use HTML entities, for example
č : &#269;
ž : &#382;
This sounds more like a font issue than a character encoding issue. If it were a character encoding issue, the characters would most likely be displayed as 2+ ASCII characters. The boxes, however, typically mean the character encoding is correct, but that specific character is not available in the font being used (which is especially common with lesser-used fonts). This would explain why it's behaving incorrectly in both the website and Notepad++.
To fix the issue, simply use a different font in your editor and website.
Note: I recommend a widely used font for the best chance of it working. Specifying a generic name in the website (e.g. serif or sans-serif) will probably have even better results, as the OS/browser would decide on the best font to use.
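To see what a genuine encoding mismatch looks like, as opposed to the boxes of a missing glyph, you can simulate one in Python. This is a minimal sketch; the helper name misread is made up, and windows-1252 (cp1252 in Python) is assumed as the wrongly applied encoding.

```python
def misread(text: str, actual: str = "utf-8", assumed: str = "cp1252") -> str:
    """Encode text one way and decode it another, as a confused browser would."""
    return text.encode(actual).decode(assumed, errors="replace")


garbled = misread("š")
print(garbled)       # Å¡
print(len(garbled))  # 2
```

One character turning into two unrelated ones is the signature of a wrong encoding. A single box in place of each letter points at the font instead.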
In short, be consistent about your character encoding throughout.
Configure your editor to save in the encoding you want
If you use any server side programming, make sure it isn't transcoding your data
If you use a database, make sure it is configured to use the same encoding
Configure your server to emit a Content-Type header that specifies that encoding
Use the meta tag in your question
The W3C provides useful material on encodings that starts here.
A useful site for special characters and their ASCII-codes: CopyPaste Character
To 'type' them, use the alt codes.
However, to use them in your site, you'd better use the HTML codes, which you can find on CPC.
As a test, try this:
<span style="font-family:Arial Unicode MS">
č ć ž đ š
</span>
You should be able to see your characters correctly.
I've just copied and pasted a line from your question along with your meta tag, placed it into a plain text file in vi.
It works just fine - all characters are displayed fine: http://www.dusystems.com/tmp/1.html
If you can't do the same with your editor then the problem is with the editor and not character sets and encodings.
If you're on Windows you can use its built-in Notepad to edit UTF-8 files. Open Notepad, type all of your special characters, add the meta tag. When doing Save As select UTF-8 from the Encoding drop-down in the dialog. Save as something.html and open in IE. It will 100% work.

HTML Unicode Issue: How to display special characters

Currently, I have my webpage set to Unicode/UTF-8. When trying to display a special character (for example, em dash, double arrow, etc), it shows up as a question mark symbol. I cannot change these characters to the HTML entity equivalent. How can I circumvent this issue?
A question mark in a lozenge, �, indicates a character-level error: the data contains bytes that do not represent any character, according to the character encoding being applied. This typically happens when the document is declared as UTF-8 encoded but is really in iso-8859-1, windows-1252, or some similar encoding. Windows-1252 is a common default encoding used by various programs on Windows platforms. So you may need to open the file in your authoring program and re-save it as UTF-8 encoded.
If problems remain, please post the URL. Posting the code alone is not sufficient, since the character encoding is primarily specified in HTTP headers.
If you see a question mark in a small box, then it might be a font-level problem (lack of glyph in the fonts being used), but this would be very rare for common characters like the em dash. Different browsers have different ways of indicating character- or font-level problems.
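The character-level error and its fix can be reproduced in a few lines of Python. This is a sketch under the assumption that the file was really saved as windows-1252; the helper name resave_as_utf8 is made up for illustration.

```python
def resave_as_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from their real encoding into UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# An em dash (U+2014) as windows-1252 stores it: the single byte 0x97.
legacy = "a \u2014 b".encode("cp1252")

# Declared as UTF-8 but not actually UTF-8: the stray byte becomes U+FFFD.
print(legacy.decode("utf-8", errors="replace") == "a \ufffd b")  # True

# After re-saving, the same text decodes cleanly.
fixed = resave_as_utf8(legacy)
print(fixed.decode("utf-8") == "a \u2014 b")                     # True
```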
Make sure your document is set to the correct character encoding in the actual code editor, as well as declared in the <head>. Both are necessary. I spent hours trying to tweak HTML when the only problem was that I needed to change the text encoding setting in Coda.
<head>
<meta charset="utf-8">
Make sure your characters are actually UTF-8 characters. They will probably look something like this:
® or U+0020
http://www.kinsmancreative.com/transfer/char/index.php is a handy site for finding the decimal values of commonly used UTF-8 special characters if you need a reference.

HTML - Arabic Support

I have a website on which I have to put some lines in Arabic. How do I do it?
Where do I get the Arabic text characters, and how do I make the page support Arabic?
I have to put one line on each page and there are a lot of pages, so I can't go around making images and inserting them.
This is the answer that was required but everybody answered only part one of many.
Step 1 - Multilingual characters will not survive in a document saved with the wrong encoding, so convert the document to UTF-8.
Advanced editors don't always make this simple, so go low level:
use Notepad to save the document as meName.html and change the encoding
type to UTF-8
Step 2 - Mention in your html page that you are going to use such characters by
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
Step 3 - When you put in some characters, make sure your container tags have the following 2 attributes set
dir='rtl'
lang='ar'
Step 4 - Get the characters from a dedicated tool/editor or an online editor, as I did with Arabic-Keyboard.org
example
<p dir="rtl" lang="ar" style="color:#e0e0e0;font-size:20px;">رَبٍّ زِدْنٍي عِلمًا</p>
NOTE: font type, font family, font face setting will have no effect on special characters
The W3C has a good introduction.
In short:
HTML is a text markup language. Text means any characters, not just ones in ASCII.
Save your text using a character encoding that includes the characters you want (UTF-8 is a good bet). This will probably require configuring your editor in a way that is specific to the particular editor you are using. (Obviously it also requires that you have a way to input the characters you want)
Make sure your server sends the correct character encoding in the headers (how you do this depends on the server software you use)
If the document you serve over HTTP specifies its encoding internally, then make sure that is correct too
If anything happens to the document between you saving it and it being served up (e.g. being put in a database, being munged by a server side script, etc) then make sure that the encoding isn't mucked about with on the way.
You can also represent any Unicode character using only ASCII, by writing it as a character reference.
You not only have to add the meta tag declaring UTF-8, you also have to actually save the document as UTF-8. You can do that with good editors (like Notepad++) by converting the file to "unicode" or "UTF-8 without BOM". Then you can simply use Arabic characters.
As this page is UTF-8, here are some examples (I hope I don't write anything rude here): شغف
If you use a server side scripting language make sure that it does not output the page in a different encoding. In PHP e.g. you can set it like this:
header('Content-Type: text/html; charset=utf-8');
If you don't even know where to get Arabic characters, but you want to display them, then you're doing something wrong.
Save files containing Arabic characters with encoding UTF-8. A good editor allows you to set the character encoding.
In the HTML page, place the following after <head>:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
If you're using XHTML:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
That's it.
An alternative way (without messing with the encoding of a file) is to use HTML escape sequences. This website does that job for you: http://www.htmlescape.net/
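With a lot of pages, generating the escaped markup with a script may be easier than pasting each line into a website. The following Python sketch is one way to do it; the function name rtl_paragraph is made up, and the dir and lang values are the ones suggested elsewhere in this thread.

```python
import html


def rtl_paragraph(text: str, lang: str = "ar") -> str:
    """Wrap text in a right-to-left paragraph, escaped down to pure ASCII."""
    # Escape &, < and > first, then turn non-ASCII characters into references.
    safe = html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")
    return f'<p dir="rtl" lang="{lang}">{safe}</p>'


print(rtl_paragraph("شغف"))
# <p dir="rtl" lang="ar">&#1588;&#1594;&#1601;</p>
```

The result contains only ASCII, so it displays correctly even if the file or the server gets the encoding wrong.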
Won't you also need to ensure that the area where you display the Arabic is oriented right-to-left?
e.g.
<p dir="rtl">
I edited the HTML page with Notepad++, set the encoding to UTF-8, and it works.
As mentioned above, by default text editors will not use UTF-8 as the standard encoding for documents.
However most editors will allow you to change that in the settings. Even for each specific document.
Check that you have <meta charset="utf-8"> inside the <head> block.