HTML in Russian - html

I have to design a Russian version of a website. I get the text from a translator and paste it into the code in Dreamweaver, but it doesn't work.
I have the usual head:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
What should I do?

You should change the encoding of your file to UTF-8. You can do this in Notepad's Save As dialog, or in Notepad++ (Encoding -> Encode in UTF-8).
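If there are many files, the same re-save can be scripted. A minimal sketch in Python, assuming the translator's text was saved in a legacy Cyrillic encoding such as windows-1251 (that source encoding and both function names are assumptions for illustration, not something stated in the question):

```python
from pathlib import Path

def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from the encoding they were saved in and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

def resave_as_utf8(path: str, source_encoding: str = "windows-1251") -> None:
    """Overwrite the file at `path` with a UTF-8 version of its contents.

    `source_encoding` is a guess; pass the encoding the file really uses.
    """
    file = Path(path)
    file.write_bytes(to_utf8(file.read_bytes(), source_encoding))
```

If the source encoding is guessed wrongly, the output will be valid UTF-8 but still show the wrong letters, so check one file by eye before converting a whole directory.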

The document http://www.mig-marketing.com/proves/nando/ru/ contains Russian text in an image only, but it links to http://www.mig-marketing.com/proves/nando/ru/firma.html which contains (in addition to text in an image) Russian text in ISO-8859-5 (= ISO Latin/Cyrillic) encoding. This encoding is declared in a meta tag, but the declaration has no effect, since HTTP headers take precedence over it, and they say
Content-Type: text/html; charset=ISO-8859-1
(You can conveniently check the HTTP response headers using Firefox with Web Developer Extension and selecting Information → View Response Headers.)
To fix this, contact the web server admin, or try to fix it yourself if the Apache settings allow per-directory .htaccess files. In that case, create a file with that name (including the leading dot) in the directory containing the Russian files and enter the text
AddType text/html;charset=ISO-8859-5 html
This would then make the server send all .html files in that directory with HTTP headers that specify them as ISO-8859-5 encoded.
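To see the conflict described above without a browser extension, the two declarations can be compared directly. A small sketch in Python; the helper name is made up for illustration, and the two strings are simply the header and meta values quoted in this answer:

```python
from email.message import Message

def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()

http_header = "text/html; charset=ISO-8859-1"  # what the server sends
meta_value = "text/html; charset=ISO-8859-5"   # what the meta tag declares

if charset_from_content_type(http_header) != charset_from_content_type(meta_value):
    print("Mismatch: the browser will follow the HTTP header")
```

The standard library parser is used here only to avoid splitting the header by hand; any correct parse of the charset parameter would do.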

Re-save all your files as UTF-8.

After trying so many things I discovered that the problem was in the server. I don't know exactly how, but when I told them that I needed a website in Russian they changed something and it works!

Related

International characters in website file names

I need to create a website (in PHP) that has filenames that include international characters.
For example: transportører.php (notice the 'o' with the diagonal line through it).
So I happily create the file, save it, and upload it to the web server. Whenever I LINK to this file, however, it all goes wrong. I'll have the usual link syntax:
<a href="transportører.php">My Link Text</a>
Upon clicking such a link, the web browser attempts to navigate to a non-existent page:
The requested URL /transportÃ¸rer.php was not found on this server.
Notice how the filename has been mutated? The "ø" character in "transportører.php" has been changed into the bizarre "Ã¸" symbol (that's not a comma after the "A", by the way, but an actual component of the symbol itself).
There's obviously some sort of translation going on here, but what, why, and how do I prevent it?
I think there are two possible reasons:
html encoding
Possibly the encoding of the html file is wrong, so the link is actually pointing to a wrong path. Add
<meta charset="UTF-8">
in the head section of your file.
server settings
If the server is resolving the link wrongly (you can check this by typing the address of your Norwegian-named .php file in the browser and seeing whether it is replaced), you need to know which server you are using and investigate in that direction. For Apache, the question "How to change the default encoding to UTF-8 for Apache?" looks promising.
As the URL isn’t percent-encoded in the hyperlink, browsers assume¹ UTF-8 for percent-encoding it, where ø becomes %C3%B8.
However, your server seems to expect/use ISO 8859-1 (instead of UTF-8), where ø becomes %F8.
A quick fix would be to link to the ISO 8859-1 percent-encoded URL:
<a href="transport%F8rer.php">transportører</a>
(A better fix would be to let your server use UTF-8 for everything, and then to use the UTF-8 percent-encoded URL in the hyperlink.)
¹ Either by default, or because the linking page seems to use UTF-8 (at least according to the HTTP header Content-Type: text/html; charset=UTF-8).
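The two percent-encoded forms mentioned in this answer can be reproduced with Python's standard library. This is just an illustration of the encodings, not of what any particular browser or server does internally:

```python
from urllib.parse import quote

name = "transport\u00f8rer.php"  # "transportører.php", ø written as an escape

utf8_url = quote(name, encoding="utf-8")
latin1_url = quote(name, encoding="iso-8859-1")

print(utf8_url)    # transport%C3%B8rer.php
print(latin1_url)  # transport%F8rer.php
```

The same character produces two bytes in UTF-8 and one byte in ISO 8859-1, which is why the two URLs are not interchangeable.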
Well, this is embarrassing. Everything was - in actual fact - working correctly. The 404 error made the filename LOOK "wrong" - e.g. transportÃ¸rer.php. However, this is actually correct. That is how HTML seems to reference the file "behind the scenes". So to the browser, "transportÃ¸rer.php" is synonymous with "transportører.php".
What was happening was that FileZilla (my FTP client) objects to international characters. It was changing the filename during upload, replacing the international characters with "something else". The filenames LOOKED correct on the screen (when I viewed the website folder with Linux Mint's native FTP client), but the underlying character coding was NOT correct. The web browsers could tell the difference, so they didn't associate my links with the (mutated) file names, which triggered the 404 error.
The solution in a nutshell: I used Linux Mint native FTP to upload my files, overwriting the ones uploaded by FileZilla, and everything just sprang into life.
Thanks to everyone who offered advice... it was all good stuff, just not the solution in this particular case.
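For anyone wondering why the name in the 404 message looked mutated: the two characters "Ã¸" are what the UTF-8 bytes of "ø" look like when each byte is read as ISO 8859-1. A short Python sketch of that effect (it only illustrates the display issue; it does not reproduce whatever FileZilla did to the file names, which is not spelled out above):

```python
name = "transport\u00f8rer.php"  # "transportører.php"

# Take the UTF-8 bytes of the name and misread them one byte at a time
# as ISO 8859-1.
misread = name.encode("utf-8").decode("iso-8859-1")
print(misread)  # transportÃ¸rer.php

# Reversing the misreading recovers the original name, so no data was lost.
recovered = misread.encode("iso-8859-1").decode("utf-8")
print(recovered == name)  # True
```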

Ideas to get around a Character Encoding mismatch

I get this validation warning when I try and validate my page.
The character encoding specified in the HTTP header (iso-8859-1) is
different from the value in the <meta> element (utf-8). I will use the
value from the HTTP header (iso-8859-1) for this validation.
Here is the encoding set for the file:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
If I try to use characters like ĉĵŝĝ (iso-8859-3 compatible) they are rendered incorrectly. I think this is an issue with the server in my college, because it is using Latin-1 (iso-8859-1).
Is there a way I can get round this (if the problem lies with the encoding set with the college's server)?
Thanks.
You can also set this option in your .htaccess file
<FilesMatch "\.(htm|html|xhtml|php)$">
AddDefaultCharset utf-8
</FilesMatch>
That way, all the files with the above extensions will be served with utf-8 instead of iso-8859-1
Bye
Saluton,
you could try .xhtml. Maybe the college's server then uses another header. The other way is to use HTML entities for the Esperanto letters &#...;.
As the HTTP headers are normative, and not the HTML meta, you are stuck when you cannot change the server's config.
Another solution proposes a per-directory configuration file (.htaccess) in the directory that holds your .html files.
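If you go the entity route, the numeric references do not have to be looked up by hand. A minimal Python sketch (the function name is made up for illustration):

```python
def to_numeric_references(text):
    """Replace every non-ASCII character with a decimal reference like &#265;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# The four Esperanto letters from the question: ĉ ĵ ŝ ĝ
print(to_numeric_references("\u0109\u0135\u015d\u011d"))  # &#265;&#309;&#349;&#285;
```

The result is pure ASCII, so it survives whatever charset the server announces.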

Unicode is not shown in meta tag

I have used unicode in my website's meta tag as follows.
<meta property="og:title" content="ශ්‍රී ලංකා" />
But when I get view source in browser, it is shown as follows.
<meta property="og:title" content="????????" />
How can I avoid this?
Thank you.
With an editor like Notepad++, you must change the file encoding to UTF-8.
The Sinhala characters in your file have been converted to question marks somewhere in the process of uploading to the server or in server actions. They are actual question marks “?”, U+003F, not problem indicators used by browsers or source viewers. Question marks also appear near the very end of the page in visible content, line 445: ?????
The page appears to be served simply from a static HTML file by an Apache server, with no special server-side technology (though one cannot be sure, when looking from outside). This suggests that something has gone wrong in the upload process, like incorrect character code conversion (assuming you have checked that the file in your authoring system is UTF-8 encoded and displays correctly). This may happen if you transfer a file in “text mode” or “Ascii mode”, so I suggest uploading it again in binary (raw) mode.
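As an illustration of how real question marks can end up in a file: a lossy conversion to a character set that has no Sinhala letters substitutes "?" for each of them. A Python sketch of that effect, assuming the escapes below match the og:title text from the question; which tool actually performed such a conversion on your file is unknown:

```python
# Sinhala text written with escapes (assumed to match the og:title above).
title = "\u0dc1\u0dca\u200d\u0dbb\u0dd3 \u0dbd\u0d82\u0d9a\u0dcf"

# Converting to a charset without these characters, replacing what cannot
# be represented, leaves literal U+003F question marks in the output.
lossy = title.encode("iso-8859-1", errors="replace")
print(lossy)

# Only "?" and the space remain; the original text cannot be recovered.
only_question_marks = set(lossy) <= {ord("?"), ord(" ")}
print(only_question_marks)  # True
```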

Characters not displaying correctly in different browsers

I used certain characters in website such as • — “ ” ‘ ’ º ©.
I found that when testing what my website looked like under different browsers (BrowserLab),
the aforementioned characters are replaced with �.
I then changed the charset in the webpage header from:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
to
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Suddenly all the pages have the above mentioned characters replaced with a ?.
Even more puzzling, this is not consistent across pages or even within the same page, as some sections display the characters • and © correctly.
In particular, I need to replace the character • with one that will display across browsers, can anyone help me with the answer? Thanks.
You should save your HTML source as UTF-8.
Alternatively, you can use HTML entities instead.
The source code needs to be saved in the same encoding as you're instructing the browser to parse it in. If you're saving your files in UTF-8, instruct the browser to parse it as UTF-8 by setting an appropriate HTTP header or HTML meta tag (headers preferable, your web server may be setting one without you knowing). Use a decent editor that clearly tells you what encoding you're saving the file as. If it doesn't display correctly, there's a discrepancy between what you're telling your browser the file is encoded in and what it's really encoded in.
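Both directions of that discrepancy can be reproduced in a few lines of Python. This is only a sketch of the decoding behaviour, using the bullet character from the question:

```python
bullet = "\u2022"  # the bullet character from the question

# Saved as windows-1252 but parsed as UTF-8: the lone byte 0x95 is not
# valid UTF-8, so the decoder substitutes U+FFFD (the replacement character).
saved_1252 = bullet.encode("windows-1252")
as_utf8 = saved_1252.decode("utf-8", errors="replace")

# Saved as UTF-8 but parsed as windows-1252: three bytes become three
# unrelated characters.
saved_utf8 = bullet.encode("utf-8")
as_1252 = saved_utf8.decode("windows-1252")

print(as_utf8 == "\ufffd")              # True
print(as_1252 == "\u00e2\u20ac\u00a2")  # True
```

A page that mixes both symptoms was most likely assembled from pieces saved in different encodings.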
Check to see if Apache is set up to send the charset. Look for the directive "AddDefaultCharset" and set it to Off in .htaccess or your config file.
Most/all browsers will take what is sent in the HTTP headers over what is in the document.
If you're using Notepad++, I suggest you use the EditPlus editor instead to copy the text (which has the special characters) and paste it into your file. This should work.
Yes, I had this problem too: in Notepad++, copying and pasting wasn't working with some symbols.
I think SLaks is right
The HTML entity for the copyright symbol is &#169;

Displaying unicode symbols in HTML

I want to simply display the tick (✔) and cross (✘) symbols in an HTML page but it shows up as either a box or goop like âœ” - obviously something to do with the encoding.
I have set the meta tag to show utf-8 but obviously I'm missing something.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Edit/Solution: From comments made, using Firebug I found the headers being passed by my page were in fact "Content-Type: text/html" and not UTF-8. Looking at the file format using Notepad++ showed my file was formatted as "UTF-8 without BOM". After changing this to just "UTF-8", the symbols now show correctly... but Firebug still seems to indicate the same content-type.
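The difference between the two Notepad++ formats is three bytes at the start of the file: the byte order mark EF BB BF. Browsers treat a BOM as a signal that the file is UTF-8, which is probably why the symbols started working even though the header did not change. A small Python sketch of the difference (the helper name is made up for illustration):

```python
import codecs

def has_utf8_bom(raw: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte order mark."""
    return raw.startswith(codecs.BOM_UTF8)

tick = "\u2714"  # the tick symbol from the question

without_bom = tick.encode("utf-8")    # "UTF-8 without BOM"
with_bom = tick.encode("utf-8-sig")   # "UTF-8" with the BOM prepended

print(has_utf8_bom(without_bom))  # False
print(has_utf8_bom(with_bom))     # True
```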
You should ensure the HTTP server headers are correct.
In particular, the header:
Content-Type: text/html; charset=utf-8
should be present.
The meta tag is ignored by browsers if the HTTP header is present.
Also ensure that your file is actually encoded as UTF-8 before serving it, check/try the following:
Ensure your editor saves it as UTF-8.
Ensure your FTP or any file transfer program does not mess with the file.
Try with HTML encoded entities, like &#uuu;.
To be really sure, hexdump the file and look at the character; for the ✔, it should be E2 9C 94.
Note: If you use a Unicode character for which your system can't find a glyph (no font with that character), your browser should display a question mark or some block-like symbol. But if you see multiple Roman characters like you do, this denotes an encoding problem.
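If no hexdump tool is at hand, the expected bytes can be checked in Python. A minimal sketch (the function name is made up for illustration):

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of `text` as space-separated uppercase hex."""
    return text.encode("utf-8").hex(" ").upper()

print(utf8_hex("\u2714"))  # tick:  E2 9C 94
print(utf8_hex("\u2718"))  # cross: E2 9C 98
```

To inspect the file itself, read it in binary mode and look for the same byte sequence; any other bytes at that position mean the file was not saved as UTF-8.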
I know an answer has already been accepted, but wanted to point a few things out.
Setting the content-type and charset is obviously good practice, and doing it on the server is much better, because it ensures consistency across your application.
However, I would use UTF-8 only when the language of my application uses many characters that are available only in Unicode. If you only need to show a Unicode character or symbol in a few places, you can do so without changing the charset of your page.
HTML renderers have always been able to display symbols which are not part of the encoding character set of the page, as long as you write the symbol as a numeric character reference (NCR). Sounds weird, but it's true.
So, even if your HTML has a header that states it has an encoding of ANSI or any of the ISO charsets, you can display a check mark by using its HTML character reference, in decimal - &#10003; or in hex - &#x2713;
So it's a little difficult to understand why you are facing this issue on your pages. Can you check if the NCR value is correct? This is a good reference: http://www.fileformat.info/info/unicode/char/2713/index.htm
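One way to check an NCR value is to derive it from the character's code point. A short Python sketch, using the check mark U+2713 from the reference link:

```python
import html

check = "\u2713"  # CHECK MARK, U+2713

decimal_ref = f"&#{ord(check)};"  # decimal form of the reference
hex_ref = f"&#x{ord(check):X};"   # hexadecimal form of the reference

print(decimal_ref)  # &#10003;
print(hex_ref)      # &#x2713;

# Both references decode back to the same character.
round_trip_ok = html.unescape(decimal_ref) == check == html.unescape(hex_ref)
print(round_trip_ok)  # True
```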
Make sure that you actually save the file as UTF-8, alternatively use HTML entities (&#nnn;) for the special characters.
Contrary to what Nicolas proposed, the meta tag isn’t actually ignored by the browsers. However, the Content-Type HTTP header always has precedence over a meta tag in the document.
So make sure that you either send the correct encoding via the HTTP header, or don’t send this HTTP header at all (not recommended). The meta tag is mainly a fallback option for local documents which aren’t sent via HTTP traffic.
Using HTML entities should also be considered a workaround – that’s tiptoeing around the real problem. Configuring the web server properly prevents a lot of nuisance.
I think this is a file problem: you simply saved your file in a 1-byte encoding like Latin-1. Look up how to set files to UTF-8 in your editor.
I wonder why there are editors that don't default to utf-8.