Why smart quotes look garbled in HTML

This is my HTML code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="en-us" http-equiv="Content-Language" />
<meta content="text/html; charset=windows-1252" http-equiv="Content-Type" />
</head>
<body>
“”
</body>
</html>
This is what I see in Firefox: â€œâ€
I wonder why.

To write characters outside the standard ASCII set (which is common among almost all encodings) without using HTML entities, you have to make sure that the file is actually saved using the encoding that you specify in the meta tag.
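Your page is saved as UTF-8 (the default), so each smart quote is stored as three bytes. When the browser decodes those bytes as Windows-1252, as the meta tag tells it to, it shows every byte as a separate character. Either save the file as Windows-1252 or change the charset in the meta tag to utf-8.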
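The mismatch can be reproduced outside the browser. This is a minimal sketch in Python, assuming the file holds the two quotes encoded as UTF-8. The helper name decode_like_browser is made up for illustration; it falls back to Latin-1 for the few bytes that Python's cp1252 codec leaves undefined, which only approximates what a browser does with them.

```python
def decode_like_browser(data: bytes) -> str:
    """Decode bytes as Windows-1252, one byte at a time.

    Python's cp1252 codec rejects a few undefined bytes (such as 0x9D).
    Browsers map those to the matching C1 control character instead,
    which the Latin-1 fallback below approximates.
    """
    chars = []
    for value in data:
        single = bytes([value])
        try:
            chars.append(single.decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(single.decode("latin-1"))
    return "".join(chars)


smart_quotes = "\u201c\u201d"                # the two characters in <body>
saved_bytes = smart_quotes.encode("utf-8")   # what a UTF-8 editor writes to disk
garbled = decode_like_browser(saved_bytes)   # what a Windows-1252 decoder shows

print(saved_bytes.hex(" "))  # e2 80 9c e2 80 9d
print(repr(garbled))
```

The last byte of the second quote (0x9D) has no printable Windows-1252 character, which is why the garbled text appears to end one character short.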

You need to use the HTML entities for smart quotes: &ldquo; for the left, &rdquo; for the right.
More info here.
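If the markup is generated by a script, the entity mapping can be checked with Python's standard library. This is a small sketch, not part of the original answer; it uses numeric character references for the outgoing direction because they need no entity table.

```python
import html

smart_quotes = "\u201c\u201d"

# Named entities decode back to the smart quotes.
decoded = html.unescape("&ldquo;&rdquo;")

# Numeric character references are pure ASCII, so they survive any
# ASCII-compatible encoding declared in the meta tag.
ascii_safe = smart_quotes.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(decoded == smart_quotes)
print(ascii_safe)  # &#8220;&#8221;
```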

Related

Removing <body> with a userscript

I'm sorry, but I am not very good with programming. I am trying to fix an irritating bug on my school's website with a userscript. I have tested the regex on several pages, so at least that part works. I need the userscript to remove the parts I don't want to see. This is a snippet from the source of the website; I have marked what needs to be removed with '//'.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
//<html><head>
//<title>404 Not Found</title>
//</head><body>
//<h1>Not Found</h1>
//<p>The requested URL /get.php was not found on this server.</p>
//</body></html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-gb" lang="en-gb" >
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta name="robots" content="index, follow" />
This is my userscript that does not work. I know it reflects my skills as a programmer, please don't hate.
var REGEX = /<HTML>(.*?)([^\n]*?\n+?)+?<\/BODY><\/HTML>/ig;
document.body.innerHTML=document.body.innerHTML.replace(REGEX, '');
This markup is obviously invalid, but the browser (at least Chrome and Firefox) will merge these two <html> sections together with its best guess. So interacting with document.body is probably not what you want.
Doing something like this will visually fix the issue:
document.querySelector('h1').remove() // remove first h1 "Not Found"
document.querySelector('p').remove() // remove first p "The requested..."

Question marks '???' before doctype in HTML UTF-8 email, the rest of the document is fine

I'm aware there are other posts on character encoding; however, this issue appears to occur only before the doctype.
My original source is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="telephone=no" name="format-detection">
<title>TITLE</title>
</head>
But it comes through in an email as (note the question marks at the start):
???<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta content="telephone=no" name="format-detection" />
<title>TITLE</title>
</head>
This seems strange, as there are no characters or whitespace before the doctype. I don't really get what can't be rendered. Also, the file was originally saved in Visual Studio using the following settings:
Any help much appreciated
There are actually characters before the doctype. As you have saved the file as UTF-8 with signature, it contains a BOM (byte order mark) at the beginning of the file.
Save it as UTF-8 without signature, and the characters will go away.
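For files that have already been saved with a signature, the three BOM bytes can also be removed in code. This is a minimal sketch in Python; the function name strip_utf8_bom is hypothetical, and the conversion that yields '???' is only one plausible path, since the actual mail pipeline is not known.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_signature = codecs.BOM_UTF8 + b"<!DOCTYPE HTML>"
print(strip_utf8_bom(with_signature))  # b'<!DOCTYPE HTML>'

# One way three question marks can appear: the BOM bytes (EF BB BF) are
# read as three Latin-1 characters and then forced into ASCII.
question_marks = codecs.BOM_UTF8.decode("latin-1").encode("ascii", "replace")
print(question_marks)  # b'???'
```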
You don't have a closing </html> tag at the end of the markup. Try adding that. If it still isn't working, I suggest you use Notepad++ instead of Visual Studio.

unicode gibberish character in html

I am trying to rewrite an old PHP/MySQL website in CodeIgniter. The site is in Unicode, and the text in the database is stored like this:
बेलायतम&.
When I pull the data from the database and display it in HTML, it appears like this:
सरà¥à¤µà¥‹à¤šà¥à¤šà¤•à¥‹ à
It displays fine on the old site, and I am using exactly the same HTML header as the old site:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I am banging my head against this. How can I correct the text?
Try this
<meta charset='utf-8'>
There shouldn't be a problem; the HTML you are using is right. Just in case, check whether you can see the proper characters on this website:
http://www.unicodeblocks.com/block/Devanagari
and make sure you are using:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Please double-check it: use "view page source" on your page and see whether the meta tag appears twice with different values.
One more thing: if the text comes from a database, set the connection character set as well:
$mysqli->query('set character set utf8');
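Output of the form à¤... is typical of UTF-8 bytes passing through a single-byte connection character set. This is a sketch of that round trip in Python, assuming Latin-1 as the single-byte character set; the character set actually used by the asker's connection is not known.

```python
original = "\u0938\u0930"  # Devanagari letters SA and RA

# The database holds UTF-8 bytes...
utf8_bytes = original.encode("utf-8")

# ...but a connection that treats them as Latin-1 turns every byte
# into its own character, which is what ends up on the page.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # à¤¸à¤°

# Reversing the wrong step recovers the text, which shows that the stored
# data is intact and only the connection setting is wrong.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)
```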

Correct doctype, attributes, encoding for Chinese language

How should I modify this in order to display Chinese characters?
Those question marks are actually Chinese characters.
Thanks
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
?????????????????????????????
????????????????
?????????????????????,?????????????,???!
</body>
</html>
I would make sure that your editor's character encoding setting matches the charset declared in the page (i.e. change the setting to UTF-8 and try retyping or repasting the text). For example, in Eclipse, IIRC, the default encoding for most files depends on your regional settings; it will usually pick a non-UTF variant (ISO-8859-1, Latin-1, on my machine). Which editor are you using?
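Literal question marks usually mean the characters were already lost when the file was saved, not when it was displayed. This is a minimal sketch in Python, assuming the editor saved the file as Latin-1 and substituted '?' for anything it could not represent.

```python
text = "\u4e2d\u6587"  # two Chinese characters

# Latin-1 has no Chinese characters, so each one is replaced by '?'.
# The original text cannot be recovered from these bytes.
latin1_bytes = text.encode("latin-1", errors="replace")
print(latin1_bytes)  # b'??'

# Saved as UTF-8, the same text survives a round trip.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.decode("utf-8") == text)
```

This is why the text has to be retyped or repasted after the editor setting is changed: changing the declared charset alone cannot bring the lost characters back.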

HTML5 page language, direction and encoding

What is the correct way of declaring an HTML5 page to be in Hebrew, RTL and UTF-8 encoded? I haven't done it in a while, but I remember that in HTML4 it involved 3 or 4 tags and attributes that seemed redundant. Is it still the same?
<html dir="rtl" lang="he">
<head>
<meta charset="utf-8">
...
</head>
...
</html>
You need the following:
A <!doctype html> to indicate your page is HTML5.
An <HTML> tag with the following attributes:
dir="rtl"
lang="he"
Note: you may omit the ", or use ' instead.
A <meta> tag to declare the character encoding. You can choose one of the following:
<meta charset="UTF-8">
Note: you may omit the ", or use ' instead.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
This is the "legacy" way of declaring character encoding. It's still allowed within HTML5, but all modern browsers support the first variant so there is no need for this.
Note: you may omit the " for the http-equiv attribute, or use ' instead for all attributes.
If the browser encounters a UTF-8 byte order mark, it will treat an HTML5 file as UTF-8. This happens regardless of any character encoding declared using meta tags.
None of the tags, attributes and attribute values used here, or the DOCTYPE, are case sensitive.
Note: if the browser encounters a character encoding declaration that differs from the encoding it has been using so far, it may re-parse the document from the start using the declared encoding. You can put your encoding inside a Content-Type HTTP header so this won't be a problem.
Note also that the browser will only look for a character encoding declaration in the first 1024 bytes of a document.
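The BOM precedence and the 1024-byte limit can be illustrated with a rough check. This is a simplified sketch in Python, not the prescan algorithm from the HTML specification: the function name guess_declared_encoding and the regular expression are assumptions made for illustration, and HTTP headers are ignored.

```python
import codecs
import re
from typing import Optional

PRESCAN_LIMIT = 1024  # number of leading bytes searched for a declaration

# Loose pattern covering both <meta charset="..."> and the
# http-equiv form (content="text/html; charset=...").
_CHARSET_RE = re.compile(
    rb"""charset\s*=\s*["']?([A-Za-z0-9_-]+)""", re.IGNORECASE
)


def guess_declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding a page appears to declare, or None."""
    # A UTF-8 byte order mark wins over any meta declaration.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Otherwise only the first 1024 bytes are searched.
    match = _CHARSET_RE.search(data[:PRESCAN_LIMIT])
    if match:
        return match.group(1).decode("ascii").lower()
    return None


early = b'<!doctype html><html dir="rtl" lang="he"><head><meta charset="UTF-8">'
late = b"<!doctype html>" + b" " * 2000 + b'<meta charset="utf-8">'
bom_wins = codecs.BOM_UTF8 + b'<meta charset="windows-1255">'

print(guess_declared_encoding(early))     # utf-8
print(guess_declared_encoding(late))      # None
print(guess_declared_encoding(bom_wins))  # utf-8
```

The second case shows why the meta tag should be the first thing inside <head>.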
You need the following to create an HTML5 page in Hebrew, with RTL direction and UTF-8 encoding:
<!DOCTYPE html> For declaring it as a HTML5 page
<html dir="rtl" lang="he"> For direction and language
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> For utf-8
<html dir="rtl" lang="he">
not: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Does not work on "Chrome" and "Firefox" browsers.