Chinese characters in mailto subject line of email - html

I'm looking to add some Chinese characters in a mailto: email link.
I've tried
Email
but when I click on the link the Outlook subject line shows:
调查亨德森 / Inquiry
I've also tried
Email
but I get the same result as above.
I realize this may be an Outlook issue but I'm interested in finding out the correct way to implement this functionality.

Raw URL Encode
To encode mailto links / standard links with special characters you can use the php function rawurlencode
If you are looking for an online tool try http://www.cafewebmaster.com/online_tools/rawurlencode
Using your example:
Email
would convert to:
Email

Since subject is in mail header, there is no way to know what encoding you are using. You need to use MIME Mail header extension defined in this RFC,
http://www.ietf.org/rfc/rfc2047.txt
The subject in Chinese would look like this,
Subject: =?GB2312?B?u7bTrbLOvNPDwLn61bm74Q==?=
But more and more clients assume UTF-8 encoding now. You might want try that also.

It's cumbersome but possible, here is a good article that discusses both the standard compliant way of an ideal world and real world scenarios with Internet Explorer.

For your text in comment #1, please enable the "Tool->options->mail format->International Options"->"Enable UTF-8 support for mailto: protocol"

I've been trying to put non-ASCII characters in subject lines for a while now. The bottom line is: it does not work reliably.
My (limited) understanding of why it does not work is that the standards say that email is 7-bit ASCII. The MIME standard gets around that by encoding the contents of the email differently. However: the subject line is not part of the contents. It's a header.

Because the default charset of outlook is gb2312, so when you encode the subject, you must translate the Chinese character to gb2313, then encode it.
In a word, the charset which you pass must be consistent with the default charset of outlook.

Related

Which charset should I use in mailto body for maximum compatibilty?

I have UTF-8 page in which I have mailto links with template of body. But these template with data includes East Europian (Czech) characters. These characters messes up some email clients like Outlook 2007 and they are displayed like question marks or some other strange characters.
I know about "Enable UTF-8 support for mailto: protocol." setting in Outlook 2007, but from what I know about it's off by default.
Which charset-encoding should I use in mailto body for maximum compatibilty?
it clearly depends on client settings. in the company i work for we had a similar question. we ended up using ae for ä and so on.
we faced the problem that some clients running ms outlook 2003 were reading it as utf-8 while other clients running outlook 2003 were reading it as a defined ansi table.
another issue is involved in the usage of other mail clients. we also have a bring your own device program where employees could install linux or have mac books. they are using mozilla thunderbird, mutt and so on. other clients have set gmail as their prefered mailto: handler etc. so a global change of the outlook settings through a registry edit as in Rfilip's answer would not apply to us.
anyways.. you can only define the text body of the mail. sadly its impossible to define a html body through a mailto: link. if this would work, encoding could be defined through the body.
there also is no such a flag to set up the prefered encoding inside the mailto: link.
answer to this question: its impossible to detect or set the mail clients encoding so you should switch to characters that would work in every ansi table.
as alternative you could provide two links. one encoded with utf-8 with a note that, if it will display weird, the user might use the second mailto link.
or.. you provide one mailto link aswell as a drop down box to select encoding. (including a brief description of what encoding is as hover ontop of the dropdown)
It depends on the characters you want to use and your clients setups...
it may be Windows1250 ( http://en.wikipedia.org/wiki/Windows-1250 )
So, on G+ I was pointed to answer on other thread -https://stackoverflow.com/a/1831416/1190066. It seems like that different charsets fails on some combinations of browser/OS/email client. That nothing works on everything.
So I'll use the fact that it's intarnet app and it's Win 7, Chrome, Outlook 2007 setup and I'll convince admin to enable "Enable UTF-8 support for mailto: protocol." by registry key change on client company computers. - https://social.technet.microsoft.com/Forums/en-US/183b2442-6750-4e18-b61d-d87ce5f3aac3/outlook-2007-utf8-mailto-protocol-how-to-set-this-parameter-in-thousand-machines-?forum=officesetupdeploylegacy
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Protocols\Mailto
UTF8Encoding
The value “1” turn on the “Enable UTF-8 support for mailto: protocol”
The value “0” turn off
But this workround is viable only when you can modify settings of all client's computer.
So I was looking for posibilities that don't include that change. But I see that they arent.

HTML mailto subject and body display plus sign(+) instead of space in mail client

It's happening in
- Samsung Galaxy Note3 & 4
- Google Chrome browser V39.XX
I am using a link and when click it's launch the mail client
href="mailto:info#gmail.com?subject=Network%20issue"
result: subject=Network+issue
How to remove the plus(+) sign?
Your original approach should work... The only reason that comes to mind behind it not working is perhaps encodings are being mixed up along the way? Take a look at these threads to get a better idea of what I mean:
mailto special characters
Special characters in UTF8 mailto: subject= link and Outlook
I experimented with the base64 approach that is the answer in the second link but was unable to remedy the issue :-\ I tested this on Gmail, Inbox, and Mailbox - all with the same results as what you are describing above.
Maybe something is getting messed up at the Android layer in terms of how the link is being handed off to the mail client of your choice?
If you're rendering using Javascript, try encoding as a URL using encodeURIComponent
const urlEncoded = encodeURI(mailToLink);
before you pass the URL to the mailTo link. Otherwise encode it in the backend first.
This can be done with email addresses that contain special characters too.

Unicode characters in URLs

In 2010, would you serve URLs containing UTF-8 characters in a large web portal?
Unicode characters are forbidden as per the RFC on URLs (see here). They would have to be percent encoded to be standards compliant.
My main point, though, is serving the unencoded characters for the sole purpose of having nice-looking URLs, so percent encoding is out.
All major browsers seem to be parsing those URLs okay no matter what the RFC says. My general impression, though, is that it gets very shaky when leaving the domain of web browsers:
URLs getting copy+pasted into text files, E-Mails, even Web sites with a different encoding
HTTP Client libraries
Exotic browsers, RSS readers
Is my impression correct that trouble is to be expected here, and thus it's not a practical solution (yet) if you're serving a non-technical audience and it's important that all your links work properly even if quoted and passed on?
Is there some magic way of serving nice-looking URLs in HTML
http://www.example.com/düsseldorf?neighbourhood=Lörick
that can be copy+pasted with the special characters intact, but work correctly when re-used in older clients?
Use percent encoding. Modern browsers will take care of display & paste issues and make it human-readable. E. g. http://ko.wikipedia.org/wiki/위키백과:대문
Edit: when you copy such an url in Firefox, the clipboard will hold the percent-encoded form (which is usually a good thing), but if you copy only a part of it, it will remain unencoded.
What Tgr said. Background:
http://www.example.com/düsseldorf?neighbourhood=Lörick
That's not a URI. But it is an IRI.
You can't include an IRI in an HTML4 document; the type of attributes like href is defined as URI and not IRI. Some browsers will handle an IRI here anyway, but it's not really a good idea.
To encode an IRI into a URI, take the path and query parts, UTF-8-encode them then percent-encode the non-ASCII bytes:
http://www.example.com/d%C3%BCsseldorf?neighbourhood=L%C3%B6rick
If there are non-ASCII characters in the hostname part of the IRI, eg. http://例え.テスト/, they have be encoded using Punycode instead.
Now you have a URI. It's an ugly URI. But most browsers will hide that for you: copy and paste it into the address bar or follow it in a link and you'll see it displayed with the original Unicode characters. Wikipedia have been using this for years, eg.:
http://en.wikipedia.org/wiki/ɸ
The one browser whose behaviour is unpredictable and doesn't always display the pretty IRI version is...
...well, you know.
Depending on your URL scheme, you can make the UTF-8 encoded part "not important". For example, if you look at Stack Overflow URLs, they're of the following form:
http://stackoverflow.com/questions/2742852/unicode-characters-in-urls
However, the server doesn't actually care if you get the part after the identifier wrong, so this also works:
http://stackoverflow.com/questions/2742852/これは、これを日本語のテキストです
So if you had a layout like this, then you could potentially use UTF-8 in the part after the identifier and it wouldn't really matter if it got garbled. Of course this probably only works in somewhat specialised circumstances...
Not sure if it is a good idea, but as mentioned in other comments and as I interpret it, many Unicode chars are valid in HTML5 URLs.
E.g., href docs say http://www.w3.org/TR/html5/links.html#attr-hyperlink-href:
The href attribute on a and area elements must have a value that is a valid URL potentially surrounded by spaces.
Then the definition of "valid URL" points to http://url.spec.whatwg.org/, which defines URL code points as:
ASCII alphanumeric, "!", "$", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", ":", ";", "=", "?", "#", "_", "~", and code points in the ranges U+00A0 to U+D7FF, U+E000 to U+FDCF, U+FDF0 to U+FFFD, U+10000 to U+1FFFD, U+20000 to U+2FFFD, U+30000 to U+3FFFD, U+40000 to U+4FFFD, U+50000 to U+5FFFD, U+60000 to U+6FFFD, U+70000 to U+7FFFD, U+80000 to U+8FFFD, U+90000 to U+9FFFD, U+A0000 to U+AFFFD, U+B0000 to U+BFFFD, U+C0000 to U+CFFFD, U+D0000 to U+DFFFD, U+E1000 to U+EFFFD, U+F0000 to U+FFFFD, U+100000 to U+10FFFD.
The term "URL code points" is then used in a few parts of the parsing algorithm, e.g. for the relative path state:
If c is not a URL code point and not "%", parse error.
Also the validator http://validator.w3.org/ passes for URLs like "你好", and does not pass for URLs with characters like spaces "a b"
Related: Which characters make a URL invalid?
As all of these comments are true, you should note that as far as ICANN approved Arabic (Persian) and Chinese characters to be registered as Domain Name, all of the browser-making companies (Microsoft, Mozilla, Apple, etc.) have to support Unicode in URLs without any encoding, and those should be searchable by Google, etc.
So this issue will resolve ASAP.
For me this is the correct way, This just worked:
$linker = rawurldecode("$link");
<?php echo $linker ;?>
This worked, and now links are displayed properly:
http://newspaper.annahar.com/article/121638-معرض--جوزف-حرب-في-غاليري-جانين-ربيز-لوحاته-الجدية-تبحث-وتكتشف-وتفرض-الاحترام
Link found on:
http://www.galeriejaninerubeiz.com/newsite/news
Use percent-encoded form. Some (mainly old) computers running Windows XP for example do not support Unicode, but rather ISO encodings. That is the reason percent-encoded URLs were invented. Also, if you give a URL printed on paper to a user, containing characters that cannot be easily typed, that user may have a hard time typing it (or just ignore it). Percent-encoded form can even be used in many of the oldest machines that ever existed (although they don't support internet of course).
There is a downside though, as percent-encoded characters are longer than the original ones, thus possibly resulting in really long URLs. But just try to ignore it, or use a URL shortener (I would recommend goo.gl in this case, which makes a 13-character long URL). Also, if you don't want to register for a Google account, try bit.ly (bit.ly makes slightly longer URLs, with the length being 14 characters).

Displaying unicode symbols in HTML

I want to simply display the tick (✔) and cross (✘) symbols in a HTML page but it shows up as either a box or goop ✔ - obviously something to do with the encoding.
I have set the meta tag to show utf-8 but obviously I'm missing something.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Edit/Solution: From comments made, using FireBug I found the headers being passed by my page were in fact "Content-Type: text/html" and not UTF-8. Looking at the file format using Notepad++ showed my file was formatted as "UTF-8 without BOM". Changing this to just UTF-8 the symbols now show correctly... but firebug still seems to indicate the same content-type.
You should ensure the HTTP server headers are correct.
In particular, the header:
Content-Type: text/html; charset=utf-8
should be present.
The meta tag is ignored by browsers if the HTTP header is present.
Also ensure that your file is actually encoded as UTF-8 before serving it, check/try the following:
Ensure your editor save it as UTF-8.
Ensure your FTP or any file transfer program does not mess with the file.
Try with HTML encoded entities, like &#uuu;.
To be really sure, hexdump the file and look as the character, for the ✔, it should be E2 9C 94 .
Note: If you use an unicode character for which your system can't find a glyph (no font with that character), your browser should display a question mark or some block like symbol. But if you see multiple roman characters like you do, this denotes an encoding problem.
I know an answer has already been accepted, but wanted to point a few things out.
Setting the content-type and charset is obviously a good practice, doing it on the server is much better, because it ensures consistency across your application.
However, I would use UTF-8 only when the language of my application uses a lot of characters that are available only in the UTF-8 charset. If you want to show a unicode character or symbol in one of cases, you can do so without changing the charset of your page.
HTML renderers have always been able to display symbols which are not part of the encoding character set of the page, as long as you mention the symbol in its numeric character reference (NCR). Sounds weird but its true.
So, even if your html has a header that states it has an encoding of ansi or any of the iso charsets, you can display a check mark by using its html character reference, in decimal - ✓ or in hex - ✓
So its a little difficult to understand why you are facing this issue on your pages. Can you check if the NCR value is correct, this is a good reference http://www.fileformat.info/info/unicode/char/2713/index.htm
Make sure that you actually save the file as UTF-8, alternatively use HTML entities (&#nnn;) for the special characters.
Unlike proposed by Nicolas, the meta tag isn’t actually ignored by the browsers. However, the Content-Type HTTP header always has precedence over the presence of a meta tag in the document.
So make sure that you either send the correct encoding via the HTTP header, or don’t send this HTTP header at all (not recommended). The meta tag is mainly a fallback option for local documents which aren’t sent via HTTP traffic.
Using HTML entities should also be considered a workaround – that’s tiptoeing around the real problem. Configuring the web server properly prevents a lot of nuisance.
I think this is a file problem, you simple saved your file in 1-byte encoding like latin-1. Google up your editor and how to set files to utf-8.
I wonder why there are editors that don't default to utf-8.

mailto special characters

Is there a way to make the email client ( Outlook ) accept special characters coming from the mailto link in html? I'm trying to have a mailto link with german characters in the body, but in Outlook I get only strange characters.
Thanks
I just spent 2 days investigation this issue. Our issue was that mailto: links on our utf-8 encoded web pages did not work for Outlook users if the subject= string contained non-ascii characters, like e.g Norwegian characters. An example is:
"mailto:mail#coretrek.no?subject=julegløgg og fårikål"
From what I have learned so far, Outlook simply does not handle anything other than ASCII and iso-8859-1 characters. So when trying to click on the above mailto link (either from IE or Firefox), Outlook fails to decode the characters, leaving the subject broken and containing "weird" characters.
So the next step was to try to re-encode the pages in ISO-8859-1. What we did was to replace the original mailto link on the utf-8 page with a link to a "email-to-iso"-service, like this:
http://url.com/service.php?service=util.mailtoencode&mailto=mail%40coretrek.no%3Fsubject%3Demne+%C3%B8%C3%A6%C3%A5+emne
This page would convert the mailto characters to iso-8859-1 and then output the entire page content in iso-8859-1. A javascript on the page, containing "location.href='mailto:...'" was used to open the client's email client automatically.
So far everything seemed ok. This actually works in Internet Explorer, both with Thunderbird and Outlook (tested on IE7 on WinXP with Outlook express and TB 2).
BUT the problem now is actually Firefox. It seems like Firefox is unable to decode url-encoded urls containing characters found only in ISO-8859-1 but not in ASCII (like the norwegian å, represented by %E5 when encoded). The same å is handled correct if the page encoding is utf-8, but it seems like the Firefox developers have forgotten to test special characters together with the ISO-8859-1 charset.
The result is that Firefox passes an un-decoded string (still containing %E5 intstead of å) to the email client. And, amazingly, this is handled correct by Outlook (which manages to decode the string itself), but NOT by Thunderbird, which probably has the same bug as Firefox. If you DON't url encode the subject, the string is passed correctly to Thunderbird, but not to Outlook.
We have also been trying other encoding methods, like php's htmlentities, htmlspecialchars, base64 encoding etc, but all of them fails one way or the other.
So, summarized:
Pages encoded in utf-8:
IE fails always
FF -> Thunderbird: OK
FF -> Outlook: FAIL
Pages encoded in iso-8859-1:
IE: OK
FF -> Thunderbird: Fails if subject is url encoded, ok if not)
FF -> Outlook: Fails if subject is not url encoded, ok if encoded)
(this is Windows, on Ubuntu Linux FF and TB works OK always).
Hoping this was helpful for others having the same problem.
In PHP I think the function that works best with Outlook is rawurlencode()
I think using a urlencode method should do what you're looking for. JavaScript has .encodeURI() methods on string objects, and .NET has the HttpUtility.UrlEncode method.
What language are you using?
Actually, the solution is http://blogs.msdn.com/ie/archive/2007/02/12/International-Mailto-URIs-in-IE7.aspx and it is not nice.
Basically, in IE 7 and 8 the user must have enabled an advanced setting in Internet Options, something that 100% of the users will not know will not have enabled.
You need to enable UTF-8 support for the mailto: protocol
From the main outlook window, click Tools -> Options -> mail format -> international options -> "Enable UTF-8 support for mailto: protocol".
rawurlencode() function works best with outlook,
tested with Firefox, Chrome & IE
As yandr indicated, this issue is an ongoing problem with Outlook.
Microsoft has published documentation that states that properly configured Outlook 2003 and 2007 attached to a properly configured Exchange server will default to supporting Unicode, but that doesn't really help you with the general public.
For reference, the "standard" you want to refer to for this is RFC 2047.
The solution that I have implemented to get around this limitation (with Swedish, actually) is to use a web form instead of a mailto: link. It requires more setup on the server side, but gives you a lot more control over the contact process.
I'm sure this isn't what you wanted to hear, but until the world stops using broken software from Microsoft, we'll continue to need workarounds like this.
It sounds like you need the page containing the mailto link to be in the encoding that Outlook is expecting. Without knowing any more about the situation, I'd try encoding the page in UTF-8 and ISO-8859-1.
The relevant 'more about the situation' would be what weird characters appear and what the page's encoding is currently.
If one is using SharePoint 2010, it seems Microsoft has been aware of this issue, and has supplied some functions to solve this.
The following will properly escape the link to the current page
escapeProperly(escapeProperlyCoreCore($(location).attr('href'), false, false, true))
In JavaScript you can use encodeURIComponent function for subject and body. Then it will show all special characters in email.
const emailRequest = {
to: "abc#xyz.com",
cc: "abc#xyz.com",
subject: "Email Request - for <CompanyName>",
body: `Hi All, \r\n \r\n This is my company <CompanyName>
\r\n Thanks`,
};
const subject = encodeURIComponent(emailRequest.subject.replace("<CompanyName>", 'ABC & ** Company'));
const body = encodeURIComponent(emailRequest.body.replace("<CompanyName>", 'ABC & ** Company'));
window.location.href = (`mailto:${emailRequest.to}?cc=${emailRequest.cc}&subject=${subject}&body=${body}`);