Garbled text in mail subject using mailto: under ASP.NET - html

This is my first question on StackOverflow so if I'm doing something wrong when asking this question I welcome any pointers as to how I should've posed it instead, or any further information needed.
I have developed a small ASP.NET/C# site which generates mailto: links with a preset subject and body. However, for some reason the links garble non-ASCII letters (e.g. åäö) when opened in Outlook 2003. In Outlook 2010 it seems to work.
Sample code (apologies for the Swedish):
<a href='mailto:" + emails + "?subject=Inflödning till " + language +
" för jobb nr " + projectID + " är klar. Tidsåtgång: " + time + "'>
Skicka mail till PL?</a>
(note that this happens on static links without C# variables as well)
Garbled text sample from Outlook 2003 mail window:
InflÃ¶dning till en fÃ¶r jobb nr 111111 Ã¤r klar. TidsÃ¥tgÃ¥ng: 1
I have specified UTF-8 encoding in the Web.config but I'm assuming this isn't the problem. I probably have to specify the encoding in the subject itself, but I am not sure about how to do that.
Edit: It would seem Outlook 2003 has a tough time handling UTF-8 mailto support. See for example this question. Outlook 2010 has an explicit "UTF-8 support for mailto protocol" switch under options. 2003 is missing this. Any ideas on how to get around this? UrlEncoding() doesn't seem to help.

Make sure you have the character encoding set
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
This should appear at the top of your page, with charset= set to whatever character set is used for Swedish (Google has led me to believe it is ISO-8859-1).
Take a look here, which seems to use a JavaScript function to escape the characters correctly. I believe HttpUtility.UrlEncode(String) in the code-behind will have the same effect.
Does Outlook perhaps have a different encoding specified?
EDIT:
Found this here
In versions of Outlook prior to the 2007 version, Outlook would assume the system codepage had been used to encode the URI. This means that this scenario would only work with older versions of Outlook, if the document you’re viewing has the same character encoding as your current system codepage.
This seems to point to the problem: the system code page on the client is not the one used for Swedish (ISO-8859-1), so the remedy appears to be one of:
Upgrade Outlook
Change the system encoding to ISO-8859-1 (on the client that's running Outlook)
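To make the code page mismatch concrete, here is a minimal sketch showing how the same subject text percent-encodes differently depending on the byte encoding. encodeURIComponent always uses UTF-8; encodeLatin1 is a hypothetical helper written for this illustration, not part of any library.

```javascript
// Percent-encode using UTF-8 bytes (this is what encodeURIComponent does).
function encodeUtf8(text) {
  return encodeURIComponent(text);
}

// Hypothetical helper: percent-encode using ISO-8859-1 bytes instead.
// Throws if the text contains characters outside ISO-8859-1.
function encodeLatin1(text) {
  let out = "";
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code > 0xff) {
      throw new RangeError(`Not representable in ISO-8859-1: ${ch}`);
    }
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch; // unreserved characters stay as they are
    } else {
      out += "%" + code.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

const subject = "är klar";
console.log(encodeUtf8(subject));   // %C3%A4r%20klar
console.log(encodeLatin1(subject)); // %E4r%20klar
```

A client that expects one form and receives the other will show garbled or undecoded characters, which matches the behaviour described in the quote.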

Related

Which charset should I use in mailto body for maximum compatibility?

I have a UTF-8 page containing mailto: links with a body template. The template and its data include Eastern European (Czech) characters. These characters get mangled in some email clients, such as Outlook 2007, where they show up as question marks or other strange characters.
I know about the "Enable UTF-8 support for mailto: protocol." setting in Outlook 2007, but as far as I know it's off by default.
Which charset encoding should I use in the mailto body for maximum compatibility?
It clearly depends on the client settings. In the company I work for we had a similar question. We ended up using ae for ä and so on.
We faced the problem that some clients running MS Outlook 2003 were reading it as UTF-8, while other clients running Outlook 2003 were reading it using a defined ANSI code page.
Another issue is the use of other mail clients. We also have a bring-your-own-device program where employees can install Linux or have MacBooks. They use Mozilla Thunderbird, mutt and so on. Other clients have set Gmail as their preferred mailto: handler, etc. So a global change of the Outlook settings through a registry edit, as in Rfilip's answer, would not apply to us.
Anyway, you can only define the text body of the mail. Sadly, it's impossible to define an HTML body through a mailto: link. If that worked, the encoding could be defined through the body.
There is also no flag to set the preferred encoding inside the mailto: link.
Answer to this question: it's impossible to detect or set the mail client's encoding, so you should switch to characters that work in every ANSI code page.
As an alternative, you could provide two links: one encoded with UTF-8, with a note that if it displays weirdly the user can use the second mailto link.
Or you provide one mailto link as well as a drop-down box to select the encoding (including a brief description of what encoding is, as hover text on the drop-down).
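The "ae for ä" fallback could be sketched in JavaScript roughly as follows. The replacement table and the function name toAsciiFallback are assumptions made for this illustration; a real table would need to cover whatever languages your users write in.

```javascript
// Hypothetical replacement table; extend it for the languages you need.
const ASCII_FALLBACKS = new Map([
  ["ä", "ae"], ["ö", "oe"], ["ü", "ue"], ["ß", "ss"],
  ["Ä", "Ae"], ["Ö", "Oe"], ["Ü", "Ue"],
]);

// Reduce text to plain ASCII so it survives any single-byte code page.
function toAsciiFallback(text) {
  let out = "";
  for (const ch of text.normalize("NFC")) {
    if (ch.codePointAt(0) < 0x80) {
      out += ch; // already ASCII
    } else if (ASCII_FALLBACKS.has(ch)) {
      out += ASCII_FALLBACKS.get(ch);
    } else {
      // Otherwise drop diacritics (č -> c, å -> a); use "?" if that fails.
      const stripped = ch.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
      out += /^[\x00-\x7f]+$/.test(stripped) ? stripped : "?";
    }
  }
  return out;
}

console.log(toAsciiFallback("für Größe")); // fuer Groesse
console.log(toAsciiFallback("čeština"));   // cestina
```

The result can then be passed through encodeURIComponent like any other subject or body text, since only ASCII characters remain.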
It depends on the characters you want to use and your clients setups...
it may be Windows1250 ( http://en.wikipedia.org/wiki/Windows-1250 )
So, on G+ I was pointed to an answer in another thread: https://stackoverflow.com/a/1831416/1190066. It seems that different charsets fail on some combinations of browser, OS and email client, and that nothing works everywhere.
So I'll use the fact that it's an intranet app with a Win 7, Chrome, Outlook 2007 setup, and I'll convince the admin to turn on "Enable UTF-8 support for mailto: protocol." by a registry key change on the client company's computers: https://social.technet.microsoft.com/Forums/en-US/183b2442-6750-4e18-b61d-d87ce5f3aac3/outlook-2007-utf8-mailto-protocol-how-to-set-this-parameter-in-thousand-machines-?forum=officesetupdeploylegacy
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Protocols\Mailto
UTF8Encoding
The value “1” turns on “Enable UTF-8 support for mailto: protocol”
The value “0” turns it off
But this workaround is viable only when you can modify the settings on all of the client's computers.
So I was looking for possibilities that don't involve that change, but I see that there aren't any.

umlaut not working on some browsers

Hi Guys This is a rather frustrating problem...
I have a German client and the site is mostly in German. On some machines in the office all the special characters (umlauts and such) display correctly in Chrome, Firefox and IE, but on the client's PC and my own the characters are displayed like this:
Interessierte Paare, die an diesem spektakulÃ¤ren Ort in der sÃ¼dlichsten Hochzeitskapelle Afrikas heiraten mÃ¶chten, wenden sich bitte an
here is my code in
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Now, from what I have found online while researching this, UTF-8 is supposed to handle all characters. Obviously I am missing something here?
I find it rather strange that it doesn't work on the same version of a browser on two different machines.
If you would like to see the site for yourself, it's capeagulhas-arthouse.com (not sure if I can have links here, but there it is).
Please, any help is much appreciated.
EDIT: forgot to mention this is a joomla site
Tx
Ant
I see that you are using Joomla; Joomla uses ISO-8859-1 encoding by default.
Go to your language file and look for
DEFINE('_ISO','charset=iso-8859-1');
Change it to
DEFINE('_ISO','charset=utf-8');
An incorrect code conversion has been applied. From the given data, I cannot deduce where it happened or how it can be prevented. But what has clearly happened is that UTF-8 encoded text has been incorrectly transformed so that each byte of the UTF-8 data is treated as if it were a windows-1252 encoded character, and the result is then presented as UTF-8.
It is such an incorrect “double UTF-8” encoding that turns e.g. “ä” into “Ã¤”. The letter “ä” (U+00E4) is the two bytes C3 A4 (hex) in UTF-8. If you mistakenly interpret these as windows-1252, you get U+00C3 U+00A4, i.e. “Ã¤”.
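The byte-level mix-up described here can be reproduced in a few lines. This is a sketch for Node.js (it relies on the global Buffer), and it uses Node's "latin1" encoding as a stand-in for windows-1252; the two agree for the bytes C3 and A4 but differ in the 0x80 to 0x9F range. The function names garble and ungarble are made up for the example.

```javascript
// Simulate the faulty conversion: take the UTF-8 bytes of the text and
// read each byte as one single-byte character.
function garble(text) {
  return Buffer.from(text, "utf8").toString("latin1");
}

// Undo it: turn each character back into one byte, then decode as UTF-8.
function ungarble(text) {
  return Buffer.from(text, "latin1").toString("utf8");
}

console.log(garble("ä"));                 // Ã¤
console.log(ungarble("spektakulÃ¤ren")); // spektakulären
```

Being able to reverse the damage this way is a useful diagnostic: if ungarble produces readable text, the content was UTF-8 that got decoded as a single-byte encoding somewhere along the way.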

Characters not displaying correctly in different browsers

I used certain characters in website such as • — “ ” ‘ ’ º ©.
I found that when testing to see what my website looked like under different browsers (BrowserLab)
the afore-mentioned characters are replaced with �.
I then changed the charset in the webpage header from:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
to
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Suddenly all the pages have the above mentioned characters replaced with a ?.
Even more puzzling is this is not always consistent across and even within the same page, as some sections display the character • and © correctly.
In particular, I need to replace the character • with one that will display across browsers, can anyone help me with the answer? Thanks.
You should save your HTML source as UTF-8.
Alternatively, you can use HTML entities instead.
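As a rough sketch of the entity approach, the hypothetical helper below rewrites every non-ASCII character as a numeric character reference, so the markup no longer depends on which encoding the file is saved or served in. The name toNumericEntities is an assumption for this example.

```javascript
// Replace each non-ASCII character with a decimal numeric character
// reference such as &#8226; so the output is pure ASCII.
function toNumericEntities(text) {
  let out = "";
  for (const ch of text) {
    const code = ch.codePointAt(0);
    out += code < 0x80 ? ch : `&#${code};`;
  }
  return out;
}

console.log(toNumericEntities("• item © 2010")); // &#8226; item &#169; 2010
```

Note that this only escapes non-ASCII characters; it does not escape <, > or &, so it is not a substitute for HTML-escaping untrusted input.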
The source code needs to be saved in the same encoding as you're instructing the browser to parse it in. If you're saving your files in UTF-8, instruct the browser to parse it as UTF-8 by setting an appropriate HTTP header or HTML meta tag (headers preferable, your web server may be setting one without you knowing). Use a decent editor that clearly tells you what encoding you're saving the file as. If it doesn't display correctly, there's a discrepancy between what you're telling your browser the file is encoded in and what it's really encoded in.
Check to see if Apache is set up to send the charset. Look for the directive "AddDefaultCharset" and set it to Off in .htaccess or your config file.
Most/all browsers will take what is sent in the HTTP headers over what is in the document.
If you're using Notepad++, I suggest using the EditPlus editor instead to copy the text (which has the special characters) and paste it into your file. This should work.
Yes, I had this problem too in Notepad++; copying and pasting wasn't working with some symbols.
I think SLaks is right.
The HTML entity for the copyright symbol is &#169;

mailto special characters

Is there a way to make the email client ( Outlook ) accept special characters coming from the mailto link in html? I'm trying to have a mailto link with german characters in the body, but in Outlook I get only strange characters.
Thanks
I just spent 2 days investigating this issue. Our issue was that mailto: links on our UTF-8 encoded web pages did not work for Outlook users if the subject= string contained non-ASCII characters, e.g. Norwegian characters. An example is:
"mailto:mail@coretrek.no?subject=julegløgg og fårikål"
From what I have learned so far, Outlook simply does not handle anything other than ASCII and iso-8859-1 characters. So when trying to click on the above mailto link (either from IE or Firefox), Outlook fails to decode the characters, leaving the subject broken and containing "weird" characters.
So the next step was to try to re-encode the pages in ISO-8859-1. What we did was to replace the original mailto link on the utf-8 page with a link to a "email-to-iso"-service, like this:
http://url.com/service.php?service=util.mailtoencode&mailto=mail%40coretrek.no%3Fsubject%3Demne+%C3%B8%C3%A6%C3%A5+emne
This page would convert the mailto characters to iso-8859-1 and then output the entire page content in iso-8859-1. A javascript on the page, containing "location.href='mailto:...'" was used to open the client's email client automatically.
So far everything seemed OK. This actually works in Internet Explorer, both with Thunderbird and Outlook (tested on IE7 on WinXP with Outlook Express and TB 2).
BUT the problem now is actually Firefox. It seems that Firefox is unable to decode URL-encoded URLs containing characters found in ISO-8859-1 but not in ASCII (like the Norwegian å, represented by %E5 when encoded). The same å is handled correctly if the page encoding is UTF-8, but it seems the Firefox developers have forgotten to test special characters together with the ISO-8859-1 charset.
The result is that Firefox passes an un-decoded string (still containing %E5 instead of å) to the email client. And, amazingly, this is handled correctly by Outlook (which manages to decode the string itself), but NOT by Thunderbird, which probably has the same bug as Firefox. If you DON'T URL-encode the subject, the string is passed correctly to Thunderbird, but not to Outlook.
We have also tried other encoding methods, like PHP's htmlentities, htmlspecialchars, base64 encoding etc., but all of them fail one way or another.
So, summarized:
Pages encoded in UTF-8:
IE: always fails
FF -> Thunderbird: OK
FF -> Outlook: FAIL
Pages encoded in ISO-8859-1:
IE: OK
FF -> Thunderbird: fails if the subject is URL-encoded, OK if not
FF -> Outlook: fails if the subject is not URL-encoded, OK if encoded
(This is on Windows; on Ubuntu Linux, FF and TB always work OK.)
Hoping this was helpful for others having the same problem.
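One possible way to see why %E5 causes trouble: it is a valid ISO-8859-1 byte for å, but on its own it is not valid UTF-8, so a decoder that assumes UTF-8 cannot turn it back into a character. The sketch below only illustrates that mismatch with JavaScript's built-in decodeURIComponent; it does not claim to reproduce what Firefox or Thunderbird do internally, and both helper names are invented for the example.

```javascript
// Decode percent-escapes as UTF-8; return null if the bytes are not valid UTF-8.
function tryDecodeUtf8(encoded) {
  try {
    return decodeURIComponent(encoded);
  } catch (err) {
    return null; // decodeURIComponent throws URIError on malformed input
  }
}

// Decode percent-escapes treating every byte as one ISO-8859-1 character.
function decodeAsLatin1(encoded) {
  return encoded.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(tryDecodeUtf8("f%C3%A5rik%C3%A5l")); // fårikål
console.log(tryDecodeUtf8("f%E5rik%E5l"));       // null
console.log(decodeAsLatin1("f%E5rik%E5l"));      // fårikål
```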
In PHP I think the function that works best with Outlook is rawurlencode()
I think using a URL-encoding method should do what you're looking for. JavaScript has the global encodeURI() and encodeURIComponent() functions, and .NET has the HttpUtility.UrlEncode method.
What language are you using?
Actually, the solution is http://blogs.msdn.com/ie/archive/2007/02/12/International-Mailto-URIs-in-IE7.aspx and it is not nice.
Basically, in IE 7 and 8 the user must have enabled an advanced setting in Internet Options, something that 100% of users will neither know about nor have enabled.
You need to enable UTF-8 support for the mailto: protocol
From the main Outlook window, click Tools -> Options -> Mail Format -> International Options -> "Enable UTF-8 support for mailto: protocol".
rawurlencode() function works best with outlook,
tested with Firefox, Chrome & IE
As yandr indicated, this issue is an ongoing problem with Outlook.
Microsoft has published documentation that states that properly configured Outlook 2003 and 2007 attached to a properly configured Exchange server will default to supporting Unicode, but that doesn't really help you with the general public.
For reference, the "standard" you want to refer to for this is RFC 2047.
The solution that I have implemented to get around this limitation (with Swedish, actually) is to use a web form instead of a mailto: link. It requires more setup on the server side, but gives you a lot more control over the contact process.
I'm sure this isn't what you wanted to hear, but until the world stops using broken software from Microsoft, we'll continue to need workarounds like this.
It sounds like you need the page containing the mailto link to be in the encoding that Outlook is expecting. Without knowing any more about the situation, I'd try encoding the page in UTF-8 and ISO-8859-1.
The relevant 'more about the situation' would be what weird characters appear and what the page's encoding is currently.
If one is using SharePoint 2010, it seems Microsoft has been aware of this issue, and has supplied some functions to solve this.
The following will properly escape the link to the current page
escapeProperly(escapeProperlyCoreCore($(location).attr('href'), false, false, true))
In JavaScript you can use the encodeURIComponent function for the subject and body. Then all special characters will show up in the email.
const emailRequest = {
to: "abc@xyz.com",
cc: "abc@xyz.com",
subject: "Email Request - for <CompanyName>",
body: `Hi All, \r\n \r\n This is my company <CompanyName>
\r\n Thanks`,
};
// Fill in the placeholder first, then percent-encode the whole value
const subject = encodeURIComponent(emailRequest.subject.replace("<CompanyName>", 'ABC & ** Company'));
const body = encodeURIComponent(emailRequest.body.replace("<CompanyName>", 'ABC & ** Company'));
// Opens the user's default mail client with the prefilled fields
window.location.href = (`mailto:${emailRequest.to}?cc=${emailRequest.cc}&subject=${subject}&body=${body}`);

Question mark characters display within text. Why is this?

I have a backup server that automatically backs up my live site, both files and database.
On the live site, the text looks fine, but when you view the mirrored version of it, it displays '?' within some of the text. This text is stored within the news database table.
Here is a screenshot of it being on the live server and of it on the mirrored server.
What could happen within the process of backing it up to the mirrored server?
The live server is Solaris, and the mirrored server is Red Hat Linux 5.
The following articles will be useful:
10.3 Specifying Character Sets and Collations
10.4 Connection Character Sets and Collations
After you connect to the database, issue the following command:
SET NAMES 'utf8';
Ensure that your web page also uses the UTF-8 encoding:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
PHP also offers several functions that will be useful for conversions:
iconv
mb_convert_encoding
Edit your Apache configuration file on the "mirror" server (the server with the problem), and comment-out the following line:
AddDefaultCharset UTF-8
Then restart Apache:
service httpd restart
The problem is that the "AddDefaultCharset UTF-8" line overrides the Content-Type specified in the .html files; e.g.:
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
The most common symptom is that character codes above 127 display as black diamonds with question marks on them (in Chrome, Safari or Firefox), or as little boxes (in Internet Explorer and Opera).
HTML files generated by Microsoft Word usually have many such characters, the most common one being character code 160 = 0xA0, which is equivalent to "&nbsp;" in the Windows-1252 encoding, and is often found between span tags, like this:
<span style="mso-spacerun: yes">ááá </span>
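The effect of decoding such a byte with the wrong charset can be checked with a short Node.js sketch (it relies on the global Buffer, and uses Node's "latin1" as a stand-in for Windows-1252, which agrees with it for byte 0xA0). The helper name decodeBytes and the sample bytes are assumptions for the illustration.

```javascript
// Decode raw bytes with an encoding name understood by Node's Buffer.
function decodeBytes(bytes, encoding) {
  return Buffer.from(bytes).toString(encoding);
}

// "a", a lone 0xA0 byte (non-breaking space in a single-byte code page), "b"
const wordBytes = [0x61, 0xa0, 0x62];

const asSingleByte = decodeBytes(wordBytes, "latin1");
const asUtf8 = decodeBytes(wordBytes, "utf8");

console.log(asSingleByte.codePointAt(1) === 0x00a0); // true: non-breaking space
console.log(asUtf8.codePointAt(1) === 0xfffd);       // true: replacement character
```

U+FFFD is the character browsers typically draw as a black diamond with a question mark, so seeing it in the decoded text is a sign that single-byte content is being served under a UTF-8 charset header.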
I got here looking for a solution for JavaScript displayed in the browser and although not directly related with a database...
In my case I copied and pasted some text I found on the Internet into a JavaScript file and saved it with Windows Notepad.
When the page that uses that JavaScript file output the strings, there were question marks (like the ones shown in the question) instead of the special characters like accented letters, etc.
I opened the file using Notepad++. Right after opening the file I saw that the character encoding was set as ANSI as you can see (mouse cursor on footer) in the following screenshot:
To solve the issue, click the Encoding menu in Notepad++ and select Encode in UTF-8. You should be good to go. :)
This is going to be something to do with character encodings.
Are you sure the mirrored site has the same properties with regards to character encodings as your main server?
Depending on what sort of server you have, this may be a property of the server process itself, or it could be an environment variable.
For example, if this is a UNIX environment, perhaps try comparing LANG or LC_ALL?
See also here
Unicode or other character set characters falling through?
I have seen similar "strange" characters show up on sites I have worked on often when the text is copied from an email or some other document format (e.g. word) into a text editor. The editor can display the non ASCII characters but the browser can't. For the website, I would suggest looking up the HTML entity code for the character and inserting that instead ... or switch to more standard ones.
Your browser hasn't interpreted the encoding of the page correctly (either because you've forced it to a particular setting, or the page is set incorrectly), and thus cannot display some of the characters.
Check the character set being emitted by your mirrored server. There appears to be a difference from that to the main server -- the live site appears to be outputting Unicode, where the mirror is not. Also, it's usually a good idea to scrub Unicode characters in your incoming content and replace them with their appropriate HTML entities.
Your specific issue regards "smart quotes," "em dashes" and "en dashes." I know you can replace em dashes with &mdash; and en dashes with &ndash; (which should be done on the input side of your database); I don't know what the correct replacement for the smart quotes would be. (I usually just replace all curly single quotes with ' and all curly double quotes with " ... Typography geeks may feel free to shoot me on sight.)
I should note that some browsers are more forgiving than others with this issue -- Internet Explorer on Windows tends to auto-magically detect and "fix" this; Firefox and most other browsers display the question marks.
I had this issue so I just took all my content, copy/pasted it into Notepad, made a new PHP file, pasted back in, re-saved and overwrote, and.. that worked!
It really was some relic of Microsoft Word editing...
I usually curse MS Word and then run the following Windows Script Host script.
// Replace with the path to a file that needs cleaning
var PATH = "test.html";
var go = WScript.CreateObject("Scripting.FileSystemObject");
// OpenAsTextStream() with no arguments opens for reading in the default (ASCII/ANSI) format
var content = go.GetFile(PATH).OpenAsTextStream().ReadAll();
var out = go.CreateTextFile("clean-" + PATH, true);
// Typographic quotes and dashes: swap for plain ASCII equivalents
content = content.replace(/“/g, '"');
content = content.replace(/”/g, '"');
content = content.replace(/’/g, "'");
content = content.replace(/–/g, "-");
// Other symbols: swap for HTML entities
content = content.replace(/©/g, "&copy;");
content = content.replace(/®/g, "&reg;");
content = content.replace(/°/g, "&deg;");
content = content.replace(/¶/g, "<p>");
content = content.replace(/¿/g, "&iquest;");
content = content.replace(/¡/g, "&iexcl;");
content = content.replace(/¢/g, "&cent;");
content = content.replace(/£/g, "&pound;");
content = content.replace(/¥/g, "&yen;");
out.Write(content);
out.Close();