I'm using ASP Classic/VBScript to send emails using CDO.Message object. It appears that the single quote or apostrophe character ’ (as opposed to the standard character ') shows up in the recipients email as: â?T
Where is the problem and what is the best way to resolve this? I actually tried running a replace to change all ’ to ' but it appears that didn't work.
I guess I'm really not even sure what the difference is between these two different characters, and why some sites, like Microsoft for example, use ’.
http://www.hanselman.com/blog/WhyTheAskObamaTweetWasGarbledOnScreenKnowYourUTF8UnicodeASCIIAndANSIDecodingMrPresident.aspx
all the info you could need on character encoding.
You need to set the correct character encoding on .BodyPart.Charset of your CDO.Message object.
Most likely you need to set it to "utf-8" as the default appears to be "us-ascii".
This indeed was a problem of character encoding. The solution was to put two lines of code in the web page that contains my form. I actually opted to add these lines of code to the top of my Global include file which I named inc_globals.asp. This file appears at the top of every page. Here's the code that fixed the problem:
Response.CodePage = 65001
Response.CharSet = "utf-8"
As a matter of documentation, here's a post that was helpful in solving this case:
http://groups.google.com/group/microsoft.public.inetserver.asp.general/browse_thread/thread/b79e6b95e24ef0fe/a25c643aaf12770d
Mails are written in HTML format. Have you tried using HTML Entities? For your apostrophe, it should be '.
In VB :
Replace mailBody, "'", "'"
Related
I am following a tutorial where a web application written in PHP, blacklists spaces from the input(The 'id' parameter). The task is to add other characters, which essentially bypasses this blacklist, but still gets interpreted by the MySQL database in the back end. What works is a URL constructed like so -
http://192.168.2.15/sqli-labs/Less-26/?id=1'%A0||%A0'1
Now, my question is simply that if '%A0' indicates an NBSP, then why is it that when I go to a site like http://www.url-encode-decode.com, and try to decode the URL http://192.168.2.15/sqli-labs/Less-26/?id=1'%A0||%A0'1, it gets decoded as http://192.168.2.15/sqli-labs/Less-26/?id=1'�||�'1.
Instead of the question mark inside a black box, I was expecting to see a blank space.
I suspect that this is due to differences between character encodings.
The value A0 represents nbsp in the ISO-8859-1 encoding (and probably in other extended-ASCII encodings too). The page at http://www.url-encode-decode.com appears to use the UTF-8 encoding.
Your problem is that there is no character represented by A0 in UTF-8. The equivalent nbsp character in UTF-8 would be represented by the value C2A0.
Decoding http://192.168.2.15/sqli-labs/Less-26/?id=1'%C2%A0||%C2%A0'1 will produce the nbsp characters that you expected.
Independently from why there is an encoding error, try %20 as a replacement for a whitespace!
Later on you can str_replace the whitespace with a
echo str_replace(" ", " ", $_GET["id"]);
Maybe the script on this site does not work properly. If you use it in your PHP code it should work properly.
echo urldecode( '%A0' );
outputs:
Other ASCII codes are doing the same thing.
Just to give you some background, these codes are part of the HTML that I'm reading from WordPress blog posts. I'm porting them over to BlogEngine.NET using a little C# WinForm app I wrote. Do I need to do some kind of conversion as I port them over to BlogEngine.NET (as XML files)?
It'd sure be nice if they just displayed properly without any intervention on my part.
Here's a code fragment from one of the WordPress source pages:
<link rel="alternate" type="application/rss+xml" title="INRIX® Traffic » Taking the “E” out of your “ETA” Comments Feed" href="http://www.inrixtraffic.com/blog/2012/taking-the-e-out-of-your-eta/feed/" />
Here's the corresponding chunk of XML that's in the XML file I output during the conversion:
<title>Taking the “E” out of your “ETA”</title>
UPDATE.
Tried this, but still no dice.
writer.WriteElementString("title", string.Format("<![CDATA[{0}]]>", post.Title));
...outputs this:
<title><![CDATA[Taking the “E” out of your “ETA”]]></title>
Since the data you are getting from Wordpress is already encoded you can decode it to a regular string and then let the XMLWriter encode it properly for XML.
string input = "Taking the “E” out of your “ETA”";
string decoded = System.Net.WebUtility.HtmlDecode(input);
//decoded = Taking the "E" out of your "ETA"
This may not be very efficient, but since this sounds like a one time conversion I don' think it will be an issue.
A similar question was asked here: How can I decode HTML characters in C#?
As I pointed out in my comment above: Your problem is that your Ü gets encoded into &8220;. When you output this in the browser it displays as Ü
I don't know how your porting works, but to fix this issue, you need to make sure that the & in the ASCII codes doesn't get encoded to &
Any chance CDATA tags solve the issue? Just make sure the text is correct in the source XML file. You don't need the ampersand magic (in the source) if you use CDATA tags.
<some_tag><![CDATA[Taking the “ out of your ...]]></some_tag>
Is the ampersand the only character that should be encoded in an HTML attribute?
It's well known that this won't pass validation:
Because the ampersand should be &. Here's a direct link to the validation fail.
This guy lists a bunch of characters that should be encoded, but he's wrong. If you encode the first "/" in http:// the href won't work.
In ASP.NET, is there a helper method already built to handle this? Stuff like Server.UrlEncode and HtmlEncode obviously don't work - those are for different purposes.
I can build my own simple extension method (like .ToAttributeView()) which does a simple string replace.
Other than standard URI encoding of the values, & is the only character related to HTML entities that you have to worry about simply because this is the character that begins every HTML entity. Take for example the following URL:
http://query.com/?q=foo<=bar>=baz
Even though there aren't trailing semi-colons, since < is the entity for < and > is the entity for >, some old browsers would translate this URL to:
http://query.com/?q=foo<=bar>=baz
So you need to specify & as & to prevent this from occurring for links within an HTML parsed document.
The purpose of escaping characters is so that they won't be processed as arguments. So you actually don't want to encode the entire url, just the values you are passing via the querystring. For example:
http://example.com/?parameter1=<ENCODED VALUE>¶meter2=<ENCODED VALUE>
The url you showed is actually a perfectly valid url that will pass validation. However, the browser will interpret the & symbols as a break between parameters in the querystring. So your querystring:
?q=whatever&lang=en
Will actually be translated by the recipient as two parameters:
q = "whatever"
lang = "en"
For your url to work you just need to ensure that your values are being encoded:
?q=<ENCODED VALUE>&lang=<ENCODED VALUE>
Edit: The common problems page from the W3C you linked to is talking about edge cases when urls are rendered in html and the & is followed by text that could be interpreted as an entity reference (© for example). Here is a test in jsfiddle showing the url:
http://jsfiddle.net/YjPHA/1/
In Chrome and FireFox the links works correctly, but IE renders © as ©, breaking the link. I have to admit I've never had a problem with this in the wild (it would only affect those entity references which don't require a semicolon, which is a pretty small subset).
To ensure you're safe from this bug you can HTML encode any of your URLS you render to the page and you should be fine. If you're using ASP.NET the HttpUtility.HtmlEncode method should work just fine.
You do not need HTML escapement here:
According to the HTML5 spec:
http://www.w3.org/TR/html5/tokenization.html#character-reference-in-attribute-value-state
&lang= should be parsed as non-recognized character reference and value of the attribute should be used as it is: http://domain.com/search?q=whatever&lang=en
For the reference: added question to HTML5 WG: http://lists.w3.org/Archives/Public/public-html/2011Sep/0163.html
In HTML attribute values, if you want ", '&' and a non-breaking space as a result, you should (as an author who is clear about intent) have ", & and in the markup.
For " though, you don't have to use " if you use single quotes to encase your attribute values.
For HTML text nodes, in addition to the above, if you want < and > as a result, you should use < and >. (I'd even use these in attribute values too.)
For hfnames and hfvalues (and directory names in the path) for URIs, I'd used Javascript's encodeURIComponent() (on a utf-8 page when encoding for use on a utf-8 page).
If I understand the question correctly, I believe this is what you want.
I've got a legacy app just starting to misbehave, for whatever reason I'm not sure. It generates a bunch of HTML that gets turned into PDF reports by ActivePDF.
The process works like this:
Pull an HTML template from a DB with tokens in it to be replaced (e.g. "~CompanyName~", "~CustomerName~", etc.)
Replace the tokens with real data
Tidy the HTML with a simple regex function that property formats HTML tag attribute values (ensures quotation marks, etc, since ActivePDF's rendering engine hates anything but single quotes around attribute values)
Send off the HTML to a web service that creates the PDF.
Somewhere in that mess, the non-breaking spaces from the HTML template (the s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character when viewing the document in a browser (FireFox). ActivePDF pukes on these non-UTF8 characters.
My question: since I don't know where the problem stems from and don't have time to investigate it, is there an easy way to re-encode or find-and-replace the bad characters? I've tried sending it through this little function I threw together, but it turns it all into gobbledegook doesn't change anything.
Private Shared Function ConvertToUTF8(ByVal html As String) As String
Dim isoEncoding As Encoding = Encoding.GetEncoding("iso-8859-1")
Dim source As Byte() = isoEncoding.GetBytes(html)
Return Encoding.UTF8.GetString(Encoding.Convert(isoEncoding, Encoding.UTF8, source))
End Function
Any ideas?
EDIT:
I'm getting by with this for now, though it hardly seems like a good solution:
Private Shared Function ReplaceNonASCIIChars(ByVal html As String) As String
Return Regex.Replace(html, "[^\u0000-\u007F]", " ")
End Function
Somewhere in that mess, the non-breaking spaces from the HTML template (the s) are encoding as ISO-8859-1 so that they show up incorrectly as an "Â" character
That'd be encoding to UTF-8 then, not ISO-8859-1. The non-breaking space character is byte 0xA0 in ISO-8859-1; when encoded to UTF-8 it'd be 0xC2,0xA0, which, if you (incorrectly) view it as ISO-8859-1 comes out as "Â ". That includes a trailing nbsp which you might not be noticing; if that byte isn't there, then something else has mauled your document and we need to see further up to find out what.
What's the regexp, how does the templating work? There would seem to be a proper HTML parser involved somewhere if your strings are (correctly) being turned into U+00A0 NON-BREAKING SPACE characters. If so, you could just process your template natively in the DOM, and ask it to serialise using the ASCII encoding to keep non-ASCII characters as character references. That would also stop you having to do regex post-processing on the HTML itself, which is always a highly dodgy business.
Well anyway, for now you can add one of the following to your document's <head> and see if that makes it look right in the browser:
for HTML4: <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
for HTML5: <meta charset="utf-8">
If you've done that, then any remaining problem is ActivePDF's fault.
If any one had the same problem as me and the charset was already correct, simply do this:
Copy all the code inside the .html file.
Open notepad (or any basic text editor) and paste the code.
Go "File -> Save As"
Enter you file name "example.html" (Select "Save as type: All Files (.)")
Select Encoding as UTF-8
Hit Save and you can now delete your old .html file and the encoding should be fixed
Problem:
Even I was facing the problem where we were sending '£' with some string in POST request to CRM System, but when we were doing the GET call from CRM , it was returning '£' with some string content. So what we have analysed is that '£' was getting converted to '£'.
Analysis:
The glitch which we have found after doing research is that in POST call we have set HttpWebRequest ContentType as "text/xml" while in GET Call it was "text/xml; charset:utf-8".
Solution:
So as the part of solution we have included the charset:utf-8 in POST request and it works.
In my case this (a with caret) occurred in code I generated from visual studio using my own tool for generating code. It was easy to solve:
Select single spaces ( ) in the document. You should be able to see lots of single spaces that are looking different from the other single spaces, they are not selected. Select these other single spaces - they are the ones responsible for the unwanted characters in the browser. Go to Find and Replace with single space ( ). Done.
PS: It's easier to see all similar characters when you place the cursor on one or if you select it in VS2017+; I hope other IDEs may have similar features
In my case I was getting latin cross sign instead of nbsp, even that a page was correctly encoded into the UTF-8. Nothing of above helped in resolving the issue and I tried all.
In the end changing font for IE (with browser specific css) helped, I was using Helvetica-Nue as a body font changing to the Arial resolved the issue .
I was having the same sort of problem. Apparently it's simply because PHP doesn't recognise utf-8.
I was tearing my hair out at first when a '£' sign kept showing up as '£', despite it appearing ok in DreamWeaver. Eventually I remembered I had been having problems with links relative to the index file, when the pages, if viewed directly would work with slideshows, but not when used with an include (but that's beside the point. Anyway I wondered if this might be a similar problem, so instead of putting into the page that I was having problems with, I simply put it into the index.php file - problem fixed throughout.
The reason for this is PHP doesn't recognise utf-8.
Here you can check it for all Special Characters in HTML
http://www.degraeve.com/reference/specialcharacters.php
Well I got this Issue too in my few websites and all i need to do is customize the content fetler for HTML entites. before that more i delete them more i got, so just change you html fiter or parsing function for the page and it worked. Its mainly due to HTML editors in most of CMSs. the way they store parse the data caused this issue (In My case). May this would Help in your case too
I have a backup server that automatically backs up my live site, both files and database.
On the live site, the text looks fine, but when you view the mirrored version of it, it displays '?' within some of the text. This text is stored within the news database table.
Here is a screenshot of it being on the live server and of it on the mirrored server.
What could happen within the process of backing it up to the mirrored server?
The live server is Solaris, and the mirrored server is Linux Red Hat Linux 5.
The following articles will be useful:
10.3 Specifying Character Sets and Collations
10.4 Connection Character Sets and Collations
After you connect to the database, issue the following command:
SET NAMES 'utf8';
Ensure that your web page also uses the UTF-8 encoding:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
PHP also offers several functions that will be useful for conversions:
iconv
mb_convert_encoding
Edit your Apache configuration file on the "mirror" server (the server with the problem), and comment-out the following line:
AddDefaultCharset UTF-8
Then restart Apache:
service httpd restart
The problem is that the "AddDefaultCharset UTF-8" line overrides the Content-Type specified in the .html files; e.g.:
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
The most common symptom is that character codes above 127 display as black diamonds with question marks on them (in Chrome, Safari or Firefox), or as little boxes (in Internet Explorer and Opera).
HTML files generated by Microsoft Word usually have many such characters, the most common one being character code 160 = 0xA0, which is equivalent to " " in the Windows-1252 encoding, and is often found between span tags, like this:
<span style="mso-spacerun: yes">ááá </span>
I got here looking for a solution for JavaScript displayed in the browser and although not directly related with a database...
In my case I copied and pasted some text I found on the Internet into a JavaScript file and saved it with Windows Notepad.
When the page that uses that JavaScript file output the strings, there were question marks (like the ones shown in the question) instead of the special characters like accented letters, etc.
I opened the file using Notepad++. Right after opening the file I saw that the character encoding was set as ANSI as you can see (mouse cursor on footer) in the following screenshot:
To solve the issue, click the Encoding menu in Notepad++ and select Encode in UTF-8. You should be good to go. :)
This is going to be something to do with character encodings.
Are you sure the mirrored site has the same properties with regards to character encodings as your main server?
Depending on what sort of server you have, this may be a property of the server process itself, or it could be an environment variable.
For example, if this is a UNIX environment, perhaps try comparing LANG or LC_ALL?
See also here
Unicode or other character set characters falling through?
I have seen similar "strange" characters show up on sites I have worked on often when the text is copied from an email or some other document format (e.g. word) into a text editor. The editor can display the non ASCII characters but the browser can't. For the website, I would suggest looking up the HTML entity code for the character and inserting that instead ... or switch to more standard ones.
Your browser hasn't interpreted the encoding of the page correctly (either because you've forced it to a particular setting, or the page is set incorrectly), and thus cannot display some of the characters.
Check the character set being emitted by your mirrored server. There appears to be a difference from that to the main server -- the live site appears to be outputting Unicode, where the mirror is not. Also, it's usually a good idea to scrub Unicode characters in your incoming content and replace them with their appropriate HTML entities.
Your specific issue regards "smart quotes," "em dashes" and "en dashes." I know you can replace em dashes with — and n-dashes with – (which should be done on the input side of your database); I don't know what the correct replacement for the smart quotes would be. (I usually just replace all curly single quotes with ' and all curly double quotes with " ... Typography geeks may feel free to shoot me on sight.)
I should note that some browsers are more forgiving than others with this issue -- Internet Explorer on Windows tends to auto-magically detect and "fix" this; Firefox and most other browsers display the question marks.
I had this issue so I just took all my content, copy/pasted it into Notepad, made a new PHP file, pasted back in, re-saved and overwrote, and.. that worked!
It really was some relic of Microsoft Word editing...
I usually curse MS Word and then run the following Windows Script Host script.
// Replace with path to a file that needs cleaning
PATH = "test.html"
var go = WScript.CreateObject("Scripting.FileSystemObject");
var content = go.GetFile(PATH).OpenAsTextStream().ReadAll();
var out = go.CreateTextFile("clean-"+PATH, true);
// Symbols
content = content.replace(/“/g, '"');
content = content.replace(/”/g, '"');
content = content.replace(/’/g, "'");
content = content.replace(/–/g, "-");
content = content.replace(/©/g, "©");
content = content.replace(/®/g, "®");
content = content.replace(/°/g, "°");
content = content.replace(/¶/g, "<p>");
content = content.replace(/¿/g, "¿");
content = content.replace(/¡/g, '¡');
content = content.replace(/¢/g, '¢');
content = content.replace(/£/g, '£');
content = content.replace(/¥/g, '¥');
out.Write(content);