Database content in UTF-8 but show in ISO-8859-1 webpage - sql-server-2008

Formerly my webpage used the charset UTF-8, and it inserted a lot of content to my SQL Server 2008 database with this charset.
Now my webpage is using the charset ISO-8859-1, but it is still using the same content from the database. My problem is that the content in the database is still in the old charset.
Is there a way to convert everything in the database from one charset to another? Either once for everything, or through the connection string?

Well, first of all, you are probably already using an NVARCHAR or NTEXT field in your database. Hence the content of the field is encoded as Unicode.
It would be nice to assume that your original posting form posted using UTF-8 encoding and your receiving page had its Response.Codepage set to 65001 so that the incoming string is stored in the database with fidelity.
If the foregoing is true, then sending the content to the client in a new charset is a simple matter of setting the page codepage correctly. For ISO-8859-1 we use codepage 1252 (Windows-1252, which browsers treat as ISO-8859-1). With the codepage set to 1252, any data sent using Response.Write will be converted from the native Unicode to the 1252 codepage.
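The conversion from Unicode to codepage 1252 described above can be illustrated outside ASP. This is only a Python sketch, not what Response.Write does internally; the sample strings are made up for the example. It shows that Windows-1252 covers slightly more than strict ISO-8859-1, and that characters outside the codepage have to be replaced or escaped.

```python
# Unicode text, as it would come out of an NVARCHAR column (sample data).
text = "Crème brûlée costs €5"

# Windows-1252 can represent every character in the sample, including the euro sign.
cp1252_bytes = text.encode("cp1252")

# Strict ISO-8859-1 has no euro sign, so the same text cannot be encoded.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# A character outside the target codepage is either lost or must be escaped.
hebrew = "א"
replaced = hebrew.encode("cp1252", errors="replace")           # becomes a question mark
escaped = hebrew.encode("cp1252", errors="xmlcharrefreplace")  # becomes an HTML numeric reference
```

If the stored content contains characters that 1252 cannot represent, escaping them as numeric character references is one way to keep them visible on an ISO-8859-1 page.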
However, it is also quite possible that you have got by with corrupt data being stored in the DB while it all looked fine in HTML. See my answer here to an older question for detail on how that might be. That same answer contains the steps needed to repair the data in the DB. After that, setting the output codepage should be sufficient.
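One way such corruption can stay hidden is when two encoding mistakes cancel each other out. The Python sketch below assumes a specific scenario that the answer does not spell out: the form posts UTF-8, the server reads the bytes as Windows-1252, and the page later writes them back as 1252 while declaring UTF-8. The variable names are hypothetical.

```python
original = "café"

# Step 1: the browser posts the form value as UTF-8 bytes.
posted = original.encode("utf-8")

# Step 2 (the mistake): the server interprets those bytes as Windows-1252,
# so the database ends up holding two wrong characters instead of one "é".
stored = posted.decode("cp1252")

# Step 3: the page writes the stored text back out as 1252 bytes,
# which happen to be the original UTF-8 bytes again.
sent = stored.encode("cp1252")

# Step 4: the browser was told the page is UTF-8, so it decodes them correctly.
shown = sent.decode("utf-8")
```

Under these assumptions `shown` equals the original text even though `stored` is wrong, which is why the problem only surfaces once the page charset changes.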
Note that the ASP file itself should be saved as Windows-1252 and not UTF-8, otherwise any non-ASCII static content in the file will be interpreted incorrectly by the client.

Related

UTF Hebrew Letter Encoding issue with ASP.Net MVC on Mono

I feel I'm a bit in over my head on this one. I have developed an ASP.Net MVC website for a friend that allows them to paste in Hebrew words and it does some conversion/translation. I am using MySQL as a data backend with ASP.Net MVC 5.
The website is fairly simple. The database consists of two tables which store letters, and translations. I am using MySQL EF6 for data access layer. There are basically three screens on the website, one for managing each table, and one for doing the translations.
When I run it in my development environment (VS 2017/Windows 10), everything works as expected. I can edit data using the Hebrew Unicode characters and they save properly to the database. Here is an example:
When I click Save, I expect those values to be saved to the database, and they are. However, I have recently converted the website to run on a Mono/Ubuntu environment for hosting. I got the environment set up using mod_mono and Apache2. Everything is working perfectly, except when I save a page like this, the Hebrew character א gets converted into a question mark (?):
Here's what I've determined so far.
I know Apache/MySQL is set up properly to handle these values, because the data displays fine. It only gets messed up when I save it.
I am also running PhpMyAdmin on the same server, and when I modify that same row through the table editor, it does not mess up the encoding.
I've tried adding the Default Encoding utf-8 to the Apache configuration with no luck.
I've tried adding globalization with default encodings of utf-8 to web.config and it didn't help.
How do I troubleshoot where the value is getting messed up? Is there a simple solution I need to apply to fix this?
Thanks!
The bytes to be stored are not encoded as utf8/utf8mb4. Fix this.
The column in the database is CHARACTER SET utf8 (or utf8mb4). Fix this.
Also, check that the connection during reading is UTF-8.
HTML forms should start like <form accept-charset="UTF-8">.
For more discussion, see Trouble with utf8 characters; what I see is not what I stored
If that is not enough to solve your problem, find the HEX, as discussed in "Test the data" in that link; then ask for more help.
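Looking at the hex makes it clear whether the character was already lost before it reached the table. On the MySQL side `SELECT HEX(column)` shows the stored bytes; the Python sketch below only illustrates what to expect for the Hebrew letter from the question. The helper name `hex_bytes` is made up for this example.

```python
def hex_bytes(text: str, encoding: str = "utf-8") -> str:
    """Return the bytes of `text` in `encoding` as space-separated hex pairs."""
    return text.encode(encoding).hex(" ")

aleph = "א"

# Correctly stored UTF-8: two bytes.
good = hex_bytes(aleph)

# What remains after a lossy conversion to a single-byte charset:
# a literal question mark, which no later conversion can recover.
lost = aleph.encode("latin-1", errors="replace").hex(" ")
```

If the stored hex is `3f` rather than `d7 90`, the data was damaged on the way in, so the fix belongs in the write path (client encoding or connection charset) rather than in how the page is displayed.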

Changing character encoding in MySQL?

In a web application I'm working on, all the content seems fine except for the content which is retrieved from the database. Some special characters are used, and they break, so it's definitely a character encoding error somewhere.
When I manually in the browser try to switch from iso-8859-1 to utf-8 the database-content looks fine, but the static is messed up. And vice versa. So I suspect that the static content is iso-8859-1, and the content from the database is utf-8.
I've looked around for configuration files which state the charset, but when I try changing it, nothing happens.
Would converting my database content to iso-8859-1 help, so that it matches the static content? In that case, how? I've tried changing the schema and collation, but it had no visible effect.
Edit: My apologies. This is an MySQL database.
Maybe it is an issue with the client library that communicates with MySQL.
Before you try to retrieve data from the database, execute this query:
SET NAMES 'utf8'
reference:
http://forums.mysql.com/read.php?103,46870,47245#msg-47245
Hope this helps.
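The symptom in the question (database content looks right as utf-8, static content looks right as iso-8859-1) is what a page with mixed byte encodings produces. The Python sketch below does not talk to MySQL; it only reproduces the mixed page with made-up sample text so the two browser settings can be compared.

```python
# Static template text saved as ISO-8859-1 (sample data).
static_part = "Søk".encode("iso-8859-1")

# Text that arrives from the database connection as UTF-8 (sample data).
db_part = "Blåbær".encode("utf-8")

# The server concatenates both into one response body.
page = static_part + b" " + db_part

# Browser set to iso-8859-1: static text is fine, database text is garbled.
as_latin1 = page.decode("iso-8859-1")

# Browser set to utf-8: database text is fine, static text gets a replacement character.
as_utf8 = page.decode("utf-8", errors="replace")
```

No single browser setting can fix such a page, so both parts have to be brought to the same encoding, whether by changing the connection charset or by re-saving the static files.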

Site showing lots of РуÑÑкий] Characters

I have this Joomla site. I have two versions of it, both using the same database. One has these characters all over the text. The other seems to be alright. What could be the problem? My database is the same, so I don't believe it is a character set issue on the db. Anyone?
Something is wrong with the encoding. You're not saying where you get this, so my guess is that it's the browser. Try to change the page encoding (check the menus of your browser) to UTF-8 (or "Unicode"). That should fix it.
If that works, check the code and make sure pages are sent with correct charset settings.
Another source of these problems is that the code reads UTF-8 data from the database as bytes and then encodes that again as UTF-8.
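The double-encoding case can be reproduced and, in simple cases, reversed. The Python sketch below assumes the UTF-8 bytes were misread as Latin-1 before being encoded again; Windows-1252 is another common culprit and would need a different codec. The function name `undo_double_utf8` is hypothetical.

```python
def undo_double_utf8(text: str) -> str:
    """Reverse one round of 'UTF-8 bytes read as Latin-1, then encoded again'.

    Returns the text unchanged when it does not look double-encoded.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (already proper Unicode)
        # or its bytes are not valid UTF-8 (plain Latin-1 text).
        return text

# Reproduce the damage: UTF-8 bytes treated as one Latin-1 character each.
broken = "Русский".encode("utf-8").decode("latin-1")

repaired = undo_double_utf8(broken)
```

Running the repair over text that is already correct leaves it untouched in the cases the `except` clause covers, but it is still safer to test on a copy of the data first.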

PHP ENV_ variables: UTF-8 encoding of £ (Pound)

I'm having issues with the encoding of the '£' character. (Issue: when POSTing '£' from a form field and doing an insert, nothing is inserted in the MySQL table.) I've checked everything with respect to UTF-8 support: my PHP code headers, the server, the collation/char set on MySQL, etc.
I'm using MAMP as my dev environment (PHP 5.3.5).
Everything works fine on my production server (commercial host) (PHP 5.2.6) so I've ruled out any issues with my code
However, I think I have tracked down the culprit: When comparing both environments, this line is missing from my dev server:
_ENV["HTTP_ACCEPT_CHARSET"] ISO-8859-1,utf-8;q=0.7,*;q=0.3
However, there is nothing in php.ini I can see to change it. Any ideas, or am I barking up the wrong tree?
Cheers
Roland
I'd write a simple test to check out where things are going wrong.
echo() out the value from $_POST in PHP and verify whether the browser is sending the data correctly and that it's being parsed into PHP correctly. When you do this test, make sure that the browser has correctly detected the character set.
If that works then you're likely to have mis-configured something with the database. If you have both the table collation and the connection encoding as "UTF-8" then you should have no problems saving the data into MySQL (if both SET NAMES and the table collation are the same no translation will go on so it'll be stored correctly in the table).
You didn't mention the MySQL connection anywhere in your question (just "etc"). Just in case there's anything you've missed, have a good look over this article: How to Avoid Character Encoding Problems in PHP
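For the first check it helps to know what the raw POST body should contain. The sketch below is in Python rather than PHP and only illustrates the byte-level difference; the helper name `received_as_utf8` is made up. A form submitted as UTF-8 sends the pound sign as two percent-encoded bytes, while an ISO-8859-1 submission sends one.

```python
from urllib.parse import quote, unquote_to_bytes

pound = "£"

# What a browser puts in the request body, depending on the form's charset.
utf8_form = quote(pound, encoding="utf-8")
latin1_form = quote(pound, encoding="iso-8859-1")

def received_as_utf8(raw_field: str) -> bool:
    """Report whether a percent-encoded form value decodes cleanly as UTF-8."""
    try:
        unquote_to_bytes(raw_field).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Here `utf8_form` is `%C2%A3` and `latin1_form` is `%A3`. If the raw request on the dev server shows `%A3`, the page containing the form is being served or interpreted as ISO-8859-1, and the problem lies before PHP or MySQL are involved.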

What can I do to fix an encoding problem after switching from SQLite to MySQL?

I recently deployed my application. For development I used SQLite and everything was right so far. I have a controller which uses Nokogiri to populate data into my database.
The problem is on production I'm using MySQL instead of SQLite and now my script is populating the data with the wrong encoding.
For instance, it writes "AragÃ³n" instead of "Aragón". The MySQL is using utf8 for both the database and every table.
Nokogiri is probably returning things correctly. I suspect you have a mismatch in the character set of the content you are parsing with Nokogiri, and the database.
The data being parsed might be ISO-8859-1 or Windows-1252, which are the most common on the internet. You'll need to look in the data to see what it is declared as. Also look at the source for the word "Aragón" and see whether it has embedded upper-bit characters or entity-encoded characters. By looking at the byte values of the accented characters you can also get an idea of which encoding they are in.
Odds are good they're not UTF8, so when Nokogiri passes them to your code that writes to the database they will be wrong.
To fix the problem you'll need to either tell Nokogiri what the encoding is, or convert the text to UTF-8 before storing it.
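The second option, converting the text to UTF-8 before storing it, comes down to decoding the bytes with their real encoding and encoding them again. The sketch below is Python, not Ruby, and does not use Nokogiri's API; the function name `to_utf8` and the ISO-8859-1 default are assumptions made for the example.

```python
def to_utf8(raw: bytes, declared: str = "iso-8859-1") -> bytes:
    """Decode `raw` using the encoding the source declares, then encode as UTF-8."""
    return raw.decode(declared).encode("utf-8")

# Bytes as they might come from a page declared as ISO-8859-1 (sample data).
scraped = b"Arag\xf3n"

converted = to_utf8(scraped)

# The opposite mistake: UTF-8 bytes read as ISO-8859-1 give the familiar two-character garbage.
misread = "Aragón".encode("utf-8").decode("iso-8859-1")
```

The important part is that the `declared` encoding comes from the source document (its HTTP header or meta tag) and is not guessed, since decoding with the wrong one produces exactly the kind of garbage described in the question.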
You've got the encoding wrong somewhere in your stack. I bet it's set wrong in MySQL.
Take a look at this: I need help fixing Broken UTF8 encoding