MySQL - utf8 characters not displaying correctly on web frontend

I have a database which has the latin1 default character set. I obtained this by running the following statement:
SELECT default_character_set_name FROM information_schema.SCHEMATA
WHERE schema_name = "schemaname";
The default character set for each table and column in this database is set to utf8.
When I look at the data in the tables I can see the data is stored as utf8, e.g. the currency symbol € is stored in the table as €. Similarly, apostrophes are stored as ’ etc.
On the web frontend I have the following meta tag and so the characters render correctly.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
However, I'm also seeing a lot of � symbols on the webpage which I don't see inside the database.
When I change the database connection to include the charset utf8 as follows: mysql:host=myhost;dbname=mydatabase;charset=utf8, the diamond symbols disappear, but then all the other utf8 characters revert to exactly how they are saved in the database, e.g. the € symbol renders as € on the webpage.
Why is this happening?
How do I fix this and also change character set to utf8mb4?
Any help appreciated.
* UPDATE *
Tried the following steps:
1. For the database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
2. For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
3. For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
I'm not sure if step 3 is necessary, since when I do SHOW CREATE TABLE after step 2, the definition doesn't display the column charset, but it does display the default charset for the table as utf8mb4. As a sanity check I did run step 3 on one of the table's columns, but it makes no difference: € is being rendered on the page as € with the db connection as follows:
`mysql:host=myhost;dbname=mydatabase;charset=utf8mb4`
I had to run the following on each column I wanted to convert, which seems to fix some issues:
UPDATE tbl_profiles SET profile =
convert(cast(convert(profile using latin1) as binary) using UTF8MB4);
but I'm still seeing characters such as Iâm and «Âand ⢠rendered on the webpage.
Any ideas?
* UPDATE 2 *
After running steps 1 and 2 above I have a table column as follows:
`job_salary` VARCHAR(150) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
The following query on this column returns the following result:
SELECT job_salary FROM tbl_jobs WHERE job_id = 2235;
€30,000 plus excellent benefits
I execute the following statement on this column:
UPDATE tbl_jobs SET job_salary = CONVERT(BINARY(CONVERT(job_salary USING latin1)) USING utf8mb4);
But I get the following error, which means some other record contains an invalid utf8mb4 string:
Invalid utf8mb4 character string: '\x8010000 to \x8020000 Per: annum'
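One plausible reading of this error can be checked offline. The Python sketch below is an illustration, not MySQL itself: it assumes MySQL's latin1 behaves like Windows cp1252, where € is the single byte 0x80, and the helper names are hypothetical. A row whose € is already stored correctly (E282AC) does not survive the round trip, because the latin1 step turns it into a lone 0x80 byte, which is not valid UTF-8. If that reading is right, the failing row was never double-encoded and should be left out of the UPDATE.

```python
# Simplified model (assumption): MySQL's latin1 is close to Windows cp1252,
# where '€' is the single byte 0x80; characters latin1 cannot hold become '?'.
def latin1_bytes(text: str) -> bytes:
    return text.encode("cp1252", errors="replace")


def survives_repair(text: str) -> bool:
    """Would CONVERT(BINARY(CONVERT(text USING latin1)) USING utf8mb4) give valid text?"""
    try:
        latin1_bytes(text).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


correct_row = "€10000 to €20000 Per: annum"                     # € already stored as E282AC
double_row = "\u00e2\u201a\u00ac30,000 plus excellent benefits"  # displays as €30,000

print(latin1_bytes(correct_row))     # b'\x8010000 to \x8020000 Per: annum'
print(survives_repair(correct_row))  # False: a lone 0x80 byte is not valid UTF-8
print(survives_repair(double_row))   # True
```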

First, let's discuss the Mojibake of the Euro sign. All of this applies to both utf8 and utf8mb4, since the Euro sign is encoded the same way (3 bytes) in both.
It is very likely that the data was initially stored incorrectly. If you can get back to the INSERT program, let's check for:
The bytes to be stored need to be UTF-8-encoded. What was the client programming language? Where did the data come from?
The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. Do you have the connection parameters?
The column needs to be declared CHARACTER SET utf8 (or utf8mb4). It sounds like this was always correct.
HTML should start with a UTF-8 charset declaration such as <meta charset="UTF-8">.
What is currently in the table?
SELECT col, HEX(col) FROM ... WHERE ...
A correctly stored Euro sign (€) should have hex E282AC. (Interpreting that as latin1 yields €.)
If instead, you see hex C3A2E2809AC2AC, you have "double encoding", and the display is probably €.
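The byte arithmetic behind these two hex values can be reproduced outside MySQL. A minimal Python sketch, assuming (as the MySQL manual describes) that MySQL's latin1 is Windows cp1252 with the five bytes cp1252 leaves undefined mapped straight to U+0081, U+008D, U+008F, U+0090 and U+009D; the helper names are hypothetical:

```python
# Assumption: MySQL's latin1 behaves like Windows cp1252, except that the five
# bytes cp1252 leaves undefined map straight to U+0081, U+008D, U+008F,
# U+0090 and U+009D. Helper names are hypothetical.
UNDEFINED_IN_CP1252 = (0x81, 0x8D, 0x8F, 0x90, 0x9D)


def mysql_latin1_decode(data: bytes) -> str:
    """Read bytes one by one, the way a latin1 column or connection would."""
    return "".join(
        chr(b) if b in UNDEFINED_IN_CP1252 else bytes([b]).decode("cp1252")
        for b in data
    )


def double_encode(text: str) -> bytes:
    """UTF-8 bytes mislabelled as latin1, then converted to UTF-8 again for storage."""
    return mysql_latin1_decode(text.encode("utf-8")).encode("utf-8")


correct = "€".encode("utf-8")
print(correct.hex().upper())             # E282AC
print(mysql_latin1_decode(correct))      # € (the three characters 'â', '‚', '¬')
print(double_encode("€").hex().upper())  # C3A2E2809AC2AC
```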
I have identified several possible fixes, but have not yet determined which applies in your case. The likely candidate is
CHARACTER SET utf8mb4 with double-encoding:
To verify it (before fixing it), please do something like:
SELECT col,
CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4),
HEX(
CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4)
)
FROM ...
WHERE ...
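What that verification expression does to a double-encoded value can be sketched in Python. This is a model under the same assumption as before (MySQL's latin1 is effectively cp1252, with its five undefined slots mapped to the raw byte); the function names are hypothetical. The three characters 'â', '‚', '¬' go back to the bytes E2 82 AC, and reading those bytes as UTF-8 gives €.

```python
# Assumption: MySQL's latin1 ~ Windows cp1252; U+0081, U+008D, U+008F, U+0090
# and U+009D (cp1252's undefined slots) go back to their raw byte.
RAW_C1 = "\x81\x8d\x8f\x90\x9d"


def mysql_latin1_encode(text: str) -> bytes:
    """CONVERT(text USING latin1): one byte per character (raises if latin1 cannot hold it)."""
    return b"".join(
        bytes([ord(ch)]) if ch in RAW_C1 else ch.encode("cp1252") for ch in text
    )


def undo_double_encoding(text: str) -> str:
    """Analogue of CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4)."""
    raw = mysql_latin1_encode(text)  # CONVERT(col USING latin1), then BINARY(...)
    return raw.decode("utf-8")       # ... USING utf8mb4 (raises on invalid UTF-8)


# A double-encoded cell holds the characters 'â', '‚', '¬' where '€' belongs.
cell = "\u00e2\u201a\u00ac30,000 plus excellent benefits"
print(undo_double_encoding(cell))  # €30,000 plus excellent benefits
```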
Do not apply a fix on top of another fix. I have struggled for a long time to decipher how character set problems occur and what to do to 'fix' a single problem. But when the wrong fix is applied, I am at a loss to unravel the mess.

Related

Error Code: 1267. Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) [duplicate]

I'm getting this strange error while processing a large amount of data...
Error Number: 1267
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
SELECT COUNT(*) as num from keywords WHERE campaignId='12' AND LCASE(keyword)='hello again 昔 ã‹ã‚‰ ã‚ã‚‹ å ´æ‰€'
What can I do to resolve this? Can I escape the string somehow so this error wouldn't occur, or do I need to change my table encoding somehow, and if so, what should I change it to?
SET collation_connection = 'utf8_general_ci';
then for your databases
ALTER DATABASE your_database_name CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
MySQL sneaks Swedish in there because latin1_swedish_ci is its default collation for the latin1 character set.
CONVERT(column1 USING utf8)
This solves my problem, where column1 is the column that gives me this error.
You should set both your table encoding and connection encoding to UTF-8:
ALTER TABLE keywords CHARACTER SET UTF8; -- run once (this changes only the table default; existing columns keep their character set)
and
SET NAMES 'UTF8';
SET CHARACTER SET 'UTF8';
Use the following statement to fix the error.
Be careful with your data: take a backup first if the table already contains data.
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
In general, the best way is to change the table collation. However, I have an old application and cannot really estimate whether this would have side effects. Therefore I looked for a way to convert the strings into some other format that avoids the collation problem.
What I found works is to compare the strings by converting them into a hexadecimal representation of their bytes. In the database this is done with HEX(column). In PHP you may use this function:
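public static function strToHex($string)
{
    // Same result as strtoupper(bin2hex($string)): two upper-case hex digits per byte.
    $hex = '';
    for ($i = 0; $i < strlen($string); $i++) {
        $ord = ord($string[$i]);
        $hexCode = dechex($ord);
        $hex .= substr('0' . $hexCode, -2); // left-pad single digits with 0
    }
    return strtoupper($hex);
}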
When querying the database, your original UTF-8 string must first be converted into an ISO-8859-1 string (e.g. using utf8_decode() in PHP) before it is used in the query. Because of the column's collation, the database cannot contain UTF-8 characters, so the comparison should work even though this changes the original string (UTF-8 characters that do not exist in the ISO charset become ? or are removed entirely). Just make sure that you apply the same UTF-8 to ISO conversion when you write data into the database.
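For comparison, a Python sketch of the same idea (an illustration, not the answerer's code): encode the search string as ISO-8859-1 with unmappable characters replaced by ?, which is what utf8_decode() does, then hex it in upper case to match MySQL's HEX(). The function name and the query in the comment are hypothetical. Two caveats follow from comparing bytes: the match is case-sensitive, so the LCASE() from the original query would have to be applied on both sides, and characters such as € that MySQL's latin1 (effectively cp1252) can store but ISO-8859-1 cannot will not match.

```python
def str_to_latin1_hex(text: str) -> str:
    """Rough Python equivalent of strToHex(utf8_decode($text)) from the PHP answer.

    ISO-8859-1 bytes, with characters it cannot represent replaced by '?',
    rendered as upper-case hex like MySQL's HEX().
    """
    return text.encode("latin-1", errors="replace").hex().upper()


needle = str_to_latin1_hex("café")
print(needle)                   # 636166E9
# Hypothetical query shape, comparing against HEX() of the latin1 column:
#   SELECT COUNT(*) AS num FROM keywords
#   WHERE campaignId = '12' AND HEX(keyword) = '636166E9'
print(str_to_latin1_hex("昔"))  # 3F: the character is lost, as the answer warns
```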
I had my table originally created with CHARSET=latin1. After converting the table to utf8, some columns were not converted, but that was not really obvious.
You can run SHOW CREATE TABLE my_table; to see which column was not converted, or simply fix the character set of the problematic column with the query below (adjust the VARCHAR length, CHARSET and COLLATE to your needs):
ALTER TABLE `my_table` CHANGE `my_column` `my_column` VARCHAR(10) CHARSET utf8
COLLATE utf8_general_ci NULL;
I found that using cast() was the best solution for me:
cast(Format(amount, "Standard") AS CHAR CHARACTER SET utf8) AS Amount
There is also a convert() function.
Change the character set of the table to utf8
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8
My user account did not have the permissions to alter the database and table, as suggested in this solution.
If, like me, you don't care about the character collation (you are using the '=' operator), you can apply the reverse fix. Run this before your SELECT:
SET collation_connection = 'latin1_swedish_ci';
After making your corrections listed in the top answer, change the default settings of your server.
In your "/etc/my.cnf.d/server.cnf", or wherever it's located, add the defaults to the [mysqld] section so it looks like this:
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
Source: https://dev.mysql.com/doc/refman/5.7/en/charset-applications.html

Can I convert MySQL database character set from latin1 to utf8 without losing data?

I want to convert my database to store unicode symbols.
Currently the tables have:
latin1_swedish_ci collation and latin1 character set
OR
utf8_general_ci collation and utf8 character set
I am not sure how the existing data is encoded, but I suppose it is UTF-8 encoded, as I am using Django, which I think encodes the data in UTF-8 before sending it to the database.
My question is:
Can I convert the tables to utf8_unicode_ci collation and utf-8 character set using the following queries without messing up the existing data? (as suggested in this post)
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Considering latin1 is a subset of utf-8, I think it should work. What do you guys think?
Thank you in advance.
P.S. The MySQL version is 5.1.
Latin1 is not a subset of UTF-8 - ASCII is. Latin1, however, is represented in Unicode.
CONVERT TO should work, as long as the data was stored in the correct encoding in the first place. Django may have used UTF-8 on the database connection, but the database should have re-encoded on the fly.
To check the actual encoding used, use the mysql command-line tool to execute an SQL query that selects a row you know contains non-ASCII characters. Then use MySQL's HEX() function to check the bytes used. If you see bytes greater than 0x7F, check whether they correspond to valid characters in https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout
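A rough way to read the HEX() output, sketched in Python for hex strings copied from the mysql client. It is a heuristic under stated assumptions, not proof: latin1 text with high bytes rarely happens to form valid UTF-8, and if undoing one more latin1 → UTF-8 step (treating MySQL's latin1 as cp1252) still yields valid UTF-8, the text was probably encoded twice. The function name and labels are hypothetical.

```python
def classify_hex(hex_value: str) -> str:
    """Heuristic guess at the encoding behind a HEX(col) result (not proof)."""
    raw = bytes.fromhex(hex_value)
    if all(b <= 0x7F for b in raw):
        return "ascii"                 # identical in latin1 and UTF-8
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "latin1"                # high bytes that do not form UTF-8 sequences
    try:
        # Assumption: MySQL's latin1 ~ cp1252. If undoing one more latin1 -> UTF-8
        # step still yields valid UTF-8, the text was probably encoded twice.
        text.encode("cp1252").decode("utf-8")
        return "utf8, possibly double-encoded"
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf8"


for sample in ("4F", "D6", "C396", "C383E28093"):
    print(sample, classify_hex(sample))
```

With these samples, D6 is a single latin1 Ö, C396 is the UTF-8 Ö (which is what "c396 sitting in a latin1 column" looks like), and C383E28093 is flagged as likely double encoding.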
If you have c396 sitting in a latin1 column, and you want it to mean Ö, then you are half way to "double encoding". Do not use CONVERT TO; that will really get you into "double encoding".
Instead, you need the 2-step ALTER.
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8 ...;
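Why CONVERT TO hurts here while the 2-step ALTER does not can be shown with the bytes alone. A Python sketch, assuming MySQL's latin1 behaves like cp1252 (so 0x96 reads as '–'); the helper name is hypothetical. CONVERT TO treats C3 96 as the two latin1 characters 'Ã' and '–' and re-encodes both, while going through VARBINARY keeps the bytes and only relabels them.

```python
# Assumption: MySQL's latin1 ~ Windows cp1252 (0x96 is '–'); the five bytes
# cp1252 leaves undefined map straight to U+0081 etc.
def mysql_latin1_decode(data: bytes) -> str:
    return "".join(
        chr(b) if b in (0x81, 0x8D, 0x8F, 0x90, 0x9D) else bytes([b]).decode("cp1252")
        for b in data
    )


stored = bytes.fromhex("C396")  # the UTF-8 bytes of 'Ö', sitting in a latin1 column

# CONVERT TO CHARACTER SET utf8: MySQL reads 'Ã' and '–' and re-encodes both.
after_convert_to = mysql_latin1_decode(stored).encode("utf-8")

# MODIFY ... VARBINARY, then MODIFY ... VARCHAR ... CHARACTER SET utf8:
# the bytes are kept unchanged and only relabelled as UTF-8.
after_two_step = stored

print(after_convert_to.hex().upper())  # C383E28093
print(after_two_step.decode("utf-8"))  # Ö
```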
If you have already messed it up further, and now the Ö is hex C383E28093, then you need to fix double encoding.
This gets you the latin1 byte in 2 steps:
CONVERT(CONVERT(UNHEX('C383E28093') USING utf8) USING latin1) --> 'Ö' (C396)
HEX(CONVERT(CONVERT(UNHEX('C396') USING utf8) USING latin1)) --> 'Ö' in latin1 (D6)
This gets you the 2-byte utf8 encoding:
CONVERT(BINARY(CONVERT(CONVERT(UNHEX('C383E28093') USING utf8) USING latin1)) USING utf8)
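The same chain of conversions, step by step, as a Python sketch (an illustration under the assumption that CONVERT(... USING latin1) behaves like cp1252; the helper name is hypothetical). Each line corresponds to one CONVERT in the expressions above:

```python
# Assumption: CONVERT(... USING latin1) behaves like Windows cp1252, with
# U+0081/U+008D/U+008F/U+0090/U+009D going back to their raw byte.
def mysql_latin1_encode(text: str) -> bytes:
    return b"".join(
        bytes([ord(ch)]) if ch in "\x81\x8d\x8f\x90\x9d" else ch.encode("cp1252")
        for ch in text
    )


mess = bytes.fromhex("C383E28093")
step1 = mess.decode("utf-8")            # CONVERT(UNHEX('C383E28093') USING utf8) -> 'Ã–'
step2 = mysql_latin1_encode(step1)      # ... USING latin1 -> bytes C3 96
fixed = step2.decode("utf-8")           # CONVERT(BINARY(...) USING utf8) -> 'Ö'
as_latin1 = mysql_latin1_encode(fixed)  # 'Ö' as a single latin1 byte -> D6

print(step2.hex().upper(), fixed, as_latin1.hex().upper())  # C396 Ö D6
```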
Do you want the column to be latin1? Or utf8?

Converting non-utf8 database to utf-8

For a long time I've been using a database/connection with the wrong encoding, which caused the Hebrew characters in the database to display as unknown-language characters, as in the example below:
I want to re-import/change the database with the wrongly encoded characters to the correctly encoded ones, so the Hebrew characters are displayed as Hebrew characters and not as gibberish like *"× ×תה מסכי×,×× ×©×™× ×ž×¦×™×¢×™× ×œ×™ כמה ×”× "*
For the record, when I display this data with PHP, it shows as Hebrew. When I access it from the phpMyAdmin panel, it shows as gibberish (these unknown characters).
Is there any way to fix it although there is some data already inserted in the database?
That feels like "double-encoded" Hebrew strings.
This partially recovers the text:
UNHEX(HEX(CONVERT('× ×תה מסכי×,××' USING latin1)))
--> '� �תה מסכי�,��
I do not know what leads to the � symbols.
Please do SELECT col, HEX(col) FROM ... WHERE ...; for some cell. I would expect שלום to give hex D7A9D79CD795D79D if it were correctly stored. For "double encoding", I would expect C397C2A9C397C593C397E280A2C397C29D.
Please provide the output from that SELECT, then I will work on how to recover the data.
Edit
Here's what I think happened.
The client had characters encoded as utf8; and
SET NAMES latin1 lied by claiming that the client had latin1 encoding; and
The column in the table declared CHARACTER SET utf8.
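The three steps above can be simulated to see where the expected hex values come from. A Python sketch, assuming the server reads "latin1" as cp1252 and maps cp1252's five undefined bytes to U+0081 etc. (which is why the last byte 9D of ם survives as C29D); the function names are hypothetical:

```python
# Assumption: the server reads "latin1" as Windows cp1252, mapping the five bytes
# cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) to U+0081 etc.
def mysql_latin1_decode(data: bytes) -> str:
    return "".join(
        chr(b) if b in (0x81, 0x8D, 0x8F, 0x90, 0x9D) else bytes([b]).decode("cp1252")
        for b in data
    )


def store_with_wrong_set_names(client_text: str) -> bytes:
    """Client sends UTF-8, SET NAMES latin1 claims latin1, column is utf8."""
    sent = client_text.encode("utf-8")      # the bytes the client really sends
    understood = mysql_latin1_decode(sent)  # server: "each byte is one latin1 character"
    return understood.encode("utf-8")       # converted to utf8 for the column


print("שלום".encode("utf-8").hex().upper())              # D7A9D79CD795D79D
print(store_with_wrong_set_names("שלום").hex().upper())  # C397C2A9C397C593C397E280A2C397C29D
```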
Yod did not jump out as a letter, so it took a while to see it. CONVERT(BINARY(CONVERT('×™×™123' USING latin1)) USING utf8) --> יי123
So, I am thinking that that expression will clean up the text. But be cautious; try it on a few rows before 'fixing' the entire table.
UPDATE table SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8) WHERE ...;
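Trying the expression on a few values first can also be done offline. A Python sketch under the same cp1252 assumption, with hypothetical names; unlike the SQL, it adds a guard that returns the input unchanged when the conversion fails, so already-correct text (such as real Hebrew) is not damaged. That guard is an addition for illustration, not MySQL's behaviour.

```python
# Assumption: CONVERT(... USING latin1) ~ Windows cp1252, with U+0081, U+008D,
# U+008F, U+0090 and U+009D going back to their raw byte.
def mysql_latin1_encode(text: str) -> bytes:
    return b"".join(
        bytes([ord(ch)]) if ch in "\x81\x8d\x8f\x90\x9d" else ch.encode("cp1252")
        for ch in text
    )


def fix_double_encoding(text: str) -> str:
    """CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8), or the input if that fails."""
    try:
        return mysql_latin1_encode(text).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not double-encoded this way: leave the value alone


samples = (
    "\u00d7\u2122\u00d7\u2122123",                          # ×™×™123
    "\u00d7\u00a9\u00d7\u0153\u00d7\u2022\u00d7\x9d",       # double-encoded שלום
    "יי123",                                                # already correct
)
for cell in samples:
    print(repr(cell), "->", fix_double_encoding(cell))
```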
If that does not work, here are 4 fixes for double-encoding that may or may not be equivalent. (Note: BINARY(xx) is probably the same as CONVERT(xx USING binary).)
I am not sure that you can do anything about the data that has already been stored in the database. However, you can import Hebrew data properly by making sure you have the correct character set and collation:
the db collation has to be utf8_general_ci
the collation of the table with Hebrew has to be utf8_general_ci
for example:
CREATE DATABASE col CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE `col`.`hebrew` (
`id` INT NOT NULL AUTO_INCREMENT,
`heb` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`)
) CHARACTER SET utf8
COLLATE utf8_general_ci;
INSERT INTO hebrew(heb) values ('שלום');

mysql unicode text incorrect string warning on insert, despite character set variables set utf8mb4

First, I know, yes, this is yet another mysql unicode question.
Problem: I am unable to insert unicode text into my mysql database
I want to execute the following query:
INSERT INTO usert SET username='田中'
When I do, I get this warning:
Incorrect string value: '\x93c\x92\x86' for column 'username' at row 1
A blank space is inserted into the table instead of the data
I have tried as many answers and forums as I could, and I believe that all appropriate variables, table, and column settings are set to the 'utf8mb4' character set, with collation 'utf8mb4_general_ci' or 'utf8mb4_unicode_ci'.
I will tell you why I believe that by giving you the details, and sql commands used to show them.
First, mysql version:
mysql:> SHOW VARIABLES LIKE 'version'
Confirms that the version is 5.6.23
To show the character set variables in mysql:
mysql:> SHOW VARIABLES LIKE '%char%'
That command shows (in slightly different format):
character_set_client: utf8mb4
character_set_connection: utf8mb4
character_set_database: utf8mb4
...
character_set_results: utf8mb4
character_set_server: utf8mb4
character_set_system: utf8
Collation:
mysql:> SHOW VARIABLES LIKE '%collat%'
RESULTS:
collation_connection: utf8mb4_unicode_ci
collation_database: utf8mb4_unicode_ci
collation_server: utf8mb4_unicode_ci
So far so good?
Now, for the table character set and collation:
Look at table details command:
mysql:> SHOW TABLE STATUS
shows that the collation is utf8mb4_general_ci
Command for looking at column details:
mysql:> SHOW FULL COLUMNS IN usert
Confirms that the collation for column 'username' is utf8mb4_general_ci
In summary, from what I have studied, all relevant variables, database, table, and column settings seem to be set to the relevant utf8mb4 setting. Despite that, I am unable to insert the unicode Japanese text.
(By the way, I don't think the 4-byte Unicode setting utf8mb4 is necessary here, but it is what I am using because it seemed to fix many other Unicode MySQL problems.)
What other settings in mysql or the system are likely causing this problem?
What other settings can I/ should I change to allow inserting japanese text appropriately?
EDIT UPDATE: I am on a Japanese computer
The problem was the default system settings, which also affected the input settings at the command line.
It's a Japanese computer, which apparently uses Shift-JIS encoding, not Unicode, by default. The text I was inputting was encoded this way, as were similar input files I was trying to use.
Therefore, I set the character set to 'sjis' in the server, i.e. setting character-set-server=sjis in the my.ini initialization file, and made the connection use the same character set by adding skip-character-set-client-handshake to the same file.
The character set for the column of course must also be changed via
ALTER TABLE usert MODIFY username varchar(30) CHARACTER SET sjis COLLATE sjis_japanese_ci
Now you can insert Japanese text from the command line, and from other Japanese files which use Shift-JIS encoding.
Another option for inputting Japanese text seems to be cp932, which is the Windows version of Shift-JIS.
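The byte values in the original warning support this diagnosis. A small Python check (using Python's standard shift_jis, cp932 and utf-8 codecs for illustration; not part of the original answer) shows that 田中 in Shift-JIS is exactly the \x93c\x92\x86 that MySQL rejected, while UTF-8 would have sent six different bytes:

```python
name = "田中"

sjis = name.encode("shift_jis")
print(sjis)                          # b'\x93c\x92\x86', the bytes in the MySQL warning
print(name.encode("cp932") == sjis)  # True: Windows' cp932 agrees for these characters
print(name.encode("utf-8").hex())    # e794b0e4b8ad: what a utf8mb4 connection expects

try:
    sjis.decode("utf-8")
except UnicodeDecodeError:
    print("Shift-JIS bytes are not valid UTF-8, hence 'Incorrect string value'")
```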
Incidentally, if you DO wish to use Unicode via the command line, apparently PowerShell has better support for it than the normal cmd I was using, but I haven't tried it personally.
Try checking the character set of the database.
Check the character set of your database with the command below:
SELECT @@character_set_database, @@collation_database;
If the result is not UTF-8, try the command below:
ALTER DATABASE yourDatabase CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Hope it works for you.
