Unicode characters emoticons in MySQL with 4 bytes - mysql

I have to insert in Mysql Strings that may contain characters like '😂' . I tried this:
ALTER TABLE `table_name`
DEFAULT CHARACTER SET utf8mb4,
MODIFY `colname` VARCHAR(200)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;
and when I insert '😂';
INSERT INTO `table_name` (`col_name`) VALUES ('😂');
I get the following
SELECT * FROM `table_name`;
????
How can I do to get the correct value in the select statements?
Thanks a lot.

You will need to set the connection encoding to utf8mb4 as well. It depends on how you connect to MySQL how to do this. SET NAMES utf8mb4 is the API-independent SQL query to do so.
What MySQL calls utf8 is a dumbed down subset of actual UTF-8, covering only the BMP (characters 0000 through FFFF). utf8mb4 is actual UTF-8 which can encode all Unicode code points. If your connection encoding is utf8, then all data is squeezed though this subset of UTF-8 and you cannot send or receive characters above the BMP to or from MySQL.

Related

MySQL - utf8 characters no displaying correctly on web frontend

I have a database which has the latin1 default characterset - info obtained by running the following statement:
SELECT default_character_set_name FROM information_schema.SCHEMATA
WHERE schema_name = "schemaname";
The default character set for each table and column in this database is set to utf8.
When I look at the data in the tables I can see data is stored as utf8 e.g the currency symbol € is stored in the table as €. Similarly apostraphes are stored as ’ etc.
On the web frontend I have the following meta tag and so the characters render correctly.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
However I'm also seeing a lot of � symbols on the webpage which I don't see inside the database?
When I change the database connection to include the charset utf8 as follows: mysql:host=myhost;dbname=mydatabase;charset=utf8, the diamond symbols disappear but then all the other utf8
characters revert to exactly how they are saved in the database e.g. the € symbol renders as € on the webpage?
Why is this happening?
How do I fix this and also change character set to utf8mb4?
Any help appreciated.
* UPDATE *
Tried the following steps:
for the database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Not sure if Step 3 is necessary since when I do SHOW CREATE TABLE after step 2, whilst the definition doesn't display the column charset it does display the default charset for the table as utf8mb4. As a sanity check I did run step 3 on one of the tables columns but it makes no difference - € is being rendered on the page as € with db connection as follows:
`mysql:host=myhost;dbname=mydatabase;charset=utf8mb4`
I had to run the following on each column I wanted converting which seems to fix some issues
UPDATE tbl_profiles SET profile =
convert(cast(convert(profile using latin1) as binary) using UTF8MB4);
but still seeing characters such as Iâm and «Âand ⢠rendered on the webpage
Any ideas?
* UPDATE 2 *
After running steps 1 and 2 above I have a table column as follows:
`job_salary` VARCHAR(150) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
The following query on this column returns the following result:
SELECT job_salary FROM tbl_jobs WHERE job_id = 2235;
€30,000 plus excellent benefits
I execute the following statement on this column:
UPDATE tbl_jobs SET job_salary = CONVERT(BINARY(CONVERT(job_salary USING latin1)) USING utf8mb4);
But I get the following error which means some other record has a invalid utf8mb4
Invalid utf8mb4 character string: '\x8010000 to \x8020000 Per: annum'
First, let's discuss the Mojibake of the Euro sign. All of this applies to both utf8 and utf8mb4, since the Euro is encoded the same way and there is.
It is very likely that the data was initially stored incorrectly. If you can get back to the INSERT program, let's check for:
The bytes to be stored need to be UTF-8-encoded. What was the client programming language? Where did the data come from?
The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. Do you have the connection parameters?
The column needs to be declared CHARACTER SET utf8 (or utf8mb4). It sounds like this was always correct.
HTML should start with .
What is currently in the table?
SELECT col, HEX(col) FROM ... WHERE ...
A correctly stored Euro sign (€) should have hex E282AC. (Interpreting that as latin1 yields €.
If instead, you see hex C3A2E2809AC2AC, you have "double encoding", and the display is probably €.
I have identified several possible fixes, but have not yet determined which applies in your case. The likely candidate is
CHARACTER SET utf8mb4 with double-encoding:
To verify it (before fixing it), please do something like:
SELECT col,
CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4),
HEX(
CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4)
)
FROM ...
WHERE ...
Do not apply a fix on top of another fix. I have struggled for a long time to decipher how character set problems occur and what to do to 'fix' a single problem. But when the wrong fix is applied, I am at a loss to unravel the mess.

How to enter the Indian Rupee symbol in MySQL Server 5.1?

I am unable to find the exact solution for MySQL
The thing is the column supports by default UTF-8 encoding which consists of 3 bytes. The Indian Rupee Symbol, since it is new has a 4 byte encoding. So we have to change the character encoding to utf8_general_ci by,
ALTER TABLE test_tb MODIFY COLUMN col VARCHAR(255)
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
After executing the above query simply execute the following query to insert the symbol,
insert into test_tb values("₹");
Ta-Da!!!
You are talking Oracle, yet it is tagged MySQL. Which do you want? And what language and/or client tool are you using?
Copy and paste it. Which Rupee do you like? ৲ ৳ ૱ ௹ ₨ ꠸
Probably you want this one:
UNHEX('E282A8') = '₨'
which is U+20A8 or 8360 in non-MySQL contexts
You need to have CHARACTER SET utf8 on the table/column.
You need to have done SET NAMES utf8 (or equivalent) when connecting.
Simplest way to do it is, utf8mb4 stores all the symbols
ALTER TABLE AsinBuyBox CONVERT TO CHARACTER SET utf8mb4;

Can I convert MySQL database character set from latin1 to utf8 without losing data?

I want to convert my database to store unicode symbols.
Currently the tables have:
latin_swedish_ci collation and latin1 character set
OR
utf8_general_ci collation and utf8 character set
I am not sure how the existing data is encoded, but I suppose it is utf-8 encoded, as I am using Django which I think encodes the data in utf-8 before sending to the database.
My question is:
Can I convert the tables to utf8_unicode_ci collation and utf-8 character set using the following queries without messing up the existing data? (as sugested in this post)
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Considering latin1 is subset of utf-8, I think it sould work. What do you guys think?
Thank you in advance.
P.S: The version of MySQL is: 5.1
Latin1 is not a subset of UTF-8 - ASCII is. Latin1, however, is represented in Unicode.
CONVERT TO should work, as long as the data was stored in the correct encoding in the first place. Django may have used UTF-8 on the database connection, but the database should have re-encoded on the fly.
To check the actual encoding used - Use the mysql command-line tool to execute an SQL query that selects a row that you know contains non-ASCII characters. Then use the mysql HEX() function to check the bytes used. If you see bytes greater than > 0x7f, check that they don't correspond to valid characters in https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout
If you have c396 sitting in a latin1 column, and you want it to mean Ö, then you are half way to "double encoding". Do not use CONVERT TO; that will really get you into "double encoding".
Instead, you need the 2-step ALTER.
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8 ...;
If you have already messed it up further, and now the Ö is hex C383E28093, then you need to fix double encoding.
This gets you the latin1 byte in 2 steps:
CONVERT(CONVERT(UNHEX('C383E28093') USING utf8) USING latin1) --> 'Ö' (C396)
HEX(CONVERT(CONVERT(UNHEX('C396') USING utf8) USING latin1)) --> 'Ö' in latin1 (D6)
This gets you the 2-byte utf8 encoding:
CONVERT(BINARY(CONVERT(CONVERT(UNHEX('C383E28093') USING utf8) USING latin1)) USING utf8)
Do you want the column to be latin1? Or utf8?

Converting non-utf8 database to utf-8

I've been using for a long time a database/connection with the wrong encoding, resulting the hebrew language characters in the database to display as unknown-language characters, as the example shows below:
I want to re-import/change the database with the inserted-wrong-encoded characters to the right encoded characters, so the hebrew characters will be displayed as hebrew characters and not as unknown parse like *"× ×תה מסכי×,×× ×©×™× ×ž×¦×™×¢×™× ×œ×™ כמה ×”× "*
For the record, when I display this unknown characters sql data with php - it shows as hebrew. when I'm trying to access it from the phpMyAdmin Panel - it shows as jibrish (these unknown characters).
Is there any way to fix it although there is some data already inserted in the database?
That feels like "double-encoded" Hebrew strings.
This partially recovers the text:
UNHEX(HEX(CONVERT('× ×תה מסכי×,××' USING latin1)))
--> '� �תה מסכי�,��
I do not know what leads to the � symbols.
Please do SELECT col, HEX(col) FROM ... WHERE ...; for some cell. I would expect שלום to give hex D7A9D79CD795D79D if it were correctly stored. For "double encoding", I would expect C397C2A9C397C593C397E280A2C397C29D.
Please provide the output from that SELECT, then I will work on how to recover the data.
Edit
Here's what I think happened.
The client had characters encoded as utf8; and
SET NAMES latin1 lied by claiming that the client had latin1 encoding; and
The column in the table declared CHARACTER SET utf8.
Yod did not jump out as a letter, so it took a while to see it. CONVERT(BINARY(CONVERT('×™×™123' USING latin1)) USING utf8) -->יי123
So, I am thinking that that expression will clean up the text. But be cautious; try it on a few rows before 'fixing' the entire table.
UPDATE table SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8) WHERE ...;
If that does not work, here are 4 fixes for double-encoding that may or may not be equivalent. (Note: BINARY(xx) is probably the same as CONVERT(xx USING binary).)
I am not sure that you can do anything about the data that has already been stored in the database. However, you can import hebrew data properly by making sure you have the correct character set and collation.
the db collation has to be utf8_general_ci
the collation of the table with hebrew has to be utf8_general_ci
for example:
CREATE DATABASE col CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE `col`.`hebrew` (
`id` INT NOT NULL AUTO_INCREMENT,
`heb` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`)
) CHARACTER SET utf8
COLLATE utf8_general_ci;
INSERT INTO hebrew(heb) values ('שלום');

Illegal mix of collations MySQL Error

I'm getting this strange error while processing a large number of data...
Error Number: 1267
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
SELECT COUNT(*) as num from keywords WHERE campaignId='12' AND LCASE(keyword)='hello again 昔 ã‹ã‚‰ ã‚ã‚‹ å ´æ‰€'
What can I do to resolve this? Can I escape the string somehow so this error wouldn't occur, or do I need to change my table encoding somehow, and if so, what should I change it to?
SET collation_connection = 'utf8_general_ci';
then for your databases
ALTER DATABASE your_database_name CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
MySQL sneaks swedish in there sometimes for no sensible reason.
CONVERT(column1 USING utf8)
Solves my problem. Where column1 is the column which gives me this error.
You should set both your table encoding and connection encoding to UTF-8:
ALTER TABLE keywords CHARACTER SET UTF8; -- run once
and
SET NAMES 'UTF8';
SET CHARACTER SET 'UTF8';
Use following statement for error
be careful about your data take backup if data have in table.
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
In general the best way is to Change the table collation. However I have an old application and are not really able to estimate the outcome whether this has side effects. Therefore I tried somehow to convert the string into some other format that solved the collation problem.
What I found working is to do the string compare by converting the strings into a hexadecimal representation of it's characters. On the database this is done with HEX(column). For PHP you may use this function:
public static function strToHex($string)
{
$hex = '';
for ($i=0; $i<strlen($string); $i++){
$ord = ord($string[$i]);
$hexCode = dechex($ord);
$hex .= substr('0'.$hexCode, -2);
}
return strToUpper($hex);
}
When doing the database query, your original UTF8 string must be converted first into an iso string (e.g. using utf8_decode() in PHP) before using it in the DB. Because of the collation type the database cannot have UTF8 characters inside so the comparism should work event though this changes the original string (converting UTF8 characters that are not existend in the ISO charset result in a ? or these are removed entirely). Just make sure that when you write data into the database, that you use the same UTF8 to ISO conversion.
I had my table originally created with CHARSET=latin1. After table conversion to utf8 some columns were not converted, however that was not really obvious.
You can try to run SHOW CREATE TABLE my_table; and see which column was not converted or just fix incorrect character set on problematic column with query below (change varchar length and CHARSET and COLLATE according to your needs):
ALTER TABLE `my_table` CHANGE `my_column` `my_column` VARCHAR(10) CHARSET utf8
COLLATE utf8_general_ci NULL;
I found that using cast() was the best solution for me:
cast(Format(amount, "Standard") AS CHAR CHARACTER SET utf8) AS Amount
There is also a convert() function. More details on it here
Another resource here
Change the character set of the table to utf8
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8
My user account did not have the permissions to alter the database and table, as suggested in this solution.
If, like me, you don't care about the character collation (you are using the '=' operator), you can apply the reverse fix. Run this before your SELECT:
SET collation_connection = 'latin1_swedish_ci';
After making your corrections listed in the top answer, change the default settings of your server.
In your "/etc/my.cnf.d/server.cnf" or where ever it's located add the defaults to the [mysqld] section so it looks like this:
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
Source: https://dev.mysql.com/doc/refman/5.7/en/charset-applications.html