SQL converting column from Latin1 to UTF8 - mysql

I am trying to convert a specific column in a table on my DB from latin1 character set with collation latin1_swedish_ci to utf8 with collation utf8_unicode_ci.
COLUMN: description, type: longtext, default not null
I tried the following commands on the column:
ALTER TABLE sample MODIFY description LONGBLOB NOT NULL ;
ALTER TABLE sample MODIFY description LONGTEXT CHARACTER SET utf8 NOT NULL COLLATE utf8_unicode_ci;
I also tried to alter the encoding WITHOUT changing to binary first. But the characters ended up being re-encoded incorrectly by the server.
And I keep getting an error regarding some characters:
Error Code: 1366. Incorrect string value: '\x92t hav...' for column 'longDesc' at row 803
It seems like some of the characters in my table aren't converting correctly.
How can I fix this issue?

\x92 implies that you have latin1 in the table now. The second ALTER is claiming that the bytes are in utf8 encoding. Hence, the error message.
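The difference between relabeling the bytes (the BLOB round trip) and actually converting them can be sketched at the byte level in Python. This is only an illustration: it assumes MySQL's latin1 treats byte 0x92 the way Windows-1252 does (Python's cp1252), the sample bytes come from the error message, and the variable names are made up.

```python
# Bytes as they sit in the latin1 column (taken from the error message).
stored = b"\x92t hav"

# BLOB -> LONGTEXT CHARACTER SET utf8 keeps the bytes and relabels them
# as utf8. 0x92 cannot start a UTF-8 sequence, so the relabeling fails.
try:
    stored.decode("utf-8")
    relabel_ok = True
except UnicodeDecodeError:
    relabel_ok = False

# A direct MODIFY / CONVERT TO reads the bytes as latin1 (cp1252 here)
# and re-encodes the resulting characters as UTF-8.
text = stored.decode("cp1252")
converted = text.encode("utf-8")

print(relabel_ok)    # False
print(ascii(text))   # '\u2019t hav'  (a right single quotation mark)
print(converted)     # b'\xe2\x80\x99t hav'
```

The relabeling route is only appropriate when the column is declared latin1 but already holds UTF-8 bytes, which is not the situation the \x92 byte suggests here.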
Case 1: You need to change the LONGTEXT to utf8 because you plan to add rows with text that cannot be encoded in latin1.
For this case, ALTER TABLE sample CONVERT TO CHARACTER SET utf8; -- converts all CHAR/TEXT columns in the table.
ALTER TABLE sample MODIFY description ... CHARACTER SET utf8; -- converts the one column.
Case 2: The rest of the system is thinking utf8 and is confused by this column.
Well, I don't think it is confused. Conversions happen as needed.

Related

Incorrect string value: '\xC3' for column 'description' at row 1 in mySql

In my table, I have one column named description.
Because of this error, I've tried many solutions from SO that change the collation.
I've tried the collations below:
1) utf8mb4_unicode_ci
2) utf8_general_ci
Here, SHOW FULL COLUMNS FROM your_table;
Does anyone know the right collation for this type of string ('\xC3')?
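To support full UTF-8 Unicode (for example, emojis), you should use utf8mb4 with utf8mb4_unicode_ci; MySQL's utf8 is outdated because it stores at most three bytes per character. In your case, \xC3 is most likely the first byte of a two-byte UTF-8 character such as À.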
You can find a full explanation at https://mathiasbynens.be/notes/mysql-utf8mb4.
You can check the current collations of your table like this:
SHOW FULL COLUMNS FROM your_table;
I assume your description column has type TEXT; otherwise you might need to change the type.
To convert the table's default character set and all of its character columns, you can use:
ALTER TABLE your_table CONVERT TO CHARACTER SET utf8mb4;
Without a COLLATE clause, though, the columns get the default collation of utf8mb4, which may not be utf8mb4_unicode_ci.
To set the collation of your column explicitly, you should use:
ALTER TABLE your_table MODIFY description TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Try this first:
ALTER TABLE your_database_name.your_table CONVERT TO CHARACTER SET utf8;
Or, if that doesn't work, run the following after connecting to your database:
SET NAMES 'utf8';
SET CHARACTER SET utf8;

How can I set a utf8mb4 collation for a column in MySQL?

Here is the command I use:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
It works well. Now I need to set utf8mb4_unicode_ci for a column (since currently characters are shown as ???). Anyway here is my new command:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8 COLLATE utf8mb4_unicode_ci;
But sadly MySQL throws:
ERROR 1253 (42000): COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER
Any idea?
The first part of the COLLATION name must match the CHARACTER SET name.
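As a rough illustration of that naming rule, the check below compares a character set name with a collation name. The function `collation_matches` is hypothetical and only encodes the rule of thumb stated above; MySQL itself decides validity from its own collation tables, and aliases such as utf8/utf8mb3 are not handled.

```python
def collation_matches(charset: str, collation: str) -> bool:
    """Rule of thumb: the collation name starts with the charset name."""
    # "binary" is both a charset and its only collation, hence the equality case.
    return collation == charset or collation.startswith(charset + "_")

# The failing combination from the question, then two valid ones.
print(collation_matches("utf8", "utf8mb4_unicode_ci"))     # False
print(collation_matches("utf8mb4", "utf8mb4_unicode_ci"))  # True
print(collation_matches("utf8", "utf8_unicode_ci"))        # True
```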
CHARACTER SET utf8mb4 is needed for Emoji and some Chinese characters.
Let's back up to the 'real' problem -- of question marks.
COLLATION refers to the rules of ordering and sorting, not encoding.
CHARACTER SET refers to the encoding. This should be consistent at all stages. Question Marks come from inconsistencies.
The answer "Trouble with UTF-8 characters; what I see is not what I stored" points out that these are the likely suspects for question marks:
The bytes to be stored are not encoded as utf8/utf8mb4. Fix this.
The column in the database is not CHARACTER SET utf8mb4. Fix this if you need 4-byte UTF-8. (Use SHOW CREATE TABLE.)
Also, check that the connection during reading is UTF-8. The details depend on the application doing the connecting.
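How an inconsistent character set produces question marks can be reproduced outside MySQL. The sketch below uses Python's codecs as a stand-in for the conversion; the sample string is an assumption, and in a real setup the conversion happens in the server or the connector rather than in code like this.

```python
# Sample text containing two characters that latin1 cannot represent.
original = "caf\u00e9 \u65e5\u672c"

# Converting to a character set that lacks a character replaces it with '?'.
# This is the point at which the question marks get stored.
as_latin1 = original.encode("latin-1", errors="replace")
print(as_latin1)   # b'caf\xe9 ??'

# The replacement is lossy: reading the bytes back cannot restore the text.
round_trip = as_latin1.decode("latin-1")
print(round_trip == original)   # False
```

Once the question marks are in the table, fixing the character set afterwards does not recover the original text; the data has to be inserted again over a consistent connection.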
This worked for me:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Saving tweets in MySql throws "Incorrect string value: '\xF0\x9F\x92\xB2\xF0\x9F"

I am trying to save tweets into a MySql db. Most of the time it works fine, but when tweets like the ones given below come in,
Example 1
Example 2
I get the following exception from MySql,
java.sql.BatchUpdateException: Incorrect string value: '\xF0\x9F\x92\xB2\xF0\x9F...' for column 'twtText' at row 1
How can we handle such texts?
This works for me. Changing the character set in MySql
ALTER TABLE TableName MODIFY COLUMN ColumnName varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL;
Try changing your column's charset to one that can represent the strings you want to insert. The bytes in this error are 4-byte UTF-8 sequences, which MySQL's utf8 cannot store, so use utf8mb4.
Example:
ALTER TABLE database.table MODIFY COLUMN col VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL;
Besides setting the character set and collation on the database table / column, we also had to set the connection character set on the client, in the app, with:
SET NAMES 'utf8mb4';
before the actual statement.
Similar errors: Incorrect string value: '\xF0\x9F\x8E\xB6\xF0\x9F...' MySQL
I had the same issue and solved it.
The cause of the error is that the string contains emoji.
Set your MySQL column's charset to utf8mb4 and its collation to utf8mb4_general_ci.
Set the charset in your connection string to utf8mb4, like charset=utf8mb4.
OK, now test it.
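To see why these strings need utf8mb4 in particular, the bytes from the error message can be inspected in Python. This is a small sketch; the constant name `MAX_BYTES_UTF8MB3` is made up for illustration and reflects the three-byte limit of MySQL's utf8 (utf8mb3) character set.

```python
# Leading bytes from the error message in the question.
raw = b"\xf0\x9f\x92\xb2"

# They form exactly one character, located outside the Basic Multilingual Plane.
ch = raw.decode("utf-8")
print(len(ch), hex(ord(ch)))   # 1 0x1f4b2

# That one character takes four bytes in UTF-8.
print(len(raw))                # 4

# MySQL's utf8 (utf8mb3) stores at most 3 bytes per character,
# so only utf8mb4 can hold this one.
MAX_BYTES_UTF8MB3 = 3
print(len(raw) <= MAX_BYTES_UTF8MB3)   # False
```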

Converting mysql tables from latin1 to utf8

I'm trying to convert some mysql tables from latin1 to utf8. I'm using the following command, which seems to mostly work.
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
However, on one table I get an error about a duplicate key entry. This is caused by a unique index on a "name" field. It seems that when converting to utf8, any "special" characters are indexed as their plain English equivalents. For example, there is already a record with a name field value of "Dru". When converting to utf8, a record with "Drü" is considered a duplicate. The same happens with "Patrick" and "Påtrìçk".
Here is how to reproduce the issue:
CREATE TABLE `example` ( `name` char(20) CHARACTER SET latin1 NOT NULL,
PRIMARY KEY (`name`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO example (name) VALUES ('Drü'),('Dru'),('Patrick'),('Påtrìçk');
ALTER TABLE example convert to character set utf8 collate utf8_general_ci;
ERROR 1062 (23000): Duplicate entry 'Dru' for key 1
The reason why the strings 'Drü' and 'Dru' evaluate as the same is that in the utf8_general_ci collation, they count as "the same". The purpose of a collation for a character set is to provide a set of rules as to when strings are the same, when one sorts before the other, and so on.
If you want a different set of comparison rules, you need to choose a different collation. You can see the available collations for the utf8 character set by issuing SHOW COLLATION LIKE 'utf8%'. There are a bunch of collations intended for text that is mostly in a specific language; there is also the utf8_bin collation which compares all strings as binary strings (i.e. compares them as sequences of 0s and 1s).
UTF8_GENERAL_CI is accent insensitive.
Use UTF8_BIN or a language-specific collation.
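The effect of an accent-insensitive collation on a unique key can be approximated in Python. The `fold` function below is a hypothetical stand-in that strips combining marks and case; it is not MySQL's actual utf8_general_ci algorithm, only close enough to show why the four names from the question collapse into two keys.

```python
import unicodedata

def fold(s: str) -> str:
    """Rough stand-in for an accent- and case-insensitive comparison key."""
    decomposed = unicodedata.normalize("NFD", s)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_marks.casefold()

# The four values from the question: Drü, Dru, Patrick, Påtrìçk.
names = ["Dr\u00fc", "Dru", "Patrick", "P\u00e5tr\u00ec\u00e7k"]

seen = {}
duplicates = []
for name in names:
    key = fold(name)
    if key in seen:
        duplicates.append((seen[key], name))
    seen.setdefault(key, name)

print(len(duplicates))   # 2 pairs collide under the folded comparison
print(len(set(names)))   # 4 distinct values under an exact (binary) comparison
```

An exact comparison, which is what utf8_bin provides, keeps all four values distinct, so the unique index survives the conversion.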

MySQL error: "Column 'columnname' cannot be part of FULLTEXT index"

Recently I changed a bunch of columns to utf8_general_ci (the default UTF-8 collation) but when attempting to change a particular column, I received the MySQL error:
Column 'node_content' cannot be part of FULLTEXT index
In looking through docs, it appears that MySQL has a problem with FULLTEXT indexes on some multi-byte charsets such as UCS-2, but that it should work on UTF-8.
I'm on the latest stable MySQL 5.0.x release (5.0.77 I believe).
Oops, so I have found the answer to my problem:
All columns of a FULLTEXT index must have not only the same character set but also the same collation.
My FULLTEXT index had utf8_unicode_ci on one of its columns, and utf8_general_ci on its other columns.
Just to add to Thomas's good advice: to sort things out in phpMyAdmin, you have to change the character set for all columns AT THE SAME TIME.
Just wasted half a day trying again and again to change the columns one at a time and continually getting the error message about the FULLTEXT index.
For users of DBeaver and similar database tools:
When you use the interface to modify more than one column, the tool generates commands like this:
ALTER TABLE databaseName.tableName MODIFY COLUMN columnName1 text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL;
ALTER TABLE databaseName.tableName MODIFY COLUMN columnName2 varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL;
This does not work, because the columns of the FULLTEXT index must change charset and collation at the same time.
So you have to change them manually, in one command:
ALTER TABLE databaseName.tableName
MODIFY COLUMN columnName1 text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL,
MODIFY COLUMN columnName2 varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL;
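If the tool cannot be made to emit a single statement, the combined ALTER can be assembled by a small script. The helper `combined_alter` below is hypothetical and only concatenates strings: it assumes the table and column names are trusted, does no quoting or escaping, and keeps the NULL attribute from the commands above.

```python
def combined_alter(table, columns, charset="utf8mb4", collation="utf8mb4_general_ci"):
    """Build one ALTER TABLE with a MODIFY COLUMN clause per (name, type) pair."""
    clauses = [
        f"MODIFY COLUMN {name} {col_type} CHARACTER SET {charset} COLLATE {collation} NULL"
        for name, col_type in columns
    ]
    # All clauses go into a single statement so every column changes together.
    return f"ALTER TABLE {table}\n" + ",\n".join(clauses) + ";"

statement = combined_alter(
    "databaseName.tableName",
    [("columnName1", "text"), ("columnName2", "varchar(128)")],
)
print(statement)
```

The printed statement can then be pasted into the SQL console in place of the per-column commands the tool generated.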
utf8 or utf8mb4 ? See here.