MySQL utf8mb4 keeps showing ???? characters

I would like to save extremely rare Chinese characters in a MySQL database,
but ???? keeps showing after the text is inserted into the table.
I have already searched for how to insert 4-byte Chinese characters into MySQL and followed the process described in ("Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC?), but it still does not work. Is there something I am missing?
FYI, I have upgraded MySQL to 5.7.18, and I am inserting the rare Chinese characters directly from phpMyAdmin.
Thanks in advance!

As per the CREATE TABLE Syntax:
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] tbl_name
(create_definition,...)
[table_options]
[partition_options]
[...]
table_option:
ENGINE [=] engine_name
| AUTO_INCREMENT [=] value
| AVG_ROW_LENGTH [=] value
| [DEFAULT] CHARACTER SET [=] charset_name
In other words:
CREATE TABLE test (
some_column VARCHAR(100)
)
CHARACTER SET = utf8mb4;
It won't hurt either to pick a specific collation.

You also must change the connection to be utf8mb4. See http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored for discussion of "question marks".
Note: The data failed to be stored correctly; it cannot be recovered without re-inserting the data.
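Why a plain utf8 table or connection is not enough can be checked outside MySQL. The following Python sketch is not part of the original answer, and the sample character U+20000 is an arbitrary choice of rare CJK ideograph. It shows that such characters need four bytes in UTF-8, which MySQL's legacy utf8 (at most three bytes per character) cannot hold and utf8mb4 can:

```python
# Illustrative only: U+20000 is an arbitrary rare CJK ideograph (Extension B),
# picked as an assumption; any character above U+FFFF behaves the same way.
common = "\u4e2d"      # a common Chinese character inside the Basic Multilingual Plane
rare = "\U00020000"    # a rare Chinese character outside the Basic Multilingual Plane


def utf8_width(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(ch: str) -> bool:
    """MySQL's legacy utf8 (utf8mb3) stores at most 3 bytes per character."""
    return utf8_width(ch) <= 3


print(utf8_width(common), fits_mysql_utf8(common))
print(utf8_width(rare), fits_mysql_utf8(rare))
```

If the second line reports a width of 4, the table, the column, and the connection all have to use utf8mb4 for the character to survive the round trip.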

Related

Illegal mix of collations error in mysql query

Is there any way to compare the generated range column in this MySQL query?
SELECT ue.bundle, ue.timestamp, b.id, bv.id AS bundleVersionId, bv.start_date, bv.end_date, bv.type, ue.type
FROM (
  SELECT bundle, timestamp, tenant,
         CASE WHEN Document_Id = '' THEN 'potrait' WHEN Document_Id <> '' THEN 'persisted' END AS type
  FROM uds_expanded
) ue
JOIN bundle b ON b.name = ue.bundle
JOIN bundle_version bv ON b.id = bv.bundle_id
WHERE ue.tenant = '02306' AND ue.timestamp >= bv.start_date AND ue.timestamp <= bv.end_date AND ue.type = bv.type;
I get the following error when I compare the types (ue.type = bv.type):
Error Code: 1267. Illegal mix of collations (utf8_general_ci,COERCIBLE) and (latin1_swedish_ci,IMPLICIT) for operation '=' 0.000 sec
Stick to one character set and collation across your entire system. Right now you seem to be using utf8 in one place and latin1 in another. Convert the latter to utf8 as well and the error should go away.
You can convert a table to utf8 with:
alter table <some_table> convert to character set utf8 collate utf8_general_ci;
I think this sometimes happens when different ORM utilities generate the tables and we then test queries from the mysql command line or MySQL Workbench: the table collation differs from the one used by the command line or application. A simple workaround is to declare the variables you compare against table columns with an explicit collation.
Example:
mysql> set @testCode = 'test2' collate utf8_unicode_ci;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from test where code = @testCode;
Be aware that individual columns can have their own collation.
For example, Doctrine generates VARCHAR columns as CHARACTER SET utf8 COLLATE utf8_unicode_ci, and changing the table collation doesn't affect the individual columns.
You can change a column's collation with this command:
ALTER TABLE `table`
CHANGE COLUMN `test` `test` VARCHAR(15) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
or in the MySQL Workbench interface: right-click the table, choose Alter Table, then click a column and modify it.
Use ascii_bin wherever possible; it will match up with almost any collation.

MySQL column name is a weird character - how do I change it?

I'm examining a table in MySQL that has a weird column name. I want to change the name of the column to not be weird. I can't figure out how to do so.
First, if I run
SET NAMES utf8;
DESC `tblName`;
I get
| Ԫ | varchar(255) | YES | MUL | NULL | |
Instead, doing
SET NAMES latin1;
DESC `tblName`;
Results in
| ? | varchar(255) | YES | MUL | NULL | |
Fair enough: this makes me think the column name is simply a latin1 question mark. But this statement doesn't work:
mysql> ALTER TABLE `tblName` CHANGE COLUMN `?` `newName` VARCHAR(255);
ERROR 1054 (42S22): Unknown column '?' in 'tblName'
So I went to the information_schema table for some info:
mysql> SELECT column_name, HEX(column_name), ordinal_position FROM information_schema.columns WHERE table_schema = 'myschema' AND table_name = 'tblName' ;
| ? | D4AA | 48 |
I looked up this hex value, and assuming I looked it up correctly (which may not be true), I determined that this character is "풪", the "hangul syllable pweoj". So I tried that in an ALTER TABLE statement, to no avail:
ALTER TABLE `tblName` change column `풪` `newName` VARCHAR(255);
So that's where I'm stuck.
I figured out a way to do this (but I wonder if there's a better solution?)
I did a SHOW CREATE statement:
mysql> SHOW CREATE TABLE `tblName`;
...
`Ԫ` varchar(255) DEFAULT NULL,
I looked for the column in question, which was printed strangely (what you see above doesn't quite match it). The closing backtick wasn't visible. But I highlighted what was visible and pasted that into my ALTER TABLE and that finally fixed the issue.
I believe that Ԫ (shown as a question mark in a box) appears because your system has no font covering that code point. From your HEX(column_name) output we can see that the value is 0xD4AA, which is the UTF-8 encoding of the character. This translates to Unicode code point U+052A, for which I don't have a font either on my Windows box.
Setting the character set to latin1 simply meant that MySQL was unable to translate that character to a latin1/cp1252 value, so it replaced it with "?". (0xD4AA could easily be read as two cp1252 characters, "Ôª". For some reason MySQL chose not to. Perhaps it knew the original encoding?)
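The three readings of 0xD4AA mentioned in this thread can be compared with a short Python sketch. It is an illustration only, not something the original posters ran, and it prints code points rather than glyphs so the result does not depend on fonts:

```python
# Bytes reported by HEX(column_name) in the question.
raw = bytes.fromhex("D4AA")

as_utf8 = raw.decode("utf-8")        # the bytes read as one UTF-8 character
as_cp1252 = raw.decode("cp1252")     # the same bytes read as two cp1252 characters
as_code_point = chr(0xD4AA)          # the hex misread as a code point (the Hangul guess)


def code_points(text: str) -> list:
    """Format every character of text as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(as_utf8))
print(code_points(as_cp1252))
print(code_points(as_code_point))
```

The first line corresponds to the real column name, the second to the two-character cp1252 reading, and the third to the Hangul syllable the asker found by treating the hex digits as a code point instead of as encoded bytes.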
Now, how to rename the column? It should be as simple as you say, with ALTER TABLE ... CHANGE COLUMN. However, it seems that the mysql console doesn't play nicely with non-ASCII characters, especially the variable-length characters found in UTF-8.
The solution was to pass the SQL as an argument to mysql from Bash instead. For example (ensure the terminal's translation is set to UTF-8 before pasting Ԫ):
mysql --default-character-set=utf8 -e "ALTER TABLE test change column Ԫ test varchar(255);" test

MySQL LOWER() function not multi-byte safe for the º character?

When I encode the following character as UTF-8:
º
I get:
Âº
Then, with Âº stored as a field value, I select the field with the LOWER() function and get
âº
I was expecting it to respect that the value is a multi-byte character and thus not perform the LOWER on it.
Expected:
º
Am I wrong in understanding that the LOWER() function is supposed to be multi-byte safe, as stated in the manual? (http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_lower)
Or am I doing something wrong here?
I am running MySQL 5.1.
EDIT
The encoding on the table is set to UTF-8. The session encoding is default latin1.
Here are my repro steps.
CREATE TABLE test_table (
test_field VARCHAR(1000) DEFAULT NULL
) ENGINE=INNODB DEFAULT CHARSET=utf8;
INSERT INTO test_table(test_field) VALUES('Âº');
SELECT LOWER(test_field) FROM test_table;
INSERT INTO test_table(test_field) VALUES('Âº');
will insert a two-character string, whose correct LOWER() is "âº":
Lower("Â") is "â"
Lower("º") is "º"
If you want to insert "º", then make sure you have
SET NAMES 'utf8';
and
INSERT INTO test_table(test_field) VALUES('º');
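The mechanism behind the two-character string can be reproduced outside MySQL. This Python sketch is an illustration under the assumption that the client sent UTF-8 bytes over a latin1 session. Python's latin-1 codec stands in for MySQL's latin1 here, and the two agree for these two bytes:

```python
original = "\u00ba"                       # the masculine ordinal indicator from the question
utf8_bytes = original.encode("utf-8")     # two bytes: 0xC2 0xBA

# Assumption: the session charset is latin1, so each byte is taken as one character.
misread = utf8_bytes.decode("latin-1")
lowered = misread.lower()

print(utf8_bytes)
print(len(misread), ascii(misread), ascii(lowered))
print(ascii(original.lower()))
```

The first byte, read on its own, is an uppercase letter with a lowercase form, which is why LOWER() changes the stored value. Read as a single character, the ordinal indicator has no lowercase mapping and stays as it is.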

MySQL: char_length(), wrong value for Russian

I am using CHAR_LENGTH() to measure the size of "Русский". Strangely, instead of telling me that it's 7 characters, it tells me there are 14. Interestingly, if the query is simply...
SELECT CHAR_LENGTH('Русский')
...the answer is correct. However, if I query the DB instead, the answer is 14:
SELECT CHAR_LENGTH(text) FROM locales WHERE lang = 'ru-RU' AND name = 'lang_name'
Anybody got any ideas what I might be doing wrong? I can confirm that the collation is utf8_general_ci and the table is MyISAM.
Thanks,
Adrien
EDIT: My end objective is to be able to measure the lengths of records in a table containing single- and double-byte characters (e.g. English and Russian, but not limited to these two languages).
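Because two bytes are used for each of these characters in UTF-8. CHAR_LENGTH() counts characters rather than bytes, so a result of 14 suggests that the text was stored through a connection using a different character set, leaving each byte as a separate character.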
See http://dev.mysql.com/doc/refman/5.5/en/string-functions.html#function_char-length
mysql> set names utf8;
mysql> SELECT CHAR_LENGTH('Русский'); result - 7
mysql> SELECT CHAR_LENGTH('test'); result - 4
create table test123 (
text VARCHAR(255) NOT NULL DEFAULT '',
text_text TEXT) Engine=Innodb default charset=UTF8;
insert into test123 VALUES('русский','test русский');
SELECT CHAR_LENGTH(text),CHAR_LENGTH(text_text) from test123; result - 7 and 12
I tested the same steps after set names koi8r (create table and so on) and got an invalid result.
So the solution is to recreate the table and re-insert all the data after running set names utf8.
The function's answer is guided by the nearest character set available:
in the case of a column, the column definition;
in the case of a literal, the connection default.
Review the column's character set with:
SELECT CHARACTER_SET_NAME FROM information_schema.`COLUMNS`
where table_name = 'locales'
and column_name = 'text'
Be careful: this query is not filtered by table_schema.
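The 7 versus 14 discrepancy can be reproduced with a small Python sketch. It is only an illustration and rests on the assumption that the UTF-8 bytes were interpreted as a single-byte character set when the row was written. Python's latin-1 codec is used as a stand-in for that charset:

```python
# "Русский" written with escapes so the snippet does not depend on the file encoding.
word = "\u0420\u0443\u0441\u0441\u043a\u0438\u0439"

utf8_bytes = word.encode("utf-8")

# Assumption: on insert, every byte was treated as its own single-byte character.
misread = utf8_bytes.decode("latin-1")

print(len(word))         # characters in the intended text
print(len(utf8_bytes))   # bytes in its UTF-8 form
print(len(misread))      # characters after the bytes were misread
```

In MySQL terms, the first number is what CHAR_LENGTH() should return for correctly stored data and the second is what LENGTH() returns. A CHAR_LENGTH() equal to the byte count points at data stored in the misread form.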

MySql Turkish Characters

I'm writing a program that transfers data from SQL Server to a MySQL database.
The MySQL database's default charset is latin1. The latin5 charset is usually used for Turkish characters, but I can't change the MySQL table's charset because it's a very old database.
Is there any way to import Turkish characters into the MySQL database correctly?
To test try:
CREATE TABLE newtable LIKE oldtable;
-- change the column's character set from latin1 to latin5
ALTER TABLE newtable MODIFY latin1_text_col TEXT CHARACTER SET latin5;
INSERT INTO newtable
SELECT * from oldtable;
If everything looks good you can drop the old table and rename the newtable to have the same name as the oldtable.
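Why the column's character set matters for Turkish text can be seen in a short Python sketch. It is an illustration only: Python's cp1252 and iso8859_9 codecs are assumed to stand in for MySQL's latin1 and latin5, and the four sample letters are an arbitrary choice:

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Four Turkish letters (s-cedilla, g-breve, dotless i, dotted capital I),
# written as escapes so the snippet does not depend on the file encoding.
turkish = "\u015f\u011f\u0131\u0130"

print(encodable(turkish, "cp1252"))     # stand-in for MySQL latin1
print(encodable(turkish, "iso8859_9"))  # stand-in for MySQL latin5
```

Letters that have no slot in the column's character set cannot be stored faithfully, which is why the test above converts the column to latin5 before copying the data.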