Does anyone know what encoding is used here: #T0#g0#x0#y0#w0#u0#p0#q0#o0.MYD ?
This is a file name corresponding to a table whose name uses Cyrillic letters.
This is MySQL's internal filename encoding, documented here.
You can convert it back to normal utf8 with a query like:
mysql> SELECT CONVERT(_filename'#T0#g0#x0#y0#w0#u0#p0#q0#o0' USING utf8);
+------------------------------------------------------------+
| CONVERT(_filename'#T0#g0#x0#y0#w0#u0#p0#q0#o0' USING utf8) |
+------------------------------------------------------------+
| Настройки |
+------------------------------------------------------------+
1 row in set (0.00 sec)
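If you only need the table names in readable form rather than decoding file names by hand, the server already exposes them decoded; a small sketch (assuming the schema is called mydb):
mysql> SELECT table_name FROM information_schema.tables WHERE table_schema = 'mydb';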
Related
I have a table defined as follows:
mysql> show create table temptest;
+------------+-----------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+-----------------------------------------------------------------------------------------------------------+
| temptest | CREATE TABLE `temptest` (
`mystring` varchar(100) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+------------+-----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
When I use the mysql console (opened with mysql temptest) and insert a character with
insert into temptest values ("é");
I can see it is saved in latin1 encoding:
mysql> select hex(mystring) from temptest;
+---------------+
| hex(mystring) |
+---------------+
| E9 |
+---------------+
But if I issue a "set names latin1" and perform the same operation, I see it storing the same character in utf8 encoding.
mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into temptest values ("é");
Query OK, 1 row affected (0.01 sec)
mysql> select hex(mystring) from temptest;
+---------------+
| hex(mystring) |
+---------------+
| E9 |
| C3A9 |
+---------------+
As far as I understand, "set names" shouldn't affect how mysql stores the data (https://dev.mysql.com/doc/refman/8.0/en/set-names.html). What am I missing here? Any insight into this would be greatly appreciated. Thank you.
SET NAMES latin1 declares that the encoding in your client is latin1.
But (apparently) it is actually utf8.
So, when you type é, the client generates the 2 bytes C3 A9.
Then those are sent as if they were latin1 to the server (mysqld).
The Server says "Oh, I am getting some latin1 bytes, and I will be putting them into a latin1 column, so I don't need to transform them."
In go two latin1 characters Ã© (hex C3A9). This is called Mojibake.
If you do SET NAMES utf8 and SELECT the text, you will "see" Ã© and it will be 4 bytes (hex C383C2A9)!
Bottom line: Your client encoding was really utf8, so you should have said SET NAMES utf8 (or utf8mb4). Confused? Welcome to the club.
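A quick way to check what your client is really sending (a sketch, not part of the original answer):
mysql> SET NAMES latin1;
mysql> SELECT HEX('é');  -- E9 means the terminal really sends latin1; C3A9 means it actually sends utf8
mysql> SHOW VARIABLES LIKE 'character_set%';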
From the command
select char(0x542d01);
I expect
T-1
But MySQL returns
T-:)
From a little Google research I found that the result can be modified by specifying the charset in the command, something like
select char(0x542d01 using utf8);
But I wasn't able to find a way to read T-1 from 0x542d01. Would someone give me a hand here, please?
More generally, I think I have a charset issue here.
The CONVERT function can be used to convert data between different charsets.
Reference:
http://dev.mysql.com/doc/refman/5.7/en/charset-convert.html
Here is an example of your data:
mysql> select convert(0x542d01 using utf8);
+------------------------------+
| convert(0x542d01 using utf8) |
+------------------------------+
| T- |
+------------------------------+
1 row in set (0.00 sec)
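As an aside, if the intent was to get the literal string T-1, note that the byte for the digit 1 is 0x31, not 0x01, so something like this would do it:
mysql> select char(0x542d31 using utf8);  -- bytes 54 2D 31 are 'T', '-', '1'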
So we originally had latin1 for our MySQL database (a long long time ago) and we are trying to convert to UTF8 before a more global outreach, but I'm having issues with the transition. Here's my MySQL:
/* First set as latin1 */
SET NAMES 'latin1';
/* We must change things to blob and then back again */
ALTER TABLE `address` CHANGE line_1 line_1 BLOB;
ALTER TABLE `address` CONVERT TO CHARACTER SET utf8;
ALTER TABLE `address` CHANGE line_1 line_1 VARCHAR(64);
And the error we are getting:
Incorrect string value: '\xF6gberg...' for column 'line_1' at row 7578
ALTER TABLE `address` CHANGE line_1 line_1 VARCHAR(64)
The method we are using is basically the one described here:
http://www.percona.com/blog/2013/10/16/utf8-data-on-latin1-tables-converting-to-utf8-without-downtime-or-double-encoding/
Any ideas would be great. (Also, since I'm not an expert in MySQL, I'm not sure what kind of data you would need, so let me know if you need anything additional.)
Update
I've tried
SET NAMES 'utf8';
SET NAMES 'utf8mb4';
And I tried using utf8mb4 as was described below. After switching to utf8mb4 (which I'll likely keep), the alteration of the address table still produced the same problem.
Update 2
So I tried looking at converting the string itself to see what's happening and noticed something super weird:
mysql> select line_1 from address where line_1 like '%berg%';
+------------------------+
| line_1 |
+------------------------+
| H�gbergsgatan 97      |
+------------------------+
mysql> select CONVERT(line_1 USING utf8) from address where line_1 like '%berg%';
+----------------------------+
| CONVERT(line_1 USING utf8) |
+----------------------------+
| NULL |
+----------------------------+
mysql> select CONVERT(line_1 USING utf8mb4) from address where line_1 like '%berg%';
+-------------------------------+
| CONVERT(line_1 USING utf8mb4) |
+-------------------------------+
| NULL |
+-------------------------------+
mysql> select CONVERT(line_1 USING latin1) from address where line_1 like '%berg%';
+------------------------------+
| CONVERT(line_1 USING latin1) |
+------------------------------+
| Högbergsgatan 97 |
+------------------------------+
So it seems like utf8 isn't the proper encoding for this? o_O Since I'm working with addresses I was able to look it up: the address is in Stockholm and is supposed to be "Högbergsgatan 97", which matches latin1. I tried the Swedish character encoding, but that seems to have failed as well:
mysql> select CONVERT(line_1 USING swe7) from address where addressid = 11065;
+----------------------------+
| CONVERT(line_1 USING swe7) |
+----------------------------+
| H?gbergsgatan 97 |
+----------------------------+
So I'm trying to see what I can do to rectify this.
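One more diagnostic I could run (just a sketch, using the same addressid as above) is to look at the raw bytes instead of how the client renders them:
mysql> select hex(line_1) from address where addressid = 11065;  -- a lone F6 where the ö should be means the column holds latin1 bytes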
Also, note that I had forgotten to state earlier that I'm using MySQL 5.6 (if that makes any difference).
When I write special latin1 characters, for example
á, é, ã, ê
to a utf-8 encoded MySQL table, is that data lost?
The charset for that table is utf-8.
Is there any way to get those latin1-encoded rows back so I can convert them to utf-8 and write them back (this time in the right way)?
Update
I think I wasn't very specific about what I meant by "data". By data I mean the special characters, not the row.
When selecting, I still get the row and the fields, but with '?' instead of the special latin1 characters. Is it possible to recover those '?' and transform them into the right utf8 ones?
If the whole database (or a whole table) is affected, you can first verify that it is a Latin1-as-UTF8 charset problem with SET NAMES Latin1:
mysql> select txt from tbl;
+-----------+
| txt |
+-----------+
| QuÃ©bec  |
| QuÃ©bec  |
+-----------+
2 rows in set (0.00 sec)
mysql> SET NAMES Latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> select txt from tbl;
+---------+
| txt |
+---------+
| Québec |
| Québec |
+---------+
2 rows in set (0.00 sec)
If this verifies, i.e. you get the desired data when using default charset Latin-1, then you can dump the whole table forcing --default-character-set=latin1 so that a file will be created with the correct data, albeit with the wrong charset specification.
But now you can replace the header row stating
/*!40101 SET NAMES latin1 */;
with UTF8. Reimport the database and you're done.
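A rough sketch of those steps on the command line (db and tbl are placeholders for your database and table names; GNU sed shown):
mysqldump --default-character-set=latin1 db tbl > dump.sql
# edit the header: change "SET NAMES latin1" to "SET NAMES utf8"
sed -i 's/SET NAMES latin1/SET NAMES utf8/' dump.sql
mysql db < dump.sql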
If only some rows are affected, then it is much more difficult:
SELECT txt, CAST(CAST(txt AS CHAR CHARACTER SET Latin1) AS BINARY) AS utf8 FROM tbl;
+-----------+---------+
| txt | utf8 |
+-----------+---------+
| QuÃ©bec  | Québec  |
+-----------+---------+
1 row in set (0.00 sec)
...but you have the problem of locating the affected rows. Some of the code points you might find with
WHERE txt LIKE '%Ã%'
but for the others, you'll have to sample manually.
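For rows you do locate, an in-place fix along these lines is possible (a sketch, assuming the column really is utf8 and only double-encoded rows match the WHERE clause):
UPDATE tbl
   SET txt = CONVERT(CAST(CONVERT(txt USING latin1) AS BINARY) USING utf8)
 WHERE txt LIKE '%Ã%';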
The data is not lost. See this SQLFiddle example
The additional affected rows can be found with the following query, which returns every row containing at least one byte outside the 7-bit ASCII range:
SELECT column
FROM table
WHERE NOT HEX(column) REGEXP '^([0-7][0-9A-F])*$'
I've got a database where we store usernames with a capital first letter of each name -- i.e., IsaacSparling. I'm trying to do case-insensitive autocomplete against my MySQL (v5.1.46) db. The table has a charset of UTF8 and a collation of utf8_unicode_ci. I've run these tests against the utf8_general_ci collation as well.
Plain ASCII text works fine:
mysql> select username from users where username like 'j%';
+----------------+
| username |
+----------------+
| J******** |
| J*********** |
| J************* |
+----------------+
3 rows in set (0.00 sec)
mysql> select username from users where username like 'J%';
+----------------+
| username |
+----------------+
| J******** |
| J*********** |
| J************* |
+----------------+
3 rows in set (0.00 sec)
(names redacted, but they're there).
However, when I try to do the same for unicode characters outside the ASCII set, no such luck:
mysql> select username from users where username like 'ø%';
Empty set (0.00 sec)
mysql> select username from users where username like 'Ø%';
+-------------+
| username |
+-------------+
| Ø********* |
+-------------+
1 row in set (0.00 sec)
Some investigation has led me to this: http://bugs.mysql.com/bug.php?id=19567 (tl;dr, this is a known bug with the unicode collations, and fixing it is at 'new feature' priority -- i.e., it won't be finished in any reasonable timeframe).
Has anybody discovered any effective workarounds that allow for case-insensitive searching for unicode characters in MySQL? Any thoughts appreciated!
Works fine for me with version 5.1.42-community
Maybe your mysql client did not send the unicode characters properly. I tested with sqlYog and it worked just fine with both utf8_unicode_ci and utf8_general_ci collations
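One thing worth checking (just a suggestion, not something I tested here) is what bytes your client actually sends for the character:
mysql> SELECT HEX('ø'), @@character_set_client;  -- over a properly configured utf8 connection the first column is C3B8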
If what you care about is being able to order the field values without regard to upper or lower case, I think the best thing you can do is to select LOWER(username) username instead of just username; then you can order by that field, referring to it by its alias, as sketched below.
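A sketch of that idea, using the table and column from the question:
mysql> select LOWER(username) AS username from users order by username;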
Have you tried using CONVERT? Something like
WHERE `lastname` LIKE CONVERT( _utf8 'ø%' USING latin1 )
might work for you.
I just resolved the same problem. Using the query
show variables like '%char%';
I found that my character_set_client was set to 'utf8', but character_set_connection and character_set_results were set to 'latin1'. Thus, UPPER, LOWER and LIKE did not work as expected.
I just inserted the line
mysql_query("SET NAMES utf8");
right after connecting, to make the case-insensitive searching work.
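For reference, the effect of that statement can be seen directly in the session (a small sketch): SET NAMES utf8 updates character_set_client, character_set_connection and character_set_results in one go.
mysql> SET NAMES utf8;
mysql> SHOW VARIABLES LIKE 'character_set_%';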