Change MySQL encoding (regarding hex codes?), such as <fc> to ü

I am not sure what this problem is called, but I think my data shows hex codes instead of the actual accented characters.
To be more precise, I have a MySQL database with data like:
G<e9>rard instead of Gérard, or
M<fc>nster instead of Münster.
According to phpMyAdmin, my columns use the utf8_unicode_ci collation.
I want to convert strings like <e9> into é, either directly in the MySQL database or in PHP when the output is displayed.
Others were apparently able to convert their MySQL tables successfully with this statement:
UPDATE table_name SET
column1=convert(cast(convert(column1 using latin1) as binary) using utf8),
column2=convert(cast(convert(column2 using latin1) as binary) using utf8);
However, this doesn't change anything in my case.
So how can I achieve the conversion?
Thank you!

Here's how I would fix this if the special characters are literally stored as four-character sequences such as <e9>.
First, make sure the whole table is converted to utf8mb4:
ALTER TABLE mytable CONVERT TO CHARACTER SET utf8mb4;
Then use the REPLACE() function to fix the sequences one character at a time:
UPDATE mytable SET
column1 = REPLACE(column1, '<e9>', 'é'),
column2 = REPLACE(column2, '<e9>', 'é');
Be careful if you edit this query by copy and paste: make sure you change column2 on both the left and the right side of the =. If you forget one, you could copy the content of column1 into column2 and lose the old content of column2.
Once you're done with é, then do a similar statement for ü:
UPDATE mytable SET
column1 = REPLACE(column1, '<fc>', 'ü'),
column2 = REPLACE(column2, '<fc>', 'ü');
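Copying the statement for every character is where the column1/column2 mix-up described above tends to creep in. Here is a small Python sketch that generates the UPDATE from a list of columns and hex sequences, so each column name appears on both sides of the = by construction. The helper build_replace_update is hypothetical, not part of MySQL or the original answer. It nests REPLACE() calls so that all sequences are handled in one statement. It does no SQL escaping, so it rejects strings containing single quotes:

```python
def build_replace_update(table, columns, pairs):
    """Build one UPDATE that applies every (search, replacement) pair to every column.

    Hypothetical helper: no SQL escaping is done, so single quotes are rejected.
    """
    for search, repl in pairs:
        if "'" in search or "'" in repl:
            raise ValueError("single quotes are not supported by this sketch")
    assignments = []
    for col in columns:
        expr = col  # the same name is reused on both sides of '='
        for search, repl in pairs:
            # Nest REPLACE() calls: REPLACE(REPLACE(col, a, b), c, d)
            expr = f"REPLACE({expr}, '{search}', '{repl}')"
        assignments.append(f"{col} = {expr}")
    return f"UPDATE {table} SET\n" + ",\n".join(assignments) + ";"


print(build_replace_update("mytable", ["column1", "column2"],
                           [("<e9>", "é"), ("<fc>", "ü")]))
```

Print the generated statement and review it before you run it against the database.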
Gradually you will clean up all these hex sequences. You can search the table to see whether any remain:
SELECT DISTINCT REGEXP_SUBSTR(column1, '<[[:xdigit:]]{2}>') FROM mytable
WHERE REGEXP_LIKE(column1, '<[[:xdigit:]]{2}>');
(REGEXP_SUBSTR() and REGEXP_LIKE() require MySQL 8.0 or later.)
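The question also suggests converting at output time. For that, the same <xx> pattern can be decoded in application code. Below is a minimal sketch in Python; in PHP, preg_replace_callback() would follow the same shape. It assumes every <xx> stands for a Latin-1 (ISO-8859-1) byte, which means the hex value is also the Unicode code point. The name decode_hex_escapes is hypothetical:

```python
import re

# Matches the same "<xx>" sequences as the REGEXP_SUBSTR() query: two hex digits in angle brackets
HEX_ESCAPE = re.compile(r"<([0-9A-Fa-f]{2})>")


def decode_hex_escapes(text: str) -> str:
    """Replace each <xx> with the character whose code point is 0xXX.

    Assumption: the sequences stand for Latin-1 bytes, where byte value == code point.
    """
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_hex_escapes("G<e9>rard"))
print(decode_hex_escapes("M<fc>nster"))
```

If the data originally came from Windows-1252 rather than Latin-1, bytes 80 to 9F map to different characters (for example, 80 is € in Windows-1252), so the chr() step would need to change.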

Related

Losing data on converting MySQL latin1_swedish_ci to utf8_unicode_ci

When I try to convert data from latin1_swedish_ci to utf8_unicode_ci, I lose data! The TEXT column is cut off at the first special character.
For example:
Becomes:
I have tried many ways to convert the column, and every one of them deletes the data from the first special character onward.
I tried phpMyAdmin and this SQL statement:
UPDATE `page` SET page_text = CONVERT(cast(CONVERT(page_text USING latin1) AS BINARY) USING utf8);
I also tried this PHP script:
https://github.com/nicjansma/mysql-convert-latin1-to-utf8/blob/master/mysql-convert-latin1-to-utf8.php
Every time the result is the same: the data is lost at the first special character.
What should I do?
UPDATE
I could change the data to utf8 with
ALTER TABLE page CONVERT TO CHARACTER SET utf8mb4;
or
ALTER TABLE page CONVERT TO CHARACTER SET utf8;
without losing data, but then the special characters are not displayed properly.
Using the PHP function utf8_encode($myvar) does display the special characters correctly.
To convert a table, use
ALTER TABLE ... CONVERT TO ...
Or, to change individual columns, use
ALTER TABLE ... MODIFY COLUMN ...
You seem to have done something different instead. For further analysis, please provide the output of SELECT col, HEX(col) ... before and after the conversion, plus the exact conversion you used.
See "truncated" in the linked answer. The proper fix depends on what you see in the HEX output.
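As a rough illustration of that triage, here is a Python sketch that classifies the bytes behind a SELECT HEX(col) value. The function names are hypothetical, and the double-encoding check is only a heuristic. truncated_utf8 mimics the "truncated" symptom. It assumes that the conversion stops at the first byte sequence that is not valid UTF-8, which is what you would expect when genuine latin1 bytes such as E9 are reinterpreted as utf8. The question notes that utf8_encode() (ISO-8859-1 to UTF-8) displays the text correctly, which is consistent with the first case:

```python
def diagnose(raw: bytes) -> str:
    """Heuristically classify the bytes shown by SELECT HEX(col)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not UTF-8 (probably latin1 bytes)"
    try:
        # Undo one layer of "UTF-8 bytes read as latin1, then re-encoded"
        undone = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "valid UTF-8"
    return "double-encoded UTF-8" if undone != text else "valid UTF-8"


def truncated_utf8(raw: bytes) -> str:
    """Keep only the text before the first invalid UTF-8 sequence."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return raw[: err.start].decode("utf-8")


for hex_value in ("47E972617264", "47C3A972617264", "47C383C2A972617264"):
    raw = bytes.fromhex(hex_value)  # paste the output of HEX(col) here
    print(hex_value, diagnose(raw), repr(truncated_utf8(raw)))
```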

Search for replacement character (no TSQL)

I'm trying to find a way to search for the replacement character U+FFFD in SQL (I'm using MariaDB), but I cannot make it work. I tried:
SELECT id FROM tablename WHERE content LIKE "%\ufffd%";
SELECT id FROM tablename WHERE content LIKE "%�%"
Neither query works for me. Some answers suggest UNICODE(), but that is a T-SQL function and is not available in MariaDB. Is there a solution?
What CHARACTER SET are you using? FFFD is the hex for the Unicode code point; its UTF-8 encoding is EFBFBD.
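Here's another way to look for it. For a ucs2 or utf16 column, where each character is four hex digits:
WHERE HEX(col) REGEXP '^(....)*FFFD'
or, for a utf8 or utf8mb4 column, where (..)* keeps the match aligned on byte boundaries: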
WHERE HEX(col) REGEXP '^(..)*EFBFBD'
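The following Python sketch shows why the two patterns differ. It mimics HEX() for a column in either encoding. hex_of is a hypothetical helper, and using utf-16-be to stand in for MySQL's ucs2/utf16 storage is an assumption. The sketch also shows why the (..)* prefix matters: without it, EFBFBD could match across a byte boundary.

```python
import re

# Byte-aligned search for U+FFFD in a utf8/utf8mb4 column (2 hex digits per byte)
UTF8_FFFD = re.compile(r"^(..)*EFBFBD")
# Code-unit-aligned search in a ucs2/utf16 column (4 hex digits per code unit)
UTF16_FFFD = re.compile(r"^(....)*FFFD")


def hex_of(text: str, encoding: str) -> str:
    """Roughly what HEX(col) shows for `text` stored in `encoding`."""
    return text.encode(encoding).hex().upper()


print(hex_of("\ufffd", "utf-8"))  # UTF-8 encoding of the replacement character
print(hex_of("a\ufffdb", "utf-16-be"))
print(bool(UTF8_FFFD.search(hex_of("a\ufffdb", "utf-8"))))
print(bool(UTF8_FFFD.search("1EFBFBD0")))  # EFBFBD at an odd offset is rejected
```

For example, the hex string 1EFBFBD0 contains EFBFBD starting at an odd offset. An unanchored search would report a false match, but the anchored pattern correctly rejects it.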
What results do you get? Do you see an error? Try this simple query, or change your column type:
SELECT '�' AS a FROM DUAL WHERE '�' LIKE '%�%';

Attempting to rename row names using mysql / regex

I am trying to rename the values in a column using MySQL, possibly with a regex or REPLACE().
The names are set up like this:
FL_Miamidade1026295
I need them to look like this...
FLMiami-Dade_1026295
I think the SQL statement would look something like this, but I'm not sure how to do the replace part:
UPDATE tableName
WHERE columnName LIKE '%Miamidade%'
I would need a regex to change only the middle part of the string.
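If this is the only pattern you need to replace, you can simply use REPLACE(). Include the leading underscore in the search string so that FL_Miamidade1026295 becomes FLMiami-Dade_1026295:
UPDATE tableName
SET columnName = REPLACE(columnName, '_Miamidade', 'Miami-Dade_')
WHERE columnName LIKE '%Miamidade%';
REPLACE() itself always matches case-sensitively, but LIKE follows the collation. If the WHERE filter on Miamidade also has to be case-sensitive (meaning miamidade is invalid), use a case-sensitive collation based on the database settings.
For example:
WHERE columnName LIKE '%Miamidade%' COLLATE latin1_general_cs
or use REGEXP BINARY: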
WHERE columnName REGEXP BINARY 'Miamidade'
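Before running the UPDATE, it can help to test the pattern outside the database. Here is a Python sketch that assumes, hypothetically, that all names follow the same shape: a two-letter prefix, an underscore, Miamidade, then digits. The capture groups keep the prefix and the number while the middle part is rewritten:

```python
import re

# Hypothetical naming scheme: two capital letters, "_Miamidade", then digits
PATTERN = re.compile(r"^([A-Z]{2})_Miamidade(\d+)$")


def rename(name: str) -> str:
    """Rewrite FL_Miamidade1026295 as FLMiami-Dade_1026295; leave other names alone."""
    return PATTERN.sub(r"\1Miami-Dade_\2", name)


print(rename("FL_Miamidade1026295"))
```

Like REPLACE(), this match is case-sensitive, so FL_miamidade1026295 is left unchanged.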

How to remove ¶ (pilcrow) sign from database records

My aim is to remove the ¶ (pilcrow) sign from database records. There are thousands of records, so I cannot do it manually. Is there a script that can remove the ¶ (pilcrow) sign from a MySQL column?
UPDATE table1 SET myfield = REPLACE(myfield,'¶','') WHERE myfield LIKE '%¶%';
If you want to replace ¶ with a line break instead, use:
UPDATE table1 SET myfield = REPLACE(myfield,'¶','\n') WHERE myfield LIKE '%¶%'; -- linefeed
or
UPDATE table1 SET myfield = REPLACE(myfield,'¶','\r\n') WHERE myfield LIKE '%¶%'; -- carriage return + linefeed
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_replace
Make sure the collation and charset of the connection and the column in question are the same:
DESCRIBE table1;
-- copy the column charset and collation
SET NAMES '<insert charset name>' COLLATE '<insert collation name>';
Now rerun the query.
See: http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
Here's how I solved this (yes, major necro, but hey, I found this old thread, so someone else might also!):
I edited the column directly in phpMyAdmin and discovered that, although the database displayed a pilcrow (¶), it was storing a different character underneath: when I edited the value, no pilcrow appeared, only a carriage return. The pilcrow was the last character in the column, so I selected everything after the character immediately before it, pasted that invisible string into my query, and it worked. The working query looked like this:
UPDATE myTable SET myCol=REPLACE(myCol,'
','') WHERE myKey>'myValue'
Note that the query was one line of code, not two. The invisible carriage return only makes it look like two lines.
I hope this helps someone! I investigated/tried lots of other suggestions, but none of them worked.
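Rather than pasting an invisible character into the query, you could first find out which character it actually is. Below is a minimal Python sketch that assumes you have already fetched the column value as a string. invisible_chars is a hypothetical helper. For each control, format, or separator character, it reports the position, code point, and Unicode category:

```python
import unicodedata

# Unicode categories that display as nothing or as a line break:
# control (Cc), format (Cf), line separator (Zl), paragraph separator (Zp)
INVISIBLE = {"Cc", "Cf", "Zl", "Zp"}


def invisible_chars(value: str):
    """Return (index, code point, category) for each invisible character in value."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.category(ch))
        for i, ch in enumerate(value)
        if unicodedata.category(ch) in INVISIBLE
    ]


print(invisible_chars("some text\r\n"))
```

Once you know the code point (for example, U+000D for a carriage return), you can target that character explicitly instead of relying on copy and paste.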

Unicode (hexadecimal) character literals in MySQL

Is there a way to specify Unicode character literals in MySQL?
I want to replace a Unicode character with an ASCII character, something like the following:
Update MyTbl Set MyFld = Replace(MyFld, "ẏ", "y")
But I'm using even more obscure characters which are not available in most fonts, so I want to be able to use Unicode character literals, something like
Update MyTbl Set MyFld = Replace(MyFld, "\u1e8f", "y")
This SQL statement is invoked from a PHP script, and the first form is not only unreadable, it doesn't actually work!
You can specify hexadecimal literals (or even binary literals) using 0x, x'', or X'':
select 0xC2A2;
select x'C2A2';
select X'C2A2';
But be aware that the return type is a binary string, so each and every byte is considered a character. You can verify this with char_length:
select char_length(0xC2A2)
2
If you want UTF-8 strings instead, you need to use convert:
select convert(0xC2A2 using utf8mb4)
And we can see that C2 A2 is considered 1 character in UTF-8:
select char_length(convert(0xC2A2 using utf8mb4))
1
Note that CONVERT() does not keep invalid bytes; it drops them, so converting bad data can silently lose text:
select char_length(convert(0xC1A2 using utf8mb4))
0
As can be seen, the output is 0 because C1 A2 is an invalid UTF-8 byte sequence.
Thanks for your suggestions, but I think the problem was further back in the system.
There are a lot of levels to unpick, but as far as I can tell (on this server at least), the command
set names utf8
makes the utf-8 handling work correctly, whereas
set character set utf8
doesn't.
In my environment, these are being called from PHP using PDO, for what difference that may make.
Thanks anyway!
You can use the hex and unhex functions, e.g.:
update mytable set myfield = unhex(replace(hex(myfield),'C383','C3'))
The MySQL string literal syntax is documented in the reference manual, and as you can see there, it has no provision for numeric escape sequences such as \u1e8f.
However, as you are embedding the SQL in PHP, you can compute the right bytes in PHP. Make sure the bytes you put into the SQL actually match your client character set.
There is also the CHAR() function, which does what you wanted: you pass it byte values and a character set name, and it returns the character.
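Following the suggestion to compute the bytes in the client, here is a Python sketch that turns a character into both literal forms discussed above. The answer mentions PHP, but the logic is the same. The first form is a hex literal wrapped in CONVERT(... USING utf8mb4), and the second is a CHAR(... USING utf8mb4) call. mysql_char_literals is a hypothetical name, and the sketch assumes utf8mb4 is the target character set:

```python
def mysql_char_literals(ch: str) -> tuple:
    """Return (hex-literal form, CHAR() form) for ch, assuming utf8mb4."""
    data = ch.encode("utf-8")  # UTF-8 bytes, e.g. E1 BA 8F for U+1E8F
    hex_form = f"CONVERT(X'{data.hex().upper()}' USING utf8mb4)"
    char_form = "CHAR(" + ", ".join(f"0x{b:02X}" for b in data) + " USING utf8mb4)"
    return hex_form, char_form


hex_form, char_form = mysql_char_literals("\u1e8f")  # ẏ
print(f"UPDATE MyTbl SET MyFld = REPLACE(MyFld, {hex_form}, 'y');")
print(char_form)
```

The question's ẏ (U+1E8F) encodes to the UTF-8 bytes E1 BA 8F, so the generated statement never needs the character itself to appear in the PHP source or the SQL text.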