MySQL REPLACE Unicode characters

I'm trying to replace a Unicode character with another one.
I'm running this but it's not working.
UPDATE items SET `data` = REPLACE(`data`, '\u030C', '\u0306');
I've tried REPLACE without the \ or \u and also with multiple backslashes like \\\\\\\\\\\\u030C. I've pretty much run out of random combinations to try to make this work.
How can I get this replace working?

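Can we back up a step and avoid getting the \u encoding in the first place? If you are using PHP, json_encode escapes non-ASCII characters as \uXXXX by default, and the JSON_UNESCAPED_UNICODE flag keeps them as real UTF-8: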
$t = json_encode($s, JSON_UNESCAPED_UNICODE);
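PHP can't be run here, so the sketch below uses Python's json module as a stand-in. The assumption is that ensure_ascii=False plays the role of PHP's JSON_UNESCAPED_UNICODE. The sketch shows the two forms that can end up in the column: the six-character escape text, or the character itself.

```python
import json

text = "a\u030c"  # "a" followed by U+030C COMBINING CARON

# Default: non-ASCII characters are written as \uXXXX escapes (six ASCII characters each).
escaped = json.dumps(text)

# ensure_ascii=False keeps the character itself (stored as real UTF-8 bytes).
unescaped = json.dumps(text, ensure_ascii=False)

print(escaped)  # "a\u030c"
print(unescaped)
```

Python writes the escape with lowercase hex digits (\u030c). MySQL's REPLACE() matches case-sensitively, so if the stored escapes are lowercase, a search string spelled \\u030C will not find them.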
Addenda
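In the mysql command-line tool (and in any MySQL string literal unless NO_BACKSLASH_ESCAPES is set), write each backslash twice so that the search string is the literal text \u030C: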
UPDATE items SET `data` = REPLACE(`data`, '\\u030C', '\\u0306');
Are you replacing a combining caron with a combining breve?
Don't you really want the UTF-8 character itself instead of its \u escape code?
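The Python model below shows why the original query changed nothing. It is a simplified version of how MySQL reads a quoted string literal with NO_BACKSLASH_ESCAPES off (the default), based on the escape table in the MySQL manual. read_literal and ESCAPES are illustrative names, and quoting and character sets are ignored. With a single backslash, \u is an unknown escape, so MySQL drops the backslash and REPLACE searches for the plain text u030C.

```python
# Simplified model of MySQL string-literal parsing with NO_BACKSLASH_ESCAPES off.
# Based on the escape table in the MySQL manual; quotes and charsets are ignored.
ESCAPES = {
    "0": "\0", "'": "'", '"': '"', "b": "\b", "n": "\n",
    "r": "\r", "t": "\t", "Z": "\x1a", "\\": "\\",
    # \% and \_ keep their backslash so that LIKE can still see it.
    "%": "\\%", "_": "\\_",
}


def read_literal(body: str) -> str:
    """Return the characters MySQL stores for the text typed between the quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Unknown escapes such as \u just lose the backslash.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(read_literal(r"\u030C"))   # u030C   (REPLACE looks for plain "u030C")
print(read_literal(r"\\u030C"))  # \u030C  (the six-character escape text)
```

If the statement is built inside a PHP string, PHP's own escaping runs before MySQL's. Count the backslashes at each layer.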

Related

MySQL and bizarre backslash escape problem

This failed in Java 13 (JDBC) code, so I went to MySQL Workbench to reproduce the problem.
I run this simple query:
START TRANSACTION;
SET SESSION sql_mode = NO_BACKSLASH_ESCAPES;
SELECT *, "x\\x", "y\y" from dirs
WHERE d_pathname like 'E:\\\\BOOKS\\\\Dictionaries_and_Encyclopedias\\\\%' ORDER BY d_pathname;
and I get 400 rows returned. The issue is that I do not want to use double backslashes.
Rows returned show a single backslash, not a double backslash.
Interestingly, the x\\x and y\y expressions appear in the results exactly as written in the SELECT statement.
When I remove the double backslashes in the LIKE clause, I get zero rows!
Why? I'd rather not have to double up the backslashes; I want to run simple, clean code.
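The NO_BACKSLASH_ESCAPES mode only affects how backslashes are treated in ordinary string literals. It doesn't change how they're processed in LIKE patterns. There, backslash is still the default escape character, so \B in the pattern means a plain B, and a literal backslash has to be written as \\. That is why the single-backslash version returns zero rows.
However, you can use the ESCAPE clause to specify a different escape character for LIKE. Just pick some other character that doesn't appear in your pattern: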
WHERE d_pathname like 'E:\BOOKS\Dictionaries_and_Encyclopedias\%' ESCAPE '|'
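A simplified Python model of LIKE matching makes the difference concrete. The pattern strings passed in are the characters LIKE receives after string-literal parsing. With NO_BACKSLASH_ESCAPES on, that is exactly what is typed between the quotes. The like function is a hypothetical helper, and unlike MySQL's default collations it compares case-sensitively.

```python
import re


def like(value: str, pattern: str, escape: str = "\\") -> bool:
    """Simplified LIKE: % matches any run, _ one character, escape makes the next char literal."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped character matches itself
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


path = "E:\\BOOKS\\Dictionaries_and_Encyclopedias\\Oxford.pdf"

# Single backslashes with the default escape: \B means B, \% means a literal %.
print(like(path, "E:\\BOOKS\\Dictionaries_and_Encyclopedias\\%"))              # False
# Doubled backslashes: each \\ pair matches one real backslash.
print(like(path, "E:\\\\BOOKS\\\\Dictionaries_and_Encyclopedias\\\\%"))        # True
# Single backslashes with ESCAPE '|': backslash is now an ordinary character.
print(like(path, "E:\\BOOKS\\Dictionaries_and_Encyclopedias\\%", escape="|"))  # True
```

With ESCAPE '|', a literal % or _ would then be written as |% or |_.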

MySQL remove special characters using Regex

I want to remove special characters from a MySQL table but keep UTF-8 letters such as Arabic.
This is to remove common special characters such as " ' # ! * $ etc.
I have used the following in PHP, which works great:
preg_replace('/(?=\P{Nd})\P{L}/u', '', $name);
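You can UPDATE your rows using the very same regular expression, but not through REGEXP or RLIKE. Those operators only test whether a string matches; they don't change it.
To actually remove characters you need REGEXP_REPLACE(), which is available from MySQL 8.0. Its ICU-based engine understands Unicode property classes such as \p{L} and \p{Nd}.
I hope this solves your problem. :)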
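You may want to preview what the PHP pattern keeps before running an UPDATE. The sketch below does this roughly, swapping the regex for Python's unicodedata categories, because Python's built-in re module has no \p{...} classes. A character survives when its category is a letter (L*) or a decimal digit (Nd), which is the same rule as (?=\P{Nd})\P{L}. strip_special is an illustrative name.

```python
import unicodedata


def strip_special(name: str) -> str:
    r"""Keep letters of any script (L*) and decimal digits (Nd); drop the rest.

    Same keep/drop rule as the PHP pattern (?=\P{Nd})\P{L}.
    """
    kept = []
    for ch in name:
        category = unicodedata.category(ch)
        if category.startswith("L") or category == "Nd":
            kept.append(ch)
    return "".join(kept)


arabic = "\u0645\u0631\u062d\u0628\u0627"  # Arabic "marhaba"
print(strip_special(arabic + "! #1 *x$"))  # the Arabic word followed by "1x"
```

Spaces are removed as well, since they are neither letters nor digits. Add them to the allowed set if names should keep them.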

MySQL: how to replace literal \r\n with special characters \r\n

I have some faulty PHP code which inserted the literal text \r\n into the database instead of the actual carriage-return and newline characters. Can anyone help me come up with a query that will replace the literals with the real characters?
Here's an SQL Fiddle setup. All I really need is something that will return the row containing "abc\r\ndef" rather than the other row. It's probably a very simple escape that's needed, but I can't work it out.
http://sqlfiddle.com/#!9/1f2acb/1
Once I have that query I guess I will simply use
UPDATE test SET txt = replace(txt, 'UNKNOWN EXPRESSION', '\r\n');
I'm running MySQL 5.5 on Ubuntu.
The answer was in a similar question that juanvan linked to.
UPDATE test set txt = replace(txt,'\\r\\n','\r\n');
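The two forms can be compared in Python to show what the fix does. The sketch applies the same replacement and prints the bytes as uppercase hex, which is the form MySQL's HEX() returns. 5C725C6E is the literal backslash-r-backslash-n text, and 0D0A is a real CR LF. The variable names are illustrative.

```python
# What the faulty PHP stored: backslash, "r", backslash, "n" (four characters).
stored = "abc\\r\\ndef"
# What was intended: a real carriage return and line feed.
wanted = "abc\r\ndef"

# Same transformation as replace(txt, '\\r\\n', '\r\n') in MySQL.
fixed = stored.replace("\\r\\n", "\r\n")

print(stored.encode("utf-8").hex().upper())  # 6162635C725C6E646566
print(fixed.encode("utf-8").hex().upper())   # 6162630D0A646566
```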

Unicode escape sequence in command line MySQL

Short version:
What kind of escape sequence can one use to search for Unicode characters in the mysql command-line client?
Long version:
I'm looking for a way to search a column for records containing a unicode sequence, U+200B, in mysql from the command line. I can't figure out which kind of escape to use. I've tried \u200B and x200B and even ​ I finally found one blog that suggested the _utf8 syntax. This will produce the character on the command line:
select _utf8 x'200B';
Now I'm stuck trying to get that working in a "LIKE" query.
This generates the characters, but the % signs seem to lose their special meaning when placed in the LIKE part:
select _utf8 x'0025200B0025';
I also tried a concat but it didn't work either:
select concat('%', _utf8 x'200B', '%');
More background:
I have some data that has zero width space characters (zwsp) in it, Unicode Point U+200B. This is typically caused by copy/paste from websites that use the zwsp in their output. With most unicode characters, I can just paste the character into the terminal (or create it with a keycode), but since this one is invisible it's a bit more challenging. I can create a file that generates a "%%" sequence and copy/paste it to the terminal and it will work but it leaves my command history and terminal output screwy. I would think there is a straightforward way to do this in MySQL, but so far I've come up short.
Thanks in advance,
-Paul Burney
select _utf8 x'0025200B0025';
That's not UTF-8; it's UTF-16/UCS-2. You might be able to write SELECT _ucs2 0x0025200B0025 if your copy of MySQL has UCS-2 support.
Otherwise, the byte sequence encoding character U+200B in UTF-8 would be 0xE2, 0x80, 0x8B:
select 0xE2808B;
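On Linux desktops that support it (for example with GTK or IBus input), you can type the character directly: hold Ctrl + Shift + U, release the U, type 200B, then release Ctrl + Shift.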
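You can compute the byte sequence for any code point instead of looking it up. The small Python helper below prints the UTF-8 hex for an X'...' literal and builds a LIKE condition around it. like_contains and the column name body are hypothetical. The CONVERT(... USING utf8mb4) wrapper is there so the binary literal is compared as characters, assuming the column is utf8mb4.

```python
def utf8_hex(code_point: int) -> str:
    """Uppercase hex of the UTF-8 bytes for one code point, ready for X'...'."""
    return chr(code_point).encode("utf-8").hex().upper()


def like_contains(column: str, code_point: int) -> str:
    """Hypothetical helper: a WHERE condition for rows containing the character.

    CONVERT(... USING utf8mb4) turns the binary literal into a character string
    before it is concatenated with the % wildcards.
    """
    return (f"{column} LIKE CONCAT('%', CONVERT(X'{utf8_hex(code_point)}' "
            f"USING utf8mb4), '%')")


print(utf8_hex(0x200B))  # E2808B
print(like_contains("body", 0x200B))
```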

Unicode (hexadecimal) character literals in MySQL

Is there a way to specify Unicode character literals in MySQL?
I want to replace a Unicode character with an Ascii character, something like the following:
Update MyTbl Set MyFld = Replace(MyFld, "ẏ", "y")
But I'm using even more obscure characters which are not available in most fonts, so I want to be able to use Unicode character literals, something like
Update MyTbl Set MyFld = Replace(MyFld, "\u1e8f", "y")
This SQL statement is being invoked from a PHP script - the first form is not only unreadable, but it doesn't actually work!
You can specify hexadecimal literals using 0x, x'', or X'' (bit-value literals work similarly with 0b or b''):
select 0xC2A2;
select x'C2A2';
select X'C2A2';
But be aware that the return type is a binary string, so each and every byte is considered a character. You can verify this with char_length:
select char_length(0xC2A2)
2
If you want UTF-8 strings instead, you need to use convert:
select convert(0xC2A2 using utf8mb4)
And we can see that C2 A2 is considered 1 character in UTF-8:
select char_length(convert(0xC2A2 using utf8mb4))
1
Also, you don't have to worry about invalid bytes because convert will remove them automatically:
select char_length(convert(0xC1A2 using utf8mb4))
0
As can be seen, the output is 0 because C1 A2 is an invalid UTF-8 byte sequence.
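The same byte-versus-character distinction can be reproduced in Python. Here bytes.fromhex plays the role of the hex literal and decoding plays the role of convert. errors="ignore" is a Python option that drops invalid bytes; it is used here only to mirror the counts shown above. It says nothing about how a particular MySQL version handles invalid input.

```python
# X'C2A2' is two bytes; decoded as UTF-8 they form one character (U+00A2, the cent sign).
raw = bytes.fromhex("C2A2")
print(len(raw))                  # 2  (like char_length of the binary string)
print(len(raw.decode("utf-8")))  # 1  (like char_length after convert ... using utf8mb4)

# C1 A2 is not valid UTF-8: C1 can never start a sequence.
bad = bytes.fromhex("C1A2")
# errors="ignore" drops the invalid bytes, leaving an empty string.
print(len(bad.decode("utf-8", errors="ignore")))  # 0
```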
Thanks for your suggestions, but I think the problem was further back in the system.
There are a lot of levels to unpick, but as far as I can tell (on this server at least), the command
set names utf8
makes the utf-8 handling work correctly, whereas
set character set utf8
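doesn't. (SET NAMES sets character_set_client, character_set_connection and character_set_results. SET CHARACTER SET sets the client and results character sets but makes character_set_connection follow character_set_database, which may explain the difference.)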
In my environment, these are being called from PHP using PDO, for what difference that may make.
Thanks anyway!
You can use the hex and unhex functions, e.g.:
update mytable set myfield = unhex(replace(hex(myfield),'C383','C3'))
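The hex/unhex approach has one caveat. REPLACE works on the hex text, so a pattern can match across a byte boundary (starting at an odd hex digit) and corrupt the data. The Python sketch below models the MySQL expression and compares it with a byte-aligned replace. Both function names are illustrative, and the sample bytes are made up to trigger the problem.

```python
def hex_text_replace(data: bytes, old_hex: str, new_hex: str) -> bytes:
    """Models UNHEX(REPLACE(HEX(col), old_hex, new_hex)): the replace runs on hex text."""
    return bytes.fromhex(data.hex().upper().replace(old_hex.upper(), new_hex.upper()))


def byte_replace(data: bytes, old_hex: str, new_hex: str) -> bytes:
    """Replaces whole byte sequences only."""
    return data.replace(bytes.fromhex(old_hex), bytes.fromhex(new_hex))


ok = bytes.fromhex("41C38342")    # "A", C3 83, "B"
tricky = bytes.fromhex("AC383F")  # hex text contains "C383" starting at an odd digit

print(hex_text_replace(ok, "C383", "C3").hex().upper())      # 41C342
print(hex_text_replace(tricky, "C383", "C3").hex().upper())  # AC3F   (bytes corrupted)
print(byte_replace(tricky, "C383", "C3").hex().upper())      # AC383F (unchanged)
```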
The MySQL string-literal syntax is documented in the MySQL reference manual, and as you can see there, it has no provision for numeric escape sequences.
However, as you are embedding the SQL in PHP, you can compute the right bytes in PHP. Make sure the bytes you put into the SQL actually match your client character set.
There is also the CHAR() function, which does what you wanted: give it the byte values and a character set name, as in CHAR(... USING utf8mb4), and it returns the character.
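The answer above suggests computing the bytes in the host language. The sketch below does that in Python rather than PHP: it turns a character into a CHAR(... USING utf8mb4) expression and builds the UPDATE from the question. char_expr and replace_stmt are hypothetical helpers. They assume utf8mb4 and trusted table and column names, since the identifiers are pasted in without quoting.

```python
def char_expr(ch: str) -> str:
    """Build a CHAR(0x.., ... USING utf8mb4) expression from the character's UTF-8 bytes."""
    byte_list = ", ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))
    return f"CHAR({byte_list} USING utf8mb4)"


def replace_stmt(table: str, column: str, old: str, new: str) -> str:
    """Hypothetical helper that writes the REPLACE update from the question."""
    return (f"UPDATE {table} SET {column} = "
            f"REPLACE({column}, {char_expr(old)}, {char_expr(new)})")


print(char_expr("\u1e8f"))  # CHAR(0xE1, 0xBA, 0x8F USING utf8mb4)
print(replace_stmt("MyTbl", "MyFld", "\u1e8f", "y"))
```

Because the generated SQL contains only ASCII, it is unaffected by the client character set problems discussed above.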