MySQL regexp for emoji / unicode - mysql

I want to search my database for any string which contains the Butterfly Emoji - 🦋 - using regexp.
For example
SELECT *
FROM `table`
WHERE `text`
REGEXP '🦋'
I'm using REGEXP because I might want to search for Hello[[:space:]]world or similar.
I get the error
Got error 'nothing to repeat at offset 0' from regexp
This works:
SELECT *
FROM `table`
WHERE `text`
LIKE '%🦋%'
But then I lose the ability to search for, say, flying[[:space:]]🦋
My Collation is utf8mb4_unicode_ci. The database is 10.0.36-MariaDB

Honestly I don't know why, but escaping your butterfly will give the desired output. (At least in my version, MariaDB 10.3.10, which gave the same error without escaping).
SELECT * FROM `table` WHERE `text` REGEXP '\\🦋'
(Note the double backslash: the first one escapes the second inside the SQL string literal, so the regular expression engine receives \🦋.)

SHOW VARIABLES LIKE 'char%';
It sounds like you have not told MySQL what encoding the client is using for characters. This is best done via the connection parameters, or via mysqli_set_charset (if using mysqli, not PDO).
Also, run this on your version:
SELECT 'ab' REGEXP '?';
I suspect it will give you the same error message.
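A rough illustration of where the error can come from. If the connection character set cannot represent 🦋, the pattern may reach the server as a bare ?, which is a quantifier with nothing in front of it. The sketch below assumes a latin1-style client encoding (the question does not say which one is in use), and Python's re module only stands in for the server's regex engine:

```python
import re

pattern = "🦋"

# Assumption: the client connection uses a single-byte charset such as latin1.
# Characters that charset cannot represent are replaced with "?".
as_sent = pattern.encode("latin-1", errors="replace").decode("latin-1")


def compiles(regex: str) -> bool:
    """Return True if the regex compiles, False if the engine rejects it."""
    try:
        re.compile(regex)
        return True
    except re.error:
        return False


print(repr(as_sent))              # the pattern the server would actually see
print(compiles(as_sent))          # a bare quantifier is rejected
print(compiles("\\" + as_sent))   # escaped, it becomes a literal question mark
```

If this is what is happening, the escaped pattern compiles but matches a literal question mark rather than the emoji, so fixing the connection character set is still worth checking.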

Related

Search for replacement character (no TSQL)

I'm trying to find a way to search for the replacement character U+FFFD with SQL (I'm using MariaDB), but I cannot make it work. I tried:
SELECT id FROM tablename WHERE content LIKE "%\ufffd%";
SELECT id FROM tablename WHERE content LIKE "%�%"
Neither query works for me. Some topics say to use UNICODE(), but that is a T-SQL function and I cannot use it in MariaDB. Is there a solution?
What CHARACTER SET are you using? FFFD is the hex for the Unicode "codepoint". The UTF-8 encoding for it is EFBFBD.
Here's another way to look for it:
WHERE HEX(col) REGEXP '^(....)*FFFD'
or perhaps
WHERE HEX(col) REGEXP '^(..)*EFBFBD'
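The two hex strings above can be checked outside the database. This small sketch prints the encoded form of U+FFFD in UTF-8 and in a two-byte encoding, and shows why the ^(..)* prefix matters: it keeps the match aligned to whole bytes. The sample strings and helper names are made up for illustration:

```python
import re


def hex_upper(text: str, encoding: str) -> str:
    """Mimic HEX(col): the upper-case hex dump of the encoded bytes."""
    return text.encode(encoding).hex().upper()


REPLACEMENT = "\ufffd"
utf8_hex = hex_upper(REPLACEMENT, "utf-8")      # bytes as stored in utf8/utf8mb4
ucs2_hex = hex_upper(REPLACEMENT, "utf-16-be")  # bytes as stored in ucs2

# Same idea as '^(..)*EFBFBD': only match at an even offset (a byte boundary).
aligned = re.compile(r"^(?:..)*EFBFBD")

good = hex_upper("a\ufffdb", "utf-8")
# Four unrelated bytes whose hex dump happens to contain EFBFBD at an odd offset.
misaligned = bytes([0x0E, 0xFB, 0xFB, 0xD0]).hex().upper()

print(utf8_hex, ucs2_hex)
print(bool(aligned.match(good)))
print("EFBFBD" in misaligned, bool(aligned.match(misaligned)))
```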
What are your results? Do you get an error? Try this simple working query, or change your column type.
select a from (select '�' as a) t where a like '%�%'

SQL - How to use wildcard in LIKE as a normal character

If I have a column with some values that start with "%" like this:
[ID]-----[VALUES]
1--------Amount
2--------Percentage
3--------%Amount
4--------%Percentage
How can I get only these two rows with a "select" query?
[ID]-----[VALUES]
3--------%Amount
4--------%Percentage
I tried these queries but they don't work:
select * from TABLE where VALUES like '[%]%'
select * from TABLE where VALUES like '\%%'
I know that in Java, C and other languages, the backslash \ lets you use a special character as a normal one, like:
var s = "I called him and he said: \"Hi, there!\"";
Is there a similar character or function that does this in SQL?
All answers will be appreciated, thank you for reading the question!
Your query
select * from TABLE where VALUES like '\%%'
should work. The reason it doesn't is because you may have NO_BACKSLASH_ESCAPES enabled which would treat \ as a literal character.
A way to avoid it is using LIKE BINARY
select * from TABLE where VALUES like binary '%'
or with an escape character (can be any character you choose) specification.
select * from TABLE where VALUES like '~%%' escape '~'
try this :
select * from TABLE where VALUES like '%[%]%'
There is an ESCAPE option on LIKE:
select *
from TABLE
where VALUES like '$%%' escape '$';
Anything following the escape character is treated as a regular character. However, the default is backslash (see here), so the version with backslash should do what you want.
Of course, you could also use a regular expression (although that has no hope of using an index).
Note: ESCAPE is part of the ANSI standard, so it should work in any database.
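As a quick way to try the ESCAPE clause without a MySQL server, here is a sketch using Python's built-in sqlite3 module. SQLite is not MySQL (for one thing, it has no default escape character), so this only demonstrates the standard ESCAPE syntax; the table name t and column name val are made up, since TABLE and VALUES are reserved words:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO t (id, val) VALUES (?, ?)",
    [(1, "Amount"), (2, "Percentage"), (3, "%Amount"), (4, "%Percentage")],
)

# '~%' is a literal percent sign; the second '%' is still the wildcard.
rows = conn.execute(
    "SELECT id, val FROM t WHERE val LIKE '~%%' ESCAPE '~' ORDER BY id"
).fetchall()

print(rows)
conn.close()
```

Only rows 3 and 4 come back, because the pattern requires a literal % as the first character.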
You're right that you'll need an escape character for this. In SQL you have to define the escape character.
SELECT * FROM TABLE where VALUES like '!%%' ESCAPE '!';
I'm pretty sure you can use whatever character you want.
Here's a link to a microsoft explanation that goes into more detail.
Microsoft explanation
MySQL Explanation

MySQL query with non-printing characters (left-to-right mark)

I just found myself lost in the interesting situation that I need to query MySQL for fields containing a so called Left-to-right mark.
As the nature of this character is to be non-printing, thus invisible, I'm unable to simply copy/paste it into a query.
As mentioned in the linked Wikipedia article, the Left-to-right mark is Unicode character U+200E (U+200F is its counterpart, the right-to-left mark), which I'm sure is the key to success in my current adventure.
My question is: How do I use raw Unicode in a MySQL query? Something along the lines of:
SELECT * FROM users WHERE username LIKE '%\U+200E%'
or
SELECT * FROM users WHERE username REGEXP '\U+200E'
or whatever the correct syntax for Unicode in MySQL is and depending on whether this is supported with LIKE and/or REGEXP.
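To get a Unicode character, something like this should work:
SELECT CHAR(<number> USING utf8);
Note that <number> is the character's encoded byte sequence, not its code point; for example, the left-to-right mark U+200E is 0xE2808E in UTF-8.
Also, be careful with REGEXP: the regexp library used by MySQL before 8.0 is very old and doesn't support multi-byte charsets.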
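Since CHAR(... USING ...) wants the encoded bytes rather than the code point, it can help to compute them first. A minimal sketch, assuming the column is stored as UTF-8; the helper name mysql_char_expr is made up for illustration:

```python
def mysql_char_expr(code_point: int, charset: str = "utf8mb4") -> str:
    """Build a CHAR(0x... USING charset) expression for one code point.

    Assumption: the target charset is a UTF-8 variant, so the bytes passed
    to CHAR() are the UTF-8 encoding of the character.
    """
    encoded = chr(code_point).encode("utf-8")
    return f"CHAR(0x{encoded.hex().upper()} USING {charset})"


lrm = mysql_char_expr(0x200E)  # left-to-right mark
rlm = mysql_char_expr(0x200F)  # right-to-left mark

print(lrm)
print(rlm)
```

The resulting expression can then be placed in a LIKE pattern, for example with CONCAT('%', <expression>, '%').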

Using MySQL LIKE operator for fields encoded in JSON

I've been trying to get a table row with this query:
SELECT * FROM `table` WHERE `field` LIKE "%\u0435\u0442\u043e\u0442%"
Field itself:
Field
--------------------------------------------------------------------
\u0435\u0442\u043e\u0442 \u0442\u0435\u043a\u0441\u0442 \u043d\u0430
Although I can't seem to get it working properly.
I've already tried experimenting with the backslash character:
LIKE "%\\u0435\\u0442\\u043e\\u0442%"
LIKE "%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%"
But none of them seems to work, as well.
I'd appreciate if someone could give a hint as to what I'm doing wrong.
Thanks in advance!
EDIT
Problem solved.
Solution: even after correcting the syntax of the query, it didn't return any results. After making the field BINARY the query started working.
As documented under String Comparison Functions:
Note
Because MySQL uses C escape syntax in strings (for example, “\n” to represent a newline character), you must double any “\” that you use in LIKE strings. For example, to search for “\n”, specify it as “\\n”. To search for “\”, specify it as “\\\\”; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.
Therefore:
SELECT * FROM `table` WHERE `field` LIKE '%\\\\u0435\\\\u0442\\\\u043e\\\\u0442%'
See it on sqlfiddle.
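The two rounds of backslash stripping described in the note can be modelled roughly as follows. This is a simplified sketch, not MySQL's actual parser: it ignores special sequences such as \n and the parser's special handling of \% and \_, and simply drops one backslash in front of any character on each pass:

```python
def strip_one_level(text: str) -> str:
    """Remove one level of backslash escaping (simplified model)."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character literally
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def effective_pattern(sql_literal_body: str) -> str:
    """Apply the string-literal pass, then the LIKE pattern pass."""
    return strip_one_level(strip_one_level(sql_literal_body))


# Raw strings, so each backslash below is exactly one character.
four = effective_pattern(r"%\\\\u0435%")
two = effective_pattern(r"%\\u0435%")
one = effective_pattern(r"%\u0435%")

print(four)  # one backslash survives to be matched
print(two)   # the backslash is gone
print(one)   # the backslash is gone
```

Only the four-backslash form leaves a literal backslash in front of u0435, which is what the stored JSON text contains.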
This can be useful for those who use PHP; it works for me:
$where[] = 'organizer_info LIKE(CONCAT("%", :organizer, "%"))';
$bind['organizer'] = str_replace('"', '', quotemeta(json_encode($orgNameString)));

How can I find non-ASCII characters in MySQL?

I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to find these records using MySQL?
MySQL provides comprehensive character set management that can help with this kind of problem.
SELECT whatever
FROM tableName
WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)
The CONVERT(col USING charset) function turns unconvertible characters into replacement characters, so the converted and unconverted text will be unequal.
See this for more discussion. https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html
You can use any character set name you wish in place of ASCII. For example, if you want to find out which characters won't render correctly in code page 1257 (Lithuanian, Latvian, Estonian) use CONVERT(columnToCheck USING cp1257)
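The same round-trip idea can be tried on sample strings before running it against a table. This is only an analogy in Python (its codecs and MySQL's character sets are separate implementations), and the sample values are made up:

```python
def survives(text: str, charset: str) -> bool:
    """True if text is unchanged after a lossy round trip through charset.

    Characters the charset cannot represent are replaced with '?', much as
    CONVERT(col USING charset) substitutes a replacement character.
    """
    converted = text.encode(charset, errors="replace").decode(charset)
    return converted == text


samples = ["plain text", "em\u2014dash", "R\u012bga", "\U0001F98B"]

for s in samples:
    print(repr(s), survives(s, "ascii"), survives(s, "cp1257"))
```

Rows where the round trip changes the text are the ones the CONVERT comparison would flag.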
You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query
SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';
This was the most comprehensive query I could come up with.
It depends exactly what you're defining as "ASCII", but I would suggest trying a variant of a query like this:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9]';
That query will return all rows where columnToCheck contains any non-alphanumeric characters. If you have other characters that are acceptable, add them to the character class in the regular expression. For example, if periods, commas, and hyphens are OK, change the query to:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9.,-]';
The most relevant page of the MySQL documentation is probably 12.5.2 Regular Expressions.
This is probably what you're looking for:
select * from TABLE where COLUMN regexp '[^ -~]';
It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline).
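The range-based pattern and the HEX-based pattern answer slightly different questions, which this sketch tries to make visible. It uses Python's re module as a stand-in; older MySQL REGEXP works on bytes while Python works on code points, so treat the comparison as illustrative only:

```python
import re

# Same class as '[^ -~]': anything outside space (0x20) through tilde (0x7E).
NOT_PRINTABLE_ASCII = re.compile(r"[^ -~]")
# Same idea as HEX(col) REGEXP '^([0-7][0-9A-F])*$': every byte is 0x00-0x7F.
HEX_ALL_ASCII = re.compile(r"^(?:[0-7][0-9A-F])*$")


def outside_printable(text: str) -> bool:
    return NOT_PRINTABLE_ASCII.search(text) is not None


def outside_ascii(text: str) -> bool:
    return HEX_ALL_ASCII.match(text.encode("utf-8").hex().upper()) is None


samples = ["hello", "line\r\n", "em\u2014dash", ""]

for s in samples:
    print(repr(s), outside_printable(s), outside_ascii(s))
```

A hidden carriage return is flagged by the printable-range check but not by the hex check, since control characters are still ASCII.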
One missing character from everyone's examples above is the termination character (\0). This is invisible to the MySQL console output and is not discoverable by any of the queries heretofore mentioned. The query to find it is simply:
select * from TABLE where COLUMN like '%\0%';
Based on the correct answer, but taking into account ASCII control characters as well, the solution that worked for me is this:
SELECT * FROM `table` WHERE NOT `field` REGEXP "[\\x00-\\xFF]|^$";
It does the same thing: searches for violations of the ASCII range in a column, but lets you search for control characters too, since it uses hexadecimal notation for code points. Since there is no comparison or conversion (unlike @Ollie's answer), this should be significantly faster, too. (Especially if MySQL does early-termination on the regex query, which it definitely should.)
It also avoids returning fields that are zero-length. If you want a slightly-longer version that might perform better, you can use this instead:
SELECT * FROM `table` WHERE `field` <> "" AND NOT `field` REGEXP "[\\x00-\\xFF]";
It does a separate check for length to avoid zero-length results, without considering them for a regex pass. Depending on the number of zero-length entries you have, this could be significantly faster.
Note that if your default character set is something bizarre where 0x00-0xFF don't map to the same values as ASCII (is there such a character set in existence anywhere?), this would return a false positive. Otherwise, enjoy!
Try using this query to find records containing special characters:
SELECT *
FROM tableName
WHERE fieldName REGEXP '[^a-zA-Z0-9#:. \'\-`,\&]'
@zende's answer was the only one that covered columns with a mix of ASCII and non-ASCII characters, but it also had that problematic hex thing. I used this:
SELECT * FROM `table` WHERE NOT `column` REGEXP '^[ -~]+$' AND `column` !=''
In Oracle we can use below.
SELECT * FROM TABLE_A WHERE ASCIISTR(COLUMN_A) <> COLUMN_A;
For this question we can also use this method.
Question from SQLZoo:
Find all details of the prize won by PETER GRÜNBERG
Non-ASCII characters
ans: select * from nobel where winner like 'P% GR%_%berg';