Search for replacement character (no TSQL) - mysql

I'm trying to find a way to search for the replacement character \uFFFD (U+FFFD) with SQL (I'm using MariaDB), but I cannot make it work. I tried:
SELECT id FROM tablename WHERE content LIKE "%\ufffd%";
SELECT id FROM tablename WHERE content LIKE "%�%"
Neither query works for me. Some topics say to use UNICODE(), but that is a T-SQL function and I cannot use it in MariaDB. Any solution?

What CHARACTER SET are you using? FFFD is the hex for the Unicode "codepoint". The UTF-8 encoding for it is EFBFBD.
Here's another way to look for it:
WHERE HEX(col) REGEXP '^(....)*FFFD'
or perhaps
WHERE HEX(col) REGEXP '^(..)*EFBFBD'
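As a sanity check on the hex values above, here is a minimal sketch in Python rather than SQL. It assumes Python's "utf-8" and "utf-16-be" codecs produce the same bytes as a utf8 or ucs2 column would for this character; the helper name hex_of and the sample string tricky are made up for illustration. It also shows why the patterns start with ^(..)* or ^(....)*: without that anchor, the hex digits can match across a byte boundary.

```python
import re

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def hex_of(text: str, encoding: str) -> str:
    """Rough stand-in for SQL HEX(col): uppercase hex of the encoded bytes."""
    return text.encode(encoding).hex().upper()


utf8_hex = hex_of(REPLACEMENT, "utf-8")      # three bytes in UTF-8
ucs2_hex = hex_of(REPLACEMENT, "utf-16-be")  # two bytes, as in a ucs2 column

# Anchored: only whole bytes (pairs of hex digits) may precede the match.
aligned = re.compile(r"^(..)*EFBFBD")
# Unanchored: may start in the middle of a byte.
unaligned = re.compile(r"EFBFBD")

# Contrived hex dump of the bytes 0E FB FB D0: the digits "EFBFBD" appear,
# but only at an odd offset, so this is not the replacement character.
tricky = "0EFBFBD0"

print(utf8_hex, ucs2_hex)
print(bool(aligned.search(tricky)), bool(unaligned.search(tricky)))
```

The anchored pattern rejects the contrived dump while the unanchored one reports a false match, which is the reason for the leading group in the REGEXP.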

What are your results? Do you get an error? Try this simple query, or change your column type.
select a from (select '�' as a) t where a like '%�%'

Related

MySQL regexp for emoji / unicode

I want to search my database for any string which contains the Butterfly Emoji - 🦋 - using regexp.
For example
SELECT *
FROM `table`
WHERE `text`
REGEXP '🦋'
I'm using REGEXP because I might want to search for Hello[[:space:]]world or similar.
I get the error
Got error 'nothing to repeat at offset 0' from regexp
This works:
SELECT *
FROM `table`
WHERE `text`
LIKE '%🦋%'
But then I lose the ability to search for, say, flying[[:space:]]🦋
My Collation is utf8mb4_unicode_ci. The database is 10.0.36-MariaDB
Honestly I don't know why, but escaping your butterfly will give the desired output. (At least in my version, MariaDB 10.3.10, which gave the same error without escaping).
SELECT * FROM `table` WHERE `text` REGEXP '\\🦋'
(Note the double backslash: the first one escapes the second within the string literal, so the regular expression itself is \🦋.)
SHOW VARIABLES LIKE 'char%';
It sounds like you have not told MySQL what encoding the client is using for characters. This is best done via the connection parameters, or via mysqli_set_charset (if using mysqli, not PDO).
Also, run this on your version:
SELECT 'ab' REGEXP '?';
I suspect it will give you the same error message.

SQL - How to use wildcard in LIKE as a normal character

If I have a column in which some values start with "%", like this:
[ID]-----[VALUES]
1--------Amount
2--------Percentage
3--------%Amount
4--------%Percentage
how can I have only these two rows with a "select" query?:
[ID]-----[VALUES]
3--------%Amount
4--------%Percentage
I tried these queries, but they don't work:
select * from TABLE where VALUES like '[%]%'
select * from TABLE where VALUES like '\%%'
I know that in Java, C and other languages, the backslash \ lets you use a special character as a normal one, like:
var s = "I called him and he said: \"Hi, there!\"";
Is there a similar character or function that does this in SQL?
All answers will be appreciated. Thank you for reading the question!
Your query
select * from TABLE where VALUES like '\%%'
should work. The reason it doesn't is because you may have NO_BACKSLASH_ESCAPES enabled which would treat \ as a literal character.
A way to avoid it is using LIKE BINARY
select * from TABLE where VALUES like binary '%'
or with an escape character (can be any character you choose) specification.
select * from TABLE where VALUES like '~%%' escape '~'
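The ESCAPE form can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module, so it assumes SQLite's LIKE ... ESCAPE behaves the same way as MySQL's for this pattern; the table demo and column val are hypothetical stand-ins for TABLE and VALUES, which are reserved words.

```python
import sqlite3

# In-memory SQLite database standing in for the question's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO demo (id, val) VALUES (?, ?)",
    [(1, "Amount"), (2, "Percentage"), (3, "%Amount"), (4, "%Percentage")],
)

# '~%' is a literal percent sign; the second '%' is the usual wildcard.
rows = conn.execute(
    "SELECT id, val FROM demo WHERE val LIKE '~%%' ESCAPE '~' ORDER BY id"
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Only the two rows whose value starts with a percent sign come back. Choosing an escape character other than backslash sidesteps the NO_BACKSLASH_ESCAPES setting entirely.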
try this :
select * from TABLE where VALUES like '%[%]%'
There is an ESCAPE option on LIKE:
select *
from TABLE
where VALUES like '$%%' escape '$';
Anything following the escape character is treated as a regular character. However, the default is backslash (see here), so the version with backslash should do what you want.
Of course, you could also use a regular expression (although that has no hope of using an index).
Note: ESCAPE is part of the ANSI standard, so it should work in any database.
You're right that you'll need an escape character for this. In SQL you have to define the escape character.
SELECT * FROM TABLE where VALUES like '!%%' ESCAPE '!';
I'm pretty sure you can use whatever character you want.
Here's a link to a microsoft explanation that goes into more detail.
Microsoft explanation
MySQL Explanation

mysql regex utf-8 characters

I am trying to get data from MySQL database via REGEX with or without special utf-8 characters.
Let me explain on example :
If the user enters a word like sirena, it should return rows that include words like sirena, siréna, šíreňá and so on.
It should also work the other way around: when the user enters siréná, it should return the same results.
I am trying to search it via REGEX, my query looks like this :
SELECT * FROM `content` WHERE `text` REGEXP '[sšŠ][iíÍ][rŕŔřŘ][eéÉěĚ][nňŇ][AaáÁäÄ0]'
It works only when the database contains the word sirena, but not when it contains the word siréňa.
Is this caused by something to do with UTF-8 and MySQL? (The collation of the MySQL column is utf8_general_ci.)
Thank you!
MySQL's regular expression library (the byte-oriented implementation used before MySQL 8.0) does not support utf-8.
See Bug #30241 Regular expression problems, which has been open since 2007. They will have to change the regular expression library they use before that can be fixed, and I haven't found any announcement of when or if they will do this.
The only workaround I've seen is to search for specific HEX strings:
mysql> SELECT * FROM `content` WHERE HEX(`text`) REGEXP 'C3A9C588';
+----------+
| text     |
+----------+
| siréňa   |
+----------+
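The hex string in that query is just the UTF-8 bytes of "éň". A minimal sketch in Python of how such a search string can be derived, assuming the column is stored as UTF-8 with precomposed characters (the helper name utf8_hex is made up for illustration):

```python
def utf8_hex(text: str) -> str:
    """Rough stand-in for SQL HEX(col) on a utf8 column."""
    return text.encode("utf-8").hex().upper()


# Escapes are used so the precomposed code points are unambiguous:
# U+00E9 is e-acute, U+0148 is n-caron.
needle = utf8_hex("\u00e9\u0148")
haystack = utf8_hex("sir\u00e9\u0148a")

print(needle)
print(haystack)
print(needle in haystack)
```

Each accented letter takes two bytes here, so the needle is eight hex digits long, and it appears inside the hex form of the full word.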
Re your comment:
No, I don't know of any solution with MySQL.
You might have to switch to PostgreSQL, because that RDBMS supports \u codes for UTF characters in their regular expression syntax.
Try something like ... REGEXP '(a|b|[ab])'
SELECT * FROM `content` WHERE `text` REGEXP '(s|š|Š|[sšŠ])(i|í|Í|[iíÍ])(r|ŕ|Ŕ|ř|Ř|[rŕŔřŘ])(e|é|É|ě|Ě|[eéÉěĚ])(n|ň|Ň|[nňŇ])(A|a|á|Á|ä|Ä|0|[AaáÁäÄ0])'
It works for me!
Use the lib_mysqludf_preg library from the mysql UDF repository for PCRE regular expressions directly in mysql
Although MySQL's regular expression library does not support utf-8, the MySQL UDF repository provides UTF-8 compatible PCRE regular expressions that can be used directly in MySQL.
http://www.mysqludf.org/
https://github.com/mysqludf/lib_mysqludf_preg#readme

How to match UTF8 characters in MySQL regular expression?

I want to find all the rows that have UTF8 characters in a specific field, in this manner:
SELECT * FROM table1 WHERE field1 REGEXP '[[:utf8:]]';
Searched through MySQL docs but found nothing. Is this possible?
I meant non-ASCII characters.
Managed to find a way to do that: http://www.kavoir.com/2011/03/mysql-find-non-ascii-characters.html

How can I find non-ASCII characters in MySQL?

I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to find these records using MySQL?
MySQL provides comprehensive character set management that can help with this kind of problem.
SELECT whatever
FROM tableName
WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)
The CONVERT(col USING charset) function turns unconvertible characters into replacement characters, so the converted and unconverted text compare as unequal.
See this for more discussion. https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html
You can use any character set name you wish in place of ASCII. For example, if you want to find out which characters won't render correctly in code page 1257 (Lithuanian, Latvian, Estonian) use CONVERT(columnToCheck USING cp1257)
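The idea behind the CONVERT comparison can be sketched outside the database. The Python below is only an analogy: it assumes that replacing unencodable characters with "?" mirrors what the server does, and the function names are hypothetical.

```python
def to_ascii_lossy(text: str) -> str:
    """Convert to ASCII, replacing anything unrepresentable with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


def differs_after_conversion(text: str) -> bool:
    """True when the lossy round trip changed the text, i.e. non-ASCII was present."""
    return to_ascii_lossy(text) != text


print(to_ascii_lossy("em\u2014dash"))
print(differs_after_conversion("em\u2014dash"))
print(differs_after_conversion("plain text"))
```

Swapping "ascii" for another codec name, such as "cp1257", gives the same kind of check for that character set.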
You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query
SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';
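To see what that pattern accepts, here is a minimal sketch in Python that applies the same regular expression to a hex dump. It assumes the column is UTF-8 encoded; has_non_ascii is a hypothetical helper.

```python
import re

# Every byte must have a first hex digit of 0-7, i.e. a value of 0x00-0x7F.
ASCII_HEX = re.compile(r"^([0-7][0-9A-F])*$")


def has_non_ascii(text: str) -> bool:
    """Mirror of: NOT HEX(col) REGEXP '^([0-7][0-9A-F])*$'."""
    hex_dump = text.encode("utf-8").hex().upper()
    return ASCII_HEX.match(hex_dump) is None


print(has_non_ascii("plain text\r\n"))
print(has_non_ascii("em\u2014dash"))
print(has_non_ascii(""))
```

Carriage returns and line feeds are ASCII, so this check alone does not flag them; only bytes of 0x80 and above do.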
This was the most comprehensive query I could come up with.
It depends exactly what you're defining as "ASCII", but I would suggest trying a variant of a query like this:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9]';
That query will return all rows where columnToCheck contains any non-alphanumeric characters. If you have other characters that are acceptable, add them to the character class in the regular expression. For example, if periods, commas, and hyphens are OK, change the query to:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9.,-]';
The most relevant page of the MySQL documentation is probably 12.5.2 Regular Expressions.
This is probably what you're looking for:
select * from TABLE where COLUMN regexp '[^ -~]';
It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline).
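The character class works because space (0x20) and tilde (0x7E) are the first and last printable ASCII characters. A small sketch in Python's re module, which treats this class the same way for these inputs (the function name flagged is made up):

```python
import re

# Anything outside the range space..tilde: control characters and non-ASCII.
NOT_PRINTABLE_ASCII = re.compile(r"[^ -~]")


def flagged(text: str) -> bool:
    """True when the text holds a character outside printable ASCII."""
    return NOT_PRINTABLE_ASCII.search(text) is not None


print(flagged("Hello, world!"))
print(flagged("line\nbreak"))
print(flagged("caf\u00e9"))
```

Unlike the hex-based check, this one also catches hidden carriage returns and line feeds.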
One missing character from everyone's examples above is the termination character (\0). This is invisible to the MySQL console output and is not discoverable by any of the queries heretofore mentioned. The query to find it is simply:
select * from TABLE where COLUMN like '%\0%';
Based on the correct answer, but taking into account ASCII control characters as well, the solution that worked for me is this:
SELECT * FROM `table` WHERE NOT `field` REGEXP "[\\x00-\\xFF]|^$";
It does the same thing: searches for violations of the ASCII range in a column, but lets you search for control characters too, since it uses hexadecimal notation for code points. Since there is no comparison or conversion (unlike @Ollie's answer), this should be significantly faster, too. (Especially if MySQL does early-termination on the regex query, which it definitely should.)
It also avoids returning fields that are zero-length. If you want a slightly-longer version that might perform better, you can use this instead:
SELECT * FROM `table` WHERE `field` <> "" AND NOT `field` REGEXP "[\\x00-\\xFF]";
It does a separate check for length to avoid zero-length results, without considering them for a regex pass. Depending on the number of zero-length entries you have, this could be significantly faster.
Note that if your default character set is something bizarre where 0x00-0xFF don't map to the same values as ASCII (is there such a character set in existence anywhere?), this would return a false positive. Otherwise, enjoy!
Try using this query to search for records with special characters:
SELECT *
FROM tableName
WHERE fieldName REGEXP '[^a-zA-Z0-9#:. \'\-`,\&]'
@zende's answer was the only one that covered columns with a mix of ASCII and non-ASCII characters, but it also had that problematic hex thing. I used this:
SELECT * FROM `table` WHERE NOT `column` REGEXP '^[ -~]+$' AND `column` !=''
In Oracle we can use below.
SELECT * FROM TABLE_A WHERE ASCIISTR(COLUMN_A) <> COLUMN_A;
For this question we can also use this method.
Question from SQL Zoo (Non-ASCII characters):
Find all details of the prize won by PETER GRÜNBERG
Answer: select * from nobel where winner like 'P% GR%_%berg';