REGEX to Find Rows Containing Opening and Closing Parentheses in MySQL

I have tried:
....WHERE fieldname REGEXP '.*\(.*\).*';
But this returns every record in the table.

You should use double backslashes when escaping a regex special metacharacter in a REGEXP pattern. Also, since REGEXP also finds partial matches, you do not need the .* at the start/end.
So, you could fix the expression as
WHERE fieldname REGEXP '\\(.*\\)';
Or just use LIKE, where % matches any number of arbitrary characters (unlike REGEXP, a LIKE pattern must match the whole entry):
WHERE fieldname LIKE '%(%)%';
A MySQL fiddle:
DROP TABLE IF EXISTS t;
CREATE TABLE t (word varchar(255));
INSERT INTO t (word)
VALUES
('test (here)'),
('test (here) test'),
('(here) test'),
('test no'),
('no test');
SELECT * FROM t WHERE word REGEXP '\\(.*\\)';
SELECT * FROM t WHERE word LIKE '%(%)%';
To also match entries like text ) and ( here, where the parentheses appear in reverse order, you may use:
SELECT * FROM t WHERE word REGEXP '\\(.*\\)|\\).*\\(';
SELECT * FROM t WHERE word LIKE '%(%)%' OR word LIKE '%)%(%';
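A small illustration in Python, with `re.search` used only as a stand-in for MySQL's REGEXP (both look for a match anywhere in the value for simple patterns like these). It relies on MySQL's string-literal rule that an unrecognized escape such as \( simply loses its backslash. Under that rule the question's pattern reaches the regex engine as .*(.*).*, a group that can match the empty string and therefore every row. The doubled backslashes in '\\(.*\\)' survive as \(.*\), which is what the raw strings below contain.

```python
import re

# Python's re is a stand-in for MySQL REGEXP here (an approximation, not MySQL).
# Sample rows from the fiddle above, plus one with the parentheses reversed.
rows = ["test (here)", "test (here) test", "(here) test", "test no", "no test"]
reversed_row = "text ) and ( here"

# What the regex engine receives from '.*\(.*\).*', assuming MySQL drops the
# backslashes of unknown escapes: a capturing group that matches any value.
unescaped = r".*(.*).*"
# What it receives from '\\(.*\\)': a literal "(" followed later by a literal ")".
escaped = r"\(.*\)"
# What it receives from '\\(.*\\)|\\).*\\(': the pair in either order.
either_order = r"\(.*\)|\).*\("


def matching(pattern, values):
    """Return the values in which the pattern matches anywhere, like REGEXP."""
    return [v for v in values if re.search(pattern, v)]


print(matching(unescaped, rows))  # all five rows
print(matching(escaped, rows))    # only the three rows containing "(...)"
print(matching(either_order, rows + [reversed_row]))
```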

LIKE would be more efficient than REGEXP, but to answer in REGEXP terms...
I suspect the OP wants parentheses to be somewhere in the middle of row text.
The problem with the REGEXP supplied is that expressions are greedy.
As soon as the .* finishes matching, it has already soaked up the entire record.
Try replacing the first two .* expressions with [^(]*( and [^)]*), and remove the final .* as it is superfluous.
'[^\\(]*\\([^\\)]*\\)'
Basically, this expression says:
Look for zero or more non ( characters
then look for a single ( character
then look for zero or more non ) characters
then look for a single ) character
The () may be anywhere in the record and may contain zero or more characters inside the ().
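To see the step-by-step description above in action, here is the same pattern in Python's `re` module. This is a stand-in for MySQL's REGEXP: the MySQL literal needs doubled backslashes, while Python's raw string does not. The sample values are made up for illustration. Because REGEXP matches anywhere, the leading [^(]* does not change which rows match.

```python
import re

# Python stand-in (not MySQL) for the literal '[^\\(]*\\([^\\)]*\\)':
#   [^(]*  zero or more characters that are not "("
#   \(     a single "("
#   [^)]*  zero or more characters that are not ")"
#   \)     a single ")"
PAREN_PAIR = re.compile(r"[^(]*\([^)]*\)")


def has_paren_pair(text):
    """True if the text contains a "(" that is later followed by a ")"."""
    return PAREN_PAIR.search(text) is not None


for value in ["test (here)", "()", "no parens", "only ( open", ") reversed ("]:
    print(repr(value), has_paren_pair(value))
```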
Might I suggest fiddling in https://regex101.com/
Hope this helps.

Related

How to use "#" symbol using a regular expression as a part of a whole word?

Searching for hashtags as part of a singular word (not a portion of a word) in the content like so:
SELECT * FROM `messages` WHERE LOWER(`messages`.`content`) REGEXP '[[:<:]]#anxiety[[:>:]]'
It is not finding any records; however, a search for the word "anxiety" works:
SELECT * FROM `messages` WHERE LOWER(`messages`.`content`) REGEXP '[[:<:]]anxiety[[:>:]]'
Looking to find messages like "She doesn't like thunderstorms. #anxiety #nervous."
Not looking to match parts of a word like "abc#anxiety". It should match "#anxiety" as a standalone word with a "#" before it like "I have #anxiety", "#anxiety sucks!", or "This is what #anxiety looks like.".
I assume that you are working with MySQL 5.7, as otherwise your second query (the one that works) would not return matches either. See Regexp compatibility in the MySQL 8 documentation.
Having said that, your first query does not return matches because # is a non-alphanumeric character, while the [[:<:]] marker requires the character that follows it to be a word character (alphanumeric or underscore). So this will never match. In fact, the # prefix by itself already ensures that this "anxiety" is not part of a larger word on the left side, so you should just do:
SELECT * FROM `messages`
WHERE LOWER(`messages`.`content`) REGEXP '#anxiety[[:>:]]'
In a comment you say that abc#anxiety should not match (even though technically # already breaks a word). In that case do:
SELECT * FROM `messages`
WHERE LOWER(`messages`.`content`) REGEXP '(^|[^a-zA-Z0-9_])#anxiety[[:>:]]'
In the character class [^a-zA-Z0-9_] add any other character that you wouldn't allow to precede #anxiety.
If your purpose is to find the word with an optional # in front of it, then use the previous regex with an additional ?:
SELECT * FROM `messages`
WHERE LOWER(`messages`.`content`) REGEXP '(^|[^a-zA-Z0-9_])#?anxiety[[:>:]]'
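One quick way to sanity-check these patterns against the example messages from the question is to translate them to Python. This is an approximation, not MySQL itself. Python has no [[:>:]] marker, so \b after the final letter stands in for it. That is also the substitute MySQL 8.0's ICU engine documents, where it is written \\b inside a string literal. The extra test strings "#anxietyattack" and "I have anxiety" are made up.

```python
import re

# Python stand-ins (an approximation, not MySQL) for
# '(^|[^a-zA-Z0-9_])#anxiety[[:>:]]' and the '#?' variant; \b replaces [[:>:]].
STRICT = re.compile(r"(^|[^a-zA-Z0-9_])#anxiety\b")
OPTIONAL_HASH = re.compile(r"(^|[^a-zA-Z0-9_])#?anxiety\b")


def matches(pattern, content):
    # .lower() mirrors LOWER(`messages`.`content`) in the SQL
    return pattern.search(content.lower()) is not None


messages = [
    "She doesn't like thunderstorms. #anxiety #nervous.",
    "I have #anxiety",
    "#anxiety sucks!",
    "This is what #anxiety looks like.",
    "abc#anxiety",
    "#anxietyattack",
    "I have anxiety",
]

for m in messages:
    print(repr(m), matches(STRICT, m), matches(OPTIONAL_HASH, m))
```

Note the fifth message. With #? the # itself satisfies [^a-zA-Z0-9_], so abc#anxiety matches the optional-hash variant even though the strict pattern rejects it.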

MYSQL regular expression matching any word in between square brackets

I have to find all the values of a specific column in a table where the value looks like [name].
I should not get values like [a]+[b], [a]>[b] or [a]%[b]=[c].
So I basically do not want column values that contain any special characters other than the square brackets and underscore.
For example, [test_123] should be returned.
I tried
select * from table_name where column_name REGEXP '^[[][^+-></%]';
This only checks whether there is a special character immediately after [. How can I check for special characters anywhere in the column value, and do I need backslashes to escape special characters in MySQL?
I tried it on https://regexr.com/ and got the regex I need:
(\[)\w+[^\+\=\/\*\%\^\!](\])
but I could not get the same thing to work in MySQL.
UPDATED:
(\[[^\_]+])+
That seems to work for what you're looking for.
Also, on MySQL 8.0 and later the query can be written with the REGEXP_LIKE() function, where the third argument 'i' requests case-insensitive matching:
SELECT * FROM mytable WHERE REGEXP_LIKE(mycolumn, 'regexp', 'i');
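For comparison, here is an anchored alternative that checks the whole value instead of searching for a fragment. It is sketched in Python and is not the answer's pattern. It assumes "name" means one or more letters, digits or underscores. Following the double-backslash rule from the first question, the MySQL literal would presumably be '^\\[[A-Za-z0-9_]+\\]$'; that form has not been tested against a live server here.

```python
import re

# Python sketch (not MySQL): the whole value must be "[", then one or more
# letters, digits or underscores (an assumed definition of "name"), then "]".
# fullmatch() plays the role of the ^...$ anchors.
BRACKETED_NAME = re.compile(r"\[[A-Za-z0-9_]+\]")


def is_bracketed_name(value):
    return BRACKETED_NAME.fullmatch(value) is not None


for value in ["[test_123]", "[name]", "[a]+[b]", "[a]>[b]", "[a]%[b]=[c]"]:
    print(value, is_bracketed_name(value))
```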

MySQL regex matching rows containing two or more spaces in a row

I am trying to write a MySQL statement that finds and returns the book records whose titles contain 2 or more spaces in a row.
The statement below is wrong.
SELECT * FROM book WHERE titles REGEXP '[:space]{2,}';
Since two spaces already meet your condition, you do not need to check for more than two. Moreover, if you only need to match a regular ASCII space (decimal code 32), you do not need the REGEXP operator at all; you can safely use
SELECT * FROM book WHERE titles LIKE '%  %';
LIKE is preferred in all cases where you can use it instead of REGEXP (see MySQL | REGEXP VS Like)
When you need to match various whitespace characters, you can use WHERE titles REGEXP '[[:space:]]{2}' (it matches two consecutive characters from [ \t\r\n\v\f]), and if you only plan to match tabs and spaces, use WHERE titles REGEXP '[[:blank:]]{2}'. For more details, see POSIX Bracket Expressions.
Note that [:class_name:] should only be used inside a bracket expression (i.e. inside another pair of [...]); otherwise, it is not recognized.
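The difference between the three checks can be shown with a few made-up titles, using Python's `re` module as a stand-in for REGEXP. The character classes below spell out what [[:space:]] and [[:blank:]] cover in the POSIX locale (an assumption about how the classes are defined on your server). The plain substring test plays the role of LIKE '%  %' with two spaces.

```python
import re

# Python stand-ins (not MySQL) for the three checks; sample titles are made up.
titles = ["One  Two", "One Two", "Tab\t\there", "Tab\tand space", "Line\r\nbreak"]

# LIKE '%  %' (two spaces): plain substring test for two ASCII spaces
two_spaces = [t for t in titles if "  " in t]

# REGEXP '[[:space:]]{2}': two consecutive characters from [ \t\r\n\v\f]
two_whitespace = [t for t in titles if re.search(r"[ \t\r\n\v\f]{2}", t)]

# REGEXP '[[:blank:]]{2}': two consecutive spaces or tabs
two_blanks = [t for t in titles if re.search(r"[ \t]{2}", t)]

print(two_spaces)      # ['One  Two']
print(two_whitespace)  # ['One  Two', 'Tab\t\there', 'Line\r\nbreak']
print(two_blanks)      # ['One  Two', 'Tab\t\there']
```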
Your POSIX class must be written inside a bracket expression:
SELECT * FROM book WHERE titles REGEXP '[[:space:]]{2,}';
There is no need for the comma in {2,}, though:
SELECT * FROM book WHERE titles REGEXP '[[:space:]]{2}';
You may also use [[:blank:]]:
SELECT * FROM book WHERE titles REGEXP '[[:blank:]]{2}';
If you mean just the space character: REGEXP '  '. Or you could use LIKE "%  %", which would be faster. (Note: there are 2 blanks in those.)
Otherwise, see http://dev.mysql.com/doc/refman/5.6/en/regexp.html for blank and space.

Querying a mysql database fetching words with a regexp

I'm using a regexp to fetch the set of words that match the following pattern:
SELECT * FROM words WHERE word REGEXP '^[dcqaahii]{5}$'
At first it seemed to work, until I realized that some of the returned words use a letter more times than it appears in the brackets.
What I want is all the words (of 5 letters) that can be formed with the letters within the brackets: since I have two 'a's, the resulting words can have no 'a', one 'a' or two 'a's, but no more.
What should I add to my regexp to avoid this?
Thanks in advance.
It would probably be better to retrieve all candidates first and post-process, as others have suggested:
SELECT * FROM words WHERE word REGEXP '^[dcqahi]{5}$'
However, nothing is stopping you from using multiple REGEXPs. You can allow 0, 1, or 2 occurrences of the letter 'a' with this grungy expression:
'^[^a]*a?[^a]*a?[^a]*$'
So do the pre-filter first and then combine additional REGEXP requirements with AND:
SELECT * FROM words
WHERE word REGEXP '^[dcqahi]{5}$'
AND word REGEXP '^[^a]*a?[^a]*a?[^a]*$'
AND word REGEXP '^[^i]*i?[^i]*i?[^i]*$'
[edit] As an afterthought, I have inferred that for the non-vowels you also want to restrict them to 0 or 1 occurrence. If that's the case, you'd keep going...
AND word REGEXP '^[^d]*d?[^d]*$'
AND word REGEXP '^[^c]*c?[^c]*$'
AND word REGEXP '^[^q]*q?[^q]*$'
AND word REGEXP '^[^h]*h?[^h]*$'
Yuck.
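The chained patterns above follow a mechanical rule: one optional copy of the letter per allowed occurrence, with runs of other characters around them. That means they can be generated from the letter list instead of written by hand. Below is a Python sketch with `re` standing in for REGEXP; the helper name at_most and the sample words are made up for illustration. Each generated pattern would go into its own AND word REGEXP '...' clause.

```python
import re

# Python sketch (re stands in for MySQL REGEXP); at_most is a hypothetical helper.
LETTERS = "dcqaahii"

# The pre-filter: exactly five characters, each from the allowed set.
PREFILTER = r"^[dcqahi]{5}$"


def at_most(letter, n):
    """Build '^[^x]*x?[^x]*...$' allowing 0..n occurrences of letter."""
    other = f"[^{letter}]*"
    return "^" + other + (f"{letter}?" + other) * n + "$"


# One limit per distinct letter: a and i twice, the others once.
PATTERNS = [PREFILTER] + [at_most(ch, LETTERS.count(ch)) for ch in sorted(set(LETTERS))]


def accepted(word):
    # Equivalent to combining every REGEXP condition with AND
    return all(re.search(p, word) for p in PATTERNS)


print(at_most("a", 2))  # ^[^a]*a?[^a]*a?[^a]*$
print([w for w in ["chaid", "aaach", "hiiad", "ddcqa"] if accepted(w)])
```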
The only solution I can think of is to use the SQL above to get an initial filtered set of data, then loop through it and filter further with server-side code (PHP, etc.), which is better suited to that kind of logic.
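The server-side filtering suggested here is a multiset check: count each letter in the word and compare with how often it is available. Below is a minimal Python sketch using collections.Counter. The function name can_be_formed and the sample words are hypothetical; in practice the candidates would be the rows returned by the REGEXP pre-filter.

```python
from collections import Counter

# Hypothetical post-processing helper, not taken from the answer.
AVAILABLE = Counter("dcqaahii")  # a and i twice; d, c, q, h once


def can_be_formed(word, available=AVAILABLE, length=5):
    """True if word has the given length and uses no letter more often than available."""
    if len(word) != length:
        return False
    needed = Counter(word)
    # A Counter returns 0 for letters it has never seen, so unknown letters fail here.
    return all(count <= available[letter] for letter, count in needed.items())


candidates = ["chaid", "aaach", "hiiad", "ddcqa", "hello"]
print([w for w in candidates if can_be_formed(w)])  # ['chaid', 'hiiad']
```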
In regular expressions, square brackets [] are merely a character class, like a list of allowed characters. Specifying the same letter twice within the brackets is therefore redundant.
For example, the pattern ^[sed]+$ will match both sed and seed, because e is one of the allowed characters. Specifying a count afterward in braces {} merely sets the total number of characters, each drawn from the character class.
The pattern ^[sed]{3}$ therefore will match sed but not seed.
I would recommend moving the logic for testing the validity of words from SQL into your program.

How can I find non-ASCII characters in MySQL?

I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to find these records using MySQL?
MySQL provides comprehensive character set management that can help with this kind of problem.
SELECT whatever
FROM tableName
WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)
The CONVERT(col USING charset) function turns the unconvertible characters into replacement characters. Then, the converted and unconverted text will be unequal.
See this for more discussion. https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html
You can use any character set name you wish in place of ASCII. For example, if you want to find out which characters won't render correctly in code page 1257 (Lithuanian, Latvian, Estonian) use CONVERT(columnToCheck USING cp1257)
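The same idea can be sketched in Python. Here str.encode(..., errors="replace") plays the role of CONVERT(... USING ASCII) by turning each character ASCII cannot represent into "?". MySQL's own replacement character is assumed to behave the same way for this comparison. One caveat is relevant to the question: carriage returns and line feeds are ASCII, so this comparison alone will not flag them.

```python
# Python sketch (not MySQL) of columnToCheck <> CONVERT(columnToCheck USING ASCII);
# encode(..., errors="replace") is assumed to mirror CONVERT's replacement behaviour.
def has_non_ascii(value):
    # Each character that ASCII cannot represent becomes "?", so the converted
    # string differs from the original exactly when such a character exists.
    converted = value.encode("ascii", errors="replace").decode("ascii")
    return converted != value


for value in ["plain text", "em\u2014dash", "Gr\u00fcnberg", "line\r\nbreak"]:
    print(repr(value), has_non_ascii(value))
```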
You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query
SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';
This was the most comprehensive query I could come up with.
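This works under the assumption that the column is stored as UTF-8. In UTF-8 every ASCII character is a single byte from 0x00 to 0x7F, while non-ASCII characters are encoded only with bytes 0x80 or higher. A hex string made entirely of pairs starting with 0-7 therefore means pure ASCII. Below is a Python sketch of the same check, where bytes.hex().upper() stands in for MySQL's HEX().

```python
import re

# Python sketch (not MySQL); assumes the column's bytes are UTF-8.
# Same pattern as the query: zero or more byte pairs whose first hex digit is 0-7.
ALL_ASCII_HEX = re.compile(r"^([0-7][0-9A-F])*$")


def has_non_ascii_hex(value, encoding="utf-8"):
    # HEX() on a string column returns uppercase hex of its stored bytes
    hex_string = value.encode(encoding).hex().upper()
    return ALL_ASCII_HEX.match(hex_string) is None


print(has_non_ascii_hex("plain"))          # False
print(has_non_ascii_hex("Gr\u00fcnberg"))  # True: the u-umlaut is C3 BC in UTF-8
print(has_non_ascii_hex("a\r\nb"))         # False: 0D and 0A are ASCII bytes
```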
It depends exactly what you're defining as "ASCII", but I would suggest trying a variant of a query like this:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9]';
That query will return all rows where columnToCheck contains any non-alphanumeric characters. If you have other characters that are acceptable, add them to the negated character class in the regular expression. For example, if periods, commas, and hyphens are OK, change the query to:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9.,-]';
The most relevant page of the MySQL documentation is probably 12.5.2 Regular Expressions.
This is probably what you're looking for:
select * from TABLE where COLUMN regexp '[^ -~]';
It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline).
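The range in that character class runs from the space (0x20) to the tilde (0x7E), which is exactly printable ASCII. Negating it therefore catches both non-ASCII characters and control characters such as the hidden CR/LF from the question. Below is a Python sketch, with `re` standing in for REGEXP, run on made-up values.

```python
import re

# Python stand-in (not MySQL) for REGEXP '[^ -~]':
# any character outside the printable ASCII range 0x20-0x7E
OUTSIDE_PRINTABLE_ASCII = re.compile(r"[^ -~]")


def needs_review(value):
    return OUTSIDE_PRINTABLE_ASCII.search(value) is not None


for value in ["Hello, world!", "em\u2014dash", "line\nbreak", "tab\there", ""]:
    print(repr(value), needs_review(value))
```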
One missing character from everyone's examples above is the termination character (\0). This is invisible to the MySQL console output and is not discoverable by any of the queries heretofore mentioned. The query to find it is simply:
select * from TABLE where COLUMN like '%\0%';
Based on the correct answer, but taking into account ASCII control characters as well, the solution that worked for me is this:
SELECT * FROM `table` WHERE NOT `field` REGEXP "[\\x00-\\xFF]|^$";
It does the same thing: searches for violations of the ASCII range in a column, but lets you search for control characters too, since it uses hexadecimal notation for code points. Since there is no comparison or conversion (unlike @Ollie's answer), this should be significantly faster, too. (Especially if MySQL does early termination on the regex query, which it definitely should.)
It also avoids returning fields that are zero-length. If you want a slightly longer version that might perform better, you can use this instead:
SELECT * FROM `table` WHERE `field` <> "" AND NOT `field` REGEXP "[\\x00-\\xFF]";
It does a separate check for length to avoid zero-length results, without considering them for a regex pass. Depending on the number of zero-length entries you have, this could be significantly faster.
Note that if your default character set is something bizarre where 0x00-0xFF don't map to the same values as ASCII (is there such a character set in existence anywhere?), this would return a false positive. Otherwise, enjoy!
Try using this query to search for records with special characters:
SELECT *
FROM tableName
WHERE fieldName REGEXP '[^a-zA-Z0-9#:. \'\-`,\&]'
@zende's answer was the only one that covered columns with a mix of ASCII and non-ASCII characters, but it also had that problematic hex thing. I used this:
SELECT * FROM `table` WHERE NOT `column` REGEXP '^[ -~]+$' AND `column` !=''
In Oracle, you can use the following:
SELECT * FROM TABLE_A WHERE ASCIISTR(COLUMN_A) <> COLUMN_A;
For this question, we can also use this method.
Question from SQLZoo (Non-ASCII characters):
Find all details of the prize won by PETER GRÜNBERG
Answer:
select * from nobel where winner like 'P% GR%_%berg';