Issue with Regexp with mySQL query - mysql

I'm trying to build a search query which searches for a word in a string and finds matches based on the following criteria:
The word is surrounded by a combination of either a space, period or comma
The word is at the start of the string and is followed by a space, period or comma
The word is at the end of the string and is preceded by a space, period or comma
It's a full match, i.e. the entire string is just the word
For example, if the word is 'php' the following strings would be matches:
php
mysql, php, javascript
php.mysql
javascript php
But for instance it wouldn't match:
php5
I've tried the following query:
SELECT * FROM candidate WHERE skillset REGEXP '^|[., ]php[., ]|$'
However, that doesn't work: it returns every record as a match, which is wrong.
Without the ^| and |$ in there, i.e.
SELECT * FROM candidate WHERE skillset REGEXP '[., ]php[., ]'
It successfully finds matches where 'php' is somewhere in the string except the start and end of the string. So the problem must be with the ^| and |$ part of the regexp.
How can I add those conditions in to make it work as required?

Try '\bphp\b'. \b is a word boundary, which may be exactly what you need because it matches php only as a whole word.
In MySQL before 8.0.4, word boundaries are written [[:<:]] and [[:>:]] instead of \b, so use the pattern '[[:<:]]php[[:>:]]'. From 8.0.4 on, use '\\bphp\\b'. The MySQL documentation on regular expressions has more on word boundaries.
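The original pattern returned every row because of operator precedence: | binds more loosely than anything else, so '^|[., ]php[., ]|$' reads as three separate alternatives, and the bare ^ (or $) matches every string. Here is a minimal sketch of that. It uses Python's re module as a stand-in for MySQL's engine (an assumption; the precedence of | is the same in both). The grouped pattern is a suggestion, not something taken from the posts above.

```python
import re

# The pattern from the question: three alternatives, "^", "[., ]php[., ]", "$".
BROKEN = r"^|[., ]php[., ]|$"
# Hypothetical fix: group each anchor with its delimiter class.
GROUPED = r"(^|[., ])php([., ]|$)"

SHOULD_MATCH = ["php", "mysql, php, javascript", "php.mysql", "javascript php"]
SHOULD_NOT_MATCH = ["php5"]


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in text, as REGEXP does."""
    return re.search(pattern, text) is not None


for text in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(f"{text!r}: broken={matches(BROKEN, text)} grouped={matches(GROUPED, text)}")
```

In the query, the grouped form would be written as REGEXP '(^|[., ])php([., ]|$)'.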

Well, you can play around a bit with regex101.com
Something I found that works for you but doesn't exactly follow your rules is:
/(?=[ .,]?php[ .,]?)(?=php[\W])/
This uses the lookahead operator, ?=, to AND two conditions together. Lookaheads are not supported by MySQL's regex engine before 8.0.4, so check this against your server version.
The first portion of the regex is
[ .,]?php[ .,]?
This matches php with at most one space, period, or comma before it and at most one after it.
The second portion of the regex is
php[\W]
This matches php followed by a non-word character. In other words, it will NOT match php followed by a letter, digit, or underscore.
It's not the perfect answer for your set of rules, but it does work with your sample data set. Play around on regex101.com and try to make a perfect one.

Related

How to make this REGEX below work for MySql?

I have written a regex and tested it online, where it works fine. When I test it in the MySQL console in a terminal, it doesn't match and I get an empty set. I believe MySQL's regexp syntax is somehow different, but I cannot find the right way to write it.
This is the data I use:
edu.ba;
medu.ba;
edu.ba;
med.edu.ba;
edu.com;
edu.ba
I should get only the edu.ba matches, including the ; if there is one. It works fine except in the actual query.
(\;+|^)\bedu.ba\b(\;+|$|\n)
Is there anything I could change to get the same results?
You want to match edu.ba in between semi-colons or start/end of string. The word boundaries are redundant here (although if you want to experiment, the MySQL regex before MySQL v8 used [[:<:]] / [[:>:]] word boundaries, and in MySQL v8+, you need to use double backslashes with \b - '\\b').
Use
(;|^)edu[.]ba(;|$)
Details
(;|^) - ; or start of string
edu[.]ba - edu.ba literal string (dot inside brackets always matches a literal dot)
(;|$) - ; or end of string.
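A quick way to sanity-check the pattern outside MySQL is sketched below with Python's re module (an assumption: it is a stand-in, not MySQL's engine, though this pattern uses only syntax both share). The sample values are taken from the question and treated as one value per row, which is also an assumption.

```python
import re

PATTERN = re.compile(r"(;|^)edu[.]ba(;|$)")

# Sample values from the question, assumed to be one value per row.
rows = ["edu.ba;", "medu.ba;", "med.edu.ba;", "edu.com;", "edu.ba"]

matched = [row for row in rows if PATTERN.search(row)]
print(matched)

# An unescaped dot matches any character, so "edu.ba" would also accept this:
print(bool(re.search(r"(;|^)edu.ba(;|$)", "eduXba")))
```

Only the two edu.ba rows are kept; medu.ba; and med.edu.ba; fail because edu is preceded by neither ; nor the start of the string.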

MySql Specific Search - Replace String

I need to search words that contain multiple number prefixes.
Example:
0119
0129
0139
0149
But there are other prefixes, such as 0155859, 0128889, etc.
If I search for 0%9, I get results I don't want, including 0155859 and 0128889.
I need to search for and list ONLY the ones like 0119, etc.
How do I do it?
What I want is 0XX9, where XX is any two characters, so 0119, 0129, etc. The % matches every character up to a later 9, and I don't want that.
English isn't my first language, so correct me if I didn't express myself clearly!
In a LIKE pattern, the _ character matches any single character. So you can do:
WHERE word LIKE '0__9%'
This matches a word that begins with 0, then any two characters, then 9, then anything after that.
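The difference between % and _ can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; both give % and _ the same meaning in LIKE). The table name words and the value 0119555 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
samples = ["0119", "0129", "0139", "0149", "0155859", "0128889",
           "0119555"]  # the last value is hypothetical, to show the trailing %
conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in samples])


def search(pattern: str) -> list:
    """Run a LIKE query and return the matching words in sorted order."""
    cur = conn.execute(
        "SELECT word FROM words WHERE word LIKE ? ORDER BY word", (pattern,)
    )
    return [row[0] for row in cur]


print(search("0%9"))    # % spans any number of characters
print(search("0__9%"))  # each _ is exactly one character
```

The first search returns 0155859 and 0128889 along with the wanted values; the second returns only words whose fourth character is 9.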
My gut feeling at seeing your question was to consider using REGEXP, which is MySQL's regex matching operator. Try the following query:
SELECT *
FROM yourTable
WHERE word REGEXP '0[0-9][0-9]9'
This pattern matches any word containing a 0, followed by any two digits, followed by a 9, anywhere in the word. Prefix the pattern with ^ if the match must start at the beginning of the word.
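Because REGEXP matches anywhere in the value, anchoring matters when the digits are meant to be a prefix. A rough sketch with Python's re module (a stand-in for MySQL's engine, which is an assumption; the value 5501294 is hypothetical):

```python
import re

UNANCHORED = re.compile(r"0[0-9][0-9]9")
ANCHORED = re.compile(r"^0[0-9][0-9]9")

# "5501294" is a hypothetical value that has 0129 in the middle.
for word in ["0119", "0155859", "0128889", "5501294"]:
    print(word, bool(UNANCHORED.search(word)), bool(ANCHORED.search(word)))
```

Only the anchored pattern rejects 5501294; both reject 0155859 and 0128889.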

Mysql query returns no data with escaped \

I'm attempting to query our MySQL database, but I'm getting no data when there clearly is data there.
First I query
SELECT id, instruction_link FROM work_instructions WHERE instruction_link LIKE "%\\\\cots-sbs%";
Which returns 100+ lines.
http://tinypic.com/r/ief8td/8
(sorry couldn't post as actual picture, don't have enough rep :(
However if I query
SELECT id, instruction_link FROM work_instructions WHERE instruction_link LIKE "%\\\\cots-sbs\\%";
http://tinypic.com/r/33ksw3q/8
I get no results with the second query. I have no idea what I'm doing wrong here. It seems pretty simple, but I can't make any sense of it.
Thanks in advance.
As documented under LIKE:
Note
Because MySQL uses C escape syntax in strings (for example, “\n” to represent a newline character), you must double any “\” that you use in LIKE strings. For example, to search for “\n”, specify it as “\\n”. To search for “\”, specify it as “\\\\”; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.
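\\% is parsed as a string containing a literal backslash followed by a percent character, which LIKE then interprets as an escaped, literal percent sign rather than a backslash followed by a wildcard. To match a literal backslash followed by anything, double the backslashes again: \\\\%.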
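The two rounds of backslash stripping can be traced with a simplified model. This is only a sketch: the helper names are hypothetical, the stored value is invented, and stage 1 models just the escapes used here (it does not handle \n and similar).

```python
import re


def parse_string_literal(body: str) -> str:
    """Stage 1: roughly what the SQL parser does to the text between the quotes.

    Simplified model: a backslash before % or _ is kept for the LIKE stage,
    and any other backslash is dropped while the character after it is kept.
    """
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            out.append("\\" + nxt if nxt in "%_" else nxt)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def like_to_regex(pattern: str) -> str:
    """Stage 2: translate a LIKE pattern (backslash as escape) into a regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def like(value: str, literal_body: str) -> bool:
    """Evaluate value LIKE '<literal_body>' under the two-stage model."""
    pattern = parse_string_literal(literal_body)
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None


# Hypothetical stored value: a UNC path on the cots-sbs host.
link = r"\\cots-sbs\share\wi-100.pdf"

print(like(link, r"%\\\\cots-sbs%"))      # first query from the question
print(like(link, r"%\\\\cots-sbs\\%"))    # second query
print(like(link, r"%\\\\cots-sbs\\\\%"))  # trailing backslash doubled again
```

Under this model the second query fails because its pattern ends in a literal % that the stored value does not contain.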

Querying a mysql database fetching words with a regexp

I'm using a regexp to fetch a set of words that satisfy the following pattern:
SELECT * FROM words WHERE word REGEXP '^[dcqaahii]{5}$'
At first it seemed to work, until I realized that some letters were used more times than they appear in the regexp.
What I want is all words (here, of 5 letters) that can be formed with the letters within the brackets. Since I have two 'a's, the resulting words can have no 'a', one 'a' or two 'a's, but no more.
What should I add to my regexp to enforce this?
Thanks in advance.
It would probably be better to retrieve all candidates first and post-process, as others have suggested:
SELECT * FROM words WHERE word REGEXP '^[dcqahi]{5}$'
However, nothing is stopping you from doing multiple REGEXPs. You can select 0, 1, or 2 occurrences of the letter 'a' with this grungy expression:
'^[^a]*a?[^a]*a?[^a]*$'
So do the pre-filter first and then combine additional REGEXP requirements with AND:
SELECT * FROM words
WHERE word REGEXP '^[dcqahi]{5}$'
AND word REGEXP '^[^a]*a?[^a]*a?[^a]*$'
AND word REGEXP '^[^i]*i?[^i]*i?[^i]*$'
[edit] As an afterthought, I have inferred that for the non-vowels you also want to restrict to 0 or 1 occurrence. So if that's the case, you'd keep going...
AND word REGEXP '^[^d]*d?[^d]*$'
AND word REGEXP '^[^c]*c?[^c]*$'
AND word REGEXP '^[^q]*q?[^q]*$'
AND word REGEXP '^[^h]*h?[^h]*$'
Yuck.
The only solution I can think of is to use the SQL you have above to get an initial filtered set of data, then loop through it and filter further with some server-side code (PHP etc.), which is better suited to that kind of logic.
In regular expressions, square brackets [] are merely a character class, that is, a list of allowed characters. Specifying the same letter twice within the brackets is therefore redundant.
For example, the anchored pattern ^[sed]+$ matches both sed and seed, because every character in them is in the allowed set. A count in braces {} after the class sets only the total number of characters drawn from the class, not how often each one may appear.
The pattern ^[sed]{3}$ therefore matches sed but not seed.
I would recommend moving the logic for testing the validity of words from SQL into your program.
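A minimal sketch of that post-processing step, in Python rather than PHP (the language choice, the function name can_form, and the candidate words are all assumptions for illustration). It counts the letters and checks that no letter is used more often than it is available:

```python
from collections import Counter

LETTERS = "dcqaahii"  # the letters from the question's character class


def can_form(word: str, letters: str = LETTERS) -> bool:
    """True if word uses each letter no more often than it occurs in letters."""
    available = Counter(letters)
    return all(available[ch] >= n for ch, n in Counter(word).items())


# Hypothetical candidates, as if returned by the '^[dcqahi]{5}$' pre-filter.
candidates = ["aahii", "dacha", "aaahi", "chida"]
print([w for w in candidates if can_form(w)])
```

Here aaahi is rejected because it needs three 'a's and only two are available.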

Regex Search in phpMyAdmin

Attempting to change the "files" folder location in a Drupal site from /files to /sites/default/files.
In order to avoid changing anything else such as
http://www.google.com/profiles/
I'm trying to use a basic regular expression with a word boundary.
\bfiles/
A quick check in regexpal works as expected, but when I enter the above in the phpMyAdmin search, with the "as regular expression" checkbox checked, I don't get the expected result.
Two questions:
How should I write my expression with a word boundary so that it works in phpMyAdmin?
I'm really a newbie at SQL statements! Would it be possible to write a SQL query that would simply look for every occurrence of "files/" & replace it with "sites/default/files/"?
According to the MySQL docs, the regex flavour used (before MySQL 8.0.4) is POSIX 1003.2. For this flavour of regex, word boundaries are as follows:
[[:<:]] (beginning) [[:>:]] (end)
so your regex would be:
[[:<:]]files/
If you want to use SQL to search and replace all instances of [[:<:]]files/ in a specific field of a table, you could use a UDF such as the one found here.
Also, you should be aware of the following when using regex with MySQL:
Because MySQL uses the C escape syntax in strings (for example, “\n”
to represent the newline character), you must double any “\” that you
use in your REGEXP strings.
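To see what the word boundary buys you before touching the database, here is a small sketch in Python. Its \b plays the role of [[:<:]] here (an assumption about equivalence for this simple case), and the function name move_files_path is hypothetical.

```python
import re


def move_files_path(text: str) -> str:
    """Replace files/ with sites/default/files/ only where files starts a word."""
    return re.sub(r"\bfiles/", "sites/default/files/", text)


print(move_files_path("/files/logo.png"))
print(move_files_path("http://www.google.com/profiles/"))
```

The first path is rewritten and the profiles/ URL is left alone. Note that running the replacement a second time on already rewritten text would match files/ again, so apply it once.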