I've been looking into using REGEXP to filter entries in my database.
I have a column with comma-separated values that looks like this:
| id | col A             |
|----|-------------------|
| 1  | P:1,P:2,P:5,P:7   |
| 2  | P:6,P:8,P:10,P:11 |
| 3  | P:4,P:3,P:1,P:0   |
| 4  | P:2,P:1           |
|----|-------------------|
Let's say I want the rows containing the value P:1. How can I design a REGEXP of the form:
SELECT * FROM `table` WHERE `col A` REGEXP '?'
so that I get rows 1, 3 and 4? My previous approach was simply to use:
SELECT * FROM `table` WHERE `col A` LIKE '%P:1%'
However, that also returns row 2, because P:10 and P:11 technically contain P:1.
Any help would be appreciated. I think this problem is fairly simple for a regexp expert!
Cheers,
Andreas
You need to read up on word boundaries.
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
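As a rough illustration of how word boundaries solve the P:1 problem, here is a small Python sketch rather than MySQL. It uses Python's \b, which behaves like the [[:<:]] and [[:>:]] markers for this data. The rows are copied from the question, with row 3 written as P:1 on the assumption that "P1" is a typo. Note also that MySQL 8.0.4 and later use the ICU regex library, where the [[:<:]] and [[:>:]] markers are not supported and \b is used instead.

```python
import re

# Rows from the question; row 3 uses "P:1" on the assumption that
# "P1" in the original table is a typo.
rows = {
    1: "P:1,P:2,P:5,P:7",
    2: "P:6,P:8,P:10,P:11",
    3: "P:4,P:3,P:1,P:0",
    4: "P:2,P:1",
}

# \b matches between a word character and a non-word character (or the
# string edge), so "P:1" followed by another digit is rejected.
pattern = re.compile(r"\bP:1\b")

matching_ids = [row_id for row_id, value in rows.items() if pattern.search(value)]
print(matching_ids)  # [1, 3, 4]
```

Row 2 is excluded because in P:10 and P:11 the 1 is followed by another word character, so there is no boundary after it.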
I came across an old post and tried its code in a project I am working on. It worked, but I am still confused as to why. Could anyone please unpack the logic behind the code? I am specifically referring to this fiddle.
I understand substring_index, but I'm not sure what "numbers" does, or what the char length calculations are for.
Thanks in advance.
The numbers table is a way to create an ad hoc table that consists of sequential integers.
mysql> SELECT 1 n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4;
+---+
| n |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
+---+
These numbers are used to extract the N'th word from the comma-separated string. It's just a guess that 4 is enough to account for the number of words in the string.
The CHAR_LENGTH() expression is a tricky way to count the words in the comma-separated string. The number of commas determines the number of words. So if you compare the length of the string to the length of that string with commas removed, it tells you the number of commas, and therefore the number of words (commas plus one).
mysql> set @string = 'a,b,c,d,e,f';
mysql> select char_length(@string) - char_length(replace(@string, ',', '')) + 1 as word_count;
+------------+
| word_count |
+------------+
| 6 |
+------------+
Confusing code like this is one of the many reasons it's a bad idea to store data in comma-separated strings.
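To make the two pieces concrete, here is a minimal Python sketch of the same logic. It is not the code from the fiddle (which is not shown here); substring_index below is a hypothetical Python stand-in for MySQL's SUBSTRING_INDEX(), and nesting two calls to pick the N'th word is the usual pattern for this kind of query, assumed here.

```python
def word_count(csv: str) -> int:
    """Count words as (number of commas) + 1, like the CHAR_LENGTH() trick."""
    return len(csv) - len(csv.replace(",", "")) + 1


def substring_index(text: str, delim: str, count: int) -> str:
    """Hypothetical stand-in for MySQL's SUBSTRING_INDEX().

    Positive count: everything before the count-th delimiter from the left.
    Negative count: everything after the count-th delimiter from the right.
    """
    parts = text.split(delim)
    if count >= 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def nth_word(csv: str, n: int) -> str:
    """Keep the first n words, then take the last one of those."""
    return substring_index(substring_index(csv, ",", n), ",", -1)


string = "a,b,c,d,e,f"
# The range plays the role of the numbers table: one integer per word.
words = [nth_word(string, n) for n in range(1, word_count(string) + 1)]
print(word_count(string))  # 6
print(words)               # ['a', 'b', 'c', 'd', 'e', 'f']
```

In the SQL version, joining against the numbers table with a condition on the word count serves the same purpose as the range here: it produces one row per word in the string.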
In my MySQL database I have a column called test_column with the following rows:
dtq test dis
ged something fbd
edf something tds
zhs nothing edk
dda anything zhg
hvf nothing ert
asf nothing vbg
I'm looking for the string between the first three and the last three characters. I can get these values with a regex like this:
^\w{3}\s(\w+)\s\w{3}$
I want to SELECT DISTINCT these values.
Expected output is the following:
test
something
nothing
anything
How can I do that with a MySQL command?
If you are running MySQL 8.0, you can use regexp_replace() as follows:
select distinct regexp_replace(col, '(^\\w{3}\\s)|(\\s\\w{3}$)', '') new_col from mytable
This works by replacing the first and last words (and the following/preceding spaces) with the empty string. The first and last words must be 3 characters long.
Demo on DB Fiddle:
| new_col |
| :-------- |
| test |
| something |
| nothing |
| anything |
You can make the regex a little more generic so it accepts also starting and ending words that have a length other than 3 characters and sequences of more than one space:
regexp_replace(col, '(^\\w+\\s+)|(\\s+\\w+$)', '')
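For readers who want to check the pattern outside the database, the following Python sketch applies the same 3-character regex to the rows from the question. It only illustrates the matching logic; Python's re module stands in for MySQL's regexp_replace(), and the de-duplication step stands in for SELECT DISTINCT.

```python
import re

rows = [
    "dtq test dis",
    "ged something fbd",
    "edf something tds",
    "zhs nothing edk",
    "dda anything zhg",
    "hvf nothing ert",
    "asf nothing vbg",
]

# Same alternation as in the SQL answer: a leading 3-character word plus
# its space, or a trailing space plus a 3-character word.
strip_ends = re.compile(r"(^\w{3}\s)|(\s\w{3}$)")

middles = [strip_ends.sub("", row) for row in rows]

# dict.fromkeys keeps the first occurrence of each value, in order.
distinct = list(dict.fromkeys(middles))
print(distinct)  # ['test', 'something', 'nothing', 'anything']
```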
You don't need a regexp if this suffices:
SUBSTRING_INDEX(SUBSTRING_INDEX(col, ' ', 2), ' ', -1)
However, this assumes that the space delimiter is what really matters, rather than the "3 characters" test (\w{3}, which is really 3 alphanumeric characters or underscores).
You don't need regex if that is all you want (remove first 4 and last 4 characters):
SELECT DISTINCT SUBSTRING(test_column,5,LENGTH(test_column)-8)
FROM mytable
or everything after the first space, up until the second space:
SELECT DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(test_column,' ',2),' ',-1)
FROM mytable
I've this record in a Mysql table:
ADDRESS
----------------------------------
sdasd 4354 ciao 12345 sdsdsa asfds
I would like to match all characters from the beginning up to and including the first occurrence of a 5-digit word.
In this case, using REGEXP_REPLACE, I would like to remove the substring matched and return sdsdsa asfds.
What I've tried to do is this:
SELECT REGEXP_REPLACE(ADDRESS, '^.+\b\d{5}\b.','') FROM `mytable`
The regexp seems to work when I test it in this snippet, and I cannot understand why it doesn't work in MySQL.
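MySQL's older POSIX-style regex syntax doesn't support Perl-like escapes such as \b and \d. In addition, a single backslash inside a MySQL string literal is consumed by the string parser (unless NO_BACKSLASH_ESCAPES is enabled), so '\b' and '\d' do not reach the regex engine as written.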
This regex should work for you:
SELECT REGEXP_REPLACE
('sdasd 4354 ciao 12345 sdsdsa asfds', '^.+[[:<:]][0-9]{5}[[:blank:]]+', '') as val;
+--------------+
| val |
+--------------+
| sdsdsa asfds |
+--------------+
RegEx Details:
^.+: Match 1 or more of any characters at the start (greedy)
[[:<:]]: Match a word boundary (zero width)
[0-9]{5}: Match exactly 5 digits
[[:blank:]]+: Match 1 or more whitespace characters (tab or space)
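The breakdown above can be tried out with a short Python sketch. This is an approximation, not the MySQL query: \b replaces [[:<:]] and [ \t] replaces [[:blank:]]. It also shows one thing worth knowing about the greedy ^.+ part: if the string contains several 5-digit words, the greedy version removes everything up to the last one. The lazy variant and the two_codes sample string are additions for illustration only and are not part of the original answer.

```python
import re

address = "sdasd 4354 ciao 12345 sdsdsa asfds"

# Python approximation of '^.+[[:<:]][0-9]{5}[[:blank:]]+'.
greedy = re.compile(r"^.+\b[0-9]{5}[ \t]+")
print(greedy.sub("", address))  # sdsdsa asfds

# Hypothetical sample with two 5-digit words (not from the question).
two_codes = "a 11111 b 22222 c"

# Greedy .+ backtracks from the end, so it stops at the LAST 5-digit word.
print(greedy.sub("", two_codes))  # c

# A lazy .+? stops at the FIRST 5-digit word instead; the extra \b after
# the digits rejects longer digit runs.
lazy = re.compile(r"^.+?\b[0-9]{5}\b[ \t]+")
print(lazy.sub("", two_codes))  # b 22222 c
```

For the single-address example in the question both variants give the same result, since there is only one 5-digit word.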
I have a large table (approximately 600,000 rows).
I want rows with the value "word's" to be returned when I search for "words".
Special characters should be ignored completely.
TABLE:
| column_value |
| ------------- |
| word's |
| hello |
| world |
QUERY: select * from `table` where column_value like '%words%'
EXPECTED RESULTS:
| column_value |
| ------------- |
| word's |
I want rows containing special characters to match, with the special characters ignored.
Can you please help me achieve this with a fast runtime?
You can use replace() to remove the "special" character before matching.
SELECT *
FROM `table`
WHERE replace(column_value, '''', '') LIKE '%words%';
Nest the replace() calls for other characters.
Or you can try regular expressions.
SELECT *
FROM `table`
WHERE column_value REGEXP 'w[^a-zA-Z]*o[^a-zA-Z]*r[^a-zA-Z]*d[^a-zA-Z]*s';
[^a-zA-Z]* matches zero or more characters that are not letters (neither a to z nor A to Z), so the pattern matches your search word even when non-letter characters sit between its letters.
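Writing that pattern by hand for every search word is tedious, so it can be generated. The following Python sketch builds the pattern string from a search word; tolerant_pattern is a hypothetical helper name, and the resulting string would then be passed to REGEXP (ideally as a bound parameter). The matching at the end uses Python's re module only to demonstrate the behaviour on the sample rows.

```python
import re


def tolerant_pattern(word: str, filler: str = "[^a-zA-Z]*") -> str:
    """Join the letters of `word` with a filler that allows any run of
    non-letter characters between them (hypothetical helper)."""
    return filler.join(re.escape(ch) for ch in word)


pattern = tolerant_pattern("words")
print(pattern)  # w[^a-zA-Z]*o[^a-zA-Z]*r[^a-zA-Z]*d[^a-zA-Z]*s

values = ["word's", "hello", "world"]
matches = [v for v in values if re.search(pattern, v)]
print(matches)  # ["word's"]
```

Like the replace() approach, a pattern of this shape still has to be evaluated against every row, so it does not by itself address the runtime concern for a large table.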
Or have a look at the options that full-text search offers. Maybe that can help too.
Full-text search requires a FULLTEXT index on column_value; see the MySQL documentation.
I am using MySQL.
I have a car table in my database, and there is a name column in that table.
Suppose the name column of the table contains these values:
+----------+
| name     |
+----------+
| AAA BB   |
| CC D BB  |
| OO kk BB |
| PP B CC  |
+----------+
I would like to search the table for rows where the name column contains the word "BB" (as a whole word, not a substring). What is the SQL command to achieve this?
I know LIKE, but it matches a contained substring, not a whole word.
P.S.
My table contains a lot of data, so I probably need a more efficient way than using LIKE.
The values in the name column are random strings.
Please do not ask me to use IN (...), because the values in that column are unpredictable.
Try this WHERE clause:
WHERE name LIKE '% BB %'
OR name LIKE 'BB %'
OR name LIKE '% BB'
OR name = 'BB'
Note that this will not perform well if your table is large. You may also want to consider a full-text search if you need better performance.
You can use the REGEXP operator in MySQL:
SELECT *
FROM car
WHERE name REGEXP '[[:<:]]BB[[:>:]]'
It will match BB if it occurs as a single word. From the MySQL manual:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0