How to find a variable pattern in MySQL with regex? - mysql

I am trying to pull a product code out of a long string formatted like a URL. The pattern is always 3 letters followed by 3 or 4 digits (e.g. ???### or ???####). I have tried both REGEXP and LIKE, but the results are off for both, and I am not sure which operators to use.
The first SELECT statement comes close to trimming the URL down to just the code, but it often returns an unrelated run of digits found elsewhere in the URL.
The second SELECT statement is more rudimentary, but I am unsure which wildcards to use.
Which would be the quickest solution?
SELECT columnName, SUBSTR(columnName, LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", columnName), LENGTH(columnName) - LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", REVERSE(columnName))) AS extractedData FROM tableName
SELECT columnName FROM tableName WHERE columnName LIKE '%___###%' OR columnName LIKE '%___####%'
-- Will take a substring of this result as well
Example Data:
randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc
In this case, the desired string is "xyz123" and the location of said pattern is variable based on each entry.
EDIT
SELECT column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), SUBSTR(column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), LENGTH(column) - LOCATE(column REGEXP "^.*[a-zA-Z]{3}[0-9]{3,4}", REVERSE(column))) AS extractData From mainTable
This expression is still not grabbing the right data, but I feel like it may get me closer.

I suggest using
REGEXP_SUBSTR(column, '(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])')
Details:
(?<=[&?]random_code=[^&#]{0,256}-) - immediately on the left, there must be ? or &, then random_code=, and then zero to 256 chars other than & and #, followed by a - char
[a-zA-Z]{3} - three ASCII letters
[0-9]{3,4} - three to four ASCII digits
(?![^&#]) - the match must be followed by &, #, or the end of the string.
See the online demo:
WITH cte AS ( SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc' val
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz4567&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz89&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz00000&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-aaaaa11111&hello_world=us&etc_etc')
SELECT REGEXP_SUBSTR(val,'(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])') output
FROM cte
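For readers who want to check the pattern outside MySQL, here is a minimal Python sketch of the same idea. It is an approximation, not a port: Python's re module rejects variable-width lookbehinds, so the (?<=...) part is replaced with a plain prefix plus a capture group. The names CODE_RE and extract_code are made up for this illustration.

```python
import re

# Python's re does not allow variable-width lookbehind, so the prefix is
# matched normally and only the product code is captured in group 1.
CODE_RE = re.compile(
    r"[&?]random_code=[^&#]{0,256}-([a-zA-Z]{3}[0-9]{3,4})(?![^&#])"
)

def extract_code(url):
    """Return the product code, or None when the URL has no valid code."""
    match = CODE_RE.search(url)
    return match.group(1) if match else None

base = "randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-"
tail = "&hello_world=us&etc_etc"

# Same five sample values as the CTE above.
results = [extract_code(base + code + tail)
           for code in ("xyz123", "xyz4567", "xyz89", "xyz00000", "aaaaa11111")]
print(results)
```

Only the first two sample values contain a code of exactly 3 letters and 3 or 4 digits directly after the dash, so the other three yield None in this sketch.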

I'd make use of capture groups:
(?<=[=\-\\])([a-zA-Z]{3}[\d]{3,4})(?=[&])
I assume that with [^=\-] you wanted to match a code preceded by "-", "\" or "=" without including those characters in the result. To do that, use a positive lookbehind, (?<=...).
I also added a lookahead, (?=...), for "&".
If you'd like to experiment more with regex, I recommend RegExr.
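As a quick way to try this pattern, here is a small Python sketch. The lookbehind here is a single character wide, so Python's re module accepts it as written. The variable names are only for illustration.

```python
import re

# One-character lookbehind for "=", "-" or "\", then the code,
# then a lookahead that requires "&" right after it.
PATTERN = re.compile(r"(?<=[=\-\\])([a-zA-Z]{3}\d{3,4})(?=&)")

url = ("randomwebsite.com/3982356923abcd1ab?random_code="
       "12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc")

match = PATTERN.search(url)
code = match.group(1) if match else None

# Because of (?=&), a code at the very end of the string is not matched.
at_end = PATTERN.search("site.com/?random_code=A-xyz123")
print(code, at_end)
```

If the code can also appear as the last thing in the URL, the lookahead would need to allow the end of the string as well, for example (?=&|$).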

Related

MySQL select statement remove whitespace in WHERE clause

I'm trying to match phone numbers based on the last 6 digits. The problem is the numbers in the database are in various formats, some have whitespace within the number.
SELECT * FROM users WHERE trim(phone) LIKE '%123456'
Trim only removes the leading and trailing whitespace and doesn't find entries where clients have entered their numbers with whitespace between the numbers:
123 456, 12 34 56, etc.
So how to remove the whitespace within the search? Having the result without whitespace is not enough. Updating the database with replace is not an option either.
Use REPLACE() to replace all occurrences of ' ' with '' within a string.
From the MySQL documentation:
REPLACE(str,from_str,to_str)
Returns the string str with all occurrences of the string from_str replaced by the string to_str. REPLACE() performs a case-sensitive match when searching for from_str.
SELECT * FROM users WHERE Replace(coalesce(Phone,''), ' ','') LIKE '%123456'
If you can't do it in the WHERE clause (which seems odd to me), just nest it in a subquery.
Select sub.*
from (Select u.*, Replace(coalesce(Phone,''), ' ','') as ph
from users u) sub
where sub.ph LIKE '%123456'
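The query can be tried without a MySQL server using Python's built-in sqlite3 module. This is only a sketch: it assumes SQLite's REPLACE and COALESCE behave like MySQL's for this simple case, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT)")
conn.executemany(
    "INSERT INTO users (id, phone) VALUES (?, ?)",
    [
        (1, "555 123 456"),   # spaces between groups
        (2, "55 12 34 56"),   # spaces between pairs
        (3, "555123456"),     # no spaces
        (4, "555 654 321"),   # different digits
        (5, None),            # missing phone number
    ],
)

# Strip the spaces inside the WHERE clause, then compare the last 6 digits.
rows = conn.execute(
    "SELECT id FROM users "
    "WHERE REPLACE(COALESCE(phone, ''), ' ', '') LIKE '%123456' "
    "ORDER BY id"
).fetchall()
matching_ids = [row[0] for row in rows]
print(matching_ids)
```

Note that wrapping the column in a function usually prevents an ordinary index on phone from being used, so on a large table this kind of filter scans every row.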

extract a pattern number from url mysql or python

I have a bunch of URLs, each containing either
hotel + a 4-digit number: hotel1234
or slash + 4 digits + .html: /1234.html
Is there a regex to extract the 4-digit number (like 1234), using either Python or MySQL?
I'm thinking of something like 'hotel'[0-9][0-9][0-9][0-9].
Thanks!
You can try the REGEXP
SELECT * FROM Table WHERE ColumnName REGEXP '^[0-9]{4}$'
or
SELECT * FROM Table WHERE ColumnName REGEXP '^[[:digit:]]{4}$';
The following Stack Overflow question might also be useful: how to extract a substring from inside a string in Python?
Unfortunately, MySQL REGEXP only reports whether the pattern matches; it does not return the matched text. I have found substring_index useful if you know the text surrounding the target...
select case when ColumnName like 'hotel____' then substring_index(ColumnName,'hotel',-1)
when ColumnName like '/____.html' then substring_index(substring_index(ColumnName,'/',-1),'.html',1)
else ColumnName
end digit_extraction
from TableName
where ...;
The case statement above isn't necessary because of the way substring_index works (by returning the entire string if the search string isn't found).
select substring_index(substring_index(substring_index(ColumnName,'hotel',-1),'/',-1),'.html',1)
from TableName
where ...;
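Since the question also allows Python, here is a minimal sketch of the extraction with the re module. The sample URLs and the names FOUR_DIGITS and extract_number are hypothetical.

```python
import re

# Either "hotel" followed by 4 digits, or "/" + 4 digits + ".html".
# Each alternative captures the digits in its own group.
FOUR_DIGITS = re.compile(r"hotel(\d{4})|/(\d{4})\.html")

def extract_number(url):
    """Return the 4-digit number as a string, or None if neither form occurs."""
    match = FOUR_DIGITS.search(url)
    if match is None:
        return None
    # Exactly one of the two groups is set for any match.
    return match.group(1) or match.group(2)

print(extract_number("example.com/city/hotel1234"))
print(extract_number("example.com/rooms/5678.html"))
print(extract_number("example.com/about"))
```

Unlike the nested substring_index call, this returns None for strings that contain neither form instead of returning the input unchanged.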

MySQL REGEXP - Select certain pattern of numbers and characters

Anyone have a clue how I could go about trying to select a certain pattern of numbers with a 1 at the end?
Ex.
SELECT pattern FROM table WHERE pattern REGEXP '1_2+2_2+3_2+4_2&2016-06-09&1';
or
SELECT pattern FROM table WHERE pattern REGEXP '2_1&2016-06-09&1';
using the same number-underscore-number, ampersand, date, ampersand, number; just as long as that number 1 is at the end?
EDIT:
Actually, let me phrase it better. How do I use REGEXP to select an ampersand and the number 1 at the end of a string?
You don't need regex. Just use LIKE:
LIKE '%&1'
The % makes it not be anchored to the start of the string. LIKE is not regex, but closer to a glob syntax. It may be faster than regex, too.
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
SELECT column_name
FROM table
WHERE column_name LIKE '%&1';
Note:
You can also use the LIKE operator to match from the start of the string, not only from the end.
Here is an example:
SELECT column_name
FROM table
WHERE column_name LIKE '&1%';
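Both LIKE patterns can be checked with Python's built-in sqlite3 module. This sketch assumes SQLite treats the % wildcard the same way MySQL does here; the table name patterns and the third and fourth rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patterns (pattern TEXT)")
conn.executemany(
    "INSERT INTO patterns (pattern) VALUES (?)",
    [
        ("1_2+2_2+3_2+4_2&2016-06-09&1",),
        ("2_1&2016-06-09&1",),
        ("2_1&2016-06-09&0",),
        ("&1_2&2016-06-09&2",),
    ],
)

def select_like(like_pattern):
    """Return all stored values matching the given LIKE pattern, sorted."""
    cursor = conn.execute(
        "SELECT pattern FROM patterns WHERE pattern LIKE ? ORDER BY pattern",
        (like_pattern,),
    )
    return [row[0] for row in cursor]

ends_with = select_like("%&1")    # "&1" at the end of the string
starts_with = select_like("&1%")  # "&1" at the start of the string
print(ends_with)
print(starts_with)
```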

Show/convert only alphanumeric data in sql query [duplicate]

I'm trying to select all rows that contain only alphanumeric characters in MySQL using:
SELECT * FROM table WHERE column REGEXP '[A-Za-z0-9]';
However, it's returning all rows, regardless of the fact that they contain non-alphanumeric characters.
Try this code:
SELECT * FROM table WHERE column REGEXP '^[A-Za-z0-9]+$'
This makes sure that all characters match.
Your statement matches any string that contains a letter or digit anywhere, even if it contains other non-alphanumeric characters. Try this:
SELECT * FROM table WHERE column REGEXP '^[A-Za-z0-9]+$';
^ and $ require the entire string to match rather than just any portion of it, and + looks for 1 or more alphanumeric characters.
You could also use a named character class if you prefer:
SELECT * FROM table WHERE column REGEXP '^[[:alnum:]]+$';
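The difference between the unanchored and the anchored pattern is easy to see in a short Python sketch. Python's re.search behaves like REGEXP in that it looks for a match anywhere in the string; the sample values are made up.

```python
import re

values = ["abc123", "abc 123", "abc!", ""]

# Unanchored: true as soon as one alphanumeric character occurs anywhere.
unanchored = [v for v in values if re.search(r"[A-Za-z0-9]", v)]

# Anchored: every character from start to end must be alphanumeric.
anchored = [v for v in values if re.search(r"^[A-Za-z0-9]+$", v)]

print(unanchored)
print(anchored)
```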
Try this:
REGEXP '^[a-z0-9]+$'
This works because REGEXP is not case sensitive, except on binary strings.
There is also this:
select m from table where not regexp_like(m, '^[0-9]\d+$')
which selects the rows that contain characters, from the column you want (m in this example, but you can change it).
Most of the combinations don't work properly on Oracle platforms, but this one does. Sharing for future reference.
Try this:
select count(*) from table where cast(col as double) is null;
Change REGEXP to LIKE:
SELECT * FROM table_name WHERE column_name like '%[^a-zA-Z0-9]%'
This one works fine.

Mysql SELECT all rows where char exists in value but not the last one

I need a SELECT query in MySQL that retrieves all rows in a table whose field value contains the "?" character, with one condition: it is not the last character.
Example:
ID Field
1 123??see
2 12?
3 45??78??
The returned rows would then be IDs 1 and 3, which match the given condition.
The only statement I have is:
SELECT *
FROM table
WHERE Field LIKE '%?%'
But this query does not solve my problem.
The LIKE expressions also support a wildcard "_" which matches exactly one character.
So you can write an expression like the example below, and know that your "?" will not be the last character in the string. There must be at least one more character.
WHERE intrebare LIKE '%?_%'
Re the comment from @JohnRuddell:
Yes, that's true: this will match the string "??" because a "?" exists in a position that is not the last character.
It depends on whether the OP means for that to be a match or not. The OP says the string "45??78??" is a match, but it's not clear whether they would intend "4578??" to be a match.
An alternative is to use a regular expression, but this is a little more tricky because you have to escape a literal "?", so it won't be interpreted as a regexp metacharacter. Then also escape the escape character.
WHERE intrebare REGEXP '\\?[^?]'
You can just add an additional condition requiring that the last character is not a ?:
SELECT *
FROM intrebari
WHERE intrebare LIKE '%?%' AND intrebare NOT LIKE '%?'
You could also do it like this:
SELECT *
FROM intrebari
WHERE intrebare LIKE '%?%' AND RIGHT(intrebare,1) <> '?'
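The approaches above do not return the same rows for the sample data in the question, which a short Python sketch with the built-in sqlite3 module makes visible. It assumes SQLite's LIKE treats % and _ the same way MySQL does; the helper name ids is invented, and the regular expression is checked with Python's re rather than in SQL.

```python
import re
import sqlite3

rows = [(1, "123??see"), (2, "12?"), (3, "45??78??")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intrebari (id INTEGER PRIMARY KEY, intrebare TEXT)")
conn.executemany("INSERT INTO intrebari VALUES (?, ?)", rows)

def ids(where_clause, params):
    """Return the ids of the rows selected by the given WHERE clause."""
    sql = "SELECT id FROM intrebari WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

# A "?" followed by at least one more character.
underscore_ids = ids("intrebare LIKE ?", ("%?_%",))

# Contains "?" but does not end with "?".
not_last_ids = ids("intrebare LIKE ? AND intrebare NOT LIKE ?", ("%?%", "%?"))

# A "?" followed by a character that is not "?".
regex_ids = [row_id for row_id, text in rows if re.search(r"\?[^?]", text)]

print(underscore_ids, not_last_ids, regex_ids)
```

The NOT LIKE '%?' variant drops ID 3 because "45??78??" ends with a "?", even though the question lists that row as an expected result. The RIGHT(intrebare, 1) <> '?' variant expresses the same condition, so it should behave the same way.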