extract a pattern number from url mysql or python - mysql

I have a bunch url that has a string either has
hotel+4 digit number: hotel1234
or slash+4digit.html: /1234.html
Is there a regex to extract 4 digit number like 1234 either use python or mysql?
I'm thinking 'hotel'[0-9][0-9][0-9][0-9],sth like this
Thanks!

You can try the REGEXP
SELECT * FROM Table WHERE ColumnName REGEXP '^[0-9]{4}$'
or
SELECT * FROM Table WHERE ColumnName REGEXP '^[[:digit:]]{4}$';

The following is a stackoverflow.com link that might be useful showing
how to extract a substring from inside a string in Python?
Unfortunately, MySQL regexp simply returns true if the string exists. I have found substring_index useful if you know the text surrounding the target...
select case when ColumnName like 'hotel____' then substring_index(ColumnName,'hotel',-1)
when ColumnName like '/____.html' then substring_index(substring_index(ColumnName,'/',-1),'.html',1)
else ColumnName
end digit_extraction
from TableName
where ...;
The case statement above isn't necessary because of the way substring_index works (by returning the entire string if the search string isn't found).
select substring_index(substring_index(substring_index(ColumnName,'hotel',-1),'/',-1),'.html',1)
from TableName
where ...;

Related

How to find variable pattern in MySql with Regex?

I am trying to pull a product code from a long set of string formatted like a URL address. The pattern is always 3 letters followed by 3 or 4 numbers (ex. ???### or ???####). I have tried using REGEXP and LIKE syntax, but my results are off for both/I am not sure which operators to use.
The first select statement is close to trimming the URL to show just the code, but oftentimes will show a random string of numbers it may find in the URL string.
The second select statement is more rudimentary, but I am unsure which operators to use.
Which would be the quickest solution?
SELECT columnName, SUBSTR(columnName, LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", columnName), LENGTH(columnName) - LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", REVERSE(columnName))) AS extractedData FROM tableName
SELECT columnName FROM tableName WHERE columnName LIKE '%___###%' OR columnName LIKE '%___####%'
-- Will take a substring of this result as well
Example Data:
randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc
In this case, the desired string is "xyz123" and the location of said pattern is variable based on each entry.
EDIT
SELECT column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), SUBSTR(column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), LENGTH(column) - LOCATE(column REGEXP "^.*[a-zA-Z]{3}[0-9]{3,4}", REVERSE(column))) AS extractData From mainTable
This expression is still not grabbing the right data, but I feel like it may get me closer.
I suggest using
REGEXP_SUBSTR(column, '(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])')
Details:
(?<=[&?]random_code=[^&#]{0,256}-) - immediately on the left, there must be & or &, random_code=, and then zero to 256 chars other than & and # followed with a - char
[a-zA-Z]{3} - three ASCII letters
[0-9]{3,4} - three to four ASCII digits
(?![^&#]) - that are followed either with &, # or end of string.
See the online demo:
WITH cte AS ( SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc' val
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz4567&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz89&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz00000&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-aaaaa11111&hello_world=us&etc_etc')
SELECT REGEXP_SUBSTR(val,'(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])') output
FROM cte
Output:
I'd make use of capture groups:
(?<=[=\-\\])([a-zA-Z]{3}[\d]{3,4})(?=[&])
I assume with [^=\-] you wanted to capture string with "-","\" or "=" in front but not include those chars in the result. To do that use "positive lookbehind" (?<=.
I also added a lookahead (?= for "&".
If you'd like to fidget more with regex I recommend RegExr

find string in first 10 characters

I have a table with fields containing html code. I would like to select only those rows where htmlfield contain string "<p>[[{"fid":" but only in first 10 characters of the html field (this field contains more of such strings and I want to find only fields that contain the string in the beginning).
Is it possible to do such select?
You can use a SUBSTRING() to grab the fields with that string in them.
SELECT field1
FROM TableName
WHERE SUBSTRING(field1, 1, 12) = '<p>[[{"fid":'
Example
You can also try using the LIKE function. Where you can use the % wildcard at the end of your string to get fields that start with that string.
SELECT field1
FROM TableName
WHERE field1 LIKE '<p>[[{"fid":%'
Example
Do it like this:
select SUBSTRING(field_name,1, 10) from table_name;
It will display 10 characters only for that specific field.
Hope it helps

MySQL REGEXP - Select certain pattern of numbers and characters

Anyone have a clue how I could go about trying to select a certain pattern of numbers with a 1 at the end?
Ex.
SELECT pattern FROM table WHERE pattern REGEXP '1_2+2_2+3_2+4_2&2016-06-09&1';
or
SELECT pattern FROM table WHERE pattern REGEXP '2_1&2016-06-09&1';
using the same number-underscore-number, ampersand, date, ampersand, number; just as long as that number 1 is at the end?
EDIT:
Actually, let me phrase it better. How do I use REGEXP to select an ampersand and the number 1 at the end of a string?
You don't need regex. Just use LIKE:
LIKE '%&1'
The % makes it not be anchored to the start of the string. LIKE is not regex, but closer to a glob syntax. It may be faster than regex, too.
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
SELECT column_name
FROM table
WHERE column_name LIKE '%&1';
Note:
You can also use LIKE operator for searching from start not only from end.
Here is an Example.
SELECT column_name
FROM table
WHERE column_name LIKE '&1%';

Show/convert only alphanumeric data in sql query [duplicate]

I'm trying to select all rows that contain only alphanumeric characters in MySQL using:
SELECT * FROM table WHERE column REGEXP '[A-Za-z0-9]';
However, it's returning all rows, regardless of the fact that they contain non-alphanumeric characters.
Try this code:
SELECT * FROM table WHERE column REGEXP '^[A-Za-z0-9]+$'
This makes sure that all characters match.
Your statement matches any string that contains a letter or digit anywhere, even if it contains other non-alphanumeric characters. Try this:
SELECT * FROM table WHERE column REGEXP '^[A-Za-z0-9]+$';
^ and $ require the entire string to match rather than just any portion of it, and + looks for 1 or more alphanumberic characters.
You could also use a named character class if you prefer:
SELECT * FROM table WHERE column REGEXP '^[[:alnum:]]+$';
Try this:
REGEXP '^[a-z0-9]+$'
As regexp is not case sensitive except for binary fields.
There is also this:
select m from table where not regexp_like(m, '^[0-9]\d+$')
which selects the rows that contains characters from the column you want (which is m in the example but you can change).
Most of the combinations don't work properly in Oracle platforms but this does. Sharing for future reference.
Try this
select count(*) from table where cast(col as double) is null;
Change the REGEXP to Like
SELECT * FROM table_name WHERE column_name like '%[^a-zA-Z0-9]%'
this one works fine

Mysql like to match pattern at end of string

I have the following strings in the following pattern in a table in my db:
this_is_my_string_tester1
this_is_my_string_mystring2
this_is_my_string_greatstring
I am trying to match all strings that start with a specific pattern split by underscores i.e.this_is_my_string_ and then a wildcard final section
Unfortunately there is an added complication where some strings like the following:
this_is_my_string_tester1_yet_more_text
this_is_my_string_mystring2_more_text
this_is_my_string_greatstring_more
Therefore taking the following as examples:
this_is_my_string_tester1
this_is_my_string_mystring2
this_is_my_string_greatstring
this_is_my_string_tester1_yet_more_text
this_is_my_string_mystring2_more_text
this_is_my_string_greatstring_more
I am trying to have returned:
this_is_my_string_tester1
this_is_my_string_mystring2
this_is_my_string_greatstring
I have no idea how to do this with a like statement. Is this possible if so how?
EDIT
There is one final complication:
this_is_my_string
needs to be supplied as a list i.e in
(this_is_my_string, this_is_my_amazing_string, this_is_another_amazing_string)
SELECT * FROM atable WHERE afield REGEXP 'this_is_my_string_[a-z]+'
It might be faster if you have an index on afield and do
SELECT * FROM atable WHERE afield REGEXP 'this_is_my_string_[a-z]+'
AND afield LIKE 'this_is_my_string_%'
After edit of question:
Either
SELECT * FROM atable
WHERE afield REGEXP '(this_is_my_string|this_is_my_amazing_string)_[a-z]+'
or maybe you want something like having a table with the prefixes:
SELECT *
FROM atable AS t,
prefixes AS p
WHERE afield REGEXP CONCAT(p.prefix, '_[a-z]+')
As by the reference documentation this should not be possible, as a pattern (string literal) is required. Give it a try nevertheless.
There the answer of #KayNelson with LIKE (?) and INSTR might do instead of REGEXP.
try this
SELECT * FROM db.table WHERE strings LIKE 'this_is_my_string_%' AND instr(Replace(strings,"this_is_my_string_",""),"_") = 0;
It checks if more _ occurs after replacing the standard "this_is_my_string_"