MySQL locate or remove string suffix from a known set - mysql

I want to write a MySQL stored procedure which will split a FQDN into host/authority/tld parts.
Let's say I have a list of known TLDs, and for the sake of illustration let's say it's the set
com
co.uk
uk
Let's test it against these strings:
input | output
----------------|-------
alpha.co.uk | alpha
mail.beta.uk | mail.beta
The output is the shortest substring of the input, starting from the beginning, such that CONCAT(output,'.',tld)=input for some tld which is a member of the given set.
Note that we need the shortest substring as the output, otherwise the output would be alpha.co in the first case, which is wrong.
I know how to write a MySQL function which tells me whether a given string is the suffix of another string, but here there are many possible such strings and any will do (provided no longer string is also a suffix of the input).
I know I could write a regex along the lines of co\.uk|uk|com, but MySQL's REGEXP operator does not return the position of the match, just whether it matches or not.
Yes, I really do want a solution in SQL for this, not in the application language.
What's the best way to locate or remove the longest possible suffix, given a set of valid suffixes?

Here's one way to do that. Every candidate output is a prefix of the same input, and a string sorts before any longer string that begins with it, so MIN() will yield the shortest of all the matches:
create table tld (tld varchar(100));
create table input (input varchar(100));
insert into tld values ('com'),('co.uk'),('uk');
insert into input values ('alpha.co.uk'),('mail.beta.com');
select
input.input as input,
min(substring(input.input, 1, length(input.input) - length(tld.tld) - 1)) as output
from input inner join tld
on input.input like concat('%.', tld.tld) group by input.input;
OR, if you only have a single value for input, then:
set @input = 'alpha.co.uk';
select min(substring(@input, 1, length(@input) - length(tld.tld) - 1)) as output
from tld
where @input like concat('%.', tld.tld);
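Since the question asks for a stored routine, the single-value query above could be wrapped roughly as follows. This is only a sketch: the function name strip_tld and its parameter fqdn are made up for illustration, it assumes the tld table created above, and the DELIMITER lines assume the mysql command-line client or MySQL Workbench.

```sql
DELIMITER //

-- Hypothetical helper: returns the part of fqdn before the longest known TLD.
CREATE FUNCTION strip_tld(fqdn VARCHAR(100))
RETURNS VARCHAR(100)
READS SQL DATA
BEGIN
  DECLARE result VARCHAR(100);

  -- Each matching TLD yields one candidate prefix; MIN() keeps the shortest,
  -- which corresponds to the longest matching suffix.
  SELECT MIN(SUBSTRING(fqdn, 1, CHAR_LENGTH(fqdn) - CHAR_LENGTH(tld.tld) - 1))
    INTO result
    FROM tld
   WHERE fqdn LIKE CONCAT('%.', tld.tld);

  RETURN result;  -- NULL when no known TLD matches
END //

DELIMITER ;

SELECT strip_tld('alpha.co.uk');   -- alpha
SELECT strip_tld('mail.beta.uk');  -- mail.beta
```

One caveat with this sketch: LIKE treats _ and % in the stored TLD values as wildcards, so it relies on the tld table containing neither.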

Related

SQL get row by int | string

I'm a little bit stuck with my SQL query.
I've got a table with rows that can be identified by id or hash string...
id | short      | title
---|------------|------
1  | asdadasdsd | foo
2  | 1qweqweqwe | bar
3  | yxcyxcyxcy | baz
So SQL is quite easy...
SELECT * FROM table WHERE id=<identifier> OR hash=<identifier>
What I found is that when my identifier is a hash that begins with a number that also exists in the id column, MySQL returns the "wrong" row.
For example, when my identifier is "1qweqweqwe", the result is row 1.
I think the reason is that it converts my hash string into an integer. Is there a way to disable this behaviour?
Or is the only way to regenerate all hashes in a new format without numbers in them?
Thank you for any clarification :)
Petr
No, you do not have to regenerate the hashes. If both the id and hash match and you prefer to pull the row based on the hash, then you could put the hash first in the match condition. Basically it goes with the first match condition that is found to be true.
Also, I suppose you are already adding quotes around the hash string in the query. If not, please do, so that it is treated as a string.
SELECT * FROM table WHERE hash='<identifier>' OR id=<identifier>
You seem to be passing the identifier in as a string -- because it is. But then you are comparing to a number (the id) and the string parameter is converted to a number. MySQL does so by converting the leading digits, if any.
I don't like the logic of passing in a string for an identifier, so I would really suggest that you fix the calling logic and call either:
WHERE id = <int identifier>
or:
WHERE hash = <string identifier>
But if you want to keep your current version, you can convert to a string:
WHERE CAST(id AS CHAR) = <identifier> OR hash = <identifier>
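The conversion described above can be seen directly with literal values, no table needed. A minimal sketch, assuming default MySQL comparison rules; the sample strings are the hash values from the question:

```sql
-- Number context: the string is converted using its leading digits.
SELECT '1qweqweqwe' = 1;                -- 1 (true), with a truncation warning
SELECT 'yxcyxcyxcy' = 0;                -- 1 (true): no leading digits converts to 0

-- String context: both sides are compared as strings, no numeric conversion.
SELECT CAST(1 AS CHAR) = '1qweqweqwe';  -- 0 (false)
```

Running SHOW WARNINGS after the first statement shows the truncation warning that MySQL raises for the lossy conversion.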

MySQL REPLACE string with regex

I have a table with about 50,000 records. One of the fields is an "imploded" field consisting of a variable number of parameters, from 1 to 800. I need to replace all parameters with 0.
Example:
1 parameter 3.45 should become 0.00
2 parameters 2.27^11.03 should become 0.00^0.00
3 parameters 809.11^0.12^3334.25 should become 0.00^0.00^0.00
and so on.
Really I need to replace anything between ^ with 0.00 ( for 1 parameter it should be just 0.00 without ^).
Or I need to somehow count the number of ^, generate a string like 0.00^0.00^0.00 ... and replace it. The only tool available is MySQL Workbench.
I would appreciate any help.
There is no regex replace capability built in to MySQL before version 8.0 (which added REGEXP_REPLACE()).
You can, however, accomplish your purpose by doing what you suggested -- counting the number of ^ and crafting a string of replacement values, with this:
TRIM(TRAILING '^' FROM REPEAT('0.00^',(LENGTH(column) - LENGTH(REPLACE(column,'^','')) + 1)));
From inside to outside, we calculate the number of values by counting the number of delimiters, and adding 1 to that count. We count the delimiters by comparing the length of the original string, against the length of the same string with the delimiters stripped out using REPLACE(...,'^','') to replace every ^ with nothing.
The REPEAT() function builds a string by repeating a string expression n number of times.
This results in a spurious ^ at the end of the string, which we remove easily enough with TRIM(TRAILING '^' FROM ...).
Run SELECT t1.*, ... the expression above ... FROM table_name t1 against your table to verify the results of this logic (replacing column with the actual name of the column). Once you are confident in the logic, you can UPDATE table SET column = ... to modify the values.
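To see the expression at work before touching real data, it can be run against the three sample values from the question. This sketch uses an inline derived table; the alias samples and the column name v are placeholders, not names from the asker's schema.

```sql
SELECT v AS original,
       TRIM(TRAILING '^' FROM
            REPEAT('0.00^',
                   LENGTH(v) - LENGTH(REPLACE(v, '^', '')) + 1)) AS zeroed
FROM (SELECT '3.45' AS v
      UNION ALL SELECT '2.27^11.03'
      UNION ALL SELECT '809.11^0.12^3334.25') AS samples;
-- 3.45                 ->  0.00
-- 2.27^11.03           ->  0.00^0.00
-- 809.11^0.12^3334.25  ->  0.00^0.00^0.00
```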
Note, of course, that this is indicative of a problematic database design. Each column should contain a single atomic value, not a "list" of values, as this question seems to suggest.

I want to extract the parameters of a url in mysql

I have a column in my database that holds the parameter string of a URL. I want an SQL query that puts those parameters into separate columns. For example:
I have a column named parameters with, for example, this value: pOrgNum=j11000&pLanguage=nl&source=homepage
Now I want three columns, pOrgnum | pLanguage | source, holding the values of my parameters.
The problem is that I don't know the order of the parameters or their lengths, so I can't use, for example, substring(parameters,9,6) to extract the parameter pOrgnum. Can someone help me, please?
There's a MySQL UDF that you can use to do exactly this, which also handles decoding the params and handles most character encodings, etc.
https://github.com/StirlingMarketingGroup/mysql-get-url-param
Examples
select`get_url_param`('https://www.youtube.com/watch?v=KDszSrddGBc','v');
-- "KDszSrddGBc"
select`get_url_param`('watch?v=KDszSrddGBc','v');
-- "KDszSrddGBc"
select`get_url_param`('watch?v=KDszSrddGBc','x');
-- null
select`get_url_param`('https://www.google.com/search?q=cgo+uint32+to+pointer&rlz=1C1CHBF_enUS767US767&oq=cgo+uint32+to+pointer&aqs=chrome..69i57.12106j0j7&sourceid=chrome&ie=UTF-8','q');
-- "cgo uint32 to pointer"
select`get_url_param`('/search?q=Na%C3%AFvet%C3%A9&oq=Na%C3%AFvet%C3%A9','q');
-- "Naïveté"
Disclaimer, I am the author.
I achieved this by taking the right of the string after the search parameter, then the left of the resulting string before the first &.
This handles
if the parameter was the last in the url (so no "&" follows it)
if the parameter does not exist (returns blank)
varying lengths of the search string (provided you replace "utm_medium" everywhere)
This finds the value of "utm_medium" in a field named url:
IF(locate("utm_medium", url)=0, '', LEFT(RIGHT(url,length(url)-locate("utm_medium",url)-length("utm_medium")),IF(locate("&",RIGHT(url,length(url)-locate("utm_medium",url)-length("utm_medium")))=0,length(RIGHT(url,length(url)-locate("utm_medium",url)-length("utm_medium")+1)),locate("&",RIGHT(url,length(url)-locate("utm_medium",url)-length("utm_medium"))))-1)) utm_medium
To use, find and replace url with your field name, and utm_medium with your url parameter.
May be inefficient, but gets the job done, and couldn't find an easy answer elsewhere
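The same "right of the parameter name, then left of the next &" idea can also be written with nested SUBSTRING_INDEX() calls, which is a different technique from the LOCATE/LEFT/RIGHT expression above. A sketch using the sample value from the question; the user variable @params merely stands in for the real column:

```sql
SET @params = 'pOrgNum=j11000&pLanguage=nl&source=homepage';

-- Prefixing '&' lets every parameter, including the first, be found as
-- '&name='. The inner call keeps what follows the name; the outer call
-- cuts that at the next '&'. A missing parameter yields an empty string.
SELECT
  SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT('&', @params), '&pOrgNum=',   -1), '&', 1) AS pOrgNum,
  SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT('&', @params), '&pLanguage=', -1), '&', 1) AS pLanguage,
  SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT('&', @params), '&source=',    -1), '&', 1) AS source;
-- j11000 | nl | homepage
```

Searching for '&name=' rather than the bare name avoids matching a longer parameter whose name merely ends with the one being looked up. Note that SUBSTRING_INDEX() matches the delimiter case-sensitively, and that values are returned as stored, without URL decoding.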
This code works in MySQL:
SELECT substring_index(URL_FIELD,'\',-1) FROM DemoTable;

Using REGEX to alter field data in a mysql query

I have two databases, both containing phone numbers. I need to find all instances of duplicate phone numbers, but the formats of database 1 vary wildly from the format of database 2.
I'd like to strip out all non-digit characters and just compare the two 10-digit strings to determine if it's a duplicate, something like:
SELECT b.phone as barPhone, sp.phone as SPPhone FROM bars b JOIN single_platform_bars sp ON sp.phone.REGEX = b.phone.REGEX
Is such a thing even possible in a mysql query? If so, how do I go about accomplishing this?
EDIT: Looks like it is, in fact, a thing you can do! Hooray! The following query returned exactly what I needed:
SELECT b.phone, b.id, sp.phone, sp.id
FROM bars b JOIN single_platform_bars sp ON REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(b.phone,' ',''),'-',''),'(',''),')',''),'.','') = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(sp.phone,' ',''),'-',''),'(',''),')',''),'.','')
MySQL versions before 8.0 don't support returning the "match" of a regular expression; the REGEXP operator returns 1 or 0, depending on whether an expression matched a regular expression test or not. (MySQL 8.0 added REGEXP_SUBSTR() and REGEXP_REPLACE().)
You can use the REPLACE function to replace a specific character, and you can nest those calls. But it would be unwieldy for all "non-digit" characters. For example, to remove spaces, dashes, and open and close parens:
REPLACE(REPLACE(REPLACE(REPLACE(sp.phone,' ',''),'-',''),'(',''),')','')
One approach is to create user defined function to return just the digits from a string. But if you don't want to create a user defined function...
This can be done in native MySQL. This approach is a bit unwieldy, but it is workable for strings of "reasonable" length.
SELECT CONCAT(IF(SUBSTR(sp.phone,1,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,1,1),'')
,IF(SUBSTR(sp.phone,2,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,2,1),'')
,IF(SUBSTR(sp.phone,3,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,3,1),'')
,IF(SUBSTR(sp.phone,4,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,4,1),'')
,IF(SUBSTR(sp.phone,5,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,5,1),'')
) AS phone_digits
FROM single_platform_bars sp
To unpack that a bit... we extract a single character from the first position in the string, check if it's a digit, if it is a digit, we return the character, otherwise we return an empty string. We repeat this for the second, third, etc. characters in the string. We concatenate all of the returned characters and empty strings back into a single string.
Obviously, the expression above is checking only the first five characters of the string, you would need to extend this, basically adding a line for each position you want to check...
And unwieldy expressions like this can be included in a predicate (in a WHERE clause). (I've just shown it in the SELECT list for convenience.)
MySQL doesn't support such string operations natively. You will either need to use a UDF like this, or else create a stored function that iterates over a string parameter concatenating to its return value every digit that it encounters.
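A stored function of the kind just described might look like the sketch below. The name digits_only, the VARCHAR(255) limit, and the sample phone number are assumptions made for illustration; the DELIMITER lines assume the mysql command-line client or MySQL Workbench.

```sql
DELIMITER //

-- Hypothetical helper: returns only the digit characters of s, in order.
CREATE FUNCTION digits_only(s VARCHAR(255))
RETURNS VARCHAR(255)
DETERMINISTIC
NO SQL
BEGIN
  DECLARE result VARCHAR(255) DEFAULT '';
  DECLARE i INT DEFAULT 1;
  DECLARE c VARCHAR(1);

  -- Walk the string one character at a time, keeping digits only.
  WHILE i <= CHAR_LENGTH(s) DO
    SET c = SUBSTRING(s, i, 1);
    IF c REGEXP '^[0-9]$' THEN
      SET result = CONCAT(result, c);
    END IF;
    SET i = i + 1;
  END WHILE;

  RETURN result;
END //

DELIMITER ;

SELECT digits_only('(555) 123-4567');  -- 5551234567

-- Applied to the tables from the question:
SELECT b.phone, b.id, sp.phone, sp.id
FROM bars b
JOIN single_platform_bars sp
  ON digits_only(b.phone) = digits_only(sp.phone);
```

Because the join condition calls a function on both columns, MySQL cannot use an index on phone for it, so on large tables it may be worth storing the normalized digits in a separate column.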

MySQL query - select postcode matches

I need to make a selection based on the first 2 characters of a field, so for example
SELECT * from table WHERE postcode LIKE 'rh%'
But this would select any record that contains those 2 characters at any point in the "postcode" field, right? I need a query that matches on just the first 2 characters. Any pointers?
Thanks
Your query is correct. It searches for postcodes starting with "rh".
In contrast, if you wanted to search for postcodes containing the string "rh" anywhere in the field, you would write:
SELECT * from table WHERE postcode LIKE '%rh%'
Edit:
To answer your comment, you can use either or both % and _ for relatively simple searches. As you have noticed already, % matches any number of characters whereas _ matches a single character.
So, in order to match postcodes starting with "RHx " (where x is any character) your query would be:
SELECT * from table WHERE postcode LIKE 'RH_ %'
(mind the space after _). For more complex search patterns, you need to read about regular expressions.
Further reading:
http://dev.mysql.com/doc/refman/5.1/en/pattern-matching.html
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
LIKE '%rh%' will return all rows with 'rh' anywhere
LIKE 'rh%' will return all rows with 'rh' at the beginning
LIKE '%rh' will return all rows with 'rh' at the end.
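These pattern forms can be checked against literal values without a table. A small sketch, assuming the default case-insensitive collation (so 'rh' matches 'RH'); the postcodes are made-up sample values:

```sql
SELECT 'RH10 1AB' LIKE 'rh%'   AS starts_with,       -- 1
       'CRH1 1AB' LIKE 'rh%'   AS not_at_start,      -- 0
       'CRH1 1AB' LIKE '%rh%'  AS anywhere,          -- 1
       'RH1 2AB'  LIKE 'RH_ %' AS one_char_space,    -- 1
       'RH10 1AB' LIKE 'RH_ %' AS two_chars_space;   -- 0
```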
If you only want to extract the first two characters (e.g. 'rh'), use MySQL's SUBSTR() function:
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_substr
Dave, your way seems correct to me (and works on my test data). Using a leading % as well will match anywhere in the string which obviously isn't desirable when dealing with postcodes.