How to extract numeric values from text data in MySQL query - mysql

I have a table containing addresses. I would like to perform a query to select rows where the numeric values match.
address1        postcode
13 Some Road    SW1 1AA
House 5         G3 7L
e.g
select * from addresses where numeric(address1)=13 and numeric(postcode)=11
^^ That would match the first row
select * from addresses where numeric(address1)=5 and numeric(postcode)=37
^^ That would match the second row
Is this possible?

Yes, this is possible. You could write a function that uses a regular expression to replace every non-numeric character in the field with the empty string, so that the function returns only the digits, and then filter on the result of that function.
You might be interested in this stackoverflow question and this blog post.
See also mysql-udf-regexp.
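On MySQL 8.0 or later, the built-in REGEXP_REPLACE function can do this stripping without a user-defined function. A minimal sketch, assuming the addresses table and the two sample rows from the question:

```sql
-- Strip every non-digit character, then compare what is left.
-- REGEXP_REPLACE returns a string, so the comparison values are quoted.
SELECT *
FROM addresses
WHERE REGEXP_REPLACE(address1, '[^0-9]', '') = '13'   -- '13 Some Road' -> '13'
  AND REGEXP_REPLACE(postcode, '[^0-9]', '') = '11';  -- 'SW1 1AA'      -> '11'
```

Because the function is applied to every row, an index on these columns is unlikely to help, so this may be slow on a large table.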

select * from addresses where address1 REGEXP '(13)' and postcode REGEXP '(11)';

Related

How to select data where the last three characters are numbers?

I have a table with a column having values like:
AB123
AB209
ABQ52
AB18C
I would like to extract rows whose last three characters are numbers. How can I do this?
The original table is more complicated, and I tried a WHERE clause with the pattern "AB___", which returned all of the rows above.
You can use a combination of SUBSTRING and REGEXP like this:
SELECT yourcolumn FROM yourtable WHERE SUBSTRING(yourcolumn, -3) REGEXP '^[0-9]+$';
The SUBSTRING call extracts the last 3 characters of the column's value, and the REGEXP condition checks that this substring consists only of digits.
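The same check can probably be written with the regular expression alone, by anchoring three digits to the end of the value. A sketch using the same placeholder names (yourcolumn, yourtable):

```sql
-- Matches values whose last three characters are all digits.
-- With the sample data, AB123 and AB209 qualify; ABQ52 and AB18C do not.
SELECT yourcolumn
FROM yourtable
WHERE yourcolumn REGEXP '[0-9]{3}$';
```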

Get the list of addresses that don't have a UK postcode in MySQL

I have a MySQL table that contains a list of UK addresses, and I am trying to get the list of addresses that don't have a postcode.
In the list, we can see that some of them don't have a postcode at the end.
I wrote the following query, but it didn't give the expected result.
select * from property_address WHERE property_address
REGEXP '^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$'
How can I fix this query to get it working?
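I will assume the postcode regexp is correct.
REGEXP '^([A-PR-UWYZ0-9]...|GIR 0AA)$'
You need to remove the leading anchor (^) but keep the trailing one ($), so that the postcode is only looked for at the end of the address. The ^ does not mean "not"; it anchors the match to the start of the string. Instead, negate the condition with NOT:
NOT REGEXP '([A-PR-UWYZ0-9]...|GIR 0AA)$'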
Akina's suggestion of first extracting via SUBSTRING_INDEX is likely to make the query faster.
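Put together with the pattern from the question, the negated query might look like the sketch below. It assumes, as the answer does, that the postcode pattern itself is correct.

```sql
-- Rows whose address does NOT end with something shaped like a UK postcode.
-- The leading ^ is dropped; the trailing $ keeps the match at the end of the value.
SELECT *
FROM property_address
WHERE property_address NOT REGEXP
  '([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$';
```

Rows where property_address is NULL are not returned, because NOT REGEXP evaluates to NULL for them.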

How do I convert some columns' values by using SQL on MySQL?

I have a table named Employee in my database, the structure is as shown below:
Id  email                  phone        name
1   user#gmail.com         +7845690001  Jonney
2   Nortex.zone#gmail.com  +7845690781  North
I have some data that I want to mask, for example +7845690001 to +7845690***. Full version as below.
Id  email                  phone        name
1   u**r#gmail.com         +7845690***  J****y
2   N*********e#gmail.com  +7845690***  N***h
I managed to do this for name and phone:
Select CONCAT(MID(phone, 1, LENGTH(phone) - 3), '***') as new_phone,
CONCAT(LEFT(name,1),REPEAT("*",LENGTH(name)-2),RIGHT(name,1)) as new_name from employee.
How can I do this for email?
Finally found the answer:
Select CONCAT(MID(phone, 1, LENGTH(phone) - 3), '***') as new_phone,
CONCAT(LEFT(name,1),REPEAT("*",LENGTH(name)-2),RIGHT(name,1)) as new_name,
CONCAT(CONCAT(left(email,1),REPEAT("*",LENGTH(SUBSTRING_INDEX(email, "#", 1))-2),RIGHT(SUBSTRING_INDEX(email, "#", 1),1)),'#',SUBSTRING_INDEX(email,'#',-1)) as new_email
from employee
Thanks all. :)
You can work with MySQL's string functions: LEFT(), RIGHT(), LENGTH(), REPEAT(), and SUBSTRING_INDEX().
I'll just do it for email:
WITH
input(Id,email,phone,name) AS (
SELECT 1 , 'user#gmail.com' ,'+7845690001','Jonney'
UNION ALL SELECT 2 , 'Nortex.zone#gmail.com' ,'+7845690781','North'
)
SELECT
id
, -- CONCAT() is used because || is a logical OR in MySQL's default SQL mode
CONCAT(
-- the leftmost single character of "email"
LEFT(email,1)
-- repeat "*" for the length of the part of "email" before "#" minus 2
, REPEAT('*',LENGTH(SUBSTRING_INDEX(email,'#',1))-2)
-- the rightmost single character of the part of "email" before "#"
, RIGHT(SUBSTRING_INDEX(email,'#',1),1)
-- hard-wire "#"
, '#'
-- the part of "email" from the end of the string back to "#"
, SUBSTRING_INDEX(email,'#',-1)
) AS email
FROM input
-- out id | email
-- out ----+-----------------------
-- out 1 | u**r#gmail.com
-- out 2 | N*********e#gmail.com
-- out (2 rows)
You can use the CONCAT and SUBSTRING functions in MySQL.
The email and the name can be masked in the same way; adjust the character counts to your requirements.
SELECT CONCAT(LEFT(`name`, 1), "***", RIGHT(`name`, 1)) AS cname,
  CONCAT(LEFT(`email`, 1), "***", SUBSTRING(`email`, LOCATE("#", `email`) - 1)) AS cemail,
  CONCAT(LEFT(`phone`, 8), "***") AS cphone
FROM `test4`
EDITED -----------------
To fill with the exact number of characters, you can use the LPAD function. For name you can do:
SELECT CONCAT(LEFT(`name`, 1), LPAD(RIGHT(`name`, 1), LENGTH(`name`) - 1, '*')) FROM `test4`
For email, do the same, using LOCATE to find the position of "#" as in the query above.
A REGEXP_REPLACE can also do the trick. Here is how to do it for the email:
SELECT REGEXP_REPLACE(email, '(?!^).(?=[^#]+#)', '*') AS masked_email
FROM Employee;
Explanation:
(?!^) makes sure that the matching character is not at the beginning of the string, so the first character is skipped.
. matches the character to be replaced.
(?=[^#]+#) requires the matched character to be followed by one or more characters that are NOT #, and then by #. This keeps the last character before the # and everything after it.
Every character matched under these conditions is replaced with a * (the third parameter) by the function.
For the phone number I will show a much simpler solution:
SELECT REGEXP_REPLACE(phone, '[0-9]{3}$', '***') AS masked_phone
FROM Employee;
[0-9]{3} matches exactly three digits
$ tells that they must be at the end of the string.
We then replace them with three stars. Note that this solution assumes that stored phone numbers always end with three digits. For example, if I enter a phone like "555-55-55-55", nothing will be masked. If the phones are not always stored in the same normalized format, you need something more complicated: match a digit, zero or more non-digits, a digit, zero or more non-digits, a digit, zero or more non-digits, and then the end of the string, and replace whatever is matched with three asterisks. Like this:
SELECT REGEXP_REPLACE(phone, '[0-9][^0-9]*[0-9][^0-9]*[0-9][^0-9]*$', '***') AS masked_phone
FROM Employee;
Here [0-9] means a single digit and [^0-9]* means zero or more non-digits. And of course the same thing can be simplified by grouping the digit and the zero or more non-digits in one group which is then repeated exactly three times:
SELECT REGEXP_REPLACE(phone, '([0-9][^0-9]*){3}$', '***') AS masked_phone
FROM Employee;
And for the name, we can do the following:
SELECT REGEXP_REPLACE(name, '(?!^).(?=.+$)', '*') AS masked_name
FROM Employee;
So again we skip the first character, then match and replace every character up to, but not including, the last character of the string.
IMPORTANT: In the above examples we preserve the length of the strings. If you want higher anonymity, you can fetch the data in groups and then replace a desired group with a single *. For example for the e-mail:
SELECT REGEXP_REPLACE(email, '^(.)(.)+([^#]#.+)$', '$1*$3') AS masked_email
FROM Employee;
This will replace john#gmail.dom with j*n#gmail.dom and margareth#gmail.dom with m*h#gmail.dom, so it masks the length as well. Explanation:
^ is the start of the string
(.) is our first group. It is a single character.
(.)+ is the second group. It's one or more characters.
([^#]#.+) is our third group. It is a single character which is NOT #, followed by #, then followed by one or more characters (any).
We replace that with $1 (the first group), followed by a single *, followed by $3 (the third group). MySQL 8.0 writes backreferences in the replacement as $1 and $3; MariaDB writes them as \\1 and \\3.
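The length-preserving patterns from this answer can be tried on literal values before running them against a table. A small sketch for MySQL 8.0 or later, using sample values from the question's Employee table:

```sql
-- Each pattern is applied to one literal; the comments show the masked result.
SELECT
  REGEXP_REPLACE('user#gmail.com', '(?!^).(?=[^#]+#)', '*') AS masked_email,  -- u**r#gmail.com
  REGEXP_REPLACE('+7845690001', '[0-9]{3}$', '***')          AS masked_phone,  -- +7845690***
  REGEXP_REPLACE('Jonney', '(?!^).(?=.+$)', '*')              AS masked_name;   -- J****y
```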

Understanding why querying cities with vowels at start and end doesn't work

Task:
Query the list of names from table which have vowels as both their first and last characters.
I want to query the list of CITY names from the table STATION(id,city, longitude, latitude) which have vowels as both their first and last characters. The result cannot contain duplicates.
My query:
SELECT DISTINCT CITY
FROM STATION
WHERE CITY LIKE '[aeiou]%' AND '%[aeiou]'
I found this solution:
Select distinct city
from station
Where regexp_like(city, '^[aeiou].*[aeiou]$','i');
Why isn't my query working?
'[aeiou]' is a regex character class, which the LIKE operator does not support. So your query won't do what you expect: it actually searches for a literal string that starts with '[aeiou]'. Even if LIKE did support character classes, you would need to repeat the comparison in full (city LIKE ... AND city LIKE ...); as written, the second operand of AND is a bare string, not a comparison.
The solution you found uses regexp_like() with the following regex: ^[aeiou].*[aeiou]$, which means:
^ beginning of the string
[aeiou] one of the characters in the list
.* a sequence of 0 to N characters
[aeiou] one of the characters in the list
$ end of the string
Option 'i' makes the search case insensitive.
This works, but requires MySQL 8.0. If you are running an earlier version, consider using a REGEXP condition instead:
CITY REGEXP '^[aeiou].*[aeiou]$'
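Written out as a full query against the STATION table from the question, that condition could be used as in this sketch:

```sql
-- Cities that start and end with a vowel, without duplicates.
-- The pattern needs at least two characters, so a one-letter name never matches.
SELECT DISTINCT city
FROM station
WHERE city REGEXP '^[aeiou].*[aeiou]$';
```

Whether uppercase vowels also match depends on the collation of the city column; with a case-insensitive collation they do.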

Mysql query, contains a string then 3 numbers in email

I have a mysql database with a table that contains an email address entered by website users.
How would I select all records where the email field contains any number of characters, then 3 numbers and #yahoo.com?
e.g. testemail639#yahoo.com
A simple way would be to use REGEXP in your SELECT statements.
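SELECT * FROM records WHERE email REGEXP '^\\w+\\d{3}#.*$'
The above statement is untested, but should lead you down a better road. Note that backslashes must be doubled inside a MySQL string literal, and that \w and \d are only understood by the regex engine of MySQL 8.0 and later.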
SELECT * FROM records WHERE email REGEXP '[0-9]{3}#yahoo\\.com'
Try this
SELECT * FROM records WHERE email REGEXP '^\\w+([\\.-]?\\w+)?([0-9]{3})#\\w+([\\.-]?\\w+)*(\\.\\w{2,4})+$'
The pattern ^\w+([\.-]?\w+)?([0-9]{3})#\w+([\.-]?\w+)*(\.\w{2,4})+$ strictly checks for a string followed by 3 digits before the #, in an email address that contains no spaces. Inside the SQL string literal each backslash is doubled.
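A pattern like these can be checked against literal values before it is used in a WHERE clause. A minimal sketch, using the sample address from the question and a simple pattern anchored to the end of the value:

```sql
-- REGEXP returns 1 for a match and 0 for no match.
SELECT
  'testemail639#yahoo.com' REGEXP '[0-9]{3}#yahoo\\.com$' AS three_digits,  -- 1
  'testemail63#yahoo.com'  REGEXP '[0-9]{3}#yahoo\\.com$' AS two_digits;    -- 0
```

This simple pattern also accepts addresses with more than three digits before the #, since it only requires that the last three characters there are digits.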