replace a character in mysql with a random character from a list - mysql

I would like to perform a MySQL search and replace that substitutes random characters taken from a list. I cannot use the regex functions, since my version is well before 8.
Instead of a fixed replacement like the statement below,
I would like to change, for instance, each letter u to one of (a,e,i,f,k), chosen at random.
UPDATE products
SET
productDescription = REPLACE(productDescription,
'abuot',
'about');
Is there a mysql command for this task?
Actually my goal is to get in the lastnames column, new names that are not exactly like the real ones, so one could work on "anonymous" data.
I would like to replace all rows in a certain column. Say in table products, in column description, we have data like:
abcud
ieruie
kjuklkllu
uiervfd
With the REPLACE function, we do not want a fixed substitution such as replacing u with i,
but rather to replace each u with one of (a,e,i,f,k), picked at random.
example desired output:
abced
ierfie
kjiklkllk
aiervfd
As I said, we plan to use this on last names: we want to replace many characters with random ones from a list, in an effort to anonymize the column that contains last names.
As a next step, I would like to do the same to anonymize telephone numbers.
example
726456273
827364878
347823472
replace 3 with one of 0-9,
output:
726456279
827664878
547821472

-- CHARINDEX is SQL Server syntax; the MySQL equivalent is LOCATE(substr, str)
SELECT REPLACE('product abuot Description',
SUBSTRING('product abuot Description', LOCATE('abuot', 'product abuot Description') ,5) , 'about')

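CREATE FUNCTION smart_replace ( argument TEXT,
search_for CHAR(1),
replace_with TEXT )
RETURNS TEXT
NO SQL
BEGIN
-- Mark every occurrence of search_for with a NUL placeholder.
SET argument = REPLACE(argument, search_for, CHAR(0));
-- WHILE rather than REPEAT: a REPEAT body runs at least once and would
-- append a random character to strings that contain no match.
WHILE LOCATE(CHAR(0), argument) > 0 DO
-- Replace the first placeholder with one random character from replace_with.
-- 1 + FLOOR(...) always yields a position in 1..length, and CHAR_LENGTH
-- counts characters (LENGTH counts bytes).
SET argument = CONCAT( SUBSTRING_INDEX(argument, CHAR(0), 1),
SUBSTRING(replace_with FROM 1 + FLOOR(RAND() * CHAR_LENGTH(replace_with)) FOR 1),
SUBSTRING(argument FROM 2 + CHAR_LENGTH(SUBSTRING_INDEX(argument, CHAR(0), 1))));
END WHILE;
RETURN argument;
END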
replace e with one of (a,e,i,f,k)
SELECT smart_replace(table.column, 'e', 'aeifk')
replace 3 with one of 0-9
SELECT smart_replace(table.phone, '3', '0123456789')
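The behaviour expected from smart_replace can be modelled outside the database. The following is only a Python sketch of the same idea, not MySQL code: the function name mirrors the SQL one, and the rng parameter and the fixed seed are assumptions added so that the demo is repeatable.

```python
import random


def smart_replace(argument: str, search_for: str, replace_with: str,
                  rng: random.Random) -> str:
    """Replace every occurrence of search_for with a character drawn
    independently at random from replace_with."""
    return "".join(
        rng.choice(replace_with) if ch == search_for else ch
        for ch in argument
    )


rng = random.Random(0)  # fixed seed only to make the demo repeatable
for value in ["abcud", "ieruie", "kjuklkllu", "uiervfd"]:
    print(value, "->", smart_replace(value, "u", "aeifk", rng))
```

Each occurrence gets its own draw, so two u's in the same value can end up as different letters, and a value without the search character comes back unchanged.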

Related

MySQL - Replace string when it appears

The column 'PrizeMoneyBreakDown' contains a number of strings separated by semicolons. I am trying to remove the entries 'total value', 'trophy total value', and 'welfare fund' from the data. These entries only appear sometimes, so it is not as simple as just removing the last three entries. I need to write a query that removes them IF they appear.
Example of data:
1st,5285;2nd,1680;3rd,885;4th,550;5th,350;6th,350;7th,350;8th,350;total_value,10000;welfare_fund,200;trophy_total_value,150;
Desired output of data:
1st,5285;2nd,1680;3rd,885;4th,550;5th,350;6th,350;7th,350;8th,350
Current code (only removes the words 'total value' etc - does not remove prize money associated with string):
SELECT PrizeMoneyBreakDown,
REPLACE(REPLACE(REPLACE(PrizeMoneyBreakDown,'total_value',""),'welfare_fund',""),'trophy_total_value',"") as new
FROM race2;
On MySQL 8+, we can use REGEXP_REPLACE:
SELECT PrizeMoneyBreakDown,
REGEXP_REPLACE(PrizeMoneyBreakDown,
'(total_value|welfare_fund|trophy_total_value),\\d+;',
'') AS NewPrizeMoneyBreakDown
FROM race2;
If you want to update the actual column then use:
UPDATE race2
SET PrizeMoneyBreakDown = REGEXP_REPLACE(
PrizeMoneyBreakDown,
'(total_value|welfare_fund|trophy_total_value),\\d+;',
'')
WHERE PrizeMoneyBreakDown REGEXP '(total_value|welfare_fund|trophy_total_value),\\d+;';
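To see what the pattern removes, here is a small Python sketch that applies the same regular expression with the re module. It only models the matching; strip_totals is a hypothetical helper name, and the sample row is the one from the question.

```python
import re

# Same pattern as in the query: one of the three labels, a comma,
# the amount, and the terminating semicolon.
PATTERN = re.compile(r"(total_value|welfare_fund|trophy_total_value),\d+;")


def strip_totals(breakdown: str) -> str:
    """Remove every 'label,amount;' entry for the three unwanted labels."""
    return PATTERN.sub("", breakdown)


row = ("1st,5285;2nd,1680;3rd,885;4th,550;5th,350;6th,350;7th,350;8th,350;"
       "total_value,10000;welfare_fund,200;trophy_total_value,150;")
print(strip_totals(row))
# 1st,5285;2nd,1680;3rd,885;4th,550;5th,350;6th,350;7th,350;8th,350;
```

Note that the semicolon after 8th,350 survives, whereas the desired output in the question has none. If that matters, the trailing semicolon needs a separate step, for example wrapping the expression in TRIM(TRAILING ';' FROM ...).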

How to find variable pattern in MySql with Regex?

I am trying to pull a product code from a long set of string formatted like a URL address. The pattern is always 3 letters followed by 3 or 4 numbers (ex. ???### or ???####). I have tried using REGEXP and LIKE syntax, but my results are off for both/I am not sure which operators to use.
The first select statement is close to trimming the URL to show just the code, but oftentimes will show a random string of numbers it may find in the URL string.
The second select statement is more rudimentary, but I am unsure which operators to use.
Which would be the quickest solution?
SELECT columnName, SUBSTR(columnName, LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", columnName), LENGTH(columnName) - LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", REVERSE(columnName))) AS extractedData FROM tableName
SELECT columnName FROM tableName WHERE columnName LIKE '%___###%' OR columnName LIKE '%___####%'
-- Will take a substring of this result as well
Example Data:
randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc
In this case, the desired string is "xyz123" and the location of said pattern is variable based on each entry.
EDIT
SELECT column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), SUBSTR(column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), LENGTH(column) - LOCATE(column REGEXP "^.*[a-zA-Z]{3}[0-9]{3,4}", REVERSE(column))) AS extractData From mainTable
This expression is still not grabbing the right data, but I feel like it may get me closer.
I suggest using
REGEXP_SUBSTR(column, '(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])')
Details:
(?<=[&?]random_code=[^&#]{0,256}-) - immediately on the left, there must be ? or &, then random_code=, and then zero to 256 chars other than & and #, followed by a - char
[a-zA-Z]{3} - three ASCII letters
[0-9]{3,4} - three to four ASCII digits
(?![^&#]) - that are followed either with &, # or end of string.
See the online demo:
WITH cte AS ( SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc' val
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz4567&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz89&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz00000&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-aaaaa11111&hello_world=us&etc_etc')
SELECT REGEXP_SUBSTR(val,'(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])') output
FROM cte
Output:
output
xyz123
xyz4567
NULL
NULL
NULL
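The same extraction can be tried outside MySQL. Python's re module does not accept the variable-width lookbehind used above, so this sketch swaps it for a capture group; that is a different technique with the same intent, not the answer's exact pattern. extract_code and BASE are hypothetical names.

```python
import re

# Capture-group variant: match the prefix normally, capture only the code.
CODE = re.compile(
    r"[&?]random_code=[^&#]*-([a-zA-Z]{3}[0-9]{3,4})(?![^&#])"
)


def extract_code(url: str):
    """Return the 3-letter, 3-or-4-digit code after the last '-' of the
    random_code parameter, or None if it does not have that shape."""
    match = CODE.search(url)
    return match.group(1) if match else None


BASE = ("randomwebsite.com/3982356923abcd1ab?"
        "random_code=12480712_ABC_DEF_ANOTHER_CODE-{}&hello_world=us&etc_etc")
for tail in ["xyz123", "xyz4567", "xyz89", "xyz00000", "aaaaa11111"]:
    print(tail, "->", extract_code(BASE.format(tail)))
```

Only the first two sample rows yield a code; the other three do not have exactly three letters followed by three or four digits before the next & and give None.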
I'd make use of capture groups:
(?<=[=\-\\])([a-zA-Z]{3}[\d]{3,4})(?=[&])
I assume that with [^=\-] you wanted to match a code preceded by "-", "\" or "=" without including those characters in the result. To do that, use a positive lookbehind, (?<=...).
I also added a lookahead, (?=...), for "&".
If you'd like to fidget more with regex I recommend RegExr

How do I convert some column's value by using SQL on MYSQL?

I have a table named Employee in my database, the structure is as shown below:
Id email phone name
1 user#gmail.com +7845690001 Jonney
2 Nortex.zone#gmail.com +7845690781 North
I have some data that I want to mask, for example +7845690001 to +7845690***. Full version as below.
Id email phone name
1 u**r#gmail.com +7845690*** J****y
2 N*********e#gmail.com +7845690*** N***h
I managed to do this for name and phone:
Select CONCAT(MID(phone, 1, LENGTH(phone) - 3), '***') as new_phone,
CONCAT(LEFT(name,1),REPEAT("*",LENGTH(name)-2),RIGHT(name,1)) as new_name from employee
How can I do this for email?
Finally found the answer:
Select CONCAT(MID(phone, 1, LENGTH(phone) - 3), '***') as new_phone,
CONCAT(LEFT(name,1),REPEAT("*",LENGTH(name)-2),RIGHT(name,1)) as new_name,CONCAT(CONCAT(left(email,1),REPEAT("*",LENGTH(SUBSTRING_INDEX(email, "#", 1))-2),RIGHT(SUBSTRING_INDEX(email, "#", 1),1)),'#',SUBSTRING_INDEX(email,'#',-1)) as new_email from employee
Thanks all. :)
You can work with MySQL's string functions: LEFT(),RIGHT(),LENGTH(), REPEAT(), and SUBSTRING_INDEX() .
I'll just do it for email:
WITH
input(Id,email,phone,name) AS (
SELECT 1 , 'user#gmail.com' ,'+7845690001','Jonney'
UNION ALL SELECT 2 , 'Nortex.zone#gmail.com' ,'+7845690781','North'
)
SELECT
id
-- CONCAT() is used because || is logical OR in MySQL unless the
-- PIPES_AS_CONCAT SQL mode is enabled
, CONCAT(
-- the leftmost single character of "email"
LEFT(email,1)
-- repeat "*" for the length of the part of "email" before "#" minus 2
, REPEAT('*',LENGTH(SUBSTRING_INDEX(email,'#',1))-2)
-- the rightmost single character of the part of "email" before "#"
, RIGHT(SUBSTRING_INDEX(email,'#',1),1)
-- hard-wire "#"
, '#'
-- the part of "email" from the end of the string back to "#"
, SUBSTRING_INDEX(email,'#',-1)
) AS email
FROM input
-- out id | email
-- out ----+-----------------------
-- out 1 | u**r#gmail.com
-- out 2 | N*********e#gmail.com
-- out (2 rows)
You can use the CONCAT and SUBSTRING functions in MySQL.
The email and the name have the same structure, so use the same approach for both and adjust the offsets to your requirements.
SELECT CONCAT(LEFT(`name`, 1),"***",RIGHT(`name`, 1)) as cname,
CONCAT(LEFT(`email`, 1),"***",SUBSTRING(`email`, LOCATE("#", `email`)-1)) as cemail,
CONCAT(LEFT(`phone`, 8),"***") as cphone
FROM `test4`
EDITED -----------------
To fill with the exact number of characters you can use the LPAD function. For name you can do:
SELECT CONCAT(LEFT(`name`,1),LPAD(RIGHT(`name`,1),LENGTH(`name`)-1,'*')) FROM `test4`
Use LOCATE and adjust the indexes as in the query above for email.
A REGEXP_REPLACE can also do the trick. Here is how to do it for the email:
SELECT REGEXP_REPLACE(email, '(?!^).(?=[^#]+#)', '*') AS masked_email
FROM Employee;
Explanation:
(?!^) we make sure that the matching character is not at the beginning of the string and that way we skip the first character.
. matches the character to be replaced
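(?=[^#]+#) is a lookahead: the matched character must be followed by one or more characters that are NOT #, and then by #. The last character before # has nothing between it and #, so it is not matched and stays visible, and nothing after # is matched either.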
Every single character which is matched between the two will then be replaced with a * (the third parameter) by the function.
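The pattern can be checked quickly outside the database. This is a Python sketch using the re module, which accepts the same lookaround syntax here; mask_email is a hypothetical helper name and the inputs are the two sample rows from the question.

```python
import re


def mask_email(email: str) -> str:
    """Star out every character of the local part except the first and
    the last one; '#' plays the role of the separator as in the data."""
    return re.sub(r"(?!^).(?=[^#]+#)", "*", email)


print(mask_email("user#gmail.com"))         # u**r#gmail.com
print(mask_email("Nortex.zone#gmail.com"))  # N*********e#gmail.com
```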
For the phone number I will show a much simpler solution:
SELECT REGEXP_REPLACE(phone, '[0-9]{3}$', '***') AS masked_phone
FROM Employee;
[0-9]{3} matches exactly three digits
$ tells that they must be at the end of the string.
We then replace them with three stars. Please note that this solution is assuming that you always store the phone numbers in a way that they always end with three digits. So for example if I enter a phone like "555-55-55-55", nothing will be masked. If you do not always insert the phones normalized in the same format, then you must think about something more complicated (like fetch digit - fetch zero or more non digits - fetch digit - fetch zero or more non digits - fetch digit - end of string, then replace whatever is matched with three *-s). Like this:
SELECT REGEXP_REPLACE(phone, '[0-9][^0-9]*[0-9][^0-9]*[0-9][^0-9]*$', '***') AS masked_phone
FROM Employee;
Here [0-9] means a single digit and [^0-9]* means zero or more non-digits. And of course the same thing can be simplified by grouping the digit and the zero or more non-digits in one group which is then repeated exactly three times:
SELECT REGEXP_REPLACE(phone, '([0-9][^0-9]*){3}$', '***') AS masked_phone
FROM Employee;
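The difference between the simple and the separator-tolerant pattern is easy to see on the two phone formats mentioned above. The following Python sketch only models the regular expressions; both function names are hypothetical.

```python
import re


def mask_simple(phone: str) -> str:
    """Mask only if the string ends with three consecutive digits."""
    return re.sub(r"[0-9]{3}$", "***", phone)


def mask_last_three_digits(phone: str) -> str:
    """Mask the last three digits together with any separators between
    or after them."""
    return re.sub(r"([0-9][^0-9]*){3}$", "***", phone)


for phone in ["+7845690001", "555-55-55-55"]:
    print(phone, "|", mask_simple(phone), "|", mask_last_three_digits(phone))
# +7845690001 | +7845690*** | +7845690***
# 555-55-55-55 | 555-55-55-55 | 555-55-5***
```

In the second format the separators inside the masked tail disappear along with the digits, so the masked value no longer shows how the number was grouped.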
And for the name, we can do the following:
SELECT REGEXP_REPLACE(name, '(?!^).(?=.+$)', '*') AS masked_name
FROM Employee;
So again we skip the first character, then match and replace every character up to, but not including, the last character of the string.
IMPORTANT: In the above examples we preserve the length of the strings. If you want higher anonymity, you can fetch the data in groups and then replace a desired group with a single *. For example for the e-mail:
SELECT REGEXP_REPLACE(email, '^(.)(.)+([^#]#.+)$', '$1*$3') AS masked_email
FROM Employee;
This will turn john#gmail.dom into j*n#gmail.dom and margareth#gmail.dom into m*h#gmail.dom, so it masks the length as well. Explanation:
^ is the start of the string
(.) is our first group. It is a single character.
(.)+ is the second group. It's one or more characters.
([^#]#.+) is our third group. It is a single character which is NOT #, followed by #, then followed by one or more characters (any).
We replace that with $1 (the first group), followed by a single *, followed by $3 (the third group). MySQL's REGEXP_REPLACE writes group references as $1, $2 and so on; the \\1 form is MariaDB syntax.
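The grouping can be tried in Python as a sketch. Python's re.sub writes group references in the replacement as \1 and \3, so only the replacement string is spelled the Python way; the pattern is the one from the query. mask_email_short is a hypothetical name.

```python
import re


def mask_email_short(email: str) -> str:
    """Keep the first and last character of the local part and collapse
    everything between them into a single '*'."""
    return re.sub(r"^(.)(.)+([^#]#.+)$", r"\1*\3", email)


print(mask_email_short("john#gmail.dom"))       # j*n#gmail.dom
print(mask_email_short("margareth#gmail.dom"))  # m*h#gmail.dom
print(mask_email_short("ab#x.y"))               # ab#x.y
```

The last call shows a limitation of the pattern: a local part shorter than three characters cannot fill all three groups, so the value is returned unmasked.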

How to get the values for which the format and suffix are known but the exact values are not known and there can be multiple values from the database?

I have a use case as below:
I have thousands of records in the database and let's say I am having one column named myValue.
The value of myValue is an alphanumeric string in which the first two characters are letters, the next 6 characters are digits, and the last character is a fixed letter, let's say 'x', which may or may not be present. (For example 'AB123456', 'AB123456x')
So I know the format of the myValue field, but I do not know all the actual values, as there are lots of records.
Now I want to retrieve all such values for which the value without last character x (For Example, 'AB123456') and the same value with last character x (For Example, 'AB123456x') exists.
So is there any way I can retrieve such data?
I am right now doing trial and error on existing data but have not found success and there are thousands of such rows, so any help on this would be appreciated a lot.
You can do so like this:
SELECT myvalue
FROM t
WHERE myvalue LIKE '________'
AND EXISTS (
SELECT 1
FROM t AS x
WHERE x.myvalue = CONCAT(t.myvalue, 'x')
)
A (most likely) faster alternative is:
SELECT TRIM(TRAILING 'x' FROM myvalue) AS myvalue2
FROM t
GROUP BY myvalue2
HAVING COUNT(DISTINCT myvalue) > 1
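The logic of the EXISTS query can be modelled with a set lookup. This Python sketch is not a replacement for the SQL; it only shows which rows qualify. The function name and the sample values beyond those in the question are made up for illustration.

```python
def values_with_x_twin(values):
    """Return the 8-character values whose 'x'-suffixed twin also exists."""
    present = set(values)
    return sorted(
        v for v in present
        if len(v) == 8 and v + "x" in present
    )


sample = ["AB123456", "AB123456x", "CD000001", "EF999999x"]
print(values_with_x_twin(sample))  # ['AB123456']
```

CD000001 has no twin and EF999999x has no base value, so neither is reported.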

MySQL move part of a cell to another column

I have the following imported to TableA, Column 'Clothes' and Column 'Colours'
The problem is the import has put in the 'Clothes' column 'Jeans - Blue' and 'Jumper - Red' etc etc
Please could someone help me with a query to keep everything before the - in 'Clothes' and everything after the - into 'Colours' and removing the - altogether.
Two steps for this.
First, update the colors:
UPDATE yourTableA T
SET T.Colours = TRIM(SUBSTR(T.Clothes,INSTR(T.Clothes,'-') + 2));
Second, update the Clothes:
UPDATE yourTableA T
SET T.Clothes = TRIM(SUBSTR(T.Clothes,1,INSTR(T.clothes,'-')-1));
I've used SUBSTR as my string swiss army knife here, and INSTR to locate the position of the - in between. You can do without TRIM, but I usually use this in those cases to avoid unnecessary white spaces.
There surely are more direct ways to do it, but this'll work.
The SUBSTRING_INDEX function is convenient, and the TRIM function can remove leading and trailing spaces. For example:
SELECT TRIM(SUBSTRING_INDEX(a.Clothes,'-',1)) AS Clothes
, TRIM(SUBSTRING_INDEX(a.Clothes,'-',-1)) AS Colours
FROM TableA a
WHERE LENGTH(a.Clothes)-LENGTH(REPLACE(a.Clothes,'-','')) = 1
NOTE: the query above returns the substring before the first '-' character and the substring after the last '-' character. Any value with more than one dash would lose the portion between the first and last dashes. Consider e.g. 'A - B - C - D': the query returns the A and the D, and loses everything else.
To handle this anomaly, the WHERE clause checks that the string contains a single occurrence of the '-' character.
Once you have a query you are happy with, you can turn that into an UPDATE statement, BUT be VERY careful about the order in which you assign new values to columns. Unlike other relational databases, MySQL does not guarantee that a reference to an existing column within the statement will be the value of the column from the beginning of the statement... the only guarantee is that it will be the value that is currently assigned. So, the order in which the columns are assigned is important!
UPDATE TableA a
SET Colours = TRIM(SUBSTRING_INDEX(a.Clothes,'-',-1))
, Clothes = TRIM(SUBSTRING_INDEX(a.Clothes,'-',1))
WHERE LENGTH(a.Clothes)-LENGTH(REPLACE(a.Clothes,'-','')) = 1
Note that if we were to assign the Clothes column before we assigned a value to the Colours column, the value we want assigned to Colours would be "lost".
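The effect of the assignment order can be illustrated with a small Python model in which the two SET clauses run one after the other on the same row. substring_index is a hypothetical Python stand-in for MySQL's SUBSTRING_INDEX, and the dictionary stands for one row of TableA.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Rough model of SUBSTRING_INDEX: the part before the count-th
    delimiter (count > 0) or after the count-th from the end (count < 0)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


def update_colours_first(row: dict) -> dict:
    row = dict(row)
    row["Colours"] = substring_index(row["Clothes"], "-", -1).strip(" ")
    row["Clothes"] = substring_index(row["Clothes"], "-", 1).strip(" ")
    return row


def update_clothes_first(row: dict) -> dict:
    row = dict(row)
    row["Clothes"] = substring_index(row["Clothes"], "-", 1).strip(" ")
    # Clothes no longer contains the colour at this point.
    row["Colours"] = substring_index(row["Clothes"], "-", -1).strip(" ")
    return row


original = {"Clothes": "Jeans - Blue", "Colours": None}
print(update_colours_first(original))  # {'Clothes': 'Jeans', 'Colours': 'Blue'}
print(update_clothes_first(original))  # {'Clothes': 'Jeans', 'Colours': 'Jeans'}
```

With Clothes assigned first, the second assignment reads the already shortened value and the colour is gone.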
You can do it in a single UPDATE as follows:
UPDATE TableA
SET `Colours` = SUBSTRING_INDEX(`Clothes`, ' - ', -1),
`Clothes` = SUBSTRING_INDEX(`Clothes`, ' - ', 1)
;
You can experiment with SQL Fiddle Demo I created from your data.
Here's the data I worked with:
CREATE TABLE TableA
(Clothes varchar(20), Colours varchar(20))
;
INSERT INTO TableA
(`Clothes`, `Colours`)
VALUES
('Jeans - Blue', NULL),
('Jumper - Red', NULL)
;
This is the result of SELECT * FROM TableA; :
CLOTHES COLOURS
Jeans Blue
Jumper Red