Replace a column value in a table using regex in MySQL

I need to do the following and I'm struggling with the syntax:
I have a table called 'mytable' and a column called 'mycolumn' (string).
In mycolumn the value is a constructed value - for example, one of the values is: 'first:10:second:18:third:31'. The values in mycolumn are all using the same pattern, just the ids/numbers are different.
I need to change the value of 18 (in this particular case) to a value from another table's key. The end result for this column value should be 'first:10:second:22:third:31' because I replaced 18 with 22. I got the 22 from another table using the 18 as a lookup value.
So ideally I would have the following:
UPDATE mytable
SET mycolumn = [some regex function to find the number between 'second:' and ":third" -
let's call that oldkey - and replace it with other id from another table -
(select otherid from tableb where id = oldkey)].
I know MySQL has a REPLACE function, but that doesn't get me far enough.

You can create your own function. I am scared of REGEX so I use SUBSTRING and SUBSTRING_INDEX.
CREATE FUNCTION SPLIT_STRING(str VARCHAR(255), delim VARCHAR(12), pos INT)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(str, delim, pos),
LENGTH(SUBSTRING_INDEX(str, delim, pos-1)) + 1),
delim, '');
SPLIT_STRING('first:10:second:18:third:31', ':', 4)
returns 18
Based on this answer:
Equivalent of explode() to work with strings in MySQL
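To see why the offsets in SPLIT_STRING work out, here is a rough Python rendering of the same arithmetic. It is only a sketch: substring_index is a hypothetical helper that mimics MySQL's SUBSTRING_INDEX for positive counts, and it counts characters rather than bytes.

```python
def substring_index(s, delim, count):
    """Mimic MySQL SUBSTRING_INDEX(s, delim, count) for count >= 0."""
    return delim.join(s.split(delim)[:count])


def split_string(s, delim, pos):
    """Return the pos-th (1-based) field of s, like the SPLIT_STRING function above."""
    head = substring_index(s, delim, pos)           # everything up to field pos
    skip = len(substring_index(s, delim, pos - 1))  # length of the fields before it
    # What is left is the delimiter plus the wanted field; drop the delimiter.
    return head[skip:].replace(delim, "")


print(split_string("first:10:second:18:third:31", ":", 4))  # 18
```

Asking for a position past the last field gives an empty string, which matches what the SQL version does.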

The problem with MySQL is that its REGEX flavor is very limited and does not support back references or regex replace (at least in versions before 8.0, which added REGEXP_REPLACE), which pretty much makes it impossible to replace the value the way you want with MySQL alone.
I know it means taking a speed hit, but you may want to consider selecting the row you want by its id (or however you select it), modifying the value with PHP or whatever language you have interfacing with MySQL, and putting it back with an UPDATE query.
Generally speaking, REGEX in programming languages is much more powerful.
If you keep those queries slim and quick, you shouldn't take too big of a speed hit (probably negligible).
Also, here is documentation on what MySQL's REGEX CAN do. http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Cheers
EDIT:
To be honest, eggyal's comment makes a whole lot more sense for your situation (simple int values). Just break them up into separate columns; there's no reason to access them like that at all, imo.
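For what it's worth, the client-side route could look roughly like this in Python. This is a sketch only: the LOOKUP dict is a hypothetical stand-in for the "select otherid from tableb where id = oldkey" lookup, and fetching the row and writing it back with an UPDATE are left out.

```python
import re

# Hypothetical stand-in for tableb: old id -> otherid.
LOOKUP = {18: 22}

# The digits that sit between 'second:' and ':third'.
PATTERN = re.compile(r"(?<=second:)\d+(?=:third)")


def replace_second_id(value, lookup):
    """Swap the number after 'second:' for its mapped id."""
    def swap(match):
        old_key = int(match.group(0))
        # Leave the number untouched when the lookup has no entry for it.
        return str(lookup.get(old_key, old_key))
    return PATTERN.sub(swap, value)


print(replace_second_id("first:10:second:18:third:31", LOOKUP))
# first:10:second:22:third:31
```

The lookbehind/lookahead pair keeps the 'second:' and ':third' markers out of the match, so only the number itself is rewritten.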

You want something like this, where it matches the group:
WHERE mycolumn REGEXP 'second:([0-9]*):third'
However, MySQL (before 8.0) doesn't have a built-in regex replace function, so you would have to use a user-defined function:
REGEXP_REPLACE(text, pattern, replace [,position [,occurrence [,return_end [,mode]]]])
User-defined function is available here:
http://www.mysqludf.org/lib_mysqludf_preg/

Related

What is the purpose of using WHERE COLUMN like '%[_][01][7812]' in SQL statements?

What is the purpose of using WHERE COLUMN like '%[_][01][7812]' in SQL statements?
I get some result, but don't know how to use properly.
I see that it is searching through the base, but I don't understand the pattern.
LIKE selects strings that match a pattern. The pattern you're looking at uses several wildcards, which you can review here: https://www.w3schools.com/SQL/sql_wildcards.asp
Briefly, the query seems to be matching any row where COLUMN ends in an _, then a 0 or a 1, then a 7, 8, 1, or 2. (So it would match 'blah_07' but not 'blah_81', 'blah_0172', or 'blah18'.)
First, as you might be aware, the WHERE clause is used for filtering rows.
In your case, WHERE column LIKE '%[_][01][7812]' means: find the rows where the column ends with [_][01][7812], and there could be anything in place of %.
declare
@searchString varchar(50) = '[_][01][7812]',
@testString varchar(50) = 'BeginningOfString' + '[_][01][7812]' + 'EndofString'
select CHARINDEX(@searchString, @testString), @testString, LEN(@testString) as [totalLength]
set @testString = '[_][01][7812]' + 'EndofString'
select CHARINDEX(@searchString, @testString), @testString, LEN(@testString) as [totalLength]
set @testString = 'BeginningOfString' + '[_][01][7812]'
select CHARINDEX(@searchString, @testString), @testString, LEN(@testString) as [totalLength]
Although you've tagged your post MySQL, that code seems unlikely to have been written for it. That LIKE pattern, to me, resembles Microsoft SQL Server's variation on the syntax, where it would match anything ending with an underscore followed by a zero or a one, followed by a 7, an 8, a 1, or a 2.
So your example 'TA01_55_77' would not match, but 'TA01_55_18' would, as would 'GZ01_55_07'
(In SQL Server, enclosing a wildcard character like '_' in square brackets escapes it, turning it into a literal underscore.)
Of course, there may be other RDBMSes with similar syntax, but what you've presented doesn't seem like it would work on the data you've got if running in MySQL.
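If it helps to experiment, the SQL Server reading of that pattern can be approximated with an ordinary regular expression. The Python below is just an illustration of the matching rule described above, not anything SQL Server runs: '%' becomes '.*', '[_]' becomes a literal underscore, and the two bracket groups carry over unchanged.

```python
import re

# LIKE '%[_][01][7812]' as a regex: any prefix, a literal underscore,
# one of 0/1, then one of 7/8/1/2, with nothing after it.
PATTERN = re.compile(r".*_[01][7812]", re.DOTALL)


def like_matches(value):
    """True when the whole value matches the LIKE pattern."""
    return PATTERN.fullmatch(value) is not None


for sample in ["TA01_55_77", "TA01_55_18", "GZ01_55_07", "blah_0172", "blah18"]:
    print(sample, like_matches(sample))
# TA01_55_77 False
# TA01_55_18 True
# GZ01_55_07 True
# blah_0172 False
# blah18 False
```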

How to manipulate SET datatype in MySQL with fast binary operations?

I want to efficiently store and efficiently manipulate bit flags for a record in MySQL. The SET datatype satisfies the first wish because up to 64 flags are stored as a single number. But what about the second? I have seen only awkward solutions like
UPDATE table_name SET set_col = (set_col | 4) WHERE condition;
UPDATE table_name SET set_col = (set_col & ~4) WHERE condition;
to respectively include and exclude a member into the value. I.e. I have to use numeric constants, which renders the code unmaintainable. Then I could have used INT datatype as well. If set_col definition gets changed (adding, removing or reordering the possible members), the code with hard-coded constants becomes a mess. I could try to enforce some discipline on coders to use only named variables in application language instead of numeric constants which would make maintenance easier, but not totally error-proof. Is there a solution where MySQL would resolve the symbolic names of set members to their correct numeric values? E.g. this does not work:
UPDATE person SET tag=tag | 'MGR'
To stem useless answers: I know about database normalization and a separate m-to-n relationship table; that is not the topic here. If you need a more concrete example, here you are:
CREATE TABLE `coder` (
`name` VARCHAR(50) NOT NULL,
`languages` SET('Perl','PHP','Java','Scala') NOT NULL
)
Changes to the set definition are unlikely but possible, maybe every other year, like splitting "Perl" into "Perl5" and "Perl6".
I found the answer here:
https://dev.mysql.com/doc/refman/5.7/en/set.html
Posted by John Kozura on April 12, 2011
Note that MySQL, at least 5.1+, seems to deal just fine with extra commas, so setting/deleting individual bits by name can be done very simply without creating a "proper" list. So even something like SET flags=',,,foo,,bar,,' works fine, if you don't care about a truncated data warning.
add bits:
UPDATE tbl SET flags=CONCAT_WS(',', flags, 'flagtoadd');
delete bits:
UPDATE tbl SET flags=REPLACE(flags, 'flagtoremove', '')
...or, if you have a bit whose name is a substring of another bit's name, like "foo" and "foot", slightly more complicated:
UPDATE tbl SET flags=REPLACE(CONCAT(',', flags, ','), ',foo,', ',')
If the warnings do cause issues for you, then the solutions posted above work:
add:
UPDATE tbl SET flags=TRIM(',' FROM CONCAT(flags, ',', 'flagtoadd'))
delete:
UPDATE tbl SET flags=TRIM(',' FROM REPLACE(CONCAT(',', flags, ','), ',flagtoremove,', ','))
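If you do stay with the numeric | and & ~ form from the question, the hard-coded constants can at least be derived from the member list instead of being typed by hand. A small Python sketch of that idea follows. The MEMBERS list is copied from the coder.languages definition in the question and is an assumption here; in practice it would have to be kept in sync with the real column definition (for example by reading it from the table metadata).

```python
# Member order copied from the question's `languages` SET definition (assumed).
MEMBERS = ["Perl", "PHP", "Java", "Scala"]

# MySQL gives SET members the bit values 1, 2, 4, 8, ... in definition order.
BIT = {name: 1 << index for index, name in enumerate(MEMBERS)}


def add_flag(value, name):
    """Numeric equivalent of set_col | <bit of name>."""
    return value | BIT[name]


def remove_flag(value, name):
    """Numeric equivalent of set_col & ~<bit of name>."""
    return value & ~BIT[name]


def names(value):
    """List the member names present in a numeric SET value."""
    return [name for name in MEMBERS if value & BIT[name]]


print(BIT["Java"])                      # 4
print(names(add_flag(BIT["Perl"], "Java")))  # ['Perl', 'Java']
```

Reordering or inserting members changes the computed bits automatically, which is the maintenance problem the question was worried about.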

Optimize this MySQL query to not use functions

/*Usage example: This function takes S. O. L. I. D. as input and returns SOLID. And similarly removes single quotes, hyphens and slashes from input*/
CREATE DEFINER=`root`@`localhost` FUNCTION `SanitiseNameForSearch`(Name nvarchar(100)) RETURNS varchar(100) CHARSET utf8
BEGIN
RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(Name, ' ', ''), '.', ''), '''', ''), '-', ''), '/', '');
END
I use this function in the procedure below, applying it to both the search input and the column. It works fine, but it's definitely not scalable.
CREATE DEFINER=`root`@`localhost` PROCEDURE `Search`(SearchFilter nvarchar(20))
BEGIN
SET @SearchFilter = `SanitiseNameForSearch`(SearchFilter);
SELECT t.TermId, t.Name
FROM Terminology AS t
WHERE `SanitiseNameForSearch`(Name) Like @SearchFilter
ORDER BY length(Name) asc
LIMIT 5;
END;
Is it better to implement this functionality via a function, or to add a separate column/table that holds the column values after the function is applied, i.e. the precalculated value of SanitiseNameForSearch(Name), so that it can be indexed?
LIKE '%...' does not optimize at all. It will do a table scan. Therefore, this is the main part of being "non scalable". The functions, etc are insignificant in comparison.
So, what to do? Look at FULLTEXT to see if you can use it. It will probably require changing expectations of the users -- it looks for whole words, not arbitrary substrings. But it is much faster and scalable.
Using FULLTEXT would obviate the need for your Sanitize function.
(So, guys, what's all this crap about downvoting and closing the question? Haven't I answered his question, or at least the implied question?)
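On the precalculated-column idea from the question: if the sanitised value is computed once at write time, it can be done in the application just as easily as in a stored function. Below is a rough Python equivalent of SanitiseNameForSearch, as a sketch only. Where the result is stored is up to you, and an index on it would mainly help exact or prefix matches, not LIKE patterns with a leading %.

```python
# Characters removed by the nested REPLACE calls in SanitiseNameForSearch:
# space, full stop, single quote, hyphen and slash.
_STRIP = str.maketrans("", "", " .'-/")


def sanitise_name_for_search(name):
    """Drop the separator characters so 'S. O. L. I. D.' becomes 'SOLID'."""
    return name.translate(_STRIP)


print(sanitise_name_for_search("S. O. L. I. D."))  # SOLID
```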

MySQL: REGEXP to remove part of a record

I have a table "locales" with a column named "name". The records in name always begin with a number of characters followed by an underscore (i.e. "foo_", "bar_"...). The record can have more than one underscore, and the pattern before the underscore may be repeated (i.e. "foo_bar_", "foo_foo_").
How, with a simple query, can I get rid of everything before the first underscore including the first underscore itself?
I know how to do this in PHP, but I cannot understand how to do it in MySQL.
SELECT LOCATE('_', 'foo_bar_') ... will give you the location of the first underscore and SUBSTR('foo_bar_', LOCATE('_', 'foo_bar_')) will give you the substring starting from the first underscore. If you want to get rid of that one, too, increment the locate-value by one.
If you now want to replace the values in the table itself, you can do this with an update statement like UPDATE table SET column = SUBSTR(column, LOCATE('_', column)).
select substring('foo_bar_text' from locate('_','foo_bar_text'))
MySQL REGEXs can only match data, they can't do replacements. You'd need to do the replacing client-side in your PHP script, or use standard string operations in MySQL to do the changes.
UPDATE sometable SET somefield=RIGHT(somefield, CHAR_LENGTH(somefield) - LOCATE('_', somefield));
Probably got some off-by-one errors in there, but that's the basic way of going about it.
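To check the off-by-one question without touching the table, the same LOCATE arithmetic can be tried out in Python first. This is a sketch: strip_prefix is a hypothetical helper, and it assumes the goal is to drop everything up to and including the first underscore.

```python
def strip_prefix(name):
    """Drop everything up to and including the first underscore."""
    # MySQL LOCATE is 1-based and returns 0 when nothing is found;
    # str.find is 0-based and returns -1, so adding 1 gives the same number.
    locate = name.find("_") + 1
    # Same result as SUBSTR(name, LOCATE('_', name) + 1) in MySQL.
    return name[locate:]


print(strip_prefix("foo_bar_"))  # bar_
print(strip_prefix("foo_"))      # (empty string)
```

A value with no underscore comes back unchanged, because the computed position is 0.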

Converting a upper case database to proper case

I am new to SQL and I have several large databases with upper case first and last names that I need to convert to proper case in SQL Server 2008.
I am using the following to do this:
update database
Set FirstNames = upper(substring(FirstNames, 1, 1))
+ lower(substring(FirstNames, 2, (len(FirstNames) - 1) ))
I was wondering if there was any way to adapt this so that a field with two first names is also updated (currently I make the change and then go through and manually change the second name).
I have looked over the other answers in this field and they all seem quite long compared to the query above.
Also, is there any way to assist with converting the Mc surnames (I will manually change the others)? MCDONALD to McDonald; again, I am just using the above query but replacing FirstNames with LastName.
This is probably best done outside of SQL. However, if there is a requirement to do it on the server or if speed isn't an issue (because it will be an issue so you need to figure out if you care), the way you are going about it is probably the best way of doing so. If you want, you could create a UDF that puts all of the logic in one area.
Here is some code I came across (with attribution and more information below it):
CREATE FUNCTION dbo.fCapFirst(@input NVARCHAR(4000)) RETURNS NVARCHAR(4000)
AS
BEGIN
DECLARE @position INT
WHILE IsNull(@position,Len(@input)) > 1
SELECT @input = Stuff(@input,IsNull(@position,1),1,upper(substring(@input,IsNull(@position,1),1))),
@position = charindex(' ',@input,IsNull(@position,1)) + 1
RETURN (@input)
END
--Call it like so
select dbo.fCapFirst(Lower(Column)) From MyTable
I got this code from http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=37760 There is more information and other suggestions in this forum as well.
As for dealing with cases like the McDonald, I would suggest one of two ways to handle this. One would be to put a search in the above UDF for key names ('McDonald', 'McGrew', etc.) or for patterns (the first two letters are Mc then make the next one capital, etc.) The second way would be to put these cases (the full names) in a table and have their replacement value in a second column. Then simply do a replace. Most likely, however, it will be easiest to identify rules like Mc then capitalize instead of trying to list every last-name possibility.
Don't forget you may want to modify the above UDF to include dashes, not just spaces.
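If you do end up doing this outside of SQL, the rules discussed above (capitalise each word, treat dashes like spaces, apply the "Mc then capital" pattern) fit in a few lines. Here is a rough Python sketch. It assumes plain ASCII names, only knows the single Mc rule, and proper_case is a hypothetical name.

```python
import re


def proper_case(name):
    """Convert an upper-case name to proper case, with a simple 'Mc' rule."""
    # Capitalise every run of letters, so spaces, hyphens and apostrophes
    # all act as word breaks.
    cased = re.sub(r"[A-Za-z]+", lambda m: m.group(0).capitalize(), name)
    # Pattern rule from above: after a leading 'Mc', capitalise the next letter.
    return re.sub(r"\bMc([a-z])", lambda m: "Mc" + m.group(1).upper(), cased)


print(proper_case("MCDONALD"))               # McDonald
print(proper_case("ARTHUR BENTLEY-SMYTHE"))  # Arthur Bentley-Smythe
```

Names that need a lookup table rather than a pattern would still have to be handled separately, as suggested above.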
Maybe this is too long but it is very easy and can be adapted for -, ', etc:
UPDATE tbl SET LastName = Case when (CharIndex(' ',lastname,1)<>0) then (Upper(Substring(lastname,1,1))+Lower(Substring(lastname,2,CharIndex(' ',lastname,1)-1)))+
(Upper(Substring(lastname,CharIndex(' ',lastname,1)+1,1))+
Lower(Substring(lastname,CharIndex(' ',lastname,1)+2,Len(lastname)-(CharIndex(' ',lastname,1)-1))))
else (Upper(Substring(lastname,1,1))+Lower(Substring(lastname,2,Len(lastname)-1))) end,
FirstName = Case when (CharIndex(' ',firstname,1)<>0) then (Upper(Substring(firstname,1,1))+Lower(Substring(firstname,2,CharIndex(' ',firstname,1)-1)))+
(Upper(Substring(firstname,CharIndex(' ',firstname,1)+1,1))+
Lower(Substring(firstname,CharIndex(' ',firstname,1)+2,Len(firstname)-(CharIndex(' ',firstname,1)-1))))
else (Upper(Substring(firstname,1,1))+Lower(Substring(firstname,2,Len(firstname)-1))) end;
Tony Rogerson has code that deals with:
double barrelled names eg Arthur Bentley-Smythe
Control characters
I haven't used it myself though...