I have large tables of freely formatted text strings stored in a MySQL database. Within each of those strings I have to find three substrings which are specifically formatted. This problem looks like an ideal fit for MySQL REGEXP pattern matching.
I know that the MySQL REGEXP operator returns only true or false (1 or 0). Moreover, because I need to process large tables, I need to achieve this within MySQL, without involving PHP or any other server-side language.
Example of source data:
FirstEntry_somestring_202320047A_210991957_700443250_Lieferadresse:_modified string c/o Logistics, some address and another text
SecondEntry_hereisanothertext_210991957_text_202320047A_and_700443250_another text which does not have any predefined structure
ThirdEntry_700443250_210991957_202320047A_Lieferadresse:_here some address, Logistics, and some another text with address.
FourthEntry some very long text before numbers__202320047A-700443250-210991957-Lieferadresse:, another text with address and company name. None of this text has predefined structure
The examples above are four strings stored as TEXT values in a MySQL table. They do not have any specific structure. However, I know that somewhere in each record there are three numbers, freely delimited, each with a specific format:
Regex Format: '\d{3}(30|31|32)\d{4}[A-Z]'
Regex Format: '(\d{3}(99)\d{4})'
Regex Format: '((700)\d{6})'
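As a quick sanity check that the three patterns above pick out the intended numbers, here is a Python sketch that applies them to shortened copies of the four sample rows. It only demonstrates the patterns (with the stray closing parentheses removed); it is not a MySQL solution. Note that Python's `re` understands `\d`, while MySQL's REGEXP before 8.0 does not, so in SQL you would write `[0-9]` instead.

```python
import re

# The three patterns from the question; (?:...) keeps the groups
# non-capturing so group(0) is simply the whole match.
PATTERNS = {
    "first": r"\d{3}(?:30|31|32)\d{4}[A-Z]",
    "second": r"\d{3}99\d{4}",
    "third": r"700\d{6}",
}


def extract_ids(text):
    """Return the leftmost match of each pattern in text (None if absent)."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = match.group(0) if match else None
    return result


# Shortened copies of the sample rows from the question.
samples = [
    "FirstEntry_somestring_202320047A_210991957_700443250_Lieferadresse:_modified string",
    "SecondEntry_hereisanothertext_210991957_text_202320047A_and_700443250_another text",
    "ThirdEntry_700443250_210991957_202320047A_Lieferadresse:_here some address",
    "FourthEntry some very long text before numbers__202320047A-700443250-210991957-Lieferadresse:",
]

for s in samples:
    print(extract_ids(s))
```

Every sample row yields the same three values, regardless of their order or delimiters.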
How can I get the substrings matching these regex patterns from the text above?
The Server runs on:
Windows OS
IIS 7
MySQL for Windows
PHP
...
Thank you!
MariaDB 10.0.5 (from 2013) is virtually the same as MySQL, but it includes full PCRE regular expression support, along with the functions REGEXP_REPLACE(), REGEXP_INSTR() and REGEXP_SUBSTR().
See https://mariadb.com/kb/en/mariadb/pcre/
For those interested in this question, I have developed my own solution using MySQL stored procedures.
I think this is the most useful solution on this subject on StackOverflow, as it provides a working solution, whereas the other answers offered only vague ideas:
-- Return the first substring of length length_str that matches regex_str
-- (or NULL if there is none).
-- Note: MySQL REGEXP before 8.0 does not support \d; use [0-9] instead.
DELIMITER $$
DROP PROCEDURE IF EXISTS RETURNREGEX$$
CREATE PROCEDURE RETURNREGEX(IN strSentence VARCHAR(1024), IN regex_str VARCHAR(1024), IN length_str INT)
BEGIN
DECLARE index_str INT DEFAULT 1; -- SUBSTRING() positions start at 1, not 0
DECLARE match_str VARCHAR(1024) DEFAULT '';
DECLARE result BOOL DEFAULT FALSE;
REPEAT
-- Get substring with predefined length
SELECT SUBSTRING(strSentence, index_str, length_str) INTO match_str;
-- compare this substring against the REGEX to see if we have a match
SELECT match_str REGEXP regex_str INTO result;
SET index_str = index_str + 1;
-- stop at the first match or after the last start position
UNTIL result OR index_str > LENGTH(strSentence)
END REPEAT;
IF result THEN SELECT match_str;
ELSE SELECT NULL;
END IF;
END$$
DELIMITER ;
-- Example:
-- CALL RETURNREGEX('FirstEntry_somestring_202320047A_210991957', '[0-9]{3}(30|31|32)[0-9]{4}[A-Z]', 10);
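The procedure slides a window of fixed length over the string and tests each window with REGEXP. The Python sketch below models that loop (it is an illustration of the algorithm, not a replacement for the procedure) and shows why `length_str` must equal the length of the match: the pattern has to fit entirely inside one window, so only fixed-length patterns work this way.

```python
import re


def return_regex(sentence, pattern, length):
    """Python model of RETURNREGEX: slide a fixed-length window over
    sentence and return the first window that matches pattern."""
    for start in range(len(sentence)):
        # Python slices are 0-based; MySQL SUBSTRING() positions are 1-based.
        window = sentence[start:start + length]
        # re.search is unanchored, like MySQL's REGEXP operator.
        if re.search(pattern, window):
            return window
    return None


text = "SecondEntry_hereisanothertext_210991957_text_202320047A_and_700443250_another text"
print(return_regex(text, r"[0-9]{3}(30|31|32)[0-9]{4}[A-Z]", 10))
print(return_regex(text, r"[0-9]{3}99[0-9]{4}", 9))
print(return_regex(text, r"700[0-9]{6}", 9))
```

The loop runs once per character of the input, so on large tables it is slow compared with a native function. MySQL 8.0 added REGEXP_SUBSTR(), which returns the matching substring directly.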
I have a MySql stored procedure that has multiple parts. Procedure receives an INT "inId" and a VARCHAR(500) argument called "inIgnoreLogTypes" that's a comma-separated list of numbers.
First part of SQL looks like this:
DECLARE affectedNumbers text;
SELECT GROUP_CONCAT(am.Numbers) INTO affectedNumbers FROM Users am WHERE am.userID = inId;
I need to do that because the variable "affectedNumbers" will be used later on throughout this rather big stored procedure, so for the sake of performance I don't want to run "IN (SELECT ...)" every time I need to look up the list.
I checked: the variable "affectedNumbers" gets correctly populated with comma-separated values.
Next part is this (and that's where the problem occurs):
DELETE FROM UserLogs WHERE
FIND_IN_SET(User_Number, affectedNumbers) AND
NOT FIND_IN_SET(LogType, inIgnoreLogTypes);
The above statement does nothing, and after hours of searching for "why" I can't find the answer... Maybe because "affectedNumbers" is TEXT and "User_Number" is INT? Or maybe because "LogType" is INT and "inIgnoreLogTypes" is VARCHAR?
I checked both sets; they are comma-separated integers...
Found the issue! I have to use something like this:
DELETE FROM UserLogs WHERE
FIND_IN_SET(UserLogs.User_Number, affectedNumbers) AND
NOT FIND_IN_SET(UserLogs.LogType, inIgnoreLogTypes);
Strange, as there were no errors... Now it works.
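To see what the DELETE condition actually tests, here is a Python model of FIND_IN_SET(str, strlist): it returns the 1-based position of str in a comma-separated list, 0 if it is absent or the list is empty, and NULL if either argument is NULL. The table rows and list values below are hypothetical. The model also shows a common gotcha: FIND_IN_SET does not trim spaces, so '1, 2' does not contain '2'. (In SQL, NOT NULL is NULL, so a NULL list excludes every row; this sketch does not model that.)

```python
def find_in_set(needle, strlist):
    """Model of MySQL FIND_IN_SET(str, strlist)."""
    if needle is None or strlist is None:
        return None
    if strlist == "":
        return 0
    items = strlist.split(",")
    needle = str(needle)  # an INT argument is compared as a string
    return items.index(needle) + 1 if needle in items else 0


# Hypothetical values standing in for the procedure's variables and table.
affected_numbers = "10,11,12"             # result of GROUP_CONCAT(am.Numbers)
ignore_log_types = "3,4"                  # inIgnoreLogTypes
user_logs = [(10, 1), (10, 3), (13, 1)]   # (User_Number, LogType) rows

# Rows matched by:
#   FIND_IN_SET(User_Number, affectedNumbers) AND NOT FIND_IN_SET(LogType, inIgnoreLogTypes)
to_delete = [row for row in user_logs
             if find_in_set(row[0], affected_numbers)
             and not find_in_set(row[1], ignore_log_types)]
print(to_delete)
```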
MySQL has a function CONCAT_WS that I use to export multiple fields with a delimiter into a single field. Works great!
The database I query stores multiple fields, and I need to extract each field individually, but within each field the data needs to include a delimiter. I could certainly use a concatenation, but that takes a while to set up if my data contains up to 100 unique values. Below is an example of what I am talking about:
Stored Data 01020304050607
End Result 01,02,03,04,05,06,07
Stored Data 01101213
End Result 01,10,12,13
Is there a function in MySQL that does the above?
I am not that familiar with MySQL, but I have seen questions like this come up before where a regular expression function would be useful. There are user-defined functions available that provide Oracle-like regular expression functions, since MySQL's built-in regex support is weak. See here: https://github.com/hholzgra/mysql-udf-regexp
So you could do something like this:
select trim(TRAILING ',' FROM regexp_replace(your_column, '(.{2})', '\1,') )
from your_table;
This adds a comma after every two characters, then trims off the trailing one. Maybe this will give you some ideas.
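For reference, the transformation itself is easy to state outside SQL. This Python sketch shows two equivalent ways to do it: slicing the string into two-character chunks and joining them, and the same regex idea as the `regexp_replace` query above (append a comma after every two characters, then trim the trailing comma). The function names are illustrative only.

```python
import re


def add_delimiter(value, width=2, delim=","):
    """Insert delim between consecutive chunks of `width` characters."""
    return delim.join(value[i:i + width] for i in range(0, len(value), width))


def add_delimiter_regex(value):
    """Regex version: append ',' after every two characters, then trim the last one."""
    return re.sub(r"(.{2})", r"\1,", value).rstrip(",")


print(add_delimiter("01020304050607"))     # 01,02,03,04,05,06,07
print(add_delimiter_regex("01101213"))     # 01,10,12,13
```

Both versions leave a trailing odd character as its own chunk (for example '012' becomes '01,2').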
I am a novice programmer and I'm currently working with functions and stored procedures in MySQL using Workbench 5.6. I've been searching for some time now here on SO and on the Web for a formal definition of the "@" operator in MySQL and its proper use, but I wasn't able to find a concrete explanation.
Let's say that I have this:
/*..... Stored Procedure... */
declare i int;
set @i = 1;
select @i;
/* do some other stuff */
End;
The result of the select will be 1. Instead, if I do:
select i ;
I will get a Null result.
From my intuition so far, I think that it accesses the address in memory of a stored variable and prints/modifies its content, but I'm not quite sure. Could you shed some more light?
Are there any other uses of it?
Thanks in advance.
It isn't an operator (I suspect you come from PHP, where it is an operator). It's the syntax for user-defined variables:
User variables are written as @var_name, where the variable name
var_name consists of alphanumeric characters, “.”, “_”, and “$”. A
user variable name can contain other characters if you quote it as a
string or identifier (for example, @'my-var', @"my-var", or
@`my-var`).
The @ denotes a user-defined variable. You prefix these variables with @ so they are not confused with column names and other schema objects, and it also makes code a lot easier to read. Note that @i and the local variable i are two different variables: i was declared with DECLARE but never assigned a value, so select i; returns its default value, NULL.
I need to do the following and I'm struggling with the syntax:
I have a table called 'mytable' and a column called 'mycolumn' (string).
In mycolumn the value is a constructed value - for example, one of the values is: 'first:10:second:18:third:31'. The values in mycolumn are all using the same pattern, just the ids/numbers are different.
I need to change the value of 18 (in this particular case) to a value from another table's key. The end result for this column value should be 'first:10:second:22:third:31' because I replaced 18 with 22. I got the 22 from another table using the 18 as a lookup value.
So ideally I would have the following:
UPDATE mytable
SET mycolumn = [some regex function to find the number between 'second:' and ":third" -
let's call that oldkey - and replace it with other id from another table -
(select otherid from tableb where id = oldkey)].
I know MySQL has a REPLACE function, but that doesn't get me far enough.
You can create your own function. I am scared of REGEX so I use SUBSTRING and SUBSTRING_INDEX.
CREATE FUNCTION SPLIT_STRING(str VARCHAR(255), delim VARCHAR(12), pos INT)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(str, delim, pos),
LENGTH(SUBSTRING_INDEX(str, delim, pos-1)) + 1),
delim, '');
SELECT SPLIT_STRING('first:10:second:18:third:31', ':', 4);
returns 18
Based on this answer:
Equivalent of explode() to work with strings in MySQL
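To see why SPLIT_STRING(str, ':', 4) returns 18, here is a Python model of the same expression. It takes everything up to the 4th delimiter, cuts off everything up to the 3rd, and removes the leftover delimiter. The helper `substring_index` is a simplified model of MySQL's SUBSTRING_INDEX(), written only for this illustration.

```python
def substring_index(s, delim, count):
    """Simplified model of MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything right of it, counting from the end
    return ""


def split_string(s, delim, pos):
    """Model of SPLIT_STRING: return the pos-th field (1-based)."""
    upto_pos = substring_index(s, delim, pos)
    prefix_len = len(substring_index(s, delim, pos - 1))
    # MySQL SUBSTRING(x, n + 1) corresponds to x[n:] in Python.
    return upto_pos[prefix_len:].replace(delim, "")


print(split_string("first:10:second:18:third:31", ":", 4))  # 18
```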
The problem with MySQL is that its regex flavor is very limited and does not support back references or regex replace, which pretty much makes it impossible to replace the value the way you want with MySQL alone.
I know it means taking a speed hit, but you may want to consider selecting the row you want by its id (or however you select it), modifying the value with PHP or whatever language you have interfacing with MySQL, and putting it back with an UPDATE query.
Generally speaking, REGEX in programming languages is much more powerful.
If you keep those queries slim and quick, you shouldn't take too big of a speed hit (probably negligible).
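As a sketch of that application-side step, here is the replacement done in Python (standing in for PHP or any other client language). The dictionary `TABLEB` is a hypothetical stand-in for the lookup against tableb; in a real application you would SELECT the row, transform the value like this, and write it back with an UPDATE. Database access is not shown.

```python
import re

# Hypothetical stand-in for tableb: maps id -> otherid.
TABLEB = {18: 22}

# Capture the number between 'second:' and ':third'.
SECOND_FIELD = re.compile(r"(second:)(\d+)(:third)")


def remap_second(value, lookup):
    """Replace the number between 'second:' and ':third' using lookup.
    Numbers missing from lookup are left unchanged."""
    def repl(match):
        old_key = int(match.group(2))
        new_key = lookup.get(old_key, old_key)
        return f"{match.group(1)}{new_key}{match.group(3)}"
    return SECOND_FIELD.sub(repl, value)


print(remap_second("first:10:second:18:third:31", TABLEB))
```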
Also, here is documentation on what MySQL's REGEX CAN do. http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Cheers
EDIT:
To be honest, eggyal's comment makes a whole lot more sense for your situation (simple int values). Just break them up into columns; there's no reason to access them like that at all, imo.
You want something like this, where it matches the group:
WHERE mycolumn REGEXP 'second:([0-9]*):third'
However, MySQL doesn't have a regex replace function, so you would have to use a user-defined function:
REGEXP_REPLACE(text, pattern, replace [,position [,occurence [,return_end [,mode]]])
User-defined function is available here:
http://www.mysqludf.org/lib_mysqludf_preg/
Using SQL-Server 2008 and concatenating string literals to more than 8000 characters by obvious modification of the following script, I always get the result 8000. Is there a way to tag string literals as varchar(max)?
DECLARE @t TABLE (test varchar(max));
INSERT INTO @t VALUES ( '0123456789012345678901234567890123456789'
+ '0123456789012345678901234567890123456789'
+ '... and 200 times the previous line'
);
select datalength(test) from @t
I used the following code on SQL Server 2008
CREATE TABLE [dbo].[Table_1](
[first] [int] IDENTITY(1,1) NOT NULL,
[third] [varchar](max) NOT NULL
) ON [PRIMARY]
GO
declare @maxVarchar varchar(max)
set @maxVarchar = (REPLICATE('x', 7199))
set @maxVarchar = @maxVarchar + (REPLICATE('x', 7199))
select LEN(@maxVarchar)
insert into table_1 (third)
values (@maxVarchar)
select LEN(third), SUBSTRING (REVERSE(third),1,1) from table_1
The value you are inserting in your example is being stored temporarily as a varchar(8000), because concatenating string values that are not varchar(max) yields a result that is truncated at 8000 bytes. To insert a longer value, use a variable of type varchar(max) and append to it to get past the 8000 limit.
Try casting the first value being concatenated as varchar(max), so that the whole concatenation is evaluated as varchar(max):
INSERT INTO @t VALUES (CAST('0123456789012345678901234567890123456789' AS varchar(max))
+ '0123456789012345678901234567890123456789'
+ '... and 200 times the previous line'
);
Also, you may have to concatenate several strings shorter than 8000 characters (each cast as varchar(max)).
See this MSDN Forum Post.
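The truncation rule can be illustrated with a simplified Python model of left-to-right `+` on varchar values: the running result is cut to 8000 characters unless some operand seen so far is varchar(max). This is only a model of the behavior described above, not SQL Server's implementation; it assumes single-byte characters and ignores other typing details. With 202 lines of 40 characters (8080 characters), plain literals come out at 8000, while casting the first operand keeps all 8080.

```python
LIMIT = 8000  # maximum length of a non-max varchar (single-byte characters assumed)


def concat(operands):
    """operands: list of (text, is_max) pairs, combined left to right."""
    text, is_max = operands[0]
    for part, part_is_max in operands[1:]:
        # Once any operand is varchar(max), the running result is varchar(max).
        is_max = is_max or part_is_max
        text = text + part
        if not is_max:
            text = text[:LIMIT]  # non-max results are truncated
    return text


line = "0123456789" * 4               # the 40-character literal from the question
literals = [(line, False)] * 202      # 202 lines * 40 characters = 8080 characters

plain_len = len(concat(literals))                         # no CAST
cast_len = len(concat([(line, True)] + literals[1:]))     # first operand CAST to varchar(max)
print(plain_len, cast_len)
```

This is also why casting the whole expression after the concatenation does not help: by the time the CAST is applied, the non-max concatenation has already been truncated.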
When I posted the question, I was convinced that there are some limitations for the length or maximal line width of a single string literal to be used in INSERT and UPDATE statement.
This assumption is wrong.
I was led to this impression by the fact that SSMS limits the output width of a single column in text mode to 8192 characters and the output of PRINT statements to 8000 characters.
Fact is, as far as I know you need only enclose the string with apostrophes and double all embedded apostrophes. I found no restrictions concerning width or total length of a string.
For the opposite task, converting such strings from the database back to a script, the best tool I found is SSMS Tools Pack, which works for SQL Server 2005+.