In my MySQL database I have a column of strings in UTF-8 format for which I want to extract the first character using a RegEx, for example.
Assuming a RegEx which ONLY extracts the following characters:
ਹਮਜਰਣਚਕਨਖਲਨ
And given the following string:
ਹੁਕਮਿ ਰਜਾਈ ਚਲਣਾ ਨਾਨਕ ਲਿਖਿਆ ਨਾਲਿ ॥੧॥
The only characters extracted would be:
ਹਰਚਨਲਨ
I know the following steps would be required to solve this problem:
Break the string into individual words (substrings) by using space as the delimiter
For each word extract the first letter (substring of a substring) if it matches what is in the regex of valid characters
I have looked at all the similar questions/answers on SO and none have been able to solve my problem thus far.
I realy don't know MySql Regex Syntax and restrictions(never used), but you can add leading space before string, and match with something simple like this: " ([ਮਜਰਣਚਕਨਖਲਨ]{1})"
So, if you concatenate matched groups you will have this string "ਰਚਨਲਨ"(only "ਹ" not matched, because it's not exists in sample")
in C# it may look like this(working sample):
namespace TestRegex
{
using System.Linq;
using System.Text.RegularExpressions;
using System.Windows.Forms;
class Program
{
static void Main(string[] args)
{
// leading space(to match first word too)
// + sample string
var sample = " ";
sample += "ਹੁਕਮਿ ਰਜਾਈ ਚਲਣਾ ਨਾਨਕ ਲਿਖਿਆ ਨਾਲਿ ॥੧॥";
// Regex pattern that will math space, and
// if next character in set - add it to "match group 1"
var pattern = " ([ਮਜਰਣਚਕਨਖਲਨ]{1})";
// select every "match group 1" from matches as array
var result = from Match m in Regex.Matches(sample, pattern)
select m.Groups[1];
// concatenate array content into one string and
// show it in message box to user, for example..
MessageBox.Show(string.Concat(result));
}
}
}
in most non-query languages it will be look almost same. For example in php you need to do preg_match_all, and in foreach loop add "$match[i][1]"(every "match group 1") from every match to end of one single string.
well.. pretty simple. but not for mysql...
I finally achieved this with the help of a programmer friend of mine. I directly pasted the following piece of code into the SQL section of my database in PhpMyAdmin:
delimiter $$
drop function if exists `initials`$$
CREATE FUNCTION `initials`(str text, expr text) RETURNS text CHARSET utf8
begin
declare result text default '';
declare buffer text default '';
declare i int default 1;
if(str is null) then
return null;
end if;
set buffer = trim(str);
while i <= length(buffer) do
if substr(buffer, i, 1) regexp expr then
set result = concat( result, substr( buffer, i, 1 ));
set i = i + 1;
while i <= length( buffer ) and substr(buffer, i, 1) regexp expr do
set i = i + 1;
end while;
while i <= length( buffer ) and substr(buffer, i, 1) not regexp expr do
set i = i + 1;
end while;
else
set i = i + 1;
end if;
end while;
return result;
end$$
drop function if exists `acronym`$$
CREATE FUNCTION `acronym`(str text) RETURNS text CHARSET utf8
begin
declare result text default '';
set result = initials( str, '[ੴਓੳਅੲਸਹਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਵੜਸ਼ਖ਼ਗ਼ਜ਼ਫ਼ਲ਼]' );
return result;
end$$
delimiter ;
UPDATE scriptures SET search = acronym(scripture)
Just to explain the last line:
scriptures is the table I want to update
search is a new empty column I created inside the table to store the result
scripture is an existing column inside the scriptures table with all the strings I want to extract from
acronym is the function previously declared which is looking to match the first letter of each word with a character from the RegEx [ੴਓੳਅੲਸਹਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਵੜਸ਼ਖ਼ਗ਼ਜ਼ਫ਼ਲ਼]
So this final line of the code will go through each row of the column scripture, apply the function acronym to it and store the result in the new search column.
Perfect! Exactly what I was looking for :)
Related
I have a column with both English and Chinese text.
Example: The hills have eyes. 隔山有眼
Expected results: The hills have eyes.
How can I extract the English text from that string using sql, please.
Thanks for help.
A quick-and-dirty way simply converts the string to ASCII and removes the '?' -- which is the representation of the other characters:
select replace(convert(t.str using ascii), '?', '')
from t;
The only downside is that you lose '?' characters in the original string as well.
Here is a db<>fiddle.
For more control over the replacement, you can use regexp_replace():
select regexp_replace(t.str, '[^a-zA-Z0-9.?, ]', '')
from t;
Unfortunately, I am not aware of a character class for ASCII-only characters.
One option you have is to use a function that returns just the english only text.
Additionally, you could make it dual-purpose with another parameter to determine if you want the English text or Non-English text to switch the <127 comparison.
CREATE FUNCTION `EnglishOnly`(String VARCHAR(100))
RETURNS varchar(100)
NO SQL
BEGIN
DECLARE output VARCHAR(100) DEFAULT '';
DECLARE i INTEGER DEFAULT 1;
DECLARE ch varchar(1);
IF LENGTH(string) > 0 THEN
WHILE(i <= LENGTH(string)) DO
SET ch=SUBSTRING(string, i, 1);
IF ASCII(ch)<127 then
set output = CONCAT(output,ch);
END IF;
SET i = i + 1;
END WHILE;
END IF;
RETURN output;
END;
You can then sinply use it like so
select EnglishOnly ("The hills have eyes 隔山有眼that see all.")
Output
The hills have eyes that see all.
Example Fiddle
The intended result is to store the notes of edits to a field, in another field.
I want the new notes to APPEND to the storage field, and since the is not function that does this I am attmpting to find a way to work this out without adding more layers of code like functions and stored procedures.
/* Before Update Trigger */
DECLARE v_description VARCHAR(255);
DECLARE v_permnotes MEDIUMTEXT;
DECLARE v_oldnote VARCHAR(500);
DECLARE v_now VARCHAR(25);
SET v_now = TRIM(DATE_FORMAT(NOW(), '%Y-%m-%d %k:%i:%s'));
SET v_oldnote = OLD.notes;
IF (NEW.permanent_notes IS NULL) THEN
SET v_permnotes = '';
ELSE
SET v_permnotes = OLD.permanent_notes;
END IF;
SET NEW.permanent_notes = CONCAT_WS(CHAR(10), v_permnotes, v_now,": ", v_description);
I'm aiming to have the results in the permanent field look like this
<datetime value>: Some annotation from the notes field.
<a different datetime>: A new annotation
etc....
What I get from my current trigger:
2018-12-30 17:15:50
:
Test 17: Start from scratch.
2018-12-30 17:35:51
:
Test 18: Used DATE_FORMAT to sxet the time
2018-12-30 17:45:52
:
Test 19. Still doing a carriage return after date and after ':'
I can't figure out why there is a newline after the date, and then again after the ':'.
If I leave out CHAR(10), I get:
Test 17: Start from scratch.
2018-12-30 17:35:51
:
Test 18: Used DATE_FORMAT to sxet the time
2018-12-30 17:45:52
:
Test 19. Still doing a carriage return after date and after ':'Test 20. Still doing a carriage return after date and after ':'
Some fresh/more experienced eyes would be really helpful in debugging this.
Thanks.
I think you should just be using plain CONCAT here:
DECLARE separator VARCHAR(1);
IF (NEW.permanent_notes IS NULL) THEN
SET separator = '';
ELSE
SET separator = CHAR(10)
END IF;
-- the rest of your code as is
SET
NEW.permanent_notes = CONCAT(v_permnotes, separator, v_now, ": ", v_description);
The logic here is that we conditionally print a newline (CHAR(10)) before each new log line, so long as that line is not the very first. You don't really want CONCAT_WS here, which is mainly for adding a separator in between multiple terms. You only want a single newline in between each logging statement.
Is there a case insensitive Replace for MySQL?
I'm trying to replace a user's old username with their new one within a paragraph text.
$targetuserold = "#".$mynewusername;
$targetusernew = "#".$newusername;
$sql = "
UPDATE timeline
SET message = Replace(message,'".$targetuserold."', '".$targetusernew."')
";
$result = mysql_query($sql);
This is missing the instances where the old username is a different case. Example: replacing "Hank" with "Jack" in all the rows in my database will leave behind instances of "hank".
An easier way that works without any stored function:
SELECT message,
substring(comments,position(lower('".$targetuserold."') in message) ) AS oldval
FROM timeline
WHERE message LIKE '%".$targetuserold."%'
gives you the exact, case sensitive spellings of the username in all messages. As you seem to run that from a PHP script, you could use that to collect the spellings together with the corresponding IDs, and then run a simple REPLACE(message,'".$oldval.",'".$targetusernew."') on that. Or use the above as sub-select:
UPDATE timeline
SET message = REPLACE(
message,
(SELECT substring(comments,position(lower('".$targetuserold."') in message))),
'".$targetusernew."'
)
Works like a charm here.
Credits given to this article, where I got the idea from.
Here it is:
DELIMITER $$
DROP FUNCTION IF EXISTS `replace_ci`$$
CREATE FUNCTION `replace_ci` ( str TEXT,needle CHAR(255),str_rep CHAR(255))
RETURNS TEXT
DETERMINISTIC
BEGIN
DECLARE return_str TEXT DEFAULT '';
DECLARE lower_str TEXT;
DECLARE lower_needle TEXT;
DECLARE pos INT DEFAULT 1;
DECLARE old_pos INT DEFAULT 1;
SELECT lower(str) INTO lower_str;
SELECT lower(needle) INTO lower_needle;
SELECT locate(lower_needle, lower_str, pos) INTO pos;
WHILE pos > 0 DO
SELECT concat(return_str, substr(str, old_pos, pos-old_pos), str_rep) INTO return_str;
SELECT pos + char_length(needle) INTO pos;
SELECT pos INTO old_pos;
SELECT locate(lower_needle, lower_str, pos) INTO pos;
END WHILE;
SELECT concat(return_str, substr(str, old_pos, char_length(str))) INTO return_str;
RETURN return_str;
END$$
DELIMITER ;
Usage:
$sql = "
UPDATE timeline
SET message = replace_ci(message,'".$targetuserold."', '".$targetusernew."')
";
My solution ultimately was that I cannot do a case insensitive Replace.
However, I did find a workaround.
I was trying to have a feature where a user can change their username. The system would then need to update wherever #oldusername was found in all the messages in the database.
The problem was... people wouldn't type other people's usernames in the correct case that it is found in the members table. So when the user would change their username, it wouldn't catch those instances of #oldSeRNAmE because of it not matching the case of the real format of the oldusername.
I don't have permission with my GoDaddy shared server to do this with a customized SQL function, so I had to find a different way.
My solution: Upon inserting new messages into the database, whenever a username is found in the new message, I have an UPDATE statement at that point to replace the username they typed with the correct formatted case that is found in the members table. That way, if that person ever wants to change their username in the future, all the instances of that username in the database will all be the same exact formatted case. Problem solved.
I keep getting a error when i run this code, What wrong with the code?
create or replace function f_vars(line varchar2,delimit varchar2 default ',')
return line_type is type line_type is varray(1000) of varchar2(3000);
sline varchar2 (3000);
line_var line_type;
pos number;
begin
sline := line;
for i in 1 .. lenght(sline)
loop
pos := instr(sline,delimit,1,1);
if pos =0 then
line_var(i):=sline;
exit;
endif;
string:=substr(sline,1,pos-1);
line_var(i):=string;
sline := substr(sline,pos+1,length(sline));
end loop;
return line_var;
end;
LINE/COL ERROR
20/5 PLS-00103: Encountered the symbol "LOOP" when expecting one of
the following:
if
22/4 PLS-00103: Encountered the symbol "end-of-file" when expecting
one of the following:
end not pragma final instantiable order overriding static
member constructor map
Stack Overflow isn't really a de-bugging service.
However, I'm feeling generous.
You have spelt length incorrectly; correcting this should fix your first error. Your second is caused by endif;, no space, which means that the if statement has no terminator.
This will not correct all your errors. For instance, you're assigning something to the undefined (and unnecessary) variable string.
I do have more to say though...
I cannot over-emphasise the importance of code-style and whitespace. Your code is fairly unreadable. While this may not matter to you now it will matter to someone else coming to the code in 6 months time. It will probably matter to you in 6 months time when you're trying to work out what you wrote.
Secondly, I cannot over-emphasise the importance of comments. For exactly the same reasons as whitespace, comments are a very important part of understanding how something works.
Thirdly, always explicitly name your function when ending it. It makes things a lot clearer in packages so it's a good habit to have and in functions it'll help with matching up the end problem that caused your second error.
Lastly, if you want to return the user-defined type line_type you need to declare this _outside your function. Something like the following:
create or replace object t_line_type as object ( a varchar2(3000));
create or replace type line_type as varray(1000) of t_line_type;
Adding whitespace your function might look something like the following. This is my coding style and I'm definitely not suggesting that you should slavishly follow it but it helps to have some standardisation.
create or replace function f_vars ( PLine in varchar2
, PDelimiter in varchar2 default ','
) return line_type is
/* This function takes in a line and a delimiter, splits
it on the delimiter and returns it in a varray.
*/
-- local variables are l_
l_line varchar2 (3000) := PLine;
l_pos number;
-- user defined types are t_
-- This is a varray.
t_line line_type;
begin
for i in 1 .. length(l_line) loop
-- Get the position of the first delimiter.
l_pos := instr(l_line, PDelimiter, 1, 1);
-- Exit when we have run out of delimiters.
if l_pos = 0 then
t_line_var(i) := l_line;
exit;
end if;
-- Fill in the varray and take the part of a string
-- between our previous delimiter and the next.
t_line_var(i) := substr(l_line, 1, l_pos - 1);
l_line := substr(l_line, l_pos + 1, length(l_line));
end loop;
return t_line;
end f_vars;
/
Is there a way of enabling a long strings to be put onto multiple lines so that when viewed on screen or printed the code is easier to read?
Perhaps I could be clearer.
Have a stored procedure with lines like
IF ((select post_code REGEXP '^([A-PR-UWYZ][A-HK-Y]{0,1}[0-9]{1,2} [0-9][ABD-HJLNP-UW-Z]{2})|([A-PR-UWYZ][0-9][A-HJKMPR-Y] [0-9][ABD-HJLNP-UW-Z]{2})|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y]) [0-9][ABD-HJLNP-UW-Z]{2})$') = 0)
Would like to be able to modify the string so that I can view it within 80 character width. Anybody got any ideas of how to do this.
PS: It is the regular expression for UK postcodes
For example,
-- a very long string in one block
set my_str = 'aaaabbbbcccc';
can be also written as
-- a very long string, as a concatenation of smaller parts
set my_str = 'aaaa' 'bbbb' 'cccc';
or even better
-- a very long string in a readable format
set my_str = 'aaaa'
'bbbb'
'cccc';
Note how the spaces and end of line between the a/b/c parts are not part of the string itself, because of the placement of quotes.
Also note that the string data here is concatenated by the parser, not at query execution time.
Writing something like:
-- this is broken
set my_str = 'aaaa
bbbb
cccc';
produces a different result.
See also
http://dev.mysql.com/doc/refman/5.6/en/string-literals.html
Look for "Quoted strings placed next to each other are concatenated to a single string"
You could split it up into the front and back components of the postcode and then dump the whole lot into a UDF.
This will keep the ugliness in one place and means you'll only have to make changes to one block of code when/if Royal Mail decide to change the format of UK postcodes ;-)
Something like this should do the trick:
DELIMITER $$
CREATE FUNCTION `isValidUKPostcode`(candidate varchar(255)) RETURNS BOOLEAN READS SQL DATA
BEGIN
declare back varchar(3);
declare front varchar(10);
declare v_out boolean;
set back = substr(candidate,-3);
set front = substr(candidate,1,length(candidate)-3);
set v_out = false;
IF (back REGEXP '^[0-9][ABD-HJLNP-UW-Z]{2}$'= 1) THEN
CASE
WHEN front REGEXP '^[A-PR-UWYZ][A-HK-Y]{0,1}[0-9]{1,2} $' = 1 THEN set v_out = true;
WHEN front REGEXP '^[A-PR-UWYZ][0-9][A-HJKMPR-Y] $' = 1 THEN set v_out = true;
WHEN front REGEXP '^[A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y] $' = 1 THEN set v_out = true;
END CASE;
END IF;
return v_out;
END