I need to remove specific text from a string value in SQL.
I've tried various CHARINDEX and LEN combinations but keep getting it wrong!
I have a name field which, contains names. Some of the fields have text in () added in.
Example :
Smith (formerly Jones)
I need to remove the whole section inside the brackets. as well as the brackets themselves. Unfortunately sometimes the value can be
Smith (formerly Jones) Reeves
So I can't just remove everything from the ( onwards!
Here are two examples how to accomplish this. You can do this without declaring the #StartIndex and #EndIndex variables, but I have used them for the sake of clarity.
DECLARE #StartIndex int, #EndIndex int;
DECLARE #Str varchar(100);
SET #Str = 'This is a (sentence to use for a) test';
SELECT #StartIndex = CHARINDEX('(', #Str, 0), #EndIndex = CHARINDEX(')', #Str, 0);
SELECT SUBSTRING(#Str, 0, #StartIndex) + SUBSTRING(#Str, #EndIndex + 1, LEN(#Str) - #EndIndex) AS [Method1],
LEFT(#Str, #StartIndex - 1) + RIGHT(#Str, LEN(#Str) - #EndIndex) AS [Method2];
Note that this code does not remove the spaces before or after the parentheses, so you end up with two spaces between "a" and "test" (since that wasn't part of your question).
Additional error checking should be included before actually using code like this as well, for example if #Str does not contain parentheses it would cause an error.
Related
I have a table where each row consist of an attribute which consist of html data with like this.
<div className="single_line"><p>New note example</p></div>
I need to omit the html tags and extract only the data inside the tags using sql query. Any idea on how to achieve this?. I tried out different regex but they didnt work.
There are 2 solutions based on mysql version.
If you are using MySQL 8.0 then you can use REGEXP_REPLACE() directly inside the select statement.
SELECT REGEXP_REPLACE('<div><p>New note example</p></div>', '(<[^>]*>)|( )', '');
If you are using MySQL 5.7 then you have to create a user define function in database to strip html tags.
DROP FUNCTION IF EXISTS fn_strip_html_tags;
CREATE FUNCTION fn_strip_html_tags( html_text TEXT ) RETURNS TEXT
BEGIN
DECLARE start,end INT DEFAULT 1;
DECLARE text_without_nbsp TEXT;
LOOP
SET start = LOCATE("<", html_text, start);
IF (!start) THEN RETURN html_text; END IF;
SET end = LOCATE(">", html_text, start);
IF (!end) THEN SET end = start; END IF;
SET text_without_nbsp = REPLACE(html_text, " ", " ");
SET html_text = INSERT(text_without_nbsp, start, end - start + 1, "");
END LOOP;
END
For example
SELECT fn_strip_html_tags('<div><p>New note example</p></div>');
Preface:
I've done quite a bit of (re)searching on this, and found the following SO post/answer: https://stackoverflow.com/a/5361490/6095216 which was pretty close to what I'm looking for. The same code, but with somewhat more helpful comments, appears here: http://thenoyes.com/littlenoise/?p=136 .
Problem Description:
I need to split 1 column of MySQL TEXT data into multiple columns, where the original data has this format (N <= 7):
{"field1":"value1","field2":"value2",...,"fieldN":"valueN"}
As you might guess, I only need to extract the values, putting each one into a separate (predefined) column. The problem is that the number and order of the fields is not guaranteed to be the same for all records. Thus, solutions using SUBSTR/LOCATE, etc. don't work, and I need to use regular expressions. Another restriction is that 3rd party libraries such as LIB_MYSQLUDF_PREG (suggested in the answer from my 1st link above) cannot be used.
Solution/Progress so far:
I've modified the code from the above links such that it returns the first/shortest match, left-to-right; otherwise, NULL is returned. I also refactored it a bit and made the identifiers more reader/maintainer-friendly :)
Here's my version:
CREATE FUNCTION REGEXP_EXTRACT_SHORTEST(string TEXT, exp TEXT)
RETURNS TEXT DETERMINISTIC
BEGIN
DECLARE adjustStart, adjustEnd BOOLEAN DEFAULT TRUE;
DECLARE startInd INT DEFAULT 1;
DECLARE endInd, strLen INT;
DECLARE candidate TEXT;
IF string NOT REGEXP exp THEN
RETURN NULL;
END IF;
IF LEFT(exp, 1) = '^' THEN
SET adjustStart = FALSE;
ELSE
SET exp = CONCAT('^', exp);
END IF;
IF RIGHT(exp, 1) = '$' THEN
SET adjustEnd = FALSE;
ELSE
SET exp = CONCAT(exp, '$');
END IF;
SET strLen = LENGTH(string);
StartIndLoop: WHILE (startInd <= strLen) DO
IF adjustEnd THEN
SET endInd = startInd;
ELSE
SET endInd = strLen;
END IF;
EndIndLoop: WHILE (endInd <= strLen) DO
SET candidate = SUBSTRING(string FROM startInd FOR (endInd - startInd + 1));
IF candidate REGEXP exp THEN
RETURN candidate;
END IF;
IF adjustEnd THEN
SET endInd = endInd + 1;
ELSE
LEAVE EndIndLoop;
END IF;
END WHILE EndIndLoop;
IF adjustStart THEN
SET startInd = startInd + 1;
ELSE
LEAVE StartIndLoop;
END IF;
END WHILE StartIndLoop;
RETURN NULL;
END;
I then added a helper function to avoid having to repeat the regex pattern, which, as you can see from above, is the same for all the fields. Here is that function (I left my attempt to use a lookbehind - unsupported in MySQL - as a comment):
CREATE FUNCTION GET_MY_FLD_VAL(inputStr TEXT, fldName TEXT)
RETURNS TEXT DETERMINISTIC
BEGIN
DECLARE valPattern TEXT DEFAULT '"[^"]+"'; /* MySQL doesn't support lookaround :( '(?<=^.{1})"[^"]+"'*/
DECLARE fldNamePat TEXT DEFAULT CONCAT('"', fldName, '":');
DECLARE discardLen INT UNSIGNED DEFAULT LENGTH(fldNamePat) + 2;
DECLARE matchResult TEXT DEFAULT REGEXP_EXTRACT_SHORTEST(inputStr, CONCAT(fldNamePat, valPattern));
RETURN SUBSTRING(matchResult FROM discardLen FOR LENGTH(matchResult) - discardLen);
END;
Currently, all I'm trying to do is a simple SELECT query using the above code. It works correctly, BUT IT. IS. SLOOOOOOOW... There are only 7 fields/columns to split into, max (not all records have all 7)! Limited to 20 records, it takes about 3 minutes - and I have about 40,000 records total (not very much for a database, right?!) :)
And so, finally, we get to the actual question: [how] can the above algorithm/code (pretty much a brute search at this point) be improved SIGNIFICANTLY performance-wise, such that it can be run on the actual database in a reasonable amount of time? I started looking into the major known pattern-matching algorithms, but quickly got lost trying to figure out what would be appropriate here, in large part due to the number of available options and their respective restrictions, conditions for use, etc. Plus, it seems like implementing one of these in SQL just to see if it would help, might be a lot of work.
Note: this is my first post ever(!), so please let me know (nicely) if something is not clear, etc. and I will do my best to fix it. Thanks in advance.
I was able to solve this by parsing the JSON, as suggested by tadman and Matt Raines above. Being new to the concept of JSON, I just didn't realize it could be done this way at all...a little embarrassing, but lesson learned!
Anyway, I used the get_option function in the common_schema framework: https://code.google.com/archive/p/common-schema/ (found through this post, which also demonstrates how to use the function: Parse JSON in MySQL ). As a result, my INSERT query took about 15 minutes to run, vs the 30+ hours it would've taken with the REGEXP solution. Thanks, and until next time! :)
Don't do it in SQL; do it in PHP or some other language that has builtin tools for parsing JSON.
i want to use a anchor tag in stored procedure that would be sent as a body of the mail and it's not working
declare #loginlink VARCHAR(MAX)=N'';
SELECT #loginlink= N''+ CONVERT(varchar(36),#loginlink)+'ClickMe ' AS CaseLink
If you want to know how to set a T-SQL variable to an HTML anchor, then you could use the code below. Just replace the href value with the value in your situation.
Note the use of two single quotes around href value. This is necessary since we want single quote to be around the href value and we are using single quotes for enclosing string in SQL Server.
declare #loginlink VARCHAR(MAX)=N'';
set #loginlink = '<a href=''http://www.yahoo.com''>Click Me</a>';
SELECT #loginlink as CaseLink;
If you want to use a T-SQL variable for the href part of anchor link, then you can use code like below.
declare #loginlink VARCHAR(MAX)=N'';
DECLARE #href varchar(500);
SET #href = 'http://www.yahoo.com';
set #loginlink = '<a href=''' + #href + '''>Click Me</a>';
SELECT #loginlink as CaseLink;
If you are sending this link using database mail then you need to create a html page with all relevant html tags and send that as body, plus you need to make sure that the mail format is Html.
declare #body nvarchar(max)
set #body = '<html><head><title>Results</title></head><body>' + #loginlink + '</body></html>'
EXEC sp_send_dbmail
#profile_name='emailProfile',
#copy_recipients ='abc#xyz.com',
#recipients='opq#oxyz.com',
#subject='New Results',
#body=#body , --this needs to be an html with all necessary html tags
#body_format = 'HTML' ;
OMG it was a very silly mistake i did
EXECUTE AS USER = 'dbo'
EXEC msdb.dbo.sp_send_dbmail
#recipients='abc.com ',
#copy_recipients='abc.com',
--#blind_copy_recipients=
#from_address = 'abc.com',
--#reply_to=
#importance='High', -- High | Normal | Low
#profile_name = 'test',
#subject = #EmailSubject,
#body = #EmailBody,
#body_format='HTML'; --TEXT | HTML
#body_format='HTML'; --TEXT | HTML ::: my body format of mail was set to text that is why it was not treating anchor tag as it
I have a form with 7 TEdit having name EditPhone1, EditPhone2 and so on.
In the same form I query a DB to get data to fill those TEdits. Of course I cannot know in advance how many results the query will return.
How can I call the various TEdit objects when looping on the rowcount of the query?
Use FindComponent to "convert" a component name to the component itself:
var
Edit: TEdit;
I: Integer;
begin
DataSet.First;
I := 1;
while not DataSet.Eof do
begin
Edit := TEdit(FindComponent(Format('EditPhone%d', [I])));
if Edit <> nil then
Edit.Text := DataSet.FieldValues['PhoneNo'];
DataSet.Next;
Inc(I);
end;
Now, this requires to hard-code the EditPhone%d string into the source which results in all kinds of maintainability issues. For example: consider renaming the edits.
Alternative 1:
To not rely on the component names, you could instead make use of TLama's idea and add all the edits to a list:
uses
... , Generics.Collections;
type
TForm1 = class(TForm)
EditPhone1: TEdit;
...
procedure FormCreate(Sender: TObject);
procedure FormDestroy(Sender: TObject);
private
FEdits: TList<TEdit>;
end;
procedure TForm1.FormCreate(Sender: TObject);
begin
FEdits := TList<TEdit>.Create;
FEdits.AddRange([EditPhone1, EditPhone2, EditPhone3, EditPhone4, EditPhone5,
EditPhone6, EditPhone7]);
end;
procedure TForm1.FormDestroy(Sender: TObject);
begin
FEdits.Free;
end;
procedure TForm1.ADOQuery1AfterOpen(DataSet: TDataSet);
var
I: Integer;
begin
DataSet.First;
I := 0;
while (not DataSet.Eof) and (I < FEdits.Count) do
begin
FEdits[I].Text := DataSet.FieldValues['PhoneNo'];
DataSet.Next;
Inc(I);
end;
end;
This still requires some maintenance in case of adding edits in future.
Alternative 2:
You could also loop over all edits in the form to find the ones tagged to be added to the list, instead of adding them each explicitly:
procedure TForm1.FormCreate(Sender: TObject);
var
I: Integer;
begin
FEdits := TList<TEdit>.Create;
for I := 0 to ComponentCount - 1 do
if (Components[I] is TEdit) and (TEdit(Components[I]).Tag = 1) then
FEdits.Add(TEdit(Components[I]));
end;
But keeping those tags up to date is another burden.
Alternative 3:
I suggest you use a TDBGrid which is a data-component. Opening the linked dataset will automatically add all phone numbers to the grid. With some settings, the grid may kind of look like a couple of edits below each other.
You can, for example, use Tag property, to find needed component. Set all you TEdit's tag from 1 to 7 (or more), and find component by:
Var I: Integer;
MyEdit : TEdit;
For I = 0 To Self.ComponentCount - 1 Do
if (Self.Components[I] IS TEdit) AND (Self.Components[I] AS TEdit).Tag = YourTag
MyEdit = (Self.Components[I] AS TEdit);
You can also dynamically create so many TEdits, you need, and assign Tag property on creation, and find it this code later in runtime.
I'd suggest using DBCtrlGrid. You place your controls for one row on it, and it repeats the controls for as many rows as your data set has.
Get query result (usually using .RowCount property of TDataset return)
After getting the number of row, do iteration to make TEdit and set the text property
Here is sample of code:
...
For i:=0 to RowCount do
Begin
A:=TEdit.Create(self);
A.Parent:=AForm;
A.Top:=i*14;
A.Text:=ADataset.Field(i).AsString;
End;
...
I have a strings stored in my database formatted as html, and users can change the font size. That's fine, but I need to make a report and the font sizes all need to be the same. So, if I have the following html, I want to modify it to have a font size of 10:
<HTML><BODY><DIV STYLE="text-align:Left;font-family:Tahoma;font-style:normal;font-weight:normal;font-size:11;color:#000000;"><DIV><DIV><P><SPAN>This is my text to display.</SPAN></P></DIV></DIV></DIV></BODY></HTML>
I have a user defined function, but apparently, I can't use wildcards in a REPLACE, so it doesn't actually do anything:
ALTER FUNCTION [dbo].[udf_SetFont]
(#HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
RETURN REPLACE (#HTMLText, 'font-size:%;', 'font-size:10;')
END
(Of course, it would be even better if I sent the font size as a parameter, so I could change it to whatever.)
How do I modify this to change any string so the font size is 10?
This appears to work, although I've only tried it on one string (which has the font set in 2 places). I started with code that strips ALL html and modified it to only look for and change 'font-size:*'. I suspected there would be issues if the font size is 9 or less (1 character) and I'm changing it to 10 (2 chars), but it seems to work for that too.
ALTER FUNCTION [dbo].[udf_ChangeFont]
(#HTMLText VARCHAR(MAX), #FontSize VARCHAR(2))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE #Start INT
DECLARE #End INT
DECLARE #Length INT
SET #Start = CHARINDEX('font-size:',#HTMLText)
SET #End = CHARINDEX(';',#HTMLText,CHARINDEX('font-size:',#HTMLText))
SET #Length = (#End - #Start) + 1
WHILE #Start > 0
AND #End > 0
AND #Length > 0
BEGIN
SET #HTMLText = STUFF(#HTMLText,#Start,#Length,'font-size:' + #FontSize + ';')
SET #Start = CHARINDEX('font-size:',#HTMLText, #End+2)
SET #End = CHARINDEX(';',#HTMLText,CHARINDEX('font-size:',#HTMLText, #End+2))
SET #Length = (#End - #Start) + 1
END
RETURN LTRIM(RTRIM(#HTMLText))
END
DECLARE #HTML NVarChar(2000) = '
<HTML>
<BODY>
<DIV STYLE="text-align:Left;font-family:Tahoma;font-style:normal;font-weight:normal;font-size:11;color:#000000;">
<DIV>
<DIV>
<P><SPAN>This is my text to display.</SPAN></P>
</DIV>
</DIV>
</DIV>
</BODY>
</HTML>';
DECLARE #X XML = #HTML;
WITH T AS (
SELECT C.value('.', 'VarChar(1000)') StyleAttribute
FROM #X.nodes('//#STYLE') D(C)
)
SELECT *
FROM T
WHERE T.StyleAttribute LIKE '%font-size:%';
From here I'd use a CLR function to split the StyleAttribute column on ;. Then look for the piece(s) that begin with font-size: and split again on :. TryParse the second element of that result and if it isn't 10, replace it. You'd then build up your string to get the value that StyleAttribute should have. From there you can do a REPLACE looking for the original value (from the table above) and substituting the output of the CLR function.
Nasty problem...good luck.
As Yuck said, SQL Server string functions pretty limited. You'll eventually run into a wall where your best bet is to resort to non-SQL solutions.
If you absolutely need to store HTML with embedded styles are you currently have, but also have the flexibility to revise your data model, you might want to consider adding a second database column to your table. The second column would store the style-free version of the HTML. You could parse out the styling at the application layer. That would make it a lot easier to view the contents in future reports and other scenarios.