How to search for a specific word in a long string of text field and count them - sqlserver - sql-server-2008

I am trying to count how many times a word appears in a string or sentence stored in a text field. For example:
DECLARE @text AS nvarchar(max)
SET @text = 'Bounce check this date, bounce again then bounce and then bounce again this date. Check BOUNCED'
So, I want to count how many times "bounce" appears there. My goal is to see how our customers perform with their check issuance; we record each incident in a sentence like this one.
I have tried the following code, but it returns more than it should: "bounce" appears 5 times, but my code counts 8.
DECLARE @text as nvarchar(max)
SET @text = 'Bounce what will you bounce do that changed bounce bounce bounce'
SELECT DISTINCT
ISNULL(((Datalength(@text) - Datalength(REPLACE(CAST(@text as nvarchar(max)), 'BOUNCE',4)))/Datalength('BOUNCE')),0) [BounceRate]
I expect the output to be 5.

DECLARE @text AS NVARCHAR(max)
SET @text = 'Bounce what will you bounce do that changed bounce bounce bounce'
SELECT (LEN(@text) - LEN(REPLACE(@text, 'bounce', ''))) / LEN('bounce')
Can you try this?
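For what it's worth, the 8 in the question seems to come from two things: the match is replaced with 4 instead of an empty string, and DATALENGTH counts bytes (two per nvarchar character) while 'BOUNCE' is a 6-byte varchar literal. Each of the 5 matches then shrinks the text by 5 characters, or 10 bytes, and 50 / 6 truncates to 8. Below is a minimal Python sketch of the same length-difference arithmetic. The count_occurrences helper is hypothetical, and the case-insensitive match is an assumption meant to mirror a typical case-insensitive SQL Server collation.

```python
def count_occurrences(text: str, word: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of word in text."""
    if not word:
        return 0
    lowered = text.lower()
    needle = word.lower()
    # Removing every match shortens the text by len(needle) per occurrence.
    removed = lowered.replace(needle, "")
    return (len(lowered) - len(removed)) // len(needle)


text = "Bounce what will you bounce do that changed bounce bounce bounce"
print(count_occurrences(text, "bounce"))  # prints 5
```

Like the SQL version, this counts substrings rather than whole words, so "BOUNCED" is counted as a match too.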

Pascal Progress Status

I am trying to emulate a loading-progress status in Free Pascal, but I am having difficulty producing output that looks like a progress indicator.
The code I have for this is:
percent := 0;
Writeln('Loading');
Repeat
Write('(',percent,'%)');
percent := percent + 1;
Delay(50);
Until percent > 100;
But the output turns out like this:
Loading(0%)(1%)(2%)
When I want it to look like this:
Loading(0%) -> Loading(1%) {The percent variable going up like a loading status}
I only want the percent value to change inside the loop. I've looked at the Delete and Insert procedures, but I don't think they are what I am looking for.
You need to write backspace characters to move the cursor back and overwrite the previous value, like so:
uses Crt;
var percent: integer;
begin
percent := 0;
Write('Loading ');
Repeat
Write('(',percent:3,'%)'#8#8#8#8#8#8);
percent := percent + 1;
Delay(50);
Until percent > 100;
end.
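The same overwrite trick is not specific to Pascal. As a rough illustration, here is a Python sketch that pads the number to three characters and then emits one backspace per printed character, so each frame lands on top of the previous one. The progress_frame and show_progress names are made up for this example, and it assumes a terminal that honors the backspace character.

```python
import sys
import time


def progress_frame(percent: int) -> str:
    """Return '(NNN%)' followed by enough backspaces to return to the '('."""
    body = f"({percent:3d}%)"          # fixed width: always 6 characters for 0..100
    return body + "\b" * len(body)


def show_progress(delay: float = 0.05) -> None:
    """Print 'Loading ' once, then overwrite the percentage in place."""
    sys.stdout.write("Loading ")
    for percent in range(101):
        sys.stdout.write(progress_frame(percent))
        sys.stdout.flush()             # make the frame visible before sleeping
        time.sleep(delay)
    sys.stdout.write("\n")
```

Calling show_progress() should animate a single "Loading (  0%)" line up to 100%. The fixed width matters: without it, the number of backspaces would have to change as the percentage gains digits.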

Remove characters between () SQL

I need to remove specific text from a string value in SQL.
I've tried various CHARINDEX and LEN combinations but keep getting it wrong!
I have a name field which contains names. Some of the values have extra text added in parentheses.
Example:
Smith (formerly Jones)
I need to remove the whole section inside the parentheses, as well as the parentheses themselves. Unfortunately, sometimes the value can be
Smith (formerly Jones) Reeves
So I can't just remove everything from the ( onwards!
Here are two examples of how to accomplish this. You can do this without declaring the @StartIndex and @EndIndex variables, but I have used them for the sake of clarity.
DECLARE @StartIndex int, @EndIndex int;
DECLARE @Str varchar(100);
SET @Str = 'This is a (sentence to use for a) test';
SELECT @StartIndex = CHARINDEX('(', @Str, 0), @EndIndex = CHARINDEX(')', @Str, 0);
SELECT SUBSTRING(@Str, 0, @StartIndex) + SUBSTRING(@Str, @EndIndex + 1, LEN(@Str) - @EndIndex) AS [Method1],
LEFT(@Str, @StartIndex - 1) + RIGHT(@Str, LEN(@Str) - @EndIndex) AS [Method2];
Note that this code does not remove the spaces before or after the parentheses (that wasn't part of your question), so you end up with two spaces between "a" and "test".
Additional error checking should be included before actually using code like this; for example, if @Str does not contain parentheses, it would cause an error.
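To make the error-checking point concrete, here is a small Python sketch of the same index-based slicing with a guard for values that have no parentheses. The remove_parenthesized name is hypothetical. One deliberate difference from the SQL above: the closing parenthesis is searched for only after the opening one, which is my own choice rather than something the answer does.

```python
def remove_parenthesized(value: str) -> str:
    """Remove the first '(...)' section; return value unchanged if there is none."""
    start = value.find("(")
    # Only look for ')' after the '(' so a stray earlier ')' is ignored.
    end = value.find(")", start + 1) if start != -1 else -1
    if start == -1 or end == -1:
        return value               # nothing to remove, avoid bad slice indexes
    return value[:start] + value[end + 1:]


print(remove_parenthesized("Smith (formerly Jones) Reeves"))
print(remove_parenthesized("Smith"))
```

As with the SQL, the surrounding spaces are left alone, so the first example still has two spaces between "Smith" and "Reeves".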

How to improve performance for REGEXP string matching in MySQL?

Preface:
I've done quite a bit of (re)searching on this, and found the following SO post/answer: https://stackoverflow.com/a/5361490/6095216 which was pretty close to what I'm looking for. The same code, but with somewhat more helpful comments, appears here: http://thenoyes.com/littlenoise/?p=136 .
Problem Description:
I need to split 1 column of MySQL TEXT data into multiple columns, where the original data has this format (N <= 7):
{"field1":"value1","field2":"value2",...,"fieldN":"valueN"}
As you might guess, I only need to extract the values, putting each one into a separate (predefined) column. The problem is that the number and order of the fields is not guaranteed to be the same for all records. Thus, solutions using SUBSTR/LOCATE, etc. don't work, and I need to use regular expressions. Another restriction is that 3rd party libraries such as LIB_MYSQLUDF_PREG (suggested in the answer from my 1st link above) cannot be used.
Solution/Progress so far:
I've modified the code from the above links such that it returns the first/shortest match, left-to-right; otherwise, NULL is returned. I also refactored it a bit and made the identifiers more reader/maintainer-friendly :)
Here's my version:
CREATE FUNCTION REGEXP_EXTRACT_SHORTEST(string TEXT, exp TEXT)
RETURNS TEXT DETERMINISTIC
BEGIN
DECLARE adjustStart, adjustEnd BOOLEAN DEFAULT TRUE;
DECLARE startInd INT DEFAULT 1;
DECLARE endInd, strLen INT;
DECLARE candidate TEXT;
IF string NOT REGEXP exp THEN
RETURN NULL;
END IF;
IF LEFT(exp, 1) = '^' THEN
SET adjustStart = FALSE;
ELSE
SET exp = CONCAT('^', exp);
END IF;
IF RIGHT(exp, 1) = '$' THEN
SET adjustEnd = FALSE;
ELSE
SET exp = CONCAT(exp, '$');
END IF;
SET strLen = LENGTH(string);
StartIndLoop: WHILE (startInd <= strLen) DO
IF adjustEnd THEN
SET endInd = startInd;
ELSE
SET endInd = strLen;
END IF;
EndIndLoop: WHILE (endInd <= strLen) DO
SET candidate = SUBSTRING(string FROM startInd FOR (endInd - startInd + 1));
IF candidate REGEXP exp THEN
RETURN candidate;
END IF;
IF adjustEnd THEN
SET endInd = endInd + 1;
ELSE
LEAVE EndIndLoop;
END IF;
END WHILE EndIndLoop;
IF adjustStart THEN
SET startInd = startInd + 1;
ELSE
LEAVE StartIndLoop;
END IF;
END WHILE StartIndLoop;
RETURN NULL;
END;
I then added a helper function to avoid having to repeat the regex pattern, which, as you can see from above, is the same for all the fields. Here is that function (I left my attempt to use a lookbehind - unsupported in MySQL - as a comment):
CREATE FUNCTION GET_MY_FLD_VAL(inputStr TEXT, fldName TEXT)
RETURNS TEXT DETERMINISTIC
BEGIN
DECLARE valPattern TEXT DEFAULT '"[^"]+"'; /* MySQL doesn't support lookaround :( '(?<=^.{1})"[^"]+"'*/
DECLARE fldNamePat TEXT DEFAULT CONCAT('"', fldName, '":');
DECLARE discardLen INT UNSIGNED DEFAULT LENGTH(fldNamePat) + 2;
DECLARE matchResult TEXT DEFAULT REGEXP_EXTRACT_SHORTEST(inputStr, CONCAT(fldNamePat, valPattern));
RETURN SUBSTRING(matchResult FROM discardLen FOR LENGTH(matchResult) - discardLen);
END;
Currently, all I'm trying to do is a simple SELECT query using the above code. It works correctly, BUT IT. IS. SLOOOOOOOW... There are only 7 fields/columns to split into, max (not all records have all 7)! Limited to 20 records, it takes about 3 minutes - and I have about 40,000 records total (not very much for a database, right?!) :)
And so, finally, we get to the actual question: [how] can the above algorithm/code (pretty much a brute search at this point) be improved SIGNIFICANTLY performance-wise, such that it can be run on the actual database in a reasonable amount of time? I started looking into the major known pattern-matching algorithms, but quickly got lost trying to figure out what would be appropriate here, in large part due to the number of available options and their respective restrictions, conditions for use, etc. Plus, it seems like implementing one of these in SQL just to see if it would help, might be a lot of work.
Note: this is my first post ever(!), so please let me know (nicely) if something is not clear, etc. and I will do my best to fix it. Thanks in advance.
I was able to solve this by parsing the JSON, as suggested by tadman and Matt Raines above. Being new to the concept of JSON, I just didn't realize it could be done this way at all...a little embarrassing, but lesson learned!
Anyway, I used the get_option function in the common_schema framework: https://code.google.com/archive/p/common-schema/ (found through this post, which also demonstrates how to use the function: Parse JSON in MySQL ). As a result, my INSERT query took about 15 minutes to run, vs the 30+ hours it would've taken with the REGEXP solution. Thanks, and until next time! :)
Don't do it in SQL; do it in PHP or some other language that has builtin tools for parsing JSON.
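As a rough sketch of what "parse the JSON outside SQL" might look like, here is a Python version. The column names in COLUMNS and the split_record helper are assumptions for illustration; the real field names are not given in the question. Because the record is parsed into a dictionary, the order of the fields and any missing fields stop mattering.

```python
import json

# Hypothetical target columns; the actual field names are not in the question.
COLUMNS = ["field1", "field2", "field3"]


def split_record(raw: str, columns=COLUMNS) -> list:
    """Parse one JSON object and return its values in a fixed column order.

    Fields missing from the record come back as None.
    """
    data = json.loads(raw)
    return [data.get(name) for name in columns]


row = split_record('{"field2":"value2","field1":"value1"}')
print(row)  # prints ['value1', 'value2', None]
```

Each returned list could then be passed as the parameters of a parameterized INSERT, one row per record.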

mySQL Error When Making Trigger To Find Hashtags and add them to other tables

The closest (it's exactly the same) thing I could find to what I'm doing is this...
Database design for apps using "hashtags"
...except that that question didn't address actually finding the hashtags, just how to store them, which doesn't help me with this, although it does make me feel confident about my design choices.
I am trying to create a trigger, that every time a message is added it looks through to find hashtags and adds them to 2 other tables in my database. One of the tables is 'hashTags' which is just a list of the tags & an ID column. The other table is hashUsed, which shows where the hashtags are used with two columns (MsgID, HTagID) that are both set as PrimaryKey (shouldn't have a message & hashtag linked more than once). I manually inserted some 'messages' with hashtags, I created the proper entries in the other tables to reference the data, and I've built queries that show the data just the way I need it.
I just can't figure out why this Trigger won't save.
When I execute it, I get this response:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '@hl := INSTR(@s, '#'); /* get hashtag location */ @ss := SUBSTRING(@s, @h' at line 14'
BTW... I'm using 'Toad for MySQL' which takes care of the delimiter automatically behind the scenes, which is why I don't have it here.
I added comments out the wazoo to help you follow along.
What did I do wrong? Thanks in advance!
CREATE TRIGGER `mybase`.`FindHashTags`
AFTER INSERT /* I chose AFTER INSERT because I need to use the
auto_incremented 'MsgID' field/key, which doesn't
exist beforehand. Correct me if I'm wrong. */
ON mybase.messages FOR EACH ROW
BEGIN
DECLARE c INT; /* c stands for count */
DECLARE h INT; /* h stands for HowMany times is the hashtag(#) present */
SET @s := new.message; /* s stands for string */
SET @h := (ROUND ((LENGTH(new.message) /* Get the Length of the Message */
- LENGTH( REPLACE (new.message, "#", "") ) /* How long would the message be without the string in it*/
) / LENGTH("#"))); /*Divide the difference of those two lengths by the length of the string in question.
This will tell how many times it was removed, hence how many times the string was
present in the text. */
SET @c := 1;
WHILE @c <= @h DO
@hl := INSTR(@s, '#'); /* get hashtag location */
@ss := SUBSTRING(@s, @hl, LENGTH(@s)-@hl); /* Get all text after hashtag using SubString*/
@sl := INSTR(@ss,' '); /* String Length. Search the new substring for a space(' ') */
IF @sl <= 0 THEN @sl=LENGTH(@S) /* If there isn't a space, set @sl position to end of the field */
@ht := SUBSTRING(@ss, 1, @sl); /* @ht is a hashTag, select from first char to @sl */
@s := SUBSTRING(@ss, @sl, LENGTH(@SS)-@sl); /* remove the hashtag from the string to prepare for searching
again during next loop to look for more hashtags */
INSERT IGNORE INTO hashTags(Name) VALUES (@ht); /* add hashtag to list, ignore if it already exists */
@t := (SELECT HTagID FROM hashTags WHERE Name=@ht); /* get the id of the hashtag we just found */
INSERT IGNORE INTO hashUsed(MsgID,HTagID) VALUES (new.MsgID,@t); /* link the hashtag to the message */
SET @c = @c + 1;
END IF
END WHILE;
END;
I figured it out...
1. I forgot to put SET in front of the variables in the WHILE block.
2. I also forgot the semicolon after END IF
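For anyone who wants to check the scanning logic separately from the trigger syntax, here is a hedged Python sketch of the same idea: find the next '#', read up to the next space (or the end of the message), and repeat on the remainder. The extract_hashtags name is made up, and the de-duplication stands in for what INSERT IGNORE does against the tables.

```python
def extract_hashtags(message: str) -> list[str]:
    """Return the distinct space-delimited hashtags in message, in order of appearance."""
    tags = []
    rest = message
    while True:
        pos = rest.find("#")
        if pos == -1:
            break                          # no more hashtags
        rest = rest[pos:]                  # text from the hashtag onward
        space = rest.find(" ")
        end = len(rest) if space == -1 else space
        tag = rest[:end]
        if len(tag) > 1 and tag not in tags:   # skip a lone '#', ignore repeats
            tags.append(tag)
        rest = rest[end:]                  # continue after this hashtag
    return tags


print(extract_hashtags("Loving #sql and #mysql today #sql"))
```

Each tag in the result corresponds to one row to add to hashTags and one (MsgID, HTagID) link in hashUsed.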

SQL Server change font in html string

I have strings stored in my database formatted as HTML, and users can change the font size. That's fine, but I need to make a report and the font sizes all need to be the same. So, if I have the following HTML, I want to modify it to have a font size of 10:
<HTML><BODY><DIV STYLE="text-align:Left;font-family:Tahoma;font-style:normal;font-weight:normal;font-size:11;color:#000000;"><DIV><DIV><P><SPAN>This is my text to display.</SPAN></P></DIV></DIV></DIV></BODY></HTML>
I have a user defined function, but apparently, I can't use wildcards in a REPLACE, so it doesn't actually do anything:
ALTER FUNCTION [dbo].[udf_SetFont]
(@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
RETURN REPLACE (@HTMLText, 'font-size:%;', 'font-size:10;')
END
(Of course, it would be even better if I sent the font size as a parameter, so I could change it to whatever.)
How do I modify this to change any string so the font size is 10?
This appears to work, although I've only tried it on one string (which has the font set in 2 places). I started with code that strips ALL html and modified it to only look for and change 'font-size:*'. I suspected there would be issues if the font size is 9 or less (1 character) and I'm changing it to 10 (2 chars), but it seems to work for that too.
ALTER FUNCTION [dbo].[udf_ChangeFont]
(@HTMLText VARCHAR(MAX), @FontSize VARCHAR(2))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('font-size:',@HTMLText)
SET @End = CHARINDEX(';',@HTMLText,CHARINDEX('font-size:',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0
AND @End > 0
AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'font-size:' + @FontSize + ';')
SET @Start = CHARINDEX('font-size:',@HTMLText, @End+2)
SET @End = CHARINDEX(';',@HTMLText,CHARINDEX('font-size:',@HTMLText, @End+2))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
DECLARE @HTML NVarChar(2000) = '
<HTML>
<BODY>
<DIV STYLE="text-align:Left;font-family:Tahoma;font-style:normal;font-weight:normal;font-size:11;color:#000000;">
<DIV>
<DIV>
<P><SPAN>This is my text to display.</SPAN></P>
</DIV>
</DIV>
</DIV>
</BODY>
</HTML>';
DECLARE @X XML = @HTML;
WITH T AS (
SELECT C.value('.', 'VarChar(1000)') StyleAttribute
FROM @X.nodes('//@STYLE') D(C)
)
SELECT *
FROM T
WHERE T.StyleAttribute LIKE '%font-size:%';
From here I'd use a CLR function to split the StyleAttribute column on ;. Then look for the piece(s) that begin with font-size: and split again on :. TryParse the second element of that result and if it isn't 10, replace it. You'd then build up your string to get the value that StyleAttribute should have. From there you can do a REPLACE looking for the original value (from the table above) and substituting the output of the CLR function.
Nasty problem...good luck.
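The split-and-rebuild steps described above could look roughly like this outside SQL. This is only a Python sketch with a hypothetical set_font_size helper, not the CLR function itself, and it simplifies one step: it overwrites every font-size value instead of parsing the old value and comparing it to 10 first.

```python
def set_font_size(style: str, size: int = 10) -> str:
    """Rewrite every font-size declaration in a style attribute value."""
    rebuilt = []
    for piece in style.split(";"):
        name, sep, _value = piece.partition(":")
        # Only touch declarations whose property name is font-size.
        if sep and name.strip().lower() == "font-size":
            piece = f"{name}:{size}"
        rebuilt.append(piece)
    return ";".join(rebuilt)


style = "text-align:Left;font-family:Tahoma;font-size:11;color:#000000;"
print(set_font_size(style))
```

Because the string is split on ';' and joined back with ';', everything other than the font-size value, including the trailing semicolon, comes out unchanged.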
As Yuck said, SQL Server string functions are pretty limited. You'll eventually run into a wall where your best bet is to resort to non-SQL solutions.
If you absolutely need to store HTML with embedded styles as you currently have, but also have the flexibility to revise your data model, you might want to consider adding a second database column to your table. The second column would store the style-free version of the HTML. You could parse out the styling at the application layer. That would make it a lot easier to view the contents in future reports and other scenarios.