Understanding SQL Language - mysql

Can someone help me understand this query better? I want to remove all special characters from my string, but I don't understand how to apply this query to my own case. I found it on Stack Overflow, and it seems to work for some people. I'm assuming @str is my string's name, but I don't know what @expres stands for. Also, do I need a SELECT/FROM statement?
DECLARE @str VARCHAR(400)
DECLARE @expres VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),.,!]%'
SET @str = '(remove) ~special~ *characters. from string in sql!'
WHILE PATINDEX( @expres, @str ) > 0
SET @str = REPLACE(REPLACE( @str, SUBSTRING( @str, PATINDEX( @expres, @str ), 1 ),''),'-',' ')

From the code above, there are a couple of things you need to understand first:
@ in T-SQL marks a local variable: DECLARE creates it, and SET (or SELECT) assigns a value to that name.
@expres in this case holds the list of characters you want to remove from the string, so anything inside the [] will be searched for in the next part of the code.
PATINDEX is a function that searches through @str to see if there are any matches with the pattern in @expres. If there is one, it returns the (1-based) position where the first match starts; otherwise it returns 0.
Putting this condition inside the WHILE means the code keeps looping over @str until there is no match left, meaning all matching characters have been removed.
The final SET line is where the removal happens. This is accomplished using REPLACE.
REPLACE takes 3 arguments: the string you are searching through (here @str), the string you want to replace, and what you will replace it with. The inner REPLACE replaces the matched character with '' (that is, deletes every occurrence of it), and the outer REPLACE replaces each '-' with ' '.
The SUBSTRING inside the REPLACE extracts the character to be replaced. To do this it needs to know where the match starts, so it uses PATINDEX to find that position and takes 1 character from there.
I hope that was clear enough. You can find the documentation for SUBSTRING, PATINDEX and REPLACE in the SQL Server documentation.
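A minimal Python sketch of the same loop (not T-SQL, just the algorithm) may help: find the first character that belongs to the set, delete every occurrence of it, turn hyphens into spaces, and repeat until nothing matches. The SPECIAL set is an assumption that mirrors @expres, including the comma that the separators inside [] also add to the class.

```python
# Characters listed in @expres. The commas between them are also part of the
# bracket class, so ',' is matched as well (assumption mirroring the T-SQL).
SPECIAL = set("~,@#$%&*().!")


def strip_special(s: str) -> str:
    """Mimic the WHILE PATINDEX(@expres, @str) > 0 loop from the T-SQL snippet."""
    while True:
        # PATINDEX equivalent: index of the first matching character, or None
        pos = next((i for i, ch in enumerate(s) if ch in SPECIAL), None)
        if pos is None:
            return s
        # Inner REPLACE: delete every occurrence of that character.
        # Outer REPLACE: turn hyphens into spaces.
        s = s.replace(s[pos], "").replace("-", " ")
```

Just like the T-SQL, hyphens are only converted when at least one special character is present, because that replacement happens inside the loop body.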

If you analyze the SQL, you will see that it strips characters from the @str variable. Therefore you need to set that variable to the value you want the characters stripped from.
USE SBLReporting
DECLARE @str VARCHAR(400)
SELECT @str = name FROM bbnet.customerrelationship --here you set the @str variable to your desired value (if several rows are returned, only the last one assigned is kept)
DECLARE @expres VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),.,!]%'
WHILE PATINDEX( @expres, @str ) > 0 SET @str = REPLACE(REPLACE( @str, SUBSTRING( @str, PATINDEX( @expres, @str ), 1 ),''),'-',' ')
SELECT @str -- this will be your stripped value

You need to enter the string to be stripped of its special characters as @str.
The statement DECLARE @str VARCHAR(400) simply says that @str is a variable of type VARCHAR that can hold up to 400 characters.
Let's suppose your string is "Special Characters Are Things Like $@#%". Your code would be:
DECLARE @str VARCHAR(400)
DECLARE @expres VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),.,!]%'
SET @str = 'Special Characters Are Things Like $@#%'
WHILE PATINDEX( @expres, @str ) > 0
SET @str = REPLACE(REPLACE( @str, SUBSTRING( @str, PATINDEX( @expres, @str ), 1 ),''),'-',' ')
Finally, execute SELECT @str and you should see "Special Characters Are Things Like " (with a trailing space) as your output.

Related

SQL - Remove all HTML tags in a string

In my dataset, I have a field which stores text marked up with HTML. The general format is as follows:
<html><head></head><body><p>My text.</p></body></html>
I could attempt to solve the problem by doing the following:
REPLACE(REPLACE(Table.HtmlData, '<html><head></head><body><p>', ''), '</p></body></html>', '')
However, this is not a strict rule, as some of the entries break W3C standards and do not include <head> tags, for example. Even worse, there could be missing closing tags. So I would need to include a REPLACE call for each opening and closing tag that could exist.
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
Table.HtmlData,
'<html>', ''),
'</html>', ''),
'<head>', ''),
'</head>', ''),
'<body>', ''),
'</body>', ''),
'<p>', ''),
'</p>', '')
I was wondering if there was a better way to accomplish this than using multiple nested REPLACE functions. Unfortunately, the only languages I have available in this environment are SQL and Visual Basic (not .NET).
DECLARE @x XML = '<html><head></head><body><p>My text.</p></body></html>'
SELECT t.c.value('.', 'NVARCHAR(MAX)')
FROM @x.nodes('*') t(c)
Update - For strings with unclosed tags:
DECLARE @x NVARCHAR(MAX) = '<html><head></head><body><p>My text.<br>More text.</p></body></html>'
SELECT x.value('.', 'NVARCHAR(MAX)')
FROM (
SELECT x = CAST(REPLACE(REPLACE(@x, '>', '/>'), '</', '<') AS XML)
) r
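The two XML approaches above can be imitated in Python with xml.etree.ElementTree standing in for SQL Server's xml type (a sketch of the idea, not SQL Server's own behaviour). The dummy <root> wrapper is an assumption needed because ElementTree requires a single root element; itertext() plays the role of .value('.', ...).

```python
import xml.etree.ElementTree as ET


def xml_text(markup: str) -> str:
    """Concatenate all text nodes, roughly like .value('.', 'NVARCHAR(MAX)')."""
    # Hypothetical <root> wrapper so that fragments with several top-level
    # nodes still parse in ElementTree.
    return "".join(ET.fromstring("<root>" + markup + "</root>").itertext())


def self_closing_trick(html: str) -> str:
    """Apply the REPLACE trick from the update, then extract the text."""
    # '>' -> '/>' makes every tag self-closing; '</' -> '<' then turns each
    # former closing tag (now '</p/>') into another empty element ('<p/>').
    return xml_text(html.replace(">", "/>").replace("</", "<"))
```

As with the SQL versions, anything that is still not well-formed XML after the rewrite (an HTML entity such as &nbsp;, or a stray '<' in the text) raises a parse error.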
If the HTML is well formed then there's no need to use replace to parse XML.
Just cast or convert it to an XML type and get the value(s).
Here's an example to output the text from all tags:
declare @htmlData nvarchar(100) = '<html>
<head>
</head>
<body>
<p>My text.</p>
<p>My other text.</p>
</body>
</html>';
select convert(XML,@htmlData,1).value('.', 'nvarchar(max)');
select cast(@htmlData as XML).value('.', 'nvarchar(max)');
Note that there's a difference in the output of whitespace between cast and convert.
To get content from a specific node only, use XQuery syntax (XQuery is based on XPath syntax).
For example:
select cast(@htmlData as XML).value('(//body/p/node())[1]', 'nvarchar(max)');
select convert(XML,@htmlData,1).value('(//body/p/node())[1]', 'nvarchar(max)');
Result : My text.
Of course, this still assumes valid XML.
If, for example, a closing tag is missing, this raises an XML parsing error.
If the HTML isn't well-formed XML, one could use PATINDEX and SUBSTRING to get the first p tag, and then cast that to an XML type to get the value.
select cast(SUBSTRING(@htmlData,patindex('%<p>%',@htmlData),patindex('%</p>%',@htmlData) - patindex('%<p>%',@htmlData)+4) as xml).value('.','nvarchar(max)');
or via a funky recursive way:
declare @xmlData nvarchar(100);
WITH Lines(n, x, y) AS (
SELECT 1, 1, CHARINDEX(char(13), @htmlData)
UNION ALL
SELECT n+1, y+1, CHARINDEX(char(13), @htmlData, y+1) FROM Lines
WHERE y > 0
)
SELECT @xmlData = concat(@xmlData,SUBSTRING(@htmlData,x,IIF(y>0,y-x,8)))
FROM Lines
where PATINDEX('%<p>%</p>%', SUBSTRING(@htmlData,x,IIF(y>0,y-x,10))) > 0
order by n;
select
@xmlData as xmlData,
convert(XML,@xmlData,1).value('(/p/node())[1]', 'nvarchar(max)') as FirstP;
First, create a user-defined function that strips out the HTML, like so:
CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT;
DECLARE @End INT;
DECLARE @Length INT;
SET @Start = CHARINDEX('<', @HTMLText);
SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText));
SET @Length = (@End - @Start) + 1;
WHILE @Start > 0
AND @End > 0
AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '');
SET @Start = CHARINDEX('<', @HTMLText);
SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText));
SET @Length = (@End - @Start) + 1;
END;
RETURN LTRIM(RTRIM(@HTMLText));
END;
GO
When you're trying to select it:
SELECT dbo.udf_StripHTML([column]) FROM SOMETABLE
This avoids having to use several nested REPLACE statements.
Credit and further info: http://blog.sqlauthority.com/2007/06/16/sql-server-udf-user-defined-function-to-strip-html-parse-html-no-regular-expression/
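For comparison, here is a Python sketch of what udf_StripHTML does (illustrative only; strip_html_loop is a hypothetical name): find the first '<', the first '>' after it, cut that span out, and repeat while both are present.

```python
def strip_html_loop(text: str) -> str:
    """Python rendering of the udf_StripHTML loop (a sketch, not T-SQL)."""
    start = text.find("<")
    end = text.find(">", start) if start >= 0 else -1
    # Keep going while there is a '<' followed somewhere by a '>'
    while start >= 0 and end > start:
        # STUFF(@HTMLText, @Start, @Length, ''): cut out the whole tag
        text = text[:start] + text[end + 1:]
        start = text.find("<")
        end = text.find(">", start) if start >= 0 else -1
    # LTRIM(RTRIM(...)) trims spaces only
    return text.strip(" ")
```

A lone '<' with no '>' after it is left in place, which matches the WHILE condition in the function.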
This is the simplest way.
DECLARE @str VARCHAR(299)
SELECT @str = '<html><head></head><body><p>My text.</p></body></html>'
SELECT cast(@str AS XML).query('.').value('.', 'varchar(200)')
One more solution, just to demonstrate a trick to apply many replacements stored in a table (easy to maintain!) in one single statement:
--add any replace templates here:
CREATE TABLE ReplaceTags (HTML VARCHAR(100));
INSERT INTO ReplaceTags VALUES
('<html>'),('<head>'),('<body>'),('<p>'),('<br>')
,('</html>'),('</head>'),('</body>'),('</p>'),('</br>');
GO
--This function will perform the "trick"
CREATE FUNCTION dbo.DoReplace(@Content VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
SELECT @Content=REPLACE(@Content,HTML,'')
FROM ReplaceTags;
RETURN @Content;
END
GO
--All examples I found in your question and in comments
DECLARE @content TABLE(Content VARCHAR(MAX));
INSERT INTO @content VALUES
('<html><head></head><body><p>My text.</p></body></html>')
,('<html><head></head><body><p>My text.<br>More text.</p></body></html>')
,('<html><head></head><body><p>My text.<br>More text.</p></body></html>')
,('<html><head></head><body><p>My text.</p></html>');
--this is the actual query
SELECT dbo.DoReplace(Content) FROM @content;
GO
--Clean-Up
DROP FUNCTION dbo.DoReplace;
DROP TABLE ReplaceTags;
UPDATE
If you add a replacement-value column to the template table, you could even use different replacements, for example replacing a <br> with an actual line break.
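The template-table trick boils down to "apply every row of a replacement list in turn". A Python sketch, where REPLACE_TAGS is a hypothetical stand-in for ReplaceTags that already carries the replacement column suggested in the update (so <br> becomes a newline):

```python
# Hypothetical template table: (tag, replacement) rows, mirroring ReplaceTags
# plus the replacement column suggested in the UPDATE.
REPLACE_TAGS = [
    ("<html>", ""), ("<head>", ""), ("<body>", ""), ("<p>", ""), ("<br>", "\n"),
    ("</html>", ""), ("</head>", ""), ("</body>", ""), ("</p>", ""), ("</br>", ""),
]


def do_replace(content: str) -> str:
    """Apply each template row in turn, like SELECT @Content = REPLACE(...) FROM ReplaceTags."""
    for tag, replacement in REPLACE_TAGS:
        content = content.replace(tag, replacement)
    return content
```

Here the rows are applied in list order; the SQL version does not promise any particular row order, which only matters if one replacement can create or destroy another tag.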
SQL Server 2017+
If you have a string splitter function, you can strip HTML tags from virtually any text (well-formed or not):
select string_agg(c.String, null) within group (order by o.Ordinal)
from dbo.SplitString(@Input, N'<') o
cross apply dbo.SplitString(o.String, N'>') c
where o.Ordinal = 1
or c.Ordinal = 2;
This will be as performant as your splitter function. It should therefore generally out-perform any of the loop-based solutions.
The replace-based solutions cannot deal with comments or elements that have attributes, which makes them practically useless to me.
Here are my versions of the split and strip functions:
create or alter function dbo.SplitString (
@String nvarchar(max)
, @Delimiter nvarchar(4000)
)
returns table with schemabinding
as
return
select [key] + 1 as Ordinal, value as String
from openjson(replace(json_modify(N'[]', N'append $', @String), string_escape(@Delimiter, N'json'), N'","'))
GO
create or alter function dbo.StripHtml (
@Input nvarchar(max)
)
returns nvarchar(max)
as
begin
return (
select string_agg(c.String, null) within group (order by o.Ordinal)
from dbo.SplitString(@Input, N'<') o
cross apply dbo.SplitString(o.String, N'>') c
where o.Ordinal = 1
or c.Ordinal = 2
)
end
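The split-based idea is easy to check outside SQL Server. A Python sketch of the same filter (strip_html_split is a hypothetical name): split on '<', then split each piece on '>', keeping the whole first piece (o.Ordinal = 1) and only the second '>'-piece of every other one (c.Ordinal = 2).

```python
def strip_html_split(text: str) -> str:
    """Sketch of dbo.StripHtml: split on '<', then split each piece on '>'."""
    outer = text.split("<")
    # o.Ordinal = 1: every '>'-separated piece of the text before the first
    # '<' is kept (as in the SQL, a stray '>' there is dropped).
    result = "".join(outer[0].split(">"))
    for piece in outer[1:]:
        inner = piece.split(">")
        # c.Ordinal = 2: keep only what follows the first '>' (the tag's end)
        if len(inner) >= 2:
            result += inner[1]
    return result
```

Comments and tags with attributes come out cleanly, because everything between '<' and the next '>' is discarded as a unit.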
This is just an example. You can use this in a script to remove any HTML tags:
DECLARE @VALUE VARCHAR(MAX),@start INT,@end int,@remove varchar(max)
SET @VALUE='<html itemscope itemtype="http://schema.org/QAPage">
<head>
<title>sql - Converting INT to DATE then using GETDATE on conversion? - Stack Overflow</title>
<html>
</html>
'
set @start=charindex('<',@VALUE)
set @end=charindex('>',@VALUE,@start) -- search for '>' from the '<' onwards
while @start>0 and @end>0
begin
set @remove=substring(@VALUE,@start,@end-@start+1) -- the whole tag, '<' through '>'
set @VALUE=replace(@VALUE,@remove,'')
set @start=charindex('<',@VALUE)
set @end=charindex('>',@VALUE,@start)
end
print @VALUE
You mention the XML is not always valid, but does it always contain the <p> and </p> tags?
In that case the following would work:
SUBSTRING(Table.HtmlData,
CHARINDEX('<p>', Table.HtmlData) + 3, -- skip past the 3 characters of '<p>'
CHARINDEX('</p>', Table.HtmlData) - CHARINDEX('<p>', Table.HtmlData) - 3)
For finding all positions of a <p> within an HTML string, there's already a good post here: https://dba.stackexchange.com/questions/41961/how-to-find-all-positions-of-a-string-within-another-string
Alternatively I suggest using Visual Basic, as you mentioned that is also an option.
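The offset arithmetic is where this kind of SUBSTRING usually goes wrong, so here is a Python sketch of the same extraction (first_paragraph is a hypothetical name): start right after '<p>' and stop right before '</p>'. It adds a guard for missing tags, which the T-SQL expression does not have.

```python
def first_paragraph(html: str):
    """Text between the first '<p>' and the first '</p>', or None if missing."""
    open_pos = html.find("<p>")
    close_pos = html.find("</p>")
    if open_pos < 0 or close_pos < open_pos:
        return None  # the T-SQL expression has no such guard
    # Skip the 3 characters of '<p>' and stop right before '</p>'
    return html[open_pos + len("<p>"):close_pos]
```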

How to set into query string the Replace Statement in SQL?

This is my problem, but I will show an example to make it simpler.
I have a stored procedure named usp_Replace. This procedure will replace quotation marks with whitespace.
This is the original query, without putting it into a query string:
CREATE PROCEDURE usp_Replace
@desc varchar(50)
AS
BEGIN
SET NOCOUNT ON;
SELECT '' + Replace(REPLACE(REPLACE(@desc, CHAR(13), ''), CHAR(10), ''),'"','') + ''
END
But:
I want to put the SELECT statement into a query string because I will have to add a conditional statement, which I won't show, as it is not part of the problem.
Here it is:
ALTER PROCEDURE usp_Replace
@desc varchar(50)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @query nvarchar(max)
DECLARE @condition nvarchar(max)
SET @condition = null
SET @query = ' SELECT '' + Replace(REPLACE(REPLACE('''+@desc+''', CHAR(13), ''), CHAR(10), ''),''"'','') + '' '
SET @query = @query + ISNULL(@condition,'')
EXEC sp_executesql @query
END
I'm getting an error on this, saying that there is an unclosed quotation mark.
I'm really having trouble with how and where to put the single quotation marks.
Please help if you have encountered this problem, or share any links or suggestions. I will be glad to learn about it. Thanks.
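You need to double every single quote inside the string, including the ones that form the empty string. So '' becomes '''':
SET @query = 'SELECT REPLACE(REPLACE(REPLACE(@desc, CHAR(13), ''''), CHAR(10), ''''), ''"'', '''') '
EXEC sp_executesql @query, N'@desc varchar(50)', @desc = @desc -- @desc must be passed in as a parameter of the dynamic SQL
I'm not sure what the empty strings at the beginning and end were doing, so I just removed them.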
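The rule behind this answer is mechanical: inside a T-SQL string literal, every single quote is written twice. A tiny Python helper (tsql_literal is a hypothetical name) shows the transformation; for values such as @desc, passing them as sp_executesql parameters avoids hand-escaping them altogether.

```python
def tsql_literal(value: str) -> str:
    """Wrap text in single quotes, doubling every embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


# Embedding SQL text that itself contains '' (an empty-string literal):
# each '' turns into '''' inside the outer literal.
print(tsql_literal("REPLACE(x, '', '')"))
```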

Converting varchars into two points Decimal value by using SQL

I have a flat file with the list of amounts below. Could you please tell me how, using SQL, I can turn these amounts, which end with {, A, H, E, C, I or F, into values with two decimal places, something like 1234567.80?
12345678{
00484326A
00000210H
00000185A
00000077E
00000833C
00000255I
00000077E
00000039F
00000088A
00000000F
00000000A
00000100{
Thank You,
Try this, as in this SQLFiddle. It's not pretty, but it works:
SELECT
CAST(
CONCAT(SUBSTRING(test_value,1, LENGTH(test_value) -2),
'.',
SUBSTRING(test_value, LENGTH(test_value) -1, 1))
AS DECIMAL(7,1))
FROM TEST
WHERE SUBSTRING(test_value, LENGTH(test_value)) = 'A'
|| SUBSTRING(test_value, LENGTH(test_value)) = 'H'
-- keep adding above line for the rest of the ending characters you want
I wrote this as a function because I think it's easier to read than inline code:
create function dbo.convert_amount_str ( @amount_str varchar(50) )
returns money
as
begin
declare @char_index int
declare @amount money
declare @char varchar(50)
declare @decimal money
-- Match the first non-numeric character
select @char_index = PATINDEX('%[^0-9]%', @amount_str)
-- Get the numeric characters into a numeric variable
-- (start position 0 with length @char_index returns the @char_index - 1 digits before the letter)
select @amount = convert(money, SUBSTRING(@amount_str, 0, @char_index))
-- Get the non-numeric character (will work for multiple characters)
select @char = SUBSTRING(@amount_str, @char_index, (len(@amount_str) - @char_index) + 1)
-- Convert the non-numeric characters into decimal amounts
select @decimal = case @char
when 'A' then .8 -- whatever this should equate to
when 'H' then .7 -- whatever this should equate to
-- output for remaining characters
end
return @amount + @decimal
end
Then just use it like this:
select dbo.convert_amount_str('00484326A')
Or, more likely, referencing whatever column contains the numeric string values.
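The question's own first example (12345678{ → 1234567.80) is consistent with the "signed overpunch" (zoned decimal) encoding often found in COBOL/mainframe flat files, where the last character carries both the final digit and the sign: { and A-I mean a positive number ending in 0-9, } and J-R a negative one. Assuming that is the format here, with two implied decimal places, a Python sketch of the decoding looks like this (decode_overpunch is a hypothetical name):

```python
from decimal import Decimal

# Signed overpunch convention, ASSUMED to be what the flat file uses:
# the last character encodes both the final digit and the sign.
POSITIVE = {"{": 0, **{chr(ord("A") + i): i + 1 for i in range(9)}}  # {, A..I
NEGATIVE = {"}": 0, **{chr(ord("J") + i): i + 1 for i in range(9)}}  # }, J..R


def decode_overpunch(field: str, scale: int = 2) -> Decimal:
    """Turn e.g. '12345678{' into Decimal('1234567.80')."""
    last = field[-1]
    if last in POSITIVE:
        digit, sign = POSITIVE[last], 1
    elif last in NEGATIVE:
        digit, sign = NEGATIVE[last], -1
    else:
        raise ValueError(f"unexpected sign character: {last!r}")
    # Rebuild the full digit string, then apply the implied decimal places
    unscaled = int(field[:-1] + str(digit))
    return sign * Decimal(unscaled).scaleb(-scale)
```

Under this assumption, 00484326A decodes to 48432.61 and 00000210H to 21.08; the same mapping can be written in SQL as a CASE on the last character.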

REGEX in mysql query

I have a table with address as a column.
A value for address is "#12-3/98 avenue street", which has numbers, special characters and letters.
I want to write my SQL query using regex to remove the special characters from the address value.
Example: "12398avenuestreet" is the value I want after removing the special characters.
Thank you.
Maybe this function will help you:
DELIMITER $$
CREATE FUNCTION strip_non_alpha(
_dirty_string varchar(40)
)
RETURNS varchar(40)
DETERMINISTIC
BEGIN
DECLARE _length int;
DECLARE _position int;
DECLARE _current_char varchar(1);
DECLARE _clean_string varchar(40);
SET _clean_string = '';
SET _length = LENGTH(_dirty_string);
SET _position = 1;
WHILE _position <= _length DO
SET _current_char = SUBSTRING(_dirty_string, _position, 1);
IF _current_char REGEXP '[A-Za-z0-9]' THEN
SET _clean_string = CONCAT(_clean_string, _current_char);
END IF;
SET _position = _position + 1;
END WHILE;
RETURN CONCAT('', _clean_string);
END$$
DELIMITER ;
You can then call it like this:
update mytable set address = strip_non_alpha(address);
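The function walks the string one character at a time and keeps only characters matching [A-Za-z0-9]. A Python sketch of the same idea (not MySQL code), handy for checking the expected output:

```python
def strip_non_alpha(dirty: str) -> str:
    """Keep only ASCII letters and digits, one character at a time."""
    clean = []
    for ch in dirty:
        # Same test as: _current_char REGEXP '[A-Za-z0-9]'
        if ("A" <= ch <= "Z") or ("a" <= ch <= "z") or ("0" <= ch <= "9"):
            clean.append(ch)
    return "".join(clean)
```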
You don't need RegExp for simple character replacement; see the MySQL string functions documentation.
Unfortunately, before MySQL 8.0 regular expressions are "match only" (MySQL 8.0 added REGEXP_REPLACE()), so you cannot do a replace in your query. This leaves you with only something like this (which is very clumsy):
SELECT REPLACE(REPLACE(address, '?', ''), '#', '') -- and many many other nested replaces
FROM table
Or put this logic inside your application (the best option here).
MySQL regular expressions are only for pattern matching, not replacing, so your best bet is to create a function or to use REPLACE() repeatedly.
As far as I know, it is not possible to replace via MySQL regex, since these functions are only used for matching.
Alternatively, you can use MySQL Replace for this:
SELECT REPLACE(REPLACE(REPLACE(REPLACE(address, '#', ''), '-', ''), '/', ''), ' ', '') FROM table;
Which will remove #, -, / and spaces and result in the string you want.
You may use this MySQL UDF. And then simply,
update my_table set my_column = PREG_REPLACE('/[^A-Za-z0-9]/' , '' , my_column);

MySQL find_in_set with multiple search string

I find that find_in_set only searches for a single string:
find_in_set('a', 'a,b,c,d')
In the above example, 'a' is the only string used for search.
Is there any way to use find_in_set kind of functionality and search by multiple strings, like :-
find_in_set('a,b,c', 'a,b,c,d')
In the above example, I want to search by three strings 'a,b,c'.
One way I see is using OR
find_in_set('a', 'a,b,c,d') OR find_in_set('b', 'a,b,c,d') OR find_in_set('c', 'a,b,c,d')
Is there any other way than this?
There is no native function to do it, but you can achieve your aim using the following trick:
WHERE CONCAT(",", `setcolumn`, ",") REGEXP ",(val1|val2|val3),"
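The trick works because wrapping the column in commas lets ",value," match only whole elements. A Python sketch of the same test with the re module (matches_any is a hypothetical name); re.escape is an addition here that guards against values containing regex metacharacters:

```python
import re


def matches_any(set_column: str, values: list) -> bool:
    """Python version of CONCAT(',', col, ',') REGEXP ',(v1|v2|v3),'."""
    pattern = ",(" + "|".join(re.escape(v) for v in values) + "),"
    return re.search(pattern, "," + set_column + ",") is not None
```

Note this gives OR semantics (at least one listed value is present); requiring all of them still needs one check per value.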
The MySQL function find_in_set() can search only for one string in a set of strings.
The first argument is a single string, so there is no way to make it parse your comma-separated string into several strings (you can't use commas in SET elements at all!). The second argument is a SET, which is represented by a comma-separated string. So find_in_set('a,b,c', 'a,b,c,d') runs without error, but it can never find the string 'a,b,c' in any SET, because by definition that string contains commas.
You can also use this custom function
CREATE FUNCTION SPLIT_STR(
x VARCHAR(255),
delim VARCHAR(12),
pos INT
)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
delim, '');
DELIMITER $$
CREATE FUNCTION `FIND_SET_EQUALS`(`s1` VARCHAR(200), `s2` VARCHAR(200))
RETURNS TINYINT(1)
LANGUAGE SQL
BEGIN
DECLARE a INT Default 0 ;
DECLARE isEquals TINYINT(1) Default 0 ;
DECLARE str VARCHAR(255);
IF s1 IS NOT NULL AND s2 IS NOT NULL THEN
simple_loop: LOOP
SET a=a+1;
SET str= SPLIT_STR(s2,",",a);
IF str='' THEN
LEAVE simple_loop;
END IF;
#Do check is in set
IF FIND_IN_SET(str, s1)=0 THEN
SET isEquals=0;
LEAVE simple_loop;
END IF;
SET isEquals=1;
END LOOP simple_loop;
END IF;
RETURN isEquals;
END;
$$
DELIMITER ;
SELECT FIND_SET_EQUALS('a,c,b', 'a,b,c'); -- returns 1
SELECT FIND_SET_EQUALS('a,c', 'a,b,c'); -- returns 0
SELECT FIND_SET_EQUALS(null, 'a,b,c'); -- returns 0
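Despite its name, FIND_SET_EQUALS returns 1 when every element of s2 is found in s1, i.e. it tests whether s2 is a subset of s1. A Python mirror of the loop (a sketch, not MySQL code), including its early exit on the first empty element returned by SPLIT_STR:

```python
def find_set_equals(s1, s2) -> int:
    """1 if every element of s2 is in s1, else 0 (NULL/None input gives 0)."""
    if s1 is None or s2 is None:
        return 0
    members = s1.split(",")
    checked_any = False
    for item in s2.split(","):
        if item == "":            # SPLIT_STR returned '' -> LEAVE simple_loop
            break
        if item not in members:   # FIND_IN_SET(str, s1) = 0
            return 0
        checked_any = True        # SET isEquals = 1
    return 1 if checked_any else 0
```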
Wow, I'm surprised no one ever mentioned this here. In a nutshell, if you know the order of your set's members, you can query with a single bitwise operation:
SELECT * FROM example_table WHERE (example_set & mbits) = mbits;
Explanation:
If we had a set that has members in this order: "HTML", "CSS", "PHP", "JS"... etc.
This is how MySQL represents them internally (one bit per member):
"HTML" = 0001 = 1
"CSS" = 0010 = 2
"PHP" = 0100 = 4
"JS" = 1000 = 8
So for example, if you want to query all rows that have "HTML" and "CSS" in their sets, then you'll write
SELECT * FROM example_table WHERE (example_set & 3) = 3;
Because 0011 is 3 which is both 0001 "HTML" and 0010 "CSS".
Your sets can still be queried using the other methods like REGEXP , LIKE, FIND_IN_SET(), and so on. Use whatever you need.
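The bitmask arithmetic can be checked with a short Python sketch. MEMBERS is a hypothetical SET('HTML','CSS','PHP','JS') definition; member i gets bit value 2^i, and a row contains all wanted members exactly when (stored & mbits) == mbits.

```python
# Member order of the hypothetical SET('HTML','CSS','PHP','JS') column
MEMBERS = ["HTML", "CSS", "PHP", "JS"]


def set_value(*names: str) -> int:
    """Integer behind a SET value: bit i is on for member i."""
    return sum(1 << MEMBERS.index(n) for n in names)


def has_all(stored: int, mbits: int) -> bool:
    """The WHERE (example_set & mbits) = mbits test."""
    return (stored & mbits) == mbits
```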
Amazing answer by @Pavel Perminov! And also a nice comment by @doru for checking dynamically.
Building on that, here is what I made for PHP, using CONCAT(',','" . $country_lang_id . "', ',') REGEXP CONCAT(',(', REPLACE(YourColumnName, ',', '|'), '),'). The query below may be useful for anyone looking for ready-made PHP code.
$country_lang_id = "1,2";
$sql = "select a.* from tablename a where CONCAT(',','" . $country_lang_id . "', ',') REGEXP CONCAT(',(', REPLACE(a.country_lang_id, ',', '|'), '),') ";
You can also use the LIKE operator, for instance:
where setcolumn like '%a,b%'
or
where 'a,b,c,d' like '%b,c%'
which might work in some situations.
You can use IN to match a value against a list of values (this works when myvals holds a single value per row, not a comma-separated list):
SELECT * FROM table WHERE myvals IN ('a','b','c','d')