Preventing duplicates while selecting into variable in MySQL - mysql

My function generates unique combinations from Items table and stores into list variable. In the end of generation, it returns list as a result and application processes every combination: saves into Combo table.
The Problem
It checks every time for duplicates from another tables called Combo which is getting filled in the second step of process (by application, not by function itself.)
But, It doesn't check for duplicates inside listvariable before inserting into it newly generated combinations.
So I'm getting result from function with duplicates inside the result itself. For example. 3423 appears here in the result 2 times:
3410;3463;3423;3489;3446;3445;3417;3436;3497;3454;3491;3420;3502;3496;3458;3493;3439;3499;3497;3487;3486;3504;3458;3501;3503;3441;3443;3453;3508;3474;3469;3497;3508;3433;3451;3449;3422;3453;3428;3475;3474;3458;3480;3422;3488;3432;3501;3414;3425;3444;3509;3502;3440;3422;3472;3501;3477;3483;3449;3480;3456;3463;3493;3476;3479;3425;3485;3464;3410;3434;3488;3504;3439;3423;3434;3486;3448;3456;3496;3413;3428;3482;3439;3437;3473;3420;3439;3470;3463;3494;3415;3442;3428;3500;3488;3478;3475;3417;3472;3463
How can I check list itself for duplicates before insertion?
Details
My function:
SELECT gen_n_uniq_perms_by_cat(1, 100, 1, 45, 1, 120, 20) as comb
which look like:
BEGIN
SET #result := "";
SET #counter := 0;
iterat :
LOOP
SELECT
gen_uniq_perm_by_cat(
permSize ,
user_id ,
catID ,
itemType ,
tsc_id ,
tries
) INTO #combo;
IF(ISNULL(#combo)) THEN
RETURN #result;
ELSE
SET #result := CONCAT_WS(';' ,#result ,#combo);
END
IF;
SET #counter := #counter + 1;
IF #counter > permCount THEN
RETURN #result;
END
IF;
END
LOOP
iterat;
END
and gen_uniq_perm_by_cat looks like:
BEGIN
iterat :
LOOP
SELECT
SUBSTRING_INDEX(
GROUP_CONCAT(`id` ORDER BY RAND() SEPARATOR '-') ,
'-' ,
permSize
) INTO #list
FROM
`Item`
LEFT JOIN `ItemCategory` ON `Item`.`id` = `ItemCategory`.`itemID`
WHERE
(`Item`.`user_id` = user_id)
AND(`ItemCategory`.`catID` = catID)
AND(`Item`.`type` = itemType);
SET #md5 := MD5(CONCAT_WS('-' , #list , tsc_id));
IF(
SELECT
count(*)
FROM
`Combo`
WHERE
`Combo`.`hash` = #md5
LIMIT 1
) = 0 THEN
RETURN #list;
END
IF;
SET tries := tries - 1;
IF tries = 0 THEN
RETURN NULL;
END
IF;
END
LOOP
iterat;
END
generates unique (never created in past) combinations by following arguments:
permSize = 1
permCount =100
user_id = 1
catID = 45
itemType = 1
tsc_id = 120
tries = 20

Use NOT LIKE for this purpose. In your case, replace corresponding condition lines with following:
IF(ISNULL(#combo)) THEN
RETURN #result;
END IF;
IF(#result NOT LIKE CONCAT('%' , #combo , '%')) THEN
SET #result := CONCAT_WS(';' ,#result ,#combo);
SET #counter := #counter + 1;
END IF;
IF #counter = permCount THEN
RETURN #result;
END IF;

Related

Running the loop stops after changing the Counter Variable

When I run the following code, it always returns a NULL value for the two variables inside the loop.
SET #c = '';
SET #i = 4;
REPEAT
SET #c = CONCAT(
' -> ',
(
SELECT name
FROM c
WHERE id = #i
),
#c
);
SET #i = (
SELECT pid
FROM c
WHERE id = #i
);
UNTIL TRIM(COALESCE(#i, '')) = ''
END REPEAT;
The loop must be run at least 4 times; But in the second run, the run of the loop ends.
The value is set for variables in the first time that the loop is executed; But in the second run of the loop, the value of both changes to NULL and the loop stops. However, inside table c there are related values for the execution of the loop.
The problem occurs when the value of the variable #i in the loop
changes.
For example, the first time the loop is executed, the value of #i becomes 3, and there is an id with a value of 3 in table c in addition to the initial value of 4.
Structure of Table c
CREATE TABLE c (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
name varchar(50) NOT NULL,
pid int UNSIGNED DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE = INNODB,
AUTO_INCREMENT = 1,
CHARACTER SET utf8mb4,
COLLATE utf8mb4_unicode_ci;
ALTER TABLE c
ADD CONSTRAINT c_pid_foreign FOREIGN KEY (pid) REFERENCES c (id);
INSERT INTO c(id, name, pid)
VALUES (1, '1st', NULL),
(2, '2nd', 1),
(3, '3rd', 2),
(4, '4th', 3);
Thank you in advance for the good cooperation of the people to solve this problem.
I was finally able to solve my question after about more than a day; Of course, I did not understand the cause of this problem and I do not know why MySQL changes the return value of the result of the SELECT statement to NULL when using the #i variable after changing its value inside the loop?!!!
The solution is as follows:
SET #d = '';
SET #j = 4;
REPEAT
SELECT #c := name
FROM c
WHERE id = #j;
SET #d = CONCAT(' -> ', #c, #d);
SELECT #i := pid
FROM c
WHERE id = #j;
SET #j = #i;
UNTIL TRIM(COALESCE(#j, '')) = ''
END REPEAT;
Here are the best solution to this problem:
SET #d = '';
SET #j = 4;
REPEAT
SELECT name INTO #c
FROM c
WHERE id = #j;
SET #d = CONCAT(' -> ', #c, #d);
SELECT pid INTO #i
FROM c
WHERE id = #j;
SET #j = #i;
UNTIL TRIM(COALESCE(#j, '')) = ''
END REPEAT;
I hope it is useful for others as well.

How to speed up stored procedure in mysql

Below given is my procedure takes too much time to execute.
BEGIN
DECLARE rank1 BIGINT DEFAULT 0;
DECLARE id1 BIGINT;
DECLARE rankskip BIGINT DEFAULT 0;
DECLARE mark DECIMAL(10,2) DEFAULT 0;
DECLARE oldmark DECIMAL(10,2) DEFAULT -100000;
DECLARE done int DEFAULT 0;
DECLARE cursor_i CURSOR FOR
SELECT
(rightmarks - negativemarks) as mark, id
FROM
testresult
WHERE
testid = testid1
ORDER BY
(rightmarks - negativemarks) DESC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN cursor_i;
read_loop: LOOP
FETCH cursor_i INTO mark, id1;
IF done = 1 THEN
LEAVE read_loop;
END IF;
IF oldmark = mark THEN
BEGIN
IF IsRankSkip = 1 THEN
BEGIN
SET rankskip = rankskip + 1;
END;
END IF;
END;
ELSE
BEGIN
SET rank1 = rank1 + rankskip + 1;
SET rankskip = 0;
END;
END IF;
SET oldmark = mark;
UPDATE testresult SET rank = rank1 WHERE id=id1;
END LOOP;
CLOSE cursor_i;
END
This loop iterate minimum 2000 times.
Here IsRankSkip and testid1 is an argument passed to the procedure.
This procedure takes 65.343152046204 time to execute. If anybody guide me how can I reduce executing time?
Thank you in advance.
You can do this with a single update statement, making use of variables that change during the execution of it:
UPDATE testresult a
JOIN ( SELECT id,
#row := #row + 1 row_number,
#rank := if(mark = #lastmark, #rank, #row) as rank,
#dense_rank := #dense_rank + if(mark = #lastmark, 0, 1) as dense_rank,
#lastmark := mark as mark
FROM ( SELECT rightmarks - negativemarks as mark,
id
FROM testresult
WHERE testid = testid1
ORDER BY 1 DESC
) data,
(SELECT #row := 0, #dense_rank := 0) r
) b
ON a.id = b.id
SET a.rank = if(IsRankSkip, b.rank, b.dense_rank);
The query with alias b calculates the rank and adds it as a column in the result set. In fact, it adds three kinds of numbers:
#row: the sequential row number without special treatment of equal values
#rank: the same as row number of the value differs from the previous value, otherwise it is the same as in the previous row
#dense_rank: this increments when the value differs from the previous value, otherwise it is the same as in the previous row
You can choose which one to update your rank column with. In the above SQL I have used the two procedure variables IsRankSkip and testid1.
Remark
If you are always calling your procedure for all testid values, then the above can be further improved, so all these updates are done with only one update statement.

Get data from comma separated string input

I have a stored procedure where input is a comma separated string say '12341,34567,12446,12997' and it is not sure that the input string always carries numerical data. It may be '12341,34as67,12$46,1we97' so I need to validate them and use only the valid data in query.
Say my query is (Where the column OrderCode is int type)
select * from dbo.DataCollector where OrderCode in (12341,34567,12446,12997)
or only the valid data if other are invalid
select * from dbo.DataCollector where OrderCode in (12341)
For such situation what would be a good solution.
One way that works also in SQl-Server 2005 would be to create a split-function, then you can use ISNUMERIC to check if it's a number:
DECLARE #Input VARCHAR(MAX) = '12341,34as67,12$46,1we97'
SELECT i.Item FROM dbo.Split(#Input, ',')i
WHERE IsNumeric(i.Item) = 1
Demo
Your complete query:
select * from dbo.DataCollector
where OrderCode in ( SELECT i.Item FROM dbo.Split(#Input, ',')i
WHERE IsNumeric(i.Item) = 1 )
Here is the split-function which i use:
CREATE FUNCTION [dbo].[Split]
(
#ItemList NVARCHAR(MAX),
#delimiter CHAR(1)
)
RETURNS #ItemTable TABLE (Item VARCHAR(250))
AS
BEGIN
DECLARE #tempItemList NVARCHAR(MAX)
SET #tempItemList = #ItemList
DECLARE #i INT
DECLARE #Item NVARCHAR(4000)
SET #i = CHARINDEX(#delimiter, #tempItemList)
WHILE (LEN(#tempItemList) > 0)
BEGIN
IF #i = 0
SET #Item = #tempItemList
ELSE
SET #Item = LEFT(#tempItemList, #i - 1)
INSERT INTO #ItemTable(Item) VALUES(#Item)
IF #i = 0
SET #tempItemList = ''
ELSE
SET #tempItemList = RIGHT(#tempItemList, LEN(#tempItemList) - #i)
SET #i = CHARINDEX(#delimiter, #tempItemList)
END
RETURN
END
Edit according to the comment of Damien that ISNUMERIC has it's issues. You can use this function to check if it's a real integer:
CREATE FUNCTION dbo.IsInteger(#Value VarChar(18))
RETURNS Bit
AS
BEGIN
RETURN IsNull(
(Select Case When CharIndex('.', #Value) > 0
Then Case When Convert(int, ParseName(#Value, 1)) <> 0
Then 0
Else 1
End
Else 1
End
Where IsNumeric(#Value + 'e0') = 1), 0)
END
Here is another example with damien's "bad" input which contains £ and 0d0:
Demo

T-SQL: split and aggregate comma-separated values

I have the following table with each row having comma-separated values:
ID
-----------------------------------------------------------------------------
10031,10042
10064,10023,10060,10065,10003,10011,10009,10012,10027,10004,10037,10039
10009
20011,10027,10032,10063,10023,10033,20060,10012,10020,10031,10011,20036,10041
I need to get a count for each ID (a groupby).
I am just trying to avoid cursor implementation and stumped on how to do this without cursors.
Any Help would be appreciated !
You will want to use a split function:
create FUNCTION [dbo].[Split](#String varchar(MAX), #Delimiter char(1))
returns #temptable TABLE (items varchar(MAX))
as
begin
declare #idx int
declare #slice varchar(8000)
select #idx = 1
if len(#String)<1 or #String is null return
while #idx!= 0
begin
set #idx = charindex(#Delimiter,#String)
if #idx!=0
set #slice = left(#String,#idx - 1)
else
set #slice = #String
if(len(#slice)>0)
insert into #temptable(Items) values(#slice)
set #String = right(#String,len(#String) - #idx)
if len(#String) = 0 break
end
return
end;
And then you can query the data in the following manner:
select items, count(items)
from table1 t1
cross apply dbo.split(t1.id, ',')
group by items
See SQL Fiddle With Demo
Well, the solution i always use, and probably there might be a better way, is to use a function that will split everything. No use for cursors, just a while loop.
if OBJECT_ID('splitValueByDelimiter') is not null
begin
drop function splitValueByDelimiter
end
go
create function splitValueByDelimiter (
#inputValue varchar(max)
, #delimiter varchar(1)
)
returns #results table (value varchar(max))
as
begin
declare #delimeterIndex int
, #tempValue varchar(max)
set #delimeterIndex = 1
while #delimeterIndex > 0 and len(isnull(#inputValue, '')) > 0
begin
set #delimeterIndex = charindex(#delimiter, #inputValue)
if #delimeterIndex > 0
set #tempValue = left(#inputValue, #delimeterIndex - 1)
else
set #tempValue = #inputValue
if(len(#tempValue)>0)
begin
insert
into #results
select #tempValue
end
set #inputValue = right(#inputValue, len(#inputValue) - #delimeterIndex)
end
return
end
After that you can call the output like this :
if object_id('test') is not null
begin
drop table test
end
go
create table test (
Id varchar(max)
)
insert
into test
select '10031,10042'
union all select '10064,10023,10060,10065,10003,10011,10009,10012,10027,10004,10037,10039'
union all select '10009'
union all select '20011,10027,10032,10063,10023,10033,20060,10012,10020,10031,10011,20036,10041'
select value
from test
cross apply splitValueByDelimiter(Id, ',')
Hope it helps, although i am still looping through everything
After reiterating the comment above about NOT putting multiple values into a single column (Use a separate child table with one value per row!),
Nevertheless, one possible approach: use a UDF to convert delimited string to a table. Once all the values have been converted to tables, combine all the tables into one table and do a group By on that table.
Create Function dbo.ParseTextString (#S Text, #delim VarChar(5))
Returns #tOut Table
(ValNum Integer Identity Primary Key,
sVal VarChar(8000))
As
Begin
Declare #dlLen TinyInt -- Length of delimiter
Declare #wind VarChar(8000) -- Will Contain Window into text string
Declare #winLen Integer -- Length of Window
Declare #isLastWin TinyInt -- Boolean to indicate processing Last Window
Declare #wPos Integer -- Start Position of Window within Text String
Declare #roVal VarChar(8000)-- String Data to insert into output Table
Declare #BtchSiz Integer -- Maximum Size of Window
Set #BtchSiz = 7900 -- (Reset to smaller values to test routine)
Declare #dlPos Integer -- Position within Window of next Delimiter
Declare #Strt Integer -- Start Position of each data value within Window
-- -------------------------------------------------------------------------
-- ---------------------------
If #delim is Null Set #delim = '|'
If DataLength(#S) = 0 Or
Substring(#S, 1, #BtchSiz) = #delim Return
-- --------------------------------------------
Select #dlLen = DataLength(#delim),
#Strt = 1, #wPos = 1,
#wind = Substring(#S, 1, #BtchSiz)
Select #winLen = DataLength(#wind),
#isLastWin = Case When DataLength(#wind) = #BtchSiz
Then 0 Else 1 End,
#dlPos = CharIndex(#delim, #wind, #Strt)
-- --------------------------------------------
While #Strt <= #winLen
Begin
If #dlPos = 0 Begin -- No More delimiters in window
If #isLastWin = 1 Set #dlPos = #winLen + 1
Else Begin
Set #wPos = #wPos + #Strt - 1
Set #wind = Substring(#S, #wPos, #BtchSiz)
-- ----------------------------------------
Select #winLen = DataLength(#wind), #Strt = 1,
#isLastWin = Case When DataLength(#wind) = #BtchSiz
Then 0 Else 1 End,
#dlPos = CharIndex(#delim, #wind, 1)
If #dlPos = 0 Set #dlPos = #winLen + 1
End
End
-- -------------------------------
Insert #tOut (sVal)
Select LTrim(Substring(#wind,
#Strt, #dlPos - #Strt))
-- -------------------------------
-- Move #Strt to char after last delimiter
Set #Strt = #dlPos + #dlLen
Set #dlPos = CharIndex(#delim, #wind, #Strt)
End
Return
End
Then write, (using your table schema),
Declare #AllVals VarChar(8000)
Select #AllVals = Coalesce(#allVals + ',', '') + ID
From Table Where ID Is Not null
-- -----------------------------------------
Select sVal, Count(*)
From dbo.ParseTextString(#AllVals, ',')
Group By sval

T-SQL strip all non-alpha and non-numeric characters

Is there a smarter way to remove all special characters rather than having a series of about 15 nested replace statements?
The following works, but only handles three characters (ampersand, blank and period).
select CustomerID, CustomerName,
Replace(Replace(Replace(CustomerName,'&',''),' ',''),'.','') as CustomerNameStripped
from Customer
One flexible-ish way;
CREATE FUNCTION [dbo].[fnRemovePatternFromString](#BUFFER VARCHAR(MAX), #PATTERN VARCHAR(128)) RETURNS VARCHAR(MAX) AS
BEGIN
DECLARE #POS INT = PATINDEX(#PATTERN, #BUFFER)
WHILE #POS > 0 BEGIN
SET #BUFFER = STUFF(#BUFFER, #POS, 1, '')
SET #POS = PATINDEX(#PATTERN, #BUFFER)
END
RETURN #BUFFER
END
select dbo.fnRemovePatternFromString('cake & beer $3.99!?c', '%[$&.!?]%')
(No column name)
cake beer 399c
Create a function:
CREATE FUNCTION dbo.StripNonAlphaNumerics
(
#s VARCHAR(255)
)
RETURNS VARCHAR(255)
AS
BEGIN
DECLARE #p INT = 1, #n VARCHAR(255) = '';
WHILE #p <= LEN(#s)
BEGIN
IF SUBSTRING(#s, #p, 1) LIKE '[A-Za-z0-9]'
BEGIN
SET #n += SUBSTRING(#s, #p, 1);
END
SET #p += 1;
END
RETURN(#n);
END
GO
Then:
SELECT Result = dbo.StripNonAlphaNumerics
('My Customer''s dog & #1 friend are dope, yo!');
Results:
Result
------
MyCustomersdog1friendaredopeyo
To make it more flexible, you could pass in the pattern you want to allow:
CREATE FUNCTION dbo.StripNonAlphaNumerics
(
#s VARCHAR(255),
#pattern VARCHAR(255)
)
RETURNS VARCHAR(255)
AS
BEGIN
DECLARE #p INT = 1, #n VARCHAR(255) = '';
WHILE #p <= LEN(#s)
BEGIN
IF SUBSTRING(#s, #p, 1) LIKE #pattern
BEGIN
SET #n += SUBSTRING(#s, #p, 1);
END
SET #p += 1;
END
RETURN(#n);
END
GO
Then:
SELECT r = dbo.StripNonAlphaNumerics
('Bob''s dog & #1 friend are dope, yo!', '[A-Za-z0-9]');
Results:
r
------
Bobsdog1friendaredopeyo
I faced this problem several years ago, so I wrote a SQL function to do the trick. Here is the original article (was used to scrape text out of HTML). I have since updated the function, as follows:
IF (object_id('dbo.fn_CleanString') IS NOT NULL)
BEGIN
PRINT 'Dropping: dbo.fn_CleanString'
DROP function dbo.fn_CleanString
END
GO
PRINT 'Creating: dbo.fn_CleanString'
GO
CREATE FUNCTION dbo.fn_CleanString
(
#string varchar(8000)
)
returns varchar(8000)
AS
BEGIN
---------------------------------------------------------------------------------------------------
-- Title: CleanString
-- Date Created: March 26, 2011
-- Author: William McEvoy
--
-- Description: This function removes special ascii characters from a string.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
declare #char char(1),
#len int,
#count int,
#newstring varchar(8000),
#replacement char(1)
select #count = 1,
#len = 0,
#newstring = '',
#replacement = ' '
---------------------------------------------------------------------------------------------------
-- M A I N P R O C E S S I N G
---------------------------------------------------------------------------------------------------
-- Remove Backspace characters
select #string = replace(#string,char(8),#replacement)
-- Remove Tabs
select #string = replace(#string,char(9),#replacement)
-- Remove line feed
select #string = replace(#string,char(10),#replacement)
-- Remove carriage return
select #string = replace(#string,char(13),#replacement)
-- Condense multiple spaces into a single space
-- This works by changing all double spaces to be OX where O = a space, and X = a special character
-- then all occurrences of XO are changed to O,
-- then all occurrences of X are changed to nothing, leaving just the O which is actually a single space
select #string = replace(replace(replace(ltrim(rtrim(#string)),' ', ' ' + char(7)),char(7)+' ',''),char(7),'')
-- Parse each character, remove non alpha-numeric
select #len = len(#string)
WHILE (#count <= #len)
BEGIN
-- Examine the character
select #char = substring(#string,#count,1)
IF (#char like '[a-z]') or (#char like '[A-Z]') or (#char like '[0-9]')
select #newstring = #newstring + #char
ELSE
select #newstring = #newstring + #replacement
select #count = #count + 1
END
return #newstring
END
GO
IF (object_id('dbo.fn_CleanString') IS NOT NULL)
PRINT 'Function created.'
ELSE
PRINT 'Function NOT created.'
GO
I know this is an old thread, but still, might be handy for others.
Here's a quick and dirty (Which I've done inversely - stripping out non-numerics) - using a recursive CTE.
What makes this one nice for me is that it's an inline function - so gets around the nasty RBAR effect of the usual scalar and table-valued functions.
Adjust your filter as needs be to include or exclude whatever char types.
Create Function fncV1_iStripAlphasFromData (
#iString Varchar(max)
)
Returns
Table With Schemabinding
As
Return(
with RawData as
(
Select #iString as iString
)
,
Anchor as
(
Select Case(IsNumeric (substring(iString, 1, 1))) when 1 then substring(iString, 1, 1) else '' End as oString, 2 as CharPos from RawData
UNION ALL
Select a.oString + Case(IsNumeric (substring(#iString, a.CharPos, 1))) when 1 then substring(#iString, a.CharPos, 1) else '' End, a.CharPos + 1
from RawData r
Inner Join Anchor a on a.CharPos <= len(rtrim(ltrim(#iString)))
)
Select top 1 oString from Anchor order by CharPos Desc
)
Go
select * from dbo.fncV1_iStripAlphasFromData ('00000')
select * from dbo.fncV1_iStripAlphasFromData ('00A00')
select * from dbo.fncV1_iStripAlphasFromData ('12345ABC6789!&*0')
If you can use SQL CLR you can use .NET regular expressions for this.
There is a third party (free) package that includes this and more - SQL Sharp .