In a recent post, Sql server rtrim not working for me, suggestions?, I got some good help getting a CSV string out of a select query. It's behaving unexpectedly though, and I can't find any similar examples or documentation on it. The query returns 802 records without the COALESCE, as a normal select. With the COALESCE, I'm getting back just 81. I get the same result whether I output to text or to a file. This query returns 800+ rows:
declare @maxDate date = (select MAX(TradeDate) from tblDailyPricingAndVol)
select p.Symbol, ','
from tblDailyPricingAndVol p
where p.Volume > 1000000 and p.Clse <= 40 and p.TradeDate = @maxDate
order by p.Symbol
But when I attempt to concatenate those values, many are missing:
declare @maxDate date = (select MAX(TradeDate) from tblDailyPricingAndVol)
declare @str VARCHAR(MAX)
SELECT @str = COALESCE(@str+',' ,'') + LTRIM(RTRIM((p.Symbol)))
FROM tblDailyPricingAndVol p
WHERE p.Volume > 1000000 and p.Clse <= 40 and p.TradeDate = @maxDate
ORDER by p.Symbol
SELECT @str
That should be working fine; however, here is how I would do it:
DECLARE @str VARCHAR(MAX) = '';
SELECT @str += ',' + LTRIM(RTRIM(Symbol))
FROM dbo.tblDailyPricingAndVol
WHERE Volume > 1000000 AND Clse <= 40 AND TradeDate = @maxDate -- @maxDate declared as in the question
ORDER BY Symbol;
SET @str = STUFF(@str, 1, 1, '');
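On SQL Server 2017 or later you could also skip the variable assignment entirely and let STRING_AGG build the list; a minimal sketch using the same filter (the CAST keeps the result from being truncated at 8000 bytes):
DECLARE @maxDate date = (SELECT MAX(TradeDate) FROM dbo.tblDailyPricingAndVol);

SELECT STRING_AGG(CAST(LTRIM(RTRIM(Symbol)) AS varchar(max)), ',')
       WITHIN GROUP (ORDER BY Symbol)
FROM dbo.tblDailyPricingAndVol
WHERE Volume > 1000000 AND Clse <= 40 AND TradeDate = @maxDate;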
To determine whether the string is complete, stop looking at the output in Management Studio. This is always going to be truncated if you exceed the number of characters Management Studio will show. You can run a couple of tests to check the variable without inspecting it in its entirety:
A. Compare the datalength of the individual parts to the datalength of the result.
SELECT SUM(DATALENGTH(LTRIM(RTRIM(Symbol)))) FROM dbo.tblDailyPricingAndVol
WHERE ...
-- concatenation query here
SELECT DATALENGTH(@str);
-- these should be equal or off by one.
B. Compare the end of the variable to the last element in the set.
SELECT TOP 1 Symbol FROM dbo.tblDailyPricingAndVol
WHERE ...
ORDER BY Symbol DESC;
-- concatenation query here
SELECT RIGHT(@str, 20);
-- is the last element in the set represented at the end of the string?
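C. Compare the row count to the number of delimiters in the finished string (a sketch, assuming the symbols themselves never contain commas).
SELECT COUNT(*) FROM dbo.tblDailyPricingAndVol
WHERE ...
-- concatenation query (including the STUFF that strips the leading comma) here
SELECT LEN(@str) - LEN(REPLACE(@str, ',', '')) + 1;
-- comma count + 1 should equal the row count.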
I've been asked to create a VIEW off a table that includes a varchar(MAX) column containing a JSON string. Unfortunately, some of the entries contain double quotes that aren't escaped.
Example (invalid in Notes):
{"Eligible":"true","Reason":"","Notes":"Left message for employee to "call me"","EDate":"08/16/2021"}
I don't have access to correct wherever this is being inserted so I just have to work with the data as is.
So in my view I need to find a way to escape those double quotes.
I'm pulling the data like so:
JSON_VALUE(JsonData, '$.Notes') as Notes
However, I get the following error:
JSON text is not properly formatted. Unexpected character '"' is found at position 102.
I can't do a simple replace on the whole field because that would create invalid JSON also.
I tried JSON_MODIFY but ran into the problem of getting the Notes field to replace itself:
JSON_MODIFY(JsonData, '$.Notes', REPLACE(JSON_VALUE(JsonData, '$.Notes'), '"', '\"'))
Maybe I'm missing something obvious, but I can't figure out how to handle this. Is there a way to escape those double quotes in my query?
So this is incredibly hacky, and there are probably inputs that could break it as it stands, but if you absolutely can't fix your source data output, or simply flag bad JSON for manual adjustment, this may be the route you need to take and flesh out further.
Based on your example and a couple of extras I have thrown in, with the help of a custom string-splitting table-valued function that maintains sort order, you can achieve the output as follows:
Query
declare @t table (JsonData nvarchar(max));
insert into @t values('{"Eligible":true,"Reason":"","Notes":"Left message for employee to "call me"","EDate":"08/16/2021","Test": "999","Another Test":"Value with " character"}');
with q as
(
select t.JsonData
,s.rn
,case when right(trim(lag(s.item,1) over (order by s.rn)),1) in('{',':',',')
then '"'
else ''
end -- Do we need a starting double quote?
+ s.item -- Value from the split text
+ case when right(trim(lead(s.item,1) over (order by s.rn)),1) not in('}',':',',')
and right(trim(s.item),1) not in('{','}',':',',')
then '\"'
else ''
end -- Do we need an escaped double quote?
+ case when left(trim(lead(s.item,1) over (order by s.rn)),1) in('}',':',',')
then '"'
else ''
end -- Do we need an ending double quote?
as Quoted
from @t as t
cross apply dbo.fn_StringSplit4k(t.JsonData,'"',null) as s -- By splitting on " characters, we know where they all are even though they are removed, so we can add them back in as required based on the remaining text
)
,j as
(
select JsonData
,string_agg(Quoted,'') within group (order by rn) as JsonFixed
from q
group by JsonData
)
select json_value(JsonFixed, '$.Eligible') as Eligible
,json_value(JsonFixed, '$.Reason') as Reason
,json_value(JsonFixed, '$.Notes') as Notes
,json_value(JsonFixed, '$.EDate') as EDate
,json_value(JsonFixed, '$.Test') as Test
,json_value(JsonFixed, '$."Another Test"') as AnotherTest
from j;
Output
Eligible | Reason | Notes                                  | EDate      | Test | AnotherTest
true     |        | Left message for employee to "call me" | 08/16/2021 | 999  | Value with " character
String Splitter
create function [dbo].[fn_StringSplit4k]
(
@str nvarchar(4000) = ' ' -- String to split.
,@delimiter as nvarchar(1) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return.
)
returns table
as
return
-- Start tally table with 10 rows.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Select the same number of rows as characters in @str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
,t(t) as (select top (select len(isnull(@str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where substring(isnull(@str,''),t,1) = @delimiter)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,isnull(nullif(charindex(@delimiter,isnull(@str,''),s),0)-s,4000) from s)
select rn
,item
from(select row_number() over(order by s) as rn
,substring(@str,s,l) as item
from l
) a
where rn = @num
or @num is null;
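A quick sanity check of the splitter (any short test string will do); note that empty elements are preserved, which is what lets the rebuild above put the quotes back in the right places:
select rn, item
from dbo.fn_StringSplit4k('a,b,,c', ',', null);
-- rn 1-4, item = 'a', 'b', '' (empty), 'c'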
I would like to suggest a scalar function along these lines:
CREATE FUNCTION dbo.clearJSon(@v nvarchar(max)) RETURNS nvarchar(max)
AS
BEGIN
DECLARE @i AS int
DECLARE @security int
SET @i=PATINDEX('%[^{:,]"[^,:}]%',@v)
SET @security=0 -- just to prevent an endless loop
WHILE @i>0 and @security<100
BEGIN
SET @v = LEFT(@v,@i)+''''+SUBSTRING(@v,@i+2,len(@v))
SET @i=PATINDEX('%[^{:,]"[^,:}]%',@v)
SET @security = @security+1
END
RETURN @v
END
which returns
{"Eligible":"true","Reason":"","Notes":"Left message for employee to 'call me'","EDate":"08/16/2021"} as the result of dbo.clearJSon(JsonData)
I have to admit, though, that the above code would fail if an unescaped quote were followed by one of ,:} or preceded by one of {:,
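For completeness, a minimal usage sketch (the table name is a placeholder for the real source):
SELECT JSON_VALUE(dbo.clearJSon(JsonData), '$.Notes') AS Notes
FROM dbo.YourTable;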
In the code below, I'm trying to go through the results of endDateTable row by row, comparing the current row's endDate to the previous row's endDate. If there has been any change since the previous row, we increment @revisionNum. However, upon populating the new table, all of the @revisionNum entries are 0. What am I doing wrong?
NOTE: I'm using prepared statements in this manner since doing a straightforward SELECT into a variable gives a syntax error due to the LIMIT clause not allowing a variable in our version of MySQL.
BEGIN
DECLARE _currentEndDate DATETIME DEFAULT now();
DECLARE _priorEndDate DATETIME DEFAULT now();
SET @ResultsCount = (SELECT COUNT(*) FROM mainTable);
SET @j = 0;
WHILE @j < @ResultsCount DO
SET @revisionNum = 0;
/*CURRENT END DATE*/
SET @appResultQueryCurrent = CONCAT('
SELECT
end_date
INTO _currentEndDate
FROM endDateTable
LIMIT ', @j, ', 1'
);
PREPARE currentQueryStmt FROM @appResultQueryCurrent;
EXECUTE currentQueryStmt;
/*PREVIOUS END DATE*/
SET @appResultQueryPrior = CONCAT('
SELECT
end_date
INTO _priorAppEndDate
FROM endDateTable
LIMIT ', IF(@j = 0, 0, @j - 1), ', 1'
);
PREPARE priorQueryStmt FROM @appResultQueryPrior;
EXECUTE priorQueryStmt;
SET @revisionNum = IF(
@j = 0 OR (_currentEndDate = _priorEndDate),
@revisionNum,
IF(
_currentEndDate != _priorEndDate,
@revisionNum + 1,
@revisionNum
)
);
INSERT INTO finalTable (RevisionNum)
SELECT
@revisionNum AS RevisionNum
FROM endDateTable;
SET @j = @j +1;
END WHILE;
END $$
You don't need a loop, you can use INSERT INTO ... SELECT ..., incrementing the variable in the select query.
You also need an ORDER BY criteria to specify how to order the rows when comparing one row to the previous row.
INSERT INTO finalTable (RevisionNum, otherColumn)
SELECT revision, otherColumn
FROM (
SELECT IF(end_date = @prev_end_date, @revision, @revision := @revision + 1) AS revision,
@prev_end_date := end_date,
otherColumn
FROM endDateTable
CROSS JOIN (SELECT @prev_end_date := NULL, @revision := -1) AS vars
ORDER BY id) AS x
DEMO
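On MySQL 8.0 or later you could also avoid the user variables entirely with window functions; a rough equivalent (assuming id is the ordering column, as above):
INSERT INTO finalTable (RevisionNum, otherColumn)
SELECT SUM(changed) OVER (ORDER BY id) AS revision,
       otherColumn
FROM (
    SELECT id,
           otherColumn,
           CASE WHEN LAG(end_date) OVER (ORDER BY id) IS NULL
                  OR end_date <=> LAG(end_date) OVER (ORDER BY id)
                THEN 0 ELSE 1 END AS changed
    FROM endDateTable
) AS x;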
The offset value in the LIMIT clause is tenuous without an ORDER BY.
Without an ORDER BY clause, MySQL is free to return results in any sequence.
There is no guarantee that LIMIT 41,1 will return the row before LIMIT 42,1, or that it won't return the exact same row as LIMIT 13,1 did.
(A table in a relational database represents an unordered set of tuples; there is no guaranteed "order" of rows in a table.)
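If the row-by-row queries were kept at all, each one would at least need a deterministic ordering, for example (assuming id is a unique key):
SET @appResultQueryCurrent = CONCAT('
SELECT end_date
INTO _currentEndDate
FROM endDateTable
ORDER BY id
LIMIT ', @j, ', 1'
);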
But just adding ORDER BY to the queries isn't enough to fix the Rube-Goldberg-esque rigmarole.
In the code shown, it looks like each time through the loop, we're inserting a copy of endDateTable into finalTable. If that's 1,000 rows in endDateTable, we're going to get 1,000,000 rows (1,000 x 1,000) inserted into finalTable. Not at all clear why we need so many copies.
Given the code shown, it's not clear what the objective is. Looks like we are conditionally incrementing revisionNum, the end result of which is the highest revision num. Just guessing here.
If there is some kind of requirement to do this in a LOOP construct, within a procedure, I'd think we'd do a cursor loop. And we can use procedure variables vs user-defined variables.
Something along these lines:
BEGIN
DECLARE ld_current_end_date DATETIME;
DECLARE ld_prior_end_date DATETIME;
DECLARE li_done INT;
DECLARE li_revision_num INT;
DECLARE lcsr_end_date CURSOR FOR SELECT t.end_date FROM `endDateTable` t ORDER BY NULL;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET li_done = TRUE;
SET li_done = FALSE;
SET li_revision_num = 0;
OPEN lcsr_end_date;
FETCH lcsr_end_date INTO ld_current_end_date;
SET ld_prior_end_date = ld_current_end_date;
WHILE NOT li_done DO
SET li_revision_num = li_revision_num + IF( ld_current_end_date <=> ld_prior_end_date ,0,1);
SET ld_prior_end_date := ld_current_end_date;
FETCH lcsr_end_date INTO ld_current_end_date;
END WHILE;
CLOSE lcsr_end_date;
INSERT INTO `finalTable` (revisionnum) VALUES (li_revision_num);
END $$
Note the "order by" clause on the SELECT, its not clear what the rows should be ordered on, so we're using a literal as a placeholder.
As the end result, we insert a single row into finalTable.
Again, it's not clear what the code in the question is supposed to achieve, but doing a cursor loop across ordered rows would be much more efficient than a bazillion dynamic SQL executions fetching individual rows.
So I have the following sql that works:
SELECT CONCAT_WS('',`brand`,`pattern`,`product_code`,`size`,`normal_price`,`sale_price`,`text_special`,`load_index`,`speed_index`,`id`) `all_columns` FROM baz_tyres
HAVING (`all_columns` LIKE '%con%')
ORDER BY brand asc
LIMIT 0, 10
This works fine, returning any record that contains the search string in any column, such as "ContinentalContiCrossContact® LX21549340225/65R17".
But then I also wanted to be able to match just the numerical value from size, so I added the custom function I found here:
How to get only Digits from String in mysql?
DELIMITER $$
CREATE FUNCTION `ExtractNumber`(in_string VARCHAR(50))
RETURNS INT
NO SQL
BEGIN
DECLARE ctrNumber VARCHAR(50);
DECLARE finNumber VARCHAR(50) DEFAULT '';
DECLARE sChar VARCHAR(1);
DECLARE inti INTEGER DEFAULT 1;
IF LENGTH(in_string) > 0 THEN
WHILE(inti <= LENGTH(in_string)) DO
SET sChar = SUBSTRING(in_string, inti, 1);
SET ctrNumber = FIND_IN_SET(sChar, '0,1,2,3,4,5,6,7,8,9');
IF ctrNumber > 0 THEN
SET finNumber = CONCAT(finNumber, sChar);
END IF;
SET inti = inti + 1;
END WHILE;
RETURN CAST(finNumber AS UNSIGNED);
ELSE
RETURN 0;
END IF;
END$$
DELIMITER ;
Now that I have this function I also want to concat the resulting number and use it for the search.
So I added ExtractNumber(size) into the concat:
SELECT CONCAT_WS('',ExtractNumber(`size`),`brand`,`pattern`,`product_code`,`size`,`normal_price`,`sale_price`,`text_special`,`load_index`,`speed_index`,`id`) `all_columns` FROM baz_tyres
HAVING (`all_columns` LIKE '%con%')
ORDER BY brand asc
LIMIT 0, 10
When the function is involved, the LIKE search fails to find any matches. However, if I change the HAVING to a WHERE condition checking for a specific brand name...
SELECT CONCAT_WS('',ExtractNumber(`size`),`brand`,`pattern`,`product_code`,`size`,`normal_price`,`sale_price`,`text_special`,`load_index`,`speed_index`,`id`) `all_columns` FROM baz_tyres
WHERE (`brand` LIKE '%con%')
ORDER BY brand asc
LIMIT 0, 10
Then I can see that the concat with the function inside does actually work, returning "2147483647ContinentalVancoFourSeason 20473361205/7....", but performing a LIKE on the resulting string doesn't match when it should.
Some columns have special chars and I have tried casting the function result to utf8 and it had no effect.
Any ideas why I can't do a LIKE on this concatenated string?
UPDATE:
It's working now.
I had to put the conversion inside the function itself:
RETURN CAST(finNumber AS CHAR);
For some reason, CONVERT or CAST around the function call would produce the correct result, but still wouldn't allow a LIKE comparison to match afterwards.
So now the following query
SELECT *,CONCAT_WS('',ExtractNumber_CHAR(`size`),`brand`,`pattern`,`product_code`,`size`,`normal_price`,`sale_price`,`text_special`,`load_index`,`speed_index`,`id`) `all_columns`
FROM `baz_tyres`
HAVING (`all_columns` LIKE '%nat%' )
ORDER BY `brand` DESC
LIMIT 0,10
Produces the desired result. However, a new problem has now appeared that is really strange.
If I do the exact same query but order by ASC instead of DESC, then I get 0 results.
It is really strange that the ordering affects whether any results are returned. If the ExtractNumber function is removed, ordering either way returns results.
When I put the function back in, I can only get results when ordering by DESC.
Can anyone tell me why this odd behavior would occur?
The syntax for using the HAVING clause is this:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
HAVING is used with aggregate functions, after GROUP BY.
Try changing the query to use GROUP BY, or else use WHERE instead of HAVING.
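For example, a WHERE-based version of the query from the question might look like this (a sketch; the column list is shortened, and the expression is repeated because WHERE cannot reference the alias):
SELECT CONCAT_WS('', ExtractNumber(`size`), `brand`, `pattern`, `size`) AS all_columns
FROM baz_tyres
WHERE CONCAT_WS('', ExtractNumber(`size`), `brand`, `pattern`, `size`) LIKE '%con%'
ORDER BY brand ASC
LIMIT 0, 10;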
I wish to replace approximately 3,500,000 values in a MySQL table. Each value is a string in the form AB00123012, and I wish to remove the leading zeroes after the letters, i.e. get AB123012 (zeroes inside the number should be kept).
The value has always exactly 10 characters.
Since MySQL does not allow replace by regex, I have used the following function:
DELIMITER $$
CREATE FUNCTION fn_RawRingNumber (rn CHAR(10))
RETURNS CHAR(10) DETERMINISTIC
BEGIN
DECLARE newrn CHAR(10);
DECLARE pos INT(8);
DECLARE letters CHAR(2);
DECLARE nr CHAR(8);
IF (CHAR_LENGTH(rn) = 10) THEN
SET pos = (SELECT POSITION('0' IN rn));
SET letters = (SELECT SUBSTRING_INDEX(rn, '0', 1));
SET nr = (SELECT TRIM(LEADING '0' FROM SUBSTRING(rn,pos)));
SET newrn = (SELECT CONCAT(letters, nr));
ELSE
SET newrn = rn;
END IF;
RETURN newrn;
END$$
DELIMITER ;
While this works, it is rather slow, and I am wondering if there is a better way to do this.
If you can afford to take your site offline for a few minutes, the fastest way would be to dump, process, and re-import. The current operation makes queries/inserts on that table pretty slow, so you are probably better off with a dump/process/import anyway.
Step 1: dump
SELECT ... INTO OUTFILE is your friend here.
Step 2: process
Use your favourite programming language, or if you are lucky enough to be on Linux, something like sed or even cut. If you need help with the regex, post a comment.
Step 3: reimport
After clearing out the table, do a LOAD DATA INFILE.
These three steps should all be reasonably quick, especially if you have an index on that column.
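A rough shape of steps 1 and 3 (the file paths and table name are placeholders):
SELECT *
INTO OUTFILE '/tmp/rings.txt'
FIELDS TERMINATED BY '\t'
FROM your_table;
-- fix the ring-number column in /tmp/rings.txt externally, writing /tmp/rings_fixed.txt, then:
TRUNCATE TABLE your_table; -- the "clearing out the table" step
LOAD DATA INFILE '/tmp/rings_fixed.txt'
INTO TABLE your_table
FIELDS TERMINATED BY '\t';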
Try this
Note: I have not tested this with many rows, so I don't know how efficient it is.
Also, even if it is fast, please think through all the possible variations that may occur in your strings before using it; I may have missed some variants, I'm not 100% sure.
select case
when INSTR(col, '0') = 2 then concat( substr(col, 1, 1), substr(col, 2) * 1)
when INSTR(col, '0') = 3 and substr(col, 2, 1) not in('1','2','3','4','5','6','7','8','9') then concat( substr(col, 1, 2), substr(col, 3) * 1)
else col
end
from (
select 'AB00123012' as col union all
select 'A010000123' as col union all
select 'A1000000124' as col union all
select 'A0000000124' as col union all
select '.E00086425' as col
) t
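If it does hold up, the same expression can be applied as an UPDATE (table and column names are placeholders for the real ones):
UPDATE your_table
SET col = CASE
              WHEN INSTR(col, '0') = 2 THEN CONCAT(SUBSTR(col, 1, 1), SUBSTR(col, 2) * 1)
              WHEN INSTR(col, '0') = 3 AND SUBSTR(col, 2, 1) NOT IN ('1','2','3','4','5','6','7','8','9')
                  THEN CONCAT(SUBSTR(col, 1, 2), SUBSTR(col, 3) * 1)
              ELSE col
          END;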
We have a SQL Server scalar function, and part of the process is to take one of the input values and do the following:
'inputvalue'
Create a table variable and populate with the following rows
inputvalue
inputvalu
inputval
inputva
inputv
input
inpu
inp
Then this table is joined to a query, ordered by the length of the input value descending, and the top 1 is returned. The actual code is here:
DECLARE @Result NVARCHAR(20);
DECLARE @tempDialCodes TABLE (tempDialCode NVARCHAR(20));
DECLARE @counter INT = LEN(@PhoneNumber);
WHILE @counter > 2
BEGIN
INSERT INTO @tempDialCodes(tempDialCode) VALUES(@PhoneNumber);
SET @PhoneNumber = SUBSTRING(@PhoneNumber, 1, @counter - 1);
SET @counter = @counter - 1;
END
SET @Result = (SELECT TOP 1 [DialCodeID]
FROM DialCodes dc JOIN @tempDialCodes s
ON dc.DialCode = s.tempDialCode
ORDER BY LEN(DialCode) DESC);
RETURN @Result
It works fine, but I am asking if there is a way to replace the WHILE loop by somehow joining to the input value to get the same result. When I say it works fine, it's too damn slow, but it does work.
I'm stumped on how to break up this string into a table variable without using a loop, but my warning light tells me the loop approach is not efficient for running against a table with a million rows.
Are you familiar with tally tables? The speed difference can be incredible. I try to replace every loop with a tally table if possible. The only time I haven't been able to so far is when calling a proc from within a cursor. If using this solution I would recommend a permanent dbo.Tally table with a sufficiently large size rather than recreating every time in the function. You will find other uses for it!
declare @PhoneNumber nvarchar(20) = 'inputvalue';
declare @tempDialCodes table (tempDialCode nvarchar(20));
--create and populate tally table if you don't already have a permanent one
--arbitrary 1000 rows for demo...you should figure out if that is enough
--this is a 1-based tally table - you will need to tweak if you make it 0-based
declare @Tally table (N int primary key);
insert @Tally
select top (1000) row_number() over (order by o1.object_id) from sys.columns o1, sys.columns o2 order by 1;
--select * from @Tally order by N;
insert @tempDialCodes
select substring(@PhoneNumber, 1, t.N)
from @Tally t
where t.N between 3 and len(@PhoneNumber)
order by t.N desc;
select *
from @tempDialCodes
order by len(tempDialCode) desc;
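To show where this slots into the original function, the prefix set can be joined straight to DialCodes without even materialising the table variable (a sketch; DialCodes and DialCodeID as in the question):
SELECT TOP (1) dc.DialCodeID
FROM DialCodes dc
JOIN (SELECT SUBSTRING(@PhoneNumber, 1, t.N) AS tempDialCode
      FROM @Tally t
      WHERE t.N BETWEEN 3 AND LEN(@PhoneNumber)) s
  ON dc.DialCode = s.tempDialCode
ORDER BY LEN(dc.DialCode) DESC;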