unescape diactrics in \u0 format (json) in ms sql (SQL Server) - json

I'm getting json file, which I load to Azure SQL databese. This json is direct output from API, so there is nothing I can do with it before loading to DB.
In that file, all Polish diactircs are escaped to "C/C++/Java source code" (based on: http://www.fileformat.info/info/unicode/char/0142/index.htm
So for example:
ł is \u0142
I was trying to find some method to convert (unescape) those to proper Polish letters.
In worse case scenario, I can write function which will replace all combinations
Repalce(Replace(Replace(string,'\u0142',N'ł'),'\u0144',N'ń')))
And so on, making one big, terrible function...
I was looking for some ready functions like there is for URLdecode, which was answered here on stack in many topics, and here: https://www.codeproject.com/Articles/1005508/URL-Decode-in-T-SQL
Using this solution would be possible but I cannot figure out cast/convert with proper collation and types in there, to get result I'm looking for.
So if anyone knows/has function that would make conversion in string for unescaping that \u this would be great, but I will manage to write something on my own if I would get right conversion. For example I tried:
select convert(nvarchar(1), convert(varbinary, 0x0142, 1))
I made assumption that changing \u to 0x will be the answer but it gives some Chinese characters. So this is wrong direction...
Edit:
After googling more I found exactly same question here on stack from #Pasetchnik: Json escape unicode in SQL Server
And it looks this would be the best solution that there is in MS SQL.
Onlty thing I needed to change was using NVARCHAR instead of VARCHAR that is in linked solution:
CREATE FUNCTION dbo.Json_Unicode_Decode(#escapedString nVARCHAR(MAX))
RETURNS nVARCHAR(MAX)
AS
BEGIN
DECLARE #pos INT = 0,
#char nvarCHAR,
#escapeLen TINYINT = 2,
#hexDigits TINYINT = 4
SET #pos = CHARINDEX('\u', #escapedString, #pos)
WHILE #pos > 0
BEGIN
SET #char = NCHAR(CONVERT(varbinary(8), '0x' + SUBSTRING(#escapedString, #pos + #escapeLen, #hexDigits), 1))
SET #escapedString = STUFF(#escapedString, #pos, #escapeLen + #hexDigits, #char)
SET #pos = CHARINDEX('\u', #escapedString, #pos)
END
RETURN #escapedString
END

Instead of nested REPLACE you could use:
DECLARE #string NVARCHAR(MAX)= N'\u0142 \u0144\u0142';
SELECT #string = REPLACE(#string,u, ch)
FROM (VALUES ('\u0142',N'ł'),('\u0144', N'ń')) s(u, ch);
SELECT #string;
DBFiddle Demo

Related

Sql script to get English text from script

I have a column with both English and Chinese text.
Example: The hills have eyes. 隔山有眼
Expected results: The hills have eyes.
How can I extract the English text from that string using sql, please.
Thanks for help.
A quick-and-dirty way simply converts the string to ASCII and removes the '?' -- which is the representation of the other characters:
select replace(convert(t.str using ascii), '?', '')
from t;
The only downside is that you lose '?' characters in the original string as well.
Here is a db<>fiddle.
For more control over the replacement, you can use regexp_replace():
select regexp_replace(t.str, '[^a-zA-Z0-9.?, ]', '')
from t;
Unfortunately, I am not aware of a character class for ASCII-only characters.
One option you have is to use a function that returns just the english only text.
Additionally, you could make it dual-purpose with another parameter to determine if you want the English text or Non-English text to switch the <127 comparison.
CREATE FUNCTION `EnglishOnly`(String VARCHAR(100))
RETURNS varchar(100)
NO SQL
BEGIN
DECLARE output VARCHAR(100) DEFAULT '';
DECLARE i INTEGER DEFAULT 1;
DECLARE ch varchar(1);
IF LENGTH(string) > 0 THEN
WHILE(i <= LENGTH(string)) DO
SET ch=SUBSTRING(string, i, 1);
IF ASCII(ch)<127 then
set output = CONCAT(output,ch);
END IF;
SET i = i + 1;
END WHILE;
END IF;
RETURN output;
END;
You can then sinply use it like so
select EnglishOnly ("The hills have eyes 隔山有眼that see all.")
Output
The hills have eyes that see all.
Example Fiddle

JSON text is not properly formatted. Unexpected character 'N' is found at position 0

I am new to JSON in SQL. I am getting the error "JSON text is not properly formatted. Unexpected character 'N' is found at position 0." while executing the below -
DECLARE #json1 NVARCHAR(4000)
set #json1 = N'{"name":[{"FirstName":"John","LastName":"Doe"}], "age":31, "city":"New York"}'
DECLARE #v NVARCHAR(4000)
set #v = CONCAT('N''',(SELECT value FROM OPENJSON(#json1, '$.name')),'''')
--select #v as 'v'
SELECT JSON_VALUE(#v,'$.FirstName')
the " select #v as 'v' " gives me
N'{"FirstName":"John","LastName":"Doe"}'
But, using it in the last select statement gives me error.
DECLARE #v1 NVARCHAR(4000)
set #v1 = N'{"FirstName":"John","LastName":"Doe"}'
SELECT JSON_VALUE(#v1,'$.FirstName') as 'FirstName'
also works fine.
If you're using SQL Server 2016 or later there is build-in function ISJSON which validates that the string in the column is valid json.
Therefore you can do things like this:
SELECT
Name,
JSON_VALUE(jsonCol, '$.info.address.PostCode') AS PostCode
FROM People
WHERE ISJSON(jsonCol) > 0
You are adding the Ncharacter in your CONCAT statement.
Try changing the line:
set #v = CONCAT('N''',(SELECT value FROM OPENJSON(#json1, '$.name')),'''')
to:
set #v = CONCAT('''',(SELECT value FROM OPENJSON(#json1, '$.name')),'''')
JSON_VALUE function may first be executed on all rows before applying the where clauses. it will depend on execution plan so small things like having top clause or ordering may have a impact on that.
It means that if your json data is invalid anywhere in that column(in the whole table), it will throw an error when the query is executed.
So find and fix those invalid json formats first. for example if that column has a ' instead of " it cannot be parsed and will cause the whole TSQL query to throw an error

The argument 1 of the XML data type method "modify" must be a string literal

Hi i am getting the string literal error when i am trying to add an attribute to the child node. How can i modify my code in order to add an attribute successfully.
declare #count int=(select mxGraphXML.value('count(/mxGraphModel/root/Cell/#Value )','nvarchar') from TABLE_LIST
where Table_ListID=1234 )
declare #index int=1;
while #index<=#count
begin
declare #Value varchar(100)= #graphxml.value('(/mxGraphModel/root/Cell/#Value )[1]','nvarchar');
SET #graphxml.modify('insert attribute copyValueID {sql:variable("#Value ")}
as first into (/mxGraphModel/root/Cell)['+convert(varchar,#index)+']');
end
set #index=#index+1;
end
You're using the addition operator where you should be using the CONCAT function. So
'insert attribute copyValueID {sql:variable("#Value ")}
as first into (/mxGraphModel/root/Cell)['+convert(varchar,#index)+']'
is being coerced into a number. Try:
CONCAT('insert attribute copyValueID {sql:variable("#Value ")}
as first into (/mxGraphModel/root/Cell)[',convert(varchar,#index),']')
instead.
Adam, you can do it in Microsoft T-SQL like this:
declare #sql nvarchar(max)
set #sql = 'set #myxml.modify(''
insert (
attribute scalableFieldId {sql:variable("#sf_id")},
attribute myTypeId {sql:variable("#my_type_id")}
) into (/VB/Condition/Field[#fieldId=sql:variable("#field_id")
and #fieldCode=sql:variable("#field_code")])['+
cast(#instance as varchar(3))+']'')'
exec sp_executesql
#sql
,N'#myxml xml output, #field_code varchar(20),
#field_id varchar(20), #sf_id int, #my_type_id tinyint'
,#myxml = #myxml output
,#field_code = #field_code
,#field_id = #field_id
,#sf_id = #sf_id
,#my_type_id = #my_type_id
See what I've done here? It's just a clever usage of Dynamic SQL to overcome Microsoft's moronic limitation of "string literal error".
IMPORTANT NOTE: yes, you can MOSTLY do this by using sql:variable() in SOME places BUT good luck trying to use it in the node number qualifier inside the square brackets! You can't do this without Dynamic SQL by design!
The trick is not mine actually, I got the idea from https://www.opinionatedgeek.com/Snaplets/Blog/Form/Item/000299/Read after banging my head against the wall for a while.
Feel free to ask questions if my sample does not work or something is not clear.

SQL Server 2008: why doesn't BETWEEN handle strings which include the dash (-)?

My applications pulls rows from a table which contain a column = StringA. The user enters 2 parameters, From and Thru strings, and if she enters the same string in both, and the string has an embedded dash, it's not working.
Why doesn't the following return True where ColumnA and StringA = 'medi-care', but it does return true where ColumnA and StringA = 'medicare' (no dash)?
IF ColumnA between StringA and StringA...
I also tried:
IF ColumnA <= StringA and ColumnA >= StringA...
Is this a bug? I tried appending a 'z' to the Thru parameter string - still didn't work for the string with a dash embedded. Can you suggest a way to make this work?
Is it possible that you actually have an en dash or em dash in one of them but not the other. Often times it is very difficult to tell. Ex:
Declare #StringA varchar(20)
Declare #ColumnA VarChar(20)
Select #StringA = 'medi-care',
#ColumnA = 'medi—care'
Select 'They Match'
Where #StringA = #ColumnA
Note that in this example, #ColumnA actually has an en dash instead of a dash. If you look closely, you may be able to tell, but it is very difficult to notice a difference unless you are specifically looking for it.
It works for me. I ran this on SQL 2008R2:
DECLARE #Test TABLE (
ColA varchar(7)
);
INSERT INTO #Test (ColA) VALUES ('a-z');
SELECT * FROM #Test WHERE ColA BETWEEN 'a-z' AND 'a-z';
And I got the result of 'a-z'. There must be something else in your data that is causing the non-match. Blank spaces or other invisible characters.

Having long strings in MySql stored procedures

Is there a way of enabling a long strings to be put onto multiple lines so that when viewed on screen or printed the code is easier to read?
Perhaps I could be clearer.
Have a stored procedure with lines like
IF ((select post_code REGEXP '^([A-PR-UWYZ][A-HK-Y]{0,1}[0-9]{1,2} [0-9][ABD-HJLNP-UW-Z]{2})|([A-PR-UWYZ][0-9][A-HJKMPR-Y] [0-9][ABD-HJLNP-UW-Z]{2})|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y]) [0-9][ABD-HJLNP-UW-Z]{2})$') = 0)
Would like to be able to modify the string so that I can view it within 80 character width. Anybody got any ideas of how to do this.
PS: It is the regular expression for UK postcodes
For example,
-- a very long string in one block
set my_str = 'aaaabbbbcccc';
can be also written as
-- a very long string, as a concatenation of smaller parts
set my_str = 'aaaa' 'bbbb' 'cccc';
or even better
-- a very long string in a readable format
set my_str = 'aaaa'
'bbbb'
'cccc';
Note how the spaces and end of line between the a/b/c parts are not part of the string itself, because of the placement of quotes.
Also note that the string data here is concatenated by the parser, not at query execution time.
Writing something like:
-- this is broken
set my_str = 'aaaa
bbbb
cccc';
produces a different result.
See also
http://dev.mysql.com/doc/refman/5.6/en/string-literals.html
Look for "Quoted strings placed next to each other are concatenated to a single string"
You could split it up into the front and back components of the postcode and then dump the whole lot into a UDF.
This will keep the ugliness in one place and means you'll only have to make changes to one block of code when/if Royal Mail decide to change the format of UK postcodes ;-)
Something like this should do the trick:
DELIMITER $$
CREATE FUNCTION `isValidUKPostcode`(candidate varchar(255)) RETURNS BOOLEAN READS SQL DATA
BEGIN
declare back varchar(3);
declare front varchar(10);
declare v_out boolean;
set back = substr(candidate,-3);
set front = substr(candidate,1,length(candidate)-3);
set v_out = false;
IF (back REGEXP '^[0-9][ABD-HJLNP-UW-Z]{2}$'= 1) THEN
CASE
WHEN front REGEXP '^[A-PR-UWYZ][A-HK-Y]{0,1}[0-9]{1,2} $' = 1 THEN set v_out = true;
WHEN front REGEXP '^[A-PR-UWYZ][0-9][A-HJKMPR-Y] $' = 1 THEN set v_out = true;
WHEN front REGEXP '^[A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y] $' = 1 THEN set v_out = true;
END CASE;
END IF;
return v_out;
END