MySQL JSON string field returns encoded - mysql

It's my first week dealing with a MySQL database and JSON field types, and I cannot figure out why values are automatically encoded and then returned in encoded form.
Given the following SQL
-- create a multiline string with a tab example
SET @str = "Line One
	Line 2	Tabbed out
	Line 3";
-- encode it
SET @j = JSON_OBJECT("str", @str);
-- extract the value by name
SET @strOut = JSON_EXTRACT(@j, "$.str");
-- show the object and attribute value.
SELECT @j, @strOut;
You end up with what appears to be a fully formed JSON object with a single encoded attribute:
@j = {"str": "Line One\n\tLine 2\tTabbed out\n\tLine 3"}
but using JSON_EXTRACT to get the attribute value, I get the encoded version, including the outer quotes:
@strOut = "Line One\n\tLine 2\tTabbed out\n\tLine 3"
I would expect to get my original string back, with the \n and \t unescaped to their original characters and without the outer quotes, like so:
Line One
Line 2 Tabbed out
Line 3
I can't seem to find any JSON_DECODE or JSON_UNESCAPE or similar functions.
I did find a JSON_ESCAPE() function but that appears to be used to manually build a JSON object structure in a string.
What am I missing to extract the values to the original format?

I like to use the handy operator ->> for this.
It was introduced in MySQL 5.7.13, and basically combines JSON_EXTRACT() and JSON_UNQUOTE():
SET @strOut = @j ->> '$.str';
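For reference, a minimal self-contained sketch against a throwaway table (the table and column names are illustrative, not from the question; assumes the default sql_mode where \n is a newline escape):
CREATE TABLE docs (j JSON);
INSERT INTO docs VALUES (JSON_OBJECT('str', 'Line One\nLine 2'));
-- both expressions return the plain, unescaped string
SELECT j->>'$.str' AS via_operator,
       JSON_UNQUOTE(JSON_EXTRACT(j, '$.str')) AS via_functions
FROM docs;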

You are looking for the JSON_UNQUOTE function:
SET @strOut = JSON_UNQUOTE(JSON_EXTRACT(@j, "$.str"));
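Applied to the variables from the question, this round-trips cleanly (a quick self-check; the comparison returns 1 when the strings match):
SET @str = "Line One\nLine 2";
SET @j = JSON_OBJECT("str", @str);
SELECT JSON_UNQUOTE(JSON_EXTRACT(@j, "$.str")) = @str;  -- 1: the original string is recovered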

The result of JSON_EXTRACT() is intentionally a JSON document, not a string.
A JSON document may be:
An object enclosed in { }
An array enclosed in [ ]
A scalar string value enclosed in " "
A scalar number or boolean value
A null, which is not an SQL NULL but a JSON null. This leads to confusing cases: you can extract a JSON field whose JSON value is null, yet in an SQL expression that value fails IS NULL tests, and it also fails to compare equal to the SQL string 'null', because it is a JSON type, not a scalar type.
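A short sketch of that JSON null pitfall:
SET @j = '{"k": null}';
SELECT JSON_EXTRACT(@j, '$.k') IS NULL;     -- 0: a JSON null is not an SQL NULL
SELECT JSON_EXTRACT(@j, '$.k') = 'null';    -- 0: nor does it equal the SQL string 'null'
SELECT JSON_TYPE(JSON_EXTRACT(@j, '$.k'));  -- 'NULL': it is a JSON-typed null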

Related

Error parsing JSON: more than one document in the input (Redshift to Snowflake SQL)

I'm trying to convert a query from Redshift to Snowflake SQL.
The Redshift query looks like this:
SELECT
    cr.creatives as creatives
    , JSON_ARRAY_LENGTH(cr.creatives) as creatives_length
    , JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(cr.creatives, 0), 'previewUrl') as preview_url
FROM campaign_revisions cr
The Snowflake query looks like this:
SELECT
    cr.creatives as creatives
    , ARRAY_SIZE(TO_ARRAY(ARRAY_CONSTRUCT(cr.creatives))) as creatives_length
    , PARSE_JSON(PARSE_JSON(cr.creatives)[0]):previewUrl as preview_url
FROM campaign_revisions cr
It seems like JSON_EXTRACT_PATH_TEXT isn't converted correctly, as the Snowflake query results in error:
Error parsing JSON: more than one document in the input
cr.creatives is formatted like this:
"[{""previewUrl"":""https://someurl.com/preview1.png"",""device"":""desktop"",""splitId"":null,""splitType"":null},{""previewUrl"":""https://someurl.com/preview2.png"",""device"":""mobile"",""splitId"":null,""splitType"":null}]"
It seems to me that you are not working with valid JSON data inside Snowflake.
Please review your file format used for the copy into command.
If you open the "JSON" text provided in a text editor, note that it does not parse or format as JSON because of the quoting you have. Once your issue with the doubled / escaped quotes is handled, you should be able to make good progress.
(Screenshot in the original answer: proper JSON on the left, the original data on the right.)
If you are not inclined to reload your data, see if you can create a JavaScript User Defined Function to remove the quotes from your string; then you can use Snowflake to process the variant column.
The following code is a working proof of concept that removes the doubled quotes for you.
var textOriginal = '[{""previewUrl"":""https://someurl.com/preview1.png"",""device"":""desktop"",""splitId"":null,""splitType"":null},{""previewUrl"":""https://someurl.com/preview2.png"",""device"":""mobile"",""splitId"":null,""splitType"":null}]';

function parseText(input) {
    // collapse the doubled quotes, then parse the result as JSON
    var a = input.replaceAll('""', '"');
    return JSON.parse(a);
}

var x = parseText(textOriginal);
console.log(x);
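If that approach works for your data, the same logic can be wrapped in a Snowflake JavaScript UDF (a sketch; the function name is illustrative, argument names are uppercased inside the body, and it assumes the column content looks like textOriginal above):
CREATE OR REPLACE FUNCTION clean_creatives(s STRING)
RETURNS VARIANT
LANGUAGE JAVASCRIPT
AS
$$
    // collapse the doubled quotes, then parse the result into a variant
    return JSON.parse(S.split('""').join('"'));
$$;

SELECT clean_creatives(cr.creatives)[0]:previewUrl::string AS preview_url
FROM campaign_revisions cr;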
For anyone else seeing this doubled double-quote issue in JSON fields coming from CSV files in a Snowflake external stage (a slightly different issue than the one originally posted):
The issue is likely that you need to use the FIELD_OPTIONALLY_ENCLOSED_BY setting. Specifically, FIELD_OPTIONALLY_ENCLOSED_BY = '"' when setting up your fileformat.
Example of creating such a file format:
create or replace file format mydb.myschema.my_tsv_file_format
type = CSV
field_delimiter = '\t'
FIELD_OPTIONALLY_ENCLOSED_BY = '"';
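The copy into command mentioned earlier can then reference this file format (a sketch; the table name and stage path are illustrative):
copy into mydb.myschema.my_table
from @my_s3_stage/path/to/file/
file_format = (format_name = 'mydb.myschema.my_tsv_file_format');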
And example of querying from a stage using this file format:
select
    $1 field_one
    , $2 field_two
    -- ...and so on
from '@my_s3_stage/path/to/file/my_tab_separated_file.csv' (file_format => 'my_tsv_file_format')

How to replace "-" only in a *text* value of a generic jsonb in PostgreSQL?

I need to clean JSON data that could look like:
{
    "reference": "0000010-CAJ",
    "product_code": "00000-10",
    "var_name": "CAJ-1",
    "doc_date": "2020-02-09T21:01:01-05:00",
    "due_date": "2020-03-10T21:01:01-05:00"
}
However, this is just one of many possibilities (this is for a log aggregation that gets data from many sources).
I need to replace "-" with "_", but without breaking dates like "2020-03-10T21:01:01-05:00", so I can't simply cast to a string and do a replace. I wonder if there is an equivalent of:
for (k, v) in json:
    if is_text(v):
        v = replace(...)
You can check with a regex if the value looks like a timestamp:
update the_table
set the_column = (select jsonb_object_agg(
                             key,
                             case
                                 when value ~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}.*' then value
                                 else replace(value, '-', '_')
                             end)
                  from jsonb_each_text(the_column) as t(key, value));
This iterates over all key/value pairs in the JSON column (using jsonb_each_text()) and assembles all of them back into a JSON again (using jsonb_object_agg()). Values that look like a timestamp are left unchanged, for all others, the - is replaced with a _.
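The effect is easy to see standalone against a trimmed version of the sample document (output shown as a comment):
select jsonb_object_agg(
           key,
           case
               when value ~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}.*' then value
               else replace(value, '-', '_')
           end)
from jsonb_each_text('{"reference":"0000010-CAJ","doc_date":"2020-02-09T21:01:01-05:00"}'::jsonb) as t(key, value);
-- {"doc_date": "2020-02-09T21:01:01-05:00", "reference": "0000010_CAJ"}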

Hive: parse JSON elements from a long concatenated JSON string

I have a server log that continuously records JSON values without any delimiter, such as:
{"a":1}{"b",2}{"a":2}{"c":{\"qwe\":\"asd\"},"d":"ert"}{"e":12}....
I want to extract each element and put them into rows like:
{"a":1}
{"b",2}
{"a":2}
{"c":{\"qwe\":\"asd\"},"d":"ert"}
{"e":12}..
The log has no delimiter and contains nested JSON, so I cannot simply use the split function. How can I achieve this?
One option would be to split on the }{ boundary and get the elements using posexplode. Positions are only needed to restore the braces correctly for the first and last elements.
select case when pos = 0 then concat(split_str, '}')
            when pos = max(pos) over (partition by str) then concat('{', split_str)
            else concat('{', split_str, '}')
       end as res
from tbl
lateral view posexplode(split(str, '\\}\\{')) t as pos, split_str
Note the result will be a string.
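For a quick sanity check, the same query can be run against a literal (a simplified log line; the subquery alias is illustrative):
select case when pos = 0 then concat(split_str, '}')
            when pos = max(pos) over (partition by str) then concat('{', split_str)
            else concat('{', split_str, '}')
       end as res
from (select '{"a":1}{"b":2}{"e":12}' as str) src
lateral view posexplode(split(str, '\\}\\{')) t as pos, split_str;
-- returns three rows: {"a":1}, {"b":2}, {"e":12}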

How to replace a null value with {} in MySQL?

I am trying to fetch values from a table where a null status value should be replaced with {} (an empty JSON object), so I used the MySQL function below:
IFNULL(status, '{}') as status from table;
but its output is '{}', whereas I want only {} (without the quotes).
I have also tried the options below:
IFNULL(status, "{}") --> output: "{}"
IFNULL(status, '{}') --> output: '{}'
IFNULL(status, {})   --> output: MySQL error
The expected output is only an empty JSON object. Please suggest a solution.
Check the function JSON_UNQUOTE:
SELECT JSON_UNQUOTE(IFNULL(status, "{}")) as status FROM table
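As a standalone illustration of what JSON_UNQUOTE strips (not tied to the asker's table):
SELECT JSON_UNQUOTE('"{}"');  -- returns {} with the surrounding double quotes removed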
MySQL's JSON_UNQUOTE function may not help in the case where you are converting the MySQL result into a JSON object in your application; the workaround is to use a string-replace function (in Java or any other language) in your framework.
EX.
String rs = str.replace("\"{", "{"); // Replace '"{' with '{'
rs = rs.replace("}\"", "}");         // Replace '}"' with '}'

Updating integer column from jsonb member fails with: column is of type integer but expression is of type jsonb

In a PostgreSQL 9.5 table I have an integer column social.
When I try to update it in a stored procedure given the following JSON data (an array with 2 objects, each having a "social" key) in the in_users variable of type jsonb:
'[{"sid":"12345284239407942","auth":"ddddc1808197a1161bc22dc307accccc",**"social":3**,"given":"Alexander1","family":"Farber","photo":"https:\/\/graph.facebook.com\/1015428423940942\/picture?type=large","place":"Bochum,
Germany","female":0,"stamp":1450102770},
{"sid":"54321284239407942","auth":"ddddc1808197a1161bc22dc307abbbbb",**"social":4**,"given":"Alxander2","family":"Farber","photo":null,"place":"Bochum,
Germany","female":0,"stamp":1450102800}]'::jsonb
Then the following code is failing:
FOR t IN SELECT * FROM JSONB_ARRAY_ELEMENTS(in_users)
LOOP
    UPDATE words_social SET
        social = t->'social',
    WHERE sid = t->>'sid';
END LOOP;
with the error message:
ERROR: column "social" is of type integer but expression is of type jsonb
LINE 3: social = t->'social',
^
HINT: You will need to rewrite or cast the expression.
I have tried changing that line to:
social = t->'social'::int,
but then I get the error:
ERROR: invalid input syntax for integer: "social"
LINE 3: social = t->'social'::int,
^
Why doesn't PostgreSQL recognize that the data is integer?
From the JSON type mapping table in the documentation, I was under the impression that a JSON number would be auto-converted to a PostgreSQL numeric type.
A single set-based SQL command is far more efficient than looping:
UPDATE words_social w
SET    social = (iu->>'social')::int
FROM   JSONB_ARRAY_ELEMENTS(in_users) iu  -- in_users = function variable
WHERE  w.sid = iu->>'sid';                -- type of sid?
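The extraction and cast can be tried in isolation against a literal (an illustrative sketch):
SELECT iu->>'sid' AS sid, (iu->>'social')::int AS social
FROM jsonb_array_elements('[{"sid":"1","social":3},{"sid":"2","social":4}]'::jsonb) AS iu;
-- sid | social
-- 1   | 3
-- 2   | 4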
To answer your original question:
Why doesn't PostgreSQL recognize that the data is integer?
Because you were trying to convert the jsonb value to integer. In your solution you already found that you need the ->> operator instead of -> to extract text, which can be cast to integer.
Your second attempt added a second error:
t->'social'::int
In addition to the above: operator precedence. The cast operator :: binds more tightly than the json operator ->. As you found yourself already, you really want:
(t->>'social')::int
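The precedence difference is easy to verify in isolation:
SELECT ('{"social": 3}'::jsonb ->> 'social')::int;  -- 3
-- '{"social": 3}'::jsonb -> 'social'::int          -- fails: parsed as -> ('social'::int),
--                                                  -- and casting 'social' to int errors out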
Very similar case on dba.SE:
Querying JSONB in PostgreSQL
I've ended up using:
FOR t IN SELECT * FROM JSONB_ARRAY_ELEMENTS(in_users)
LOOP
    UPDATE words_social SET
        social = (t->>'social')::int
    WHERE sid = t->>'sid';

    IF NOT FOUND THEN
        INSERT INTO words_social (social)
        VALUES ((t->>'social')::int);
    END IF;
END LOOP;