Handling Unicode sequences in postgresql

I have some JSON data stored in a JSON (not JSONB) column in my postgresql database (9.4.1). Some of these JSON structures contain unicode sequences in their attribute values. For example:
{"client_id": 1, "device_name": "FooBar\ufffd\u0000\ufffd\u000f\ufffd" }
When I try to query this JSON column (even if I'm not directly trying to access the device_name attribute), I get the following error:
ERROR: unsupported Unicode escape sequence
Detail: \u0000 cannot be converted to text.
You can recreate this error by executing the following command on a postgresql server:
select '{"client_id": 1, "device_name": "FooBar\ufffd\u0000\ufffd\u000f\ufffd" }'::json->>'client_id'
The error makes sense to me - there is simply no way to represent the unicode sequence NULL in a textual result.
Is there any way for me to query the same JSON data without having to perform "sanitation" on the incoming data? These JSON structures change regularly so scanning a specific attribute (device_name in this case) would not be a good solution since there could easily be other attributes that might hold similar data.
After some more investigations, it seems that this behavior is new for version 9.4.1 as mentioned in the changelog:
...Therefore \u0000 will now also be rejected in json values when conversion to de-escaped form is required. This change does not break the ability to store \u0000 in json columns so long as no processing is done on the values...
Was this really the intention? Is a downgrade to pre 9.4.1 a viable option here?
As a side note, this property is taken from the name of the client's mobile device - it's the user that entered this text into the device. How on earth did a user insert NULL and REPLACEMENT CHARACTER values?!

\u0000 is the one Unicode code point that PostgreSQL cannot store in a text value. I see no other way than to sanitize the string.
Since json is just a string in a specific format, you can use the standard string functions on its text form, without worrying about the JSON structure. A one-line sanitizer to remove the code point would be:
SELECT (regexp_replace(the_string::text, '\\u0000', '', 'g'))::json;
Instead of removing the sequence, you can also substitute any character you like, which would be useful if the zero code point is used as some form of delimiter.
Note also the subtle difference between what is stored in the database and how it is presented to the user. You can store the code point in a JSON string, but you have to replace it with some other character before processing the value as a json data type.
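If the sanitizing is done in the application before the data reaches the database, the same idea can be sketched in Python. This is only an illustration under assumptions: strip_nul_escapes is a hypothetical helper name, and the sketch works on the raw JSON text, not on a parsed object. It treats an escaped backslash (two backslashes) as a unit, so that the literal text "u0000" following an escaped backslash is left alone.

```python
import json
import re

# Matches either an escaped backslash (kept as-is) or the \u0000 escape (replaced).
# Consuming escaped backslashes first avoids touching a literal "\\u0000" in the data.
_NUL_OR_ESCAPED_BACKSLASH = re.compile(r'\\\\|\\u0000')


def strip_nul_escapes(raw_json: str, replacement: str = "") -> str:
    """Hypothetical helper: remove (or replace) \\u0000 escapes in raw JSON text."""
    def _sub(match: re.Match) -> str:
        token = match.group(0)
        # Two backslashes are an escaped backslash: keep them unchanged.
        return token if token == "\\\\" else replacement

    return _NUL_OR_ESCAPED_BACKSLASH.sub(_sub, raw_json)


raw = r'{"client_id": 1, "device_name": "FooBar\ufffd\u0000\ufffd\u000f\ufffd"}'
cleaned = strip_nul_escapes(raw)
print(cleaned)
```

The replacement argument is inserted into the JSON text verbatim, so it has to be something that is valid inside a JSON string.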

The solution by Patrick didn't work out of the box for me; an error was still thrown every time. I then researched a little more and was able to write a small custom function that fixed the issue for me.
First I could reproduce the error by writing:
select json '{ "a": "null \u0000 escape" }' ->> 'a' as fails
Then I added a custom function which I used in my query:
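CREATE OR REPLACE FUNCTION null_if_invalid_string(json_input JSON, record_id UUID)
RETURNS JSON AS $$
DECLARE
  -- ->> returns text, so the scratch variable is TEXT rather than JSON
  extracted TEXT;
BEGIN
  BEGIN
    -- Fails with "unsupported Unicode escape sequence" when \u0000 must be de-escaped.
    -- 'location' is the key used in my data; adjust it to your own.
    extracted := json_input ->> 'location';
  EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'Invalid json value: "%". Returning NULL.', record_id;
    RETURN NULL;
  END;
  RETURN json_input;
END;
$$ LANGUAGE plpgsql;
Call the function like this. You should not receive an error; the function returns NULL instead: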
select null_if_invalid_string('{ "a": "null \u0000 escape" }', id) from my_table
Whereas this should return the json as expected:
select null_if_invalid_string('{ "a": "null" }', id) from my_table

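You can fix all entries with SQL like this:
update ___MY_TABLE___
set settings = REPLACE(settings::text, '\u0000', '' )::json
where settings::text like '%\\u0000%'
(The backslash is doubled in the LIKE pattern because LIKE treats a single backslash as its escape character.)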

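I found a solution that works for me:
SELECT (regexp_replace(the_string::text, '(?<!\\)\\u0000', '', 'g'))::json;
Note the match pattern '(?<!\\)\\u0000': the negative lookbehind (?<!\\) skips any \u0000 that is itself preceded by a backslash, i.e. an escaped backslash followed by the plain text u0000.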

For anyone who lands here from a web search:
This does not answer the exact question, but in similar cases, if you simply want to skip the rows whose JSON contains null bytes, add:
AND json::text NOT LIKE '%\\u0000%'
to your WHERE clause. (The cast is needed because LIKE is not defined for the json type, and the backslash is doubled because LIKE treats a single backslash as its escape character.)
You could also use the REPLACE function to sanitize the data:
REPLACE(source_field, '\u0000', '' );

Related

How do I update data inside a stringified JSON object in SQL?

I have three databases: an Oracle one, a SQL Server one, and a Postgres one. I have a table with two columns, name and value, both of type text. The value is a stringified JSON object. I need to update a nested value.
This is what I currently have:
name: 'MobilePlatform',
value:
'{
"iosSupported":true,
"androidSupported":false
}'
I want to add {"enableTwoFactorAuth": false} into it.

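In PostgreSQL you should be able to do this (the column is value; MobilePlatform is the content of the name column, so it belongs in the WHERE clause):
UPDATE mytable
SET value = jsonb_set(value::jsonb, '{enableTwoFactorAuth}', 'false')
WHERE name = 'MobilePlatform';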

In Postgres, the plain concatenation operator || for jsonb could do it:
UPDATE mytable
SET value = value::jsonb || '{"enableTwoFactorAuth":false}'::jsonb
WHERE name = 'MobilePlatform';
If a top-level key "enableTwoFactorAuth" already exists, it is replaced. So it's an "upsert" really.
Or use jsonb_set() for manipulating nested values.
The cast back to text works implicitly as an assignment cast. (The result is in standard format; any insignificant whitespace is removed.)
If the content is valid JSON, the storage type should be json to begin with. In Postgres, jsonb would be preferable as it's easier to manipulate, but that's not directly portable to the other two RDBMS mentioned.
(Or, possibly, a normalized design without JSON altogether.)

For ORACLE 21
update mytable
set json_col = json_transform(
json_col,
INSERT '$.value.enableTwoFactorAuth' = 'false' FORMAT JSON
)
where json_exists(json_col, '$?(@.name == "MobilePlatform")')
;
With json_col being JSON or VARCHAR2|CLOB column with IS JSON constraint.
(but must be JSON if you want a multivalue index on json_value.name:
create multivalue index ix_json_col_name on mytable t ( t.json_col.name.string() );
)

Two of the databases you are using support a JSON data type, so it doesn't make sense to store the data as a stringified JSON object in a text column.
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/json-in-oracle-database.html
PostgreSQL: https://www.postgresql.org/docs/current/datatype-json.html
Apart from these, MS SQL Server also provides functions for working with JSON data.
MS SQL Server: https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver16
Using a JSON type column in any of the above databases would enable you to use their JSON functions to perform the tasks that you are looking for.
If you have to stay with text, you can use REPLACE to append the key/value pair at the end of your JSON. Note that REPLACE rewrites every closing brace, so this is only safe for flat objects that contain no nested objects:
update dataTable set value = REPLACE(value, '}', ', "enableTwoFactorAuth": false}') where name = 'MobilePlatform'
Here dataTable is the name of the table.
The cleaner and less risky way would be to connect to the database from the application and use JSON methods such as JSON.parse in JavaScript or json.loads in Python. This gives you the JSON object (a dictionary in the case of Python) to work on. You can look for similar methods in other languages as well.
That said, I would suggest using JSON columns instead of text to store JSON values wherever possible.
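The application-side route could look roughly like this in Python. It is a minimal sketch, not a full solution: add_key is a hypothetical helper name, the stored text is assumed to be valid JSON, and reading the row from and writing it back to the database is left out.

```python
import json


def add_key(value_text: str, key: str, new_value) -> str:
    """Hypothetical helper: parse stringified JSON, set one key, serialize again."""
    obj = json.loads(value_text)   # raises json.JSONDecodeError on invalid JSON
    obj[key] = new_value           # adds the key, or overwrites it if present
    return json.dumps(obj)


stored = '{"iosSupported": true, "androidSupported": false}'
updated = add_key(stored, "enableTwoFactorAuth", False)
print(updated)
```

Because the text is parsed, not pattern-matched, nested objects and existing keys are handled correctly, which the REPLACE approach cannot guarantee.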

MySQL 5.7 - Query to set the value of a JSON key to a JSON Object

Using MySQL 5.7, how do I set the value of a JSON key in a JSON column to a JSON object rather than a string?
I used this query:
SELECT json_set(profile, '$.twitter', '{"key1":"val1", "key2":"val2"}')
from account WHERE id=2
Output:
{"twitter": "{\"key1\":\"val1\", \"key2\":\"val2\"}", "facebook": "value", "googleplus": "google_val"}
But it seems like it considers it as a string since the output escapes the JSON characters in it. Is it possible to do that without using JSON_OBJECT()?

There are a couple of options that I know of:
Use the JSON_UNQUOTE function to unquote the output (i.e. not cast it to a string), as described in the MySQL documentation.
Possibly use the ->> operator and select a specific path, which is also described in the documentation.
Disable backslashes as an escape character. This has a lot of implications, and I haven't tried it, so I don't know whether it works, but it is mentioned in the docs.
On balance, I'd either use the ->> operator or handle the conversion on the client side, depending on what you want to do.
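To illustrate why the output in the question contains escaped quotes, and what handling the conversion on the client side might look like, here is a small Python sketch. It does not talk to MySQL at all; the profile dictionary merely stands in for the content of the JSON column, and all variable names are made up for the example.

```python
import json

# Stand-in for the existing content of the JSON column.
profile = {"facebook": "value", "googleplus": "google_val"}
twitter_text = '{"key1":"val1", "key2":"val2"}'

# Storing the text as-is makes it a JSON string, so its quotes get escaped.
as_string = dict(profile, twitter=twitter_text)

# Parsing the text first makes it a nested JSON object instead.
as_object = dict(profile, twitter=json.loads(twitter_text))

print(json.dumps(as_string))
print(json.dumps(as_object))
```

The first line printed carries the value as one escaped string, as in the question; the second carries it as a nested object.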

How do I ignore invalid JSON when using json_parse with PrestoDB?

I am fairly new to Presto, and am trying to parse a bunch of records containing JSON data. It appears that some of the data is invalid, which causes Presto to abort the query during the call to json_parse. Is it possible to somehow return NULL instead of throwing an error in this case?
It seems like previously you could use try_cast(value as json), but that was removed in favor of json_parse. Is there any sort of configuration I can change to resolve this, or do I need to resort to creating a custom SerDe?

It looks like json_extract(data, '$') will return NULL for invalid JSON:
presto:default> select json_extract('{', '$');
_col0
-------
NULL
(1 row)

Insert/update JSON into Postgresql column WHERE myvar = myval

I'm trying to insert JSON into a PostgreSQL column whose data type is JSON, but I'm having trouble finding how I can do this. This is as far as I've gotten, but it's not correct because it overwrites the value every time instead of adding a new key/value pair.
I'm using pg-promise node module to perform these queries. Here's what I have so far:
db.query("UPDATE meditation_database SET completed=$1 WHERE user_id=$2", [{myVar : true}, user_id]);
Also, 'myVar' should be replaced by the value of the variable, but instead it is treated as a string. How can I get the actual value of 'myVar' instead of it being treated literally?
Thanks,

I'm trying to insert JSON into a Postgresql column who's data type is JSON, but I'm having trouble finding how I can do this.
By executing this:
db.query("INSERT INTO meditation_database(completed, user_id) VALUES($1, $2)",
[{myVar : true}, user_id]);
Also 'myVar' should be updated to the variable value, but instead it treats it as a string. How can I get the actual value of 'myVar' instead of it being treated literally.
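myVar is serialized into JSON as a string; that is the proper JSON format for property names, and it is the only format that PostgreSQL will accept. If you want the value of the variable myVar to be used as the property name, build the object in JavaScript with a computed property name, i.e. {[myVar]: true}.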
This is as far as I've gotten but it's not correct because it just overwrites it every time, instead of adding a new key pair.
If you are asking how to update JSON in PostgreSQL, this question has been answered previously, and in great detail: How do I modify fields inside the new PostgreSQL JSON datatype?

Get data type of JSON field in Postgres

I have a Postgres JSON column where some rows have data like:
{"value":90}
{"value":99.9}
...whereas other rows have data like:
{"value":"A"}
{"value":"B"}
The -> operator (i.e. fields->'value') would cast the value to JSON, whereas the ->> operator (i.e. fields->>'value') casts the value to text, as reported by pg_typeof. Is there a way to find the "actual" data type of a JSON field?
My current approach would be to use Regex to determine whether the occurrence of fields->>'value' in fields::text is surrounded by double quotes.
Is there a better way?

As @pozs mentioned in a comment, the functions json_typeof(json) and jsonb_typeof(jsonb) are available from version 9.4. From the documentation:
Returns the type of the outermost JSON value as a text string. Possible types are object, array, string, number, boolean, and null.
https://www.postgresql.org/docs/current/functions-json.html
Applying this to your case, here is an example of how it could be used (for a column of type json rather than jsonb, use json_each and json_typeof instead):
SELECT
json_data.key,
jsonb_typeof(json_data.value) AS json_data_type,
COUNT(*) AS occurrences
FROM tablename, jsonb_each(tablename.columnname) AS json_data
GROUP BY 1, 2
ORDER BY 1, 2;
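For comparison, the same classification can be done on the client. The following Python sketch mimics the type names listed in the documentation quote above; json_typeof_py is a hypothetical helper, not part of any library, and the rows list stands in for the values from the question.

```python
import json


def json_typeof_py(value) -> str:
    """Hypothetical helper: name the JSON type of a value parsed by json.loads."""
    if value is None:
        return "null"
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


rows = ['{"value":90}', '{"value":99.9}', '{"value":"A"}', '{"value":"B"}']
types = [json_typeof_py(json.loads(row)["value"]) for row in rows]
print(types)
```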

I ended up getting access to PLv8 in my environment, which made this easy:
CREATE FUNCTION value_type(fields JSON) RETURNS TEXT AS $$
return typeof fields.value;
$$ LANGUAGE plv8;
As mentioned in the comments, there will be a native function for this in 9.4.
