Generate UUID for Postgres JSON document - json

I'm inserting into a Postgres table with a JSON document and I want to generate a unique ID for the document. I can do that on my own, of course, but I was wondering if there was a way to have PG do it.
INSERT INTO test3 (data) VALUES ('{"key": "value", "unique": ????}')
The docs seem to indicate that JSON records fit into various SQL data types, but I don't see how that actually works.

How about just concatenating? Assuming your column is of type json/jsonb, something like the following should work:
INSERT INTO test3 (data) VALUES (('{"key": "value", "unique": "' || uuid_generate_v4() || '"}')::jsonb)
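If you would rather not splice the UUID into a string by hand, Postgres can assemble the object for you. A minimal sketch, assuming the data column is jsonb and a reasonably recent Postgres (gen_random_uuid() is built in from version 13; the uuid_generate_v4() above needs the uuid-ossp extension):
-- jsonb_build_object quotes and escapes each value for us
INSERT INTO test3 (data)
VALUES (jsonb_build_object('key', 'value', 'unique', gen_random_uuid()));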

If you're looking to generate a UUID and store it at the same time as a value within a JSON data field, here is something some may find to be a little more sane:
WITH
    -- Create a temporary view named "new_entry" containing your data
    new_entry
    -- This is how you name the view's columns
    ("key", "unique")
AS (
    VALUES
        -- This is the actual row returned by the view
        (
            'value',
            uuid_generate_v4()
        )
)
INSERT INTO
    test3 (
        data
    )
SELECT
    -- Convert row to JSON. Column name = key, column value = value.
    ROW_TO_JSON(new_entry.*)
FROM
    new_entry
First, we're creating a temporary view named new_entry, which contains all of the data we want to store in a JSON data field.
Second, we're grabbing that entry and passing it to the ROW_TO_JSON function, which converts it to a valid JSON data type. Once converted, the row is inserted into the test3 table.
My reasoning for the "sanity" is that, more than likely, your JSON object will end up containing more than just two key/value pairs. Rather, you'll end up with a handful of keys and values, and it will be up to you to make sure you don't miss any quotes and that you escape user input appropriately. Why glue all of this together manually when you can have Postgres do it for you (with the help of ROW_TO_JSON()) while making it easier to read and debug at the same time?

Related

How do I update data inside a stringified JSON object in SQL?

So I have three databases - an Oracle one, a SQL Server one, and a Postgres one. I have a table that has two columns, name and value, both text. The value is a stringified JSON object. I need to update the nested value.
This is what I currently have:
name: 'MobilePlatform',
value:
'{
"iosSupported":true,
"androidSupported":false,
}'
I want to add {"enableTwoFactorAuth": false} into it.
In PostgreSQL you should be able to do this:
UPDATE mytable
SET value = jsonb_set(value::jsonb, '{enableTwoFactorAuth}', 'false')
WHERE name = 'MobilePlatform';
In Postgres, the plain concatenation operator || for jsonb could do it:
UPDATE mytable
SET value = value::jsonb || '{"enableTwoFactorAuth":false}'::jsonb
WHERE name = 'MobilePlatform';
If a top-level key "enableTwoFactorAuth" already exists, it is replaced. So it's an "upsert" really.
Or use jsonb_set() for manipulating nested values.
The cast back to text works implicitly as assignment cast. (Results in standard format; any insignificant whitespace is removed effectively.)
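To reach a genuinely nested value, the path just gets longer. A hedged sketch, assuming a hypothetical layout like {"auth": {"enableTwoFactorAuth": true}} rather than the flat object from the question:
-- second argument is the path into the nested object, third is the new jsonb value
UPDATE mytable
SET value = jsonb_set(value::jsonb, '{auth,enableTwoFactorAuth}', 'false')
WHERE name = 'MobilePlatform';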
If the content is valid JSON, the storage type should be json to begin with. In Postgres, jsonb would be preferable as it's easier to manipulate, but that's not directly portable to the other two RDBMS mentioned.
(Or, possibly, a normalized design without JSON altogether.)
For Oracle 21:
update mytable
set json_col = json_transform(
        json_col,
        INSERT '$.value.enableTwoFactorAuth' = 'false'
    )
where json_exists(json_col, '$?(@.name == "MobilePlatform")');
With json_col being a JSON column, or a VARCHAR2/CLOB column with an IS JSON constraint.
(It must be JSON, though, if you want a multivalue index on json_value.name:
create multivalue index ix_json_col_name on mytable t ( t.json_col.name.string() );
)
Two of the databases you are using support a JSON data type, so it doesn't make sense to keep the data as a stringified JSON object in a text column.
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/json-in-oracle-database.html
PostgreSQL: https://www.postgresql.org/docs/current/datatype-json.html
Apart from these, MS SQL Server also provides methods to work with JSON data.
MS SQL Server: https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver16
Using a JSON type column in any of the above databases would enable you to use their JSON functions to perform the tasks that you are looking for.
If you have to use text only, then you can use REPLACE to add the key/value pair at the end of your JSON:
update dataTable set value = REPLACE(value, '}', ',"enableTwoFactorAuth": false}') where name = 'MobilePlatform'
Here dataTable is the name of the table.
The cleaner and less risky way would be to connect to the database from the application and use JSON methods such as JSON.parse in JavaScript or json.loads in Python. This would give you a JSON object (a dictionary in the case of Python) to work on. You can look for similar methods in other languages as well.
But I would suggest using JSON columns instead of text to store JSON values wherever possible.

Change datatype of SSIS flat file data with string "NULL" values

In my SSIS project I have to retrieve my data from a flat csv file. The data itself looks something like this:
AccountType,SID,PersonID,FirstName,LastName,Email,Enabled
NOR,0001,0001,Test,Test0001,Test1#email.com,TRUE
NOR,1001,NULL,Test,Test1002,Test2#email.com,FALSE
TST,1002,NULL,Test,Test1003,Test3#email.com,TRUE
I need to read this data and make sure it has the correct datatypes for future checks, meaning SID and PersonID should have a numeric datatype and Enabled should be a boolean. But I would like to keep the same columns and names as in my source file.
It seems like the only correct way to read this data through the 'Flat File Source' task is as a string. Otherwise I keep getting errors, because "NULL" is literally a string and not a NULL value.
Next I perform a Derived Column transformation to get rid of all "NULL" values. For example, I use the following expression for PersonId:
(TRIM(PersonID) == "" || UPPER(PersonID) == "NULL") ? (DT_WSTR,50)NULL(DT_WSTR,50) : PersonID
I would like to immediately convert it to the correct datatype by adding the conversion to the expression above, but it seems impossible to select another datatype for the same column when I select 'Replace 'PersonId'' in the Derived Column dropdown box.
Next I thought of using the Data Conversion task to change the datatypes of these columns, but when I use it, it only creates new columns, even when I set the output alias to the same name.
How could I alter my solution to efficiently and correctly read this data and convert its values to the correct datatypes?
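For what it's worth, a combined clean-and-cast expression has to be added as a new derived column rather than replacing PersonID, since replacing keeps the original data type. A hedged sketch of such an expression, assuming DT_I4 is wide enough for PersonID:
(TRIM(PersonID) == "" || UPPER(PersonID) == "NULL") ? NULL(DT_I4) : (DT_I4)PersonID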

Update the value of JSON elements in Postgresql

I have a table with the following structure:
create table instances(
    id bigint,
    createdate timestamp,
    createdby bigint,
    lastmodifieddate timestamp,
    lastmodifiedby bigint,
    context text
)
The field context contains JSON data, for example:
insert into instances values
(1, '2020-06-01 22:10:04', 20112,'2020-06-01 22:10:04',20112,
'{"id":1,"details":[{"binduserid":90182}]}')
I need to replace all occurrences of the JSON element binduserid having the value 90182, using a Postgres query.
I have achieved this by using REPLACE function:
update instances
set context = replace(context, '"binduserid":90182','"binduserid":1000619')
Is there any other way to do this using Postgres JSON functions?
Firstly, consider storing the column as JSON or JSONB; those types are designed to hold the data properly and to work with it productively, without conversions between types (much like holding a DATE value in a DATE column rather than in a STRING).
In this case I treat the context column as being of JSONB data type.
You can use the JSONB_SET() function to get the desired result. Here the first argument (the target) is wrapped in an array by the JSONB_BUILD_ARRAY() function, so it can be addressed by index (the 0 in '{0,details}' in this case) in the DML statement below:
UPDATE instances
SET context =
JSONB_SET(JSONB_BUILD_ARRAY(context), '{0,details}','[{"binduserid":1000619}]')
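If the context column stays text as in the original table definition, a hedged alternative is to cast in place and address the nested path directly; the '{details,0,binduserid}' path assumes binduserid lives in the first element of the details array:
-- replace binduserid 90182 with 1000619 inside the first details element
UPDATE instances
SET context = jsonb_set(context::jsonb, '{details,0,binduserid}', '1000619')::text
WHERE (context::jsonb #>> '{details,0,binduserid}') = '90182';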

Can I get a Json Key into a Hive Column?

I am trying to read data from JSON files in S3 into my Hive table. If the column names and JSON keys are the same, it all loads properly. But now I want to read the data in such a way that nested JSON values go into specific columns. For example, for the JSON
{"data1": {"key1": "value1"}}
I want the data1.key1 value to go into a column named data1_key1, which I understand is achievable with SERDEPROPERTIES. My next problem is that there can be multiple JSON keys and I want the key names to be column values in my Hive table.
Also, depending upon those keys, the keys that go into other columns will also change.
For eg my json files will be either:
{"data1" : {"key1":"value1"}}
or
{"data2" : { "key2" : "value2"}}
This needs to produce a table as below:
col1 col2
data1 value1
data2 value2
Is this possible? If so how should it be done?
You can do it using regular expressions. Define the json column as a string in the table DDL and use a regexp to parse it. Tested on your data example:
Demo:
with your_table as ( --Replace this CTE with your table
select stack(2,
'{"data1": {"key1": "value1"}}',
'{"data2" : { "key2" : "value2"}}'
) as json
)
select regexp_extract(json,'^\\{ *\\"(\\w+)\\" *:', 1) as col1, --capturing group 1 in a parenthesis START{spaces"(word)"spaces:
regexp_extract(json,': *\\"(.+)\\" *\\} *\\}$', 1) as col2 --:spaces"(value characters)"spaces}spaces}END
from your_table;
Result:
col1,col2
data1,value1
data2,value2
Please read the comments in the code. You can adjust this solution to fit your JSON. This approach allows you to extract keys and values from JSON without knowing their names. json_tuple and get_json_object are not applicable in this case.
Alternatively, you can use RegexSerDe to do the same in the table DDL, like in this answer: https://stackoverflow.com/a/47944328/2700344. For the RegexSerDe solution you need to write a more complex single regexp containing one capturing group (in parentheses) for each column.
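A hedged sketch of what that DDL could look like for the two sample rows; the table name, column names, S3 location, and the exact regex here are illustrative assumptions:
-- group 1 captures the top-level key, group 2 the nested value
CREATE EXTERNAL TABLE json_kv (
  col1 STRING,
  col2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '^\\{ *"(\\w+)" *: *\\{ *"\\w+" *: *"(.+)" *\\} *\\}$'
)
LOCATION 's3://your-bucket/your-prefix/';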

How to change the field name in BigQuery?

I have a nested JSON to upload in Big Query.
{
  "status": {
    "sleep": "12333",
    "wake": "3837"
  }
}
After inserting it into BigQuery, I am getting the field names as:
status_sleep and status_wake
I require the field names to be separated by a delimiter like '.' or any other delimiter:
status.sleep and status.wake
Please suggest how to add the field delimiter. I checked that there is a field delimiter key for uploading the data in CSV format.
After you insert data with the above schema, you have a record named status with two fields in it: status.sleep and status.wake.
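For reference, a hedged sketch of that schema expressed as BigQuery Standard SQL DDL; the dataset/table names and the STRING types are assumptions based on the quoted sample values:
-- status becomes a RECORD (STRUCT) with two nested fields
CREATE TABLE mydataset.yourtable (
  status STRUCT<
    sleep STRING,
    wake STRING
  >
);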
When you query as
SELECT * FROM yourtable
without providing aliases, you will get output columns named status_sleep and status_wake, because dot notation is reserved for referencing nested data.
But you can still reference your data with dots, as below:
SELECT status.sleep as sleep, status.wake as wake FROM yourtable