I am trying to read data from JSON files in S3 into my Hive table. When the column names and JSON keys match, everything loads properly. But now I want to read the data in such a way that nested JSON values go into specific columns (for example, for the JSON
{"data1": {"key1": "value1"}}
I want the data1.key1 value to go into a column named data1_key1, which I understand is achievable with SERDEPROPERTIES. My next problem is that there can be multiple JSON keys, and I want the key names to become column values in my Hive table.
Also, depending on those keys, the keys that go into the other columns will also change.
For example, my JSON files will contain either:
{"data1" : {"key1":"value1"}}
or
{"data2" : { "key2" : "value2"}}
This needs to produce a table like the one below:
col1 col2
data1 value1
data2 value2
Is this possible? If so how should it be done?
You can do it using regular expressions. Define the JSON column as a string in the table DDL and use regexp to parse it. Tested on your data example:
Demo:
with your_table as ( --Replace this CTE with your table
select stack(2,
'{"data1": {"key1": "value1"}}',
'{"data2" : { "key2" : "value2"}}'
) as json
)
select regexp_extract(json,'^\\{ *\\"(\\w+)\\" *:', 1) as col1, --capturing group 1 in a parenthesis START{spaces"(word)"spaces:
regexp_extract(json,': *\\"(.+)\\" *\\} *\\}$', 1) as col2 --:spaces"(value characters)"spaces}spaces}END
from your_table;
Result:
col1,col2
data1,value1
data2,value2
Please read the comments in the code. You can adjust this solution to fit your JSON. This approach allows you to extract keys and values from JSON without knowing their names; json_tuple and get_json_object are not applicable in this case.
Alternatively, you can use RegexSerDe to do the same in the table DDL, as in this answer: https://stackoverflow.com/a/47944328/2700344. For the RegexSerDe solution you need to write a single, more complex regexp containing one capturing group (in parentheses) for each column.
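For illustration, a minimal DDL sketch of that RegexSerDe approach, assuming the same two-column layout and JSON shape as the demo above (the table name, S3 location, and exact regex are placeholders you would adapt):
CREATE EXTERNAL TABLE json_keys (
  col1 STRING, -- capturing group 1: the top-level key name, e.g. data1
  col2 STRING  -- capturing group 2: the nested value, e.g. value1
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^\\{ *\"(\\w+)\" *: *\\{ *\"\\w+\" *: *\"(.+)\" *\\} *\\}$"
)
LOCATION 's3://your-bucket/your-path/';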
So I have three databases: an Oracle one, a SQL Server one, and a Postgres one. I have a table that has two columns, name and value, both text. The value is a stringified JSON object. I need to update the nested value.
This is what I currently have:
name: 'MobilePlatform',
value:
'{
"iosSupported":true,
"androidSupported":false,
}'
I want to add {"enableTwoFactorAuth": false} into it.
In PostgreSQL you should be able to do this:
UPDATE mytable
SET value = jsonb_set(value::jsonb, '{enableTwoFactorAuth}', 'false')
WHERE name = 'MobilePlatform';
In Postgres, the plain concatenation operator || for jsonb could do it:
UPDATE mytable
SET value = value::jsonb || '{"enableTwoFactorAuth":false}'::jsonb
WHERE name = 'MobilePlatform';
If a top-level key "enableTwoFactorAuth" already exists, it is replaced. So it's an "upsert" really.
Or use jsonb_set() for manipulating nested values.
The cast back to text works implicitly as assignment cast. (Results in standard format; any insignificant whitespace is removed effectively.)
If the content is valid JSON, the storage type should be json to begin with. In Postgres, jsonb would be preferable as it's easier to manipulate, but that's not directly portable to the other two RDBMS mentioned.
(Or, possibly, a normalized design without JSON altogether.)
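And a sketch of the jsonb_set() route for genuinely nested values, assuming (hypothetically) the flag lived under an auth sub-object; the path array walks into the nesting and the final true creates the key if it is missing:
UPDATE mytable
SET    value = jsonb_set(value::jsonb, '{auth,enableTwoFactorAuth}', 'false', true)
WHERE  name = 'MobilePlatform';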
For Oracle 21:
update mytable
set json_col = json_transform(
        json_col,
        INSERT '$.value.enableTwoFactorAuth' = 'false' FORMAT JSON -- FORMAT JSON stores a JSON boolean rather than the string 'false'
    )
where json_exists(json_col, '$?(@.name == "MobilePlatform")');
Here json_col is a JSON column, or a VARCHAR2/CLOB column with an IS JSON check constraint.
(It must be the JSON type, though, if you want a multivalue index on json_col.name:
create multivalue index ix_json_col_name on mytable t ( t.json_col.name.string() );
)
Two of the databases you are using support a native JSON data type, so it doesn't make sense to keep the values as stringified JSON objects in a text column.
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/json-in-oracle-database.html
PostgreSQL: https://www.postgresql.org/docs/current/datatype-json.html
Apart from these, MS SQL Server also provides functions to work with JSON data.
MS SQL Server: https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver16
Using a JSON type column in any of the above databases would enable you to use their JSON functions to perform the tasks that you are looking for.
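For instance, a sketch in SQL Server (2016+) with JSON_MODIFY might look like this (table and column names as in the question; CAST(0 AS bit) should yield a JSON false rather than a string):
update dataTable set value = JSON_MODIFY(value, '$.enableTwoFactorAuth', CAST(0 AS bit)) where name = 'MobilePlatform'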
If you have to use text only, then you can use REPLACE to add the key-value pair at the end of your JSON:
update dataTable set value = REPLACE(value, '}', ', "enableTwoFactorAuth": false}') where name = 'MobilePlatform'
Here dataTable is the name of the table. (Note that REPLACE rewrites every '}' in the string, so this is only safe for flat, single-object JSON like the example.)
A cleaner and less risky way would be to connect to the database from the application and use JSON methods such as JSON.parse in JavaScript or json.loads in Python. This would give you a JSON object (a dictionary in the case of Python) to work on. You can look for similar methods in other languages as well.
But I would suggest using JSON columns instead of text to store JSON values wherever possible.
Situation
I have a table in a MariaDB database. This table has a LONGTEXT column which is used to store a JSON array (read more about this topic in MariaDB JSON Data Type).
Question
I would like to extract values from the JSON array, based on a certain key. How do I achieve this with MariaDB (or MySQL)?
Example
Here's the simplified table thing (just for demo purposes):
id | thing_name | examples
---+------------+----------------------------------------------------------------------------------------------------------------
0  | fruit      | [{"color": "green","title": "Apple"},{"color": "orange","title": "Orange"},{"color": "yellow","title": "Banana"}]
1  | car        | [{"color": "silver","title": "VW"},{"color": "black","title": "Bentley"},{"color": "blue","title": "Tesla"}]
My goal is to extract all title values from the JSON array.
You can use JSON_EXTRACT for this task (works for both MariaDB and MySQL). This function also supports wildcards, as described in the docs:
Paths can contain * or ** wildcards
Depending on whether you have multiple levels of data (e.g. single document vs array), either a single or double asterisk wildcard should be used:
JSON_EXTRACT(json,'$**.key')
Here json is a valid JSON document (e.g. a column) and key is the lookup key.
For your example
In order to find all title values in your JSON array, use the following query:
SELECT id, thing_name, JSON_EXTRACT(examples, '$**.title') as examples_titles FROM thing
id | thing_name | examples_titles
---+------------+------------------------------
0  | fruit      | ["Apple", "Orange", "Banana"]
1  | car        | ["VW", "Bentley", "Tesla"]
I have a JSON value stored in a SQL Server table as ntext:
JSON (column: json_val):
[{"prime":{"image":{"id":"123","logo":"","productId":"4000","enable":true},"accountid":"78","productId":"16","parentProductId":"","aprx":"4.599"}}]
select JSON_VALUE(cast(json_val as varchar(8000)), '$.prime.aprx') as px
from table_1
where id = 1
Whenever I execute it, I receive a null. What's wrong with the query?
Thanks for your help!
The JSON string is an array with a single item. You need to specify the array index to retrieve a specific item, e.g.:
declare @t table (json_val nvarchar(4000))
insert into @t
values ('[{"prime":{"image":{"id":"123","logo":"","productId":"4000","enable":true},"accountid":"78","productId":"16","parentProductId":"","aprx":"4.599"}}]')
select JSON_VALUE(cast(json_val as varchar(8000)), '$[0].prime.aprx') as px
from @t
This returns 4.599
If you want to search all array entries, you'll have to use OPENJSON. If you need to do that though ...
Avoid JSON if possible
JSON storage is not an alternative to a proper table design, though. JSON fields can't be indexed directly, so filtering by a specific field will always result in a full table scan. Given how regular this JSON string is, you should consider using proper tables instead.
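Purely as an illustration, a normalized sketch of this particular document might look something like this (all table and column names are made up):
CREATE TABLE Prime
(
    PrimeID         int IDENTITY PRIMARY KEY,
    AccountID       int NOT NULL,
    ProductID       int NOT NULL,
    ParentProductID int NULL,
    Aprx            decimal(4,3) NOT NULL,
    ImageID         int NULL,
    ImageLogo       nvarchar(200) NULL,
    ImageEnabled    bit NOT NULL DEFAULT 1
);
Each JSON field becomes a typed, indexable column, which is exactly what the full-table-scan point above is about.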
As Panagiotis said in the comments:
As for the JSON path, this JSON string is an array with a single element
You can therefore use OPENJSON, which inspects each element of the array:
DECLARE @JSON nvarchar(MAX) = N'[{"prime":{"image":{"id":"123","logo":"","productId":"4000","enable":true},"accountid":"78","productId":"16","parentProductId":"","aprx":"4.599"}}]';
SELECT aprx
FROM (VALUES(@JSON))V(json_val)
CROSS APPLY OPENJSON(V.json_val)
WITH (aprx decimal(4,3) '$.prime.aprx');
As also mentioned, your JSON column should already be a string data type (probably nvarchar(MAX); ntext is deprecated), so there should be no reason to CAST it.
I am using Postgresql 9.6
I have a table, where my column is of type json.
create table test (
my_data json,
.
.
);
When queried, the column is shown as JSON for each row.
I want to aggregate the data; for the sake of simplicity I've selected only two columns, col1_data and col2_data. I need to group by col1_data and merge each group's col2_data arrays into a single JSON array.
I tried to use json_agg, but that aggregates the rows into an array of JSON arrays:
select col1_data, json_agg(col2_data) as col2_data
from test
group by col1_data;
Can someone help me convert this array of JSON arrays into a single JSON array?
This should do it:
select col1_data, json_agg(col2_element) as col2_data
from test,
     json_array_elements(col2_data) as col2_element -- implicit LATERAL: one row per array element
group by col1_data;
Alternatively, you can write your own aggregate function that concatenates json arrays.
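A sketch of that custom-aggregate idea, assuming you can cast the json column to jsonb (the aggregate name here is made up; jsonb_concat is the built-in function behind the || operator, and || on two jsonb arrays concatenates them):
CREATE AGGREGATE jsonb_array_concat_agg(jsonb)
(
    SFUNC    = jsonb_concat, -- state || next value
    STYPE    = jsonb,
    INITCOND = '[]'          -- start from an empty array
);

select col1_data, jsonb_array_concat_agg(col2_data::jsonb) as col2_data
from test
group by col1_data;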
I'm inserting into a Postgres table with a JSON document and I want to generate a unique ID for the document. I can do that on my own, of course, but I was wondering if there was a way to have PG do it.
INSERT INTO test3 (data) VALUES ('{"key": "value", "unique": ????}')
The docs seem to indicate that JSON records fit into various SQL data types, but I don't see how that actually works.
How about just concatenating? Assuming your column is of type json/jsonb, something like the following should work:
INSERT INTO test3 (data) VALUES (('{"key": "value", "unique": "' || uuid_generate_v4() || '"}')::jsonb)
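As a side note, uuid_generate_v4() comes from the uuid-ossp extension. A sketch of an alternative, assuming a jsonb column and Postgres 13+ (where gen_random_uuid() is built in), uses jsonb_build_object() to avoid the manual quoting:
INSERT INTO test3 (data) VALUES (jsonb_build_object('key', 'value', 'unique', gen_random_uuid()))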
If you're looking to generate a UUID and store it at the same time as a value within a JSON data field, here is something some may find to be a little more sane:
WITH
-- Create a temporary view named "new_entry" containing your data
new_entry
-- This is how you name the view's columns
("key", "unique")
AS (
VALUES
-- This is the actual row returned by the view
(
'value',
uuid_generate_v4()
)
)
INSERT INTO
test3(
data
)
SELECT
-- Convert row to JSON. Column name = key, column value = value.
ROW_TO_JSON(new_entry.*)
FROM
new_entry
First, we're creating a temporary view named new_entry, which contains all of the data we want to store in the JSON data field.
Second, we're grabbing that entry and passing it to the ROW_TO_JSON function, which converts it to a valid JSON value. Once converted, the row is inserted into the test3 table.
My reasoning for the "sanity" is that, more than likely, your JSON object will end up containing more than just two key/value pairs. Rather, you'll end up with a handful of keys and values, and it will be up to you to ensure you don't miss any quotes and that you escape user input appropriately. Why glue all of this together manually when you can have Postgres do it for you (with the help of ROW_TO_JSON()) while making the query easier to read and debug at the same time?
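For example, extending the same pattern with a third, hypothetical created_at column is just one more name and one more value; ROW_TO_JSON picks it up automatically:
WITH new_entry ("key", "unique", "created_at") AS (
    VALUES ('value', uuid_generate_v4(), now())
)
INSERT INTO test3 (data)
SELECT ROW_TO_JSON(new_entry.*)
FROM new_entry;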