Error in nested JSON in Hive

I was trying to load this JSON data into Hive:
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
using these DDL commands:
ADD JAR /home/cloudera/Downloads/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE format.json_serde (
`id` string,
`type` string,
`name` string,
`ppu` float,
batters` struct < `batter`:array < struct <`bid`:string, `btype`:string >>>,
`topping`:array < struct<`tid`:int, `ttype`:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
but it throws this error:
FAILED: ParseException line 7:11 cannot recognize input near ':' 'array' '<' in column type

You have typos:
ttype`:string should be ttype:string
batters` struct should be `batters` struct (the opening backtick is missing)
`topping`:array should be `topping` array (no colon between a column name and its type)
JSON SerDe mapping is done by name.
Your struct field names should match the actual names, e.g. id and not bid or tid; otherwise you'll get NULL values for these fields.
There is already a JSON SerDe which is part of the Hive installation.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe
create external table json_serde
(
id string
,type string
,name string
,ppu float
,batters struct<batter:array<struct<id:string,type:string>>>
,topping array<struct<id:string,type:string>>
)
row format serde
'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
;
select * from json_serde
;
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | type | name | ppu | batters | topping |
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0001 | donut | Cake | 0.550000011920929 | {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil'sFood"}]} | [{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"PowderedSugar"},{"id":"5006","type":"ChocolatewithSprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}] |
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
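To illustrate the by-name mapping described in this answer, here is a toy sketch in JavaScript. It is not the SerDe's actual implementation; projectByName is a hypothetical helper that only mimics the observable behaviour, where declared fields whose names do not appear in the JSON come back as null.

```javascript
// Illustration only: a toy by-name projection, not the SerDe's real code.
const record = { id: "5001", type: "None" };

// Pick the declared fields out of a JSON object by exact name.
// A declared name that is absent from the object yields null.
function projectByName(obj, fieldNames) {
  const row = {};
  for (const name of fieldNames) {
    row[name] = Object.prototype.hasOwnProperty.call(obj, name) ? obj[name] : null;
  }
  return row;
}

// Field names that match the JSON keys pick up the values.
const matching = projectByName(record, ["id", "type"]);
// Renamed fields (tid, ttype) find nothing to map to.
const mismatched = projectByName(record, ["tid", "ttype"]);

console.log(matching);
console.log(mismatched);
```

Under this assumption, the struct declared with tid and ttype would show null for both fields, which is why the answer keeps id and type.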

It worked when I removed the colon after topping. Thanks.
CREATE EXTERNAL TABLE format.json_serde (
id string,
type string,
name string,
ppu float,
batters struct<batter:array<
struct<bid:string, btype:string >>>,
topping array< struct<tid:string, ttype:string>>
)

Related

BigQuery JSON element extraction

I have a table in BigQuery with a JSON column, see below.
+--------+-------+
| doc_id | data  |
+--------+-------+
| 222    | {...} |
| 333    | {...} |
+--------+-------+
In the data JSON column, the IDs are used as the top-level keys:
{
"1675223776617": {
"author": "aaa",
"new": "2023-02-01",
"old": null,
"property": "asd",
"sender": "wew"
},
"1675223776618": {
"author": "aaa",
"new": true,
"old": null,
"property": "asd",
"sender": "ewew"
},
"1675223776619": {
"author": "bbb",
"new": "ySk2btk7",
"old": null,
"property": "qwe",
"sender": "yyy"
}
}
I would like to extract this JSON into this format using SQL in BigQuery.
Note, the header id isn't defined in the JSON.
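+--------+---------------+--------+------------+------+----------+--------+
| doc_id | id            | author | new        | old  | property | sender |
+--------+---------------+--------+------------+------+----------+--------+
| 222    | 1675223776617 | aaa    | 2023-02-01 | null | asd      | wew    |
| 222    | 1675223776618 | aaa    | true       | null | asd      | ewew   |
| 222    | 1675223776619 | bbb    | ySk2btk7   | null | qwe      | yyy    |
+--------+---------------+--------+------------+------+----------+--------+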
I tried using the JSON_EXTRACT function without any success.
You might consider the approach below, using a JavaScript UDF.
CREATE TEMP FUNCTION flatten_json(json STRING)
RETURNS ARRAY<STRUCT<id STRING, author STRING, new STRING, old STRING, property STRING, sender STRING>>
LANGUAGE js AS """
const result = [];
for (const [key, value] of Object.entries(JSON.parse(json))) {
value["id"] = key; result.push(value);
}
return result;
""";
WITH sample_table AS (
SELECT 222 doc_id, '''{
"1675223776617": {
"author": "aaa",
"new": "2023-02-01",
"old": null,
"property": "asd",
"sender": "wew"
},
"1675223776618": {
"author": "aaa",
"new": true,
"old": null,
"property": "asd",
"sender": "ewew"
},
"1675223776619": {
"author": "bbb",
"new": "ySk2btk7",
"old": null,
"property": "qwe",
"sender": "yyy"
}
}''' data
)
SELECT doc_id, flattened.*
FROM sample_table, UNNEST(flatten_json(data)) flattened;
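The body of the UDF can also be tried outside BigQuery. The following is a minimal sketch for Node.js, assuming the same input shape as the sample; flattenJson is a hypothetical standalone name, and unlike the UDF it copies each entry instead of mutating the parsed object.

```javascript
// Same idea as the UDF body: turn each top-level key into an "id" field.
function flattenJson(json) {
  const result = [];
  for (const [key, value] of Object.entries(JSON.parse(json))) {
    // Copy the entry and add the key as "id" (the UDF mutates in place instead).
    result.push({ id: key, ...value });
  }
  return result;
}

// Two entries taken from the sample data in the question.
const sample = JSON.stringify({
  "1675223776617": { author: "aaa", new: "2023-02-01", old: null, property: "asd", sender: "wew" },
  "1675223776618": { author: "aaa", new: true, old: null, property: "asd", sender: "ewew" },
});

const rows = flattenJson(sample);
// One output row per top-level key, each carrying its key as "id".
console.log(rows);
```

Note that the sketch keeps the original JavaScript types (for example, new stays a boolean in the second row); how such values are converted to the STRING fields declared in RETURNS is up to BigQuery and is not modelled here.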

How to write JSON to Mysql?

Sorry for my bad English.
I am inserting JSON into MySQL like this:
set @json = '[{"name":"ivan","city":"london","kurs":"1"},{"name":"lena","city":"tokio","kurs":"5"},{"name":"misha","city":"kazan","kurs":"3"}]';
select * from json_table(@json,'$[*]' columns(name varchar(20) path '$.name',
city varchar(20) path '$.city',
kurs varchar(20) path '$.kurs')) as jsontable;
But now there is a task to insert an unknown number of additional properties:
set @json = '[{"name":"ivan","city":"london","kurs":"1","options": [{
"ao_id": 90630,
"name": "Высота предмета",
"value": "3.7 см"
}, {
"ao_id": 90673,
"name": "Ширина предмета",
"value": "4 см"
}, {
"ao_id": 90745,
"name": "Ширина упаковки",
"value": "4 см"
}]},{"name":"lena","city":"tokio","kurs":"5", "options": [{
"ao_id": 90630,
"name": "Высота предмета",
"value": "9.7 см"
}]},{"name":"misha","city":"kazan","kurs":"3", "options": [{
"ao_id": 90999,
"name": "Высота",
"value": "5.7 см"
}]}]';
How can I best do this so that I can access the table in the future (search, index, output)?

Retrieve JSON from sql

My json format in one of the sql columns "jsoncol" in the table "jsontable" is like below.
Kindly help me to get this data using JSON_QUERY or JSON_VALUE
Please pay attention to the brackets and double quotes in the key value pairs...
{
"Company": [
{
"Info": {
"Address": "123"
},
"Name": "ABC",
"Id": 999
},
{
"Info": {
"Address": "456"
},
"Name": "XYZ",
"Id": 888
}
]
}
I am trying to retrieve all the company names using sql query. Thanks in advance
You can use:
SELECT j.name
FROM table_name t
CROSS APPLY JSON_TABLE(
t.value,
'$.Company[*]'
COLUMNS(
name VARCHAR2(200) PATH '$.Name'
)
) j
Which, for the sample data:
CREATE TABLE table_name (
value CLOB CHECK (value IS JSON)
);
INSERT INTO table_name (value)
VALUES ('{
"Company": [
{
"Info": {
"Address": "123"
},
"Name": "ABC",
"Id": 999
},
{
"Info": {
"Address": "456"
},
"Name": "XYZ",
"Id": 888
}
]
}');
Outputs:
NAME
ABC
XYZ
You can also use the JSON_TABLE() function for this case, provided the database version is at least 12.1.0.2, such as:
SELECT name
FROM jsontable,
JSON_TABLE(jsoncol,
'$' COLUMNS(NESTED PATH '$."Company"[*]'
COLUMNS(name VARCHAR2 PATH '$."Name"')))

TSQL JSON_QUERY can you use a filter in the JSON Path

I have a table with a column that holds valid JSON. Here's an example of the JSON structure:
{
"Requirements": {
"$values": [
{
"$type": "List",
"ListId": "956cf9c5-24ab-47d9-8082-940118f2f1a3",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value1",
"Value2",
"Value3"
]
}
},
{
"$type": "List",
"ListId": "D11149DD-A682-4BC7-A87D-567954779234",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value4",
"Value5",
"Value6",
"Value7"
]
}
}
]
}
}
I need to return the values from the MultiSelected collection depending on the value of ListId.
I'm using the following JSON path to return the values:
$.Requirements."$values"[?(@.ListId=='956cf9c5-24ab-47d9-8082-940118f2f1a3')].MultiSelected."$values"
This worked fine in a JSON Expression tester.
But when I try to use it to query the table I get the following error:
JSON path is not properly formatted. Unexpected character '?' is found at position 25.
The query I'm using is as follows:
SELECT ID AS PayloadID,
Items.Item AS ItemsValues
FROM dbo.Payload
CROSS APPLY ( SELECT *
FROM OPENJSON( JSON_QUERY( Payload, '$.Requirements."$values"[?(@.ListId==''956cf9c5-24ab-47d9-8082-940118f2f1a3'')].MultiSelected."$values"' ) )
WITH ( Item nvarchar(200) '$' ) ) AS Items
WHERE ID = 3
I've tried replacing
?(@.ListId==''956cf9c5-24ab-47d9-8082-940118f2f1a3'')
with 0 and it works fine on SQL Server.
My question is: is the filter syntax ?(...) supported in JSON_QUERY, or is there something else I should be doing?
The database is running on Azure, where the database compatibility level is set to SQL Server 2017 (140).
Thanks for your help in advance.
Andy
I would use OPENJSON twice instead:
drop table if exists #payload
create table #payload(ID int,Payload nvarchar(max))
insert into #payload VALUES
(3,N'
{
"Requirements": {
"$values": [
{
"$type": "List",
"ListId": "956cf9c5-24ab-47d9-8082-940118f2f1a3",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value1",
"Value2",
"Value3"
]
}
},
{
"$type": "List",
"ListId": "D11149DD-A682-4BC7-A87D-567954779234",
"DefaultValue": "",
"MultiSelect": true,
"Selected": null,
"MultiSelected": {
"$type": "ListItem",
"$values": [
"Value4",
"Value5",
"Value6",
"Value7"
]
}
}
]
}
}'
)
SELECT ID AS PayloadID,
Items.[value]
FROM #Payload a
CROSS APPLY OPENJSON( Payload, '$.Requirements."$values"' ) with ( ListId varchar(50),MultiSelected nvarchar(max) as json) b
CROSS APPLY OPENJSON( MultiSelected,'$."$values"' ) Items
where
a.id=3
AND b.listid='956cf9c5-24ab-47d9-8082-940118f2f1a3'
Reply:
+-----------+--------+
| PayloadID | value |
+-----------+--------+
| 3 | Value1 |
| 3 | Value2 |
| 3 | Value3 |
+-----------+--------+
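The logic of the two OPENJSON steps can be sketched outside SQL as a filter followed by a flatten. This is only an illustration in JavaScript on a trimmed copy of the sample document; multiSelectedFor is a hypothetical helper, and the comparison here is case-sensitive, whereas the SQL comparison depends on the column collation.

```javascript
// Trimmed copy of the sample payload: only the fields the query touches.
const payload = {
  Requirements: {
    $values: [
      {
        ListId: "956cf9c5-24ab-47d9-8082-940118f2f1a3",
        MultiSelected: { $values: ["Value1", "Value2", "Value3"] },
      },
      {
        ListId: "D11149DD-A682-4BC7-A87D-567954779234",
        MultiSelected: { $values: ["Value4", "Value5", "Value6", "Value7"] },
      },
    ],
  },
};

function multiSelectedFor(doc, listId) {
  return doc.Requirements.$values
    // First OPENJSON plus the WHERE clause: one entry per list, filtered by ListId.
    .filter((item) => item.ListId === listId)
    // Second OPENJSON: expand the nested "$values" array into individual values.
    .flatMap((item) => item.MultiSelected.$values);
}

const values = multiSelectedFor(payload, "956cf9c5-24ab-47d9-8082-940118f2f1a3");
console.log(values);
```

The point of the design is that the filtering happens on a relational column (ListId) after the first expansion, so no filter expression is needed inside the JSON path itself.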

Representing a DB schema in JSON

Let's say I have two tables in my database, employee and car defined thusly.
employee:
+--------------+------------+
| col_name | data_type |
+--------------+------------+
| eid | int |
| name | string |
| salary | int |
| destination | string |
+--------------+------------+
car:
+------------+----------------+
| col_name | data_type |
+------------+----------------+
| cid | int |
| name | string |
| model | string |
| cylinders | int |
| price | int |
+------------+----------------+
I would like to export this schema to a JSON object so that I can populate an HTML dropdown menu based on the table - for instance, the table menu would have employee and car. Selecting employee would populate another dropdown with the column names and types corresponding to that table.
Given this use case, would the optimal json representation of the database be this?
{
"employee": {
"salary": "int",
"destination": "string",
"eid": "int",
"name": "string"
},
"car": {
"price": "int",
"model": "string",
"cylinders": "int",
"name": "string",
"cid": "int"
}
}
EDIT:
Or would this be more appropriate?
{
"employee": [
{
"type": "int",
"colname": "eid"
},
{
"type": "string",
"colname": "name"
},
{
"type": "int",
"colname": "salary"
},
{
"type": "string",
"colname": "destination"
}
],
"car": [
{
"type": "int",
"colname": "cid"
},
{
"type": "string",
"colname": "name"
},
{
"type": "string",
"colname": "model"
},
{
"type": "int",
"colname": "cylinders"
},
{
"type": "int",
"colname": "price"
}
]
}
In the first example, all your data is stored in objects. Assuming the structure is stored in a var mytables, you can get the table names with Object.keys(mytables), which returns ['employee', 'car']. The same works for the columns inside: Object.keys(mytables['employee']) returns ['salary','destination','eid','name'].
In the second example I would suggest also storing the tables in an array, just like the columns:
[{name: 'employee',
cols: [ {
"type": "int",
"colname": "eid"
}, ...]
}, ...]
Then you can easily iterate over the arrays and get the names by accessing mytables[i].name:
for (const t in mytables) {
console.log(mytables[t].name);
for (const c in mytables[t].cols)
console.log(" - ", mytables[t].cols[c].colname, ": ", mytables[t].cols[c].type);
}
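As a rough sketch of how either shape could feed the two dropdowns, the snippet below converts the object form from the question into the array form suggested here and derives the option lists from it. The names toTableArray, tableOptions and columnOptions are hypothetical, and the label format "colname (type)" is just one possible choice.

```javascript
// Object form from the question: table name -> { column name: type }.
const schema = {
  employee: { eid: "int", name: "string", salary: "int", destination: "string" },
  car: { cid: "int", name: "string", model: "string", cylinders: "int", price: "int" },
};

// Convert to the array form: [{ name, cols: [{ colname, type }, ...] }, ...].
function toTableArray(schemaObject) {
  return Object.entries(schemaObject).map(([name, cols]) => ({
    name,
    cols: Object.entries(cols).map(([colname, type]) => ({ colname, type })),
  }));
}

const tables = toTableArray(schema);

// Entries for the first dropdown: the table names.
const tableOptions = tables.map((t) => t.name);

// Entries for the second dropdown: the columns of the selected table.
function columnOptions(tableName) {
  const table = tables.find((t) => t.name === tableName);
  return table ? table.cols.map((c) => `${c.colname} (${c.type})`) : [];
}

console.log(tableOptions);
console.log(columnOptions("employee"));
```

The array form keeps an explicit column order, which may matter if the dropdown should list columns in the same order as the table definition.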