Splitting JSON data into array<string> columns - json

I have many json arrays stored in a table like this:
{"p_id":
{"id_type":"XXX","id":"ABC111"},
"r_ids":[
{"id_type":"HAWARE_ABCDA1","id":"dfe234fhgt"},
{"id_type":"HAWARE_CDFE2","id":"sgteth5673"}
]
}
My requirement is to get the data in the format below:
p_id, p_id_type, r_ids (array<string>), r_id_type (array<string>)
Ex: XXX,ABC111,[dfe234fhgt,sgteth5673],[HAWARE_ABCDA1,HAWARE_CDFE2]
I am able to get the whole set in exploded format, but how do I generate the arrays?
My current query:
select p_id
,p_id_type
,get_json_object(c.qqqq,'$.id') as r_id
,get_json_object(c.qqqq,'$.id_type') as r_id_type
from
(
select p_id
,p_id_type
,qqqq
from
(
select
get_json_object(a.main_pk,'$.id_type') as p_id_type
,get_json_object(a.main_pk,'$.id') as p_id
,split(regexp_replace(regexp_replace(a.r_ids,'\\}\\,\\{','\\}\\;\\{'),'\\[|\\]',''),'\\;') as yyyy
from
(
select
get_json_object(json_string,'$.p_id') as main_pk
,get_json_object(json_string, '$.r_ids') as r_ids
from sample_table limit 10
) a
) b lateral view explode(b.yyyy) yyyy_exploded as qqqq
)c
Can anyone tell me what I am doing wrong? Any suggestions would be appreciated.

If you use a JSON SerDe, it is much easier to handle complex data types.
Here is a small example you can adapt:
CREATE TABLE table_json (
p_id struct<id_type:string,
            id:string>,
r_ids array<struct<id_type:string,
                   id:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
LOAD DATA LOCAL INPATH '<path>/your_file.json'
OVERWRITE INTO TABLE table_json;
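Once the data is loaded, a minimal query sketch for the output you asked for could look like the following (this assumes the schema above; in Hive, projecting a struct field through an array column, e.g. r_ids.id, yields an array<string>):
SELECT p_id.id        AS p_id,
       p_id.id_type   AS p_id_type,
       r_ids.id       AS r_ids,
       r_ids.id_type  AS r_id_type
FROM table_json;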


UPDATE statement using GROUP_CONCAT()

I'm attempting to create a single query that UPDATEs another table, but the subquery/derived query I would use requires a GROUP BY and GROUP_CONCAT().
I was able to get my desired output, but to do so I had to create a temporary table to store the "grouped/concatenated" data and then push that reorganized data to the destination table. In other words, I have to run two separate queries: one that populates the temp table with the grouped data in its fields, and then another UPDATE that pushes that data from the temp table to the final destination table.
I've created a REPREX that exemplifies what I'm trying to achieve below:
/*
Create a simplified sample table:
*/
CREATE TABLE `test_tbl` (
`equipment_num` varchar(20),
`item_id` varchar(40),
`quantity` decimal(10,2),
`po_num` varchar(20)
);
--
-- Dumping data for table `test_tbl`
--
INSERT INTO `test_tbl` (`equipment_num`, `item_id`, `quantity`, `po_num`) VALUES
('TRHU8399302', '70-8491', '5.00', 'PO10813-Air'),
('TRHU8399302', '40-21-72194', '22.00', '53841'),
('TRHU8399302', '741-PremBundle-CK', '130.00', 'NECTAR-PMBUNDLE-2022'),
('TRHU8399302', '741-GWPBundle-KG', '650.00', 'NECTAR2021MH185-Fort'),
('TRHU6669420', '01-DGCOOL250FJ', '76000.00', '4467'),
('TRHU6669420', '20-2649', '450.00', 'PO9994'),
('TRHU6669420', 'PFL-PC-GRY-KG', '80.00', '1020'),
('TRHU6669420', '844067025947', '120.00', 'Cmax 2 15 22'),
('TRHU5614145', 'Classic Lounge Chair Walnut leg- A XH301', '372.00', 'P295'),
('TRHU5614145', '40-21-72194', '22.00', '53837'),
('TRHU5614145', 'MAR-PLW-55K-BX', '2313.00', 'SF220914R-CA'),
('TRHU5614145', 'OPCP-BH1-L', '150.00', 'PO-00000429B'),
('TRHU5367889', 'NL1000WHT', '3240.00', 'PO1002050'),
('TRHU4692842', '1300828', '500.00', '4500342008'),
('TRHU4560701', 'TSFP-HB2-T', '630.00', 'PO-00000485A'),
('TRHU4319443', 'BGS21ASFD', '20.00', 'PO10456-1'),
('TRHU4317564', 'CSMN-AM1-X', '1000.00', 'PO-00000446'),
('TRHU4249449', '4312970', '3240.00', '4550735164'),
('TRHU4238260', '741-GWPBundle-TW', '170.00', 'NECTAR2022MH241'),
('TRHU3335270', '1301291', '60000.00', '4500330599'),
('TRHU3070607', '36082233', '150.00', '11199460'),
('TLLU8519560', 'BGM03AWFX', '360.00', 'PO10181A'),
('TLLU8519560', '10-1067', '9120.00', 'PO10396'),
('TLLU8519560', 'LUNA-KP-SS', '8704.00', '4782'),
('TLLU5819760', 'GS-1319', '10000.00', '62719'),
('TLLU5819760', '2020124775', '340.00', '3483'),
('TLLU5389611', '1049243', '63200.00', '4500343723'),
('TLLU4920852', '40-21-72194', '22.00', '53839'),
('TRHU3335270', '4312904', '1050.00', '4550694829'),
('TLLU4540955', '062-06-4580', '86.00', '1002529'),
('TRHU3335270', 'BGM03AWFK', '1000.00', 'PO9912'),
('TLLU4196942', 'Classic Dining Chair,Walnut Legs, SF XH1', '3290.00', 'P279'),
('TLLU4196942', 'BGM61AWFF', '852.00', 'PO10365');
---
--- The data above is a subsample of what I have in the db. What I'm trying to do is update another table based on this info, but with some GROUP_CONCAT().
--- With the data above, I need GROUP_CONCAT(item_id), GROUP_CONCAT(quantity), GROUP_CONCAT(po_num) -- grouped by the equipment_num field.
---
--- In other words, I'm attempting an UPDATE to another table using the rows grouped by equipment_num and the GROUP_CONCATs of the fields described above.
---
--- The only way I was able to do what I wanted was with an intermediary TEMPORARY table.
---
--- Create the temp table:
--- Since what I need is a "list" of the quantities, I had to do a GROUP_CONCAT(CONCAT(quantity,''))
DROP TABLE __tmp__; CREATE TABLE __tmp__
SELECT equipment_num, GROUP_CONCAT( item_id ), GROUP_CONCAT(CONCAT( quantity , '' ) ), GROUP_CONCAT( po_num )
FROM `test_tbl`
GROUP BY equipment_num
--- Then FINALLY pull the information in the format I desire to the destination table:
UPDATE `dest_tbl` AS ms INNER JOIN `__tmp__` AS isn ON ( ms.equipment_num = isn.equipment_num ) SET ms.item_id = isn.item_id,
ms.piece_count = isn.quantity,
ms.pieces_detail = isn.po_num
I'm trying to create a single query that uses a derived query to do the GROUP_CONCAT part and then pushes that derived result to the final destination table.
Any suggestions would be greatly appreciated.
Thank you for your time.
TB.
EDIT: Thank you for the replies I've gotten, but I'm trying to AVOID creating a temp table.
I'm wondering how to do it in one go...
I was thinking something along the lines of:
UPDATE dest
INNER JOIN(
SELECT src.equipment_num, GROUP_CONCAT(src.item_id) as item_id,
GROUP_CONCAT(CONCAT(src.quantity)) as quantity,
GROUP_CONCAT(src.po_num) as po_num
FROM `item_shipped_ns` as src
INNER JOIN milestone_test_20221019 as dest ON(src.equipment_num=dest.equipment_num)
WHERE src.importer_id='123456'
GROUP BY src.equipment_num
) as tmp ON(src.equipment_num=tmp.equipment_num)
SET
dest.item_num=tmp.item_id,
dest.piece_count=tmp.quantity,
dest.pieces_detail=tmp.po_num;
Unfortunately, the above doesn't work; I get the following error message:
#1146 - Table 'fgcloud.dest' doesn't exist
Edit 2: I had a missing bracket in the above, which caused a different error. I've fixed it, but I'm still having issues with the table aliases. The table that should be updated is "milestone_test_20221019" - it is aliased as "dest", yet MySQL says it cannot find it. Suggestions? The source table I need to aggregate before updating "milestone_test_20221019" is "item_shipped_ns", and I believe "tmp" is the derived/sub-query table alias...
You need to give an alias to the GROUP_CONCAT() so you'll get a column named item_id. It won't use the argument to GROUP_CONCAT() as the name of the resulting column automatically.
CREATE TABLE __tmp__
SELECT equipment_num,
GROUP_CONCAT( item_id ) AS item_id,
GROUP_CONCAT( quantity ) AS quantity,
GROUP_CONCAT( po_num ) AS po_num
FROM `test_tbl`
GROUP BY equipment_num
To do this in a single query without creating the __tmp__ table, just put the query used to create __tmp__ in a subquery in the UPDATE. Note that the UPDATE itself has to name the real table (milestone_test_20221019) and alias it; in your attempt, dest was only defined inside the subquery, which is why MySQL reported that fgcloud.dest doesn't exist.
UPDATE milestone_test_20221019 AS dest
JOIN (
SELECT equipment_num,
GROUP_CONCAT( item_id ) AS item_id,
GROUP_CONCAT( quantity ) AS quantity,
GROUP_CONCAT( po_num ) AS po_num
FROM item_shipped_ns
GROUP BY equipment_num
) AS src ON dest.equipment_num = src.equipment_num
SET dest.item_id = src.item_id,
dest.quantity = src.quantity,
dest.po_num = src.po_num
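If you want to sanity-check the aggregated values before running the UPDATE, you can execute the derived query on its own first (same table and column names as assumed above):
SELECT equipment_num,
GROUP_CONCAT( item_id ) AS item_id,
GROUP_CONCAT( quantity ) AS quantity,
GROUP_CONCAT( po_num ) AS po_num
FROM item_shipped_ns
GROUP BY equipment_num;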
Thanks for the assistance; after a few more tests and tweaks I was able to achieve what I wanted.
Below is an example of how to use an UPDATE with GROUP_CONCAT(), as well as an implicit string cast (CONCAT(quantity,'')) for the quantity field.
UPDATE milestone_test_20221019 as dest
INNER JOIN(
SELECT src.equipment_num, GROUP_CONCAT(src.item_id) as item_id,
GROUP_CONCAT(CONCAT(src.quantity,'')) as quantity,
GROUP_CONCAT(src.po_num) as po_num
FROM item_shipped_ns as src
INNER JOIN milestone_test_20221019 as t1 ON(src.equipment_num=t1.equipment_num)
WHERE src.importer_id='4081836'
GROUP BY src.equipment_num
) AS tmp ON(tmp.equipment_num=dest.equipment_num)
SET
dest.item_num=tmp.item_id,
dest.piece_count=tmp.quantity,
dest.pieces_detail=tmp.po_num;
Thank you to the people who commented and assisted me with their input.
Best regards,
TB.

In a SQL Server table how do I filter records based on JSON search on a column having JSON values

I am facing a challenge while filtering records in a SQL Server 2017 table which has a VARCHAR column holding JSON values:
Sample table rows with JSON column values:
Row # 1. {"Department":["QA"]}
Row # 2. {"Department":["DEV","QA"]}
Row # 3. {"Group":["Group 2","Group 12"],"Cluster":["Cluster 11"],"Vertical":["XYZ"],"Department":["QAT"]}
Row # 4. {"Group":["Group 20"],"Cluster":["Cluster 11"],"Vertical":["XYZ"],"Department":["QAT"]}
Now I need to filter records from this table based on an input parameter which can be in the following format:
Sample JSON input parameter to query:
1. `'{"Department":["QA"]}'` -> This should return Row # 1 as well as Row # 2.
2. `'{"Group":["Group 2"]}'` -> This should return only Row # 3.
So the search should work like this: if the column value contains "any available JSON tag with any matching value", return those matching records.
Note - this is essentially the PostgreSQL jsonb containment check shown below:
PostgreSQL filter clause:
TableName.JSONColumnName @> '{"Department":["QA"]}'::jsonb
While researching online, I found the OPENJSON capability available in SQL Server, which works as below.
OPENJSON sample example:
SELECT * FROM
tbl_Name UA
CROSS APPLY OPENJSON(UA.JSONColumnTags)
WITH ([Department] NVARCHAR(500) '$.Department', [Market] NVARCHAR(300) '$.Market', [Group] NVARCHAR(300) '$.Group'
) AS OT
WHERE
OT.Department in ('X','Y','Z')
and OT.Market in ('A','B','C')
But the problem with this approach is that if, in the future, there is a need to support a new tag in the JSON (like 'Area'), it will also need to be added to every stored procedure where this logic is implemented.
Is there any existing SQL Server 2017 capability I am missing or any dynamic way to implement the same?
The only option I could think of using OPENJSON would be to break down your search string into its key/value pairs, break down the table storing the JSON you want to search into its key/value pairs, and join the two.
There would be limitations to be aware of:
This solution would not work with nested arrays in your JSON.
The search would be OR, not AND. Meaning if I passed in multiple "Department" values to search for, like '{"Department":["QA", "DEV"]}', it would return rows containing either of the values, not only those containing both (see the sketch after the results below for one way to get AND semantics).
Here's a working example:
DECLARE @TestData TABLE
(
[TestData] NVARCHAR(MAX)
);
--Load Test Data
INSERT INTO @TestData (
[TestData]
)
VALUES ( '{"Department":["QA"]}' )
, ( '{"Department":["DEV","QA"]}' )
, ( '{"Group":["Group 2","Group 12"],"Cluster":["Cluster 11"],"Vertical": ["XYZ"],"Department":["QAT"]}' )
, ( '{"Group":["Group 20"],"Cluster":["Cluster 11"],"Vertical":["XYZ"],"Department":["QAT"]}' );
--Here is the value we are searching for
DECLARE @SearchJson NVARCHAR(MAX) = '{"Department":["QA"]}';
DECLARE @SearchJsonPairs TABLE
(
[Key] NVARCHAR(MAX)
, [Value] NVARCHAR(MAX)
);
--Load the search value into a table variable as its key\value pairs.
INSERT INTO @SearchJsonPairs (
[Key]
, [Value]
)
SELECT [a].[Key]
, [b].[Value]
FROM OPENJSON(@SearchJson) [a]
CROSS APPLY OPENJSON([a].[Value]) [b];
--Break down TestData into its key\value pairs and then join back to the search table.
SELECT [TestData].[TestData]
FROM (
SELECT [a].[TestData]
, [b].[Key]
, [c].[Value]
FROM @TestData [a]
CROSS APPLY OPENJSON([a].[TestData]) [b]
CROSS APPLY OPENJSON([b].[Value]) [c]
) AS [TestData]
INNER JOIN @SearchJsonPairs [srch]
ON [srch].[Key] COLLATE DATABASE_DEFAULT = [TestData].[Key]
AND [srch].[Value] = [TestData].[Value];
Which gives you the following results:
TestData
-----------------------------
{"Department":["QA"]}
{"Department":["DEV","QA"]}

SQL JSON array sort

record in DB:
id info
0 [{"name":"a", "time":"2017-9-25 17:20:21"},{"name":"b", "time":"2017-9-25 23:23:41"},{"name":"c", "time":"2017-9-25 12:56:78"}]
My goal is to sort the JSON array column info based on time, like:
id info
0 [{"name":"c", "time":"2017-9-25 12:56:78"},{"name":"a", "time":"2017-9-25 17:20:21"},{"name":"b", "time":"2017-9-25 23:23:41"}]
I am using Spark SQL and have no clue how to approach this.
You can do this by converting the JSON array into a SQL result set, extracting the sorting column, and finally converting it back to a JSON array:
DECLARE @json NVARCHAR(MAX);
SET @json = '[
{"name":"a", "time":"2017-09-25 17:20:21"},
{"name":"b", "time":"2017-09-25 23:23:41"},
{"name":"c", "time":"2017-09-25 12:56:59"}
]';
WITH T AS (
SELECT [Value] AS array_element
, TRY_CAST(JSON_VALUE(Value, 'strict $.time') AS DATETIME) AS sorting
FROM OPENJSON(@json, 'strict $')
)
SELECT STRING_AGG(T.array_element, ',') WITHIN GROUP (ORDER BY sorting)
FROM T
Notice:
I changed the sample data slightly, due to invalid months and seconds.
The STRING_AGG() function is only available from SQL 2017/Azure SQL Database. For older versions, use the classic "FOR XML PATH" method, which I will leave as an exercise to the reader.
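A minimal sketch of that FOR XML PATH variant (for SQL Server 2016, where OPENJSON and JSON_VALUE exist but STRING_AGG does not; it reuses the same @json and column names as above) might look like:
WITH T AS (
SELECT [Value] AS array_element
, TRY_CAST(JSON_VALUE([Value], 'strict $.time') AS DATETIME) AS sorting
FROM OPENJSON(@json, 'strict $')
)
--Concatenate the sorted elements and strip the leading comma.
SELECT STUFF(
(SELECT ',' + T.array_element
FROM T
ORDER BY T.sorting
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')
, 1, 1, '');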
If you want to apply the STRING_AGG approach to a full SQL table, use CROSS APPLY as follows:
DECLARE @json NVARCHAR(MAX);
SET @json = '[
{"name":"a", "time":"2017-09-25 17:20:21"},
{"name":"b", "time":"2017-09-25 23:23:41"},
{"name":"c", "time":"2017-09-25 12:56:59"}
]';
WITH dat AS (
SELECT * FROM (VALUES (1,@json), (2,@json)) AS T(id, info)
)
, T AS (
SELECT id, [Value] AS array_element
, TRY_CAST(JSON_VALUE(Value, 'strict $.time') AS DATETIME) AS sorting
FROM dat
CROSS APPLY OPENJSON(info, 'strict $')
)
SELECT id
, STRING_AGG(T.array_element, ',') WITHIN GROUP (ORDER BY sorting) AS info
FROM T
GROUP BY id
What I Suggest
Alternate Database Storage System
I would not recommend storing data in this manner. It simply makes your data less accessible and malleable as you are experiencing now. If you store your data like this:
id name time
0 a 2017-9-25 17:20:21
0 b 2017-9-25 23:23:41
0 c 2017-9-25 12:56:71
1 ... ...
Then you can select the data in time order using the ORDER BY method at the end of a select query. For example:
SELECT name, time FROM table_name where id=0 ORDER BY time asc;
If you have more columns in this table that you did not show, you may need another table to store the information efficiently, but the performance benefits of foreign keys and joins between these kinds of tables would outweigh keeping all the data in one table as inconvenient JSON arrays.

PostgreSQL json aggregation issue

How can I use a Postgres aggregation function to merge one object as an element of an array field in a parent object?
What I need: each Project merged as an element of a projects array on its parent Sector (the Sector, Project, and Result examples were images that are not included here).
My SQL query:
select row_to_json(t)
from (
select id, data,
(
select array_to_json(array_agg(row_to_json(p)))
from (
select id, data
from public."Project"
where (s.data ->> 'projectId') :: UUID = id
) p
) as projects
from public."Sector" s
) t;
It doesn't work, because projects is null. What I need is to unwind the data field and merge the projectId in data with the Project table, like $unwind and $lookup in MongoDB.
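A possible direction, offered only as a sketch under the assumption that Sector.data is a json array of objects each carrying a projectId key (use jsonb_array_elements instead if the column is jsonb), is to unnest the array and aggregate the matching Project rows back with json_agg:
select row_to_json(t)
from (
select s.id, s.data,
(
-- unwind the data array and collect matching projects; empty array instead of null
select coalesce(json_agg(row_to_json(p)), '[]'::json)
from json_array_elements(s.data) as elem
join public."Project" p
on p.id = (elem ->> 'projectId')::uuid
) as projects
from public."Sector" s
) t;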

Converting data from multiple Hive Tables to Complex JSON

I have data across two Hive tables that I need to join to generate a JSON object. I found a few libraries (Brickhouse, OpenX) that let a Hive table map to a complex JSON schema. However, I am not able to find a way to get the results from the two tables into such a Hive table.
E.g:
Table-A
Col1 Col2
"userLogins" 30
Table B
Col1 Col2 Col3
"userLogins" "Site A" 10
"userLogins" "Site B" 20
I want to generate a JSON Object such as :
{ name: "userLogins",
children: [{name: "Site A", logins:10}, {name: "Site B", logins:20}]
}
I have tried to find clues to a possible solution, but most links online are about converting JSON to a Hive table and not vice versa. Is there a better/easier way to achieve this?
This can be done using the to_json UDF from Brickhouse. Once you build the jar file, you can add the jar and create a temporary function as:
add jar /path/brickhouse-0.7.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION to_json AS 'brickhouse.udf.json.ToJsonUDF';
I tested the UDF with the sample data you had given.
describe table_a;
col_1 string None
col_2 int None
describe table_b;
col_1 string None
col_2 string None
col_3 int None
select * from table_a;
userLogins 30
select * from table_b;
userLogins Site A 10
userLogins Site B 20
select
to_json(named_struct( 'name', a.col_1, 'children' , array(named_struct('name', b.col_2, 'logins', b.col_3))))
from table_a a
join table_b b
on a.col_1 = b.col_1;
{"name":"userLogins","children":[{"name":"Site B","logins":20}]}
{"name":"userLogins","children":[{"name":"Site A","logins":10}]}
You can find more details about the usage of the UDF on the Brickhouse blog.
I think you are looking for the collect UDF from Brickhouse.
select named_struct(
'name', b.col_1,
'children', collect(named_struct('name', b.col_2, 'logins', b.col_3)))
from table_a a join table_b b
on a.col_1 = b.col_1
group by b.col_1;
The above outputs the below json
{"name":"userLogins","children":[{"name":"Site A","logins":10},{"name":"Site B","logins":20}]}