I'm attempting to write a single query that UPDATEs another table, but the subquery/derived query I would use requires a GROUP BY and GROUP_CONCAT().
I was able to get my desired output, but to do so I had to create a temporary table to store the grouped/concatenated data and then push that reorganized data to the destination table. That means literally running two separate queries: one that populates the temp table with the organized data in its fields, and a second UPDATE that pushes the organized data from the temp table to the final destination table.
I've created a REPREX that exemplifies what I'm trying to achieve below:
/*
Create a simplified sample table:
*/
CREATE TABLE `test_tbl` (
  `equipment_num` varchar(20),
  `item_id` varchar(40),
  `quantity` decimal(10,2),
  `po_num` varchar(20)
);
--
-- Dumping data for table `test_tbl`
--
INSERT INTO `test_tbl` (`equipment_num`, `item_id`, `quantity`, `po_num`) VALUES
('TRHU8399302', '70-8491', '5.00', 'PO10813-Air'),
('TRHU8399302', '40-21-72194', '22.00', '53841'),
('TRHU8399302', '741-PremBundle-CK', '130.00', 'NECTAR-PMBUNDLE-2022'),
('TRHU8399302', '741-GWPBundle-KG', '650.00', 'NECTAR2021MH185-Fort'),
('TRHU6669420', '01-DGCOOL250FJ', '76000.00', '4467'),
('TRHU6669420', '20-2649', '450.00', 'PO9994'),
('TRHU6669420', 'PFL-PC-GRY-KG', '80.00', '1020'),
('TRHU6669420', '844067025947', '120.00', 'Cmax 2 15 22'),
('TRHU5614145', 'Classic Lounge Chair Walnut leg- A XH301', '372.00', 'P295'),
('TRHU5614145', '40-21-72194', '22.00', '53837'),
('TRHU5614145', 'MAR-PLW-55K-BX', '2313.00', 'SF220914R-CA'),
('TRHU5614145', 'OPCP-BH1-L', '150.00', 'PO-00000429B'),
('TRHU5367889', 'NL1000WHT', '3240.00', 'PO1002050'),
('TRHU4692842', '1300828', '500.00', '4500342008'),
('TRHU4560701', 'TSFP-HB2-T', '630.00', 'PO-00000485A'),
('TRHU4319443', 'BGS21ASFD', '20.00', 'PO10456-1'),
('TRHU4317564', 'CSMN-AM1-X', '1000.00', 'PO-00000446'),
('TRHU4249449', '4312970', '3240.00', '4550735164'),
('TRHU4238260', '741-GWPBundle-TW', '170.00', 'NECTAR2022MH241'),
('TRHU3335270', '1301291', '60000.00', '4500330599'),
('TRHU3070607', '36082233', '150.00', '11199460'),
('TLLU8519560', 'BGM03AWFX', '360.00', 'PO10181A'),
('TLLU8519560', '10-1067', '9120.00', 'PO10396'),
('TLLU8519560', 'LUNA-KP-SS', '8704.00', '4782'),
('TLLU5819760', 'GS-1319', '10000.00', '62719'),
('TLLU5819760', '2020124775', '340.00', '3483'),
('TLLU5389611', '1049243', '63200.00', '4500343723'),
('TLLU4920852', '40-21-72194', '22.00', '53839'),
('TRHU3335270', '4312904', '1050.00', '4550694829'),
('TLLU4540955', '062-06-4580', '86.00', '1002529'),
('TRHU3335270', 'BGM03AWFK', '1000.00', 'PO9912'),
('TLLU4196942', 'Classic Dining Chair,Walnut Legs, SF XH1', '3290.00', 'P279'),
('TLLU4196942', 'BGM61AWFF', '852.00', 'PO10365');
--
-- The data above is a subsample of what I have in the db. What I'm trying to do is update another table based on this info, but with some GROUP_CONCAT()s.
-- With the data above, I need GROUP_CONCAT(item_id), GROUP_CONCAT(quantity), and GROUP_CONCAT(po_num), grouping by the equipment_num field.
--
-- What I'm attempting is an UPDATE to another table, grouped by equipment_num, with the GROUP_CONCATs for the fields described above.
--
-- The only way I was able to do what I desired was with an intermediary TEMPORARY table.
--
-- Create the temp table:
-- Since what I need is a "list" of the quantities, I had to do GROUP_CONCAT(CONCAT(quantity, ''))
DROP TABLE IF EXISTS __tmp__;
CREATE TABLE __tmp__
SELECT equipment_num, GROUP_CONCAT( item_id ), GROUP_CONCAT(CONCAT( quantity , '' ) ), GROUP_CONCAT( po_num )
FROM `test_tbl`
GROUP BY equipment_num;
-- Then FINALLY pull the information in the format I desire into the destination table:
UPDATE `dest_tbl` AS ms
INNER JOIN `__tmp__` AS isn ON ( ms.equipment_num = isn.equipment_num )
SET ms.item_id = isn.item_id,
    ms.piece_count = isn.quantity,
    ms.pieces_detail = isn.po_num;
I'm trying to create a single query that generates a derived query doing the GROUP_CONCAT part and then pushes that derived query's result to the final destination table.
Any suggestions would be greatly appreciated.
Thank you for your time.
TB.
EDIT: Thank you for the replies I've received, but I'm trying to AVOID creating the temp table; I'm wondering how to do it in one go.
I was thinking something along the lines of:
UPDATE dest
INNER JOIN(
SELECT src.equipment_num, GROUP_CONCAT(src.item_id) as item_id,
GROUP_CONCAT(CONCAT(src.quantity)) as quantity,
GROUP_CONCAT(src.po_num) as po_num
FROM `item_shipped_ns` as src
INNER JOIN milestone_test_20221019 as dest ON(src.equipment_num=dest.equipment_num)
WHERE src.importer_id='123456'
GROUP BY src.equipment_num
) as tmp ON(src.equipment_num=tmp.equipment_num)
SET
dest.item_num=tmp.item_id,
dest.piece_count=tmp.quantity,
dest.pieces_detail=tmp.po_num;
Unfortunately, the above doesn't work, I get the following error msg.
#1146 - Table 'fgcloud.dest' doesn't exist
Edit 2: I had a missing bracket in the above, which caused a different error. I've fixed it, but I'm now having issues with the table aliases. The table that should be updated is "milestone_test_20221019" - it is declared as "dest", yet MySQL says it cannot find it. Suggestions? The source table I need to read and aggregate before updating "milestone_test_20221019" is "item_shipped_ns", and I believe "tmp" is the derived/sub-query table alias...
You need to give an alias to the GROUP_CONCAT() so you'll get a column named item_id. It won't use the argument to GROUP_CONCAT() as the name of the resulting column automatically.
CREATE TABLE __tmp__
SELECT equipment_num,
GROUP_CONCAT( item_id ) AS item_id,
GROUP_CONCAT( quantity ) AS quantity,
GROUP_CONCAT( po_num ) AS po_num
FROM `test_tbl`
GROUP BY equipment_num;
To do this in a single query without creating the __tmp__ table, just put the query used to create __tmp__ in a subquery in the UPDATE.
UPDATE milestone_test_20221019 AS dest
JOIN (
SELECT equipment_num,
GROUP_CONCAT( item_id ) AS item_id,
GROUP_CONCAT( quantity ) AS quantity,
GROUP_CONCAT( po_num ) AS po_num
FROM item_shipped_ns
GROUP BY equipment_num
) AS src ON dest.equipment_num = src.equipment_num
SET dest.item_id = src.item_id,
dest.quantity = src.quantity,
dest.po_num = src.po_num;
Thanks for the assistance; after a few more tests and tweaks I was able to achieve what I desired.
Below is an example of how to use an UPDATE with GROUP_CONCAT(), as well as an implicit cast to string for the quantity field.
UPDATE milestone_test_20221019 as dest
INNER JOIN(
SELECT src.equipment_num, GROUP_CONCAT(src.item_id) as item_id,
GROUP_CONCAT(CONCAT(src.quantity,'')) as quantity,
GROUP_CONCAT(src.po_num) as po_num
FROM item_shipped_ns as src
INNER JOIN milestone_test_20221019 as t1 ON(src.equipment_num=t1.equipment_num)
WHERE src.importer_id='4081836'
GROUP BY src.equipment_num
) AS tmp ON(tmp.equipment_num=dest.equipment_num)
SET
dest.item_num=tmp.item_id,
dest.piece_count=tmp.quantity,
dest.pieces_detail=tmp.po_num;
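As a side note, the CONCAT(src.quantity, '') call simply forces an implicit cast of the DECIMAL to a string before aggregation; an explicit CAST should behave the same way. A minimal sketch of the aggregating part with that substitution (untested, same source table as above):
SELECT src.equipment_num,
       -- explicit cast instead of the CONCAT(quantity, '') trick
       GROUP_CONCAT(CAST(src.quantity AS CHAR)) AS quantity
FROM item_shipped_ns AS src
GROUP BY src.equipment_num;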
Thank you to the people who commented and assisted me with their input.
Best regards,
TB.
I am facing a challenge while filtering records in a SQL Server 2017 table that has a VARCHAR column holding JSON values:
Sample table rows with JSON column values:
Row # 1. {"Department":["QA"]}
Row # 2. {"Department":["DEV","QA"]}
Row # 3. {"Group":["Group 2","Group 12"],"Cluster":["Cluster 11"],"Vertical":["XYZ"],"Department":["QAT"]}
Row # 4. {"Group":["Group 20"],"Cluster":["Cluster 11"],"Vertical":["XYZ"],"Department":["QAT"]}
Now I need to filter records from this table based on an input parameter which can be in the following format:
Sample JSON input parameter to query:
1. `'{"Department":["QA"]}'` -> This should return Row # 1 as well as Row # 2.
2. `'{"Group":["Group 2"]}'` -> This should return only Row # 3.
So the search should return a record if its column value contains any of the given JSON tags with any matching value.
Note - this is exactly analogous to the PostgreSQL jsonb containment shown below:
PostgreSQL filter clause:
TableName.JSONColumnName @> '{"Department":["QA"]}'::jsonb
Researching on the internet, I found the OPENJSON capability available in SQL Server, which works as below.
OPENJSON sample example:
SELECT *
FROM tbl_Name UA
CROSS APPLY OPENJSON(UA.JSONColumnTags)
WITH (
    [Department] NVARCHAR(500) '$.Department',
    [Market] NVARCHAR(300) '$.Market',
    [Group] NVARCHAR(300) '$.Group'
) AS OT
WHERE OT.Department IN ('X','Y','Z')
  AND OT.Market IN ('A','B','C')
But the problem with this approach is that if there is ever a need to support a new tag in the JSON (like 'Area'), it will also have to be added to every stored procedure where this logic is implemented.
Is there any existing SQL Server 2017 capability I am missing or any dynamic way to implement the same?
The only option I could think of when using OPENJSON would be to break your search string down into its key/value pairs, break the table storing the JSON you want to search down into its key/value pairs, and join the two.
There would be limitations to be aware of:
This solution would not work with nested arrays in your json
The search would be OR, not AND. Meaning if I passed in multiple "Department" values to search for, like '{"Department":["QA", "DEV"]}', it would return the rows with either of the values, not only those that contained both (see the AND-semantics sketch after the results below).
Here's a working example:
DECLARE @TestData TABLE
(
    [TestData] NVARCHAR(MAX)
);
--Load test data
INSERT INTO @TestData (
    [TestData]
)
VALUES ( '{"Department":["QA"]}' )
     , ( '{"Department":["DEV","QA"]}' )
     , ( '{"Group":["Group 2","Group 12"],"Cluster":["Cluster 11"],"Vertical": ["XYZ"],"Department":["QAT"]}' )
     , ( '{"Group":["Group 20"],"Cluster":["Cluster 11"],"Vertical":["XYZ"],"Department":["QAT"]}' );
--Here is the value we are searching for
DECLARE @SearchJson NVARCHAR(MAX) = '{"Department":["QA"]}';
DECLARE @SearchJsonPairs TABLE
(
    [Key] NVARCHAR(MAX)
    , [Value] NVARCHAR(MAX)
);
--Load the search value into a table variable as its key/value pairs.
INSERT INTO @SearchJsonPairs (
    [Key]
    , [Value]
)
SELECT [a].[Key]
     , [b].[Value]
FROM OPENJSON(@SearchJson) [a]
CROSS APPLY OPENJSON([a].[Value]) [b];
--Break down TestData into its key/value pairs and then join back to the search table.
SELECT [TestData].[TestData]
FROM (
    SELECT [a].[TestData]
         , [b].[Key]
         , [c].[Value]
    FROM @TestData [a]
    CROSS APPLY OPENJSON([a].[TestData]) [b]
    CROSS APPLY OPENJSON([b].[Value]) [c]
) AS [TestData]
INNER JOIN @SearchJsonPairs [srch]
    ON [srch].[Key] COLLATE DATABASE_DEFAULT = [TestData].[Key]
    AND [srch].[Value] = [TestData].[Value];
Which gives you the following results:
TestData
-----------------------------
{"Department":["QA"]}
{"Department":["DEV","QA"]}
Record in DB:
id info
0 [{"name":"a", "time":"2017-9-25 17:20:21"},{"name":"b", "time":"2017-9-25 23:23:41"},{"name":"c", "time":"2017-9-25 12:56:78"}]
My goal is to sort the JSON array column info based on time, like:
id info
0 [{"name":"c", "time":"2017-9-25 12:56:78"},{"name":"a", "time":"2017-9-25 17:20:21"},{"name":"b", "time":"2017-9-25 23:23:41"}]
I'm using Spark SQL and have no clue how to do this.
You can do this by converting the JSON array into a SQL result set, extracting the sorting column, and finally converting it back into a JSON array:
DECLARE @json NVARCHAR(MAX);
SET @json = '[
    {"name":"a", "time":"2017-09-25 17:20:21"},
    {"name":"b", "time":"2017-09-25 23:23:41"},
    {"name":"c", "time":"2017-09-25 12:56:59"}
]';
WITH T AS (
    SELECT [Value] AS array_element
         , TRY_CAST(JSON_VALUE([Value], 'strict $.time') AS DATETIME) AS sorting
    FROM OPENJSON(@json, 'strict $')
)
SELECT STRING_AGG(T.array_element, ',') WITHIN GROUP (ORDER BY sorting)
FROM T;
Notice:
I changed the sample data slightly, due to invalid months and seconds.
The STRING_AGG() function is only available from SQL 2017/Azure SQL Database onward. For older versions, use the classic "FOR XML PATH" method; a rough sketch follows.
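For reference, one way that FOR XML PATH variant might look (untested; assumes SQL Server 2016, which has OPENJSON but not STRING_AGG, and the same @json as above):
WITH T AS (
    SELECT [Value] AS array_element
         , TRY_CAST(JSON_VALUE([Value], 'strict $.time') AS DATETIME) AS sorting
    FROM OPENJSON(@json, 'strict $')
)
-- STUFF() strips the leading comma; the TYPE directive plus .value()
-- avoids XML-entity encoding of the quotes in the JSON elements.
SELECT STUFF((
    SELECT ',' + T.array_element
    FROM T
    ORDER BY T.sorting
    FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 1, '');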
If you want to apply it to a full sql table, use CROSS APPLY as follows:
DECLARE @json NVARCHAR(MAX);
SET @json = '[
    {"name":"a", "time":"2017-09-25 17:20:21"},
    {"name":"b", "time":"2017-09-25 23:23:41"},
    {"name":"c", "time":"2017-09-25 12:56:59"}
]';
WITH dat AS (
SELECT * FROM (VALUES (1,@json), (2,@json)) AS T(id, info)
)
, T AS (
SELECT id, [Value] AS array_element
, TRY_CAST(JSON_VALUE(Value, 'strict $.time') AS DATETIME) AS sorting
FROM dat
CROSS APPLY OPENJSON(info, 'strict $')
)
SELECT id
, STRING_AGG(T.array_element, ',') WITHIN GROUP (ORDER BY sorting) AS info
FROM T
GROUP BY id;
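To actually persist the sorted order, the aggregated string can be wrapped back in brackets and written to the column. A hypothetical sketch (my_table and its columns id/info are illustrative names, standing in for the real table rather than the @json demo variable):
-- Re-sort each row's JSON array in place and write it back.
UPDATE t
SET t.info = '[' + s.sorted + ']'   -- re-wrap the elements as a JSON array
FROM my_table t
CROSS APPLY (
    SELECT STRING_AGG(j.[value], ',') WITHIN GROUP
           (ORDER BY TRY_CAST(JSON_VALUE(j.[value], '$.time') AS DATETIME)) AS sorted
    FROM OPENJSON(t.info) j
) s;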
What I Suggest
Alternate Database Storage System
I would not recommend storing data in this manner. It simply makes your data less accessible and malleable, as you are experiencing now. If you store your data like this:
id name time
0 a 2017-9-25 17:20:21
0 b 2017-9-25 23:23:41
0 c 2017-9-25 12:56:71
1 ... ...
Then you can select the data in time order using an ORDER BY clause at the end of a SELECT query. For example:
SELECT name, time FROM table_name WHERE id = 0 ORDER BY time ASC;
If you have more columns in this table that you did not show, you may need another table to store the information efficiently, but the performance benefits of foreign keys and joins between these tables would outweigh keeping all the data in one table as inconvenient JSON arrays. A sketch of such a split follows.
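A minimal sketch of that normalized layout (hypothetical table and column names):
-- Parent table: one row per id, holding any columns unique to that id.
CREATE TABLE records (
    id INT PRIMARY KEY
);
-- Child table: one row per element of the old JSON array.
CREATE TABLE record_events (
    id   INT         NOT NULL,
    name VARCHAR(50) NOT NULL,
    time DATETIME    NOT NULL,
    FOREIGN KEY (id) REFERENCES records(id)
);
-- The sorted view then falls out of a join plus ORDER BY.
SELECT e.name, e.time
FROM records r
JOIN record_events e ON e.id = r.id
WHERE r.id = 0
ORDER BY e.time ASC;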
I have data across two Hive tables that I need to join to generate a JSON object. I found a few libraries (Brickhouse, OpenX) for mapping a Hive table to a complex JSON schema. However, I am not able to find a way to get the results from the two tables into such a Hive table.
E.g:
Table-A
Col1 Col2
"userLogins" 30
Table B
Col1 Col2 Col3
"userLogins" "Site A" 10
"userLogins" "Site B" 20
I want to generate a JSON Object such as :
{ name: "userLogins",
children: [{name: "Site A", logins:10}, {name: "Site B", logins:20}]
}
I have tried to find clues to a possible solution, but most links online are about converting JSON to a Hive table and not vice versa. Is there a better/easier way to achieve this?
This can be done using the to_json UDF from Brickhouse. Once you build the jar file, you can add the jar and create a temporary function as:
add jar /path/brickhouse-0.7.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION to_json AS 'brickhouse.udf.json.ToJsonUDF';
I tested the UDF with the sample data you had given.
describe table_a;
col_1 string None
col_2 int None
describe table_b;
col_1 string None
col_2 string None
col_3 int None
select * from table_a;
userLogins 30
select * from table_b;
userLogins Site A 10
userLogins Site B 20
select
to_json(named_struct( 'name', a.col_1, 'children' , array(named_struct('name', b.col_2, 'logins', b.col_3))))
from table_a a
join table_b b
on a.col_1 = b.col_1;
{"name":"userLogins","children":[{"name":"Site B","logins":20}]}
{"name":"userLogins","children":[{"name":"Site A","logins":10}]}
You can find more details about the usage of the UDF on the Brickhouse blog.
I think you are looking for the collect UDF from Brickhouse.
select named_struct(
'name', b.col_1,
'children', collect(named_struct('name', b.col_2, 'logins', b.col_3)))
from table_a a join table_b b
on a.col_1 = b.col_1
group by b.col_1;
The above outputs the following JSON:
{"name":"userLogins","children":[{"name":"Site A","logins":10},{"name":"Site B","logins":20}]}