Hive JSON data parsing

My JSON data looks like this. It is stored in the table json_table, in the column json_col:
{
"href": "example.com",
"Hosts": {
"cluster_name": "test",
"host_name": "test.iabc.com"
},
"metrics": {
"cpu": {
"cpu_user": [
[
0.7,
1499795941
],
[
0.3,
1499795951
]
]
}
}
}
I want to get this into a table json_data in the format below:
+-------------+-------+------------+
| metric_type | value | timestamp |
+-------------+-------+------------+
| cpu_user | 0.7 | 1499795941 |
+-------------+-------+------------+
| cpu_user | 0.3 | 1499795951 |
+-------------+-------+------------+
I tried getting the values using get_json_object:
select get_json_object(json_col,'$.metrics.cpu.cpu_user[1]') from json_table;
This gives me:
[0.3,1499795951]
How do I use the explode function from here to get the desired output?

select 'cpu_user' as metric_type
,val_ts[0] as val
,val_ts[1] as ts
from (select split(m.col,',') as val_ts
from json_table j
lateral view explode(split(regexp_replace(get_json_object(json_col,'$.metrics.cpu.cpu_user[*]'),'^\\[\\[|\\]\\]$',''),'\\],\\[')) m
) m
;
+-------------+-----+------------+
| metric_type | val | ts |
+-------------+-----+------------+
| cpu_user | 0.7 | 1499795941 |
| cpu_user | 0.3 | 1499795951 |
+-------------+-----+------------+
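To make the string handling in the query above easier to follow, here is a minimal Python sketch of the same steps: serialize the cpu_user array, strip the outer brackets, split on "],[" and then split each piece on ",". The function name explode_cpu_user and the SAMPLE constant are made up for illustration; this only mirrors the logic and is not how Hive executes it.

```python
import json
import re

# Sample document from the question (hypothetical constant name).
SAMPLE = """
{"href": "example.com",
 "Hosts": {"cluster_name": "test", "host_name": "test.iabc.com"},
 "metrics": {"cpu": {"cpu_user": [[0.7, 1499795941], [0.3, 1499795951]]}}}
"""

def explode_cpu_user(json_text):
    """Return (metric_type, val, ts) rows, all as strings like Hive's split()."""
    doc = json.loads(json_text)
    # Rough stand-in for get_json_object(json_col, '$.metrics.cpu.cpu_user[*]'):
    # the nested array as a compact string "[[0.7,1499795941],[0.3,1499795951]]".
    pairs_text = json.dumps(doc["metrics"]["cpu"]["cpu_user"], separators=(",", ":"))
    # regexp_replace(..., '^\\[\\[|\\]\\]$', ''): drop the leading "[[" and trailing "]]".
    stripped = re.sub(r"^\[\[|\]\]$", "", pairs_text)
    rows = []
    # split(..., '\\],\\[') plus explode: one element per inner pair.
    for chunk in stripped.split("],["):
        # split(m.col, ','): first item is the value, second the timestamp.
        val, ts = chunk.split(",")
        rows.append(("cpu_user", val, ts))
    return rows

if __name__ == "__main__":
    for row in explode_cpu_user(SAMPLE):
        print(row)
```

As in the Hive query, both columns come back as strings, so cast them if you need numeric types.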

Instead of using UDFs, you can also implement a custom SerDe and InputFormat for the JSON data.
Here are some references:
http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/
https://github.com/xjtuzxh/inceptor-inputformat

Related

QueryDSL with DB2 fetching Nested Json object or Json array aggregation Response

I am trying to fetch nested JSON objects and JSON List from Database using QueryDSL. I have used a native query with LISTAGG and JSON_OBJECT.
Native query:
SELECT b.id, b.bankName, b.account, b.branch,
(SELECT CONCAT(CONCAT('[', LISTAGG(JSON_OBJECT('accountId' value c.accountId, 'name' value customer_name, 'amount' value c.amount), ',')), ']')
FROM CUSTOMER_DETAILS c
WHERE c.bankId = b.id) AS customers
FROM BANK_DETAILS b
BANK_DETAILS
+----+---------+---------+----------+
| id | BankName| account | branch |
+----+---------+---------+----------+
| 1 | bank1 | savings | branch1 |
| 2 | bank2 | current | branch2 |
+----+---------+---------+----------+
CUSTOMER_DETAILS
+----+-----------+---------------+----------+-----------+
| id | accountId | customer_name | amount | BankId |
+----+-----------+---------------+----------+-----------+
| 1 | 50123 | Abc1 | 150000 | 1 |
| 2 | 50124 | Abc2 | 25000 | 1 |
| 3 | 50125 | Abc3 | 50000 | 2 |
| 4 | 50126 | Abc4 | 250000 | 2 |
+----+-----------+---------------+----------+-----------+
Expected Output for the above tables
[{
"id": "1",
"bankName": "bank1",
"account": "savings",
"branch": "branch1",
"customers": [
{
"accountId": "50123",
"name": "Abc1",
"amount": 150000
},
{
"accountId": "50124",
"name": "Abc2",
"amount": 25000
}
]
},{
"id": "2",
"bankName": "bank2",
"account": "current",
"branch": "branch2",
"customers": [
{
"accountId": "50125",
"name": "Abc3",
"amount": 50000
},
{
"accountId": "50126",
"name": "Abc4",
"amount": 250000
}
]
}]
I have tried to reproduce this native query in QueryDSL. The code below builds the same expected output, but it uses multiple queries inside a forEach loop.
import java.util.List;
import com.querydsl.sql.SQLQueryFactory;
class Repository {
private SQLQueryFactory queryFactory;
public Repository (SQLQueryFactory queryFactory){
this.queryFactory = queryFactory;
}
public void fetchBankDetails(){
// One query for all banks; fetch() executes the query and returns the rows
List<BankDetails> bankList = queryFactory.select(QBankDetails.bankDetails)
.from(QBankDetails.bankDetails)
.fetch();
// One additional query per bank (the pattern this question wants to replace)
bankList.forEach(bankData ->{
List<CustomerDetails> customerList = queryFactory.select(QCustomerDetails.customerDetails)
.from(QCustomerDetails.customerDetails)
.where(QCustomerDetails.customerDetails.bankId.eq(bankData.bankId))
.fetch();
bankData.setCustomerList(customerList);
});
System.out.println(bankList);
}
}
I need to improve my code and convert it into a single QueryDSL query that returns the expected output.
Is there another way to do this, or any suggestions?
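For reference, the reshaping being asked for can be pictured independently of QueryDSL: run one joined query (BANK_DETAILS left-joined to CUSTOMER_DETAILS) and group the flat rows by bank id in memory. The sketch below is plain Python and makes no claim about the QueryDSL API; the function name nest_customers and the dict-shaped rows are assumptions for illustration only.

```python
def nest_customers(joined_rows):
    """Group flat bank/customer join rows into one nested object per bank."""
    banks = {}  # dicts keep insertion order, so banks stay in first-seen order
    for row in joined_rows:
        bank = banks.setdefault(row["id"], {
            "id": row["id"],
            "bankName": row["bankName"],
            "account": row["account"],
            "branch": row["branch"],
            "customers": [],
        })
        # With a LEFT JOIN, a bank without customers has NULL customer columns.
        if row["accountId"] is not None:
            bank["customers"].append({
                "accountId": row["accountId"],
                "name": row["customer_name"],
                "amount": row["amount"],
            })
    return list(banks.values())

# Flat rows as the joined query would return them (data from the tables above).
JOINED_ROWS = [
    {"id": 1, "bankName": "bank1", "account": "savings", "branch": "branch1",
     "accountId": 50123, "customer_name": "Abc1", "amount": 150000},
    {"id": 1, "bankName": "bank1", "account": "savings", "branch": "branch1",
     "accountId": 50124, "customer_name": "Abc2", "amount": 25000},
    {"id": 2, "bankName": "bank2", "account": "current", "branch": "branch2",
     "accountId": 50125, "customer_name": "Abc3", "amount": 50000},
    {"id": 2, "bankName": "bank2", "account": "current", "branch": "branch2",
     "accountId": 50126, "customer_name": "Abc4", "amount": 250000},
]

if __name__ == "__main__":
    print(nest_customers(JOINED_ROWS))
```

The point of the sketch is that a single round trip is enough as long as the result set is grouped afterwards, which avoids issuing one customer query per bank.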

Find matches in JSON array field in MySQL

Given a JSON object type column in table t, e.g.
| id | obj |
| -- | ---------------------------------- |
| 1 | { "params": { "id": [13, 23]} } |
| 2 | { "params": { "id": [13, 24]} } |
| 3 | { "params": { "id": [11, 23, 45]} }|
and a list of numeric values, e.g. [12, 23, 45].
We need to check every record if it contains values from the given list.
So, the desired result would be
| id | matches |
| -- | -------- |
| 1 | [23] |
| 3 | [23, 45] |
Could someone please help with such a query for MySQL 8?
Thank you!
You can use json_table:
select t2.id, t2.n_obj from (
select t1.id, (select json_arrayagg(ids.v)
from json_table(t1.obj, "$.params.id[*]" columns(v text path '$')) ids
where json_contains('[12, 23, 45]', ids.v, '$'))
n_obj from t t1) t2
where t2.n_obj is not null;
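If it helps to see what the query above computes, here is a small Python sketch of the same logic: expand each record's params.id array, keep the values that appear in the given list, and drop records with no matches. The names find_matches and RECORDS are illustrative only.

```python
import json

# (id, obj) pairs as in table t from the question.
RECORDS = [
    (1, '{ "params": { "id": [13, 23]} }'),
    (2, '{ "params": { "id": [13, 24]} }'),
    (3, '{ "params": { "id": [11, 23, 45]} }'),
]

def find_matches(records, wanted):
    """Return (id, matches) for every record sharing at least one value with wanted."""
    wanted_set = set(wanted)
    result = []
    for rec_id, obj_text in records:
        # json_table(obj, "$.params.id[*]" ...): one value per array element.
        ids = json.loads(obj_text)["params"]["id"]
        # json_contains('[12, 23, 45]', v): keep values present in the given list.
        matches = [v for v in ids if v in wanted_set]
        # json_arrayagg over zero rows is NULL, which the outer WHERE filters out.
        if matches:
            result.append((rec_id, matches))
    return result

if __name__ == "__main__":
    print(find_matches(RECORDS, [12, 23, 45]))
```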

Destructuring deeply nested json in query

I have a database table where one column contains a deeply nested JSON field. Approximately like this:
create temporary table testing (id integer, contents json);
insert into testing values (1, '{
"level1": {
"level2": [
{
"level3": {
"a": {
"value1": 1,
"value2": 2
},
"b": {
"value1": 3,
"value2": 4
}
}
},
{
"level3": {
"d": {
"value1": 5,
"value2": 6
},
"e": {
"value1": 7,
"value2": 8
}
}
}
]
}
}
');
I am trying to get out a result looking like this:
| id | l2_label | data_label | value1 |
+----+----------+------------+--------+
| 1 | level2 | a | 1 |
| 1 | level2 | b | 3 |
| 1 | level2 | d | 5 |
| 1 | level2 | e | 7 |
The top level is always called "level1", but there can be more than one key inside it, and "level2" is not a fixed string. Each of these keys contains an array of objects, which may have more keys than "level3", but I'm only interested in "level3". Inside that, the keys ("a", "b", "d", "e" in the example) could be any string. I'm looking for one row for every "value1" value.
I've gotten up to the following query:
select id, key as l2_label, json_array_elements(value) from testing, json_each(contents -> 'level1');
which returns
id | l2_label | json_array_elements
----+----------+--------------------------------------
1 | level2 | { +
| | "level3": { +
| | "a": { +
| | "value1": 1,+
| | "value2": 2 +
| | }, +
| | "b": { +
| | "value1": 3,+
| | "value2": 4 +
| | } +
| | } +
| | }
1 | level2 | { +
| | "level3": { +
but I am at a loss at how to unpack the level3 elements now.
My question is firstly how to get to the result I'm looking for, but also advice on how to build a query like this incrementally, since I'm not sure how to operate on that json_array_elements now.
One way we can approach this is by using a Lateral Join, where we compute a given value for an un-nested set of keys.
For example:
WITH getting_lvl3 AS (
SELECT id AS id,
key AS l2_label,
json_array_elements(value) -> 'level3' AS lvl3
FROM testing, json_each(contents -> 'level1')
)
SELECT id,
l2_label,
label,
lvl3 -> label -> 'value1' -- Getting value1 for each key
FROM getting_lvl3
-- Executing this code for every key in level 3
LEFT JOIN LATERAL json_object_keys(lvl3) label ON TRUE
;
The result should look something like:
|id |l2_label|label|value|
|---|--------|-----|-----|
|1 |level2 |a |1 |
|1 |level2 |b |3 |
|1 |level2 |d |5 |
|1 |level2 |e |7 |
If you want to see this incrementally, then try the commented queries under the CTEs to get a picture of what happens with each step:
with testing as (
select 1 as id, '{
"level1": {
"level2": [
{
"level3": {
"a": {
"value1": 1,
"value2": 2
},
"b": {
"value1": 3,
"value2": 4
}
}
},
{
"level3": {
"d": {
"value1": 5,
"value2": 6
},
"e": {
"value1": 7,
"value2": 8
}
}
}
]
}
}
'::json as contents
), keys_lvl_2 as (
select id,
json_object_keys(contents->'level1') as l2_label,
contents->'level1' as contents
from testing
), array_lvl_2 as (
select id, l2_label,
json_array_elements(contents->l2_label) as contents
from keys_lvl_2
), keys_lvl_3 as (
select id, l2_label,
json_object_keys(contents->'level3') as data_label,
contents->'level3' as contents
from array_lvl_2
)
-- select * from keys_lvl_2;
-- select * from array_lvl_2;
-- select * from keys_lvl_3;
select id, l2_label, data_label,
contents->data_label->>'value1' as value1
from keys_lvl_3;
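The same incremental unnesting can be pictured as three nested loops, one per CTE above. This is only a Python sketch of the idea with hypothetical names (unnest_value1, CONTENTS); note that it returns value1 as a number, whereas the ->> operator in the query returns text.

```python
import json

CONTENTS = """
{"level1": {"level2": [
  {"level3": {"a": {"value1": 1, "value2": 2}, "b": {"value1": 3, "value2": 4}}},
  {"level3": {"d": {"value1": 5, "value2": 6}, "e": {"value1": 7, "value2": 8}}}
]}}
"""

def unnest_value1(row_id, contents_text):
    """Return (id, l2_label, data_label, value1) rows for one table row."""
    contents = json.loads(contents_text)
    rows = []
    # Step 1 (keys_lvl_2): every key under "level1", whatever it is called.
    for l2_label, elements in contents["level1"].items():
        # Step 2 (array_lvl_2): every object in that key's array.
        for element in elements:
            # Only "level3" is of interest; other keys in the object are ignored.
            level3 = element.get("level3", {})
            # Step 3 (keys_lvl_3): every key under "level3", with its value1.
            for data_label, payload in level3.items():
                rows.append((row_id, l2_label, data_label, payload["value1"]))
    return rows

if __name__ == "__main__":
    for row in unnest_value1(1, CONTENTS):
        print(row)
```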

JSONB Array Output - Postgres 11.5

I have a JSONB string in this format
{
"RouteId": "90679754-89f5-48d7-99e1-5192bf0becf9",
"Started": "2019-11-20T21:24:33.7294486Z",
"RouteName": "ProcessRequestsAndPublishResponse",
"MachineName": "5CG8134NJW-LA",
"ChildProfiles": [
{
"ApiMethod": "ProcessApiRequest",
"ExecuteType": null,
"DurationMilliseconds": 2521.4
},
{
"ApiMethod": "PublishShipViaToQueue",
"ExecuteType": null,
"DurationMilliseconds": 0.6
}
],
"DataBaseTimings": null,
"DurationMilliseconds": 2522.6
}
How do I get the output in this format?
| RouteName | Metrics | Time | TotalDuration |
---------------------------------------------------------------------------------------------
| ProcessRequestsAndPublishResponse | ProcessApiRequest | 2521.4 | 2522.6 |
| ProcessRequestsAndPublishResponse | PublishShipViaToQueue | 0.6 | 2522.6 |
---------------------------------------------------------------------------------------------
Any help on this is appreciated.
How would you also extend this to cases where there are several different arrays? Sorry, I'm fairly new to the JSONB world.
{
"RouteId": "af2e9cba-11ae-43a9-813c-d24ea574ee62",
"RouteName": "GenerateRequestAndPublishToQueue",
"ChildProfiles": [
{
"ApiMethod": "PublishShipViaRequestToQueue",
"DurationMilliseconds": 0.1
}
],
"DataBaseTimings": [
{
"ExecuteType": "OpenAsync",
"DurationMilliseconds": 0.1
},
{
"ExecuteType": "Reader",
"DurationMilliseconds": 72.1
},
{
"ExecuteType": "Close",
"DurationMilliseconds": 15.9
}
],
"DurationMilliseconds": 88.6
}
The required output is something like this
| RouteName | Metrics | Time | TotalDuration |
--------------------------------------------------------------------------------------------------------
| GenerateRequestAndPublishToQueue | PublishShipViaRequestToQueue | 0.1 | 88.6 |
| GenerateRequestAndPublishToQueue | OpenAsync | 0.1 | 88.6 |
| GenerateRequestAndPublishToQueue | Reader | 72.1 | 88.6 |
| GenerateRequestAndPublishToQueue | Close | 15.9 | 88.6 |
---------------------------------------------------------------------------------------------------------
You can do a lateral join and use jsonb_to_recordset() to expand the inner json array as an inline table:
select
js ->> 'RouteName' RouteName,
xs."ApiMethod" Metrics,
xs."DurationMilliseconds" "Time",
js ->> 'DurationMilliseconds' TotalDuration
from t
cross join lateral jsonb_to_recordset( js -> 'ChildProfiles')
as xs("ApiMethod" text, "DurationMilliseconds" numeric)
Demo on DB Fiddle:
routename | metrics | Time | totalduration
:-------------------------------- | :-------------------- | -----: | :------------
ProcessRequestsAndPublishResponse | ProcessApiRequest | 2521.4 | 2522.6
ProcessRequestsAndPublishResponse | PublishShipViaToQueue | 0.6 | 2522.6
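The query above only expands ChildProfiles. As a rough illustration of the row shape asked for in the follow-up (both ChildProfiles and DataBaseTimings feeding the Metrics column), here is a Python sketch of the logic. The function name route_metrics and the SAMPLE constant are made up, and treating a null array as "no rows" is an assumption based on the first sample document.

```python
import json

SAMPLE = """
{"RouteId": "af2e9cba-11ae-43a9-813c-d24ea574ee62",
 "RouteName": "GenerateRequestAndPublishToQueue",
 "ChildProfiles": [
   {"ApiMethod": "PublishShipViaRequestToQueue", "DurationMilliseconds": 0.1}
 ],
 "DataBaseTimings": [
   {"ExecuteType": "OpenAsync", "DurationMilliseconds": 0.1},
   {"ExecuteType": "Reader", "DurationMilliseconds": 72.1},
   {"ExecuteType": "Close", "DurationMilliseconds": 15.9}
 ],
 "DurationMilliseconds": 88.6}
"""

# Which array to expand, and which field inside it names the metric.
ARRAYS = (("ChildProfiles", "ApiMethod"), ("DataBaseTimings", "ExecuteType"))

def route_metrics(js_text):
    """Return (RouteName, Metrics, Time, TotalDuration) rows for one document."""
    js = json.loads(js_text)
    rows = []
    for array_key, label_key in ARRAYS:
        # A missing or null array contributes no rows.
        for item in js.get(array_key) or []:
            rows.append((js["RouteName"], item[label_key],
                         item["DurationMilliseconds"], js["DurationMilliseconds"]))
    return rows

if __name__ == "__main__":
    for row in route_metrics(SAMPLE):
        print(row)
```

In SQL terms this corresponds to expanding each array separately and combining the two row sets, with the route-level fields repeated on every row.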

Postgres update JSON field

I've got several Postgres 9.4 tables that contain data like this:
| id | data |
|----|-------------------------------------------|
| 1 | {"user": "joe", "updated-time": 123} |
| 2 | {"message": "hi", "updated-time": 321} |
I need to transform the JSON column into something like this
| id | data |
|----|--------------------------------------------------------------|
| 1 | {"user": "joe", "updated-time": {123, "unit":"millis"}} |
| 2 | {"message": "hi", "updated-time": {321, "unit":"millis"}} |
Ideally it would be easy to apply the transformation to multiple tables. Tables that contain the JSON key data->'updated-time' should be updated, and ones that do not should be skipped. Thanks!
You can use the || operator to merge two jsonb objects together. Note that || for jsonb is available from PostgreSQL 9.5 onward, so it will not work on 9.4 as-is.
select '{"foo":"bar"}'::jsonb || '{"baz":"bar"}'::jsonb;
= {"baz": "bar", "foo": "bar"}
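To illustrate how the merge behaves, here is a small Python sketch: keys from the right-hand object are added, and for duplicate keys the right-hand value wins, which is what makes it usable for overwriting "updated-time". The function names are hypothetical, and the "value" key in the wrapped object is an assumption, since the target shape in the question does not name a key for the number.

```python
def merge(left, right):
    """Mimic jsonb || for two objects: right-hand keys take precedence."""
    return {**left, **right}

def wrap_updated_time(data, unit="millis"):
    """Wrap data["updated-time"] in an object; leave rows without the key untouched."""
    if "updated-time" not in data:
        return data  # tables or rows lacking the key are skipped
    # "value" is an assumed key name, not taken from the question.
    wrapped = {"value": data["updated-time"], "unit": unit}
    return merge(data, {"updated-time": wrapped})

if __name__ == "__main__":
    print(merge({"foo": "bar"}, {"baz": "bar"}))
    print(wrap_updated_time({"user": "joe", "updated-time": 123}))
    print(wrap_updated_time({"message": "no timestamp here"}))
```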