Hive - Create a nested Hive table from another non-nested Hive table - json

I have a hive table - Table A as follows:
id | partner | recent_use | count |
1 | ab | 20160101 | 5 |
1 | cd | 20160304 | 12 |
2 | ab | 20160205 | 1 |
2 | cd | 20150101 | 2 |
3 | ab | 20150401 | 4 |
From Table A, I want to end up with a table like this - Table B:
id | partner |
1 | [ ab : { recent_use:20160101, count:5 } , cd : { recent_use:20160304, count:12 } ]
2 | [ ab : { recent_use:20160205, count:1 } , cd : { recent_use:20150101, count:2 } ]
3 | [ ab : { recent_use:20150401, count:4 } ]
Basically, Table B is a nested version of Table A such that for a given id, all the data from each of its partner is grouped into one column.
I have two questions:
How can I create Table B from Table A?
How can I convert Table B into a JSON document such that I can load the document into any NOSQL DB?
Would really appreciate any help on this. Thanks!

Simple to achieve this is using UDAF - user defined aggregation function. You can write custom function to make things simple. Here is some thing you can using inbuilt functions. Give it a try.
select id, CONCAT("[", concat_ws(',', collect_set(CONCAT('"', partner,
'":{ "recent_use":', recent_use, ', "count":', count, "}"))), "]") as
collJ from tableA group by id
Above SQL will get ID and collJ in string you looking for after that can use get_json_object function to convert to JSON object.
Reference
https://www.qubole.com/resources/cheatsheet/hive-function-cheat-sheet/
https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy

Related

Looping through an array in SQL column

I have a SQL table that looks something like this:
| ID | Value |
| --- | ----------------------------------------------------- |
| 1 | {"name":"joe", "lastname":"doe", "age":"34"} |
| 2 | {"name":"jane", "lastname":"doe", "age":"29"} |
| 3 | {"name":"michael", "lastname":"dumplings", "age":"40"}|
How can I using SQL select function, select only the rows where "age" (in value column) is above 30?
Thank you.
The column Value as it is it contains valid JSON data.
You can use the function JSON_EXTRACT() to get the the age and convert it to a numeric value by adding 0:
SELECT *
FROM tablename
WHERE JSON_EXTRACT(Value, "$.age") + 0 > 30;
See the demo.

MySQL - Search JSON data column

I have a MySQL database column that contains JSON array encoded strings. I would like to search the JSON array where the "Elapsed" value is greater than a particular number and return the corresponding TaskID value of the object the value was found. I have been attempting to use combinations of the JSON_SEARCH, JSON_CONTAINS, and JSON_EXTRACT functions but I am not getting the desired results.
[
{
"TaskID": "TAS00000012344",
"Elapsed": "25"
},
{
"TaskID": "TAS00000012345",
"Elapsed": "30"
},
{
"TaskID": "TAS00000012346",
"Elapsed": "35"
},
{
"TaskID": "TAS00000012347",
"Elapsed": "40"
}
]
Referencing the JSON above, if I search for "Elapsed" > "30" then 2 records would return
'TAS00000012346'
'TAS00000012347'
I am using MySQL version 5.7.11 and new to querying json data. Any help would be appreciated. thanks
With MySQL pre-8.0, there is no easy way to turn a JSON array to a recordset (ie, function JSON_TABLE() is not yet available).
So, one way or another, we need to manually iterate through the array to extract the relevant pieces of data (using JSON_EXTRACT()). Here is a solution that uses an inline query to generate a list of numbers ; another classic approchach is to use a number tables.
Assuming a table called mytable with a column called js holding the JSON content:
SELECT
JSON_EXTRACT(js, CONCAT('$[', n.idx, '].TaskID')) TaskID,
JSON_EXTRACT(js, CONCAT('$[', n.idx, '].Elapsed')) Elapsed
FROM mytable t
CROSS JOIN (
SELECT 0 idx
UNION ALL SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
) n
WHERE JSON_EXTRACT(js, CONCAT('$[', n.idx, '].Elapsed')) * 1.0 > 30
NB: in the WHERE clause, the * 1.0 operation is there to force the conversion to a number.
Demo on DB Fiddle with your sample data:
| TaskID | Elapsed |
| -------------- | ------- |
| TAS00000012346 | 35 |
| TAS00000012347 | 40 |
Yes , you can definitely to it using JSON_EXTRACT() function in mysql.
lets take a table that contains JSON (table client_services here) :
+-----+-----------+--------------------------------------+
| id | client_id | service_values |
+-----+-----------+------------+-------------------------+
| 100 | 1000 | { "quota": 1,"data_transfer":160000} |
| 101 | 1000 | { "quota": 2,"data_transfer":800000} |
| 102 | 1000 | { "quota": 3,"data_transfer":70000} |
| 103 | 1001 | { "quota": 1,"data_transfer":97000} |
| 104 | 1001 | { "quota": 2,"data_transfer":1760} |
| 105 | 1002 | { "quota": 2,"data_transfer":1060} |
+-----+-----------+--------------------------------------+
And now lets say we want client_id for all who have quota>1 , then use this query :
SELECT
id,client_id,
JSON_EXTRACT(service_values, '$.quota') AS quota
FROM client_services
WHERE JSON_EXTRACT(service_values, '$.quota') > 1;
And hence it will result into :
+-----+-----------+-------+
| id | client_id | quota |
+-----+-----------+--------
| 101 | 1000 | 2 |
| 102 | 1000 | 3 |
| 104 | 1001 | 2 |
| 105 | 1002 | 2 |
+-----+-----------+-------+
hope this helps!

Getting Values From Json Data Inside Array in Mysql

We are saving information in a json Column which contain json data in an array.
Data structure:
[
{
"type":"automated_backfill",
"title":"Walgreens Sales Ad",
"keyword":"Walgreens Sales Ad",
"score":4
},
{
"type":"automated_backfill",
"title":"Nicoderm Coupons",
"keyword":"Nicoderm Coupons",
"score":4
},
{
"type":"automated_backfill",
"title":"Iphone Sales",
"keyword":"Iphone Sales",
"score":3
},
{
"type":"automated_backfill",
"title":"Best Top Load Washers",
"keyword":"Best Top Load Washers",
"score":1
},
{
"type":"automated_backfill",
"title":"Top 10 Best Cell Phones",
"keyword":"Top 10 Best Cell Phones",
"score":1
},
{
"type":"automated_backfill",
"title":"Tv Deals",
"keyword":"Tv Deals",
"score":0
}
]
What we are trying:
SELECT id, ad_meta->'$**.type' FROM window_requests
that returns:
We are looking to get each type as row, which i think only possible with stored procedure, return whole column and then run loop on each row and return data...
Or can you think of a better solution?
Either Update Architecture:
or should we change our database and save information in separate table instead to json column ?
And then we can get easily join to get data with adding a foreign key.
Thanks you.
I understand that you are trying to generate a table structure from the content of your JSON array.
You would need to proceed in two steps :
first, turn each element in the array into a record ; for this, you can generate an inline table of of numbers and use JSON_EXTRACT() to pull up the relevant JSON object.
then, extract the values of each attribute from each object, generating new columns ; the -> operator can be used for this.
Query :
SELECT
id,
rec->'$.type' type,
rec->'$.score' score,
rec->'$.title' title,
rec->'$.keyword' keyword
FROM (
SELECT t.id, JSON_EXTRACT(t.val, CONCAT('$[', x.idx, ']')) AS rec
FROM
mytable t
INNER JOIN (
SELECT 0 AS idx UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) AS x ON JSON_EXTRACT(t.val, CONCAT('$[', x.idx, ']')) IS NOT NULL
) z
This will handle up to 10 objects per JSON array (if you expect more than that, you can add expand the UNION ALL part of the query).
In this DB Fiddle with your test data, this yields :
| id | type | score | title | keyword |
| --- | -------------------- | ----- | ------------------------- | ------------------------- |
| 1 | "automated_backfill" | 4 | "Walgreens Sales Ad" | "Walgreens Sales Ad" |
| 1 | "automated_backfill" | 4 | "Nicoderm Coupons" | "Nicoderm Coupons" |
| 1 | "automated_backfill" | 3 | "Iphone Sales" | "Iphone Sales" |
| 1 | "automated_backfill" | 1 | "Best Top Load Washers" | "Best Top Load Washers" |
| 1 | "automated_backfill" | 1 | "Top 10 Best Cell Phones" | "Top 10 Best Cell Phones" |
| 1 | "automated_backfill" | 0 | "Tv Deals" | "Tv Deals" |
NB : the arrow operator is not available in MariaDB. You can use JSON_EXTRACT() instead, like :
SELECT
id,
JSON_EXTRACT(rec, '$.type') type,
JSON_EXTRACT(rec, '$.score') score,
JSON_EXTRACT(rec, '$.title') title,
JSON_EXTRACT(rec, '$.keyword') keyword
FROM
...

Parsing JSON Payload in Vertica

I am using Vertica DB (and DBeaver as SQL Editor) - I am new to both tools.
I have a view with multiple columns:
someint | xyz | c | json
5 | 1542 | none | {"range":23, "rm": 51, "spx": 30}
5 | 1442 | none | {"range":24, "rm": 50, "spx": 3 }
3 | 1462 | none | {"range":24, "rm": 50, "spx": 30}
(int) | (int) | (Varchar) | (Long Varchar)
I want to create another view (or for the beginning, just be able to query it properly) of the above, but with the "json" column separated into the individual fields/columns "range", "rm" and "spx".
I imagine the output of the query / the new view to be something like the following:
someint | xyz | c | range | rm | spx
5 | 1542 | none | 23 | 51 | 30
5 | 1442 | none | 24 | 50 | 3
....
So far I have not been able to even query the "range" for example.
Hence my questions:
How can I separate the json column key-value structure into individual columns (in a query output)?
How can I transfer the desired output into a new view in Vertica?
I haven't found much help in the documentation as the procedure there is to load json text files from a drive or operate on tables, which I cannot do as I only have access to a view.
I have found a solution, so for anyone else encountering this problem:
SELECT a, xyza, cont,
MAPLOOKUP(MapJSONExtractor(json), 'range') AS range,
MAPLOOKUP(MapJSONExtractor(json), 'rm') AS rm,
MAPLOOKUP(MapJSONExtractor(json), 'spx') AS spx
FROM test;

How to split CSVs from one column to rows in a new table in MSSQL 2008 R2

Imagine the following (very bad) table design in MSSQL2008R2:
Table "Posts":
| Id (PK, int) | DatasourceId (PK, int) | QuotedPostIds (nvarchar(255)) | [...]
| 1 | 1 | | [...]
| 2 | 1 | 1 | [...]
| 2 | 2 | 1 | [...]
[...]
| 102322 | 2 | 123;45345;4356;76757 | [...]
So, the column QuotedPostIds contains a semicolon-separated list of self-referencing PostIds (Kids, don't do that at home!). Since this design is ugly as a hell, I'd like to extract the values from the QuotedPostIds table to a new n:m relationship table like this:
Desired new table "QuotedPosts":
| QuotingPostId (int) | QuotedPostId (int) | DatasourceId (int) |
| 2 | 1 | 1 |
| 2 | 1 | 2 |
[...]
| 102322 | 123 | 2 |
| 102322 | 45345 | 2 |
| 102322 | 4356 | 2 |
| 102322 | 76757 | 2 |
The primary key for this table could either be a combination of QuotingPostId, QuotedPostId and DatasourceID or an additional artificial key generated by the database.
It is worth noticing that the current Posts table contains about 6,300,000 rows but only about 285,000 of those have a value set in the QuotedPostIds column. Therefore, it might be a good idea to pre-filter those rows. In any case, I'd like to perform the normalization using internal MSSQL functionality only, if possible.
I already read other posts regarding this topic which mostly dealt with split functions but neither could I find out how exactly to create the new table and also copying the appropriate value from the Datasource column, nor how to filter the rows to touch accordingly.
Thank you!
€dit: I thought it through and finally solved the problem using an external C# program instead of internal MSSQL functionality. Since it seems that it could have been done using Mikael Eriksson's suggestion, I will mark his post as an answer.
From comments you say you have a string split function that you you don't know how to use with a table.
The answer is to use cross apply something like this.
select P.Id,
S.Value
from Posts as P
cross apply dbo.Split(';', P.QuotedPostIds) as S