Summing values from a JSON array in Snowflake

I have source data that contains the following type of JSON array:
[
  [
    "source 1",
    250
  ],
  [
    "other source",
    58
  ],
  [
    "more stuff",
    42
  ],
  ...
]
There can be 1..N of these string/number pairs. How can I sum all of the numeric values in this JSON?

You can use FLATTEN, which produces a single row for each element of the input array. Then you can access the number in each element directly.
Imagine you have this input table:
create or replace table input as
select parse_json($$
[
  [
    "source 1",
    250
  ],
  [
    "other source",
    58
  ],
  [
    "more stuff",
    42
  ]
]
$$) as json;
FLATTEN will do this:
select index, value from input, table(flatten(json));
-------+-------------------+
 INDEX | VALUE             |
-------+-------------------+
 0     | [                 |
       |   "source 1",     |
       |   250             |
       | ]                 |
 1     | [                 |
       |   "other source", |
       |   58              |
       | ]                 |
 2     | [                 |
       |   "more stuff",   |
       |   42              |
       | ]                 |
-------+-------------------+
So you can simply use VALUE[1] to access the number in each pair:
select sum(value[1]) from input, table(flatten(json));
---------------+
 SUM(VALUE[1]) |
---------------+
 350           |
---------------+
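If you also need per-source subtotals, the same FLATTEN pattern works; a minimal sketch (the ::string and ::number casts are optional here, they just make the result types explicit):
select
  value[0]::string as source,
  sum(value[1]::number) as total
from input, table(flatten(json))
group by 1;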

Related

QueryDSL with DB2: fetching a nested JSON object or JSON array aggregation response

I am trying to fetch nested JSON objects and JSON lists from the database using QueryDSL. I have used a native query with LISTAGG and JSON_OBJECT.
Native query:
SELECT b.id, b.bankName, b.account, b.branch,
       (SELECT CONCAT(CONCAT('[',
                  LISTAGG(JSON_OBJECT('accountId' VALUE c.accountId,
                                      'name' VALUE customer_name,
                                      'amount' VALUE c.amount), ',')),
               ']')
        FROM CUSTOMER_DETAILS c
        WHERE c.bankId = b.id) AS customers
FROM BANK_DETAILS b
BANK_DETAILS
+----+---------+---------+----------+
| id | BankName| account | branch |
+----+---------+---------+----------+
| 1 | bank1 | savings | branch1 |
| 2 | bank2 | current | branch2 |
+----+---------+---------+----------+
CUSTOMER_DETAILS
+----+-----------+---------------+----------+-----------+
| id | accountId | customer_name | amount | BankId |
+----+-----------+---------------+----------+-----------+
| 1 | 50123 | Abc1 | 150000 | 1 |
| 2 | 50124 | Abc2 | 25000 | 1 |
| 3 | 50125 | Abc3 | 50000 | 2 |
| 4 | 50126 | Abc4 | 250000 | 2 |
+----+-----------+---------------+----------+-----------+
Expected output for the above tables:
[{
  "id": "1",
  "bankName": "bank1",
  "account": "savings",
  "branch": "branch1",
  "customers": [
    {
      "accountId": "50123",
      "name": "Abc1",
      "amount": 150000
    },
    {
      "accountId": "50124",
      "name": "Abc2",
      "amount": 25000
    }
  ]
}, {
  "id": "2",
  "bankName": "bank2",
  "account": "current",
  "branch": "branch2",
  "customers": [
    {
      "accountId": "50125",
      "name": "Abc3",
      "amount": 50000
    },
    {
      "accountId": "50126",
      "name": "Abc4",
      "amount": 250000
    }
  ]
}]
I have tried translating this native query to QueryDSL with the multiple queries below, producing the same expected output with a forEach loop.
class Repository {
    private final SQLQueryFactory queryFactory;

    public Repository(SQLQueryFactory queryFactory) {
        this.queryFactory = queryFactory;
    }

    public void fetchBankDetails() {
        // One query for the banks, then one query per bank for its customers (N+1 queries)
        List<BankDetails> bankList = queryFactory.select(QBankDetails.bankDetails)
                .from(QBankDetails.bankDetails)
                .fetch();
        bankList.forEach(bankData -> {
            List<CustomerDetails> customerList = queryFactory.select(QCustomerDetails.customerDetails)
                    .from(QCustomerDetails.customerDetails)
                    .where(QCustomerDetails.customerDetails.bankId.eq(bankData.bankId))
                    .fetch();
            bankData.setCustomerList(customerList);
        });
        System.out.println(bankList);
    }
}
I need to improve this code and convert it into a single query in QueryDSL that returns the expected output. Is there any other way, or any suggestions?
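One possible simplification on the SQL side, assuming a Db2 level that has the standard SQL/JSON aggregates (11.1.4.4 or later): let JSON_ARRAYAGG build the customers array instead of bracketing LISTAGG output by hand. A sketch only, reusing the question's column names:
-- Sketch: JSON_ARRAYAGG assembles the array; JSON_OBJECT builds each customer.
SELECT b.id, b.bankName, b.account, b.branch,
       (SELECT JSON_ARRAYAGG(
                   JSON_OBJECT('accountId' VALUE c.accountId,
                               'name'      VALUE c.customer_name,
                               'amount'    VALUE c.amount))
        FROM CUSTOMER_DETAILS c
        WHERE c.bankId = b.id) AS customers
FROM BANK_DETAILS b
This still leaves mapping the single JSON result back through QueryDSL, but it removes the per-bank loop.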

How to transform nested JSON to csv using jq

I have tried to transform JSON in the following format to CSV using jq on the Linux command line, but with no success. Any help or guidance would be appreciated.
{
  "dir/file1.txt": [
    {
      "Setting": {
        "SettingA": "",
        "SettingB": null
      },
      "Rule": "Rulechecker.Rule15",
      "Description": "",
      "Line": 11,
      "Link": "www.sample.com",
      "Message": "Some message",
      "Severity": "error",
      "Span": [
        1,
        3
      ],
      "Match": "[id"
    },
    {
      "Setting": {
        "SettingA": "",
        "SettingB": null
      },
      "Check": "Rulechecker.Rule16",
      "Description": "",
      "Line": 27,
      "Link": "www.sample.com",
      "Message": "Fix the rule",
      "Severity": "error",
      "Span": [
        1,
        3
      ],
      "Match": "[id"
    }
  ],
  "dir/file2.txt": [
    {
      "Setting": {
        "SettingA": "",
        "SettingB": null
      },
      "Rule": "Rulechecker.Rule17",
      "Description": "",
      "Line": 51,
      "Link": "www.example.com",
      "Message": "Fix anoher 'rule'?",
      "Severity": "error",
      "Span": [
        1,
        18
      ],
      "Match": "[source,terminal]\n----\n"
    }
  ]
}
Ultimately, I want to present a matrix with dir/file1.txt and dir/file2.txt as rows, all the keys as column headings, and the corresponding values in the cells.
| Filename      | SettingA | SettingB | Rule               | More columns... |
| ------------- | -------- | -------- | ------------------ | --------------- |
| dir/file1.txt |          | null     | Rulechecker.Rule15 |                 |
| dir/file1.txt |          | null     | Rulechecker.Rule16 |                 |
| dir/file2.txt |          | null     | Rulechecker.Rule17 |                 |
Iterate over the top-level key-value pairs obtained by to_entries to get access to the key names, then iterate again over the content array in .value to get the array items. Also note that newlines, as present in the sample's last .Match value, cannot be used as-is in a line-oriented format such as CSV. Here, I chose to replace them with the literal string \n using gsub.
jq -r '
to_entries[] | . as {$key} | .value[] | [$key,
  (.Setting | .SettingA, .SettingB),
  .Rule // .Check, .Description, .Line, .Link,
  .Message, .Severity, .Span[], .Match
  | strings |= gsub("\n"; "\\n")
] | @csv
'
"dir/file1.txt","",,"Rulechecker.Rule15","",11,"www.sample.com","Some message","error",1,3,"[id"
"dir/file1.txt","",,"Rulechecker.Rule16","",27,"www.sample.com","Fix the rule","error",1,3,"[id"
"dir/file2.txt","",,"Rulechecker.Rule17","",51,"www.example.com","Fix anoher 'rule'?","error",1,18,"[source,terminal]\n----\n"
If you just want to dump all the values in the order they appear, you can simplify this by using .. | scalars to traverse the levels of the document:
jq -r '
to_entries[] | . as {$key} | .value[] | [$key,
  (.. | scalars) | strings |= gsub("\n"; "\\n")
] | @csv
'
"dir/file1.txt","",,"Rulechecker.Rule15","",11,"www.sample.com","Some message","error",1,3,"[id"
"dir/file1.txt","",,"Rulechecker.Rule16","",27,"www.sample.com","Fix the rule","error",1,3,"[id"
"dir/file2.txt","",,"Rulechecker.Rule17","",51,"www.example.com","Fix anoher 'rule'?","error",1,18,"[source,terminal]\n----\n"
As for the column headings, for the first case I'd add them manually, as you spell out each value path anyway. For the latter case it will be a little complicated, as not all columns have immediate names (what should the items of array Span be called?), and some seem to change (in the second record, column Rule is called Check). You could, however, stick to the names of the first record, taking the deepest field name either as-is or with the array indices added. Something along these lines would do:
jq -r '
to_entries[0].value[0] | ["Filename", (
  path(..|scalars) | .[.[[map(strings)|last]]|last:] | join(".")
)] | @csv
'
"Filename","SettingA","SettingB","Rule","Description","Line","Link","Message","Severity","Span.0","Span.1","Match"

SQL query to return an attribute as an array of objects

My DB (MySQL) looks as follows:
TASKS:
-----------------
| id | desc |
-----------------
| 1 | 'dishes' |
| 2 | 'dust' |
-----------------
IMAGES:
---------------------------
| id | task_id | url |
---------------------------
| 1 | 1 | 'http1' |
| 2 | 1 | 'http2' |
---------------------------
I would like to get a response in the following structure (nested array of objects with id, url):
"tasks": [
{
"id": 1,
"desc": "dishes",
"images": [
{
"id": 1,
"url": "http1"
},
{
"id": 2,
"url": "http2"
}
]
},
...
]
The closest I have gotten is with this code:
SELECT
    t.id,
    t.`desc`,
    JSON_ARRAYAGG(i.url) AS images
FROM tasks AS t
LEFT JOIN images AS i ON t.id = i.task_id
GROUP BY t.id
And got in return:
[
  {
    "id": 1,
    "desc": "dishes",
    "images": [
      "http1",
      "http2"
    ]
  },
  ...
]
The above response is problematic, as I also need the image ids.
I have also tried using JSON_OBJECTAGG (which is not ideal), but I got the SQL error below:
"JSON documents may not contain NULL member names."
Indeed, some tasks may have no matching images, and I want those tasks included in the response as well.
How should I refactor my code to get the desired response from the server?
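A sketch of one possible refactoring, assuming MySQL 5.7.22 or later (where JSON_ARRAYAGG and JSON_OBJECT are both available): aggregate whole {id, url} objects instead of bare URLs. Caveat: for a task with no images, the LEFT JOIN makes this produce [{"id": null, "url": null}] rather than an empty array, so unmatched tasks still need extra handling.
SELECT
    t.id,
    t.`desc`,
    -- keep each image's id by aggregating objects, not bare URLs
    JSON_ARRAYAGG(JSON_OBJECT('id', i.id, 'url', i.url)) AS images
FROM tasks AS t
LEFT JOIN images AS i ON t.id = i.task_id
GROUP BY t.id, t.`desc`;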

MySQL nested JSON column search and extracting sub-JSON

I have a MySQL table authors with columns id, name and published_books, where published_books is a JSON column. Sample data:
id | name | published_books
-----------------------------------------------------------------------
1 | Tina | {
| | "17e9bf8f": {
| | "name": "Book 1",
| | "tags": [
| | "self Help",
| | "Social"
| | ],
| | "language": "English",
| | "release_date": "2017-05-01"
| | },
| | "8e8b2470": {
| | "name": "Book 2",
| | "tags": [
| | "Inspirational"
| | ],
| | "language": "English",
| | "release_date": "2017-05-01"
| | }
| | }
-----------------------------------------------------------------------
2 | John | {
| | "8e8b2470": {
| | "name": "Book 4",
| | "tags": [
| | "Social"
| | ],
| | "language": "Tamil",
| | "release_date": "2017-05-01"
| | }
| | }
-----------------------------------------------------------------------
3 | Keith | {
| | "17e9bf8f": {
| | "name": "Book 5",
| | "tags": [
| | "Comedy"
| | ],
| | "language": "French",
| | "release_date": "2017-05-01"
| | },
| | "8e8b2470": {
| | "name": "Book 6",
| | "tags": [
| | "Social",
| | "Life"
| | ],
| | "language": "English",
| | "release_date": "2017-05-01"
| | }
| | }
-----------------------------------------------------------------------
As you can see, the published_books column has nested JSON data (one level deep). The JSON has dynamic UUIDs as keys, and each value is that book's details as a JSON object.
I want to search for books matching certain conditions and extract only those books' JSON data as the result.
The query that I've written:
select JSON_EXTRACT(published_books, '$.*') from authors
where JSON_CONTAINS(published_books->'$.*.language', '"English"')
and JSON_CONTAINS(published_books->'$.*.tags', '["Social"]');
This query performs the search but returns the entire published_books JSON. I want just the matching books' JSON.
The expected result,
result
--------
"17e9bf8f": {
"name": "Book 1",
"tags": [
"self Help",
"Social"
],
"language": "English",
"release_date": "2017-05-01"
}
-----------
"8e8b2470": {
"name": "Book 6",
"tags": [
"Social",
"Life"
],
"language": "English",
"release_date": "2017-05-01"
}
There is no JSON function yet that filters elements of a document or array with "WHERE"-like logic.
But this is a task that some people using JSON data may want to do, so the solution MySQL has provided is to use the JSON_TABLE() function to transform the JSON document into a format as if you had stored your data in a normal table. Then you can apply a standard SQL WHERE clause to the fields returned.
You can't use this function in MySQL 5.7, but if you upgrade to MySQL 8.0 you can do this.
select authors.id, authors.name, books.* from authors,
json_table(published_books, '$.*'
columns(
bookid for ordinality,
name text path '$.name',
tags json path '$.tags',
language text path '$.language',
release_date date path '$.release_date')
) as books
where books.language = 'English'
and json_search(tags, 'one', 'Social') is not null;
+----+-------+--------+--------+-------------------------+----------+--------------+
| id | name | bookid | name | tags | language | release_date |
+----+-------+--------+--------+-------------------------+----------+--------------+
| 1 | Tina | 1 | Book 1 | ["self Help", "Social"] | English | 2017-05-01 |
| 3 | Keith | 2 | Book 6 | ["Social", "Life"] | English | 2017-05-01 |
+----+-------+--------+--------+-------------------------+----------+--------------+
Note that nested JSON arrays are still difficult to work with, even with JSON_TABLE(). In this example, I exposed the tags as a JSON array, and then use JSON_SEARCH() to find the tag you wanted.
I agree with Rick James: you might as well store the data in normalized tables and columns. You think that using JSON will save you some work, but it won't. It might make it more convenient to store the data as a single JSON document instead of multiple rows across several tables, but you just have to unravel the JSON again before you can query it the way you want.
Furthermore, if you store data in JSON, you will have to solve this sort of JSON_TABLE() expression every time you want to query the data. That's going to make a lot more work for you on an ongoing basis than if you had stored the data normally.
Frankly, I have yet to see a question on Stack Overflow about using JSON with MySQL that wouldn't lead to the conclusion that storing data in relational tables is a better idea than using JSON, if the structure of the data doesn't need to vary.
You are approaching the task backwards.
Do the extraction as you insert the data. Insert into a small number of tables (Authors, Books, Tags, and maybe a couple more) and build relations between them. No JSON is needed in this database.
The result is an easy-to-query and fast database. However, it requires learning about RDBMS and SQL.
JSON is useful when the data is a collection of random stuff. Your JSON is very regular, hence the data fits very nicely into RDBMS technology. In that case, JSON is merely a standard way to serialize the data. But it should not be used for querying.
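To make that concrete, a minimal sketch of the normalized schema both answers point toward (table and column names are illustrative only, not prescribed by either answer):
CREATE TABLE authors (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE books (
    id           INT PRIMARY KEY,
    author_id    INT NOT NULL,
    uuid         CHAR(8) NOT NULL,      -- the former JSON key, e.g. '17e9bf8f'
    name         VARCHAR(200) NOT NULL,
    language     VARCHAR(50),
    release_date DATE,
    FOREIGN KEY (author_id) REFERENCES authors(id)
);

CREATE TABLE book_tags (
    book_id INT NOT NULL,
    tag     VARCHAR(50) NOT NULL,
    PRIMARY KEY (book_id, tag),
    FOREIGN KEY (book_id) REFERENCES books(id)
);

-- The original search then becomes a plain join, with no JSON functions needed:
SELECT a.id, a.name, b.name AS book
FROM authors a
JOIN books b     ON b.author_id = a.id
JOIN book_tags t ON t.book_id = b.id
WHERE b.language = 'English'
  AND t.tag = 'Social';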

Hive JSON data parsing

My JSON data is in table json_table, column json_col, and looks like this:
{
  "href": "example.com",
  "Hosts": {
    "cluster_name": "test",
    "host_name": "test.iabc.com"
  },
  "metrics": {
    "cpu": {
      "cpu_user": [
        [
          0.7,
          1499795941
        ],
        [
          0.3,
          1499795951
        ]
      ]
    }
  }
}
I want to get this into a table json_data in the format below:
+-------------+-------+------------+
| metric_type | value | timestamp |
+-------------+-------+------------+
| cpu_user | 0.7 | 1499795941 |
+-------------+-------+------------+
| cpu_user | 0.3 | 1499795951 |
+-------------+-------+------------+
I tried getting the values using get_json_object:
select get_json_object(json_col,'$.metrics.cpu.cpu_user[1]') from json_table
This gives me:
[0.3,1499795951]
How do I use the explode function from here to get the desired output?
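The query below grabs the whole cpu_user array as one string, strips the outer [[ and ]], splits on ],[ to get one string per value/timestamp pair, explodes those into rows, and finally splits each pair on the comma: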
select 'cpu_user' as metric_type
,val_ts[0] as val
,val_ts[1] as ts
from (select split(m.col,',') as val_ts
from json_table j
lateral view explode(split(regexp_replace(get_json_object(json_col,'$.metrics.cpu.cpu_user[*]'),'^\\[\\[|\\]\\]$',''),'\\],\\[')) m
) m
;
+-------------+-----+------------+
| metric_type | val | ts |
+-------------+-----+------------+
| cpu_user | 0.7 | 1499795941 |
| cpu_user | 0.3 | 1499795951 |
+-------------+-----+------------+
You can also implement the SerDe and InputFormat interfaces for JSON data, instead of using UDFs.
Here are some references:
http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/
https://github.com/xjtuzxh/inceptor-inputformat