Snowflake "Pivot" for dynamic columns with dbt macro - jinja2

Starting context:
There is a dbt_utils "pivot" function. This question is not concerned with that function.
There is some discussion about the limitations of Snowflake's built-in PIVOT, namely the inability to use dynamic columns and/or values for this function.
example_model.sql
with pivot as (
select *
from {{ ref('my_base_model') }}
pivot( sum(VALUE) for KEY in (
{{ dbt_utils.get_column_values(table=ref('my_base_model'), column='KEY') }}
) )
)
select * from pivot
dbt resolves that handily except for one hiccup. When the above is compiled (code block below), it renders the values as a Python list [...], whereas Snowflake expects a plain comma-separated list of values inside the parentheses, more like a tuple (...)
with pivot as (
select *
from DB.SCHEMA.my_base_model
pivot( sum(VALUE) for KEY in ( ['value_1', 'value_2', 'value_3', 'value_4', 'value_5', 'value_6', 'value_7'] ) )
)
select * from pivot
I was looking into using something like as_native to cast the resulting list to a tuple but have been unsuccessful so far.
Error within the dbt run:
001003 (42000): SQL compilation error:
syntax error line 5 at position 39 unexpected '['.
syntax error line 5 at position 961 unexpected ']'.
compiled SQL at target\run\dbtproject\models\staging\my_application\my_base_model.sql

Perhaps not the best answer, but a working answer is:
pivot_model.sql
{% set pivot_cols = dbt_utils.get_column_values(table=ref('my_base_model'), column='KEY') %}
with pivot as (
select *
from {{ ref('my_base_model') }}
pivot( sum(VALUE) for KEY in (
{% for pivot_col in pivot_cols %}
'{{pivot_col}}'{% if not loop.last %}, {% endif%}
{% endfor %}
))
)
select * from pivot
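One caveat with quoting each value by hand, as the loop above does: a value that itself contains a single quote would break the generated SQL. The following is only a minimal Python sketch of the same quoting logic, assuming the usual SQL convention of doubling embedded single quotes. The helper name sql_string_list is hypothetical and is not part of dbt or dbt_utils.

```python
def sql_string_list(values):
    """Render values as a comma-separated list of SQL string literals."""
    quoted = []
    for value in values:
        # Double any embedded single quote so the literal stays well-formed.
        escaped = str(value).replace("'", "''")
        quoted.append(f"'{escaped}'")
    return ", ".join(quoted)


# Hypothetical pivot values, including one with an apostrophe.
pivot_cols = ["value_1", "value_2", "it's"]
in_list = sql_string_list(pivot_cols)
print(f"pivot( sum(VALUE) for KEY in ( {in_list} ) )")
```

The same escaping could be done in Jinja with the replace filter before wrapping each value in quotes.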

** UPDATE 2 **:
Since tuple is not available as a filter in Jinja, we can use join instead:
with pivot as (
select *
from {{ ref('my_base_model') }}
pivot( sum(VALUE) for KEY in (
'{{ dbt_utils.get_column_values(table=ref('my_base_model'), column='KEY') | join("', '") }}'
)
)
select * from pivot
** UPDATE 1 **:
If you'd like to do it within dbt, try converting the result of dbt_utils.get_column_values to a tuple:
with pivot as (
select *
from {{ ref('my_base_model') }}
pivot( sum(VALUE) for KEY in (
{{ dbt_utils.get_column_values(table=ref('my_base_model'), column='KEY') | tuple }}
) )
)
select * from pivot
You can also create a custom dbt macro in your local project that is a copy of dbt_utils.get_column_values, and use that instead:
Original: get_column_values
Custom:
...
{%- set values = value_list['data'] | map(attribute=0) | tuple %}
...
** ORIGIN **
Please read the PIVOT documentation carefully:
SELECT ...
FROM ...
PIVOT ( <aggregate_function> ( <pivot_column> )
FOR <value_column> IN ( <pivot_value_1> [ , <pivot_value_2> ... ] ) )
[ ... ]
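The [ ] in the syntax above only mark optional parts. The value list itself is not wrapped in brackets, and that is what is incorrect in your compiled query.
So the fix should be: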
with pivot as (
select *
from DB.SCHEMA.my_base_model
pivot( sum(VALUE) for KEY in ( 'value_1', 'value_2', 'value_3', 'value_4', 'value_5', 'value_6', 'value_7' ) )
)
select * from pivot

Related

How to remove brackets from Jinja list variable (DBT)

I have a list variable created like this: {% set options = ["a", "b", "c"] %}
I want to use it in a SQL CTE like this:
df as (
select *
from my_table
pivot (sum(value) for option in ({{options}}))
)
When the SQL is compiled by DBT, the result is:
df as (
select *
from my_table
pivot (sum(value) for option in (["a", "b", "c"]))
)
And that won't work because of the brackets. So how can I use the list variable without including the brackets?
@larsks is right that you want to create a string from your list, not change the way the list is displayed.
The easiest way to do this is to use the join filter in jinja:
{% set options = ["a", "b", "c"] %}
{{ options | join(", ")}}
-- compiles to:
a, b, c
That works fine for numbers, but to get string literals into your SQL query, you'll need to quote the values in your list. I prefer to do this by adding nested quotes in the list itself:
{% set options = ["'a'", "'b'", "'c'"] %}
{{ options | join(", ")}}
-- compiles to:
'a', 'b', 'c'
But you can also put the extra quotes inside the argument to join, and concatenate an extra quote to the beginning and end of your string:
{% set options = ["a", "b", "c"] %}
{{ "'" ~ options | join("', '") ~ "'"}}
-- compiles to:
'a', 'b', 'c'
Or you can wrap your Jinja expression in single quotes to achieve the same thing, but I think this is hard to read:
{% set options = ["a", "b", "c"] %}
'{{ options | join("', '") }}'
-- compiles to:
'a', 'b', 'c'
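For readers more familiar with Python than Jinja, the quote-and-join trick above corresponds roughly to the sketch below. The function name quote_join is hypothetical. It also shows an edge case worth keeping in mind: an empty list still produces one empty string literal rather than nothing.

```python
def quote_join(options):
    # Same idea as "'" ~ options | join("', '") ~ "'" in Jinja:
    # the separator supplies the inner quotes, the ends are added by hand.
    return "'" + "', '".join(options) + "'"


print(quote_join(["a", "b", "c"]))  # 'a', 'b', 'c'
print(quote_join([]))               # '' (a single empty literal)
```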
You're asking for the string representation of a list variable; that's always going to include the brackets. If you want something different you need to take a more active role in formatting the data.
The general recommendation for this sort of thing is to format the string in your code before rendering the template. E.g:
import jinja2
options = ["a", "b", "c"]
template = jinja2.Template('''
df as (
select *
from my_table
pivot (sum(value) for option in ({{options}}))
)
''')
print(template.render(options=', '.join(f'"{x}"' for x in options)))
If you can't do that, here's one option:
import jinja2
template = jinja2.Template('''
{%- set options = ["a", "b", "c"] -%}
{%- set comma = joiner(", ") -%}
df as (
select *
from my_table
pivot (sum(value) for option in ({% for option in options %}{{ comma() }}"{{ option }}"{% endfor %}))
)
''')
print(template.render())
Which outputs:
df as (
select *
from my_table
pivot (sum(value) for option in ("a", "b", "c"))
)
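The joiner helper used in the template above returns an empty string the first time it is called and the separator on every later call, which is why no leading comma appears. A minimal pure-Python imitation of that behaviour (a sketch for illustration, not Jinja's actual implementation) might look like this:

```python
class Joiner:
    """Imitates Jinja's joiner(): empty on the first call, separator afterwards."""

    def __init__(self, sep=", "):
        self.sep = sep
        self.used = False

    def __call__(self):
        if not self.used:
            # First call: nothing to separate yet.
            self.used = True
            return ""
        return self.sep


comma = Joiner(", ")
rendered = "".join(f'{comma()}"{option}"' for option in ["a", "b", "c"])
print(rendered)  # "a", "b", "c"
```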
As a workaround I ended up changing my Jinja variable to a tuple like this:
{% set options = ("a", "b", "c") %}
I will select one of the other answers as the best answer.
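The tuple workaround relies on how the value is converted to text. Assuming Jinja renders an expression by taking the plain string form of the Python object (its default, non-native behaviour), the difference between a list and a tuple can be seen directly in Python. Note the single-element case, which keeps a trailing comma that SQL would most likely reject:

```python
options_list = ["a", "b", "c"]
options_tuple = ("a", "b", "c")
single = ("a",)

print(str(options_list))   # ['a', 'b', 'c']
print(str(options_tuple))  # ('a', 'b', 'c')
print(str(single))         # ('a',)
```

Because of that edge case, building the string explicitly with join is usually the more predictable route.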

Escaped for JSON nested nodes using union command

In a stored procedure I have a for json node (boxes):
select
(
select
os.Name,
os.Address,
ss.carrierCode,
(
select
ob.envelopeCode,
ob.boxNumber,
ob.weight,
ob.width,
ob.length,
ob.height
from OrdersBoxes ob
...
where os.OID=ob.OID
...
for json path
) boxes,
....
for json path
) orderDetails
In this way I correctly get:
"boxes":[{
"envelopeCode":"E70345D2AB90A879D4F53506FB465086",
"boxNumber":1,
"weight":3000,
"width":300,
"length":300,
"height":100
}]
Now I need to get details from 2 tables, so I use UNION and wrap the two SELECT statements in an outer SELECT to avoid the following error:
The FOR XML and FOR JSON clauses are invalid in views, inline functions, derived tables, and subqueries when they contain a set operator. To work around, wrap the SELECT containing a set operator using derived table or common table expression or view and apply FOR XML or FOR JSON on top of it.
I also added JSON_QUERY to avoid getting an escaped nested node:
select
(
select
*
from
(
select
os.Name,
os.Address,
ss.carrierCode,
JSON_QUERY((
select
ob.envelopeCode,
ob.boxNumber,
ob.weight,
ob.width,
ob.length,
ob.height
from OrdersBoxes ob
...
where os.OID=ob.OID
...
for json path
)) boxes,
....
from table1
where....
union
select
os.Name,
os.Address,
ss.carrierCode,
JSON_QUERY((
select
ob.envelopeCode,
ob.boxNumber,
ob.weight,
ob.width,
ob.length,
ob.height
from OrdersBoxes ob
...
where os.OID=ob.OID
...
for json path
)) boxes,
....
from table2
where....
) jj
for json path
) orderDetails
That works, but the boxes node is returned escaped:
"boxes":"[{\"envelopeCode\":\"E70345D2AB90A879D4F53506FB465086\",\"boxNumber\":1,\"weight\":3000,\"width\":300,\"length\":300,\"height\":100}]"
I also tried this solution, but it works well only when returning data from one table.
Since it returns objects {}, getting an array requires changing the first line from
select STRING_AGG (order_details,',') ods from (
to
select concat('[',STRING_AGG (order_details,','),']') ods from (
and that does not seem very elegant to me, although it works.
Can someone suggest a better way to get all data correctly formatted (thus unescaped boxes node)?
The documentation about JSON_QUERY() explains:
... JSON_QUERY returns a valid JSON fragment. As a result, FOR JSON doesn't escape special characters in the JSON_QUERY return value. If you're returning results with FOR JSON, and you're including data that's already in JSON format (in a column or as the result of an expression), wrap the JSON data with JSON_QUERY without the path parameter.
So, if I understand the schema correctly, you need to use JSON_QUERY() differently:
Tables:
SELECT *
INTO table1
FROM (VALUES
(1, 'Name1', 'Address1')
) v (oid, name, address)
SELECT *
INTO table2
FROM (VALUES
(2, 'Name2', 'Address2')
) v (oid, name, address)
SELECT *
INTO OrdersBoxes
FROM (VALUES
(1, 'E70345D2AB90A879D4F53506FB465086', 1, 3000, 300, 300, 100),
(2, 'e70345D2AB90A879D4F53506FB465086', 2, 3000, 300, 300, 100)
) v (oid, envelopeCode, boxNumber, weight, width, length, height)
Statement:
select Name, Address, JSON_QUERY(boxes) AS Boxes
from (
select
os.Name,
os.Address,
(
select ob.envelopeCode, ob.boxNumber, ob.weight, ob.width, ob.length, ob.height
from OrdersBoxes ob
where os.OID = ob.OID
for json path
) boxes
from table1 os
union all
select
os.Name,
os.Address,
(
select ob.envelopeCode, ob.boxNumber, ob.weight, ob.width, ob.length, ob.height
from OrdersBoxes ob
where os.OID = ob.OID
for json path
) boxes
from table2 os
) j
for json path
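The escaped boxes node from the question is the familiar double-encoding effect: a value that is already JSON text gets treated as an ordinary string and is serialized a second time. The following Python sketch is only an analogy using the standard json module, not SQL Server's implementation, and the sample data is shortened for illustration.

```python
import json

boxes = [{"envelopeCode": "E70345D2AB90A879D4F53506FB465086", "boxNumber": 1}]

# Nested value handed over as text: the serializer escapes it,
# like the escaped "boxes" node in the question.
as_text = json.dumps({"boxes": json.dumps(boxes)})

# Nested value handed over as structured data: it is embedded as-is,
# which is roughly what JSON_QUERY signals to FOR JSON.
as_json = json.dumps({"boxes": boxes})

print(as_text)
print(as_json)
```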
As an additional option, you may try to use FOR JSON AUTO (the format of the JSON output is automatically determined based on the order of columns in the SELECT list and their source tables):
SELECT
cte.Name, cte.Address,
boxes.envelopeCode, boxes.boxNumber, boxes.weight, boxes.width, boxes.length, boxes.height
FROM (
SELECT oid, name, address FROM table1
UNION ALL
SELECT oid, name, address FROM table2
) cte
JOIN OrdersBoxes boxes ON cte.oid = boxes.oid
FOR JSON AUTO
Result:
[
{
"Name":"Name1",
"Address":"Address1",
"boxes":[{"envelopeCode":"E70345D2AB90A879D4F53506FB465086","boxNumber":1,"weight":3000,"width":300,"length":300,"height":100}]
},
{
"Name":"Name2",
"Address":"Address2",
"boxes":[{"envelopeCode":"e70345D2AB90A879D4F53506FB465086","boxNumber":2,"weight":3000,"width":300,"length":300,"height":100}]
}
]

Is there an elegant way to turn a BQ nested field into a key:value JSON?

I would love the option to turn the event_params nested BQ field into a JSON field?
My desired output should look like this:
{"sessionId":123456789,"version":"1.005"}
Consider below
select *, (
select '{' || string_agg(format('%s:%s',
json_extract(kv, '$.key'),
json_extract(kv, '$.string_value')
)) || '}'
from unnest(json_extract_array(to_json_string(event_params))) kv
) json
from `project.dataset.table`
If applied to the sample data in your question, the output is (output image not preserved).
Update: I realized you changed the sample data, so see the updated query below.
select *, (
select '{' || string_agg(format('%s:%s',
json_extract(kv, '$.key'),
json_extract(kv, '$.value.string_value')
)) || '}'
from unnest(json_extract_array(to_json_string(event_params))) kv
) json
from `project.dataset.table`
with output (image not preserved)
I made a version where you can define number fields in the JSON object with proper format, and you can filter for certain keys to end up in the JSON object:
with t as (
-- fake example data with same format
select * from unnest([
struct([
struct('session_id' as key, struct('123' as string_value) as value),
('timestamp', struct('1234567')),
('version', struct('2.23.65'))
] as event_params)
,struct([struct('session_id',struct('645')),('timestamp',struct('7653365')),('version',struct('3.675.34'))])
])
)
-- actual query
select
event_params, -- original data for comparison
format('{ %s }', -- for each row create one json object:
(select -- string_agg will return one string with all key-value pairs comma-separated
string_agg( -- within aggregation create key-value pairs
if(key in ('timestamp','session_id'), -- if number fields
format('"%s" : %s',key,value.string_value), -- then number format
format('"%s" : "%s"',key,value.string_value)) -- else string format
, ', ')
from unnest(event_params) -- unnest turns array into a little table per row, so we can run SQL on it
where key in ('session_id','version') -- filter for certain keys
) -- subquery end
) as json
from t
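To make the intent of the query above easier to follow, here is a rough Python sketch of the same transformation: filter the parameters by key, emit selected keys as numbers, and leave the rest as strings. The set NUMBER_KEYS and the function name params_to_json are assumptions for illustration. Unlike plain string formatting, json.dumps also takes care of escaping quotes inside values.

```python
import json

# Keys that should be emitted as numbers (an assumption for this sketch).
NUMBER_KEYS = {"session_id", "timestamp"}


def params_to_json(event_params, keep=None):
    """Turn a list of key/value structs into one JSON object string."""
    obj = {}
    for param in event_params:
        key = param["key"]
        if keep is not None and key not in keep:
            continue  # filter for certain keys
        raw = param["value"]["string_value"]
        obj[key] = int(raw) if key in NUMBER_KEYS else raw
    return json.dumps(obj)


event_params = [
    {"key": "session_id", "value": {"string_value": "123"}},
    {"key": "timestamp", "value": {"string_value": "1234567"}},
    {"key": "version", "value": {"string_value": "2.23.65"}},
]
print(params_to_json(event_params, keep={"session_id", "version"}))
```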

Add a property to every object in a JSON array in SQL Server

I need to add a "Description" property with the value "" to all items in my JSON array.
I have tried:
JSON_MODIFY(ReasonCodes, '$[0].Description', '')
and get this result:
[
{"Name":"jhfghgh","Code":"89798","Note":"dfgbcbxcbx","Description":""},
{"Name":"test7889","Code":"9787","Note":""}
]
Basically, I want the property to be added to the 2nd object as well, and to any further objects in the array.
The function JSON_MODIFY() doesn't support wildcards in the path parameter, so if the input JSON has a variable structure, you may parse the ReasonCodes JSON array with OPENJSON() and the default schema, modify each item, and aggregate the rows to build the final output:
Table:
CREATE TABLE PD (ReasonCodes varchar(1000))
INSERT INTO PD (ReasonCodes)
VALUES ('[
{"Name":"test1","Code":"0001","Note":"dfgbcbxcbx","Description":null},
{"Name":"test2","Code":"0002","Note":"dfgbcbxcbx","Description":"ABCD"},
{"Name":"test3","Code":"0003","Note":""}
]')
Statement:
UPDATE PD
SET ReasonCodes = CONCAT(
'[',
(
SELECT STRING_AGG(JSON_MODIFY([value], '$.Description', ''), ',')
FROM OPENJSON(ReasonCodes)
),
']'
)
If you need to change the $.Description key, but only when the key exists, you need a different statement:
UPDATE PD
SET ReasonCodes = CONCAT(
'[',
(
SELECT STRING_AGG(
CASE
WHEN j2.DescriptionCount > 0 THEN JSON_MODIFY(j1.[value], '$.Description', '')
ELSE JSON_QUERY(j1.[value])
END,
','
)
FROM OPENJSON(ReasonCodes) j1
OUTER APPLY (
SELECT COUNT(*)
FROM OPENJSON(j1.[value])
WHERE [key] = 'Description'
) j2 (DescriptionCount)
),
']'
)
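The two statements above follow the same pattern: split the array into items, modify each item, and reassemble the array. A small Python sketch of that pattern, covering both the unconditional update and the only-when-present variant, is given below. The function name set_description and the shortened sample data are hypothetical.

```python
import json


def set_description(reason_codes, only_if_present=False):
    """Set "Description" to "" on every item of a JSON array string."""
    items = json.loads(reason_codes)
    for item in items:
        if only_if_present and "Description" not in item:
            continue  # leave items without the key untouched
        item["Description"] = ""
    return json.dumps(items)


reason_codes = '[{"Name":"test1","Description":null},{"Name":"test3","Note":""}]'
print(set_description(reason_codes))
print(set_description(reason_codes, only_if_present=True))
```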

Extract fields as key value from a json object in MariaDB

Hello, I want to extract the field values of a JSON object as key-value pairs, but I'm not able to do that.
I tried this:
SELECT JSON_EXTRACT(chapters, '$[*].Id', '$[*].Name') AS rec
FROM `Novels`
WHERE 1
but the result looks like this
["1","first Name","2","second name"]
Any idea how to convert it to something like this?
{"1":"first Name","2":"second name"}
Thanks in advance!
Judging by the result, the value of the chapters column is presumably
'[ {"Id":"1","Name":"first name"}, {"Id":"2","Name":"second name"} ]'
JSON_EXTRACT() can be applied to each element of the array to obtain the Id values as keys and the Name values as values.
JSON_UNQUOTE() can then be applied to strip the double quotes while generating one row per array element. Finally, JSON_OBJECTAGG aggregates the extracted pairs into a single object, provided that the MariaDB version is 10.5+:
WITH n AS
(
SELECT @i := @i + 1 AS rn,
JSON_UNQUOTE(JSON_EXTRACT(chapters, CONCAT('$[',@i-1,'].Id'))) AS js_id,
JSON_UNQUOTE(JSON_EXTRACT(chapters, CONCAT('$[',@i-1,'].Name'))) AS js_name
FROM information_schema.tables
CROSS JOIN ( SELECT @i := 0, chapters FROM `Novels` ) n
WHERE @i < JSON_LENGTH(JSON_EXTRACT(chapters, '$[*]'))
)
SELECT JSON_OBJECTAGG(js_id,js_name) AS Result
FROM n
A workaround for DB versions prior to 10.5 might be:
SELECT CONCAT('{',
GROUP_CONCAT(
REPLACE(
REPLACE( JSON_OBJECT(js_id,js_name) , '}', '')
, '{', '')
)
, '}') AS Result
FROM n
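The target shape can also be stated compactly outside SQL. The Python sketch below shows the same mapping in two ways: from the presumed array of objects, and from the flat list that the original JSON_EXTRACT call returned. It only illustrates the transformation and assumes the sample values from the question.

```python
import json

# Presumed content of the chapters column.
chapters = '[{"Id":"1","Name":"first name"},{"Id":"2","Name":"second name"}]'

# One key-value pair per array element, like JSON_OBJECTAGG(js_id, js_name).
result = {chapter["Id"]: chapter["Name"] for chapter in json.loads(chapters)}
print(json.dumps(result))

# Starting from the flat list instead: pair even positions with odd positions.
flat = ["1", "first Name", "2", "second name"]
paired = dict(zip(flat[0::2], flat[1::2]))
print(json.dumps(paired))
```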
One option uses json_table() to unnest the array to rows (available in MySQL 8 only) then aggregation:
select
t.*,
(
select json_objectagg(x.id, x.name)
from json_table(
t.chapter,
'$[*]'
columns (
id int path '$.Id',
name varchar(50) path '$.Name'
)
) as x
) as obj
from mytable t