How to remove brackets from Jinja list variable (DBT) - jinja2

I have a list variable created like this: {% set options = ["a", "b", "c"] %}
I want to use it in a SQL CTE like this:
df as (
select *
from my_table
pivot (sum(value) for option in ({{options}}))
)
When the SQL is compiled by DBT, the result is:
df as (
select *
from my_table
pivot (sum(value) for option in (["a", "b", "c"]))
)
And that won't work because of the brackets. So how can I use the list variable without including the brackets?

@larsks is right that you want to create a string from your list, not change the way the list is displayed.
The easiest way to do this is to use the join filter in jinja:
{% set options = ["a", "b", "c"] %}
{{ options | join(", ")}}
-- compiles to:
a, b, c
That works fine for numbers, but to get string literals into your sql query, you'll need to quote the values in your list. I prefer to do this by adding nested quotes in the list itself:
{% set options = ["'a'", "'b'", "'c'"] %}
{{ options | join(", ")}}
-- compiles to:
'a', 'b', 'c'
But you can also put the extra quotes inside the argument to join, and concatenate an extra quote to the beginning and end of your string:
{% set options = ["a", "b", "c"] %}
{{ "'" ~ options | join("', '") ~ "'"}}
-- compiles to:
'a', 'b', 'c'
Or you can wrap your jinja expression in single quotes to achieve the same thing, but I think this is hard to read:
{% set options = ["a", "b", "c"] %}
'{{ options | join("', '") }}'
-- compiles to:
'a', 'b', 'c'
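Putting it together in the original CTE, here's a sketch using the nested-quotes version (my_table and the column names are from your question):
{% set options = ["'a'", "'b'", "'c'"] %}
df as (
select *
from my_table
pivot (sum(value) for option in ({{ options | join(", ") }}))
)
-- compiles to:
df as (
select *
from my_table
pivot (sum(value) for option in ('a', 'b', 'c'))
)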

You're asking for the string representation of a list variable; that's always going to include the brackets. If you want something different you need to take a more active role in formatting the data.
The general recommendation for this sort of thing is to format the string in your code before rendering the template. E.g.:
import jinja2
options = ["a", "b", "c"]
template = jinja2.Template('''
df as (
select *
from my_table
pivot (sum(value) for option in ({{options}}))
)
''')
print(template.render(options=', '.join(f'"{x}"' for x in options)))
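With the sample list above, that render should print roughly:
df as (
select *
from my_table
pivot (sum(value) for option in ("a", "b", "c"))
)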
If you can't do that, here's one option:
import jinja2
template = jinja2.Template('''
{%- set options = ["a", "b", "c"] -%}
{%- set comma = joiner(", ") -%}
df as (
select *
from my_table
pivot (sum(value) for option in ({% for option in options %}{{ comma() }}"{{ option }}"{% endfor %}))
)
''')
print(template.render())
Which outputs:
df as (
select *
from my_table
pivot (sum(value) for option in ("a", "b", "c")
)

As a workaround I ended up changing my Jinja variable to a tuple like this:
{% set options = ("a", "b", "c") %}
I will select one of the other answers as the best answer.

Related

Snowflake "Pivot" for dynamic columns with dbt macro

Starting context:
There is a dbt_utils "pivot" function. This question is not concerned with that function.
There is some discussion about the limitations of Snowflake's built-in PIVOT, namely the inability to use dynamic columns and/or values for this function.
example_model.sql
with pivot as (
select *
from {{ ref('my_base_model') }}
pivot( sum(VALUE) for KEY in (
{{ dbt_utils.get_column_values(table=ref('my_base_model'), column='KEY') }}
) )
)
select * from pivot
dbt resolves that handily except for one hiccup. When the above is compiled (code block below), it generates the values as a Python list [...] when Snowflake is expecting something more like a tuple (...)
with pivot as (
select *
from DB.SCHEMA.my_base_model
pivot( sum(VALUE) for KEY in ( ['value_1', 'value_2', 'value_3', 'value_4', 'value_5', 'value_6', 'value_7'] ) )
)
select * from pivot
I was looking into using something like as_native to cast the resulting list to a tuple but have been unsuccessful so far.
Error within the dbt run:
001003 (42000): SQL compilation error:
syntax error line 5 at position 39 unexpected '['.
syntax error line 5 at position 961 unexpected ']'.
compiled SQL at target\run\dbtproject\models\staging\my_application
\my_base_model.sql
Perhaps not the best answer, but a working answer is:
pivot_model.sql
{% set pivot_cols = dbt_utils.get_column_values(table=ref('my_base_model'), column='KEY') %}
with pivot as (
select *
from {{ ref('my_base_model') }}
pivot( sum(VALUE) for KEY in (
{% for pivot_col in pivot_cols %}
'{{pivot_col}}'{% if not loop.last %}, {% endif%}
{% endfor %}
))
)
select * from pivot
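Whitespace from the loop tags aside, this compiles to roughly:
with pivot as (
select *
from DB.SCHEMA.my_base_model
pivot( sum(VALUE) for KEY in (
'value_1', 'value_2', 'value_3', 'value_4', 'value_5', 'value_6', 'value_7'
))
)
select * from pivot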
** UPDATE 2 **:
Since tuple is not supported as a filter in Jinja, we can use join instead:
with pivot as (
select *
from {{ ref('my_base_model') }}
pivot( sum(VALUE) for KEY in (
{{ dbt_utils.get_column_values(table=ref('my_base_model'), column='KEY') | join(",") }}
) )
)
select * from pivot
** UPDATE 1 **:
In case you'd like to do it with dbt, let's try this:
Convert dbt_utils.get_column_values's result to a tuple:
with pivot as (
select *
from {{ ref('my_base_model') }}
pivot( sum(VALUE) for KEY in (
{{ dbt_utils.get_column_values(table=ref('my_base_model'), column='KEY') | tuple }}
) )
)
select * from pivot
You can always create a custom dbt macro in your local project that is a copy of dbt_utils.get_column_values, and use it:
Original get_column_values
Custom:
...
{%- set values = value_list['data'] | map(attribute=0) | tuple %}
...
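Alternatively, rather than copying the whole macro, a thin wrapper that quotes and joins the values should work. A minimal sketch (the macro name get_column_values_quoted is hypothetical, not part of dbt_utils):
{% macro get_column_values_quoted(table, column) %}
{#- hypothetical wrapper: quote each value and join with commas; assumes string values -#}
{%- set values = dbt_utils.get_column_values(table=table, column=column) -%}
{{- "'" ~ values | join("', '") ~ "'" -}}
{% endmacro %}
Used in the model as:
pivot( sum(VALUE) for KEY in ( {{ get_column_values_quoted(ref('my_base_model'), 'KEY') }} ) )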
** ORIGIN **
Please look carefully at the pivot doc.
SELECT ...
FROM ...
PIVOT ( <aggregate_function> ( <pivot_column> )
FOR <value_column> IN ( <pivot_value_1> [ , <pivot_value_2> ... ] ) )
[ ... ]
It doesn't have [ ], which is what's incorrect in your query.
So the fix should be:
with pivot as (
select *
from DB.SCHEMA.my_base_model
pivot( sum(VALUE) for KEY in ( 'value_1', 'value_2', 'value_3', 'value_4', 'value_5', 'value_6', 'value_7' ) )
)
select * from pivot

Add a property to every object in a json array in SQL Server

I need to add "Description" property with "" value to all items in my json array.
I have tried :
JSON_MODIFY(ReasonCodes, '$[0].Description', '')
and getting result as:
[
{"Name":"jhfghgh","Code":"89798","Note":"dfgbcbxcbx","Description":""},
{"Name":"test7889","Code":"9787","Note":""}
]
Basically, I want the property to be added to the 2nd (and any subsequent) element of the array as well, not just the first.
The JSON_MODIFY() function doesn't support wildcards for the value of the path parameter, so if the input JSON has a variable structure, you may try to parse the ReasonCodes JSON array with OPENJSON() and the default schema, modify each item, and aggregate the rows to build the final output:
Table:
CREATE TABLE PD (ReasonCodes varchar(1000))
INSERT INTO PD (ReasonCodes)
VALUES ('[
{"Name":"test1","Code":"0001","Note":"dfgbcbxcbx","Description":null},
{"Name":"test2","Code":"0002","Note":"dfgbcbxcbx","Description":"ABCD"},
{"Name":"test3","Code":"0003","Note":""}
]')
Statement:
UPDATE PD
SET ReasonCodes = CONCAT(
'[',
(
SELECT STRING_AGG(JSON_MODIFY([value], '$.Description', ''), ',')
FROM OPENJSON(ReasonCodes)
),
']'
)
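With the sample row above, the updated ReasonCodes should come out roughly as (every item now carries the property, including test3, which gains it):
[
{"Name":"test1","Code":"0001","Note":"dfgbcbxcbx","Description":""},
{"Name":"test2","Code":"0002","Note":"dfgbcbxcbx","Description":""},
{"Name":"test3","Code":"0003","Note":"","Description":""}
]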
If you need to change the $.Description key, but only when the key exists, you need a different statement:
UPDATE PD
SET ReasonCodes = CONCAT(
'[',
(
SELECT STRING_AGG(
CASE
WHEN j2.DescriptionCount > 0 THEN JSON_MODIFY(j1.[value], '$.Description', '')
ELSE JSON_QUERY(j1.[value])
END,
','
)
FROM OPENJSON(ReasonCodes) j1
OUTER APPLY (
SELECT COUNT(*)
FROM OPENJSON(j1.[value])
WHERE [key] = 'Description'
) j2 (DescriptionCount)
),
']'
)
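The difference shows in test3: since it has no Description key, it passes through unchanged, so the result should be roughly:
[
{"Name":"test1","Code":"0001","Note":"dfgbcbxcbx","Description":""},
{"Name":"test2","Code":"0002","Note":"dfgbcbxcbx","Description":""},
{"Name":"test3","Code":"0003","Note":""}
]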

MySQL - How can I merge two json arrays of strings without duplicates?

If I have two json arrays of strings in MySQL, is there a native (or non-native) way to merge these two arrays into one with unique strings?
If I try json_merge I get the following result with duplicates:
set @array1 =JSON_EXTRACT('["apple","pear","banana"]', '$');
set @array2 =JSON_EXTRACT('["pear","banana","apple","kiwi"]', '$');
select json_merge(@array1,@array2);
> ["apple", "pear", "banana", "pear", "banana", "apple", "kiwi"]
And if I try json_merge_preserve, it gives me the same result:
set @array1 =JSON_EXTRACT('["apple","pear","banana"]', '$');
set @array2 =JSON_EXTRACT('["pear","banana","apple","kiwi"]', '$');
select json_merge_preserve(@array1,@array2);
> ["apple", "pear", "banana", "pear", "banana", "apple", "kiwi"]
Is there a function that will return the unique array?
["apple", "banana", "pear", "kiwi"]
Edit: json_merge_patch doesn't work because it only replaces the first array with the second:
set @array1 =JSON_EXTRACT('["apple","grape","banana"]', '$');
set @array2 =JSON_EXTRACT('["pear","banana","apple","kiwi"]', '$');
select json_merge_patch(@array1,@array2);
> ["pear", "banana", "apple", "kiwi"]
In this case I lose "grape". I believe the logic in patch treats the arrays like objects keyed by index: {0: 'val', 1: 'val2'} merged with {0: 'val3'} yields {0: 'val3', 1: 'val2'}.
If the question still lives, here's a simple solution using MySQL 8.0's JSON_TABLE.
set @a1 ='["apple","grape","banana","banana","pear"]';
set @a2 ='["pear","banana","apple","kiwi","banana","apple"]';
select fruit
from json_table(
json_merge_preserve(@a1, @a2),
'$[*]' columns (
fruit varchar(255) path '$'
)
) as fruits
group by fruit; # get distinct values
# gives
apple
grape
banana
pear
kiwi
To get a one-line response, we have to drop the group by and get a bit more creative.
Unfortunately, JSON_ARRAYAGG doesn't support the distinct directive, so we'll have to use GROUP_CONCAT:
select group_concat(distinct fruit)
from json_table(
json_merge_preserve(@a1, @a2),
'$[*]' columns (
fruit varchar(255) path '$'
)
) as fruits;
# without group by directive!
# gives: apple,banana,grape,kiwi,pear
To get a proper json array as a one-line response, we just play around with CONCATs:
select cast(
concat('["', group_concat(distinct fruit separator '", "'), '"]')
as json)
...
# gives: ["apple", "banana", "grape", "kiwi", "pear"]
EDIT:
I've found a proper JSON_ARRAYAGG solution using one more nested virtual table to group results in.
select json_arrayagg(fruit)
from (
select fruit
from json_table(
json_merge_preserve(@a1, @a2),
'$[*]' columns (
fruit varchar(255) path '$'
)
) as fruits
group by fruit -- group here!
) as unique_fruits;
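# with the sample arrays above, gives (element order may vary):
# ["apple", "banana", "grape", "kiwi", "pear"]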
Read my Best Practices for using MySQL as JSON storage :)
After too much thinking, and thanks to @lefred, I found a hack that can accomplish this.
This is way too hacky, but I will publish it until someone comes up with a better implementation or the MySQL folks add a proper function for this.
First, we replace the string strategically to create a json object instead of an array.
Then, we use json_merge_patch and finally json_keys to obtain an array :V
set @array1 ='["apple","grape","banana","banana","pear"]';
set @array2 ='["pear","banana","apple","kiwi","banana","apple"]';
set @aux1 = REPLACE(REPLACE(REPLACE(@array1, ',', ' : "1", '), ']', ' : "1" }'), '[', '{');
set @aux2 = REPLACE(REPLACE(REPLACE(@array2, ',', ' : "1", '), ']', ' : "1" }'), '[', '{');
select @aux1, @aux2;
select json_keys(json_merge_patch(json_extract(@aux1, '$'),json_extract(@aux2,'$')))
> ["kiwi", "pear", "apple", "grape", "banana"]
Use SELECT DISTINCT:
set @array1 =JSON_EXTRACT('["apple","grape","banana"]', '$');
set @array2 =JSON_EXTRACT('["pear","banana","apple","kiwi"]', '$');
select json_arrayagg(fruit) from (
select
distinct fruit
from json_table(
json_merge_preserve(@array1, @array2),
'$[*]' columns (fruit varchar(255) path '$')
) fruits
) f;
According to the documentation, json_merge_preserve preserves duplicates. Also, if you are using MySQL 8.0.3 or later, json_merge is deprecated and json_merge_preserve should be used instead. I think that you need to use JSON_MERGE_PATCH.
More details here https://database.guide/json_merge_preserve-merge-multiple-json-documents-in-mysql/

Why does MySQL JSON_EXTRACT not accept the return value of JSON_SEARCH?

For example:
SET @key = '["a","b"]';
SELECT JSON_SEARCH(@key, 'one', 'b');
...will return the path:
"$[1]"
Insert this as the path in JSON_EXTRACT like:
SET @value = '["1","2"]';
SELECT JSON_EXTRACT(@value, "$[1]");
...this will return the value:
"2"
But if I write the following:
SET @key = '["a","b"]';
SET @value = '["1","2"]';
SET @path = (SELECT JSON_SEARCH(@key, 'one', 'b'));
SELECT JSON_EXTRACT(@value, @path);
...this will throw an error:
SQL Error (3143): Invalid JSON path expression. The error is around character position 1 in '"$[1]"'.
Trimming the double quotes works, but I don't like this solution:
SELECT JSON_EXTRACT(@value, TRIM(BOTH '"' FROM @path));
Is there another way, or am I missing something?
JSON_SEARCH returns a JSON string (a quoted value) that needs to be unquoted when used as a path:
SELECT JSON_EXTRACT(@value, JSON_UNQUOTE(@path));
"2"

SSIS Substring Extract based on qualifier

I've looked through a few different posts trying to find a solution for this. I have a column that contains descriptions that follow this format:
String<Numeric>
However, the column isn't limited to one instance of the previously mentioned format; it could be something like:
UNI<01> JPG<84>
JPG<84> UNI<01>
JPG<84>
UNI<01>
And other variations without any controlled pattern.
What I need to do is extract the number between <> into a separate column in another table, based on the string before the <>. So UNI would qualify the following numeric to go to a certain table.column, while JPG would qualify it to another table, etc. I have seen functions to extract the numeric, but not ones that qualify it and only pull the numeric when it is prefaced with a given qualifier string.
Based on the scope limitation mentioned in the question's comments (only one type of token, e.g. Foo, Bar, or Blat, needs to be found at a time), you could use an expression in a Derived Column to find the token of interest and then extract the value between the arrows.
For example:
(FINDSTRING([InputColumn], @[User::SearchToken] + "<", 1) == 0) ?
NULL(DT_WSTR, 1) :
SUBSTRING([InputColumn],
FINDSTRING([InputColumn], @[User::SearchToken] + "<", 1)
+ LEN(@[User::SearchToken]) + 1,
FINDSTRING(
SUBSTRING([InputColumn],
FINDSTRING([InputColumn], @[User::SearchToken] + "<", 1)
+ LEN(@[User::SearchToken]) + 1,
LEN([InputColumn])
), ">", 1) - 1
)
First, the expression checks whether the token specified in @[User::SearchToken] is used in the current row. If it is, SUBSTRING is used to output the value between the arrows. If not, NULL is returned.
The assumption is made that no token's name will end with text matching the name of another token. Searching for token Bar will match Bar<123> and FooBar<123>. Accommodating Bar and FooBar as distinct tokens is possible but the requisite expression will be much more complex.
You could use an asynchronous Script Component that outputs a row with type and value columns for each type<value> token contained in the input string. Pass the output of this component through a Conditional Split to direct each type to the correct destination (e.g. table).
Pro: This approach gives you the option of using one data flow to process all tag types simultaneously vs. requiring one data flow per tag type.
Con: A Script Component is involved, which it sounds like you'd prefer to avoid.
Sample Script Component Code
// requires: using System.Text.RegularExpressions;
private readonly string pattern = @"(?<type>\w+)<(?<value>\d+)>";
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
foreach (Match match in Regex.Matches(Row.Data, pattern, RegexOptions.ExplicitCapture))
{
Output0Buffer.AddRow();
Output0Buffer.Type = match.Groups["type"].Value;
Output0Buffer.Value = match.Groups["value"].Value;
}
}
Note: the Script Component will need an output created with two columns (perhaps named Type and Value), and then have the output's SynchronousInputID property set to None.
I ended up writing a CTE for a view to handle the data manipulation and then handled the joins and other data pieces in the SSIS package.
;WITH RCTE (Status_Code, lft, rgt, idx)
AS ( SELECT a.Status_code
,LEFT(a.Description, CASE WHEN CHARINDEX(' ', a.Description)=0 THEN LEN(a.Description) ELSE CHARINDEX(' ', a.Description)-1 END)
,SUBSTRING(a.Description, CASE WHEN CHARINDEX(' ', a.Description)=0 THEN LEN(a.Description) ELSE CHARINDEX(' ', a.Description)-1 END + 1, DATALENGTH(a.Description))
,0
FROM [disp] a WHERE NOT( Description IS NULL OR Description ='')
UNION ALL
SELECT r.Status_Code
,CASE WHEN CHARINDEX(' ', r.rgt) = 0 THEN r.rgt ELSE LEFT(r.rgt, CHARINDEX(' ', r.rgt) - 1) END
,CASE WHEN CHARINDEX(' ', r.rgt) > 0 THEN SUBSTRING(r.rgt, CHARINDEX(' ', r.rgt) + 1, DATALENGTH(r.rgt)) ELSE '' END
,idx + 1
FROM RCTE r
WHERE DATALENGTH(r.rgt) > 0
)
SELECT Status_Code
-- ,lft,rgt -- Uncomment to see whats going on
,SUBSTRING(lft,0, CHARINDEX('<',lft)) AS [Description]
,CASE WHEN ISNUMERIC(SUBSTRING(lft, CHARINDEX('<',lft)+1, LEN(lft)-CHARINDEX('<',lft)-1)) >0
THEN CAST (SUBSTRING(lft, CHARINDEX('<',lft)+1, LEN(lft)-CHARINDEX('<',lft)-1) AS INT) ELSE NULL END as Value
FROM RCTE
where lft <> ''
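For example, a row whose Description is 'UNI<01> JPG<84>' should come out of the view as two rows (with Status_Code carried through), roughly:
Description | Value
UNI         | 1
JPG         | 84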