Snowflake lateral flatten data types - json

I have a table containing an id column and a JSON column (variant data type). I want to flatten the data, make the value column a variant, assign each value in the value column a data type if a condition is met, and eventually pivot the data so that each column has the correct data type.
Example code that doesn't work:
with cte as (
    select
        1 as id,
        parse_json('{
            "field1":"TRUE",
            "field2":"some string",
            "field3":"1.035",
            "field4":"097334"
        }') as my_output
)
select
    id,
    key,
    to_variant(
        case
            when value in ('true', 'false') then value::boolean
            when value like ('1.0') then value::decimal
            else value::string
        end) as value
from cte, lateral flatten(my_output)
Ultimately, I'd like to pivot the data and have a wide table with columns id, field1, field2, etc. where field1 is boolean, field2 is string, field3 is a decimal etc.
This is just a simple example, instead of 4 fields, I'm dealing with hundreds.
Is this possible?
For the pivot, I'm using dbt_utils.get_column_values to get the column names dynamically. I'd really prefer a solution that doesn't involve listing out the column names, especially since there are hundreds.
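For reference, the dbt pivot I'm generating looks roughly like this (flattened_output is just a placeholder name for a model with id, key and value columns; the real model has hundreds of keys):
-- flattened_output is a placeholder model name; assumes the JSON keys are valid column identifiers
{%- set keys = dbt_utils.get_column_values(table=ref('flattened_output'), column='key') -%}
select
    id,
    {%- for k in keys %}
    max(case when key = '{{ k }}' then value end) as {{ k }}{{ "," if not loop.last }}
    {%- endfor %}
from {{ ref('flattened_output') }}
group by id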

Since you'd have to define each column in your PIVOT statement anyway, it'd probably be much easier to simply select each attribute directly and cast it to the correct data type, rather than using a lateral flatten.
select
    id,
    my_output:field1::boolean as field1,
    my_output:field2::string as field2,
    my_output:field3::decimal(5,3) as field3,
    my_output:field4::string as field4
from cte;
Alternatively, if you want this to be created dynamically, you could write a stored procedure that uses your JSON to build a view over your table containing this select.
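As a rough sketch of that idea, using the cte from the question as the sample (the type mapping below is an assumption, and number(38,3) is just a guessed precision), you could let Snowflake guess each key's type from the data and build the casted select list with LISTAGG:
with keys as (
    select
        f.key,
        -- guess the type from the first occurrence of each key
        typeof(ifnull(try_parse_json(f.value), f.value)) as guessed_type
    from cte, lateral flatten(my_output) f
    qualify row_number() over (partition by f.key order by f.seq) = 1
)
select listagg(
           'my_output:"' || key || '"::' ||
           case guessed_type
               when 'BOOLEAN' then 'boolean'
               when 'INTEGER' then 'number'
               when 'DECIMAL' then 'number(38,3)'
               else 'string'
           end || ' as ' || key,
           ', ') as select_list
from keys;
The resulting string can then be spliced into a CREATE VIEW ... AS SELECT id, <select_list> FROM <your table> statement, either by hand or from inside the stored procedure.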

The solution ended up being:
select
    id,
    key,
    ifnull(try_parse_json(value), value) as value_mod,
    typeof(value_mod)
from cte, lateral flatten(my_output)
Leading zeros are removed, so things like zip codes have to be accounted for.
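One hedged way to account for that, assuming anything with a leading zero should stay text:
select
    id,
    key,
    case
        -- keep the raw string when parsing would drop a leading zero (e.g. zip codes)
        when value::string like '0%'
             and typeof(try_parse_json(value)) in ('INTEGER', 'DECIMAL', 'DOUBLE')
            then value
        else ifnull(try_parse_json(value), value)
    end as value_mod,
    typeof(value_mod)
from cte, lateral flatten(my_output)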

Related

Cast all columns of numeric type to text when converting a row to json

I have a table with numeric/decimal columns and I am converting the rows to JSON:
select to_jsonb(t.*) from my_table t
I need to have the numeric columns cast to text before they are converted to JSON.
The reason I need this is that JavaScript doesn't handle really big numbers well, so I may lose precision. I use decimal.js, and the string representation is the best thing to construct the decimal.js number from.
I know I can do this
select to_jsonb(t.*) || jsonb_build_object('numeric_column', numeric_column::text) from my_table t
But I want to have it done automatically. Is there a way to somehow cast all numeric columns to text before passing to to_jsonb function?
It can be a user-defined Postgres function.
EDIT: Just to clarify my question: what I need is some function similar to to_jsonb, except that all columns of type numeric/decimal are stored as strings in the resulting JSON.
Thanks
You can run a query like:
select row_to_json(row(t.column1,t.column2,t.column_numeric::text)) from my_table t
Result here
This solution converts all the JSON values into text:
SELECT jsonb_object_agg(d.key, d.value)
FROM my_table AS t
CROSS JOIN LATERAL jsonb_each_text(to_jsonb(t.*)) AS d
GROUP BY t
whereas this solution only converts JSON numbers into text:
SELECT jsonb_object_agg(
           d.key,
           CASE WHEN jsonb_typeof(d.value) = 'number'
                THEN to_jsonb(d.value :: text)
                ELSE d.value
           END)
FROM my_table AS t
CROSS JOIN LATERAL jsonb_each(to_jsonb(t.*)) AS d
GROUP BY t
test result in dbfiddle.
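If you want the user-defined function mentioned in the question, here is a hedged sketch wrapping the number-to-text logic from the second query above (the function name to_jsonb_numbers_as_text is made up):
create or replace function to_jsonb_numbers_as_text(rec anyelement)
returns jsonb
language sql
stable
as $$
    -- convert every JSON number in the row to a JSON string, leave everything else as-is
    select jsonb_object_agg(
               d.key,
               case when jsonb_typeof(d.value) = 'number'
                    then to_jsonb(d.value::text)
                    else d.value
               end)
    from jsonb_each(to_jsonb(rec)) as d
$$;
-- usage: select to_jsonb_numbers_as_text(t) from my_table t;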

How to extract a value from JSON that repeats multiple times?

I have the following table:
I need to create a select that returns me something like this:
I have tried this code:
SELECT Code, json_extract_path(Registers::json,'sales', 'name')
FROM tbl_registers
The previous code returns NULL from json_extract_path. I have also tried the operator ::json->'sales'->>'name', but that doesn't work either.
You need to unnest the array, and then aggregate the names back. This can be done using json_array_elements with a scalar sub-query:
select code,
(select string_agg(e ->> 'name', ',')
from json_array_elements(t.products) as x(e)) as products
from tbl_registers t;
I would also strongly recommend changing your column's type to jsonb.
step-by-step demo: db<>fiddle
SELECT
code,
string_agg( -- 3
elems ->> 'name', -- 2
','
) as products
FROM tbl_registers,
json_array_elements(products::json) as elems -- 1
GROUP BY code
1. If you have type text (strictly not recommended; please use an appropriate data type, json or jsonb), then you need to cast it into type json (I guess you have type text because you already do the cast in your example code). Afterwards you need to extract the array elements into one row per element.
2. Fetch the name value.
3. Reaggregate by grouping and use string_agg() to create the string list.
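With the recommended jsonb type (assuming the products column is jsonb), the cast disappears and you can use jsonb_array_elements directly:
SELECT
    code,
    string_agg(elems ->> 'name', ',') AS products
FROM tbl_registers,
     jsonb_array_elements(products) AS elems   -- no ::json cast needed for a jsonb column
GROUP BY code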

Querying dynamic JSON fields for first non-null value in AWS Athena

I am storing event data in S3 and want to use Athena to query the data. One of the fields is a dynamic JSON field that I do not know the field names for. Therefore, I need to query the keys in the JSON and then use those keys to query for the first non-null value for each field. Below is an example of the data stored in S3.
{
    "timestamp": 1558475434,
    "request_id": "83e21b28-7c12-11e9-8f9e-2a86e4085a59",
    "user_id": "example_user_id_1",
    "traits": {
        "this": "is",
        "dynamic": "json",
        "as": ["defined", "by", "the", "client"]
    }
}
So, I need a query to extract the keys from the traits column (which is stored as JSON), and use those keys to get the first non-null value for each field.
The closest I could come was sampling a value using min_by, but this does not allow me to add a where clause without returning null values. I will need to use Presto's "first_value" option, but I cannot get it to work with the extracted JSON keys from the dynamic JSON field.
SELECT DISTINCT trait, min_by(json_extract(traits, concat('$.', cast(trait AS varchar))), received_at) AS value
FROM TABLE
CROSS JOIN UNNEST(regexp_extract_all(traits,'"([^"]+)"\s*:\s*("[^"]+"|[^,{}]+)', 1)) AS t(trait)
WHERE json_extract(traits, concat('$.', cast(trait AS varchar))) IS NOT NULL OR json_size(traits, concat('$.', cast(trait AS varchar))) <> 0
GROUP BY trait
It's not clear to me what you expect as a result, or what you mean by "first non-null value". In your example you have both string and array values, and none of them is null. It would be helpful if you provided more examples and also the expected output.
As a first step towards a solution, here's a way to filter out the null values from traits:
If you set the type of the traits column to map<string,string> you should be able to do something like this:
SELECT
    request_id,
    MAP_AGG(trait_key, trait_value) AS traits
FROM (
    SELECT
        request_id,
        trait_key,
        trait_value
    FROM some_table
    CROSS JOIN UNNEST (traits) AS t (trait_key, trait_value)
    WHERE trait_value IS NOT NULL
)
GROUP BY request_id
However, if you want to also filter values that are arrays and pick out the first non-null value, that becomes more complex. It could probably be done with a combination of casts to JSON, the filter function, and COALESCE.
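Building on that, a hedged sketch of picking the earliest non-null value per key (assuming traits is already map<string,string> and using the timestamp field from the example as the ordering column):
-- some_table and the column names are assumptions taken from the question
SELECT
    trait_key,
    min_by(trait_value, "timestamp") AS first_non_null_value
FROM some_table
CROSS JOIN UNNEST(traits) AS t (trait_key, trait_value)
WHERE trait_value IS NOT NULL
GROUP BY trait_key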

How to efficiently check a JSON array for values in T-SQL

I want to find multiple rows where a JSON array contains a specific value or values. Sometimes all of the items will need to match (ANDs), sometimes only some (ORs), and sometimes a combination of both (ANDs and ORs).
This is in Microsoft SQL Server 2017.
I've tried using an AS alias in the select, but that resulted in the alias created for the subquery not being recognised later in the query.
The below example works, it just seems inefficient and has code duplication.
How would I only specify SELECT VALUE FROM OPENJSON(JsonData, '$.categories') once? Or perhaps there is some other way to do this?
DECLARE @TestTable TABLE
(
    Id int,
    JsonData nvarchar(4000)
);
INSERT INTO @TestTable
VALUES
    (1,'{"categories":["one","two"]}'),
    (2,'{"categories":["one"]}'),
    (3,'{"categories":["two"]}'),
    (4,'{"categories":["one","two","three"]}');
SELECT [Id]
FROM @TestTable
WHERE ISJSON(JsonData) = 1
-- These two lines are the offending parts of code
AND 'one' in (SELECT VALUE FROM OPENJSON(JsonData, '$.categories'))
AND 'two' in (SELECT VALUE FROM OPENJSON(JsonData, '$.categories'));
The table format cannot change, though I can add computed columns - if need be.
Well, I'm not sure if this helps you...
It might help to transform the nested array to a derived table to use it as a CTE. Check this out:
DECLARE @TestTable TABLE
(
    Id int,
    JsonData nvarchar(4000)
);
INSERT INTO @TestTable
VALUES
    (1,'{"categories":["one","two"]}'),
    (2,'{"categories":["one"]}'),
    (3,'{"categories":["two"]}'),
    (4,'{"categories":["one","two","three"]}');
--This is the query
WITH JsonAsTable AS
(
    SELECT Id
          ,JsonData
          ,cat.*
    FROM @TestTable tt
    CROSS APPLY OPENJSON(tt.JsonData,'$.categories') cat
)
SELECT *
FROM JsonAsTable
The approach is very close to the query you formed yourself. The result is a table with one line per array entry. The former Id is a repeated grouping key, the key is the ordinal position within the array, while the value is one of the words you are searching for.
In your query you can use JsonAsTable like you'd use any other table in this place.
But - instead of the repeated FROM OPENJSON queries - you will need repeated EXISTS() predicates...
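Alternatively (a hedged sketch, not taken from the original answer), the ANDs can be expressed by applying OPENJSON once per row and counting the matches, assuming JsonData is always valid JSON:
SELECT tt.Id
FROM @TestTable tt
CROSS APPLY OPENJSON(tt.JsonData, '$.categories') c
WHERE c.[value] IN ('one', 'two')       -- the values being searched for
GROUP BY tt.Id
HAVING COUNT(DISTINCT c.[value]) = 2;   -- = 2: both must match (AND); >= 1 would be an OR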
A hacky solution might be this:
SELECT Id
      ,JsonData
      ,REPLACE(REPLACE(REPLACE(JsonData,'{"categories":[','",'),']}',',"'),'","',',')
FROM @TestTable
This will return all nested array values in one string, separated by commas. You can query this using a LIKE pattern... You could return this as a computed column though...

Call to_json on multiple columns using Postgres

Say I have the following table schema in Postgres:
CREATE TABLE users (id text, email text, phone_number text);
And I, for whatever reason, want to select the email and the phone number as JSON:
SELECT to_json(users.email, users.phone_number) AS user FROM users WHERE id = 'usr_123';
I get an error that looks like this:
function to_json(text, text) does not exist
No function matches the given name and argument types. You might need to add explicit type casts.
But this works just fine:
SELECT to_json(users.*) AS user FROM users WHERE id = 'usr_123';
How can I select just a few columns (not all of them) using a to_json call in Postgres?
The fine manual says that to_json's signature is:
to_json(anyelement)
so you're supposed to say to_json(one_single_value). When you say:
to_json(users.email, users.phone_number)
you're trying to call to_json with two values and there is no such to_json function. When you say:
to_json(users.*)
you're actually calling to_json with a single ROW argument so it works just fine.
You can use a derived table or CTE as klin suggests or you can build the ROWs by hand:
select to_json(row(users.email, users.phone_number)) ...
The problem with this is that the ROW won't have any column names, so your JSON will use useless keys like "f1" and "f2". To get around that you need to cast the ROW to something that does have names. One way to get some names is to create a custom type:
create type email_and_phone_number as (
email text,
phone_number text
)
and then cast your ROW to that type:
select to_json(row(users.email, users.phone_number)::email_and_phone_number) ...
You could also use a temporary table instead of a custom type:
create temporary table email_and_phone_number (
email text,
phone_number text
)
and then use the same cast as with a custom type.
If you're building this specific JSON format a lot then a custom type would make sense. If this is a one-off then a temporary table would make sense, the temporary table will automatically disappear when the session ends so there's nothing to clean up. Of course, a derived table or CTE might also make sense depending on the query and what tools you're using to interface with your database.
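For illustration, the custom-type version spelled out in full (using the asker's example filter) would be:
select to_json(row(users.email, users.phone_number)::email_and_phone_number) as "user"
from users
where id = 'usr_123';
-- the JSON keys are now "email" and "phone_number" instead of "f1" and "f2"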
Use a subquery, e.g.:
select to_json(sub)
from (
select email, phone_number
from users
where id = 'usr_123'
) sub;
or with a CTE:
with cte as (
select email, phone_number
from users
where id = 'usr_123')
select to_json(cte)
from cte;
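Another standard option (not in the original answers, available since PostgreSQL 9.4) is json_build_object, which takes alternating key/value arguments so you can pick the columns and name the keys in a single call:
select json_build_object(
           'email', email,
           'phone_number', phone_number
       ) as "user"
from users
where id = 'usr_123';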
If you just want to remove one specific field from the result there is an elegant solution:
SELECT to_json(users)::jsonb - 'id' AS user FROM users WHERE id = 'usr_123';