Count JSON column in Snowflake - json

I have a table called HISTORY in Snowflake that has a column called RECORD with the VARIANT data type. This column contains JSON data. I would like to add a new column to the HISTORY table that counts the JSON columns (values) in each row of the table. Please help.

Json data starts like:
{"prizes":
[ {"year":"2018",
"category":"physics",
"laureates":[ {"id":"960","firstname":"Arthur","surname":"Ashkin"}
, { "id":"961","firstname":"G\u00e9rard","surname":"Mourou" }
]
},
...
]
}
First flatten the data to the lowest level you need (laureates), and then apply a filter on the "year" element, which is one level above the laureates element. You can also filter on the lowest-level columns if you need to.
select
count(*)
from NobelPrizeJson
, lateral flatten(INPUT=>json:prizes) prizes
, lateral flatten(INPUT=>prizes.value:laureates) laureates
where prizes.value:year::int > 2010;
This is posted at:
https://community.snowflake.com/s/question/0D50Z00008xAQSY/i-have-a-query-that-counts-the-number-of-objects-inside-a-large-json-document-and-now-i-need-to-filter-on-only-objects-with-a-specific-keyvalue-pair-inside-those-objects-how-can-i-filter
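To make the logic of that query concrete outside Snowflake, here is a minimal Python sketch of what the two LATERAL FLATTEN steps and the WHERE clause compute. It is only a model, not Snowflake code: the function name count_laureates and the second prize entry (year 2009 with a placeholder laureate) are invented for illustration.

```python
# Sample document shaped like the question's data.
# The 2009 entry is a made-up placeholder so the year filter has something to exclude.
doc = {
    "prizes": [
        {
            "year": "2018",
            "category": "physics",
            "laureates": [
                {"id": "960", "firstname": "Arthur", "surname": "Ashkin"},
                {"id": "961", "firstname": "G\u00e9rard", "surname": "Mourou"},
            ],
        },
        {
            "year": "2009",
            "category": "physics",
            "laureates": [
                {"id": "0", "firstname": "Example", "surname": "Person"},
            ],
        },
    ]
}


def count_laureates(document, min_year):
    """Count laureate rows whose parent prize has year > min_year."""
    count = 0
    for prize in document.get("prizes", []):           # first LATERAL FLATTEN
        for _laureate in prize.get("laureates", []):   # second LATERAL FLATTEN
            if int(prize["year"]) > min_year:          # WHERE prizes.value:year::int > ...
                count += 1
    return count


print(count_laureates(doc, 2010))
```

Each laureate becomes one row, and the filter is evaluated against the prize one level up, which is why only the two 2018 laureates are counted for min_year = 2010.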

Related

Select and display named elements of an array in a JSON column

In Azure SQL I have a table, "violation", with a JSON column, "course_json" that contains an array. An example is:
[{
"course_int": "1465",
"course_key": "LEND1254",
"course_name": "Mortgage Servicing Introduction",
"test_int": "0"
}, {
"course_int": "1464",
"course_key": "LEND1211",
"course_name": "Mortgage Servicing Transfer",
"test_int": "0"
}]
I would like to select rows in the violation table and display columns of the table and the "course_key" as:
LEND1254,LEND1211
If there were always a fixed number of course_keys I could use:
select person_id,event_date, JSON_VALUE(course_json, '$[0].course_key') + ',' + JSON_VALUE(course_json, '$[1].course_key') from violation
But they aren't fixed... there may be one, two, ten... I'll never know.
So, is it possible to iterate through all the course_keys and display them all in a comma separated format?
Instead of JSON_VALUE, use OPENJSON to get all the courses and STRING_AGG to build the course_key delimited list.
SELECT
person_id
, event_date
, (SELECT STRING_AGG(course_key,',')
FROM OPENJSON(course_json)
WITH (
course_key nvarchar(MAX) '$.course_key'
)) AS course_key
FROM dbo.violation;
person_id | event_date | course_key
----------+------------+-------------------
        1 | 2022-12-21 | LEND1254,LEND1211
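For readers who want to see the idea without a database, the sketch below models in Python what OPENJSON plus STRING_AGG do for a single row: expand the array, pick course_key from each element, and join the values. The function name course_key_list is hypothetical, and the None handling mirrors the assumption that STRING_AGG skips NULL values and returns NULL when there is nothing to aggregate.

```python
import json

# The example array from the question, stored as text like the course_json column.
course_json = """[
  {"course_int": "1465", "course_key": "LEND1254",
   "course_name": "Mortgage Servicing Introduction", "test_int": "0"},
  {"course_int": "1464", "course_key": "LEND1211",
   "course_name": "Mortgage Servicing Transfer", "test_int": "0"}
]"""


def course_key_list(json_text, separator=","):
    """Join every course_key in the array, however many elements it has."""
    courses = json.loads(json_text)
    # Elements without a course_key are skipped, like NULLs in an aggregate.
    keys = [c["course_key"] for c in courses if c.get("course_key") is not None]
    # No keys at all: return None, standing in for SQL NULL.
    return separator.join(keys) if keys else None


print(course_key_list(course_json))
```

Because the list is built from whatever the array contains, the same code handles one, two, or ten courses.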

Querying element inside a collection on a json field - Postgres

I have the following json structure on my Postgres. The table is named "customers" and the field that contains the json is named "data"
{
customerId: 1,
something: "..."
list: [{ nestedId: 1, attribute: "a" }, { nestedId: 2, attribute: "b" }]
}
I'm trying to query all customers that have an element inside the field "list" with nestedId = 1.
I accomplished that poorly through the query:
SELECT data FROM customers a, jsonb_array_elements(data->'list') e WHERE (e->'nestedId')::int = 1
I say poorly because, since I'm using jsonb_array_elements in the FROM clause, it is not used as a filter, resulting in a seq scan.
I tried something like:
SELECT data FROM customers where data->'list' @> '{"nestedId": 1, "attribute": "a"}'::jsonb
But it does not return anything. I imagine that is because the "list" field is seen as an array and not as the individual records it contains.
Any ideas how to perform that query filtering nestedId on the WHERE condition?
Try this query:
SELECT data FROM customers where data->'list' @> '[{"nestedId": 1}]';
This query will work in Postgres 9.4+.
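The reason the array form matches while the bare object does not comes down to how jsonb containment (the @> operator) compares structures. The Python function below is a simplified model of those rules, written only for illustration: jsonb_contains is a hypothetical name, and the model deliberately leaves out PostgreSQL's special case where an array may contain a bare scalar.

```python
def jsonb_contains(left, right):
    """Simplified model of `left @> right` for jsonb values."""
    if isinstance(right, dict):
        # An object is contained only in an object that has every key,
        # with a value that in turn contains the right-hand value.
        return isinstance(left, dict) and all(
            key in left and jsonb_contains(left[key], value)
            for key, value in right.items()
        )
    if isinstance(right, list):
        # Every right-hand element must be contained in some left-hand element.
        return isinstance(left, list) and all(
            any(jsonb_contains(candidate, wanted) for candidate in left)
            for wanted in right
        )
    # Scalars must be equal (keeping booleans distinct from numbers).
    return (
        not isinstance(left, (dict, list))
        and isinstance(left, bool) == isinstance(right, bool)
        and left == right
    )


customer = {
    "customerId": 1,
    "something": "...",
    "list": [{"nestedId": 1, "attribute": "a"}, {"nestedId": 2, "attribute": "b"}],
}

print(jsonb_contains(customer["list"], [{"nestedId": 1}]))                 # array vs array
print(jsonb_contains(customer["list"], {"nestedId": 1, "attribute": "a"})) # array vs object
```

The first call matches because both sides are arrays and the wanted element is contained in one of the list entries. The second fails at the first check, since an object can never be contained in an array.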

mySQL JSON : search array of objects where property value in list

I have a JSON column, manifest, containing an array of objects.
I need to return all table rows where any of the objects in their array have a slide_id that is present in a sub select.
The structure of the JSON field is..
{ matrix:[
{
row:1,
col:1,
slide_id:1
},
{
row:1,
col:2,
slide_id:5
}
]
}
So I want to run something like this....
SELECT id FROM presentation WHERE manifest->'$.matrix[*].slide_id' IN ( (SELECT id from slides WHERE date_deleted IS NOT NULL) );
But this doesn't work as manifest->'$.matrix[*].slide_id' returns a JSON array for each row.
I have managed to get this to work, but it's amazingly slow as it scans the whole table...
SELECT
p.id
FROM
(
SELECT id,
manifest->'$.matrix[*].slide_id' as slide_ids
FROM `presentation`
) p
INNER JOIN `pp_slides` s
ON JSON_CONTAINS(p.slide_ids, CAST(s.id as json), '$')
WHERE s.date_deleted IS NOT NULL
If I filter it down to an individual presentation ID, then it's not too bad, but it still takes 700 ms for a presentation with a couple of hundred slides in it. Is there a cleaner way to do this?
I suppose the best way would be to refactor it to store the matrix as a relational table....
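As a rough illustration of that refactoring idea, the Python sketch below flattens each manifest into (presentation id, slide_id) pairs, which is the shape a relational link table would have, and then keeps the presentations that reference a deleted slide. All of the data and both function names are assumptions made up for the example; it says nothing about how MySQL would execute the equivalent query.

```python
# Hypothetical rows: presentation id -> manifest document.
presentations = {
    10: {"matrix": [{"row": 1, "col": 1, "slide_id": 1},
                    {"row": 1, "col": 2, "slide_id": 5}]},
    11: {"matrix": [{"row": 1, "col": 1, "slide_id": 2}]},
}

# Hypothetical result of: SELECT id FROM slides WHERE date_deleted IS NOT NULL
deleted_slide_ids = {5, 7}


def flatten_matrix(rows):
    """Turn each manifest into (presentation_id, slide_id) pairs."""
    return [
        (presentation_id, cell["slide_id"])
        for presentation_id, manifest in rows.items()
        for cell in manifest.get("matrix", [])
    ]


def presentations_with_deleted_slides(rows, deleted):
    """Presentation ids that reference at least one deleted slide."""
    pairs = flatten_matrix(rows)
    return sorted({pid for pid, slide_id in pairs if slide_id in deleted})


print(presentations_with_deleted_slides(presentations, deleted_slide_ids))
```

Once the pairs exist as their own table, the membership test becomes an ordinary join on slide_id rather than a JSON_CONTAINS call per row.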

How can I get all keys from a JSON column in Postgres?

If I have a table with a column named json_stuff, and I have two rows with
{ "things": "stuff" } and { "more_things": "more_stuff" }
in their json_stuff column, what query can I make across the table to receive [ things, more_things ] as a result?
Use this:
select jsonb_object_keys(json_stuff) from table;
(Or just json_object_keys if you're using just json.)
The PostgreSQL json documentation is quite good. Take a look.
As stated in the documentation, the function only returns the outermost keys. So if the data is a nested JSON structure, the function will not return any of the deeper keys.
WITH t(json_stuff) AS ( VALUES
('{"things": "stuff"}'::JSON),
('{"more_things": "more_stuff"}'::JSON)
)
SELECT array_agg(stuff.key) result
FROM t, json_each(t.json_stuff) stuff;
Here is the example if you want to get the key list of each object:
select array_agg(json_keys),id from (
select json_object_keys(json_stuff) as json_keys,id from table) a group by a.id
Here id is the identifier or unique value of each row. If the row cannot be distinguished by identifier, maybe it's better to try PL/pgSQL.
Here's a solution that implements the same semantics as MySQL's JSON_KEYS(), which:
- is NULL safe (i.e. when the object is empty, it produces [], not NULL or an empty result set)
- produces a JSON array, which is what I would have expected from how the question was phrased.
SELECT
o,
(
SELECT coalesce(json_agg(j), json_build_array())
FROM json_object_keys(o) AS j (j)
)
FROM (
VALUES ('{}'::json), ('{"a":1}'::json), ('{"a":1,"b":2}'::json)
) AS t (o)
Replace json by jsonb if needed.
Producing:
|o            |coalesce  |
|-------------|----------|
|{}           |[]        |
|{"a":1}      |["a"]     |
|{"a":1,"b":2}|["a", "b"]|
Substitute your own column and table names for <json_column> and <table>:
select distinct(tableProps.props) from (
select jsonb_object_keys(<json_column>) as props from <table>
) as tableProps
I wanted to get the number of keys from a JSONB structure, so I'm doing something like this:
select into cur some_jsonb from mytable where foo = 'bar';
select into keys array_length(array_agg(k), 1) from jsonb_object_keys(cur) as k;
I feel it is a little bit wrong, but it works. It's unfortunate that we can't get an array directly from the json_object_keys() function. That would save us some code.
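For comparison, this is what "count the top-level keys of one JSON value" amounts to, sketched in Python rather than PL/pgSQL. The function names are hypothetical. Two points carry over to the SQL versions above: only the outermost keys are counted, and an empty object should count as 0 (the array_length(array_agg(...), 1) form yields NULL in that case, because array_agg over zero rows is NULL).

```python
import json


def top_level_keys(json_text):
    """Return the outermost keys of a JSON object, like json_object_keys()."""
    value = json.loads(json_text)
    if not isinstance(value, dict):
        # The key functions only accept objects, not arrays or scalars.
        raise TypeError("expected a JSON object")
    return list(value.keys())


def count_top_level_keys(json_text):
    """Number of outermost keys; nested keys are not included."""
    return len(top_level_keys(json_text))


print(count_top_level_keys('{"a": 1, "b": {"c": 2, "d": 3}}'))  # nested c and d are ignored
print(count_top_level_keys('{}'))
```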

How to create an empty JSON object in postgresql?

Datamodel
A person is represented in the database as a row in the meta table with a name, and with multiple attributes which are stored in the data table as key-value pairs (key and value are in separate columns).
Simplified data-model
Now there is a query to retrieve all users (name) with all their attributes (data). The attributes are returned as JSON object in a separate column. Here is an example:
name    | data
--------+-------------------------------
Florian | { "age":25 }
Markus  | { "age":25, "color":"blue" }
Thomas  | {}
The SQL command looks like this:
SELECT
name,
json_object_agg(d.key, d.value) AS data
FROM meta AS m
JOIN (
SELECT d.fk_id, d.key, d.value AS value FROM data AS d
) AS d
ON d.fk_id = m.id
GROUP BY m.name;
Problem
Now the problem I am facing is that users like Thomas, who do not have any attributes stored in the key-value table, are not returned by my select. This is because it does only a JOIN and not a LEFT OUTER JOIN.
If I use LEFT OUTER JOIN, then I run into the problem that json_object_agg tries to aggregate NULL values and dies with an error.
Approaches
1. Return empty list of keys and values
So I tried to check if the key-column of a user is NULL and return an empty array so json_object_agg would just create an empty JSON object.
But there is not really a function to create an empty array in SQL. The nearest thing I found was this:
select '{}'::text[];
In combination with COALESCE the query looks like this:
json_object_agg(COALESCE(d.key, '{}'::text[]), COALESCE(d.value, '{}'::text[])) AS data
But if I try to use this I get following error:
ERROR: COALESCE types text and text[] cannot be matched
LINE 10: json_object_agg(COALESCE(d.key, '{}'::text[]), COALES...
^
Query failed
PostgreSQL said: COALESCE types text and text[] cannot be matched
So it looks like, at runtime, d.key is a single value and not an array.
2. Split up JSON creation and return empty list
So I tried to take json_object_agg and replace it with json_object which does not aggregate the keys for me:
json_object(COALESCE(array_agg(d.key), '{}'::text[]), COALESCE(array_agg(d.value), '{}'::text[])) AS data
But there I get the error that a null value is not allowed for an object key. So COALESCE does not detect that the array is empty.
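A likely explanation, assuming this attempt was run with the LEFT OUTER JOIN: a user without attributes still contributes one joined row whose key is NULL, so array_agg returns an array holding a single NULL rather than NULL itself, and COALESCE has nothing to replace. The Python sketch below models that behaviour, with None standing in for SQL NULL; the two helper functions are simplified stand-ins, not PostgreSQL APIs.

```python
def array_agg(values):
    """Model of array_agg: NULL inputs are kept; zero input rows give NULL."""
    values = list(values)
    return values if values else None


def coalesce(*args):
    """Model of COALESCE: the first argument that is not NULL."""
    for arg in args:
        if arg is not None:
            return arg
    return None


# Thomas after a LEFT JOIN: one joined row whose d.key is NULL.
thomas_keys = coalesce(array_agg([None]), [])
print(thomas_keys)  # still contains the NULL key that json_object rejects

# Only a group with no rows at all would fall through to the default.
print(coalesce(array_agg([]), []))
```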
Question
So, is there a function to check if a joined column is empty, and if yes return just a simple JSON object?
Or is there any other solution which would solve my problem?
Use left join with coalesce(). As default value use '{}'::json.
select name, coalesce(d.data, '{}'::json) as data
from meta m
left join (
select fk_id, json_object_agg(d.key, d.value) as data
from data d
group by 1
) d
on m.id = d.fk_id;
name | data
---------+------------------------------------
Florian | { "age" : "25" }
Marcus | { "age" : "25", "color" : "blue" }
Thomas | {}
(3 rows)
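The shape of this answer (aggregate first, then left join and fall back to an empty object) can be sketched in Python as follows. The rows are assumed sample data matching the example tables, with ids invented for the join, and users_with_attributes is a hypothetical name.

```python
# Assumed sample rows: meta(id, name) and data(fk_id, key, value).
meta = [(1, "Florian"), (2, "Markus"), (3, "Thomas")]
data = [(1, "age", "25"), (2, "age", "25"), (2, "color", "blue")]


def users_with_attributes(meta_rows, data_rows):
    """One (name, attributes) pair per user, with {} for users without data."""
    # Subquery: GROUP BY fk_id with json_object_agg(key, value).
    aggregated = {}
    for fk_id, key, value in data_rows:
        aggregated.setdefault(fk_id, {})[key] = value
    # LEFT JOIN on m.id = d.fk_id, then COALESCE(d.data, '{}').
    return [(name, aggregated.get(meta_id, {})) for meta_id, name in meta_rows]


for name, attributes in users_with_attributes(meta, data):
    print(name, attributes)
```

Aggregating before the join is what avoids the NULL key problem: users with no attributes never reach the aggregate, so there is no NULL row to trip over, and the default is applied afterwards.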