Merging JSON objects when grouping in Postgres

I'm trying to build a user permissions structure in Postgres 11.5.
The basic idea is a user can belong to multiple groups and a group can have permissions for multiple applications. The user's permissions (if any) will override any permissions set at group level.
Permissions at user and usergroup level will be stored as json objects which I want to merge together with user permissions overwriting usergroup permissions if there is any overlap.
Example:
Brendan, James and n other users are in the exact same usergroups, but James should not be able to access app2.
Set up:
CREATE TABLE public.users
(
uid character varying COLLATE pg_catalog."default" NOT NULL,
ugid character varying[],
permissions json,
CONSTRAINT users_pkey PRIMARY KEY (uid)
);
INSERT INTO public.users VALUES
('brendan','{default,gisteam}','{}'),
('james','{default,gisteam}','{"app2":{"enabled":false}}');
CREATE TABLE public.usergroups
(
ugid character varying COLLATE pg_catalog."default" NOT NULL,
permissions json,
CONSTRAINT usergroups_pkey PRIMARY KEY (ugid)
);
INSERT INTO public.usergroups VALUES
('default','{"app1":{"enabled":true}}'),
('gisteam','{"app2":{"enabled":true},"app3":{"enabled":true}}');
Query:
SELECT uid, json_agg(permissions)
FROM (
    SELECT
        u.uid,
        ug.permissions,
        'group' AS type
    FROM public.users u
    JOIN public.usergroups ug
        ON ug.ugid = ANY(u.ugid)
    UNION ALL
    SELECT
        uid,
        permissions,
        'user' AS type
    FROM public.users u2
) a
GROUP BY uid;
Actual query results:
+---------+----------------------------------------------------------------------------------------------------------+
| uid | final_permissions |
+---------+----------------------------------------------------------------------------------------------------------+
| brendan | [{"app1":{"enabled":true}},{"app2":{"enabled":true},"app3":{"enabled":true}},{}] |
| james | [{"app1":{"enabled":true}},{"app2":{"enabled":true},"app3":{"enabled":true}},{"app2":{"enabled":false}}] |
+---------+----------------------------------------------------------------------------------------------------------+
This kind of works, but I would want the object to be flattened and keys merged.
Desired result:
+---------+---------------------------------------------------------------------------+
| uid | final_permissions |
+---------+---------------------------------------------------------------------------+
| brendan | {"app1":{"enabled":true},"app2":{"enabled":true},"app3":{"enabled":true}} |
| james | {"app1":{"enabled":true},"app2":{"enabled":false},"app3":{"enabled":true}}|
+---------+---------------------------------------------------------------------------+
DB Fiddle: https://www.db-fiddle.com/f/9kb1v1T82YVxWERxnWLThL/3
Other info:
The actual permissions object set at usergroup level for each app will be more complex than in the example (e.g. featureA is enabled, featureB is disabled, etc.), and more applications will be added in the future, so I don't want to hardcode any references to specific apps or features if possible.
I suppose technically, if it's easier, the desired output would only ever be the permissions object for a single user, so the GROUP BY uid could be replaced with WHERE uid = 'x'.
Question/Problem:
How can I modify the query to produce a flattened/merged permissions json object?
edit: fixed json

Your indicated desired output is not syntactically valid JSON. If I make a guess as to what you actually want, you can get it with jsonb_object_agg rather than json_agg. You have to first unnest the values you select so that you can re-aggregate them together, which is done here by a lateral join against json_each:
select uid, jsonb_object_agg(key, value)
from (
    SELECT
        u.uid,
        ug.permissions,
        'group' AS type
    FROM public.users u
    JOIN public.usergroups ug
        ON ug.ugid = ANY(u.ugid)
    UNION ALL
    SELECT
        uid,
        permissions,
        'user' AS type
    FROM public.users u2
) a
CROSS JOIN LATERAL json_each(permissions)
GROUP BY uid;
Yields:
uid | jsonb_object_agg
---------+------------------------------------------------------------------------------------
brendan | {"app1": {"enabled": true}, "app2": {"enabled": true}, "app3": {"enabled": true}}
james | {"app1": {"enabled": true}, "app2": {"enabled": false}, "app3": {"enabled": true}}
Your select of "group" as type is confusing as it is never used.
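For intuition, the merge semantics the answer relies on (jsonb_object_agg keeps the last value seen for each duplicate key) mirror an ordinary dict update. A minimal Python sketch using the example data, not the Postgres query itself:

```python
# Sketch of the merge semantics only: later permission objects overwrite
# earlier ones key-by-key, so appending the user's own permissions after
# the group permissions lets them win.
group_perms = [
    {"app1": {"enabled": True}},                             # default group
    {"app2": {"enabled": True}, "app3": {"enabled": True}},  # gisteam group
]
user_perms = {"app2": {"enabled": False}}  # james's user-level override

merged = {}
for perms in group_perms + [user_perms]:
    merged.update(perms)

print(merged)
# {'app1': {'enabled': True}, 'app2': {'enabled': False}, 'app3': {'enabled': True}}
```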

Related

Does MySQL have a way to "coalesce" as an aggregate function?

I'm attempting to take an existing application and re-architect the schema to support new customer requests and fix several outstanding issues (mostly around our current schema being heavily denormalized). In doing so, I've reached an interesting problem which at first glance seems to have a simple solution, but I can't seem to find the function I'm looking for.
The application is a media organization tool.
Our Old Schema:
Our old schema had separate models for "Groups", "Subgroups", and "Videos". A Group could have many Subgroups (one-to-many) and a Subgroup could have many Videos (one-to-many).
There were certain fields that were shared among Groups, Subgroups, and Videos. For instance, the Google Analytics ID to be used when the Video was embedded on a page. Whenever we displayed the embed page we would first look if the value was set on the Video. If not, we checked its Subgroup. If not, we checked its Group. The query looked roughly like so (I wish this were the real query, but unfortunately our application was written over many years by many junior developers, so the truth is much more painful):
SELECT
v.id,
COALESCE(v.google_analytics_id, sg.google_analytics_id, g.google_analytics_id) as google_analytics_id
FROM
Videos v
LEFT JOIN Subgroups sg ON sg.id = v.subgroup_id
LEFT JOIN Groups g ON g.id = sg.group_id
Pretty straight-forward. Now the issue we've run into is that customers want to be able to nest groups arbitrarily deep, and our schema clearly only allows for 2 levels (and, in fact, necessitates two levels - even if you only want one)
New Schema (First Pass):
As a first pass, I knew we'd want a basic tree structure for the Groups, so I came up with this:
CREATE TABLE Groups (
id INT PRIMARY KEY,
name VARCHAR(255),
parent_id INT,
ga_id VARCHAR(20)
)
We can then easily nest up to N levels deep with N joins like so:
SELECT
v.id,
COALESCE(v.ga_id, g1.ga_id, g2.ga_id, g3.ga_id, ...) as ga_id
FROM
Videos v
LEFT JOIN Groups g1 ON g1.id = v.group_id
LEFT JOIN Groups g2 ON g2.id = g1.parent_id
LEFT JOIN Groups g3 ON g3.id = g2.parent_id
...
There are obvious flaws with this approach: we don't know how many parents there will be, so we don't know how many times to JOIN, forcing us to implement a "max depth". Even with a max depth, if a person has only a single level of groups we still perform multiple JOINs, because our queries can't know how deep they need to go. MySQL offers recursive queries, but while looking into whether that was the right option I found a smarter schema that produced the same results.
New Schema (Take 2):
Looking into better ways to handle a tree structure, I learned about Adjacency Lists (my prior solution), Nested Sets, Materialized Paths, and Closure Tables. Other than Adjacency Lists (which depend on JOINs to grab the entire tree structure and so produce a single row with multiple columns per node), the other three solutions all return multiple rows for each node on the tree.
I ended up going with a Closure Table solution like so:
CREATE TABLE Groups (
id INT PRIMARY KEY,
name VARCHAR(255),
ga_id VARCHAR(20)
)
CREATE TABLE Group_Closure (
ancestor_id INT,
descendant_id INT,
PRIMARY KEY (ancestor_id, descendant_id)
)
Now given a Video I can get all of its parents like so:
SELECT
    v.id,
    v.ga_id,
    g.id,
    g.ga_id
FROM
    Videos v
    JOIN Group_Closure gc ON v.group_id = gc.descendant_id
    JOIN Groups g ON g.id = gc.ancestor_id;
This returns each group in the hierarchy as a separate row:
+------+---------+------+---------+
| v.id | v.ga_id | g.id | g.ga_id |
+------+---------+------+---------+
| 1 | abc123 | 2 | new_val |
| 1 | abc123 | 1 | default |
| 2 | NULL | 4 | xyz987 |
| 2 | NULL | 3 | NULL |
| 2 | NULL | 1 | default |
| 3 | NULL | 3 | NULL |
| 3 | NULL | 1 | default |
+------+---------+------+---------+
What I wish to do now is somehow achieve the same result I would have expected from using COALESCE on multiple self-joined Group tables: a single value for ga_id based on whichever node is "lowest" in the tree
Because I have multiple rows per Video, I suspect that this can be accomplished using GROUP BY and some kind of aggregate function:
SELECT
    v.id,
    COALESCE(v.ga_id, FIRST_NON_NULL(g.ga_id))
FROM
    Videos v
    JOIN Group_Closure gc ON v.group_id = gc.descendant_id
    JOIN Groups g ON g.id = gc.ancestor_id
GROUP BY v.id, v.ga_id;
Note that because (ancestor_id, descendant_id) is my primary key, I believe the rows of the closure table can be relied on to always come back in the same order - meaning if I put the lowest node first, it will be the first row in the result. If my understanding of this is incorrect, please let me know.
If you were to stick with an adjacency list, you could use a recursive CTE. This one traverses up from each video id value until it finds a non-NULL ga_id:
WITH RECURSIVE CTE AS (
    SELECT id, ga_id, group_id
    FROM videos
    UNION ALL
    SELECT CTE.id, COALESCE(CTE.ga_id, g.ga_id), g.parent_id
    FROM `groups` g
    JOIN CTE ON g.id = CTE.group_id AND CTE.ga_id IS NULL
)
SELECT id, ga_id
FROM CTE
WHERE ga_id IS NOT NULL
For my attempt to reconstruct your data from your question, this yields:
id ga_id
1 abc123
2 xyz987
3 default
Demo on dbfiddle
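The CTE's logic can be sketched in plain Python as a walk up the adjacency list. The data here is reconstructed from the example rows above, so treat the exact structure as an assumption:

```python
# Walk up from each video until a non-NULL ga_id is found, mirroring
# the recursive CTE. Data reconstructed from the example output.
groups = {  # group id -> (parent_id, ga_id)
    1: (None, "default"),
    2: (1, "new_val"),
    3: (1, None),
    4: (3, "xyz987"),
}
videos = {  # video id -> (group_id, ga_id)
    1: (2, "abc123"),
    2: (4, None),
    3: (3, None),
}

def effective_ga_id(video_id):
    group_id, ga_id = videos[video_id]
    while ga_id is None and group_id is not None:
        parent_id, ga_id = groups[group_id]
        group_id = parent_id
    return ga_id

print([effective_ga_id(v) for v in (1, 2, 3)])
# ['abc123', 'xyz987', 'default']
```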

fan out each row into multiple rows per keys in a JSON column

I have this table:
CREATE TABLE user_stats (username varchar, metadata_by_topic json);
INSERT INTO user_stats VALUES ('matt', '{"tech":["foo","bar"],"weather":"it is sunny"}');
INSERT INTO user_stats VALUES ('fred', '{"tech":{"stuff":"etc"},"sports":"bruins won"}');
The top-level keys in metadata_by_topic are always strings (e.g. "tech", "weather"), but the values under them are arbitrary json. I'd like a query that maps these top-level keys to their own column, and the json values to a different column, like so:
username | topic | metadata
-----------------------------------
matt | tech | ["foo","bar"]
matt | weather | "it is sunny"
fred | tech | {"stuff":"etc"}
fred | sports | "bruins won"
where username and topic are both of type VARCHAR and metadata is of type JSON. This:
select * from json_each((select t.metadata_by_topic from user_stats as t));
only works if I add LIMIT 1 to the inner select, but that's not what I want.
UPDATE: This is a better method
select username, key, metadata_by_topic->key
from
(select username,
json_object_keys(
(select t.metadata_by_topic from user_stats as t where t.username=us.username)
) AS KEY,
us.metadata_by_topic
from user_stats us
) x
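The fan-out itself is just "one output row per (username, top-level key) pair"; a quick Python sketch of that shape, using the sample rows:

```python
import json

rows = [
    ("matt", '{"tech":["foo","bar"],"weather":"it is sunny"}'),
    ("fred", '{"tech":{"stuff":"etc"},"sports":"bruins won"}'),
]

# One output row per (username, top-level key) pair; the value under
# each key is carried along untouched, whatever JSON type it is.
fanned_out = [
    (username, topic, metadata)
    for username, doc in rows
    for topic, metadata in json.loads(doc).items()
]

for row in fanned_out:
    print(row)
# ('matt', 'tech', ['foo', 'bar'])
# ('matt', 'weather', 'it is sunny')
# ('fred', 'tech', {'stuff': 'etc'})
# ('fred', 'sports', 'bruins won')
```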

MySQL returning arrays from subqueries, and NULL

I have two tables, "records", and "info".
The "records" table looks like:
mysql> SELECT * FROM records WHERE num = '7';
+-----+--------+----+------+-----+-----+------------+-----------+----------+---------------------+
| id | city | st | type | num | val | startdate | status | comments | updated |
+-----+--------+----+------+-----+-----+------------+-----------+----------+---------------------+
| 124 | Encino | CA | AAA | 7 | 1 | 1993-09-01 | allocated | | 2014-02-26 08:16:07 |
+-----+--------+----+------+-----+-----+------------+-----------+----------+---------------------+
and so on. Think of the "num" field in this table as a Company ID.
The "info" table contains information about certain companies, and uses that company id as a unique identifier. Not all companies listed in "records" will be in "info". An example of the "info" table:
mysql> SELECT * FROM info LIMIT 2;
+-----+-------+--------------------------+---------------------+
| org | name | description | updated |
+-----+-------+--------------------------+---------------------+
| 0 | ACME | | 2014-02-19 10:35:39 |
| 1 | AT&T | Some Phone Company, Inc. | 2014-02-18 15:29:50 |
+-----+-------+--------------------------+---------------------+
So "org" here will match "num" in the first table.
I want to be able to run a query that returns, on one line, everything but 'id', 'type' and 'val' from the 1st table, and IF APPLICABLE, the 'name' and 'description' from the 2nd table.
I can achieve what I want using this query:
SELECT city,st,num,startdate,status,comments,updated, \
( SELECT name FROM info WHERE org = '7') AS name, \
( SELECT description FROM info WHERE org = '7') AS description \
FROM records WHERE num = '7'
But I see at least two problems with it:
It seems inefficient to run two subqueries
When there is no record in "info", NULL is printed for the name and
description. I would like to print some string instead.
To address the first problem, I tried to return an array. But when no corresponding record exists in the "info" table, then I get nothing, not even the valid info from the "records" table. Here's my array query:
SELECT city,st,num,startdate,status,comments,updated,asinfo.name AS name,asinfo.description AS description \
FROM records, \
( SELECT name,description FROM info WHERE org = '7') AS asinfo \
WHERE num = '7'
This query works fine if a given company id exists in both tables.
To address the second problem, I tried various incantations of IFNULL and coalesce, to no avail.
I'd appreciate any insight.
Thanks.
Apply LEFT JOIN syntax:
SELECT
r.city,
r.st,
r.num,
r.startdate,
r.status,
r.comments,
r.updated,
IF(d.name IS NULL, 'Default', d.name) AS name,
IF(d.description IS NULL, 'Default', d.description) AS description
FROM
records AS r
LEFT JOIN info AS d ON r.num=d.org
WHERE
r.num='7'
It works like this: LEFT JOIN takes every row from the first table and, where there is no corresponding record in the second, fills those columns with NULL. You can then detect that with IF (or IFNULL) and substitute your default string.
Use a LEFT JOIN to get null values when there's no matching row in the info table.
SELECT city,st,num,startdate,status,comments,updated,
IFNULL(name, 'Default Name') name,
IFNULL(description, 'Default Description') description
FROM records r
LEFT JOIN info i ON r.num = i.org
WHERE r.num = 7
It sounds like a simple LEFT JOIN from record to info will do the trick.
Use LEFT JOIN rather than JOIN to ensure you ALWAYS get all rows from the records table, plus the corresponding data from the info table whenever a cross-reference exists for that ID.
Whether you use your sub-queries or joins, if you always want to see all rows in the records table, you will always get NULLs for the info columns where no cross-reference exists. The only way to avoid that entirely is to select everything from records first, then iterate over the results in application code, querying info and conditionally attaching its data.
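The LEFT JOIN plus IFNULL pattern reduces to "look it up, fall back to a default". A small Python sketch over the example data (the default strings are placeholders):

```python
# info table as a dict: org -> (name, description), from the example.
info = {
    0: ("ACME", ""),
    1: ("AT&T", "Some Phone Company, Inc."),
}

def name_and_description(num):
    # Mirrors LEFT JOIN + IFNULL: a missing row yields default strings
    # instead of NULLs.
    return info.get(num, ("Default Name", "Default Description"))

print(name_and_description(1))  # ('AT&T', 'Some Phone Company, Inc.')
print(name_and_description(7))  # ('Default Name', 'Default Description')
```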

How can I optimize this SQL query with a large IN clause?

I have a fairly complicated operation that I'm trying to perform with just one SQL query but I'm not sure if this would be more or less optimal than breaking it up into n queries. Basically, I have a table called "Users" full of user ids and their associated fb_ids (id is the pk and fb_id can be null).
+-----------------+
| id | .. | fb_id |
|====|====|=======|
| 0 | .. | 12345 |
| 1 | .. | 31415 |
| .. | .. | .. |
+-----------------+
I also have another table called "Friends" that represents a friend relationship between two users. This uses their ids (not their fb_ids) and should be a two-way relationship.
+----------------+
| id | friend_id |
|====|===========|
| 0 | 1 |
| 1 | 0 |
| .. | .. |
+----------------+
// user 0 and user 1 are friends
So here's the problem:
We are given a particular user's id ("my_id") and an array of that user's Facebook friends (an array of fb_ids called fb_array). We want to update the Friends table so that it honors a Facebook friendship as a valid friendship among our users. It's important to note that not all of their Facebook friends will have an account in our database, so those friends should be ignored. This query will be called every time the user logs in so it can update our data if they've added any new friends on Facebook. Here's the query I wrote:
INSERT INTO Friends (id, friend_id)
SELECT "my_id", id FROM Users WHERE id IN
(SELECT id FROM Users WHERE fb_id IN fb_array)
AND id NOT IN
(SELECT friend_id FROM Friends WHERE id = "my_id")
The point of the first IN clause is to get the subset of all Users who are also your Facebook friends, and this is the main part I'm worried about. Because the fb_ids are given as an array, I have to parse all of the ids into one giant string separated by commas which makes up "fb_array." I'm worried about the efficiency of having such a huge string for that IN clause (a user may have hundreds or thousands of friends on Facebook). Can you think of any better way to write a query like this?
It's also worth noting that this query doesn't maintain the dual nature of a friend relationship, but that's not what I'm worried about (extending it for this would be trivial).
If I am not mistaken, your query can be simplified, if you have a UNIQUE constraint on the combination (id, friend_id), to:
INSERT IGNORE INTO Friends
(id, friend_id)
SELECT "my_id", id
FROM Users
WHERE fb_id IN fb_array ;
You should have an index on Users (fb_id, id) and test for efficiency. If the number of items in the array is too big (more than a few thousand), you may have to split the array and run the query more than once. Profile with your data and settings.
It depends on whether the following columns are nullable (i.e. the value can be NULL):
USERS.id
FRIENDS.friend_id
Nullable:
SELECT DISTINCT
"my_id", u.id
FROM Users u
WHERE u.fb_id IN fb_array
AND u.id NOT IN (SELECT f.friend_id
FROM FRIENDS f
WHERE f.id = "my_id")
Not Nullable:
SELECT "my_id", u.id
FROM Users u
LEFT JOIN FRIENDS f ON f.friend_id = u.id
AND f.id = "my_id"
WHERE u.fb_id IN fb_array
AND f.friend_id IS NULL
For more info:
http://explainextended.com/2010/05/27/left-join-is-null-vs-not-in-vs-not-exists-nullable-columns/
http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/
Speaking to the number of values in your array
The tests run in the two articles mentioned above contain 1 million rows, with 10,000 distinct values.
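The INSERT ... SELECT amounts to a set difference, which is easy to sanity-check in Python (ids and data below are hypothetical):

```python
fb_array = {12345, 31415, 99999}  # fb_ids sent by the client
users = {0: 12345, 1: 31415}      # our users: id -> fb_id (99999 has no account)
existing_friends = {1}            # friend_ids already stored for my_id

# Users whose fb_id appears in the array...
candidates = {uid for uid, fb_id in users.items() if fb_id in fb_array}
# ...minus the ones already recorded as friends.
to_insert = candidates - existing_friends

print(to_insert)
# {0}
```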

select * from table where column = something or, when unavailable, column = something else

I am looking for something more efficient and/or easier to read than the query that follows. The best way to explain the question is to provide a sample.
Assume the following table structure in MySQL that represents, say, a localization context for various strings in the application I am creating.
create table LOCALIZATION_TABLE (
    SET_NAME varchar(36) not null,
    LOCALE varchar(8) not null default '_',
    ENTRY_KEY varchar(36) not null,
    ENTRY_VALUE text null
);
alter table LOCALIZATION_TABLE
add constraint UQ_ENTRY
unique (SET_NAME, LOCALE, ENTRY_KEY);
Assume the following values are entered in the table:
insert into LOCALIZATION_TABLE (SET_NAME, LOCALE, ENTRY_KEY, ENTRY_VALUE)
values
('STD_TEXT', '_', 'HELLO', 'Hello!'),
('STD_TEXT', '_', 'GOODBYE', 'Goodbye.'),
('STD_TEXT', 'ge', 'GOODBYE', 'Lebewohl')
;
I want to select all the available entries for German ("ge"), and if not available use the English text ("_") by default. The query I am currently using is as follows:
select * from LOCALIZATION_TABLE where SET_NAME = 'STD_TEXT' and LOCALE = 'ge'
union
select * from LOCALIZATION_TABLE where SET_NAME = 'STD_TEXT' and LOCALE = '_'
and ENTRY_KEY not in (
select ENTRY_KEY from LOCALIZATION_TABLE where SET_NAME = 'STD_TEXT' and LOCALE = 'ge'
)
I really do not like the look of this query and I am certain there must be something more concise that could be used instead. Any help or pointers in the right direction would be appreciated. While this works, it just does not seem proper.
You can provide a custom ordering, then take the first row, like this:
select * from (
select *
from LOCALIZATION_TABLE
where SET_NAME = 'STD_TEXT'
order by field(LOCALE, 'ge', '_') -- this provides the custom ordering
) x
group by ENTRY_KEY; -- this captures the first row for each ENTRY_KEY
Explanation:
The inner select's order by field(LOCALE, 'ge', '_') gets you the rows in the order you define - in this case German first if it exists, then English (you could add more languages to the list).
The "trick" here is MySQL's non-standard GROUP BY behaviour when the non-grouped columns are not listed (most servers treat this as a syntax error): it simply returns the first row found for each group. The outer select's GROUP BY ENTRY_KEY, with no aggregate functions, therefore keeps the first row per ENTRY_KEY. Note that MySQL itself rejects this when ONLY_FULL_GROUP_BY is enabled, which is the default from MySQL 5.7 on.
Output of this query using your data:
+----------+--------+-----------+-------------+
| SET_NAME | LOCALE | ENTRY_KEY | ENTRY_VALUE |
+----------+--------+-----------+-------------+
| STD_TEXT | ge | GOODBYE | Lebewohl |
| STD_TEXT | _ | HELLO | Hello! |
+----------+--------+-----------+-------------+
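The order-then-take-first idea is portable logic even though the GROUP BY shortcut is MySQL-specific; a Python sketch of the same technique:

```python
rows = [
    ("STD_TEXT", "ge", "GOODBYE", "Lebewohl"),
    ("STD_TEXT", "_", "HELLO", "Hello!"),
    ("STD_TEXT", "_", "GOODBYE", "Goodbye."),
]
preference = {"ge": 0, "_": 1}  # like ORDER BY FIELD(LOCALE, 'ge', '_')

# Sort by locale preference, then keep only the first row per ENTRY_KEY.
best = {}
for row in sorted(rows, key=lambda r: preference[r[1]]):
    best.setdefault(row[2], row)

for entry in best.values():
    print(entry)
```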
I think your query is fine. But here's another approach:
SELECT
en.SET_NAME
, COALESCE(ge.LOCALE, en.LOCALE)
, en.ENTRY_KEY
, COALESCE(ge.ENTRY_VALUE, en.ENTRY_VALUE)
FROM
LOCALIZATION_TABLE AS en
LEFT JOIN
LOCALIZATION_TABLE AS ge
ON ge.ENTRY_KEY = en.ENTRY_KEY
AND ge.LOCALE = 'ge'
AND ge.SET_NAME = 'STD_TEXT'
WHERE
en.LOCALE = '_'
AND en.SET_NAME = 'STD_TEXT'
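The COALESCE self-join reduces to a two-level lookup with an English fallback; a minimal Python sketch over the sample data:

```python
entries = {  # (set_name, locale, entry_key) -> entry_value
    ("STD_TEXT", "_", "HELLO"): "Hello!",
    ("STD_TEXT", "_", "GOODBYE"): "Goodbye.",
    ("STD_TEXT", "ge", "GOODBYE"): "Lebewohl",
}

def localize(set_name, locale, key):
    # Prefer the requested locale, fall back to the '_' default,
    # mirroring COALESCE(ge.ENTRY_VALUE, en.ENTRY_VALUE).
    value = entries.get((set_name, locale, key))
    return value if value is not None else entries.get((set_name, "_", key))

print(localize("STD_TEXT", "ge", "GOODBYE"))  # Lebewohl
print(localize("STD_TEXT", "ge", "HELLO"))    # Hello!
```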