Indexing and joining JSON data in Postgres - json

We have a table of data that looks like:
CREATE TABLE objects (
id BIGSERIAL NOT NULL,
typeid BIGINT NOT NULL,
fullobject JSON,
PRIMARY KEY (id),
...
);
and
CREATE TABLE types (
id BIGSERIAL NOT NULL,
type VARCHAR(255),
PRIMARY KEY (id),
...
);
The objects.fullobject column holds JSON data for users and orgs, like:
// id: 1
{
...
"type":"1", // user
"orgs":[
{"id":"2", "position":"foo"},
{"id":"2", "position":"bar"},
{"id":"3", "position":"foo"}
]
}
// id: 2
{
...
"type":"2", // org
"name":"Org 1"
}
// id: 3
{
...
"type":"2", // org
"name":"Org 2"
}
The question is, if I want to find a user based on the name of the organisation, how do I do the join?
I'm not sure if I can create the right index on the org data inside a user.
The two solutions I can think of are:
Create a single text attribute that contains the ids of the orgs for a user, then do a join based on that (e.g. "searchableOrg": "|org1|org2|").
Then the query looks like:
SELECT * from objects user
INNER JOIN types usertype ON usertype.id = user.typeid
INNER JOIN objects org ON json_extract_path(user.fullobject, 'searchableOrg') LIKE '%|' || org.id || '|%'
INNER JOIN types orgtype ON orgtype.id = org.typeid
WHERE json_extract_path(org.fullobject, 'name') LIKE '%whatever%'
AND usertype.type = 'user'
AND orgtype.type = 'org'
Here I can have indexes on json_extract_path(user.fullobject, 'searchableOrg') and json_extract_path(org.fullobject, 'name').
However this doesn't work if we add things like a start or end date to the user's organisation membership and need to further filter the join on that.
Create a trigger that, whenever a user is created, updated or deleted, modifies a table (userorgtable) containing the user's org memberships, and do the join off of that.
I haven't looked at triggers yet, but hopefully the query would be something like:
SELECT * from objects user
INNER JOIN userorgtable userorg ON user.id = userorg.userid
INNER JOIN types usertype ON usertype.id = user.typeid
INNER JOIN objects org ON userorg.orgid = org.id
INNER JOIN types orgtype ON orgtype.id = org.typeid
WHERE json_extract_path(org.fullobject, 'name') LIKE '%whatever%'
AND usertype.type = 'user'
AND orgtype.type = 'org'
In this case, if we needed to add further attributes to the join (such as start and end dates), we could just put them in the userorgtable and add conditions like:
AND userorg.startdate < CURRENT_DATE
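To make the second option more concrete, here is a minimal sketch in Python, with the standard-library sqlite3 module standing in for Postgres. It flattens each user's "orgs" array into userorgtable rows (in application code here, where a trigger would do it in the database) and then joins through that table. The org_name SQL function, the position column and the helper names are assumptions made for the illustration only; in Postgres the name lookup would be a JSON function call instead.

```python
import json
import sqlite3

# Sample rows mirroring the objects table from the question.
OBJECTS = [
    (1, 1, json.dumps({"type": "1", "orgs": [
        {"id": "2", "position": "foo"},
        {"id": "2", "position": "bar"},
        {"id": "3", "position": "foo"},
    ]})),
    (2, 2, json.dumps({"type": "2", "name": "Org 1"})),
    (3, 2, json.dumps({"type": "2", "name": "Org 2"})),
]


def membership_rows(object_id, fullobject):
    """Flatten a user's "orgs" array into (userid, orgid, position) rows."""
    doc = json.loads(fullobject)
    return [(object_id, int(o["id"]), o["position"]) for o in doc.get("orgs", [])]


def build_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for extracting "name" from the JSON document.
    conn.create_function("org_name", 1, lambda doc: json.loads(doc).get("name"))
    conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, typeid INTEGER, fullobject TEXT)")
    conn.execute("CREATE TABLE userorgtable (userid INTEGER, orgid INTEGER, position TEXT)")
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", OBJECTS)
    for object_id, _typeid, fullobject in OBJECTS:
        conn.executemany(
            "INSERT INTO userorgtable VALUES (?, ?, ?)",
            membership_rows(object_id, fullobject),
        )
    return conn


def find_users_by_org_name(conn, pattern):
    """Return ids of users that belong to an org whose name matches pattern."""
    sql = """
        SELECT DISTINCT u.id
        FROM objects u
        JOIN userorgtable uo ON uo.userid = u.id
        JOIN objects org ON org.id = uo.orgid
        WHERE org_name(org.fullobject) LIKE ?
        ORDER BY u.id
    """
    return [row[0] for row in conn.execute(sql, (pattern,))]


if __name__ == "__main__":
    print(find_users_by_org_name(build_db(), "%Org 2%"))
```

Extra membership attributes such as start and end dates would become additional columns of userorgtable, so further join conditions would not need to touch the JSON at all.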
Is there another option? Perhaps using more of the Postgres JSON functions?
Thanks in advance

Related

Return a different datatype from postgresql

I have the below query in PG
SELECT
project.project_id,
project.project_name,
category.category_name,
array_agg(row(skill.skill_name,projects_skills.projects_skills_id)) AS skills
FROM project
JOIN projects_skills ON project.project_id = projects_skills.project_id
JOIN skill ON projects_skills.skill_id = skill.skill_id
JOIN category ON project.category_id = category.category_id
GROUP BY project.project_name,project.project_id, category.category_name;
Of particular interest is the line below, which seems to return a pseudo-type tuple:
array_agg(row(skill.skill_name,projects_skills.projects_skills_id)) AS skills
I'm unable to create a view of this because of the pseudo type - in addition to this, the row function seems to return a tuple set like the below:
skills: '{"(Python,3)","(Node,4)","(Javascript,5)"}' }
I could painfully parse it in JavaScript by replacing '(' to '[' etc. but could I do something in postgres to return it preferably as an object?
One possible solution is to register a row type (once):
CREATE TYPE my_type AS (skill_name text, projects_skills_id int);
I am guessing text and int as data types. Use the actual data types of the underlying tables.
SELECT p.project_id, p.project_name, c.category_name
, array_agg((s.skill_name, ps.projects_skills_id)::my_type) AS skills
FROM project p
JOIN projects_skills ps ON p.project_id = ps.project_id
JOIN skill s ON ps.skill_id = s.skill_id
JOIN category c ON p.category_id = c.category_id
GROUP BY p.project_id, p.project_name, c.category_name;
There are many other options, depending on your version of Postgres and what you need exactly.
As well as the excellent suggestions in the comments to use JSON, and @Erwin's suggestion to use a registered composite type, you can use a two-dimensional array or a multivalues approach:
Just replace your line
array_agg(row(skill.skill_name::text,projects_skills.projects_skills_id::text)) AS skills
with the following:
Two-dimensional array, option 1
array_agg(array[skill.skill_name::text,projects_skills.projects_skills_id::text]) AS skills
-- skills will be '{{Python,3},{Node,4},{Javascript,5}}', thus
-- skills[1][1] = 'Python' and skills[1][2] = '3' -- id is text
Two-dimensional array, option 2
array[array_agg(skill.skill_name::text),array_agg(projects_skills.projects_skills_id::text)] AS skills
-- skills will be '{{Python,Node,Javascript},{3,4,5}}', thus
-- skills[1][1] = 'Python' and skills[2][1] = '3' -- id is text
Multivalues
array_agg(skill.skill_name) AS skill_names,
array_agg(projects_skills.projects_skills_id) AS skills_ids
-- skill_names = '{Python,Node,Javascript}' and skills_ids = '{3,4,5}', thus
-- skill_names[1] = 'Python' and skills_ids[1] = 3 -- id is integer
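On the client side, either array shape can be turned into a list of objects with a few lines of code. The sketch below is in Python and assumes the database driver already hands the Postgres arrays over as nested lists; the function names are made up for the illustration.

```python
def pair_2d(skills):
    """Option 1 shape: [[name, id_as_text], ...] -> list of dicts."""
    return [
        {"skill_name": name, "projects_skills_id": int(id_text)}
        for name, id_text in skills
    ]


def pair_multivalues(skill_names, skills_ids):
    """Multivalues shape: two parallel lists -> list of dicts."""
    if len(skill_names) != len(skills_ids):
        raise ValueError("parallel arrays must have the same length")
    return [
        {"skill_name": name, "projects_skills_id": skill_id}
        for name, skill_id in zip(skill_names, skills_ids)
    ]


if __name__ == "__main__":
    print(pair_2d([["Python", "3"], ["Node", "4"], ["Javascript", "5"]]))
    print(pair_multivalues(["Python", "Node", "Javascript"], [3, 4, 5]))
```

Both calls produce the same list of dicts; the only difference is that the two-dimensional variant has to convert the id back from text.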

Django querysets: Excluding NULL values across multiple joins

I'm trying to avoid using extra() here, but haven't found a way to get the results I want using Django's other queryset methods.
My models relationships are as follows:
Model: Enrollment
FK to Course
FK to User
FK to Mentor (can be NULL)
Model: Course
FK to CourseType
In a single query: given a User, I'm trying to get all of the CourseTypes they have access to. A User has access to a CourseType if they have an Enrollment with both a Course of that CourseType AND an existing Mentor.
This user has 2 Enrollments: one in a Course for CourseType ID 6, and the other for a Course for CourseType ID 7, but her enrollment for CourseType ID 7 does not have a mentor, so she does not have access to CourseType ID 7.
user = User.objects.get(pk=123)
This works fine: Get all of the CourseTypes that the user has enrollments for, but don't (yet) query for the mentor requirement:
In [28]: CourseType.objects.filter(course__enrollment__user=user).values('pk')
Out[28]: [{'pk': 6L}, {'pk': 7L}]
This does not give me the result I want: Excluding enrollments with NULL mentor values. I want it to return only ID 6 since that is the only enrollment with a mentor, but it returns an empty queryset:
In [29]: CourseType.objects.filter(course__enrollment__user=user).exclude(course__enrollment__mentor=None).values('pk')
Out[29]: []
Here's the generated SQL for the last queryset that isn't returning what I want it to:
SELECT `courses_coursetype`.`id` FROM `courses_coursetype` INNER JOIN `courses_course` ON ( `courses_coursetype`.`id` = `courses_course`.`course_type_id` ) INNER JOIN `store_enrollment` ON ( `courses_course`.`id` = `store_enrollment`.`course_id` ) WHERE (`store_enrollment`.`user_id` = 3877 AND NOT (`courses_coursetype`.`id` IN (SELECT U0.`id` AS `id` FROM `courses_coursetype` U0 LEFT OUTER JOIN `courses_course` U1 ON ( U0.`id` = U1.`course_type_id` ) LEFT OUTER JOIN `store_enrollment` U2 ON ( U1.`id` = U2.`course_id` ) WHERE U2.`mentor_id` IS NULL)))
The problem, it seems, is that in implementing the exclude(), Django is creating a subquery which is excluding more rows than I want excluded.
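To illustrate how the subquery can end up excluding every row, here is a minimal sketch that runs both WHERE clauses with Python's sqlite3 in place of MySQL. The table and column names come from the generated SQL above; the data is made up, and the third enrollment (another user in the same course without a mentor) is an assumption about what might be in the real tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE courses_coursetype (id INTEGER PRIMARY KEY);
CREATE TABLE courses_course (id INTEGER PRIMARY KEY, course_type_id INTEGER);
CREATE TABLE store_enrollment (
    id INTEGER PRIMARY KEY, course_id INTEGER, user_id INTEGER, mentor_id INTEGER
);
INSERT INTO courses_coursetype VALUES (6), (7);
INSERT INTO courses_course VALUES (60, 6), (70, 7);
INSERT INTO store_enrollment VALUES
    (1, 60, 3877, 5),     -- our user, CourseType 6, has a mentor
    (2, 70, 3877, NULL),  -- our user, CourseType 7, no mentor
    (3, 60, 9999, NULL);  -- assumed: another user, CourseType 6, no mentor
"""

BASE = """
SELECT ct.id FROM courses_coursetype ct
INNER JOIN courses_course c ON ct.id = c.course_type_id
INNER JOIN store_enrollment e ON c.id = e.course_id
WHERE e.user_id = ?
"""

# Shape of the exclude() query: drop any CourseType that has at least one
# mentor-less enrollment, regardless of which user that enrollment belongs to.
EXCLUDE_STYLE = BASE + """
AND NOT (ct.id IN (
    SELECT u0.id FROM courses_coursetype u0
    LEFT OUTER JOIN courses_course u1 ON u0.id = u1.course_type_id
    LEFT OUTER JOIN store_enrollment u2 ON u1.id = u2.course_id
    WHERE u2.mentor_id IS NULL))
ORDER BY ct.id
"""

# Shape of the extra() query: the mentor test applies to the same joined
# enrollment row as the user test.
SAME_ROW_STYLE = BASE + "AND e.mentor_id IS NOT NULL ORDER BY ct.id"


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def course_type_ids(conn, sql, user_id):
    return [row[0] for row in conn.execute(sql, (user_id,))]


if __name__ == "__main__":
    conn = build_db()
    print(course_type_ids(conn, EXCLUDE_STYLE, 3877))
    print(course_type_ids(conn, SAME_ROW_STYLE, 3877))
```

With this data the exclude-style query returns nothing, because the other user's mentor-less enrollment puts CourseType 6 into the subquery, while the same-row query returns only CourseType 6.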
To get the desired results, I had to use extra() to explicitly exclude NULL Mentor values in the WHERE clause:
In [36]: CourseType.objects.filter(course__enrollment__user=user).extra(where=['store_enrollment.mentor_id IS NOT NULL']).values('pk')
Out[36]: [{'pk': 6L}]
Is there a way to get this result without using extra()? If not, should I file a ticket with Django per the docs? I looked at the existing tickets and searched for this issue but unfortunately came up short.
I'm using Django 1.7.10 with MySQL.
Thanks!
Try using isnull.
CourseType.objects.filter(
course__enrollment__user=user,
course__enrollment__mentor__isnull=False,
).values('pk')
Instead of exclude() you can create complex queries using Q(), or in your case ~Q():
from django.db.models import Q
filter_q = Q(course__enrollment__user=user) & ~Q(course__enrollment__mentor=None)
CourseType.objects.filter(filter_q).values('pk')
This might lead to a different SQL statement.
See docs:
https://docs.djangoproject.com/en/3.2/topics/db/queries/#complex-lookups-with-q-objects

Retrieve PRIMARY KEY value from SELECT DISTINCT with INNER JOIN

I have a set of joined tables that I am querying in the following way:
$query = $this->mysqli->query("
SELECT DISTINCT name, website, email
FROM sfe
INNER JOIN ef ON sfe.ID_SFE = ef.ID_SFE
INNER JOIN f ON f.ID_F = ef.ID_F
INNER JOIN ad ON ad.ID_SFE = ef.ID_SFE
WHERE name LIKE '%{$sanitized}%' OR
website LIKE '%{$sanitized}%' OR
business_name LIKE '%{$sanitized}%' OR
email LIKE '%{$sanitized}%'
");
where ID_SFE is the primary key of table sfe and also the foreign key of ef.
When I make this query, I then echo the list of results with the following:
while ($result = $query->fetch_object()) {
$query_result = $result->name;
echo "$query_result";
}
I would now also like to get the value of ID_SFE inside the same while loop. I tried adding ID_SFE to the SELECT DISTINCT list together with name, website, email, but I get ERROR: There was a problem with the query.
How can I get the value of ID_SFE and store it to another variable inside the while loop?
Thanks
You cannot simply add ID_SFE to the list of fields to retrieve, because a field with that name exists in more than one of the joined tables (sfe, ef and ad), so the reference is ambiguous.
You can add sfe.ID_SFE, ef.ID_SFE and/or ad.ID_SFE instead. Note that you need to prefix the field with its table name so that it is referenced unambiguously.
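The ambiguity is easy to reproduce. Here is a minimal sketch in Python using the standard-library sqlite3 module rather than PHP and MySQL; the error text differs between databases, but the cause is the same. Only two of the tables are modelled, and the assumption that name, website and email live in sfe is made up for the illustration.

```python
import sqlite3

# ID_SFE without a table prefix: exists in both sfe and ef.
UNQUALIFIED = """
SELECT DISTINCT ID_SFE, name, website, email
FROM sfe
INNER JOIN ef ON sfe.ID_SFE = ef.ID_SFE
WHERE name LIKE ?
"""

# Same query with the column qualified by its table name.
QUALIFIED = """
SELECT DISTINCT sfe.ID_SFE, name, website, email
FROM sfe
INNER JOIN ef ON sfe.ID_SFE = ef.ID_SFE
WHERE name LIKE ?
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sfe (ID_SFE INTEGER PRIMARY KEY, name TEXT, website TEXT, email TEXT);
        CREATE TABLE ef (ID_SFE INTEGER, ID_F INTEGER);
        INSERT INTO sfe VALUES (1, 'Acme', 'acme.example', 'info@acme.example');
        INSERT INTO ef VALUES (1, 10), (1, 11);
    """)
    return conn


def is_ambiguous(conn, sql, term):
    """Return True if the database rejects the query."""
    try:
        conn.execute(sql, (f"%{term}%",)).fetchall()
    except sqlite3.OperationalError:
        return True
    return False


def search(conn, term):
    """Run the qualified query; the search term is passed as a bound parameter."""
    return conn.execute(QUALIFIED, (f"%{term}%",)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(is_ambiguous(conn, UNQUALIFIED, "Acme"))
    print(search(conn, "Acme"))
```

Inside the result loop, the id is then available as the first column of each row, next to name, website and email.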

Get Unique rows using Entity framework

I am using 3 tables to set roles for users.
1. module
Id, Name
2. actions
Id , Name ,ModuleId (Foreign key with modules)
3. userActions
Id,UserId,ActionId (Foreign key with actions)
I want to get the unique list of modules for a user from the userActions table. I am using Entity Framework and my database is MySQL.
I used the query
var result = (from p in my_accountEntities.useractions
where p.UserId == item.Id
select p.action.module).ToList();
List<module> modules = new List<module>();
if (result != null)
{
modules = (List<module>)result;
}
It's not returning a unique list; it returns one module for every row in the useractions table.
How can I get the unique list of modules (based on moduleId)?
Try using the LINQ .Distinct() extension method:
var result = (from p in my_accountEntities.useractions
where p.UserId == item.Id
select p.action.module)
.Distinct().ToList();
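For comparison, here is a rough sketch of the same idea at the SQL level, written in Python with sqlite3. It only illustrates what removing duplicates does to the joined rows; it is not the SQL that Entity Framework actually generates, and the sample data is made up.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE module (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE actions (Id INTEGER PRIMARY KEY, Name TEXT, ModuleId INTEGER);
        CREATE TABLE useractions (Id INTEGER PRIMARY KEY, UserId INTEGER, ActionId INTEGER);
        INSERT INTO module VALUES (1, 'Billing'), (2, 'Reports');
        INSERT INTO actions VALUES (1, 'view', 1), (2, 'edit', 1), (3, 'export', 2);
        INSERT INTO useractions VALUES (1, 7, 1), (2, 7, 2), (3, 7, 3);
    """)
    return conn


def modules_for_user(conn, user_id, distinct):
    """Return the modules reachable through a user's actions."""
    keyword = "DISTINCT" if distinct else ""
    sql = f"""
        SELECT {keyword} m.Id, m.Name
        FROM useractions ua
        JOIN actions a ON a.Id = ua.ActionId
        JOIN module m ON m.Id = a.ModuleId
        WHERE ua.UserId = ?
        ORDER BY m.Id
    """
    return conn.execute(sql, (user_id,)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(modules_for_user(conn, 7, distinct=False))
    print(modules_for_user(conn, 7, distinct=True))
```

Without DISTINCT the user's three actions yield three rows, two of them for the same module; with DISTINCT each module appears once.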

SQL VIEW WITH JOINS

I have 3 tables
Node table - Nodeid, Node relationship id(NodeRelID)
Node relationship table - id, Nodeid, Node Link id
Eventstatus table - id, Nodeid, Node Status.
I want to create a view that displays each node's id and the status of the node related to it. I have done that here:
CREATE VIEW `view_alarm` AS
select `node`.`NodeID` AS `NodeID`,`eventstatus`.`EventID` AS `EventID`
from ((`node` join `node_relationship`) join `eventstatus`)
where ((`node`.`NodeRelID` = `node_relationship`.`id`) and (`node_relationship`.`Node_LinkID` = `eventstatus`.`NodeID`));
Now I would also like to retrieve any nodes that do not have a relationship and automatically give them a 0 in place of the relationship status. I would like this returned by the same view, so I have attempted it via a CASE expression in the view, like so:
CREATE view `view_alarm` AS select
`node`.`NodeID` AS `NodeID`,
(case when (`node_relationship`.`Node_LinkID` = `eventstatus`.`NodeID`) then `eventstatus`.`EventID`
when (`node_relationship`.`Node_LinkID` <> `eventstatus`.`NodeID`) then `eventstatus`.`EventID` '0' end) AS `EventID`
from ((`node` join `node_relationship`) join `eventstatus`)
where (`node`.`NodeRelID` = `node_relationship`.`id`);
Can someone point me in the right direction.
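As a point of reference, one common way to get a default of 0 for nodes without a relationship is to combine LEFT JOINs with COALESCE. The sketch below tries this out in Python with sqlite3 in place of MySQL; the column layout is inferred from the view definition above and the sample rows are made up.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (NodeID INTEGER PRIMARY KEY, NodeRelID INTEGER);
        CREATE TABLE node_relationship (id INTEGER PRIMARY KEY, NodeID INTEGER, Node_LinkID INTEGER);
        CREATE TABLE eventstatus (EventID INTEGER PRIMARY KEY, NodeID INTEGER);

        -- Node 1 is linked to node 2; nodes 2 and 3 have no relationship.
        INSERT INTO node VALUES (1, 100), (2, NULL), (3, NULL);
        INSERT INTO node_relationship VALUES (100, 1, 2);
        INSERT INTO eventstatus VALUES (5, 2);

        -- LEFT JOIN keeps nodes without a match; COALESCE turns the
        -- resulting NULL into 0.
        CREATE VIEW view_alarm AS
        SELECT n.NodeID AS NodeID, COALESCE(es.EventID, 0) AS EventID
        FROM node n
        LEFT JOIN node_relationship nr ON n.NodeRelID = nr.id
        LEFT JOIN eventstatus es ON nr.Node_LinkID = es.NodeID;
    """)
    return conn


def alarms(conn):
    return conn.execute(
        "SELECT NodeID, EventID FROM view_alarm ORDER BY NodeID"
    ).fetchall()


if __name__ == "__main__":
    print(alarms(build_db()))
```

Node 1 gets the EventID of the node it is linked to, while nodes 2 and 3 come back with 0.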
Use outer joins in a recursive join
FROM Node n
LEFT JOIN Node_Relationship nr1
ON n.key = nr1.key
LEFT JOIN Node_Relationship nr2
ON n.key = nr2.Key
AND n.key IS NULL
Use it in your case:
CASE
WHEN nr2.[key] IS NOT NULL THEN 0