Sqlalchemy union of two similar tables

I have two SQL tables (messages, messages_processed) that are similar. messages_processed has one more column than messages; the other columns have the same data types and structure in both. To show all messages (processed and regular) for a particular user, I need a union of the two tables.
class Message(object):
    def __init__(self, sender_id, text, user_id):
        self.sender_id = sender_id
        self.text = text
        self.user_id = user_id
        self.categories = []  # N:M relation
class MessageProcessed(object):
    def __init__(self, sender_id, text, user_id, action):
        self.sender_id = sender_id
        self.text = text
        self.user_id = user_id
        self.categories = []  # N:M relation
        self.action = action
I cannot change the existing table structure. I need to do something like the following, which should return a list of ORM objects with the N:M mapping intact.
(session.query(Message).filter(Message.user_id == 12)
    .union(session.query(MessageProcessed)
           .filter(MessageProcessed.user_id == 12))
    .all())

It looks like you cannot use UNION in your case because each SELECT statement within the UNION must have the same number of columns.
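To illustrate the column-count point, here is a minimal sketch at the SQL level using Python's built-in sqlite3 module rather than SQLAlchemy. The table layout and sample rows are assumptions made up for the demo. Padding the shorter SELECT with a NULL placeholder for action gives both sides the same number of columns, so the UNION becomes valid. Note that this returns plain rows, not ORM objects, so it does not cover the N:M categories mapping from the question.

```python
import sqlite3

# In-memory database with two hypothetical tables mirroring the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY, sender_id INTEGER, text TEXT, user_id INTEGER
    );
    CREATE TABLE messages_processed (
        id INTEGER PRIMARY KEY, sender_id INTEGER, text TEXT, user_id INTEGER,
        action TEXT
    );
    INSERT INTO messages (sender_id, text, user_id)
        VALUES (1, 'hello', 12), (2, 'other user', 99);
    INSERT INTO messages_processed (sender_id, text, user_id, action)
        VALUES (3, 'done', 12, 'archived');
""")

# Pad the first SELECT with NULL so both sides have the same columns.
# The extra 'source' column records which table each row came from.
rows = conn.execute("""
    SELECT sender_id, text, user_id, NULL AS action, 'messages' AS source
    FROM messages WHERE user_id = :uid
    UNION ALL
    SELECT sender_id, text, user_id, action, 'messages_processed' AS source
    FROM messages_processed WHERE user_id = :uid
    ORDER BY source, sender_id
""", {"uid": 12}).fetchall()

for row in rows:
    print(row)
```

In SQLAlchemy the same idea would mean selecting explicit columns plus a labelled NULL on the Message side instead of querying the whole entity.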

Related

joinedload and load_only but with filtering

I have two models with a simple FK relationship, Stock and Restriction (Restriction.stock_id FK to Stock).
class Restriction(Model):
    __tablename__ = "restrictions"
    id = db.Column(db.Integer, primary_key=True)
    stock_id = db.Column(db.Integer, db.ForeignKey("stocks.id"), nullable=True)
    name = db.Column(db.String(50), nullable=False)
class Stock(Model):
    __tablename__ = "stocks"
    id = db.Column(db.Integer, primary_key=True)
    ticker = db.Column(db.String(50), nullable=False, index=True)
I would like to retrieve a Restriction object and its related Stock, but only the Stock's ticker (other fields are omitted here). I can do this simply with:
from sqlalchemy.orm import joinedload
my_query = Restriction.query.options(
    joinedload(Restriction.stock).load_only(Stock.ticker)
)
r = my_query.first()
With the above I get all columns for Restriction and only ticker for Stock. I can see this in the emitted SQL, and I can access r.stock.ticker without triggering a new query, since it is loaded eagerly.
The problem is that I cannot filter on stocks now: SQLAlchemy adds another FROM clause if I do my_query.filter(Stock.ticker == 'GME'). This produces a cross product, which is not what I want.
On the other hand, I cannot load selected columns from the relationship using join, i.e.
Restriction.query.join(Restriction.stock).options(load_only(Restriction.stock))
does not seem to work with a relationship property. How can I apply a single filter and still load only selected columns from the related table?
SQL I want run is:
SELECT restrictions.*, stocks.ticker
FROM restrictions LEFT OUTER JOIN stocks ON stocks.id = restrictions.stock_id
WHERE stocks.ticker = 'GME'
And I'd like to get a Restriction object back with its stock and stock's ticker. Is this possible?

joinedload generally should not be combined with filter: the eager join uses an anonymous alias of the stocks table, so a filter on Stock.ticker refers to a different FROM entry. You probably want the contains_eager option together with an explicit join.
from sqlalchemy.orm import contains_eager
my_query = Restriction.query.join(Restriction.stock).options(
    contains_eager(Restriction.stock).load_only(Stock.ticker)
).filter(Stock.ticker == 'GME')
r = my_query.first()
Stock.id will also be in the results beside Stock.ticker, because primary key columns are always loaded. Other fields are omitted, as you wish.
I have written short post about it recently if you are interested: https://jorzel.hashnode.dev/an-orm-can-bite-you
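To make the cross product concrete, here is a small sketch using Python's built-in sqlite3 module instead of SQLAlchemy. The sample rows and the alias name s_eager are assumptions for the demo; the first query only approximates the shape of the SQL that joinedload plus filter tends to produce, and the second matches the explicit join used with contains_eager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stocks (id INTEGER PRIMARY KEY, ticker TEXT NOT NULL);
    CREATE TABLE restrictions (
        id INTEGER PRIMARY KEY,
        stock_id INTEGER REFERENCES stocks(id),
        name TEXT NOT NULL
    );
    INSERT INTO stocks (id, ticker) VALUES (1, 'GME'), (2, 'AMC');
    INSERT INTO restrictions (id, stock_id, name)
        VALUES (1, 1, 'no-short'), (2, 2, 'halted'), (3, NULL, 'unassigned');
""")

# Shape of joinedload + filter: the eager join runs against an alias,
# while the WHERE clause hits a second, unrelated copy of stocks.
cross = conn.execute("""
    SELECT r.id, r.name, s_eager.ticker
    FROM stocks, restrictions AS r
    LEFT OUTER JOIN stocks AS s_eager ON s_eager.id = r.stock_id
    WHERE stocks.ticker = 'GME'
    ORDER BY r.id
""").fetchall()

# Shape of join + contains_eager: one join, and the filter applies to it.
joined = conn.execute("""
    SELECT r.id, r.name, s.ticker
    FROM restrictions AS r
    JOIN stocks AS s ON s.id = r.stock_id
    WHERE s.ticker = 'GME'
    ORDER BY r.id
""").fetchall()

print(cross)   # every restriction comes back; the filter did not narrow them
print(joined)  # only the GME restriction
```

In the first result the filter has no effect on which restrictions are returned, which is the symptom described in the question.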

SQLAlchemy: Counting multiple relationships - best way?

Imagine the following (example) datamodel:
class Organization(db.Model):
    __tablename__ = 'organizations'  # matches the 'organizations.id' foreign keys below
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    friendly_name = db.Column(db.Text, nullable=False)
    users = db.relationship('User', back_populates='organizations')
    groups = db.relationship('Group', back_populates='organizations')
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    organization_id = db.Column(db.Integer, db.ForeignKey('organizations.id'))
    organizations = db.relationship("Organization", back_populates="users")
class Group(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    organization_id = db.Column(db.Integer, db.ForeignKey('organizations.id'))
    organizations = db.relationship("Organization", back_populates="groups")
(so basically an Organization has User and Group relationships)
What we want is to retrieve the counts for users and groups. Result should be similar to the following:
id  friendly_name  users_count  groups_count
1   o1             33           3
2   o2             12           2
3   o3             1            0
This can be achieved with a query similar to
query = db.session.query(
    Organization.friendly_name,
    func.count(User.id.distinct()).label('users_count'),
    func.count(Group.id.distinct()).label('groups_count'),
) \
    .outerjoin(User, Organization.users) \
    .outerjoin(Group, Organization.groups) \
    .group_by(Organization.id)
which seems like overkill. The first intuitive approach would be something like
query = db.session.query(
    Organization.friendly_name,
    func.count(distinct(Organization.users)).label('users_count'),
    func.count(distinct(Organization.groups)).label('groups_count'),
)  # with or without outerjoins
which does not work (note: with one relationship it would work).
a) What's the difference between User.id.distinct() and distinct(Organization.users) in this case?
b) What would be the best/most performant/recommended way in SQLAlchemy to get a count for each relationship an object has?
Bonus) If the whole model were selected instead of Organization.friendly_name (...query(Organization, func....)), SQLAlchemy returns a tuple of the form (Organization, users_count, groups_count). Is there a way to return just the Organization with the two counts as additional fields (as SQL would)?

b:
You can try a window function to count users and groups, which may perform better:
query = (
    db.session.query(
        Organization.friendly_name,
        func.count().over(partition_by=(User.id, Organization.id)).label('users_count'),
        func.count().over(partition_by=(Group.id, Organization.id)).label('groups_count'),
    )
    .outerjoin(User, Organization.users)
    .outerjoin(Group, Organization.groups)
)
Note that a window function does not collapse rows the way GROUP BY does, so this returns one row per joined row; check the counts against the COUNT(DISTINCT ...) query from the question before relying on it.
Bonus:
To return the counts as fields of Organization, you can use hybrid_property, but you would not be happy with the performance.
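For reference, here is a sketch of why the question's query needs DISTINCT, written against Python's built-in sqlite3 module rather than SQLAlchemy. The table names (users, user_groups) and sample data are assumptions for the demo. Joining two one-to-many relationships at once multiplies the rows per organization, so plain COUNT over-counts; COUNT(DISTINCT ...) corrects that, and correlated scalar subqueries are one alternative that avoids the fan-out entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, friendly_name TEXT NOT NULL);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        organization_id INTEGER REFERENCES organizations(id)
    );
    CREATE TABLE user_groups (
        id INTEGER PRIMARY KEY,
        organization_id INTEGER REFERENCES organizations(id)
    );
    INSERT INTO organizations VALUES (1, 'o1'), (2, 'o2');
    -- o1 has 3 users and 2 groups; o2 has 1 user and no groups
    INSERT INTO users (organization_id) VALUES (1), (1), (1), (2);
    INSERT INTO user_groups (organization_id) VALUES (1), (1);
""")

JOINS = """
    FROM organizations AS o
    LEFT OUTER JOIN users AS u ON u.organization_id = o.id
    LEFT OUTER JOIN user_groups AS g ON g.organization_id = o.id
    GROUP BY o.id ORDER BY o.id
"""

# Plain COUNT sees the 3 x 2 = 6 joined rows for o1.
naive = conn.execute(
    "SELECT o.friendly_name, COUNT(u.id), COUNT(g.id)" + JOINS
).fetchall()

# COUNT(DISTINCT ...) undoes the row multiplication.
distinct = conn.execute(
    "SELECT o.friendly_name, COUNT(DISTINCT u.id), COUNT(DISTINCT g.id)" + JOINS
).fetchall()

# Correlated scalar subqueries count each relationship independently.
subqueries = conn.execute("""
    SELECT o.friendly_name,
           (SELECT COUNT(*) FROM users AS u WHERE u.organization_id = o.id),
           (SELECT COUNT(*) FROM user_groups AS g WHERE g.organization_id = o.id)
    FROM organizations AS o ORDER BY o.id
""").fetchall()

print(naive)       # [('o1', 6, 6), ('o2', 1, 0)]
print(distinct)    # [('o1', 3, 2), ('o2', 1, 0)]
print(subqueries)  # [('o1', 3, 2), ('o2', 1, 0)]
```

Which variant is fastest depends on the database and the data, so it is worth comparing query plans on real tables.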

How to delete rows with duplicate columns in django

I have a model
class Foo(models.Model):
    first = models.CharField()
    second = models.CharField()
The data I have is
first second
1 2
1 2
1 2
3 4
Now I want to delete all duplicate rows and keep one entry. The end result should be
first second
1 2
3 4
How do I do this? I checked this question (Annotate) but could not figure it out properly.
I have tried
foo_ids = Foo.objects.annotate(first_c=Count('first'), second_c=Count('second')).filter(first_c__gt=1, second_c__gt=1).values('first', 'second', 'id')
Then I tried to figure out how to keep one row from each list of duplicates while deleting the rest.

I ended up doing this.
from django.db.models import Count
duplicate_foo = Foo.objects.values('first', 'second').annotate(id_c=Count('id')).filter(id_c__gt=1)
for dups in duplicate_foo:
    # keep the first row of each duplicate group and delete the rest
    for i, val in enumerate(Foo.objects.filter(first=dups['first'],
                                               second=dups['second'])):
        if i == 0:
            continue
        val.delete()
Not the most optimized solution, but it works.

It's an older thread, but both answers don't fit the bill for large datasets and lead to a huge number of queries.
You can use this generic method instead:
from django.apps import apps
from django.db.models import Count, Max
def get_duplicates_from_fields(app, model, fields, delete=False, **filters):
    """
    Returns duplicate records based on a list of fields,
    optionally deletes duplicate records.
    """
    Model = apps.get_model(app, model)
    duplicates = (Model.objects.filter(**filters).values(*fields)
                  .order_by()
                  .annotate(_max_id=Max('id'), _count=Count('id'))
                  .filter(_count__gt=1))
    for duplicate in duplicates:
        if delete:
            (
                Model.objects
                .filter(**filters)
                .filter(**{x: duplicate[x] for x in fields})
                .exclude(id=duplicate['_max_id'])
                .delete()
            )
        else:
            print(duplicate)
You can use this method as such:
get_duplicates_from_fields('myapp', 'Foo', ['first', 'second'], True)
This lets you find and delete duplicate records based on any number of fields.
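The same keep-the-highest-id idea can also be expressed as a single SQL statement. Below is a minimal sketch using Python's built-in sqlite3 module, not the Django ORM; the table name foo and the sample rows are assumptions taken from the question's example. It deletes every row whose id is not the maximum id of its (first, second) group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are quoted because FIRST is a keyword in newer SQLite versions.
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, "first" TEXT, "second" TEXT);
    INSERT INTO foo ("first", "second")
        VALUES ('1', '2'), ('1', '2'), ('1', '2'), ('3', '4');
""")

# Keep only the row with the highest id in each (first, second) group.
cursor = conn.execute("""
    DELETE FROM foo
    WHERE id NOT IN (
        SELECT MAX(id) FROM foo GROUP BY "first", "second"
    )
""")
deleted = cursor.rowcount

remaining = conn.execute(
    'SELECT id, "first", "second" FROM foo ORDER BY id'
).fetchall()

print(deleted)    # 2
print(remaining)  # [(3, '1', '2'), (4, '3', '4')]
```

Some databases (MySQL, for example) restrict selecting from the table being deleted from in a subquery, so this exact statement may need adapting there.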

Select records from two SQLFORM.grid and insert the records into the third table

I want to insert multiple records into my belongs table, where the records are selected from two tables via SQLFORM.grid.
My tables:
db.define_table('problem',
    Field('title', 'string', unique=True, length=255),
    format='%(title)s')
db.define_table('tasks',
    Field('title', 'string', unique=True, length=255),
    format='%(title)s')
db.define_table('belongs',
    Field('task_id', 'reference tasks'),
    Field('problem_id', 'reference problem'))
I want to select some records from the problem table and one record from the tasks table, then insert the combinations into the belongs table. Can this be done with SQLFORM.grid?
def problemtask():
    form = SQLFORM.grid(db.problem, selectable=lambda ids: insert(ids, ids1))
    form1 = SQLFORM.grid(db.tasks, selectable=lambda ids1: insert(ids, ids1))
    return dict(form=form, form1=form1)
def insert(ids, ids1):
    pass  # not written yet
Thanks!

Select one record from one table, then select some records from another table, and finally insert the combinations into the third table.
def showtask():
    id = request.args(0, cast=int)  # id is the course_id
    db.tasks._common_filter = lambda query: db.tasks.course_id == id
    links = [lambda row: A('createproblem', _href=URL("default", "addproblem", args=[row.id])),
             lambda row: A('showproblem', _href=URL("default", "showproblem", args=[row.id]))]
    form = SQLFORM.smartgrid(db.tasks, args=request.args[:1], links=links, linked_tables=[''], csv=False)
    return dict(form=form)
def mulassignproblem():
    taskid = request.args(0, cast=int)
    form = SQLFORM.grid(db.problem, args=request.args[:1], selectable=lambda ids: mulproblem(ids, taskid))
    return dict(form=form)
def mulproblem(ids, taskid):
    # insert each selected problem for the task unless the pair already exists
    for problemid in ids:
        exists = db((db.belongs.problem_id == problemid) & (db.belongs.task_id == taskid)).count()
        if not exists:
            db.belongs.insert(task_id=taskid, problem_id=problemid)

expanding the SQL query inside managers in Django models?

Here is the code from django docs that explains the use of managers.
class PollManager(models.Manager):
    def with_counts(self):
        from django.db import connection
        cursor = connection.cursor()
        cursor.execute("""
            SELECT p.id, p.question, p.poll_date, COUNT(*)
            FROM polls_opinionpoll p, polls_response r
            WHERE p.id = r.poll_id
            GROUP BY p.id, p.question, p.poll_date
            ORDER BY p.poll_date DESC""")
        result_list = []
        for row in cursor.fetchall():
            p = self.model(id=row[0], question=row[1], poll_date=row[2])
            p.num_responses = row[3]
            result_list.append(p)
        return result_list
class OpinionPoll(models.Model):
    question = models.CharField(max_length=200)
    poll_date = models.DateField()
    objects = PollManager()
class Response(models.Model):
    poll = models.ForeignKey(OpinionPoll)
    person_name = models.CharField(max_length=50)
    response = models.TextField()
I have two questions based on this code:
1) Where is r.poll_id coming from? I understand that Response has a ForeignKey relationship to OpinionPoll, so to JOIN the OpinionPoll table with the Response table I need to join on their ids.
However, to access the poll id in Response, I would write r.poll.id.
Is r.poll_id MySQL syntax?
Also, why GROUP BY p.id, p.question, p.poll_date? Why is GROUP BY p.id alone not sufficient?
2) Is it possible to turn the above raw SQL query into a Django ORM query? If so, what would that look like?
I am not a SQL guy, so bear with me if this sounds stupid.
EDIT:
If I want to create the OpinionPoll and Response tables outside of Django, what would the SQL CREATE statements look like?
From the command line, when I run
python manage.py sqlall appname
I get the following:
BEGIN;
CREATE TABLE "myapp_opinionpoll" (
"id" integer NOT NULL PRIMARY KEY,
"question" varchar(200) NOT NULL,
"poll_date" date NOT NULL
)
;
CREATE TABLE "myapp_response" (
"id" integer NOT NULL PRIMARY KEY,
"poll_id" integer NOT NULL REFERENCES "myapp_opinionpoll" ("id"),
"person_name" varchar(50) NOT NULL,
"response" text NOT NULL
)
;
CREATE INDEX "larb_response_70f78e6b" ON "myapp_response" ("poll_id");
COMMIT;
I see things like REFERENCES "myapp_opinionpoll" and CREATE INDEX above. I am not sure whether this is how it is done in plain SQL.

[1] Django stores a ForeignKey in a database column named <fieldname>_id, whatever the backend (this is not MySQL-specific). So the field poll = models.ForeignKey(OpinionPoll) creates the column poll_id, which is what r.poll_id refers to in the SQL.
About GROUP BY: every selected column that is not inside an aggregate function is listed there. Standard SQL generally expects this, and many databases reject a query that selects ungrouped, non-aggregated columns.
[2] Try this. I didn't test it, but it may help:
from django.db.models import Count
OpinionPoll.objects.annotate(num_responses=Count('response'))
For more about aggregation, see the docs: https://docs.djangoproject.com/en/1.6/topics/db/aggregation/
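One difference worth knowing when swapping the raw SQL for annotate: the manager's query uses an inner join, so polls without responses are left out, whereas annotate with Count typically produces a LEFT OUTER JOIN and reports them with a count of 0. The sketch below shows both query shapes with Python's built-in sqlite3 module; the sample polls are made up for the demo, and the second query is only an approximation of what the ORM emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE polls_opinionpoll (
        id INTEGER PRIMARY KEY, question TEXT NOT NULL, poll_date TEXT NOT NULL
    );
    CREATE TABLE polls_response (
        id INTEGER PRIMARY KEY,
        poll_id INTEGER NOT NULL REFERENCES polls_opinionpoll(id),
        person_name TEXT NOT NULL,
        response TEXT NOT NULL
    );
    INSERT INTO polls_opinionpoll VALUES
        (1, 'Tabs or spaces?', '2014-01-01'),
        (2, 'Vim or Emacs?', '2014-02-01');
    -- only poll 1 has responses
    INSERT INTO polls_response (poll_id, person_name, response)
        VALUES (1, 'ann', 'spaces'), (1, 'bob', 'tabs');
""")

# The raw SQL from the manager: implicit inner join.
inner = conn.execute("""
    SELECT p.id, p.question, p.poll_date, COUNT(*)
    FROM polls_opinionpoll p, polls_response r
    WHERE p.id = r.poll_id
    GROUP BY p.id, p.question, p.poll_date
    ORDER BY p.poll_date DESC
""").fetchall()

# Approximate shape of annotate(num_responses=Count('response')).
outer = conn.execute("""
    SELECT p.id, p.question, p.poll_date, COUNT(r.id) AS num_responses
    FROM polls_opinionpoll p
    LEFT OUTER JOIN polls_response r ON p.id = r.poll_id
    GROUP BY p.id, p.question, p.poll_date
    ORDER BY p.poll_date DESC
""").fetchall()

print(inner)  # poll 2 is missing
print(outer)  # poll 2 appears with num_responses = 0
```

If only polls with at least one response are wanted, adding .filter(num_responses__gt=0) after the annotate should bring the ORM result in line with the raw query.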
