I have two SQL tables (messages, messages_processed) that are similar. The messages_processed table has one more column than messages; the other columns have the same data types and structure in both. To show all messages (processed and regular) for a particular user, I need a union of these two tables.
class Message(object):
    def __init__(self, sender_id, text, user_id):
        self.sender_id = sender_id
        self.text = text
        self.user_id = user_id
        self.categories = []  # N:M relation

class MessageProcessed(object):
    def __init__(self, sender_id, text, user_id, action):
        self.sender_id = sender_id
        self.text = text
        self.user_id = user_id
        self.categories = []  # N:M relation
        self.action = action
I cannot change the existing table structure. I need something like the following, which should return a list of ORM objects with the N:M mapping intact:
session.query(Message).filter(Message.user_id == 12).union(
    session.query(MessageProcessed).filter(MessageProcessed.user_id == 12)
).all()
It looks like you cannot use UNION directly in your case, because each SELECT statement within a UNION must have the same number of columns and messages_processed has one more column than messages.
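One common workaround, sketched here in plain SQL through Python's built-in sqlite3 rather than through the ORM, is to pad the shorter SELECT with a NULL placeholder so both sides have the same number of columns. The table and column names follow the question; the in-memory database, the sample rows, the extra source column, and the helper name all_messages_for are made up for illustration. Note that this returns plain rows, not mapped objects with the N:M categories relation.

```python
import sqlite3

# In-memory database with made-up sample rows (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY, sender_id INTEGER, text TEXT, user_id INTEGER
    );
    CREATE TABLE messages_processed (
        id INTEGER PRIMARY KEY, sender_id INTEGER, text TEXT, user_id INTEGER,
        action TEXT
    );
    INSERT INTO messages (sender_id, text, user_id)
        VALUES (1, 'hello', 12), (2, 'other user', 13);
    INSERT INTO messages_processed (sender_id, text, user_id, action)
        VALUES (3, 'done', 12, 'archived');
""")


def all_messages_for(user_id):
    """Hypothetical helper: union both tables for one user."""
    # NULL AS action pads the messages side so both SELECTs have equal width.
    query = """
        SELECT sender_id, text, user_id, NULL AS action, 'regular' AS source
        FROM messages WHERE user_id = ?
        UNION ALL
        SELECT sender_id, text, user_id, action, 'processed' AS source
        FROM messages_processed WHERE user_id = ?
        ORDER BY sender_id
    """
    return conn.execute(query, (user_id, user_id)).fetchall()


rows = all_messages_for(12)
for row in rows:
    print(row)
```

The source column is optional; it only records which table each row came from, which is handy once the two result sets are merged.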
I have two models with a simple FK relationship, Stock and Restriction (Restriction.stock_id FK to Stock).
class Restriction(Model):
    __tablename__ = "restrictions"
    id = db.Column(db.Integer, primary_key=True)
    stock_id = db.Column(db.Integer, db.ForeignKey("stocks.id"), nullable=True)
    name = db.Column(db.String(50), nullable=False)
    stock = db.relationship("Stock")  # used below as Restriction.stock

class Stock(Model):
    __tablename__ = "stocks"
    id = db.Column(db.Integer, primary_key=True)
    ticker = db.Column(db.String(50), nullable=False, index=True)
I would like to retrieve Restriction object and related Stock but only Stock's ticker (there are other fields omitted here). I can simply do this with:
from sqlalchemy.orm import joinedload

my_query = Restriction.query.options(
    joinedload(Restriction.stock).load_only(Stock.ticker)
)
r = my_query.first()
With the above I get all columns for Restriction and only ticker for Stock. I can see this in the SQL that is run, and I can access r.stock.ticker without triggering a new query because it is loaded eagerly.
The problem is that I cannot filter on stocks now: SQLAlchemy adds another FROM clause if I do my_query.filter(Stock.ticker == 'GME'). This produces a cross product, which is not what I want.
On the other hand, I cannot load selected columns from the relationship using join, i.e.
Restriction.query.join(Restriction.stock).options(load_only(Restriction.stock))
does not seem to work with a relationship property. How can I apply a single filter and still load only selected columns from the related table?
The SQL I want to run is:
SELECT restrictions.*, stocks.ticker
FROM restrictions LEFT OUTER JOIN stocks ON stocks.id = restrictions.stock_id
WHERE stocks.ticker = 'GME'
And I'd like to get a Restriction object back with its stock and stock's ticker. Is this possible?
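joinedload should generally not be combined with filter: the eager join uses an anonymous alias of the related table, so filter criteria on Stock do not apply to it. Since you are joining explicitly anyway, use the contains_eager option instead.
from sqlalchemy.orm import contains_eager

my_query = Restriction.query.join(Restriction.stock).options(
    contains_eager(Restriction.stock).load_only(Stock.ticker)
).filter(Stock.ticker == 'GME')
r = my_query.first()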
Stock.id will also be in the results beside Stock.ticker, because the primary key is always loaded. The other fields are omitted, as you wanted.
I wrote a short post about this recently, if you are interested: https://jorzel.hashnode.dev/an-orm-can-bite-you
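To see why filtering on Stock without an explicit join misbehaves, the two query shapes can be compared in plain SQL. This is a rough sketch using Python's built-in sqlite3, not SQLAlchemy itself: the alias name stocks_1 mimics the anonymous alias an eager join typically gets, and the sample rows are made up.

```python
import sqlite3

# In-memory tables named after the question; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stocks (id INTEGER PRIMARY KEY, ticker TEXT NOT NULL);
    CREATE TABLE restrictions (
        id INTEGER PRIMARY KEY,
        stock_id INTEGER REFERENCES stocks (id),
        name TEXT NOT NULL
    );
    INSERT INTO stocks (id, ticker) VALUES (1, 'GME'), (2, 'AMC');
    INSERT INTO restrictions (id, stock_id, name) VALUES
        (1, 1, 'r-gme'), (2, 2, 'r-amc'), (3, NULL, 'r-none');
""")

# Shape of "joinedload + filter": the eager join targets an alias, while the
# filter refers to the plain stocks table, which lands in FROM un-joined.
cross_product = conn.execute("""
    SELECT restrictions.name, stocks_1.ticker
    FROM stocks, restrictions
    LEFT OUTER JOIN stocks AS stocks_1 ON stocks_1.id = restrictions.stock_id
    WHERE stocks.ticker = 'GME'
    ORDER BY restrictions.id
""").fetchall()

# Shape of "join + contains_eager + filter": one join, and the filter applies
# to the very table whose columns populate the relationship.
single_join = conn.execute("""
    SELECT restrictions.name, stocks.ticker
    FROM restrictions
    LEFT OUTER JOIN stocks ON stocks.id = restrictions.stock_id
    WHERE stocks.ticker = 'GME'
    ORDER BY restrictions.id
""").fetchall()

print(cross_product)  # every restriction survives the filter
print(single_join)    # only the GME restriction remains
```

In the first query the WHERE clause only narrows the extra, un-joined stocks table, so it does not restrict which restrictions come back.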
Imagine the following (example) data model:
class Organization(db.Model):
    __tablename__ = 'organizations'  # matches the ForeignKey targets below
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    friendly_name = db.Column(db.Text, nullable=False)
    users = db.relationship('User', back_populates='organizations')
    groups = db.relationship('Group', back_populates='organizations')

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    organization_id = db.Column(db.Integer, db.ForeignKey('organizations.id'))
    organizations = db.relationship("Organization", back_populates="users")

class Group(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    organization_id = db.Column(db.Integer, db.ForeignKey('organizations.id'))
    organizations = db.relationship("Organization", back_populates="groups")
(so basically an Organization has User and Group relationships)
What we want is to retrieve the counts for users and groups. Result should be similar to the following:
id  friendly_name  users_count  groups_count
1   o1             33           3
2   o2             12           2
3   o3             1            0
This can be achieved with a query similar to
query = db.session.query(
        Organization.friendly_name,
        func.count(User.id.distinct()).label('users_count'),
        func.count(Group.id.distinct()).label('groups_count'),
    ) \
    .outerjoin(User, Organization.users) \
    .outerjoin(Group, Organization.groups) \
    .group_by(Organization.id)
which seems like overkill. The first intuitive approach would be something like
query = db.session.query(
    Organization.friendly_name,
    func.count(distinct(Organization.users)).label('users_count'),
    func.count(distinct(Organization.groups)).label('groups_count'),
)  # with or without outerjoins
which does not work (note: with a single relationship it would work).
a) What is the difference between User.id.distinct() and distinct(Organization.users) in this case?
b) What is the best, most performant, or recommended way in SQLAlchemy to get a count for each relationship an object has?
Bonus) If the whole model is selected instead of Organization.friendly_name (...query(Organization, func....)), SQLAlchemy returns tuples of the form (Organization, users_count, groups_count). Is there a way to return just the Organization with the two counts as additional fields, as SQL would?
b:
You can try window functions to count users and groups:
query = (
    db.session.query(
        Organization.friendly_name,
        func.count().over(partition_by=(User.id, Organization.id)).label('users_count'),
        func.count().over(partition_by=(Group.id, Organization.id)).label('groups_count'),
    )
    .outerjoin(User, Organization.users)
    .outerjoin(Group, Organization.groups)
)
bonus:
To expose a count as a field of Organization, you can use a hybrid_property, but you would probably not be happy with the performance.
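For comparison, here is a sketch of the underlying SQL, run through Python's built-in sqlite3 rather than SQLAlchemy. It shows why the double outer join needs COUNT(DISTINCT ...), and an alternative that counts each relationship in its own correlated subquery. The table names (organizations, users, user_groups) and the sample rows are assumptions for illustration, and whether the subquery form is actually faster depends on the database and the data.

```python
import sqlite3

# Made-up data: o1 has 3 users and 2 groups, o2 has 2 and 1, o3 has 1 and 0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, friendly_name TEXT NOT NULL);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        organization_id INTEGER REFERENCES organizations (id)
    );
    CREATE TABLE user_groups (
        id INTEGER PRIMARY KEY,
        organization_id INTEGER REFERENCES organizations (id)
    );
    INSERT INTO organizations (id, friendly_name) VALUES (1, 'o1'), (2, 'o2'), (3, 'o3');
    INSERT INTO users (organization_id) VALUES (1), (1), (1), (2), (2), (3);
    INSERT INTO user_groups (organization_id) VALUES (1), (1), (2);
""")

JOINED = """
    SELECT o.id, o.friendly_name, COUNT({u}), COUNT({g})
    FROM organizations o
    LEFT OUTER JOIN users u ON u.organization_id = o.id
    LEFT OUTER JOIN user_groups g ON g.organization_id = o.id
    GROUP BY o.id, o.friendly_name
    ORDER BY o.id
"""

# Joining both relationships multiplies the rows (users x groups per
# organization), so a plain COUNT over-counts ...
naive = conn.execute(JOINED.format(u="u.id", g="g.id")).fetchall()
# ... and DISTINCT is what collapses the duplicates again.
distinct = conn.execute(JOINED.format(u="DISTINCT u.id", g="DISTINCT g.id")).fetchall()

# Alternative: one correlated subquery per relationship, no row multiplication.
subqueries = conn.execute("""
    SELECT o.id, o.friendly_name,
           (SELECT COUNT(*) FROM users u WHERE u.organization_id = o.id),
           (SELECT COUNT(*) FROM user_groups g WHERE g.organization_id = o.id)
    FROM organizations o
    ORDER BY o.id
""").fetchall()

print(naive)
print(distinct)
print(subqueries)
```

The subquery form returns the same counts as the DISTINCT form without ever building the users times groups intermediate result.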
I have a model
class Foo(models.Model):
    first = models.CharField()
    second = models.CharField()
The data I have is
first second
1 2
1 2
1 2
3 4
Now I want to delete all duplicate rows and keep one entry. The end result
first second
1 2
3 4
How do I do this? I checked this question but could not figure it out properly.
I have tried
foo_ids = Foo.objects.annotate(first_c=Count('first'), second_c=Count('second')).filter(first_c__gt=1, second_c__gt=1).values('first', 'second', 'id')
Then I tried to figure out how to keep one row from each list of duplicates.
I ended up doing this.
from django.db.models import Count

duplicate_foo = Foo.objects.values('first', 'second').annotate(id_c=Count('id')).filter(id_c__gt=1)
for dups in duplicate_foo:
    # Keep the first row of each duplicate group and delete the rest.
    for i, val in enumerate(Foo.objects.filter(first=dups['first'],
                                               second=dups['second'])):
        if i == 0:
            continue
        val.delete()
Not the most optimized solution, but it works.
It's an older thread, but neither answer fits the bill for large datasets; both lead to a huge number of queries.
You can use this generic method instead:
from django.apps import apps
from django.db.models import Count, Max

def get_duplicates_from_fields(app, model, fields, delete=False, **filters):
    """
    Returns duplicate records based on a list of fields,
    optionally deletes duplicate records.
    """
    Model = apps.get_model(app, model)
    # One row per duplicated combination of fields, with its highest id.
    duplicates = (Model.objects.filter(**filters).values(*fields)
                  .order_by()
                  .annotate(_max_id=Max('id'), _count=Count('id'))
                  .filter(_count__gt=1))
    for duplicate in duplicates:
        if delete:
            # Delete every row in the group except the one with the highest id.
            (
                Model.objects
                .filter(**filters)
                .filter(**{x: duplicate[x] for x in fields})
                .exclude(id=duplicate['_max_id'])
                .delete()
            )
        else:
            print(duplicate)
You can use this method as such:
get_duplicates_from_fields('myapp', 'Foo', ['first', 'second'], True)
This lets you find and delete duplicate records based on any number of fields.
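The same keep-the-highest-id idea can also be expressed as a single SQL statement. The following is a minimal sketch using Python's built-in sqlite3, not the Django ORM; the table name foo is an assumption (Django would normally prefix it with the app label), and the rows mirror the example data from the question.

```python
import sqlite3

# In-memory table with the duplicate rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, "first" TEXT, "second" TEXT);
    INSERT INTO foo ("first", "second")
        VALUES ('1', '2'), ('1', '2'), ('1', '2'), ('3', '4');
""")

# Keep the row with the highest id in each ("first", "second") group and
# delete everything else in one statement.
deleted = conn.execute("""
    DELETE FROM foo
    WHERE id NOT IN (
        SELECT MAX(id) FROM foo GROUP BY "first", "second"
    )
""").rowcount
conn.commit()

remaining = conn.execute(
    'SELECT id, "first", "second" FROM foo ORDER BY id'
).fetchall()
print(deleted, remaining)
```

Not every database accepts a subquery on the table being deleted from in exactly this form, so treat the statement as a starting point rather than something portable as written.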
I want to insert multiple records into my belongs table; the records are selected from two tables via SQLFORM.grid.
My tables:
db.define_table('problem',
    Field('title', 'string', unique=True, length=255),
    format='%(title)s')
db.define_table('tasks',
    Field('title', 'string', unique=True, length=255),
    format='%(title)s')
db.define_table('belongs',
    Field('task_id', 'reference tasks'),
    Field('problem_id', 'reference problem'))
I want to select some records from the problem table and one record from the tasks table, then insert them into the belongs table. Can this be done with SQLFORM.grid?
def problemtask():
    form = SQLFORM.grid(db.problem, selectable=lambda ids: insert(ids, ids1))
    form1 = SQLFORM.grid(db.tasks, selectable=lambda ids1: insert(ids, ids1))
    return dict(form=form, form1=form1)
def insert(ids,ids1):
thanks!
Select one record from one table, then select some records from another table, and finally insert the combinations into the third table.
def showtask():
    id = request.args(0, cast=int)  # id is the course_id
    db.tasks._common_filter = lambda query: db.tasks.course_id == id
    links = [lambda row: A('createproblem', _href=URL("default", "addproblem", args=[row.id])),
             lambda row: A('showproblem', _href=URL("default", "showproblem", args=[row.id]))]
    form = SQLFORM.smartgrid(db.tasks, args=request.args[:1], links=links, linked_tables=[''], csv=False)
    return dict(form=form)

def mulassignproblem():
    taskid = request.args(0, cast=int)
    form = SQLFORM.grid(db.problem, args=request.args[:1], selectable=lambda ids: mulproblem(ids, taskid))
    return dict(form=form)

def mulproblem(ids, taskid):
    # Link each selected problem to the task unless the pair already exists.
    for problemid in ids:
        already_linked = db((db.belongs.problem_id == problemid) & (db.belongs.task_id == taskid)).count()
        if not already_linked:
            db.belongs.insert(task_id=taskid, problem_id=problemid)
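The count-then-insert check in mulproblem can also be pushed down to the database. The sketch below is not web2py code: it uses Python's built-in sqlite3 and swaps the check for a UNIQUE constraint plus INSERT OR IGNORE (SQLite syntax). The helper name assign_problems and the sample ids are hypothetical.

```python
import sqlite3

# In-memory stand-in for the belongs table, with a uniqueness rule added
# (an assumption; the original table definition has no such constraint).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE belongs (
        id INTEGER PRIMARY KEY,
        task_id INTEGER NOT NULL,
        problem_id INTEGER NOT NULL,
        UNIQUE (task_id, problem_id)
    );
""")


def assign_problems(problem_ids, task_id):
    """Hypothetical helper: link each problem to the task, skipping existing pairs."""
    conn.executemany(
        "INSERT OR IGNORE INTO belongs (task_id, problem_id) VALUES (?, ?)",
        [(task_id, problem_id) for problem_id in problem_ids],
    )
    conn.commit()


assign_problems([1, 2], 7)
assign_problems([2, 3], 7)  # the pair (7, 2) already exists and is skipped

links = conn.execute(
    "SELECT task_id, problem_id FROM belongs ORDER BY problem_id"
).fetchall()
print(links)
```

Letting the constraint reject duplicates avoids one COUNT query per selected row and also covers two requests inserting the same pair at the same time.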
Here is the code from django docs that explains the use of managers.
class PollManager(models.Manager):
    def with_counts(self):
        from django.db import connection
        cursor = connection.cursor()
        cursor.execute("""
            SELECT p.id, p.question, p.poll_date, COUNT(*)
            FROM polls_opinionpoll p, polls_response r
            WHERE p.id = r.poll_id
            GROUP BY p.id, p.question, p.poll_date
            ORDER BY p.poll_date DESC""")
        result_list = []
        for row in cursor.fetchall():
            p = self.model(id=row[0], question=row[1], poll_date=row[2])
            p.num_responses = row[3]
            result_list.append(p)
        return result_list

class OpinionPoll(models.Model):
    question = models.CharField(max_length=200)
    poll_date = models.DateField()
    objects = PollManager()

class Response(models.Model):
    poll = models.ForeignKey(OpinionPoll)
    person_name = models.CharField(max_length=50)
    response = models.TextField()
I have two questions based on this code:
1) Where is r.poll_id coming from? I understand that Response has a ForeignKey relationship to OpinionPoll, so to JOIN the OpinionPoll table with the Response table I need to join on their ids.
However, to access the poll id from a Response, I would write r.poll.id.
Is r.poll_id MySQL-specific syntax?
Why GROUP BY p.id, p.question, p.poll_date? Why is GROUP BY p.id alone not sufficient?
2) Is it possible to turn the above raw SQL query into a Django ORM query? If so, what would it look like?
I am not a SQL guy, so bear with me if this sounds stupid.
EDIT:
If I want to create the OpinionPoll and Response tables outside of Django, what would the SQL CREATE statements look like?
In the shell, when I run
python manage.py sqlall appname
I get the following:
BEGIN;
CREATE TABLE "myapp_opinionpoll" (
"id" integer NOT NULL PRIMARY KEY,
"question" varchar(200) NOT NULL,
"poll_date" date NOT NULL
)
;
CREATE TABLE "myapp_response" (
"id" integer NOT NULL PRIMARY KEY,
"poll_id" integer NOT NULL REFERENCES "myapp_opinionpoll" ("id"),
"person_name" varchar(50) NOT NULL,
"response" text NOT NULL
)
;
CREATE INDEX "larb_response_70f78e6b" ON "myapp_response" ("poll_id");
COMMIT;
I see things like REFERENCES "myapp_opinionpoll" and CREATE INDEX above. I am not sure whether this is how it is done in plain SQL.
[1] Django stores a ForeignKey in a database column named fieldname_id. So the field poll = models.ForeignKey(OpinionPoll) creates the poll_id column; this is Django's naming convention, not MySQL-specific syntax.
About GROUP BY: these are exactly the selected columns other than the aggregate function. Many databases require every non-aggregated selected column to appear in GROUP BY, so grouping by p.id alone may be rejected.
[2] Try this; I didn't debug it, but it may help:
from django.db.models import Count
OpinionPoll.objects.annotate(num_responses=Count('response'))
For more about aggregation, see the docs: https://docs.djangoproject.com/en/1.6/topics/db/aggregation/
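To make the poll_id column and the GROUP BY behaviour concrete, here is a small sketch using Python's built-in sqlite3 with tables shaped like the sqlall output in the question. The table names follow the raw SQL in the manager, and the sample polls and responses are made up. The second query uses a LEFT OUTER JOIN, which to my understanding is roughly the shape annotate(num_responses=Count('response')) produces, so polls without responses are kept with a count of 0.

```python
import sqlite3

# Tables mirror the generated schema; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE polls_opinionpoll (
        id INTEGER NOT NULL PRIMARY KEY,
        question VARCHAR(200) NOT NULL,
        poll_date DATE NOT NULL
    );
    CREATE TABLE polls_response (
        id INTEGER NOT NULL PRIMARY KEY,
        poll_id INTEGER NOT NULL REFERENCES polls_opinionpoll (id),
        person_name VARCHAR(50) NOT NULL,
        response TEXT NOT NULL
    );
    INSERT INTO polls_opinionpoll (id, question, poll_date) VALUES
        (1, 'Tabs or spaces?', '2014-01-01'),
        (2, 'Vim or Emacs?', '2014-02-01'),
        (3, 'No answers yet?', '2014-03-01');
    INSERT INTO polls_response (poll_id, person_name, response) VALUES
        (1, 'ann', 'spaces'), (1, 'bob', 'tabs'), (2, 'cat', 'vim');
""")

# The raw query from the manager: poll_id is simply the foreign key column.
inner_join = conn.execute("""
    SELECT p.id, p.question, p.poll_date, COUNT(*)
    FROM polls_opinionpoll p, polls_response r
    WHERE p.id = r.poll_id
    GROUP BY p.id, p.question, p.poll_date
    ORDER BY p.poll_date DESC
""").fetchall()

# Outer-join variant: polls with no responses are kept and counted as 0.
left_join = conn.execute("""
    SELECT p.id, COUNT(r.id)
    FROM polls_opinionpoll p
    LEFT OUTER JOIN polls_response r ON r.poll_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print(inner_join)
print(left_join)
```

The difference matters when comparing results: the raw query silently drops the poll that has no responses, while the outer-join form reports it with zero.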