Django admin MySQL slow INNER JOIN - mysql

I have a simple model with 3 ForeignKey fields.
from django.db import models

class Car(models.Model):
    wheel = models.ForeignKey('Wheel', on_delete=models.CASCADE, related_name='wheels')
    created = models.DateTimeField(auto_now_add=True)
    max_speed = models.PositiveSmallIntegerField(null=True)
    dealer = models.ForeignKey('Dealer', on_delete=models.CASCADE)
    category = models.ForeignKey('Category', on_delete=models.CASCADE)
For the list view in the Django admin I get 4 queries. One of them is a SELECT with 3 INNER JOINs, and that query is way too slow. Replacing the INNER JOINs with STRAIGHT_JOIN would fix the issue. Is there a way to patch the admin-generated query just before it is evaluated?

I've implemented a fix for the INNER JOIN problem in the Django ORM: it uses STRAIGHT_JOIN when ordering is combined with INNER JOINs. I talked to the Django core devs and we decided to ship it as a separate backend for now. You can check it out here: https://pypi.python.org/pypi/django-mysql-fix
However, there is one other workaround. Use a snippet from James's answer, but replace select_related with:
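qs = qs.select_related(None).prefetch_related('wheel', 'dealer', 'category')
Passing None clears the earlier select_related(), which cancels the INNER JOINs. Django then uses 4 separate queries: 1 to fetch the cars and 3 others (one each for the wheel, dealer and category tables) with id IN (...).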
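To make that query pattern concrete, here is a rough stand-in written with the standard sqlite3 module instead of the Django ORM: one query for the cars, then one WHERE id IN (...) query per related table, with the rows stitched together in Python, which is approximately what prefetch_related() does for forward foreign keys. The tables, columns and sample rows are hypothetical, and set_trace_callback is only used to count the statements that run.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the Car, Wheel, Dealer and Category tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wheel (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dealer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE car (
        id INTEGER PRIMARY KEY,
        wheel_id INTEGER REFERENCES wheel(id),
        dealer_id INTEGER REFERENCES dealer(id),
        category_id INTEGER REFERENCES category(id)
    );
    INSERT INTO wheel VALUES (1, 'Alloy'), (2, 'Steel');
    INSERT INTO dealer VALUES (1, 'North'), (2, 'South');
    INSERT INTO category VALUES (1, 'SUV'), (2, 'Coupe');
    INSERT INTO car VALUES (100, 1, 1, 1), (101, 2, 2, 1), (102, 1, 1, 2);
""")

executed = []  # every SQL statement run from here on is appended to this list
conn.set_trace_callback(executed.append)


def fetch_by_ids(table, ids):
    """Fetch one related table with a single WHERE id IN (...) query."""
    ids = sorted(ids)
    placeholders = ", ".join("?" for _ in ids)
    # The table name is a fixed string from this script, never user input.
    rows = conn.execute(
        f"SELECT id, name FROM {table} WHERE id IN ({placeholders})", ids
    ).fetchall()
    return dict(rows)  # maps id -> name


# Query 1: the cars themselves, with no JOIN at all.
cars = conn.execute(
    "SELECT id, wheel_id, dealer_id, category_id FROM car ORDER BY id"
).fetchall()

# Queries 2-4: one IN (...) lookup per foreign key.
wheels = fetch_by_ids("wheel", {car[1] for car in cars})
dealers = fetch_by_ids("dealer", {car[2] for car in cars})
categories = fetch_by_ids("category", {car[3] for car in cars})

# Stitch the related names onto each car in Python.
listing = [
    (car_id, wheels[w], dealers[d], categories[c])
    for car_id, w, d, c in cars
]
print(len(executed))  # 4
print(listing[0])     # (100, 'Alloy', 'North', 'SUV')
```

The trade-off is the one described above: four cheap queries instead of one multi-join query, with the joining done in application memory.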
UPDATE:
I've found one more workaround. Once you specify null=True in your ForeignKey field, Django will use LEFT OUTER JOINs instead of INNER JOIN. LEFT OUTER JOIN works without performance issues in this case, but you may face other issues that I'm not aware of yet.
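Why null=True changes the join type can be seen in a small sketch with plain sqlite3 rather than the ORM (the schema and rows are made up): an INNER JOIN silently drops cars whose dealer_id is NULL, so for a nullable foreign key the ORM needs a LEFT OUTER JOIN to keep those cars in the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dealer (id INTEGER PRIMARY KEY, name TEXT);
    -- dealer_id may be NULL, like ForeignKey(..., null=True)
    CREATE TABLE car (id INTEGER PRIMARY KEY, dealer_id INTEGER NULL REFERENCES dealer(id));
    INSERT INTO dealer VALUES (1, 'North');
    INSERT INTO car VALUES (100, 1), (101, NULL);
""")

inner = conn.execute(
    "SELECT car.id, dealer.name FROM car "
    "INNER JOIN dealer ON dealer.id = car.dealer_id ORDER BY car.id"
).fetchall()
left = conn.execute(
    "SELECT car.id, dealer.name FROM car "
    "LEFT OUTER JOIN dealer ON dealer.id = car.dealer_id ORDER BY car.id"
).fetchall()

print(inner)  # [(100, 'North')]  car 101 is dropped
print(left)   # [(100, 'North'), (101, None)]
```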

You can simply set list_select_related = () to prevent Django from using INNER JOINs:
class CarAdmin(admin.ModelAdmin):
    list_select_related = ()

You could override the
def changelist_view(self, request, extra_context=None):
method in your admin class (inherited from ModelAdmin), along the lines of this question (though it is rather old):
Django Admin: Getting a QuerySet filtered according to GET string, exactly as seen in the change list?

OK, I found a way to patch the admin-generated query. It is ugly, but it seems to work:
from django.contrib import admin
from django.contrib.admin.views.main import ChangeList

from .models import Car  # assumes Car lives in this app's models.py

class CarChangeList(ChangeList):
    def get_results(self, request):
        """Override to patch ORM generated SQL"""
        super(CarChangeList, self).get_results(request)
        original_qs = self.result_list
        # str(query) does not quote parameters, so this is only reliable
        # for queries without string parameters
        sql = str(original_qs.query)
        new_qs = Car.objects.raw(sql.replace('INNER JOIN', 'STRAIGHT_JOIN'))

        def patch_len(self):
            return original_qs.count()
        # Note: this patches __len__ on the RawQuerySet class itself, not just this instance
        new_qs.__class__.__len__ = patch_len
        self.result_list = new_qs

class CarAdmin(admin.ModelAdmin):
    list_display = ('wheel', 'max_speed', 'dealer', 'category', 'created')

    def get_changelist(self, request, **kwargs):
        """Return custom Changelist"""
        return CarChangeList

admin.site.register(Car, CarAdmin)

I came across the same issue in the Django admin (version 1.4.9) where fairly simple admin listing pages were very slow when backed by MySQL.
In my case it was caused by the ChangeList.get_query_set() method adding an overly-broad global select_related() to the query set if any fields in list_display were many-to-one relationships. For a proper database (cough PostgreSQL cough) this wouldn't be a problem, but it was for MySQL once more than a few joins were triggered this way.
The cleanest solution I found was to replace the global select_related() directive with a more targeted one that only joined tables that were really necessary. This was easy enough to do by calling select_related() with explicit relationship names.
This approach likely ends up swapping in-database joins for multiple follow-up queries, but if MySQL is choking on the large query, many small ones may be faster for you.
Here's what I did, more or less:
from django.contrib import admin
from django.contrib.admin.views.main import ChangeList

class CarChangeList(ChangeList):
    # Django 1.4 name; renamed get_queryset() in Django 1.6+
    def get_query_set(self, request):
        """
        Replace a global select_related() directive added by Django in
        ChangeList.get_query_set() with a more limited one.
        """
        qs = super(CarChangeList, self).get_query_set(request)
        qs = qs.select_related('wheel')  # Don't join on dealer or category
        return qs

class CarAdmin(admin.ModelAdmin):
    def get_changelist(self, request, **kwargs):
        return CarChangeList

I've had slow admin queries on MySQL and found the easiest solution was to add STRAIGHT_JOIN to the query. I figured out a way to add this to a QuerySet rather than being forced to go to .raw(), which won't work with the admin, and have open-sourced it as part of django-mysql. Once your model uses django-mysql's QuerySet, you can then just:
def get_queryset(self, request):
    qs = super(MyAdmin, self).get_queryset(request)
    return qs.straight_join()

MySQL still has this problem even in version 8, and Django still doesn't allow you to add STRAIGHT_JOIN to a query set. I found a hackish solution to add STRAIGHT_JOIN:
This was tested with Django 2.1 and MySQL 5.7 / 8.0
import re

def fixQuerySet(querySet):
    # complete the SQL with params encapsulated in quotes
    # (hand-made quoting: not safe for untrusted input)
    sql, params = querySet.query.sql_with_params()
    newParams = ()
    for param in params:
        if not str(param).startswith("'"):
            if isinstance(param, str):
                param = re.sub("'", "\\'", param)
            newParams = newParams + ("'{}'".format(param),)
        else:
            newParams = newParams + (param,)
    rawQuery = sql % newParams
    # double every % (e.g. in LIKE patterns) so raw() keeps them literal
    rawQuery = re.sub('%', '%%', rawQuery)
    # replace SELECT with SELECT STRAIGHT_JOIN
    rawQuery = rawQuery.replace('SELECT', 'SELECT STRAIGHT_JOIN')
    return querySet.model.objects.raw(rawQuery)
Important: this function returns a RawQuerySet, so it should be called just before the query set is consumed; you can no longer filter or chain it afterwards.
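The rawQuery.replace('SELECT', 'SELECT STRAIGHT_JOIN') step rewrites every uppercase SELECT in the statement, including ones inside subqueries or inside a quoted search term. A narrower variant is sketched below as a hypothetical helper (add_straight_join is not part of Django or any library); it only touches the leading SELECT keyword.

```python
import re

# Matches only the SELECT keyword at the very start of the statement.
_LEADING_SELECT = re.compile(r"^\s*SELECT\b", re.IGNORECASE)


def add_straight_join(sql):
    """Hypothetical helper: turn the leading SELECT into SELECT STRAIGHT_JOIN.

    Subqueries and quoted values that contain the word SELECT are left as-is.
    """
    new_sql, count = _LEADING_SELECT.subn("SELECT STRAIGHT_JOIN", sql, count=1)
    if count == 0:
        raise ValueError("expected a statement that starts with SELECT")
    return new_sql


print(add_straight_join(
    "SELECT car.id FROM car INNER JOIN dealer ON dealer.id = car.dealer_id "
    "WHERE car.id IN (SELECT car_id FROM note WHERE text = 'SELECT')"
))
```

In fixQuerySet, rawQuery = add_straight_join(rawQuery) could stand in for the str.replace() line; the RawQuerySet caveat above still applies.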

Related

How to make SQLAlchemy see implicit lateral joins as in json_each or jsonb_each?

I'm trying to figure out the proper way of using json_each. I've seen some tricks like using column or text. So far I've found a fairly clean way using table_valued that works, except for the cross join warning.
term = 'connection'
about_exp = func.json_each(EventHistory.event, '$.about').table_valued('value')
events = s.query(EventHistory).filter(about_exp.c.value == term)
EventHistory contains one json field that looks like this: {"about": ["antenna", "connection", "modem", "network"]}
The resulting query works as expected but I'm getting the following warning:
SAWarning: SELECT statement has a cartesian product between FROM element(s) "event_history" and FROM element "anon_1". Apply join condition(s) between each element to resolve.
For anyone who would like to experiment, here is a working example in the form of unit tests: https://gist.github.com/PiotrCzapla/579f76bdf95a485eaaafed1492d9a70e
So far the only way I found not to emit the warning is to add join(about_exp, true())
from sqlalchemy import func, true
about_exp = func.json_each(EventHistory.event, '$.about').table_valued('value')
events = s.query(EventHistory).join(about_exp, true()).filter(
    about_exp.c.value == term
)
But it needs an additional import of true and an additional join statement. If anyone has a better solution, please let me know.
As of SQLAlchemy version 1.4.33, you can use the joins_implicitly=True option for table_valued.
term = 'connection'
about_exp = func.json_each(EventHistory.event, '$.about').table_valued('value', joins_implicitly=True)
events = s.query(EventHistory).filter(about_exp.c.value == term)
From the SQLAlchemy documentation for table_valued(): joins_implicitly – when True, the table valued function may be used in the FROM clause without any explicit JOIN to other tables in the SQL query, and no “cartesian product” warning will be generated. May be useful for SQL functions such as func.json_each().
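The reason no join condition is needed shows up in plain SQL: because json_each() refers to a column of event_history, the database evaluates it once per row, like an implicit lateral join. The sketch below uses the standard sqlite3 module rather than SQLAlchemy, reuses the table and JSON shape from the question with made-up rows, and assumes the SQLite library bundled with Python includes the JSON functions (built in by default since SQLite 3.38).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_history (id INTEGER PRIMARY KEY, event TEXT)")
conn.executemany(
    "INSERT INTO event_history (id, event) VALUES (?, ?)",
    [
        (1, json.dumps({"about": ["antenna", "connection", "modem", "network"]})),
        (2, json.dumps({"about": ["battery"]})),
    ],
)

# json_each() takes event_history.event as its argument, so it is evaluated
# per event_history row; the comma join needs no ON clause.
rows = conn.execute(
    """
    SELECT event_history.id
    FROM event_history, json_each(event_history.event, '$.about') AS about
    WHERE about.value = ?
    """,
    ("connection",),
).fetchall()
print(rows)  # [(1,)]
```

The SAWarning comes from SQLAlchemy's cartesian-product linter not knowing about this correlation; joins_implicitly=True tells it the function is joined implicitly.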

How do I form a Django query with an arbitrary number of OR clauses?

I'm using Django 2.0 with Python 3.7. I want to write a query that returns results if the field contains at least one of the strings in a list, so I want to set up an OR query. I have tried this:
class CoopManager(models.Manager):
    ...
    # Meant to look up coops case-insensitively by part of a type
    def contains_type(self, types_arr):
        queryset = Coop.objects.all()
        for type in types_arr:
            queryset = queryset.filter(type__name__icontains=type)
        print(queryset.query)
        return queryset
However, this produces a query that ANDs the clauses together. How do I perform the above but join all the clauses with an OR instead of an AND?
I'm using MySQL 5.7, but I'd like to know a database-independent solution if one exists.
You can create a Q object that constructs this disjunction:
from django.db.models import Q
type_filter = Q(
    *[('type__name__icontains', type_name) for type_name in types_arr],
    _connector=Q.OR
)
queryset = Coop.objects.filter(type_filter)
Here type_filter is a Q object that is a disjunction of all the type__name__icontains=type_name filters.
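For comparison, here is roughly the shape of SQL such a disjunction stands for, written by hand with the standard sqlite3 module: one LIKE test per search term, joined with OR instead of AND, with %, _ and backslash escaped so they match literally. The coop/type schema, the sample rows and the contains_any helper are hypothetical, and the exact SQL Django generates differs by backend.

```python
import sqlite3


def like_pattern(term):
    """Build a %term% LIKE pattern, escaping backslash, % and _ so they match literally."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def contains_any(conn, types_arr):
    """Hypothetical helper: names of coops whose type name contains ANY of the terms."""
    if not types_arr:
        return []  # this sketch treats an empty list as "match nothing"
    # One LIKE test per term, joined with OR instead of AND.
    where = " OR ".join("type.name LIKE ? ESCAPE '\\'" for _ in types_arr)
    sql = (
        "SELECT DISTINCT coop.name FROM coop "
        "JOIN type ON type.id = coop.type_id "
        f"WHERE {where} ORDER BY coop.name"
    )
    params = [like_pattern(t) for t in types_arr]
    return [name for (name,) in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE coop (id INTEGER PRIMARY KEY, name TEXT, type_id INTEGER REFERENCES type(id));
    INSERT INTO type VALUES (1, 'Bakery'), (2, 'Farm'), (3, 'Grocery');
    INSERT INTO coop VALUES (1, 'Bread Co', 1), (2, 'Green Acres', 2), (3, 'Corner Shop', 3);
""")
# SQLite's LIKE is case-insensitive for ASCII, similar to icontains.
print(contains_any(conn, ["bak", "FARM"]))  # ['Bread Co', 'Green Acres']
```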

Examining SQLAlchemy query results in Pyramid

I'm trying to add a method to a (quite big) existing project written in Python with the Pyramid framework and the SQLAlchemy ORM. I want to execute an SQL query with SQLAlchemy, but I've never developed with Pyramid or SQLAlchemy before. So I would like to test it and see whether the query returns what I'm expecting, without adding throwaway code just to test my query (a new template, a view, etc.). My SQL query is:
select a.account_type, u.user_id from accounts a inner join account_users au on a.account_id=au.account_id inner join users u on u.user_id=au.user_id where u.user_id = ?;
And my method is :
def find_account_type_from_user_id(self, user_id):
    '''
    Method that finds the account type (one/several points of sale...)
    from the id of the user who is linked to this account
    :param user_id:
    :return:(string) account_type
    '''
    q = self.query(Account)\
        .join(AccountUser)\
        .join(User)\
        .filter(User.user_id == user_id)\
        .one()
    return q
PS: I've already searched online, but I only find things like unit tests, which I've never written. (Sorry, I'm a noob.)
Unit tests are a must for testing new services, fixes and refactored code; you need a good collection of them.
You can start here.
Three ways to see SQLAlchemy query content:
Set sqlalchemy logging level to INFO - see instructions https://opensourcehacker.com/2016/05/22/python-standard-logging-pattern/
Use pyramid_debugtoolbar and it shows all queries your view made
Execute query interactively using pshell - no views need to be added
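For the first option, a minimal standard-library logging setup is enough: SQLAlchemy emits each SQL statement on the sqlalchemy.engine logger at INFO level, so raising that logger to INFO (for example inside a pshell session or at application startup) prints the queries. Pyramid projects usually configure the same logger in the [logger_sqlalchemy] section of the .ini file instead; the snippet below is only a sketch of the programmatic route.

```python
import logging

# Print log records to stderr, showing which logger produced each one.
logging.basicConfig(format="%(name)s %(levelname)s %(message)s")

# SQLAlchemy logs each statement (and its parameters) on this logger at INFO level.
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
```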

SQLAlchemy query to return only n results?

I have been googling and reading through the SQLAlchemy documentation but haven't found what I am looking for.
I am looking for a function in SQLAlchemy that limits the number of results returned by a query to a certain number, for example 5, something like first() or all().
for sqlalchemy >= 1.0.13
Use the limit method.
query(Model).filter(something).limit(5).all()
Alternative syntax (slicing applies the LIMIT and runs the query immediately, returning a list):
query(Model).filter(something)[:5]
If you need it for pagination, you can do this:
query = db.session.query(Table1, Table2, ...).filter(...)
if page_size is not None:
    query = query.limit(page_size)
    if page is not None:
        query = query.offset(page*page_size)
query = query.all()
Or, if you use Flask-SQLAlchemy and query a single table that has a model, you can:
query = (Model.query.filter(...)
         .paginate(page=start, per_page=size))
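The offset arithmetic in the first pagination snippet treats page as zero-based (page 0 is the first page), whereas Flask-SQLAlchemy's paginate() counts pages from 1. The same LIMIT/OFFSET logic can be checked in isolation with the standard sqlite3 module; the item table and the fetch_page helper are hypothetical.

```python
import sqlite3


def fetch_page(conn, page, page_size):
    """Return one page of ids; page is zero-based, matching offset = page * page_size."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT id FROM item ORDER BY id LIMIT ? OFFSET ?",
            (page_size, page * page_size),
        )
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)", [(i,) for i in range(1, 8)])

print(fetch_page(conn, 0, 3))  # [1, 2, 3]
print(fetch_page(conn, 2, 3))  # [7]

# With 1-based page numbers (as paginate() uses),
# the offset would be (page - 1) * page_size instead.
```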
Since v1.4, SQLAlchemy core's select function provides a fetch method for RDBMS that support FETCH clauses*. FETCH was defined in the SQL 2008 standard to provide a consistent way to request a partial result, as LIMIT/OFFSET is not standard.
Example:
# As with limit queries, it's usually sensible to order
# the results to ensure results are consistent.
q = select(tbl).order_by(tbl.c.id).fetch(10)
# Offset is supported, but it is inefficient for large resultsets.
q_with_offset = select(tbl).order_by(tbl.c.id).offset(10).fetch(10)
# A suitable where clause may be more efficient
q = (select(tbl)
.where(tbl.c.id > max_id_from_previous_query)
.order_by(tbl.c.id)
.fetch(10)
)
The syntax is supported in the ORM layer since v1.4.38. It is only supported for 2.0-style select on models; the legacy session.query syntax does not support it.
q = select(Model).order_by(Model.id).fetch(10)
* Currently Oracle, PostgreSQL and MSSQL.
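The last example above (filtering on the last id already seen instead of using an offset) is often called keyset pagination. Below is a minimal sqlite3 sketch of the loop around it; SQLite is not among the backends listed for FETCH, so the sketch swaps in LIMIT, and the item table and next_page helper are hypothetical.

```python
import sqlite3


def next_page(conn, after_id, size):
    """Keyset pagination: seek past the last id already seen instead of using OFFSET."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT id FROM item WHERE id > ? ORDER BY id LIMIT ?",
            (after_id, size),
        )
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)", [(i,) for i in range(1, 8)])

pages = []
last_id = 0  # assumes ids are positive integers
while True:
    page = next_page(conn, last_id, 3)
    if not page:
        break
    pages.append(page)
    last_id = page[-1]  # remember where this page ended

print(pages)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Because each query seeks directly to id > last_id using the primary key, later pages cost about the same as the first, unlike a growing OFFSET.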
In my case, this works:
def get_members():
    m = Member.query[:30]
    return m

what is the equivalent ORM query in Django for sql join

I have two Django models that have no relation to each other but share a JID column (I have not made it a foreign key):
class result(models.Model):
    rid = models.IntegerField(primary_key=True, db_column='RID')
    jid = models.IntegerField(null=True, db_column='JID', blank=True)
    test_case = models.CharField(max_length=135, blank=True)

class job(models.Model):
    jid = models.IntegerField(primary_key = True, db_column='JID')
    client_build = models.IntegerField(max_length=135,null=True, blank=True)
I want to achieve this sql query in ORM:
SELECT *
FROM result
JOIN job
ON job.JID = result.JID
Basically I want to join two tables and then perform a filter query on that table.
I am new to ORM and Django.
jobs = job.objects.filter(
    jid__in=result.objects.values('jid').distinct()
).select_related()
I don't know how to do that in the Django ORM, but here are my 2 cents:
Any ORM makes 99% of your queries super easy to write (without any SQL). For the remaining 1%, you've got 2 options: understand the core of the ORM and add custom code, or simply write pure SQL. I'd suggest writing the SQL query for this one.
If both the result and job tables have a JID, why not make it a foreign key? I find that odd.
Class names should start with an uppercase letter: class *R*esult, class *J*ob.
You can represent a foreign key in Django models by modifying your result class like this:
class result(models.Model):
    rid = models.IntegerField(primary_key=True, db_column='RID')
    # jid = models.IntegerField(null=True, db_column='JID', blank=True)
    job = models.ForeignKey(job, on_delete=models.CASCADE, db_column='JID', blank=True, null=True, related_name="results")
    test_case = models.CharField(max_length=135, blank=True)
(To make a foreign key optional in Django, set both null=True, which allows NULL in the database column, and blank=True, which lets forms and validation accept an empty value.)
Now you can access the job of a result simply by writing:
myresult.job # assuming myresult is an instance of class result
With the parameter related_name="results", a new field will automatically be added to the class job by Django, so you will be able to write:
myjob.results
And obtain the results for the job myjob.
This does not mean the Django ORM will necessarily fetch it with a JOIN query (it will probably use a separate query instead), but the effect will be the same from your code's point of view (performance considerations aside).
You can find more information about models.ForeignKey in Django documentation.
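To see the difference between the JOIN in the question and the jid__in answer above, here is a plain sqlite3 sketch (not Django) using the question's table and column names with made-up rows: the JOIN returns one row per matching (result, job) pair, while an IN subquery returns each matching job only once and carries no result columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (JID INTEGER PRIMARY KEY, client_build INTEGER);
    CREATE TABLE result (RID INTEGER PRIMARY KEY, JID INTEGER, test_case TEXT);
    INSERT INTO job VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO result VALUES (10, 1, 'login'), (11, 1, 'logout'), (12, 2, 'search');
""")

# The JOIN from the question: one row per matching (result, job) pair.
joined = conn.execute(
    "SELECT result.RID, result.test_case, job.client_build "
    "FROM result JOIN job ON job.JID = result.JID ORDER BY result.RID"
).fetchall()

# Roughly what jid__in=result.objects.values('jid') asks for:
# only the job rows that have at least one result, each listed once.
jobs_with_results = conn.execute(
    "SELECT JID, client_build FROM job "
    "WHERE JID IN (SELECT JID FROM result) ORDER BY JID"
).fetchall()

print(joined)             # [(10, 'login', 100), (11, 'logout', 100), (12, 'search', 200)]
print(jobs_with_results)  # [(1, 100), (2, 200)]
```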