SQLAlchemy query to return only n results? - sqlalchemy

I have been googling and reading through the SQLAlchemy documentation but haven't found what I am looking for.
I am looking for a function in SQLAlchemy that limits the number of results returned by a query to a certain number, for example: 5? Something like first() or all().

for sqlalchemy >= 1.0.13
Use the limit method.
query(Model).filter(something).limit(5).all()

Alternative syntax
query.(Model).filter(something)[:5].all()

If you need it for pagination you can do like this:
query = db.session.query(Table1, Table2, ...).filter(...)
if page_size is not None:
query = query.limit(page_size)
if page is not None:
query = query.offset(page*page_size)
query = query.all()
Or if you query one table and have a model for it you can:
query = (Model.query(...).filter(...))
.paginate(page=start, per_page=size))

Since v1.4, SQLAlchemy core's select function provides a fetch method for RDBMS that support FETCH clauses*. FETCH was defined in the SQL 2008 standard to provide a consistent way to request a partial result, as LIMIT/OFFSET is not standard.
Example:
# As with limit queries, it's usually sensible to order
# the results to ensure results are consistent.
q = select(tbl).order_by(tbl.c.id).fetch(10)
# Offset is supported, but it is inefficient for large resultsets.
q_with_offset = select(tbl).order_by(tbl.c.id).offset(10).fetch(10)
# A suitable where clause may be more efficient
q = (select(tbl)
.where(tbl.c.id > max_id_from_previous_query)
.order_by(tbl.c.id)
.fetch(10)
)
The syntax is supported in the ORM layer since v1.4.38. It is only supported for 2.0-style select on models; the legacy session.query syntax does not support it.
q = select(Model).order_by(Model.id).fetch(10)
* Currently Oracle, PostgreSQL and MSSQL.

In my case it works like
def get_members():
m = Member.query[:30]
return m

Related

SQL check if list parameter is null in different RDBMS

I am creating an app which performs raw queries across different databases and I am struggling with list parameters (IN).
I use SQLAlchemy for performing these queries.
I want to perform a query that accepts list parameter and that parameter might be NULL, which means I don't have to filter by field.
from sqlalchemy import create_engine, text
SQL = """SELECT group, count(1) cnt
FROM some_table
WHERE group IN :groups OR :groups IS NULL
GROUP BY group
"""
params = {'groups': ('group1', 'group2')}
engine = create_engine(connection_string)
query = text(SQL).bindparams(**params)
cursor = engine.execute(query)
Currently I'm testing it on PostgreSQL, MySQL and SQLite, but in production mode it is also supposed to work with SQL Server and Oracle.
The code above works only on PostgreSQL, however if I change params with None
params = {'groups': None}
The code wouldn't work on any databases.
Is there workaround for this problem?
I understand that solution might be specific for each RDBMS.

Retrieving ultimate sql query sentence (with the values in place of any '?')

Since it may be efficient to paste a flawed sql query directly into a database administration tool such as phpmyadmin in order to work on it until it returns the expected result,
Is there any way to retrieve the ultimate sql sentence Sqlalchemy Core supposedly passes to the MySql database, in a ready-to-execute shape ?
This typically means that you want the bound parameters to be rendered inline. There is limited support for this automatically (as of SQLA 0.9 this will work):
from sqlalchemy.sql import table, column, select
t = table('x', column('a'), column('b'))
stmt = select([t.c.a, t.c.b]).where(t.c.a > 5).where(t.c.b == 10)
print(stmt.compile(compile_kwargs={"literal_binds": True}))
also you'd probably want the query to be MySQL specific, so if you already have an engine lying around you can pass that in too:
from sqlalchemy import create_engine
engine = create_engine("mysql://")
print(stmt.compile(engine, compile_kwargs={"literal_binds": True}))
and it prints:
SELECT x.a, x.b
FROM x
WHERE x.a > 5 AND x.b = 10
now, if you have more elaborate values in the parameters, like dates, SQLAlchemy might throw an error, it only has "literal binds" renderers for a very limited number of types. An approach that bypasses that system instead and gives you a pretty direct shot at turning those parameters into strings is then do to a "search and replace" on the statement object, replacing the bound parameters with literal strings:
from sqlalchemy.sql import visitors, literal_column
from sqlalchemy.sql.expression import BindParameter
def _replace(arg):
if isinstance(arg, BindParameter):
return literal_column(
repr(arg.effective_value) # <- do any fancier conversion here
)
stmt = visitors.replacement_traverse(stmt, {}, _replace)
once you do that you can just print it:
print(stmt)
or the MySQL version:
print(stmt.compile(engine))

Django admin MySQL slow INNER JOIN

I have a simple model with 3 ForeignKey fields.
class Car(models.Model):
wheel = models.ForeignKey('Wheel', related_name='wheels')
created = models.DateTimeField(auto_now_add=True)
max_speed = models.PositiveSmallIntegerField(null=True)
dealer = models.ForeignKey('Dealer')
category = models.ForeignKey('Category')
For the list view in the django admin i get 4 queries. One of them is a SELECT with 3 INNER JOINS. That one query is way to slow. Replacing the INNER JOINs with STRAIGHT_JOIN would fix the issue. Is there a way to patch the admin generated query just before it is evaluated?
I've implemented a fix for INNER JOIN for Django ORM, it will use STRAIGHT_JOIN in case of ordering with INNER JOINs. I talked to Django core-devs and we decided to do this as a separate backend for now. So you can check it out here: https://pypi.python.org/pypi/django-mysql-fix
However, there is one other workaround. Use a snippet from James's answer, but replace select_related with:
qs = qs.select_related('').prefetch_related('wheel', 'dealer', 'category')
It will cancel INNER JOIN and use 4 separate queries: 1 to fetch cars and 3 others with car_id IN (...).
UPDATE:
I've found one more workaround. Once you specify null=True in your ForeignKey field, Django will use LEFT OUTER JOINs instead of INNER JOIN. LEFT OUTER JOIN works without performance issues in this case, but you may face other issues that I'm not aware of yet.
You may just specify list_select_related = () to prevent django from using inner join:
class CarAdmin(admin.ModelAdmin):
list_select_related = ()
You could overwrite
def changelist_view(self, request, extra_context=None):
method in your admin class inherited from ModelAdmin class
something like this(but this question is rather old):
Django Admin: Getting a QuerySet filtered according to GET string, exactly as seen in the change list?
Ok, I found a way to patch the admin generated Query. It is ugly but it seems to work:
class CarChangeList(ChangeList):
def get_results(self, request):
"""Override to patch ORM generated SQL"""
super(CarChangeList, self).get_results(request)
original_qs = self.result_list
sql = str(original_qs.query)
new_qs = Car.objects.raw(sql.replace('INNER JOIN', 'STRAIGHT_JOIN'))
def patch_len(self):
return original_qs.count()
new_qs.__class__.__len__ = patch_len
self.result_list = new_qs
class CarAdmin(admin.ModelAdmin):
list_display = ('wheel', 'max_speed', 'dealer', 'category', 'created')
def get_changelist(self, request, **kwargs):
"""Return custom Changelist"""
return CarChangeList
admin.site.register(Rank, RankAdmin)
I came across the same issue in the Django admin (version 1.4.9) where fairly simple admin listing pages were very slow when backed by MySQL.
In my case it was caused by the ChangeList.get_query_set() method adding an overly-broad global select_related() to the query set if any fields in list_display were many-to-one relationships. For a proper database (cough PostgreSQL cough) this wouldn't be a problem, but it was for MySQL once more than a few joins were triggered this way.
The cleanest solution I found was to replace the global select_related() directive with a more targeted one that only joined tables that were really necessary. This was easy enough to do by calling select_related() with explicit relationship names.
This approach likely ends up swapping in-database joins for multiple follow-up queries, but if MySQL is choking on the large query many small ones may be faster for you.
Here's what I did, more or less:
from django.contrib.admin.views.main import ChangeList
class CarChangeList(ChangeList):
def get_query_set(self, request):
"""
Replace a global select_related() directive added by Django in
ChangeList.get_query_set() with a more limited one.
"""
qs = super(CarChangeList, self).get_query_set(request)
qs = qs.select_related('wheel') # Don't join on dealer or category
return qs
class CarAdmin(admin.ModelAdmin):
def get_changelist(self, request, **kwargs):
return CarChangeList
I've had slow admin queries on MySQL and found the easiest solution was to add STRAIGHT_JOIN to the query. I figured out a way to add this to a QuerySet rather than being forced to go to .raw(), which won't work with the admin, and have open sourced it as part of django-mysql. You can then just:
def get_queryset(self, request):
qs = super(MyAdmin, self).get_queryset(request)
return qs.straight_join()
MySQL still has this problem even in version 8 and Django still doesn't allow you to add STRAIGHT_JOIN in the query set. I found a hackish solution to add STRAIGHT_JOIN...:
This was tested with Django 2.1 and MySQL 5.7 / 8.0
def fixQuerySet(querySet):
# complete the SQL with params encapsulated in quotes
sql, params = querySet.query.sql_with_params()
newParams = ()
for param in params:
if not str(param).startswith("'"):
if isinstance(param, str):
param = re.sub("'", "\\'", param)
newParams = newParams + ("'{}'".format(param),)
else:
newParams = newParams + (param,)
rawQuery = sql % newParams
# escape the percent used in SQL LIKE statements
rawQuery = re.sub('%', '%%', rawQuery)
# replace SELECT with SELECT STRAIGHT_JOIN
rawQuery = rawQuery.replace('SELECT', 'SELECT STRAIGHT_JOIN')
return querySet.model.objects.raw(rawQuery)
Important: This method returns a raw query set so should be called just before consuming the query set

Django queryset count with extra select

I have a model with PointField for location coordinates. I have a MySQL function that calculates the distance between two points called dist. I use extra() "select" to calculate distance for each returned object in the queryset. I also use extra() "where" to filter those objects that are within a specific range. Like this
query = queryset.extra(
select={
"distance":"dist(geomfromtext('%s'),geomfromtext('%s'))"%(loc1, loc2)
},
where=["1 having `distance` <= %s"%(km)]
) #simplified example
This works fine for getting and reading the results, except when I try counting the resultset I get the error that 'distance' is not a field. After exploring a bit further, it seems that count ignores the "select" from extra and just uses "where". While the full SQL query looks like this:
SELECT (dist(geomfromtext('POINT (-4.6858300000000003 36.5154300000000021)'),geomfromtext('POINT (-4.8858300000000003 36.5154300000000021)'))) AS `distance`, `testmodel`.`id`, `testmodel`.`name`, `testmodel`.`email`, (...) FROM `testmodel` WHERE 1 having `distance` <= 50.0
The count query is much shorter and doesn't have the dist selection part:
SELECT COUNT( `testmodel`.`id`) FROM `testmodel` WHERE 1 having `distance` <= 50.0
Logically, MySQL gives an error because "distance" is undefined. Is there a way to tell Django it has to include the extra select for the count?
Thanks for any ideas!
You could use a raw query if you are not plannig to use any other database system.
params = {'point1':wktpoint1, 'point2':wktpoint2}
query = """
SELECT
dist(%(point1)s, %(point2)s)
FROM
testmodel
;"""
query_set = self.raw(query, params)
Also, if you need more GIS support, you should evaluate PostgreSQL+PostGIS (If you don't like to reinvent the wheel, you should not make your own dist function)
Django offers GIS support through GeoDjango. There you got functions like distance. You should check support here
In order to use GeoDjango you need to add a field on yout model, to tell them to use the GeoManager, Then you can start doing geoqueries, and you should have no problems with count.
with mysql you cando something like this using geodjango
### models.py
from django.contrib.gis.db import models
class YourModel(models.Model):
your_geo_field=models.PolygonField()
#your_geo_field=models.PointField()
#your_geo_field=models.GeometryField()
objects = models.GeoManager()
### your code
from django.contrib.gis.geos import *
from django.contrib.gis.measure import D
a_geom=fromstr('POINT(-96.876369 29.905320)', srid=4326)
distance=5
YoourModel.objects.filter(your_geo_field__distance_lt=(a_geom, D(m=distance))).count()
you can see better examples here and the reference here

How best to retrieve result of SELECT COUNT(*) from SQL query in Java/JDBC - Long? BigInteger?

I'm using Hibernate but doing a simple SQLQuery, so I think this boils down to a basic JDBC question. My production app runs on MySQL but my test cases use an in memory HSQLDB. I find that a SELECT COUNT operation returns BigInteger from MySQL but Long from HSQLDB.
MySQL 5.5.22
HSQLDB 2.2.5
The code I've come up with is:
SQLQuery tq = session.createSQLQuery(
"SELECT COUNT(*) AS count FROM calendar_month WHERE date = :date");
tq.setDate("date", eachDate);
Object countobj = tq.list().get(0);
int count = (countobj instanceof BigInteger) ?
((BigInteger)countobj).intValue() : ((Long)countobj).intValue();
This problem of the return type negates answers to other SO questions such as getting count(*) using createSQLQuery in hibernate? where the advice is to use setResultTransformer to map the return value into a bean. The bean must have a type of either BigInteger or Long, and fails if the type is not correct.
I'm reluctant to use a cast operator on the 'COUNT(*) AS count' portion of my SQL for fear of database interoperability. I realise I'm already using createSQLQuery so I'm already stepping outside the bounds of Hibernates attempts at database neutrality, but having had trouble before with the differences between MySQL and HSQLDB in terms of database constraints
Any advice?
I don't known a clear solution for this problem, but I will suggest you to use H2 database for your tests.
H2 database has a feature that you can connect using a compatibility mode to several different databases.
For example to use MySQL mode you connect to the database using this jdbc:h2:~/test;MODE=MySQL URL.
You can downcast to Number and then call the intValue() method. E.g.
SQLQuery tq = session.createSQLQuery("SELECT COUNT(*) AS count FROM calendar_month WHERE date = :date");
tq.setDate("date", eachDate);
Object countobj = tq.list().get(0);
int count = ((Number) countobj).intValue();
Two ideas:
You can get result value as String and then parse it to Long or BigInteger
Do not use COUNT(*) AS count FROM ..., better use something like COUNT(*) AS cnt ... but in your example code you do not use name of result column but it index, so you can use simply COUNT(*) FROM ...