SQLAlchemy db.session.query() vs model.query

For a simple query that returns all results, should one method be preferred over the other? I can find uses of both online but can't really find anything describing the differences.
db.session.query([my model name]).all()
[my model name].query.all()
I feel that [my model name].query.all() is more descriptive.

It is hard to give a clear answer, as there is a high degree of preference subjectivity in answering this question.
From one perspective, the db.session is desired, because the second approach requires it to be incorporated in your model as an added step - it is not there by default as part of the Base class. For instance:
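from sqlalchemy import Column, Integer, String
# declarative_base lives in sqlalchemy.orm as of SQLAlchemy 1.4
from sqlalchemy.orm import declarative_base, scoped_session, sessionmaker
Base = declarative_base()
DBSession = scoped_session(sessionmaker())
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)
session = DBSession()
print(User.query)  # the model has no query attribute yet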
That code fails with the following error:
AttributeError: type object 'User' has no attribute 'query'
You need to do something like this:
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)
    query = DBSession.query_property()
However, it could also be argued that just because it is not enabled by default, that doesn't invalidate it as a reasonable way to launch queries. Furthermore, in the flask-sqlalchemy package (which simplifies sqlalchemy integration into the flask web framework) this is already done for you as part of the Model class. Adding the query property to a model can also be seen in the sqlalchemy tutorial:
class User(object):
    query = db_session.query_property()
    ....
Thus, people could argue either approach.
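To see why User.query only exists once it has been wired in, here is a minimal pure-Python sketch of a class-level descriptor that behaves roughly like query_property(). FakeSession, FakeQuery and QueryProperty are hypothetical stand-ins invented for this illustration, not SQLAlchemy classes, and the real implementation differs in detail.

```python
class FakeQuery:
    """Hypothetical stand-in for a query bound to one model."""

    def __init__(self, session, model):
        self.session = session
        self.model = model

    def all(self):
        # Return every stored row that is an instance of the model.
        return [row for row in self.session.rows if isinstance(row, self.model)]


class FakeSession:
    """Hypothetical stand-in for a session that keeps rows in memory."""

    def __init__(self):
        self.rows = []

    def add(self, row):
        self.rows.append(row)

    def query(self, model):
        return FakeQuery(self, model)


class QueryProperty:
    """Descriptor: reading Model.query calls session.query(Model)."""

    def __init__(self, session):
        self.session = session

    def __get__(self, instance, owner):
        return self.session.query(owner)


session = FakeSession()


class User:
    query = QueryProperty(session)  # the "added step" on the model

    def __init__(self, name):
        self.name = name


session.add(User("alice"))
session.add(User("bob"))

# Both spellings build the same query against the same session.
via_session = [u.name for u in session.query(User).all()]
via_model = [u.name for u in User.query.all()]
print(via_session, via_model)
```

In this sketch the two spellings are interchangeable, which fits the point that the choice is mostly about readability.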
I personally have a preference for the second method when I am selecting from a single table. For example:
serv = Service.query.join(Supplier, SupplierUsr).filter(SupplierUsr.username == usr).all()
This is because it is of smaller line length and still easily readable.
If I am selecting from more than one table or specifying columns, then I would use the session query method, as it is extracting information from more than one model.
deliverables = db.session.query(Deliverable.column1, BatchInstance.column2).\
    join(BatchInstance, Service, Supplier, SupplierUser).\
    filter(SupplierUser.username == str(current_user)).\
    order_by(Deliverable.created_time.desc()).all()
That said, a counter argument could be made in always using the session.query method as it makes the code more consistent, and when reading left to right, the reader immediately knows that the sqlalchemy directive they are going to read will be query, before mentally absorbing what tables and columns are involved.
At the end of the day, the answer to your question is subjective and there is no correct answer, and any code readability benefits either way are tiny. The only place where I see a strong benefit is when you are selecting from many tables: there, avoid the model query and use the session.query method instead.

Related

What is the best practice for lookup values in SQLAlchemy?

I am writing a pretty basic Flask application using Flask-SQLAlchemy for tracking inventory and distribution. I could use some guidance on the best way to handle a lookup table for common values. My database back end will be MySQL and ElasticSearch for searches.
If I have a common mapping structure where all data going into a specific table, say Vehicle, have a common list of values to look up against for the Vehicle.make column, what would the best way to achieve this be?
My thought for approaching this is one of two ways:
Lookup Table
I could set something up similar to this where I have a relationship, and store the make in VehicleMake. However, if my expected list of makes is low (say 10), this seems unnecessary.
class VehicleMake(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(16))
    cars = relationship('Vehicle', backref='make', lazy='dynamic')
class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
Store as a String
I could just store this as a string on the Vehicle model. But would it be a waste of space to store a common value as a string?
class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    make = Column(String(16))
My original idea was just to have a dict containing a mapping like this and reference it as needed within the model. I am just not clear how to tie this in when returning the vehicle model.
MAKE_LIST = {
    1: 'Ford',
    2: 'Dodge',
    3: 'Chevrolet'
}
Any feedback is welcome - and if there is documentation that covers this specific scenario I'm happy to read that and answer this question myself. My expected volume is going to be low (40-80 records per week) so it doesn't need to be ridiculously fast, I just want to follow best practices.

The short answer is it depends.
The long answer is that it depends on what you store along with the make of said vehicles and how often you expect to add new types.
If you need to store more than just the name of each make, but also some additional metadata, like the size of the gas tank, the cargo space, or even a sortkey, go for an additional table. The overhead of such a small table is minimal, and if you communicate with the frontend using make ids instead of make names, there is no problem at all with this. Just remember to add an index to vehicle.make_id to make the lookups efficient.
class VehicleMake(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(16))
    cars = relationship('Vehicle', back_populates='make', lazy='dynamic')
class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    make_id = Column(Integer, ForeignKey('vehicle_make.id'), nullable=False)
    # Must name the VehicleMake class and pair with VehicleMake.cars
    make = relationship('VehicleMake', back_populates='cars', innerjoin=True)
Vehicle.query.get(1).make.name  # == 'Ford', the make for vehicle 1
Vehicle.query.filter(Vehicle.make_id == 2).all()  # all Vehicles with make id 2
Vehicle.query.join(VehicleMake)\
    .filter(VehicleMake.name == 'Ford').all()  # all Vehicles with make name 'Ford'
If you don't need to store any of that metadata, then the need for a separate table disappears. However, the general problem with strings is that there is a high risk of spelling errors and capital/lowercase letters screwing up your data consistency. If you don't need to add new makes often, it's a lot better to just use enums; there are even MySQL-specific ones in SQLAlchemy.
import enum
class VehicleMake(enum.Enum):
    FORD = 1
    DODGE = 2
    CHEVROLET = 3
class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    make = Column(Enum(VehicleMake), nullable=False)
Vehicle.query.get(1).make.name  # == 'FORD', the make for vehicle 1
Vehicle.query.filter(Vehicle.make == VehicleMake(2)).all()  # all Vehicles with make id 2
Vehicle.query.filter(Vehicle.make == VehicleMake.FORD).all()  # all Vehicles with make name 'Ford'
The main drawback of enums is that they can be hard to extend with new values. At least for Postgres, the dialect-specific version was a lot better at this than the general SQLAlchemy one; for MySQL, have a look at sqlalchemy.dialects.mysql.ENUM instead. If you want to extend your existing enum, you can always just execute raw SQL in your Flask-Migrate/Alembic migrations.
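The lookups in the enum example rely on plain enum.Enum behaviour, which can be checked without a database. The sketch below uses only the standard library; parse_make is a hypothetical helper added for illustration, and the dropdown list is just one way the members might be presented to a user.

```python
import enum


class VehicleMake(enum.Enum):
    FORD = 1
    DODGE = 2
    CHEVROLET = 3


def parse_make(raw):
    """Hypothetical helper: map user input such as ' ford ' to a member.

    Raises ValueError for anything that is not a known make.
    """
    key = raw.strip().upper()
    try:
        return VehicleMake[key]  # lookup by member name
    except KeyError:
        raise ValueError(f"unknown vehicle make: {raw!r}") from None


by_value = VehicleMake(2)        # lookup by value
by_name = parse_make(" ford ")   # lookup by normalized name
dropdown = [m.name.title() for m in VehicleMake]  # options for a dropdown
print(by_value, by_name, dropdown)
```

Because unknown input raises instead of being stored, misspelled makes are rejected before they reach the database.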
Finally, the benefit of using strings is that you can always programmatically enforce your data consistency. But this comes at the cost that you have to programmatically enforce your data consistency. If the vehicle make can be changed or inserted by external users, even colleagues, this will get you in trouble unless you're very strict about what enters your database. For example, it might be nice to uppercase all values for easy grouping, since it effectively reduces how much can go wrong. You can do this during writing, or you can add an index on sqlalchemy.func.upper(Vehicle.make) and use hybrid properties to always query the uppercase value.
from sqlalchemy import func
from sqlalchemy.ext.hybrid import hybrid_property
class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    _make = Column('make', String(16))
    @hybrid_property
    def make(self):
        return self._make.upper()
    @make.expression
    def make(cls):
        return func.upper(cls._make)
Vehicle.query.get(1).make  # == 'FORD', the make for vehicle 1
Vehicle.query.filter(Vehicle.make == 'FORD').all()  # all Vehicles with make name 'FORD'
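What the hybrid property and the index on the uppercased column amount to at the SQL level can be sketched with the standard-library sqlite3 module. The table layout, the index name ix_vehicle_make_upper and the sample rows are assumptions made for this illustration; MySQL's syntax and its support for indexes on expressions differ, so treat this as a sketch only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (id INTEGER PRIMARY KEY, name TEXT, make TEXT)")
# Index on the uppercased value (expression indexes need SQLite 3.9+).
conn.execute("CREATE INDEX ix_vehicle_make_upper ON vehicle (upper(make))")
conn.executemany(
    "INSERT INTO vehicle (name, make) VALUES (?, ?)",
    [("F-150", "Ford"), ("Focus", "FORD"), ("Ram", "dodge")],
)

# Comparing upper(make) makes inconsistent capitalisation irrelevant.
rows = conn.execute(
    "SELECT name FROM vehicle WHERE upper(make) = ? ORDER BY id", ("FORD",)
).fetchall()
names = [name for (name,) in rows]
print(names)  # prints ['F-150', 'Focus']
```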
Before you make your choice, also think about how you want to present this to your user. If they should be able to add new options themselves, use strings or the separate table. If you want to show a dropdown of possibilities, use the enum or the table. If you have an empty database, it's going to be difficult to collect all string values to display in the frontend without needing to store this as a list somewhere in your Flask environment as well.

Can not mix get and filter together?

When I try to run the following query:
answer = sess.query(User).filter(User.id==1).get(1)
I get the error: sqlalchemy.exc.InvalidRequestError: Query.get() being called on a Query with existing criterion.
The query:
answer = sess.query(User).get(1)
works fine.
Why does the first one not work?
My class definition:
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    adr = relationship('Address', backref='uuu')

From documentation of Query.get:
get() is only used to return a single mapped instance, not multiple instances or individual column constructs, and strictly on a single primary key value. The originating Query must be constructed in this way, i.e. against a single mapped entity, with no additional filtering criterion. Loading options via options() may be applied however, and will be used if the object is not yet locally present.
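In practice this means choosing one of the two styles rather than chaining them. The following is a minimal sketch, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database; the engine URL and the sample row are assumptions for illustration, and the Address relationship from the question is left out to keep the example self-contained.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)
sess = sessionmaker(bind=engine)()
sess.add(User(id=1, name="alice"))
sess.commit()

# Option 1: primary-key lookup on its own (Session.get needs 1.4+).
by_pk = sess.get(User, 1)

# Option 2: filtering on its own, ending with a single-row method.
by_filter = sess.query(User).filter(User.id == 1).one_or_none()

# Both return the same object from the session's identity map.
print(by_pk is by_filter)
```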

SQLAlchemy Migrating from ORM to Core

When we originally built our app, we were using SQLAlchemy ORM, but as time goes on, we're getting more and more frustrated with its intricacies and overhead and want to move to something faster and more explicit in making queries performant. In some early tests, it looks like Core hits those needs, so we'd like to start making the switch.
All of our current models are defined using Declarative, for example:
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    autocomplete_crosses = relationship(u'Parent')
class Parent(Base):
    __tablename__ = 'parents'
    id = Column(Integer, primary_key=True)
    user = Column(ForeignKey('users.id'), nullable=False)
    name = Column(String(100), nullable=False)
    users = relationship('User')
Is there an easier way of using these with Declarative, aside from needing to call __table__ every time? Here's what I'm currently doing, which feels verbose and ugly:
s = select([User.__table__.c.name, Parent.__table__.c.name])\
    .select_from(User.__table__.join(Parent.__table__))
Can I access my table names directly and just use those to make this cleaner?

If you'd like to do it automatically for all tables, you can reflect it after you've defined all your classes:
for cls in Base._decl_class_registry.values():
    if hasattr(cls, "__table__") and cls.__table__ is not None:
        globals()[cls.__table__.name] = cls.__table__
I don't recommend actually doing this because it's very magical. In particular, it breaks linters because it modifies globals(), but this is how you would do it.

Django GenericForeignKey vs. custom manual fields performance/optimization

I'm trying to build a typical social networking site. There are mainly two types of objects:
photo
status
A user can like a photo or a status. (Note that these two are mutually exclusive.)
That is, we have two tables: one for images only and the other for statuses only.
Now, when a user likes an object (it could be a photo or a status), how should I store that information?
I want to design an efficient SQL schema for this.
Currently I'm using GenericForeignKey (GFK):
class LikedObject(models.Model):
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
But yesterday I wondered whether I could do this efficiently without using GFK:
class LikedObject(models.Model):
    OBJECT_TYPE = (
        ('status', 'Status'),
        ('image', 'Image/Photo'),
    )
    user = models.ForeignKey(User, related_name="liked_objects")
    obj_id = models.PositiveIntegerField()
    obj_type = models.CharField(max_length=63, choices=OBJECT_TYPE)
The only difference I can see is that I have to make two queries if I want to get all liked statuses of a particular user:
status_ids = LikedObject.objects.filter(user=user_obj, obj_type='status').values_list('obj_id', flat=True)
status_objs = Status.objects.filter(id__in=status_ids)
Am I correct? What would be the best approach in terms of easy querying/inserting, performance, etc.?

You are basically implementing your own Generic Object, only you limit your ContentType to your hard coded OBJECT_TYPE.
If you are only going to access the database as in your example (get all status objects liked by user x), or a couple specific queries, then your own implementation can be a little faster, of course. But obviously, if later you have to add more objects, or do other things, you may find yourself implementing your whole full generic solution. And like they say, why reinvent the wheel.
If you want better performance, and really only have those two Models to worry about, you may just want to have two different Like tables (StatusLike and ImageLike) and use inheritance to share functionality.
class LikedObject(models.Model):
    common_field = ...
    class Meta:
        abstract = True
    def some_share_function(self):
        ...
class StatusLikeObject(LikedObject):
    user = models.ForeignKey(User, related_name="status_liked_objects")
    status = models.ForeignKey(Status, related_name="liked_objects")
class ImageLikeObject(LikedObject):
    user = models.ForeignKey(User, related_name="image_liked_objects")
    image = models.ForeignKey(Image, related_name="liked_objects")
Basically, either you have a lot of Models to worry about, and then you probably want to use the more Django generic object implementation, or you only have two models, and why even bother with a half generic solution. Just use two tables.

In this case, I would check if your data objects Status and Photo may have many common data fields, e.g. Status.user and Photo.user, Status.title and Photo.title, Status.pub_date and Photo.pub_date, Status.text and Photo.caption, etc.
Could you combine them into an Item object maybe? That Item would have a Item.type field, either "photo" or "status"? Then you would only have a single table and a single object type a user can "like". Much simpler at basically no cost.
Edit:
from django.db import models
from django.utils.timezone import now
class Item(models.Model):
    data_type = models.SmallIntegerField(
        choices=((1, 'Status'), (2, 'Photo')), default=1)
    user = models.ForeignKey(User)
    title = models.CharField(max_length=100)
    pub_date = models.DateTimeField(default=now)
    # ...etc...
class Like(models.Model):
    user = models.ForeignKey(User, related_name="liked_objects")
    item = models.ForeignKey(Item)

Edit orm object based on query with label fields

Time for more pushing the limits of sqlalchemy. It never ceases to amaze!
Background
I have a table for devices, and a table to record physical links between them.
class Device(Base):
    __tablename__ = "device"
    device_id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(255), nullable=False)
class PhysicalLink(Base):
    __tablename__ = "physical_link"
    physical_links_id = sa.Column(sa.Integer, primary_key=True)
    device_id_1 = sa.Column(sa.types.Integer, sa.ForeignKey(Device.device_id), nullable=False)
    device_port_1 = sa.Column(sa.String(255), nullable=False)
    device_id_2 = sa.Column(sa.types.Integer, sa.ForeignKey(Device.device_id), nullable=False)
    device_port_2 = sa.Column(sa.String(255), nullable=False)
    cable_number = sa.Column(sa.String(255), nullable=False)
When I am dealing with the physical links for a known device, I don't want to always need if statements to decide whether I should be looking at device_[id|port]_1 or device_[id|port]_2, so I did:
physical_links_table = PhysicalLink.__table__  # the mapped class is PhysicalLink
physical_links_ua = union_all(
    select((
        physical_links_table.c.physical_links_id,
        label('this_device_id', physical_links_table.c.device_id_1),
        label('this_device_port', physical_links_table.c.device_port_1),
        label('other_device_id', physical_links_table.c.device_id_2),
        label('other_device_port', physical_links_table.c.device_port_2),
        physical_links_table.c.cable_number,
    ),),
    select((
        physical_links_table.c.physical_links_id,
        label('this_device_id', physical_links_table.c.device_id_2),
        label('this_device_port', physical_links_table.c.device_port_2),
        label('other_device_id', physical_links_table.c.device_id_1),
        label('other_device_port', physical_links_table.c.device_port_1),
        physical_links_table.c.cable_number,
    ),),
).alias('physical_links_ua')
class PhysicalLinksDir(object):
    pass
physical_links_dir_mapper = orm.mapper(PhysicalLinksDir, physical_links_ua)
physical_links_dir_mapper.add_property(
    'this_device', orm.relation(Device, primaryjoin=(PhysicalLinksDir.this_device_id == Device.device_id)))
physical_links_dir_mapper.add_property(
    'other_device', orm.relation(Device, primaryjoin=(PhysicalLinksDir.other_device_id == Device.device_id)))
This allows me to do:
physical_links = (db_session
    .query(PhysicalLinksDir)
    .filter(PhysicalLinksDir.this_device_id == my_device.device_id)
    .options(joinedload('other_device')))
for pl in physical_links:
    print(pl.other_device)
(Did I remember to tell you that I think that sqlalchemy rocks!)
Question
What do I need to do to make it possible to modify PhysicalLinksDir instance attributes, and be able to commit them back to the db?

In general, you will have to be very careful with updating it the way you want, because those view objects PhysicalLinksDir will not always be in-sync with the underlying Device and PhysicalLink you might have in session/database.
I obviously do not know your requirements, but I prefer not to have such inconsistencies when working with my model.
Also, there is a problem with the kind of mapping you have. You would expect to have 2 rows of PhysicalLinksDir for each row of PhysicalLink (one for each side), but if you try it, you will see this is not the case. The reason for this is that the first column (physical_links_id) is considered to be a primary_key so the query object will discard the second one with the same value.
In order to fix it, you need to configure the primary_key manually. Assuming there can be only one connection between two different Devices, the solution below will do the trick. You might need to extend it to include the port as well:
physical_links_dir_mapper = orm.mapper(PhysicalLinksDir, physical_links_ua,
    # @note: add this
    primary_key=[physical_links_ua.c.physical_links_id, physical_links_ua.c.this_device_id],
)
DELETE: Now, to support delete, all you need to do is to add a relationship between your PLD and the actual PhysicalLink and the session.delete(my_PLD); session.commit() will also delete the PhysicalLink it represents:
physical_links_dir_mapper.add_property(
    'physical_link', orm.relation(PhysicalLink, primaryjoin=(
        PhysicalLinksDir.physical_links_id == PhysicalLink.physical_links_id),
        foreign_keys=[PhysicalLinksDir.physical_links_id]
    ))
But in fact, the deletion might work out of the box as the model is soft-linked to the physical_link table.
INSERT: Well, this is easily done with the PhysicalLink object directly, so I would just keep it this way.
UPDATE: You could probably achieve this with Session Events, but the simplest way would be to wrap all the attributes in a @property that delegates the change to the proper object.
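The property-delegation idea can be sketched in plain Python, independent of the mapper setup. Here PhysicalLink is a hypothetical unmapped stand-in for the real mapped class, and PhysicalLinkView with its flipped flag is invented for this illustration; in the real application the underlying object would be the mapped PhysicalLink, and the change would be committed through the session.

```python
class PhysicalLink:
    """Hypothetical plain-Python stand-in for the mapped PhysicalLink."""

    def __init__(self, device_port_1, device_port_2, cable_number):
        self.device_port_1 = device_port_1
        self.device_port_2 = device_port_2
        self.cable_number = cable_number


class PhysicalLinkView:
    """Directional view; writes are delegated to the underlying link."""

    def __init__(self, link, flipped):
        self._link = link
        self._flipped = flipped  # True when "this" side is side 2

    @property
    def this_device_port(self):
        side = "device_port_2" if self._flipped else "device_port_1"
        return getattr(self._link, side)

    @this_device_port.setter
    def this_device_port(self, value):
        side = "device_port_2" if self._flipped else "device_port_1"
        setattr(self._link, side, value)

    @property
    def cable_number(self):
        return self._link.cable_number

    @cable_number.setter
    def cable_number(self, value):
        self._link.cable_number = value


link = PhysicalLink("eth0", "eth7", "C-100")
view = PhysicalLinkView(link, flipped=True)
view.this_device_port = "eth9"  # lands on link.device_port_2
view.cable_number = "C-200"     # lands on link.cable_number
print(link.device_port_1, link.device_port_2, link.cable_number)
```

Since every write ends up on the single underlying object, both directional views of the same link stay consistent with it.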
IMPORTANT: I still think that this way of working is not really nice, because the links are not updated automatically and your in-memory UnitOfWork might be inconsistent.
It would also be useful to understand why you think this way of working with your objects would be better. What are the use cases of this app?
