When we originally built our app, we were using SQLAlchemy ORM, but as time goes on, we're getting more and more frustrated with its intricacies and overhead and want to move to something faster and more explicit in making queries performant. In some early tests, it looks like Core hits those needs, so we'd like to start making the switch.
All of our current models are defined using Declarative, for example:
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    autocomplete_crosses = relationship(u'Parent')

class Parent(Base):
    __tablename__ = 'parents'
    id = Column(Integer, primary_key=True)
    user = Column(ForeignKey('users.id'), nullable=False)
    name = Column(String(100), nullable=False)
    users = relationship('User')
Is there an easier way of using these with declarative, aside from needing to call __table__ every time? Here's what I'm currently doing, which feels verbose and ugly:
s = select([User.__table__.c.name, Parent.__table__.c.name])\
    .select_from(User.__table__.join(Parent.__table__))
Can I access my table names directly and just use those to make this cleaner?
If you'd like to do it automatically for all tables, you can bind each table to a module-level name after you've defined all your classes:
for cls in Base._decl_class_registry.values():
    # Skip registry entries that are not mapped classes.
    if hasattr(cls, "__table__") and cls.__table__ is not None:
        globals()[cls.__table__.name] = cls.__table__
I don't recommend actually doing this because it's very magical. In particular, it breaks linters because it modifies globals(), but this is how you would do it.
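If the goal is only to avoid typing __table__ everywhere, a less magical option is to name the tables explicitly once. This is a minimal sketch, assuming SQLAlchemy 1.4 or newer (so `select()` takes columns positionally and `declarative_base` is importable from `sqlalchemy.orm`); the `users` and `parents` alias names are my own choice, and the relationships from the question are left out to keep it short.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)


class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)
    user = Column(ForeignKey("users.id"), nullable=False)
    name = Column(String(100), nullable=False)


# Explicit module-level aliases: linters can see them, no globals() tricks.
users = User.__table__
parents = Parent.__table__

# The join condition is inferred from the single foreign key between the tables.
stmt = select(users.c.name, parents.c.name).select_from(users.join(parents))
sql = str(stmt)
print(sql)
```

The same Table objects are also reachable by name through `Base.metadata.tables`, for example `Base.metadata.tables["users"]`, which avoids touching the private class registry.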
I'm trying to insert data retrieved by scraping into the DB created by the following models.
However, I realized that using Django's bulk_create, or the external library based on it, bulk_create_or_update, is likely to make the logic too complex. (I felt that the ORM should be used for simple CRUD, etc.)
So I'm thinking of using raw SQL to save the data, for both maintainability and speed.
I'm not familiar with SQL at all, so I'd like to get some advice from you guys.
What SQL code is preferable to this?
Each page to be scraped has multiple pieces of information, and there are multiple pages in total. I'd like to scrape all the pages first, add them to a dictionary, and then save them in a batch using sql, but I don't know the best way to do this.
from django.db import models
from django.forms import CharField

# Create your models here.

# province
class Prefecture(models.Model):
    name = models.CharField("都道府県名", max_length=10)

# city
class City(models.Model):
    prefecture = models.ForeignKey(Prefecture, on_delete=models.CASCADE, related_name='city')
    name = models.CharField("市区町村名", max_length=10)

# seller
class Client(models.Model):
    prefecture = models.ForeignKey(Prefecture, on_delete=models.CASCADE, related_name='client', null=True, blank=True)
    city = models.ForeignKey(City, on_delete=models.CASCADE, related_name='client', null=True, blank=True)
    department = models.CharField("部局", max_length=100)

# detail
class Project(models.Model):
    name = models.CharField("案件名", max_length=100)
    serial_no = models.CharField("案件番号", max_length=100, null=True, blank=True)
    client = models.ForeignKey(Client, on_delete=models.CASCADE, related_name='project')
    # etc...

# file
class AttachedFile(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE, related_name='attach_file')
    name = models.CharField(max_length=100)
    path = models.CharField(max_length=255)

# bid company
class Bidder(models.Model):
    name = models.CharField("入札業者名", max_length=100)
    prefecture = models.ForeignKey(Prefecture, on_delete=models.CASCADE, related_name='bidder', null=True, blank=True)
    city = models.ForeignKey(City, on_delete=models.CASCADE, related_name='bidder', null=True, blank=True)
    # etc...

# result
class BidResult(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE, related_name='bid_result')
    bidder = models.ForeignKey(Bidder, on_delete=models.CASCADE, related_name='bid_result')
I don't think you will get a drastic performance boost by using raw SQL instead of the ORM. The ORM can also be used for complex operations, and bulk create and bulk update are neither complex nor noticeably slower than the equivalent raw SQL. Things may get slow with the ORM when you fetch records into memory and then operate on them, but your case is only creating and updating, which can be done easily with the Django ORM. As for update_or_create, using an external library won't affect your performance, whereas using raw SQL for marginal speed gains may hurt your code's maintainability, since, as you said, you are not very familiar with raw SQL.
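To give a feel for what the raw-SQL route would involve, here is a minimal sketch of a batched "create or update" using only the standard-library sqlite3 module. Everything in it is an assumption for illustration: the sample data, the simplified project table, the UNIQUE constraint on serial_no, and SQLite itself (the upsert syntax needs SQLite 3.24 or newer and is spelled differently on other databases). It is not what Django generates for you.

```python
import sqlite3

# Hypothetical scrape result: one dict per project, collected across all pages.
scraped = [
    {"name": "Road repair", "serial_no": "A-001"},
    {"name": "Bridge survey", "serial_no": "A-002"},
    {"name": "Road repair (revised)", "serial_no": "A-001"},  # same serial_no again
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " serial_no TEXT UNIQUE)"
)

# One transaction, one parameterized statement, many rows.
# Rows whose serial_no already exists update the name instead of failing.
with conn:
    conn.executemany(
        "INSERT INTO project (name, serial_no) VALUES (:name, :serial_no) "
        "ON CONFLICT(serial_no) DO UPDATE SET name = excluded.name",
        scraped,
    )

rows = conn.execute(
    "SELECT serial_no, name FROM project ORDER BY serial_no"
).fetchall()
print(rows)
```

Note how much has to be decided by hand here (the conflict key, which columns to overwrite, the database-specific syntax), and that the foreign keys to Client and friends are not even covered yet. That is the maintainability cost mentioned above.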
I am writing a pretty basic Flask application using Flask-SQLAlchemy for tracking inventory and distribution. I could use some guidance on the best way to handle a lookup table for common values. My database back end will be MySQL, with ElasticSearch for searches.
If I have a common mapping structure where all data going into a specific table, say Vehicle, have a common list of values to look up against for the Vehicle.make column, what would the best way to achieve this be?
My thought for approaching this is one of two ways:
Lookup Table
I could set something up similar to this where I have a relationship, and store the make in VehicleMake. However, if my expected list of makes is low (say 10), this seems unnecessary.
class VehicleMake(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(16))
    cars = relationship('Vehicle', backref='make', lazy='dynamic')

class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
Store as a String
I could just store this as a string on the Vehicle model. But would it be a waste of space to store a common value as a string?
class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    make = Column(String(16))
My original idea was just to have a dict containing a mapping like this and reference it as needed within the model. I am just not clear how to tie this in when returning the vehicle model.
MAKE_LIST = {
    1: 'Ford',
    2: 'Dodge',
    3: 'Chevrolet'
}
Any feedback is welcome - and if there is documentation that covers this specific scenario I'm happy to read that and answer this question myself. My expected volume is going to be low (40-80 records per week) so it doesn't need to be ridiculously fast, I just want to follow best practices.
The short answer is it depends.
The long answer is that it depends on what you store along with the make of said vehicles and how often you expect to add new types.
If you need to store more than just the name of each make, but also some additional metadata, like the size of the gas tank, the cargo space, or even a sortkey, go for an additional table. The overhead of such a small table is minimal, and if you communicate with the frontend using make ids instead of make names, there is no problem at all with this. Just remember to add an index to vehicle.make_id to make the lookups efficient.
class VehicleMake(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(16))
    cars = relationship('Vehicle', back_populates='make', lazy='dynamic')

class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    # index=True keeps lookups by make efficient
    make_id = Column(Integer, ForeignKey('vehicle_make.id'), nullable=False, index=True)
    # must name the VehicleMake class and mirror VehicleMake.cars
    make = relationship('VehicleMake', back_populates='cars', innerjoin=True)

Vehicle.query.get(1).make.name  # == 'Ford', the make for vehicle 1
Vehicle.query.filter(Vehicle.make_id == 2).all()  # all Vehicles with make id 2
Vehicle.query.join(VehicleMake)\
    .filter(VehicleMake.name == 'Ford').all()  # all Vehicles with make name 'Ford'
If you don't need to store any of that metadata, then the need for a separate table disappears. However, the general problem with strings is that there is a high risk of spelling errors and inconsistent capitalization screwing up your data consistency. If you don't need to add new makes often, it's a lot better to just use Enums; there are even MySQL-specific ones in SQLAlchemy.
import enum

class VehicleMake(enum.Enum):
    FORD = 1
    DODGE = 2
    CHEVROLET = 3

class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    make = Column(Enum(VehicleMake), nullable=False)

Vehicle.query.get(1).make.name  # == 'FORD', the make for vehicle 1
Vehicle.query.filter(Vehicle.make == VehicleMake(2)).all()  # all Vehicles with make id 2
Vehicle.query.filter(Vehicle.make == VehicleMake.FORD).all()  # all Vehicles with make name 'Ford'
The main drawback of enums is that they can be hard to extend with new values. At least for Postgres, the dialect-specific version was a lot better at this than the general SQLAlchemy one, so for MySQL have a look at sqlalchemy.dialects.mysql.ENUM. If you want to extend your existing enum, you can always just execute raw SQL in your Flask-Migrate/Alembic migrations.
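Independent of SQLAlchemy, it may help to see how the Python side of such an enum behaves. The sketch below uses only the standard library; `parse_make` is a hypothetical helper I made up to show how user input could be normalized before it reaches the database. As far as I know, SQLAlchemy's generic Enum type persists the member names (FORD, DODGE, ...) rather than the integer values by default, so lookup by name is the one that matches what ends up in the column.

```python
import enum


class VehicleMake(enum.Enum):
    FORD = 1
    DODGE = 2
    CHEVROLET = 3


by_value = VehicleMake(2)       # lookup by value
by_name = VehicleMake["FORD"]   # lookup by member name


def parse_make(raw):
    """Hypothetical helper: map free-form input to a member or raise ValueError."""
    try:
        return VehicleMake[raw.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown make: {raw!r}") from None


print(by_value, by_name, parse_make(" ford "))
```

Anything that is not a known member fails loudly in `parse_make` instead of silently creating a new spelling, which is the consistency benefit over plain strings.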
Finally, the benefit of using strings is that you can always programmatically enforce your data consistency. But this comes at the cost that you have to programmatically enforce your data consistency. If the vehicle make can be changed or inserted by external users, even colleagues, this will get you in trouble unless you're very strict about what enters your database. For example, it might be nice to uppercase all values for easy grouping, since it effectively reduces how much can go wrong. You can do this during writing, or you can add an index on sqlalchemy.func.upper(Vehicle.make) and use hybrid properties to always query the uppercase value.
class Vehicle(Model):
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    _make = Column('make', String(16))

    @hybrid_property
    def make(self):
        # instance level: uppercase in Python
        return self._make.upper()

    @make.expression
    def make(cls):
        # class level: uppercase in SQL
        return func.upper(cls._make)

Vehicle.query.get(1).make  # == 'FORD', the make for vehicle 1
Vehicle.query.filter(Vehicle.make == 'FORD').all()  # all Vehicles with make name 'FORD'
Before you make your choice, also think about how you want to present this to your user. If they should be able to add new options themselves, use strings or the separate table. If you want to show a dropdown of possibilities, use the enum or the table. If you have an empty database, it's going to be difficult to collect all string values to display in the frontend without needing to store this as a list somewhere in your Flask environment as well.
I'm working through Miguel Grinberg's Flask book.
In chapter 12, he has you define an association object Follow with followers and the followed, both mapping to a user, as well as adding followers and followed to the Users class.
I originally put the association table after the User table, and got an error when I ran python manage.py db upgrade:
line 75, in User
    followed = db.relationship('Follow', foreign_keys=[Follow.follower_id],
NameError: name 'Follow' is not defined
Then I moved the association object class Follow above the class User definition, and re-ran the migration. This time it worked.
Can someone explain the reason for this?
Both class definitions seem to need the other.
Is order something I should know about flask-sqlalchemy specifically, sqlalchemy, or ORM in general?
The SQLAlchemy documentation says "we can define the association_table at a later point, as long as it’s available to the callable after all module initialization is complete" and the relationship is defined in the class itself.
That is for the case where you're using an association_table to show the relationship between two separate models. I didn't see anything about this case in the Flask-SQLAlchemy or SQLAlchemy documentation, but it's very possible I just didn't recognize the answer when I saw it.
class User(UserMixin, db.Model):
    __tablename__ = 'users'
    ...
    followed = db.relationship('Follow',
                               foreign_keys=[Follow.follower_id],
                               backref=db.backref('follower', lazy='joined'),
                               lazy='dynamic',
                               cascade='all, delete-orphan')
    followers = db.relationship('Follow',
                                foreign_keys=[Follow.followed_id],
                                backref=db.backref('followed', lazy='joined'),
                                lazy='dynamic',
                                cascade='all, delete-orphan')
Order of definition with:
class Follow(db.Model):
    __tablename__ = 'follows'
    follower_id = db.Column(db.Integer, db.ForeignKey('users.id'), primary_key=True)
    followed_id = db.Column(db.Integer, db.ForeignKey('users.id'), primary_key=True)
    timestamp = db.Column(db.DateTime, default=datetime.utcnow)
Or maybe order doesn't matter at all, and I am misattributing a problem?
First of all, if you are going to use a class later, it must already be defined. The definition order is important: you cannot use a class which doesn't exist yet. Here, foreign_keys=[Follow.follower_id] is plain Python that runs while the User class body is being executed, so Follow has to exist at that point.
Second, the SQLAlchemy docs describe defining a third table to create the relationship. If you use that approach, the User and Follow classes do not access each other's attributes, so it won't cause a definition-order error.
Finally, if you don't define an association table, then you have to put the classes in the right order to use their attributes.
For a simple return-all-results query, should one method be preferred over the other? I can find uses of both online but can't really find anything describing the differences.
db.session.query([my model name]).all()
[my model name].query.all()
I feel that [my model name].query.all() is more descriptive.
It is hard to give a clear answer, as there is a high degree of preference subjectivity in answering this question.
From one perspective, the db.session is desired, because the second approach requires it to be incorporated in your model as an added step - it is not there by default as part of the Base class. For instance:
Base = declarative_base()
DBSession = scoped_session(sessionmaker())

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)

session = DBSession()
print(User.query)
That code fails with the following error:
AttributeError: type object 'User' has no attribute 'query'
You need to do something like this:
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)
    query = DBSession.query_property()
However, it could also be argued that just because it is not enabled by default, that doesn't invalidate it as a reasonable way to launch queries. Furthermore, in the flask-sqlalchemy package (which simplifies sqlalchemy integration into the flask web framework) this is already done for you as part of the Model class (doc). Adding the query property to a model can also be seen in the sqlalchemy tutorial (doc):
class User(object):
    query = db_session.query_property()
    ....
Thus, people could argue either approach.
I personally have a preference for the second method when I am selecting from a single table. For example:
serv = Service.query.join(Supplier, SupplierUsr).filter(SupplierUsr.username == usr).all()
This is because it is of smaller line length and still easily readable.
If I am selecting from more than one table or specifying columns, then I would use the session query method, as it is extracting information from more than one model.
deliverables = db.session.query(Deliverable.column1, BatchInstance.column2).\
    join(BatchInstance, Service, Supplier, SupplierUser).\
    filter(SupplierUser.username == str(current_user)).\
    order_by(Deliverable.created_time.desc()).all()
That said, a counter argument could be made in always using the session.query method as it makes the code more consistent, and when reading left to right, the reader immediately knows that the sqlalchemy directive they are going to read will be query, before mentally absorbing what tables and columns are involved.
At the end of the day, the answer to your question is subjective and there is no correct answer, and any code readability benefits either way are tiny. The only thing where I see a strong benefit is not to use model query if you are selecting from many tables and instead use the session.query method.
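For what it's worth, the two spellings build the same query, so the choice really is about readability. A minimal sketch, assuming plain SQLAlchemy 1.4 or newer with an unbound scoped session (no database is contacted; the statements are only rendered to strings); the cut-down User model is just for illustration.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base, scoped_session, sessionmaker

Base = declarative_base()
DBSession = scoped_session(sessionmaker())


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # Opt-in: exposes User.query, backed by the scoped session.
    query = DBSession.query_property()


via_model = str(User.query)
via_session = str(DBSession.query(User))
print(via_model)
```

Both strings render the same SELECT against the users table. Flask-SQLAlchemy sets up the equivalent of the `query` attribute for you on its Model class.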
Time for more pushing the limits of sqlalchemy. It never ceases to amaze!
Background
I have a table for devices, and a table to record physical links between them.
class Device(Base):
    __tablename__ = "device"
    device_id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(255), nullable=False)

class PhysicalLink(Base):
    __tablename__ = "physical_link"
    physical_links_id = sa.Column(sa.Integer, primary_key=True)
    device_id_1 = sa.Column(sa.types.Integer, sa.ForeignKey(Device.device_id), nullable=False)
    device_port_1 = sa.Column(sa.String(255), nullable=False)
    device_id_2 = sa.Column(sa.types.Integer, sa.ForeignKey(Device.device_id), nullable=False)
    device_port_2 = sa.Column(sa.String(255), nullable=False)
    cable_number = sa.Column(sa.String(255), nullable=False)
When I'm dealing with the physical links for a known device, I don't want to always need if statements to decide whether I should be looking at device_[id|port]_1 or 2, so I did:
from sqlalchemy import label, orm, select, union_all

physical_links_table = PhysicalLink.__table__

# Each link appears twice: once seen from device 1, once seen from device 2.
physical_links_ua = union_all(
    select((
        physical_links_table.c.physical_links_id,
        label('this_device_id', physical_links_table.c.device_id_1),
        label('this_device_port', physical_links_table.c.device_port_1),
        label('other_device_id', physical_links_table.c.device_id_2),
        label('other_device_port', physical_links_table.c.device_port_2),
        physical_links_table.c.cable_number,
    ),),
    select((
        physical_links_table.c.physical_links_id,
        label('this_device_id', physical_links_table.c.device_id_2),
        label('this_device_port', physical_links_table.c.device_port_2),
        label('other_device_id', physical_links_table.c.device_id_1),
        label('other_device_port', physical_links_table.c.device_port_1),
        physical_links_table.c.cable_number,
    ),),
).alias('physical_links_ua')

class PhysicalLinksDir(object):
    pass

physical_links_dir_mapper = orm.mapper(PhysicalLinksDir, physical_links_ua)
physical_links_dir_mapper.add_property(
    'this_device', orm.relation(Device, primaryjoin=(PhysicalLinksDir.this_device_id == Device.device_id)))
physical_links_dir_mapper.add_property(
    'other_device', orm.relation(Device, primaryjoin=(PhysicalLinksDir.other_device_id == Device.device_id)))
This allows me to do:
physical_links = (db_session
    .query(PhysicalLinksDir)
    .filter(PhysicalLinksDir.this_device_id == my_device.device_id)
    .options(joinedload('other_device')))

for pl in physical_links:
    print(pl.other_device)
(Did I remember to tell you that I think that SQLAlchemy rocks!)
Question
What do I need to do to make it possible to modify PhysicalLinksDir instance attributes, and be able to commit them back to the db?
In general, you will have to be very careful with updating it the way you want, because those PhysicalLinksDir view objects will not always be in sync with the underlying Device and PhysicalLink you might have in the session/database.
I obviously do not know your requirements, but I prefer not to have such inconsistencies when working with my model.
Also, there is a problem with the kind of mapping you have. You would expect to have 2 rows of PhysicalLinksDir for each row of PhysicalLink (one for each side), but if you try it, you will see this is not the case. The reason for this is that the first column (physical_links_id) is considered to be the primary key, so the query object will discard the second row with the same value.
In order to fix it, you need to configure the primary_key manually. Assuming there can be only one connection between two different Devices, the solution below will do the trick. You might need to extend it to include the port as well:
physical_links_dir_mapper = orm.mapper(PhysicalLinksDir, physical_links_ua,
    # note: add this
    primary_key=[physical_links_ua.c.physical_links_id, physical_links_ua.c.this_device_id],
)
DELETE: Now, to support delete, all you need to do is to add a relationship between your PLD and the actual PhysicalLink and the session.delete(my_PLD); session.commit() will also delete the PhysicalLink it represents:
physical_links_dir_mapper.add_property(
    'physical_link', orm.relation(PhysicalLink,
        primaryjoin=(PhysicalLinksDir.physical_links_id == PhysicalLink.physical_links_id),
        foreign_keys=[PhysicalLinksDir.physical_links_id],
    ))
But in fact, the deletion might work out of the box as the model is soft-linked to the physical_link table.
INSERT: Well, this is easily done with the PhysicalLink object directly, so I would just keep it this way.
UPDATE: You could probably achieve this with Session Events, but the simplest way would be to wrap all the attributes in a @property which would delegate the change to the proper object.
IMPORTANT: I still think that this way of working is not really nice, because the links are not updated automatically and your in-memory UnitOfWork might be inconsistent.
It would also be useful to understand why you think this way of working with your objects would be better. What are the use cases of this app?
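The property-delegation idea from the UPDATE note can be sketched without any ORM machinery. In the snippet below, PhysicalLink is a plain dataclass standing in for the mapped class, and DirectedLink is a hypothetical wrapper name of my own; only the port attributes are shown, the device ids would follow the same pattern.

```python
from dataclasses import dataclass


@dataclass
class PhysicalLink:
    """Stand-in for the mapped PhysicalLink row (not an ORM class here)."""
    device_id_1: int
    device_port_1: str
    device_id_2: int
    device_port_2: str
    cable_number: str


class DirectedLink:
    """Hypothetical view of one PhysicalLink as seen from side 1 or side 2."""

    def __init__(self, link, side):
        if side not in (1, 2):
            raise ValueError("side must be 1 or 2")
        self._link = link
        self._this, self._other = (1, 2) if side == 1 else (2, 1)

    @property
    def this_device_port(self):
        return getattr(self._link, f"device_port_{self._this}")

    @this_device_port.setter
    def this_device_port(self, value):
        # Write through to the real object so that it is the one that changes.
        setattr(self._link, f"device_port_{self._this}", value)

    @property
    def other_device_port(self):
        return getattr(self._link, f"device_port_{self._other}")

    @other_device_port.setter
    def other_device_port(self, value):
        setattr(self._link, f"device_port_{self._other}", value)


link = PhysicalLink(1, "eth0", 2, "eth7", "C-42")
from_device_2 = DirectedLink(link, side=2)
from_device_2.this_device_port = "eth9"
print(link.device_port_2)
```

Because every write lands on the underlying link object, a session tracking a real PhysicalLink would flush the change as a normal UPDATE, and there is no second copy of the data to fall out of sync.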