I have a star-schema architectured database that I want to represent in SQLAlchemy. Now I have the problem on how this can be done in the best possible way. Right now I have a lot of properties with custom join conditions, because the data is stored in different tables.
It would be nice if it would be possible to re-use the dimensions for different fact tablesw but I haven't figured out how that can be done nicely.
A typical fact table in a star schema contains foreign key references to all dimension tables, so usually there wouldn't be any need for custom join conditions - they are determined automatically from foreign key references.
For example a star schema with two fact tables would look like:
Base = declarative_meta()
class Store(Base):
__tablename__ = 'store'
id = Column('id', Integer, primary_key=True)
name = Column('name', String(50), nullable=False)
class Product(Base):
__tablename__ = 'product'
id = Column('id', Integer, primary_key=True)
name = Column('name', String(50), nullable=False)
class FactOne(Base):
__tablename__ = 'sales_fact_one'
store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
units_sold = Column('units_sold', Integer, nullable=False)
store = relation(Store)
product = relation(Product)
class FactTwo(Base):
__tablename__ = 'sales_fact_two'
store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
units_sold = Column('units_sold', Integer, nullable=False)
store = relation(Store)
product = relation(Product)
But suppose you want to reduce the boilerplate in any case. I'd create generators local to the dimension classes which configure themselves on a fact table:
class Store(Base):
__tablename__ = 'store'
id = Column('id', Integer, primary_key=True)
name = Column('name', String(50), nullable=False)
#classmethod
def add_dimension(cls, target):
target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
target.store = relation(cls)
in which case usage would be like:
class FactOne(Base):
...
Store.add_dimension(FactOne)
But, there's a problem with that. Assuming the dimension columns you're adding are primary key columns, the mapper configuration is going to fail since a class needs to have its primary keys set up before the mapping is set up. So assuming we're using declarative (which you'll see below has a nice effect), to make this approach work we'd have to use the instrument_declarative() function instead of the standard metaclass:
meta = MetaData()
registry = {}
def register_cls(*cls):
for c in cls:
instrument_declarative(c, registry, meta)
So then we'd do something along the lines of:
class Store(object):
# ...
class FactOne(object):
__tablename__ = 'sales_fact_one'
Store.add_dimension(FactOne)
register_cls(Store, FactOne)
If you actually have a good reason for custom join conditions, as long as there's some pattern to how those conditions are created, you can generate that with your add_dimension():
class Store(object):
...
#classmethod
def add_dimension(cls, target):
target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
target.store = relation(cls, primaryjoin=target.store_id==cls.id)
But the final cool thing if you're on 2.6, is to turn add_dimension into a class decorator. Here's an example with everything cleaned up:
from sqlalchemy import *
from sqlalchemy.ext.declarative import instrument_declarative
from sqlalchemy.orm import *
class BaseMeta(type):
classes = set()
def __init__(cls, classname, bases, dict_):
klass = type.__init__(cls, classname, bases, dict_)
if 'metadata' not in dict_:
BaseMeta.classes.add(cls)
return klass
class Base(object):
__metaclass__ = BaseMeta
metadata = MetaData()
def __init__(self, **kw):
for k in kw:
setattr(self, k, kw[k])
#classmethod
def configure(cls, *klasses):
registry = {}
for c in BaseMeta.classes:
instrument_declarative(c, registry, cls.metadata)
class Store(Base):
__tablename__ = 'store'
id = Column('id', Integer, primary_key=True)
name = Column('name', String(50), nullable=False)
#classmethod
def dimension(cls, target):
target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
target.store = relation(cls)
return target
class Product(Base):
__tablename__ = 'product'
id = Column('id', Integer, primary_key=True)
name = Column('name', String(50), nullable=False)
#classmethod
def dimension(cls, target):
target.product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
target.product = relation(cls)
return target
#Store.dimension
#Product.dimension
class FactOne(Base):
__tablename__ = 'sales_fact_one'
units_sold = Column('units_sold', Integer, nullable=False)
#Store.dimension
#Product.dimension
class FactTwo(Base):
__tablename__ = 'sales_fact_two'
units_sold = Column('units_sold', Integer, nullable=False)
Base.configure()
if __name__ == '__main__':
engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)
sess = sessionmaker(engine)()
sess.add(FactOne(store=Store(name='s1'), product=Product(name='p1'), units_sold=27))
sess.commit()
Related
This is the way that I usually use for m2m relationship implementation.
(Brought from docs.sqlalchemy.org)
association_table = Table('association', Base.metadata,
Column('left_id', Integer, ForeignKey('left.id')),
Column('right_id', Integer, ForeignKey('right.id'))
)
class Parent(Base):
__tablename__ = 'left'
id = Column(Integer, primary_key=True)
children = relationship("Child",
secondary=association_table,
backref="parents")
class Child(Base):
__tablename__ = 'right'
id = Column(Integer, primary_key=True)
Is there any way for using additional columns at the association_table table?
So it should be like
association_table = Table('association', Base.metadata,
Column('left_id', Integer, ForeignKey('left.id')),
Column('right_id', Integer, ForeignKey('right.id')),
Column('is_valid', Boolean, default=True) # Add the validation column
)
class Parent(Base):
__tablename__ = 'left'
id = Column(Integer, primary_key=True)
children = relationship("Child",
secondary=association_table,
backref="parents")
# How can I do implement this??
valid_children = relationship("Child",
secondary="and_(association_table.left_id == Parent.id, association_table.right_id == Child.id)"
class Child(Base):
__tablename__ = 'right'
id = Column(Integer, primary_key=True)
I want to do query depends on is_valid column. How can I modify "secondary" attr in Parent table? Or should I fix the other part?
In this question, time_create column has the same value for all children. But in this case, I need a flag that makes able to retrieve whether this connection is still alive or not.
For example, if you implement a one-on-one chatting, there will be a chatting room consist of two-person, right?
And the table should be like as below:
association_table = Table('association', Base.metadata,
Column('left_id', Integer, ForeignKey('left.id')),
Column('right_id', Integer, ForeignKey('right.id')),
Column('is_left', Boolean, default=False) # Whether the user left or not
)
class Match(Base):
__tablename__ = 'left'
id = Column(Integer, primary_key=True)
user = relationship("User",
secondary=association_table,
backref="matches")
# How can I do implement this??
exist_user = relationship("User",
secondary="and_(association_table.left_id == Parent.id, association_table.right_id == Child.id)"
class User(Base):
__tablename__ = 'right'
id = Column(Integer, primary_key=True)
nickname = Column(String, unique=True)
How can I do for this?
I am using Flask-Migrate in my application, with the following models:
listpull/models.py
from datetime import datetime
from listpull import db
class Job(db.Model):
id = db.Column(db.Integer, primary_key=True)
list_type_id = db.Column(db.Integer, db.ForeignKey('listtype.id'),
nullable=False)
list_type = db.relationship('ListType',
backref=db.backref('jobs', lazy='dynamic'))
record_count = db.Column(db.Integer, nullable=False)
status = db.Column(db.Integer, nullable=False)
sf_job_id = db.Column(db.Integer, nullable=False)
created_at = db.Column(db.DateTime, nullable=False)
compressed_csv = db.Column(db.LargeBinary)
def __init__(self, list_type, created_at=None):
self.list_type = list_type
if created_at is None:
created_at = datetime.utcnow()
self.created_at = created_at
def __repr__(self):
return '<Job {}>'.format(self.id)
class ListType(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(80), unique=True, nullable=False)
def __init__(self, name):
self.name = name
def __repr__(self):
return '<ListType {}>'.format(self.name)
run.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from listpull import app, manager
manager.run()
listpull/__init__.py
from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy
from flask.ext.script import Manager
from flask.ext.migrate import Migrate, MigrateCommand
from mom.client import SQLClient
from smartfocus.restclient import RESTClient
app = Flask(__name__)
app.config.from_object('config')
db = SQLAlchemy(app)
migrate = Migrate(app, db)
manager = Manager(app)
manager.add_command('db', MigrateCommand)
...
import listpull.models
import listpull.views
I initialized the database using ./run.py db init and then I run ./run.py db migrate and I get the following error:
sqlalchemy.exc.NoReferencedTableError: Foreign key associated with column 'job.list_type_id' could not find table 'listtype' with which to generate a foreign key to target column 'id'
What am I doing wrong here?
You are letting Flask-SQLAlchemy choose the names for your tables. If I remember correctly, for a class called ListType the table name will be list_type (or something similar), not the listtype that you specified in your foreign key.
My recommendation is that you specify your own table names using __tablename__, that way they are explicit in the code and not magically determined for you. For example:
class Job(db.Model):
__tablename__ = 'jobs'
id = db.Column(db.Integer, primary_key=True)
list_type_id = db.Column(db.Integer, db.ForeignKey('list_types.id'),
nullable=False)
# ...
class ListType(db.Model):
__tablename__ = 'list_types'
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(80), unique=True, nullable=False)
# ...
I was looking for a solution for similar information problem you deal with for a long time as well... at the end, the solution i found was add into every "table-class" the line __tablename__ = '<your_table_name>'
it seems to help flask to find your table
In my case, the issue was that i accidentally called the foreign key by the class name vs the table name ...
Like: db.Column(db.Integer, db.ForeignKey('ClassListTypes.id') vs db.Column(db.Integer, db.ForeignKey('list_types.id')
I know i can define a table using Table:
user = Table('user', metadata,
Column('user_id', Integer, primary_key = True),
)
and using Base:
Base = declarative_base()
class User(Base):
__tablename__ = 'user'
user_id= Column(Integer, primary_key = True)
but what's the different???
Base = declarative_base()
class User(Base):
__tablename__ = 'user'
id = Column('id', Integer, primary_key=True)
name = Column('name', Unicode(64))
is just syntactic sugar for
metadata = MetaData()
user = Table('user', metadata,
Column('id', Integer, primary_key=True),
Column('name', Unicode(64)),
)
class User(object):
pass
mapper(User, user) # this will modify the User class!
I'm trying to work with the example in the SQLAlchemy docs: Simplifying Association Objects
What I am struggling with understanding is how I can access the special_key. Ultimately I'd like to be able to do something like this:
for user in users
for keyword in user.keywords
keyword.special_key
Here is the code from the example:
class User(Base):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
name = Column(String(64))
# association proxy of "user_keywords" collection
# to "keyword" attribute
keywords = association_proxy('user_keywords', 'keyword')
def __init__(self, name):
self.name = name
class UserKeyword(Base):
__tablename__ = 'user_keyword'
user_id = Column(Integer, ForeignKey('user.id'), primary_key=True)
keyword_id = Column(Integer, ForeignKey('keyword.id'), primary_key=True)
special_key = Column(String(50))
# bidirectional attribute/collection of "user"/"user_keywords"
user = relationship(User,
backref=backref("user_keywords",
cascade="all, delete-orphan")
)
# reference to the "Keyword" object
keyword = relationship("Keyword")
def __init__(self, keyword=None, user=None, special_key=None):
self.user = user
self.keyword = keyword
self.special_key = special_key
class Keyword(Base):
__tablename__ = 'keyword'
id = Column(Integer, primary_key=True)
keyword = Column('keyword', String(64))
def __init__(self, keyword):
self.keyword = keyword
def __repr__(self):
return 'Keyword(%s)' % repr(self.keyword)
Am I on the right track in following this pattern here?
My goal is essentially many-to-many with an extra column containing a boolean value.
This should work:
for user in users:
for keyword in user.user_keywords:
print keyword.special_key
I have an SQLAlchemy scheme that looks roughly like this:
participation = db.Table('participation',
db.Column('artist_id', db.Integer, db.ForeignKey('artist.id'),
primary_key=True),
db.Column('song_id', db.Integer, db.ForeignKey('song.id'),
primary_key=True),
)
class Streamable(db.Model):
id = db.Column(db.Integer, primary_key=True)
kind = db.Column(db.String(10), nullable=False)
score = db.Column(db.Integer, nullable=False)
__mapper_args__ = {'polymorphic_on': kind}
class Artist(Streamable):
id = db.Column(db.Integer, db.ForeignKey('streamable.id'), primary_key=True)
name = db.Column(db.Unicode(128), nullable=False)
__mapper_args__ = {'polymorphic_identity': 'artist'}
class Song(Streamable):
id = db.Column(db.Integer, db.ForeignKey('streamable.id'), primary_key=True)
name = db.Column(db.Unicode(128), nullable=False)
artists = db.relationship("Artist", secondary=participation,
backref=db.backref('songs'))
__mapper_args__ = {'polymorphic_identity': 'song'}
class Video(Streamable):
id = db.Column(db.Integer, db.ForeignKey('streamable.id'), primary_key=True)
song_id = db.Column(db.Integer, db.ForeignKey('song.id'), nullable=False)
song = db.relationship('Song', backref=db.backref('videos', lazy='dynamic'),
primaryjoin="Song.id==Video.song_id")
__mapper_args__ = {'polymorphic_identity': 'video'}
I'd like to do a single query for Songs or Videos that have a particular artist; i.e., these two queries in one query (all queries should be .order_by(Streamable.score)):
q1=Streamable.query.with_polymorphic(Video)
q1.join(Video.song, participation, Artist).filter(Artist.id==1)
q2=Streamable.query.with_polymorphic(Song)
q2.join(participation, Artist).filter(Artist.id==1)
Here's the best I reached; it emits monstrous SQL and always yields empty results (not sure why):
p1=db.aliased(participation)
p2=db.aliased(participation)
a1=db.aliased(Artist)
a2=db.aliased(Artist)
q=Streamable.query.with_polymorphic((Video, Song))
q=q.join(p1, a1).join(Video.song, p2, a2)
q.filter(db.or_((a1.id==1), (a2.id==1))).order_by('score')
What's the right way to do this query, if at all (maybe a relational datastore is not the right tool for my job...)?
Your queries are basically right. I think the change from join to outerjoin should solve the problem:
q=q.outerjoin(p1, a1).outerjoin(Video.song, p2, a2)
I would also replace the order_by with:
q = q.order_by(Streamable.score)