I am back and this time I have a question about SQLAlchemy.
Say I have the following table in some MySQL db:
table Table1 (
    id1
    id2
    primary key(id1, id2)
    id1 references Table2.id
    id2 references Table3.id
)
I'd like to map it using SQLAlchemy, and so I try the following:
table_1 = Table('Table1', metadata,
    Column('id1', INTEGER, ForeignKey('Table2.id'), primary_key=True),
    Column('id2', INTEGER, ForeignKey('Table3.id'), primary_key=True),
    autoload=True)

class Table_1(object):
    pass
mapper(Table_1, table_1)
I tried it, but SQLAlchemy says that it cannot find the table with which to establish a relationship. I have mapped an existing database with SQLAlchemy before and it worked, but that was a simple test DB with no foreign keys in it. I think I am lacking some other lines to establish the relationship between the parent table and the child, but the tutorials I've looked at aren't helping me much.
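For instance, do the referenced tables perhaps also need to be loaded into the same MetaData first, something like this (just my guess)?

table_2 = Table('Table2', metadata, autoload=True)
table_3 = Table('Table3', metadata, autoload=True)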
Any help would be greatly appreciated.
Thanks.
UPDATE:
So this is a snippet of the code that I have made to 'connect and map' to my MySQL DB:
from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import mapper, sessionmaker
from portal.model import DeclarativeBase, metadata, DBSession
bioentry_ec = Table('BioEntry_EC', metadata,
    Column("bioentry_id", INTEGER, ForeignKey("bioentry.bioentry_id", ondelete='CASCADE')),
    Column("ec_id", INTEGER, ForeignKey("EC.id", ondelete='CASCADE')),
    autoload=True)

ec = Table('EC', metadata,
    Column("id", INTEGER, primary_key=True),
    autoload=True)

bioentry = Table('bioentry', metadata,
    Column("bioentry_id", INTEGER, primary_key=True),
    Column("biodatabase_id", INTEGER, ForeignKey("biodatabase.biodatabase.id", ondelete='CASCADE')),
    autoload=True)

biodatabase = Table('biodatabase', metadata,
    Column("biodatabase_id", INTEGER, primary_key=True),
    autoload=True)
class BioEntryEC(object):
    pass

class EC(object):
    pass

class Biodatabase(object):
    pass

class Bioentry(object):
    pass
mapper(BioEntryEC, bioentry_ec)
mapper(Biodatabase, biodatabase)
mapper(Bioentry, bioentry)
mapper(EC, ec)
So these are some of the mappings. I am not sure if I am declaring the foreign keys incorrectly, or if there is something missing in my code. Note that the two keys in the bioentry_ec table form a composite primary key.
When I run TurboGears, it tells me that it cannot find biodatabase, even though I have declared it along with the other tables. Am I doing something wrong in my code?
Thanks for any help on this.
I'm considering porting my app to SQLAlchemy as it's much more extensive than my own ORM implementation, but all the examples I could find show how to set the schema name at class declaration rather than dynamically at runtime.
I need to map my objects to Postgres tables from multiple schemas. Moreover, the application creates new schemas at runtime, and I need to map new instances of the class to rows of the table from each new schema.
Currently, I use my own ORM module where I just provide the schema name as an argument when creating new instances of a class (I call a class method with the schema name as an argument, and it returns an object or objects holding the schema name). The class describes a table that can exist in many schemas. The class declaration doesn't contain information about the schema, but instances of that class do, and they include it when generating SQL statements.
This way, the application can work with many schemas simultaneously and even create foreign keys from tables in "other" schemas to the "main" table in the public schema. That also makes it possible to cascade-delete data in other schemas when deleting a row in the public schema.
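Very roughly and much simplified, my current home-grown approach looks something like this (an illustration only, not the actual code):

class MyTable:
    table_name = "my_table"

    def __init__(self, schema, **values):
        self.schema = schema          # each instance remembers which schema it lives in
        self.values = values

    @classmethod
    def select_all(cls, schema):
        # the schema name is injected into the generated SQL at runtime
        return f'SELECT * FROM "{schema}"."{cls.table_name}"'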
SQLAlchemy gives this example for setting the schema on a table (documentation):
metadata_obj = MetaData(schema="remote_banks")
financial_info = Table(
"financial_info",
metadata_obj,
Column("id", Integer, primary_key=True),
Column("value", String(100), nullable=False),
)
But at the ORM level, when I declare the class, I should pass an already constructed table (example from the documentation):
metadata = MetaData()

group_users = Table(
    "group_users",
    metadata,
    Column("user_id", String(40), nullable=False),
    Column("group_id", String(40), nullable=False),
    UniqueConstraint("user_id", "group_id"),
)

class Base(DeclarativeBase):
    pass

class GroupUsers(Base):
    __table__ = group_users
    __mapper_args__ = {"primary_key": [group_users.c.user_id, group_users.c.group_id]}
So, the question is: is it possible to map class instances to tables/rows from database schemas created dynamically at runtime in SQLAlchemy? Altering the connection to set the current schema is not acceptable to me; I want to work with all schemas simultaneously.
I'm free to use the newest SQLAlchemy 2.0 (currently in BETA release).
You can set the schema per table, so I think you have to make a table and class per schema. Here is a made-up example. I have no idea what the ramifications are of changing the mapper registry during runtime, especially as I have done below, mid-transaction, nor what would happen with thread safety. You could probably use a master schema-list table in public and lock it, or lock the same row across connections, to synchronize the schema list and provide thread safety when adding a schema (a rough sketch of that idea follows the log output below). I'm surprised it works. Kind of cool.
import sys
from sqlalchemy import (
    create_engine,
    Integer,
    MetaData,
    Float,
    event,
)
from sqlalchemy.schema import (
    Column,
    CreateSchema,
    Table,
)
from sqlalchemy.orm import Session
from sqlalchemy.orm import registry
username, password, db = sys.argv[1:4]
engine = create_engine(f"postgresql+psycopg2://{username}:{password}#/{db}", echo=True)
metadata = MetaData()
mapper_registry = registry()
def map_class_to_some_table(cls, table, entity_name, **mapper_kwargs):
    newcls = type(entity_name, (cls,), {})
    mapper_registry.map_imperatively(newcls, table, **mapper_kwargs)
    return newcls

class Measurement(object):
    pass
units = []
cls_for_unit = {}
tbl_for_unit = {}
def add_unit(unit, create_bind=None):
    units.append(unit)
    schema_name = f"unit_{unit}"
    if create_bind:
        create_bind.execute(CreateSchema(schema_name))
    else:
        event.listen(metadata, "before_create", CreateSchema(schema_name))
    cols = [
        Column("id", Integer, primary_key=True),
        Column("value", Float, nullable=False),
    ]
    # One table per schema.
    tbl_for_unit[unit] = Table("measurement", metadata, *cols, schema=schema_name)
    if create_bind:
        tbl_for_unit[unit].create(create_bind)
    # One class per schema.
    cls_for_unit[unit] = map_class_to_some_table(
        Measurement, tbl_for_unit[unit], Measurement.__name__ + f"_{unit}"
    )
for unit in ["mm", "m"]:
add_unit(unit)
metadata.create_all(engine)
with Session(engine) as session, session.begin():
# Create a value for each unit (schema).
session.add_all([cls(value=i) for i, cls in enumerate(cls_for_unit.values())])
with Session(engine) as session, session.begin():
# Read back a value for each unit (schema).
print(
[
(unit, cls.__name__, cls, session.query(cls).first().value)
for (unit, cls) in cls_for_unit.items()
]
)
with Session(engine) as session, session.begin():
# Add another unit, add a value, flush and then read back.
add_unit("km", create_bind=session.bind)
session.add(cls_for_unit["km"](value=100.0))
session.flush()
print(session.query(cls_for_unit["km"]).first().value)
Output of the last add_unit():
2022-12-16 08:16:13,446 INFO sqlalchemy.engine.Engine CREATE SCHEMA unit_km
2022-12-16 08:16:13,446 INFO sqlalchemy.engine.Engine [no key 0.00015s] {}
2022-12-16 08:16:13,447 INFO sqlalchemy.engine.Engine COMMIT
2022-12-16 08:16:13,469 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-16 08:16:13,469 INFO sqlalchemy.engine.Engine
CREATE TABLE unit_km.measurement (
id SERIAL NOT NULL,
value FLOAT NOT NULL,
PRIMARY KEY (id)
)
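To expand on the thread-safety caveat above: one way to serialize add_unit() calls across connections could be a Postgres advisory lock taken right before touching the schema list. This is just a sketch of the idea, not part of the tested example, and the lock key 42 is an arbitrary choice.

from sqlalchemy import text

def add_unit_synchronized(unit, session):
    # Transaction-scoped advisory lock: released automatically at COMMIT/ROLLBACK,
    # so only one connection at a time can extend the schema list.
    session.execute(text("SELECT pg_advisory_xact_lock(:key)"), {"key": 42})
    if unit not in cls_for_unit:
        add_unit(unit, create_bind=session.bind)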
Ian Wilson posted a great answer to my question which I'm going to use.
Around the same time I got an idea of how it could work and would like to post it here as a very simple example. I think the same mechanism is behind it as in Ian's answer.
This example only "reads" an object from the schema that can be referenced at runtime.
from sqlalchemy import create_engine, Column, Integer, String, MetaData
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import sessionmaker
import psycopg
engine = create_engine(f"postgresql+psycopg://user:password#localhost:5432/My_DB", echo=True)
Session = sessionmaker(bind=engine)
session = Session()
class Base(DeclarativeBase):
    pass

class A(object):
    __tablename__ = "my_table"
    id = Column("id", Integer, primary_key=True)
    name = Column("name", String)

    def __repr__(self):
        return f"A: {self.id}, {self.name}"
metadata_obj = MetaData(schema="my_schema") # here we create new mapping
A1 = type("A1", (A, Base), {"metadata": metadata_obj}) # here we make a new subclass with desired mapping
data = session.query(A1).all()
print(data)
This info helped me to come to this solution:
https://github.com/sqlalchemy/sqlalchemy/wiki/EntityName
"... SQLAlchemy mapping makes modifications to the mapped class, so it's not really feasible to have many mappers against the exact same class ..."
This means a separate class must be created at runtime for each schema.
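Building on that, a small factory could cache one mapped subclass per schema, so the mapping is created only once per schema. This is my own sketch on top of the snippet above, and I haven't verified it against many schemas:

_class_per_schema = {}

def class_for_schema(schema_name):
    # Reuse the mapped subclass if we already built one for this schema.
    if schema_name not in _class_per_schema:
        metadata_obj = MetaData(schema=schema_name)
        _class_per_schema[schema_name] = type(
            f"A_{schema_name}", (A, Base), {"metadata": metadata_obj}
        )
    return _class_per_schema[schema_name]

data = session.query(class_for_schema("my_schema")).all()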
The example about joined inheritance in the docs uses declarative mapping. I'm trying to adapt it to use "classic mappings", but it is not working as it should.
I've read and used the docs from https://docs.sqlalchemy.org/en/14/orm/inheritance.html as a guide.
I have some simple classes using attrs:
class Person:
    pass

@attr.s(auto_attribs=True)
class Manager(Person):
    name: str
    data: str

@attr.s(auto_attribs=True)
class Engineer(Person):
    name: str
    info: int

@attr.s(auto_attribs=True)
class Company:
    people: list[Person]
And I'm declaring the mappings and tables as follows:
persons_table = Table(
    "person",
    metadata,
    Column("id", Integer, primary_key=True),
)

managers_table = Table(
    "manager",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
    Column("data", String(50)),
)

engineers_table = Table(
    "engineer",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
    Column("info", Integer),
)

company_table = Table(
    "company",
    metadata,
    Column("id", Integer, primary_key=True),
)

pjoin = polymorphic_union(
    {"person": persons_table, "manager": managers_table, "engineer": engineers_table},
    "type",
    "pjoin",
)

company_2_people_table = Table(
    "company_2_people",
    metadata,
    Column("id", Integer, primary_key=True, autoincrement=True),
    Column("company_id", ForeignKey("company.id")),
    Column("person_id", ForeignKey("person.id")),
)

person_mapper = mapper(
    Person,
    pjoin,
    with_polymorphic=("*", pjoin),
    polymorphic_on=pjoin.c.type,
)

manager_mapper = mapper(
    Manager,
    managers_table,
    inherits=person_mapper,
    concrete=True,
    polymorphic_identity="manager",
)

engineer_mapper = mapper(
    Engineer,
    engineers_table,
    inherits=person_mapper,
    concrete=True,
    polymorphic_identity="engineer",
)

company_mapper = mapper(
    Company,
    company_table,
    properties={
        "people": relationship(
            person_mapper,
            secondary=company_2_people_table,
            collection_class=list,
        ),
    },
)
A simple test:
fn = Path(__file__).with_suffix(".db")
fn.unlink(missing_ok=True)
engine = create_engine(f"sqlite:///{fn}", echo=True)
metadata.create_all(engine)
Session = sessionmaker(bind=engine)
with Session() as session:
    m1 = Manager(name="Manager 1", data="Manager Data")
    m2 = Manager(name="Manager 2", data="Manager Data")
    e1 = Engineer(name="Eng", info=10)
    company = Company([m1, e1, m2])
    session.add(company)
    session.commit()

with Session() as session:
    print(session.query(Company).get(1))
This runs; however, I get this output:
Company(people=[Engineer(name='Eng', info=10), Manager(name='Manager 1', data='Manager Data'), Manager(name='Manager 2', data='Manager Data')])
Notice that although the instances are correct, the order is not: it should be Manager, Engineer, Manager.
Comparing my database file with the one generated from the example from the docs:
In the table from the docs, the person table contains all people, and a type column with the type of the person.
In mine, the person table is empty, and contains only an id column (no type).
I have debugged the runtime classes generated by the example and tried to mimic the structures there (for example, explicitly passing the internal _polymorphic_map), but to no avail.
I've also changed the primary key definition for Manager and Engineer to Column('id', ForeignKey("person.id"), primary_key=True), but then I get an exception:
sqlalchemy.orm.exc.FlushError: Instance <Engineer at 0x198e43cd280> has a NULL identity key. If this is an auto-generated value, check that the database table allows generation of new primary key values, and that the mapped Column object is configured to expect these generated values. Ensure also that this flush() is not occurring at an inappropriate time, such as within a load() event.
Any other suggestions or hints that might point me in the right direction?
Thanks.
I've posted the full source code at https://gist.github.com/nicoddemus/26de7bbcdfa9ed4b14fcfdde72b1d63f.
After reading the examples more carefully I found what I was doing wrong: I was mixing concepts from joined inheritance with concrete inheritance.
I want joined inheritance, so:
Each table subclass needs to define its primary key as a foreign key to the base table:
engineers_table = Table(
    "engineer",
    metadata,
    Column('id', ForeignKey("person.id"), primary_key=True),
    Column("name", String(50)),
    Column("info", Integer),
)
The base mapper needs to specify which column to use as the discriminator:
person_mapper = mapper_registry.map_imperatively(
    Person,
    persons_table,
    polymorphic_identity="person",
    polymorphic_on=persons_table.c.type,
)
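For that polymorphic_on to resolve, the person table itself presumably also has to carry the discriminator column now (this is my reading of the fix; the complete code is in the Gist):

persons_table = Table(
    "person",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("type", String(50)),  # discriminator referenced by polymorphic_on
)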
Every subclass also needs to specify its polymorphic identity:
manager_mapper = mapper_registry.map_imperatively(
    Manager,
    managers_table,
    inherits=person_mapper,
    polymorphic_identity="manager",
)
And that's it, SQLA takes care of the rest. I've updated the Gist link with the full and now working code, in case it might help others.
https://gist.github.com/nicoddemus/26de7bbcdfa9ed4b14fcfdde72b1d63f
I am trying to run my first FastAPI app. I tried to add a "users" table, but nothing is being created in the Postgres DB:
users = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
    Column("age", Integer),
    Column("birthdate", Date),
    Column("blood_type", String(3)),
    Column("blood_pressure", Integer),
    Column("created_at", DateTime, default=datetime.utcnow().strftime("%Y-%m-%d" "%H:%M:%S"), nullable=False),
)
Is there any issue with my code?
I am not sure about the beginning or the rest of your code, but what you are trying to do is related to SQLAlchemy. As per the documentation, what you have implemented looks correct, but I presume some parts of the code are missing. The implementation below is from the documentation, and yours fits it, except that perhaps you have forgotten the metadata.create_all() part (I am trying to guess)? If not, it would be great if you could share the errors or show your implementation in full with more details.
engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
user = Table('user', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16), nullable=False),
    Column('email_address', String(60), key='email'),
    Column('nickname', String(50), nullable=False)
)

user_prefs = Table('user_prefs', metadata,
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("user.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)
metadata.create_all(engine)
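If that is indeed the missing piece, the same call just needs to run against your Postgres engine instead of the in-memory SQLite one above. A minimal sketch, assuming a placeholder connection URL rather than your real credentials:

from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")

# `metadata` is the MetaData object the `users` Table above was registered on;
# nothing is created in the database until create_all() is called on it.
metadata.create_all(engine)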
This is my first question on Stack Overflow ever. :P Everything works just fine except the crawl order; I added a priority method but it didn't work correctly. I need to write all author data first, then all album and song data, and store them in the DB in that order. I want to query items in a MySQL table ordered by items from another one.
Database structure: https://i.postimg.cc/GhF4w32x/db.jpg
Example: first write all author items to the Author table, and then order album items in the Album table by authorId from the Author table.
Github repository: https://github.com/markostalma/discogs/tree/master/discogs
P.S. I have three item classes, for the author, album and song parsers.
I also tried another spider flow that put everything in one item class, but with no success. The order was the same. :(
Sorry for my bad English.
You need to set up an item pipeline for this. I would suggest using SQLAlchemy to build the SQL item and connect to the DB. Your SQLAlchemy class will reflect all the table relationships you have in your DB schema. Let me show you. This is a working example of a similar pipeline that I have, except you would set up your SQLAlchemy class to contain the m2m or foreign-key relationships you need. You'll have to refer to their documentation [1].
An even more Pythonic way of doing this would be to keep your SQLAlchemy class and item field names the same and do something like for k, v in item.items(): (a short sketch of that is at the very end of this answer).
This way you can just loop over the item and set whatever is there. The code below is long and violates DRY on purpose.
# -*- coding: utf-8 -*-
from scrapy.exceptions import DropItem
from sqlalchemy import create_engine, Column, Integer, String, DateTime, ForeignKey, Boolean, Sequence, Date, Text
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
import datetime
DeclarativeBase = declarative_base()
def db_connect():
    """
    This function connects to the database. Tables will automatically be created if they do not exist.
    See __tablename__ under the GoogleReviewItem class.
    MySQL example: engine = create_engine('mysql://scott:tiger@localhost/foo')
    """
    return create_engine('sqlite:///reviews.sqlite', echo=True)
class GoogleReviewItem(DeclarativeBase):
    __tablename__ = 'google_review_item'
    pk = Column('pk', String, primary_key=True)
    query = Column('query', String(500))
    entity_name = Column('entity_name', String(500))
    user = Column('user', String(500))
    review_score = Column('review_score', Integer)
    description = Column('description', String(5000))
    top_words = Column('top_words', String(10000), nullable=True)
    bigrams = Column('bigrams', String(10000), nullable=True)
    trigrams = Column('trigrams', String(10000), nullable=True)
    google_average = Column('google_average', Integer)
    total_reviews = Column('total_reviews', Integer)
    review_date = Column('review_date', DateTime)
    created_on = Column('created_on', DateTime, default=datetime.datetime.now)
engine = db_connect()
Session = sessionmaker(bind=engine)
def create_individual_table(engine):
    # checks for table existence and creates the tables if they do not already exist
    DeclarativeBase.metadata.create_all(engine)

create_individual_table(engine)
session = Session()
def get_row_by_pk(pk, model):
    review = session.query(model).get(pk)
    return review
class GooglePipeline(object):
    def process_item(self, item, spider):
        review = get_row_by_pk(item['pk'], GoogleReviewItem)
        if review is None:
            # Field names here are kept in line with the columns declared
            # on GoogleReviewItem above.
            googlesite = GoogleReviewItem(
                pk=item['pk'],
                query=item['query'],
                entity_name=item['entity_name'],
                user=item['user'],
                review_score=item['review_score'],
                description=item['description'],
                top_words=item['top_words'],
                bigrams=item['bigrams'],
                trigrams=item['trigrams'],
                google_average=item['google_average'],
                total_reviews=item['total_reviews'],
                review_date=item['review_date'],
            )
            session.add(googlesite)
            session.commit()
            return item
        else:
            raise DropItem()
[1]: https://docs.sqlalchemy.org/en/13/core/constraints.html
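To illustrate the shorter, more generic variant mentioned at the top of this answer: a sketch only, assuming your Scrapy item fields use exactly the same names as the mapped columns.

class GenericGooglePipeline(object):
    def process_item(self, item, spider):
        if get_row_by_pk(item['pk'], GoogleReviewItem) is not None:
            raise DropItem()
        record = GoogleReviewItem()
        for k, v in item.items():
            setattr(record, k, v)  # copy every item field onto the mapped object
        session.add(record)
        session.commit()
        return item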
I need to create a sequence, but in a generic way, without using the Sequence class.
USN = Column(Integer, nullable = False, default=nextusn, server_onupdate=nextusn)
This nextusn function needs to generate the next value based on func.max(table.USN) over the rows in the model.
I tried this:
from sqlalchemy import Numeric, select, func
from sqlalchemy.sql import expression
from sqlalchemy.ext.compiler import compiles

class nextusn(expression.FunctionElement):
    type = Numeric()
    name = 'nextusn'

@compiles(nextusn)
def default_nextusn(element, compiler, **kw):
    return select(func.max(element.table.c.USN)).first()[0] + 1
but in this context the element does not know element.table. Is there a way to resolve this?
this is a little tricky, for these reasons:
your SELECT MAX() will return NULL if the table is empty; you should use COALESCE to produce a default "seed" value. See below.
the whole approach of inserting the rows with SELECT MAX is entirely not safe for concurrent use - so you need to make sure only one INSERT statement at a time invokes on the table or you may get constraint violations (you should definitely have a constraint of some kind on this column).
from the SQLAlchemy perspective, you need your custom element to be aware of the actual Column element. We can achieve this either by assigning the "nextusn()" function to the Column after the fact, or below I'll show a more sophisticated approach using events.
I don't understand what you're going for with "server_onupdate=nextusn". "server_onupdate" in SQLAlchemy doesn't actually run any SQL for you, this is a placeholder if for example you created a trigger; but also the "SELECT MAX(id) FROM table" thing is an INSERT pattern, I'm not sure that you mean for anything to be happening here on an UPDATE.
The @compiles extension needs to return a string, running the select() there through compiler.process(). See below.
example:
from sqlalchemy import Column, Integer, create_engine, select, func, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.expression import ColumnElement
from sqlalchemy.schema import ColumnDefault
from sqlalchemy.ext.compiler import compiles
from sqlalchemy import event
class nextusn_default(ColumnDefault):
    "Container for a nextusn() element."

    def __init__(self):
        super(nextusn_default, self).__init__(None)

@event.listens_for(nextusn_default, "after_parent_attach")
def set_nextusn_parent(default_element, parent_column):
    """Listen for when nextusn_default() is associated with a Column,
    assign a nextusn().
    """
    assert isinstance(parent_column, Column)
    default_element.arg = nextusn(parent_column)

class nextusn(ColumnElement):
    """Represent "SELECT MAX(col) + 1 FROM TABLE".
    """

    def __init__(self, column):
        self.column = column

@compiles(nextusn)
def compile_nextusn(element, compiler, **kw):
    return compiler.process(
        select([
            func.coalesce(func.max(element.column), 0) + 1
        ]).as_scalar()
    )

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, default=nextusn_default(), primary_key=True)
    data = Column(String)

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

# will normally pre-execute the default so that we know the PK value
# result.inserted_primary_key will be available
e.execute(A.__table__.insert(), data='single row')

# will run the default expression inline within the INSERT
e.execute(A.__table__.insert(), [{"data": "multirow1"}, {"data": "multirow2"}])

# will also run the default expression inline within the INSERT,
# result.inserted_primary_key will not be available
e.execute(A.__table__.insert(inline=True), data='single inline row')
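For what it's worth, a quick check after the inserts above might look like this (my addition, not part of the original answer):

with e.connect() as conn:
    # Each row's id should have been filled in by the SELECT MAX(...) + 1 default.
    for row in conn.execute(A.__table__.select().order_by(A.__table__.c.id)):
        print(row)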