SQLAlchemy memory leak when instrumented objects are not committed (e.g. rolled back) - sqlalchemy

Resolved: not an issue. It was caused by the presence of:
def __del__(self):
    print "deleted!"
in the model class. As soon as I removed it from the model class, I could no longer reproduce any problems with memory usage.
I have some models that I am using with an ad-hoc session/engine and this is working fine, but when I want to create these objects outside of a database/session context, it seems SQLAlchemy instrumentation keeps a reference to the objects and they are never deleted.
What I would like to do is to create a model object like a "normal" Python object and never add it to any session/database (but in other contexts I need to add it to a database). Is this wrong?
import gc
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, String

base = declarative_base()

class SomeObject(base):
    __tablename__ = "SomeObject"
    instrumented_name = Column('name', String(55), primary_key=True)
    uninstrumented_name = None

    def __del__(self):
        print "deleted!"

obj1 = SomeObject()
obj1.uninstrumented_name = "foo"
obj1 = None
# obj1 is properly deleted

obj2 = SomeObject()
obj2.instrumented_name = "bar"
obj2 = None
gc.collect()
# obj2 is never deleted
Edit: I did some additional testing and it seems SQLAlchemy will cause a memory leak if the objects are never committed into a session (e.g. rolled back).
Is there a method that will force SQLAlchemy to release its references to instrumented objects?
import gc
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, create_engine

Base = declarative_base()

class MemoryMonster(Base):
    __tablename__ = "MemoryMonster"
    _id = Column('id', Integer(), primary_key=True)
    _name = Column('name', String(55))

    def __init__(self):
        self._name = "some monster name"
        self._eat_some_ram = ' ' * 1048576

    def __del__(self):
        print "deleted!"

engine = create_engine("sqlite:///:memory:")
session_factory = sessionmaker(engine)
Session = scoped_session(session_factory)
Base.metadata.create_all(engine)

def create_and_commit():
    session = Session()
    for _ in range(100):
        session.add(MemoryMonster())
    session.commit()
    Session.remove()
    gc.collect()

def create_and_rollback():
    session = Session()
    for _ in range(100):
        monster = MemoryMonster()
        session.add(monster)
        session.expunge(monster)
    session.rollback()
    Session.remove()
    gc.collect()

def create_do_not_include_in_session():
    session = Session()
    for _ in range(100):
        monster = MemoryMonster()
    session.rollback()
    Session.remove()
    gc.collect()

# Scenario 1 - Objects included in the session and committed
# No memory leak
create_and_commit()

# Scenario 2 - Objects included in the session and rolled back
# Memory leak
create_and_rollback()

# Scenario 3 - Objects not included in the session
# Memory leak
create_do_not_include_in_session()

Using __del__() can create memory leaks, because if an object becomes the subject of a cycle, it is then unreachable by cyclic GC. That is the case in this test because the SQLAlchemy object instrumentation creates a cycle from the object to itself in the case that the object is "dirty", that is, has pending attributes to be flushed to the database, so that you can add a dirty object to the Session and then lose all references to it, and the changes will still be flushed. This "reference marker" is removed as soon as the object is marked as clean (i.e. flushed).
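The effect is easy to reproduce without SQLAlchemy at all. Here is a minimal sketch of the underlying Python behavior (Python 2 semantics; since Python 3.4 / PEP 442, such cycles are collected and __del__ is called anyway):

import gc

class Leaky(object):
    def __del__(self):
        print "deleted!"

obj = Leaky()
obj.ref = obj       # a cycle from the object to itself, like the one instrumentation creates
obj = None
gc.collect()
print gc.garbage    # the instance is parked here forever; __del__ never runs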
For SQLAlchemy 0.8.1 I've improved this behavior: the reference cycle is no longer created for "pending" or "detached" objects, that is, objects which aren't associated with a Session. Instead, the object's .modified flag is checked when it is attached to a Session, and the reference marker is associated only at that point (and is removed when the object becomes clean, as was the case already). If the object is detached from the session, the marker is unconditionally removed, even if the object still has changes - the .modified flag stays True.
Additionally, I've added a warning when a class is first mapped and detected as having a __del__() method. It's very easy for a Python object to have cycles, and in the case of SQLAlchemy the pattern of placing a relationship() on a class with a backref will also have the effect of creating reference cycles, so even with the state management improvement here, using __del__() is a bad idea.
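For illustration only, here is a sketch of the backref pattern that warning is about (assuming the declarative Base from the example above); once a parent and child are loaded together, the two instances refer to each other:

from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import relationship

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # backref makes child.parent point back at this object, so a loaded
    # parent/child pair forms an instance-level reference cycle
    children = relationship("Child", backref="parent")

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))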

Related

How do I implement polymorphic dataclass models with default values in the base model in SQLAlchemy 2.0 / Declarative?

I'm in the process of migrating to SQLAlchemy 2.0 and adopting the new Declarative syntax with MappedAsDataclass. Previously, I implemented joined table inheritance for my models. The (simplified) code looks like this:
from sqlalchemy import ForeignKey, String
from sqlalchemy.orm import DeclarativeBase, Mapped, MappedAsDataclass, mapped_column

class Base(MappedAsDataclass, DeclarativeBase):
    pass

class Foo(Base):
    __tablename__ = "foo"
    id: Mapped[int] = mapped_column(primary_key=True)
    type: Mapped[str] = mapped_column(String(50))
    foo_value: Mapped[float] = mapped_column(default=78)
    __mapper_args__ = {"polymorphic_identity": "foo", "polymorphic_on": "type"}

class Bar(Foo):
    __tablename__ = "bar"
    id: Mapped[int] = mapped_column(ForeignKey("foo.id"), primary_key=True)
    bar_value: Mapped[float]
    __mapper_args__ = {"polymorphic_identity": "bar"}
The important bit for the question is the default value on foo_value. Because of its presence, a TypeError: non-default argument 'bar_value' follows default argument is raised. While moving fields around in the definition of a single class could make this error go away (but why is it raised in the first place, since the field order is not really important?), that is not possible with inherited models.
How can I fix or work around this limitation? Am I missing something relevant from the documentation?
It seems I needed to use insert_default with MappedAsDataclass instead of default, as described in the docs.
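For reference, a sketch of how the models above might look after that change; the only edit is the foo_value line, everything else is unchanged from the question:

class Foo(Base):
    __tablename__ = "foo"
    id: Mapped[int] = mapped_column(primary_key=True)
    type: Mapped[str] = mapped_column(String(50))
    # insert_default sets only the Core-level column default, so the
    # generated dataclass __init__ gains no default argument and
    # bar_value in the subclass no longer "follows" a default
    foo_value: Mapped[float] = mapped_column(insert_default=78)
    __mapper_args__ = {"polymorphic_identity": "foo", "polymorphic_on": "type"}

class Bar(Foo):
    __tablename__ = "bar"
    id: Mapped[int] = mapped_column(ForeignKey("foo.id"), primary_key=True)
    bar_value: Mapped[float]
    __mapper_args__ = {"polymorphic_identity": "bar"}

The trade-off is that foo_value becomes a required argument to Foo(...); the 78 is now applied at INSERT time instead of by the dataclass. If you'd rather not pass it at construction, init=False should drop it from the generated __init__ entirely.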

Interaction between Pydantic models/schemas in the FastAPI tutorial

I'm following the FastAPI tutorial and am not quite sure what the exact relationship between the proposed data objects is.
We have the models.py file:
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String
from sqlalchemy.orm import relationship
from .database import Base

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True, index=True)
    email = Column(String, unique=True, index=True)
    hashed_password = Column(String)
    is_active = Column(Boolean, default=True)

    items = relationship("Item", back_populates="owner")

class Item(Base):
    __tablename__ = "items"

    id = Column(Integer, primary_key=True, index=True)
    title = Column(String, index=True)
    description = Column(String, index=True)
    owner_id = Column(Integer, ForeignKey("users.id"))

    owner = relationship("User", back_populates="items")
and the schemas.py file:
from typing import List, Union
from pydantic import BaseModel

class ItemBase(BaseModel):
    title: str
    description: Union[str, None] = None

class ItemCreate(ItemBase):
    pass

class Item(ItemBase):
    id: int
    owner_id: int

    class Config:
        orm_mode = True

class UserBase(BaseModel):
    email: str

class UserCreate(UserBase):
    password: str

class User(UserBase):
    id: int
    is_active: bool
    items: List[Item] = []

    class Config:
        orm_mode = True
Those classes are then used to define db queries like in the crud.py file:
from sqlalchemy.orm import Session
from . import models, schemas

def get_user(db: Session, user_id: int):
    return db.query(models.User).filter(models.User.id == user_id).first()

def get_user_by_email(db: Session, email: str):
    return db.query(models.User).filter(models.User.email == email).first()

def get_users(db: Session, skip: int = 0, limit: int = 100):
    return db.query(models.User).offset(skip).limit(limit).all()

def create_user(db: Session, user: schemas.UserCreate):
    fake_hashed_password = user.password + "notreallyhashed"
    db_user = models.User(email=user.email, hashed_password=fake_hashed_password)
    db.add(db_user)
    db.commit()
    db.refresh(db_user)
    return db_user

def get_items(db: Session, skip: int = 0, limit: int = 100):
    return db.query(models.Item).offset(skip).limit(limit).all()

def create_user_item(db: Session, item: schemas.ItemCreate, user_id: int):
    db_item = models.Item(**item.dict(), owner_id=user_id)
    db.add(db_item)
    db.commit()
    db.refresh(db_item)
    return db_item
and in the FastAPI code main.py:
from typing import List
from fastapi import Depends, FastAPI, HTTPException
from sqlalchemy.orm import Session
from . import crud, models, schemas
from .database import SessionLocal, engine

models.Base.metadata.create_all(bind=engine)

app = FastAPI()

# Dependency
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.post("/users/", response_model=schemas.User)
def create_user(user: schemas.UserCreate, db: Session = Depends(get_db)):
    db_user = crud.get_user_by_email(db, email=user.email)
    if db_user:
        raise HTTPException(status_code=400, detail="Email already registered")
    return crud.create_user(db=db, user=user)

@app.get("/users/", response_model=List[schemas.User])
def read_users(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    users = crud.get_users(db, skip=skip, limit=limit)
    return users

@app.get("/users/{user_id}", response_model=schemas.User)
def read_user(user_id: int, db: Session = Depends(get_db)):
    db_user = crud.get_user(db, user_id=user_id)
    if db_user is None:
        raise HTTPException(status_code=404, detail="User not found")
    return db_user

@app.post("/users/{user_id}/items/", response_model=schemas.Item)
def create_item_for_user(
    user_id: int, item: schemas.ItemCreate, db: Session = Depends(get_db)
):
    return crud.create_user_item(db=db, item=item, user_id=user_id)

@app.get("/items/", response_model=List[schemas.Item])
def read_items(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    items = crud.get_items(db, skip=skip, limit=limit)
    return items
Now from what I understand:
the models data classes define the sql tables
the schemas data classes define the api that FastAPI uses to interact with the database
they must be convertible into each other so that the set-up works
What I don't understand:
in crud.create_user_item I expected the return type to be schemas.Item, since that return type is used by FastAPI again
according to my understanding, the response model of @app.post("/users/{user_id}/items/", response_model=schemas.Item) in main.py is wrong, or how can I understand the return type inconsistency?
however, inferring from the code, the actual return type must be models.Item; how is that handled by FastAPI?
what would be the return type of crud.get_user?
I'll go through your bullet points one by one.
The models data classes define the sql tables
Yes. More precisely, the ORM classes that map to actual database tables are defined in the models module.
the schemas data classes define the api that FastAPI uses to interact with the database
Yes and no. The Pydantic models in the schemas module define the data schemas relevant to the API, yes. But that has nothing to do with the database yet. Some of these schemas define what data is expected to be received by certain API endpoints for the request to be considered valid. Others define what the data returned by certain endpoints will look like.
they must be convertible into each other so that the set-up works
While the database table schemas and the API data schemas are usually very similar, that is not necessarily the case. In the tutorial however, they correspond quite neatly, which allows succinct CRUD code, like this:
db_item = models.Item(**item.dict(), owner_id=user_id)
Here item is a Pydantic model instance, i.e. one of your API data schemas schemas.ItemCreate containing data you decided is necessary for creating a new item. Since its fields (their names and types) correspond to those of the database model models.Item, the latter can be instantiated from the dictionary representation of the former (with the addition of the owner_id).
in crud.create_user_item I expected the return type to be schemas.Item, since that return type is used by FastAPI again
No, this is exactly the magic of FastAPI. The function create_user_item returns an instance of models.Item, i.e. the ORM object as constructed from the database (after calling session.refresh on it). And the API route handler function create_item_for_user actually does return that same object (of class models.Item).
However, the @app.post decorator takes that object and uses it to construct an instance of the response_model you defined for that route, which is schemas.Item in this case. This is why you set this in your schemas.Item model:
class Config:
    orm_mode = True
This allows an instance of that class to be created via the .from_orm method. That all happens behind the scenes and again depends on the SQLAlchemy model corresponding to the Pydantic model with regards to field names and types. Otherwise validation fails.
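Conceptually, and this is only a rough sketch of the idea rather than FastAPI's actual internals, the route machinery does something like:

# inside FastAPI, roughly (Pydantic v1 API):
db_item = create_item_for_user(user_id=1, item=item, db=db)  # a models.Item
api_item = schemas.Item.from_orm(db_item)  # possible because orm_mode = True
# api_item (a schemas.Item) is then serialized into the JSON response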
according to my understanding the response model [...] is wrong
No, see above. The decorated route function actually deals with the schemas.Item model.
however inferring from the code, the actual return type must be models.Item
Yes, see above. The return type of the undecorated route handler function create_item_for_user is in fact models.Item. But its return type is not the response model.
I assume that to reduce confusion the documentation example does not annotate the return type of those route functions. If it did, it would look like this:
@app.post("/users/{user_id}/items/", response_model=schemas.Item)
def create_item_for_user(
    user_id: int, item: schemas.ItemCreate, db: Session = Depends(get_db)
) -> models.Item:
    return crud.create_user_item(db=db, item=item, user_id=user_id)
It may help to remember that a decorator is just syntactic sugar for a function that takes a callable (/function) as argument and returns a function. Typically the returned function actually internally calls the function passed to it as argument and does additional things before and/or after that call. I could rewrite the route above like this and it would be exactly the same:
def create_item_for_user(
    user_id: int, item: schemas.ItemCreate, db: Session = Depends(get_db)
) -> models.Item:
    return crud.create_user_item(db=db, item=item, user_id=user_id)

create_item_for_user = app.post(
    "/users/{user_id}/items/", response_model=schemas.Item
)(create_item_for_user)
what would be the return type of crud.get_user?
That would be models.User (or None, if no row matches), because that is the database model and is what the .first() method of that query returns.
db.query(models.User).filter(models.User.id == user_id).first()
This is then again returned by the read_user API route function in the same fashion as I explained above for models.Item.
Hope this helps.

SQLAlchemy 1.4 abstracting MetaData.reflect into function not returning MetaData object?

I have a class that acts as a PostgreSQL database interface. It has a number of methods which do things with the MetaData, such as get table names, drop tables, etc.
These methods keep calling the same two lines to set up MetaData. I am trying to tidy this up by abstracting the MetaData setup into its own function which is called when the class is instantiated, but this isn't working: the function keeps returning None instead of a MetaData instance.
Here is an example of the class, BEFORE adding the MetaData function:
class Db:
    def __init__(self, config):
        self.engine = create_async_engine(ENGINE, echo=True, future=True)
        self.session = sessionmaker(self.engine, expire_on_commit=False, class_=AsyncSession)

    def get_table_names(self):
        meta = MetaData()
        meta.reflect(bind=sync_engine)
        return meta.tables.keys()
This works well, returns a list of table keys:
dict_keys(['user', 'images', 'session'])
When I try to shift the MetaData call into its own function like so:
class Db:
    def __init__(self, config):
        self.engine = create_async_engine(ENGINE, echo=True, future=True)
        self.session = sessionmaker(self.engine, expire_on_commit=False, class_=AsyncSession)
        self.meta = self.get_metadata()

    def get_metadata(self):
        meta = MetaData()
        return meta.reflect(bind=sync_engine)

    def get_table_names(self):
        return self.meta.tables.keys()
It returns this error:
in get_table_names return self.meta.tables.keys()
AttributeError: 'NoneType' object has no attribute 'tables'
How can I achieve this sort of functionality by calling self.meta() from within the various class methods?
MetaData.reflect() alters the metadata in place and returns None, which is why self.meta ends up as None. So you just need to return the meta variable explicitly.
class Db:
    # ...
    def get_metadata(self):
        meta = MetaData()
        meta.reflect(bind=sync_engine)
        return meta
    # ...
Although it might be better to do this in a factory function, like def db_factory(config): and inject these things already prepped in the class constructor, def __init__(self, metadata, engine, session):. Just a thought.
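A sketch of that factory idea; the name db_factory and the config attributes here are placeholders, not from the question:

from sqlalchemy import MetaData
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

def db_factory(config):
    engine = create_async_engine(config.async_url, echo=True, future=True)
    session = sessionmaker(engine, expire_on_commit=False, class_=AsyncSession)
    meta = MetaData()
    meta.reflect(bind=config.sync_engine)  # reflect once, up front
    return Db(metadata=meta, engine=engine, session=session)

class Db:
    def __init__(self, metadata, engine, session):
        self.meta = metadata
        self.engine = engine
        self.session = session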
Just wanted to post an answer, as with someone else's help I was able to solve this. The code should look like this:
class Db:
    def __init__(self, config):
        self.engine = create_async_engine(ENGINE, echo=True, future=True)
        self.session = sessionmaker(self.engine, expire_on_commit=False, class_=AsyncSession)
        self._meta = MetaData()

    @property
    def meta(self):
        self._meta.reflect(bind=sync_engine)
        return self._meta

    def get_table_names(self):
        return self.meta.tables.keys()
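Usage is unchanged, e.g. Db(config).get_table_names(). One design note: the property re-runs reflection on every access; since MetaData.reflect() by default skips tables already present in the MetaData, repeated accesses are mostly cheap, but if the schema never changes at runtime you could reflect once in __init__ instead.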

Python objects in __dealloc__ in Cython

In the docs it is written, that "Any C data that you explicitly allocated (e.g. via malloc) in your __cinit__() method should be freed in your __dealloc__() method."
This is not my case. I have following extension class:
cdef class SomeClass:
    cdef dict data
    cdef void * u_data

    def __init__(self, data_len):
        self.data = {'columns': []}
        if data_len > 0:
            self.data.update({'data': deque(maxlen=data_len)})
        else:
            self.data.update({'data': []})
        self.u_data = <void *>self.data

    @property
    def data(self):
        return self.data

    @data.setter
    def data(self, new_val: dict):
        self.data = new_val
Some C function has access to this class and appends some data to the SomeClass().data dict. What should I write in __dealloc__ when I want to delete an instance of SomeClass?
Maybe something like:
def __dealloc__(self):
    self.data = None
    free(self.u_data)
Or is there no need to dealloc anything at all?
No, you don't need to, and no, you shouldn't. From the documentation:
You need to be careful what you do in a __dealloc__() method. By the time your __dealloc__() method is called, the object may already have been partially destroyed and may not be in a valid state as far as Python is concerned, so you should avoid invoking any Python operations which might touch the object. In particular, don’t call any other methods of the object or do anything which might cause the object to be resurrected. It’s best if you stick to just deallocating C data.
You don’t need to worry about deallocating Python attributes of your object, because that will be done for you by Cython after your __dealloc__() method returns.
You can confirm this by inspecting the C code (you need to look at the full code, not just the annotated HTML). There's an autogenerated function __pyx_tp_dealloc_9someclass_SomeClass (the name may vary slightly depending on what you called your module) that does a range of things, including:
__pyx_pw_9someclass_9SomeClass_3__dealloc__(o);
/* some other code */
Py_CLEAR(p->data);
where the function __pyx_pw_9someclass_9SomeClass_3__dealloc__ is (a wrapper for) your user-defined __dealloc__. Py_CLEAR will ensure that data is appropriately reference-counted then set to NULL.
It's a little hard to follow because it all goes through several layers of wrappers, but you can confirm that it does what the documentation says.
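So for the class in the question there is nothing to free. For contrast, here is a sketch of the case the documentation does care about, with an invented malloc'd buffer (not from the question's code):

from libc.stdlib cimport malloc, free

cdef class Holder:
    cdef dict data      # Python attribute: Cython Py_CLEARs this for us
    cdef char * buf     # raw C allocation: our job to release

    def __cinit__(self, size_t n):
        self.data = {}
        self.buf = <char *>malloc(n)

    def __dealloc__(self):
        # touch only C data here; Python attributes may already be invalid
        free(self.buf)  # free(NULL) is a harmless no-op if malloc failed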

Check if object is an sqlalchemy model instance

I want to know how to tell, given an object, whether it is an instance of an SQLAlchemy mapped model.
Normally, I would use isinstance(obj, DeclarativeBase). However, in this scenario, I do not have the DeclarativeBase class used available (since it is in a dependency project).
I would like to know what is the best practice in this case.
class Person(DeclarativeBase):
    __tablename__ = "Persons"

p = Person()
print isinstance(p, DeclarativeBase)
# prints True

# However in my scenario, I do not have the DeclarativeBase available,
# since the DeclarativeBase will be constructed in the depending web app,
# while my code will act as a library that will be imported into the web app.
# What are my alternatives?
You can use class_mapper() and catch the exception.
Or you could use _is_mapped_class, but ideally you should not as it is not a public method.
from sqlalchemy.orm.util import class_mapper

def _is_sa_mapped(cls):
    try:
        class_mapper(cls)
        return True
    except:
        return False

print _is_sa_mapped(MyClass)

# note: use this at your own risk, as it might be removed/renamed in the future
from sqlalchemy.orm.util import _is_mapped_class
print bool(_is_mapped_class(MyClass))
For instances there is object_mapper(), so:
from sqlalchemy.orm.base import object_mapper
from sqlalchemy.orm.exc import UnmappedInstanceError

def is_mapped(obj):
    try:
        object_mapper(obj)
    except UnmappedInstanceError:
        return False
    return True
The complete mapper utilities are documented here: http://docs.sqlalchemy.org/en/rel_1_0/orm/mapping_api.html
Just a consideration: since specific errors are raised by SQLAlchemy (UnmappedClassError for classes and UnmappedInstanceError for instances), why not catch them rather than a generic exception? ;)
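Following up on that last point, the helpers might look like this with the specific exceptions (same logic as above, just narrower except clauses):

from sqlalchemy.orm import class_mapper, object_mapper
from sqlalchemy.orm.exc import UnmappedClassError, UnmappedInstanceError

def is_sa_mapped_class(cls):
    try:
        class_mapper(cls)
        return True
    except UnmappedClassError:
        return False

def is_sa_mapped_instance(obj):
    try:
        object_mapper(obj)
        return True
    except UnmappedInstanceError:
        return False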