In SQLAlchemy, what's the difference between DBSession and DBSession()? - sqlalchemy

DBSession = scoped_session(sessionmaker(bind=engine))
#1
DBSession.add(someobject)
DBSession.commit()
#2
session = DBSession()
session.add(someobject)
session.commit()
What's the difference between #1 and #2?
I use #1 in my Pyramid app and I get a lot of 'MySQL has gone away' exceptions.

There isn't any: SQLAlchemy makes most Session methods available directly on the scoped_session registry, which proxies them to the current thread-local Session. This includes add() and commit().
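To see this concretely, here is a minimal sketch (the in-memory SQLite URL is just a stand-in for illustration) showing that the registry and the Session it hands out are the same object underneath:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session

engine = create_engine("sqlite://")  # stand-in engine for illustration
DBSession = scoped_session(sessionmaker(bind=engine))

# #2 style: explicitly ask the registry for the current thread's Session
session = DBSession()

# #1 style: call Session methods on the registry itself; they are proxied
# to the very same thread-local Session obtained above
assert session is DBSession()
So #1 and #2 add to and commit the same Session; the 'MySQL has gone away' errors are more likely caused by stale pooled connections (MySQL closes idle connections after its timeout; see the pool_recycle discussion further down) than by which spelling you use.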

Related

How to use a standalone Operations instance and rollback changes?

The Alembic documentation states that "a standalone Operations instance can be made for use cases external to regular Alembic migrations by passing in a MigrationContext". An example is given:
from alembic.migration import MigrationContext
from alembic.operations import Operations
conn = myengine.connect()
ctx = MigrationContext.configure(conn)
op = Operations(ctx)
op.alter_column("t", "c", nullable=True)
How can this be done as a transaction? In other words, how can these operations be rolled back?
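One way to try this, sketched under the assumption that the backend supports transactional DDL (e.g. PostgreSQL; MySQL autocommits most DDL, so an ALTER TABLE there cannot be rolled back), is to open an explicit transaction on the connection and roll it back:
from alembic.migration import MigrationContext
from alembic.operations import Operations

conn = myengine.connect()
trans = conn.begin()  # explicit transaction on the connection
try:
    ctx = MigrationContext.configure(conn)
    op = Operations(ctx)
    op.alter_column("t", "c", nullable=True)
    trans.rollback()  # discard the DDL (only works on transactional-DDL backends)
finally:
    conn.close()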

Difference between Sqlalchemy scoped_session(..) and scoped_session(..)()

What is the difference between using scoped_session explicitly:
engine = create_engine(url)
session = scoped_session(sessionmaker(bind=engine))
session.add(..)
session.commit()
session.remove()
session.add(..)
session.commit()
session.remove()
and creating an instance of the scoped_session object:
engine = create_engine(url)
session = scoped_session(sessionmaker(bind=engine))
session().add(..)
session().commit()
session.remove()
session().add(..)
session().commit()
session.remove()
SQLAlchemy always returns the same session for the same thread when you call session():
>>> session() is session()
True
>>> session is session()
False
Is that the proper way to manage a connection in a multithreaded environment? If so, why does SQLAlchemy allow querying through session instead of session()?
"scoped_session" returns a factory object, so you must call the factory to return an instance. "scoped_session" will actually return the same session when called from the same scope (in almost every use case, the scope is individual user requests to a web page).
So though you call session() repeatedly, it isn't actually creating multiple sessions, but the same one is being returned each time.
I recommend using a capital S to denote the fact that session is a factory and not an instance of an object.
More documentation here, going into a lot more detail than I did: http://docs.sqlalchemy.org/en/latest/orm/contextual.html
EDIT: Both ways will access the same object. I've always found it clearer to produce an instance from scoped_session (not all factories provide functionality like this), but both will access the thread-local session object.
http://docs.sqlalchemy.org/en/latest/orm/contextual.html#implicit-method-access

Do I need to somehow make create_engine a singleton?

On the server I am using a combination of Tornado and SQLAlchemy (maybe SQLAlchemy is not the best choice for an async server, but it is temporary). I split the project and handlers into 10 files/modules.
In every module I use session = Session() and then session to query the database.
The common part of every module looks like this:
...
import tornado.ioloop
engine = create_engine(DB_URL, echo=False, pool_size=100, pool_recycle=3600)
Session = sessionmaker(bind=engine)
class BaseHandler(tornado.web.RequestHandler):
    ....
Do I need to somehow make
engine = create_engine(DB_URL, echo=False, pool_size=100, pool_recycle=3600)
Session = sessionmaker(bind=engine)
behave like singletons, instead of creating them in every module, or is this an OK way to do things and create sessions?
You probably want to use scoped_session which essentially serves as a thread-local singleton, creating sessions on-demand using the provided factory function.
In one module imported by all others you write:
engine = create_engine(DB_URL, echo=False, pool_size=100, pool_recycle=3600)
session_factory = sessionmaker(bind=engine)
Session = scoped_session(session_factory)
# or make it a Tornado Application property
And then either use Session as an explicit factory:
session = Session()
session.query(...)
Or use implicit method delegation:
Session.query(...)
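Either way, it is also worth discarding the thread-local session once each request is finished. A minimal sketch using Tornado's standard on_finish hook (how you wire this up is an assumption, not part of the original question):
import tornado.web
# Session is the scoped_session registry created in the shared module above
# (the import path is an assumption for illustration):
# from myapp.db import Session

class BaseHandler(tornado.web.RequestHandler):
    def on_finish(self):
        # Hand the thread-local Session back to the registry so the next
        # request served by this thread starts with a fresh one.
        Session.remove()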

"ResourceClosedError: The transaction is closed" error with celery beat and sqlalchemy + pyramid app

I have a pyramid app called mainsite.
The site works in a pretty asynchronous manner mostly through threads being launched from the view to carry out the backend operations.
It connects to mysql with sqlalchemy and uses ZopeTransactionExtension for session management.
So far the application has been running great.
I need to run periodic jobs on it and it needs to use some of the same asynchronous functions that are being launched from the view.
I used apscheduler but ran into issues with that. So I thought of using celery beat as a separate process that treats mainapp as a library and imports the functions to be used.
My celery config looks like this:
from datetime import timedelta
from api.apiconst import RERUN_CHECK_INTERVAL, AUTOMATION_CHECK_INTERVAL, \
    AUTH_DELETE_TIME

BROKER_URL = 'sqla+mysql://em:em@localhost/edgem'
CELERY_RESULT_BACKEND = "database"
CELERY_RESULT_DBURI = 'mysql://em:em@localhost/edgem'

CELERYBEAT_SCHEDULE = {
    'rerun': {
        'task': 'tasks.rerun_scheduler',
        'schedule': timedelta(seconds=RERUN_CHECK_INTERVAL)
    },
    'automate': {
        'task': 'tasks.automation_scheduler',
        'schedule': timedelta(seconds=20)
    },
    'remove-tokens': {
        'task': 'tasks.token_remover_scheduler',
        'schedule': timedelta(seconds=2 * 24 * 3600)
    },
}
CELERY_TIMEZONE = 'UTC'
The tasks.py is
from celery import Celery

celery = Celery('tasks')
celery.config_from_object('celeryconfig')

@celery.task
def rerun_scheduler():
    from mainsite.task import check_update_rerun_tasks
    check_update_rerun_tasks()

@celery.task
def automation_scheduler():
    from mainsite.task import automate
    automate()

@celery.task
def token_remover_scheduler():
    from mainsite.auth_service import delete_old_tokens
    delete_old_tokens()
Keep in mind that all the above functions return immediately, but launch threads if required.
The threads save objects into the db by doing transaction.commit() after session.add(object).
The problem is that the whole thing works like a gem for only about 30 minutes. After that, ResourceClosedError: The transaction is closed errors start happening wherever there is a transaction.commit(). I am not sure what the problem is and I need help troubleshooting.
The reason I do the imports inside the tasks was to get rid of this error. I thought that importing every time the task needed to run was a good idea and that I might get a new transaction each time, but it looks like that is not the case.
In my experience trying to reuse a session configured to be used with Pyramid (with ZopeTransactionExtension etc.) with a Celery worker results in a terrible hard-to-debug mess.
ZopeTransactionExtension binds the SQLAlchemy session to Pyramid's request-response cycle: a transaction is started and committed or rolled back automatically, so you're generally not supposed to call transaction.commit() in your code. If everything is ok, ZTE will commit everything; if your code raises an exception, your transaction will be rolled back.
With Celery you need to manage SQLAlchemy sessions manually, which ZTE prevents you from doing, so you need to configure your DBSession differently.
Something simple like this would work:
DBSession = None

def set_dbsession(session):
    global DBSession
    if DBSession is not None:
        raise AttributeError("DBSession has been already set to %s!" % DBSession)
    DBSession = session
And then from Pyramid startup code you do
def main(global_config, **settings):
    ...
    set_dbsession(scoped_session(sessionmaker(extension=ZopeTransactionExtension())))
With Celery it's a bit trickier - I ended up creating a custom start script for Celery, in which I configure the session.
In setup.py of the worker egg:
entry_points="""
# -*- Entry points: -*-
[console_scripts]
custom_celery = worker.celeryd:start_celery
custom_celerybeat = worker.celeryd:start_celerybeat
""",
)
in worker/celeryd.py:
def initialize_async_session(db_string, db_echo):
    import sqlalchemy as sa
    from db import Base, set_dbsession

    session = sa.orm.scoped_session(sa.orm.sessionmaker(autoflush=True, autocommit=True))
    engine = sa.create_engine(db_string, echo=db_echo)
    session.configure(bind=engine)

    set_dbsession(session)
    Base.metadata.bind = engine


def start_celery():
    initialize_async_session(DB_STRING, DB_ECHO)
    import celery.bin.celeryd
    celery.bin.celeryd.main()
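The matching custom_celerybeat entry point from setup.py would follow the same pattern; a sketch, assuming the old-style celery.bin.celerybeat module exposes main() the same way celeryd does:
def start_celerybeat():
    initialize_async_session(DB_STRING, DB_ECHO)
    import celery.bin.celerybeat
    celery.bin.celerybeat.main()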
The general approach you're using with "threads being launched from the view to carry out the backend operations" feels a bit dangerous to me if you ever plan to deploy the application to a production server - a web server often recycles, kills or creates new "workers" so generally there are no guarantees each particular process would survive beyond the current request-response cycle. I never tried doing this though, so maybe you'll be ok :)

python: sqlalchemy - how do I ensure connection not stale using new event system

I am using the sqlalchemy package in python. I have an operation that takes some time to execute after I perform an autoload on an existing table. This causes the following error when I attempt to use the connection:
sqlalchemy.exc.OperationalError: (OperationalError) (2006, 'MySQL server has gone away')
I have a simple utility function that performs an insert many:
def insert_data(data_2_insert, table_name):
    engine = create_engine('mysql://blah:blah123@localhost/dbname')
    # MetaData is a Table catalog.
    metadata = MetaData()
    mytable = Table(table_name, metadata, autoload=True, autoload_with=engine)
    for c in mytable.c:
        print c
    column_names = tuple(c.name for c in mytable.c)
    final_data = [dict(zip(column_names, x)) for x in data_2_insert]
    ins = mytable.insert()

    conn = engine.connect()
    conn.execute(ins, final_data)
    conn.close()
It is the following line that takes a long time to execute, since 'data_2_insert' has 677,161 rows.
final_data = [dict(zip(column_names, x)) for x in data_2_insert]
I came across this question which refers to a similar problem. However I am not sure how to implement the connection management suggested by the accepted answer because robots.jpg pointed this out in a comment:
Note for SQLAlchemy 0.7 - PoolListener is deprecated, but the same solution can be implemented using the new event system.
If someone can please show me a couple of pointers on how I could go about integrating the suggestions into the way I use sqlalchemy I would be very appreciative. Thank you.
I think you are looking for something like this:
from sqlalchemy import exc, event
from sqlalchemy.pool import Pool

@event.listens_for(Pool, "checkout")
def check_connection(dbapi_con, con_record, con_proxy):
    '''Listener for Pool checkout events that pings every connection before using.
    Implements pessimistic disconnect handling strategy. See also:
    http://docs.sqlalchemy.org/en/rel_0_8/core/pooling.html#disconnect-handling-pessimistic'''
    cursor = dbapi_con.cursor()
    try:
        cursor.execute("SELECT 1")  # could also be dbapi_con.ping(),
                                    # not sure what is better
    except exc.OperationalError, ex:
        if ex.args[0] in (2006,   # MySQL server has gone away
                          2013,   # Lost connection to MySQL server during query
                          2055):  # Lost connection to MySQL server at '%s', system error: %d
            # caught by pool, which will retry with a new connection
            raise exc.DisconnectionError()
        else:
            raise
If you wish to trigger this strategy conditionally, you should avoid using the decorator here and instead register the listener using the listen() function:
# somewhere during app initialization
if config.check_connection_on_checkout:
    event.listen(Pool, "checkout", check_connection)
More info:
Connection Pool Events
Events API
There is a better way to handle it right now - pool_recycle
engine = create_engine('mysql://...', pool_recycle=3600)
MySQL has a default timeout of 8 hours.
This leads to the connection being closed by MySQL without the engine above it (such as SQLAlchemy) knowing about it.
There are 2 ways to solve it -
Optimistic - Using pool_recycle
Pessimistic - using pool_pre_ping=True
I prefer to go with pool_recycle, as it doesn't test the connection before every checkout, which puts less stress on the db.
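For reference, a minimal sketch of both options (the connection URL is a placeholder):
from sqlalchemy import create_engine

# Optimistic: recycle pooled connections before MySQL's wait_timeout closes them
engine = create_engine('mysql://user:pass@localhost/dbname', pool_recycle=3600)

# Pessimistic: test each connection with a lightweight ping when it is checked out
engine = create_engine('mysql://user:pass@localhost/dbname', pool_pre_ping=True)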