I'm confused about the session object in SQLAlchemy. Is it like a PHP session, where a session covers all the transactions of a user, or is a session an entity that scopes the lifetime of a transaction?
For every transaction in SQLAlchemy, is the procedure as follows:
-create and open session
-perform transaction
-commit or rollback
-close session
So, my question is: for a client, do we create a single session object, or is a session object created whenever we have a transaction to perform?
I would be hesitant to compare a SQLAlchemy session with a PHP session, since typically a PHP session refers to cookies, whereas SQLAlchemy has nothing to do with cookies or HTTP at all.
As explained by the documentation:
A Session is typically constructed at the beginning of a logical
operation where database access is potentially anticipated.
The Session, whenever it is used to talk to the database, begins a
database transaction as soon as it starts communicating. Assuming the
autocommit flag is left at its recommended default of False, this
transaction remains in progress until the Session is rolled back,
committed, or closed. The Session will begin a new transaction if it
is used again, subsequent to the previous transaction ending; from
this it follows that the Session is capable of having a lifespan
across many transactions, though only one at a time. We refer to these
two concepts as transaction scope and session scope.
The implication here is that the SQLAlchemy ORM is encouraging the
developer to establish these two scopes in his or her application,
including not only when the scopes begin and end, but also the expanse
of those scopes, for example should a single Session instance be local
to the execution flow within a function or method, should it be a
global object used by the entire application, or somewhere in between
these two.
As you can see, it is completely up to the developer of the application to determine how to use the session. In a simple desktop application, it might make sense to create a single global session object and just keep using that session object, committing as the user hits "save". In a web application, a "session per request handled" strategy is often used. Sometimes you use both strategies in the same application (a session-per-request for web requests, but a single session with slightly different properties for background tasks).
There is no "one size fits all" solution for when to use a session. The documentation does give hints as to how you might go about determining this.
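To make the two scopes concrete, here is a minimal sketch, assuming an in-memory SQLite database and a throwaway notes table (both are my own stand-ins, not anything from the question). The session_scope helper is a hypothetical name: it shows the "one session per logical operation" style, and the second half shows one longer-lived session spanning several transactions, one at a time.

```python
from contextlib import contextmanager

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# Assumption: in-memory SQLite, used only so the sketch is self-contained.
engine = create_engine("sqlite://")
SessionFactory = sessionmaker(bind=engine)


@contextmanager
def session_scope():
    """Hypothetical helper: one session (and one transaction) per logical operation."""
    session = SessionFactory()
    try:
        yield session
        session.commit()      # transaction scope ends successfully
    except Exception:
        session.rollback()    # transaction scope ends with an error
        raise
    finally:
        session.close()       # session scope ends


# Style 1: short-lived session, opened and closed around a single operation.
with session_scope() as session:
    session.execute(text("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)"))
    session.execute(text("INSERT INTO notes (body) VALUES (:body)"), {"body": "first"})

# Style 2: one longer-lived session that goes through several transactions.
session = SessionFactory()
session.execute(text("INSERT INTO notes (body) VALUES ('second')"))
session.commit()              # first transaction ends here
session.execute(text("INSERT INTO notes (body) VALUES ('discarded')"))
session.rollback()            # second transaction is thrown away
count = session.execute(text("SELECT COUNT(*) FROM notes")).scalar()
session.close()
```

Which style fits depends on the application, as described above: the first maps naturally onto a web request, the second onto a desktop app that commits when the user hits "save".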
Recently we added functionality to our RoR application that allows users to open a particular record in its own tab. Since then, we've started seeing frequent ActiveRecord::StaleObjectError errors. On investigating the issue I found that Rails does indeed try to update the session store first whenever a resource is opened in a tab, and that is where the exception is raised.
We have a lock_version column in our ActiveRecord session store table, so Rails applies optimistic locking to it by default. Is there any way to solve this without introducing much complexity (the application is already live on the client's machine) and without affecting the session data already stored in our session store DB?
Any suggestions would be much appreciated. Thanks
It sounds like you're using optimistic locking on a db session record and updating the session record when you process an update to other records. Not sure what you'd need to update in the session, but if you're worried about possibly conflicting updates to the session object (and need the locking) then these errors might be desired.
If you don't need the locking, you can refresh the session object before saving it (or disable its optimistic locking) to avoid this error for these session updates.
You also might look into what about the session is being updated and whether it's strictly necessary. If you're updating something like "last_active_on" then you might be better off sending off a background job to do this and/or using the update_column method which bypasses the rather heavyweight activerecord save callback chain.
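For anyone unsure what the lock_version check amounts to at the SQL level, here is a rough sketch in Python with sqlite3 rather than Rails. The sessions table layout, StaleRecordError and save_session_data are made up for illustration; the point is only the "UPDATE ... WHERE lock_version = ?" pattern and the refresh-then-retry handling mentioned above.

```python
import sqlite3


class StaleRecordError(Exception):
    """Raised when the row was changed by someone else since it was read."""


def save_session_data(conn, session_id, data, expected_version):
    """Write `data` only if the row still has the version we originally read."""
    cursor = conn.execute(
        "UPDATE sessions SET data = ?, lock_version = lock_version + 1 "
        "WHERE id = ? AND lock_version = ?",
        (data, session_id, expected_version),
    )
    conn.commit()
    if cursor.rowcount == 0:
        # No row matched: another writer bumped lock_version first.
        raise StaleRecordError(f"session {session_id} is newer than version {expected_version}")
    return expected_version + 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    "id INTEGER PRIMARY KEY, data TEXT, lock_version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO sessions (id, data) VALUES (1, 'initial')")
conn.commit()

# Two browser tabs both read the session row while it is at version 0.
tab_a_version = 0
tab_b_version = 0

# Tab A saves first and wins; the row moves to version 1.
tab_a_version = save_session_data(conn, 1, "from tab A", tab_a_version)

try:
    save_session_data(conn, 1, "from tab B", tab_b_version)
    tab_b_outcome = "saved"
except StaleRecordError:
    # Refresh: re-read the current version, then retry the write.
    tab_b_version = conn.execute(
        "SELECT lock_version FROM sessions WHERE id = 1"
    ).fetchone()[0]
    save_session_data(conn, 1, "from tab B", tab_b_version)
    tab_b_outcome = "saved after refresh"

final_row = conn.execute("SELECT data, lock_version FROM sessions WHERE id = 1").fetchone()
```

Note that the refresh-and-retry branch silently overwrites tab A's write, which is exactly the trade-off to think about before turning the locking off.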
--- UPDATE ---
Pattern: Putting side-effects in background jobs
There are several common Rails patterns that start to break down as your app usage grows. One of the most common that I've run into is when a controller endpoint for a specific record also updates a common/shared record (for example, if creating a 'message' also updates the messages_count for a user using counter cache, or updates a last_active_at on a session). These patterns create bottlenecks in your application as multiple different types of requests across your application will compete for write locks on the same database rows unnecessarily.
These tend to creep into your app over time and become hard to refactor later. I'd recommend always handling side-effects of a request in an asynchronous job (using something like Sidekiq). Something like:
class Message < ActiveRecord::Base
  # Enqueue the counter update only after the transaction has committed
  after_commit :enqueue_update_messages_count_job

  def enqueue_update_messages_count_job
    Jobs::UpdateUserMessageCountJob.enqueue(self.id)
  end
end
While this may seem like overkill at first, it creates an architecture that is significantly more scalable. If counting the messages becomes slow, that will make the job slower but will not impact the usability of the product. In addition, if certain activities create lots of objects with the same side-effects (let's say you have a "signup" controller that creates a bunch of objects for a user that all trigger an update of user.updated_at), it becomes easy to throw out duplicate jobs and avoid updating the same field 20 times.
Pattern: Skipping the activerecord callback chain
Calling save on an ActiveRecord object runs validations and all the before and after callbacks. These can be slow and (at times) unnecessary. For example, updating a message_count cached value doesn't necessarily care about whether the user's email address is valid (or any other validations), and you may not care about other callbacks running. The same applies if you're just updating a user's updated_at value to clear a cache. You can bypass the ActiveRecord callback chain by calling user.update_column(:message_count, ..) to write that field directly to the database (note that update_attribute only skips validations and still runs callbacks). In theory this shouldn't be necessary for a well-designed application, but in practice some larger/legacy codebases make significant use of the ActiveRecord callback chain to handle business logic that you may not want to invoke.
--- Update #2 ---
On Deadlocks
One reason to avoid updating (or generally locking) a common/shared object from a concurrent request is that it can introduce Deadlock errors.
Generally speaking, a "deadlock" in a database occurs when two processes each need a lock the other one holds. Neither can continue, so one of them must fail. Checking for this has a cost, so PostgreSQL, for example, only runs its deadlock check after a process has waited on a lock for deadlock_timeout, and databases can also be configured to abort a statement that has waited too long for a lock (lock_timeout in PostgreSQL, innodb_lock_wait_timeout in MySQL's InnoDB). While contention for locks is common (e.g. two updates that both modify a 'session' object), a true deadlock is comparatively rare (thread A holds a lock on the session that thread B needs, while thread B holds a lock on a different object that thread A needs), so you may be able to partially address the problem by reviewing or extending your lock wait timeout. While this may reduce the errors, it doesn't fix the underlying issue that threads may wait for up to the full timeout. An alternative approach is to use a short timeout and rescue/retry a few times.
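The "short timeout plus rescue/retry" idea could look roughly like the sketch below. It is written in Python rather than Ruby, and LockTimeoutError is a placeholder for whatever deadlock or lock-timeout exception your database driver actually raises; the attempt count and delays are arbitrary.

```python
import random
import time


class LockTimeoutError(Exception):
    """Placeholder for the driver-specific deadlock / lock-timeout exception."""


def retry_on_lock_error(operation, attempts=3, base_delay=0.05):
    """Run `operation`, retrying a few times if it fails on a lock error."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except LockTimeoutError:
            if attempt == attempts:
                raise  # out of retries: let the caller see the error
            # Exponential backoff with jitter so competing writers spread out.
            delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
            time.sleep(delay)


# Simulated update that loses the lock twice and succeeds on the third try.
calls = {"count": 0}


def flaky_update():
    calls["count"] += 1
    if calls["count"] < 3:
        raise LockTimeoutError("simulated lock wait timeout")
    return "updated"


result = retry_on_lock_error(flaky_update, attempts=3, base_delay=0.01)
```

The operation being retried should be the whole transaction, since the database has already rolled back the failed attempt.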
My service is clustered and I am running several instances of it.
I need to collect all entities in a paginated fashion and push them into the caching layer (Redis).
While this runs on one application server, an application instance running on server #2 may already be making changes.
Those paginated calls to the DB will fetch 1000 items per call.
Now, since I want to prevent modifications while retrieval is ongoing, how do I achieve that?
Can I use the SELECT ... FOR UPDATE mechanism even though I am not updating anything in this transaction, only fetching the data in a paginated fashion?
If it were one app instance with multiple threads, you could use a critical section. But that doesn't work for a cluster of app instances.
I implemented this for a service a couple of months ago. The app is deployed in several instances. These instances don't communicate with each other, so they can't coordinate directly. But they all connect to the same MySQL database.
What I did was use the GET_LOCK() builtin function of MySQL.
When a routine wants exclusive access, it calls GET_LOCK('mylock', 0). This returns immediately, with a true value if it acquired the lock, or a false value if the lock was already held by some other client. That tells the client app whether it is the "winner" or not.
If a client is not the winner, then it calls GET_LOCK('mylock', -1) which means wait indefinitely. It does this because the winner is working on whatever it needs to do in the critical section.
When the winner finishes, it must call RELEASE_LOCK('mylock'). This unblocks the clients who were waiting. They now know that the work of the critical section is done, and they can feel free to read the contents of the cache or whatever else they need to do.
Also remember that the clients that were waiting on GET_LOCK('mylock', -1) need to call RELEASE_LOCK('mylock') immediately, because once they stop waiting, they have actually acquired the lock themselves.
This design allows a single lock coordinator (MySQL) to be used by multiple clients. It implements pessimistic locking, without needing to rely on locking any table or set of rows.
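Roughly, the protocol described above could be wrapped like this. It is only a sketch: run_exclusively is a hypothetical helper, `cursor` is assumed to be a DB-API cursor from a MySQL driver that uses %s placeholders (PyMySQL and mysqlclient do), and error handling for a NULL result from GET_LOCK is left out.

```python
def run_exclusively(cursor, lock_name, critical_section):
    """Run `critical_section` only on the client that wins the named lock.

    Returns True if this client was the winner and did the work, False if it
    only waited for the winner to finish.
    """
    # Timeout 0: return immediately with 1 (acquired) or 0 (held elsewhere).
    cursor.execute("SELECT GET_LOCK(%s, 0)", (lock_name,))
    is_winner = cursor.fetchone()[0] == 1

    if not is_winner:
        # Negative timeout: wait indefinitely until the winner releases.
        # Once this returns, this client holds the lock itself.
        cursor.execute("SELECT GET_LOCK(%s, -1)", (lock_name,))
        cursor.fetchone()

    try:
        if is_winner:
            critical_section()
    finally:
        # Winner and waiters alike must release, since both end up holding it.
        cursor.execute("SELECT RELEASE_LOCK(%s)", (lock_name,))
        cursor.fetchone()
    return is_winner
```

A caller would pass something like the cache-refresh routine as critical_section, and read from the cache afterwards regardless of the return value. Keep in mind the lock belongs to the connection, so the same connection has to be used for acquiring and releasing.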
Hi, I am confused about SQL server sessions. What does a session actually mean? Does it keep track of the client like an HttpSession? I have read some documents on the query life cycle, but none of them talks about the session. Most of them say that after the query is received by the server it gets parsed into a syntax tree, then an execution plan is built, then the query is executed, and finally the result set is dispatched to the client that issued the query. Where in this whole story does the session fit in on a SQL server such as MySQL, and what does it actually do? Or is there no session concept in MySQL (or any SQL server)? Am I imagining this wrongly?
A session in this context usually just refers to a single client connection.
The client connects to the DB server and authenticates; this is the start of the session.
When the client disconnects (gracefully or not) the session ends.
This is relevant for things like temporary tables or transactions: uncommitted transactions will be rolled back by the DBMS, and all temporary tables created through this connection (= session) are discarded when the client disconnects, i.e. when the session ends.
Note that a client does not necessarily actively end a session or connection. The client may crash, the network connection may break, the server may shut down, etc. Any of these implicitly terminates the session.
Problems may arise when a (client) application uses a connection pool keeping connections (and sessions) open and handing them out transparently to different application components. When not handled correctly, errors may occur because a given session may already be 'spoiled' by a previous operation. If, for example, one routine on the client creates a temporary table named 'X' and fails to explicitly drop it afterwards, the next routine that 'inherits' this session may encounter an error when trying to create another temporary table of that name, because it already exists in this specific session; which couldn't be the case if the connection/session was freshly created.
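The temporary-table scenario is easy to reproduce. The sketch below uses Python's built-in sqlite3 instead of MySQL purely so it runs without a server (SQLite temp tables are also scoped to the connection); build_report is a made-up routine that forgets to drop its temporary table, and `pooled` stands in for a connection handed out again by a pool.

```python
import os
import sqlite3
import tempfile


def build_report(conn):
    """A routine that creates temporary table 'x' and never drops it."""
    conn.execute("CREATE TEMP TABLE x (value INTEGER)")
    conn.execute("INSERT INTO x VALUES (1)")
    conn.commit()


with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "demo.db")

    # One connection (= session) reused for two routines, as a pool would do.
    pooled = sqlite3.connect(db_path)
    build_report(pooled)
    try:
        build_report(pooled)  # the table from the first run is still there
        reused_outcome = "ok"
    except sqlite3.OperationalError as exc:
        reused_outcome = f"error: {exc}"

    # A freshly opened connection is a new session without that table.
    fresh = sqlite3.connect(db_path)
    build_report(fresh)
    fresh_outcome = "ok"

    pooled.close()
    fresh.close()
```

Here reused_outcome ends up holding the "already exists" style error while the fresh connection works, which is why pooled connections should be reset or cleaned up before being handed to the next component.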
"Session" is mainly a generic term. You connect to a server (MySQL, Oracle, FTP, IRC... whatever), you do your stuff and finally disconnect when you're done. That has been a session.
HTTP is a particular case. It's a stateless protocol: if you spend an hour reading a web site, you don't remain connected for a whole hour. You make a quick connection, fetch an item at a time (an HTML document, a style sheet, a picture...) and close the connection. (Internals are actually more complex but that's the general idea.) When you ask for a second page, the server doesn't know who you are: that makes it impossible to keep track of your whole browsing session at protocol level. Thus HTTP sessions were invented: they're a way to emulate physical sessions.
The MySQL session starts when you open a connection to the server. A connection ID is assigned which can be read via the SELECT CONNECTION_ID() statement. The session is terminated when the connection is closed or, in case of persistent connections, after a certain timeout or when the server shuts down.
I'm currently using SQLAlchemy with MySQL and Pyro to build a server program. Many clients connect to this server to make requests. The program only provides information from the MySQL database and sometimes performs some calculations.
Is it better to create a session for each client or to use the same session for all clients?
What you want is a scoped_session.
The benefits are (compared to a single shared session between clients):
No locking needed
Transactions supported
Connection pooling to the database (handled implicitly by SQLAlchemy)
How to use it
You just create the scoped_session:
Session = scoped_session(some_factory)
and access it in your Pyro methods:
class MyPyroObject():
    def remote_method(self):
        Session.query(MyModel).filter...
Behind the scenes
The code above ensures that a Session is created as needed. The session object is created the first time you access it in a thread and is kept in thread-local storage, so it goes away once the thread is finished; calling Session.remove() when a unit of work is done closes it explicitly instead of relying on thread cleanup. As each Pyro client connection has its own thread in the default setting (don't change it!), you will have one session per client.
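If you want to convince yourself of the per-thread behaviour, a small experiment like this works. Plain threads stand in for Pyro's per-client threads, the in-memory SQLite engine is only there to have something to bind to, and handle_client is a made-up name.

```python
import threading

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

# Assumption: in-memory SQLite engine, only so the sketch is self-contained.
engine = create_engine("sqlite://")
Session = scoped_session(sessionmaker(bind=engine))

sessions_by_client = {}
same_within_thread = {}


def handle_client(name):
    """Stand-in for a remote method running in one client's thread."""
    first = Session()   # created on first access in this thread
    second = Session()  # same thread, so the same session comes back
    same_within_thread[name] = first is second
    sessions_by_client[name] = first
    Session.remove()    # close and discard this thread's session explicitly


threads = [
    threading.Thread(target=handle_client, args=(name,))
    for name in ("client-1", "client-2")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

distinct_across_threads = (
    sessions_by_client["client-1"] is not sessions_by_client["client-2"]
)
```

Within one thread you always get the same session back, and each thread gets its own, so no extra locking around the session is needed.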
The best I can do is create a new Session for every client request. I hope there is no performance penalty.
I have a desktop application that runs on a network and every instance connects to the same database.
So, in this situation, how can I implement a mutex that works across all running instances that are connected to the same database?
In other words, I don't want two or more instances to run the same function at the same time. If one is already running the function, the other instances shouldn't have access to it.
PS: A database transaction won't solve this, because the function I want to mutex doesn't use the database. I've mentioned the database only because it can be used to exchange information across the running instances.
PS2: The function takes about 30 minutes to complete, so if a second instance tries to run the same function I would like to display a nice message that it can't be performed right now because computer 'X' is already running that function.
PS3: The function has to be processed on the client machine, so I can't use stored procedures.
I think you're looking for a database transaction. A transaction will isolate your changes from all other clients.
Update:
You mentioned that the function doesn't currently write to the database. If you want to mutex this function, there will have to be some central location to store the current mutex holder. The database can work for this -- just add a new table that includes the computername of the current holder. Check that table before starting your function.
I think your question may be confused, though. Mutexes should be about protecting resources. If your function is not accessing the database, then what shared resource are you protecting?
Put the code inside a transaction, either in the app or, better, inside a stored procedure, and call the stored procedure.
The transaction mechanism will isolate the code between the callers.
Alternatively, consider a message queue. As mentioned, the DB should manage all of this for you, either in transactions or through serial access to tables (à la MyISAM).
In the past I have done the following:
Create a table that basically has two fields, function_name and is_running
I don't know what RDBMS you are using, but most have a way to lock individual records for update. Here is some pseudocode based on Oracle:
-- Oracle starts the transaction implicitly with the first statement
SELECT is_running FROM function_table WHERE function_name = 'foo' FOR UPDATE;
-- Check here to see if it is running; if not, you can set is_running to 'Y'
UPDATE function_table SET is_running = 'Y' WHERE function_name = 'foo';
COMMIT;
Now I don't have the Oracle PL/SQL docs with me, but you get the idea. The 'FOR UPDATE' clause locks the record after the read until the commit, so other processes will block on that SELECT statement until the current process commits.
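Here is a runnable variation of the same flag-table idea, sketched in Python with the built-in sqlite3 module. SQLite has no SELECT ... FOR UPDATE, so I've swapped in a single conditional UPDATE as the atomic "check and set" step. The extra holder column (to show which computer has the lock) and the try_acquire/release names are my own additions for illustration.

```python
import sqlite3


def try_acquire(conn, function_name, computer_name):
    """Try to claim the function; return (acquired, current_holder)."""
    # The WHERE clause makes check-and-set a single atomic statement:
    # only one caller can flip is_running from 'N' to 'Y'.
    cursor = conn.execute(
        "UPDATE function_table SET is_running = 'Y', holder = ? "
        "WHERE function_name = ? AND is_running = 'N'",
        (computer_name, function_name),
    )
    conn.commit()
    if cursor.rowcount == 1:
        return (True, computer_name)
    row = conn.execute(
        "SELECT holder FROM function_table WHERE function_name = ?",
        (function_name,),
    ).fetchone()
    return (False, row[0] if row else None)


def release(conn, function_name, computer_name):
    """Clear the flag, but only if this computer is the one holding it."""
    conn.execute(
        "UPDATE function_table SET is_running = 'N', holder = NULL "
        "WHERE function_name = ? AND holder = ?",
        (function_name, computer_name),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE function_table ("
    "function_name TEXT PRIMARY KEY, is_running TEXT NOT NULL, holder TEXT)"
)
conn.execute("INSERT INTO function_table VALUES ('foo', 'N', NULL)")
conn.commit()

first = try_acquire(conn, "foo", "PC-A")   # PC-A gets the lock
second = try_acquire(conn, "foo", "PC-B")  # PC-B is told who holds it
release(conn, "foo", "PC-A")
third = try_acquire(conn, "foo", "PC-B")   # now PC-B can run
```

The returned holder name is what you would put in the "computer 'X' is already running that function" message. One caveat with any flag-based approach: if the holder crashes before calling release, the flag stays set until someone clears it, so you may want a timestamp column as well.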
You can use Terracotta to implement such functionality, if you've got a Java stack.
Even if your function does not currently use the database, you could still solve the problem with a specific table for the purpose of synchronizing this function. The specifics would depend on your DB and how it handles isolation levels and locking. For example, with SQL Server you would set the transaction isolation to repeatable read, read a value from your locking row and update it inside a transaction. Don't commit the transaction until your function is done. You can also use explicit table locks in a transaction on most databases which might be simpler. This is probably the simplest solution given you are already using a database.
If you do not want to rely on the database for whatever reason you could write a simple service that would accept TCP connections from your client. Each client would request permission to run and would return a response when done. The server would be able to ensure only one client gets permission to run at a time. Dead clients would eventually drop the TCP connection and be detected as long as you have the correct keep alive setting.
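A very small version of such a permission service might look like the sketch below, using Python's standard socketserver. The line-based protocol (client sends its name, server answers GRANTED or BUSY plus the holder's name, and the permit is released when the client sends anything further or disconnects) is entirely made up for illustration, as are the PermitServer, PermitHandler and request_permit names.

```python
import socket
import socketserver
import threading


class PermitHandler(socketserver.StreamRequestHandler):
    def handle(self):
        server = self.server
        client_name = self.rfile.readline().decode().strip()
        with server.state_lock:
            current_holder = server.holder
            if current_holder is None:
                server.holder = client_name
        if current_holder is not None:
            # Someone else is running: say who, then hang up.
            self.wfile.write(f"BUSY {current_holder}\n".encode())
            return
        try:
            self.wfile.write(b"GRANTED\n")
            # Blocks until the client reports completion or its connection drops.
            self.rfile.readline()
        finally:
            with server.state_lock:
                server.holder = None


class PermitServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, PermitHandler)
        self.state_lock = threading.Lock()
        self.holder = None  # name of the client currently allowed to run


def request_permit(address, client_name):
    """Ask the server for permission; returns (socket, reply line)."""
    sock = socket.create_connection(address)
    sock.sendall(f"{client_name}\n".encode())
    with sock.makefile("r") as reader:
        reply = reader.readline().strip()
    return sock, reply
```

A client that receives GRANTED keeps its socket open while the long function runs and closes it (or sends a line) when done; a client that receives BUSY can show the holder's name to the user. Detecting clients that die without closing the connection still depends on TCP keep-alive settings, as noted above.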
The message queue solution suggested by Xepoch would also work. You could use something like MSMQ or Java Message Queue and have a single message that would act as a run token. All your clients would request the message and then repost it when done. You risk a deadlock if a client dies before reposting so you would need to devise some logic to detect this and it might get complicated.