I'm currently using Peewee as the ORM in my project. I have several processes, each with its own database connection, and all of them need to access a certain table simultaneously. I'm looking for a way to coordinate them without a central controller: when a row is read by one process, it should be locked so that no other process can read it, and blocked processes should move on to other, non-locked rows.
I searched around and found that MySQL already has an internal locking mechanism, described here, and apparently it must be used on indexed columns to behave as expected (from here). But I couldn't find anything related in the Peewee documentation. Is there an extension providing this feature, or should I write raw SQL queries containing a FOR UPDATE clause?
Peewee supports using the FOR UPDATE clause, and I think this is probably what you want. It won't prevent other clients from reading, but it will prevent modifications for as long as the transaction holding the lock is open.
Ex:
with db.atomic():
    # Lock the note's row (SELECT ... FOR UPDATE).
    note = Note.select().where(Note.id == 123).for_update().get()
    # As long as the lock is held, no other client can modify the note.
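Under the hood this issues something like the following (assuming the model maps to a table named note; the exact column list Peewee generates will differ):

SELECT * FROM note WHERE id = 123 FOR UPDATE;

The lock is released when the atomic() block commits or rolls back.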
I have a quick question that I can't seem to find an answer to online; I'm not sure I'm using the right wording.
Do MySQL databases automatically synchronize queries coming in at around the same time? For example, if I send a query to insert something into a database at the same time that another connection sends a query to select something from it, does MySQL automatically lock the database while the insert is happening and then unlock it when it's done, allowing the select query to run?
Thanks
Do MySql databases automatically synchronize queries coming in at around the same time?
Yes.
Think of it this way: there's no such thing as simultaneous queries. MySQL always carries out one of them first, then the second one. (This isn't exactly true; the server is far more complex than that. But it robustly provides the illusion of sequential queries to us users.)
If, from one connection you issue a single INSERT query or a single UPDATE query, and from another connection you issue a SELECT, your SELECT will get consistent results. Those results will reflect the state of data either before or after the change, depending on which query went first.
You can even do stuff like this (read-modify-write operations) and maintain consistency.
UPDATE thetable
SET update_count = update_count + 1,
    update_time = NOW()
WHERE id = something;
If you must do several INSERT or UPDATE operations as if they were one, you'll need to use the InnoDB engine and wrap them in a transaction. Plain SELECTs from other connections aren't blocked while the transaction is in progress; they simply see none of its changes until it commits, and then all of them at once. Teaching you everything about transactions is beyond the scope of a Stack Overflow answer.
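Just to give a flavor of the shape of it, here is a minimal sketch in plain SQL (the accounts table and the amounts are invented for illustration):

START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;   -- or ROLLBACK; if something went wrong

Other connections will see either both balance changes or neither, never just one.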
The key to understanding how a modern database engine like InnoDB works is Multi-Version Concurrency Control or MVCC. This is how simultaneous operations can run in parallel and then get reconciled into a consistent "view" of the database when fully committed.
If you've ever used Git you know how you can have several updates to the same base happening in parallel but so long as they can all cleanly merge together there's no conflict. The database works like that as well, where you can begin a transaction, apply a bunch of operations, and commit it. Should those apply without conflict the commit is successful. If there's trouble the transaction is rolled back as if it never happened.
This ability to juggle multiple operations simultaneously is what makes a transaction-capable database engine really powerful. It's an important component necessary to meet the ACID standard.
MyISAM, MySQL's older default engine, doesn't have any of these features; it takes a lock on the whole table for any write operation to avoid conflicts. It works the way you thought the database did.
When creating a table in MySQL you can choose the storage engine, but InnoDB should be your default. There's really no reason to use MyISAM any more, as the interesting features it had (e.g. full-text indexes) have since been added to InnoDB.
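The engine is chosen per table; for example (a sketch, with an invented table):

CREATE TABLE notes (
    id INT AUTO_INCREMENT PRIMARY KEY,
    body TEXT
) ENGINE=InnoDB;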
I'm working with Ruby On Rails (but it doesn't really matter) with a SQL backend, either MySQL or Postgres.
The web application will be multi-process, with a cluster of app-server processes running and working on the same DB.
I was wondering: is there any good and common strategy for handling race conditions?
Since it's going to be a DB-intense application, I can easily see how two clients can try to modify the same data at the same time.
Let's simplify the situation:
Two clients/users GET the same data; it doesn't matter whether this happens at the same time.
They are served with two web pages representing the same data.
Later both of them try to write some incompatible modifications to the same record.
Is there a simple way to handle this kind of situation?
I was thinking of using id-tokens associated with each record. These tokens would be changed upon each update of a record, thus invalidating any subsequent update attempt based on stale data (an old, expired token).
Is there a better way? Maybe something already built into MySQL?
I'm also interested in the coding patterns used in these cases.
thanks
Optimistic locking
The standard way to handle this in webapps is to use what's referred to as "optimistic locking".
Each record has a unique ID and an integer (or timestamp, but integer is better) optimistic lock field. This oplock field is initialized to 0 on record creation.
When you read the record you fetch the oplock field along with it.
When you write the record back, you set the oplock to the value you retrieved with the SELECT plus one, and you make the UPDATE conditional on the oplock still being what it was when you last looked:
UPDATE thetable
SET field1 = ...,
field2 = ...,
oplock = 1
WHERE record_id = ...
AND oplock = 0;
If you lost a race with another session this statement will still succeed but it will report zero rows affected. That allows you to tell the user their change collided with changes by another user or to merge their changes and re-send, depending on what makes sense in that part of the app.
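In MySQL you can read that count from SQL itself on the same connection, right after the UPDATE (a small aside; most client libraries and ORMs expose the affected-row count directly, so you'd rarely issue this by hand):

SELECT ROW_COUNT();   -- 1: our UPDATE won, 0: another session changed the row first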
Many frameworks provide tooling to help automate this, and most ORMs can do it out of the box. Ruby on Rails supports optimistic locking.
Be careful when combining optimistic locking with pessimistic locking (as described below) for traditional applications. It can work; you just need to add a trigger on all optimistically lockable tables that increments the oplock column on an UPDATE if the UPDATE statement didn't do so itself. I wrote a PostgreSQL trigger for Hibernate oplock support that should be readily adaptable to Rails. You only need this if you're going to update the DB from outside Rails, but in my view it's always a good idea to be safe.
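Such a trigger looks roughly like this in PostgreSQL (a sketch reusing the thetable/oplock names from above, not the author's original):

CREATE OR REPLACE FUNCTION bump_oplock() RETURNS trigger AS $$
BEGIN
    -- Only bump the oplock if the UPDATE statement didn't change it itself.
    IF NEW.oplock = OLD.oplock THEN
        NEW.oplock := OLD.oplock + 1;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER thetable_bump_oplock
    BEFORE UPDATE ON thetable
    FOR EACH ROW EXECUTE PROCEDURE bump_oplock();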
Pessimistic locking
The more traditional approach to this is to begin a transaction and do a SELECT ... FOR UPDATE when fetching a record you intend to modify. You then hold the transaction open and idle while the user ponders what they're going to do and issue the UPDATE on the already-locked record before COMMITting.
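In SQL terms it looks something like this (same illustrative table and columns as above):

BEGIN;
SELECT * FROM thetable WHERE record_id = 1 FOR UPDATE;   -- row is now locked
-- ... user edits the form, possibly for minutes ...
UPDATE thetable SET field1 = 'new value' WHERE record_id = 1;
COMMIT;   -- lock released here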
This doesn't work well and I don't recommend it. It requires an open, often idle transaction for each user. This can cause problems with MVCC row cleanup in PostgreSQL and can cause locking problems in applications. It's also very inefficient for large applications with high user counts.
Insert races
Dealing with races on INSERT requires you to have a suitable application-level unique key on the table, so conflicting inserts fail instead of creating duplicates.
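For example (a sketch with an invented users table; the syntax below works in both PostgreSQL and MySQL):

CREATE UNIQUE INDEX users_email_key ON users (email);

INSERT INTO users (email, name) VALUES ('a@example.com', 'Alice');
INSERT INTO users (email, name) VALUES ('a@example.com', 'Bob');   -- fails with a unique-violation error

If silently skipping the duplicate is acceptable, PostgreSQL also offers INSERT ... ON CONFLICT DO NOTHING, and MySQL offers INSERT IGNORE.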
As our Rails application deals with increasing user activity and load, we're starting to see some issues with simultaneous transactions. We've used JavaScript to disable/remove the buttons after clicks, and this works for the most part, but isn't an ideal solution. In short, users are performing an action multiple times in rapid succession. Because the action results in a row insert into the DB, we can't just lock one row in the table. Given the high level of activity on the affected models, I can't use the usual locking mechanisms ( http://guides.rubyonrails.org/active_record_querying.html#locking-records-for-update ) that you would use for an update.
This question ( Prevent simultaneous transactions in a web application ) addresses a similar issue, but it uses file locking (flock) to provide a solution, so this won't work with multiple application servers, as we have. We could do something similar I suppose with Redis or another data store that is available to all of our application servers, but I don't know if this really solves the problem fully either.
What is the best way to prevent duplicate database inserts from simultaneously executed transactions?
Try adding a unique index to the table where you are having the issue. It won't prevent the system from attempting to insert duplicate data, but it will prevent it from getting stored in the database. You will just need to handle the insert when it fails.
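In a Rails migration this is add_index with unique: true; the SQL it produces is roughly the following (the table, columns, and index name here are invented for illustration):

ALTER TABLE orders
    ADD UNIQUE INDEX index_orders_on_user_id_and_request_token (user_id, request_token);

When two simultaneous transactions try to insert the same pair, the second one fails with a duplicate-key error (ActiveRecord raises ActiveRecord::RecordNotUnique), which the application can rescue and treat as "already done".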
Hi, I am developing a site with JSP/Servlets running on Tomcat for the front end and a MySQL DB for the backend, accessed through JDBC.
Many users of the site can access and write to the database at the same time. My question is:
Do I need to explicitly take locks before each read/write access to the DB in my code?
Or does Tomcat handle this for me?
Also, do you have any suggestions on how best to implement this? I have written a significant amount of JDBC code already without taking any locks :/
I think you are thinking about transactions when you say "locks". At the lowest level, your database server already ensures that parallel reads and writes won't corrupt your tables.
But if you want to ensure consistency across tables, you need to employ transactions. Simply put, what transactions give you is an all-or-nothing guarantee. That is, if you want to insert an Order in one table and related OrderItems in another table, you need an assurance that if the insertion of OrderItems fails (in step 2), the changes made to the Order table (step 1) will also get rolled back. This way you'll never end up in a situation where a row in the Order table has no associated rows in OrderItems.
This, of course, is a very simplified description of what a transaction is. You should read more about it if you are serious about database programming.
In Java, you usually do transactions with roughly the following steps (sketched in SQL terms after the list):
Set autocommit to false on your JDBC connection
Do several inserts and/or updates using the same connection
Call conn.commit() when all the inserts/updates that go together are done
If there is a problem somewhere during step 2, call conn.rollback()
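The SQL those calls boil down to looks roughly like this (the table names follow the Order/OrderItems example above; the columns are invented):

SET autocommit = 0;                                           -- conn.setAutoCommit(false)
INSERT INTO Orders (id, customer_id) VALUES (42, 7);
INSERT INTO OrderItems (order_id, product_id, qty) VALUES (42, 1001, 2);
COMMIT;                                                       -- conn.commit()
-- on any failure in between: ROLLBACK;                       -- conn.rollback()

Until the COMMIT, other connections see neither of the two inserts; after a ROLLBACK, it's as if neither ever happened.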
I have an ASP.NET website that stores events in a database table. I also have a Windows service app that reads those events and performs the appropriate actions. Currently it's possible for the two processes to insert and remove records from the same table at the same time.
What is a better pattern for developing such a system, so as to ensure the two are never working on the same table simultaneously?
I'm not sure about a pattern, but I'd build a WCF service and let both processes use it to access the data, then share a common lock object between all methods that alter (or read) the table contents.
For this scenario I use a pattern that ensures the data cannot be updated concurrently.
I always add a special column to the table, usually 'LastModified' of type 'timestamp'. When inserting or updating a row I always set this column.
When I come to update a record, I make sure that the stored procedure checks the value I am passing in against the one stored in the database. If they differ, another user or process has altered the row, and I raise a concurrency error.
This can be propagated up to the calling process or handled in your service.
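A T-SQL-flavoured sketch of that check (the table, columns, and procedure name are invented; it uses an application-set datetime column, though a rowversion column that SQL Server maintains automatically works the same way for the comparison):

CREATE PROCEDURE dbo.UpdateEvent
    @Id INT,
    @Payload NVARCHAR(MAX),
    @OriginalLastModified DATETIME2
AS
BEGIN
    -- Only update if the row still carries the LastModified value the caller read earlier.
    UPDATE dbo.Events
    SET Payload = @Payload,
        LastModified = SYSUTCDATETIME()
    WHERE Id = @Id
      AND LastModified = @OriginalLastModified;

    IF @@ROWCOUNT = 0
        RAISERROR('Concurrency conflict: the row was changed by another process.', 16, 1);
END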
This could be an architecture problem more than anything else.
Why would you need two processes that delete records?
You generally don't need two different processes to CRUD data in the same tables. One thing you can do is wrap the database/tables with a service, then let all processes that need to work with the data use that service. The service can then take care of serializing the calls. Either way, there will be only one process working with the DB directly.
Additionally, it sounds to me like you're in an event-sourcing type of architecture, which makes me wonder why you'd need to delete records in the first place...