Preventing duplicate database inserts/updates in our Rails app from simultaneous transactions - mysql

As our Rails application deals with increasing user activity and load, we're starting to see some issues with simultaneous transactions. We've used JavaScript to disable / remove the buttons after clicks, and this works for the most part, but isn't an ideal solution. In short, users are performing an action multiple times in rapid succession. Because the action results in a row insert into the DB, we can't just lock one row in the table. Given the high level of activity on the affected models, I can't use the usual locking mechanims ( http://guides.rubyonrails.org/active_record_querying.html#locking-records-for-update ) that you would use for an update.
This question ( Prevent simultaneous transactions in a web application ) addresses a similar issue, but it uses file locking (flock) to provide a solution, so this won't work with multiple application servers, as we have. We could do something similar I suppose with Redis or another data store that is available to all of our application servers, but I don't know if this really solves the problem fully either.
What is the best way to prevent duplicate database inserts from simultaneously executed transactions?

Try adding a unique index to the table where you are having the issue. It won't prevent the system from attempting to insert duplicate data, but it will prevent it from getting stored in the database. You will just need to handle the insert when it fails.

Related

Can I INSERT into table while UPDATING multiple different rows with MariaDB or MySQL?

I am creating a custom analytics system and currently in the database designing process. I'm planning to use MariaDB with the InnoDB engine to be able to handle big loads.
The data I'm expecting could be around 500k clicks/day. I will need to insert these rows into the database, which means that I'll have around 5.8 inserts/sec on average. However, at the same time, I want to record if someone visited a page associated with that click. (basically to record funnels)
So what I'm planning to do is to create additional columns and search for the ID of the specific row then update that column with the exact time of the visit.
My first question: is this generally a recommended approach to design the database like that? If not, how else is it worth to design the database?
My only concern is that while updating rows the Table will be locked, and can't do inserts, therefore slowing down the user experience.
My second question: is this something I should worry about, that the table gets locked while updating, and thus slowing down inserts? Does it hurt performance?
InnoDB doesn't lock the table for insert if you're performing the update. Your users won't experience any weird hanging.
It's an MVCC compliant engine, designed to handle concurrent access to underlying tables.
You can control the engine's behavior by choosing an appropriate isolation level, however the default (REPEATABLE READ) is excellent and does the job more than well.
If a table is being modified by multiple users (not users that connect to your site but connections established towards MySQL via a scripting language or some other service) and there's many inserts/updates/deletes - MySQL can throw an error saying a deadlock occurred.
A deadlock is a warning, not an error, that more than 1 thread tried to access an occupied resource (such as two threads tried to update the same row at the same time, but only 1 will be allowed to do so). It's an indication you should repeat the query.
I'm suggesting that you take care of all possible scenarios in the language of your choice when it comes to handling MySQL that's under heavier I/O.
~6 inserts a second isn't a lot, make sure you're allowing MySQL to access sufficient system resources. For InnoDB, check the value of innodb_buffer_pool_size or google a bit to see what it is and how to use it to make your database run fast.
Good luck!
At a mere 5.6/second, there won't be much problem.
I do, however, suggest vertical partitioning for "Likes", "Upvotes", "Clicks", and similar things. These tend to have a lot of UPDATEs of random single rows, and may interfere with other activity.
That is, have a separate table with (perhaps) just 2 columns:
The id of the item being Liked/Clicked/etc.
A counter.
It is simple enough (and fast enough) to JOIN via that id when you want to display info including the counter.
As already pointed out, the row is locked, not the table.

MySQL - Is it possible to run multiple synchronous inserts?

I googled and searched on SO, but was not able to find an answer; maybe you could point me to some reference/docs?
This is more about understanding the way MySQL treats table contents while inserting.
I have a table (Myisam) which has an auto-increment primary key 'autoid'. I am using a simple script to insert 1000s+ of records. What I am trying to do is running multiple instances of this script (you can image it similar to accessing the script from different machines at same time).
Is MySql capable of distributing the auto-increment primary keys accordingly without any further action from my side or do I have to do some sort of table locking for each machine? Maybe I have to choose InnoDb over MyIsam?
What I am trying to achieve is: irrespective of how many machines are simultaneously triggering the script, all inserts should be completed without skipping any auto-increment id or throwing errors like "Duplicate Value for...".
Thanks a lot
The whole point of using a database is that can handle situations like this transactionally. So yes, this scenario works fine on every commonly used DBMS system, including MySQL.
How do you think the average forum would work with 50 users simultaneously posting replies to a topic, all from forked parallel Apache processes so possible only microseconds apart, or from multiple loadbalanced webservers?
Internally it just uses a mutex/semaphore like any other process when accessing and incrementing the shared resource (the autoincrement value of a table in this case) to mitigate the inherent race conditions.

Insert/ update at the same time in a MySql table?

I have a MySql database hosted on a webserver which has a set of tables with data in it. I am distributing my front end application which is build using HTML5 / Javascript /CS3.
Now when multiple users tries to make an insert/update into one of the tables at the same time is it going to create a conflict or will it handle the locking of the table for me automatically example when one user is using, it will lock the table for him and then let the rest follow in a queue once the user finishes it will release the lock and then give it to the next in the queue ? Is this going to happen or do i need to handle the case in mysql database
EXAMPLE:
When a user wants to make an insert into the database he calls a php file located on a webserver which has an insert command to post data into the database. I am concerned if two or more people make an insert at the same time will it make the update.
mysqli_query($con,"INSERT INTO cfv_postbusupdate (BusNumber, Direction, StopNames, Status, comments, username, dayofweek, time) VALUES (".trim($busnum).", '".trim($direction3)."', '".trim($stopname3)."', '".$status."', '".$comments."', '".$username."', '".trim($dayofweek3)."', '".trim($btime3)."' )");
MySQL handles table locking automatically.
Note that with MyISAM engine, the entire table gets locked, and statements will block ("queue up") waiting for a lock to be released.
The InnoDB engine provides more concurrency, and can do row level locking, rather than locking the entire table.
There may be some cases where you want to take locks on multiple MyISAM tables, if you want to maintain referential integrity, for example, and you want to disallow other sessions from making changes to any of the tables while your session does its work. But, this really kills concurrency; this should be more of an "admin" type function, not really something a concurrent application should be doing.
If you are making use of transactions (InnoDB), the issue your application needs to deal with is the sequence in which rows in which tables are locked; it's possible for an application to experience "deadlock" exceptions, when MySQL detects that there are two (or more) transactions that can't proceed because each needs to obtain locks held by the other. The only thing MySQL can do is detect that, and the only recovery MySQL can do for this is to choose one of the transactions to be the victim, that's the transaction that will get the "deadlock" exception, because MySQL killed it, to allow at least one of the transactions to proceed.

Relational DB racing conditions

I'm working with Ruby On Rails (but it doesn't really matter) with a SQL backend, either MySQL or Postgres.
The web application will be multi-process, with a cluster of app-server processes running and working on the same DB.
I was wondering: is there any good and common strategy to handle racing conditions?
Since it's going to be a DB-intense application, I can easily see how two clients can try to modify the same data at the same time.
Let's simplify the situation:
Two clients/users GET the same data, it doesn't matter if this happens at the same time.
They are served with two web pages representing the same data.
Later both of them try to write some incompatible modifications to the same record.
Is there a simple way to handle this kind of situation?
I was thinking of using id-tokens associated with each record. This tokens would be changed upon updates of the records, thus invalidating any subsequent update attempt based on stale data (old expired token).
Is there a better way? Maybe something already built in MySQL?
I'm also interested in coding patterns used in this cases.
thanks
Optimistic locking
The standard way to handle this in webapps is to use what's referred to as "optimistic locking".
Each record has a unique ID and an integer (or timestamp, but integer is better) optimistic lock field. This oplock filed is initialized to 0 on record creation.
When you get the record you get the oplock field with it.
When you set the record you set the oplock value to the oplock you retrieved with the SELECT plus one and you make the UPDATE conditional on the oplock value still being what it was when you last looked:
UPDATE thetable
SET field1 = ...,
field2 = ...,
oplock = 1
WHERE record_id = ...
AND oplock = 0;
If you lost a race with another session this statement will still succeed but it will report zero rows affected. That allows you to tell the user their change collided with changes by another user or to merge their changes and re-send, depending on what makes sense in that part of the app.
Many frameworks provide tooling to help automate this, and most ORMs can do it out of the box. Ruby on Rails supports optimistic locking.
Be careful when combining optimistic locking with pessimistic locking (as described below) for traditional applications. It can work, you just need to add a trigger on all optimistically lockable tables that increments the oplock column on an UPDATE if the UPDATE statement didn't do so its self. I wrote a PostgreSQL trigger for Hibernate oplock support that should be readily adaptable to Rails. You only need this if you're going to update the DB from outside Rails, but in my view it's always a good idea to be safe.
Pessimistic locking
The more traditional approach to this is to begin a transaction and do a SELECT ... FOR UPDATE when fetching a record you intend to modify. You then hold the transaction open and idle while the user ponders what they're going to do and issue the UPDATE on the already-locked record before COMMITting.
This doesn't work well and I don't recommend it. It requires an open, often idle transaction for each user. This can cause problems with MVCC row cleanup in PostgreSQL and can cause locking problems in applications. It's also very inefficient for large applications with high user counts.
Insert races
Dealing with races on INSERT requires you to have a suitable application level unique key on the table, so inserts fail when they conflict.

MySQL table locking for a multi user JSP/Servlets site

Hi I am developing a site with JSP/Servlets running on Tomcat for the front-end and with a MySql db for the backend which is accessed through JDBC.
Many users of the site can access and write to the database at the same time ,my question is :
Do i need to explicitly take locks before each write/read access to the db in my code?
OR Does Tomcat handle this for me?
Also do you have any suggestions on how best to implement this ? I have written a significant amount of JDBC code already without taking the locks :/
I think you are thinking about transactions when you say "locks". At the lowest level, your database server already ensure that parallel read writes won't corrupt your tables.
But if you want to ensure consistency across tables, you need to employ transactions. Simply put, what transactions provide you is an all-or-nothing guarantee. That is, if you want to insert a Order in one table and related OrderItems in another table, what you need is an assurance that if insertion of OrderItems fails (in step 2), the changes made to Order tables (step 1) will also get rolled back. This way you'll never end up in a situation where an row in Order table have no associated rows in Order items.
This, off-course, is a very simplified representation of what a transaction is. You should read more about it if you are serious about database programming.
In java, you usually do transactions by roughly with following steps:
Set autocommit to false on your jdbc connection
Do several insert and/or updates using the same connection
Call conn.commit() when all the insert/updates that goes together are done
If there is a problem somewhere during step 2, call conn.rollback()