How big is the performance difference between doing a single update to a single row and doing multiple updates within a single transaction on MySQL?
I'm asking because, AFAIK, the database spends most of its time committing the transaction rather than "just updating" the row itself. Is this correct?
Just to clarify: I have a bunch of routines that each alter a row and save it. Sometimes I call several of them sequentially, but each one flushes its result to the database (i.e. updates the row) before the next one is called.
What is the best approach in this case? Use a transaction, or tell the routines not to save and then do a single update at the end?
Related
On a website, when a user posts a comment I do several queries, Inserts and Updates. (On MariaDB 10.1.29)
I use START TRANSACTION so if any query fails at any given point I can easily do a rollback and delete all changes.
Now I've noticed that this blocks INSERTs from other connections, and I don't mean only while the query is running (that's obvious), but until the transaction is closed.
DELETE is only blocked if the rows share a common index key (comments for the same page), and luckily UPDATE is not blocked.
Can I do any Transaction that does not lock the table from new inserts (while the transaction is ongoing, not the actual query), or any other method that lets me conveniently "undo" any query done after some point?
PS:
I start the transaction with PHP's mysqli_begin_transaction() function without any of the flags, and then call mysqli_commit().
I don't think that a simple INSERT would block other inserts for longer than the insert time. AUTO_INC locks are not held for the full transaction time.
But if two transactions try to UPDATE the same row like in the following statement (two replies to the same comment)
UPDATE comment SET replies=replies+1 WHERE com_id = ?
the second one will have to wait until the first one is committed. You need that lock to keep the count (replies) consistent.
I think all you can do is keep the transaction as short as possible. For example, you can prepare all statements before you start the transaction, but that is only a matter of milliseconds. If you transfer files, which can take 40 seconds, you shouldn't do that while the database transaction is open. Transfer the files before you start the transaction and save them under a name that indicates that the operation is not complete. You can also save them in a different folder, as long as it is on the same partition. Then, when you run the transaction, you only need to rename the files, which should not take much time. From time to time you can clean up and remove files that were never renamed.
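The file-handling advice above could be sketched roughly like this in Python. This is only a minimal illustration: the ".partial" suffix and the function names are made up for the example, and the database calls themselves are left out.

```python
import os
import time
import uuid

PARTIAL_SUFFIX = ".partial"  # hypothetical marker for "upload not yet published"


def stage_upload(data, directory):
    """Write the file under a temporary name. Do this BEFORE opening the transaction."""
    staged_path = os.path.join(directory, uuid.uuid4().hex + PARTIAL_SUFFIX)
    with open(staged_path, "wb") as handle:
        handle.write(data)
    return staged_path


def publish(staged_path, final_path):
    """Rename the staged file. Cheap enough to do while the transaction is open."""
    # Both paths are assumed to be on the same partition, so no data is copied.
    os.replace(staged_path, final_path)


def remove_stale(directory, max_age_seconds):
    """Delete staged files that were never renamed and are older than the cutoff."""
    cutoff = time.time() - max_age_seconds
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name.endswith(PARTIAL_SUFFIX) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

The slow part (stage_upload) runs with no transaction open; only publish needs to happen between START TRANSACTION and COMMIT, and remove_stale can run from a periodic job.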
All write operations work in similar ways: they lock the rows that they touch (or might touch) from the time the statement is executed until the transaction is closed via either COMMIT or ROLLBACK. SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE also take such locks.
When a write operation occurs, deadlock checking is done.
In some situations, there is "gap" locking. Did com_id happen to be the last id in the table?
Did you leave out any SELECTs that needed FOR UPDATE?
I have a MySQL table that keeps gaining new records every 5 seconds.
The questions are:
Can I run a query on this set of data that may take more than 5 seconds?
If a SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
What happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
I'll go over your questions and some of the comments you added later.
Can I run a query on this set of data that may take more than 5 seconds?
Can you? Yes. Should you? It depends. In a MySQL configuration I set up, any query taking longer than 3 seconds was considered slow and logged accordingly. In addition, you need to keep in mind the frequency of the queries you intend to run.
For example, if you try to run a 10 second query every 3 seconds, you can probably see how things won't end well. If you run a 10 second query every few hours or so, then it becomes more tolerable for the system.
That being said, slow queries can often benefit from optimizations, such as not scanning the entire table (e.g. searching by primary key), and from using the EXPLAIN keyword to get the database's query planner to tell you how it intends to execute the query internally (e.g. is it using PKs, FKs, or indices, or is it scanning all table rows?).
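As a rough illustration of the EXPLAIN suggestion, here is a small Python sketch. It assumes a DB-API style cursor that returns rows as tuples (as the common MySQL drivers do by default); the function name is made up for this example.

```python
def find_full_scans(cursor, query, params=()):
    """Run EXPLAIN on a SELECT and return the tables MySQL plans to scan in full."""
    cursor.execute("EXPLAIN " + query, params)
    # Map each output row to a dict keyed by the EXPLAIN column names.
    columns = [col[0] for col in cursor.description]
    full_scans = []
    for row in cursor.fetchall():
        plan = dict(zip(columns, row))
        # In MySQL's EXPLAIN output, type = ALL means every row of the table is read.
        if plan.get("type") == "ALL":
            full_scans.append(plan.get("table"))
    return full_scans
```

If the function returns a non-empty list for a query you run often, that is usually a hint to look at the WHERE clause and the available indexes.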
If a SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
"Affect" in what way? If you mean "prevent insert from actually inserting until the select has completed", that depends on the storage engine. For example, MyISAM and InnoDB are different, and that includes locking policies. For example, MyISAM tends to lock entire tables while InnoDB tends to lock specific rows. InnoDB is also ACID-compliant, which means it can provide certain integrity guarantees. You should read the docs on this for more details.
What happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
Part of "what happens" is determined by how the specific storage engine behaves. Regardless of what happens, the database is designed to answer application queries in a way that's consistent.
As an example, if the select statement were to lock an entire table, then the insert statement would have to wait until the select has completed and the lock has been released, meaning that the select would return the data as it was before the insert.
I understand that locking database can prevent messing up the SELECT statement.
It can also create a potentially unacceptable performance bottleneck, especially if, as you say, the system is inserting lots of rows every 5 seconds. How bad it gets also depends on how often you run your queries and how efficiently they have been built.
what is the good practice to do when I need the data for calculations while those data will be updated within short period?
My recommendation is to simply accept the fact that the calculations are based on a snapshot of the data at the specific point in time the calculation was requested and to let the database do its job of ensuring the consistency and integrity of said data. When the app requests data, it should trust that the database has done its best to provide the most up-to-date piece of consistent information (i.e. not providing a row where some columns have been updated, but others yet haven't).
With new rows coming in at the frequency you mentioned, reasonable users will understand that the results they're seeing are based on data available at the time of request.
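If one calculation needs several queries, one way to keep them all on the same snapshot might look like the sketch below. It assumes InnoDB with the default REPEATABLE READ isolation level and a DB-API style connection; the readings table and its value column are hypothetical.

```python
def totals_from_one_snapshot(conn):
    """Run two aggregates that both see the same point-in-time view of the data."""
    cursor = conn.cursor()
    # InnoDB takes the snapshot here. Rows inserted afterwards are not visible
    # to the SELECTs below (assuming the REPEATABLE READ isolation level).
    cursor.execute("START TRANSACTION WITH CONSISTENT SNAPSHOT")
    try:
        cursor.execute("SELECT COUNT(*) FROM readings")    # hypothetical table
        row_count = cursor.fetchone()[0]
        cursor.execute("SELECT SUM(value) FROM readings")  # hypothetical column
        total = cursor.fetchone()[0]
    finally:
        conn.commit()  # ends the read-only transaction and releases the snapshot
    return row_count, total
```

Plain consistent reads like these do not take row locks in InnoDB, so the scheduled INSERTs should keep running while the calculation is in progress.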
All of your questions are related to table locking.
The answers to all of them depend on how the database is configured.
Read : http://www.mysqltutorial.org/mysql-table-locking/
Performing a SELECT statement while an INSERT statement is running
If you want to run a SELECT statement while a long INSERT is in progress, open a new connection for each check and close it afterwards. For example, if I am inserting lots of records and want to find out, with a SELECT query, whether the last record has been inserted, I have to open and close the connection on every iteration of a for or while loop.
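# Pseudocode sketch; connect() and run_select() are placeholders.
# 1. Send the request that stores the data (the INSERT takes a long time).
# 2. Meanwhile, poll with a SELECT, using a fresh connection for each attempt.
while True:
    cnx = connect()               # open a new connection
    row = run_select(cnx)         # run the SELECT statement
    cnx.close()                   # close the connection again
    if row is not None:           # stop once the result has arrived
        break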
When doing a transaction in a MySQL database, I've read that the ongoing transaction cannot see any updates made by external sources until it commits. Does this mean that changes CAN be made but the transaction just will not see them, or is it actually impossible to update the database while the transaction is going on?
I ask because I need it to be impossible for other queries to change anything in certain tables while the transaction is running. Right now I write-lock all those tables, start a transaction for atomicity, commit, and then unlock. Is this the way to do it?
From my testing it seems that setting the isolation level to SERIALIZABLE accomplishes the same as manual table locking and unlocking? Is this correct?
It's going to depend on the transaction isolation level you have set on your database. You can read more about the levels here. For example, for READ UNCOMMITTED, you can actually read rows that are uncommitted by another transaction. This is usually not what you want to happen.
Locking an entire table is a really extreme choice though, and should probably not be done unless there's no other choice. My recommendation would be to consider the rows you need to lock, and then you can lock those specific rows using a select for update statement.
For example, suppose you have a resources table and a schedules table that contains bookings for those resources. When booking a resource, you have to check the schedules table to make sure the resource is available for the desired time. However, you have to do this in a concurrency-safe way: between the time you check the schedules table for availability and the time you actually insert the row into the schedules table, no other transaction may book the resource for the same time (or an overlapping time).
You can accomplish this by using a select for update command:
select * from resources where resource_name='a' for update;
Assuming you're doing this in a stored procedure, if some other code fires the stored procedure for the same resource, it will block on that statement. This will ensure that resources don't get double booked.
We could also accomplish this by locking the entire resources table. However, there's no need to do that since we're only interested in booking a single resource. So it's good enough to just lock the resource row we care about.
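Note that for MySQL (InnoDB), you need an index on the columns used in the WHERE clause of the FOR UPDATE query. Without one, the query has to scan the whole table and locks every row it scans, which in effect locks the entire table.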
The point to all this is to always consider maximum concurrency. In other words, don't lock more than you need to. Otherwise, you make the application much less scalable and you inhibit concurrency.
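Put together, the booking flow described above might look roughly like this from application code. It is a sketch only: the resources and schedules tables come from the example, but the column names (resource_id, start_time, end_time) are assumptions, and %s is the placeholder style used by the common Python MySQL drivers.

```python
def book_resource(conn, resource_name, start, end):
    """Book a resource if the slot is free; return True on success."""
    cursor = conn.cursor()
    try:
        cursor.execute("START TRANSACTION")
        # Lock only this resource's row. A concurrent booking for the same
        # resource blocks here until we commit or roll back.
        cursor.execute(
            "SELECT resource_id FROM resources WHERE resource_name = %s FOR UPDATE",
            (resource_name,),
        )
        row = cursor.fetchone()
        if row is None:
            conn.rollback()
            return False
        resource_id = row[0]
        # Two intervals overlap when each one starts before the other ends.
        cursor.execute(
            "SELECT COUNT(*) FROM schedules "
            "WHERE resource_id = %s AND start_time < %s AND end_time > %s",
            (resource_id, end, start),
        )
        if cursor.fetchone()[0] > 0:
            conn.rollback()
            return False
        cursor.execute(
            "INSERT INTO schedules (resource_id, start_time, end_time) "
            "VALUES (%s, %s, %s)",
            (resource_id, start, end),
        )
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # never leave the row lock held after an error
        raise
```

Bookings for other resources never touch the locked row, so they proceed in parallel.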
I'm currently building a system that does running computations, and every 5 seconds inserts or updates information based on those computations to a few rows in MySQL. I'm working on running this system on a few different servers at once right now with a few agents that are each doing similar processing and then writing on the same set of rows. I already randomize the order in which each agent writes its set of rows, but there's still a lot of deadlock happening. What's the best/fastest way to get through those deadlocks? Should I just rerun the query each time one happens, or do row locks, or something else entirely?
I suggest you try something that won't require more than one client to update your 'few rows.'
For example, you could have each agent that produces results do an INSERT into a staging table that uses the MEMORY storage engine.
Then, every five seconds you can run a MySQL event (a task scheduled within the server, much like a stored procedure) that loops through all the rows in that table, posting their results to your 'few rows' and then deleting them. If it's important for the rows in your staging table to be processed in order, then you can use an AUTO_INCREMENT id field. But it might not be important for them to be in order.
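To make the idea concrete, here is a sketch of the same loop driven from a single Python worker instead of a MySQL event (that swap is mine, only to keep the example in one language). The staging_results and results tables and their columns are hypothetical.

```python
def record_result(conn, target_id, value):
    """Called by each agent: append to the staging table instead of updating shared rows."""
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO staging_results (target_id, value) VALUES (%s, %s)",
        (target_id, value),
    )
    conn.commit()


def drain_staging(conn):
    """Called by ONE worker every few seconds: apply staged rows in id order, then delete them."""
    cursor = conn.cursor()
    cursor.execute("SELECT id, target_id, value FROM staging_results ORDER BY id")
    staged = cursor.fetchall()
    for _id, target_id, value in staged:
        cursor.execute(
            "UPDATE results SET value = %s WHERE target_id = %s",
            (value, target_id),
        )
    if staged:
        # Delete only what was read, so rows that arrived in the meantime survive.
        cursor.execute("DELETE FROM staging_results WHERE id <= %s", (staged[-1][0],))
    conn.commit()
    return len(staged)
```

Because only the one worker ever updates the shared rows, the agents no longer compete for the same row locks.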
If you want to get fancier and more scalable than that, you'll need a queue management system like Apache ActiveMQ.
I am part of the coding team of a game that handles a high volume of requests.
We've experienced some problems lately whereby multiple requests can be sent in at the exact same time and end up causing duplicate actions (which could not happen if the requests ran strictly one after another).
The problematic routine reads a row in an InnoDB table and, if the row is present, continues until all other checks have passed, at which point it completes and deletes the row.
What appears to be happening is that the reads hit the row simultaneously (despite the row-level locking) and each request continues down the routine path, by which point the deletes make no difference. As a result, players smart enough to try their luck can get the routine to run more than once.
Does anyone have any suggestions for a way to approach fixing this?
Example routine.
// check database row exists (create the initial lock)
// proceed
// check quantity in the row
// if all is okay (few other checks needed here)
// delete the row
// release the lock either way (for the next request to go through)
MySQL has a couple of different lock modes:
http://dev.mysql.com/doc/refman/5.6/en/innodb-lock-modes.html
I think you'll want to enforce an exclusive lock when executing an update/delete. This way the subsequent requests will wait until the lock is released and the appropriate action has completed.
You may also want to examine the indexes being used for these concurrent queries. An appropriate indexing regime will minimize the number of rows that need to be locked during a given query.
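One way to apply the exclusive-lock suggestion to the routine from the question is sketched below. It assumes a DB-API style connection; the pending_actions table, its columns, and the quantity check are hypothetical stand-ins for the real checks.

```python
def consume_action(conn, action_id, required_quantity):
    """Run the routine at most once per row; return True only for the request that wins."""
    cursor = conn.cursor()
    try:
        cursor.execute("START TRANSACTION")
        # Exclusive row lock: a simultaneous request for the same row waits here
        # and, once the DELETE below is committed, finds no row and gives up.
        cursor.execute(
            "SELECT quantity FROM pending_actions WHERE id = %s FOR UPDATE",
            (action_id,),
        )
        row = cursor.fetchone()
        if row is None or row[0] < required_quantity:
            conn.rollback()  # releases the lock without changing anything
            return False
        # ... the other checks and the actual game action would go here ...
        cursor.execute("DELETE FROM pending_actions WHERE id = %s", (action_id,))
        conn.commit()  # releases the lock for the next request
        return True
    except Exception:
        conn.rollback()
        raise
```

A plain SELECT would not block here, which is probably why the simultaneous reads in the question all got through; the FOR UPDATE is what makes the second request wait.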