Does this suffice, or do I have race conditions? - mysql

I'm writing a strategy-type multi-user game for the web. It has a playfield (X by Y squares) that I plan to serialize and store in a BLOB in a MySQL (InnoDB) database, one row for each ongoing game.
I'm now trying to figure out a good way of keeping the database updated with any changes to the playfield, and at the same time to find a convenient way of handling things that happen to the playfield in the time frame between loading the page and actually making a move.
I don't use AJAX.
There will be at most 20 players in each game, each player making between 1 and 10 moves in 24 hours, so it is a "slow" game.
My plan (so far) is to also store a kind of checksum for the playfield next to the BLOB, and to compare the database's state with the state loaded before trying to make changes to the playfield.
What I worry about is how to prevent race conditions.
Is it enough to:
Begin transaction.
load playfield from table
if checksum differs - rollback and update the user's view
if checksum unchanged - update table and commit changes
Is the BEGIN TRANSACTION enough to block the race, or do I need to do something more in step 2 to show my intent to update the table?
Thankful for all advice.

If you use SELECT ... FOR UPDATE when you load the playfield from the database, it will block other locking reads and updates of that row until you commit or roll back the transaction (plain consistent reads are not blocked in InnoDB, but any competing move that also uses FOR UPDATE will queue up).
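A minimal sketch of that pattern, assuming a hypothetical games table with game_id, playfield, and checksum columns (the names are illustrative, not from the question):

BEGIN;
-- lock this game's row so concurrent moves queue up behind us
SELECT playfield, checksum FROM games WHERE game_id = 42 FOR UPDATE;
-- compare checksum against the value loaded when the page was rendered;
-- if it differs: ROLLBACK; and refresh the user's view
UPDATE games SET playfield = ?, checksum = ? WHERE game_id = 42;  -- values bound by the application
COMMIT;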

No. You will need to issue a LOCK TABLES command for the tables you need to protect against conflicting updates. This would look something like...
LOCK TABLE my_table WRITE;
More details may be found here... http://dev.mysql.com/doc/refman/5.1/en/lock-tables.html
Don't forget to UNLOCK them afterwards!
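A sketch of the full cycle, reusing the my_table name from above (the columns are illustrative assumptions):

LOCK TABLES my_table WRITE;
-- read the playfield, verify the checksum, apply the move
SELECT playfield, checksum FROM my_table WHERE game_id = 42;
UPDATE my_table SET playfield = ?, checksum = ? WHERE game_id = 42;
UNLOCK TABLES;

Note that LOCK TABLES implicitly commits any active transaction, so this is an alternative to, not a companion of, the transactional approach above.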

Related

Updating a row at the same time

Imagine I have a table with two columns that hold the points of two different teams, like the one below:
TABLE:
first_team
second_team
first_team_points
second_team_points
The table relates the two teams and the points they earn when they finish a level during a period of time.
The two teams can play a level whenever they want and update their points when they finish, so it is possible that they update their own points columns at the same time; for example, team A updates first_team_points while team B updates second_team_points.
I know that InnoDB has row-level locking, so I suppose that in that case the two updates will be executed in sequential order.
Am I wrong? Do I need to configure something? Will the second update request cause a deadlock?
Thanks in advance!
Please provide the code for critique. Meanwhile, in general...
BEGIN; -- start the transaction
SELECT ... FOR UPDATE; -- if you need to look at something before updating
...
INSERT/UPDATE/etc -- make whatever changes
COMMIT;
There are several issues:
You want data integrity; transactions help a lot.
You want to avoid deadlocks -- without further details, I cannot assure that all deadlocks will be prevented. Be ready to re-do the transaction if you do get a deadlock.
One connection could get a "lock_wait_timeout". Think of this as a deadlock that can be resolved by having one of the contenders wait. But if the other connection takes too long, you could time out. This is usually avoidable by making things run faster. (50 seconds is the default wait time; it is rare to hit it.)
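To illustrate the question's scenario, here is a sketch with an assumed table name (matches); two sessions updating the same row serialize on the row lock rather than deadlocking:

-- session A
BEGIN;
UPDATE matches SET first_team_points = first_team_points + 10
 WHERE first_team = 'A' AND second_team = 'B';
COMMIT;

-- session B, issued at the same time: it simply blocks on the row lock
-- until session A commits, then applies its own change on top
BEGIN;
UPDATE matches SET second_team_points = second_team_points + 5
 WHERE first_team = 'A' AND second_team = 'B';
COMMIT;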

Do transactions prevent other updates for a while, or just hide them?

When doing a transaction in a MySQL DB, the documentation talks about the ongoing transaction not being able to see any updates made by external sources until it commits. So does this mean that changes CAN be made but the transaction just will not be able to see them, or is it actually impossible to update the DB while the transaction is going on?
Because I need it to be impossible for other queries to change anything about certain tables while the transaction is going. Right now I write-lock all those tables, start a transaction for the atomicity, commit, and then unlock. Is this the way to do this?
From my testing it seems that setting the isolation level to SERIALIZABLE accomplishes the same as manual table locking and unlocking? Is this correct?
It's going to depend on the transaction isolation level you have set on your database; you can read more about the levels in the MySQL documentation. For example, with READ UNCOMMITTED you can actually read rows that are uncommitted by another transaction. This is usually not what you want to happen.
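For reference, the level can be set per session or for just the next transaction (standard MySQL syntax):

-- applies to every subsequent transaction in this session
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- or: applies only to the next transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;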
Locking an entire table is an extreme measure, though, and should probably not be done unless there's no alternative. My recommendation would be to consider the rows you need to lock, and then lock those specific rows using a select for update statement.
For example, suppose you have a resources table and a schedules table that contains bookings for those resources. When booking a resource, you have to check the schedules table for a given resource to make sure it's available for the desired time. However, you have to do this in a concurrency-safe way: between the time you check the schedules table for availability and the time you actually insert the row into the schedules table, you want to ensure that some other transaction doesn't book the resource for the same (or an overlapping) time.
You can accomplish this by using a select for update command:
select * from resources where resource_name='a' for update;
Assuming you're doing this in a stored procedure, if some other code fires the stored procedure for the same resource, it will block on that statement. This will ensure that resources don't get double booked.
We could also accomplish this by locking the entire resources table. However, there's no need to do that since we're only interested in booking a single resource. So it's good enough to just lock the resource row we care about.
Note that for MySQL, you need an index on the columns used in the FOR UPDATE's WHERE clause; without one, the select scans, and therefore locks, every row in the table.
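Putting it together, a sketch of the booking flow; the schedules columns (start_time, end_time) are assumptions for illustration:

BEGIN;
-- serialize all bookings for resource 'a' on its resources row
SELECT * FROM resources WHERE resource_name = 'a' FOR UPDATE;
-- safe to check availability now: no other transaction can hold the lock above
SELECT COUNT(*) FROM schedules
 WHERE resource_name = 'a'
   AND start_time < '2010-01-01 11:00' AND end_time > '2010-01-01 10:00';
-- if the count is 0, book it
INSERT INTO schedules (resource_name, start_time, end_time)
VALUES ('a', '2010-01-01 10:00', '2010-01-01 11:00');
COMMIT;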
The point to all this is to always consider maximum concurrency. In other words, don't lock more than you need to. Otherwise, you make the application much less scalable and you inhibit concurrency.

How can I make the database consistent in this situation?

I have a problem dealing with copying a database table entry from one table to another.
Suppose I have two tables called one.sql and two.sql. I want to transfer some records from one.sql to two.sql, then delete those entries from one.sql after the copy succeeds.
Problem: suppose the power goes out after the copy from one to two is made but before the delete from one is done; in that case the same record will be in both tables, which I don't want. How can I handle this type of inconsistency on the fly?
Your RDBMS is not a simple datastore! It supports journaling, transaction isolation and atomic updates. So...
... with transactional tables (InnoDB) and with decent isolation level simply do:
START TRANSACTION; -- or SET autocommit = 0
INSERT INTO two SELECT * FROM one WHERE ...;
DELETE FROM one WHERE ...;
COMMIT;
COMMIT will atomically apply the changes to the DB. That is, from the other transactions' point of view, the move is either done or not started; no one can see it half done, even in case of catastrophic failure (power outage).
Of course, if you move all your records, you could also rely on RENAME TABLE...
You can use transaction blocks to decrease unexpected results to some degree. But solving a power problem is another thing.
If you are worried about a power problem, you can, however, run a batch at some interval that checks that the two tables don't contain the same records.
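A sketch of such a periodic check, assuming both tables share a primary key column id; any row returned exists in both tables and still needs its delete replayed:

SELECT one.id
FROM one
INNER JOIN two USING (id);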

MySQL INSERT SELECT on large static table

I need to copy the content of one table to another. So I started using:
INSERT INTO new_table SELECT * FROM old_table;
However, I am getting the following error now:
1297, "Got temporary error 233 'Out of operation records in transaction coordinator (increase MaxNoOfConcurrentOperations)' from NDBCLUSTER"
I think I understand why this occurs: my table is huge, and MySQL tries to take a snapshot in time (lock everything and make one large transaction out of it).
However, my data is fairly static and there is no other concurrent session that would modify the data. How can I tell MySQL to copy one row at a time, or in smaller chunks, without locking the whole thing?
Edit note: I already know that I can just read the whole table row-by-row into memory/file/dump and write back. I am interested to know if there is an easy way (maybe setting isolation level?). Note that the engine is InnoDB.
Data Migration is one of the few instances where a CURSOR can make sense, as you say, to ensure that the number of locks stays sane.
Use a cursor in conjunction with transactions, committing after every row or after every N rows (e.g. use a counter with modulo).
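If the table has a numeric primary key, a plainer alternative with the same effect is to chunk the INSERT ... SELECT by key range, one transaction per chunk (the key name and chunk size here are assumptions):

-- repeat with the next range until the whole table is copied
BEGIN;
INSERT INTO new_table SELECT * FROM old_table WHERE id >= 0 AND id < 10000;
COMMIT;
BEGIN;
INSERT INTO new_table SELECT * FROM old_table WHERE id >= 10000 AND id < 20000;
COMMIT;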
Select the data from the InnoDB table into an outfile, then LOAD DATA INFILE into the cluster.
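A sketch of that two-step approach; the file path is illustrative and must be writable by the MySQL server:

SELECT * FROM old_table INTO OUTFILE '/tmp/old_table.txt';
LOAD DATA INFILE '/tmp/old_table.txt' INTO TABLE new_table;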

Best Approach for Checking and Inserting Records

EDIT: To clarify, the records originally come from a flat-file database and are not in the MySQL database.
In one of our existing C programs, whose purpose is to take data from the flat file and insert it (based on criteria) into the MySQL table:
Open connection to MySQL DB
for record in all_record_of_my_flat_file:
    if record contains a certain field:
        if record is NOT in sql_table A:                      // see #1
            insert record information into sql_table A and B  // see #2
Close connection to MySQL DB
#1 is: select field from sql_table A where field=XXX
#2 is: 2 inserts
I believe that management did not feel it was worth it to add functionality so that when the field in the flat file is created, it would be inserted into the database. This is specific to one customer (that I know of). I too felt it odd that we use a tool such as this to "sync" the data. I was given the duty of using and maintaining this script, so I haven't heard too much about the entire process. The intent is primarily to handle additional records, so this is not the first time it is used.
This is typically done every X months to sync everything up, or so I'm told. I've also been told that this process takes roughly a couple of days. There are (currently) at most 2.5 million records (though not necessarily all 2.5M will be inserted, and most likely much less). One of the tables contains 10 fields and the other 5 fields. There isn't much to be done about iterating through the records, since that part can't be changed at the moment. What I would like to do is speed up the part where I query MySQL.
I'm not sure if I have left out any important details -- please let me know! I'm also no SQL expert so feel free to point out the obvious.
I thought about:
Putting all the inserts into a transaction (at the moment I'm not sure how important it is for the transaction to be all-or-none or if this affects performance)
Using Insert X Where Not Exists Y
LOAD DATA INFILE (but that would require I create a (possibly) large temp file)
I read that (hopefully someone can confirm) I should drop indexes so they aren't re-calculated.
mysql Ver 14.7 Distrib 4.1.22, for sun-solaris2.10 (sparc) using readline 4.3
Why not upgrade your MySQL server to 5.0 (or 5.1), and then use a trigger so it's always up to date (no need for the monthly script)?
DELIMITER //
CREATE TRIGGER insert_into_a AFTER INSERT ON source_table
FOR EACH ROW
BEGIN
    IF NEW.foo > 1 THEN
        -- only copy the row if table a doesn't already have it
        IF NOT EXISTS (SELECT 1 FROM a WHERE a.id = NEW.id) THEN
            INSERT INTO a (col1, col2) VALUES (NEW.col1, NEW.col2);
            INSERT INTO b (col1, col2) VALUES (NEW.col1, NEW.col2);
        END IF;
    END IF;
END //
DELIMITER ;
Then, you could even set up update and delete triggers so that the tables are always in sync (if the source table's col1 is updated, it'll automatically propagate to a and b)...
Here's my thoughts on your utility script...
1) is just good practice anyway; I'd do it no matter what.
2) may save you a considerable amount of execution time. If you can solve a problem in straight SQL without iterating in a C program, this can save a fair amount of time. You'll have to profile it first in a test environment to ensure it really does.
3) LOAD DATA INFILE is a tactic to use when inserting a massive amount of data. If you have a lot of records to insert (I'd write a query to do an analysis to figure out how many records you'll have to insert into table B), then it might behoove you to load them this way.
Dropping the indexes before the insert can be helpful to reduce running time, but you'll want to make sure you put them back when you're done.
Although... why aren't all the records in table B in the first place? You haven't mentioned how processing works, but I would think it would be advantageous to ensure (in your app) that the records got there without your service script's intervention. Of course, you understand your situation better than I do, so ignore this paragraph if it's off-base. I know from experience that there are lots of reasons why utility cleanup scripts need to exist.
EDIT: After reading your revised post, your problem domain has changed: you have a bunch of records in a (searchable?) flat file that you need to load into the database based on certain criteria. I think the trick to doing this as quickly as possible is to determine where the C application is actually the slowest and spends the most time spinning its proverbial wheels:
If it's reading off the disk, you're stuck, you can't do anything about that, unless you get a faster disk.
If it's the SQL query/insert operation, you could try optimizing that, but you're doing a compare between two databases (the flat file and the MySQL one).
A quick thought: doing a LOAD DATA INFILE bulk insert to populate a temporary table very quickly (perhaps even an in-memory table, which MySQL's MEMORY engine allows), and then doing the insert-if-not-exists against it, might be faster than what you're currently doing.
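A sketch of that staging idea; the table and column names (staging, id, col1, col2) and the file path are assumptions for illustration:

-- stage the flat file in one fast bulk load
CREATE TABLE staging LIKE a;  -- or an explicit CREATE TABLE ... ENGINE=MEMORY for an in-memory table
LOAD DATA INFILE '/tmp/flatfile.txt' INTO TABLE staging;
-- then insert only the rows table a doesn't have yet
INSERT INTO a (col1, col2)
SELECT s.col1, s.col2
FROM staging s
WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.id = s.id);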
In short, do profiling, and figure out where the slowdown is. Aside from that, talk with an experienced DBA for tips on how to do this well.
I discussed with another colleague and here is some of the improvements we came up with:
For:
SELECT X FROM TABLE_A WHERE Y=Z;
Change to (currently waiting on verification of whether X is, and always will be, unique):
SELECT X FROM TABLE_A WHERE X=Z LIMIT 1;
This was an easy change and we saw some slight improvements. I can't really quantify it well, but I did:
SELECT X FROM TABLE_A ORDER BY RAND() LIMIT 1
and compared the first two queries. For a few tests there was about a 0.1 second improvement. Perhaps it cached something, but the LIMIT 1 should help somewhat.
Then another (yet to be implemented) improvement(?):
for record number X in entire record range:
    if (no CACHE)
        CACHE = retrieve Y records (sequentially) from the database
    if (X exceeds the highest record number in CACHE)
        CACHE = retrieve the next set of Y records (sequentially) from the database
    search for record number X in CACHE
    ...etc
I'm not sure what to set Y to; are there any methods for determining a good size to try with? The table has 200k entries. I will edit in some results when I finish the implementation.