How can I prevent a stored procedure from running twice at the same time? - message-queue

I'm using an Aurora DB (ie MySQL version 5.6.10) as a queue, and I'm using a stored procedure to pull records out of a table in batches. The sproc works with the following steps...
Select the next batch of data into a temptable
Write the IDs from the records from the temp table into to a log table
Output the records
Once a record has been added to the log, the sproc won't select it again next time it's called, so multiple servers can call this sproc, and both deal with batches of data from the queue without stepping on each others toes.
The sproc runs in a fraction of a second, but my company is now spinning up servers automatically, and these cloned servers are calling the sproc at exactly the same time, and the result is the same records are being selected twice
Is there a way I can make this sproc be limited to one call at a time? Ideally, any additional calls should wait until the first call is finished, and then they can run
Unfortunately, I have very little experience working with MySQL, so I'm not really sure where to start. I'd much appreciate it if anyone could point me in the right direction

This is a job for MySQL table locking. Try something like this. (You didn't show us your queries so there's a lot of guesswork here.)
SET autocommit = 0;
LOCK TABLES logtable WRITE;
CREATE TEMPORARY TABLE temptable AS
SELECT whatever FROM whatevertable FOR UPDATE;
INSERT INTO logtable (id)
SELECT id FROM temptable;
COMMIT;
UNLOCK TABLES;
If more than one connection tries to run this sequence concurrently, one will wait for the other's UNLOCK TABLES; to proceed. You say your SP is quick, so probably nobody will notice the short wait.
Pro tip: When you have the same timed code running on lots of servers, it's best to put in a short random delay before running the job. That way the shared resources (like your MySQL database) won't get hammered by a whole lot of requests precisely timed to be simultaneous.

Related

Will a MySQL SELECT statement interrupt INSERT statement?

I have a mysql table that keep gaining new records every 5 seconds.
The questions are
can I run query on this set of data that may takes more than 5 seconds?
if SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
what happen when INSERT statement invoked while SELECT is still running, will SELECT get the newly inserted records?
I'll go over your questions and some of the comments you added later.
can I run query on this set of data that may takes more than 5 seconds?
Can you? Yes. Should you? It depends. In a MySQL configuration I set up, any query taking longer than 3 seconds was considered slow and logged accordingly. In addition, you need to keep in mind the frequency of the queries you intend to run.
For example, if you try to run a 10 second query every 3 seconds, you can probably see how things won't end well. If you run a 10 second query every few hours or so, then it becomes more tolerable for the system.
That being said, slow queries can often benefit from optimizations, such as not scanning the entire table (i.e. search using primary keys), and using the explain keyword to get the database's query planner to tell you how it intends to work on that internally (e.g. is it using PKs, FKs, indices, or is it scanning all table rows?, etc).
if SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
"Affect" in what way? If you mean "prevent insert from actually inserting until the select has completed", that depends on the storage engine. For example, MyISAM and InnoDB are different, and that includes locking policies. For example, MyISAM tends to lock entire tables while InnoDB tends to lock specific rows. InnoDB is also ACID-compliant, which means it can provide certain integrity guarantees. You should read the docs on this for more details.
what happen when INSERT statement invoked while SELECT is still running, will SELECT get the newly inserted records?
Part of "what happens" is determined by how the specific storage engine behaves. Regardless of what happens, the database is designed to answer application queries in a way that's consistent.
As an example, if the select statement were to lock an entire table, then the insert statement would have to wait until the select has completed and the lock has been released, meaning that the app would see the results prior to the insert's update.
I understand that locking database can prevent messing up the SELECT statement.
It can also put a potentially unacceptable performance bottleneck, especially if, as you say, the system is inserting lots of rows every 5 seconds, and depending on the frequency with which you're running your queries, and how efficiently they've been built, etc.
what is the good practice to do when I need the data for calculations while those data will be updated within short period?
My recommendation is to simply accept the fact that the calculations are based on a snapshot of the data at the specific point in time the calculation was requested and to let the database do its job of ensuring the consistency and integrity of said data. When the app requests data, it should trust that the database has done its best to provide the most up-to-date piece of consistent information (i.e. not providing a row where some columns have been updated, but others yet haven't).
With new rows coming in at the frequency you mentioned, reasonable users will understand that the results they're seeing are based on data available at the time of request.
All of your questions are related to locking of table.
Your all questions depend on the way database is configured.
Read : http://www.mysqltutorial.org/mysql-table-locking/
Perform Select Statement While insert statement working
If you want to perform a select statement during insert SQL is performing, you should check by open new connection and close connection every time. i.e If I want to insert lots of records, and want to know that last record has inserted by selecting query. I must have to open connection and close connection in for loop or while loop.
# send a request to store data
insert statement working // take a long time
# select statement in while loop.
while true:
cnx.open()
select statement
cnx.close
//break while loop if you get the result

MySQL performing a "No impact" temporary INSERT with replication avoiding Locks

SO, we are trying to run a Report going to screen, which will not change any stored data.
However, it is complex, so needs to go through a couple of (TEMPORARY*) tables.
It pulls data from live tables, which are replicated.
The nasty bit when it comes to take the "eligible" records from
temp_PreCalc
and populate them from the live data to create the next (TEMPORARY*) table output
resulting in effectively:
INSERT INTO temp_PostCalc (...)
SELECT ...
FROM temp_PreCalc
JOIN live_Tab1 ON ...
JOIN live_Tab2 ON ...
JOIN live_Tab3 ON ...
The report is not a "definitive" answer, expectation is that is merely a "snapshot" report and will be out-of-date as soon as it appears on screen.
There is no order or reproducibility issue.
So Ideally, I would turn my TRANSACTION ISOLATION LEVEL down to READ COMMITTED...
However, I can't because live_Tab1,2,3 are replicated with BIN_LOG STATEMENT type...
The statement is lovely and quick - it takes hardly any time to run, so the resource load is now less than it used to be (which did separate selects and inserts) but it waits (as I understand it) because of the SELECT that waits for a repeatable/syncable lock on the live_Tab's so that any result could be replicated safely.
In fact it now takes more time because of that wait.
I'd like to SEE that performance benefit in response time!
Except the data is written to (TEMPORARY*) tables and then thrown away.
There are no live_ table destinations - only sources...
these tables are actually not TEMPORARY TABLES but dynamically created and thrown away InnoDB Tables, as the report Calculation requires Self-join and delete... but they are temporary
I now seem to be going around in circles finding an answer.
I don't have SUPER privilege and don't want it...
So can't SET BIN_LOG=0 for this connection session (Why is this a requirement?)
So...
If I have a scratch Database or table wildcard, which excludes all my temp_ "Temporary" tables from replication...
(I am awaiting for this change to go through at my host centre)
Will MySQL allow me to
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
INSERT INTO temp_PostCalc (...)
SELECT ...
FROM temp_PreCalc
JOIN live_Tab1 ON ...
JOIN live_Tab2 ON ...
JOIN live_Tab3 ON ...
;
Or will I still get my
"Cannot Execute statement: impossible to write to binary log since
BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine
limited to row-based logging..."
Even though its not technically true?
I am expecting it to, as I presume that the replication will kick in simply because it sees the "INSERT" statement, and will do a simple check on any of the tables involved being replication eligible, even though none of the destinations are actually replication eligible....
or will it pleasantly surprise me?
I really can't face using an unpleasant solution like
SELECT TO OUTFILE
LOAD DATA INFILE
In fact I dont think I could even use that - how would I get unique filenames? How would I clean them up?
The reports are run on-demand directly by end users, and I only have MySQL interface access to the server.
or streaming it through the PHP client, just to separate the INSERT from the SELECT so that MySQL doesnt get upset about which tables are replication eligible....
So, it looks like the only way appears to be:
We create a second Schema "ScratchTemp"...
Set the dreaded replication --replicate-ignore-db=ScratchTemp
My "local" query code opens a new mysql connection, and performs a USE ScratchTemp;
Because I have selected the default database of the "ignore"d one - none of my queries will be replicated.
So I need to take huge care not to perform ANY real queries here
Reference my scratch_ tables and actual data tables by prefixing them all on my queries with the schema qualified name...
e.g.
INSERT INTO LiveSchema.temp_PostCalc (...) SELECT ... FROM LiveSchema.temp_PreCalc JOIN LiveSchema.live_Tab1 etc etc as above.
And then close this connection just as soon as I can, as it is frankly dangerous to have a non-replicated connection open....
Sigh...?

Do Sql Update Statements run at the same time if requested at the same time?

If two independent scripts call a database with update requests to the same field, but with different values, would they execute at the same time and one overwrite the other?
as an example to help ensure clarity, imagine both of these statements being requested to run at the same time, each by a different script, where Status = 2 is called microseconds after Status = 1 by coincidence.
Update My_Table SET Status = 1 WHERE Status= 0;
Update My_Table SET Status = 2 WHERE Status= 0;
What would my results be and why? if other factors play a roll, expand on them as much as you please, this is meant to be a general idea.
Side Note:
Because i know people will still ask, my situation is using MySql with Google App Engine, but i don't want to limit this question to just me should it be useful to others. I am using Status as an identifier for what script is doing stuff to the field. if status is not 0, no other script is allowed to touch it.
This is what locking is for. All major SQL implementations lock DML statements by default so that one query won't overwrite another before the first is complete.
There are different levels of locking. If you've got row locking then your second update will run in parallel with the first, so at some point you'll have 1s and 2s in your table.
Table locking would force the second query to wait for the first query to completely finish to release it's table lock.
You can usually turn off locking right in your SQL, but it's only ever done if you need a performance boost and you know you won't encounter race conditions like in your example.
Edits based on the new MySQL tag
If you're updating a table that used the InnoDB engine, then you're working with row locking, and your query could yield a table with both 1s and 2s.
If you're working with a table that uses the MyISAM engine, then you're working with table locking, and your update statements would end up with a table that would either have all 1s or all 2s.
from https://dev.mysql.com/doc/refman/5.0/en/lock-tables-restrictions.html (MySql)
Normally, you do not need to lock tables, because all single UPDATE statements are atomic; no other session can interfere with any other currently executing SQL statement. However, there are a few cases when locking tables may provide an advantage:
from https://msdn.microsoft.com/en-us/library/ms177523.aspx (sql server)
An UPDATE statement always acquires an exclusive (X) lock on the table it modifies, and holds that lock until the transaction completes. With an exclusive lock, no other transactions can modify data.
If you were having two separate connections executing the two posted update statements, whichever statement was started first, would be the one that completed. THe other statement would not update the data as there would no longer be records with a status of 0
The short answer is: it depends on which statement commits first. Just because one process started an update statement before another doesn't mean that it will complete before another. It might not get scheduled first, it might be blocked by another process, etc.
Ultimately, it's a race condition: the operation that completes (and commits) last, wins.
Since you have TWO scripts doing the same thing and using different values for the UPDATE, they will NOT run at the same time, one of the scripts will run before even if you think you are calling them at the same time. You need to specify WHEN each script should run, otherwise the program will not know what should be 1 and what should be 2.

application setup to avoid timeouts and deadlocks in SQL server DB table

My application accesses a local DB where it inserts records into a table (+- 30-40 million a day). I have processes that run and process data and do these inserts. Part of the process involves selecting an id from an IDs table which is unique and this is done using a simple
Begin Transaction
Select top 1 #id = siteid from siteids WITH (UPDLOCK, HOLDLOCK)
delete siteids where siteid = #id
Commit Transaction
I then immediately delete that id with a separate statement from that very table so that no other process grabs it. This is causing tremendous timeout issues and with only 4 processes accessing it, I am surprised though. I also get timeout issues when checking my main post table to see if a record was inserted using the above id. It runs fast but with all the deadlocks and timeouts I think this indicates poor design and is a recipe for disaster.
Any advice?
EDIT
this is the actual statement that someone else here helped with. I then removed the delete and included it in my code as a separately executed statement. Will the order by clause really help here?

How can I parallelize Writes to the same row in MySQL?

I'm currently building a system that does running computations, and every 5 seconds inserts or updates information based on those computations to a few rows in MySQL. I'm working on running this system on a few different servers at once right now with a few agents that are each doing similar processing and then writing on the same set of rows. I already randomize the order in which each agent writes its set of rows, but there's still a lot of deadlock happening. What's the best/fastest way to get through those deadlocks? Should I just rerun the query each time one happens, or do row locks, or something else entirely?
I suggest you try something that won't require more than one client to update your 'few rows.'
For example, you could have each agent that produces results do an INSERT to a staging table with the MEMORY access method.
Then, every five seconds you can run a MySQL event (a stored procedure within the server) that loops through all the rows in that table, posting their results to your 'few rows' and then deleting them. If it's important for the rows in your staging table to be processed in order, then you can use an AUTO_INCREMENT id field. But it might not be important for them to be in order.
If you want to get fancier and more scalable than that, you'll need a queue management system like Apache ActiveMQ.