We have a stored procedure that executes a number of SELECTs and UPDATEs on various tables. This stored procedure is called from an SSIS package, and it is only called once. I can see in the execution plan (and in my trace) that the queries are executing in parallel. This isn't an issue when volume is low, but when data volume is high the statements cause deadlocks and the package fails. I thought about setting the lock timeout in the SP, but I'd prefer to force sequential execution of the updates if possible. Any suggestions on how to force sequential execution within the stored procedure would be a great help.
Perhaps the title is a little misleading, so I'll explain my question in further detail. Obviously the queries inside of the procedure are executed synchronously and in order, but are procedures themselves executed synchronously?
Let's say I have a procedure called "Register" which handles a couple of queries; for example, it looks like this:
BEGIN
    DECLARE account_inserted INT(11);
    INSERT INTO accounts (...) VALUES (...);
    SET account_inserted = LAST_INSERT_ID(); # <------
    QUERY using account_inserted...
    QUERY using account_inserted...
    QUERY using account_inserted...
    ...
END
Now let's say there were numerous requests to register coming in at the same time (for example's sake, around 200-300 requests). Would it execute all of the procedures in order? Or is it possible for the LAST_INSERT_ID() value to conflict with a row inserted by another procedure that is being executed in parallel?
You're muddling three things:
Whether MySQL executes procedures synchronously
This could be interpreted to mean either "does MySQL wait for each command within a procedure to complete before moving on to the next?" or "does MySQL wait for the entire procedure to complete before accepting further instructions from the connection that invoked the CALL command?". In both cases, the answer is "yes, it does".
Whether invocations of MySQL procedures are executed atomically
As with any other series of commands, commands within procedures are only atomic if performed within a transaction on tables that use a transactional storage engine. Thus a different connection may well execute another INSERT between the INSERT in your procedure and the command that follows.
Whether LAST_INSERT_ID() is guaranteed to return the value generated by the immediately preceding INSERT command in the procedure
Yes, it is. The most recent insertion ID is maintained on a per-connection basis, and as described above the connection waits for CALL to complete before further commands are accepted.
The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for the most recent statement affecting an AUTO_INCREMENT column by that client. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.
https://dev.mysql.com/doc/refman/5.5/en/information-functions.html#function_last-insert-id
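To tie the second and third points together, here is a minimal sketch of what a Register-style procedure might look like with its statements wrapped in a transaction. The table and column names (accounts, account_settings, stats) are made up for illustration, and the transaction only provides atomicity on a transactional engine such as InnoDB; LAST_INSERT_ID() is already safe per connection either way.

DELIMITER $$

-- Minimal sketch only: accounts, account_settings and stats are invented names.
CREATE PROCEDURE Register(IN p_email VARCHAR(255))
BEGIN
    DECLARE account_inserted INT;

    START TRANSACTION;

    INSERT INTO accounts (email) VALUES (p_email);
    -- Per-connection value; other clients' inserts cannot change it.
    SET account_inserted = LAST_INSERT_ID();

    INSERT INTO account_settings (account_id) VALUES (account_inserted);
    UPDATE stats SET registrations = registrations + 1;

    COMMIT;
END $$

DELIMITER ;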
I have two applications, both of them are using the same stored procedure in MySQL.
I would like this procedure to be synchronized, that is, while one application calls it, the other one has to wait.
Is there a way to do this without altering the code of the applications (that is, only modifying the stored procedure)?
Thanks,
krisy
You can absolutely do this within the stored procedure without changing your application code, but bear in mind that you're introducing locking issues and the possibility of timeouts.
Use GET_LOCK() and RELEASE_LOCK() to take care of the synchronization. Run GET_LOCK to perform the synchronization at the start of your stored procedure, and RELEASE_LOCK once you're done:
IF GET_LOCK('lock_name_for_this_SP', 60) = 1 THEN
    -- ... body of SP ...
    DO RELEASE_LOCK('lock_name_for_this_SP');
ELSE
    -- ... lock timed out ...
END IF;
You'll also need to take care that your application timeouts are longer than the lock timeout so you don't incur other problems.
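Wrapped into a complete procedure, the pattern might look something like the sketch below. The procedure name and error message are placeholders, and SIGNAL requires MySQL 5.5 or later.

DELIMITER $$

-- Hypothetical wrapper showing the GET_LOCK()/RELEASE_LOCK() pattern end to end.
CREATE PROCEDURE synchronized_work()
BEGIN
    IF GET_LOCK('lock_name_for_this_SP', 60) = 1 THEN
        -- ... body of SP ...
        DO RELEASE_LOCK('lock_name_for_this_SP');
    ELSE
        -- Lock not acquired within 60 seconds: raise an error so the caller notices.
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'Timed out waiting for lock_name_for_this_SP';
    END IF;
END $$

DELIMITER ;

Note that if the body itself can raise an error, the lock stays held until the connection closes, so in practice you may also want an error handler that releases the lock before re-raising.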
I have an application
which does around 20000 DATA-OPERATIONS per hour.
Each DATA-OPERATION has 30 parameters overall (across all 10 queries). Some are text, some are numeric. Some text parameters are as long as 10,000 characters.
Every DATA-OPERATION does the following:
A single DATA-OPERATION inserts into / updates multiple tables (around 10) in the database.
For every DATA-OPERATION, I take one connection.
Then I use a new prepared statement for each query in the DATA-OPERATION.
Each prepared statement is closed after its query is executed.
The connection is reused for all 10 prepared statements.
The connection is closed when the DATA-OPERATION is completed.
So to perform one DATA-OPERATION, that means:
10 queries, 10 prepared statements (create, execute, close), 10 network calls.
1 connection (open, close).
I personally think that if I create a stored procedure from the above 10 queries, it will be the better choice.
In the case of an SP, a DATA-OPERATION will have:
1 connection, 1 callable statement, 1 network hit.
I suggested this, but I am told that:
This might be more time-consuming than plain SQL queries.
It will put additional load on the DB server.
I still think an SP is the better choice. Please let me know your thoughts.
Benchmarking is an option; I will have to search for tools that can help with this.
Also, can anyone suggest already-available benchmarks for this kind of problem?
Any recommendation depends partially on where the script executing the queries resides. If the script executing the queries is on the same server as the MySQL instance then you won't see that much of a difference, but there will still be a small overhead in executing 200k queries compared to 1 stored procedure.
My advice either way, though, would be to make it a stored procedure. You would need maybe a couple of procedures:
A procedure that combines the 10 statements you do per operation into 1 call
A procedure that can iterate over a table of arguments using a CURSOR to feed into procedure 1 (see the sketch at the end of this answer)
Your process would be:
Populate a table with arguments which would be fed into procedure 1 by procedure 2
Execute procedure 2
This would yield performance benefits as there is no need to connect to the MySQL server 20000*10 times. While the overhead per request may be small, milliseconds add up. Even if the saving is 0.1 ms per request, that's still 20 seconds saved.
Another option could be to modify your requests to perform all 20k data operations at once (if viable) by adjusting your 10 queries to pull data from the database table mentioned above. The key to all of this is to get the arguments loaded in a single batch insert, and then to use statements on the MySQL server within a procedure to process them without further round trips.
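As a rough illustration of the two procedures described above, here is a sketch with invented names (do_data_operation, process_data_operations, a data_operation_args staging table) and only 2 of the 30 parameters shown:

DELIMITER $$

-- Procedure 1 (sketch): the 10 statements of ONE operation collapsed into a
-- single CALL. Parameter list and table names are invented here.
CREATE PROCEDURE do_data_operation(IN p_arg1 INT, IN p_arg2 TEXT)
BEGIN
    INSERT INTO table_a (col1) VALUES (p_arg1);
    UPDATE table_b SET col2 = p_arg2 WHERE id = p_arg1;
    -- ... remaining inserts/updates of the operation ...
END $$

-- Procedure 2 (sketch): iterate over a staging table of arguments with a CURSOR
-- and feed each row into procedure 1.
CREATE PROCEDURE process_data_operations()
BEGIN
    DECLARE v_arg1 INT;
    DECLARE v_arg2 TEXT;
    DECLARE done INT DEFAULT 0;
    DECLARE cur CURSOR FOR
        SELECT arg1, arg2 FROM data_operation_args ORDER BY id;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN cur;
    read_loop: LOOP
        FETCH cur INTO v_arg1, v_arg2;
        IF done = 1 THEN
            LEAVE read_loop;
        END IF;
        CALL do_data_operation(v_arg1, v_arg2);
    END LOOP;
    CLOSE cur;
END $$

DELIMITER ;

The application would then batch-insert its arguments into data_operation_args and issue a single CALL process_data_operations();.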
We wrote stored procedures in MySQL.
If the stored procedure is called from one thread, it takes 2.5 seconds to return results.
If the stored procedure is called from 3 threads, it takes approximately 8.5 seconds to return results; each thread takes almost the same time.
We are using MyISAM; please let me know if we need to change any settings for the procedure to be executed in parallel. We only do retrievals (SELECTs) in the stored procedure; no updates or insertions are done.
Increasing the number of threads pulling data from MySQL does not necessarily increase throughput. You're executing the same query in multiple threads, which adds the overhead of context switching.
To take advantage of threading you need to make use of idle time (real idle time), such as input/output/network delays.
Example:
A thread pulls some data from MySQL and starts processing, say sending a notification over an interface. If that interface is synchronous, then the thread is stuck.
Get more threads to do the job for you, i.e. pull data from the DB (the idle/waiting part) and process it.
Without such delays/idling, threading only incurs overhead, IMO.
I have a procedure (procedureA) that loops through a table and calls another procedure (procedureB) with variables derived from that table.
Each call to procedureB is independent of the last call.
When I run procedureA my system resources show a maximum CPU use of 50% (I assume that is 1 out of my 2 CPU cores).
However, if I open two instances of the mysql terminal and execute a query in both terminals, both CPU cores are used (CPU usage can reach close to 100%).
How can I achieve the same effect inside a stored procedure?
I want to do something like this:
BEGIN
    CALL procedureB(var1);   -- CPU CORE #1
    SET var1 = var1 + 1;
    CALL procedureB(var1);   -- CPU CORE #2
END
I know it's not going to be that easy...
Any tips?
Within MySQL, to get something done asynchronously you'd have to use a CREATE EVENT, but I'm not sure whether creating one is allowed within a stored procedure. (On a side note: asynchronous inserts can of course be done with INSERT DELAYED, but that's 1 thread, period.)
Normally, you are much better off having a couple of processes/workers/daemons which can be accessed asynchronously by your program and have their own database connection, but that of course won't be in the same procedure.
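For reference, a one-shot event created from a normal session (not from inside a procedure) might look like the sketch below. The event runs in its own scheduler thread, so the CALL executes asynchronously with respect to the session that created it. The event name is invented, and the event scheduler must be enabled.

-- Requires the event scheduler to be running, plus the EVENT privilege.
SET GLOBAL event_scheduler = ON;

CREATE EVENT run_procedureB_once
    ON SCHEDULE AT CURRENT_TIMESTAMP + INTERVAL 1 SECOND
    ON COMPLETION NOT PRESERVE
    DO CALL procedureB(1);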
You can write your own daemon as a stored procedure, and schedule multiple copies of it to run at regular intervals, say every 5 minutes, 1 minute, 1 second, etc.
If you only want up to N parallel copies running at a time, use GET_LOCK() with N well-defined lock names to abort the event execution when another copy of the event is still running.
Use a "job table" to list the jobs to execute, with an ID column to identify the execution order. Be sure to use good transaction and lock practices of course - this is re-entrant programming, after all.
Each row can define a stored procedure to execute and possibly the parameters. You can even have multiple types of jobs, job tables, and worker events for different tasks.
Use PREPARE and EXECUTE with the CALL statement to dynamically call stored procedures whose names are stored in strings.
Then just add rows as needed to the job table, even inserting in big batches, and let your worker events process them as fast as they can.
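Putting those pieces together, here is a minimal sketch under the assumptions above: an invented job_queue table, a process_jobs worker guarded by GET_LOCK(), a dynamic CALL via PREPARE/EXECUTE, and two events scheduling two worker copies. Error handling and logging are omitted for brevity.

DELIMITER $$

-- Hypothetical job table: one row per pending CALL.
CREATE TABLE job_queue (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    proc_name VARCHAR(64) NOT NULL,
    param1    INT NULL,
    status    ENUM('pending','done','error') NOT NULL DEFAULT 'pending'
) ENGINE=InnoDB $$

-- Worker: grabs one of N named locks, then drains pending jobs.
CREATE PROCEDURE process_jobs(IN worker_no INT)
BEGIN
    DECLARE v_id INT;
    DECLARE v_proc VARCHAR(64);
    DECLARE v_param INT;
    DECLARE no_more_jobs INT DEFAULT 0;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_more_jobs = 1;

    -- Abort immediately if this worker slot is already busy (timeout 0).
    IF GET_LOCK(CONCAT('job_worker_', worker_no), 0) = 1 THEN
        work_loop: LOOP
            SET no_more_jobs = 0;

            -- Claim the oldest pending job inside a short transaction.
            START TRANSACTION;
            SELECT id, proc_name, param1
              INTO v_id, v_proc, v_param
              FROM job_queue
             WHERE status = 'pending'
             ORDER BY id
             LIMIT 1
             FOR UPDATE;

            IF no_more_jobs = 1 THEN
                COMMIT;
                LEAVE work_loop;
            END IF;

            -- Marked before execution to keep the sketch simple; real code
            -- would track running/error states and log the outcome.
            UPDATE job_queue SET status = 'done' WHERE id = v_id;
            COMMIT;

            -- Dynamic CALL: the procedure name comes from the job row.
            SET @sql := CONCAT('CALL ', v_proc, '(?)');
            SET @p1  := v_param;
            PREPARE stmt FROM @sql;
            EXECUTE stmt USING @p1;
            DEALLOCATE PREPARE stmt;
        END LOOP;

        DO RELEASE_LOCK(CONCAT('job_worker_', worker_no));
    END IF;
END $$

-- Schedule two copies so two jobs can run on two connections at once
-- (the event scheduler must be ON, as noted earlier).
CREATE EVENT job_worker_1 ON SCHEDULE EVERY 1 SECOND DO CALL process_jobs(1) $$
CREATE EVENT job_worker_2 ON SCHEDULE EVERY 1 SECOND DO CALL process_jobs(2) $$

DELIMITER ;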
I've done this before, in both Oracle and MySQL, and it works well. Be sure to handle errors and log them somewhere (and log successes too, for that matter) for debugging, auditing, and performance tuning. N = #CPUs may not be the best fit, depending on your data and the types of jobs. I've seen N = 2x the number of CPUs work best for data-intensive tasks, where lots of parallel disk I/O matters more than computational power.