I have an application which performs around 20,000 DATA-OPERATIONs per hour.
A DATA-OPERATION has 30 parameters overall (across all 10 queries). Some are text, some are numeric. Some text parameters are as long as 10,000 characters.
Every DATA-OPERATION does the following:
A single DATA-OPERATION inserts into / updates multiple tables (around 10) in the database.
For every DATA-OPERATION, I take one connection,
then I use a new prepared statement for each query in the DATA-OPERATION.
Each prepared statement is closed as soon as its query has been executed.
The connection is reused for all 10 prepared statements.
The connection is closed when the DATA-OPERATION is completed.
So a single DATA-OPERATION costs:
10 queries, 10 prepared statements (create, execute, close), 10 network calls,
1 connection (open, close).
I personally think that if I create a stored procedure from the above 10 queries, it will be a better choice.
With a stored procedure, a DATA-OPERATION will need:
1 connection, 1 callable statement, 1 network round trip.
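For illustration, here is a minimal sketch of what I have in mind; the procedure name, tables, columns, and parameters are placeholders, and the real procedure would take all ~30 parameters:

DELIMITER //
-- Hypothetical combined procedure: one call performs a whole DATA-OPERATION.
CREATE PROCEDURE data_operation(IN p_id INT, IN p_text TEXT, IN p_amount DECIMAL(10,2))
BEGIN
  -- the ~10 inserts/updates that currently run as separate prepared statements
  INSERT INTO table_a (id, long_text) VALUES (p_id, p_text);
  UPDATE table_b SET amount = p_amount WHERE id = p_id;
  -- ... the remaining statements ...
END //
DELIMITER ;

My application would then open one connection and issue a single CALL data_operation(...) per DATA-OPERATION.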
I suggested this, but I was told that:
it might be more time consuming than plain SQL queries, and
it will put additional load on the DB server.
I still think the stored procedure is the better choice. Please let me know your thoughts.
Benchmarking is an option; I will have to look for tools that can help with this.
Can anyone also suggest existing benchmarks for this kind of problem?
Any recommendation depends partially on where the script executing the queries resides. If the script executing the queries is on the same server as the MySQL instance, you won't see that much of a difference, but there will still be a small overhead in executing 200k individual queries compared to a single stored-procedure call.
My advice either way would be to make it a stored procedure. You would need maybe a couple of procedures:
A procedure that combines the 10 statements you run per operation into 1 call
A procedure that can iterate over a table of arguments using a CURSOR, feeding each row into procedure 1 (a sketch of this is shown below)
Your process would then be:
Populate a table with the arguments that procedure 2 will feed into procedure 1.
Execute procedure 2.
This would yield performance benefits, because there is no need to make 20000*10 round trips to the MySQL server. While the overhead per request may be small, milliseconds add up: even if the saving is only 0.1 ms per request, that is still 20 seconds saved across 200,000 requests.
Another option could be to modify your requests to perform all 20k data operations at once (if viable) by adjusting your 10 queries to pull data from the database table mentioned above. The key to all of this is to get the arguments loaded in a single batch insert, and then to use statements within a procedure on the MySQL server to process them without further round trips.
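As a rough sketch of procedure 2, assuming an arguments table called operation_args and a per-operation procedure data_operation like the one sketched in the question (every name here is hypothetical):

DELIMITER //
-- Hypothetical driver procedure: iterates over the argument rows with a
-- cursor and feeds each row into the per-operation procedure.
CREATE PROCEDURE process_operations()
BEGIN
  DECLARE v_id INT;
  DECLARE v_text TEXT;
  DECLARE v_amount DECIMAL(10,2);
  DECLARE done INT DEFAULT 0;
  DECLARE cur CURSOR FOR
    SELECT id, long_text, amount FROM operation_args ORDER BY id;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO v_id, v_text, v_amount;
    IF done THEN
      LEAVE read_loop;
    END IF;
    CALL data_operation(v_id, v_text, v_amount);  -- procedure 1
  END LOOP;
  CLOSE cur;
END //
DELIMITER ;

With this in place, the client does one batch insert into operation_args and then a single CALL process_operations().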
Related
I will quote some text from another question here:
The PreparedStatement is a slightly more powerful version of a Statement, and should always be at least as quick and easy to handle as a Statement.
A PreparedStatement may be parameterized.
Most relational databases handle a JDBC / SQL query in four steps:
Parse the incoming SQL query
Compile the SQL query
Plan/optimize the data acquisition path
Execute the optimized query / acquire and return data
A Statement will always proceed through the four steps above for each SQL query sent to the database. A Prepared Statement pre-executes steps (1) - (3) in the execution process above. Thus, when creating a Prepared Statement some pre-optimization is performed immediately. The effect is to lessen the load on the database engine at execution time.
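The same parse-once / execute-many idea can be seen directly with MySQL's server-side prepared statements; here is a minimal sketch (the table and values are made up):

-- Parse, compile, and plan the statement once...
PREPARE ins FROM 'INSERT INTO table_a (id, long_text) VALUES (?, ?)';

-- ...then execute it repeatedly, with only the parameters changing.
SET @id = 1, @txt = 'first row';
EXECUTE ins USING @id, @txt;

SET @id = 2, @txt = 'second row';
EXECUTE ins USING @id, @txt;

DEALLOCATE PREPARE ins;

A JDBC PreparedStatement gives the driver the chance to do the equivalent on your behalf.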
Now here is my question:
If I use hundreds or thousands of Statements, will that cause performance problems in the database? (I don't mean that they will perform more slowly because there is more work to do each time.) Will all those statements be cached in the database, or will they be discarded as soon as they are executed?
Even though there are no restrictions on using prepared statements, you should work with them carefully.
As you say you need hundreds of prepared statements, think twice; maybe you are using them wrong.
The pattern they are meant for is an application doing heavy inserts/updates/selects hundreds or thousands of times a second, differing only in their variables. In the real world that looks like: connect, create a session, send the statement once, and then send batches of variables to that statement.
But if your plan is to create a new prepared statement for each single operation, it is better to use plain queries.
On your questions:
Hundreds of statements will not kill MySQL or cause performance degradation.
Prepared statements are stored in memory while the client session is up and running. As soon as you close the session, the prepared statements die.
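If you want to verify that session-scoped lifetime yourself, MySQL keeps a status counter of currently allocated server-side prepared statements; a small sketch:

PREPARE stmt FROM 'SELECT ?';
SHOW GLOBAL STATUS LIKE 'Prepared_stmt_count';   -- includes the statement just prepared
DEALLOCATE PREPARE stmt;                         -- or simply close the session
SHOW GLOBAL STATUS LIKE 'Prepared_stmt_count';   -- the count drops again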
To be sure you need them:
Your app should execute statements fast enough that you get real speed value from using them.
Your query should not have a variable number of arguments; otherwise you can kill your app by creating and keeping statement objects in memory for every statement.
We wrote stored procedures in MySQL.
If the stored procedure is called from one thread, it takes 2.5 seconds to return results.
If the stored procedure is called from 3 threads, it takes approximately 8.5 seconds to return results; each thread takes almost the same time.
We are using MyISAM. Please let me know if we need any settings for the procedure to be executed in parallel. We only retrieve (SELECT) in the stored procedure; no updates/inserts are done.
Increasing the number of threads pulling data from MySQL does not necessarily increase throughput. You are executing the same query in multiple threads, which adds context-switching overhead.
To take advantage of threading you need to make use of idle time (real idle time), such as input/output/network delays.
Example:
A thread pulls some data from MySQL and starts processing it, say sending a notification over an interface. If that interface is synchronous, the thread is stuck.
Get more threads to do the job for you, i.e. pull data from the DB and process it while other threads are idle waiting.
Without such delays/idling, threading only incurs overhead, IMO.
I have a routine in MySQL that is very long and has multiple SELECT, INSERT, and UPDATE statements in it, with some IFs and REPEATs. It had been running fine until lately; now it hangs and takes over 20 seconds to complete (which is unacceptable, considering it used to take about 1 second).
What is the quickest and easiest way for me to find out where in the routine the bottleneck is? Basically the routine is getting stopped up at some point... how can I find out where that is without breaking apart the routine and testing each section one by one?
If you use Percona Server (a free distribution of MySQL with many enhancements), you can make the slow-query log record times for individual queries, using the log_slow_sp_statements configuration variable. See http://www.percona.com/doc/percona-server/5.5/diagnostics/slow_extended_55.html
If you're using stock MySQL, you can add statements in the stored procedure to set a series of session variables to the value returned by the SYSDATE() function. Use a different session variable at different points in the SP. Then after you run the SP in a test execution, you can inspect the values of these session variables to see what section of the SP took the longest.
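A minimal sketch of that approach; the routine body here is just a placeholder, and the SET @t... lines are what you would sprinkle between the sections of your real routine:

DELIMITER //
CREATE PROCEDURE my_routine()
BEGIN
  SET @t0 = SYSDATE();
  -- ... first section of the routine (SELECTs / INSERTs / UPDATEs) ...
  SET @t1 = SYSDATE();
  -- ... second section of the routine ...
  SET @t2 = SYSDATE();
END //
DELIMITER ;

-- After a test execution, compare the checkpoints:
CALL my_routine();
SELECT TIMEDIFF(@t1, @t0) AS section_1, TIMEDIFF(@t2, @t1) AS section_2;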
To analyze a query you can look at its execution plan. It is not always an easy task, but with a bit of reading you will find the solution. I leave some useful links:
http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
http://dev.mysql.com/doc/refman/5.0/en/explain.html
http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
http://www.lornajane.net/posts/2011/explaining-mysqls-explain
I tried a query on MySQL; the query called other functions.
Then I put the very same query in a stored procedure and executed the procedure on MySQL.
The execution time of the plain query was about 1 second less than that of the procedure.
Wasn't it supposed to be the opposite, because procedures get cached?
Please explain if I'm missing something here. I appreciate your sharing your knowledge.
Regards
A stored procedure is parsed and compiled only once, when it is first created in the database, while a text query needs to be parsed and compiled every time it is executed. That is the difference, and it is tiny for a limited number of calls.
If you are comparing just a single query, then the plain query is the better option, but for larger, multi-statement workloads you should use stored procedures.
I don't know about MySQL, but with other database engines like Oracle, queries may be cached and linked to the connection once compiled. In fact, even the data may be cached.
Did you try running the query and the stored procedure several times each? That is necessary to get a correct estimate of the performance.
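One way to repeat that measurement from the mysql client is the session profiler; this is only a sketch, and the query and procedure names are placeholders for your own:

SET profiling = 1;

SELECT SQL_NO_CACHE * FROM my_table WHERE id = 42;   -- the plain query
CALL my_procedure(42);                               -- the same logic as a procedure
SELECT SQL_NO_CACHE * FROM my_table WHERE id = 42;   -- repeat both a few times
CALL my_procedure(42);

SHOW PROFILES;   -- lists the duration of every statement above

SET profiling = 0;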
I have a procedure (procedureA) that loops through a table and calls another procedure (procedureB) with variables derived from that table.
Each call to procedureB is independent of the last call.
When I run procedureA my system resources show a maximum CPU use of 50% (I assume that is 1 out of my 2 CPU cores).
However, if I open two instances of the mysql terminal and execute a query in both terminals, both CPU cores are used (CPU usage can reach close to 100%).
How can I achieve the same effect inside a stored procedure?
I want to do something like this:
BEGIN
CALL procedureB(var1); -> CPU CORE #1
SET var1 = var1+1;
CALL procedureB(var1); -> CPU CORE #2
END
I know it's not going to be that easy...
Any tips?
Within MySQL, to get something done asynchronously you'd have to use CREATE EVENT, but I'm not sure whether creating one is allowed within a stored procedure. (On a side note: asynchronous inserts can of course be done with INSERT DELAYED, but that's 1 thread, period.)
Normally, you are much better off having a couple of processes/workers/daemons which can be accessed asynchronously by your program and have their own database connections, but that of course won't be in the same procedure.
You can write your own daemon as a stored procedure, and schedule multiple copies of it to run at regular intervals, say every 5 minutes, 1 minute, 1 second, etc.
Use GET_LOCK() with N well-defined lock names to abort the event execution if another copy of the event is still running, if you only want up to N parallel copies running at a time.
Use a "job table" to list the jobs to execute, with an ID column to identify the execution order. Be sure to use good transaction and locking practices, of course; this is re-entrant programming, after all.
Each row can define a stored procedure to execute and possibly the parameters. You can even have multiple types of jobs, job tables, and worker events for different tasks.
Use PREPARE and EXECUTE with the CALL statement to dynamically call stored procedures whose names are stored in strings.
Then just add rows as needed to the job table, even inserting in big batches, and let your worker events process them as fast as they can.
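Putting the pieces above together, here is a rough sketch of one worker procedure plus its events. Every name (job_table, run_jobs, the columns) is hypothetical, and transactions, error handling, and logging are omitted:

DELIMITER //
CREATE PROCEDURE run_jobs(IN worker_no INT)
BEGIN
  DECLARE v_id INT;
  DECLARE v_proc VARCHAR(64);
  DECLARE v_args VARCHAR(255);

  -- One named lock per worker slot: bail out if this slot is already busy.
  IF GET_LOCK(CONCAT('job_worker_', worker_no), 0) = 1 THEN
    work_loop: LOOP
      -- Claim the oldest pending job for this worker.
      UPDATE job_table
         SET status = 'running', worker = worker_no
       WHERE status = 'pending'
       ORDER BY id
       LIMIT 1;
      IF ROW_COUNT() = 0 THEN
        LEAVE work_loop;                     -- nothing left to do
      END IF;

      SELECT id, proc_name, proc_args
        INTO v_id, v_proc, v_args
        FROM job_table
       WHERE status = 'running' AND worker = worker_no
       ORDER BY id LIMIT 1;

      -- Dynamically CALL the stored procedure named in the job row.
      SET @job_sql = CONCAT('CALL ', v_proc, '(', v_args, ')');
      PREPARE job_stmt FROM @job_sql;
      EXECUTE job_stmt;
      DEALLOCATE PREPARE job_stmt;

      UPDATE job_table SET status = 'done' WHERE id = v_id;
    END LOOP;
    DO RELEASE_LOCK(CONCAT('job_worker_', worker_no));
  END IF;
END //

-- N parallel worker copies, one lock name each.
-- (The event scheduler must be enabled: SET GLOBAL event_scheduler = ON.)
CREATE EVENT job_worker_1 ON SCHEDULE EVERY 1 SECOND DO CALL run_jobs(1) //
CREATE EVENT job_worker_2 ON SCHEDULE EVERY 1 SECOND DO CALL run_jobs(2) //
DELIMITER ;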
I've done this before, in both Oracle and MySQL, and it works well. Be sure to handle errors and log them somewhere (and successes too, for that matter) for debugging, auditing, and performance tuning. N = #CPUs may not be the best fit, depending on your data and the types of jobs; I've seen N = 2x#CPUs work best for data-intensive tasks, where lots of parallel disk I/O matters more than computational power.