Thread-safety of a simple & complex stored procedure in MySQL - mysql

For the purpose of this question, I am defining a complex stored procedure as 'one involving (at least) one cursor and (at least) one loop to insert data into a temporary table and then return the records from the said temporary table'.
When working on such a complex stored procedure, I was told that when two different users logged into the application perform operations which invoke the same procedure, as the procedure, being complex, can take time up to few seconds (~10 seconds) to finish execution, then the results may not faithful on a per-user basis. That is, the results may get mixed up and one user may see the results intended for the other user, as they try to access the same temporary table.
The recommendation was to use a unique system-generated identifier for each user in order to distinguish the result sets for each user.
Now, I'd like to know the following:-
Can this concurrency problem be avoided by using any table or database engine configuration settings?
Is this a violation of one or more ACID properties? How does using a full ACID compliant database engine (such as InnoDB, the one I am using) impact this question?
In the case of a simple stored procedure, one which involves only a single SELECT statement over a join of multiple tables, but no temporary tables, when the execution time is almost always under a second, is concurrency still a problem?

OK, "cursor" is irrelevant. You just want a long-running Stored Procedure running simultaneously in multiple connections by the same user.
A connection (alias a session) is the unit of "independence". An entity (such as a table) is either specific to the connection, or global to the world (aside from permission problems). There is nothing "specific to a user".
A CREATE TEMPORARY TABLE is unique to the connection creating it. Ditto for last_insert_id(). These are "thread-safe" in that the "connection" is the "thread".
If you want to have multiple connections (same user or not) access the same "temporary" table, then it is up to you to create a non-TEMPORARY table and somehow know the name of that table.

Related

Comparison between MySQL Federated, Trigger, and Event Schedule?

I have a very specific problem that requires multiple MYSQL DB instances, and I need to "sync" all data from each DB/table into one DB/table.
Basically, [tableA.db1, tableB.db2, tableC.db3] into [TableAll.db4].
Some of the DB instances are on the same machine, and some are on a separate machine.
About 80,000 rows are added to a table per day, and there are 3 tables(DB).
So, about 240,000 would be "synced" to a single table per day.
I've just been using Event Schedule to copy the data from each DB into the "All-For-One" DB every hour.
However, I've been wondering lately if that's the best solution.
I considered using Trigger, but I've been told it puts heavy burden on DB.
Using statement trigger may be better, but it depends too much on how the statement is formed.
Then I heard about Federated (in Oracle term, "DBLink"),
and I thought I could use it to link each table and create a VIEW table on those tables.
But I don't know much about databases, so I don't really know the implication of each method.
So, my question is..
Considering the "All-For-One" DB only needs to be Read-Only,
which method would be better, performance and resource wise, in order to copy data from multiple databases into one database regularly?
Thanks!

What happen if two or more user send insert query for same table in mysql?

Does MySql put their queries in something like a queue or one of the queries will be accepted and other ones receive an error?
If there is something like a queue, what is its basis? does it work on based on time or something else?
Update: I supposed that every query is unique and happen to be a new record.
For ordinary insert queries of the type that can never create a duplicate key value, SQL table servers (MySQL and the rest) always perform them all.
You can think of the table server as containing a query queue: that's useful even if it is greatly oversimplified. Modern table servers contain vast amounts of code oriented around handling concurrent users correctly.
When two users issue queries at nearly the same time, you cannot rely on the table server to perform them in a particular order, just that they will all complete correctly.
Read up on ACID.

Will a MySQL SELECT statement interrupt INSERT statement?

I have a mysql table that keep gaining new records every 5 seconds.
The questions are
can I run query on this set of data that may takes more than 5 seconds?
if SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
what happen when INSERT statement invoked while SELECT is still running, will SELECT get the newly inserted records?
I'll go over your questions and some of the comments you added later.
can I run query on this set of data that may takes more than 5 seconds?
Can you? Yes. Should you? It depends. In a MySQL configuration I set up, any query taking longer than 3 seconds was considered slow and logged accordingly. In addition, you need to keep in mind the frequency of the queries you intend to run.
For example, if you try to run a 10 second query every 3 seconds, you can probably see how things won't end well. If you run a 10 second query every few hours or so, then it becomes more tolerable for the system.
That being said, slow queries can often benefit from optimizations, such as not scanning the entire table (i.e. search using primary keys), and using the explain keyword to get the database's query planner to tell you how it intends to work on that internally (e.g. is it using PKs, FKs, indices, or is it scanning all table rows?, etc).
if SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
"Affect" in what way? If you mean "prevent insert from actually inserting until the select has completed", that depends on the storage engine. For example, MyISAM and InnoDB are different, and that includes locking policies. For example, MyISAM tends to lock entire tables while InnoDB tends to lock specific rows. InnoDB is also ACID-compliant, which means it can provide certain integrity guarantees. You should read the docs on this for more details.
what happen when INSERT statement invoked while SELECT is still running, will SELECT get the newly inserted records?
Part of "what happens" is determined by how the specific storage engine behaves. Regardless of what happens, the database is designed to answer application queries in a way that's consistent.
As an example, if the select statement were to lock an entire table, then the insert statement would have to wait until the select has completed and the lock has been released, meaning that the app would see the results prior to the insert's update.
I understand that locking database can prevent messing up the SELECT statement.
It can also put a potentially unacceptable performance bottleneck, especially if, as you say, the system is inserting lots of rows every 5 seconds, and depending on the frequency with which you're running your queries, and how efficiently they've been built, etc.
what is the good practice to do when I need the data for calculations while those data will be updated within short period?
My recommendation is to simply accept the fact that the calculations are based on a snapshot of the data at the specific point in time the calculation was requested and to let the database do its job of ensuring the consistency and integrity of said data. When the app requests data, it should trust that the database has done its best to provide the most up-to-date piece of consistent information (i.e. not providing a row where some columns have been updated, but others yet haven't).
With new rows coming in at the frequency you mentioned, reasonable users will understand that the results they're seeing are based on data available at the time of request.
All of your questions are related to locking of table.
Your all questions depend on the way database is configured.
Read : http://www.mysqltutorial.org/mysql-table-locking/
Perform Select Statement While insert statement working
If you want to perform a select statement during insert SQL is performing, you should check by open new connection and close connection every time. i.e If I want to insert lots of records, and want to know that last record has inserted by selecting query. I must have to open connection and close connection in for loop or while loop.
# send a request to store data
insert statement working // take a long time
# select statement in while loop.
while true:
cnx.open()
select statement
cnx.close
//break while loop if you get the result

MySQL performing a "No impact" temporary INSERT with replication avoiding Locks

SO, we are trying to run a Report going to screen, which will not change any stored data.
However, it is complex, so needs to go through a couple of (TEMPORARY*) tables.
It pulls data from live tables, which are replicated.
The nasty bit when it comes to take the "eligible" records from
temp_PreCalc
and populate them from the live data to create the next (TEMPORARY*) table output
resulting in effectively:
INSERT INTO temp_PostCalc (...)
SELECT ...
FROM temp_PreCalc
JOIN live_Tab1 ON ...
JOIN live_Tab2 ON ...
JOIN live_Tab3 ON ...
The report is not a "definitive" answer, expectation is that is merely a "snapshot" report and will be out-of-date as soon as it appears on screen.
There is no order or reproducibility issue.
So Ideally, I would turn my TRANSACTION ISOLATION LEVEL down to READ COMMITTED...
However, I can't because live_Tab1,2,3 are replicated with BIN_LOG STATEMENT type...
The statement is lovely and quick - it takes hardly any time to run, so the resource load is now less than it used to be (which did separate selects and inserts) but it waits (as I understand it) because of the SELECT that waits for a repeatable/syncable lock on the live_Tab's so that any result could be replicated safely.
In fact it now takes more time because of that wait.
I'd like to SEE that performance benefit in response time!
Except the data is written to (TEMPORARY*) tables and then thrown away.
There are no live_ table destinations - only sources...
these tables are actually not TEMPORARY TABLES but dynamically created and thrown away InnoDB Tables, as the report Calculation requires Self-join and delete... but they are temporary
I now seem to be going around in circles finding an answer.
I don't have SUPER privilege and don't want it...
So can't SET BIN_LOG=0 for this connection session (Why is this a requirement?)
So...
If I have a scratch Database or table wildcard, which excludes all my temp_ "Temporary" tables from replication...
(I am awaiting for this change to go through at my host centre)
Will MySQL allow me to
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
INSERT INTO temp_PostCalc (...)
SELECT ...
FROM temp_PreCalc
JOIN live_Tab1 ON ...
JOIN live_Tab2 ON ...
JOIN live_Tab3 ON ...
;
Or will I still get my
"Cannot Execute statement: impossible to write to binary log since
BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine
limited to row-based logging..."
Even though its not technically true?
I am expecting it to, as I presume that the replication will kick in simply because it sees the "INSERT" statement, and will do a simple check on any of the tables involved being replication eligible, even though none of the destinations are actually replication eligible....
or will it pleasantly surprise me?
I really can't face using an unpleasant solution like
SELECT TO OUTFILE
LOAD DATA INFILE
In fact I dont think I could even use that - how would I get unique filenames? How would I clean them up?
The reports are run on-demand directly by end users, and I only have MySQL interface access to the server.
or streaming it through the PHP client, just to separate the INSERT from the SELECT so that MySQL doesnt get upset about which tables are replication eligible....
So, it looks like the only way appears to be:
We create a second Schema "ScratchTemp"...
Set the dreaded replication --replicate-ignore-db=ScratchTemp
My "local" query code opens a new mysql connection, and performs a USE ScratchTemp;
Because I have selected the default database of the "ignore"d one - none of my queries will be replicated.
So I need to take huge care not to perform ANY real queries here
Reference my scratch_ tables and actual data tables by prefixing them all on my queries with the schema qualified name...
e.g.
INSERT INTO LiveSchema.temp_PostCalc (...) SELECT ... FROM LiveSchema.temp_PreCalc JOIN LiveSchema.live_Tab1 etc etc as above.
And then close this connection just as soon as I can, as it is frankly dangerous to have a non-replicated connection open....
Sigh...?

How to cache infrequently changing mysql query?

I have a mysql query that is taking 8 seconds to execute/fetch (in workbench).
I won't go into the details of why it may be slow (I think GROUPBY isnt helping though).
What I really want to know is, how I can basically cache it to work more quickly because the tables only change like 5-10 times/hr, while users access the site 1000s times/hour.
Is there a way to just have the results regenerated/cached when the db changes so results are not constantly regenerated?
I'm quite new to sql so any basic thought may go a long way.
I am not familiar with such a caching facility in MySQL. There are alternatives.
One mechanism would be to use application level caching. The application would store the previous result and use that if possible. Note this wouldn't really work well for multiple users.
What you might want to do is store the report in a separate table. Then you can run that every five minutes or so. This would be a simple mechanism using a job scheduler to run the job.
A variation on this would be to have a stored procedure that first checks if the data has changed. If the underlying data has changed, then the stored procedure would regenerate the report table. When the stored procedure is done, the report table would be up-to-date.
An alternative would be to use triggers, whenever the underlying data changes. The trigger could run the query, storing the results in a table (as above). Alternatively, the trigger could just update the rows in the report that would have changed (harder, because it involves understanding the business logic behind the report).
All of these require some change to the application. If your application query is stored in a view (something like vw_FetchReport1) then the change is trivial and all on the server side. If the query is embedded in the application, then you need to replace it with something else. I strongly advocate using views (or in other databases user defined functions or stored procedures) for database access. This defines the API for the database application and greatly facilitates changes such as the ones described here.
EDIT: (in response to comment)
More information about scheduling jobs in MySQL is here. I would expect the SQL code to be something like:
truncate table ReportTable;
insert into ReportTable
select * from <ReportQuery>;
(In practice, you would include column lists in the select and insert statements.)
A simple solution that can be used to speed-up the response time for long running queries is to periodically generate summarized tables, based on underlying data refreshing or business needs.
For example, if your business don't care about sub-minute "accuracy", you can run the process once each minute and make your user interface to query this calculated table, instead of summarizing raw data online.