Is a MySQL procedure thread safe? - mysql

I am developing some websites that need to interact with a database. I won't give a complicated example here. My question actually comes down to: Is a MySQL procedure thread safe? If one client on my site triggers a procedure, can I assume it is atomic, or could it interfere with another request from another user?

It depends on whether you're using SQL transactions. Without appropriate use of transactions and a suitable isolation level, a procedure can expose data (from a write, for instance) to other queries or procedures before the whole procedure has completed.
In short: a given procedure will only be atomic if you run it inside a transaction with an appropriate isolation level.

The database will handle concurrency for you. This is normally done via transactions - any set of statements within a transaction is considered atomic and isolated from other processes. In some databases, a stored procedure will be in an implicit transaction (so you don't need to declare one) - read the documentation for your RDBMS.
Sometimes this will mean that records are locked while another process tries to use them.
You will need to write your application so it can detect such occurrences and retry.
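The "detect and retry" advice can be sketched in Python as follows. This is only a minimal sketch: `run_with_retry`, `LockConflict`, and the backoff values are hypothetical names and numbers chosen for illustration. A real MySQL driver reports deadlocks and lock wait timeouts through its own exception type, which you would catch instead.

```python
import time


class LockConflict(Exception):
    """Hypothetical stand-in for a driver's deadlock / lock-timeout error."""


def run_with_retry(operation, attempts=3, base_delay=0.01):
    """Call operation(); on LockConflict, wait and try again.

    The delay doubles after each failed attempt. The last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except LockConflict:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


# Usage: an operation that fails twice before succeeding.
calls = {"count": 0}


def flaky_update():
    calls["count"] += 1
    if calls["count"] < 3:
        raise LockConflict("row is locked by another transaction")
    return "committed"


result = run_with_retry(flaky_update)
```

The whole transaction, not just the failing statement, should be the unit that is retried, since the database rolls the transaction back when it picks it as a deadlock victim.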

It really depends on how your server is configured to use transactions. There are tradeoffs to consider depending on how your data is used and whether or not dirty, non-repeatable, or phantom reads are acceptable for your application.

Yes.
It's the DB's job to ensure thread safety among its worker threads, and it's your job to ensure thread safety among your application threads. Since there's a separation between the DB server, and your application, you don't need to worry about thread safety in this case. MySQL's data locking mechanisms will prevent you from corrupting the data in the DB due to simultaneous access from multiple threads in your own app.
Thread safety is more about modifying data in-memory, that is also shared among multiple threads within your app. Since the DB server is its own, separate application, it basically protects you from the scenario you've outlined above.

Related

What is the difference between MYSQL and SQLite multi-user functionality?

I am new to server side programming and am trying to understand relational databases a little better. Whenever I read about MYSQL vs SQLite people always talk about SQLite not being able to have multiple users. However, when I program with the Django Framework I am able to create multiple users on the sqlitedb. Can someone explain what people mean by multi-user? Thanks!
When people talk about multiple users in this context, they are talking about simultaneous connections to the database. The users in this case are threads in the web server that are accessing the database.
Different databases have different solutions for handling multiple connections working with the database at once. Generally reading is not a problem, as multiple reading operations can overlap without disturbing each other, but only one connection can write data in a specific unit at a time.
The difference between concurrency for databases is basically how large units they lock when someone is writing. MySQL has an advanced system where records, blocks or tables can be locked depending on the need, while SQLite has a simpler system where it only locks the entire database.
The impact of this difference is seen when you have multiple threads in the webserver, where some threads want to read data and others want to write data. MySQL can read from one table and write into another at the same time without problem. SQLite has to suspend all incoming read requests whenever someone wants to write something, wait for all current reads to finish, do the write, and then open up for reading operations again.
As you can read here, SQLite supports multiple users, but it locks the whole database.
SQLite is usually used for development, but MySQL is a better choice for production because it has better support for concurrent access and writes, which SQLite lacks.
Hope this helps.
SQLite concurrency is explained in detail here.
In a nutshell, SQLite doesn't have the fine-grained concurrency mechanisms that MySQL does. When someone tries to write to a MySQL database, the MySQL database will only lock what it needs to lock, usually a single record, sometimes a table.
When a user writes to a SQLite database, the entire database file is momentarily locked. As you might imagine, this limits SQLite's ability to handle many concurrent users.
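The whole-database write lock can be observed with Python's standard `sqlite3` module. This is a small sketch under stated assumptions: the table names `t1` and `t2` are made up, the database uses SQLite's default rollback-journal mode (WAL mode behaves differently for readers), and `timeout=0` is set only so the blocked writer fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN / COMMIT statements below are exactly the ones we issue ourselves.
# timeout=0 makes a blocked connection fail immediately instead of waiting.
writer_a = sqlite3.connect(path, timeout=0, isolation_level=None)
writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)

writer_a.execute("CREATE TABLE t1 (x INTEGER)")
writer_a.execute("CREATE TABLE t2 (x INTEGER)")

# Connection A starts a write transaction and touches only table t1 ...
writer_a.execute("BEGIN IMMEDIATE")
writer_a.execute("INSERT INTO t1 VALUES (1)")

# ... yet connection B cannot write to the unrelated table t2.
try:
    writer_b.execute("INSERT INTO t2 VALUES (2)")
    blocked = False
except sqlite3.OperationalError as exc:
    blocked = "locked" in str(exc)

writer_a.execute("COMMIT")

# Once A has committed, B's write goes through.
writer_b.execute("INSERT INTO t2 VALUES (2)")
rows_in_t2 = writer_b.execute("SELECT COUNT(*) FROM t2").fetchone()[0]
```

With InnoDB, the equivalent second insert would normally proceed, because the two writers touch different rows in different tables.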
Multi-user means that many tasks (possibly on many separate computers) can have open connections to the database at the same time.
A multi-user database provides things like locks to allow these tasks to update the database safely.
Look at ScimoreDB. It's an embedded database that supports multi-process (or user) read and write access. It also can work as a client-server database.

Do transactions add overhead to the DB?

Would it add overhead to put a DB transactions around every single service method in our application?
We currently only use DB transactions where it's an explicit/obvious necessity. I have recently suggested transactions around all service methods, but some other developers asked the prudent question: will this add overhead?
My feeling is not - auto commit is the same as a transaction from the DB perspective. But is this accurate?
DB: MySQL
You are right, with autocommit every statement is wrapped in transaction. If your service methods are executing multiple sql statements, it would be good to wrap them into a transaction. Take a look at this answer for more details, and here is a nice blog post on the subject.
And to answer your question, yes, transactions do add performance overhead, but in your specific case, you will not notice the difference since you already have autocommit enabled, unless you have long-running statements in service methods, which will cause longer locks on tables participating in transactions. If you just wrap your multiple statements inside a transaction, you will get one transaction (instead of a transaction for every individual statement), as pointed out here ("A session that has autocommit enabled can perform a multiple-statement transaction by starting it with an explicit START TRANSACTION or BEGIN statement and ending it with a COMMIT or ROLLBACK statement") and you will achieve atomicity on a service method level...
At the end, I would go with your solution, if that makes sense from the perspective of achieving atomicity on a service method level (which I think that you want to achieve), but there are + and - effects on performance, depending on your queries, requests/s etc...
Yes, they can add overhead. The extra "bookkeeping" required to isolate transactions from each other can become significant, especially if the transactions are held open for a long time.
The short answer is that it depends on your table type. If you're using MyISAM (the default storage engine before MySQL 5.5), there are no transactions really, so there should be no effect on performance.
But you should use them anyway. Without transactions, there is no demarcation of work. If you upgrade to InnoDB or a real database like PostgreSQL, you'll want to add these transactions to your service methods anyway, so you may as well make it a habit now while it isn't costing you anything.
Besides, you should already be using a transactional store. How do you clean up if a service method fails currently? If you write some information to the database and then your service method throws an exception, how do you clean out that incomplete or erroneous information? If you were using transactions, you wouldn't have to—the database would throw away rolled back data for you. Or what do you do if I'm halfway through a method and another request comes in and finds my half-written data? Is it going to blow up when it goes looking for the other half that isn't there yet? A transactional data store would handle this for you: your transactions would be isolated from each other, so nobody else could see a partially written transaction.
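The cleanup argument can be illustrated with a short Python sketch. It uses the standard `sqlite3` module in place of MySQL so that it runs without a server. The `place_order` service method, its tables, and the `fail_halfway` flag are hypothetical. With InnoDB the same pattern would be expressed with START TRANSACTION, COMMIT, and ROLLBACK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("CREATE TABLE audit (order_id INTEGER, note TEXT)")
conn.commit()


def place_order(connection, item, fail_halfway=False):
    """Hypothetical service method: writes two rows that belong together."""
    # Used as a context manager, the connection commits if the block
    # finishes normally and rolls back if it raises.
    with connection:
        cur = connection.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        if fail_halfway:
            raise RuntimeError("simulated failure between the two writes")
        connection.execute(
            "INSERT INTO audit (order_id, note) VALUES (?, ?)",
            (cur.lastrowid, "created"),
        )


place_order(conn, "book")
try:
    place_order(conn, "lamp", fail_halfway=True)
except RuntimeError:
    pass

# The failed call left no half-written order behind.
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
audit_count = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
```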
Like everything with databases, the only definitive answer will come from testing with realistic data and realistic loads. I recommend that you do this always, no matter what you suspect, because when it comes to databases very different code paths get activated when the data are large versus when they are not. But I strongly suspect the cost of using transactions even with InnoDB is not great. After all, these systems are heavily used constantly, every day, by organizations large and small that depend on transactions performing well. MVCC adds very little overhead. The benefits are vast, the costs are low—use them!

SQL Azure performance considerations

What performance considerations should I keep in mind when I'm planning an SQL Azure application? Azure Storage and the worker and web roles look very scalable, but if in the end they all use one database... that looks like the bottleneck.
I was trying to find numbers about:
How many concurrent connections does SQL Azure support?
What is the bandwidth?
But no luck.
For example, I'm planning an application that performs a very high volume of inserts, but I need to return the result of an aggregate function each time (e.g. the sum of all records with the same key in a column), so I cannot go with table storage.
Batching is an option, but time response is critical as well, so I'm afraid the database will be bloated with lot of connections.
Sharding is another option, but even when the amount of inserts is massive, the amount of data is very small, 4 to 6 columns with one PK and no FK. So even a 1Gb DB would be an overkill (and an overpay :D) for a partition.
Which would be the performance keys I should keep in mind when I'm facing these kind of applications?
Cheers.
Achieving both scalability and performance can be very difficult, even in the cloud. Your question was primarily about scalability, so you may want to design your application in such a way that your data becomes "eventually" consistent, using queues for example. A worker role would listen for incoming insert requests and would perform the insert asynchronously.
To minimize the number of roundtrips to the database and optimize connection pooling make sure to batch your inserts as well. So you could send 100 inserts in one shot. Also keep in mind that SQL Azure now supports MARS (multiple active recordsets) so that you can return multiple SELECTs in a single batch back to the calling code. The use of batching and MARS should reduce the number of database connections to a minimum.
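The batching idea might look roughly like this in Python. The sketch uses the standard `sqlite3` module as a stand-in for SQL Azure, so the roundtrip savings are not visible here. The `readings` table and the generated rows are invented for illustration, and MARS is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_key TEXT, value INTEGER)")
conn.commit()

# 100 hypothetical rows collected by the application before writing.
batch = [("sensor-%d" % (i % 4), i) for i in range(100)]

# One executemany call inside one transaction, instead of 100 separate
# statements that each commit on their own.
with conn:
    conn.executemany(
        "INSERT INTO readings (sensor_key, value) VALUES (?, ?)", batch
    )

total_rows = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]

# The aggregate the question asks for: the sum of all records with one key.
sum_for_key = conn.execute(
    "SELECT SUM(value) FROM readings WHERE sensor_key = ?", ("sensor-0",)
).fetchone()[0]
```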
Sharding usually helps for Read operations; not so much for inserts (although I never benchmarked inserts with sharding). So I am not sure sharding will help you that much for your requirements.
Remember that the Azure offering is designed first for scalability and reasonable performance in a multitenancy environment, where your database is shared with others on the same server. So if you need strong performance with guaranteed response time you may need to reevaluate your hosting choices or indeed test the performance boundaries of Azure for your needs as suggested by tijmenvdk.
SQL Azure will throttle your connections if any form of resource contention occurs (this includes heavy load but might also occur when your database is physically moved around). Throttling is non-deterministic, meaning that you cannot predict if and when this happens. When throttling, SQL Azure will drop your connection, requiring you to perform a retry. Number of connections supported and bandwidth is not published "by design" due to the flexible nature of the underlying infrastructure. Having said that, the setup is optimized for high availability, not high throughput.
If the bursts happen at a known time, you might consider sharding just during those bursts and consolidating the data after the burst has happened. Another way to handle this, is to start queueing/batching writes if and only if throttling occurs. You can use an Azure Queue for that plus a worker role to empty the queue later. This "overflow mechanism" has the advantage of automatically engaging if throttling occurs.
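The "overflow mechanism" could be sketched like this in Python. Everything here is an assumption made for illustration: `Throttled` stands in for whatever error your driver raises when SQL Azure drops a connection, an in-process `queue.Queue` stands in for an Azure Queue, and `fake_insert` replaces the real database call.

```python
import queue


class Throttled(Exception):
    """Hypothetical error raised when the database drops the connection."""


overflow = queue.Queue()  # stands in for an Azure Queue


def write_or_enqueue(record, insert):
    """Try the direct insert; park the record in the queue if throttled."""
    try:
        insert(record)
        return "written"
    except Throttled:
        overflow.put(record)
        return "queued"


def drain(insert):
    """Worker-role step: retry queued records; stop at the first throttle."""
    drained = 0
    while True:
        try:
            record = overflow.get_nowait()
        except queue.Empty:
            return drained
        try:
            insert(record)
            drained += 1
        except Throttled:
            overflow.put(record)  # still throttled, try again later
            return drained


# Usage with a fake database that throttles while state["busy"] is True.
stored = []
state = {"busy": True}


def fake_insert(record):
    if state["busy"]:
        raise Throttled()
    stored.append(record)


first = write_or_enqueue({"key": "a", "value": 1}, fake_insert)
state["busy"] = False
second = write_or_enqueue({"key": "a", "value": 2}, fake_insert)
drained_count = drain(fake_insert)
```

Note that queued records reach the database later than records written directly, so insertion order is not preserved. That is harmless for a sum but matters if order is significant.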
As an alternative you could use Azure Table Storage and keep a separate table of running totals that you can report back instead of performing an aggregation over the data to return the required sum of all records (this might be tricky due to the lack of locking on the tables though).
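The running-totals idea can be sketched as follows. Note the substitution: this Python example uses a SQL transaction in `sqlite3` to keep the detail row and the total in step, which is precisely the guarantee that is hard to get with Table Storage, so it shows the data layout rather than a Table Storage implementation. The table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (k TEXT, amount INTEGER)")
conn.execute("CREATE TABLE running_totals (k TEXT PRIMARY KEY, total INTEGER)")
conn.commit()


def insert_and_get_total(connection, key, amount):
    """Insert one event and return the updated sum for its key."""
    with connection:  # both writes commit together or not at all
        connection.execute(
            "INSERT INTO events (k, amount) VALUES (?, ?)", (key, amount)
        )
        cur = connection.execute(
            "UPDATE running_totals SET total = total + ? WHERE k = ?",
            (amount, key),
        )
        if cur.rowcount == 0:  # first event for this key
            connection.execute(
                "INSERT INTO running_totals (k, total) VALUES (?, ?)",
                (key, amount),
            )
        row = connection.execute(
            "SELECT total FROM running_totals WHERE k = ?", (key,)
        ).fetchone()
    return row[0]


totals = [
    insert_and_get_total(conn, "a", 5),
    insert_and_get_total(conn, "a", 7),
    insert_and_get_total(conn, "b", 1),
]

# Cross-check against a full aggregation over the detail rows.
recomputed = conn.execute(
    "SELECT SUM(amount) FROM events WHERE k = ?", ("a",)
).fetchone()[0]
```

Reading one row from `running_totals` replaces an aggregation over all detail rows for that key, at the cost of one extra write per insert.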
Apologies for stating the obvious, but the first step would be to test if you run into throttling at all in your scenario. I would give the overflow solution a try.

MySQL: Transactions across multiple threads

Preliminary:
I have an application which maintains a thread pool of about 100 threads. Each thread can last about 1-30 seconds before a new task replaces it. When a thread ends, it will almost always insert 1-3 records into a table that is used by all of the threads. Right now no transactional support exists, but I am trying to add it. Also, the table in question is InnoDB. So...
Goal
I want to implement a transaction for this. The rules for whether or not this transaction commits or rollback reside in the main thread. Basically there is a simple function that will return a boolean.
Can I implement a transaction across multiple connections?
If not, can multiple threads share the same connection? (Note: there are a LOT of inserts going on here, and that is a requirement).
1) No, a transaction is limited to a single DB connection.
2) Yes, a connection (and transaction) can be shared across multiple threads.
Well, as stated in a different answer you can't create a transaction across multiple connections. And you can share the single connection across threads. However you need to be very careful with that. You need to make sure that only one thread is writing to the connection at the same time. You can't just have multiple threads talking across the same connection without synchronizing their activities in some way. Bad things will likely happen if you allow two threads to talk at once (memory corruptions in the client library, etc). Using a mutex or critical section to protect the connection conversations is probably the way to go.
-Don
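A mutex-protected shared connection might look like this in Python. It is a sketch only: `sqlite3` stands in for the MySQL client library, the `results` table is invented, and the `keep` flag stands in for the main thread's boolean decision function from the question.

```python
import sqlite3
import threading

# check_same_thread=False lets other threads use this sqlite3 connection;
# the lock below is what actually serializes access to it.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE results (worker INTEGER, value INTEGER)")
conn.commit()
conn_lock = threading.Lock()


def worker(worker_id):
    # Each task inserts 1-3 rows, as in the question.
    rows = [(worker_id, n) for n in range(worker_id % 3 + 1)]
    with conn_lock:  # only one thread talks to the connection at a time
        conn.executemany("INSERT INTO results VALUES (?, ?)", rows)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(9)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread decides whether the whole batch is kept.
keep = True
with conn_lock:
    if keep:
        conn.commit()
    else:
        conn.rollback()
    row_count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

Because every insert goes through the one connection, all of them belong to the same open transaction, and a single commit or rollback in the main thread covers them all.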
Sharing connections between lots of threads is usually implemented by using a connection pool. Every thread can request a connection from the pool, use it for its purposes (one or more transactions, committed or rolled back) and hand it back to the pool once the task is finished.
This is what application servers offer you. They will take care of transactions, too, i. e. when the method that requested the transaction finishes normally, changes are committed, if it throws an exception, the database transaction is rolled back.
I suggest you have a look at Java EE 5 or 6 - it is very easy to use and can even be employed in embedded systems. For easy start, have a look at Netbeans and the Glassfish application server. However the general concepts apply to all application servers alike.
As for InnoDB, it will not have any problems handling lots of transactions. Under the supervision of the app server you can concentrate on the business logic and do not have to worry about half-written updates or anyone seeing updates/inserts before the transaction they originate from has been committed.
InnoDB uses MVCC (multi version concurrency control), effectively presenting each transaction with a snapshot of the whole database as of the time when it was started. You can read more about MVCC here in a related question: Question 812512

How To Mutex Across a Network?

I have a desktop application that runs on a network and every instance connects to the same database.
So, in this situation, how can I implement a mutex that works across all running instances that are connected to the same database?
In other words, I don't want two or more instances to run the same function at the same time. If one is already running the function, the other instances shouldn't have access to it.
PS: A database transaction won't solve this, because the function I want to mutex doesn't use the database. I've mentioned the database just because it can be used to exchange information across the running instances.
PS2: The function takes about ~30 minutes to complete, so if a second instance tries to run the same function I would like to display a nice message that it can't be performed right now because computer 'X' is already running that function.
PS3: The function has to be processed on the client machine, so I can't use stored procedures.
I think you're looking for a database transaction. A transaction will isolate your changes from all other clients.
Update:
You mentioned that the function doesn't currently write to the database. If you want to mutex this function, there will have to be some central location to store the current mutex holder. The database can work for this -- just add a new table that includes the computername of the current holder. Check that table before starting your function.
I think your question may be confused though. Mutexes should be about protecting resources. If your function is not accessing the database, then what shared resource are you protecting?
Put the code inside a transaction, either in the app or, better, inside a stored procedure, and call the stored procedure.
The transaction mechanism will isolate the code between the callers.
Alternatively, consider a message queue. As mentioned, the DB should manage all of this for you, either in transactions or through serial access to tables (à la MyISAM).
In the past I have done the following:
Create a table that basically has two fields, function_name and is_running
I don't know what RDBMS you are using, but most have a way to lock individual records for update. Here is some pseudocode based on Oracle:
-- Oracle starts the transaction implicitly with the first statement
SELECT is_running FROM function_table WHERE function_name = 'foo' FOR UPDATE;
-- Check here to see if it is running; if not, you can set is_running to 'Y'
UPDATE function_table SET is_running = 'Y' WHERE function_name = 'foo';
COMMIT;
Now I don't have the Oracle PL/SQL docs with me, but you get the idea. The 'FOR UPDATE' clause locks the record after the read until the commit, so other processes will block on that SELECT statement until the current process commits.
You can use Terracotta to implement such functionality, if you've got a Java stack.
Even if your function does not currently use the database, you could still solve the problem with a specific table for the purpose of synchronizing this function. The specifics would depend on your DB and how it handles isolation levels and locking. For example, with SQL Server you would set the transaction isolation to repeatable read, read a value from your locking row and update it inside a transaction. Don't commit the transaction until your function is done. You can also use explicit table locks in a transaction on most databases which might be simpler. This is probably the simplest solution given you are already using a database.
If you do not want to rely on the database for whatever reason you could write a simple service that would accept TCP connections from your client. Each client would request permission to run and would return a response when done. The server would be able to ensure only one client gets permission to run at a time. Dead clients would eventually drop the TCP connection and be detected as long as you have the correct keep alive setting.
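Such a permission service could be sketched with Python's standard `socketserver` module. The one-line text protocol (GRANTED, BUSY, RELEASED) and all class and function names are invented for this example, and it omits the keep-alive tuning mentioned above, so treat it as an outline rather than a finished service.

```python
import socket
import socketserver
import threading

run_lock = threading.Lock()  # the single "permission to run"


class PermissionHandler(socketserver.BaseRequestHandler):
    def handle(self):
        if not run_lock.acquire(blocking=False):
            self.request.sendall(b"BUSY\n")
            return
        try:
            self.request.sendall(b"GRANTED\n")
            # Wait until the client reports completion or disconnects
            # (recv returns b""); either way the permission is released.
            try:
                self.request.recv(16)
            except OSError:
                pass
        finally:
            run_lock.release()
        try:
            self.request.sendall(b"RELEASED\n")
        except OSError:
            pass  # client is already gone


class PermissionServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True


def read_line(sock):
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(16)
        if not chunk:
            break
        data += chunk
    return data.strip()


def ask(address):
    """Client side: connect and return (socket, server reply)."""
    sock = socket.create_connection(address)
    return sock, read_line(sock)


def finish(sock):
    """Client side: report completion and wait for the acknowledgement."""
    sock.sendall(b"DONE\n")
    reply = read_line(sock)
    sock.close()
    return reply


server = PermissionServer(("127.0.0.1", 0), PermissionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
address = server.server_address

sock_a, reply_a = ask(address)  # first client gets permission
sock_b, reply_b = ask(address)  # second client is refused
sock_b.close()
ack = finish(sock_a)            # first client is done
sock_c, reply_c = ask(address)  # permission is available again
sock_c.close()
server.shutdown()
server.server_close()
```

Because the permission is tied to the TCP connection, a client that dies without reporting completion frees it as soon as the server notices the dropped connection.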
The message queue solution suggested by Xepoch would also work. You could use something like MSMQ or Java Message Service (JMS) and have a single message that would act as a run token. All your clients would request the message and then repost it when done. You risk a deadlock if a client dies before reposting, so you would need to devise some logic to detect this, and it might get complicated.