How does MySQL InnoDB implement the Read Uncommitted isolation level?

Oracle doesn't allow dirty reads, so Read Uncommitted is not even allowed to be set from JDBC.
PostgreSQL likewise falls back to Read Committed when Read Uncommitted is requested.
SQL Server defines a Read Uncommitted isolation level because its concurrency control model is based on locking (unless you switch to the two snapshot isolation levels), so it's probably the only database that can see some performance advantage from avoiding locking for reports that don't really need strict consistency.
InnoDB also uses MVCC, but unlike Oracle and PostgreSQL it allows dirty reads. Why is that? Is there a performance advantage to reading the latest version directly, instead of rebuilding the previous version from the rollback segments? Is rebuilding row versions from the rollback segments at query time so expensive that it would justify allowing dirty reads?

The main advantage I'm aware of, is that if all your sessions are READ-UNCOMMITTED then house-keeping (cleaning up UNDO) will never be blocked waiting for old sessions.
There may be some other performance gains if read-view structures do not need to be created for READ-UNCOMMITTED transactions themselves, but I have not confirmed this myself. Generally speaking, this is not an isolation level that the InnoDB team targets optimizations for.
Edit: In terms of the performance of unrolling rollback segments, yes, it can be slow with many revisions. AFAIK it is a simple linked list, and many traversals could be required. The comparison to PostgreSQL is a difficult one to make here, because the architecture is quite different (MySQL reconstructs old versions from UNDO; PostgreSQL keeps old row versions in the table itself). Generally speaking I would say that UNDO works well when the reconstruction is "logical only + fits in working set"; i.e. it is performed in memory, and cleaned up before physical IO is required.
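For reference, opting in to dirty reads looks like the following (the table and query are just placeholders):

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;
SELECT COUNT(*) FROM orders;  -- may observe changes from uncommitted transactions
COMMIT;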

Related

Are there *application*-driven reasons to prefer multi-primary topologies over clustering, or vice-versa?

I have an application that currently uses a single primary and I'm looking to go multi-primary by either setting up a reciprocal multi-primary (just two primaries with auto_increment_increment and auto_increment_offset set appropriately) or Clustering-with-a-capital-C. The database is currently MariaDB 10.3, so the clustering would be Galera.
My understanding of multi-primary is that the application would likely require no changes: the application would connect to a single database (doesn't matter which one), and any transaction that needed to obtain any locks would do so locally, any auto-increment values necessary would be generated, and once a COMMIT occurs, that engine would complete the commit and the likelihood of failure-to-replicate to the other node would be very low.
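For concreteness, the reciprocal setup I have in mind is the usual two-node convention, something like:

-- On primary 1:
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset = 1;  -- generates 1, 3, 5, ...
-- On primary 2:
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset = 2;  -- generates 2, 4, 6, ...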
But for Clustering, a COMMIT actually requires that the other node(s) are updated to ensure success, the likelihood of failure during COMMIT (as opposed to during some INSERT/UPDATE/DELETE) is much higher, and therefore the application would really require some automated retry logic to be built into it.
Is the above accurate, or am I overestimating the likelihood of COMMIT-failure in a Clustered deployment, or perhaps even underestimating the likelihood of COMMIT-failure in a multi-primary environment?
From what I've read, it seems that Galera Cluster is a little more graceful about handling nodes leaving and re-joining the Cluster and adding new nodes. Is Galera Cluster really just multi-primary with the database engine handling all the finicky setup and management, or is there some major difference between the two?
Honestly, I'm more looking for reassurance that moving to Galera Cluster isn't going to end up being an enormous headache relative to the seemingly "easier" and "safer" move to multi-primary.
By "multi-primary", do you mean that each of the Galera nodes would be accepting writes? (In other contexts, "multi-primary" has a different meaning -- and only one Replica.)
One thing to be aware of: "Critical read".
For example, when a user posts something and it writes to one node, and then that user reads from a different node, he expects his post to show up. See wsrep_sync_wait.
(Elaborating on Bill's comment.) The COMMIT on the original write waits for each other node to say "yes, I can and will store that data", but a read on the other nodes may not immediately "see" the value. Using wsrep_sync_wait just before a SELECT makes sure the write is actually visible to the read.
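A minimal sketch of that critical-read pattern, with a hypothetical posts table:

-- On the node serving the read:
SET SESSION wsrep_sync_wait = 1;  -- wait for the replication queue to be applied before READ statements
SELECT body FROM posts WHERE user_id = 123 ORDER BY id DESC LIMIT 1;
SET SESSION wsrep_sync_wait = 0;  -- restore the default (no waiting)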

Does MySQL replication have immediate data consistency?

I am considering a NoSQL solution for a current project, but I'm hesitant about the "eventual consistency" clause in many of these databases. Is eventual consistency different from dealing with a MySQL database where replication lags? One solution I have used in the past with lagging replication is to read from the master when immediate data consistency is needed.
However, I am confused as to why relational databases claim to have strong data consistency. I guess I should use transactions and that will give me strong consistency. Is it good practice, then, to write applications assuming MySQL replication may lag?
Consistency in the sense it is used in ACID means that all constraints are satisfied before and after any change. When a system assures that you can't read data that is inconsistent, they're saying for example that you will never read data where a child row references a non-existent parent row, or where half of a transaction has been applied but the other half hasn't yet been applied (the textbook example is debiting one bank account but not yet having credited the recipient bank account).
Replication in MySQL is asynchronous by default, or "semi-synchronous" at best. Certainly it does lag in either case. In fact, the replica is always at least a fraction of a second behind, because the master doesn't write changes to its binary log until the transaction commits, and the replica then has to download the binary log and apply the event.
But the changes are still atomic. You can't read data that is partially changed. You either read committed changes, in which case all constraints are satisfied, or else the changes haven't been committed yet, in which case you see the state of data from before the transaction began.
So you might temporarily read old data in a replication system that lags, but you won't read inconsistent data.
Whereas in an "eventually consistent" system, you might read data that is partially updated, where the one account has been debited but the second account has not yet been credited. So you can see inconsistent data.
You're right that you may need to be careful about reading from replicas if your application requires absolutely current data. Each application has a different tolerance for replication lag, and in fact within one application, different queries have different tolerance for lag. I did a presentation about this: Read/Write Splitting for MySQL and PHP (Percona webinar 2013)
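As a rough sketch of per-query lag tolerance (the threshold is application-specific), an application can check how far a replica is behind before deciding where to send a read:

-- On the replica:
SHOW SLAVE STATUS;
-- Inspect the Seconds_Behind_Master column: if it exceeds this query's
-- tolerance (or is NULL, meaning replication is not running), route the
-- read to the master instead.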
For completeness, I'll also answer the question from the CAP theorem point of view. Note that Consistency in ACID is not the same as Consistency in CAP.
In terms of Consistency in the CAP theorem, which says every read receives the most recent write or an error (this is referred to as linearizability, a.k.a. strong consistency, a.k.a. atomic consistency), MySQL is not strongly consistent by default because it uses asynchronous replication. So there is a period of time during which some nodes in the group have the most recent write while others still don't.
Also, if your MySQL version is 8.0.14 or higher, then group_replication_consistency is configurable, but its default value is still EVENTUAL (in previous MySQL versions, which I believe most apps are running on, this isn't configurable at all). Details: https://dev.mysql.com/doc/refman/8.0/en/group-replication-configuring-consistency-guarantees.html
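For example, a session that needs read-your-writes semantics on 8.0.14+ can raise the guarantee just for itself (the table and query are placeholders):

SET SESSION group_replication_consistency = 'BEFORE';  -- reads wait until preceding transactions are applied locally
SELECT balance FROM accounts WHERE id = 42;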
Furthermore, if you're using MySQL Cluster (which is a different product/technology, and I find it confusing that they've called it Cluster), the MySQL documentation itself says it only guarantees eventual consistency. Details: https://dev.mysql.com/doc/mysql-cluster-manager/1.4/en/mcm-eventual-consistency.html
So we are safe to say that it's an eventually consistent system. And every asynchronously replicated system is eventually consistent by definition.

Do transactions add overhead to the DB?

Would it add overhead to put a DB transaction around every single service method in our application?
We currently only use DB transactions where it's an explicit/obvious necessity. I have recently suggested transactions around all service methods, but some other developers asked the prudent question: will this add overhead?
My feeling is no - autocommit is the same as a transaction from the DB's perspective. But is this accurate?
DB: MySQL
You are right: with autocommit, every statement is wrapped in a transaction. If your service methods execute multiple SQL statements, it would be good to wrap them in a single transaction.
And to answer your question: yes, transactions do add performance overhead, but in your specific case you will not notice the difference, since you already have autocommit enabled; the exception is if you have long-running statements in your service methods, which will hold locks longer on the tables participating in the transaction. If you wrap your multiple statements inside one transaction, you get a single transaction instead of a transaction for every individual statement, as the MySQL manual notes ("A session that has autocommit enabled can perform a multiple-statement transaction by starting it with an explicit START TRANSACTION or BEGIN statement and ending it with a COMMIT or ROLLBACK statement"), and you achieve atomicity at the service-method level.
In the end, I would go with your solution if it makes sense from the perspective of achieving atomicity at the service-method level (which I think is what you want), but there are both positive and negative effects on performance, depending on your queries, requests/s, etc.
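To make the difference concrete, here is a hypothetical service method's SQL under both modes (the tables are made up):

-- Autocommit on (the default): two independent transactions; a failure
-- between them leaves the first row committed and the second missing.
INSERT INTO orders (id, total) VALUES (1, 100);
INSERT INTO order_lines (order_id, sku) VALUES (1, 'A-1');

-- Wrapped explicitly: one transaction, atomic at the service-method level.
START TRANSACTION;
INSERT INTO orders (id, total) VALUES (1, 100);
INSERT INTO order_lines (order_id, sku) VALUES (1, 'A-1');
COMMIT;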
Yes, they can add overhead. The extra "bookkeeping" required to isolate transactions from each other can become significant, especially if the transactions are held open for a long time.
The short answer is that it depends on your table type. If you're using MyISAM, the default, there are no transactions really, so there should be no effect on performance.
But you should use them anyway. Without transactions, there is no demarcation of work. If you upgrade to InnoDB or a real database like PostgreSQL, you'll want to add these transactions to your service methods anyway, so you may as well make it a habit now while it isn't costing you anything.
Besides, you should already be using a transactional store. How do you clean up if a service method fails currently? If you write some information to the database and then your service method throws an exception, how do you clean out that incomplete or erroneous information? If you were using transactions, you wouldn't have to—the database would throw away rolled back data for you. Or what do you do if I'm halfway through a method and another request comes in and finds my half-written data? Is it going to blow up when it goes looking for the other half that isn't there yet? A transactional data store would handle this for you: your transactions would be isolated from each other, so nobody else could see a partially written transaction.
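A minimal sketch of that cleanup-for-free behavior (the tables are hypothetical):

START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- first half of the work
-- ... the service method throws here, before crediting the other account ...
ROLLBACK;  -- the debit vanishes; no other session ever saw the half-written state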
Like everything with databases, the only definitive answer will come from testing with realistic data and realistic loads. I recommend that you do this always, no matter what you suspect, because when it comes to databases very different code paths get activated when the data are large versus when they are not. But I strongly suspect the cost of using transactions even with InnoDB is not great. After all, these systems are heavily used constantly, every day, by organizations large and small that depend on transactions performing well. MVCC adds very little overhead. The benefits are vast, the costs are low—use them!

HandlerSocket transactions

In Redis, you can complete a transaction this way:
redis.watch('powerlevel')                    # abort the MULTI/EXEC below if this key changes meanwhile
current = redis.get('powerlevel')
redis.multi()                                # start queuing commands
redis.set('powerlevel', current.to_i + 1)    # queued, not executed yet
redis.exec()                                 # runs the queue atomically; returns nil if 'powerlevel' changed
Is it possible to perform this operation using the HandlerSocket?
In general, what transaction features does HandlerSocket provide?
Comparing Redis "transactions" to a general purpose transactional engine is always a bit misleading. A Redis WATCH/MULTI/EXEC block is:
Not atomic (no rollback in case of error)
Consistent (there are not many consistency rules anyway with Redis)
Fully isolated (everything is serialized)
Possibly durable if AOF+fsync strategy is selected
So the full ACID properties which are commonly used to define a transaction are not completely provided by Redis. Contrary to most transactional engines, Redis provides very strong isolation, and does not attempt to provide any rollback capabilities.
The example provided in the question is not really representative IMO, since the same behavior can be achieved in a simpler way by just using:
redis.incr( "powerlevel" )
because Redis single operations are always atomic and isolated.
WATCH/MULTI/EXEC blocks are typically used when consistency between various keys must be enforced, or to implement optimistic locking patterns. In other words, if your purpose is just to increment isolated counters, there is no need to use a WATCH/MULTI/EXEC block.
HandlerSocket is a completely different beast. It is built on top of MySQL's generic handler interface, and the transactional behavior depends on the underlying storage engine. For instance, when it is used with MyISAM, there are no ACID transactions, but consistency is ensured by an R/W lock at the table level. With InnoDB, ACID transactions are used with the default isolation level (which can be set in the InnoDB configuration, AFAIK). InnoDB implements MVCC (multi-version concurrency control), so locking is much more complex than with MyISAM.
HandlerSocket works with two pools of worker threads (one for read-only connections, one for write-oriented connections). People are supposed to use several read worker threads, but only one write thread (probably to decrease locking contention). So in the base configuration, write operations are serialized, but read operations are not. AFAIK, the only way to get the same isolation semantics as Redis is to use only the write-oriented socket for both read and write operations, and to keep a single write thread (full serialization of all operations). That will impact scalability, though.
The HandlerSocket protocol exposes no transactional capabilities. At each event loop iteration, HandlerSocket collects all the pending operations (coming from all the sockets) and performs a single transaction (only relevant with InnoDB) covering all of them. AFAIK, the user has no way to alter the scope of this transaction.
The conclusion is that it is generally not possible to emulate the behavior of a Redis WATCH/MULTI/EXEC block with HandlerSocket.
Now, back to the example: if the purpose is just to increment counters in a consistent way, this is fully supported by the HandlerSocket protocol. For instance, the +/- (increment/decrement) operations are available, as well as the U? operation (similar to the Redis GETSET command) and +?/-? (increment/decrement returning the previous value).
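In SQL terms, those increment operations correspond to a single atomic statement like the following (hypothetical table), which needs no multi-statement transaction at all:

UPDATE players SET powerlevel = powerlevel + 1 WHERE id = 42;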

XA vs. Non-XA JDBC Driver Performance?

We are using an XA JDBC driver in a case where it is not required (read-only work that doesn't participate in a distributed transaction).
Just wondering if there are any known performance gains to be had to switch to the Non-XA JDBC driver - if not it's probably not worth switching?
FWIW we are using MySQL 5.1
As with all things performance related, the answer is: it depends. Specifically, it depends on exactly how you are using the driver.
The cost of interacting transactionally with a database is divided roughly into: code complexity overhead, communication overhead, sql processing and disk I/O.
Communication overhead differs somewhat between the XA and non-XA cases. All else being equal, an XA transaction carries a little more cost here as it requires more round trips to the db. For a non-XA transaction in manual commit mode, the cost is at least two calls: the sql operation(s) and the commit. In the XA case it's start, sql operation(s), end, prepare and commit. For your specific use case that will automatically optimize to start, sql operation(s), end, prepare. Not all the calls are of equal cost: the data moved in the result set will usually dominate. On a LAN the cost of the additional round trips is not usually significant.
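To see where those round trips come from, here is the same sequence expressed with MySQL's own XA statements (the xid 'txn1' is arbitrary, the SELECT a placeholder):

XA START 'txn1';
SELECT COUNT(*) FROM orders;  -- the business SQL
XA END 'txn1';
XA PREPARE 'txn1';
XA COMMIT 'txn1';
-- With a single enlisted resource, a tx manager can instead skip
-- XA PREPARE and issue XA COMMIT 'txn1' ONE PHASE.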
Note however that there are some interesting gotchas lurking in wait for the unwary. For example, some drivers don't support prepared statement caching in XA mode, which means that XA usage carries the added overhead of re-parsing the SQL on every call, or requires you to use a separate statement pool on top of the driver. Whilst on the topic of pools, correctly pooling XA connections is a little more complex than pooling non-XA ones, so depending on the connection pool implementation you may see a slight hit there too. Some ORM frameworks are particularly vulnerable to connection pooling overhead if they use aggressive connection release and reacquire within transaction scope. If possible, configure to grab and hold a connection for the lifetime of the tx instead of hitting the pool multiple times.
With the caveat mentioned previously regarding the caching of prepared statements, there is no material difference in the cost of the sql handling between XA and non-XA tx. There is however a small difference to resource usage on the db server: in some cases it may be possible for the server to release resources sooner in the non-XA case. However, transactions are normally short enough that this is not a significant consideration.
Now we consider disk I/O overhead. Here we are concerned with I/O occasioned by the XA protocol rather than the SQL used for the business logic, as the latter is unchanged in either case. For read-only transactions the situation is simple: a sensible db and tx manager won't do any log writes, so there is no overhead. For write cases the same is true where the db is the only resource involved, due to XA's one phase commit optimization. For the 2PC case each db server or other resource manager needs two disk writes instead of the one used in non-XA cases, and the tx manager likewise needs two. Thanks to the slowness of disk storage this is the dominant source of performance overhead in XA vs. non-XA.
Several paragraphs back I mentioned code complexity. XA requires slightly more code execution than non-XA. In most cases the complexity is buried in the transaction manager, although you can of course drive XA directly if you prefer. The cost is mostly trivial, subject to the caveats already mentioned. Unless you are using a particularly poor transaction manager, in which case you may have a problem. The read-only case is particularly interesting - transaction manager providers usually put their optimization effort into the disk I/O code, whereas lock contention is a more significant issue for read-only use cases, particularly on highly concurrent systems.
Note also that code complexity in the tx manager is something of a red herring in architectures featuring an app server or other standard transaction manager provider, as these usually use much the same code for XA and non-XA transaction coordination. In non-XA cases, to bypass the tx manager entirely you typically have to tell the app server / framework to treat the connection as non-transactional and then drive the commit directly using JDBC.
So the summary is: The cost of your sql queries is going to dominate the read-only transaction time regardless of the XA/non-XA choice, unless you mess up something in the configuration or do particularly trivial sql operations in each tx, the latter being a sign your business logic could probably use some restructuring to change the ratio of tx management overhead to business logic in each tx.
For read-only cases the usual transaction-protocol-agnostic advice therefore applies: consider a transaction-aware second-level cache in your ORM solution rather than hitting the DB each time. Failing that, tune the SQL, then increase the db's buffer cache until you see a 90%+ hit rate or you max out the server's RAM slots, whichever comes first. Only worry about XA vs. non-XA once you've done that and found things are still too slow.
To explain this briefly,
An XA transaction is a "global transaction".
A non-XA transaction is a "local transaction".
An XA transaction involves a coordinating transaction manager, with one or more databases (or other resources, like JMS) all involved in a single global transaction.
Non-XA transactions have no transaction coordinator, and a single resource is doing all its transaction work itself.