I use a Berkeley database with transactions in an environment with multiple threads but only a single process.
Since I only have a single process, I'd like to cache the parsed representation of the data from some records so that I don't have to re-parse it unnecessarily every time I access it. But if I do that naïvely, without issuing any database operation when accessing the cached data, I clearly won't be following the locking protocol properly, and transactions won't be properly isolated from each other.
I could obviously fix this by just running a get operation on a record I intend to use during a transaction, but that seems unnecessary in terms of I/O and also just ugly since I don't actually need to re-read the data. Is there a way to do locking "as if" I had issued a get operation?
Likewise, is there a way to do the corresponding write-locks for put operations, since I need to do them before modifying the cached data (which needs to be done before serializing the data in order to actually put it)?
Keeping a (parsed) cache consistent with an underlying database store isn't trivial.
Have you considered saving the already-parsed data directly in BerkeleyDB to remove the parsing overhead? Yes, this involves a schema change, and it may also require more disk space.
BerkeleyDB goes to some lengths to minimize I/O within the mpool and should be near optimal for a properly configured database.
If you are using transactions, then isolation between get/put operations is handled at transaction commit without a need for additional locking. Yes, the commit may fail or clobber another commit. Add an additional locking layer (using the BerkeleyDB locking subsystem or anything you wish) on the put record's key if you need exclusivity while updating a record.
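If you go the "anything you wish" route in a single process, a plain in-memory lock per record key is often enough. A minimal sketch (the class and method names are mine, not part of any BerkeleyDB API): acquire the key's lock before reading or modifying the cached parsed record, and release it only after the BerkeleyDB transaction commits or aborts.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-key lock registry for a single-process application.
// Lock the record key before touching the cached parsed data; unlock after
// the BerkeleyDB transaction commits or aborts.
public final class RecordLocks {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    public void lock(String recordKey) {
        locks.computeIfAbsent(recordKey, k -> new ReentrantLock()).lock();
    }

    public void unlock(String recordKey) {
        ReentrantLock lock = locks.get(recordKey);
        if (lock != null) {
            lock.unlock();
        }
    }
}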
There are a few large tables in one of a customer's databases (each table is ~50M rows and not too wide). The intent is to infrequently read these tables (completely). As there are no reasonable CDC indices present, the plan is to read the tables by querying them:
SELECT * from large_table;
The reads will be performed using a JDBC driver. With the following fetch configuration in place, the intent is to read the data roughly one record at a time (which may take a significant amount of time) so that the client code is never overwhelmed.
PreparedStatement stmt = connection.prepareStatement(queryString, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
I was going through the execution path of a query in High Performance MySQL, however some questions seemed unanswered:
Without the temp tables being explicitly created and the query cache being made use of, "how" are the stream reads tracked on the server?
Is any temporary data created (in main memory or files on disk) whatsoever? If so, where is it created and how much?
If temporary data is not created, how are the rows to be returned tracked? Does the query engine keep track of all the page files to be read for this query on this connection? In case there are several such queries running on the server, are the earliest "Tracked" files purged in favor of queries submitted recently?
PS: I want to understand the effect of this approach on the MySQL server (I'm not saying there aren't better ways of reading the tables).
That simple query will not use a temp table. It will simply fetch the rows and transfer them to the client until it finishes. Nor would any possible index be useful. (If the real query is more complex, let's see it.)
The client may wait for all the rows (faster, but memory intensive) before it hands any to the user code, or it may hand them off one at a time (much slower).
I don't know the details in JDBC on specifying it.
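For what it's worth, with MySQL Connector/J the combination shown in the question (TYPE_FORWARD_ONLY, CONCUR_READ_ONLY, fetch size of Integer.MIN_VALUE) puts the driver into row-by-row streaming. A minimal read loop might look like the sketch below; java.sql imports and an open connection are assumed, and the per-row processing is a placeholder.

// Sketch: streaming read with MySQL Connector/J (assumes an open `connection`).
PreparedStatement stmt = connection.prepareStatement("SELECT * FROM large_table",
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);   // tells Connector/J to stream, not buffer
long rowCount = 0;
try (ResultSet rs = stmt.executeQuery()) {
    while (rs.next()) {
        rowCount++;                     // placeholder: process one row at a time here
    }
} finally {
    stmt.close();
}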
You may want to page through the table. If so, don't use OFFSET, but use the PRIMARY KEY and "remember where you left off". More discussion: http://mysql.rjweb.org/doc.php/pagination
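A minimal keyset-pagination sketch along those lines, assuming an auto-increment PRIMARY KEY named id and a payload column (both hypothetical): remember the last id you saw and ask only for rows after it.

// Sketch: "remember where you left off" pagination (assumes an open `connection`).
long lastId = 0;
boolean more = true;
while (more) {
    more = false;
    try (PreparedStatement ps = connection.prepareStatement(
            "SELECT id, payload FROM large_table WHERE id > ? ORDER BY id LIMIT 1000")) {
        ps.setLong(1, lastId);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                lastId = rs.getLong("id");   // remember where we left off
                more = true;
                // process the row here
            }
        }
    }
}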
Your Question #3 leads to a complex answer...
Every query brings all the relevant data (and index entries) into RAM. The data/index is read in chunks ("blocks") of 16KB from the BTree structure that is persisted on disk. For a simple select like that, it will read the blocks 'sequentially' until finished.
But, be aware of "caching":
If a block is already in RAM, no I/O is needed.
If a block is not in the cache ("buffer_pool"), it will, if necessary, bump some block out and read the desired block in. This is very normal, and very common. Do not fear it.
Because of the simplicity of the query, only a few blocks ever need to be in RAM at any moment. Hence, if your buffer pool were only a few megabytes, it could still handle, say, a 1TB table. There would be a lot of I/O, and that would impact other operations.
As for "tracking", let me use the analogy of reading a long book in a single sitting. There is nothing to track, you are simply turning pages ('blocks'). You don't even need a 'bookmark' for tracking, it is next-next-next...
Another note: InnoDB uses "B+Tree", which includes a link from one block to the "next", thereby making the page turning efficient.
Another interpretation of tracking... "Transactions" and "ACID". When any query (read or write) touches a table, there is some form of lock applied to each row touched. For SELECT the lock is rather light-weight. For writes it can cause delays or even a "deadlock". The locks are unavoidable, but sometimes actions can be taken to minimize their impact.
Logically (but not actually), a "snapshot" of all rows in all tables is taken at the instant you start a transaction. This allows you to see a consistent view of everything, even if other connections are changing rows. The underlying mechanism is very lightweight on reading, but heavier for writes. Writes will make a copy of the row so that each connection sees the snapshot that it 'should' see. Also, the copy allows for ROLLBACK and recovery from a crash (eg power failure).
(Transaction "isolation" mode allows some control over the snapshot.) To get the optimal performance for your case, do nothing special.
Here's a way to conceptualize the handling of transactions: Each row has a timestamp associated with it. Each query saves the start time of the query. The query can "see" only rows that are older than that start time. A subsequent write in another connection will be creating copies of rows with a later timestamp, hence not visible to the SELECT. Hence, the onus is on writes to do extra work; reads are cheap.
As part of the persistence process in one of my models an md5 check_sum of the entire record is generated and stored with the record. The md5 check_sum contains a flattened representation of the entire record including all EAV attributes etc. This makes preventing absolute duplicates very easy and efficient.
I am not using a unique index on this check_sum for a specific reason: I want this all to be silent, i.e. if a user submits a duplicate then the app just silently ignores it and returns the already existing record. This ensures backwards compatibility with legacy apps and APIs.
I am using Laravel's eloquent. So once a record has been created and before committing the application does the following:
$taxonRecords = TaxonRecord::where('check_sum', $taxonRecord->check_sum)->get();
if ($taxonRecords->count() > 0) {
DB::rollBack();
return $taxonRecords->first();
}
However, I recently encountered a 60,000-to-1 incident (odds based on record counts at that time). A single duplicate ended up in the database with the same check_sum. When I reviewed the logs I noticed that the creation times were identical down to the second. Further investigation of the Apache logs showed a valid POST, but the POST was duplicated. I presume the user's browser malfunctioned or something, but both POSTs arrived simultaneously, resulting in two simultaneous transactions.
My question is: how can I ensure that a transaction and its contained SELECT for the previous check_sum are atomic and isolated? Based on my reading, the answer lies in https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html and isolation levels.
If transaction A and transaction B arrive at the server at the same time, they should not run side by side; the second should wait for the first to complete.
You created a classic race condition. Both transactions are calculating the checksum while they're both in progress, not yet committed. Neither can read the other's data, since they're uncommitted. So they calculate that they're the only one with the same checksum, and they both go through and commit.
To solve this, you need to run such transactions serially, to be sure that there aren't other concurrent transactions submitting the same data.
You may have to use GET_LOCK() before starting your transaction to calculate the checksum, then RELEASE_LOCK() after you commit. That will make sure other concurrent requests wait for your data to be committed, so they will see it when they try to calculate their checksum.
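A rough sketch of that ordering, shown here as plain SQL through JDBC (the original code is Laravel/PHP, but the two MySQL calls are the same from any client; the lock name, timeout, and variables are illustrative).

// Sketch: serialize same-checksum requests with MySQL named locks.
// Assumes an open `connection` and a `checkSum` string; names are illustrative.
String lockName = "taxon_checksum:" + checkSum;
try (PreparedStatement getLock = connection.prepareStatement("SELECT GET_LOCK(?, 10)")) {
    getLock.setString(1, lockName);
    try (ResultSet rs = getLock.executeQuery()) {
        rs.next();
        if (rs.getInt(1) != 1) {
            throw new IllegalStateException("could not acquire checksum lock");
        }
    }
}
try {
    // begin transaction, SELECT by check_sum, insert only if absent, commit
} finally {
    try (PreparedStatement release = connection.prepareStatement("SELECT RELEASE_LOCK(?)")) {
        release.setString(1, lockName);
        release.executeQuery();
    }
}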
I was just wondering how most relational databases handle maintaining your result set if another query edits the rows you were working on. For instance, if I select about 100k rows and, while I am still fetching them, another query updates one of the rows that hasn't been read yet, the update is not going to be seen while fetching those rows, and I was wondering how the database engine handles that. If you only have specifics for one type of database, that's fine; I would like to hear it anyway.
Please look up Multi-Version Concurrency Control. Different databases have different approaches to managing this. For MySQL/InnoDB, you can try http://dev.mysql.com/doc/refman/5.0/en/innodb-multi-versioning.html. PostgreSQL: https://wiki.postgresql.org/wiki/MVCC. A great presentation is here: http://momjian.us/main/writings/pgsql/mvcc.pdf. It is also explained on Stack Overflow in the thread "Database: What is Multiversion Concurrency Control (MVCC) and who supports it?"
The general goal you are describing in concurrent programming (Wikipedia concurrency control) is serialization (Wikipedia serializability): an implementation manages the database as if transactions occurred without overlap in some order.
The importance of that is that only then does the system act in the way the code describes as we normally interpret it. Otherwise the results of operations are a combination of all processes acting concurrently. Nevertheless, by allowing limited categories of non-normal, non-isolated, so-called anomalous behaviours to arise, transaction throughput can be increased. So those implementation techniques are also apropos (e.g. MVCC). But understand: such non-serialized behaviour is not isolating one transaction from another. (I.e. so-called "isolation" levels are actually non-isolation levels.)
Isolation is managed by breaking transaction implementations into pieces based on reading and writing shared resources and executing them interlaced with pieces from other transactions in such a way that the effect is the same as some sequence of non-overlapped execution. Roughly speaking, one can "pessimistically" "lock" out other processes from changed resources and have them wait or "optimistically" "version" the database and "roll back" (throw away) some processes' work when changes are unreconcilable (unserializable).
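As a concrete illustration of the "optimistic" flavour, many applications emulate it with a version column: the UPDATE only succeeds if nobody changed the row since it was read, and zero affected rows means the work must be retried or rolled back. (The table and column names below are hypothetical; an open JDBC connection and values read earlier are assumed.)

// Sketch: optimistic concurrency via a version column.
// `newBalance`, `accountId`, `versionReadEarlier` come from the earlier read.
try (PreparedStatement ps = connection.prepareStatement(
        "UPDATE account SET balance = ?, version = version + 1 WHERE id = ? AND version = ?")) {
    ps.setBigDecimal(1, newBalance);
    ps.setLong(2, accountId);
    ps.setLong(3, versionReadEarlier);
    if (ps.executeUpdate() == 0) {
        // conflict: another transaction changed the row first; retry or roll back
    }
}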
Some techniques based on an understanding of serializability by an implementer for a major product are in this answer. For relevant notions and techniques, see the Wikipedia articles or a database textbook. Eg Fundamentals of database systems by Ramez Elmasri & Shamkant B. Navathe. (Many textbooks, slides and courses are free online.)
(Two answers and a comment to your question mention MVCC. As I commented, not only is MVCC just one implementation technique, it doesn't even support transaction serialization, i.e. actually isolating transactions as if each were done all at once. It allows certain kinds of errors (aka anomalies) and must be mixed with other techniques to achieve isolation. The MVCC answers, comments and upvoting reflect a confusion between a popular and valuable technique for a useful but limited failure to isolate, per your question, and the actual core issues and means.)
As Jayadevan notes, the general principle used by most widely used databases that permit you to modify values while they're being read is multi-version concurrency control, or MVCC. All widely used modern RDBMS implementations that support reading rows that are being updated rely on some concept of row versioning.
The details differ between implementations. (I'll explain PostgreSQL's a little more here, but you should accept Jayadevan's answer not mine).
PostgreSQL uses transaction ID ranges to track row visibility. So there'll be multiple copies of a tuple in a table, but any given transaction can only "see" one of them. Each transaction has a unique ID, with newer transactions having newer IDs. Each tuple has hidden xmin and xmax fields that track which transactions can "see" the tuple. Insertion is implemented by setting the tuple's xmin so that transactions with lower xids know to ignore the tuple when reading the table. Deletion is implemented by setting the tuple's xmax so that transactions with higher xids know to ignore the tuple when reading the table. Updates are implemented by effectively deleting the old tuple (setting xmax) then inserting a new one (setting xmin), so that old transactions still see the old tuple but new transactions see the new tuple. When no running transaction can still see a deleted tuple, vacuum marks it as free space to be overwritten.
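You can watch this machinery from SQL: PostgreSQL lets you select the otherwise hidden xmin and xmax system columns by name, which is a handy way to see row versions change as concurrent transactions update rows. (The accounts table below is hypothetical; an open JDBC connection to PostgreSQL is assumed.)

// Sketch: peek at PostgreSQL's hidden system columns (assumes `pgConnection`).
try (Statement st = pgConnection.createStatement();
     ResultSet rs = st.executeQuery("SELECT xmin, xmax, id FROM accounts")) {
    while (rs.next()) {
        System.out.printf("xmin=%s xmax=%s id=%d%n",
                rs.getString("xmin"), rs.getString("xmax"), rs.getLong("id"));
    }
}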
Oracle uses undo and redo logs, so there's only one copy of the tuple in the main table, and transactions that want to find old versions have to go and find them in the logs. Like PostgreSQL, it uses row versioning, though I'm less sure of the details.
Pretty much every other DB uses a similar approach these days. Those that used to rely on locking, like older MS-SQL versions, have moved to MVCC.
MySQL uses MVCC with InnoDB tables, which are now the default. MyISAM tables still rely on table locking (but they'll also eat your data, so don't use them for anything you care about).
A few embedded DBs, like SQLite, still rely only on table locking - which tends to require less wasted disk space and I/O overhead, at the cost of greatly reduced concurrency. Some databases let you bypass MVCC if you take an exclusive lock on a table.
(Marked community wiki, since I also close-voted this question).
You should also read the PostgreSQL docs on transaction isolation and locking, and similar documentation for other DBs you use. See the Wikipedia article on isolation.
Snapshot isolation solves the problem you are describing. If you use locking, you can see the same record twice as you iterate through the database, because the unlocked records change underneath your feet as you're doing the scan.
Read committed isolation level with locking suffers from this problem.
Depending on the granularity of the lock, the WHERE predicate may lock matching pages and tuples so that the running read query doesn't see phantom data appearing (phantom reads).
I implemented multiversion concurrency control in my Java project. A transaction is given a monotonically increasing timestamp which starts at 0 and goes up by 1 each time the transaction is aborted. Later transactions have higher timestamps. When a transaction goes to read, it can only see data that has a timestamp less than or equal to its own and is committed for that key (or column of that tuple). (Equal to, so it can see its own writes.)
When a transaction writes, it updates the committed timestamp for that key to that transaction's timestamp.
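A minimal sketch of that visibility rule (class and method names are mine, and it ignores aborts, uncommitted writes and garbage collection): each key maps to committed versions ordered by commit timestamp, and a transaction sees the newest version whose timestamp is less than or equal to its own.

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of timestamp-based visibility over committed versions only.
final class VersionedStore {
    private final Map<String, TreeMap<Long, String>> committed = new HashMap<>();

    synchronized String read(String key, long txTimestamp) {
        TreeMap<Long, String> versions = committed.get(key);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> visible = versions.floorEntry(txTimestamp);
        return visible == null ? null : visible.getValue();
    }

    synchronized void commitWrite(String key, long txTimestamp, String value) {
        committed.computeIfAbsent(key, k -> new TreeMap<>()).put(txTimestamp, value);
    }
}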
I'm in a situation where an entire column in a table (used for user tokens) needs to be wiped, i.e., all user tokens are reset simultaneously. There are two ways of going about it: reset each user's token individually with a separate UPDATE query; or make one big query that affects all rows.
The advantage of one big query is that it will obviously be much faster, but I'm worried about the implications of a large UPDATE query when the database is big. Will requests that occur during the query be affected?
Afraid it's not that simple. Even if you enable dirty reads, running one big update has a lot of drawbacks:
a long-running transaction that updates one column will effectively block other insert, update and delete transactions.
a long-running transaction causes enormous load on disk, because the server has to write everything that is taking place to a log file so that you can roll back that huge transaction.
if the transaction fails, you would have to rerun it entirely; it is not restartable.
So if the "simultaneously" requirement can be interpreted as "in one batch that may take a while to run", I would opt for batching it. A good research write-up on the performance of DELETEs in MySQL is here: http://mysql.rjweb.org/doc.php/deletebig, and I think most of the findings are applicable to UPDATE.
The trick will be finding the optimal "batch size".
An added benefit of batching is that you can make this process resilient to failures and restart-friendly.
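One way to batch it, using the fact that MySQL allows LIMIT on a single-table UPDATE, is to keep updating small chunks with autocommit on until nothing is left to touch. (The table, column and batch size below are placeholders; an open JDBC connection is assumed.)

// Sketch: wipe the token column in small autocommitted batches.
int affected;
do {
    try (Statement st = connection.createStatement()) {
        affected = st.executeUpdate(
                "UPDATE user_tokens SET token = NULL WHERE token IS NOT NULL LIMIT 1000");
    }
    // optionally pause briefly here to let other transactions through
} while (affected > 0);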
The answer depends on the transaction and isolation level you've established.
You can set isolation to allow "dirty reads", "phantom reads", or force serialization of reads and writes.
However you do that UPDATE, you'll want it to be a single unit of work.
I'd recommend minimizing network latency and updating all the user tokens in one network roundtrip. This means either writing a single query or batching many into one request.
As the question says, is there ever a reason to wrap read-only sql statements in a transaction? Obviously updates require transactions.
You still need a read-lock on the objects you operate on. You want to have consistent reads, so writing the same records shouldn't be possible while you're reading them...
If you issue several SELECT statements in a single transaction, you will also produce several read-locks.
SQL Server has some good documentation on this (the "read-lock" is called shared lock, there):
http://msdn.microsoft.com/en-us/library/aa213039%28v=sql.80%29.aspx
I'm sure MySQL works in similar ways.
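In MySQL/InnoDB you can also take that shared lock explicitly with a locking read; the older syntax is LOCK IN SHARE MODE (MySQL 8.0 also accepts FOR SHARE). A sketch, with a hypothetical accounts table and an open JDBC connection assumed:

// Sketch: explicit shared-lock read; writers block on this row until we commit.
connection.setAutoCommit(false);
try (Statement st = connection.createStatement();
     ResultSet rs = st.executeQuery(
             "SELECT balance FROM accounts WHERE id = 1 LOCK IN SHARE MODE")) {
    while (rs.next()) {
        System.out.println(rs.getBigDecimal("balance"));
    }
}
connection.commit();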
Yes, if it's important that the data is consistent across the SELECT statements run. For instance, if you were getting the balances of several bank accounts for a user, you wouldn't want the balance values read to be inconsistent. E.g. if this happened:
With balance values B1=10 and B2=20
Your code reads B1= 10.
Transaction TA1 starts on another DB client
TA1 writes B1 to 20, B2 to 10
TA1 commits
Your code reads B2 = 10
So you now think that B1 is 10 and B2 is 10, which could be displayed to the user and that says that $10 has disappeared!
Transactions for reading will prevent this, since we would read B2 as 20 in step 5 (assuming a multiversioning concurrency control DB, which MySQL+InnoDB is).
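A sketch of the consistent-read version of that example with MySQL/InnoDB (hypothetical accounts table; java.sql and java.math.BigDecimal imports and an open connection assumed): both balances are read inside one REPEATABLE READ transaction, so a transfer committed between the two reads cannot make $10 "disappear".

// Sketch: read both balances from the same snapshot.
connection.setAutoCommit(false);
connection.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
BigDecimal b1;
BigDecimal b2;
try (PreparedStatement ps = connection.prepareStatement(
        "SELECT balance FROM accounts WHERE id = ?")) {
    ps.setInt(1, 1);
    try (ResultSet rs = ps.executeQuery()) {
        rs.next();
        b1 = rs.getBigDecimal(1);
    }
    // even if another client moves money between these two reads and commits,
    // this transaction still sees the snapshot from its first read
    ps.setInt(1, 2);
    try (ResultSet rs = ps.executeQuery()) {
        rs.next();
        b2 = rs.getBigDecimal(1);
    }
} finally {
    connection.commit();
}
System.out.println(b1.add(b2));   // stays 30 in the example above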
MySQL 5.1 with the InnoDB engine has a default transaction isolation level of REPEATABLE READ. So if you perform your SELECTs inside a transaction, no dirty reads or non-repeatable reads can happen. That means that even with transactions committing between two of your queries, you'll always get a consistent database. In theory, under REPEATABLE READ you could only fear phantom reads, but with InnoDB this cannot even occur. So by simply opening a transaction you can assume database consistency (coherence) and perform as many SELECTs as you want without fearing parallel-running-and-ending write transactions.
Do you have any interest in having such a big consistency constraint? Well, it depends on what you're doing with your queries. Having inconsistent reads means that if one of your queries is based on a result from a previous one, you may have problems:
if you're performing only one query, you do not care at all
if none of your queries assumes a result from a previous one, do not care
if you never re-read a record in the same session, same thing
if you always read dependencies of your main record in the same query and do not use lazy loading, no problem
if a small inconsistency between your first and last query will not break your code, then forget about it. But be careful: this can make for a very hard-to-debug (and hard-to-reproduce) application bug. So write robust application code, something which can handle database errors and crash nicely (or not even crash) when this occurs (twice in one year?)
if you show critical data (I mean bank accounts and not blogs or chats), then you should maybe care about it
if you have a lot of write operations, then you increase the risk of inconsistent reads; you may need to add transactions at least at some key points
you may need to test the impact on performance: having all read requests in transactions, when several write transactions are really altering the data, will certainly slow the engine, since it needs to handle several versions of the data. So you should check whether the impact is too big for your application