Getting stale results in multiprocessing environment - sqlalchemy

I am using 2 separate processes via multiprocessing in my application. Both have access to a MySQL database via sqlalchemy core (not the ORM). One process reads data from various sources and writes them to the database. The other process just reads the data from the database.
I have a query which gets the latest record from a table and displays its id. However, it always displays the first id that existed when I started the program rather than the most recently inserted id (new rows are created every few seconds).
If I run the query manually in a separate MySQL tool I get the correct results, but SQLAlchemy always gives me stale results.

Since you can see the changes your writer process is making from another MySQL tool, the writer process is indeed committing its data (at least it is if you are using InnoDB).
InnoDB shows you the state of the database as of when you started your transaction. Whatever other tools you are using probably have an autocommit feature turned on where a new transaction is implicitly started following each query.
To see the changes in SQLAlchemy do as zzzeek suggests and change your monitoring/reader process to begin a new transaction.
One technique I've used to do this myself is to add autocommit=True to the execution_options of my queries, e.g.:
from sqlalchemy import select

result = conn.execute(
    select([table])
    .where(table.c.id == 123)
    .execution_options(autocommit=True)
)
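Note that the autocommit execution option is a SQLAlchemy 1.x feature; it was deprecated in 1.4 and removed in 2.0. On those versions the equivalent fix is to give each query its own short transaction, roughly like this (a sketch; engine and table stand in for your own objects):

from sqlalchemy import select

# Each begin() block runs in a fresh transaction (committed on exit),
# so every query gets a fresh InnoDB snapshot of committed rows.
with engine.begin() as conn:
    result = conn.execute(select(table).where(table.c.id == 123))
    row = result.first()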

Assuming you're using InnoDB, the data on your connection will appear "stale" for as long as you keep the current transaction running. For one process to see data from the other, two things need to happen: 1. the transaction that created the new data must be committed, and 2. the current transaction, assuming it has already read some of that data, must be rolled back, or committed and started again. See The InnoDB Transaction Model and Locking.
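For a polling reader this just means ending the transaction between iterations. A minimal sketch (with hypothetical engine and table objects; leaving the block rolls the implicit transaction back, which is enough since the reader writes nothing):

import time

from sqlalchemy import func, select

while True:
    # Leaving the connect() block ends the connection's transaction,
    # so each iteration starts a new snapshot and sees rows the
    # writer has committed in the meantime.
    with engine.connect() as conn:
        latest_id = conn.execute(select(func.max(table.c.id))).scalar()
        print(latest_id)
    time.sleep(5)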

Related

MySQL/MariaDB InnoDB Simultaneous Transactions & Locking Behaviour

As part of the persistence process in one of my models, an md5 check_sum of the entire record is generated and stored with the record. The check_sum is computed over a flattened representation of the entire record, including all EAV attributes, which makes preventing absolute duplicates very easy and efficient.
I am deliberately not using a unique index on this check_sum because I want the whole process to be silent, i.e. if a user submits a duplicate, the app just silently ignores it and returns the already existing record. This ensures backwards compatibility with legacy apps and APIs.
I am using Laravel's Eloquent. Once a record has been created, and before committing, the application does the following:
$taxonRecords = TaxonRecord::where('check_sum', $taxonRecord->check_sum)->get();
if ($taxonRecords->count() > 0) {
    DB::rollBack();
    return $taxonRecords->first();
}
However I recently encountered a 60,000-to-1 incident (odds based on record counts at the time): a single duplicate ended up in the database with the same check_sum. When I reviewed the logs I noticed that the creation times were identical down to the second. Further investigation of the Apache logs showed a valid POST, but the POST was duplicated. I presume the user's browser malfunctioned somehow, but both POSTs arrived simultaneously, resulting in two simultaneous transactions.
My question is: how can I ensure that a transaction and its contained SELECT for an existing check_sum are atomic and isolated? Based on my reading, the answer lies in https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html and isolation levels.
If transaction A and transaction B arrive at the server at the same time, they should not run side by side; the second should wait for the first to complete.
You've created a classic race condition. Both transactions check for the checksum while both are in progress and not yet committed. Neither can read the other's data, since it's uncommitted, so each concludes it's the only one with that checksum, and both go through and commit.
To solve this, you need to run such transactions serially, to be sure that there aren't other concurrent transactions submitting the same data.
You may have to use GET_LOCK() before starting your transaction to calculate the checksum, then RELEASE_LOCK() after you commit. That will make sure other concurrent requests wait for your data to be committed, so they will see it when they try to calculate their checksum.
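In outline, the lock brackets the whole check-and-insert. A sketch in Python with MySQLdb rather than Laravel, since the pattern is client-agnostic (the taxon_records table and its columns here are hypothetical):

import MySQLdb

def insert_if_new(conn, check_sum, payload):
    cur = conn.cursor()
    # Serialize all checksum inserts behind one named lock; block for
    # up to 10 seconds waiting for a concurrent request to finish.
    cur.execute("SELECT GET_LOCK('taxon_checksum', 10)")
    if cur.fetchone()[0] != 1:
        raise RuntimeError("could not acquire checksum lock")
    try:
        cur.execute("SELECT id FROM taxon_records WHERE check_sum = %s",
                    (check_sum,))
        row = cur.fetchone()
        if row:
            return row[0]  # duplicate: silently return the existing record
        cur.execute("INSERT INTO taxon_records (check_sum, payload) "
                    "VALUES (%s, %s)", (check_sum, payload))
        conn.commit()      # commit before releasing, so waiters see the row
        return cur.lastrowid
    finally:
        cur.execute("SELECT RELEASE_LOCK('taxon_checksum')")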

View database entries created by factory_girl in a mysql database

Is it possible to view the database entries (for example with phpMyAdmin) which were created by a factory? My tests are successful, so the database entry should exist. But when I add sleep(60) to my test (after creating the entry), I can't find any entries in my database.
In most setups for FactoryGirl, your database entries will be inserted in a transaction that is never committed. That means the records will never be visible outside that one test.
If you're using RSpec, you can set config.use_transactional_fixtures = false.
If you're using DatabaseCleaner, you can use DatabaseCleaner.strategy = :truncation.
After either change, records are actually committed, so they become visible outside the test. This will likely make your tests a little slower.

Should I commit after a single select

I am working with MySQL 5.0 from Python using the MySQLdb module.
Consider a simple function to load and return the contents of an entire database table:
def load_items(connection):
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM MyTable")
    return cursor.fetchall()
This query is intended to be a simple data load and not have any transactional behaviour beyond that single SELECT statement.
After this query is run, it may be some time before the same connection is used again to perform other tasks, though other connections can still be operating on the database in the meantime.
Should I be calling connection.commit() soon after the cursor.execute(...) call to ensure that the operation hasn't left an unfinished transaction on the connection?
There are two things you need to take into account:
the isolation level in effect
what kind of state you want to "see" in your transaction
The default isolation level in MySQL is REPEATABLE READ, which means that if you run a SELECT twice inside a transaction you will see exactly the same data even if other transactions have committed changes in the meantime.
Most of the time people expect to see committed changes when running the second SELECT statement - that is the behaviour of the READ COMMITTED isolation level.
If you did not change the default level in MySQL and you do expect to see committed changes on the second SELECT, then you can't run both in the "same" transaction: you need to commit after your first SELECT statement so that the next one starts a new transaction.
If you actually want a consistent view of the data throughout your transaction, then of course you should not commit.
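MySQLdb also runs with autocommit disabled by default, so even the single SELECT in the question opens a transaction that stays open on the connection. If the goal is a plain data load with fresh results on later reads, committing right after the fetch is a reasonable pattern (a sketch of the function from the question):

def load_items(connection):
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM MyTable")
    rows = cursor.fetchall()
    # End the implicit transaction so the connection does not keep an
    # old REPEATABLE READ snapshot open until its next use.
    connection.commit()
    return rows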
"then after several minutes, the first process carries out an operation which is transactional and attempts to commit. Would this commit fail?"
That depends entirely on your definition of "is transactional". Anything you do in a relational database is transactional. (That's not entirely true for MySQL, actually, but for the sake of argument you can assume it is if you are only using InnoDB as your storage engine.)
If that "first process" only selects data (i.e. a "read only transaction"), then of course the commit will work. If it tried to modify data that another transaction has already committed and you are running with REPEATABLE READ you probably get an error (after waiting until any locks have been released). I'm not 100% about MySQL's behaviour in that case.
You should really try this manually with two different sessions using your favorite SQL client to understand the behaviour. Change your isolation level as well to see the effects of the different levels.
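The same experiment can also be scripted with MySQLdb (a sketch; MyTable and its name column are placeholders for any InnoDB table you can insert into):

import MySQLdb

a = MySQLdb.connect(db="test")   # session A: the reader
b = MySQLdb.connect(db="test")   # session B: the writer

ca, cb = a.cursor(), b.cursor()

ca.execute("SELECT COUNT(*) FROM MyTable")
print(ca.fetchone()[0])          # first read opens A's snapshot

cb.execute("INSERT INTO MyTable (name) VALUES ('x')")
b.commit()                       # B's change is committed

ca.execute("SELECT COUNT(*) FROM MyTable")
print(ca.fetchone()[0])          # unchanged under REPEATABLE READ

a.commit()                       # end A's transaction
ca.execute("SELECT COUNT(*) FROM MyTable")
print(ca.fetchone()[0])          # now includes B's insert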

Mysql InnoDB table locked but I can "select" from another session. What gives?

During development of some code I needed to 'write lock' an InnoDB table in order to avoid concurrency problems (race conditions). A 'read lock' is not good enough: a parallel session that reads a table locked by another session can get false data, because what it reads might be deleted once the locking session finishes its job.
So much for why I need a 'write lock'. Comments on this are welcome, but it would simply take too long to explain why (to my humble mind) I cannot see any alternative to a complete lock of the table.
Now, for my tests, I opened two MySQL command-line sessions, both as a regular user (not root or similar). In one session I did:
lock tables mytable write;
which completed OK (Query OK, 0 rows affected...)
In the second command-line session I connected to the same DB and ran a simple SELECT * on the same table. To my surprise, I got a full response.
In further tests from the actual web application I noticed that in some use cases involving the web app (PHP + PDO with the persistent-connections attribute on), a command-line or web MySQL connection did block until the lock was released, but I could not identify what exactly caused this (desired) effect, and it also involved a different environment (PHP + PDO as described, versus two command-line sessions).
My question is: why? Why wasn't the second command-line session, running a simple SELECT on the write-locked table, blocked?
Does this have to do with the nature of InnoDB locks, which are row-based? If so, how exactly does that relate?
How do I get such a simple lock implemented on an InnoDB table? I know I can create a 'semaphore' MyISAM table with no purpose other than to act as a 'traffic light', but that loses the DB-level protection and moves all of it into the app level, where it may be done (or done wrongly).
TIA!
MySQL version is 5.1.54 (Ubuntu 11.04).
While InnoDB has row-level locking, it also has multi-version concurrency control (http://en.wikipedia.org/wiki/Multiversion_concurrency_control), so readers don't need to be blocked by writers: they simply see the current committed version of the record. (In the technical implementation, on update the row is modified in place and the previous version is written to undo space for older transactions.)
If you want to make the write lock block readers, you need to change the SELECT to be FOR UPDATE (i.e. SELECT * FROM my_table WHERE cola = n FOR UPDATE).

django/innodb -- problem with old sessions and transactions

We just switched our MySQL database from MyISAM to InnoDB, and we are seeing an odd issue arise in Django. Whenever we make a database transaction, existing sessions do not pick it up...ever. We can see the new record in the database from a MySQL terminal, but existing Django sessions (i.e. a shell that was already open) do not register the change. For example:
Shell 1:
>>> my_obj = MyObj.objects.create(foo="bar")
>>> my_obj.pk
1
Shell 2 (was open before the above)
>>> my_obj = MyObj.objects.filter(pk=1)
[]
Shell 3 (MySQL):
mysql> select id from myapp_my_obj where id = 1;
+----+
| id |
+----+
|  1 |
+----+
Does anyone know why this might be happening?
EDIT: To clarify, Shell 2 was opened before Shell 1; then I ran the create in Shell 1, and then I tried to view the object I had created from Shell 2.
EDIT2: The big picture is that I have a celery task that is being passed the primary key from the object that is created. When I was using MyISAM, it found it every time, and now it throws ObjectDoesNotExist, even though I can see that the object is created in the database.
Your create() command commits the transaction for the current shell, but doesn't do anything to the transaction in the second shell.
https://docs.djangoproject.com/en/dev/topics/db/transactions/
Your second thread can't see what's done in the first because it is in a transaction of its own. Transactions isolate the database so that when a transaction commits, everything appears to happen at a single point in time, and statements inside a transaction (including SELECTs) see one consistent view. This is the I (isolation) in ACID. Try running
from django.db import transaction; transaction.commit()
in the second shell. That should commit the current transaction and start a new one. You can also use transaction.rollback() to achieve the same thing if you haven't modified anything in the db in the current shell.
Edit 2:
You may need to grab your specific db connection to make this work. Try this:
import django.db
django.db.connection._commit()
More information about this problem here:
http://groups.google.com/group/django-users/msg/55fa3724d2754013
The relevant bit is:
If you want script1.py (using an InnoDB table) to see committed updates from other transactions, you can change the transaction isolation level like so:
from django.db import connection
connection.cursor().execute('set transaction isolation level read committed')
Alternatively you can enable the database's version of autocommit, which "commits" queries as well as updates, so that each new query by script1 will be in its own transaction:
connection.cursor().execute('set autocommit=1')
Either one allows script1 to see script2's updates.
So the tl;dr is that you need to set your InnoDB transaction isolation level to READ COMMITTED.
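On recent Django versions you can make this setting permanent per connection instead of issuing the SET statement by hand (a sketch assuming Django 2.0+ with the MySQL backend; database name and credentials are placeholders):

# settings.py
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "...",
        "OPTIONS": {
            # Applied to every new connection, so long-lived processes
            # stop reading from stale REPEATABLE READ snapshots.
            "isolation_level": "read committed",
        },
    },
}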