MySQL second connection does not see committed change from first connection

I have a problem with a Grails-based application connected to MySQL, in which a process updates a record as part of a larger transaction. This process also kicks off a second thread via a Quartz job that performs some additional changes. The Quartz job typically starts before the first thread commits the transaction, so the job loops for up to one minute, checking for the record to change to the expected state. Oddly, it works consistently in some environments, fails consistently in one, and fails infrequently in yet another.
My question is about how MySQL makes committed transactions visible across two concurrent connections. One would expect that once connection A commits, subsequent queries from connection B would see the committed change. In my case, connection B will have run the same query one or more times before connection A commits. It appears that MySQL is caching the query results for the connection. Oddly enough, while connection B is repeatedly querying and getting the old value, I can issue the same query via the mysql client and see the new value. Is anyone aware of a caching or concurrency issue that would explain this?
For the above observation I have the MySQL log enabled in order to see the individual update, commits and queries occurring.
The various environments are using different versions of MySQL as shown below. I'm in the process of upgrading my environments to the latest MySQL to see if that resolves it.
5.0.51a - two environments that have been very stable with infrequent occurrences; however, one environment started having increased occurrences over the weekend under moderate traffic.
5.1.55 - one environment consistently fails
Thanks,
John

Related

What could cause mysql db read to return stale data

I am chasing a problem in a MySQL application. At some point my client INSERTs some data, using a query wrapped in a START TRANSACTION; .... COMMIT; statement. Right after that, another client comes and reads back the data, and it is not there (I am sure of the order of things).
I am running nodejs, express, and mysql2, and I use connection pooling with multiple-statement queries.
What is interesting is that I see weird things in MySQL Workbench. I just had a Workbench instance that would not see the newly inserted data either. I opened a second one, and it saw the new data. Minutes later, the first instance still would not see the new data. I hit 'Reconnect to DBMS', and now it sees it. The Workbench behaviour, if applied to my node client, would explain the bad result I see in node / mysql2.
There is some sort of caching going on somewhere... no idea where to start :-( Any pointers? Thanks!
It sounds like your clients are living in their own snapshot of the database, which would be true if they have an open transaction using the REPEATABLE-READ isolation level. In other words, no data committed after that client started its transaction will be visible to that client.
One workaround is to force a new transaction to start. Just run COMMIT in the client session where it appears to be viewing stale data. That will resolve any open transaction and the next query will start a new transaction.
Another way you can test is to use a locking read query such as SELECT ... FOR UPDATE. This will read the most recently committed data, regardless of the client's transaction isolation level. That is, even if the client had started their transaction using REPEATABLE-READ, a locking read behaves as if they had started their transaction with READ-COMMITTED.
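As a rough illustration of the COMMIT workaround, here is a minimal Python sketch of a polling loop that ends the session's open transaction before every read, so each SELECT gets a fresh snapshot. The helper name `poll_until`, its parameters, and the `job` table are all made up for this example. The demo runs against SQLite only so that it works without a MySQL server; with a MySQL driver the placeholder style would differ (typically `%s` instead of `?`).

```python
import sqlite3
import time


def poll_until(conn, sql, params, is_ready, timeout=60.0, interval=1.0):
    """Re-run `sql` until `is_ready(row)` is true or `timeout` seconds pass.

    `conn` is any DB-API 2.0 connection. The commit() before each read ends
    whatever transaction the session has open, so under REPEATABLE READ the
    next SELECT starts a new snapshot instead of reusing the old one.
    """
    deadline = time.monotonic() + timeout
    while True:
        conn.commit()  # close the current transaction and drop its snapshot
        cur = conn.cursor()
        cur.execute(sql, params)
        row = cur.fetchone()
        cur.close()
        if row is not None and is_ready(row):
            return row
        if time.monotonic() >= deadline:
            return None  # gave up: the expected state never showed up
        time.sleep(interval)


# Demo against SQLite (a stand-in, not MySQL) with a hypothetical table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT)")
demo.execute("INSERT INTO job VALUES (1, 'DONE')")
demo.commit()
row = poll_until(demo, "SELECT state FROM job WHERE id = ?", (1,),
                 lambda r: r[0] == "DONE", timeout=1.0, interval=0.01)
```

This assumes the polling session has no uncommitted work of its own that it needs to keep separate; if it does, the read should go through a different connection.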

Is AWS Aurora Multi-Master cluster suitable for WordPress

I have a situation where during peak moments my writer database even on the largest 96 core AWS instance becomes maxed (due to limited edition promotions where we process hundreds of orders per second).
I have seen that Aurora offer a multi-master setup where all nodes of the cluster are able to write - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-multi-master.html
In the docs they mention:
If two DB instances attempt to modify the same data page at almost the same instant, a write conflict occurs. The earliest change request is approved using a quorum voting mechanism. That change is saved to permanent storage. The DB instance whose change isn't approved rolls back the entire transaction containing the attempted change. Rolling back the transaction ensures that data is kept in a consistent state, and applications always see a predictable view of the data. Your application can detect the deadlock condition and retry the entire transaction.
I am not really sure what they mean here by "data page". I am pretty sure WordPress doesn't use transactions at all but when thousands of orders are coming in and being pushed into the same table will this cause write errors that will cause orders to fail?
I have looked online and cannot find anyone talking about using WordPress with Aurora multi-master cluster. Is it compatible?

What's causing subsequent errors when restarting deadlocked transaction?

When a transaction fails at the commit stage and I restart it, I get a second failure. This is running Galera Cluster under MariaDB 10.2.6.
The sequence of events goes like this:
Commit a transaction (say a single insert).
COMMIT fails with error 1213 "Deadlock found when trying to get lock"
Begin a new transaction to replay the SQL statement[s].
BEGIN fails with error 1047 "WSREP has not yet prepared node for application use"
My application bails to avoid a more serious crash (see notes below)
This happens quite regularly and although the cluster recovers, individual threads receive failures. Yesterday this happened 15 times in one second.
I cannot identify any root cause for this. It seems that the deadlock is the initiator of the problem. The situation should be recoverable (and often is), but with multiple clients all trying to resolve their deadlocks at the same time, the whole thing seems to just fail.
Notes:
This is related to an earlier question where retrying failed transactions caused total crash of the cluster. I've managed to prevent crashes by retrying transactions only on deadlocks. i.e. if a different type of error occurs during a restart the application gives up.
I'm aware that 10.2.6 is not the latest version of MariaDB. I'm nervous to upgrade right now as I've had such bad experiences. I would like to understand the current problem before doing an upgrade and I've been unable to reproduce the errors in a test environment.
I'm not sure, but I suspect 3 tries (not 2) is appropriate. Committing involves two steps:
Checking for a Deadlock purely within the node you are connected to. (Eg: another query is touching the same row or gap.)
Checking with the other nodes to see if they will complain. (Eg: The same row has already been inserted into another node.)
Sure, either of those could happen repeatedly, and in any order. But making 3 tries seems reasonable.
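To make the "3 tries, then give up" idea concrete, here is a minimal Python sketch. `DbError` is a made-up stand-in for whatever exception your driver raises (real drivers expose the error code in their own way), and `run_with_retry` and its parameters are hypothetical names. It retries only on error 1213 and lets anything else, such as 1047, propagate immediately, which matches what the question describes doing.

```python
import time

DEADLOCK = 1213  # "Deadlock found when trying to get lock"


class DbError(Exception):
    """Hypothetical stand-in for a driver exception carrying an error code."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


def run_with_retry(transaction, max_tries=3, backoff=0.05, sleep=time.sleep):
    """Call `transaction()` up to `max_tries` times, retrying only on deadlock."""
    for attempt in range(1, max_tries + 1):
        try:
            return transaction()
        except DbError as exc:
            # Any other error, or a deadlock on the last try, is passed on.
            if exc.code != DEADLOCK or attempt == max_tries:
                raise
            sleep(backoff * attempt)  # short, growing pause before replaying


# Demo: a transaction that deadlocks twice and then succeeds.
attempts = []


def flaky_insert():
    attempts.append(1)
    if len(attempts) < 3:
        raise DbError(DEADLOCK, "Deadlock found when trying to get lock")
    return "committed"


result = run_with_retry(flaky_insert, sleep=lambda seconds: None)


# Demo: a non-deadlock error is not retried.
def not_ready():
    raise DbError(1047, "WSREP has not yet prepared node for application use")


try:
    run_with_retry(not_ready, sleep=lambda seconds: None)
    gave_up_code = None
except DbError as exc:
    gave_up_code = exc.code
```

The growing pause is my own addition, on the assumption that many clients replaying at the same instant make things worse; the value is arbitrary.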
Now, once you have failed "too many" times, it is right to abort and get a human (a DBA type) involved. I suspect that you could restructure your code / application logic / etc in some way to avoid most of the failures. Would you like to provide more details, so we can discuss that possibility...
What kind of table? (Queue, transactions, logging, etc)
SHOW CREATE TABLE. (auto_inc, unique keys, etc; too many UNIQUE keys can aggravate the situation)
What does the INSERT look like?
How often do you run inserts like this one? How often does it fail? (Instrument your code so you count even those that you can recover from.)
How spread out is the Cluster? (ping time)
What other queries are hitting the table? (They may be aggravating the issue.)

MySQL Interface with LabView - LabView is Freezing

I have a LabView data acquisition system that is writing data to a MySQL Database. It is writing data every second. The LabView system recently froze around the time when I was playing with the SQL queries.
I have a client computer, which is supposed to send queries to that MySQL Database every hour. This client computer set up a cron job to send the command to query the database every hour.
I recently added an index to my time_stamp, in order to optimize my query.
This may be a shot in the dark, but could there be any deleterious interaction between the fact that I had created an index on our time_stamp (to optimize the query), and set up a cron job to send the query every hour? Around that time, I think I may have also sent a query and aborted quickly before it completed, so I was wondering if something like that may cause the LabView system to freeze?
It doesn't appear to be an issue on the MySQL side, because the server was still running.
Check whether you currently have deadlocks on your MySQL server: the query from your LabVIEW application might be suspended, waiting forever for completion. Hint: look at the query execution time. Why this happened should be investigated separately, but there is a good chance that if you just kill that suspended query, the system will unfreeze and keep running normally.

Writing into multiple MySQL databases async

I am using AWS RDS, so database replication between regions is impossible.
My application is written in PHP and deployed in all regions. I am looking for a fast and reliable way to write to all of the regional databases.
I am going to set the following on my MySQL connections:
SET @@auto_increment_increment = NUMBER_OF_WRITEABLE_DATABASES;
SET @@auto_increment_offset = REGION_ID;
so auto-increment primary keys will be unique across all regions.
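As a quick sanity check of that idea, the following Python sketch models the id sequence each region would generate. It is a simplified model, not MySQL itself: it assumes every region uses the same increment (the number of writable databases) and its own offset between 1 and that number. The function name and the choice of 3 regions are only for illustration.

```python
from itertools import count, islice


def auto_increment_ids(offset, increment):
    """Yield offset, offset + increment, offset + 2 * increment, ..."""
    return count(offset, increment)


regions = 3  # stands in for NUMBER_OF_WRITEABLE_DATABASES
generated = {
    region_id: list(islice(auto_increment_ids(region_id, regions), 5))
    for region_id in range(1, regions + 1)
}

# Collect every id from every region to check that none is repeated.
all_ids = [i for ids in generated.values() for i in ids]
no_collisions = len(all_ids) == len(set(all_ids))
```

Under those assumptions region 1 gets 1, 4, 7, ..., region 2 gets 2, 5, 8, ..., and so on, so the sequences never overlap.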
My current plan is to keep a query log table with the fields id, queries, status, user_id. Within the same page load, it will log all INSERT, UPDATE, and DELETE queries into the queries field.
Status Codes:
Status 0 => not executed
Status 1 => successfully executed on all regions
Status 2 => failed
Status 3 => failed with affected rows not match
Example Row:
id=>1
queries=>
INSERT INTO PROFILES VALUES (1,{USER_ID},'Username','Email')##SEPERATOR##AFFECTED_COUNT
UPDATE USERS SET last_modified='2012-12...' where id={USER_ID}##SEPERATOR##AFFECTED_COUNT
status=0
user_id=>{USER_ID}
There will be a daemon that reads records whose status != 1 and runs them on all regions without committing. Once all of them have run without error, it commits; in case of an error, it rolls back.
That is what I have thought of and am going to use.
My question: is there a more decent/tested approach to this scenario, or is there any problem with my approach?
thanks in advance
My initial thought is that you are going down the wrong path if you are trying to use RDS as a solution to enforce unique record ID's across multiple regions. I would think you might want to rethink your actual need for uniqueness across regions or enforce uniqueness using multiple columns (i.e. an autoincrement plus a region identifier). That could be read and put into some eventually consistent data store for read purposes.
You're making a commendable effort, but as the other commenters have stated, your solution isn't viable, for a number of reasons.
You don't really want to use auto_increment_offset and auto_increment_increment at the session level. You want to set those at the server level. If RDS won't let you do that, this is another reason why RDS is probably not the best solution.
If I came out and suggested that you deploy a global network of MySQL servers (EC2, not RDS) in a multi-master ring, where data replicates 1 => 2 => 3 => 4 => 1 and each server ignores incoming replication messages with its own server id, my fellow MySQL DBAs would accuse me of having lost my mind and setting you up for a difficult-to-manage situation; however, I am convinced that this would be a much easier solution than what you have proposed, because at least, then, the data would be changing around the world in pretty much the same order in which it actually changed -- which would reduce the likelihood of conflicting updates originating from multiple locations. MySQL replication is asynchronous, in the sense that server 1 does not wait for a transaction to be committed on server 2 before returning success to the client (indicating that the transaction has committed), but don't confuse that fact with the fact that it is sequential -- transactions are replicated on each server in the order in which they were committed. (New options in MySQL 5.6 allow some exceptions to this by using parallel replication threads, but that isn't significant to this discussion.)
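To picture how an event travels around such a ring, here is a toy Python model. It is not MySQL code: `replicate` is a made-up function that only walks the ring order and stops when the event arrives back at the server whose id it carries.

```python
def replicate(origin, ring):
    """Return the servers that apply an event first written on `origin`.

    `ring` lists server ids in replication order; the last one feeds the
    first. The event is passed downstream until it reaches its origin,
    which recognises its own server id and ignores it.
    """
    applied = []
    position = ring.index(origin)
    while True:
        position = (position + 1) % len(ring)
        server = ring[position]
        if server == origin:
            break  # back where it started: dropped, the loop ends here
        applied.append(server)
    return applied


ring = [1, 2, 3, 4]            # 1 => 2 => 3 => 4 => 1
from_three = replicate(3, ring)
from_one = replicate(1, ring)
```

A write on server 3 is applied on 4, then 1, then 2, and is discarded when it gets back to 3.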
Since you have devised a scheme for avoiding conflicting auto-increment values, your bigger problems are likely to come from updates and deletes. In the scenario I just described, if server 2 deleted a record and server 4 deleted the same record at the same time, then server 4 would stop replicating incoming events when it received the delete from server 2, because the "rows affected" would have been different. Your scenario would similarly fail. The difference is that with actual MySQL replication, nothing happens after the conflicting event, so until you resolved that conflict, at least your data would not diverge any further into inconsistency, because of the sequential nature discussed above and the fact that MySQL replication completely stops whenever a conflict is encountered. In a ring of master servers, the server that has stopped replicating continues collecting a log of replication events from the upstream systems, but execution halts and the data on that server is frozen unless changed locally until the conflict is resolved and replication restarted.
Note also that in your scenario, you need to preserve "from" and "to" values for each column on updates, because you can't roll anything back unless you know what it rolls back to.
That being noted, a rollback needs to occur in real-time, not later. If I transfer money between two bank accounts, and for some reason that transfer needs to roll back, I need to see that while I'm using the bank's web site -- the bank can't roll that transaction back in the middle of the night just because one of their servers has a different balance in my bank account.
Here's a thought: In your scenario, if the account I was transferring "to" was consistent among all the servers, but the account I was transferring "from" was not, then I wonder... would your setup roll back the withdrawal from the "from" account, but leave the deposit in the "to" account? I think it might.
Keep in mind that you are limited by the CAP theorem. No system can be globally consistent, available, and tolerant of partitions (isolation) among the nodes. At best, you can pick any two.
With that thought, the question I have is this: why do all of the nodes in your global system need to be synchronized? If the main reason is performance, consider the possibility of deploying a single global master server, with read replicas distributed among the regions. Write your application with two pools of database connection threads so that most SELECT queries go to the local read replica, while INSERT, DELETE, UPDATE, and CALL (stored procedures that update data) are sent to the global master server. Your biggest worry, then, becomes the fact that you only have eventual consistency on the read replicas. With properly-sized servers and well-written queries, this is very fast (subject to the laws of physics for global travel of optical and electrical signals) but it is not instantaneous. To handle this, sessions that have recently made changes to the database may need to send their reads to the global master -- if you place an order, you need to see the order immediately, so the master might be the best place to look, right away. Later, looking at the local replica will work. You're still out of scope for RDS with this, because of the cross-regional issue... but MySQL on EC2 is a good fit.
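Here is a minimal Python sketch of that two-pool routing decision, to show the shape of it. Everything in it is hypothetical: the class name, the `pin_seconds` window (how long a session keeps reading from the master after it writes), and the string labels standing in for real connection pools. How long the window should be depends on your actual replication lag.

```python
import time


class ReadWriteRouter:
    """Decide whether a statement goes to the global master or a local replica."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "CALL")

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # assumed value, tune to replica lag
        self.clock = clock
        self._last_write = {}           # session id -> time of last write

    def route(self, session_id, sql):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        now = self.clock()
        if verb in self.WRITE_VERBS:
            self._last_write[session_id] = now
            return "master"
        last = self._last_write.get(session_id)
        if last is not None and now - last < self.pin_seconds:
            return "master"  # recent writer: read its own changes from the master
        return "replica"


# Demo with a fake clock so the timing is deterministic.
fake_now = [0.0]
router = ReadWriteRouter(pin_seconds=5.0, clock=lambda: fake_now[0])
before_write = router.route("alice", "SELECT * FROM orders")
write = router.route("alice", "INSERT INTO orders VALUES (1)")
fake_now[0] = 1.0
right_after = router.route("alice", "SELECT * FROM orders")
other_session = router.route("bob", "SELECT * FROM orders")
fake_now[0] = 10.0
much_later = router.route("alice", "SELECT * FROM orders")
```

A fixed time window is the simplest choice; it trades a little extra load on the master for not having to track replica positions.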
Read replicas impose a very small load on the master, but even this load can be mitigated by connecting a single read replica to the master and then connecting the downstream read replicas to that intermediate server.
Setting slave_compressed_protocol = 1 on the masters and the replicas will enable the machines to use compressed connections for transferring the replication events. I have found the compression ratio to be anywhere from 3:1 to 10:1 depending on the nature of the data being replicated, and the delay of compressing and decompressing the data seems insignificant.
Additionally, you could set up a second master, adjacent to the primary master (perhaps in a different A/Z), link those two servers with master-master replication, chain the read replicas to the 2nd master, use auto increment increment and offsets appropriately, but do not write to or read from the second master under normal conditions. Why would you do this? This way, you have a 2nd global master that could be placed into service immediately in case of failure of the primary master by redirecting your application to access it.
Of course, the nature of your application plays a large factor in how much global integration is actually required. Solving this problem will require you to rethink how the application works, to determine whether architectural changes are needed.
As a DBA, I don't like some of the restrictions and flexibility constraints that RDS imposes on me. All I really get in return for the loss-of-control is a relative ease of backups and point-in-time restoration... which I like... but, to me, these don't make up for the restrictions.
Footnote: In the 3rd paragraph, I said "transactions are replicated on each server in the order in which they were committed." But that doesn't necessarily mean in the real-world wall-clock actual-order in which they were committed... it actually means the order in which they were committed to each server relative to the other transactions being committed by that server... so a transaction on Server #1 that actually committed before a different transaction on Server #3 might arrive at server #4 after the transaction from #3 instead of before it, depending on how long the transaction took to propagate through server #2 and be committed on server #3. However, this is still "true enough" in principle, because if the transaction on #1 is perceived at server #3 as conflicting with whatever happened on #3, it will not actually replicate to #4 because #3 will stop replicating.