Quartz JobStoreTX instances disappear on cluster recovery - MySQL

I have configured two Java WARs with Quartz schedulers (version 2.2.1) that start via XMLSchedulingDataProcessorPlugin. Both web applications also run in cluster mode (they are deployed on two identical machines), so I enabled the clustering properties for Quartz:
#===========================================================================
# Clustering
#===========================================================================
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 60000
Both applications are running in JBoss AS 7.1 configured with Quartz's JobStoreTX. They save their jobs, triggers and so on into a MySQL database, which is currently set up as a Galera cluster (one virtual IP address, two real nodes).
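For reference, here is a minimal sketch of the relevant quartz.properties for a clustered JDBCJobStore setup; the data source name (myDS), JDBC URL, and credentials are placeholders rather than my actual values:
org.quartz.scheduler.instanceName = MyClusteredScheduler
org.quartz.scheduler.instanceId = AUTO
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.tablePrefix = QRTZ_
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 60000
org.quartz.jobStore.dataSource = myDS
org.quartz.dataSource.myDS.driver = com.mysql.jdbc.Driver
org.quartz.dataSource.myDS.URL = jdbc:mysql://<galera-virtual-ip>:3306/quartz
org.quartz.dataSource.myDS.user = quartz
org.quartz.dataSource.myDS.password = <password>
org.quartz.dataSource.myDS.maxConnections = 5
Note that org.quartz.scheduler.instanceId = AUTO lets each scheduler instance register under a unique ID, which clustering requires.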
I am currently testing the failure of one of the real nodes to verify that the jobs keep firing even when a power outage occurs. In that scenario I noticed some failures, such as the one described in this Terracotta issue (the patch is not applied in the current version of Quartz).
In my case, there should be 4 Quartz instances in the QRTZ_SCHEDULER_STATE table, even if one of the MySQL nodes restarts. The fact is that sometimes one or two instances are deleted from the table (maybe the ones that do not have any active job), so I am afraid it is possible to lose both instances of an application during the cluster recovery.
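For reference, the registered instances and their check-ins can be inspected with a query like this (column names per the standard Quartz MySQL schema):
SELECT INSTANCE_NAME, LAST_CHECKIN_TIME, CHECKIN_INTERVAL
FROM QRTZ_SCHEDULER_STATE
ORDER BY INSTANCE_NAME;
When everything is healthy I see four rows here, one per scheduler instance; after a Galera node restart, one or two of them sometimes vanish.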
Has anyone experienced the same? Is there any solution other than restarting JBoss in order to reload the jobs and triggers?
Thanks in advance.

Related

Current status of support for parallel migration on a MariaDB Galera cluster

The Flyway docs state that
Flyway uses the locking technology of your database to coordinate multiple nodes. This ensures that even if multiple instances of your application attempt to migrate the database at the same time, it still works. Cluster configurations are fully supported.
When migrating on a MariaDB Galera cluster with Flyway versions > 6 (and maybe earlier, I haven't checked), MySQLConnection or MySQLNamedLockTemplate is used to coordinate locks between multiple nodes migrating simultaneously. MySQLConnection/MySQLNamedLockTemplate implement locking through GET_LOCK().
However, get_lock() is not supported in Galera:
Unsupported explicit locking include [...] GET_LOCK(), [...]
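For context, GET_LOCK() is the advisory-lock mechanism Flyway relies on here; on a single MySQL/MariaDB server it serializes sessions, but on Galera the lock stays local to one node (the lock name below is illustrative, not Flyway's actual key):
-- on a single server this serializes two competing sessions:
SELECT GET_LOCK('flyway_lock', 10);   -- returns 1 when acquired, 0 on timeout
-- ... run the migration ...
SELECT RELEASE_LOCK('flyway_lock');
-- on Galera the lock is not replicated, so a session on each node can
-- "acquire" it simultaneously and both proceed to migrate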
And indeed, when migrating in parallel on two nodes of our Galera cluster we frequently see errors because both nodes try to migrate at the same time (even in grouped mode).
It looks like this was a known issue in 2018. What is the current status of the issue? Is there a plan for supporting parallel migration on Galera clusters in the Flyway project, or is there some external project that implements this?
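One application-level workaround I have seen sketched (not a Flyway feature, and untested here) is to serialize migrations through an ordinary table, since plain row operations are replicated and certified by Galera, unlike GET_LOCK():
-- hypothetical lock table; only one node's INSERT survives certification,
-- the other gets a duplicate-key/certification error and backs off
CREATE TABLE IF NOT EXISTS migration_lock (id INT PRIMARY KEY);
INSERT INTO migration_lock (id) VALUES (1);
-- if the INSERT succeeded: run flyway migrate, then release the lock
DELETE FROM migration_lock WHERE id = 1;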

What happens when Amazon is backing up an RDS instance?

I'm using RDS (MySQL) with one of my Laravel projects, but one question keeps floating in my mind: what happens to the project while Amazon is creating a backup of the RDS instance? Does it:
Freeze the project
Throw an exception
Keep working normally
For a single-instance RDS, database I/O may be suspended for a few seconds while the snapshot is created. During this period all requests to the database will be paused, but they will resume after the snapshot is created.
So if you have a webapp, requests received during the I/O suspension period will be served more slowly than usual.
You can mitigate this with a multi-AZ RDS deployment, because in case of multi-AZ, the snapshot is taken from the standby instance. So there is no I/O suspension on the master instance.
Relevant documentation: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html#USER_WorkingWithAutomatedBackups.BackupWindow
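If you want to check whether a given instance is Multi-AZ (and therefore whether snapshots are taken from the standby), the AWS CLI can tell you; mydb below is a placeholder identifier:
aws rds describe-db-instances --db-instance-identifier mydb --query 'DBInstances[0].MultiAZ'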
Your application will continue to work normally during backups. Since AWS RDS uses volume snapshots, the MySQL service keeps running without any interruption. This is how manual snapshots and point-in-time recovery work as well.

Google Cloud MySQL 2nd Generation vs Compute Instance with MySQL

The new Google Cloud MySQL Second Generation spins up its own VM instance to run the MySQL server.
What is the difference between using the 2nd Generation instance, or using my own Compute VM instance with a manually installed version of MySQL on it? Are there any advantages when it comes to high availability, security, or performance?
Adding to the answer that Terry posted, and answering your question in the comment:
You can create a highly available Cloud SQL Second Generation instance by doing the following (a gcloud sketch follows these steps):
Set up your master instance correctly including sizing it appropriately and setting up binary logging. The master instance must have one backup after binary logging has been enabled. You should place your master instance in a zone that's close to your other services. See preparing the master instance.
Create one failover replica in a different zone than the master. See creating a failover replica.
Optionally, create one or more read replicas. Note that a master instance with a failover replica is sufficient for creating a highly available configuration.
Optionally, test failover. Keep in mind that testing the failover moves the master to a new zone.
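As a sketch of steps 2 and 3 with the gcloud CLI, assuming instance names my-master and my-failover (this command group was in beta at the time, so the exact flags may differ by SDK version):
# create the failover replica in a different zone than the master
gcloud beta sql instances create my-failover --master-instance-name=my-master --replica-type=FAILOVER
# optionally, a read replica as well
gcloud beta sql instances create my-read-replica --master-instance-name=my-master --replica-type=READ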
To answer your question "So what happens if the VM instance they create fails?"
A master instance falls out of high availability mode when the failover replica becomes unavailable. This can happen, for example, if the network connection between the master instance and failover replica is interrupted, or if the failover replica is down due to its own zone failure. During this time, the master instance is not in high availability mode, and you will not be able to failover to the replica because it is not safe to do so. The failover replica resumes replication on reconnection, and high availability mode is re-enabled when the failover replica finishes catching up.
The major difference is that Cloud SQL v2 does not have to be managed by you: Google Cloud handles management, replication, and snapshots. Additionally, Cloud SQL v2 with the Cloud SQL Proxy works with the App Engine standard and flexible runtimes, allowing flexible but secure connections to SQL from other clients.
In return, you do not get any access to the underlying system.
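For illustration, a typical Cloud SQL Proxy invocation; the instance connection name (my-project:us-central1:my-instance) is a placeholder:
# expose the Cloud SQL instance on a local port through the proxy
./cloud_sql_proxy -instances=my-project:us-central1:my-instance=tcp:3306
# clients then connect to 127.0.0.1:3306 as if it were a local MySQL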

Rails app generating MySQL load though no database access is triggered

I have a Rails 4.x application running on server A and MySQL on server B.
Using ab to load-test my API calls, I notice that the MySQL server shows CPU activity. So I go back to the code and check: no SQL statements are triggered. To be sure, I also deactivate all before filters, but the MySQL server still shows CPU load.
I went to MySQL and ran
show processlist;
but that also shows no active SQL statements.
Why would there be load on the DB server?
A Rails application initializes connection pools to the configured database on app load, and it also loads basic schema data for each defined ActiveRecord model in order to populate runtime mappings from the DB to instances of that model.
These connections/queries will happen as soon as you have loaded the application and are serving traffic.
If this is not what is responsible for the activity on your database server, you will need to use other tools to see what is responsible. For example, NewRelic's system monitoring tools are great for snapshotting CPU/memory usage over time correlated to what processes were running. This will help you rule out MySQL itself using resources vs. other things running on the DB server.
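If you want to see exactly what the app sends, one low-tech option is MySQL's general query log, written to a table so it is easy to inspect (remember to switch it off afterwards, as it logs every statement):
SET GLOBAL general_log = 'ON';
SET GLOBAL log_output = 'TABLE';
-- run the ab test, then look at what actually arrived:
SELECT event_time, command_type, argument FROM mysql.general_log ORDER BY event_time DESC LIMIT 50;
SET GLOBAL general_log = 'OFF';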
According to this article, storage engines like InnoDB may have their own per-thread and/or global memory allocations, which is probably what accounts for the CPU overhead. If this is a stock (non-tuned) MySQL install, you're probably just seeing baseline CPU activity. The article mentions a number of places to look that might indicate areas that can be tuned to reduce this footprint.
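A couple of quick ways to eyeball that baseline activity from the MySQL side:
SHOW GLOBAL STATUS LIKE 'Threads_%';            -- connected/running threads
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_%'; -- buffer pool activity
SHOW ENGINE INNODB STATUS\G                     -- background I/O, purge, etc.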

How to delay ActiveRecord MySQL reconnect during a failover

We have a Rails 3.1.3 app, connecting to MySQL via the mysql2 gem. Standard config. We also have a handful of Resque workers performing background jobs. The DB hostname we point to (in database.yml) is actually a Virtual IP (VIP) that points to either node1 or node2.
Behind the scenes, the two MySQL servers (nodes) are setup in a High Availability configuration. The data folders are replicated via DRBD, with mysqld only running on the "active" node. When the cluster detects that node1 is not available, it starts mysqld on node2 and points the VIP to it.
If you want more details on the specific setup, it's very similar to this MySQL HA cookbook.
Here's the issue: When a failover happens, it takes approx 30-60 seconds to complete, during which there is no MySQL server available. Any Resque jobs that are currently running fail badly.
Here's the question(s): How can we tell ActiveRecord to reconnect after a delay? Maybe attempt several reconnects with a backoff timer? Or is there a better way of dealing with this?
Your HA setup is going to cause you infinite amounts of pain in the future. Use database-layer replication instead of block-device-layer replication; MySQL Proxy was designed to do this.
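On the narrower reconnect question: the mysql2 adapter does accept a reconnect flag in database.yml, though it retries immediately rather than with a backoff timer, so it only papers over short gaps (host and database names below are placeholders):
production:
  adapter: mysql2
  host: db-vip.example.com   # the VIP in front of node1/node2
  database: myapp_production
  reconnect: true            # let mysql2 re-establish dropped connections
For the 30-60 second failover window you would likely still need retry logic around the Resque jobs themselves.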