AWS DMS replicate-changes-only error - MySQL

I have a production AWS Aurora DB and I want to replicate changes to a test MySQL DB (the schema is the same; Aurora is based on MySQL).
I am using AWS DMS for this.
When I perform a full replication for certain tables, the replication works fine.
When I perform replicate-changes-only, the replication fails.
I've set binlog_checksum=NONE and binlog_format=ROW in the parameter group.
The error I am receiving while running is:
Last Error The task stopped abnormally Stop Reason RECOVERABLE_ERROR Error Level RECOVERABLE
Last Error Task 'task-id' was suspended due to 6 successive unexpected failures Stop Reason FATAL_ERROR Error Level FATAL
Loading a snapshot to the test db isn't an option.
I just want to replicate the changes between specific tables.
Thanks in advance.

I had the same error; the task always stopped about 10 minutes after starting. Enabling more verbose logging didn't reveal anything more, but I got it working by changing the task configuration, specifically the MaxFullLoadSubTasks parameter.
By default the value is "MaxFullLoadSubTasks": 8; I changed it to "MaxFullLoadSubTasks": 1.
It is slower, but it's working for now. You may be able to increase the value a bit to speed things up without hitting the same error.
You can modify the task configuration by first copying the task JSON settings, which you will find under DMS > TASK > overview, then changing the value, saving the result to a file, and running:
aws dms modify-replication-task --replication-task-arn <TASK_ARN_ID> --replication-task-settings file:///path/to/your/task_config.json
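The manual edit of the settings file described above could also be scripted. This is a minimal sketch, assuming the exported task settings keep MaxFullLoadSubTasks under a "FullLoadSettings" object; the function names and file paths are hypothetical and only for illustration.

```python
import json


def set_max_full_load_subtasks(settings: dict, value: int) -> dict:
    """Return a copy of the task settings with MaxFullLoadSubTasks set to `value`.

    Assumes the setting lives under the "FullLoadSettings" key of the
    exported DMS task settings.
    """
    if value < 1:
        raise ValueError("MaxFullLoadSubTasks must be at least 1")
    # Round-trip through JSON to get a deep copy and leave the input untouched.
    updated = json.loads(json.dumps(settings))
    updated.setdefault("FullLoadSettings", {})["MaxFullLoadSubTasks"] = value
    return updated


def rewrite_settings_file(src_path: str, dst_path: str, value: int = 1) -> dict:
    """Read settings copied from the console, patch them, and write a new file."""
    with open(src_path, encoding="utf-8") as src:
        settings = json.load(src)
    updated = set_max_full_load_subtasks(settings, value)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(updated, dst, indent=2)
    return updated
```

The written file can then be passed to the modify-replication-task command above via --replication-task-settings file://...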

Communication link failure: 1047 WSREP has not yet prepared node for application use in

We had a three-node cluster with MariaDB 10.4. We had an outage and the servers all rebooted with one having an irrecoverable network issue at the time.
We set up another server and added it to the cluster as a third member later.
However, ever since that, we have constantly been getting this error every now and then.
*3287799 FastCGI sent in stderr: "PHP message: An Error occurred while handling another error:
PDOException: SQLSTATE[08S01]: Communication link failure: 1047 WSREP has not yet prepared node for application use in /var/....yii2/db/Command.php:1293
In order to fix this issue, we shut down all three nodes one by one and then re-initialized the cluster, even with a new cluster name and all.
The first one was started with "galera_new_cluster" and the remaining two were added to this cluster. However, we still kept getting the same error intermittently.
We followed the workaround from "mariadb galera - Error when a node shutdown ERROR 1047 WSREP has not yet prepared node for application use", but that didn't do anything, as expected.
Next, we set up a single fresh server and installed the new 10.5.X MariaDB server on it. We took a backup from the old cluster using mariabackup and restored it onto this new single server.
This single server was set up as a new cluster with fresh details and everything. We wanted to run it as a single-node cluster to see whether the error still persisted. Oddly enough, the error is still there and it comes up every half an hour or so.
Has anyone got any clue what could be the reason for this weird issue we're facing? Currently, we don't know what exactly the issue is, which is why we're having a hard time solving it.
Any help would be greatly appreciated.
Update:
We turned off Galera on this single-node cluster and ran it as a simple stand-alone MariaDB server. However, we still got the same errors in our web server's logs. This is bonkers.
Any idea? Anyone?

AWSCLI: Can't specify db parameter group when creating mysql read replica

Using awscli, I'm trying to create a cross-region read replica, in us-west-1, of a MySQL RDS instance in us-east-1. The DB must have the lower_case_table_names parameter set to 1 (the default is 0). I have created a custom DB parameter group with this setting. When I call "aws rds create-db-instance-read-replica" and specify my custom parameter group with "--db-parameter-group-name", the command fails with the following error:
An error occurred (InvalidParameterCombination) when calling the CreateDBInstanceReadReplica operation: A parameter group can't be specified during Read Replica creation for the following DB engine: mysql
AWS documentation makes no mention of this limitation (that I can find). Obviously, in this case, changing the parameter group after the replica is created is not an option. Has anyone else encountered this, and is there a workaround?
Edit: Wound up just letting the replica come up with default parameters. Even though that caused the replication to fail and left status at "error", once the replica was available I switched it to my custom parameter group. Then I rebooted it, and it came right up and replicated without issue. May not work in every case, but seems to have worked in mine.
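The sequence from the edit above (create the replica with default parameters, switch it to the custom parameter group, reboot) can be written out as a list of AWS CLI calls. This is only a sketch: the function name and all identifiers passed to it are hypothetical, the "wait" steps are an assumption about how to pause until the instance is available, and the commands are merely built and printed, not executed.

```python
import shlex


def replica_workaround_commands(source_arn, replica_id, param_group, region):
    """Build the AWS CLI calls for the create / modify / reboot workaround.

    For a cross-region replica the source is given as an ARN and the
    commands run against the destination region.
    """
    target = ["--region", region, "--db-instance-identifier", replica_id]
    wait = ["aws", "rds", "wait", "db-instance-available", *target]
    return [
        # 1. Create the replica without a parameter group (default group is used).
        ["aws", "rds", "create-db-instance-read-replica", *target,
         "--source-db-instance-identifier", source_arn],
        wait,
        # 2. Attach the custom parameter group once the instance is available.
        ["aws", "rds", "modify-db-instance", *target,
         "--db-parameter-group-name", param_group, "--apply-immediately"],
        wait,
        # 3. Reboot so that static parameters take effect.
        ["aws", "rds", "reboot-db-instance", *target],
    ]


if __name__ == "__main__":
    # Placeholder values; substitute real identifiers before running anything.
    for cmd in replica_workaround_commands(
        "arn:aws:rds:us-east-1:123456789012:db:source-db",
        "replica-db", "custom-mysql-params", "us-west-1",
    ):
        print(shlex.join(cmd))
```

As the edit says, this may not work in every case; review the printed commands before running them.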

Azure database for MySQL DB 5.7 Transient handling in .net core

I am creating a .NET Core 2.1 MVC application that uses Azure Database for MySQL 5.7.
I have read the links below, but they seem to apply only to MS SQL DB.
https://learn.microsoft.com/en-us/azure/mysql/concepts-high-availability
https://learn.microsoft.com/en-us/azure/architecture/best-practices/retry-service-specific
Is transient error handling not possible for MySQL? Please point me to similar pages that cover MySQL.
A transient error, also known as a transient fault, is an error that will resolve itself. Most typically these errors manifest as a connection to the database server being dropped. Also new connections to a server can't be opened. Transient errors can occur for example when hardware or network failure happens.
Transient errors should be handled using retry logic. Situations that must be considered:
An error occurs when you try to open a connection
An idle connection is dropped on the server side. When you try to issue a command it can't be executed
An active connection that currently is executing a command is dropped.
The first and second cases are fairly straightforward to handle. Try to open the connection again. When you succeed, the transient error has been mitigated by the system. You can use your Azure Database for MySQL again. We recommend having waits before retrying the connection. Back off if the initial retries fail. This way the system can use all resources available to overcome the error situation. A good pattern to follow is:
Wait for 5 seconds before your first retry.
For each following retry, increase the wait exponentially, up to 60 seconds.
Set a max number of retries at which point your application considers the operation failed.
Read more here.
You can also read more about how to troubleshoot connection issues to Azure Database for MySQL here.
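The retry pattern described above (5 seconds before the first retry, exponential increase up to 60 seconds, a maximum number of retries) can be sketched as follows. The sketch is in Python rather than .NET; the same logic carries over. Doubling the wait on each retry is an assumption, since the text only says "exponentially", and the names backoff_delays, call_with_retry and is_transient are hypothetical. Which exceptions count as transient depends on the MySQL driver in use.

```python
import time


def backoff_delays(first_wait=5.0, max_wait=60.0, max_retries=5):
    """Yield the wait before each retry: first_wait, then doubling, capped at max_wait."""
    delay = first_wait
    for _ in range(max_retries):
        yield min(delay, max_wait)
        delay *= 2  # assumption: "exponential" is taken to mean doubling


def call_with_retry(operation, is_transient, *, first_wait=5.0, max_wait=60.0,
                    max_retries=5, sleep=time.sleep):
    """Run operation(), retrying on errors that is_transient(exc) accepts.

    Re-raises the last error once max_retries retries have been used up,
    or immediately if the error is not considered transient.
    """
    delays = backoff_delays(first_wait, max_wait, max_retries)
    while True:
        try:
            return operation()
        except Exception as exc:
            if not is_transient(exc):
                raise
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: the operation is considered failed
            sleep(delay)
```

A caller would pass a function that opens the connection (or opens it and runs a command) as operation. For the third case, an active command that is dropped mid-execution, retrying is only safe if the command can be repeated without side effects.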

Error creating AWS MySQL RDS instance with Terraform

When creating an AWS RDS MySQL 5.7 DB instance using the Terraform "terraform-aws-modules/rds/aws" module, I started getting a strange error after more than an hour. The same script has worked in other contexts in the past, including more involved versions that create cross-region read replicas in 2 other regions (3 regions in total).
When I tried to deploy to a different VPC recently I started getting an error after spending ~1 hour on the db options group resource (so not even reaching db deploy).
The error message is:
aws_db_option_group.this: Error creating DB Option Group: InternalFailure: An internal error has occurred. Please try your query again at a later time.
status code: 500 root.rds-virginia.db.db_option_group: eval: *terraform.EvalSequence
How can I resolve or work around this?
Creating a dummy db option group (even though we don't need it for this use case) seems to work around this issue:
resource "aws_db_option_group" "some-option-group" {
  name                     = "dummy-mysql-option-group"
  option_group_description = "Dummy Mysql option group"
  engine_name              = "mysql"
  major_engine_version     = "5.7"
}
Terraform db option group docs: https://www.terraform.io/docs/providers/aws/r/db_option_group.html

Google MySQL fails with ERROR 2013 HY000 system error 2

I have a D8 Google MySQL instance. I'm running an ETL process that tries to push about 100GB of data in, but the script keeps stopping because of the error.
To get it working I have to restart the MySQL instance and then rerun the process from where it failed. Any help is greatly appreciated. I haven't found anything on Google.
This error tends to mean the connection to the MySQL server was lost at some point. As mentioned in this 'Lost connection to MySQL server' article, this error may be encountered when writing a significant number of rows.
The article suggests setting net_read_timeout to something greater than the default value of 30 seconds. You may also want to consider breaking up your write tasks.
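One way to break up the writes, as suggested above, is to feed the rows to the database in fixed-size batches and commit after each one. This is a minimal sketch of the batching part only; batched_rows is a hypothetical helper, and the batch size is a value to tune rather than a recommendation.

```python
from itertools import islice


def batched_rows(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable of rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Sketch of use with a DB-API cursor (connection and SQL are placeholders):
#
#     for batch in batched_rows(source_rows, 1000):
#         cursor.executemany(insert_sql, batch)
#         connection.commit()
```

Committing per batch also gives a natural restart point, so a failed run can resume from the last committed batch instead of starting over.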