pt-online-schema-change breaks AWS DMS Replication - mysql

We are currently using AWS DMS to replicate data from our Aurora MySQL database to S3. This gives us a low-latency data lake we can use to track the lineage of all changes and to build additional data pipelines. However, when we make a change via the pt-online-schema-change script, the modified table stops replicating entirely. Is there any reason why this would happen?
After running the change, the logs show that the schema of the source table no longer matches what DMS is expecting, and the CDC changes are skipped. The only explanation I can see is that DMS is not properly tracking the DML statements.
Table alter triggered with Percona (in this case, an ADD COLUMN)
New table synced by AWS DMS
The triggers Percona adds throw warnings in AWS DMS as unsupported
Table is renamed
Table column count does not match, ignoring extra columns.
Table column size mismatch, skipping.
Notably, all the DML statements used by Percona (outside of the triggers) are supported by AWS DMS with S3 as a target. Does anyone else have experience with this situation or combination of tools?
Edit:
Here's an example of the command used to make these changes with Percona:
pt-online-schema-change --host=<host> \
--user=<user> \
--ask-pass \
--execute \
--no-drop-old-table \
--no-check-alter \
--alter="ADD COLUMN db_row_update_stamp TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3)" \
D=<db>,t=<REPLACE_TABLE_NAME_HERE>

Looking at the DETAILED_DEBUG logs for this task, I tested a RENAME scenario in AWS DMS manually.
This resulted in the following:
2021-02-27T00:38:43:255381 [SOURCE_CAPTURE ]T: Event timestamp '2021-02-27 00:38:38' (1614386318), pos 54593835 (mysql_endpoint_capture.c:3293)
2021-02-27T00:38:43:255388 [SOURCE_CAPTURE ]D: > QUERY_EVENT (mysql_endpoint_capture.c:3306)
2021-02-27T00:38:43:255394 [SOURCE_CAPTURE ]T: Default DB = 'my_db' (mysql_endpoint_capture.c:1713)
2021-02-27T00:38:43:255399 [SOURCE_CAPTURE ]T: SQL statement = 'RENAME TABLE test_table TO _test_table_old' (mysql_endpoint_capture.c:1720)
2021-02-27T00:38:43:255409 [SOURCE_CAPTURE ]T: DDL DB = '', table = '', verb = 0 (mysql_endpoint_capture.c:1734)
2021-02-27T00:38:43:255414 [SOURCE_CAPTURE ]T: >>> Unsupported or commented out DDL: 'RENAME TABLE test_table TO _test_table_old' (mysql_endpoint_capture.c:1742)
It seems that this version of DMS does not properly handle RENAME statements, despite the documentation claiming support for RENAME.
I am looking into opening a bug report on AWS's side. This impacted AWS DMS server version 3.4.3.
I will be testing against previous versions and will post an update if I find a specific version where this works, until it is resolved in a newer release. I can't claim with 100% certainty that it's a bug in DMS, but taking Percona out of the picture I was able to replicate the problem.
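For anyone who wants to reproduce this without Percona in the picture, here is a minimal sketch of the cut-over that pt-online-schema-change performs (the chunked row copy is omitted; even the single-table RENAME I tested above hits the same unsupported-DDL path):
mysql> CREATE TABLE _test_table_new LIKE test_table;
mysql> ALTER TABLE _test_table_new ADD COLUMN db_row_update_stamp TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3);
mysql> RENAME TABLE test_table TO _test_table_old, _test_table_new TO test_table;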

The only option here to fix the broken replication is to "Reload table data".
In the AWS DMS console, select your migration task, go to the "Table statistics" tab and
select the table that was altered (renamed) under the hood by the Percona tool.
In a nutshell, the "Reload table data" action refreshes the replication instance, cleans up caches and creates a new snapshot of your data in S3.
Creating the new snapshot in S3 will leave your replication out of sync for some time. The recovery period depends linearly on the volume of data in the table and the performance of the chosen replication instance.
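If you prefer the CLI, the same reload can be triggered with something like the following (the task ARN, schema and table names are placeholders):
shell> aws dms reload-tables \
    --replication-task-arn arn:aws:dms:us-east-1:123456789012:task:EXAMPLE \
    --tables-to-reload SchemaName=my_db,TableName=test_table \
    --reload-option data-reload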
Unfortunately, the RENAME TABLE statement isn't supported by DMS, and presumably it is hard for any CDC tool to support properly, as it breaks data checksums (or checkpoints, in AWS terms).

Related

How to set up heterogeneous replication with Tungsten?

I am currently working on replication between heterogeneous databases with Tungsten Replicator. We have a MySQL master and an Oracle slave. According to the docs, such a setup should work. I am using tungsten-replicator-2.0.5. I call
$TUNGSTEN_HOME/tools/configure \
--verbose \
--home-directory=$INSTALL_HOME \
--cluster-hosts=$MA_HOST,$SL_HOST \
on the master node to create a basic installation on both nodes. Note: using the installer (as recommended) fails due to the heterogeneous setup, since the installer cannot find a MySQL instance on the slave node. The replicator instances are configured by adding static-$SERVICENAME.properties to the conf directory and modifying conf/services.properties (replicator.host=$HOSTNAME, replicator.masterListenPortStart=12112, replicator.rmi_port=20000).
Launching the replicators resulted in an ORA-01850 when issuing an UPDATE statement against trep_commit_seqno in the tungsten schema, due to a missing 'timestamp' keyword in the SQL statement. Just to get past this error, I altered the datatype of update_timestamp and extract_timestamp to varchar. The replicators now start up, and some simple inserts were replicated, but when the test script issues a
DROP TABLE IF EXISTS table1;
replication fails due to an ORA-00933, because of the 'IF EXISTS' clause. I am not sure if this is an error in my configuration or if Tungsten in general has problems with the differences in DDL statements between those two products. Has somebody successfully set up a similar replication using Tungsten?
The Tungsten documentation has some useful guidance. In particular, this point from the "Advanced Principles of Operation" is relevant: "Also, DDL statements beyond the simplest CREATE TABLE expressions are rarely at all portable." In your case, DROP TABLE IF EXISTS table1; is not valid Oracle DDL.
For anybody who is interested: as of now, Tungsten does not perform any transformation of DDL statements in a heterogeneous environment (as MithunSasidharan wrote). I wrote a custom filter that skips DDL statements using regular expressions. For synchronizing the schema definition, we will use Apache DdlUtils, which serves quite well for transforming a schema definition between MySQL and Oracle. I assume it works similarly well for other vendors. Thanks.

MySQL backup multi-client DB for single client

I am facing a problem for a task I have to do at work.
I have a MySQL database which holds the information of several clients of my company, and I have to create a backup/restore procedure to back up and restore that information for any single client. To clarify, if client A loses his data, I have to be able to recover it while being sure I am not modifying the data of clients B, C, ...
I am not a DB administrator, so I don't know if I can do this using standard MySQL tools (such as mysqldump) or other backup tools (such as Percona XtraBackup).
For the backup, my research (and my intuition) led me to this possible solution:
create the restore INSERT statements using the INSERT ... SELECT syntax (http://dev.mysql.com/doc/refman/5.1/en/insert-select.html);
save these inserts into a SQL file, either in the proper order or allowing the script to temporarily disable foreign key checks to satisfy foreign key constraints;
of course, I would do this for all my clients on a daily basis, using a file for each client (and day).
Then, in case I have to restore the data for a specific client:
I delete all of his remaining data;
I restore the correct data using the SQL file I created during the backup (sketched below).
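A sketch of what that restore step could look like, assuming a client_id column and one backup file per client (both are my assumptions, since the schema isn't shown):
mysql> SET FOREIGN_KEY_CHECKS = 0;
mysql> DELETE FROM orders WHERE client_id = 42;   -- repeat for every table holding client data
mysql> source /backups/client_42.sql              -- replays the INSERTs saved during backup
mysql> SET FOREIGN_KEY_CHECKS = 1;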
This way I believe I can recover the right data for client A without touching the data of client B. Would my solution actually work? Is there any better way to achieve the same result? Or do you need more information about my problem?
Please forgive me if this question is not well formed; I am new here and this is my first question, so I may be imprecise... thanks anyway for the help.
Note: we will also backup the entire database with mysqldump.
You can use the --where parameter to provide a condition like client_id=N. Of course I am making an assumption, since you don't provide any information on your schema.
If you have a star schema, you could probably write a small script that backs up all lookup tables (considering they are adequately small) by using the --tables parameter, and use the --where condition for your client data table. For additional performance, perhaps you could partition the table by client_id.
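A minimal sketch of that approach (database, table, and column names are assumptions):
shell> mysqldump --single-transaction my_db lookup_table_1 lookup_table_2 > lookups.sql
shell> mysqldump --single-transaction --where="client_id=42" my_db client_data > client_42.sql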

SQL Server to MySQL data transfer

I am trying to transfer bulk data on a constant and continuous basis from a SQL Server database to a MySQL database. I wanted to use SQL Server's replication (via SSMS), but this is apparently only for SQL Server to Oracle or IBM DB2 connections. Currently we are using SSIS to transform data and push it to a temporary location in the MySQL database, where it is copied over. I would like the fastest way to transfer data and am contemplating several methods.
I have a new way I plan on transforming the data which I am sure will solve most time issues, but I want to make sure we do not run into time problems in the future. I have set up a linked server that uses a MySQL ODBC driver to talk between SQL Server and MySQL. This seems VERY slow. I have some code that also uses Microsoft's ODBC driver, but it is used so little that I cannot gauge the performance. Does anyone know of lightning-fast ways to communicate between these two databases? I have been researching MySQL's data providers that seem to communicate with an OleDB layer. I'm not too sure what to believe and which way to steer; any ideas?
I used the jdbc-odbc bridge in Java to do just this in the past, but performance through ODBC is not great. I would suggest looking at something like http://jtds.sourceforge.net/ which is a pure Java driver that you can drop into a simple Groovy script like the following:
import groovy.sql.Sql
sql = Sql.newInstance( 'jdbc:jtds:sqlserver://serverName/dbName-CLASS;domain=domainName',
'username', 'password', 'net.sourceforge.jtds.jdbc.Driver' )
sql.eachRow( 'select * from tableName' ) {
println "$it.id -- ${it.firstName} --"
// probably write to mysql connection here or write to file, compress, transfer, load
}
The following performance numbers give you a feel for how it might perform:
http://jtds.sourceforge.net/benchTest.html
You may find some performance advantages in dumping data to a MySQL dump-file format and using LOAD DATA INFILE instead of writing row by row. MySQL has some significant performance improvements for large data sets if you load infiles and do things like atomic table swaps.
We use something like the following to quickly load large data files into MySQL from one system to another. This is the fastest mechanism we know of to load data into MySQL. But real-time row by row might be a simple loop to do in Groovy plus some table to keep track of which rows have been moved.
mysql> select * from tablename into outfile 'tablename.dat';
shell> myisamchk --keys-used=0 -rq '/data/mysql/schema_name/tablename'
mysql> load data infile 'tablename.dat' into table tablename;
shell> myisamchk -rq /data/mysql/schema_name/tablename
mysql> flush tables;
mysql> exit;
shell> rm tablename.dat
The best way I have found to transfer SQL data (if you have the space) is a SQL dump in one language and then to use a converter tool (or perl script; both are prevalent) to convert the SQL dump from MSSQL to MySQL. See my answer to this question about which converter you may be interested in :).
We've used the ADO.NET driver for MySQL in SSIS with quite a bit of success. Basically, install the driver on the machine with Integration Services installed, restart BIDS, and it should show up in the driver list when you create an ADO.NET connection manager.
As for replication, what exactly are you trying to accomplish?
If you are monitoring changes, treat it as a type 1 slowly changing dimension (data warehouse terminology, but the same principle applies). Insert new records, update changed records.
If you are only interested in new records and have no plans to update previously loaded data, try an incremental load strategy. Insert records where source.id > max(destination.id).
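As a sketch, that incremental filter might look like the following source-side query (assuming an ever-increasing id column; the @max_id value would first be fetched from the MySQL destination):
-- value obtained beforehand, e.g. via SELECT MAX(id) FROM tableName on MySQL
DECLARE @max_id INT = 42;
SELECT * FROM dbo.tableName WHERE id > @max_id;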
After you've tested the package, schedule a job in the sql server agent to run the package every x minutes.
You can also try the following.
http://kofler.info/english/mssql2mysql/
I tried this a while ago and it worked for me, but I wouldn't necessarily recommend it to you.
What is the real problem? What are you trying to do?
Can you not get an MSSQL DB connection, for example from Linux?

MySQL replication changes not being sent

I've set up MySQL replication only for a specific database on the master.
If I connect to the master and don't specify a database (e.g. in the connection string or with the USE command), the statement is not sent to the slave. Is this a bug? Why does this happen?
Example 1
with no db specified up till now: won't replicate
insert into exampledb.mytable values(1,2,3);
Example 2
replicates
use exampledb;
insert into mytable values(1,2,3);
Not a bug. This behavior is defined in the MySQL docs:
The main reason for this “check just the default database” behavior is that it is difficult from the statement alone to know whether it should be replicated (for example, if you are using multiple-table DELETE or multiple-table UPDATE statements that go across multiple databases). It is also faster to check only the default database rather than all databases if there is no need.
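For concreteness, the behavior shows up with a per-database filter on the master, for example (a sketch; binlog-do-db is my assumption about how "only a specific database" was configured):
[mysqld]
# with statement-based logging, this filter is checked against the
# DEFAULT database, not the database named in the statement
binlog-do-db = exampledb
That is why Example 1 is skipped: the statement targets exampledb.mytable, but no default database is set, so the filter does not match. Issuing USE exampledb first (or switching to row-based binary logging) avoids the problem.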

Question about MySQL database migration

I have a MySQL database with several tables on a live server, and I would like to migrate this database to another server. The migration I mean here also involves schema changes, for example: adding some new columns to several tables, adding some new tables, etc.
Now, the only method I can think of is to use a PHP or Python script (the two languages I know) to connect the two databases, dump the data from the old database, and then write it into the new database. However, this method is not efficient at all. For example: in the old database, table A has 28 columns; in the new database, table A has 29 columns, and the extra column should have the default value 0 for all the old rows. My script still needs to dump the data row by row and insert each row into the new database.
Using mysqldump etc. won't work. Here is the detail. For example: I have FOUR old databases, which I can name 'DB_a', 'DB_b', 'DB_c', 'DB_d'. The old table A has 28 columns; I want to add each row of table A to the new database with a new column value 'DB_x' (x indicates which database the row comes from). Since I can't differentiate the database ID by the row's content, the only way to identify them is through some user input parameters.
Are there any tools, or a better method than writing a script yourself? Here, I don't need to worry about multithreaded writing problems etc.; the old database will be down (not open to public usage, only for the upgrade) for a while.
Thanks!!
I don't entirely understand your situation with the columns (wouldn't it be more sensible to add any new columns after migration?), but one of the arguably fastest methods to copy a database across servers is mysqlhotcopy. It can copy MyISAM tables only and has a number of other requirements, but it's awfully fast because it skips the create dump / import dump step completely.
Generally when you migrate a database to new servers, you don't apply a bunch of schema changes at the same time, for the reasons that you're running into right now.
MySQL has a dump tool called mysqldump that can be used to easily take a snapshot/backup of a database. The snapshot can then be copied to a new server and installed.
You should figure out all the changes that have been done to your "new" database, and write out a script of all the SQL commands needed to "upgrade" the old database to the new version that you're using (e.g. ALTER TABLE a ADD COLUMN x, etc). After you're sure it's working, take a dump of the old one, copy it over, install it, and then apply your change script.
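A sketch of such an upgrade script, reusing table A from the question (the column names are illustrative; run it against each old database with the matching tag):
-- the new 29th column, defaulted to 0 for all existing rows
ALTER TABLE A ADD COLUMN extra_col INT NOT NULL DEFAULT 0;
-- tag every row with the database it came from ('DB_a' through 'DB_d')
ALTER TABLE A ADD COLUMN source_db CHAR(4) NOT NULL DEFAULT 'DB_a';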
Use mysqldump to dump the data, then load it on the new server with mysql < output.txt. Now the old data is on the new server. Manipulate as necessary.
Sure, there are tools that can help you achieve what you're trying to do. mysqldump is a prime example of such a tool. Just take a glance here:
http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html
What you could do is:
1) You make a dump of the current db, using mysqldump (with the --no-data option) to fetch the schema only
2) You alter the schema you have dumped, adding new columns
3) You create your new schema (mysql < dump.sql - just google for mysql backup restore for more help on the syntax)
4) Dump your data using the mysqldump --complete-insert option (see link above)
5) Import your data, using mysql < data.sql
This should do the job for you, good luck!
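Put together as commands, that flow might look like this (old_db and new_db are placeholders):
shell> mysqldump --no-data old_db > schema.sql
# hand-edit schema.sql to add the new columns, then:
shell> mysql new_db < schema.sql
shell> mysqldump --no-create-info --complete-insert old_db > data.sql
shell> mysql new_db < data.sql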
Adding extra columns can be done on a live database:
ALTER TABLE [table-name] ADD COLUMN [column-name] MEDIUMINT(8) DEFAULT 0;
MySQL will set all existing rows to the default value.
So here is what I would do:
make a copy of your old database with the mysqldump command.
run the resulting SQL file against your new database; now you have an exact copy.
write a migration.sql file that modifies your database with ALTER TABLE commands and, for complex conversions, some temporary MySQL procedures.
test your script (when it fails, go back to step 2).
If everything is OK, repeat from step 1 and go live with your new database.
These are all valid approaches, but I believe you want to write a SQL statement that generates the INSERT statements supporting the new columns you have; a sketch follows.
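A sketch of that approach (table and column names are illustrative: A is assumed to have id and name columns plus the new tag), emitting one INSERT per row with the source-database tag appended:
SELECT CONCAT('INSERT INTO new_db.A VALUES (',
              id, ', ', QUOTE(name), ', ''DB_a'');')
FROM DB_a.A;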