How to set up heterogeneous replication with Tungsten? - mysql

Recently I have been working on replication between heterogeneous DBs with Tungsten Replicator. We have a MySQL master and an Oracle slave. According to the docs, such a setup should work. I am using tungsten-replicator-2.0.5. I call
$TUNGSTEN_HOME/tools/configure \
--verbose \
--home-directory=$INSTALL_HOME \
--cluster-hosts=$MA_HOST,$SL_HOST \
on the master node to create a basic installation on both nodes. Note: using the installer (as recommended) fails because of the heterogeneous setup, since the installer cannot find a MySQL instance on the slave node. The replicator instances are configured by adding static-$SERVICENAME.properties to the conf directory and modifying conf/services.properties (replicator.host=$HOSTNAME, replicator.masterListenPortStart=12112, replicator.rmi_port=20000), as shown below.
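For reference, the modified entries in conf/services.properties look like this ($HOSTNAME stands for the respective node's hostname):
# conf/services.properties
replicator.host=$HOSTNAME
replicator.masterListenPortStart=12112
replicator.rmi_port=20000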
Launching the replicators resulted in an ORA-01850 when an UPDATE statement was issued against trep_commit_seqno in the tungsten schema, due to a missing 'timestamp' keyword in the SQL statement. Just to get past this error, I altered the datatype of update_timestamp and extract_timestamp to varchar. The replicators now start up and some simple inserts were replicated, but when the test script issues a
DROP TABLE IF EXISTS table1;
replication fails with an ORA-00933 because of the 'IF EXISTS' clause. I am not sure whether this is an error in my configuration or whether Tungsten in general has problems with the differences in DDL statements between those two products. Has somebody successfully set up a similar replication using Tungsten?

The Tungsten documentation has some useful guidance. In particular, this point from the "Advanced Principles of Operation" section is relevant: "Also, DDL statements beyond the simplest CREATE TABLE expressions are rarely at all portable." In your case, DROP TABLE IF EXISTS table1; is not valid Oracle DDL.
Read it here.

For anybody who is interested: as of now, Tungsten does not perform any transformation of DDL statements in a heterogeneous environment (as MithunSasidharan wrote). I have written a custom filter that skips DDL statements using regular expressions. For synchronizing the schema definition, we will use Apache DdlUtils, which works quite well for transforming a schema definition between MySQL and Oracle. I assume it works similarly well for other vendors. Thanks.
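The filter itself has to plug into Tungsten's filter API, but its core is just a regular expression applied to each statement. A minimal Groovy sketch of that check (the statement strings are made-up examples):
def isDdl = { String sql ->
    // treat CREATE/DROP/ALTER/RENAME/TRUNCATE statements as DDL and skip them
    sql.trim().toUpperCase() ==~ /^(CREATE|DROP|ALTER|RENAME|TRUNCATE)\b.*/
}
assert isDdl('DROP TABLE IF EXISTS table1')
assert !isDdl("INSERT INTO table1 VALUES (1, 'x')")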

Related

pt-online-schema-change breaks AWS DMS Replication

Currently using AWS DMS to replicate data from our Aurora MySQL database to S3. This gives us a low-latency data lake we can use to get lineage of all changes occurring and to build additional data pipelines. However, when a change is made via the pt-online-schema-change script, the modified table stops replicating altogether. Is there any reason why this would happen?
After running the change, the logs show that the schema of the source table no longer matches what DMS is expecting, and the CDC changes are skipped. The only possible reason for this is that DMS is not properly tracking the DDL statements.
Table alter triggered with percona (in this case, add column)
New table synced by AWS DMS
Trigger additions throw warnings in AWS DMS as not supported
Table is renamed
Table column count does not match, ignoring extra columns.
Table column size mismatch, skipping.
Notably, all the DML statements used by Percona (outside of the triggers) are supported by AWS DMS with S3 as a target. Does anyone else have any experience with this situation or combination of tools?
Edit:
Here's an example of the command used to make these changes with Percona:
pt-online-schema-change --host=<host> \
--user=<user> \
--ask-pass \
--execute \
--no-drop-old-table \
--no-check-alter \
--alter="ADD COLUMN db_row_update_stamp TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3)" \
D=<db>,t=<REPLACE_TABLE_NAME_HERE>
So, looking at the DETAILED_DEBUG logs for this task, I tested a RENAME scenario in AWS DMS manually.
This resulted in the following.
2021-02-27T00:38:43:255381 [SOURCE_CAPTURE ]T: Event timestamp '2021-02-27 00:38:38' (1614386318), pos 54593835 (mysql_endpoint_capture.c:3293)
2021-02-27T00:38:43:255388 [SOURCE_CAPTURE ]D: > QUERY_EVENT (mysql_endpoint_capture.c:3306)
2021-02-27T00:38:43:255394 [SOURCE_CAPTURE ]T: Default DB = 'my_db' (mysql_endpoint_capture.c:1713)
2021-02-27T00:38:43:255399 [SOURCE_CAPTURE ]T: SQL statement = 'RENAME TABLE test_table TO _test_table_old' (mysql_endpoint_capture.c:1720)
2021-02-27T00:38:43:255409 [SOURCE_CAPTURE ]T: DDL DB = '', table = '', verb = 0 (mysql_endpoint_capture.c:1734)
2021-02-27T00:38:43:255414 [SOURCE_CAPTURE ]T: >>> Unsupported or commented out DDL: 'RENAME TABLE test_table TO _test_table_old' (mysql_endpoint_capture.c:1742)
It seems that this version of DMS does not properly read RENAME statements, despite the documentation claiming support for RENAMEs.
I am looking into opening a bug on AWS's side. This impacted AWS DMS server version 3.4.3.
I will be testing against previous versions and will post an update if I find a specific version that has this fixed, until it is resolved in a newer release. I can't claim with 100% certainty that it's a bug in DMS, but with Percona taken out of the picture I was still able to reproduce the problem.
The single option here for fixing the broken replication is to "Reload table data".
In the AWS DMS console, select your migration task, go to the "Table statistics" tab and
select the table that has been altered (renamed) under the hood by the Percona tool.
In a nutshell, the "Reload table data" action refreshes the replication instance, cleans up caches and creates a new snapshot of your data in S3.
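For what it's worth, the same reload can also be triggered per table from the CLI, roughly like this (the task ARN, schema and table names are placeholders):
aws dms reload-tables \
    --replication-task-arn <replication-task-arn> \
    --tables-to-reload SchemaName=<schema>,TableName=<table>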
Creating a new snapshot in S3 will leave your replication out of sync for some time. The time needed to recover replication depends roughly linearly on the volume of data in the table and the performance of the chosen replication instance.
Unfortunately, the RENAME TABLE statement isn't supported by DMS, and I assume nobody else could support it properly either, as it breaks data checksums (or checkpoints in AWS).

How to migrate existing schemas to a known baseline using Liquibase or Flyway

We want to introduce database schema versioning on an existing schema, because the current way of handling schema versioning (manually) has led to differences in the schema across the various databases in production.
For the sake of definition: when the word 'database' is used in this question, I mean the set of tables, views, procedures etc. that are available to one customer. Each customer has their own database, and there are several databases in one MySQL server instance. We have several MySQL servers running.
Context:
multiple MySQL databases in production
version of the database is defined by the product that uses the database, i.e. multiple versions exist in production
Problem:
databases of the same version differ slightly in schema definition
unclear if schema migrations have been correctly applied
I've tried Flyway (command line), defined a baseline of a certain version, and created migration scripts to migrate a database from that version to the schema of the latest version. Obviously this works well when you start with a known baseline.
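Roughly, what I ran was the following (connection settings and versions are placeholders):
flyway -url=jdbc:mysql://<host>/<database> -user=<user> -password=<password> -baselineVersion=<current-version> baseline
flyway -url=jdbc:mysql://<host>/<database> -user=<user> -password=<password> migrate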
So we need to migrate all databases to a known baseline.
When I read the documentation and multiple articles about setting the baseline it all comes down to: 'figure out what the baseline should be and migrate the database manually to this baseline'.
However, since there are multiple databases in production (> 500) which differ slightly even when 'in the same version', it is not possible to manually change the databases to the desired baseline.
Is there a way with either Flyway or Liquibase to say something like 'migrate this schema to this baseline', which comes down to a set of 'if this column doesn't match the desired schema, alter table modify column ...' statements?
I'd settle for changes in tables only and recreate all views, triggers, procedures etc. anyway.
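To make it concrete, what I'm hoping for is something along the lines of this Liquibase changeset (table and column names are made up), generated or applied automatically for every difference from the baseline:
<changeSet id="baseline-orders-status" author="dba">
    <!-- only add the column if it is missing; otherwise mark the changeset as already run -->
    <preConditions onFail="MARK_RAN">
        <not>
            <columnExists tableName="orders" columnName="status"/>
        </not>
    </preConditions>
    <addColumn tableName="orders">
        <column name="status" type="varchar(20)"/>
    </addColumn>
</changeSet>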
I read that Liquibase can do a 'rollback', but since MySQL doesn't support transactions on DDL, I wonder if this is implemented in Liquibase or only available for database servers that DO support transactions on DDL.
If anyone knows, please let me know. This might be an argument to move from Flyway to Liquibase.

MySQL to HSQLDB migration issue

I have an issue with HSQLDB. I have a MySQL database that I'm dumping to an in-memory HSQLDB, and I get the following error when I run the script: Error: unexpected token: (. It occurs in a CREATE TABLE script, and the offending part is TINYINT(3).
If I remove the brackets and the number it works fine. This is a valid declaration in MySQL, and I have tried turning MySQL compatibility on by changing my URL to jdbc:hsqldb:mem:dataSource;sql.syntax_mys=true, but this still doesn't work. :(
Just as additional info: I'm using a Spring Hibernate connection, using Liquibase to do the dumping from MySQL to HSQLDB, and running HSQLDB v2.3.2.
SQL syntax, especially DDL, is not very portable between different databases. You will have to learn the correct syntax for CREATE TABLE in HSQLDB, which is somewhat different from MySQL.
You cannot just export a table definition from one flavor of database and import it into another.
It would be great if that were the case, but the SQL standard is quite loose...
I assume you have a DDL script; you can add SET DATABASE SQL SYNTAX MYS TRUE; to the top of it. See also here (Table 13.29. MySQL Style Syntax).
You may want to use this only for tests though; if you want to fully migrate to HSQLDB, changing the scripts themselves is surely the long-term solution.
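For example, the top of the dumped DDL script would then look something like this (the table is illustrative; if your HSQLDB version still rejects the MySQL-style display width, strip it down to plain TINYINT):
SET DATABASE SQL SYNTAX MYS TRUE;
CREATE TABLE example_table (
    id INT PRIMARY KEY,
    flag TINYINT(3)
);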

SQL Server to MySQL data transfer

I am trying to transfer bulk data on a constant and continuous basis from a SQL Server database to a MySQL database. I wanted to use SQL Server's SSMS replication, but this apparently only supports SQL Server to Oracle or IBM DB2 connections. Currently we are using SSIS to transform the data and push it to a temporary location in the MySQL database, from where it is copied over. I would like the fastest way to transfer data and am comparing several methods.
I have a new way I plan on transforming the data which I am sure will solve most time issues, but I want to make sure we do not run into time problems in the future. I have set up a linked server that uses a MySQL ODBC driver to talk between SQL Server and MySQL. This seems VERY slow. I have some code that also uses Microsoft's ODBC driver, but it is used so little that I cannot gauge the performance. Does anyone know of lightning-fast ways to communicate between these two databases? I have been researching MySQL's data providers, which seem to communicate with an OLE DB layer. I'm not too sure what to believe or which way to steer toward; any ideas?
I used the JDBC-ODBC bridge in Java to do just this in the past, but performance through ODBC is not great. I would suggest looking at something like http://jtds.sourceforge.net/, which is a pure Java driver that you can drop into a simple Groovy script like the following:
import groovy.sql.Sql
sql = Sql.newInstance( 'jdbc:jtds:sqlserver://serverName/dbName-CLASS;domain=domainName',
        'username', 'password', 'net.sourceforge.jtds.jdbc.Driver' )
sql.eachRow( 'select * from tableName' ) {
    println "$it.id -- ${it.firstName} --"
    // probably write to mysql connection here or write to file, compress, transfer, load
}
The following performance numbers give you a feel for how it might perform:
http://jtds.sourceforge.net/benchTest.html
You may find some performance advantages in dumping the data to MySQL's dump-file format and using LOAD DATA instead of writing row by row. MySQL has some significant performance improvements for large data sets if you use LOAD DATA INFILE and do things like atomic table swaps.
We use something like the following to quickly load large data files into MySQL when moving data from one system to another; it is the fastest mechanism we have found to load data into MySQL. For real-time, row-by-row transfer, a simple loop in Groovy plus some table to keep track of which rows have been moved might do.
mysql> select * from tablename into outfile 'tablename.dat';
shell> myisamchk --keys-used=0 -rq '/data/mysql/schema_name/tablename'  # disable index updates for a faster bulk load
mysql> load data infile 'tablename.dat' into table tablename;
shell> myisamchk -rq /data/mysql/schema_name/tablename  # rebuild the indexes afterwards
mysql> flush tables;
mysql> exit;
shell> rm tablename.dat
The best way I have found to transfer SQL data (if you have the space) is a SQL dump in one dialect, followed by a converter tool (or Perl script; both are prevalent) to convert the dump from MSSQL to MySQL. See my answer to this question about which converter you may be interested in :).
We've used the ADO.NET driver for MySQL in SSIS with quite a bit of success. Basically, install the driver on the machine that has Integration Services installed, restart BIDS, and it should show up in the driver list when you create an ADO.NET connection manager.
As for replication, what exactly are you trying to accomplish?
If you are monitoring changes, treat it as a type 1 slowly changing dimension (data warehouse terminology, but the same principle applies). Insert new records, update changed records.
If you are only interested in new records and have no plans to update previously loaded data, try an incremental load strategy. Insert records where source.id > max(destination.id).
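As rough sketches on the MySQL side (table and column names are made up; the source table would really be read through SSIS or the linked server):
-- type 1: insert new rows, overwrite changed ones
INSERT INTO dest_customer (id, name, email)
VALUES (42, 'Alice', 'alice@example.com')
ON DUPLICATE KEY UPDATE name = VALUES(name), email = VALUES(email);

-- incremental: only pull rows the destination does not have yet
SELECT id, name, email
FROM source_customer
WHERE id > (SELECT COALESCE(MAX(id), 0) FROM dest_customer);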
After you've tested the package, schedule a job in the sql server agent to run the package every x minutes.
You can also try the following.
http://kofler.info/english/mssql2mysql/
I tried this a while ago and it worked for me, but I wouldn't recommend it to you.
What is the real problem; what are you trying to do?
Don't you get an MSSQL DB connection, for example from Linux?

What commands would be required to export selected records from a MySQL database to a SQLite database?

I would like to write a script that supports the following workflow.
given: a defined set of queries (select statements with table joins) that return sets of data from a single MySQL database
create: a SQLite database that contains the information (tables, data) required to return the same results for the same set of queries sent in step 1.
Outside of select, delete, and update, I am relatively unfamiliar with SQL, so I would appreciate specific command line or SQL syntax... anything required beyond installing SQLite.
This is only half an answer, but it can be useful for other kinds of DBs as well.
mysqldump has a "compatible" option to control the standards compliance of the SQL it dumps.
From the output of mysqldump --help:
--compatible=name Change the dump to be compatible with a given mode. By
default tables are dumped in a format optimized for
MySQL. Legal modes are: ansi, mysql323, mysql40,
postgresql, oracle, mssql, db2, maxdb, no_key_options,
no_table_options, no_field_options. One can use several
modes separated by commas. Note: Requires MySQL server
version 4.1.0 or higher. This option is ignored with
earlier server versions.
Abe, this doesn't answer your question but might help you get started. You can export the database using mysqldump with --skip-extended-insert (since SQLite does not support multi-row/compound inserts) and --complete-insert, then use sqlite3_exec() to import the dump into SQLite.
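A rough sketch of that flow from the shell (user and database names are placeholders; the dump usually still needs some hand-editing of MySQL-specific bits before SQLite accepts it):
mysqldump --compatible=ansi --skip-extended-insert --complete-insert -u <user> -p <database> > dump.sql
# clean up MySQL-specific syntax (ENGINE options, backticks, etc.), then load it:
sqlite3 <database>.sqlite < dump.sql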