Syncing/Streaming MySQL Tables (Joined Tables) to PostgreSQL Tables

I have one MySQL server and one PostgreSQL server.
I need to replicate, or re-insert, a set of data from multiple
MySQL tables into PostgreSQL tables, either synced or streamed.
The replication can be based on time (sync) or on an event such as
a new insert in a table (stream).
I tried the replication tools below, but all of them can only sync table to table. They do not allow choosing columns from different tables of the source database (MySQL) and inserting them into different tables in the destination database (PostgreSQL):
SymmetricDS
DBConvert
pgloader
PostgreSQL FDW
Now I have to write an application that queries the data from MySQL
and inserts it into PostgreSQL as a cron job.
This is cumbersome and error-prone, and it cannot stream the data
(event-based) for real-time replication.
It would be great if some tool already solved this problem.
Please let me know if there is an open-source library or tool that can do this for me.
Thanks in advance.

To achieve replication with one of the tools you proposed, you can do the following:
Create a separate schema in PostgreSQL and add views that exactly mirror the MySQL table structure. Then add rules or triggers to the views to handle inserts/updates/deletes and redirect them to the tables of your choice.
This way you have the complete freedom to transform your data during the replication, yet still use the common tools.
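For example, a minimal sketch of the idea in PostgreSQL (schema, table, and column names are hypothetical; analogous triggers would be needed for UPDATE and DELETE):

    -- A schema of views mirroring the MySQL source tables. A replication tool
    -- writes to the view as if it were the MySQL table; an INSTEAD OF trigger
    -- redirects each row to the target table(s) of your choice.
    CREATE SCHEMA mysql_mirror;

    CREATE VIEW mysql_mirror.customer AS
        SELECT id, name, email FROM public.customers;  -- same columns as the MySQL table

    CREATE FUNCTION mysql_mirror.customer_redirect() RETURNS trigger AS $$
    BEGIN
        -- Transform or split the row here before inserting into the real target.
        INSERT INTO public.customers (id, name, email)
        VALUES (NEW.id, NEW.name, NEW.email);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER customer_redirect
        INSTEAD OF INSERT ON mysql_mirror.customer
        FOR EACH ROW EXECUTE FUNCTION mysql_mirror.customer_redirect();
    -- (Use EXECUTE PROCEDURE instead of EXECUTE FUNCTION on PostgreSQL < 11.)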

Maybe this tool can help you: https://github.com/the4thdoctor/pg_chameleon
pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7/3.5. The system relies on the mysql-replication library to pull the changes from MySQL and convert them into a jsonb object. A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool can initialise the replica by pulling the data out of MySQL, but this requires FLUSH TABLES WITH READ LOCK; to work properly.
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
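If it helps, tools that decode the MySQL binlog generally need row-based logging enabled on the source. A hedged sketch of the relevant MySQL settings (normally set in my.cnf; shown here as runtime statements where that is possible):

    -- Row-based binary logging is what lets the replication library decode changes.
    SET GLOBAL binlog_format = 'ROW';
    SET GLOBAL binlog_row_image = 'FULL';  -- MySQL 5.6+

    -- Verify the setup; log_bin itself cannot be enabled at runtime and
    -- requires a server restart if it is OFF.
    SHOW VARIABLES LIKE 'binlog_format';
    SHOW VARIABLES LIKE 'log_bin';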

Related

Query data from databases on 2 different servers

I want to query data from 2 different database servers using MySQL. Is there a way to do that without having to create a federated database, as Google Cloud Platform does not support the FEDERATED engine?
Thanks!
In addition to @MontyPython's excellent response, there is a third, albeit a bit cumbersome, way to do this if by any chance you cannot use the FEDERATED engine and you also cannot manage your databases' replication.
Use an ETL tool to do the work
Back in the day, I faced a very similar problem: I had to join data from two separate database servers, neither of which I had any administrative access to. I ended up setting up Pentaho's ETL suite of tools to Extract data from both databases, Transform it (basically having Pentaho do a lot of work with both datasets), and Load it into my very own local database engine, where I ended up with exactly the merged and processed data I needed.
Be advised, this IS a lot of work (you have to "teach" your ETL tool what you need, and depending on what tool you use, it may involve quite some coding), but once you're done, you can schedule the work to happen automatically at regular intervals so you always have your local processed/merged data readily accessible.
FWIW, I used Pentaho's community edition, so it was free as in beer.
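To make the pattern concrete: once the ETL tool has Extracted data from both servers into local staging tables, the merge step can be plain SQL on the local engine (table and column names here are hypothetical):

    -- Join the two extracts locally; neither source server is touched again.
    CREATE TABLE merged_report AS
    SELECT o.order_id,
           o.total,
           c.name,
           c.region
    FROM   staging_orders    AS o   -- extracted from server A
    JOIN   staging_customers AS c   -- extracted from server B
           ON c.customer_id = o.customer_id;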
You can achieve this in two ways, one of which you have already mentioned:
1. Use Federated Engine
You can see how it is done here - Join tables from two different servers. This is a MySQL-specific answer.
2. Set up Multi-source Replication on another server and query that server
You can easily set up multi-source replication using replication channels; a minimal sketch follows below.
Check out their official documentation here - https://dev.mysql.com/doc/refman/8.0/en/replication-multi-source-tutorials.html
If you have an older version of MySQL where Replication channels are not available, you may use one of the many third-party replicators like Tungsten Replicator.
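Here is the sketch of the channel setup (MySQL 5.7/8.0 syntax prior to 8.0.23; host names and credentials are hypothetical, and GTID-based replication is assumed for MASTER_AUTO_POSITION):

    -- On the replica, define one channel per source server.
    -- (Multi-source replication needs master_info_repository = TABLE and
    -- relay_log_info_repository = TABLE; these are the defaults in 8.0.)
    CHANGE MASTER TO
        MASTER_HOST = 'server1.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'secret',
        MASTER_AUTO_POSITION = 1
        FOR CHANNEL 'source_1';

    CHANGE MASTER TO
        MASTER_HOST = 'server2.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'secret',
        MASTER_AUTO_POSITION = 1
        FOR CHANNEL 'source_2';

    START SLAVE FOR CHANNEL 'source_1';
    START SLAVE FOR CHANNEL 'source_2';

Once both channels are running, the data from both servers lives on one instance and can be joined with ordinary queries.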
P.S. - There is no MySQL equivalent of PostgreSQL's FDW. Joins across servers are easily possible in other database management systems, but not in MySQL.

Streaming replication from PostgreSQL to MySQL/MemSQL

I have two database systems currently in use in production: the main PostgreSQL database, and another database (MemSQL) used primarily for analytical purposes.
Is there a way to setup streaming replication of some of the tables in PostgreSQL to MemSQL (a MySQL-compatible database) rowstores, assuming the tables on both databases have the same schema?
I do not think that there is a tool for that (yet), but there is one for limited MySQL to Postgres replication:
https://github.com/the4thdoctor/pg_chameleon
Maybe this can be adapted to work the other way.
And you could always build your own little data shovel using the NOTIFY/LISTEN feature of Postgres, though that is probably only an option if you have to copy a fairly simple set of data.
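As a rough illustration of the data-shovel idea (table and channel names are hypothetical): a trigger publishes each new row as JSON on a channel, and a long-lived worker session that LISTENs on that channel replays the rows against MemSQL.

    -- Publisher side, in PostgreSQL: send every new row as a JSON payload.
    -- Note that NOTIFY payloads are limited to roughly 8 kB.
    CREATE FUNCTION notify_new_row() RETURNS trigger AS $$
    BEGIN
        PERFORM pg_notify('table_changes', row_to_json(NEW)::text);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_notify
        AFTER INSERT ON orders
        FOR EACH ROW EXECUTE FUNCTION notify_new_row();

    -- Consumer side: the shovel process subscribes with
    --   LISTEN table_changes;
    -- and turns each payload into an INSERT on the MemSQL side.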

Bad practice to use MySQL and Redshift together?

A current project I am working on has been exclusively using MySQL as our RDBMS. We are currently looking to segment the database into two different databases. One will be moving to Redshift (which runs on a modified PostgreSQL) while the other will continue using MySQL.
My concern does not stem from splitting the data, but rather from how applications will interact with the segmented data. Effectively, our current application will be reading static data from Redshift and writing to the MySQL database, and I am curious whether it is bad practice to intermingle these query languages.
Would it be better to migrate the MySQL DB to Postgres to limit complications arising from their differences?
We (Looker) work with many customers (100s) that have both MySQL and Redshift. The progression as their needs grow is usually:
MySQL
MySQL + MySQL slave
MySQL + MySQL Writable Slave
MySQL + MySQL Writable Slave + Redshift
So your best bet, if you haven't done so already, is to set up a MySQL replica (slave) database. The replica follows your master (write) database and is essentially an exact copy of it.
You can also make your replica writable. This becomes really useful for building summary tables. Here are some instructions on how to make a writable replica in RDS, but you can do it in other systems too.
http://www.looker.com/docs/setup-and-management/database-config/mysql-rds
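A rough sketch of what the writable replica buys you (names hypothetical; on RDS the read_only flag is changed through a parameter group rather than SET GLOBAL):

    -- On the replica only: allow local writes while it keeps applying the
    -- master's replication stream.
    SET GLOBAL read_only = 0;

    -- Build summary tables locally without adding any load to the master.
    CREATE TABLE summary_daily_orders AS
    SELECT DATE(created_at) AS day,
           COUNT(*)         AS orders,
           SUM(total)       AS revenue
    FROM   orders
    GROUP  BY DATE(created_at);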
If you have big event data that you want to integrate with your transactional data, the next step is to set up a process that migrates all your MySQL data into Redshift and pumps in data from other sources (like your event data, for example). Moving all the data gives you the ability to ask any question of Redshift.
Redshift will lag hours or more behind the MySQL database. If you need to answer real-time questions, query MySQL. If you want general insights, query the Redshift database.

Replicating data from MySQL to HBase using Flume: how?

I have a large MySQL database under heavy load and would like to replicate the data in this database to HBase in order to do analytical work on it.
edit: I want the data to replicate relatively quickly, and without any schema changes (no timestamped rows, etc.)
I've read that this can be done using Flume, with MySQL as a source (possibly the MySQL binlogs) and HBase as a sink, but I haven't found any detail (high- or low-level). What are the major tasks to make this work?
Similar questions were asked and answered earlier, but they didn't really explain how or point to resources that would:
Flume to migrate data from MySQL to Hadoop
Continuous data migration from mysql to Hbase
You are better off using Sqoop for this, IMHO. It was developed for exactly this purpose. Flume was made for something rather different, like aggregating log data or data generated from sensors.
See this for more details.
So far there are three options worth considering:
Sqoop: After the initial bulk import, it supports two types of incremental import: append and last-modified. That said, it won't give you real-time or even near-real-time replication. It's not that Sqoop can't run that fast; it's that you don't want to point a Sqoop pipe at your MySQL server and pull data every 1 or 2 minutes.
Trigger: This is a quick-and-dirty solution: add triggers to the source RDBMS and update your HBase accordingly (a sketch of this follows below). This one gives you real-time satisfaction, but you have to mess with the source DB by adding triggers. It might be OK as a temporary solution, but long term it just won't do.
Flume: This one requires the most development effort, but it doesn't need to touch the DB, and it doesn't add read traffic to the DB either (it tails the transaction logs).
Personally I'd go for Flume: not only does it channel the data from the RDBMS to your HBase, you can also do something with the data while it is streaming through your Flume pipe (e.g. transformation, notification, alerting, etc.).
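For the trigger option, one common shape (all names hypothetical) is to have the trigger append each change to a local changelog table, which a small worker process then ships to HBase:

    -- Changelog table capturing row changes for later shipping to HBase.
    CREATE TABLE orders_changelog (
        change_id  BIGINT AUTO_INCREMENT PRIMARY KEY,
        op         CHAR(1)   NOT NULL,  -- 'I', 'U' or 'D'
        order_id   INT       NOT NULL,
        changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    DELIMITER //
    CREATE TRIGGER orders_after_insert
    AFTER INSERT ON orders
    FOR EACH ROW
    BEGIN
        INSERT INTO orders_changelog (op, order_id) VALUES ('I', NEW.id);
    END//
    DELIMITER ;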

How to create a linked server in MySQL

Is it possible to create/configure MySQL for functionality like SQL Server's Linked Server?
If yes, would you please tell me how? I'm using MySQL 5.5.
MySQL's FEDERATED engine provides functionality similar to SQL Server's Linked Server (and Oracle's dblink), but it doesn't support connecting to vendors other than MySQL. It's not clear from the question whether you need to connect to other vendors.
You might want to look into MySQL Proxy. This doesn't match the architecture of Linked Servers/dblink, but you can probably solve a similar set of problems that you would use Linked Servers/dblink to solve.
I am the developer of the MySQL Data Controller. Unfortunately, due to a lack of requests, we have stopped development on it. The plugin was 100% functional, with MySQL connecting to Oracle, MSSQL, or MySQL.
Based on some requests, we have added back a blog and video about it:
http://www.acentera.com/mysql-datacontroller/
Regards,
Francis L.
Unfortunately you cannot link an entire MySQL database to another MySQL database like you can with MS SQL. However, you can link individual tables. A federated table is a local table you create that points to a table on another server.
You can run queries and stored procedures against it just like any other table. The two tables must have the same structure, except that the federated table uses a different database engine: FEDERATED. If you make ANY changes to the structure of the remote table, you should re-create the local federated table.
The process is actually quite easy; here is an example: https://docs.oracle.com/cd/E17952_01/mysql-5.0-en/federated-use.html
In my experience, the time needed to create and implement this process is minimal, even compared to linked servers. It should take you less than 30 minutes to get your first federated table working; after that it's a 5-minute process. Last item: when naming your federated table, I give it the same name as the remote table with "federated_" in front, like federated_customer.
Also, store your federated table definitions as separate stored procedures so you can reuse them any time you need to re-create the federated table, AND so that other developers can see how you generated it.
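For reference, a minimal sketch of such a federated table (server, credentials, and columns are hypothetical; the column list must match the remote table exactly):

    -- Local stand-in for the remote table `customer`; reads and writes are
    -- forwarded over the wire to the remote MySQL server.
    CREATE TABLE federated_customer (
        id    INT NOT NULL,
        name  VARCHAR(100),
        email VARCHAR(100),
        PRIMARY KEY (id)
    ) ENGINE=FEDERATED
      CONNECTION='mysql://app_user:secret@remote-host:3306/shop/customer';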
The MySQL Data Controller is a follow-on to the FEDERATED engine that allows connections to different database types, such as Microsoft SQL Server or Oracle. I am not sure how its development is going at the moment.
See: http://en.wikipedia.org/wiki/MySQL_DataController
or: https://launchpad.net/datacontroller