Streaming replication from PostgreSQL to MySQL/MemSQL

I have two database systems currently in use in production: the main PostgreSQL database, and another database (MemSQL) used primarily for analytical purposes.
Is there a way to set up streaming replication of some of the tables in PostgreSQL to MemSQL rowstores (MemSQL being a MySQL-compatible database), assuming the tables on both databases have the same schema?

I do not think that there is a tool for that (yet) but there is one for limited MySQL to Postgres replication:
https://github.com/the4thdoctor/pg_chameleon
Maybe this can be adapted to work the other way.
And you could always build your own little data shovel using the NOTIFY/LISTEN feature of Postgres, which is probably only an option if you have a very simple set of data to copy.
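As a rough sketch of that approach (table and channel names here are invented, not from any existing setup): a trigger publishes each changed row as JSON on a notification channel, and a small daemon listening on that channel replays the payloads as writes against the MemSQL side.

```sql
-- Hypothetical table/channel names; adjust to your schema.
CREATE OR REPLACE FUNCTION notify_row_change() RETURNS trigger AS $$
BEGIN
  -- Payloads are limited to about 8000 bytes, so this only suits small rows.
  PERFORM pg_notify('table_changes', row_to_json(NEW)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_notify
AFTER INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE PROCEDURE notify_row_change();
```

Note that NOTIFY is fire-and-forget: if the listener is down, notifications are lost, so this is only a sketch of the idea, not a durable replication mechanism.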

Related

Query data from database for 2 different server

I want to query data from two different database servers using MySQL. Is there a way to do that without creating a federated database, as Google Cloud Platform does not support the Federated engine?
Thanks!
In addition to @MontyPython's excellent response, there is a third, albeit a bit cumbersome, way to do this if you cannot use the Federated engine and you also cannot manage your databases' replication.
Use an ETL tool to do the work
Back in the day, I faced a very similar problem: I had to join data from two separate database servers, neither of which I had any administrative access to. I ended up setting up Pentaho's ETL suite of tools to Extract data from both databases, Transform it (basically having Pentaho do a lot of work with both datasets) and Load it into my very own local database engine, where I ended up with exactly the merged and processed data I needed.
Be advised, this IS a lot of work (you have to "teach" your ETL tool what you need, and depending on which tool you use it may involve quite a bit of coding), but once you're done you can schedule the work to happen automatically at regular intervals, so you always have your local processed/merged data readily accessible.
FWIW, I used Pentaho's community edition, so it was free as in beer.
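The extract-transform-load pattern described above can be sketched in a few lines of plain Python. Here sqlite stands in for the two remote source servers and the local target, and all table and column names are invented for illustration:

```python
import sqlite3

# Stand-ins for the two remote servers (in reality: two separate connections).
src_a = sqlite3.connect(":memory:")
src_b = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

src_a.execute("CREATE TABLE users (id INTEGER, name TEXT)")
src_a.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
src_b.execute("CREATE TABLE orders (user_id INTEGER, total REAL)")
src_b.executemany("INSERT INTO orders VALUES (?, ?)",
                  [(1, 10.0), (1, 5.5), (2, 7.25)])

# Extract: pull the raw rows from each source.
users = src_a.execute("SELECT id, name FROM users").fetchall()
orders = src_b.execute("SELECT user_id, total FROM orders").fetchall()

# Transform: join and aggregate in application code.
totals = {}
for user_id, total in orders:
    totals[user_id] = totals.get(user_id, 0.0) + total
merged = [(uid, name, totals.get(uid, 0.0)) for uid, name in users]

# Load the merged result into the local reporting database.
target.execute("CREATE TABLE user_totals (id INTEGER, name TEXT, total REAL)")
target.executemany("INSERT INTO user_totals VALUES (?, ?, ?)", merged)
target.commit()
```

A real ETL tool adds scheduling, error handling and incremental loads on top of this, but the data flow is the same.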
You can achieve this in two ways, one you have already mentioned:
1. Use Federated Engine
You can see how it is done here - Join tables from two different server. This is a MySQL specific answer.
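For reference, a FEDERATED table is declared on the local server but points at a table on the remote one; the server, schema and column names below are placeholders:

```sql
CREATE TABLE remote_orders (
  id INT NOT NULL,
  total DECIMAL(10,2),
  PRIMARY KEY (id)
) ENGINE=FEDERATED
CONNECTION='mysql://repl_user:password@remote-host:3306/shop/orders';
```

Queries against remote_orders are forwarded to shop.orders on remote-host, so you can join it with local tables as if it were local.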
2. Set up Multi-source Replication on another server and query that server
You can easily set up Multi-source Replication using Replication channels
Check out their official documentation here - https://dev.mysql.com/doc/refman/8.0/en/replication-multi-source-tutorials.html
If you have an older version of MySQL where Replication channels are not available, you may use one of the many third-party replicators like Tungsten Replicator.
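As a sketch of the channel setup on the replica (host names and credentials are placeholders; newer MySQL versions spell this CHANGE REPLICATION SOURCE TO / START REPLICA):

```sql
-- Each source gets its own named channel.
CHANGE MASTER TO
  MASTER_HOST='source1.example.com', MASTER_USER='repl',
  MASTER_PASSWORD='secret', MASTER_AUTO_POSITION=1
  FOR CHANNEL 'source_1';

CHANGE MASTER TO
  MASTER_HOST='source2.example.com', MASTER_USER='repl',
  MASTER_PASSWORD='secret', MASTER_AUTO_POSITION=1
  FOR CHANNEL 'source_2';

START SLAVE FOR CHANNEL 'source_1';
START SLAVE FOR CHANNEL 'source_2';
```

You then query the replica, which holds the data from both sources.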
P.S. - MySQL has no equivalent of PostgreSQL's foreign data wrappers (FDW). Joins across servers are easily possible in some other database management systems, but not in MySQL.

Syncing/Streaming MySQL Table/Tables(Joined Tables) with PostgreSQL Table/Tables

I have one MySQL server and one PostgreSQL server. I need to replicate or re-insert a set of data from multiple MySQL tables, streamed/synced to the PostgreSQL tables. This replication can be based on time (sync) or on an event such as a new insert in a table (stream).
I tried the replication tools below, but all of them can only sync table to table. They do not allow choosing columns from different tables of the source database (MySQL) and inserting them into different tables in the destination database (PostgreSQL).
Symmetricds
dbconvert
pgloader
Postgresql FDW
Now I have to write an application to query the data from MySQL and insert it into PostgreSQL as a cron job. This is cumbersome and error-prone, and it cannot stream the data (event-based) for real-time replication. It would be great if some tool already solved this problem.
Please let me know if there is an open-source library or tool that can do this for me.
Thanks in advance.
To achieve a replication with one of tools you proposed you can do the following:
Create a separate schema in PostgreSQL and add views so that they completely copy the table structure of MySQL. You will then add rules or triggers to the views to handle inserts/updates/deletes and redirect them to the tables of your choice.
This way you have the complete freedom to transform your data during the replication, yet still use the common tools.
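A minimal sketch of that trick, with invented schema and table names: an updatable view mirrors the MySQL table, and an INSTEAD OF trigger reroutes incoming writes into the real target table, transforming them on the way.

```sql
-- Shadow schema mirroring the MySQL layout; the replication tool writes here.
CREATE SCHEMA mysql_mirror;
CREATE VIEW mysql_mirror.orders AS
  SELECT id, customer_id, total FROM real_schema.order_facts;

CREATE FUNCTION mysql_mirror.orders_redirect() RETURNS trigger AS $$
BEGIN
  -- Transform/split the row however you need before inserting.
  INSERT INTO real_schema.order_facts (id, customer_id, total)
  VALUES (NEW.id, NEW.customer_id, NEW.total);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_ins INSTEAD OF INSERT ON mysql_mirror.orders
FOR EACH ROW EXECUTE PROCEDURE mysql_mirror.orders_redirect();
```

You would add similar INSTEAD OF UPDATE/DELETE triggers for the other operations, then point the replication tool at the mysql_mirror schema.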
Maybe this tool can help you: https://github.com/the4thdoctor/pg_chameleon
Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7/3.5. The system relies on the mysql-replication library to pull the changes from MySQL and convert them into a jsonb object. A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.
The tool can initialise the replica by pulling the data out of MySQL, but this requires FLUSH TABLES WITH READ LOCK; to work properly.
The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.

Joining MySQL and Informix tables

I have a table in MySQL that I need to join with a couple of tables in a different server. The catch is that these other tables are in Informix.
I could make it work by selecting the content of a MySQL table and creating a temp table in Informix with the selected data, but I think in this case it would be too costly.
Is there an optimal way to join MySQL tables with Informix tables?
I faced a similar problem a number of years ago while developing a Rails app that needed to draw data from both an Informix and a MySQL database. What I ended up doing was using an ORM library that could connect to both databases, thereby abstracting away the fact that the data was coming from two different databases. Not sure if this will end up being a better technique than your proposed temp-table solution. A quick Google search also brought up this, which might be promising.
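Without an ORM, the same idea can be done by hand: pull each side into the application and join there. A sketch with sqlite standing in for the Informix and MySQL connections (all names invented), doing a simple application-side hash join:

```python
import sqlite3

informix = sqlite3.connect(":memory:")  # stand-in for the Informix connection
mysql = sqlite3.connect(":memory:")     # stand-in for the MySQL connection

informix.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
informix.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "acme"), (2, "globex")])
mysql.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
mysql.executemany("INSERT INTO invoices VALUES (?, ?)",
                  [(1, 100.0), (2, 250.0)])

# Hash join in application code: index the smaller side, then probe it.
names = dict(informix.execute("SELECT id, name FROM customers"))
joined = [(names[cid], amount)
          for cid, amount in mysql.execute(
              "SELECT customer_id, amount FROM invoices")
          if cid in names]
```

This only pays off when one side is small enough to hold in memory; otherwise the temp-table route is probably cheaper.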
This can sometimes be solved in the database management system with a technique called federation. The idea is that you create virtual tables in one of the two systems that makes queries to the other database system on demand.
For both MySQL and MariaDB there is the FederatedX storage engine that unfortunately only works with other MySQL/MariaDB systems. This is a fork of the older, but as far as I know unmaintained, Federated storage engine.
Some might also consider migrating to MariaDB where you can use the CONNECT storage engine which contains an ODBC client.
What I ended up doing is manually (that is, from the PHP app) keeping the MySQL tables in sync with their equivalents in Informix, so I didn't need to change older code. This is a temporary solution, given that the older system, which uses Informix, is going to be replaced.

Cassandra + Spark vs MySQL + Spark

I have to design a piece of software on a three layer architecture:
A process periodically polling a data source, such as an FTP server, to inject into a database
A database
Spark for the processing of the data
My data is simple and perfectly suitable for being stored in a single RDBMS table, or I can store it in Cassandra; periodically I would then need Spark to run some machine-learning algorithms on the whole set of data.
Which of the databases better suits my use case? In detail, I do not need to scale over multiple nodes, and I think the main underlying questions are:
Is simple querying (SELECT) faster on Cassandra or MySQL for a simple table?
Does the Spark connector for Cassandra benefit from features of it that would make it faster than a SQL connector?
You can use MySQL if the data size is less than 2 TB. SELECT on a MySQL table will be more flexible than in Cassandra.
You should use Cassandra when your data storage requirement exceeds a single machine. Cassandra needs careful data modeling for each lookup or SELECT scenario.
You can use suggested approach below for MySQL Spark Integration
How to work with MySQL and Apache Spark?
It all depends on the data: size, integrity, scale, flexible schema, sharding, etc.
Use MySQL if:
Data size is small (single-digit TBs)
Strong consistency (Atomicity, Consistency, Isolation & Durability) is required
Use Cassandra if:
Data size is huge and horizontal scalability is required
Eventual consistency (BASE: Basically Available, Soft state, Eventual consistency) is acceptable
A flexible schema is needed
The application is distributed
Have a look at this benchmarking article and this pdf
I think it's better to use a SQL database such as MySQL; Cassandra should only be used if you need to scale your data to bigger proportions and across many datacenters. The Java Cassandra JDBC driver is just a normal driver to connect to Cassandra; it doesn't have any special advantages over other database drivers.

Converting Mysql to No sql databases

I have a production database server running MySQL 5.1. Now we need to build a reporting app which will fetch data from the production database server. Since reporting queries over the entire database may slow it down, we are planning to switch to NoSQL. The whole system is running on the AWS stack, and we plan to use DynamoDB. Kindly suggest ways to sync data from the production database server to the NoSQL database server.
Just remember that most NoSQL databases used this way are essentially document databases; it's really difficult to automatically convert a typical relational database in MySQL to a good document design.
In NoSQL you have a single collection of documents, and each document will probably contain data that would be in related rows in multiple tables. The advantage of a NoSQL redesign is that most data access is simpler and faster without requiring you to write complex join statements.
If you automatically convert each MySQL table to a corresponding NoSQL collection, you really won't be taking advantage of a NoSQL DB. This is because you'll end up loading many more documents, and thus making many more calls to the database than needed, losing the simplicity and speed of a NoSQL DB.
Perhaps a better approach is to look at how your applications use the MySQL database and go from there. You might then consider writing a simple utility script knowing fully well your MySQL database design.
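Such a utility script typically denormalizes related rows into one document per entity rather than copying tables one-to-one. A sketch of that fold, with an invented two-table schema and sqlite standing in for MySQL:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the MySQL source
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
db.execute("INSERT INTO customers VALUES (1, 'alice')")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(10, 1, 9.99), (11, 1, 4.50)])

def customer_documents(db):
    """Fold each customer and its related orders into one nested document."""
    docs = []
    for cid, name in db.execute("SELECT id, name FROM customers"):
        orders = [{"id": oid, "total": total}
                  for oid, total in db.execute(
                      "SELECT id, total FROM orders WHERE customer_id = ?",
                      (cid,))]
        docs.append({"_id": cid, "name": name, "orders": orders})
    return docs

docs = customer_documents(db)
# Each document would then be written to the NoSQL store with a single put.
```

The per-customer query inside the loop is fine for a one-off migration; for large tables you would fetch all orders once and group them in memory instead.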
As the data in a NoSQL database like MongoDB, Riak or CouchDB has a very different structure than in a relational database like MySQL, the only way to migrate/synchronise the data is to actually write a job which writes the data from MySQL to the NoSQL database using SELECT queries, as stated on the MongoDB website:
Migrate the data from the database to MongoDB, probably simply by writing a bunch of SELECT * FROM statements against the database and then loading the data into your MongoDB model using the language of your choice.
Depending on the quantity of your data, this could take a while to process.
If you have any other questions, don't hesitate to ask.