How does the Kafka connector work for PostgreSQL and MySQL databases?

I'm following the quick-start tutorial from here: quick-start-kafka-connect
The tutorial shows how to stream MySQL database table changes into a Kafka topic.
The only steps are to download everything, add a /tmp/kafka-connect-jdbc-source.json file with some config properties, and start it.
How does this work in the background?
1: Does it create a connection with the database and monitor the tables at specific intervals of time? OR
2: Does it use the replication log? (I don't know how this works)
3: Is the mechanism the same for MySQL and PostgreSQL?

Debezium monitors the database's transaction log (the oplog in MongoDB, the binlog in MySQL, the WAL in Postgres).
The Kafka Connect JDBC connector by Confluent (which you've linked to) can poll on a time interval, and that configuration is shared by all JDBC-compliant connections, MySQL and Postgres included.
For incremental query modes that use timestamps, the source connector uses the configuration timestamp.delay.interval.ms ...
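To make the polling mode concrete, a JDBC source connector configuration could look roughly like the sketch below; the connection URL, table, and column names are placeholders, not taken from the tutorial:

{
  "name": "jdbc-source-example",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/mydb?user=connect&password=secret",
    "table.whitelist": "orders",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "updated_at",
    "incrementing.column.name": "id",
    "poll.interval.ms": "5000",
    "timestamp.delay.interval.ms": "1000",
    "topic.prefix": "jdbc-"
  }
}

With a config like this the connector simply re-runs a SELECT against the table every poll.interval.ms and uses the timestamp and incrementing columns to pick up only new or changed rows; no replication log is involved.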
replication log? (I don't know how this works)
You can find the Debezium guide here, but the mechanism differs for Mongo, Postgres, MySQL, MSSQL, Oracle, etc.

Related

Is there any way to sync a MySQL function from a Kafka MySQL source connector to a Kafka Oracle sink connector?

I'm using Kafka connectors for CDC from MySQL to an Oracle database.
This is my server specification:
- MySQL : 8.0
- OS: centos 7.8(64-bit)
- Oracle : 18c xe
- OS: Oracle Linux 7.6(64-bit)
- Source Connector : debezium-connector-mysql-1.9.5
- Sink Connector : confluentinc-kafka-connect-jdbc-10.5.0
I am trying to sync two different databases, MySQL and Oracle.
I have succeeded in creating a table and inserting rows in MySQL and syncing them to Oracle.
However, when a function is created, a topic is created by reading the binlog, but it is not sunk to Oracle.
When the Confluent sink connector sinks data, it appears to run SQL by referring to a table-based topic.
But when a function is created, no separate topic is generated apart from the binlog topic, so the sink does not seem to work properly.
Is there any way to transfer database functions using CDC?
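For context, a JDBC sink connector of the kind described above maps each table topic to a table and writes the row data with SQL. A minimal sketch of such a sink configuration is shown below; the connection details and topic name are placeholders, and this only illustrates the table-per-topic behaviour rather than a fix for the function issue:

{
  "name": "oracle-jdbc-sink-example",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:oracle:thin:@oracle-host:1521/XE",
    "connection.user": "app",
    "connection.password": "secret",
    "topics": "dbserver1.mydb.orders",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "auto.create": "true"
  }
}

This matches the observation in the question: the sink consumes row-change events from per-table topics, and a CREATE FUNCTION statement does not produce such a topic.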

How to feed Hadoop periodically

My company generates around 6 million records per day, and I have seen that Hadoop is a good solution for handling large amounts of data. I found how to load data from MySQL, but it exports the full database. Is there a way to keep data in sync between my operational MySQL DB and Hadoop?
One of the best solutions you can use is Debezium. Debezium is built on top of the Apache Kafka Connect API and provides connectors that monitor specific databases.
It records all row-level changes within each database table in a change event stream, and applications simply read these streams to see the change events in the same order in which they occurred.
The architecture will look something like this:
MySQL --> Debezium(Kafka Connect Plugin) --> Kafka Topic --> HDFS Sink
You can find more information and documentation about Debezium Here.
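As a rough illustration of the Debezium piece, registering a MySQL source connector could look something like the sketch below; hostnames, credentials, and database names are placeholders, and the property names follow the Debezium MySQL connector 1.x series:

{
  "name": "mysql-cdc-example",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-host",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "secret",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}

Each table then gets its own topic, named like dbserver1.inventory.<table>, which an HDFS sink connector can consume.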
Furthermore, Apache NiFi has a processor named CaptureChangeMySQL. You can design a NiFi flow like the one below to do this:
MySQL --> CaptureChangeMySQL(Processor) --> PutHDFS(Processor)
You can read more about CaptureChangeMySQL Here.
There are multiple solutions available; you may need to choose among them as per your architectural requirements or deployment setup.
Debezium:
Debezium is a distributed platform deployed via Apache Kafka Connect that can continuously monitor the various databases of your system and let applications stream every row-level change in the same order it was committed to the database. It turns your existing databases into event streams, whereby applications can see and respond to each row-level change in the databases.
Kafka Connect is a framework and runtime for implementing and operating source connectors such as Debezium, which ingest data into Kafka, and sink connectors, which propagate data from Kafka topics into other systems.
In the case of MySQL, Debezium's MySQL connector can monitor and record all of the row-level changes in the databases on a MySQL server. The events for each table are recorded in a separate Kafka topic, and client applications can read the Kafka topics that correspond to the database tables they are interested in, reacting to every row-level event they see in those topics as required.
Once the data of interest is available in topics, the Kafka Connect HDFS Sink connector can be used to export the data from Kafka topics to HDFS files in a variety of formats to suit your use case or requirements, and it integrates with Hive when that is enabled. This connector periodically polls data from Apache Kafka and writes it to HDFS. It also automatically creates an external Hive partitioned table for each Kafka topic and updates the table according to the available data in HDFS.
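For illustration, an HDFS sink configuration for the Confluent connector could look roughly like this; the topic, HDFS URL, and Hive metastore URI are placeholders:

{
  "name": "hdfs-sink-example",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.inventory.orders",
    "hdfs.url": "hdfs://namenode:8020",
    "flush.size": "1000",
    "hive.integration": "true",
    "hive.metastore.uris": "thrift://hive-metastore:9083",
    "schema.compatibility": "BACKWARD"
  }
}

Note that enabling hive.integration requires a schema.compatibility setting other than NONE.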
Maxwell's daemon:
Maxwell's daemon is a CDC (Change Data Capture) application that reads MySQL binlogs (events from the MySQL database) and writes row updates as JSON to Kafka or other streaming platforms. Once the data of interest is available in Kafka topics, the Kafka Connect HDFS Sink connector can be used to export the data from Kafka topics to HDFS files.
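A typical way to point Maxwell at MySQL and a Kafka broker is a command along these lines; the host, credentials, and broker address are placeholders, and the exact flags should be checked against the Maxwell documentation for your version:

bin/maxwell --user=maxwell --password=secret --host=mysql-host \
  --producer=kafka --kafka.bootstrap.servers=kafka:9092

Each row change then arrives in Kafka as a JSON message that the HDFS sink connector can pick up.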
NiFi:
Apache NiFi helps automate the flow of data between systems. An Apache NiFi CDC (Change Data Capture) flow also uses MySQL binlogs (via CaptureChangeMySQL) to create a copy of a table and ensures that it is in sync with row-level changes at the source. This can in turn be fed to NiFi's PutHDFS processor to write the data to HDFS.

Database sync between SQL Server and MySQL

I have two different applications which run on different databases, SQL Server and MySQL. Now I want real-time data sync between SQL Server and MySQL - can anybody please advise? I have already tried data migration, but it only copies the data; it does not keep the data in sync.
Take a look at StreamSets. It is open source and can read the SQL Server change tracking (CT) tables and turn them into inserts, updates, and deletes to any database with a JDBC driver, including MySQL. It can also use the MySQL binlog to build data pipelines in the other direction.

How to back up/restore a single database running on MySQL 5.0 from one server to another available server?

I am currently using MySQL Server 5.0 with the InnoDB storage engine. I want to back up a database from a source server and restore it to one of the available destination servers.
Option 1: Use the innodb_file_per_table option in my.cnf and try to copy the table's .ibd file to the other server and recover it. I saw examples on other websites where this was supported in MySQL 5.6, but I am not sure whether it is supported in MySQL Server 5.0. I tried the steps given in https://dev.mysql.com/doc/refman/5.6/en/innodb-migration.html, but that did not work for me.
Option 2: Use mysqldump to get a dump of the database and use mysqlimport on the destination to perform the MySQL export/import. However, by doing so, I need to lock the database at the source before performing the export, which can block incoming requests to the source database while the mysqldump is in progress (a rough sketch of these commands appears below).
I am still exploring other options, but I am not sure whether option 1 is not viable because of MySQL version 5.0 or because I am missing something.
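For reference, the option 2 dump-and-restore would look roughly like the following; the database name and credentials are placeholders, and the restore here uses the mysql client rather than mysqlimport (which expects tab-delimited files):

# On the source server; for InnoDB tables, --single-transaction takes a
# consistent snapshot without holding a global lock for the whole dump
mysqldump --single-transaction -u root -p mydb > mydb.sql

# On the destination server
mysql -u root -p -e "CREATE DATABASE mydb"
mysql -u root -p mydb < mydb.sql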
http://dev.mysql.com/doc/refman/5.7/en/replication-howto.html
You are talking about replication.
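In outline, classic MySQL replication (available in 5.0 as well) is set up by enabling the binary log on the source and pointing the replica at it. A minimal sketch with placeholder host, credentials, and binlog coordinates:

-- On the source: enable log-bin and set a unique server-id in my.cnf,
-- then create a replication user:
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'secret';

-- On the replica, after loading an initial dump of the source:
CHANGE MASTER TO
  MASTER_HOST='source-host',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;

-- Verify with:
SHOW SLAVE STATUS\G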

Is it possible to downgrade an AWS RDS instance from MySQL 5.7 to a lower version (say 5.6)?

This is something I need to figure out. My company runs a number of production RDS instances on AWS. Some of the MySQL RDS instances run 5.7, and I need to downgrade MySQL to 5.6 or 5.5. Is this functionality provided by AWS?
Scenario: a MySQL server is already up and running with MySQL version 5.7. Downgrade this to 5.6.
-> If this is possible, what are the possible ways?
-> How to do this?
This is not something that AWS provides out of the box; however, it can be solved with the two approaches below, depending on your database size and the downtime that you can accept.
It might be worth considering fixing application compatibility instead of downgrading the DB, which is the riskier operation.
1. Dump, restore and switch method
Dump your currently running database with the mysqldump utility. Start a new RDS instance with the downgraded engine and load your dumped data into it. Switch your application to use the RDS instance with the downgraded engine.
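A bare-bones version of this, with placeholder endpoints, database name, and credentials, is a straight pipe from the old instance to the new one:

mysqldump -h old-57-instance.xxxxxx.us-east-1.rds.amazonaws.com -u admin -pSRC_PASSWORD \
  --single-transaction --routines --triggers mydb \
  | mysql -h new-56-instance.xxxxxx.us-east-1.rds.amazonaws.com -u admin -pDEST_PASSWORD mydb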
2. Dump, restore, replicate & switch method
Dump your currently running database with the mysqldump utility. Start a new RDS instance with the downgraded MySQL engine and load your dumped data into it.
Set the new, downgraded DB instance as a read replica of your old DB instance using mysql.rds_set_external_master, and then start replication using mysql.rds_start_replication. Stop writes to your original DB; once the read replica catches up (you must monitor replication lag), run mysql.rds_reset_external_master, which will promote your downgraded instance and turn off replication. Point your application to the downgraded RDS DB instance.
Method 2 will shorten your downtime to a minimum, but it is a bit more complex to execute. Here is a command reference to get familiar with that will help you succeed: MySQL on Amazon RDS SQL Reference
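A rough sketch of the replication calls on the new, downgraded instance, with placeholder endpoint, credentials, and binlog coordinates (the exact arguments of these RDS stored procedures are listed in the reference above):

-- Run on the new (downgraded) RDS instance
CALL mysql.rds_set_external_master(
  'old-instance.xxxxxx.us-east-1.rds.amazonaws.com', 3306,
  'repl_user', 'repl_password',
  'mysql-bin-changelog.000001', 4, 0);
CALL mysql.rds_start_replication;

-- After stopping writes to the old DB and waiting for replication lag to hit zero
CALL mysql.rds_reset_external_master;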
You will also find a great number of examples in the RDS documentation: Importing and Exporting Data From a MySQL DB Instance.