Debezium - MySQL manually change binlog position from connect-offsets

I have a running Debezium setup doing CDC from MySQL. Now I want to create one more MySQL connector for another MySQL server, but I don't want a snapshot of the existing data; I want the new connector to start from a specific binlog file and position.
I read some questions on Stack Overflow that suggested manually inserting a record into the connect-offsets topic. But if I do that, what will happen to my existing setup?
On a test server I tried that approach, but it did not work:
kafka-console-producer --broker-list localhost:9092 --topic connect-offsets
>{"file":"mysql-bin.000002","pos":2012}
>[2019-12-30 05:43:52,666] WARN [Producer clientId=console-producer] Got error produce response with correlation id 4 on topic-partition connect-offsets-5. Error: CORRUPT_MESSAGE (org.apache.kafka.clients.producer.internals.Sender)
[2019-12-30 05:43:52,767] WARN [Producer clientId=console-producer] Got error produce response with correlation id 5 on topic-partition connect-offsets-5, Error: CORRUPT_MESSAGE (org.apache.kafka.clients.producer.internals.Sender)
[2019-12-30 05:43:52,870] WARN [Producer clientId=console-producer] Got error produce response with correlation id 6 on topic-partition connect-offsets-5, Error: CORRUPT_MESSAGE (org.apache.kafka.clients.producer.internals.Sender)
[2019-12-30 05:43:52,975] ERROR Error when sending message to topic connect-offsets with key: null, value: 38 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, has a null key for a compacted topic, or is otherwise corrupt.
I'm not sure how to achieve this. Can somebody help me with this?

Records in the connect-offsets topic are used for connector offset management. For the Debezium MySQL connector, each record's key contains the connector name and the MySQL server name that you configured in your connector configuration.
So if you want to produce a record into that topic, you need to supply that key as well; only then will the Debezium connector be able to read those offsets. The CORRUPT_MESSAGE errors above are expected, because connect-offsets is a compacted topic and it rejects records that have no key.
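As a rough sketch, producing such a keyed record could look like the following, assuming the new connector is named new-mysql-connector and its database.server.name is dbserver2 (both placeholders), and that the worker stores offsets as schemaless JSON (the default internal converter setup):
# Key   = ["<connector name>",{"server":"<database.server.name>"}]
# Value = the binlog position you want the connector to start from
kafka-console-producer --broker-list localhost:9092 --topic connect-offsets \
  --property "parse.key=true" --property "key.separator=|"
>["new-mysql-connector",{"server":"dbserver2"}]|{"file":"mysql-bin.000002","pos":2012}
Produce the record before registering the new connector (or while it is stopped). Your existing connector writes offsets under its own key, so it is not affected.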

Related

AWS Aurora MySQL Blue/Green Deployment replication failure

I wanted to use the new fully managed Blue/Green deployments to upgrade Aurora 1 databases (MySQL 5.6) to Aurora 2 (MySQL 5.7). Though it worked great on one pre-production environment, the replication fails on other environments, including production. The replication fails with errors like:
Read Replica Replication Error - SQLError: 1032, reason: Could not execute Delete_rows event on table my_schema.my_table; Can't find record in 'my_table', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin-changelog.000122, end_log_pos 2040
On other instances there were duplicate entries for the primary key (error code 1062). I tried ROW and MIXED as binlog_format. AWS support's recommendation is to solve those problems manually, which seems impractical. For debugging purposes, I tried to read the binlog with the mysqlbinlog utility, which led to the following error:
ERROR: Could not construct log event object: Found invalid event in binary log
Did anybody encounter similar issues? Is there a way to get more insight into these errors and solve them?
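For reference, since there is no filesystem access on RDS/Aurora, reading the binlog with mysqlbinlog has to go over the replication protocol; an invocation along these lines (endpoint, user, and file name are placeholders) can be used to pull and decode the file:
# Download the binlog file from the Aurora endpoint over the replication protocol
mysqlbinlog --read-from-remote-server \
  --host=my-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com --port=3306 \
  --user=admin --password --raw --result-file=/tmp/ mysql-bin-changelog.000122
# Decode the local copy row by row, verifying event checksums
mysqlbinlog --verify-binlog-checksum --base64-output=decode-rows -vv \
  /tmp/mysql-bin-changelog.000122 > /tmp/changelog-000122.txt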

Cannot start debezium MySQL connector due to Error code 1236

When I check the status of my Debezium connector via the Kafka Connect REST API, I see this error message for the connector:
org.apache.kafka.connect.errors.ConnectException: The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. Error code: 1236; SQLSTATE: HY000.
at io.debezium.connector.mysql.AbstractReader.wrap(AbstractReader.java:230)
at io.debezium.connector.mysql.AbstractReader.failed(AbstractReader.java:197)
at io.debezium.connector.mysql.BinlogReader$ReaderThreadLifecycleListener.onCommunicationFailure(BinlogReader.java:997)
at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:950)
at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:580)
at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:825)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.github.shyiko.mysql.binlog.network.ServerException: The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.
at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:914)
... 3 more
Is this an issue with how I am configuring my Debezium connector, or an issue with MySQL? What's crazy is that even when I set the option snapshot.mode to never, this error is still thrown! According to the documentation, when snapshot.mode is set to either never or when_needed it should not require the GTID, so I am super confused about what is happening.
The problem is that Debezium was probably down for some time, and some of the transactions it has not yet seen are no longer available on the server.
That could be an issue with wrong offsets for the connector.
So I deleted the connector, deleted all related Kafka topics (like the schema history topic, etc.), and cleaned the offsets using the following guide: https://debezium.io/documentation/faq/#how_to_remove_committed_offsets_for_a_connector
And it helped! After re-creation, the connector works as expected.
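For reference, those steps are roughly the following (connector name, topic name, and hosts are placeholders; the offset cleanup itself is described in the FAQ linked above):
# Remove the failed connector
curl -X DELETE http://localhost:8083/connectors/my-mysql-connector
# Delete its database history topic so it can be rebuilt from scratch
kafka-topics --bootstrap-server localhost:9092 --delete --topic dbhistory.my-mysql-connector
# After clearing the committed offsets (see the Debezium FAQ), re-create the connector
curl -X POST -H "Content-Type: application/json" \
  --data @my-mysql-connector.json http://localhost:8083/connectors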

Kafka connect setup to send record from Aurora using AWS MSK

I have to send records from Aurora/MySQL to MSK and from there to the Elasticsearch service:
Aurora --> Kafka Connect ---> AWS MSK ---> Kafka Connect ---> Elasticsearch
The records in the Aurora table are structured something like this, and I think each record will arrive in AWS MSK in this format:
"o36347-5d17-136a-9749-Oe46464",0,"NEW_CASE","WRLDCHK","o36347-5d17-136a-9749-Oe46464","<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?><caseCreatedPayload><batchDetails/>","CASE",08-JUL-17 10.02.32.217000000 PM,"TIME","UTC","ON","0a348753-5d1e-17a2-9749-3345,MN4,","","0a348753-5d1e-17af-9749-FGFDGDFV","EOUHEORHOE","2454-5d17-138e-9749-setwr23424","","","",,"","",""
So, in order for Elasticsearch to consume the records, I need to use a proper schema, which I think means I have to use Schema Registry.
My questions:
Question 1
How should I use Schema Registry for the above type of message? Is Schema Registry required?
Do I have to create a JSON structure for this, and if yes, where do I keep it?
I need more help here to understand this.
I have edited
vim /usr/local/confluent/etc/schema-registry/schema-registry.properties
I mentioned the ZooKeeper address, but I did not understand what kafkastore.topic=_schema is.
How do I link this to a custom schema?
When I started it, I got this error:
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Topic _schemas not present in metadata after 60000 ms.
which I was expecting, because I have not done anything about the schema yet.
I do have the JDBC connector installed, and when I start it I get the error below:
Invalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123 for configuration Couldn't open connection to jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123
Invalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123 for configuration Couldn't open connection to jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
Question 2
Can I create two connectors on one EC2 instance (the JDBC one and the Elasticsearch one)? If yes, do I have to start both in separate CLIs?
Question 3
When I open vim /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties
I see only property values like the ones below:
name=test-source-sqlite-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123
mode=incrementing
incrementing.column.name=id
topic.prefix=trf-aurora-fspaudit-
In the above properties file, where can I specify the schema name and table name?
Based on the answer, I am updating my configuration for Kafka Connect JDBC:
---------------start JDBC connect elastic search -----------------------------
wget http://packages.confluent.io/archive/5.2/confluent-5.2.0-2.11.tar.gz -P ~/Downloads/
tar -zxvf ~/Downloads/confluent-5.2.0-2.11.tar.gz -C ~/Downloads/
sudo mv ~/Downloads/confluent-5.2.0 /usr/local/confluent
wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.48.tar.gz
tar -xzf mysql-connector-java-5.1.48.tar.gz
sudo mv mysql-connector-java-5.1.48 /usr/local/confluent/share/java/kafka-connect-jdbc
And then
vim /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties
Then I modified the properties below:
connection.url=jdbc:mysql://fdgfgdfgrter.us-east-1.rds.amazonaws.com:3306/trf
mode=incrementing
connection.user=admin
connection.password=Welcome123
table.whitelist=PANStatementInstanceLog
schema.pattern=dbo
Lastly, I modified
vim /usr/local/confluent/etc/kafka/connect-standalone.properties
and here I modified the properties below:
bootstrap.servers=b-3.205147-ertrtr.erer.c5.ertert.us-east-1.amazonaws.com:9092,b-6.ertert-riskaudit.ertet.c5.kafka.us-east-1.amazonaws.com:9092,b-1.ertert-riskaudit.ertert.c5.kafka.us-east-1.amazonaws.com:9092
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/usr/local/confluent/share/java
When I list the topics, I do not see any topic listed for the table name.
Stack trace for the error message
[2020-01-03 07:40:57,169] ERROR Failed to create job for /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties (org.apache.kafka.connect.cli.ConnectStandalone:108)
[2020-01-03 07:40:57,169] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:119)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid and contains the following 2 error(s):
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:79)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:66)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:116)
Caused by: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid and contains the following 2 error(s):
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
at org.apache.kafka.connect.runtime.AbstractHerder.maybeAddConfigErrors(AbstractHerder.java:423)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:188)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:113)
curl -X POST -H "Accept:application/json" -H "Content-Type:application/json" IPaddressOfKCnode:8083/connectors/ -d '{"name": "emp-connector", "config": { "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector", "tasks.max": "1", "connection.url": "jdbc:mysql://IPaddressOfLocalMachine:3306/test_db?user=root&password=pwd","table.whitelist": "emp","mode": "timestamp","topic.prefix": "mysql-" } }'
Is Schema Registry required?
No. You can enable schemas in JSON records; the JDBC source can create them for you based on the table information:
value.converter=org.apache.kafka...JsonConverter
value.converter.schemas.enable=true
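Spelled out, a minimal worker-level setting along those lines (using Kafka's standard JSON converter class) would be:
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true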
I mentioned the ZooKeeper address, but I did not understand what kafkastore.topic=_schema is
If you want to use Schema Registry, you should be using kafkastore.bootstrap.servers with the Kafka address, not ZooKeeper. So remove kafkastore.connection.url.
Please read the docs for explanations of all the properties.
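A minimal schema-registry.properties along those lines might look like this (the broker addresses are placeholders for your MSK brokers):
listeners=http://0.0.0.0:8081
kafkastore.bootstrap.servers=PLAINTEXT://b-1.your-msk-cluster:9092,PLAINTEXT://b-2.your-msk-cluster:9092
kafkastore.topic=_schemas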
I did not do anything about the schema.
Doesn't matter. The schemas topic gets created when the Registry first starts.
Can I create two connectors on one EC2 instance?
Yes (ignoring available JVM heap space). Again, this is detailed in the Kafka Connect documentation.
In standalone mode, you first pass the Connect worker configuration, then up to N connector property files, in one command.
In distributed mode, you use the Kafka Connect REST API.
https://docs.confluent.io/current/connect/managing/configuring.html
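As a rough illustration (the connector file names are placeholders):
# Standalone: worker config first, then one or more connector property files
connect-standalone /usr/local/confluent/etc/kafka/connect-standalone.properties \
  jdbc-source.properties elasticsearch-sink.properties
# Distributed: start the worker, then submit each connector over REST
connect-distributed /usr/local/confluent/etc/kafka/connect-distributed.properties
curl -X POST -H "Content-Type: application/json" \
  --data @jdbc-source.json http://localhost:8083/connectors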
When I open vim /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties
First of all, that file is for SQLite, not MySQL/Postgres. You don't need to use the quickstart files; they are only there for reference.
Again, all the properties are well documented:
https://docs.confluent.io/current/connect/kafka-connect-jdbc/index.html#connect-jdbc
I do have the JDBC connector installed and when I start it I get the error below
Here's more information about how you can debug that:
https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/
As stated before, I would personally suggest using Debezium/CDC where possible
Debezium Connector for RDS Aurora
I'm guessing that you're planning to use Avro to transfer data, so don't forget to specify the AvroConverter as the default converter when you start up your Kafka Connect workers. If you use JSON then Schema Registry is not needed.
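For example, the worker-level converter settings for Avro would look roughly like this (the Schema Registry URL is a placeholder):
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081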
1.1 kafkastore.topic=_schema
Have you started your own Schema Registry? When you start Schema Registry you have to specify the "schemas" topic. Basically, this topic is used by Schema Registry to store the schemas it registers, and in case of a failure it can recover them from there.
1.2 JDBC connector installed and when I start it I get the error below
By default, JDBC Connector only works with SQLite and PostgreSQL. If you would like it to work with a MySQL database then you should add the MySQL Driver to the classpath as well.
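A minimal sketch of that step, reusing the paths from the question (the exact jar name inside the driver tarball may differ by version):
# Put the MySQL JDBC driver jar next to the kafka-connect-jdbc jars, then restart the worker
cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48.jar \
  /usr/local/confluent/share/java/kafka-connect-jdbc/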
2. It depends on how you are deploying your Kafka Connect workers. If you go for distributed mode (recommended), then you don't really need separate CLIs. You can deploy your connectors through the Kafka Connect REST API.
3. There is another property called table.whitelist in which you can specify your schemas and tables, e.g.: table.whitelist=users,products,transactions

Debezium CDC Connector task having error : javax.management.InstanceAlreadyExistsException

I had my Debezium MySQL source connector working on Kafka. I added another Debezium MySQL source connector using the same database but with different data formats. As a result, my first connector started showing the following error:
[2019-07-11 10:29:09,125] ERROR WorkerSourceTask{id=debezium-connector-0} Task threw an uncaught and unrecoverable exception
(org.apache.kafka.connect.runtime.WorkerTask:177)
org.apache.kafka.connect.errors.ConnectException: Encountered change event for table db.user whose schema isn't known to this connector
at io.debezium.connector.mysql.AbstractReader.wrap(AbstractReader.java:230)
at io.debezium.connector.mysql.AbstractReader.failed(AbstractReader.java:208)
at io.debezium.connector.mysql.BinlogReader.handleEvent(BinlogReader.java:508)
at com.github.shyiko.mysql.binlog.BinaryLogClient.notifyEventListeners(BinaryLogClient.java:1095)
at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:943)
at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:580)
at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:825)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.ConnectException: Encountered change event for table db.user whose schema isn't known to this connector
at io.debezium.connector.mysql.BinlogReader.informAboutUnknownTableIfRequired(BinlogReader.java:758)
at io.debezium.connector.mysql.BinlogReader.handleUpdateTableMetadata(BinlogReader.java:733)
at io.debezium.connector.mysql.BinlogReader.handleEvent(BinlogReader.java:492)
... 5 more
[2019-07-11 10:29:09,125] ERROR WorkerSourceTask{id=debezium-connector-krazybee-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:178)
[2019-07-11 10:29:09,125] INFO Stopping MySQL connector task (io.debezium.connector.mysql.MySqlConnectorTask:430)
[2019-07-11 10:29:09,125] INFO ChainedReader: Stopping the binlog reader (io.debezium.connector.mysql.ChainedReader:121)
[2019-07-11 10:29:09,126] INFO Discarding 0 unsent record(s) due to the connector shutting down (io.debezium.connector.mysql.BinlogReader:129)
I have restarted the Debezium connector using the REST API.
To the best of my knowledge the connector has a mismatch in its database history schema, but I am unable to figure out how to correct it without deleting the existing connector.
I also reloaded the existing connector with the previous values using a PUT request, but to no avail.
I believe you are using the same database.history.kafka.topic for both connectors. You should use a unique topic for each connector instance.
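For example, when two connectors capture the same database, each one needs at least its own name, database.server.name, and history topic (the values below are illustrative, with the rest of the configuration omitted):
# Connector 1
name=mysql-connector-a
database.server.name=dbserver-a
database.history.kafka.topic=dbhistory.mysql-connector-a
# Connector 2
name=mysql-connector-b
database.server.name=dbserver-b
database.history.kafka.topic=dbhistory.mysql-connector-b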

Kafka Connect - Failed to flush, timed out / Failed to commit offsets

I am getting the following error:
"ERROR WorkerSourceTask(id=test-mysql-dbc-source-0) Failed to flush, timed out while waiting for producer to flush outstanding N messages.
ERROR Failed to commit offsets. (org.apache.kafka.connect.runtime.WorkerSourceTask:XXX)"
The Setup:
I have an EC2 instance in AWS (t2.medium, 2 cores, 4 GB RAM) which serves as the Kafka server. Another EC2 instance has a sandbox MySQL database and Kafka Connect with the Confluent JDBC source connector. A Python script randomly inserts a couple of rows into a table to simulate some activity. On my laptop I opened a Kafka console consumer to read the messages from the Kafka server.
Ports 22, 3306, 9092, 2888 are open to all IP addresses on both EC2 instances.
Below are the config files I used for the Kafka Connect source:
connect-standalone.properties
bootstrap.servers=FIRST_EC2_PUBLIC_IP:9092
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
acks=1
compression.type=lz4
plugin.path=/usr/share/java
jdbc_source.properties
name=test-mysql-jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://localhost:3306/DB_NAME
connection.user=root
connection.password=DB_PASSWORD
table.whitelist=TEST_TABLE
mode=timestamp+incrementing
incrementing.column.name=ID_RECORD
timestamp.column.name=TIMESTMP
topic.prefix=mysql-source-
acks=1
compression.type=lz4
I tried experimenting with various settings and options. Mostly I played around with the offset.flush.timeout.ms and buffer.memory options, as advised in this link (the sketch below shows where such settings go).
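For context, overrides of that kind live in the Connect worker properties, and producer settings need the producer. prefix (the values below are only examples, not a recommendation):
# Give the worker more time to flush outstanding source records before the offset commit times out
offset.flush.timeout.ms=60000
# Override the underlying producer's buffer via the producer. prefix
producer.buffer.memory=67108864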
The Behavior of Producer:
So basically, after starting the producer on my second EC2 instance, I can see the messages in the Kafka console consumer on my laptop, so it works. I can see new records as they appear for some time. However, very often when a new row is created in the table, the producer just does not push it to the Kafka server (the first EC2 instance) and throws the above-mentioned error for about 5 to 20 minutes. The number of outstanding messages doesn't get big; most of the time it is 2-6 messages. After throwing the error for 5-20 minutes it finally sends the data, the console consumer consumes it, and everything works fine for a while, after which the error appears again.
If I manually stop the producer and start it again, the outstanding messages are flushed instantly and can be seen in the console consumer on my laptop.
Could you please point me to where the problem might be and what could cause such behavior?