I am using Debezium on a MySQL table to capture change logs to Kafka, with the Kafka Connect configuration below:
{
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "xxxx",
"database.password": "xxxx",
"database.server.id": "42",
"database.server.name": "xxxx",
"table.whitelist": "demo.movies",
"database.history.kafka.bootstrap.servers": "broker:9092",
"database.history.kafka.topic": "dbhistory.demo" ,
"decimal.handling.mode": "double",
"include.schema.changes": "true",
"transforms": "unwrap,dropTopicPrefix",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.dropTopicPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropTopicPrefix.regex":"asgard.demo.(.*)",
"transforms.dropTopicPrefix.replacement":"$1",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://schema-registry:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
}
However, it is sending all of the table's existing records to the Kafka topic.
Is there any way to read only the new changelog data?
The default behavior is to snapshot the table first (read all existing rows) and then stream new changes.
To read only new changes, add "snapshot.mode": "schema_only" to the connector configuration.
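For the configuration above that is a single extra property, for example (a minimal sketch; with schema_only the connector still records the table structure in its history topic, it just skips reading the existing rows):
{
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"snapshot.mode": "schema_only",
... rest of the original configuration unchanged ...
}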
Related
We are facing an issue with our new connector. We want to capture change events for a MySQL table, but we are seeing an error with the following trace:
org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
    at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:42)
    at io.debezium.connector.mysql.MySqlStreamingChangeEventSource.handleEvent(MySqlStreamingChangeEventSource.java:369)
    at io.debezium.connector.mysql.MySqlStreamingChangeEventSource.lambda$execute$25(MySqlStreamingChangeEventSource.java:860)
    at com.github.shyiko.mysql.binlog.BinaryLogClient.notifyEventListeners(BinaryLogClient.java:1125)
    at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:973)
    at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:599)
    at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:857)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.debezium.DebeziumException: Error processing binlog event
    ... 7 more
Caused by: io.debezium.DebeziumException: Encountered change event for table tablename whose schema isn't known to this connector
    at io.debezium.connector.mysql.MySqlStreamingChangeEventSource.informAboutUnknownTableIfRequired(MySqlStreamingChangeEventSource.java:654)
    at io.debezium.connector.mysql.MySqlStreamingChangeEventSource.handleUpdateTableMetadata(MySqlStreamingChangeEventSource.java:633)
    at io.debezium.connector.mysql.MySqlStreamingChangeEventSource.lambda$execute$13(MySqlStreamingChangeEventSource.java:831)
    at io.debezium.connector.mysql.MySqlStreamingChangeEventSource.handleEvent(MySqlStreamingChangeEventSource.java:349)
We suspect this issue started after downgrading from MySQL 8.0 to 5.7. We tried deleting the history topic and changing the snapshot mode configs, but haven't been able to solve it.
The properties are as follows:
{
"name": "speed-account-table-v3",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"snapshot.locking.mode": "none",
"topic.creation.default.partitions": "1",
"tasks.max": "1",
"database.history.consumer.sasl.jaas.config": "jass config",
"database.history.kafka.topic": "speed-history.speed-account-table-new",
"bootstrap.servers": "cluster name",
"database.history.consumer.security.protocol": "SASL_SSL",
"tombstones.on.delete": "true",
"snapshot.new.tables": "parallel",
"topic.creation.default.replication.factor": "2",
"database.history.skip.unparseable.ddl": "true",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"database.allowPublicKeyRetrieval": "true",
"database.history.producer.sasl.mechanism": "SCRAM-SHA-512",
"database.user": "username",
"database.server.id": "server id",
"database.history.producer.security.protocol": "SASL_SSL",
"database.history.kafka.bootstrap.servers": "cluster name",
"database.server.name": "speed-account-v3",
"database.port": "portnumber",
"key.converter.schemas.enable": "false",
"value.converter.schema.registry.url": "xxxx",
"database.hostname": "xxxxxx",
"database.password": "xxxxx",
"value.converter.schemas.enable": "false",
"name": "speed-account-table-v3",
"table.include.list": "speed.tbl_account",
"database.history.consumer.sasl.mechanism": "SCRAM-SHA-512",
"snapshot.mode": "initial",
"database.include.list": "speed"
}
}
We have tried changing our infrastructure, using a different Kafka Connect cluster as well as a completely different MSK cluster, but without success.
I want to extract the value of the email field from the users table in my MySQL database, but only when an INSERT operation happens.
Here is my configuration so far:
{
"name": "smartdevnewuserconnector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "root",
"database.password": "*PASSWORD*",
"database.server.id": "184055",
"database.allowPublicKeyRetrieval":"true",
"database.server.name": "smartdevdbserver1",
"database.include.list": "smartdev_db",
"database.history.kafka.bootstrap.servers": "kafka:29092",
"database.history.kafka.topic": "schema-changes.smartdev_db",
"table.whitelist": "smartdev_db.users",
"column.blacklist": "smartdev_db.users.id,smartdev_db.users.password, smartdev_db.users.fullName, smartdev_db.users.address, smartdev_db.users.phoneNo, smartdev_db.users.gender, smartdev_db.users.userRole, smartdev_db.users.User_status, smartdev_db.users.reason_for_inactive, smartdev_db.users.firstvisit, smartdev_db.users.last_changed_PW, smartdev_db.users.regDate",
"transforms": "unwrap, copyEmailValue",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.copyEmailValue.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.copyEmailValue.field": "email"
}
}
I have succeeded in extracting the email value, but I don't know how to handle the second part: extracting the email value only for INSERT operations.
Here is what I tried:
{
"name": "smartdevnewuserconnector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "root",
"database.password": "*PASSWORD*",
"database.server.id": "184055",
"database.allowPublicKeyRetrieval":"true",
"database.server.name": "smartdevdbserver1",
"database.include.list": "smartdev_db",
"database.history.kafka.bootstrap.servers": "kafka:29092",
"database.history.kafka.topic": "schema-changes.smartdev_db",
"table.whitelist": "smartdev_db.users",
"column.blacklist": "smartdev_db.users.id,smartdev_db.users.password, smartdev_db.users.fullName, smartdev_db.users.address, smartdev_db.users.phoneNo, smartdev_db.users.gender, smartdev_db.users.userRole, smartdev_db.users.User_status, smartdev_db.users.reason_for_inactive, smartdev_db.users.firstvisit, smartdev_db.users.last_changed_PW, smartdev_db.users.regDate",
"transforms": "unwrap, copyEmailValue, filter",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.copyEmailValue.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.copyEmailValue.field": "email",
"transforms.filter.type": "io.debezium.transforms.Filter",
"transforms.filter.language": "jsr223.groovy",
"transforms.filter.condition": "value.op == 'c'"
}
}
After doing this, I got the error "op is not a property of value". I then realized that this makes sense, because by that point I have already overwritten the original content of the value with the extracted email value.
Can someone please guide me on how to achieve this?
Your filter transform needs to be listed before the unwrap transform, which is what removes the op field.
Try
"transforms": "filter,unwrap,copyEmailValue",
I have connected my Postgres database to sync to a MySQL database.
Create and update events work fine on the sink, but when I delete a row on the source (not just the data in a column) it gives an error.
I've tried a few things, but without luck.
1 - When I don't put "createKey" and "extractInt" in "transforms" on my MySQL sink, I receive an error and the column is not created as bigserial:
"BLOB/TEXT column 'id_consultor' used in key specification without a key length".
2 - But if I add "createKey" and "extractInt" to my configuration, it works fine on create and update, but gives this error on delete events:
"Only Map objects supported in absence of schema for [copying fields from value to key], found: null".
"transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields":"id_consultor",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field": "id_consultor"
3 - If I put this on my source (Postgres):
"transforms.unwrap.delete.handling.mode": "rewrite"
the delete only works partially, as a "soft delete": the row is not erased, its data is cleared, and the NOT NULL fields are filled with 0.
Could somebody help me? Thanks!
Postgres Connector:
"name": "postgres-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "**",
"database.port": "5432",
"database.user": "**",
"database.password": "**",
"database.dbname" : "**",
"database.server.name": "kafkaPostgres",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "history",
"schema.include.list": "public",
"table.include.list": "public.consultor",
"time.precision.mode": "connect",
"tombstones.on.delete": "true",
"plugin.name": "pgoutput",
"transforms": "unwrap, dropPrefix",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode":"rewrite",
"transforms.unwrap.add.fields": "table,lsn",
"transforms.unwrap.add.headers": "db",
"transforms.dropPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex":"kafkaPostgres.public.(.*)",
"transforms.dropPrefix.replacement":"$1"
MySQL Sink:
{
"name": "mysql-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "consultor",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"connection.url": "**,
"connection.user":"**",
"connection.password":"**",
"auto.create": "true",
"auto.evolve": "true",
"insert.mode": "upsert",
"dialect.name": "MySqlDatabaseDialect",
"Database Dialect": "MySqlDatabaseDialect",
"table.name.format": "consultor",
"pk.mode": "record_key",
"pk.fields": "id_consultor",
"delete.enabled": "true",
"drop.invalid.message": "true",
"delete.retention.ms": 1,
"fields.whitelist": "id_consultor, idempresaorganizacional, cd_consultor_cpf, dt_consultor_nascimento , ds_justificativa, nn_consultor , cd_consultor_rg, id_motivo, id_situacao , id_sub_motivo",
"transforms": "unwrap, flatten, route, createKey, extractInt ",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode":"rewrite",
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.flatten.delimiter": ".",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "(?:[^.]+)\\.(?:[^.]+)\\.([^.]+)",
"transforms.route.replacement": "$1",
"transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields":"id_consultor",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field": "id_consultor"
I've added these properties to the connector:
"key.converter": "io.apicurio.registry.utils.converter.AvroConverter",
"key.converter.apicurio.registry.url" :"http://apicurio:8080/api",
"key.converter.apicurio.registry.global-id": "io.apicurio.registry.utils.serde.strategy.GetOrCreateIdStrategy",
"value.converter": "io.apicurio.registry.utils.converter.AvroConverter",
"value.converter.apicurio.registry.url":"http://apicurio:8080/api",
"value.converter.apicurio.registry.global-id": "io.apicurio.registry.utils.serde.strategy.GetOrCreateIdStrategy",
"key.converter.schemas.enable": "true",
"value.converter.schemas.enable": "true",
and replaced this on the sink:
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
to:
"key.converter": "io.apicurio.registry.utils.converter.AvroConverter",
"key.converter.apicurio.registry.url" :"http://apicurio:8080/api",
"key.converter.apicurio.registry.global-id": "io.apicurio.registry.utils.serde.strategy.GetOrCreateIdStrategy",
"value.converter": "io.apicurio.registry.utils.converter.AvroConverter",
"value.converter.apicurio.registry.url":"http://apicurio:8080/api",
"value.converter.apicurio.registry.global-id": "io.apicurio.registry.utils.serde.strategy.GetOrCreateIdStrategy",
"key.converter.schemas.enable": "true",
"value.converter.schemas.enable": "true",
and everything worked well, but I can't read the messages in the topic because I'm using the Debezium Kafka image, which does not ship with an Avro console consumer.
Now I'm looking for a plugin for this version so that I can read the Avro messages.
I hope this helps somebody.
I started the Debezium MySQL Kafka connector (version 0.9.2.Final) with one table in "table.whitelist" and it was working fine. After adding another table to the whitelist and restarting the connector, I am getting the error below.
org.apache.kafka.connect.errors.ConnectException: Encountered change event for table paperclip.ilt whose schema isn't known to this connector
at io.debezium.connector.mysql.AbstractReader.wrap(AbstractReader.java:230)
at io.debezium.connector.mysql.AbstractReader.failed(AbstractReader.java:208)
at io.debezium.connector.mysql.BinlogReader.handleEvent(BinlogReader.java:477)
at com.github.shyiko.mysql.binlog.BinaryLogClient.notifyEventListeners(BinaryLogClient.java:1095)
at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:943)
at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:580)
at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:825)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.ConnectException: Encountered change event for table paperclip.ilt whose schema isn't known to this connector
at io.debezium.connector.mysql.BinlogReader.informAboutUnknownTableIfRequired(BinlogReader.java:727)
at io.debezium.connector.mysql.BinlogReader.handleUpdateTableMetadata(BinlogReader.java:702)
at io.debezium.connector.mysql.BinlogReader.handleEvent(BinlogReader.java:461)
... 5 more
Please find below the configuration I have used. I expected that with the setting "database.history.store.only.monitored.tables.ddl": "false" it would work.
How can I resolve this?
{
"name": "Mysql-rnd-engagex",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "3",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"database.hostname": "devmysql.xxxx.net",
"database.port": "3306",
"database.user": "xxxxxx",
"database.password": "xxxxx",
"database.server.name": "rnd_engagex_cdc",
"database.history.kafka.bootstrap.servers": "xxxxxx.aivencloud.com:xxxx",
"database.history.kafka.topic": "rnd_engagex_dbhistory",
"database.history.skip.unparseable.ddl": "false",
"database.history.store.only.monitored.tables.ddl": "false",
"include.schema.changes": "false",
"include.query": "false",
"table.ignore.builtin": "true",
"database.whitelist": "paperclip",
"table.whitelist": "paperclip.elearning", //added new table : "paperclip.elearning,paperclip.ilt"
"column.blacklist": "paperclip.elearning.description",
"gtid.source.filter.dml.events": "true",
"tombstones.on.delete": "true",
"connect.keep.alive": "true",
"snapshot.minimal.locks": "true",
"database.history.producer.ssl.truststore.location": "/xxxx/yyyy/keys/public.truststore.jks",
"value.converter.schemas.enable": "false",
"database.history.consumer.ssl.truststore.location": "/xxxx/yyyy/keys/public.truststore.jks",
"database.history.producer.ssl.truststore.password": "password",
"database.history.producer.ssl.keystore.location": "/xxxx/yyyy/keys/public.keystore.p12",
"database.history.consumer.ssl.truststore.password": "password",
"database.history.consumer.ssl.keystore.location": "/xxxx/yyyy/keys/public.keystore.p12",
"database.history.producer.ssl.keystore.type": "PKCS12",
"database.history.producer.ssl.keystore.password": "ppppppppp",
"database.history.consumer.ssl.key.password": "ppppppppp",
"database.history.producer.security.protocol": "SSL",
"database.history.consumer.ssl.keystore.type": "PKCS12",
"database.history.consumer.ssl.keystore.password": "ppppppppp",
"database.history.producer.ssl.key.password": "ppppppppp",
"database.history.consumer.security.protocol": "SSL",
"key.converter.schemas.enable": "false"
}
You need to add the property "snapshot.new.tables": "parallel" when creating the connector; only then will you be able to whitelist additional tables at a later stage. This is not mentioned in the documentation, since the feature was introduced as a beta in 0.9.x.
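For the configuration in the question, that would mean something along these lines (a sketch; the snapshot.new.tables property has to be in place before the second table is appended to the whitelist):
"snapshot.new.tables": "parallel",
"table.whitelist": "paperclip.elearning,paperclip.ilt"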
I'm using the Debezium MySQL CDC source connector to move a database from MySQL to Kafka. The connector is working fine except for the snapshots, where it's acting weird: the connector took the first snapshot successfully, then after a few hours went down because of a heap memory limit (that is not the problem here). I paused the connector, stopped the worker on the cluster, fixed the issue, and then started the worker again... The connector is now running fine, but it is taking the snapshot again!
It looks like the connector is not resuming from where it left off, and I think something is wrong in my configs.
I'm using Debezium 0.9.5.
I changed snapshot.mode from initial to initial_only, but it didn't work.
Connect properties:
{
"properties": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"snapshot.locking.mode": "minimal",
"errors.log.include.messages": "false",
"table.blacklist": "mydb.someTable",
"include.schema.changes": "true",
"database.jdbc.driver": "com.mysql.cj.jdbc.Driver",
"database.history.kafka.recovery.poll.interval.ms": "100",
"poll.interval.ms": "500",
"heartbeat.topics.prefix": "__debezium-heartbeat",
"binlog.buffer.size": "0",
"errors.log.enable": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"snapshot.fetch.size": "100000",
"errors.retry.timeout": "0",
"database.user": "kafka_readonly",
"database.history.kafka.bootstrap.servers": "bootstrap:9092",
"internal.database.history.ddl.filter": "DROP TEMPORARY TABLE IF EXISTS .+ /\\* generated by server \\*/,INSERT INTO mysql.rds_heartbeat2\\(.*\\) values \\(.*\\) ON DUPLICATE KEY UPDATE value \u003d .*,FLUSH RELAY LOGS.*,flush relay logs.*",
"heartbeat.interval.ms": "0",
"header.converter": "org.apache.kafka.connect.json.JsonConverter",
"autoReconnect": "true",
"inconsistent.schema.handling.mode": "fail",
"enable.time.adjuster": "true",
"gtid.new.channel.position": "latest",
"ddl.parser.mode": "antlr",
"database.password": "pw",
"name": "mysql-cdc-replication",
"errors.tolerance": "none",
"database.history.store.only.monitored.tables.ddl": "false",
"gtid.source.filter.dml.events": "true",
"max.batch.size": "2048",
"connect.keep.alive": "true",
"database.history": "io.debezium.relational.history.KafkaDatabaseHistory",
"snapshot.mode": "initial_only",
"connect.timeout.ms": "30000",
"max.queue.size": "8192",
"tasks.max": "1",
"database.history.kafka.topic": "history-topic",
"snapshot.delay.ms": "0",
"database.history.kafka.recovery.attempts": "100",
"tombstones.on.delete": "true",
"decimal.handling.mode": "double",
"snapshot.new.tables": "parallel",
"database.history.skip.unparseable.ddl": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"table.ignore.builtin": "true",
"database.whitelist": "mydb",
"bigint.unsigned.handling.mode": "long",
"database.server.id": "6022",
"event.deserialization.failure.handling.mode": "fail",
"time.precision.mode": "adaptive_time_microseconds",
"errors.retry.delay.max.ms": "60000",
"database.server.name": "host",
"database.port": "3306",
"database.ssl.mode": "disabled",
"database.serverTimezone": "UTC",
"task.class": "io.debezium.connector.mysql.MySqlConnectorTask",
"database.hostname": "host",
"database.server.id.offset": "10000",
"connect.keep.alive.interval.ms": "60000",
"include.query": "false"
}
}
I can confirm Gunnar's answer above. I ran into some issues during snapshotting and had to restart the whole snapshot process. Right now, the connector does not support resuming a snapshot from a certain point. Your configs seem fine to me. Hope this helps.