Kafka Connect sink connector configured for both insert and update - MySQL

I have a MySQL database and I use the Confluent Kafka Connect JDBC Sink Connector to insert elements into the table (I am not able to make any changes to the database schemas... other systems rely upon it in its current state). The primary key is set to auto-increment to prevent the possibility of two clients trying to claim the same ID at the same time, so I can't specify it for the insert. The config below is representative of what I would use for insert:
{
  "name": "sink-connector-insert",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "connection.url": "jdbc:mysql://server:3306/database?serverTimezone=UTC&useLegacyDatetimeCode=false",
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "https://registry:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "https://registry:8081",
  "topics": "table",
  "connection.user": "user",
  "connection.password": "XXXXXXXXXX",
  "ssl.mode": "prefer",
  "insert.mode": "insert"
}
Later I may want to update the same record, but at that point I know the key and so will include it in the update message. This is the config I would use for the update:
{
  "name": "sink-connector-update",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "connection.url": "jdbc:mysql://server:3306/database?serverTimezone=UTC&useLegacyDatetimeCode=false",
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "https://registry:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "https://registry:8081",
  "topics": "table",
  "connection.user": "user",
  "connection.password": "XXXXXXXXXX",
  "ssl.mode": "prefer",
  "insert.mode": "update",
  "pk.mode": "record_value",
  "pk.fields": "id"
}
The naming conventions Kafka Connect uses to tie the topic name to the table name lead to a conflict between the connectors. When I do an insert, the update (or upsert) connector enters a failed state since it needs a key, which I don't have for the insert; and when I run an update operation with the key, the insert connector enters a failed state because the insert would cause a key collision.
Is there a way I can configure some number of Kafka Connect JDBC Sink connectors and Kafka topics to distinguish between insert and update operations on the same table in a MySQL database?

"lead to a conflict in the connector actually used"
The connector name isn't the problem for the table. The problem is that both connectors create their own unique consumer groups and therefore process the exact same offsets, so there's no ordering guarantee that an "update" arrives after the corresponding insert. And even if it did, it would immediately send an update query for the exact same data.
One connector with upsert insert mode does what you need, regardless of the key requirements. Besides, you're using record_value mode, so it's already ignoring any Kafka record keys you may have.
Related post kafka-connect sink connector pk.mode for table with auto-increment
In that case, removing the primary key from the schema and setting pk.mode to none made everything work properly.
I'm not sure how that'll work out for update queries, though. You may need a different topic for that, but you can still use the RegexRouter transform to manipulate the table name so it doesn't need to match the topic.
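For illustration, here is a rough sketch of what a single upsert connector combined with the RegexRouter transform could look like. The topic name table-events, the transform alias route, and the regex are assumptions for this sketch, not values from the original setup:
{
  "name": "sink-connector-upsert",
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "connection.url": "jdbc:mysql://server:3306/database?serverTimezone=UTC&useLegacyDatetimeCode=false",
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "https://registry:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "https://registry:8081",
  "topics": "table-events",
  "connection.user": "user",
  "connection.password": "XXXXXXXXXX",
  "insert.mode": "upsert",
  "pk.mode": "record_value",
  "pk.fields": "id",
  "transforms": "route",
  "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.route.regex": "table-events",
  "transforms.route.replacement": "table"
}
With insert.mode set to upsert, the JDBC sink's MySQL dialect should translate writes into INSERT ... ON DUPLICATE KEY UPDATE statements, so one connector can cover both the first write and later updates of the same row.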

Related

Newly created tables are not being migrated to Redshift in DMS

I have set up DMS with RDS MySQL as the source endpoint and Redshift as the target endpoint with "full load and CDC".
The setup is working fine, and even update and delete statements are being replicated to Redshift. However, when I create a new table in my source RDS MySQL, it is not being replicated to the target Redshift.
Please note: there isn't any primary key associated with the new table.
This is because, whenever a new table is created, the DMS user (the MySQL user) does not have access to read it. You will have to explicitly grant the user permission to read the new table:
GRANT SELECT ON SCHEMA.TABLE_NAME TO dmsuser_readonly;
Then add supplemental logging to allow the user to access the logs for the table:
ALTER TABLE SCHEMA.TABLE_NAME ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
PS: Allow all accesses to the dmsuser using the schema owner user.
Let me know in case of any issues.
To specify the table mappings that you want to apply during migration, you can create a JSON file. If you create a migration task using the console, you can browse for this JSON file or enter the JSON directly into the table mapping box. If you use the CLI or API to perform migrations, you can specify this file using the TableMappings parameter of the CreateReplicationTask or ModifyReplicationTask API operation.
Example: migrate some tables in a schema
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "Test",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}
Create a new rule in the format shown above and specify your table name.
The rule-id and rule-name should be unique.
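As an illustration, a second selection rule that targets just one new table might look like the following; the schema name Test is carried over from the example above, and my_new_table is a placeholder:
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "2",
      "rule-name": "2",
      "object-locator": {
        "schema-name": "Test",
        "table-name": "my_new_table"
      },
      "rule-action": "include"
    }
  ]
}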
For more information, please check out the AWS DMS documentation on table mapping.

"Set foreign_key_checks= 0;" but only for one database

I have an umbrella MySQL server with many databases. I'd like to disable foreign keys for all tables in only one database. However, the usual command disables foreign keys for all databases.
Is there a way to make it work in the scope of only one database?
Also, I'd like it to work not per session, but globally.
No. The variable applies to all foreign keys on the MySQL instance. There is no way to limit it to the scope of one schema.
The only solutions are the ones you already know:
Set foreign_key_checks = 0 as a session variable, only for sessions that will access the schema you have in mind (see the sketch after this list).
Drop foreign key constraints in the tables of the schema you have in mind.
Host the schema in a separate MySQL instance.
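A minimal sketch of the session-scoped option; the schema, table, and column names are placeholders:
-- Disable the checks only for this session, do the work, then turn them back on.
SET SESSION foreign_key_checks = 0;
-- Statements that would normally violate FK ordering, e.g. inserting a child row
-- before its parent exists:
INSERT INTO target_schema.child_table (id, parent_id) VALUES (1, 999);
SET SESSION foreign_key_checks = 1;
SET GLOBAL foreign_key_checks = 0 does exist, but as the answer says, it applies to the whole instance rather than to one schema.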

Spring Boot Switching from MySQL to PostgreSQL and getting unique constraint violations

I am in the process of switching databases for a Spring Boot project from MySQL to PostgreSQL. For the most part this hasn't been too painful; however, I am now getting an exception I can't seem to sort out.
org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "r_i_UNIQUE"
Detail: Key (cola_id, colb_id)=(1234567, 12345) already exists.
The behaviour in MySQL when this occurred was to perform an update, which is what I need to occur in PostgreSQL as well.
Code-wise, I search for an existing item by cola_id and colb_id; the object is returned from my repository, manipulated as required, and then persisted using repository.saveAndFlush(existingItem);
It is on the saveAndFlush that I get the exception.
My assumption is that Spring is seeing this as a new object and trying to do an insert rather than an update, but I am not understanding why this would occur as it worked as expected using MySQL.
How does PostgreSQL handle updates using saveAndFlush, and how can I get an update to apply correctly?

on duplicate key update sqlite FMDB

We are creating an app which uses a local and a remote DB; the information comes from the remote server and, if necessary, gets stored in the local DB.
The problem is that some records will come into the local DB again, and I don't want to duplicate the entry, just update it; if it does not exist, insert a new one.
In MySQL I would probably use:
INSERT INTO `table`
(`key`, `name`, `time`)
VALUES
(4815162342, 'user', NOW())
ON DUPLICATE KEY UPDATE
`name` = 'newname';
Is there a way to use it in SQLite, more specifically with FMDB?
I think what you want is more generally referred to as an 'upsert'. See this answer for some recommendations (or search further as there's a bunch more around).
https://stackoverflow.com/a/15277374/297472
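For reference, here is a sketch of what a native SQLite upsert could look like, mirroring the MySQL example above; it requires SQLite 3.24 or newer and assumes the key column has a UNIQUE or PRIMARY KEY constraint:
-- Insert the row, or update the existing one if the key already exists.
INSERT INTO "table" ("key", "name", "time")
VALUES (4815162342, 'user', datetime('now'))
ON CONFLICT("key") DO UPDATE SET "name" = excluded."name";
On older SQLite versions, INSERT OR REPLACE is the usual workaround, with the caveat that it deletes and re-inserts the row rather than updating it in place. Either statement can be run through FMDB like any other write (e.g. with executeUpdate).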

MySQL sync relational database

My question:
How can I sync 2 MySQL databases (an offline local database with a master online database)?
The problem is that the database is relational, and the id is, as always, auto-increment, so if I just sync using inserts it will mess with my references.
This is for a clinic management app I made. The problem is that currently it's on a server, but sometimes the internet connection goes down or is slow at my users' clinics, so I need to let them work in offline mode (store everything in a local db) and manually sync (bi-directionally) with the remote database at the end of the day.
So basically each clinic should have its own local db, and they should all sync to a central db.
Example tables:
db.Cental.users
|id|user|clinic |
|01|demo|day care|
|02|nurs|er |
|03|demX|day care|
db.day care.users
|id|user|clinic |
|01|demo|day care|
|02|demX|day care|
(Note: the id doesn't necessarily match between the central and local db, yet the structure of the db is identical on both.)
Example:
Database info:
Each user has many visits and plugins.
Each visit contains one user as patient_id and one user as doctor_id.
Each plugin has one user and many inputs.
plugin_inputs has one plugin.
I have 2 databases, 1 on the server and the other hosted locally, for offline mode.
What I want is to be able to sync the local db with the online one, but since I have more than one user and more than one local db, each local db will have nearly the same ids, while the online db should contain all of them combined.
So how can I sync them together?
I use PHP / MySQL (online) / SQLite (local).
There are a couple of things you can do.
Firstly, you could consider a composite primary key for the user table - for instance, you could include "Clinic" in the primary key. That way, you know that even if the auto increment values overlap, you can always uniquely identify the user.
The "visit" table then needs to include patient_clinic and doctor_clinic in the foreign keys.
Alternatively - and far dirtier - you can simply set the auto increment fields for each clinic to start at different numbers: ALTER TABLE user AUTO_INCREMENT = 10000; will set the first auto_increment for the user table to 10000; if the next clinic sets it at 20000 (or whatever), you can avoid overlaps (until you exceed that magic number, which is why it's dirty).
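A sketch of what the composite-key approach could look like; the table and column names are adapted from the example tables in the question and are only illustrative:
CREATE TABLE users (
  id     INT NOT NULL AUTO_INCREMENT,
  clinic VARCHAR(50) NOT NULL,
  user   VARCHAR(50) NOT NULL,
  PRIMARY KEY (id, clinic)  -- id may repeat across clinics, but (id, clinic) stays unique
);

CREATE TABLE visits (
  id             INT NOT NULL AUTO_INCREMENT,
  clinic         VARCHAR(50) NOT NULL,
  patient_id     INT NOT NULL,
  patient_clinic VARCHAR(50) NOT NULL,
  doctor_id      INT NOT NULL,
  doctor_clinic  VARCHAR(50) NOT NULL,
  PRIMARY KEY (id, clinic),
  FOREIGN KEY (patient_id, patient_clinic) REFERENCES users (id, clinic),
  FOREIGN KEY (doctor_id, doctor_clinic)   REFERENCES users (id, clinic)
);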
I'd be tempted to redesign the application architecture to have an API at the "remote" end and a job that runs at the end of each day at the "local" end to post all the "local" data to the "remote" API.
This way you could have the "local" job create a JSON message similar to this:
{
  "user_visit": {
    "user_id": 1,
    "doctor_id": 1,
    "plugins": [
      {
        "plugin_name": "name",
        "data": {
          "data_field1": "VALUE",
          "data_field2": "VALUE2",
          "data_fieldxx": "VALUExx"
        }
      }
    ]
  }
}
And send each message to the API. In the API, you would then iterate over the JSON message, reconstructing the relationships and inserting them into the "remote" database accordingly.
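A rough sketch of how the API side could reconstruct those relationships when it processes one message; the table and column names are assumptions based on the question's description, not part of the original answer:
-- Insert the visit and remember the id the central database generated.
INSERT INTO visits (patient_id, doctor_id) VALUES (1, 1);
SET @visit_id = LAST_INSERT_ID();  -- available for any rows that need to reference this visit

-- Insert the plugin row for that user, then its inputs, re-using the generated id.
INSERT INTO plugins (user_id, plugin_name) VALUES (1, 'name');
SET @plugin_id = LAST_INSERT_ID();

INSERT INTO plugin_inputs (plugin_id, field_name, field_value)
VALUES (@plugin_id, 'data_field1', 'VALUE'),
       (@plugin_id, 'data_field2', 'VALUE2');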
Create the "id" and "id_offline" column in the local database. make all relationships between tables with "id_offline" and use automatic increment.
The "id" will match the MySQL "id" and will start with a null value.
When sending the data to the server, a check will be made if the "id" is equal to null, if it is, "insert" if not "update", after that use the MySQL "RETURNING" function to return the automatically generated id and return this information to the device that made the request.
When the device receives the new ids, update the "id" column of the local database to be the same as MySQL.
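A minimal sketch of the local (SQLite) side of that idea; the table and column names are illustrative:
-- Local table: id_offline is the local key used for all local relationships;
-- id holds the server-side key and stays NULL until the first successful sync.
CREATE TABLE users (
  id_offline INTEGER PRIMARY KEY AUTOINCREMENT,
  id         INTEGER,
  user       TEXT,
  clinic     TEXT
);

-- After the server reports the generated id for a row, write it back locally
-- (101 and 1 are placeholder values):
UPDATE users SET id = 101 WHERE id_offline = 1;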