I'm trying to use the JDBC source connector in incrementing mode to produce a message to a Kafka topic when a table in MySQL is updated. It works in timestamp mode but doesn't seem to work in incrementing column mode. When I insert a new row into the table, I do not see any message published to the topic.
{
"_comment": " --- JDBC-specific configuration below here --- ",
"_comment": "JDBC connection URL. This will vary by RDBMS. Consult your manufacturer's handbook for more information",
"connection.url": "jdbc:mysql://localhost:3306/lte?user=root&password=tiger",
"_comment": "Which table(s) to include",
"table.whitelist": "candidate_score",
"_comment": "Pull all rows based on an timestamp column. You can also do bulk or incrementing column-based extracts. For more information, see http://docs.confluent.io/current/connect/connect-jdbc/docs/source_config_options.html#mode",
"mode": "incrementing",
"_comment": "Which column has the timestamp value to use? ",
"incrementing.column.name": "attempt_id",
"_comment": "If the column is not defined as NOT NULL, tell the connector to ignore this ",
"validate.non.null": "true",
"_comment": "The Kafka topic will be made up of this prefix, plus the table name ",
"topic.prefix": "mysql-"
}
attempt_id is an auto-incrementing, non-null column, which is also the primary key.
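For reference, a minimal sketch of a table that satisfies the incrementing-mode requirements (only attempt_id comes from my actual schema; the other columns are placeholders):
-- Hypothetical DDL; incrementing mode needs a strictly increasing, NOT NULL column.
CREATE TABLE candidate_score (
  attempt_id   BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- used by incrementing.column.name
  candidate_id BIGINT,                                      -- placeholder column
  score        INT                                          -- placeholder column
);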
Actually, it's my fault. I was listening to the wrong topic. With topic.prefix set to mysql- and the table named candidate_score, the messages are published to the topic mysql-candidate_score.
I'm trying to configure a Debezium connector for multiple tables in a MySQL database (I'm using Debezium 1.4 on MySQL 8.0).
My company has a naming convention to follow when creating topics in Kafka, and this convention does not allow the use of underscores (_), so I had to replace them with hyphens (-).
So my topic names are:
Topic 1
fjf.db.top-domain.domain.sub-domain.transaction-search.order-status
WHERE
- transaction-search = schema "transaction_search"
- order-status = table "order_status".
- All changes in that table must go to that topic.
Topic 2
fjf.db.top-domain.domain.sub-domain.transaction-search.shipping-tracking
WHERE
- transaction-search = schema "transaction_search"
- shipping-tracking = table "shipping_tracking"
- All changes in that table must go to that topic.
Topic 3
fjf.db.top-domain.domain.sub-domain.transaction-search.proposal
WHERE
- transaction-search = schema "transaction_search"
- proposal = table "proposal"
- All changes in that table must go to that topic.
I'm trying to use the ByLogicalTableRouter transform, but I can't find a regex solution that solves my case.
{ "name": "debezium.connector",
"config":
{
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "myhostname",
"database.port": "3306",
"database.user": "debezium",
"database.password": "password",
"database.server.id": "1000",
"database.server.name": "fjf.db.top-domain.domain.sub-domain.transaction-search",
"schema.include.list": "transaction_search",
"table.include.list": "transaction_search.order_status,transaction_search.shipping_tracking,transaction_search.proposal",
"database.history.kafka.bootstrap.servers": "kafka.intranet:9097",
"database.history.kafka.topic": "fjf.db.top-domain.domain.sub-domain.transaction-search.schema-history",
"snapshot.mode": "schema_only",
"transforms":"RerouteName,RerouteUnderscore",
"transforms.RerouteName.type":"io.debezium.transforms.ByLogicalTableRouter",
"transforms.RerouteName.topic.regex":"(.*)transaction_search(.*)",
"transforms.RerouteName.topic.replacement": "$1$2"
"transforms.RerouteUnderscore.type":"io.debezium.transforms.ByLogicalTableRouter",
"transforms.RerouteUnderscore.topic.regex":"(.*)_(.*)",
"transforms.RerouteUnderscore.topic.replacement": "$1-$2"
}
}
In the first transform, I'm trying to remove the duplicated schema name in the topic routing. In the second transform, I'm trying to replace all remaining underscores (_) with hyphens (-).
But with that I'm getting the error below, which indicates that it is trying to send everything to the same topic:
Caused by: org.apache.kafka.connect.errors.SchemaBuilderException: Cannot create field because of field name duplication __dbz__physicalTableIdentifier
How can I make a transform that forwards the events of each table to its respective topic?
Removing the schema name
In the first transform, I'm trying to remove the duplicated schema name in the topic routing.
After transformation with your regex you'll have two consecutive dots, so you need to fix it:
"transforms.RerouteName.topic.regex":"([^.]+)\\.transaction_search\\.([^.]+)",
"transforms.RerouteName.topic.replacement": "$1.$2"
Replacing underscores with hyphens
You can try the ChangeCase SMT from the Kafka Connect Common Transformations package.
My table has a cas field, and I want to implement compare-and-set (CAS) in the save operation.
Here is my SQL code:
INSERT INTO `test_cas_table`(id,name,cas) VALUES(3, "test data", 2)
ON DUPLICATE KEY UPDATE
id = VALUES(id),
name = VALUES(name),
cas = IF(cas = VALUES(cas) - 1, VALUES(cas) , "update failure")
Because the cas field is BIGINT, when cas != VALUES(cas) - 1 the statement tries to set it to "update failure", which causes the execution to fail.
But this way is so ugly. Is there a prettier implementation?
I also want to know whether PostgreSQL has a prettier implementation.
I want to implement it in a single statement.
Is your identity column auto-generated? If so, there's no need to check for duplicates. Just insert your new information and your database will handle it.
However, if you have an identity column which isn't auto-generated (like an email or official document instead of an auto-increment integer primary key), you need to first query your database looking for the value you're about to persist. That way, instead of receiving a SQLException (Java), you check your query result and tell the user to change their email or official document if it was already taken by another user.
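If you want the compare-and-set itself in a single statement, a common sketch (assuming id is the primary key, and treating an affected-row count of 0 as a CAS failure) looks like this:
-- MySQL: only apply the update when the stored cas is exactly one behind the new value;
-- otherwise leave the row untouched, so the client sees 0 affected rows on CAS failure.
INSERT INTO test_cas_table (id, name, cas)
VALUES (3, 'test data', 2)
ON DUPLICATE KEY UPDATE
  name = IF(cas = VALUES(cas) - 1, VALUES(name), name),
  cas  = IF(cas = VALUES(cas) - 1, VALUES(cas), cas);

-- PostgreSQL: ON CONFLICT ... DO UPDATE accepts a WHERE clause; if the cas check fails,
-- no row is updated and the statement reports 0 affected rows.
INSERT INTO test_cas_table (id, name, cas)
VALUES (3, 'test data', 2)
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name,
    cas  = EXCLUDED.cas
WHERE test_cas_table.cas = EXCLUDED.cas - 1;
In both cases the application checks the affected-row count instead of relying on a type error to signal a failed CAS.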
I have a Cassandra database with data that has a TTL of X hours for every column value, and this needs to be pushed to an Elasticsearch cluster in real time.
I have seen past posts on Stack Overflow that advise using tools such as Logstash or pushing data directly from the application layer.
However, how can one preserve the TTL of the imported data once it is copied into ES version >= 5.0?
There was once a field called _ttl, which was deprecated in ES 2.0 and removed in ES 5.0.
As of ES 5, there are two main ways of preserving the TTL of your data. First, make sure to create a TTL field in your ES documents that is set to the creation date of the row in Cassandra plus the TTL in seconds. So if in Cassandra you have a record like this:
INSERT INTO keyspace.table (userid, creation_date, name)
VALUES (3715e600-2eb0-11e2-81c1-0800200c9a66, '2017-05-24', 'Mary')
USING TTL 86400;
Then you should export the following document to ES:
{
"userid": "3715e600-2eb0-11e2-81c1-0800200c9a66",
"name": "mary",
"creation_date": "2017-05-24T00:00:00.000Z",
"ttl_date": "2017-05-25T00:00:00.000Z"
}
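If the remaining TTL is not known on the application side, one option (a sketch, assuming name is a regular, non-key column) is to read it back from Cassandra with the TTL() and WRITETIME() functions and derive ttl_date from the write time plus the remaining TTL:
-- Remaining TTL in seconds and write timestamp (microseconds since epoch) for the name column.
SELECT userid,
       name,
       TTL(name)       AS remaining_ttl_seconds,
       WRITETIME(name) AS written_at_microseconds
FROM keyspace.table
WHERE userid = 3715e600-2eb0-11e2-81c1-0800200c9a66;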
Then you can either:
A. Use a cron job that regularly performs a delete-by-query based on your ttl_date field, i.e. call the following command from your cron:
curl -XPOST localhost:9200/your_index/_delete_by_query -d '{
"query": {
"range": {
"ttl_date": {
"lt": "now"
}
}
}
}'
B. Or use time-based indices and insert each document into an index matching its ttl_date field. For instance, the above document would be inserted into the index named your_index-2017-05-25. Then, with the Curator tool, you can easily delete indices that have expired.
While creating an index I get this error:
[
{
"code": 3000,
"msg": "syntax error - at -",
"query_from_user": "create primary index on sample-partner"
}
]
If I change the bucket name to sample_partner, then it works. I'm using Couchbase 4.5 Enterprise Edition.
Yeah, that's because N1QL interprets the - as a minus sign. You simply need to escape the bucket name using backticks:
CREATE PRIMARY INDEX ON `sample-partner`;
It should work that way. Remember to always escape that bucket name in all N1QL queries and you should be fine. Or use the underscore in the bucket name, as an alternative :)
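For example (a sketch; the field names here are made up for illustration):
SELECT p.name, p.partnerId
FROM `sample-partner` p
WHERE p.type = "partner"
LIMIT 10;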
DUPLICATE: Couchbase 4 beta “ORDER BY” performance
As the question title shows, I am facing a huge response delay (around 13 s for one call) when using the Couchbase 4 (N1QL) ORDER BY clause. If I don't use the ORDER BY clause, everything is fine.
My primary index is:
Definition: CREATE PRIMARY INDEX `#primary` ON `default` USING GSI
and my secondary index is:
Definition: CREATE INDEX `index_location_name` ON `default`(`name`) USING GSI
N1QL Query
req.params.filter can be any key in the location document.
SELECT _id AS id FROM default WHERE type = 'location' ORDER BY " +
req.params.filter + (req.query.descending?' DESC':'') + " LIMIT " +
limit + " OFFSET " + skip
The location document in my bucket is:
{
"_id": "location::370794",
"name": "Kenai Riverside Fishing",
"avgRating": 0,
"city": "Cooper Landing",
"state": "Alaska",
"country": "USA",
"zipCode": "99572",
"created": "2013-07-10T17:30:00.000Z",
"lastModified": "2015-02-13T12:34:36.923Z",
"type": "location",
}
Can anyone tell me why the ORDER BY clause is causing so much delay?
I believe Couchbase is not built to handle queries that can be ordered by any arbitrary field. Since ordering is an expensive operation in Couchbase, it's always recommended to create an index based on the sorting fields (see the sketch below). Also, if the index is built in ascending order, it can't be used for descending ordering, and vice versa. Your best option with Couchbase is to create all the possible indexes with ascending and descending order, if feasible.
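For example, for the query above when it sorts on name, a composite index keyed on the filter field and the sort field might look like this (a sketch; the index name is arbitrary, and you would need one such index per field you allow in req.params.filter):
CREATE INDEX `idx_location_type_name` ON `default`(`type`, `name`) USING GSI;
Depending on your Couchbase version, WHERE type = 'location' ORDER BY name may then be able to use the index order instead of sorting the whole result set at query time.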
I'd also recommend that you consider whether Elasticsearch would be a better fit for your dynamic search use cases.