Kafka Connect setup to send records from Aurora (MySQL) to Elasticsearch via AWS MSK

I have to send records from Aurora/MySQL to MSK, and from there to the Elasticsearch service:
Aurora --> Kafka Connect --> AWS MSK --> Kafka Connect --> Elasticsearch
The record in the Aurora table is structured something like this, and I think the record will go to AWS MSK in this format:
"o36347-5d17-136a-9749-Oe46464",0,"NEW_CASE","WRLDCHK","o36347-5d17-136a-9749-Oe46464","<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?><caseCreatedPayload><batchDetails/>","CASE",08-JUL-17 10.02.32.217000000 PM,"TIME","UTC","ON","0a348753-5d1e-17a2-9749-3345,MN4,","","0a348753-5d1e-17af-9749-FGFDGDFV","EOUHEORHOE","2454-5d17-138e-9749-setwr23424","","","",,"","",""
So in order for Elasticsearch to consume it, I need to use a proper schema, which means I have to use Schema Registry.
My questions:
Question 1
How should I use Schema Registry for the above type of message? Is Schema Registry required?
Do I have to create a JSON structure for this, and if yes, where do I keep it?
More help is required here to understand this.
I have edited
vim /usr/local/confluent/etc/schema-registry/schema-registry.properties
I mentioned ZooKeeper there, but I did not understand what kafkastore.topic=_schema is,
or how to link this to a custom schema.
When I started it, I got this error:
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Topic _schemas not present in metadata after 60000 ms.
Which I was expecting, because I did not do anything about the schema.
I do have the JDBC connector installed, and when I start it I get the error below:
Invalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123 for configuration Couldn't open connection to jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123
Invalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123 for configuration Couldn't open connection to jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
Question 2
Can I create two connectors on one EC2 instance (the JDBC one and the Elasticsearch one)? If yes, do I have to start both in separate CLIs?
Question 3
When I open vim /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties
I see only property values like below:
name=test-source-sqlite-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123
mode=incrementing
incrementing.column.name=id
topic.prefix=trf-aurora-fspaudit-
In the above properties file, where can I mention the schema name and table name?
Based on the answer, I am updating my configuration for Kafka Connect JDBC.
--------------- start JDBC Connect / Elasticsearch ---------------
wget http://packages.confluent.io/archive/5.2/confluent-5.2.0-2.11.tar.gz -P ~/Downloads/
tar -zxvf ~/Downloads/confluent-5.2.0-2.11.tar.gz -C ~/Downloads/
sudo mv ~/Downloads/confluent-5.2.0 /usr/local/confluent
wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.48.tar.gz
tar -xzf mysql-connector-java-5.1.48.tar.gz
sudo mv mysql-connector-java-5.1.48/mysql-connector-java-5.1.48.jar /usr/local/confluent/share/java/kafka-connect-jdbc/
And then I edited
vim /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties
and modified the properties below:
connection.url=jdbc:mysql://fdgfgdfgrter.us-east-1.rds.amazonaws.com:3306/trf
mode=incrementing
connection.user=admin
connection.password=Welcome123
table.whitelist=PANStatementInstanceLog
schema.pattern=dbo
Lastly, I edited
vim /usr/local/confluent/etc/kafka/connect-standalone.properties
and modified the properties below:
bootstrap.servers=b-3.205147-ertrtr.erer.c5.ertert.us-east-1.amazonaws.com:9092,b-6.ertert-riskaudit.ertet.c5.kafka.us-east-1.amazonaws.com:9092,b-1.ertert-riskaudit.ertert.c5.kafka.us-east-1.amazonaws.com:9092
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/usr/local/confluent/share/java
When I list topics, I do not see any topic listed for the table name.
Stack trace for the error message:
[2020-01-03 07:40:57,169] ERROR Failed to create job for /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties (org.apache.kafka.connect.cli.ConnectStandalone:108)
[2020-01-03 07:40:57,169] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:119)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid and contains the following 2 error(s):
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:79)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:66)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:116)
Caused by: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid and contains the following 2 error(s):
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
at org.apache.kafka.connect.runtime.AbstractHerder.maybeAddConfigErrors(AbstractHerder.java:423)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:188)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:113)
curl -X POST -H "Accept:application/json" -H "Content-Type:application/json" IPaddressOfKCnode:8083/connectors/ -d '{"name": "emp-connector", "config": { "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector", "tasks.max": "1", "connection.url": "jdbc:mysql://IPaddressOfLocalMachine:3306/test_db?user=root&password=pwd","table.whitelist": "emp","mode": "timestamp","topic.prefix": "mysql-" } }'

Is Schema Registry required?
No. You can enable schemas in JSON records; the JDBC source can create them for you based on the table information:
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
I mentioned ZooKeeper, but I did not understand what kafkastore.topic=_schema is
If you want to use Schema Registry, you should be using kafkastore.bootstrap.servers with the Kafka address, not ZooKeeper. So remove kafkastore.connection.url.
Please read the docs for explanations of all properties
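For reference, a minimal schema-registry.properties sketch along those lines (the broker address is a placeholder; substitute your own MSK bootstrap servers):
listeners=http://0.0.0.0:8081
kafkastore.bootstrap.servers=PLAINTEXT://b-1.yourcluster.kafka.us-east-1.amazonaws.com:9092
kafkastore.topic=_schemas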
I did not do anything about the schema.
Doesn't matter. The schemas topic gets created when the Registry first starts.
Can I create two connectors on one EC2
Yes (ignoring available JVM heap space). Again, this is detailed in the Kafka Connect documentation.
Using standalone mode, you first pass the Connect worker configuration, then up to N connector property files in one command (see the example below).
Using distributed mode, you use the Kafka Connect REST API
https://docs.confluent.io/current/connect/managing/configuring.html
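For example, a sketch of running both connectors from a single standalone worker (the two connector property file names are assumptions):
/usr/local/confluent/bin/connect-standalone /usr/local/confluent/etc/kafka/connect-standalone.properties jdbc-source.properties elasticsearch-sink.properties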
When I open vim /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties
First of all, that file is for SQLite, not MySQL/Postgres. You don't need to use the quickstart files; they are only there for reference.
Again, all properties are well documented
https://docs.confluent.io/current/connect/kafka-connect-jdbc/index.html#connect-jdbc
I do have the JDBC connector installed and when I start it I get the below error
Here's more information about how you can debug that
https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/
As stated before, I would personally suggest using Debezium/CDC where possible
Debezium Connector for RDS Aurora

I'm guessing that you're planning to use Avro to transfer data, so don't forget to specify AvroConverter as the default converter when you start up your Kafka Connect workers. If you use JSON, then Schema Registry is not needed.
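As a sketch, the worker converter settings for Avro would look like this (the Schema Registry URL is an assumption; point it at your own instance):
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081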
1.1 kafkastore.topic=_schema
Have you started up your own Schema Registry? When you start Schema Registry, you have to specify the "schemas" topic. Basically, this topic is used by Schema Registry to store the schemas it registers, and in case of a failure it can recover them from there.
1.2 JDBC connector installed and when I start I get the below error
By default, the JDBC connector only works with SQLite and PostgreSQL. If you would like it to work with a MySQL database, then you should add the MySQL driver to the classpath as well, as shown below.
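For example, assuming the Confluent install path from the question, copying the driver JAR next to the connector's JARs would look like:
cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48.jar /usr/local/confluent/share/java/kafka-connect-jdbc/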
2. It depends on how you are deploying your Kafka Connect workers. If you go for distributed mode (recommended), then you don't really need separate CLIs; you can deploy your connectors through the Kafka Connect REST API.
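As a sketch, deploying the Elasticsearch sink through the REST API might look like this (the connector name, topic, and Elasticsearch endpoint are assumptions):
curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{"name": "es-sink", "config": {"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector", "topics": "trf-aurora-fspaudit-PANStatementInstanceLog", "connection.url": "https://your-es-endpoint.us-east-1.es.amazonaws.com", "type.name": "_doc"}}'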
3. There is another property called table.whitelist on which you can specify your schemas and tables, e.g.: table.whitelist=users,products,transactions

Related

Debezium - MySQL manually change binlog position from connect-offsets

I have a running Debezium setup for doing CDC from MySQL. Now I want to create one more MySQL connector for another MySQL server, but I don't want a snapshot of the existing data; I want to start the new Debezium connector from a specific binlog file and position.
I read some questions on Stack Overflow that say to manually insert a record into the connect-offsets topic. But if I do this, what will happen to my existing setup?
On a test server, I tried the above solution, but it was not working.
kafka-console-producer --broker-list localhost:9092 --topic connect-offsets
>{"file":"mysql-bin.000002","pos":2012}
>[2019-12-30 05:43:52,666] WARN [Producer clientId=console-producer] Got error produce response with correlation id 4 on topic-partition connect-offsets-5. Error: CORRUPT_MESSAGE (org.apache.kafka.clients.producer.internals.Sender)
[2019-12-30 05:43:52,767] WARN [Producer clientId=console-producer] Got error produce response with correlation id 5 on topic-partition connect-offsets-5, Error: CORRUPT_MESSAGE (org.apache.kafka.clients.producer.internals.Sender)
[2019-12-30 05:43:52,870] WARN [Producer clientId=console-producer] Got error produce response with correlation id 6 on topic-partition connect-offsets-5, Error: CORRUPT_MESSAGE (org.apache.kafka.clients.producer.internals.Sender)
[2019-12-30 05:43:52,975] ERROR Error when sending message to topic connect-offsets with key: null, value: 38 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.CorruptRecordException: This message has failed its CRC checksum, exceeds the valid size, has a null key for a compacted topic, or is otherwise corrupt.
I'm not sure how to achieve this. Can somebody help me with this?
Records in the connect-offsets topic are used for connector offset management. For the Debezium MySQL connector, the records in that topic have a key which contains the connector name and the MySQL server name that you configured in your connector configuration.
So you will need to produce the record with that key as well; only then will the Debezium connector be able to read those offsets.
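As a sketch, producing a keyed offset record with the console producer would look something like this (the connector name "my-connector" and server name "my-server" are assumptions; copy the exact key from an existing record in connect-offsets):
kafka-console-producer --broker-list localhost:9092 --topic connect-offsets --property "parse.key=true" --property "key.separator=|"
>["my-connector",{"server":"my-server"}]|{"file":"mysql-bin.000002","pos":2012}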

AWS Glue - Unable to connect to MySQL

Hi, in this instance the database security group is open to all inbound traffic (all ports, all sources).
I am also able to connect to the database fine in MySQL Workbench or DataGrip, which certainly use a JDBC connection string.
2019-05-21 14:12:03 INFO CatalogClient:651 - Got connection 'tem-sas-main' info from Catalog with url: jdbc:mysql://my-database-example:3306/sas_tem_central
2019-05-21 14:12:03 INFO CatalogClient:684 - JDBC configuration for connection tem-sas-main: JDBCConfiguration(url=jdbc:mysql://my-database-example:3306/sas_tem_central, hostname=my-database-example, port=3306, databaseVendor=mysql, databaseVersion=null, connectionName=tem-sas-main, path=sas_tem_central, subnetId=subnet-0717c4db096e84393, availabilityZone=eu-west-1a, securityGroups=[sg-074b074ebc51c2315], enforceSSL=false)
2019-05-21 14:12:03 INFO JdbcConnection:42 - Starting connecter. driver com.mysql.jdbc.Driver#7e5d9a50
2019-05-21 14:12:03 INFO JdbcConnection:60 - Attempting to connect with SSL host matching: jdbc:mysql://my-database-example:3306/sas_tem_central
2019-05-21 14:14:15 INFO JdbcConnection:69 - SSL connection to data store using host matching failed. Retrying without host matching.
2019-05-21 14:14:15 INFO JdbcConnection:83 - Attempting to connect with SSL: jdbc:mysql://my-database-example:3306/sas_tem_central
2019-05-21 14:16:26 INFO JdbcConnection:88 - SSL connection to data store failed. Retrying without SSL.
2019-05-21 14:16:26 INFO JdbcConnection:102 - Attempting to connect without SSL: jdbc:mysql://my-database-example:3306/sas_tem_central
Check that your connection definition references your JDBC database with correct URL syntax, username, and password. Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
Also note my JDBC connection string is
jdbc:mysql://my-database-example:3306/sas_tem_central
And "Require SSL connection" is set to false.
After speaking to an AWS Architect, I can confirm that Glue does not work with MySQL version 8 at this time. This is NOT documented anywhere I could find.
Glue's supported databases (not pointed out prominently in the Glue documentation / FAQs):
The following table lists the JDBC driver versions that AWS Glue supports.
Microsoft SQL Server 6.x
MySQL 5.1
Oracle Database 11.2
PostgreSQL 42.x
Amazon Redshift 4.1
Now you can customize your own configuration to connect to MySQL 8 and other newer databases from AWS Glue Jobs.
Download the JDBC driver for MySQL 8
Upload to S3
Create a JDBC connection to your database via Glue - Tables - Connections
(Note: this will not work using the "test connection" feature, since the latest version supported is currently 5.7.)
Edit the Glue Job - Dependent jars path - e.g.
s3://yourS3bucket/path/mysql-connector-java-8.0.21.jar
Modify the Glue Job, e.g. the Python script:
datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=dropnullfields3,
    catalog_connection="GLUE-CONNECTION-NAME",
    connection_options={
        "customJdbcDriverS3Path": "s3://yourS3Bucket/path/mysql-connector-java-8.0.21.jar",
        "customJdbcDriverClassName": "com.mysql.cj.jdbc.Driver",
        "user": "#DBuserName#",
        "password": "DbUsersPassword",
        "url": "jdbc:mysql://dbname.url.region.rds.amazonaws.com/schemaName",
        "connectionType": "mysql",
        "dbtable": "tablenameInDb",
        "database": "schemaName",
    },
    transformation_ctx="datasink4",
)
Your connection string looks right.
I was struggling with this same issue and found that I had to troubleshoot a few issues:
Ensure that the security group you are using for your RDS instance is self-referenced. This means you should add the security group from your DB instance on RDS to both the inbound and outbound rules for All TCP.
Ensure that you have created an endpoint for your VPC. To do this, navigate to your VPC, click on 'Endpoints', and create an S3 endpoint for the VPC that the RDS instance is in.
Finally, you should create your MySQL RDS instance on version 5 (not version 8). I noticed that the AWS Glue JDBC connector only worked with MySQL version 5 (5.7).
When you use AWS RDS with the MySQL engine, you don't need to create a JDBC connection; you can choose Amazon RDS when you are creating your connection.
(screenshot: New Connection screen)
I first tried with MySQL 8. It didn't work; it threw the same error. Later I created a MySQL 5.7 instance to test the connection, and it worked: I am able to connect to MySQL 5.7 from Glue. Make sure your VPC S3 endpoint is configured properly.
Thanks,
Raaz

Unable to access MySQL DB via JMeter JDBC Connection Configuration due to client-side certificate

We have created a .jks file to access the MySQL DB for JDBC requests, and it has client-side certificate security. As per Dmitri's suggestion in a few blogs I read, I created a .pem file from the .ppk, then converted the .pem file to a .jks using OpenSSL. I did everything, generated the .jks file, and imported the certificate from JMeter >> Options >> SSL Manager, but it gives the following error message:
"Cannot create PoolableConnectionFactory (Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.)"
If I need to configure the Keystore Configuration and CSV Data Set Config, how do I add the value (the certificate data, which is in JKS format) to the CSV and pass it into a variable in the Keystore Configuration? Also, do I need to use the ${varname} format in the Keystore Configuration?
I really did everything and still have not resolved the problem.
I have attached screenshots of the JMeter configuration; please help me get it resolved.
Please go to this link for the JMeter JDBC configuration:
https://drive.google.com/file/d/16HjXR-vIn-KvgSUXEzahgl6DZruVLpjQ/view?usp=sharing

How to pull data from amazon EC2 instance using Confluent Kafka JDBC source connector?

I have the Confluent Platform on my local machine, and I am just trying to read data from an AWS EC2 instance. I have credentials like hostname, DB name, password, etc. I am using the JDBC source connector. The connector config is:
name=test
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://ab.bca.bdc.aaa:abcd/DB?user=abc&password=bca
table.whitelist=ppp
mode=incrementing
After running Connect standalone, I got an error like:
Invalid value java.sql.SQLNonTransientConnectionException: Cannot load connection class because of underlying exception: com.mysql.cj.exceptions.WrongArgumentException: Malformed database URL, failed to parse the connection string near
According to the JDBC MySQL syntax, the user and password go before the database address, colon-separated, not after:
user:password@host_or_host_sublist
I'm not sure if using RDS or your own EC2 database would change that syntax.
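For example, applying that to the config above (all placeholder values kept from the question):
connection.url=jdbc:mysql://abc:bca@ab.bca.bdc.aaa:abcd/DB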
If you want to capture all database events, though, Debezium (see your old question) would be what you want. The JDBC connector won't capture deletes (or events created and deleted between polls), and it puts unnecessary strain on your database.

Unable to connect to snappydata store with spark-shell command

SnappyData v0.5
My goal is to start a "spark-shell" from my SnappyData install's /bin directory and issue Scala commands against existing tables in my SnappyData store.
I am on the same host as my SnappyData store, locator, and lead (and yes, they are all running).
To do this, I am running this command as per the documentation here:
Connecting to a Cluster with spark-shell
~/snappydata/bin$ spark-shell --master local[*] --conf snappydata.store.locators=10.0.18.66:1527 --conf spark.ui.port=4041
I get this error when trying to connect a spark-shell to my store:
[TRACE 2016/08/12 15:21:55.183 UTC GFXD:error:FabricServiceAPI tid=0x1] XJ040 error occurred while starting server : java.sql.SQLException(XJ040): Failed to start database 'snappydata', see the cause for details.
java.sql.SQLException(XJ040): Failed to start database 'snappydata', see the cause for details.
at com.pivotal.gemfirexd.internal.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:124)
at com.pivotal.gemfirexd.internal.impl.jdbc.Util.newEmbedSQLException(Util.java:110)
at com.pivotal.gemfirexd.internal.impl.jdbc.Util.newEmbedSQLException(Util.java:136)
at com.pivotal.gemfirexd.internal.impl.jdbc.Util.generateCsSQLException(Util.java:245)
at com.pivotal.gemfirexd.internal.impl.jdbc.EmbedConnection.bootDatabase(EmbedConnection.java:3380)
at com.pivotal.gemfirexd.internal.impl.jdbc.EmbedConnection.(EmbedConnection.java:450)
at com.pivotal.gemfirexd.internal.impl.jdbc.EmbedConnection30.(EmbedConnection30.java:94)
at com.pivotal.gemfirexd.internal.impl.jdbc.EmbedConnection40.(EmbedConnection40.java:75)
at com.pivotal.gemfirexd.internal.jdbc.Driver40.getNewEmbedConnection(Driver40.java:95)
at com.pivotal.gemfirexd.internal.jdbc.InternalDriver.connect(InternalDriver.java:351)
at com.pivotal.gemfirexd.internal.jdbc.InternalDriver.connect(InternalDriver.java:219)
at com.pivotal.gemfirexd.internal.jdbc.InternalDriver.connect(InternalDriver.java:195)
at com.pivotal.gemfirexd.internal.jdbc.AutoloadedDriver.connect(AutoloadedDriver.java:141)
at com.pivotal.gemfirexd.internal.engine.fabricservice.FabricServiceImpl.startImpl(FabricServiceImpl.java:290)
at com.pivotal.gemfirexd.internal.engine.fabricservice.FabricServerImpl.start(FabricServerImpl.java:60)
at io.snappydata.impl.ServerImpl.start(ServerImpl.scala:32)
Caused by: com.gemstone.gemfire.GemFireConfigException: Unable to contact a Locator service (timeout=5000ms). Operation either timed out or Locator does not exist. Configured list of locators is "[dev-snappydata-1(null):1527]".
at com.gemstone.gemfire.distributed.internal.membership.jgroup.GFJGBasicAdapter.getGemFireConfigException(GFJGBasicAdapter.java:533)
at com.gemstone.org.jgroups.protocols.TCPGOSSIP.sendGetMembersRequest(TCPGOSSIP.java:212)
at com.gemstone.org.jgroups.protocols.PingSender.run(PingSender.java:82)
at java.lang.Thread.run(Thread.java:745)
Hmm! I assume you are trying the spark-shell from your desktop and connecting to the cluster in AWS?
I am not sure this is going to work, because the local JVM launched by spark-shell will attempt to connect to the peer-to-peer cluster in SnappyData, which is not likely to work.
snappy-shell, on the other hand, merely uses the JDBC client to connect (and hence will work).
And you cannot use the locator client port (1527) anyway. See here.
Can you try with snappydata.store.locators=10.0.18.66:10334, NOT 1527, as the port? It is unlikely this will work, but worth a try:
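That is, the command from the question with only the locator port changed:
~/snappydata/bin$ spark-shell --master local[*] --conf snappydata.store.locators=10.0.18.66:10334 --conf spark.ui.port=4041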
Maybe there is a way to open up all ports and access to these nodes on AWS; not recommended for production, though.
I am curious to see other responses from the engineering team.
Until then, you may have to start the spark-shell from within the network (on an AWS node).