AWS Glue - Unable to connect to MySQL

In this instance, the database security group is open for all inbound traffic (all ports, all sources).
I am also able to connect to the database fine from MySQL Workbench or DataGrip, which for sure use a JDBC connection string.
2019-05-21 14:12:03 INFO CatalogClient:651 - Got connection 'tem-sas-main' info from Catalog with url: jdbc:mysql://my-database-example:3306/sas_tem_central
2019-05-21 14:12:03 INFO CatalogClient:684 - JDBC configuration for connection tem-sas-main: JDBCConfiguration(url=jdbc:mysql://my-database-example:3306/sas_tem_central, hostname=my-database-example, port=3306, databaseVendor=mysql, databaseVersion=null, connectionName=tem-sas-main, path=sas_tem_central, subnetId=subnet-0717c4db096e84393, availabilityZone=eu-west-1a, securityGroups=[sg-074b074ebc51c2315], enforceSSL=false)
2019-05-21 14:12:03 INFO JdbcConnection:42 - Starting connecter. driver com.mysql.jdbc.Driver@7e5d9a50
2019-05-21 14:12:03 INFO JdbcConnection:60 - Attempting to connect with SSL host matching: jdbc:mysql://my-database-example:3306/sas_tem_central
2019-05-21 14:14:15 INFO JdbcConnection:69 - SSL connection to data store using host matching failed. Retrying without host matching.
2019-05-21 14:14:15 INFO JdbcConnection:83 - Attempting to connect with SSL: jdbc:mysql://my-database-example:3306/sas_tem_central
2019-05-21 14:16:26 INFO JdbcConnection:88 - SSL connection to data store failed. Retrying without SSL.
2019-05-21 14:16:26 INFO JdbcConnection:102 - Attempting to connect without SSL: jdbc:mysql://my-database-example:3306/sas_tem_central
Check that your connection definition references your JDBC database with correct URL syntax, username, and password. Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
Also note my JDBC connection string is
jdbc:mysql://my-database-example:3306/sas_tem_central
And Require SSL connection is set to false
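Each attempt in the log above gives up after roughly two minutes, which looks more like a network timeout than a rejected login. One way to tell the two apart is a plain TCP check run from a machine in the same subnet and security group as the Glue connection. This is only a minimal sketch: `can_reach` is a hypothetical helper, not part of Glue or the MySQL driver, and it only shows whether the port opens, not whether the credentials work.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False
```

For the connection in the question this would be called as `can_reach("my-database-example", 3306)`. If it returns False, the problem is likely in routing or security groups rather than in the driver.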

After speaking to an AWS Architect, I can confirm that Glue does not work with MySql Version 8 at this time. This is NOT documented anywhere I could find.

Glue's supported Databases (not pointed out prominently in the Glue documentation / faqs)
The following table lists the JDBC driver versions that AWS Glue supports.
Product                 JDBC driver version
Microsoft SQL Server    6.x
MySQL                   5.1
Oracle Database         11.2
PostgreSQL              42.x
Amazon Redshift         4.1
Now you can customize your own configuration to connect to MySQL 8 and other newer databases from AWS Glue jobs:
1. Download the JDBC driver for MySQL 8.
2. Upload it to S3.
3. Create a JDBC connection to your database via Glue - Tables - Connections.
   (Note: this will not work using "Test connection", since the latest version supported is currently 5.7.)
4. Edit the Glue job and set the Dependent jars path, e.g.
   s3://yourS3bucket/path/mysql-connector-java-8.0.21.jar
5. Modify the Glue job script, e.g. the Python script:
datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=dropnullfields3,
    catalog_connection="GLUE-CONNECTION-NAME",
    connection_options={
        # Point Glue at the uploaded MySQL 8 driver instead of the built-in one.
        "customJdbcDriverS3Path": "s3://yourS3Bucket/path/mysql-connector-java-8.0.21.jar",
        "customJdbcDriverClassName": "com.mysql.cj.jdbc.Driver",
        "user": "#DBuserName#",
        "password": "DbUsersPassword",
        "url": "jdbc:mysql://dbname.url.region.rds.amazonaws.com/schemaName",
        "connectionType": "mysql",
        "dbtable": "tablenameInDb",
        "database": "schemaName",
    },
    transformation_ctx="datasink4",
)
(Keep the opening parenthesis on the same line as from_jdbc_conf; line breaks inside the parentheses are fine.)
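Since the schema name appears both in the URL and in the "database" option, it can help to build the options in one place so the two cannot drift apart. A minimal sketch, assuming the same option keys as in the call above; `mysql8_connection_options` is a hypothetical helper name, not a Glue API:

```python
def mysql8_connection_options(jar_s3_path, host, schema, table, user, password):
    """Assemble the connection_options dict for a custom MySQL 8 driver."""
    return {
        "customJdbcDriverS3Path": jar_s3_path,
        "customJdbcDriverClassName": "com.mysql.cj.jdbc.Driver",
        "user": user,
        "password": password,
        # The schema is used for both the URL path and the "database" option.
        "url": f"jdbc:mysql://{host}/{schema}",
        "connectionType": "mysql",
        "dbtable": table,
        "database": schema,
    }
```

The returned dict would be passed as `connection_options=` to `from_jdbc_conf`.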

Your connection string looks right.
I was struggling with this same issue and found that I had to troubleshoot a few issues:
Ensure that your security group that you are using for your RDS is self-referenced. This means that you should add the security group from your db instance on RDS to both the inbound and outbound rules for All TCP.
Ensure that you have created an endpoint for your VPC. To do this, navigate to your VPC, click on 'Endpoint', add an 'Endpoint' for the VPC that the RDS is in and create an endpoint for S3.
Finally, the MySQL RDS instance you create should be on version 5 (not version 8). I noticed that the AWS Glue JDBC connector only worked with MySQL version 5 (5.7).
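The self-reference check from the first point can also be scripted. A minimal sketch, assuming `group` is a dict shaped like one entry of the `SecurityGroups` list that boto3's EC2 `describe_security_groups` call returns (keys `GroupId`, `IpPermissions`, `IpPermissionsEgress`, `UserIdGroupPairs`); the function names are hypothetical and the protocol and port range are not inspected:

```python
def has_self_rule(group: dict, rules_key: str) -> bool:
    """Return True if any rule under `rules_key` names the group itself."""
    group_id = group["GroupId"]
    for rule in group.get(rules_key, []):
        for pair in rule.get("UserIdGroupPairs", []):
            if pair.get("GroupId") == group_id:
                return True
    return False


def is_self_referencing(group: dict) -> bool:
    """Check both inbound and outbound rules, as the answer recommends."""
    return has_self_rule(group, "IpPermissions") and has_self_rule(
        group, "IpPermissionsEgress"
    )
```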

When you use Amazon RDS with the MySQL engine, you don't need to create a JDBC connection. You can choose Amazon RDS as the connection type when you are creating your connection.

I first tried with MySQL 8. It didn't work; it threw the same error.
Later I created a MySQL 5.7 instance to test the connection, and it worked. I am able to connect to MySQL 5.7 from Glue. Make sure your VPC S3 endpoint is configured properly.
Thanks
Raaz

Related

MySQL database connections broke after DataGrip Upgrade (2021.3.x)

After upgrading DataGrip to version 2021.3.2, my existing db connections were broken. I connect to various DBs (Oracle, MySql) via an SSH tunnel configured to connect through an AWS bastion host.
After the upgrade, DataGrip suggested that the MySQL driver be updated to the Amazon Aurora MySQL driver, and connections that worked before the upgrade no longer worked.
Switching between drivers I get two separate errors:
First Error using Amazon Aurora MySQL (suggested driver after update)
[08000][-1] Could not connect to address=(host=localhost)(port=53929)(type=master) : (conn=57522706) could not load system variables[08000][1220] (conn=57522706) Connection is closed.
and second error using original MySQL driver
[08S01]
Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
No appropriate protocol (protocol is disabled or cipher suites are inappropriate).
All of my connections worked immediately prior to the DataGrip upgrade, so it seems that the upgrade requires new drivers, which have a problem with the way I connect.
After some looking, it seems that Aurora has an existing race condition. I found my answer here: https://jira.mariadb.org/browse/CONJ-824?focusedCommentId=165412&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-165412
My solution was to set usePipelineAuth to false (it was not set before) in the Advanced tab of the Data Sources configuration.
This fixed my connections and I am back in business.
In my case, I fixed the issue simply by switching to the MariaDB driver.

Glue cannot connect to a mysql hosted on an ec2 (public subnet)

I am trying to connect from Glue to a MySQL instance hosted on an EC2 instance.
I deployed the MySQL database on a public subnet (yes, this is just a POC to test it out).
I created an S3 VPC endpoint (gateway) and attached it to my public subnet.
For the security groups (inbound and outbound), I'm using open to all (yes, this is just a POC to test the Glue connection).
My Glue IAM role has the Glue full access policy and the administrator access policy (yes, not the best policy, but I'm just trying to find out why I can't connect).
I am hitting an error which states that I can't connect, but I can connect with a MySQL client, so I'm pretty sure it has nothing to do with my connection string. Any help?
2021-06-11T09:34:19.680+08:00
Check that your connection definition references your JDBC database with correct URL syntax, username, and password. Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
2021-06-11T09:34:19.680+08:00 Exiting with error code 30
Glue does not support MySQL 8 natively because it uses an old JDBC driver, so you have to set up your own JDBC driver for Glue:
This feature enables you to connect to data sources with custom drivers that were not natively supported in AWS Glue such as MySQL 8 and Oracle 18.
Alternatively, you can migrate to an older MySQL version.

AWS DMS not able to connect Mysql source DB but DB is connecting from mysql command

I am working on a project where I want to replicate data from a source Aurora MySQL database to Kinesis with AWS Database Migration Service (DMS).
I am able to connect to the source MySQL DB with the mysql command and can see the whole database:
mysql --host=<host>.amazonaws.com --port=8200 --user=<user> -p
But when I start the DMS replication task, DMS reports a connection issue with the source DB. The connection to the destination (Kinesis) is fine.
Testing the connection to the source DB from DMS gives this error:
Test Endpoint failed: Application-Status: 1020912, Application-Message: Cannot connect to ODBC provider ODBC general error., Application-Detailed-Message: RetCode: SQL_ERROR SqlState: HY000 NativeError: 2003 Message: [unixODBC][MySQL][ODBC 8.0(w) Driver]Can't connect to MySQL server on '.compute-1.amazonaws.com' (110)
I tried following the AWS docs but I am not able to figure out the issue. In the network ACL of the VPC, I have allowed all incoming traffic for its VPC.
I faced the same issue; not sure if you have figured out the solution. The network setup was fine: I was using the same VPC for both the RDS instance and the DMS replication instance, and there was no issue with the security group. The issue was with the version of DMS.
The RDS MySQL version is 5.7. The ODBC driver in the higher DMS version doesn't seem to work with the RDS instance, or something is limiting connections to RDS MySQL 5.7 in DMS. I launched a new replication instance in DMS with version 3.3.3 and the DB connectivity worked fine; data movement is also working fine.
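As a side note on reading the error in the question: the number in parentheses at the end of a MySQL "Can't connect to MySQL server" (2003) message is the operating system's socket error code. A minimal sketch for decoding it, assuming the replication instance runs Linux (the unixODBC driver in the message suggests so); the helper name and the small lookup table are illustrative only:

```python
import re

# Linux errno values commonly seen at the end of MySQL error 2003 messages.
LINUX_ERRNO = {110: "ETIMEDOUT", 111: "ECONNREFUSED", 113: "EHOSTUNREACH"}


def os_error_from_message(message: str):
    """Extract the trailing '(NNN)' OS error code and name it if known."""
    match = re.search(r"\((\d+)\)\s*$", message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, LINUX_ERRNO.get(code, "unknown")
```

Under that assumption, the (110) in the question would mean the connection attempt timed out rather than being refused or rejected at login.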

How do I connect to my RDS MySQL instance programmatically?

Problem
I launched a MySQL RDS instance and was able to successfully connect to it using MySQL Workbench. However, I am still not able to connect to it from my local workstation using the following URI:
'mysql+pymysql://user:password@db_identifier.XXXXXXXXXX.us-east-1.rds.amazonaws.com:3306/db_name'
or the same URI without the port:
'mysql+pymysql://user:password@db_identifier.XXXXXXXXXX.us-east-1.rds.amazonaws.com/db_name'
The error that I receive when I specify this as my database URI and execute a db.create_all() command is:
sqlalchemy.exc.OperationalError:
(pymysql.err.OperationalError)
(2003, "Can't connect to MySQL server on 'db_identifier.XXXXXXXXXX.us-east-1.rds.amazonaws.com'
([WinError 10060] A connection attempt failed because the connected party did not
properly respond after a period of time, or established connection failed because
connected host has failed to respond)")
Question
What can I do to connect using pymysql? And why would it connect with MySQL Workbench and not through this URI?
Context
I am following the tutorial here. This uses SQLAlchemy to execute the SQL statements in Python.
The RDS instance (and its associated subnet/VPC) have the following:
a security group open on port 3306
NACL rules that allow incoming and outgoing traffic
Public Accessibility set to "Yes"
Check my answer on this post. It could be something about the "Public accessibility" option in your RDS "Connectivity and Security" section, as described here:
https://stackoverflow.com/a/63514997/2934184
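Separately from the accessibility setting, it may be worth ruling out the URI itself, since a password containing characters such as "@" or "/" has to be percent-encoded before it goes into a SQLAlchemy URL. A minimal sketch using only the standard library; `build_mysql_uri` is a hypothetical helper and the password shown is made up for illustration:

```python
from urllib.parse import quote_plus, urlsplit


def build_mysql_uri(user, password, host, db_name, port=3306):
    """Build a SQLAlchemy-style URI, percent-encoding the credentials."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db_name}"
    )


# Example values; the password is invented to show the encoding.
uri = build_mysql_uri(
    "user",
    "p@ss/word",
    "db_identifier.XXXXXXXXXX.us-east-1.rds.amazonaws.com",
    "db_name",
)
parts = urlsplit(uri)
# Confirm the host and port come out where expected.
print(parts.hostname, parts.port)
```

If the host and port print as expected and the connection still times out, the cause is more likely network reachability than the URI.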

How to pull data from amazon EC2 instance using Confluent Kafka JDBC source connector?

I have Confluent Platform on my local machine, and I am trying to read data from an AWS EC2 instance. I have credentials such as the hostname, DB name, password, etc. I am using the JDBC source connector. The connector config is:
name=test
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://ab.bca.bdc.aaa:abcd/DB?user=abc&password=bca
table.whitelist=ppp
mode=incrementing
After running Connect in standalone mode, I got an error like:
Invalid value java.sql.SQLNonTransientConnectionException: Cannot load connection class because of underlying exception: com.mysql.cj.exceptions.WrongArgumentException: Malformed database URL, failed to parse the connection string near
According to the MySQL JDBC URL syntax, the user and password go before the host address, separated by a colon, not after:
user:password@host_or_host_sublist
I'm not sure whether using RDS or your own EC2 database would change that syntax.
If you want to capture all database events, though, Debezium (your old question) would be what you want. The JDBC connector won't capture deletes (or events created and deleted between polls), and it puts unnecessary strain on your database.