Datanode not able to contact name node on AWS - hadoop2

I am trying to set up a Hadoop cluster on AWS with two datanodes and one namenode.
I followed the Tutorials Point multi-node cluster setup guide and started the namenode, secondary namenode, and datanodes from the namenode server.
The datanode starts but cannot connect to the namenode. I get the following error:
17/02/06 13:31:06 INFO ipc.Client: Retrying connect to server: hadoop-master/172.31.3.137:9000. Already tried 0 time(s); maxRetries=45
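
A retry loop like the one in this log usually means the TCP connection itself is not being established. As a first diagnostic, the sketch below checks from a datanode whether the namenode's RPC port accepts connections. The function name `can_connect` is hypothetical; the host name and port are taken from the log line above. It does not say why the port is unreachable (security group, bind address, or host name resolution are all possibilities), only whether it is.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `can_connect("hadoop-master", 9000)` on each datanode would show whether the problem is at the network level before looking at the Hadoop configuration itself.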

Related

How do I connect to my RDS MySQL instance programmatically?

Problem
I launched a MySQL RDS instance and was able to successfully connect to it using MySQL Workbench. However, I am still not able to connect to it from my local workstation using the following URI:
'mysql+pymysql://user:password@db_identifier.XXXXXXXXXX.us-east-1.rds.amazonaws.com:3306/db_name'
or the same URI without the port:
'mysql+pymysql://user:password@db_identifier.XXXXXXXXXX.us-east-1.rds.amazonaws.com/db_name'
The error that I receive when I specify this as my database URI and execute a db.create_all() command is:
sqlalchemy.exc.OperationalError:
(pymysql.err.OperationalError)
(2003, "Can't connect to MySQL server on 'db_identifier.XXXXXXXXXX.us-east-1.rds.amazonaws.com'
([WinError 10060] A connection attempt failed because the connected party did not
properly respond after a period of time, or established connection failed because
connected host has failed to respond)")
Question
What can I do to connect using pymysql? And why would it connect with MySQL Workbench and not through this URI?
Context
I am following the tutorial here. This uses SQLAlchemy to execute the SQL statements in Python.
The RDS instance (and its associated subnet/VPC) have the following:
a security group open on port 3306
NACL rules that allow incoming and outgoing traffic
Public Accessibility set to "Yes"
Check my answer on this post; it could be related to the "Public accessibility" option in your RDS "Connectivity and Security" section, as described here:
https://stackoverflow.com/a/63514997/2934184
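
For reference, a SQLAlchemy URI for the PyMySQL driver has the form mysql+pymysql://user:password@host:port/database. The sketch below assembles one with the standard library and percent-encodes the credentials, which matters if the password contains characters such as '@' or '/'. The helper name `build_mysql_uri` and the sample password are hypothetical. A timeout like WinError 10060 points at the network path rather than the URI, so this only rules out one variable.

```python
from urllib.parse import quote_plus


def build_mysql_uri(user, password, host, database, port=3306):
    """Assemble a SQLAlchemy-style URI for the PyMySQL driver."""
    # Percent-encode credentials so characters such as '@' or '/' in a
    # password cannot be mistaken for URI delimiters.
    return "mysql+pymysql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# Sample values; the password is made up to show the escaping.
uri = build_mysql_uri(
    "user",
    "p@ss/word",
    "db_identifier.XXXXXXXXXX.us-east-1.rds.amazonaws.com",
    "db_name",
)
print(uri)
```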

AWS Glue - Unable to connect to mysql

Hi, in this instance the database security group is open for all inbound traffic (all ports, all sources).
I am also able to connect to the database fine in MySQL Workbench or DataGrip, which for sure use a JDBC connection string.
2019-05-21 14:12:03 INFO CatalogClient:651 - Got connection 'tem-sas-main' info from Catalog with url: jdbc:mysql://my-database-example:3306/sas_tem_central
2019-05-21 14:12:03 INFO CatalogClient:684 - JDBC configuration for connection tem-sas-main: JDBCConfiguration(url=jdbc:mysql://my-database-example:3306/sas_tem_central, hostname=my-database-example, port=3306, databaseVendor=mysql, databaseVersion=null, connectionName=tem-sas-main, path=sas_tem_central, subnetId=subnet-0717c4db096e84393, availabilityZone=eu-west-1a, securityGroups=[sg-074b074ebc51c2315], enforceSSL=false)
2019-05-21 14:12:03 INFO JdbcConnection:42 - Starting connecter. driver com.mysql.jdbc.Driver@7e5d9a50
2019-05-21 14:12:03 INFO JdbcConnection:60 - Attempting to connect with SSL host matching: jdbc:mysql://my-database-example:3306/sas_tem_central
2019-05-21 14:14:15 INFO JdbcConnection:69 - SSL connection to data store using host matching failed. Retrying without host matching.
2019-05-21 14:14:15 INFO JdbcConnection:83 - Attempting to connect with SSL: jdbc:mysql://my-database-example:3306/sas_tem_central
2019-05-21 14:16:26 INFO JdbcConnection:88 - SSL connection to data store failed. Retrying without SSL.
2019-05-21 14:16:26 INFO JdbcConnection:102 - Attempting to connect without SSL: jdbc:mysql://my-database-example:3306/sas_tem_central
Check that your connection definition references your JDBC database with correct URL syntax, username, and password. Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
Also note my JDBC connection string is
jdbc:mysql://my-database-example:3306/sas_tem_central
And Require SSL connection is set to false
After speaking to an AWS Architect, I can confirm that Glue does not work with MySql Version 8 at this time. This is NOT documented anywhere I could find.
Glue's supported Databases (not pointed out prominently in the Glue documentation / faqs)
The following table lists the JDBC driver versions that AWS Glue supports.
Product                 JDBC driver version
Microsoft SQL Server    6.x
MySQL                   5.1
Oracle Database         11.2
PostgreSQL              42.x
Amazon Redshift         4.1
Now you can customize your own configuration to connect to MySQL 8 and other newer databases from AWS Glue Jobs.
Download the JDBC driver for MySQL 8
Upload to S3
Create a JDBC connection to your database via Glue - Tables - Connections
(Note: this will not work with "Test connection", since the latest version it supports is currently 5.7.)
Edit the Glue Job - Dependent jars path - e.g.
s3://yourS3bucket/path/mysql-connector-java-8.0.21.jar
Modify the Glue Job e.g. Python script
datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame = dropnullfields3,
    catalog_connection = "GLUE-CONNECTION-NAME",
    connection_options = {
        # Custom driver uploaded to S3 in the earlier step
        "customJdbcDriverS3Path": "s3://yourS3Bucket/path/mysql-connector-java-8.0.21.jar",
        "customJdbcDriverClassName": "com.mysql.cj.jdbc.Driver",
        "user": "#DBuserName#", "password": "DbUsersPassword",
        "url": "jdbc:mysql://dbname.url.region.rds.amazonaws.com/schemaName",
        "connectionType": "mysql", "dbtable": "tablenameInDb",
        "database": "schemaName"},
    transformation_ctx = "datasink4")
(The opening parenthesis must stay on the same line as from_jdbc_conf; line breaks inside the parentheses are fine. Use straight quotes, not curly ones.)
Your connection string looks right.
I was struggling with this same issue and found that I had to troubleshoot a few issues:
Ensure that the security group you are using for your RDS instance is self-referencing. This means adding the security group from your DB instance on RDS to both the inbound and outbound rules for All TCP.
Ensure that you have created an endpoint for your VPC. To do this, navigate to your VPC, click 'Endpoints', and create an S3 endpoint for the VPC that the RDS instance is in.
Finally, the MySQL RDS instance should run version 5 (not version 8). I noticed that the AWS Glue JDBC connector only worked with MySQL version 5 (5.7).
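
The first check above can be automated. The sketch below looks for a self-referencing inbound rule in a security group description. It assumes the dictionary shape returned by boto3's EC2 describe_security_groups call (one entry of its "SecurityGroups" list); the function name `is_self_referencing` and the sample group ID are hypothetical, and the sample data is hand-written rather than real API output.

```python
def is_self_referencing(group):
    """Check whether a security group allows inbound traffic from itself.

    `group` is one entry of the "SecurityGroups" list in the shape returned
    by boto3's EC2 describe_security_groups call.
    """
    group_id = group["GroupId"]
    for permission in group.get("IpPermissions", []):
        for pair in permission.get("UserIdGroupPairs", []):
            # A rule whose source is the group itself is self-referencing.
            if pair.get("GroupId") == group_id:
                return True
    return False


# Hand-written sample data (hypothetical ID), not real API output.
sample = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [
        {
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}],
        }
    ],
}
print(is_self_referencing(sample))
```

The same loop over "IpPermissionsEgress" would cover the outbound side.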
When you use the AWS RDS with the MySQL Engine, you don't need to create a JDBC connection. You can use the Amazon RDS when you are creating your connection.
New Connection Screen
I first tried with MySQL 8. It didn't work and threw the same error.
Later I created a MySQL 5.7 instance to test the connection, and it worked: I am able to connect to MySQL 5.7 from Glue. Make sure your VPC S3 endpoint is configured properly.
Thanks
Raaz

AWS EB Operational Error "Lost Connection to MySQL Server"

I'm new to AWS services and I'm trying to deploy a simple Django app to Elastic Beanstalk. I found this documentation that says "to decouple your database instance from your environment, you can run a database instance in Amazon RDS and configure your application to connect to it on launch."
I followed this documentation to set up a PostgreSQL DB and uploaded my source code. When I deployed the project, the health check passed. So I tried to follow the URL to my site, and it just tries to load forever.
I checked the error logs (/var/log/httpd/error_log) and found OperationalError: (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0").
Why is it trying to connect to a MySQL when I want to use the PostgreSQL database?
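
Error 2013 is raised by a MySQL client library, so one thing worth checking is the ENGINE entry in the Django settings. The sketch below builds a DATABASES dictionary for PostgreSQL from RDS_* environment variables. It assumes those variables (RDS_DB_NAME, RDS_USERNAME, RDS_PASSWORD, RDS_HOSTNAME, RDS_PORT) are defined as environment properties; the function name `rds_database_settings` and the sample values are hypothetical.

```python
import os


def rds_database_settings(env=None):
    """Build a Django DATABASES dict for PostgreSQL from RDS_* variables."""
    env = os.environ if env is None else env
    return {
        "default": {
            # A MySQL engine here would produce MySQL client errors even
            # though the RDS instance itself runs PostgreSQL.
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env["RDS_DB_NAME"],
            "USER": env["RDS_USERNAME"],
            "PASSWORD": env["RDS_PASSWORD"],
            "HOST": env["RDS_HOSTNAME"],
            "PORT": env.get("RDS_PORT", "5432"),
        }
    }


# Hypothetical values, for illustration only.
example_env = {
    "RDS_DB_NAME": "ebdb",
    "RDS_USERNAME": "appuser",
    "RDS_PASSWORD": "secret",
    "RDS_HOSTNAME": "example.rds.amazonaws.com",
}
DATABASES = rds_database_settings(example_env)
print(DATABASES["default"]["ENGINE"])
```

In settings.py this would be called without arguments so that the values come from the real environment.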

Unable to connect to snappydata store with spark-shell command

SnappyData v0.5
My goal is to start a "spark-shell" from my SnappyData install's /bin directory and issue Scala commands against existing tables in my SnappyData store.
I am on the same host as my SnappyData store, locator, and lead (and yes, they are all running).
To do this, I am running this command as per the documentation here:
Connecting to a Cluster with spark-shell
~/snappydata/bin$ spark-shell --master local[*] --conf snappydata.store.locators=10.0.18.66:1527 --conf spark.ui.port=4041
I get this error trying to create a spark-shell to my store:
[TRACE 2016/08/12 15:21:55.183 UTC GFXD:error:FabricServiceAPI tid=0x1] XJ040 error occurred while starting server : java.sql.SQLException(XJ040): Failed to start database 'snappydata', see the cause for details.
java.sql.SQLException(XJ040): Failed to start database 'snappydata', see the cause for details.
at com.pivotal.gemfirexd.internal.impl.jdbc.SQLExceptionFactory40.getSQLException(SQLExceptionFactory40.java:124)
at com.pivotal.gemfirexd.internal.impl.jdbc.Util.newEmbedSQLException(Util.java:110)
at com.pivotal.gemfirexd.internal.impl.jdbc.Util.newEmbedSQLException(Util.java:136)
at com.pivotal.gemfirexd.internal.impl.jdbc.Util.generateCsSQLException(Util.java:245)
at com.pivotal.gemfirexd.internal.impl.jdbc.EmbedConnection.bootDatabase(EmbedConnection.java:3380)
at com.pivotal.gemfirexd.internal.impl.jdbc.EmbedConnection.<init>(EmbedConnection.java:450)
at com.pivotal.gemfirexd.internal.impl.jdbc.EmbedConnection30.<init>(EmbedConnection30.java:94)
at com.pivotal.gemfirexd.internal.impl.jdbc.EmbedConnection40.<init>(EmbedConnection40.java:75)
at com.pivotal.gemfirexd.internal.jdbc.Driver40.getNewEmbedConnection(Driver40.java:95)
at com.pivotal.gemfirexd.internal.jdbc.InternalDriver.connect(InternalDriver.java:351)
at com.pivotal.gemfirexd.internal.jdbc.InternalDriver.connect(InternalDriver.java:219)
at com.pivotal.gemfirexd.internal.jdbc.InternalDriver.connect(InternalDriver.java:195)
at com.pivotal.gemfirexd.internal.jdbc.AutoloadedDriver.connect(AutoloadedDriver.java:141)
at com.pivotal.gemfirexd.internal.engine.fabricservice.FabricServiceImpl.startImpl(FabricServiceImpl.java:290)
at com.pivotal.gemfirexd.internal.engine.fabricservice.FabricServerImpl.start(FabricServerImpl.java:60)
at io.snappydata.impl.ServerImpl.start(ServerImpl.scala:32)
Caused by: com.gemstone.gemfire.GemFireConfigException: Unable to contact a Locator service (timeout=5000ms). Operation either timed out or Locator does not exist. Configured list of locators is "[dev-snappydata-1(null):1527]".
at com.gemstone.gemfire.distributed.internal.membership.jgroup.GFJGBasicAdapter.getGemFireConfigException(GFJGBasicAdapter.java:533)
at com.gemstone.org.jgroups.protocols.TCPGOSSIP.sendGetMembersRequest(TCPGOSSIP.java:212)
at com.gemstone.org.jgroups.protocols.PingSender.run(PingSender.java:82)
at java.lang.Thread.run(Thread.java:745)
Hmm! I assume you are running spark-shell from your desktop and connecting to the cluster in AWS?
I am not sure this is going to work, because the local JVM launched by spark-shell will attempt to join the peer-to-peer cluster in SnappyData, which is unlikely to succeed from outside the network.
Snappy-shell, on the other hand, merely uses the JDBC client to connect (and hence will work).
Also, you cannot use the locator client port (1527) here anyway. See here.
Can you try snappydata.store.locators=10.0.18.66:10334 instead of port 1527? It is unlikely to work, but worth a try.
Maybe there is a way to open up all ports and access to these nodes on AWS. That is not recommended for production, though.
I am curious to hear other responses from the engineering team.
Until then, you may have to start spark-shell from within the network (on an AWS node).

Connecting hivemetastore through datanode

Currently I am running the namenode, MySQL for the metastore, and the Hive CLI on a single node, with the other nodes as datanodes. MySQL is running on the master (namenode). It works fine when I fetch metadata (show tables) from Hive on the master. Now I am trying to get the same metadata from a datanode, but have not succeeded. I also tried starting the Thrift service on the master and then connecting, but that did not succeed either.
After reading this on the Apache wiki, it looks like you have to go through the Thrift service if the metastore service is running on a remote server.
I would just add the hive.metastore.uris property to your hive-site.xml and call it a day.
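
To show what that property looks like, the sketch below generates the hive-site.xml entry with Python's standard library. The host name `master-host` is a placeholder for the node running the metastore Thrift service, and 9083 is the default metastore port; both are assumptions to adjust for the actual cluster. The function name `metastore_uri_property` is hypothetical.

```python
import xml.etree.ElementTree as ET


def metastore_uri_property(host, port=9083):
    """Return a hive-site.xml <property> element for hive.metastore.uris."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hive.metastore.uris"
    # Clients on other nodes reach the metastore over Thrift at this URI.
    ET.SubElement(prop, "value").text = "thrift://{}:{}".format(host, port)
    return ET.tostring(prop, encoding="unicode")


# "master-host" is a placeholder, not a name from the original setup.
snippet = metastore_uri_property("master-host")
print(snippet)
```

The printed element goes inside the configuration element of hive-site.xml on the datanode, with the metastore service already started on the master.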