No FileSystem for scheme: http when submitting a Druid task to the supervisor for data source creation - hadoop2

Setup: HDFS cluster (1 master, 2 slaves); Druid cluster (node 1: ZooKeeper, coordinator, overlord; node 2: historical, middle manager; node 3: broker)
Versions: Druid imply-2.3.9, HDFS 2.7.3
common.runtime.properties for Druid (HDFS deep storage):
druid.storage.type=hdfs
druid.storage.storageDirectory=http://hadoopmachince:9000/druid/segments
When I posted a task to http://druidip:port/druid/indexer/v1/supervisor to create the data source, the task failed with the exception below.
Error stack trace:
1)Error injecting constructor, java.io.IOException: No FileSystem for scheme: http
at io.druid.storage.hdfs.HdfsDataSegmentPusher.<init>(HdfsDataSegmentPusher.java:63)
while locating io.druid.storage.hdfs.HdfsDataSegmentPusher
at io.druid.storage.hdfs.HdfsStorageDruidModule.configure(HdfsStorageDruidModule.java:97) (via modules: com.google.inject.util.Modules$OverrideModule -> io.druid.storage.hdfs.HdfsStorageDruidModule)
while locating io.druid.segment.loading.DataSegmentPusher annotated with @com.google.inject.multibindings.Element(setName=,uniqueId=152, type=MAPBINDER, keyType=java.lang.String)

The storage directory URI needs to use the hdfs scheme instead of http.
common.runtime.properties for Druid (HDFS deep storage):
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://hadoopmachince:9000/druid/segments
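As a quick sanity check (assuming the NameNode really listens on hadoopmachince:9000), you can verify the segments path is reachable from the Druid nodes before resubmitting the task:
hdfs dfs -ls hdfs://hadoopmachince:9000/druid/segments
If the directory does not exist yet, that is usually fine, since the HDFS pusher creates it when it writes the first segment; what matters is that the command can reach the NameNode without a scheme or connection error.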

Related

Timeout error when reading a file from Hadoop with PySpark

I want to read a CSV file from Hadoop with PySpark using the following code:
from pyspark.sql import SparkSession  # needed when the script runs via spark-submit
spark = SparkSession.builder.getOrCreate()
dfcsv = spark.read.csv("hdfs://my_hadoop_cluster_ip:9000/user/root/input/test.csv")
dfcsv.printSchema()
My Hadoop cluster runs in a Docker container on my local machine, linked with two other slave containers for the workers.
As the Hadoop cluster web UI shows, the path is correct.
But when I submit my script with this command :
spark-submit --master spark://my_cluster_spark_ip:7077 test.py
My script gets stuck on the read, and after a few minutes I get the following error:
22/02/09 15:42:29 WARN TaskSetManager: Lost task 0.1 in stage 4.0 (TID 4) (my_slave_spark_ip executor 1): org.apache.hadoop.net.ConnectTimeoutException: Call From spark-slave1/my_slave_spark_ip to my_hadoop_cluster_ip:9000 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=my_hadoop_cluster_ip/my_hadoop_cluster_ip:9000]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
...
For information, my CSV file is very small: just 3 lines and 64 KB.
Does anyone have a solution for this issue?

How to connect to JBoss EAP 7.3 using VisualVM in OpenShift

I am trying to connect to an application with VisualVM, but VisualVM is unable to connect to the application. Below is the environment:
JBoss EAP 7.3
Java 11
OpenShift
I have tried configuring it in different ways, but all of them failed.
Config 1:
Use a few properties in a script file so that it executes first (file contents below):
echo *** Adding system properties for VisualVM ***
batch
/system-property=jboss.modules.system.pkgs:add(value="org.jboss.byteman,com.manageengine,org.jboss.logmanager")
/system-property=java.util.logging.manager:add(value="org.jboss.logmanager.LogManager")
run-batch
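The file above is a JBoss CLI batch script; one way to apply it (an assumption here, since the exact mechanism is not shown) is to run it against the server with the CLI, for example:
$JBOSS_HOME/bin/jboss-cli.sh --connect --file=/path/to/visualvm-props.cli
where /path/to/visualvm-props.cli is a placeholder for wherever the script is mounted in the image.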
I can see that the above commands executed successfully and that the properties are present in the JBoss configuration (verified with the JBoss CLI).
JAVA_TOOLS_OPTIONS: -agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=n -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=3000 -Dcom.sun.management.jmxremote.rmi.port=3001 -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Xbootclasspath/a:/opt/eap/modules/system/layers/base/org/jboss/log4j/logmanager/main/log4j-jboss-logmanager-1.2.0.Final-redhat-00001.jar -Xbootclasspath/a:/opt/eap/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-2.1.14.Final-redhat-00001.jar
Result:
- java.lang.RuntimeException: WFLYCTL0079: Failed initializing module org.jboss.as.logging
- Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: WFLYLOG0078: The logging subsystem requires the log manager to be org.jboss.logmanager.LogManager. The subsystem has not be initialized and cannot be used. To use JBoss Log Manager you must add the system property "java.util.logging.manager" and set it to "org.jboss.logmanager.LogManager"
Config 2:
JAVA_TOOL_OPTIONS= -agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=n -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=3000 -Dcom.sun.management.jmxremote.rmi.port=3001 -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Djboss.modules.system.pkgs=org.jboss.byteman,org.jboss.logmanager -Xbootclasspath/a:/opt/eap/modules/system/layers/base/org/jboss/log4j/logmanager/main/log4j-jboss-logmanager-1.2.0.Final-redhat-00001.jar -Xbootclasspath/a:/opt/eap/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-2.1.14.Final-redhat-00001.jar
Result:
WARNING: Failed to instantiate LoggerFinder provider; Using default.
java.lang.IllegalStateException: The LogManager was not properly installed (you must set the "java.util.logging.manager" system property to "org.jboss.logmanager.LogManager")
Config 3:
• Modify standalone.conf, putting all the required configuration in that file (a sketch of what this looked like is below).
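A minimal sketch of that standalone.conf change, assuming the same JMX and log manager flags as Config 2 (the exact lines here are assumptions, not the original file):
# appended to standalone.conf (assumed; adjust jar paths to your EAP image)
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=3000 -Dcom.sun.management.jmxremote.rmi.port=3001 -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
JAVA_OPTS="$JAVA_OPTS -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Xbootclasspath/a:/opt/eap/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-2.1.14.Final-redhat-00001.jar"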
Result:
WARNING: Failed to instantiate LoggerFinder provider; Using default.
java.lang.IllegalStateException: The LogManager was not properly installed (you must set the "java.util.logging.manager" system property to "org.jboss.logmanager.LogManager")
Could you suggest what the correct configuration is?

Apache Drill: Failure setting up ZK for client

I am testing Apache Drill with a two-server cluster.
Let's say their external IPs are:
1.1.1.1
2.2.2.2
I first set up ZooKeeper to run on both, and when I run the status command I get a positive response:
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: leader
To get it working, my zoo.cfg looks like this:
Server 1:
// other default values omitted
clientPort=2181
server.1=0.0.0.0:2888:3888
server.2=2.2.2.2:2888:3888
Server 2:
// other default values omitted
clientPort=2181
server.1=1.1.1.1:2888:3888
server.2=0.0.0.0:2888:3888
Next I wanted to get Drill running on this cluster, so I modified the drill-override.conf file on the two servers as follows:
Server 1:
drill.exec: {
cluster-id: "test",
zk.connect: "1.1.1.1:2181,2.2.2.2:2181"
}
Server 2:
drill.exec: {
cluster-id: "test",
zk.connect: "2.2.2.2:2181,1.1.1.1:2181"
}
I can start a drillbit on both servers, and when I check the status I get this response on both:
drillbit is running.
But when I then try to open the console via bin/drill-conf I get this stack trace:
Error: Failure in connecting to Drill: org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for client. (state=,code=0)
java.sql.SQLException: Failure in connecting to Drill: org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for client.
at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:159)
at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:64)
at org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:69)
at net.hydromatic.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:126)
at org.apache.drill.jdbc.Driver.connect(Driver.java:72)
at sqlline.DatabaseConnection.connect(DatabaseConnection.java:167)
at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:213)
at sqlline.Commands.connect(Commands.java:1083)
at sqlline.Commands.connect(Commands.java:1015)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
at sqlline.SqlLine.dispatch(SqlLine.java:742)
at sqlline.SqlLine.initArgs(SqlLine.java:528)
at sqlline.SqlLine.begin(SqlLine.java:596)
at sqlline.SqlLine.start(SqlLine.java:375)
at sqlline.SqlLine.main(SqlLine.java:268)
Caused by: org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for client.
at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:208)
at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:151)
... 18 more
Caused by: java.io.IOException: Failure to connect to the zookeeper cluster service within the allotted time of 10000 milliseconds.
at org.apache.drill.exec.coord.zk.ZKClusterCoordinator.start(ZKClusterCoordinator.java:123)
at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:206)
... 19 more
apache drill 1.7.0
"start your sql engine"
Why would drill fail to connect to the ZK cluster, which is running just fine?
All ports are open between these two boxes.
Prerequisites
Prerequisites for starting Drill in distributed mode:
(Required) Running Oracle JDK version 7
(Required) Running a ZooKeeper quorum
(Recommended) Running a Hadoop cluster
(Recommended) Using DNS
Configuration
Given your server IP addresses:
Server 1 - 1.1.1.1
Server 2 - 2.2.2.2
Put the same configuration in zoo.cfg on both Server 1 and Server 2:
clientPort=2181
server.1=1.1.1.1:2888:3888
server.2=2.2.2.2:2888:3888
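Each server also needs a myid file in its ZooKeeper data directory whose contents match its server.N line (the dataDir path below is an assumption; use whatever dataDir your zoo.cfg specifies):
echo 1 > /var/lib/zookeeper/myid    # on Server 1
echo 2 > /var/lib/zookeeper/myid    # on Server 2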
Similarly, use the same drill-override.conf on both servers:
drill.exec: {
cluster-id: "test",
zk.connect: "1.1.1.1:2181,2.2.2.2:2181"
}
Starting Drill
Start a drillbit on each cluster node with:
bin/drillbit.sh start
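You can confirm each drillbit came up with the status subcommand, which should print "drillbit is running." as in the question:
bin/drillbit.sh status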
Using Drill
Web UI:
Open the web UI using any node's address, for example:
http://1.1.1.1:8047
Via Shell:
Run the bin/drill-localhost command and the Drill shell will appear.
Verify Installation
From the Drill shell or the web UI, run:
SELECT * FROM sys.drillbits;
Drill lists information about the Drillbits that are running.
Stopping Drill
Run:
bin/drillbit.sh stop

Multi-node Hadoop cluster configuration

I'm new to Hadoop clusters and am trying to deploy a multi-node cluster on Ubuntu 15.10 with one master and two slaves. After configuration, there are two active nodes (the two slaves). However, when I tried the Hadoop example program below
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 3 100
I got a connection refused error:
Job job_1459774851310_0001 failed with state FAILED due to: Application application_1459774851310_0001 failed 2 times due to Error launching appattempt_1459774851310_0001_000002. Got exception: java.net.ConnectException: Call From ubuntu/127.0.1.1 to ubuntu:36380 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
To deploy this cluster, I disabled IPv6 on all machines and edited the configuration files as follows:
In file core-site.xml:
fs.defaultFS = hdfs://master:8020
In file hdfs-site.xml:
dfs.namenode.name.dir = $HADOOP_PREFIX/namenode
dfs.datanode.data.dir = $HADOOP_PREFIX/datanode
In file yarn-site.xml:
yarn.resourcemanager.address = master:8084
yarn.resourcemanager.schedular.address = master:8085
yarn.resourcemanager.resource-tracker.address = master:8086
yarn.resourcemanager.admin.address = master:8087
yarn.resourcemanager.webapp.address = master:8088
yarn.nodemanager.aux-services = mapreduce_shuffle
In file mapred-site.xml:
mapreduce.framework.name = yarn
mapreduce.jobhistory.address = master:10020
mapreduce.jobhistory.address = master:19888
Those 4 files are the same on all machines.
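For reference, each key = value pair listed above corresponds to a <property> element in the respective XML file; for example, the fs.defaultFS entry in core-site.xml would be written as:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>
</configuration>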
Where did I make mistakes? How can I fix this?
In the slaves file, I wrote only the IP addresses of the two slaves.

Oozie - Got exception running sqoop: Could not load db driver class: com.mysql.jdbc.Driver

I am trying to perform a Sqoop export on HDP Sandbox 2.1 via Oozie. When I run the Oozie job I get the following Java runtime exception.
>>> Invoking Sqoop command line now >>>
7598 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
7714 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.4.2.1.1.0-385
7760 [main] WARN org.apache.sqoop.SqoopOptions - Character argument '\t' has multiple characters; only the first will be used.
7791 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
7904 [main] INFO org.apache.sqoop.manager.MySQLManager - Preparing to use a MySQL streaming resultset.
7905 [main] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
7946 [main] ERROR org.apache.sqoop.Sqoop - Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
Intercepting System.exit(1)
I copied the JDBC driver file "mysql-connector-java.jar" to what I believe is Oozie's shared library folder, "/usr/lib/oozie/share/lib/sqoop/". I restarted my sandbox and tried the export with Oozie again, but I still get the same error.
The export works perfectly fine when I run it directly with Sqoop, so I presume Oozie needs its own copy of the driver.
My question is: which Oozie directory am I supposed to copy my JDBC driver to?
If you guys think I'm doing something wrong or you need further information, please let me know.
Thank you for your time.
Normally the Oozie sharelib directory is /user/oozie/share/lib/ on HDFS, where "oozie" is the name of the user that starts the Oozie server. I don't know what that user is on HDP Sandbox 2.1, but you can use the ps command to figure it out.
For jars needed by the sqoop action, I think you should copy the jar to the /user/oozie/share/lib/sqoop/ folder on HDFS.
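A minimal sketch of those steps, assuming the sharelib really does live at the default /user/oozie/share/lib/ path on HDFS:
ps -ef | grep -i oozie    # find which user the Oozie server runs as
hdfs dfs -put mysql-connector-java.jar /user/oozie/share/lib/sqoop/
hdfs dfs -ls /user/oozie/share/lib/sqoop/    # confirm the jar sits next to the other Sqoop sharelib jars
Depending on the Oozie version, you may also need to restart the Oozie server before it picks up the new sharelib contents.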