Simple YARN benchmark TestDFSIO fails - exception

I've set up Hadoop on a two-node cluster. The first node, "namenode", runs the following daemons:
hadoop@namenode:~$ jps
2916 SecondaryNameNode
2692 NameNode
3159 NodeManager
5834 Jps
2771 DataNode
3076 ResourceManager
The second node, "datanode", runs the following daemons:
hadoop@datanode:~$ jps
2559 Jps
2087 DataNode
2198 NodeManager
On BOTH machines I added the following to the /etc/hosts file:
10.240.40.246 namenode
10.240.172.201 datanode
which are the corresponding IPs, and I checked that I can SSH from each machine to the other. Now I wanted to test my Hadoop installation by running a sample MapReduce benchmark job:
hadoop@namenode:~$ hadoop jar /opt/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
However, this job fails:
14/02/17 22:22:53 INFO fs.TestDFSIO: TestDFSIO.1.7
14/02/17 22:22:53 INFO fs.TestDFSIO: nrFiles = 20
14/02/17 22:22:53 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
14/02/17 22:22:53 INFO fs.TestDFSIO: bufferSize = 1000000
14/02/17 22:22:53 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
14/02/17 22:22:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/02/17 22:22:55 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 20 files
14/02/17 22:22:56 INFO fs.TestDFSIO: created control files for: 20 files
14/02/17 22:22:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/17 22:22:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/17 22:22:57 INFO mapred.FileInputFormat: Total input paths to process : 20
14/02/17 22:22:57 INFO mapreduce.JobSubmitter: number of splits:20
14/02/17 22:22:57 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/17 22:22:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1392675199090_0001
14/02/17 22:22:59 INFO impl.YarnClientImpl: Submitted application application_1392675199090_0001 to ResourceManager at /0.0.0.0:8032
14/02/17 22:22:59 INFO mapreduce.Job: The url to track the job: http://namenode.c.forward-camera-473.internal:8088/proxy/application_1392675199090_0001/
14/02/17 22:22:59 INFO mapreduce.Job: Running job: job_1392675199090_0001
14/02/17 22:23:10 INFO mapreduce.Job: Job job_1392675199090_0001 running in uber mode : false
14/02/17 22:23:10 INFO mapreduce.Job: map 0% reduce 0%
14/02/17 22:23:42 INFO mapreduce.Job: map 20% reduce 0%
14/02/17 22:23:43 INFO mapreduce.Job: map 30% reduce 0%
14/02/17 22:24:14 INFO mapreduce.Job: map 60% reduce 0%
14/02/17 22:24:41 INFO mapreduce.Job: map 60% reduce 20%
14/02/17 22:24:45 INFO mapreduce.Job: map 85% reduce 20%
14/02/17 22:24:48 INFO mapreduce.Job: map 85% reduce 28%
14/02/17 22:24:59 INFO mapreduce.Job: map 90% reduce 28%
14/02/17 22:25:00 INFO mapreduce.Job: map 90% reduce 30%
14/02/17 22:25:02 INFO mapreduce.Job: map 100% reduce 30%
14/02/17 22:25:03 INFO mapreduce.Job: map 100% reduce 100%
14/02/17 22:25:16 INFO mapreduce.Job: map 0% reduce 0%
14/02/17 22:25:16 INFO mapreduce.Job: Job job_1392675199090_0001 failed with state FAILED due to: Application application_1392675199090_0001 failed 2 times due to AM Container for appattempt_1392675199090_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application.
14/02/17 22:25:16 INFO mapreduce.Job: Counters: 0
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:443)
at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:425)
at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:755)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:650)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:115)
at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Looking at the log file on the datanode machine, I find:
hadoop@datanode:/opt/hadoop-2.2.0/logs$ cat yarn-hadoop-nodemanager-datanode.log
...
2014-02-17 22:29:33,432 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
On my namenode I did:
hadoop@namenode:/opt/hadoop-2.2.0/logs$ cat yarn-hadoop-*log
2014-02-17 22:13:20,833 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: STARTUP_MSG:
...
2014-02-17 22:13:25,240 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class class org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.
...
2014-02-17 22:13:25,505 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: NodeManager configured with 8 G physical memory allocated to containers, which is more than 80% of the total physical memory available (3.6 G). Thrashing might happen.
...
2014-02-17 22:24:48,779 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1392675199090_0001_01_000023
2014-02-17 22:24:48,779 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1392675199090_0001_01_000024
...
2014-02-17 22:25:15,733 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1392675199090_0001_02_000001 is : 1
2014-02-17 22:25:15,734 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1392675199090_0001_02_000001 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
...
2014-02-17 22:25:15,736 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
...
2014-02-17 22:25:15,751 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1392675199090_0001 CONTAINERID=container_1392675199090_0001_02_000001
...
2014-02-17 22:13:19,150 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: STARTUP_MSG:
...
2014-02-17 22:25:15,837 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1392675199090_0001 failed 2 times due to AM Container for appattempt_1392675199090_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application. APPID=application_1392675199090_0001
However, I checked on the namenode machine that port 8031 is listening. I get:
hadoop@namenode:~$ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 namenode.c.forwar:36975 metadata.google.in:http TIME_WAIT
tcp 0 0 namenode.c.forwar:36969 metadata.google.in:http TIME_WAIT
tcp 0 0 namenode.c.forwar:40616 namenode.c.forwar:10001 TIME_WAIT
tcp 0 0 namenode.c.forwar:36974 metadata.google.in:http ESTABLISHED
tcp 0 0 namenode.c.forward:8031 namenode.c.forwar:41229 ESTABLISHED
tcp 0 352 namenode.c.forward-:ssh e178064245.adsl.a:64305 ESTABLISHED
tcp 0 0 namenode.c.forwar:41229 namenode.c.forward:8031 ESTABLISHED
tcp 0 0 namenode.c.forwar:40365 namenode.c.forwar:10001 ESTABLISHED
tcp 0 0 namenode.c.forwar:10001 namenode.c.forwar:40365 ESTABLISHED
tcp 0 0 namenode.c.forwar:10001 datanode:48786 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
unix 10 [ ] DGRAM 4604 /dev/log
unix 2 [ ] STREAM CONNECTED 10490
unix 2 [ ] STREAM CONNECTED 10488
unix 2 [ ] STREAM CONNECTED 10452
unix 2 [ ] STREAM CONNECTED 8452
unix 2 [ ] STREAM CONNECTED 7800
unix 2 [ ] STREAM CONNECTED 7797
unix 2 [ ] STREAM CONNECTED 6762
unix 2 [ ] STREAM CONNECTED 6702
unix 2 [ ] STREAM CONNECTED 6698
unix 2 [ ] STREAM CONNECTED 6208
unix 2 [ ] DGRAM 5750
unix 2 [ ] DGRAM 5737
unix 2 [ ] DGRAM 5734
unix 3 [ ] STREAM CONNECTED 5643
unix 3 [ ] STREAM CONNECTED 5642
unix 2 [ ] DGRAM 5640
unix 2 [ ] DGRAM 5192
unix 2 [ ] DGRAM 5171
unix 2 [ ] DGRAM 4889
unix 2 [ ] DGRAM 4723
unix 2 [ ] DGRAM 4663
unix 3 [ ] DGRAM 3132
unix 3 [ ] DGRAM 3131
So, what could be the problem here? In my opinion everything is set up correctly. Why is my job failing then?

The log on the datanode says
Retrying connect to server: 0.0.0.0/0.0.0.0:8031
So it tries to connect to this port on the local machine, which is datanode. However, the service runs on namenode. Therefore the following config lines have to be added to yarn-site.xml:
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>namenode:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>namenode:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>namenode:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>namenode:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>namenode:8088</value>
</property>
where namenode is an alias in /etc/hosts for the machine that runs the ResourceManager daemon.

Also add the same properties in the yarn-site.xml file on the namenode to ensure that these services connect to the same ports.
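As a quick sanity check (a minimal sketch; the sbin paths assume the /opt/hadoop-2.2.0 install shown above), restart YARN after editing yarn-site.xml on both machines and verify that the datanode's NodeManager now registers with the ResourceManager instead of retrying 0.0.0.0:8031:
hadoop@namenode:~$ /opt/hadoop-2.2.0/sbin/stop-yarn.sh
hadoop@namenode:~$ /opt/hadoop-2.2.0/sbin/start-yarn.sh
hadoop@namenode:~$ yarn node -list
If the new addresses are picked up, both NodeManagers should be listed with state RUNNING.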

Related

How to connect MySQL with Logstash?

This file worked well when I was on Windows, but now I must move my project to Linux and I don't know how to do it. Below is my logstash.conf.
command: ./logstash -f longstash.conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
    jdbc_user => "root"
    jdbc_password => "password"
    jdbc_driver_library => "/usr/share/java/mysql-connector-java-5.1.45.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    schedule => "* * * * *"
    statement => "SELECT * FROM Pro WHERE last_modificate > :sql_last_value"
    use_column_value => true
    tracking_column => "last_modificate"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    action => "update"
    document_id => "%{id}"
    doc_as_upsert => true
    index => "blog"
    document_type => "pro"
  }
}
And below is the error:
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2019-02-06 16:24:38.441 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2019-02-06 16:24:38.495 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"6.6.0"}
[WARN ] 2019-02-06 16:24:58.845 [Converge PipelineAction::Create<main>] elasticsearch - You are using a deprecated config setting "document_type" set in elasticsearch. Deprecated settings will continue to work, but are scheduled for removal from logstash in the future. Document types are being deprecated in Elasticsearch 6.0, and removed entirely in 7.0. You should avoid this feature If you have any questions about this, please visit the #logstash channel on freenode irc. {:name=>"document_type", :plugin=><LogStash::Outputs::ElasticSearch hosts=>[//localhost:9200], doc_as_upsert=>true, action=>"update", index=>"blog", id=>"0d9b8021264f8db7c25bca76842096f28d088e42d8e84a573b39874bc2c38c19", document_id=>"%{id}", document_type=>"pro", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_0d66fa34-7e13-432a-9405-8084af971c1a", enable_metric=>true, charset=>"UTF-8">, workers=>1, manage_template=>true, template_name=>"logstash", template_overwrite=>false, script_type=>"inline", script_lang=>"painless", script_var_name=>"event", scripted_upsert=>false, retry_initial_interval=>2, retry_max_interval=>64, retry_on_conflict=>1, ilm_enabled=>false, ilm_rollover_alias=>"logstash", ilm_pattern=>"{now/d}-000001", ilm_policy=>"logstash-policy", ssl_certificate_verification=>true, sniffing=>false, sniffing_delay=>5, timeout=>60, pool_max=>1000, pool_max_per_route=>100, resurrect_delay=>5, validate_after_inactivity=>10000, http_compression=>false>}
[INFO ] 2019-02-06 16:24:58.963 [Converge PipelineAction::Create<main>] pipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[INFO ] 2019-02-06 16:25:00.378 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[WARN ] 2019-02-06 16:25:01.241 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"}
[INFO ] 2019-02-06 16:25:02.692 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>6}
[WARN ] 2019-02-06 16:25:02.705 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[INFO ] 2019-02-06 16:25:02.805 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[INFO ] 2019-02-06 16:25:02.881 [Ruby-0-Thread-5: :1] elasticsearch - Using mapping template from {:path=>nil}
[INFO ] 2019-02-06 16:25:03.044 [Ruby-0-Thread-5: :1] elasticsearch - Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"#timestamp"=>{"type"=>"date"}, "#version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[INFO ] 2019-02-06 16:25:03.726 [Converge PipelineAction::Create<main>] pipeline - Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x5e9c3d30 run>"}
[INFO ] 2019-02-06 16:25:03.868 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2019-02-06 16:25:05.345 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
Wed Feb 06 16:26:05 CET 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Feb 06 16:26:06 CET 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[ERROR] 2019-02-06 16:26:06.396 [Ruby-0-Thread-15: :1] jdbc - Unable to connect to database. Tried 1 times {:error_message=>"Java::JavaSql::SQLException: Access denied for user 'root'@'localhost'"}
{ 2014 rufus-scheduler intercepted an error:
2014 job:
2014 Rufus::Scheduler::CronJob "* * * * *" {}
2014 error:
2014 2014
2014 Sequel::DatabaseConnectionError
2014 Java::JavaSql::SQLException: Access denied for user 'root'@'localhost'
2014 com.mysql.jdbc.SQLError.createSQLException(com/mysql/jdbc/SQLError.java:965)
2014 com.mysql.jdbc.MysqlIO.checkErrorPacket(com/mysql/jdbc/MysqlIO.java:3973)
2014 com.mysql.jdbc.MysqlIO.checkErrorPacket(com/mysql/jdbc/MysqlIO.java:3909)
2014 com.mysql.jdbc.MysqlIO.checkErrorPacket(com/mysql/jdbc/MysqlIO.java:873)
2014 com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(com/mysql/jdbc/MysqlIO.java:1710)
2014 com.mysql.jdbc.MysqlIO.doHandshake(com/mysql/jdbc/MysqlIO.java:1226)
2014 com.mysql.jdbc.ConnectionImpl.coreConnect(com/mysql/jdbc/ConnectionImpl.java:2188)
2014 com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(com/mysql/jdbc/ConnectionImpl.java:2219)
2014 com.mysql.jdbc.ConnectionImpl.createNewIO(com/mysql/jdbc/ConnectionImpl.java:2014)
2014 com.mysql.jdbc.ConnectionImpl.<init>(com/mysql/jdbc/ConnectionImpl.java:776)
2014 com.mysql.jdbc.JDBC4Connection.<init>(com/mysql/jdbc/JDBC4Connection.java:47)
2014 java.lang.reflect.Constructor.newInstance(java/lang/reflect/Constructor.java:423)
The error is self-explanatory:
2014 Java::JavaSql::SQLException: Access denied for user 'root'@'localhost'
Kindly check the configuration credentials. On Linux you have to go to the /usr/share/logstash directory and then run the following command:
sudo bin/logstash -f /etc/logstash/conf.d/yourfilename.conf
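If the credentials themselves are the problem, one common fix (a sketch only; the user name, password, and grant scope are placeholder assumptions, not taken from the question) is to create a dedicated MySQL account for Logstash instead of relying on root:
mysql -u root -p
mysql> CREATE USER 'logstash'@'localhost' IDENTIFIED BY 'secret';
mysql> GRANT SELECT ON test.* TO 'logstash'@'localhost';
mysql> FLUSH PRIVILEGES;
Then point jdbc_user and jdbc_password in the jdbc input at that account.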

spring boot mysql application issues using docker compose

OS details
Mint version 19,
Code name : Tara,
PackageBase : Ubuntu Bionic
Cinnamon (64-bit)
I followed this URL to install MySQL 5.7 and Workbench 6.3.
I can check that the MySQL service is running:
xxxxxxxxx:~$ sudo netstat -nlpt | grep 3306
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 1547/mysqld
I also checked bind-address in the file mysqld.cnf under the directory /etc/mysql/mysql.conf.d/:
bind-address = 127.0.0.1
I have written a simple Spring application using MySQL.
Here are all the classes:
SpringBootDataJpaExampleApplication
@EnableJpaRepositories(basePackages = "com.springbootdev.examples.repository")
@SpringBootApplication
public class SpringBootDataJpaExampleApplication
{
    public static void main(String[] args) {
        SpringApplication.run(SpringBootDataJpaExampleApplication.class, args);
    }
}
UserController
@RestController
@RequestMapping("/api")
public class UserController {

    @Autowired
    private UserRepository userRepository;

    @GetMapping("/create")
    public List<User> users() {
        User users = new User();
        users.setId(new Long(1));
        users.setName("Sam");
        users.setCountry("Development");
        userRepository.save(users);
        return userRepository.findAll();
    }

    @GetMapping("/users")
    public List<User> findAll() {
        return userRepository.findAll();
    }
}
UserRepository
public interface UserRepository extends JpaRepository<User, Long>
{
}
User
@Entity
@Table(name = "user")
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
    private String name;
    private String country;

    //getters and setters
}
Partial pom.xml file. Note that I am using Spring Boot 1.5.9 and JDK 1.8:
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>1.5.9.RELEASE</version>
    <relativePath/>
</parent>
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <java.version>1.8</java.version>
</properties>
application.properties
spring.datasource.url = jdbc:mysql://mysql-standalone:3306/test
spring.datasource.username = testuser
spring.datasource.password = testpassword
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.jpa.properties.hibernate.dialect = org.hibernate.dialect.MySQL5Dialect
spring.jpa.hibernate.ddl-auto = create
Dockerfile
FROM openjdk:8
VOLUME /tmp
EXPOSE 8080
ADD target/spring-boot-app.jar spring-boot-app.jar
ENTRYPOINT ["java","-jar","spring-boot-app.jar"]
When I run this application locally from the IDE it works fine, and I can access:
http://localhost:8080/api/users/
and http://localhost:8080/api/create/
I build an image of my application using the command:
docker build . -t spring-boot-app
I can see the image has been built:
xxxxxxxxxx:$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
spring-boot-app latest 069e53a7c389 27 minutes ago 652MB
mysql 5.7 1b30b36ae96a 8 days ago 372MB
Now I run the command to start the mysql container:
docker run --name mysql-standalone -e MYSQL_ROOT_PASSWORD=password -e MYSQL_DATABASE=test -e MYSQL_USER=testuser -e MYSQL_PASSWORD=testpassword -d mysql:5.7
Then I connect to the mysql-standalone container from my application (spring-boot-app) (referencing this):
docker run --name spring-boot-app-container --link mysql-standalone:mysql -d spring-boot-app
With this, I can see my application working fine from docker using
http://(docker-container-ip):8080/api/users/ and http://(docker-container-ip):8080/api/create/
Here I wanted to make the same thing work with docker-compose.
I referenced this to install Docker Compose.
xxxxxxxxxx:~$ docker-compose --version
docker-compose version 1.22.0, build f46880fe
Then I created file docker-compose.yml in my project directory.
version: '3'
services:
  mysql-docker-container:
    image: mysql:5.7
    environment:
      - MYSQL_ROOT_PASSWORD=adminpassword
      - MYSQL_DATABASE=test
      - MYSQL_USER=testuser
      - MYSQL_PASSWORD=testpassword
    volumes:
      - /data/mysql
  spring-boot-app-container:
    image: spring-boot-app
    build:
      context: .
      dockerfile: Dockerfile
    depends_on:
      - mysql-docker-container
    ports:
      - 8087:8080
    volumes:
      - /data/spring-boot-app
I changed one line in my application.properties and commented out the earlier datasource URL, as below.
spring.datasource.url = jdbc:mysql://mysql-docker-container:3306/test
#spring.datasource.url = jdbc:mysql://mysql-standalone:3306/test
I did a clean install so a new target jar was created, and I also deleted the old images and containers from Docker.
Then I ran the command below:
docker-compose up
This is what I see in logs
Creating spring-boot-data-jpa-mysql-docker-no-composer-master_mysql-docker-container_1 ... done
Creating spring-boot-data-jpa-mysql-docker-no-composer-master_spring-boot-app-container_1 ... done
Attaching to spring-boot-data-jpa-mysql-docker-no-composer-master_mysql-docker-container_1, spring-boot-data-jpa-mysql-docker-no-composer-master_spring-boot-app-container_1
spring-boot-app-container_1 |
spring-boot-app-container_1 | . ____ _ __ _ _
spring-boot-app-container_1 | /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
spring-boot-app-container_1 | ( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
spring-boot-app-container_1 | \\/ ___)| |_)| | | | | || (_| | ) ) ) )
spring-boot-app-container_1 | ' |____| .__|_| |_|_| |_\__, | / / / /
spring-boot-app-container_1 | =========|_|==============|___/=/_/_/_/
spring-boot-app-container_1 | :: Spring Boot :: (v1.5.9.RELEASE)
spring-boot-app-container_1 |
spring-boot-app-container_1 | 2018-10-26 01:24:48.748 INFO 1 --- [ main] .s.e.SpringBootDataJpaExampleApplication : Starting SpringBootDataJpaExampleApplication v0.0.1-SNAPSHOT on 582c035536e4 with PID 1 (/spring-boot-app.jar started by root in /)
spring-boot-app-container_1 | 2018-10-26 01:24:48.752 INFO 1 --- [ main] .s.e.SpringBootDataJpaExampleApplication : No active profile set, falling back to default profiles: default
spring-boot-app-container_1 | 2018-10-26 01:24:48.875 INFO 1 --- [ main] ationConfigEmbeddedWebApplicationContext : Refreshing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext#28c97a5: startup date [Fri Oct 26 01:24:48 UTC 2018]; root of context hierarchy
spring-boot-app-container_1 | 2018-10-26 01:24:51.022 INFO 1 --- [ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat initialized with port(s): 8080 (http)
spring-boot-app-container_1 | 2018-10-26 01:24:51.063 INFO 1 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
spring-boot-app-container_1 | 2018-10-26 01:24:51.065 INFO 1 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet Engine: Apache Tomcat/8.5.23
spring-boot-app-container_1 | 2018-10-26 01:24:51.218 INFO 1 --- [ost-startStop-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
spring-boot-app-container_1 | 2018-10-26 01:24:51.218 INFO 1 --- [ost-startStop-1] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 2399 ms
spring-boot-app-container_1 | 2018-10-26 01:24:51.335 INFO 1 --- [ost-startStop-1] o.s.b.w.servlet.ServletRegistrationBean : Mapping servlet: 'dispatcherServlet' to [/]
spring-boot-app-container_1 | 2018-10-26 01:24:51.340 INFO 1 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean : Mapping filter: 'characterEncodingFilter' to: [/*]
spring-boot-app-container_1 | 2018-10-26 01:24:51.341 INFO 1 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean : Mapping filter: 'hiddenHttpMethodFilter' to: [/*]
spring-boot-app-container_1 | 2018-10-26 01:24:51.341 INFO 1 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean : Mapping filter: 'httpPutFormContentFilter' to: [/*]
spring-boot-app-container_1 | 2018-10-26 01:24:51.341 INFO 1 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean : Mapping filter: 'requestContextFilter' to: [/*]
spring-boot-app-container_1 | 2018-10-26 01:24:51.895 ERROR 1 --- [ main] o.a.tomcat.jdbc.pool.ConnectionPool : Unable to create initial connections of pool.
spring-boot-app-container_1 |
spring-boot-app-container_1 | com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
spring-boot-app-container_1 |
spring-boot-app-container_1 | The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
spring-boot-app-container_1 | at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_181]
How to fix this error:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
I can see a lot of answers on this forum for similar error messages. I tried a bunch of those options/answers, but they didn't work.
No answer talks about this combination (Linux + Spring Boot + MySQL + Docker Compose).
Note: This has worked fine without docker-compose, as already mentioned in the description above. Am I making any mistake in the docker-compose file or the application properties file?
I did see a lot of people posting about adding the Hikari dependency to pom.xml if you are using a Spring Boot version < 2.0:
<!-- Spring Boot Data 2.0 includes HikariCP by default -->
<!-- <dependency>
<groupId>com.zaxxer</groupId>
<artifactId>HikariCP</artifactId>
<version>3.1.0</version>
</dependency> -->
With that in mind, I kept the same application but changed my pom.xml as below:
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.0.6.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
Then I followed exactly the same steps as described above. This time I saw a somewhat clearer error:
spring-boot-app-container_1 | 2018-10-27 18:51:47.259 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
spring-boot-app-container_1 | 2018-10-27 18:51:48.464 ERROR 1 --- [ main] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Exception during pool initialization.
spring-boot-app-container_1 |
spring-boot-app-container_1 | com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
spring-boot-app-container_1 |
spring-boot-app-container_1 | The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
That confirms Spring Boot 2.0 uses the Hikari datasource by default.
Now coming back to how I solved the issue: I changed the connection string in application.properties as below:
spring.datasource.url = jdbc:mysql://mysql-docker-container:3306/test?autoReconnect=true&failOverReadOnly=false&maxReconnects=10&useSSL=false
instead of the earlier:
spring.datasource.url = jdbc:mysql://mysql-docker-container:3306/test
The answer was simple. The change below worked for me in Spring Boot 2.0 as well as Spring Boot 1.5.9 (add this to your connection string):
?autoReconnect=true&failOverReadOnly=false&maxReconnects=10
Some handy commands:
Once the containers are up, you can check their IP addresses using the command:
docker inspect -f '{{.Name}} - {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -aq)
========================================
UPDATE: some additional handy information...
The following driver class is DEPRECATED:
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
This needs to be replaced by
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
To fix the HikariCP pool initialization issue/exception, set HikariCP's initializationFailTimeout property to 0 (zero) or a negative number.
# HikariCP properties
spring.datasource.hikari.initialization-fail-timeout=0
This property controls whether the pool will "fail fast" if the pool cannot be seeded with an initial connection successfully. Any positive number is taken to be the number of milliseconds to attempt to acquire an initial connection; the application thread will be blocked during this period. If a connection cannot be acquired before this timeout occurs, an exception will be thrown. This timeout is applied after the connectionTimeout period. If the value is zero (0), HikariCP will attempt to obtain and validate a connection. If a connection is obtained, but fails validation, an exception will be thrown and the pool not started. However, if a connection cannot be obtained, the pool will start, but later efforts to obtain a connection may fail. A value less than zero will bypass any initial connection attempt, and the pool will start immediately while trying to obtain connections in the background. Consequently, later efforts to obtain a connection may fail. Default: 1
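One more note on why the retry parameters help (this is my assumption about the root cause, not something stated above): with compose file version 3, depends_on only controls the start order of the containers; it does not wait until MySQL is actually ready to accept connections, so the app's first connection attempt can race ahead of the database. Besides the JDBC retry options, a minimal compose-level fallback is to let Docker restart the app container until the connection succeeds (a sketch of the service fragment, matching the docker-compose.yml above):
  spring-boot-app-container:
    image: spring-boot-app
    restart: on-failure   # re-run the app until MySQL is ready to accept connections
    depends_on:
      - mysql-docker-container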

ERROR 1066: Unable to open iterator for alias - PIG SCRIPT

I have been facing this issue for a long time and have not been able to solve it. I need some expert advice.
I am trying to load a sample tweets JSON file.
sample.json:
{"filter_level":"low","retweeted":false,"in_reply_to_screen_name":"FilmFan","truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":689085590822891521,"in_reply_to_user_id_str":"6048122","timestamp_ms":"1453125782100","in_reply_to_status_id":null,"created_at":"Mon Jan 18 14:03:02 +0000 2016","favorite_count":0,"place":null,"coordinates":null,"text":"#filmfan hey its time for you guys follow #acadgild To #AchieveMore and participate in contest Win Rs.500 worth vouchers","contributors":null,"geo":null,"entities":{"symbols":[],"urls":[],"hashtags":[{"text":"AchieveMore","indices":[56,68]}],"user_mentions":[{"id":6048122,"name":"Tanya","indices":[0,8],"screen_name":"FilmFan","id_str":"6048122"},{"id":2649945906,"name":"ACADGILD","indices":[42,51],"screen_name":"acadgild","id_str":"2649945906"}]},"is_quote_status":false,"source":"<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck<\/a>","favorited":false,"in_reply_to_user_id":6048122,"retweet_count":0,"id_str":"689085590822891521","user":{"location":"India ","default_profile":false,"profile_background_tile":false,"statuses_count":86548,"lang":"en","profile_link_color":"94D487","profile_banner_url":"https://pbs.twimg.com/profile_banners/197865769/1436198000","id":197865769,"following":null,"protected":false,"favourites_count":1002,"profile_text_color":"000000","verified":false,"description":"Proud Indian, Digital Marketing Consultant,Traveler, Foodie, Adventurer, Data Architect, Movie Lover, Namo Fan","contributors_enabled":false,"profile_sidebar_border_color":"000000","name":"Bahubali","profile_background_color":"000000","created_at":"Sat Oct 02 17:41:02 +0000 2010","default_profile_image":false,"followers_count":4467,"profile_image_url_https":"https://pbs.twimg.com/profile_images/664486535040000000/GOjDUiuK_normal.jpg","geo_enabled":true,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","follow_request_sent":null,"url":null,"utc_offset":19800,"time_zone":"Chennai","notifications":null,"profile_use_background_image":false,"friends_count":810,"profile_sidebar_fill_color":"000000","screen_name":"Ashok_Uppuluri","id_str":"197865769","profile_image_url":"http://pbs.twimg.com/profile_images/664486535040000000/GOjDUiuK_normal.jpg","listed_count":50,"is_translator":false}}
I have tried to load this JSON file using Elephant Bird.
Script:
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-2.2.3.jar
REGISTER guava-11.0.2.jar
REGISTER avro-1.7.7.jar
REGISTER piggybank-0.12.0.jar
twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();
B = foreach twitter generate (chararray)$0#'created_at' as created_at,(chararray)$0#'id' as id,(chararray)$0#'id_str' as id_str,(chararray)$0#'text' as text,(chararray)$0#'source' as source,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') as entities,(boolean)$0#'favorited' as favorited;
describe B;
Output:
B: {created_at: chararray,id: chararray,id_str: chararray,text: chararray,source: chararray,entities: map[chararray],favorited: boolean}
But when I tried to DUMP B, the following error occurred:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B
I am providing the complete logs here.
2016-09-11 14:07:57,184 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-09-11 14:07:57,184 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-09-11 14:07:57,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,194 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-09-11 14:07:57,194 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2016-09-11 14:07:57,199 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: /tmp/1473583077199-0
2016-09-11 14:07:57,206 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-09-11 14:07:57,207 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,208 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2016-09-11 14:07:57,211 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-09-11 14:07:57,211 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-09-11 14:07:57,212 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-09-11 14:07:57,216 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local360376249_0009
2016-09-11 14:07:57,267 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
2016-09-11 14:07:57,267 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2016-09-11 14:07:57,270 [Thread-214] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-09-11 14:07:57,270 [Thread-214] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-09-11 14:07:57,270 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2016-09-11 14:07:57,271 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2016-09-11 14:07:57,272 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local360376249_0009_m_000000_0
2016-09-11 14:07:57,277 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-09-11 14:07:57,277 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-09-11 14:07:57,277 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2016-09-11 14:07:57,278 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1 Total Length = 2416 Input split[0]: Length = 2416 ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit Locations: -----------------------
2016-09-11 14:07:57,282 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/root/PIG/PIG/sample.json:0+2416
2016-09-11 14:07:57,282 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-09-11 14:07:57,282 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-09-11 14:07:57,288 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-09-11 14:07:57,290 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: twitter[20,10],B[21,4] C: R:
2016-09-11 14:07:57,291 [Thread-214] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2016-09-11 14:07:57,296 [Thread-214] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local360376249_0009
java.lang.Exception: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
    at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(PigCounterHelper.java:55)
    at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(LzoBaseLoadFunc.java:70)
    at com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:130)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local360376249_0009
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases B,twitter
2016-09-11 14:07:57,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: twitter[20,10],B[21,4] C: R:
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-09-11 14:07:57,468 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local360376249_0009 has failed! Stop running all dependent jobs
2016-09-11 14:07:57,468 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-09-11 14:07:57,469 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,469 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-09-11 14:07:57,469 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2016-09-11 14:07:57,470 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion      PigVersion         UserId  StartedAt            FinishedAt           Features
2.7.1.2.3.4.7-4    0.15.0.2.3.4.7-4   root    2016-09-11 14:07:57  2016-09-11 14:07:57  UNKNOWN
Failed!
Failed Jobs:
JobId                    Alias      Feature   Message              Outputs
job_local360376249_0009  B,twitter  MAP_ONLY  Message: Job failed!  file:/tmp/temp252944192/tmp-470484503,
Input(s):
Failed to read data from "file:///root/PIG/PIG/sample.json"
Output(s):
Failed to produce result in "file:/tmp/temp252944192/tmp-470484503"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local360376249_0009
Please also give some clarification on how to use the jar files and which versions to use; I am confused about which version to pick. Some say use Elephant Bird, and some say use Avro, but neither of them is working for me.
Please help.
Mohan.V
I got it on my own.
It was a jar version issue: the elephant-bird 2.2.3 jar was built against the Hadoop 1 API, where org.apache.hadoop.mapreduce.Counter was a class, while on Hadoop 2 it is an interface, hence the IncompatibleClassChangeError. Upgrading to the 4.1 jars fixed it.
Script:
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-pig-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.1.jar
And it worked fine.
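One caveat (an assumption based on elephant-bird's dependency on json-simple, not something stated above, though the next question below illustrates it): the 4.1 JsonLoader still parses JSON with json-simple under the hood, so if you hit a ClassNotFoundException for org.json.simple.parser.ParseException, keep that jar registered alongside the three above:
REGISTER json-simple-1.1.1.jar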

Apache Pig error while dumping JSON data

I have a JSON file and want to load it using Apache Pig.
I am using the built-in JsonLoader to load the JSON data. Below is the sample data:
cat jsondata1.json
{ "response": { "id": 10123, "thread": "Sloths", "comments": ["Sloths are adorable So chill"] }, "response_time": 0.425 }
{ "response": { "id": 13828, "thread": "Bigfoot", "comments": ["hello world"] } , "response_time": 0.517 }
Here I am loading the JSON data using the built-in JsonLoader. Loading gives no error, but dumping the data produces the following error:
grunt> a = load '/home/cloudera/jsondata1.json' using JsonLoader('response:tuple (id:int, thread:chararray, comments:bag {tuple(comment:chararray)}), response_time:double');
grunt> dump a;
2016-04-17 01:11:13,286 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/home/cloudera/jsondata1.json:0+229
2016-04-17 01:11:13,287 [pool-4-thread-1] WARN org.apache.hadoop.conf.Configuration - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2016-04-17 01:11:13,311 [pool-4-thread-1] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-04-17 01:11:13,321 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: a[5,4] C: R:
2016-04-17 01:11:13,349 [Thread-16] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2016-04-17 01:11:13,351 [Thread-16] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local801054416_0004
java.lang.Exception: org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not numeric, can not use numeric value accessors
at [Source: java.io.ByteArrayInputStream@2484de3c; line: 1, column: 120]
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not numeric, can not use numeric value accessors
at [Source: java.io.ByteArrayInputStream@2484de3c; line: 1, column: 120]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonNumericParserBase._parseNumericValue(JsonNumericParserBase.java:399)
at org.codehaus.jackson.impl.JsonNumericParserBase.getDoubleValue(JsonNumericParserBase.java:311)
at org.apache.pig.builtin.JsonLoader.readField(JsonLoader.java:203)
at org.apache.pig.builtin.JsonLoader.getNext(JsonLoader.java:157)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local801054416_0004
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[5,4] C: R:
2016-04-17 01:11:18,059 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-04-17 01:11:18,059 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local801054416_0004 has failed! Stop running all dependent jobs
2016-04-17 01:11:18,059 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-04-17 01:11:18,059 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.7.0 0.11.0-cdh4.7.0 cloudera 2016-04-17 01:11:12 2016-04-17 01:11:18 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local801054416_0004 a MAP_ONLY Message: Job failed! file:/tmp/temp-1766116741/tmp1151698221,
Input(s):
Failed to read data from "/home/cloudera/jsondata1.json"
Output(s):
Failed to produce result in "file:/tmp/temp-1766116741/tmp1151698221"
Job DAG:
job_local801054416_0004
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-04-17 01:11:18,061 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias a
Details at logfile: /home/cloudera/pig_1460877001124.log
I am not able to find the issue. How can I define the correct schema for the above JSON data?
Try this:
comments:{(chararray)}
because this version:
comments:bag {tuple(comment:chararray)}
fits this JSON schema:
"comments": [{comment:"hello world"}]
whereas you have simple string values, not nested documents:
"comments": ["hello world"]

Error processing complex Twitter JSON object with Pig JsonLoader() from the elephant-bird jars

I wanted to process a Twitter JSON object with Pig using the elephant-bird jars, for which I wrote the Pig script below.
REGISTER '/usr/lib/pig/lib/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/usr/lib/pig/lib/elephant-bird-pig-4.1.jar';
A = LOAD '/user/flume/tweets/data.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
B = FOREACH A GENERATE myMap#'id' AS ID,myMap#'created_at' AS createdAT;
DUMP B;
which gave me the error below:
2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1439883208520_0177
2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B
2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,4],B[4,4] C: R:
2015-08-25 11:06:34,303 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-08-25 11:06:34,303 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1439883208520_0177]
2015-08-25 11:07:06,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2015-08-25 11:07:06,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1439883208520_0177]
2015-08-25 11:07:09,458 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-08-25 11:07:09,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1439883208520_0177 has failed! Stop running all dependent jobs
2015-08-25 11:07:09,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-08-25 11:07:09,667 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://trinityhadoopmaster.com:8188/ws/v1/timeline/
2015-08-25 11:07:09,668 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at trinityhadoopmaster.com/192.168.1.135:8032
2015-08-25 11:07:09,678 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2015-08-25 11:07:09,779 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
2015-08-25 11:07:09,779 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2015-08-25 11:07:09,780 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0 0.14.0 hdfs 2015-08-25 11:06:33 2015-08-25 11:07:09 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1439883208520_0177 A,B MAP_ONLY Message: Job failed! hdfs://trinityhadoopmaster.com:9000/tmp/temp1554332510/tmp835744559,
Input(s):
Failed to read data from "hdfs://trinityhadoopmaster.com:9000/user/flume/tweets/data.json"
Output(s):
Failed to produce result in "hdfs://trinityhadoopmaster.com:9000/tmp/temp1554332510/tmp835744559"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1439883208520_0177
2015-08-25 11:07:09,780 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-08-25 11:07:09,787 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B. Backend error : java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
Details at logfile: /tmp/pig-err.log
grunt>
I have no clue how to approach this. Can anyone help me?
The backend error is java.lang.ClassNotFoundException: org.json.simple.parser.ParseException, which means the json-simple jar is missing from the job's classpath. Register it along with the elephant-bird jars:
REGISTER '/tmp/elephant-bird-core-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/google-collections-1.0.jar';
REGISTER '/tmp/json-simple-1.1.jar';
It works.