sqoop export failed after successful map - mysql

1. sqoop export --connect jdbc:mysql://localhost:3306/hduser_db --username hduser --password hduser --table export --export-dir /user/hive/warehouse/three --fields-terminated-by ','
17/09/13 14:10:45 INFO mapreduce.Job: map 0% reduce 0%
17/09/13 14:10:50 INFO mapreduce.Job: map 100% reduce 0%
17/09/13 14:10:51 INFO mapreduce.Job: Job job_1505199140014_0033 failed with state FAILED due to: Task failed task_1505199140014_0033_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
2. 17/09/13 14:10:51 INFO mapreduce.Job: Counters: 8
Job Counters
Failed map tasks=1
Launched map tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2947
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=2947
Total vcore-milliseconds taken by all map tasks=2947
Total megabyte-milliseconds taken by all map tasks=3017728
17/09/13 14:10:51 WARN mapreduce.Counters: Group FileSystemCounters is deprecated
17/09/13 14:10:51 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 14.8875 s
17/09/13 14:10:51 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$C
17/09/13 14:10:51 INFO mapreduce.ExportJobBase: Exported 0 records.
17/09/13 14:10:51 ERROR tool.ExportTool: Error during export:
Export job failed!
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

While running an export command, the following points have to be taken care of:
The data types and column names of the source (the HDFS data) and the destination (the table in the RDBMS) should match.
All column names should be specified in the --columns parameter.
Eg:
sqoop export --connect jdbc:mysql://localhost:3306/hduser_db
--username hduser
--password hduser
--table export
--export-dir /user/hive/warehouse/three
--fields-terminated-by ','
--columns "column1,column2,...." ;

Related

I am getting an exception when importing MySQL data to HDFS

I am trying to import MySQL data into HDFS, but I am getting an exception.
I have a table (products) in MySQL, and I am using the following command to import data into HDFS:
bin/sqoop-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table products --target-dir /user/nitin/products
I am getting the following exception:
Error: java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.sql.SQLException: Unknown type '246 in column 2 of 3 in binary-encoded result set.
at com.mysql.jdbc.MysqlIO.extractNativeEncodedColumn(MysqlIO.java:3710)
at com.mysql.jdbc.MysqlIO.unpackBinaryResultSetRow(MysqlIO.java:3620)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1282)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:335)
at com.mysql.jdbc.RowDataDynamic.<init>(RowDataDynamic.java:68)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:416)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:1899)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1347)
at com.mysql.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:1393)
at com.mysql.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:958)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1705)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
... 12 more
I have also used this command to import data into HDFS:
bin/sqoop-import --connect jdbc:mysql://localhost:3306/test?zeroDateTimeBehavior=convertToNull --username root --password root --table products --target-dir /user/nitin/product
MapReduce job failed.
It's because of a data type conversion issue.
Try using the --map-column-java option to define the column data type mapping explicitly.
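MySQL type code 246 is NEWDECIMAL, so the failing column is most likely a DECIMAL. A hedged sketch of such a mapping (the column name price is only an illustration; substitute the actual second column of the products table):
bin/sqoop-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table products --map-column-java price=String --target-dir /user/nitin/products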

ERROR exec.DDLTask : java.lang.NoSuchMethodError:

I was importing data from MySQL to Hive using Sqoop:
sqoop import --connect jdbc:mysql://localhost:3306/DATASET -username root -P --table MATCHES --hive-import
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
18/11/25 11:42:58 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com.fasterxml.jackson.databind.ObjectMapper.readerFor(Ljava/lang/Class;)Lcom/fasterxml/jackson/databind/ObjectReader;
Do you have the jackson-databind jar in your Hive lib directory? Check it once.
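A quick way to check, assuming a standard Hive layout under $HIVE_HOME (adjust the path to your installation):
ls $HIVE_HOME/lib | grep -i jackson
ObjectMapper.readerFor(Class) was added in jackson-databind 2.6, so if the jar found there is older or missing, placing a newer jackson-databind together with matching jackson-core and jackson-annotations jars in the Hive lib directory should resolve the NoSuchMethodError.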

SQOOP export CSV to MySQL fails

I have CSV file in HDFS with lines like:
"2015-12-01","Augusta","46728.0","1"
I am trying to export this file to a MySQL table:
CREATE TABLE test.events_top10(
dt VARCHAR(255),
name VARCHAR(255),
summary VARCHAR(255),
row_number VARCHAR(255)
);
With the command:
sqoop export --table events_top10 --export-dir /user/hive/warehouse/result --escaped-by \" --connect ...
This command fails with error:
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: Can't parse input data: '2015-12-02,Ashburn,43040.0,9'
at events_top10.__loadFromFields(events_top10.java:335)
at events_top10.parse(events_top10.java:268)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:834)
at events_top10.__loadFromFields(events_top10.java:320)
... 12 more
If I do not use the --escaped-by \" parameter, then the MySQL table contains rows like this:
"2015-12-01" | "Augusta" | "46728.0" | "1"
Could you please explain how to export a CSV file to a MySQL table without the double quotes?
I have to use both --escaped-by '\\' and --enclosed-by '\"'.
So the correct command is:
sqoop export --table events_top10 --export-dir /user/hive/warehouse/result --escaped-by '\\' --enclosed-by '\"' --connect ...
For more information, please see the official documentation.
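To confirm that the quotes are really stripped after the export, the target table can be spot-checked from the MySQL client (a usage sketch; the user name is a placeholder):
mysql -u <user> -p -e "SELECT * FROM test.events_top10 LIMIT 3;"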

using sqoop to update hive table

I am trying to sqoop data out of a MySQL database where I have a table with both a primary key and a last_updated field. I am essentially trying to get all records that were recently updated and overwrite the current records in the Hive warehouse.
I have tried the following command:
sqoop job --create trainingDataUpdate -- import \
--connect jdbc:mysql://localhost:3306/analytics \
--username user \
--password-file /sqooproot.pwd \
--incremental lastmodified \
--check-column last_updated \
--last-value '2015-02-13 11:08:18' \
--table trainingDataFinal \
--merge-key id \
--direct --hive-import \
--hive-table analytics.trainingDataFinal \
--null-string '\\N' \
--null-non-string '\\N' \
--map-column-hive last_updated=TIMESTAMP
and I get the following error:
15/02/13 14:07:41 INFO hive.HiveImport: FAILED: SemanticException Line 2:17 Invalid path ''hdfs://dev.cluster.com:8020/user/hdfs/_sqoop/13140640000000520_32226_hwhjobdev_cluster.com_trainingDataFinal'': No files matching path hdfs://dev.cluster.com:8020/user/hdfs/_sqoop/13140640000000520_32226_dev.cluster.com_trainingDataFinal
15/02/13 14:07:42 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 64
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:385)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:335)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
I thought that by including --merge-key it would be able to overwrite the old records with the new records. Does anyone know if this is possible in Sqoop?
I don't think Sqoop can do it.
--merge-key is only used by sqoop-merge, not import.
Also see http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1764421
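If the merge behaviour is still needed, one possible workaround is to split it into two steps: an incremental import into a plain HDFS directory, followed by an explicit sqoop merge. A sketch under those assumptions (the delta and merged directories and the codegen jar path are placeholders):
sqoop import --connect jdbc:mysql://localhost:3306/analytics --username user --password-file /sqooproot.pwd \
--table trainingDataFinal --incremental lastmodified --check-column last_updated \
--last-value '2015-02-13 11:08:18' --target-dir /user/hdfs/trainingDataFinal_delta
sqoop merge --new-data /user/hdfs/trainingDataFinal_delta --onto /user/hdfs/trainingDataFinal \
--target-dir /user/hdfs/trainingDataFinal_merged --merge-key id \
--jar-file /path/to/trainingDataFinal.jar --class-name trainingDataFinal
The merged directory can then be used as the data behind the Hive table (for example via LOAD DATA) instead of relying on --hive-import.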

Simple YARN benchmark TestDFSIO fails

I've set up Hadoop on a two-node cluster. The first node "namenode" runs the following daemons:
hadoop#namenode:~$ jps
2916 SecondaryNameNode
2692 NameNode
3159 NodeManager
5834 Jps
2771 DataNode
3076 ResourceManager
The second node "datanode" runs the following daemons:
hadoop#datanode:~$ jps
2559 Jps
2087 DataNode
2198 NodeManager
In the /etc/hosts file I added on BOTH machines:
10.240.40.246 namenode
10.240.172.201 datanode
which are the corresponding IPs, and I checked that I can SSH to any other machine from each machine. Now, I wanted to test my Hadoop installation by running a sample MapReduce benchmark job:
hadoop#namenode:~$ hadoop jar /opt/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
However, this job fails:
14/02/17 22:22:53 INFO fs.TestDFSIO: TestDFSIO.1.7
14/02/17 22:22:53 INFO fs.TestDFSIO: nrFiles = 20
14/02/17 22:22:53 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
14/02/17 22:22:53 INFO fs.TestDFSIO: bufferSize = 1000000
14/02/17 22:22:53 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
14/02/17 22:22:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/02/17 22:22:55 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 20 files
14/02/17 22:22:56 INFO fs.TestDFSIO: created control files for: 20 files
14/02/17 22:22:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/17 22:22:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/17 22:22:57 INFO mapred.FileInputFormat: Total input paths to process : 20
14/02/17 22:22:57 INFO mapreduce.JobSubmitter: number of splits:20
14/02/17 22:22:57 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/17 22:22:57 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/17 22:22:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1392675199090_0001
14/02/17 22:22:59 INFO impl.YarnClientImpl: Submitted application application_1392675199090_0001 to ResourceManager at /0.0.0.0:8032
14/02/17 22:22:59 INFO mapreduce.Job: The url to track the job: http://namenode.c.forward-camera-473.internal:8088/proxy/application_1392675199090_0001/
14/02/17 22:22:59 INFO mapreduce.Job: Running job: job_1392675199090_0001
14/02/17 22:23:10 INFO mapreduce.Job: Job job_1392675199090_0001 running in uber mode : false
14/02/17 22:23:10 INFO mapreduce.Job: map 0% reduce 0%
14/02/17 22:23:42 INFO mapreduce.Job: map 20% reduce 0%
14/02/17 22:23:43 INFO mapreduce.Job: map 30% reduce 0%
14/02/17 22:24:14 INFO mapreduce.Job: map 60% reduce 0%
14/02/17 22:24:41 INFO mapreduce.Job: map 60% reduce 20%
14/02/17 22:24:45 INFO mapreduce.Job: map 85% reduce 20%
14/02/17 22:24:48 INFO mapreduce.Job: map 85% reduce 28%
14/02/17 22:24:59 INFO mapreduce.Job: map 90% reduce 28%
14/02/17 22:25:00 INFO mapreduce.Job: map 90% reduce 30%
14/02/17 22:25:02 INFO mapreduce.Job: map 100% reduce 30%
14/02/17 22:25:03 INFO mapreduce.Job: map 100% reduce 100%
14/02/17 22:25:16 INFO mapreduce.Job: map 0% reduce 0%
14/02/17 22:25:16 INFO mapreduce.Job: Job job_1392675199090_0001 failed with state FAILED due to: Application application_1392675199090_0001 failed 2 times due to AM Container for appattempt_1392675199090_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application.
14/02/17 22:25:16 INFO mapreduce.Job: Counters: 0
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:443)
at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:425)
at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:755)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:650)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:115)
at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Having a look into the log file on the datanode machine, I find:
hadoop#datanode:/opt/hadoop-2.2.0/logs$ cat yarn-hadoop-nodemanager-datanode.log
...
2014-02-17 22:29:33,432 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
On my namenode I did:
hadoop#namenode:/opt/hadoop-2.2.0/logs$ cat yarn-hadoop-*log
2014-02-17 22:13:20,833 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: STARTUP_MSG:
...
2014-02-17 22:13:25,240 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class class org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config.
...
2014-02-17 22:13:25,505 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: NodeManager configured with 8 G physical memory allocated to containers, which is more than 80% of the total physical memory available (3.6 G). Thrashing might happen.
...
2014-02-17 22:24:48,779 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1392675199090_0001_01_000023
2014-02-17 22:24:48,779 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1392675199090_0001_01_000024
...
2014-02-17 22:25:15,733 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1392675199090_0001_02_000001 is : 1
2014-02-17 22:25:15,734 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1392675199090_0001_02_000001 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
...
2014-02-17 22:25:15,736 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
...
2014-02-17 22:25:15,751 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1392675199090_0001 CONTAINERID=container_1392675199090_0001_02_000001
...
2014-02-17 22:13:19,150 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: STARTUP_MSG:
...
2014-02-17 22:25:15,837 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1392675199090_0001 failed 2 times due to AM Container for appattempt_1392675199090_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application. APPID=application_1392675199090_0001
However, I checked on the namenode machine that port 8031 is listening. I get:
hadoop#namenode:~$ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 namenode.c.forwar:36975 metadata.google.in:http TIME_WAIT
tcp 0 0 namenode.c.forwar:36969 metadata.google.in:http TIME_WAIT
tcp 0 0 namenode.c.forwar:40616 namenode.c.forwar:10001 TIME_WAIT
tcp 0 0 namenode.c.forwar:36974 metadata.google.in:http ESTABLISHED
tcp 0 0 namenode.c.forward:8031 namenode.c.forwar:41229 ESTABLISHED
tcp 0 352 namenode.c.forward-:ssh e178064245.adsl.a:64305 ESTABLISHED
tcp 0 0 namenode.c.forwar:41229 namenode.c.forward:8031 ESTABLISHED
tcp 0 0 namenode.c.forwar:40365 namenode.c.forwar:10001 ESTABLISHED
tcp 0 0 namenode.c.forwar:10001 namenode.c.forwar:40365 ESTABLISHED
tcp 0 0 namenode.c.forwar:10001 datanode:48786 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
unix 10 [ ] DGRAM 4604 /dev/log
unix 2 [ ] STREAM CONNECTED 10490
unix 2 [ ] STREAM CONNECTED 10488
unix 2 [ ] STREAM CONNECTED 10452
unix 2 [ ] STREAM CONNECTED 8452
unix 2 [ ] STREAM CONNECTED 7800
unix 2 [ ] STREAM CONNECTED 7797
unix 2 [ ] STREAM CONNECTED 6762
unix 2 [ ] STREAM CONNECTED 6702
unix 2 [ ] STREAM CONNECTED 6698
unix 2 [ ] STREAM CONNECTED 6208
unix 2 [ ] DGRAM 5750
unix 2 [ ] DGRAM 5737
unix 2 [ ] DGRAM 5734
unix 3 [ ] STREAM CONNECTED 5643
unix 3 [ ] STREAM CONNECTED 5642
unix 2 [ ] DGRAM 5640
unix 2 [ ] DGRAM 5192
unix 2 [ ] DGRAM 5171
unix 2 [ ] DGRAM 4889
unix 2 [ ] DGRAM 4723
unix 2 [ ] DGRAM 4663
unix 3 [ ] DGRAM 3132
unix 3 [ ] DGRAM 3131
So, what could be the problem here? In my opinion everything is set up fine. Why is my job failing then?
The log on the datanode says
Retrying connect to server: 0.0.0.0/0.0.0.0:8031
So it tries to connect to this port on the local machine, which is the datanode. However, the service runs on the namenode. Therefore, one has to add the following config lines to yarn-site.xml on the datanode:
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>namenode:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>namenode:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>namenode:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>namenode:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>namenode:8088</value>
</property>
where namenode is an alias in /etc/hosts for the machine that runs the resource manager daemon.
Also add the same properties in the yarn-site.xml file on the namenode to ensure that these services connect to the same ports.
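After editing yarn-site.xml, restart the YARN daemons so the new addresses take effect, then check that the nodemanager on the datanode registers with the resourcemanager. A sketch using the Hadoop 2.2 daemon scripts (run from $HADOOP_HOME/sbin; adjust paths to your installation):
# on the namenode
./yarn-daemon.sh stop resourcemanager && ./yarn-daemon.sh start resourcemanager
# on the datanode
./yarn-daemon.sh stop nodemanager && ./yarn-daemon.sh start nodemanager
# back on the namenode, the datanode should now show up as a registered node
yarn node -list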