I started learning Hadoop and am working through some practice on my own. Here is an issue I hit when I try to use Sqoop to import a MySQL table to HDFS:
sqoop import --connect jdbc:mysql://localhost/employees --username=root -P --table=dept_emp --warehouse-dir=dept_emp -where dept_no='d001' --m 1;
The dept_emp table has roughly 20k records.
The output is as follows:
2016-09-26 16:42:26,467 INFO [main] ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-26 16:42:27,470 INFO [main] ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
The "Already tried x time(s)" increased from 0 to 9, and then looped again from 0 to 9, hanging there now.
Can someone shed me some light?
Thank you very much.
Please correct the syntax:
--where "dept_no='d001'"
-m 1;
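Putting those fixes together, the full command would look something like this (a sketch using the same connection details as the question, not tested against your cluster):
sqoop import \
  --connect jdbc:mysql://localhost/employees \
  --username root -P \
  --table dept_emp \
  --warehouse-dir dept_emp \
  --where "dept_no='d001'" \
  -m 1
Also note that retries against 0.0.0.0:8032 usually mean the client has no usable ResourceManager address; if the corrected command still hangs, check that YARN is running and that yarn.resourcemanager.address is set in yarn-site.xml.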
I want to import data from MySQL into a remote Hive instance using Sqoop. I have installed Sqoop on a middleware machine. When I run this command:
sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://192.168.2.146:3306/fir --username root -P -m 1 --table beard_size_list --connect jdbc:hive2://192.168.2.141:10000/efir --username oracle -P -m 1 --hive-table lnd_beard_size_list --hive-import;
Is this command correct? Can I import data from remote MySQL to remote Hive this way?
When I run this command, it keeps trying to connect to the ResourceManager:
17/11/01 10:54:05 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.1.0-129
Enter password:
17/11/01 10:54:10 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/11/01 10:54:10 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/11/01 10:54:10 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/11/01 10:54:10 INFO manager.SqlManager: Using default fetchSize of 1000
17/11/01 10:54:10 INFO tool.CodeGenTool: Beginning code generation
17/11/01 10:54:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM beard_size_list AS t WHERE 1=0
17/11/01 10:54:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM beard_size_list AS t WHERE 1=0
17/11/01 10:54:11 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.1.0-129/hadoop-mapreduce
Note: /tmp/sqoop-oracle/compile/d93080265a09913fbfe9e06e92d314a3/beard_size_list.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/11/01 10:54:15 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-oracle/compile/d93080265a09913fbfe9e06e92d314a3/beard_size_list.jar
17/11/01 10:54:15 INFO mapreduce.ImportJobBase: Beginning import of beard_size_list
17/11/01 10:54:15 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/11/01 10:54:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM beard_size_list AS t WHERE 1=0
17/11/01 10:54:17 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/11/01 10:54:17 INFO client.RMProxy: Connecting to ResourceManager at hortonworksn2.com/192.168.2.191:8050
17/11/01 10:54:17 INFO client.AHSProxy: Connecting to Application History server at hortonworksn2.com/192.168.2.191:10200
17/11/01 10:54:19 INFO ipc.Client: Retrying connect to server: hortonworksn2.com/192.168.2.191:8050. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
17/11/01 10:54:20 INFO ipc.Client: Retrying connect to server: hortonworksn2.com/192.168.2.191:8050. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
17/11/01 10:54:21 INFO ipc.Client: Retrying connect to server: hortonworksn2.com/192.168.2.191:8050. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
17/11/01 10:54:22 INFO ipc.Client: Retrying connect to server: hortonworksn2.com/192.168.2.191:8050. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
17/11/01 10:54:23 INFO ipc.Client: Retrying connect to server: hortonworksn2.com/192.168.2.191:8050. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
The port it is trying to connect to is 8050, but the actual ResourceManager port is 8033. How can I fix this?
Try the command below:
sqoop import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://192.168.2.146:3306/fir --username root -P -m 1 --table beard_size_list
Also check that the property below is set correctly in yarn-site.xml:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>192.168.2.191:8033</value>
</property>
Why have you added the --connect option twice in your command? Try the code below:
sqoop import \
--driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://192.168.2.146:3306/fir \
--username root -P -m 1 \
--split-by beard_size_list_table_primary_key \
--table beard_size_list \
--target-dir /user/data/raw/beard_size_list \
--fields-terminated-by "," \
--hive-import \
--create-hive-table \
--hive-table dbschema.beard_size_list
Note:
--create-hive-table makes the job fail if the Hive table already exists. That works in this case; otherwise you would have to create a Hive external table and point it at the target-dir path.
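For that alternative, the external table would be created in Hive roughly like this (the column list is hypothetical, since the question does not show the schema of beard_size_list):
CREATE EXTERNAL TABLE dbschema.beard_size_list (
  id INT,             -- hypothetical columns; replace with the real schema
  beard_size STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/data/raw/beard_size_list';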
I am trying to run the command below in Cloudera and am getting a communications link failure error. I have tried restarting the mysqld service too, to no avail. Could someone please help?
Code and error:
[cloudera@quickstart ~]$ sqoop list-databases --connect "jdbc:mysql://quickstart.cloudera:3306" --username=retail_dba --password=cloudera
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/09/22 09:45:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.0
17/09/22 09:45:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/09/22 09:45:59 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/09/22 09:46:16 ERROR manager.CatalogQueryManager: Failed to list databases
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Download mysql-connector-java-5.1.21.jar and copy it into the Sqoop lib folder, then run the list-databases command again as follows:
sqoop list-databases \
--connect "jdbc:mysql://localhost:3306" \
--username=retail_dba \
--password=cloudera
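On the Cloudera quickstart VM the Sqoop lib folder is usually /usr/lib/sqoop/lib (the Accumulo warning in your output references /usr/lib/sqoop), so the copy step would look roughly like:
sudo cp mysql-connector-java-5.1.21.jar /usr/lib/sqoop/lib/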
Is there anything wrong with the command below? It's not working for me.
sqoop import-all-tables
--connect jdbc:mysql://localhost/retail_db --username=retail_dba
-- compression-codec=snappy
--as-parquetfile --hive-import -m 1
16/08/17 08:34:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/08/17 08:34:07 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/08/17 08:34:07 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/08/17 08:34:08 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/08/17 08:34:08 INFO tool.CodeGenTool: Beginning code generation
16/08/17 08:34:08 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
Can you post the complete error you are getting?
Does your error contain a "Class not found" exception? If yes, go to $HADOOP_HOME/etc/hadoop, find mapred-site.xml, and change/add the property mapreduce.framework.name with the value yarn. Save the file and restart Hadoop.
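In mapred-site.xml the entry looks like this (a minimal sketch; merge it into your existing <configuration> block):
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>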
I am using macOS for the Hadoop stack, with MySQL as its database. I am trying to execute a Sqoop import command:
sqoop import --connect jdbc:mysql://127.0.0.1/emp --table employee --username root --password reality --m 1 --target-dir /sqoop_import
But I am facing the issue below while executing it. Even in /etc/hosts, localhost is mapped to 127.0.0.1 (see the hosts file screenshot). I have tried pinging localhost and it works, but the "host is down" error still prevails. Please help.
2016-02-06 17:42:38,267 INFO [main] mapreduce.Job: Job job_1454152643692_0010 failed with state FAILED due to: Application application_1454152643692_0010 failed 2 times due to Error launching appattempt_1454152643692_0010_000002. Got exception: java.io.IOException: Failed on local exception: java.net.SocketException: Host is down; Host Details : local host is: "Mohits-MacBook-Pro.local/192.168.0.103"; destination host is: "192.168.0.105":38183;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy32.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Host is down
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:454)
at sun.nio.ch.Net.connect(Net.java:446)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
... 9 more
. Failing the application.
2016-02-06 17:42:38,304 INFO [main] mapreduce.Job: Counters: 0
2016-02-06 17:42:38,321 WARN [main] mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
2016-02-06 17:42:38,326 INFO [main] mapreduce.ImportJobBase: Transferred 0 bytes in 125.7138 seconds (0 bytes/sec)
2016-02-06 17:42:38,349 WARN [main] mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2016-02-06 17:42:38,350 INFO [main] mapreduce.ImportJobBase: Retrieved 0 records.
2016-02-06 17:42:38,351 ERROR [main] tool.ImportTool: Error during import: Import job failed!
I see that you are having network issues. The log above shows that the local host resolves to Mohits-MacBook-Pro.local/192.168.0.103 on your Unix system, while the destination it is trying to connect to is "192.168.0.105":38183. Please go to your Unix system, open the /etc/hosts file, and make sure localhost is mapped to 127.0.0.1.
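For a single-node setup, the relevant /etc/hosts entries would look something like this (hostname taken from the log above; adjust it to your machine — pinning the machine's own name to the loopback address keeps YARN from chasing stale LAN addresses):
127.0.0.1    localhost
127.0.0.1    Mohits-MacBook-Pro.local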
I am using hadoop-1.0.4 on Amazon EC2 with 3 Ubuntu 12.10 instances, 1 master and 2 slaves, installed directly under the ~ directory.
start-all.sh and stop-all.sh run OK, but when I run jps on the master or slaves, it prints nothing. Then I tested the Hadoop examples:
~/hadoop$ bin/hadoop jar hadoop-examples-1.0.4.jar pi 10 10000
It shows
Exception in thread "main" java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(File.java:1879)
at org.apache.hadoop.util.RunJar.main(RunJar.java:115)
However, I have already run chmod -R 777 on the tmp folders.
~/hadoop$ sudo bin/hadoop jar hadoop-examples-1.0.4.jar pi 10 10000
With sudo, it produces
13/05/12 03:58:11 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
Number of Maps = 10
Samples per Map = 10000
13/05/12 03:58:12 WARN fs.FileSystem: "54.235.101.85:50001" is a deprecated filesystem name. Use "hdfs://54.235.101.85:50001/" instead.
13/05/12 03:58:13 INFO ipc.Client: Retrying connect to server: hdmaster/54.235.101.85:50001. Already tried 0 time(s).
13/05/12 03:58:14 INFO ipc.Client: Retrying connect to server: hdmaster/54.235.101.85:50001. Already tried 1 time(s).
13/05/12 03:58:15 INFO ipc.Client: Retrying connect to server: hdmaster/54.235.101.85:50001. Already tried 2 time(s).
Then it failed to connect. So what is the problem? Should I use sudo to run the examples? Thanks a lot.
I think the problem is that 54.235.101.85 is a public IP address; on EC2 the instance's network interface only carries the private address, so daemons cannot bind to the public one. Run ifconfig on all the nodes to get a list of IP addresses and check for ones beginning with 10.x.x.x, 172.x.x.x, or 192.x.x.x. If you find any, modify your configuration files on all the nodes accordingly.
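For example, with hadoop-1.0.4, if ifconfig on the master showed a private address such as 10.0.0.1 (a hypothetical value), core-site.xml on every node would point the filesystem at that address instead:
<property>
  <name>fs.default.name</name>
  <value>hdfs://10.0.0.1:50001/</value>
</property>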