MySQL to HBase using Sqoop: Driver issue - mysql

I am new to Sqoop. I am trying to import data from MySQL to HBase, so I have to use the database connector (JDBC driver) for MySQL. The path to my connector file on the server is /usr/lib/sqoop2/lib/mysql-connector-java-5.1.6.jar. The database name is testhadoop and the table I am using is employee. The command I enter is:
root#server:~# sqoop import --connect jdbc:mysql//localhost/testhadoop --driver com.mysql.jdbc.Driver --username root --table mytable
After hitting the Enter key, I have to enter the root password, and then a long error message appears:
13/09/12 17:39:16 WARN sqoop.ConnFactory: Parameter --driver is set to an
explicit driver however appropriate connection manager is not being set
(via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
13/09/12 17:39:16 INFO manager.SqlManager: Using default fetchSize of 1000
13/09/12 17:39:16 INFO tool.CodeGenTool: Beginning code generation
13/09/12 17:39:16 ERROR manager.SqlManager:
Error executing statement: java.sql.SQLException:
No suitable driver found for jdbc:mysql//localhost/testhadoop
Please tell me how to get rid of this problem.

Based on the command line it seems that you are using Sqoop 1.x, whereas the JDBC driver is on the path for Sqoop2. I would recommend copying the jar file mysql-connector-java-5.1.6.jar to /usr/lib/sqoop/lib instead, so that it's available to Sqoop 1.
Also, I would strongly suggest dropping the parameter --driver, as it forces Sqoop to use the Generic JDBC Connector instead of the specialized MySQL connector.
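Note also that the connect string in the question is missing a colon: it should begin with jdbc:mysql:// rather than jdbc:mysql//, which is exactly what the "No suitable driver found" message is pointing at. As a rough sketch (assuming the jar has been copied to /usr/lib/sqoop/lib and the MySQL table really is named mytable), the corrected command would look something like:
sqoop import --connect jdbc:mysql://localhost/testhadoop --username root -P --table mytable
Here -P prompts for the password instead of putting it on the command line, which is what Sqoop's own warning recommends.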

Related

Sqoop functionality has been removed from DSE

I am new to Cassandra. I am trying to transfer my whole MySQL database to Cassandra using Sqoop. But after all the setup, when I execute the following command:
bin/dse sqoop import-all-tables -m 1 --connect jdbc:mysql://127.0.0.1:3306/ABCDatabase --username root --password root --cassandra-thrift-host localhost --cassandra-create-schema --direct
I received the following error:
Sqoop functionality has been removed from DSE.
It says that Sqoop functionality has been removed from DataStax Enterprise. If it has been removed, is there any other way to do this?
Thanks
You can use Spark to transfer data - it should be easy, something like:
val table = spark.read.jdbc(jdbcUrl, "table", connectionProperties)
table.write.format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "TBL", "keyspace" -> "KS"))
  .save()
Examples of JDBC URLs, options, etc. are described in Databricks' documentation, as they can differ between databases.

Import data from mysql into HDFS using Sqoop

I am using Hadoop-1.2.1 and Sqoop-1.4.6. I am using sqoop to import the table test from the database meshtree into HDFS using this command:
`sqoop import --connect jdbc:mysql://localhost/meshtree --username user --password password --table test`
But, it shows this error:
17/06/17 18:15:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/06/17 18:15:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/06/17 18:15:21 INFO tool.CodeGenTool: Beginning code generation
17/06/17 18:15:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
17/06/17 18:15:22 INFO orm.CompilationManager: HADOOP_HOME is /home/student/Installations/hadoop-1.2.1/libexec/..
Note: /tmp/sqoop-student/compile/6bab6efaa3dc60e67a50885b26c1d14b/test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/06/17 18:15:24 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-student/compile/6bab6efaa3dc60e67a50885b26c1d14b/test.java to /home/student/Installations/hadoop-1.2.1/./test.java
org.apache.commons.io.FileExistsException: Destination '/home/student/Installations/hadoop-1.2.1/./test.java' already exists
at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:2378)
at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:227)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:367)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
17/06/17 18:15:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-student/compile/6bab6efaa3dc60e67a50885b26c1d14b/test.jar
17/06/17 18:15:24 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/06/17 18:15:24 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/06/17 18:15:24 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/06/17 18:15:24 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/06/17 18:15:24 INFO mapreduce.ImportJobBase: Beginning import of test
17/06/17 18:15:27 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/home/student/Installations/hadoop-1.2.1/data/mapred/staging/student/.staging/job_201706171814_0001
17/06/17 18:15:27 ERROR security.UserGroupInformation: PriviledgedActionException as:student cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory test already exists
17/06/17 18:15:27 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory test already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:973)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:97)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
Is there any way to figure out this problem?
It’s important that you do not use the URL localhost if you intend to use Sqoop with a distributed Hadoop cluster. The connect string you supply will be used on TaskTracker nodes throughout your MapReduce cluster; if you specify the literal name localhost, each node will connect to a different database (or more likely, no database at all). Instead, you should use the full hostname or IP address of the database host that can be seen by all your remote nodes.
Please see the "Connecting to a Database Server" section of the Sqoop documentation for more information.
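For example, assuming the MySQL server is reachable from every node under the (hypothetical) hostname dbhost.example.com, the connect string would look like:
--connect jdbc:mysql://dbhost.example.com:3306/meshtree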
You don't have permissions, so contact your MySQL DBA to grant them to you.
Or you may do it yourself if you have admin access to MySQL.
grant all privileges on databasename.* to 'username'@'%' identified by 'password';
* - for all tables in the database
% - allow connections from any host
The above syntax grants permissions to a user on the MySQL server. In your case it would be:
grant all privileges on meshtree.test to 'root'@'localhost' identified by 'yourpassword';
You are importing without providing a target directory in HDFS. When no target directory is provided, Sqoop runs the import and creates a directory in HDFS named after your MySQL table.
So your command
sqoop import --connect jdbc:mysql://localhost/meshtree --username user --password password --table test
creates a directory with the name test in HDFS.
Just add a --target-dir option, as in the following:
sqoop import --connect jdbc:mysql://localhost/meshtree --username user --password password --table test --target-dir test1
Hopefully it works fine. For details, refer to the Sqoop import documentation and the related Sqoop guides.
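As an alternative (not part of the original answer), since the failure is "Output directory test already exists", you could also remove the leftover directory before re-running the import, assuming you no longer need its contents:
hadoop fs -rmr test
(-rmr is the Hadoop 1.x form of a recursive delete.) Sqoop 1.4.6 also supports --delete-target-dir on the import command, which drops the existing output directory before the import runs.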

Error in export sqoop command

I was using the export command in Sqoop and facing this error while exporting from HDFS to MySQL.
The command is:
sqoop export \
  --connect jdbc:mysql://localhost/property \
  --username root \
  --password root \
  --table xyz \
  --m 1 \
  --export-dir abc.csv
The error is:
16/08/30 23:11:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/30 23:11:34 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/08/30 23:11:34 INFO tool.CodeGenTool: Beginning code generation
16/08/30 23:11:34 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:848)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:736)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:759)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:269)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Add mysql-connector.jar to $SQOOP_HOME/lib.
As per Sqoop docs,
You can use Sqoop with any other JDBC-compliant database. First, download the appropriate JDBC driver for the type of database you want to import, and install the .jar file in the $SQOOP_HOME/lib directory on your client machine
Also,
Each driver .jar file also has a specific driver class which defines the entry-point to the driver. For example, MySQL’s Connector/J library has a driver class of com.mysql.jdbc.Driver. Refer to your database vendor-specific documentation to determine the main driver class. This class must be provided as an argument to Sqoop with --driver.
So, add --driver com.mysql.jdbc.Driver to your command.
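Putting the two steps together, a sketch might look like this (assuming the Connector/J jar sits in the current directory; use whatever version you actually downloaded):
cp mysql-connector-java-*.jar $SQOOP_HOME/lib/
sqoop export --connect jdbc:mysql://localhost/property --username root -P --table xyz -m 1 --export-dir abc.csv --driver com.mysql.jdbc.Driver
Note that once the MySQL connector jar is on Sqoop's classpath, the --driver flag is often unnecessary, since Sqoop's built-in MySQL manager can find the driver on its own (the first answer on this page makes the same point).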

Caused by: java.sql.SQLException: The database is already in use by another process: org.hsqldb.persist.NIOLockFile

I have replaced the Sqoop metastore with a MySQL DB.
When I try to run a saved Sqoop job from the command line, it "asks" me for the password and runs the job when I supply it.
However, I need to run this job through Oozie.
--password is not recognized as a valid argument when executing a saved Sqoop job from the command line, and in any case it keeps asking for the password through the prompt.
Now, when I try to run it through Oozie, I get:
Caused by: java.sql.SQLException: The database is already in use by
another process: org.hsqldb.persist.NIOLockFile#950abfc6[file
=/home/yarn/.sqoop/metastore.db.lck, exists=false, locked=false,
valid=false, fl =null]: java.io.FileNotFoundException:
/home/yarn/.sqoop/metastore.db.lck (No such file or directory)
What is this and why is this? I am directly connecting to my MySQL metastore and have configured sqoop-site.xml accordingly.
Why is this trying to connect to HSQLDB?
Why is it locked?
How do I fix this?
Also, how do I supply password for execution of sqoop job in oozie?

Sqoop imports into secure hbase fails

I am using hadoop-2.6.0 with Kerberos security. I have installed HBase with Kerberos security and am able to create a table and scan it.
I can run a Sqoop job as well to import data from MySQL into HDFS, but the Sqoop job fails when trying to import from MySQL into HBase.
Sqoop Command
sqoop import --hbase-create-table --hbase-table newtable --column-family ck --hbase-row-key id --connect jdbc:mysql://localhost/sample --username root --password root --table newtable -m 1
Exception
15/01/21 16:30:24 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x734c0647, quorum=localhost:2181, baseZNode=/hbase
15/01/21 16:30:24 INFO zookeeper.ClientCnxn: Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknownerror)
15/01/21 16:30:24 INFO zookeeper.ClientCnxn: Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session
15/01/21 16:30:24 INFO zookeeper.ClientCnxn: Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x14b0ac124600016, negotiated timeout = 40000
15/01/21 16:30:25 ERROR tool.ImportTool: Error during import: Can't get authentication token
Could you please try the following:
In the connection string, add the port number:
jdbc:mysql://localhost:3306/sample
Remove --table newtable. Create the required table in HBase first, with the column family.
Mention --split-by id.
Finally, mention a specific --fetch-size, as the Sqoop client for MySQL has an internal issue where it attempts to set the default minimum fetch size, which can run into an exception.
Could you attempt the import again and let us know?
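Putting those suggestions together, and reading the second point as creating the HBase table up front instead of relying on --hbase-create-table, a rough sketch of the revised commands might be (the fetch size of 1000 is only an example value; the point is simply to set one explicitly):
echo "create 'newtable', 'ck'" | hbase shell
sqoop import --connect jdbc:mysql://localhost:3306/sample --username root -P --table newtable --hbase-table newtable --column-family ck --hbase-row-key id --split-by id --fetch-size 1000 -m 1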