I am new to Big Data and I don't know what is going on! Please note, I am learning this myself.
I imported a table called sqooptest from a MySQL db called sqoopex8 using this command:
sqoop import \
--connect jdbc:mysql://localhost/sqoopex8 \
--table sqooptest \
-m 1
I don't know where it goes (that is, where it imports to).
A bunch of errors are thrown, and honestly, I don't even know what to look for in them. If it's the last line that matters, it says "16/04/23 01:46:52 ERROR tool.ImportTool: Error during import: Import job failed!" Again, I am in the learning phase and learning this all by myself, so please bear with me!
Now, when I look under /user/hduser/ there is a folder named after the table (sqooptest). Nothing inside it, though.
Next, looking around on the Internet, I found out that MySQL saves all its dbs in /var/lib/mysql. Apparently, I didn't have access to it, so I had to access it from the terminal (CLI). When I did, I found all my dbs there. Now, I did this:
sqoop import \
--connect jdbc:mysql://localhost/sqoopex8 \
--table sqooptest \
--target-dir /var/lib/mysql \
-m 1
(added --target-dir /var/lib/mysql)
This worked for some reason. When I do hadoop fs -ls /var/lib/mysql I see two files, _SUCCESS and part-m-00000. Why is that? Why did it not work the first time?
Also, in the first attempt, even when I specify the HDFS target --target-dir /user/hduser, it doesn't take it for some reason. When I give the target as a local file system path, it takes it. Why?
Sqoop needs an empty (non-existent) target path to save the files. The path you gave, /var/lib/mysql, is used as an HDFS path to save the imported files.
/user/hduser might not have worked because that directory already exists, or because you don't have privileges to create one. Try hadoop fs -mkdir to check.
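For example, a minimal check (assuming your Hadoop user is hduser and the cluster's default filesystem is HDFS) could look like this:
# Check whether /user/hduser exists and what it contains
hadoop fs -ls /user/hduser
# If it is missing, try to create it; a permissions error here would explain the failed import
hadoop fs -mkdir -p /user/hduser
# If a failed import left an empty sqooptest directory behind, remove it before retrying
hadoop fs -rm -r /user/hduser/sqooptest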
For the _SUCCESS and part files, here is a nice post:
What are SUCCESS and part-r-00000 files in hadoop
By default, Sqoop imports data from the RDBMS into HDFS under the user's home directory, e.g. /user/hduser/[tablename]/part-m-00000. To avoid this and store the data in a directory of our choice, we have to specify it with the --target-dir clause. Every directory in that path except the last one must already exist. For example, if --target-dir is /user/hduser/mydirectory/mytabledata, then /user/hduser/mydirectory/ must already exist in HDFS; Sqoop will then create the directory (mytabledata here) under /user/hduser/mydirectory/ and import the RDBMS table data into it.
sqoop import \
--connect jdbc:mysql://localhost/sqoopex8 \
--table sqooptest \
--target-dir /path/to/your/desireddir \
-m 1
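To make the parent-directory rule concrete, here is a sketch using the placeholder path from above (mydirectory and mytabledata are made-up names): create the parent first, and let Sqoop create the final directory itself.
# The parent must already exist in HDFS
hadoop fs -mkdir -p /user/hduser/mydirectory
# The last directory (mytabledata) must NOT exist yet; Sqoop creates it
sqoop import \
--connect jdbc:mysql://localhost/sqoopex8 \
--table sqooptest \
--target-dir /user/hduser/mydirectory/mytabledata \
-m 1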
Please check the documentation here.
I am trying to import a table from MySQL to HDFS.
Below is the query which I am trying to run:
sqoop import --connect jdbc:mysql://localhost:3306/demodb --table Categories --username root --target-dir /user/msingh/demodb -P
I am getting this error:
Exception message: '/tmp/hadoop-Martand' is not recognized as an internal or external command,
The installation is fine. I verified it with the following command:
sqoop list-databases --connect jdbc:mysql://localhost/ --username root -P
It returns the list of available databases.
Any idea what the mistake is?
Screenshot: Exception-Screenshot
I found the issue, so I am answering my own question.
I think this happens because of a space in your user profile folder, C:/Users/{foldername}. Hadoop does not support spaces in folder names, so when you execute the Hadoop job it creates some folders for its backend work under that path, and the space causes the issue.
So I changed the user folder name. You can follow the link below to change the folder name:
https://superuser.com/questions/890812/how-to-rename-the-user-folder-in-windows-10#:~:text=Go%20to%20the%20C%3A%5C,to%20the%20new%20path%20name.
After that, my issue was resolved.
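For anyone hitting the same error, a quick way to confirm whether your profile path contains a space (purely a diagnostic, not a fix) is to print it from a Command Prompt:
REM If the printed path contains a space, Hadoop jobs may fail as described above
echo %USERPROFILE%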
I have a regularly updated table in Hive that I want to have in one of my tools, which has a MySQL database. I can't just connect my application to the Hive database, so I want to export that data directly into the MySQL database.
I've searched a bit and found out that this is possible with Sqoop, and I've been told to use Oozie since I want to regularly update the table and export it.
I've looked around for a while and tried some stuff but so far I can't succeed, and I just don't understand what I'm doing.
So far, the only code I understand, although it doesn't work, looks like this:
export --connect jdbc:mysql://myserver
--username username
--password password
--table theMySqlTable
--hive-table cluster.hiveTable
I've seen people use a temporary table and export it to a txt file in order to then export that, but I'm not sure I can do it.
Should Oozie have specific parameters too? I'm not the administrator, so I'm not sure I'm able to do that...
Thank you!
Try this.
sqoop export \
--connect "jdbc:sqlserver://servername:1433;databaseName=EMP;" \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--username userid \
-P \
--table theMySqlTable \
--input-fields-terminated-by '|' \
--export-dir /hdfs path location of file/part-m-00000 \
--num-mappers 1
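Since the question is about MySQL rather than SQL Server, here is a rough sketch of how the same export might look against MySQL, assuming your Hive database is cluster and the table is hiveTable (as in your snippet) and that it is accessible through HCatalog; the server and database names are placeholders, and the MySQL table must already exist with a matching schema:
sqoop export \
--connect jdbc:mysql://myserver/yourMySqlDatabase \
--username username \
-P \
--table theMySqlTable \
--hcatalog-database cluster \
--hcatalog-table hiveTable \
--num-mappers 1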
I am new to Cassandra. Here I am trying to transfer my whole MySQL database to Cassandra using Sqoop. But after all the setup, when I execute the following command:
bin/dse sqoop import-all-tables -m 1 --connect jdbc:mysql://127.0.0.1:3306/ABCDatabase --username root --password root --cassandra-thrift-host localhost --cassandra-create-schema --direct
I receive the following error:
Sqoop functionality has been removed from DSE.
It says that Sqoop functionality has been removed from DSE. If it has been removed, is there any other way to do this?
Thanks
You can use Spark to transfer the data; it should be easy, something like:
val table = spark.read.jdbc(jdbcUrl, "table", connectionProperties)
table.write.format("org.apache.spark.sql.cassandra").options(
  Map("table" -> "TBL", "keyspace" -> "KS")).save()
Examples of JDBC URLs, options, etc. are described in the Databricks documentation, as they can differ between databases.
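If it helps, one way to launch a Spark shell with both the MySQL JDBC driver and the Cassandra connector on the classpath might look like the following; the jar path, connector version and host are assumptions to adjust to your cluster:
spark-shell \
  --jars /path/to/mysql-connector-java-5.1.49.jar \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.3 \
  --conf spark.cassandra.connection.host=127.0.0.1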
We are importing databases from MySQL to Hive using Sqoop (1.4.6). Everything works OK, except when table schemas get updated (mainly columns being added) in the source databases. The modifications do not end up in Hive. It seems that the Hive schema is created only once, and not verified on each import. The rows are loaded fine, but of course they are missing the new columns. We can work around this by first dropping the databases to force a schema re-creation in Hive, but my question is: is there a way to do this from Sqoop directly?
Our import script resembles:
sqoop import-all-tables \
--compress \
--compression-codec=snappy \
--connect "jdbc:mysql://HOST:PORT/DB" \
--username USER \
--password PASS \
--hive-import \
--hive-overwrite \
--hive-database DB \
--as-textfile
You can use an HCatalog table instead of the Hive import; it will work.
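For a single table, I believe the HCatalog variant looks roughly like this (host, database, table and credentials are placeholders; note that some plain HDFS import options, for example --target-dir, are not supported together with the HCatalog options):
sqoop import \
--connect "jdbc:mysql://HOST:PORT/DB" \
--username USER \
--password PASS \
--table MYTABLE \
--hcatalog-database DB \
--hcatalog-table MYTABLE \
--create-hcatalog-table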
I was able to use Sqoop to import a MySQL table "titles" to HDFS using a command like this:
sqoop import --connect jdbc:mysql://localhost/employees --username=root -P --table=titles --target-dir=titles --m=1
Now I want to import into Hive. If I use the following command:
sqoop import --connect jdbc:mysql://localhost/employees --username=root -P --table titles --hive-import
I am prompted with:
Output directory hdfs://localhost:9000/user/root/titles already exists
In hive, if I do a show tables I get the following:
hive> show tables;
OK
dept_emp
emp
myfirsthivetable
parted1emp
partitionedemp
You can see there is no table called titles in Hive.
I am confused by this. For the Sqoop-imported data, is there any one-to-one relationship between HDFS and Hive? What is the meaning of the prompt?
Thank you for enlightening me.
As Amit has pointed out, since you already created the HDFS directory in your first command, Sqoop refuses to overwrite the folder titles since it already contains data.
In your second command, you are telling Sqoop to import (once again) the whole table (which was already imported by the first command) into Hive. Since you are not specifying the --target-dir with the HDFS destination, Sqoop will try to create the folder titles under /user/root/. Since this folder already exists, an error is raised.
When you tell Hive to show the tables, titles doesn't appear because the second command (with hive-import) was not successful, and Hive doesn't know anything about the data. When you add the flag --hive-import, what Sqoop does under the hood is update the Hive metastore which is a database that has the metadata of the Hive tables, partitions and HDFS location.
You could do the data import using just one Sqoop command instead of using two different ones. If you delete the titles HDFS folder and you perform something like this:
sqoop import --connect jdbc:mysql://localhost/employees --username=root \
-P --table=titles --target-dir /user/root/titles --hive-import --m=1
This way, you are pulling the data from Mysql, creating the /user/root/titles HDFS directory and updating the metastore, so that Hive knows where the table (and the data) is.
But what if you don't want to delete the folder with the data that you already imported? In that case, you could create a new Hive table titles and specify the location of the data using something like this:
CREATE [TEMPORARY] [EXTERNAL] TABLE titles
[(col_name data_type [COMMENT col_comment], ...)]
(...)
LOCATION '/user/root/titles'
This way, you wouldn't need to re-import all the data, since it's already in HDFS.
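As a concrete sketch, assuming the titles table follows the standard MySQL employees sample database schema and that Sqoop wrote its default comma-delimited text files, the external table could be created like this (adjust columns, types and delimiter to your data):
hive -e "
CREATE EXTERNAL TABLE titles (
  emp_no INT,
  title STRING,
  from_date STRING,
  to_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/root/titles';
"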
When you create a table in Hive, it eventually creates a directory on HDFS. Since you already ran the plain HDFS import first, a directory named "titles" has already been created on HDFS.
You can either delete the /user/root/titles directory from HDFS and run the Hive import command again, or use the --hive-table option while importing.
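For the first option, a minimal sketch reusing the paths from the question would be:
# Remove the directory left behind by the first import
hadoop fs -rm -r /user/root/titles
# Re-run the import straight into Hive
sqoop import --connect jdbc:mysql://localhost/employees --username=root -P \
--table=titles --hive-import -m 1
# (Some Sqoop versions also offer --delete-target-dir to skip the manual removal.)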
You can refer to the Sqoop documentation.
Hope this helps.