I am a newbie to Sqoop. As per my understanding, Sqoop commands are for importing data from a database like MySQL into HDFS and vice versa, while HDFS commands are for dealing with data in HDFS, such as getting data from HDFS to the local file system and vice versa. Can't we use Sqoop commands to deal with data in HDFS, i.e., to move data from the local file system to HDFS and vice versa? Please let me know the exact differences between Sqoop and HDFS commands. Why do we have two separate sets of commands? Why weren't they all put into one set? Apologies if my question does not make sense.
Sqoop commands serve the following purposes:
1) Import/export data between any database and HDFS/Hive/HBase. It is not restricted to HDFS import and export.
2) A whole database or a list of tables can be sqooped in one go.
3) Incremental imports are supported, so only newly added or updated data can be imported on subsequent runs.
4) It also requires a JDBC connection driver to connect to the databases.
In short, it deals with tables/databases.
HDFS commands:
1) They are only used to transfer files of any type (CSV, text, XLS) from the local file system to HDFS or vice versa. They serve the basic functionality of moving or copying data from one system to another, just like Unix commands.
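For a concrete feel for the difference, here is a minimal sketch of both kinds of command; the database sales, table customers, host dbhost and all paths are made-up placeholders:
# Sqoop: pull a table from MySQL into HDFS (needs the MySQL JDBC driver on Sqoop's classpath)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser -P \
  --table customers \
  --target-dir /user/hadoop/customers
# HDFS: plain file copies between the local file system and HDFS, no database involved
hdfs dfs -put /home/user/data.csv /user/hadoop/data.csv
hdfs dfs -get /user/hadoop/data.csv /home/user/data_copy.csv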
Sqoop's only functionality is importing and exporting data between an RDBMS (structured data) and Hadoop. It does not provide any other activities inside HDFS. Once you have got the data into HDFS using Sqoop, HDFS commands are used to process the data (copy, move, etc.).
For more Sqoop functionality, see http://hortonworks.com/apache/sqoop/
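As a rough sketch of that second part, assuming the Sqoop import landed its part files under a hypothetical HDFS directory /user/hadoop/customers:
hdfs dfs -ls /user/hadoop/customers                       # list what Sqoop wrote
hdfs dfs -cat /user/hadoop/customers/part-m-00000 | head  # peek at a few records
hdfs dfs -mkdir -p /user/hadoop/archive                   # create an archive directory
hdfs dfs -cp /user/hadoop/customers /user/hadoop/archive/ # copy within HDFS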
Yes, your understanding is correct.
Sqoop commands are for:
importing data from any relational database (like MySQL) to HDFS/Hive/HBase
exporting data from HDFS/Hive/HBase to any relational database (like MySQL)
HDFS commands are for:
copying/transferring files of any type (like .txt, .csv, .xls, etc.) from the local file system to HDFS or vice versa.
As for:
"Why do we have two separate things? Why did they not put all these commands into one set?"
Answer:
Sqoop commands
(for copying structured data between two different systems)
HDFS commands
(for copying files between the local file system and HDFS)
Using Sqoop we cannot copy files from the local file system to HDFS and vice versa,
and also
using HDFS commands we cannot copy data from HDFS to any other external database (like MySQL) and vice versa.
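To make the split concrete, a small sketch with made-up names (file orders.csv, database sales, table orders): the first command is how a local file gets into HDFS, the second is how HDFS data gets into MySQL; neither tool can do the other's job.
hdfs dfs -mkdir -p /user/hadoop/orders
hdfs dfs -put /home/user/orders.csv /user/hadoop/orders/
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser -P \
  --table orders \
  --export-dir /user/hadoop/orders \
  --input-fields-terminated-by ','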
Using cron to export data from a MySQL database to CSV, then reading this CSV file and loading all its data into Google Cloud using BigQuery
Hello guys, I have a MySQL database called db_test, and one table in it called members_test(id, name). I'm working on the Linux Ubuntu OS. I am trying to use a cron job to take data from this table at midnight and write it into a CSV file. I also want BigQuery to somehow read this CSV file and load its data into a table called cloud_members_tab on the Google Cloud platform.
How do I do this?
make sure your CSV is generated correctly (don't rely on MySQL's native CSV export)
install the gsutil and bq command-line utilities
upload the CSV to Google Cloud Storage
using a shell command like the one below:
gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" \
  cp -j csv /tmp/export.csv gs://bucket/export.csv
use bq load:
bq load --source_format=CSV --field_delimiter="," --null_marker="\N" \
  --allow_quoted_newlines --autodetect dataset.tablename gs://bucket/export.csv
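For the cron and CSV-generation part of the question, a minimal sketch, assuming the db_test database and members_test table from the question; the script path, credentials and output path are placeholders, and the tab-to-comma conversion is naive (it assumes values contain no tabs, commas or newlines):
#!/bin/bash
# export_members.sh - dump members_test to /tmp/export.csv
mysql --batch --skip-column-names -u dbuser -p'dbpass' db_test \
  -e "SELECT id, name FROM members_test" | sed 's/\t/,/g' > /tmp/export.csv
# crontab entry to run it every night at midnight:
# 0 0 * * * /home/user/export_members.sh >> /var/log/export_members.log 2>&1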
I am facing a problem while importing a MySQL dump into Hive.
I used the Sqoop connector to import data from MySQL to Hive successfully. However, there are more data dumps to import into Hive. Restoring the database first is not feasible: the dump size is 300 GB, so it takes 3 days to restore, and I can't restore more than two files on MySQL because of a disk space issue.
As a result, I am looking to import the data, which is in a MySQL dump, directly into Hive without restoring it into MySQL.
There is one more problem with the MySQL dump: it contains multiple insert statements (around 1 billion). So will it create multiple files for each insert? In that case, how do I merge them?
You can use the "load" command provided by Hive to load data present in your local directory.
Example: This will load the data present in the file fileName.csv into your Hive table tableName.
load data local inpath '/tmp/fileName.csv' overwrite into table tableName;
In case your data is present in HDFS, use the same load command without the local option.
Example: Here /tmp/DataDirectory is an HDFS directory and all the files present in that directory will be loaded into Hive.
load data inpath '/tmp/DataDirectory' overwrite into table tableName;
Caution: As Hive is schema-on-read, make sure the line delimiter and field delimiter are the same in both the file and the Hive table you are loading into.
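As a small sketch of that caution, here is a hypothetical comma-delimited table definition that the CSV files above would need to match (column names are made up):
CREATE TABLE tableName (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;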
I have created a multi-node Hadoop cluster and installed Hive on it. Also, on another remote machine I have installed MySQL.
I intend to export data stored in HDFS into the relational database MySQL. I researched how this can be done using Sqoop. I found that I need to create a table in MySQL that has the target columns in the same order (as present in Hive), with the appropriate SQL types, and then use the sqoop export command.
My question is:
If the table is partitioned in Hive, and I partition the MySQL table accordingly when creating it, will the sqoop export command preserve the partitions?
My question is similar to sqoop export mysql partition. I want to know if partitioning support has been added to Sqoop.
This will help me decide whether to go ahead and install Sqoop for the task or to use some custom Python scripts that I have written for it.
Thank you.
Sqoop works at the JDBC layer when talking to MySQL. It won't be aware of the underlying partitioning; MySQL will handle that as the records are inserted or updated.
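So the usual pattern is to create the (optionally partitioned) MySQL table yourself and point sqoop export at the Hive table's data directory. A rough sketch with placeholder names (database reports, table sales, default Hive warehouse path), assuming the Hive table uses the default Ctrl-A field delimiter:
sqoop export \
  --connect jdbc:mysql://mysqlhost:3306/reports \
  --username dbuser -P \
  --table sales \
  --export-dir /user/hive/warehouse/sales \
  --input-fields-terminated-by '\001'
Any partitioning defined on the MySQL table is then applied transparently by MySQL as the rows arrive.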
I'm using HBase 0.92.1-cdh4.1.2. I have almost 100 tables in HBase [size about 10 GB]. I want to export all the data to MySQL. I think Sqoop isn't a viable option, as it is mostly used to move data from SQL to NoSQL databases. Another option would be to export the HBase data as flat files to the local file system and load them into MySQL using mysqlimport!
Is there any way we can dump the data directly into MySQL rather than exporting it as flat files?
I am using hbase-0.90.6. I want to export the data from HBase to MySQL. I know the two-step process: first run a MapReduce job to pull the HBase data into flat files, then export the flat-file data into MySQL.
Is there any other tool I can use to reduce these two steps to one? Or can we use Sqoop to do the same in one step? Thanks.
I'm afraid that Sqoop does not support exporting directly from HBase at the moment. Sqoop can help you with the second step of the two-step process, e.g. Sqoop can take data from HDFS and export it to MySQL.
Yes, Sqoop is the tool that can be used both for importing and for exporting your data between MySQL and HBase.
You can learn more about Sqoop at
http://sqoop.apache.org
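For the import direction (MySQL into HBase), which Sqoop supports natively, here is a minimal sketch with placeholder names (a MySQL table customers loaded into an HBase table customers under a column family cf; connection details are assumptions):
sqoop import \
  --connect jdbc:mysql://mysqlhost:3306/sales \
  --username dbuser -P \
  --table customers \
  --hbase-table customers \
  --column-family cf \
  --hbase-row-key id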