Import from MySQL dump to Hive - mysql

I am facing a problem while importing a MySQL dump into Hive.
I used the Sqoop connector to import data from MySQL to Hive successfully. However, there are more data dumps to import into Hive, and restoring each database first is not feasible: the dump is about 300 GB in size and takes roughly 3 days to restore. Also, I can't restore more than two dumps on MySQL because of disk space issues.
As a result, I am looking to import the data, which sits in MySQL dump files, directly into Hive without restoring it into MySQL first.
There is one more problem with the MySQL dump: it contains a huge number of INSERT statements (around 1 billion). Will that create a separate file for each INSERT? If so, how do I merge them?

You can use the "load" command provided by Hive to load data present in the your local directory.
Example: This will load the data present in the file fileName.csv into your hive table tableName.
load data local inpath '/tmp/fileName.csv' overwrite into table tableName;
In case your data is already present in HDFS, use the same load command without the local option.
Example: here /tmp/DataDirectory is an HDFS directory, and all the files present in that directory will be loaded into Hive.
load data inpath '/tmp/DataDirectory/*' overwrite into table tableName;
Caution: as Hive is schema-on-read, make sure the line delimiter and field delimiter in the file match the ones defined on the Hive table you are loading into.
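For example, a minimal sketch of a matching table definition (the columns and delimiters here are invented for illustration, not taken from the question):
create table tableName (id INT, name STRING)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile;
With the delimiters declared on the table like this, the load commands above will split each line of the file into the declared columns as expected.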

Related

MySQL compare newly created dump file to database

I have a task at work to create a MySQL dump file, and I have to review the file before restoring it on a different server.
My problem is how to compare the dump file I created against the database it was taken from, to make sure that all tables, functions, and stored procedures are included in the dump file.
Is there a way to do this? Currently I'm manually reviewing the dump file per table name, function, and stored procedure, which is very time consuming.
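One rough way to automate that check, assuming GNU grep, a dump file named dump.sql taken with --routines, and a database called mydb (all of these names are placeholders, and connection options for mysql are omitted):
grep -oP 'CREATE TABLE `\K[^`]+' dump.sql | sort -u > dump_tables.txt
grep -oP '(FUNCTION|PROCEDURE) `\K[^`]+' dump.sql | sort -u > dump_routines.txt
mysql -N -e "SELECT table_name FROM information_schema.tables WHERE table_schema='mydb'" | sort -u > db_tables.txt
mysql -N -e "SELECT routine_name FROM information_schema.routines WHERE routine_schema='mydb'" | sort -u > db_routines.txt
comm -13 dump_tables.txt db_tables.txt      # objects in the database but missing from the dump
comm -13 dump_routines.txt db_routines.txt
Anything printed by the comm commands exists in the database but not in the dump; the same idea extends to views and triggers.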

sqoop vs hdfs commands

I am a newbie to Sqoop. As per my understanding, Sqoop commands are for importing data from a database like MySQL to HDFS and vice versa, while HDFS commands are for dealing with data in HDFS, such as getting data from HDFS to the local file system and vice versa. Can't we use Sqoop commands to deal with data in HDFS, i.e. to get data from the local file system to HDFS and vice versa? Please let me know the exact differences between Sqoop and HDFS commands. Why do we have two separate things? Why didn't they put all these commands into one set? Apologies if my question does not make sense.
Sqoop commands serve the following purposes:
1) Import/export data between any database and HDFS/Hive/HBase, and vice versa. It is not restricted to HDFS import and export only.
2) Data can be sqooped in one go if you need to move a whole database or a list of tables.
3) Incremental imports are supported, so only the new or changed data can be imported via Sqoop commands.
4) It also requires a connection driver to connect to the databases.
In short, it deals with tables and databases.
HDFS commands:
1) They are only used to transfer any type of file (CSV, text, XLS) from the local file system to HDFS or vice versa. They just serve the basic functionality of moving or copying data from one system to another, much like Unix commands.
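As a rough illustration of points 1) to 4), imports might look like this (the JDBC URL, credentials, table, and paths are placeholders, and the MySQL JDBC driver jar must be available to Sqoop):
sqoop import --connect jdbc:mysql://dbhost/sales --username dbuser -P --table orders --target-dir /user/hadoop/orders --incremental append --check-column id --last-value 100000
sqoop import-all-tables --connect jdbc:mysql://dbhost/sales --username dbuser -P --warehouse-dir /user/hadoop/sales
The first command pulls only the rows of one table whose id is greater than the last imported value; the second copies every table of the database in one go.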
Sqoop's only functionality is to import and export data between an RDBMS (structured data) and Hadoop. It does not provide any other HDFS-internal operations. Once you have the data in HDFS via Sqoop, HDFS commands are used to process it (copy, move, etc.).
For more on Sqoop's functionality see http://hortonworks.com/apache/sqoop/
Yes, your understanding is correct.
Sqoop commands are for:
importing data from any relational database (like MySQL) to HDFS/Hive/HBase
exporting data from HDFS/Hive/HBase to any relational database (like MySQL)
HDFS commands are for:
copying/transferring any files (like .txt, .csv, .xls, etc.) from the local file system to HDFS or vice versa.
As for:
Why do we have two separate things? Why didn't they put all these commands into one set?
The answer:
Sqoop commands are for copying structured data between two different systems.
HDFS commands are for copying files between the local file system and HDFS.
Using Sqoop we cannot copy files from the local file system to HDFS and vice versa, and using HDFS commands we cannot copy data from HDFS to any other external database (like MySQL) and vice versa.
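To make that distinction concrete, a few illustrative commands (all paths, table names, and the connection string are made up):
hdfs dfs -put /home/user/sales.csv /user/hadoop/sales/        # local file system -> HDFS
hdfs dfs -get /user/hadoop/sales/sales.csv /home/user/        # HDFS -> local file system
sqoop export --connect jdbc:mysql://dbhost/salesdb --username dbuser -P --table sales --export-dir /user/hadoop/sales   # HDFS -> MySQL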

MySQL Workbench Import CSV File Restriction

I'm trying to import a CSV file into MySQL 5.7 using the MySQL Workbench import module.
Everything looks fine and the data is read correctly by the module.
But I have 11,000 rows.
The module is importing only 29 of them.
Is there any configuration restricting my full import?
I had the same problem as yours. I got around it with the following steps:
After the MySQL Workbench import, clear all the table rows with TRUNCATE (but keep the table itself).
Then, using the MySQL command line:
LOAD DATA LOCAL INFILE '/data.csv' INTO TABLE my_table FIELDS TERMINATED BY ','
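If the CSV has a header row, quoted fields, or Windows-style line endings, a slightly fuller variant of the same command (still assuming the file is /data.csv) might look like:
LOAD DATA LOCAL INFILE '/data.csv' INTO TABLE my_table FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;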
Hope it helps.

Import CSV file into MySQL without using load data infile

I have recently switched web hosting providers and the new provider does not allow 'load data infile' commands in MySQL. However, I have many CSV files that update tables in a database weekly using that command. What is the best way to import into MySQL without the typical load data option? I tried mysqlimport, but that seems to fail since the data isn't in SQL format, it's just standard CSV data. Thanks for your help.
Use the following process:
Convert the CSV to MySQL dump format (a file of SQL statements)
Upload the file to the MySQL server or to the shared hosting file system
Use one of the following commands to import it:
mysqladmin:
mysqladmin create db1
mysql db1 < dump.csv
mysql:
mysql> CREATE DATABASE IF NOT EXISTS db1;
mysql> USE db1;
mysql> source dump.csv
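For step 1, the converted file is essentially just a SQL script, so for a simple two-column table it might look something like this (table and column names are invented for illustration; the table is assumed to already exist):
INSERT INTO weekly_data (item, amount) VALUES ('widget', 42);
INSERT INTO weekly_data (item, amount) VALUES ('gadget', 17);
Once dump.csv contains SQL statements like these, both the redirect and the source command above will execute them against the database.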
References
MySQL :: MySQL 5.7 Reference Manual :: 7.4.2 Reloading SQL-Format Backups
Text::CSV::Auto - Comprehensive and automatic loading, processing, and analysis of CSV files. - metacpan.org
MySQL :: MySQL 5.7 Reference Manual :: 8.4.3 Dumping Data in Delimited-Text Format with mysqldump
Restore data from a tab delimited file to MySQL - Electric Toolbox
Using mysqldump to save data to CSV files - Electric Toolbox
Mysqldump in CSV format

Exporting data from HBase to MySql

I'm using HBase 0.92.1-cdh4.1.2. I have almost 100 tables in HBase, about 10 GB in total. I want to export all the data to MySQL. I don't think Sqoop is a viable option, as it is mostly used to move data from SQL to NoSQL databases. Another option would be exporting the HBase data as flat files to the local file system and loading the data into MySQL using mysqlimport.
Is there any way we can directly dump the data into MySQL rather than exporting it as flat files?
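A rough sketch of the flat-file route mentioned above, using Hive's HBase integration to produce the delimited files (every table name, column mapping, and path below is an assumption, and specifying the delimiter on INSERT OVERWRITE DIRECTORY needs a reasonably recent Hive):
CREATE EXTERNAL TABLE hbase_export (rowkey STRING, val STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
TBLPROPERTIES ("hbase.table.name" = "my_hbase_table");
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hbase_export' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' SELECT * FROM hbase_export;
The resulting files can then be renamed to match the target MySQL table and loaded with something like:
mysqlimport --local --fields-terminated-by='\t' mydb /tmp/hbase_export/my_mysql_table.txt
This has to be repeated per table, so with ~100 tables it is worth scripting, and it is still a flat-file hop rather than the direct dump the question asks about.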