I am trying to import data from MySQL to HBase using Sqoop.
I am running the following command.
sqoop import --connect jdbc:mysql://localhost/database --table users --columns "loginid,email" --username tester -P -m 8 --hbase-table hbaseTable --hbase-row-key user_id --column-family user_info --hbase-create-table
But I am getting the below error:
13/05/08 10:42:10 WARN hbase.ToStringPutTransformer: Could not insert
row with null value for row-key column: user_id
Please help here.
Got the solution.
I was not including my row key, i.e. user_id, in the columns list.
After including it, it worked like a charm.
Thanks.
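For reference, this is roughly what the corrected command looks like; nothing changes from the question except that the row-key column is added to --columns:
sqoop import --connect jdbc:mysql://localhost/database --table users \
  --columns "user_id,loginid,email" --username tester -P -m 8 \
  --hbase-table hbaseTable --hbase-row-key user_id \
  --column-family user_info --hbase-create-table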
Your column names should be in upper case: not seq_id but SEQ_ID.
I think Sqoop is considering it a different column, which is null (of course).
I already have a Hive table called roles. I need to update this table with info coming from MySQL. So I have used this script, thinking that it will add and update new data in my Hive table:
sqoop import --connect jdbc:mysql://nn01.itversity.com/retail_export --username retail_dba --password itversity \
  --table roles --split-by id_emp --check-column id_emp --last-value 5 --incremental append \
  --target-dir /user/ingenieroandresangel/hive/roles --hive-import --hive-database poc --hive-table roles
Unfortunately, that only inserts the new data, but I can't update the records that already exist. Before you ask, a couple of statements:
The table doesn't have a PK.
If I don't specify --last-value as a parameter, I will get duplicated records for those that already exist.
How could I work this out without truncating the table or recreating it with a PK? Is there a way?
Thanks, guys.
Hive does not support update queries.
You have to drop/truncate the old table and reload it again.
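If a full reload is acceptable, a minimal sketch (reusing the connection details from the question) is to drop the incremental options and let --hive-overwrite rewrite the Hive table's data on every run; --delete-target-dir just clears the staging directory between runs:
sqoop import --connect jdbc:mysql://nn01.itversity.com/retail_export \
  --username retail_dba --password itversity \
  --table roles --split-by id_emp \
  --delete-target-dir --target-dir /user/ingenieroandresangel/hive/roles \
  --hive-import --hive-overwrite --hive-database poc --hive-table roles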
I already have a MySQL table on my local machine (Linux) itself, and I have a Hive external table with the same schema as the MySQL table.
I'm trying to import data from MySQL table to my Hive external table and I'm using Sqoop for this.
But the problem is, whenever a new record is added to the MySQL table, it doesn't update the Hive external table automatically.
This is the Sqoop import command I'm using:
sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root -P --split-by id --columns id,name,age,salary --table customer --target-dir /user/chamith/mysqlhivetest/ --fields-terminated-by "," --hive-import --hive-table test.customers
Am I missing something here? Or how can this be done?
Any help would be appreciated.
In your case a new row is appended to the table.
So you need to use the incremental append approach.
When to use append mode?
Works for numerical data that is incrementing over time, such as auto-increment keys.
When importing a table where new rows are continually being added with increasing row id values.
Now, what you need to add to the command:
--check-column   Specifies the column to be examined when determining which rows to import.
--incremental    Specifies how Sqoop determines which rows are new.
--last-value     Specifies the maximum value of the check column from the previous import.
The ideal way to perform this is with a sqoop job, since in that case the Sqoop metastore remembers the last value automatically.
Step 1: Initially load data with a normal import command.
Step 2: Create a saved incremental import job:
sqoop job --create incrementalImportJob -- import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username root \
  -P \
  --split-by id \
  --columns id,name,age,salary \
  --table customer \
  --incremental append \
  --check-column id \
  --last-value 5 \
  --fields-terminated-by "," \
  --target-dir hdfs://ip:8020/path/to/table/
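Once the job is saved, it can be executed whenever new rows need to be pulled, and the Sqoop metastore updates the stored last value after each run:
sqoop job --exec incrementalImportJob
sqoop job --show incrementalImportJob
The second command prints the stored job definition, which is a quick way to check the last value that will be used on the next run.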
Hope this helps.
Most of the tutorials I've researched point out that I have to use Sqoop for export/import, and a lot of the manuals show how I can export data from a DB to HDFS, but how can I do the reverse?
Let's say I have a company DB on localhost; it has an empty users table with columns id, user, and I have a Hadoop job that produces data like (id, user) but saves it to some hadoop-output.txt, not to MySQL.
Are there command-line commands to import from HDFS into MySQL via Sqoop?
sqoop-export does this.
sqoop-export --connect jdbc:mysql://localhost/company \
  --username user --password passwd \
  --table users \
  --export-dir /path/to/HDFS_Source \
  --input-fields-terminated-by ','
Refer to SqoopUserGuide.html#sqoop_export.
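As a rough end-to-end sketch, assuming hadoop-output.txt contains comma-delimited id,user lines (the HDFS path below is just a placeholder):
# placeholder HDFS directory used as the export source
hdfs dfs -mkdir -p /user/me/users_export
hdfs dfs -put hadoop-output.txt /user/me/users_export/
sqoop-export --connect jdbc:mysql://localhost/company \
  --username user --password passwd \
  --table users \
  --export-dir /user/me/users_export \
  --input-fields-terminated-by ','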
I imported MySQL database tables into Hive using the Sqoop tool with the below script.
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username=retail_dba --password=cloudera --hive-import --hive-overwrite --create-hive-table --warehouse-dir=/user/hive/warehouse/
But when I check the databases in Hive, there is no retail.db.
If you want to import all tables into a specific Hive database (already created), use:
--hive-database retail
in your sqoop command.
As dev said, if you want to sqoop everything into a particular Hive database, then use
--hive-database retail_db; otherwise every table will be sqooped under the default warehouse dir as warehouse-dir/tablename.
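For example, the command from the question with that one option added (assuming the Hive database retail_db has already been created):
sqoop import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba --password=cloudera \
  --hive-import --hive-overwrite --create-hive-table \
  --hive-database retail_db \
  --warehouse-dir=/user/hive/warehouse/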
Your command sqoops everything into this directory: /user/hive/warehouse/retail.db/
To import into Hive, use this argument: --hive-import. And why are you using --as-textfile?
If you want to store the data as text files, then use --as-textfile and then use Hive's external table command to create external tables in Hive.
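For instance, a minimal sketch of that external-table step, run from the shell; the database, table, columns, and location below are illustrative, not taken from the question:
# illustrative names and path; adjust to the actual sqooped directory
hive -e "CREATE EXTERNAL TABLE retail.customers (
           customer_id INT,
           customer_name STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/user/hive/warehouse/retail.db/customers';"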
I have to import > 400 million rows from a MySQL table (having a composite primary key) into a PARTITIONED Hive table via Sqoop. The table has data for two years, with a departure date column ranging from 20120605 to 20140605 and thousands of records per day. I need to partition the data based on the departure date.
The versions:
Apache Hadoop - 1.0.4
Apache Hive - 0.9.0
Apache Sqoop - sqoop-1.4.2.bin__hadoop-1.0.0
As per my knowledge, there are 3 approaches:
1. MySQL -> Non-partitioned Hive table -> INSERT from non-partitioned Hive table into partitioned Hive table
2. MySQL -> Partitioned Hive table
3. MySQL -> Non-partitioned Hive table -> ALTER non-partitioned Hive table to add PARTITION
1. is the current painful one that I'm following.
2. I read that support for this was added in later(?) versions of Hive and Sqoop, but I was unable to find an example.
3. The syntax dictates specifying partitions as key-value pairs, which is not feasible in the case of millions of records where one cannot think of all the partition key-value pairs.
Can anyone provide inputs for approaches 2 and 3?
I guess you can create a Hive partitioned table.
Then write the Sqoop import command for it.
For example:
sqoop import --hive-overwrite --hive-drop-import-delims --warehouse-dir "/warehouse" \
  --hive-table <hive table name> \
  --connect jdbc:<mysql path>/DATABASE=xxxx \
  --table <mysql table name> --username xxxx --password xxxx --num-mappers 1 \
  --hive-partition-key <partition column> --hive-partition-value <partition value> --hive-import \
  --fields-terminated-by ',' --lines-terminated-by '\n'
You have to create the partitioned table structure first, before you move your data into the partitioned table. When sqooping, there is no need to specify --hive-partition-key and --hive-partition-value; use --hcatalog-table instead of --hive-table.
Manu
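A rough sketch of that approach; the database, table, and column names below are illustrative, and it relies on Sqoop's HCatalog integration partitioning dynamically on departure_date because that column is part of the imported data:
# illustrative partitioned table; names and storage format are placeholders
hive -e "CREATE DATABASE IF NOT EXISTS flights;
         CREATE TABLE flights.bookings (id BIGINT, pax_name STRING)
         PARTITIONED BY (departure_date STRING)
         STORED AS RCFILE;"
sqoop import --connect jdbc:mysql://localhost/flightsdb \
  --username xxxx --password xxxx \
  --table bookings \
  --hcatalog-database flights \
  --hcatalog-table bookings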
If this is still something people want to understand, they can use:
sqoop import --driver <driver name> --connect <connection url> --username <user name> -P --table employee --num-mappers <numeral> --warehouse-dir <hdfs dir> --hive-import --hive-table table_name --hive-partition-key departure_date --hive-partition-value $departure_date
Notes from the patch:
sqoop import [all other normal command line options] --hive-partition-key ds --hive-partition-value "value"
Some limitations:
It only allows for one partition key/value.
It hardcodes the type of the partition key to be a string.
With auto-partitioning in Hive 0.7, we may want to adjust this to just have one command-line option for the key name and use that column from the DB table to partition.
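Given those limitations, approach 1 from the question can at least avoid one INSERT per partition by using Hive's dynamic partitioning; a sketch with illustrative staging and target table names:
# dynamic-partition load from a non-partitioned staging table; table and column names are placeholders
hive -e "SET hive.exec.dynamic.partition=true;
         SET hive.exec.dynamic.partition.mode=nonstrict;
         INSERT OVERWRITE TABLE flights_partitioned PARTITION (departure_date)
         SELECT id, pax_name, departure_date FROM flights_staging;"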