I am trying to Sqoop data out of a MySQL table that has both a primary key and a last_updated column. Essentially, I want to fetch all recently updated records and overwrite the corresponding records in the Hive warehouse.
I tried the following command:
sqoop job --create trainingDataUpdate -- import \
--connect jdbc:mysql://localhost:3306/analytics \
--username user \
--password-file /sqooproot.pwd \
--incremental lastmodified \
--check-column last_updated \
--last-value '2015-02-13 11:08:18' \
--table trainingDataFinal \
--merge-key id \
--direct --hive-import \
--hive-table analytics.trainingDataFinal \
--null-string '\\N' \
--null-non-string '\\N' \
--map-column-hive last_updated=TIMESTAMP
and got the following error:
15/02/13 14:07:41 INFO hive.HiveImport: FAILED: SemanticException Line 2:17 Invalid path ''hdfs://dev.cluster.com:8020/user/hdfs/_sqoop/13140640000000520_32226_hwhjobdev_cluster.com_trainingDataFinal'': No files matching path hdfs://dev.cluster.com:8020/user/hdfs/_sqoop/13140640000000520_32226_dev.cluster.com_trainingDataFinal
15/02/13 14:07:42 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 64
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:385)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:335)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
I thought that including --merge-key would make Sqoop overwrite the old records with the new ones. Does anyone know whether this is possible with Sqoop?
I don't think Sqoop can do this.
--merge-key is only used by sqoop-merge, not by import.
See also: http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1764421
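With Sqoop itself, that step is done by sqoop merge, which takes --new-data, --onto, --target-dir and --merge-key, together with the --jar-file and --class-name generated for the table. To make the semantics concrete (this is an illustration only, not how Sqoop implements it): for every value of the merge key, the row from the newer dataset replaces the row from the older one. The sketch below reproduces that behaviour with awk on two small comma-delimited files, assuming the key is the first field; the helper name merge_by_key and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating merge-by-key semantics (not Sqoop code):
# rows in the NEW file replace rows in the OLD file with the same key.
# Assumption: comma-delimited rows with the key in field 1.
merge_by_key() {
  local new_file=$1 old_file=$2
  awk -F',' '
    # Rows from the newer file: remember the key and keep the row.
    FILENAME == ARGV[1] { seen[$1] = 1; print; next }
    # Rows from the older file: keep only keys that were not updated.
    !($1 in seen) { print }
  ' "$new_file" "$old_file"
}

# Usage example with throwaway files.
tmp=$(mktemp -d)
printf '1,alice,2015-02-12\n2,bob,2015-02-12\n3,carol,2015-02-12\n' > "$tmp/old.csv"
printf '2,bob,2015-02-13\n4,dave,2015-02-13\n' > "$tmp/new.csv"
merge_by_key "$tmp/new.csv" "$tmp/old.csv"
rm -r "$tmp"
```

Row 2 comes from the newer file, rows 1 and 3 survive from the older one, and row 4 is new. Note that the output order follows the files, not the key.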
Related
While running a Sqoop export, I get a "Can't parse input data" error with the following exception:
java.lang.RuntimeException: Can't parse input data: '"DeptId":888'
Caused by: java.lang.NumberFormatException
In Oracle, DeptId is of the NUMBER datatype.
sqoop export \
--connect "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=wcx2-scan..com)(PORT=))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=***)))" \
--table API.CUSTOMER \
--columns 'Id','DeptId','Strategy','RCM_PROD_MSG_TXT' \
--export-dir /tmp/test \
--map-column-java RCM_PROD_MSG_TXT=String \
--username ********* \
--password ********** \
--input-fields-terminated-by ',' \
--input-null-string '\N' \
--input-null-non-string '\N'
Sample JSON data:
{"Id":"27952436","DeptId":888,"Strategy":"syn-cat-recs","recs":[629848,1029280]}
I need this data to be loaded into the Oracle table.
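The error message seems to follow from how Sqoop export reads its input: each line is treated as delimited text, not as JSON. With --input-fields-terminated-by ',' the second field of the sample record is the literal text "DeptId":888, which cannot be converted into the numeric DeptId column. The sketch below uses cut purely to illustrate the comma splitting (it is not Sqoop's parser) and prints exactly the field quoted in the exception:

```shell
#!/usr/bin/env bash
# The sample record, as it would appear as one line in the export directory.
record='{"Id":"27952436","DeptId":888,"Strategy":"syn-cat-recs","recs":[629848,1029280]}'

# Split on commas, as --input-fields-terminated-by ',' does, and print
# field 2, which --columns maps to DeptId.
printf '%s\n' "$record" | cut -d',' -f2
```

So the JSON would need to be flattened into plain delimited values before sqoop export can load it. Keep in mind that the recs array itself contains a comma, so it would need a different field delimiter or enclosing characters.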
I am trying to import data from MySQL into HBase with the following command:
sqoop import \
--connect jdbc:mysql://localhost/sampleOne \
--username root \
--password root \
--table SAMPLEDATA \
--columns "ID,NAME,DESIGNATION" \
--hbase-table customers \
--column-family ‘ID’ \
--hbase-row-key ‘ID,NAME’ \
-m 1
The command fails with an error saying that --hbase-row-key is "command not found", but if I run it without --hbase-row-key it works fine. What could be the issue?
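Two general shell pitfalls are worth ruling out in a multi-line command like this one (offered as possible causes, not a confirmed diagnosis). First, a space or tab after the trailing backslash ends the line continuation, so the next line is executed as a separate command, which is exactly what produces "--hbase-row-key: command not found". Second, typographic quotes such as ‘ID’ are not shell quotes, so they are passed to Sqoop as part of the argument. The hypothetical helper check_sqoop_script below flags both in a saved script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag two shell pitfalls in a saved multi-line command.
check_sqoop_script() {
  local file=$1
  # A backslash followed by spaces or tabs does not continue the line,
  # so the next line runs as its own command ("command not found").
  grep -n '\\[[:blank:]][[:blank:]]*$' "$file" |
    sed 's/^/trailing blank after backslash: line /'
  # Typographic quotes are ordinary characters to the shell, so they
  # end up inside the argument passed to Sqoop.
  grep -n -e '‘' -e '’' -e '“' -e '”' "$file" |
    sed 's/^/typographic quote: line /'
  return 0
}

# Usage example on a throwaway file: line 2 ends in a backslash plus a space,
# and line 3 uses typographic quotes.
tmp=$(mktemp)
printf 'sqoop import \\\n  --column-family x \\ \n  --hbase-row-key ‘ID,NAME’\n' > "$tmp"
check_sqoop_script "$tmp"
rm -f "$tmp"
```

If either check reports something, retyping the affected lines with plain ASCII quotes and no characters after the backslash is the first thing to try.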
By default, the tables are being written to HDFS rather than to the Hive warehouse directory (/user/hive/warehouse) when I run:
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_files
I also tried --hive-home to override $HIVE_HOME, with no effect.
Can anyone suggest the reason?
I have a CSV file in HDFS with lines like:
"2015-12-01","Augusta","46728.0","1"
I am trying to export this file to a MySQL table:
CREATE TABLE test.events_top10(
dt VARCHAR(255),
name VARCHAR(255),
summary VARCHAR(255),
row_number VARCHAR(255)
);
With the command:
sqoop export --table events_top10 --export-dir /user/hive/warehouse/result --escaped-by \" --connect ...
This command fails with the following error:
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: Can't parse input data: '2015-12-02,Ashburn,43040.0,9'
at events_top10.__loadFromFields(events_top10.java:335)
at events_top10.parse(events_top10.java:268)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:834)
at events_top10.__loadFromFields(events_top10.java:320)
... 12 more
If I do not use the --escaped-by \" parameter, the MySQL table contains rows like this:
"2015-12-01" | "Augusta" | "46728.0" | "1"
Could you please explain how to export a CSV file to a MySQL table without the double quotes?
I had to use both --escaped-by '\\' and --enclosed-by '\"'.
So the correct command is:
sqoop export --table events_top10 --export-dir /user/hive/warehouse/result --escaped-by '\\' --enclosed-by '\"' --connect ...
For more information, see the official Sqoop documentation.
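If the exported fields are known never to contain commas or double quotes themselves, a simpler alternative (a different technique from the --enclosed-by approach above) is to strip the quotes from the data before exporting. The sketch below shows that transformation with sed on the sample line from the question. The helper name strip_quotes is hypothetical; in a real pipeline the input would come from hdfs dfs -cat and the result would be written back with hdfs dfs -put, which is left out here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every double quote from each input line.
# Only safe when no field contains an embedded comma or quote
# (an assumption this sketch does not check).
strip_quotes() {
  sed 's/"//g'
}

printf '%s\n' '"2015-12-01","Augusta","46728.0","1"' | strip_quotes
```

The --enclosed-by approach remains the more robust choice when fields may contain the delimiter, since it keeps the quoting information Sqoop needs to split the fields correctly.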
I want to import structured data from my MySQL database sqoopDB into HBase using Sqoop, but I ran into problems when I tried to specify the timestamp mapping. For example:
$ sqoop import \
--connect jdbc:mysql://localhost/sqoopDB \
--username root -P \
--table sensor \
--columns "name, type, value" \
--hbase-table sqoopDB \
--column-family sensor \
--hbase-row-key id -m 1
Enter password:
...
Can I add another parameter, such as --timestamp, to specify the timestamp mapping? For example:
--timestamp=insert_date_long