How to specify timestamp in sqoop import from MySQL to HBase?

I wanted to import structured data from my MySQL database sqoopDB using Sqoop, but I ran into some problems when I tried to specify the timestamp mapping. For example,
$ sqoop import \
--connect jdbc:mysql://localhost/sqoopDB \
--username root -P \
--table sensor \
--columns "name, type, value" \
--hbase-table sqoopDB \
--column-family sensor \
--hbase-row-key id -m 1
Enter password:
...
Can I add another parameter --timestamp to specify the timestamp mapping? Such as
--timestamp=insert_date_long
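As far as I know, Sqoop 1 does not expose a flag for setting the HBase cell timestamp; cells simply get the write time of the Put. A hedged workaround sketch, assuming the MySQL table has an insert_date column (that column name is an assumption), is to carry the timestamp along as an ordinary cell value by listing it in --columns:
# Sketch only: insert_date is assumed to exist in the sensor table; it is stored
# as a regular cell in the sensor column family, not as the HBase cell timestamp.
sqoop import \
--connect jdbc:mysql://localhost/sqoopDB \
--username root -P \
--table sensor \
--columns "name,type,value,insert_date" \
--hbase-table sqoopDB \
--column-family sensor \
--hbase-row-key id -m 1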

Related

unable to export Json data from HDFS to Oracle using sqoop export

While doing sqoop export, it can't parse the input data; I am seeing the below exception.
java.lang.RuntimeException: Can't parse input data: '"DeptId":888'
Caused by: java.lang.NumberFormatException
In Oracle, DeptId is a NUMBER datatype.
sqoop export \
--connect "jdbc:oracle:thin:#(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=wcx2-scan..com)(PORT=))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=***)))" \
--table API.CUSTOMER \
--columns 'Id','DeptId','Strategy','RCM_PROD_MSG_TXT' \
--export-dir /tmp/test \
--map-column-java RCM_PROD_MSG_TXT=String \
--username ********* \
--password ********** \
--input-fields-terminated-by ',' \
--input-null-string '\N' \
--input-null-non-string '\N'
Sample JSON data:
{"Id":"27952436","DeptId":888,"Strategy":"syn-cat-recs","recs":[629848,1029280]}
I want to make sure the data gets loaded into the Oracle table.
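For context, sqoop export reads delimited text (per --input-fields-terminated-by), not JSON, which is why parsing stops at '"DeptId":888'. A minimal sketch of one way to flatten the records to CSV before exporting, assuming jq is available, that /tmp/test_csv is a hypothetical output directory, and that the recs array is what should land in RCM_PROD_MSG_TXT:
# Flatten each JSON record into a comma-separated line matching the --columns order.
hdfs dfs -mkdir -p /tmp/test_csv
hdfs dfs -cat /tmp/test/part-* \
  | jq -r '[.Id, .DeptId, .Strategy, (.recs | tostring)] | @csv' \
  | hdfs dfs -put -f - /tmp/test_csv/part-00000
# @csv double-quotes string fields, so the export command would also need
# --input-optionally-enclosed-by '"' and --export-dir /tmp/test_csv.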

SQOOP import from MYSQL to HBASE

I am trying to import data from mysql to hbase with the below code:
sqoop import \
--connect jdbc:mysql://localhost/sampleOne \
--username root \
--password root \
--table SAMPLEDATA \
--columns "ID,NAME,DESIGNATION" \
--hbase-table customers \
--column-family ‘ID’ \
--hbase-row-key ‘ID,NAME’ \
-m 1
But the above code fails saying "--hbase-row-key: command not found", yet if I run it without --hbase-row-key it works like a charm. So what could the issue be?
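Two things worth checking (assumptions about the cause, not a confirmed fix): the shell only reports "--hbase-row-key: command not found" when the line continuation before it is broken, for example a stray space after the preceding backslash, and the curly quotes ‘ ’ are not shell quote characters, so they are passed to Sqoop literally. A sketch of the same command with straight quotes and clean continuations:
sqoop import \
--connect jdbc:mysql://localhost/sampleOne \
--username root \
--password root \
--table SAMPLEDATA \
--columns "ID,NAME,DESIGNATION" \
--hbase-table customers \
--column-family 'ID' \
--hbase-row-key 'ID,NAME' \
-m 1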

Sqoop import all tables into hive gets stuck with below statement

By default the tables are going to HDFS, not to the warehouse directory (/user/hive/warehouse).
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_files
I tried --hive-home to override $HIVE_HOME, with no luck.
Can anyone suggest the reason?
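For what it's worth, with --hive-import Sqoop first stages each table's files under the user's HDFS home directory and only moves them into the warehouse after its Hive LOAD DATA step completes, so seeing data in the home directory while the job appears stuck usually means the Hive step has not run yet. A small sketch to check both locations (paths assume the Cloudera quickstart defaults):
# Staging area: Sqoop writes here first when no --warehouse-dir/--target-dir is given.
hdfs dfs -ls /user/cloudera
# Final location: files appear here only after the Hive load step finishes.
hdfs dfs -ls /user/hive/warehouse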

using sqoop to update hive table

I am trying to sqoop data out of a MySQL database where I have a table with both a primary key and a last_updated field. Essentially, I want to get all records that were recently updated and overwrite the current records in the Hive warehouse.
I have tried the following command:
sqoop job --create trainingDataUpdate -- import \
--connect jdbc:mysql://localhost:3306/analytics \
--username user \
--password-file /sqooproot.pwd \
--incremental lastmodified \
--check-column last_updated \
--last-value '2015-02-13 11:08:18' \
--table trainingDataFinal \
--merge-key id \
--direct --hive-import \
--hive-table analytics.trainingDataFinal \
--null-string '\\N' \
--null-non-string '\\N' \
--map-column-hive last_updated=TIMESTAMP
and I get the following error
15/02/13 14:07:41 INFO hive.HiveImport: FAILED: SemanticException Line 2:17 Invalid path ''hdfs://dev.cluster.com:8020/user/hdfs/_sqoop/13140640000000520_32226_hwhjobdev_cluster.com_trainingDataFinal'': No files matching path hdfs://dev.cluster.com:8020/user/hdfs/_sqoop/13140640000000520_32226_dev.cluster.com_trainingDataFinal
15/02/13 14:07:42 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 64
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:385)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:335)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
I thought that by including --merge-key it would be able to overwrite the old records with the new records. Does anyone know if this is possible in Sqoop?
I don't think Sqoop can do it.
--merge-key is only used by sqoop-merge, not by import.
Also see http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1764421
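If it helps, here is a rough sketch of the two-step route that answer points at, with hypothetical HDFS directories, and assuming the jar/class names come from the codegen step of an earlier import of this table:
# Step 1: pull only the recently updated rows into a delta directory (no --hive-import).
sqoop import \
--connect jdbc:mysql://localhost:3306/analytics \
--username user \
--password-file /sqooproot.pwd \
--table trainingDataFinal \
--incremental lastmodified \
--check-column last_updated \
--last-value '2015-02-13 11:08:18' \
--target-dir /user/hdfs/trainingDataFinal_delta
# Step 2: merge the delta onto the previous full extract, keeping the newest row per id.
sqoop merge \
--merge-key id \
--new-data /user/hdfs/trainingDataFinal_delta \
--onto /user/hdfs/trainingDataFinal_base \
--target-dir /user/hdfs/trainingDataFinal_merged \
--jar-file trainingDataFinal.jar \
--class-name trainingDataFinal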

why can mongoexport not parse this JSON when mongo can?

this is the query:
mongoexport --host our.dbhost.com --port 27017 --username peter -p clark --collection sent_mails --db dbname --query '{trigger_id:ObjectId( "50c62e97b9fe6a000200000c"), updated_at: {$lt : ISODate("2013-02-28"), $gte : ISODate("2013-02-01") }}'
when I run this command I get:
assertion: 10340 Failure parsing JSON string near: , updated_
Any ideas? (I want all records that match the trigger_id and were updated in February.)
As explained in this issue, Mongoexport using $gt and $lt constraints on a date range, you have to use Unix timestamps for date queries in mongoexport.
The timestamps have to be in milliseconds.
Invoking this in the bash shell would look like this:
let "date_from=`date --utc --date "2013-02-01" +%s` * 1000"
let "date_to=`date --utc --date "2013-03-01" +%s` * 1000"
mongoexport -d test -c xx --query "{updated_at:{\$gte:new Date($date_from),\$lt:new Date($date_to)}}"> xx.json
> connected to: 127.0.0.1
> exported 1 records
The xx collection contains:
> db.xx.find().pretty()
{
    "_id" : ObjectId("5158f670c2293fc7aadd811e"),
    "trigger_id" : ObjectId("50c62e97b9fe6a000200000c"),
    "updated_at" : ISODate("2013-02-11T00:00:00Z")
}
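Putting that back into the shape of the original command would look roughly like this (untested sketch; host, credentials, database, and collection are taken from the question, and the output file name is hypothetical):
let "date_from=`date --utc --date "2013-02-01" +%s` * 1000"
let "date_to=`date --utc --date "2013-03-01" +%s` * 1000"
mongoexport --host our.dbhost.com --port 27017 --username peter -p clark \
  --db dbname --collection sent_mails \
  --query "{trigger_id:ObjectId(\"50c62e97b9fe6a000200000c\"), updated_at:{\$gte:new Date($date_from),\$lt:new Date($date_to)}}" \
  > sent_mails_february.json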