I have 1000 tables in MySQL, each with more than 100,000 records and 300-500 columns.
Some of the tables have column names containing special characters such as . (dot) and space.
Now I want to do a Sqoop import and create a Hive table in HDFS in a single shot, like below:
sqoop import --connect ${domain}:${port}/${database} --username ${username} --password ${password} \
--table ${table} -m 1 --hive-import --hive-database ${hivedatabase} --hive-table ${table} --create-hive-table \
--target-dir /user/hive/warehouse/${hivedatabase}.db/${table}
After this the Hive table is created, but when I query the table it shows an error like the following (this is a sample of the error output):
Error while compiling statement: FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp from [0:emp.id, 1:emp.name, 2:emp.salary, 3:emp.dno]
How can I replace the . (dot) with _ (underscore) in the column names during the Sqoop import itself? I would like to do this dynamically.
Use sqoop import with the --query option rather than --table, and do the . to _ replacement in the query itself, i.e.:
sqoop import --connect ${domain}:${port}/${database} --username ${username} --password ${password} \
--query "SELECT col1, replace(col2, '.', '_') AS col FROM ${table} WHERE \$CONDITIONS" \
-m 1 --target-dir /user/hive/warehouse/${hivedatabase}.db/${table} --hive-import --hive-database ${hivedatabase} --hive-table ${table} --create-hive-table
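If the issue is dots in the column names themselves (as in the sample error), a minimal sketch of the same idea is to alias those columns in the free-form query so Hive only ever sees underscore names; the column names emp.id, emp.name, emp.salary here are hypothetical, taken from the sample error:
sqoop import --connect ${domain}:${port}/${database} --username ${username} --password ${password} \
--query "SELECT \`emp.id\` AS emp_id, \`emp.name\` AS emp_name, \`emp.salary\` AS emp_salary FROM ${table} WHERE \$CONDITIONS" \
-m 1 --target-dir /user/hive/warehouse/${hivedatabase}.db/${table} --hive-import --hive-database ${hivedatabase} --hive-table ${table} --create-hive-table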
Or (not recommended) write a shell script that finds and replaces "." with "_" (using grep/sed) on the files under /user/hive/warehouse/${hivedatabase}.db/${table}.
I'm trying to export one of the tables from Hive to MySQL using sqoop export. The Hive table data contains special characters.
My Hive "special_char" table data:
1 じゃあまた
2 どうぞ
My Sqoop Command:
sqoop export --verbose --connect jdbc:mysql://xx.xx.xx.xxx/Sampledb --username abc --password xyz --table special_char --direct --driver com.mysql.jdbc.Driver --export-dir /apps/hive/warehouse/sampledb.db/special_char --fields-terminated-by ' '
After using the above sqoop export command, the data is stored as question marks (???) instead of the actual messages with the special characters.
MySQL "special_char" table:
id message
1 ?????
2 ???
Can anyone please help me out in storing the special characters instead of question marks (???)?
Specify the proper encoding and charset in the JDBC URL, as below:
jdbc:mysql://xx.xx.xx.xxx/Sampledb?useUnicode=true&characterEncoding=UTF-8
sqoop export --verbose --connect "jdbc:mysql://xx.xx.xx.xxx/Sampledb?useUnicode=true&characterEncoding=UTF-8" --username abc --password xyz --table special_char --direct --driver com.mysql.jdbc.Driver --export-dir /apps/hive/warehouse/sampledb.db/special_char --fields-terminated-by ' '
Note that the connect string has to be quoted on the command line, otherwise the shell treats the & as a control operator.
Please verify the charset encoding needed for the Japanese characters and use the proper one.
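As a sketch for that verification (assuming the mysql command-line client is available), you can check which character set and collation the target table and server actually use:
mysql -u abc -p -e "SHOW FULL COLUMNS FROM special_char;" Sampledb
mysql -u abc -p -e "SHOW VARIABLES LIKE 'character_set%';"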
Reference: https://community.hortonworks.com/content/supportkb/198290/native-sqoop-export-from-hdfs-fails-for-unicode-ch.html
I already have a MySQL table on my local machine (Linux) itself, and I have a Hive external table with the same schema as the MySQL table.
I'm trying to import data from the MySQL table to my Hive external table, and I'm using Sqoop for this.
But the problem is that whenever a new record is added to the MySQL table, the Hive external table doesn't get updated automatically.
This is the Sqoop import command I'm using:
sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root -P --split-by id --columns id,name,age,salary --table customer --target-dir /user/chamith/mysqlhivetest/ --fields-terminated-by "," --hive-import --hive-table test.customers
Am I missing something over here? Or how can this be done?
Any help would be appreciated.
In your case a new row is appended to the table, so you need to use the incremental append approach.
When to use append mode? It works for numerical data that is incrementing over time, such as auto-increment keys, i.e. when importing a table where new rows are continually being added with increasing row id values.
Now, what you need to add to the command:
--check-column: specifies the column to be examined when determining which rows to import.
--incremental: specifies how Sqoop determines which rows are new.
--last-value: specifies the maximum value of the check column from the previous import.
The ideal way to perform this is with a sqoop job, since in that case the Sqoop metastore remembers the last value automatically.
Step 1: Initially load the data with a normal import command.
Step 2:
sqoop job --create incrementalImportJob -- import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username root \
-P \
--split-by id \
--columns id,name,age,salary \
--table customer \
--incremental append \
--check-column id \
--last-value 5 \
--fields-terminated-by "," \
--target-dir hdfs://ip:8020/path/to/table/
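Once the job exists, a rough sketch of the ongoing workflow is to re-run it whenever you want to pick up new rows; the saved job updates the stored --last-value after each run (the job name incrementalImportJob comes from the command above):
sqoop job --list
sqoop job --exec incrementalImportJob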
Hope this helps.
I am trying to import a partial table from MySQL to HDFS. I tried Sqoop import. It works when I apply only one condition in the where clause, but when I add one more condition it gives me an error:
Error parsing arguments for import:
The query is the following:
sqoop import --table accounts --connect jdbc:mysql://localhost/loudacre --username myuser --password mypw --target-dir /homeworks/sqoop/ --where "state='CA'" and "acct_close_dt IS NULL"
Try with Free-form Query Imports:
sqoop import --connect jdbc:mysql://localhost/loudacre --username myuser --password mypw --target-dir /homeworks/sqoop/ -m 1 --query "select * from accounts where state='CA' and acct_close_dt IS NULL AND \$CONDITIONS"
Note that --query cannot be combined with --table, and a free-form query needs either --split-by or a single mapper (-m 1).
If you are writing the --query in single quotes ('), use $CONDITIONS instead of \$CONDITIONS.
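For example, a sketch of the single-quoted variant of the same command (inside single quotes the shell does not expand $CONDITIONS, so no backslash is needed; the double quotes around CA rely on MySQL's default handling of double-quoted string literals):
sqoop import --connect jdbc:mysql://localhost/loudacre --username myuser --password mypw --target-dir /homeworks/sqoop/ -m 1 --query 'select * from accounts where state="CA" and acct_close_dt IS NULL AND $CONDITIONS'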
Check docs as suggested by Piyush.
How does Sqoop map an imported CSV file to MySQL table columns? I just ran the import and export Sqoop commands below and they work properly, but I'm not sure how Sqoop mapped the imported result to the MySQL table's columns. I also have a CSV file that I created manually and want to export to MySQL, so I need a way to specify the CSV file and column mapping.
sqoop import \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--username myuser \
--password mypassword \
--query 'SELECT MARS_ID , MARKET_ID , USERROLE_ID , LEADER_MARS_ID , CREATED_TIME , CREATED_USER , LST_UPDTD_TIME , LST_UPDTD_USER FROM USERS_TEST u WHERE $CONDITIONS' \
-m 1 \
--target-dir /idn/home/data/user
I then deleted records from my MySQL database and ran the export command below, which inserted the data back into the table.
sqoop export \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--table USERS_TEST \
--export-dir /idn/home/data/user \
--username myuser \
--password mypassword
You can use the --input-fields-terminated-by and --columns parameters to control how the exported data is split into fields and mapped back to the RDBMS columns. By default, sqoop export maps the fields of each line positionally onto the table's columns in their declared order; --columns lets you state that order explicitly.
I would recommend referring to the Sqoop user guide for more information.
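A minimal sketch of that for the export above (the comma delimiter matches Sqoop's default text output, and the column list is an assumption based on the SELECT order in the import):
sqoop export \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--username myuser \
--password mypassword \
--table USERS_TEST \
--export-dir /idn/home/data/user \
--input-fields-terminated-by ',' \
--columns "MARS_ID,MARKET_ID,USERROLE_ID,LEADER_MARS_ID,CREATED_TIME,CREATED_USER,LST_UPDTD_TIME,LST_UPDTD_USER"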
I am trying to import data to HDFS from an RDBMS table. I am then using create-hive-table to copy the schema to Hive, and then loading the data into that Hive table.
Command used to import to HDFS:
sqoop import --connect jdbc:mysql://localhost/sqoop --username sqoop --password sqoop --table customers --warehouse-dir testingsqoop -m 1 --fields-terminated-by ',' --enclosed-by "\'" --lines-terminated-by "\n"
command used to create hive table:
sqoop create-hive-table --connect jdbc:mysql://localhost/sqoop --username sqoop --password sqoop --table customers --hive-table customers --fields-terminated-by "," --enclosed-by "\'" --lines-terminated-by "\n"
And finally the query used to load the data into Hive:
load data inpath '/user/cloudera/testingsqoop/customers/*' into table customers;
As I am enclosing the fields with a single quote ('), Hive does not take the --enclosed-by flag into account when the table is created, hence the columns in the Hive table still contain the quotes.
NULL 'Richard' 'Hernandez' 'XXXXXXXXX' 'XXXXXXXXX' '6303 Heather Plaza' 'Brownsville' 'TX' '78521'
However, if I don't use --enclosed-by it works fine, but I want to keep it.
1) Could you please help with this?
2) Also, is there any way I can specify multiple characters for the field terminator?
Thanks!
Try the following options:
--fields-terminated-by '\001'
--hive-drop-import-delims
--null-string '\\N'
--null-non-string '\\N'
in your sqoop import command, with a matching field delimiter in the Hive table definition.
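Applied to the import command from the question, a rough sketch might look like this (dropping --enclosed-by and falling back to the ^A delimiter; treat the exact option values as assumptions):
sqoop import --connect jdbc:mysql://localhost/sqoop --username sqoop --password sqoop \
--table customers --warehouse-dir testingsqoop -m 1 \
--fields-terminated-by '\001' --hive-drop-import-delims \
--null-string '\\N' --null-non-string '\\N'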
Most likely, your escaping syntax is causing the issue. Try using:
--enclosed-by "'"
instead of "\'".
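A sketch of the original import with only that quoting changed (everything else as in the question):
sqoop import --connect jdbc:mysql://localhost/sqoop --username sqoop --password sqoop --table customers --warehouse-dir testingsqoop -m 1 --fields-terminated-by ',' --enclosed-by "'" --lines-terminated-by "\n"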
Yes, you can set multiple characters as the field delimiter.