I use Sqoop to import data from MySQL into Hadoop as CSV, and it works well when I use the --table argument. However, when I use the --query argument, only the first column is imported; the other columns are missing.
Here is my command:
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--query ' select age, job from person where $CONDITIONS ' \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1
In the CSV file, only the age column appears.
Does anyone know how to solve this?
Thanks
Read this from the Sqoop User Guide: when you use $CONDITIONS, you must specify the splitting column.
Sqoop can also import the result set of an arbitrary SQL query. Instead of using the --table, --columns and --where arguments, you can specify a SQL statement with the --query argument.
When importing a free-form query, you must specify a destination directory with --target-dir.
If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop.
Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.
For example:
$ sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
--split-by a.id --target-dir /user/foo/joinresults
Alternately, the query can be executed once and imported serially, by specifying a single map task with -m 1:
$ sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
-m 1 --target-dir /user/foo/joinresults
Try this:
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--columns "First_Column" \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1
Whenever you use the --query parameter, you need to specify the --split-by parameter with the column that should be used for slicing your data into multiple parallel tasks. The other required parameter is --target-dir, which specifies the directory on HDFS where your data should be stored.
Solution: Try adding the --split-by argument to your Sqoop command and see if the error is resolved.
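For example, a minimal sketch of the question's command with a split column added (choosing age as the split column is just an assumption; any suitable key column of the person table would do, and with more than one mapper the --split-by actually matters):
# Sketch only: "age" as split column is an assumption; -m 2 makes the splitting visible.
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--query 'select age, job from person where $CONDITIONS' \
--split-by age \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m 2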
I'm trying to import data from a MySQL table into Hive using Sqoop. From what I understand, there are two ways of doing that:
Import the data into HDFS, then create an external table in Hive and load the data into that table.
Use --create-hive-table while running the Sqoop import to create a new table in Hive and load the data directly into it. I am trying to do this but can't get it to work for some reason.
This is my code:
sqoop import \
--connect jdbc:mysql://localhost/EMPLOYEE \
--username root \
--password root \
--table emp \
--m 1 \
--hive-database sqoopimport \
--hive-table sqoopimport.employee \
--create-hive-table \
--fields-terminated-by ',';
I tried using --hive-import as well, but got an error.
When I ran the above query, the job succeeded, but no table was created in Hive, and the data ended up under /user/HDFS/emp/, a location that was created during the job.
PS: I also could not find any reason for using --m 1 with Sqoop; it's just there in all the examples.
I got the import working with the following query. There is no need to use --create-hive-table; you can just give a new table name with --hive-table and that table will be created. Also, if there is any issue, go to the hive-metastore location, run rm *.lck, and then try the import again.
sqoop import \
--connect jdbc:mysql://localhost/EMPLOYEE \
--username root \
--password root \
--table emp4 \
--hive-import \
--hive-table sqoopimport.emp4 \
--fields-terminated-by "," ;
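If the import completes, a quick sanity check from the Hive CLI (just a usage suggestion, assuming the hive shell is available) is:
hive -e "SELECT * FROM sqoopimport.emp4 LIMIT 5;"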
I'm trying to run this query in Sqoop, but it seems I'm not able to apply a text filter on a string field correctly. Here is my code:
sqoop import --connect jdbc:mysql://xxxxxxx --username xxxxxx --password xxxx \
--query 'select year(order_date) as year,department_name,sum(revenue_per_day) from revenue where department_name="Apparel" and $CONDITIONS group by year(order_date),department_name' \
--split-by department_name --target-dir /user/ --fields-terminated-by '|' -m 2
The message says: Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" the property passed as a parameter
So what would be the right split column in this case, given that the other two columns are aggregations?
Could someone check what's wrong with my code? I haven't been able to figure it out.
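Two common ways around that error, as a sketch rather than a definitive fix: either pass the property the message names (generic -D options must come immediately after sqoop import, before the tool-specific arguments), or run with a single mapper, in which case no --split-by is needed at all. The first variant might look like this:
# Sketch: the -D generic option has to precede the tool-specific arguments.
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://xxxxxxx --username xxxxxx --password xxxx \
--query 'select year(order_date) as year,department_name,sum(revenue_per_day) from revenue where department_name="Apparel" and $CONDITIONS group by year(order_date),department_name' \
--split-by department_name --target-dir /user/ --fields-terminated-by '|' -m 2
Alternatively, keeping -m 1 and dropping --split-by avoids splitting on a text column entirely, which is reasonable here since the query filters down to a single department anyway.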
I have 1000 tables in MySQL, each with more than 100,000 records. The tables have 300-500 columns.
Some of the tables have columns with special characters such as . (dot) and spaces in the column names.
Now I want to run a Sqoop import and create a Hive table in HDFS in a single shot, with a query like the one below:
sqoop import --connect ${domain}:${port}/${database} --username ${username} --password ${password} \
--table ${table} -m 1 --hive-import --hive-database ${hivedatabase} --hive-table ${table} --create-hive-table \
--target-dir /user/hive/warehouse/${hivedatabase}.db/${table}
After this, the Hive table is created, but when I query the table it shows an error like the following (this is a sample of the error output):
Error while compiling statement: FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp from [0:emp.id, 1:emp.name, 2:emp.salary, 3:emp.dno]
How can I replace the . (dot) with _ (underscore) during the Sqoop import itself? I would like to do this dynamically.
Use sqoop import with the --query option rather than --table, and use the REPLACE function in the query,
i.e.
sqoop import --connect ${domain}:${port}/${database} --username ${username} --password ${password} \
--query "SELECT col1, REPLACE(col2, '.', '_') AS col FROM table WHERE \$CONDITIONS"
Or (not recommended) write a shell script that finds and replaces "." with "_" (e.g. with grep/sed) at /user/hive/warehouse/${hivedatabase}.db/${table}.
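To combine that with the single-shot Hive import from the question, a rough sketch could look like the one below. The backtick-quoted dotted column names are hypothetical stand-ins based on the sample error output, aliasing them to underscore names in the SELECT is one way to rename them at import time, and the staging --target-dir is likewise a made-up path (with --hive-import, Sqoop loads the data from there into the Hive warehouse):
# Sketch only: the dotted column names and the staging --target-dir path are hypothetical.
sqoop import --connect ${domain}:${port}/${database} --username ${username} --password ${password} \
--query 'SELECT `emp.id` AS emp_id, `emp.name` AS emp_name, `emp.salary` AS emp_salary, `emp.dno` AS emp_dno FROM emp WHERE $CONDITIONS' \
-m 1 --hive-import --hive-database ${hivedatabase} --hive-table ${table} \
--target-dir /tmp/sqoop_staging/${table}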
How does Sqoop map an imported CSV file onto a MySQL table's columns? I just ran the import and export Sqoop commands below and they work properly, but I'm not sure how Sqoop mapped the imported result onto the MySQL table's columns. I have a manually created CSV file that I want to export to MySQL, so I need a way to specify the CSV file and the column mapping.
sqoop import \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--username myuser \
--password mypassword \
--query 'SELECT MARS_ID , MARKET_ID , USERROLE_ID , LEADER_MARS_ID , CREATED_TIME , CREATED_USER , LST_UPDTD_TIME , LST_UPDTD_USER FROM USERS_TEST u WHERE $CONDITIONS' \
-m 1 \
--target-dir /idn/home/data/user
I deleted records from the MySQL table and ran the export command below, which inserted the data back into the table.
sqoop export \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--table USERS_TEST \
--export-dir /idn/home/data/user \
--username myuser \
--password mypassword
You can use the --input-fields-terminated-by and --columns parameters to control the structure of the data being exported back to the RDBMS through Sqoop.
I would recommend referring to the Sqoop User Guide for more information.
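For instance, a sketch of how those two options might be added to the export above; the column list simply mirrors the one used in the import query, and the comma matches Sqoop's default output delimiter for text imports:
# --columns maps the fields of each HDFS record, in order, onto these table columns.
sqoop export \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--username myuser \
--password mypassword \
--table USERS_TEST \
--export-dir /idn/home/data/user \
--input-fields-terminated-by ',' \
--columns "MARS_ID,MARKET_ID,USERROLE_ID,LEADER_MARS_ID,CREATED_TIME,CREATED_USER,LST_UPDTD_TIME,LST_UPDTD_USER"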
When importing data from MySQL to Hive I need to normalize several text fields containing phone numbers. This requires quite complex logic, which is hard to express on the Sqoop command line with a single SQL REPLACE function.
Is it possible to specify SQL select expressions in a separate file and refer to it from a command line?
Thanks!
You can try:
$ sqoop --options-file /users/homer/work/option.txt
Your option.txt will look like this:
# Options file for Sqoop import
#
# Specifies the tool being invoked
import
# Connect parameter and value
--connect
jdbc:mysql://localhost/db
# Username parameter and value
--username
foo
## Query
--query
"select * from Table WHERE \$CONDITIONS"
You can use the --query option in your Sqoop command like this:
sqoop import --username $username \
--password $pwd \
--connect jdbc:mysql://54.254.177.160:3306/msta_casestudy \
--query "SELECT a1, b1, c1 FROM MyTable WHERE \$CONDITIONS" \
--split-by HMS_PACK_ID \
--target-dir /home/root/myfile \
--fields-terminated-by "|" -m 1
If your query spans multiple lines, you can either pass a file with --options-file or, in your command prompt or shell script, write the query as below (note that "\" indicates line continuation):
--query "select \
col1 \
,col2 \
,col3 \
,col4 \
from table1 a \
join \
table2 b \
etc etc"