When importing data from MySQL to Hive I need to normalize several text fields containing phone numbers. This requires fairly complex logic that is hard to express on the Sqoop command line with a single SQL replace function.
Is it possible to specify the SQL select expressions in a separate file and refer to it from the command line?
Thanks!
You can try:
$ sqoop --options-file /users/homer/work/option.txt
Your option.txt will look like this:
# Options file for Sqoop import
#
# Specifies the tool being invoked
import
# Connect parameter and value
--connect
jdbc:mysql://localhost/db
# Username parameter and value
--username
foo
# Query
--query
"select * from Table WHERE \$CONDITIONS"
You can use the --query option in your Sqoop command, for example:
sqoop import --username $username \
--password $pwd \
--connect jdbc:mysql://54.254.177.160:3306/msta_casestudy \
--query "SELECT a1, b1, c1 FROM MyTable WHERE \$CONDITIONS" \
--split-by HMS_PACK_ID \
--target-dir /home/root/myfile \
--fields-terminated-by "|" -m 1
If your query spans multiple lines, you can either pass a file with --options-file or, in your command prompt or shell script, write the query as below (note that "\" indicates line continuation):
--query
select \
col1 \
,col2 \
,col3 \
,col4 \
from table1 a \
join \
table2 b \
etc etc
I'm trying to perform this query in Sqoop but it seems I'm not able to apply the right text filter on a string field. Here is my code:
sqoop import --connect jdbc:mysql://xxxxxxx --username xxxxxx --password xxxx \
--query 'select year(order_date) as year,department_name,sum(revenue_per_day) from revenue where department_name="Apparel" and $CONDITIONS group by year(order_date),department_name' \
--split-by department_name --target-dir /user/ --fields-terminated-by '|' -m 2
The message says: Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" the property passed as a parameter
So what should the right split column be in this case to perform this query, given that the other two columns are aggregations?
Could someone check what's wrong in my code? I haven't been able to figure it out.
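For reference, the property named in that message is a generic Hadoop argument, so it goes right after the tool name. A sketch of the same command with it added (assuming splitting on the textual department_name column is acceptable):
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://xxxxxxx --username xxxxxx --password xxxx \
--query 'select year(order_date) as year,department_name,sum(revenue_per_day) from revenue where department_name="Apparel" and $CONDITIONS group by year(order_date),department_name' \
--split-by department_name --target-dir /user/ --fields-terminated-by '|' -m 2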
I have 1000 tables in MySQL with more than 100000 records in each table. The tables have 300-500 columns.
Some of the tables have columns with special characters, like . (dot) and space, in the column names.
Now I want to do a Sqoop import and create a Hive table in HDFS in a single shot, like below:
sqoop import --connect ${domain}:${port}/${database} --username ${username} --password ${password} \
--table ${table} -m 1 --hive-import --hive-database ${hivedatabase} --hive-table ${table} --create-hive-table \
--target-dir /user/hive/warehouse/${hivedatabase}.db/${table}
After this the Hive table is created, but when I query it I get an error like the following (a sample of the output):
Error while compiling statement: FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp from [0:emp.id, 1:emp.name, 2:emp.salary, 3:emp.dno]
How can I replace the . (dot) with _ (underscore) during the Sqoop import itself? I would like to do this dynamically.
Use sqoop import with the --query option rather than --table, and use the replace function in the query,
i.e.
sqoop import --connect ${domain}:${port}/${database} --username ${username} --password ${password} \
--query "select col1, replace(col2, '.', '_') as col from table where \$CONDITIONS"
Or (not recommended) write a shell script that finds and replaces "." with "_" (e.g. with grep/sed) in the files under /user/hive/warehouse/${hivedatabase}.db/${table}
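As a concrete sketch of the --query approach, the dotted column names from the sample error (emp.id, emp.name, emp.salary, emp.dno) can be selected with backtick quoting and given underscore aliases; the table name emp and the surrounding options are assumptions:
sqoop import --connect ${domain}:${port}/${database} --username ${username} --password ${password} \
--query "select \`emp.id\` as emp_id, \`emp.name\` as emp_name, \`emp.salary\` as emp_salary, \`emp.dno\` as emp_dno from emp where \$CONDITIONS" \
-m 1 --hive-import --hive-database ${hivedatabase} --hive-table emp --create-hive-table \
--target-dir /user/hive/warehouse/${hivedatabase}.db/emp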
How does Sqoop map an imported CSV file to a MySQL table's columns? I ran the import and export Sqoop commands below and they work properly, but I'm not sure how Sqoop mapped the imported result onto the MySQL table's columns. I also have a manually created CSV file that I want to export to MySQL, so I need a way to specify the CSV file and its column mapping.
sqoop import \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--username myuser \
--password mypassword \
--query 'SELECT MARS_ID , MARKET_ID , USERROLE_ID , LEADER_MARS_ID , CREATED_TIME , CREATED_USER , LST_UPDTD_TIME , LST_UPDTD_USER FROM USERS_TEST u WHERE $CONDITIONS' \
-m 1 \
--target-dir /idn/home/data/user
I deleted records from my MySQL database and ran the export command below, which inserted the data back into the table.
sqoop export \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--table USERS_TEST \
--export-dir /idn/home/data/user \
--username myuser \
--password mypassword
You can use the --input-fields-terminated-by and --columns parameters to control the structure of the data exported back to the RDBMS through Sqoop.
I would recommend referring to the Sqoop user guide for more information.
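A minimal sketch of such an export, reusing the table and columns from the import above and assuming the files in the export directory are comma-delimited:
sqoop export \
--connect jdbc:mysql://mysqlserver:3306/mydb \
--username myuser \
--password mypassword \
--table USERS_TEST \
--export-dir /idn/home/data/user \
--columns "MARS_ID,MARKET_ID,USERROLE_ID,LEADER_MARS_ID,CREATED_TIME,CREATED_USER,LST_UPDTD_TIME,LST_UPDTD_USER" \
--input-fields-terminated-by ','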
I use Sqoop to import data from MySQL to Hadoop in CSV form. It works well when I use the --table argument; however, when I use the --query argument it imports only the first column and the other columns are missing.
Here is my command.
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--query ' select age, job from person where $CONDITIONS ' \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1
In the csv file, it shows only the age.
Does anyone know how to solve it?
Thanks
Read this documentation from the Sqoop User Guide: when you use $CONDITIONS you must specify the splitting column.
Sqoop can also import the result set of an arbitrary SQL query. Instead of using the --table, --columns and --where arguments, you can specify a SQL statement with the --query argument.
When importing a free-form query, you must specify a destination directory with --target-dir.
If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop.
Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.
For example:
$ sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
--split-by a.id --target-dir /user/foo/joinresults
Alternately, the query can be executed once and imported serially, by specifying a single map task with -m 1:
$ sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
-m 1 --target-dir /user/foo/joinresults
Try this:
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--columns "First_Column" \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata -m1
Whenever you use the --query parameter, you need to specify the --split-by parameter with the column that should be used for slicing your data into multiple parallel tasks. The other required parameter is --target-dir, which specifies the directory on HDFS where your data should be stored.
Solution: try adding the --split-by argument to your Sqoop command and see if the error is resolved, for example as sketched below.
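A sketch of the original command with --split-by added (assuming age is a numeric column suitable for splitting):
sqoop import \
--connect jdbc:mysql://127.0.0.1:3306/sqoop \
--username root \
--password root \
--query 'select age, job from person where $CONDITIONS' \
--split-by age \
--bindir /tmp/sqoop-hduser/compile \
--fields-terminated-by ',' \
--target-dir /Users/hduser/hadoop_data/onedaydata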
I have a MySQL table where some of the values in a varchar column end with '^M' (i.e. carriage return or '\r') while others do not. The MySQL database is part of a production environment that I do not control, and so I'm unable to remove the trailing carriage returns with a simple update mytable set mycol = trim(mycol);.
When I sqoop the MySQL table to my cluster, I notice that the records with carriage return end up misaligned resulting in some strange query results. The sqoop (v 1.4.4) command looks like this:
sqoop import \
--connect jdbc:mysql://myhost:3306/mydb \
--username myuser \
--password mypass \
--table mytable \
--target-dir user/hive/warehouse/mydb.db/mytable \
--hive-import \
--hive-table mydb.mytable \
--hive-overwrite -m 1
Q) Is it possible to sqoop data that contains some carriage returns directly from MySQL without having some sort of intermediate step to remove the carriage returns?
The ideal workflow would be a simple sqoop command scheduled by oozie. Staging the data and stripping out \r with sed (or whatever) seems like a kludge.
The answer was in the manual (http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html). I needed to add the following argument to my sqoop statement:
--hive-drop-import-delims
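For reference, the same import with that flag added (a sketch based on the command above):
sqoop import \
--connect jdbc:mysql://myhost:3306/mydb \
--username myuser \
--password mypass \
--table mytable \
--target-dir user/hive/warehouse/mydb.db/mytable \
--hive-import \
--hive-table mydb.mytable \
--hive-drop-import-delims \
--hive-overwrite -m 1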