I am using cloudera quick start edition CDH 5.7
I used the below query in a terminal window:
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--query="select * from orders join order_items on orders.order_id = order_items.order_item_order_id where \$CONDITIONS" \
--target-dir /user/cloudera/order_join \
--split-by order_id \
--num-mappers 4
Q: What is the purpose of $CONDITIONS? Why is it used in this query? Can anybody explain it to me?
$CONDITIONS is used internally by Sqoop to modify the query so that it can fetch metadata and split the import across mappers.
To fetch metadata, Sqoop replaces $CONDITIONS with 1 = 0:
select * from table where 1 = 0
To fetch all the data with a single mapper, Sqoop replaces $CONDITIONS with 1 = 1:
select * from table where 1 = 1
With multiple mappers, Sqoop replaces $CONDITIONS with range predicates so that each mapper fetches its own subset of the data from the RDBMS.
For example, suppose id lies between 1 and 100 and we are using 4 mappers:
select * from table where id >= 1 and id < 25
select * from table where id >= 25 and id < 50
select * from table where id >= 50 and id < 75
select * from table where id >= 75 and id <= 100
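The range splitting above can be sketched in Python. This is a rough illustration of the arithmetic only, not Sqoop's exact algorithm; Sqoop derives the real bounds from a SELECT MIN(col), MAX(col) boundary query internally.

```python
def split_clauses(col, lo, hi, num_mappers):
    """Divide [lo, hi] into contiguous ranges, one WHERE clause per mapper.
    Rough sketch of an integer splitter; Sqoop's real one differs in details."""
    bounds = [lo + (hi - lo) * i // num_mappers for i in range(num_mappers)] + [hi]
    clauses = []
    for i in range(num_mappers):
        op = "<=" if i == num_mappers - 1 else "<"  # only the last range is inclusive
        clauses.append(f"{col} >= {bounds[i]} AND {col} {op} {bounds[i + 1]}")
    return clauses

for c in split_clauses("id", 0, 100, 4):
    print(c)
# id >= 0 AND id < 25
# id >= 25 AND id < 50
# id >= 50 AND id < 75
# id >= 75 AND id <= 100
```

Each mapper then runs the free-form query with $CONDITIONS substituted by its own clause, so the four result sets are disjoint and together cover the whole table.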
Related
I am trying to run a MySQL query from a bash script. But when I run SELECT * FROM EXAMPLE_DB; inside the script, it is translated to SELECT files1 files2 files3 (the names of the files in the directory where I run the script).
Example :
read -d '' SQL_QUERY << EOF
SET @var_name = NOW() - INTERVAL 30 DAY;
CREATE TABLE tassta.temp_vontista_messages AS SELECT * FROM tassta.vontista_messages WHERE date(sent_date) >= date(@var_name);
EOF
echo ${SQL_QUERY} | mysql
What I want is to run the MySQL query as it is. What happens now is that it gets translated to:
read -d '' SQL_QUERY << EOF
SET @var_name = NOW() - INTERVAL 30 DAY;
CREATE TABLE tassta.temp_vontista_messages AS SELECT file1 file2 file3 [files from the directory where I run the script] FROM tassta.vontista_messages WHERE date(sent_date) >= date(@var_name);
EOF
echo ${SQL_QUERY} | mysql
SQL_QUERY="SET @var_name = NOW() - INTERVAL 30 DAY;
CREATE TABLE tassta.temp_vontista_messages AS SELECT * FROM tassta.vontista_messages WHERE date(sent_date) >= date(@var_name);"
mysql -Be "$SQL_QUERY"
or:
echo "$SQL_QUERY" | mysql
NOTE: Do not put spaces before or after the =.
See: How do I escape the wildcard/asterisk character in bash?
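A minimal demo of what is going on (the filenames are made up): an unquoted variable containing * undergoes pathname expansion when the shell expands it, while double quotes preserve it.

```shell
#!/bin/sh
cd "$(mktemp -d)"
touch file1 file2

SQL_QUERY='SELECT * FROM t;'
echo $SQL_QUERY     # unquoted: the shell expands * -> SELECT file1 file2 FROM t;
echo "$SQL_QUERY"   # quoted:   the * survives    -> SELECT * FROM t;
```

So the fix above boils down to one character pair: `echo "$SQL_QUERY" | mysql` instead of `echo ${SQL_QUERY} | mysql`.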
Let's say I have a database named ABC, with tables X, Y, Z, M, N, each containing an organization_id between 0 and 50.
How can I create a dump file for tables X, Y, Z with only organization_id = 22?
You could go through table by table and do
mysqldump -uroot -p db_name table_name --where='id=22'
or you can use
SELECT * INTO OUTFILE 'data_path.sql' from table where id=22
Using the mysqldump tool you can do that. As you mentioned, your database is ABC and the tables are X, Y, Z. Hope this works for you:
mysqldump -u db_user -p ABC X Y Z --no_create_info \
--lock-all-tables --where 'organization_id \
in (SELECT X.organization_id FROM X \
JOIN Y ON X.organization_id = Y.organization_id \
JOIN Z ON Y.organization_id = Z.organization_id \
WHERE X.organization_id = 22)' > data.sql
Other questions have been asked and answered, but none of them answers my question below:
I am importing just a limited range of rows (e.g., WHERE _ID BETWEEN 107 AND 307) from a MySQL table into HDFS. I expected the query to work, given that the MySQL query alone is valid, yet I get a MySQL syntax error. Alternatively, I could import using the upper and lower limits separately and merge the files later, but that is tedious and I don't want to do it.
Here is the query:
sqoop import \
--connect jdbc:mysql://localhost/test \
--username=username \
--password=password \
--query 'select * from PURCHASE where purchase_id between 107 and 307 where $CONDITIONS' \
--target-dir /testpurchase \
--split-by purchase_id
Is there anything I am omitting in the query? Thanks.
I found out that there were two conflicting WHERE clauses, so changing the last WHERE $CONDITIONS to AND $CONDITIONS did the trick.
sqoop import \
--connect jdbc:mysql://localhost/test \
--username=username \
--password=password \
--query 'select * from PURCHASE WHERE purchase_id between 107 and 307 AND $CONDITIONS' \
--target-dir /testpurchase \
--split-by purchase_id
I also found that a --boundary-query together with --table does the job as well:
sqoop import \
--connect jdbc:mysql://localhost/test \
--username=username \
--password=password \
--boundary-query 'Select 107,307 from purchase' \
--table purchase \
--target-dir /testpurchase
I ran into a problem that puzzled me for a whole day. I'm using Flask / Flask-SQLAlchemy / PostgreSQL, and I want to do this:
published_sites = msg_published.query \
.filter( and_(*filter_clause) ) \
.group_by( msg_published.site_id ) \
.order_by( order_clause ) \
.paginate( page_no, per_page, error_out = False )
But while this works in MySQL, PostgreSQL rejects it and asks for the other selected fields to appear either in the GROUP BY clause or in an aggregate function. I know that PostgreSQL is stricter about SQL than MySQL, so I must select only site_id in the query object for msg_published. In pure SQLAlchemy I can do it like this:
published_sites = session.query( msg_published.site_id ) \
.filter( and_(*filter_clause) ) \
.group_by( msg_published.site_id ) \
.order_by( order_clause ) \
.paginate( page_no, per_page, error_out = False )
And in Flask-SQLAlchemy, how do I get this to work?
You're most of the way there. To do in PostgreSQL what MySQL allows requires a subselect:
published_sites_ids = session.query( msg_published.site_id ) \
.filter( and_(*filter_clause) ) \
.group_by( msg_published.site_id ) \
.order_by( order_clause ) \
.paginate( page_no, per_page, error_out = False )
published_sites = session.query(msg_published) \
.filter(msg_published.site_id.in_(published_sites_ids))
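The same two-step pattern can be sketched with plain SQL (sqlite3 stands in for PostgreSQL here, and the table contents are invented for illustration): first select and page over the grouped column only, then pull the full rows whose site_id falls in that set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE msg_published (id INTEGER PRIMARY KEY, site_id INTEGER, views INTEGER);
    INSERT INTO msg_published (site_id, views) VALUES (1, 10), (1, 20), (2, 5), (3, 7);
""")

# Step 1: group and page over site_id alone (what the paginated subquery does);
# only the grouped column is selected, so strict GROUP BY rules are satisfied.
site_ids = [r[0] for r in conn.execute(
    "SELECT site_id FROM msg_published GROUP BY site_id ORDER BY site_id LIMIT 2")]

# Step 2: fetch the full rows belonging to the selected groups.
marks = ",".join("?" * len(site_ids))
rows = conn.execute(
    f"SELECT id, site_id FROM msg_published WHERE site_id IN ({marks})",
    site_ids).fetchall()
print(site_ids)  # [1, 2]
print(rows)      # [(1, 1), (2, 1), (3, 2)]
```

This is exactly what the two chained queries above generate: the outer query is free to select every column because the grouping happened in the inner one.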
When I run the SQL script to export data out of a MySQL database using the command line, I get the above error.
The SQL query works fine when run in phpMyAdmin; it only throws the error when run from the command line.
Here is the command line I am using:
cat my_export | mysql -uxyzuser -pabcpassword mydb > export072911.txt
The code in my_export is as follows:
SELECT CONCAT( custfirstname, ' ', custlastname ) AS fullname, custcompany, \
SPACE( 10 ) AS custtitle, custaddressone, custaddresstwo, custcity, custstate, \
custzip, SPACE( 10 ) AS dummy, custphone, SPACE( 10 ) AS custfax, custemail, \
event_id, SPACE( 10 ) AS ticket1, SPACE( 10 ) AS ticket2, \
SPACE( 10 ) AS ticket3, SPACE( 10 ) AS ticket4, orderdate, b.quantity \
FROM order_master a \
LEFT JOIN order_detail b ON b.order_master_id = a.id \
LEFT JOIN customer c ON c.email = a.custemail \
WHERE a.orderdate > '2010-12-01'\
AND a.event_id = '30' \
AND a.orderstatus = 'O' \
AND b.litype = 'ITEM' \
AND b.reftag = 'PKG' \
ORDER BY a.orderdate DESC;
You can safely delete all the backslashes and use input redirection rather than piping. The backslashes are needed if you are working with the SQL as a shell variable, but not for piping or redirection.
mysql -uxyzuser -pabcpassword mydb < my_export > export072911.txt
UPDATE: After a quick test of my own, it looks like the pipe works just as well as input redirection, as long as the backslashes are removed.
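A quick way to see why the backslashes break the piped query: backslash-newline is a line continuation only when the shell itself parses the text. When the text sits in a file, cat (and hence mysql on the other end of the pipe) receives the backslash verbatim, which MySQL then rejects as a syntax error. A small demo with a throwaway file:

```shell
#!/bin/sh
tmp=$(mktemp)
printf 'SELECT 1 \\\n+ 1;\n' > "$tmp"   # query file with a trailing backslash
cat "$tmp"                              # the backslash is passed through as-is
grep -c '\\' "$tmp"                     # prints 1: a literal backslash survives
```

That is why deleting the backslashes from my_export fixes both the pipe and the redirection forms of the command.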