What is the purpose of $CONDITIONS under --query? - mysql

I am using the Cloudera QuickStart edition, CDH 5.7.
I ran the query below in a terminal window:
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--query="select * from orders join order_items on orders.order_id = order_items.order_item_order_id where \$CONDITIONS" \
--target-dir /user/cloudera/order_join \
--split-by order_id \
--num-mappers 4
Q: What is the purpose of $CONDITIONS? Why is it used in this query? Can anybody explain this to me?

$CONDITIONS is a placeholder that Sqoop rewrites internally, both to fetch metadata and to split the import across mappers.
To fetch metadata, Sqoop replaces $CONDITIONS with 1 = 0, so no rows are returned:
select * from table where 1 = 0
To fetch all the data with a single mapper, Sqoop replaces $CONDITIONS with 1 = 1:
select * from table where 1 = 1
With multiple mappers, Sqoop replaces $CONDITIONS with a range predicate so that each mapper fetches its own subset of rows from the RDBMS.
For example, if id ranges from 1 to 100 and we are using 4 mappers:
select * from table where id >= 1 and id < 25
select * from table where id >= 25 and id < 50
select * from table where id >= 50 and id < 75
select * from table where id >= 75 and id <= 100
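A side note on the backslash in \$CONDITIONS in the command at the top: the escape is there for the shell, not for Sqoop. Inside double quotes the shell would otherwise expand $CONDITIONS to an empty string before Sqoop ever sees it; inside single quotes no escape is needed. As a sketch, the same import can be written as:
# Single quotes keep $CONDITIONS literal, so no backslash escape is required.
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--query 'select * from orders join order_items on orders.order_id = order_items.order_item_order_id where $CONDITIONS' \
--target-dir /user/cloudera/order_join \
--split-by order_id \
--num-mappers 4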

Related

Run mysql query inside bash script

I am trying to run a MySQL query from a bash script. But when I run SELECT * FROM EXAMPLE_DB; inside the script, it is translated to SELECT file1 file2 file3 (the files in the directory where I run the script).
Example:
read -d '' SQL_QUERY << EOF
SET @var_name = NOW() - INTERVAL 30 DAY;
CREATE TABLE tassta.temp_vontista_messages AS SELECT * FROM tassta.vontista_messages WHERE date(sent_date) >= date(@var_name);
EOF
echo ${SQL_QUERY} | mysql
What I want is to run the MySQL query as it is. What happens now is that it gets translated to:
read -d '' SQL_QUERY << EOF
SET @var_name = NOW() - INTERVAL 30 DAY;
CREATE TABLE tassta.temp_vontista_messages AS SELECT file1 file2 file3 [files from where I run the script.] FROM tassta.vontista_messages WHERE date(sent_date) >= date(@var_name);
EOF
echo ${SQL_QUERY} | mysql
SQL_QUERY="SET #var_name = NOW() - INTERVAL 30 DAY;
CREATE TABLE tassta.temp_vontista_messages AS SELECT * FROM tassta.vontista_messages WHERE date(sent_date) >= date(#var_name);"
mysql -Be "$SQL_QUERY"
or:
echo "$SQL_QUERY" | mysql
NOTE: Do not put spaces before or after the = in the variable assignment.
see: How do I escape the wildcard/asterisk character in bash?
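To spell out what goes wrong: it is the unquoted ${SQL_QUERY} in the echo, not mysql, that mangles the *, because the shell performs glob expansion on the unquoted expansion. A minimal sketch of the difference (table name taken from the question; the mysql connection settings are whatever your client defaults to):
SQL_QUERY='SELECT * FROM tassta.vontista_messages;'
echo ${SQL_QUERY}     # unquoted: the shell expands * to the file names in the current directory
echo "${SQL_QUERY}"   # quoted: the query text is passed through verbatim
echo "${SQL_QUERY}" | mysql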

How to take a database dump of the ABC database, table XYZ, and only of organization_id "22"?

Let's say I have a database named ABC,
and tables X, Y, Z, M, N, each of which has organization_id values from 0 to 50.
How is it possible to create a dump file for tables X, Y, Z with only organization_id=22?
You could go through it table by table and do:
mysqldump -uroot -p db_name table_name --where='organization_id=22'
or you can use:
SELECT * INTO OUTFILE 'data_path.sql' FROM table_name WHERE organization_id=22
Using the mysqldump tool you can do that. As you mentioned, your database is ABC and the tables are X, Y, Z. Hope this works for you.
mysqldump -u db_user -p ABC X Y Z --no-create-info \
--lock-all-tables --where 'organization_id
in (SELECT X.organization_id FROM X
JOIN Y ON X.organization_id = Y.organization_id
JOIN Z ON Y.organization_id = Z.organization_id
WHERE X.organization_id = 22)' > data.sql
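Since each of the tables already has its own organization_id column (per the question), the join in the subquery above is not strictly necessary; a simpler sketch of the same dump, reusing the names from the question, would be:
# --where is applied to every table listed, so one command covers X, Y and Z.
mysqldump -u db_user -p ABC X Y Z \
--no-create-info --lock-all-tables \
--where='organization_id=22' > data.sql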

Sqoop import query where ID between row_numbers does not work

Other questions have been asked and answered, but none of them points towards an answer to my question below:
I am importing just a limited range of rows (e.g., "where _ID between 107 and 307") from a MySQL table to HDFS. I expect the query to work, given that the MySQL query alone is valid, yet I get a MySQL syntax error. Alternatively, I could import using the upper and lower limits separately and then merge the files later, which takes long and I don't want to do that.
Here is the query:
sqoop import \
--connect jdbc:mysql://localhost/test \
--username=username \
--password=password \
--query 'select * from PURCHASE where purchase_id between 107 and 307 where $CONDITIONS' \
--target-dir /testpurchase \
--split-by purchase_id
Please, is there anything I am omitting in the query? Thanks.
I found out that the query contained two WHERE clauses, so changing the final WHERE $CONDITIONS to AND $CONDITIONS did the trick.
sqoop import \
--connect jdbc:mysql://localhost/test \
--username=username \
--password=password \
--query 'select * from PURCHASE WHERE purchase_id between 107 and 307 AND $CONDITIONS' \
--target-dir /testpurchase \
--split-by purchase_id
A --boundary-query combined with --table can do the job as well:
sqoop import \
--connect jdbc:mysql://localhost/test \
--username=username \
--password=password \
--boundary-query 'SELECT 107, 307 FROM PURCHASE' \
--table PURCHASE \
--target-dir /testpurchase
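Another option, not covered in the answers above, is to keep --table and push the range into Sqoop's --where option instead of the boundary query; a sketch reusing the connection details from the question:
sqoop import \
--connect jdbc:mysql://localhost/test \
--username=username \
--password=password \
--table PURCHASE \
--where 'purchase_id between 107 and 307' \
--split-by purchase_id \
--target-dir /testpurchase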

How to let flask-sqlalchemy's query object select columns (or fields)?

I ran into a problem that bothered me the whole day. I'm using Flask/Flask-SQLAlchemy/PostgreSQL, and I want to do this:
published_sites = msg_published.query \
.filter( and_(*filter_clause) ) \
.group_by( msg_published.site_id ) \
.order_by( order_clause ) \
.paginate( page_no, per_page, error_out = False )
In MySQL this is OK, but in PostgreSQL it fails and asks for the other fields besides site_id to appear either in the GROUP BY clause or in an aggregate function. I know that PostgreSQL is stricter about SQL than MySQL, so I need to select only site_id in the query object of msg_published. In plain SQLAlchemy I can do it like this:
published_sites = session.query( msg_published.site_id ) \
.filter( and_(*filter_clause) ) \
.group_by( msg_published.site_id ) \
.order_by( order_clause ) \
.paginate( page_no, per_page, error_out = False )
How do I get this to work in Flask-SQLAlchemy?
You're most of the way there. To do in PostgreSQL what MySQL allows, you need a subselect:
published_sites_ids = session.query( msg_published.site_id ) \
.filter( and_(*filter_clause) ) \
.group_by( msg_published.site_id ) \
.order_by( order_clause ) \
.paginate( page_no, per_page, error_out = False )
# paginate() returns a Pagination object, so pull out the site_id values for this page
page_site_ids = [row.site_id for row in published_sites_ids.items]
published_sites = session.query(msg_published) \
.filter(msg_published.site_id.in_(page_site_ids))

ERROR at line 1: Unknown command '\ '

When I run the SQL script below to export data out of a MySQL database from the command line, I get the above error.
The SQL query works fine when run in phpMyAdmin; it only throws an error when run from the command line.
Here is the command line I am using:
cat my_export | mysql -uxyzuser -pabcpassword mydb > export072911.txt
The code in my_export is as follows:
SELECT CONCAT( custfirstname, ' ', custlastname ) AS fullname, custcompany, \
SPACE( 10 ) AS custtitle, custaddressone, custaddresstwo, custcity, custstate, \
custzip, SPACE( 10 ) AS dummy, custphone, SPACE( 10 ) AS custfax, custemail, \
event_id, SPACE( 10 ) AS ticket1, SPACE( 10 ) AS ticket2, \
SPACE( 10 ) AS ticket3, SPACE( 10 ) AS ticket4, orderdate, b.quantity \
FROM order_master a \
LEFT JOIN order_detail b ON b.order_master_id = a.id \
LEFT JOIN customer c ON c.email = a.custemail \
WHERE a.orderdate > '2010-12-01'\
AND a.event_id = '30' \
AND a.orderstatus = 'O' \
AND b.litype = 'ITEM' \
AND b.reftag = 'PKG' \
ORDER BY a.orderdate DESC;
You can safely delete all the backslashes and use input redirection rather than piping. The backslashes are needed if you are working with the SQL as a shell variable, but not for piping or redirection.
mysql -uxyzuser -pabcpassword mydb < my_export > export072911.txt
UPDATE: After a quick test of my own, it looks like the pipe works just as well as input redirection, as long as the backslashes are removed.
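If you would rather not edit the file by hand, stripping the trailing backslashes on the fly should work too (a sketch using the file name and credentials from the question):
# Remove a trailing backslash (plus any trailing whitespace) from each line,
# then feed the cleaned SQL to mysql as before.
sed 's/\\[[:space:]]*$//' my_export | mysql -uxyzuser -pabcpassword mydb > export072911.txt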