I'm in the middle of converting an old legacy PHP system to Flask + SQLAlchemy and was wondering how I would construct the following:
I have a model:
class Invoice(db.Model):
paidtodate = db.Column(DECIMAL(10,2))
fullinvoiceamount = db.Column(DECIMAL(10,2))
invoiceamount = db.Column(DECIMAL(10,2))
invoicetype = db.Column(db.String(10))
acis_cost = db.Column(DECIMAL(10,2))
The query I need to run is:
SELECT COUNT(*) AS the_count, sum(if(paidtodate>0,paidtodate,if(invoicetype='CPCN' or invoicetype='CPON' or invoicetype='CBCN' or invoicetype='CBON' or invoicetype='CPUB' or invoicetype='CPGU' or invoicetype='CPSO',invoiceamount,
fullinvoiceamount))) AS amount,
SUM(acis_cost) AS cost, (SUM(if(paidtodate>0,paidtodate,invoiceamount))-SUM(acis_cost)) AS profit FROM tblclientinvoices
Is there an SQLAlchemy-ish way to construct this query? I've tried googling for MySQL IF statements with SQLAlchemy but drew a blank.
Many thanks!
Use func (see the SQLAlchemy documentation) to generate SQL function expressions:
from sqlalchemy import func, select

# SQLAlchemy 1.x list form; on 1.4+/2.0 pass the columns positionally: select(col1, col2, ...)
qry = select([
    func.count().label("the_count"),
    func.sum(func.IF(
        Invoice.paidtodate > 0,
        Invoice.paidtodate,
        # note: I prefer using IN instead of multiple OR statements
        func.IF(
            Invoice.invoicetype.in_(
                ("CPCN", "CPON", "CBCN", "CBON", "CPUB", "CPGU", "CPSO")
            ),
            Invoice.invoiceamount,
            Invoice.fullinvoiceamount,
        ),
    )).label("amount"),
    func.sum(Invoice.acis_cost).label("cost"),
    (
        func.sum(func.IF(
            Invoice.paidtodate > 0,
            Invoice.paidtodate,
            Invoice.invoiceamount,
        ))
        - func.sum(Invoice.acis_cost)
    ).label("profit"),
])
rows = session.execute(qry).fetchall()
for row in rows:
    print(row)
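To check the logic of the nested IF without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It writes IF(cond, a, b) as the standard CASE WHEN cond THEN a ELSE b END, which is the portable form of the same expression (SQLAlchemy also offers a case() construct for it). The sample rows and the summarize helper are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

SPECIAL_TYPES = ("CPCN", "CPON", "CBCN", "CBON", "CPUB", "CPGU", "CPSO")


def summarize(rows):
    """Return (the_count, amount, cost, profit) for the given invoice rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblclientinvoices ("
        "paidtodate REAL, fullinvoiceamount REAL, invoiceamount REAL, "
        "invoicetype TEXT, acis_cost REAL)"
    )
    conn.executemany("INSERT INTO tblclientinvoices VALUES (?, ?, ?, ?, ?)", rows)
    placeholders = ", ".join("?" for _ in SPECIAL_TYPES)
    sql = f"""
        SELECT COUNT(*) AS the_count,
               SUM(CASE WHEN paidtodate > 0 THEN paidtodate
                        WHEN invoicetype IN ({placeholders}) THEN invoiceamount
                        ELSE fullinvoiceamount END) AS amount,
               SUM(acis_cost) AS cost,
               SUM(CASE WHEN paidtodate > 0 THEN paidtodate
                        ELSE invoiceamount END) - SUM(acis_cost) AS profit
        FROM tblclientinvoices
    """
    result = conn.execute(sql, SPECIAL_TYPES).fetchone()
    conn.close()
    return result


# Hypothetical rows: (paidtodate, fullinvoiceamount, invoiceamount, invoicetype, acis_cost)
sample = [
    (50, 100, 80, "CPCN", 10),  # paid, so paidtodate is used
    (0, 100, 80, "CPCN", 10),   # unpaid, special type, so invoiceamount is used
    (0, 100, 80, "XXXX", 5),    # unpaid, other type, so fullinvoiceamount is used
]
print(summarize(sample))
```

Note that profit uses invoiceamount for every unpaid invoice, regardless of type, exactly as in the original query.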
I have a data frame in pyspark like below
df = spark.createDataFrame(
[
('2021-10-01','A',25),
('2021-10-02','B',24),
('2021-10-03','C',20),
('2021-10-04','D',21),
('2021-10-05','E',20),
('2021-10-06','F',22),
('2021-10-07','G',23),
('2021-10-08','H',24)],("RUN_DATE", "NAME", "VALUE"))
Now using this data frame I want to update a table in MySql
# query to run should be similar to this
update_query = "UPDATE DB.TABLE SET DATE = '2021-10-01', VALUE = 25 WHERE NAME = 'A'"
# mysql_conn is a function which I use to connect to `MySql` from `pyspark` and run queries
# Invoking the function
mysql_conn(host, user_name, password, update_query)
Now when I invoke the mysql_conn function by passing parameters the query runs successfully and the record gets updated in the MySql table.
Now I want to run the update statement for all the records in the data frame.
For each NAME it has to pick the RUN_DATE and VALUE and replace in update_query and trigger the mysql_conn.
I think we need a for loop, but I'm not sure how to proceed.
Instead of iterating through the dataframe with a for loop, it would be better to distribute the workload across the partitions using foreachPartition. Moreover, since you are writing a custom query, instead of executing one query for each row it would be more efficient to execute a batch operation to reduce round trips, latency and concurrent connections. E.g.:
def update_db(rows):
    # Build one derived table (SELECT ... UNION ALL SELECT ...) from the partition's rows
    temp_table_query = ""
    for row in rows:
        if len(temp_table_query) > 0:
            temp_table_query = temp_table_query + " UNION ALL "
        temp_table_query = temp_table_query + " SELECT '%s' as RUNDATE, '%s' as NAME, %d as VALUE " % (row.RUN_DATE, row.NAME, row.VALUE)
    # Skip empty partitions; otherwise the UPDATE would contain an empty subquery
    if not temp_table_query:
        return
    update_query = """
    UPDATE DBTABLE
    INNER JOIN (
    %s
    ) new_records ON DBTABLE.NAME = new_records.NAME
    SET
    DBTABLE.DATE = new_records.RUNDATE,
    DBTABLE.VALUE = new_records.VALUE
    """ % (temp_table_query)
    mysql_conn(host, user_name, password, update_query)

df.foreachPartition(update_db)
Let me know if this works for you.
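One caveat with the batch approach: interpolating values straight into the SQL string breaks on names containing quotes and is open to injection. A possible variation, sketched below, builds the same UPDATE ... INNER JOIN statement with placeholders and returns the values separately. It assumes a driver that uses %s placeholders (as PyMySQL and mysqlclient do) and that your mysql_conn helper can be extended to accept a parameter list; build_batch_update is a hypothetical name, not part of any library.

```python
from collections import namedtuple


def build_batch_update(rows, table="DBTABLE"):
    """Build one UPDATE ... INNER JOIN statement plus its parameter list.

    Returns (None, []) for an empty partition so the caller can skip it.
    """
    selects, params = [], []
    for row in rows:
        # One placeholder SELECT per row; the values travel separately
        selects.append("SELECT %s AS RUNDATE, %s AS NAME, %s AS VALUE")
        params.extend([row.RUN_DATE, row.NAME, row.VALUE])
    if not selects:
        return None, []
    derived = " UNION ALL ".join(selects)
    sql = (
        f"UPDATE {table} "
        f"INNER JOIN ({derived}) new_records ON {table}.NAME = new_records.NAME "
        f"SET {table}.DATE = new_records.RUNDATE, {table}.VALUE = new_records.VALUE"
    )
    return sql, params


# Stand-in for pyspark Row objects, for illustration only
Row = namedtuple("Row", ["RUN_DATE", "NAME", "VALUE"])
sql, params = build_batch_update([Row("2021-10-01", "A", 25), Row("2021-10-02", "B", 24)])
print(sql)
print(params)
```

Inside update_db you would then pass both sql and params to the cursor's execute call instead of formatting the string yourself.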
I have three tables
User
Device
Log
I want to filter the logs based on users and devices. I'm using the following query, which iterates over the users and devices in order to get the logs. I feel this will become a performance hit. How can I reduce the number of database hits?
for user_obj in User.objects.all():
    device_qs = Device.objects.filter(user=user_obj)
    if device_qs.exists():
        for device_obj in device_qs:
            log_count = Log.objects.filter(user=user_obj, device=device_obj, created_at__range=(from_date, to_date)).count()
If you only need the log count per user and device (which is what you get from the code you posted), you can get that in just one query:
from django.db.models import Count
logs = (Log.objects
    .filter(created_at__range=(from_date, to_date))
    .values('user', 'device')
    .annotate(log_count=Count('device'))
)
You can modify the query to include any attributes of the user and device models that you need:
.values('user__last_name', 'device__name') # etc.
You can also order the dataset by appending order_by() at the end to be able to iterate over it in the desired order:
.order_by('user__last_name', '-log_count')
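As a rough illustration of what the values().annotate() call asks the database to do, the sketch below runs the corresponding single GROUP BY query directly with Python's sqlite3 module. The table name log and the columns user_id, device_id and created_at are assumptions made for the example; the SQL Django actually generates will differ in detail.

```python
import sqlite3


def log_counts(conn, from_date, to_date):
    """Count logs per (user, device) pair in one query, within a date range."""
    sql = """
        SELECT user_id, device_id, COUNT(device_id) AS log_count
        FROM log
        WHERE created_at BETWEEN ? AND ?
        GROUP BY user_id, device_id
        ORDER BY user_id, device_id
    """
    return conn.execute(sql, (from_date, to_date)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (user_id INTEGER, device_id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO log VALUES (?, ?, ?)", [
    (1, 10, "2021-01-05"),
    (1, 10, "2021-01-06"),
    (1, 11, "2021-01-07"),
    (2, 20, "2021-01-08"),
    (2, 20, "2021-03-01"),  # outside the range used below
])
counts = log_counts(conn, "2021-01-01", "2021-01-31")
print(counts)
```

One query returns every (user, device) pair that has logs in the range, which replaces the nested loops and their per-pair count() calls.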
What I would do is create a "proxy model" (an unmanaged model, in Django's terms) that maps to a view in your MySQL instance.
The view would look like this:
SELECT
    t1.*,
    t2.*,
    t3.*
FROM users t1
RIGHT JOIN device t2 ON t1.id = t2.user_id
RIGHT JOIN log t3 ON t3.device_id = t2.id;
Now to create a proxy model, do this:
class SomeModel(models.Model):
    # all fields from the 3 tables here

    class Meta:
        db_table = 'yourViewNameHere'
        managed = False  # this keeps django from creating the table
Then run python manage.py makemigrations and python manage.py migrate as usual.
Now, to access the data you need, you would do something like this:
from django.db import connection
sql = "SELECT * FROM your_view WHERE some_date_column > 'foo' AND some_date_column < 'bar' "
with connection.cursor() as cur:
    cur.execute(sql)
    data = cur.fetchall()
print(data)
Note that if you are passing parameters to the raw SQL query, you should always pass them separately, like this, to avoid SQL injection:
sql = "SELECT * FROM your_view WHERE some_date_column > %s AND some_date_column < %s"
params = ('foo', 'bar')
with connection.cursor() as cur:
    cur.execute(sql, params)
    data = cur.fetchall()
I'm currently using PHP and MySQL to retrieve a set of 100,000 records in a table, then iterate over each of those records to do some calculations and then insert the result into another table. I'm wondering if I'd be able to do this in pure SQL and make the query run faster.
Here's what I'm currently using:
$stmt= $pdo->query("
SELECT Well_Permit_Num
, Gas_Quantity
, Gas_Production_Days
FROM DEP_OG_Production_Import
ORDER
BY id ASC
");
foreach ($stmt as $row) {
$data = array('well_id' => $row['Well_Permit_Num'],
'gas_quantity' => $row['Gas_Quantity'],
'gas_days' => $row['Gas_Production_Days'],
'gas_average' => ($row['Gas_Production_Days']));
$updateTot = $pdo->prepare("INSERT INTO DEP_OG_TOTALS
(Well_Permit_Num,
Total_Gas,
Total_Gas_Days,
Total_Gas_Avg)
VALUES (:well_id,
:gas_quantity,
:gas_days,
:gas_average)
ON DUPLICATE KEY UPDATE
Total_Gas = Total_Gas + VALUES(Total_Gas),
Total_Gas_Days = Total_Gas_Days + VALUES(Total_Gas_Days),
Total_Gas_Avg =(Total_Gas + VALUES(Total_Gas)) / (Total_Gas_Days + VALUES(Total_Gas_Days))");
}
I'd like to see if this can be done in pure MySQL instead of having to use PHP just for the fact of using it to hold the variables.
My Result should be 1 record that is a running total for each Well. The source table may house 60-70 records for the same well, but over a few thousand different Wells.
It's a constant import process that has to be run, so it's not like there is a final table which you can just do SUM(Gas_Quantity)... etc.. on
As commented by Uueerdo, you seem to need an INSERT ... SELECT query. Such a query inserts the result set returned by an inner SELECT. Here the inner SELECT is an aggregate query that computes the total gas and total days for each well.
INSERT INTO DEP_OG_TOTALS (Well_Permit_Num, Total_Gas, Total_Gas_Days, Total_Gas_Avg)
SELECT
    t.Well_Permit_Num,
    SUM(t.Gas_Quantity) AS Total_Gas,
    SUM(t.Gas_Production_Days) AS Total_Gas_Days,
    SUM(t.Gas_Quantity) / SUM(t.Gas_Production_Days) AS Total_Gas_Avg
FROM DEP_OG_Production_Import t
GROUP BY t.Well_Permit_Num
ON DUPLICATE KEY UPDATE
    -- the average is assigned first, while Total_Gas and Total_Gas_Days still hold the old totals
    Total_Gas_Avg = (Total_Gas + VALUES(Total_Gas)) / (Total_Gas_Days + VALUES(Total_Gas_Days)),
    Total_Gas = Total_Gas + VALUES(Total_Gas),
    Total_Gas_Days = Total_Gas_Days + VALUES(Total_Gas_Days)
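To make the intended behaviour concrete, here is a small pure-Python sketch of the same two steps: aggregate one import batch per well (the GROUP BY), then merge the result into the running totals (the ON DUPLICATE KEY UPDATE). The function names and the sample figures are invented for illustration, and the average is simply total gas divided by total days, as in the question.

```python
from collections import defaultdict


def aggregate_batch(rows):
    """Sum gas quantity and production days per well (the GROUP BY step)."""
    sums = defaultdict(lambda: [0, 0])
    for well, quantity, days in rows:
        sums[well][0] += quantity
        sums[well][1] += days
    return sums


def apply_batch(totals, rows):
    """Merge one import batch into the running totals (the upsert step)."""
    for well, (quantity, days) in aggregate_batch(rows).items():
        gas, gas_days, _ = totals.get(well, (0, 0, None))
        gas += quantity
        gas_days += days
        # Recompute the average from the new totals; None when there are no days
        average = gas / gas_days if gas_days else None
        totals[well] = (gas, gas_days, average)
    return totals


totals = {}
apply_batch(totals, [("W1", 100, 10), ("W1", 50, 5), ("W2", 30, 3)])
apply_batch(totals, [("W1", 150, 15)])
print(totals)
```

Each well ends up with exactly one record no matter how many source rows or import runs contributed to it, which is what the single INSERT ... SELECT is meant to achieve.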
I want to union two tables with where clause in zf2:-
table1 app_followers
table2 app_users
where condition could be anything
and order by updated_date.
Please let me know the query for zend 2.
Thanks..
Using UNION in ZF2:
Use ZF2's dedicated class, Zend\Db\Sql\Combine:
new Combine(
[
$select1,
$select2,
$select3,
...
]
)
A detailed example which uses combine is as follows:
$select1 = $sql->select('java');
$select2 = $sql->select('dotnet');
$select1->combine($select2);
$select3 = $sql->select('android');
$selectall3 = $sql->select();
$selectall3->from(array('sel1and2' => $select1));
$selectall3->combine($select3);
$select4 = $sql->select('network');
$selectall4 = $sql->select();
$selectall4->from(array('sel1and2and3' => $selectall3));
$selectall4->combine($select4);
$select5 = $sql->select('dmining');
$selectall5 = $sql->select();
$selectall5->from(array('sel1and2and3and4' => $selectall4));
$selectall5->combine($select5);
which is equivalent to the normal SQL query for UNION:
SELECT * FROM java
UNION SELECT * from dotnet
UNION SELECT * from android
UNION SELECT * from network
UNION SELECT * from dmining;
I hope it helps.
I wanted to do a similar task and spent a lot of time figuring out how to do it the right way.
The idea behind Laminas\Db\Sql\Combine is really good, but you cannot apply ordering to this object, so it's useless in this case.
Finally, I ended up with the following code:
$skill = $sql->select('skill');
$language = $sql->select('language');
$location = $sql->select('location');
$occupation = $sql->select('occupation');
$skill->combine($language);
$language->combine($location);
$location->combine($occupation);
$combined = (new Laminas\Db\Sql\Select())
->from(['sub' => $skill])
->order(['updated_date ASC']);
However, the generated SQL is a bit messy with parentheses. If that matters to you, please check this comment on GitHub. On MySQL it doesn't matter; I'm not sure about other databases.
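For reference, the SQL shape that wrapping a combined select in an outer select aims for can be tried out with Python's sqlite3 module. This is only a sketch of the target query, not of the Zend/Laminas API: the table names come from the question, while the name column, the sample data and the WHERE condition are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_followers (name TEXT, updated_date TEXT);
    CREATE TABLE app_users (name TEXT, updated_date TEXT);
    INSERT INTO app_followers VALUES ('carol', '2020-03-01'), ('alice', '2020-01-01');
    INSERT INTO app_users VALUES ('bob', '2020-02-01'), ('dave', '2019-06-01');
""")

# Each branch keeps its own WHERE clause; the ORDER BY applies to the whole union
sql = """
    SELECT sub.name, sub.updated_date
    FROM (
        SELECT name, updated_date FROM app_followers WHERE updated_date >= ?
        UNION
        SELECT name, updated_date FROM app_users WHERE updated_date >= ?
    ) AS sub
    ORDER BY sub.updated_date ASC
"""
rows = conn.execute(sql, ("2020-01-01", "2020-01-01")).fetchall()
print(rows)
```

Putting the union in a subquery is what lets a single ORDER BY sort the rows from both tables together.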
I want to select a bunch of distinct records based on a composite key. In SQL I'd write something like this:
SELECT * FROM security WHERE
    (exchange_code = 'exchange_code_1' AND code = 'code_1')
    OR (exchange_code = 'exchange_code_2' AND code = 'code_2')
    ...
    OR (exchange_code = 'exchange_code_N' AND code = 'code_N')
With SQLAlchemy I'd like to use the filter clause like:
query = sess.query(Security)
[query.filter(
and_(Security.exchange_code == security.exchange_code,
Security.code == security.code)
) for security in securities]
result = query.all()
The problem is filter and where join clauses with an AND not an OR... is there some way to use filter with OR?
Or is my only choice to generate a bunch of individual selects and UNION them? Something like:
first = exchanges.pop()
query = reduce(lambda query, exchange: query.union(exchange.pk_query),
first.pk_query())
query.all()
Use or_:
query = sess.query(Security).filter(
or_(*(and_(Security.exchange_code == security.exchange_code,
Security.code == security.code)
for security in securities)))
If your database supports it, you should use tuple_ instead.
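In SQLAlchemy the tuple form is written as tuple_(Security.exchange_code, Security.code).in_(pairs), which renders a row-value IN comparison. To show what that comparison does at the SQL level, here is a minimal sketch with Python's sqlite3 module. It assumes a SQLite build with row-value support (3.15 or later); the find_securities helper and the sample data are hypothetical, and other databases accept slightly different syntax for the list of pairs.

```python
import sqlite3


def find_securities(conn, keys):
    """Select rows whose (exchange_code, code) pair is in `keys`."""
    if not keys:
        return []
    # One "(?, ?)" group per composite key, bound as parameters
    values = ", ".join("(?, ?)" for _ in keys)
    sql = (
        "SELECT exchange_code, code FROM security "
        f"WHERE (exchange_code, code) IN (VALUES {values}) "
        "ORDER BY exchange_code, code"
    )
    params = [part for key in keys for part in key]
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE security (exchange_code TEXT, code TEXT)")
conn.executemany("INSERT INTO security VALUES (?, ?)", [
    ("NYSE", "IBM"), ("NYSE", "GE"), ("LSE", "IBM"), ("LSE", "BP"),
])
matches = find_securities(conn, [("NYSE", "IBM"), ("LSE", "BP")])
print(matches)
```

The pairs are matched as a unit, so ("LSE", "IBM") is not returned even though both of its parts appear somewhere in the key list.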