Optimize subquery in SELECT - mysql

My table schema is as follows:
Indexes:
products.id PRIMARY KEY
products.description UNIQUE
expenses.id PRIMARY KEY
expenses.product_id FOREIGN KEY to products.id
My goal is to load
Cost of each product of current month (AS costs_november)
Cost of each product of last month (AS costs_october)
Change in costs of current month compared to last (current month costs - last month costs) (AS costs)
Percentage change of current month costs compared to last (last month costs * 100 / current month costs) (AS percent_diff)
I've managed to code SQL that does exactly that:
SELECT description, (SUM(cost) - IFNULL(
(
SELECT SUM(cost)
FROM expenses
WHERE month = 9 AND year = 2019 AND product_id = e.product_id
GROUP BY product_id
), 0)) AS costs,
SUM(cost) * 100 /
(
SELECT SUM(cost)
FROM expenses
WHERE month = 9 AND year = 2019 AND product_id = e.product_id
GROUP BY product_id
) AS percent_diff,
SUM(cost) AS costs_october,
IFNULL(
(
SELECT SUM(cost)
FROM expenses
WHERE month = 9 AND year = 2019 AND product_id = e.product_id
GROUP BY product_id
), 0) AS costs_september
FROM expenses e
JOIN products p ON (e.product_id = p.id)
WHERE month = 10 AND year = 2019
GROUP BY product_id
ORDER BY product_id;
But is copy-pasting the same subquery three times really the solution? In theory it requires running four queries per product. Is there a more elegant way?
I appreciate any help!

I would address this with conditional aggregation:
select
p.description,
sum(case when e.month = 11 then e.cost else 0 end) costs_november,
sum(case when e.month = 10 then e.cost else 0 end) costs_october,
sum(case when e.month = 11 then e.cost else -1 * e.cost end) costs,
sum(case when e.month = 10 then e.cost else 0 end)
* 100
/ nullif(
sum(case when e.month = 11 then e.cost else 0 end),
0
) percent_diff
from expenses e
inner join products p on p.id = e.product_id
where e.year = 2019 and e.month in (10, 11)
group by e.product_id
You can avoid repeating the same conditional sums by using a subquery (your RDBMS would probably optimize it anyway, but this tends to make the query more readable):
select
description,
costs_november,
costs_october,
costs_november - costs_october costs,
costs_october * 100 / nullif(costs_november, 0) percent_diff
from (
select
p.description,
sum(case when e.month = 11 then e.cost else 0 end) costs_november,
sum(case when e.month = 10 then e.cost else 0 end) costs_october
from expenses e
inner join products p on p.id = e.product_id
where e.year = 2019 and e.month in (10, 11)
group by e.product_id
) t
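As a quick sanity check, the conditional-aggregation idea can be tried on a toy dataset. The sketch below is only an illustration: it uses Python's built-in sqlite3 module instead of MySQL, and the product names and cost values are made up for the example.

```python
import sqlite3

# In-memory database with a schema modelled on the question (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT UNIQUE);
    CREATE TABLE expenses (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES products(id),
        cost REAL, month INTEGER, year INTEGER
    );
    INSERT INTO products VALUES (1, 'paper'), (2, 'ink');
    INSERT INTO expenses (product_id, cost, month, year) VALUES
        (1, 10, 10, 2019), (1, 5, 10, 2019), (1, 30, 11, 2019),
        (2, 8, 10, 2019);
""")

# One pass over expenses: each month gets its own conditional SUM,
# and the outer query reuses the two sums instead of repeating them.
query = """
    SELECT description,
           costs_november,
           costs_october,
           costs_november - costs_october AS costs
    FROM (
        SELECT p.description,
               SUM(CASE WHEN e.month = 11 THEN e.cost ELSE 0 END) AS costs_november,
               SUM(CASE WHEN e.month = 10 THEN e.cost ELSE 0 END) AS costs_october
        FROM expenses e
        JOIN products p ON p.id = e.product_id
        WHERE e.year = 2019 AND e.month IN (10, 11)
        GROUP BY e.product_id
    ) t
    ORDER BY description
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A product with expenses in only one of the two months (here 'ink') still appears, with 0 for the missing month.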

You can calculate for all months and all products at one time:
SELECT product_id, year, month,
SUM(cost) as curr_month_costs,
LAG(SUM(cost)) OVER (PARTITION BY product_id ORDER BY year, month) as prev_month_costs,
(SUM(cost) -
LAG(SUM(cost)) OVER (PARTITION BY product_id ORDER BY year, month)
) as diff,
LAG(SUM(cost)) OVER (PARTITION BY product_id ORDER BY year, month) * 100 / SUM(cost) as percent_diff
FROM expenses e JOIN
products p
ON e.product_id = p.id
GROUP BY product_id, year, month
ORDER BY year, month, product_id;
You can use a subquery if you want to select only the current month.

Related

SQL: SELECT AS multiple value with the same FROM with different WHERE

So I have this code:
SELECT a.total_sales AS July, b.total_sales AS August, c.total_sales AS September
FROM
(SELECT EXTRACT(month FROM delivered_at) AS month, ROUND(SUM (sale_price),2) AS total_sales
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE status = 'Complete' AND delivered_at BETWEEN "2022-01-01" AND "2022-10-01"
GROUP BY month
ORDER BY month) a,
(SELECT EXTRACT(month FROM delivered_at) AS month, ROUND(SUM (sale_price),2) AS total_sales
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE status = 'Complete' AND delivered_at BETWEEN "2022-01-01" AND "2022-10-01"
GROUP BY month
ORDER BY month) b,
(SELECT EXTRACT(month FROM delivered_at) AS month, ROUND(SUM (sale_price),2) AS total_sales
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE status = 'Complete' AND delivered_at BETWEEN "2022-01-01" AND "2022-10-01"
GROUP BY month
ORDER BY month) c
WHERE a.month = 7 AND b.month = 8 AND c.month = 9
I got the result that I wanted, which is this:
Row July August September
1 148622.29 169310.62 209339.57
Is there a simpler way to do this?
We can reduce the 3 subqueries to 1 subquery:
SELECT
SUM(IF(t.month=7,t.total_sales,0)) AS July,
SUM(IF(t.month=8,t.total_sales,0)) AS August,
SUM(IF(t.month=9,t.total_sales,0)) AS September
FROM
(
SELECT EXTRACT(month FROM delivered_at) AS month, ROUND(SUM (sale_price),2) AS total_sales
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE status = 'Complete' AND delivered_at BETWEEN "2022-01-01" AND "2022-10-01"
AND EXTRACT(month FROM delivered_at) IN (7,8,9)
GROUP BY month
) t

Optimise MySQL - JOIN vs Nested query

I have been trying to optimise some SQL queries based on the assumption that Joining tables is more efficient than nesting queries. I am joining the same table multiple times to perform a different analysis on the data.
I have 2 tables:
transactions:
id | date_add | merchant_id | transaction_type | amount
1 1488733332 108 add 20.00
2 1488733550 108 remove 5.00
and a calendar table which just lists dates so that I can create empty records where there are no transactions on particular days:
calendar:
id | datefield
1 2017-03-01
2 2017-03-02
3 2017-03-03
4 2017-03-04
I have many thousands of rows in the transactions table, and I'm trying to get an annual summary of total and different types of transactions per month (i.e. 12 rows in total), where
transactions = sum of all "amount"s,
additions = sum of all "amounts" where transaction_type = "add"
redemptions = sum of all "amounts" where transaction_type = "remove"
result:
month | transactions | additions | redemptions
Jan 15 12 3
Feb 20 15 5
...
My initial query looks like this:
SELECT COALESCE(tr.transactions, 0) AS transactions,
COALESCE(ad.additions, 0) AS additions,
COALESCE(re.redemptions, 0) AS redemptions,
calendar.date
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date FROM calendar WHERE datefield LIKE '2017-%' GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar
LEFT JOIN (SELECT COUNT(transaction_type) as transactions, from_unixtime(date_add, '%b %Y') as date_t FROM transactions WHERE merchant_id = 108 GROUP BY from_unixtime(date_add, '%b %Y')) AS tr
ON calendar.date = tr.date_t
LEFT JOIN (SELECT COUNT(transaction_type = 'add') as additions, from_unixtime(date_add, '%b %Y') as date_a FROM transactions WHERE merchant_id = 108 AND transaction_type = 'add' GROUP BY from_unixtime(date_add, '%b %Y')) AS ad
ON calendar.date = ad.date_a
LEFT JOIN (SELECT COUNT(transaction_type = 'remove') as redemptions, from_unixtime(date_add, '%b %Y') as date_r FROM transactions WHERE merchant_id = 108 AND transaction_type = 'remove' GROUP BY from_unixtime(date_add, '%b %Y')) AS re
ON calendar.date = re.date_r
I tried optimising and cleaning it up a little, removing the nested statements and came up with this:
SELECT
DATE_FORMAT(cal.datefield, '%b %d') as date,
IFNULL(count(ct.amount),0) as transactions,
IFNULL(count(a.amount),0) as additions,
IFNULL(count(r.amount),0) as redemptions
FROM calendar as cal
LEFT JOIN transactions as ct ON cal.datefield = date(from_unixtime(ct.date_add)) && ct.merchant_id = 108
LEFT JOIN transactions as r ON r.id = ct.id && r.transaction_type = 'remove'
LEFT JOIN transactions as a ON a.id = ct.id && a.transaction_type = 'add'
WHERE cal.datefield like '2017-%'
GROUP BY month(cal.datefield)
I was surprised to see that the revised statement was about 20x slower than the original with my dataset. Have I missed some sort of logic? Is there a better way to achieve the same result with a more streamlined query, given I am joining the same table multiple times?
EDIT:
So to further explain the results I'm looking for - I'd like a single row for each month of the year (12 rows) each with a column for the total transactions, total additions, and total redemptions in each month.
With the first query I was getting a result in about 0.5 sec, but with the second I was getting results in 9.5 sec.
Looking at your query, you could use a single left join with case when:
SELECT COALESCE(t.transactions, 0) AS transactions,
COALESCE(t.additions, 0) AS additions,
COALESCE(t.redemptions, 0) AS redemptions,
calendar.date
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date
FROM calendar
WHERE datefield LIKE '2017-%'
GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar
LEFT JOIN
( select
COUNT(transaction_type) as transactions
, sum( case when transaction_type = 'add' then 1 else 0 end ) as additions
, sum( case when transaction_type = 'remove' then 1 else 0 end ) as redemptions
, from_unixtime(date_add, '%b %Y') as date_t
FROM transactions
WHERE merchant_id = 108
GROUP BY from_unixtime(date_add, '%b %Y') ) t ON calendar.date = t.date_t
First I would create a derived table with timestamp ranges for every month from your calendar table. This way a join with the transactions table will be efficient if date_add is indexed.
select month(c.datefield) as month,
unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from,
unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to
from calendar c
where c.datefield between '2017-01-01' and '2017-12-31'
group by month(c.datefield)
Join it with the transactions table and use conditional aggregation to get your data:
select c.month,
sum(t.amount) as transactions,
sum(case when t.transaction_type = 'add' then t.amount else 0 end) as additions,
sum(case when t.transaction_type = 'remove' then t.amount else 0 end) as redemptions
from (
select month(c.datefield) as m, date_format(c.datefield, '%b') as `month`,
unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from,
unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to
from calendar c
where c.datefield between '2017-01-01' and '2017-12-31'
group by month(c.datefield), date_format(c.datefield, '%b')
) c
left join transactions t on t.date_add between c.ts_from and c.ts_to
and t.merchant_id = 108 -- filter in the ON clause so months without transactions are kept
group by c.m, c.month
order by c.m
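One detail worth checking with any calendar-style left join is where the filter on the joined table goes. The sketch below is an illustration under simplifying assumptions: it uses Python's sqlite3 instead of MySQL, replaces the calendar table with a hypothetical three-row `months` table, and stores a month number directly on each transaction. It compares a merchant filter placed in the ON clause with the same filter placed in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in for the calendar table: one row per month.
    CREATE TABLE months (m INTEGER PRIMARY KEY);
    INSERT INTO months VALUES (1), (2), (3);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, m INTEGER,
        merchant_id INTEGER, transaction_type TEXT, amount REAL
    );
    INSERT INTO transactions (m, merchant_id, transaction_type, amount) VALUES
        (1, 108, 'add', 20), (1, 108, 'remove', 5),
        (3, 108, 'add', 7),
        (2, 999, 'add', 1);  -- a different merchant, the only row in month 2
""")

template = """
    SELECT c.m,
           COALESCE(SUM(t.amount), 0) AS transactions,
           COALESCE(SUM(CASE WHEN t.transaction_type = 'add' THEN t.amount END), 0) AS additions,
           COALESCE(SUM(CASE WHEN t.transaction_type = 'remove' THEN t.amount END), 0) AS redemptions
    FROM months c
    LEFT JOIN transactions t ON t.m = c.m {join_filter}
    {where_filter}
    GROUP BY c.m
    ORDER BY c.m
"""

# Filter inside ON: unmatched months survive as rows of zeros.
filter_in_on = conn.execute(
    template.format(join_filter="AND t.merchant_id = 108", where_filter="")
).fetchall()

# Filter inside WHERE: rows with no matching merchant are discarded after the join.
filter_in_where = conn.execute(
    template.format(join_filter="", where_filter="WHERE t.merchant_id = 108")
).fetchall()

print(filter_in_on)
print(filter_in_where)
```

With the filter in ON, month 2 is reported with zeros; with the filter in WHERE, month 2 disappears from the result, which defeats the purpose of joining from the calendar side.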

SQL Query to find rows that didn't occur this month

I am trying to find the number of sellers that made a sale last month but didn't make a sale this month.
I have a query that works, but I don't think it's efficient and I haven't figured out how to do this for all months.
SELECT count(distinct user_id) as users
FROM transactions
WHERE MONTH(date) = 12
AND YEAR(date) = 2015
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
AND transactions.user_id NOT IN
(
SELECT distinct user_id
FROM transactions
WHERE MONTH(date) = 1
AND YEAR(date) = 2016
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
)
The structure of the table is:
+---------+------------+-------------+--------+
| user_id | date | status | amount |
+---------+------------+-------------+--------+
| 1 | 2016-01-01 | 'COMPLETED' | 1.00 |
| 2 | 2015-12-01 | 'COMPLETED' | 1.00 |
| 3 | 2015-12-01 | 'COMPLETED' | 2.00 |
| 1 | 2015-12-01 | 'COMPLETED' | 3.00 |
+---------+------------+-------------+--------+
So in this case, users with IDs 2 and 3 didn't make a sale this month.
Use conditional aggregation:
SELECT count(*) as users
FROM
(
SELECT user_id
FROM transactions
-- 1st of previous month
WHERE date BETWEEN SUBDATE(SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1), interval 1 month)
-- end of current month
AND LAST_DAY(CURRENT_DATE)
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
GROUP BY user_id
-- any row from previous month
HAVING MAX(CASE WHEN date < SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1)
THEN date
END) IS NOT NULL
-- no row in current month
AND MAX(CASE WHEN date >= SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1)
THEN date
END) IS NULL
) AS dt
SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1) = first day of current month
SUBDATE(first day of current month, interval 1 month) = first day of previous month
LAST_DAY(CURRENT_DATE) = end of current month
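The three date expressions above can be mirrored outside the database to check the month boundaries, including the December/January rollover. This is a small sketch in Python's standard library; the helper name `month_bounds` is made up for the example.

```python
import calendar
from datetime import date, timedelta


def month_bounds(today):
    """Return (first day of previous month, first day of current month,
    last day of current month) for the given date."""
    # Same idea as SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE) - 1)
    first_of_current = today - timedelta(days=today.day - 1)
    # Step back one day to land in the previous month, then go to its first day.
    first_of_previous = (first_of_current - timedelta(days=1)).replace(day=1)
    # Same idea as LAST_DAY(CURRENT_DATE)
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    last_of_current = today.replace(day=days_in_month)
    return first_of_previous, first_of_current, last_of_current


print(month_bounds(date(2016, 1, 15)))
```

Because the bounds are real dates rather than month numbers, a January date correctly maps to December of the previous year.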
If you want to generalize it, you can use curdate() to get the current month and DATE_SUB(curdate(), INTERVAL 1 MONTH) to get last month (you will need to add a year check for the January/December case, though):
SELECT count(distinct user_id) as users
FROM transactions
WHERE MONTH(date) = MONTH(DATE_SUB(curdate(), INTERVAL 1 MONTH))
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
AND transactions.user_id NOT IN
(
SELECT distinct user_id
FROM transactions
WHERE MONTH(date) = MONTH(curdate())
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
)
As far as efficiency goes, I don't see a problem with this one.
The following should be pretty efficient. In order to make it even more so, you would need to provide the table definition and the EXPLAIN.
SELECT COUNT(DISTINCT t.user_id) users
FROM transactions t
LEFT
JOIN transactions x
ON x.user_id = t.user_id
AND x.date BETWEEN '2016-01-01' AND '2016-01-31'
AND x.status = 'COMPLETED'
AND x.amount > 0
WHERE t.date BETWEEN '2015-12-01' AND '2015-12-31'
AND t.status = 'COMPLETED'
AND t.amount > 0
AND x.user_id IS NULL;
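The LEFT JOIN ... IS NULL pattern (an anti-join) can be checked against the sample rows from the question. The sketch below is only an illustration: it runs in Python's sqlite3 rather than MySQL, and it writes the NOT IN version with date ranges instead of MONTH()/YEAR(), since SQLite has no such functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (user_id INTEGER, date TEXT, status TEXT, amount REAL);
    INSERT INTO transactions VALUES
        (1, '2016-01-01', 'COMPLETED', 1.00),
        (2, '2015-12-01', 'COMPLETED', 1.00),
        (3, '2015-12-01', 'COMPLETED', 2.00),
        (1, '2015-12-01', 'COMPLETED', 3.00);
""")

# Anti-join: keep December sellers for which no January row could be joined.
anti_join = """
    SELECT COUNT(DISTINCT t.user_id)
    FROM transactions t
    LEFT JOIN transactions x
           ON x.user_id = t.user_id
          AND x.date BETWEEN '2016-01-01' AND '2016-01-31'
          AND x.status = 'COMPLETED' AND x.amount > 0
    WHERE t.date BETWEEN '2015-12-01' AND '2015-12-31'
      AND t.status = 'COMPLETED' AND t.amount > 0
      AND x.user_id IS NULL
"""

# Equivalent NOT IN formulation, for comparison.
not_in = """
    SELECT COUNT(DISTINCT user_id)
    FROM transactions
    WHERE date BETWEEN '2015-12-01' AND '2015-12-31'
      AND status = 'COMPLETED' AND amount > 0
      AND user_id NOT IN (
          SELECT user_id FROM transactions
          WHERE date BETWEEN '2016-01-01' AND '2016-01-31'
            AND status = 'COMPLETED' AND amount > 0
      )
"""

lost_sellers_join = conn.execute(anti_join).fetchone()[0]
lost_sellers_not_in = conn.execute(not_in).fetchone()[0]
print(lost_sellers_join, lost_sellers_not_in)
```

Both queries count users 2 and 3. Comparing `date` against literal ranges, rather than wrapping it in MONTH() and YEAR(), also leaves the column usable by an index on `date`.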
Just some input for thought:
You could create aggregated lists of user-IDs per month, representing all the unique buyers in that month. In your application, you would then simply have to subtract the two months in question in order to get all user-IDs that have only made a sale in one of the two months.
See below for query- and post-processing-examples.
In order to make your query efficient, I would recommend at least a 2-column index for table transactions on [status, amount]. However, in order to prevent the query from having to look up data in the actual table, you could even create a 4-column index [status, amount, date, user_id], which should further improve the performance of your query.
Postgres (v9.0+, tested)
SELECT (DATE_PART('year', t.date) || '-' || DATE_PART('month', t.date)) AS d,
STRING_AGG( DISTINCT t.user_id::TEXT, ',' ) AS buyers
FROM transactions t
WHERE t.status = 'COMPLETED'
AND t.amount > 0
GROUP BY DATE_PART('year', t.date),
DATE_PART('month', t.date)
ORDER BY DATE_PART('year', t.date),
DATE_PART('month', t.date)
;
MySQL (not tested)
SELECT CONCAT(YEAR(t.date), '-', MONTH(t.date)) AS d,
GROUP_CONCAT( DISTINCT t.user_id ) AS buyers
FROM transactions t
WHERE t.status = 'COMPLETED'
AND t.amount > 0
GROUP BY YEAR(t.date), MONTH(t.date)
ORDER BY YEAR(t.date), MONTH(t.date)
;
Ruby (example for post-processing)
db_result = ActiveRecord::Base.connection_pool.with_connection { |con| con.execute( db_query ) }
unique_buyers = db_result.map{|e|[e['d'],e['buyers'].split(',')]}.to_h
buyers_dec15_but_not_jan16 = unique_buyers['2015-12'] - unique_buyers['2016-1']
buyers_nov15_but_not_dec15 = (unique_buyers['2015-11'] || []) - unique_buyers['2015-12']
...(and so on)...

showing previous and current month data in table using mysql

I am trying to show three different figures from the same column in a MySQL query. I would like to keep one month static (April), so I want to show the current month, the previous month, and the static month of the year I'm working with; in this case let us stick with 2012.
Example
Tablename:payment
id , pay_date, amount
1 2012-02-12 1000
2 2012-03-11 780
3 2012-04-15 890
4 2012-05-12 1200
5 2012-06-12 1890
6 2012-07-12 1350
7 2012-08-12 1450
So what I want to do is show the amount column for three months: April, which I want to keep static (890); the current month, say August (1450); and the previous month, July (1350). The final result would be something like this:
april_amount current_month_amount previous_month_amount
890 1450 1350
However I'm stuck here:
select amount as april_amount
from payment
where monthname(pay_date) LIKE 'April'
and year(pay_date) LIKE 2012
I hope the question is written clearly enough. Thanks a lot for the help, much appreciated.
If the results can be rows instead of columns:
SELECT MONTHNAME(pay_date), amount FROM payment
WHERE pay_date BETWEEN '2012-04-01'
AND '2012-04-30'
OR pay_date BETWEEN CURRENT_DATE
- INTERVAL DAYOFMONTH(CURRENT_DATE) - 1 DAY
AND LAST_DAY(CURRENT_DATE)
OR pay_date BETWEEN CURRENT_DATE
- INTERVAL DAYOFMONTH(CURRENT_DATE) - 1 DAY
- INTERVAL 1 MONTH
AND LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH)
See it on sqlfiddle.
I might be way off here. But try:
select top 1
p.amount, c.amount, n.amount
from payment c
inner join payment p ON p.pay_date < c.pay_date
inner join payment n ON n.pay_date > c.pay_date
where monthname(c.pay_date) LIKE 'April'
and year(c.pay_date) LIKE 2012
order by p.pay_date DESC, n.pay_date ASC
EDIT: I didn't read your question properly. I was going for previous, current, and next month. Give me a minute and I'll try again.
select top 1
p.amount AS april_amount, c.amount AS current_month_amount, n.amount AS previous_month_amount
from payment c
inner join payment p ON monthname(p.pay_date) = 'April' AND year(p.pay_date) = 2012
inner join payment n ON n.pay_date > c.pay_date
where monthname(c.pay_date) = monthname(curdate())
and year(c.pay_date) = year(curdate())
order by n.pay_date ASC
This assumes there is only 1 entry per month.
OK, so I haven't written MySQL for a while. Here is what worked for your example data:
select
p.amount AS april_amount, c.amount AS current_month_amount, n.amount AS previous_month_amount
from payment AS c
inner join payment AS p ON monthname(p.pay_date) LIKE 'April' AND year(p.pay_date) LIKE 2012
inner join payment AS n ON n.pay_date < c.pay_date
where monthname(c.pay_date) LIKE monthname(curdate())
and year(c.pay_date) LIKE year(curdate())
order by n.pay_date DESC
limit 1
The joined previous-month table is counterintuitively named n, but this works. I verified it in a WAMP install.
To handle aggregates per month you can use subselects. Performance may suffer on very large tables (millions of rows or more).
SELECT SUM( a.amount ) AS april_amount,
(
SELECT SUM( c.amount )
FROM payment c
WHERE MONTH( c.pay_date ) = MONTH( CURDATE( ) )
) AS current_month_amount,
(
SELECT SUM( p.amount )
FROM payment p
WHERE MONTH( p.pay_date ) = MONTH( CURDATE( ) - INTERVAL 1
MONTH )
) AS previous_month_amount
FROM payment a
WHERE MONTHNAME( a.pay_date ) = 'April'
AND YEAR( a.pay_date ) =2012

SQL query with subqueries performing terribly

I have this quite long query that should give me some information about shipments, and it works, but it's performing terribly. It takes about 4500ms to load.
SELECT
DATE(paid_at) AS day,
COUNT(*) as order_count,
(
SELECT COUNT(*) FROM line_items
WHERE order_id IN (SELECT id from orders WHERE DATE(paid_at) = day)
) as product_count,
(
SELECT COUNT(*) FROM orders
WHERE shipping_method = 'colissimo'
AND DATE(paid_at) = day
AND state IN ('paid','shipped','completed')
) as orders_co,
(
SELECT COUNT(*) FROM orders
WHERE shipping_method = 'colissimo'
AND DATE(paid_at) = day
AND state IN ('paid','shipped','completed')
AND paid_amount < 70
) as co_less_70,
(
SELECT COUNT(*) FROM orders
WHERE shipping_method = 'colissimo'
AND DATE(paid_at) = day
AND state IN ('paid','shipped','completed')
AND paid_amount >= 70
) as co_plus_70,
(
SELECT COUNT(*) FROM orders
WHERE shipping_method = 'mondial_relais'
AND DATE(paid_at) = day
AND state IN ('paid','shipped','completed')
) as orders_mr,
(
SELECT COUNT(*) FROM orders
WHERE shipping_method = 'mondial_relais'
AND DATE(paid_at) = day
AND state IN ('paid','shipped','completed')
AND paid_amount < 70
) as mr_less_70,
(
SELECT COUNT(*) FROM orders
WHERE shipping_method = 'mondial_relais'
AND DATE(paid_at) = day
AND state IN ('paid','shipped','completed')
AND paid_amount >= 70
) as mr_plus_70
FROM orders
WHERE MONTH(paid_at) = 11
AND YEAR(paid_at) = 2011
AND state IN ('paid','shipped','completed')
GROUP BY day;
Any idea what I could be doing wrong or what I could be doing better? I have other queries of similar length that don't take as much time to load as this. I thought this would be faster than for example having an individual query for each day (in my programming instead of the SQL query).
It is because you are using sub-queries where you don't need them.
As a general rule, where you have a sub-query within a main SELECT clause, that sub-query will query the tables within it once for each row in the main SELECT clause - so if you have 7 subqueries and are selecting a date range of 30 days, you will effectively be running 210 separate subqueries (plus your main query).
(Some query optimisers can resolve sub-queries into the main query under some circumstances, but as a general rule you can't rely on this.)
In this case, you don't need any of the orders sub-queries, because all the orders data you require is included in the main query - so you can rewrite this as:
SELECT
DATE(paid_at) AS day,
COUNT(*) as order_count,
(
SELECT COUNT(*) FROM line_items
WHERE order_id IN (SELECT id from orders WHERE DATE(paid_at) = day)
) as product_count,
sum(case when shipping_method = 'colissimo' then 1 end) as orders_co,
sum(case when shipping_method = 'colissimo' AND
paid_amount < 70 then 1 end) as co_less_70,
sum(case when shipping_method = 'colissimo' AND
paid_amount >= 70 then 1 end) as co_plus_70,
sum(case when shipping_method = 'mondial_relais' then 1 end) as orders_mr,
sum(case when shipping_method = 'mondial_relais' AND
paid_amount < 70 then 1 end) as mr_less_70,
sum(case when shipping_method = 'mondial_relais' AND
paid_amount >= 70 then 1 end) as mr_plus_70
FROM orders
WHERE MONTH(paid_at) = 11
AND YEAR(paid_at) = 2011
AND state IN ('paid','shipped','completed')
GROUP BY day;
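The SUM+CASE rewrite can be tried on a handful of rows. This is a sketch under stated assumptions, not the original schema: it uses Python's sqlite3 instead of MySQL, the sample orders are invented, the line_items subquery is left out, `ELSE 0` is added so that a day with no matching orders shows 0 instead of NULL, and the month filter is written as a date range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, paid_at TEXT,
        shipping_method TEXT, state TEXT, paid_amount REAL
    );
    INSERT INTO orders (paid_at, shipping_method, state, paid_amount) VALUES
        ('2011-11-01 09:00:00', 'colissimo',      'paid',      50),
        ('2011-11-01 10:00:00', 'colissimo',      'shipped',   90),
        ('2011-11-01 11:00:00', 'mondial_relais', 'completed', 20),
        ('2011-11-02 12:00:00', 'mondial_relais', 'paid',      80);
""")

# A single scan of orders: every former subquery becomes a conditional SUM.
query = """
    SELECT DATE(paid_at) AS day,
           COUNT(*) AS order_count,
           SUM(CASE WHEN shipping_method = 'colissimo' THEN 1 ELSE 0 END) AS orders_co,
           SUM(CASE WHEN shipping_method = 'colissimo'
                     AND paid_amount < 70 THEN 1 ELSE 0 END) AS co_less_70,
           SUM(CASE WHEN shipping_method = 'mondial_relais' THEN 1 ELSE 0 END) AS orders_mr
    FROM orders
    WHERE paid_at >= '2011-11-01' AND paid_at < '2011-12-01'
      AND state IN ('paid', 'shipped', 'completed')
    GROUP BY day
    ORDER BY day
"""
daily = conn.execute(query).fetchall()
for row in daily:
    print(row)
```

Each day is produced from one pass over the filtered rows, instead of one extra lookup per subquery per day.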
The problem with your query is that it scans the same table over and over. All scans (selects in your case) of the orders table can be transformed into multiple SUM+CASE or COUNT+CASE expressions, as in SQL query with count and case statement.