Is this select subquery avoidable? - mysql

I have two tables (Invoices and taxes) in mysql:
Invoices:
- id
- account_id
- issued_at
- total
- gross_amount
- country
Taxes:
- id
- invoice_id
- tax_name
- tax_rate
- taxable_amount
- tax_amount
I'm trying to retrieve a report like this:
rep_month | country | total_amount | tax_name | tax_rate(%) | taxable_amount | tax_amount
--------------------------------------------------------------------------------------
2017-01-01 | ES | 1000 | TAX1 | 21 | 700 | 147
2017-01-01 | ES | 1000 | TAX2 | -15 | 700 | 105
2016-12-01 | FR | 100 | TAX4 | 20 | 30 | 6
2016-12-01 | FR | 100 | B2B | 0 | 70 | 0
2017-01-01 | GB | 2500 | TAX3 | 20 | 1000 | 200
The idea behind this is that an invoice has a has_many relation with taxes, so an invoice may or may not have taxes. The report should show the total amount collected (total_amount) for a given country (regardless of whether it includes taxes)
and indicate which part of that total amount is taxable (taxable_amount) for a specific tax.
My current approach is this one:
SELECT
  DATE_FORMAT(invoices.issued_at, '%Y-%m-01') AS rep_month,
  invoices.country AS country,
  ( SELECT SUM(docs.gross_amount)
    FROM invoices AS docs
    WHERE docs.country = invoices.country
    AND DATE_FORMAT(docs.issued_at, '%Y-%m-01') = rep_month
  ) AS total_amount,
  taxes.tax_name AS tax_name,
  taxes.tax_rate AS tax_rate,
  SUM(taxes.taxable_amount) AS taxable_amount,
  SUM(taxes.tax_amount) AS tax_amount
FROM invoices
JOIN taxes ON invoices.id = taxes.invoice_id
  AND invoices.issued_at BETWEEN '2016-01-01' AND '2017-12-31'
GROUP BY account_id, rep_month, country, tax_name, tax_rate
ORDER BY country DESC
Well, this works but for a real dataset (thousands of records) it's really slow as the select subquery for retrieving the total_amount is being run for each row of the report.
I cannot make a LEFT JOIN taxes with a direct SUM(gross_amount) as the GROUP BY groups by tax name and rate and I need to show the total collected per country regardless if the amount was taxed or not. Is there a faster alternative to this?

I don't know the exact use case for this query, but the underlying problem is that you're trying to compute the entire report in one go.
Ideally, you should run the query you have once, store its result in a separate table (a summary table), and then query the summary table directly whenever you need the report. When a new entry is added to the Invoices table, you can either refresh the summary table on every insert or update it periodically via a cron job.
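As a rough illustration of the summary-table idea, here is a minimal sketch. It runs on SQLite through Python's sqlite3 module only so that it is self-contained, which means it uses strftime() where MySQL would use DATE_FORMAT(). The table name monthly_country_totals and the sample rows are assumptions made up for the example, and account_id is left out of the GROUP BY to keep the output small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY, account_id INTEGER,
                       issued_at TEXT, total REAL, gross_amount REAL, country TEXT);
CREATE TABLE taxes (id INTEGER PRIMARY KEY, invoice_id INTEGER, tax_name TEXT,
                    tax_rate REAL, taxable_amount REAL, tax_amount REAL);

-- Made-up sample data; invoice 2 has no taxes at all.
INSERT INTO invoices VALUES
  (1, 1, '2017-01-10', 700, 700, 'ES'),
  (2, 1, '2017-01-20', 300, 300, 'ES'),
  (3, 2, '2016-12-05', 100, 100, 'FR');
INSERT INTO taxes VALUES
  (1, 1, 'TAX1', 21, 700, 147),
  (2, 3, 'TAX4', 20, 30, 6),
  (3, 3, 'B2B', 0, 70, 0);

-- Hypothetical summary table: one row per (month, country), computed once
-- instead of once per report row.
CREATE TABLE monthly_country_totals AS
SELECT strftime('%Y-%m-01', issued_at) AS rep_month,
       country,
       SUM(gross_amount) AS total_amount
FROM invoices
GROUP BY strftime('%Y-%m-01', issued_at), country;
""")

# The report joins the tax aggregates to the precomputed totals.
report = conn.execute("""
SELECT t.rep_month, t.country, t.total_amount,
       x.tax_name, x.tax_rate,
       SUM(x.taxable_amount) AS taxable_amount,
       SUM(x.tax_amount) AS tax_amount
FROM invoices i
JOIN taxes x ON x.invoice_id = i.id
JOIN monthly_country_totals t
  ON t.country = i.country
 AND t.rep_month = strftime('%Y-%m-01', i.issued_at)
GROUP BY t.rep_month, t.country, t.total_amount, x.tax_name, x.tax_rate
ORDER BY t.country DESC, x.tax_name
""").fetchall()

for row in report:
    print(row)
```

The ES total still includes the untaxed invoice, because the totals are computed from invoices alone before the join with taxes. The same SELECT used to build the summary table could also be inlined as a derived table if a persistent table is not wanted.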

Related

How can I calculate prices based on currency table in one select?

I have a table of invoices that can be in multiple currencies that looks like this:
| id | issue_date | total | currency |
|----|------------|-------|----------|
| 1 | 2020-04-20 | 1234 | EUR |
| 2 | 2020-04-26 | 2345 | USD |
| 1 | 2020-04-27 | 9876 | EUR |
| 3 | 2020-04-28 | 3456 | RON |
And I have a currency table that holds currency exchange rates that looks like this:
| id | date | currency_id | rate |
|----|------------|-------------|---------|
| 1 | 2020-04-20 | EUR | 1 |
| 2 | 2020-04-20 | USD | 1.08600 |
| 3 | 2020-04-20 | RON | 4.83560 |
What I would like to achieve is to calculate each invoice price based on its issue_date, currency and a target currency.
All currency exchange rates are based on EUR, so its value will always be 1. Currencies are updated daily, but some dates are missing (exchange rates don't update during the weekend), so the calculation needs to be based on the most recent exchange rate on or before invoice.issue_date.
So what I tried was this:
SELECT
`i`.`id`,
`i`.`total`,
`i`.`currency`,
`exr1`.`rate` as `invoice_rate`,
`exr2`.`rate` AS `target_rate`,
`i`.`total` * `exr1`.`rate` as `euro_price`,
`i`.`total` * `exr1`.`rate` / `exr2`.`rate` AS `target_price`
FROM `invoices` as `i`
LEFT JOIN `exchange_rates` AS `exr1`
ON
`exr1`.`currency_id` = `i`.`currency` AND
`exr1`.`date` = `i`.`issue_date`
LEFT JOIN `exchange_rates` as `exr2`
ON
`exr2`.`currency_id` = 'RON' AND
`exr2`.`date` = `i`.`issue_date`
GROUP BY
`i`.`id`,
`invoice_rate`,
`target_rate`
ORDER BY `i`.`issue_date` DESC
Problem nr. 1
Because there are no exchange rates for the exact invoice dates, I get null values. I tried changing the LEFT JOIN ON condition to something like exr1.date <= i.issue_date, but then the GROUP BY on invoice doesn't work anymore (I get duplicates).
Problem nr. 2
For rows that have exchange rates on that exact day I get wrong values because based on the target currency I need to either multiply or divide:
i.total * exr1.rate * exr2.rate AS usd_price vs i.total * exr1.rate / exr2.rate AS usd_price
https://www.db-fiddle.com/f/e5GnVnry5sAiXwbuScV6JT/19
This is a (rare) case where a dependent subquery is the way to go. Here's the overall query (https://www.db-fiddle.com/f/e5GnVnry5sAiXwbuScV6JT/21)
SELECT id,
total,
currency,
rate,
total / rate euro_price
FROM ( SELECT i.id,
i.total,
i.currency,
(SELECT e.rate
FROM exchange_rates e
WHERE e.currency_id = i.currency
AND e.date <= i.issue_date
ORDER BY e.date DESC
LIMIT 1) rate
FROM invoices i
) d
The dependent subquery is this:
SELECT e.rate
FROM exchange_rates e
WHERE e.currency_id = i.currency
AND e.date <= i.issue_date
ORDER BY e.date DESC
LIMIT 1
It finds the exchange rate for the most recent date equal to or before the issue_date. It's called dependent because it refers to column values in its outer query.
This isn't going to be fast. A covering index on exchange_rates(currency_id, date DESC, rate) will help, like this:
CREATE INDEX lookup ON exchange_rates(currency_id, date DESC, rate);
I used a nested query so the outer query can simply refer to rate as a column when it needs to, rather than repeating the dependent subquery.
Also note I think you want to divide, not multiply, when computing your euro_price.
I left the second rate lookup to you.
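For readers who want to see the second lookup, here is one possible sketch. It assumes the rates are quoted as units of currency per 1 EUR (as in the sample table), so the target price is total / rate * target_rate. It runs on SQLite via Python's sqlite3 module for self-containment; the USD row dated 2020-04-27, the invoice ids, and the :target parameter name are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, issue_date TEXT, total REAL, currency TEXT)")
conn.execute("CREATE TABLE exchange_rates (id INTEGER, date TEXT, currency_id TEXT, rate REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?)", [
    (1, "2020-04-20", 1234.0, "EUR"),
    (2, "2020-04-26", 2345.0, "USD"),
    (3, "2020-04-28", 3456.0, "RON"),
])
conn.executemany("INSERT INTO exchange_rates VALUES (?, ?, ?, ?)", [
    (1, "2020-04-20", "EUR", 1.0),
    (2, "2020-04-20", "USD", 1.086),
    (3, "2020-04-20", "RON", 4.8356),
    (4, "2020-04-27", "USD", 1.1),  # hypothetical later rate; must not apply to invoice 2
])

SQL = """
SELECT id, currency, rate, target_rate,
       total / rate               AS euro_price,
       total / rate * target_rate AS target_price
FROM (
    SELECT i.id, i.total, i.currency,
           -- most recent rate of the invoice currency on or before issue_date
           (SELECT e.rate FROM exchange_rates e
             WHERE e.currency_id = i.currency AND e.date <= i.issue_date
             ORDER BY e.date DESC LIMIT 1) AS rate,
           -- same lookup, but for the target currency
           (SELECT e.rate FROM exchange_rates e
             WHERE e.currency_id = :target AND e.date <= i.issue_date
             ORDER BY e.date DESC LIMIT 1) AS target_rate
    FROM invoices i
) d
ORDER BY id
"""

rows = conn.execute(SQL, {"target": "RON"}).fetchall()
for row in rows:
    print(row)
```

Invoice 2 is dated 2020-04-26, so it picks up the USD rate from 2020-04-20 rather than the later one, and the RON invoice converts back to roughly its own total.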
**Pro tip:** Only use the backtick marks when your table or column name is a reserved word in the query language. Your queries are MUCH easier to read without them.

MySQL - Difference Between First and Last Record in Group

I have a table called updates which has the distance of a vehicle at the captured_at date. Using MySQL, how can I get the SUM of differences between the first captured update and the latest captured update per vehicle?
updates table:
id | vehicle_id | distance | captured_at
1 | 1 | 100 | 2018-02-10
2 | 1 | 50 | 2018-02-05
3 | 1 | 75 | 2018-02-07
4 | 2 | 200 | 2018-02-07
5 | 2 | 300 | 2018-02-09
The result I'm expecting is:
(100-50) + (300-200) = 150
One thing to keep in mind is that a bigger ID does not necessarily mean that it's the latest update as you can see in the example above.
(Comment: naming your tables with reserved words is a bad idea)
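Getting the smallest and largest values is trivial (this assumes the distance never decreases over time, so the first update holds the smallest value and the latest update holds the largest):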
SELECT vehicle_id, MAX(distance) - MIN(distance)
FROM `updates`
GROUP BY vehicle_id;
Adding these values is trivial when you know that a SELECT query can be used in place of a table, but you also need to create aliases for the aggregated attributes:
SELECT SUM(diff)
FROM (
SELECT vehicle_id, MAX(distance) - MIN(distance) AS diff
FROM `updates`
GROUP BY vehicle_id
) AS src
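If the distance cannot be assumed to grow with time, the first and latest rows have to be picked by captured_at instead of by value. The following is a sketch of one way to do that, run on SQLite through Python's sqlite3 module so it is self-contained. It assumes at most one update per vehicle per captured_at value; the aliases bounds, first_u and last_u are names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE updates (id INTEGER, vehicle_id INTEGER, distance INTEGER, captured_at TEXT)"
)
# Sample rows from the question.
conn.executemany("INSERT INTO updates VALUES (?, ?, ?, ?)", [
    (1, 1, 100, "2018-02-10"),
    (2, 1, 50, "2018-02-05"),
    (3, 1, 75, "2018-02-07"),
    (4, 2, 200, "2018-02-07"),
    (5, 2, 300, "2018-02-09"),
])

SQL = """
SELECT SUM(last_u.distance - first_u.distance)
FROM (
    -- earliest and latest capture date per vehicle
    SELECT vehicle_id,
           MIN(captured_at) AS first_at,
           MAX(captured_at) AS last_at
    FROM updates
    GROUP BY vehicle_id
) AS bounds
JOIN updates AS first_u
  ON first_u.vehicle_id = bounds.vehicle_id AND first_u.captured_at = bounds.first_at
JOIN updates AS last_u
  ON last_u.vehicle_id = bounds.vehicle_id AND last_u.captured_at = bounds.last_at
"""

total = conn.execute(SQL).fetchone()[0]
print(total)  # (100 - 50) + (300 - 200)
```

On the sample data this gives the same 150 as the MAX/MIN version; the two only diverge when a vehicle's distance goes down between updates.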

Query SUM mysql with if condition

I have a table in a MySQL database like this.
id | product | warehouse | price | date_shipping
------------------------------------------------
1 | Salt | 15 | 300 | 2017-03-08
2 | Salt | 15 | 300 | 2017-03-09
I want to SUM the price column under two conditions. This is the first condition.
For the product Salt, if the rows have the same warehouse id, I don't want to SUM the price values.
This is my second condition.
id | product | warehouse | price | date_shipping
------------------------------------------------
1 | Salt | 15 | 300 | 2017-03-08
2 | Salt | 18 | 300 | 2017-03-09
For the product Salt, if the rows have different warehouse ids, I want to SUM the price values.
This is the result I want from the query.
From first condition :
salt | 15 | 300
From second condition :
salt | 15,18 | 600
This is the query I have so far.
SELECT product, GROUP_CONCAT(warehouse SEPARATOR ',') as warehouse, SUM(price)
FROM db_product
Can somebody help me with this? Thank you.
Your problem requires two aggregations. The first one identifies, for each product in a warehouse, the record having the earliest shipping date. Then, a second aggregation is needed to group concatenate the warehouses for each product.
SELECT t.product,
GROUP_CONCAT(t.warehouse) AS warehouses,
SUM(t.price) AS total
FROM
(
SELECT t1.product,
t1.warehouse,
t1.price
FROM db_product t1
INNER JOIN
(
SELECT product, warehouse, MIN(date_shipping) AS min_date_shipping
FROM db_product
GROUP BY product, warehouse
) t2
ON t1.product = t2.product AND
t1.warehouse = t2.warehouse AND
t1.date_shipping = t2.min_date_shipping
) t
GROUP BY t.product
Demo here: Rextester
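To check the two-step aggregation against both sample tables from the question, here is a small sketch that runs the same query on SQLite through Python's sqlite3 module (chosen only for self-containment). The helper name summarize is an assumption made for the example. Note that the order of values inside GROUP_CONCAT is not guaranteed unless it is specified explicitly.

```python
import sqlite3

QUERY = """
SELECT t.product,
       GROUP_CONCAT(t.warehouse) AS warehouses,
       SUM(t.price) AS total
FROM (
    -- keep only the earliest shipment per (product, warehouse)
    SELECT t1.product, t1.warehouse, t1.price
    FROM db_product t1
    INNER JOIN (
        SELECT product, warehouse, MIN(date_shipping) AS min_date_shipping
        FROM db_product
        GROUP BY product, warehouse
    ) t2
      ON t1.product = t2.product
     AND t1.warehouse = t2.warehouse
     AND t1.date_shipping = t2.min_date_shipping
) t
GROUP BY t.product
"""


def summarize(rows):
    """Load rows into a fresh in-memory db_product table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE db_product (id INTEGER, product TEXT, warehouse INTEGER,"
        " price INTEGER, date_shipping TEXT)"
    )
    conn.executemany("INSERT INTO db_product VALUES (?, ?, ?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


# First condition: both rows share warehouse 15.
same_warehouse = summarize([
    (1, "Salt", 15, 300, "2017-03-08"),
    (2, "Salt", 15, 300, "2017-03-09"),
])
# Second condition: the rows come from warehouses 15 and 18.
different_warehouses = summarize([
    (1, "Salt", 15, 300, "2017-03-08"),
    (2, "Salt", 18, 300, "2017-03-09"),
])
print(same_warehouse)
print(different_warehouses)
```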

Getting COUNT of Specific set of Customers - MySQL Query - Is there a faster way?

I am trying to do a custom report right now. It involves running this query over 50 times for different date conditions.
Anyway, this report revolves around two tables:
agreement
(a list of customers' promises to pay - tied to customer table by customer.id = agreement.customer_id)
|----|-------------|---------------------|--------|----------|
| id | customer_id | entered_timestamp | amount | campaign |
|----|-------------|---------------------|--------|----------|
| 1 | 123 | 2015-12-22 13:12:00 | 30 | 'xyz' |
|----|-------------|---------------------|--------|----------|
| 2 | 400 | 2015-12-22 13:15:00 | 20 | 'abc' |
|----|-------------|---------------------|--------|----------|
previous_customer_ids
(a list of customer ids that have at least one paid agreement - tied to customer table by customer.id = previous_customer_ids.customer_id)
|----|-------------|
| id | customer_id |
|----|-------------|
| 1 | 123 |
|----|-------------|
I am trying to get a count of all unique customer_ids whose most recent agreement was in January or July for a certain campaign and who also exist in previous_customer_ids.
I was able to figure out how to get a list of each customer's most recent agreement who exists in previous_customer_ids, and get a count of that number of customers.
However, the query takes 35 seconds to run. I have to run it 60 times over each time this report is pulled (using php to display the results).
select count(t1.customer_id)
from agreement t1
inner join (
select customer_id, max(entered_timestamp) as latestOrder
from agreement
where campaign = 'vsf'
group by customer_id
) t2
inner join previous_customer_ids pcids
on t1.customer_id = pcids.customer_id
where t1.customer_id = t2.customer_id
AND t1.entered_timestamp= t2.latestOrder
AND (substr(t1.entered_timestamp,6,2) = '01'
OR substr(t1.entered_timestamp,6,2) = '07')
How to optimize this?

MySQL - count multiple tables by day

I have various tables that track stuff being created within the system, sales, customer accounts, etc, and they all have created times on them. I can summarize any one of these on a per day basis with the following query:
select date(created_time), count(*) from customers group by date(created_time)
Which produces output like:
+--------------------+----------+
| date(created_time) | count(*) |
+--------------------+----------+
| 2012-10-12 | 15 |
| 2012-10-13 | 4 |
That gets the job done although it does skip over days when nothing happened.
However what I'd like to do is generate the same thing for multiple tables at once, producing something like:
+--------------------+--------------+------------------+
| date(created_time) | count(sales) | count(customers) |
+--------------------+--------------+------------------+
| 2012-10-12 | 15 | 1 |
| 2012-10-13 | 4 | 3 |
I could run the query separately for each table and join them by hand, but the skipping 0 days makes that join difficult.
Is there a way I can do this in a single mysql query?
Try this:
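SELECT created_date, SUM(customers) AS customers, SUM(sales) AS sales
FROM (SELECT DATE(created_time) AS created_date, COUNT(*) AS customers, 0 AS sales
      FROM customers
      -- group by the date expression, not the raw created_time column
      GROUP BY DATE(created_time)
      -- UNION ALL keeps every row; plain UNION would discard duplicate rows
      UNION ALL
      SELECT DATE(created_time) AS created_date, 0 AS customers, COUNT(*) AS sales
      FROM sales
      GROUP BY DATE(created_time)
     ) AS A
GROUP BY created_date;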
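Here is a small runnable sketch of this approach, using SQLite through Python's sqlite3 module so it needs no server. The sample timestamps are made up, the sketch uses UNION ALL rather than UNION, and it groups by the date expression explicitly. Days on which neither table has a row are still missing from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, created_time TEXT)")
conn.execute("CREATE TABLE sales (id INTEGER, created_time TEXT)")
# Made-up sample data.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [
    (1, "2012-10-12 09:00:00"),
    (2, "2012-10-12 10:00:00"),
    (3, "2012-10-13 08:00:00"),
])
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    (1, "2012-10-12 11:00:00"),
    (2, "2012-10-14 12:00:00"),
])

SQL = """
SELECT day, SUM(customers) AS customers, SUM(sales) AS sales
FROM (
    -- one row per day per table, padded with 0 for the other table's count
    SELECT date(created_time) AS day, COUNT(*) AS customers, 0 AS sales
    FROM customers
    GROUP BY date(created_time)
    UNION ALL
    SELECT date(created_time) AS day, 0 AS customers, COUNT(*) AS sales
    FROM sales
    GROUP BY date(created_time)
) AS a
GROUP BY day
ORDER BY day
"""

rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```

A day that appears in only one table still shows up, with 0 in the other column.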