I have two tables that track sales:
orders
------
id
customer_id
created_datetime

order_line_items
----------------
id
order_id
item_id
quantity
was_paid_for
An order can have many order_line_items. For some orders, all of the line items have been paid for. For other orders, they have not all been paid for.
I am trying to fetch a list of all the orders for a specific customer, and indicate if the order was fully paid for, or not. I have it working with this query:
SELECT o.id,
(SELECT count(*) from order_line_items WHERE order_id = o.id AND was_paid_for = 0) = 0 as isFullyPaid
FROM orders o
WHERE o.customer_id = 12345
However, some customers have 1000+ orders and the query takes 70 seconds to run (this is a simplified example; the real one joins in five other tables).
Are indexes the only way to speed this up? Thanks.
You could try adding the following index to the order_line_items table:
CREATE INDEX idx ON order_line_items(order_id, was_paid_for);
That said, you could also try the following join version of your query:
SELECT o.id, COALESCE(oli.cnt, 0) = 0 AS isFullyPaid
FROM orders o
LEFT JOIN
(
SELECT order_id, COUNT(*) AS cnt
FROM order_line_items
WHERE was_paid_for = 0
GROUP BY order_id
) oli
ON oli.order_id = o.id
WHERE o.customer_id = 12345;
The same index suggestion applies to the above join query.
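To check that the join rewrite returns the same answer as the correlated subquery, here is a small sketch. It runs both queries against an in-memory SQLite database (standing in for MySQL) with made-up rows, after creating the suggested index. It only compares results and says nothing about MySQL timings. `ORDER BY o.id` is added for a stable result.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the sample rows are made up.
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id INTEGER, created_datetime TEXT);
CREATE TABLE order_line_items (
    id INTEGER PRIMARY KEY, order_id INTEGER, item_id INTEGER,
    quantity INTEGER, was_paid_for INTEGER);
-- The composite index from the answer: the unpaid lines of an order can
-- be located through (order_id, was_paid_for).
CREATE INDEX idx ON order_line_items(order_id, was_paid_for);
"""

# Original correlated-subquery version.
CORRELATED = """
SELECT o.id,
       (SELECT COUNT(*) FROM order_line_items
         WHERE order_id = o.id AND was_paid_for = 0) = 0 AS isFullyPaid
FROM orders o
WHERE o.customer_id = ?
ORDER BY o.id
"""

# Join version: count unpaid lines per order once, then LEFT JOIN so that
# orders with no unpaid lines get NULL, which COALESCE turns into 0.
JOINED = """
SELECT o.id, COALESCE(oli.cnt, 0) = 0 AS isFullyPaid
FROM orders o
LEFT JOIN (
    SELECT order_id, COUNT(*) AS cnt
    FROM order_line_items
    WHERE was_paid_for = 0
    GROUP BY order_id
) oli ON oli.order_id = o.id
WHERE o.customer_id = ?
ORDER BY o.id
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        (1, 12345, "2024-01-01"),
        (2, 12345, "2024-01-02"),
        (3, 99999, "2024-01-03"),
    ])
    conn.executemany("INSERT INTO order_line_items VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 10, 1, 1), (2, 1, 11, 2, 1),  # order 1: all lines paid
        (3, 2, 10, 1, 1), (4, 2, 12, 1, 0),  # order 2: one unpaid line
        (5, 3, 10, 1, 0),                    # order 3: another customer
    ])
    return conn


def fully_paid(conn, sql, customer_id):
    # Returns (order id, 1 if fully paid else 0) pairs.
    return conn.execute(sql, (customer_id,)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(fully_paid(conn, CORRELATED, 12345))  # [(1, 1), (2, 0)]
    print(fully_paid(conn, JOINED, 12345))      # [(1, 1), (2, 0)]
```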
Related
I'm currently studying, and I received a task to write a query that joins 4 tables: people, goods, orders and order details. The main table, Order_details, has two key columns, Order_id and Good_id, so that one order can contain many goods (e.g. the order with id = 1 has 3 rows in the Order_details table, each with a different Good_id).
The problem is that I don't know any other method (besides using GROUP BY, DISTINCT or OVER()) to get only one row per order from the Order_details table, like I would get by using, for example, the DISTINCT keyword. I'm receiving identical rows for each order (with the same Order_id and Good_id), but I don't know how to get only one row per order.
Here's my query (basically I need to select the sum of all goods in each order, but I don't think that really matters for my problem) and schema (if it helps):
By the way, I'm working with MySQL.
SELECT
Order_details.Order_id,
Orders.Date, People.First_name,
People.Surname,
(
SELECT SUM(Goods.Price * Order_details.Quantity)
FROM Order_details, Goods
WHERE Order_details.Good_id = Goods.Good_id
AND Order_details.Order_id = Orders.Order_id
) AS Total_price
FROM Order_details, Goods, Orders, People
WHERE Order_details.Order_id = Orders.Order_id
AND Order_details.Good_id = Goods.Good_id
AND Order_details.Order_id = Orders.Order_id
AND Orders.Person_id = People.Person_id
ORDER BY Order_id ASC;
I have tried several methods, but I still can't figure it out. Maybe it is possible with a subquery? But I'm not sure...
(I have also tried UNION, but that didn't work either.)
Remove the Goods and Order_details tables from the FROM clause, and remove the corresponding conditions from the WHERE clause. You are not selecting anything from them anyway, except the SUM in the subselect. The Order_id can be selected from the Orders table. The join is just causing multiple rows per order.
Also, please don't join with commas. Use the JOIN ... ON syntax instead; it makes it easier to see whether the join conditions are reasonable.
SELECT
Orders.Order_id,
Orders.Date,
People.First_name,
People.Surname,
(
SELECT SUM(Goods.Price * Order_details.Quantity)
FROM Order_details
JOIN Goods ON Order_details.Good_id = Goods.Good_id
WHERE Order_details.Order_id = Orders.Order_id
) AS Total_price
FROM Orders
JOIN People ON Orders.Person_id = People.Person_id
ORDER BY Orders.Order_id ASC;
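Here is a minimal sketch of why the duplicates disappear, using SQLite as a stand-in for MySQL and invented sample data. The question's comma-join query returns one row per order line, while the answer's query returns one row per order. The question's query is lightly adjusted: the repeated join condition is dropped and ORDER BY is qualified.

```python
import sqlite3


def build_db():
    # Made-up sample data; integer prices keep the sums exact.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE People (Person_id INTEGER PRIMARY KEY, First_name TEXT, Surname TEXT);
    CREATE TABLE Goods (Good_id INTEGER PRIMARY KEY, Price INTEGER);
    CREATE TABLE Orders (Order_id INTEGER PRIMARY KEY, Date TEXT, Person_id INTEGER);
    CREATE TABLE Order_details (Order_id INTEGER, Good_id INTEGER, Quantity INTEGER);
    INSERT INTO People VALUES (1, 'Ann', 'Lee');
    INSERT INTO Goods VALUES (1, 2), (2, 5), (3, 10);
    INSERT INTO Orders VALUES (1, '2024-03-01', 1), (2, '2024-03-02', 1);
    -- Order 1 contains three different goods, order 2 contains one.
    INSERT INTO Order_details VALUES (1, 1, 1), (1, 2, 2), (1, 3, 1), (2, 1, 4);
    """)
    return conn


# The question's query: Order_details in the outer FROM yields one row per
# order line, so order 1 appears three times.
ORIGINAL = """
SELECT Order_details.Order_id, Orders.Date, People.First_name, People.Surname,
       (SELECT SUM(Goods.Price * Order_details.Quantity)
          FROM Order_details, Goods
         WHERE Order_details.Good_id = Goods.Good_id
           AND Order_details.Order_id = Orders.Order_id) AS Total_price
FROM Order_details, Goods, Orders, People
WHERE Order_details.Order_id = Orders.Order_id
  AND Order_details.Good_id = Goods.Good_id
  AND Orders.Person_id = People.Person_id
ORDER BY Order_details.Order_id
"""

# The answer's query: only Orders and People in the outer FROM, so one row
# per order; the line items are only read inside the subquery.
FIXED = """
SELECT Orders.Order_id, Orders.Date, People.First_name, People.Surname,
       (SELECT SUM(Goods.Price * Order_details.Quantity)
          FROM Order_details
          JOIN Goods ON Order_details.Good_id = Goods.Good_id
         WHERE Order_details.Order_id = Orders.Order_id) AS Total_price
FROM Orders
JOIN People ON Orders.Person_id = People.Person_id
ORDER BY Orders.Order_id
"""

if __name__ == "__main__":
    conn = build_db()
    print(len(conn.execute(ORIGINAL).fetchall()))  # 4 rows for 2 orders
    for row in conn.execute(FIXED):
        print(row)
```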
You can use row_number() for this kind of thing: it assigns a row number within each partition according to your criteria, and then you can just pick the rows where the value is 1. (Window functions such as row_number() require MySQL 8.0 or later.)
with t as (SELECT
Order_details.Order_id,
Orders.Date, People.First_name,
People.Surname,
row_number() over (
partition by Order_details.Order_id
order by Order_details.Good_id) rn,
(
SELECT SUM(Goods.Price * Order_details.Quantity)
FROM Order_details, Goods
WHERE Order_details.Good_id = Goods.Good_id
AND Order_details.Order_id = Orders.Order_id
) AS Total_price
FROM Order_details, Goods, Orders, People
WHERE Order_details.Order_id = Orders.Order_id
AND Order_details.Good_id = Goods.Good_id
AND Orders.Person_id = People.Person_id)
select * from t where rn = 1
order by Order_id ASC;
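The sketch below tries the row_number() approach in SQLite, with invented data and explicit JOIN syntax. SQLite 3.25 or later is needed for window functions. It also shows why the partition should be the order alone. Partitioning by both order and good restarts the numbering on every line, so rn = 1 filters nothing. The `first_rows` helper and its `partition` argument are introduced only for this comparison.

```python
import sqlite3


def build_db():
    # Invented sample data: order 1 has three lines, order 2 has one.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE People (Person_id INTEGER PRIMARY KEY, First_name TEXT, Surname TEXT);
    CREATE TABLE Goods (Good_id INTEGER PRIMARY KEY, Price INTEGER);
    CREATE TABLE Orders (Order_id INTEGER PRIMARY KEY, Date TEXT, Person_id INTEGER);
    CREATE TABLE Order_details (Order_id INTEGER, Good_id INTEGER, Quantity INTEGER);
    INSERT INTO People VALUES (1, 'Ann', 'Lee');
    INSERT INTO Goods VALUES (1, 2), (2, 5), (3, 10);
    INSERT INTO Orders VALUES (1, '2024-03-01', 1), (2, '2024-03-02', 1);
    INSERT INTO Order_details VALUES (1, 1, 1), (1, 2, 2), (1, 3, 1), (2, 1, 4);
    """)
    return conn


# {partition} is replaced with a PARTITION BY list; only fixed strings from
# this file are substituted, never user input.
TEMPLATE = """
WITH t AS (
    SELECT Order_details.Order_id AS Order_id, Orders.Date AS Date,
           People.First_name AS First_name, People.Surname AS Surname,
           ROW_NUMBER() OVER (PARTITION BY {partition}
                              ORDER BY Order_details.Good_id) AS rn,
           (SELECT SUM(g.Price * od.Quantity)
              FROM Order_details od JOIN Goods g ON od.Good_id = g.Good_id
             WHERE od.Order_id = Orders.Order_id) AS Total_price
    FROM Order_details
    JOIN Goods ON Order_details.Good_id = Goods.Good_id
    JOIN Orders ON Order_details.Order_id = Orders.Order_id
    JOIN People ON Orders.Person_id = People.Person_id
)
SELECT Order_id, Date, First_name, Surname, Total_price
FROM t
WHERE rn = 1
ORDER BY Order_id
"""


def first_rows(conn, partition):
    # Keep only the first numbered row of each partition.
    return conn.execute(TEMPLATE.format(partition=partition)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    # One row per order: lines are numbered within each order.
    print(first_rows(conn, "Order_details.Order_id"))
    # Every (order, good) pair is its own partition, so all 4 rows survive.
    print(len(first_rows(conn, "Order_details.Order_id, Order_details.Good_id")))
```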
I have an 'Orders' table and a 'Records' table.
Orders table has the following columns:
order_id order_date seller order_price
Records table has the following columns:
order_id record_created_at record_log
I'm trying to pull and compile the following list of data but I keep getting an error message:
order_week
seller
total_num_orders
under100_count --this is the number of orders that were < $100
over100_count --this is the number of orders that were >= $100
approved --this is the number of orders that were approved by the payment platform
Here's my query:
SELECT order_week, seller, total_num_orders, under100_count, over100_count, approved
FROM (
SELECT
EXTRACT(WEEK FROM order_created_at) AS order_week,
merchant_name AS seller,
COUNT(merchant_name) AS total_num_orders,
SUM(DISTINCT total_order_price < 100) AS under100_count,
SUM(DISTINCT total_order_price >= 100) AS over100_count
FROM orders o
GROUP BY order_week, seller)
INNER JOIN (
SELECT
COUNT(DISTINCT o.order_id) AS approved
FROM records r
WHERE record_log = 'order approved'
GROUP BY order_id)
ON l.order_id = o.order_id;
What am I doing wrong?
The subquery in the join needs an alias. It also needs to return the order_id column, so it can be joined.
inner join ( select order_id, ... from records ... group by order_id) r --> here
on r.order_id = o.order_id
I would actually write your query as:
select
extract(week from o.order_created_at) as order_week,
o.merchant_name as seller,
count(*) as total_num_orders,
sum(o.total_order_price < 100) as under100_count,
sum(o.total_order_price >= 100) as over100_count,
sum(r.approved) approved
from orders o
inner join (
select order_id, count(*) approved
from records r
where record_log = 'order approved'
group by order_id
) r on r.order_id = o.order_id
group by order_week, seller;
Rationale:
you neither want nor need distinct in the aggregate functions here: it is inefficient, and since a condition such as total_order_price < 100 only evaluates to 0 or 1, SUM(DISTINCT ...) over it can never return more than 1, so the counts come out wrong
count(*) is more efficient than count(<expression>), so use it unless you know why you are doing otherwise
I removed an unnecessary level of nesting
If there are orders without records, you might want a left join instead.
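Here is a sketch of the rewrite in SQLite with invented data. Two assumptions need flagging. First, SQLite has no EXTRACT(WEEK ...), so strftime('%W', ...) (Monday-based weeks) stands in for it and will not always match MySQL's week numbering. Second, the approval count follows the closing suggestion of a left join: it uses COUNT(r.order_id) over a derived table of approved order IDs rather than sum(r.approved). The second query shows the DISTINCT problem from the rationale.

```python
import sqlite3


def build_db():
    # Invented sample data; all three "acme" orders fall in the same week.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_created_at TEXT,
                         merchant_name TEXT, total_order_price INTEGER);
    CREATE TABLE records (order_id INTEGER, record_created_at TEXT, record_log TEXT);
    INSERT INTO orders VALUES
        (1, '2024-01-08', 'acme', 50),
        (2, '2024-01-09', 'acme', 150),
        (3, '2024-01-10', 'acme', 80),
        (4, '2024-01-10', 'bolt', 200);
    INSERT INTO records VALUES
        (1, '2024-01-08', 'order approved'),
        (2, '2024-01-09', 'order approved'),
        (2, '2024-01-09', 'order shipped'),
        (4, '2024-01-10', 'order declined');
    """)
    return conn


# Comparisons evaluate to 0/1, so SUM(condition) counts matching rows.
# The derived table holds each approved order_id once; the LEFT JOIN keeps
# unapproved orders, whose NULL r.order_id is ignored by COUNT.
WEEKLY = """
SELECT strftime('%W', o.order_created_at) AS order_week,
       o.merchant_name AS seller,
       COUNT(*) AS total_num_orders,
       SUM(o.total_order_price < 100) AS under100_count,
       SUM(o.total_order_price >= 100) AS over100_count,
       COUNT(r.order_id) AS approved
FROM orders o
LEFT JOIN (
    SELECT order_id FROM records
    WHERE record_log = 'order approved'
    GROUP BY order_id
) r ON r.order_id = o.order_id
GROUP BY strftime('%W', o.order_created_at), o.merchant_name
ORDER BY seller
"""

# DISTINCT collapses the 0/1 values before summing, so the result is 0 or 1.
DISTINCT_BUG = """
SELECT merchant_name, SUM(DISTINCT total_order_price < 100)
FROM orders
GROUP BY merchant_name
ORDER BY merchant_name
"""

if __name__ == "__main__":
    conn = build_db()
    for row in conn.execute(WEEKLY):
        print(row)
    print(conn.execute(DISTINCT_BUG).fetchall())  # acme: 1, although 2 orders are < 100
```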
I have the following (here simplified) database and want to get the month with the highest revenue.
invoices
- id
- order_id
- issued (timestamp)
orders
- id
orderItems
- id
- order_id
- article_id
articles
- id
- price
So far I got the following statement:
Select articles.price * orderItems.order_id as revenue, Extract(month
from invoices.issued)
FROM orderItems
INNER JOIN articles ON orderItems.article_id = articles.id
Inner JOIN orders ON orderItems.order_id = orders.id
Inner JOIN invoices ON orders.id = invoices.order_id
GROUP BY year(issued), month(issued)
Order by revenue DESC Limit 1
The calculated revenue is wrong because the price is multiplied by the order_id, when it should actually be multiplied by the count of the respective order_id. I tried using count(orderItems.order_id), but it's not working. Any ideas? Thanks!
I think you want:
SELECT year(i.issued), month(i.issued), SUM(a.price) as revenue
FROM orderItems oi JOIN
articles a
ON oi.article_id = a.id JOIN
orders o
ON oi.order_id = o.id JOIN
invoices i
ON o.id = i.order_id
GROUP BY year(i.issued), month(i.issued)
ORDER BY revenue DESC
LIMIT 1;
In other words, this is a simple aggregation query. There is never any need to multiply by order_id. Also note that this query introduces table aliases, which make it easier to write and to read.
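Here is a quick sketch of the aggregation in SQLite with invented rows. SQLite has no year() or month(), so strftime('%Y', ...) and strftime('%m', ...), cast to integers, are used instead. The `best_month` helper simply appends the answer's LIMIT 1.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, price INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE orderItems (id INTEGER PRIMARY KEY, order_id INTEGER, article_id INTEGER);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, order_id INTEGER, issued TEXT);
    INSERT INTO articles VALUES (1, 10), (2, 25);
    INSERT INTO orders VALUES (1), (2), (3);
    -- Order 1: 10 + 25, order 2: 25 + 25, order 3: 10.
    INSERT INTO orderItems VALUES (1, 1, 1), (2, 1, 2), (3, 2, 2), (4, 2, 2), (5, 3, 1);
    INSERT INTO invoices VALUES
        (1, 1, '2024-01-15 10:00:00'),
        (2, 2, '2024-02-03 09:30:00'),
        (3, 3, '2024-02-20 16:45:00');
    """)
    return conn


# Each orderItems row contributes its article's price once; grouping by
# year and month of the invoice gives the revenue per month.
MONTHLY = """
SELECT CAST(strftime('%Y', i.issued) AS INTEGER) AS yr,
       CAST(strftime('%m', i.issued) AS INTEGER) AS mon,
       SUM(a.price) AS revenue
FROM orderItems oi
JOIN articles a ON oi.article_id = a.id
JOIN orders o ON oi.order_id = o.id
JOIN invoices i ON o.id = i.order_id
GROUP BY strftime('%Y', i.issued), strftime('%m', i.issued)
ORDER BY revenue DESC
"""


def monthly_revenue(conn):
    return conn.execute(MONTHLY).fetchall()


def best_month(conn):
    # Same idea as the answer's LIMIT 1: keep only the top month.
    return conn.execute(MONTHLY + " LIMIT 1").fetchone()


if __name__ == "__main__":
    conn = build_db()
    print(monthly_revenue(conn))  # [(2024, 2, 60), (2024, 1, 35)]
    print(best_month(conn))       # (2024, 2, 60)
```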
I'm trying to write a simple 'customers who bought this also bought...' query.
I have an order table, which contains orders, and an order_product table which contains all the products relating to an order.
In an attempt to find the five most popular products that were bought together with product_id = 155, I've composed the following query:
select product_id, count(*) as cnt
from order_product
where product_id != 155
and order_id in
(select order_id from order_product where product_id = 155)
group by product_id
order by cnt desc
limit 5;
So the inner query gets a list of all the orders that contain the product I'm interested in (product_id = 155), and the outer query then looks for all the other products that appear in one of those orders.
They are then ordered and limited to the top 5.
I think this works OK, but it takes ages. I imagine this is because I'm using IN with a list of a couple of thousand order IDs.
I wonder if anyone could point me in the direction of writing it in a more optimised way.
Any help much appreciated.
You could try changing this:
select p1.product_id, count(*) as cnt
To
select p1.product_id, count(distinct p1.order_id) as cnt
And see if that gives you any different result
Edit:
From the comments
If you prefer having the result you generate in your first query, you can try using this:
select a.product_id, count(*) as cnt
from order_product a
join (select distinct order_id from order_product where product_id = 155) b on (a.order_id = b.order_id)
where a.product_id != 155
group by a.product_id
order by cnt desc
limit 5;
A small alteration to your existing query :)
You can try a join instead of a subselect. Something like:
select p1.product_id, count(*) as cnt
from order_product p1 JOIN order_product p2 on p1.order_id = p2.order_id
where p1.product_id != 155
and p2.product_id = 155
group by p1.product_id
order by cnt desc
limit 5;
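To compare the variants, this sketch runs three queries in SQLite on invented data: the original IN query, the self-join (with valid `count(*)` and `cnt` references), and the count(distinct ...) version. A secondary sort on product_id is added only to make ties deterministic. The `also_bought` helper is hypothetical.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE order_product (order_id INTEGER, product_id INTEGER);
    INSERT INTO order_product VALUES
        (1, 155), (1, 7), (1, 8),
        (2, 155), (2, 7),
        (3, 155), (3, 9), (3, 9),  -- product 9 listed twice in one order
        (4, 7), (4, 8);            -- an order without product 155
    """)
    return conn


IN_QUERY = """
SELECT product_id, COUNT(*) AS cnt
FROM order_product
WHERE product_id != ?
  AND order_id IN (SELECT order_id FROM order_product WHERE product_id = ?)
GROUP BY product_id
ORDER BY cnt DESC, product_id
LIMIT 5
"""

# Self-join: p2 picks the orders containing the product, p1 the other lines.
SELF_JOIN = """
SELECT p1.product_id, COUNT(*) AS cnt
FROM order_product p1
JOIN order_product p2 ON p1.order_id = p2.order_id
WHERE p1.product_id != ? AND p2.product_id = ?
GROUP BY p1.product_id
ORDER BY cnt DESC, p1.product_id
LIMIT 5
"""

# Counts orders rather than lines, so a product repeated in one order
# is counted once.
DISTINCT_ORDERS = """
SELECT p1.product_id, COUNT(DISTINCT p1.order_id) AS cnt
FROM order_product p1
JOIN order_product p2 ON p1.order_id = p2.order_id
WHERE p1.product_id != ? AND p2.product_id = ?
GROUP BY p1.product_id
ORDER BY cnt DESC, p1.product_id
LIMIT 5
"""


def also_bought(conn, sql, product_id=155):
    return conn.execute(sql, (product_id, product_id)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(also_bought(conn, IN_QUERY))         # [(7, 2), (9, 2), (8, 1)]
    print(also_bought(conn, SELF_JOIN))        # [(7, 2), (9, 2), (8, 1)]
    print(also_bought(conn, DISTINCT_ORDERS))  # [(7, 2), (8, 1), (9, 1)]
```

The self-join matches the IN query here because each order contains product 155 only once. If an order listed product 155 twice, the self-join would double-count that order's lines. The `select distinct order_id` derived table in the earlier answer avoids that.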
I'm working on a simple ordering system in MySQL and I came across this snag that I'm hoping some SQL genius can help me out with.
I have a table for Orders, Payments (with a foreign key reference to the Order table), and OrderItems (also, with a foreign key reference to the Order table) and what I would like to do is get the total outstanding balance (Total and Paid) for the Order with a single query. My initial thought was to do something simple like this:
SELECT Order.*, SUM(OrderItem.Amount) AS Total, SUM(Payment.Amount) AS Paid
FROM Order
JOIN OrderItem ON OrderItem.OrderId = Order.OrderId
JOIN Payment ON Payment.OrderId = Order.OrderId
GROUP BY Order.OrderId
However, if there are multiple Payments or multiple OrderItems, it messes up Total or Paid, respectively (e.g., one OrderItem record with an amount of 100 along with two Payment records will produce a Total of 200).
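The fan-out can be reproduced with the exact example from the question. This sketch uses SQLite with one order, one item of 100, and two payments of 60 and 40 (the amounts are invented). ORDER is a reserved word in both SQLite and MySQL, so the table name is quoted here; MySQL would use backticks.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    # "Order" is quoted because ORDER is a reserved word.
    conn.executescript("""
    CREATE TABLE "Order" (OrderId INTEGER PRIMARY KEY);
    CREATE TABLE OrderItem (OrderId INTEGER, Amount INTEGER);
    CREATE TABLE Payment (OrderId INTEGER, Amount INTEGER);
    INSERT INTO "Order" VALUES (1);
    INSERT INTO OrderItem VALUES (1, 100);        -- one item worth 100
    INSERT INTO Payment VALUES (1, 60), (1, 40);  -- paid in two parts
    """)
    return conn


# Joining both child tables first gives 1 item x 2 payments = 2 rows,
# so the single item's Amount is summed twice.
NAIVE = """
SELECT o.OrderId, SUM(oi.Amount) AS Total, SUM(p.Amount) AS Paid
FROM "Order" o
JOIN OrderItem oi ON oi.OrderId = o.OrderId
JOIN Payment p ON p.OrderId = o.OrderId
GROUP BY o.OrderId
"""

# Aggregating each child table before the join leaves one row per order
# on each side, so nothing is repeated.
PRE_AGGREGATED = """
SELECT o.OrderId, oig.Total, pg.Paid
FROM "Order" o
JOIN (SELECT OrderId, SUM(Amount) AS Total FROM OrderItem GROUP BY OrderId) oig
  ON oig.OrderId = o.OrderId
JOIN (SELECT OrderId, SUM(Amount) AS Paid FROM Payment GROUP BY OrderId) pg
  ON pg.OrderId = o.OrderId
"""

if __name__ == "__main__":
    conn = build_db()
    print(conn.execute(NAIVE).fetchall())           # [(1, 200, 100)]
    print(conn.execute(PRE_AGGREGATED).fetchall())  # [(1, 100, 100)]
```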
In order to overcome this, I can use some subqueries in the following way:
SELECT Order.OrderId, OrderItemGrouped.Total, PaymentGrouped.Paid
FROM Order
JOIN (
SELECT OrderItem.OrderId, SUM(OrderItem.Amount) AS Total
FROM OrderItem
GROUP BY OrderItem.OrderId
) OrderItemGrouped ON OrderItemGrouped.OrderId = Order.OrderId
JOIN (
SELECT Payment.OrderId, SUM(Payment.Amount) AS Paid
FROM Payment
GROUP BY Payment.OrderId
) PaymentGrouped ON PaymentGrouped.OrderId = Order.OrderId
As you can imagine (and as an EXPLAIN on this query will show), this is not exactly an optimal query. So I'm wondering: is there any way to convert these two subqueries with GROUP BY into JOINs?
The following is likely to be faster with the right indexes:
select o.OrderId,
(select sum(oi.Amount)
from OrderItem oi
where oi.OrderId = o.OrderId
) as Total,
(select sum(p.Amount)
from Payment p
where p.OrderId = o.OrderId
) as Paid
from Order o;
The right indexes are OrderItem(OrderId, Amount) and Payment(OrderId, Amount).
I don't like writing aggregation queries this way, but it can sometimes help performance in MySQL.
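Here is a small sketch of the correlated-subquery version in SQLite. It uses invented data and the two indexes the answer names; the index names themselves are made up. SUM over no rows gives NULL, so an order with no payments comes back with Paid = NULL (None in Python).

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE "Order" (OrderId INTEGER PRIMARY KEY);
    CREATE TABLE OrderItem (OrderId INTEGER, Amount INTEGER);
    CREATE TABLE Payment (OrderId INTEGER, Amount INTEGER);
    -- The indexes from the answer (hypothetical names): each correlated
    -- subquery can read OrderId and Amount from its index.
    CREATE INDEX oi_order_amount ON OrderItem(OrderId, Amount);
    CREATE INDEX p_order_amount ON Payment(OrderId, Amount);
    INSERT INTO "Order" VALUES (1), (2);
    INSERT INTO OrderItem VALUES (1, 70), (1, 30), (2, 50);
    INSERT INTO Payment VALUES (1, 60), (1, 40);  -- order 2 is unpaid
    """)
    return conn


# One subquery per output column, each correlated on the outer OrderId.
CORRELATED = """
SELECT o.OrderId,
       (SELECT SUM(oi.Amount) FROM OrderItem oi WHERE oi.OrderId = o.OrderId) AS Total,
       (SELECT SUM(p.Amount) FROM Payment p WHERE p.OrderId = o.OrderId) AS Paid
FROM "Order" o
ORDER BY o.OrderId
"""

if __name__ == "__main__":
    conn = build_db()
    print(conn.execute(CORRELATED).fetchall())  # [(1, 100, 100), (2, 50, None)]
```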
Some answers have already suggested using a correlated subquery, but have not really offered an explanation as to why. MySQL does not materialise correlated subqueries, but it will materialise a derived table. That is, given a simplified version of your query as it is now:
SELECT Order.OrderId, OrderItemGrouped.Total
FROM Order
JOIN (
SELECT OrderItem.OrderId, SUM(OrderItem.Amount) AS Total
FROM OrderItem
GROUP BY OrderItem.OrderId
) OrderItemGrouped ON OrderItemGrouped.OrderId = Order.OrderId;
At the start of execution MySQL will put the results of your subquery into a temporary table, and hash this table on OrderId for faster lookups, whereas if you run:
SELECT Order.OrderId,
( SELECT SUM(OrderItem.Amount)
FROM OrderItem
WHERE OrderItem.OrderId = Order.OrderId
) AS Total
FROM Order;
The subquery will be executed once for each row in Order. If you add something like WHERE Order.OrderId = 1, it is obviously not efficient to aggregate the entire OrderItem table and hash the result only to look up one value. But if you are returning all orders, the initial cost of creating the hash table will pay for itself by not having to execute the subquery for every row in the Order table.
If you are selecting a lot of rows and feel the materialisation will be of benefit, you can simplify your JOIN query as follows:
SELECT Order.OrderId, SUM(OrderItem.Amount) AS Total, PaymentGrouped.Paid
FROM Order
INNER JOIN OrderItem
ON OrderItem.OrderID = Order.OrderID
INNER JOIN
( SELECT Payment.OrderId, SUM(Payment.Amount) AS Paid
FROM Payment
GROUP BY Payment.OrderId
) PaymentGrouped
ON PaymentGrouped.OrderId = Order.OrderId;
GROUP BY Order.OrderId, PaymentGrouped.Paid;
Then you only have one derived table.
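This sketch checks, on invented SQLite data, that the single-derived-table query does not inflate Total. Each order's item rows are joined to at most one pre-aggregated payment row. Because both joins are inner joins, an order with no payments drops out of the result.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE "Order" (OrderId INTEGER PRIMARY KEY);
    CREATE TABLE OrderItem (OrderId INTEGER, Amount INTEGER);
    CREATE TABLE Payment (OrderId INTEGER, Amount INTEGER);
    INSERT INTO "Order" VALUES (1), (2);
    INSERT INTO OrderItem VALUES (1, 70), (1, 30), (2, 50);
    INSERT INTO Payment VALUES (1, 60), (1, 40);  -- order 2 has no payments
    """)
    return conn


# Only Payment is pre-aggregated; its one row per order is repeated for
# each item row, so grouping by Paid as well keeps Paid correct.
ONE_DERIVED = """
SELECT o.OrderId, SUM(oi.Amount) AS Total, pg.Paid
FROM "Order" o
INNER JOIN OrderItem oi ON oi.OrderId = o.OrderId
INNER JOIN (
    SELECT OrderId, SUM(Amount) AS Paid
    FROM Payment
    GROUP BY OrderId
) pg ON pg.OrderId = o.OrderId
GROUP BY o.OrderId, pg.Paid
ORDER BY o.OrderId
"""

if __name__ == "__main__":
    conn = build_db()
    print(conn.execute(ONE_DERIVED).fetchall())  # [(1, 100, 100)]
```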
What about something like this:
SELECT Order.OrderId, (
SELECT SUM(OrderItemGrouped.Amount)
FROM OrderItem as OrderItemGrouped
where
OrderItemGrouped.OrderId = Order.OrderId
) AS Total,
(
SELECT SUM(PaymentGrouped.Amount)
FROM Payment as PaymentGrouped
where
PaymentGrouped.OrderId = Order.OrderId
) as Paid
FROM Order
PS: You win again #Gordon xD
Select o.OrderId, i.Total, s.Paid
From Order o
Left join (select OrderId, sum(Amount) as Total
From OrderItem
Group by OrderId) i
On i.OrderId = o.OrderId
Left join (select OrderId, sum(Amount) as Paid
From Payment
Group by OrderId) s
On s.OrderId = o.OrderId
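Here is a sketch of this left-join version in SQLite, with invented data and table names matching the question's Order, OrderItem and Payment. Unlike the inner-join variants above, orders with no items or no payments are kept, with NULL in the missing column. Wrap the columns in COALESCE(..., 0) if zeros are preferred.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE "Order" (OrderId INTEGER PRIMARY KEY);
    CREATE TABLE OrderItem (OrderId INTEGER, Amount INTEGER);
    CREATE TABLE Payment (OrderId INTEGER, Amount INTEGER);
    INSERT INTO "Order" VALUES (1), (2), (3);     -- order 3 is empty
    INSERT INTO OrderItem VALUES (1, 70), (1, 30), (2, 50);
    INSERT INTO Payment VALUES (1, 60), (1, 40);  -- order 2 is unpaid
    """)
    return conn


# Each derived table is grouped by OrderId, so it has at most one row per
# order; LEFT JOIN keeps orders that have no match in it.
LEFT_JOINED = """
SELECT o.OrderId, i.Total, s.Paid
FROM "Order" o
LEFT JOIN (SELECT OrderId, SUM(Amount) AS Total
           FROM OrderItem GROUP BY OrderId) i
  ON i.OrderId = o.OrderId
LEFT JOIN (SELECT OrderId, SUM(Amount) AS Paid
           FROM Payment GROUP BY OrderId) s
  ON s.OrderId = o.OrderId
ORDER BY o.OrderId
"""

if __name__ == "__main__":
    conn = build_db()
    print(conn.execute(LEFT_JOINED).fetchall())
    # [(1, 100, 100), (2, 50, None), (3, None, None)]
```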