I have two tables, customers and orders, which are inner joined. A customer can have several orders associated with them. In my selection, I then group by customers.id. I need to select the most recent order of each customer, and also the amount of money spent in that order. Currently, I can select the most recent order_date, but I do not know how to select the amount in the same row as the order_date.
This is my current query:
SELECT
first_name,
last_name,
email,
MAX(order_date) AS recent_order,
amount -- this needs to select amount associated with recent_order
FROM customers
JOIN orders
ON customers.id = orders.customer_id
GROUP BY customers.id;
The query selects the most recent date, but does not select the amount associated with the most recent order_date.
Table declarations:
CREATE TABLE customers (
id INT AUTO_INCREMENT PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
email VARCHAR(100)
);
CREATE TABLE orders (
id INT AUTO_INCREMENT PRIMARY KEY,
order_date DATE,
amount DECIMAL(8,2),
customer_id INT,
FOREIGN KEY(customer_id) REFERENCES customers(id)
);
I would recommend a correlated subquery in the where clause:
SELECT c.*, o.* -- or whatever columns you want
FROM customers c JOIN
orders o
ON c.id = o.customer_id
WHERE o.order_date = (SELECT max(o2.order_date)
FROM orders o2
WHERE o2.customer_id = o.customer_id
);
For performance, you want an index on orders(customer_id, order_date).
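To see the correlated subquery at work, here is a minimal sketch that runs it through Python's built-in sqlite3 module instead of MySQL, on a few hypothetical rows (the names, dates and amounts are made up for illustration). The index mirrors the orders(customer_id, order_date) suggestion; dates are stored as ISO 'YYYY-MM-DD' text so that MAX() and comparisons follow chronological order.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    email TEXT
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    order_date TEXT,            -- ISO dates sort chronologically as text
    amount REAL,
    customer_id INTEGER REFERENCES customers(id)
);
-- Index supporting the correlated subquery lookups.
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

INSERT INTO customers VALUES
    (1, 'Ann', 'Lee', 'ann@example.com'),
    (2, 'Bob', 'Ray', 'bob@example.com');
INSERT INTO orders (order_date, amount, customer_id) VALUES
    ('2023-01-05', 10.00, 1),
    ('2023-03-02', 25.50, 1),   -- latest order of customer 1
    ('2023-02-10', 99.99, 2);   -- only order of customer 2
""")

# Keep only the order whose date equals that customer's maximum order date.
LATEST_ORDER_SQL = """
SELECT c.first_name, o.order_date, o.amount
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.order_date = (SELECT MAX(o2.order_date)
                      FROM orders o2
                      WHERE o2.customer_id = o.customer_id)
ORDER BY c.id
"""

rows = conn.execute(LATEST_ORDER_SQL).fetchall()
for row in rows:
    print(row)
```

If a customer has two orders on the same latest date, both rows come back; add a tie-breaker such as the order id if exactly one row per customer is required.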
SELECT
c.first_name,
c.last_name,
c.email,
MAX(o.order_date) AS recent_order,
-- amount of the customer's latest order (ties broken by the higher order id)
(SELECT o2.amount FROM orders o2 WHERE o2.customer_id = c.id ORDER BY o2.order_date DESC, o2.id DESC LIMIT 1) AS amount
FROM customers c
JOIN orders o
ON c.id = o.customer_id
GROUP BY c.id;
OR
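SELECT
c.first_name,
c.last_name,
c.email,
o.order_date AS recent_order,
o.amount AS recent_order_amount
FROM customers c
JOIN orders o
ON c.id = o.customer_id
JOIN (SELECT customer_id, MAX(order_date) AS max_date FROM orders GROUP BY customer_id) m -- latest date per customer
ON m.customer_id = o.customer_id AND m.max_date = o.order_date
ORDER BY o.order_date DESC;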
Use this (MySQL uses LIMIT where SQL Server uses TOP):
SELECT
t1.first_name,
t1.last_name,
t1.email,
t2.order_date,
t2.amount
FROM customers t1
JOIN orders t2
ON t1.id = t2.customer_id
ORDER BY
t2.order_date DESC
LIMIT 1;
Note that this returns only the single most recent order across all customers, not one row per customer. Adding GROUP BY t1.id does not turn it into a per-customer query: MySQL would then take arbitrary order_date and amount values within each group (or reject the query under ONLY_FULL_GROUP_BY), and LIMIT 1 would still keep just one row. If several orders share the latest order_date, only one of them is returned, and the amounts are not summed; you will have to do that in code or use a different query. Also note that performance depends on indexing: on a large data set, this query may not perform well if order_date is not part of an index.
Question: Write a SQL query to find and display customers who made two consecutive orders in the same category.
I am struggling with the answer. Any help would be appreciated.
Queries:
CREATE TABLE customers (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
name TEXT,
email TEXT);
CREATE TABLE orders (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
customer_id INTEGER,
item TEXT,
price REAL,
ORDER_DATE DATETIME,
category TEXT);
INSERT INTO customers (name, email) VALUES ('Doctor Who', 'doctorwho@timelords.com');
INSERT INTO customers (name, email) VALUES ('Harry Potter', 'harry@potter.com');
INSERT INTO customers (name, email) VALUES ('Captain Awesome', 'captain@awesome.com');
INSERT INTO orders (customer_id, item, price, ORDER_DATE, category)
VALUES (1, 'Sonic Screwdriver', 1000.00, '2021-04-15 09:00:00', 'tools');
INSERT INTO orders (customer_id, item, price, ORDER_DATE, category)
VALUES (1, 'Light', 1000.00, '2021-10-15 09:00:00', 'tools');
INSERT INTO orders (customer_id, item, price, ORDER_DATE, category)
VALUES (2, 'High Quality Broomstick', 40.00, '2020-12-20 09:00:00', 'cleaner');
INSERT INTO orders (customer_id, item, price, ORDER_DATE, category)
VALUES (3, 'TARDIS', 1000000.00, '2021-01-20 09:00:00', 'other');
Step: 1
First, add a foreign key on the customer_id column of the orders table; then join the customers and orders tables.
Step: 2
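After joining the tables, run the following query. Note that it returns customers who have at least two orders in the same category; it does not check that those orders are consecutive.
SELECT DISTINCT orders.category, customers.id, customers.name, customers.email
FROM customers
JOIN orders ON customers.id = orders.customer_id
WHERE (orders.customer_id, orders.category) IN (
SELECT customer_id, category
FROM orders
GROUP BY customer_id, category
HAVING COUNT(*) >= 2
);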
You can also solve it using LEAD: compute lead_category and lead_customers_id (the category and customer of the next order, ordering by customer and then ORDER_DATE) and keep the rows where they equal category and id.
select * from
(SELECT orders.category, orders.item, customers.id, customers.name, customers.email,
LEAD(orders.category) OVER (ORDER BY customers.id, orders.ORDER_DATE) AS lead_category,
LEAD(customers.id) OVER (ORDER BY customers.id, orders.ORDER_DATE) AS lead_customers_id
FROM orders
JOIN
customers ON
orders.customer_id = customers.id) AS T
where category = lead_category and id = lead_customers_id
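As a hedged illustration of the LEAD idea, here is a sketch that runs against Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) using the question's sample rows. The dates are written in ISO form, reading the question's two-digit strings such as '21-04-15' as YY-MM-DD, which is an assumption about what was intended. This variant uses PARTITION BY customer_id so that LEAD never crosses from one customer to the next, and orders each partition by ORDER_DATE (with id as a tie-breaker) so that "next" means the next order in time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT,
    price REAL, ORDER_DATE TEXT, category TEXT
);
INSERT INTO customers (name, email) VALUES
    ('Doctor Who', 'doctorwho@timelords.com'),
    ('Harry Potter', 'harry@potter.com'),
    ('Captain Awesome', 'captain@awesome.com');
INSERT INTO orders (customer_id, item, price, ORDER_DATE, category) VALUES
    (1, 'Sonic Screwdriver', 1000.00, '2021-04-15 09:00:00', 'tools'),
    (1, 'Light', 1000.00, '2021-10-15 09:00:00', 'tools'),
    (2, 'High Quality Broomstick', 40.00, '2020-12-20 09:00:00', 'cleaner'),
    (3, 'TARDIS', 1000000.00, '2021-01-20 09:00:00', 'other');
""")

# next_category is the category of the same customer's next order in time;
# it is NULL for each customer's last order.
CONSECUTIVE_SQL = """
SELECT DISTINCT c.id, c.name, c.email, t.category
FROM (
    SELECT customer_id, category,
           LEAD(category) OVER (PARTITION BY customer_id
                                ORDER BY ORDER_DATE, id) AS next_category
    FROM orders
) AS t
JOIN customers c ON c.id = t.customer_id
WHERE t.category = t.next_category
"""

rows = conn.execute(CONSECUTIVE_SQL).fetchall()
print(rows)
```

Only Doctor Who has two orders in a row in the same category ('tools'), so that is the single row returned.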
SELECT DISTINCT t1.customer_id
FROM orders t1
JOIN orders t2 USING (customer_id, category)
WHERE t1.ORDER_DATE < t2.ORDER_DATE
AND NOT EXISTS ( SELECT NULL
FROM orders t3
WHERE t1.customer_id = t3.customer_id
AND t1.category != t3.category
AND t1.ORDER_DATE < t3.ORDER_DATE
AND t3.ORDER_DATE < t2.ORDER_DATE )
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a26be6164e027b1a4b0aa9a736764da3
That is, we simply search for a pair of orders with the same customer and category such that no order by the same customer in another category exists between them.
Join customers table if needed.
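Here is a minimal sketch that runs the NOT EXISTS query through Python's sqlite3 module. The first four rows are the question's sample data (dates in ISO form, assuming the two-digit strings mean YY-MM-DD); customer 4 is a hypothetical addition whose two 'cleaner' orders are separated by a 'tools' order, so the query should reject that pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT,
    price REAL, ORDER_DATE TEXT, category TEXT
);
INSERT INTO orders (customer_id, item, price, ORDER_DATE, category) VALUES
    (1, 'Sonic Screwdriver', 1000.00, '2021-04-15 09:00:00', 'tools'),
    (1, 'Light', 1000.00, '2021-10-15 09:00:00', 'tools'),
    (2, 'High Quality Broomstick', 40.00, '2020-12-20 09:00:00', 'cleaner'),
    (3, 'TARDIS', 1000000.00, '2021-01-20 09:00:00', 'other'),
    -- hypothetical customer 4: cleaner, tools, cleaner (not consecutive)
    (4, 'Mop', 10.00, '2021-01-01 09:00:00', 'cleaner'),
    (4, 'Hammer', 20.00, '2021-02-01 09:00:00', 'tools'),
    (4, 'Bucket', 5.00, '2021-03-01 09:00:00', 'cleaner');
""")

# Same customer and category, t1 before t2, and no other-category order
# by that customer strictly between them.
PAIR_SQL = """
SELECT DISTINCT t1.customer_id
FROM orders t1
JOIN orders t2 USING (customer_id, category)
WHERE t1.ORDER_DATE < t2.ORDER_DATE
  AND NOT EXISTS ( SELECT NULL
                   FROM orders t3
                   WHERE t1.customer_id = t3.customer_id
                     AND t1.category != t3.category
                     AND t1.ORDER_DATE < t3.ORDER_DATE
                     AND t3.ORDER_DATE < t2.ORDER_DATE )
"""

customer_ids = [row[0] for row in conn.execute(PAIR_SQL)]
print(customer_ids)
```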
Assume a table Orders:
CREATE TABLE Orders (
order_id int(11),
customer_id int(11),
purchase_date datetime
)
We need the following report: For customers who did not place an order prior to this month, how many orders were placed per customer?
Here is the very slow SQL that I am currently using:
SELECT count(order_id) num_of_orders, customer_id
FROM (SELECT order_id, customer_id
FROM orders
WHERE customer_id NOT IN (SELECT DISTINCT customer_id
FROM orders
WHERE purchase_date < '2019-03-01')) a
GROUP BY customer_id;
Is there a faster/more efficient way to write this query?
I would rewrite it as:
SELECT COUNT(order_id), customer_id
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM orders o1 WHERE o1.customer_id = o.customer_id AND o1.purchase_date < '2019-03-01')
GROUP BY customer_id;
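With an index on orders(customer_id), this should be noticeably faster; a composite index on orders(customer_id, purchase_date) may help further, since it covers both conditions of the subquery.
Also, NOT EXISTS handles NULLs correctly: if the subquery returns a NULL customer_id, NOT IN evaluates to NULL (never true) for every outer row, so the original query returns no rows at all.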
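The NULL pitfall is easy to reproduce. This sketch uses Python's sqlite3 module and hypothetical rows, one of which has a NULL customer_id dated before March; SQLite applies the same three-valued logic to NOT IN as MySQL does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, purchase_date TEXT);
INSERT INTO orders VALUES
    (1, 10, '2019-02-15 10:00:00'),    -- customer 10 ordered before March
    (2, 10, '2019-03-05 10:00:00'),
    (3, 20, '2019-03-07 10:00:00'),    -- customer 20 is new in March
    (4, 20, '2019-03-09 10:00:00'),
    (5, NULL, '2019-02-20 10:00:00');  -- an order with no customer
""")

# The subquery yields {10, NULL}: "20 NOT IN (10, NULL)" is NULL, not true.
NOT_IN_SQL = """
SELECT customer_id, COUNT(order_id)
FROM orders
WHERE customer_id NOT IN (SELECT DISTINCT customer_id FROM orders
                          WHERE purchase_date < '2019-03-01')
GROUP BY customer_id
"""

NOT_EXISTS_SQL = """
SELECT customer_id, COUNT(order_id)
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM orders o1
                  WHERE o1.customer_id = o.customer_id
                    AND o1.purchase_date < '2019-03-01')
GROUP BY customer_id
ORDER BY customer_id
"""

not_in_rows = conn.execute(NOT_IN_SQL).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS_SQL).fetchall()
print(not_in_rows)
print(not_exists_rows)
```

NOT EXISTS returns customer 20 as expected. The order with a NULL customer_id forms its own group there, so add `AND o.customer_id IS NOT NULL` if such orders should be ignored.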
I'm not sure this will be faster, but it is simpler:
SELECT customer_id, count(*) as num_of_orders
FROM orders o
GROUP BY customer_id
HAVING MIN(purchase_date) >= '2019-03-01';
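To check that the HAVING version agrees with the NOT EXISTS rewrite, here is a small sketch using Python's sqlite3 module and hypothetical rows (no NULL customer_id values). It relies on purchase_date being stored as ISO text so that MIN() and the >= comparison work chronologically; in MySQL, a real DATETIME column makes this automatic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, purchase_date TEXT);
INSERT INTO orders VALUES
    (1, 10, '2019-02-15 10:00:00'),
    (2, 10, '2019-03-05 10:00:00'),
    (3, 20, '2019-03-07 10:00:00'),
    (4, 20, '2019-03-09 10:00:00'),
    (5, 30, '2019-03-20 10:00:00');
""")

# One pass: group every order, keep customers whose first purchase is in March or later.
HAVING_SQL = """
SELECT customer_id, COUNT(*) AS num_of_orders
FROM orders
GROUP BY customer_id
HAVING MIN(purchase_date) >= '2019-03-01'
ORDER BY customer_id
"""

# The NOT EXISTS rewrite, for comparison.
NOT_EXISTS_SQL = """
SELECT customer_id, COUNT(order_id)
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM orders o1
                  WHERE o1.customer_id = o.customer_id
                    AND o1.purchase_date < '2019-03-01')
GROUP BY customer_id
ORDER BY customer_id
"""

having_rows = conn.execute(HAVING_SQL).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS_SQL).fetchall()
print(having_rows)
```

The HAVING form groups the table once instead of probing it again for every row; which plan is actually faster on real data is best confirmed with EXPLAIN.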
I have two tables, a customers table and an orders table.
The customers table contains a unique ID for each customer. It contains 1141 entries.
The orders table contains many entries with a customerID and a date.
I am trying to query my database and return a list of customers and the max(date) from the orders list.
SELECT *
FROM customers
INNER JOIN
(
SELECT CustomerID, max(date) as date
FROM orders
GROUP BY CustomerID
) Sub1
ON customers.id = Sub1.CustomerID
INNER JOIN orders
ON orders.CustomerID = Sub1.CustomerID
AND orders.date = Sub1.Date
However this query is returning 1726 rows instead of 1141 rows. Where is this getting extra from?
I think it's because the ORDERS table contains the same CustomerID multiple times, so when you join it with CUSTOMERS, each customers.id matches multiple rows of ORDERS.
The problem is that there are ties.
Some customers place more than one order per day, so occasionally a customer may have placed more than one order on the date that is their max date.
To fix this, you need to use MAX() on some column that is always unique in the Orders table (or at least unique within a given date). This is easy if you can depend on an auto-increment primary key in the Orders table:
SELECT *
FROM customers
INNER JOIN
(
SELECT CustomerID, max(orderid) as orderid
FROM orders
GROUP BY CustomerID
) Sub1
ON customers.id = Sub1.CustomerID
INNER JOIN orders
ON orders.CustomerID = Sub1.CustomerID
AND orders.orderid = Sub1.orderid
This assumes that orderid increases in lock-step with increasing dates. That is, you'll never have an order with a greater auto-inc id but an earlier date. That might happen if you allow data to be entered out of chronological order, e.g. back-dating orders.
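The tie effect can be reproduced with a sketch in Python's sqlite3 module. The rows are hypothetical: customer 1 places two orders on their latest date. Joining back on the max date returns two rows for that customer, while joining on MAX(orderid) returns exactly one row per customer. The column names CustomerID, date and orderid follow the question and this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    orderid INTEGER PRIMARY KEY AUTOINCREMENT,
    CustomerID INTEGER,
    date TEXT
);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders (CustomerID, date) VALUES
    (1, '2020-01-01'),
    (1, '2020-02-01'),
    (1, '2020-02-01'),   -- a tie on customer 1 max date
    (2, '2020-01-15');
""")

# The question's approach: join back on the max date (duplicates on ties).
BY_MAX_DATE = """
SELECT customers.id, orders.orderid
FROM customers
JOIN (SELECT CustomerID, MAX(date) AS date FROM orders GROUP BY CustomerID) Sub1
  ON customers.id = Sub1.CustomerID
JOIN orders ON orders.CustomerID = Sub1.CustomerID AND orders.date = Sub1.date
"""

# This answer's approach: join back on the max (unique) orderid.
BY_MAX_ID = """
SELECT customers.id, orders.orderid
FROM customers
JOIN (SELECT CustomerID, MAX(orderid) AS orderid FROM orders GROUP BY CustomerID) Sub1
  ON customers.id = Sub1.CustomerID
JOIN orders ON orders.CustomerID = Sub1.CustomerID AND orders.orderid = Sub1.orderid
ORDER BY customers.id
"""

by_date_rows = conn.execute(BY_MAX_DATE).fetchall()
by_id_rows = conn.execute(BY_MAX_ID).fetchall()
print(len(by_date_rows))  # 3 rows for 2 customers
print(by_id_rows)
```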
;with cte as
(
select CustomerID, date
, rn = row_number() over (partition by CustomerID order by date desc)
from orders
)
select c.*, cte.date
from customers c
join cte on cte.CustomerID = c.id
where rn = 1 -- This will limit to the latest date
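For reference, the same ROW_NUMBER() pattern can be tried in Python's sqlite3 module (SQLite 3.25 or later supports window functions). This sketch assumes the question's column names (customers.id, orders.CustomerID, orders.date) plus a hypothetical orderid key, which is added to the window's ORDER BY as a tie-breaker so that the row chosen among same-day orders is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (orderid INTEGER PRIMARY KEY, CustomerID INTEGER, date TEXT);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders (CustomerID, date) VALUES
    (1, '2020-01-01'), (1, '2020-02-01'), (1, '2020-02-01'),
    (2, '2020-01-15');
""")

# rn = 1 marks each customer's latest order; orderid DESC breaks same-day ties.
LATEST_SQL = """
WITH cte AS (
    SELECT CustomerID, orderid, date,
           ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY date DESC, orderid DESC) AS rn
    FROM orders
)
SELECT c.id, c.name, cte.orderid, cte.date
FROM customers c
JOIN cte ON cte.CustomerID = c.id
WHERE cte.rn = 1
ORDER BY c.id
"""

rows = conn.execute(LATEST_SQL).fetchall()
print(rows)
```

Customer 3 has no orders, so the inner join drops them; use a LEFT JOIN (and move the rn = 1 test into the ON clause) if such customers should appear with NULLs.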
I am using MySQL.
I have an orders table with columns customer_id, order_id and order_date (datetime). Now I want to find, in a single query, all orders on Dec 20, 2013 that come from repeat customers (not new customers, i.e. customers who have placed some order before).
The orders table has other typical columns as well that are not mentioned here. Let me know if I should provide more data.
UPDATE: Can we do it without subquery? If yes how? (Just curious)
select customer_id, order_id, order_date from orders
where order_date >= '2013-12-20' and order_date < '2013-12-21'
AND customer_id IN (SELECT customer_id FROM orders where order_date < '2013-12-20')
Assuming that order_id is the primary key of the orders table and customer_id is a foreign key, you can use the following self-join, which needs no subquery, to pull the list of all order_ids belonging to repeat customers:
select distinct a.order_id, a.customer_id
from orders a
join orders b on a.customer_id = b.customer_id and b.order_date < '2013-12-20'
where a.order_date >= '2013-12-20' and a.order_date < '2013-12-21'
Or check this query:
SELECT * FROM `orders` WHERE DATE(order_date) = '2013-12-20' AND customer_id IN (SELECT customer_id FROM `orders` WHERE order_date < '2013-12-20')
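The IN subquery and a self-join without a subquery (which answers the UPDATE) can be compared on hypothetical data with Python's sqlite3 module. Timestamps are stored as ISO text, and the day is selected with a half-open range (>= '2013-12-20' and < '2013-12-21') so that an order at 23:59:30 still counts, which a BETWEEN ending at '23:59:00' would miss.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
INSERT INTO orders VALUES
    (1, 100, '2013-11-02 14:00:00'),
    (2, 100, '2013-11-30 09:00:00'),
    (3, 100, '2013-12-20 10:30:00'),   -- repeat customer
    (4, 200, '2013-12-20 11:00:00'),   -- first-time customer
    (5, 300, '2013-12-20 23:59:30'),   -- repeat customer, late in the day
    (6, 300, '2013-01-05 08:00:00');
""")

# Variant 1: IN subquery listing customers with an earlier order.
IN_SQL = """
SELECT order_id FROM orders
WHERE order_date >= '2013-12-20' AND order_date < '2013-12-21'
  AND customer_id IN (SELECT customer_id FROM orders
                      WHERE order_date < '2013-12-20')
ORDER BY order_id
"""

# Variant 2: self-join; DISTINCT collapses customers with several earlier orders.
JOIN_SQL = """
SELECT DISTINCT a.order_id
FROM orders a
JOIN orders b ON b.customer_id = a.customer_id
             AND b.order_date < '2013-12-20'
WHERE a.order_date >= '2013-12-20' AND a.order_date < '2013-12-21'
ORDER BY a.order_id
"""

in_ids = [row[0] for row in conn.execute(IN_SQL)]
join_ids = [row[0] for row in conn.execute(JOIN_SQL)]
print(in_ids, join_ids)
```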
MySQL seems to be unable to optimise a select with a GROUP BY subquery and ends up with long execution times. There must be a known optimisation for such a common scenario.
Let's assume that we're trying to return all orders from the database, with a flag indicating if it is the first order for the customer.
CREATE TABLE orders (`order` int, customer int, date date);
Retrieving the first orders by customer is superfast.
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;
However, it becomes very slow once we join this with the full order set using a subquery:
SELECT `order`, first_order FROM orders LEFT JOIN (
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders ON orders.`order`=first_orders.first_order;
I hope there is a simple trick that we're missing, because otherwise it would be about 1000x faster to do:
CREATE TEMPORARY TABLE tmp_first_order AS
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;
CREATE INDEX tmp_boost ON tmp_first_order (first_order);
SELECT `order`, first_order FROM orders LEFT JOIN tmp_first_order
ON orders.`order`=tmp_first_order.first_order;
EDIT:
Inspired by @ruakh's proposed option 3, there is indeed a less ugly workaround using INNER JOIN and UNION, which has acceptable performance yet does not require temporary tables. However, it is a bit specific to our case, and I am wondering if a more generic optimisation exists.
SELECT `order`, 'YES' as first FROM orders INNER JOIN (
SELECT min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders_1 ON orders.`order`=first_orders_1.first_order
UNION
SELECT `order`, 'NO' as first FROM orders INNER JOIN (
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders_2 ON first_orders_2.customer = orders.customer
AND orders.`order` > first_orders_2.first_order;
Here are a few things you might try:
Removing customer from the subquery's field-list, since it's not doing anything anyway:
SELECT `order`,
first_order
FROM orders
LEFT
JOIN ( SELECT MIN(`order`) AS first_order
FROM orders
GROUP
BY customer
) AS first_orders
ON orders.`order` = first_orders.first_order
;
Conversely, adding customer to the ON clause, so it actually does something for you:
SELECT `order`,
first_order
FROM orders
LEFT
JOIN ( SELECT customer,
MIN(`order`) AS first_order
FROM orders
GROUP
BY customer
) AS first_orders
ON orders.customer = first_orders.customer
AND orders.`order` = first_orders.first_order
;
Same as previous, but using an INNER JOIN instead of a LEFT JOIN, and converting your original ON clause into a CASE expression:
SELECT `order`,
CASE WHEN first_order = `order` THEN first_order END AS first_order
FROM orders
INNER
JOIN ( SELECT customer,
MIN(`order`) AS first_order
FROM orders
GROUP
BY customer
) AS first_orders
ON orders.customer = first_orders.customer
;
Replacing the whole JOIN approach with an uncorrelated IN-subquery in a CASE expression:
SELECT `order`,
CASE WHEN `order` IN
( SELECT MIN(`order`)
FROM orders
GROUP
BY customer
)
THEN `order`
END AS first_order
FROM orders
;
Replacing the whole JOIN approach with a correlated EXISTS-subquery in a CASE expression:
SELECT `order`,
CASE WHEN NOT EXISTS
( SELECT 1
FROM orders AS o2
WHERE o2.customer = o1.customer
AND o2.`order` < o1.`order`
)
THEN `order`
END AS first_order
FROM orders AS o1
;
(It's very likely that some of the above will actually perform worse, but I think they're all worth trying.)
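To confirm that the INNER JOIN + CASE variant and the correlated NOT EXISTS variant flag the same orders, here is a sketch in Python's sqlite3 module. It reuses the seven orders from the SQL Fiddle example later in this thread (customers 1 to 4). `order` is a reserved word, so it is quoted with backticks, which SQLite accepts for MySQL compatibility. The index on (customer, `order`) is an assumption about what would help both plans, not something measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (`order` INTEGER, customer INTEGER, date TEXT)")
conn.execute("CREATE INDEX idx_orders_customer_order ON orders (customer, `order`)")
conn.executemany(
    "INSERT INTO orders (`order`, customer) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 4)],
)

# Variant: INNER JOIN on customer, CASE keeps first_order only on the first order.
CASE_JOIN_SQL = """
SELECT o.`order`,
       CASE WHEN f.first_order = o.`order` THEN f.first_order END AS first_order
FROM orders AS o
INNER JOIN (SELECT customer, MIN(`order`) AS first_order
            FROM orders GROUP BY customer) AS f
        ON o.customer = f.customer
ORDER BY o.`order`
"""

# Variant: an order is first if no earlier order exists for the same customer.
NOT_EXISTS_SQL = """
SELECT o1.`order`,
       CASE WHEN NOT EXISTS (SELECT 1 FROM orders AS o2
                             WHERE o2.customer = o1.customer
                               AND o2.`order` < o1.`order`)
            THEN o1.`order` END AS first_order
FROM orders AS o1
ORDER BY o1.`order`
"""

case_rows = conn.execute(CASE_JOIN_SQL).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS_SQL).fetchall()
print(case_rows)
```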
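I would expect this to be faster when using a user variable instead of the LEFT JOIN (note, however, that MySQL documents the order of evaluation of expressions involving user variables as undefined, and that setting user variables within expressions is deprecated as of MySQL 8.0):
SELECT
`order`,
IF(@previous_customer<>(@previous_customer:=`customer`),
`order`,
NULL
) AS first_order
FROM orders
JOIN ( SELECT @previous_customer := -1 ) x
ORDER BY customer, `order`;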
That's what my example on SQL Fiddle returns:
CUSTOMER ORDER FIRST_ORDER
1 1 1
1 2 (null)
1 3 (null)
2 4 4
2 5 (null)
3 6 6
4 7 7
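On MySQL 8.0 and later, a window function can replace the user-variable trick, because its result does not depend on evaluation order. This is a swapped-in technique (LAG), not the method used in the answer above. The sketch runs it with Python's sqlite3 module on the same seven rows and should reproduce the output table; comparing each row's customer with the previous row's customer mirrors what the previous_customer variable does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (`order` INTEGER, customer INTEGER)")
conn.executemany(
    "INSERT INTO orders (`order`, customer) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 4)],
)

# LAG(customer) is the previous row's customer in (customer, `order`) order.
# On the very first row it is NULL, the comparison is not true, and the
# ELSE branch reports that order as a first order.
LAG_SQL = """
SELECT customer, `order`,
       CASE WHEN LAG(customer) OVER (ORDER BY customer, `order`) = customer
            THEN NULL
            ELSE `order`
       END AS first_order
FROM orders
ORDER BY customer, `order`
"""

rows = conn.execute(LAG_SQL).fetchall()
print("CUSTOMER ORDER FIRST_ORDER")
for customer, order, first in rows:
    print(customer, order, "(null)" if first is None else first)
```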