I am fairly new to MySQL and have this theoretical problem given to me. I am given these tables
customers
---------------
id
name
country
order_date
orders
---------------
id
order_number
order_type
customers_order_details
---------------
id
customer_id
order_id
price
A customer can have multiple different orders. I need to retrieve the customers with the largest total price spent, where the total price must be at least 100. Is my approach correct?
SELECT c.id, c.name AS customer_name, c.country , SUM(d.price) AS total_price
FROM customers c
JOIN customers_order_details d
ON c.id = d.customer_id
GROUP BY customer_name
HAVING total_price >= 100
ORDER BY total_price DESC;
I ask because I'm not sure: I was told that GROUP BY needs to list all the selected columns, but I feel that using the name alone is more than adequate.
It looks almost correct.
Grouping by only customers.name isn't right, though. Apart from the fact that this will throw an error on more strictly configured MySQL servers, on newer versions, and on DBMSs from other vendors, what happens if two or more different customers share a name, say several "John Smith"s? They are all aggregated into the same group, giving false figures!
The safest bet is to group by every column that is not an argument to an aggregate function. In this case that is customers.id, customers.name and customers.country. In some DBMSs you can also group by just a set of columns on which all the other non-aggregated columns are functionally dependent. If customers.id is declared as the primary key, it fulfills that rule and you could group by it alone. But I'm not really sure whether MySQL implements that shortcut, or in which versions or configurations, so you had better go with all the columns here.
Side note: The schema design is a little weird. Why are the order details linked directly to customers, rather than the orders themselves? As it is now, an order can have multiple details belonging to different customers. That may be right in your use case, but it's not what you would usually expect. Maybe you should revise that.
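To make the "John Smith" problem concrete, here is a minimal sketch. It runs the two GROUP BY variants through Python's built-in sqlite3 module rather than MySQL, and the sample rows are invented for illustration, so treat it as a demonstration of the grouping logic only, not of MySQL's exact behaviour.

```python
import sqlite3

# Hypothetical sample data: two different customers who share a name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE customers_order_details (
    id INTEGER PRIMARY KEY, customer_id INTEGER, order_id INTEGER, price REAL);
INSERT INTO customers VALUES (1, 'John Smith', 'US'), (2, 'John Smith', 'UK');
INSERT INTO customers_order_details VALUES
    (1, 1, 10, 60), (2, 1, 11, 30), (3, 2, 12, 70);
""")

# Grouping by name only: both John Smiths collapse into one group.
by_name = conn.execute("""
    SELECT c.name, SUM(d.price) AS total_price
    FROM customers c
    JOIN customers_order_details d ON c.id = d.customer_id
    GROUP BY c.name
    HAVING SUM(d.price) >= 100
""").fetchall()

# Grouping by every non-aggregated column keeps the customers apart.
by_all = conn.execute("""
    SELECT c.id, c.name, c.country, SUM(d.price) AS total_price
    FROM customers c
    JOIN customers_order_details d ON c.id = d.customer_id
    GROUP BY c.id, c.name, c.country
    HAVING SUM(d.price) >= 100
""").fetchall()

print(by_name)  # one merged row whose total combines both customers
print(by_all)   # empty: neither customer reaches 100 on their own
```

Grouped by name, the two customers (90 and 70) show up as a single row with a total of 160 and pass the filter, although neither of them actually spent 100.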
Your code looks quite fine. I would just recommend aggregating by the primary key of the customer table rather than by the name:
SELECT c.id, c.name AS customer_name, c.country , SUM(d.price) AS total_price
FROM customers c
JOIN customers_order_details d ON c.id = d.customer_id
GROUP BY c.id
HAVING SUM(d.price) >= 100
ORDER BY total_price DESC;
This makes the code a valid aggregation query; all non-aggregated columns in the select clause are functionally dependent on the column in the group by clause.
As a side note: using column aliases in the HAVING clause is a MySQL extension to the SQL standard. You can use that feature, or phrase the HAVING clause in pure ANSI SQL, repeating the aggregate expression.
Related
In MySQL I have three tables:
Customer: which contains ID, Name, Balance, and Address
Orders: which contains the Order ID, Order Date, Shipping Date, and Customer ID as a foreign key
Order Lines: which contains Order ID, Part ID, and Number Ordered.
I'm trying to write a query such that I can figure out how many items each customer has ordered, but I'm not sure how to get discrete sums for each of the customers. The sample code I have so far just sums all the order lines together into one field.
SELECT
CONCAT(customer_last_name, ', ', customer_first_name) AS 'Customer',
SUM(number_ordered) AS 'Ordered'
FROM Customers t1
JOIN Orders t2
ON t1.Customer_id=t2.Customer_id
JOIN order_lines t3
ON t2.order_id=t3.order_id;
I'm pretty new to SQL and coding in general, so apologies if I'm missing something obvious.
You just need to add a group by clause to that query to make it produce the result you want:
select
concat(c.last_name, ', ', c.first_name) as customer,
sum(oi.number_ordered) as ordered
from customers c
join orders o on o.customer_id = c.customer_id
join order_lines oi on oi.order_id = o.order_id
group by c.customer_id, c.last_name, c.first_name
This gives you one row per customer, along with the sum of number_ordered for all items they ordered.
Notes:
the column aliases should not be surrounded with single quotes (which stand for string literals); usually no quoting is needed, unless the identifier contains special characters, in which case you can use backticks
meaningful table aliases make the query easier to read and maintain
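As a rough illustration of what the group by clause changes, the sketch below runs the query with and without it. It uses Python's built-in sqlite3 module, so `||` replaces MySQL's concat(), and the table contents are made up for the example.

```python
import sqlite3

# Hypothetical sample data: Ann has two orders, Bob has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_lines (order_id INTEGER, part_id INTEGER, number_ordered INTEGER);
INSERT INTO customers VALUES (1, 'Doe', 'Ann'), (2, 'Roe', 'Bob');
INSERT INTO orders VALUES (100, 1), (101, 1), (102, 2);
INSERT INTO order_lines VALUES (100, 1, 2), (100, 2, 3), (101, 1, 1), (102, 3, 4);
""")

joins = """
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    JOIN order_lines ol ON ol.order_id = o.order_id
"""

# No GROUP BY: the aggregate runs over every joined row at once.
grand_total = conn.execute("SELECT SUM(ol.number_ordered)" + joins).fetchall()

# GROUP BY the customer key: one sum per customer.
per_customer = conn.execute(
    "SELECT c.last_name || ', ' || c.first_name AS customer,"
    " SUM(ol.number_ordered) AS ordered"
    + joins +
    " GROUP BY c.customer_id, c.last_name, c.first_name"
    " ORDER BY c.customer_id"
).fetchall()

print(grand_total)   # a single row holding the overall sum
print(per_customer)  # one row per customer
```

Including the key column in the group by keeps two customers with identical names in separate rows.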
Well, I am struggling with this question in SQL using MySql:
I have to find the best-selling product per supplier in the popular open source database called NORTHWIND: https://northwinddatabase.codeplex.com
Now what I wrote is:
SELECT products.SupplierID ,`order details`.ProductID, count(*) as NumSales FROM `order details`
JOIN products ON `order details`.ProductID = products.ProductID
JOIN orders ON `order details`.OrderID = orders.OrderID
WHERE `order details`.OrderID
IN
(SELECT OrderID FROM orders
WHERE MONTH(OrderDate) = 7 AND YEAR(orderDate) = 1997)
group by products.SupplierID , `order details`.ProductID
ORDER BY NumSales desc
;
The result is:
This is all good, but I need to return only, for example, Product 1 for Supplier 1, since it was sold 3 times (in 7/1997).
Adding to the start:
SELECT SupplierID, ProductID, MAX(b.NumSales)
FROM( ... )
gets me closer, but it gives me the highest of all suppliers and not the highest for every supplier.
Any help would be great.
P.S.
This question is similar but not the same, and didn't completely help me.
Please treat this as a pseudo answer and work with it as you will... I appreciate that you are putting in the time to learn this.
select supplier_id, max(num_sales) max_sales
from (put your select statement here)
group by supplier_id
This now gives you the max num_sales for each supplier. Something like
supplier_id max_sales
1 3
2 1
3 2
4 2
Now join this back to your original query to get the product data for whatever matches the max.
select a.supplier_id, b.product_id, a.max_sales
from
(select supplier_id, max(num_sales) max_sales
from (put your select statement here)
group by supplier_id) a
inner join
(your original query again) b
on a.supplier_id = b.supplier_id
and a.max_sales = b.num_sales
As you learn SQL, you will see that there are usually hundreds of valid working scripts that will give you the answer you want... your job is to find the script that is the quickest to write, the most efficient to run, and meets the criteria of your task. The advantage of the method shown here is that it will display multiple records in the event of a tie (supplier_id = 2 has two products that both have a max sales of one; this query returns both of those rows).
Just as additional info: other databases allow common table expressions (the WITH clause), which let you simplify this script further. MySQL did not support them before version 8.0; see "How do you use the "WITH" clause in MySQL?".
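For what it's worth, here is a small sketch of the join-back idea written with a WITH clause. It runs on Python's built-in sqlite3 module; MySQL 8.0 and later accept the same WITH syntax, while older versions need the nested derived tables shown above. The `sold_items` table and its rows are hypothetical stand-ins for the joined Northwind query.

```python
import sqlite3

# Hypothetical stand-in for the joined order data: one row per sale.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sold_items (supplier_id INTEGER, product_id INTEGER);
INSERT INTO sold_items VALUES
    (1, 1), (1, 1), (1, 1), (1, 2),
    (2, 4), (2, 5),
    (3, 6), (3, 6);
""")

top_products = conn.execute("""
    WITH per_product AS (          -- the "original query": sales per product
        SELECT supplier_id, product_id, COUNT(*) AS num_sales
        FROM sold_items
        GROUP BY supplier_id, product_id
    ),
    per_supplier AS (              -- the best figure for each supplier
        SELECT supplier_id, MAX(num_sales) AS max_sales
        FROM per_product
        GROUP BY supplier_id
    )
    SELECT s.supplier_id, p.product_id, s.max_sales
    FROM per_supplier s
    JOIN per_product p
      ON p.supplier_id = s.supplier_id
     AND p.num_sales = s.max_sales
    ORDER BY s.supplier_id, p.product_id
""").fetchall()

print(top_products)  # supplier 2 appears twice because of the tie
```

The CTE lets the per-product query be written once and referenced twice, instead of being pasted into both sides of the join.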
I have 3 tables:
1. products(product_id,name)
2. orders(id,order_id,product_id)
3. factors(id,order_id,date)
I want to retrieve the product names (products.name) whose order_id appears in both of the last two tables for a given date.
I use this query for this purpose:
select products.name
from products
WHERE products.product_id IN
(
SELECT distinct orders.product_id FROM orders WHERE
order_id IN (select order_id FROM factors WHERE
factors.datex ='2017-04-29') GROUP BY product_id
)
But I get no result. Where is my mistake? How can I resolve it? Thanks.
Your query should be fine. I am rewriting it to make a few changes to the structure, but not the logic (this makes it easier for me to understand the query):
select p.name
from products p
where p.product_id in (select o.product_id
from orders o
where o.order_id in (select f.order_id
from factors f
where f.datex = '2017-04-29'
)
) ;
Notes on the changes:
When using multiple tables in a query, always qualify the column names.
Use table aliases. They make queries easier to write and to read.
SELECT DISTINCT and GROUP BY are unnecessary in IN subqueries. The logic of IN already handles (i.e. ignores) duplicates. And by explicitly including the operations, you run the risk of a less efficient query plan.
Why might your query not work?
factors.datex has a time component. If so, then this will work: date(f.datex) = '2017-04-29'.
There are no factors on that date.
There are no orders that match factors on that date.
There are no products in the orders that match the factors on that date.
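The first possibility is easy to check in isolation. The sketch below uses Python's built-in sqlite3 module, where the timestamp is stored as text, and the two rows are invented. MySQL compares a DATETIME against '2017-04-29' as midnight of that day, so I would expect the same outcome there, but that part is an assumption about your column type.

```python
import sqlite3

# Hypothetical rows whose datex values carry a time component.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE factors (id INTEGER PRIMARY KEY, order_id INTEGER, datex TEXT);
INSERT INTO factors VALUES
    (1, 7, '2017-04-29 10:15:00'),
    (2, 8, '2017-04-30 09:00:00');
""")

# Comparing the full timestamp against a bare date matches nothing.
exact = conn.execute(
    "SELECT order_id FROM factors WHERE datex = '2017-04-29'"
).fetchall()

# Stripping the time component first matches the row from that day.
by_day = conn.execute(
    "SELECT order_id FROM factors WHERE date(datex) = '2017-04-29'"
).fetchall()

print(exact)   # no rows
print(by_day)  # the order placed on 2017-04-29
```

On a large table a range condition (datex >= '2017-04-29' and datex < '2017-04-30') is usually preferable, since wrapping the column in date() tends to prevent index use.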
In the factors table the column name is date, so it should be:
factors.date ='2017-04-29'
You have written:
factors.datex ='2017-04-29'
I'm currently trying to implement a search engine function, trying to return information from 3 tables efficiently. The usage is numeric searches, free text won't be possible and as such I'm not trying to optimise for this scenario.
The tables being used are as follows:
Companies hasMany Products
Products hasMany Prices
The problem is as follows:
I want to return the single cheapest priced product for each company that meets any specified criteria (this could be criteria against the product or price)
The solution I have is as follows:
SELECT * FROM (
SELECT Company.id, Company.name AS CompanyName, Product.name, Product.quantity, Price.price
FROM Product
LEFT JOIN Price ON Product.id=Price.product_id
LEFT JOIN Company ON Product.company_id=Company.id
/* EXAMPLE CONDITIONS */
WHERE Price.price > 10 AND Product.quantity > 4
ORDER BY Price.price
) AS tmp_table
GROUP BY tmp_table.id
ORDER BY tmp_table.price;
Question: Is this method of a sub query with joins the most effective way to achieve this solution?
The execution times are ranging anywhere from 1ms to 140ms with 3 companies, each with 3 products, that each have 3 prices so if this were to go into the hundreds it could get messy.
I've created an SQL Fiddle at http://sqlfiddle.com/#!2/c194b7/1/0
This query relies on a "feature" of MySQL that is specifically documented not to work. That is, you are assuming that the extra columns in the outer group by come from the first row, and yet the documentation clearly states:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
I would not recommend the approach you are using.
Because you are using group by, you can do this with the group_concat()/substring_index() method:
SELECT c.id, c.name AS CompanyName,
substring_index(group_concat(p.name order by pr.price, p.id), ',', 1) as name,
substring_index(group_concat(p.quantity order by pr.price, p.id), ',', 1) as quantity,
MIN(pr.price) as price
FROM Product p LEFT JOIN
Price pr
ON p.id = pr.product_id LEFT JOIN
Company c
ON p.company_id = c.id
WHERE pr.price > 10 AND p.quantity > 4
GROUP BY c.id, c.name
ORDER BY price;
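If the group_concat()/substring_index() trick looks opaque, the following plain-Python sketch mimics what it computes for a single company. The helper name and the sample rows are hypothetical; it only emulates the logic and does not talk to MySQL.

```python
def substring_index_first(text, delim):
    """Mimic substring_index(text, delim, 1): everything before the first delim."""
    return text.split(delim, 1)[0]

# Hypothetical rows for one company: (product id, name, quantity, price).
rows = [(3, "Widget", 5, 14.0), (1, "Gadget", 6, 12.0), (2, "Gizmo", 8, 12.0)]

# group_concat(p.name ORDER BY pr.price, p.id): sort the group, then join with commas.
ordered = sorted(rows, key=lambda r: (r[3], r[0]))
names = ",".join(r[1] for r in ordered)

# Keeping only the first list element yields the name of the cheapest product;
# the product id breaks ties between equal prices.
cheapest_name = substring_index_first(names, ",")

print(names)
print(cheapest_name)
```

Two caveats follow from this: a value that itself contains a comma gets cut at that comma, and group_concat() output is limited by group_concat_max_len (1024 by default), so very large groups can be truncated.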
I am trying to get a list of possible customers along with the sum of their order history (ltv).
Without the order by, this query loads in under a second. With the order by, the query takes over 90 seconds.
SELECT a.customerid,a.firstname,a.lastname,Orders.ltv
FROM customers a
LEFT JOIN (
SELECT customerid,
SUM(amount) as ltv
FROM orders
GROUP BY customerid) Orders
ON Orders.customerid=a.customerid
ORDER BY
Orders.ltv DESC
LIMIT 0,10
Any ideas how this could be sped up?
EDIT: I guess I cleaned up the query a little too much. The query is actually a little more complicated than this version. Other data is selected from the customers table, and can be sorted against as well.
Without the actual schema it is a bit hard to know how data is related but I guess this query should be equivalent and more performant:
SELECT a.customerid, coalesce(sum(o.amount), 0) TotalLtv FROM customers a
LEFT JOIN orders o ON a.customerid = o.customerid
GROUP BY a.customerid
ORDER BY TotalLtv DESC
LIMIT 10
The coalesce will make sure you return 0 for the customers without orders.
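A quick way to see the effect of the coalesce is to run the query against a few made-up rows. This sketch uses Python's built-in sqlite3 module rather than MySQL, and the customers and amounts are invented.

```python
import sqlite3

# Hypothetical data: customer 3 has no orders at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann', 'Doe'), (2, 'Bob', 'Roe'), (3, 'Cy', 'Poe');
INSERT INTO orders VALUES (1, 1, 20), (2, 1, 5), (3, 2, 40);
""")

rows = conn.execute("""
    SELECT a.customerid, COALESCE(SUM(o.amount), 0) AS TotalLtv
    FROM customers a
    LEFT JOIN orders o ON a.customerid = o.customerid
    GROUP BY a.customerid
    ORDER BY TotalLtv DESC
    LIMIT 10
""").fetchall()

print(rows)  # customer 3 is listed last with a total of 0 instead of NULL
```

Without the coalesce, SUM() over the unmatched LEFT JOIN row would return NULL for that customer.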
As @ypercube pointed out, an index on amount won't help either. You could try:
ALTER TABLE orders ADD INDEX(customerid, amount)
After your question update
If you need to add more fields that functionally depend on the a.customerid in the select clause you can use the non-standard MySQL group by clause. This will result in better performance than grouping by a.customerid, a.firstname, a.lastname:
SELECT a.customerid, a.firstname, a.lastname, coalesce(sum(o.amount), 0) TotalLtv
FROM customers a
LEFT JOIN orders o ON a.customerid = o.customerid
GROUP BY a.customerid
ORDER BY TotalLtv DESC
LIMIT 10
A few things here. First, it doesn't appear that you need to join the customers table at all, since you are only using it for the customerid, which already exists in the orders table. If you have more than 10 customer IDs with corresponding amounts, you will never even need to see the customer IDs without amounts that the LEFT JOIN from customers would give you. As such, you should be able to reduce your query to this:
SELECT customerid, SUM(amount) AS ltv
FROM orders
GROUP BY customerid
ORDER BY ltv DESC LIMIT 0,10
You would need an index on customerid. Unfortunately, the sort is on a calculated field, so there is not a lot you can do to speed this up from that point.
I see the updated question. Since you do need additional fields from customers, I will revise my answer to include the customer table
SELECT c.customerid, c.firstname, c.lastname, coalesce(o.ltv, 0) AS total
FROM customers AS c
LEFT JOIN (
SELECT customerid, SUM(amount) as ltv
FROM orders
GROUP BY customerid
ORDER BY ltv DESC LIMIT 0,10) AS o
ON c.customerid = o.customerid
Note that I am joining on a sub-selected table as you were doing in your original query, however I have performed the sort and limit on the sub-selected table so you don't have to sort all the records without any entries on orders table.
Two things. First, don't use an inner query. MySQL does allow ORDER BY on a projection alias. Second, you should get a considerable improvement from a B-TREE index on the composite key (customerid, amount). Then the engine will be able to execute this query by a simple traversal of the index, without fetching any row data.
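To check whether such an index really covers the aggregation, you can look at the query plan. The sketch below does this with Python's built-in sqlite3 module and its EXPLAIN QUERY PLAN output, purely as an illustration; the index name is made up, and SQLite's planner is not MySQL's. In MySQL the equivalent check would be running EXPLAIN on the query and looking for "Using index" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid INTEGER, amount REAL);
-- Composite index on the grouping column followed by the summed column.
CREATE INDEX idx_orders_customer_amount ON orders (customerid, amount);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT customerid, SUM(amount) AS ltv
    FROM orders
    GROUP BY customerid
""").fetchall()

# The last field of each plan row is the human-readable description.
details = [row[-1] for row in plan]
print(details)
```

If the description mentions a covering index, the per-customer sums are computed from the index alone. The final sort on the computed ltv still has to happen afterwards, since no index can pre-order an aggregate.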