Related products approach suggestion, MySQL InnoDB/PHP - mysql

I want to expand the UI of my CodeIgniter shop with suggestions of what other people bought together with the current product (either when viewing the product or when it is put in the cart; that is irrelevant to the question).
I have come up with this query (the orders table contains order details, order_items contains the products in a specific order via a foreign key, and the prd alias is for the products table, where all important info about a product is stored).
The query looks like this:
SELECT
pr.product_id,
COUNT(*) AS num,
prd.*
FROM
orders AS o
INNER JOIN order_items AS po ON o.id = po.order_id
INNER JOIN order_items AS pr ON o.id = pr.order_id
INNER JOIN products AS prd ON pr.product_id = prd.id
WHERE
po.product_id = '14211'
AND pr.product_id <> '14211'
GROUP BY
pr.product_id
ORDER BY
num DESC
LIMIT 3
It works nice and dandy: query time is around 0.030 seconds, and it returns the products that were bought together with the one I am currently viewing.
As for the questions and considerations: the Percona query analyzer complains about two things, "Non-deterministic GROUP BY" and "GROUP BY or ORDER BY on different tables". I need both so that the most relevant items come out on top, but I have absolutely no idea how to fix them, or whether I should even be bothered by these notices from the query analyzer.
The second question is regarding performance. Since this query uses a temporary table and a filesort, I was thinking of creating a view out of it and using the view instead of executing the query each time a product is opened.
Mind you, I am not asking for CI model/view/controller tips, just tips on how to optimize this query, and/or suggestions regarding performance and the view approach.
Any help is more than appreciated.

SELECT p.num, prd.*
FROM
(
SELECT a.product_id, COUNT(*) AS num
FROM orders AS o
INNER JOIN order_items AS b ON o.id = b.order_id
INNER JOIN order_items AS a ON o.id = a.order_id
WHERE b.product_id = '14211'
AND a.product_id <> '14211'
GROUP BY a.product_id
ORDER BY num DESC
LIMIT 3
) AS p
JOIN products AS prd ON p.product_id = prd.id
ORDER BY p.num DESC
This should:
- run faster (especially as your data grows),
- avoid the GROUP BY complaint,
- not over-inflate the count,
- etc.
Ignore the complaint about GROUP BY and ORDER BY coming from different tables -- that is a performance issue; you need it.
As for translating that back to CodeIgniter, good luck.
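To see the shape of that rewrite in action, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows, the products.name column, and the related_products helper are made up for illustration, and SQLite stands in for MySQL, so treat it as a check of the logic rather than of performance.

```python
import sqlite3

# Hypothetical miniature data set; table names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
INSERT INTO products VALUES (1, 'kettle'), (2, 'teapot'), (3, 'mug'), (4, 'spoon');
INSERT INTO orders VALUES (10), (11), (12);
INSERT INTO order_items VALUES
  (10, 1), (10, 2), (10, 3),
  (11, 1), (11, 2),
  (12, 1), (12, 4), (12, 2);
""")

# Aggregate and LIMIT inside the derived table first, then join to products,
# so only the top three rows ever touch the wide products table.
RELATED_SQL = """
SELECT p.num, prd.id, prd.name
FROM (
    SELECT a.product_id, COUNT(*) AS num
    FROM orders AS o
    JOIN order_items AS b ON o.id = b.order_id
    JOIN order_items AS a ON o.id = a.order_id
    WHERE b.product_id = :pid AND a.product_id <> :pid
    GROUP BY a.product_id
    ORDER BY num DESC, a.product_id   -- tie-breaker added for a stable result
    LIMIT 3
) AS p
JOIN products AS prd ON prd.id = p.product_id
ORDER BY p.num DESC, prd.id
"""

def related_products(conn, product_id):
    """Return (times_bought_together, product_id, name) rows."""
    return conn.execute(RELATED_SQL, {"pid": product_id}).fetchall()

related = related_products(conn, 1)
print(related)
```

The extra tie-breaker in the inner ORDER BY is my addition; without it, products with equal counts can come back in any order.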

Related

MySQL View in place of subquery does not return the same result

The query below is grabbing some information about a category of toys and showing the most recent sale price for three levels of condition (e.g., Brand New, Used, Refurbished). The price for each sale is almost always different. One other thing: the sales table row ids are not necessarily in chronological order (e.g., a toy with a sale id of 5 could have been sold later than a toy with a sale id of 10).
This query works but is not performant. It runs in a manageable amount of time, usually about 1s. However, I need to add yet another left join to include some more data, which causes the query time to balloon up to about 9s, no bueno.
Here is the working but nonperformant query:
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN (
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
) AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
But like I said it's slow. The sales table has about 200k rows.
What I tried to do was create the subquery as a view, e.g.,
CREATE VIEW sales_view AS
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
Then replace the subquery with the view, like
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN sales_view AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
Unfortunately, this change causes the query to no longer grab the most recent sale, and the sales price it returns is no longer the most recent.
Why is it that the table view doesn't return the same result as the same select as a subquery?
After reading just about every top-n-per-group stackoverflow question and blog article I could find, getting a query that actually worked was fantastic. But now that I need to extend the query one more step I'm running into performance issues. If anybody wants to sidestep the above question and offer some ways to optimize the original query, I'm all ears!
Thanks for any and all help.
The solution to the subquery performance issue was to use the answer provided here: Groupwise maximum
I thought that this approach could only be used when querying a single table, but indeed it works even when you've joined many other tables. You just have to left join the same table twice using the s.date_sold < s2.date_sold join condition and make sure the where clause looks for the null value in the second table's id column.
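For anyone landing here, this is roughly what that self-join looks like, sketched against an in-memory SQLite database from Python. The sale_id column name and the sample rows are assumptions for illustration (the real sales table also has the invalid flag, left out here to keep it short).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,      -- hypothetical name for the row id
    catalog_product_id INTEGER,
    condition_id INTEGER,
    date_sold TEXT,
    sold_price REAL
);
-- Note that sale 5 happened later than sale 10: ids are not chronological.
INSERT INTO sales VALUES
  (5,  100, 1, '2020-03-01', 30.0),
  (10, 100, 1, '2020-01-15', 25.0),
  (11, 100, 2, '2020-02-01', 12.0),
  (12, 100, 2, '2020-02-20', 14.0);
""")

# Left join each sale to any later sale of the same product and condition.
# A row with no later match (s2.sale_id IS NULL) is the most recent one.
LATEST_SQL = """
SELECT s.catalog_product_id, s.condition_id, s.date_sold, s.sold_price
FROM sales AS s
LEFT JOIN sales AS s2
  ON s2.catalog_product_id = s.catalog_product_id
 AND s2.condition_id = s.condition_id
 AND s.date_sold < s2.date_sold
WHERE s2.sale_id IS NULL
ORDER BY s.catalog_product_id, s.condition_id
"""

latest = conn.execute(LATEST_SQL).fetchall()
print(latest)
```

One caveat with this pattern: if two sales in the same group share the same latest date_sold, both rows survive, so a tie-breaker on the id column may be needed.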

How can I improve this inner join query?

My database has 3 tables. One is called Customer, one is called Orders, and one is called RMA. The RMA table has the info regarding returns. I'll include a screen shot of all 3 so you can see the appropriate attributes. This is the code of the query I'm working on:
SELECT State, SKU, count(*)
from Orders INNER JOIN Customer ON Orders.Customer_ID = Customer.CustomerID
INNER JOIN RMA ON Orders.Order_ID = RMA.Reason
Group by SKU
Order by SKU
LIMIT 10;
I'm trying to get how much of each product(SKU) is returned in each state(State). Any help would really be appreciated. I'm not sure why, but anytime I include a JOIN statement, my query takes anywhere from 5 minutes to 20 minutes to process.
[Screenshots of the three tables, including the Customer table and the RMA table]
Your query should look like this:
SELECT c.State, o.SKU, COUNT(*)
FROM Orders o INNER JOIN
Customer c
ON o.Customer_ID = c.CustomerID JOIN
RMA
ON o.Order_ID = RMA.Order_Id
GROUP BY c.State, o.SKU
ORDER BY SKU;
Your issue is probably the incorrect JOIN condition between Orders and RMA.
If you have primary keys properly declared on the tables, then this query should have good-enough performance.
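As a quick sanity check of the corrected join, here is a small sketch using Python's sqlite3 with made-up rows. The RMA_ID column and all sample data are assumptions; the other column names come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, State TEXT);
CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Customer_ID INTEGER, SKU TEXT);
CREATE TABLE RMA (RMA_ID INTEGER PRIMARY KEY, Order_Id INTEGER, Reason TEXT);
INSERT INTO Customer VALUES (1, 'TX'), (2, 'CA');
INSERT INTO Orders VALUES (1, 1, 'A'), (2, 1, 'A'), (3, 2, 'A'), (4, 2, 'B');
INSERT INTO RMA VALUES (1, 1, 'broken'), (2, 2, 'late'), (3, 4, 'wrong item');
""")

# Join RMA on the order id (not on Reason) and group by both State and SKU,
# so each row is "number of returns of this SKU in this state".
RETURNS_SQL = """
SELECT c.State, o.SKU, COUNT(*) AS returns
FROM Orders AS o
JOIN Customer AS c ON o.Customer_ID = c.CustomerID
JOIN RMA AS r ON o.Order_ID = r.Order_Id
GROUP BY c.State, o.SKU
ORDER BY o.SKU, c.State
"""

returns_by_state = conn.execute(RETURNS_SQL).fetchall()
print(returns_by_state)
```

Order 3 has no RMA row, so it does not contribute to any count.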
Given that you are joining with an Orders table, I'm going to assume it contains every order the company has ever taken. That can be quite large and would likely cause the slowness you are seeing.
You can likely improve this query by placing some constraint on the Orders you select; restricting the date range is a common way to do this. If you provide more information about what the query is for and how large the dataset is, everyone will be able to give better guidance on which filters would work best.

Slow aggregate query with join on same table

I have a query to show customers and the total dollar value of all their orders. The query takes about 100 seconds to execute.
I'm querying on an ExpressionEngine CMS database. ExpressionEngine uses one table exp_channel_data, for all content. Therefore, I have to join on that table for both customer and order data. I have about 14,000 customers, 30,000 orders and 160,000 total records in that table.
Can I change this query to speed it up?
SELECT link.author_id AS customer_id,
customers.field_id_122 AS company,
Sum(orders.field_id_22) AS total_orders
FROM exp_channel_data customers
JOIN exp_channel_titles link
ON link.author_id = customers.field_id_117
AND customers.channel_id = 7
JOIN exp_channel_data orders
ON orders.entry_id = link.entry_id
AND orders.channel_id = 3
GROUP BY customer_id
Thanks, and please let me know if I should include other information.
UPDATE SOLUTION
My apologies. I noticed that entry_id for the exp_channel_data table customers corresponds to author_id for the exp_channel_titles table, so I don't have to use field_id_117 in the join. field_id_117 duplicates entry_id, but in a TEXT field, and joining on that text field slowed things down. The query now takes 3 seconds.
However, the inner join solution posted by @DRapp takes 1.5 seconds. Here is his SQL with a minor edit:
SELECT
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
FROM
( SELECT
t.author_id,
SUM( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
JOIN
exp_channel_titles t ON t.entry_id = o.entry_id AND o.channel_id = 3
GROUP BY
t.author_id ) PQ
JOIN
exp_channel_data c ON PQ.author_id = c.entry_id AND c.channel_id = 7
ORDER BY CustomerID
If these are all the same table, then every alias instance has the same columns across the board.
I would ensure an index on (channel_id, entry_id, field_id_117) if possible, and another index on (author_id) for the prequery of order totals.
Then, start with what will become the inner query, doing nothing but a per-customer sum of order amounts. Since the join uses author_id as the customer ID, just query and sum that first. I don't completely understand the (what I would consider) poor design of the structure, or what channel_id really indicates, but you don't want to duplicate summation values because of these other things in the mix.
select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id
If that is correct on the per customer (via author_id column), then that can be wrapped as follows
select
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
from
( select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id ) PQ
JOIN exp_channel_data c
on PQ.author_id = c.field_id_117
AND c.channel_id = 7
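Here is a rough, runnable sketch of the "sum first, then join" idea using Python's sqlite3. It assumes, as in the question's original query, that author_id lives on exp_channel_titles and that customer rows are matched on entry_id; the index names and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exp_channel_titles (entry_id INTEGER PRIMARY KEY, author_id INTEGER);
CREATE TABLE exp_channel_data (
    entry_id INTEGER PRIMARY KEY,
    channel_id INTEGER,
    field_id_22 INTEGER,   -- order total (channel 3 rows)
    field_id_122 TEXT      -- company name (channel 7 rows)
);
-- Hypothetical index names; the columns follow the advice above.
CREATE INDEX idx_data_channel_entry ON exp_channel_data (channel_id, entry_id);
CREATE INDEX idx_titles_author ON exp_channel_titles (author_id);

INSERT INTO exp_channel_data VALUES
  (1, 7, NULL, 'Acme'), (2, 7, NULL, 'Globex'),
  (101, 3, 50, NULL), (102, 3, 25, NULL), (103, 3, 10, NULL);
INSERT INTO exp_channel_titles VALUES (101, 1), (102, 1), (103, 2);
""")

# The inner query produces exactly one row per customer before any join to
# the customer rows, so the join cannot multiply the summed amounts.
TOTALS_SQL = """
SELECT pq.author_id AS customer_id, c.field_id_122 AS company, pq.total_orders
FROM (
    SELECT t.author_id, SUM(o.field_id_22) AS total_orders
    FROM exp_channel_data AS o
    JOIN exp_channel_titles AS t ON t.entry_id = o.entry_id
    WHERE o.channel_id = 3
    GROUP BY t.author_id
) AS pq
JOIN exp_channel_data AS c ON c.entry_id = pq.author_id AND c.channel_id = 7
ORDER BY customer_id
"""

totals = conn.execute(TOTALS_SQL).fetchall()
print(totals)
```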
Can you post the results of an EXPLAIN query?
I'm guessing that your tables are not indexed well for this operation. All of the columns that you join on should probably be indexed. As a first guess I'd look at indexing exp_channel_data.field_id_117
Try something like this. You may have an error in your joins, so also check that the join columns are correct for your database. If the join conditions are wrong, you can end up with what is effectively a cross join, which takes a long time on large data.
select
link.author_id as customer_id,
customers.field_id_122 as company,
sum(orders.field_id_22) as total_or_orders
from exp_channel_data customers
join exp_channel_titles link on (link.author_id = customers.field_id_117 and
customers.channel_id = 7)
join exp_channel_data orders on (orders.entry_id = link.entry_id and orders.channel_id = 3)
group by customer_id

SQL 3 table join

I have 3 tables which need to be linked in an SQL statement (I'm using PHP - MySQL if it helps). I need to extract all orders where the vendor field from the third table equals '3', as below:
orders      orders_items      items
order_id -> order_id
            item_id       ->  id
                              vendor = '3'
There are many ways to do this I believe with various WHERE and JOINS but I'm asking for the most efficient methods in comparison to my method below:
SELECT
orders.order_id
FROM
items, orders
INNER JOIN
orders_items
ON
orders.order_id = orders_items.order_id
WHERE
orders_items.item_id = items.id
AND
items.vendor = '3'
GROUP BY
orders.order_id
Using comma (,) notation is not universally considered bad practice, but I think only a small minority still favour it. Even Oracle (whose users seem to be the most vocal supporters of that syntax) recommends against using it.
But I don't know anyone who would support mixing , and ANSI-92's JOIN syntax. It's just asking for trouble.
SELECT
orders.order_id
FROM
orders
INNER JOIN
orders_items
ON orders.order_id = orders_items.order_id
INNER JOIN
items
ON orders_items.item_id = items.id
WHERE
items.vendor = '3'
GROUP BY
orders.order_id
The SQL Optimiser doesn't execute that exactly as you specified it. SQL is just an expression from which the optimiser derives a plan that gives a matching result. When you write it as above, the optimiser will find what it sees as the best order in which to filter, join, sort, etc., and the best indexes to use for those things.
EDIT
I've noticed people supporting DISTINCT over GROUP BY.
While DISTINCT is slightly shorter, it is not any quicker, and does place restrictions on you. You can't later add COUNT(*) for example, but with GROUP BY you can.
In short, GROUP BY can do anything DISTINCT can, but that's not true the other way around. I only use DISTINCT in very trivial pieces of code so I can get a whole query on one line. Even then I often regret it a little later, as the code develops and I need to revert to GROUP BY.
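To make the DISTINCT versus GROUP BY point concrete, here is a small sketch with Python's sqlite3 and invented sample rows (table and column names follow the question). Both forms return the same order ids, but only the GROUP BY form extends naturally to a per-order count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
CREATE TABLE items (id INTEGER PRIMARY KEY, vendor TEXT);
CREATE TABLE orders_items (order_id INTEGER, item_id INTEGER);
INSERT INTO orders VALUES (1), (2), (3);
INSERT INTO items VALUES (1, '3'), (2, '3'), (3, '5');
INSERT INTO orders_items VALUES (1, 1), (1, 2), (2, 3), (3, 1);
""")

JOINS = """
FROM orders
INNER JOIN orders_items ON orders.order_id = orders_items.order_id
INNER JOIN items ON orders_items.item_id = items.id
WHERE items.vendor = '3'
"""

# Same result set either way...
with_distinct = conn.execute(
    "SELECT DISTINCT orders.order_id " + JOINS + " ORDER BY orders.order_id"
).fetchall()
with_group_by = conn.execute(
    "SELECT orders.order_id " + JOINS
    + " GROUP BY orders.order_id ORDER BY orders.order_id"
).fetchall()

# ...but GROUP BY can also report how many matching items each order has.
with_counts = conn.execute(
    "SELECT orders.order_id, COUNT(*) " + JOINS
    + " GROUP BY orders.order_id ORDER BY orders.order_id"
).fetchall()

print(with_distinct, with_group_by, with_counts)
```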
select o.order_id from orders o inner join orders_items oi on o.order_id = oi.order_id inner join items i on oi.item_id = i.id where i.vendor='3';
There are many ways to do this: joins, a subquery, or an IN clause. Which is best depends on whether you care more about time or memory, and largely on which columns are indexed and how much data the joined tables hold.
You don't need the GROUP BY, just make a DISTINCT if you need to remove duplicates:
SELECT DISTINCT o.order_id
FROM orders o
INNER JOIN orders_items oi ON oi.order_id = o.order_id
INNER JOIN items i ON i.id = oi.item_id
where i.vendor = '3'
And also, use INNER JOIN on all tables :)
This is efficient and will work too:
SELECT
DISTINCT(orders.order_id)
FROM
items
INNER JOIN orders_items on (items.id=orders_items.item_id )
inner join orders on (orders.order_id=orders_items.order_id)
WHERE
items.vendor = '3'
SELECT
o.order_id
FROM
orders o
INNER JOIN orders_items oi ON o.order_id = oi.order_id
INNER JOIN items i ON oi.item_id = i.id
WHERE
i.vendor = 3
The table1, table2 syntax isn't something that I've used, but I imagine listing the tables as joins is more efficient as that seems to be the most accepted way.
Also, you don't need to put speech marks on the vendor criteria if the field is an integer.
SELECT O.order_id AS Id
FROM orders O
INNER JOIN orders_items OI
ON O.order_id = OI.order_id
INNER JOIN items I
ON OI.item_id = I.id
WHERE I.vendor = '3'
GROUP BY O.order_id

Does this query look optimized?

I'm writing a query for an application that needs to list all the products with the number of times they have been purchased.
I came up with this and it works, but I am not too sure how optimized it is. My SQL is really rusty due to my heavy usage of ORMs, but in this case a query is a much more elegant solution.
Can you spot anything wrong (approach wise) with the query?
SELECT products.id,
products.long_name AS name,
count(oi.order_id) AS sold
FROM products
LEFT OUTER JOIN
( SELECT * FROM orderitems
INNER JOIN orders ON orderitems.order_id = orders.id
AND orders.paid = 1 ) AS oi
ON oi.product_id = products.id
GROUP BY products.id
The schema (with relevant fields) looks like this:
*orders* id, paid
*orderitems* order_id, product_id
*products* id
UPDATE
This is for MySQL
I'm not sure about the "(SELECT *" ... business.
This executes (always a good start) and I think is equivalent to what was posted.
SELECT products.id,
products.long_name AS name,
count(oi.order_id) AS sold
FROM products
LEFT OUTER JOIN
orderitems AS oi
INNER JOIN
orders
ON oi.order_id = orders.id AND orders.paid = 1
ON oi.product_id = products.id
GROUP BY products.id
Here is a solution for those of us who are nesting impaired. (I get so confused when I start nesting joins.)
SELECT products.id,
products.long_name AS name,
count(oi.order_id) AS sold
FROM orders
INNER JOIN orderitems AS oi ON oi.order_id = orders.id AND orders.paid = 1
RIGHT JOIN products ON oi.product_id = products.id
GROUP BY products.id
However, I tested your solution, Mike's and mine on MS SQL Server and the query plans are identical. I can't speak for MySql but if MS SQL Server is anything to go by, you may find the performance of all three solutions equivalent. If that is the case I guess you pick which solution is clearest to you.
Does it give you the right answer?
Except for just modifying it to get rid of the SELECT in the inner query, I don't see anything wrong with it.
Well you have "LEFT OUTER JOIN" that can be a performance issue depending on your Database.
Last I remember it caused hell on MySQL, and it doesn't exist in SQLite. I think Oracle can handle it OK, and I guess DB2 and MSSQL too.
EDIT: If I remember correctly LEFT OUTER JOIN can be orders of magnitude slower on MySQL, but please correct me if I'm outdated here :)
Untested code, but try it:
SELECT products.id,
MIN(products.long_name) AS name,
count(oi.order_id) AS sold
FROM (products
LEFT OUTER JOIN orderitems AS oi ON oi.product_id = products.id)
INNER JOIN orders AS o ON oi.order_id = o.id
WHERE o.paid = 1
GROUP BY products.id
I don't know whether the parentheses are needed for the LEFT OUTER JOIN, nor whether MySQL allows multiple joins. However, MIN(products.long_name) returns just the description, since every products.id has only one description.
Perhaps the parentheses need to be around the INNER JOIN.
Here's a subquery form.
SELECT
p.id,
p.long_name AS name,
(SELECT COUNT(*) FROM OrderItems oi WHERE oi.product_id = p.id AND oi.order_id in
(SELECT o.id FROM Orders o WHERE o.Paid = 1)
) as sold
FROM Products p
It should perform roughly equivalent to the join form. If it doesn't, let me know.
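If you want to check that a join form and a correlated-subquery form agree on results (performance is a separate question), here is a small sketch with Python's sqlite3. It assumes product_id lives on orderitems, as in the schema posted in the question; the sample rows are invented, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, long_name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, paid INTEGER);
CREATE TABLE orderitems (order_id INTEGER, product_id INTEGER);
INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
INSERT INTO orders VALUES (1, 1), (2, 0), (3, 1);
INSERT INTO orderitems VALUES (1, 1), (1, 2), (2, 1), (3, 1);
""")

# Join form: LEFT JOIN keeps products that were never sold (count of 0).
JOIN_SQL = """
SELECT p.id, p.long_name AS name, COUNT(oi.order_id) AS sold
FROM products AS p
LEFT JOIN (
    SELECT orderitems.order_id, orderitems.product_id
    FROM orderitems
    JOIN orders ON orderitems.order_id = orders.id AND orders.paid = 1
) AS oi ON oi.product_id = p.id
GROUP BY p.id, p.long_name
ORDER BY p.id
"""

# Subquery form: count the paid order items for each product row.
SUBQUERY_SQL = """
SELECT p.id, p.long_name AS name,
       (SELECT COUNT(*) FROM orderitems AS oi
        WHERE oi.product_id = p.id
          AND oi.order_id IN (SELECT o.id FROM orders AS o WHERE o.paid = 1)
       ) AS sold
FROM products AS p
ORDER BY p.id
"""

join_rows = conn.execute(JOIN_SQL).fetchall()
subquery_rows = conn.execute(SUBQUERY_SQL).fetchall()
print(join_rows)
```

Order 2 is unpaid, so its item is not counted, and Gizmo still appears with a count of 0 in both forms.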