I have been going round in circles now for a couple of days with this query and it's driving me nuts. I've found partial answers using max(date) and left joins, but I have so far failed to achieve the right result, so I would be very grateful for any help.
I have 3 tables:
inventory
    id
    lots of other fields...
sale_price
    id
    date
    *ticket_type_id
    *inventory_id
ticket_type
    id
    ticket_name
The sale_price table is updated with a date and new ticket type every time the sale price is changed. All I want to do is:
find the latest record in sale_price
display the inventory information related to this
display the ticket_name relating to this
The query that I have arrived at is:
$result = "
SELECT DISTINCT inv.*, tt.ticket_name, MAX(sp.date) AS spdate
FROM sale_price sp
LEFT JOIN ticket_type tt ON sp.ticket_type_id = tt.id
JOIN inventory inv ON sp.inventory_id = inv.id
GROUP BY sp.inventory_id
";
but it's clearly not working. I would be really grateful for any help you can give and (if it's not too much to ask) a bit of detail on where I'm going wrong.
many thanks in anticipation!
If you only want the latest record overall, then you could use the following query:
SELECT
s.ticket_type_id,
s.inventory_id,
t.ticket_name,
i.lots of other fields
FROM
sale_price s
INNER JOIN
inventory i
ON
s.inventory_id = i.id
INNER JOIN
ticket_type t
ON
s.ticket_type_id = t.id
WHERE
s.date = (
SELECT
MAX(s1.date)
FROM
sale_price s1
);
If you need the latest record for every inventory item, then you have to extend this query by adding an equality check on inventory_id to the WHERE clause of the subselect:
SELECT
s.ticket_type_id,
s.inventory_id,
t.ticket_name,
i.lots of other fields
FROM
sale_price s
INNER JOIN
inventory i
ON
s.inventory_id = i.id
INNER JOIN
ticket_type t
ON
s.ticket_type_id = t.id
WHERE
-- gives us the newest date per inventory_id
s.date = (
SELECT
MAX(s1.date)
FROM
sale_price s1
WHERE
s.inventory_id = s1.inventory_id
);
By the way, this is a correlated subquery and it could get slow.
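If it does become a bottleneck, a composite index that covers the correlated lookup usually helps. A minimal sketch (the index name here is just an example):
-- lets the subselect resolve MAX(s1.date) per inventory_id from the index alone
CREATE INDEX idx_sale_price_inventory_date ON sale_price (inventory_id, date);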
The query below grabs some information about a category of toys and shows the most recent sale price for three levels of condition (e.g., Brand New, Used, Refurbished). The price for each sale is almost always different. One other thing: the sales table row IDs are not necessarily in chronological order (e.g., a toy with a sale id of 5 could have been sold later than a toy with a sale id of 10).
This query works but is not performant. It runs in a manageable amount of time, usually about 1s. However, I need to add yet another left join to include some more data, which causes the query time to balloon up to about 9s, no bueno.
Here is the working but nonperformant query:
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name,
       cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price
FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN (
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
) AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
But like I said it's slow. The sales table has about 200k rows.
What I tried to do was create the subquery as a view, e.g.,
CREATE VIEW sales_view AS
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
Then replace the subquery with the view, like
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name,
       cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price
FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN sales_view AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
Unfortunately, with this change the query no longer grabs the most recent sale, so the sale price it returns is not the latest one.
Why doesn't the view return the same result as the identical SELECT used as a subquery?
After reading just about every top-n-per-group stackoverflow question and blog article I could find, getting a query that actually worked was fantastic. But now that I need to extend the query one more step I'm running into performance issues. If anybody wants to sidestep the above question and offer some ways to optimize the original query, I'm all ears!
Thanks for any and all help.
The solution to the subquery performance issue was to use the answer provided here: Groupwise maximum
I thought that this approach could only be used when querying a single table, but indeed it works even when you've joined many other tables. You just have to left join the sales table against itself (so it appears twice) using the s.date_sold < s2.date_sold join condition and make sure the WHERE clause checks that the second copy's id column is NULL.
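For anyone else who lands here, this is roughly what that pattern looks like applied to the query above. It is only a sketch, and sale_id stands in for whatever the primary key column of the sales table is actually called:
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name,
       cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price
FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
-- first copy of sales: the candidate "latest" row per product and condition
LEFT JOIN sales AS s ON s.catalog_product_id = cp.catalog_product_id
                    AND s.invalid = 0 AND s.condition_id <= 3
-- second copy of sales: any newer sale for the same product and condition
LEFT JOIN sales AS s2 ON s2.catalog_product_id = s.catalog_product_id
                     AND s2.condition_id = s.condition_id
                     AND s2.invalid = 0
                     AND s2.date_sold > s.date_sold
WHERE tc.toy_category_id = 1
  AND s2.sale_id IS NULL   -- no newer row exists, so s is the most recent sale
ORDER BY t.toy_id ASC, s.condition_id ASC
If a toy can map to more than one catalog_products row, or two sales share the same date_sold, this can still return more than one row per toy and condition, so you may want to keep the original GROUP BY or break ties on the id column as well.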
This is a slight variant of the question I asked here
SQL Query for getting maximum value from a column
I have a Person Table and an Activity Table with the following data
-- PERSON table data --
-- ACTIVITY table data --
I have this data in the database about users spending time on particular activities. I want to get, for each user, the row where they have spent the maximum number of hours.
My Query is
Select p.Id as 'PersonId',
p.Name as 'Name',
act.HoursSpent as 'Hours Spent',
act.Date as 'Date'
From Person p
Left JOIN (Select MAX(HoursSpent), Date from Activity
Group By HoursSpent, Date) act
on act.personId = p.Id
but it is giving me all the rows for each person, not just the one with the maximum number of hours spent.
This should be my result.
You have several issues with your query:
The subquery to get hours is aggregated by date, not person.
You don't have a way to bring in other columns from activity.
You can take this approach -- joins and group by, but it requires two joins:
select p.*, a.* -- the columns you want
from Person p left join
activity a
on a.personId = p.id left join
(select personid, max(HoursSpent) as max_hoursspent
from activity a
group by personid
) ma
on ma.personId = a.personId and
ma.max_hoursspent = a.hoursspent;
Note that this can return duplicates for a given person -- if there are ties for the maximum.
This is written more colloquially using row_number():
select p.*, a.* -- the columns you want
from Person p left join
(select a.*,
row_number() over (partition by a.personid order by a.hoursspent desc) as seqnum
from activity a
) a
on a.personId = p.id and a.seqnum = 1;
Please excuse the awkward title for this question.
I am trying to keep track of the prices of some products, ideally without storing a load of entries that have the same product_id and price and differ only by date. There are 160,000 products and the update is run every day.
I have the following tables:
products
    product_id, price, date_added, date_updated
    (id Primary Key)
price_index
    product_id, price, date_updated
    (product_id, price, date_updated primary unique)
I am using the following query; however, I cannot sort the select/join by the most recent date.
INSERT INTO price_index
(SELECT p.product_id,p.product_price,p.date_updated FROM products p
LEFT JOIN price_index pi ON (p.product_id = pi.product_id)
WHERE (p.product_price <> pi.product_price OR pi.product_price IS NULL))
If I replace the JOIN with
JOIN (SELECT * FROM price_index ORDER BY date_updated DESC) pi
it will do the correct sorting first; however, this doesn't seem to use any indexes and it keeps freezing my MySQL GUI tool.
Is there a better way to achieve what I am trying to do here?
UPDATE:
The update script updates the products table with the current price first. I then need to perform the check/update on the price_index table.
If I understand correctly, you want to insert a new row only when the most recent price has changed. I'm a little confused about the tables. Your query mentions three but your text only mentions two. I'll assume that product_prices and price_index are the same thing.
INSERT INTO product_prices(product_id, price, date_updated)
SELECT p.product_id, p.product_price, p.date_updated
FROM products p LEFT JOIN
price_index pi
ON p.product_id = pi.product_id AND
p.product_price = pi.product_price LEFT JOIN
(SELECT product_id, max(date_updated) as maxdu
FROM price_index
GROUP BY product_id
) pimax
ON pi.product_id = pimax.product_id and pi.date_updated = pimax.maxdu
WHERE pimax.product_id IS NULL;
This uses a left join to look for a match on the product, on the price, and on the most recent update date. If any of the three conditions does not match, then the insert takes place.
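As a side note, the aggregate subquery is the part most likely to slow down as price_index grows; an index that orders each product's rows by date lets MySQL resolve MAX(date_updated) per product cheaply. A sketch, using the column names from the table definition in the question (the index name is just an example):
-- supports SELECT product_id, MAX(date_updated) ... GROUP BY product_id
CREATE INDEX idx_price_index_product_date ON price_index (product_id, date_updated);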
I have a query to show customers and the total dollar value of all their orders. The query takes about 100 seconds to execute.
I'm querying on an ExpressionEngine CMS database. ExpressionEngine uses one table exp_channel_data, for all content. Therefore, I have to join on that table for both customer and order data. I have about 14,000 customers, 30,000 orders and 160,000 total records in that table.
Can I change this query to speed it up?
SELECT link.author_id AS customer_id,
customers.field_id_122 AS company,
Sum(orders.field_id_22) AS total_orders
FROM exp_channel_data customers
JOIN exp_channel_titles link
ON link.author_id = customers.field_id_117
AND customers.channel_id = 7
JOIN exp_channel_data orders
ON orders.entry_id = link.entry_id
AND orders.channel_id = 3
GROUP BY customer_id
Thanks, and please let me know if I should include other information.
UPDATE SOLUTION
My apologies. I noticed that entry_id in the exp_channel_data customers rows corresponds to author_id in the exp_channel_titles table, so I don't have to use field_id_117 in the join. field_id_117 duplicates entry_id, but in a TEXT field, and joining on that TEXT field slowed things down. The query now runs in 3 seconds.
However, the inner-join solution posted by @DRapp runs in 1.5 seconds. Here is his SQL with a minor edit:
SELECT
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
FROM
( SELECT
t.author_id,
SUM( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
JOIN
exp_channel_titles t ON t.author_id = o.entry_id AND o.channel_id = 3
GROUP BY
t.author_id ) PQ
JOIN
exp_channel_data c ON PQ.author_id = c.entry_id AND c.channel_id = 7
ORDER BY CustomerID
If this is the same table, then the same columns apply across the board to every aliased instance of it.
I would ensure there is an index on (channel_id, entry_id, field_id_117) if possible, and another index on (author_id) for the prequery of order totals.
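Expressed as DDL, those suggestions look roughly like the statements below. This is only a sketch: the index names are made up, the prefix length on field_id_117 is needed because it is a TEXT column (per the update above), and you should confirm which table actually carries author_id in your install (exp_channel_titles in a stock ExpressionEngine schema):
-- composite index for the channel/entry lookups; TEXT columns need a prefix length
CREATE INDEX idx_data_channel_entry_117 ON exp_channel_data (channel_id, entry_id, field_id_117(32));
-- index for the per-customer order-total prequery
CREATE INDEX idx_titles_author ON exp_channel_titles (author_id);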
Then, start with what will become an inner query that does nothing but a per-customer sum of order amounts. Since the join uses "author_id" as the customer ID, just query/sum on that first. Not completely understanding the (what I would consider) poor design of the structure, or what "channel_id" really indicates, you don't want to duplicate summation values because of the other things in the mix.
select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id
If that is correct on a per-customer basis (via the author_id column), then it can be wrapped as follows:
select
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
from
( select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id ) PQ
JOIN exp_channel_data c
on PQ.author_id = c.field_id_117
AND c.channel_id = 7
Can you post the results of an EXPLAIN query?
I'm guessing that your tables are not indexed well for this operation. All of the columns that you join on should probably be indexed. As a first guess, I'd look at indexing exp_channel_data.field_id_117.
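To see what the optimizer is currently doing, prefix the original query with EXPLAIN and look at the key and rows columns of the output:
EXPLAIN
SELECT link.author_id AS customer_id,
       customers.field_id_122 AS company,
       SUM(orders.field_id_22) AS total_orders
FROM exp_channel_data customers
JOIN exp_channel_titles link
  ON link.author_id = customers.field_id_117
 AND customers.channel_id = 7
JOIN exp_channel_data orders
  ON orders.entry_id = link.entry_id
 AND orders.channel_id = 3
GROUP BY customer_id;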
Try something like this. Possibly you have an error in your joins; also check whether the join columns are correct in your database. An accidental cross join can take a long time on large data if the join columns are not right.
select
link.author_id as customer_id,
customers.field_id_122 as company,
sum(orders.field_id_22) as total_orders
from exp_channel_data customers
join exp_channel_titles link on (link.author_id = customers.field_id_117 and
    customers.channel_id = 7)
join exp_channel_data orders on (orders.entry_id = link.entry_id and
    orders.channel_id = 3)
group by customer_id
I have a query that I feel is very bulky and could do with optimisation. The first thing would obviously be replacing the NOT IN subquery with a join, but that affects the sub-subquery that I have. I'd appreciate suggestions or a workaround.
This is the query
SELECT *
FROM lastweeksales
WHERE productID = 1234
  AND retailer NOT IN (
        SELECT retailer
        FROM sales
        WHERE productID IN (
              SELECT productID
              FROM products
              WHERE publisher = 123
        )
        AND DATE = date(now())
  )
Basically, I want to get last week's sales rows for a product, excluding any retailer that made a sale today, where today's sales are restricted to products by a certain publisher.
You can easily collapse the two inner subqueries into an INNER JOIN. For the outer one you should use a LEFT OUTER JOIN and then filter on retailer IS NULL, like this:
SELECT lws.*
FROM lastweeksales lws
LEFT JOIN (SELECT s.retailer
FROM sales s
JOIN products p USING (productID)
WHERE p.publisher = 123
AND s.date = date(now())) AS r
ON lws.retailer = r.retailer
WHERE lws.productID = 1234
  AND r.retailer IS NULL;
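The same anti-join can also be written with NOT EXISTS and a correlated subquery, which some people find reads closer to the original intent. This is just an equivalent sketch using the same tables and columns; performance is usually comparable, though it is worth confirming with EXPLAIN:
SELECT lws.*
FROM lastweeksales lws
WHERE lws.productID = 1234
  AND NOT EXISTS (
        SELECT 1
        FROM sales s
        JOIN products p ON p.productID = s.productID
        WHERE p.publisher = 123
          AND s.date = DATE(NOW())
          AND s.retailer = lws.retailer   -- correlate on retailer, as in the LEFT JOIN above
      );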