RoR selecting newest "distinct by column" rows - mysql

I have a table which I use to log item price change over time.
I'm trying to write a method which grabs the entire set of items (without duplicates), together with their latest prices.
That means that a row with an item_id of 2 may appear several times in my table, as may a row with an item_id of 3, and so on, but the result should include each item only once, with its latest price.
I'm trying to figure out a way (without using Item.find_by_sql() if possible), to return the entire set of items and their latest prices.
Currently I have the following:
SELECT * FROM
(SELECT * FROM item_logs
ORDER BY created_at DESC) inner_table
GROUP BY item_id
It does work, but it seems wrong to do it like this. I'm looking for a more elegant way, since the current implementation requires find_by_sql, which is not very flexible.

I'm not sure it's any better, but another option would be to use joins:
ItemLog.joins(
'join (select item_id, max(created_at) as created_at from item_logs group by 1)
as i on i.item_id = item_logs.item_id and i.created_at = item_logs.created_at'
)
This is longer than your find_by_sql solution and could be a more expensive query on your database, but it keeps the result as an ActiveRecord relation, so you can chain other methods onto it.
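The SQL behind that join can be tried in isolation. The following is a minimal sketch, not the Rails call itself: it uses Python's sqlite3 module as a stand-in for MySQL, and the price column and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_logs ("
    "id INTEGER PRIMARY KEY, item_id INTEGER, price REAL, created_at TEXT)"
)
# Hypothetical sample data: two items, two price changes each.
conn.executemany(
    "INSERT INTO item_logs (item_id, price, created_at) VALUES (?, ?, ?)",
    [
        (2, 10.0, "2013-01-01 09:00:00"),
        (2, 12.5, "2013-02-01 09:00:00"),
        (3, 7.0, "2013-01-15 09:00:00"),
        (3, 6.5, "2013-03-01 09:00:00"),
    ],
)

# Join each row against the newest created_at of its item_id,
# which keeps only the latest log row per item.
latest_sql = """
SELECT item_logs.item_id, item_logs.price
FROM item_logs
JOIN (
    SELECT item_id, MAX(created_at) AS created_at
    FROM item_logs
    GROUP BY item_id
) AS i
  ON i.item_id = item_logs.item_id
 AND i.created_at = item_logs.created_at
ORDER BY item_logs.item_id
"""
latest_prices = conn.execute(latest_sql).fetchall()
print(latest_prices)
```

If two log rows for the same item share the same created_at, this join returns both of them, so a tie-breaker would be needed in that case.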

Related

SQL: Correct Way to Grab First Row of a Group

I'm migrating a site to Google's Cloud SQL service, which has an odd default with ONLY_FULL_GROUP_BY, which means a common pattern that I use suddenly breaks down.
Consider the following:
SELECT `p`.`id`, `p`.`name`, `s`.`name` AS `latest_purchase`, `s`.`price` AS `latest_purchase_price`
FROM `Person` p
JOIN `Sales` s ON `s`.`person` = `p`.`ID`
GROUP BY `p`.`id`
ORDER BY `p`.`time` DESC
What I want is to get a list of results where each row is a unique person, with columns indicating the name and price of their most recent purchase.
The partial grouping behaviour I'm used to in MySQL is great for this, because it groups only on the person ID, and since my results are ordered the first row of each group is the one that I want, so I get the results that I expect.
But this isn't allowed with the SQL mode ONLY_FULL_GROUP_BY which requires that all selected items are either in the GROUP BY clause, or use aggregate functions to select a single result.
Neither of these works in the above example, because adding everything to the GROUP BY means I would get multiple results per person, while using aggregate functions could give an inaccurate result, as the sale price isn't necessarily the highest or lowest in the group (I could potentially end up with a sale name and price, neither of which is the latest).
Fortunately SQL mode is one of the settings Google SQL allows a user to change, so I've just done that for the time being (edit the instance and go to set flags).
However if I were to use another system in future where I can't group as I wish, then what is the "correct" way to do this when partial grouping isn't allowed?
I realise there are some similar questions on StackOverflow already, but none that I've found quite captures my problem (as they involve much simpler examples where aggregate functions can be used).
What you want is filtering not aggregation:
SELECT `p`.`id`, `p`.`name`, `s`.`name` AS `latest_purchase`, `s`.`price` AS `latest_purchase_price`
FROM `Person` p JOIN
`Sales` s
ON `s`.`person` = `p`.`ID`
WHERE s.time = (SELECT MAX(s2.time) FROM Sales s2 WHERE s2.person = s.person)
ORDER BY `p`.`time` DESC
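A small runnable sketch of the same filtering idea, assuming SQLite (through Python's sqlite3) in place of MySQL. The sample rows are hypothetical, and the result is ordered by the person id here because the sample Person table has no time column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Sales (
    id INTEGER PRIMARY KEY, person INTEGER, name TEXT, price REAL, time TEXT);
INSERT INTO Person (id, name) VALUES (1, 'Ann'), (2, 'Bob');
-- Ann's latest purchase (Rug) is neither her cheapest nor her most expensive.
INSERT INTO Sales (person, name, price, time) VALUES
    (1, 'Lamp', 30.0, '2016-01-01'),
    (1, 'Desk', 120.0, '2016-02-01'),
    (1, 'Rug', 60.0, '2016-03-01'),
    (2, 'Chair', 45.0, '2016-01-20'),
    (2, 'Mug', 5.0, '2016-01-10');
""")

# Keep only the sale whose time equals the newest time for that person.
latest_purchase_sql = """
SELECT p.id, p.name, s.name AS latest_purchase, s.price AS latest_purchase_price
FROM Person p
JOIN Sales s ON s.person = p.id
WHERE s.time = (SELECT MAX(s2.time) FROM Sales s2 WHERE s2.person = s.person)
ORDER BY p.id
"""
latest_purchases = conn.execute(latest_purchase_sql).fetchall()
print(latest_purchases)
```

No GROUP BY is involved, so the query does not depend on the ONLY_FULL_GROUP_BY setting.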

Query takes too long to run

I am running the query below to retrieve the unique latest result based on a date field within the same table. But this query takes too much time as the table grows. Any suggestion to improve this is welcome.
select
t2.*
from
(
select
(
select
id
from
ctc_pre_assets ti
where
ti.ctcassettag = t1.ctcassettag
order by
ti.createddate desc limit 1
) lid
from
(
select
distinct ctcassettag
from
ctc_pre_assets
) t1
) ro,
ctc_pre_assets t2
where
t2.id = ro.lid
order by
id
Our table may contain the same row multiple times, but each row has a different timestamp. My objective is based on a single column, for example assettag: I want to retrieve a single row for each assettag, the one with the latest timestamp.
It's simpler, and probably faster, to find the newest date for each ctcassettag and then join back to find the whole row that matches.
This does assume that no ctcassettag has multiple rows with the same createddate; if one does, you can get back more than one row for that ctcassettag.
SELECT
ctc_pre_assets.*
FROM
ctc_pre_assets
INNER JOIN
(
SELECT
ctcassettag,
MAX(createddate) AS createddate
FROM
ctc_pre_assets
GROUP BY
ctcassettag
)
newest
ON newest.ctcassettag = ctc_pre_assets.ctcassettag
AND newest.createddate = ctc_pre_assets.createddate
ORDER BY
ctc_pre_assets.id
EDIT: To deal with multiple rows with the same date.
You haven't actually said how to pick which row you want in the event that multiple rows are for the same ctcassettag on the same createddate. So, this solution just chooses the row with the lowest id from amongst those duplicates.
SELECT
ctc_pre_assets.*
FROM
ctc_pre_assets
WHERE
ctc_pre_assets.id
=
(
SELECT
lookup.id
FROM
ctc_pre_assets lookup
WHERE
lookup.ctcassettag = ctc_pre_assets.ctcassettag
ORDER BY
lookup.createddate DESC,
lookup.id ASC
LIMIT
1
)
This does still use a correlated sub-query, which is slower than a simple nested-sub-query (such as my first answer), but it does deal with the "duplicates".
You can change the rules on which row to pick by changing the ORDER BY in the correlated sub-query.
It's also very similar to your own query, but with one less join.
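The difference between the two queries on tied dates can be seen in a small sketch. It assumes SQLite (via Python's sqlite3) instead of MySQL and a made-up table in which tag 'A' has two rows with the same createddate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ctc_pre_assets (
    id INTEGER PRIMARY KEY, ctcassettag TEXT, createddate TEXT);
-- Rows 2 and 3 tie on the newest date for tag 'A'.
INSERT INTO ctc_pre_assets (id, ctcassettag, createddate) VALUES
    (1, 'A', '2012-01-01'),
    (2, 'A', '2012-02-01'),
    (3, 'A', '2012-02-01'),
    (4, 'B', '2012-01-15');
""")

# Join back on MAX(createddate): both tied rows come back.
join_back_sql = """
SELECT a.id
FROM ctc_pre_assets a
INNER JOIN (
    SELECT ctcassettag, MAX(createddate) AS createddate
    FROM ctc_pre_assets
    GROUP BY ctcassettag
) newest
   ON newest.ctcassettag = a.ctcassettag
  AND newest.createddate = a.createddate
ORDER BY a.id
"""

# Correlated sub-query with a tie-breaker: lowest id among the newest rows.
tie_break_sql = """
SELECT a.id
FROM ctc_pre_assets a
WHERE a.id = (
    SELECT lookup.id
    FROM ctc_pre_assets lookup
    WHERE lookup.ctcassettag = a.ctcassettag
    ORDER BY lookup.createddate DESC, lookup.id ASC
    LIMIT 1
)
ORDER BY a.id
"""
join_back_ids = [row[0] for row in conn.execute(join_back_sql)]
tie_break_ids = [row[0] for row in conn.execute(tie_break_sql)]
print(join_back_ids, tie_break_ids)
```

Changing `lookup.id ASC` to `lookup.id DESC` would pick row 3 instead of row 2 for tag 'A'.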
Nested queries often take longer than a conventional query. Can you prepend 'explain' to the query and post the results here? That will help us analyse exactly which query or table is taking so long to respond.
Check whether the table has indexes. Unindexed tables are not advisable (unless they obviously need to be unindexed) and are alarmingly slow when executing queries.
Beyond that, I think the best approach is to avoid writing nested queries altogether. It is better to run each of the queries separately and then use the results (in array or list format) in the second query.
First some questions that you should at least ask yourself, but maybe also give us an answer to improve the accuracy of our responses:
Is your data normalized? If yes, maybe you should make an exception to avoid this brutal subquery problem
Are you using indexes? If yes, which ones, and are you using them to the fullest?
Some suggestions to improve the readability and maybe performance of the query:
- Use joins
- Use group by
- Use aggregators
Example (untested, so might not work, but should give an impression):
SELECT t2.*
FROM (
SELECT ctcassettag, MAX(createddate) AS createddate
FROM ctc_pre_assets
GROUP BY ctcassettag
) ro
INNER JOIN ctc_pre_assets t2
ON t2.ctcassettag = ro.ctcassettag
AND t2.createddate = ro.createddate
ORDER BY t2.id
Using normalization is great, but there are a few cases where normalization causes more harm than good. This seems like one of them, but without your tables in front of me, I can't tell for sure.
Using distinct the way you are doing, I can't help but get the feeling you might not get all relevant results. Maybe someone else can confirm or deny this?
It's not that subqueries are all bad, but they tend to create massive scalability issues if written incorrectly. Make sure you use them the right way (google it?)
Indexes can potentially save you a bunch of time, if you actually use them. It's not enough to set them up; you have to write queries that actually use your indexes. Google this as well.
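As a rough illustration of checking whether a query uses an index, here is a sketch with SQLite through Python's sqlite3. It is not MySQL: MySQL's equivalent is to put EXPLAIN in front of the query, and its output looks different. The index name idx_assets_tag_created and the choice of columns (ctcassettag, createddate) are assumptions, picked to match the per-tag "newest row" lookup discussed in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ctc_pre_assets ("
    "id INTEGER PRIMARY KEY, ctcassettag TEXT, createddate TEXT)"
)
# Hypothetical composite index: equality column first, sort column second.
conn.execute(
    "CREATE INDEX idx_assets_tag_created "
    "ON ctc_pre_assets (ctcassettag, createddate)"
)

# Ask SQLite how it would run the per-tag "newest row" lookup.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM ctc_pre_assets WHERE ctcassettag = ? "
    "ORDER BY createddate DESC LIMIT 1",
    ("A",),
).fetchall()

# The last column of each plan row is the human-readable detail text.
plan_text = " ".join(str(row[-1]) for row in plan_rows)
uses_index = "idx_assets_tag_created" in plan_text
print(plan_text)
```

If the index name does not show up in the plan, the query is being answered by a full table scan, and either the index or the query needs to change.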

SELECT MIN (MAX) after SELECTING MIN value in MySQL

I have two tables: items and prices (one-to-many)
Each item has a default price, but this price can be overridden in the second table (under some circumstances).
First I had the problem of fetching all the items and pre-calculating the MINIMUM PRICE: the MIN of the default price and its current overriding price (if any).
You can see it here: http://sqlfiddle.com/#!2/f31d5/25
Luckily, it was solved, thanks to stackoverflow (you can see it here: Rails select subquery (without finder_sql, if possible)), but now I have similar kind of a problem!
Once I have all the items selected, can I determine the absolute MIN (or MAX) price AMONG the newly calculated field (min_price) ?
Of course I've tried something like this: http://sqlfiddle.com/#!2/f31d5/26
But, it did not work. Is there any smart SQL-true way to get these values ?
For this particular scenario in SQLFiddle it should return 5 (MIN), or 500 (MAX)
Of course I could have selected it straight-away from pricing table like this:
SELECT MIN(price) FROM prices;
But, I cannot rely on it, since item's default price might be lower and this way I cannot check it (I think, SELECT MIN/MAX does not work with JOIN or GROUP BY).
And also one thing to be aware of - I'm writing this for my search system and this is just a small part of it, so, "WHERE"-clause is pretty important there as well, because NOT all items are actually involved.
Is there any SMART way (SQL-way) I can solve this problem?
P.S Temporarily I've solved the problem simply by making a query that orders by min_price (ASC/DESC) and applying LIMIT 1 and then gathering the needed value from the item record itself, but I'm very unhappy with this solution, so I've decided to ask about it here.
Thanks
P.P.S Totally forgot to mention, that there is no way I can avoid this SQL-query, because I have a pagination system that actually appends LIMIT X, OFFSET Y. So, I need to select the value GLOBALLY, not just for a particular page!
You don't need the second call to min(). I think this does what you want:
SELECT LEAST(IFNULL(MIN(prices.price),
items.default_price),
default_price) as min_price
FROM items LEFT OUTER JOIN
prices
ON prices.item_id = items.id
GROUP BY items.id;
If you want the lowest price across all items, you can remove the group by clause. However, I think a subquery is clearer about the intention:
select min(min_price)
from (SELECT LEAST(IFNULL(MIN(prices.price),
items.default_price),
default_price) as min_price
FROM items LEFT OUTER JOIN
prices
ON prices.item_id = items.id
GROUP BY items.id
) p;
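A runnable sketch of the subquery version, assuming SQLite (via Python's sqlite3) rather than MySQL. SQLite has no LEAST(), so a two-argument stand-in is registered from Python; that helper and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for MySQL's LEAST(a, b); safe here because IFNULL removes NULLs.
conn.create_function("LEAST", 2, min)
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, default_price REAL);
CREATE TABLE prices (id INTEGER PRIMARY KEY, item_id INTEGER, price REAL);
-- Item 2's default price (8) is lower than any row in prices.
INSERT INTO items (id, default_price) VALUES (1, 100.0), (2, 8.0), (3, 500.0);
INSERT INTO prices (item_id, price) VALUES (1, 50.0), (1, 20.0), (2, 9.0);
""")

# Per-item minimum of the default price and any overriding prices.
per_item_sql = """
SELECT items.id,
       LEAST(IFNULL(MIN(prices.price), items.default_price),
             items.default_price) AS min_price
FROM items
LEFT OUTER JOIN prices ON prices.item_id = items.id
GROUP BY items.id
"""
per_item = conn.execute(per_item_sql + " ORDER BY items.id").fetchall()

# Global MIN and MAX over the calculated min_price column.
overall = conn.execute(
    "SELECT MIN(min_price), MAX(min_price) FROM (" + per_item_sql + ") p"
).fetchone()

# For comparison: looking at the prices table alone misses item 2's default.
prices_only = conn.execute("SELECT MIN(price) FROM prices").fetchone()[0]
print(per_item, overall, prices_only)
```

A WHERE clause for the search filters would go inside the inner query, before the GROUP BY, so the outer MIN and MAX only see the matching items.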

MYSQL rows to columns using CASE, GROUP BY... what about different versions?

This is my first post here.
My question is similar to a previous thread albeit different:
mysql converting multiple rows into columns in a single row
What I have is really a large form. There are many forms (sheets, really) and each has the same setup. Each form has labels and values, but the values in the forms can be changed and the forms only display the "latest" values. The database has a few tables, but the important ones here are field_label and field_value. These two are linked as one might suspect, and the field_value table has a "date" column.
Now, what I want to do is select the field_label.id and the latest value (field_value.fv_value). First I thought this might work fine with CASE, but the problem is that CASE stops searching the table immediately after it finds a match; I want to select the latest match, not just the first one.
The only good idea I have had so far is to use a subquery and reorder the value table, first by the (linked) id of the labels and then by the "date" of the value. Here's what I got:
SELECT T.msdsid,
field_label.id,
(CASE WHEN field_label.id = 1 THEN T.fv_value ELSE NULL END) AS value
FROM (SELECT * FROM field_value ORDER BY field_value.fl_id,field_value.date DESC) AS T
LEFT JOIN field_label ON(T.fl_id=field_label.id)
GROUP BY T.refid;
Now, this does do exactly what I want, but... is there a better way?
Thanks in advance.
This query will show you the latest values (record) for each field_value.fl_id:
SELECT fv1.* FROM field_value fv1
JOIN (SELECT fl_id, MAX(date) date FROM field_value GROUP BY fl_id) fv2
ON fv1.fl_id = fv2.fl_id AND fv1.date = fv2.date;
Try this query, play with it, and add it into your query.

Will grouping an ordered table always return the first row? MYSQL

I'm writing a query where I group a selection of rows to find the MIN value for one of the columns.
I'd also like to return the other column values associated with the MIN row returned.
e.g
ID  QTY  PRODUCT  TYPE
----------------------
1   2    Orange   Fruit
2   4    Banana   Fruit
3   3    Apple    Fruit
If I GROUP this table by the column 'TYPE' and select the MIN qty, it won't return the corresponding product for the MIN row which in the case above is 'Apple'.
Adding an ORDER BY clause before grouping seems to solve the problem. However, before I go ahead and include this query in my application, I'd just like to know whether this method will always return the correct value. Is this the correct approach? I've seen some examples where subqueries are used, but I have also read that this is inefficient.
Thanks in advance.
No, this is not the correct approach.
I believe you are talking about a query like this:
SELECT product.*, MIN(qty)
FROM product
GROUP BY
type
ORDER BY
qty
What you are doing here is using MySQL's extension that allows you to select unaggregated/ungrouped columns in a GROUP BY query.
This is mostly used in the queries containing both a JOIN and a GROUP BY on a PRIMARY KEY, like this:
SELECT `order`.id, `order`.customer, SUM(price)
FROM `order`
JOIN orderline
ON orderline.order_id = `order`.id
GROUP BY
`order`.id
Here, order.customer is neither grouped nor aggregated, but since you are grouping on order.id, it is guaranteed to have the same value within each group.
In your case, qty has different values within the group.
It is not guaranteed from which record within the group the engine will take the value.
You should do this:
SELECT p.*
FROM (
SELECT DISTINCT type
FROM product p
) pd
JOIN product p
ON p.id =
(
SELECT pi.id
FROM product pi
WHERE pi.type = pd.type
ORDER BY
type, qty, id
LIMIT 1
)
If you create an index on product (type, qty, id), this query will work fast.
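A sketch of that query with the suggested index, assuming SQLite (via Python's sqlite3) in place of MySQL. The Fruit rows come from the question's table; the two Vegetable rows and the index name are hypothetical additions so that there is more than one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id INTEGER PRIMARY KEY, qty INTEGER, product TEXT, type TEXT);
INSERT INTO product (id, qty, product, type) VALUES
    (1, 2, 'Orange', 'Fruit'),
    (2, 4, 'Banana', 'Fruit'),
    (3, 3, 'Apple', 'Fruit'),
    (4, 1, 'Carrot', 'Vegetable'),
    (5, 6, 'Leek', 'Vegetable');
-- Index matching the WHERE and ORDER BY of the correlated sub-query.
CREATE INDEX idx_product_type_qty_id ON product (type, qty, id);
""")

# For each distinct type, pick the id of the row with the smallest qty
# (lowest id wins on ties) and join back to fetch the whole row.
min_row_sql = """
SELECT p.id, p.qty, p.product, p.type
FROM (
    SELECT DISTINCT type
    FROM product
) pd
JOIN product p
  ON p.id = (
    SELECT pi.id
    FROM product pi
    WHERE pi.type = pd.type
    ORDER BY pi.type, pi.qty, pi.id
    LIMIT 1
)
ORDER BY p.type
"""
min_rows = conn.execute(min_row_sql).fetchall()
print(min_rows)
```

Because the row is chosen by an explicit ORDER BY and LIMIT, the result does not depend on which row the engine happens to read first within a group.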
It's difficult to follow you properly without an example of the query you tried.
From your comments I guess your query is something like this:
SELECT ID, COUNT(*) AS QTY, PRODUCT_TYPE
FROM PRODUCTS
GROUP BY PRODUCT_TYPE
ORDER BY COUNT(*) DESC;
My advice: group by the concept (in this case PRODUCT_TYPE) and order by the number of times it appears, COUNT(*). The query above would do what you want.
Sub-queries are mostly for sorting or for discarding rows that are not of interest.
The MIN you are looking for is not exactly a MIN; it is an occurrence count, and you want to see first the one with the fewest occurrences (meaning it appears the fewest times, I guess).
Cheers,