Rails select subquery (without finder_sql, if possible) - mysql

I have a model called Object (it doesn't really matter what it is).
It has a default price (the column is called "price").
And then there is a Schedule object that allows overriding the price for specific dates.
I want to determine the MINIMUM price (by definition, the minimum of the default price and the "current" scheduled price) inside the SQL query itself, so that I can ORDER BY the calculated minimum price.
I want to make my search query as efficient as possible, and I was wondering if I can do something like this:
Object.select("id AS p_id, id, (SELECT MIN(`schedules`.`price`) FROM `schedules` WHERE `schedules`.`object_id` = p_id) AS min_price").limit(5)
But it generates odd SQL that looks like this:
SELECT `objects`.`id` AS t0_r0, `objects`.`title` AS t0_r1, `objects`.`created_at` AS t0_r2, `objects`.`updated_at` AS t0_r3, `objects`.`preferences` AS t0_r4 ........ (a lot of columns here) ... ` WHERE `objects`.`id` IN (1, 2, 3, 4 ....)
As you can see, it doesn't work. First, it loads all the columns from the objects table, and second, it looks horrible.
I don't want to use finder_sql because I have a lot of optional parameters, so building an AR::Relation object before fetching the results is strongly preferred.
In addition, I have a lot of records in the DB, and I think loading them all into memory is not a good idea. That is the main reason I want to perform this subquery: to filter out as many records as possible in the database.
Can someone help me do this more efficiently?

You can make this easier if you generate the subquery separately and use a join instead of a correlated subquery:
subquery = Schedule.select('MIN(price) as min_price, object_id')
.group(:object_id)
.to_sql
Object.joins("JOIN (#{subquery}) schedules ON objects.id = schedules.object_id")
.select('objects.*, schedules.min_price')
.limit(5)
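To check the join-to-aggregate idea outside Rails, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names (objects, schedules, price, object_id) follow the question; the sample rows are invented. It goes one step further than the Rails snippet: a LEFT JOIN keeps objects that have no schedule rows, and COALESCE plus a two-argument minimum folds in the default price. SQLite spells that scalar minimum min(a, b); in MySQL the equivalent would be LEAST(a, b).

```python
import sqlite3


def cheapest_objects(conn, limit=5):
    """Return (id, effective_min) pairs ordered by the computed minimum price."""
    # Aggregate schedules once per object in a derived table and join to it,
    # instead of running a correlated subquery for every object row.
    # COALESCE falls back to the default price when an object has no schedules.
    sql = """
        SELECT o.id,
               min(o.price, COALESCE(s.min_price, o.price)) AS effective_min
        FROM objects AS o
        LEFT JOIN (
            SELECT object_id, MIN(price) AS min_price
            FROM schedules
            GROUP BY object_id
        ) AS s ON s.object_id = o.id
        ORDER BY effective_min, o.id
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, price INTEGER NOT NULL);
    CREATE TABLE schedules (id INTEGER PRIMARY KEY,
                            object_id INTEGER NOT NULL,
                            price INTEGER NOT NULL);
    INSERT INTO objects (id, price) VALUES (1, 100), (2, 50), (3, 80);
    -- Object 1 has a cheaper scheduled price, object 2 none, object 3 a pricier one.
    INSERT INTO schedules (object_id, price) VALUES (1, 30), (1, 60), (3, 90);
""")

print(cheapest_objects(conn))  # [(1, 30), (2, 50), (3, 80)]
```

Because the aggregation happens once in the derived table, the database can sort and limit on the computed value without loading every object into memory.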

Related

SQL Query: Joining on a SUM()

I'm trying to run a query that sums the value of items and then JOINs on the value of that SUM.
In the code below, Contract_For is what I'm trying to JOIN on, but I'm not sure if that's possible.
SELECT `items_value`.`ContractId` as `Contract`,
`items_value`.`site` as `SiteID`,
SUM(`items_value`.`value`) as `Contract_For`,
`contractitemlists`.`Text` as `Contracted_Text`
FROM items_value
LEFT JOIN contractitemlists ON (`items_value`.`Contract_For`) = `contractitemlists`.`Ref`
WHERE `items_value`.`ContractID`='2';
When I've faced similar issues in the past, I've just created a view that holds the SUM, then joined to that in another view.
At the moment, the above sample is meant to work for just one dummy value, but it's intended to be a stored procedure where the user selects the ContractID. The error I currently get is 'Unknown column items_value.Contract_For'.
You cannot use aliases, or expressions that use aggregates, from the SELECT clause anywhere but HAVING and ORDER BY*; you need to make the first "part" a subquery and then JOIN to that.
It might be easier to understand if you look at the order of evaluation this way (a bit oversimplified and not precisely correct):
FROM (Note: JOIN is only within a FROM)
WHERE
GROUP BY
SELECT
HAVING
ORDER BY
In the actual implementation, "under the hood", most SQL engines use information from each section to optimize other sections (like using some WHERE conditions to reduce the records JOINed in a FROM); but this is the conceptual order that must be adhered to.
*In some versions of MSSQL, you cannot use aliases from the SELECT in HAVING or ORDER BY either.
Your query needs to be something like this:
SELECT s.*
, `cil`.`Text` as `Contracted_Text`
FROM (
SELECT `iv`.`ContractId` as `Contract`
, `iv`.`site` as `SiteID`
, SUM(`iv`.`value`) as `Contract_For`
FROM items_value AS iv
WHERE `iv`.`ContractID`='2'
) AS s
LEFT JOIN contractitemlists AS cil ON `s`.`Contract_For` = cil.`Ref`
;
But as others have mentioned, the missing GROUP BY needs to be looked into: what if there are multiple site values?
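A runnable sketch of the subquery-then-JOIN pattern, using Python's sqlite3 module as a stand-in for MySQL. It adds the GROUP BY mentioned above, so each (ContractId, site) pair gets its own total. The sample rows are invented, and the ContractId parameter plays the role the stored procedure's input would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items_value (ContractId TEXT, site TEXT, value INTEGER);
    CREATE TABLE contractitemlists (Ref INTEGER, Text TEXT);
    INSERT INTO items_value VALUES
        ('2', 'A', 10), ('2', 'A', 5), ('2', 'B', 7), ('3', 'A', 99);
    INSERT INTO contractitemlists VALUES (15, 'Fifteen'), (7, 'Seven');
""")


def contract_totals(conn, contract_id):
    # The inner query computes the SUM under an alias. Only the outer query,
    # which treats the derived table s like an ordinary table, can join on it.
    sql = """
        SELECT s.Contract, s.SiteID, s.Contract_For, cil.Text AS Contracted_Text
        FROM (
            SELECT iv.ContractId AS Contract,
                   iv.site AS SiteID,
                   SUM(iv.value) AS Contract_For
            FROM items_value AS iv
            WHERE iv.ContractId = ?
            GROUP BY iv.ContractId, iv.site
        ) AS s
        LEFT JOIN contractitemlists AS cil ON s.Contract_For = cil.Ref
        ORDER BY s.SiteID
    """
    return conn.execute(sql, (contract_id,)).fetchall()


print(contract_totals(conn, "2"))
# [('2', 'A', 15, 'Fifteen'), ('2', 'B', 7, 'Seven')]
```

A total with no matching Ref (contract '3' sums to 99 here) still appears, with NULL text, because of the LEFT JOIN.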

Query takes too long to run

I am running the query below to retrieve the unique latest result, based on a date field, within the same table. But the query takes too long as the table grows. Any suggestion to improve this is welcome.
select
t2.*
from
(
select
(
select
id
from
ctc_pre_assets ti
where
ti.ctcassettag = t1.ctcassettag
order by
ti.createddate desc limit 1
) lid
from
(
select
distinct ctcassettag
from
ctc_pre_assets
) t1
) ro,
ctc_pre_assets t2
where
t2.id = ro.lid
order by
id
Our table may contain the same row multiple times, each with a different timestamp. My objective is based on a single column, for example assettag: I want to retrieve a single row for each assettag with the latest timestamp.
It's simpler, and probably faster, to find the newest date for each ctcassettag and then join back to find the whole row that matches.
This assumes that no ctcassettag has multiple rows with the same createddate; if one does, you can get back more than one row for that ctcassettag.
SELECT
ctc_pre_assets.*
FROM
ctc_pre_assets
INNER JOIN
(
SELECT
ctcassettag,
MAX(createddate) AS createddate
FROM
ctc_pre_assets
GROUP BY
ctcassettag
)
newest
ON newest.ctcassettag = ctc_pre_assets.ctcassettag
AND newest.createddate = ctc_pre_assets.createddate
ORDER BY
ctc_pre_assets.id
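A minimal sketch of the join-back approach, run through Python's sqlite3 module as a stand-in for MySQL. It assumes createddate is stored as ISO-8601 text (so string order equals date order); the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ctc_pre_assets (
        id INTEGER PRIMARY KEY,
        ctcassettag TEXT NOT NULL,
        createddate TEXT NOT NULL
    );
    INSERT INTO ctc_pre_assets (id, ctcassettag, createddate) VALUES
        (1, 'TAG-A', '2020-01-01'),
        (2, 'TAG-A', '2020-03-01'),
        (3, 'TAG-B', '2020-02-01'),
        (4, 'TAG-B', '2020-01-15'),
        (5, 'TAG-C', '2020-05-01');
""")

NEWEST_PER_TAG = """
    SELECT ctc_pre_assets.*
    FROM ctc_pre_assets
    INNER JOIN (
        -- One grouped pass over the table: newest date per tag.
        SELECT ctcassettag, MAX(createddate) AS createddate
        FROM ctc_pre_assets
        GROUP BY ctcassettag
    ) AS newest
        ON newest.ctcassettag = ctc_pre_assets.ctcassettag
       AND newest.createddate = ctc_pre_assets.createddate
    ORDER BY ctc_pre_assets.id
"""

rows = conn.execute(NEWEST_PER_TAG).fetchall()
print(rows)
# [(2, 'TAG-A', '2020-03-01'), (3, 'TAG-B', '2020-02-01'), (5, 'TAG-C', '2020-05-01')]
```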
EDIT: To deal with multiple rows with the same date.
You haven't actually said how to pick which row you want in the event that multiple rows are for the same ctcassettag on the same createddate. So, this solution just chooses the row with the lowest id from amongst those duplicates.
SELECT
ctc_pre_assets.*
FROM
ctc_pre_assets
WHERE
ctc_pre_assets.id
=
(
SELECT
lookup.id
FROM
ctc_pre_assets lookup
WHERE
lookup.ctcassettag = ctc_pre_assets.ctcassettag
ORDER BY
lookup.createddate DESC,
lookup.id ASC
LIMIT
1
)
This does still use a correlated sub-query, which is usually slower than a simple nested sub-query (such as my first answer), but it does deal with the "duplicates".
You can change the rules on which row to pick by changing the ORDER BY in the correlated sub-query.
It's also very similar to your own query, but with one less join.
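The tie-breaking behaviour can be checked with a small sketch, again using Python's sqlite3 module as a stand-in for MySQL and invented rows. Rows 2 and 3 share the newest date for TAG-A; the join-back query would return both, while the correlated subquery keeps only the lowest id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ctc_pre_assets (
        id INTEGER PRIMARY KEY,
        ctcassettag TEXT NOT NULL,
        createddate TEXT NOT NULL
    );
    -- Rows 2 and 3 tie on the newest date for TAG-A.
    INSERT INTO ctc_pre_assets (id, ctcassettag, createddate) VALUES
        (1, 'TAG-A', '2020-01-01'),
        (2, 'TAG-A', '2020-03-01'),
        (3, 'TAG-A', '2020-03-01'),
        (4, 'TAG-B', '2020-02-01');
""")

NEWEST_PER_TAG_NO_DUPES = """
    SELECT ctc_pre_assets.*
    FROM ctc_pre_assets
    WHERE ctc_pre_assets.id = (
        SELECT lookup.id
        FROM ctc_pre_assets AS lookup
        WHERE lookup.ctcassettag = ctc_pre_assets.ctcassettag
        ORDER BY lookup.createddate DESC,  -- newest first
                 lookup.id ASC             -- tie-break: lowest id wins
        LIMIT 1
    )
    ORDER BY ctc_pre_assets.id
"""

print(conn.execute(NEWEST_PER_TAG_NO_DUPES).fetchall())
# [(2, 'TAG-A', '2020-03-01'), (4, 'TAG-B', '2020-02-01')]
```

Changing the second ORDER BY key (for example to lookup.id DESC) changes which duplicate is kept.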
Nested queries often take longer than a conventional query. Can you put EXPLAIN at the start of the query and post the results here? That will help us analyse exactly which query or table is slow to respond.
Check whether the table has indexes. Unindexed tables are not advisable (unless there is a clear reason to leave them unindexed) and can be alarmingly slow at executing queries.
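As a sketch of checking index use, the snippet below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; MySQL's EXPLAIN reports the same kind of information in a different format. The composite index name idx_assets_tag_date is hypothetical; it matches the per-tag lookup (equality on ctcassettag, then ordering by createddate) used in the correlated subqueries here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ctc_pre_assets (
        id INTEGER PRIMARY KEY,
        ctcassettag TEXT NOT NULL,
        createddate TEXT NOT NULL
    );
    -- Hypothetical composite index for "latest row for one tag" lookups.
    CREATE INDEX idx_assets_tag_date ON ctc_pre_assets (ctcassettag, createddate);
""")


def plan_details(conn, sql, params=()):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


LATEST_ID_FOR_TAG = """
    SELECT id FROM ctc_pre_assets
    WHERE ctcassettag = ?
    ORDER BY createddate DESC
    LIMIT 1
"""

for detail in plan_details(conn, LATEST_ID_FOR_TAG, ("TAG-A",)):
    print(detail)  # should mention idx_assets_tag_date rather than a full scan
```

The exact wording of the plan text varies between SQLite versions, so look for the index name rather than a fixed string.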
On the contrary, I think the best approach is to avoid writing nested queries altogether. Better, run each of the queries separately and then use the results (in array or list format) in the second query.
First, some questions that you should at least ask yourself, and maybe also answer for us to improve the accuracy of our responses:
Is your data normalized? If yes, maybe you should make an exception to avoid this brutal subquery problem.
Are you using indexes? If yes, which ones, and are you using them to the fullest?
Some suggestions to improve the readability and maybe performance of the query:
- Use joins
- Use group by
- Use aggregators
Example (untested, so might not work, but should give an impression):
SELECT t2.*
FROM (
SELECT id
FROM ctc_pre_assets
GROUP BY ctcassettag
HAVING createddate = max(createddate)
ORDER BY ctcassettag DESC
) ro
INNER JOIN ctc_pre_assets t2 ON t2.id = ro.id
ORDER BY t2.id
Using normalization is great, but there are a few cases where normalization causes more harm than good. This may be one of them, but without your tables in front of me, I can't tell for sure.
Using DISTINCT the way you are, I can't help feeling you might not get all the relevant results; maybe someone else can confirm or deny this?
It's not that subqueries are all bad, but they tend to create massive scalability issues if written incorrectly. Make sure you use them the right way (google it?).
Indexes can potentially save you a lot of time, if you actually use them. It's not enough to set them up; you have to write queries that actually use your indexes. Google this as well.

Getting different results from group by and distinct

This is my first post here, since most of the time I have already found a suitable solution :)
However, this time nothing seems to help properly.
I'm trying to migrate information from a MySQL database that I only have read-only access to.
My problem is similar to this one: Group by doesn't give me the newest group
I also need to get the latest information out of some tables, but my tables have >300k entries, so checking whether the "time-attribute-value" is the same as in the subquery (as suggested in the first answer) would be too slow (once I tried "... WHERE EXISTS ..." and the server hung).
In addition, I can hardly find the important information (e.g. time) in a single attribute, and there is never a single primary key. Until now I have done it as suggested in the second answer, by joining with a subquery that contains the latest "time-attribute-entry" and some primary keys, but that gets me into a huge mess after using multiple joins and unions with the results.
Therefore I would prefer using the HAVING clause as here: Select entry with maximum value of column after grouping
But when I tried it out and looked for a good candidate for the "time-attribute", I noticed that these queries give me two different results (more = 39721, less = 37870):
SELECT COUNT(MATNR) AS MORE
FROM(
SELECT DISTINCT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG
FROM
FKT_LAB
) AS TEMP1
SELECT COUNT(MATNR) AS LESS
FROM(
SELECT
LAB_MTKNR AS MATNR,
LAB_STG AS FACH,
LAB_STGNR AS STUDIENGANG,
LAB_PDATUM
FROM
FKT_LAB
GROUP BY
LAB_MTKNR,
LAB_STG,
LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
)AS TEMP2
Both are applied to the same table and use GROUP BY / SELECT DISTINCT on the same columns.
Any ideas?
If nothing helps and I have to go back to my mess, I will use string variables as placeholders to tidy it up, but then I lose track of how many subqueries, joins and unions I have in one query... How many temporary tables will the server be able to cope with?
Your second query is not doing what you expect it to be doing. This is the query:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG, LAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
HAVING LAB_PDATUM = MAX(LAB_PDATUM)
) TEMP2;
The problem is the HAVING clause. You are mixing an unaggregated column (LAB_PDATUM) with an aggregated value (MAX(LAB_PDATUM)). What MySQL does is choose an arbitrary value for the column and compare it to the max.
Often, the arbitrary value will not be the maximum value, so the rows get filtered. The reference you give (although an accepted answer) is incorrect. I have put a comment there.
If you want the most recent value, here is a relatively easy way:
SELECT COUNT(MATNR) AS LESS
FROM (SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
max(LAB_PDATUM) as maxLAB_PDATUM
FROM FKT_LAB
GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
) TEMP2;
It does not, however, change the outer count: there is still one row per (LAB_MTKNR, LAB_STG, LAB_STGNR) group, just as with SELECT DISTINCT.
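A small sketch of that point, using Python's sqlite3 module with invented FKT_LAB rows. It deliberately does not reproduce the broken HAVING query, because SQLite treats a bare column next to MAX() differently from MySQL (it takes the value from the row holding the maximum), so it would not show the arbitrary-value behaviour. What it does show is that the DISTINCT count and the grouped count agree, and how to read the latest date per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FKT_LAB (
        LAB_MTKNR TEXT, LAB_STG TEXT, LAB_STGNR TEXT, LAB_PDATUM TEXT
    );
    INSERT INTO FKT_LAB VALUES
        ('1001', 'INF', '01', '2019-02-01'),
        ('1001', 'INF', '01', '2019-07-15'),
        ('1001', 'MAT', '02', '2019-03-10'),
        ('1002', 'INF', '01', '2018-11-30');
""")

DISTINCT_COUNT = """
    SELECT COUNT(MATNR) FROM (
        SELECT DISTINCT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG
        FROM FKT_LAB
    ) AS TEMP1
"""
GROUPED_COUNT = """
    SELECT COUNT(MATNR) FROM (
        SELECT LAB_MTKNR AS MATNR, LAB_STG AS FACH, LAB_STGNR AS STUDIENGANG,
               MAX(LAB_PDATUM) AS maxLAB_PDATUM
        FROM FKT_LAB
        GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
    ) AS TEMP2
"""
LATEST_PER_GROUP = """
    SELECT LAB_MTKNR, LAB_STG, LAB_STGNR, MAX(LAB_PDATUM)
    FROM FKT_LAB
    GROUP BY LAB_MTKNR, LAB_STG, LAB_STGNR
    ORDER BY LAB_MTKNR, LAB_STG
"""

more = conn.execute(DISTINCT_COUNT).fetchone()[0]
grouped = conn.execute(GROUPED_COUNT).fetchone()[0]
latest = conn.execute(LATEST_PER_GROUP).fetchall()
print(more, grouped)  # 3 3
print(latest)
```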

Need a SQL Server 2008 case statement to evaluate a table and return two values

I'm a novice SQL programmer and have been banging my head against this all morning, so please bear with me. My situation is this: I have a table of SKUs that need to be sent to our eCommerce website. Each of these SKUs has a 'quantity', an 'active' value, and a 'discontinued' value. This was easy enough to handle when we were dealing with one SKU at a time, but now I have to send kits, which contain one or more SKUs.
For example, if my Kit's ID is 000920_001449_001718_999999 (a combination of four SKUs) I need to collect data for the entire set of SKUs like so:
Here's the logic I need to incorporate:
If any of the SKUs have null or WEBNO as an IsActive value, the entire kit must return WEBNO. Otherwise, return WEBYES.
If any of the SKUs have null or '1' as an IsDiscontinued value, the entire kit must return IsDiscontinued = '1'. Otherwise, return a 0.
My code is a bit of a mess, but here's what I've managed so far:
SELECT
CASE WHEN 'WEBNO' in
(
SELECT IsActive
FROM #SkusToSend as Sending
RIGHT JOIN
(
SELECT * FROM [eCommerce].[dbo].[Split] (
'000920_001449_001718_999999'
,'_')
) as SplitSkus
on Sending.SKU = SplitSkus.items
) THEN 'WEBNO'
ELSE 'WEBYES'
END
My question is this: Is it possible to write a statement that parses through my example table, returning only one row of 'IsActive' and 'IsDiscontinued'? I've tried using GROUP BY and HAVING statements on those fields, but always get multiple rows returned.
The code I have handles the WEBNO value, but not NULL, and doesn't even start to take into consideration the IsDiscontinued field yet. Is there a concise way to parse this together, or a better way to handle this type of problem?
I think a combination of ISNULL and MIN / MAX should do the trick:
SELECT
MIN(ISNULL(sending.IsActive, 'WEBNO')) AS IsActive,
MAX(ISNULL(sending.IsDiscontinued, 1)) AS IsDiscontinued
FROM
(
SELECT * FROM [eCommerce].[dbo].[Split] (
'000920_001449_001718_999999'
,'_')
) AS SplitSkus
LEFT JOIN #SkusToSend AS Sending
ON Sending.SKU = SplitSkus.items
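The MIN/MAX trick can be sketched outside SQL Server with Python's sqlite3 module. Two substitutions are assumptions for illustration: COALESCE stands in for ISNULL, and a VALUES list built from the split kit ID stands in for the [eCommerce].[dbo].[Split] table function. The SkusToSend rows are invented. The logic relies on 'WEBNO' sorting before 'WEBYES', so MIN returns WEBNO if any SKU has it, and MAX returns 1 if any SKU is discontinued or NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SkusToSend (SKU TEXT PRIMARY KEY, IsActive TEXT, IsDiscontinued INTEGER);
    INSERT INTO SkusToSend VALUES
        ('000920', 'WEBYES', 0),
        ('001449', 'WEBYES', 0),
        ('001718', 'WEBYES', NULL),
        ('999999', 'WEBYES', 0);
""")


def kit_status(conn, kit_id):
    # Hypothetical stand-in for the Split UDF: split in Python and pass the
    # parts as bound parameters in a VALUES list (placeholders only, no raw text).
    skus = kit_id.split("_")
    values = ", ".join("(?)" for _ in skus)
    # LEFT JOIN from the kit's SKUs so a SKU missing from SkusToSend yields NULLs,
    # which COALESCE turns into 'WEBNO' / 1 before aggregating to a single row.
    sql = f"""
        WITH SplitSkus(items) AS (VALUES {values})
        SELECT MIN(COALESCE(Sending.IsActive, 'WEBNO')) AS IsActive,
               MAX(COALESCE(Sending.IsDiscontinued, 1)) AS IsDiscontinued
        FROM SplitSkus
        LEFT JOIN SkusToSend AS Sending ON Sending.SKU = SplitSkus.items
    """
    return conn.execute(sql, skus).fetchone()


print(kit_status(conn, "000920_001449_001718_999999"))  # ('WEBYES', 1)
```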
I think this would be easier if you had a working example of some sample data in those tables. From a guess, it looks like you have a table function splitting a string apart and returning multiple rows. Your temp table is RIGHT JOINed to it, so the query takes every row the function returns, even where there is no match (NULL) in the temp table. That can return multiple rows: if you expect a single entity from a left or right join and sometimes get NULLs, you will get multiples, and a repeated value will also produce multiples. I believe you would have to ensure that you get only one result from your
Case when 'WEBNO' in
(
because, while the logic may be correct to return the 'WEBNO' answer, the engine may repeat the row result once for each time the condition matched: once, twice, three times. You could alleviate this by doing a
'Select Distinct IsActive'
which makes the subquery return only the distinct values of that column.
Again this would be easier if we could see examples of what data those objects contained but this would be my guess.

RoR selecting newest "distinct by column" rows

I have a table which I use to log item price change over time.
I'm trying to write a method which grabs the entire set of items (without duplicates), together with their latest prices.
That means a row with an item_id of 2 may appear several times in my table, a row with an item_id of 3 may appear several times, etc., but the result should include each item only once, with its latest price.
I'm trying to figure out a way (without using Item.find_by_sql() if possible), to return the entire set of items and their latest prices.
Currently I have the following:
SELECT * FROM
(SELECT * FROM item_logs
ORDER BY created_at DESC) inner_table
GROUP BY item_id
It does work, but it seems wrong to do it like this. I guess I'm looking for a more elegant way, since the current implementation requires me to use find_by_sql, which is not very flexible.
Not sure it's any better, but another option would be to use joins:
ItemLog.joins(
'join (select item_id, max(created_at) as created_at from item_logs group by 1)
as i on i.item_id = item_logs.item_id and i.created_at = item_logs.created_at'
)
It's longer than your find_by_sql solution and could be a more expensive query on your database, but it keeps the result as an ActiveRecord relation, so you can chain other methods onto it.
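To sanity-check the join fragment itself, here is a sketch that runs it through Python's sqlite3 module as a stand-in for MySQL. The surrounding SELECT is roughly what the ItemLog relation would produce, with the column list narrowed to item_id and price for readability; the item_logs columns beyond item_id and created_at, and all rows, are invented. As with the other join-back answers, two logs for the same item with an identical created_at would both be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_logs (
        id INTEGER PRIMARY KEY,
        item_id INTEGER NOT NULL,
        price INTEGER NOT NULL,
        created_at TEXT NOT NULL
    );
    INSERT INTO item_logs (item_id, price, created_at) VALUES
        (2, 100, '2021-01-01 10:00:00'),
        (2, 120, '2021-02-01 10:00:00'),
        (3, 40,  '2021-01-05 09:00:00'),
        (3, 35,  '2021-03-01 09:00:00'),
        (4, 70,  '2021-02-10 12:00:00');
""")

# The join string from the answer, appended after FROM item_logs.
# "group by 1" groups by the first selected column, item_id.
LATEST_PRICES = """
    SELECT item_logs.item_id, item_logs.price
    FROM item_logs
    join (select item_id, max(created_at) as created_at from item_logs group by 1)
      as i on i.item_id = item_logs.item_id and i.created_at = item_logs.created_at
    ORDER BY item_logs.item_id
"""

print(conn.execute(LATEST_PRICES).fetchall())  # [(2, 120), (3, 35), (4, 70)]
```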