I'll try to explain my question as simply as possible. I'm building an e-commerce website and setting up filters for the search results. Everything was fine until I ran into the following issue. When I group by id, I lose all the other foreign keys to the specification values. If I remove the group by, I get a ton of results, each containing its own foreign key to a value. (I'm inner joining each article with its respective specification values.) So my question is: how do I filter on all of these specification ids?
My query (translated and simplified) is as follows:
select * from article
left join article_specification on article_specification.fk_articles=article.id
where (fk_specification_value=172 or fk_specification_value=175 or fk_specification_value=184)
group by id order by date desc
Running this query gives me 1 result (because of the group by), but if I don't group, I can't really do anything with that result set. If I change the ORs into ANDs, I get nothing, since each row holds only 1 value. That's the root of my question. Thanks in advance, and sorry if my question is a little poorly phrased.
If you want articles that meet all of the specifications, you can do:
select a.*
from article a join
article_specification ars
on ars.fk_articles = a.id
where ars.fk_specification_value in (172, 175, 184)
group by a.id
having count(distinct ars.fk_specification_value) = 3
order by a.date desc;
Note: In general, I discourage the use of select * with group by. In this case it is okay because a.id is (presumably) the primary key of article. In fact, this use of group by is supported by the ANSI standard. However, interpreting the language in the standard requires understanding what a "functional dependency" is.
If you want articles that meet any of the specifications, you could use the above query without the having clause. However, an exists is more appropriate:
select a.*
from article a
where exists (select 1
from article_specification ars
where ars.fk_articles = a.id and
ars.fk_specification_value in (172, 175, 184)
);
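To see the difference between the "all" and "any" filters, here is a small runnable sketch. It assumes SQLite as a stand-in for the question's database, a minimal hypothetical version of the article and article_specification tables, and made-up sample rows. The two queries follow the answer's approach, with the specification ids passed as bound parameters.

```python
import sqlite3

# Hypothetical minimal schema mirroring the question's tables (SQLite stand-in).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, name TEXT, date TEXT);
CREATE TABLE article_specification (
    fk_articles INTEGER REFERENCES article(id),
    fk_specification_value INTEGER
);
INSERT INTO article VALUES (1, 'red shirt',  '2020-01-01'),
                           (2, 'blue shirt', '2020-02-01'),
                           (3, 'red mug',    '2020-03-01');
INSERT INTO article_specification VALUES
    (1, 172), (1, 175), (1, 184),   -- article 1 has all three values
    (2, 172), (2, 175),             -- article 2 has two of them
    (3, 999);                       -- article 3 has none of them
""")

SPECS = (172, 175, 184)
placeholders = ", ".join("?" * len(SPECS))  # "?, ?, ?"

# "All" filter: keep articles whose matching rows cover every requested value.
match_all = conn.execute(f"""
    SELECT a.id
    FROM article a
    JOIN article_specification ars ON ars.fk_articles = a.id
    WHERE ars.fk_specification_value IN ({placeholders})
    GROUP BY a.id
    HAVING COUNT(DISTINCT ars.fk_specification_value) = ?
    ORDER BY a.date DESC
""", (*SPECS, len(SPECS))).fetchall()

# "Any" filter: EXISTS returns each article at most once, so no GROUP BY is needed.
match_any = conn.execute(f"""
    SELECT a.id
    FROM article a
    WHERE EXISTS (SELECT 1
                  FROM article_specification ars
                  WHERE ars.fk_articles = a.id
                    AND ars.fk_specification_value IN ({placeholders}))
    ORDER BY a.date DESC
""", SPECS).fetchall()

print(match_all)  # [(1,)]
print(match_any)  # [(2,), (1,)]
```

Binding the HAVING count to the number of requested ids keeps the "all" query correct when the filter list changes length.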
Related
I'm doing what I would have expected to be a fairly straightforward query on a modified version of the imdb database:
select primary_name, release_year, max(rating)
from titles natural join primary_names natural join title_ratings
group by year
having title_category = 'film' and year > 1989;
However, I'm immediately running into
"column must appear in the GROUP BY clause or be used in an aggregate function."
I've tried researching this but have found confusing information; some examples I've found for this problem look structurally identical to mine, while others state that you must group by every single selected column, which defeats the whole purpose of grouping, as I only want to select the maximum entry per year.
What am I doing wrong with this query?
Expected result: a table with 3 columns that displays the highest-rated movie of each year.
If you want the maximum entry per year, then you should do something like this:
select r.*
from ratings r
where r.rating = (select max(r2.rating) from ratings r2 where r2.year = r.year) and
r.year > 1989;
In other words, group by is the wrong approach to writing this query.
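A runnable sketch of the correlated-subquery approach, assuming SQLite and a hypothetical single ratings table (title, year, rating) that stands in for the joined IMDb tables. The sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical single-table stand-in for the joined IMDb data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (title TEXT, year INTEGER, rating REAL);
INSERT INTO ratings VALUES
    ('Old Film', 1985, 9.0),
    ('A', 1990, 7.5), ('B', 1990, 8.2),
    ('C', 1991, 8.8), ('D', 1991, 6.1);
""")

# Keep each row whose rating equals the maximum rating for its own year.
best_per_year = conn.execute("""
    SELECT r.title, r.year, r.rating
    FROM ratings r
    WHERE r.rating = (SELECT MAX(r2.rating)
                      FROM ratings r2
                      WHERE r2.year = r.year)
      AND r.year > 1989
    ORDER BY r.year
""").fetchall()

print(best_per_year)  # [('B', 1990, 8.2), ('C', 1991, 8.8)]
```

If two films tie for the top rating in a year, both rows are returned, which is usually what you want for "highest-rated movie".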
I would also strongly encourage you to forget that natural join exists at all. It is an abomination. It uses the names of common columns for joins. It does not even use properly declared foreign key relationships. In addition, you cannot see what columns are used for the join.
While I am at it, another piece of advice: qualify all column names in queries that have more than one table reference. That is, include the table alias in the column name.
If you want to display all the columns, you can use a window function, which attaches each year's maximum rating to every row:
select primary_name, year, max(rating) over (partition by year) as rating
from titles
natural join primary_names
natural join ratings
where title_type = 'film' and year > 1989;
I know other posts talk about this, but I haven't been able to apply any of them to this situation.
This is what I have so far.
SELECT *
FROM ccParts, ccChild, ccFamily
WHERE parGC = '26' AND
parChild = chiId AND
chiFamily = famId
ORDER BY famName, chiName
What I need is to see the total number of ccParts that share the same ccFamily in the results, and then sort by that total.
It looks like this is close to what you want:
SELECT f.famId, f.famName, pc.parCount
FROM (
SELECT c.chiFamily AS famId, count(*) AS parCount
FROM
ccParts p
JOIN ccChild c ON p.parChild = c.chiId
WHERE p.parGC ='26'
GROUP BY c.chiFamily
) pc
JOIN ccFamily f ON f.famId = pc.famId
ORDER BY pc.parCount
The inline view (between the parentheses) is the headliner: it does your grouping and counting. Note that you do not need to join table ccFamily there to group by family, as table ccChild already carries the family information. If you don't need the family name (i.e. if its ID is sufficient), then you can use the inline view alone and ORDER BY count(*) there. The outer query just associates the family name with the results.
Additionally, MySQL provides a non-standard mechanism by which you could combine the outer query with the inline view, but in this case it doesn't help much with either clarity or concision. The query I provided should be accepted by any SQL implementation, and it's to your advantage to learn such syntax and approaches first.
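The query above can be checked end to end in this sketch. It assumes SQLite, minimal hypothetical versions of the ccFamily, ccChild and ccParts tables (including an invented parId key), and made-up rows. Only parts with parGC = '26' are counted.

```python
import sqlite3

# Hypothetical minimal versions of the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ccFamily (famId INTEGER PRIMARY KEY, famName TEXT);
CREATE TABLE ccChild  (chiId INTEGER PRIMARY KEY, chiName TEXT, chiFamily INTEGER);
CREATE TABLE ccParts  (parId INTEGER PRIMARY KEY, parGC TEXT, parChild INTEGER);
INSERT INTO ccFamily VALUES (1, 'Bolts'), (2, 'Nuts');
INSERT INTO ccChild  VALUES (10, 'Hex bolt', 1), (11, 'Carriage bolt', 1),
                            (20, 'Wing nut', 2);
INSERT INTO ccParts  VALUES (100, '26', 10), (101, '26', 11), (102, '26', 10),
                            (103, '26', 20), (104, '99', 20);
""")

# Count parts per family in the inline view, then attach the family name.
family_counts = conn.execute("""
    SELECT f.famId, f.famName, pc.parCount
    FROM (SELECT c.chiFamily AS famId, COUNT(*) AS parCount
          FROM ccParts p
          JOIN ccChild c ON p.parChild = c.chiId
          WHERE p.parGC = '26'
          GROUP BY c.chiFamily) AS pc
    JOIN ccFamily f ON f.famId = pc.famId
    ORDER BY pc.parCount
""").fetchall()

print(family_counts)  # [(2, 'Nuts', 1), (1, 'Bolts', 3)]
```

Part 104 is excluded by the parGC filter, so the Nuts family counts only one part.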
In the SELECT, add something like count(*) as cnt, group by the family columns, and then ORDER BY cnt? I'm not sure about the structure of your tables, so you might need to improvise.
In the article Why Arel?, the author poses the problem:
Suppose we have a users table and a photos table and we want to select all user data and a *count* of the photos they have created.
His proposed solution (with a line break added) is
SELECT users.*, photos_aggregation.cnt
FROM users
LEFT OUTER JOIN (SELECT user_id, count(*) as cnt FROM photos GROUP BY user_id)
AS photos_aggregation
ON photos_aggregation.user_id = users.id
When I attempted to write such a query, I came up with
select users.*, if(count(photos.id) = 0, null, count(photos.id)) as cnt
from users
left join photos on photos.user_id = users.id
group by users.id
(The if() in the column list is just to get it to behave the same when a user has no photos.)
The author of the article goes on to say
Only advanced SQL programmers know how to write this (I’ve often asked this question in job interviews I’ve never once seen anybody get it right). And it shouldn’t be hard!
I don't consider myself an "advanced SQL programmer", so I assume I'm missing something subtle. What am I missing?
I believe your version would produce an error, at least in some database engines. In MSSQL your select would generate "[Column Name] is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause." This is because your select list can only contain columns from the group by or aggregates such as count.
You could modify your version to select users.id, count(photos.id) and it would work, but it would not give the same result as his query.
I would not say you have to be particularly advanced to come up with a working solution (or the specific solution he came up with), but it is necessary to do the grouping in a separate query, either in the join or as @ron tornambe suggests.
In most DBMSs (MySQL and Postgres are exceptions) the version in your question would be invalid.
You would need to write the query which does not use the derived table as
select users.*, CASE WHEN count(photos.id) > 0 THEN count(photos.id) END as cnt
from users
left join photos on photos.user_id = users.id
group by users.id, users.name, users.email /* and so on*/
MySQL allows you to select non-aggregated items that are not in the group by list, but this is only safe if they are functionally dependent on the column(s) in the group by.
Whilst the group by list is more verbose without the derived table I would expect most optimisers to be able to transform one to the other anyway. Certainly in SQL Server if it sees you are grouping by the PK and some other columns it doesn't actually do group by comparisons on those other columns.
Some discussion of this MySQL behaviour vs. standard SQL is in Debunking GROUP BY myths.
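Here is a quick way to see that the derived-table query from the article and the "group by every column" query return the same rows. This sketch assumes SQLite, a hypothetical users table with just id, name and email, and invented rows.

```python
import sqlite3

# Hypothetical users/photos tables with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users  VALUES (1, 'alice', 'alice@example.com'),
                          (2, 'bob',   'bob@example.com');
INSERT INTO photos VALUES (1, 1), (2, 1);
""")

# Aggregate photos first in a derived table, then outer-join it to users.
derived = conn.execute("""
    SELECT users.*, photos_aggregation.cnt
    FROM users
    LEFT OUTER JOIN (SELECT user_id, COUNT(*) AS cnt
                     FROM photos GROUP BY user_id) AS photos_aggregation
      ON photos_aggregation.user_id = users.id
    ORDER BY users.id
""").fetchall()

# Join first, then group by every selected users column (the portable form).
grouped = conn.execute("""
    SELECT users.*,
           CASE WHEN COUNT(photos.id) > 0 THEN COUNT(photos.id) END AS cnt
    FROM users
    LEFT JOIN photos ON photos.user_id = users.id
    GROUP BY users.id, users.name, users.email
    ORDER BY users.id
""").fetchall()

print(derived)  # [(1, 'alice', 'alice@example.com', 2), (2, 'bob', 'bob@example.com', None)]
print(derived == grouped)  # True
```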
Maybe the author of the article is wrong. Your solution works as well, and it may very well be faster.
Personally, I would drop the if altogether. If you want to count the number of pictures, it makes sense that 'no pictures' results in 0 rather than null.
As an alternative, you can also write a correlated sub-query:
SELECT u.*, (SELECT Count(*) FROM photos p WHERE p.user_id=u.id) as cnt
FROM users u
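This sketch runs the correlated sub-query against invented data in SQLite, with a hypothetical users/photos schema. It shows that a user with no photos gets 0 rather than NULL, because COUNT(*) over an empty set is 0.

```python
import sqlite3

# Hypothetical users/photos tables with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users  VALUES (1, 'alice', 'alice@example.com'),
                          (2, 'bob',   'bob@example.com');
INSERT INTO photos VALUES (1, 1), (2, 1);
""")

# The scalar sub-query runs once per user row; COUNT(*) yields 0 for bob.
with_counts = conn.execute("""
    SELECT u.*, (SELECT COUNT(*) FROM photos p WHERE p.user_id = u.id) AS cnt
    FROM users u
    ORDER BY u.id
""").fetchall()

print(with_counts)  # [(1, 'alice', 'alice@example.com', 2), (2, 'bob', 'bob@example.com', 0)]
```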
I have been using Stack Overflow extensively during the last year; it's an excellent source with great contributors. Now it's my turn to ask for help.
The setup is normal:
Orders, OrderArticles and Articles
I want to get the total amount of articles sold during the last year, but only during the best 5 weeks.
Never mind the WEEK function and the UNIXTIME blah blah; I've got that covered. My question is whether it's possible to do this without resorting to stored procedures or functions.
I have created a subquery that sums the quantities per week and article and orders the result by the sum in descending order. Now I only have to LIMIT the query to 5. Easy, but I also have to filter the result on the ArticleID, and since I'm inside a subquery I don't have access to the outer ArticleID. It doesn't help to JOIN the result either; by then it's too late ;-)
The syntax (hard to understand without the actual SQL, right...?):
SELECT a.ID, [more fields], omg.total
FROM Articles AS a
LEFT JOIN
(
SELECT weeklytotals.articleID, weeklytotals.total
FROM
(
SELECT SUM(ra.quantity) AS total, ra.articleID AS articleID
FROM OrderArticles ra
INNER JOIN Orders r
ON ra.orderID = r.ID
WHERE r.timeCreated >= UNIX_TIMESTAMP('2011-06-30')
GROUP BY ra.articleID, WEEK(FROM_UNIXTIME(r.timeCreated))
ORDER BY SUM(ra.quantity) DESC
) AS weeklytotals
WHERE omg.articleID = a.ID -- <-- THIS IS NOT WORKING BUT NECESSARY!
LIMIT 0, 5
) AS omg
ON omg.articleID = a.ID
WHERE a.isEnabled = 1 -- more WHERE-thingys
This returns the top 5 articles and ties them to the correct Article. Yay.
I have left out the SUM-function (which could go into the omg-SELECT).
Do you understand? Do I understand what I want? Yes, of course we do!
Thanks in advance.
Edit: The conditions have been changed - which makes my life easier, but I still would like to know if there is a solution to the problem.
If you require the omg subquery to use data from the a table, place it into the SELECT part not the FROM part. Using terms from the mysql documentation, you want the result of a correlated subquery to appear as a scalar operand in the outer result set.
You wrote about being interested in the sum, i.e. only a single number per article, although you left out the SUM from your example query. My approach relies on that sum, and would probably break in a bad way if you really needed distinct values for each of the best five weeks.
SELECT a.ID, [more fields], IFNULL(SUM(
(
SELECT SUM(ra.quantity) AS total
FROM OrderArticles ra
INNER JOIN Orders r
ON ra.orderID = r.ID
WHERE ra.articleID = a.ID -- <-- reference a.ID here
AND r.timeCreated >= UNIX_TIMESTAMP('2011-06-30')
GROUP BY WEEK(FROM_UNIXTIME(r.timeCreated))
ORDER BY SUM(ra.quantity) DESC
LIMIT 0, 5
)), 0) AS total
FROM Articles AS a
WHERE a.isEnabled = 1 -- more WHERE-thingys
GROUP BY a.ID
I'm not saying anything about performance here. Placing the subquery this way, it will be executed for every row of the result set. So it might be too slow for practical use if you have a large number of articles. But if that should happen, I doubt that stored procedures or similar tricks would fare any better.
Edit: I found out that my original suggestion, which used subqueries nested two levels deep, doesn't allow the innermost subquery to access a column of the outermost query. But toying with this on sqlfiddle, I also found out that one may safely pass the result of a subquery to sum, thus avoiding one level of nesting. So the above code has now actually been checked and executed by a MySQL server, and should therefore work as intended.
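A different, set-based option is a window function, which MySQL supports from 8.0 onward. It ranks each article's weekly totals and sums only the five best, with no correlated subquery. This sketch swaps in that technique and assumes SQLite 3.25 or later, hypothetical minimal tables, and invented orders. strftime('%Y-%W', ..., 'unixepoch') stands in for MySQL's WEEK(FROM_UNIXTIME(...)), and the year is included so that weeks from different years are not merged.

```python
import sqlite3

# Hypothetical minimal tables; requires SQLite >= 3.25 for ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Articles      (ID INTEGER PRIMARY KEY, isEnabled INTEGER);
CREATE TABLE Orders        (ID INTEGER PRIMARY KEY, timeCreated INTEGER);
CREATE TABLE OrderArticles (orderID INTEGER, articleID INTEGER, quantity INTEGER);
INSERT INTO Articles VALUES (1, 1), (2, 1), (3, 1), (4, 0);
-- Unix timestamps: one order per week in early 2012, plus one before the cutoff.
INSERT INTO Orders VALUES
    (1, CAST(strftime('%s', '2012-01-02') AS INTEGER)),
    (2, CAST(strftime('%s', '2012-01-09') AS INTEGER)),
    (3, CAST(strftime('%s', '2012-01-16') AS INTEGER)),
    (4, CAST(strftime('%s', '2012-01-23') AS INTEGER)),
    (5, CAST(strftime('%s', '2012-01-30') AS INTEGER)),
    (6, CAST(strftime('%s', '2012-02-06') AS INTEGER)),
    (7, CAST(strftime('%s', '2011-01-03') AS INTEGER));
INSERT INTO OrderArticles VALUES
    (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4), (5, 1, 5), (6, 1, 6), -- article 1: six weeks
    (1, 2, 7), (7, 2, 100);  -- article 2: one week in range, one order before the cutoff
""")

best_five_weeks = conn.execute("""
    WITH weekly AS (                -- total quantity per article and week
        SELECT ra.articleID,
               strftime('%Y-%W', r.timeCreated, 'unixepoch') AS wk,
               SUM(ra.quantity) AS total
        FROM OrderArticles ra
        JOIN Orders r ON ra.orderID = r.ID
        WHERE r.timeCreated >= CAST(strftime('%s', '2011-06-30') AS INTEGER)
        GROUP BY ra.articleID, wk
    ),
    ranked AS (                     -- rank each article's weeks, best first
        SELECT articleID, total,
               ROW_NUMBER() OVER (PARTITION BY articleID ORDER BY total DESC) AS rn
        FROM weekly
    )
    SELECT a.ID, COALESCE(SUM(rk.total), 0) AS total
    FROM Articles a
    LEFT JOIN ranked rk ON rk.articleID = a.ID AND rk.rn <= 5
    WHERE a.isEnabled = 1
    GROUP BY a.ID
    ORDER BY a.ID
""").fetchall()

print(best_five_weeks)  # [(1, 20), (2, 7), (3, 0)]
```

Article 1 keeps its five best weeks (6 + 5 + 4 + 3 + 2 = 20). Article 2's pre-cutoff order is ignored, and article 3 has no sales, so COALESCE reports 0.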
Is there any way to write a greatest-n-per-group query in HQL (or potentially with Hibernate Criteria) in a single query?
I'm struggling with a problem that's similar to this:
Schema:
Book has a publication_date
Book has an Author
Author has a Publisher
I have a Publisher in hand, as well as a search date. For all of that Publisher's Authors, I want to find the book that they published most recently before the search date.
I've been trying to get this working (and have looked at many of the other questions under hibernate as well as greatest-n-per-group) but haven't found any examples that work.
In straight MySQL, I'm able to use a subselect to get this:
select * from
(select
a.id author_id,
b.id book_id,
b.publication_date publication_date
from book b
join author a on a.id = b.author_id
join publisher p on p.id = a.publisher_id
where
b.publication_date <= '2011-07-01'
and p.id = 2
order by b.publication_date desc) as t
group by
t.author_id
This forces the order by to happen first in the subquery, then the group by happens afterwards and picks the most recently published book as the first item to group by. I get the author ID paired with the book ID and publication date. (I know this isn't a particularly efficient way to do it in SQL, this is just an example of one way to do it in native SQL that's easy to understand).
In Hibernate, I've been unable to construct a subquery that emulates this. I also haven't had any luck trying to use a having clause like this; it returns zero results. If I remove the having clause, it returns the first book in the database (based on its ID), as the group by happens before the order by:
Book.executeQuery("""
select b.author, b
from Book b
where b.publicationDate <= :date
and b.author.publisher = :publisher
group by b.author
having max(b.publicationDate) = b.publicationDate
order by py.division.id
""", [date: date, publisher: publisher])
Is there any way to get hibernate to do what I want without having to spin through objects in memory or dive back down to raw SQL?
This is against a MySQL database and using Hibernate through Grails if it matters.
For that, you need an SQL window function. There is no way to do it in Hibernate/HQL, because HQL (at least in the Hibernate versions current when this was written) doesn't support window functions.
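In plain SQL, the window-function version of "latest book per author before a date" can look like the sketch below. It assumes SQLite 3.25 or later, minimal hypothetical book/author/publisher tables, and invented rows. ROW_NUMBER() numbers each author's books newest first, and the book id breaks ties.

```python
import sqlite3

# Hypothetical minimal schema; requires SQLite >= 3.25 for ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (id INTEGER PRIMARY KEY);
CREATE TABLE author    (id INTEGER PRIMARY KEY, publisher_id INTEGER);
CREATE TABLE book      (id INTEGER PRIMARY KEY, author_id INTEGER, publication_date TEXT);
INSERT INTO publisher VALUES (1), (2);
INSERT INTO author    VALUES (10, 2), (11, 2), (12, 1);
INSERT INTO book VALUES
    (100, 10, '2010-05-01'), (101, 10, '2011-03-01'), (102, 10, '2011-09-01'),
    (103, 11, '2009-01-01'), (104, 12, '2011-06-01');
""")

# Number each author's eligible books newest first and keep number 1.
latest = conn.execute("""
    SELECT author_id, book_id, publication_date
    FROM (SELECT a.id AS author_id, b.id AS book_id,
                 b.publication_date AS publication_date,
                 ROW_NUMBER() OVER (PARTITION BY a.id
                                    ORDER BY b.publication_date DESC, b.id DESC) AS rn
          FROM book b
          JOIN author a ON a.id = b.author_id
          WHERE a.publisher_id = ? AND b.publication_date <= ?) AS t
    WHERE rn = 1
    ORDER BY author_id
""", (2, "2011-07-01")).fetchall()

print(latest)  # [(10, 101, '2011-03-01'), (11, 103, '2009-01-01')]
```

Book 102 is after the search date and is filtered out before ranking, so author 10's latest eligible book is 101.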
The greatest-n-per-group tag has the correct answers. For instance, this approach is pretty readable, though not always optimal.
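One of the classic greatest-n-per-group patterns uses no window function at all. A correlated MAX subquery picks the latest eligible date per author, so it does not depend on which row MySQL happens to choose for non-aggregated columns in a GROUP BY. The sketch below assumes SQLite, the same hypothetical minimal schema and invented rows, and named parameters standing in for the question's :date and :publisher.

```python
import sqlite3

# Hypothetical minimal schema with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (id INTEGER PRIMARY KEY);
CREATE TABLE author    (id INTEGER PRIMARY KEY, publisher_id INTEGER);
CREATE TABLE book      (id INTEGER PRIMARY KEY, author_id INTEGER, publication_date TEXT);
INSERT INTO publisher VALUES (1), (2);
INSERT INTO author    VALUES (10, 2), (11, 2), (12, 1);
INSERT INTO book VALUES
    (100, 10, '2010-05-01'), (101, 10, '2011-03-01'), (102, 10, '2011-09-01'),
    (103, 11, '2009-01-01'), (104, 12, '2011-06-01');
""")

# Keep a book only if no eligible book by the same author is more recent.
latest_by_max = conn.execute("""
    SELECT b.author_id, b.id, b.publication_date
    FROM book b
    JOIN author a ON a.id = b.author_id
    WHERE a.publisher_id = :publisher
      AND b.publication_date <= :date
      AND b.publication_date = (SELECT MAX(b2.publication_date)
                                FROM book b2
                                WHERE b2.author_id = b.author_id
                                  AND b2.publication_date <= :date)
    ORDER BY b.author_id
""", {"publisher": 2, "date": "2011-07-01"}).fetchall()

print(latest_by_max)  # [(10, 101, '2011-03-01'), (11, 103, '2009-01-01')]
```

Unlike the ROW_NUMBER() form, this returns every book when an author has two on the same latest date. Whether an equivalent WHERE-clause subquery translates cleanly to your HQL version is worth checking separately.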