In the article Why Arel?, the author poses the problem:
Suppose we have a users table and a photos table and we want to select all user data and a *count* of the photos they have created.
His proposed solution (with a line break added) is
SELECT users.*, photos_aggregation.cnt
FROM users
LEFT OUTER JOIN (SELECT user_id, count(*) as cnt FROM photos GROUP BY user_id)
AS photos_aggregation
ON photos_aggregation.user_id = users.id
When I attempted to write such a query, I came up with
select users.*, if(count(photos.id) = 0, null, count(photos.id)) as cnt
from users
left join photos on photos.user_id = users.id
group by users.id
(The if() in the column list is just to get it to behave the same when a user has no photos.)
The author of the article goes on to say
Only advanced SQL programmers know how to write this (I’ve often asked this question in job interviews I’ve never once seen anybody get it right). And it shouldn’t be hard!
I don't consider myself an "advanced SQL programmer", so I assume I'm missing something subtle. What am I missing?
I believe your version would produce an error, at least in some database engines. In MSSQL your select would generate [Column Name] is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. This is because your select list can only contain columns that appear in the group by or inside an aggregate such as count.
You could modify your version to select users.id, count(photos.id) and it would work, but it would not be the same result as his query.
I would not say you have to be particularly advanced to come up with a working solution (or the specific solution he came up with), but it is necessary to do the grouping in a separate query, either in the join or as @ron tornambe suggests.
In most DBMSs (MySQL and Postgres are exceptions) the version in your question would be invalid.
You would need to write the query which does not use the derived table as
select users.*, CASE WHEN count(photos.id) > 0 THEN count(photos.id) END as cnt
from users
left join photos on photos.user_id = users.id
group by users.id, users.name, users.email /* and so on*/
MySQL allows you to select non-aggregated items that are not in the group by list, but this is only safe if they are functionally dependent on the column(s) in the group by.
Whilst the group by list is more verbose without the derived table I would expect most optimisers to be able to transform one to the other anyway. Certainly in SQL Server if it sees you are grouping by the PK and some other columns it doesn't actually do group by comparisons on those other columns.
Some discussion about this MySQL behaviour vs standard SQL is in Debunking GROUP BY myths
Maybe the author of the article is wrong. Your solution works as well, and it may very well be faster.
Personally, I would drop the if altogether. If you want to count the number of pictures, it makes sense that 'no pictures' results in 0 rather than null.
As an alternative, you can also write a correlated sub-query:
SELECT u.*, (SELECT Count(*) FROM photos p WHERE p.user_id=u.id) as cnt
FROM users u
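To compare the three forms discussed in this thread, here is a minimal sketch using Python's sqlite3 module. The two-column schema and the sample rows are assumptions made up for illustration; only the shape of the queries comes from the posts above. Note that SQLite, like MySQL, tolerates the short group by, so this does not show the error other engines would raise.

```python
import sqlite3

# Hypothetical schema and sample data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users  VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO photos VALUES (10, 1), (11, 1), (12, 2);
""")

# Form 1: aggregate in a derived table, then left join it to users.
derived_table = conn.execute("""
    SELECT users.id, photos_aggregation.cnt
    FROM users
    LEFT OUTER JOIN (SELECT user_id, count(*) AS cnt
                     FROM photos GROUP BY user_id) AS photos_aggregation
      ON photos_aggregation.user_id = users.id
    ORDER BY users.id
""").fetchall()

# Form 2: left join first, then group by the user's primary key.
join_then_group = conn.execute("""
    SELECT users.id, count(photos.id) AS cnt
    FROM users
    LEFT JOIN photos ON photos.user_id = users.id
    GROUP BY users.id
    ORDER BY users.id
""").fetchall()

# Form 3: correlated sub-query in the select list.
correlated = conn.execute("""
    SELECT u.id, (SELECT count(*) FROM photos p WHERE p.user_id = u.id) AS cnt
    FROM users u
    ORDER BY u.id
""").fetchall()

print(derived_table)    # user 3 gets NULL (None) because no derived row matches
print(join_then_group)  # user 3 gets 0
print(correlated)       # user 3 gets 0
```

The only difference in results is how a user without photos is reported: the derived table yields NULL, the other two yield 0.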
Thanks to your help I made a view in my database called 'people' that retrieves data using three functions called 'isUserVerified', 'hasUserPicture' and 'userHobbies' from two tables called 'users' and 'user_hobbies':
SELECT
`u`.`id` AS `id`,
`isUserVerified`(`u`.`id`) AS `verification`,
`hasUserPicture`(`u`.id) AS `profile_picture`,
`userHobbies`(`h`.`user_id`) AS `hobbies`
FROM
`people`.`users` u
INNER JOIN
`people`.`user_hobbies` h
ON
`h`.`user_id` = `u`.`id`
It returns the following output:
I realise that this is because I am joining on:
`h`.`user_id` = `u`.`id`
But it is not what I want. For each user I want to run the three functions and return whether they are verified, have a profile picture and have a hobby. I am expecting 10 users with the relevant information. Can you help? Thank you
I don't think you need to join to hobbies at all. Your functions are doing the work for you:
SELECT u.id,
isUserVerified(u.id) AS verification,
hasUserPicture(u.id) AS profile_picture,
userHobbies(u.id) AS hobbies
FROM people.users u;
Note that user-defined functions tend to slow queries down, sometimes a lot. Functions may be a good idea in some languages, but in SQL it is better to express the logic as JOINs and GROUP BYs.
Also, there is no reason to use backticks if the identifiers don't have "bad" characters. Unnecessary backticks just make the query harder to write and read.
You can replace INNER JOIN with LEFT JOIN to see all of the users, since the users table is on the left of the JOIN keyword and INNER JOIN returns only rows that match the join condition. For example, if a user's id has no row in the hobbies table, INNER JOIN does not return that user.
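The effect of the join type on the row count can be sketched with Python's sqlite3 module. The tables and rows below are assumptions invented for the example, and the user-defined functions from the question are left out; a count stands in for the hobby logic.

```python
import sqlite3

# Hypothetical data: user 1 has two hobbies, user 2 has one, user 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE user_hobbies (user_id INTEGER, hobby TEXT);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO user_hobbies VALUES (1, 'chess'), (1, 'golf'), (2, 'chess');
""")

# INNER JOIN: one row per matching hobby, and user 3 disappears.
inner_rows = conn.execute("""
    SELECT u.id FROM users u
    INNER JOIN user_hobbies h ON h.user_id = u.id
    ORDER BY u.id
""").fetchall()

# LEFT JOIN: user 3 is kept, but user 1 is still repeated per hobby.
left_rows = conn.execute("""
    SELECT u.id FROM users u
    LEFT JOIN user_hobbies h ON h.user_id = u.id
    ORDER BY u.id
""").fetchall()

# One row per user: aggregate the joined rows back to the user level.
per_user = conn.execute("""
    SELECT u.id, count(h.hobby) AS hobby_count
    FROM users u
    LEFT JOIN user_hobbies h ON h.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

print(inner_rows, left_rows, per_user)
```

If the goal is exactly one row per user, either skip the join (as in the first answer) or aggregate after it, as in the last query.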
I'm doing what I would have expected to be a fairly straightforward query on a modified version of the imdb database:
select primary_name, release_year, max(rating)
from titles natural join primary_names natural join title_ratings
group by year
having title_category = 'film' and year > 1989;
However, I'm immediately running into
"column must appear in the GROUP BY clause or be used in an aggregate function."
I've tried researching this but have gotten confusing information; some examples I've found for this problem look structurally identical to mine, where others state that you must group every single selected parameter, which defeats the whole purpose of a group as I'm only wanting to select the maximum entry per year.
What am I doing wrong with this query?
Expected result: table with 3 columns which displays the highest-rated movie of each year.
If you want the maximum entry per year, then you should do something like this:
select r.*
from ratings r
where r.rating = (select max(r2.rating) from ratings r2 where r2.year = r.year) and
r.year > 1989;
In other words, group by is the wrong approach to writing this query.
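As a rough sketch of the correlated sub-query approach, the snippet below runs it with Python's sqlite3 module. The single ratings table with title, year and rating columns, and the rows in it, are assumptions for illustration and not the asker's actual imdb schema.

```python
import sqlite3

# Hypothetical single-table layout and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings (title TEXT, year INTEGER, rating REAL);
    INSERT INTO ratings VALUES
        ('A', 1990, 7.1), ('B', 1990, 8.3),
        ('C', 1991, 6.0), ('D', 1991, 7.5),
        ('E', 1985, 9.0);
""")

# Keep a row only if its rating equals the best rating of its own year.
top_per_year = conn.execute("""
    SELECT r.title, r.year, r.rating
    FROM ratings r
    WHERE r.rating = (SELECT max(r2.rating)
                      FROM ratings r2
                      WHERE r2.year = r.year)
      AND r.year > 1989
    ORDER BY r.year
""").fetchall()

print(top_per_year)
```

If two films tie for the best rating in a year, both are returned.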
I would also strongly encourage you to forget that natural join exists at all. It is an abomination. It uses the names of common columns for joins. It does not even use properly declared foreign key relationships. In addition, you cannot see what columns are used for the join.
While I am at it, another piece of advice: qualify all column names in queries that have more than one table reference. That is, include the table alias in the column name.
If you want to display all the columns, you can use a window function like:
select primary_name, year, max(rating) Over (Partition by year) as rating
from titles natural join primary_names natural join ratings
where title_type = 'film' and year > 1989;
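One thing worth checking with the window-function version: max(...) over (partition by year) labels every film with its year's best rating, so an outer filter is still needed to keep only the top film. A minimal sketch with Python's sqlite3 module follows; the single ratings table and its rows are assumptions for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical single-table layout and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings (title TEXT, year INTEGER, rating REAL);
    INSERT INTO ratings VALUES
        ('A', 1990, 7.1), ('B', 1990, 8.3),
        ('C', 1991, 6.0), ('D', 1991, 7.5),
        ('E', 1985, 9.0);
""")

# The window function alone: every film, each carrying its year's maximum.
all_films = conn.execute("""
    SELECT title, year, max(rating) OVER (PARTITION BY year) AS best
    FROM ratings
    WHERE year > 1989
    ORDER BY year, title
""").fetchall()

# Wrapping it and filtering on rating = best keeps only the top film(s).
top_films = conn.execute("""
    SELECT title, year, rating
    FROM (SELECT title, year, rating,
                 max(rating) OVER (PARTITION BY year) AS best
          FROM ratings
          WHERE year > 1989) AS ranked
    WHERE rating = best
    ORDER BY year
""").fetchall()

print(all_films)
print(top_films)
```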
I'll try and elaborate my question in the simplest form possible. I'm creating an e-commerce website and I'm trying to set up filters for the search result. Everything has been just fine, until I ran into this following issue. When I try to group by id I lose all the other foreign keys to the specification value and if I remove the group by I get a ton of results, each containing own foreign key to the value. (I'm inner joining each article with respective specification values). My question therefore is how do I filter all of these specification ids?
My query (translated and simplified) is as follows:
select * from article
left join article_specification on article_specification.fk_articles=article.id
where (fk_specification_value=172 or fk_specification_value=175 or fk_specification_value=184)
group by id order by date desc
By running this query I get 1 result (hence the group by), however if I don't group I can't really do anything with that result set. If I change the ORs into ANDs in the query, I get nothing, since each row holds only 1 value. That's the root of my question. Thanks in advance, and sorry if my question was poorly worded.
If you want articles that meet all specifications, you can do:
select a.*
from article a join
article_specification ars
on ars.fk_articles = a.id
where ars.fk_specification_value in (172, 175, 184)
group by a.id
having count(distinct ars.fk_specification_value) = 3
order by a.date desc;
Note: In general, I discourage the use of select * with group by. In this case it is okay because a.id is (presumably) the primary key in article. In fact, this use of group by is supported by the ANSI standard. However, interpreting the language in the standard requires understanding what a "functional dependency" is.
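To see why the having clause counts distinct values, here is a small sketch with Python's sqlite3 module. The trimmed-down tables and the rows are assumptions for illustration; article 3 deliberately carries a duplicated specification row.

```python
import sqlite3

# Hypothetical data: only article 1 has all three specification values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE article_specification (
        fk_articles INTEGER, fk_specification_value INTEGER);
    INSERT INTO article VALUES (1, 'a1'), (2, 'a2'), (3, 'a3'), (4, 'a4');
    INSERT INTO article_specification VALUES
        (1, 172), (1, 175), (1, 184),
        (2, 172), (2, 175),
        (3, 172), (3, 172), (3, 175),
        (4, 999);
""")

base = """
    SELECT a.id
    FROM article a
    JOIN article_specification ars ON ars.fk_articles = a.id
    WHERE ars.fk_specification_value IN (172, 175, 184)
    GROUP BY a.id
    HAVING {condition}
    ORDER BY a.id
"""

# Counting distinct values: an article must have all three different values.
matches_all = conn.execute(
    base.format(condition="count(DISTINCT ars.fk_specification_value) = 3")
).fetchall()

# Counting rows instead is fooled by article 3's duplicated 172 row.
naive = conn.execute(base.format(condition="count(*) = 3")).fetchall()

print(matches_all, naive)
```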
If you want articles that meet any of the specifications, you could use the above query without the having clause. However, an exists is more appropriate:
select a.*
from article a
where exists (select 1
from article_specification ars
where ars.fk_articles = a.id and
ars.fk_specification_value in (172, 175, 184)
);
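A quick way to see what exists buys you for the "any" case is to compare it with a plain join that has no group by. This is only a sketch using Python's sqlite3 module; the trimmed-down tables and the rows are assumptions for illustration.

```python
import sqlite3

# Hypothetical data: articles 1-3 match at least one value, article 4 none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE article_specification (
        fk_articles INTEGER, fk_specification_value INTEGER);
    INSERT INTO article VALUES (1, 'a1'), (2, 'a2'), (3, 'a3'), (4, 'a4');
    INSERT INTO article_specification VALUES
        (1, 172), (1, 175), (1, 184),
        (2, 172), (2, 175),
        (3, 172), (3, 172), (3, 175),
        (4, 999);
""")

# EXISTS: each article appears at most once, however many rows match.
with_exists = conn.execute("""
    SELECT a.id
    FROM article a
    WHERE EXISTS (SELECT 1
                  FROM article_specification ars
                  WHERE ars.fk_articles = a.id
                    AND ars.fk_specification_value IN (172, 175, 184))
    ORDER BY a.id
""").fetchall()

# Plain join without GROUP BY: one output row per matching specification row.
plain_join = conn.execute("""
    SELECT a.id
    FROM article a
    JOIN article_specification ars ON ars.fk_articles = a.id
    WHERE ars.fk_specification_value IN (172, 175, 184)
""").fetchall()

print(with_exists)       # three articles, one row each
print(len(plain_join))   # eight rows, articles repeated
```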
I know other posts talk about this, but I haven't been able to apply anything to this situation.
This is what I have so far.
SELECT *
FROM ccParts, ccChild, ccFamily
WHERE parGC = '26' AND
parChild = chiId AND
chiFamily = famId
ORDER BY famName, chiName
What I need to do is see the total number of ccParts with the same ccFamily in the results. Then, sort by the total.
It looks like this is close to what you want:
SELECT f.famId, f.famName, pc.parCount
FROM (
SELECT c.chiFamily AS famId, count(*) AS parCount
FROM
ccParts p
JOIN ccChild c ON p.parChild = c.chiId
WHERE p.parGC ='26'
GROUP BY c.chiFamily
) pc
JOIN ccFamily f ON f.famId = pc.famId
ORDER BY pc.parCount
The inline view (between the parentheses) is the headliner: it does your grouping and counting. Note that you do not need to join table ccFamily there to group by family, as table ccChild already carries the family information. If you don't need the family name (i.e. if its ID were sufficient), then you can stick with the inline view alone, and there ORDER BY count(*). The outer query just associates family name with the results.
Additionally, MySQL provides a non-standard mechanism by which you could combine the outer query with the inline view, but in this case it doesn't help much with either clarity or concision. The query I provided should be accepted by any SQL implementation, and it's to your advantage to learn such syntax and approaches first.
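For anyone who wants to try the inline-view query, here is a minimal sketch with Python's sqlite3 module. The column lists beyond those named in the question (parId, for instance) and all sample rows are assumptions for illustration.

```python
import sqlite3

# Hypothetical tables and rows modelled on the column names in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ccFamily (famId INTEGER PRIMARY KEY, famName TEXT);
    CREATE TABLE ccChild  (chiId INTEGER PRIMARY KEY, chiName TEXT,
                           chiFamily INTEGER);
    CREATE TABLE ccParts  (parId INTEGER PRIMARY KEY, parGC TEXT,
                           parChild INTEGER);
    INSERT INTO ccFamily VALUES (1, 'Bolts'), (2, 'Nuts');
    INSERT INTO ccChild  VALUES (1, 'c1', 1), (2, 'c2', 1), (3, 'c3', 2);
    INSERT INTO ccParts  VALUES (1, '26', 1), (2, '26', 2), (3, '26', 2),
                                (4, '26', 3), (5, '99', 3);
""")

# The inline view counts parts per family; the outer query adds the name.
family_counts = conn.execute("""
    SELECT f.famId, f.famName, pc.parCount
    FROM (
        SELECT c.chiFamily AS famId, count(*) AS parCount
        FROM ccParts p
        JOIN ccChild c ON p.parChild = c.chiId
        WHERE p.parGC = '26'
        GROUP BY c.chiFamily
    ) pc
    JOIN ccFamily f ON f.famId = pc.famId
    ORDER BY pc.parCount
""").fetchall()

print(family_counts)
```

Appending DESC to the ORDER BY would list the family with the most parts first.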
In the SELECT, add something like count(ccParts) as count then ORDER BY count instead? Not sure about the structure of your tables so you might need to improvise.
I need help with a query. I have a 'members' table and a 'comments' table.
members: userid,name,bday etc...
comments: id,userid,message,rel etc...
Until now I used 2 queries for members data and comments count, and combined both in PHP.
My question: is it possible to get both (all from members && count of comments) in only one query?
This is not working...
SELECT members.*, count(comments.*) as count
FROM members, comments
WHERE members.userid=comments.userid
group by members.userid
Does somebody know another solution?
Here's a cleaned-up version of your query, assuming you want the userid and number of comments for each:
SELECT members.userid, count(*) as count
FROM members
INNER JOIN comments
ON members.userid = comments.userid
GROUP BY members.userid
The issues I addressed:
only selected columns that are either in the group by clause, or have an aggregate function applied to them. It is incorrect to select columns which don't satisfy either of those criteria (although MySQL allows you to do it)
replaced implicit join with explicit join, and moved join condition from where clause to on clause
replaced select ... count(comments.*) with select ... count(*). count(*) works just fine
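One caveat that follows from these points: with an inner join, members who have no comments drop out entirely, and switching to a left join changes what count(*) means. The sketch below uses Python's sqlite3 module; the tables are cut down to a few columns and the rows are assumptions for illustration.

```python
import sqlite3

# Hypothetical data: member 3 has written no comments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members  (userid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, userid INTEGER,
                           message TEXT);
    INSERT INTO members  VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO comments VALUES (1, 1, 'hi'), (2, 1, 'yo'), (3, 2, 'hey');
""")

# Inner join + count(*): only members with at least one comment appear.
inner_counts = conn.execute("""
    SELECT members.userid, count(*) AS count
    FROM members
    INNER JOIN comments ON members.userid = comments.userid
    GROUP BY members.userid
    ORDER BY members.userid
""").fetchall()

# Left join + count(comments.id): NULLs are not counted, so member 3 gets 0.
left_counts = conn.execute("""
    SELECT members.userid, count(comments.id) AS count
    FROM members
    LEFT JOIN comments ON members.userid = comments.userid
    GROUP BY members.userid
    ORDER BY members.userid
""").fetchall()

# Left join + count(*): counts the padded row too, so member 3 gets 1.
left_star = conn.execute("""
    SELECT members.userid, count(*) AS count
    FROM members
    LEFT JOIN comments ON members.userid = comments.userid
    GROUP BY members.userid
    ORDER BY members.userid
""").fetchall()

print(inner_counts, left_counts, left_star)
```

So count(*) is fine for the inner join above, but with a left join you would want to count a column from comments instead.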
Thank you Matt
I used your version. And I'm learning SQL joins along the way.
The problem with my code was the 'count(comments.*)'. Mysql does not like this!
This is working:
SELECT members.*, count(comments.rel) as count
FROM members, comments
WHERE members.userid=comments.userid
group by members.userid