I need to use an alias in the WHERE clause, but It keeps telling me that its an unknown column. Is there any way to get around this issue? I need to select records that have a rating higher than x. Rating is calculated as the following alias:
sum(reviews.rev_rating)/count(reviews.rev_id) as avg_rating
You could use a HAVING clause, which can see the aliases, e.g.
HAVING avg_rating>5
but in a where clause you'll need to repeat your expression, e.g.
WHERE (sum(reviews.rev_rating)/count(reviews.rev_id))>5
BUT! Not all expressions will be allowed - using an aggregating function like SUM will not work, in which case you'll need to use a HAVING clause.
From the MySQL Manual:
It is not allowable to refer to a
column alias in a WHERE clause,
because the column value might not yet
be determined when the WHERE clause
is executed. See Section B.1.5.4,
“Problems with Column Aliases”.
I don't know if this works in mysql, but using sqlserver you can also just wrap it like:
select * from (
-- your original query
select .. sum(reviews.rev_rating)/count(reviews.rev_id) as avg_rating
from ...) Foo
where Foo.avg_rating ...
This question is quite old and one answer already gained 160 votes...
Still I would make this clear: The question is actually not about whether alias names can be used in the WHERE clause.
sum(reviews.rev_rating) / count(reviews.rev_id) as avg_rating
is an aggregation. In the WHERE clause we restrict records we want from the tables by looking at their values. sum(reviews.rev_rating) and count(reviews.rev_id), however, are not values we find in a record; they are values we only get after aggregating the records.
So WHERE is inappropriate. We need HAVING, as we want to restrict result rows after aggregation. It can't be
WHERE avg_rating > 10
nor
WHERE sum(reviews.rev_rating) / count(reviews.rev_id) > 10
hence.
HAVING sum(reviews.rev_rating) / count(reviews.rev_id) > 10
on the other hand is possible and complies with the SQL standard. Whereas
HAVING avg_rating > 10
is only possible in MySQL. It is not valid SQL according to the standard, as the SELECT clause is supposed to get executed after HAVING. From the MySQL docs:
Another MySQL extension to standard SQL permits references in the HAVING clause to aliased expressions in the select list.
The MySQL extension permits the use of an alias in the HAVING clause for the aggregated column
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
SELECT * FROM (SELECT customer_Id AS 'custId', gender, age FROM customer
WHERE gender = 'F') AS c
WHERE c.custId = 100;
If your query is static, you can define it as a view then you can use that alias in the where clause while querying the view.
Related
In MySQL, I observed that a statement which uses an AGGREGATE FUNCTION in SELECT list gets executed though there is no GROUP BY clause. Other RDBMS products like SQL Server throw an error if we do so.
For example, SELECT col1,col2,sum(col3) FROM tbl1; gets executed without any error and returns the first row values of col1,col2 and sum of all values of col3. The result of the above query is a single row.
Can anyone please tell why does this happen with MySQL?
Thanks in advance!!
It's by design - it's one of many extensions to the standard that MySQL permits.
For a query like SELECT name, MAX(age) FROM t; the reference docs says that:
Without GROUP BY, there is a single group and it is indeterminate
which name value to choose for the group
See the documentation on group by handling for more information.
The setting ONLY_FULL_GROUP_BY controls this behavior, see 5.1.7 Server SQL Modes enabling this would disallow a query with an aggregate function lacking a group by statement and it's enabled by default from MySQL version 5.7.5.
You have two points in your question:
Select with mixed with aggregated and not aggregated columns (which not presented in GROUP BY)
Select with aggregated columns without GROUP BY.
First one described well in #jpw answer.
The second one is possible by SQL standard. And result of this query consists of one row.
a) If T is not a grouped table, then
Case:
i) If the <select list> contains a <set function specifica-
tion> that contains a reference to a column of T or di-
rectly contains a <set function specification> that does
not contain an outer reference, then T is the argument or
argument source of each such <set function specification>
and the result of the <query specification> is a table con-
sisting of 1 row. The i-th value of the row is the value
specified by the i-th <value expression>.
set function means aggregate function.
P.S. result that query over empty table consists of one row with nulls (this is the difference between GROUP BY NULL query and query with out GROUP BY at all).
A quote from the MySQL documentation, the page about the aggregate functions:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
If you want a GROUP BY clause on your query then append GROUP BY NULL to it. I cannot tell about other RDBMS-es but on MySQL this is valid syntax. It works the same as the query without it.
Remarks about your query
A quote from your question:
"For example, SELECT col1,col2,sum(col3) FROM tbl1; gets executed without any error and returns the first row values of col1,col2 and sum of all values of col3."
The part with "the first row" is not something to rely on. It just happens most of the times that you get the first row.
Your query selects the columns col1 and col2 that are neither aggregate values nor functionally dependent on the columns in the GROUP BY clause. The query is not valid according to the SQL standard. MySQL allows it but its execution is undefined behaviour and the documentation about the handling of GROUP BY clearly states that:
... the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate...
I'm reading a book on SQL (Sams Teach Yourself SQL in 10 Minutes) and its quite good despite its title. However the chapter on group by confuses me
"Grouping data is a simple process. The selected columns (the column list following
the SELECT keyword in a query) are the columns that can be referenced in the GROUP
BY clause. If a column is not found in the SELECT statement, it cannot be used in the
GROUP BY clause. This is logical if you think about it—how can you group data on a
report if the data is not displayed? "
How come when I ran this statement in MySQL it works?
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
You're right, MySQL does allow you to create queries that are ambiguous and have arbitrary results. MySQL trusts you to know what you're doing, so it's your responsibility to avoid queries like that.
You can make MySQL enforce GROUP BY in a more standard way:
mysql> SET SQL_MODE=ONLY_FULL_GROUP_BY;
mysql> select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
ERROR 1055 (42000): 'test.EMPLOYEE_PAY_TBL.EMP_ID' isn't in GROUP BY
Because the book is wrong.
The columns in the group by have only one relationship to the columns in the select according to the ANSI standard. If a column is in the select, with no aggregation function, then it (or the expression it is in) needs to be in the group by statement. MySQL actually relaxes this condition.
This is even useful. For instance, if you want to select rows with the highest id for each group from a table, one way to write the query is:
select t.*
from table t
where t.id in (select max(id)
from table t
group by thegroup
);
(Note: There are other ways to write such a query, this is just an example.)
EDIT:
The query that you are suggesting:
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
would work in MySQL but probably not in any other database (unless BONUS happens to be a poorly named primary key on the table, but that is another matter). It will produce one row for each value of BONUS. For each row, it will get an arbitrary EMP_ID and SALARY from rows in that group. The documentation actually says "indeterminate", but I think arbitrary is easier to understand.
What you should really know about this type of query is simply not to use it. All the "bare" columns in the SELECT (that is, with no aggregation functions) should be in the GROUP BY. This is required in most databases. Note that this is the inverse of what the book says. There is no problem doing:
select EMP_ID
from EMPLOYEE_PAY_TBL
group by EMP_ID, BONUS;
Except that you might get multiple rows back for the same EMP_ID with no way to distinguish among them.
According to w3schools group by needs an aggregate_function(column_name).
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
What it will happen if you commit this aggregate_function?
Thanks
If you are looking for best practice, do not use GROUP BY unless you are using an aggregate function. MySQL is unique in that it allows you to specify additional columns without using aggregates. This is not found in other DBMS's.
MySQL extends the use of GROUP BY to permit selecting fields that are not mentioned in the GROUP BY clause.
The other effect that GROUP BY has is it's sorting property.
If you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns.
If you do not include any aggregate functions, it will most likely show the DISTINCT values of the grouped column.
Depending on MySQL version and ONLY_FULL_GROUP_BY mode, it either rejects the query or picks any value from each group (it is undefined which one it chooses).
Here is a thorough description on this: MySQL Handling of GROUP BY
Someone sent me a SQL query where the GROUP BY clause consisted of the statement: GROUP BY 1.
This must be a typo right? No column is given the alias 1. What could this mean? Am I right to assume that this must be a typo?
It means to group by the first column of your result set regardless of what it's called. You can do the same with ORDER BY.
SELECT account_id, open_emp_id
^^^^ ^^^^
1 2
FROM account
GROUP BY 1;
In above query GROUP BY 1 refers to the first column in select statement which is
account_id.
You also can specify in ORDER BY.
Note : The number in ORDER BY and GROUP BY always start with 1 not with 0.
In addition to grouping by the field name, you may also group by ordinal, or position of the field within the table. 1 corresponds to the first field (regardless of name), 2 is the second, and so on.
This is generally ill-advised if you're grouping on something specific, since the table/view structure may change. Additionally, it may be difficult to quickly comprehend what your SQL query is doing if you haven’t memorized the table fields.
If you are returning a unique set, or quickly performing a temporary lookup, this is nice shorthand syntax to reduce typing. If you plan to run the query again at some point, I’d recommend replacing those to avoid future confusion and unexpected complications (due to scheme changes).
It will group by first field in the select clause
That means *"group by the 1st column in your select clause". Always use GROUP BY 1 together with ORDER BY 1.
You can also use GROUP BY 1,2,3... It is convenient, but you need to pay attention to that condition; the result may not be what you want if someone has modified your select columns and it's not visualized.
It will group by the column position you put after the group by clause.
for example if you run 'SELECT SALESMAN_NAME, SUM(SALES) FROM SALES GROUP BY 1'
it will group by SALESMAN_NAME.
One risk on doing that is if you run 'Select *' and for some reason you recreate the table with columns on a different order, it will give you a different result than you would expect.
I need to use an alias in the WHERE clause, but It keeps telling me that its an unknown column. Is there any way to get around this issue? I need to select records that have a rating higher than x. Rating is calculated as the following alias:
sum(reviews.rev_rating)/count(reviews.rev_id) as avg_rating
You could use a HAVING clause, which can see the aliases, e.g.
HAVING avg_rating>5
but in a where clause you'll need to repeat your expression, e.g.
WHERE (sum(reviews.rev_rating)/count(reviews.rev_id))>5
BUT! Not all expressions will be allowed - using an aggregating function like SUM will not work, in which case you'll need to use a HAVING clause.
From the MySQL Manual:
It is not allowable to refer to a
column alias in a WHERE clause,
because the column value might not yet
be determined when the WHERE clause
is executed. See Section B.1.5.4,
“Problems with Column Aliases”.
I don't know if this works in mysql, but using sqlserver you can also just wrap it like:
select * from (
-- your original query
select .. sum(reviews.rev_rating)/count(reviews.rev_id) as avg_rating
from ...) Foo
where Foo.avg_rating ...
This question is quite old and one answer already gained 160 votes...
Still I would make this clear: The question is actually not about whether alias names can be used in the WHERE clause.
sum(reviews.rev_rating) / count(reviews.rev_id) as avg_rating
is an aggregation. In the WHERE clause we restrict records we want from the tables by looking at their values. sum(reviews.rev_rating) and count(reviews.rev_id), however, are not values we find in a record; they are values we only get after aggregating the records.
So WHERE is inappropriate. We need HAVING, as we want to restrict result rows after aggregation. It can't be
WHERE avg_rating > 10
nor
WHERE sum(reviews.rev_rating) / count(reviews.rev_id) > 10
hence.
HAVING sum(reviews.rev_rating) / count(reviews.rev_id) > 10
on the other hand is possible and complies with the SQL standard. Whereas
HAVING avg_rating > 10
is only possible in MySQL. It is not valid SQL according to the standard, as the SELECT clause is supposed to get executed after HAVING. From the MySQL docs:
Another MySQL extension to standard SQL permits references in the HAVING clause to aliased expressions in the select list.
The MySQL extension permits the use of an alias in the HAVING clause for the aggregated column
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
SELECT * FROM (SELECT customer_Id AS 'custId', gender, age FROM customer
WHERE gender = 'F') AS c
WHERE c.custId = 100;
If your query is static, you can define it as a view then you can use that alias in the where clause while querying the view.