What does SQL clause "GROUP BY 1" mean? - mysql

Someone sent me a SQL query where the GROUP BY clause consisted of the statement: GROUP BY 1.
This must be a typo right? No column is given the alias 1. What could this mean? Am I right to assume that this must be a typo?

It means to group by the first column of your result set regardless of what it's called. You can do the same with ORDER BY.

SELECT account_id, open_emp_id
^^^^ ^^^^
1 2
FROM account
GROUP BY 1;
In above query GROUP BY 1 refers to the first column in select statement which is
account_id.
You also can specify in ORDER BY.
Note : The number in ORDER BY and GROUP BY always start with 1 not with 0.

In addition to grouping by the field name, you may also group by ordinal, or position of the field within the table. 1 corresponds to the first field (regardless of name), 2 is the second, and so on.
This is generally ill-advised if you're grouping on something specific, since the table/view structure may change. Additionally, it may be difficult to quickly comprehend what your SQL query is doing if you haven’t memorized the table fields.
If you are returning a unique set, or quickly performing a temporary lookup, this is nice shorthand syntax to reduce typing. If you plan to run the query again at some point, I’d recommend replacing those to avoid future confusion and unexpected complications (due to scheme changes).

It will group by first field in the select clause

That means *"group by the 1st column in your select clause". Always use GROUP BY 1 together with ORDER BY 1.
You can also use GROUP BY 1,2,3... It is convenient, but you need to pay attention to that condition; the result may not be what you want if someone has modified your select columns and it's not visualized.

It will group by the column position you put after the group by clause.
for example if you run 'SELECT SALESMAN_NAME, SUM(SALES) FROM SALES GROUP BY 1'
it will group by SALESMAN_NAME.
One risk on doing that is if you run 'Select *' and for some reason you recreate the table with columns on a different order, it will give you a different result than you would expect.

Related

query on self defined variable [duplicate]

I need to use an alias in the WHERE clause, but It keeps telling me that its an unknown column. Is there any way to get around this issue? I need to select records that have a rating higher than x. Rating is calculated as the following alias:
sum(reviews.rev_rating)/count(reviews.rev_id) as avg_rating
You could use a HAVING clause, which can see the aliases, e.g.
HAVING avg_rating>5
but in a where clause you'll need to repeat your expression, e.g.
WHERE (sum(reviews.rev_rating)/count(reviews.rev_id))>5
BUT! Not all expressions will be allowed - using an aggregating function like SUM will not work, in which case you'll need to use a HAVING clause.
From the MySQL Manual:
It is not allowable to refer to a
column alias in a WHERE clause,
because the column value might not yet
be determined when the WHERE clause
is executed. See Section B.1.5.4,
“Problems with Column Aliases”.
I don't know if this works in mysql, but using sqlserver you can also just wrap it like:
select * from (
-- your original query
select .. sum(reviews.rev_rating)/count(reviews.rev_id) as avg_rating
from ...) Foo
where Foo.avg_rating ...
This question is quite old and one answer already gained 160 votes...
Still I would make this clear: The question is actually not about whether alias names can be used in the WHERE clause.
sum(reviews.rev_rating) / count(reviews.rev_id) as avg_rating
is an aggregation. In the WHERE clause we restrict records we want from the tables by looking at their values. sum(reviews.rev_rating) and count(reviews.rev_id), however, are not values we find in a record; they are values we only get after aggregating the records.
So WHERE is inappropriate. We need HAVING, as we want to restrict result rows after aggregation. It can't be
WHERE avg_rating > 10
nor
WHERE sum(reviews.rev_rating) / count(reviews.rev_id) > 10
hence.
HAVING sum(reviews.rev_rating) / count(reviews.rev_id) > 10
on the other hand is possible and complies with the SQL standard. Whereas
HAVING avg_rating > 10
is only possible in MySQL. It is not valid SQL according to the standard, as the SELECT clause is supposed to get executed after HAVING. From the MySQL docs:
Another MySQL extension to standard SQL permits references in the HAVING clause to aliased expressions in the select list.
The MySQL extension permits the use of an alias in the HAVING clause for the aggregated column
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
SELECT * FROM (SELECT customer_Id AS 'custId', gender, age FROM customer
WHERE gender = 'F') AS c
WHERE c.custId = 100;
If your query is static, you can define it as a view then you can use that alias in the where clause while querying the view.

Why does MySQL allow you to group by columns that are not selected

I'm reading a book on SQL (Sams Teach Yourself SQL in 10 Minutes) and its quite good despite its title. However the chapter on group by confuses me
"Grouping data is a simple process. The selected columns (the column list following
the SELECT keyword in a query) are the columns that can be referenced in the GROUP
BY clause. If a column is not found in the SELECT statement, it cannot be used in the
GROUP BY clause. This is logical if you think about it—how can you group data on a
report if the data is not displayed? "
How come when I ran this statement in MySQL it works?
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
You're right, MySQL does allow you to create queries that are ambiguous and have arbitrary results. MySQL trusts you to know what you're doing, so it's your responsibility to avoid queries like that.
You can make MySQL enforce GROUP BY in a more standard way:
mysql> SET SQL_MODE=ONLY_FULL_GROUP_BY;
mysql> select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
ERROR 1055 (42000): 'test.EMPLOYEE_PAY_TBL.EMP_ID' isn't in GROUP BY
Because the book is wrong.
The columns in the group by have only one relationship to the columns in the select according to the ANSI standard. If a column is in the select, with no aggregation function, then it (or the expression it is in) needs to be in the group by statement. MySQL actually relaxes this condition.
This is even useful. For instance, if you want to select rows with the highest id for each group from a table, one way to write the query is:
select t.*
from table t
where t.id in (select max(id)
from table t
group by thegroup
);
(Note: There are other ways to write such a query, this is just an example.)
EDIT:
The query that you are suggesting:
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
would work in MySQL but probably not in any other database (unless BONUS happens to be a poorly named primary key on the table, but that is another matter). It will produce one row for each value of BONUS. For each row, it will get an arbitrary EMP_ID and SALARY from rows in that group. The documentation actually says "indeterminate", but I think arbitrary is easier to understand.
What you should really know about this type of query is simply not to use it. All the "bare" columns in the SELECT (that is, with no aggregation functions) should be in the GROUP BY. This is required in most databases. Note that this is the inverse of what the book says. There is no problem doing:
select EMP_ID
from EMPLOYEE_PAY_TBL
group by EMP_ID, BONUS;
Except that you might get multiple rows back for the same EMP_ID with no way to distinguish among them.

Whats wrong with this MYSQL query

I have the following SQL query , it seems to run ok , but i am concerned as my site grows it may not perform as expected ,I would like some feeback as to how effective and efficient this query really is:
select * from articles where category_id=XX AND city_id=XXX GROUP BY user_id ORDER BY created_date DESC LIMIT 10;
Basically what i am trying to achieve - is to get the newest articles by created_date limited to 10 , articles must only be selected if the following criteria are met :
City ID must equal the given value
Category ID must equal the given value
Only one article per user must be returned
Articles must be sorted by date and only the top 10 latest articles must be returned
You've got a GROUP BY clause which only contains one column, but you are pulling all the columns there are without aggregating them. Do you realise that the values returned for the columns not specified in GROUP BY and not aggregated are not guaranteed?
You are also referencing such a column in the ORDER BY clause. Since the values of that column aren't guaranteed, you have no guarantee what rows are going to be returned with subsequent invocations of this script even in the absence of changes to the underlying table.
So, I would at least change the ORDER BY clause to something like this:
ORDER BY MAX(created_date)
or this:
ORDER BY MIN(created_date)
some potential improvements (for best query performance):
make sure you have an index on all columns you querynote: check if you really need an index on all columns because this has a negative performance when the BD has to build the index. -> for more details take a look here: http://dev.mysql.com/doc/refman/5.1/en/optimization-indexes.html
SELECT * would select all columns of the table. SELECT only the ones you really require...

MySQL group by concatenated columns or functions?

This appears to work,
SELECT CONCATENATE(col1,col2) newcol,sum(othercol)
FROM mytable GROUP BY newcol.
Or even
SELECT STR_TO_DATE("%Y%m%d") as newcol,sum(othercol)
FROM mytable GROUP BY newcol.
Looking at these, one assumes the first example produces a count for each unique combination of col1,col2 as a string, the second example to produce a count for each day.
Having been bitten before by mysql e.g. silently ignoring "missing" columns in group by,
Does the above actually work, or are there any hidden gotchas ?
If you want to count rows, you should use count(*) instead of sum().
Otherwise, this pattern works fine.
(I personally use "GROUP BY 1" to signify grouping by the first column).

Can you use an alias in the WHERE clause in mysql?

I need to use an alias in the WHERE clause, but It keeps telling me that its an unknown column. Is there any way to get around this issue? I need to select records that have a rating higher than x. Rating is calculated as the following alias:
sum(reviews.rev_rating)/count(reviews.rev_id) as avg_rating
You could use a HAVING clause, which can see the aliases, e.g.
HAVING avg_rating>5
but in a where clause you'll need to repeat your expression, e.g.
WHERE (sum(reviews.rev_rating)/count(reviews.rev_id))>5
BUT! Not all expressions will be allowed - using an aggregating function like SUM will not work, in which case you'll need to use a HAVING clause.
From the MySQL Manual:
It is not allowable to refer to a
column alias in a WHERE clause,
because the column value might not yet
be determined when the WHERE clause
is executed. See Section B.1.5.4,
“Problems with Column Aliases”.
I don't know if this works in mysql, but using sqlserver you can also just wrap it like:
select * from (
-- your original query
select .. sum(reviews.rev_rating)/count(reviews.rev_id) as avg_rating
from ...) Foo
where Foo.avg_rating ...
This question is quite old and one answer already gained 160 votes...
Still I would make this clear: The question is actually not about whether alias names can be used in the WHERE clause.
sum(reviews.rev_rating) / count(reviews.rev_id) as avg_rating
is an aggregation. In the WHERE clause we restrict records we want from the tables by looking at their values. sum(reviews.rev_rating) and count(reviews.rev_id), however, are not values we find in a record; they are values we only get after aggregating the records.
So WHERE is inappropriate. We need HAVING, as we want to restrict result rows after aggregation. It can't be
WHERE avg_rating > 10
nor
WHERE sum(reviews.rev_rating) / count(reviews.rev_id) > 10
hence.
HAVING sum(reviews.rev_rating) / count(reviews.rev_id) > 10
on the other hand is possible and complies with the SQL standard. Whereas
HAVING avg_rating > 10
is only possible in MySQL. It is not valid SQL according to the standard, as the SELECT clause is supposed to get executed after HAVING. From the MySQL docs:
Another MySQL extension to standard SQL permits references in the HAVING clause to aliased expressions in the select list.
The MySQL extension permits the use of an alias in the HAVING clause for the aggregated column
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
SELECT * FROM (SELECT customer_Id AS 'custId', gender, age FROM customer
WHERE gender = 'F') AS c
WHERE c.custId = 100;
If your query is static, you can define it as a view then you can use that alias in the where clause while querying the view.