Can I use MAX() without GROUP BY? [duplicate] - mysql

In MySQL, I observed that a statement which uses an AGGREGATE FUNCTION in SELECT list gets executed though there is no GROUP BY clause. Other RDBMS products like SQL Server throw an error if we do so.
For example, SELECT col1,col2,sum(col3) FROM tbl1; gets executed without any error and returns the first row values of col1,col2 and sum of all values of col3. The result of the above query is a single row.
Can anyone please explain why this happens in MySQL?
Thanks in advance!!

It's by design - it's one of many extensions to the standard that MySQL permits.
For a query like SELECT name, MAX(age) FROM t; the reference docs say:
Without GROUP BY, there is a single group and it is indeterminate
which name value to choose for the group
See the documentation on group by handling for more information.
The ONLY_FULL_GROUP_BY setting controls this behavior (see 5.1.7 Server SQL Modes). Enabling it rejects a query that mixes an aggregate function with nonaggregated columns when there is no GROUP BY clause. It is enabled by default from MySQL version 5.7.5.

You have two points in your question:
Selecting a mix of aggregated and nonaggregated columns (where the nonaggregated ones are not listed in GROUP BY).
Selecting aggregated columns without GROUP BY.
The first one is described well in @jpw's answer.
The second one is allowed by the SQL standard, and the result of such a query consists of one row.
a) If T is not a grouped table, then
Case:
i) If the <select list> contains a <set function specification>
that contains a reference to a column of T or directly
contains a <set function specification> that does
not contain an outer reference, then T is the argument or
argument source of each such <set function specification>
and the result of the <query specification> is a table consisting
of 1 row. The i-th value of the row is the value
specified by the i-th <value expression>.
Here, "set function" means aggregate function.
P.S. As a result, such a query over an empty table returns one row with NULLs (this is the difference between a GROUP BY NULL query, which returns no rows there, and a query without GROUP BY at all).
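The empty-table difference mentioned in the P.S. can be checked with a short script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in because it needs no server, and the table t(name, age) is hypothetical. SQLite follows the same rule for these two queries, but it is worth re-running them against MySQL itself.

```python
import sqlite3

# SQLite stands in for MySQL here; the table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, age INTEGER)")  # deliberately left empty

# No GROUP BY: the whole (empty) table forms a single group, so one row comes back.
no_group_by = conn.execute("SELECT MAX(age), COUNT(*) FROM t").fetchall()

# GROUP BY NULL: groups are formed from the rows, and there are none, so no row comes back.
group_by_null = conn.execute(
    "SELECT MAX(age), COUNT(*) FROM t GROUP BY NULL"
).fetchall()

print(no_group_by)    # [(None, 0)]
print(group_by_null)  # []
```

Note that MAX() over no rows is NULL (None in Python), while COUNT(*) is 0.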

A quote from the MySQL documentation, the page about the aggregate functions:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
If you want a GROUP BY clause on your query then append GROUP BY NULL to it. I cannot tell about other RDBMS-es but on MySQL this is valid syntax. It works the same as the query without it.
Remarks about your query
A quote from your question:
"For example, SELECT col1,col2,sum(col3) FROM tbl1; gets executed without any error and returns the first row values of col1,col2 and sum of all values of col3."
The part with "the first row" is not something to rely on. It just happens most of the time that you get the first row.
Your query selects the columns col1 and col2 that are neither aggregate values nor functionally dependent on the columns in the GROUP BY clause. The query is not valid according to the SQL standard. MySQL allows it but its execution is undefined behaviour and the documentation about the handling of GROUP BY clearly states that:
... the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate...

Related

What does this SQL query with MIN() in HAVING clause do exactly?

I'm using MySQL and I have a table named employees.
I had an exercise in which I had to select the oldest person. I know the correct way to do that is with a subquery: SELECT name, dob FROM employees WHERE dob = (SELECT MIN(dob) FROM employees).
However, I did it like so: SELECT name, dob FROM employees HAVING dob = MIN(dob). Now this returns an empty set, but doesn't throw any errors. So what does it do exactly? I've read that MySQL allows referring to columns from the SELECT clause in the HAVING clause, without any GROUP BY clause. But why does it return an empty set?
When you use MAX (or other aggregate functions) in the select columns or the having clause, you cause an implicit GROUP BY () (that is, all rows are grouped together into a single result row).
And when grouping (whether all rows or with a specific GROUP BY), if you specify a column outside of an aggregate function (such as your dob =) that is not one of the things being grouped on or something functionally dependent on it (for example, some other column in a table when you are grouping by the primary key for that table), one of two things will happen:
If you have enabled the ONLY_FULL_GROUP_BY sql_mode (which is the default in newer versions), you will receive an error:
In aggregated query without GROUP BY, expression ... contains nonaggregated column '...'; this is incompatible with sql_mode=only_full_group_by
If you have not enabled ONLY_FULL_GROUP_BY, a value from some arbitrary one of the grouped rows will be used. So it is possible your dob = MIN(dob) will be true (and it will definitely be true if all rows have the same dob), but you can't rely on it doing anything useful and should avoid doing this.
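For comparison, here is a minimal sketch of the subquery form from the question, which does not depend on any arbitrarily chosen row. It runs on Python's built-in sqlite3 module as a stand-in for MySQL, and the sample names and dates are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dob TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", "1980-05-01"), ("Bob", "1975-03-12"), ("Cid", "1990-11-30")],
)

# The scalar subquery computes the single earliest dob; the outer WHERE then
# compares every row against it, so no nonaggregated column is left to chance.
oldest = conn.execute(
    "SELECT name, dob FROM employees "
    "WHERE dob = (SELECT MIN(dob) FROM employees)"
).fetchall()

print(oldest)  # [('Bob', '1975-03-12')]
```

If several employees share the earliest dob, this form returns all of them.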

query on self defined variable [duplicate]

I need to use an alias in the WHERE clause, but it keeps telling me that it's an unknown column. Is there any way to get around this issue? I need to select records that have a rating higher than x. Rating is calculated as the following alias:
sum(reviews.rev_rating)/count(reviews.rev_id) as avg_rating
You could use a HAVING clause, which can see the aliases, e.g.
HAVING avg_rating>5
but in a where clause you'll need to repeat your expression, e.g.
WHERE (sum(reviews.rev_rating)/count(reviews.rev_id))>5
BUT! Not all expressions will be allowed - using an aggregating function like SUM will not work, in which case you'll need to use a HAVING clause.
From the MySQL Manual:
It is not allowable to refer to a
column alias in a WHERE clause,
because the column value might not yet
be determined when the WHERE clause
is executed. See Section B.1.5.4,
“Problems with Column Aliases”.
I don't know if this works in mysql, but using sqlserver you can also just wrap it like:
select * from (
-- your original query
select .. sum(reviews.rev_rating)/count(reviews.rev_id) as avg_rating
from ...) Foo
where Foo.avg_rating ...
This question is quite old and one answer already gained 160 votes...
Still I would make this clear: The question is actually not about whether alias names can be used in the WHERE clause.
sum(reviews.rev_rating) / count(reviews.rev_id) as avg_rating
is an aggregation. In the WHERE clause we restrict records we want from the tables by looking at their values. sum(reviews.rev_rating) and count(reviews.rev_id), however, are not values we find in a record; they are values we only get after aggregating the records.
So WHERE is inappropriate. We need HAVING, as we want to restrict result rows after aggregation. Hence neither
WHERE avg_rating > 10
nor
WHERE sum(reviews.rev_rating) / count(reviews.rev_id) > 10
is possible.
HAVING sum(reviews.rev_rating) / count(reviews.rev_id) > 10
on the other hand is possible and complies with the SQL standard. Whereas
HAVING avg_rating > 10
is only possible in MySQL. It is not valid SQL according to the standard, as the SELECT clause is supposed to get executed after HAVING. From the MySQL docs:
Another MySQL extension to standard SQL permits references in the HAVING clause to aliased expressions in the select list.
The MySQL extension permits the use of an alias in the HAVING clause for the aggregated column
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
SELECT * FROM (SELECT customer_Id AS 'custId', gender, age FROM customer
WHERE gender = 'F') AS c
WHERE c.custId = 100;
If your query is static, you can define it as a view; then you can use that alias in the WHERE clause when querying the view.
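The two portable options discussed in this thread (repeating the aggregate expression in HAVING, and wrapping the query in a derived table) can be sketched as follows. This is an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the product_id column, the sample rows, and the threshold 5 are all hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; product_id and the sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (rev_id INTEGER, product_id INTEGER, rev_rating REAL)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [(1, 10, 8), (2, 10, 6), (3, 20, 3), (4, 20, 4)],
)

# Option 1: repeat the aggregate expression in HAVING (standard SQL).
having_rows = conn.execute(
    "SELECT product_id, SUM(rev_rating) / COUNT(rev_id) AS avg_rating "
    "FROM reviews GROUP BY product_id "
    "HAVING SUM(rev_rating) / COUNT(rev_id) > 5"
).fetchall()

# Option 2: wrap the aggregation in a derived table; the alias is an ordinary
# column of that table, so the outer WHERE can filter on it.
derived_rows = conn.execute(
    "SELECT * FROM ("
    "  SELECT product_id, SUM(rev_rating) / COUNT(rev_id) AS avg_rating "
    "  FROM reviews GROUP BY product_id"
    ") AS foo WHERE foo.avg_rating > 5"
).fetchall()

# An aggregate inside WHERE is rejected, because WHERE filters rows before aggregation.
try:
    conn.execute(
        "SELECT product_id FROM reviews WHERE SUM(rev_rating) > 5 GROUP BY product_id"
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

print(having_rows)   # [(10, 7.0)]
print(derived_rows)  # [(10, 7.0)]
print(where_error is not None)  # True
```

The exact error text differs between database engines, so the sketch only checks that an error was raised.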

Can somebody elaborate on the `SELECT LAST_INSERT_ID()` statement?

I have always had the understanding that you use SELECT to select columns from a table. However, I was thrown off when I saw SELECT LAST_INSERT_ID(). I understand what it does... but I don't understand how we can simply just ask for the last inserted id like that. Isn't it true that the SELECT keyword expects to see column names immediately afterwards... so how does that function call satisfy that requirement?
The SELECT statement normally works with a FROM clause to select columns -- and expressions on columns and constants -- from rows in a table.
Without the FROM clause, a SELECT simply evaluates the expressions and returns one row. The function LAST_INSERT_ID() is simply an expression that returns a value, so:
SELECT LAST_INSERT_ID()
returns a result set with a single row and a single (unnamed) column.
Some databases do not like the idea of a SELECT without a FROM. Oracle is one of them. It requires a FROM clause and provides a table named dual, with one column and one row, for this purpose. MySQL also supports dual, so you could write:
SELECT LAST_INSERT_ID()
FROM dual;
This is handy, if you want to include a WHERE clause with the SELECT (the WHERE requires a FROM in MySQL).
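A small sketch of a SELECT without FROM, run through Python's built-in sqlite3 module as a stand-in for MySQL. The notes table is hypothetical, and last_insert_rowid() is SQLite's counterpart of MySQL's LAST_INSERT_ID(), not the same function.

```python
import sqlite3

# SQLite stands in for MySQL; the notes table is made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES ('first')")
conn.execute("INSERT INTO notes (body) VALUES ('second')")

# No FROM clause: the expressions are evaluated once and returned as one row.
expr_row = conn.execute("SELECT 1 + 1, UPPER('abc')").fetchall()

# A function call is just another expression, so it can be selected the same way.
# last_insert_rowid() is the SQLite analogue of MySQL's LAST_INSERT_ID().
last_id = conn.execute("SELECT last_insert_rowid()").fetchall()

print(expr_row)  # [(2, 'ABC')]
print(last_id)   # [(2,)]
```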

For MySQL 'GROUP BY', what are the criteria for picking one row from many rows?

For MySQL 'GROUP BY', what are the criteria for picking one row from many rows? For example, if I use GROUP BY user_id, would it choose the row in some order or in some random way?
For example this table
id  user_id  message  created_at
1   1        a        2016-08-25 07:00:15
2   2        c        2016-08-25 08:00:15
3   1        b        2016-08-25 09:46:15
4   2        d        2016-08-25 10:49:12
How will GROUP BY user_id decide which row to take for user_id=1, row 1 or row 3? I couldn't find any explanation.
It will find the one specified in the aggregation (MAX(), MIN() etc.) statement, as you should only select grouped or aggregated columns when using GROUP BY.
Otherwise it is not determined which value will be chosen; the choice is effectively arbitrary.
Also see the MySQL manual:
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
MySQL 5.7.5 and up implements detection of functional dependence. If
the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default),
MySQL rejects queries for which the select list, HAVING condition, or
ORDER BY list refer to nonaggregated columns that are neither named in
the GROUP BY clause nor are functionally dependent on them.
So since MySQL 5.7.5 you have to explicitly disable ONLY_FULL_GROUP_BY before MySQL will execute those queries.
Before MySQL 5.7.5 it allowed those queries by default but, as mentioned, chose the values of the nonaggregated and nongrouped fields arbitrarily.
GROUP BY works on specific fields. If you group by user_id and SELECT any other column, then the value of that column is picked arbitrarily from within each group.
That is why it is not recommended to SELECT a field which is not in the GROUP BY clause.
How will GROUP BY user_id decide which row to take for user_id=1, row 1
or row 3? I couldn't find any explanation.
Yes, it will take the other fields arbitrarily.
If you have a query like
select user_id from yourtable group by user_id
then it does not matter which record the values come from. However, if you have a query like
select user_id, created_at from yourtable group by user_id
where you have a field in the select list that is not subject of an aggregate function (max(), min(), etc), then as MySQL documentation on MySQL Handling of GROUP BY says:
In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.
In practice, MySQL typically picks the value for such fields from the first record it encounters while assembling the result set, but this is not guaranteed.
Please also note that unless such fields are functionally dependent on the fields in the GROUP BY, the query violates the SQL standard. In MySQL you can use the only_full_group_by SQL mode setting (also included in the ANSI combination mode) to determine whether MySQL accepts such queries at all. In more recent versions of MySQL this SQL mode is turned on by default, preventing you from running such queries without changing the settings.
The GROUP BY clause does not return rows from the database. It generates values using the rows filtered by the WHERE clause.
There are three types of columns that are valid in the expressions present in the SELECT clause of a query that contains a GROUP BY clause:
columns that also appear in the GROUP BY clause;
columns that are functionally dependent on the columns that appear in the GROUP BY clause;
any column, when it is used as the argument of an aggregate function.
A GROUP BY query whose columns present in the SELECT clause do not follow the rules above is invalid SQL.
Before version 5.7.5, MySQL allowed invalid GROUP BY queries by default. It is explained in the documentation that for the columns that do not follow the rules above, "the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want."
Since version 5.7.5 MySQL rejects such invalid queries by default. Other RDBMSes (SQL Server, Oracle etc.) do not allow them either because, well, they are invalid SQL.
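If the goal is a specific row per group (for example the newest message per user), one common and deterministic approach is to aggregate first and join back. The sketch below is an illustration, not something taken from the answers above: it uses Python's built-in sqlite3 module as a stand-in for MySQL, loads the four rows from the question, and the table name messages and alias latest_at are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL; the table name "messages" is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER, user_id INTEGER, message TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        (1, 1, "a", "2016-08-25 07:00:15"),
        (2, 2, "c", "2016-08-25 08:00:15"),
        (3, 1, "b", "2016-08-25 09:46:15"),
        (4, 2, "d", "2016-08-25 10:49:12"),
    ],
)

# The inner query is a valid GROUP BY (only the grouped column and an aggregate).
# Joining back on (user_id, latest_at) then picks the full row explicitly.
latest = conn.execute(
    "SELECT m.user_id, m.message, m.created_at "
    "FROM messages AS m "
    "JOIN (SELECT user_id, MAX(created_at) AS latest_at "
    "      FROM messages GROUP BY user_id) AS t "
    "  ON t.user_id = m.user_id AND t.latest_at = m.created_at "
    "ORDER BY m.user_id"
).fetchall()

print(latest)
# [(1, 'b', '2016-08-25 09:46:15'), (2, 'd', '2016-08-25 10:49:12')]
```

Replacing MAX with MIN would select the oldest message per user instead.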

MySQL GROUP BY returns only first row

I have a table named forms with the following structure-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
SomeGroup | SomeForm2 | SomePath2
------------------------------------
I use the following query-
SELECT * FROM forms GROUP BY 'GROUP'
It returns only the first row-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
------------------------------------
Shouldn't it return both (or all of it)? Or am I (possibly) wrong?
As the manual states:
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
In your case, MySQL is correctly performing the grouping operation, but (since you select all columns, including those by which you are not grouping) it gives you one indeterminate record from each group.
It only returns one row, because the values of your GROUP column are the same ... that's basically how GROUP BY works.
Btw, when using GROUP BY it's good form to use aggregate functions for the other columns, such as COUNT(), MIN(), MAX(). In MySQL it usually returns the first row of each group if you just specify the column names; other databases will not like that though.
Your code:
SELECT * FROM forms GROUP BY 'GROUP'
isn't very "good" SQL. MySQL lets you get away with it and returns only one value for all columns not mentioned in the GROUP BY clause. Almost any other database would not perform this query. As a rule, any column that is not part of the grouping condition must be used with an aggregate function.
As far as MySQL is concerned, I just solved my problem by trial and error.
I had the same problem 10 minutes ago. I was using a MySQL statement something like this:
SELECT * FROM forms GROUP BY 'ID'; -- returns only one row
However, using a statement like the following would yield the same result:
SELECT ID FROM forms GROUP BY 'ID'; -- returns only one row
The following was my solution:
SELECT ID FROM forms GROUP BY ID; -- returns more than one row (with one column of field "ID") grouped by ID
or
SELECT * FROM forms GROUP BY ID; -- returns more than one row (with columns of all fields) grouped by ID
or
SELECT * FROM forms GROUP BY `ID`; -- returns more than one row (with columns of all fields) grouped by ID
Lesson: do not wrap the column name in single quotes. I believe that makes it a string literal, so every row falls into the same group. Remove the quotes from the column name and it will group by its value. However, you can use backtick escapes, e.g. `ID`.
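The difference between grouping by a quoted string and grouping by the column itself can be sketched like this. It is an illustration run on Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also accepts backtick-quoted identifiers), and the third row with OtherGroup is added here only so that the two queries give visibly different results.

```python
import sqlite3

# SQLite stands in for MySQL; the OtherGroup row is an addition for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (`GROUP` TEXT, form TEXT, filepath TEXT)")
conn.executemany(
    "INSERT INTO forms VALUES (?, ?, ?)",
    [
        ("SomeGroup", "SomeForm1", "SomePath1"),
        ("SomeGroup", "SomeForm2", "SomePath2"),
        ("OtherGroup", "OtherForm1", "OtherPath1"),
    ],
)

# 'GROUP' in single quotes is a string constant: it has the same value for
# every row, so all rows collapse into a single group.
by_literal = conn.execute(
    "SELECT COUNT(*) FROM forms GROUP BY 'GROUP'"
).fetchall()

# `GROUP` in backticks is the column: one group per distinct value.
by_column = conn.execute(
    "SELECT `GROUP`, COUNT(*) FROM forms GROUP BY `GROUP` ORDER BY `GROUP`"
).fetchall()

print(by_literal)  # [(3,)]
print(by_column)   # [('OtherGroup', 1), ('SomeGroup', 2)]
```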
Thank you everyone for pointing out the obvious mistake I was too blind to see. I finally replaced GROUP BY with ORDER BY and included a WHERE clause to get my desired result. That is what I was intending to use all along. Silly me.
My final query becomes this-
SELECT * FROM forms WHERE `GROUP`='SomeGroup' ORDER BY `GROUP`
SELECT * FROM forms GROUP BY `GROUP`
it's strange that your query does work
The above result is kind of correct, but not quite.
All columns you select which are not part of the GROUP BY statement have to be aggregated by some function (see the list of aggregate functions in the MySQL documentation). Most often they are used together with numeric columns.
Besides this, your query will return one output row for every (combination of) attributes in the columns referenced in the GROUP BY statement. In your case there is just one distinct value in the GROUP column, namely "SomeGroup", so the output will only contain one row for this value.
A GROUP BY clause should only be required if you have group functions, say max, min, avg, sum, etc., applied in the query expressions. Your query does not show any such functions, meaning you do not actually need a GROUP BY clause. And if you still use such a clause, you will receive only the first record from each group of results.
Hence the output of your query is as expected.
Query result is perfect; it will return only one row.