Explaining a query with GROUP BY and ONLY_FULL_GROUP_BY - mysql

I want to understand how queries work with ONLY_FULL_GROUP_BY enabled.
If I list all the columns of the table with a MIN() on one column, it works fine:
$query = "SELECT id, member_id, name, code, MIN(price) AS price FROM tbl_product GROUP BY code";
But if I select everything, I get an error:
$query = "SELECT *, MIN(price) AS price FROM tbl_product GROUP BY code";
Can you explain the difference between the two?

It's about a bug that was fixed in MySQL 5.7.5. According to the manual, section 12.20.3 "MySQL Handling of GROUP BY", MySQL 5.7.5 and newer can detect functional dependence between the primary key and the rest of the columns of the table. Literally it says:
MySQL 5.7.5 and up implements detection of functional dependence. If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them.
ONLY_FULL_GROUP_BY is now the default option and works as specified by the SQL Standard.

The question is really why you would think that group by would work with select *.
What group by does is produce one row for each combination of values for the group by keys. That is by definition. Multiple rows become one.
The expressions allowed in the select are then:
The group by keys or expressions containing only those keys.
Summary functions on other values.
Combinations of summary functions with group by keys.
Any column in the select that is not in the group by could have multiple values among the original rows. SQL does not allow this. Most databases do not allow this. MySQL no longer allows this by default.
Once upon a time it did, but the returned values were from indeterminate matching rows. That "functionality" (really a bug) has now been fixed.
Note: There is an exception to this, allowed by the standard, that permits aggregating by primary keys/unique keys and then selecting the rest of the columns. This is allowed because the primary key uniquely identifies the rest of the column values.
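If the goal behind SELECT * with MIN(price) was to get the whole row of the cheapest product per code, one portable way is to join the table against a grouped subquery. The following is only a sketch: it uses Python's built-in sqlite3 module so that it can be run without a MySQL server, the sample rows are made up, and SQLite itself does not enforce ONLY_FULL_GROUP_BY, so it shows the standard-compliant query rather than the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_product (
        id INTEGER PRIMARY KEY,
        member_id INTEGER,
        name TEXT,
        code TEXT,
        price REAL
    );
    -- Hypothetical sample rows: two products per code.
    INSERT INTO tbl_product VALUES
        (1, 10, 'pen blue',  'PEN',  2.5),
        (2, 11, 'pen red',   'PEN',  1.5),
        (3, 10, 'book big',  'BOOK', 20.0),
        (4, 12, 'book tiny', 'BOOK', 8.0);
""")

# The subquery only selects the GROUP BY key and an aggregate, which is
# valid under ONLY_FULL_GROUP_BY. The outer join then fetches the full
# row(s) whose price equals the per-code minimum.
cheapest_per_code = conn.execute("""
    SELECT p.id, p.member_id, p.name, p.code, p.price
    FROM tbl_product AS p
    INNER JOIN (
        SELECT code, MIN(price) AS min_price
        FROM tbl_product
        GROUP BY code
    ) AS m ON m.code = p.code AND m.min_price = p.price
    ORDER BY p.code
""").fetchall()

print(cheapest_per_code)
```

Note that if two products share the minimum price for a code, this query returns both of them.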

Related

What does this SQL query with MIN() in HAVING clause do exactly?

I'm using MySQL and I have a table named employees.
I had an exercise in which I had to select the oldest person. I know the correct way to do that is with a subquery: SELECT name, dob FROM employees WHERE dob = (SELECT MIN(dob) FROM employees).
However, I did it like so: SELECT name, dob FROM employees HAVING dob = MIN(dob). Now this returns an empty set, but doesn't throw any errors. So what does it do exactly? I've read that MySQL allows referring to columns from the SELECT clause in the HAVING clause, without any GROUP BY clause. But why does it return an empty set?
When you use MIN (or any other aggregate function) in the select columns or the having clause, you cause an implicit GROUP BY () (that is, all rows are grouped together into a single result row).
And when grouping (whether all rows or with a specific GROUP BY), if you specify a column outside of an aggregate function (such as your dob =) that is not one of the things being grouped on or something functionally dependent on it (for example, some other column in a table when you are grouping by the primary key for that table), one of two things will happen:
If you have enabled the ONLY_FULL_GROUP_BY sql_mode (which is the default in newer versions), you will receive an error:
In aggregated query without GROUP BY, expression ... contains nonaggregated column '...'; this is incompatible with sql_mode=only_full_group_by
If you have not enabled ONLY_FULL_GROUP_BY, a value from some arbitrary one of the grouped rows will be used. So it is possible your dob = MIN(dob) will be true (and it will definitely be true if all rows have the same dob), but you can't rely on it doing anything useful and should avoid doing this.
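To make the implicit grouping concrete, here is a small sketch. It runs on Python's built-in sqlite3 module rather than MySQL (so it will not reproduce the ONLY_FULL_GROUP_BY error), and the three employees are invented for illustration. It shows that an aggregate without GROUP BY produces exactly one row, and that the subquery form from the question finds the oldest person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dob TEXT);
    -- Hypothetical sample data.
    INSERT INTO employees VALUES
        ('Ann', '1980-03-01'),
        ('Bob', '1975-11-20'),
        ('Cid', '1990-06-15');
""")

# An aggregate with no GROUP BY collapses all rows into a single group,
# so the result has exactly one row.
single_group = conn.execute(
    "SELECT MIN(dob), COUNT(*) FROM employees"
).fetchall()

# The subquery computes the minimum once; the outer query is a plain,
# non-aggregated filter over individual rows.
oldest = conn.execute("""
    SELECT name, dob
    FROM employees
    WHERE dob = (SELECT MIN(dob) FROM employees)
""").fetchall()

print(single_group)
print(oldest)
```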

Mysql Inner Join Group By Tutorial Question

https://www.mysqltutorial.org/tryit/query/mysql-inner-join/#2
Hi folks!
I wonder why, after I delete the GROUP BY orderNumber, it fetches only one row.
Is it a mistake in their "tutorial" database, or is it correct MySQL behavior? If it's correct, then why does it produce exactly this result?
SQL "aggregate functions" including SUM(), COUNT(), MIN(), MAX() among others require a frame to aggregate over. Typically that is one or more other columns to apply the SUM() or other aggregate onto, and GROUP BY is how you specify that frame.
An aggregate query with no GROUP BY implies you are taking the SUM() of all rows matched by the query's WHERE clause filter.
MySQL is unlike most other RDBMS in that it allows you to remove the GROUP BY with unaggregated columns in SELECT and still get some rowset back from your query. In Oracle, MS SQL Server, or Postgresql, the query without the GROUP BY would be a syntax error. They would also treat it as an error if you used GROUP BY orderNumber while still including status in the SELECT list. A GROUP BY should include every column which is in the SELECT list that isn't being used in the aggregate SUM(), COUNT(), MIN(), MAX(), etc.
But MySQL is lenient about its presence and instead tries to guess over which frame to apply your SUM() aggregate. Some of the time it can get the answer you were actually expecting, but most other times the values it gives you for the non-aggregated columns are essentially indeterminate. It will collapse several possible values down to just one, and you have no way to pick which one you get.
That is the query result you are seeing. MySQL chose orderNumber = 10100 and status = 'Shipped' to go with your SUM() even though they are not specifically related to that sum. The sum in your result 9604190.61 is the sum of quantityOrdered * priceEach for ALL rows in that table despite what the orderNumber says.
Documentation on MySQL's GROUP BY handling
So the most reliable version of your query, and the only version which would work outside of MySQL and whose results you can actually predict, would be:
SELECT
    T1.orderNumber,
    status,
    SUM(quantityOrdered * priceEach) total
FROM
    orders AS T1
INNER JOIN
    orderdetails AS T2 ON T1.orderNumber = T2.orderNumber
GROUP BY
    T1.orderNumber, /* qualified, since both tables have this column */
    status /* added */
;
Note that the tutorial omitted status from the GROUP BY even though it is in SELECT. That would be an error in most other RDBMS.
MySQL's default handling of this misfeature has changed with recent versions. Prior to 5.7, the ONLY_FULL_GROUP_BY mode was disabled by default, arguably causing a lot of developers to grow dependent on the grouping behavior. In recent versions, ONLY_FULL_GROUP_BY is enabled by default and prevents queries with a missing or incomplete GROUP BY.
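The difference between the two frames can be sketched as follows. This is an illustration on Python's built-in sqlite3 module with a tiny invented dataset, not the tutorial's real database, and the table definitions are trimmed to the columns the query uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE orderdetails (
        orderNumber INTEGER,
        quantityOrdered INTEGER,
        priceEach REAL
    );
    -- Hypothetical sample data.
    INSERT INTO orders VALUES (10100, 'Shipped'), (10101, 'In Process');
    INSERT INTO orderdetails VALUES
        (10100, 2, 10.0),
        (10100, 1, 5.0),
        (10101, 3, 4.0);
""")

# Complete GROUP BY: one row per order, every selected column is either
# a grouping key or an aggregate.
per_order = conn.execute("""
    SELECT T1.orderNumber, status, SUM(quantityOrdered * priceEach) AS total
    FROM orders AS T1
    INNER JOIN orderdetails AS T2 ON T1.orderNumber = T2.orderNumber
    GROUP BY T1.orderNumber, status
    ORDER BY T1.orderNumber
""").fetchall()

# No GROUP BY and only an aggregate selected: one row for the whole join.
grand_total = conn.execute("""
    SELECT SUM(quantityOrdered * priceEach) AS total
    FROM orders AS T1
    INNER JOIN orderdetails AS T2 ON T1.orderNumber = T2.orderNumber
""").fetchall()

print(per_order)
print(grand_total)
```

The single row in grand_total is the sum over all orders, which is the same kind of figure the tutorial query returned once its GROUP BY was removed.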

For MySQL 'Group By', what are the criteria for picking one row from many rows?

For MySQL 'Group By', what are the criteria for picking one row from many rows? For example, if I use group by user_id, would it choose the row in some order or in some random way?
For example this table
id  user_id  message  created_at
1   1        a        2016-08-25 07:00:15
2   2        c        2016-08-25 08:00:15
3   1        b        2016-08-25 09:46:15
4   2        d        2016-08-25 10:49:12
How will group by user_id decide which row to take for user_id=1, row 1 or row 3? I couldn't find any answer.
It will find the one specified in the aggregation (MAX(), MIN() etc.) statement, as you should only select grouped or aggregated columns when using GROUP BY.
Otherwise, which value is chosen is indeterminate; you cannot control or predict it.
Also see the MySQL manual:
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
MySQL 5.7.5 and up implements detection of functional dependence. If
the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default),
MySQL rejects queries for which the select list, HAVING condition, or
ORDER BY list refer to nonaggregated columns that are neither named in
the GROUP BY clause nor are functionally dependent on them.
So since MySQL 5.7.5 you have to explicitly disable ONLY_FULL_GROUP_BY before MySQL will execute those queries.
Before MySQL 5.7.5 it allowed those queries by default but, as mentioned, chose the values of the nonaggregated and nongrouped fields arbitrarily.
Group by works on a specific field. If you group by user_id and SELECT any other column then that column from that particular GROUP will be selected randomly.
That is why it is not recommended to SELECT the field which is not in GROUP BY clause.
How will group by user_id decide which row to take for user_id=1, row 1
or row 3? I couldn't find any answer.
Yes it will take other fields randomly.
If you have a query like
select user_id from yourtable group by user_id
then it does not matter which record the values come from. However, if you have a query like
select user_id, created_at from yourtable group by user_id
where you have a field in the select list that is not subject of an aggregate function (max(), min(), etc), then as MySQL documentation on MySQL Handling of GROUP BY says:
In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.
In reality, MySQL will pick the value for such fields from the 1st record it encounters while assembling the resultset.
Please also note that unless such fields are functionally dependent on the fields in the group by, the query violates the SQL standard. In MySQL you can use the only_full_group_by sql mode setting (also included in the ANSI combination mode) to determine whether MySQL accepts such queries at all. In the more recent versions of MySQL this sql mode is turned on by default, preventing you from running such queries without changing the settings.
The GROUP BY clause does not return rows from the database. It generates values using the rows filtered by the WHERE clause.
There are three types of columns that are valid in the expressions present in the SELECT clause of a query that contains a GROUP BY clause:
columns that also appear in the GROUP BY clause;
columns that are functionally dependent on the columns that appear in the GROUP BY clause;
any column can be used as argument of a GROUP BY aggregate function.
A GROUP BY query whose columns present in the SELECT clause do not follow the rules above is invalid SQL.
Before version 5.7.5, MySQL allowed invalid GROUP BY queries by default. It is explained in the documentation that for the columns that do not follow the rules above, "the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want."
Since version 5.7.5 MySQL rejects such invalid queries by default. Other RDBMSes (SQL Server, Oracle etc) do not allow them either because, well, they are invalid SQL.
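For the example table from the question, a query that stays within the rules above might look like the sketch below. It is run through Python's built-in sqlite3 module purely so it can be executed as-is; the table name yourtable is taken from the answers above, and SQLite does not enforce ONLY_FULL_GROUP_BY, so this only demonstrates the valid form. Every selected column is either the GROUP BY key or an argument of an aggregate, so the result does not depend on which row the server happens to visit first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        message TEXT,
        created_at TEXT
    );
    -- The four rows from the question.
    INSERT INTO yourtable VALUES
        (1, 1, 'a', '2016-08-25 07:00:15'),
        (2, 2, 'c', '2016-08-25 08:00:15'),
        (3, 1, 'b', '2016-08-25 09:46:15'),
        (4, 2, 'd', '2016-08-25 10:49:12');
""")

# user_id is the grouping key; every other value comes from an aggregate,
# so each output value is fully determined by the group's contents.
per_user = conn.execute("""
    SELECT user_id,
           MIN(created_at) AS first_message_at,
           MAX(created_at) AS last_message_at,
           COUNT(*)        AS message_count
    FROM yourtable
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(per_user)
```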

SELECT DISTINCT and ORDER BY in MySQL

It seems like in version 5.7 of MySQL, they added one nasty thing which was (or still is) a real headache for those who deal with SQL Server.
The thing is: MySQL throws an error, when you try to SELECT DISTINCT rows for one set of columns and want to ORDER BY another set of columns. Previously, in version 5.6 and even in some builds of version 5.7 you could do this, but now it is prohibited (at least by default).
I hope there exists some configuration, some variable that we could set to make it work. But unfortunately I do not know that nasty variable. I hope someone knows that.
EDIT
This is some typical query in my case that worked literally for years (until the last build of MySQL 5.7):
SELECT DISTINCT a.attr_one, a.attr_two, a.attr_three, b.attr_four FROM table_one a
LEFT JOIN table_two b ON b.some_idx = a.idx
ORDER BY b.id_order
And, indeed, if I now include b.id_order to the SELECT part (as MySQL suggests doing), then what I will get, will be rubbish.
In most cases, a DISTINCT clause can be considered as a special case of GROUP BY, so the same rule applies here:
ONLY_FULL_GROUP_BY
MySQL 5.7.5 and up implements detection of functional dependence. If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them. (Before 5.7.5, MySQL does not detect functional dependency and ONLY_FULL_GROUP_BY is not enabled by default. For a description of pre-5.7.5 behavior, see the MySQL 5.6 Reference Manual.)
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Result set sorting occurs after values have been chosen, and ORDER BY does not affect which value within each group the server chooses. Disabling ONLY_FULL_GROUP_BY is useful primarily when you know that, due to some property of the data, all values in each nonaggregated column not named in the GROUP BY are the same for each group.
For more, see http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by
For your particular query:
SELECT DISTINCT attr_one,
attr_two,
attr_three,
attr_four
FROM
(SELECT a.attr_one,
a.attr_two,
a.attr_three,
b.attr_four
FROM table_one a
LEFT JOIN table_two b ON b.some_idx = a.idx
ORDER BY b.id_order) tmp
I have read the post at the link you mentioned, and it looks like it gives a clear explanation of why the error is thrown and how to avoid it.
In your case you may want to try the following (not tested of course):
SELECT a.attr_one, a.attr_two, a.attr_three, b.attr_four
FROM table_one a
LEFT JOIN table_two b ON b.some_idx = a.idx
GROUP BY a.attr_one, a.attr_two, a.attr_three, b.attr_four
ORDER BY max(b.id_order)
You should choose whether to use ORDER BY MAX(b.id_order), ORDER BY MIN(b.id_order), or another aggregate function, depending on which of a group's id_order values should decide its position.
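A small sketch of why that choice matters: the same GROUP BY rewrite sorts differently under MIN and MAX. The table and column names come from the question, but the rows are invented, and Python's built-in sqlite3 module is used only to make the sketch runnable without a MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_one (
        idx INTEGER PRIMARY KEY,
        attr_one TEXT, attr_two TEXT, attr_three TEXT
    );
    CREATE TABLE table_two (some_idx INTEGER, attr_four TEXT, id_order INTEGER);
    -- Hypothetical data: the first row of table_one matches two rows of
    -- table_two that differ only in id_order.
    INSERT INTO table_one VALUES (1, 'a', 'x', 'p'), (2, 'b', 'y', 'q');
    INSERT INTO table_two VALUES (1, 'f1', 30), (1, 'f1', 10), (2, 'f2', 20);
""")

QUERY = """
    SELECT a.attr_one, a.attr_two, a.attr_three, b.attr_four
    FROM table_one a
    LEFT JOIN table_two b ON b.some_idx = a.idx
    GROUP BY a.attr_one, a.attr_two, a.attr_three, b.attr_four
    ORDER BY {agg}(b.id_order)
"""

# Group ('a', ...) has id_order values {10, 30}; group ('b', ...) has {20}.
by_min = conn.execute(QUERY.format(agg="MIN")).fetchall()
by_max = conn.execute(QUERY.format(agg="MAX")).fetchall()

print(by_min)
print(by_max)
```

Both results contain the same distinct rows; only their order differs, because each group is positioned by a single, explicitly chosen id_order value.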

Why isn't the GROUP BY clause in this MySQL statement throwing errors?

I am using a W3Schools example here:
SELECT Shippers.ShipperName,COUNT(Orders.OrderID) AS NumberOfOrders,
Orders.OrderDate
FROM Orders
LEFT JOIN Shippers
ON Orders.ShipperID=Shippers.ShipperID
GROUP BY ShipperName;
I am confused why the above statement isn't throwing an error. How does MySQL know which OrderDate to use when we are aggregating by all OrderID?
How does mysql know which OrderDate to use when we are aggregating by all OrderID?
It doesn't. It just picks one, because it assumes you would have grouped by all the necessary columns, and any columns that weren't in the GROUP BY and weren't subject to any aggregate functions would have the same values for each group.
It's non-standard behavior that works as an optimization, allowing the server to "leak" one of the values through from each group in the source rows into the result-set, reducing the size of the data the GROUP BY has to manage. Which source row's value is used for each group is undefined, so this is intended to be used only in queries where the non-grouped columns are functionally dependent on the grouped columns... because, in that case, "which" row doesn't matter, because they're all the same within each group.
MySQL extends the standard SQL use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the [queries excluding non-aggregated columns are] legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. [emphasis added]
https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html
You can disable this behavior by including ONLY_FULL_GROUP_BY in @@SQL_MODE.