I'm using MySQL and I have a table named employees.
I had an exercise in which I had to select the oldest person. I know the correct way to do that is with a subquery: SELECT name, dob FROM employees WHERE dob = (SELECT MIN(dob) FROM employees).
However, I did it like so: SELECT name, dob FROM employees HAVING dob = MIN(dob). This returns an empty set but doesn't throw any errors. So what does it do exactly? I've read that MySQL allows referring to columns from the SELECT clause in the HAVING clause without any GROUP BY clause. But why does it return an empty set?
When you use MAX (or other aggregate functions) in the select columns or the having clause, you cause an implicit GROUP BY () (that is, all rows are grouped together into a single result row).
And when grouping (whether all rows or with a specific GROUP BY), if you specify a column outside of an aggregate function (such as your dob =) that is not one of the things being grouped on, or something functionally dependent on them (for example, some other column in a table when you are grouping by the primary key for that table), one of two things will happen:
If you have enabled the ONLY_FULL_GROUP_BY sql_mode (which is the default in newer versions), you will receive an error:
In aggregated query without GROUP BY, expression ... contains nonaggregated column '...'; this is incompatible with sql_mode=only_full_group_by
If you have not enabled ONLY_FULL_GROUP_BY, a value from some arbitrary one of the grouped rows will be used. So it is possible your dob = MIN(dob) will be true (and it will definitely be true if all rows have the same dob), but you can't rely on it doing anything useful and should avoid doing this.
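For comparison, here is a minimal sketch of the subquery form from the question, which does not depend on any sql_mode setting. It is run against an in-memory SQLite database through Python's sqlite3 module purely so it can be executed anywhere; SQLite standing in for MySQL is an assumption, and the three rows are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the query is standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dob TEXT)")
conn.executemany(
    "INSERT INTO employees (name, dob) VALUES (?, ?)",
    # Hypothetical sample rows, not taken from the question.
    [("Ann", "1985-03-14"), ("Bob", "1979-11-02"), ("Cid", "1990-06-30")],
)

# The scalar subquery computes MIN(dob) once; the outer query then
# compares every row against that single value.
oldest = conn.execute(
    "SELECT name, dob FROM employees "
    "WHERE dob = (SELECT MIN(dob) FROM employees)"
).fetchall()
print(oldest)
```

If several employees share the earliest dob, this form returns all of them.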
Why am I only getting one result from the query below? The suggested "answer" has the first name "Susan" instead of what I got in my results.
SELECT EmpFirstName, EmpLastName, p.ProductName as ProductName,
YEAR(c.OrderDate) AS Year,
SUM(o.QuotedPrice + o.QuantityOrdered) AS TotalValue
FROM Employees
NATURAL JOIN Products p
NATURAL JOIN Order_Details o
NATURAL JOIN Orders c
ORDER BY Year, TotalValue DESC
Image of results
Image of Table Structure
Because there is a SUM in your query.
The result returned by the query does not match your expectations because the query is invalid. And your expectations are incorrect.
An aggregate (GROUP BY) function in an expression of the SELECT clause implies grouping. When the query has no GROUP BY clause, all the selected rows are treated as a single group, so the result contains only one row.
Each expression that appears in the SELECT clause of a GROUP BY query must follow one of these rules, in order to have a valid SQL query:
it also appears in the GROUP BY clause;
it's a call to an aggregate (GROUP BY) function;
it is functionally dependent on a column that appears in the GROUP BY clause.
Because your query does not have a GROUP BY clause, the expressions EmpFirstName, EmpLastName, p.ProductName and YEAR(c.OrderDate) are not valid in the SELECT clause.
Before version 5.7.5, MySQL allowed such invalid SQL queries by default, but it reserved the right to return indeterminate values for the invalid expressions.
Since version 5.7.5, MySQL rejects such queries by default, because the ONLY_FULL_GROUP_BY SQL mode is enabled by default. Other RDBMSs have rejected them for many years.
The explanation for the indeterminate values is simple: the JOIN and WHERE clauses extract some rows from the table(s). The (missing) GROUP BY clause produces only one record from all these rows. A GROUP BY query never returns rows from the table, it generates the values it puts in the result set. Since there are multiple different values for EmpFirstName in the group, the SQL standard says the query is invalid. MySQL used to ignore the standard but it had no valid rule about what value to pick from the EmpFirstName expression in the SELECT clause. Any value from the rows in the group is equally valid and that's what it returns: one random value from the group.
In order to get the results you expect you have to group the rows by OrderNumber and ProductNumber (and EmployeeID to get a valid SQL query):
In MySQL, what is the criterion GROUP BY uses to pick one row out of many? For example, if I use GROUP BY user_id, would it choose the row in some order or in some random way?
For example this table
id | user_id | message | created_at
=============================================
1  | 1       | a       | 2016-08-25 07:00:15
2  | 2       | c       | 2016-08-25 08:00:15
3  | 1       | b       | 2016-08-25 09:46:15
4  | 2       | d       | 2016-08-25 10:49:12
How will GROUP BY user_id decide which row to take for user_id=1, row 1 or row 3? I couldn't find any answer.
It will find the one specified in the aggregation (MAX(), MIN() etc.) statement, as you should only select grouped or aggregated columns when using GROUP BY.
Otherwise it is not determined which value will be chosen, it is pretty random.
Also see the MySQL manual:
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
MySQL 5.7.5 and up implements detection of functional dependence. If
the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default),
MySQL rejects queries for which the select list, HAVING condition, or
ORDER BY list refer to nonaggregated columns that are neither named in
the GROUP BY clause nor are functionally dependent on them.
So since MySQL 5.7.5 you explicitly have to disable ONLY_FULL_GROUP_BY before MySQL will execute those queries.
Before MySQL 5.7.5 it allowed those queries by default but, as mentioned, chose the values of the nonaggregated and nongrouped fields arbitrarily.
Group by works on a specific field. If you group by user_id and SELECT any other column then that column from that particular GROUP will be selected randomly.
That is why it is not recommended to SELECT the field which is not in GROUP BY clause.
How will GROUP BY user_id decide which row to take for user_id=1, row 1
or row 3? I couldn't find any answer.
Yes it will take other fields randomly.
If you have a query like
select user_id from yourtable group by user_id
then it does not matter which record the values come from. However, if you have a query like
select user_id, created_at from yourtable group by user_id
where you have a field in the select list that is not subject of an aggregate function (max(), min(), etc), then as MySQL documentation on MySQL Handling of GROUP BY says:
In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.
In practice, MySQL tends to pick the value for such fields from the first record it encounters while assembling the result set, but this is not guaranteed.
Please also note that unless such fields are functionally dependent on the fields in the GROUP BY, the query violates the SQL standard. In MySQL you can use the only_full_group_by SQL mode setting to determine whether MySQL accepts such queries at all. In more recent versions of MySQL this SQL mode is turned on by default, preventing you from running such queries without changing the settings.
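If the goal is a specific row per user, for example the most recent message, one common way to make the choice deterministic is to compute MAX(created_at) per user in a derived table and join back to it. The sketch below uses the four rows from the question; the table name messages and the alias names are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL only so the example can be run as-is.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the SQL is standard.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("  # hypothetical table name
    "id INTEGER PRIMARY KEY, user_id INTEGER, message TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        (1, 1, "a", "2016-08-25 07:00:15"),
        (2, 2, "c", "2016-08-25 08:00:15"),
        (3, 1, "b", "2016-08-25 09:46:15"),
        (4, 2, "d", "2016-08-25 10:49:12"),
    ],
)

# The derived table holds exactly one (user_id, newest timestamp) pair per
# user; joining back on both columns selects the matching full row.
latest = conn.execute(
    """
    SELECT m.user_id, m.message, m.created_at
    FROM messages AS m
    JOIN (SELECT user_id, MAX(created_at) AS newest
          FROM messages
          GROUP BY user_id) AS per_user
      ON per_user.user_id = m.user_id
     AND per_user.newest = m.created_at
    ORDER BY m.user_id
    """
).fetchall()
print(latest)
```

Every selected column here is either grouped, aggregated, or read from a row identified by the join, so nothing is left for the server to choose.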
The GROUP BY clause does not return rows from the database. It generates values using the rows filtered by the WHERE clause.
There are three types of columns that are valid in the expressions present in the SELECT clause of a query that contains a GROUP BY clause:
columns that also appear in the GROUP BY clause;
columns that are functionally dependent on the columns that appear in the GROUP BY clause;
any column used as an argument of a GROUP BY aggregate function.
A GROUP BY query whose columns present in the SELECT clause do not follow the rules above is invalid SQL.
Before version 5.7.5, MySQL allowed invalid GROUP BY queries by default. It is explained in the documentation that for the columns that do not follow the rules above, "the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want."
Since version 5.7.5, MySQL rejects such invalid queries by default. Other RDBMSs (SQL Server, Oracle, etc.) do not allow them either because, well, they are invalid SQL.
I am using w3s as my example here.
SELECT Shippers.ShipperName,COUNT(Orders.OrderID) AS NumberOfOrders,
Orders.OrderDate
FROM Orders
LEFT JOIN Shippers
ON Orders.ShipperID=Shippers.ShipperID
GROUP BY ShipperName;
I am confused why the above statement isn't throwing an error. How does MySQL know which OrderDate to use when we are aggregating by all OrderID?
How does mysql know which OrderDate to use when we are aggregating by all OrderID?
It doesn't. It just picks one, because it assumes you would have grouped by all the necessary columns, and any columns that weren't in the GROUP BY and weren't subject to any aggregate functions would have the same values for each group.
It's non-standard behavior that works as an optimization, allowing the server to "leak" one of the values through from each group in the source rows into the result-set, reducing the size of the data the GROUP BY has to manage. Which source row's value is used for each group is undefined, so this is intended to be used only in queries where the non-grouped columns are functionally dependent on the grouped columns... because, in that case, "which" row doesn't matter, because they're all the same within each group.
MySQL extends the standard SQL use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the [queries excluding non-aggregated columns are] legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. [emphasis added]
https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html
You can disable this behavior by including ONLY_FULL_GROUP_BY in @@SQL_MODE.
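One way to make the query from the question unambiguous is to say which OrderDate is wanted by wrapping it in an aggregate. The sketch below returns the earliest and latest order date per shipper. The sample rows are made up, the aliases FirstOrder and LastOrder are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL as an assumption for runnability.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the SQL is standard.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Shippers (ShipperID INTEGER PRIMARY KEY, ShipperName TEXT)")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, ShipperID INTEGER, OrderDate TEXT)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO Shippers VALUES (?, ?)",
    [(1, "Speedy Express"), (2, "United Package")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(10248, 1, "1996-07-04"), (10249, 1, "1996-07-05"), (10250, 2, "1996-07-08")],
)

# OrderDate now appears only inside aggregates, so each output column has
# exactly one well-defined value per ShipperName group.
per_shipper = conn.execute(
    """
    SELECT Shippers.ShipperName,
           COUNT(Orders.OrderID) AS NumberOfOrders,
           MIN(Orders.OrderDate) AS FirstOrder,
           MAX(Orders.OrderDate) AS LastOrder
    FROM Orders
    LEFT JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID
    GROUP BY Shippers.ShipperName
    ORDER BY Shippers.ShipperName
    """
).fetchall()
print(per_shipper)
```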
I have 2 columns in my products table, name and brand. Here is the data:
NAME BRAND
'Ruby Axe Guitar', 'Guitar''s & Co'
'TV' , 'LG'
When I tried this query, it worked fine:
select name,brand, sum(1000) as sum,'Test' as name1
from products
group by name,brand
but I was surprised that even when I don't include brand in the GROUP BY clause, the query still works fine:
select name,brand, sum(1000) as sum,'Test' as name1
from products
group by name
Can someone explain?
You cannot reliably select an ungrouped column without an aggregate function; MySQL will give you an arbitrary value. I guess you got lucky with the second query.
Because NAME is already unique in your data, GROUP BY NAME is the same as GROUP BY NAME, OTHER_FIELD.
If NAME is unique, then its combination with any other column is unique too.
MySQL is a lot less strict than it should be IMHO. According to the actual SQL specification, any non-grouped column needs an aggregate function in a query containing a GROUP BY clause.
MySQL will allow retrieving non-grouped columns without such aggregate functions, returning an arbitrary value. They have an explanation of this choice in their documentation:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
I believe it's intentional, a deliberately tolerated oversight. Typically, you would be required to include every non-aggregated column in the GROUP BY clause. However, MySQL basically grabs the first value it encounters for any column that is not part of the aggregate.
This can be OK, such as when querying tables/columns whose values you know won't change no matter how many records fall into the corresponding group. For example, you want a list of customers and their total orders. The orders table has a customer ID that joins to the customer table. So you can do a SUM( Orders.Amount ), yet still get customer ID, Name, Address, Phone. Since the join is on the customer ID, the corresponding name, address, etc. will never change within a group and thus need not be part of the GROUP BY. Just group by the customer ID.
So, MySQL won't choke on you if you inadvertently leave out a column...
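The customer example above can be sketched as follows. Here the dependent columns are also listed in the GROUP BY, which gives the same result and is accepted by stricter databases as well. All table names, column names, and rows are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL only so the example can be run.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the SQL is standard.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme", "555-0100"), (2, "Globex", "555-0199")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 250), (3, 2, 75)],
)

# name and phone are determined by the primary key c.id, so adding them to
# the GROUP BY does not change the groups; it only makes the query portable.
totals = conn.execute(
    """
    SELECT c.id, c.name, c.phone, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name, c.phone
    ORDER BY c.id
    """
).fetchall()
print(totals)
```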
I have a table named forms with the following structure-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
SomeGroup | SomeForm2 | SomePath2
------------------------------------
I use the following query-
SELECT * FROM forms GROUP BY 'GROUP'
It returns only the first row-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
------------------------------------
Shouldn't it return both (or all of it)? Or am I (possibly) wrong?
As the manual states:
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
In your case, MySQL is correctly performing the grouping operation, but (since you select all columns, including those by which you are not grouping) it gives you one indeterminate record from each group.
It only returns one row, because the values of your GROUP column are the same ... that's basically how GROUP BY works.
Btw, when using GROUP BY it's good form to use aggregate functions for the other columns, such as COUNT(), MIN(), MAX(). In MySQL it usually returns the first row of each group if you just specify the column names; other databases will not like that though.
Your code:
SELECT * FROM forms GROUP BY 'GROUP'
isn't very "good" SQL. MySQL lets you get away with it and returns only one value for each column not mentioned in the GROUP BY clause. Almost any other database would refuse to run this query. As a rule, any column that is not part of the grouping condition must be used with an aggregate function.
As far as MySQL is concerned, I just solved my problem by trial and error.
I had the same problem 10 minutes ago. I was using a MySQL statement something like this:
SELECT * FROM forms GROUP BY 'ID'; -- returns only one row
Using a statement like the following yields the same result:
SELECT ID FROM forms GROUP BY 'ID'; -- returns only one row
The following was my solution:
SELECT ID FROM forms GROUP BY ID; -- returns more than one row (with the single column "ID"), grouped by ID
or
SELECT * FROM forms GROUP BY ID; -- returns more than one row (with all columns), grouped by ID
or
SELECT * FROM forms GROUP BY `ID`; -- returns more than one row (with all columns), grouped by ID
Lesson: do not put single quotes around the column name; I believe MySQL then treats it as a string literal rather than a column. Remove the quotes from the column name and it will group by its value. However, you can use backtick escapes, e.g. `ID`.
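The difference between grouping by a quoted string and grouping by a column can be checked with a small experiment. In the sketch below, 'ID' in single quotes is a constant, so every row falls into the same group, while the bare identifier ID produces one group per distinct value. The three rows are made up, and SQLite (via Python's sqlite3) stands in for MySQL as an assumption; both treat a single-quoted token in GROUP BY as a string literal.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL for this experiment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (ID INTEGER, FORM TEXT)")
# Hypothetical sample rows with three distinct ID values.
conn.executemany(
    "INSERT INTO forms VALUES (?, ?)",
    [(1, "SomeForm1"), (2, "SomeForm2"), (3, "SomeForm3")],
)

# 'ID' is a string constant: identical for every row, hence a single group.
by_literal = conn.execute(
    "SELECT COUNT(*) FROM forms GROUP BY 'ID'"
).fetchall()

# ID is the column: one group per distinct value.
by_column = conn.execute(
    "SELECT COUNT(*) FROM forms GROUP BY ID ORDER BY ID"
).fetchall()

print(by_literal, by_column)
```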
Thank you everyone for pointing out the obvious mistake I was too blind to see. I finally replaced GROUP BY with ORDER BY and included a WHERE clause to get my desired result. That is what I was intending to use all along. Silly me.
My final query becomes this-
SELECT * FROM forms WHERE `GROUP`='SomeGroup' ORDER BY `GROUP`
SELECT * FROM forms GROUP BY `GROUP`
it's strange that your query does work
The above result is kind of correct, but not quite.
All columns you select that are not part of the GROUP BY clause have to be aggregated by some function (see the list of aggregate functions in the MySQL documentation). Most often they are used together with numeric columns.
Besides this, your query will return one output row for every (combination of) attributes in the columns referenced in the GROUP BY statement. In your case there is just one distinct value in the GROUP column, namely "SomeGroup", so the output will only contain one row for this value.
A GROUP BY clause should only be required if you have group functions (MAX, MIN, AVG, SUM, etc.) applied in the query expressions. Your query does not show any such functions, meaning you do not actually need a GROUP BY clause. If you still use such a clause, you will receive only one record (typically the first) from each group of results.
Hence output on your query is perfect.
Query result is perfect; it will return only one row.