/* erroneous query */
select dept name, ID, avg (salary)
from instructor
group by dept name;
I know that every non-aggregated function must appear in group by if it appears in select. However, this query still runs in MySQL.
Should it be:
/* erroneous query */
select dept name, ID, avg (salary)
from instructor
group by dept name, ID;
Because I ran both queries and they give exactly the same answers!
MySQL lets you leave non-aggregated columns out of your GROUP BY, which strikes me as a terrible idea, because it can produce very unpredictable results. The MySQL documentation on GROUP BY handling describes this behavior.
It should be:
select `dept name`, ID, AVG(salary)
from instructor
group by `dept name`
It would be more instructive to show the columns defined in your table, but you CANNOT have spaces in a column name unless the name is quoted. MySQL quotes identifiers with backticks, as I did above; square brackets are SQL Server syntax and will not work in MySQL.
From the MySQL documentation on this particular point:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to non-aggregated columns in the select list that are not named in the
GROUP BY clause. For example, this query is illegal in standard SQL
because the name column in the select list does not appear in the
GROUP BY ...
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group.
So, roughly speaking, it is as if the omitted columns were added automatically.
However, note that it is not exactly the same. Have a look at this example:
SELECT name, address, MAX(age) FROM customers GROUP BY name, address;
might give you something different from:
SELECT name, address, MAX(age) FROM customers GROUP BY name;
Check this Fiddle.
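To see the difference without the fiddle, here is a small sketch in Python. It uses the standard-library sqlite3 module as a stand-in, since SQLite also accepts bare columns in a GROUP BY query; it is not MySQL, and the sample rows are invented for illustration.

```python
import sqlite3

# SQLite is only a stand-in here: like MySQL with ONLY_FULL_GROUP_BY
# disabled, it accepts nonaggregated columns that are not in GROUP BY.
# The sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, address TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ann", "1 Oak St", 30),
        ("Ann", "2 Elm St", 45),
        ("Bob", "3 Ash St", 25),
    ],
)

# One group per (name, address) pair: the two Anns stay separate.
by_both = conn.execute(
    "SELECT name, address, MAX(age) FROM customers GROUP BY name, address"
).fetchall()

# One group per name: both Anns collapse into a single row, and the
# address reported for that row should not be relied on.
by_name = conn.execute(
    "SELECT name, address, MAX(age) FROM customers GROUP BY name"
).fetchall()

print(len(by_both), "rows when grouping by name, address")
print(len(by_name), "rows when grouping by name only")
```

The row counts differ as soon as one name has more than one address, which is why the two queries are not interchangeable.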
Your statement that "I know that every non-aggregated function must appear in group by if it appears in select" is, as far as I know, correct. I am not a SQL guru, but that's my understanding too. I would have expected an error if your statement did not meet that condition. However, if both queries give the same result, one possibility is that every row within a group has the same value in the ID field (or whichever field is missing from the group by list). Check with different values and see. Also, it may help to use "as" explicitly for aliases rather than blanks.
MySQL extends the use of GROUP BY so you can select nonaggregated columns, not named in the group by clause:
SELECT dept_name, ID, avg(salary)
FROM instructor
GROUP BY dept_name;
The previous query is perfectly valid in MySQL, while other DBMSs will raise an error because ID is not present in the group by clause.
However, if there is more than one ID for each dept_name, the value of ID returned by MySQL will be indeterminate.
You can configure MySQL to disable this extension by enabling the ONLY_FULL_GROUP_BY SQL mode.
MySQL's GROUP BY clause groups records even when they have different values.
However, I would like it to behave as DB2 SQL does, so that records are not grouped unless they contain exactly the same information.
Currently in MySQL for:
id Name
A Amanda
A Ana
GROUP BY id would return one record, chosen arbitrarily (unless aggregate functions are used, of course).
In DB2 SQL, however, the same GROUP BY id would not group them: it returns 2 records and never picks one of the values at random when grouping without aggregate functions.
First, id is a bad name for a column that is not the primary key of a table. But that is not relevant to your question.
This query:
select id, name
from t
group by id;
returns an error in almost any database other than MySQL. The problem is that name is not in the group by and is not the argument of an aggregation function. The failure is ANSI-standard behavior, not honored by MySQL.
A typical way to write the query is:
select id, max(name)
from t
group by id;
This should work in all databases (assuming name is not some obscure type where max() doesn't work).
Or, if you want each name, then:
select id, name
from t
group by id, name;
or the simpler:
select distinct id, name
from t;
In MySQL, you can get the ANSI standard behavior by setting ONLY_FULL_GROUP_BY for the database/session. MySQL will then return an error, as DB2 does in this case.
MySQL 5.7.5 and later have ONLY_FULL_GROUP_BY enabled by default.
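The three portable rewrites above can be tried on the two rows from the question. This is a minimal sketch using Python's built-in sqlite3 module as a convenient stand-in, not MySQL or DB2; the table name t comes from the answer, everything else is assumed.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("A", "Amanda"), ("A", "Ana")])

# One row per id, with a well-defined name chosen by an aggregate.
one_per_id = conn.execute(
    "SELECT id, MAX(name) FROM t GROUP BY id"
).fetchall()

# One row per distinct (id, name) pair, written with GROUP BY.
each_pair = conn.execute(
    "SELECT id, name FROM t GROUP BY id, name ORDER BY id, name"
).fetchall()

# The same result written with DISTINCT.
distinct_pairs = conn.execute(
    "SELECT DISTINCT id, name FROM t ORDER BY id, name"
).fetchall()

print(one_per_id)
print(each_pair)
print(distinct_pairs)
```

Each of these queries names every nonaggregated column in the GROUP BY (or uses DISTINCT), so the result does not depend on which row the engine happens to read first.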
GROUP BY in MySQL groups the records according to the listed fields. Think of it this way: each group yields one row, and the other rows in the group do not show up. It has uses, for example, counting how many times each id is repeated in the table:
select count(id), id from t group by id
To achieve your purpose, however, you can group by multiple fields, something along the lines of:
select id, name from t group by id, name
I do not think there is an automated way to do this, but using
GROUP BY id, name
would give you the solution you are looking for.
Suppose we want to find the max or min age for a person with a specific name.
We can do:
select name, min(age) from users group by name;
select name, max(age) from users group by name;
min and max are clearly documented with other aggregate functions.
Another way to (seemingly) accomplish the above is as follows:
select name, age from (select name, age from users order by age asc) sorted group by name;
select name, age from (select name, age from users order by age desc) sorted group by name;
Although this works, it relies on the guarantee that when building a result set, MySQL will take the content from the first record found, in the case that there are multiple records for the group by field.
I cannot find documentation that clearly states such a guarantee to be true. Is it?
Quoting from the official documentation:
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard
SQL use of GROUP BY permits the select list, HAVING condition, or
ORDER BY list to refer to nonaggregated columns even if the columns
are not functionally dependent on GROUP BY columns. This causes MySQL
to accept the preceding query. In this case, the server is free to
choose any value from each group, so unless they are the same, the
values chosen are indeterminate, which is probably not what you want.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Result set sorting occurs
after values have been chosen, and ORDER BY does not affect which
value within each group the server chooses. Disabling
ONLY_FULL_GROUP_BY is useful primarily when you know that, due to some
property of the data, all values in each nonaggregated column not
named in the GROUP BY are the same for each group.
So, adding an order by does not provide any guarantee that the first value from the group will be chosen.
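If the goal is the whole row that holds the minimum age, and not just the minimum itself, one approach that does not depend on row order is to join the table to its own per-name aggregate. The sketch below runs in Python's sqlite3 module as a stand-in for MySQL; the city column and the sample rows are hypothetical and exist only to show that the extra column comes from the right row.

```python
import sqlite3

# SQLite stand-in; the city column and all rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("Ann", 30, "Oslo"),
        ("Ann", 22, "Rome"),
        ("Bob", 41, "Lima"),
        ("Bob", 57, "Kyiv"),
    ],
)

# The subquery computes the minimum age per name; the join then picks
# the full row(s) matching that minimum. No ORDER BY trick is involved.
youngest = conn.execute(
    """
    SELECT u.name, u.age, u.city
    FROM users AS u
    JOIN (SELECT name, MIN(age) AS min_age
          FROM users
          GROUP BY name) AS m
      ON m.name = u.name AND m.min_age = u.age
    ORDER BY u.name
    """
).fetchall()

print(youngest)
```

If two rows tie for the minimum age under the same name, this query returns both of them, so a tie-breaking rule would still be needed in that case.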
Your first version is correct. The second version is patently incorrect and documented as such. Here is the example in the documentation:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c [sic . . . why doesn't the documentation use JOIN???]
WHERE o.custid = c.custid
GROUP BY o.custid;
. . .
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.
Could someone explain why the following query throws an error, if I am trying to get the names of all customers along with the total number of customers?
SELECT name, COUNT(*)
FROM CUSTOMER
I know that selecting columns along with an aggregate function requires a GROUP BY statement containing all the column names, but I don't understand the logical principle behind this.
edit:
http://sqlfiddle.com/#!2/90233/595
I guess 'error' isn't quite right, but notice how the current query returns Allison 9 as the only result.
I don't understand why it doesn't return:
Alison 9
Alison 9
Alison 9
Alison 9
Jason 9
...
(This is a new answer based on the comment and looking at the fiddle.)
The issue here is how MySQL handles aggregate queries, which is non-standard and different from most other databases.
Any SQL database lets you use an aggregate function (count() is one example) without a GROUP BY; the whole table is then treated as a single group. When you do have a GROUP BY, you name the range in it (for example, group by name). In standard SQL, every column in the select list then has to be either in the range or the result of an aggregate function.
SINCE you don't have a range, MySQL assumes the whole table, and since you have a column that is neither the result of an aggregate function nor in the range (in this case name), MySQL (with ONLY_FULL_GROUP_BY disabled) picks an arbitrary value for that column, as if it were wrapped in an aggregate function. So the SQL that effectively gets executed is
SELECT ANY_VALUE(name), COUNT(*)
FROM CUSTOMER
Thus you only see one name.
mysql documentation - http://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
After reading the above, I see that MySQL treats columns that are not in the range as if the aggregate function ANY_VALUE() had been applied to them.
If you just want the total number of customers on each row you could do this
SELECT DISTINCT NAME, COUNT(NAME) OVER () AS CustomerCount
FROM CUSTOMER
This way you don't need the GROUP BY syntax. Under the covers it is probably doing the same thing as @GordonLinoff's answer. (Window functions such as COUNT(...) OVER () require MySQL 8.0 or later.)
I added this because maybe it makes it clearer how group by works.
Select name, Count(*) as 'CountCustomers'
FROM CUSTOMER
Group by name
Order by name
Think of it as giving an instruction of which field to aggregate by. For example, if you had a field with the State of the Customer, you could group by State which would give a count of customers by state.
Also, note you can have multiple aggregate functions in the same select using the "over (partition by" construct.
If you want the names along with the total number of customers, then use a window function:
select name, count(*) as NumCustomersWithName,
sum(count(*)) over () as NumCustomers
from customer
group by name;
Edit:
You actually seem to want:
select name, count(*) over () as NumCustomers
from customer;
In MySQL versions before 8.0, which have no window functions, you would do this with a subquery:
select name, cnt
from customer cross join
(select count(*) as cnt from customer) x;
The reason your query doesn't work is because it is an aggregation query that returns exactly one row. When you use aggregation functions without a GROUP BY, then the query always returns exactly one row.
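Both points can be checked with a small sketch: an aggregate without GROUP BY yields a single row, while the cross join attaches the total to every customer row. It uses Python's sqlite3 module as a stand-in for MySQL, with invented sample names.

```python
import sqlite3

# SQLite stand-in with made-up rows; not the asker's actual data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?)",
    [("Alison",), ("Alison",), ("Jason",)],
)

# Aggregation without GROUP BY: the whole table is one group, so the
# result is exactly one row.
single = conn.execute("SELECT COUNT(*) FROM customer").fetchall()

# Cross join against a one-row subquery: every customer row keeps its
# name and gains the overall count.
per_row = conn.execute(
    """
    SELECT c.name, x.cnt
    FROM customer AS c
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM customer) AS x
    ORDER BY c.name
    """
).fetchall()

print(single)
print(per_row)
```

Because the subquery always returns one row, the cross join cannot multiply the customer rows; it only widens them.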
I'm reading a book on SQL (Sams Teach Yourself SQL in 10 Minutes) and it's quite good despite its title. However, the chapter on GROUP BY confuses me:
"Grouping data is a simple process. The selected columns (the column list following
the SELECT keyword in a query) are the columns that can be referenced in the GROUP
BY clause. If a column is not found in the SELECT statement, it cannot be used in the
GROUP BY clause. This is logical if you think about it—how can you group data on a
report if the data is not displayed? "
How come when I ran this statement in MySQL it works?
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
You're right, MySQL does allow you to create queries that are ambiguous and have arbitrary results. MySQL trusts you to know what you're doing, so it's your responsibility to avoid queries like that.
You can make MySQL enforce GROUP BY in a more standard way:
mysql> SET SQL_MODE=ONLY_FULL_GROUP_BY;
mysql> select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
ERROR 1055 (42000): 'test.EMPLOYEE_PAY_TBL.EMP_ID' isn't in GROUP BY
Because the book is wrong.
The columns in the group by have only one relationship to the columns in the select according to the ANSI standard. If a column is in the select, with no aggregation function, then it (or the expression it is in) needs to be in the group by statement. MySQL actually relaxes this condition.
This is even useful. For instance, if you want to select rows with the highest id for each group from a table, one way to write the query is:
select t.*
from table t
where t.id in (select max(id)
               from table t
               group by thegroup
              );
(Note: There are other ways to write such a query, this is just an example.)
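A runnable version of that pattern might look like the sketch below, where the subquery groups by a column that never appears in its own select list. It uses Python's sqlite3 module as a stand-in; the table name items, the label column, and the rows are all hypothetical replacements for the placeholder names in the query above.

```python
import sqlite3

# SQLite stand-in; "items", "label" and the rows are invented names
# and data standing in for the placeholder table in the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, thegroup TEXT, label TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "x", "first x"), (2, "x", "second x"), (3, "y", "only y")],
)

# The subquery groups by thegroup but selects only MAX(id), which is
# legal in standard SQL: grouping columns need not be selected.
latest = conn.execute(
    """
    SELECT i.*
    FROM items AS i
    WHERE i.id IN (SELECT MAX(id) FROM items GROUP BY thegroup)
    ORDER BY i.id
    """
).fetchall()

print(latest)
```

The outer query returns one full row per group, the one with the highest id, without ever selecting a nonaggregated column that is missing from a GROUP BY.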
EDIT:
The query that you are suggesting:
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
would work in MySQL but probably not in any other database (unless BONUS happens to be a poorly named primary key on the table, but that is another matter). It will produce one row for each value of BONUS. For each row, it will get an arbitrary EMP_ID and SALARY from rows in that group. The documentation actually says "indeterminate", but I think arbitrary is easier to understand.
What you should really know about this type of query is simply not to use it. All the "bare" columns in the SELECT (that is, with no aggregation functions) should be in the GROUP BY. This is required in most databases. Note that this is the inverse of what the book says. There is no problem doing:
select EMP_ID
from EMPLOYEE_PAY_TBL
group by EMP_ID, BONUS;
Except that you might get multiple rows back for the same EMP_ID with no way to distinguish among them.
I have 2 columns in my products table, name and brand. Here is the data:
NAME BRAND
'Ruby Axe Guitar', 'Guitar''s & Co'
'TV' , 'LG'
When I tried this query, it worked fine:
select name,brand, sum(1000) as sum,'Test' as name1
from products
group by name,brand
But I was surprised that the query still works fine even when I don't include brand in the group by clause:
select name,brand, sum(1000) as sum,'Test' as name1
from products
group by name
Can someone explain?
You cannot reliably select an ungrouped column without an aggregate function: MySQL will give you an arbitrary value. I guess you got lucky with this second query.
Because NAME is already unique in your data, GROUP BY NAME is the same as GROUP BY NAME, OTHER_FIELD.
If NAME is unique, then its combination with any other column is unique too.
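The sketch below illustrates this with the two rows from the question, and then shows how the equivalence breaks once a name repeats. It runs in Python's sqlite3 module as a stand-in (SQLite also tolerates the bare brand column); the extra TV row is an invented example, not part of the asker's data.

```python
import sqlite3

# SQLite stand-in loaded with the two rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, brand TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Ruby Axe Guitar", "Guitar's & Co"), ("TV", "LG")],
)

FULL = ("SELECT name, brand, SUM(1000) FROM products "
        "GROUP BY name, brand ORDER BY name, brand")
NAME_ONLY = ("SELECT name, brand, SUM(1000) FROM products "
             "GROUP BY name ORDER BY name")

# While name is unique, every group holds exactly one row, so both
# queries return the same rows.
full_before = conn.execute(FULL).fetchall()
name_only_before = conn.execute(NAME_ONLY).fetchall()

# Hypothetical extra row: a second TV from another brand.
conn.execute("INSERT INTO products VALUES ('TV', 'Sony')")

# Now grouping by name alone merges the two TVs into one row, and the
# brand shown for that row is no longer meaningful.
full_after = conn.execute(FULL).fetchall()
name_only_after = conn.execute(NAME_ONLY).fetchall()

print(len(full_after), len(name_only_after))
```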
MySQL is a lot less strict than it should be IMHO. According to the actual SQL specification, any non-grouped column needs an aggregate function in a query containing a GROUP BY clause.
MySQL will allow retrieving non-grouped columns without such aggregate functions, returning an arbitrary value. They have an explanation of this choice in their documentation:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
I believe it's an intentional leniency rather than an oversight. TYPICALLY, you would be required to include every non-aggregate column in the "GROUP BY" clause. MySQL, however, just picks a value for any column that is not part of the aggregate or the grouping; in practice that is often the first entry it encounters, but the documentation does not guarantee it.
This can be OK, for example when querying columns that you know won't vary no matter how many records fall in the corresponding group. Say you want a list of customers and their total orders. The orders table has a customer ID that joins to the customer table. So, you can do a SUM( Orders.Amount ), yet still get customer ID, Name, Address, Phone. Since the join is on the customer ID, the corresponding name, address, etc. will never change within a group and thus are not important to the group by. Just group by the customer ID.
So, MySQL won't choke on you if you inadvertently leave out a column...
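For comparison, here is the customer-totals example written so that it also runs on stricter databases: the columns that depend on the customer ID are simply listed in the GROUP BY, which does not change the grouping. This is a sketch in Python's sqlite3 module as a stand-in for MySQL, with hypothetical table layouts and invented rows.

```python
import sqlite3

# SQLite stand-in; table layouts and rows are assumptions made for
# illustration, loosely following the customers/orders description.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 10), (1, 15), (2, 7)]
)

# name is determined by id, so adding it to GROUP BY keeps one group
# per customer while satisfying the standard rule.
totals = conn.execute(
    """
    SELECT c.id, c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
    """
).fetchall()

print(totals)
```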