MySQL GROUP BY clause with multiple selected columns

I have 2 columns in my product table, name and brand. Here is the data:
NAME | BRAND
'Ruby Axe Guitar' | 'Guitar''s & Co'
'TV' | 'LG'
When I tried this query, it worked fine:
select name,brand, sum(1000) as sum,'Test' as name1
from products
group by name,brand
But I was surprised that the query still works even when I don't include brand in the GROUP BY clause:
select name,brand, sum(1000) as sum,'Test' as name1
from products
group by name
Can someone explain?

In standard SQL you cannot select an ungrouped column without an aggregate function. MySQL allows it, but gives you an indeterminate value from each group. I guess you were lucky with this second query.

Because NAME is already unique in your data, GROUP BY NAME is the same as GROUP BY NAME, OTHER_FIELD.
If NAME is unique, then its combination with any other column is unique too.
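A minimal sketch of why the two groupings agree, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere; the SQL is standard). It counts the groups each grouping produces for the two rows in the question. Neither query selects an ungrouped column, so the result does not rely on MySQL's extension.

```python
import sqlite3

# In-memory database standing in for MySQL; table and rows follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, brand TEXT)")
conn.executemany(
    "INSERT INTO products (name, brand) VALUES (?, ?)",
    [("Ruby Axe Guitar", "Guitar's & Co"), ("TV", "LG")],
)

# Grouping by name alone: one group per distinct name.
by_name = conn.execute(
    "SELECT name, COUNT(*) FROM products GROUP BY name ORDER BY name"
).fetchall()

# Grouping by (name, brand): one group per distinct combination.
by_name_brand = conn.execute(
    "SELECT name, brand, COUNT(*) FROM products GROUP BY name, brand ORDER BY name"
).fetchall()

# Because name is unique, both groupings yield the same number of groups.
print(len(by_name), len(by_name_brand))  # 2 2
```

If you inserted a second 'TV' row with a different brand, the counts would diverge (2 groups versus 3), and that is exactly the situation in which MySQL's choice of brand for GROUP BY name becomes indeterminate.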

MySQL is a lot less strict than it should be, IMHO. According to the SQL standard, any non-grouped column needs an aggregate function in a query containing a GROUP BY clause (SQL:1999 and later relax this for columns that are functionally dependent on the grouped columns).
MySQL will allow retrieving non-grouped columns without such aggregate functions, returning an arbitrary value. They have an explanation of this choice in their documentation:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.

I believe it's an intentionally tolerated oversight. Typically, you would be required to include every non-aggregated column in the GROUP BY clause. However, MySQL basically grabs the first value it encounters for a column that is not part of the aggregate (the documentation does not guarantee which value you get).
This can be OK when you query tables/columns whose values you know won't change across the records of a group. For example, say you want a list of customers and their total orders. The orders table has a customer ID that joins to the customer table. You can do a SUM(Orders.Amount) yet still get the customer ID, name, address, and phone. Since the join is on the customer ID, the corresponding name, address, etc. will never change within a group, so they don't matter for the GROUP BY. Just group by the customer ID.
So MySQL won't choke on you if you inadvertently leave out a column...
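A portable alternative to relying on the extension is to aggregate the orders first and join the per-customer totals back to the customer table. The sketch below uses sqlite3 as a stand-in for MySQL; the table names, columns, and rows are hypothetical. Every selected column is either grouped or aggregated, so it should also run under ONLY_FULL_GROUP_BY.

```python
import sqlite3

# Hypothetical schema and data for the customer/orders example in the text.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Alice', '555-0100'), (2, 'Bob', '555-0199');
    INSERT INTO orders (customer_id, amount) VALUES (1, 10.0), (1, 15.5), (2, 7.25);
    """
)

# Derived table t holds exactly one row per customer_id, so joining it to
# customers cannot multiply rows and no column is left ungrouped.
rows = conn.execute(
    """
    SELECT c.id, c.name, c.phone, t.total
    FROM customers AS c
    JOIN (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    ) AS t ON t.customer_id = c.id
    ORDER BY c.id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'Alice', '555-0100', 25.5)
# (2, 'Bob', '555-0199', 7.25)
```

The design choice here is to keep the GROUP BY small (only the join key) while still returning descriptive columns, which is the same benefit the answer describes, without depending on which row MySQL happens to pick.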

Related

Understanding correlation in mysql

I have a table with duplicate IDs representing a person who has placed an order. Each of these orders has a date. Each order has a status code from 1 to 4; 4 means a cancelled order. I am using the following query:
SELECT
personID, MAX(date), status
FROM
orders
WHERE
status = 4
GROUP BY
personID
The problem is, while this DOES return a unique record for each person with their most recent order date, it does NOT give me the correct status. In other words, I assumed that the status would be correctly correlated to the MAX(date), and it is not. It simply pulls, seemingly at random, one of the statuses from one of the orders. Can I add specificity to say, in basic terms, "give me the EXACT status from the same record as whatever the MAX(date) is"?
Unfortunately, there is no simple way to get what you want. Most other RDBMS vendors don't even consider queries using aggregate functions valid unless all non-aggregated result fields are in the GROUP BY. The general solution for these kinds of questions usually involves a subquery to get the "last" records, which is then joined to the original table to get those rows.
Depending on the structure of your data this may or may not be possible. For instance, if you have multiple rows with the same personID and date there is no way to determine from those alone which one's status should be used.
To get the result you want, you could use:
SELECT personId, date, status
FROM orders
WHERE (personID,date) IN (SELECT personID, MAX(date)
FROM orders
-- WHERE status = 4
GROUP BY personID);
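The subquery-then-join approach described above can be sketched as follows, using sqlite3 as a stand-in for MySQL and hypothetical rows. A derived table finds each person's latest date, and joining it back to orders returns the status stored on that same row. As noted, if a person has two orders on the same latest date, both rows come back.

```python
import sqlite3

# Hypothetical orders; dates are ISO-8601 text so MAX() compares them correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (personID INTEGER, date TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2015-01-01", 2),
        (1, "2015-03-01", 4),
        (2, "2015-02-01", 4),
        (2, "2015-02-15", 1),
    ],
)

# Step 1 (derived table m): the latest date per person.
# Step 2: join back to orders to fetch the status from that same row.
latest = conn.execute(
    """
    SELECT o.personID, o.date, o.status
    FROM orders AS o
    JOIN (
        SELECT personID, MAX(date) AS max_date
        FROM orders
        GROUP BY personID
    ) AS m ON m.personID = o.personID AND m.max_date = o.date
    ORDER BY o.personID
    """
).fetchall()

print(latest)  # [(1, '2015-03-01', 4), (2, '2015-02-15', 1)]
```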
As for:
It simply pulls, seemingly at random, one of the statuses from one of the orders.
It works as intended:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Related: Group by clause in MySQL and PostgreSQL, why the error in PostgreSQL?

Why isn't the GROUP BY clause in this MySQL statement throwing errors?

I am using the W3Schools example here:
SELECT Shippers.ShipperName,COUNT(Orders.OrderID) AS NumberOfOrders,
Orders.OrderDate
FROM Orders
LEFT JOIN Shippers
ON Orders.ShipperID=Shippers.ShipperID
GROUP BY ShipperName;
I am confused why the above statement isn't throwing an error. How does MySQL know which OrderDate to use when we are aggregating by all OrderID?
How does mysql know which OrderDate to use when we are aggregating by all OrderID?
It doesn't. It just picks one, because it assumes you would have grouped by all the necessary columns, and any columns that weren't in the GROUP BY and weren't subject to any aggregate functions would have the same values for each group.
It's non-standard behavior that works as an optimization, allowing the server to "leak" one of the values through from each group in the source rows into the result set, reducing the size of the data the GROUP BY has to manage. Which source row's value is used for each group is undefined, so this is intended to be used only in queries where the non-grouped columns are functionally dependent on the grouped columns. In that case, "which" row doesn't matter, because the values are all the same within each group.
MySQL extends the standard SQL use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the [queries excluding non-aggregated columns are] legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. [emphasis added]
https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html
You can disable this behavior by including ONLY_FULL_GROUP_BY in @@sql_mode (it is enabled by default as of MySQL 5.7.5).
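One way to make the W3Schools-style query well defined is to say which OrderDate you want through an aggregate. The sketch below runs against sqlite3 as a stand-in for MySQL; the rows are hypothetical, and MAX(OrderDate) (the most recent order per shipper) is just an illustrative choice. MIN would give the first order instead.

```python
import sqlite3

# Hypothetical Shippers/Orders rows shaped like the example in the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Shippers (ShipperID INTEGER PRIMARY KEY, ShipperName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, ShipperID INTEGER, OrderDate TEXT);
    INSERT INTO Shippers VALUES (1, 'Speedy Express'), (2, 'United Package'), (3, 'Federal Shipping');
    INSERT INTO Orders VALUES
        (10248, 3, '1996-07-04'),
        (10249, 1, '1996-07-05'),
        (10250, 2, '1996-07-08'),
        (10251, 1, '1996-07-08');
    """
)

# OrderDate is now aggregated, so every selected column is either grouped
# or aggregated and the result no longer depends on an arbitrary row.
rows = conn.execute(
    """
    SELECT Shippers.ShipperName,
           COUNT(Orders.OrderID) AS NumberOfOrders,
           MAX(Orders.OrderDate) AS LatestOrderDate
    FROM Orders
    LEFT JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID
    GROUP BY Shippers.ShipperName
    ORDER BY Shippers.ShipperName
    """
).fetchall()

for row in rows:
    print(row)
# ('Federal Shipping', 1, '1996-07-04')
# ('Speedy Express', 2, '1996-07-08')
# ('United Package', 1, '1996-07-08')
```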

Get AVG() of values from table with different names

I have a table :
CREATE TABLE data
(
value integer,
name varchar(100)
)
In my table, name can have duplicate values, each with a different value of value. I want to get each DISTINCT name and its avg() value from the table data.
I am able to get the DISTINCT values of name, but I am unable to get the avg() of their values.
With the following query I get the avg() of all the data:
select floor(avg(value)) from data
I know this is incorrect, but I am new to SQL. I want this select floor(avg(value)) for each distinct value of name.
Data :
insert into data values(10, 'mnciitbhu');
insert into data values(20, 'mnciitbhu');
insert into data values(40, 'mafiya69');
insert into data values(20, 'mafiya69');
insert into data values(0, 'mafiya69');
Output :
mnciitbhu 15
mafiya69 20
Adding this because the other answers, while accurate, are not detailed.
What you want to do here is use the grouping and aggregation features of SQL.
Grouping your results by particular fields divides your result set into discrete sections, which you can operate on with aggregate functions to get averages, sums, counts, etc. per group.
For a full list of aggregate functions, and other miscellaneous information about group by, you can read 12.16.1 GROUP BY (Aggregate) Functions.
In your instance, since you want the average per name, you will need to group by name. This would give the following query:
select name, avg(value)
from `data`
group by name; -- this is the important line
And this query will calculate the average of value, for each group of names in your table, returning one row per group.
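The grouped query can be checked against the question's own rows. This is a minimal sketch using sqlite3 as a stand-in for MySQL; FLOOR() is applied in Python with math.floor because not every SQLite build ships a FLOOR() function, whereas MySQL's floor(avg(value)) would do the same thing in SQL.

```python
import math
import sqlite3

# The question's table and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (value INTEGER, name VARCHAR(100))")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(10, "mnciitbhu"), (20, "mnciitbhu"), (40, "mafiya69"), (20, "mafiya69"), (0, "mafiya69")],
)

# One row per name; AVG is computed separately within each group.
rows = conn.execute(
    "SELECT name, AVG(value) FROM data GROUP BY name ORDER BY name DESC"
).fetchall()

# Floor each average, matching floor(avg(value)) from the question.
averages = {name: math.floor(avg) for name, avg in rows}
print(averages)  # {'mnciitbhu': 15, 'mafiya69': 20}
```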
One very important consideration when using GROUP BY is that every field in the select list must either appear in the GROUP BY clause or be used in an aggregate function. If you refer to a field that isn't covered by this rule, you may end up with undesired, indeterminate results.
From the manual, 12.16.3 MySQL Handling of GROUP BY:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
The importance of that paragraph cannot be overstated. It is very easy to misunderstand how this works and arrive at a query that seems to give the desired result but will occasionally give incorrect or undesired data.
Use this code:
select name,AVG(value) as Average from data
group by name
order by name desc
OUTPUT:
name Average
mnciitbhu 15
mafiya69 20
Try this:
select name,avg(value) from data group by name

Why is the following SQL query erroneous?

/* erroneous query */
select dept name, ID, avg (salary)
from instructor
group by dept name;
I know that every non-aggregated function must appear in group by if it appears in select. However, this query still runs in MySQL.
Should it be:
/* erroneous query */
select dept name, ID, avg (salary)
from instructor
group by dept name, ID;
I ask because I ran both queries and they give exactly the same answers!
MySQL will allow you to leave non-aggregated columns out of your GROUP BY, which is just a terrible idea to me. This can result in some very unpredictable results. See the MySQL documentation on its handling of GROUP BY.
It should be:
select `dept name`, ID, AVG(salary)
from instructor
group by `dept name`
It would be more instructive to see the columns defined in your table, but you CANNOT have spaces in a column name unless the name is quoted as an identifier: with backticks in MySQL, as I did above (square brackets do the same job in SQL Server).
From the MySQL documentation on this particular point:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to non-aggregated columns in the select list that are not named in the
GROUP BY clause. For example, this query is illegal in standard SQL
because the name column in the select list does not appear in the
GROUP BY ...
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group.
So, roughly speaking, the omitted columns get added automatically.
However, note that it is not exactly the same. Have a look at this example.
SELECT name, address, MAX(age) FROM customers GROUP BY name, address;
might give you something different from:
SELECT name, address, MAX(age) FROM customers GROUP BY name;
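With hypothetical rows where one name has two addresses, the difference in grouping granularity is easy to see. The sketch uses sqlite3 as a stand-in for MySQL and, for the coarser grouping, leaves address out of the select list so the result stays deterministic instead of relying on MySQL's arbitrary pick of address.

```python
import sqlite3

# Hypothetical customers: Ann has two addresses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, address TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ann", "Oak St", 30), ("Ann", "Elm St", 45), ("Ben", "Pine St", 25)],
)

# One group per (name, address) pair: Ann appears twice.
fine = conn.execute(
    "SELECT name, address, MAX(age) FROM customers "
    "GROUP BY name, address ORDER BY name, address"
).fetchall()

# One group per name: Ann's two rows collapse into one.
coarse = conn.execute(
    "SELECT name, MAX(age) FROM customers GROUP BY name ORDER BY name"
).fetchall()

print(fine)    # [('Ann', 'Elm St', 45), ('Ann', 'Oak St', 30), ('Ben', 'Pine St', 25)]
print(coarse)  # [('Ann', 45), ('Ben', 25)]
```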
Your statement that "I know that every non-aggregated function must appear in group by if it appears in select" is, in my view, correct. I am not an SQL guru, but that's my understanding too. I would have expected a syntax error to be flagged if your statement does not meet that condition. However, if both queries give the same result, one possibility is that the ID field (or whichever field is missing from the GROUP BY list) has the same value in all rows of each group. Just check with different values and see. Also, it may help to use AS explicitly for aliases rather than a space.
MySQL extends the use of GROUP BY so you can select nonaggregated columns, not named in the group by clause:
SELECT dept_name, ID, avg(salary)
FROM instructor
GROUP BY dept_name;
The previous query is perfectly valid in MySQL, while other DBMSs will raise an error because ID is not present in the GROUP BY clause.
However, if there is more than one ID for each dept_name, the value of ID returned by MySQL will be indeterminate.
You can configure MySQL to disable this extension by enabling the ONLY_FULL_GROUP_BY SQL mode.
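The two candidate queries answer different questions, which a small sketch makes concrete. It uses sqlite3 as a stand-in for MySQL and hypothetical instructor rows. Grouping by (dept_name, ID) averages each instructor's own salary when IDs are unique, while showing every ID next to its department's average needs the per-department result joined back to the table.

```python
import sqlite3

# Hypothetical instructors: two in Comp. Sci., one in Physics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID INTEGER, dept_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [(10101, "Comp. Sci.", 65000), (45565, "Comp. Sci.", 75000), (22222, "Physics", 95000)],
)

# GROUP BY dept_name, ID: with unique IDs each group holds one row,
# so the "average" is just that instructor's own salary.
per_instructor = conn.execute(
    "SELECT dept_name, ID, AVG(salary) FROM instructor "
    "GROUP BY dept_name, ID ORDER BY ID"
).fetchall()

# Department average next to every instructor: aggregate per department
# in a derived table, then join it back on dept_name.
with_dept_avg = conn.execute(
    """
    SELECT i.dept_name, i.ID, d.avg_salary
    FROM instructor AS i
    JOIN (
        SELECT dept_name, AVG(salary) AS avg_salary
        FROM instructor
        GROUP BY dept_name
    ) AS d ON d.dept_name = i.dept_name
    ORDER BY i.ID
    """
).fetchall()

print(per_instructor)
# [('Comp. Sci.', 10101, 65000.0), ('Physics', 22222, 95000.0), ('Comp. Sci.', 45565, 75000.0)]
print(with_dept_avg)
# [('Comp. Sci.', 10101, 70000.0), ('Physics', 22222, 95000.0), ('Comp. Sci.', 45565, 70000.0)]
```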

MySQL GROUP BY returns only first row

I have a table named forms with the following structure:
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
SomeGroup | SomeForm2 | SomePath2
------------------------------------
I use the following query:
SELECT * FROM forms GROUP BY 'GROUP'
It returns only the first row:
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
------------------------------------
Shouldn't it return both (or all of it)? Or am I (possibly) wrong?
As the manual states:
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
In your case, MySQL is correctly performing the grouping operation, but (since you select all columns including those by which you are not grouping the query) gives you an indeterminate one record from each group.
It only returns one row, because the values of your GROUP column are the same ... that's basically how GROUP BY works.
By the way, when using GROUP BY it's good form to use aggregate functions for the other columns, such as COUNT(), MIN(), or MAX(). In MySQL it usually returns the first row of each group if you just specify the column names, though this is not guaranteed; other databases will not accept that, though.
Your code:
SELECT * FROM forms GROUP BY 'GROUP'
isn't very "good" SQL. MySQL lets you get away with it and returns only one value per group for each column not mentioned in the GROUP BY clause. Almost any other database would refuse to run this query. As a rule, any column that is not part of the grouping condition must be used with an aggregate function.
As far as MySQL is concerned, I just solved my problem by trial and error.
I had the same problem 10 minutes ago. I was using a MySQL statement something like this:
SELECT * FROM forms GROUP BY 'ID'; -- returns only one row
Using a statement like the following yields the same result:
SELECT ID FROM forms GROUP BY 'ID'; -- returns only one row
The following was my solution:
SELECT ID FROM forms GROUP BY ID; -- returns more than one row (with one column, field "ID"), grouped by ID
or
SELECT * FROM forms GROUP BY ID; -- returns more than one row (with columns of all fields), grouped by ID
or
SELECT * FROM forms GROUP BY `ID`; -- returns more than one row (with columns of all fields), grouped by ID
Lesson: do not wrap the column name in single quotes. Quoted that way it is a string literal, so every row gets the same constant grouping value and you get a single group. Remove the quotes and MySQL groups by the column's actual value. You can, however, use backtick escapes, e.g. `ID`.
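The difference between a quoted string and a column reference can be sketched with the question's forms table, using sqlite3 as a stand-in for MySQL (like MySQL, it treats single-quoted text as a string literal and accepts backtick-quoted identifiers). Grouping by FORM rather than GROUP is an illustrative choice, since every sample row shares the same GROUP value.

```python
import sqlite3

# GROUP is a reserved word, so the columns are quoted as identifiers with backticks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (`GROUP` TEXT, `FORM` TEXT, `FILEPATH` TEXT)")
conn.executemany(
    "INSERT INTO forms VALUES (?, ?, ?)",
    [("SomeGroup", "SomeForm1", "SomePath1"), ("SomeGroup", "SomeForm2", "SomePath2")],
)

# 'FORM' in single quotes is a string literal: every row has the same
# constant grouping key, so there is exactly one group.
literal = conn.execute("SELECT COUNT(*) FROM forms GROUP BY 'FORM'").fetchall()

# `FORM` in backticks is the column: one group per distinct form.
column = conn.execute(
    "SELECT `FORM`, COUNT(*) FROM forms GROUP BY `FORM` ORDER BY `FORM`"
).fetchall()

print(literal)  # [(2,)]
print(column)   # [('SomeForm1', 1), ('SomeForm2', 1)]
```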
Thank you everyone for pointing out the obvious mistake I was too blind to see. I finally replaced GROUP BY with ORDER BY and included a WHERE clause to get my desired result. That is what I was intending to use all along. Silly me.
My final query is this:
SELECT * FROM forms WHERE `GROUP` = 'SomeGroup' ORDER BY `GROUP`
SELECT * FROM forms GROUP BY `GROUP`
It's strange that your query works at all.
The above result is kind of correct, but not quite.
All selected columns that are not part of the GROUP BY clause have to be aggregated by some function (see the list of aggregate functions in the MySQL documentation). Most often they are used together with numeric columns.
Besides this, your query will return one output row for every distinct value (or combination of values) in the columns referenced in the GROUP BY clause. In your case there is just one distinct value in the GROUP column, namely "SomeGroup", so the output will only contain one row for this value.
A GROUP BY clause should only be required if you apply group functions, such as MAX, MIN, AVG, or SUM, in the query's expressions. Your query does not use any such function, meaning you don't actually need a GROUP BY clause. If you use the clause anyway, you will receive only one record per group.
Hence the output of your query is as expected.
Query result is perfect; it will return only one row.