GROUP BY clause with non aggregate functions - mysql

Why does MySQL allow non-aggregate functions in a query with a GROUP BY clause?
For example, this query works fine:
SELECT col, CHAR_LENGTH(col) FROM table
GROUP BY col
Is it acceptable to use queries like this?

Sometimes it is quite acceptable. Your query, written in more standard SQL, would be something like:
SELECT col, CHAR_LENGTH(col)
FROM (SELECT col FROM table GROUP BY col) c
or as:
SELECT col, MAX(CHAR_LENGTH(col))
FROM table
GROUP BY col
Using non-aggregate functions, you can simplify the query a little, but the query becomes a little more difficult to read.
It could also be useful when you are sure that all non aggregated columns share the same value:
SELECT id, name, surname
FROM table
GROUP BY id
HAVING COUNT(*)=1
or when it doesn't matter which value you need to return:
SELECT id, name
FROM table
GROUP BY id
will return a single name associated with that id (probably the first name encountered, but we can't be sure which one is first, and ORDER BY doesn't help here). Be warned that if you want to select multiple non-aggregated columns:
SELECT id, name, surname
FROM table
GROUP BY id
we have no guarantees that the name and surname returned will belong to the same row.
I would prefer not to use this extension, unless you are 100% sure of why you are using it.

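MySQL has some "improvements" and tries to run and return results from queries that other systems reject. When a query selects a column that is neither named in GROUP BY nor aggregated, most RDBMSs raise an error, but MySQL (with ONLY_FULL_GROUP_BY disabled) will run it, group the result by col, and fill the other column with the value from an arbitrarily chosen row of each group. In your particular query, CHAR_LENGTH(col) depends only on the grouping column, so every row in a group yields the same value.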

If I'm guessing correctly about what you want to do, DISTINCT is a better choice:
SELECT DISTINCT col, CHAR_LENGTH(col) FROM table;
It indicates more clearly to readers what you're trying to accomplish.
Here is a SQLFiddle.
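The equivalence between the GROUP BY form and the DISTINCT form can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made only so the example runs without a server; SQLite's LENGTH() plays the role of CHAR_LENGTH(), and the table name tbl and its rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?)", [("ab",), ("ab",), ("xyz",)])

# DISTINCT form: one row per distinct (col, length) pair.
distinct_rows = conn.execute(
    "SELECT DISTINCT col, LENGTH(col) FROM tbl ORDER BY col"
).fetchall()

# GROUP BY form: LENGTH(col) depends only on the grouping column,
# so each group has exactly one possible value for it.
grouped_rows = conn.execute(
    "SELECT col, LENGTH(col) FROM tbl GROUP BY col ORDER BY col"
).fetchall()

print(distinct_rows)
print(grouped_rows)
```

Both queries return one row per distinct value of col, because the second expression is a function of the grouping column alone.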

Related

Mysql DISTINCT with more than one column (remove duplicates)

My table is called training_session.
I am trying to print out some information from my data, but I do not want any duplicates. I still get them somehow; can someone tell me what I am doing wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if I use only one column, but when I add another, it does not work.
I think you misunderstood the use of DISTINCT.
There is a big difference between using DISTINCT and GROUP BY.
They look similar, but they serve different purposes.
You use DISTINCT if you want to show a set of columns without repeating any row. That means you don't care about calculations or aggregate functions. DISTINCT will show different results as you keep adding more columns to your SELECT (if the table has many columns).
You use GROUP BY if you want one row per distinct value of certain selected columns and you use group functions to calculate data related to them. In short, use GROUP BY when you want to use group functions.
Please check group functions you can use in this link.
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
EDIT 1:
It seems you are trying to get the "latest" duration for each athlete. I'll assume that scenario, with no ID column to rely on.
Here is my alternate solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This is a scalar subquery in the SELECT list (sometimes called an in-select subquery). With the help of LIMIT 1, it returns only the topmost record. Such a subquery must return at most one row, or else MySQL raises an error.
MySQL's DISTINCT clause is used to filter out duplicate rows.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration), DISTINCT applies to the combination of both columns, so every distinct (athlete_id, duration) pair is returned; hence the results you're getting. In other words, the query is working correctly.
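To see that DISTINCT works on the whole row rather than on the first column, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, chosen only so it runs without a server), and the sample rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO training_session VALUES (?, ?)",
    [(1, 30), (1, 30), (1, 45), (2, 30)],
)

# DISTINCT on one column: one row per athlete.
one_col = conn.execute(
    "SELECT DISTINCT athlete_id FROM training_session ORDER BY athlete_id"
).fetchall()

# DISTINCT on two columns: one row per (athlete_id, duration) combination,
# so athlete 1 appears twice because it has two different durations.
two_cols = conn.execute(
    "SELECT DISTINCT athlete_id, duration FROM training_session "
    "ORDER BY athlete_id, duration"
).fetchall()

print(one_col)
print(two_cols)
```

Only the exact duplicate row (1, 30) is removed in the two-column case.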

How to use subselect in MySQL to get all unique names and their count?

I'm trying to write an SQL query that returns all the unique names and the number of occurrences of each name.
This is what I came up with, but it merely gets the count of all names and not the count for each name separately.
select distinct(etunimi) as etunimi,
(select count(distinct(etunimi)) as määrä from jasenet)
from jasenet;
Is this the right way to solve this problem, or is there another way of achieving this? Thank you.
If you group by a column then aggregate functions like count() apply to each group and not the complete result set.
select etunimi, count(*)
from jasenet
group by etunimi
That is because the subquery does not reference the column from the outer query.
So it should be correlated like this:
select distinct etunimi,
(select count(*)
from jasenet j1
where j1.etunimi = j.etunimi
) as määrä
from jasenet j;
However, I would also suggest using a GROUP BY clause, which is more efficient than a correlated subquery.
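The two approaches above can be compared side by side. This is a sketch only: Python's built-in sqlite3 module stands in for MySQL (an assumption), and the names in the table are invented.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jasenet (etunimi TEXT)")
conn.executemany(
    "INSERT INTO jasenet VALUES (?)",
    [("Anna",), ("Matti",), ("Anna",), ("Anna",)],
)

# GROUP BY: COUNT(*) is evaluated once per group of equal names.
grouped = conn.execute(
    "SELECT etunimi, COUNT(*) FROM jasenet GROUP BY etunimi ORDER BY etunimi"
).fetchall()

# Correlated subquery: the inner count is tied to the outer row's name.
correlated = conn.execute(
    """
    SELECT DISTINCT etunimi,
           (SELECT COUNT(*) FROM jasenet j1 WHERE j1.etunimi = j.etunimi)
    FROM jasenet j
    ORDER BY etunimi
    """
).fetchall()

print(grouped)
print(correlated)
```

Both return the same rows; the GROUP BY version states the intent more directly.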

GROUP BY clause in MySQL groups records with different values

MySQL's GROUP BY clause groups records even when they have different values.
However, I would like it to behave as in DB2 SQL, so that records that do not contain exactly the same information are not grouped.
Currently in MySQL, for:
id Name
A Amanda
A Ana
GROUP BY id would return one record, chosen arbitrarily (unless aggregate functions are used, of course).
In DB2 SQL, however, the same GROUP BY id would not group those rows: it returns 2 records and never picks one of the values arbitrarily when grouping without aggregate functions.
First, id is a bad name for a column that is not the primary key of a table. But that is not relevant to your question.
This query:
select id, name
from t
group by id;
returns an error in almost any database other than MySQL. The problem is that name is not in the group by and is not the argument of an aggregation function. The failure is ANSI-standard behavior, not honored by MySQL.
A typical way to write the query is:
select id, max(name)
from t
group by id;
This should work in all databases (assuming name is not some obscure type where max() doesn't work).
Or, if you want each name, then:
select id, name
from t
group by id, name;
or the simpler:
select distinct id, name
from t;
In MySQL, you can get the ANSI standard behavior by setting ONLY_FULL_GROUP_BY for the database/session. MySQL will then return an error, as DB2 does in this case.
Recent versions of MySQL (5.7.5 and later) have ONLY_FULL_GROUP_BY set by default.
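The portable rewrites in this answer can be tried on the two sample rows from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption); the three queries themselves are the ones given above.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); rows are from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("A", "Amanda"), ("A", "Ana")])

# One row per id, with a well-defined name chosen by MAX().
one_per_id = conn.execute(
    "SELECT id, MAX(name) FROM t GROUP BY id"
).fetchall()

# One row per distinct (id, name) pair, written two equivalent ways.
by_both = conn.execute(
    "SELECT id, name FROM t GROUP BY id, name ORDER BY name"
).fetchall()
distinct_both = conn.execute(
    "SELECT DISTINCT id, name FROM t ORDER BY name"
).fetchall()

print(one_per_id)
print(by_both)
print(distinct_both)
```

The first query collapses the two rows into one; the other two keep both names, which is the behavior the question asks for.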
GROUP BY in MySQL groups the records according to the listed fields. Think of it this way: one row is kept for each group and the others do not show up. It has uses, for example, to count how many times an ID is repeated in the table:
select count(id), id from table group by id
To achieve your purpose, however, you can group by multiple fields, something along the lines of:
select * from table group by id, name
I do not think there is an automatic way to do this, but using
GROUP BY id, name
would give you the solution you are looking for.

using distinct with all attributes

We can use * to select all attributes from a table. I am using DISTINCT and my table contains 16 columns. How can I use DISTINCT with it? I cannot do select distinct Id,* from abc;
What would be the best way?
Another way could be select distinct id,col1,col2 etc.
If you want in the results, one row per id, you can use GROUP BY id. But then, it's not advisable to use the other columns in the SELECT list (even if MySQL allows it - that depends on whether you have ANSI setting On or Off). It's advisable to use the other columns with aggregate functions like MIN(), MAX(), COUNT(), etc. In MySQL, there is also a GROUP_CONCAT() aggregate function that will collect all values from a column for a group:
SELECT
id
, COUNT(*) AS number_of_rows_with_same_id
, MIN(col1) AS min_col1
, MAX(col1) AS max_col1
--
, GROUP_CONCAT(col1) AS gc_col1
, GROUP_CONCAT(col2) AS gc_col2
--
, GROUP_CONCAT(col16) AS gc_col16
FROM
abc
GROUP BY
id ;
The query:
SELECT *
FROM abc
GROUP BY id ;
is not valid in SQL up to SQL-92, because it has non-aggregated columns in the SELECT list. SQL:2003 and later allow such a query, but only when the other columns are functionally dependent on the grouping column; here they are not dependent on id, so the query is still invalid. MySQL (before 5.7.5, or with ONLY_FULL_GROUP_BY disabled) unfortunately allows such queries and does no checking of functional dependency.
So, you never know which row (of the many with the same id) will be returned, or even whether (horror!) you get results from different rows with the same id. As @Andriy comments, the consequence is that values for columns other than id will be chosen arbitrarily. If you want predictable results, just don't use such a technique.
An example solution: If you want just one row from every id, and you have a datetime or timestamp (or some other) column that you can use for ordering, you can do this:
SELECT t.*
FROM abc AS t
JOIN
( SELECT id
, MIN(some_column) AS m -- or MAX()
FROM abc
GROUP BY id
) AS g
ON g.id = t.id
AND g.m = t.some_column ;
This will work as long as the (id, some_column) combination is unique.
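The join above can be exercised on a small table. This is a sketch under assumptions: Python's built-in sqlite3 module stands in for MySQL, and the payload column and all rows are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); payload and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (id INTEGER, some_column TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO abc VALUES (?, ?, ?)",
    [
        (1, "2020-01-02", "b"),
        (1, "2020-01-01", "a"),
        (2, "2020-03-01", "c"),
    ],
)

# For each id, keep the whole row that has the smallest some_column.
rows = conn.execute(
    """
    SELECT t.*
    FROM abc AS t
    JOIN (SELECT id, MIN(some_column) AS m
          FROM abc
          GROUP BY id) AS g
      ON g.id = t.id
     AND g.m = t.some_column
    ORDER BY t.id
    """
).fetchall()

print(rows)
```

Every column in the result comes from the same source row, which is what the bare GROUP BY id form cannot guarantee.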
Use GROUP BY instead of DISTINCT:
group by col1, col2, col3
It behaves like DISTINCT.
SELECT DISTINCT * FROM `some_table`
Is absolutely valid syntax.
The error is caused by the fact that you call Id, *. Well * includes the Id column too, which usually is unique anyway.
So what you'll need in your case is just:
SELECT DISTINCT * FROM `abc`
SELECT * FROM abc where id in(select distinct id from abc);
You can totally do this.
Hope this helps.
Initially I thought this would work and that GROUP BY was the best option, but this is the same as doing select * from abc. Sorry, guys.

MySQL GROUP BY returns only first row

I have a table named forms with the following structure-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
SomeGroup | SomeForm2 | SomePath2
------------------------------------
I use the following query-
SELECT * FROM forms GROUP BY 'GROUP'
It returns only the first row-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
------------------------------------
Shouldn't it return both (or all of it)? Or am I (possibly) wrong?
As the manual states:
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
In your case, MySQL is correctly performing the grouping operation, but (since you select all columns, including those by which you are not grouping) it gives you one indeterminate record from each group.
It only returns one row, because the values of your GROUP column are the same ... that's basically how GROUP BY works.
Btw, when using GROUP BY it's good form to use aggregate functions for the other columns, such as COUNT(), MIN(), MAX(). In MySQL it usually returns the first row of each group if you just specify the column names; other databases will not like that though.
Your code:
SELECT * FROM forms GROUP BY 'GROUP'
isn't very "good" SQL. MySQL lets you get away with it and returns only one value for each column not mentioned in the GROUP BY clause. Almost any other database would refuse to run this query. As a rule, any column that is not part of the grouping condition must be used with an aggregate function.
As far as MySQL is concerned, I just solved my problem by trial and error.
I had the same problem 10 minutes ago. I was using a MySQL statement something like this:
SELECT * FROM forms GROUP BY 'ID'; -- returns only one row
Using a statement like the following would yield the same result:
SELECT ID FROM forms GROUP BY 'ID'; -- returns only one row
The following was my solution:
SELECT ID FROM forms GROUP BY ID; -- returns more than one row (with one column, "ID") grouped by ID
or
SELECT * FROM forms GROUP BY ID; -- returns more than one row (with all columns) grouped by ID
or
SELECT * FROM forms GROUP BY `ID`; -- returns more than one row (with all columns) grouped by ID
Lesson: do not put single quotes around the column name. A quoted name is treated as a string literal, so every row falls into the same group. Remove the quotes and the query will group by the column's value. You can, however, use backtick escapes, e.g. `ID`.
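The effect of quoting the column name can be reproduced in a few lines. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; SQLite also treats a single-quoted name in GROUP BY as a constant string), and the rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (id INTEGER, form TEXT)")
conn.executemany(
    "INSERT INTO forms VALUES (?, ?)",
    [(1, "SomeForm1"), (2, "SomeForm2"), (2, "SomeForm3")],
)

# 'id' in single quotes is a string constant: every row gets the same
# grouping value, so the whole table collapses into a single group.
by_literal = conn.execute(
    "SELECT COUNT(*) FROM forms GROUP BY 'id'"
).fetchall()

# Unquoted id is the column: one group per distinct id value.
by_column = conn.execute(
    "SELECT id, COUNT(*) FROM forms GROUP BY id ORDER BY id"
).fetchall()

print(by_literal)
print(by_column)
```

The first query yields a single group containing all three rows; the second yields one group per id.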
Thank you everyone for pointing out the obvious mistake I was too blind to see. I finally replaced GROUP BY with ORDER BY and included a WHERE clause to get my desired result. That is what I was intending to use all along. Silly me.
My final query becomes this-
SELECT * FROM forms WHERE `GROUP`='SomeGroup' ORDER BY `GROUP`
SELECT * FROM forms GROUP BY `GROUP`
it's strange that your query does work
The above result is kind of correct, but not quite.
All columns you select that are not part of the GROUP BY clause have to be aggregated by some function (see the list of aggregate functions in the MySQL documentation). Most often these functions are used together with numeric columns.
Besides this, your query will return one output row for every (combination of) attributes in the columns referenced in the GROUP BY statement. In your case there is just one distinct value in the GROUP column, namely "SomeGroup", so the output will only contain one row for this value.
A GROUP BY clause should only be required if you apply group functions, say max, min, avg, sum, etc., in the query expressions. Your query does not use any such functions, which means you do not actually need a GROUP BY clause. If you still use one, you will receive only one record from each group (in MySQL, usually the first).
Hence output on your query is perfect.
Query result is perfect; it will return only one row.