MySQL: newbie using SELECT and MAX

I have a table called child like this:
+---------+-----+
| name    | age |
+---------+-----+
| Alfredo | 5   |
| Maria   | 6   |
+---------+-----+
When I run SELECT `name` FROM `child` I get both rows. No problem. It is what I expected.
But if I run SELECT `name`, MAX(`age`) FROM `child` I get:
+---------+------------+
| name    | MAX(`age`) |
+---------+------------+
| Alfredo | 6          |
+---------+------------+
This result is strange to me. I expected both rows like before, so why is it outputting just one row? Why is Alfredo returned when Maria is the one who is 6 years old? Where can I find documentation about this behaviour?

You need to use GROUP BY to get more than one row. Otherwise the aggregate function MAX() is applied to all rows as a single group. Notice that Alfredo's age is actually 5. With GROUP BY name, the name is the group.
MySQL is kind of special here, since it doesn't follow ANSI-standard SQL. Usually an error is thrown when a column in the SELECT clause is neither listed in the GROUP BY clause nor wrapped in an aggregate function. MySQL allows this (this will be changed in future versions, btw) and displays the value from an arbitrary row of the group. So don't do this.
To get two rows in your example, you'd have to do
SELECT name, MAX(age) FROM your_table GROUP BY name;
Each name is a "group". If you had another Alfredo with age 25 in your table, the result would be Alfredo - 25 and Maria - 6.
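If you want to try the grouping out, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only to keep the example self-contained; the GROUP BY query itself is plain standard SQL). The second Alfredo aged 25 is the hypothetical extra row mentioned above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL table `child`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE child (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO child (name, age) VALUES (?, ?)",
    # The third row is the hypothetical "another Alfredo with age 25".
    [("Alfredo", 5), ("Maria", 6), ("Alfredo", 25)],
)

# One output row per distinct name, each with that name's maximum age.
rows = conn.execute(
    "SELECT name, MAX(age) FROM child GROUP BY name ORDER BY name"
).fetchall()
print(rows)  # [('Alfredo', 25), ('Maria', 6)]
```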
It gets more complicated than this when you want to get the row which belongs to the group-wise maximum. Here are some examples of how to solve this.
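For the simplest case (the row that holds the overall maximum, with no groups), one common approach is a subquery on MAX(age). This is only a sketch of that one technique, again run through Python's sqlite3 module as an assumed stand-in for MySQL; the query is standard SQL. Note that it returns every row that ties for the maximum.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL table `child`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE child (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO child (name, age) VALUES (?, ?)",
    [("Alfredo", 5), ("Maria", 6)],
)

# Find the maximum age first, then select the row(s) that have that age.
oldest = conn.execute(
    "SELECT name, age FROM child "
    "WHERE age = (SELECT MAX(age) FROM child) "
    "ORDER BY name"
).fetchall()
print(oldest)  # [('Maria', 6)]
```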
More info to read.
To be on the safe side, you can disable this behaviour by enabling the ONLY_FULL_GROUP_BY sql_mode. Ask your administrator if you don't have the rights to do so.

Use of SQL aggregate functions should be accompanied by a GROUP BY clause. Here's a good place to start: https://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html

You should use SQL aggregate functions like AVG, MAX, etc. only together with GROUP BY. Otherwise you will get undefined behaviour like this.
Here, if you write only MAX(age), everything looks good and you get 6. But now you also ask it to print the name (with no condition, i.e. asking for all names while MAX produces only one row), so it tries to do something intelligent, and in your case that means printing the name from the first row.

MAX() is an aggregate function to be used with GROUP BY. When the GROUP BY clause is missing, any RDBMS will produce a single group from all the selected rows and it will return a single row.
When grouping is involved, the expressions that appear in the SELECT clause are evaluated independently. There is no relationship between name and MAX(age). MAX(age) is the maximum value of column age from the rows filtered by the WHERE clause (all the rows in your case).
The standard SQL language does not allow SELECTing columns that are not dependent on the GROUP BY columns or used in aggregate functions.
MySQL allowed this before version 5.7.5. Starting with version 5.7.5 it adheres to the standard by default and rejects such queries with an error. The old behaviour can still be enabled through configuration.
As explained in the documentation, for SELECT columns that are neither dependent on the GROUP BY columns nor used in aggregate functions, "the server is free to choose any value from each group". This is undefined behaviour.
Back to your query:
SELECT `name`, MAX(`age`) FROM `child`
It has no WHERE clause, so all the rows are included. Then, because of MAX(age) (which is an aggregate function), MySQL creates a single group that contains all the filtered rows (here, all the rows) and evaluates each of the expressions in the SELECT clause.
MAX(age) is very clear: it evaluates to the maximum value found in the column age of the rows in the group. That is 6 and nothing more. No reference to the row it came from is kept.
Selecting a name is affected by the undefined behaviour exposed above. The server will select any value and, this time, it seems it preferred to pick the value from the first row. It could be different on another server. It could be different on the same server after you add, remove or update a row on that table. It just cannot be predicted.
Why this behaviour?
Why doesn't the server get the value from the same row where it got the value of MAX(age)? Is it that difficult to accomplish? -- This is how a lot of beginners think when they start working with SQL.
The short answer is: because there is no such row.
Let's say SQL should select name from the same row it selected MAX(`age`) from.
Let's put more aggregate functions in the query:
SELECT `name`, MAX(`age`), MIN(`age`), AVG(`age`), COUNT(*) FROM `child`
If the above assumption were correct, SQL should get name from the row that contains MAX(age) (row #2). What if there are two rows containing that value?
But at the same time it should get name from the row that contains MIN(age) (ahem, this is row #1).
Or it should get it from the row where it finds AVG(age) (which is 5.5; oops, there is no such row).
What about the row that contains COUNT(*) in column... errr... in what column should it check for COUNT(*)? Btw, COUNT(*) is not an age or a name, it is just a number. It doesn't make any sense to compare it with values you store in the table.
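The argument above can be checked with a few lines of plain Python (no database involved; this is just an illustration of the reasoning, using the two rows from the question). Each aggregate points at a different row, or at no row at all.

```python
# The two rows of the `child` table from the question.
rows = [("Alfredo", 5), ("Maria", 6)]
ages = [age for _, age in rows]

# The same aggregates as in the query above, computed over the whole table.
aggregates = {
    "MAX": max(ages),
    "MIN": min(ages),
    "AVG": sum(ages) / len(ages),
    "COUNT": len(ages),
}

# For each aggregate, which names sit in a row whose age equals that value?
matching = {
    label: [name for name, age in rows if age == value]
    for label, value in aggregates.items()
}
print(matching)
```

MAX matches only Maria, MIN matches only Alfredo, and AVG (5.5) matches nobody. COUNT is compared against age here only to show that the comparison is meaningless.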

Related

MySQL returns only first line when querying for MAX() and all names

I have the following table:
students:
name: semester:
AAAA 1
BBBB 2
CCCC 3
DDDD 4
Where students is the name of the table and name & semester are the columns.
I want to select all the names and the max number of semesters.
Therefore I use the following SQL-Statement:
SELECT MAX(semester), name FROM students;
As result I get the correct maximum number of semesters, but only the first name.
Why does it only return one name and why the first of the table (and not the one with max semesters)?
I'm not interested in how to fix this but in why it behaves like this.
I'm using MariaDB 10.4.11.
You must use GROUP BY to group by each name. Otherwise it will aggregate the whole table, which of course only returns one row.
SELECT MAX(semester), name
FROM students
GROUP BY name;
It behaves like this because you are using an older version of MySQL that allows malformed queries.
Why is the query malformed? You have an aggregation query with no GROUP BY, so it returns one row. But, name is not aggregated -- and presumably you have multiple names in the database.
MySQL used to support this syntax wholeheartedly, but it returned an arbitrary value of name. Now it requires that you set a configuration parameter in order to use it.

Not selecting duplicates in join / where query

I've been trying to learn MySQL, and I'm having some trouble creating a join query to not select duplicates.
Basically, here's where I'm at :
SELECT atable.phonenumber, btable.date
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
However, in my database, there is the possibility of having duplicate rows in column atable.phonenumber.
For example (added asterisks for clarity)
phonenumber | date
-------------|-----------
*555-681-2105 | 2015-08-12
555-425-5161 | 2015-08-15
331-484-7784 | 2015-08-17
*555-681-2105 | 2015-08-25
.. and so on.
I tried using SELECT DISTINCT but that doesn't work. I also was looking through other solutions which recommended GROUP BY, but that threw an error, most likely because of my WHERE clause and condition. Not really sure how I can easily accomplish this.
DISTINCT applies to the whole row being returned, essentially saying "I want only unique rows". Any column value may participate in making the row unique.
You are getting phone numbers duplicated because you're only looking at that column in isolation. The database is looking at the phone number and also the date. The rows you posted have different dates, and these cause the rows to be different.
I suggest you do as the commenter recommended and decide what you want to do with the dates. If you want the latest date for a phone number, do this:
SELECT atable.phonenumber, max(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
When you write a query that uses grouping, you will get a set of rows where there is only one row per combination of values in the GROUP BY list. In this case, only unique phone numbers. But because you want other values as well (i.e. the date), you MUST use what's called an aggregate function to specify what you want to do with all the various values that aren't part of the unique set. Sometimes it will be MAX or MIN, sometimes it will be SUM, COUNT, AVG and so on.
If you're familiar with hash tables or dictionaries from elsewhere in programming, this is what a GROUP BY is: it maps a set of values (a key) to a list of rows that have those key values, and then the aggregate function is applied to the values in the list associated with each key.
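To make the dictionary analogy concrete, here is a rough sketch in Python. It only mimics what GROUP BY phonenumber with MAX(date) does conceptually, using the four sample rows from the question; it is not how the database actually implements grouping.

```python
from collections import defaultdict

# (phonenumber, date) pairs as they come out of the join in the question.
rows = [
    ("555-681-2105", "2015-08-12"),
    ("555-425-5161", "2015-08-15"),
    ("331-484-7784", "2015-08-17"),
    ("555-681-2105", "2015-08-25"),
]

# GROUP BY phonenumber: map each key to the list of dates that share it.
groups = defaultdict(list)
for phone, date in rows:
    groups[phone].append(date)

# MAX(date): apply the aggregate to each key's list.
# ISO-formatted date strings compare correctly as plain strings.
latest = {phone: max(dates) for phone, dates in groups.items()}
print(latest)
```

The duplicated number collapses into a single key, and its two dates are reduced to the later one.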
The simple rule when using group by (and one that MySQL will do implicitly for you) is to write queries thus:
SELECT
List,
of,
columns,
you,
want,
in,
unique,
combination,
FN(List),
FN(of),
FN(columns),
FN(you),
FN(want),
FN(aggregating)
FROM table
GROUP BY
List,
of,
columns,
you,
want,
in,
unique,
combination
I.e. you can copy and paste from your select list to your group list. MySQL does this implicitly for you if you don't do it (i.e. if you use one or more aggregate functions like max in your select list, but forget or omit the group by clause, it will take everything that isn't in an aggregate function and run the grouping as if you'd written it). Whether group by is hence largely redundant is often debated, but there do exist other things you can do with a group by, such as rollup, cube and grouping sets. Also you can group on a column, if that column is used in a deterministic function, without having to group on the result of the deterministic function. Whether there is any point to doing so is a debate for another time :)
You should add GROUP BY, and an aggregate to the date field, something like this:
SELECT atable.phonenumber, MAX(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
This will return the maximum date, that is, the latest date.

How Are Other Columns MySQL Aggregate with Other Columns

If I have a table like so:
Name Type Val
Mike A 1
John A 4
Jerry 6
(Notice that Jerry has an empty string for Type)
And I do a query like
Select sum(Val), Type from table
How does MySQL choose which Type to put in the one row result? If I wanted to return the "non blanked"
To give some context, the Type for every row in this table should actually be the same, but there used to be a bug where there are now some values that are blank. (Note that the sum should still include the blanks)
I know I can do this in two queries, and just select Type from table where Type!="" but I was curious if there was a trick to do it in that single query.
This is valid SQL in MySQL. You are aggregating all rows to one row (by using the aggregate function SUM and having no GROUP BY clause).
Any value that is neither in a GROUP BY clause nor being aggregated is a random one. The type you get is just one of those types in the table; it could even be another one when you execute the same query again.
You can use max(type) or min(type) for instance to get the value you are after.
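As a quick check of the MAX(Type) suggestion, here is a small sketch that runs the query through Python's sqlite3 module (a stand-in for MySQL, used only so the example is self-contained). The table name vals is hypothetical, since the question just calls it "table". It relies on the empty string sorting before any non-empty string, so MAX picks the non-blank value.

```python
import sqlite3

# In-memory SQLite database; `vals` is a hypothetical name for the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (Name TEXT, Type TEXT, Val INTEGER)")
conn.executemany(
    "INSERT INTO vals (Name, Type, Val) VALUES (?, ?, ?)",
    # Jerry has an empty string for Type, as in the question.
    [("Mike", "A", 1), ("John", "A", 4), ("Jerry", "", 6)],
)

# SUM covers all rows, blanks included; MAX(Type) skips past the blank.
result = conn.execute("SELECT SUM(Val), MAX(Type) FROM vals").fetchone()
print(result)  # (11, 'A')
```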

Normal columns followed with aggregate functions but not in group by

I knew that normal columns followed by aggregate functions are allowed only if a Group By including them follows.
But then why is the following working?
mysql> select payee,sum(amount) from checks;
+---------+-------------+
| payee | sum(amount) |
+---------+-------------+
| Ma Bell | 893.76 |
+---------+-------------+
1 row in set (0.00 sec)
This behavior is an "extension" in MySQL:
MySQL extends the use of GROUP BY so that the select list can refer
to nonaggregated columns not named in the GROUP BY clause.
However, this behavior is actually a configurable setting in MySQL:
ONLY_FULL_GROUP_BY
Do not permit queries for which the select list or (as of MySQL
5.1.10) HAVING list refers to nonaggregated columns that are not named in the GROUP BY clause.
It is best to respect the group by and add all non-aggregated columns, especially if there's the possibility that you might someday migrate to a server that has ONLY_FULL_GROUP_BY turned on.
EDIT Manual reference which does a better job explaining than I do and note the details: http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html
This is intended to work fine. Without a GROUP BY clause, the aggregate is applied to all rows returned by the query. The reason you see little on this, I think, is that this is rarely what you are actually trying to accomplish. You frequently drop the GROUP BY when you have a WHERE clause which you know would only return the rows you were planning to group by anyway, i.e. if your query is:
select payee,sum(amount) from checks where payee = 'Ma Bell'
The group by in the following is technically redundant:
select payee,sum(amount) from checks where payee = 'Ma Bell' group by payee
Personally - I typically include the GROUP BY clause as I THINK it is more consistently supported cross platform... not 100% sure of that though.
Again, in your query above I would again ask - even though it technically works, are you getting the result you are after without a where clause?

MySQL GROUP BY returns only first row

I have a table named forms with the following structure-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
SomeGroup | SomeForm2 | SomePath2
------------------------------------
I use the following query-
SELECT * FROM forms GROUP BY 'GROUP'
It returns only the first row-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
------------------------------------
Shouldn't it return both (or all of it)? Or am I (possibly) wrong?
As the manual states:
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
In your case, MySQL is correctly performing the grouping operation, but (since you select all columns, including those by which you are not grouping) it gives you one indeterminate record from each group.
It only returns one row, because the values of your GROUP column are the same ... that's basically how GROUP BY works.
Btw, when using GROUP BY it's good form to use aggregate functions for the other columns, such as COUNT(), MIN(), MAX(). In MySQL it usually returns the first row of each group if you just specify the column names; other databases will not like that though.
Your code:
SELECT * FROM forms GROUP BY 'GROUP'
isn't very "good" SQL. MySQL lets you get away with it and returns only one value for all columns not mentioned in the GROUP BY clause. Almost any other database would not perform this query. As a rule, any column that is not part of the grouping condition must be used with an aggregate function.
As far as MySQL is concerned, I just solved my problem by trial and error.
I had the same problem 10 minutes ago. I was using a MySQL statement something like this:
SELECT * FROM forms GROUP BY 'ID'; // returns only one row
However, using a statement like the following would yield the same result:
SELECT ID FROM forms GROUP BY 'ID'; // returns only one row
The following was my solution:
SELECT ID FROM forms GROUP BY ID; // returns more than one row (with one column of field "ID") grouped by ID
or
SELECT * FROM forms GROUP BY ID; // returns more than one row (with columns of all fields) grouped by ID
or
SELECT * FROM forms GROUP BY `ID`; // returns more than one row (with columns of all fields) grouped by ID
Lesson: do not use single quotes around the column name. I believe MySQL then treats it as a string literal instead of a column. Remove the quotes from the column name and it will group by the column's value. However, you can use backtick escapes, e.g. `ID`.
Thank you everyone for pointing out the obvious mistake I was too blind to see. I finally replaced GROUP BY with ORDER BY and included a WHERE clause to get my desired result. That is what I was intending to use all along. Silly me.
My final query becomes this-
SELECT * FROM forms WHERE `GROUP`='SomeGroup' ORDER BY `GROUP`
SELECT * FROM forms GROUP BY `GROUP`
it's strange that your query does work
The above result is kind of correct, but not quite.
All columns you select which are not part of the GROUP BY statement have to be aggregated by some function (see the list of aggregate functions in the MySQL documentation). Most often they are used together with numeric columns.
Besides this, your query will return one output row for every (combination of) attributes in the columns referenced in the GROUP BY statement. In your case there is just one distinct value in the GROUP column, namely "SomeGroup", so the output will only contain one row for this value.
A GROUP BY clause should only be required if you have group functions, say max, min, avg, sum, etc., applied in the query expressions. Your query does not show any such functions, meaning you do not actually need a GROUP BY clause. And if you still use such a clause, you will receive only the first record from each group of results.
Hence output on your query is perfect.
Query result is perfect; it will return only one row.