MySQL MAX Function mixes rows - mysql

I have the query SELECT id, MAX(value) FROM table1 and it returns the correct value, but it takes the first id of the table instead of the one corresponding to the value returned (id is primary key).
I've already seen solutions, but they all needed a WHERE clause, which I can't use in my case.

I believe what you're trying to do is return the id of the row with the max value. Is that right?
I'm curious why you can't use a WHERE clause?
But OK, this can be solved within that constraint. I'm going to assume that your table is unique on id (if not, you should really talk to whoever built it and ask why).
SELECT id, value
FROM table1
ORDER BY value DESC
LIMIT 1
This will sort your table, by value descending (greatest -> least), and then only show the first row (ie, the row with the largest "value").
If your table is not unique on id, you can still group by id and get the same result:
SELECT id, max(value) as max_value
FROM table1
GROUP BY id
ORDER BY max_value DESC
LIMIT 1

First, to answer why your query is behaving in the way you observe: I suspect you are running without sql_mode = only_full_group_by as your query would likely generate an error otherwise. As you've noticed, this can lead to somewhat odd results.
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want.
In this case, since you have no GROUP BY clause, the entire table is effectively the group.
To get one id associated with the largest value in the table, you can select all the rows, order by the value (descending), and then limit to the first result. There is no need for the aggregate function (or a WHERE clause):
SELECT id, value FROM table1 ORDER BY value DESC LIMIT 1
Note that if there are multiple ids with the (same) max value, this only returns one of them. In the comments, @RaymondNijland points out that this may give different results (for id; the value will always be the maximum) each time you run it, and you can make it deterministic by ordering by id as well:
SELECT id, value FROM table1 ORDER BY value DESC, id ASC LIMIT 1
Likewise, if there are for some reason multiple values for the same ID, it will still return that ID if one of its rows happens to be the max value -- thankfully this doesn't apply in this case, as you mentioned that id is the primary key.
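To make the tie-breaking point concrete, here is a minimal sketch. It runs the same ORDER BY ... LIMIT 1 query through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the four sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample data: ids 2 and 3 tie for the largest value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO table1 (id, value) VALUES (?, ?)",
    [(1, 10), (2, 50), (3, 50), (4, 20)],
)

# Sort by value (largest first), break ties by the smallest id, keep one row.
top_row = conn.execute(
    "SELECT id, value FROM table1 ORDER BY value DESC, id ASC LIMIT 1"
).fetchone()
print(top_row)
```

With the extra id ASC sort key, the tie between ids 2 and 3 is always resolved in favor of id 2, so repeated runs should return the same row.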

I think you forgot a group by clause :
SELECT id, MAX(value) FROM table1 GROUP BY id
EDIT : To answer your need you could do
SELECT id, MAX(value)
FROM table1
GROUP BY id
HAVING MAX(value) = (SELECT MAX(value) FROM table1)
This could give you multiple results if you have multiple ids with the max value. In this case you could add "LIMIT 1" to get only one result but that would be quite strange and random.
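A small sketch of that HAVING query, assuming invented sample rows and using Python's sqlite3 module in place of MySQL purely so it can be run standalone. It shows that every id tied for the maximum comes back.

```python
import sqlite3

# Hypothetical sample data: ids 2 and 3 both hold the maximum value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO table1 (id, value) VALUES (?, ?)",
    [(1, 10), (2, 50), (3, 50), (4, 20)],
)

# Keep only the groups whose maximum equals the table-wide maximum.
rows_with_max = conn.execute(
    """
    SELECT id, MAX(value)
    FROM table1
    GROUP BY id
    HAVING MAX(value) = (SELECT MAX(value) FROM table1)
    ORDER BY id
    """
).fetchall()
print(rows_with_max)
```

Both tied ids are returned here; adding LIMIT 1 without an explicit ORDER BY would leave the choice between them up to the server.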

Related

Filter rows in a query using HAVING in MySQL

HAVING is usually used with GROUP BY, but in my query I need it so that I can filter a derived column
sample query:
SELECT
id,
NOW() < expiration_date
OR expiration_date IS NULL AS is_active
FROM
products
HAVING is_active = 1 ;
I could also use a derived table (a subquery in FROM) and just use WHERE instead of HAVING,
example:
SELECT id
FROM
(SELECT
id,
NOW() < expiration_date
OR expiration_date IS NULL AS is_active
FROM
products) AS p
WHERE is_active = 1 ;
Either way, I'm getting the desired results, but is it really appropriate to use HAVING when there is no GROUP BY, just to filter on a derived column? Which one is better?
The second query is better.
BTW, as you limit your results to the expression you can even shorten it to:
SELECT
id,
1 AS is_active
FROM
products
WHERE NOW() < expiration_date OR expiration_date IS NULL;
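As a rough check that the derived-table form and the plain WHERE form select the same rows, here is a sketch using Python's sqlite3 module instead of MySQL. The three product rows are made up, and a fixed timestamp parameter stands in for NOW() so the result does not depend on when it is run.

```python
import sqlite3

# Hypothetical products: one still valid, one expired, one without an expiry.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, expiration_date TEXT)")
conn.executemany(
    "INSERT INTO products (id, expiration_date) VALUES (?, ?)",
    [(1, "2030-01-01 00:00:00"), (2, "2020-01-01 00:00:00"), (3, None)],
)

now = "2025-01-01 00:00:00"  # fixed stand-in for NOW(), an assumption for repeatability

# Variant 1: compute is_active in a derived table, then filter with WHERE.
via_derived_table = conn.execute(
    """
    SELECT id
    FROM (SELECT id,
                 (? < expiration_date OR expiration_date IS NULL) AS is_active
          FROM products) AS p
    WHERE is_active = 1
    ORDER BY id
    """,
    (now,),
).fetchall()

# Variant 2: put the expression straight into WHERE.
via_plain_where = conn.execute(
    """
    SELECT id
    FROM products
    WHERE ? < expiration_date OR expiration_date IS NULL
    ORDER BY id
    """,
    (now,),
).fetchall()

print(via_derived_table, via_plain_where)
```

Both variants return the unexpired product and the one with no expiration date, which is why the shorter WHERE form is usually enough.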
Your first query is not good. Mainly because it's not standard SQL and may thus confuse its reader. The query is not valid in most other dbms. HAVING is for aggregated records.
The typical thing is to aggregate and GROUP BY and then filter the results with HAVING. Omitting GROUP BY would usually give you one record (as in select max(col) from mytable). HAVING would in this case filter the one result record, so you get that one or none. Example: select max(col) as maxc from mytable having maxc > 100.
In MySQL you are allowed to omit GROUP BY expressions. For instance select id, name from mytable group by id would give you the id plus a name matching that ID (and as there is usually one record per ID, you get that one name). In another dbms you would have to use an aggregate function on name, such as MIN or MAX, or have name in the GROUP BY clause. In MySQL you don't have to. Omitting it means: get one of the values in the (group's) records found.
So your first query looks a bit like: Aggregate my data (because you are using HAVING) to one record (as there is no GROUP BY clause), so you'd get one record with a random id. This is obviously not what the query does, but to tell the truth I wouldn't have been able to tell just from the look at it. I am no MySQL guy, so I would have had to try to know how it is working, had you not told us it is working as expected by you.

How do I use MAX() to return the row that has the max value?

I have table orders with fields id, customer_id and amt:
I want to get the customer_id with the largest amt, along with that amt.
I made the query:
SELECT customer_id, MAX(amt) FROM orders;
But the result of this query contained an incorrect customer_id.
Then I wrote this query:
SELECT customer_id, MAX(amt) AS maximum FROM orders GROUP BY customer_id ORDER BY maximum DESC LIMIT 1;
and got the correct result.
But I do not understand why my first query did not work properly. What am I doing wrong?
And is it possible to rewrite my second query to get the same information in a simpler, more idiomatic way?
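MySQL (with ONLY_FULL_GROUP_BY disabled) will allow you to select a nonaggregated column next to an aggregate without a GROUP BY, returning the MAX(amt) of the entire table together with an arbitrary customer_id. Most other RDBMSs reject such a query unless customer_id appears in a GROUP BY clause or inside an aggregate.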
I don't see anything wrong with your 2nd query -- there are other ways to do it, but yours will work fine.
Some versions of SQL give you a warning or error when you select a field, have an aggregate operator like MAX or SUM, and the field you are selecting does not appear in GROUP BY.
You need a more complicated query to fetch the customer_id corresponding to the max amt. Unfortunately SQL is not as naive as you think. One such way to do this is:
select customer_id from orders where amt = ( select max(amt) from orders);
Although a solution using joins is likely more performant.
To understand why what you were trying to do doesn't make sense, replace MAX with SUM. From the stance of how aggregate operators are interpreted, it's a mere coincidence that MAX returns something that corresponds to an actual row. SUM does not have this property, for instance.
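For comparison, here is a sketch of the subquery form next to one possible join form. The join query is my own illustration of what "a solution using joins" could look like, not something taken from this thread. The orders rows are invented, and Python's sqlite3 module stands in for MySQL so that the example runs on its own.

```python
import sqlite3

# Hypothetical orders: customer 20 placed the largest order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (id, customer_id, amt) VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 20, 250), (3, 10, 75)],
)

# Subquery form: keep the rows whose amt equals the table-wide maximum.
via_subquery = conn.execute(
    "SELECT customer_id, amt FROM orders "
    "WHERE amt = (SELECT MAX(amt) FROM orders)"
).fetchall()

# Join form: join every order against a one-row derived table holding the maximum.
via_join = conn.execute(
    """
    SELECT o.customer_id, o.amt
    FROM orders AS o
    JOIN (SELECT MAX(amt) AS max_amt FROM orders) AS m
      ON o.amt = m.max_amt
    """
).fetchall()

print(via_subquery, via_join)
```

Whether the join is actually faster depends on the server version and the indexes available, so it is worth checking with EXPLAIN rather than assuming.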
Practically your first query can be seen as if it were GROUP BY-ed into a big single group.
Also, MySQL is free to choose each output value from different source rows from the same group.
http://dev.mysql.com/doc/refman/5.7/en/group-by-extensions.html
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause.
The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
The problem with MAX() is that it selects the highest value of the specified field, considering that field alone. The other values in the same row are not considered or given any preference in the result. In practice MySQL often returns the values from the first row of the group (in this case the group is the entire table, since no GROUP BY was specified), but this is not guaranteed; the information from the other rows is dropped during the aggregation.
To solve this, you could do that:
SELECT customer_id, amt FROM orders ORDER BY amt DESC LIMIT 1
It should return the customer_id and the highest amt while preserving the relation between the two, because no aggregation was performed.

SQL select distinct but "keep first"?

According to another SO post (SQL: How to keep rows order with DISTINCT?), distinct has pretty undefined behavior as far as sorting.
I have a query:
select col_1 from table order by col_2
This can return values like
3
5
3
2
I need to then select a distinct on these that preserves ordering, meaning I want
select distinct(col_1) from table order by col_2
to return
3
5
2
but not
5
3
2
Here is what I am actually trying to do. Col_1 is a user id, and col_2 is a log in timestamp event by that user. So the same user (col_1) can have many login times. I am trying to build a historical list of users in which they were seen in the system. I would like to be able to say "our first user ever was, our second user ever was", and so on.
That post seems to suggest using a GROUP BY, but GROUP BY is not meant to return an ordering of rows, so I do not see how or why it would be applicable here, since it does not appear that GROUP BY will preserve any ordering. In fact, another SO post gives an example where GROUP BY destroys the ordering I am looking for: see "Peter" in what is the difference between GROUP BY and ORDER BY in sql. Is there any way to guarantee the latter result? The strange thing is, if I were implementing the DISTINCT clause, I would surely do the ORDER BY first, then take the results, do a linear scan of the list, and preserve the ordering naturally, so I am not sure why the behavior is so undefined.
EDIT:
Thank you all! I have accepted IMSoP's answer because not only was there an interactive example that I could play around with (thanks for turning me on to SQL Fiddle), but they also explained why several things worked the way they did, instead of simply "do this". Specifically, it was unclear to me that GROUP BY does not destroy the values in the other columns outside of the GROUP BY (rather, it keeps them in some sort of internal list), and that these values can still be examined in an ORDER BY clause.
This all has to do with the "logical ordering" of SQL statements. Although a DBMS might actually retrieve the data according to all sorts of clever strategies, it has to behave according to some predictable logic. As such, the different parts of an SQL query can be considered to be processed "before" or "after" one another in terms of how that logic behaves.
As it happens, the ORDER BY clause is the very last step in that logical sequence, so it can't change the behaviour of "earlier" steps.
If you use a GROUP BY, the rows have been bundled up into their groups by the time the SELECT clause is run, let alone the ORDER BY, so you can only look at columns which have been grouped by, or "aggregate" values calculated across all the values in a group. (MySQL implements a controversial extension to GROUP BY where you can mention a column in the SELECT that can't logically be there, and it will pick one from an arbitrary row in that group).
If you use a DISTINCT, it is logically processed after the SELECT, but the ORDER BY still comes afterwards. So only once the DISTINCT has thrown away the duplicates will the remaining results be put into a particular order - but the rows that have been thrown away can't be used to determine that order.
As for how to get the result you need, the key is to find a value to sort by which is valid after the GROUP BY/DISTINCT has (logically) been run. Remember that if you use a GROUP BY, any aggregated values are still valid - an aggregate function can look at all the values in a group. This includes MIN() and MAX(), which are ideal for ordering by, because "the lowest number" (MIN) is the same thing as "the first number if I sort them in ascending order", and vice versa for MAX.
So to order a set of distinct foo_number values based on the lowest applicable bar_number for each, you could use this:
SELECT foo_number
FROM some_table
GROUP BY foo_number
ORDER BY MIN(bar_number) ASC
Here's a live demo with some arbitrary data.
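Applied to the data from the question (col_1 values 3, 5, 3, 2 when sorted by col_2), the same pattern can be sketched as follows. The table name logins and the concrete col_2 values are assumptions, and Python's sqlite3 module is used in place of MySQL so that the snippet is runnable as-is.

```python
import sqlite3

# Hypothetical login log: col_1 is the user id, col_2 the login time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (col_1 INTEGER, col_2 INTEGER)")
conn.executemany(
    "INSERT INTO logins (col_1, col_2) VALUES (?, ?)",
    [(3, 1), (5, 2), (3, 3), (2, 4)],
)

# One row per user, ordered by that user's earliest login.
first_seen_order = [
    row[0]
    for row in conn.execute(
        "SELECT col_1 FROM logins GROUP BY col_1 ORDER BY MIN(col_2) ASC"
    )
]
print(first_seen_order)
```

User 3 stays ahead of user 5 because the ordering uses each user's earliest login, not the later duplicate.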
EDIT: In the comments, it was discussed why, if an ordering is applied before the grouping / de-duplication takes place, that order is not applied to the groups. If that were the case, you would still need a strategy for which row was kept in each group: the first, or the last.
As an analogy, picture the original set of rows as a set of playing cards picked from a deck, and then sorted by their face value, low to high. Now go through the sorted deck and deal them into a separate pile for each suit. Which card should "represent" each pile?
If you deal the cards face up, the cards showing at the end will be the ones with the highest face value (a "keep last" strategy); if you deal them face down and then flip each pile, you will reveal the lowest face value (a "keep first" strategy). Both are obeying the original order of the cards, and the instruction to "deal the cards based on suit" doesn't automatically tell the dealer (who represents the DBMS) which strategy was intended.
If the final piles of cards are the groups from a GROUP BY, then MIN() and MAX() represent picking up each pile and looking for the lowest or highest value, regardless of the order they are in. But because you can look inside the groups, you can do other things too, like adding up the total value of each pile (SUM) or how many cards there are (COUNT) etc, making GROUP BY much more powerful than an "ordered DISTINCT" could be.
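The card analogy can be played out in a few lines of plain Python. This is only an illustration of the two dealing strategies, with an invented hand of cards; it is not how any DBMS actually implements grouping.

```python
# Hypothetical (suit, face_value) cards, already sorted by face value, low to high.
cards = [("hearts", 2), ("spades", 3), ("hearts", 7), ("spades", 9), ("clubs", 10)]

keep_first = {}  # like dealing face down and flipping each pile
keep_last = {}   # like dealing face up: the last card dealt is the one showing

for suit, face in cards:
    keep_first.setdefault(suit, face)  # only the first card seen for a suit is kept
    keep_last[suit] = face             # every later card replaces the previous one

print(keep_first)
print(keep_last)
```

Both dictionaries respect the original sort order, yet they pick different representatives for hearts and spades, which is the ambiguity an explicit MIN() or MAX() removes.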
I would go for something like
select col1
from (
select col1,
rank () over(order by col2) pos
from table
)
group by col1
order by min(pos)
In the subquery I calculate the position, then in the main query I do a group by on col1, using the smallest position to order.
Here is the demo in SQLFiddle (this was Oracle; the MySQL info was added later).
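Edit for MySQL:
select col1
from (
select col1 col1,
@curRank := @curRank + 1 AS pos
from table1, (select @curRank := 0) p
order by col2 -- number the rows in col2 order; note that MySQL does not guarantee the evaluation order of user variables
) sub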
group by col1
order by min(pos)
And here the demo for MySql.
The GROUP BY in the referenced answer isn't attempting to perform an ordering... it is simply picking a single associated value for the column that we want to be distinct.
As @bluefeet states, if you want a guaranteed ordering, you must use ORDER BY.
Why can't we specify a value in the ORDER BY that isn't included in the SELECT DISTINCT?
Consider the following values for col_1 and col_2:
create table yourTable (
col_1 int,
col_2 int
);
insert into yourTable (col_1, col_2) values (1, 1);
insert into yourTable (col_1, col_2) values (1, 3);
insert into yourTable (col_1, col_2) values (2, 2);
insert into yourTable (col_1, col_2) values (2, 4);
With this data, what should SELECT DISTINCT col_1 FROM yourTable ORDER BY col_2 return?
That's why you need the GROUP BY and the aggregate function, to decide which of the multiple values for col_2 you should order by... could be MIN(), could be MAX(), maybe even some other function such as AVG() would make sense in some cases; it all depends on the specific scenario, which is why you need to be explicit:
select col_1
from yourTable
group by col_1
order by min(col_2)
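To see that the choice of aggregate really changes the result, here is a sketch with slightly different, invented rows: (1, 1), (1, 5), (2, 2), (2, 4) instead of the rows above. It is run through Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

# Hypothetical rows chosen so that MIN and MAX disagree on the order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (col_1 INTEGER, col_2 INTEGER)")
conn.executemany(
    "INSERT INTO yourTable (col_1, col_2) VALUES (?, ?)",
    [(1, 1), (1, 5), (2, 2), (2, 4)],
)

def distinct_ordered_by(aggregate):
    # aggregate is either "MIN" or "MAX"; it is a fixed keyword, not user input.
    sql = (
        "SELECT col_1 FROM yourTable "
        f"GROUP BY col_1 ORDER BY {aggregate}(col_2)"
    )
    return [row[0] for row in conn.execute(sql)]

by_min = distinct_ordered_by("MIN")
by_max = distinct_ordered_by("MAX")
print(by_min, by_max)
```

Ordering by the smallest col_2 puts 1 first (1 before 2), while ordering by the largest puts 2 first (4 before 5), so the query has to say which one is meant.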
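For MySQL only (with ONLY_FULL_GROUP_BY disabled): when you select columns that are not in the GROUP BY, the server in practice often returns the values from the first record it reads in each group. The documentation says the choice is indeterminate, and newer versions may ignore an ORDER BY inside a derived table, so treat this as version-dependent behavior rather than a guarantee. Where it holds, you can use it to select which record is returned from each group like this: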
SELECT foo_number, bar_number
FROM
(
SELECT foo_number, bar_number
FROM some_table
ORDER BY bar_number
) AS t
GROUP BY foo_number
ORDER BY bar_number DESC;
This is more flexible because it allows you to order the records within each group using expressions that are not possible with aggregates - in my case I wanted to return the one with the shortest string in another column.
For completeness, my query looks like this:
SELECT
s.NamespaceId,
s.Symbol,
s.EntityName
FROM
(
SELECT
m.NamespaceId,
i.Symbol,
i.EntityName
FROM ImportedSymbols i
JOIN ExchangeMappings m ON i.ExchangeMappingId = m.ExchangeMappingId
WHERE
i.Symbol NOT IN
(
SELECT Symbol
FROM tmp_EntityNames
WHERE NamespaceId = m.NamespaceId
)
AND
i.EntityName IS NOT NULL
ORDER BY LENGTH(i.RawSymbol), i.RawSymbol
) AS s
GROUP BY s.NamespaceId, s.Symbol;
What this does is return a distinct list of symbols in each namespace, and for duplicated symbols it returns the one with the shortest RawSymbol. When the RawSymbol lengths are the same, it returns the one whose RawSymbol comes first alphabetically.

How do records get ordered in a mysql group by?

So suppose I do:
SELECT * FROM table t GROUP BY t.id
Suppose there are multiple rows in the table with the same id; only one row for that id will ultimately come out. I suppose MySQL orders the rows that have the same id and then returns the first one, or something like that. My question is: how does MySQL perform this ordering, and is there a way I can control it so that, for instance, it uses a certain field?
You can GROUP BY several different columns to arrange the order for which the result is grouped.
SELECT * FROM table t GROUP BY t.id, t.foo, t.bar
In strictly correct SQL, when you use GROUP BY all the values being selected must be either columns named in the GROUP BY clause or aggregate functions. MySQL allows you to violate this rule, unless you set the only_full_group_by mode. But although it allows you to perform such queries, it doesn't specify which row will be selected for each grouped column.
If you want to select a row that corresponds to the max or min of some other column, you can do something like this:
SELECT a.*
FROM table a
JOIN (SELECT id, max(somecol) maxcol
FROM table
GROUP BY id) b
ON a.id = b.id
AND a.somecol = b.maxcol
Note that this can still return multiple rows per id if several rows share the max value. You can add a final GROUP BY clause and it will select one of them arbitrarily.
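Here is a minimal sketch of that join, including the tie case. The table name items, the extra label column, and the sample rows are all invented, and Python's sqlite3 module is used instead of MySQL so the snippet is self-contained.

```python
import sqlite3

# Hypothetical rows: id 2 has two rows tied for its maximum somecol.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, somecol INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO items (id, somecol, label) VALUES (?, ?, ?)",
    [(1, 5, "a"), (1, 9, "b"), (2, 3, "c"), (2, 3, "d"), (3, 7, "e")],
)

# Join each row to its group's maximum and keep only the rows that match it.
max_rows = conn.execute(
    """
    SELECT a.id, a.somecol, a.label
    FROM items AS a
    JOIN (SELECT id, MAX(somecol) AS maxcol
          FROM items
          GROUP BY id) AS b
      ON a.id = b.id
     AND a.somecol = b.maxcol
    ORDER BY a.id, a.label
    """
).fetchall()
print(max_rows)
```

Ids 1 and 3 come back once each, while id 2 comes back twice because both of its rows carry the maximum.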
When using this MySQL extension to the standard SQL GROUP BY functionality you cannot control which values for the non-aggregated, non-GROUP BY columns are selected by the server. The MySQL documentation discusses this specific case and states clearly that
The server is free to choose any value from each group, so unless they
are the same, the values chosen are indeterminate. Furthermore, the
selection of values from each group cannot be influenced by adding an
ORDER BY clause.

Ordering records by a field only with respect to records with the same ID

I have an EAV table in SQL which as usual has several records for each ID. Each of these records has a numerical value in a column called 'weight' and I'm trying to sort this table so that for each ID the records are ranked in descending order of weight. This is going to be a one off process because I intend to make sure that data is sorted when it goes into the table in future.
Also is the normal protocol for doing something like this to SELECT all of the data, sort it the way you want, and then use the REPLACE command to replace the old data in the table?
I know that I can do this for one id by doing:
SELECT * FROM my_table WHERE id = 'X' ORDER BY weight DESC
but I need to somehow do this for each id in my table. How is this normally done?
You should ALWAYS ask for the order you want when you read the data; in this case:
SELECT * from theTable order by id, weight desc
You do not store the data in any particular order (even if you try, it will not matter).
Remove the WHERE clause and add id to ORDER BY. This way, your result set will be ordered first by id, and for each id, it will be ordered by weight.
SELECT * FROM my_table ORDER BY id, weight DESC
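A quick sketch of the two-key sort, with invented rows inserted deliberately out of order and Python's sqlite3 module standing in for MySQL:

```python
import sqlite3

# Hypothetical EAV-style rows, inserted in no particular order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO my_table (id, weight) VALUES (?, ?)",
    [("X", 1), ("Y", 5), ("X", 3), ("Y", 2), ("X", 2)],
)

# Sort by id first, then by weight (largest first) within each id.
sorted_rows = conn.execute(
    "SELECT id, weight FROM my_table ORDER BY id, weight DESC"
).fetchall()
print(sorted_rows)
```

The insertion order plays no role in the result, which is why sorting at query time is enough and rewriting the stored rows is unnecessary.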