groupwise max causes sql_mode error - mysql

The challenge is to select daily maximum temperature from a table along with date and time information for each.
SELECT datestamp, max(temp) hitemp from Weather w group by `year`, `month`, `day`;
This causes
Expression #1 of SELECT list is not in GROUP BY clause and contains
nonaggregated column 'Weather.datestamp' which is not
functionally dependent on columns in GROUP BY clause; this is
incompatible with sql_mode=only_full_group_by
Other similar questions propose using JOIN, but I can't see how I can use JOIN syntax because the high temperature values are not unique.

RDBMS work best on set based logic. So think of the data in terms of two sets:
w2: a set of data containing the max temp for a given day
w: The universe of data containing all the measurements for a period of time
By joining these two sets we can obtain just the data from w that have the max temperature for a given day.
The inline view w2 computes the max temp for each day; joining it back to the base set w recovers the time information for each day's max temp.
This assumes that:
If the max temp occurs on multiple records for the same date, you want them all, since you haven't indicated how to handle ties.
datestamp has a time component; and it is the date/time you want to see for max temp on a day.
date(datestamp) simply returns the date component of a date/time.
max() returns the max temp for the group denoted (in this case, the date of datestamp).
This is most likely what others meant by join:
SELECT w.datestamp, w.temp
FROM Weather w
INNER JOIN (SELECT DATE(datestamp) AS mdate, MAX(temp) AS mtemp
            FROM Weather
            GROUP BY DATE(datestamp)) w2
   ON w.temp = w2.mtemp
  AND DATE(w.datestamp) = w2.mdate;
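As a rough, runnable sketch of the join-back approach above, the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are made up for illustration, and only the datestamp and temp columns from the question are assumed. It includes a tie on the first day to show that both tied rows come back.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (datestamp TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [
        ("2016-08-01 06:00:00", 15.0),
        ("2016-08-01 14:00:00", 27.5),
        ("2016-08-01 15:00:00", 27.5),  # ties the daily high
        ("2016-08-02 05:00:00", 12.0),
        ("2016-08-02 13:30:00", 24.0),
    ],
)

# w2 holds one row per day with that day's max temp;
# joining back to w recovers the full datestamp of each high.
daily_highs = conn.execute(
    """
    SELECT w.datestamp, w.temp
    FROM weather AS w
    INNER JOIN (SELECT date(datestamp) AS mdate, MAX(temp) AS mtemp
                FROM weather
                GROUP BY date(datestamp)) AS w2
            ON w.temp = w2.mtemp
           AND date(w.datestamp) = w2.mdate
    ORDER BY w.datestamp
    """
).fetchall()

for row in daily_highs:
    print(row)
```

Every selected column comes from w, and the aggregation lives inside the inline view, so a strict GROUP BY mode has nothing to object to.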
ADDITIONAL INFO:
MySQL versions before 8.0 support neither CROSS APPLY nor analytic (window) functions such as ROW_NUMBER() OVER (PARTITION BY date(datestamp) ORDER BY temp DESC), which could also be used to solve this issue, likely with greater performance; MySQL 8.0 added window functions. SQL Server, Oracle, DB2, and PostgreSQL all have different ways of solving this; however, the above example should work on all RDBMS engines (that I can think of), though it will not be the most efficient in all cases.

Your sql mode is ONLY_FULL_GROUP_BY. That means every nonaggregated column in the SELECT list must be in the GROUP BY clause (or be functionally dependent on the GROUP BY columns).
datestamp is in select but not in group by.
But for temp, since you are using an aggregate function MAX, it need not be in GROUP BY.
Use datestamp in group by or change your sql mode.
The exact reason is the mysql full group by mode and the logical query execution order of statements in mysql
Logical Order
FROM
WHERE
GROUP BY
AGGREGATIONS
HAVING
SELECT
Grouping is done before SELECT, so when full group by is enabled, SELECT can access only the grouped and aggregated columns.

Related

Correct format for Select in SQL Server

I have what should be a simple query for any database and which always runs in MySQL but not in SQL Server
select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
group by assetid
order by ts desc
The error is:
column tagalerts.id is invalid in the select list because it is not contained in either an aggregate function or the group by clause.
It is not a simple case of adding tagalerts.id to the group by clause because the error repeats for ts and for assetid etc, implying that all the selects need to be in a group or in aggregate functions... either of which will result in a meaningless and inaccurate result.
Splitting the select into a subquery to sort and group correctly (which again works fine with MySQL, as you would expect) makes matters worse
SELECT * from
(select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
order by ts desc
)T1
group by assetid
This fails with:
the order by clause is invalid in views, inline functions, derived tables and expressions unless TOP etc is used
The 'correct output' should be
id    ts                assetid  zonename   battlevel
1234  a datetime        1569     Reception  0
3182  another datetime  1572     Reception  0
Either I am reading SQL Server's rules entirely wrong or this is a major flaw with that database.
How can I write this to work on both systems?
In most databases you can't just include columns that aren't in the GROUP BY without using an aggregate function.
MySql is an exception to that. But MS SQL Server isn't.
So you could keep that GROUP BY with only the "assetid".
But then use the appropriate aggregate functions for all the other columns.
Also, use the JOIN syntax for heaven's pudding sake.
A SQL like select * from table1, table2 where table1.id2 = table2.id is using a syntax from the previous century.
SELECT
MAX(ta.id) AS id,
MAX(ta.ts) AS ts,
ta.assetid,
MAX(node.zonename) AS zonename,
MAX(ta.battlevel) AS battlevel
FROM tagalerts AS ta
JOIN node ON node.id = ta.nodeid
WHERE ta.ack = 0
GROUP BY ta.assetid
ORDER BY MAX(ta.ts) DESC;
Another trick to use in MS SQL Server is the window function ROW_NUMBER.
But this is probably not what you need.
Example:
SELECT id, ts, assetid, zonename, battlevel
FROM
(
SELECT
ta.id,
ta.ts,
ta.assetid,
node.zonename,
ta.battlevel,
ROW_NUMBER() OVER (PARTITION BY ta.assetid ORDER BY ta.ts DESC) AS rn
FROM tagalerts AS ta
JOIN node ON node.id = ta.nodeid
WHERE ta.ack = 0
) q
WHERE rn = 1
ORDER BY ts DESC;
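To make the ROW_NUMBER approach concrete, here is a small sketch run against SQLite (3.25 or later, which supports window functions) via Python's sqlite3 module. The table layout is an assumption: the thread never says which table each column belongs to, so ts, battlevel and ack are placed on tagalerts here, and the rows are invented. It selects tagalerts.id, as in the question's select list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and made-up rows, for illustration only.
conn.executescript(
    """
    CREATE TABLE node (id INTEGER PRIMARY KEY, zonename TEXT);
    CREATE TABLE tagalerts (
        id INTEGER PRIMARY KEY, ts TEXT, assetid INTEGER,
        nodeid INTEGER, battlevel INTEGER, ack INTEGER
    );
    INSERT INTO node VALUES (1, 'Reception'), (2, 'Warehouse');
    INSERT INTO tagalerts VALUES
        (10, '2020-01-01 08:00:00', 1569, 2, 1, 0),
        (11, '2020-01-01 09:00:00', 1569, 1, 0, 0),
        (12, '2020-01-01 07:00:00', 1572, 1, 0, 0),
        (13, '2020-01-01 10:00:00', 1572, 2, 1, 1);
    """
)

# rn = 1 marks the most recent unacknowledged alert per asset, so every
# output column comes from the same source row.
latest_alerts = conn.execute(
    """
    SELECT id, ts, assetid, zonename, battlevel
    FROM (
        SELECT ta.id, ta.ts, ta.assetid, n.zonename, ta.battlevel,
               ROW_NUMBER() OVER (PARTITION BY ta.assetid
                                  ORDER BY ta.ts DESC) AS rn
        FROM tagalerts AS ta
        JOIN node AS n ON n.id = ta.nodeid
        WHERE ta.ack = 0
    ) AS q
    WHERE rn = 1
    ORDER BY ts DESC
    """
).fetchall()

for row in latest_alerts:
    print(row)
```

Unlike wrapping each column in MAX(), this keeps whole rows together, which looks closer to the "correct output" the question describes.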
I strongly suspect this query is WRONG even in MySql.
We're missing a lot of details (sample data, and we don't know which table all of the columns belong to), but what I do know is you're grouping by assetid, where it looks like one assetid value could have more than one ts (timestamp) value in the group. It also looks like you're counting on the order by ts desc to ensure both that you see recent timestamps in the results first and that each assetid group uses the most recent possible ts timestamp for that group.
MySql only guarantees the former, not the latter. Nothing in this query guarantees that each assetid is using the most recent timestamp available. You could be seeing the wrong timestamps, and then also using those wrong timestamps for the order by. This is the problem the Sql Server rule is there to stop. MySql violates the SQL standard to allow you to write that wrong query.
Instead, you need to look at each column and either add it to the group by (best when all of the values are known to be the same, anyway) or wrap it in an aggregate function like MAX(), MIN(), AVG(), etc, so there is a deterministic result for which value from the group is used.
If all of the values for a column in a group are the same, then there's no problem adding it to the group by. If the values are different, you want to be precise about which one is chosen for the result set.
While I'm here, the tagalerts, node join syntax has been obsolete for more than 20 years now. It's also good practice to use an alias with every table and prefix every column with the alias. I mention these to explain why I changed it for my code sample below, though I only prefix columns where I am confident in which table the column belongs to.
This query should run on both databases:
SELECT ta.assetid, MAX(ta.id) "id", MAX(ta.ts) "ts",
MAX(n.zonename) "zonename", MAX(battlevel) "battlevel"
FROM tagalerts ta
INNER JOIN node n ON ta.nodeid = n.id
WHERE ack = 0
GROUP BY ta.assetid
ORDER BY ts DESC
There is also a concern here that the results may be choosing values from different records in the joined node table. So if battlevel is part of the node table, you might see a result that matches a zonename with a battlevel that never occurs in any record in the data. In Sql Server, this is easily fixed by using APPLY to match only one node record to each tagalert. Older MySql versions don't support this (APPLY or an equivalent has been in every other major database since at least 2012; MySql added LATERAL derived tables in 8.0.14), but you can simulate it in this case with two JOINs, where the first join is a subquery that uses GROUP BY to determine the values that will uniquely identify the needed node record, and the second join is to the node table to actually produce that record. Unfortunately, we need to know more about the tables in question to actually write this code for you.

Get values from first sorted member of grouped(?) sql query

I feel like this is obvious but I'm struggling. Must be because it's a Monday.
I have a licenses table in MySQL which has fields id (int), start_date (date), licensable_id (int), licensable_type (string) and fixed_end_point (boolean).
I want to get all licenses where the start date is equal to or less than today, group them by licensable_id and licensable_type, and then get the most recently starting one so I can get the fixed_end_point field out of it, along with licensable_id and licensable_type.
This is what I'm trying:
SELECT licensable_id, licensable_type, fixed_end_point
FROM licenses
WHERE start_date <= "2016-08-01"
GROUP BY licensable_id, licensable_type
ORDER BY start_date desc;
At the moment, the ORDER BY field seems to be being ignored, and it's just returning the values from the first license for each group, rather than the most recent. Can anyone see what I'm doing wrong? Do I need to make a nested query?
You shouldn't be thinking about this as a group by. You want to select the most recent start_date for each license, given the constraints in the question. One method uses a correlated subquery:
select l.*
from licenses l
where l.start_date = (select max(l2.start_date)
from licenses l2
where l2.licensable_id = l.licensable_id and
l2.licensable_type = l.licensable_type and
l2.start_date <= '2016-08-01'
);
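The correlated-subquery method can be tried end to end with the sketch below, which runs against SQLite through Python's sqlite3 module as a stand-in for MySQL. The rows are invented; the column names follow the question. The cutoff date is passed as a parameter instead of being hard-coded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up licenses; license 3 starts after the cutoff and must be ignored.
conn.executescript(
    """
    CREATE TABLE licenses (
        id INTEGER PRIMARY KEY, start_date TEXT,
        licensable_id INTEGER, licensable_type TEXT,
        fixed_end_point INTEGER
    );
    INSERT INTO licenses VALUES
        (1, '2016-01-01', 7, 'School', 0),
        (2, '2016-06-01', 7, 'School', 1),
        (3, '2016-09-01', 7, 'School', 0),
        (4, '2016-03-01', 9, 'User',   0);
    """
)

cutoff = "2016-08-01"

# For each row, the subquery finds the latest start_date on or before
# the cutoff within the same (licensable_id, licensable_type) pair.
current_licenses = conn.execute(
    """
    SELECT l.licensable_id, l.licensable_type, l.fixed_end_point
    FROM licenses AS l
    WHERE l.start_date = (SELECT MAX(l2.start_date)
                          FROM licenses AS l2
                          WHERE l2.licensable_id = l.licensable_id
                            AND l2.licensable_type = l.licensable_type
                            AND l2.start_date <= ?)
    ORDER BY l.licensable_id
    """,
    (cutoff,),
).fetchall()

for row in current_licenses:
    print(row)
```

If two licenses in the same group share the latest start date, both rows are returned; the question does not say how such ties should be handled.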
You don't use an aggregate function, so you should use DISTINCT:
SELECT DISTINCT licensable_id, licensable_type, fixed_end_point
FROM licenses
WHERE date(start_date) <= date(now())
ORDER BY start_date desc
limit 1;
The reason this doesn't give you the results you want is how GROUP BY works.
With standard SQL any field in the SELECT must either be also mentioned in the GROUP BY clause or must be an aggregate field (there is an exception for fields 100% related to a field that is returned, but many flavours of SQL do not support this).
MySQL does allow a field to be in the SELECT clause which is not an aggregate value and is not mentioned in the GROUP BY clause, and allowing this was the default until recently. However for these fields there could be multiple values for the GROUP BY fields, and in this case which one is chosen is not defined. As this is worked out prior to the ORDER BY statement being processed, the ORDER BY clause has no effect on which one is chosen.
There are a few normal ways to do this. You can use a correlated subquery as Gordon has suggested, or similarly (and possibly more efficiently depending on records and indexes) you can use a sub query to get the latest row's date for each of your important rows, and then join that back to your main table:-
SELECT l.licensable_id,
l.licensable_type,
l.fixed_end_point
FROM licenses l
INNER JOIN
(
SELECT licensable_id,
licensable_type,
MAX(start_date) AS max_start_date
FROM licenses
WHERE start_date <= '2016-08-01'
GROUP BY licensable_id,
licensable_type
) sub0
ON l.licensable_id = sub0.licensable_id
AND l.licensable_type = sub0.licensable_type
AND l.start_date = sub0.max_start_date
In some situations another option is to (ab)use the GROUP_CONCAT and SUBSTRING_INDEX functions. This way you can GROUP BY the fields you want to, but do a GROUP_CONCAT of the other fields in descending order of the date. Then use SUBSTRING_INDEX to get everything up to the first comma (the default delimiter for GROUP_CONCAT):-
SELECT licensable_id,
licensable_type,
SUBSTRING_INDEX(GROUP_CONCAT(COALESCE(fixed_end_point, '') ORDER BY start_date DESC), ',', 1)
FROM licenses
WHERE start_date <= "2016-08-01"
GROUP BY licensable_id, licensable_type
Obviously this has issues if the latest row has a null value, hence I have used COALESCE to fudge in non null values. Also if the field contains commas you will need to use an alternative delimiter. And if the field is large then you might have issues with the max field length for GROUP_CONCAT (default is 1024 I think).

MySQL select AVG, ORDER BY, GROUP BY & LIMIT

The statement below does not work, but I can't seem to figure out why
select AVG(delay_in_seconds) from A_TABLE ORDER by created_at DESC GROUP BY row_type limit 1000;
I want to get the avg's of the most recent 1000 rows for each row_type. created_at is of type DATETIME and row_type is of type VARCHAR
If you only want the 1000 most recent rows, regardless of row_type, and then get the average of delay_in_seconds for each row_type, that's a fairly straightforward query. For example:
SELECT t.row_type
, AVG(t.delay_in_seconds)
FROM (
SELECT r.row_type
, r.delay_in_seconds
FROM A_table r
ORDER BY r.created_at DESC
LIMIT 1000
) t
GROUP BY t.row_type
I suspect, however, that this query does not satisfy the requirements that were specified. (I know it doesn't satisfy what I understood as the specification.)
If what we want is the average of the most recent 1000 rows for each row_type, that would also be fairly straightforward... if we were using a database that supported analytic functions.
Unfortunately, MySQL versions before 8.0 don't provide support for analytic functions. It is possible to emulate one in MySQL, but the syntax is a bit involved, and it depends on behavior that is not guaranteed.
As an example:
SELECT s.row_type
, AVG(s.delay_in_seconds)
FROM (
SELECT @row_ := IF(@prev_row_type = t.row_type, @row_ + 1, 1) AS row_
, @prev_row_type := t.row_type AS row_type
, t.delay_in_seconds
FROM A_table t
CROSS
JOIN (SELECT @prev_row_type := NULL, @row_ := NULL) i
ORDER BY t.row_type DESC, t.created_at DESC
) s
WHERE s.row_ <= 1000
GROUP
BY s.row_type
NOTES:
The inline view query is going to be expensive for large sets. What that's effectively doing is assigning a row number to each row. The "order by" is sorting the rows in descending sequence by created_at, what we want is for the most recent row to be assigned a value of 1, the next most recent 2, etc. This numbering of rows will be repeated for each distinct value of row_type.
For performance, we'd want a suitable index with leading columns (row_type,created_at,delay_in_seconds) to avoid an expensive "Using filesort" operation. We need at least those first two columns for that; including delay_in_seconds makes it a covering index (the query can be satisfied entirely from the index).
The outer query then runs against the resultset returned from the view query (a "derived table"). The predicate in the WHERE clause filters out all rows that were assigned a row number greater than 1000; the rest is a straightforward GROUP BY and an AVG aggregate.
A LIMIT clause is entirely unnecessary. It may be possible to incorporate some additional predicates for some additional performance enhancement... like, what if we specified the most recent 1000 rows, but only those with created_at within the past 30 or 90 days?
(I'm not entirely sure this answers the question that OP was asking. What this answers is: Is there a query that can return the specified resultset, making use of AVG aggregate and GROUP BY, ORDER BY and LIMIT clauses.)
N.B. This query is dependent on a behavior of MySQL user-defined variables which is not guaranteed.
The query above shows one approach, but there is also another approach. It's possible to use a "join" operation (of A_table with A_table) to get a row number assigned (getting a COUNT of the number of rows that are "more recent" than each row). With large sets, however, that can produce a humongous intermediate result, if we aren't careful to limit it.
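The self-join idea mentioned above could look roughly like the following sketch, run against SQLite via Python's sqlite3 module as a stand-in for MySQL. It is one possible reading of that idea, not the original author's code. It assumes (row_type, created_at) pairs are unique, uses invented rows, and keeps the 2 most recent rows per row_type instead of 1000 so the data stays small.

```python
import sqlite3

N = 2  # stand-in for the 1000 in the question

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE A_table (row_type TEXT, created_at TEXT, delay_in_seconds REAL)"
)
conn.executemany(
    "INSERT INTO A_table VALUES (?, ?, ?)",
    [
        ("a", "2020-01-01 00:00:00", 100.0),  # oldest 'a' row, falls outside N
        ("a", "2020-01-02 00:00:00", 10.0),
        ("a", "2020-01-03 00:00:00", 20.0),
        ("b", "2020-01-01 00:00:00", 5.0),
    ],
)

# For each row, count the rows of the same row_type that are more recent.
# A row is among the N most recent when fewer than N rows are newer.
recent_averages = conn.execute(
    """
    SELECT t.row_type, AVG(t.delay_in_seconds) AS avg_delay
    FROM (
        SELECT r.row_type, r.created_at, r.delay_in_seconds,
               COUNT(newer.created_at) AS newer_rows
        FROM A_table AS r
        LEFT JOIN A_table AS newer
               ON newer.row_type = r.row_type
              AND newer.created_at > r.created_at
        GROUP BY r.row_type, r.created_at, r.delay_in_seconds
    ) AS t
    WHERE t.newer_rows < ?
    GROUP BY t.row_type
    ORDER BY t.row_type
    """,
    (N,),
).fetchall()

for row in recent_averages:
    print(row)
```

The join pairs every row with all newer rows of the same type, which is where the large intermediate result comes from; a date-range predicate on both sides of the join is one way to bound it.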
Put the ORDER BY at the end of the statement, after the GROUP BY.
SELECT AVG(delay_in_seconds) from A_TABLE GROUP BY row_type ORDER by created_at DESC limit 1000;
read mysql dev site for details.

How do I use MAX() to return the row that has the max value?

I have a table orders with fields id, customer_id and amt.
I want to get the customer_id with the largest amt, and the value of that amt.
I made the query:
SELECT customer_id, MAX(amt) FROM orders;
But the result of this query contained an incorrect value of customer_id.
Then I built this query:
SELECT customer_id, MAX(amt) AS maximum FROM orders GROUP BY customer_id ORDER BY maximum DESC LIMIT 1;
and got the correct result.
But I do not understand why my first query did not work properly. What am I doing wrong?
And is it possible to change my second query to obtain the information I need in a simpler and more competent way?
MySQL will allow you to leave GROUP BY off of a query, thus returning the MAX(amt) in the entire table with an arbitrary customer_id. Most other RDBMS require the GROUP BY clause when using an aggregate.
I don't see anything wrong with your 2nd query -- there are other ways to do it, but yours will work fine.
Some versions of SQL give you a warning or error when you select a field, have an aggregate operator like MAX or SUM, and the field you are selecting does not appear in GROUP BY.
You need a more complicated query to fetch the customer_id corresponding to the max amt. Unfortunately SQL is not as naive as you think. One such way to do this is:
select customer_id from orders where amt = ( select max(amt) from orders);
Although a solution using joins is likely more performant.
To understand why what you were trying to do doesn't make sense, replace MAX with SUM. From the stance of how aggregate operators are interpreted, it's a mere coincidence that MAX returns something that corresponds to an actual row. SUM does not have this property, for instance.
Practically your first query can be seen as if it were GROUP BY-ed into a big single group.
Also, MySQL is free to choose each output value from different source rows from the same group.
http://dev.mysql.com/doc/refman/5.7/en/group-by-extensions.html
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause.
The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
The problem with MAX() is that it will select the highest value of the specified field, considering that field alone. The other values in the same row are not considered or given any preference in the result. MySQL will usually return whatever value is in the first row of the GROUP (in this case the GROUP is composed of the entire table, since no group was specified), dropping the information of the other rows during the aggregation.
To solve this, you could do this:
SELECT customer_id, amt FROM orders ORDER BY amt DESC LIMIT 1
It should return the customer_id and the highest amt while preserving the relation between both, because no aggregation was made.

How do records get ordered in a mysql group by?

so suppose I do
SELECT * FROM table t GROUP BY t.id
Suppose there are multiple rows in the table with the same id; only one row for that id will ultimately come out. I suppose MySQL orders the rows that have the same id and then returns the first one or something. My question is: how does MySQL perform this ordering, and is there a way I can control it so that, for instance, it uses a certain field?
You can GROUP BY several different columns to arrange the order for which the result is grouped.
SELECT * FROM table t GROUP BY t.id, t.foo, t.bar
In strictly correct SQL, when you use GROUP BY all the values being selected must be either columns named in the GROUP BY clause or aggregate functions. MySQL allows you to violate this rule, unless you set the only_full_group_by mode. But although it allows you to perform such queries, it doesn't specify which row will be selected for each grouped column.
If you want to select a row that corresponds to the max or min of some other column, you can do something like this:
SELECT a.*
FROM table a
JOIN (SELECT id, max(somecol) maxcol
FROM table
GROUP BY id) b
ON a.id = b.id
AND a.somecol = b.maxcol
Note that this can still return multiple rows per ID if they both have the max value. You can add a final GROUP BY clause and it will select one of them arbitrarily.
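The tie behaviour is easy to see with a small sketch, again using SQLite through Python's sqlite3 module as a stand-in for MySQL. The table name items and the unique key column pk are hypothetical (the answer only uses the placeholder name table), and the rows are invented. The second query shows one deterministic alternative to an arbitrary pick: keep the tied row with the lowest pk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: pk is a unique key, id is the grouping column.
conn.execute("CREATE TABLE items (pk INTEGER PRIMARY KEY, id INTEGER, somecol INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 1, 9), (3, 1, 9), (4, 2, 3)],  # pk 2 and 3 tie for id 1
)

max_join = """
    FROM items AS a
    JOIN (SELECT id, MAX(somecol) AS maxcol
          FROM items
          GROUP BY id) AS b
      ON a.id = b.id
     AND a.somecol = b.maxcol
"""

# Plain join-back: both tied rows for id 1 are returned.
with_ties = conn.execute(
    "SELECT a.pk, a.id, a.somecol" + max_join + "ORDER BY a.pk"
).fetchall()

# Tie-break: group the joined rows and keep the lowest pk per id.
# Every selected column is grouped or aggregated, so strict modes accept it.
one_per_id = conn.execute(
    "SELECT a.id, a.somecol, MIN(a.pk) AS pk"
    + max_join
    + "GROUP BY a.id, a.somecol ORDER BY a.id"
).fetchall()

print(with_ties)
print(one_per_id)
```

Only pk is reduced by the tie-break here; fetching further columns of the winning row would need one more join back on pk.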
When using this MySQL extension to the standard SQL GROUP BY functionality you cannot control which values for the non-aggregated, non-GROUP BY columns are selected by the server. The MySQL documentation discusses this specific case and states clearly that
The server is free to choose any value from each group, so unless they
are the same, the values chosen are indeterminate. Furthermore, the
selection of values from each group cannot be influenced by adding an
ORDER BY clause.