MySQL query with GROUP BY behaving differently between MySQL versions?


I have two MySQL tables - equipment and calibration, where equipment represents an inventory of equipment and calibration holds records for each equipment calibration. One equipment will have multiple calibrations.
In MySQL 5.5 the following query was fully working to identify equipment where the most recent calibration has expired:
SELECT * FROM equipment AS e
LEFT JOIN (
SELECT calibration_id, equipment_id, calibration_company, certificate_no, date_certified, date_nextdue
FROM (
SELECT calibration_id, equipment_id, calibration_company, certificate_no, date_certified, date_nextdue
FROM calibration
WHERE deleted=0
ORDER BY date_certified DESC
) AS a GROUP BY a.equipment_id
) AS c ON c.equipment_id=e.equipment_id
WHERE e.deleted=0 AND c.date_nextdue <= CURRENT_DATE()
However in MySQL 5.7 the same SQL query works but returns rows including the oldest calibration not the most recent.
I've been experimenting with different joins but all need the GROUP BY facility and that seems to be where this all goes wrong.
While the above is a fictional example I have a lot of queries that are very similar in structure and behaving the same way. My question in two parts is:
Why is this behaving differently in MySQL 5.7, and
What changes do I need to make to the SQL to get it to function as desired in MySQL 5.7?
Thank you for your help.

Your original query has non-aggregated columns in the select clause that do not belong to the group by clause. While MySQL might allow that, which value will be picked is actually undefined, meaning that it is not guaranteed to be consistent over consecutive executions.
Preventing developers from falling into this trap is one of the reasons why MySQL enables ONLY_FULL_GROUP_BY by default starting with version 5.7 (since your query still runs, this SQL mode must have been explicitly disabled in the configuration of your new database server).
If I understood your query correctly, you can use a correlated subquery instead:
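select e.*
from equipment e
where e.deleted = 0
and (
-- next-due date of the most recent non-deleted calibration
select c.date_nextdue
from calibration c
where c.equipment_id = e.equipment_id and c.deleted = 0
order by c.date_certified desc
limit 1
) <= current_date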
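The same "most recent calibration per equipment" logic can also be written as a join against a per-equipment MAX(date_certified), which avoids relying on any ordering inside a derived table. The following is only a sketch, run against an in-memory SQLite database through Python's sqlite3 module; the column list is trimmed and the sample rows are invented for illustration, and it assumes date_certified is unique per equipment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE equipment (equipment_id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER);
CREATE TABLE calibration (
    calibration_id INTEGER PRIMARY KEY, equipment_id INTEGER,
    date_certified TEXT, date_nextdue TEXT, deleted INTEGER);
-- Invented sample data: equipment 1 is still in date, equipment 2 has expired.
INSERT INTO equipment VALUES (1, 'scale', 0), (2, 'gauge', 0);
INSERT INTO calibration VALUES
    (1, 1, '2015-01-01', '2016-01-01', 0),
    (2, 1, '2019-01-01', '2999-01-01', 0),
    (3, 2, '2014-06-01', '2999-06-01', 0),
    (4, 2, '2016-06-01', '2017-06-01', 0);
""")

expired = conn.execute("""
SELECT e.equipment_id, c.calibration_id, c.date_nextdue
FROM equipment AS e
JOIN (
    -- one row per equipment: the date of its latest calibration
    SELECT equipment_id, MAX(date_certified) AS date_certified
    FROM calibration
    WHERE deleted = 0
    GROUP BY equipment_id
) AS latest ON latest.equipment_id = e.equipment_id
JOIN calibration AS c
  ON c.equipment_id = latest.equipment_id
 AND c.date_certified = latest.date_certified
 AND c.deleted = 0
WHERE e.deleted = 0 AND c.date_nextdue <= CURRENT_DATE
ORDER BY e.equipment_id
""").fetchall()

print(expired)
```

Only equipment 2 is reported, because its latest calibration (not its oldest) is the one that has lapsed.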

Related

Correct format for Select in SQL Server

I have what should be a simple query for any database and which always runs in MySQL but not in SQL Server
select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
group by assetid
order by ts desc
The error is:
column tagalerts.id is invalid in the select list because it is not contained in either an aggregate function or the group by clause.
It is not a simple case of adding tagalerts.id to the group by clause because the error repeats for ts and for assetid etc, implying that all the selects need to be in a group or in aggregate functions... either of which will result in a meaningless and inaccurate result.
Splitting the select into a subquery to sort and group correctly (which again works fine with MySQL, as you would expect) makes matters worse
SELECT * from
(select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
order by ts desc
)T1
group by assetid
This fails with the error: the order by clause is invalid in views, inline functions, derived tables and expressions unless TOP etc is used
the 'correct output' should be
id ts assetid zonename battlevel
1234 a datetime 1569 Reception 0
3182 another datetime 1572 Reception 0
Either I am reading SQL Server's rules entirely wrong or this is a major flaw with that database.
How can I write this to work on both systems?

In most databases you can't just include columns that aren't in the GROUP BY without using an aggregate function.
MySql is an exception to that. But MS SQL Server isn't.
So you could keep that GROUP BY with only the "assetid".
But then use the appropriate aggregate functions for all the other columns.
Also, use the JOIN syntax for heaven's pudding sake.
A SQL like select * from table1, table2 where table1.id2 = table2.id is using a syntax from the previous century.
SELECT
MAX(ta.id) AS id,
MAX(ta.ts) AS ts,
ta.assetid,
MAX(node.zonename) AS zonename,
MAX(ta.battlevel) AS battlevel
FROM tagalerts AS ta
JOIN node ON node.id = ta.nodeid
WHERE ta.ack = 0
GROUP BY ta.assetid
ORDER BY MAX(ta.ts) DESC;
Another trick to use in MS SQL Server is the window function ROW_NUMBER.
But this is probably not what you need.
Example:
SELECT id, ts, assetid, zonename, battlevel
FROM
(
SELECT
ta.id,
ta.ts,
ta.assetid,
node.zonename,
ta.battlevel,
ROW_NUMBER() OVER (PARTITION BY ta.assetid ORDER BY ta.ts DESC) AS rn
FROM tagalerts AS ta
JOIN node ON node.id = ta.nodeid
WHERE ta.ack = 0
) q
WHERE rn = 1
ORDER BY ts DESC;

I strongly suspect this query is WRONG even in MySql.
We're missing a lot of details (sample data, and we don't know which table all of the columns belong to), but what I do know is you're grouping by assetid, where it looks like one assetid value could have more than one ts (timestamp) value in the group. It also looks like you're counting on the order by ts desc to ensure both that you see recent timestamps in the results first and that each assetid group uses the most recent possible ts timestamp for that group.
MySql only guarantees the former, not the latter. Nothing in this query guarantees that each assetid is using the most recent timestamp available. You could be seeing the wrong timestamps, and then also using those wrong timestamps for the order by. This is the problem the Sql Server rule is there to stop. MySql violates the SQL standard to allow you to write that wrong query.
Instead, you need to look at each column and either add it to the group by (best when all of the values are known to be the same, anyway) or wrap it in an aggregate function like MAX(), MIN(), AVG(), etc, so there is a deterministic result for which value from the group is used.
If all of the values for a column in a group are the same, then there's no problem adding it to the group by. If the values are different, you want to be precise about which one is chosen for the result set.
While I'm here, the tagalerts, node join syntax has been obsolete for more than 20 years now. It's also good practice to use an alias with every table and prefix every column with the alias. I mention these to explain why I changed it for my code sample below, though I only prefix columns where I am confident in which table the column belongs to.
This query should run on both databases:
SELECT ta.assetid, MAX(ta.id) "id", MAX(ta.ts) "ts",
MAX(n.zonename) "zonename", MAX(battlevel) "battlevel"
FROM tagalerts ta
INNER JOIN node n ON ta.nodeid = n.id
WHERE ack = 0
GROUP BY ta.assetid
ORDER BY ts DESC
There is also a concern here that the results may be choosing values from different records in the joined node table. So if battlevel is part of the node table, you might see a result that matches a zonename with a battlevel that never occurs in any record in the data. In Sql Server, this is easily fixed by using APPLY to match only one node record to each tagalert. MySql doesn't support this (APPLY or an equivalent has been in every other major database since at least 2012), but you can simulate it in this case with two JOINs, where the first join is a subquery that uses GROUP BY to determine the values that will uniquely identify the needed record, and the second join is to the node table to actually produce that record. Unfortunately, we need to know more about the tables in question to actually write this code for you.
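A rough sketch of that two-JOIN pattern, run against SQLite through Python's sqlite3 module so it can be tried without either server. The table layout is an assumption (ts, battlevel and ack are placed in tagalerts, zonename in node), the sample rows are invented, and it assumes ts is unique within each assetid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed layout: the question does not say which table each column lives in.
CREATE TABLE node (id INTEGER PRIMARY KEY, zonename TEXT);
CREATE TABLE tagalerts (
    id INTEGER PRIMARY KEY, ts TEXT, assetid INTEGER,
    nodeid INTEGER, battlevel INTEGER, ack INTEGER);
INSERT INTO node VALUES (1, 'Reception'), (2, 'Lobby');
INSERT INTO tagalerts VALUES
    (1230, '2020-01-01 08:00', 1569, 2, 5, 0),
    (1234, '2020-01-02 09:00', 1569, 1, 0, 0),
    (3180, '2020-01-01 10:00', 1572, 2, 3, 0),
    (3182, '2020-01-03 11:00', 1572, 1, 0, 0),
    (4000, '2020-01-04 12:00', 1572, 2, 9, 1);  -- acknowledged, must be ignored
""")

latest_alerts = conn.execute("""
SELECT ta.id, ta.ts, ta.assetid, n.zonename, ta.battlevel
FROM (
    -- first join source: the key (assetid, newest ts) that identifies one alert
    SELECT assetid, MAX(ts) AS ts
    FROM tagalerts
    WHERE ack = 0
    GROUP BY assetid
) AS latest
JOIN tagalerts AS ta
  ON ta.assetid = latest.assetid AND ta.ts = latest.ts AND ta.ack = 0
-- second join: fetch the matching node row for that single alert
JOIN node AS n ON n.id = ta.nodeid
ORDER BY ta.ts DESC
""").fetchall()

for row in latest_alerts:
    print(row)
```

Every column in each output row comes from the same tagalerts record, so no zonename is paired with a battlevel from a different alert.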

Knowage autogenerated query understanding

I'm using Knowage for data analysis and I'm facing performance issues, so I'm watching the 'dataset audit' log to see which queries the system performs. I found this one which, to me, makes no sense:
SELECT COUNT(*)
FROM
(select TOP(100) PERCENT "ATC_1" AS "ATC_1"
from
(SELECT [ID_AFo]
,[ATC]
,[ATC_1]
,[ATC_3]
,[ATC_4]
,[ATC_5]
FROM [AFO]
) T order by "ATC_1" ASC
) u
The inner T query is the dataset definition query I entered, which is basically a select * from [AFO] on my table; the outer wrappers are generated by Knowage (I never wrote them).
Wouldn't a select count(*) from T have performed the same calculation while avoiding an expensive order by?
EDIT:
Backend (data source) is MSSQL, cache server is MYSQL so frequent queries are on mysql

This query is equivalent to:
SELECT COUNT(*)
FROM [AFO];
The only reason that I can think of for constructing such a query is if the "100" could be set to another value. I'm not sure if SQL Server's optimizer is good enough to eliminate the ORDER BY in the subquery.

MySQL Union sort works unexpected at different server

Good day,
I have a problem with a "simple" query. When I execute it on a different server I get a different result set from the one I need.
I tried re-importing all the tables via export->import and it's still not working.
Where could the problem be? Could the problem be in MariaDB?
Database versions:
5.6.27 - MySQL Community Server (GPL), client: libmysql 5.0.11-dev
10.0.25-MariaDB-0+deb8u1 - (Debian), client: libmysql 5.5.49
Both running on MyISAM engine.
Query:
SELECT id, datum, ordinary FROM (SELECT *, 0 as `ordinary` FROM
`user_todolist` WHERE `done` = '0'
AND `deleted` = '0' AND `id_uzivatel` = '1' ORDER BY `datum` ASC) AS a1
UNION
SELECT id, datum, ordinary FROM (SELECT *, 1 as `ordinary` FROM
`user_todolist` WHERE `done` = '1' AND `deleted` = '0' AND
`id_uzivatel` = '1' ORDER BY `datum` DESC) AS a2
ORDER BY `ordinary`
[Screenshots not reproduced: result sets (left: expected, right: invalid) and EXPLAIN output (top: expected, bottom: invalid).]

ORDER BY is not necessarily a stable sort. When you apply ORDER BY to the result of the UNION, it can reorder within the groups.
You don't need ORDER BY ordinary in the outer query. When you use UNION, the results are ordinarily in the order of the sub-queries, so the results of the first SELECT will come first, and the second SELECT after that.
You should change UNION to UNION ALL, though. By default, it's UNION DISTINCT, which means it has to combine the results of the queries to remove duplicates. Since there can never be duplicates between the queries (since they have different ordinary columns) this is unnecessary.
Another solution that doesn't rely on this (I'm not actually sure if it's guaranteed) is to take ORDER BY datum out of the subqueries, and make the main query use:
ORDER by ordinary, IF(ordinary = 0, datum, '') ASC, IF(ordinary = 1, datum, '') DESC

ANSI standard says that the ORDER BY in a subquery can be ignored. This is equivalent to saying that a table has no intrinsic order to the rows.
Recently both Oracle and MariaDB (apparently independently) started taking advantage of this standard.
UNION ALL is appropriate since there is no overlap of values, and since it is faster than UNION DISTINCT, due to the absence of a de-dup pass.
UNION has traditionally been implemented by creating a temp table, feeding rows from one select into it, then the next select.
Recently the need for the temp table was eliminated in certain situations. This is a nice optimization.
In the future, I expect multiple threads to perform the SELECTs in parallel. This will really invalidate any assumptions you may enjoy today about how things are ordered.
Bottom line: Remove the internal ORDER BYs and add an external ORDER BY, such as @Barmar's suggestion.
That way, your query will work 'correctly' in all past, current, and future versions of MySQL/MariaDB. (I was first burned by the issue several years ago: Here.)
Meanwhile, switch to InnoDB, before MyISAM is totally removed.
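To illustrate the "no inner ORDER BY, one outer ORDER BY" approach, here is a sketch run against SQLite through Python's sqlite3 module. It uses CASE expressions in place of MySQL's IF(), wraps the UNION ALL in a derived table so that the outer ORDER BY may use expressions, and relies on invented sample rows; column types are simplified to integers and ISO date strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_todolist (
    id INTEGER PRIMARY KEY, datum TEXT,
    done INTEGER, deleted INTEGER, id_uzivatel INTEGER);
-- Invented sample rows: two open items, two finished items, one deleted item.
INSERT INTO user_todolist VALUES
    (1, '2020-01-03', 0, 0, 1),
    (2, '2020-01-01', 0, 0, 1),
    (3, '2020-01-02', 1, 0, 1),
    (4, '2020-01-05', 1, 0, 1),
    (5, '2020-01-04', 0, 1, 1);
""")

todo_rows = conn.execute("""
SELECT id, datum, ordinary
FROM (
    SELECT id, datum, 0 AS ordinary
    FROM user_todolist
    WHERE done = 0 AND deleted = 0 AND id_uzivatel = 1
    UNION ALL
    SELECT id, datum, 1 AS ordinary
    FROM user_todolist
    WHERE done = 1 AND deleted = 0 AND id_uzivatel = 1
) AS u
ORDER BY ordinary,
         CASE WHEN ordinary = 0 THEN datum END ASC,   -- open items: oldest first
         CASE WHEN ordinary = 1 THEN datum END DESC   -- done items: newest first
""").fetchall()

for row in todo_rows:
    print(row)
```

All ordering is stated once, at the outermost level, so the result does not depend on how the server materialises the subqueries.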

Is there ANY_VALUE capability for mysql 5.6?

Currently I'm working with MySQL 5.7 in development and 5.6 in production. Each time I run a query with a GROUP BY in development I get an error like "Error Code: 1055. Expression #1 of SELECT list is not in GROUP BY "
Here is the query.
SELECT c.id, c.name, i.*
FROM countries c, images i
WHERE i.country_id = c.id
GROUP BY c.id;
Fixed for 5.7:
SELECT c.id, c.name,
ANY_VALUE(i.url) url,
ANY_VALUE(i.lat) lat,
ANY_VALUE(i.lng) lng
FROM countries c, images i
WHERE i.country_id = c.id
GROUP BY c.id;
To solve that I use the ANY_VALUE function from MySQL 5.7, but the main issue is that it's not available in MySQL 5.6.
So if I fix the SQL statement for development I will get an error in production.
Do you know of any solution or polyfill for the ANY_VALUE function in MySQL 5.6?

You're misusing the notorious nonstandard MySQL extension to GROUP BY. Standard SQL will always reject your query, because you're mentioning columns that aren't aggregates and aren't mentioned in GROUP BY. In your dev system you're trying to work around that with ANY_VALUE().
In production, you can turn off the ONLY_FULL_GROUP_BY MySQL Mode. Try doing this:
SET @mode := @@SESSION.sql_mode;
SET SESSION sql_mode = '';
/* your query here */
SET SESSION sql_mode = @mode;
This will allow MySQL to accept your query.
But look, your query isn't really correct. When you can persuade it to run, it returns a randomly chosen row from the images table. That sort of indeterminacy often causes confusion for users and your tech support crew.
Why not make the query better, so it chooses a particular image? If your images table has an autoincrement id column, you can do this to select the "first" image.
SELECT c.id, c.name, i.*
FROM countries c
LEFT JOIN (
SELECT MIN(id) id, country_id
FROM images
GROUP BY country_id
) first ON c.id = first.country_id
LEFT JOIN images i ON first.id = i.id
That will return one row per country with a predictable image shown.

Instead of ANY_VALUE, you could use the MIN or MAX aggregate functions.
Alternatively, you might consider not setting the ONLY_FULL_GROUP_BY SQL mode, which is set by default for MySql 5.7, and is responsible for the difference you experience with MySql 5.6. Then you can delay the update of your queries until you have migrated all your environments to MySql 5.7.
Which of the two is the better option, is debatable, but in the long term it will be better to adapt your queries so they adhere to the ONLY_FULL_GROUP_BY rule. Using MIN or MAX can certainly be of use in doing that.
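One caveat with substituting MIN or MAX for ANY_VALUE is that each aggregate is evaluated per column, so the values in one output row may come from different image rows. The sketch below shows this against SQLite through Python's sqlite3 module; the images columns follow the question, while the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE images (id INTEGER PRIMARY KEY, country_id INTEGER, url TEXT, lat REAL);
-- Invented sample rows: two images for the same country.
INSERT INTO countries VALUES (1, 'France');
INSERT INTO images VALUES
    (1, 1, 'b.jpg', 10.0),
    (2, 1, 'a.jpg', 20.0);
""")

# Valid under ONLY_FULL_GROUP_BY rules: every selected column is either
# grouped or aggregated, so the statement needs no ANY_VALUE().
country_rows = conn.execute("""
SELECT c.id, c.name, MIN(i.url) AS url, MIN(i.lat) AS lat
FROM countries AS c
JOIN images AS i ON i.country_id = c.id
GROUP BY c.id, c.name
""").fetchall()

print(country_rows)
```

The row pairs 'a.jpg' with latitude 10.0, a combination that exists in no single image record. When the columns must stay together, pick one image id per country in a subquery and join back to images instead.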

For decades you could write queries that were not valid in standard SQL but were perfectly valid in MySQL:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the select list that are not named in the
GROUP BY clause. For example, this query is illegal in standard SQL
because the nonaggregated name column in the select list does not
appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or
named in the GROUP BY clause.
MySQL extends the standard SQL use of GROUP BY so that the select list
can refer to nonaggregated columns not named in the GROUP BY clause.
This comes from the MySQL 5.6 manual's page on GROUP BY. If you look at the same page for 5.7.6 you see that things have changed. And changed dramatically!!
That page also gives you the solution: disable ONLY_FULL_GROUP_BY. That will make it possible for your old 5.6 query to work on 5.7.6 (remove ANY_VALUE from your query, since it's not available in 5.6, and disable ONLY_FULL_GROUP_BY instead).

miracle in sqlite GROUP BY statement

I have a table in which entries are grouped by the city and side columns, and I need to query the latest entry of each (city, side) group. (A newer entry always has a higher timestamp value.)
In SQLite I can use GROUP BY for the job: http://sqlfiddle.com/#!5/6c1c4/1/0
but in MySQL, it doesn't work in this way: http://sqlfiddle.com/#!9/9ead9/1/0
I think I misuse/abuse GROUP BY here, but how can I write a correct statement for both MySQL and SQLite?

In your examples, both MySQL and SQLite are breaking the SQL standard.
In standard SQL (supported by all major SQL engines), if you use GROUP BY in a SELECT statement, then the only expressions permitted in the SELECT list are either columns listed in GROUP BY, or aggregate function calls (like count(), sum(), avg(), etc) over any other columns.
Most SQL engines (PostgreSQL, Oracle, MSSQL, DB2) follow this rule strictly: they do not permit any other syntax.
Both MySQL and SQLite, however, decided to be more lax in that regard, which I think is a big mistake and an endless source of confusion. While it seems to work, it is absolutely not clear what is really happening. For example, the SQLite timestamp column generated by your query does not look like anything that was present in the original source table.
If you don't want to have any surprises, you should follow the standard. In your case, it means using statement like:
SELECT
min(period),
side,
city,
min(gold),
min(silver),
min(normal),
min(timestamp)
FROM cities
GROUP BY city, side
ORDER BY min(timestamp)
When you use it, both MySQL and SQLite (and any other database for that matter) return identical results: SQLFiddle for MySQL, SQLFiddle for SQLite.
UPDATE:
This statement will do what you want in standards-compliant SQL:
SELECT
c.period,
g.side,
g.city,
c.gold,
c.silver,
c.normal,
g.timestamp
FROM cities c,
(SELECT
side,
city,
max(timestamp) AS timestamp
FROM cities
GROUP BY city, side) g
WHERE c.side = g.side
AND c.city = g.city
AND c.timestamp = g.timestamp
ORDER BY c.timestamp
It generates identical result for both MySQL and SQLite.
(There is only one catch: it assumes that timestamp is unique per group).
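That catch can be seen with a small experiment. The sketch below runs the same join (written with explicit JOIN syntax) against SQLite through Python's sqlite3 module; the column types and sample rows are invented, and two rows of the ('Rome', 'A') group deliberately share the highest timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (
    period INTEGER, side TEXT, city TEXT,
    gold INTEGER, silver INTEGER, normal INTEGER, timestamp INTEGER);
-- Invented sample rows: periods 2 and 3 tie on the newest timestamp for (Rome, A).
INSERT INTO cities VALUES
    (1, 'A', 'Rome', 10, 5, 1, 100),
    (2, 'A', 'Rome', 11, 6, 2, 200),
    (3, 'A', 'Rome', 12, 7, 3, 200),
    (4, 'B', 'Rome', 20, 8, 4, 150);
""")

latest_rows = conn.execute("""
SELECT c.period, c.side, c.city, c.timestamp
FROM cities AS c
JOIN (
    SELECT side, city, MAX(timestamp) AS timestamp
    FROM cities
    GROUP BY city, side
) AS g
  ON c.side = g.side AND c.city = g.city AND c.timestamp = g.timestamp
ORDER BY c.timestamp, c.period
""").fetchall()

for row in latest_rows:
    print(row)
```

The ('Rome', 'A') group comes back twice, once per tied row, so if ties are possible the join needs an additional unique column to break them.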

In the absence of any aggregating functions (and I'm talking specifically about MySQL here), the use of a GROUP BY clause is inappropriate and meaningless. Perhaps you meant to use the DISTINCT operator.
