I have a table in which the (city, side) columns identify a group of entries, and I need to query the latest entry of each (city, side) group. (A newer entry always has a higher timestamp value.)
In SQLite I can use GROUP BY for the job: http://sqlfiddle.com/#!5/6c1c4/1/0
But in MySQL it doesn't work this way: http://sqlfiddle.com/#!9/9ead9/1/0
I think I'm misusing (or abusing) GROUP BY here, but how can I write a correct statement for both MySQL and SQLite?
In your examples, both MySQL and SQLite are breaking the SQL standard.
In standard SQL (supported by all major SQL engines), if you use GROUP BY in a SELECT statement, the only expressions permitted in the SELECT list are columns listed in GROUP BY or aggregate function calls (like count(), sum(), avg(), etc.) over any other columns.
Most SQL engines (PostgreSQL, Oracle, MSSQL, DB2) follow this rule strictly and do not permit any other syntax.
Both MySQL and SQLite, however, decided to be more lax in that regard, which I think is a big mistake and an endless source of confusion. While it seems to work, it is not at all clear what is really happening. For example, the timestamp column that SQLite returns for your query does not look like anything that was present in the original table.
If you don't want any surprises, you should follow the standard. In your case, that means using a statement like:
SELECT
    min(period),
    side,
    city,
    min(gold),
    min(silver),
    min(normal),
    min(timestamp)
FROM cities
GROUP BY city, side
ORDER BY min(timestamp)
When you use it, both MySQL and SQLite (and any other database for that matter) return identical results: SQLFiddle for MySQL, SQLFiddle for SQLite.
UPDATE:
This statement will do what you want in standards-compliant SQL:
SELECT
    c.period,
    g.side,
    g.city,
    c.gold,
    c.silver,
    c.normal,
    g.timestamp
FROM cities c,
    (SELECT
        side,
        city,
        max(timestamp) AS timestamp
    FROM cities
    GROUP BY city, side) g
WHERE c.side = g.side
    AND c.city = g.city
    AND c.timestamp = g.timestamp
ORDER BY c.timestamp
It generates identical results in both MySQL and SQLite.
(There is only one catch: it assumes that timestamp is unique within each group.)
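To see the join-back-to-the-maximum pattern end to end, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a MySQL server, invented sample rows, and the explicit JOIN ... ON spelling of the same query (equivalent to the comma join with WHERE conditions). The last step adds a tied row to show the catch in action.

```python
import sqlite3

# Python's built-in SQLite stands in for MySQL here; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (period INTEGER, side TEXT, city TEXT, "
    "gold INTEGER, silver INTEGER, normal INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "A", "X", 10, 20, 30, 100),
        (2, "A", "X", 11, 21, 31, 200),  # latest row for (X, A)
        (1, "B", "X", 12, 22, 32, 150),  # only row for (X, B)
        (1, "A", "Y", 13, 23, 33, 120),
        (3, "A", "Y", 14, 24, 34, 300),  # latest row for (Y, A)
    ],
)

# Find MAX(timestamp) per (city, side) in a derived table, then join back
# on all three columns to pull the complete row that carries that timestamp.
LATEST_PER_GROUP = """
SELECT c.period, g.side, g.city, c.gold, c.silver, c.normal, g.timestamp
FROM cities c
JOIN (SELECT side, city, MAX(timestamp) AS timestamp
      FROM cities
      GROUP BY city, side) g
  ON c.side = g.side AND c.city = g.city AND c.timestamp = g.timestamp
ORDER BY c.timestamp
"""
rows = conn.execute(LATEST_PER_GROUP).fetchall()
print(rows)

# The catch: a second row sharing the group's maximum timestamp is also returned.
conn.execute("INSERT INTO cities VALUES (4, 'A', 'Y', 15, 25, 35, 300)")
tied = conn.execute(LATEST_PER_GROUP).fetchall()
print(len(tied))  # 4 rows: (Y, A) now appears twice
```

Because the final ORDER BY is only on the timestamp, the two tied rows may come back in either order relative to each other. Only the row count is stable.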
In the absence of any aggregating functions (and I'm talking specifically about MySQL here), the use of a GROUP BY clause is inappropriate and meaningless. Perhaps you meant to use the DISTINCT operator.
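When the SELECT list contains only the grouping columns, GROUP BY with no aggregates and DISTINCT return the same rows. The following sketch shows this with made-up data, using SQLite through Python's sqlite3 module as a stand-in engine. It illustrates the DISTINCT suggestion only and does not select the latest entry per group.

```python
import sqlite3

# SQLite as a stand-in engine; made-up rows with a repeated (city, side) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, side TEXT, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [("X", "A", 100), ("X", "A", 200), ("X", "B", 150), ("Y", "A", 120)],
)

# GROUP BY with no aggregate, selecting only the grouping columns, just removes duplicates...
grouped = conn.execute(
    "SELECT city, side FROM cities GROUP BY city, side ORDER BY city, side"
).fetchall()
# ...which is exactly what DISTINCT expresses, and more clearly.
distinct = conn.execute(
    "SELECT DISTINCT city, side FROM cities ORDER BY city, side"
).fetchall()
print(grouped == distinct, distinct)
```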
Related
https://www.mysqltutorial.org/tryit/query/mysql-inner-join/#2
Hi folks!
I wonder why, after I delete the GROUP BY orderNumber, the query fetches only one row.
Is this a mistake in their "tutorial" database, or is it correct MySQL behavior? If it's correct, why does it produce exactly this result?
SQL "aggregate functions", including SUM(), COUNT(), MIN(), and MAX() among others, require a frame (a set of rows) to aggregate over. GROUP BY is how you specify that frame: the rows are split into one group per distinct value of the GROUP BY columns, and the aggregate is computed once per group.
An aggregate query with no GROUP BY treats all rows matched by the query's WHERE clause as a single group, so you get the SUM() over all of them in exactly one row.
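Here is a minimal sketch of the "one implicit group" rule. It uses SQLite through Python's sqlite3 module and a tiny, made-up version of the tutorial's orders/orderdetails tables that contains only the columns the query needs. The SELECT list keeps only the aggregate, which is the form every engine accepts.

```python
import sqlite3

# SQLite stand-in with a trimmed, made-up version of the tutorial tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER, priceEach REAL);
INSERT INTO orders VALUES (10100, 'Shipped'), (10101, 'Cancelled');
INSERT INTO orderdetails VALUES (10100, 2, 10.0), (10100, 1, 5.0), (10101, 3, 4.0);
""")

# No GROUP BY: every joined row belongs to one implicit group,
# so the query returns a single row summed across all orders.
total_rows = conn.execute("""
SELECT SUM(quantityOrdered * priceEach)
FROM orders AS T1
INNER JOIN orderdetails AS T2 ON T1.orderNumber = T2.orderNumber
""").fetchall()
print(total_rows)  # [(37.0,)]
```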
MySQL is unlike most other RDBMSs in that it allows you to remove the GROUP BY while keeping unaggregated columns in SELECT, and it still returns some rowset from your query. In Oracle, MS SQL Server, or PostgreSQL, the query without the GROUP BY would be rejected with an error. They would also treat it as an error if you used GROUP BY orderNumber while still including status in the SELECT list. A GROUP BY should include every column in the SELECT list that isn't being used in an aggregate such as SUM(), COUNT(), MIN(), MAX(), etc.
But MySQL is lenient about the GROUP BY's absence and instead tries to guess over which frame to apply your SUM() aggregate. Sometimes it gives the answer you were actually expecting, but often the values it returns for the non-aggregated columns are essentially indeterminate. It collapses several possible values down to just one, and you have no way to pick which one you get.
That is the query result you are seeing. MySQL chose orderNumber = 10100 and status = 'Shipped' to go with your SUM() even though they are not specifically related to that sum. The sum in your result 9604190.61 is the sum of quantityOrdered * priceEach for ALL rows in that table despite what the orderNumber says.
Documentation on MySQL's GROUP BY handling
So the most reliable version of your query, where you can actually predict the results, and the only version that would work outside of MySQL, would be:
SELECT
    T1.orderNumber,
    status,
    SUM(quantityOrdered * priceEach) total
FROM
    orders AS T1
INNER JOIN
    orderdetails AS T2 ON T1.orderNumber = T2.orderNumber
GROUP BY
    T1.orderNumber, /* qualified: both tables have an orderNumber column */
    status /* added */
;
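To check that the fully grouped query returns one predictable row per order, the following sketch runs it against SQLite through Python's sqlite3 module, using made-up data for the two tutorial tables. It is not the tutorial's real database.

```python
import sqlite3

# SQLite stand-in with a trimmed, made-up version of the tutorial tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER, priceEach REAL);
INSERT INTO orders VALUES (10100, 'Shipped'), (10101, 'Cancelled');
INSERT INTO orderdetails VALUES (10100, 2, 10.0), (10100, 1, 5.0), (10101, 3, 4.0);
""")

# Every non-aggregated SELECT column is also in GROUP BY, so each row's
# orderNumber and status really belong to the total computed for that group.
# orderNumber is qualified as T1.orderNumber because both tables have that column.
rows = conn.execute("""
SELECT T1.orderNumber, status, SUM(quantityOrdered * priceEach) AS total
FROM orders AS T1
INNER JOIN orderdetails AS T2 ON T1.orderNumber = T2.orderNumber
GROUP BY T1.orderNumber, status
""").fetchall()
print(sorted(rows))  # [(10100, 'Shipped', 25.0), (10101, 'Cancelled', 12.0)]
```

The query has no ORDER BY, so the rows are sorted in Python before printing to keep the output stable.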
Note that the tutorial omitted status from the GROUP BY even though it is in SELECT. That would be an error in most other RDBMS.
MySQL's default handling of this misfeature has changed with recent versions. Prior to 5.7, the ONLY_FULL_GROUP_BY mode was disabled by default, arguably causing a lot of developers to grow dependent on the grouping behavior. In recent versions, ONLY_FULL_GROUP_BY is enabled by default and prevents queries with a missing or incomplete GROUP BY.
Currently I'm working with MySQL 5.7 in development and 5.6 in production. Each time I run a query with a GROUP BY in development, I get an error like "Error Code: 1055. Expression #1 of SELECT list is not in GROUP BY".
Here is the query:
SELECT c.id, c.name, i.*
FROM countries c, images i
WHERE i.country_id = c.id
GROUP BY c.id;
Fixed for 5.7:
SELECT c.id, c.name,
    ANY_VALUE(i.url) url,
    ANY_VALUE(i.lat) lat,
    ANY_VALUE(i.lng) lng
FROM countries c, images i
WHERE i.country_id = c.id
GROUP BY c.id;
To solve that, I use the ANY_VALUE function introduced in MySQL 5.7, but the main issue is that it's not available in MySQL 5.6.
So if I fix the SQL statement for development, I will get an error in production.
Do you know any solution or polyfill for the ANY_VALUE function in MySQL 5.6?
You're misusing the notorious nonstandard MySQL extension to GROUP BY. Standard SQL will always reject your query, because you're mentioning columns that aren't aggregates and aren't mentioned in GROUP BY. In your dev system you're trying to work around that with ANY_VALUE().
On your MySQL 5.7 development server, you can turn off the ONLY_FULL_GROUP_BY SQL mode for the session. Try doing this:
SET @mode := @@SESSION.sql_mode;
SET SESSION sql_mode = '';
/* your query here */
SET SESSION sql_mode = @mode;
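This will allow MySQL to accept your query. Note that setting sql_mode to '' clears every SQL mode for the session, not just ONLY_FULL_GROUP_BY. The last statement restores the modes saved at the start.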
But look, your query isn't really correct. When you persuade it to run, it returns an arbitrarily chosen row from the images table for each country. That sort of indeterminacy often confuses users and your tech support crew.
Why not improve the query so it chooses a particular image? If your images table has an autoincrement id column, you can do this to select the "first" image:
SELECT c.id, c.name, i.*
FROM countries c
LEFT JOIN (
    SELECT MIN(id) id, country_id
    FROM images
    GROUP BY country_id
) first ON c.id = first.country_id
LEFT JOIN images i ON first.id = i.id
That will return one row per country with a predictable image shown.
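The following sketch runs the "first image per country" pattern against SQLite through Python's sqlite3 module, with made-up countries and images. The derived-table alias is renamed to first_img because FIRST is a keyword in recent SQLite versions, and only i.url is selected to keep the output short.

```python
import sqlite3

# SQLite stand-in; table layouts follow the columns mentioned in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE images (id INTEGER PRIMARY KEY AUTOINCREMENT, country_id INTEGER,
                     url TEXT, lat REAL, lng REAL);
INSERT INTO countries VALUES (1, 'France'), (2, 'Japan'), (3, 'Peru');
INSERT INTO images (country_id, url, lat, lng) VALUES
  (1, 'paris.jpg', 48.86, 2.35),
  (1, 'lyon.jpg', 45.76, 4.84),
  (2, 'tokyo.jpg', 35.68, 139.69);
""")

# MIN(id) picks one image per country deterministically; the second join
# fetches that image's full row, so all of its columns belong together.
rows = conn.execute("""
SELECT c.id, c.name, i.url
FROM countries c
LEFT JOIN (SELECT MIN(id) AS id, country_id
           FROM images
           GROUP BY country_id) first_img ON c.id = first_img.country_id
LEFT JOIN images i ON first_img.id = i.id
ORDER BY c.id
""").fetchall()
print(rows)  # [(1, 'France', 'paris.jpg'), (2, 'Japan', 'tokyo.jpg'), (3, 'Peru', None)]
```

Because both joins are LEFT JOINs, a country with no images (Peru here) still appears, with NULL image columns.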
Instead of ANY_VALUE, you could use the MIN or MAX aggregate functions.
Alternatively, you might consider not setting the ONLY_FULL_GROUP_BY SQL mode, which is set by default in MySQL 5.7 and is responsible for the difference you see compared with MySQL 5.6. Then you can delay updating your queries until you have migrated all your environments to MySQL 5.7.
Which of the two is the better option is debatable, but in the long term it is better to adapt your queries so they adhere to the ONLY_FULL_GROUP_BY rule. Using MIN or MAX can certainly help with that.
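The sketch below shows the MIN() variant of the ANY_VALUE query. It uses SQLite through Python's sqlite3 module and two made-up images, and it adds c.name to GROUP BY for portability. It also exposes a caveat: each MIN() is computed independently, so url, lat and lng can come from different rows.

```python
import sqlite3

# SQLite stand-in with two made-up images for one country.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE images (id INTEGER PRIMARY KEY, country_id INTEGER,
                     url TEXT, lat REAL, lng REAL);
INSERT INTO countries VALUES (1, 'France');
INSERT INTO images VALUES
  (1, 1, 'paris.jpg', 48.86, 2.35),
  (2, 1, 'lyon.jpg', 45.76, 4.84);
""")

# MIN() exists in MySQL 5.6 and 5.7 alike, unlike ANY_VALUE().
row = conn.execute("""
SELECT c.id, c.name, MIN(i.url) AS url, MIN(i.lat) AS lat, MIN(i.lng) AS lng
FROM countries c
JOIN images i ON i.country_id = c.id
GROUP BY c.id, c.name
""").fetchone()
print(row)  # (1, 'France', 'lyon.jpg', 45.76, 2.35): url/lat from Lyon, lng from Paris
```

If the columns must describe the same image, picking one row by MIN(id) and joining back to it, as in the answer above, avoids this mixing.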
For decades you could write queries that were not valid in standard SQL but perfectly valid in MySQL:
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the nonaggregated name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the standard SQL use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause.
This comes from the MySQL 5.6 manual's page on GROUP BY. If you look at the same page for 5.7.6, you'll see that things have changed, and changed dramatically!
That page also gives you the solution: disable ONLY_FULL_GROUP_BY. That will make your old 5.6 query work on 5.7.6 (remove ANY_VALUE from your query, since it isn't available in 5.6, and rely on ONLY_FULL_GROUP_BY being disabled instead).
It seems like in version 5.7 of MySQL, they added one nasty thing which was (or still is) a real headache for those who deal with SQL Server.
The thing is: MySQL throws an error, when you try to SELECT DISTINCT rows for one set of columns and want to ORDER BY another set of columns. Previously, in version 5.6 and even in some builds of version 5.7 you could do this, but now it is prohibited (at least by default).
I hope there exists some configuration, some variable that we could set to make it work. But unfortunately I do not know that nasty variable. I hope someone knows that.
EDIT
This is some typical query in my case that worked literally for years (until the last build of MySQL 5.7):
SELECT DISTINCT a.attr_one, a.attr_two, a.attr_three, b.attr_four FROM table_one a
LEFT JOIN table_two b ON b.some_idx = a.idx
ORDER BY b.id_order
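And indeed, if I now add b.id_order to the SELECT list (as MySQL suggests), what I get is rubbish: DISTINCT then also takes b.id_order into account, so rows that differ only in id_order are no longer collapsed.
In most cases, a DISTINCT clause can be considered a special case of GROUP BY. The MySQL 5.7 manual describes ONLY_FULL_GROUP_BY as follows: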
ONLY_FULL_GROUP_BY
MySQL 5.7.5 and up implements detection of functional dependence. If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them. (Before 5.7.5, MySQL does not detect functional dependency and ONLY_FULL_GROUP_BY is not enabled by default. For a description of pre-5.7.5 behavior, see the MySQL 5.6 Reference Manual.)
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Result set sorting occurs after values have been chosen, and ORDER BY does not affect which value within each group the server chooses. Disabling ONLY_FULL_GROUP_BY is useful primarily when you know that, due to some property of the data, all values in each nonaggregated column not named in the GROUP BY are the same for each group.
For more, see http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by
For your particular query:
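SELECT DISTINCT attr_one,
    attr_two,
    attr_three,
    attr_four
FROM
    (SELECT a.attr_one,
        a.attr_two,
        a.attr_three,
        b.attr_four
    FROM table_one a
    LEFT JOIN table_two b ON b.some_idx = a.idx
    ORDER BY b.id_order) tmp
Be aware that the final row order is not guaranteed here. SQL does not promise that an ORDER BY inside a derived table carries over to the outer query, and the MySQL 5.7 manual says the optimizer ignores it when the outer query uses DISTINCT.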
I have read the post you linked, and it seems to give a clear explanation of why the error is thrown and how to avoid it.
In your case, you may want to try the following (untested, of course):
SELECT a.attr_one, a.attr_two, a.attr_three, b.attr_four
FROM table_one a
LEFT JOIN table_two b ON b.some_idx = a.idx
GROUP BY a.attr_one, a.attr_two, a.attr_three, b.attr_four
ORDER BY max(b.id_order)
You should choose whether to use ORDER BY max(b.id_order), ORDER BY min(b.id_order), or another aggregate function.
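To see why the choice of aggregate matters, the following sketch runs the grouped query with both MAX and MIN. It uses SQLite through Python's sqlite3 module and made-up rows in which one group has id_order values 10 and 30. The table and column names come from the question, and the data is hypothetical.

```python
import sqlite3

# SQLite stand-in; one (attr_one..attr_four) combination joins to two id_order values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_one (idx INTEGER, attr_one TEXT, attr_two TEXT, attr_three TEXT);
CREATE TABLE table_two (some_idx INTEGER, attr_four TEXT, id_order INTEGER);
INSERT INTO table_one VALUES (1, 'a', 'b', 'c'), (2, 'd', 'e', 'f');
INSERT INTO table_two VALUES (1, 'x', 30), (1, 'x', 10), (2, 'y', 20);
""")

QUERY = """
SELECT a.attr_one, a.attr_two, a.attr_three, b.attr_four
FROM table_one a
LEFT JOIN table_two b ON b.some_idx = a.idx
GROUP BY a.attr_one, a.attr_two, a.attr_three, b.attr_four
ORDER BY {agg}(b.id_order)
"""
# Each combination appears once; the aggregate decides which id_order ranks it.
by_max = conn.execute(QUERY.format(agg="MAX")).fetchall()
by_min = conn.execute(QUERY.format(agg="MIN")).fetchall()
print(by_max)  # [('d', 'e', 'f', 'y'), ('a', 'b', 'c', 'x')]  ranked by 20 vs 30
print(by_min)  # [('a', 'b', 'c', 'x'), ('d', 'e', 'f', 'y')]  ranked by 10 vs 20
```

The format string only ever receives the two literal names "MAX" and "MIN". It is a convenience for the demo, not a pattern for user input.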
I ran this query in Oracle:
select studentid, attndmark
from attendance_master m,
attendance_detail d
where m.attnid = d.attendid
group by studentid
and got the error:
ORA-00979: not a GROUP BY expression
The error makes sense, and I know the issue with the column list in the SELECT clause. But a similar query is valid in MySQL:
SELECT aff.akey, username
FROM `affiliates` aff,
affstats ast
WHERE aff.akey = ast.akey
group by aff.akey
I need a query trick that runs on Oracle, MySQL, and also MSSQL.
What could the trick be?
MySQL is wrong, in the sense that it does not conform to the SQL standard (or even common sense in this case). It allows columns in the SELECT that are not arguments to aggregation functions and that are not in the GROUP BY. The documentation is explicit that the values come from "indeterminate" rows.
By the way, you should learn proper explicit JOIN syntax. The query can be written as:
SELECT aff.akey, MAX(username)
FROM affiliates aff JOIN
affstats ast
ON aff.akey=ast.akey
GROUP BY aff.akey;
This will work in Oracle and MySQL, as well as in SQL Server.
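A quick sanity check of the rewritten query, run on SQLite through Python's sqlite3 module with made-up data. Two details are assumptions: username is taken to live in affiliates (the question doesn't say), and affstats gets a hypothetical clicks column so it has something besides akey.

```python
import sqlite3

# SQLite stand-in; 'alice' has two affstats rows, so the join duplicates her.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE affiliates (akey INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE affstats (akey INTEGER, clicks INTEGER);  -- 'clicks' is hypothetical
INSERT INTO affiliates VALUES (1, 'alice'), (2, 'bob');
INSERT INTO affstats VALUES (1, 5), (1, 7), (2, 3);
""")

# MAX(username) satisfies the standard rule; since username is constant
# within each akey group, it just returns that single value.
rows = conn.execute("""
SELECT aff.akey, MAX(username)
FROM affiliates aff JOIN
     affstats ast
  ON aff.akey = ast.akey
GROUP BY aff.akey
""").fetchall()
print(sorted(rows))  # [(1, 'alice'), (2, 'bob')]
```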
Surprise -- this is a perfectly valid query in MySQL:
select X, Y from someTable group by X
If you tried this query in Oracle or SQL Server, you’d get the natural error message:
Column 'Y' is invalid in the select list because it is not contained in
either an aggregate function or the GROUP BY clause.
So how does MySQL determine which Y to show for each X? It just picks one. From what I can tell, it just picks the first Y it finds. The rationale being, if Y is neither an aggregate function nor in the group by clause, then specifying “select Y” in your query makes no sense to begin with. Therefore, I as the database engine will return whatever I want, and you’ll like it.
There’s even a MySQL configuration parameter to turn off this “looseness”.
http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by
This article even mentions how MySQL has been criticized for being ANSI-SQL non-compliant in this regard.
http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html
My question is: Why was MySQL designed this way? What was their rationale for breaking with ANSI-SQL?
According to the MySQL 5.0 online manual, it's for better performance and user convenience.
I believe that it was to handle the case where grouping by one field would imply other fields are also being grouped:
SELECT user.id, user.name, COUNT(post.owner_id) AS posts
FROM user
LEFT OUTER JOIN post ON post.owner_id=user.id
GROUP BY user.id
In this case there is exactly one user.name per user.id, so it is convenient not to require user.name in the GROUP BY clause (although, as you say, there is definite scope for problems).
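Because user.name is functionally dependent on user.id, adding it to GROUP BY doesn't change the result, and it makes the query acceptable to strict engines. The sketch below demonstrates this on SQLite through Python's sqlite3 module with made-up users and posts. It counts post.owner_id rather than post.*, so a user with no posts gets 0.

```python
import sqlite3

# SQLite stand-in; 'ann' has two posts, 'ben' has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (id INTEGER PRIMARY KEY, owner_id INTEGER);
INSERT INTO user VALUES (1, 'ann'), (2, 'ben');
INSERT INTO post VALUES (10, 1), (11, 1);
""")

# COUNT(column) skips NULLs, so the LEFT JOIN's NULL row for 'ben' counts as 0.
rows = conn.execute("""
SELECT user.id, user.name, COUNT(post.owner_id) AS posts
FROM user
LEFT OUTER JOIN post ON post.owner_id = user.id
GROUP BY user.id, user.name
ORDER BY user.id
""").fetchall()
print(rows)  # [(1, 'ann', 2), (2, 'ben', 0)]
```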
Unfortunately almost all the SQL varieties have situations where they break ANSI and have unpredictable results.
It sounds to me like they intended it to be treated like the "FIRST(Y)" function that many other systems have.
More than likely, this construct is something that the MySQL team regret, but don't want to stop supporting because of the number of applications that would break.
MySQL treats this as a single-column DISTINCT when you use GROUP BY without an aggregate function. With the other options, you either have the whole result be distinct or have to use subqueries, etc. The question is whether the results are truly predictable.
Also, good info is in this thread.
From what I have read on the MySQL reference page, it says:
"You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group."
I suggest reading this page of the MySQL reference manual:
http://dev.mysql.com/doc/refman/5.5/en/group-by-extensions.html
It's actually a very useful tool that the other fields don't have to be in an aggregate function when you group by a field. You can manipulate which row is returned by simply ordering the rows first and grouping them afterwards. For instance, if I wanted to get user login information and see the last time the user logged in, I would do this.
Tables
USERS
user_id | name
USER_LOGIN_HISTORY
user_id | date_logged_in
USER_LOGIN_HISTORY has multiple rows per user, so if I joined users to it, it would return many rows. As I am only interested in the last entry, I would do this:
select
    user_id,
    name,
    date_logged_in
from (
    select
        u.user_id,
        u.name,
        ulh.date_logged_in
    from users as u
    join user_login_history as ulh
        on u.user_id = ulh.user_id
    where u.user_id = 1234
    order by ulh.date_logged_in desc
) as table1
group by user_id
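This would return one row with the name of the user and the last time that user logged in. (Be aware that this relies on the server keeping the first row of each group, which MySQL does not guarantee: as the manual passage quoted earlier says, ORDER BY does not influence which value within each group the server chooses. With ONLY_FULL_GROUP_BY enabled, the default since MySQL 5.7.5, the query is rejected.)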
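A deterministic way to get the same "last login" row is to aggregate the date directly with MAX() and group by every other selected column. The sketch below shows this on SQLite through Python's sqlite3 module with made-up rows. Dates are stored as ISO-8601 text, which sorts chronologically. That storage choice is an assumption for SQLite; a MySQL DATE or DATETIME column compares the same way under MAX().

```python
import sqlite3

# SQLite stand-in; three made-up logins for user 1234, deliberately out of order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_login_history (user_id INTEGER, date_logged_in TEXT);
INSERT INTO users VALUES (1234, 'carol');
INSERT INTO user_login_history VALUES
  (1234, '2024-01-05'), (1234, '2024-03-02'), (1234, '2023-12-31');
""")

# MAX() states the intent explicitly, so no ordering trick or server choice is involved.
rows = conn.execute("""
SELECT u.user_id, u.name, MAX(ulh.date_logged_in) AS last_login
FROM users AS u
JOIN user_login_history AS ulh ON u.user_id = ulh.user_id
WHERE u.user_id = 1234
GROUP BY u.user_id, u.name
""").fetchall()
print(rows)  # [(1234, 'carol', '2024-03-02')]
```

If more columns from the latest login row were needed, the join-back-to-MAX pattern (a derived table with MAX per user, joined on user and date) would keep them consistent.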