Why does MySQL allow "group by" queries WITHOUT aggregate functions? - mysql

Surprise -- this is a perfectly valid query in MySQL:
select X, Y from someTable group by X
If you tried this query in Oracle or SQL Server, you’d get the natural error message:
Column 'Y' is invalid in the select list because it is not contained in
either an aggregate function or the GROUP BY clause.
So how does MySQL determine which Y to show for each X? It just picks one. From what I can tell, it just picks the first Y it finds. The rationale being, if Y is neither an aggregate function nor in the group by clause, then specifying “select Y” in your query makes no sense to begin with. Therefore, I as the database engine will return whatever I want, and you’ll like it.
There’s even a MySQL configuration parameter to turn off this “looseness”.
http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by
This article even mentions how MySQL has been criticized for being ANSI-SQL non-compliant in this regard.
http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html
My question is: Why was MySQL designed this way? What was their rationale for breaking with ANSI-SQL?

According to this page (the 5.0 online manual), it's for better performance and user convenience.

I believe that it was to handle the case where grouping by one field would imply other fields are also being grouped:
SELECT user.id, user.name, COUNT(post.*) AS posts
FROM user
LEFT OUTER JOIN post ON post.owner_id=user.id
GROUP BY user.id
In this case the user.name will always be unique per user.id, so there is convenience in not requiring the user.name in the GROUP BY clause (although, as you say, there is definite scope for problems)

Unfortunately almost all the SQL varieties have situations where they break ANSI and have unpredictable results.
It sounds to me like they intended it to be treated like the "FIRST(Y)" function that many other systems have.
More than likely, this construct is something that the MySQL team regret, but don't want to stop supporting because of the number of applications that would break.

MySQL treats this is a single column DISTINCT when you use GROUP BY without an aggregate function. Using other options you either have the whole result be distinct, or have to use subqueries, etc. The question is whether the results are truly predictable.
Also, good info is in this thread.

From what I have read in the mysql reference page, it says:
"You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group."
I suggest you to read this page (link to the reference manual of mysql):
http://dev.mysql.com/doc/refman/5.5/en//group-by-extensions.html

Its actually a very useful tool that all other fields dont have to be in an aggregate function when you group by a field. You can manipulate the result which will be returned by simply ordering it first and then grouping it after. for instance if i wanted to get user login information and i wanted to see the last time the user logged in i would do this.
Tables
USER
user_id | name
USER_LOGIN_HISTORY
user_id | date_logged_in
USER_LOGIN_HISTORY has multiple rows for one user so if i joined users to it it would return many rows. as i am only interested in the last entry i would do this
select
user_id,
name,
date_logged_in
from(
select
u.user_id,
u.name,
ulh.date_logged_in
from users as u
join user_login_history as ulh
on u.user_id = ulh.user_id
where u.user_id = 1234
order by ulh.date_logged_in desc
)as table1
group by user_id
This would return one row with the name of the user and the last time that user logged in.

Related

what is the mean of this sql grammar: having 1?

My requirements are: I now have a table, I need to group according to one of the fields, and get the latest record in the group, and then I search the scheme on the Internet,
SELECT
* FROM(
SELECT
*
FROM
record r
WHERE
r.id in (xx,xx,xx) HAVING 1
ORDER BY
r.time DESC
) a
GROUP BY
a.id
, the result is correct, but I can't understand the meaning of "having 1" after the where statement. I hope a friend can give me an answer. Thank you very much.
It does nothing, just like having true would. Presumably it is a placeholder where sometimes additional conditions are applied? But since there is no group by or use of aggregate functions in the subquery, any having conditions are going to be treated no differently than where conditions.
Normally you select rows and apply where conditions, then any grouping (explicit, or implicit as in select count(*)) occurs, and the having clause can specify further constraints after the grouping.
Note that your query is not guaranteed to give the results you want; the order by in the subquery in theory has no effect on the outer query and the optimizer may skip it. It is possible the presence of having makes a difference to the optimizer, but that is not something you should rely on, certainly from one version of mysql to another.

SQL Query: Joining on a SUM()

I'm trying to run a query that sums the value of items and then JOIN on the value of that SUM.
So in the below code, the Contract_For is what I'm trying to Join on, but I'm not sure if that's possible.
SELECT `items_value`.`ContractId` as `Contract`,
`items_value`.`site` as `SiteID`,
SUM(`items_value`.`value`) as `Contract_For`,
`contractitemlists`.`Text` as `Contracted_Text`
FROM items_value
LEFT JOIN contractitemlists ON (`items_value`.`Contract_For`) = `contractitemlists`.`Ref`;
WHERE `items_value`.`ContractID`='2';
When I've face similar issues in the past, I've just created a view that holds the SUM, then joined to that in another view.
At the moment, the above sample is meant to work for just one dummy value, but it's intended to be stored procedure, where the user selects the ContractID. The error I get at the moment is 'Unknown Column items_value.Contract_For
You cannot use aliases or aggregate using expressions from the SELECT clause anywhere but HAVING and ORDER BY*; you need to make the first "part" a subquery, and then JOIN to that.
It might be easier to understand, though a bit oversimplified and not precisely correct, if you look at it this way as far as order of evaluation goes...
FROM (Note: JOIN is only within a FROM)
WHERE
GROUP BY
SELECT
HAVING
ORDER BY
In actual implementation, "under the hood", most SQL implementations actually use information from each section to optimize other sections (like using some where conditions to reduce records JOINed in a FROM); but this is the conceptual order that must be adhered to.
*In some versions of MSSQL, you cannot use aliases from the SELECT in HAVING or ORDER BY either.
Your query needs to be something like this:
SELECT s.*
, `cil`.`Text` as `Contracted_Text`
FROM (
SELECT `iv`.`ContractId` as `Contract`
, `iv`.`site` as `SiteID`
, SUM(`iv`.`value`) as `Contract_For`
FROM items_value AS iv
WHERE `iv`.`ContractID`='2'
) AS s
LEFT JOIN contractitemlists AS cil ON `s`.`Contract_For` = cil.`Ref`
;
But as others have mentioned, the lack of a GROUP BY is something to be looked into; as in "what if there are multiple site values."

MS ACCESS: How to make an update query with group by clause

First, sorry for any inaccuracy in notions, as my native language is not English and I have an access to ACCESS :) only from my university (and now I'm home).
I need a help concerning MS ACCESS program.
In my minimal example have two tables:
user with three columns: id, nick and disabled (true/false type)
warnings with two columns: user_id and date
There's also a relation between these two columns, by the pair id <--> user_id.
I would like to disable all users that have 3 or more warnings.
I can easily make a select query that finds all such users, by using GROUP BY clause:
user.id / group by
warnings.user_id / count / criterion: >=3 / not visible
When I try to change it into update query, the GROUP BY clause disappear. I tried to tackle this problem by saving the select query above as QUERY1 and create another update query that uses QUERY1.
I would need something like:
update all rows from user such that id is inside QUERY1.
I tried putting such statement in two places:
a) as a criterion in user.id row:
user.id / criterion: user.id belongs to QUERY1.
user.disabled / set true
b) as an additional row:
b1)
user.id
user.disabled / set true
expression1: id belongs to QUERY1 / criterion: true
or
b2)
user.id
user.disabled / set true
expression1: look for id in QUERY1 / criterion: not null
Looking for help on the internet I found some information about subqueries, but unfortunately my attempts failed. I also found a function DLookup (I think fruitful for b2 approach). I think I don't understand the use of square brackets in ACCESS syntax.
What is important for me, the solution should be as easy and 'clickable' as possible, as I need to explain it to some group of people that are totally beginners.
I thank you for any help I'll obtain from you.
As noted in the comment above, 'group by' makes no sense directly in an update query. What you need to do is use the results of a grouped query inside an update query. That either means a separate grouped query which is referenced by your update query, or it means a grouped subquery inside your update. I'll show the latter as it is more direct. I'm not totally sure what 'clickable' means to you, but the 2 query approach should be easy enough to work out from the example with a subquery, if you prefer that instead. (subquery is more contained and terse, but introduces the concept of subqueries which might be a complication)
update user
set disabled = true
where id in
(select user_id from warnings group by user_id having count(*) > 2)

MySQL Group By and Order by not playing nicely together

I have read a few post on this, but not seeming to be able to fix my problem.
I am calling two database queries to populate two array's that run along side by side of each other, but they aren't matching, as the order that they come out is different. I believe i have something to do with the Group By, and this may require a sub query, but again a little lost...
Query 1:
SELECT count(bids_bid.total_bid), bidtime_bid, users_usr.company_usr, users_usr.id_usr
FROM bids_bid
INNER JOIN users_usr
ON bids_bid.user_bid = users_usr.id_usr
WHERE auction_bid = 36
GROUP BY user_bid
ORDER BY bidtime_bid ASC
Query 2:
SELECT auction_bid, user_bid, bidtime_bid, bids_bid.total_bid
FROM bids_bid
WHERE auction_bid = 36
ORDER BY bidtime_bid ASC
Even though the 'Order by' is the same the results aren't matching. The users are coming out in a different sequence.
I hope this makes sense, and thanks in advance.
* Update *
I just wanted to add a bit of clarity on what the output I want is. I need to only show 1 result by one user (user_bid) the second query show all users rows. I only need the first one to show the first row entered for each user. So if I could order before the the group and by min date, that would be ace...
It's to be expected. You're fetching fields that are NOT involved in the grouping, and are not part of an aggregate function. MySQL allows such things, but generally the results of the ungrouped/unaggregated functions can be wonky.
Because MySQL is free to chose WHICH of the potentially multiple 'free' rows to choose for the actual result row, you will get different results. Generally it picks the first-encountered 'free choice' result, but that's not defined/guaranteed.
You use grouping when you want unique results in result set according to some
group id (column name). usually grouping is used with aggregate functions such as
(min, max,count,sum..).
Ordering or inner query is nothing to do with result set, i suggest read some introductory
tutorials about grouping and think/treat Sql as a set based language and most of the set theory is applied on sql you'll be fine.
So I was complicating issues that I didn't need to. The solution I found was before.
SELECT users_usr.company_usr,
users_usr.id_usr,
bids_bid.bidtime_bid, min(bidtime_bid) as minbid FROM bids_bid INNER JOIN users_usr ON bids_bid.user_bid = users_usr.id_usr
WHERE auction_bid = 36
GROUP BY id_usr
ORDER BY minbid ASC
Thanks everyone for making me look (try) harder...

Mysql count vs mysql SELECT, which one is faster?

If I want to do a check a name, I want to see how many rows/name exists in the "username" column under users table. Lets say thousands ... hundred of thousands, should I use:
count(name),
count(*) or
SELECT username FROM users where username = 'name'
Which one is the more appropriate? Or they will give same result in term of speed/response?
EDIT:
Thanks guys, I found the answer, count() will definitely faster
Is this query correct
SELECT COUNT( username )
FROM users
WHERE `username` = 'tim'
COUNT(*) and COUNT(Name) might produce different values. COUNT will not include NULL values, so if there are any instances of Name that equal NULL they will not be counted.
COUNT(*) will also perform better than Count(Name). By specifying COUNT(*) you are leaving the optimizer free to use any index it wishes. By specifying COUNT(Name) you are forcing the query engine to use the table, or at least an index that contains the NAME column.
COUNT(name) or COUNT(*) will be somewhat faster because they do not need to return much data.
(see Andrew Shepherd's reply on the semantic difference between these two forms of COUNT, as well as COUNT() ). The focus being to "check a name", these differences matter little with the following trick: Instead than count you can also use
SELECT username FROM users where username = 'name' LIMIT 1;
Which will have the effect of checking (the existence) of the name, but returning as soon at one is found.
Count(*) would be faster as MySQL engine is free to choose index to count.
Select statement produces more traffic if there lot of users with same name(lots of rows instead of one).
Try all three and use whichever preforms the best. If they all preform around the same this is an and example of premature optimization and you should probably just use whichever one you feel most comfortable with and tweak it later if necessary. If you superstitious you could also consider using count(1) which I have been told could performance advantages as well.
I would say you should use a select top 1, but your checking a username which is probably an indexed unique column so theoretically count should preform just as well considering there can only be one.