MySQL: order before grouping by

I have the following:
user_id date_created project_id
3 10/10/2013 1
3 09/10/2013 1
5 10/10/2013 1
8 10/10/2013 1
10 10/10/2013 1
3 08/10/2013 1
The end result I want is:
user_id date_created project_id
3 10/10/2013 1
5 10/10/2013 1
8 10/10/2013 1
10 10/10/2013 1
Context:
I have this thing called an influence, and a user can have many influences for a project.
I want to get a list of the latest influence from each user on a project.
I have tried:
select * from influences
where project_id = 1
group by user_id
ORDER BY created_at DESC
But of course this doesn't first order each user's rows by created_at and then order the full list. It simply squishes each user's rows together and orders the final list.
The Laravel Eloquent version of the answer provided is:
return Influence::select( "user_id", "project_id", DB::raw( "MAX(created_at) as created_at" ) )
->where( "project_id", "=", $projectID )
->groupBy( "user_id", "project_id" )->get();

You don't want to order before group by, because given the structure of your query, it won't necessarily do what you want.
If you want the most recently created influence, then get it explicitly:
select i.*
from influences i join
(select user_id, max(created_at) as maxca
from influences i
where project_id = 1
group by user_id
) iu
on iu.user_id = i.user_id and iu.maxca = i.created_at
where i.project_id = 1;
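The join above can be tried end to end with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (the query uses only standard SQL, so it should behave the same on both), and it rewrites the question's dates as ISO strings, assuming dd/mm/yyyy, so that MAX() compares them chronologically. The trailing ORDER BY is added only to make the output stable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE influences (user_id INTEGER, created_at TEXT, project_id INTEGER)"
)
# Sample rows from the question; dates rewritten as ISO strings (assumption).
conn.executemany(
    "INSERT INTO influences VALUES (?, ?, ?)",
    [
        (3, "2013-10-10", 1),
        (3, "2013-10-09", 1),
        (5, "2013-10-10", 1),
        (8, "2013-10-10", 1),
        (10, "2013-10-10", 1),
        (3, "2013-10-08", 1),
    ],
)

# The derived table finds each user's latest date; the join brings back
# the complete row that carries that date.
LATEST_PER_USER = """
SELECT i.*
FROM influences i
JOIN (SELECT user_id, MAX(created_at) AS maxca
      FROM influences
      WHERE project_id = 1
      GROUP BY user_id) iu
  ON iu.user_id = i.user_id AND iu.maxca = i.created_at
WHERE i.project_id = 1
ORDER BY i.user_id
"""

rows = conn.execute(LATEST_PER_USER).fetchall()
for row in rows:
    print(row)
```

Note that if one user has two rows sharing the same latest created_at, the join returns both of them.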
Your intention is to use a MySQL extension that the documentation explicitly warns against using. You want to include columns in the select that are not in the group by. As the documentation says:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.

Use this:
SELECT user_id, project_id, MAX(date_created) as latest
FROM influences
WHERE project_id = 1
GROUP BY user_id, project_id
How it works: MySQL selects all the rows that match the WHERE conditions and sorts them by user_id and then, for each user_id, by project_id. From each set of rows having the same user_id and project_id it produces a single row in the final result set.
You can use in the SELECT clause the columns used in the GROUP BY clause (user_id and project_id); their values are unambiguous: all the rows from each group have the same user_id and project_id.
You can also use aggregate functions. Each of them uses one column from all the rows in the group to compute a single value. The most recent created_at is, of course, MAX(created_at).
If you select a column that is neither included in the GROUP BY clause nor passed to an aggregate function (like created_at in your query), MySQL has no way to know how to compute its value. Standard SQL forbids this (the query is not valid), but MySQL allows it. It simply picks a value from that column, but there is no way to make it pick the value from a specific row because this is, in fact, undefined behaviour.
You can omit project_id from the GROUP BY clause because the WHERE clause makes all the selected rows have the same project_id. This coincidentally makes the result correct even though project_id does not appear in the GROUP BY clause and is not computed using an aggregate function.
I recommend keeping project_id in the GROUP BY clause. It doesn't affect the result or the query speed, and it allows you to loosen the filtering conditions (e.g. use WHERE project_id IN (1, 2)) and still always get the correct result (this doesn't happen if you remove it from GROUP BY).
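A minimal sketch of the looser filter, again assuming Python's sqlite3 module as a stand-in for MySQL. The two project-2 rows are hypothetical additions so that the IN (1, 2) filter has something to separate; with project_id kept in GROUP BY, each (user, project) pair gets its own row and its own MAX().

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute(
    "CREATE TABLE influences (user_id INTEGER, date_created TEXT, project_id INTEGER)"
)
conn.executemany(
    "INSERT INTO influences VALUES (?, ?, ?)",
    [
        (3, "2013-10-10", 1),
        (3, "2013-10-09", 1),
        (5, "2013-10-10", 1),
        (3, "2013-10-08", 1),
        # Hypothetical rows in project 2, not part of the question's data.
        (3, "2013-10-11", 2),
        (5, "2013-10-07", 2),
    ],
)

# Both selected plain columns are in GROUP BY, so their values are unambiguous;
# the date is aggregated with MAX().
rows = conn.execute(
    """
    SELECT user_id, project_id, MAX(date_created) AS latest
    FROM influences
    WHERE project_id IN (1, 2)
    GROUP BY user_id, project_id
    ORDER BY user_id, project_id
    """
).fetchall()
for row in rows:
    print(row)
```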

Related

MySQL: match an exact set of results in a single table

I am having trouble generating a result set. This is what my user_roles table looks like:
id user_id role_id
1 1 1
2 1 2
3 2 1
4 3 1
5 3 2
6 3 3
... ... ...
I want this result, where the user has exactly both roles, i.e. 1 and 2. I do not want users having roles other than 1 and 2.
id user_id role_id
1 1 1
2 1 2
... ... ...
What I have tried so far:
SELECT
*
FROM
`user_roles`
WHERE `role_id` IN (1,2)
HAVING COUNT(id) = 2
But it returns nothing.
Why your query doesn't work
HAVING applies after GROUP BY, and your query doesn't have one. When a query contains HAVING or aggregate functions but no GROUP BY clause, a single group containing all the selected rows is created.
Before applying HAVING, your query selects the rows having id in 1..5 (i.e. 5 rows). A single group is created from them, COUNT(id) returns 5 and the HAVING condition doesn't match. That's why the query doesn't return anything.
To correctly count the number of roles of each user, the query needs to group the records by user_id:
SELECT `user_id`
FROM `user_roles`
WHERE `role_id` IN (1, 2)
GROUP BY `user_id`
HAVING COUNT(`id`) = 2
This way, the WHERE clause selects the rows with role 1 or 2 (and ignores other roles), the GROUP BY clause allows COUNT(id) to count the selected roles of each user, and the HAVING clause keeps only the users having both roles (1 and 2). The SELECT clause should not contain * because, for the columns that are not in the GROUP BY clause, MySQL is free to pick any value it finds in the corresponding column, and it may return different results on different executions of the query.
However, the query above doesn't return the values you want. It completely ignores the roles that are not 1 or 2, so it also returns the user having user_id = 3.
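To see what the grouped query actually returns on the question's sample rows, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (assumption; the query is standard SQL). The ORDER BY is added only for a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute(
    "CREATE TABLE user_roles (id INTEGER PRIMARY KEY, user_id INTEGER, role_id INTEGER)"
)
conn.executemany(
    "INSERT INTO user_roles VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 1), (5, 3, 2), (6, 3, 3)],
)

# WHERE drops role 3 before grouping, so user 3 still counts exactly 2 roles.
users = [
    row[0]
    for row in conn.execute(
        """
        SELECT user_id
        FROM user_roles
        WHERE role_id IN (1, 2)
        GROUP BY user_id
        HAVING COUNT(id) = 2
        ORDER BY user_id
        """
    )
]
print(users)  # [1, 3]
```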
A query that works
This query returns the users having only the roles 1 and 2:
SELECT `user_id`
FROM `user_roles`
GROUP BY `user_id`
HAVING COUNT(`role_id`) = 2 AND GROUP_CONCAT(`role_id`) = '1,2'
The condition COUNT(role_id) = 2 is not needed. In theory it should improve the execution speed (because counting is faster than string concatenation), but in real life it might have no impact whatsoever. The MySQL engine knows better.
Update
@martin-schneider asks in a comment:
is the order of GROUP_CONCAT(role_id) deterministic? or could it be that the result is '2,1'?
It's a very good question, and the answer is in the documentation of the GROUP_CONCAT() function:
To sort values in the result, use the ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause. The default is ascending order; this may be specified explicitly using the ASC keyword.
The complete query is:
SELECT `user_id`
FROM `user_roles`
GROUP BY `user_id`
HAVING COUNT(`role_id`) = 2
AND GROUP_CONCAT(`role_id` ORDER BY `role_id` ASC SEPARATOR ',') = '1,2'
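In the first version of the query I omitted ORDER BY and SEPARATOR. The comma is indeed the default separator, but "ascending" is only the default direction of an ORDER BY clause that is actually present; without ORDER BY the order of the concatenated values is not guaranteed, so keep the explicit ORDER BY.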
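What the HAVING clause computes can be emulated in plain Python. This is a sketch of the semantics only, not MySQL code: the rows are grouped by user_id, each group's role_ids are sorted (the ORDER BY role_id ASC part), joined with a comma (the SEPARATOR ',' part), and compared to '1,2'. The function name users_with_exact_roles is hypothetical.

```python
from collections import defaultdict

# Rows of user_roles as (id, user_id, role_id), from the question.
USER_ROLES = [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 1), (5, 3, 2), (6, 3, 3)]


def users_with_exact_roles(rows):
    """Emulate HAVING COUNT(role_id) = 2
    AND GROUP_CONCAT(role_id ORDER BY role_id ASC SEPARATOR ',') = '1,2'."""
    groups = defaultdict(list)
    for _id, user_id, role_id in rows:
        groups[user_id].append(role_id)  # GROUP BY user_id
    result = []
    for user_id, roles in groups.items():
        # Sorting first makes the concatenated string independent of row order.
        concat = ",".join(str(r) for r in sorted(roles))
        if len(roles) == 2 and concat == "1,2":
            result.append(user_id)
    return sorted(result)


print(users_with_exact_roles(USER_ROLES))  # [1]
```

Without the sort, a user whose rows arrive as role 2 then role 1 would produce '2,1' and be missed, which is exactly the case the explicit ORDER BY guards against.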
Important to notice
There is a limit on the length of the value computed by the GROUP_CONCAT() function. Its result is truncated to the maximum length stored in the system variable group_concat_max_len, whose default value is 1024.
This value can be increased with a SET statement before running the query:
SET group_concat_max_len = 1000000
However, for this particular query the default limit of 1024 characters is more than enough.
You could aggregate by user_id and use HAVING:
SELECT *
FROM `user_roles`
WHERE `user_id` IN (SELECT user_id
FROM `user_roles`
GROUP BY user_id
HAVING SUM(role_id IN (1,2)) = 2
AND SUM(role_id NOT IN (1,2)) = 0);
Note:
I assumed that the pair (user_id, role_id) is unique and that neither column is NULL.
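A minimal sketch that runs the SUM-based query on the question's rows, using Python's sqlite3 module as a stand-in for MySQL (assumption). Like MySQL, SQLite evaluates role_id IN (1, 2) to 1 or 0, so SUM() counts matching rows. The UNIQUE and NOT NULL constraints encode the assumption stated in the note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute(
    """
    CREATE TABLE user_roles (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        role_id INTEGER NOT NULL,
        UNIQUE (user_id, role_id)
    )
    """
)
conn.executemany(
    "INSERT INTO user_roles VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 1), (5, 3, 2), (6, 3, 3)],
)

# Keep users with exactly two of the wanted roles and none of the others.
rows = conn.execute(
    """
    SELECT *
    FROM user_roles
    WHERE user_id IN (SELECT user_id
                      FROM user_roles
                      GROUP BY user_id
                      HAVING SUM(role_id IN (1, 2)) = 2
                         AND SUM(role_id NOT IN (1, 2)) = 0)
    ORDER BY id
    """
).fetchall()
print(rows)  # [(1, 1, 1), (2, 1, 2)]
```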

MySQL returns different results when using a view and GROUP BY

I get a wrong result when using a view and GROUP BY in MySQL.
A simple table, test:
id name value
1 a 200
2 a 100
3 b 150
4 b NULL
5 c 120
When using the normal syntax:
select * from (select * from test order by name asc, value asc ) as test group by test.name;
it returns
id name value
2 a 100
4 b NULL
5 c 120
However, if I replace the subquery with a view, it shows different results:
create view test_view as select * from test order by name asc, value asc;
select * from test_view as test group by test.name;
it returns
id name value
1 a 200
3 b 150
5 c 120
It really bothers me; could someone give me a hint? Thanks.
Group first and then order the result. Try this; it's simpler and gives the same result:
select * from test group by name order by name asc, value asc
If you really need a subquery, it's the same: group first:
select * from (select * from test group by name) as test order by test.name asc, test.value asc
http://dev.mysql.com/doc/refman/5.5/en/group-by-hidden-columns.html
"MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses."
There is nothing that suggests that your subquery trick should make the difference and assure the deterministic result you are hoping for.
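If the goal is a deterministic "first row per name by value", one swapped-in technique (not what the answers above propose) is a window function, available in MySQL 8.0 and later. The sketch below uses Python's sqlite3 module as a stand-in (assumption; SQLite 3.25+ supports window functions). Both MySQL and SQLite sort NULL first in ascending order, so the picked rows match the result the question expected from its subquery; the id tie-breaker is an added assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL 8.0 (assumption)
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, "a", 200), (2, "a", 100), (3, "b", 150), (4, "b", None), (5, "c", 120)],
)

# Number the rows inside each name explicitly, then keep the first one.
# Unlike GROUP BY with hidden columns, the choice of row is fully specified.
rows = conn.execute(
    """
    SELECT id, name, value
    FROM (SELECT id, name, value,
                 ROW_NUMBER() OVER (PARTITION BY name
                                    ORDER BY value ASC, id ASC) AS rn
          FROM test) ranked
    WHERE rn = 1
    ORDER BY name
    """
).fetchall()
print(rows)  # [(2, 'a', 100), (4, 'b', None), (5, 'c', 120)]
```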

Filter rows in a query using HAVING in MySQL

HAVING is usually used with GROUP BY, but in my query I need it so that I can filter on a derived column.
Sample query:
SELECT
id,
NOW() < expiration_date
OR expiration_date IS NULL AS is_active
FROM
products
HAVING is_active = 1 ;
I could also use a derived table (a subquery) and just use WHERE instead of HAVING, for example:
SELECT id
FROM
(SELECT
id,
NOW() < expiration_date
OR expiration_date IS NULL AS is_active
FROM
products) AS p
WHERE is_active = 1 ;
Either way, I'm getting the desired results, but is it really appropriate to use HAVING without a GROUP BY, just to filter on a derived column? Which one is better?
The second query is better.
By the way, since you only keep the rows where the expression is true, you can even shorten it to:
SELECT
id,
1 AS is_active
FROM
products
WHERE NOW() < expiration_date OR expiration_date IS NULL;
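A minimal sketch of the shortened WHERE query, assuming Python's sqlite3 module as a stand-in for MySQL. SQLite has no NOW(), so a hypothetical fixed timestamp is bound as a parameter instead; the product rows are hypothetical too. Dates are ISO strings, so the string comparison is also chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, expiration_date TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        (1, "2030-01-01 00:00:00"),  # hypothetical: not yet expired
        (2, "2020-01-01 00:00:00"),  # hypothetical: already expired
        (3, None),                   # hypothetical: never expires
    ],
)

NOW = "2025-06-01 12:00:00"  # hypothetical fixed "current time" replacing NOW()

# The condition lives in WHERE, so every returned row is active by construction.
rows = conn.execute(
    """
    SELECT id, 1 AS is_active
    FROM products
    WHERE ? < expiration_date OR expiration_date IS NULL
    ORDER BY id
    """,
    (NOW,),
).fetchall()
print(rows)  # [(1, 1), (3, 1)]
```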
Your first query is not good, mainly because it's not standard SQL and may therefore confuse its readers. The query is not valid in most other DBMSs. HAVING is for aggregated records.
The typical thing is to aggregate with GROUP BY and then filter the results with HAVING. Omitting GROUP BY would usually give you one record (as in select max(col) from mytable). HAVING would in this case filter that one result record, so you get either that one or none. For example: select max(col) as maxc from mytable having maxc > 100.
In MySQL you are allowed to leave non-aggregated columns out of the GROUP BY clause. For instance, select id, name from mytable group by id would give you the id plus a name matching that id (and as there is usually one record per id, you get that one name). In another DBMS you would have to use an aggregate function on name, such as MIN or MAX, or have name in the GROUP BY clause. In MySQL you don't have to. Omitting it means: get one of the values in the group's records.
So your first query reads a bit like: aggregate my data (because you are using HAVING) into one record (as there is no GROUP BY clause), so you'd get one record with an arbitrary id. This is obviously not what the query does, but to tell the truth I wouldn't have been able to tell just from looking at it. I am no MySQL guy, so I would have had to try it to know how it works, had you not told us it works as you expect.

`Count Distinct` and `Group By` produce weird results

Please consider the following query:
SELECT artist.id, COUNT(DISTINCT artist$styles.v_id)
FROM artist
LEFT JOIN artist$styles ON artist$styles.p_id = artist.id
This is the result I get:
id count
1 4
The questions are:
How come it's only selecting one row from the artist table, when there are 4 rows in it and there are no WHERE, HAVING, LIMIT or GROUP BY clauses applied to the query?
There are only three records in artist$styles having p_id of value 1, why is it counting 4?
Why if I add a GROUP BY clause to it I get the correct results?
SELECT artist.id, COUNT(DISTINCT artist$styles.v_id)
FROM artist
LEFT JOIN artist$styles ON artist$styles.p_id = artist.id
GROUP BY artist.id
----
id count
1 3
2 1
3 3
4 1
This all just doesn't make sense to me. Could this be a bug in MySQL? I'm running Community 5.5.25a.
As stated in the manual page on aggregate functions (of which COUNT() is one):
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
As stated in the manual page on GROUP BY with hidden columns:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
In other words, the server has chosen one (indeterminate) value for column artist.id, which in this case happens to be the value 1, whilst it has properly aggregated and counted the result for the COUNT() function.
Because you are then grouping on the correct columns, rather than on all rows.
It's not a bug; this behaviour is documented and by design.
It is a possible bug in MySQL. All non-aggregate columns should be included in the GROUP BY clause. MySQL does not enforce this, and the result is unpredictable and hard to debug. As a rule, always include all non-aggregate columns in the GROUP BY clause. This is how other RDBMSs work.
1. The COUNT function returns a single-row result if you don't use a GROUP BY clause; that's why it returns one row.
2. In your output
id count
1 4
4 is the number of distinct v_id values across all artists, not the count for id 1. It is displayed next to 1 only because a single row is produced.
3. When you use GROUP BY, a group is created for each value of that column; that's why you get that output.
And finally, it's not a bug. MySQL documents this behaviour; you can read about it on the MySQL site.
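The difference between the two queries can be reproduced with a minimal sketch. It assumes Python's sqlite3 module as a stand-in for MySQL and uses a hypothetical table name artist_styles instead of artist$styles. The style rows are made up (hypothetical) so that the per-artist counts match the question's grouped output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE artist_styles (p_id INTEGER, v_id INTEGER)")  # hypothetical name
conn.executemany("INSERT INTO artist VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany(
    "INSERT INTO artist_styles VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 3)],
)

JOIN = "FROM artist a LEFT JOIN artist_styles s ON s.p_id = a.id"

# No GROUP BY: every joined row forms one group, so exactly one row comes back.
# Its a.id is an arbitrary pick; the count covers all artists.
ungrouped = conn.execute(f"SELECT a.id, COUNT(DISTINCT s.v_id) {JOIN}").fetchall()

# GROUP BY a.id: one group, and one count, per artist.
grouped = conn.execute(
    f"SELECT a.id, COUNT(DISTINCT s.v_id) {JOIN} GROUP BY a.id ORDER BY a.id"
).fetchall()

print(len(ungrouped), ungrouped[0][1])  # 1 4
print(grouped)  # [(1, 3), (2, 1), (3, 3), (4, 1)]
```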

MySQL ORDER BY MIN() not matching up with id

I have a database that has the following columns:
-------------------
id|domain|hit_count
-------------------
And I would like to perform this query on it:
SELECT id,MIN(hit_count)
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY MIN(hit_count)
I would like this query to give me the id of the row that had the smallest hit_count for $domain. The only problem is that if I have two rows that have the same domain, say www.bestbuy.com, the query will just group by whichever one came first, and then although I will get the correct lowest hit_count, the id may or may not be the id of the row that has the lowest hit_count.
Does anyone know of a way for me to perform this query and to get the id that matches up with MIN(hit_count)? Thanks!
Try this:
SELECT id,MIN(hit_count),domain FROM table GROUP BY domain HAVING domain='$domain'
See, when you're using aggregates, either via aggregate functions (and min() is such a function) or via the GROUP BY or HAVING clauses, your data is being grouped. In your case it is grouped by domain. You have 2 fields in your select list, id and min(hit_count).
Now, for each group the database knows which hit_count to pick, as you've specified this explicitly via the aggregate function. But what about id: which one should be included?
MySQL silently picks an indeterminate id from each group, which I find an error-prone approach. In most other RDBMSes you will get an error for such a query.
The rule is: if you use aggregates, then all columns should be either arguments of aggregate functions or listed in the GROUP BY clause.
To achieve the desired result, you need a subquery:
SELECT id, domain, hit_count
FROM `table`
WHERE domain = '$domain'
AND hit_count = (SELECT min(hit_count) FROM `table` WHERE domain = '$domain');
I've used backticks, as table is a reserved word in SQL.
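A minimal sketch of the subquery approach, assuming Python's sqlite3 module as a stand-in for MySQL and a hypothetical table name hits instead of the reserved word table. The domain value is bound as a parameter rather than interpolated as '$domain'. The sample rows are hypothetical and include a tie, which shows that this query returns every row sharing the minimum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute(
    "CREATE TABLE hits (id INTEGER PRIMARY KEY, domain TEXT, hit_count INTEGER)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        (1, "www.bestbuy.com", 10),
        (2, "www.bestbuy.com", 3),
        (3, "www.bestbuy.com", 3),  # hypothetical tie with id 2
        (4, "example.com", 1),
    ],
)

domain = "www.bestbuy.com"
# The scalar subquery computes the domain's minimum; the outer query returns
# the complete row(s) carrying it, so id always matches hit_count.
rows = conn.execute(
    """
    SELECT id, domain, hit_count
    FROM hits
    WHERE domain = ?
      AND hit_count = (SELECT MIN(hit_count) FROM hits WHERE domain = ?)
    ORDER BY id
    """,
    (domain, domain),
).fetchall()
print(rows)  # [(2, 'www.bestbuy.com', 3), (3, 'www.bestbuy.com', 3)]
```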
SELECT
id,
hit_count
FROM
table
WHERE
domain='$domain'
AND hit_count = (SELECT MIN(hit_count) FROM table WHERE domain='$domain')
Try this:
SELECT id,hit_count
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY hit_count ASC;
This should also work:
select id, MIN(hit_count) from table where domain="$domain";
I had the same question. Please see it below.
min(column) is not returning me correct data of other columns
You are using GROUP BY, which means each row in the result represents a group of values.
One of those values is the group name (the value of the field you grouped by). The rest are arbitrary values from within that group.
For example the following table:
F1 | F2
1 aa
1 bb
1 cc
2 gg
2 hh
If you group by F1: SELECT F1,F2 from T GROUP BY F1
You will get two rows:
1 and one value from (aa,bb,cc)
2 and one value from (gg,hh)
If you want a deterministic result set, you need to tell the database which aggregate function to apply to each group, for example:
MIN
MAX
COUNT
SUM
etc.
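Applied to the F1/F2 table above, those functions turn "one arbitrary value per group" into a well-defined value per group. A minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL (assumption)
conn.execute("CREATE TABLE T (F1 INTEGER, F2 TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(1, "aa"), (1, "bb"), (1, "cc"), (2, "gg"), (2, "hh")],
)

# Every non-grouped column goes through an aggregate, so each value is defined.
rows = conn.execute(
    "SELECT F1, MIN(F2), MAX(F2), COUNT(*) FROM T GROUP BY F1 ORDER BY F1"
).fetchall()
print(rows)  # [(1, 'aa', 'cc', 3), (2, 'gg', 'hh', 2)]
```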
There is a simpler way: your query is OK, just add the DESC keyword after GROUP BY domain:
SELECT
id,
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
Explanation:
When you use GROUP BY with an aggregate function it always selects the first record, but if you restrict it with the DESC keyword it will select the lowest or last record of that group.
For testing purposes, use this query, which only adds group_concat:
SELECT
group_concat(id),
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
If you can have duplicated domains, group by id:
SELECT id,MIN(hit_count)
FROM domain WHERE domain='$domain'
GROUP BY id ORDER BY MIN(hit_count)