Mysql match exact set of result in a single table - mysql

I am having trouble with generating result set. This is what my 'user_roles' table looks like,
id user_id role_id
1 1 1
2 1 2
3 2 1
4 3 1
5 3 2
6 3 3
... ... ...
I want this result, where the user has exactly both roles, i.e. 1 and 2. I do not want users that have roles other than 1 and 2.
id user_id role_id
1 1 1
2 1 2
... ... ...
I have tried so far,
SELECT
*
FROM
`user_roles`
WHERE `role_id` IN (1,2)
HAVING COUNT(id) = 2
But, it returns null.

Why your query doesn't work
HAVING applies after GROUP BY and your query doesn't have one. When a query contains HAVING or aggregate functions but no GROUP BY clause, a single group containing all the selected rows is created.
Before applying HAVING, your query selects the rows having id in 1..5 (i.e. 5 rows). A single group is created from them, COUNT(id) returns 5 and the HAVING condition doesn't match. That's why the query doesn't return anything.
In order to correctly count the number of roles of each user, the query needs to group the records by user_id:
SELECT `user_id`
FROM `user_roles`
WHERE `role_id` IN (1, 2)
GROUP BY `user_id`
HAVING COUNT(`id`) = 2
This way, the WHERE clause selects the rows having role 1 or 2 (and ignores other roles), the GROUP BY clause lets COUNT(id) count the selected roles of each user, and the HAVING clause keeps only the users having both roles (1 and 2). The SELECT clause should not contain * because, for the columns that are not in the GROUP BY clause, MySQL is free to pick any value it finds in the group and may return different results on different executions of the query. With the ONLY_FULL_GROUP_BY SQL mode enabled (the default since MySQL 5.7.5), such a query is rejected with an error.
However, the query above doesn't return the values you want. It completely ignores the roles that are not 1 or 2, so it also returns the user having user_id = 3.
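The behaviour described above can be checked on the question's six sample rows. This is only a sketch: it runs the same SQL through Python's built-in sqlite3 module, with an in-memory SQLite database standing in for MySQL (an assumption made so the example is self-contained). The ORDER BY is added only to make the output stable.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows mirror the question's sample table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_roles (id INTEGER PRIMARY KEY, user_id INTEGER, role_id INTEGER)"
)
conn.executemany(
    "INSERT INTO user_roles (id, user_id, role_id) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 1), (5, 3, 2), (6, 3, 3)],
)

# WHERE removes the (user 3, role 3) row before grouping,
# so user 3 is left with exactly two rows and passes the HAVING test.
query = """
    SELECT user_id
    FROM user_roles
    WHERE role_id IN (1, 2)
    GROUP BY user_id
    HAVING COUNT(id) = 2
    ORDER BY user_id
"""
users_with_both_roles = [row[0] for row in conn.execute(query)]
print(users_with_both_roles)  # [1, 3]
```

User 3 shows up because the extra role was filtered out before the count was taken, which is exactly the problem the next query has to solve.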
A query that works
The query that returns the users having only the roles 1 and 2 is:
SELECT `user_id`
FROM `user_roles`
GROUP BY `user_id`
HAVING COUNT(`role_id`) = 2 AND GROUP_CONCAT(`role_id`) = '1,2'
The condition COUNT(role_id) = 2 is not strictly needed. In theory it should improve the execution speed (because counting is faster than string concatenation) but in practice it might have no measurable impact. The MySQL optimizer knows better.
Update
@martin-schneider asks in a comment:
is the order of GROUP_CONCAT(role_id) deterministic? or could it be that the result is '2,1'?
It's a very good question, and the answer is in the documentation of the GROUP_CONCAT() function:
To sort values in the result, use the ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause. The default is ascending order; this may be specified explicitly using the ASC keyword.
The complete query is:
SELECT `user_id`
FROM `user_roles`
GROUP BY `user_id`
HAVING COUNT(`role_id`) = 2
AND GROUP_CONCAT(`role_id` ORDER BY `role_id` ASC SEPARATOR ',') = '1,2'
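In the original query I omitted ORDER BY and SEPARATOR. The default separator is a comma, which suits this query. Without ORDER BY, however, the order of the concatenated values is not guaranteed, so ORDER BY has to be stated explicitly for the comparison with '1,2' to be reliable.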
Important to notice
There is a limit on the length of the value computed by the GROUP_CONCAT() function. Its result is truncated to the maximum length given by the system variable group_concat_max_len, whose default value is 1024.
This value can be increased using the SET MySQL statement before running the query:
SET group_concat_max_len = 1000000
However, for this particular query the default limit of 1024 characters is more than enough.

You could aggregate by user_id and use HAVING:
SELECT *
FROM `user_roles`
WHERE `user_id` IN (SELECT user_id
FROM `user_roles`
GROUP BY user_id
HAVING SUM(role_id IN (1,2)) = 2
AND SUM(role_id NOT IN (1,2)) = 0);
Note:
I assumed that each (user_id, role_id) pair is unique and that neither column is NULL.
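As a rough check of the conditional-sum query above, the sketch below runs it against the question's sample rows. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption for the sake of a runnable example). In both engines a comparison such as role_id IN (1,2) evaluates to 1 or 0, which is what makes SUM() usable as a counter. The UNIQUE and NOT NULL constraints encode the assumption stated in the note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE (user_id, role_id) and NOT NULL encode the answer's stated assumption.
conn.execute(
    """
    CREATE TABLE user_roles (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        role_id INTEGER NOT NULL,
        UNIQUE (user_id, role_id)
    )
    """
)
conn.executemany(
    "INSERT INTO user_roles (id, user_id, role_id) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 1), (5, 3, 2), (6, 3, 3)],
)

# First SUM counts the wanted roles per user, second SUM counts any other role.
query = """
    SELECT *
    FROM user_roles
    WHERE user_id IN (SELECT user_id
                      FROM user_roles
                      GROUP BY user_id
                      HAVING SUM(role_id IN (1, 2)) = 2
                         AND SUM(role_id NOT IN (1, 2)) = 0)
    ORDER BY id
"""
exact_match_rows = conn.execute(query).fetchall()
print(exact_match_rows)  # [(1, 1, 1), (2, 1, 2)]
```

User 2 fails the first condition (only one of the two roles) and user 3 fails the second (one extra role), so only user 1's rows remain.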

Related

Optimize range query with group by

Having trouble with a query. Here is the outline -
Table structure:
CREATE TABLE `world` (
`placeRef` int NOT NULL,
`forenameRef` int NOT NULL,
`surnameRef` int NOT NULL,
`incidence` int NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb3;
ALTER TABLE `world`
ADD KEY `surnameRef_forenameRef` (`surnameRef`,`forenameRef`),
ADD KEY `forenameRef_surnameRef` (`forenameRef`,`surnameRef`),
ADD KEY `forenameRef` (`forenameRef`,`placeRef`);
COMMIT;
This table has over 600,000,000 rows and contains data like:
placeRef forenameRef surnameRef incidence
1 1 2 100
2 1 3 600
This represents the number of people with a given forename-surname combination in a place.
I would like to query all the forenames that a surname is attached to, and then search for where those forenames exist, with the summed incidence. For example: get all the forenames of people who have the surname "Smith"; then get a list of all those forenames, grouped by place and with the summed incidence. I can do this with the following query:
SELECT placeRef, SUM( incidence )
FROM world
WHERE forenameRef IN
(
SELECT DISTINCT forenameRef
FROM world
WHERE surnameRef = 214488
)
GROUP BY world.placeRef
However, this query takes about a minute to execute and will take more time if the surname being searched for is common.
The root problem is: performing a range query with a group doesn't utilize the full index.
Any suggestions how the speed could be improved?
In my experience, if your query has a range condition (i.e. any kind of predicate other than = or IS NULL), the column for that condition is the last column in your index that can be used to optimize search, sort, or grouping.
In other words, suppose you have an index on columns (a, b, c).
The following uses all three columns. It can optimize the ORDER BY c because all rows matching the specific values of a and b are by definition tied on those columns, and those matching rows are already in order by c in the index, so the ORDER BY is a no-op.
SELECT * FROM mytable WHERE a = 1 AND b = 2 ORDER BY c;
But the next example only uses columns a, b. The ORDER BY needs to do a filesort, because the index is not in order by c.
SELECT * FROM mytable WHERE a = 1 AND b > 2 ORDER BY c;
A similar effect is true for GROUP BY. The following uses a, b for row selection, and it can also optimize the GROUP BY using the index, because each group of values per distinct value of c is guaranteed to be grouped together in the index. So it can count the rows for each value of c, and when it's done with one group, it is assured there will be no more rows later with that value of c.
SELECT c, COUNT(*) FROM mytable WHERE a = 1 AND b = 2 GROUP BY c;
But the range condition spoils that. The rows for each value of c are not grouped together. It's assumed that the rows for each value of c may be scattered among each of the higher values of b.
SELECT c, COUNT(*) FROM mytable WHERE a = 1 AND b > 2 GROUP BY c;
In this case, MySQL can't optimize the GROUP BY in this query. It must use a temporary table to count the rows per distinct value of c.
MySQL 8.0.13 introduced a new type of optimizer behavior, the Skip Scan Range Access Method. But as far as I know, it only applies to range conditions, not ORDER BY or GROUP BY.
It's still true that if you have a range condition, this spoils the index optimization of ORDER BY and GROUP BY.
Unless I don't understand the task, it seems like this works:
SELECT placeRef, SUM( incidence )
FROM world
WHERE surnameRef = 214488
GROUP BY placeRef;
Give it a try.
It would benefit from a composite index in this order:
INDEX(surnameRef, placeRef, incidence)
Is incidence being updated a lot? If so, leave it out of the index.
You should consider moving from MyISAM to InnoDB. It will need a suitable PK, probably
PRIMARY KEY(placeRef, surnameRef, forenameRef)
and it will take 2x-3x the disk space.
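The answer above is itself unsure whether the simplified query matches the task, and that is easy to test on a small sample before trying it on 600 million rows. The sketch below compares the question's IN-subquery query with the simplified one on the two sample rows from the question. It uses Python's sqlite3 module with an in-memory SQLite database standing in for MySQL, and the surname id 2 is picked from the sample data purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE world (
        placeRef INTEGER NOT NULL,
        forenameRef INTEGER NOT NULL,
        surnameRef INTEGER NOT NULL,
        incidence INTEGER NOT NULL
    )
    """
)
# The two sample rows shown in the question.
conn.executemany("INSERT INTO world VALUES (?, ?, ?, ?)", [(1, 1, 2, 100), (2, 1, 3, 600)])

surname = 2  # illustrative surname id taken from the first sample row

# Question's query: every row whose forename occurs with the surname.
original = conn.execute(
    """
    SELECT placeRef, SUM(incidence)
    FROM world
    WHERE forenameRef IN (SELECT DISTINCT forenameRef FROM world WHERE surnameRef = ?)
    GROUP BY placeRef
    ORDER BY placeRef
    """,
    (surname,),
).fetchall()

# Simplified query: only rows that carry the surname itself.
simplified = conn.execute(
    """
    SELECT placeRef, SUM(incidence)
    FROM world
    WHERE surnameRef = ?
    GROUP BY placeRef
    ORDER BY placeRef
    """,
    (surname,),
).fetchall()

print(original)    # [(1, 100), (2, 600)]
print(simplified)  # [(1, 100)]
```

On this sample the two queries differ, because the original also counts forename 1 where it appears with another surname. Whether that difference matters depends on which of the two results the task actually needs.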

Why does SQL LIMIT clause return random rows for every query?

It is a very simple query. For every query, I get a different result. Similar things happen when I used TOP 1. I would like a random sub-sample and it works. But am I missing something? Why does it return a different value every time?
SELECT DISTINCT user_id FROM table1
where day_id>="2009-01-09" and day_id<"2011-02-16"
LIMIT 1;
There's no guarantee that you will get a random result with your query. It's quite likely you'll get the same result each time (although the actual result returned will be indeterminate). To guarantee that you get a random, unique user_id, you should SELECT a random value from the list of DISTINCT values:
SELECT user_id
FROM (SELECT DISTINCT user_id
FROM table1
WHERE day_id >= "2009-01-09" AND day_id < "2011-02-16"
) u
ORDER BY RAND()
LIMIT 1
SQL result sets are unordered unless you ask for an order. Add an ORDER BY clause, such as:
...
ORDER BY user_id
LIMIT 1
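The two suggestions in this thread can be put side by side in a small sketch. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in (an assumption), so the random ordering is written as RANDOM(), SQLite's counterpart of MySQL's RAND(). The rows in table1 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (user_id INTEGER, day_id TEXT)")
# Hypothetical sample rows; user 9 falls outside the date range.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (7, "2009-03-01"),
        (3, "2010-06-15"),
        (7, "2010-07-20"),
        (5, "2011-01-02"),
        (9, "2012-01-01"),
    ],
)

in_range = "day_id >= '2009-01-09' AND day_id < '2011-02-16'"

# Deterministic: an explicit ORDER BY fixes which row LIMIT 1 returns.
first_user = conn.execute(
    f"SELECT DISTINCT user_id FROM table1 WHERE {in_range} ORDER BY user_id LIMIT 1"
).fetchone()[0]

# Random on purpose: shuffle the distinct ids, then take one.
random_user = conn.execute(
    f"""
    SELECT user_id
    FROM (SELECT DISTINCT user_id FROM table1 WHERE {in_range}) AS u
    ORDER BY RANDOM()
    LIMIT 1
    """
).fetchone()[0]

print(first_user)                 # 3
print(random_user in {3, 5, 7})   # True
```

Without either ORDER BY, the engine is free to return whichever qualifying row it reaches first, which is why the result may or may not change between runs.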

mysql return different result when using view and group by

I get a wrong result when using view and group by in mysql.
A simple table test
id name value
1 a 200
2 a 100
3 b 150
4 b NULL
5 c 120
when using normal syntax as
select * from (select * from test order by name asc, value asc ) as test group by test.name;
it returns
id name value
2 a 100
4 b NULL
5 c 120
however, if I replace the subquery with a view,
it shows different results.
create view test_view as select * from test order by name asc, value asc;
select * from test_view as test group by test.name;
it returns
id name value
1 a 200
3 b 150
5 c 120
it really bothers me, please someone give me some hint. Thanks.
Group first and then order the result. Try this, which is simpler and gives the same result:
select * from test group by name order by name asc, value asc
If you really need a subquery, the idea is the same: group first, then order:
select * from (select * from test group by name) as test order by test.name asc, test.value asc
http://dev.mysql.com/doc/refman/5.5/en/group-by-hidden-columns.html
"MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses."
There is nothing that suggests that your subquery trick should make the difference and assure the deterministic result you are hoping for.

MYSQL to order before grouping by

I have the following:
user_id date_created project_id
3 10/10/2013 1
3 09/10/2013 1
5 10/10/2013 1
8 10/10/2013 1
10 10/10/2013 1
3 08/10/2013 1
The end result i want is:
user_id date_created project_id
3 10/10/2013 1
5 10/10/2013 1
8 10/10/2013 1
10 10/10/2013 1
Context:
I have this thing called an influence, and a user can have many influences for a project.
I want to get a list of the latest influence from a user on a project.
I have tried:
select * from influences
where project_id = 1
group by user_id
ORDER BY created_at DESC
but of course this does not first order each user's rows by created_at and then order the full list. It simply collapses each user's rows into one and orders the final list.
The Laravel Eloquent version of the answer provided is this:
return Influence::select( "user_id", "influence", DB::raw( "MAX(created_at) as created_at" ) )
->where( "project_id", "=", $projectID )
->groupBy( "user_id", "project_id" )->get();
You don't want to order before group by, because given the structure of your query, it won't necessarily do what you want.
If you want the most recently created influence, then get it explicitly:
select i.*
from influences i join
(select user_id, max(created_at) as maxca
from influences i
where project_id = 1
group by user_id
) iu
on iu.user_id = i.user_id and iu.maxca = i.created_at
where i.project_id = 1;
Your intention is to use a MySQL extension that the documentation explicitly warns against using. You want to include columns in the select that are not in the group by. As the documentation says:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
Use this:
SELECT user_id, project_id, MAX(created_at) as latest
FROM influences
WHERE project_id = 1
GROUP BY user_id, project_id
How it works: MySQL selects all the rows that match the WHERE conditions and groups them by user_id and, within each user_id, by project_id. From each set of rows having the same user_id and project_id it produces a single row in the final result set.
You can use in the SELECT clause the columns used in the GROUP BY clause (user_id and project_id); their values are unambiguous: all the rows from each group have the same user_id and project_id.
You can also use aggregate functions. Each of them uses one column from all the rows in the group to compute a single value. The most recent created_at is, of course, MAX(created_at).
If you select a column that is neither included in the GROUP BY clause nor passed to an aggregate function (like created_at in your query), MySQL has no rule for computing its value. Standard SQL forbids this (the query is not valid) but MySQL allows it when the ONLY_FULL_GROUP_BY SQL mode is disabled. It simply picks a value from that column, and there is no way to make it pick the value from a specific row because the choice is indeterminate.
You can omit project_id from the GROUP BY clause because the WHERE clause makes all the selected rows have the same project_id. The result is then correct by coincidence, even though project_id does not appear in the GROUP BY clause and is not computed by an aggregate function.
I recommend keeping project_id in the GROUP BY clause. It doesn't affect the result or the query speed, and it lets you loosen the filtering condition (e.g. use WHERE project_id IN (1, 2)) and still get the correct result (which is not the case if you remove it from GROUP BY).
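Both approaches from this thread can be tried on the question's sample rows. The sketch below uses Python's sqlite3 module with an in-memory SQLite database standing in for MySQL. Several things are assumptions made for illustration: the column is named created_at (as in the asker's query), the dates are read as dd/mm/yyyy and stored as ISO strings so that MAX() compares them correctly, and the id column is added only to tell the rows apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE influences (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        project_id INTEGER,
        created_at TEXT
    )
    """
)
# Sample rows from the question, dates rewritten as ISO strings (assumed dd/mm/yyyy).
conn.executemany(
    "INSERT INTO influences (user_id, project_id, created_at) VALUES (?, ?, ?)",
    [
        (3, 1, "2013-10-10"),
        (3, 1, "2013-10-09"),
        (5, 1, "2013-10-10"),
        (8, 1, "2013-10-10"),
        (10, 1, "2013-10-10"),
        (3, 1, "2013-10-08"),
    ],
)

# Aggregate only: one row per (user, project) with the latest timestamp.
latest = conn.execute(
    """
    SELECT user_id, project_id, MAX(created_at) AS latest
    FROM influences
    WHERE project_id = 1
    GROUP BY user_id, project_id
    ORDER BY user_id
    """
).fetchall()

# Join back to the table to recover the complete latest row per user.
full_rows = conn.execute(
    """
    SELECT i.id, i.user_id, i.project_id, i.created_at
    FROM influences i
    JOIN (SELECT user_id, MAX(created_at) AS maxca
          FROM influences
          WHERE project_id = 1
          GROUP BY user_id) iu
      ON iu.user_id = i.user_id AND iu.maxca = i.created_at
    WHERE i.project_id = 1
    ORDER BY i.user_id
    """
).fetchall()

print([row[0] for row in latest])     # [3, 5, 8, 10]
print([row[0] for row in full_rows])  # [1, 3, 4, 5]
```

The aggregate form is enough when only the timestamp is needed. The join form is the one to reach for when other columns of the latest row are wanted, since those cannot be selected reliably alongside GROUP BY. If two rows of one user share the same maximum created_at, the join returns both.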

MYSQL Finding count of duplicate value

In the query below there is a column called lead_id. I want to find the count of duplicate lead_id values.
In my result, if there are 10 unique lead ids I must get 10,
but it must be grouped by created_time, i.e. if there are 2 unique lead_id values for today's date then the result would be 2.
select
t.created_time,
t.timecreated,
sum(t.suggested_pending_cnt),
sum(t.suggested_dropped_cnt)
from
(select
date_format(timecreated, '%d-%b-%Y') created_time,
timecreated,
case
when source = 2 then 1
else 0
end suggested_pending_cnt,
case
when (source = 2 && directory_status = 4) then 1
else 0
end suggested_dropped_cnt
from
mg_lead_suggested_listing) t
group by t.created_time
order by t.timecreated desc
limit 10
It looks like you want to count the unique ids for a particular date, not the duplicates.
Following is the query to count duplicate records with first_name and last_name in a table.
mysql> SELECT COUNT(*) as repetitions, last_name, first_name
-> FROM person_tbl
-> GROUP BY last_name, first_name
-> HAVING repetitions > 1;
This query will return a list of all the duplicate records in person_tbl table. In general, to identify sets of values that are duplicated, do the following:
Determine which columns contain the values that may be duplicated.
List those columns in the column selection list, along with COUNT(*).
List the columns in the GROUP BY clause as well.
Add a HAVING clause that eliminates unique values by requiring group counts to be greater than one.
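The steps above, and the per-day unique count the question seems to be after, can be sketched as follows. The example uses Python's sqlite3 module with an in-memory SQLite database standing in for MySQL. The table and column names (mg_lead_suggested_listing, lead_id, timecreated) come from the question, while the rows are hypothetical and the table is reduced to the two columns that matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mg_lead_suggested_listing (lead_id INTEGER, timecreated TEXT)")
# Hypothetical rows: leads 101 and 103 occur more than once.
conn.executemany(
    "INSERT INTO mg_lead_suggested_listing VALUES (?, ?)",
    [
        (101, "2024-01-01 09:00:00"),
        (101, "2024-01-01 10:30:00"),
        (102, "2024-01-01 11:00:00"),
        (103, "2024-01-02 08:15:00"),
        (103, "2024-01-02 09:45:00"),
        (103, "2024-01-02 17:00:00"),
    ],
)

# Unique leads per day: COUNT(DISTINCT ...) ignores repeated lead_id values.
unique_per_day = conn.execute(
    """
    SELECT DATE(timecreated) AS created_day, COUNT(DISTINCT lead_id) AS unique_leads
    FROM mg_lead_suggested_listing
    GROUP BY DATE(timecreated)
    ORDER BY created_day
    """
).fetchall()

# Duplicated leads: group by the column and keep the groups larger than one.
duplicates = conn.execute(
    """
    SELECT lead_id, COUNT(*) AS repetitions
    FROM mg_lead_suggested_listing
    GROUP BY lead_id
    HAVING COUNT(*) > 1
    ORDER BY lead_id
    """
).fetchall()

print(unique_per_day)  # [('2024-01-01', 2), ('2024-01-02', 1)]
print(duplicates)      # [(101, 2), (103, 3)]
```

In the question's own query, the same idea would mean adding COUNT(DISTINCT lead_id) to the outer select list, which is grouped by the formatted date.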