Can anyone explain the logic behind the MySQL IN clause? - mysql

Can anyone explain the logic behind the MySQL IN clause and help me understand this issue?
I have a user table, and each user belongs to one or more groups.
The primary keys of the group table are stored in the user table as comma-separated values, as follows:
Query 1. SELECT * FROM user;
+---------+-----------+-------------------------+-----------+
| user_id | user_name | user_email | group_id |
+---------+-----------+-------------------------+-----------+
| 1 | suresh | xxxx#yyyyyyyyyy.com | 22 |
| 2 | sundar | s7sundera#gmail.com | 2 |
| 3 | tester | xxxxxxxx#yyyyyyyyyy.com | 1,2,3,4 |
| 4 | gail | zzzzzz#gmail.com | 1,2,3,4,5 |
+---------+-----------+-------------------------+-----------+
If I use the IN clause with a group id value of 2, I get only one result:
Query 2. SELECT * FROM user WHERE group_id IN(2)
+---------+-----------+---------------------+----------+
| user_id | user_name | user_email | group_id |
+---------+-----------+---------------------+----------+
| 2 | sundar | s7sundera#gmail.com | 2 |
+---------+-----------+---------------------+----------+
If I use the IN clause with the group id values (1,2), I get three results:
Query 3. SELECT * FROM user WHERE group_id IN(1,2)
+---------+-----------+-------------------------+-----------+
| user_id | user_name | user_email | group_id |
+---------+-----------+-------------------------+-----------+
| 2 | sundar | s7sundera#gmail.com | 2 |
| 3 | tester | xxxxxxxx#yyyyyyyyyy.com | 1,2,3,4 |
| 4 | gail | zzzzzz#gmail.com | 1,2,3,4,5 |
+---------+-----------+-------------------------+-----------+
I want to get every user in group 2, like the output above, but it does not work as expected.
Is it possible to get the Query 3 results using this query?
SELECT * FROM user WHERE group_id IN(2)

This is too long to be a comment, but you need to reconsider your current table design. You should not be storing the group_id values as a comma separated list.
Your tables should be structured similar to the following:
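create table user
(
user_id int primary key,
user_name varchar(50),
user_email varchar(100)
);
-- GROUPS is a reserved word in MySQL 8.0 and later, so the name is quoted with backticks
create table `groups`
(
group_id int primary key,
group_name varchar(10)
);
create table user_group
(
user_id int,
group_id int,
primary key (user_id, group_id),
foreign key (user_id) references user (user_id),
foreign key (group_id) references `groups` (group_id)
);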
The user_group table will have a Primary Key of both the user_id and the group_id so you cannot get duplicates and then these columns should be foreign keys to the respective tables. This table will allow you to have multiple groups for each user_id.
Then when you query your tables, the query will be:
select u.user_id,
u.user_name,
u.user_email,
g.group_id
from user u
inner join user_group ug
on u.user_id = ug.user_id
inner join `groups` g
on ug.group_id = g.group_id
See SQL Fiddle with Demo.
If you need to show the group_id values as a comma-separated list for display purposes, you can use GROUP_CONCAT():
select u.user_id,
u.user_name,
u.user_email,
group_concat(g.group_id order by g.group_id) group_id
from user u
inner join user_group ug
on u.user_id = ug.user_id
inner join `groups` g
on ug.group_id = g.group_id
group by u.user_id, u.user_name, u.user_email
See SQL Fiddle with Demo
If you redesign your tables, then when you search it becomes much easier:
select u.user_id,
u.user_name,
u.user_email,
g.group_id
from user u
inner join user_group ug
on u.user_id = ug.user_id
inner join `groups` g
on ug.group_id = g.group_id
where g.group_id in (1, 2)
See SQL Fiddle with Demo
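The redesign above can be tried end to end with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the group names are hypothetical (the question never lists them), and the helper name users_in_groups is made up for illustration.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL here; that substitution is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    user_id    INTEGER PRIMARY KEY,
    user_name  TEXT,
    user_email TEXT
);
CREATE TABLE "groups" (
    group_id   INTEGER PRIMARY KEY,
    group_name TEXT
);
CREATE TABLE user_group (
    user_id  INTEGER REFERENCES user (user_id),
    group_id INTEGER REFERENCES "groups" (group_id),
    PRIMARY KEY (user_id, group_id)
);
""")

# Users from the question; each comma separated list becomes rows in user_group.
users = {
    1: ("suresh", [22]),
    2: ("sundar", [2]),
    3: ("tester", [1, 2, 3, 4]),
    4: ("gail", [1, 2, 3, 4, 5]),
}
for group_id in (1, 2, 3, 4, 5, 22):
    # Group names are hypothetical placeholders.
    conn.execute('INSERT INTO "groups" VALUES (?, ?)', (group_id, f"group{group_id}"))
for user_id, (name, group_ids) in users.items():
    conn.execute("INSERT INTO user (user_id, user_name) VALUES (?, ?)", (user_id, name))
    conn.executemany(
        "INSERT INTO user_group VALUES (?, ?)",
        [(user_id, g) for g in group_ids],
    )


def users_in_groups(conn, group_ids):
    """Return (user_id, user_name) for users in at least one of group_ids."""
    placeholders = ", ".join("?" for _ in group_ids)
    sql = f"""
        SELECT DISTINCT u.user_id, u.user_name
        FROM user u
        JOIN user_group ug ON ug.user_id = u.user_id
        WHERE ug.group_id IN ({placeholders})
        ORDER BY u.user_id
    """
    return conn.execute(sql, list(group_ids)).fetchall()


print(users_in_groups(conn, [2]))     # sundar, tester and gail
print(users_in_groups(conn, [1, 2]))  # the same three users, each listed once
```

With one row per membership, IN (2) finds every user in group 2, which is the result the question was after.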

When you pass 1,2 to the IN operator, you're asking for rows whose group_id matches 1 or 2. Because the column is compared numerically, '1,2,3,4' and '1,2,3,4,5' are both read as 1, which is why all three rows are returned. A column holding comma-separated values violates first normal form, since each column should contain no more than one value. If you want to find a single value in a multi-valued, comma-separated column, you can use FIND_IN_SET.
A normalized schema would look like:
+---------+-----------+-------------------------+
| user_id | user_name | user_email |
+---------+-----------+-------------------------+
| 2 | sundar | s7sundera#gmail.com |
| 3 | tester | xxxxxxxx#yyyyyyyyyy.com |
| 4 | gail | zzzzzz#gmail.com |
+---------+-----------+-------------------------+
+---------+-----------+
| user_id | group_id |
+---------+-----------+
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 4 |
| 4 | 1 |
| 4 | 2 |
| 4 | 3 |
| 4 | 4 |
| 4 | 5 |
+---------+-----------+
+----------+
| group_id |
+----------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+----------+

MySQL doesn't treat comma separated lists as anything more than just a string. When you do WHERE group_id IN(2), it converts group_id to an INT, so it can compare it with 2.
When casting to an INT, MySQL stops at the first non-number character.
For example, '1,2,3,4,5' IN (2) becomes 1 IN (2). Which is FALSE.
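The casting rule can be mimicked outside the database to see why Query 2 and Query 3 return what they do. The sketch below is an approximation in Python, not MySQL's actual implementation: it only handles an optional sign and leading digits (MySQL really compares as floating-point numbers), and the function names are made up for illustration.

```python
import re


def mysql_leading_number(text):
    """Approximate MySQL's string-to-number cast: keep the leading digits, else 0."""
    match = re.match(r"\s*[+-]?\d+", text)
    return int(match.group()) if match else 0


def emulate_in(values, wanted):
    """Keep the strings whose leading number is in the wanted set, like group_id IN (...)."""
    return [v for v in values if mysql_leading_number(v) in wanted]


# The group_id column from the question, one string per row.
group_id_column = ["22", "2", "1,2,3,4", "1,2,3,4,5"]

print(emulate_in(group_id_column, {2}))     # ['2']
print(emulate_in(group_id_column, {1, 2}))  # ['2', '1,2,3,4', '1,2,3,4,5']
```

Both lists line up with the rows returned by Query 2 and Query 3 in the question: the lists starting with 1 only match because 1 happens to be in the IN list.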
You can try to use FIND_IN_SET to do what you want, but it's not very efficient (it can't use indexes, so it needs to read every single row to see if it matches).
WHERE FIND_IN_SET(2, group_id)
To search for multiple values, use OR.
WHERE FIND_IN_SET(1, group_id) OR FIND_IN_SET(2, group_id)
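To make the matching rule concrete, here is a rough Python emulation of what FIND_IN_SET does with a comma-separated string. It is a sketch for illustration only (the function below is not part of any library): it returns the 1-based position of the value in the list, or 0 when the value is absent, and it compares whole items rather than substrings or leading digits.

```python
def find_in_set(needle, str_list):
    """Return the 1-based position of needle in a comma separated string, or 0."""
    items = str_list.split(",") if str_list else []
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0


# The group_id column from the question, one string per row.
group_id_column = ["22", "2", "1,2,3,4", "1,2,3,4,5"]

# Same idea as: WHERE FIND_IN_SET(1, group_id) OR FIND_IN_SET(2, group_id)
matches = [g for g in group_id_column if find_in_set(1, g) or find_in_set(2, g)]
print(matches)  # ['2', '1,2,3,4', '1,2,3,4,5']
```

Note that '22' is not matched by a search for 2, because the comparison is per item.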
The correct way to do this is to create a "link table" that contains one (or more) rows for each user, showing what group(s) they are in.

EXPLANATION
What is the logic of the query SELECT * FROM user WHERE group_id IN(1,2); ?
You gave a list of numbers (1,2)
The group_id was being compared numerically
Any row whose group_id, read up to the first comma, numerically matched 1 or 2 came up as a result
SUGGESTION
What I am about to present to you may seem rather unorthodox but please follow me...
Here is the query that will get every row that has both 1 and 2 in group_ids:
SELECT user.* FROM
(SELECT * FROM (SELECT id,CONCAT(',',group_id ,',') group_ids
FROM user) U WHERE LOCATE(',1,',group_ids)) U1
INNER JOIN
(SELECT * FROM (SELECT id,CONCAT(',',group_id ,',') group_ids
FROM user) U WHERE LOCATE(',2,',group_ids)) U2
ON U1.id = U2.id
INNER JOIN user ON user.id = U2.id;
Here is the code to create our sample data:
DROP DATABASE IF EXISTS sundar;
CREATE DATABASE sundar;
use sundar
CREATE TABLE user
(
id int not null auto_increment,
user_name VARCHAR(30),
user_email VARCHAR(70),
group_id VARCHAR(128),
PRIMARY KEY (id)
);
INSERT INTO user (user_name,user_email,group_id) VALUES
('suresh' , 'xxxx#yyyyyyyyyy.com' ,'22'),
('sundar' , 's7sundera#gmail.com' ,'2'),
('tester' , 'xxxxxxxx#yyyyyyyyyy.com' ,'1,2,3,4'),
('gail' , 'zzzzzz#gmail.com' ,'1,2,3,4,5');
SELECT * FROM user;
Let's create your sample data:
mysql> DROP DATABASE IF EXISTS sundar;
Query OK, 1 row affected (0.00 sec)
mysql> CREATE DATABASE sundar;
Query OK, 1 row affected (0.00 sec)
mysql> use sundar
Database changed
mysql> CREATE TABLE user
-> (
-> id int not null auto_increment,
-> user_name VARCHAR(30),
-> user_email VARCHAR(70),
-> group_id VARCHAR(128),
-> PRIMARY KEY (id)
-> );
Query OK, 0 rows affected (0.04 sec)
mysql> INSERT INTO user (user_name,user_email,group_id) VALUES
-> ('suresh' , 'xxxx#yyyyyyyyyy.com' ,'22'),
-> ('sundar' , 's7sundera#gmail.com' ,'2'),
-> ('tester' , 'xxxxxxxx#yyyyyyyyyy.com' ,'1,2,3,4'),
-> ('gail' , 'zzzzzz#gmail.com' ,'1,2,3,4,5');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql>
and here is what it looks like
mysql> SELECT * FROM user;
+----+-----------+-------------------------+-----------+
| id | user_name | user_email | group_id |
+----+-----------+-------------------------+-----------+
| 1 | suresh | xxxx#yyyyyyyyyy.com | 22 |
| 2 | sundar | s7sundera#gmail.com | 2 |
| 3 | tester | xxxxxxxx#yyyyyyyyyy.com | 1,2,3,4 |
| 4 | gail | zzzzzz#gmail.com | 1,2,3,4,5 |
+----+-----------+-------------------------+-----------+
4 rows in set (0.00 sec)
mysql>
Again, here is the messy query that will get what you want:
SELECT user.* FROM
(SELECT * FROM (SELECT id,CONCAT(',',group_id ,',') group_ids
FROM user) U WHERE LOCATE(',1,',group_ids)) U1
INNER JOIN
(SELECT * FROM (SELECT id,CONCAT(',',group_id ,',') group_ids
FROM user) U WHERE LOCATE(',2,',group_ids)) U2
ON U1.id = U2.id
INNER JOIN user ON user.id = U2.id;
Here it is executed:
mysql> SELECT user.* FROM
-> (SELECT * FROM (SELECT id,CONCAT(',',group_id ,',') group_ids
-> FROM user) U WHERE LOCATE(',1,',group_ids)) U1
-> INNER JOIN
-> (SELECT * FROM (SELECT id,CONCAT(',',group_id ,',') group_ids
-> FROM user) U WHERE LOCATE(',2,',group_ids)) U2
-> ON U1.id = U2.id
-> INNER JOIN user ON user.id = U2.id;
+----+-----------+-------------------------+-----------+
| id | user_name | user_email | group_id |
+----+-----------+-------------------------+-----------+
| 3 | tester | xxxxxxxx#yyyyyyyyyy.com | 1,2,3,4 |
| 4 | gail | zzzzzz#gmail.com | 1,2,3,4,5 |
+----+-----------+-------------------------+-----------+
2 rows in set (0.00 sec)
mysql>
OK, how about looking for (2,4) ?
mysql> SELECT user.* FROM
-> (SELECT * FROM (SELECT id,CONCAT(',',group_id ,',') group_ids
-> FROM user) U WHERE LOCATE(',2,',group_ids)) U1
-> INNER JOIN
-> (SELECT * FROM (SELECT id,CONCAT(',',group_id ,',') group_ids
-> FROM user) U WHERE LOCATE(',4,',group_ids)) U2
-> ON U1.id = U2.id
-> INNER JOIN user ON user.id = U2.id;
+----+-----------+-------------------------+-----------+
| id | user_name | user_email | group_id |
+----+-----------+-------------------------+-----------+
| 3 | tester | xxxxxxxx#yyyyyyyyyy.com | 1,2,3,4 |
| 4 | gail | zzzzzz#gmail.com | 1,2,3,4,5 |
+----+-----------+-------------------------+-----------+
2 rows in set (0.00 sec)
mysql>
Looks like it works.
Give it a try!

Related

Cross distinct column values with other table

From the distinct values of 'USER' in the PERMISSIONS table (query_1), I want to cross-reference the results of the query on the ACCESS_CONTROL table (query_2) to find out how long it has been since each 'USER' with access permissions last logged in.
I want to join query_1 with query_2 on the 'USER' key field.
How can this be done?
query_1:
SELECT DISTINCT(`USER`) FROM `PERMISSIONS`;
query_2:
SELECT
`USER`,
MAX(`REGISTRY_DATE`) AS MAX_REGISTRY_DATE,
DATEDIFF(CURDATE(),MAX(`REGISTRY_DATE`)) AS DIFFERENCE_IN_DAYS
FROM `ACCESS_CONTROL`
WHERE STATUS = 'Access Allowed'
GROUP BY `USER` ORDER BY DIFFERENCE_IN_DAYS DESC;
Expected Results: https://imgur.com/a/f5KQXWC
With a left join of the 1st query to the 2nd:
select
u.user,
coalesce(a.registry_date, 'never') max_registry_date,
coalesce(a.difference_in_days, 'never') difference_in_days
from (
select distinct user
from permissions
) u left join (
select user, max(registry_date) registry_date,
datediff(curdate(), max(registry_date)) difference_in_days
from access_control
where `status` = 'Access Allowed'
group by user
) a on a.user = u.user
See the demo.
Results:
| user | max_registry_date | difference_in_days |
| -------- | ----------------- | ------------------ |
| john | 2019-09-06 | 0 |
| susan | 2019-09-01 | 5 |
| mike | 2019-08-06 | 31 |
| anderson | never | never |
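The left join can be reproduced with a small script. This is a sketch under several assumptions: SQLite (via Python's sqlite3) stands in for MySQL, so julianday() replaces DATEDIFF(); the rows are hypothetical, reconstructed so that the dates match the results above; the "current date" is passed in as a fixed value; and users with no allowed access come back as None instead of 'never'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE permissions (user TEXT);
CREATE TABLE access_control (user TEXT, registry_date TEXT, status TEXT);
""")
# Hypothetical sample rows; only the latest dates come from the results above.
conn.executemany(
    "INSERT INTO permissions VALUES (?)",
    [("john",), ("john",), ("susan",), ("mike",), ("anderson",)],
)
conn.executemany("INSERT INTO access_control VALUES (?, ?, ?)", [
    ("john", "2019-09-06", "Access Allowed"),
    ("john", "2019-08-30", "Access Allowed"),
    ("susan", "2019-09-01", "Access Allowed"),
    ("mike", "2019-08-06", "Access Allowed"),
    ("anderson", "2019-09-05", "Access Denied"),
])


def days_since_last_access(conn, today):
    """Map each permitted user to (last allowed date, days since then)."""
    sql = """
        SELECT u.user,
               a.last_date,
               CAST(ROUND(julianday(:today) - julianday(a.last_date)) AS INTEGER)
        FROM (SELECT DISTINCT user FROM permissions) u
        LEFT JOIN (
            SELECT user, MAX(registry_date) AS last_date
            FROM access_control
            WHERE status = 'Access Allowed'
            GROUP BY user
        ) a ON a.user = u.user
    """
    rows = conn.execute(sql, {"today": today})
    return {user: (last_date, days) for user, last_date, days in rows}


result = days_since_last_access(conn, "2019-09-06")
print(result)
```

The left join is what keeps anderson in the output even though no 'Access Allowed' row exists for that user.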

Query to get subjects of interest for all User Y where Y shares >=3 interests with a User X

These are two tables from part of a hypothetical Twitter-like database where users can follow other users. The User.name field is unique.
mysql> select uID, name from User;
+-----+-------------------+
| uID | name |
+-----+-------------------+
| 1 | Alice |
| 2 | Bob |
| 5 | Iron Maiden |
| 4 | Judas Priest |
| 6 | Lesser Known Band |
| 3 | Metallica |
+-----+-------------------+
6 rows in set (0.00 sec)
mysql> select * from Follower;
+-----------+------------+
| subjectID | observerID |
+-----------+------------+
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 2 |
+-----------+------------+
7 rows in set (0.00 sec)
mysql> call newFollowSuggestionsForName('Bob');
+-------------------+
| name |
+-------------------+
| Lesser Known Band |
+-------------------+
1 row in set (0.00 sec)
I want to write an operation that suggests to a user X a list of users they may be interested in following. One heuristic I thought of is to suggest to X the users followed by any user Y, where X and Y follow at least 3 of the same users. Below is the SQL I came up with to do this. My question is whether it could be done more efficiently or more cleanly in some other way.
DELIMITER //
CREATE PROCEDURE newFollowSuggestionsForName(IN in_name CHAR(60))
BEGIN
DECLARE xuid INT;
SET xuid = (select uID from User where name=in_name);
select name
from User, (select subjectID
from follower
where observerID in (
select observerID
from Follower
where observerID<>xuid and subjectID in (select subjectID from Follower where observerID=xuid)
group by observerID
having count(*)>=3
)
) as T
where uID = T.subjectID and not exists (select * from Follower where subjectID=T.subjectID and observerID=xuid);
END //
DELIMITER ;
Consider the following refactored SQL code (untested without data) for use in the stored procedure.
select u.`name`
from `User` u
inner join
(select subf.observerID, subf.subjectID
from follower subf
where subf.observerID <> xuid
) f
on u.UID = f.subjectID
inner join
(select f1.observerID
from follower f1
inner join follower f2
on f1.subjectID = f2.subjectID
and f1.observerID <> xuid
and f2.observerID = xuid
group by f1.observerID
having count(*) >= 3
) o
on f.observerID = o.observerID
I think the basic query starts by getting all "observers" who share at least three "subjects" with a given observer (here, observer 2):
select f.observerid
from Follower f join
Follower f2
on f.subjectid = f2.subjectid and
f2.observerid = 2
where f.observerid <> 2
group by f.observerid
having count(*) >= 3;
The rest of the query is just joining in the names to fit into your paradigm of using names for references rather than ids.
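Putting the pieces together, the heuristic can be checked against the sample data from the question. This is a sketch, not the procedure from the question: it runs on SQLite through Python's sqlite3 module rather than MySQL, and the function name suggestions_for and the min_shared parameter are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (uID INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE Follower (subjectID INTEGER, observerID INTEGER,
                       PRIMARY KEY (subjectID, observerID));
""")
conn.executemany("INSERT INTO User VALUES (?, ?)", [
    (1, "Alice"), (2, "Bob"), (3, "Metallica"),
    (4, "Judas Priest"), (5, "Iron Maiden"), (6, "Lesser Known Band"),
])
conn.executemany("INSERT INTO Follower VALUES (?, ?)", [
    (3, 1), (4, 1), (5, 1), (6, 1), (3, 2), (4, 2), (5, 2),
])

SUGGEST_SQL = """
    SELECT DISTINCT u.name
    FROM Follower theirs
    JOIN User u ON u.uID = theirs.subjectID
    WHERE theirs.observerID IN (
        -- observers who share at least :min_shared subjects with user :x
        SELECT f.observerID
        FROM Follower f
        JOIN Follower mine ON mine.subjectID = f.subjectID
        WHERE mine.observerID = :x AND f.observerID <> :x
        GROUP BY f.observerID
        HAVING COUNT(*) >= :min_shared
    )
    AND theirs.subjectID <> :x
    -- skip subjects that user :x already follows
    AND theirs.subjectID NOT IN (
        SELECT subjectID FROM Follower WHERE observerID = :x
    )
    ORDER BY u.name
"""


def suggestions_for(conn, name, min_shared=3):
    """Names followed by similar users that the named user does not follow yet."""
    row = conn.execute("SELECT uID FROM User WHERE name = ?", (name,)).fetchone()
    if row is None:
        return []
    params = {"x": row[0], "min_shared": min_shared}
    return [r[0] for r in conn.execute(SUGGEST_SQL, params)]


print(suggestions_for(conn, "Bob"))    # ['Lesser Known Band']
print(suggestions_for(conn, "Alice"))  # []
```

Bob shares three subjects with Alice, so the one subject only Alice follows is suggested to him; Alice already follows everything Bob does, so she gets no suggestions.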

Nested query in From clause syntax and performance

I have two tables: one for user information, and a second that maps relations between users (a two-column table holding a from id and a to id).
I'm trying to find, for a specific user id, the ids of all related users (the inner select) and then get more information about them by joining to the table that holds those details.
I get the following error:
Error: #1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ') AS i Limit 0,30' at line 6
What is wrong with my query?
Is this query okay in terms of performance, or is there another way to do it?
Query:
SELECT i.*
FROM
((SELECT uc.contactId
FROM tbl_users AS u
JOIN tbl_users_contacts AS uc ON u.Id = uc.userId
WHERE uc.userId =1) AS contacts_ids JOIN tbl_users AS u
ON contacts_ids.contactId = u.Id) AS i;
Edit: Fixed to:
SELECT *
FROM
((SELECT uc.contactId
FROM tbl_users AS u
JOIN tbl_users_contacts AS uc ON u.Id = uc.userId
WHERE uc.userId =1) AS contacts_ids JOIN tbl_users AS u
ON contacts_ids.contactId = u.Id);
I don't know why the final AS i was a problem, so this post is now mainly about question 2.
Consider the following:
mysql> create table tbl_users ( iduser int,name varchar(100),email varchar(100));
Query OK, 0 rows affected (0.10 sec)
mysql> insert into tbl_users values
-> (1,'A','a#a.com'),
-> (2,'B','b#b.com'),
-> (3,'C','c#c.com'),
-> (4,'D','d#d.com'),
-> (5,'E','e#e.com');
Query OK, 5 rows affected (0.09 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> create table contacts (iduser int, contactid int );
Query OK, 0 rows affected (0.14 sec)
mysql> insert into contacts values
-> (1,2),(1,3),(1,5),(2,1),(2,5),(3,1),(3,4);
mysql> select * from tbl_users ;
+--------+------+---------+
| iduser | name | email |
+--------+------+---------+
| 1 | A | a#a.com |
| 2 | B | b#b.com |
| 3 | C | c#c.com |
| 4 | D | d#d.com |
| 5 | E | e#e.com |
+--------+------+---------+
5 rows in set (0.00 sec)
mysql> select * from contacts ;
+--------+-----------+
| iduser | contactid |
+--------+-----------+
| 1 | 2 |
| 1 | 3 |
| 1 | 5 |
| 2 | 1 |
| 2 | 5 |
| 3 | 1 |
| 3 | 4 |
+--------+-----------+
7 rows in set (0.00 sec)
As we can see, the user with iduser = 1 has 3 contacts, and we can get them with:
select u.* from tbl_users u
join contacts c on c.contactid = u.iduser
where c.iduser = 1 ;
The output will be:
+--------+------+---------+
| iduser | name | email |
+--------+------+---------+
| 2 | B | b#b.com |
| 3 | C | c#c.com |
| 5 | E | e#e.com |
+--------+------+---------+
To improve performance, you may add the following indexes:
alter table tbl_users add index userid_idx(iduser);
alter table contacts add index cu_idx(iduser,contactid);
Change the table and column names in the query to match your schema.
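For a quick check outside MySQL, the same tables, indexes, and join can be run with Python's built-in sqlite3 module. This is a sketch only: SQLite is an assumption standing in for MySQL, and the helper name contacts_of is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_users (iduser INTEGER, name TEXT, email TEXT);
CREATE TABLE contacts (iduser INTEGER, contactid INTEGER);
CREATE INDEX userid_idx ON tbl_users (iduser);
CREATE INDEX cu_idx ON contacts (iduser, contactid);
""")
# Same sample rows as in the mysql session above.
conn.executemany("INSERT INTO tbl_users VALUES (?, ?, ?)", [
    (1, "A", "a#a.com"), (2, "B", "b#b.com"), (3, "C", "c#c.com"),
    (4, "D", "d#d.com"), (5, "E", "e#e.com"),
])
conn.executemany("INSERT INTO contacts VALUES (?, ?)", [
    (1, 2), (1, 3), (1, 5), (2, 1), (2, 5), (3, 1), (3, 4),
])


def contacts_of(conn, user_id):
    """Return (iduser, name) of every contact of the given user."""
    sql = """
        SELECT u.iduser, u.name
        FROM tbl_users u
        JOIN contacts c ON c.contactid = u.iduser
        WHERE c.iduser = ?
        ORDER BY u.iduser
    """
    return conn.execute(sql, (user_id,)).fetchall()


print(contacts_of(conn, 1))  # [(2, 'B'), (3, 'C'), (5, 'E')]
```

A single join on the mapping table is enough here; no nested subquery in the FROM clause is needed.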

Omitting entries with a subquery

I'm having trouble understanding how to use a subquery to remove entries from a main query. I have two tables:
mysql> select userid, username, firstname, lastname from users_accounts where (userid = 7) or (userid = 8);
+--------+----------+-----------+----------+
| userid | username | firstname | lastname |
+--------+----------+-----------+----------+
| 7 | csmith | Chris | Smith |
| 8 | dsmith | Dan | Smith |
+--------+----------+-----------+----------+
2 rows in set (0.00 sec)
mysql> select * from users_contacts where (userid = 7) or (userid = 8);
+---------+--------+-----------+-----------+---------------------+
| tableid | userid | contactid | confirmed | timestamp |
+---------+--------+-----------+-----------+---------------------+
| 4 | 7 | 7 | 0 | 2013-10-03 12:34:24 |
| 6 | 8 | 8 | 0 | 2013-10-04 09:05:00 |
| 7 | 7 | 8 | 1 | 2013-10-04 09:08:20 |
+---------+--------+-----------+-----------+---------------------+
3 rows in set (0.00 sec)
What I would like to do is pull a list of contacts from the users_accounts table that will:
1) Omit the user's own account (in other words, I don't want to see my own name in the list).
2) See all contacts that have a "confirmed" state of "0", but
3) If the contact also happens to have a "confirmed" status of "1" (request sent) or "2" (request confirmed), do not include them in the results.
How can a sub-query be written to pull anything that turns up as a 1 or 2?
Subqueries at this point do not look necessary. You could join the tables like so:
select u.userid, u.username, u.firstname, u.lastname from users_accounts u join users_contacts c on u.userid = c.userid where u.userid != your_user_id and c.confirmed = 0;
In this generic example, your_user_id is a placeholder for however you determine the current user's id.
But if you absolutely must use a subquery:
select userid, username, firstname, lastname from users_accounts where userid != your_user_id and userid not in (select userid from users_contacts where confirmed = 1 or confirmed = 2);

twitter-style follower/following/friend sql query

I'm working on a twitter type of following system. I'm joining two tables, users and followers to get the first and lastname of users who are in the followers table. Then I'm running an inner join on the followers table to capture follower and friend relationships. I'm displaying the results as followers (who follows you), following (who you follow), and friends (mutual following).
With the query below, I'm only able to show the name of the user who wants to see their friends. I'd like to show the FRIENDS of the user, not the user's own name, but can't figure out how to get the users table to do double duty--that is, show me the name of the user and the name of their friend, or just the friend's name.
Thanks.
SELECT users.id, users.firstname, users.lastname, followers.follower_user_id, followers.followee_user_id
FROM users
JOIN followers ON followers.follower_user_id = users.id
INNER JOIN followers ff ON followers.followee_user_id = ff.follower_user_id AND followers.follower_user_id = ff.followee_user_id
I believe that your schema requires a union table to assemble the information you need, and it may be more efficient to do this in multiple tables. Maintaining a separate table of followers with (possibly) duplicated information from users may also be undesirable. A more efficient schema would be:
mysql> select * from users;
+-----+------------+---------+
| uid | fname | lname |
+-----+------------+---------+
| 1 | Phillip | Jackson |
| 2 | Another | Name |
| 3 | Some Crazy | User |
| 4 | Nameless | Person |
+-----+------------+---------+
4 rows in set (0.00 sec)
mysql> select * from follows;
+---------+-----------+
| user_id | follow_id |
+---------+-----------+
| 1 | 4 |
| 2 | 3 |
| 3 | 2 |
| 4 | 2 |
+---------+-----------+
4 rows in set (0.00 sec)
And then your query would look like:
select users.uid,
users.fname,
users.lname,
u.uid,
u.fname,
u.lname from users
inner join follows f on (f.user_id=users.uid)
inner join users u on (u.uid=f.follow_id)
Which returns:
mysql> select users.uid,
-> users.fname,
-> users.lname,
-> u.uid,
-> u.fname,
-> u.lname from users
-> inner join follows f on (f.user_id=users.uid)
-> inner join users u on (u.uid=f.follow_id);
+-----+------------+---------+-----+------------+--------+
| uid | fname | lname | uid | fname | lname |
+-----+------------+---------+-----+------------+--------+
| 1 | Phillip | Jackson | 4 | Nameless | Person |
| 4 | Nameless | Person | 2 | Another | Name |
| 2 | Another | Name | 3 | Some Crazy | User |
| 3 | Some Crazy | User | 2 | Another | Name |
+-----+------------+---------+-----+------------+--------+
4 rows in set (0.00 sec)
SELECT u.id, u.firstname, u.lastname, uf.id, uf.firstname, uf.lastname
FROM users u
JOIN followers f
ON f.follower_user_id = u.id
JOIN followers ff
ON (ff.followee_user_id, ff.follower_user_id) = (f.follower_user_id, f.followee_user_id)
JOIN users uf
ON uf.id = f.followee_user_id
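The mutual-follow ("friends") join can be tried on the users and follows sample tables shown earlier in this thread. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module instead of MySQL, it spells the pair comparison out as two AND conditions, and the function name friends is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
CREATE TABLE follows (user_id INTEGER, follow_id INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "Phillip", "Jackson"), (2, "Another", "Name"),
    (3, "Some Crazy", "User"), (4, "Nameless", "Person"),
])
conn.executemany("INSERT INTO follows VALUES (?, ?)", [
    (1, 4), (2, 3), (3, 2), (4, 2),
])


def friends(conn):
    """Return (uid, fname, friend_uid, friend_fname) for every mutual follow."""
    sql = """
        SELECT u.uid, u.fname, uf.uid, uf.fname
        FROM users u
        JOIN follows f   ON f.user_id = u.uid
        -- the reverse edge must exist for the pair to count as friends
        JOIN follows ff  ON ff.user_id = f.follow_id
                        AND ff.follow_id = f.user_id
        JOIN users uf    ON uf.uid = f.follow_id
        ORDER BY u.uid
    """
    return conn.execute(sql).fetchall()


print(friends(conn))  # users 2 and 3 follow each other
```

Joining users a second time under a different alias is what lets the same table supply both the user's name and the friend's name.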