compare values of cells in MySQL - mysql

I have a database with the following tables:
Student (ID, Name, Grade)
Likes (ID1, ID2)
ID1 and ID2 in the last table are foreign keys referencing Student(ID).
Note: Liking isn't a mutual relation, e.g. it's not necessary that if (123, 456) is in the Likes table, then (456, 123) is also in the Likes table.
I have to write a query for the following statement:
"For every pair of students, who both like each other, return the name and grade of both students. Include each pair only once, with the two names in alphabetical order."
So far I can retrieve the rows in which ID1 and ID2 mutually like each other:
SELECT s1.ID, s1.name, s2.ID, s2.name
FROM student s1, student s2, likes l
WHERE s1.ID = l.ID1 AND s2.ID = l.ID2
AND l.ID1 IN (SELECT ID2 FROM likes)
AND l.ID2 IN (SELECT ID1 FROM likes);
Could someone kindly show me how to avoid the duplicate pairs?
The data, in case someone needs it:
INSERT INTO `student` VALUES (1025,'John',12),(1101,'Haley',10),(1247,'Alexis',11),(1304,'Jordan',12),(1316,'Austin',11),(1381,'Tiffany',9),(1468,'Kris',10),(1501,'Jessica',11),(1510,'Jordan',9),(1641,'Brittany',10),(1661,'Logan',12),(1689,'Gabriel',9),(1709,'Cassandra',9),(1782,'Andrew',10),(1911,'Gabriel',11),(1934,'Kyle',12);
INSERT INTO `likes` VALUES (1689,1709),(1709,1689),(1782,1709),(1911,1247),(1247,1468),(1641,1468),(1316,1304),(1501,1934),(1934,1501),(1025,1101);
and according to data entered:
DATA I GET
1689 Gabriel 1709 Cassandra
1709 Cassandra 1689 Gabriel
1501 Jessica 1934 Kyle
1934 Kyle 1501 Jessica
IDEAL DATA
1689 Gabriel 1709 Cassandra
1501 Jessica 1934 Kyle

Since the question is "how to avoid duplicate pairs":
When you join likes to itself to find the students who like each other, you get 2 rows for each pair.
You can discard one of them by comparing some distinct value. ID is a great candidate:
select * -- put fields here
from likes li
join likes li2 on li2.ID1 = li.ID2 and li2.ID2 = li.ID1
-- join 2 students here
where li.ID1 < li.ID2
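To make the outline above concrete, here is a minimal sketch that fills in the two student joins and runs against the question's data. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere). Because the assignment asks for the names in alphabetical order, the sketch compares names instead of IDs; the ID comparison used as a tie-break for identical names is my own addition.

```python
import sqlite3

def mutual_pairs():
    """Return each mutually-liking pair once, names in alphabetical order."""
    con = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    con.executescript("""
        CREATE TABLE student (ID INTEGER PRIMARY KEY, name TEXT, grade INTEGER);
        CREATE TABLE likes (ID1 INTEGER, ID2 INTEGER);
        INSERT INTO student VALUES
          (1025,'John',12),(1101,'Haley',10),(1247,'Alexis',11),(1304,'Jordan',12),
          (1316,'Austin',11),(1381,'Tiffany',9),(1468,'Kris',10),(1501,'Jessica',11),
          (1510,'Jordan',9),(1641,'Brittany',10),(1661,'Logan',12),(1689,'Gabriel',9),
          (1709,'Cassandra',9),(1782,'Andrew',10),(1911,'Gabriel',11),(1934,'Kyle',12);
        INSERT INTO likes VALUES
          (1689,1709),(1709,1689),(1782,1709),(1911,1247),(1247,1468),
          (1641,1468),(1316,1304),(1501,1934),(1934,1501),(1025,1101);
    """)
    rows = con.execute("""
        SELECT s1.name, s1.grade, s2.name, s2.grade
        FROM likes li
        JOIN likes li2 ON li2.ID1 = li.ID2 AND li2.ID2 = li.ID1  -- the reverse row exists
        JOIN student s1 ON s1.ID = li.ID1
        JOIN student s2 ON s2.ID = li.ID2
        -- keep one row per pair: the one whose first name sorts first
        -- (the ID comparison is an assumed tie-break for identical names)
        WHERE s1.name < s2.name
           OR (s1.name = s2.name AND s1.ID < s2.ID)
        ORDER BY s1.name, s2.name
    """).fetchall()
    con.close()
    return rows

print(mutual_pairs())
# [('Cassandra', 9, 'Gabriel', 9), ('Jessica', 11, 'Kyle', 12)]
```

If you only need each pair once and don't care about the name order, the plain ID comparison from the answer is enough.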

Try the query below:
SELECT l1.*
FROM `likes` l1
JOIN `likes` l2 ON l1.id1 = l2.id2 AND l1.id2 = l2.id1
-- keep only one row of each mutual pair
WHERE l1.id1 < l1.id2

Related

How to retrieve all the data from the Left Table while using Group by on 2nd table with Having Condition

Consider three tables,
Team
Members (Each member belongs to some team)
Tasks (each task is performed by some member)
Tasks Table
t_id member_id
1 1
2 1
3 2
4 1
Members Table
id name team_id
1 Ali 1
2 Khalil 1
3 Bilal 1
4 John 2
5 Smith 2
Now the result I want is the complete details of the Members Table of A PARTICULAR TEAM along with the Number of Total Tasks each member has performed.
To solve this, I wrote this query,
select m.*, count(t.member_id) as 'Tasks'
from tbl_member m
left join tbl_task t on m.m_id = t.member_id
group by t.member_id
having m.team_id = :team_id
where team_id can be any variable given by the user.
When I run this query for team_id = 1, I get these results (only printing Member Names and his total tasks)
m_name Tasks
Ali 3
Khalil 1
As you can see, it skips Bilal who is also part of Team_ID = 1 but because he has performed 0 Tasks, it doesn't print Bilal (even though I used left join)
Similarly, if I use Team_ID = 2, I get these results,
m_name Tasks
John 0
It now prints John (who has done 0 Tasks) but it doesn't print Smith who also is part of Team 2 but has not done any task.
So, basically, the query is missing all those people who have done 0 tasks (unless all team members have done 0 tasks. In such a case, it only prints the first member of that team and skips the other, like in the case of Team ID = 2)
Can anyone please tell me how do I fix this? I want to print all the members of one team along with their count, even if their total task count is zero. Please note that it is not compulsory that this must be done using Joins. This can also be done with Subqueries but again, I couldn't make the right logic with subqueries either.
You can use a subquery to get the number of tasks done, without any left join or group by clause.
Schema and insert statements:
create table tbl_task(t_id int, member_id int);
insert into tbl_task values(1, 1);
insert into tbl_task values(2, 1);
insert into tbl_task values(3, 2);
insert into tbl_task values(4, 1);
create table tbl_member(id int, name varchar(100), team_id int);
insert into tbl_member values(1, 'Ali' ,1);
insert into tbl_member values(2, 'Khalil' ,1);
insert into tbl_member values(3, 'Bilal' ,1);
insert into tbl_member values(4, 'John' ,2);
insert into tbl_member values(5, 'Smith' ,2);
Query:
select m.*, (select count(t_id) from tbl_task t where t.member_id = m.id) as 'Tasks'
from tbl_member m
where m.team_id=1
Output:
id name team_id Tasks
1 Ali 1 3
2 Khalil 1 1
3 Bilal 1 0
Your GROUP BY and HAVING are undoing the LEFT JOIN. Try this:
select m.*, count(t.member_id) as Tasks
from tbl_member m left join
tbl_task t
on m.m_id = t.member_id
where m.team_id = :team_id
group by m.m_id;
Group by the left table's columns (m.id and so on):
select m.*, count(t.member_id) as 'Tasks'
from tbl_member m
left join tbl_task t on m.id = t.member_id
where m.team_id = :team_id
group by m.id, m.name, m.team_id
I'm using column names from your table definitions. Your query uses different naming. Correct it as needed.
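As a quick check, here is a small sketch that runs this query against the sample rows from the question. It uses Python's sqlite3 module in place of MySQL (an assumption for portability), and the helper name task_counts is made up for the illustration.

```python
import sqlite3

def task_counts(team_id):
    """Count tasks per member of one team, keeping members with zero tasks."""
    con = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    con.executescript("""
        CREATE TABLE tbl_task (t_id INT, member_id INT);
        INSERT INTO tbl_task VALUES (1, 1), (2, 1), (3, 2), (4, 1);
        CREATE TABLE tbl_member (id INT, name VARCHAR(100), team_id INT);
        INSERT INTO tbl_member VALUES
          (1, 'Ali', 1), (2, 'Khalil', 1), (3, 'Bilal', 1),
          (4, 'John', 2), (5, 'Smith', 2);
    """)
    rows = con.execute("""
        SELECT m.name, COUNT(t.member_id) AS Tasks  -- COUNT skips the NULLs
        FROM tbl_member m
        LEFT JOIN tbl_task t ON m.id = t.member_id
        WHERE m.team_id = ?                         -- filter on the left table
        GROUP BY m.id, m.name, m.team_id            -- group by the left table
        ORDER BY m.id
    """, (team_id,)).fetchall()
    con.close()
    return rows

print(task_counts(1))  # [('Ali', 3), ('Khalil', 1), ('Bilal', 0)]
print(task_counts(2))  # [('John', 0), ('Smith', 0)]
```

Grouping by the member columns gives every member their own group, so members without tasks no longer collapse into a single NULL group.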
Gordon's answer is, of course, correct. There's another approach you might take however. It involves this subquery to count the tasks.
select member_id, count(*) numb
from tbl_task
group by member_id
Then you left join that to your members table.
select m.*, t.numb as 'Tasks'
from tbl_member m
left join ( select member_id, count(*) numb
from tbl_task
group by member_id
) t on m.m_id = t.member_id
where m.team_id = :team_id
This query uses the main LEFT JOIN aggregate pattern, where the aggregate subquery contains either zero or one row for each row of the main table. You will get NULL values for team members who haven't done any tasks. You can fix that with COALESCE().
select m.*, COALESCE(t.numb, 0) as 'Tasks'
I wrote this up because I find the main LEFT JOIN aggregate pattern very useful for various report queries. For example, you might need this to get aggregates by member from two different tables. If you don't use the pattern you'll get a combinatorial explosion and high numbers. Here's an example counting absences as well as tasks.
select m.*, t.numb as 'Tasks', a.numb as 'Absences'
from tbl_member m
left join ( select member_id, count(*) numb
from tbl_task
group by member_id
) t on m.m_id = t.member_id
left join ( select member_id, count(*) numb
from tbl_absence
group by member_id
) a on m.m_id = a.member_id
where m.team_id = :team_id
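Your original query didn't work correctly because it grouped by t.member_id, a column from the second table, so every member without tasks landed in one NULL group. More generally, you can convert a LEFT JOIN into an ordinary JOIN by mentioning columns from the second table in WHERE or HAVING clauses. That's because both NULL = value and NULL <> value evaluate to NULL rather than true, so any criterion on such a column, except one like WHERE (col IS NULL OR col = val), will not be met for the unmatched rows.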
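To illustrate the combinatorial explosion mentioned above, here is a rough sketch comparing a naive double join with the pre-aggregated pattern. It runs on Python's sqlite3 module rather than MySQL, uses the column names from the table definitions (id, not m_id), and the tbl_absence table with its rows is entirely hypothetical.

```python
import sqlite3

SETUP = """
    CREATE TABLE tbl_task (t_id INT, member_id INT);
    INSERT INTO tbl_task VALUES (1, 1), (2, 1), (3, 2), (4, 1);
    CREATE TABLE tbl_member (id INT, name VARCHAR(100), team_id INT);
    INSERT INTO tbl_member VALUES (1, 'Ali', 1), (2, 'Khalil', 1), (3, 'Bilal', 1);
    -- hypothetical absence data: Ali was absent twice, Bilal once
    CREATE TABLE tbl_absence (a_id INT, member_id INT);
    INSERT INTO tbl_absence VALUES (1, 1), (2, 1), (3, 3);
"""

NAIVE = """
    SELECT m.name, COUNT(t.t_id), COUNT(a.a_id)
    FROM tbl_member m
    LEFT JOIN tbl_task t ON t.member_id = m.id
    LEFT JOIN tbl_absence a ON a.member_id = m.id  -- multiplies the task rows
    GROUP BY m.id, m.name
    ORDER BY m.id
"""

PRE_AGGREGATED = """
    SELECT m.name, COALESCE(t.numb, 0), COALESCE(a.numb, 0)
    FROM tbl_member m
    LEFT JOIN (SELECT member_id, COUNT(*) AS numb
               FROM tbl_task GROUP BY member_id) t ON m.id = t.member_id
    LEFT JOIN (SELECT member_id, COUNT(*) AS numb
               FROM tbl_absence GROUP BY member_id) a ON m.id = a.member_id
    ORDER BY m.id
"""

def run(query):
    """Execute one of the queries above on a fresh in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(query).fetchall()
    con.close()
    return rows

print(run(NAIVE))           # [('Ali', 6, 6), ('Khalil', 1, 0), ('Bilal', 0, 1)]
print(run(PRE_AGGREGATED))  # [('Ali', 3, 2), ('Khalil', 1, 0), ('Bilal', 0, 1)]
```

In the naive version Ali's 3 tasks are paired with his 2 absences, giving 6 joined rows and inflated counts; each pre-aggregated subquery contributes at most one row per member, so the counts stay correct.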

Adding Default Values on Joining Tables

I have the following tables:
Users
user_id course_id completion_rate
1 2 0.4
1 23 0.6
1 49 0.5
... ... ...
Courses
course_id title
1 Intro to Python
2 Intro to R
... ...
70 Intro to Flask
Each entry in the user table represents a course that the user took. However, it is rare that users have taken every course.
What I need is a result set with user_id, course_id, completion_rate. In the case that the user has taken the course, the existing completion_rate should be used, but if not then the completion_rate should be set to 0. That is, there would be 70 rows for each user_id, one for each course.
I don't have a lot of experience with SQL, and I'm not sure where to start. Would it be easier to do this in something like R?
Thank you.
You should first cross join the courses with the distinct users, then left join the users table onto that to get the desired result. If the user hasn't taken a course, the completion_rate will be null, and we use coalesce to default it to 0.
select c.course_id,cu.user_id,coalesce(u.completion_rate,0) as completion_rate
from courses c
cross join (select distinct user_id from users) cu
left join users u on u.course_id=c.course_id and cu.user_id=u.user_id
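Here is a small sketch of that query in action, run through Python's sqlite3 module as a stand-in for your database. The three courses and the first user row come from the question; the second user and the reduced course list are made-up sample data to keep the output short.

```python
import sqlite3

def completion_grid():
    """One row per (user, course); courses not taken default to 0."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE courses (course_id INT, title TEXT);
        INSERT INTO courses VALUES
          (1, 'Intro to Python'), (2, 'Intro to R'), (70, 'Intro to Flask');
        CREATE TABLE users (user_id INT, course_id INT, completion_rate REAL);
        -- user 2 is hypothetical sample data
        INSERT INTO users VALUES (1, 2, 0.4), (2, 1, 0.9);
    """)
    rows = con.execute("""
        SELECT cu.user_id, c.course_id,
               COALESCE(u.completion_rate, 0) AS completion_rate
        FROM courses c
        CROSS JOIN (SELECT DISTINCT user_id FROM users) cu  -- every user x course
        LEFT JOIN users u
               ON u.course_id = c.course_id AND u.user_id = cu.user_id
        ORDER BY cu.user_id, c.course_id
    """).fetchall()
    con.close()
    return rows

for row in completion_grid():
    print(row)
# (1, 1, 0), (1, 2, 0.4), (1, 70, 0), (2, 1, 0.9), (2, 2, 0), (2, 70, 0)
```

With the full course table you would get 70 rows per user, which is what the question asks for.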
Step 1: Take the distinct client_id values from the client data (abc) and join them to the course data (abc1) on 1=1. Joining on 1=1 pairs every client_id with every course, so each client gets one row per course.
Step 2: Left join that dataset back to the client data on both client_id and course.
create table ans as
select p.*,case when q.completion_rate is not null then q.completion_rate else 0
end as completion_rate
from
(
select a.client_id,b.course from
(select distinct client_id from abc) a
left join
abc1 b
on 1=1
) p
left join
abc q
on p.client_id = q.client_id and p.course = q.course
order by client_id,course;
Let me know in case of any queries.

Two tables with similar columns but different primary keys

I have two tables from two different databases, and both contain lastName and firstName columns. I need to create a JOIN relationship between the two. The lastName columns match about 80% of the time, while the firstName columns match only about 20% of the time. And each table has totally different personID primary keys.
Generally speaking, what would be some "best practices" and/or tips to use when I add a foreign key to one of the tables? Since I have about 4,000 distinct persons, any labor-saving tips would be greatly appreciated.
Sample mismatched data:
db1.table1_____________________ db2.table2_____________________
23 Williams Fritz 98 Williams Frederick
25 Wilson-Smith James 12 Smith James Wilson
26 Winston Trudy 73 Winston Gertrude
Keep in mind: sometimes they match exactly, often they don't, and sometimes two different people will have the same first/last name.
You can join on multiple fields.
select *
from table1
inner join table2
on table1.firstName = table2.firstName
and table1.lastName = table2.lastName
From this you can determine how many 'duplicate' firstname / last name combos there are.
select table1.firstName, table2.lastName, count(*)
from table1
inner join table2
on table1.firstName = table2.firstName
and table1.lastName = table2.lastName
group by table1.firstName, table2.lastName
having count(*) > 1
Conversely, you can also determine the ones which match identically, and only once:
select table1.firstName, table2.lastName
from table1
inner join table2
on table1.firstName = table2.firstName
and table1.lastName = table2.lastName
group by table1.firstName, table2.lastName
having count(*) = 1
And this last query could be the basis for performing the bulk of your foreign key updates.
For those names that match more than once between the tables, they'll likely need some sort of manual intervention, unless there are other fields in the table that can be used to differentiate them?
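As a rough illustration of the two queries above, the sketch below separates unambiguous matches from ambiguous ones. It uses Python's sqlite3 module in place of MySQL, and every row except the Williams pair is invented sample data. Pulling out the personID values with MIN() is my own addition; it is only safe because each group in the first query has exactly one joined row.

```python
import sqlite3

def connect():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (personID INT, lastName TEXT, firstName TEXT);
        CREATE TABLE table2 (personID INT, lastName TEXT, firstName TEXT);
        -- only the Williams rows come from the question; the rest is made up
        INSERT INTO table1 VALUES (23, 'Williams', 'Fritz'), (30, 'Smith', 'Anna'),
                                  (31, 'Lee', 'Sam'), (32, 'Lee', 'Sam');
        INSERT INTO table2 VALUES (98, 'Williams', 'Frederick'), (40, 'Smith', 'Anna'),
                                  (41, 'Lee', 'Sam');
    """)
    return con

def unique_matches(con):
    """Name pairs that match exactly once: safe candidates for a bulk update."""
    return con.execute("""
        SELECT MIN(table1.personID), MIN(table2.personID),
               table1.firstName, table2.lastName
        FROM table1
        INNER JOIN table2
                ON table1.firstName = table2.firstName
               AND table1.lastName = table2.lastName
        GROUP BY table1.firstName, table2.lastName
        HAVING COUNT(*) = 1
    """).fetchall()

def ambiguous_matches(con):
    """Name pairs that match more than once: these need manual review."""
    return con.execute("""
        SELECT table1.firstName, table2.lastName, COUNT(*)
        FROM table1
        INNER JOIN table2
                ON table1.firstName = table2.firstName
               AND table1.lastName = table2.lastName
        GROUP BY table1.firstName, table2.lastName
        HAVING COUNT(*) > 1
    """).fetchall()

con = connect()
print(unique_matches(con))     # [(30, 40, 'Anna', 'Smith')]
print(ambiguous_matches(con))  # [('Sam', 'Lee', 2)]
```

Fritz/Frederick Williams shows up in neither list, so rows like that still have to be matched by hand or by some fuzzier rule.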

Most efficient way in MySQL to get all associated values from a relation table while filtering after the same values

Simplified, I have two tables in MySQL: one holding a sort of person entity and one relation table which associates multiple rights to the person entity. The tables look like:
person
person_id | person_name
1 | Michael
2 | Kevin
person2right
person_id | right_id
1 | 1
1 | 2
1 | 4
2 | 1
2 | 2
What I want to achieve now is getting all persons including all associated rights, which have at least the defined rights - right_id 1 and 4 in this example.
What I have so far is a query with subselect, but I wonder if there is a more efficient way to achieve my goal without the subselect, because of MySQL not being able to use an index when joining a subselect. Here is my query:
SELECT person_name, GROUP_CONCAT(`person2right`.`right_id`) as `all_rights`
FROM `person`
LEFT JOIN `person2right` ON `person`.`person_id` = `person2right`.`person_id`
LEFT JOIN (
SELECT `person_id` FROM `person2right` WHERE `right_id` IN (1, 4)
GROUP BY `person_id` HAVING COUNT(`person2right`.`right_id`) >= 2
) as `p2r` ON `person`.`person_id` = `p2r`.`person_id`
WHERE `p2r`.`person_id` IS NOT NULL GROUP BY `person_id`
Maybe someone has an idea how to do it without the subquery. Appreciate your help.
Thx in advance!
This will only select persons (and all their associated rights) who have both rights 1 and 4. Note that it's different from your query in that your query selects all persons (regardless of their rights) and only selects their associated rights if they have rights 1 and 4.
SELECT person_name, GROUP_CONCAT(`person2right`.`right_id`) as `all_rights`
FROM `person`
JOIN `person2right` ON `person`.`person_id` = `person2right`.`person_id`
GROUP BY `person`.`person_id`
HAVING SUM(`right_id` = 4) > 0 AND SUM(`right_id` = 1) > 0
Edit: if the rows in person2right are unique, then you can change your having clause to
HAVING SUM(`right_id` IN (1,4)) = 2
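For what it's worth, here is a minimal sketch of the conditional-sum HAVING clause on the question's sample rows. It runs on Python's sqlite3 module as a stand-in for MySQL (both treat a comparison as 0 or 1 inside SUM). Since the order inside GROUP_CONCAT isn't guaranteed, the sketch sorts the rights in Python before printing.

```python
import sqlite3

def persons_with_rights_1_and_4():
    """Persons holding both right 1 and right 4, with all of their rights."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE person (person_id INT, person_name TEXT);
        INSERT INTO person VALUES (1, 'Michael'), (2, 'Kevin');
        CREATE TABLE person2right (person_id INT, right_id INT);
        INSERT INTO person2right VALUES (1, 1), (1, 2), (1, 4), (2, 1), (2, 2);
    """)
    rows = con.execute("""
        SELECT person_name, GROUP_CONCAT(person2right.right_id) AS all_rights
        FROM person
        JOIN person2right ON person.person_id = person2right.person_id
        GROUP BY person.person_id
        -- each comparison yields 1 or 0, so SUM counts the matching rows
        HAVING SUM(right_id = 4) > 0 AND SUM(right_id = 1) > 0
    """).fetchall()
    con.close()
    # split the concatenated string and sort it for a stable result
    return [(name, sorted(int(r) for r in rights.split(",")))
            for name, rights in rows]

print(persons_with_rights_1_and_4())  # [('Michael', [1, 2, 4])]
```

Kevin is filtered out because none of his rows has right_id 4, while Michael keeps his full list of rights, including right 2.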
Let's see if additional joins can do the trick:
select person_name, group_concat(distinct p2r.right_id) as all_rights
from person as p
inner join person2right as p2r using (person_id) -- You don't need LEFT JOIN, because you'll only return persons with rights
-- The new stuff starts here: Two new LEFT JOINs to track the rights you want
left join person2right as p2r_1 using (person_id)
left join person2right as p2r_4 using (person_id)
where
-- Here is where you check if the rights exist
(p2r_1.right_id = 1 and p2r_4.right_id = 4)
group by p.person_id;
You can try checking the rights in the where clause (avoiding the second left join):
...
WHERE 2 = (select count(*)
from person2right
where person_id = person.person_id
and right_id in (1, 4))

How do I delete duplicate values with group by

Example database :
ID StudentName StudentClass
1 John A
2 John B
3 Peter A
4 John A
5 John B
I want the result to be
ID StudentName StudentClass
1 John A
2 John B
3 Peter A
Statement:
DELETE FROM Student
WHERE ID NOT IN (SELECT *
FROM (SELECT MIN(n.ID)
FROM Student n
GROUP BY n.StudentName) x)
How do I keep John in both class A and class B?
DELETE a FROM Student a
LEFT JOIN
(
SELECT MIN(ID) AS minid
FROM Student
GROUP BY StudentName, StudentClass
) b ON a.id = b.minid
WHERE
b.minid IS NULL
A better method, which disallows even the insertion of such duplicates, is a multi-column unique index (it will optimize your searches too). Add it once the existing duplicates have been removed. Here is how:
ALTER TABLE `Student`
ADD UNIQUE INDEX `idx` (`StudentName`, `StudentClass`)
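Below is a small sketch of both steps, deduplicating first and then adding the unique index, on the sample rows from the question. It uses Python's sqlite3 module, and SQLite has no multi-table DELETE, so the delete is written in the portable NOT IN form instead of the MySQL DELETE ... JOIN syntax shown above; treat it as an illustration of the idea rather than a drop-in MySQL script.

```python
import sqlite3

def dedupe():
    """Delete duplicate (name, class) rows, then forbid new ones."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Student (ID INTEGER PRIMARY KEY,
                              StudentName TEXT, StudentClass TEXT);
        INSERT INTO Student VALUES (1, 'John', 'A'), (2, 'John', 'B'),
                                   (3, 'Peter', 'A'), (4, 'John', 'A'),
                                   (5, 'John', 'B');
        -- keep the lowest ID of every (name, class) group, delete the rest
        DELETE FROM Student
        WHERE ID NOT IN (SELECT MIN(ID) FROM Student
                         GROUP BY StudentName, StudentClass);
        -- SQLite spelling of the unique index from the answer
        CREATE UNIQUE INDEX idx ON Student (StudentName, StudentClass);
    """)
    rows = con.execute("SELECT * FROM Student ORDER BY ID").fetchall()
    try:
        con.execute("INSERT INTO Student (StudentName, StudentClass) "
                    "VALUES ('John', 'A')")
        blocked = False
    except sqlite3.IntegrityError:
        blocked = True  # the unique index rejected the duplicate
    con.close()
    return rows, blocked

print(dedupe())
# ([(1, 'John', 'A'), (2, 'John', 'B'), (3, 'Peter', 'A')], True)
```

Grouping by both StudentName and StudentClass is what keeps John in class A and class B; grouping by the name alone would keep only one John.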
You should be able to join Student against itself, with a JOIN predicate that ensures the JOIN matches duplicate students, and delete the joined row:
DELETE
duplicate_students.*
FROM Student JOIN Student as duplicate_students
ON Student.StudentName = duplicate_students.StudentName
AND Student.StudentClass = duplicate_students.StudentClass
AND duplicate_students.ID > Student.ID
NOTE: Please back up your data first; I take no responsibility for lost data :-) This is a conceptual idea and has not been tested.
This should work:
DELETE S FROM Student S
INNER JOIN(
SELECT MIN(ID) AS ID,StudentName,StudentClass FROM Student
GROUP BY StudentName,StudentClass
) S2 ON S.ID != S2.ID AND S.StudentName = S2.StudentName AND S.StudentClass = S2.StudentClass
It's basically selecting the minimum ID out of each group of duplicate records in the subquery. Then we simply delete everything that matches that class and name but doesn't match the minimum ID, so at the end of the day we keep (presumably) the first record out of the duplicates and eradicate the rest.