mySQL find similar but not identical records - mysql

All, I've found out that my users have been entering customer names incorrectly. Below is an example of how they enter them; I guess they thought they needed a separate account for each residence this person owns. I have similar entries where the fake middle initial comes before the last name instead. How would I pull a list of customers that share names and emails? The query I've already tried is included below my example data. It returns other duplicates that I do want, but it misses records like 1 and 2 in the example.
Example:
ID | first Name | last Name | email          | Residence     |
---+------------+-----------+----------------+---------------+
1  | Bill A     | Bob       | bill@bob.com   | 1-2 broad st  |
2  | Bill B     | Bob       | bill@bob.com   | 1-3 broad st  |
3  | Fred       | Jones     | f.jones@me.com | 1 example st  |
4  | Fred       | Jones     | f.jones@me.com | 200 South ave |
5  | Alex       | Man       | Manley@grt.com | 25 N Main st  |
6  | Alex       | Man       | Manley@grt.com | 39 Front st   |
Query:
SELECT C.ID, R.Customer_ID , C.orgName, C.fName, C.lName, C.email, R.hNumber, R.street, R.aNumber, R.city
FROM Customer C
LEFT JOIN Residence R ON C.ID = R.Customer_ID
JOIN (
SELECT X.fName, X.lName
FROM Customer X
GROUP BY X.fName, X.lName
HAVING COUNT(*) > 1
) X ON X.fName = C.fName AND X.lName = C.lName
ORDER BY C.fName, C.lName

You can use (at least for mysql)
SELECT C.ID, R.Customer_ID , C.orgName, C.fName, C.lName, C.email,
R.hNumber, R.street, R.aNumber, R.city
FROM Customer C
LEFT JOIN Residence R ON C.ID = R.Customer_ID
JOIN Customer C1 on C.ID <> C1.id
LEFT JOIN Residence R1 ON C1.ID = R1.Customer_ID
where
C1.fName = C.fName AND C1.lName = C.lName
or C1.email = C.email
or <whatever else you like to compare, e.g. same address + same last name>
group by C.ID
or, more general,
SELECT C.ID, R.Customer_ID , C.orgName, C.fName, C.lName, C.email,
R.hNumber, R.street, R.aNumber, R.city
FROM Customer C
LEFT JOIN Residence R ON C.ID = R.Customer_ID
where exists (
select * from
Customer C1
LEFT JOIN Residence R1 ON C1.ID = R1.Customer_ID
where
C.ID <> C1.id
and (
C1.fName = C.fName AND C1.lName = C.lName
or C1.email = C.email
or <whatever else you like to compare, e.g. same address + same last name>
)
)
Of course, this only gives you a limited duplicate check, especially if someone is intentionally trying to bypass it (e.g. in a shop system; there are tools and procedures to help you with that).
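For anyone who wants to try the EXISTS variant without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. The cut-down Customer schema, the extra row 7 and the name DUPLICATE_SQL are assumptions made for illustration; the query itself only uses syntax that should behave the same way in MySQL.

```python
import sqlite3

# Hypothetical minimal schema: only the columns needed for the comparison.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (ID INTEGER PRIMARY KEY, fName TEXT, lName TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?, ?)",
    [
        (1, "Bill A", "Bob", "bill@bob.com"),
        (2, "Bill B", "Bob", "bill@bob.com"),
        (3, "Fred", "Jones", "f.jones@me.com"),
        (4, "Fred", "Jones", "f.jones@me.com"),
        (5, "Alex", "Man", "Manley@grt.com"),
        (6, "Alex", "Man", "Manley@grt.com"),
        (7, "Ann", "Lee", "ann@lee.com"),  # hypothetical row with no duplicate
    ],
)

# A row counts as a probable duplicate if some OTHER row shares its
# full name or its email address.
DUPLICATE_SQL = """
    SELECT C.ID
    FROM Customer C
    WHERE EXISTS (
        SELECT 1
        FROM Customer C1
        WHERE C1.ID <> C.ID
          AND ((C1.fName = C.fName AND C1.lName = C.lName)
               OR C1.email = C.email)
    )
    ORDER BY C.ID
"""

duplicate_ids = [row[0] for row in conn.execute(DUPLICATE_SQL)]
print(duplicate_ids)
```

Rows 1 and 2 are caught through the shared email even though their first names differ, which is the case the original name-only GROUP BY missed.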

I don't think there is an automatic way. Each approach will probably involve manually identifying a pattern that has been used and correcting for it, for example with a large CASE statement, which isn't very "automatic".
The closest would be to use SOUNDEX to tell whether two names sound the same: http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_soundex
If you can use another programming language, then I'd recommend something like http://php.net/manual/en/function.similar-text.php, but it will be computationally heavy.
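As a rough illustration of the "another programming language" route, here is a sketch in Python that uses difflib.SequenceMatcher from the standard library in place of PHP's similar_text. The helper names and the 0.8 threshold are assumptions chosen for this example, not recommended values, and the pairwise loop is quadratic, which is where the computational cost comes from.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity ratio between 0.0 and 1.0, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_pairs(names, threshold=0.8):
    """Compare every name with every later name (O(n^2) comparisons)."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if name_similarity(names[i], names[j]) >= threshold:
                pairs.append((names[i], names[j]))
    return pairs


# Full names built from the example data in the question.
names = ["Bill A Bob", "Bill B Bob", "Fred Jones", "Alex Man"]
pairs = similar_pairs(names)
print(pairs)
```

On a real customer table you would probably narrow the candidates first (for example, only compare rows that share an email or a last name) before running the pairwise comparison.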

Related

MySQL where joined value is multiple ANDs

Running into a seemingly simple JOIN problem here.
I have two tables, users and courses
| users.id | users.name |
| 1 | Joe |
| 2 | Mary |
| 3 | Mark |
| courses.id | courses.name |
| 1 | History |
| 2 | Math |
| 3 | Science |
| 4 | English |
and another table that joins the two:
| users_id | courses_id |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
I'm trying to find the distinct user names that are in both course 1 and course 2.
It's possible a user is in other courses, too, but I only care that they're in 1 and 2 at a minimum.
SELECT DISTINCT(users.name)
FROM users_courses
LEFT JOIN users ON users_courses.users_id = users.id
LEFT JOIN courses ON users_courses.courses_id = courses.id
WHERE courses.name = "History" AND courses.name = "Math"
AND courses.name NOT IN ("English")
I understand why this is returning an empty set: no single joined row has both History and Math, since each row has only one value.
How can I structure the query so that it returns "Joe" because he is in both courses?
Update - I'm hoping to avoid hard-coding the expected total count of courses for a given user, since they might be in other courses my search does not care about.
Join users to a query that returns the user ids that are in both courses:
select u.name
from users u
inner join (
select users_id
from users_courses
where courses_id in (1, 2)
group by users_id
having count(distinct courses_id) = 2
) c on c.users_id = u.id
You can omit distinct from the condition:
count(distinct courses_id) = 2
if there are no duplicates in users_courses.
See the demo.
If you want to search by course names and not ids:
select u.name
from users u
inner join (
select uc.users_id
from users_courses uc inner join courses c
on c.id = uc.courses_id
where c.name in ('History', 'Math')
group by uc.users_id
having count(distinct c.id) = 2
) c on c.users_id = u.id
See the demo.
Results:
| name |
| ---- |
| Joe |
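A small runnable sketch of the same GROUP BY / HAVING idea, using Python's sqlite3 module so it can be tried without a server. The function name users_in_all_courses is hypothetical; the required count is derived from the list of course names instead of being hard-coded, and other enrollments are ignored by the WHERE clause.

```python
import sqlite3


def users_in_all_courses(conn, course_names):
    """Names of users enrolled in every course listed in course_names."""
    placeholders = ", ".join("?" for _ in course_names)
    sql = f"""
        SELECT u.name
        FROM users u
        INNER JOIN (
            SELECT uc.users_id
            FROM users_courses uc
            INNER JOIN courses c ON c.id = uc.courses_id
            WHERE c.name IN ({placeholders})
            GROUP BY uc.users_id
            HAVING COUNT(DISTINCT c.id) = ?
        ) m ON m.users_id = u.id
        ORDER BY u.name
    """
    # The last parameter is the number of distinct courses that must match.
    params = list(course_names) + [len(set(course_names))]
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_courses (users_id INTEGER, courses_id INTEGER);
    INSERT INTO users VALUES (1, 'Joe'), (2, 'Mary'), (3, 'Mark');
    INSERT INTO courses VALUES (1, 'History'), (2, 'Math'), (3, 'Science'), (4, 'English');
    INSERT INTO users_courses VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

both = users_in_all_courses(conn, ["History", "Math"])
history_only = users_in_all_courses(conn, ["History"])
print(both, history_only)
```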
You can use the IN operator with a subquery that lists the users_id values attending the second course, then match them against the rows for the first course. Depending on the data and indexes, this may be faster than the join-based versions.
select distinct u.users_id, users.name
from users_courses u, users
where u.users_id in (select distinct users_id from users_courses where courses_id = 2)
and u.courses_id = 1
and users.id = u.users_id
Almost the same as @Nae's solution.
select u.name from users u
where exists
(select 1
from users_courses uc
where uc.courses_id in (1, 2)
and uc.users_id = u.id
group by uc.users_id
having count(0) = 2);
Your code is close. Just use GROUP BY and a HAVING clause:
SELECT u.name
FROM users_courses uc JOIN
users u
ON uc.users_id = u.id JOIN
courses c
ON uc.courses_id = c.id
WHERE c.name IN ('History', 'Math')
GROUP BY u.name
HAVING COUNT(DISTINCT c.name) = 2;
Notes:
This assumes that users cannot have the same name. You might want to use GROUP BY u.id, u.name to ensure that you are counting individual users.
If users cannot take the same course multiple times, then use COUNT(*) = 2 rather than COUNT(DISTINCT).
I'd write:
SELECT MAX(u.name)
FROM users_courses uc
LEFT JOIN users u ON uc.users_id = u.id
WHERE uc.courses_id IN (1, 2)
GROUP BY uc.users_id
HAVING COUNT(0) = 2
;
For more complex conditions (for example requiring the user to be in certain classes but also not in certain classes such as "Science") this should also work:
SELECT MAX(u.name)
FROM users_courses uc
LEFT JOIN users u ON uc.users_id = u.id
GROUP BY uc.users_id
HAVING (
-- user enrolled exactly once in course 1
SUM(uc.courses_id = 1) = 1
-- user enrolled exactly once in course 2
AND SUM(uc.courses_id = 2) = 1
-- user enrolled in course 3 zero times
AND SUM(uc.courses_id = 3) = 0
)
;
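The conditional-aggregation idea above can be sketched in a parameterised form with Python's sqlite3 module. The function name users_by_enrollment is hypothetical, and this sketch uses ">= 1" for required courses (at least once) where the query above uses "= 1" (exactly once); SQLite, like MySQL, evaluates a comparison to 0 or 1, so SUM counts the matching rows.

```python
import sqlite3


def users_by_enrollment(conn, required, excluded):
    """Names of users in every `required` course id and in no `excluded` one."""
    conditions = ["1 = 1"]  # keeps HAVING valid when both lists are empty
    conditions += ["SUM(uc.courses_id = ?) >= 1" for _ in required]
    conditions += ["SUM(uc.courses_id = ?) = 0" for _ in excluded]
    sql = f"""
        SELECT u.name
        FROM users_courses uc
        JOIN users u ON u.id = uc.users_id
        GROUP BY uc.users_id, u.name
        HAVING {' AND '.join(conditions)}
        ORDER BY u.name
    """
    # Parameters follow the order of the placeholders: required, then excluded.
    return [row[0] for row in conn.execute(sql, [*required, *excluded])]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_courses (users_id INTEGER, courses_id INTEGER);
    INSERT INTO users VALUES (1, 'Joe'), (2, 'Mary'), (3, 'Mark');
    INSERT INTO users_courses VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

in_1_and_2_not_4 = users_by_enrollment(conn, required=[1, 2], excluded=[4])
in_1_not_2 = users_by_enrollment(conn, required=[1], excluded=[2])
print(in_1_and_2_not_4, in_1_not_2)
```

Note that users with no rows in users_courses (Mark here) never appear, because the grouping starts from the enrollment table.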

SQL - How to find the person with the highest grade

I am trying to find the name of the person who received the highest grade in the "Big Data" course.
I have 3 different tables:
People (id, name, age, address)
---------------------------------------------------
p1 | Tom Martin| 24 | 11, Integer Avenue, Fractions, MA
p2 | Al Smith | 33 | 26, Main Street, Noman's Land, PA
p3 | Kim Burton| 40 | 45, Elm Street, Blacksburg, VA
---------------------------------------------------
Courses (cid, name, department)
---------------------------------------------------------
c1 | Systematic Torture | MATH
c2 | Pretty Painful | CS
c3 | Not so Bad | MATH
c4 | Big Data | CS
---------------------------------------------------------
Grades (pid, cid, grade)
---------------------------------------------------
p1 | c1 | 3.5
p2 | c3 | 2.5
p3 | c2 | 4.0
p3 | c4 | 3.85
---------------------------------------------------
I can't figure out how to find the person with the highest grade without using any fancy SQL feature. That is, I just want to use SELECT, FROM, WHERE, UNION, INTERSECT, EXCEPT, CREATE VIEW and arithmetic comparison operators like =, <, >.
My outcome is showing something other than what I try to achieve.
This is what I have tried so far:
CREATE VIEW TEMPFIVE AS
SELECT G1.pid FROM Grades AS G1, Grades AS G2 WHERE G1.pid = G2.pid AND G1.cid = G2.cid
SELECT People.name, Courses.name FROM TEMPFIVE, People, Courses WHERE TEMPFIVE.pid = People.pid AND Courses.name = "Big Data";
+------------+----------+
| name | name |
+------------+----------+
| Tom Martin | Big Data |
| Al Smith | Big Data |
|Kim Burton | Big Data |
|Kim Burton | Big Data |
+------------+----------+
The easiest way is to use LIMIT 1 with an ORDER BY ... DESC clause:
SELECT p.name, c.name, g.grade
FROM People AS p
JOIN Grades AS g ON p.id = g.pid
JOIN Courses AS c ON c.cid = g.cid
WHERE c.name = "Big Data"
ORDER BY g.grade DESC LIMIT 1
I don't know the exact MySQL query structure, so I've explained it in steps. I hope you can build the query from them.
Join the three tables according to their relationships.
Set the course name to 'Big Data' in the WHERE clause.
Order by grade in DESC order.
Set the limit to fetch only the first row.
Try this:
select * from (
select p.id pid, p.name name, p.age age, p.address address,
c.cid cid, c.name coursename, c.department department, g.grade grade
from Grades g
left join
Courses c on g.cid = c.cid
left join
People p on g.pid = p.id
) a where coursename = 'Big Data' order by grade desc
You can apply the operators in the WHERE clause.
GiorgosBestos shows the correct way if you only want one record. If you want ties, meaning more than one student has the same MAX grade, then you can do a subselect as follows:
SELECT p.name, c.name, g.grade
FROM
(
SELECT c.cid, MAX(g.grade) MaxGrade
FROM
Grades g
INNER JOIN Courses c
ON c.cid = g.cid
AND c.name = 'Big Data'
GROUP BY
c.cid
) m
INNER JOIN Grades g
ON g.cid = m.cid
AND g.grade = m.MaxGrade
INNER JOIN People p
ON g.pid = p.id
The following SQL covers the case when two or more students have the same maximum grade:
SELECT P.NAME,
C.NAME,
G.GRADE
FROM PEOPLE P
JOIN GRADES G ON G.PID = P.ID
JOIN COURSES C ON C.CID = G.CID
WHERE C.NAME = 'Big Data'
AND G.GRADE = (SELECT MAX(G2.GRADE)
FROM PEOPLE P2
JOIN GRADES G2 ON G2.PID = P2.ID
JOIN COURSES C2 ON C2.CID = G2.CID
WHERE C2.NAME = 'Big Data');
It is similar but not identical to the SQL proposed by Matt.
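Since the question restricts itself to SELECT, FROM, WHERE, EXCEPT and plain comparisons, one possible way to stay inside those limits is to take everyone graded in the course and subtract everyone who has a lower grade than somebody else. The sketch below runs that idea through Python's sqlite3 module; the extra grade row (p1, c4, 3.2) is a hypothetical addition so that there is something to subtract, and the name TOP_GRADE_SQL is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (id TEXT, name TEXT);
    CREATE TABLE Courses (cid TEXT, name TEXT, department TEXT);
    CREATE TABLE Grades (pid TEXT, cid TEXT, grade REAL);
    INSERT INTO People VALUES ('p1', 'Tom Martin'), ('p2', 'Al Smith'), ('p3', 'Kim Burton');
    INSERT INTO Courses VALUES ('c1', 'Systematic Torture', 'MATH'), ('c2', 'Pretty Painful', 'CS'),
                               ('c3', 'Not so Bad', 'MATH'), ('c4', 'Big Data', 'CS');
    INSERT INTO Grades VALUES ('p1', 'c1', 3.5), ('p2', 'c3', 2.5), ('p3', 'c2', 4.0), ('p3', 'c4', 3.85);
    INSERT INTO Grades VALUES ('p1', 'c4', 3.2);  -- hypothetical extra row, not in the question
""")

# Everyone graded in 'Big Data' EXCEPT those beaten by some other grade
# in the same course. Whoever remains holds the maximum (ties included).
TOP_GRADE_SQL = """
    SELECT P.id, P.name
    FROM People P, Grades G, Courses C
    WHERE P.id = G.pid AND G.cid = C.cid AND C.name = 'Big Data'
    EXCEPT
    SELECT P.id, P.name
    FROM People P, Grades G, Courses C, Grades G2
    WHERE P.id = G.pid AND G.cid = C.cid AND C.name = 'Big Data'
      AND G2.cid = G.cid AND G.grade < G2.grade
"""

top_students = conn.execute(TOP_GRADE_SQL).fetchall()
print(top_students)
```

Be aware that MySQL only gained EXCEPT in 8.0.31; on older versions the same subtraction would have to be written another way.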

MYSQL return results where there exist at least one from the other table

Say I have this data from two tables:
Student Table columns:
id | name
Course Table columns:
id | code | name
and I want to use the Student.id AS Student and Course.id AS Course
to get the following:
Student | Course
-----------------
1 | C
1 | B
1 | A
2 | F
2 | B
2 | A
3 | C
3 | B
3 | F
How would I query it so it will return only the Students with a Course C and their other Courses like below:
Student | Course
-----------------
1 | C
1 | B
1 | A
3 | C
3 | B
3 | F
?
I have tried :
SELECT Student.id, Course.code FROM Course
INNER JOIN Student ON Course.student = Student.id
WHERE Course.code = 'C'
but I got only
Student | Course
-----------------
1 | C
3 | C
SELECT s.id, c.code
FROM Course c
INNER JOIN Student s
ON c.student = s.id
WHERE EXISTS
(
SELECT 1
FROM Course c1
WHERE c.student = c1.student
AND c1.code = 'C'
)
The most efficient approach to this problem is usually an inline view and a JOIN operation, although there are several ways to get an equivalent result.
SELECT Student.id
, Course.code
FROM ( SELECT c.Student
FROM Course c
WHERE c.code = 'C'
GROUP BY c.Student
) o
JOIN Course
ON Course.Student = o.Student
JOIN Student
ON Student.id = Course.Student
Here, we're using an inline view (aliased as o) to get a list of Student taking course code = 'C'.
(NOTE: the query in my answer is based on your original query. If there's a foreign key definition between Course and Student, and we only need to return the Student.id, we could improve performance by omitting the join to Student, and return Course.Student AS id in place of Student.id in the SELECT list.)
Here the first JOIN selects only those students which have course C, and second JOIN gives you all the courses for each of those students.
SELECT st.id, c2.code FROM
Student st
JOIN Course c ON c.student = st.id AND c.code = "C"
JOIN Course c2 ON c2.student = st.id
You actually don't even need two tables here, because both the student and the course are available in the Course table; just JOIN it to itself:
SELECT c2.student, c2.code FROM
Course c JOIN Course c2 ON c.student = c2.student
WHERE c.code = "C"
Here the WHERE clause keeps the student ids that have course C, and then you JOIN those to find all their courses.
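The self-join can be checked quickly with Python's sqlite3 module. This sketch assumes, as the queries in this thread do, that the Course table carries a student column next to code; the table definition and the name SELF_JOIN_SQL are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per (student, course code) pair.
conn.execute("CREATE TABLE Course (student INTEGER, code TEXT)")
conn.executemany(
    "INSERT INTO Course VALUES (?, ?)",
    [(1, "C"), (1, "B"), (1, "A"),
     (2, "F"), (2, "B"), (2, "A"),
     (3, "C"), (3, "B"), (3, "F")],
)

# c picks out the students who take course C;
# c2 then lists every course of those students.
SELF_JOIN_SQL = """
    SELECT c2.student, c2.code
    FROM Course c
    JOIN Course c2 ON c2.student = c.student
    WHERE c.code = 'C'
    ORDER BY c2.student, c2.code
"""

rows = conn.execute(SELF_JOIN_SQL).fetchall()
print(rows)
```

If a student could be listed for course C more than once, every one of their courses would be repeated; SELECT DISTINCT would take care of that.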

Pass value from query 1 to query 2

I have a query that joins 2 tables as below:
SELECT * FROM Staff s INNER JOIN Account a on s.AccNo = a.AccNo WHERE a.Status = 'Active'
The result is as shown:
AccNo | Name | ID
------------------
1 | Alex | S01
2 | John | S02
After I get the staff ID, I write a second query to find the max sale, as below:
SELECT s.ProductID,Max(s.Amount) from Sales s WHERE StaffID = 'S01' GROUP BY s.ProductID
The max sale for Staff 'S01' as below:
ProductID | Amount
------------------
Cloth | 2000
How do I combine these 2 queries to get the result below? Thanks
AccNo | Name | ID | Amount
--------------------------
1 | Alex | S01 | 2000
2 | John | S02 | 5000
You can create a subquery and join it:
SELECT a.AccNo, b.Name, b.ID, c.maximum
FROM transaction as a
INNER JOIN Account as b
ON a.AccNo = b.AccNo
LEFT JOIN (SELECT StaffID, Max(Amount) as maximum FROM Sales GROUP BY StaffID) as c
ON c.StaffID = b.ID
WHERE b.Status = 'Active'
See the SQLFiddle example (I've tried to guess the schema)
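Here is a runnable sketch of the subquery-join approach, written against the Staff and Account tables from the question and run through Python's sqlite3 module. The column layout is guessed (Name and ID on Staff, Status on Account), and every Sales row except the two maximum amounts, as well as the inactive staff member, is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Guessed schema; the question does not show the table definitions.
    CREATE TABLE Staff (AccNo INTEGER, Name TEXT, ID TEXT);
    CREATE TABLE Account (AccNo INTEGER, Status TEXT);
    CREATE TABLE Sales (StaffID TEXT, ProductID TEXT, Amount INTEGER);
    INSERT INTO Staff VALUES (1, 'Alex', 'S01'), (2, 'John', 'S02'), (3, 'Zed', 'S03');
    INSERT INTO Account VALUES (1, 'Active'), (2, 'Active'), (3, 'Closed');
    INSERT INTO Sales VALUES ('S01', 'Cloth', 2000), ('S01', 'Shoes', 1500),
                             ('S02', 'Cloth', 5000), ('S02', 'Hats', 300),
                             ('S03', 'Cloth', 9000);
""")

# The derived table m holds one row per staff member with their largest sale;
# joining it avoids multiplying the Staff rows by the number of sales.
MAX_SALE_SQL = """
    SELECT s.AccNo, s.Name, s.ID, m.MaxAmount
    FROM Staff s
    INNER JOIN Account a ON a.AccNo = s.AccNo
    LEFT JOIN (
        SELECT StaffID, MAX(Amount) AS MaxAmount
        FROM Sales
        GROUP BY StaffID
    ) m ON m.StaffID = s.ID
    WHERE a.Status = 'Active'
    ORDER BY s.AccNo
"""

rows = conn.execute(MAX_SALE_SQL).fetchall()
print(rows)
```

The LEFT JOIN keeps active staff who have no sales at all; their amount simply comes back as NULL.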
So what you want to do is join to sales on the staffId then group.
SELECT a.AccNo,a.Name,a.ID,Max(s.Amount)
FROM Transaction t
INNER JOIN Account a on t.AccNo = a.AccNo
INNER JOIN Sales s on s.staffId = a.ID
WHERE a.Status = 'Active'
GROUP BY a.AccNo,a.Name,a.ID
You could try something like this:
Select Account.*, Max(Sales.amount) from Sales
JOIN Account ON Sales.StaffID = Account.ID
where Account.status = 'Active'
group by Sales.ProductID, Account.AccNo, Account.Name, Account.ID
Honestly, I don't understand why you include the Transaction table in your queries, because you never actually use it.
I think this should work.
Just do a join and retrieve the max amount associated with each staff member and product:
SELECT t.AccNo, t.Name, t.ID, s.ProductID, Max(s.Amount) FROM Transaction t
INNER JOIN Account a ON t.AccNo = a.AccNo
INNER JOIN Sales s ON s.StaffID = a.ID
WHERE a.Status = 'Active'
GROUP BY t.AccNo, t.Name, t.ID, s.ProductID;
Thanks

Join from multiple tables not returning desired data in MySQL

I have three tables in my database (contracts, partners and customers); a contract can be used for both customers and partners.
I keep the contract data only in the contracts table. The customers and partners tables each contain a field called contract_id, which is a foreign key to the contracts id field.
Now I want to select contracts and show them in a list, but if a contract has been used for a customer and a partner simultaneously, I want my list to show both of them, but I can't make it work.
My query is:
SELECT c.*, p.id AS partner_id, p.name AS partner_name,
cu.id AS customer_id, cu.name AS customer_name
FROM contracts AS c
LEFT JOIN partners AS p ON c.id = p.contract_id
LEFT JOIN customers AS cu ON c.id = cu.contract_id
SAMPLES:
Records of contract table are like the following:
id | title | contract_start | contract_end
-------------------------------------------------------------
1 | Test | 2012-10-02 | 2013-10-02
2 | Test2 | 2012-09-27 | 2013-09-27
Records of customers table are like:
id | code | name | contract_id
-------------------------------------------------------------
1 | 123456 | Customer1 | 1
2 | 654321 | Dummy Co. LTD. | 2
Records of partners table are like:
id | code | name | contract_id
-------------------------------------------------------------
1 | 789456 | Partner1 | 1
Now I want a list with 3 records, each one showing a contract (one of them is repeated) and each one showing a partner or customer name and id. If a contract involves a customer, the partner fields should be null, and vice versa.
You say:
Now I want to select contracts and show them in a list, but if a contract has been used for a customer and a partner simultaneously, I want my list to show both of them
You probably need to join the two tables separately and use a UNION:
SELECT c.*,
p.id AS partner_id, p.name AS partner_name,
NULL AS customer_id, NULL AS customer_name
FROM contracts AS c
INNER JOIN partners AS p ON c.id = p.contract_id
UNION ALL
SELECT c.*,
NULL AS partner_id, NULL AS partner_name,
cu.id AS customer_id, cu.name AS customer_name
FROM contracts AS c
INNER JOIN customers AS cu ON c.id = cu.contract_id
UNION ALL
SELECT c.*,
NULL, NULL,
NULL, NULL
FROM contracts AS c
WHERE NOT EXISTS ( SELECT *
FROM partners AS p
WHERE c.id = p.contract_id
)
AND NOT EXISTS ( SELECT *
FROM customers AS cu
WHERE c.id = cu.contract_id
) ;
(Another way with different output setup)
If you prefer, you can combine the last four columns into two, adding a column to distinguish between partners and customers:
SELECT c.*,
p.id AS partner_customer_id, p.name AS partner_customer_name, 'P' AS type
FROM contracts AS c
INNER JOIN partners AS p ON c.id = p.contract_id
UNION ALL
SELECT c.*,
cu.id , cu.name, 'C'
FROM contracts AS c
INNER JOIN customers AS cu ON c.id = cu.contract_id ;
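To see what the combined-column variant returns for the sample records in the question, here is a sketch using Python's sqlite3 module. The column aliases party_id, party_name and party_type, the explicit column list in place of c.*, and the ORDER BY are assumptions added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contracts (id INTEGER, title TEXT, contract_start TEXT, contract_end TEXT);
    CREATE TABLE customers (id INTEGER, code TEXT, name TEXT, contract_id INTEGER);
    CREATE TABLE partners (id INTEGER, code TEXT, name TEXT, contract_id INTEGER);
    INSERT INTO contracts VALUES (1, 'Test', '2012-10-02', '2013-10-02'),
                                 (2, 'Test2', '2012-09-27', '2013-09-27');
    INSERT INTO customers VALUES (1, '123456', 'Customer1', 1),
                                 (2, '654321', 'Dummy Co. LTD.', 2);
    INSERT INTO partners VALUES (1, '789456', 'Partner1', 1);
""")

# One branch per party table; UNION ALL stacks the rows, and the literal
# 'P' / 'C' records which table each row came from.
UNION_SQL = """
    SELECT c.id AS contract_id, c.title,
           p.id AS party_id, p.name AS party_name, 'P' AS party_type
    FROM contracts AS c
    INNER JOIN partners AS p ON c.id = p.contract_id
    UNION ALL
    SELECT c.id, c.title, cu.id, cu.name, 'C'
    FROM contracts AS c
    INNER JOIN customers AS cu ON c.id = cu.contract_id
    ORDER BY contract_id, party_type
"""

rows = conn.execute(UNION_SQL).fetchall()
for row in rows:
    print(row)
```

Contract 1 appears twice (once for its customer, once for its partner) and contract 2 once, which gives the three records asked for.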