MySQL many-to-many JOIN returning duplicates - mysql

I have three tables: users, jobs and users_jobs. One user can have many jobs and one job can have many users.
Here is my users table:
+----+------+--------+
| ID | Name | gender |
+----+------+--------+
| 1 | Bob | male |
| 2 | Sara | female |
+----+------+--------+
my jobs table:
+----+----------+
| id | job_id |
+----+----------+
| 1 | Engineer |
| 2 | Plumber |
| 3 | Doctor |
+----+----------+
users_jobs table:
+---------+--------+
| user_id | job_id |
+---------+--------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
+---------+--------+
As an example, I want to select all males who have at least one job and return their info. If a user has no jobs, that user should not be returned. This is my query:
SELECT * FROM users
INNER JOIN users_jobs
ON users_jobs.user_id = users.id
WHERE users.gender = 'male'
But it returns Bob's info 3 times since he has 3 jobs. I don't want duplicates. How can I get rid of them without using DISTINCT or GROUP BY? Performance is very important here.
Thank you!

MySQL allows one slightly odd thing: you can select columns that are neither listed in the GROUP BY clause nor wrapped in an aggregate function (most other SQL engines do not allow this). Since MySQL 5.7.5 the default ONLY_FULL_GROUP_BY mode still permits it when the extra columns are functionally dependent on the GROUP BY columns, for example when you group by a primary key. While this can sometimes produce unexpected results, it works as long as you don't select data that can differ between rows of the same group.
So, for your question: the query WILL return multiple rows for the same user, as some of them have many jobs (busy life, huh?). You generally can't get all of a user's jobs in a single row, because each row of the join is the user's data plus one of their jobs; that's what we JOIN on. But that's not entirely true: you can use GROUP BY and GROUP_CONCAT() to concatenate the jobs into a single string. I wouldn't generally recommend it, but if it's what you need...
SELECT u.Name, GROUP_CONCAT(j.job_id SEPARATOR ', ') as jobs
FROM users u
JOIN users_jobs uj
ON u.ID = uj.user_id
JOIN jobs j
ON j.id = uj.job_id
GROUP BY u.ID
This would return
Name | jobs
--------+-------------------------------
Bob | Engineer, Plumber, Doctor
Sara | Engineer
If you only want males, add a WHERE clause:
SELECT u.Name, GROUP_CONCAT(j.job_id SEPARATOR ', ') as jobs
FROM users u
JOIN users_jobs uj
ON u.ID = uj.user_id
JOIN jobs j
ON j.id = uj.job_id
WHERE u.gender = 'male'
GROUP BY u.ID
See live fiddle at http://sqlfiddle.com/#!9/df0afe/2
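For anyone who wants to try the GROUP BY + GROUP_CONCAT idea without a MySQL server, here is a minimal sketch in Python. It assumes SQLite as a stand-in for MySQL (SQLite spells the separator as a second argument, group_concat(x, ', ')), and the function name jobs_per_user is made up for this example. The tables and rows are the ones from the question.

```python
import sqlite3

def jobs_per_user(gender=None):
    """Return {name: sorted job names}, one entry per user with at least one job.

    Hypothetical helper; SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (ID INTEGER PRIMARY KEY, Name TEXT, gender TEXT);
        CREATE TABLE jobs (id INTEGER PRIMARY KEY, job_id TEXT);
        CREATE TABLE users_jobs (user_id INTEGER, job_id INTEGER);
        INSERT INTO users VALUES (1, 'Bob', 'male'), (2, 'Sara', 'female');
        INSERT INTO jobs VALUES (1, 'Engineer'), (2, 'Plumber'), (3, 'Doctor');
        INSERT INTO users_jobs VALUES (1, 1), (1, 2), (1, 3), (2, 1);
    """)
    sql = """
        SELECT u.Name, group_concat(j.job_id, ', ') AS jobs
        FROM users u
        JOIN users_jobs uj ON u.ID = uj.user_id
        JOIN jobs j ON j.id = uj.job_id
        WHERE (:gender IS NULL OR u.gender = :gender)  -- optional gender filter
        GROUP BY u.ID                                  -- one output row per user
    """
    rows = conn.execute(sql, {"gender": gender}).fetchall()
    conn.close()
    # The order inside the concatenated string is not guaranteed, so sort it.
    return {name: sorted(jobs.split(", ")) for name, jobs in rows}
```

Calling jobs_per_user("male") gives a single entry for Bob, however many jobs he has.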

This may help you.
You can use the LIMIT keyword to limit the number of records fetched:
SELECT * FROM users
INNER JOIN users_jobs
ON users_jobs.user_id = users.id
WHERE users.gender = 'male'
LIMIT 1;
Note that LIMIT 1 caps the whole result at one row, so it only removes the duplicates when a single user matches.
Hope this helps!
Thanks!

To follow on from the comments: even with performance in mind, you need a DISTINCT in your query. Try:
SELECT DISTINCT Name FROM users
INNER JOIN users_jobs
ON users_jobs.user_id = users.id
WHERE users.gender = 'male'
If you want all the columns but distinct ids, you can use a GROUP BY. Try:
SELECT * FROM users
INNER JOIN users_jobs
ON users_jobs.user_id = users.id
WHERE users.gender = 'male'
GROUP BY users.id
This will also affect performance, though; it depends on what you prioritize most.
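Since the question asks for something without DISTINCT or GROUP BY, one alternative that none of the answers above spell out is an EXISTS check (a semi-join): it only tests whether a matching users_jobs row exists, so each user comes back at most once. This is a rough sketch, not a benchmark. It assumes SQLite in place of MySQL, the function name users_with_a_job is invented, and the jobless user Tom is a made-up row added to show the filtering.

```python
import sqlite3

def users_with_a_job(gender):
    """Return users of the given gender that have at least one job, without duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (ID INTEGER PRIMARY KEY, Name TEXT, gender TEXT);
        CREATE TABLE users_jobs (user_id INTEGER, job_id INTEGER);
        -- Bob and Sara come from the question; Tom is a hypothetical jobless user.
        INSERT INTO users VALUES (1, 'Bob', 'male'), (2, 'Sara', 'female'), (3, 'Tom', 'male');
        INSERT INTO users_jobs VALUES (1, 1), (1, 2), (1, 3), (2, 1);
    """)
    sql = """
        SELECT u.ID, u.Name, u.gender
        FROM users u
        WHERE u.gender = ?
          AND EXISTS (SELECT 1 FROM users_jobs uj WHERE uj.user_id = u.ID)
        ORDER BY u.ID
    """
    rows = conn.execute(sql, (gender,)).fetchall()
    conn.close()
    return rows
```

Bob is returned once even though he has three jobs, and Tom is left out because he has none. Whether this is faster than DISTINCT on your data is something to check with EXPLAIN.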

Related

Get data from one table and count matching records from another

I'm not sure if this is possible. I have one table members and a second table transactions.
I need to get the name of the member from the members table, but also count the number of transactions that member has made from another table. Is this even possible in a JOIN statement, or do I need to write two statements?
SELECT
m.first_name,
m.last_name,
COUNT(t.giver_id),
COUNT(t.getter_id)
FROM
members AS m
JOIN
transactions AS t
ON
m.id = t.giver_id
WHERE
m.id = $i
I should add that it's possible a member has not made any transactions and would therefore not appear in the transactions table.
When I run this code, it returns all NULL columns. When I add the EXPLAIN statement, MySQL says "Impossible WHERE noticed after reading const table..."
Is this possible? If so, then what am I doing wrong? Thanks in advance.
EDIT:
Sample data structure and expected output:
members
id | first_name | last_name
_______________________________
1 | Bill | Smith
2 | Joe | Jones
transactions table
id | giver_id | getter_id | status
________________________________________
1 | 1 | 2 | complete
2 | 1 | 2 | complete
So running my query should return:
1 | Bill | Smith | 2 | 0
2 | Joe | Jones | 0 | 2
A simple LEFT JOIN should suffice:
SELECT
m.first_name,
m.last_name,
SUM(CASE WHEN m.id = t.giver_id THEN 1 ELSE 0 END) AS giver_count,
SUM(CASE WHEN m.id = t.getter_id THEN 1 ELSE 0 END) AS getter_count
FROM members AS m
LEFT JOIN transactions AS t ON m.id = t.giver_id OR m.id = t.getter_id
GROUP BY m.first_name, m.last_name
The ELSE 0 makes a member with no matching rows count as 0 rather than NULL. Do not forget to add GROUP BY when using aggregate functions. Just because MySQL lets the query through without it doesn't mean it is advisable. MySQL will pick arbitrary row values for unaggregated columns, which can be problematic. Avoid this anti-pattern.
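To check the LEFT JOIN approach against the sample data from the question, here is a small sketch in Python. It assumes SQLite as a stand-in for MySQL, groups by m.id as well so the id can be shown, and uses an ELSE 0 branch so members without matching rows get 0. The function name transaction_counts and the third member (Ann Lee, who has no transactions) are made up for illustration.

```python
import sqlite3

def transaction_counts():
    """Return (id, first_name, last_name, giver_count, getter_count) per member."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
        CREATE TABLE transactions (id INTEGER PRIMARY KEY, giver_id INTEGER,
                                   getter_id INTEGER, status TEXT);
        -- Bill and Joe are from the question; Ann is a hypothetical member
        -- without any transactions.
        INSERT INTO members VALUES (1, 'Bill', 'Smith'), (2, 'Joe', 'Jones'), (3, 'Ann', 'Lee');
        INSERT INTO transactions VALUES (1, 1, 2, 'complete'), (2, 1, 2, 'complete');
    """)
    sql = """
        SELECT m.id, m.first_name, m.last_name,
               SUM(CASE WHEN m.id = t.giver_id  THEN 1 ELSE 0 END) AS giver_count,
               SUM(CASE WHEN m.id = t.getter_id THEN 1 ELSE 0 END) AS getter_count
        FROM members AS m
        LEFT JOIN transactions AS t
               ON m.id = t.giver_id OR m.id = t.getter_id
        GROUP BY m.id, m.first_name, m.last_name
        ORDER BY m.id
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

The LEFT JOIN keeps Ann in the result with two zero counts; an inner JOIN would drop her.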

Oh MySQL, how doth thou join related records by date

I have what can only be described as a seemingly simple problem with most likely a simple solution; yet that simple solution escapes me. I've searched and trudged through the vast web of StackOverflow only to come up short finding only solutions that seem to be extremely complex. I've already kind of solved this problem, but, it's disgusting and I'm ashamed of it. Surely there is a better way, for there must be a Knight in shining MySQL armor wielding a query sword who can come up with a simple and elegant solution. Here goes:
For the sake of this question, we'll keep the two tables simple.
Table 1
users (user_id, active, name)
and
Table 2
user_projects (user_project_id, user_id, start_date, details)
In this case, records for users are added as needed. Related records are then added as projects are completed by users.
My question is: Using a query, how can I get 1 record containing all of the information for each active user in table 1, joined with the record from table 2 that has the most recent start_date?
In other words, if I have this:
| user_id | active | name |
| 1 | 1 | brian |
and this:
| user_project_id | user_id | start_date | details |
| 1 | 1 | 2013-10-02 | proj 1 |
| 2 | 1 | 2013-11-26 | proj 2 |
| 3 | 1 | 2014-01-02 | proj 3 |
produce the query that gives me this:
| user_id | active | name | user_project_id | user_id | start_date | details |
| 1 | 1 | brian | 3 | 1 | 2014-01-02 | proj 3 |
Oh please oh please let there be an answer for I will surely wither without one.
Since MySQL has no TOP selector, you can use a triple JOIN, like:
SELECT
user_projects.*,
users.*
FROM
(SELECT
MAX(start_date) AS max_date,
user_id
FROM
user_projects
GROUP BY
user_id) AS max_dates
LEFT JOIN
user_projects
ON max_dates.max_date=user_projects.start_date
AND max_dates.user_id=user_projects.user_id
LEFT JOIN
users
ON user_projects.user_id=users.user_id
First select the projects per user that are the most recent:
SELECT a.*
FROM user_projects a
JOIN (
SELECT user_id, MAX(start_date) AS max_start_date
FROM user_projects
GROUP BY user_id
) b ON a.user_id = b.user_id AND a.start_date = b.max_start_date
It creates a small helper table from user_projects comprising each user and their most recent project date; to get all the corresponding table fields you must join that with user_projects again.
Then, you simply join users with the above outcome to get the final result:
SELECT *
FROM users
JOIN (
SELECT a.*
FROM user_projects a
JOIN (
SELECT user_id, MAX(start_date) AS max_start_date
FROM user_projects
GROUP BY user_id
) b ON a.user_id = b.user_id AND a.start_date = b.max_start_date
) c ON users.user_id = c.user_id
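The "join against the per-user MAX(start_date)" idea can be tried on the sample rows with the sketch below. It assumes SQLite instead of MySQL and stores the dates as ISO text, which compares correctly as strings. The function name latest_project_per_user is invented, the WHERE u.active = 1 filter comes from the question's "active users" requirement rather than from the answers, and the inactive user 'old_user' is a made-up row.

```python
import sqlite3

def latest_project_per_user():
    """Return each active user together with their most recent project."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, active INTEGER, name TEXT);
        CREATE TABLE user_projects (user_project_id INTEGER PRIMARY KEY, user_id INTEGER,
                                    start_date TEXT, details TEXT);
        -- brian is from the question; old_user is a hypothetical inactive user.
        INSERT INTO users VALUES (1, 1, 'brian'), (2, 0, 'old_user');
        INSERT INTO user_projects VALUES
            (1, 1, '2013-10-02', 'proj 1'),
            (2, 1, '2013-11-26', 'proj 2'),
            (3, 1, '2014-01-02', 'proj 3'),
            (4, 2, '2012-05-05', 'proj 4');
    """)
    sql = """
        SELECT u.user_id, u.active, u.name,
               p.user_project_id, p.start_date, p.details
        FROM users u
        JOIN user_projects p ON p.user_id = u.user_id
        JOIN (
            -- helper table: latest start_date per user
            SELECT user_id, MAX(start_date) AS max_start_date
            FROM user_projects
            GROUP BY user_id
        ) latest ON latest.user_id = p.user_id
                AND latest.max_start_date = p.start_date
        WHERE u.active = 1
        ORDER BY u.user_id
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

If a user has two projects with the same latest start_date, this pattern returns both rows, so a tie-breaker would be needed in that case.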
select u1.user_id, u1.name, u.start_date, u.user_project_id, u.details
from user_projects u
join users u1 on u1.user_id=u.user_id
join
(select user_id, max(start_date) as start_date from user_projects group by user_id) as A
on A.user_id=u.user_id and A.start_date=u.start_date
You can use top 1 and order by date
select top 1 a.user_id,a.name,b.user_project_id,b.user_id,b.start_date,b.details
from users a join user_projects b on a.user_id = b.user_id
order by b.start_date desc
Sorry, I didn't notice the mysql tag in the OP's question. You can use the LIMIT keyword instead.
select a.user_id,a.name,b.user_project_id,b.user_id,b.start_date,b.details
from users a join user_projects b on a.user_id = b.user_id
order by b.start_date desc limit 1

SQL complicated select statement

I am trying to create a SELECT statement, but I am not really sure how to accomplish it.
I have 2 tables, user and group. Each user has a userid and each group has a ownerid that specifies who owns the group. Each group also has a name and then inside the user table, there is a column group designating which group that person belongs to. (excuse the annoying structure, I did not create it). I am trying to find all rows in group where the ownerid of that group does not have group (inside the user table) set to the name of that group. If this helps:
User
|-----------------------|
| id | username | group |
|----|----------|-------|
| 0 | Steve | night |
| 1 | Sally | night |
| 2 | Susan | sun |
| 3 | David | xray |
|-----------------------|
Group
|---------------------|
| ownerid | name |
|---------|-----------|
| 1 | night |
| 3 | bravo |
| 2 | sun |
|---------------------|
Where the SQL statement would return the group row for bravo because bravo's owner does not have his group set to bravo.
This is a join back to the original table and then a comparison of the values:
select g.*
from `group` g join
user u
on g.ownerid = u.id
where g.name <> u.`group`;
If the values can be NULL, then the logic would need to take that into account.
An anti-join is a familiar pattern:
SELECT g.*
FROM `Group` g
LEFT
JOIN `User` u
ON u.group = g.name
AND u.id = g.ownerid
WHERE u.id IS NULL
Let's unpack that a bit. We're going to start with returning all rows from Group. Then, we're going to "match" each row in Group with a row (or rows) from User. To be considered a "match", the User.id has to match the Group.ownerid, and the User.group value has to match the Group.name.
The "trick" is to eliminate all rows where we found a match (that's what the WHERE clause does), and that leaves us with only those rows from Group that didn't have a match.
Another way to obtain an equivalent result is to use a NOT EXISTS predicate:
SELECT g.*
FROM `Group` g
WHERE NOT EXISTS
( SELECT 1
FROM `User` u
WHERE u.group = g.name
AND u.id = g.ownerid
)
This uses a correlated subquery; it usually doesn't perform as fast as a join.
Note that these have the potential to return a slightly different result than the query from Gordon Linoff if Group has a row whose ownerid value isn't in the user table: the inner join drops such a row, while these two queries return it.
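Here is a quick way to run both the anti-join and the NOT EXISTS version against the sample tables and confirm they agree. It is only a sketch: SQLite is assumed as a stand-in for MySQL (it also accepts backtick-quoted identifiers, and it needs the `group` column quoted even after the alias), and the function name groups_not_joined_by_owner is made up.

```python
import sqlite3

def groups_not_joined_by_owner():
    """Return (anti_join_rows, not_exists_rows) for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE `user` (id INTEGER PRIMARY KEY, username TEXT, `group` TEXT);
        CREATE TABLE `group` (ownerid INTEGER, name TEXT);
        INSERT INTO `user` VALUES
            (0, 'Steve', 'night'), (1, 'Sally', 'night'),
            (2, 'Susan', 'sun'),   (3, 'David', 'xray');
        INSERT INTO `group` VALUES (1, 'night'), (3, 'bravo'), (2, 'sun');
    """)
    # Anti-join: keep only the groups for which the LEFT JOIN found no match.
    anti_join = conn.execute("""
        SELECT g.ownerid, g.name
        FROM `group` g
        LEFT JOIN `user` u
               ON u.`group` = g.name
              AND u.id = g.ownerid
        WHERE u.id IS NULL
    """).fetchall()
    # Same condition expressed as a correlated NOT EXISTS subquery.
    not_exists = conn.execute("""
        SELECT g.ownerid, g.name
        FROM `group` g
        WHERE NOT EXISTS (
            SELECT 1 FROM `user` u
            WHERE u.`group` = g.name
              AND u.id = g.ownerid
        )
    """).fetchall()
    conn.close()
    return anti_join, not_exists
```

Both lists contain only the bravo row, because David (id 3) owns bravo but has his group set to xray.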
SELECT G.*
FROM `Group` AS G
WHERE G.Name NOT IN (SELECT DISTINCT U.`Group` FROM `User` AS U)

SQL Count Rows From Multiple tables

Let's say I have 2 tables
Companies
company_id
name
Users
id
company_id
name
Each company has multiple users assigned to it, referenced by the company_id field of each record in the users table.
How can I get a record showing the (company_id), (company_name) and (number of users)?
For example:
id# 1234 | name# Microsoft | n of users# 2000
I don't know how to write this query. I know I have to use the COUNT() function, but I don't know how.
If you want to get all companies, even those that don't have any users yet, use an OUTER JOIN:
SELECT c.company_id, c.name company_name, COUNT(u.id) no_of_users
FROM companies c LEFT JOIN users u
ON c.company_id = u.company_id
GROUP BY c.company_id, c.name
Sample output:
| COMPANY_ID | COMPANY_NAME | NO_OF_USERS |
|------------|--------------|-------------|
| 1 | Company1 | 3 |
| 2 | Company2 | 2 |
| 3 | Company3 | 0 |
Here is SQLFiddle demo
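In case the fiddle is unavailable, the LEFT JOIN + COUNT query can be reproduced locally with the sketch below. It assumes SQLite rather than MySQL, the function name users_per_company is invented, and the company and user rows are made-up data chosen to match the sample output (3, 2 and 0 users).

```python
import sqlite3

def users_per_company():
    """Return (company_id, company_name, no_of_users) for every company."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, company_id INTEGER, name TEXT);
        -- Hypothetical rows: Company3 deliberately has no users.
        INSERT INTO companies VALUES (1, 'Company1'), (2, 'Company2'), (3, 'Company3');
        INSERT INTO users VALUES
            (1, 1, 'User1'), (2, 1, 'User2'), (3, 1, 'User3'),
            (4, 2, 'User4'), (5, 2, 'User5');
    """)
    sql = """
        SELECT c.company_id, c.name AS company_name, COUNT(u.id) AS no_of_users
        FROM companies c
        LEFT JOIN users u ON c.company_id = u.company_id
        GROUP BY c.company_id, c.name
        ORDER BY c.company_id
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

COUNT(u.id) counts only non-NULL values, which is why Company3 shows 0 instead of 1; COUNT(*) would count the NULL-padded row produced by the LEFT JOIN.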
This will be the query:
select Companies.company_id, Companies.name, count(Users.id) from Companies, Users where Companies.company_id = Users.company_id group by Companies.company_id, Companies.name
Try:
SELECT companies.company_id, companies.name, COUNT(users.id)
FROM companies, users
WHERE companies.company_id = users.company_id
GROUP BY companies.company_id, companies.name

MySQL: Limit Results related to FROM with Joins that have multiple subelements

I have a main table that I select from and a table with subelements that I select from in a join. Example:
person
id | name
---------
1 | jim
2 | peter
3 | susan
4 | kevin
skill
id | skill
----------
1 | sewing
2 | cooking
3 | singing
person_to_skill
id | p_id | s_id
----------------
1 | 1 | 2
2 | 2 | 1
3 | 2 | 3
4 | 3 | 1
5 | 3 | 2
6 | 4 | 3
So now we see, jim has only one skill, peter has two and so forth.
Now I select from person and join person_to_skill and skill, but I only want two persons. How do I manage that without grouping and thereby losing some of the skills?
In short: I want to select two persons from "person" with all their skills.
I tried just using LIMIT but that limits the result rows, not the persons.
If I use GROUP BY I only get one skill per person.
Is this possible without a subselect?
Any ideas anyone?
My Approach so far, changed to work with the example, looks like this:
SELECT p.id,p.name,s.skill
FROM person AS p
LEFT JOIN person_to_skill psk ON (psk.p_id = p.id)
LEFT JOIN skill s ON (s.id = psk.s_id)
ORDER BY p.name
LIMIT 0,2
Limit the number of persons at the very beginning in a subquery, then join the other tables to them as you've already done:
SELECT p.id,p.name,s.skill
FROM (select * from person ORDER BY name LIMIT 0,2) AS p
LEFT JOIN person_to_skill psk ON (psk.p_id = p.id)
LEFT JOIN skill s ON (s.id = psk.s_id)
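To see that the LIMIT inside the subquery limits persons rather than result rows, here is a small sketch using the tables from the question. It assumes SQLite in place of MySQL, the function name first_persons_with_skills is made up, and an outer ORDER BY is added only to make the output order predictable.

```python
import sqlite3

def first_persons_with_skills(n):
    """Return (id, name, skill) rows for the first n persons by name, all skills included."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE skill (id INTEGER PRIMARY KEY, skill TEXT);
        CREATE TABLE person_to_skill (id INTEGER PRIMARY KEY, p_id INTEGER, s_id INTEGER);
        INSERT INTO person VALUES (1, 'jim'), (2, 'peter'), (3, 'susan'), (4, 'kevin');
        INSERT INTO skill VALUES (1, 'sewing'), (2, 'cooking'), (3, 'singing');
        INSERT INTO person_to_skill VALUES
            (1, 1, 2), (2, 2, 1), (3, 2, 3), (4, 3, 1), (5, 3, 2), (6, 4, 3);
    """)
    sql = """
        SELECT p.id, p.name, s.skill
        FROM (SELECT * FROM person ORDER BY name LIMIT ?) AS p  -- limit persons first
        LEFT JOIN person_to_skill psk ON psk.p_id = p.id
        LEFT JOIN skill s ON s.id = psk.s_id
        ORDER BY p.name, s.skill
    """
    rows = conn.execute(sql, (n,)).fetchall()
    conn.close()
    return rows
```

With n = 3 the result has four rows for three persons (jim, kevin, peter), because peter keeps both of his skills.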
Added after comment:
If you really can't use subqueries, you can do it using two queries. First select the person ids:
select id from person ORDER BY name LIMIT 0,2
and then use those ids in the next query:
SELECT p.id,p.name,s.skill
FROM person p
LEFT JOIN person_to_skill psk ON (psk.p_id = p.id)
LEFT JOIN skill s ON (s.id = psk.s_id)
WHERE p.id IN (ids from previous query)
You can do something like
SELECT p.id, p.name, group_concat(s.skill separator ',')
and then group by person and limit the number of rows.
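The group_concat suggestion can be written out in full roughly as follows. This is a sketch under a few assumptions: SQLite stands in for MySQL (so the separator is passed as a second argument instead of the SEPARATOR keyword), the function name persons_with_skill_list is invented, and the skills are split back into a sorted list because the order inside the concatenated string is not guaranteed.

```python
import sqlite3

def persons_with_skill_list(n):
    """Return (id, name, [skills]) for the first n persons by name, one row per person."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE skill (id INTEGER PRIMARY KEY, skill TEXT);
        CREATE TABLE person_to_skill (id INTEGER PRIMARY KEY, p_id INTEGER, s_id INTEGER);
        INSERT INTO person VALUES (1, 'jim'), (2, 'peter'), (3, 'susan'), (4, 'kevin');
        INSERT INTO skill VALUES (1, 'sewing'), (2, 'cooking'), (3, 'singing');
        INSERT INTO person_to_skill VALUES
            (1, 1, 2), (2, 2, 1), (3, 2, 3), (4, 3, 1), (5, 3, 2), (6, 4, 3);
    """)
    sql = """
        SELECT p.id, p.name, group_concat(s.skill, ',') AS skills
        FROM person p
        LEFT JOIN person_to_skill psk ON psk.p_id = p.id
        LEFT JOIN skill s ON s.id = psk.s_id
        GROUP BY p.id, p.name   -- one row per person, so LIMIT now counts persons
        ORDER BY p.name
        LIMIT ?
    """
    rows = conn.execute(sql, (n,)).fetchall()
    conn.close()
    # group_concat yields NULL for a person without skills; map that to [].
    return [(pid, name, sorted(skills.split(",")) if skills else [])
            for pid, name, skills in rows]
```

Because each person collapses to one row, LIMIT applies to persons here, at the cost of getting the skills back as one string that the caller has to split.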