LEFT JOIN but with WHERE criteria, rows getting lost

I have a simple database with three tables. In the database I have a table for users of my system, a table for applications to a competition, and an intermediary table that allows me to track which users have selected which applications to view.
Table 1 = users (user_id, username, first, last, etc...)
Table 2 = applications (application_id, company_name, url, etc...)
Table 3 = picks (pick_id, user_id, application_id, picked)
I am trying to write an SQL query that will show all the applications that have been submitted and if any individual application has been selected by a user will show that it has been "picked" (1=picked, 0=not picked).
So for user_id = 1 I'd like to see:
Column Names (application_id, company_name, picked)
1, Foo, 1
2, Bar, 1
3, Alpha, Null
4, Beta, Null
I tried it with the following query:
SELECT applications.application_id, applications.company_name, picks.picked
FROM applications
LEFT JOIN picks ON applications.application_id = picks.application_id
ORDER BY applications.application_id ASC
Which is returning this:
1, Foo, 1
1, Foo, 1
2, Bar, null
3, Alpha, null
4, Beta, null
I have a second user (user_id = 2) that also picked application 1 ("Foo") which I know is returning the second row.
Then I tried to limit the scope by specifying user_id = 1 here:
SELECT applications.application_id, applications.company_name, picks.picked
FROM applications
LEFT JOIN picks ON applications.application_id = picks.application_id
WHERE user_id = 1
ORDER BY applications.application_id ASC
Now I'm only getting:
1, Foo, 1
Any suggestions on how I can get what I'm looking for? Again, ideally for a single user I'd like to see:
Column Names (application_id, company_name, picked)
1, Foo, 1
2, Bar, 1
3, Alpha, Null
4, Beta, Null

You have a so-called join table in your database schema. In your case it's called picks. This allows you to create a many-to-many relationship between your users and applications.
To use that join table correctly you need to join all three tables. These queries are easier to write if you use table aliases (applications AS a, etc.)
SELECT a.application_id, a.company_name, p.picked, u.user_id, u.username
FROM applications AS a
LEFT JOIN picks AS p ON a.application_id = p.application_id
LEFT JOIN users AS u ON p.user_id = u.user_id
ORDER BY a.application_id, u.user_id
This will give you a list of all applications with the users who have picked them. If no users are related to an application, the LEFT JOIN operations will retain the application row and you'll see NULL values for columns from the picks and users tables.
Now, if you add a WHERE p.something = something or u.something = something clause to this query in an attempt to narrow down the presentation, it has the effect of converting the LEFT JOIN clauses into INNER JOIN clauses. That is, you won't retain the applications rows that don't have matching rows in the other tables.
If you want to retain those unmatched rows in your result set, put the condition in the first ON clause instead of the WHERE clause, like so.
SELECT a.application_id, a.company_name, p.picked, u.user_id, u.username
FROM applications AS a
LEFT JOIN picks AS p ON a.application_id = p.application_id AND p.user_id = 1
LEFT JOIN users AS u ON p.user_id = u.user_id
ORDER BY a.application_id, u.user_id
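To see the difference between filtering in WHERE and filtering in ON, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The sample rows are assumptions based on the question (users 1 and 2 both picked application 1), and the users table is left out to keep it short.

```python
import sqlite3

# Hypothetical sample data mirroring the question: users 1 and 2 both picked application 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (application_id INTEGER PRIMARY KEY, company_name TEXT);
    CREATE TABLE picks (pick_id INTEGER PRIMARY KEY, user_id INTEGER,
                        application_id INTEGER, picked INTEGER);
    INSERT INTO applications VALUES (1, 'Foo'), (2, 'Bar'), (3, 'Alpha'), (4, 'Beta');
    INSERT INTO picks VALUES (1, 1, 1, 1), (2, 2, 1, 1);
""")

# Filter in WHERE: unmatched applications carry p.user_id = NULL, so the filter discards them.
where_rows = conn.execute("""
    SELECT a.application_id, a.company_name, p.picked
    FROM applications AS a
    LEFT JOIN picks AS p ON a.application_id = p.application_id
    WHERE p.user_id = 1
    ORDER BY a.application_id
""").fetchall()

# Filter in ON: the condition only limits which picks rows match, every application survives.
on_rows = conn.execute("""
    SELECT a.application_id, a.company_name, p.picked
    FROM applications AS a
    LEFT JOIN picks AS p ON a.application_id = p.application_id AND p.user_id = 1
    ORDER BY a.application_id
""").fetchall()

print(where_rows)  # [(1, 'Foo', 1)]
print(on_rows)     # [(1, 'Foo', 1), (2, 'Bar', None), (3, 'Alpha', None), (4, 'Beta', None)]
```

The NULLs come back as None in Python. Wrap p.picked in COALESCE(p.picked, 0) if you would rather see 0 for "not picked".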
Edit: Many join tables like your picks table are set up with a composite primary key, in your example (application_id, user_id). That ensures just one row per possible relationship between the tables being joined. In your case you have the potential for multiple such rows.
To use only the most recent of those rows (the one with the highest pick_id) takes a little more work. You need a subquery (virtual table) to extract it, and to retrieve the appropriate value of picked so your query works. So now things get interesting.
SELECT MAX(pick_id) AS pick_id,
application_id, user_id
FROM picks
GROUP BY application_id, user_id
retrieves the unique relationship pair. That is good. But next we have to fetch the picked column detail value from those rows. That takes another join, using the MAX value of pick_id, like so
SELECT q.application_id, q.user_id, r.picked
FROM (
SELECT MAX(pick_id) AS pick_id,
application_id, user_id
FROM picks
GROUP BY application_id, user_id
) AS q
JOIN picks AS r ON q.pick_id = r.pick_id
So, we need to substitute this little virtual table (subquery) in place of the picks AS p table in the original query. That looks like this.
SELECT a.application_id, a.company_name, p.picked, u.user_id, u.username
FROM applications AS a
LEFT JOIN (
SELECT q.application_id, q.user_id, r.picked
FROM (
SELECT MAX(pick_id) AS pick_id,
application_id, user_id
FROM picks
GROUP BY application_id, user_id
) AS q
JOIN picks AS r ON q.pick_id = r.pick_id
) AS p ON a.application_id = p.application_id AND p.user_id = 1
LEFT JOIN users AS u ON p.user_id = u.user_id
ORDER BY a.application_id, u.user_id
Some developers prefer to create VIEW objects for subqueries like the one here, rather than creating a club sandwich of a query like this one. It's not called Structured Query Language on a foolish whim, eh? These subqueries sometimes can be elements of a structure.
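As a rough illustration of the VIEW idea, here is a sketch run against SQLite through Python's sqlite3 module, not MySQL. The view name latest_picks and the sample rows are made up for this example. User 1 picks application 1 and later un-picks it in a newer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (application_id INTEGER PRIMARY KEY, company_name TEXT);
    CREATE TABLE picks (pick_id INTEGER PRIMARY KEY, user_id INTEGER,
                        application_id INTEGER, picked INTEGER);
    INSERT INTO applications VALUES (1, 'Foo'), (2, 'Bar'), (3, 'Alpha');
    -- Hypothetical data: user 1 picked application 1, then un-picked it (pick_id 3).
    INSERT INTO picks VALUES (1, 1, 1, 1), (2, 1, 2, 1), (3, 1, 1, 0);

    -- latest_picks is a hypothetical view name wrapping the MAX(pick_id) subquery.
    CREATE VIEW latest_picks AS
    SELECT q.application_id, q.user_id, r.picked
    FROM (
        SELECT MAX(pick_id) AS pick_id, application_id, user_id
        FROM picks
        GROUP BY application_id, user_id
    ) AS q
    JOIN picks AS r ON q.pick_id = r.pick_id;
""")

# The outer query now reads like the simple two-table version.
latest_rows = conn.execute("""
    SELECT a.application_id, a.company_name, p.picked
    FROM applications AS a
    LEFT JOIN latest_picks AS p ON a.application_id = p.application_id AND p.user_id = 1
    ORDER BY a.application_id
""").fetchall()

print(latest_rows)  # [(1, 'Foo', 0), (2, 'Bar', 1), (3, 'Alpha', None)]
```

Application 1 shows 0 because only the row with the highest pick_id counts.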

Related

MySQL optimize a union-query by using a join-query instead

I have 3 tables - one for users, one for their incoming payments, and one for their outgoing payments. I want to display all incoming and outgoing payments in a single result set. I can do this with multiple selects and a union but it seems cumbersome, and I suspect it's slow due to the subqueries - and the tables are extremely large (though I am using indexes). Is there a faster way to achieve this? Maybe using a full outer join?
Here is a simplified version of the schema with some example data:
create table users (
id int auto_increment,
name varchar(20),
primary key (id)
) engine=InnoDB;
insert into users (name) values ('bob'),('fred');
create table user_incoming_payments (
user_id int,
funds_in int
) engine=InnoDB;
insert into user_incoming_payments
values (1,100),(1,101),(1,102),(1,103),
(2,200),(2,201),(2,202),(2,203);
create table user_outgoing_payments (
user_id int,
funds_out int
) engine=InnoDB;
insert into user_outgoing_payments
values (1,100),(1,101),(2,200),(2,201);
And here is the ugly looking query which generates the result I want for user bob:
select * from (
(select u.name, i.funds_in, 0 as 'funds_out' from users u
inner join user_incoming_payments i on u.id = i.user_id)
union
(select u.name, 0 as 'funds_in', o.funds_out from users u
inner join user_outgoing_payments o on u.id = o.user_id)
) a where a.name = 'bob'
order by a.funds_in asc, a.funds_out asc;
And here is as close as I can get to doing the same thing with joins - it's not correct though, because I want this result set to look the same as the previous one and I wasn't sure how to use full outer join:
select *
from users u
right join user_incoming_payments i on u.id = i.user_id
right join user_outgoing_payments o on u.id = o.user_id
where u.name = 'bob';

MySQL doesn't support FULL OUTER JOIN. Even if it did support it, I don't think you would want that, as it would introduce a semi-cartesian product... with each row from incoming_ matching every row in outgoing_, creating extra rows.
If there were four rows from incoming_ and six rows from outgoing_, the set produced by a join operation would contain 24 rows.
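The row multiplication is easy to reproduce. This is a small sketch using the schema and data from the question, run on SQLite via Python's sqlite3 module as a stand-in for MySQL. Bob has four incoming and two outgoing rows, so joining both payment tables to users gives 4 x 2 = 8 rows for him.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_incoming_payments (user_id INTEGER, funds_in INTEGER);
    CREATE TABLE user_outgoing_payments (user_id INTEGER, funds_out INTEGER);
    INSERT INTO users (name) VALUES ('bob'), ('fred');
    INSERT INTO user_incoming_payments VALUES (1,100),(1,101),(1,102),(1,103),
                                              (2,200),(2,201),(2,202),(2,203);
    INSERT INTO user_outgoing_payments VALUES (1,100),(1,101),(2,200),(2,201);
""")

# Every incoming row for bob pairs with every outgoing row for bob.
joined = conn.execute("""
    SELECT u.name, i.funds_in, o.funds_out
    FROM users u
    JOIN user_incoming_payments i ON u.id = i.user_id
    JOIN user_outgoing_payments o ON u.id = o.user_id
    WHERE u.name = 'bob'
""").fetchall()

print(len(joined))  # 8, although bob only has 4 + 2 = 6 payments
```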
This really looks more like you want a set concatenation operation. That is, you have two separate sets that you want to concatenate together. That's not a JOIN operation. That's a UNION ALL set operation.
SELECT ... FROM ...
UNION ALL
SELECT ... FROM ...
It looks like you don't need to remove duplicates in this scenario: if there are multiple rows in incoming_ with the same value of funds_in, I don't think you want to lose any of them. So use the UNION ALL set operator, which does not check for and remove duplicate rows.
The UNION operator removes duplicate rows, which (again) I don't think you want.
The derived table isn't necessary.
And MySQL doesn't "push" the predicate from the outer query into the inline view. Which means that MySQL is going to materialize a derived table with all incoming and outgoing rows for all users. And then the outer query is going to look through that to find the rows. And until the most recent versions of MySQL, there were no indexes created on derived tables.
See the answer from Strawberry for an example of a more efficient query.
With the small example set, indexes aren't going to make any difference. With a large set, however, you are going to want to add appropriate covering indexes.
Also, with queries like this, I tend to include a discriminator column that tells me which query returned a row.
(
SELECT 'i' AS src
, ...
FROM ...
)
UNION ALL
(
SELECT 'o' AS src
, ...
FROM ...
)
ORDER BY ...
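Filling in that skeleton with the question's tables might look like the sketch below. It runs on SQLite through Python's sqlite3 module, which is an assumption made for testability. SQLite does not accept parentheses around the individual SELECTs of a UNION ALL, so they are dropped here; MySQL accepts either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_incoming_payments (user_id INTEGER, funds_in INTEGER);
    CREATE TABLE user_outgoing_payments (user_id INTEGER, funds_out INTEGER);
    INSERT INTO users (name) VALUES ('bob'), ('fred');
    INSERT INTO user_incoming_payments VALUES (1,100),(1,101),(1,102),(1,103),
                                              (2,200),(2,201),(2,202),(2,203);
    INSERT INTO user_outgoing_payments VALUES (1,100),(1,101),(2,200),(2,201);
""")

# 'i' / 'o' in the src column records which branch of the UNION ALL produced the row.
payments = conn.execute("""
    SELECT 'i' AS src, u.name, i.funds_in, 0 AS funds_out
    FROM users u
    JOIN user_incoming_payments i ON u.id = i.user_id
    WHERE u.name = 'bob'
    UNION ALL
    SELECT 'o' AS src, u.name, 0 AS funds_in, o.funds_out
    FROM users u
    JOIN user_outgoing_payments o ON u.id = o.user_id
    WHERE u.name = 'bob'
    ORDER BY src, funds_in, funds_out
""").fetchall()

for row in payments:
    print(row)
```

With the discriminator in place, a 0 in funds_out no longer has to double as "this was an incoming row".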

With this model, I'd probably write that query as follows, but I doubt it makes much difference...
select u.name
, i.funds_in
, 0 funds_out
from users u
join user_incoming_payments i
on u.id = i.user_id
where u.name = 'bob'
union all
select u.name
, 0 funds_in
, o.funds_out
from users u
join user_outgoing_payments o
on u.id = o.user_id
where u.name = 'bob'
order
by funds_in asc
, funds_out asc;
However, note that there's no PK here, which may prove problematic.
If it was me, I'd have one table for transactions, which would include a transaction_id PK, a timestamp for each transaction, and a column to record whether a value was a credit or a debit.

Retrieving data from 3 Mysql tables

Suppose I have 3 different tables relationships as following
1st is tbl_users(id,gender,name)
2nd is tbl_feeds(id,user_id,feed_value)
3rd is tbl_favs(id,user_id,feed_id)
where id is primary key for every table.
Now suppose I want to fetch the feeds uploaded by users with Gender=Male, with one extra field in every row that says whether the user running the query has marked that particular feed as a favourite.
Let's say the person running the query has user_id=2. Then the is_favourite column should contain 1 if that user marked the feed as a favourite, and 0 otherwise.
So the final result should look like the following:
user_id feed_id feed_value is_favourite gender
1 2 xyz 1 M
2 3 abc 0 M
3 4 mno 0 M
I hope my question is clear. I am able to get the feeds by gender, but I am having trouble getting the is_favourite flag for a particular user on every feed row.
I hope someone has run into this problem before and can help.
Thanks

Something like this should work:
SELECT
u.id AS user_id,
fe.id AS feed_id,
fe.feed_value,
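-- tbl_favs has no is_favourite column, so derive the flag from whether a matching row exists
(fa.id IS NOT NULL) AS is_favourite,
u.gender
FROM
tbl_users u
JOIN
tbl_feeds fe ON (fe.user_id = u.id)
LEFT JOIN
tbl_favs fa ON (
-- 2 is the user_id of the person running the query, as in the question's example
fa.user_id = 2
AND
fa.feed_id = fe.id
)
WHERE
u.gender = 'M'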
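Here is a small sketch of that LEFT JOIN idea with the viewer's id passed in as a parameter. It uses Python's sqlite3 module instead of MySQL, and the function name feeds_with_favourite_flag plus all sample rows are made up for illustration. The favourite row has to match the user running the query, not the user who uploaded the feed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_users (id INTEGER PRIMARY KEY, gender TEXT, name TEXT);
    CREATE TABLE tbl_feeds (id INTEGER PRIMARY KEY, user_id INTEGER, feed_value TEXT);
    CREATE TABLE tbl_favs (id INTEGER PRIMARY KEY, user_id INTEGER, feed_id INTEGER);
    -- Hypothetical rows: three male uploaders, one female uploader.
    INSERT INTO tbl_users VALUES (1,'M','al'), (2,'M','bo'), (3,'M','cy'), (4,'F','di');
    INSERT INTO tbl_feeds VALUES (2,1,'xyz'), (3,2,'abc'), (4,3,'mno'), (5,4,'pqr');
    -- User 2 favourited feed 2, user 3 favourited feed 3.
    INSERT INTO tbl_favs VALUES (1,2,2), (2,3,3);
""")

def feeds_with_favourite_flag(conn, viewer_id):
    """Feeds uploaded by male users, flagged 1 if viewer_id favourited them, else 0."""
    return conn.execute("""
        SELECT u.id AS user_id, fe.id AS feed_id, fe.feed_value,
               CASE WHEN fa.id IS NULL THEN 0 ELSE 1 END AS is_favourite,
               u.gender
        FROM tbl_users u
        JOIN tbl_feeds fe ON fe.user_id = u.id
        LEFT JOIN tbl_favs fa ON fa.feed_id = fe.id AND fa.user_id = ?
        WHERE u.gender = 'M'
        ORDER BY fe.id
    """, (viewer_id,)).fetchall()

rows_for_user_2 = feeds_with_favourite_flag(conn, 2)
print(rows_for_user_2)
# [(1, 2, 'xyz', 1, 'M'), (2, 3, 'abc', 0, 'M'), (3, 4, 'mno', 0, 'M')]
```

Because the viewer condition sits in the ON clause, feeds the viewer has not favourited still appear, with a flag of 0.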

In order to link your tables, you need to find the column they all have in common. Here that column is user_id. You'll want to relate the tables with a JOIN on it.
Now I don't know whether every table is guaranteed to have rows for each user_id. If it is, I would use INNER JOIN, as it will ONLY show records for that user_id, without nulls. If a table might not have a matching row (it is not always guaranteed), you should use a LEFT JOIN for that table instead.
Here is an SQLFiddle as an example. However, I recommend you name your ID fields as appropriate to your table's name so that way, there is no confusion!
To get your isFavorite I would use a subquery in order to validate and verify if the user has it selected as a favorite.
SELECT
u.userid,
u.gender,
f.feedsid,
f.feedvalue,
(
SELECT
COUNT(*)
FROM
tbl_favs a
WHERE
a.userid = u.userid AND
a.feedsid = f.feedsid
) as isFavorite
FROM
tbl_users u
INNER JOIN
tbl_feeds f
ON
u.userid = f.userid
~~~~EDIT 1~~~~
In response to your comment, I have updated the SQLFiddle and the query. I don't believe you really need a join now based on the information given. If you were to do a join you would get unexpected results since you would be trying to make a common link between two tables that you do not want. Instead you'll want to just combine the tables together and do a subquery to determine from the favs if it is a favorite of the user's.
SQLFiddle:
SELECT
u.userid,
f.feedsid,
u.name,
u.gender,
f.feedvalue,
(
SELECT
COUNT(*)
FROM
tbl_favs a
WHERE
a.userid = u.userid AND
a.feedsid = f.feedsid
) as isFavorite
FROM
tbl_users u,
tbl_feeds f
ORDER BY
u.userid,
f.feedsid

MySQL - 3 tables, is this complex join even possible?

I have three tables: users, groups and relation.
Table users with fields: usrID, usrName, usrPass, usrPts
Table groups with fields: grpID, grpName, grpMinPts
Table relation with fields: uID, gID
User can be placed in group in two ways:
if he collects the group's minimum number of points (users.usrPts > groups.grpMinPts ORDER BY groups.grpMinPts DESC LIMIT 1)
if his relation to the group is added manually in the relation table (user ID provided as uID and group ID provided as gID)
Can I create one single query, to determine for every user (or one specific), which group he belongs, but, manual relation (using relation table) should have higher priority than usrPts compared to grpMinPts? Also, I do not want to have one user shown twice (to show his real group by points, but related group also)...
Thanks in advance! :) I tried:
SELECT * FROM users LEFT JOIN (relation LEFT JOIN groups ON relation.gID = groups.grpID) ON users.usrID = relation.uID
Using this I managed to extract specified relations (from relation table), but, I have no idea how to include user points, respecting above mentioned priority (specified first). I know how to do this in a few separated queries in php, that is simple, but I am curious, can it be done using one single query?
EDIT TO ADD:
Thanks to the really educational COALESCE technique @GordonLinoff provided, I managed to make this query work as I expected. So, here it goes:
SELECT o.usrID, o.usrName, o.usrPass, o.usrPts, t.grpID, t.grpName
FROM (
SELECT u.*, COALESCE(relationgroupid,groupid) AS thegroupid
FROM (
SELECT u.*, (
SELECT grpID
FROM groups g
WHERE u.usrPts > g.grpMinPts
ORDER BY g.grpMinPts DESC
LIMIT 1
) AS groupid, (
SELECT grpUID
FROM relation r
WHERE r.userUID = u.usrID
) AS relationgroupid
FROM users u
)u
)o
JOIN groups t ON t.grpID = o.thegroupid
Also, if you are wondering, like I did, whether this approach is faster or slower than doing three queries and processing the results in PHP: it is slightly faster. The average time to execute this query and show the results on a webpage is 14 ms. Three simple queries, processing in PHP, and showing the results on a webpage took 21 ms. The averages are based on 10 runs, and the execution time was nearly constant across them.

Here is an approach that uses correlated subqueries to get each of the values. It then chooses the appropriate one using the precedence rule: if a row in the relation table exists, use that group; otherwise use the one derived from the groups table:
select u.*,
coalesce(relationgroupid, groupid) as thegroupid
from (select u.*,
(select grpid from groups g where u.usrPts > g.grpMinPts order by g.grpMinPts desc limit 1
) as groupid,
(select gID from relation r where r.uID = u.usrID
) as relationgroupid
from users u
) u
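To check the precedence rule on some data, here is a sketch against SQLite using Python's sqlite3 module, not MySQL. The table and column names follow the question, the rows are invented, and groups is double-quoted because GROUPS is a keyword in newer SQL dialects. User cat only has enough points for Bronze but has a manual relation to Gold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (usrID INTEGER PRIMARY KEY, usrName TEXT, usrPts INTEGER);
    CREATE TABLE "groups" (grpID INTEGER PRIMARY KEY, grpName TEXT, grpMinPts INTEGER);
    CREATE TABLE relation (uID INTEGER, gID INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO "groups" VALUES (1, 'Bronze', 0), (2, 'Silver', 100), (3, 'Gold', 500);
    INSERT INTO users VALUES (1, 'ann', 150), (2, 'ben', 600), (3, 'cat', 50);
    INSERT INTO relation VALUES (3, 3);
""")

# COALESCE takes the manual relation first and falls back to the points-based group.
memberships = conn.execute("""
    SELECT u.usrName, g.grpName
    FROM (
        SELECT u.usrID, u.usrName,
               COALESCE(
                   (SELECT r.gID FROM relation r WHERE r.uID = u.usrID),
                   (SELECT g.grpID FROM "groups" g
                    WHERE u.usrPts > g.grpMinPts
                    ORDER BY g.grpMinPts DESC
                    LIMIT 1)
               ) AS thegroupid
        FROM users u
    ) AS u
    JOIN "groups" g ON g.grpID = u.thegroupid
    ORDER BY u.usrID
""").fetchall()

print(memberships)  # [('ann', 'Silver'), ('ben', 'Gold'), ('cat', 'Gold')]
```

This sketch assumes at most one relation row per user. With several rows the scalar subquery on relation would need a LIMIT 1 or an aggregate.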

Try something like this
select user.name, group.name
from group
join relation on relation.gid = group.gid
join user on user.uid = relation.uid
union
select user.name, g1.name
from group g1
join group g2 on g2.minpts > g1.minpts
join user on user.pts between g1.minpts and g2.minpts

MySQL query optimization: Multiple SELECT IN to LEFT JOIN

I usually go with the join approach but in this case I am a bit confused. I am not even sure that it is possible at all. I wonder if the following query can be converted to a left join query instead of the multiple select in used:
select
users.id, users.first_name, users.last_name, users.description, users.email
from users
where id in (
select assigned.id_user from assigned where id_project in (
select assigned.id_project from assigned where id_user = 1
)
)
or id in (
select projects.id_user from projects where projects.id in (
select assigned.id_project from assigned where id_user = 1
)
)
This query returns the correct result set. However, I guess the repetition of the query that selects assigned.id_project is a waste.

You could start with the project assignments of user 1 a1. Then find all assignments of other people to those projects a2, and the user in the project table p. The users you are looking for are then in either a2 or p. I added distinct to remove users who can be reached in both ways.
select distinct u.*
from assigned a1
left join
assigned a2
on a1.id_project = a2.id_project
left join
projects p
on a1.id_project = p.id
join users u
on u.id = a2.id_user
or u.id = p.id_user
where a1.id_user = 1

Since both subqueries have a condition where assigned.id_user = 1, I start with that query. Let's call that assignment(s) the 'leading assignment'.
Then join the rest, using left joins for the 'optional' tables.
Use an inner join on user that matches either users of assignments linked to the leading assignment or users of projects linked to the leading project.
I use distinct, because I assume you'd want each user once, even if they have an assignment and a project (or multiple projects).
select distinct
u.id, u.first_name, u.last_name, u.description, u.email
from
assigned a
left join assigned ap on ap.id_project = a.id_project
left join projects p on p.id = a.id_project
inner join users u on u.id = ap.id_user or u.id = p.id_user
where
a.id_user = 1

Here's an alternative way to get rid of the repetition:
SELECT
users.id,
users.first_name,
users.last_name,
users.description,
users.email
FROM users
WHERE id IN (
SELECT up.id_user
FROM (
SELECT id_user, id_project FROM assigned
UNION ALL
SELECT id_user, id FROM projects
) up
INNER JOIN assigned a
ON a.id_project = up.id_project
WHERE a.id_user = 1
)
;
That is, the assigned table's pairs of id_user, id_project are UNIONed with those of projects. The resulting set is then joined with the projects assigned to id_user = 1 to obtain the list of all users who share projects with that user. And now it only remains to retrieve the details for those users, which in this case is done in the same way as in your query, i.e. using an IN clause.
I'm sorry to say that I don't have MySQL to thoroughly test the performance of this query and so cannot be quite sure if it is in any way better or worse than your original query or than the one suggested both by @GolezTrol and by @Andomar. Generally I tend to agree with @GolezTrol's comment that a query with simple (semi- or whatever-) joins and repetitive parts might turn out more efficient than an equivalent sophisticated query that doesn't have repetitions. In the end, however, it is testing that must reveal the final answer for you.
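For a quick correctness check (not a performance test), the three variants from this thread can be run side by side on a toy data set. This sketch uses SQLite through Python's sqlite3 module rather than MySQL, trims the users table to two columns, and all rows are invented. User 1 is assigned to project 10, which user 2 is also assigned to and user 3 owns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE projects (id INTEGER PRIMARY KEY, id_user INTEGER);
    CREATE TABLE assigned (id_user INTEGER, id_project INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO users VALUES (1,'a'), (2,'b'), (3,'c'), (4,'d'), (5,'e');
    INSERT INTO projects VALUES (10, 3), (20, 4);
    INSERT INTO assigned VALUES (1, 10), (2, 10), (5, 20);
""")

original = """
    SELECT users.id FROM users
    WHERE id IN (SELECT assigned.id_user FROM assigned WHERE id_project IN (
                     SELECT assigned.id_project FROM assigned WHERE id_user = 1))
       OR id IN (SELECT projects.id_user FROM projects WHERE projects.id IN (
                     SELECT assigned.id_project FROM assigned WHERE id_user = 1))
    ORDER BY users.id
"""
with_joins = """
    SELECT DISTINCT u.id
    FROM assigned a
    LEFT JOIN assigned ap ON ap.id_project = a.id_project
    LEFT JOIN projects p ON p.id = a.id_project
    INNER JOIN users u ON u.id = ap.id_user OR u.id = p.id_user
    WHERE a.id_user = 1
    ORDER BY u.id
"""
with_union = """
    SELECT users.id FROM users
    WHERE id IN (
        SELECT up.id_user
        FROM (SELECT id_user, id_project FROM assigned
              UNION ALL
              SELECT id_user, id FROM projects) up
        INNER JOIN assigned a ON a.id_project = up.id_project
        WHERE a.id_user = 1)
    ORDER BY users.id
"""

results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in [("original", original),
                             ("with_joins", with_joins),
                             ("with_union", with_union)]}
print(results)  # each variant lists users 1, 2 and 3
```

Agreement on one small data set is only a sanity check. Timing on realistic volumes in MySQL itself is still what decides between them.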

MySQL returning results from one table based on data in another table

Before delving into the issue, first I will explain the situation. I have two tables such as the following:
USERS TABLE
user_id
username
firstName
lastName
GROUPS TABLE
user_id
group_id
I want to retrieve all users whose first name is LIKE '%foo%' and who are part of a group with group_id = 'givengid'
So, the query would look something like this:
SELECT user_id FROM users WHERE firstName LIKE '%foo%'
I can make a user-defined SQL function such as ismember(user_id, group_id) that returns 1 if the user is part of the group and 0 if they are not, and add this to the WHERE clause in the aforementioned select statement. However, this means that for every user whose first name matches the criteria, another query has to be run through thousands of other records to find a potential match for a group entry.
The users and groups table will each have several hundred thousand records. Is it more conventional to use the user defined function approach or run a query using the UNION statement? If the UNION approach is best, what would the query with the union statement look like?
Of course, I will run benchmarks but I just want to get some perspective on the possible range of solutions for this situation and what is generally most effective/efficient.

You should use a JOIN to get users matching your two criteria.
SELECT
users.user_id
FROM
users
INNER JOIN
groups
ON groups.user_id = users.user_id
AND groups.group_id = 'givengid'
WHERE
users.firstName LIKE '%foo%'

You don't need to use either a UNION or a user-defined function here; instead, you can use a JOIN (which lets you join one table to another one based on a set of equivalent columns):
SELECT u.user_id
FROM users AS u
JOIN groups AS g
ON g.user_id = u.user_id
WHERE g.group_id = 'givengid'
AND u.firstName LIKE '%foo%'
What this query does is join rows in the groups table to rows in the users table when the user_id is the same (so if you were to use SELECT *, you would end up with a long row containing the user data and the group data for that user). If multiple groups rows exist for the user, multiple rows will be retrieved before being filtered by the WHERE clause.

Use a join:
SELECT DISTINCT users.user_id
FROM users
INNER JOIN groups ON groups.user_id = users.user_id
WHERE users.firstName LIKE '%foo%'
AND groups.group_id = '23'
The DISTINCT makes sure you don't have duplicate user IDs in the result.
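A small sketch of when DISTINCT matters, run on SQLite via Python's sqlite3 module rather than MySQL. The rows are invented, and the example assumes the groups table has no unique constraint on (user_id, group_id), so the same membership can be stored twice. The table name is double-quoted because GROUPS is a keyword in newer SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT,
                        firstName TEXT, lastName TEXT);
    CREATE TABLE "groups" (user_id INTEGER, group_id TEXT);
    -- Hypothetical sample rows.
    INSERT INTO users VALUES (1, 'u1', 'foo', 'A'), (2, 'u2', 'bar', 'B'), (3, 'u3', 'food', 'C');
    -- User 1's membership of group 23 is stored twice.
    INSERT INTO "groups" VALUES (1, '23'), (1, '23'), (2, '23'), (3, '42');
""")

query = """
    SELECT {distinct} users.user_id
    FROM users
    INNER JOIN "groups" ON "groups".user_id = users.user_id
    WHERE users.firstName LIKE '%foo%'
      AND "groups".group_id = '23'
"""

plain_ids = [row[0] for row in conn.execute(query.format(distinct=""))]
distinct_ids = [row[0] for row in conn.execute(query.format(distinct="DISTINCT"))]

print(plain_ids)     # [1, 1]
print(distinct_ids)  # [1]
```

If (user_id, group_id) is declared unique, the join can return each user at most once for a given group and the DISTINCT becomes redundant.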
