MySQL query where not exists - mysql

I have three different tables: subscribers, unsubscribers, and mass subscribers.
I'd like to print out each email from the mass subscribers table, but only if that email exists in neither the subscribers table nor the unsubscribers table.
I know how to do this with arrays, but I want a plain MySQL query.
What would the MySQL query be?
Thanks!

You can do that with a subquery (this is slow; please read on below the query):
SELECT email
FROM subscribers
WHERE email NOT IN(SELECT email FROM unsubscribers)
However, this is very bad for performance. I suggest restructuring the database so there is just one table, subscribers, with an added column active (tinyint). When someone unsubscribes, you set that value from 1 to 0. After that, a single table is all you need to query:
SELECT email FROM subscribers WHERE active=1
This is faster for a few reasons:
There is no subquery.
The NOT IN condition is expensive, because it selects a heap of data and compares strings.
Selecting on an integer is VERY fast (especially when you index it).
Apart from being faster, this is also a better database structure. You don't want two tables holding almost the same email addresses; that creates duplicate data and a chance of the tables getting out of sync.

You sound like someone who doesn't have much experience with SQL. Your title does point in the right direction. Here is how you put the components together:
select m.*
from mass_subscribers m
where not exists (select 1 from subscribers s where s.email = m.email) and
not exists (select 1 from unsubscribers u where u.email = m.email);
NOT EXISTS happens to be a very good choice for this type of query; it is typically pretty efficient in both MySQL and other databases.
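To see the NOT EXISTS query behave as described, here is a minimal sketch that can be run anywhere. It assumes Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL, and the sample rows are made up; the query itself is the one above, limited to the email column.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL here; table names follow the
# query above, and the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscribers      (email TEXT PRIMARY KEY);
    CREATE TABLE unsubscribers    (email TEXT PRIMARY KEY);
    CREATE TABLE mass_subscribers (email TEXT PRIMARY KEY);
    INSERT INTO subscribers      VALUES ('a@example.com');
    INSERT INTO unsubscribers    VALUES ('b@example.com');
    INSERT INTO mass_subscribers VALUES
        ('a@example.com'), ('b@example.com'), ('c@example.com');
""")

# Keep only the mass subscribers that appear in neither other table.
rows = conn.execute("""
    SELECT m.email
    FROM mass_subscribers m
    WHERE NOT EXISTS (SELECT 1 FROM subscribers s   WHERE s.email = m.email)
      AND NOT EXISTS (SELECT 1 FROM unsubscribers u WHERE u.email = m.email)
    ORDER BY m.email
""").fetchall()

new_emails = [email for (email,) in rows]
print(new_emails)  # ['c@example.com']
```

Only c@example.com survives, because a@ is already subscribed and b@ has unsubscribed.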

Without subqueries, using joins:
SELECT ms.*
FROM mass_subscribers ms
LEFT JOIN subscribers s ON ms.email=s.email
LEFT JOIN unsubscribers us ON ms.email=us.email
WHERE
s.email IS NULL
AND
us.email IS NULL
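For the join-based variant, the key point is that both LEFT JOINs hang off the mass subscribers table and the IS NULL checks are applied to the joined tables' columns. A small sketch under the same assumptions as before (Python's sqlite3 as a stand-in for MySQL, invented sample rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscribers      (email TEXT PRIMARY KEY);
    CREATE TABLE unsubscribers    (email TEXT PRIMARY KEY);
    CREATE TABLE mass_subscribers (email TEXT PRIMARY KEY);
    INSERT INTO subscribers      VALUES ('a@example.com');
    INSERT INTO unsubscribers    VALUES ('b@example.com');
    INSERT INTO mass_subscribers VALUES
        ('a@example.com'), ('b@example.com'),
        ('c@example.com'), ('d@example.com');
""")

# A LEFT JOIN leaves the joined columns NULL when there is no match,
# so "IS NULL" on those columns keeps exactly the unmatched rows.
rows = conn.execute("""
    SELECT ms.email
    FROM mass_subscribers ms
    LEFT JOIN subscribers   s  ON s.email  = ms.email
    LEFT JOIN unsubscribers us ON us.email = ms.email
    WHERE s.email IS NULL
      AND us.email IS NULL
    ORDER BY ms.email
""").fetchall()

unmatched = [email for (email,) in rows]
print(unmatched)  # ['c@example.com', 'd@example.com']
```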

Related

LEFT JOIN - narrow things down

I'm currently having a problem with a legacy app I just inherited at my new job. I have a SQL query that takes far too long to respond and I need to find a way to speed it up.
This query acts on 3 tables:
SESSION contains all users visits
CONTACT contains all the messages people have been sending through a form and contains a "session_id" field that links back to the SESSION id field
ACCOUNT contains user accounts (people who registered on the website); its "id" field is referenced from SESSION (through a "SESSION.account_id" field). ACCOUNT and CONTACT are not linked in any way, other than through the SESSION table (legacy app...).
I can't change this structure unfortunately.
My query tries to retrieve ALL the interesting sessions to show to the administrator. I need to find all sessions that link back to an account OR a contact form.
Currently, the query is structured like this:
SELECT s.id
/* a few fields from ACCOUNT and CONTACT tables */
FROM session s
LEFT JOIN account act ON act.id = s.account_id
LEFT JOIN contact c on c.session_id = s.id
WHERE s.programme_id = :program_id
AND (
c.id IS NOT NULL
OR
act.id IS NOT NULL
)
The problem is that the SESSION table is growing pretty fast (as you can expect), and with 400k records the query is slow for some programs (:programme_id in the query).
I tried a UNION of two INNER JOIN queries, one between SESSION and ACCOUNT and the other between SESSION and CONTACT, but it doesn't give me the same number of records and I don't really understand why.
Can somebody help me find a better way to write this query?
Thanks a lot in advance.
I think you just need indexes. For this query:
SELECT s.id
/* a few fields from ACCOUNT and CONTACT tables */
FROM session s LEFT JOIN
account act
ON act.id = s.account_id LEFT JOIN
contact c
ON c.session_id = s.id
WHERE s.programme_id = :program_id AND
(c.id IS NOT NULL OR act.id IS NOT NULL);
You want indexes on session(programme_id, account_id, id), account(id) and contact(session_id).
It is important that programme_id be the first column in the index on session.
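As a rough sketch of that indexing advice, the snippet below creates the suggested indexes and prints the query plan. It assumes Python's sqlite3 with an in-memory SQLite database as a stand-in for MySQL, so the plan output will not look like MySQL's EXPLAIN; the index names and sample rows are hypothetical. account(id) is the primary key here, so it needs no separate index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE session (id INTEGER PRIMARY KEY,
                          programme_id INTEGER, account_id INTEGER);
    CREATE TABLE contact (id INTEGER PRIMARY KEY,
                          session_id INTEGER, message TEXT);

    -- Hypothetical index names; programme_id comes first because it is
    -- the column filtered in the WHERE clause.
    CREATE INDEX idx_session_prog    ON session (programme_id, account_id, id);
    CREATE INDEX idx_contact_session ON contact (session_id);

    INSERT INTO account VALUES (1, 'alice');
    INSERT INTO session VALUES (10, 7, 1), (11, 7, NULL),
                               (12, 7, NULL), (13, 8, 1);
    INSERT INTO contact VALUES (100, 11, 'hello');
""")

QUERY = """
    SELECT s.id
    FROM session s
    LEFT JOIN account act ON act.id = s.account_id
    LEFT JOIN contact c   ON c.session_id = s.id
    WHERE s.programme_id = :programme_id
      AND (c.id IS NOT NULL OR act.id IS NOT NULL)
    ORDER BY s.id
"""

params = {"programme_id": 7}
session_ids = [row[0] for row in conn.execute(QUERY, params)]
print(session_ids)  # [10, 11]

# Show how SQLite plans the query (MySQL's EXPLAIN output differs).
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, params):
    print(row[-1])
```

Session 12 is dropped because it has neither an account nor a contact, and session 13 belongs to a different programme.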
@Gordon already suggested adding indexes, which is generally the easy and effective solution, so I'm going to answer a different part of your question.
I tried to use an UNION query with two INNER JOIN query, one between
SESSION and ACCOUNT and the other one between SESSION and CONTACT, but
it doesn't give me the same number of records and I don't really
understand why.
That part is rather simple: a JOIN returns a result set that contains the rows of both tables joined together. So in the first case you would end up with a result that looks like
session.id, session.column2, session.column3, ..., account.id, account.column2, account.column3, ....
and in the second case with
session.id, session.column2, session.column3, ..., contact.id, contact.column2, contact.column3, ....
A UNION will then fail unless the contact and account tables have the same number of columns with corresponding types, which is unlikely. From the docs (emphasis mine):
The column names from the first SELECT statement are used as the column names for the results returned. Selected columns listed in corresponding positions of each SELECT statement should have the same data type. (For example, the first column selected by the first statement should have the same type as the first column selected by the other statements.)
Just perform both INNER JOINs separately and compare the results if you're unsure.
If you want to stick with a UNION solution, make sure each SELECT lists only corresponding columns: for instance, selecting just s.id in both is trivial, but it should work.
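Here is a small sketch of a UNION that selects only s.id from both INNER JOINs, next to the original LEFT JOIN query. It assumes Python's sqlite3 as a stand-in for MySQL and made-up rows. It also shows one further reason the counts can differ even when the UNION runs: UNION removes duplicate rows, while the LEFT JOIN returns one row per matching contact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY);
    CREATE TABLE session (id INTEGER PRIMARY KEY,
                          programme_id INTEGER, account_id INTEGER);
    CREATE TABLE contact (id INTEGER PRIMARY KEY, session_id INTEGER);
    INSERT INTO account VALUES (1);
    -- session 2 has two contacts; session 3 has an account AND a contact
    INSERT INTO session VALUES (1, 7, 1), (2, 7, NULL),
                               (3, 7, 1), (4, 7, NULL);
    INSERT INTO contact VALUES (100, 2), (101, 2), (102, 3);
""")

left_join_rows = conn.execute("""
    SELECT s.id
    FROM session s
    LEFT JOIN account act ON act.id = s.account_id
    LEFT JOIN contact c   ON c.session_id = s.id
    WHERE s.programme_id = 7
      AND (c.id IS NOT NULL OR act.id IS NOT NULL)
""").fetchall()

# Both SELECTs list the same single column, so the UNION is valid.
union_rows = conn.execute("""
    SELECT s.id FROM session s
    JOIN account act ON act.id = s.account_id
    WHERE s.programme_id = 7
    UNION
    SELECT s.id FROM session s
    JOIN contact c ON c.session_id = s.id
    WHERE s.programme_id = 7
""").fetchall()

print(len(left_join_rows), len(union_rows))  # 4 3
```

The LEFT JOIN returns session 2 twice (once per contact), while the UNION returns each session id once.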

Joining tables on two fields

I always get confused when it comes to JOINing tables.
So, I have a table that stores the user details called tblUsers having the following fields(for the sake of simplicity, I am including only the required fields here while posting):
user_id
first_name
And I have another table which stores the messages called tblMessages:
msg_id
sender_id
recipient_id
msg_body
Now what I am trying to do is fetch all messages along with the user names. What I have tried is this:
SELECT
`msg_id`,
(SELECT `first_name` FROM `tblUsers` WHERE `tblUsers`.`user_id` = `tblMessages`.`sender_id`) AS `sender_name`,
(SELECT `first_name` FROM `tblUsers` WHERE `tblUsers`.`user_id` = `tblMessages`.`recipient_id`) AS `recipient_name`,
`msg_body`
FROM `tblMessages`
It seems to be working at the moment. But is this the correct way to achieve my goal? Or would JOINing the tables be better? The tblMessages table will probably grow to a large number of rows. If we JOIN, do we need two LEFT JOINs? First on sender_id of tblMessages with user_id of tblUsers, and again on recipient_id of tblMessages with user_id of tblUsers. Is that correct?
Let me know your suggestions or corrections on my approach.
This is going to be your best query (It will run queries once, and then join tables on their indices):
SELECT m.`msg_id`, su.`first_name` AS `sender_name`, ru.`first_name` AS `recipient_name`, m.`msg_body`
FROM `tblMessages` m
LEFT JOIN `tblUsers` su ON m.`sender_id` = su.`user_id`
LEFT JOIN `tblUsers` ru ON m.`recipient_id` = ru.`user_id`;
When in doubt, put EXPLAIN right before your query to see which indexes it is going to use and how efficient it is going to be.
The reasoning for choosing this query over yours is also covered in the MySQL docs. EXPLAIN is a helpful tool for understanding where your bottlenecks are and what is causing performance issues on your database. (The difference likely isn't going to matter much here, but you can always run some performance tests once your database reaches a healthy size.)
You should JOIN the same table twice, using two different aliases, for example s and r:
SELECT
m.msg_id,
m.sender_id,
s.first_name AS sender_name,
m.recipient_id,
r.first_name AS recipient_name,
m.msg_body
FROM
tblMessages AS m
LEFT JOIN tblUsers AS s ON m.sender_id=s.user_id
LEFT JOIN tblUsers AS r ON m.recipient_id=r.user_id
That said, your approach is not wrong: it works, and with proper indexes it shouldn't be much slower.
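To check that the double join and the original correlated subqueries return the same rows, here is a minimal sketch. It assumes Python's sqlite3 standing in for MySQL, and the users and messages are made up; message 12 is addressed to a user id that does not exist, to show what the LEFT JOIN does in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblUsers (user_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE tblMessages (msg_id INTEGER PRIMARY KEY,
                              sender_id INTEGER, recipient_id INTEGER,
                              msg_body TEXT);
    INSERT INTO tblUsers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO tblMessages VALUES
        (10, 1, 2, 'hi'), (11, 2, 1, 'hello'), (12, 1, 99, 'anyone?');
""")

# tblUsers is joined twice, once per role, under two aliases.
join_rows = conn.execute("""
    SELECT m.msg_id,
           s.first_name AS sender_name,
           r.first_name AS recipient_name,
           m.msg_body
    FROM tblMessages AS m
    LEFT JOIN tblUsers AS s ON m.sender_id    = s.user_id
    LEFT JOIN tblUsers AS r ON m.recipient_id = r.user_id
    ORDER BY m.msg_id
""").fetchall()

# The original approach: one correlated subquery per name.
subquery_rows = conn.execute("""
    SELECT msg_id,
           (SELECT first_name FROM tblUsers
             WHERE tblUsers.user_id = tblMessages.sender_id) AS sender_name,
           (SELECT first_name FROM tblUsers
             WHERE tblUsers.user_id = tblMessages.recipient_id) AS recipient_name,
           msg_body
    FROM tblMessages
    ORDER BY msg_id
""").fetchall()

for row in join_rows:
    print(row)
print(join_rows == subquery_rows)  # True
```

Because of the LEFT JOIN, message 12 is still returned, with a NULL (None in Python) recipient name.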

Query execution is taking too long

I currently have two tables in a database, called Email and unsuscribed. Both tables have a column called Email. I want to compare the two tables and, wherever the email matches, set a column in the Email table called Email_status_id to 2. The query I am using is:
UPDATE Email E
SET E.Email_status_id = 2
WHERE
E.Email
IN (
SELECT
U.Email
FROM
UNSUSCRIBED U);
I am currently using MySQL.
The Email table has 2704569 rows
and the unsuscribed table has 12102 rows.
The query takes forever to execute.
Any suggestions for reducing the execution time?
The first thing is to create an index on Unsubscribed(Email):
create index idx_unsubscribed_email on unsubscribed(email);
Or, even better, declare it as the primary key, particularly if it is the only column in the table.
Next, MySQL sometimes does a poor job of optimizing IN with a subquery. There are several ways to write the query so that it makes use of the index; EXISTS is a typical one:
update email e
set email_status_id = 2
where exists (select 1 from unsubscribed u where u.email = e.email);
The join version should have similar performance with the index.
EDIT:
An index on email(email) could also help the query. For some reason, I assumed that this would already be a key in the table.
You're doing string comparisons over a large amount of data in an IN clause. Since you don't actually need the data returned, you can do this with EXISTS:
Update Email E
Set E.Email_status_id = 2
Where Exists
(
Select 1
From Unsubscribed U
Where U.Email = E.Email
)
Aside from that, proper indexing on the Email column in both the Email and Unsubscribed tables would improve performance as well.
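A minimal sketch of the EXISTS form of the update, together with the suggested indexes. It assumes Python's sqlite3 as a stand-in for MySQL, three made-up rows instead of millions, and a hypothetical index name. SQLite is given the table name rather than an alias in the correlated condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE email (email TEXT, email_status_id INTEGER DEFAULT 1);
    -- The primary key doubles as the index on unsubscribed(email).
    CREATE TABLE unsubscribed (email TEXT PRIMARY KEY);
    CREATE INDEX idx_email_email ON email (email);
    INSERT INTO email (email) VALUES
        ('a@example.com'), ('b@example.com'), ('c@example.com');
    INSERT INTO unsubscribed VALUES ('b@example.com');
""")

# Correlated EXISTS: each email row is probed against the indexed
# unsubscribed.email column instead of a materialized IN list.
cur = conn.execute("""
    UPDATE email
    SET email_status_id = 2
    WHERE EXISTS (SELECT 1 FROM unsubscribed u
                  WHERE u.email = email.email)
""")
updated = cur.rowcount
conn.commit()

statuses = dict(conn.execute("SELECT email, email_status_id FROM email"))
print(updated, statuses)
```

Only the row for b@example.com changes to status 2; the others keep their default of 1.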
IN conditions with a subquery over an entire table can be slow in MySQL, because the optimizer may end up evaluating the subquery for every single row of the outer table. Try using a join instead, like so:
Update Unsubscribed U join Email E on E.Email=U.Email
SET E.email_status_id = 2

Using keys on JOIN

I want to get data that is separated on three tables:
app_android_devices:
id | associated_user_id | registration_id
app_android_devices_settings:
owner_id | is_user_id | notifications_receive | notifications_likes_only
app_android_devices_favorites:
owner_id | is_user_id | image_id
owner_id is either the id from app_android_devices or the associated_user_id, indicated by is_user_id.
That is because the user of my app should be able to login to their account or use the app anonymously. If the user logged in he will have the same settings and likes on all devices.
associated_user_id is 0 if the device is used anonymously or the user ID from another table.
Now I've got the following query:
SELECT registration_id
FROM app_android_devices d
JOIN app_android_devices_settings s
ON ((d.id=s.owner_id AND
s.is_user_id=0)
OR (
d.associated_user_id=s.owner_id AND
s.is_user_id=1))
JOIN app_android_devices_favorites f
ON (((d.id=f.owner_id AND
f.is_user_id=0)
OR
d.associated_user_id=f.owner_id AND
f.is_user_id=1)
AND f.image_id=86)
WHERE s.notifications_receive=1
AND (s.notifications_likes_only=0 OR f.image_id=86);
I use it to decide whether a device should receive a push notification for a new comment. I've set the following keys:
app_android_devices: id PRIMARY, associated_user_id
app_android_devices_settings: (owner_id, is_user_id) UNIQUE, notifications_receive, notifications_likes_only
app_android_devices_favorites: (owner_id, is_user_id, image_id) UNIQUE
I've noticed that the above query is really slow. If I run EXPLAIN on that query I see that MySQL is using no keys at all, although there are possible_keys listed.
What can I do to speed this query up?
Having such complicated JOIN conditions makes life hard for everyone. It makes life hard for the developer who wants to understand your query, and for the query optimizer that wants to give you exactly what you ask for while preferring more efficient operations.
So the first thing that I want to do, when you tell me that this query is slow and not using any index, is to take it apart and put it back together with simpler JOIN conditions.
From the way you describe this query, it sounds like the is_user_id column is a sort of state variable telling you whether the user is or is not logged in to your app. This is awkward to say the least; what happens if s.is_user_id != f.is_user_id? Why store this in both tables? For that matter, why store this in your database at all, instead of in a cookie?
Perhaps there's something I'm not understanding about the functionality you're going for here. In any case, the first thing I see that I want to get rid of is the OR in your JOIN conditions. I'm going to try to avoid making too many assumptions about which values in your query represent user input; here's a slightly generic example of how you might be able to rewrite these JOIN conditions as a UNION of two SELECT statements:
SELECT ... FROM
app_android_devices d
JOIN
app_android_devices_settings s ON d.id = s.owner_id
JOIN
app_android_devices_favorites f ON d.id = f.owner_id
WHERE s.is_user_id = 0 AND f.is_user_id = 0 AND ...
UNION ALL
SELECT ... FROM
app_android_devices d
JOIN
app_android_devices_settings s ON d.associated_user_id = s.owner_id
JOIN
app_android_devices_favorites f ON d.associated_user_id = f.owner_id
WHERE s.is_user_id = 1 AND f.is_user_id = 1 AND ...
If these two queries hit your indexes and are very selective, you might not notice the additional overhead (creation of a temporary table) required by the UNION operation. It looks as though one of your result sets may even be empty, in which case the cost of the UNION should be nil.
But, maybe this doesn't work for you; here's another suggestion for an optimization you might pursue. In your original query, you have the following condition:
WHERE s.notifications_receive=1
AND (s.notifications_likes_only=0 OR f.image_id=86);
This isn't too cryptic - you want results only when the notifications_receive setting is true, and only if the notifications_likes_only setting is false or the requested image is a "favorite" image. Depending on the state of notifications_likes_only, it looks like you may not even care about the favorites table - wouldn't it be nice to avoid even reading from that table unless absolutely necessary?
This looks like a good case for EXISTS(). Instead of joining app_android_devices_favorites, try using a condition like this:
WHERE s.notifications_receive = 1
AND (s.notifications_likes_only = 0
OR EXISTS(SELECT 1 FROM app_android_devices_favorites
WHERE image_id = 86 AND owner_id = s.owner_id
AND is_user_id = s.is_user_id))
It doesn't matter what you try to SELECT in an EXISTS() subquery; some people prefer *, I like 1, but even if you gave specific columns it wouldn't affect the execution plan.
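To make the EXISTS() suggestion concrete, here is a sketch of a complete query built around that condition. It assumes Python's sqlite3 as a stand-in for MySQL and invented rows, and for brevity it covers only the anonymous-device branch (is_user_id = 0); the subquery also matches on is_user_id so that a device id and a user id with the same value are not confused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_android_devices (
        id INTEGER PRIMARY KEY, associated_user_id INTEGER,
        registration_id TEXT);
    CREATE TABLE app_android_devices_settings (
        owner_id INTEGER, is_user_id INTEGER,
        notifications_receive INTEGER, notifications_likes_only INTEGER,
        UNIQUE (owner_id, is_user_id));
    CREATE TABLE app_android_devices_favorites (
        owner_id INTEGER, is_user_id INTEGER, image_id INTEGER,
        UNIQUE (owner_id, is_user_id, image_id));

    INSERT INTO app_android_devices VALUES
        (1, 0, 'reg-1'), (2, 0, 'reg-2'), (3, 0, 'reg-3'), (4, 0, 'reg-4');
    INSERT INTO app_android_devices_settings VALUES
        (1, 0, 1, 0),   -- wants every notification
        (2, 0, 1, 1),   -- favorites only, and image 86 is a favorite
        (3, 0, 1, 1),   -- favorites only, but image 86 is not a favorite
        (4, 0, 0, 0);   -- notifications switched off
    INSERT INTO app_android_devices_favorites VALUES (2, 0, 86), (3, 0, 99);
""")

# The favorites table is only probed through EXISTS, never joined.
rows = conn.execute("""
    SELECT d.registration_id
    FROM app_android_devices d
    JOIN app_android_devices_settings s
      ON s.owner_id = d.id AND s.is_user_id = 0
    WHERE s.notifications_receive = 1
      AND (s.notifications_likes_only = 0
           OR EXISTS (SELECT 1
                      FROM app_android_devices_favorites f
                      WHERE f.image_id = 86
                        AND f.owner_id = s.owner_id
                        AND f.is_user_id = s.is_user_id))
    ORDER BY d.registration_id
""").fetchall()

recipients = [reg for (reg,) in rows]
print(recipients)  # ['reg-1', 'reg-2']
```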

read a list of values from another table using subquery and check where in condition

A small question, maybe a silly one, but I can't figure out how to solve this problem:
select * from customers where id in(select assigned from users where username='test');
In the above query,
select assigned from users where username='test'
returns 1,2,
but the IN condition does not work the way I expect, which is like this:
select * from customers where id in(1,2);
That is not what actually happens; I am only guessing that it should work this way, and that is where the problem occurs.
I am getting only one row, the one corresponding to 1.
Please help me figure this out.
Please check the sqlfiddle below:
http://sqlfiddle.com/#!2/95c28/2
thanks
SELECT DISTINCT c.*
FROM customers c
JOIN users u ON FIND_IN_SET(c.id, u.assigned) > 0
WHERE u.username = 'test'
Storing comma-separated values in a column is a bad idea in relational databases; it makes everything more complicated. You should use a relation table instead, so you can write a normal equality join. The above query cannot use an index, so it will be very inefficient if the tables are large.
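The relation-table idea could look roughly like the sketch below. It assumes Python's sqlite3 as a stand-in for MySQL; the user_customers table, the users.id column, and all sample rows are hypothetical and not taken from the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users     (id INTEGER PRIMARY KEY, username TEXT);

    -- Hypothetical relation table: one row per (user, customer) pair
    -- instead of a comma-separated list in users.assigned.
    CREATE TABLE user_customers (
        user_id INTEGER, customer_id INTEGER,
        PRIMARY KEY (user_id, customer_id));

    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO users     VALUES (1, 'test'), (2, 'other');
    INSERT INTO user_customers VALUES (1, 1), (1, 2), (2, 3);
""")

# Plain equality joins, which an index can serve.
assigned = conn.execute("""
    SELECT c.id, c.name
    FROM customers c
    JOIN user_customers uc ON uc.customer_id = c.id
    JOIN users u           ON u.id = uc.user_id
    WHERE u.username = ?
    ORDER BY c.id
""", ("test",)).fetchall()

print(assigned)  # [(1, 'Acme'), (2, 'Globex')]
```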
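If select assigned from users where username='test' returns 1,2, it returns that as a single string. MySQL converts the string '1,2' to the number 1 when comparing it with the integer id, which is why you get only the row with id = 1.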