Query execution is taking too long - mysql

I currently have two tables in a database, called Email and unsuscribed. Both tables have a column called Email. I want to compare these two tables and, wherever the email matches, update a column in the Email table called Email_status_id to 2. The query I am using is:
UPDATE Email E
SET E.Email_status_id = 2
WHERE
E.Email
IN (
SELECT
U.Email
FROM
UNSUSCRIBED U);
I am currently using MySQL.
The Email table has 2,704,569 rows
and the unsuscribed table has 12,102 rows.
The query takes forever to execute.
Any suggestions to reduce the execution time?

The first thing is to create an index on Unsubscribed(Email):
create index idx_unsubscribed_email on unsubscribed(email);
Or, even better, declare it as the primary key, particularly if it is the only column in the table.
Then, MySQL sometimes does a poor job of optimizing IN with a subquery. There are a variety of ways to write the query so that it makes use of the index. EXISTS is a typical method:
update email e
set email_status_id = 2
where exists (select 1 from unsubscribed u where u.email = e.email);
The join version should have similar performance with the index.
EDIT:
An index on email(email) could also help the query. For some reason, I assumed that this would already be a key in the table.
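As a rough illustration of the indexed EXISTS update described above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The helper name mark_unsubscribed, the default status of 1, and the sample addresses are assumptions made for the example; only the shape of the UPDATE mirrors the answer, and SQLite's planner will not behave the same way as MySQL's.

```python
import sqlite3


def mark_unsubscribed(emails, unsubscribed):
    """Return {email: status} after flagging unsubscribed addresses with status 2."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: status 1 is a made-up default meaning "active".
    conn.execute("CREATE TABLE email (email TEXT, email_status_id INTEGER DEFAULT 1)")
    # The primary key doubles as the index on unsubscribed(email).
    conn.execute("CREATE TABLE unsubscribed (email TEXT PRIMARY KEY)")
    conn.execute("CREATE INDEX idx_email_email ON email(email)")
    conn.executemany("INSERT INTO email (email) VALUES (?)", [(e,) for e in emails])
    conn.executemany("INSERT INTO unsubscribed (email) VALUES (?)", [(e,) for e in unsubscribed])
    # Correlated EXISTS: each email row is checked against the indexed lookup table.
    conn.execute(
        "UPDATE email SET email_status_id = 2 "
        "WHERE EXISTS (SELECT 1 FROM unsubscribed u WHERE u.email = email.email)"
    )
    rows = conn.execute("SELECT email, email_status_id FROM email").fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(mark_unsubscribed(["a@x.com", "b@x.com"], ["b@x.com"]))
```

On the real tables, running EXPLAIN on the statement in MySQL is the way to confirm whether the index is actually used.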

You're doing string comparisons over a large amount of data in an In clause. Since you don't actually need the data returned, you can do this in an Exists:
Update Email E
Set E.Email_status_id = 2
Where Exists
(
Select 1
From Unsubscribed U
Where U.Email = E.Email
)
Aside from that, proper indexing on the Email column in both the Email and Unsubscribed tables would up your performance as well.

IN conditions against entire tables can be slow, particularly in older MySQL versions. This is because the subquery may be re-run for every single row of the outer table to get your filtered result set. Try using a join instead, like so:
Update Unsubscribed U join Email E on E.Email=U.Email
SET E.email_status_id = 2

Related

LEFT JOIN - narrow things down

I'm currently having a problem with a legacy app I just inherited at my new job. I have a SQL query that takes way too long to respond and I need to find a way to speed it up.
This query acts on 3 tables:
SESSION contains all users visits
CONTACT contains all the messages people have been sending through a form and contains a "session_id" field that links back to the SESSION id field
ACCOUNT contains user accounts (people who registered on the website); its "id" field is referenced from SESSION (through a "SESSION.account_id" field). ACCOUNT and CONTACT are not linked in any way, other than through the SESSION table (legacy app...).
I can't change this structure unfortunately.
My query tries to retrieve ALL the interesting sessions to show to the administrator. I need to find all sessions that link back to an account OR a contact form.
Currently, the query is structured like that :
SELECT s.id
/* a few fields from ACCOUNT and CONTACT tables */
FROM session s
LEFT JOIN account act ON act.id = s.account_id
LEFT JOIN contact c on c.session_id = s.id
WHERE s.programme_id = :program_id
AND (
c.id IS NOT NULL
OR
act.id IS NOT NULL
)
Problem is, the SESSION table is growing pretty fast (as you can expect) and with 400k records it slows things down for some programs ( :programme_id in the query).
I tried to use a UNION query with two INNER JOIN queries, one between SESSION and ACCOUNT and the other one between SESSION and CONTACT, but it doesn't give me the same number of records and I don't really understand why.
Can somebody help me to find a better way to make this query ?
Thanks a lot in advance.
I think you just need indexes. For this query:
SELECT s.id
/* a few fields from ACCOUNT and CONTACT tables */
FROM session s LEFT JOIN
account act
ON act.id = s.account_id LEFT JOIN
contact c
ON c.session_id = s.id
WHERE s.programme_id = :program_id AND
(c.id IS NOT NULL OR act.id IS NOT NULL);
You want indexes on session(programme_id, account_id, id), account(id) and contact(session_id).
It is important that programme_id be the first column in the index on session.
@Gordon already suggested you add an index, which is generally the easy and effective solution, so I'm going to answer a different part of your question.
I tried to use a UNION query with two INNER JOIN queries, one between
SESSION and ACCOUNT and the other one between SESSION and CONTACT, but
it doesn't give me the same number of records and I don't really
understand why.
That part is rather simple: the JOIN returns a result set that contains the rows of both tables joined together. So in the first case you would end up with a result that looks like
session.id, session.column2, session.column3, ..., account.id, account.column2, account.column3, ....
and a second where
session.id, session.column2, session.column3, ..., contact.id, contact.column2, contact.column3, ....
A UNION will then fail unless the two SELECTs return the same number of columns with corresponding types, which is unlikely when one side pulls its columns from account and the other from contact. From the docs (emphasis mine):
The column names from the first SELECT statement are used as the column names for the results returned. Selected columns listed in corresponding positions of each SELECT statement should have the same data type. (For example, the first column selected by the first statement should have the same type as the first column selected by the other statements.)
Just perform both INNER JOINs separately and compare the results if you're unsure.
If you want to stick with a UNION solution, make sure each SELECT returns only corresponding columns: SELECT s.id alone is a trivial case, but it should work, for instance.
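To make the UNION idea concrete, and to show one possible reason the record counts can differ even when the column lists line up, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The function name interesting_sessions and the sample rows are invented for illustration. The point it demonstrates is standard SQL behaviour: UNION removes duplicate rows, whereas the LEFT JOIN version repeats a session once per matching contact.

```python
import sqlite3


def interesting_sessions(sessions, accounts, contacts, programme_id):
    """Return (ids from the LEFT JOIN query, ids from the UNION query).

    sessions: (id, programme_id, account_id) tuples
    accounts: account ids
    contacts: (id, session_id) tuples
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE session (id INTEGER PRIMARY KEY, programme_id INTEGER, account_id INTEGER);
        CREATE TABLE account (id INTEGER PRIMARY KEY);
        CREATE TABLE contact (id INTEGER PRIMARY KEY, session_id INTEGER);
    """)
    conn.executemany("INSERT INTO session VALUES (?, ?, ?)", sessions)
    conn.executemany("INSERT INTO account VALUES (?)", [(a,) for a in accounts])
    conn.executemany("INSERT INTO contact VALUES (?, ?)", contacts)
    # Original shape: one output row per (session, matching contact) pair.
    left_join = conn.execute("""
        SELECT s.id FROM session s
        LEFT JOIN account act ON act.id = s.account_id
        LEFT JOIN contact c ON c.session_id = s.id
        WHERE s.programme_id = ? AND (c.id IS NOT NULL OR act.id IS NOT NULL)
        ORDER BY s.id
    """, (programme_id,)).fetchall()
    # UNION of two inner joins: duplicates are removed, so each session appears once.
    union = conn.execute("""
        SELECT s.id FROM session s JOIN account act ON act.id = s.account_id
        WHERE s.programme_id = ?
        UNION
        SELECT s.id FROM session s JOIN contact c ON c.session_id = s.id
        WHERE s.programme_id = ?
        ORDER BY 1
    """, (programme_id, programme_id)).fetchall()
    conn.close()
    return [r[0] for r in left_join], [r[0] for r in union]
```

If the duplicated rows are wanted, UNION ALL keeps them, but then a session that has both an account and a contact shows up in both halves, so the counts can still differ from the LEFT JOIN version.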

How can I optimise this COUNT DISTINCT on joined InnoDB tables?

SELECT COUNT(DISTINCT r.id)
FROM views v
INNER JOIN emails e ON v.email_id = e.id
INNER JOIN recipients r ON e.recipient_id = r.id
INNER JOIN campaigns c ON e.campaign_id = c.id
WHERE c.centre_id IS NULL;
... or, "how many unique email opens have we had? (on general campaigns)"
Currently takes about a minute and a half to run on an Amazon RDS instance. Total rows for the tables involved are roughly:
campaigns: 250
recipients: 330,000
views: 530,000
emails: 1,380,000
EXPLAIN gives me:
1 SIMPLE r index PRIMARY UNIQ_146632C4E7927C74 767 NULL 329196 Using index
1 SIMPLE e ref PRIMARY,IDX_4C81E852E92F8F78,IDX_4C81E852F639F774 IDX_4C81E852E92F8F78 111 ecomms.r.id 1 Using where
1 SIMPLE v ref IDX_11F09C87A832C1C9 IDX_11F09C87A832C1C9 111 ecomms.e.id 1 Using where; Using index
1 SIMPLE c eq_ref PRIMARY,IDX_E3737470463CD7C3 PRIMARY 110 ecomms.e.campaign_id 1 Using where
What can I do to get this total faster?
You need to join recipients only if you are not enforcing a foreign key constraint between recipients.id and emails.recipient_id, and you want to exclude recipients who are no longer listed in the recipients table. Otherwise, omit that table from the join altogether; you can use emails.recipient_id instead of recipients.id. Omitting that join should be a big win.
Alternatively, omit recipients from the join on the basis that it is not relevant to the question posed, which is about unique emails opened, not about unique recipients to open any email. In that case you should be able to just SELECT COUNT(*) FROM ... because each emails row is already unique.
Other than that, it looks like you're already getting good use of your indexes, though I confess I find the EXPLAIN PLAN output difficult to read, especially without headings. Still, it looks like your query doesn't read the base tables at all, so it's unlikely that adding new indexes would help.
You could try executing an OPTIMIZE TABLE on the tables involved in your query, though that probably sounds more hopeful than it should.
You should periodically run ANALYZE TABLE on the tables involved in this query, to give the query optimizer the greatest likelihood of choosing the best possible plan. It looks like the optimizer is already choosing a reasonable plan, though, so this may not help much.
If you still need better performance then there are other possibilities (including moving to faster hardware), but they are too numerous to discuss here.
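The suggestion to drop the recipients join can be sketched as follows, again with Python's sqlite3 module standing in for MySQL. The function name count_unique_openers, the reduced column lists, and the sample rows are assumptions for illustration. The sketch relies on every emails.recipient_id pointing at an existing recipient, which is what a foreign key constraint would guarantee.

```python
import sqlite3


def count_unique_openers(campaigns, emails, views):
    """Count distinct recipients who opened an email of a campaign with no centre.

    campaigns: (id, centre_id) tuples
    emails: (id, recipient_id, campaign_id) tuples
    views: (id, email_id) tuples
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE campaigns (id INTEGER PRIMARY KEY, centre_id INTEGER);
        CREATE TABLE emails (id INTEGER PRIMARY KEY, recipient_id INTEGER, campaign_id INTEGER);
        CREATE TABLE views (id INTEGER PRIMARY KEY, email_id INTEGER);
    """)
    conn.executemany("INSERT INTO campaigns VALUES (?, ?)", campaigns)
    conn.executemany("INSERT INTO emails VALUES (?, ?, ?)", emails)
    conn.executemany("INSERT INTO views VALUES (?, ?)", views)
    # emails.recipient_id replaces recipients.id, so recipients is never joined.
    (count,) = conn.execute("""
        SELECT COUNT(DISTINCT e.recipient_id)
        FROM views v
        JOIN emails e ON v.email_id = e.id
        JOIN campaigns c ON e.campaign_id = c.id
        WHERE c.centre_id IS NULL
    """).fetchone()
    conn.close()
    return count
```

Whether this is actually faster on the real data would need to be checked with EXPLAIN and timing on the MySQL instance itself.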
You want MySQL to be able to utilize the WHERE clause to limit the result set immediately. In order to do that, you need the proper indexes to join from campaigns to emails, then from emails to recipients and views.
Put an index on campaigns.centre_id to aid the search (satisfy the WHERE clause). I'm assuming campaigns.id is the primary key on that table.
Put an index on emails.campaign_id to aid the join to emails from campaigns. Add recipient_id and email_id to that index to provide a covering index.
Now, the EXPLAIN result should show the tables in order, starting from campaigns, then emails, then the other two. MySQL will still need an internal temporary table to apply the DISTINCT. Are you sure you need that?
I'm assuming emails.id and recipients.id are the primary keys.

Can you build a MySQL query to show not found results from the conditional

I am trying to find out how to find the emails that do not exist in a table using the emails from the conditional.
I could create a table with these emails but that seems like overkill for what I need it for.
What I am looking for is a query that would show me the conditional value and NULL as the user ID.
Is this possible?
I have a query like this:
SELECT u.uid, u.mail
FROM `users` u
WHERE u.mail IN (
'alot#of',
'emails#that',
'ineed#tofind',
)
This works great at finding the emails and associating the user id. Now I need to identify which emails do not exist in the result. I am currently only using 56 emails and 6 do not appear in the list. I am trying to identify which emails are not found.
NOT IN won't work as I have over 40,000 users. I only want to identify the emails not found from my conditional. I have 56 emails and only 50 results. I need to identify the 6 not found (they may not even be in the table at all)
Let me attempt to clarify this a little more:
I am given a list of emails for supposed accounts in the system. I am trying to find the accounts from the given emails. This part is fine. The issue is that I was given 56 emails but only 50 were found. I need to identify which of the 56 emails were not found. The emails are all thrown into the conditional. NOT IN won't work because it would return all users except the 50 that were found (roughly 40,000). I just need to identify the emails from the conditional that were not found in the table.
Thanks for any insight or suggestions to do what I need.
There isn't a way to do what you want without creating some additional items to track the emails. Basically, you're trying to get MySQL to tell you which items in the WHERE portion aren't found, but MySQL can only tell you about rows in a table.
You need to make a secondary table that stores the email addresses from your list, call it list. I would make it a single column table with just the emails. Then LEFT JOIN it against the users table and find where the uid is null.
SELECT u.uid, l.mail
FROM `list` l
LEFT JOIN `users` u ON u.mail=l.mail
WHERE u.uid IS NULL
As posted in the comments, NOT IN may be helpful. But there are also other ways. One of them is to left join your table with the result of your query and show only non-coincident rows:
select u.uid, u.mail
from users as u
left join (
select uid, mail
from users
where mail in ('alot#of','emails#that','ineed#tofind')
) as a on u.uid = a.uid
where a.uid is null;
Add the fields you need to the join (if uid is not enough)
So your question now becomes more complicated... you want to find all the E-Mails in your condition that are not found in your table.
As far as I know, there's no simple SQL statement that will give you that... but you can work with temp tables and get it. The solution involves:
Create a temporary table to hold the values you want to search (and add the appropriate indexes to it)
Insert the values you want to search
Execute a select query to find non-matching rows
So... let's do it:
-- 1. Create a temp table to hold the values
drop table if exists temp_search_values;
create temporary table temp_search_values (
mail varchar(100),
unique index idx_mail(mail) -- Don't allow duplicate values here
);
-- 2. Insert the search values
insert into temp_search_values (mail) values
('alot#of'),('emails#that'),('ineed#tofind');
-- 3. Execute the query
select a.*
from temp_search_values as a
left join users as u on u.mail = a.mail
where u.mail is null;
Remember: Temporary tables are only visible to the connection that created them, and are deleted when the connection is closed or killed.
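The three steps above can be sketched end to end with Python's sqlite3 module as a stand-in for MySQL. The function name missing_emails and the sample addresses are made up for the example. The important detail is that the temporary table is on the left side of the LEFT JOIN, so every searched value survives and the unmatched ones come back with a NULL uid.

```python
import sqlite3


def missing_emails(user_rows, wanted):
    """Return the addresses in `wanted` that have no row in users, sorted.

    user_rows: (uid, mail) tuples
    wanted: iterable of addresses to look for
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, mail TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", user_rows)
    # 1. Temporary table holding the search values; UNIQUE rejects duplicates.
    conn.execute("CREATE TEMPORARY TABLE temp_search_values (mail TEXT UNIQUE)")
    # 2. Insert the search values (OR IGNORE skips repeated addresses).
    conn.executemany(
        "INSERT OR IGNORE INTO temp_search_values (mail) VALUES (?)",
        [(m,) for m in wanted],
    )
    # 3. Keep only the search values that found no matching user.
    rows = conn.execute("""
        SELECT a.mail
        FROM temp_search_values AS a
        LEFT JOIN users AS u ON u.mail = a.mail
        WHERE u.uid IS NULL
        ORDER BY a.mail
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

INSERT OR IGNORE is SQLite syntax; the closest MySQL equivalent is INSERT IGNORE.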
NULL is a strange result. It's not true and it's not false. If you want to check for it, you have to look specifically.
SELECT u.uid, u.mail
FROM `users` u
WHERE u.mail NOT IN (
'alot#of',
'emails#that',
'ineed#tofind'
) and u.uid IS NULL
Oh, I see what you're getting at. This will work, although it's not pretty.
select * from
(SELECT 'emails#that' as v
UNION SELECT 'alot#of' as v
UNION SELECT 'ineed#tofind' as v
) as test
left join users u on u.mail = test.v
where u.uid is null

Mysql query where not exists

I have three different tables - subscribers, unsubscribers, mass subscribers.
I'd like to print out each email from the mass subscribers table. However that email can only be printed if it doesn't exist in both subscribers and unsubscribers tables.
I know how to do this with arrays, however I want a plain mysql query.
What would mysql query be?
Thanks!
You can do that with a subquery (this is slow! Please read below the line):
SELECT email
FROM subscribers
WHERE email NOT IN(SELECT email FROM unsubscribers)
However, this is very bad for performance. I suggest you change the way you have your database, with just 1 table subscribers, and add a column active(tinyint). When someone unsubscribes, you set that value from 1 to 0. After that you can stay in 1 table:
SELECT email FROM subscribers WHERE active=1
This is faster for a few reasons:
No subquery
The NOT IN filter is expensive, because it selects a heap of data and compares strings
Selecting on an integer is VERY fast (especially when you index it)
Apart from being faster, this is also better for your database structure. You don't want two tables doing almost the same thing with email addresses. That creates duplicate data and a chance for inconsistencies.
You sound like someone who doesn't have much experience with SQL. Your title does point in the right direction. Here is how you put the components together:
select m.*
from mass_subscribers m
where not exists (select 1 from subscribers s where s.email = m.email) and
not exists (select 1 from unsubscribers u where u.email = m.email);
NOT EXISTS happens to be a very good choice for this type of query; it is typically pretty efficient in both MySQL and other databases.
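Here is a small runnable sketch of the double NOT EXISTS filter, using Python's sqlite3 module as a stand-in for MySQL. The function name printable_emails and the single-column tables are assumptions made for the example; the real tables presumably carry more columns.

```python
import sqlite3


def printable_emails(mass, subscribers, unsubscribers):
    """Return mass_subscribers emails found in neither of the other two tables."""
    conn = sqlite3.connect(":memory:")
    for table, values in (
        ("mass_subscribers", mass),
        ("subscribers", subscribers),
        ("unsubscribers", unsubscribers),
    ):
        # Table names come from the fixed tuple above, never from user input.
        conn.execute(f"CREATE TABLE {table} (email TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?)", [(v,) for v in values])
    # A row is kept only when both correlated subqueries come back empty.
    rows = conn.execute("""
        SELECT m.email
        FROM mass_subscribers m
        WHERE NOT EXISTS (SELECT 1 FROM subscribers s WHERE s.email = m.email)
          AND NOT EXISTS (SELECT 1 FROM unsubscribers u WHERE u.email = m.email)
        ORDER BY m.email
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

An index on the email column of subscribers and unsubscribers lets each NOT EXISTS probe use a lookup instead of a scan.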
Without subqueries, using join
SELECT ms.*
FROM mass_subscribers ms
LEFT JOIN subscribers s ON s.email=ms.email
LEFT JOIN unsubscribers us ON us.email=ms.email
WHERE
s.email IS NULL
AND
us.email IS NULL

How to combine 5 tables together with same ID in a query?

I have 5 different tables T_DONOR, T_RECIPIENT_1, T_RECIPIENT_2, T_RECIPIENT_3, and T_RECIPIENT_4. All 5 tables have the same CONTACT_ID.
This is the T_DONOR table:
T_RECIPIENT_1:
T_RECIPIENT_2:
This is what I want the final table to look like with more recipients and their information to the right.
T_RECIPIENT_3 and T_RECIPIENT_4 are the same as T_RECIPIENT_1 and T_RECIPIENT_2 except that they have different RECIPIENT IDs and different names. I want to combine all 5 of these tables so that on one line I have the DONOR_CONTACT_ID with the donor's information, followed by all of the recipients' information.
The problem is that when I try to run a query, it does not work because not all of the Donors have all of the recipient fields filled, so the query will run and give a blank table. Some instances I have a Donor with 4 Recipients and other times I have a Donor with only 1 Recipient so this causes a problem. I've tried running queries where I connect them with the DONOR_CONTACT_ID but this will only work if all of the RECIPIENT fields are filled. Any suggestions on what to do? Is there a way I could manipulate this in VBA? I only know some VBA, I'm not an expert.
First I think you want all rows from T_DONOR. And then you want to pull in information from the recipient tables when they include DONOR_CONTACT_ID matches. If that is correct, LEFT JOIN T_DONOR to the other tables.
Start with a simpler set of fields; you can add in the "name" fields after you get the joins set to correctly return the rest of the data you need.
SELECT
d.DONOR_CONTACT_ID,
r1.RECIPIENT_1,
r2.RECIPIENT_1
FROM
(T_DONOR AS d
LEFT JOIN T_RECIPIENT_1 AS r1
ON d.ORDER_NUMBER = r1.ORDER_NUMBER)
LEFT JOIN T_RECIPIENT_2 AS r2
ON d.ORDER_NUMBER = r2.ORDER_NUMBER;
Notice the parentheses in the FROM clause. The db engine requires them for any query which includes more than one join. If possible, set up your joins in Design View of the query designer. The query designer knows how to add parentheses to keep the db engine happy.
Here is a version without aliased table names in case it's easier to understand and set up in the query designer ...
SELECT
T_DONOR.DONOR_CONTACT_ID,
T_RECIPIENT_1.RECIPIENT_1,
T_RECIPIENT_2.RECIPIENT_1
FROM
(T_DONOR
LEFT JOIN T_RECIPIENT_1
ON T_DONOR.ORDER_NUMBER = T_RECIPIENT_1.ORDER_NUMBER)
LEFT JOIN T_RECIPIENT_2
ON T_DONOR.ORDER_NUMBER = T_RECIPIENT_2.ORDER_NUMBER;
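To see why LEFT JOIN keeps donors that have fewer recipients, here is a minimal sketch using Python's sqlite3 module rather than Access. The function name donors_with_recipients, the two-column table layouts, and the RECIPIENT_2 column name are assumptions, since the original table screenshots are not available. SQLite does not need the parentheses that Access requires around multiple joins, so they are omitted.

```python
import sqlite3


def donors_with_recipients(donors, recipients_1, recipients_2):
    """Return one row per donor, with None where a recipient slot is empty.

    donors: (ORDER_NUMBER, DONOR_CONTACT_ID) tuples
    recipients_1 / recipients_2: (ORDER_NUMBER, name) tuples
    """
    conn = sqlite3.connect(":memory:")
    # Assumed minimal layouts; the real tables have more fields.
    conn.executescript("""
        CREATE TABLE T_DONOR (ORDER_NUMBER INTEGER, DONOR_CONTACT_ID INTEGER);
        CREATE TABLE T_RECIPIENT_1 (ORDER_NUMBER INTEGER, RECIPIENT_1 TEXT);
        CREATE TABLE T_RECIPIENT_2 (ORDER_NUMBER INTEGER, RECIPIENT_2 TEXT);
    """)
    conn.executemany("INSERT INTO T_DONOR VALUES (?, ?)", donors)
    conn.executemany("INSERT INTO T_RECIPIENT_1 VALUES (?, ?)", recipients_1)
    conn.executemany("INSERT INTO T_RECIPIENT_2 VALUES (?, ?)", recipients_2)
    # Every T_DONOR row survives; unmatched recipient columns come back as NULL.
    rows = conn.execute("""
        SELECT d.DONOR_CONTACT_ID, r1.RECIPIENT_1, r2.RECIPIENT_2
        FROM T_DONOR AS d
        LEFT JOIN T_RECIPIENT_1 AS r1 ON d.ORDER_NUMBER = r1.ORDER_NUMBER
        LEFT JOIN T_RECIPIENT_2 AS r2 ON d.ORDER_NUMBER = r2.ORDER_NUMBER
        ORDER BY d.DONOR_CONTACT_ID
    """).fetchall()
    conn.close()
    return rows
```

With an inner join instead, any donor missing from one of the recipient tables would be dropped, which matches the empty result described in the question.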
SELECT T_DONOR.ORDER_NUMBER, T_DONOR.DONOR_CONTACT_ID, T_DONOR.FIRST_NAME, T_DONOR.LAST_NAME, T_RECIPIENT_1.RECIPIENT_1, T_RECIPIENT_1.FIRST_NAME, T_RECIPIENT_1.LASTNAME
FROM T_DONOR
JOIN T_RECIPIENT_1
ON T_DONOR.DONOR_CONTACT_ID = T_RECIPIENT_1.DONOR_CONTACT_ID
This shows you how to JOIN the first recipient table; you should be able to follow the same structure for the other three...