How to simplify/improve this mysql delete query - mysql

We regularly send a newsletter to our subscribers. We want to remove subscribers who never open our emails nor read them.
Here's the query I have put together for this - it removes subscribers/their events where they have not replied to 5 emails or more.
It seems a little awkward (and big!) and I was wondering if there was a simpler and more elegant/efficient way to do this query (maybe with joins??) as it does take a while.
DELETE FROM list_subscriber_events where
list_subscriber_events.subscriberid IN
(SELECT list_subscribers.emailaddress, list_subscriber_events.subscriberid, list_subscriber_events.eventtype, count(list_subscriber_events.eventtype) as total
FROM `list_subscriber_events`
LEFT JOIN list_subscribers on
list_subscriber_events.subscriberid=list_subscribers.subscriberid
AND list_subscribers.subscriberid<>''
AND list_subscriber_events.subscriberid<>''
AND list_subscribers.subscriberid NOT IN (select subscriberid from stats_emailopens)
AND list_subscribers.subscriberid NOT IN (select subscriberid from stats_linkclicks)
GROUP BY list_subscriber_events.subscriberid
HAVING count(list_subscriber_events.eventtype) > 5 );

To start with the IN statement in a DELETE query (or almost any query): IN tends to result in very high query execution times in mysql. The other NOT IN statements might be bad for performance also (you have to test the different cases), so this is a rewrite of the query to get rid of the NOT IN.
A rewrite of this query might be better in the following style:
CREATE VIEW myUsersToBeDeleted AS
SELECT lse.subscriberid
FROM `list_subscriber_events` lse
LEFT JOIN list_subscribers ls ON lse.subscriberid=ls.subscriberid
AND ls.subscriberid<>''
AND lse.subscriberid<>''
LEFT JOIN stats_emailopens se ON ls.subscriberid=se.subscriberid
LEFT JOIN stats_linkclicks sl ON ls.subscriberid=sl.subscriberid
WHERE sl.subscriberid IS NULL AND se.subscriberid IS NULL
GROUP BY lse.subscriberid
HAVING count(lse.eventtype) > 5 ;
The DELETE is then easier and quicker:
DELETE lse FROM list_subscriber_events lse, myUsersToBeDeleted b WHERE
lse.subscriberid=b.subscriberid;
Last hint: Migrate to MariaDB to get in general way better performance from using views. MySQL is pretty poor on that level.

Related

Conditionals in WHEREs or JOINs?

Lets say I have the following query:
SELECT occurs.*, events.*
FROM occurs
INNER JOIN events ON (events.event_id = occurs.event_id)
WHERE event.event_state = 'visible'
Another way to do the same query and get the same results would be:
SELECT occurs.*, events.*
FROM occurs
INNER JOIN events ON (events.event_id = occurs.event_id
AND event.event_state = 'visible')
My question. Is there a real difference? Is one way faster than the other? Why would I choose one way over the other?
For an INNER JOIN, there's no conceptual difference between putting a condition in ON and in WHERE. It's a common practice to use ON for conditions that connect a key in one table to a foreign key in another table, such as your event_id, so that other people maintaining your code can see how the tables relate.
If you suspect that your database engine is mis-optimizing a query plan, you can try it both ways. Make sure to time the query several times to isolate the effect of caching, and make sure to run ANALYZE TABLE occurs and ANALYZE TABLE events to provide more info to the optimizer about the distribution of keys. If you do find a difference, have the database engine EXPLAIN the query plans it generates. If there's a gross mis-optimization, you can create an Oracle account and file a feature request against MySQL to optimize a particular query better.
But for a LEFT JOIN, there's a big difference. A LEFT JOIN is often used to add details from a separate table if the details exist or return the rows without details if they do not. This query will return result rows with NULL values for b.* if no row of b matches both conditions:
SELECT a.*, b.*
FROM a
LEFT JOIN b
ON (condition_one
AND condition_two)
WHERE condition_three
Whereas this one will completely omit results that do not match condition_two:
SELECT a.*, b.*
FROM a
LEFT JOIN b ON some_condition
WHERE condition_two
AND condition_three
Code in this answer is dual licensed: CC BY-SA 3.0 or the MIT License as published by OSI.

Optimize "JOIN" query

this is my query from my source code
SELECT `truyen`.*, MAX(chapter.chapter) AS last_chapter
FROM (`truyen`)
LEFT JOIN `chapter` ON `chapter`.`truyen` = `truyen`.`Id`
WHERE `truyen`.`title` LIKE \'%%\'
GROUP BY `truyen`.`Id`
LIMIT 250
When I install it on iFastnet host, It cause over 500,000 rows to be examined due to the join, and the query is being blocked (this would used over 100% of a CPU, which ultimately would cause server instability).
I also tried to add this line before the query, it fixed the problem above but lead to another issue making some of functions can not run correctly
mysql_query("SET SQL_BIG_SELECTS=1");
How can I fix this problem without buying another hosting ?
Thanks.
You might be looking for an INNER JOIN. That would remove results that do not match. I find INNER JOINs to be faster than LEFT JOINs.
However, I'm not sure what results you are actually looking for. But because you are using the GROUP BY, it looks like the INNER JOIN might work for you.
One thing I would recommend is copy and paste the query that it generates into SQL with DESCRIBE before it.
So if the query ended up being:
SELECT truyen.*, MAX(chapter.chapter) AS last_chapter FROM truyen
LEFT JOIN chapter ON chapter.truyen = truyen.Id
WHERE truyen.title LIKE '%queryString%'
You would type:
DESCRIBE SELECT truyen.*, MAX(chapter.chapter) AS last_chapter FROM truyen
LEFT JOIN chapter ON chapter.truyen = truyen.Id
WHERE truyen.title LIKE '%queryString%'
This will tell you if you could possibly ad an index to your table to JOIN on faster.
I hope this at least points you in the right direction.
Michael Berkowski seems to agree with the indexing, which you will be able to see from the DESCRIBE.
Please look if you have indexes on chapter.chapter and chapter.truyen. If not, set them and try again. If this is not successful try these suggestions:
Do you have the possibility to flag permanently on insert/update your last chapter in a column of your chapter table? Then you could use it to reduce the joined rows and you could drop out the GROUP BY. Maybe in this way:
SELECT `truyen`.*, `chapter`.`chapter` as `last_chapter`
FROM `truyen`, `chapter`
WHERE `chapter`.`truyen` = `truyen`.`Id`
AND `chapter`.`flag_last_chapter` = 1
AND `truyen`.`title` LIKE '%queryString%'
LIMIT 250
Or create a new table for that instead:
INSERT INTO new_table (truyen, last_chapter)
SELECT truyen, MAX(chapter) FROM chapter GROUP BY truyen;
SELECT `truyen`.*, `new_table`.`last_chapter`
FROM (`truyen`)
LEFT JOIN `new_table` ON `new_table`.`truyen` = `truyen`.`Id`
WHERE `truyen`.`title` LIKE '%queryString%'
GROUP BY `truyen`.`Id`
LIMIT 250
Otherwise you could just fetch the 250 rows of truyen, collect your truyen ids in an array and build another SQL Statement to select the 250 rows of the chapter table. I have seen in your original question that you can use PHP for that. So you could merge the results after that:
SELECT * FROM truyen
WHERE title LIKE '%queryString%'
LIMIT 250
SELECT truyen, MAX(chapter) AS last_chapter
FROM chapter
WHERE truyen in (comma_separated_ids_from_first_select)

Mysql query where not exists

I have three different tables - subscribers, unsubscribers, mass subscribers.
I'd like to print out each email from the mass subscribers table. However that email can only be printed if it doesn't exist in both subscribers and unsubscribers tables.
I know how to do this with arrays, however I want a plain mysql query.
What would mysql query be?
Thanks!
You can do that with a subquery (this is slow! Please read below the line):
SELECT email
FROM subscribers
WHERE email NOT IN(SELECT email FROM unsubscribers)
However, this is very bad for performance. I suggest you change the way you have your database, with just 1 table subscribers, and add a column active(tinyint). When someone unsubscribes, you set that value from 1 to 0. After that you can stay in 1 table:
SELECT email FROM subscribers WHERE active=1
This is faster because of some reasons:
No subquery
The where is bad, because you are going to select a heap of data, and compare strings
Selecting on integer in VERY fast (especially when you index it)
Apart from the fact that this is faster, it would be better for your database structure. You dont want two tables doing almost the same, with emailadresses. This will create duplicate data and a chance for misalignments
You sound like someone who doesn't have much experience with SQL. Your title does point in the right direction. Here is how you put the components together:
select m.*
from mass_subscribers m
where not exists (select 1 from subscribers s where s.email = m.email) and
not exists (select 1 from unsubscribers u where u.email = m.email);
NOT EXISTS happens to be a very good choice for this type of query; it is typically pretty efficient in both MySQL and other databases.
Without subqueries, using join
SELECT mass_subscribers.*
FROM mass_subscribers ms
LEFT JOIN subscribers s ON ms.email=s.email
LEFT JOIN unsubscribers us ON us.email=s.email
WHERE
ms.email IS NULL
AND
us.email IS NULL

mysql query finding results using joined status table that needs EXIST and NOT EXIST results

I have been looking around for ages for a solution to my problem.
I have something that works but i am not sure it is the most efficient way of doing things and can't find anyone trying to do this when googling around.
I have a table with customers and a table with statuses that that customer has had.
If I want to find results where a customer has had a status happen I have managed to get the required results using a join, but sometimes I want to be able to find clients where not only has a status been reached but also where a few other statuses haven't been.
Currently I am doing this with a NOT EXISTS Sub query but it seem a bit slow and thinking about it if I have to check after finding a result that matches the first status through all the results again to see if it doesn't match another it could explain the slowness.
for instance a client could have a status of invoiced and a status of paid.
If I wanted to see which clients have been invoiced thats fine, If I want to see which clients have been invoiced and paid thats fine, but if I wanted to see which clients have been invoiced but NOT paid thats where I start having to use a NOT EXIST subquery
Is there another more efficient way around this? or is this the best way to proceed but I need to sort out how mysql uses indxes with these tables to be more efficient?
I can provide more detail of the actual sql if that helps?
Thanks
Matt
If this is over multiple clients then the usual solution would be to have a subselect for the status per client and then use LEFT OUTER JOIN to connect this.
Something like
SELECT *
FROM Clients a
LEFT OUTER JOIN (SELECT ClientId, COUNT(*) FROM ClientsStatus WHERE Status IN (1,2) GROUP BY ClientId) b
ON a.ClientId = b.ClientId
WHERE b.ClientId IS NULL
This (very rough) example is to give you a list of clients who do not have a status of 1 or 2.
You should be able to expand this basic idea to cover the scenarios / data you are dealing with
Edited for below
I have had a play with your SQL. I think you can use a JOIN onto the subselect fairly easily, but this doesn't seem to be checking anything other than whether a claim has had a status of 3 or 95.
SELECT claims.ID, claims.vat_rate, claims.com_rate,
claims.offer_val, claims.claim_value, claims.claim_ppi_amount, claims.claim_amount, claims.approx_loan_val, claims.salutationsa, claims.first_namesa, claims.last_namesa,
clients.salutation, clients.first_name,clients.last_name, clients.phone, clients.phone2, clients.mobile, clients.dob,clients.postcode, clients.address1, clients.address2, clients.town, client_claim_status.person,clients.ID
AS client_id,claims.ID AS claim_id, claims.date_added AS status_date_added,client_claim_status.date_added AS last_client_claim_status_date_added,work_suppliers.name AS refname, financial_institution.name AS lendname, clients.date_added AS client_date_added,ppi_claim_type_2.claim_type AS ppi_claim_type_name
FROM claims
RIGHT JOIN clients ON claims.client_id = clients.ID
RIGHT JOIN client_claim_status
ON claims.ID = client_claim_status.claim_id
AND client_claim_status.deleted != 'yes'
AND ((client_claim_status.status_id IN (1, 170))
AND client_claim_status.date_added < '2012-12-02 00:00:00' )
LEFT OUTER JOIN (SELECT claim_id FROM client_claim_status WHERE status_id IN (3, 95 )) Sub1
ON claims.ID = Sub1.claim_id
LEFT JOIN financial_institution ON claims.claim_against = financial_institution.ID
LEFT JOIN work_suppliers ON clients.work_supplier_id = work_suppliers.ID
LEFT JOIN ppi_claim_type_2 ON claims.ppi_claim_type_id = ppi_claim_type_2.ID
WHERE claims.deleted != 'yes'
AND Sub1.claim_id IS NULL
ORDER BY last_client_claim_status_date_added DESC
I would suggest that you rearrange the code to remove the RIGHT OUTER JOINs though to be honest. Mixing left and right joins up tend to be very confusing.

Select taking too long. Need advice for a better performance

Ok, here we go. There's this messy SELECT crossing other tables and ordering to get the one desired row. Basically I do the "math" inside the ORDER BY.
1 base table.
7 JOINS poiting to local tables.
WHERE with 2 clauses and a NOT IN crossing another table.
You'll see in the code the ORDER BY is pretty damn big/ugly, it sums the result of 5 different calculations. I need that result to order by those calculations in order to get the worst row-case.
The problem is once I execute the Stored Procedure it takes up to 8 seconds to run. That's kind of non-acceptable. So, I'm starting to check Indexes.
So, I'm looking for advices on how to make this query run faster.
I'm indexing the WHERE clauses and the field LINEA, Should I index something else? Like the rows Im crossing for the JOINs? or should I approach the query differently?
Query:
SET #LINEA = (
SELECT TOP 1
BOA.LIN
FROM
BAND_BA BOA
LEFT JOIN
TEL PAR
ON REPLACE(BOA.Lin,'-','') = SUBSTRING(PAR.Te,2,10)
LEFT JOIN
TELP CLP
ON REPLACE(BOA.Lin,'-','') = SUBSTRING(CLP.Numtel,2,10)
LEFT JOIN
CA C
ON REPLACE(BOA.Lin,'-','') = C.An
LEFT JOIN
RE R
ON REPLACE(BOA.Lin,'-','') = R.Lin
LEFT JOIN
PRODUCTOS2 P2
ON BOA.PRODUCTO = P2.codigo
LEFT JOIN
EN
ON REPLACE(BOA.Lin,'-','') = EN.G
LEFT JOIN
TIP ID
ON TIPID = ID.ID
WHERE
BOA.EST = 'C' AND
ID.SE = 'boA' AND
BOA.LIN NOT IN (
SELECT
LIN
FROM
BAN
)
ORDER BY (EN.VALUE + ANT.VALUE + REIT.VAL + C.VALUE + TEL.VALUE
) DESC,
I'll be frank, this is some pretty terrible SQL. Without seeing all your table structures, advice here will be incomplete. That being said, please don't post all your table structures because you are already very close to "hire a consultant" territory with this.
All the REPLACE logic should be done away with. If you need to JOIN on these fields, then add comparable fields to the tables so you don't need to manipulate the data. Every single JOIN that uses a REPLACE or SUBSTRING is a table or index scan - those are non-SARGable and a definite anti-pattern.
The ORDER BY is probably the most convoluted ORDER BY I have ever seen. Some major issues there:
Subqueries should all be eliminated and materialized either in the outer query or as variables
String manipulation should be eliminated (see item 1 above)
The entire query is basically a code smell. If you need to write code like this to meet business requirements then you either have a terribly inappropriate design or some other much larger issue in the organization or data.
One thing that can kill performance is using a lot of LEFT JOINs. To improve performance of LEFT JOIN, you might want to make sure that the column(s) to which you join have an index - that can have a huge impact on performance.