Need some help optimising an SQL query

Need some help optimising an SQL query - mysql

my client was given the following code and he uses it daily to count the messages sent to businesses on his website. I have looked at the MYSQL.SLOW.LOG and it has the following stats for this query, which indicates to me it needs optimising.
Count: 183 Time=44.12s (8073s) Lock=0.00s (0s)
Rows_sent=17337923391683297280.0 (-1), Rows_examined=382885.7
(70068089), Rows_affected=0.0 (0), thewedd1[thewedd1]#localhost
The query is:
SELECT
businesses.name AS BusinessName,
messages.created AS DateSent,
messages.guest_sender AS EnquirersEmail,
strip_tags(messages.message) AS Message,
users.name AS BusinessName
FROM
messages
JOIN users ON messages.from_to = users.id
JOIN businesses ON users.business_id = businesses.id
My SQL is not very good but would a LEFT JOIN rather than a JOIN help to reduce the number or rows returned? Ive have run an EXPLAIN query and it seems to make no difference between the LEFT JOIN and the JOIN..
Basically I think it would be good to reduce the number of rows returned, as it is absurdly big..

Short answer: There is nothing "wrong" with your query, other than the duplicate BusinessName alias.
Long answer: You can add indexes to the foreign / primary keys to speed up searching which will do more than changing the query.
If you're using SSMS (SQL management studio) you can right click on indexes for a table and use the wizard.
Just don't be tempted to index all the columns as that may slow down any inserts you do in future, stick to the ids and _ids unless you know what you're doing.

he uses it daily to count the messages sent to businesses
If this is done per day, why not limit this to messages sent in specific recent days?
As an example: To count messages sent per business per day, for just a few recent days (example: 3 or 4 days), try this:
SELECT businesses.name AS BusinessName
, messages.created AS DateSent
, COUNT(*) AS n
FROM messages
JOIN users ON messages.from_to = users.id
JOIN businesses ON users.business_id = businesses.id
WHERE messages.created BETWEEN current_date - INTERVAL '3' DAY AND current_date
GROUP BY businesses.id
, DateSent
ORDER BY DateSent DESC
, n DESC
, businesses.id
;
Note: businesses.name is functionally dependent on businesses.id (in the GROUP BY terms), which is the primary key of businesses.
Example result:
+--------------+------------+---+
| BusinessName | DateSent | n |
+--------------+------------+---+
| business1 | 2021-09-05 | 3 |
| business2 | 2021-09-05 | 1 |
| business2 | 2021-09-04 | 1 |
| business2 | 2021-09-03 | 1 |
| business3 | 2021-09-02 | 5 |
| business1 | 2021-09-02 | 1 |
| business2 | 2021-09-02 | 1 |
+--------------+------------+---+
7 rows in set
This assumes your basic join logic is correct, which might not be true.
Other data could be returned as aggregated results, if necessary, and the fact that this is now limited to just recent data, the amount of rows examined should be much more reasonable.

Related

MySQL COUNT(DISTINCT) giving wrong values with GROUP BY

I have a table that contains custom user analytics data. I was able to pull the number of unique users with a query:
SELECT COUNT(DISTINCT(user_id)) AS 'unique_users'
FROM `events`
WHERE client_id = 123
And this will return 16728
This table also has a column of type DATETIME that I would like to group the counts by. However, if I add a GROUP BY to the end of it, everything groups properly it seems except the totals don't match. My new query is this:
SELECT COUNT(DISTINCT(user_id)) AS 'unique_users', DATE(server_stamp) AS 'date'
FROM `events`
WHERE client_id = 123
GROUP BY DATE(server_stamp)
Now I get the following values:
|-----------------------------|
| unique_users | date |
|---------------|-------------|
| 2650 | 2019-08-26 |
| 3486 | 2019-08-27 |
| 3475 | 2019-08-28 |
| 3631 | 2019-08-29 |
| 3492 | 2019-08-30 |
|-----------------------------|
Totaling to 16734. I tried using a sub query to get the distinct users then count and group in the main query but no luck there. Any help in this would be greatly appreciated. Let me know if there is further information to help diagnosis.

A user, who is connected with events on multiple days (e.g. session starts before midnight and ends afterwards), will occur the number of these days times in the new query. This is due to the fact, that the first query performs the DISTINCT over all rows at once while the second just removes duplicates inside each groups. Identical values in different groups will stay untouched.
So if you have a combination of DISTINCT in the select clause and a GROUP BY, the GROUP BY will be executed before the DISTINCT. Thus without any restrictions you cannot assume, that the COUNT(DISTINCT user_id) of the first query and the sum over the COUNT(DISTINCT user_id) of all groups is the same.

Xandor is absolutely correct. If a user logged on 2 different days, There is no way your 2nd query can remove them. If you need data grouped by date, You can try below query -
SELECT COUNT(user_id) AS 'unique_users', DATE(MIN_DATE) AS 'date'
FROM (SELECT user_id, MIN(DATE(server_stamp)) MIN_DATE -- Might be MAX
FROM `events`'
WHERE client_id = 123
GROUP BY user_id) X
GROUP BY DATE(server_stamp);

Rows column in Query Plan confusing

I have a MySql query
SELECT TE.company_id,
SUM(TE.debit- TE.credit) As summation
FROM Transactions T JOIN Transaction_E TE2
ON (T.parent_id = TE2.transaction_id)
JOIN Transaction_E TE
ON (TE.transaction_id = T.id AND TE.company_id IS NOT NULL)
JOIN Accounts A
ON (TE2.account_id=A.id AND A.deactivated_timestamp=0)
WHERE (TE.company_id IN (1,2))
AND A.user_id=2341 GROUP BY TE.company_id;
When I explain the query, the plan for it is like (in summary):
| Select type | table | type | rows |
-------------------------------------
| SIMPLE | A | ref | 2 |
| SIMPLE | TE2 | ref | 17 |
| SIMPLE | T | ref | 1 |
| SIMPLE | TE | ref | 1 |
But if I do a count(*) on the same query (instead of SUM(..) ), then it shows that there are ~40k rows for a particular company_id. What I don't understand is why the query plan shows so few rows being scanned while there is at least 40k rows being processed. What does the rows column in the query plan represent? Does it not represent the number of rows that get processed in that table? In that case it should be at most 2*17*1*1 = 34 rows?

The query plan just shows a high level judgement on the expected number of rows required per table to meet the end result.
It is to be used as a tool for judging as to how the optimizer is 'seeing' your query, and to help it a bit, in case query performance is worse or can be improved.
There is always a possibility that the query plan is built based on an earlier snapshot of statistics, and hence should not be taken on face value, especially while dealing with cardinality.

Well, first let's get rid of the computational bug:
SELECT TE.company_id, TE.summation
FROM
( SELECT company_id,
SUM(debit - credit) As summation
FROM Transaction_E
WHERE company_id IN (1,2)
) TE
JOIN Transactions T ON TE.transaction_id = T.id
JOIN Transaction_E TE2 ON T.parent_id = TE2.transaction_id
JOIN Accounts A ON TE2.account_id = A.id
AND A.deactivated_timestamp = 0
WHERE A.user_id = 2341;
Your query is probably summing up the same company multiple times before doing the GROUP BY. My variant avoids that inflation of the aggregate.
I got rid of TE.company_id IS NOT NULL because it was redundant.
See what the EXPLAIN says about this, then let's discuss your question about EXPLAIN further.

MySQL subquery from same table

I have a database with table xxx_facileforms_forms, xxx_facileforms_records and xxx_facileforms_subrecords.
Column headers for xxx_facileforms_subrecords:
id | record | element | title | neame | type | value
As far as filtering records with element = '101' ..query returns proper records, but when i add subquery to filete aditional element = '4871' from same table - 0 records returned.
SELECT
F.id AS form_id,
R.id AS record_id,
PV.value AS prim_val,
COUNT(PV.value) AS count
FROM
xxx_facileforms_forms AS F
INNER JOIN xxx_facileforms_records AS R ON F.id = R.form
INNER JOIN xxx_facileforms_subrecords AS PV ON R.id = PV.record AND PV.element = '101'
WHERE R.id IN (SELECT record FROM xxx_facileforms_records WHERE record = R.id AND element = '4871')
GROUP BY PV.value
Does this looks right?
Thank You!
EDIT
Thank you for support and ideas! Yes, I left lot of un guessing. Sorry. Some input/output table data might help make it more clear.
_facileforms_form:
id | formname
---+---------
1 | myform
_facileforms_records:
id | form | submitted
----+------+--------------------
163 | 1 | 2014-06-12 14:18:00
164 | 1 | 2014-06-12 14:19:00
165 | 1 | 2014-06-12 14:20:00
_facileforms_subrecords:
id | record | element | title | name|type | value
-----+--------+---------+--------+-------------+--------
5821 | 163 | 101 | ticket | radio group | flight
5822 | 163 | 4871 | status | select list | canceled
5823 | 164 | 101 | ticket | radio group | flight
5824 | 165 | 101 | ticket | radio group | flight
5825 | 165 | 4871 | status | select list | canceled
Successful query result:
form_id | record_id | prim_val | count
1 | 163 | flight | 2
So i have to return value data (& sum those records) from those records where _subrecord element - 4871 is present (in this case 163 and 165).
And again Thank You!

Thank You for support and ideas! Yes i left lot of un guessing.. sorry . So may be some input/output table data might help.
_facileforms_form:
headers -> id | formname
1 | myform
_facileforms_records:
headers -> id | form | submitted
163 | 1 | 2014-06-12 14:18:00
164 | 1 | 2014-06-12 14:19:00
165 | 1 | 2014-06-12 14:20:00
_facileforms_subrecords
headers -> id | record | element | title | name | type | value
5821 | 163 | 101 | ticket | radio group| flight
5822 | 163 | 4871 | status | select list | canceled
5823 | 164 | 101 | ticket | radio group | flight
5824 | 165 | 101 | ticket | radio group | flight
5825 | 165 | 4871 | status | select list | canceled
Succesful Query result:
headers -> form_id | record_id | prim_val | count
1 | 163 | flight | 2
So i have to return value data (& sum those records) from those records where _subrecord element - 4871 is present (in this case 163 and 165).
And again Thank You!

No, it doesn't look quite right. There's a predicate "R.id IN (subquery)" but that subquery itself has a reference to R.id; it's a correlated subquery. Looks like something is doubled up there. (We're assuming here that id is a UNIQUE or PRIMARY key in each table.)
The subquery references an identifier element... the only other reference we see to that identifier is from the _subrecords table (we don't see any reference to that column in _records table... if there's no element column in _records, then that's a reference to the element column in PV, and that predicate in the subquery will never be true at the same time the PV.element='101' predicate is true.
Kudos for qualifying the column references with a table alias, that makes the query (and the EXPLAIN output) much easier to read; the reader doesn't need to go digging around in the table definitions to figure out which table does and doesn't contain which columns. But please take that pattern to the next step, and qualify all column references in the query, including column references in the subqueries.
Since the reference to element isn't qualified, we're left to guess whether the _records table contains a column named element.
If the goal is to return only the rows from R with element='4871', we could just do...
WHERE R.element='4871'
But, given that you've gone to the bother of using a subquery, I suspect that's not really what you want.
It's possible you're trying to return all rows from R for a _form, but only for the _form where there's at least one associated _record with element='4871'. We could get that result returned with either an IN (subquery) or an EXISTS (correlated_ subquery) predicate, or an anti-join pattern. I'd give examples of those query patterns; I could take some guesses at the specification, but I would only be guessing at what you actually want to return.
But I'm guessing that's not really what you want. I suspect that _records doesn't actually contain a column named element.
The query is already restricting the rows returned from PV with those that have element='101'.)
This is a case where some example data and the example output would help explain the actual specification; and that would be a basis for developing the required SQL.
FOLLOWUP
I'm just guessing... maybe what you want is something pretty simple. Maybe you want to return rows that have element value of either '101' or '4913'.
The IN comparison operator is a convenient of way of expressing the OR condition, that a column be equal to a value in a list:
SELECT F.id AS form_id
, R.id AS record_id
, PV.value AS prim_val
, COUNT(PV.value) AS count
FROM xxx_facileforms_forms F
JOIN xxx_facileforms_records R
ON R.form = F.id
JOIN xxx_facileforms_subrecords PV
ON PV.record = R.id
AND PV.element IN ('101','4193')
GROUP BY PV.value
NOTE: This query (like the OP query) is using a non-standard MySQL extension to GROUP BY, which allows non-aggregate expressions (e.g. bare columns) to be returned in the SELECT list.
The values returned for the non-aggregate expressions (in this case, F.id and R.id) will be a values from a row included in the "group". But because there can be multiple rows, and different values on those rows, it's not deterministic which of values will be returned. (Other databases would reject this statement, unless we wrapped those columns in an aggregate function, such as MIN() or MAX().)
FOLLOWUP
I noticed that you added information about the question into an answer... this information would better be added to the question as an EDIT, since it's not an answer to the question. I took the liberty of copying that, and reformatting.
The example makes it much more clear what you are trying to accomplish.
I think the easiest to understand is to use EXISTS predicate, to check whether a row meeting some criteria "exists" or not, and exclude rows where such a row does not exist. This will use a correlated subquery of the _subrecords table, to which check for the existence of a matching row:
SELECT f.id AS form_id
, r.id AS record_id
, pv.value AS prim_val
, COUNT(pv.value) AS count
FROM xxx_facileforms_forms f
JOIN xxx_facileforms_records r
ON r.form = f.id
JOIN xxx_facileforms_subrecords pv
ON pv.record = r.id
AND pv.element = '101'
-- only include rows where there's also a related 4193 subrecord
WHERE EXISTS ( SELECT 1
FROM xxx_facileforms_subrecords sx
WHERE sx.element = '4193'
AND sx.record = r.id
)
--
GROUP BY pv.value
(I'm thinking this is where OP was headed with the idea that a subquery was required.)
Given that there's a GROUP BY in the query, we could actually accomplish an equivalent result with a regular join operation, to a second reference to the _subrecords table.
A join operation is often more efficient than using an EXISTS predicate.
(Note that the existing GROUP BY clause will eliminate any "duplicates" that might otherwise be introduced by a JOIN operation, so this will return an equivalent result.)
SELECT f.id AS form_id
, r.id AS record_id
, pv.value AS prim_val
, COUNT(pv.value) AS count
FROM xxx_facileforms_forms f
JOIN xxx_facileforms_records r
ON r.form = f.id
JOIN xxx_facileforms_subrecords pv
ON pv.record = r.id
AND pv.element = '101'
-- only include rows where there's also a related 4193 subrecord
JOIN xxx_facileforms_subrecords sx
ON sx.record = r.id
AND sx.element = '4193'
--
GROUP BY pv.value

Display duplicate row info as extra column

I have three tables RESOURCES, SERVERS, ACTIVE_RESOURCES. The HOSTED_RESOURCES table servers as a referential table to list which servers the resources are active on. Currently I use the bellow query to retrieve a resource:
SELECT r.resource_id, r.serve_url, r.title, r.category_id, ar.server_id
FROM active_resources ar
LEFT JOIN resources AS r ON (ar.resource_id = r.id)
WHERE hr.resource_id = (
select id from resources
and id < 311
order by date_added desc
limit 1
);
Because in most cases the resources are available on all servers I end up with duplicate information in the query result, for example:
resource_id | serve_url | Title | category_id | server_id
-----------------------------------------------------------------------------
309 | /b/7514.pdf | Tuesdays with Morrie | 1 | 1
309 | /b/7514.pdf | Tuesdays with Morrie | 1 | 2
All of the data, except for server_id is a duplicate, so I was hoping to concatenate the result to one row displaying the server ids in additional columns, or even just list the server ids as comma separated in one column.
Thank you for looking at this.

SELECT r.resource_id, r.serve_url, r.title, r.category_id,
GROUP_CONCAT(ar.server_id) AS server_id
FROM active_resources ar
LEFT JOIN resources AS r ON (ar.resource_id = r.id)
WHERE hr.resource_id = (
select id from resources
and id < 311
order by date_added desc
limit 1
)
GROUP BY r.resource_id, r.serve_url, r.title, r.category_id
;

Trouble wrapping head around complex SQL delete query

Situation
My goal is to have a yearly cronjob that deletes certain data from a database based on age. To my disposal I have the powers of Bash and MySQL. I started with writing a bash script but then it struck me that maybe, I could do everything with just a single SQL query.
I'm more a programmer by nature and I haven't had much experience with data structures so that's why I would like some help.
Tables / data structure
The relevant tables and columns for this query are as follows:
Registration:
+-----+-------------------+
| Id | Registration_date |
+-----+-------------------+
| 2 | 2011-10-03 |
| 3 | 2011-10-06 |
| 4 | 2011-10-07 |
| 5 | 2011-10-07 |
| 6 | 2011-10-10 |
| 7 | 2011-10-13 |
| 8 | 2011-10-14 |
| 9 | 2011-10-14 |
| 10 | 2011-10-17 |
+-------------------------+
AssociatedClient:
+-----------+-----------------+
| Client_id | Registration_id |
+-----------+-----------------+
| 2 | 2 |
| 3 | 2 |
| 3 | 4 |
| 4 | 5 |
| 3 | 6 |
| 5 | 6 |
| 3 | 8 |
| 8 | 9 |
| 7 | 10 |
+-----------------------------+
Client: only Id is relevant here.
As you can see, this is a simple many-to-many relationship. A client can have multiple registrations to his name, and a registration can have multiple clients.
The goal
I need to delete all registrations and client data for clients who have not had a new registration in 5 years. Sounds simple, right?
The tricky part
The data should be kept if any other client on any registration from a specific client has a new registration within 5 years.
So imagine client A having 4 registrations with just him in them, and 1 registration with himself and client B. All 5 registrations are older than 5 years. If client B did not have a new registration in 5 years, everything should be deleted: client A registrations and record. If B did have a new registration within 5 years, all client A data should be kept, including his own old registrations.
What I've tried
Building my query, I got about this far:
DELETE * FROM `Registration` AS Reg
WHERE TIMESTAMPDIFF(YEAR, Reg.`Registration_date`, NOW()) >= 5
AND
(COUNT(`Id`) FROM `Registration` AS Reg2
WHERE Reg2.`Id` IN (SELECT `Registration_id` FROM `AssociatedClient` AS Clients
WHERE Clients.`Client_id` IN (SELECT `Client_id` FROM `AssociatedClient` AS Clients2
WHERE Clients2.`Registration_id` IN -- stuck
#I need all the registrations from the clients associated with the first
# (outer) registration here, that are newer than 5 years.
) = 0 -- No newer registrations from any associated clients
Please understand that I have very limited experience with SQL. I realise that even what I got so far can be heavily optimised (with joins etc) and may not even be correct.
The reason I got stuck is that the solution I had in mind would work if I could use some kind of loop, and I only just realised that this is not something you easily do in an SQL query of this kind.
Any help
Is much appreciated.

Begin by identifying the registrations of the other clients of a registration. Here's a view:
create view groups as
select a.Client_id
, c.Registration_id
from AssociatedClient as a
join AssociatedClient as b on a.Registration_id = b.Registration_id
join AssociatedClient as c on b.Client_id = c.Client_id;
That gives us:
select Client_id
, min(Registration_id) as first
, max(Registration_id) as last
, count(distinct Registration_id) as regs
, count(*) as pals
from groups
group by Client_id;
Client_id first last regs pals
---------- ---------- ---------- ---------- ----------
2 2 8 4 5
3 2 8 4 18
4 5 5 1 1
5 2 8 4 5
7 10 10 1 1
8 9 9 1 1
You dont' need a view, of course; it's just for convenience. You could just use a virtual table. But inspect it carefully to convince yourself it produces the right range of "pal registrations" for each client. Note that the view does not reference Registration. That's significant because it produces the same results even after we use it to delete from Registration, so we can use it for the second delete statement.
Now we have a list of clients and their "pal registrations". What's the date of each pal's last registration?
select g.Client_id, max(Registration_date) as last_reg
from groups as g join Registration as r
on g.Registration_id = r.Id
group by g.Client_id;
g.Client_id last_reg
----------- ----------
2 2011-10-14
3 2011-10-14
4 2011-10-07
5 2011-10-14
7 2011-10-17
8 2011-10-14
Which ones have a latest date before a time certain?
select g.Client_id, max(Registration_date) as last_reg
from groups as g join Registration as r
on g.Registration_id = r.Id
group by g.Client_id
having max(Registration_date) < '2011-10-08';
g.Client_id last_reg
----------- ----------
4 2011-10-07
IIUC that would mean that client #4 should be deleted, and anything he registered for should be deleted. Registrations would be
select * from Registration
where Id in (
select Registration_id from groups as g
where Client_id in (
select g.Client_id
from groups as g join Registration as r
on g.Registration_id = r.Id
group by g.Client_id
having max(Registration_date) < '2011-10-08'
)
);
Id Registration_date
---------- -----------------
5 2011-10-07
And, sure enough, client #4 is in Registration #5, and is the only client subject to deletion by this test.
From there you can work out the delete statements. I think the rule is "delete the client and anything he registered for". If so, I'd probably write the Registration IDs to a temporary table, and write the deletes for both Registration and AssociatedClient by joining to it.

You want to know all registrations that need to be kept.
So your first query returns registrations within 5 previous years :
SELECT
Id
FROM
Registration
WHERE
Registration_date >= '2011-10-08'
then all registrations with clients related to the previous query :
SELECT
a2.Registration_id as Id
FROM
AssociatedClient AS a1
INNER JOIN AssociatedClient AS a2
ON a1.Client_id = a2.Client_id
WHERE
a1.Registration_id IN
(
SELECT
Id
FROM
Registration
WHERE
Registration_date >= '2011-10-08'
)
Then you have all registrations that you must not delete by combining the previous queries in an UNION, and you want all clients that are not part of this query :
SELECT
Client_id
FROM
AssociatedClient
WHERE
Registration_id NOT IN
(
SELECT
Id
FROM
Registration
WHERE
Registration_date >= '2011-10-08'
UNION
SELECT
a2.Registration_id as Id
FROM
AssociatedClient AS a1
INNER JOIN AssociatedClient AS a2
ON a1.Client_id = a2.Client_id
WHERE
a1.Registration_id IN
(
SELECT
Id
FROM
Registration
WHERE
Registration_date >= '2011-10-08'
)
)
you can see the results in this SQL fiddle
Then you can delete the lines of clients without registration correspondig to the criterias using the following query :
DELETE FROM
AssociatedClient
WHERE
Client_id IN (<previous query>);
and all registrations not present in AssociatedClient :
DELETE FROM
Registration
WHERE
Id NOT IN (SELECT Registration_id FROM AssociatedClient)

Use temporary tables.
INSERT INTO LockedClient(client_id) --select clients that should not be deleted
SELECT DISTINCT ac.client_id
FROM AssociatedClient ac
JOIN Registration r ON r.Id = ac.ID
WHERE TIMESTAMPDIFF(YEAR, Reg.`Registration_date`, NOW()) >= 5;
DELETE * FROM Registration r -- now delete all except locked clients
JOIN AssociatedClient ac ON ac.registration_id = r.id
LEFT JOIN LockedClient lc ON lc.client_id = ac.client_id
WHERE TIMESTAMPDIFF(YEAR, Reg.`Registration_date`, NOW()) >= 5 AND lc.client_id IS NULL

This should give you the proper clients information 1 level down into the linked clients. I know that this may not give you all the needed information. But, as stated in the comments, a 1 level implementation should be sufficient for now. This may not be optimal.
SELECT
AC1.Client_id,
MAX(R.Registration_date) AS [LatestRegistration]
FROM
#AssociatedClient AC1
JOIN #AssociatedClient AC2
ON AC1.Registration_id = AC2.Registration_id
JOIN #AssociatedClient AC3
ON AC2.Client_id = AC3.Client_id
JOIN #Registration R
ON AC3.Registration_id = R.Id
GROUP BY
AC1.Client_id
You should look into a function using loops. That's the only thing I can think about right now.

I'm a SQL Server guy, but I think this syntax will work for MySQL. This query will pull the clients that should not be deleted.
SELECT A3.Client_id
FROM AssociatedClient A1
#Get clients with registrations in the last 5 years
JOIN Registration R1 ON A1.Registration_id = R1.Id
AND TIMESTAMPDIFFERENCE(YEAR, R1.Registration_Date, Now()) <= 5
#get the rest of the registrations for those clients
JOIN AssociatedClient A2 ON A1.Client_id = A2.Client_id
#get other clients tied to the rest of the registrations
JOIN AssociatedClient A3 ON A2.Registration_id = A3.Registration_id

You need two sql delete statement, because you are deleting from two tables.
Both delete statements need to distinguish between registrations which are being kept and those being deleted, so the delete from the registration table needs to happen second.
The controlling issue is the most recent registration associated with an id (a registration id or a client id). So you will be aggregating based on id and finding the maximum registration date.
When deleting client ids, you delete those where the aggregate registration id is older than five years. This deletion will disassociate registration ids which were previously linked, but that is ok, because this action will not give them a more recent associated registration date.
That said, once you have the client ids, you'll need a join on registration ids which finds associated registration ids. You'll need to join to client ids and then self join back to registration ids to get that part to work right. If you've deleted all client ids which were associated with a registration you'll need to delete those registrations also.
My sql is a bit rusty, and my mysql rustier, and this is untested code, but this should be reasonably close to what I think you need to do:
delete from associatedclient where client_id in (
select client_id from (
select ac.client_id, max(r.registration_date) as dt
from associatedclient ac
inner join registration r
on ac.registration_id = r.id
group by ac.client_id
) d where d.dt < cutoff
)
The next step would look something like this:
delete from registration where id in (
select id from (
select r1.id, max(r2.date) dt
from registration r1
inner join associated_client ac1
on r1.id = ac1.registration_id
inner join associated_client ac2
on ac1.client_id = ac2.client_id
inner join registration r2
on ac2.registration_id = r2.id
) d
where d.dt < cutoff
or d.dt is null
I hope you don't mind me reminding you, but you should want to run the select statements without the deletes, first, and inspect the result for plausibility, before you go ahead and delete stuff.
(And if you have any constraints or indices which prevent this from working you'll have to deal with those also.)

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Need some help optimising an SQL query - mysql

Related

MySQL COUNT(DISTINCT) giving wrong values with GROUP BY

Rows column in Query Plan confusing

MySQL subquery from same table

Display duplicate row info as extra column

Trouble wrapping head around complex SQL delete query

Categories

Resources