Delete from a table with JOIN, GROUP BY and HAVING - mysql

I have two tables, carts and cartitems; the second one has a foreign key to the first. Now I want to delete all rows from carts that are older than 3 months and have no related cartitems. The following query gives me the correct result:
SELECT *
FROM `carts` c
LEFT OUTER JOIN `cartitems` i ON ( `c`.`id` = `i`.`cart_id` )
WHERE `c`.`last_updated` < DATE_SUB(NOW() , INTERVAL 3 MONTH)
GROUP BY `c`.`id`
HAVING COUNT( `i`.`id` ) = 0;
But I can't figure out how to turn this query into a DELETE.
Also, since there are ~10 million rows in the carts table, I'd be thankful for suggestions on how to improve this query :-)

You can run the following without LIMIT, or with a LIMIT to delete the rows in batches. It uses the single-table DELETE form, because MySQL does not accept LIMIT in the multi-table form (DELETE c FROM carts c ...).
DELETE FROM carts
WHERE last_updated < DATE_SUB(NOW() , INTERVAL 3 MONTH)
AND NOT EXISTS
( SELECT *
FROM cartitems i
WHERE i.cart_id = carts.id
)
LIMIT 10000 ; -- optional
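The batching idea can be tried out without a MySQL server. The following is only a minimal sketch using Python's built-in sqlite3 module as a stand-in: the table contents, the fixed cutoff string and the helper name delete_stale_empty_carts are made up for illustration. Because a stock SQLite build does not necessarily accept DELETE ... LIMIT, the batch size is applied inside an id IN (SELECT ... LIMIT ?) subquery instead; on MySQL you would put the LIMIT directly on the DELETE.

```python
import sqlite3


def delete_stale_empty_carts(conn, cutoff, batch_size):
    """Delete carts older than `cutoff` that have no cartitems, in batches.

    Hypothetical helper: repeats a bounded DELETE until nothing is left to remove.
    """
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM carts
            WHERE id IN (
                SELECT c.id
                FROM carts c
                WHERE c.last_updated < ?
                  AND NOT EXISTS (SELECT 1 FROM cartitems i WHERE i.cart_id = c.id)
                LIMIT ?
            )
            """,
            (cutoff, batch_size),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


# Illustrative data: carts 1-4 are old, cart 5 is recent, cart 2 still has an item.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE carts (id INTEGER PRIMARY KEY, last_updated TEXT);
    CREATE TABLE cartitems (id INTEGER PRIMARY KEY, cart_id INTEGER REFERENCES carts(id));
    INSERT INTO carts VALUES
        (1, '2020-01-01'), (2, '2020-01-02'), (3, '2020-01-03'),
        (4, '2020-01-04'), (5, '2030-01-01');
    INSERT INTO cartitems VALUES (10, 2);
""")

deleted = delete_stale_empty_carts(conn, cutoff="2025-01-01", batch_size=2)
remaining = [row[0] for row in conn.execute("SELECT id FROM carts ORDER BY id")]
print(deleted, remaining)
```

Committing after each batch keeps every transaction short, which is the usual reason for deleting a large table in chunks.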

You can try this. MySQL cannot delete from a subquery, so the join goes directly into a multi-table DELETE, and the HAVING COUNT(...) = 0 check becomes an IS NULL test on the joined table:
DELETE c
FROM `carts` c
LEFT OUTER JOIN `cartitems` i ON ( `c`.`id` = `i`.`cart_id` )
WHERE `c`.`last_updated` < DATE_SUB(NOW() , INTERVAL 3 MONTH)
AND `i`.`id` IS NULL;
I am not sure about it though :)

An EXISTS query should work, as long as the subquery is correlated with the row being deleted (an uncorrelated EXISTS would match either every row or none):
DELETE FROM `carts`
WHERE `carts`.`last_updated` < DATE_SUB(NOW() , INTERVAL 3 MONTH)
AND NOT EXISTS (SELECT *
FROM `cartitems` i
WHERE `i`.`cart_id` = `carts`.`id`);
This query deletes every old cart for which the subquery finds no cartitems row.
Please note that you can replace the asterisk with NULL or any other value.
A second try:
DELETE FROM `carts` WHERE `carts`.`id` IN (SELECT `id` FROM (SELECT `c`.`id`
FROM `carts` c
LEFT OUTER JOIN `cartitems` i ON ( `c`.`id` = `i`.`cart_id` )
WHERE `c`.`last_updated` < DATE_SUB(NOW() , INTERVAL 3 MONTH)
GROUP BY `c`.`id`
HAVING COUNT( `i`.`id` ) = 0) AS `empty_carts`);
That one generates a list of ids (I hope that c.id is your primary key) and deletes all rows with those ids. The extra derived table (`empty_carts`) is needed because MySQL does not allow a subquery to select directly from the table that is being deleted from.

Related

Trying to select the highest date between date ranges in a subquery

I am trying to update a table using data from three tables.
table3 contains rates for different days. The rates aren't measured daily, so I simply want to use the most recent rate dated before the table2 record that fires the trigger.
So far this is what I have, but I can't seem to get it to work.
I'm not sure I'm explaining myself properly, to be honest.
CREATE TRIGGER `trigger1`
AFTER UPDATE ON `table2` FOR EACH ROW
UPDATE `table4`
inner join (SELECT o.`Name`,
o.Date,
(o.`value` * (m.`rate`)) total
FROM `table1` o
LEFT JOIN `table2` r
ON o.`Name` = r.`Name`
AND o.Date = r.Date
LEFT JOIN (SELECT * from `table3`
INNER JOIN (SELECT `Name`, Date
WHERE Date < r.Date
ORDER BY Date DESC
LIMIT 1) as y
ON table3.Date = y.Date AND table3.Name = y.Name) m
ON o.`Name` = m.`Name`
GROUP BY o.`Name`, o.Date)x
set `Contribution` = x.total
where (`table4`.Date) = x.Date and `table4`.`Well Name` = x.`Well Name`;
fiddle link : https://www.db-fiddle.com/f/pVqmhmM21XJARKhvFMP52B/4

mysql count table rows with count child tables rows

I have three tables in a hierarchy: a > b > c. I want to count the rows of each table separately and optimize the query. My query does not return the correct counts.
TABLE: a
a
-----------
id
no
create_time
TABLE: b
b
-----------
id
no
TABLE: c
c
-----------
id
b_id
This query does not return the correct counts:
SELECT
DATE( a.create_time ) AS date,
COUNT( a.id ) AS total_a,
COUNT( b.id ) AS total_b,
COUNT( c.id ) AS total_c
FROM
`a`
LEFT JOIN `b` ON `a`.`no` = `b`.`no`
LEFT JOIN `c` ON `b`.`id` = `c`.`b_id`
WHERE
( `a`.`status` = 1 )
GROUP BY
DATE( a.create_time )
ORDER BY
`date` DESC
LIMIT 20
A left join repeats each row on the left hand side for each matching row on the right hand side. So you get more rows than in the original.
An easy fix is to count just the unique identifiers:
SELECT
DATE( a.create_time ) AS date,
COUNT( DISTINCT a.id ) AS total_a,
COUNT( DISTINCT b.id ) AS total_b,
COUNT( DISTINCT c.id ) AS total_c
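The rest of the query (FROM, joins, WHERE, GROUP BY) stays as in the question. To see the row multiplication concretely, here is a small sketch using Python's sqlite3 module in place of MySQL. The sample rows are invented, and the status column is assumed to exist on table a because the question's WHERE clause uses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, no INTEGER, create_time TEXT, status INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, no INTEGER);
    CREATE TABLE c (id INTEGER PRIMARY KEY, b_id INTEGER);
    -- one row in a, two matching rows in b, four rows in c (three under b=1, one under b=2)
    INSERT INTO a VALUES (1, 100, '2024-05-01 09:00:00', 1);
    INSERT INTO b VALUES (1, 100), (2, 100);
    INSERT INTO c VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

# {d} is either empty or the keyword DISTINCT.
QUERY = """
    SELECT DATE(a.create_time) AS date,
           COUNT({d} a.id) AS total_a,
           COUNT({d} b.id) AS total_b,
           COUNT({d} c.id) AS total_c
    FROM a
    LEFT JOIN b ON a.no = b.no
    LEFT JOIN c ON b.id = c.b_id
    WHERE a.status = 1
    GROUP BY DATE(a.create_time)
"""

plain = conn.execute(QUERY.format(d="")).fetchall()
distinct = conn.execute(QUERY.format(d="DISTINCT")).fetchall()

print(plain)     # every count equals the number of joined rows
print(distinct)  # counts per table
```

The join produces four rows for the single row in a, so the plain COUNT reports 4 for all three columns, while COUNT(DISTINCT ...) reports 1, 2 and 4.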

How do I optimize this query? It's taking 3 minutes to run

I have a database with 3 tables.
A calendar table that has a row for each date between 2000-01-01 and 2040-01-01 totaling 14610 rows
A locations table that has an id and name for each location totaling 12 rows
A receipts table that has an id and datetime, and several other fields that aren't relevant totaling ~250,000 rows
I'm trying to get a count of receipts for each day between a date range grouped by location with zero counts if no receipts exist.
I've got a working query but it takes ~3 minutes to run:
SELECT
`locations`.`name` AS `location`,
`calendar`.`date` AS `date`,
COUNT(`receipts`.`id`) AS `count`
FROM `locations`
CROSS JOIN `calendar`
LEFT JOIN `receipts` ON `calendar`.`date` = DATE(`receipts`.`datetime`)
AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id
WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07'
GROUP BY `locations`.`id`, `calendar`.`id`
ORDER BY `locations`.`name` ASC, `calendar`.`date` ASC;
I believe it has something to do with the WHERE statement.
I changed the WHERE to this instead which runs instantly but it no longer gives me zero counts for no receipts:
SELECT
`locations`.`name` AS `location`,
`calendar`.`date` AS `date`,
COUNT(`receipts`.`id`) AS `count`
FROM `locations`
CROSS JOIN `calendar`
LEFT JOIN `receipts` ON `calendar`.`date` = DATE(`receipts`.`datetime`)
AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id
WHERE DATE(`receipts`.`datetime`) >= '2017-04-01' AND DATE(`receipts`.`datetime`) <= '2017-04-07'
GROUP BY `locations`.`id`, `calendar`.`id`
ORDER BY `locations`.`name` ASC, `calendar`.`date` ASC;
I then started messing around with subqueries but no success:
SELECT
`locations`.`name` AS `location`,
`cal`.`date` AS `date`,
COUNT(`receipts`.`id`) AS `count`
FROM `locations`
CROSS JOIN (
SELECT `calendar`.`id`, `calendar`.`date`
FROM `calendar`
WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07'
) `cal`
LEFT JOIN `receipts` ON `cal`.`date` = DATE(`receipts`.`datetime`)
AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id
WHERE DATE(`receipts`.`datetime`) >= '2017-04-01' AND DATE(`receipts`.`datetime`) <= '2017-04-07'
GROUP BY `locations`.`id`, `cal`.`id`
ORDER BY `locations`.`name` ASC, `cal`.`date` ASC;
Is there any way I can speed up the first query, since that's the one that gives me the output I want?
Try this:
SELECT l.name location, c.date, COUNT(r.id) count
FROM calendar c
left join calendar n on n.Date = c.Date + 1 -- one day after c.date
left join (locations l join receipts r
on r.id like CONCAT(l.Id, '%')) -- MySQL concatenates with CONCAT(); the location id is the first character of r.id
on r.datetime between c.Date and n.Date
where c.Date between '2017-04-01' and '2017-04-07'
GROUP BY l.id, c.id
ORDER BY l.name, c.date;
Your problem has two causes:
1. You were using a cross join, which is unnecessary. Cross joins create Cartesian products (every row on one side is combined with every row on the other side). So cross joining the alphabet with the 10 digits results in 260 rows: {A0, A1, A2, ... A9, B0, B1, ... B9, etc.}
2. Your SQL query contains several constructs (although even one is enough) that force the query processor to read every row of the table from disk, effectively preventing it from using any index on the table. Applying a function to a column value in a filter (WHERE clause) or in an ordering (ORDER BY clause) does this, because the query processor cannot know the function's result without executing it, and it must read the row from the main table to get the underlying value. If the predicate used just the raw column value, and that column was in an index, the processor would not need to read the main data table; it could just traverse the index, which is often considerably smaller and requires far fewer disk I/Os.
A predicate that can use an index in this way is referred to as SARGable.
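As a rough illustration of the difference, the sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, so the plan wording is SQLite's and only the principle carries over. The index name idx_receipts_datetime and the four sample rows are assumptions made for the demo. It compares the function-wrapped filter with an equivalent half-open range on the raw column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE receipts (id TEXT PRIMARY KEY, datetime TEXT);
    CREATE INDEX idx_receipts_datetime ON receipts (datetime);
    INSERT INTO receipts VALUES
        ('A1', '2017-03-31 23:59:59'),
        ('A2', '2017-04-01 00:00:00'),
        ('B1', '2017-04-07 23:59:59'),
        ('B2', '2017-04-08 00:00:00');
""")

# Function applied to the column: the index on datetime cannot be used for the filter.
WRAPPED = ("SELECT id FROM receipts "
           "WHERE DATE(datetime) >= '2017-04-01' AND DATE(datetime) <= '2017-04-07'")

# Same date window expressed on the raw column as a half-open range.
RANGE = ("SELECT id FROM receipts "
         "WHERE datetime >= '2017-04-01' AND datetime < '2017-04-08'")


def plan(sql):
    """Return the 'detail' column of SQLite's query plan as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


def ids(sql):
    return sorted(row[0] for row in conn.execute(sql))


print(plan(WRAPPED))  # a full SCAN of the table
print(plan(RANGE))    # a SEARCH using idx_receipts_datetime
print(ids(WRAPPED), ids(RANGE))
```

Both queries return the same receipts; only the second lets the engine seek into the index instead of evaluating DATE() for every row.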
if the c.Date + 1 is not possible in MySQL, then try this:
SELECT l.name location, c.date, COUNT(r.id) count
FROM calendar c
left join calendar n on n.Date =
(Select min(date) from calendar -- subquery gets the next day in calendar
Where date > c.Date)
left join (locations l join receipts r
on r.id like CONCAT(l.Id, '%')) -- MySQL concatenates with CONCAT(); the location id is the first character of r.id
on r.datetime between c.Date and n.Date
where c.Date between '2017-04-01' and '2017-04-07'
GROUP BY l.id, c.id
ORDER BY l.name, c.date;
Sorry I wasted everyone's time but I managed to solve this on my own.
Here is the query I figured out which runs instantly:
SELECT
`l`.`name` AS `location`,
`c`.`date` AS `date`,
COUNT(`r`.`id`) AS `count`
FROM `locations` AS `l`
CROSS JOIN (
SELECT `calendar`.`id`, `calendar`.`date`
FROM `calendar`
WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07'
) `c`
LEFT JOIN (
SELECT `receipts`.`id`, `receipts`.`datetime`
FROM `receipts`
WHERE DATE(`receipts`.`datetime`) >= '2017-04-01' AND DATE(`receipts`.`datetime`) <= '2017-04-07'
) `r` ON `c`.`date` = DATE(`r`.`datetime`) AND `l`.`id` = UPPER(LEFT(`r`.`id`, 1))
GROUP BY `l`.`id`, `c`.`id`
ORDER BY `l`.`name` ASC, `c`.`date` ASC;
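A likely reason this version keeps the zero counts while the earlier WHERE-based attempt lost them: a WHERE condition on receipts columns is evaluated after the LEFT JOIN and rejects the NULL-extended rows, whereas a filter inside the derived table runs before the join. The sketch below reproduces that with Python's sqlite3 module standing in for MySQL. The two locations, two calendar days and two receipts are invented, and SUBSTR replaces MySQL's LEFT() because SQLite has no LEFT function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE calendar (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE receipts (id TEXT PRIMARY KEY, datetime TEXT);
    INSERT INTO locations VALUES ('A', 'Alpha'), ('B', 'Beta');
    INSERT INTO calendar VALUES (1, '2017-04-01'), (2, '2017-04-02');
    INSERT INTO receipts VALUES ('A1', '2017-04-01 10:00:00'), ('A2', '2017-04-01 11:00:00');
""")

# Filter on receipts in WHERE: NULL-extended rows fail the test and disappear.
FILTER_IN_WHERE = """
    SELECT l.name, c.date, COUNT(r.id)
    FROM locations l
    CROSS JOIN calendar c
    LEFT JOIN receipts r
           ON c.date = DATE(r.datetime) AND l.id = UPPER(SUBSTR(r.id, 1, 1))
    WHERE DATE(r.datetime) >= '2017-04-01' AND DATE(r.datetime) <= '2017-04-07'
    GROUP BY l.id, c.id
    ORDER BY l.name, c.date
"""

# Filter inside the derived table: applied before the join, so empty days survive.
FILTER_BEFORE_JOIN = """
    SELECT l.name, c.date, COUNT(r.id)
    FROM locations l
    CROSS JOIN calendar c
    LEFT JOIN (
        SELECT id, datetime FROM receipts
        WHERE DATE(datetime) >= '2017-04-01' AND DATE(datetime) <= '2017-04-07'
    ) r ON c.date = DATE(r.datetime) AND l.id = UPPER(SUBSTR(r.id, 1, 1))
    GROUP BY l.id, c.id
    ORDER BY l.name, c.date
"""

where_rows = conn.execute(FILTER_IN_WHERE).fetchall()
derived_rows = conn.execute(FILTER_BEFORE_JOIN).fetchall()
print(where_rows)
print(derived_rows)
```

The first query returns only the one location/day pair that has receipts; the second returns all four pairs, three of them with a count of 0.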
SELECT
`locations`.`name` AS `location`,
`calendar`.`date` AS `date`,
COUNT(`receipts`.`id`) AS `count`
FROM `locations`
CROSS JOIN `calendar`
LEFT JOIN `receipts` ON `calendar`.`date` = DATE(`receipts`.`datetime`)
AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id
WHERE `calendar`.`date` BETWEEN '2017-04-01' AND '2017-04-07'
GROUP BY `locations`.`id`, `calendar`.`id`
ORDER BY `locations`.`name` ASC, `calendar`.`date` ASC;
Try the query above.
Here I used BETWEEN instead of >= and <=.
You can also create an index on the calendar.date column.
You could also add a FOREIGN KEY constraint on the child table and join on that column. An index would help in that scenario as well.

How to correct and rewrite this query?

SELECT *
FROM users
WHERE id
IN ( 2024 )
AND id NOT IN (
SELECT user_id
FROM `used`
WHERE DATE_SUB( DATE_ADD( CURDATE( ) , INTERVAL 7 DAY ) , INTERVAL 14 DAY ) <= created)
AND id NOT IN (
SELECT user_id
FROM coupon_used
WHERE code = 'XXXXX')
AND id IN (
SELECT user_id
FROM accounts)
Id 2024 exists in the users table, but it also exists in the used table. When I run this query it still returns id 2024, which should have been filtered out. I select specific users and then want to exclude those that appear in the used table, but the query above does not give me the desired result. What I want is to select users under the following conditions: take specific users, check that they are not in the used table and not in the coupon_used table, and that they are in the accounts table.
I would use left joins for the exclusion conditions and a regular join for the inclusions:
SELECT users.*
FROM users
INNER JOIN accounts ON accounts.user_id = users.id
LEFT JOIN used ON used.user_id = users.id AND DATE_SUB(CURDATE(), INTERVAL 7 DAY) <= used.created
LEFT JOIN coupon_used ON coupon_used.user_id = users.id AND coupon_used.code = 'XXXXX'
WHERE users.id IN (2024) AND used.user_id IS NULL AND coupon_used.user_id IS NULL
I've edited the date manipulation as well; +7 -14 would be -7 :)
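The date simplification can be checked quickly outside the database. This is just a sketch in Python, with timedelta standing in for MySQL's INTERVAL n DAY arithmetic; the function names and sample dates are made up for the check.

```python
from datetime import date, timedelta


def original_cutoff(today):
    # Mirrors DATE_SUB(DATE_ADD(CURDATE(), INTERVAL 7 DAY), INTERVAL 14 DAY)
    return today + timedelta(days=7) - timedelta(days=14)


def simplified_cutoff(today):
    # Mirrors DATE_SUB(CURDATE(), INTERVAL 7 DAY)
    return today - timedelta(days=7)


# Sample days chosen to cross a year boundary, a leap day and a year end.
sample_days = [date(2024, 1, 3), date(2024, 2, 29), date(2024, 12, 31)]
for day in sample_days:
    print(day, original_cutoff(day), simplified_cutoff(day))
```

Both functions give the same cutoff for every sample day, including across month and year boundaries.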
I would recommend using a JOIN on accounts and LEFT OUTER JOINs on the other two tables. A JOIN on accounts means it must be in the accounts table. LEFT OUTER JOINS on the coupon_used and used means it will return a record no matter if they're in that table or not. Filtering down to c.user_id IS NULL means that there is NOT a record in that table.
SELECT users.*
FROM users
JOIN accounts ON users.id = accounts.user_id
LEFT OUTER JOIN coupon_used c ON users.id = c.user_id AND c.code = 'XXXXX'
LEFT OUTER JOIN `used` u ON users.id = u.user_id AND DATE_SUB( DATE_ADD( CURDATE( ) , INTERVAL 7 DAY ) , INTERVAL 14 DAY ) <= u.created
WHERE users.id IN ( 2024 )
AND c.user_id IS NULL
AND u.user_id IS NULL
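To check that the LEFT OUTER JOIN ... IS NULL pattern returns the same users as the NOT IN version from the question, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The sample rows and the id columns on the child tables are invented, and a fixed cutoff date is passed as a parameter in place of the CURDATE() expression so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE used (id INTEGER PRIMARY KEY, user_id INTEGER, created TEXT);
    CREATE TABLE coupon_used (id INTEGER PRIMARY KEY, user_id INTEGER, code TEXT);
    INSERT INTO users VALUES (2024), (2025), (2026);
    INSERT INTO accounts (user_id) VALUES (2024), (2025);          -- 2026 has no account
    INSERT INTO used (user_id, created) VALUES (2024, '2024-06-09'); -- recent use by 2024
    INSERT INTO coupon_used (user_id, code) VALUES (2025, 'OTHER'); -- a different coupon
""")

ANTI_JOIN = """
    SELECT users.id
    FROM users
    JOIN accounts ON users.id = accounts.user_id
    LEFT OUTER JOIN coupon_used c ON users.id = c.user_id AND c.code = 'XXXXX'
    LEFT OUTER JOIN used u ON users.id = u.user_id AND u.created >= ?
    WHERE users.id IN (2024, 2025, 2026)
      AND c.user_id IS NULL
      AND u.user_id IS NULL
    ORDER BY users.id
"""

NOT_IN = """
    SELECT id
    FROM users
    WHERE id IN (2024, 2025, 2026)
      AND id NOT IN (SELECT user_id FROM used WHERE created >= ?)
      AND id NOT IN (SELECT user_id FROM coupon_used WHERE code = 'XXXXX')
      AND id IN (SELECT user_id FROM accounts)
    ORDER BY id
"""

cutoff = "2024-06-03"  # stands in for "seven days before today"
anti_join_ids = [row[0] for row in conn.execute(ANTI_JOIN, (cutoff,))]
not_in_ids = [row[0] for row in conn.execute(NOT_IN, (cutoff,))]
print(anti_join_ids, not_in_ids)
```

User 2024 is dropped because of the recent row in used, 2026 because it has no account, and only 2025 remains in both result sets.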
Firstly, try something like this using joins, which should be easier to read and (depending on the version of MySQL) faster:
SELECT DISTINCT users.*
FROM users
INNER JOIN accounts ON users.id = accounts.user_id
LEFT OUTER JOIN coupon_used ON users.id = coupon_used.user_id AND coupon_used.code = 'XXXXX'
LEFT OUTER JOIN `used` ON users.id = `used`.user_id AND DATE_SUB( DATE_ADD( CURDATE( ) , INTERVAL 7 DAY ) , INTERVAL 14 DAY ) <= `used`.created
WHERE users.id IN ( 2024 )
AND coupon_used.user_id IS NULL
AND `used`.user_id IS NULL
EDIT - Simplifying the date check:-
SELECT DISTINCT users.*
FROM users
INNER JOIN accounts ON users.id = accounts.user_id
LEFT OUTER JOIN coupon_used ON users.id = coupon_used.user_id AND coupon_used.code = 'XXXXX'
LEFT OUTER JOIN `used` ON users.id = `used`.user_id AND DATE_SUB( CURDATE( ) , INTERVAL 7 DAY ) <= `used`.created
WHERE users.id IN ( 2024 )
AND coupon_used.user_id IS NULL
AND `used`.user_id IS NULL

Search on specific users in the table

I have following query:
SELECT *
FROM users
WHERE id NOT
IN (
SELECT user_id
FROM `bids`
WHERE DATE_SUB( DATE_ADD( CURDATE( ) , INTERVAL 7
DAY ) , INTERVAL 14
DAY ) <= created
)
AND id NOT
IN (
SELECT user_id
FROM coupon_used WHERE code = 'ACTNOW'
)
AND id
IN (
SELECT user_id
FROM accounts
)
I just want to take specific users and search only among them, instead of searching all users in the table. For example, if I have a list of users with ids 1, 2, 3, 4, 5, I only want to search those users.
Just add a WHERE clause using IN()
SELECT *
FROM users
WHERE id IN(1,2,3,4,5)
I believe using left outer joins will simplify your query and hopefully improve performance
SELECT users.*
FROM users
LEFT OUTER JOIN bids on bids.user_id = users.id AND DATE_SUB(DATE_ADD(CURDATE(), INTERVAL 7 DAY), INTERVAL 14 DAY) <= bids.created
LEFT OUTER JOIN coupon_used on coupon_used.user_id = users.id AND coupon_used.code = 'ACTNOW'
INNER JOIN accounts on accounts.user_id = users.id
WHERE bids.id is null AND coupon_used.id is null
AND users.id in (1,2,3,4,5)