Use a CTE to UPDATE or DELETE in MySQL


The new version of MySQL, 8.0, now supports Common Table Expressions.
According to the manual:
A WITH clause is permitted at the beginning of SELECT, UPDATE, and DELETE statements:
WITH ... SELECT ...
WITH ... UPDATE ...
WITH ... DELETE ...
So, I thought, given the following table:
ID lastName firstName
----------------------
1 Smith Pat
2 Smith Pat
3 Smith Bob
I can use the following query:
WITH ToDelete AS
(
SELECT ID,
ROW_NUMBER() OVER (PARTITION BY lastName, firstName ORDER BY ID) AS rn
FROM mytable
)
DELETE FROM ToDelete;
in order to delete duplicates from the table, just like I could do in SQL Server.
It turns out I was wrong. When I try to execute the DELETE statement from MySQL Workbench I get the error:
Error Code: 1146. Table 'todelete' doesn't exist
I also get an error message when I try to do an UPDATE using the CTE.
So, my question is, how could one use a WITH clause in the context of an UPDATE or DELETE statement in MySQL (as cited in the manual of version 8.0)?

This appears to be a published bug in MySQL 8.x. From this bug report:
In the 2015 version of the SQL standard, a CTE cannot be defined in UPDATE; MySQL allows it but makes the CTE read-only (we're updating the documentation now to mention this).
This said, one could use a view instead of the CTE; then the view may be updatable, but due to the presence of window functions it is materialized into a temporary table (it is not merged) so is not updatable (we're going to mention it in the doc as well).
All the above applies to DELETE too.
If you follow the above bug link, you will see a workaround suggested for using a CTE, but it involves joining the CTE to the original target table in a one-to-one mapping. Based on your example, which is a blanket delete, it is not clear what workaround you would need if you were to proceed using a CTE for your delete.

Since the CTE is not updatable, you need to refer to the original table to delete rows. I think you are looking for something like this:
WITH ToDelete AS
(
SELECT ID,
ROW_NUMBER() OVER (PARTITION BY lastName, firstName ORDER BY ID) AS rn
FROM mytable
)
DELETE FROM mytable USING mytable JOIN ToDelete ON mytable.ID = ToDelete.ID
WHERE ToDelete.rn > 1;
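To check which rows the ROW_NUMBER() approach removes, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (assumption: SQLite 3.25 or newer, for window functions). SQLite has no DELETE ... JOIN, so the CTE is applied through WHERE ID IN (...) instead; the sketch checks which rows end up deleted, not MySQL's syntax rules.

```python
import sqlite3


def dedupe_keep_lowest_id(conn):
    """Delete every (lastName, firstName) duplicate except the row with the lowest ID."""
    conn.execute(
        """
        WITH ToDelete AS (
            SELECT ID,
                   ROW_NUMBER() OVER (PARTITION BY lastName, firstName ORDER BY ID) AS rn
            FROM mytable
        )
        DELETE FROM mytable
        WHERE ID IN (SELECT ID FROM ToDelete WHERE rn > 1)
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID INTEGER PRIMARY KEY, lastName TEXT, firstName TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "Smith", "Pat"), (2, "Smith", "Pat"), (3, "Smith", "Bob")],
)
dedupe_keep_lowest_id(conn)
remaining = conn.execute("SELECT ID, lastName, firstName FROM mytable ORDER BY ID").fetchall()
print(remaining)  # [(1, 'Smith', 'Pat'), (3, 'Smith', 'Bob')]
```

Because the window orders by ID, the surviving row of each group is always the one with the lowest ID.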

Related

How do I run an update query in MySQL on a CTE?

I'm trying to create a CTE, which I think is working fine. However, I want to then run an update query on this CTE but I keep getting the following error in MySQL WorkBench,
"Error Code: 1288. The target table pptest of the UPDATE is not updatable"
I've had a look around but do not understand any of the workarounds, and on top of that I'm not too clued up on MySQL either. My goal was basically to create a table/view with a bunch of records that were partitioned and had an index number on each row, represented by "row_num". This was to group duplicated data in a table together, and then I was hoping to simply run an update query on this structure where row_num was greater than 1. Simple logic, but I cannot figure out any other way of achieving my goal. Can anyone help with this please?
The full query I ran before was,
WITH cte as (select *, row_number() over (partition by col_a order by col_a) row_num from db_name.table_name)
update cte set col_b='test' where row_num > 1

This syntax is not supported in MySQL. Actually this looks like SQL Server syntax... You can't just port queries from one database to another and expect them to work.
In MySQL, you could use the update ... join syntax. However, you need a primary key column (or set of columns), that uniquely identifies each row, to serve as a join condition. Assuming that column id is your primary key, that would be:
update db_name.table_name t
inner join (
select id,
row_number() over (partition by col_a order by col_a) row_num
from db_name.table_name
) t1 on t1.id = t.id
set t.col_b = 'test'
where t1.row_num > 1
Side notes:
As I understand this, your query is meant to flag duplicates on col_a. But the way you use row_number(), it is undefined which of the duplicate rows will be flagged, because your order by clause is not deterministic; you would be better off with an over() clause like: over (partition by col_a order by id)
window functions such as row_number() are available in MySQL 8.0 only

UPDATE and DELETE do not work on the CTE itself, because MySQL treats it like a read-only temporary table. A workaround is to join the original table with the CTE and update the original table:
WITH cte as (select *, row_number()
over (partition by col_a order by col_a) row_num
from db_name.table_name)
update db_name.table_name t1
join cte on t1.id = cte.id -- join on a unique key (id assumed to be the primary key)
set t1.col_b = 'test'
where cte.row_num > 1
This works fine for DELETE statements too.
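Whether the join key matters can be checked with a small sketch. It uses Python's sqlite3 as a stand-in for MySQL (assuming SQLite 3.25+ for window functions) and expresses each inner join as an IN filter, since an inner-join UPDATE touches every target row that has at least one matching row. The table contents and the id column are assumptions. Joining on col_a matches every row in a duplicate group, including the first one; joining on a unique key touches only the rows with row_num > 1.

```python
import sqlite3

CTE = """
WITH cte AS (
    SELECT id, col_a,
           ROW_NUMBER() OVER (PARTITION BY col_a ORDER BY id) AS row_num
    FROM table_name
)
"""


def flagged_ids(join_on):
    """Return the ids set to 'test' when the CTE is joined back on `join_on`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, col_a TEXT, col_b TEXT)")
    conn.executemany(
        "INSERT INTO table_name (id, col_a) VALUES (?, ?)",
        [(1, "x"), (2, "x"), (3, "x"), (4, "y")],
    )
    # An inner-join UPDATE changes each target row with at least one matching
    # CTE row, which is what this IN filter expresses.
    sql = (
        CTE
        + "UPDATE table_name SET col_b = 'test' "
        + f"WHERE {join_on} IN (SELECT {join_on} FROM cte WHERE row_num > 1)"
    )
    conn.execute(sql)
    rows = conn.execute("SELECT id FROM table_name WHERE col_b = 'test' ORDER BY id")
    return [r[0] for r in rows]


print(flagged_ids("col_a"))  # [1, 2, 3]: the first row of the group is flagged too
print(flagged_ids("id"))     # [2, 3]: only the actual duplicates
```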

Why adding extra one layer of SELECT resolve Mysql error code :1235

In one of my previous question, I have asked solution for resolving mysql 1235 error:
Error Code: 1235. This version of MySQL doesn't yet support 'LIMIT &
IN/ALL/ANY/SOME subquery'
Following will throw 1235 :
DELETE
FROM job_detail_history
where id not in (select id from job_detail_history order by start_time desc limit 2);
For that I got the solution given by @Zaynul Abadin Tuhin, as follows, and it works for me too. He just added a single extra select layer over my subquery, and according to him this is suggested by some MySQL experts.
Resolution For above problem:
DELETE
FROM job_detail_history
where id not in (select * from
(select id from job_detail_history order by start_time desc limit 2) as t1 );
I tried to analyze the DB table and found the following. When I use this query on its own (it does not work inside the delete, as explained above):
select id from job_detail_history order by start_time desc limit 2;
it returns the two most recent ids (the last NULL row in the grid is just the new-row placeholder that Workbench shows).
And when I add one extra layer of select (this version does work with my delete):
(select id from (select id from job_detail_history order by start_time desc limit 2) as t1);
it returns the same ids.
So, what I want to understand is:
How does one extra layer of subquery resolve the 1235 error?
Can anyone elaborate on it in detail?

There is an important difference between a subquery and a derived table.
A derived table replaces a table (it's used where you can also use a "normal" tablename), specifically in from <tablename> or join <tablename>, and it requires an alias (or the "tablename", as it is used as any other table). You cannot write where not in (<tablename>); that is not a derived table, it is a subquery.
In general, this problem (and the solution to use another layer) happens for delete:
You cannot delete from a table and select from the same table in a subquery.
But a derived table using this table is not forbidden. MySQL simply can't handle (or doesn't want to handle) this kind of dependency, given how it works internally (and according to its rules).
For LIMIT, there is a similar subquery-specific restriction,
MySQL does not support LIMIT in subqueries for certain subquery operators
ERROR 1235 (42000): This version of MySQL doesn't yet support
'LIMIT & IN/ALL/ANY/SOME subquery'
As to the reason why it makes a difference for MySQL: a derived table stands on its own and cannot depend on the outer query. It can be used internally like a normal table. (E.g., MySQL could simply create this table in the first step of the execution plan and it's there for all further steps.) A subquery on the other hand can depend on the outer table (making it a dependent subquery).
Specifically,
where id not in (select id from job_detail_history);
is the same as
where not exists (select id from job_detail_history sub where sub.id = outer.id);
while you cannot do this for limit:
where id not in (select id from job_detail_history limit 2);
is not the same as
where not exists (select id from job_detail_history sub
where sub.id = outer.id limit 2);
MySQL simply cannot handle this, because it relies on doing this transformation. It will probably allow it sooner or later, though. Even then, to make it work for the delete, you will still need the derived table, because you cannot select from the table you are deleting from in a direct subquery.
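The difference between the two forms can be seen by running them as plain SELECTs. This sketch uses Python's sqlite3 as a stand-in, since SQLite accepts LIMIT inside an IN subquery (MySQL raises error 1235 instead); the table contents and the alias o are made up for illustration. The uncorrelated NOT IN evaluates the top-2 list once, while in the correlated NOT EXISTS the LIMIT applies per outer row, after the id filter, so it no longer restricts anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_detail_history (id INTEGER PRIMARY KEY, start_time INTEGER)")
conn.executemany(
    "INSERT INTO job_detail_history VALUES (?, ?)",
    [(i, 100 + i) for i in range(1, 6)],  # ids 1..5, a later id means a later start_time
)

# Uncorrelated: the top-2 list is computed once, then each row is tested against it.
not_in = conn.execute(
    """
    SELECT id FROM job_detail_history
    WHERE id NOT IN (SELECT id FROM job_detail_history ORDER BY start_time DESC LIMIT 2)
    ORDER BY id
    """
).fetchall()

# Correlated: for each outer row the subquery finds that row itself (at most one row),
# so LIMIT 2 never cuts anything and NOT EXISTS is false for every row.
not_exists = conn.execute(
    """
    SELECT id FROM job_detail_history o
    WHERE NOT EXISTS (
        SELECT 1 FROM job_detail_history sub
        WHERE sub.id = o.id
        ORDER BY start_time DESC LIMIT 2
    )
    ORDER BY id
    """
).fetchall()

print(not_in)      # [(1,), (2,), (3,)]
print(not_exists)  # []
```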

Why does MySQL do a scan for update but lookup for select

I have these 2 queries. As you can see, it's doing a lookup in TabRedemption for orderItemID. The select takes a fraction of a second while the update takes ~30 seconds.
Why does MySQL resort to a full index scan in the update, and how can I stop this? It already has a foreign key constraint and an index.
select RedemptionID from TabRedemption where orderItemID in
(SELECT OrderItemID FROM TabOrderDetails WHERE OrderId = 4559775);
UPDATE TabRedemption SET active = 1 where orderItemID in
(SELECT OrderItemID FROM TabOrderDetails WHERE OrderId = 4559775);
Strangely if I resolve the subquery manually its fast.
UPDATE TabRedemption SET active = 1 where orderItemID in (2579027);
I've noticed that if I use an update with join query it's fast, but I don't want to do that because it's not supported in h2database.
On a side note MS SQLServer does this fine.

The best workaround:
UPDATE TabRedemption
JOIN TabOrderDetails USING(orderItemID)
SET TabRedemption.active = 1
WHERE TabOrderDetails.OrderId = 4559775;
(or something close to that)
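The answer is that MySQL handles SELECT and single-table UPDATE through different code paths, and the subquery optimizations that SELECT gets are not applied to a single-table UPDATE. The workaround is to add a second table to the UPDATE, because a multi-table UPDATE is then optimized like a SELECT.
This difference is being addressed by Oracle in MySQL 5.7.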
Keep in mind that the pattern "IN ( SELECT ... )" optimizes poorly in many cases (although apparently not your case).

What does this X in a query using a derived table in MySQL mean?

I'm getting my head around using what I think are called 'derived tables' in MySQL and noticed my queries only work when I have this x after the FROM() statement:
SELECT * FROM( <inner select> ) x ORDER BY id ASC
// -----------------------------^
I've looked around and read several articles about derived tables but none of them mention this (all the examples I see don't even utilise it, though my tests return 0 rows without it). I came across it in an answer to one of my previous questions.
If it makes any difference, I am running the queries via PDO.

The x is an alias prefix for the subdataset. Just the same as picking from a table.
SELECT test_id from tableA x;
would make your results accessible as x.test_id. This is good for shortening table names and summing subdatasets.
SELECT * FROM (SELECT test_id FROM tableA) x;
would offer x.test_id
Additional note by Jonathan Leffler: The SQL standard says the alias is mandatory.
Refer to this blog post about the advantages of aliases: http://openquery.com/blog/good-practice-bad-practice-table-aliases

x is the name given to the resulting table which comes from the inner select. The name could be any valid table name not just x.
Subqueries are legal in a SELECT statement's FROM clause. The actual
syntax is:
SELECT ... FROM (subquery) [AS] name ...
The [AS] name clause is mandatory, because every table in a FROM
clause must have a name.
Reference: http://dev.mysql.com/doc/refman/5.0/en/from-clause-subqueries.html

SQL query: Delete all records from the table except latest N?

Is it possible to build a single mysql query (without variables) to remove all records from the table, except latest N (sorted by id desc)?
Something like this, only it doesn't work :)
delete from table order by id ASC limit ((select count(*) from table ) - N)
Thanks.

You cannot delete the records that way, the main issue being that you cannot use a subquery to specify the value of a LIMIT clause.
This works (tested in MySQL 5.0.67):
DELETE FROM `table`
WHERE id NOT IN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT 42 -- keep this many records
) foo
);
The intermediate subquery is required. Without it we'd run into two errors:
SQL Error (1093): You can't specify target table 'table' for update in FROM clause - MySQL doesn't allow you to refer to the table you are deleting from within a direct subquery.
SQL Error (1235): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' - You can't use the LIMIT clause within a direct subquery of a NOT IN operator.
Fortunately, using an intermediate subquery allows us to bypass both of these limitations.
Nicole has pointed out this query can be optimised significantly for certain use cases (such as this one). I recommend reading that answer as well to see if it fits yours.
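A quick way to confirm which rows the accepted query keeps is to run it against an in-memory SQLite database from Python. SQLite is only a stand-in here: it does not raise errors 1093 or 1235, so this checks the result of the double-subquery form, not whether MySQL accepts a simpler one. The table name items and N = 3 are assumptions.

```python
import sqlite3


def keep_latest(conn, n):
    """Delete everything except the n rows with the highest id (double-subquery form)."""
    conn.execute(
        """
        DELETE FROM items
        WHERE id NOT IN (
            SELECT id
            FROM (
                SELECT id FROM items ORDER BY id DESC LIMIT ?
            ) foo
        )
        """,
        (n,),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 11)])
keep_latest(conn, 3)
remaining = [r[0] for r in conn.execute("SELECT id FROM items ORDER BY id")]
print(remaining)  # [8, 9, 10]
```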

I know I'm resurrecting quite an old question, but I recently ran into this issue, but needed something that scales to large numbers well. There wasn't any existing performance data, and since this question has had quite a bit of attention, I thought I'd post what I found.
The solutions that actually worked were Alex Barrett's double sub-query/NOT IN method (similar to Bill Karwin's), and Quassnoi's LEFT JOIN method.
Unfortunately both of the above methods create very large intermediate temporary tables and performance degrades quickly as the number of records not being deleted gets large.
What I settled on utilizes Alex Barrett's double sub-query (thanks!) but uses <= instead of NOT IN:
DELETE FROM `test_sandbox`
WHERE id <= (
SELECT id
FROM (
SELECT id
FROM `test_sandbox`
ORDER BY id DESC
LIMIT 1 OFFSET 42 -- keep this many records
) foo
);
It uses OFFSET to get the id of the Nth record and deletes that record and all previous records.
Since ordering is already an assumption of this problem (ORDER BY id DESC), <= is a perfect fit.
It is much faster, since the temporary table generated by the subquery contains just one record instead of N records.
Test case
I tested the three working methods and the new method above in two test cases.
Both test cases use 10000 existing rows, while the first test keeps 9000 (deletes the oldest 1000) and the second test keeps 50 (deletes the oldest 9950).
+-----------+------------------------+----------------------+
| | 10000 TOTAL, KEEP 9000 | 10000 TOTAL, KEEP 50 |
+-----------+------------------------+----------------------+
| NOT IN | 3.2542 seconds | 0.1629 seconds |
| NOT IN v2 | 4.5863 seconds | 0.1650 seconds |
| <=,OFFSET | 0.0204 seconds | 0.1076 seconds |
+-----------+------------------------+----------------------+
What's interesting is that the <= method sees better performance across the board, but actually gets better the more you keep, instead of worse.
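The <= / OFFSET variant can be sanity-checked the same way. This sketch (Python's sqlite3 as a stand-in for MySQL; the table contents and sizes are assumptions) compares it with the NOT IN form and shows one edge case: when the table holds N rows or fewer, the OFFSET subquery returns no row, the comparison id <= NULL is never true, and nothing is deleted, which is the desired behaviour.

```python
import sqlite3

NOT_IN = """
DELETE FROM test_sandbox
WHERE id NOT IN (
    SELECT id FROM (SELECT id FROM test_sandbox ORDER BY id DESC LIMIT ?) foo
)
"""

LE_OFFSET = """
DELETE FROM test_sandbox
WHERE id <= (
    SELECT id FROM (SELECT id FROM test_sandbox ORDER BY id DESC LIMIT 1 OFFSET ?) foo
)
"""


def run(sql, total, keep):
    """Fill test_sandbox with ids 1..total, apply `sql` with `keep`, return remaining ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_sandbox (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO test_sandbox VALUES (?)", [(i,) for i in range(1, total + 1)]
    )
    conn.execute(sql, (keep,))
    return [r[0] for r in conn.execute("SELECT id FROM test_sandbox ORDER BY id")]


print(run(NOT_IN, 10, 3) == run(LE_OFFSET, 10, 3))  # True: both keep [8, 9, 10]
print(run(LE_OFFSET, 2, 3))  # [1, 2]: fewer than N rows, nothing deleted
```

The <= form only works because ids are unique and ordered; with a non-unique sort column, ties at the threshold would change the count.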

Unfortunately for all the answers given by other folks, you can't DELETE and SELECT from a given table in the same query.
DELETE FROM mytable WHERE id NOT IN (SELECT MAX(id) FROM mytable);
ERROR 1093 (HY000): You can't specify target table 'mytable' for update
in FROM clause
Nor can MySQL support LIMIT in a subquery. These are limitations of MySQL.
DELETE FROM mytable WHERE id NOT IN
(SELECT id FROM mytable ORDER BY id DESC LIMIT 1);
ERROR 1235 (42000): This version of MySQL doesn't yet support
'LIMIT & IN/ALL/ANY/SOME subquery'
The best answer I can come up with is to do this in two stages:
SELECT id FROM mytable ORDER BY id DESC LIMIT n;
Collect the ids and make them into a comma-separated string:
DELETE FROM mytable WHERE id NOT IN ( ...comma-separated string... );
(Normally interpolating a comma-separated list into an SQL statement introduces some risk of SQL injection, but in this case the values are not coming from an untrusted source; they are known to be integer values from the database itself.)
note: Though this doesn't get the job done in a single query, sometimes a simpler, get-it-done solution is the most effective.
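The two-stage approach can also be written with bound parameters instead of an interpolated string, which removes the injection concern entirely. A minimal sketch using Python's sqlite3 module as a stand-in driver (the placeholder style differs between drivers); the table name mytable and the helper name are assumptions.

```python
import sqlite3


def keep_latest_two_stage(conn, n):
    """Stage 1: fetch the ids to keep. Stage 2: delete everything else."""
    keep_ids = [
        row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id DESC LIMIT ?", (n,))
    ]
    if not keep_ids:
        # Nothing to keep (n == 0 or an empty table); NOT IN () would be invalid SQL.
        conn.execute("DELETE FROM mytable")
        return
    placeholders = ", ".join("?" for _ in keep_ids)  # one placeholder per kept id
    conn.execute(f"DELETE FROM mytable WHERE id NOT IN ({placeholders})", keep_ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(1, 8)])
keep_latest_two_stage(conn, 2)
remaining = [r[0] for r in conn.execute("SELECT id FROM mytable ORDER BY id")]
print(remaining)  # [6, 7]
```

One caveat of any two-stage version: a row inserted by another session between the two stages is not in the keep list, so stage 2 deletes it. Holding a lock on the table across both stages avoids that.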

DELETE i1.*
FROM items i1
LEFT JOIN
(
SELECT id
FROM items ii
ORDER BY
id DESC
LIMIT 20
) i2
ON i1.id = i2.id
WHERE i2.id IS NULL

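If your id is incremental (with no gaps), then use something like the following; the extra derived table avoids error 1093, and <= keeps exactly N rows:
delete from `table` where id <= (select * from (select max(id) - N from `table`) as t)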

To delete all the records except the last N, you may use the query reported below.
It's a single query but with many statements, so it's not actually a single query the way the original question intended.
You also need a variable and a prepared statement, because MySQL's LIMIT clause does not accept a subquery or an expression (inside a prepared statement, a ? placeholder is allowed).
Hope it may be useful anyway...
nnn is the number of rows to keep and theTable is the table you're working on.
I'm assuming you have an auto-incrementing column named id.
SELECT @ROWS_TO_DELETE := COUNT(*) - nnn FROM `theTable`;
SELECT @ROWS_TO_DELETE := IF(@ROWS_TO_DELETE<0,0,@ROWS_TO_DELETE);
PREPARE STMT FROM "DELETE FROM `theTable` ORDER BY `id` ASC LIMIT ?";
EXECUTE STMT USING @ROWS_TO_DELETE;
The good thing about this approach is performance: I've tested the query on a local DB with about 13,000 records, keeping the last 1,000. It runs in 0.08 seconds.
The script from the accepted answer...
DELETE FROM `table`
WHERE id NOT IN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT 42 -- keep this many records
) foo
);
It takes 0.55 seconds, about 7 times longer.
Test environment: MySQL 5.5.25 on a late 2011 i7 MacBookPro with SSD
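The count-then-delete idea, including the clamp to zero, can be sketched in Python with sqlite3 as a stand-in. SQLite is not normally built with DELETE ... ORDER BY ... LIMIT, so the sketch selects the oldest ids through a subquery instead (a form MySQL itself would reject with error 1235); theTable and nnn follow the answer's placeholders, and keep_newest is a hypothetical helper.

```python
import sqlite3


def keep_newest(conn, nnn):
    """Delete the oldest rows so that at most nnn rows remain; return how many were deleted."""
    (count,) = conn.execute("SELECT COUNT(*) FROM theTable").fetchone()
    rows_to_delete = max(count - nnn, 0)  # same clamp as the IF(...) in the answer
    conn.execute(
        "DELETE FROM theTable WHERE id IN "
        "(SELECT id FROM theTable ORDER BY id ASC LIMIT ?)",
        (rows_to_delete,),
    )
    return rows_to_delete


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO theTable VALUES (?)", [(i,) for i in range(1, 14)])
print(keep_newest(conn, 10))  # 3
print(keep_newest(conn, 20))  # 0
```

The clamp matters: SQLite treats a negative LIMIT as "no limit", so without it the second call would delete every row.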

DELETE FROM table WHERE ID NOT IN
(SELECT MAX(ID) ID FROM table)

Try the query below:
DELETE FROM tablename WHERE id <= (SELECT * FROM (SELECT (MAX(id)-10) FROM tablename ) AS a)
The inner subquery returns MAX(id) - 10 (wrapped in a derived table so MySQL accepts it), and the outer query deletes every record with an id up to that value. If the ids are contiguous, this keeps the top 10.
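The MAX(id) arithmetic is worth checking, because it counts id values rather than rows. This pure-Python model (no database; the id lists are made up) applies the rule "delete every id below MAX(id) - N", either exclusively (<) or inclusively (<=). With contiguous ids, < keeps N + 1 rows and <= keeps exactly N; after gaps from earlier deletes, fewer rows survive than intended.

```python
def remaining_after(ids, n, inclusive):
    """Model DELETE ... WHERE id < MAX(id) - n (or <= when inclusive); return kept ids."""
    threshold = max(ids) - n
    if inclusive:
        return [i for i in ids if i > threshold]   # rows with id <= threshold are deleted
    return [i for i in ids if i >= threshold]      # rows with id < threshold are deleted


contiguous = list(range(1, 21))        # ids 1..20
with_gaps = [1, 2, 3, 12, 15, 18, 20]  # ids left over after earlier deletes

print(len(remaining_after(contiguous, 10, inclusive=False)))  # 11
print(len(remaining_after(contiguous, 10, inclusive=True)))   # 10
print(remaining_after(with_gaps, 10, inclusive=True))         # [12, 15, 18, 20]
```

So this approach is only exact for gap-free ids; the OFFSET-based answers count rows instead.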

What about:
SELECT del.id FROM `table` del
LEFT JOIN `table` keep
ON del.id < keep.id
GROUP BY del.id HAVING COUNT(keep.id) >= N;
It returns the rows that have at least N newer rows, i.e. every row except the latest N.
Could be useful?
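A runnable version of this self-join idea is sketched below with Python's sqlite3 as a stand-in (the table name items, its ids and N = 3 are assumptions). It groups by del.id and counts only matched keep rows, so the newest row counts 0 instead of 1, and the result is exactly the rows that have at least N newer rows.

```python
import sqlite3

FIND_OLDER_THAN_LATEST_N = """
SELECT del.id
FROM items del
LEFT JOIN items keep ON del.id < keep.id
GROUP BY del.id
HAVING COUNT(keep.id) >= ?   -- number of rows newer than del
ORDER BY del.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in (2, 5, 9, 11, 14)])
to_delete = [r[0] for r in conn.execute(FIND_OLDER_THAN_LATEST_N, (3,))]
print(to_delete)  # [2, 5]
```

The self-join compares every pair of rows, so its cost grows quadratically with the table size; it is more of a curiosity than a fix for large tables.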

Using id for this task is not an option in many cases, for example a table of Twitter statuses. Here is a variant that uses a timestamp field instead.
delete from `table`
where access_time <=
(
select access_time from
(
select access_time from `table`
order by access_time desc limit 150000,1
) foo
)
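Unlike an auto-increment id, a timestamp can repeat, which affects any threshold-based delete. This sketch (Python's sqlite3 as a stand-in for MySQL; a small table named statuses, integer timestamps and N = 2 are assumptions) keeps the latest N rows by taking the (N+1)-th newest access_time as the threshold. It shows that ties at the threshold change how many rows survive: with <= every row sharing the threshold value is deleted, with < they are all kept.

```python
import sqlite3


def keep_latest_by_time(times, n, op):
    """Keep the latest n rows by access_time using threshold comparison `op` ('<=' or '<')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE statuses (id INTEGER PRIMARY KEY, access_time INTEGER)")
    conn.executemany("INSERT INTO statuses (access_time) VALUES (?)", [(t,) for t in times])
    # The threshold is the (n+1)-th newest access_time, found through a derived table.
    conn.execute(
        f"""
        DELETE FROM statuses
        WHERE access_time {op} (
            SELECT access_time FROM (
                SELECT access_time FROM statuses ORDER BY access_time DESC LIMIT ?, 1
            ) foo
        )
        """,
        (n,),
    )
    return sorted(r[0] for r in conn.execute("SELECT access_time FROM statuses"))


print(keep_latest_by_time([1, 2, 3, 4, 5], 2, "<="))  # [4, 5]: no ties, exactly 2 kept
print(keep_latest_by_time([1, 2, 3, 3, 4], 2, "<="))  # [4]: both tied 3s are deleted
print(keep_latest_by_time([1, 2, 3, 3, 4], 2, "<"))   # [3, 3, 4]: both tied 3s are kept
```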

Just wanted to throw this into the mix for anyone using Microsoft SQL Server instead of MySQL. The keyword 'Limit' isn't supported by MSSQL, so you'll need to use an alternative. This code worked in SQL 2008, and is based on this SO post. https://stackoverflow.com/a/1104447/993856
-- Keep the last 10 most recent passwords for this user.
DECLARE @UserID int; SET @UserID = 1004
DECLARE @ThresholdID int -- Position of 10th password.
SELECT @ThresholdID = UserPasswordHistoryID FROM
(
SELECT ROW_NUMBER()
OVER (ORDER BY UserPasswordHistoryID DESC) AS RowNum, UserPasswordHistoryID
FROM UserPasswordHistory
WHERE UserID = @UserID
) sub
WHERE (RowNum = 10) -- Keep this many records.
DELETE UserPasswordHistory
WHERE (UserID = @UserID)
AND (UserPasswordHistoryID < @ThresholdID)
Admittedly, this is not elegant. If you're able to optimize this for Microsoft SQL, please share your solution. Thanks!

If you need to delete the records based on some other column as well, then here is a solution:
DELETE
FROM articles
WHERE id IN
(SELECT id
FROM
(SELECT id
FROM articles
WHERE user_id = :userId
ORDER BY created_at DESC LIMIT 500, 10000000) abc)
AND user_id = :userId
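Scoping the delete to one user can be checked with a small sketch (Python's sqlite3 as a stand-in for MySQL; the sample rows, user ids and the keep count of 2 are assumptions, scaled down from the answer's 500). SQLite, like MySQL, reads LIMIT 2, 10000000 as "skip 2, then take up to 10000000", so the subquery returns everything older than that user's newest 2 articles, and other users are untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO articles (user_id, created_at) VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30), (1, 40), (2, 15), (2, 25)],
)

conn.execute(
    """
    DELETE FROM articles
    WHERE id IN (
        SELECT id FROM (
            SELECT id FROM articles
            WHERE user_id = :userId
            ORDER BY created_at DESC LIMIT 2, 10000000
        ) abc
    )
    AND user_id = :userId
    """,
    {"userId": 1},  # named parameter, matching the answer's :userId placeholder
)
remaining = conn.execute(
    "SELECT user_id, created_at FROM articles ORDER BY user_id, created_at"
).fetchall()
print(remaining)  # [(1, 30), (1, 40), (2, 15), (2, 25)]
```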

This should work as well:
DELETE `table` FROM `table`
LEFT JOIN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT N
) AS Temp
) AS Temp2 ON `table`.id = Temp2.id
WHERE Temp2.id IS NULL

DELETE FROM table WHERE id NOT IN (
SELECT id FROM table ORDER BY id DESC LIMIT 0, 10
)

Stumbled across this and thought I'd update.
This is a modification of something that was posted before. I would have commented, but unfortunately don't have 50 reputation...
LOCK Tables TestTable WRITE, TestTable as TestTableRead READ;
DELETE FROM TestTable
WHERE ID <= (
SELECT ID
FROM TestTable as TestTableRead -- the alias must match the one locked above
ORDER BY ID DESC LIMIT 1 OFFSET 42 -- keep this many records
);
UNLOCK TABLES;
The use of 'Where' and 'Offset' circumvents the sub-query.
You also cannot read and write from the same table in the same query, as you may modify entries as they're being used. The locks allow you to circumvent this. This is also safe for parallel access to the database by other processes.
For performance and further explanation see the linked answer.
Tested with mysql Ver 15.1 Distrib 10.5.18-MariaDB
For further details on locks, see here

Why not
DELETE FROM table ORDER BY id DESC LIMIT 1, 123456789
Just delete all but the first row (order is DESC!), using a very, very large number as the second LIMIT argument. See here

Answering this after a long time... I came across the same situation, and instead of using the answers mentioned, I came up with the following:
DELETE FROM table_name order by ID limit 10
This will delete the first 10 records and keep the latest records.
