How do I run an update query in MySQL on a CTE? - mysql

I'm trying to create a CTE, which I think is working fine. However, I then want to run an update query on this CTE, but I keep getting the following error in MySQL Workbench:
"Error Code: 1288. The target table pptest of the UPDATE is not updatable"
I've had a look around but do not understand any of the workarounds, and on top of that I'm not too clued up on MySQL either. My goal was basically to create a table/view with a bunch of records in there that were partitioned and would have an index number on each row, which would be represented by "row_num". This was to group duplicated data in a table together, and then I was hoping to simply run an update query on this structure where the row_num was greater than 1. It is simple logic, but I cannot figure out any other way of achieving my goal. Can anyone help with this please?
The full query I ran before was,
WITH cte as (select *, row_number() over (partition by col_a order by col_a) row_num from db_name.table_name)
update cte set col_b='test' where row_num > 1

This syntax is not supported in MySQL. Actually this looks like SQL Server syntax... You can't just port queries from one database to another and expect them to work.
In MySQL, you could use the update ... join syntax. However, you need a primary key column (or set of columns), that uniquely identifies each row, to serve as a join condition. Assuming that column id is your primary key, that would be:
update db_name.table_name t
inner join (
select id,
row_number() over (partition by col_a order by col_a) row_num
from db_name.table_name
) t1 on t1.id = t.id
set t.col_b = 'test'
where t1.row_num > 1
Side notes:
As I understand this, your query is meant to flag duplicates on col_a. But the way you use row_number(), it is undefined which of the duplicate rows will be flagged, because your order by clause is not deterministic; you would be better off with an over() clause like: over (partition by col_a order by id)
Window functions such as row_number() are available only in MySQL 8.0 and later.

UPDATE and DELETE do not work on a CTE in MySQL, because it behaves like a read-only temporary table. A workaround is to join the original table with the CTE and update the original table. Join on a column that uniquely identifies each row (id is assumed here); joining on col_a would also match the first row of every duplicated group:
WITH cte as (select *, row_number()
over (partition by col_a order by id) row_num
from db_name.table_name)
update db_name.table_name t1
join
cte on t1.id = cte.id set t1.col_b='test' where cte.row_num > 1
This works fine for a DELETE statement too.
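To see why the join key matters, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (an assumption made so the example runs without a server; SQLite 3.25 or later is needed for window functions). The table t, its id column and the sample rows are hypothetical. Joining on col_a pairs every row of a duplicated group with that group's row_num > 1 entries, so the first row is flagged as well, while joining on a unique id flags only the extra copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, col_a TEXT, col_b TEXT);
    INSERT INTO t (id, col_a) VALUES (1, 'x'), (2, 'x'), (3, 'y'), (4, 'x');
""")

# Number the rows inside each col_a group; ordering by id makes it deterministic.
NUMBERED = """
    SELECT id, col_a,
           ROW_NUMBER() OVER (PARTITION BY col_a ORDER BY id) AS row_num
    FROM t
"""

def flagged_ids(join_column):
    # join_column is only ever one of the two fixed strings used below.
    sql = f"""
        SELECT DISTINCT t.id
        FROM t JOIN ({NUMBERED}) AS n ON n.{join_column} = t.{join_column}
        WHERE n.row_num > 1
        ORDER BY t.id
    """
    return [row[0] for row in conn.execute(sql)]

by_col_a = flagged_ids("col_a")  # joins on the partition column
by_id = flagged_ids("id")        # joins on the unique key
print("join on col_a flags:", by_col_a)
print("join on id flags:   ", by_id)
```

With the sample rows, the col_a join flags ids 1, 2 and 4, whereas the id join flags only 2 and 4, leaving the first 'x' row untouched.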

Related

Keep only last two rows for grouped columns in table

I have a table "History" with about 300.000 rows, which is filled with new data daily. I want to keep only the last two lines of every refSchema/refId combination.
Currently I do it this way:
First Step:
SELECT refSchema,refId FROM History GROUP BY refSchema,refId
With this statement I get all combinations (which are about 40.000).
Second Step:
I run a foreach that looks up the existing rows for each combination returned by the query above, like this:
SELECT id
FROM History
WHERE refSchema = ? AND refId = ? AND state = 'done'
ORDER BY importedAt DESC
LIMIT 2, 2000
Please keep in mind that I want to keep the last two rows in my table, so I use LIMIT 2,2000. If I find matching rows, I put the ids in an array called idList.
Final Step
I delete all id's from the array in that way:
DELETE FROM History WHERE id in ($idList)
This all seems not to be the best performance, because I have to check every combination with an extra query. Is there a way to have one delete statement that does the magic to avoid the 40.000 extra queries?
Edit Update: I use AWS Aurora DB
If you are using MySQL 8+, then one conceptually simple way to proceed here is to use a CTE to identify the top two rows per group which you do want to retain. Then, delete any record whose id does not appear in this whitelist (matching on the refSchema/refId pair alone would retain everything, because every pair has a row with rn = 1):
WITH cte AS (
SELECT id
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY refSchema, refId ORDER BY importedAt DESC) rn
FROM History
) t
WHERE rn IN (1, 2)
)
DELETE
FROM History
WHERE id NOT IN (SELECT id FROM cte);
If you can't use a CTE, then try inlining it as a subquery that returns the id of each row to keep:
DELETE
FROM History
WHERE id NOT IN (
SELECT id
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY refSchema, refId ORDER BY importedAt DESC) rn
FROM History
) t
WHERE rn IN (1, 2)
);
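As a quick check of the keep-the-newest-two logic, the following sketch runs the same idea against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL 8 / Aurora here (an assumption, needing SQLite 3.25 or later), and the sample rows, the CTE name keep and the id tie-breaker in the ORDER BY are illustrative additions. The whitelist is keyed on id, since every (refSchema, refId) pair always has a row with rn = 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE History (
        id INTEGER PRIMARY KEY,
        refSchema TEXT, refId INTEGER, importedAt TEXT
    );
    INSERT INTO History (id, refSchema, refId, importedAt) VALUES
        (1, 'a', 1, '2020-01-01'),
        (2, 'a', 1, '2020-01-02'),
        (3, 'a', 1, '2020-01-03'),
        (4, 'a', 1, '2020-01-04'),
        (5, 'b', 7, '2020-01-01'),
        (6, 'b', 7, '2020-01-02'),
        (7, 'c', 9, '2020-01-01');
""")

# Rank rows newest-first inside each group, keep ranks 1 and 2, delete the rest.
conn.execute("""
    WITH keep AS (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY refSchema, refId
                       ORDER BY importedAt DESC, id DESC
                   ) AS rn
            FROM History
        ) AS ranked
        WHERE rn <= 2
    )
    DELETE FROM History WHERE id NOT IN (SELECT id FROM keep)
""")

remaining = [row[0] for row in conn.execute("SELECT id FROM History ORDER BY id")]
print(remaining)
```

Only the two oldest rows of the four-row group (ids 1 and 2) are removed; groups with two rows or fewer are left alone.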

Assigning one-column subquery to column

So I've got a table with a primary key id and a foreign key target_id that points to an id. What I'm trying to do is set each target_id to a random id, that is not itself.
At the moment, this is how I get the randomized id's:
SELECT id FROM (SELECT id FROM test) AS sub ORDER BY RAND()
However, when I try to assign that subquery of randomized id's to the column target_id, an error is thrown saying that the subquery returns more than one row.
However, when I tried
UPDATE test SET `target_id` = `id`
to see if columns can be directly copied, it worked, showing that it IS possible. But why can't one column of my subquery be copied into a column of my table?
Sorry if I worded my question really weirdly; I'm not very experienced in MySQL :/
Thanks! :D
As you might guess from the link in my comment to your question (I did not have time to explain further at the time), I assume that what you actually want is to shuffle the values of column target_id around. In MySQL this seems to be a non-trivial task, as the post in my comment shows.
I also played around a bit further and came up with this "variation of the theme" using user defined variables. I first tried using a temporary table, but this can only be referenced once so I had to revert to using a "conventional" table tmp in the end:
drop table if exists tmp;
select @n:=0;select @m:=0;
create table tmp select * from
(select *,@n:=@n+1 nr
from (select id,@m:=@m+1 n from tbl order by rand()) rnd
order by rand()
) rnd ;
update tbl s
inner join tmp a on a.id=s.id
inner join tmp b on b.nr=a.n
inner join tbl t on t.id=b.id
set t.target_id=s.target_id;
Check out the little demo here: http://rextester.com/XYF73480
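For the literal requirement in the question (each target_id set to a random id other than the row's own), a different technique from the variable-based shuffle above is a random cyclic shift: put the ids in one random order and point each row at the next id in that order. The sketch below is only an illustration under stated assumptions: it uses Python's sqlite3 module as a stand-in for MySQL, the helper table shuffled is hypothetical, and the table needs at least two rows. It also produces only single-cycle assignments, not every possible permutation without fixed points.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER PRIMARY KEY, target_id INTEGER);
    INSERT INTO test (id) VALUES (10), (20), (30), (40), (50);

    -- Materialise one random ordering so every later lookup sees the same one.
    CREATE TEMP TABLE shuffled AS
        SELECT id, ROW_NUMBER() OVER (ORDER BY RANDOM()) AS pos
        FROM test;

    -- Each row points at the id in the next position, wrapping around at the end.
    UPDATE test
    SET target_id = (
        SELECT nxt.id
        FROM shuffled AS cur
        JOIN shuffled AS nxt
          ON nxt.pos = cur.pos % (SELECT COUNT(*) FROM shuffled) + 1
        WHERE cur.id = test.id
    );
""")

rows = conn.execute("SELECT id, target_id FROM test ORDER BY id").fetchall()
for row_id, target in rows:
    print(row_id, "->", target)
```

In MySQL the same idea would use RAND() instead of RANDOM() and, as noted above, a regular table rather than a temporary one, because the helper table is referenced twice in one statement.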

Use a CTE to UPDATE or DELETE in MySQL

The new version of MySQL, 8.0, now supports Common Table Expressions.
According to the manual:
A WITH clause is permitted at the beginning of SELECT, UPDATE, and DELETE statements:
WITH ... SELECT ...
WITH ... UPDATE ...
WITH ... DELETE ...
So, I thought, given the following table:
ID lastName firstName
----------------------
1 Smith Pat
2 Smith Pat
3 Smith Bob
I can use the following query:
WITH ToDelete AS
(
SELECT ID,
ROW_NUMBER() OVER (PARTITION BY lastName, firstName ORDER BY ID) AS rn
FROM mytable
)
DELETE FROM ToDelete;
in order to delete duplicates from the table, just like I could do in SQL Server.
It turns out I was wrong. When I try to execute the DELETE statement from MySQL Workbench I get the error:
Error Code: 1146. Table 'todelete' doesn't exist
I also get an error message when I try to do an UPDATE using the CTE.
So, my question is, how could one use a WITH clause in the context of an UPDATE or DELETE statement in MySQL (as cited in the manual of version 8.0)?
This appears to be a published bug in MySQL 8.x. From this bug report:
In the 2015 version of the SQL standard, a CTE cannot be defined in UPDATE; MySQL allows it but makes the CTE read-only (we're updating the documentation now to mention this).
This said, one could use a view instead of the CTE; then the view may be updatable, but due to the presence of window functions it is materialized into a temporary table (it is not merged) so is not updatable (we're going to mention it in the doc as well).
All the above applies to DELETE too.
If you follow the above bug link, you will see a workaround suggested for using a CTE, but it involves joining the CTE to the original target table in a one-to-one mapping. Based on your example, which is a blanket delete, it is not clear what workaround you would need, were you to proceed using a CTE for your delete.
Since the CTE is not updatable, you need to refer to the original table to delete rows. I think you are looking for something like this:
WITH ToDelete AS
(
SELECT ID,
ROW_NUMBER() OVER (PARTITION BY lastName, firstName ORDER BY ID) AS rn
FROM mytable
)
DELETE FROM mytable USING mytable JOIN ToDelete ON mytable.ID = ToDelete.ID
WHERE ToDelete.rn > 1;
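The same pattern (treat the CTE as read-only and delete from the base table by key) can be tried out on the three sample rows from the question. This is a sketch using Python's sqlite3 module in place of MySQL 8.0, which is an assumption for the sake of a self-contained example; SQLite has no DELETE ... USING ... JOIN form, so an IN subquery on ID plays the role of the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (ID INTEGER PRIMARY KEY, lastName TEXT, firstName TEXT);
    INSERT INTO mytable VALUES
        (1, 'Smith', 'Pat'),
        (2, 'Smith', 'Pat'),
        (3, 'Smith', 'Bob');
""")

# The CTE only identifies the rows; the DELETE targets the base table.
conn.execute("""
    WITH ToDelete AS (
        SELECT ID,
               ROW_NUMBER() OVER (PARTITION BY lastName, firstName ORDER BY ID) AS rn
        FROM mytable
    )
    DELETE FROM mytable
    WHERE ID IN (SELECT ID FROM ToDelete WHERE rn > 1)
""")

survivors = conn.execute("SELECT ID, firstName FROM mytable ORDER BY ID").fetchall()
print(survivors)
```

The second Pat Smith row (ID 2) is removed, and the lowest ID in each name group survives.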

Simply delete duplicate content in a sql table

I wanted to know if there is an easy way to remove duplicates from a SQL table,
rather than fetching the whole table and deleting the rows that appear twice.
Thank you in advance.
This is my structure :
CREATE TABLE IF NOT EXISTS `mups` (
`idgroupe` varchar(15) NOT NULL,
`fan` bigint(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
If you are using SQL Server:
Check this: SQL SERVER – 2005 – 2008 – Delete Duplicate Rows
Sample Code using CTE:
/* Delete Duplicate records */
WITH CTE (COl1,Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO
Add a calculated column that takes the checksum of the entire row. Search for any duplicate checksums, rank and remove the duplicates.
you can do something like this :
DELETE from yourTable WHERE tableID in
(SELECT clone.tableID
from yourTable origine,
yourTable clone
where clone.tableID= origine.tableID)
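As written, the join condition matches every row with itself, so adapt the WHERE clause to your definition of a duplicate: compare the indexed columns, or compare each of the other fields, and require clone.tableID to be greater than origine.tableID so that one copy is kept.
Note that this solution has the advantage of letting you choose what counts as a duplicate (if the PK changes, for example).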
You can find the duplicates by joining the table to itself, doing a group by the fields you are looking for duplicates in, and a having clause where count is greater than one.
Let's say your table name is customers, and you're looking for duplicate name fields.
select cust_out.name, count(cust_count.name)
from customers cust_out
inner join customers cust_count on cust_out.name = cust_count.name
group by cust_out.name
having count(cust_count.name) > 1
If you use this in a delete statement you would be deleting all the duplicate records, when you probably intend to keep one of the records.
So to select the records to delete,
select cust_dup.id
from customers cust
inner join customers cust_dup on cust.name = cust_dup.name and cust_dup.id > cust.id
group by cust_dup.id
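A small sketch of how that selection can drive the delete, using Python's sqlite3 module and a hypothetical five-row customers table (both are assumptions for illustration). The ids are fetched first and deleted in a second step; in MySQL, a DELETE whose subquery reads the table being deleted from is rejected with error 1093 unless the subquery is wrapped in a derived table, so the two-step form sidesteps that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customers VALUES
        (1, 'Ann'), (2, 'Bob'), (3, 'Ann'), (4, 'Ann'), (5, 'Cy');
""")

# Every row that has an earlier row (smaller id) with the same name is a duplicate.
dup_ids = [row[0] for row in conn.execute("""
    SELECT cust_dup.id
    FROM customers cust
    INNER JOIN customers cust_dup
        ON cust.name = cust_dup.name AND cust_dup.id > cust.id
    GROUP BY cust_dup.id
    ORDER BY cust_dup.id
""")]

# Delete the duplicates by primary key, keeping the lowest id per name.
conn.executemany("DELETE FROM customers WHERE id = ?", [(i,) for i in dup_ids])

kept = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(dup_ids, kept)
```

Here ids 3 and 4 are identified as duplicates of id 1, and one row per name remains.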

Selecting last row WITHOUT any kind of key

I need to get the last (newest) row in a table (using MySQL's natural order - i.e. what I get without any kind of ORDER BY clause), however there is no key I can ORDER BY on!
The only 'key' in the table is an indexed MD5 field, so I can't really ORDER BY on that. There's no timestamp, autoincrement value, or any other field that I could easily ORDER on either. This is why I'm left with only the natural sort order as my indicator of 'newest'.
And, unfortunately, changing the table structure to add a proper auto_increment is out of the question. :(
Anyone have any ideas on how this can be done w/ plain SQL, or am I SOL?
If it's MyISAM you can do it in two queries
SELECT COUNT(*) FROM yourTable;
SELECT * FROM yourTable LIMIT useTheCountHere - 1,1;
This is unreliable however because
It assumes rows are only added to this table and never deleted.
It assumes no other writes are performed to this table in the meantime (you can lock the table)
MyISAM tables can be reordered using ALTER TABLE, so that the insert order is no longer preserved.
It's not reliable at all in InnoDB, since this engine can reorder the table at will.
Can I ask why you need to do this?
In Oracle, and possibly in MySQL too, the optimiser will choose the quickest way to return your results. So even if your data were static, you could run the same query twice and get a different answer.
You can assign row numbers using the ROW_NUMBER function and then sort by this value using the ORDER BY clause.
SELECT *,
ROW_NUMBER() OVER() AS rn
FROM table
ORDER BY rn DESC
LIMIT 1;
Basically, you can't do that.
Normally I'd suggest adding a surrogate primary key with auto-increment and ORDER BY that:
SELECT *
FROM yourtable
ORDER BY id DESC
LIMIT 1
But in your question you write...
changing the table structure to add a proper auto_increment is out of the question.
So another less pleasant option I can think of is using a simulated ROW_NUMBER using variables:
SELECT * FROM
(
SELECT T1.*, @rownum := @rownum + 1 AS rn
FROM yourtable T1, (SELECT @rownum := 0) T2
) T3
ORDER BY rn DESC
LIMIT 1
Please note that this has serious performance implications: it requires a full scan, and the results are not guaranteed to be returned in any particular order in the subquery. You might get them in sort order, but then again you might not; when you don't specify the order, the server is free to choose any order it likes. It will probably choose the order in which the rows are stored on disk in order to do as little work as possible, but relying on this is unwise.
Without an order by clause you have no guarantee of the order in which you will get your result. The SQL engine is free to choose any order.
But if for some reason you still want to rely on this order, then the following will indeed return the last record from the result (MySql only):
select *
from (select *,
@rn := @rn + 1 rn
from mytable,
(select @rn := 0) init
) numbered
where rn = @rn
In the sub query the records are retrieved without order by, and are given a sequential number. The outer query then selects only the one that got the last attributed number.
We can use HAVING for that kind of problem:
SELECT MAX(id) as last_id,column1,column2 FROM table HAVING id=last_id;