I have seen a bunch of helpful answers about updating table values from a different table with multiple values based on a timestamp using a MAX() subquery.
e.g. Update another table based on latest record
I was wondering how this compares with doing an ALTER first and relying on the order in the table to simplify the UPDATE. Something like this:
ALTER TABLE `table_with_multiple_data` ORDER BY `timestamp` DESC;
UPDATE `table_with_single_data` as `t1`
LEFT JOIN `table_with_multiple_data` AS `t2`
ON `t1`.`id`=`t2`.`t1id`
SET `t1`.`value` = `t2`.`value`;
(Apologies for the pseudocode but I hope you get what I'm asking)
Both achieve the same for me but don't really have a big enough data set to see any difference in speed.
Thanks!!
You would normally use a correlated subquery:
UPDATE table_with_single_data t1
SET t1.value = (select t2.value
from table_with_multiple_data t2
where t2.t1id = t1.id
order by t2.timestamp desc
limit 1
);
If your method happens to work, that is just happenstance. Even if MySQL respected the ordering of tables, such ordering would not survive the join operation. Not to mention the fact that there is no guarantee on *which * value is assigned when there is multiple matching rows.
I would like to trim an existing MySQL ISAM table, 10million records.
What would be a SQL statement to delete the last 5million records? This is just a log table, no damage done by removing records.
delete from table where id >= (select * from (select id from table order by id limit 4999999,1) as t )
You can use ORDER BY and LIMIT with DELETE to affect only a limited number of rows. The way that you would get the 'last' 5 million depends on what you have as far as determining record age, and use that in your ORDER BY.
SELECT *
FROM ??? (youre table name)ORDER BY?????? LIMIT 5000000
Here ??? = youre table name, and ?????? is the way you sort youre table. So if you have some sort of date or number of entry to sort by.
http://www.w3schools.com/sql/
I've used SELECT in my example above so you can try out first if it works correctly! If it does work, change SELECT into DELETE :)
If it doesn't work, than you've to try something else to sort by :)
Does this works
DELETE FROM tableA ORDER BY some_indexed_column DESC LIMIT 1000000
I need some help with this query: I want to delete the following rows from my table.
The table has 4 columns: UserID, UserName, Tag, Score.
I want to keep only the top 1000 users with highest score, for each Tag.
So I need to order by score, and to keep the first 1000 users for each tag.
Thanks
Based on the other answer,
DELETE FROM table t1
WHERE UserID NOT IN (SELECT userID FROM table t2
WHERE t2.tag = t1.tag
ORDER BY Score DESC LIMIT 1000);
This assumes that UserID is a unique/primary key field. If it's not, you could do something like this:
DELETE FROM table t1
WHERE UserID||' '||Tag NOT IN (SELECT UserID||' '||Tag FROM table t2
WHERE t2.tag = t1.tag
ORDER BY Score DESC LIMIT 1000);
Putting the spaces in because I know I've had to do similar on Oracle, not sure how MySQL would handle it without.
You should be able to fix this with a subquery although I'm not completely sure if MySQL supports that within delete queries. I do know that it can give problems with update queries (I've had some segfaults in MySQL because of this)
Either way... this should work (atleast, it would in Postgres)
DELETE FROM table WHERE id NOT IN (SELECT id FROM table ORDER BY Score DESC LIMIT 1000)
Assuming that id is your primary key ofcourse.
Solution using a temporary table:
Do note that you'll have to repeat this process for every tag. I can't think of a way to do it in 1 query in MySQL 5.0
CREATE TEMPORARY TABLE top_users
SELECT * FROM table
WHERE Tag = ...
GROUP BY UserID
ORDER BY Score DESC
LIMIT 1000;
DELETE FROM table WHERE Tag = ...;
INSERT INTO table
SELECT * FROM top_users;
Maybe a temporary table, who is filled with a select clause
SELECT * FROM table ORDER BY Score DESC LIMIT 1000;
I dont have the syntax handy for creating the temporary table, but that will atleast get your the rows you want.
Be careful with delete commands... Better to create a new view/table.
I wanted to automate my table population for testing purposes.
I needed to edit some columns from a certain table but I must make sure that the values I put in that certain column does not simply come out of randomness.
So the values actually comes from another table on a certain condition.
How can I do that? Just like this code:
update table_one set `some_id`=(select some_id from another_table where another_table.primary_id=table_one.primary_id order by rand() limit 1)
It's something like my condition for the Subselect query. It should match the id of the current row I am updating.
I really forgot my SQL now. Thanks for the answers though.
You're almost there - all you need to do is qualify the column you're selecting in the subquery, so you know it comes from the correct table:
update table_one
set some_id=(
select another_table.some_id
from another_table
where another_table.primary_id=table_one.primary_id
order by rand()
limit 1
)
Is it possible to build a single mysql query (without variables) to remove all records from the table, except latest N (sorted by id desc)?
Something like this, only it doesn't work :)
delete from table order by id ASC limit ((select count(*) from table ) - N)
Thanks.
You cannot delete the records that way, the main issue being that you cannot use a subquery to specify the value of a LIMIT clause.
This works (tested in MySQL 5.0.67):
DELETE FROM `table`
WHERE id NOT IN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT 42 -- keep this many records
) foo
);
The intermediate subquery is required. Without it we'd run into two errors:
SQL Error (1093): You can't specify target table 'table' for update in FROM clause - MySQL doesn't allow you to refer to the table you are deleting from within a direct subquery.
SQL Error (1235): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' - You can't use the LIMIT clause within a direct subquery of a NOT IN operator.
Fortunately, using an intermediate subquery allows us to bypass both of these limitations.
Nicole has pointed out this query can be optimised significantly for certain use cases (such as this one). I recommend reading that answer as well to see if it fits yours.
I know I'm resurrecting quite an old question, but I recently ran into this issue, but needed something that scales to large numbers well. There wasn't any existing performance data, and since this question has had quite a bit of attention, I thought I'd post what I found.
The solutions that actually worked were the Alex Barrett's double sub-query/NOT IN method (similar to Bill Karwin's), and Quassnoi's LEFT JOIN method.
Unfortunately both of the above methods create very large intermediate temporary tables and performance degrades quickly as the number of records not being deleted gets large.
What I settled on utilizes Alex Barrett's double sub-query (thanks!) but uses <= instead of NOT IN:
DELETE FROM `test_sandbox`
WHERE id <= (
SELECT id
FROM (
SELECT id
FROM `test_sandbox`
ORDER BY id DESC
LIMIT 1 OFFSET 42 -- keep this many records
) foo
);
It uses OFFSET to get the id of the Nth record and deletes that record and all previous records.
Since ordering is already an assumption of this problem (ORDER BY id DESC), <= is a perfect fit.
It is much faster, since the temporary table generated by the subquery contains just one record instead of N records.
Test case
I tested the three working methods and the new method above in two test cases.
Both test cases use 10000 existing rows, while the first test keeps 9000 (deletes the oldest 1000) and the second test keeps 50 (deletes the oldest 9950).
+-----------+------------------------+----------------------+
| | 10000 TOTAL, KEEP 9000 | 10000 TOTAL, KEEP 50 |
+-----------+------------------------+----------------------+
| NOT IN | 3.2542 seconds | 0.1629 seconds |
| NOT IN v2 | 4.5863 seconds | 0.1650 seconds |
| <=,OFFSET | 0.0204 seconds | 0.1076 seconds |
+-----------+------------------------+----------------------+
What's interesting is that the <= method sees better performance across the board, but actually gets better the more you keep, instead of worse.
Unfortunately for all the answers given by other folks, you can't DELETE and SELECT from a given table in the same query.
DELETE FROM mytable WHERE id NOT IN (SELECT MAX(id) FROM mytable);
ERROR 1093 (HY000): You can't specify target table 'mytable' for update
in FROM clause
Nor can MySQL support LIMIT in a subquery. These are limitations of MySQL.
DELETE FROM mytable WHERE id NOT IN
(SELECT id FROM mytable ORDER BY id DESC LIMIT 1);
ERROR 1235 (42000): This version of MySQL doesn't yet support
'LIMIT & IN/ALL/ANY/SOME subquery'
The best answer I can come up with is to do this in two stages:
SELECT id FROM mytable ORDER BY id DESC LIMIT n;
Collect the id's and make them into a comma-separated string:
DELETE FROM mytable WHERE id NOT IN ( ...comma-separated string... );
(Normally interpolating a comma-separate list into an SQL statement introduces some risk of SQL injection, but in this case the values are not coming from an untrusted source, they are known to be integer values from the database itself.)
note: Though this doesn't get the job done in a single query, sometimes a more simple, get-it-done solution is the most effective.
DELETE i1.*
FROM items i1
LEFT JOIN
(
SELECT id
FROM items ii
ORDER BY
id DESC
LIMIT 20
) i2
ON i1.id = i2.id
WHERE i2.id IS NULL
If your id is incremental then use something like
delete from table where id < (select max(id) from table)-N
To delete all the records except te last N you may use the query reported below.
It's a single query but with many statements so it's actually not a single query the way it was intended in the original question.
Also you need a variable and a built-in (in the query) prepared statement due to a bug in MySQL.
Hope it may be useful anyway...
nnn are the rows to keep and theTable is the table you're working on.
I'm assuming you have an autoincrementing record named id
SELECT #ROWS_TO_DELETE := COUNT(*) - nnn FROM `theTable`;
SELECT #ROWS_TO_DELETE := IF(#ROWS_TO_DELETE<0,0,#ROWS_TO_DELETE);
PREPARE STMT FROM "DELETE FROM `theTable` ORDER BY `id` ASC LIMIT ?";
EXECUTE STMT USING #ROWS_TO_DELETE;
The good thing about this approach is performance: I've tested the query on a local DB with about 13,000 record, keeping the last 1,000. It runs in 0.08 seconds.
The script from the accepted answer...
DELETE FROM `table`
WHERE id NOT IN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT 42 -- keep this many records
) foo
);
Takes 0.55 seconds. About 7 times more.
Test environment: mySQL 5.5.25 on a late 2011 i7 MacBookPro with SSD
DELETE FROM table WHERE ID NOT IN
(SELECT MAX(ID) ID FROM table)
try below query:
DELETE FROM tablename WHERE id < (SELECT * FROM (SELECT (MAX(id)-10) FROM tablename ) AS a)
the inner sub query will return the top 10 value and the outer query will delete all the records except the top 10.
What about :
SELECT * FROM table del
LEFT JOIN table keep
ON del.id < keep.id
GROUP BY del.* HAVING count(*) > N;
It returns rows with more than N rows before.
Could be useful ?
Using id for this task is not an option in many cases. For example - table with twitter statuses. Here is a variant with specified timestamp field.
delete from table
where access_time >=
(
select access_time from
(
select access_time from table
order by access_time limit 150000,1
) foo
)
Just wanted to throw this into the mix for anyone using Microsoft SQL Server instead of MySQL. The keyword 'Limit' isn't supported by MSSQL, so you'll need to use an alternative. This code worked in SQL 2008, and is based on this SO post. https://stackoverflow.com/a/1104447/993856
-- Keep the last 10 most recent passwords for this user.
DECLARE #UserID int; SET #UserID = 1004
DECLARE #ThresholdID int -- Position of 10th password.
SELECT #ThresholdID = UserPasswordHistoryID FROM
(
SELECT ROW_NUMBER()
OVER (ORDER BY UserPasswordHistoryID DESC) AS RowNum, UserPasswordHistoryID
FROM UserPasswordHistory
WHERE UserID = #UserID
) sub
WHERE (RowNum = 10) -- Keep this many records.
DELETE UserPasswordHistory
WHERE (UserID = #UserID)
AND (UserPasswordHistoryID < #ThresholdID)
Admittedly, this is not elegant. If you're able to optimize this for Microsoft SQL, please share your solution. Thanks!
If you need to delete the records based on some other column as well, then here is a solution:
DELETE
FROM articles
WHERE id IN
(SELECT id
FROM
(SELECT id
FROM articles
WHERE user_id = :userId
ORDER BY created_at DESC LIMIT 500, 10000000) abc)
AND user_id = :userId
This should work as well:
DELETE FROM [table]
INNER JOIN (
SELECT [id]
FROM (
SELECT [id]
FROM [table]
ORDER BY [id] DESC
LIMIT N
) AS Temp
) AS Temp2 ON [table].[id] = [Temp2].[id]
DELETE FROM table WHERE id NOT IN (
SELECT id FROM table ORDER BY id, desc LIMIT 0, 10
)
Stumbled across this and thought I'd update.
This is a modification of something that was posted before. I would have commented, but unfortunately don't have 50 reputation...
LOCK Tables TestTable WRITE, TestTable as TestTableRead READ;
DELETE FROM TestTable
WHERE ID <= (
SELECT ID
FROM TestTable as TestTableRead -- (the 'as' declaration is required for some reason)
ORDER BY ID DESC LIMIT 1 OFFSET 42 -- keep this many records);
UNLOCK TABLES;
The use of 'Where' and 'Offset' circumvents the sub-query.
You also cannot read and write from the same table in the same query, as you may modify entries as they're being used. The Locks allow to circumvent this. This is also safe for parallel access to the database by other processes.
For performance and further explanation see the linked answer.
Tested with mysql Ver 15.1 Distrib 10.5.18-MariaDB
For further details on locks, see here
Why not
DELETE FROM table ORDER BY id DESC LIMIT 1, 123456789
Just delete all but the first row (order is DESC!), using a very very large nummber as second LIMIT-argument. See here
Answering this after a long time...Came across the same situation and instead of using the answers mentioned, I came with below -
DELETE FROM table_name order by ID limit 10
This will delete the 1st 10 records and keep the latest records.