Select all except top N in MySQL - mysql

I have a table with data that I don't need to keep for very long, so every night I want to remove all rows except the last 20.
To do that, I found the following query:
DELETE FROM Table WHERE ID NOT IN (
SELECT id FROM (
SELECT TOP 10 ID FROM Table
) AS x
)
MySQL doesn't support the TOP function, so I rewrote it to use LIMIT instead:
DELETE FROM Table WHERE ID NOT IN (
SELECT id FROM (
SELECT ID FROM Table ORDER BY ID DESC LIMIT 10
) AS x
)
Unfortunately, MySQL doesn't seem to support the LIMIT function within subqueries. So what do I do now?
How do I select all except the 10 rows with the highest ID?
I could probably just delete all the records that are older than a day or something, but it feels like I should be able to do it this way.

As MySQL has foibles when reading from the same table as you are deleting, the simplest option is often to use a temp table.
INSERT INTO yourTempTable
SELECT id FROM yourTable
ORDER BY ID DESC LIMIT 10
DELETE yourTable WHERE id IN (SELECT id FROM yourTempTable)
Or many variations thereof (using joins instead of IN, etc).
The main consideration isn't about how to write the second query, it's about race conditions.
Your data could be changed by another process between the temp table and the delete. If that is possible and matters, you need to wrap it all in a transaction and slap a table lock on yourTable.

Another way:
DELETE t
FROM
TableX AS t
CROSS JOIN
( SELECT Id
FROM TableX
ORDER BY Id DESC
LIMIT 1 OFFSET 9
) AS tenth
WHERE t.Id < tenth.id

There are few ways to do this, I would prefer this :-
create table tmp_table select * from your_table order by id desc limit 20;
truncate table your_table;
insert into your_table select * from tmp_table;
drop table tmp_table;
It seems very lengthy at first glance,
but personally I think is understandable,
and very low risk (plus, it should be efficient that doing a JOIN)
PS : truncate won't reset auto increment field
ps2 : this approach assume there is no write during truncate, one can issue lock to preserve the data integrity

Related

Keep only last two rows for grouped columns in table

I have a table "History" with about 300.000 rows, which is filled with new data daily. I want to keep only the last two lines of every refSchema/refId combination.
Actually I go this way:
First Step:
SELECT refSchema,refId FROM History GROUP BY refSchema,refId
With this statement I get all combinations (which are about 40.000).
Second Step:
I run a foreach which looks up for the existing rows for the query above like this:
SELECT id
FROM History
WHERE refSchema = ? AND refId = ? AND state = 'done'
ORDER BY importedAt
DESC LIMIT 2,2000
Please keep in mind, that I want to hold the last two rows in my table, so I limit 2,2000. If I find matching rows I put the id's in an array called idList.
Final Step
I delete all id's from the array in that way:
DELETE FROM History WHERE id in ($idList)
This all seems not to be the best performance, because I have to check every combination with an extra query. Is there a way to have one delete statement that does the magic to avoid the 40.000 extra queries?
Edit Update: I use AWS Aurora DB
If you are using MySQL 8+, then one conceptually simple way to proceed here is to use a CTE to identify the top two rows per group which you do want to retain. Then, delete any record whose schema/id pair do not appear in this whitelist:
WITH cte AS (
SELECT refSchema, refId
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY refSchema, refId ORDER BY importedAt DESC) rn
FROM History
) t
WHERE rn IN (1, 2)
)
DELETE
FROM History
WHERE (refSchema, refId) NOT IN (SELECT refSchema, refId FROM cte);
If you can't use CTE, then try inlining the above CTE:
DELETE
FROM History
WHERE (refSchema, refId) NOT IN (
SELECT refSchema, refId
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY refSchema, refId ORDER BY importedAt DESC) rn
FROM History
) t
WHERE rn IN (1, 2)
);

Select last row in MySQL

How can I SELECT the last row in a MySQL table?
I'm INSERTing data and I need to retrieve a column value from the previous row.
There's an auto_increment in the table.
Yes, there's an auto_increment in there
If you want the last of all the rows in the table, then this is finally the time where MAX(id) is the right answer! Kind of:
SELECT fields FROM table ORDER BY id DESC LIMIT 1;
Keep in mind that tables in relational databases are just sets of rows. And sets in mathematics are unordered collections. There is no first or last row; no previous row or next row.
You'll have to sort your set of unordered rows by some field first, and then you are free the iterate through the resultset in the order you defined.
Since you have an auto incrementing field, I assume you want that to be the sorting field. In that case, you may want to do the following:
SELECT *
FROM your_table
ORDER BY your_auto_increment_field DESC
LIMIT 1;
See how we're first sorting the set of unordered rows by the your_auto_increment_field (or whatever you have it called) in descending order. Then we limit the resultset to just the first row with LIMIT 1.
You can combine two queries suggested by #spacepille into single query that looks like this:
SELECT * FROM `table_name` WHERE id=(SELECT MAX(id) FROM `table_name`);
It should work blazing fast, but on INNODB tables it's fraction of milisecond slower than ORDER+LIMIT.
on tables with many rows are two queries probably faster...
SELECT #last_id := MAX(id) FROM table;
SELECT * FROM table WHERE id = #last_id;
Almost every database table, there's an auto_increment column(generally id )
If you want the last of all the rows in the table,
SELECT columns FROM table ORDER BY id DESC LIMIT 1;
OR
You can combine two queries into single query that looks like this:
SELECT columns FROM table WHERE id=(SELECT MAX(id) FROM table);
Make it simply use: PDO::lastInsertId
http://php.net/manual/en/pdo.lastinsertid.php
Many answers here say the same (order by your auto increment), which is OK, provided you have an autoincremented column that is indexed.
On a side note, if you have such field and it is the primary key, there is no performance penalty for using order by versus select max(id). The primary key is how data is ordered in the database files (for InnoDB at least), and the RDBMS knows where that data ends, and it can optimize order by id + limit 1 to be the same as reach the max(id)
Now the road less traveled is when you don't have an autoincremented primary key. Maybe the primary key is a natural key, which is a composite of 3 fields...
Not all is lost, though. From a programming language you can first get the number of rows with
SELECT Count(*) - 1 AS rowcount FROM <yourTable>;
and then use the obtained number in the LIMIT clause
SELECT * FROM orderbook2
LIMIT <number_from_rowcount>, 1
Unfortunately, MySQL will not allow for a sub-query, or user variable in the LIMIT clause
If you want the most recently added one, add a timestamp and select ordered in reverse order by highest timestamp, limit 1. If you want to go by ID, sort by ID. If you want to use the one you JUST added, use mysql_insert_id.
You can use an OFFSET in a LIMIT command:
SELECT * FROM aTable LIMIT 1 OFFSET 99
in case your table has 100 rows this return the last row without relying on a primary_key
Without ID in one query:
SELECT * FROM table_name LIMIT 1 OFFSET (SELECT COUNT(*) - 1 FROM table_name)
SELECT * FROM adds where id=(select max(id) from adds);
This query used to fetch the last record in your table.

MySQL: how to keep top 1000 rows for each value of a column

I need some help with this query: I want to delete the following rows from my table.
The table has 4 columns: UserID, UserName, Tag, Score.
I want to keep only the top 1000 users with highest score, for each Tag.
So I need to order by score, and to keep the first 1000 users for each tag.
Thanks
Based on the other answer,
DELETE FROM table t1
WHERE UserID NOT IN (SELECT userID FROM table t2
WHERE t2.tag = t1.tag
ORDER BY Score DESC LIMIT 1000);
This assumes that UserID is a unique/primary key field. If it's not, you could do something like this:
DELETE FROM table t1
WHERE UserID||' '||Tag NOT IN (SELECT UserID||' '||Tag FROM table t2
WHERE t2.tag = t1.tag
ORDER BY Score DESC LIMIT 1000);
Putting the spaces in because I know I've had to do similar on Oracle, not sure how MySQL would handle it without.
You should be able to fix this with a subquery although I'm not completely sure if MySQL supports that within delete queries. I do know that it can give problems with update queries (I've had some segfaults in MySQL because of this)
Either way... this should work (atleast, it would in Postgres)
DELETE FROM table WHERE id NOT IN (SELECT id FROM table ORDER BY Score DESC LIMIT 1000)
Assuming that id is your primary key ofcourse.
Solution using a temporary table:
Do note that you'll have to repeat this process for every tag. I can't think of a way to do it in 1 query in MySQL 5.0
CREATE TEMPORARY TABLE top_users
SELECT * FROM table
WHERE Tag = ...
GROUP BY UserID
ORDER BY Score DESC
LIMIT 1000;
DELETE FROM table WHERE Tag = ...;
INSERT INTO table
SELECT * FROM top_users;
Maybe a temporary table, who is filled with a select clause
SELECT * FROM table ORDER BY Score DESC LIMIT 1000;
I dont have the syntax handy for creating the temporary table, but that will atleast get your the rows you want.
Be careful with delete commands... Better to create a new view/table.

How can I speed up a MySQL query with a large offset in the LIMIT clause?

I'm getting performance problems when LIMITing a mysql SELECT with a large offset:
SELECT * FROM table LIMIT m, n;
If the offset m is, say, larger than 1,000,000, the operation is very slow.
I do have to use limit m, n; I can't use something like id > 1,000,000 limit n.
How can I optimize this statement for better performance?
Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.
#create table to store sequences
CREATE TABLE seq (
seq_no int not null auto_increment,
id int not null,
primary key(seq_no),
unique(id)
);
#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;
#now get 1000 rows from offset 1000000
SELECT mytable.*
FROM mytable
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;
If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or could formulate a WHERE clause using the min and max ids from the first query.)
slow:
SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
fast:
SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
SELECT * FROM table WHERE id IN (1,2,3...10)
There's a blog post somewhere on the internet on how you should best make the selection of the rows to show should be as compact as possible, thus: just the ids; and producing the complete results should in turn fetch all the data you want for only the rows you selected.
Thus, the SQL might be something like (untested, I'm not sure it actually will do any good):
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
If your SQL engine is too primitive to allow this kind of SQL statements, or it doesn't improve anything, against hope, it might be worthwhile to break this single statement into multiple statements and capture the ids into a data structure.
Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.
I don't think there's any need to create a separate index if your table already has one. If so, then you can order by this primary key and then use values of the key to step through:
SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;
Another optimisation would be not to use SELECT * but just the ID so that it can simply read the index and doesn't have to then locate all the data (reduce IO overhead). If you need some of the other columns then perhaps you could add these to the index so that they are read with the primary key (which will most likely be held in memory and therefore not require a disc lookup) - although this will not be appropriate for all cases so you will have to have a play.
Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there is no row gaps.
If that's feasible, a better solution would be to simply ensure that the original table has no row gaps, and starts from id 1. Then grab the rows using the id for pagination.
SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;
and so on...
I have run into this problem recently. The problem was two parts to fix. First I had to use an inner select in my FROM clause that did my limiting and offsetting for me on the primary key only:
$subQuery = DB::raw("( SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId} ORDER BY title ) as t");
Then I could use that as the from part of my query:
'titles.id',
'title_eisbns_concat.eisbns_concat',
'titles.pub_symbol',
'titles.title',
'titles.subtitle',
'titles.contributor1',
'titles.publisher',
'titles.epub_date',
'titles.ebook_price',
'publisher_licenses.id as pub_license_id',
'license_types.shortname',
$coversQuery
)
->from($subQuery)
->leftJoin('titles', 't.id', '=', 'titles.id')
->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol')
->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id')
->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id')
->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id')
The first time I created this query I had used the OFFSET and LIMIT in MySql. This worked fine until I got past page 100 then the offset started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySql hasn't sped up OFFSET but between seems to reel it back in.

SQL query: Delete all records from the table except latest N?

Is it possible to build a single mysql query (without variables) to remove all records from the table, except latest N (sorted by id desc)?
Something like this, only it doesn't work :)
delete from table order by id ASC limit ((select count(*) from table ) - N)
Thanks.
You cannot delete the records that way, the main issue being that you cannot use a subquery to specify the value of a LIMIT clause.
This works (tested in MySQL 5.0.67):
DELETE FROM `table`
WHERE id NOT IN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT 42 -- keep this many records
) foo
);
The intermediate subquery is required. Without it we'd run into two errors:
SQL Error (1093): You can't specify target table 'table' for update in FROM clause - MySQL doesn't allow you to refer to the table you are deleting from within a direct subquery.
SQL Error (1235): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' - You can't use the LIMIT clause within a direct subquery of a NOT IN operator.
Fortunately, using an intermediate subquery allows us to bypass both of these limitations.
Nicole has pointed out this query can be optimised significantly for certain use cases (such as this one). I recommend reading that answer as well to see if it fits yours.
I know I'm resurrecting quite an old question, but I recently ran into this issue, but needed something that scales to large numbers well. There wasn't any existing performance data, and since this question has had quite a bit of attention, I thought I'd post what I found.
The solutions that actually worked were the Alex Barrett's double sub-query/NOT IN method (similar to Bill Karwin's), and Quassnoi's LEFT JOIN method.
Unfortunately both of the above methods create very large intermediate temporary tables and performance degrades quickly as the number of records not being deleted gets large.
What I settled on utilizes Alex Barrett's double sub-query (thanks!) but uses <= instead of NOT IN:
DELETE FROM `test_sandbox`
WHERE id <= (
SELECT id
FROM (
SELECT id
FROM `test_sandbox`
ORDER BY id DESC
LIMIT 1 OFFSET 42 -- keep this many records
) foo
);
It uses OFFSET to get the id of the Nth record and deletes that record and all previous records.
Since ordering is already an assumption of this problem (ORDER BY id DESC), <= is a perfect fit.
It is much faster, since the temporary table generated by the subquery contains just one record instead of N records.
Test case
I tested the three working methods and the new method above in two test cases.
Both test cases use 10000 existing rows, while the first test keeps 9000 (deletes the oldest 1000) and the second test keeps 50 (deletes the oldest 9950).
+-----------+------------------------+----------------------+
| | 10000 TOTAL, KEEP 9000 | 10000 TOTAL, KEEP 50 |
+-----------+------------------------+----------------------+
| NOT IN | 3.2542 seconds | 0.1629 seconds |
| NOT IN v2 | 4.5863 seconds | 0.1650 seconds |
| <=,OFFSET | 0.0204 seconds | 0.1076 seconds |
+-----------+------------------------+----------------------+
What's interesting is that the <= method sees better performance across the board, but actually gets better the more you keep, instead of worse.
Unfortunately for all the answers given by other folks, you can't DELETE and SELECT from a given table in the same query.
DELETE FROM mytable WHERE id NOT IN (SELECT MAX(id) FROM mytable);
ERROR 1093 (HY000): You can't specify target table 'mytable' for update
in FROM clause
Nor can MySQL support LIMIT in a subquery. These are limitations of MySQL.
DELETE FROM mytable WHERE id NOT IN
(SELECT id FROM mytable ORDER BY id DESC LIMIT 1);
ERROR 1235 (42000): This version of MySQL doesn't yet support
'LIMIT & IN/ALL/ANY/SOME subquery'
The best answer I can come up with is to do this in two stages:
SELECT id FROM mytable ORDER BY id DESC LIMIT n;
Collect the id's and make them into a comma-separated string:
DELETE FROM mytable WHERE id NOT IN ( ...comma-separated string... );
(Normally interpolating a comma-separate list into an SQL statement introduces some risk of SQL injection, but in this case the values are not coming from an untrusted source, they are known to be integer values from the database itself.)
note: Though this doesn't get the job done in a single query, sometimes a more simple, get-it-done solution is the most effective.
DELETE i1.*
FROM items i1
LEFT JOIN
(
SELECT id
FROM items ii
ORDER BY
id DESC
LIMIT 20
) i2
ON i1.id = i2.id
WHERE i2.id IS NULL
If your id is incremental then use something like
delete from table where id < (select max(id) from table)-N
To delete all the records except te last N you may use the query reported below.
It's a single query but with many statements so it's actually not a single query the way it was intended in the original question.
Also you need a variable and a built-in (in the query) prepared statement due to a bug in MySQL.
Hope it may be useful anyway...
nnn are the rows to keep and theTable is the table you're working on.
I'm assuming you have an autoincrementing record named id
SELECT #ROWS_TO_DELETE := COUNT(*) - nnn FROM `theTable`;
SELECT #ROWS_TO_DELETE := IF(#ROWS_TO_DELETE<0,0,#ROWS_TO_DELETE);
PREPARE STMT FROM "DELETE FROM `theTable` ORDER BY `id` ASC LIMIT ?";
EXECUTE STMT USING #ROWS_TO_DELETE;
The good thing about this approach is performance: I've tested the query on a local DB with about 13,000 record, keeping the last 1,000. It runs in 0.08 seconds.
The script from the accepted answer...
DELETE FROM `table`
WHERE id NOT IN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT 42 -- keep this many records
) foo
);
Takes 0.55 seconds. About 7 times more.
Test environment: mySQL 5.5.25 on a late 2011 i7 MacBookPro with SSD
DELETE FROM table WHERE ID NOT IN
(SELECT MAX(ID) ID FROM table)
try below query:
DELETE FROM tablename WHERE id < (SELECT * FROM (SELECT (MAX(id)-10) FROM tablename ) AS a)
the inner sub query will return the top 10 value and the outer query will delete all the records except the top 10.
What about :
SELECT * FROM table del
LEFT JOIN table keep
ON del.id < keep.id
GROUP BY del.* HAVING count(*) > N;
It returns rows with more than N rows before.
Could be useful ?
Using id for this task is not an option in many cases. For example - table with twitter statuses. Here is a variant with specified timestamp field.
delete from table
where access_time >=
(
select access_time from
(
select access_time from table
order by access_time limit 150000,1
) foo
)
Just wanted to throw this into the mix for anyone using Microsoft SQL Server instead of MySQL. The keyword 'Limit' isn't supported by MSSQL, so you'll need to use an alternative. This code worked in SQL 2008, and is based on this SO post. https://stackoverflow.com/a/1104447/993856
-- Keep the last 10 most recent passwords for this user.
DECLARE #UserID int; SET #UserID = 1004
DECLARE #ThresholdID int -- Position of 10th password.
SELECT #ThresholdID = UserPasswordHistoryID FROM
(
SELECT ROW_NUMBER()
OVER (ORDER BY UserPasswordHistoryID DESC) AS RowNum, UserPasswordHistoryID
FROM UserPasswordHistory
WHERE UserID = #UserID
) sub
WHERE (RowNum = 10) -- Keep this many records.
DELETE UserPasswordHistory
WHERE (UserID = #UserID)
AND (UserPasswordHistoryID < #ThresholdID)
Admittedly, this is not elegant. If you're able to optimize this for Microsoft SQL, please share your solution. Thanks!
If you need to delete the records based on some other column as well, then here is a solution:
DELETE
FROM articles
WHERE id IN
(SELECT id
FROM
(SELECT id
FROM articles
WHERE user_id = :userId
ORDER BY created_at DESC LIMIT 500, 10000000) abc)
AND user_id = :userId
This should work as well:
DELETE FROM [table]
INNER JOIN (
SELECT [id]
FROM (
SELECT [id]
FROM [table]
ORDER BY [id] DESC
LIMIT N
) AS Temp
) AS Temp2 ON [table].[id] = [Temp2].[id]
DELETE FROM table WHERE id NOT IN (
SELECT id FROM table ORDER BY id, desc LIMIT 0, 10
)
Stumbled across this and thought I'd update.
This is a modification of something that was posted before. I would have commented, but unfortunately don't have 50 reputation...
LOCK Tables TestTable WRITE, TestTable as TestTableRead READ;
DELETE FROM TestTable
WHERE ID <= (
SELECT ID
FROM TestTable as TestTableRead -- (the 'as' declaration is required for some reason)
ORDER BY ID DESC LIMIT 1 OFFSET 42 -- keep this many records);
UNLOCK TABLES;
The use of 'Where' and 'Offset' circumvents the sub-query.
You also cannot read and write from the same table in the same query, as you may modify entries as they're being used. The Locks allow to circumvent this. This is also safe for parallel access to the database by other processes.
For performance and further explanation see the linked answer.
Tested with mysql Ver 15.1 Distrib 10.5.18-MariaDB
For further details on locks, see here
Why not
DELETE FROM table ORDER BY id DESC LIMIT 1, 123456789
Just delete all but the first row (order is DESC!), using a very very large nummber as second LIMIT-argument. See here
Answering this after a long time...Came across the same situation and instead of using the answers mentioned, I came with below -
DELETE FROM table_name order by ID limit 10
This will delete the 1st 10 records and keep the latest records.