Delete rows from a table but keep a given number - MySQL

Using MySQL, how do I delete all rows from a table but keep, say, 200 records?
The obvious approach is to count them, do some arithmetic, and delete the right number. But does MySQL have some built-in function that does it in one DELETE query?

You can delete using a condition:
delete from YourTable
where YourSequentialID > 200
However, your sequential ID could have gaps, so you would not end up with exactly 200 rows. What you can do instead is refine your condition.
Find the records you want to keep (say the first 200) and delete everything else:
delete from YourTable
where id not in
(
select id from
(
select id
from YourTable
order by id
LIMIT 200
) as keep_rows
)
Note the extra derived table (keep_rows): MySQL rejects both a LIMIT directly inside an IN subquery and a subquery that reads from the table being deleted from, so the inner result has to be wrapped first.
I know that can be slow, but this is not a production query, just a clean-up query. You can live with having to run it only once.
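The keep-the-first-N pattern can be sanity-checked on a scaled-down table using SQLite via Python's sqlite3 module. A minimal sketch, keeping 3 of 10 rows; the table and column names are illustrative, not from the question:

```python
import sqlite3

# Sanity check of the keep-N delete pattern, scaled down to keep 3 of 10 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 11)])

# Keep the first 3 ids, delete the rest. SQLite accepts a LIMIT inside IN
# directly; MySQL needs an extra derived-table wrapper around it.
conn.execute("""
    DELETE FROM items
    WHERE id NOT IN (SELECT id FROM items ORDER BY id LIMIT 3)
""")

remaining = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]
print(remaining)  # [1, 2, 3]
```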

Related

Delete query using COUNT and MIN. MS Access 2010

I am trying to identify and remove duplicates from a data extract.
I have set up a query that groups by contract_number and counts > 1, which identifies the duplicate cases. Each has two contract_start_date values, of which I need to remove the earliest, so I have applied Min.
I am unable to run this as a delete query. I am fairly new to Access and SQL scripts.
SELECT Gas_Data.CONTRACT_NUMBER,
Count(Gas_Data.CONTRACT_NUMBER) AS CountOfCONTRACT_NUMBER,
Min(Gas_Data.CONTRACT_START_DATE) AS MinOfCONTRACT_START_DATE
FROM Gas_Data
GROUP BY Gas_Data.CONTRACT_NUMBER
HAVING (((Count(Gas_Data.CONTRACT_NUMBER))>1));
Try this approach, where the subquery identifies the records that are not to be deleted:
DELETE
*
FROM
Gas_Data
WHERE
Gas_Data.CONTRACT_START_DATE Not IN
(SELECT
Max(T.CONTRACT_START_DATE)
FROM
Gas_Data As T
WHERE
T.CONTRACT_NUMBER = Gas_Data.CONTRACT_NUMBER)
Of course, do make a backup first.
Consider the following:
delete a.*
from gas_data as a
where exists
(
select top 1 * from gas_data as b
where
a.contract_number = b.contract_number and
a.contract_start_date < b.contract_start_date
)
For every record, the above will test whether there is at least one other record in the dataset for which the contract number is equal and the start date is later. If such a record exists, the earlier record is deleted.
Always retain a backup of your data before running delete queries.
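As a sanity check of that EXISTS logic, here is a minimal sketch using SQLite via Python's sqlite3 module, with invented sample data. SQLite stands in for Access here, so the outer table keeps its own name instead of the alias a:

```python
import sqlite3

# SQLite sketch of the EXISTS approach: delete every row that has a later
# start date for the same contract number. Data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gas_data (contract_number TEXT, contract_start_date TEXT)")
conn.executemany("INSERT INTO gas_data VALUES (?, ?)",
                 [("C1", "2020-01-01"), ("C1", "2021-06-01"), ("C2", "2019-03-15")])

conn.execute("""
    DELETE FROM gas_data
    WHERE EXISTS (
        SELECT 1 FROM gas_data AS b
        WHERE gas_data.contract_number = b.contract_number
          AND gas_data.contract_start_date < b.contract_start_date
    )
""")

remaining = conn.execute(
    "SELECT contract_number, contract_start_date FROM gas_data ORDER BY contract_number"
).fetchall()
print(remaining)  # [('C1', '2021-06-01'), ('C2', '2019-03-15')]
```

Only the earlier C1 row is removed; the latest row per contract survives.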
Try the following. An aggregate such as Count cannot appear directly in a WHERE clause, so it has to move into the HAVING clause of a subquery:
DELETE FROM Gas_Data
WHERE CONTRACT_NUMBER IN
(SELECT T.CONTRACT_NUMBER
FROM Gas_Data AS T
GROUP BY T.CONTRACT_NUMBER
HAVING Count(T.CONTRACT_NUMBER) > 1)
AND CONTRACT_START_DATE =
(SELECT Min(T2.CONTRACT_START_DATE)
FROM Gas_Data AS T2
WHERE T2.CONTRACT_NUMBER = Gas_Data.CONTRACT_NUMBER)

Subquery returns more rows than straight same query in MySQL

I want to remove duplicates based on the combination of listings.product_id and listings.channel_listing_id
This simple query returns 400,000 rows (the ids of the rows I want to keep):
SELECT id
FROM `listings`
WHERE is_verified = 0
GROUP BY product_id, channel_listing_id
While this variation returns 1,600,000 rows, which is every record in the table, not only those with is_verified = 0:
SELECT *
FROM (
SELECT id
FROM `listings`
WHERE is_verified = 0
GROUP BY product_id, channel_listing_id
) AS keepem
I'd expect them to return the same number of rows.
What is the reason for this? How can I avoid it (so that I can use the subselect in the WHERE condition of the DELETE statement)?
EDIT: I found that doing a SELECT DISTINCT in the outer SELECT "fixes" it (it returns 400,000 records, as it should). I'm still not sure whether I should trust this subquery, since there is no DISTINCT in the DELETE statement.
EDIT 2: This seems to be just a bug in the way phpMyAdmin reports the total row count.
Your query as it stands is ambiguous. Suppose you have two listings with the same product_id and channel_listing_id. Then which id is supposed to be returned? The first? The second? Or both, ignoring the GROUP BY?
And what if there are several ids with different product and channel ids?
Try removing the ambiguity by selecting MAX(id) AS id and adding DISTINCT.
Are there any foreign keys to worry about? If not, you could pour the original table into a copy, empty the original and copy back in it the non-duplicates only. Messier, but you only do SELECTs or DELETEs guaranteed to succeed, and you also get to keep a backup.
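The copy-aside route might be sketched like this in SQLite via Python's sqlite3 module. The table and column names follow the question; the data, and the choice of MAX(id) as the survivor, are illustrative:

```python
import sqlite3

# Copy-aside dedup sketch: keep one id per (product_id, channel_listing_id),
# empty the original, copy the survivors back. Sample data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE listings (
    id INTEGER PRIMARY KEY, product_id INT, channel_listing_id INT, is_verified INT)""")
conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)",
                 [(1, 10, 100, 0),   # duplicate of id 2
                  (2, 10, 100, 0),
                  (3, 11, 101, 0)])

# 1. Copy the rows to keep (MAX(id) per pair) into a side table,
#    which doubles as the backup mentioned above.
conn.execute("""
    CREATE TABLE listings_keep AS
    SELECT MAX(id) AS id, product_id, channel_listing_id, is_verified
    FROM listings
    WHERE is_verified = 0
    GROUP BY product_id, channel_listing_id
""")
# 2. Empty the original.
conn.execute("DELETE FROM listings")
# 3. Copy back the non-duplicates only.
conn.execute("INSERT INTO listings SELECT * FROM listings_keep")

surviving = [r[0] for r in conn.execute("SELECT id FROM listings ORDER BY id")]
print(surviving)  # [2, 3]
```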
Assign aliases in order to avoid field reference ambiguity:
SELECT
keepem.*
FROM
(
SELECT
innerStat.id
FROM
`listings` AS innerStat
WHERE
innerStat.is_verified = 0
GROUP BY
innerStat.product_id,
innerStat.channel_listing_id
) AS keepem

single query to group data

I'm saving data to a MySQL database every 5 seconds and I want to aggregate this data into 5-minute averages.
The select is this:
SELECT MIN(revision) as revrev,
AVG(temperature),
AVG(humidity)
FROM dht22_sens t
Group by revision div 500
ORDER BY `revrev` DESC
Is it possible to save this aggregated data with a single query, possibly into the same table?
If the goal is to reduce the number of rows, then I think you have to insert new rows with the aggregated values and then delete the original, detailed rows. I don't know of any single SQL statement that inserts and deletes in one go (cf. a similar answer from symcbean at StackOverflow, who additionally suggests packing these two statements into a procedure).
I'd suggest adding an extra column aggregationLevel and running two statements (with or without a procedure):
insert into dht22_sens (revision, temperature, humidity, aggregationLevel)
SELECT MIN(t.revision),
AVG(t.temperature),
AVG(t.humidity),
500
FROM dht22_sens t
WHERE t.aggregationLevel IS NULL
GROUP BY t.revision DIV 500;
delete from dht22_sens where aggregationLevel is null;
Note the GROUP BY on revision DIV 500, mirroring the original SELECT; without it the INSERT would collapse the whole table into a single row.
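A scaled-down sketch of the aggregate-then-delete idea, using SQLite via Python's sqlite3 module: bucket size 3 instead of 500, and SQLite's integer division (/) in place of MySQL's DIV. The sample data is invented:

```python
import sqlite3

# Roll detail rows (aggregationLevel IS NULL) up into one summary row per
# bucket of 3 revisions, then delete the details.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dht22_sens (
    revision INT, temperature REAL, humidity REAL, aggregationLevel INT)""")
conn.executemany(
    "INSERT INTO dht22_sens (revision, temperature, humidity) VALUES (?, ?, ?)",
    [(0, 20.0, 50.0), (1, 21.0, 51.0), (2, 22.0, 52.0),
     (3, 30.0, 60.0), (4, 31.0, 61.0), (5, 32.0, 62.0)])

conn.execute("""
    INSERT INTO dht22_sens (revision, temperature, humidity, aggregationLevel)
    SELECT MIN(revision), AVG(temperature), AVG(humidity), 3
    FROM dht22_sens
    WHERE aggregationLevel IS NULL
    GROUP BY revision / 3
""")
conn.execute("DELETE FROM dht22_sens WHERE aggregationLevel IS NULL")

summary = conn.execute(
    "SELECT revision, temperature, humidity FROM dht22_sens ORDER BY revision"
).fetchall()
print(summary)  # [(0, 21.0, 51.0), (3, 31.0, 61.0)]
```

Six detail rows collapse into two summary rows, one per bucket, each keeping the bucket's lowest revision and the averages.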

How to run a Query in Access for a 30,000 records in a table of 800,000 records?

How to run a Query in Access for the first 30,000 records in a table of 800,000 records?
UPDATE Table1 SET TIME = TimeSerial(Left(TIME,2),Right(TIME,2),0);
OK, the first thing to remember is that records can come back in any order, so we want an ORDER BY to ensure that the top 30,000 records are the same each time (or, if you want the next 30,000 records, that you don't end up repeating yourself). I assume you have some sort of id; you can decide what to order by yourself.
What you're looking for is
UPDATE (SELECT TOP 30000 * FROM Table1 ORDER BY Table1.id) AS a
SET a.TIME = TimeSerial(Left(a.TIME,2),Right(a.TIME,2),0);
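The restrict-the-UPDATE-to-the-first-N-rows idea can be sanity-checked in SQLite via Python's sqlite3 module. Here an id IN (…) filter with LIMIT stands in for Access's TOP, and simple string surgery stands in for TimeSerial; all names and data are illustrative, with N = 2 instead of 30,000:

```python
import sqlite3

# Update only the first N rows by id; the rest stay untouched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, time TEXT)")
conn.executemany("INSERT INTO Table1 (time) VALUES (?)",
                 [("0930",), ("1245",), ("2310",)])

# 'HHMM' -> 'HH:MM:00', but only for the 2 lowest ids.
conn.execute("""
    UPDATE Table1
    SET time = substr(time, 1, 2) || ':' || substr(time, 3, 2) || ':00'
    WHERE id IN (SELECT id FROM Table1 ORDER BY id LIMIT 2)
""")

times = [r[0] for r in conn.execute("SELECT time FROM Table1 ORDER BY id")]
print(times)  # ['09:30:00', '12:45:00', '2310']
```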

MySQL how do i delete duplicate rows from very large table?

I need to know the most effective way of deleting duplicate rows from a very large table (over 1 billion rows), as this may take days if I execute an inefficient query.
I need to delete all duplicate URLs in the search table, i.e.:
DELETE FROM search WHERE (url) NOT IN
(
SELECT url FROM
(
SELECT url FROM search GROUP BY url
) X
);
That depends entirely on your indexes. Do this in two steps: (1) create the highest-selectivity index your DBMS supports on the URL field combined with any other field that can distinguish records with the same URL, such as a primary key or timestamp field; (2) write procedural code (not just a query) that processes a small fraction of the records at a time and commits the results in small batches, e.g. sliced by primary key mod 1000, or by the three characters of the URL preceding the .TLD part.
This is the best way to get a predictable result, unless you are sure the DB process won't run out of memory, log-file space, etc. during the long cycle of deletes a straight query would require.
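A batched cleanup along those lines, sliced by primary key mod a batch count, might look like this SQLite sketch (Python sqlite3; names, data, and batch count are illustrative). The keep-set of MIN(id) per url is stable across batches because those rows are never deleted:

```python
import sqlite3

# Batched dedup: each slice deletes only ids in its modulo class and
# commits separately, keeping each transaction small.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany("INSERT INTO search (url) VALUES (?)",
                 [("a.com",), ("a.com",), ("b.com",),
                  ("b.com",), ("b.com",), ("c.com",)])
conn.execute("CREATE INDEX idx_search_url ON search (url, id)")

BATCHES = 4
for batch in range(BATCHES):
    conn.execute(
        """DELETE FROM search
           WHERE id % ? = ?
             AND id NOT IN (SELECT MIN(id) FROM search GROUP BY url)""",
        (BATCHES, batch))
    conn.commit()  # small transactions keep locks and log space bounded

kept = [r[0] for r in conn.execute("SELECT id FROM search ORDER BY id")]
print(kept)  # [1, 3, 6]
```

Only the lowest id per url survives, regardless of which batch its duplicates fell into.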
DELETE from search
where id not in (
select keep_id from (
select min(id) as keep_id
from search
group by url
) as keepers
)
This keeps the lowest id for every url and deletes the rest; the derived table keepers works around MySQL's refusal to let a DELETE subquery read directly from the table being deleted from.