mysql delete rows limited by group - mysql

I have a big table of messages with date and room columns, and about 2 billion rows.
I now want to keep only the last 50 messages for every room and delete the earlier ones.
Can I do it with a fast query?
This question is unique: I didn't find any other question about deleting rows over a grouped and ordered selection.

You cannot do it in a fast query. You have a lot of data.
I would suggest creating a new table. You can then replace the data in your first table, if necessary.
Possibly the most efficient method to get the 50 rows -- assuming that date is unique within each room -- is a correlated subquery that finds the 50th most recent date per room:
select t.*
from t
where t.date >= coalesce((select t2.date
from t t2
where t2.room = t.room
order by t2.date desc
limit 1 offset 49
), t.date
);
For this to have any hope of performance you want an index on (room, date).
You can also try row_number() in MySQL 8+:
select . . . -- list the columns
from (select t.*, row_number() over (partition by room order by date desc) as seqnum
from t
) t
where seqnum <= 50;
Then you can replace the data by doing:
create table temp_t as
select . . . -- one of the select queries here;
truncate table t; -- this gets rid of all the data, so be careful
insert into t
select *
from temp_t;
Massive inserts are much more efficient than massive deletes or updates, because the old data does not need to be logged (nor the pages locked, among other things).
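The rebuild flow above can be sketched end to end with Python's sqlite3 module (SQLite 3.25+ window functions stand in for MySQL 8, and DELETE FROM stands in for TRUNCATE); the table and column names are made up to match the question, and N is 2 instead of 50 so the output stays small:

```python
# Sketch of the "rebuild instead of delete" flow on an in-memory SQLite DB.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (room INTEGER, date INTEGER, msg TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [(room, d, f"r{room}m{d}") for room in (1, 2) for d in range(5)])

# Keep only the last N messages per room (N = 2 here, 50 in the question).
con.execute("""
    CREATE TABLE temp_t AS
    SELECT room, date, msg
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY room ORDER BY date DESC) AS seqnum
          FROM t)
    WHERE seqnum <= 2""")
con.execute("DELETE FROM t")                 # TRUNCATE TABLE t in MySQL
con.execute("INSERT INTO t SELECT * FROM temp_t")

result = con.execute(
    "SELECT room, COUNT(*), MAX(date) FROM t GROUP BY room ORDER BY room").fetchall()
print(result)  # each room keeps only its two newest messages
```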

You can use the RANK() function to get the top 50 results for each group ordered by date descending, so the latest entries come first.
http://www.mysqltutorial.org/mysql-window-functions/mysql-rank-function/
Then you LEFT JOIN that subquery to your table on id (or on room and date, if those are unique and you don't have an id column).
The last step is to delete every row for which the subquery side is NULL, i.e. every row that did not make the top 50.
The full code will look something like this:
DELETE T FROM YOURTABLE T
LEFT JOIN (
SELECT ROOM, `DATE`,
RANK() OVER (PARTITION BY
ROOM
ORDER BY
`DATE` DESC
) DATE_RANK
FROM YOURTABLE
) AS T2
ON T.`DATE` = T2.`DATE`
AND T.ROOM = T2.ROOM
AND T2.DATE_RANK <= 50
WHERE T2.`DATE` IS NULL
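SQLite has no multi-table DELETE, so the same rank-based delete can be sketched there with NOT IN over the ranked ids instead of the LEFT JOIN; this sketch assumes an id primary key and keeps 3 rows per room instead of 50:

```python
# Rank-based delete rewritten with NOT IN, since SQLite (used here as a
# stand-in for MySQL 8) does not support DELETE ... JOIN.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, room INTEGER, date INTEGER)")
con.executemany("INSERT INTO yourtable (room, date) VALUES (?, ?)",
                [(room, d) for room in (1, 2) for d in range(5)])

con.execute("""
    DELETE FROM yourtable
    WHERE id NOT IN (
        SELECT id
        FROM (SELECT id,
                     RANK() OVER (PARTITION BY room ORDER BY date DESC) AS date_rank
              FROM yourtable)
        WHERE date_rank <= 3)""")

remaining = con.execute(
    "SELECT room, COUNT(*) FROM yourtable GROUP BY room ORDER BY room").fetchall()
print(remaining)  # the three newest rows survive per room
```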


Reduce records count in MySQL

I have a table with columns id, pointId, pointValue, and about 10,000,000 rows.
I have to reduce it by deleting 90% of the rows, so I need to keep every tenth record and delete all the others.
Using the table's id isn't a good idea because it is not a sequence of consecutive numbers.
How can I do it with a query?
I am not sure why you really need to delete so many millions of records, but the following approach may help you.
You can generate a dynamic row number, filter for every 10th row, and delete the rest of the records.
Example:
Assuming that the table name is 'points_table'.
delete from points_table
where id NOT IN (
select id from (
select @rn:=@rn+1 as row_num, p.id
from points_table p, (select @rn:=0) rn
order by p.id
) list_of_ids_to_be_kept
where row_num % 10 = 0
)
You can try it:-
DELETE FROM TABLE_NAME
WHERE ID NOT IN
(SELECT ID FROM (SELECT @RN:=@RN+1 AS RN, ID, POINTID, POINTVAL
FROM TABLE_NAME, (SELECT @RN:=0) T ORDER BY ID) T1
WHERE MOD(RN, 10) = 0)
This might help you.
A simple solution that deletes 90% of your rows, assuming you've got 10 million rows. Note that MySQL rejects LIMIT directly inside an IN subquery, so it has to be wrapped in a derived table, and that this removes the 9 million rows with the lowest ids rather than every tenth row:
DELETE FROM MyTable WHERE Id IN (SELECT Id FROM (SELECT Id FROM MyTable ORDER BY Id LIMIT 9000000) x);
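On MySQL 8+ (or any engine with window functions) the user-variable trick above can be replaced by ROW_NUMBER(); a small sketch with Python's sqlite3, using hypothetical non-consecutive ids as the question describes:

```python
# Keep every tenth row (in id order) and delete the rest, using ROW_NUMBER()
# in place of the @rn user variable, which SQLite does not have.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE points_table "
            "(id INTEGER PRIMARY KEY, pointId INTEGER, pointValue REAL)")
con.executemany("INSERT INTO points_table VALUES (?, ?, ?)",
                [(i * 7 + 3, i, float(i)) for i in range(100)])  # gappy ids

con.execute("""
    DELETE FROM points_table
    WHERE id NOT IN (
        SELECT id
        FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
              FROM points_table)
        WHERE rn % 10 = 0)""")

count = con.execute("SELECT COUNT(*) FROM points_table").fetchone()[0]
print(count)  # 10 rows of the original 100 remain
```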

mysql return multiple columns in subquery for previous and next in same table

I am trying to concoct a mysql query that returns, in a single row, the fields for a given row, as well as a few fields from a row that matches the "previous" position, and the same fields for the "next" position. I'm pretty new at mysql, but for all the scouring the net for answers, this is the best I can do:
SELECT *,
(select id
from mytable t2
where t2.added_on < t.added_on
order by t2.added_on DESC
limit 1
) as prev_id,
(select id
from mytable t3
where t3.added_on > t.added_on
order by t3.added_on
limit 1
) as next_id FROM mytable as t ORDER BY `added_on`
which works, but only gives me the id field for the "previous" and "next". As you may know, using * (or 'id', 'title') instead of id in the subqueries gives an error. I've looked into using JOINs and some other approaches but I'm just simply not getting it.
OK, I figured it out. Here is the solution for anyone trying to do the same.
SELECT t.*,
prev.id as prev_id,
prev.added_on as prev_added_on,
next.id as next_id,
next.added_on as next_added_on
FROM `TABLE_NAME` AS t
LEFT JOIN `TABLE_NAME` AS prev
ON prev.id =
(select id
from `TABLE_NAME` t2
where t2.added_on < t.added_on
order by t2.added_on DESC
limit 1)
LEFT JOIN `TABLE_NAME` AS next
ON next.id =
(select id
from `TABLE_NAME` t3
where t3.added_on > t.added_on
order by t3.added_on
limit 1 )
ORDER BY t.added_on DESC
That last ORDER BY actually causes the whole thing to be greatly optimized for some reason (my current understanding of MySQL doesn't let me say why for sure), and in my case it makes execution at least twice as fast as leaving it out.
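The accepted query runs essentially unchanged on SQLite, which makes the prev/next shape easy to see; a sketch with Python's sqlite3, keeping the question's mytable and added_on names and selecting just the ids for brevity:

```python
# Prev/next lookup via LEFT JOINs on correlated scalar subqueries.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, added_on INTEGER)")
con.executemany("INSERT INTO mytable VALUES (?, ?)",
                [(i, i * 10) for i in (1, 2, 3)])

rows = con.execute("""
    SELECT t.id, prev.id AS prev_id, next.id AS next_id
    FROM mytable AS t
    LEFT JOIN mytable AS prev
      ON prev.id = (SELECT id FROM mytable t2
                    WHERE t2.added_on < t.added_on
                    ORDER BY t2.added_on DESC LIMIT 1)
    LEFT JOIN mytable AS next
      ON next.id = (SELECT id FROM mytable t3
                    WHERE t3.added_on > t.added_on
                    ORDER BY t3.added_on LIMIT 1)
    ORDER BY t.added_on""").fetchall()
print(rows)  # [(1, None, 2), (2, 1, 3), (3, 2, None)]
```

The first row has no predecessor and the last no successor, so the LEFT JOINs yield NULL (None) there.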

SQL find distinct and show other columns

I have read many replies to similar questions but cannot seem to apply them to my situation. I have a table that averages 10,000 records and is ever changing. It contains a column called deviceID, which has about 20 unique values, another called dateAndTime, and many others including status1 and status2. I need to isolate one instance of each deviceID, showing the record with the most recent dateAndTime. This works great using:
select DISTINCT deviceID, MAX(dateAndTime)
from MyTable
Group By deviceID
ORDER BY MAX(dateAndTime) DESC
(I have noticed omitting DISTINCT from the above statement also yields the same result)
However, I cannot expand this statement to include the status fields without incurring errors or incorrect results. I have tried using IN and EXISTS and various syntax to isolate rows, all without luck. I am wondering how I can nest or re-write this query so that the results will display the unique deviceIDs, the date of the most recent record, and the corresponding status fields associated with those records.
If you can guarantee that the DeviceID + DateAndTime is UNIQUE you can do the following:
SELECT *
FROM
MyTable as T1,
(SELECT DeviceID, max(DateAndTime) as mx FROM MyTable group by DeviceID) as T2
WHERE
T1.DeviceID = T2.DeviceID AND
T1.DateAndTime = T2.mx
So basically what happens is, that you do a group by on the DeviceID (NOTE: A GROUP BY always goes with an aggregate function. We are using MAX in this case).
Then you join the Query with the Table, and add the DeviceID + DateAndTime in the WHERE clause.
Side note: GROUP BY returns distinct elements with or without adding DISTINCT, because each group produces exactly one output row.
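A minimal, runnable version of this join-on-MAX pattern, sketched with Python's sqlite3 and written with an explicit JOIN; deviceID, dateAndTime and status1 are the names from the question, and the data is made up:

```python
# Latest row per deviceID via a join against a per-group MAX subquery.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (deviceID INTEGER, dateAndTime INTEGER, status1 TEXT)")
con.executemany("INSERT INTO MyTable VALUES (?, ?, ?)",
                [(1, 10, "old"), (1, 20, "new"), (2, 15, "only")])

latest = con.execute("""
    SELECT T1.deviceID, T1.dateAndTime, T1.status1
    FROM MyTable AS T1
    JOIN (SELECT deviceID, MAX(dateAndTime) AS mx
          FROM MyTable GROUP BY deviceID) AS T2
      ON T1.deviceID = T2.deviceID AND T1.dateAndTime = T2.mx
    ORDER BY T1.deviceID""").fetchall()
print(latest)  # [(1, 20, 'new'), (2, 15, 'only')]
```

The status column rides along for free because the join pins down the full row, not just the grouped columns.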
Maybe (MySQL 8+, since it uses a window function):
SELECT a.*
FROM( SELECT *,
ROW_NUMBER() OVER (PARTITION BY deviceID ORDER BY dateAndTime DESC) as rown
FROM MyTable ) a
WHERE a.rown = 1

Select values with latest date and group by an other column

See my table and query here (on sqlfiddle). My query fetches exactly the results I need, but it takes a long time once the data exceeds 1000 rows. I need to make it efficient.
Select `id`, `uid`, case when left_leg > right_leg then left_leg
else right_leg end as Max_leg,`date` from
(
SELECT * FROM network t2 where id in (select id from
(select id,uid,`date` from (SELECT * FROM network order by uid,
`date` desc) t4 group by uid) t3) and
(left_leg>=500 or right_leg>=500))t1
I want to pick data from network against the latest date for each uid.
I want to pick only rows where left_leg >= 500 or right_leg >= 500.
I want to pick only the bigger of the two legs (left or right).
The whole query might have other problems, but the core issue is with this code:
SELECT * FROM network t2 where id in (select id from
(select id,uid,`date` from (SELECT * FROM network order by uid,
`date` desc) t4 group by uid) t3)
I want to improve this query because it fetches results very slowly as the data grows.
Taking your description:
first find the max date for each uid
then find the associate ids
filter on the leg values
Example:
SELECT id, uid, GREATEST(left_leg, right_leg) as max_leg
FROM network
WHERE (uid, `date`) IN (SELECT uid, MAX(`date`)
FROM network
GROUP BY uid)
AND (left_leg >= 500 or right_leg >= 500)
Note that this means that if the latest date for a uid does not have a leg >= 500, then that uid won't show up in the result. If you want the latest record among those with a leg >= 500, the leg filter has to be moved inside the subquery.
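A small sketch of this row-value IN pattern on SQLite 3.15+ (SQLite's two-argument MAX() plays the role of MySQL's GREATEST()); the data is made up so that uid 2's latest row fails the leg filter:

```python
# Latest row per uid via (uid, date) IN a per-group MAX subquery,
# then filter and take the greater of the two legs.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE network (id INTEGER PRIMARY KEY, uid INTEGER, "
            "left_leg INTEGER, right_leg INTEGER, date INTEGER)")
con.executemany("INSERT INTO network (uid, left_leg, right_leg, date) VALUES (?, ?, ?, ?)",
                [(1, 600, 100, 1), (1, 700, 800, 2),   # uid 1, latest date is 2
                 (2, 100, 200, 1)])                    # uid 2, legs below 500

rows = con.execute("""
    SELECT id, uid, MAX(left_leg, right_leg) AS max_leg
    FROM network
    WHERE (uid, date) IN (SELECT uid, MAX(date) FROM network GROUP BY uid)
      AND (left_leg >= 500 OR right_leg >= 500)""").fetchall()
print(rows)  # [(2, 1, 800)] -- uid 2's latest row fails the leg filter
```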
Add an index on (uid,date):
ALTER TABLE network ADD INDEX uid_date( uid, date )
and try something like:
SELECT n.id, n.uid, greatest( n.right_leg, n.left_leg ) as max_leg, n.date
FROM
( SELECT uid,max(date) as latest FROM network
WHERE right_leg>=500 OR left_leg >=500
GROUP BY uid
) ng
JOIN network n ON n.uid = ng.uid AND n.date = ng.latest

How do I write this kind of query (returning the latest available data for each row)

I have a table defined like this:
CREATE TABLE mytable (id INT NOT NULL AUTO_INCREMENT, PRIMARY KEY(id),
user_id INT REFERENCES user(id) ON UPDATE CASCADE ON DELETE RESTRICT,
amount REAL NOT NULL CHECK (amount > 0),
record_date DATE NOT NULL
);
CREATE UNIQUE INDEX idxu_mybl_key ON mytable (user_id, amount, record_date);
I want to write a query that will have two columns:
user_id
amount
There should be only ONE entry in the returned result set for a given user. Furthermore, the amount returned should be the last recorded amount for the user (i.e. the one at MAX(record_date)).
The complication arises because amounts are recorded on different dates for different users, so there is no single LAST record_date across all users.
How may I write (preferably an ANSI SQL) query to return the columns mentioned previously, but ensuring that its only the amount for the last recorded amount for the user that is returned?
As an aside, it is probably a good idea to return the record_date column as well, so that it is easier to verify that the query is working as required.
I am using MySQL as my backend db, but ideally the query should be db agnostic (i.e. ANSI SQL) if possible.
First you need the last record_date for each user:
select user_id, max(record_date) as last_record_date
from mytable
group by user_id
Now, you can join previous query with mytable itself to get amount for this record_date:
select
t1.user_id, last_record_date, amount
from
mytable t1
inner join
( select user_id, max(record_date) as last_record_date
from mytable
group by user_id
) t2
on t1.user_id = t2.user_id
and t1.record_date = t2.last_record_date
A problem appears because a user can have several rows for the same last_record_date (with different amounts). Then you should pick one of them, for example the maximum of the different amounts:
select
t1.user_id, t1.record_date as last_record_date, max(t1.amount)
from
mytable t1
inner join
( select user_id, max(record_date) as last_record_date
from mytable
group by user_id
) t2
on t1.user_id = t2.user_id
and t1.record_date = t2.last_record_date
group by t1.user_id, t1.record_date
I do not know about MySQL specifically, but in general SQL you need a sub-query for that. You must join the query that calculates the greatest record_date with the original table to pick up the corresponding amount. Roughly like this:
SELECT B.*
FROM
(select user_id, max(record_date) max_date from mytable group by user_id) A
join
mytable B
on A.user_id = B.user_id and A.max_date = B.record_date
SELECT datatable.* FROM
mytable AS datatable
INNER JOIN (
SELECT user_id,max(record_date) AS max_record_date FROM mytable GROUP BY user_id
) AS selectortable ON
selectortable.user_id=datatable.user_id
AND
selectortable.max_record_date=datatable.record_date
in some SQLs you might need
SELECT MAX(user_id), ...
in the selectortable view instead of simply SELECT user_id,...
The definition of maximum: there is no larger(or: "more recent") value than this one. This naturally leads to a NOT EXISTS query, which should be available in any DBMS.
SELECT user_id, amount
FROM mytable mt
WHERE mt.user_id = $user
AND NOT EXISTS ( SELECT *
FROM mytable nx
WHERE nx.user_id = mt.user_id
AND nx.record_date > mt.record_date
)
;
BTW: your table definition allows more than one row to exist for a given {user_id, record_date} pair, as long as the amounts differ. This query will return them all.
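The NOT EXISTS pattern is easy to check on any engine; here is a sketch with Python's sqlite3, dropping the `mt.user_id = $user` filter so every user's latest row comes back (the data is made up):

```python
# "No later row exists" formulation of latest-record-per-user.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, user_id INTEGER, "
            "amount REAL, record_date TEXT)")
con.executemany("INSERT INTO mytable (user_id, amount, record_date) VALUES (?, ?, ?)",
                [(1, 70.0, "2024-01-01"), (1, 71.5, "2024-02-01"),
                 (2, 80.0, "2024-01-15")])

rows = con.execute("""
    SELECT user_id, amount, record_date
    FROM mytable mt
    WHERE NOT EXISTS (SELECT * FROM mytable nx
                      WHERE nx.user_id = mt.user_id
                        AND nx.record_date > mt.record_date)
    ORDER BY user_id""").fetchall()
print(rows)  # [(1, 71.5, '2024-02-01'), (2, 80.0, '2024-01-15')]
```

ISO-8601 date strings compare correctly as text, which is what makes the `>` comparison on record_date safe here.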