I have a table with the columns id, pointId and pointValue and about 10,000,000 rows.
I have to reduce it by deleting 90% of the rows, so I need to keep every tenth record and delete all the others.
Using the id column isn't a good idea, because the ids are not a sequence of consecutive numbers.
How can I do this with a query?
I am not sure why you really need to delete so many millions of records, but the following approach may help.
You can generate a dynamic row number, use it to pick every 10th row, and delete the rest of the records.
Example, assuming that the table name is points_table:
delete from points_table
where id NOT IN (
select id from (
select @rn:=@rn+1 as row_num, p.id
from points_table p, (select @rn:=0) rn
) ids_to_keep
where row_num % 10 = 0
);
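On MySQL 8.0 and later the same idea can be written with ROW_NUMBER() instead of user variables, which also makes the numbering order explicit. The sketch below is only an illustration: it runs the statement against an in-memory SQLite database through Python's standard sqlite3 module (SQLite also supports window functions), with made-up sample rows whose ids are not consecutive. I have not run it against MySQL; there, the extra derived table is meant to work around MySQL's restriction on selecting from the table you are deleting from.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL 8+; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE points_table (id INTEGER PRIMARY KEY, pointId INTEGER, pointValue REAL)"
)
# 100 rows with non-consecutive ids 3, 6, 9, ..., 300.
conn.executemany(
    "INSERT INTO points_table (id, pointId, pointValue) VALUES (?, ?, ?)",
    [(i * 3, i, i * 0.5) for i in range(1, 101)],
)

# Number the rows in id order, keep every 10th one and delete all the others.
conn.execute("""
    DELETE FROM points_table
    WHERE id NOT IN (
        SELECT id FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS row_num
            FROM points_table
        ) AS ids_to_keep
        WHERE row_num % 10 = 0
    )
""")
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT id FROM points_table ORDER BY id")]
print(remaining)  # ids of the 10th, 20th, ..., 100th rows
```

Because the numbering comes from ORDER BY id, "every tenth row" is well defined even though the ids have gaps.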
You can try this:
DELETE FROM TABLE_NAME
WHERE ID NOT IN
(SELECT ID FROM (SELECT @RN:=@RN+1 AS RN, ID
FROM TABLE_NAME, (SELECT @RN:=0) T) T1
WHERE MOD(RN,10) = 0);
This might help you.
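A simpler solution deletes 90% of your rows, assuming you've got 10 million rows. Note that it keeps the last 10% of the rows by Id rather than every tenth row. MySQL does not support LIMIT inside an IN subquery, so the LIMIT goes on the DELETE itself:
DELETE FROM MyTable ORDER BY Id LIMIT 9000000;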
Table: MyTable
Here is the result I want: for every ID in the table, count how many times that ID appears in the Parent_ID column, and put the result in a computed column named Instances.
My desired result:
I imagine the above result can be obtained with something no more complicated than a working version of the following query:
SELECT ID, Parent_ID, COUNT( Parent_ID = ID ) AS Instances FROM MyTable
You can use a scalar subquery to compute the extra column. For example:
select
id,
parent_id,
(
select count(*) from my_table b where b.parent_id = a.id
) as instances
from my_table a
A correlated subquery is the simplest solution:
select t.*,
(select count(*) from mytable t2 where t2.parent_id = t.id) as instances
from mytable t;
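To see what the correlated subquery returns, here is a small sketch that uses Python's sqlite3 module with an in-memory SQLite database in place of MySQL; the rows are hypothetical. Ids that never appear in parent_id get 0, because COUNT(*) over no rows is 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, parent_id INTEGER)")
# Hypothetical rows: 1 is a root, 2 and 3 are children of 1, 4 is a child of 2.
conn.executemany(
    "INSERT INTO mytable (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2)],
)

# For each row, count the rows whose parent_id points at this row's id.
rows = conn.execute("""
    SELECT t.id, t.parent_id,
           (SELECT COUNT(*) FROM mytable t2 WHERE t2.parent_id = t.id) AS instances
    FROM mytable t
    ORDER BY t.id
""").fetchall()
print(rows)
```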
I have a big table of messages with date and room columns and 2 billion rows.
Now I want to keep only the last 50 messages for every room and delete the older messages.
Can I do this with a fast query?
This question is unique; I didn't find any other question about deleting rows based on a grouped and ordered selection.
You cannot do it in a fast query. You have a lot of data.
I would suggest creating a new table. You can then replace the data in your first table, if necessary.
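Possibly the most efficient method to get the 50 rows, assuming that date is unique for each room, is to compare each row with the 50th most recent date in its room (COALESCE keeps all rows of a room that has fewer than 50 messages):
select t.*
from t
where t.date >= coalesce((select t2.date
from t t2
where t2.room = t.room
order by t2.date desc
limit 1 offset 49
), t.date
);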
For this to have any hope of performance you want an index on (room, date).
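A scaled-down sketch of the cutoff query, assuming SQLite through Python's sqlite3 as a stand-in for MySQL, hypothetical data, and keeping 2 rows per room instead of 50 (KEEP is a hypothetical parameter). The offset is KEEP - 1, so the subquery returns the KEEP-th most recent date; room b has fewer rows than that, so COALESCE keeps it whole.

```python
import sqlite3

KEEP = 2  # hypothetical: stands in for 50 so the example stays small

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (room TEXT, date TEXT)")
conn.execute("CREATE INDEX t_room_date ON t (room, date)")  # the (room, date) index
conn.executemany(
    "INSERT INTO t (room, date) VALUES (?, ?)",
    [("a", "2024-01-01"), ("a", "2024-01-02"), ("a", "2024-01-03"),
     ("b", "2024-01-05")],
)

# Keep each row whose date is on or after the KEEP-th most recent date of its room.
rows = conn.execute("""
    SELECT t.room, t.date
    FROM t
    WHERE t.date >= COALESCE((SELECT t2.date
                              FROM t t2
                              WHERE t2.room = t.room
                              ORDER BY t2.date DESC
                              LIMIT 1 OFFSET ?), t.date)
    ORDER BY t.room, t.date
""", (KEEP - 1,)).fetchall()
print(rows)
```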
You can also try row_number() in MySQL 8+:
select . . . -- list the columns
from (select t.*, row_number() over (partition by room order by date desc) as seqnum
from t
) t
where seqnum <= 50;
Then you can replace the data by doing:
create table temp_t as
select . . . ; -- one of the select queries here
truncate table t; -- this gets rid of all the data, so be careful
insert into t
select *
from temp_t;
Massive inserts are much more efficient than massive deletes or updates, because the old data does not need to be logged (nor do the pages need to be locked, among other things).
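A small sketch of the copy, empty and reinsert steps, again using SQLite through Python's sqlite3 as a stand-in for MySQL (KEEP = 2 in place of 50, hypothetical data). SQLite has no TRUNCATE TABLE, so an unqualified DELETE takes its place here; on MySQL you would use TRUNCATE as shown above.

```python
import sqlite3

KEEP = 2  # hypothetical: stands in for 50

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, room TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO t (room, date) VALUES (?, ?)",
    [("a", "2024-01-01"), ("a", "2024-01-02"), ("a", "2024-01-03"),
     ("b", "2024-01-04"), ("b", "2024-01-05"), ("b", "2024-01-06")],
)

# 1. Copy the rows to keep; the column list leaves seqnum out of temp_t.
conn.execute(f"""
    CREATE TABLE temp_t AS
    SELECT id, room, date
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY room ORDER BY date DESC) AS seqnum
          FROM t) AS ranked
    WHERE seqnum <= {KEEP}
""")
# 2. SQLite has no TRUNCATE TABLE; an unqualified DELETE empties t instead.
conn.execute("DELETE FROM t")
# 3. Put the kept rows back and drop the copy.
conn.execute("INSERT INTO t SELECT * FROM temp_t")
conn.execute("DROP TABLE temp_t")
conn.commit()

rows = conn.execute("SELECT room, date FROM t ORDER BY room, date").fetchall()
print(rows)
```

Listing the columns in the copy matters: with SELECT t.*, seqnum would end up in temp_t and the final INSERT ... SELECT * would no longer match t's columns.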
You can use the RANK() function (MySQL 8.0+) to rank the messages of each room by date in descending order, so the most recent entries come first and the top 50 have ranks 1 to 50.
http://www.mysqltutorial.org/mysql-window-functions/mysql-rank-function/
Then LEFT JOIN that subquery to your table on id (or on room and date, if those are unique and your table has no id column).
The last step is to delete the rows that found no match in the subquery, i.e. those where the subquery's columns are NULL.
The full code will look something like this:
DELETE T FROM YOURTABLE T
LEFT JOIN (
SELECT ROOM, `DATE`,
RANK() OVER (PARTITION BY
ROOM
ORDER BY
`DATE` DESC
) AS DATE_RANK
FROM YOURTABLE
) AS T2
ON T.`DATE` = T2.`DATE`
AND T.ROOM = T2.ROOM
AND T2.DATE_RANK<=50
WHERE T2.`DATE` IS NULL;
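One thing to be aware of with RANK() is ties: rows with the same date in a room get the same rank, so a room can keep more than 50 rows if several messages share the cut-off date (ROW_NUMBER() would keep exactly 50). The sketch below shows this with SQLite through Python's sqlite3, hypothetical data and KEEP = 2 standing in for 50. Because SQLite has no multi-table DELETE ... JOIN, the LEFT JOIN ... IS NULL anti-join is written as NOT EXISTS, which selects the same rows.

```python
import sqlite3

KEEP = 2  # hypothetical: stands in for 50

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, room TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO yourtable (room, date) VALUES (?, ?)",
    [("a", "2024-01-01"), ("a", "2024-01-02"), ("a", "2024-01-02"),  # tie at rank 2
     ("a", "2024-01-03"),
     ("b", "2024-01-04"), ("b", "2024-01-05"), ("b", "2024-01-06")],
)

# Delete every row that has no partner with date_rank <= KEEP (the anti-join).
conn.execute(f"""
    DELETE FROM yourtable
    WHERE NOT EXISTS (
        SELECT 1
        FROM (SELECT room, date,
                     RANK() OVER (PARTITION BY room ORDER BY date DESC) AS date_rank
              FROM yourtable) AS t2
        WHERE t2.room = yourtable.room
          AND t2.date = yourtable.date
          AND t2.date_rank <= {KEEP}
    )
""")
conn.commit()

rows = conn.execute("SELECT room, date FROM yourtable ORDER BY room, date").fetchall()
print(rows)  # room "a" keeps 3 rows because of the tie
```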
I have a MySQL table where I have a certain id as a foreign key coming from another table. This id is not unique to this table so I can have many records holding the same id.
I need to find the ids that appear the fewest times in this table and get a list of them.
For example, if I have 5 records with id=1, 3 records with id=2 and 3 records with id=3, I want to pull up only ids 2 & 3. However, the data in the table changes quite often so I don't know what that minimum value is going to be at any given moment. The task is quite trivial if I use two queries but I'm trying to do it with just one. Here's what I have:
SELECT id
FROM table
GROUP BY id
HAVING COUNT(*) = MIN(SELECT COUNT(*) FROM table GROUP BY id)
If I substitute COUNT(*) = 3, then the results come up but using the query above gives me an error that MIN is not used properly. Any tips?
I would try this:
SELECT id
FROM table
GROUP BY id
HAVING COUNT(*) = (SELECT COUNT(*) FROM table GROUP BY id ORDER BY COUNT(*) LIMIT 1);
This gets the minimum by selecting the first row from the counts sorted in ascending order.
You need a double select in the having clause:
SELECT id
FROM table
GROUP BY id
HAVING COUNT(*) = (SELECT MIN(cnt) FROM (SELECT COUNT(*) as cnt FROM table GROUP BY id) t);
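Both single-statement versions can be checked on the question's own example (five rows with id 1, three rows each with ids 2 and 3). This sketch uses SQLite through Python's sqlite3 as a stand-in for MySQL and a hypothetical table name, records, since table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER)")
conn.executemany(
    "INSERT INTO records (id) VALUES (?)",
    [(1,)] * 5 + [(2,)] * 3 + [(3,)] * 3,
)

# Version 1: MIN() over a derived table of per-id counts.
nested_min = conn.execute("""
    SELECT id FROM records GROUP BY id
    HAVING COUNT(*) = (SELECT MIN(cnt)
                       FROM (SELECT COUNT(*) AS cnt FROM records GROUP BY id) AS t)
    ORDER BY id
""").fetchall()

# Version 2: the smallest count via ORDER BY ... LIMIT 1.
order_limit = conn.execute("""
    SELECT id FROM records GROUP BY id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM records GROUP BY id
                       ORDER BY COUNT(*) LIMIT 1)
    ORDER BY id
""").fetchall()
print(nested_min, order_limit)
```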
The MIN() aggregate function is supposed to take a column (an expression), not a query. So I see two ways to solve this:
Write the subquery properly, or
Use a user-defined variable
First alternative:
select id
from yourTable
group by id
having count(id) = (
select min(c) from (
select count(*) as c from yourTable group by id
) as a
)
Second alternative:
set @minCount = (
select min(c) from (
select count(*) as c from yourTable group by id
) as a
);
select id
from yourTable
group by id
having count(*) = @minCount;
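The second alternative can also be done from application code, passing the minimum as a bound parameter instead of a session variable. A minimal sketch with Python's sqlite3 and an in-memory SQLite database (the function least_frequent_ids and the sample data are hypothetical):

```python
import sqlite3

def least_frequent_ids(conn):
    """Return the ids that occur the fewest times, using two statements."""
    # Statement 1: the minimum per-id count (None if the table is empty).
    (min_count,) = conn.execute(
        "SELECT MIN(c) FROM (SELECT COUNT(*) AS c FROM yourTable GROUP BY id) AS a"
    ).fetchone()
    # Statement 2: the ids whose count equals that minimum, passed as a parameter.
    rows = conn.execute(
        "SELECT id FROM yourTable GROUP BY id HAVING COUNT(*) = ? ORDER BY id",
        (min_count,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER)")
conn.executemany(
    "INSERT INTO yourTable (id) VALUES (?)",
    [(7,)] * 2 + [(8,)] * 4 + [(9,)] * 2 + [(10,)] * 3,
)
print(least_frequent_ids(conn))
```

On an empty table there are no groups, so the function returns an empty list.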
You need a GROUP BY to produce the set of per-id counts and an additional SELECT to get the MIN value from that set; only then can you compare against it in the HAVING clause:
SELECT id FROM table GROUP BY id
HAVING COUNT(*) =
(SELECT MIN(X.CNT) AS M FROM(SELECT COUNT(*) CNT FROM table GROUP BY id) AS X)
I am using a query like this:
select * from audittable where a_id IN (1,2,3,4,5,6,7,8);
For each ID it returns 5-6 records. I want to get the last but one record for each ID.
Can I do this in one SQL statement?
Try this query:
SELECT
*
FROM
(SELECT
@rn:=IF(@prv=a_id, @rn+1, 1) AS rId,
@prv:=a_id AS a_id
-- add ", other_column, ..." here for the remaining columns
FROM
audittable
JOIN
(SELECT @rn:=0, @prv:=0) t
WHERE
a_id IN (1,2,3,4,5,6,7,8)
ORDER BY
a_id, <column> DESC) tmp -- replace <column> with the column that determines which record is the last one
WHERE
rId=2; -- rId=1 is the last record per a_id, rId=2 the last but one
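On MySQL 8.0+ the per-id numbering can be done with ROW_NUMBER() instead of user variables. A sketch using SQLite through Python's sqlite3 as a stand-in, with hypothetical rows and a hypothetical datecreated column deciding which record is the last; an a_id with only one record has no last-but-one row and is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical audit rows; datecreated decides which record is the last one.
conn.execute("CREATE TABLE audittable (a_id INTEGER, datecreated TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO audittable VALUES (?, ?, ?)",
    [(1, "2024-01-01", "1-first"), (1, "2024-01-02", "1-second"),
     (1, "2024-01-03", "1-third"),
     (2, "2024-02-01", "2-first"), (2, "2024-02-02", "2-second"),
     (3, "2024-03-01", "3-only")],
)

rows = conn.execute("""
    SELECT a_id, datecreated, note
    FROM (SELECT a.*,
                 ROW_NUMBER() OVER (PARTITION BY a_id
                                    ORDER BY datecreated DESC) AS rid
          FROM audittable a
          WHERE a_id IN (1, 2, 3, 4, 5, 6, 7, 8))
    WHERE rid = 2  -- rid = 1 is the last record, rid = 2 the last but one
    ORDER BY a_id
""").fetchall()
print(rows)
```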
If your table has a DateCreated column (or any other column that records when each row was inserted), you can use a query like this:
select at1.* from audittable at1
where at1.a_id in (1,2,3,4,5,6,7,8)
and at1.datecreated = (
select at2.datecreated from audittable at2
where at2.a_id = at1.a_id
order by at2.datecreated desc
limit 1 offset 1
);
LIMIT 1 OFFSET 1 skips the most recent row and returns the one before it; use select max(datecreated) (without the ORDER BY and LIMIT) instead if you want the last record.
Hope this works for you.
Suppose that in SQLite you have the columns a_id and b, so each a_id has a set of b values, and you want the latest/highest b (the maximum in terms of rowid, date or another naturally increasing value):
SELECT MAX(b), *
FROM audittable
GROUP BY a_id
Here MAX gets the maximum b from each group, and SQLite takes the other columns from the row that holds that maximum.
The bad news is that MySQL doesn't associate MAX(b) with the other columns selected by *, so there it is only useful for a simple table with just the a_id and b columns.
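The SQLite behaviour described above (documented for the min() and max() aggregates) can be seen in a short sketch with Python's sqlite3; the note column and the rows are hypothetical. The bare note column comes from the row that holds MAX(b).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audittable (a_id INTEGER, b INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO audittable VALUES (?, ?, ?)",
    [(1, 10, "old"), (1, 30, "newest"), (1, 20, "middle"), (2, 5, "only")],
)

# note is a bare column: SQLite takes it from the row that holds MAX(b).
rows = conn.execute(
    "SELECT a_id, MAX(b), note FROM audittable GROUP BY a_id ORDER BY a_id"
).fetchall()
print(rows)
```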
I have a series of tables that contain data in the same format, i.e. a UNION of them would work.
Conceptually you can think of it as 1 table partitioned into multiple tables.
I want to get the data from all of these tables sorted.
The problem is that there is too much data to display to the user all at once, so I need to display it in portions, i.e. pages.
Now my problem is that I need to display the data sorted (as already said).
So if I do something like:
SELECT * FROM TABLE_1
UNION
SELECT * FROM TABLE_2
UNION
...
SELECT * FROM TABLE_N
ORDER BY COL
LIMIT OFFSET, RECORDS;
I would constantly be doing a UNION and ORDER BY on each request just to get, e.g., the 50 records of the requested page.
So how can I handle this most efficiently?
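My first attempt would be UNIONing just a small number of records from each table. Every row on the requested page must be among the first offset + records rows of its own table, so each table only needs to contribute that many. The LIMIT values have to be integer constants filled in by the application (or ? placeholders in a prepared statement), because MySQL does not accept user variables in LIMIT:
( SELECT * FROM table_1 ORDER BY col LIMIT <offset + records> )
UNION
...
( SELECT * FROM table_N ORDER BY col LIMIT <offset + records> )
ORDER BY col LIMIT <offset>, <records>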
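A sketch to check the per-table limit idea, using SQLite through Python's sqlite3 with three small hypothetical tables. It uses UNION ALL (with UNION, duplicate rows inside one table could push page rows past the per-table limit) and breaks ties on label so the order is total. SQLite does not allow ORDER BY/LIMIT directly on a UNION member, so each member is wrapped in a subquery. The result is compared with sorting the complete UNION ALL.

```python
import sqlite3

def page_from_union(conn, tables, offset, records):
    """Page through the union, reading at most offset + records rows per table."""
    per_table = offset + records
    # Table names are trusted constants here. SQLite rejects ORDER BY/LIMIT on a
    # bare UNION member, so each member is wrapped in a subquery.
    parts = [
        f"SELECT * FROM (SELECT col, label FROM {name} "
        f"ORDER BY col, label LIMIT {per_table})"
        for name in tables
    ]
    sql = " UNION ALL ".join(parts) + f" ORDER BY col, label LIMIT {records} OFFSET {offset}"
    return conn.execute(sql).fetchall()

def page_naive(conn, tables, offset, records):
    """Reference result: sort the complete union, then take the page."""
    parts = [f"SELECT col, label FROM {name}" for name in tables]
    sql = " UNION ALL ".join(parts) + f" ORDER BY col, label LIMIT {records} OFFSET {offset}"
    return conn.execute(sql).fetchall()

# Hypothetical tables; label is unique, so (col, label) gives a total order.
conn = sqlite3.connect(":memory:")
data = {"table_1": range(0, 60, 3), "table_2": range(1, 40, 2), "table_3": range(100, 110)}
for name, values in data.items():
    conn.execute(f"CREATE TABLE {name} (col INTEGER, label TEXT)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)",
                     [(v, f"{name}-{v}") for v in values])

tables = list(data)
print(page_from_union(conn, tables, offset=10, records=5))
```

The saving grows with the number of rows per table, but deep pages still read offset + records rows from every table.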
If the above proves insufficient, I would build a manual index table (based on David Starkey's clever suggestion).
CREATE TABLE index_table (
table_id INT,
item_id INT,
col DATETIME,
INDEX (col, table_id, item_id)
);
Then populate index_table with a method of your choice (cron job, triggers on the tables table_n, ...). Your SELECT statement would then look like this:
SELECT *
FROM ( SELECT * FROM index_table ORDER BY col LIMIT <offset>, <records> ) AS idx
LEFT JOIN table_1 ON (idx.table_id = 1 AND idx.item_id = table_1.id)
...
LEFT JOIN table_n ON (idx.table_id = n AND idx.item_id = table_n.id)
ORDER BY idx.col;
However, I am not sure how such a query would perform with so many LEFT JOINs. It really depends on how many tables table_n there are.
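A small sketch of the index-table lookup, using SQLite through Python's sqlite3 with two hypothetical tables and triggers (one of the population options mentioned above) that copy each new row's id and col into index_table. COALESCE picks the column from whichever LEFT JOIN matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE index_table (table_id INTEGER, item_id INTEGER, col TEXT)")
conn.execute("CREATE INDEX index_table_col ON index_table (col, table_id, item_id)")

# Two hypothetical data tables; a trigger on each one keeps index_table up to date.
for table_id, name in [(1, "table_1"), (2, "table_2")]:
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, col TEXT, payload TEXT)")
    conn.execute(f"""
        CREATE TRIGGER {name}_to_index AFTER INSERT ON {name}
        BEGIN
            INSERT INTO index_table (table_id, item_id, col)
            VALUES ({table_id}, NEW.id, NEW.col);
        END
    """)

conn.executemany("INSERT INTO table_1 (col, payload) VALUES (?, ?)",
                 [("2024-01-01", "a1"), ("2024-01-03", "a2")])
conn.executemany("INSERT INTO table_2 (col, payload) VALUES (?, ?)",
                 [("2024-01-02", "b1"), ("2024-01-04", "b2")])

PAGE_SQL = """
    SELECT idx.col, COALESCE(t1.payload, t2.payload) AS payload
    FROM (SELECT * FROM index_table
          ORDER BY col, table_id, item_id
          LIMIT ? OFFSET ?) AS idx
    LEFT JOIN table_1 t1 ON idx.table_id = 1 AND idx.item_id = t1.id
    LEFT JOIN table_2 t2 ON idx.table_id = 2 AND idx.item_id = t2.id
    ORDER BY idx.col, idx.table_id, idx.item_id
"""

def page(conn, offset, records):
    """Return one page: only the page's index rows are joined to the data tables."""
    return conn.execute(PAGE_SQL, (records, offset)).fetchall()

print(page(conn, offset=1, records=2))
```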
Finally, I would consider merging all tables into one single table.