I have a table in SQL Server 2005 and 2008 databases with a periodic job that truncates it, and another table with a periodic job that deletes rows from the end. My question: if a select has started before the truncate or delete job starts, will the select see the pre-truncate/delete data, or could some data be removed before the select reads it? The selects would use one table or the other, not join both.
The statements would be along these lines:
select * from someTable where id > x and id < y
truncate table someTable
delete from otherTable where id < z
TRUNCATE will wait for the SELECT to finish. DELETE will not. DELETE can remove rows before the SELECT has had a chance to read them. Whether this will really happen depends on more factors, like your table organization (indexes, clustered index) and cardinality (number of rows in the table, number of rows between x and y, number of rows below y).
You can prevent this from happening by deploying row versioning (snapshot-based isolation).
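In SQL Server that means enabling one of the row-versioning-based isolation levels; a minimal sketch, assuming a database named YourDb:
ALTER DATABASE YourDb SET ALLOW_SNAPSHOT_ISOLATION ON;
-- or, to make the default READ COMMITTED level use row versions:
ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON;
With versioning in effect the SELECT reads committed data as of its snapshot instead of waiting on, or losing rows to, the concurrent DELETE.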
I have a MySQL database with just 1 table:
Fields are: blocknr (not unique), btcaddress (not unique), txid (not unique), vin, vinvoutnr, netvalue.
Indexes exist on both btcaddress and txid.
Data in it looks like this:
I need to delete all "deletable" record pairs. An example is given in red.
Conditions are:
txid must be the same (there can be more than 2 records with same txid)
vinvoutnr must be the same
vin must be different (it can have only two values, 0 and 1, so one record must have 0 and the other must have 1)
In a table of 36M records, about 33M records will be deleted.
I've used this:
delete t1
from registration t1
inner join registration t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
It works but takes 5 hours.
Maybe this would work too (not tested yet):
delete t1
from registration as t1, registration as t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
Or should I forget about a delete query, build a new table with all the non-deletable rows in it, and then drop the original?
Database can be offline for this delete query.
Based on your question, you are deleting most of the rows in the table. That is just really expensive. A better approach is to empty the table and re-populate it:
create table temp_registration as
<query for the rows to keep here>;
truncate table registration;
insert into registration
select *
from temp_registration;
Your logic is a bit hard to follow, but I think the query for the rows to keep is:
select r.*
from registration r
where not exists (select 1
from registration r2
where r2.txid = r.txid and
r2.vinvoutnr = r.vinvoutnr and
r2.vin <> r.vin
);
For best performance, you want an index on registration(txid, vinvoutnr, vin).
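A minimal sketch of that index (the name is just illustrative):
create index idx_registration_txid_vinvoutnr_vin on registration (txid, vinvoutnr, vin);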
Given that you expect to remove the majority of your data, it does sound like the simplest approach would be to create a new table with the correct data and then drop the original table, as you suggest. Otherwise, ADyson's corrections to the JOIN query might help to alleviate the performance issue.
This fairly obvious question has very few (I couldn't find any) solid answers.
I run a simple select on a table of 2 million rows.
select count(id) as total from big_table
On any machine I try this query on, it usually takes at least 5 seconds to complete. This is unacceptable for real-time queries.
The reason I need an exact value of rows fetched is for precise statistical calculations later on.
Using the last auto-increment value is unfortunately not an option because rows also get deleted periodically.
It can indeed be slow when running on an InnoDB engine. As stated in section 14.24 of the MySQL 5.7 Reference Manual, “InnoDB Restrictions and Limitations”, 3rd bullet point:
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
For information about how InnoDB processes SELECT COUNT(*) statements, refer to the COUNT() description in Section 12.20.1, “Aggregate Function Descriptions”.
The suggested solution is a counter table. This is a separate table with one row and one column, holding the current record count. It could be kept updated via triggers. Something like this:
create table big_table_count (rec_count int default 0);
-- one-shot initialisation:
insert into big_table_count select count(*) from big_table;
create trigger big_insert after insert on big_table
for each row
update big_table_count set rec_count = rec_count + 1;
create trigger big_delete after delete on big_table
for each row
update big_table_count set rec_count = rec_count - 1;
You can see a fiddle here, where you should alter the insert/delete statements in the build section to see the effect on:
select rec_count from big_table_count;
You could extend this to several tables, either by creating such a counter table for each one, or by reserving a row per table in a shared counter table keyed by a column "table_name".
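A hedged sketch of the shared, keyed variant (the table name is illustrative, and these triggers replace the single-table ones above):
create table table_counts (table_name varchar(64) primary key, rec_count int default 0);
-- one-shot initialisation:
insert into table_counts select 'big_table', count(*) from big_table;
create trigger big_insert after insert on big_table
for each row
update table_counts set rec_count = rec_count + 1 where table_name = 'big_table';
create trigger big_delete after delete on big_table
for each row
update table_counts set rec_count = rec_count - 1 where table_name = 'big_table';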
Improving concurrency
The above method does have an impact if you have many concurrent sessions inserting or deleting records, because they need to wait for each other to complete the update of the counter.
A solution is to not let the triggers update the same, single record, but to let them insert a new record, like this:
create trigger big_insert after insert on big_table
for each row
insert into big_table_count (rec_count) values (1);
create trigger big_delete after delete on big_table
for each row
insert into big_table_count (rec_count) values (-1);
The way to get the count then becomes:
select sum(rec_count) from big_table_count;
Then, once in a while (e.g. daily) you should re-initialise the counter table to keep it small:
truncate table big_table_count;
insert into big_table_count select count(*) from big_table;
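One way to schedule that daily reset is sketched below with a MySQL event (an assumption; a cron job works just as well, and the delimiter lines are only needed in the mysql command-line client):
delimiter //
create event big_table_count_reset
on schedule every 1 day
do
begin
  truncate table big_table_count;
  insert into big_table_count select count(*) from big_table;
end//
delimiter ;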
UPDATE myTable SET niceColumn=1 WHERE someVal=1;
SELECT * FROM myTable WHERE someVal=1;
Is there a way to combine these two queries into one? I mean, can I run an update query that also shows the rows it updated? Here I use the same "WHERE someVal=1" filter twice, which I don't want. Also, if someVal changes before the select query runs, I will have trouble knowing what I get (e.g. the update changes a row and afterwards another script sets its someVal to 0).
Wrap the two queries in a transaction with the desired ISOLATION LEVEL so that no other threads can affect the locked rows between the update and the select.
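A minimal sketch, assuming MySQL/InnoDB and the strictest level:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
UPDATE myTable SET niceColumn = 1 WHERE someVal = 1;
SELECT * FROM myTable WHERE someVal = 1; -- within this transaction, shows the rows just updated
COMMIT;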
Actually, even what you have done will not show the rows it updated, because meanwhile (after the update) some process may add/change rows.
And this will show all the records, including the ones updated yesterday :)
If I want to see exactly which rows were changed, I would go with a temp table: first select into a temp table all the row IDs to be updated, then perform the update based on the IDs in the temp table, and finally return those rows.
CREATE TEMPORARY TABLE to_be_updated
SELECT id
FROM myTable
WHERE someVal = 1;
UPDATE myTable
SET niceColumn = 1
WHERE id IN (SELECT * FROM to_be_updated);
SELECT *
FROM myTable
WHERE id IN (SELECT * FROM to_be_updated)
If in your real code the conditional part (where and so on) is too long to repeat, just put it in a variable that you use in both queries.
Unless you encounter a different problem, you shouldn't need these two combined.
I currently have a MySQL table of about 20 million rows, and I need to prune it. I'd like to remove every row whose updateTime (timestamp of insertion) was more than one month ago. I have not personally performed any alterations of the table's order, so the data should be in the order in which it was inserted, and there is a UNIQUE key on two fields, id and updateTime. How would I go about doing this in a short amount of time?
How much down time can you incur? How big are the rows? How many are you deleting?
Simply put, deleting rows is one of the most expensive things you can do to a table. It's just a horrible thing overall.
If you don't have to do it, and you have the disk space for it, and your queries aren't affected by the table size (well indexed queries typically ignore table size), then you may just leave well enough alone.
If you have the opportunity and can take the table offline (and you're removing a good percentage of the table), then your best bet would be to copy the rows you want to keep to a new table, drop the old one, rename the new one to the old name, and THEN recreate your indexes.
Otherwise, you're pretty much stuck with good ol' delete.
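A minimal sketch of that copy-drop-rename approach (the table name and the one-month cutoff are assumptions based on the question):
CREATE TABLE myTable_new LIKE myTable; -- or define it without secondary indexes
INSERT INTO myTable_new
SELECT * FROM myTable WHERE updateTime >= NOW() - INTERVAL 1 MONTH; -- the rows to keep
DROP TABLE myTable;
RENAME TABLE myTable_new TO myTable;
-- recreate any indexes you left out above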
There are two ways to remove a large number of rows. First there is the obvious way:
DELETE FROM table1 WHERE updateTime < NOW() - interval 1 month;
The second (slightly more complicated) way is to create a new table and copy the data that you want to keep, truncate your old table, then copy the rows back.
CREATE TABLE table2 AS
SELECT * FROM table1 WHERE updateTime >= NOW() - interval 1 month;
TRUNCATE table1;
INSERT INTO table1
SELECT * FROM table2;
Using TRUNCATE is much faster than a DELETE with a WHERE clause when you have a large number of rows to delete and a relatively small number that you wish to keep.
Splitting the deletes with LIMIT might speed up the process.
I had to delete 10M rows and I issued the command. It never responded for hours.
I killed the query (which itself took a couple of hours), then split the deletes:
DELETE from table where id > XXXX limit 10000;
DELETE from table where id > XXXX limit 10000;
DELETE from table where id > XXXX limit 10000;
DELETE from table where id > XXXX limit 10000;
Then I duplicated this statement in a file and ran it with the command:
mysql> source /tmp/delete.sql
This was much faster.
You can also try tools from the Percona Toolkit, such as pt-archiver.
Actually, even if you can't take the table offline for long, you can still use the RENAME TABLE technique to get rid of old data.
Stop processes writing to the table.
rename table tableName to tmpTableName;
create table tableName like tmpTableName;
set @currentId=(select max(id) from tmpTableName);
set @currentId=@currentId+1;
set @indexQuery = CONCAT("alter table tableName auto_increment = ", @currentId);
prepare stmt from @indexQuery;
execute stmt;
deallocate prepare stmt;
Start processes writing to the table.
insert into tableName
select * from tmpTableName;
drop table tmpTableName;
New inserts into tableName will begin at the correct auto_increment value; the old data will be re-inserted with its original ids.
I use a one-column MEMORY table to keep track of views on various items in my DB. Each view = one INSERT into the memory table. Every 10 minutes, I want to count() the rows for each item and commit the changes to the DB.
The question is.... if I run the query that will get the list of all items, such as
SELECT COUNT(*) AS period_views, `item_id` FROM `-views` GROUP BY `item_id` ORDER BY `item_id`
and then run an update query for each row to add the number of views in that period, and then truncate the table. This operation might take a few seconds, and in those few seconds there are going to be other INSERTs into that table that didn't make it into the original count. Will they be truncated too once that command executes, or will the table be locked until the entire operation completes and the new INSERTs added afterwards?
MySQL does not lock the table automatically, and it is possible that you will lose some records in between getting the count and performing the truncate. So two solutions jump out at me:
1) Use table locks to prevent the memory table from being updated. Depending on the nature of your application, this means that all of your clients might freeze for a few seconds while you are updating, which might be OK (a locking sketch follows after option 2 below).
2) Add a second column to keep track of which records you are currently updating ...
ALTER TABLE `-views` ADD work_in_progress TINYINT NOT NULL DEFAULT 0;
And then when you want to work on those records:
UPDATE `-views` SET work_in_progress = 1;
SELECT COUNT(*) AS period_views, `item_id` FROM `-views` WHERE work_in_progress GROUP BY `item_id` ORDER BY `item_id`;
# [ perform updates as necessary ]
DELETE FROM `-views` WHERE work_in_progress;
This implementation will guarantee that you don't delete any -views which were added while you were updating.
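For option 1, a minimal sketch with table locks (the items table name is an assumption):
LOCK TABLES `-views` WRITE, items WRITE;
SELECT COUNT(*) AS period_views, `item_id` FROM `-views` GROUP BY `item_id` ORDER BY `item_id`;
-- perform the item updates from those counts here
DELETE FROM `-views`; -- safe to clear now; blocked INSERTs resume after UNLOCK
UNLOCK TABLES;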
And FWIW, -views is an awful name for a table!