Handling Deletes/Inserts/Select in a huge table - mysql

I have a dating website. On this website I send 10 photo matches to each user daily and store them in a structure like this:
SENDER RECEIVER
11 1
12 1
13 1
14 1
I maintain a two-month log.
Users can also check their matches by logging in to my website, which means there are parallel inserts and selects; that by itself is not an issue.
The problem is that when a user becomes inactive or deletes their ID, I need to remove all entries from the log where sender = 'inactive-id'.
The log is roughly 60 million rows.
So whenever a delete query hits this huge table, all selects get locked and my site goes down.
Note that my table is a MERGE MyISAM table, because I need to keep 2-3 months of records and on the 1st of every month I change the MERGE definition.

Normally, the table is the most granular object locked by a DELETE statement in MyISAM. Therefore, by using a MERGE table you combine several objects that could be locked independently into a single big object that gets locked when a DELETE hits ANY of its underlying tables.
MERGE is a solution for tables which change rarely or never: MERGE Table Advantages and Disadvantages.
You have 2 options:
Minimise the impact of locks:
Delete in small batches
Run the delete job during low-load hours
Consider not deleting at all, if it does not save you much space
Instead of deleting rows, mark them as "deleted" or obsolete and exclude them from SELECT queries
Have smaller objects locked (rather than locking all your tables at once):
Use several DELETE statements to delete from each of the underlying tables
Drop the MERGE definition, delete the data from each underlying table, then recreate the MERGE table. However, I think you can do it without dropping the MERGE definition.
Use partitioning.
Quote from MySQL Manual:
An alternative to a MERGE table is a partitioned table, which stores
partitions of a single table in separate files. Partitioning enables
some operations to be performed more efficiently and is not limited to
the MyISAM storage engine. For more information, see Chapter 18, Partitioning.
I would strongly advocate for partitioning, because:
- You can fully automate your logging / data retention process: a script can create new and remove empty partitions, move obsolete data to a different table and then truncate that table.
- Key uniqueness is enforced.
- Only the partition that contains the data to be deleted is locked. Selects on other partitions run as normal.
- Searches run on all partitions at the same time (as with MERGE), but you can use HASH subpartitioning to further speed up searches.
However, if you believe that the benefits of partitioning will be outweighed by the cost of development, then maybe you should not delete that data at all? (A rough sketch of a partitioned log table follows below.)
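To make this concrete, here is a minimal sketch of a monthly RANGE-partitioned log table. The table name (matchlog) and the date column (sent_on) are assumptions, since the question only shows sender/receiver columns; adjust the partition boundaries to your own retention window:
CREATE TABLE matchlog (
  sender   INT NOT NULL,
  receiver INT NOT NULL,
  sent_on  DATE NOT NULL
) ENGINE = MyISAM
PARTITION BY RANGE (TO_DAYS(sent_on)) (
  PARTITION p2012_11 VALUES LESS THAN (TO_DAYS('2012-12-01')),
  PARTITION p2012_12 VALUES LESS THAN (TO_DAYS('2013-01-01')),
  PARTITION p2013_01 VALUES LESS THAN (TO_DAYS('2013-02-01')),
  PARTITION pmax     VALUES LESS THAN MAXVALUE
);
-- Monthly retention: dropping an old partition is almost instant and does not
-- block the remaining partitions the way a big DELETE would.
ALTER TABLE matchlog DROP PARTITION p2012_11;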

I think the best solution would be to set up partitions on the log table based on user ID. That way, when you run a delete, the DB will block only one partition; see the sketch below.
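As a rough sketch of that idea (the table name matchlog is assumed, the partition count is arbitrary, and whether the DELETE locks only the pruned partition depends on your MySQL version):
CREATE TABLE matchlog (
  sender   INT NOT NULL,
  receiver INT NOT NULL
)
PARTITION BY HASH (sender)
PARTITIONS 16;
-- With partition pruning, this DELETE only needs to touch the one partition
-- that sender 12345 hashes into:
DELETE FROM matchlog WHERE sender = 12345;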

If you Google on "delete on huge table" you'll get some informative results. Here are the first three hits:
http://www.dba-oracle.com/t_oracle_fastest_delete_from_large_table.htm
Improving DELETE and INSERT times on a large table that has an index structure
http://www.dbforums.com/microsoft-sql-server/1635228-fastest-way-delete-large-table.html
One method they all mention is deleting in small batches instead of all at once. You say that the table contains data for a 2-month period. Maybe you could run a separate delete statement for each day?
I hope this helps!

If you use InnoDB and create FOREIGN KEY relations, the rows can be deleted automatically when the user is deleted:
CREATE TABLE `DailyChoices` (
  sender INT(11) NOT NULL,
  receiver INT(11) NOT NULL,
  CONSTRAINT FOREIGN KEY (sender) REFERENCES users (userid) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = InnoDB;


how many records can be deleted using a single transaction in mysql innodb

I want to delete old records from 10 related tables every 6 months using primary keys and foreign keys. I am planning to do it in a single transaction block, because in case of any failure I have to roll back the changes. My queries will be something like this:
DELETE FROM PARENT_TABLE WHERE PARENT_ID IN (1, 2, 3, etc.);
DELETE FROM CHILD_TABLE1 WHERE PARENT_ID IN (1, 2, 3, etc.);
There will be around 1 million records to delete. Is it safe to delete all of these in a single transaction? How will the performance be?
Edit
To be more clear about my question, I will detail my execution plan.
I first retrieve the primary keys of all the records in the parent table that have to be deleted and store them in a temporary table.
START TRANSACTION;
DELETE FROM CHILD_ONE WHERE PARENT_ID IN (SELECT * FROM TEMP_ID_TABLE);
DELETE FROM CHILD_TWO WHERE PARENT_ID IN (SELECT * FROM TEMP_ID_TABLE);
DELETE FROM PARENT_TABLE WHERE PARENT_ID IN (SELECT * FROM TEMP_ID_TABLE);
COMMIT;
ROLLBACK on any failure.
Given that I can have around a million records to delete from all these tables, is it safe to put everything inside a single transaction block?
You can probably succeed. But it is not wise. Something random (e.g., a network glitch) could come along and cause that huge transaction to abort. You might be blocking other activity for a long time. Etc.
Are the "old" records everything older than date X? If so, it would be much more efficient to use PARTITIONing for DROPping old rows. We can discuss the details. Oops, you have FOREIGN KEYs, which are incompatible with PARTITIONing. Do all the tables have FKs?
Why do you wait 6 months before doing the delete? Deleting 6K rows a day would have the same effect and be much less invasive and risky.
IN ( SELECT ... ) has terrible performance; use a JOIN instead.
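As a rough sketch of the JOIN form, reusing the tables from the question (assuming the temp table's column is also called PARENT_ID):
DELETE CHILD_ONE
FROM CHILD_ONE
JOIN TEMP_ID_TABLE ON CHILD_ONE.PARENT_ID = TEMP_ID_TABLE.PARENT_ID;
-- Same pattern for CHILD_TWO, then finally the parent:
DELETE PARENT_TABLE
FROM PARENT_TABLE
JOIN TEMP_ID_TABLE ON PARENT_TABLE.PARENT_ID = TEMP_ID_TABLE.PARENT_ID;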
If some of the tables are just normalizations, why bother deleting from them?
Would it work to delete 100 ids per transaction? That would be much safer and less invasive.
First of all: create a proper backup AND test it before you start to delete the records.
The number of records you can delete at once mostly depends on the configuration (hardware) of your database server. You have to test how many records can be deleted on that specific server without problems. Start with e.g. 1000 records, then increase the amount in each iteration until it becomes too slow. If you have replication, the setup and the slave's performance affect the row count too (too many write requests can cause a serious delay in replication).
A piece of advice: remove all foreign keys and indexes (except the primary key and the indexes related to the WHERE clauses you use to perform the action), if possible, before you start the delete.
Edit:
If the count of records to be deleted is larger than the count of records to keep, consider just copying the records you want to keep into a new table and then swapping the old and new tables by renaming them. For the first step, copy the structure of the table using a CREATE TABLE .. LIKE statement, then drop all unnecessary indexes and constraints, copy the records, add the indexes back, then rename the tables. (Copy the latest new records from the original table into the copy if necessary.) After that you can drop the old table.
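A minimal sketch of that copy-and-rename sequence, with made-up names (big_table is the real table, and <keep-condition> stands in for the WHERE clause that selects the rows to keep):
CREATE TABLE big_table_new LIKE big_table;   -- copies the column and index definitions
-- (optionally drop secondary indexes on big_table_new here to speed up the copy)
INSERT INTO big_table_new SELECT * FROM big_table WHERE <keep-condition>;
-- (re-add any dropped indexes here)
RENAME TABLE big_table TO big_table_old, big_table_new TO big_table;
-- Catch up on rows inserted during the copy if necessary, then:
DROP TABLE big_table_old;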
What I believe: first move the data to another database, then use a single transaction to delete from all 10 tables, which is very safe because you can roll back immediately. Delete the data from the live database at a time when user interaction is very low.

mysql - Deleting Rows from InnoDB is very slow

I have a MySQL database with approx. 1 TB of data. Table fuelinjection_stroke has approx. 1,000,000,000 rows. DBID is the primary key and is automatically incremented by one with each insert.
I am trying to delete the first 1,000,000 rows using a very simple statement:
DELETE FROM fuelinjection_stroke WHERE DBID < 1000000;
This query is taking very long (>24h) on my dedicated 8-core Xeon server (32 GB memory, SAS storage).
Any idea whether the process can be sped up?
I believe that your table becomes locked. I've faced the same problem and found out that I could delete 10k records pretty fast. So you might want to write a simple script/program which deletes records in chunks.
DELETE FROM fuelinjection_stroke WHERE DBID < 1000000 LIMIT 10000;
And keep executing it until it has deleted everything.
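If you prefer to keep the looping inside MySQL rather than in an external script, here is a rough sketch as a stored procedure (the procedure name and chunk size are arbitrary):
DELIMITER //
CREATE PROCEDURE purge_old_strokes()
BEGIN
  DECLARE rows_deleted INT DEFAULT 1;
  WHILE rows_deleted > 0 DO
    -- Each small chunk holds locks only briefly
    DELETE FROM fuelinjection_stroke WHERE DBID < 1000000 LIMIT 10000;
    SET rows_deleted = ROW_COUNT();
  END WHILE;
END //
DELIMITER ;
CALL purge_old_strokes();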
Are you space deprived? Is down time impossible?
If not, you could add a new INT column of length 1 and default it to 1 for "active" (or whatever your terminology is) and 0 for "inactive". Actually, you could use 0 through 9 as 10 different states if necessary.
Adding this new column will take a looooooooong time, but once it's done, your UPDATEs should be lightning fast as long as you do them off the PRIMARY (as you do with your DELETE) and you don't index this new column.
The reason why InnoDB takes so long to DELETE on such a massive table as yours is the clustered index. It physically orders your table based upon your PRIMARY key (or the first UNIQUE key it finds, or whatever it decides on if it can't find a PRIMARY or UNIQUE key), so when you pull out one row, it reorders your ENTIRE table physically on the disk for speed and defragmentation. So it's not the DELETE that's taking so long. It's the physical reordering after that row is removed.
When you create a new INT column with a default value, the space will be filled, so when you UPDATE it, there's no need for physical reordering across your huge table.
I'm not sure exactly what your schema is, but using a column for a row's state is much faster than DELETEing; however, it will take more space.
Try setting values:
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT (for non-Windows machines)
innodb_buffer_pool_size=25GB (currently it is close to 21GB)
innodb_doublewrite=0
innodb_support_xa=0
innodb_thread_concurrency=0...1000 (try different values, beginning with 200)
References:
MySQL docs for description of different variables.
MySQL Server Setting Tuning
MySQL Performance Optimization basics
http://bugs.mysql.com/bug.php?id=28382
What indexes do you have?
I think your issue is that the delete is rebuilding the index on every iteration.
I'd drop the indexes, if there are any, do the delete, then re-add the indexes. It'll be far faster, I think.
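A rough sketch of that approach; the index and column names here are invented, since the question doesn't list the table's secondary indexes:
-- Drop secondary indexes before the big delete (keep the PRIMARY KEY)
ALTER TABLE fuelinjection_stroke DROP INDEX idx_sensor_id;
DELETE FROM fuelinjection_stroke WHERE DBID < 1000000;
-- Re-create the indexes afterwards
ALTER TABLE fuelinjection_stroke ADD INDEX idx_sensor_id (sensor_id);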
I was having the same problem, and my table has several indices that I didn't want to have to drop and recreate. So I did the following:
create table keepers
select * from origTable where {clause to retrieve rows to preserve};
truncate table origTable;
insert into origTable select null, keepers.col2, ... keepers.col(last) from keepers;
drop table keepers;
About 2.2 million rows were processed in about 3 minutes.
Your database may be checking for records that need to be modified because of a foreign key (cascades, delete).
But I-Conica's answer is a good point (+1). The process of deleting a single record and updating a lot of indexes, repeated 100000 times, is inefficient. Just drop the index, delete all records and create it again.
And of course, check whether there is any kind of lock in the database. One user or application can lock a record or table, and your query will wait until the user releases the resource or it reaches a timeout. One way to check whether your database is doing real work or just waiting is to launch the query from a connection that sets the --innodb_lock_wait_timeout parameter to a few seconds. If it fails, at least you know that the query is OK and that you need to find and release that lock. Examples of locks are SELECT * FROM XXX FOR UPDATE and uncommitted transactions.
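For example, a quick way to test this from a separate session (a sketch; the 5-second timeout is arbitrary, and on reasonably recent MySQL versions the variable can be set per session):
-- Fail fast instead of waiting the default 50 seconds for InnoDB row locks
SET SESSION innodb_lock_wait_timeout = 5;
DELETE FROM fuelinjection_stroke WHERE DBID < 1000000 LIMIT 10000;
-- If it errors out with a lock wait timeout, look for the blocking session:
SHOW FULL PROCESSLIST;
SHOW ENGINE INNODB STATUS;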
For such large tables, I'd rather use MyISAM, especially if there aren't many transactions needed.
I don't know the exact answer to your question, but here is another way to write the delete; please try this:
DELETE FROM fuelinjection_stroke WHERE DBID IN
(
    -- MySQL does not allow LIMIT directly inside an IN (...) subquery, hence the derived table
    SELECT DBID FROM
    ( SELECT DBID FROM fuelinjection_stroke ORDER BY DBID ASC LIMIT 1000000 ) AS first_million
);

Adding Index to 3 million rows MySQL

I need to add at least 1 index to a column of type int(1) on an InnoDB table. There are about 3 million rows that it would need to index. This is a database on my production server, and it is in use by thousands of people every day. I tried to add an index the standard way, but it was taking too much time (I let it run for about 7 minutes before killing the process) and locking rows, meaning a frozen application for many users.
My VPS that runs all of this has 512mb of RAM and has an Intel Xeon E5504 processor.
How can I add an index to this production database without interrupting my users' experience?
Unless the table is either read-only or write-only, you'll probably need to take down the site. Lock the database, run the operation and wait.
If the table is write-only, swap the writes to a temporary table, run the operation on the old table, then swap the writes back to the old table and insert the data from the temporary table.
If the table is read-only, duplicate the table and run the operation on the copy.
If the table is read/write, then a messy alternative that might work is to create a new table with the indexes, set its primary key starting point to the next value in the original table, add a join to your read requests so they select from both tables, but write exclusively to the new table. Then write a script that inserts rows from the old table into the new one and deletes them from the old table. It'll take far, far longer than the downtime, and plenty can go wrong, but it should be doable.
You can set the starting point of a primary key with:
ALTER TABLE `my_table` AUTO_INCREMENT = X;
Hope that helps.
Take a look at pt-online-schema-change. I think this tool can be quite useful in your case. It will obviously put additional load on your database server, but it should not block access to the table for most of the operation time.
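As a rough idea of what an invocation could look like (the database, table, index and column names are placeholders; check the tool's documentation before running anything against production):
# Dry run first, then execute: the tool builds a copy of the table in the
# background and swaps it in, so the original stays readable and writable.
pt-online-schema-change --alter "ADD INDEX idx_flag (flag_col)" D=mydb,t=mytable --dry-run
pt-online-schema-change --alter "ADD INDEX idx_flag (flag_col)" D=mydb,t=mytable --execute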

Generating a massive 150M-row MySQL table

I have a C program that mines a huge data source (20 GB of raw text) and generates loads of INSERTs to execute on a simple blank table (4 integer columns with 1 primary key). Set up as a MEMORY table, the entire task completes in 8 hours. After finishing, about 150 million rows exist in the table. Eight hours is a completely decent number for me. This is a one-time deal.
The problem comes when trying to convert the MEMORY table back into MyISAM so that (A) I'll have the memory freed up for other processes and (B) the data won't be killed when I restart the computer.
ALTER TABLE memtable ENGINE = MyISAM
I've let this ALTER TABLE query run for over two days now, and it's not done. I've now killed it.
If I create the table initially as MyISAM, the write speed seems terribly poor (especially due to the fact that the query requires the use of the ON DUPLICATE KEY UPDATE technique). I can't temporarily turn off the keys. The table would become over 1000 times larger if I were to and then I'd have to reprocess the keys and essentially run a GROUP BY on 150,000,000,000 rows. Umm, no.
One of the key constraints to realize: The INSERT query UPDATEs records if the primary key (a hash) exists in the table already.
At the very beginning of an attempt at strictly using MyISAM, I'm getting a rough speed of 1,250 rows per second. Once the index grows, I imagine this rate will tank even more.
I have 16GB of memory installed in the machine. What's the best way to generate a massive table that ultimately ends up as an on-disk, indexed MyISAM table?
Clarification: There are many, many UPDATEs going on from the query (INSERT ... ON DUPLICATE KEY UPDATE val=val+whatever). This isn't, by any means, a raw dump problem. My reasoning for trying a MEMORY table in the first place was for speeding-up all the index lookups and table-changes that occur for every INSERT.
If you intend to make it a MyISAM table, why are you creating it in memory in the first place? If it's only for speed, I think the conversion to a MyISAM table is going to negate any speed improvement you get by creating it in memory to start with.
You say inserting directly into an "on disk" table is too slow (though I'm not sure how you're deciding that, when your current method is taking days). You may be able to turn off or remove the uniqueness constraints, use a DELETE query later to re-establish uniqueness, and then re-enable/add the constraints. I have used this technique when importing into an InnoDB table in the past, and found that even with the later delete it was much faster overall.
Another option might be to create a CSV file instead of the INSERT statements, and either load it into the table using LOAD DATA INFILE (I believe that is faster than the inserts, but I can't find a reference at present) or use it directly via the CSV storage engine, depending on your needs.
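A minimal sketch of the LOAD DATA INFILE route (the file path, table and column names are made up). Note that this alone does not reproduce the ON DUPLICATE KEY UPDATE accumulation; for that you would load into a staging table first and merge, as sketched after the next answer:
LOAD DATA INFILE '/tmp/mined_rows.csv'
INTO TABLE target_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(hash_key, col_a, col_b, col_c);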
Sorry to keep throwing comments at you (last one, probably).
I just found this article, which provides an example of converting a large table from MyISAM to InnoDB. While this isn't what you are doing, the author uses an intermediate MEMORY table and describes going from memory to InnoDB in an efficient way: ordering the table in memory the way InnoDB expects it to be ordered in the end. If you aren't tied to MyISAM it might be worth a look, since you already have a "correct" memory table built.
I don't use MySQL but SQL Server, and this is the process I use to handle a file of similar size. First I dump the file into a staging table that has no constraints. Then I identify and delete the duplicates from the staging table. Then I search for existing records that might match and put the ID field into a column in the staging table. Then I update where the ID column is not null and insert where it is null.
One of the reasons I do all the work of getting rid of the duplicates in the staging table is that it means less impact on the production table when I run it, and thus it is faster in the end. My whole process runs in less than an hour (and actually does much more than I describe, as I also have to denormalize and clean the data) and affects production tables for less than 15 minutes of that time. I don't have to worry about adjusting any constraints or dropping indexes or any of that, since I do most of my processing before I hit the production table.
Consider whether a similar process might work better for you. Also, could you use some sort of bulk import to get the raw data into the staging table (I pull the 22-gig file I have into staging in around 16 minutes) instead of working row by row?
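Translated into MySQL terms, a rough sketch of that staging approach could look like this (all names are invented; target_table is assumed to have hash_key as its primary key and an integer val column being accumulated):
-- 1. Bulk-load the raw rows into an unconstrained staging table
CREATE TABLE staging (
  hash_key BIGINT NOT NULL,
  val INT NOT NULL
) ENGINE = MyISAM;
LOAD DATA INFILE '/tmp/mined_rows.csv' INTO TABLE staging
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (hash_key, val);
-- 2. Pre-aggregate duplicates once, in the staging area
CREATE TABLE staging_agg
SELECT hash_key, SUM(val) AS val FROM staging GROUP BY hash_key;
-- 3. Merge into the real table in one pass
INSERT INTO target_table (hash_key, val)
SELECT hash_key, val FROM staging_agg
ON DUPLICATE KEY UPDATE target_table.val = target_table.val + VALUES(val);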

MYSQL mass deletion FROM 80 tables

I have a 50 GB MySQL database (80 tables) that I need to delete some contents from.
I have a reference table that contains a list of product IDs that need to be deleted from the other tables.
Now, the other tables can be 2 GB each and contain the items that need to be deleted.
My questions: since it is not a small database, what is the safest way to delete the data in one shot in order to avoid problems?
And what is the best method to verify that all the data was deleted?
Probably this doesn't help any more, but you should keep it in mind when creating the database. In MySQL (depending on the table storage engine, for instance InnoDB) you can specify relations (they are called foreign key constraints). These relations mean that if you delete an entry from one table (for instance products) you can automatically update or delete entries in other tables that reference that row as a foreign key (such as product_storage). These relations guarantee that you have a 100% consistent state. However, they might be hard to add in hindsight. If you plan to do this more often, it is definitely worth researching whether you can add them to your database; they will save you a lot of work (all kinds of queries become simpler).
Without these relations you can't be 100% sure. So you'd have to go over all the tables, note which columns you want to check, and write a bunch of SQL queries to make sure there are no entries left.
As Thirler has pointed out, it would be nice if you had foreign keys. Without them, burnall's solution can be combined with transactions to ensure that no inconsistencies creep in.
Regardless of how you do it, this could take a long time, even hours, so please be prepared for that.
As pointed out earlier, foreign keys would be nice in this case. But regarding question 1, you could perhaps run the changes within a transaction from the MySQL prompt. This assumes you are using a transaction-safe storage engine like InnoDB. You can convert from MyISAM to InnoDB if you need to. Anyway, something like this:
START TRANSACTION;
...Perform changes...
...Control changes...
COMMIT;
...or...
ROLLBACK;
Is it acceptable to have any downtime?
When working with PostgreSQL databases >250 GB, we use this technique on production servers to perform database changes. If the outcome isn't as expected, we just roll back the transaction. Of course there is a penalty, as the I/O system has to work a bit harder.
// John
I agree with Thirler that using foreign keys is preferable. It guarantees referential integrity and consistency of the whole database.
I can believe that life sometimes requires more tricky logic.
So you could use manual queries like
delete from a where id in (select id from keys)
You could delete all records at once, by ranges of keys, or using LIMIT in DELETE. A proper index is a must.
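For instance, a sketch of deleting one key range at a time, using the table names from the query above (backticks around keys since it may be a reserved word in MySQL; the range bounds are arbitrary):
-- Delete one slice of keys per statement, then repeat with the next range
DELETE a
FROM a
JOIN `keys` ON a.id = `keys`.id
WHERE `keys`.id BETWEEN 1 AND 10000;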
To verify consistency you need a function or query. For example:
DELIMITER //
CREATE FUNCTION check_consistency() RETURNS BOOLEAN
READS SQL DATA
BEGIN
  RETURN NOT EXISTS (SELECT * FROM child  WHERE id NOT IN (SELECT id FROM parent))
     AND NOT EXISTS (SELECT * FROM child2 WHERE id NOT IN (SELECT id FROM parent));
  -- and so on
END //
DELIMITER ;
Also, something to look into might be partitioning of MySQL tables. For more information check out the reference manual:
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
It comes down to being able to divide tables into different partitions, for example per datetime value or index set.