I had a table with 100,000,000 records and 500GB of data. I have been backing up a lot of the older records into a backup DB and deleting them from main DB. However the disk space hasn't reduced, and I noticed the data_free has grown a lot for that table.
My understanding is that I need to run OPTIMIZE TABLE to reduce the disk size, but I have read that this causes replication lag. I am using MySQL 5.7 with InnoDB.
So my question is, can I run OPTIMIZE TABLE without causing replication lag? For example running OPTIMIZE TABLE on master such as:
OPTIMIZE NO_WRITE_TO_BINLOG TABLE tblname;
Then run the same command on the slaves one by one. Would that work? Are there some risks in doing that? Or is there any other way?
At my company we use Percona's free tool pt-online-schema-change.
It doesn't literally do an OPTIMIZE TABLE, but for InnoDB tables, any table-copy operation will accomplish the same result. That is, it makes a new InnoDB tablespace, copies all the rows to that tablespace, and rebuilds all the indexes for that table. The new tablespace will be a defragmented version of the original tablespace.
Any ALTER will work; you don't have to change anything in the table. I use the no-op ALTER TABLE <name> FORCE.
The advantage of pt-online-schema-change is that while it's working, you can continue to read and write the table. It only needs a brief metadata lock to create triggers as it starts, and another brief metadata lock at the end to swap the new table for the old.
If you use OPTIMIZE TABLE, this causes long replication lag, because it won't start running on the replica until after it's finished on the source.
Whereas with pt-online-schema-change, it starts running the table-copy immediately, and this continues along with other concurrent transactions, and when it's done on the source, it's only a moment until it's also done on the replica.
It actually takes longer than OPTIMIZE TABLE, but since it doesn't prevent you from using the table, that doesn't matter as much.
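As a rough illustration of the rebuild described above, here is a small Python sketch that assembles a pt-online-schema-change command line. The helper name pt_osc_command and the database name mydb are made up for the example, and ENGINE=InnoDB is used as the no-op alter on the assumption that any table-rebuilding ALTER behaves the same way. It defaults to --dry-run so nothing is changed until you explicitly ask for --execute.

```python
import shlex


def pt_osc_command(database, table, execute=False):
    """Build an argv list for a no-op rebuild with pt-online-schema-change.

    database/table are hypothetical names supplied by the caller.
    """
    # --dry-run creates and alters the new table without copying data;
    # --execute performs the real copy and swap.
    mode = "--execute" if execute else "--dry-run"
    return [
        "pt-online-schema-change",
        "--alter", "ENGINE=InnoDB",    # rebuilds the table without changing it
        mode,
        f"D={database},t={table}",     # DSN: D = database, t = table
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it by hand.
    print(shlex.join(pt_osc_command("mydb", "tblname")))
```

Connection options (host, user, password) are left out on purpose; add them to the DSN or the command line as your environment requires.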
I ended up testing this locally by setting up a replication environment.
It seems possible to run OPTIMIZE TABLE tblname; without causing any downtime or replication lag.
You need to run OPTIMIZE NO_WRITE_TO_BINLOG TABLE tblname; on master, to avoid writing to the bin logs and replicating the query to the slaves.
Then you have to run OPTIMIZE TABLE tblname; individually on every slave.
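The two steps above can be written down as a per-host plan. This is only a sketch: the function name optimize_plan and the host names are invented for the example, and it does nothing but produce the statements, so you can review them and run them one server at a time.

```python
import re

# Conservative whitelist for the table name, so the generated SQL is safe to paste.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_$]+$")


def optimize_plan(table, master, replicas):
    """Return (host, statement) pairs: master first, then each replica in order."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    # On the master, keep the statement out of the binary log so it is not replicated.
    plan = [(master, f"OPTIMIZE NO_WRITE_TO_BINLOG TABLE `{table}`;")]
    # On each replica, run the statement locally, one server at a time.
    plan += [(host, f"OPTIMIZE TABLE `{table}`;") for host in replicas]
    return plan


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for host, statement in optimize_plan("tblname", "db-master", ["db-slave1", "db-slave2"]):
        print(f"{host}: {statement}")
```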
Here is a more detailed explanation of what happens: https://dev.mysql.com/doc/refman/5.7/en/optimize-table.html#optimize-table-innodb-details
It says:
an exclusive table lock is only taken briefly during the prepare phase and the commit phase of the operation.
So there is almost no lock time.
There are edge cases that could still cause downtime: if MySQL falls back to the table-copy method instead of online DDL, the table is locked for the whole operation. Some of those cases are listed in the link above.
Another thing to consider is disk space. With InnoDB I observed it recreates the table. So if the contents of your table add up to 100GB, you would need at least an extra 100GB of free space to run the command successfully.
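The disk-space check above is simple arithmetic, sketched here in Python. The query text, the function names and the 10% headroom are my own assumptions, not something MySQL prescribes; data_length + index_length is used as an upper-bound estimate of the size of the rebuilt copy.

```python
import shutil

# Reads the current size of one table; meant to be run through any MySQL client.
SIZE_QUERY = """
SELECT data_length + index_length AS table_bytes, data_free
FROM information_schema.TABLES
WHERE table_schema = %s AND table_name = %s
"""


def has_room_for_rebuild(table_bytes, free_bytes, headroom_percent=10):
    """True if the free space covers a full copy of the table plus some headroom."""
    required = table_bytes + table_bytes * headroom_percent // 100
    return free_bytes >= required


def free_bytes_on(path):
    """Free bytes on the filesystem holding `path` (e.g. the MySQL data directory)."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    GB = 1024 ** 3
    # "." stands in for the data directory; substitute the real path on your server.
    print(has_room_for_rebuild(100 * GB, free_bytes_on(".")))
```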
As Bill suggested, pt-online-schema-change may be the safer alternative. However, if you can't use it, careful operation seems to make it possible to avoid both replication lag and downtime.
I ran the analyze table command on production mysql db without knowing it would prevent me from selecting the contents of the table. This caused production site to go down :( How long can it take for the lock to release? Also, would recreating the db from a backup solve the problem / get rid of the locks?
Please let me know.
Thanks.
ANALYZE TABLE waits to acquire a metadata lock. While it's waiting, any SQL query against the table waits for ANALYZE TABLE.
ANALYZE TABLE is normally pretty quick, i.e. 1-3 seconds. But that quick operation doesn't start until it can acquire the metadata lock.
It can't acquire the metadata lock while you have long-running transactions going against the table. So if you want this to run faster, finish your transactions.
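To find the transactions that may be holding things up, one option is to look at information_schema.innodb_trx and pick out the old ones. The sketch below assumes you have already fetched (thread id, start time) rows with the query shown; the function name and the 60-second threshold are arbitrary choices for the example. A transaction listed there is not necessarily touching your table, so treat the result as a list of candidates to inspect, not a list of things to kill.

```python
from datetime import datetime, timedelta

# Open InnoDB transactions, oldest first; run through any MySQL client.
TRX_QUERY = """
SELECT trx_mysql_thread_id, trx_started
FROM information_schema.innodb_trx
ORDER BY trx_started
"""


def long_running(trx_rows, now, max_age=timedelta(seconds=60)):
    """Return the thread ids of transactions that have been open longer than max_age."""
    return [thread_id for thread_id, started in trx_rows if now - started > max_age]


if __name__ == "__main__":
    now = datetime(2020, 1, 1, 12, 0, 0)
    rows = [
        (101, datetime(2020, 1, 1, 11, 0, 0)),    # open for an hour
        (102, datetime(2020, 1, 1, 11, 59, 50)),  # open for ten seconds
    ]
    print(long_running(rows, now))
```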
See my answer to MySQL failing to ALTER TABLE which is being actively written to for more information.
ANALYZE TABLE quite clearly says 'During the analysis, the table is locked with a read lock for InnoDB and MyISAM'.
You can KILL {connection number} in SQL to stop the command.
Note: you should probably update to a more recent release of MySQL 5.6.
I've recently been thrust into the position of db admin for our server so I'm having to learn as I go. We recently found that one of our tables had maxed out the id column and needs to be migrated to bigint.
This is for an InnoDB table with roughly 301GB of data. We are running MySQL version 5.5.38. The command I'm running to migrate the table is
ALTER TABLE tb_name CHANGE id id BIGINT NOT NULL;
I kicked off the migration and we are now 18 hours into the migration, but I'm not seeing our disk space on the server change at all which makes me think nothing is happening. We have plenty of memory so no concern there, but it still shows the following message state when I run "show processlist;"
copy to tmp table
Does anyone have any ideas or know what I'm doing incorrectly? Please ask if you need more information.
Yes, it will take a looooong time. The disks are probably spinning as fast as they can. (SSDs employ faster hamsters.)
You can kill the ALTER, since all it is doing is, as it says, "copying to tmp table", after which it will rename the tmp table to be the real table and drop the old copy.
I hope you had innodb_file_per_table = ON when you started the ALTER. Else it will be expanding ibdata1, which won't shrink afterwards.
pt-online-schema-change is an alternative. It will still take a loooooong time (with one extra 'o' because it will be slightly slower). It will do the job without blocking other activity.
This might have been a good time to check all the columns and indexes in the table:
Could some INTs be turned into MEDIUMINT or something smaller?
Are some of the INDEXes unused?
How about normalizing some of the VARCHARs?
Maybe even PARTITIONing (but not without a good reason)? Time-series is a typical use for Data Warehousing.
Summarize the data, and toss at least the older data?
If you would like further guidance, please provide SHOW CREATE TABLE.
We have an update process which currently takes over an hour and means that our DB is unusable during this period.
If I set up replication, would this solve the problem, or would the replicated DB suffer from exactly the same problem, with the tables locked during the update?
Is it possible to have the replicated DB prioritize reading over updating?
Thanks,
D
I suspect that with replication you're just going to be duplicating the issue (unless most of the time is spent in CPU and only results in a couple of records being updated).
Without knowing a lot more about the schema, the distribution and size of the data, and the update process, it's impossible to say how best to resolve the problem. But you might get some mileage out of using InnoDB instead of MyISAM and making sure that the update is implemented as a number of discrete steps (e.g. using stored procedures) rather than a single DML statement.
MySQL gives you the ability to run queries delayed. Example: "INSERT DELAYED INTO...". This causes the query to be executed only when MySQL has time to take it.
Based on your input, it sounds like you are using MyISAM tables. MyISAM only supports table-wide locking, which means that a single update locks the whole table until the query completes. InnoDB, on the other hand, uses row-level locking, which will not cause SELECT queries to wait (hang) for updates to complete.
So you have the best chances of a better sysadmin life if you change to InnoDB :)
When it comes to replication, it is pretty normal to separate updates and selects onto two different MySQL servers, and that does tend to work very well. But if you are using MyISAM tables and do a lot of updates, the locking issue itself will still be there.
So my 2 cents: first get rid of MyISAM, then consider replication or a more powerful MySQL server if the problem still exists. (A key to good MySQL performance is having at least as much physical RAM as the combined size of all indexes across all databases.)
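If you go the "get rid of MyISAM" route, the conversion statements can be generated from a list of tables. This is a sketch only: the query, the function names and the example schema are assumptions for illustration, and each ALTER rebuilds its table, so review the output and run it at a quiet time.

```python
# Lists MyISAM tables outside the system schemas; run through any MySQL client.
MYISAM_QUERY = """
SELECT table_schema, table_name
FROM information_schema.TABLES
WHERE engine = 'MyISAM'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
"""


def quote_ident(name):
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def alter_statements(tables):
    """Turn (schema, table) pairs into ALTER TABLE ... ENGINE=InnoDB statements."""
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(name)} ENGINE=InnoDB;"
        for schema, name in tables
    ]


if __name__ == "__main__":
    # Hypothetical schema and table names.
    for statement in alter_statements([("shop", "orders"), ("shop", "order_items")]):
        print(statement)
```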
We have a series of tables that have grown organically to several million rows. In production, doing an insert or update can take up to two seconds. However, if I dump the table and recreate it from the dump, queries are lightning fast.
We have rebuilt one of the tables by creating a copy, rebuilding the indexes, doing a rename switch, and then copying over any new rows. This worked because that table is only ever appended to. Doing this made the inserts and updates lightning quick.
My questions:
Why do inserts get slow over time?
Why does recreating the table and doing an import fix this?
Is there any way that I can rebuild indexes without locking a table for updates?
It sounds like it's either
Index unbalancing over time
Disk fragmentation
Internal innodb datafile(s) fragmentation
You could try ANALYZE TABLE foo. It does just a few index dives and normally takes a few seconds, though it does briefly take a read lock on the table.
If this doesn't fix it, you can use
mysql> SET PROFILING=1;
mysql> INSERT INTO foo ($testdata);
mysql> show profile for QUERY 1;
and you should see where most of the time is spent.
Apparently InnoDB performs better when inserts are done in PK order. Is this your case?
InnoDB performance is heavily dependent on RAM. If the indexes don't fit in RAM, performance can drop considerably and quickly. Rebuilding the whole table improves performance because the data and indexes are now optimized.
If you are only ever inserting into the table, MyISAM is better suited for that. You won't have locking issues if only appending, since the record is added to the end of the file. MyISAM will also allow you to use MERGE tables, which are really nice for taking parts of the data offline or archiving without having to do exports and/or deletes.
Updating a table requires its indices to be updated as well. If you are doing bulk inserts, try to do them in one transaction (as the dump and restore does). If the table is write-biased, I would think about dropping the indices anyway, or letting a background job do read-processing of the table (e.g. by copying it to an indexed one).
Track down the my.ini that is in use and increase key_buffer_size. I had a 1.5GB table with a large key where the queries per second (all writes) were down to 17. I found it strange that in the administration panel (while the table was locked for writing to speed up the process) it was doing 200 InnoDB reads per second to 24 writes per second.
It was forced to read the index table off disk. I changed the key_buffer_size from 8M to 128M and the performance jumped to 150 queries per second completed and only had to perform 61 reads to get 240 writes. (after restart)
Could it be due to fragmentation of XFS?
Copy/pasted from http://stevesubuntutweaks.blogspot.com/2010/07/should-you-use-xfs-file-system.html :
To check the fragmentation level of a drive, for example located at /dev/sda6:
sudo xfs_db -c frag -r /dev/sda6
The result will look something like so:
actual 51270, ideal 174, fragmentation factor 99.66%
That is an actual result I got from the first time I installed these utilities, previously having no knowledge of XFS maintenance. Pretty nasty. Basically, the 174 files on the partition were spread over 51270 separate pieces. To defragment, run the following command:
sudo xfs_fsr -v /dev/sda6
Let it run for a while. The -v option lets it show the progress. After it finishes, try checking the fragmentation level again:
sudo xfs_db -c frag -r /dev/sda6
actual 176, ideal 174, fragmentation factor 1.14%
Much better!
Hi, I am using MySQL 5.0.x
I have just changed a lot of the tables from MyISAM to InnoDB
With the MyISAM tables it took about 1 minute to install our database
With InnoDB it takes about 15 minutes to install the same database
Why does the InnoDB take so long?
What can I do to speed things up?
The Database install does the following steps
1) Drops the schema
2) Create the schema
3) Create tables
4) Create stored procedures
5) Insert default data
6) Insert data via stored procedure
EDIT:
The Inserting of default data takes most of the time
Modify the Insert Data step to start a transaction at the start and to commit it at the end. You will get an improvement, I guarantee it. (If you have a lot of data, you might want to break it up into one transaction per table.)
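The single-transaction idea above looks roughly like this. To keep the sketch runnable anywhere, it uses Python's built-in sqlite3 module as a stand-in for MySQL; the table name default_data and the function name are made up. With a real MySQL connector the shape is the same: begin once, insert everything, commit once, and roll back on error.

```python
import sqlite3


def insert_in_one_transaction(conn, rows):
    """Insert all rows inside a single explicit transaction; return the row count."""
    rows = list(rows)
    conn.execute("BEGIN")
    try:
        conn.executemany("INSERT INTO default_data (id, name) VALUES (?, ?)", rows)
        conn.execute("COMMIT")   # one commit for the whole batch
    except Exception:
        conn.execute("ROLLBACK")  # leave the table untouched on failure
        raise
    return len(rows)


if __name__ == "__main__":
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN/COMMIT above are the only transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE default_data (id INTEGER PRIMARY KEY, name TEXT)")
    inserted = insert_in_one_transaction(conn, ((i, f"row{i}") for i in range(1000)))
    print(inserted, "rows inserted in one transaction")
```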
If your application does not use transactions at all, then you should set the parameter innodb_flush_log_at_trx_commit to 2. This will give you a lot of performance back, because you will almost certainly have autocommit enabled, which generates a lot more transactions than InnoDB's default parameters are configured for. This setting stops it from unnecessarily flushing to disk on every commit.
15 minutes doesn't seem excessive to me. After all, it's a one-time cost.
I'm not certain, but I would imagine that part of the explanation is the referential integrity isn't free. InnoDB has to do more work to guarantee it, so of course it would take up more time.
Maybe your script needs to be altered to add constraints after the tables are created.
Like duffymo said, disable your constraints (indexes and foreign/primary keys) before inserting the data.
Maybe you should restore some indexes before the data is inserted via stored procedure, if it uses a lot of SELECT statements.