I'm trying to convert a 400 million row InnoDB table to the TokuDB engine. When I start with "alter table ... engine=TokuDB", things run really fast in the beginning; watching SHOW PROCESSLIST, I see it reading in about 1 million rows every 10 seconds. But once I reach about 19-20 million rows, reading slows to more like 10k rows every few seconds.
Are there any MySQL or TokuDB variables that affect the speed at which an ALTER TABLE to TokuDB runs? I tried tmp_table_size and some others, but can't seem to get past that hurdle.
Any ideas?
The solution to this for me was to export with SELECT ... INTO OUTFILE and re-import with LOAD DATA INFILE.
This was several orders of magnitude faster for me (110 million records). Every time I modify a large TokuDB table (ALTER TABLE) it takes forever (~30k rows/sec).
It has been quicker to do a full export and import (~500k rows/sec).
That dropped ALTER TABLE times from hours to minutes.
This is true both when converting from InnoDB and when altering a native TokuDB table (any ALTER TABLE).
select a.*,calcfields from table1 a into outfile 'temp.txt';
create table table2 .....
load data infile 'temp.txt' into table table2 (field1,field2,...);
PS: experiment with the CREATE TABLE using row_format=tokudb_lzma or row_format=tokudb_uncompressed. You can try the three ways pretty quickly (you need to do an OS-level directory ls to see the size). I find offline index builds help too.
set tokudb_create_index_online=off;
create clustering index field1 on table2 (field1);  -- much faster
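A minimal sketch of that compression experiment, with hypothetical column definitions (the third variant is simply the default you get with no row_format clause):
create table table2_lzma  (field1 int, field2 varchar(64)) engine=TokuDB row_format=tokudb_lzma;
create table table2_plain (field1 int, field2 varchar(64)) engine=TokuDB row_format=tokudb_uncompressed;
-- load a sample into each, then compare the on-disk sizes with an OS-level ls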
Multiple Clustering indexes can make a world of difference when you learn when to use them.
I was using GUI tools that do an ALTER TABLE for index changes (waiting hours each time).
Doing this by hand makes things far more productive (I had spent days going nowhere via the GUI, and was done in 30 minutes by hand).
Using 5.5.30-tokudb-7.0.1-MariaDB and VERY HAPPY.
Hopefully this can help others when experimenting. Obviously very late for the original asker.
The only existing response was not constructive at all for me (the question itself was).
Here are the important variables; make sure they are set globally prior to starting the operation, or locally within the session executing the storage engine change:
tokudb_load_save_space : default is off and should be left alone unless you are low on disk space.
tokudb_cache_size : if unset, TokuDB will allocate 50% of RAM for its own caching mechanism; we generally recommend leaving this setting alone. As you are running on an existing server, you need to make sure you aren't over-committing memory between TokuDB, InnoDB, and MyISAM.
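As a hedged sanity check before kicking off the ALTER, you can inspect both variables; note that tokudb_cache_size is normally fixed at server startup in my.cnf rather than per session:
SHOW GLOBAL VARIABLES LIKE 'tokudb_load_save_space';
SHOW GLOBAL VARIABLES LIKE 'tokudb_cache_size';
SET SESSION tokudb_load_save_space = ON;  -- only if you are short on disk space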
Related
Sometimes I have to re-import data for a project, reading about 3.6 million rows into a MySQL table (currently InnoDB, but I am not actually limited to this engine). LOAD DATA INFILE has proved to be the fastest solution, but it has a tradeoff:
- when importing without keys, the import itself takes about 45 seconds, but the key creation takes ages (it has already been running for 20 minutes...).
- importing with the keys in place makes the import much slower
There are keys over three fields of the table, referencing numeric fields.
Is there any way to accelerate this?
Another issue is: when I terminate the process which has started a slow query, it continues running on the database. Is there any way to terminate the query without restarting mysqld?
Thanks a lot
DBa
If you're using InnoDB and bulk loading, here are a few tips:
Sort your CSV file into the primary key order of the target table: remember InnoDB uses clustered primary keys, so it will load faster if it's sorted!
typical LOAD DATA INFILE I use:
truncate <table>;
set autocommit = 0;
load data infile <path> into table <table>...
commit;
other optimisations you can use to boost load times (a combined sketch follows this list):
set unique_checks = 0;
set foreign_key_checks = 0;
set sql_log_bin=0;
split the csv file into smaller chunks
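Putting those together, a minimal sketch of a full load session; the file and table names are placeholders, and the field/line terminators are assumptions about your CSV:
set unique_checks = 0;
set foreign_key_checks = 0;
set sql_log_bin = 0;
set autocommit = 0;
truncate mytable;
load data infile '/tmp/mytable.csv' into table mytable
  fields terminated by ',' enclosed by '"' lines terminated by '\n';
commit;
set unique_checks = 1;
set foreign_key_checks = 1;
set sql_log_bin = 1;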
typical import stats I have observed during bulk loads:
3.5 - 6.5 million rows imported per min
210 - 400 million rows per hour
This blog post is almost 3 years old, but it's still relevant and has some good suggestions for optimizing the performance of "LOAD DATA INFILE":
http://www.mysqlperformanceblog.com/2007/05/24/predicting-how-long-data-load-would-take/
InnoDB is a pretty good engine. However, it relies heavily on being tuned. One thing is that if your inserts are not in the order of increasing primary keys, InnoDB can take a bit longer than MyISAM. This can easily be overcome by setting a higher innodb_buffer_pool_size. My suggestion is to set it to 60-70% of total RAM on a dedicated MySQL machine.
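As a hedged example, on a hypothetical dedicated 16 GB box (before MySQL 5.7.5 this variable is not dynamic and has to go in my.cnf instead):
SET GLOBAL innodb_buffer_pool_size = 10737418240;  -- ~10 GB, roughly 60-70% of 16 GB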
My client has a WooCommerce store with a 1.2 GB database. I know that a similar store (by product count) should be approximately 700 MB.
The biggest table is wp_posts (760 MB) alone! Which I think is strange; usually the biggest table is wp_postmeta or wp_options.
I tried to optimize this database with plugins (WP-Sweep and WP-Optimize), so there are no revisions or drafts left.
I also tried SQL:
OPTIMIZE TABLE
but the table is InnoDB, which does not support it. I get this message:
Table does not support optimize, doing recreate + analyze instead
So is it done? I mean the "recreate + analyze", or should I still do something myself? And how?
I read that with InnoDB I should dump the table and restore it, but when I do this via DBeaver I get the same size.
Any idea what I should do?
The error message is a bit misleading, because it dates back to the days when MyISAM was the default storage engine, and OPTIMIZE TABLE does a few things in MyISAM that are different from what it does in InnoDB. For example, MyISAM can't reclaim space from deleted rows until you do OPTIMIZE TABLE (whereas InnoDB does reclaim space dynamically).
InnoDB does support OPTIMIZE TABLE and it does useful things. It does basically the same as an ALTER TABLE when using the COPY algorithm. That is, it creates a new file, and copies the data row by row into the new file. This accomplishes defragmentation and rebuilding the indexes, just as if you had done a dump and restore. So you don't need to dump and restore.
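As a hedged illustration using the wp_posts table from the question, the following statements have roughly the same effect for InnoDB:
OPTIMIZE TABLE wp_posts;
-- roughly equivalent to the rebuild MySQL performs under the hood:
ALTER TABLE wp_posts ENGINE=InnoDB;  -- copies rows into a fresh tablespace
ANALYZE TABLE wp_posts;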
After OPTIMIZE TABLE, the InnoDB table may be close to the same size it was before, if there was little fragmentation.
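One hedged way to estimate how much space a rebuild could actually reclaim is to check data_free in information_schema (the schema name 'wordpress' here is an assumption):
SELECT table_name,
       ROUND(data_length / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb,
       ROUND(data_free / 1024 / 1024) AS free_mb
FROM information_schema.tables
WHERE table_schema = 'wordpress' AND table_name = 'wp_posts';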
Frankly, a table 1.2GB in size is not so large by the standards of most MySQL projects I've worked on. We start to get concerned if a table is larger than 500GB, and we start alerting developers if the table is larger than 800GB, or larger than the remaining free disk space.
We have a series of tables that have grown organically to several million rows; in production, doing an insert or update can take up to two seconds. However, if I dump the table and recreate it from the dump, queries are lightning fast.
We rebuilt one of the tables by creating a copy, rebuilding the indexes, then doing a rename switch and copying over any new rows; this worked because that table is only ever appended to. Doing this made the inserts and updates lightning quick.
My questions:
Why do inserts get slow over time?
Why does recreating the table and doing an import fix this?
Is there any way that I can rebuild indexes without locking a table for updates?
It sounds like it's either
Index unbalancing over time
Disk fragmentation
Internal InnoDB datafile(s) fragmentation
You could try ANALYZE TABLE foo, which doesn't take locks, just does a few index dives, and takes a few seconds.
If this doesn't fix it, you can use
mysql> SET PROFILING=1;
mysql> INSERT INTO foo ($testdata);
mysql> SHOW PROFILE FOR QUERY 1;
and you should see where most of the time is spent.
Apparently InnoDB performs better when inserts are done in PK order; is this your case?
InnoDB performance is heavily dependent on RAM. If the indexes don't fit in RAM, performance can drop considerably and quickly. Rebuilding the whole table improves performance because the data and indexes are now optimized.
If you are only ever inserting into the table, MyISAM is better suited for that. You won't have locking issues if only appending, since the record is added to the end of the file. MyISAM will also allow you to use MERGE tables, which are really nice for taking parts of the data offline or archiving without having to do exports and/or deletes.
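A minimal sketch of the MERGE idea, with hypothetical table names; the underlying tables must be identical MyISAM tables:
CREATE TABLE log_2009 (id INT NOT NULL, msg VARCHAR(255)) ENGINE=MyISAM;
CREATE TABLE log_2010 LIKE log_2009;
CREATE TABLE log_all (id INT NOT NULL, msg VARCHAR(255))
  ENGINE=MERGE UNION=(log_2009, log_2010) INSERT_METHOD=LAST;
-- archiving 2009 is then just a matter of removing log_2009 from the UNION list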
Updating a table requires its indices to be maintained on every write. If you are doing bulk inserts, try to do them in one transaction (as the dump and restore does). If the table is write-biased, I would think about dropping the indices anyway, or letting a background job do read-processing of the table (e.g. by copying it to an indexed one).
Track down the my.ini that is in use and increase key_buffer_size. I had a 1.5 GB table with a large key where the queries per second (all writes) were down to 17. I found it strange that in the administration panel (while the table was locked for writing to speed up the process) it was doing 200 InnoDB reads per second against 24 writes per second.
It was forced to read the index table off disk. I changed key_buffer_size from 8M to 128M, and after a restart performance jumped to 150 queries per second completed, needing only 61 reads to get 240 writes.
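For what it's worth, key_buffer_size is dynamic, so a hedged alternative to editing my.ini and restarting is:
SET GLOBAL key_buffer_size = 134217728;  -- 128M; lost on restart unless also set in my.ini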
Could it be due to fragmentation of XFS?
Copy/pasted from http://stevesubuntutweaks.blogspot.com/2010/07/should-you-use-xfs-file-system.html :
To check the fragmentation level of a drive, for example located at /dev/sda6:
sudo xfs_db -c frag -r /dev/sda6
The result will look something like so:
actual 51270, ideal 174, fragmentation factor 99.66%
That is an actual result I got from the first time I installed these utilities, previously having no knowledge of XFS maintenance. Pretty nasty. Basically, the 174 files on the partition were spread over 51270 separate pieces. To defragment, run the following command:
sudo xfs_fsr -v /dev/sda6
Let it run for a while; the -v option makes it show progress. After it finishes, try checking the fragmentation level again:
sudo xfs_db -c frag -r /dev/sda6
actual 176, ideal 174, fragmentation factor 1.14%
Much better!
Hi, I am using MySQL 5.0.x.
I have just changed a lot of the tables from MyISAM to InnoDB.
With the MyISAM tables it took about 1 minute to install our database.
With InnoDB it takes about 15 minutes to install the same database.
Why does the InnoDB take so long?
What can I do to speed things up?
The Database install does the following steps
1) Drops the schema
2) Create the schema
3) Create tables
4) Create stored procedures
5) Insert default data
6) Insert data via stored procedure
EDIT:
Inserting the default data takes most of the time.
Modify the insert-data step to start a transaction at the beginning and commit it at the end. You will get an improvement, I guarantee it. (If you have a lot of data, you might want to break the transaction up per table.)
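A minimal sketch of that change; the INSERT statements stand in for your default-data step:
START TRANSACTION;
-- ... all the default-data INSERT statements ...
COMMIT;
-- with a lot of data, consider committing after each table instead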
If your application does not use transactions at all, then you should set the parameter innodb_flush_log_at_trx_commit to 2. This will give you a lot of performance back, because you will almost certainly have autocommit enabled, and this generates a lot more transactions than InnoDB's default parameters are configured for. This setting stops it unnecessarily flushing the disk buffers on every commit.
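That setting is dynamic, so as a hedged sketch (it trades roughly a second of durability on a crash for speed):
SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- write log to OS cache on commit, fsync about once per second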
15 minutes doesn't seem excessive to me. After all, it's a one-time cost.
I'm not certain, but I would imagine that part of the explanation is that referential integrity isn't free. InnoDB has to do more work to guarantee it, so of course it takes more time.
Maybe your script needs to be altered to add constraints after the tables are created.
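A hedged sketch of deferring a constraint, with hypothetical table and column names:
-- create the tables without foreign keys, load the data, then:
ALTER TABLE child
  ADD CONSTRAINT fk_child_parent FOREIGN KEY (parent_id) REFERENCES parent (id);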
Like duffymo said, disable your constraints (indexes and foreign/primary keys) before inserting the data.
Maybe you should restore some indexes before the data is inserted via the stored procedure, if it uses a lot of SELECT statements.