MySQL change engine big millions rows table - mysql

I want to change engine of 2 million rows table from MyISAM to InnoDB. I am afraid of this long time operation, so I created similar structure InnoDB table and now I want to copy all data from old one to this new one. What is the fastest way? SELECT INSERT? What about START TRANSACTION? Please, help. I dont want to hang my server.

Do yourself a favor: copy the whole setup to your local machine and try it all out there. You'll have a much better idea of what you are getting into. Just be aware of potential differences in hardware between your production server and your local machine.
The fastest way is probably the most straightforward way:
INSERT INTO table2 SELECT * FROM table1;

I suspect that you cannot do it any faster than what is built into the ALTER. And it does have to copy over all the data and rebuild all the indexes.
Be sure to have innodb_buffer_pool_size raised to prepare for InnoDB. And lower key_buffer_size to allow room. Suggest 35% and 12% of RAM, respectively, for the transition. After all tables are converted, suggest 70% and a mere 20MB.
One slight speedup is to do some select that fetches the entire table and the entire PRIMARY KEY (if it can be cached). This will do some I/O before really starting. Example: SELECT avg(id) FROM tbl where id is the primary key. And SELECT avg(foo) FROM tbl where foo is not indexed but it numeric. These will force a full scan of the PK index and the data, thereby caching the stuff that the ALTER will have to read.
Other tips on converting: http://mysql.rjweb.org/doc.php/myisam2innodb .

Related

Create a table both in-memory and transaction-safe in MySQL

I know I should use engine=MEMORY to make the table in memory and engine=INNODB to make the table transaction safe. However, how can I achieve both objectives? I tried engine=MEMORY, INNODB, but I failed. My purpose is to access tables fast and allow multiple threads to change contents of tables.
You haven't stated your goals above. I guess you're looking for good performance, and you also seem to want the table to be transactional. Your only option really is InnoDB. As long as you have configured InnoDB to use enough memory to hold your entire table (with innodb_buffer_pool_size), and there is not excessive pressure from other InnoDB tables on the same server, the data will remain in memory. If you're concerned about write performance (and again barring other uses of the same system) you can reduce durability to drastically increase write performance by setting innodb_flush_log_at_trx_commit = 0 and disabling binary logging.
Using any sort of triggers with temporary tables will be a mess to maintain, and won't give you any benefits of transactionality on the temporary tables.
You are asking for a way to create the table with 2 (or more) engines, that is not possible with mysql.
However, I will guess that you want to use memory because you don't think innodb will be fast enough for your need. I think innodb is pretty fast and will be probably enough, but if you really need it, I think you should try creating 2 tables:
table1 memory <-- here is where you will make all the SELECTs
table2 innodb <-- here you will make the UPDATE, INSERT, DELETE, etc and add a TRIGGER so when this one is updated, the table1 gets the same modification.
as i know the there are two ways
1st way
create a temp table as ( these are stored in memory with a small diff they will get deleted as the session is logged out )
create temporary table sample(id int) engine=Innodb;
2nd way
you have to create two tables one with memory engine and other with innodb or bdb
first insert all the data into your innodb table and then trigger the data to be copied into memory table
and if you want to empty the data in the innodb table you can do it with same trigger
you can achieve this using events also

Speed up tokudb "alter table ... engine=TokuDB”

I'm trying to convert a 400 million row Innodb table to the tokudb engine. When I start with "alter table ... engine=TokuDB" things run really fast in the beginning, (Using SHOW PROCESSLIST) I see it reading in about 1 million rows every 10 seconds. But once I reach about 19-20 million rows, it starts to slow reading and is more like 10k rows every few seconds.
Are there any mysql or tokudb variables that affect the speed of which an ALTER TABLE to tokudb works? I tried the tmp_table_size and some others but can't seem to get past that hurdle.
Any ideas?
The solution to this for me was to export "into outfile" and import "load data infile"
This was several orders of magnitude faster for me (110 million records). Everytime I modify a large tokudb database (alter table) it takes forever (~30k/sec).
It has been quicker to full export and import (~500k/sec)
Dropped alter table times from hours to minutes.
This is true when either converting from innodb or altering native tokudb (any alter table).
select a.*,calcfields from table1 a into outfile 'temp.txt';
create table table2 .....<br>
load data infile 'temp.txt' into table table2 (field1,field2,...);
ps: experiment with the create table with row_format=tokudb_lzma or tokudb_uncompressed). You can try 3 ways pretty quick (you need to do an OS level directory ls to see size). I find offline indexes help too.
set tokudb_create_index_online=off;
create clustering index field1 on table2(field1); (much faster)
Multiple Clustering indexes can make a world of difference when you learn when to use them.
I was using GUI tools that alter table for index changes (waiting hours each time)
Hand doing this make things far more productive (I had spent days going nowhere via GUI, to done in 30 min)
Using 5.5.30-tokudb-7.0.1-MariaDB and VERY HAPPY.
Hopefully this can help others when experimenting. Obviously very late for original asker.
The only existing response was not constructive at all for me. (The question was)
Here are the important variables, make sure they are set globally prior to starting the operation or locally within the session executing the storage engine change:
tokudb_load_save_space : default is off and should be left alone unless you are low on disk space.
tokudb_cache_size : if unset the TokuDB will allocate 50% of RAM for it's own caching mechanism, we generally recommend leaving this setting alone. As you are running on an existing server you need to make sure that you aren't over-committing memory between TokuDB, InnoDB, and MyISAM.

Inserting New Column in MYSQL taking too long

We have a huge database and inserting a new column is taking too long. Anyway to speed up things?
Unfortunately, there's probably not much you can do. When inserting a new column, MySQL makes a copy of the table and inserts the new data there. You may find it faster to do
CREATE TABLE new_table LIKE old_table;
ALTER TABLE new_table ADD COLUMN (column definition);
INSERT INTO new_table(old columns) SELECT * FROM old_table;
RENAME table old_table TO tmp, new_table TO old_table;
DROP TABLE tmp;
This hasn't been my experience, but I've heard others have had success. You could also try disabling indices on new_table before the insert and re-enabling later. Note that in this case, you need to be careful not to lose any data which may be inserted into old_table during the transition.
Alternatively, if your concern is impacting users during the change, check out pt-online-schema-change which makes clever use of triggers to execute ALTER TABLE statements while keeping the table being modified available. (Note that this won't speed up the process however.)
There are four main things that you can do to make this faster:
If using innodb_file_per_table the original table may be highly fragmented in the filesystem, so you can try defragmenting it first.
Make the buffer pool as big as sensible, so more of the data, particularly the secondary indexes, fits in it.
Make innodb_io_capacity high enough, perhaps higher than usual, so that insert buffer merging and flushing of modified pages will happen more quickly. Requires MySQL 5.1 with InnoDB plugin or 5.5 and later.
MySQL 5.1 with InnoDB plugin and MySQL 5.5 and later support fast alter table. One of the things that makes a lot faster is adding or rebuilding indexes that are both not unique and not in a foreign key. So you can do this:
A. ALTER TABLE ADD your column, DROP your non-unique indexes that aren't in FKs.
B. ALTER TABLE ADD back your non-unique, non-FK indexes.
This should provide these benefits:
a. Less use of the buffer pool during step A because the buffer pool will only need to hold some of the indexes, the ones that are unique or in FKs. Indexes are randomly updated during this step so performance becomes much worse if they don't fully fit in the buffer pool. So more chance of your rebuild staying fast.
b. The fast alter table rebuilds the index by sorting the entries then building the index. This is faster and also produces an index with a higher page fill factor, so it'll be smaller and faster to start with.
The main disadvantage is that this is in two steps and after the first one you won't have some indexes that may be required for good performance. If that is a problem you can try the copy to a new table approach, using just the unique and FK indexes at first for the new table, then adding the non-unique ones later.
It's only in MySQL 5.6 but the feature request in http://bugs.mysql.com/bug.php?id=59214 increases the speed with which insert buffer changes are flushed to disk and limits how much space it can take in the buffer pool. This can be a performance limit for big jobs. the insert buffer is used to cache changes to secondary index pages.
We know that this is still frustratingly slow sometimes and that a true online alter table is very highly desirable
This is my personal opinion. For an official Oracle view, contact an Oracle public relations person.
James Day, MySQL Senior Principal Support Engineer, Oracle
usually new line insert means that there are many indexes.. so I would suggest reconsidering indexing.
Michael's solution may speed things up a bit, but perhaps you should have a look at the database and try to break the big table into smaller ones. Take a look at this: link. Normalizing your database tables may save you loads of time in the future.

Generating a massive 150M-row MySQL table

I have a C program that mines a huge data source (20GB of raw text) and generates loads of INSERTs to execute on simple blank table (4 integer columns with 1 primary key). Setup as a MEMORY table, the entire task completes in 8 hours. After finishing, about 150 million rows exist in the table. Eight hours is a completely-decent number for me. This is a one-time deal.
The problem comes when trying to convert the MEMORY table back into MyISAM so that (A) I'll have the memory freed up for other processes and (B) the data won't be killed when I restart the computer.
ALTER TABLE memtable ENGINE = MyISAM
I've let this ALTER TABLE query run for over two days now, and it's not done. I've now killed it.
If I create the table initially as MyISAM, the write speed seems terribly poor (especially due to the fact that the query requires the use of the ON DUPLICATE KEY UPDATE technique). I can't temporarily turn off the keys. The table would become over 1000 times larger if I were to and then I'd have to reprocess the keys and essentially run a GROUP BY on 150,000,000,000 rows. Umm, no.
One of the key constraints to realize: The INSERT query UPDATEs records if the primary key (a hash) exists in the table already.
At the very beginning of an attempt at strictly using MyISAM, I'm getting a rough speed of 1,250 rows per second. Once the index grows, I imagine this rate will tank even more.
I have 16GB of memory installed in the machine. What's the best way to generate a massive table that ultimately ends up as an on-disk, indexed MyISAM table?
Clarification: There are many, many UPDATEs going on from the query (INSERT ... ON DUPLICATE KEY UPDATE val=val+whatever). This isn't, by any means, a raw dump problem. My reasoning for trying a MEMORY table in the first place was for speeding-up all the index lookups and table-changes that occur for every INSERT.
If you intend to make it a MyISAM table, why are you creating it in memory in the first place? If it's only for speed, I think the conversion to a MyISAM table is going to negate any speed improvement you get by creating it in memory to start with.
You say inserting directly into an "on disk" table is too slow (though I'm not sure how you're deciding it is when your current method is taking days), you may be able to turn off or remove the uniqueness constraints and then use a DELETE query later to re-establish uniqueness, then re-enable/add the constraints. I have used this technique when importing into an INNODB table in the past, and found even with the later delete it was overall much faster.
Another option might be to create a CSV file instead of the INSERT statements, and either load it into the table using LOAD DATA INFILE (I believe that is faster then the inserts, but I can't find a reference at present) or by using it directly via the CSV storage engine, depending on your needs.
Sorry to keep throwing comments at you (last one, probably).
I just found this article which provides an example of a converting a large table from MyISAM to InnoDB, while this isn't what you are doing, he uses an intermediate Memory table and describes going from memory to InnoDB in an efficient way - Ordering the table in memory the way that InnoDB expects it to be ordered in the end. If you aren't tied to MyISAM it might be worth a look since you already have a "correct" memory table built.
I don't use mysql but use SQL server and this is the process I use to handle a file of similar size. First I dump the file into a staging table that has no constraints. Then I identify and delete the dups from the staging table. Then I search for existing records that might match and put the idfield into a column in the staging table. Then I update where the id field column is not null and insert where it is null. One of the reasons I do all the work of getting rid of the dups in the staging table is that it means less impact on the prod table when I run it and thus it is faster in the end. My whole process runs in less than an hour (and actually does much more than I describe as I also have to denormalize and clean the data) and affects production tables for less than 15 minutes of that time. I don't have to wrorry about adjusting any constraints or dropping indexes or any of that since I do most of my processing before I hit the prod table.
Consider if a simliar process might work better for you. Also could you use some sort of bulk import to get the raw data into the staging table (I pull the 22 gig file I have into staging in around 16 minutes) instead of working row-by-row?

MySQL ALTER TABLE on very large table - is it safe to run it?

I have a MySQL database with a MyISAM table with 4 million rows. I update this table about once a week with about 2000 new rows. After updating, I then alter the table like this:
ALTER TABLE x ORDER BY PK DESC
I order the table by the primary key field in descending order. This has not given me any problems on my development machine (Windows with 3GB memory). Three times I have tried it successfully on the production Linux server (with 512MB RAM - and achieving the resulted sorted table in about 6 minutes each time), the last time I tried it I had to stop the query after about 30 minutes and rebuild the database from a backup.
Can a 512MB server cope with that alter statement on such a large table? I have read that a temporary table is created to perform the ALTER TABLE command.
Question: Can this alter command be safely run? What should be the expected time for the alteration of the table?
As I have just read, the ALTER TABLE ... ORDER BY ... query is useful to improve performance in certain scenarios. I am surprised that the PK Index does not help with this. But, from the MySQL docs, it seems that InnoDB does use the index. However InnoDB tends to be slower as MyISAM. That said, with InnoDB you wouldn't need to re-order the table but you would lose the blazing speed of MyISAM. It still may be worth a shot.
The way you explain the problems, it seems that there is too much data loaded into memory (maybe there is even swapping going on?). You could easily check that with monitoring your memory usage. It's hard to say as I do not know MySQL all that well.
On the other hand, I think your problem lies at a very different place: You are using a machine with only 512 Megs of RAM as Database server with a table containing more than 4Mio rows... And you are performing a very memory-heavy operation on the whole table on that machine. It seems that 512Megs will not nearly be enough for that.
A much more fundamental issue I am seeing here: You are doing development (and quite likely testing as well) in an environment that is very different to the production environment. The kind of problem you are explaining is to be expected. Your development machine has six times as much memory as your production machine. I believe I can safely say, that the processor is much faster as well. In that case, I suggest you create a virtual machine mimicking your production site. That way you can easily test your project without disrupting the production site.
What you're asking it to do is rebuild the entire table and all its indexes; this is an expensive operation particularly if the data doesn't fit in ram. It will complete, but it will be vastly slower if the data doesn't fit in ram, particularly if you have lots of indexes.
I question your judgement when choosing to run a machine with such tiny memory in production. Anyway:
Is this ALTER TABLE really necessary; what specific query are you trying to speed up, and have you tried it without?
Have you considered making your development machine more like production? I mean, using a dev box with MORE memory is never a good idea, and using a different OS is definitely not either.
There is probably also some tuning you can do to try to help; it largely depends on your schema (indexes in particular). 4M rows is not very many (for a machine with normal amounts of ram).
is the primary key auto_increment? if so, then doing ALTER TABLE ... ORDER BY isn't going to improve anything since everything will be inserted in order anyway.
(unless you have lots of deletes)
I'd probably create a View instead which is ordered by the PK value, so that for one thing you don't need to lock up that huge table while the ALTER is being performed.
If you're using InnoDB, you shouldn't have to explicitly perform the ORDER BY either post-insert or at query time. According to the MySQL 5.0 manual, InnoDB already defaults to primary key ordering for query results:
http://dev.mysql.com/doc/refman/5.0/en/alter-table.html#id4052480
MyISAM tables return records in insertion order by default, instead, which may work as well if you only ever append to the table, rather than using an UPDATE query to modify any rows in-place.