I have a table with 30M rows, and I need to create an INDEX on one of its columns. What would be the fastest way to do this? The two options I have considered are truncating the table, adding the index, and then re-importing the data from a CSV file, or simply using an ALTER TABLE statement.
What should I do for the fastest performance?
The fastest way would be to use ALTER TABLE. If you truncate, alter, and re-import, you have to wait for the truncate, then for the re-import, and the re-import is itself slowed down by maintaining the new index as the rows arrive. With just the ALTER, the only cost is building the new index, so you skip the truncate and import time entirely.
However, with 30M rows, building the index could take some time and may time out. If this happens you will need to increase the timeouts (I don't use MySQL, so I can't tell you exactly how). If that doesn't work, you may have no choice but to take the truncate-and-re-import route, ideally using some form of bulk upload.
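To illustrate the ALTER route, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. This is an assumption: SQLite has no ALTER TABLE ... ADD INDEX, so CREATE INDEX plays that role here, and in MySQL CREATE INDEX is itself mapped to an ALTER TABLE. The table `events`, column `user_id`, and index name are hypothetical. The existing rows stay where they are, and only the index is built.

```python
import sqlite3


def add_index_in_place(conn: sqlite3.Connection) -> list:
    """Build an index over rows that are already loaded (no truncate/re-import)."""
    # Hypothetical names; the MySQL equivalent would be roughly:
    #   ALTER TABLE events ADD INDEX idx_events_user_id (user_id);
    conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")
    # PRAGMA index_list returns (seq, name, unique, origin, partial) per index.
    return [row[1] for row in conn.execute("PRAGMA index_list('events')")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)")
# The data already exists before the index is added, as in the question.
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    ((i % 1000, f"row {i}") for i in range(100_000)),
)
conn.commit()

indexes = add_index_in_place(conn)
print(indexes)  # ['idx_events_user_id']
```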
Is there a way to perform ALTER TABLE in MySQL, telling the server to skip creating a backup of the table first? I have a backup of the table already and I'm doing some tests on it (adding indexes), so I don't care if the table gets corrupted in the process. I'll just restore it from the backup. But what I do care about is for the ALTER TABLE to finish quickly, so I can see the test results.
Given that I have a big MyISAM table (700 GB), it really isn't an option to wait a couple of hours for MySQL to finish creating a backup of the original table before it actually adds an index.
It's not doing a backup; it is building the new version. (The existing table serves as a backup in case of a crash.)
With InnoDB, there are many flavors of ALTER TABLE, some of which take essentially zero time regardless of the size of the table. MyISAM (mostly) does it the brute-force way: create an empty table with the new schema; copy all the data and build all the indexes; swap the tables. For some alters, InnoDB must also use the brute-force way, for example when changing the PRIMARY KEY.
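Because the brute-force path copies the whole table on every ALTER, one practical consequence for a 700 GB MyISAM table is to put all the indexes you want to test into a single ALTER TABLE statement. MySQL accepts multiple ADD clauses in one ALTER, so the table should be copied once instead of once per index. The sketch below is a hypothetical Python helper that only builds the statement string. Table and index names are illustrative, and identifiers are assumed to be trusted.

```python
def build_add_indexes(table: str, indexes: dict) -> str:
    """Combine several ADD INDEX clauses into one MySQL ALTER TABLE statement.

    `indexes` maps a (hypothetical) index name to its list of column names.
    Identifiers are backtick-quoted but not validated: pass trusted names only.
    """
    clauses = []
    for name, cols in indexes.items():
        col_list = ", ".join(f"`{c}`" for c in cols)
        clauses.append(f"ADD INDEX `{name}` ({col_list})")
    # One statement, so the brute-force table copy happens once.
    return f"ALTER TABLE `{table}` " + ", ".join(clauses)


print(build_add_indexes("big_table", {"idx_a": ["a"], "idx_b_c": ["b", "c"]}))
# ALTER TABLE `big_table` ADD INDEX `idx_a` (`a`), ADD INDEX `idx_b_c` (`b`, `c`)
```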
I have a table in my MySQL database with around 5M rows. Inserting rows into the table is too slow because MySQL updates the indexes on every insert. How can I stop the index updates while inserting and build the indexes separately later?
It sounds like your table might be over-indexed. Maybe post your table definition here so we can have a look.
You have two choices:
Keep the current indexes but remove unused ones. If a table has 3 indexes, every single write to the table also means 3 writes to the indexes. An index only helps reads, so you may want to remove the ones no query uses. During a load, every index has to be updated, which slows the load down.
Drop your indexes before the load, then recreate them after it. Be aware that the rebuild might take longer than the slow inserts would have, and you will have to rebuild every index you dropped. Also, recreating a unique index will fail if duplicates were loaded while the index was absent.
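The drop, load, and rebuild cycle, including the unique-index pitfall, can be sketched with Python's built-in sqlite3 module standing in for MySQL. This is an assumption for illustration only: the table `items`, column `sku`, and index name are hypothetical, and MySQL's own timings will differ. The function returns False when the unique index cannot be rebuilt because the load introduced a duplicate.

```python
import sqlite3


def bulk_load_without_indexes(conn: sqlite3.Connection, rows) -> bool:
    """Drop the secondary index, bulk insert, then rebuild the index.

    Returns True if the UNIQUE index could be recreated, False if the
    loaded data contains duplicates that violate it.
    """
    conn.execute("DROP INDEX IF EXISTS idx_items_sku")
    with conn:  # the whole load runs in one transaction, no index upkeep
        conn.executemany("INSERT INTO items (sku, qty) VALUES (?, ?)", rows)
    try:
        with conn:
            conn.execute("CREATE UNIQUE INDEX idx_items_sku ON items (sku)")
        return True
    except sqlite3.IntegrityError:
        # Duplicates slipped in while the unique index was missing.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
conn.execute("CREATE UNIQUE INDEX idx_items_sku ON items (sku)")

ok = bulk_load_without_indexes(conn, [(f"SKU{i}", i) for i in range(10_000)])
dup = bulk_load_without_indexes(conn, [("SKU1", 5)])  # repeats an existing sku
print(ok, dup)  # True False
```

In the second call the duplicate row is already committed when the rebuild fails, which is exactly the cleanup problem mentioned above.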
I suggest you take a good look at the indexes on the table and drop any that are not used by your queries. Then try both approaches and see what works for you. As far as I know, MySQL cannot simply switch indexes off in general, because the inserted values have to be written into the index structures (MyISAM's ALTER TABLE ... DISABLE KEYS is a partial exception that covers non-unique indexes only).
Another thing you might try is to split the I/O over multiple drives, i.e. partition your table across several drives to gain some hardware performance.
I am importing a lot of data (something like 75 million inserts) into a MySQL database with a few different tables.
I have indexes on a lot of columns. Should I remove them while I do the inserts and just add them back after it is done? Will that have a significant impact on performance?
I get the feeling the import has slowed down now that I have imported a few hundred thousand records, and I suspect the indexes might be the cause.
Would any more information be useful?
There is no yes/no answer for that.
The most important point: if the table is used while importing, do not disable the indices. Just imagine a few simple queries falling back to a full table scan after you have inserted 74 million records.
Closely related: can you make sure the table is not needed after it has been filled but before the indices have been rebuilt?
If you can do the insert on a completely "cold" table, I'd drop the indices and rebuild them later.
Yes, certainly. You need to remove the indexes before importing a large number of records. Otherwise the import will not only take a long time, it will also leave the existing indexes heavily fragmented, and you will have to rebuild them anyway to restore optimal performance.
If you remove the indexes before the import, the import will be faster. After the import, create the indexes again: they will be built fresh, with no fragmentation, so search performance will be better as well.
I have a database with static tables that need to be updated weekly from CSV files.
The tables are MySQL MyISAM, and by static I mean they are read-only (except when updated from CSV, obviously).
There are about 50 tables, with about 200 MB of data in total to be reloaded weekly.
I can think of 3 ways:
Truncate table
Load data from files
Or
For each table create a temporary table
Load data there
Truncate (or delete rows?) original table
Insert into original table select * from temporary table.
Or
Create table_new and load data there
Rename original table to table_old (or drop table altogether)
Rename table_new into original table
What do you reckon is the most efficient way?
Have you considered using mysqlimport? You can read about it here:
http://dev.mysql.com/doc/refman/5.1/en/mysqlimport.html
I probably wouldn't drop the original tables, because then you have to re-create all your foreign keys, indexes, constraints, etc., which is a mess and a maintenance nightmare. Renaming tables can also cause problems (e.g. if you have synonyms for the tables; I'm not sure whether MySQL has synonyms, though).
What I would do, however, is disable the keys before loading the data and re-enable them afterwards:
ALTER TABLE tbl_name DISABLE KEYS;
-- load the data, then:
ALTER TABLE tbl_name ENABLE KEYS;
In other words, while loading the data you don't want MySQL updating the indexes row by row, because that slows the load down. You want the indexes rebuilt once the load has completed.
So I think by combining mysqlimport with the tip above, you should be able to get a really efficient load.
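A sketch of the full per-table statement sequence, written as a hypothetical Python helper that only builds MySQL statement strings. Several points are assumptions: that the table is emptied first with TRUNCATE (the question's first option), that the CSV is comma-separated, optionally double-quoted, and has one header line, and that the path is trusted. mysqlimport is a command-line interface to LOAD DATA, so the LOAD DATA statement is roughly what it runs for you. Note that DISABLE KEYS only suspends updates to non-unique indexes, and only on MyISAM tables.

```python
def weekly_reload_statements(table: str, csv_path: str) -> list:
    """MySQL statements to reload one MyISAM table from a CSV file.

    DISABLE KEYS stops MySQL from updating the table's non-unique indexes
    (MyISAM only); ENABLE KEYS rebuilds them after the load. The CSV layout
    and the trusted, unescaped path are assumptions.
    """
    return [
        f"TRUNCATE TABLE `{table}`",
        f"ALTER TABLE `{table}` DISABLE KEYS",
        (
            f"LOAD DATA LOCAL INFILE '{csv_path}' INTO TABLE `{table}` "
            "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
            "LINES TERMINATED BY '\\n' IGNORE 1 LINES"
        ),
        f"ALTER TABLE `{table}` ENABLE KEYS",
    ]


for stmt in weekly_reload_statements("products", "/tmp/products.csv"):
    print(stmt + ";")
```

The statements would then be executed in order through whatever client or driver you already use.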
You could always do INSERT INTO ... ON DUPLICATE KEY UPDATE ... or REPLACE INTO .... You shouldn't get any downtime (between a TRUNCATE and the INSERT), and there's very little chance of corruption.
Be careful with REPLACE, since it will actually delete each record and re-insert it, firing any triggers you may have (unlikely in this case), but also giving you a new ID if you have an auto-increment field.
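As a sketch of the non-REPLACE variant, the hypothetical helper below builds an INSERT ... ON DUPLICATE KEY UPDATE statement. Rows whose key already exists are updated in place rather than deleted and re-inserted, so an existing AUTO_INCREMENT id is kept. It assumes `key` is a PRIMARY KEY or UNIQUE column and uses the %s placeholder style accepted by common Python MySQL drivers such as PyMySQL. VALUES(col) is deprecated as of MySQL 8.0.20 in favour of a row alias, but it is still accepted.

```python
def build_upsert(table: str, key: str, columns: list) -> str:
    """Build a MySQL INSERT ... ON DUPLICATE KEY UPDATE with %s placeholders.

    `key` is assumed to be a PRIMARY KEY or UNIQUE column; only `columns`
    are overwritten when a row with the same key already exists.
    """
    cols = [key] + columns
    col_list = ", ".join(f"`{c}`" for c in cols)
    placeholders = ", ".join(["%s"] * len(cols))
    # VALUES(col) = the value this INSERT would have written for col.
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in columns)
    return (
        f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


print(build_upsert("prices", "sku", ["price", "stock"]))
# INSERT INTO `prices` (`sku`, `price`, `stock`) VALUES (%s, %s, %s) ON DUPLICATE KEY UPDATE `price` = VALUES(`price`), `stock` = VALUES(`stock`)
```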
Your third option is the best: you can LOCK and DISABLE KEYS on the _new table while importing, which makes it extra quick. You can even do a "batch atomic rename" of all your new tables to the current ones, with zero downtime, even if they have relations between them.
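The batch atomic rename can be expressed as a single RENAME TABLE statement listing every pair. MySQL performs the renames left to right, and the statement as a whole is atomic, so readers never see a mix of old and new tables. The hypothetical helper below assumes a `_new`/`_old` naming convention and that no `<table>_old` tables exist yet (drop them beforehand if they do).

```python
def build_atomic_swap(tables: list) -> str:
    """One MySQL RENAME TABLE that swaps each `<t>_new` into place as `<t>`.

    For each table, the current copy becomes `<t>_old` first, then `<t>_new`
    takes its name. The suffixes are a naming assumption; any existing
    `<t>_old` tables must be dropped before running the statement.
    """
    pairs = []
    for t in tables:
        pairs.append(f"`{t}` TO `{t}_old`")   # keep the previous data
        pairs.append(f"`{t}_new` TO `{t}`")   # freshly loaded copy goes live
    return "RENAME TABLE " + ", ".join(pairs)


print(build_atomic_swap(["customers", "orders"]))
# RENAME TABLE `customers` TO `customers_old`, `customers_new` TO `customers`, `orders` TO `orders_old`, `orders_new` TO `orders`
```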
I'm assuming the whole tables are contained in the weekly CSV updates (i.e. the updates are not incremental).
I would prefer the 3rd method and also keep the old table.
CREATE TABLE tbl_new LIKE tbl; -- then load the CSV data into tbl_new
DROP TABLE IF EXISTS tbl_old;
RENAME TABLE tbl TO tbl_old, tbl_new TO tbl;
The advantage of this method is that it is fast and safe, with less impact on readers. Creating the new table does not affect reads on the existing table. The rename operation is fast (just a file rename in the case of MyISAM), so the downtime is minimal and clients are barely affected. You also get to keep the old data in case something is wrong with the new data.
As you are not going to update the tables online, I think it would be a good idea to compress them with myisampack.
I have a database full of time-sensitive data, so on a daily basis I truncate the table and then import the new data (from a merge of other databases) into the truncated table.
Currently I am running OPTIMIZE on the table after I have imported the daily refresh of data.
However, looking at the mysql OPTIMIZE syntax page
http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html
it says I can optimize to reclaim unused space and defrag the data.
So should I be running OPTIMIZE twice?
Once when I delete the data, and then again after I've reinserted it?
or just once?
and if just once, should it be after loading the new data?
or after clearing out the old?
It may depend on whether you are using MyISAM or InnoDB tables, but I would run OPTIMIZE after truncating the table. This should ensure the space is reclaimed, and it will run very quickly.
When you insert your batch of data it should all insert in order and not be fragmented anyway, and since it's a fresh insert there will be no space to reclaim. If it's a small dataset it may not matter too much, but on a large dataset doing the OPTIMIZE after the insert could also be quite slow.
Just once is fine, after you've imported the new data.
After deleting or updating a set of data in your database, you can use the OPTIMIZE TABLE command to reclaim the unused space and defragment the table.
There is no need to run OPTIMIZE twice; run it once, after all the DML operations have finished.