I have 1 million rows in MySql table "temp" and wish to multiply column "t" (int unsigned, indexed) by 1000.
mysql> update temp set t=1000*t;
This process takes 25 seconds. The same statement on not-indexed column takes 10 seconds;
Any ideas how to make this process faster? I have to apply this on over 1e5 tables.
You can turn indexing off and back on after the updates are done
ALTER TABLE tbl_name DISABLE KEYS;
ALTER TABLE tbl_name ENABLE KEYS;
Or if you are using myISAM
You can use the delay_key_write flag. You can set it per-table, or
globally. You can use the "FLUSH TABLE mytable" command to force mysql
to update the on-disk copy of the indexes.
http://dev.mysql.com/doc/mysql/en/create-table.html
http://dev.mysql.com/doc/mysql/en/myisam-start.html
http://dev.mysql.com/doc/mysql/en/flush.html
Indexing has nothing to do with the problem here. Think about what you're doing - you're mutating all the rows in your table, so no matter how you select them and if you have an index on t or not, you're still scanning the whole table.
The UPDATE operation == IO is what your bottleneck is. Get faster disks.
If you're using InnoDB, my only advice would be to see if tweaking innodb_flush_log_at_trx_commit and setting it to 2 helps your performance but I doubt it as it's just 1 query. Disabling keys and re-enabling them after UPDATE won't work in InnoDB.
Related
I have a DB schema composed of MYISAM tables, i am interested to delete old records from time to time from some of the tables.
I know that delete does not reclaim the memory space, but as i found in a description of DELETE command, inserts may reuse the space deleted
In MyISAM tables, deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions.
I am interested if LOAD DATA command also reuses the deleted space?
UPDATE
I am also interested how the index space reclaimed?
UPDATE 2012-12-03 23:11
some more info supplied based on the answer received from #RolandoMySQLDBA
after executing the following suggested query i got different results for different tables for which space need to be reused or reclaimed:
SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable1';
> Dynamic
SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable2';
> Fixed
UPDATE 2012-12-09 08:06
LOAD DATA do reuses previously deleted space (i have checked it by running a short script) if and only if the row format is fixed or (the row format is dynamic and there is a deleted row with exactly the same size).
it seems that if the row_format is dynamic, full look-up over the deleted list is made for each record , and if the exact row size is not found , the deleted record is not used, and the table memory usage will raise, additionally LOAD DATA will take much more time to import records.
I will except the answer given here , since it describes all the process perfectly.
For a MySQL table called mydb.mytable just run the following:
OPTIMIZE TABLE mydb.mytable;
You could also do this in stages:
CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
ALTER TABLE mydb.mytable_old;
ANALYZE TABLE mydb.mytable;
In either case, the table ends up with no fragmentation.
Give it a Try !!!
UPDATE 2012-12-03 12:50 EDT
If you are concerned whether or not rows are reused upon bulk INSERTs via LOAD DATA INFILE, please note the following:
When you created the MyISAM table, I assumed the default row format would be dynamic. You can check what it is with either
SHOW CREATE TABLE mydb.mytable\G
or
SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable';
Since the row format of your table is Dynamic, the fragmented rows are of various sizes. The MyISAM storage engine would have keep checking for the row length of each deleted to see if the next set of data being insert will fit. If the incoming data cannot fit in any of the deleted rows, then the new row data is appended.
The presence of such rows can make myisamchk struggle.
This is why I recommended running OPTIMIZE TABLE. That way, data would be appended quicker.
UPDATE 2012-12-03 12:58 EDT
Here is something interesting you can also do: Try setting concurrent_insert to 2. That way, you are always appending to a MyISAM table without checking for gaps in the table. This will speed up INSERTs dramatically but leave all known gaps alone.
You could still defragment your table at your earliest convenience using OPTIMIZE TABLE.
UPDATE 2012-12-03 13:40 EDT
Why don't run the my second sugesstion
CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
ANALYZE TABLE mydb.mytable;
This will give you an idea
How long OPTIMIZE TABLE would take to run
How much smaller the .MYD and .MYI would be after running OPTIMIZE TABLE
After you run my second suggestion, you can compare them with
SELECT
A.mydsize,B.mydsize,A.mydsize - B.mydsize myd_diff,
A.midsize,B.myisize,A.myisize - B.myisize myi_diff
FROM
(
SELECT data_length mydsize,index_length myisize
FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable'
) A,
(
SELECT data_length mydsize,index_length myisize
FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable_new'
) B;
UPDATE 2012-12-03 16:42 EDT
Any table whose ROW_FORMAT is set to fixed has the luxury of allocating the same length row every time. If MyISAM tables maintain a list of deleted rows, the very first row in the list should always be selected as the next row to insert data. There would be no need to traverse a whole list until a suitable row gaps with sufficient length is found. Each deleted row is quickly appended after a DELETE. Each INSERT would pick the first row of the deleted rows.
We can assume these things because MyISAM tables can do concurrent inserts. In order for this feature to be available via the concurrent_insert option, INSERTs into a MyISAM table must be able to detect one of three(3) things:
The presence of a list of deleted rows, thus choosing from the list
Row_Format=Dynamic : list of deleted rows with each row with a different length
Row_Format=Fixed : list of deleted rows with all rows the same length
The absence of a list of deleted rows, thus appending
Bypass checking for the presence of a list of deleted rows (set concurrent_insert to 2)
For detection #1 to be the fastest possible, a MyISAM table's row_format must be Fixed. If it is Dynamic, it is very possible that a list traversal is necessary.
I'm trying to speed up bulk insert in an InnoDB table by temporary disabling its indexes:
ALTER TABLE mytable DISABLE KEYS;
But it gives a warning:
+-------+------+-------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------+
| Note | 1031 | Table storage engine for 'mytable' doesn't have this option |
+-------+------+-------------------------------------------------------------+
1 row in set (0.00 sec)
How can we disable the indexes?
What alternatives are there to avoid using the index when doing bulk inserts?
How can we speed up the process?
Have you tried the following?
SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
From the MySQL References https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-bulk-data-loading.html
See Section "Bulk Data Loading Tips"
There is a very good reason why you cannot execute DISABLE KEYS on an InnoDB table; InnoDB is not designed to use it, and MyISAM is.
In fact, here is what happens when you reload a mysqldump:
You will see a CREATE TABLE for a MyISAM table following by a write lock.
Before all the bulk inserts are run, a call to ALTER TABLE ... DISABLE KEYS is done.
What this does is turn off secondary indexes in the MyISAM table.
Then, bulk inserts are done. While this is being done, the PRIMARY KEY and all UNIQUE KEYS in the MyISAM table are being rebuilt. Before the UNLOCK TABLEs, a call ALTER TABLE ... ENABLE KEYS is done in order to rebuild all non-unique indexes linearly.
IMHO this operation was not coded into the InnoDB Storage Engine because all keys in a non-unique index come with the primary key entry from gen_clust_index (aka Clustered Index). That would be a very expensive operation since building a non-unique index would require O(n log n) running time to retrieve each unique key to attach to a non-unique key.
In light of this, posting a warning about trying to DISABLE KEYS/ENABLE KEYS on an InnoDB table is far easier than coding exceptions to the mysqldump for any special cases involving non-MyISAM storage engines.
A little late but... whatever... forget all the answers here, don't disable the indexes, there's no way, just drop them ALTER TABLE tablename DROP INDEX whatever, bulk insert the data, then ALTER TABLE tablename ADD INDEX whatever (whatever); the amount of time recreating the indexes is 1% of the bulk insert with indexes on it, like 400000 rows took 10 minutes with indexes and like 2 seconds without them..., cheers...
to reduce the costs for re-calculating the indexes you should insert the data either using DATA INFILE or using Mysql Multi Row Inserts, like
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
-> so inserting several rows with one statement.
How many rows one can insert with one statement depends on the max_allowed_packet mysql setting.
I am using MySQL database.
If I have 6,000,000 new records need to be inserted into a table (not a empty table).
Question 1:
Is
ALTER TABLE tbl_name DISABLE KEYS;
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9), ... ...
ALTER TABLE tbl_name ENABLE KEYS;
OPTIMIZE TABLE tbl_name;
faster than:
ALTER TABLE tbl_name DISABLE KEYS;
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3)
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3)
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3)
...
...
ALTER TABLE tbl_name ENABLE KEYS;
OPTIMIZE TABLE tbl_name;
?
Question 2:
Is the first one called bulk insertion?
-----------Update---------------
Should I enable/disable keys and optimize my table after? As # Neil 's comment seems do not recommend to do so. What is others' opinion?
Try to optimize your queries using bulk inserts. This should considerably increase the speed of the inserting data process.
You wrote that you had encountered an error on inserting large amount of data - 'database gone away'.
In this case the size of the query should not exceed the maximal size of the packet - see the information about the max_allowed_packet variable.
How to check max_allowed_packet value -
SELECT ##global.max_allowed_packet;
How to set this value -
SET ##global.max_allowed_packet = 200000;
More information here
If all this not enough for you, then have a look at this article (as Gfox suggested) - Speed of INSERT Statements.
i Think Sending one query is more faster but With 6,000,000 record ,
Database will have problems ,
as i tried on windows ` insert 1000000 record as one query ,
i got this error database gone away :(
Please refer to this article.
By sending one query you will save several steps and win time. Specially when there is a question of insertion of hundred thousands of rows this time difference will be significant.
I was just trying to add a column called "location" to a table (main_table) in a database. The command I run was
ALTER TABLE main_table ADD COLUMN location varchar (256);
The main_table contains > 2,000,000 rows. It keeps running for more than 2 hours and still not completed.
I tried to use mytop
to monitor the activity of this database to make sure that the query is not locked by other querying process, but it seems not. Is it supposed to take that long time? Actually, I just rebooted the machine before running this command. Now this command is still running. I am not sure what to do.
Your ALTER TABLE statement implies mysql will have to re-write every single row of the table including the new column. Since you have more than 2 million rows, I would definitely expect it takes a significant amount of time, during which your server will likely be mostly IO-bound. You'd usually find it's more performant to do the following:
CREATE TABLE main_table_new LIKE main_table;
ALTER TABLE main_table_new ADD COLUMN location VARCHAR(256);
INSERT INTO main_table_new SELECT *, NULL FROM main_table;
RENAME TABLE main_table TO main_table_old, main_table_new TO main_table;
DROP TABLE main_table_old;
This way you add the column on the empty table, and basically write the data in that new table that you are sure no-one else will be looking at without locking as much resources.
I think the appropriate answer for this is using a feature like pt-online-schema-change or gh-ost.
We have done migration of over 4 billion rows with this, though it can take upto 10 days, with less than a minute of downtime.
Percona works in a very similar fashion as above
Create a temp table
Creates triggers on the first table (for inserts, updates, deletes) so that they are replicated to the temp table
In small batches, migrate data
When done, rename table to new table, and drop the other table
You can speed up the process by temporarily turning off unique checks and foreign key checks. You can also change the algorithm that gets used.
If you want the new column to be at the end of the table, use algorithm=instant:
SET unique_checks = 0;
SET foreign_key_checks = 0;
ALTER TABLE main_table ADD location varchar(256), algorithm=instant;
SET unique_checks = 1;
SET foreign_key_checks = 1;
Otherwise, if you need the column to be in a specific location, use algorithm=inplace:
SET unique_checks = 0;
SET foreign_key_checks = 0;
ALTER TABLE main_table ADD location varchar(256) AFTER othercolumn, algorithm=inplace;
SET unique_checks = 1;
SET foreign_key_checks = 1;
For reference, it took my PC about 2 minutes to alter a table with 20 million rows using the inplace algorithm. If you're using a program like Workbench, then you may want to increase the default timeout period in your settings before starting the operation.
If you find that the operation is hanging indefinitely, then you may need to look through the list of processes and kill whatever process has a lock on the table. You can do that using these commands:
SHOW FULL PROCESSLIST;
KILL PROCESS_NUMBER_GOES_HERE;
Alter table takes a long time with a big data like in your case, so avoid to use it in such situations, and use some code like this one:
select main_table.*,
cast(null as varchar(256)) as null_location, -- any column you want accepts null
cast('' as varchar(256)) as not_null_location, --any column doesn't accept null
cast(0 as int) as not_null_int, -- int column doesn't accept null
into new_table
from main_table;
drop table main_table;
rename table new_table TO main_table;
DB2 z/OS does a virtual add of the column instantly. And puts the table into Advisory-Reorg status. Anything that runs before the reorg gets the default value or null if no default. When updates are done, they expand the rows updated. Inserts are done expanded. The next reorg expands every unexpanded row and assigns the default value to anything it expands.
Only a real database handles this well. DB2 z/OS.
I have to delete about 10K rows from a table that has more than 100 million rows based on some criteria. When I execute the query, it takes about 5 minutes. I ran an explain plan (the delete query converted to select * since MySQL does not support explain delete) and found that MySQL uses the wrong index.
My question is: is there any way to tell MySQL which index to use during delete? If not, what ca I do? Select to temp table then delete from temp table?
There is index hint syntax. //ETA: sadly, not for deletes
ETA:
Have you tried running ANALYZE TABLE $mytable?
If that doesn't pay off, I'm thinking you have 2 choices: Drop the offending index before the delete and recreate it after. Or JOIN your delete table to another table on the desired index which should ensure that the desired index is used.
I've never really come across a situation where MySQL chose the wrong index, but rather my understanding of how indexes worked was usually at fault.
You might want to check out this book: http://oreilly.com/catalog/9780596003067
It has a great section on how indexes work and other tuning options.
As stated in other answers, MySQL can't use indexes, but the PRIMARY KEY index.
So your best option, if you have a PRIMARY KEY on the table is to run a fast SELECT, then DELETE according lines. Preferably in a TRANSACTION, so that you don't delete wrong rows.
Hence:
DELETE FROM table WHERE column_with_index = 0
Will be rewritten:
SELECT primary_key FROM table WHERE column_with_index = 0 => returns many lines
DELETE FROM table WHERE primary_key IN(?, ?, ?) => ? will be replaced by the results of the SELECTed primary keys.
If you have not that much lines to delete, it would be more efficient this way.
For example, I've just hit an exemple, on the same table, with the same data:
7499067 rows analyzed by DELETE : 12 seconds
vs
6 rows analyzed by SELECT using a good index : 0.10 seconds
0 rows to be deleted in the end