I'm trying to speed up bulk inserts into an InnoDB table by temporarily disabling its indexes:
ALTER TABLE mytable DISABLE KEYS;
But it gives a warning:
+-------+------+-------------------------------------------------------------+
| Level | Code | Message                                                     |
+-------+------+-------------------------------------------------------------+
| Note  | 1031 | Table storage engine for 'mytable' doesn't have this option |
+-------+------+-------------------------------------------------------------+
1 row in set (0.00 sec)
How can we disable the indexes?
What alternatives are there to avoid using the index when doing bulk inserts?
How can we speed up the process?
Have you tried the following?
SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
From the MySQL Reference Manual (https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-bulk-data-loading.html), see the section "Bulk Data Loading Tips".
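A minimal sketch of how those settings are typically wrapped around the load (table and column names here are just placeholders); the important part is to COMMIT once at the end and restore the settings:

SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;

-- ... run the bulk INSERT / LOAD DATA statements here, e.g.:
INSERT INTO mytable (a, b, c) VALUES (1, 2, 3), (4, 5, 6);

COMMIT;
SET unique_checks=1;
SET foreign_key_checks=1;
SET autocommit=1;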
There is a very good reason why you cannot execute DISABLE KEYS on an InnoDB table; InnoDB is not designed to use it, and MyISAM is.
In fact, here is what happens when you reload a mysqldump:
You will see a CREATE TABLE for a MyISAM table followed by a write lock.
Before all the bulk inserts are run, a call to ALTER TABLE ... DISABLE KEYS is done.
What this does is turn off updating of the non-unique (secondary) indexes in the MyISAM table.
Then, the bulk inserts are done. While this is happening, the PRIMARY KEY and all UNIQUE KEYS in the MyISAM table are still maintained. Before the UNLOCK TABLES, a call to ALTER TABLE ... ENABLE KEYS is made in order to rebuild all non-unique indexes linearly.
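For illustration, this is roughly the pattern you will find in a dump file (table name and values are hypothetical):

LOCK TABLES `mytable` WRITE;
/*!40000 ALTER TABLE `mytable` DISABLE KEYS */;
INSERT INTO `mytable` VALUES (1,'a'),(2,'b'),(3,'c');
/*!40000 ALTER TABLE `mytable` ENABLE KEYS */;
UNLOCK TABLES;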
IMHO this operation was not implemented in the InnoDB storage engine because every entry in a secondary index carries the primary key value from the clustered index (gen_clust_index). Rebuilding a non-unique index would be a very expensive operation, since it would require O(n log n) running time to retrieve each unique key and attach it to a non-unique key entry.
In light of this, issuing a warning when DISABLE KEYS/ENABLE KEYS is attempted on an InnoDB table is far easier than coding exceptions into mysqldump for every special case involving non-MyISAM storage engines.
A little late, but... whatever... forget all the answers here: you can't disable the indexes, there's no way. Just drop them with ALTER TABLE tablename DROP INDEX whatever, bulk insert the data, then ALTER TABLE tablename ADD INDEX whatever (whatever). The time spent recreating the indexes is about 1% of doing the bulk insert with the indexes in place; something like 400,000 rows took 10 minutes with indexes and about 2 seconds without them. Cheers...
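A rough sketch of that workflow, with hypothetical table, index, and column names:

-- Drop the secondary index before the bulk load.
ALTER TABLE tablename DROP INDEX idx_whatever;

-- ... bulk insert the data here ...

-- Recreate the index once, after the load.
ALTER TABLE tablename ADD INDEX idx_whatever (whatever);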
To reduce the cost of recalculating the indexes, you should insert the data either using LOAD DATA INFILE or using MySQL multi-row inserts, like
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
So you insert several rows with one statement. How many rows you can insert with one statement depends on the max_allowed_packet MySQL setting.
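For example, a LOAD DATA INFILE sketch (the file path, format, and column list are assumptions):

LOAD DATA INFILE '/tmp/tbl_name.csv'
INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(a, b, c);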
I've been researching database partitioning in MySQL for a while. Since I have one ever-growing table in my DB, I thought of using partitioning as an effective tool to optimize it. I'm only interested in retaining recent data (say, the last 6 months), and the table has a column named 'CREATED_AT' (TIMESTAMP, non-primary). The approach that came to mind is as follows:
Create a time-based range partition on the table by using 'CREATED_AT' as the partition key.
Run a DB-level event periodically and drop partitions that are obsolete (older than 6 months).
However, the partitioning can only be realized if I make the 'CREATED_AT' field part of the primary key. But doesn't that violate the primary key principle? Since the field is non-unique and can have tons of rows with the same value, doesn't marking it as primary turn out to be an anti-pattern? Is there any workaround to achieve time-based range partitioning in this scenario?
This is a problem that prevents many MySQL users from using partitioning.
The column you use for your partitioning key must be in every PRIMARY KEY or UNIQUE KEY of the table. It doesn't have to be the only column in those keys (because keys can be multi-column), but it has to be part of every unique key.
Still, in many tables it would violate the logical design of the table. So partitioning is not practical.
You could grit your teeth and design a table with partitions that has a compromised design:
create table mytable (
  id bigint auto_increment not null,
  created_at datetime not null,
  primary key (id, created_at)
) partition by range columns (created_at) (
  partition p20190101 values less than ('2019-01-01'),
  partition p20190201 values less than ('2019-02-01'),
  partition p20190301 values less than ('2019-03-01'),
  partition p20190401 values less than ('2019-04-01'),
  -- etc...
  partition pMAX values less than (MAXVALUE)
);
I tested this table and there's no error when I define it. Even though this table technically allows multiple rows with the same id value if they have different timestamps, in practice you can code your application to just let id values be auto-incremented, and never change the id. As long as your code is the only application that inserts data, you can more or less have some assurance that the data doesn't contain multiple rows with the same id.
You might think you can add a secondary unique key constraint to enforce that id must be unique by itself. But this violates the partitioning rules:
mysql> alter table mytable add unique key (id);
ERROR 1503 (HY000): A UNIQUE INDEX must include all columns in the table's partitioning function
You just have to trust that your application won't insert invalid data.
Or else forget about using partitioning, and instead just add an index to the created_at column, and use incremental DELETE instead of using DROP PARTITION to prune old data.
The latter strategy is what I see used in almost every case. Usually, it's important to have the RDBMS enforce strict uniqueness on the id column. It's not safe to allow this uniqueness to be unenforced.
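If you take the incremental DELETE route, a common sketch is to delete in bounded batches so each statement only locks a limited number of rows (the batch size and column here are assumptions):

-- Repeat until it affects 0 rows, e.g. from a cron job or a scheduled EVENT.
DELETE FROM mytable
WHERE created_at < NOW() - INTERVAL 6 MONTH
ORDER BY created_at
LIMIT 10000;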
Re your comment:
Isn't dropping an entire partition a much cheaper operation than performing incremental deletes?
Yes and no.
DELETE can be rolled back, so it results in some overhead, like temporarily storing data in the rollback segment. On the other hand, it locks only the rows that match the index search.
Dropping a partition doesn't do rollback, so there are some steps it can skip. But it does an ALTER TABLE, so it needs to first acquire a metadata lock on the whole table. Any concurrent query, either read or write, will block that and be blocked by it.
Demo:
Open two MySQL client windows. In the first session do this:
mysql> START TRANSACTION;
mysql> SELECT * FROM mytable;
This holds a metadata lock on the table, which blocks things like ALTER TABLE.
In the second window:
mysql> ALTER TABLE mytable DROP PARTITION p20190101;
<pauses, waiting for the metadata lock held by the first session!>
You can even open a third session and do this:
mysql> SELECT * FROM mytable;
<also pauses>
The second SELECT is waiting behind the ALTER TABLE. They are both queued for the metadata lock.
If I commit the first SELECT, then the ALTER TABLE finally finishes:
mysql> ALTER TABLE mytable DROP PARTITION p20190101;
Query OK, 0 rows affected (6 min 25.25 sec)
That 6 min 25 sec isn't because it takes a long time to do the DROP PARTITION. It's because I had left my transaction uncommitted that long while writing this post.
Metadata lock waits don't time out like an InnoDB row lock, which times out after 50 seconds. The default metadata lock timeout is 1 year! See https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_lock_wait_timeout
Statements like ALTER TABLE, DROP TABLE, RENAME TABLE, and even things like CREATE TRIGGER need to acquire a metadata lock.
So in some cases, depending on if you have long-running transactions holding onto metadata locks, it could be better for your concurrent throughput to use DELETE to remove data incrementally, even if it takes longer.
I have a DB schema composed of MyISAM tables, and I want to delete old records from some of the tables from time to time.
I know that DELETE does not reclaim the disk space, but, as I found in the description of the DELETE command, subsequent inserts may reuse the deleted space:
In MyISAM tables, deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions.
I am interested in whether the LOAD DATA command also reuses the deleted space.
UPDATE
I am also interested in how the index space is reclaimed.
UPDATE 2012-12-03 23:11
Some more info supplied, based on the answer received from @RolandoMySQLDBA.
After executing the following suggested query, I got different results for the different tables whose space needs to be reused or reclaimed:
SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable1';
> Dynamic
SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable2';
> Fixed
UPDATE 2012-12-09 08:06
LOAD DATA does reuse previously deleted space (I checked this by running a short script), if and only if the row format is Fixed, or the row format is Dynamic and there is a deleted row of exactly the same size.
It seems that if the row_format is Dynamic, a full lookup over the deleted-rows list is made for each record; if a row of exactly the right size is not found, the deleted record is not reused, the table's disk usage grows, and LOAD DATA also takes much more time to import the records.
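A minimal way to check this yourself (table name, file, and WHERE clause are hypothetical) is to compare the table's size before and after the load:

-- Note Data_length and Data_free before deleting and reloading.
SHOW TABLE STATUS LIKE 'mytable1';

DELETE FROM mytable1 WHERE id < 1000;
LOAD DATA INFILE '/tmp/rows.csv' INTO TABLE mytable1;

-- If the deleted space was reused, Data_length should not have grown
-- by the full size of the loaded rows.
SHOW TABLE STATUS LIKE 'mytable1';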
I will accept the answer given here, since it describes the whole process perfectly.
For a MySQL table called mydb.mytable just run the following:
OPTIMIZE TABLE mydb.mytable;
You could also do this in stages:
CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
DROP TABLE mydb.mytable_old;
ANALYZE TABLE mydb.mytable;
In either case, the table ends up with no fragmentation.
Give it a Try !!!
UPDATE 2012-12-03 12:50 EDT
If you are concerned whether or not rows are reused upon bulk INSERTs via LOAD DATA INFILE, please note the following:
When you created the MyISAM table, I assumed the default row format would be dynamic. You can check what it is with either
SHOW CREATE TABLE mydb.mytable\G
or
SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable';
Since the row format of your table is Dynamic, the fragmented rows are of various sizes. The MyISAM storage engine has to keep checking the length of each deleted row to see whether the next set of data being inserted will fit. If the incoming data cannot fit into any of the deleted rows, the new row data is appended.
The presence of such rows can make myisamchk struggle.
This is why I recommended running OPTIMIZE TABLE. That way, data would be appended quicker.
UPDATE 2012-12-03 12:58 EDT
Here is something interesting you can also do: Try setting concurrent_insert to 2. That way, you are always appending to a MyISAM table without checking for gaps in the table. This will speed up INSERTs dramatically but leave all known gaps alone.
You could still defragment your table at your earliest convenience using OPTIMIZE TABLE.
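A minimal sketch of setting it (it can also be put in my.cnf as concurrent_insert=2):

-- Always append to the data file instead of filling delete-gaps.
SET GLOBAL concurrent_insert = 2;

-- Later, reclaim the accumulated gaps when convenient.
OPTIMIZE TABLE mydb.mytable;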
UPDATE 2012-12-03 13:40 EDT
Why don't you run my second suggestion?
CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
ANALYZE TABLE mydb.mytable;
This will give you an idea of:
How long OPTIMIZE TABLE would take to run
How much smaller the .MYD and .MYI would be after running OPTIMIZE TABLE
After you run my second suggestion, you can compare them with
SELECT
A.mydsize,B.mydsize,A.mydsize - B.mydsize myd_diff,
A.myisize,B.myisize,A.myisize - B.myisize myi_diff
FROM
(
SELECT data_length mydsize,index_length myisize
FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable'
) A,
(
SELECT data_length mydsize,index_length myisize
FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable_new'
) B;
UPDATE 2012-12-03 16:42 EDT
Any table whose ROW_FORMAT is set to Fixed has the luxury of allocating the same-length row every time. If a MyISAM table maintains a list of deleted rows, the very first row in the list can always be selected as the next position to insert data. There is no need to traverse the whole list until a row gap of sufficient length is found: each deleted row is quickly appended to the list after a DELETE, and each INSERT simply picks the first entry from the deleted list.
We can assume these things because MyISAM tables can do concurrent inserts. In order for this feature to be available via the concurrent_insert option, INSERTs into a MyISAM table must be able to detect one of three(3) things:
The presence of a list of deleted rows, thus choosing from the list
Row_Format=Dynamic : list of deleted rows with each row with a different length
Row_Format=Fixed : list of deleted rows with all rows the same length
The absence of a list of deleted rows, thus appending
Bypass checking for the presence of a list of deleted rows (set concurrent_insert to 2)
For detection #1 to be the fastest possible, a MyISAM table's row_format must be Fixed. If it is Dynamic, it is very possible that a list traversal is necessary.
I have a table that has 170,002,225 rows with about 35 columns and two indexes. I want to add a column. The ALTER TABLE command took about 10 hours. The processor did not seem busy during that time, nor were there excessive IO waits. This is on a 4-way high-performance box with tons of memory.
Is this the best I can do? Is there something I can look at to optimize the add column in tuning of the db?
I faced a very similar situation in the past, and I improved the performance of the operation this way:
Create a new table (using the structure of the current table) with the new column(s) included.
Execute INSERT INTO new_table (column1, ..., columnN) SELECT column1, ..., columnN FROM current_table;
Rename the current table.
Rename the new table using the name of the current table.
ALTER TABLE in MySQL is actually going to create a new table with new schema, then re-INSERT all the data and delete the old table. You might save some time by creating the new table, loading the data and then renaming the table.
From "High Performance MySQL book" (the percona guys):
The usual trick for loading MyISAM table efficiently is to disable keys, load the data and renalbe the keys:
mysql> ALTER TABLE test.load_data DISABLE KEYS;
-- load data
mysql> ALTER TABLE test.load_data ENABLE KEYS;
Well, I would recommend using the latest Percona MySQL builds, plus there is the following note in the MySQL manual:
In other cases, MySQL creates a temporary table, even if the data wouldn't strictly need to be copied. For MyISAM tables, you can speed up the index re-creation operation (which is the slowest part of the alteration process) by setting the myisam_sort_buffer_size system variable to a high value.
You can do ALTER TABLE ... DISABLE KEYS first, then add the column, and then ALTER TABLE ... ENABLE KEYS. Beyond that, I don't see anything that can be done here.
BTW, can't you switch to MongoDB? It doesn't rebuild anything when you add a column.
Maybe you can remove the indexes before altering the table, because what takes most of the time to build is the indexes?
Combining some of the comments on the other answers, this was the solution that worked for me (MySQL 5.6):
create table mytablenew like mytable;
alter table mytablenew add column col4a varchar(12) not null after col4;
alter table mytablenew drop index index1, drop index index2,...drop index indexN;
insert into mytablenew (col1,col2,...colN) select col1,col2,...colN from mytable;
alter table mytablenew add index index1 (col1), add index index2 (col2),...add index indexN (colN);
rename table mytable to mytableold, mytablenew to mytable;
On a 75M row table, dropping the indexes before the insert caused the query to complete in 24 minutes rather than 43 minutes.
Other answers/comments have insert into mytablenew (col1) select (col1) from mytable, but this results in ERROR 1241 (21000): Operand should contain 1 column(s) if you have the parenthesis in the select query.
Other answers/comments have insert into mytablenew select * from mytable;, but this results in ERROR 1136 (21S01): Column count doesn't match value count at row 1 if you've already added a column.
I'm trying to add full-text search to an existing table. When I tried:
alter table tweets add fulltext index(tags);
I got the error:
ERROR 1214 (HY000): The used table type doesn't support FULLTEXT indexes
What is the problem? How can I know what table type it is?
If you want to use full text indexing you need to make sure your table's underlying engine is MyISAM. You can change this using ALTER TABLE tweets ENGINE = MYISAM;
This is how you check the table type:
SELECT table_schema,engine FROM information_schema.tables WHERE table_name='tweets';
Only MyISAM supports FULLTEXT indexes in MySQL versions before 5.6 (InnoDB gained FULLTEXT support in 5.6).
You may also want to preempt the stopword list.
See the MySQL documentation's list of stop words that full-text indexing would normally ignore.
You can override it as follows:
1) Create a text file in /var/lib/mysql like this
echo "a" > /var/lib/mysql/stopwords.txt<BR>
echo "an" >> /var/lib/mysql/stopwords.txt<BR>
echo "the" >> /var/lib/mysql/stopwords.txt<BR>
2) Add this to /etc/my.cnf
ft_stopword_file=/var/lib/mysql/stopwords.txt
ft_min_word_len=2
3) service mysql restart
Here is something else to consider:
You may not want to convert the table 'tweets' to MyISAM.
1) If the InnoDB table 'tweets' contains CONSTRAINT(s).
2) If the InnoDB table 'tweets' is the parent of other InnoDB tables with Foreign Key Constraints back to 'tweets'.
3) If you cannot afford to have table-level locking of the 'tweets' table.
Remember, each INSERT into the 'tweets' table would trigger a table-level lock if it were a MyISAM table. Since it is currently an InnoDB table (which does row-level locking), the 'tweets' table can be INSERTed into very quickly.
You may want to create a separate MyISAM table, called tweets_tags, with the same primary key as the 'tweets' table, along with a TEXT column called 'tags', the same as in the 'tweets' table.
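A sketch of what that side table might look like; the id type must match the primary key of 'tweets', so the column types here are assumptions:

CREATE TABLE tweets_tags (
    id INT UNSIGNED NOT NULL,
    tags TEXT,
    PRIMARY KEY (id),
    FULLTEXT KEY ft_tags (tags)
) ENGINE=MyISAM;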
Next, do an initial load of tweets_tags like this:
INSERT INTO tweets_tags (id,tags) SELECT id,tags FROM tweets;
Then, periodically (every night or every 6 hours), load new tweets into tweets_tags like this :
INSERT INTO tweets_tags (id,tags) SELECT id,tags FROM tweets WHERE id > (SELECT max(id) FROM tweets_tags);
I have 1 million rows in the MySQL table "temp" and wish to multiply column "t" (INT UNSIGNED, indexed) by 1000.
mysql> update temp set t=1000*t;
This process takes 25 seconds. The same statement on a non-indexed column takes 10 seconds.
Any ideas how to make this process faster? I have to apply this to over 1e5 tables.
You can turn indexing off and back on after the updates are done
ALTER TABLE tbl_name DISABLE KEYS;
ALTER TABLE tbl_name ENABLE KEYS;
Or, if you are using MyISAM:
You can use the delay_key_write flag. You can set it per-table, or globally. You can use the "FLUSH TABLE mytable" command to force MySQL to update the on-disk copy of the indexes.
http://dev.mysql.com/doc/mysql/en/create-table.html
http://dev.mysql.com/doc/mysql/en/myisam-start.html
http://dev.mysql.com/doc/mysql/en/flush.html
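A sketch of the per-table approach (table name hypothetical):

-- Buffer index writes in memory for this MyISAM table.
ALTER TABLE mytable DELAY_KEY_WRITE = 1;

-- ... run the updates ...

-- Force the buffered index blocks to be written to disk.
FLUSH TABLE mytable;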
Indexing has nothing to do with the problem here. Think about what you're doing: you're mutating every row in the table, so no matter how you select them, and whether you have an index on t or not, you're still scanning the whole table.
The UPDATE operation, i.e. IO, is your bottleneck. Get faster disks.
If you're using InnoDB, my only advice would be to see if tweaking innodb_flush_log_at_trx_commit and setting it to 2 helps your performance, but I doubt it since it's just one query. Disabling keys and re-enabling them after the UPDATE won't work with InnoDB.
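If you do want to try that tweak, it is a global setting; note that a value of 2 trades a little durability for speed:

-- Write the log to the OS cache on commit; fsync roughly once per second.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;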