Do MyISAM holes get filled in automatically? - mysql

When you run a delete on a MyISAM table, it leaves a hole in the table until the table is optimized.
This affects concurrent inserts. With the default setting, concurrent inserts only work for tables without holes. However, in the MySQL documentation, under the concurrent_insert section, it says:
Enables concurrent inserts for all MyISAM tables, even those that have holes. For a table with a hole, new rows are inserted at the end of the table if it is in use by another thread. Otherwise, MySQL acquires a normal write lock and inserts the row into the hole.
http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_concurrent_insert
Does that mean MyISAM automatically fills in holes whenever a new row is inserted into the table? Previously I thought the holes would not be fixed until you OPTIMIZE a table.

Yes, MyISAM tables reuse deleted/free space. (Whether this is true in all cases, I don't know.)
This can easily be observed by deleting a handful of records and inserting some new ones: the MyISAM data file for that table will not grow in size.

Inserts into the middle of the table require a lock. So, in the default setting, MySQL will favor filling the holes, even though it will prevent concurrent inserts.
So, yes, MySQL prefers to fill the holes.
Setting concurrent_insert to 2 tells MySQL that if the table is in use by another thread, it should insert at the end, which can proceed concurrently with readers, even though there are still holes to be filled. Therefore, this allows concurrent inserts even if there are holes in the middle, at the cost of taking longer to fill the holes.
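To make the rule above concrete, here is a toy Python model of where a new row lands for each concurrent_insert value. It only encodes the behaviour described in this thread and in the quoted documentation; the names `Placement` and `place_insert` are made up for illustration, and this is not how MySQL is implemented internally.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    position: str     # "hole" or "end"
    concurrent: bool  # True if the insert can run while other threads read


def place_insert(has_holes: bool, in_use_by_other_thread: bool,
                 concurrent_insert: int = 1) -> Placement:
    """Toy model of MyISAM insert placement (hypothetical helper)."""
    if concurrent_insert not in (0, 1, 2):
        raise ValueError("concurrent_insert must be 0, 1 or 2")
    if not has_holes:
        # No holes: rows always go to the end; concurrency depends on the setting.
        return Placement("end", concurrent_insert >= 1)
    if concurrent_insert == 2 and in_use_by_other_thread:
        # Setting 2: skip the hole and append so readers are not blocked.
        return Placement("end", True)
    # Otherwise take a normal write lock and fill the hole.
    return Placement("hole", False)


print(place_insert(has_holes=True, in_use_by_other_thread=True, concurrent_insert=1))
print(place_insert(has_holes=True, in_use_by_other_thread=True, concurrent_insert=2))
```

With the default of 1, a table with holes always gets the hole filled under a write lock; with 2, the hole is only filled when nobody else is using the table.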

Related

MySQL table data physical fragmentation

Could you tell me, will MySQL data in tables be physically fragmented if I insert new data and delete old data (insert in table top, and delete bottom rows)?
Will the size of table be growing in any way, before I do OPTIMIZE?
Are you DELETEing and INSERTing at about the same rate? Is the table InnoDB?
If Yes to both, then "don't worry". The DELETEs will free up blocks and the INSERTs will re-use those blocks (or new blocks if the INSERTs are getting ahead).
InnoDB tables are composed of 16KB blocks that are chained together in a BTree structure. What you described will be freeing blocks from one side of the tree and creating new blocks on the other side. However, each block can reside anywhere, so the concept of "side of the tree" is more 'virtual' than 'physical'.
After you have done a lot of churning, try the OPTIMIZE once. If it does not cut the table size in half, that is a further clue that OPTIMIZE is not worth it. A 30% shrinkage after a lot of churn is somewhat typical. But it stabilizes at about that, and won't go over about 30%.
If, on the other hand, you are DELETEing most of the table all at once, then you may be better off rebuilding the table with the not-to-be-deleted rows. This combines the delete and the optimize steps into one. Then use this to flip the tables 'instantly':
RENAME TABLE real TO old, new TO real; DROP TABLE old;
And do not do any inserts during the rebuild.
When innodb_file_per_table is OFF, all data is stored in the ibdata files. When you remove rows, they are just marked as deleted on disk. The space stays allocated to the InnoDB files and can be reused later when you insert or update more rows, but the files will never shrink.
If you drop some tables or delete some data, there is no way to reclaim that unused disk space other than the dump/reload method.
But if you are using innodb_file_per_table, you can reclaim the space by running OPTIMIZE TABLE on that table. OPTIMIZE TABLE creates a new, identical, empty table and then copies the data row by row from the old table to the new one. In this process a new .ibd tablespace is created and the space is reclaimed. However, the shared tablespace (ibdata1) can still grow.
For more info- mysql reference and percona expert

Is it better to use deactive status column instead of deleting rows for mysql performance?

Recently I watched a video about CRUD operations in MySQL, and one thing caught my attention: the commentator claimed that deleting rows is bad for MySQL index performance and that we should use a status column instead.
So, is there really a difference between those two?
Deleting a row is indeed quite expensive, more expensive than setting a new value to a column. Some people don't ever delete a row from their databases (though it's sometimes due to preserving history, not performance considerations).
I usually do delayed deletions: when my app needs to delete a row, it doesn't actually delete it, but sets a status instead. Then later, during a low-traffic period, I execute those deletions.
Some database engines need their data files to be compacted every once in a while, since they cannot reuse the space from deleted records. I'm not sure if InnoDB is one of those.
I guess the strategy is that deleting a row affects all indexes, whereas modifying a 'status' column might not affect any indexes (since you probably wouldn't index that column due to the low cardinality).
Still, when deleting rows, the impact on indexes is minimal. Inserting affects index performance when it fills up a page, causing that page to be split. This doesn't happen with deletes. With deletes, the index records are merely marked for deletion.
MySQL will later (when load is low) purge deleted rows from the indexes. So, deletes are already cached. Why double the effort?
Your deletes do need indexes, just like your selects and updates, in order to quickly find the record to delete. So don't blame MySQL index performance for slow deletes that are really due to missing or bad indexes. Your delete statement's WHERE clause should be able to utilize an index. With InnoDB, this is also important to ensure that just a single index record is locked instead of having to lock all of the records or a range.

myisam place table-lock on table even when dealing with 'select' query?

I am reading the book High Performance MySQL, and it mentions:
performing one query per table uses table locks more efficiently: the queries will lock the tables individually and relatively briefly, instead of locking them all for a longer time.
Does MyISAM place a table lock even when selecting something? Can someone explain a little bit?
MyISAM has different kinds of locks. A SELECT operation places a READ LOCK on the table. There can be multiple active read locks at any given time, as long as there are no active WRITE LOCKS. Operations that modify the table, e.g. INSERT, UPDATE, DELETE or ALTER TABLE, place a WRITE LOCK on the table. A write lock can only be placed on a table when there are no active read locks. If there are active read locks, MyISAM queues the write lock to be activated as soon as all active read locks have expired.
Likewise when there's an active write lock, attempting to place a read lock on a table will queue the lock (and the associated query) until write locks have expired on the table.
Ultimately this all means that:
You can have any number of active read locks (also called shared locks)
You can only have one active write lock (also called an exclusive lock)
For more information see: http://dev.mysql.com/doc/refman/5.5/en/internal-locking.html
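The two rules above (many readers, or exactly one writer) can be sketched as a small Python class. This is a simplified illustration with made-up names (`TableLock`, `try_read`, `try_write`); it refuses a lock instead of queueing it, so it does not model the waiting behaviour MyISAM actually has.

```python
class TableLock:
    """Toy table lock: any number of readers OR a single writer."""

    def __init__(self) -> None:
        self.readers = 0
        self.writer = False

    def try_read(self) -> bool:
        # A read (shared) lock is granted unless a writer holds the table.
        if self.writer:
            return False
        self.readers += 1
        return True

    def try_write(self) -> bool:
        # A write (exclusive) lock needs the table completely free.
        if self.writer or self.readers > 0:
            return False
        self.writer = True
        return True

    def release_read(self) -> None:
        if self.readers == 0:
            raise RuntimeError("no read lock held")
        self.readers -= 1

    def release_write(self) -> None:
        if not self.writer:
            raise RuntimeError("no write lock held")
        self.writer = False


lock = TableLock()
print(lock.try_read(), lock.try_read())  # two SELECTs share the table
print(lock.try_write())                  # an UPDATE has to wait for them
```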
reko_t provided a good answer, I will try to elaborate on it:
Yes.
You can have EITHER one writer or several readers
Except there is a special case, called concurrent inserts. This means that you can have one thread doing an insert, while one or more threads are doing select (read) queries.
There are a lot of caveats when doing this:
It has to be "at the end" of the table - not in a "hole" in the middle
Only inserts can be done concurrently (no updates, deletes)
There is still contention on the single MyISAM key buffer. There is a single key buffer, protected by a single mutex, for the whole server. Everything which uses an index needs to take it (typically several times).
Essentially, MyISAM has poor concurrency. You can try to fake it, but it's bad whichever way you look at it. MySQL / Oracle has made no attempts to improve it recently (looking at the source code, I'm not surprised - they'd only introduce bugs).
If you have a workload with lots of "big" SELECTs, which retrieve lots of rows or are hard in some way, they may often overlap, and this may seem OK. But a single-row update or delete will block the whole lot of them.

Do table locks scale? / Would row locks be more efficient for nested sets?

I'm using nested sets to store hierarchical data in a MyISAM table; the table consists of several hierarchical sets for each user. Each user will be the only one writing to his respective trees, but other users may read from them. Node deletion / Insertion requires that other rows in the same tree have their lft and rgt values updated, potentially hundreds of rows.
In order to do this, I need to get a table write lock, update the other nodes in the tree, delete/insert the row and unlock the table.
What I'm wondering is this -- Do table locks scale to hundreds of concurrent users? thousands?
Would InnoDB's row locks be more efficient in this case? (locking a few hundred rows that will mostly be used only by the user himself)
If I were to use row locks, do I need to add explicit logic to deal with deadlock errors?
Well, the philosophy on locking is different between the two engines.
With MyISAM, the reason for full table locking is that writes should normally be fast. There are only two operations needed for the write (lock table, then write row to disk). MyISAM performance is really bound by disk speed for this reason.
With InnoDB, it gets a little more complicated. Since it's fully ACID compliant, every write takes 4 steps (lock row, write to transaction log, write row to disk, write to transaction log). Note that it writes to the disk three times. So that means that (in practice) an InnoDB write will take 3 times longer than a MyISAM write. That's one reason for the row-level locking (transactions are another).
But it's not that easy. With MyISAM, the table lock requires one semaphore for that table. So the impact on both memory usage and speed is trivial at best. With InnoDB, however, it requires an index and one semaphore per row. It needs an index to speed up the "check" to see if there's already a lock for the row. Now, if you're updating one or 10 rows at the same time, there's little difference. But when you're talking millions of rows, the difference can be non-trivial (both in memory usage and speed, since it needs to traverse the lock "index" for each row to be locked).
There is also an additional tradeoff. Since InnoDB is ACID compliant, if there's a power loss (or other crash), you're never left in an inconsistent state. There's no uncommitted transaction's data in the db, and there's no committed transaction corrupted (it will automatically run the transaction log if it detects something to fix it). With MyISAM, a power loss (or crash) during a write can leave the table in an inconsistent state and there's nothing you can do about it. If you care about your data, InnoDB would be better. But, with good Binary logs and a backup system, you should be able to recover MyISAM, but it will require some manual intervention...
Now, with that said, your question of which scales better is really hard. First, are most of your writes dealing with one or two rows? If so, InnoDB and Row level locking will tend to scale better. If you do a lot of queries updating a lot of rows at the same time (tens of thousands and up), you'll notice that MyISAM will tend to have better performance.
As for your question about deadlocks, MySQL will detect and handle them for you by rolling back one of the transactions involved. That query will not be executed, so you will want some exception-handling code to retry it or deal with the error in some other way. The engine resolves the deadlock itself; your code only has to cope with the failed query.
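The retry logic mentioned above could look roughly like the following Python sketch. The `DatabaseError` class and `run_with_deadlock_retry` function are hypothetical stand-ins (a real driver raises its own exception type); the only MySQL-specific fact used is error code 1213, which is the deadlock error.

```python
import time

ER_LOCK_DEADLOCK = 1213  # MySQL: "Deadlock found when trying to get lock"


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying a MySQL errno."""

    def __init__(self, errno: int, msg: str = "") -> None:
        super().__init__(msg)
        self.errno = errno


def run_with_deadlock_retry(transaction, max_attempts: int = 3,
                            backoff_seconds: float = 0.05):
    """Call transaction() and retry it only when it fails with a deadlock."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            # Anything that is not a deadlock, or the final attempt, is re-raised.
            if exc.errno != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)  # brief, growing pause
```

Here `transaction` is assumed to be a callable that runs the whole unit of work (for example, all the lft/rgt updates for one tree change), because after a deadlock the entire transaction has been rolled back and must be redone from the start.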
Now, another note. Since MySQL supports more than one engine in a db, why not put your data into InnoDB, and then make a MyISAM join table to handle the nested set data? Store parenting info in the data table (via a parent_id mechanism). That way, all your data is in an ACID compliant db, but you can gain the speed increase by using the faster (for reading and large writes) MyISAM for the nested set logic...

Prevent read when updating the table

In MySQL:
Every minute I empty the table and fill it with new data. I want users to be unable to read data during the fill process; before or after is OK.
How do I achieve this?
Is transaction the way?
Assuming you use a transactional engine (Usually Innodb), clear and refill the table in the same transaction.
Be sure that your readers use the READ COMMITTED or a higher transaction isolation level (the default is REPEATABLE READ, which is higher).
That way readers will continue to be able to read the old contents of the table during the update.
There are a few things to be careful of:
If the table is so big that it exhausts the rollback area - this is possible if you update the whole of (say) a 1M row table. Of course this is tunable but there are limits
If the transaction fails part way through and gets rolled back - rolling back big transactions is VERY inefficient in InnoDB (it is optimised for commits, not rollbacks)
Be careful of deadlocks and lock wait timeouts, which are more likely if you use big transactions.
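As a rough sketch of the single-transaction approach, assuming a Python DB-API style connection (`cursor()`, `commit()`, `rollback()`) to an InnoDB table: the function name `refill_in_transaction` and its parameters are made up for illustration. It uses DELETE rather than TRUNCATE, since truncate operations are not transaction-safe.

```python
def refill_in_transaction(conn, table: str, insert_statements) -> None:
    """Empty and refill `table` inside one transaction (hypothetical helper).

    `table` is interpolated into SQL, so it must be a trusted identifier.
    """
    cur = conn.cursor()
    try:
        cur.execute("START TRANSACTION")
        cur.execute(f"DELETE FROM {table}")  # DELETE, not TRUNCATE
        for statement in insert_statements:
            cur.execute(statement)
        conn.commit()    # readers switch to the new contents only here
    except Exception:
        conn.rollback()  # on any failure the old contents stay visible
        raise
```

Until the commit, readers keep seeing the old rows, which is the behaviour the answer above relies on.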
You can LOCK your table for the duration of your operation:
http://dev.mysql.com/doc/refman/5.1/en/lock-tables.html
A table lock protects only against inappropriate reads or writes by other sessions. The session holding the lock, even a read lock, can perform table-level operations such as DROP TABLE. Truncate operations are not transaction-safe, so an error occurs if the session attempts one during an active transaction or while holding a table lock.
I don't know enough about the internal row-versioning mechanisms of MySQL (or indeed, if there is one), but other databases (Oracle, PostgreSQL, and more recently, SQL Server) have invested a lot of effort into allowing writers to not block readers, insofar as readers have access to the version of the rows that existed immediately before the update/write process started. Once the update is committed, that version of the row becomes the one made available to all readers, thereby avoiding the bottleneck that the above behaviour in MySQL will introduce.
This policy ensures that table locking is deadlock free. There are, however, other things you need to be aware of about this policy: If you are using a LOW_PRIORITY WRITE lock for a table, it means only that MySQL waits for this particular lock until there are no other sessions that want a READ lock. When the session has gotten the WRITE lock and is waiting to get the lock for the next table in the lock table list, all other sessions wait for the WRITE lock to be released. If this becomes a serious problem with your application, you should consider converting some of your tables to transaction-safe tables.
You can load your data into a shadow table as slowly as you like, then instantly swap the shadow and actual with RENAME TABLE:
truncate table shadow; # make sure it is clean to start with
insert into shadow .....; # lots of inserts etc against shadow table
rename table active to temp, shadow to active, temp to shadow;
truncate table shadow; # throw away the old active data
The rename statement is atomic. An intermediate name "temp" is used to help swap the names of shadow and active.
This should work with all storage engines.
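If the refresh is driven from application code, the four statements above could be wrapped along these lines. This is only a sketch: `refresh_via_shadow` is a made-up name, `cursor` is assumed to be any object with an `execute(sql)` method (as in Python DB-API drivers), and the table names default to the ones used in the answer.

```python
def refresh_via_shadow(cursor, insert_statements, active: str = "active",
                       shadow: str = "shadow", temp: str = "temp") -> None:
    """Load new data into the shadow table, then swap it in atomically.

    Table names are interpolated into SQL, so they must be trusted identifiers.
    """
    cursor.execute(f"TRUNCATE TABLE {shadow}")  # make sure it is clean
    for statement in insert_statements:
        cursor.execute(statement)               # slow loading is fine here
    # One atomic statement swaps the two names via the intermediate name.
    cursor.execute(
        f"RENAME TABLE {active} TO {temp}, {shadow} TO {active}, {temp} TO {shadow}"
    )
    cursor.execute(f"TRUNCATE TABLE {shadow}")  # throw away the old active data
```

Readers only ever query the active name, so they see either the complete old data or the complete new data, never a half-filled table.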
Rename table - MySQL Manual