Does anyone know how much memory MyISAM and InnoDB use? How does their memory usage compare when dealing with small tables vs. bigger tables (up to 32 GB)?
I know InnoDB is heavier than MyISAM, but just how much more?
Any help would be appreciated.
Thanks,
jb
You can't compare them like that. Or at least, you shouldn't. Each one uses memory in a different way. This is especially true if you're tuning your DBs for performance.
MyISAM has specific buffers for indexes and it uses the OS disk buffer for caching other data. It doesn't make sense to have your buffers larger than the sum of your indexes, but the more memory you give it, the faster it will be.
InnoDB has a buffer pool for all data. You configure this based on your available memory and how much you want to give it. InnoDB buffers as much of your data in memory as possible. If you can fit the entire DB in memory, InnoDB will never read from disk. A lot of InnoDB databases see huge performance hits when the data size becomes larger than the buffer pool.
MySQL is very configurable. It's tunable to meet your needs. Typically, databases should be given as much memory as possible since they are almost always disk bound. More memory means more can be buffered.
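For example, you can check each engine's main cache setting from the client (a minimal sketch; these are the standard variable names):
-- MyISAM's index cache and InnoDB's data + index cache
SHOW VARIABLES LIKE 'key_buffer_size';
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';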
From MySQL doc:
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] tbl_name
(create_definition,...)
{DATA|INDEX} DIRECTORY [=] 'absolute path to directory'
My table is for search only and takes 8G of disk space (4G data + 4G index) with 80M rows
I can't use ENGINE = Memory to store the whole table in memory, but I can store either the data or the index on a RAM drive through the DIRECTORY table options.
From a theoretical standpoint, is it better to store the data or the index in RAM?
MySQL's default storage engine is InnoDB. As you run queries against an InnoDB table, the portions of that table and its indexes that it reads are copied into the InnoDB Buffer Pool in memory. This is done automatically. So if you query the same table later, chances are it's already in memory.
If you run queries against other tables, it loads those into memory too. If the buffer pool is full, it will evict some data that belongs to your first table. This is not a problem, since it was only a copy of what's on disk.
There's no way to specifically "lock" a table or an index in memory. InnoDB will load either data or indexes as it needs them. InnoDB is smart enough not to evict data you used a thousand times just because another table was requested once.
Over time, this tends to balance out, using memory for your most-frequently queried subset of each table and index.
So if you have system memory available, allocate more of it to your InnoDB Buffer Pool. The more memory the Buffer Pool has, the more able it is to store all the frequently-queried tables and indexes.
Up to the size of your data + indexes, of course. The content copied from the data + indexes is stored only once in memory. So if you have only 8G of data + indexes, there's no need to give the buffer pool more and more memory.
Don't allocate more system memory to the buffer pool than your server can afford. Overallocating memory leads to swapping memory for disk, and that will be bad for performance.
Don't bother with the {DATA|INDEX} DIRECTORY options. Those are for when you need to locate a table on another disk volume, because you're running out of space. It's not likely to help performance. Allocating more system memory to the buffer pool will accomplish that much more reliably.
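If you want a rough idea of how big the buffer pool would have to be to hold everything, you can sum data and index sizes from information_schema (a sketch; the figures it reports are approximate):
SELECT engine,
       ROUND(SUM(data_length)  / 1024 / 1024 / 1024, 1) AS data_gb,
       ROUND(SUM(index_length) / 1024 / 1024 / 1024, 1) AS index_gb
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
GROUP BY engine;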
but I can store either the data or the index in a RAM drive through the DIRECTORY table options...
Short answer: let the database and OS do it.
Using a RAM disk might have made sense 10-20 years ago, but these days the software manages caching disk to RAM for you. The disk itself has its own RAM cache, especially if it's a hybrid drive. The OS will cache file system access in RAM. And then MySQL itself will do its own caching.
And if it's an SSD, that's already extremely fast, so a RAM cache is unlikely to show much improvement.
So making your own RAM disk isn't likely to do anything that isn't already happening. What you will do is pull resources away from the OS and MySQL that they could have managed more intelligently themselves, likely slowing everything on that machine down.
What you're describing is a micro-optimization: an attempt to make individual operations faster. Micro-optimizations tend to add complexity and degrade the system as a whole, and there are limits to how much they can gain you. For example, if you have to search 1,000,000 rows, and it takes 1 ms per row, that's 1,000,000 ms. If you make it 0.9 ms per row, it's still 900,000 ms.
What you want to focus on is algorithmic optimization: improvements to the algorithm itself. These tend to make the code simpler, because you're doing less work, though the data structures often need more thought. Take those same 1,000,000 rows and add an index. Instead of looking at 1,000,000 rows, you'll spend, say, 100 ms looking at the index.
The numbers are made up, but I hope you get the point. If "what you want is speed", algorithmic optimizations will take you where no micro-optimization will.
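For instance (the table and column names here are hypothetical, purely to illustrate the idea), the algorithmic fix for a slow search is an index, not faster storage:
-- before: the query scans every row; after: it's a B-tree lookup
CREATE INDEX idx_products_name ON products (name);
EXPLAIN SELECT * FROM products WHERE name = 'widget';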
There's also the performance of the code using the database to consider; it is often the real bottleneck, running unoptimized queries, using poor patterns for fetching related data, and not taking advantage of caching.
Micro-optimizations, with their complexities and special configurations, tend to make algorithmic optimizations more difficult. So you might be slowing yourself down in the long run by worrying about micro-optimizations now. Furthermore, you're doing this at the very start when you only have fuzzy ideas about how this thing will be used or perform or where the bottlenecks will be.
Spend your time optimizing your data structures and indexes, not minute details of your database storage. Once you've done that, if it still isn't fast enough, then look at tweaking settings.
As a side note, there is one possible benefit to playing with DIRECTORY. You can put the data and index on separate physical drives. Then both can be accessed simultaneously with the full I/O throughput of each drive.
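A sketch of what that looks like (the paths and table definition are made up; these options apply to MyISAM tables):
CREATE TABLE search_data (
  id INT NOT NULL PRIMARY KEY,
  body TEXT
) ENGINE=MyISAM
  DATA DIRECTORY = '/mnt/disk1/mysql'
  INDEX DIRECTORY = '/mnt/disk2/mysql';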
Though you've just made it twice as likely to have a disk failure, and complicated backups. You're probably better off with an SSD and/or RAID.
And consider whether a cloud database might actually out-perform any hardware you might be able to afford.
Is a MEMORY/HEAP engine table the same, performance-wise, as a mostly-InnoDB database with a big buffer pool? I usually have two tables: one InnoDB table with varchars and several columns, and a compact MEMORY table (five columns, mostly just the PK and indexed ints) for heavy reads. I recently learned about the InnoDB buffer pool, so is my table-clone system overkill and useless, or still faster than InnoDB?
In-memory tables must be more performant, at least in theory: in InnoDB, even with a large buffer pool, you're going to have a block-based structure in the cache, so some blocks will only be partially full, and that's an overhead. Another reason is that in-memory tables don't have row versions or row locks, so, again, this is going to use less memory. But beware: in-memory tables also don't have row-level locking, so if you run large updates, you may actually find that using InnoDB is more scalable.
So, to sum up: MEMORY table - potentially less memory to store the same amount of data, InnoDB - potentially more scalable.
Everything needs to be measured for your particular case of course.
Perhaps if you need to store data in memory anyway, choose an in-memory database? (shameless plug).
Reads from the InnoDB buffer pool will be essentially as fast as reads from Memory tables.
In some cases, Memory tables could even out-perform buffered InnoDB tables, since the former also support HASH indexes whereas the latter only supports B-tree indexes. Depending on the profile of your queries, you might get faster reads with hash indexes.
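For example (hypothetical table, just for illustration), you can request a HASH index explicitly on a Memory table:
CREATE TABLE hot_lookup (
  id INT NOT NULL,
  score INT NOT NULL,
  PRIMARY KEY (id) USING HASH,   -- constant-time lookups for equality searches
  INDEX (score) USING BTREE      -- B-tree still needed for range scans
) ENGINE=MEMORY;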
Besides, buffered InnoDB tables can be flushed out of the buffer pool if some query requires the memory, or if the data is seldom used. By explicitly copying your data to a Memory table, you have the guarantee that your data will always be in memory.
I should also mention that regardless of the size of the buffer pool, updates to an InnoDB table will need to be flushed to disk at some stage. But I understand this does not apply in your use case.
Now this is theory. Only if this data is to be read very, very frequently should you bother with these considerations.
I have a server with 32G of RAM and a database of 20G. I run MySQL with an innodb_buffer_pool_size of 10G. While trying to improve performance, I increased the value to 20G, but that just slows down the SELECT queries. Can anybody explain to me why that happens?
Try EXPLAIN to get a general idea of why the queries are slow.
Try SHOW ENGINE INNODB STATUS to see how the buffer pool is used and what your InnoDB engine is doing.
There may be a lot of other factors in play here. By increasing innodb_buffer_pool_size you may be running out of physical RAM (in combination with other configuration options), and then memory spills into swap, which is on disk... and disk operations are always slow.
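A minimal diagnostic session might look like this (the SELECT is a hypothetical stand-in for one of your slow queries):
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
SHOW ENGINE INNODB STATUS;
-- Innodb_buffer_pool_reads (from disk) vs. Innodb_buffer_pool_read_requests (total)
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';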
Assume a database consisting of 1 GB of data and 1 GB of index data.
To minimize disk IO and hence maximize performance I want to allocate memory to MySQL so that the entire dataset including indexes can be kept in RAM (assume that the machine has RAM in abundance).
The InnoDB parameter innodb_buffer_pool_size is used to specify the size of the memory buffer InnoDB uses to cache data and indexes of its tables. (Note: The memory is used for data AND indexes.)
The MyISAM parameter key_buffer_size is used to specify the size of the memory buffer MyISAM uses to cache indexes of its tables. (Note: The memory is used ONLY for indexes.)
If I want the 2 GB database (1 GB data and 1 GB index) to fit into memory under InnoDB, I'd simply configure the innodb_buffer_pool_size to be 2GB. The two gigabytes will hold both the data and the index.
However, when setting the MyISAM key_buffer_size to 2GB, that space will be used for the index, but not for the data.
My questions are:
Can MyISAM's "data buffer size" (not index data) be configured explicitly?
When will MyISAM read table data (excluding index data) from disk and when will it read from memory?
No, MyISAM has no general-purpose data cache. This is documented in the key_buffer_size description in the official documentation: MySQL relies on the operating system to perform file system caching for data reads, so you must leave some room for the file system cache.
Modern OSes, especially Linux, tend to have very smart virtual memory subsystems that will keep frequently accessed files in the page cache, so disk I/O is kept at a bare minimum when the working set fits in available memory.
So to answer your second question: MyISAM itself never caches table data. Whether a data read is served from memory or from disk depends entirely on the OS file system cache.
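You can still see how well the key cache itself is doing for index reads, using the standard status counters (a quick sketch):
-- Key_reads = index blocks read from disk; Key_read_requests = total requests
-- a low Key_reads / Key_read_requests ratio means the index cache is effective
SHOW GLOBAL STATUS LIKE 'Key_read%';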
It's also important not to fall into "buffer oversizing" with the various MyISAM variables such as read_buffer_size, read_rnd_buffer_size, sort_buffer_size, join_buffer_size, etc. Some of them are dynamically allocated, so bigger doesn't always mean faster, and sometimes it can even be slower; see this post on mysqlperformanceblog for a very interesting case.
If you're on 5.1 on a POSIX platform, you might want to benchmark myisam_use_mmap on your workload; it's supposed to help high-contention cases by reducing the number of malloc() calls.
When using MyISAM the configuration setting key_buffer_size defines the size of the global buffer where MySQL caches frequently used blocks of index data.
What is the corresponding setting for InnoDB?
innodb_buffer_pool_size is the setting that controls the size of the memory buffer that InnoDB uses to cache indexes and data. It's an important performance option.
See the manual page for the full explanation. The MySQL Performance Blog also has an article about how to choose a proper size for it.
As far as I know, the best setting you can adjust for InnoDB is innodb_buffer_pool_size.
The size in bytes of the memory buffer InnoDB uses to cache data and indexes of its tables. The default value is 8MB. The larger you set this value, the less disk I/O is needed to access data in tables. On a dedicated database server, you may set this to up to 80% of the machine physical memory size. However, do not set it too large because competition for physical memory might cause paging in the operating system.
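For instance (the value is illustrative, not a recommendation; changing it at runtime requires MySQL 5.7 or later, older versions need a restart after editing the configuration):
-- check the current value (in bytes), then resize
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;  -- 4G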