Alternatives to the MEMORY storage engine for MySQL - mysql

I'm currently running some intensive SELECT queries against a MyISAM table. The table is around 100 MiB (800,000 rows) and it never changes.
I need to increase the performance of my script, so I was thinking of moving the table from MyISAM to the MEMORY storage engine so that I could load it completely into memory.
Besides the MEMORY storage engine, what are my options for loading a 100 MiB table into memory?

A table with 800k rows shouldn't be any problem for MySQL, no matter what storage engine you are using. With a size of 100 MB, the full table (data and keys) should live in memory (MySQL key cache, OS file cache, or probably both).
First, check the indices. In most cases, optimizing the indices gives you the best performance boost. Don't do anything else until you are pretty sure they are in shape. Run the queries with EXPLAIN and watch for cases where no index or the wrong index is used. Do this with real-world data, not on a server with test data.
Once you have optimized your indices, the queries should finish in a fraction of a second. If they are still too slow, try to avoid running them at all by using a cache in your application (memcached, etc.). Given that the data in the table never changes, there shouldn't be any problems with stale cache data.
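A minimal sketch of the EXPLAIN check described above. The table `products`, the column `category_id`, and the index name are hypothetical; substitute your own table and one of your real queries.

```sql
-- Hypothetical table and column names, for illustration only.
EXPLAIN SELECT * FROM products WHERE category_id = 42;
-- In the output, type = ALL together with key = NULL indicates a full table scan.

-- Add an index on the filtered column, then check the plan again.
ALTER TABLE products ADD INDEX idx_category_id (category_id);
EXPLAIN SELECT * FROM products WHERE category_id = 42;
-- The key column should now normally name idx_category_id.
```

Compare the rows estimate between the two plans as well; a large drop is usually a good sign that the index is doing its job.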

Assuming the data rarely changes, you could potentially boost query performance significantly using the MySQL query cache.

If your table is queried a lot it's probably already cached at the operating system level, depending on how much memory is in your server.
MyISAM also allows preloading a table's indices into memory using a mechanism called the MyISAM key cache. After you've created a key cache, you can assign a table's indices to it with CACHE INDEX and preload them with LOAD INDEX INTO CACHE.
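Roughly, the key cache steps look like this. The cache name `hot_cache`, the table name `my_table`, and the 128 MiB size are assumptions for illustration; pick a size that fits your indices and your server.

```sql
-- Create a dedicated key cache (hypothetical name and size).
SET GLOBAL hot_cache.key_buffer_size = 128 * 1024 * 1024;

-- Assign the MyISAM table's indices to that cache.
CACHE INDEX my_table IN hot_cache;

-- Preload the index blocks into the cache.
LOAD INDEX INTO CACHE my_table;
```

A key cache created this way does not survive a server restart, so the statements would need to be re-run (for example from an init file) after each restart.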
I assume that you've analyzed your table and queries and optimized your indices for the actual queries? If not, that's really something you should do before attempting to store the entire table in memory.

If you have enough memory allocated for MySQL's use, in the InnoDB buffer pool or for use by MyISAM, you can read the table into memory (just a 'SELECT * FROM tablename') and, as long as there's no reason to evict it, it stays there.
You also get better key use, as MEMORY tables use hash-based keys by default rather than B-tree indexes. Hash keys might be fast enough for smaller, non-unique keys, but perhaps not with such a large table.
As usual, the best thing to do is to benchmark it.
Another idea, if you are using v5.1, is to use the ARCHIVE table type, which is compressed and may also speed up access to the contents if they compress well. This trades CPU time spent on decompression for reduced I/O and memory access.

If the data never changes you could easily duplicate the table over several database servers.
This way you could offload some queries to a different server, gaining some extra breathing room for the main server.
The speed improvement depends on the current database load; there will be no improvement if the load is very low.
PS:
You are aware that MEMORY tables lose their contents when the database restarts?

Related

Is search speed achieved with fast data access or fast index access?

From MySQL doc:
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] tbl_name
(create_definition,...)
{DATA|INDEX} DIRECTORY [=] 'absolute path to directory'
My table is for search only and takes 8G of disk space (4G data + 4G index) with 80M rows.
I can't use ENGINE = MEMORY to store the whole table in memory, but I can store either the data or the index on a RAM drive through the DIRECTORY table options.
From a theoretical standpoint, is it better to store the data or the index in RAM?
MySQL's default storage engine is InnoDB. As you run queries against an InnoDB table, the portion of that table or indexes that it reads are copied into the InnoDB Buffer Pool in memory. This is done automatically. So if you query the same table later, chances are it's already in memory.
If you run queries against other tables, it loads those into memory too. If the buffer pool is full, it will evict some data that belongs to your first table. This is not a problem, since it was only a copy of what's on disk.
There's no way to specifically "lock" a table or an index in memory. InnoDB will load either data or index pages as it needs them. InnoDB is smart enough not to evict data you have used a thousand times just to make room for another table that was requested once.
Over time, this tends to balance out, using memory for your most-frequently queried subset of each table and index.
So if you have system memory available, allocate more of it to your InnoDB Buffer Pool. The more memory the Buffer Pool has, the more able it is to store all the frequently-queried tables and indexes.
Up to the size of your data + indexes, of course. The content copied from the data + indexes is stored only once in memory. So if you have only 8G of data + indexes, there's no need to give the buffer pool more and more memory.
Don't allocate more system memory to the buffer pool than your server can afford. Overallocating memory leads to swapping memory for disk, and that will be bad for performance.
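To compare the size of your data + indexes with the buffer pool, something along these lines should work. It is only a sketch: the sizes InnoDB reports in information_schema are estimates, and the column aliases are my own.

```sql
-- Combined data + index size of all InnoDB tables, per schema, in MiB.
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024) AS data_plus_index_mib
FROM information_schema.tables
WHERE engine = 'InnoDB'
GROUP BY table_schema;

-- Current buffer pool size in MiB, for comparison.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mib;
```

If the buffer pool is already larger than the first figure, giving it more memory is unlikely to change anything.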
Don't bother with the {DATA|INDEX} DIRECTORY options. Those are for when you need to locate a table on another disk volume, because you're running out of space. It's not likely to help performance. Allocating more system memory to the buffer pool will accomplish that much more reliably.
"but I can store either the data or the index in a RAM drive through the DIRECTORY table options..."
Short answer: let the database and OS do it.
Using a RAM disk might have made sense 10-20 years ago, but these days the software manages caching disk to RAM for you. The disk itself has its own RAM cache, especially if it's a hybrid drive. The OS will cache file system access in RAM. And then MySQL itself will do its own caching.
And if it's an SSD, that's already extremely fast, so a RAM cache is unlikely to show much improvement.
So making your own RAM disk isn't likely to do anything that isn't already happening. What you will do is pull resources away from the OS and MySQL that they could have managed better themselves, likely slowing down everything on that machine.
What you're describing is a micro-optimization: an attempt to make individual operations faster. Micro-optimizations tend to add complexity and degrade the system as a whole, and there are limits to how much they can gain you. For example, if you have to search 1,000,000 rows and it takes 1 ms per row, that's 1,000,000 ms. If you make it 0.9 ms per row, then it's 900,000 ms.
What you want to focus on is algorithmic optimization: improvements to the algorithm. Because you're doing less work, these tend to make the code simpler, though the data structures often need to be more carefully thought out. Take those same 1,000,000 rows and add an index. Instead of looking at 1,000,000 rows you'll spend, say, 100 ms looking at the index.
The numbers are made up, but I hope you get the point. If "what you want is speed", algorithmic optimizations will take you where no micro-optimization will.
There's also the performance of the code using the database to consider. It is often the real bottleneck, through unoptimized queries, poor patterns for fetching related data, and not taking advantage of caching.
Micro-optimizations, with their complexities and special configurations, tend to make algorithmic optimizations more difficult. So you might be slowing yourself down in the long run by worrying about micro-optimizations now. Furthermore, you're doing this at the very start when you only have fuzzy ideas about how this thing will be used or perform or where the bottlenecks will be.
Spend your time optimizing your data structures and indexes, not minute details of your database storage. Once you've done that, if it still isn't fast enough, then look at tweaking settings.
As a side note, there is one possible benefit to playing with DIRECTORY. You can put the data and index on separate physical drives. Then both can be accessed simultaneously with the full I/O throughput of each drive.
Though you've just made it twice as likely to have a disk failure, and complicated backups. You're probably better off with an SSD and/or RAID.
And consider whether a cloud database might actually out-perform any hardware you might be able to afford.

How do I make a MySQL database run completely in memory?

I noticed that my database server supports the Memory database engine. I want to make a database I have already made running InnoDB run completely in memory for performance.
How do I do that? I explored PHPMyAdmin, and I can't find a "change engine" functionality.
Assuming you understand the consequences of using the MEMORY engine as mentioned in comments, and here, as well as some others you'll find by searching about (no transaction safety, locking issues, etc) - you can proceed as follows:
MEMORY tables are stored differently than InnoDB, so you'll need to use an export/import strategy. First dump each table separately to a file using SELECT * FROM tablename INTO OUTFILE 'table_filename'. Create the MEMORY database and recreate the tables you'll be using with this syntax: CREATE TABLE tablename (...) ENGINE = MEMORY;. You can then import your data using LOAD DATA INFILE 'table_filename' INTO TABLE tablename for each table.
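As a rough sketch of those three steps for a single table: the table `lookup`, its two columns, the `_mem` suffix, and the file path are all hypothetical, and the original table is assumed to exist already.

```sql
-- 1. Dump the existing InnoDB table to a file on the server host.
--    (Hypothetical table name and path.)
SELECT * FROM lookup INTO OUTFILE '/tmp/lookup.txt';

-- 2. Create a MEMORY table with the same columns.
CREATE TABLE lookup_mem (
  id    INT          NOT NULL PRIMARY KEY,
  label VARCHAR(100) NOT NULL
) ENGINE = MEMORY;

-- 3. Load the dumped rows into the MEMORY table.
LOAD DATA INFILE '/tmp/lookup.txt' INTO TABLE lookup_mem;
```

Keep in mind that MEMORY tables cannot contain BLOB or TEXT columns, that their maximum size is capped by max_heap_table_size, and that the server's secure_file_priv setting may restrict where the dump file can be written and read.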
It is also possible to place the MySQL data directory in a tmpfs, thus speeding up the database's read and write calls. It might not be the most efficient way to do this, but sometimes you can't just change the storage engine.
Here is my fstab entry for my MySQL data directory
none /opt/mysql/server-5.6/data tmpfs defaults,size=1000M,uid=999,gid=1000,mode=0700 0 0
You may also want to take a look at the innodb_flush_log_at_trx_commit=2 setting. Maybe this will speed up your MySQL sufficiently.
innodb_flush_log_at_trx_commit changes MySQL's disk flush behaviour. When set to 2, the log is still written at each commit but is only flushed to disk about once per second. By default (1), every commit causes a flush and thus more I/O load.
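To try the setting without restarting the server, something like the following should do, assuming your account has the privilege to set global variables.

```sql
-- Check the current value (the default is 1).
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- Change it at runtime for the whole server.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
```

A value set this way is lost on restart; to keep it, put innodb_flush_log_at_trx_commit = 2 in the [mysqld] section of your MySQL configuration file.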
Memory Engine is not the solution you're looking for. You lose everything that you went to a database for in the first place (i.e. ACID).
Here are some better alternatives:
Don't use joins - very few large apps do this (e.g. Google, Flickr, Netflix), because it sucks for large sets of joins.
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because
the server might be able to optimize it better—a fact that is not
specific to MySQL Server alone.
-The MySQL Manual
Make sure the columns you're querying against have indexes. Use EXPLAIN to confirm they are being used.
Use and increase your query cache and the memory available for your indexes, to keep the indexes in memory and store frequent lookups.
Denormalize your schema, especially for simple joins (e.g. get fooId from barMap).
The last point is key. I used to love joins, but then had to run joins on a few tables with 100M+ rows. No good. You're better off inserting the data you're joining against into that target table (if it's not too much) and querying against indexed columns; you'll get your result in a few ms.
I hope those help.
If your database is small enough (or if you add enough memory), your database will effectively run in memory, since your data will be cached after the first request.
Changing the database table definitions to use the MEMORY engine is probably more complicated than you need.
If you have enough memory to load the tables into memory with the MEMORY engine, you have enough to tune the InnoDB settings to cache everything anyway.
"How do I do that? I explored PHPMyAdmin, and I can't find a "change engine" functionality."
In direct response to this part of your question, you can issue an ALTER TABLE tbl ENGINE=InnoDB; (naming whichever engine you want) and it'll recreate the table in that engine.
In place of the MEMORY storage engine, one can consider MySQL Cluster. It is said to give similar performance but to support disk-backed operation for durability. I've not tried it, but it looks promising (and has been in development for a number of years).
You can find the official MySQL Cluster documentation here.
Additional thoughts :
Ramdisk - setting the temp drive MySQL uses as a RAM disk, very easy to set up.
memcache - memcache server is easy to set up, use it to store the results of your queries for X amount of time.

Performance difference between Innodb and Myisam in Mysql

I have a MySQL table with over 30 million records that was originally stored with MyISAM. Here is a description of the table:
I would run the following query against this table which would generally take around 30 seconds to complete. I would change #eid each time to avoid database or disk caching.
select count(fact_data.id)
from fact_data
where fact_data.entity_id=#eid
and fact_data.metric_id=1
I then converted this table to InnoDB without making any other changes, and the same query now returns in under a second every single time I run it. Even when I randomly set #eid to avoid caching, the query returns in under a second.
I've been researching the differences between the two storage engines to try to explain the dramatic improvement in performance but haven't been able to come up with anything. In fact, much of what I read indicates that MyISAM should be faster.
The queries I'm running are against a local database with no other processes hitting the database at the time of the tests.
That's a surprisingly large performance difference, but I can think of a few things that may be contributing.
MyISAM has historically been viewed as faster than InnoDB, but for recent versions of InnoDB, that is true for a much, much smaller set of use cases. MyISAM is typically faster for table scans of read-only tables. In most other use cases, I typically find InnoDB to be faster. Often many times faster. Table locks are a death knell for MyISAM in most of my usage of MySQL.
MyISAM caches indexes in its key buffer. Perhaps you have set the key buffer too small for it to effectively cache the index for your somewhat large table.
MyISAM depends on the OS to cache table data from the .MYD files in the OS disk cache. If the OS is running low on memory, it will start dumping its disk cache. That could force it to keep reading from disk.
InnoDB caches both indexes and data in its own memory buffer. You can tell the OS not to also use its disk cache if you set innodb_flush_method to O_DIRECT, though this isn't supported on OS X.
InnoDB usually buffers data and indexes in 16 KB pages. Depending on how you are changing the value of #eid between queries, it may have already cached the data for one query due to the disk reads from a previous query.
Make sure you created the indexes identically. Use EXPLAIN to check whether MySQL is using the index. Since you included the output of DESCRIBE instead of SHOW CREATE TABLE or SHOW INDEXES FROM, I can't tell if entity_id is part of a composite index. If it is not the first part of a composite index, the index won't be used.
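For the index check in particular, a sketch like the following may help. The index name idx_entity_metric and the literal 123 (standing in for #eid) are my own placeholders; the table and column names come from the question.

```sql
-- Show the full table definition, including every index and its column order.
SHOW CREATE TABLE fact_data;
SHOW INDEXES FROM fact_data;

-- If no suitable index exists, a composite index with entity_id first
-- would cover both conditions of the query (hypothetical index name).
ALTER TABLE fact_data ADD INDEX idx_entity_metric (entity_id, metric_id);

-- Confirm which index the optimizer picks; 123 stands in for #eid.
EXPLAIN SELECT COUNT(fact_data.id)
FROM fact_data
WHERE fact_data.entity_id = 123
  AND fact_data.metric_id = 1;
```

In the SHOW INDEXES output, Seq_in_index tells you each column's position within a composite index, which is what decides whether a condition on entity_id alone can use it.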
If you are using a relatively modern version of MySQL, run the following command before running the query:
set profiling = 1;
That will turn on query profiling for your session. After running the query, run
show profiles;
That will show you the list of queries for which profiles are available. I think it keeps the last 20 by default. Assuming your query was the first one, run:
show profile for query 1;
You will then see the duration of each stage in running your query. This is extremely useful for determining what (e.g., table locks, sorting, creating temp tables, etc.) is causing a query to be slow.
My first suspicion would be that the original MyISAM table and/or indexes became fragmented over time resulting in the performance slowly degrading. The InnoDB table would not have the same problem since you created it with all the data already in it (so it would all be stored sequentially on disk).
You could test this theory by rebuilding the MyISAM table. The easiest way to do this would be to use a "null" ALTER TABLE statement:
ALTER TABLE mytable ENGINE = MyISAM;
Then check the performance to see if it is better.
Another possibility is that the database itself is simply tuned for InnoDB performance rather than MyISAM. For example, InnoDB uses the innodb_buffer_pool_size parameter to know how much memory should be allocated for caching data and indexes. But MyISAM uses the key_buffer_size parameter. If your database has a large InnoDB buffer pool and a small key buffer, then InnoDB performance is going to be better than MyISAM performance, especially for large tables.
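One way to check this, as a rough sketch, is to compare the two settings and look at how often MyISAM has to go to disk for index blocks:

```sql
-- Memory available to each engine's cache, in bytes.
SHOW VARIABLES LIKE 'key_buffer_size';
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Key_read_requests counts index block reads requested from the key cache;
-- Key_reads counts the ones that had to be read from disk.
SHOW GLOBAL STATUS LIKE 'Key_read%';
```

If Key_reads is a large fraction of Key_read_requests, the key buffer is probably too small for the index.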
What are your index definitions? There are ways of creating indexes for MyISAM in which your index fields will not be used when you think they would be.

Mysql MEMORY table vs InnoDB table (many inserts, few reads)

I run my sites all on InnoDB tables, which is working really well so far. Now I'd like to know what is going on in real time on my sites, so I store each pageview (page, referrer, IP, hostname, etc.) in an InnoDB table. There are about 100 inserts per second, and this table is only read once in a while when I'm browsing the logs.
I clean out the table every minute with a cron job that removes old items. This leaves about 35,000 rows in that table on average, with a size of about 5 MB.
Would it be easier on the server if I were to move the InnoDB table to a MEMORY table? As far as I can see this would save a lot of disk I/O, right? Restarting MySQL would result in a loss of data, but this does not matter in my case.
Question: In my case, would you recommend a MEMORY table over an InnoDB table?
Yes I would. The conditions you mention (a lot of writes, periodic purging of data, data persistence not required) make it pretty much an ideal candidate for MEMORY.
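A minimal sketch of what such a table might look like. The table name, column names, column sizes, and the 5-minute retention are all assumptions, since the question doesn't give the actual schema.

```sql
-- Hypothetical schema for the pageview log.
CREATE TABLE pageviews_mem (
  viewed_at DATETIME     NOT NULL,
  page      VARCHAR(255) NOT NULL,
  referrer  VARCHAR(255) NOT NULL,
  ip        VARCHAR(45)  NOT NULL,
  -- MEMORY indexes are HASH by default, which only helps equality lookups;
  -- a BTREE index lets the range condition in the purge below use the index.
  INDEX idx_viewed_at USING BTREE (viewed_at)
) ENGINE = MEMORY;

-- Purge run from cron every minute (hypothetical retention period).
DELETE FROM pageviews_mem WHERE viewed_at < NOW() - INTERVAL 5 MINUTE;
```

MEMORY tables use fixed-length rows, so each VARCHAR takes its full declared length; keep the columns as narrow as you reasonably can.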
Please optimize your InnoDB settings:
As long as you have configured InnoDB to use enough memory to hold your entire table (with innodb_buffer_pool_size), and there is not excessive pressure from other InnoDB tables on the same server, the data will remain in memory. If you're concerned about write performance (and again barring other uses of the same system) you can reduce durability to drastically increase write performance by setting innodb_flush_log_at_trx_commit = 0 and disabling binary logging.
Using any sort of triggers with temporary tables will be a mess to maintain, and won't give you any benefits of transactionality on the temporary tables.
You can find more details right here:
http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit

MySQL - How to determine if my table is stored in RAM?

I'm running:
MySQL v5.0.67
InnoDB engine
innodb_buffer_pool_size = 70MB
Question: What command can I run to ensure that my entire 50 MB database is stored entirely in RAM?
I am curious about why you want to store the entire table in memory. My guess is that you are not. The most important thing for me is if your queries are running well and if you are tied up on disk access. It is also possible that the OS has cached disk blocks that you need if there is memory available. In this case, even though MySQL might not have it in memory, the OS will. If your queries are not running well, and you can do it, I highly recommend adding more memory if you want it all in RAM. If you have slowdowns it is more likely that you are running into contention.
show table status
will show you some of the information.
Get the server I/O, buffer, and cache statistics from
show status
then run a query that requires each row to be accessed (say, sum the non-empty values from each row using a column that is not indexed) and check the statistics again to see whether any I/O has occurred.
I doubt you are caching the entire thing in memory though with only 70MB. You have to take out a lot of cache, temp, and index buffers from that total.
If you run SELECT COUNT(*) FROM yourtable USE INDEX (PRIMARY) then InnoDB will put every page of the PRIMARY index into buffer pool (assuming there is enough room in it). If the table has secondary indexes and if you want to load them into the buffer pool, too, then craft a similar query that would read from a secondary index and do the job.
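To see whether that query was served from memory, you could compare InnoDB's disk-read counter before and after it. This is a sketch: the counter is server-wide, so it is only meaningful on an otherwise idle server.

```sql
-- Number of reads InnoDB could not satisfy from the buffer pool so far.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';

-- Touch every page of the primary index.
SELECT COUNT(*) FROM yourtable USE INDEX (PRIMARY);

-- If the value has not increased, the pages were already in the buffer pool.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
```

Run the sequence twice: the first pass may read from disk, while the second should show no increase if the table fits in the buffer pool.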