I'm running:
MySQL v5.0.67
InnoDB engine
innodb_buffer_pool_size = 70MB
Question: What command can I run to ensure that my entire 50 MB database is stored entirely in RAM?
I am curious about why you want to store the entire table in memory. My guess is that you are not. The most important thing for me is if your queries are running well and if you are tied up on disk access. It is also possible that the OS has cached disk blocks that you need if there is memory available. In this case, even though MySQL might not have it in memory, the OS will. If your queries are not running well, and you can do it, I highly recommend adding more memory if you want it all in RAM. If you have slowdowns it is more likely that you are running into contention.
show table status
will show you some of the information.
If you get the server IO/buffer/cache statistics from
show server status
and then run a query that requires each row to be accessed (say sum the non empty values from each row using a column that is not indexed) and check to see if any IO has occurred.
I doubt you are caching the entire thing in memory though with only 70MB. You have to take out a lot of cache, temp, and index buffers from that total.
If you run SELECT COUNT(*) FROM yourtable USE INDEX (PRIMARY) then InnoDB will put every page of the PRIMARY index into buffer pool (assuming there is enough room in it). If the table has secondary indexes and if you want to load them into the buffer pool, too, then craft a similar query that would read from a secondary index and do the job.
Related
SELECT P_CODE, P_PRICE
FROM PRODUCT
WHERE P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT);
Will this query (under mysql) result in two full table scans (from disk) or will the optimizer understand that it's faster too (if there is enough RAM to hold the result set) only do one full table scan? The table has no indexes.
Is it possible to read (somehow) this information from output of the EXPLAIN command in mysql?
The question is flawed based on a misunderstanding of what a table scan actually is:
A table scan iterates over all rows in the table (irrespective of how it obtains those rows).
It also differs slightly from an index scan in that it works with the "full row". Whereas an index scan has less overall data to process, because it works with a subset of columns.
But the question is actually asking about difference between physical and logical IO.
(from disk) or will the optimizer understand that it's faster too (if there is enough RAM to hold the result set)
Yes the query will do 2 table scans. That cannot be avoided:
the server has to process the full set of prices twice.
and it has to finish processing for AVG(PRICE) before it can start processing for the WHERE filter.
However, a "logical" table scan does not necessarily require reading the data from disk twice. If all the data is in memory, the server can perform the table scan in memory. So although the second stage of processing must still perform a table scan, it can be more efficient by avoiding secondary disk access.
Take a look at this question to see how to distinguish logical and physical IO on mysql:
For a MySQL query, how do you determine physical and logical I/O?
I'll add that in theory a server could choose to keep only the Price column in memory on the first pass. In which case it wouldn't need be perform a "full table scan" on the second pass.
However this is unlikely in practice as there's a benefit to keep all the data in memory for other future queries ... whatever columns they may wish to process.
Re your comment:
my assumption, when looking at the query, is than an optimizer should/would be able to determine that "this query reads the same data twice, after the first read i will put it into memory(if there is space) and use the in-memory data for the next part of the query, instead of asking the disk for it twice"
Well, at least in MySQL's InnoDB engine, something sort of like this happens. InnoDB can't really read pages directly from disk. It load every requested page into RAM before doing data operations on it. The RAM is a preallocated area called the InnoDB buffer pool. This stores byte-for-byte copies of the pages from the on-disk tablespace, plus some metadata about them.
After reading a page, the buffer pool has no immediate need to evict it from RAM, unless other pages are requested and there's no space left in the buffer pool for them. So subsequent requests for the same pages may find the pages already residing in RAM. The more this happens, the better your performance overall.
You might have more data pages in your product table than can fit in your buffer pool. During a table-scan, InnoDB will evict pages as needed to load the remaining set of pages for the table. If you have a table that is many times larger than your buffer pool, you can imagine that this results in quite a bit of "churn" as pages come in and out. If you can afford it, allocating more RAM to the buffer pool is an good way to improve performance.
All these facts about the buffer pool don't change the fact that your query will perform two table-scans. It is true that it will be faster to read the pages from the buffer pool than reading pages from disk. You can experiment:
Shutdown your MySQL Server and start it back up again. The buffer pool should be empty at this point (unless you are using the feature to save the buffer pool on shutdown).
Run your query. It might take many seconds, because each page requested has to be read from disk before it can be used.
Run the same query again. It's faster! I've seen cases where this difference makes the performance about 4x faster in tests. I understand that RAM is typically thousands of times faster than disk, but I/O speed is not the only code running. Also it depends on what other requests are occupying the disk bandwidth, and other factors.
The difference between disk speed and RAM speed is (more or less) an arithmetic factor. No matter how large your dataset, the speed difference gives the same advantage.
Indexes are much more important, because they turn a linear search O(n) into a B-tree search O(log2n). As your dataset gets larger, the advantage of this becomes more dramatic. This is why there is so much emphasis on analyzing complexity of algorithms in computer science.
Please explain how you could do this with only one table scan. It is not obvious.
The use of the AVG() function would typically result in two full scans. If you have an index, then one or both scans might use the index.
I have a mysql table with over 30 million records that was originally being stored with myisam. Here is a description of the table:
I would run the following query against this table which would generally take around 30 seconds to complete. I would change #eid each time to avoid database or disk caching.
select count(fact_data.id)
from fact_data
where fact_data.entity_id=#eid
and fact_data.metric_id=1
I then converted this table to innoDB without making any other changes and afterwards the same query now returns in under a second every single time I run the query. Even when I randomly set #eid to avoid caching, the query returns in under a second.
I've been researching the differences between the two storage types to try to explain the dramatic improvement in performance but haven't been able to come up with anything. In fact, much of what I read indicates that Myisam should be faster.
The queries I'm running are against a local database with no other processes hitting the database at the time of the tests.
That's a surprisingly large performance difference, but I can think of a few things that may be contributing.
MyISAM has historically been viewed as faster than InnoDB, but for recent versions of InnoDB, that is true for a much, much smaller set of use cases. MyISAM is typically faster for table scans of read-only tables. In most other use cases, I typically find InnoDB to be faster. Often many times faster. Table locks are a death knell for MyISAM in most of my usage of MySQL.
MyISAM caches indexes in its key buffer. Perhaps you have set the key buffer too small for it to effectively cache the index for your somewhat large table.
MyISAM depends on the OS to cache table data from the .MYD files in the OS disk cache. If the OS is running low on memory, it will start dumping its disk cache. That could force it to keep reading from disk.
InnoDB caches both indexes and data in its own memory buffer. You can tell the OS not to also use its disk cache if you set innodb_flush_method to O_DIRECT, though this isn't supported on OS X.
InnoDB usually buffers data and indexes in 16kb pages. Depending on how you are changing the value of #eid between queries, it may have already cached the data for one query due to the disk reads from a previous query.
Make sure you created the indexes identically. Use explain to check if MySQL is using the index. Since you included the output of describe instead of show create table or show indexes from, I can't tell if entity_id is part of a composite index. If it was not the first part of a composite index, it wouldn't be used.
If you are using a relatively modern version of MySQL, run the following command before running the query:
set profiling = 1;
That will turn on query profiling for your session. After running the query, run
show profiles;
That will show you the list of queries for which profiles are available. I think it keeps the last 20 by default. Assuming your query was the first one, run:
show profile for query 1;
You will then see the duration of each stage in running your query. This is extremely useful for determining what (e.g., table locks, sorting, creating temp tables, etc.) is causing a query to be slow.
My first suspicion would be that the original MyISAM table and/or indexes became fragmented over time resulting in the performance slowly degrading. The InnoDB table would not have the same problem since you created it with all the data already in it (so it would all be stored sequentially on disk).
You could test this theory by rebuilding the MyISAM table. The easiest way to do this would be to use a "null" ALTER TABLE statement:
ALTER TABLE mytable ENGINE = MyISAM;
Then check the performance to see if it is better.
Another possibility would be if the database itself is simply tuned for InnoDB performance rather than MyISAM. For example, InnoDB uses the innodb_buffer_pool_size parameter to know how much memory should be allocated for storing cached data and indexes in memory. But MyISAM uses the key_buffer parameter. If your database has a large innodb buffer pool and a small key buffer, then InnoDB performance is going to be better than MyISAM performance, especially for large tables.
What are your index definitions, there are ways in which you can create indexes for MyISAM in which your index fields will not be used when you think they would.
I run my sites all on InnoDB tables which is working really well so far. Now I like to know what is going on in real-time on my sites, so I store each pageview (page, referrer, IP, hostname, etc) in an InnoDB table. There are about 100 inserts per second, and this table is only read once in a while when i'm browsing the logs.
I clean out the table every minute with a cron that removes old items. This leaves about 35.000 rows in that table on average, with a size of about 5MB.
Would it be easier on the server if I were to transfer the InnoDB table to a MEMORY table? As far as I can see this would save a lot of disk IO right? Restarting Mysql would result in a loss of data, but this does not matter in my case.
Question: In my case, would you recommend a Memory table over a InnoDB table?
Yes I would. The conditions you mention (a lot of writes, periodic purging of data, data persistence not required) make it pretty much an ideal candidate for MEMORY.
please optimize your innodb settings:
As long as you have configured InnoDB to use enough memory to hold your entire table (with innodb_buffer_pool_size), and there is not excessive pressure from other InnoDB tables on the same server, the data will remain in memory. If you're concerned about write performance (and again barring other uses of the same system) you can reduce durability to drastically increase write performance by setting innodb_flush_log_at_trx_commit = 0 and disabling binary logging.
Using any sort of triggers with temporary tables will be a mess to maintain, and won't give you any benefits of transactionality on the temporary tables.
You can find more details right here:
http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
In a tb with 1 mil. rows if I do (after I restart the computer - so nothing it's cached):
1. SELECT price,city,state FROM tb1 WHERE zipId=13458;
the result is 23rows in 0.270s
after I run 'LOAD INDEX INTO CACHE tb1' (key_buffer_size=128M and total index size for tb is 82M):
2. SELECT price,city,state FROM tb1 WHERE zipId=24781;
the result is 23rows in 0.252s, Key_reads remains constant, Key_read_requests is incremented with 23
BUT after I load 'zipId' into OS cache, if I run again the query:
2. SELECT price,city,state FROM tb1 WHERE zipId=20548;
the result is 22rows in 0.006s
This it's just a simple example, but I run tens of tests and combinations. But the results are always the same.
I use: MySql with MyISAM, WINDOWS 7 64, and the query_cache is 0;
zipId it's a regular index (not primary key)
SHOULDN'T key_cache be faster than OS cache ??
SHOULDN'T be a huge difference in speed, after I load the index into cache ??
(in my test it's almost no difference).
I've read a lot of websites,tutorials and blogs on this matter but none of them really discuss the difference in speed. So, any ideas or links will be greatly appreciated.
Thank you.
Under normal query processing, MySQL will scan the index for the where clause values (i.e. zipId = 13458). Then uses the index to look up the corresponding values from the MyISAM main table (a second disk access). When you load the table into memory, the disk accesses are all done in memory, not from reading a real disk.
The slow part of the query is the lookup from the index into the main table. So loading the index into memory may not improve the query speed.
One thing to try is Explain Select on your queries to see how the index is being used.
Edit: Since I don't think the answers to your comments will fit in a comment space. I'll answer them here.
MyISAM in and of itself does not have a cache. It relies upon the OS to do the disk caching. How much of your table is cached by depends upon what else you are running in the system, and how much data you are reading through. Windows in particular does not allow the user much control over what data is cached and for how long.
The OS caches disk blocks (either 4K or 8K chunks) of the index file or the full table file.
SELECT indexed_col FROM tb1 WHERE zipId+0>1
Queries like this where you use functions on the predicate (Where clause) can cause MySQL to do full table scans rather than using any index. As I suggested above, use EXPLAIN SELECT to see what MySQL is doing.
If you want more control over the cache, try using an INNODB table. The InnoDB engine creates its own cache which you can size, and does a better job of keeping the most recent used stuff in it.
I'm currently running some intensive SELECT queries against a MyISAM table. The table is around 100 MiB (800,000 rows) and it never changes.
I need to increase the performance of my script, so I was thinking on moving the table from MyISAM to the MEMORY storage engine, so I could load it completely into the memory.
Besides the MEMORY storage engine, what are my options to load a 100 MiB table into the memory?
A table with 800k rows shouldn't be any problem to mysql, no matter what storage engine you are using. With a size of 100 MB the full table (data and keys) should live in memory (mysql key cache, OS file cache, or propably in both).
First you check the indices. In most cases, optimizing the indices gives you the best performance boost. Never do anything else, unless you are pretty sure they are in shape. Invoke the queries using EXPLAIN and watch for cases where no or the wrong index is used. This should be done with real world data and not on a server with test data.
After you optimized your indices the queries should finish by a fraction of a second. If the queries are still too slow then just try to avoid running them by using a cache in your application (memcached, etc.). Given that the data in the table never changes there shouldn't be any problems with old cache data etc.
Assuming the data rarely changes, you could potentially boost the performance of queries significantly using MySql query caching.
If your table is queried a lot it's probably already cached at the operating system level, depending on how much memory is in your server.
MyISAM also allows for preloading MyISAM table indices into memory using a mechanism called the MyISAM Key Cache. After you've created a key cache you can load an index into the cache using the CACHE INDEX or LOAD INDEX syntax.
I assume that you've analyzed your table and queries and optimized your indices after the actual queries? Otherwise that's really something you should do before attempting to store the entire table in memory.
If you have enough memory allocated for Mysql's use - in the Innodb buffer pool, or for use by MyIsam, you can read the database into memory (just a 'SELECT * from tablename') and if there's no reason to remove it, it stays there.
You also get better key use, as the MEMORY table only does hash-bashed keys, rather than full btree access, which for smaller, non-unique keys might be fats enough, or not so much with such a large table.
As usual, the best thing to do it to benchmark it.
Another idea is, if you are using v5.1, to use an ARCHIVE table type, which can be compressed, and may also speed access to the contents, if they are easily compressible. This swaps the CPU time to de-compress for IO/memory access.
If the data never changes you could easily duplicate the table over several database servers.
This way you could offload some queries to a different server, gaining some extra breathing room for the main server.
The speed improvement depends on the current database load, there will be no improvement if your database load is very low.
PS:
You are aware that MEMORY tables forget their contents when the database restarts!