Caching Responses in MEMORY vs. MyISAM vs. InnoDB - mysql

I need to cache API responses to cope with a flood of requests. So I will cache every response for up to 30 seconds, and whenever I get a cache hit I will use the cached data, if it's younger than 30 seconds.
Which storage engine is the best for that?
My first thought was MEMORY, but it locks the whole table, which I think could cause trouble when many users are online.
Then I thought about InnoDB, but I also read that MyISAM is better with many write operations.
But MyISAM again locks the whole table.
Which storage engine is best as an API response cache?

None of the above.
MySQL is a database, not a cache. The idea is to store data persistently (and, in the case of InnoDB, durably). SQL has no TTL (time to live) or automatic expiration of data, so you would have to remove data older than 30 seconds yourself, or else it would accumulate.
You should use a cache server for data you want to disappear after 30 seconds. Memcached, for example, lets you set a TTL in seconds when you store an object in the cache.
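If you do keep it in MySQL despite that, a minimal sketch of the manual-TTL approach looks like this (all table and column names here are hypothetical):

-- Hypothetical cache table; names are illustrative only
CREATE TABLE api_cache (
  cache_key  VARCHAR(191) PRIMARY KEY,
  response   MEDIUMBLOB NOT NULL,
  created_at TIMESTAMP  NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

-- Store (or refresh) a response after every upstream call
INSERT INTO api_cache (cache_key, response)
VALUES ('weather:paris', '{"temp": 21}')
ON DUPLICATE KEY UPDATE response = VALUES(response),
                        created_at = CURRENT_TIMESTAMP;

-- A hit only counts if the entry is younger than 30 seconds
SELECT response
FROM   api_cache
WHERE  cache_key = 'weather:paris'
  AND  created_at > NOW() - INTERVAL 30 SECOND;

-- Expiration is your job: run this periodically (cron or the event scheduler)
DELETE FROM api_cache WHERE created_at < NOW() - INTERVAL 30 SECOND;

A cache server gives you the storage and the eviction for free, which is the point of the answer above.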

InnoDB.
But...
You say "API responses" -- is that HTML pages? AJAX replies? SQL queries? Something else?
How often is the data changing? Your design will deliver data that is up to 30 seconds out of date; is that acceptable? The "Query cache" avoids this, but has other issues.
Have you considered memcached?
What is the overhead of inserting into the proposed cache and checking it every time? Probably more than the alternatives.
Quite probably you have one or two queries that are a lot slower than they have to be; fixing them would eliminate the need for the extra caching. Please provide SHOW CREATE TABLE and EXPLAIN SELECT ...; we may be able to quickly spot ways to speed them up.

Related

How do I really clear the cache in MySQL?

I would like to run the same query multiple times to see how much time it takes without the aid of cache memory.
Running the command RESET QUERY CACHE seems not to work, because the query still takes a really short time on the second run even after the cache reset.
What am I missing?
Your OS caches a whole lot of data, and the DBMS itself caches a whole lot more. The query cache only holds the output of a previous query; the data that output is built from may all be in RAM. If you want to run the query without any caching, either power-cycle the host between executions or run a query that reads enough data from disk to overwrite all the caches. But that won't be a realistic measure of how your query will perform in the wild.
Two caches affect the query: the Query Cache and the buffer cache (the buffer_pool with InnoDB, the key_buffer with MyISAM). The Query Cache matters much more here, because it caches the result of your query, so an identical query will not be executed again at all.
To bypass the Query Cache, using SQL_NO_CACHE is a good idea; you can also disable the Query Cache in my.cnf and restart mysql.
The buffer cache is managed by MySQL to hold data in memory so that your query does not have to read from disk (SSD or HDD). If you want to clear it, try setting the buffer cache small enough and then filling it with other data (using SELECTs).
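As a sketch (the table name is hypothetical; resizing the buffer pool at runtime requires MySQL 5.7+, older versions need a my.cnf change and a restart):

-- Check the current buffer pool size (in bytes)
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- Shrink it at runtime (128MB shown)
SET GLOBAL innodb_buffer_pool_size = 134217728;
-- Then push the data you care about out of memory by reading something else
SELECT COUNT(*) FROM some_other_big_table;  -- hypothetical table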
You can use the SQL_NO_CACHE in the SELECT query.
https://dev.mysql.com/doc/refman/5.5/en/query-cache-in-select.html
Keep in mind that the Query Cache has been removed in later versions of MySQL (it is gone as of 8.0).
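For example (the table name is illustrative):

SELECT SQL_NO_CACHE * FROM invoices WHERE id = 42;  -- skips the query cache for this one statement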
Depending on the storage engine (e.g. InnoDB), table data is loaded into memory as part of the buffer pool. You can't really control which tables get loaded and which don't (or at least not easily).
On an ancient version of MySQL (v5.1), I had a problem with a view being cached, which would never refresh. I tried RESET QUERY CACHE, FLUSH TABLES, SELECT SQL_NO_CACHE ..., etc., and nothing worked. Then I changed the storage engine of the (single) underlying table the view was querying from InnoDB to MyISAM, and it worked as desired. There was no need to jump through any hoops to clear or prevent caching!
I'm not sure if this was simply a bug with that old version / storage engine? Please leave comments if you have any knowledge to share on the matter.

When performing a SELECT, how is data read from disk each time? How can I verify that data is being read from disk?

Do we need to drop the OS cache? I want to read data from disk, not from cache. I have disabled:
1. query_cache_type=OFF
2. query_cache_size=0
Even then, when I perform a SELECT for Id=2, innodb_buffer_pool_reads changes; if I select Id=3, there is no change in innodb_buffer_pool_reads.
How do I read the next value from disk? Is there any other way to verify whether data is being read from the disk?
[Edit] Thank you all for your responses.
I am trying to do some reverse engineering and want to test the execution speed of a SELECT query without any cache, so I want to disable all caches and read the data from disk.
Yes, to completely turn off the Query cache, make both of those settings.
To disable the Query cache for a single SELECT, do SELECT SQL_NO_CACHE ....
But... The QC is not the only caching mechanism. For InnoDB, the buffer_pool caches data and indexes. (MyISAM uses its key_cache and the OS's cache)
Typically, the first time you perform a query (after restarting the server), the disk will need to be hit. Typically, that query (or similar queries) performed after that will not need to hit the disk. Because of "caching", MySQL will hit the disk as little as necessary.
If some other connection is modifying the data you are about to SELECT, do not worry: MySQL will make sure to update the cached copy and/or the disk copy. You will always get the correct value.
InnoDB does things in "blocks" (16KB, typically about 100 rows). That is the unit of disk I/O. Ids 1,2,3 are probably in the same block. Again, MySQL takes care of fetches and changes. It will probably read the block once, cache it for a long time, and eventually write it once, even if there are a lot of changes to the rows in the block.
So how does "Durability" happen? Magic. It involves the InnoDB log file and some extra writes that are done to it. That is another topic; it would take much too long to explain it all.
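One way to watch whether a given query actually touched the disk is to compare the InnoDB counters before and after it (the SELECT itself is illustrative; the status variables are real):

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads that had to go to disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- logical (in-memory) reads
SELECT SQL_NO_CACHE * FROM t WHERE Id = 3;                   -- the query under test
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- unchanged => served from the buffer pool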

MySQL server very high load

I run a website with ~500 real-time visitors, ~50k daily visitors, and ~1.3 million total users. I host on AWS, where I use several instances of different kinds. When I started the website, the different instances cost roughly the same. As the site gained users, the RDS instance (MySQL DB) CPU constantly kept hitting the roof; I had to upgrade it several times, and it now accounts for the main part of the monthly cost (around 95% of ~$2.8k/month). I currently use a database server with 16 vCPUs and 64 GiB of RAM, and I also use Multi-AZ Deployment to protect against failures. I wonder if it is normal for the database to be that expensive, or if I have done something terribly wrong?
Database Info
At the moment my database has 40 tables; most of them have ~100k rows, some have ~2 million, and one has 30 million.
I have a system that archives rows older than 21 days once they are not needed anymore.
Website Info
The website mainly uses PHP, but also some NodeJS and Python.
Most of the functions of the website work like this (a minimal SQL sketch follows the list):
Start transaction
Insert row
Get last inserted id (lastrowid)
Do some calculations
Update the inserted row
Update the user
Commit transaction
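A minimal SQL sketch of that flow (table and column names are hypothetical):

START TRANSACTION;
INSERT INTO orders (user_id, status) VALUES (123, 'pending');
SET @id = LAST_INSERT_ID();          -- the "lastrowid" step
-- ... application-side calculations happen here ...
UPDATE orders SET status = 'done', total = 99.50 WHERE id = @id;  -- update the inserted row
UPDATE users  SET balance = balance - 99.50 WHERE id = 123;       -- update the user
COMMIT;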
I also run around 100 bots which poll the database at 10-30 second intervals; they also insert into and update the database sometimes.
Extra
I have done several things to try to lower the load on the database, such as enabling the database cache, using a Redis cache for some queries, removing very slow queries, and upgrading the storage type to "Provisioned IOPS SSD". But nothing seems to help.
These are the changes I have made to the settings parameters:
I have thought about creating a MySQL cluster of several smaller instances, but I don't know if this would help, and I also don't know if it works well with transactions.
If you need any more information, please ask; any help on this issue is greatly appreciated!
In my experience, as soon as you ask the question "how can I scale up performance?" you know you have outgrown RDS (edit: I admit my experience that leads me to this opinion may be outdated).
It sounds like your query load is pretty write-heavy. Lots of inserts and updates. You should increase the innodb_log_file_size if you can on your version of RDS. Otherwise you may have to abandon RDS and move to an EC2 instance where you can tune MySQL more easily.
I would also disable the MySQL query cache. On every insert/update, MySQL has to scan the query cache to see if there are any cached results that need to be purged. This is a waste of time if you have a write-heavy workload, and increasing your query cache to 2.56GB makes it even worse! Set the cache size to 0 and the cache type to 0.
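On a self-managed server that looks like the following (on RDS the same settings go in the DB parameter group; the query cache no longer exists in MySQL 8.0):

SET GLOBAL query_cache_size = 0;  -- free the cache memory
SET GLOBAL query_cache_type = 0;  -- OFF
-- or permanently in my.cnf:
--   query_cache_type = 0
--   query_cache_size = 0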
I have no idea what queries you run, or how well you have optimized them. MySQL's optimizer is limited, so it's frequently the case that you can get huge benefits from redesigning SQL queries. That is, changing the query syntax, as well as adding the right indexes.
You should do a query audit to find out which queries account for your high load. A great free tool for this is pt-query-digest (https://www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html), which can give you a report based on your slow query log. Download the RDS slow query log with the download-db-log-file-portion CLI command (http://docs.aws.amazon.com/cli/latest/reference/rds/download-db-log-file-portion.html).
Set long_query_time=0, let it run for a while to collect information, then change long_query_time back to the value you normally use. It's important to collect all queries in this log, because you might find that 75% of your load comes from queries that run in under 2 seconds but are executed so frequently that they burden the server.
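A sketch of that collection window (the thresholds are examples; on RDS these are parameter-group settings):

SET GLOBAL slow_query_log  = 1;  -- make sure the slow log is on
SET GLOBAL long_query_time = 0;  -- temporarily log every query
-- ... let production traffic run for a while, then restore your usual value:
SET GLOBAL long_query_time = 2;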
After you know which queries are accounting for the load, you can make some informed strategy about how to address them:
Query optimization or redesign
More caching in the application
Scale out to more instances
I think the answer is "you're doing something wrong". It is very unlikely you have reached an RDS limitation, although you may be hitting limits on some parts of it.
Start by enabling detailed monitoring. This will give you some OS-level information which should help determine what your limiting factor really is. Look at your slow query logs and database stats - you may have some queries that are causing problems.
Once you understand the problem - which could be bad queries, I/O limits, or something else - then you can address them. RDS allows you to create multiple read replicas, so you can move some of your read load to slaves.
You could also move to Aurora, which should give you better I/O performance. Or use PIOPS (or allocate more disk, which should increase performance). You are using SSD storage, right?
One other suggestion: if your calculations (step 4 above) take a significant amount of time, you might want to look at breaking the work into two or more transactions.
A query_cache_size of more than 50M is bad news. You are writing often -- many times per second per table? That means the QC needs to be scanned many times per second to purge the entries for the table that changed. This is a big load on the system when the QC is 2.5GB!
query_cache_type should be DEMAND if you can justify having it on at all. And in that case, pepper the SELECTs with SQL_CACHE and SQL_NO_CACHE.
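For example (table names are illustrative; DEMAND can also be set in my.cnf):

SET GLOBAL query_cache_type = DEMAND;  -- cache only queries that ask for it
SELECT SQL_CACHE name FROM ref_countries WHERE id = 1;  -- rarely-changing data: cache it
SELECT SQL_NO_CACHE * FROM orders WHERE id = 1;         -- hot, write-heavy table: don't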
Since you have the slowlog turned on, look at the output with pt-query-digest. What are the first couple of queries?
Since your typical operation involves writing, I don't see an advantage in using read-only Slaves.
Are the bots running at random times? Or do they all start at the same time? (The latter could cause terrible spikes in CPU, etc.)
How are you "archiving" "old" records? It might be best to use PARTITIONing and "transportable tablespaces". Use PARTITION BY RANGE and 21 partitions (plus a couple of extras).
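A sketch of that idea, assuming a hypothetical events table with a created_at column (note that MySQL requires the partitioning column to be part of every unique key):

ALTER TABLE events
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p20230101 VALUES LESS THAN (TO_DAYS('2023-01-02')),
  PARTITION p20230102 VALUES LESS THAN (TO_DAYS('2023-01-03')),
  -- ... one partition per day: 21, plus a couple of spares ...
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
-- "Archiving" the oldest day then becomes a nearly instant metadata operation:
ALTER TABLE events DROP PARTITION p20230101;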
Your typical transaction seems to work with one row. Can it be modified to work with 10 or 100 all at once? (More than 100 is probably not cost-effective.) SQL is much more efficient in doing lots of rows at once versus lots of queries of one row each. Show us the SQL; we can dig into the details.
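For instance, a multi-row INSERT (names hypothetical) does in one statement, one parse, and one commit what would otherwise take many round trips:

INSERT INTO events (user_id, amount) VALUES
  (1, 10.00),
  (2, 12.50),
  (3,  7.25);  -- batch up to ~100 rows per statement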
It seems strange to insert a new row and then update it, all in one transaction. Can't you compute everything before doing the insert? Hanging onto the inserted id for so long probably interferes with others doing the same thing. What is the value of innodb_autoinc_lock_mode?
Do the "users" interact with each other? If so, in what way?

MySQL Inconsistent Performance

I'm running a MySQL query that joins various tables of 500,000+ rows. Sometimes it takes a second, other times around 15 seconds! This is on my local machine. I have experienced similarly varied times before on other intensive queries. Does anyone know why this is?
Thanks
Thanks for the replies. I am using appropriate indexes, inner and left joins, and a WHERE clause covering a one-week range out of a possible 2-year period of invoices. If I keep varying the range (so presumably the query results are not cached) and re-running, the time varies a lot, even when the number of rows retrieved is similar. The server is not busy; a few scheduled queries run every minute but are not intensive, taking around 200ms.
The explain plan shows that a table of around 2000 rows is always fully scanned. So maybe these rows are sometimes cached, or maybe the indexes are cached (I didn't know indexes could be cached). I will try again with caching turned off.
Editing again: the query cache is in fact off. I'm using InnoDB, so it looks like increasing innodb_buffer_pool_size is the way to go.
Same query each time?
It's hard to tell, based on what you've posted. If we assume that the schema and data aren't changing, I'd guess that there's something else running on your machine when the queries are long that would explain the difference. It could be that the state of memory is different, so paging is going on; an anti-virus program is running; some other service has started. It's impossible to answer.
Try to do an OPTIMIZE TABLE; that should help refresh some data useful for the query planner.
You have not given us much information. If you're using MyISAM tables, it may be a matter of locks.
Are you using ANSI INNER JOINs? It's a little basic, but avoid the comma-style "cross joins", i.e. the joins written with the comma, like
SELECT * FROM t1, t2 WHERE t1.id_t1=t2.id_t1
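The explicit ANSI-join equivalent would be:

SELECT * FROM t1 INNER JOIN t2 ON t1.id_t1 = t2.id_t1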
Last things you may want to try: increase your buffers (InnoDB), your key_buffers (MyISAM), and some query cache buffers.
Here are some common reasons (besides your server simply being too busy):
The slow query is hitting the hard drive. In the fast case, the indexes and data are already cached in MySQL or the OS file cache.
Retrieving the data gets blocked by updates/inserts; for MyISAM tables, the whole table gets locked in some cases whenever someone inserts or updates data in it.
Table statistics are out of date and/or the wrong index gets selected. Running ANALYZE or OPTIMIZE on the table can help.
You have the query cache enabled. Fetching the result of a cached query is fast, but producing it when it is not in the cache might be slow. Try turning off the query cache to check whether the query is always slow when its result is not fetched from the cache.
In any case, you should show the output of EXPLAIN on your queries to verify that indexes are being used properly. Even when they're not, queries can be fast if everything is in RAM, but they grind to a halt once they need to hit the hard drive.
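For example (the query shape is illustrative):

EXPLAIN
SELECT i.*
FROM   invoices i
JOIN   customers c ON c.id = i.customer_id
WHERE  i.created_at >= '2023-01-01'
  AND  i.created_at <  '2023-01-08';
-- In the output, check "key" (the index actually used) and "rows" (estimated rows examined)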

Will a MySQL table with 20,000,000 records be fast with concurrent access?

I ran a lookup test against an indexed MySQL table containing 20,000,000 records, and according to my results, it takes 0.004 seconds to retrieve a record given its id, even when joining against another table containing 4,000 records. This was on a 3GHz dual-core machine, with only one user (me) accessing the database. Writes were also fast, as this table took under ten minutes to create all 20,000,000 records.
Assuming my test was accurate, can I expect performance to be as snappy on a production server, with, say, 200 users concurrently reading from and writing to this table?
I assume InnoDB would be best?
That depends on the storage engine you're going to use and on the read/write ratio.
InnoDB will be better if there are lots of writes. If it's mostly reads with very occasional writes, MyISAM might be faster. MyISAM uses table-level locking, so it locks up the whole table whenever you need to update; InnoDB uses row-level locking, so you can have concurrent updates on different rows.
InnoDB is definitely safer, so I'd stick with it anyhow.
BTW, remember that RAM is very cheap right now, so buy a lot.
It depends on any number of factors:
Server hardware (Especially RAM)
Server configuration
Data size
Number of indexes and index size
Storage engine
Write/read ratio
I wouldn't expect it to scale that well. More importantly, this kind of thing is too important to speculate about. Benchmark it and see for yourself.
Regarding the storage engine, I wouldn't dare use anything but InnoDB for a table of that size that is both read from and written to. With MyISAM, any write query that isn't a primitive insert or single-row update ends up locking the whole table, which yields terrible performance.
There's no reason MySQL couldn't handle that kind of load without any significant issues. There are a number of other variables involved, though (otherwise it's a "how long is a piece of string" question). Personally, I've had a number of tables in various databases that are well beyond that range.
How large is each record (on average)?
How much RAM does the database server have, and how much of it is allocated to the various configuration areas of MySQL/InnoDB?
A default configuration may only allow for a default 8MB buffer between disk and client (which might work fine for a single user), but trying to fit a 6GB+ database through that is doomed to failure. That problem was real, by the way, and was causing several crashes a day of a database/website until I was brought in to troubleshoot it.
If you are likely to do a great deal more with that database, I'd recommend getting someone with a little more experience, or at least doing what you can to give it some optimisations. Reading "High Performance MySQL, 2nd Edition" is a good start, as is looking at some tools like Maatkit.
As long as your schema design and DAL are constructed well enough, you understand query optimization inside out, can adjust all the server configuration settings at a professional level, and have "enough" hardware properly configured, yes (except for sufficiently pathological cases).
Same answer for both engines.
You should probably perform a load test to verify, but as long as the indexes were created properly (meaning they are optimized for your query statements), the SELECT queries should perform at an acceptable speed. The INSERTs and/or UPDATEs may be more of a speed issue though, depending on how many indexes you have and how large they get.