AWS Aurora cache metrics meanings

Aurora has two cache-related metrics:
Buffer cache hit ratio: The percentage of requests that are served by the Buffer cache.
Resultset cache hit ratio: The percentage of requests that are served by the Resultset cache.
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.Monitoring.html
But I can't find the documentation that explains the difference between "Buffer cache" and "Resultset cache".
What are they?

"Resultset Cache Hit Ratio" is related to the query cache, a feature that caches the results of read queries (hence "result set cache hit"). When the engine is about to execute a new read query, it first checks the cached results; if the same query has been executed before and its result has not been invalidated yet, the engine serves the new query from the cache. This is generally useful, and the ratio runs high, when the workload contains many repeated SELECT queries with the same values and conditions.
On the other hand, "Buffer Cache Hit Ratio" relates to the InnoDB page cache (the buffer pool), not the query result cache. It should rise as read traffic of any kind increases, because reads warm up the buffer pool: the engine loads the needed pages from storage into memory for faster access. However, a heavy write load on the writer makes the readers invalidate their in-memory pages and load them again from storage when needed. The ratio is the percentage of page requests that hit in-memory pages, and it should be very high, e.g. more than 99%.
The query cache generally pays off with a low number of connections running the same kinds of queries over and over. Based on a few observations on MySQL/Aurora, it may actually hurt performance if you have a high number of connections and lots of ad hoc, constantly changing queries.
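As a rough illustration of the two ratios described above, the sketch below computes them from stock MySQL status counters. The counter names (Innodb_buffer_pool_read_requests, Innodb_buffer_pool_reads, Qcache_hits, Com_select) are standard MySQL status variables, but whether Aurora derives its metrics from exactly these counters is an assumption, and the sample numbers are made up.

```python
def buffer_pool_hit_ratio(read_requests: int, disk_reads: int) -> float:
    """Percentage of logical page reads served from memory.

    read_requests ~ Innodb_buffer_pool_read_requests (logical reads)
    disk_reads    ~ Innodb_buffer_pool_reads (reads that had to go to storage)
    """
    if read_requests == 0:
        return 0.0
    return 100.0 * (read_requests - disk_reads) / read_requests


def query_cache_hit_ratio(qcache_hits: int, com_select: int) -> float:
    """Percentage of SELECTs answered from the query (result set) cache.

    Com_select only counts SELECTs that were actually executed, so
    hits + executed selects approximates the total number of SELECTs.
    """
    total = qcache_hits + com_select
    if total == 0:
        return 0.0
    return 100.0 * qcache_hits / total


if __name__ == "__main__":
    # Made-up sample counters, e.g. copied by hand from SHOW GLOBAL STATUS.
    print(f"buffer pool: {buffer_pool_hit_ratio(1_000_000, 5_000):.2f}%")
    print(f"query cache: {query_cache_hit_ratio(300, 700):.2f}%")
```

Both functions work on cumulative counters, so in practice you would probably diff two samples taken some time apart rather than use totals since server start.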

There's not a ton of info from Amazon other than what I found here: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.Monitoring.html
Buffer cache hit ratio: The percentage of requests that are served by the Buffer cache.
Resultset cache hit ratio: The percentage of requests that are served by the Resultset cache.

Related

Does couchbase actually support datasets larger than memory?

Couchbase documentation says that "Disk persistence enables you to perform backup and restore operations, and enables you to grow your datasets larger than the built-in caching layer," but I can't seem to get it to work.
I am testing Couchbase 2.5.1 on a three node cluster, with a total of 56.4GB memory configured for the bucket. After ~124,000,000 100-byte objects -- about 12GB of raw data -- it stops accepting additional puts. 1 replica is configured.
Is there a magic "go ahead and spill to disk" switch that I'm missing? There are no suspicious entries in the errors log.

It does support data greater than memory - see Ejection and working set management in the manual.
In your instance, what errors are you getting from your application? When you start to reach the low memory watermark, items need to be ejected from memory to make room for newer items.
Depending on the disk speed / rate of incoming items, this can result in TEMP_OOM errors being sent back to the client, telling it to back off temporarily before performing the set, but these should generally be rare in most instances. Details on handling these can be found in the Developer Guide.

My guess would be that it's not the raw data that is filling up your memory, but the metadata associated with it. Couchbase 2.5 needs 56 bytes per key, so in your case that would be approximately 7GB of metadata, so much less than your memory quota.
But... metadata can be fragmented in memory. If you batch-inserted all the 124M objects in a very short time, I would assume that you got at least 90% fragmentation. That means that although there is only 7GB of useful metadata, the space allocated to hold it has filled up your RAM, with lots of unused parts in each allocated block.
The solution to your problem is to defragment the data. It can either be achieved manually or triggered as needed :
manually :
automatically :
If you need more insights about why compaction is needed, you can read this blog article from Couchbase.

Even if none of your documents are stored in RAM, Couchbase still stores all the document IDs and metadata in memory (this will change in version 3), and also needs some available memory to run efficiently. The relevant section in the docs:
http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#memory-quota
Note that when you use a replica you need twice as much RAM. The formula is roughly:
(56 + avg_size_of_your_doc_ID) * nb_docs * 2 (replica) * (1 + headroom) / (high_water_mark)
So depending on your configuration it's quite possible that 124,000,000 documents require 56 GB of memory.
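To make the formula above concrete, here is a small sketch that evaluates it for a few key sizes. The 56-byte figure comes from the answers above; the average key sizes, the 30% headroom and the 0.85 high water mark are illustrative assumptions, not values taken from the asker's cluster.

```python
METADATA_BYTES_PER_KEY = 56  # per-item overhead quoted above for Couchbase 2.5


def metadata_ram_bytes(num_docs, avg_key_bytes, copies=2,
                       headroom=0.30, high_water_mark=0.85):
    """Estimate the RAM needed just to keep keys and metadata resident.

    copies counts the active data plus replicas (2 = one replica).
    headroom and high_water_mark are illustrative defaults, not measured values.
    """
    per_doc = METADATA_BYTES_PER_KEY + avg_key_bytes
    return per_doc * num_docs * copies * (1 + headroom) / high_water_mark


if __name__ == "__main__":
    docs = 124_000_000
    for key_len in (16, 36, 64):  # hypothetical average key sizes in bytes
        gib = metadata_ram_bytes(docs, key_len) / 2**30
        print(f"avg key {key_len:>2} B -> about {gib:.1f} GiB")
```

The result grows linearly with the average key length, so long document IDs can matter as much as the document count.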

Does Caching always enhance performance?

I have a number of sites with PHP and MySQL, especially running MediaWiki, and I need to enhance the performance. However, I have only a limited percentage of CPU that I'm allowed to use.
The best thing I can think about to improve performance is to enable caching. However, I'm confused: Does that really enhance performance overall or just enhance speed?
What I can think about is, if caching will use files, then it would take more processing to get the content of these files. If it will use SQL tables, then it will take more processing to query these tables as well, perhaps the time will be shorter, but the CPU usage will be more.
Is that correct or not? Does caching consume more CPU to give speedier results, or does it improve performance overall?

At the most basic level, caching should be used to store the result of CPU-intensive processes. For example, if you have a server-side image handler that creates an image on the fly (say a thumbnail and a larger preview), then you don't want this operation to occur on every request. You'd want to run this process once and store the results; then every other request gets the saved result.
This is obviously a hugely oversimplified description of basic caching, and the image example works because you don't have to worry about stale data: how often will the actual image change? In your case, databases are hugely different. If you cache data, how can you guarantee that there won't be an instant mismatch between your real data and your cached data? Querying a database is not always a CPU-intensive task either (granted, you have to consider how the database is designed in terms of indexing, table size etc.); in most cases, querying a well-designed database is far more intensive on disk I/O than on CPU cycles.
First, you need to look at your database design, and second at your queries. For example, are you normalizing your database correctly? Are your queries trawling through huge amounts of data when you could just archive? Are you joining tables on non-indexed fields? Are your WHERE clauses querying fields that could be indexed (IN is particularly bad in these cases)?
I recommend you get hold of a query analyzer and spend some time optimizing your table structure and queries to find that bottleneck before looking into more drastic changes.

Reference : http://msdn.microsoft.com/en-us/library/ee817646.aspx
Performance : Caching techniques are commonly used to improve application performance by storing relevant data as close as possible to the data consumer, thus avoiding repetitive data creation, processing, and transportation.
For example, storing data that does not change, such as a list of countries, in a cache can improve performance by minimizing data access operations and eliminating the need to recreate the same data for each request.
Scalability : The same data, business functionality, and user interface fragments are often required by many users and processes in an application. If this information is processed for each request, valuable resources are wasted recreating the same output. Instead, you can store the results in a cache and reuse them for each request. This improves the scalability of your application because as the user base increases, the demand for server resources for these tasks remains constant.
For example, in a Web application the Web server is required to render the user interface for each user request. You can cache the rendered page in the ASP.NET output cache to be used for future requests, freeing resources to be used for other purposes.
Caching data can also help scale the resources of your database server. By storing frequently used data in a cache, fewer database requests are made, meaning that more users can be served.
Availability : Occasionally the services that provide information to your application may be unavailable. By storing that data in another place, your application may be able to survive system failures such as network latency, Web service problems, or hardware failures.
For example, each time a user requests information from your data store, you can return the information and also cache the results, updating the cache on each request. If the data store then becomes unavailable, you can still service requests using the cached data until the data store comes back online.

You need to profile your system and find out where the bottleneck is. The best type of page load is a cached one that doesn't hit the server at all. You can build a very simple caching system that only regenerates the page every 15 minutes: if the page was cached in the last 15 minutes, visitors get the pre-rendered copy. The first time the page loads, it creates a temp file; after 15 minutes, a new one is created (if someone loads that page).
Caching only stores a file that the server has already done the work for. The work to create the file is already done and you're simply storing it.
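The 15-minute scheme described above could look roughly like this. It is a minimal sketch in Python rather than PHP; get_page, the cache path and the render callable are hypothetical names introduced for illustration.

```python
import os
import tempfile
import time

CACHE_TTL_SECONDS = 15 * 60  # rebuild a page at most every 15 minutes


def get_page(cache_path, render, ttl=CACHE_TTL_SECONDS, now=None):
    """Return cached HTML if it is fresh enough, otherwise re-render and store it.

    render is a hypothetical zero-argument callable that does the expensive work.
    now can be passed in to make the freshness check testable.
    """
    now = time.time() if now is None else now
    try:
        if now - os.path.getmtime(cache_path) < ttl:
            with open(cache_path, encoding="utf-8") as fh:
                return fh.read()  # cache hit: no rendering work at all
    except FileNotFoundError:
        pass  # never cached yet

    html = render()
    # Write to a temp file first, then swap it in, so readers never see
    # a half-written page.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(html)
    os.replace(tmp_path, cache_path)
    return html
```

A page is only regenerated when someone requests it after the cached copy has expired, which matches the "if someone loads that page" behaviour described above.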

You use the terms 'performance' and 'speed'. I'll assume 'performance' relates to CPU cycles on your web server and that 'speed' relates to the time it takes to serve the page to the user. You want to maximize web server 'performance' ( by lowering the total number of CPU cycles needed to serve pages ) whilst maximizing 'speed' ( lowering the time it takes to serve a web page ).
The good news for you is that caching can improve both of these metrics at the same time. By caching content you create an output page that is stored in the cache and can be served repeatedly to users directly, without having to re-execute the PHP code that originally created this output page ( thus lowering CPU cycles ). Fetching a cached page from cache consumes fewer CPU cycles than re-executing PHP code.
Caching is particularly good for web pages that are generally the same for all users who request the page - for example in a wiki, and for pages that generally do not change all too often - again, a wiki.

"Enhance performance" sounds like some of the email I get...
There are two, interrelated things that happen here. One is "how long does it take to serve a given request?", and the other is "how many requests can I serve concurrently given my limited resources?". People tend to use either or both of those concepts when talking about performance.
Caching can help with both those things.
The most effective caching strategy uses resources outside your machines to cache your stuff - the most obvious examples are the user's browser, or a CDN. I'll assume you can't use a CDN, but by spending a bit of effort on setting the HTTP cache headers, you can reduce the number of requests to your server for static or sluggish resources quite dramatically.
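As a sketch of what "setting the HTTP cache headers" can mean, the helper below builds a Cache-Control and Expires pair. The function name and the one-hour lifetime in the usage note are assumptions for illustration; how you attach the headers to a response depends on your server or framework.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, public=True, now=None):
    """Build response headers that let browsers and proxies reuse a resource."""
    now = time.time() if now is None else now
    scope = "public" if public else "private"
    return {
        "Cache-Control": f"{scope}, max-age={max_age_seconds}",
        # Expires duplicates max-age for older clients that ignore Cache-Control.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

For example, cache_headers(3600) marks a resource as reusable for one hour, so repeat visitors would not request it from your server again during that time.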
For dynamic content - usually the web page you generate by querying your database - the next most effective caching strategy is to cache the HTML generated by (parts of) your page. For instance, if you have a "most popular items" box on your homepage, this will usually run a couple of moderately complex database queries, and then some "turn data to HTML" back-end code. If you can cache the HTML, you save both the database queries and the CPU effort of turning the data into HTML.
If that's not possible, you may be able to cache the result of some database queries. That helps in reducing the database load, and usually also reduces the load on your web server: the code required to run the database query and deal with the results is usually more onerous than retrieving the item from cache. Because it's faster, it allows your request to be handled quicker, which frees up resources more quickly. This reduces the load on your servers for an individual request, and thus allows you to serve more concurrent requests.

MYSQL concatenating large string

I have a web crawler that saves information to a database as it crawls the web. While it does this, it also saves a log of its actions, and any errors it encounters, to a log field in a MySQL database (the field grows to anywhere from 64kb to 100kb). It accomplishes this by concatenating (using the MySQL CONCAT function).
This seems to work fine, but I am concerned about the CPU usage / impact it has on the MySQL database. I've noticed that the web crawling is performing slower than before I implemented saving the log to the database.
I view this log file from a management webpage, and the current implementation seems to work fine other than the slow loading. Any recommendations for speeding this up, or implementation recommendations?

Reading 100kb strings into memory numerous times and then writing them to disk via a db? Of course you're going to experience slowdown! Every part of what you are doing is going to tax memory, disk, and CPU (especially if memory usage hits the system max and you start swapping to disk). Let me count some of the ways you're possibly going to decrease overall site performance:
1. SQL connections max out and back up, because the time to store 100kb records increases the time a single process holds a connection.
2. Web server processes use up the free process pool, max out, and take longer to free up because they have to wait on db connections to free.
3. Web server processes begin to bloat and take more memory each, possibly more than the system can handle without swapping. This is compounded by running the maximum number of processes due to #2.
... A book could be written on your situation.

Prevent filesystem caching for MySQL queries

When I disable the query cache in MySQL, queries are still cached. As I understand it, this is because of the OS filesystem cache. How can I prevent the filesystem from caching this data? I am working on Windows 7, but it might be Linux.

There is no query filesystem cache in MySQL.
When i disable query cache in mysql, queries still cached
How do you disable it, and how do you know queries are still cached? Why don't you want them to be cached?

SET SESSION query_cache_type = OFF;

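Well, now I can answer my question myself. To prevent the second and subsequent runs of a query from being served from memory, set the innodb_buffer_pool_size=0 config option. MySQL uses this buffer to hold data in memory, so subsequent queries operate on memory instead of the hard disk. Note that MySQL enforces a minimum size for this buffer, so in practice this shrinks the pool rather than removing it.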

You need a buffer pool a bit (say 10%) larger than your data (the total size of the InnoDB tablespaces), because it does not only contain data pages: it also contains adaptive hash indexes, the insert buffer, and locks, which also take some space. This is not critical, though: for most workloads, if your InnoDB buffer pool is 10% smaller than your database size, you would not lose much anyway.
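The 10% rule of thumb above is simple arithmetic; a small sketch follows. The function name and the 20 GiB example are assumptions for illustration only.

```python
def suggested_buffer_pool_bytes(tablespace_bytes, overhead_percent=10):
    """Data size plus a margin for hash indexes, the insert buffer and locks."""
    # Integer arithmetic keeps the result exact for large byte counts.
    return tablespace_bytes * (100 + overhead_percent) // 100


if __name__ == "__main__":
    data = 20 * 2**30  # hypothetical 20 GiB of InnoDB tablespaces
    print(suggested_buffer_pool_bytes(data) / 2**30, "GiB")
```

The result is only a starting point; it says nothing about whether the machine has that much RAM to spare.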

Does MySQL packet size cause slowdown?

I have written a program which uses a MySQL database, and the traffic between the database server (a very powerful one) and the client goes over an ADSL connection (1 Mbit/s).
But I have a very, very slow connection between each client and the server. Only approximately 3-4 KB/s of data is sent through the server. Neither the server nor the clients use the Internet for other purposes; only my program uses it.
I can't figure out why. Is the MySQL server packet size the reason?
Any suggestions?

Try using mytop to identify the cause of the server's low performance.
Another one: you may be using SELECT COUNT(*) FROM .. for large InnoDB tables which causes a table scan.
And can you test with some other service whether the data rate between the machines is OK? Even if the output bandwidth is lower for ADSL users, 3-4 kB/s might not be the reason for the low performance.

The effective transfer rate is often heavily limited by the number of roundtrips between client and server. Without seeing your code it is sort of difficult to tell, but you should check the number of requests happening.
If you have a single request that results in many records being returned, you should see a better usage of bandwidth than with a higher number of requests which only deliver a few rows each.
In the latter case the actual result transfer is probably quite fast, but the latencies involved in the "control communications" (i. e. the statements themselves, login requests etc.) will add up, effectively lowering overall throughput.
As for the packet size: When it is very small, there is more overhead in the communications, increasing the aforementioned effect. The server's default max_allowed_packet size is 1MB if memory serves, but that should be fine with your connection.
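The round-trip effect described above can be approximated with a crude model in which every request costs one round trip before its payload streams at full line speed. The payload size and the 80 ms round-trip time are made-up assumptions; only the 1 Mbit/s line speed comes from the question.

```python
def transfer_seconds(total_bytes, round_trips, rtt_seconds, bandwidth_bytes_per_s):
    """Crude model: every request waits one round trip, then the payload streams."""
    return round_trips * rtt_seconds + total_bytes / bandwidth_bytes_per_s


def effective_rate(total_bytes, round_trips, rtt_seconds, bandwidth_bytes_per_s):
    """Observed throughput in bytes per second under the model above."""
    return total_bytes / transfer_seconds(
        total_bytes, round_trips, rtt_seconds, bandwidth_bytes_per_s)


if __name__ == "__main__":
    payload = 500_000  # 500 kB of result rows in total (made-up)
    link = 125_000     # 1 Mbit/s expressed in bytes per second
    rtt = 0.08         # assumed 80 ms round trip
    for queries in (1, 100, 1000):
        rate = effective_rate(payload, queries, rtt, link)
        print(f"{queries:>4} queries -> {rate / 1000:.1f} kB/s")
```

Under these assumptions, the same data fetched through a thousand small queries ends up in the single-digit kB/s range, even though the line itself is far from saturated.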

You first have to debug both connections.
What is your upload speed if you upload a file with WinSCP or equivalent to the MySQL server? It should be near 90 KB/s with ADSL 1 Mbit/s.
