I have MySQL Proxy running; it takes a query, computes an MD5 hash of it, and caches the result in memcached. The problem occurs when an update happens in the Rails app that would invalidate that cache. Any ideas on how to invalidate all of the relevant keys in the cache at that time?
The core of the problem is that you don't know what the key is, since it is MD5-generated.
However, you can mitigate the problem by not storing data for that query.
Your query may look like this: "SELECT my_data.* FROM my_data WHERE conditions"
However, you can reduce the redundancy of data by using this query instead:
SELECT my_data.id FROM my_data WHERE conditions
Which is then followed up by
Memcache.mget( ids )
This won't prevent the return of rows that no longer match the conditions, but it may reduce how much stale data is returned.
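A rough sketch of that two-step pattern, in Python rather than Rails, with plain dicts standing in for memcached and the my_data table. The names select_ids, load_rows and update_row, and the "my_data:<id>" key format, are made up for illustration:

```python
# Hypothetical in-memory stand-ins for memcached and the my_data table.
cache = {}
table = {
    1: {"id": 1, "status": "open", "title": "first"},
    2: {"id": 2, "status": "closed", "title": "second"},
    3: {"id": 3, "status": "open", "title": "third"},
}

def select_ids(status):
    """Stands in for: SELECT my_data.id FROM my_data WHERE status = ?"""
    return [row_id for row_id, row in sorted(table.items()) if row["status"] == status]

def load_rows(ids):
    """Multi-get by id; rows missing from the cache are read from the table and cached."""
    rows = []
    for row_id in ids:
        key = f"my_data:{row_id}"
        if key not in cache:
            cache[key] = dict(table[row_id])
        rows.append(cache[key])
    return rows

def update_row(row_id, **changes):
    """Write to the table and drop the one per-row key, which is known from the id."""
    table[row_id].update(changes)
    cache.pop(f"my_data:{row_id}", None)

open_rows = load_rows(select_ids("open"))
update_row(3, title="third (edited)")
fresh_rows = load_rows(select_ids("open"))
```

The point is that the per-row key can be rebuilt from the id at update time, which is exactly what an MD5 of the query text does not allow.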
--
Another option is to look into using namespaces. See here:
http://code.google.com/p/memcached/wiki/NewProgrammingTricks#Namespacing
You can namespace all of your major queries. You won't be able to delete the keys, but you can change the key version id, which will in effect expire your data.
Logistically messy, but you could use it on a few bad queries.
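Here is a minimal sketch of the version-id trick, assuming a dict in place of a memcached client. The helper names and the "ns:<name>" key layout are my own invention, and a real client would typically bump the version with memcached's atomic incr command rather than a read-then-write:

```python
import hashlib

cache = {}  # hypothetical stand-in for a memcached client

def namespace_version(namespace):
    """Return the current version for a namespace, creating it on first use."""
    return cache.setdefault(f"ns:{namespace}", 1)

def query_key(namespace, sql):
    """Build a key from the namespace, its current version, and the query's MD5."""
    digest = hashlib.md5(sql.encode("utf-8")).hexdigest()
    return f"{namespace}:v{namespace_version(namespace)}:{digest}"

def invalidate(namespace):
    """Bump the version; old keys are never read again and simply age out."""
    cache[f"ns:{namespace}"] = namespace_version(namespace) + 1

sql = "SELECT my_data.* FROM my_data WHERE conditions"
key_before = query_key("my_data", sql)
cache[key_before] = ["cached rows"]
invalidate("my_data")
key_after = query_key("my_data", sql)
```

After invalidate("my_data"), the same SQL maps to a new key, so the next read misses and repopulates the cache.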
--
Lastly, you could store those queries in a different memcache server and flush it on a more frequent basis.
I've been trying to optimize the performance of a behemoth piece of software based on PHP and MySQL. I have gone through caching in Apache and indexes in MySQL, but it is not enough.
Since all forms within this software are built and printed dynamically from configuration in the database, the software sends a huge number of SQL statements and does a lot of joins, which slows the whole thing down when there are many concurrent users connected (on average 200-300).
Since we cannot touch the code, I have seen that mysql-proxy can be placed between the application server and the database server, and there query results can be cached by accessing Redis or memcached via Lua. My idea is to cache everything. However, the problem is invalidating the cache. Once a record is updated, how do I invalidate all cached result sets?
One idea was to convert the SQL query into an MD5 hash and store the result set under that key, and also to analyze the query and store the same MD5 key against each table it references. For example:
Query:
select * from products left join users on products.user_id = users.id
Cache Instance A
3b98ab273f45af78849db563df6598d1 - {result set}
Cache Instance B
products - 3b98ab273f45af78849db563df6598d1
users - 3b98ab273f45af78849db563df6598d1
So once an UPDATE, INSERT or DELETE is issued on one of these tables, it invalidates all result sets where that particular table was queried.
I see quite a lot of work with it and I was wondering if there are any simpler methods to achieve this.
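For what it's worth, the two-instance scheme described above can be sketched roughly like this. It is in Python rather than Lua, uses dicts in place of the two cache instances, and extracts table names with a naive regex; all the names here are hypothetical, and a real proxy would need a proper SQL parser:

```python
import hashlib
import re
from collections import defaultdict

results = {}                      # "Cache Instance A": md5 -> result set
keys_by_table = defaultdict(set)  # "Cache Instance B": table -> set of md5 keys

# Naive table extraction; it ignores subqueries, aliases, comments, and so on.
TABLE_PATTERN = re.compile(r"\b(?:from|join|update|into)\s+`?(\w+)`?", re.IGNORECASE)

def tables_in(sql):
    return {name.lower() for name in TABLE_PATTERN.findall(sql)}

def cache_result(sql, rows):
    """Store the result under the query's MD5 and index that key by table."""
    key = hashlib.md5(sql.encode("utf-8")).hexdigest()
    results[key] = rows
    for table in tables_in(sql):
        keys_by_table[table].add(key)
    return key

def invalidate_for(write_sql):
    """Drop every cached result that read from a table this write touches."""
    for table in tables_in(write_sql):
        for key in keys_by_table.pop(table, set()):
            results.pop(key, None)

select_sql = "select * from products left join users on products.user_id = users.id"
key = cache_result(select_sql, [("widget", "alice")])
cached_before = key in results
invalidate_for("UPDATE users SET name = 'bob' WHERE id = 1")
cached_after = key in results
```

Note that this invalidates per table, not per row, so a busy table will keep flushing every result set that touches it.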
I am trying to set up MySQL 5.6 with the memcached plugin enabled. I followed the procedure on the MySQL website and a couple of other tutorials that I found online. According to one of those tutorials, this should be really simple to set up and test.
I am trying to verify that the setup works as expected using telnet. When I set the value of a key from telnet, I get the return status of STORED. I can even fetch the value immediately from memcache. However, when I login into the DB, I do not see the new row. I don't see any errors in the logs either. "show plugins" shows that the daemon_memcached plugin is enabled.
[Edited]
Actually, things don't even work the other way. I added a new row to the demo_test table and tried fetching it through the memcache interface. That didn't work either.
Any pointers about how to go about identifying what's wrong?
The memcache integration in MySQL communicates directly with the InnoDB storage engine, not the higher MySQL "server layer." As such, changes to table data through this interface do not invalidate queries against the table that have been stored in the query cache. This is in contrast to normal operations through the SQL interface, where any change to a table's data will immediately evict any and all results held from the query cache for queries against that table, without regard to whether or not the change to the table data actually invalidated each specific query impacted.
Repeat your query, but instead of SELECT, use SELECT SQL_NO_CACHE. If you get the result you expect, this is the explanation.
Once you have established that this is the cause, you will find that any SQL query that does an insert, delete, or update against the table will also have the effect of making memcache-changed data visible to SELECT queries, without the need for adding the SQL_NO_CACHE directive, and this will hold true even when the insert, delete, or update does not directly impact the rows in question, so long as it modifies something in the table in question.
Duh!! There was already a memcached instance running on port 11211. Unfortunately, mysql doesn't error out in this situation. When I was using telnet to connect to port 11211, I was reaching the existing memcached instance. It was storing/retrieving values that it had seen but wasn't communicating with MySQL.
I stopped the existing memcached instance and restarted mysql. I am now able to connect to port 11211. Using telnet, when I do a "get", I get back values from the db. Also, when I set new values from telnet, they get reflected in the DB (and can be retrieved using SQL).
I started benchmarking with Zend_Db_Profiler by saving queries that take too long. For one user, this query:
SELECT chapter, `order`, topic, id, name
FROM topics
WHERE id = '1'
AND hidden = 'no'
took 2.97 seconds. I performed an Explain:
select_type  table   possible_keys  key  key_len  ref    rows  Extra
SIMPLE       topics  id             id   4        const  42    Using where
and ran the query myself from phpMyAdmin, and it only took 0.0108 seconds. I thought that perhaps the size of the table might have an effect, as there is one column which is varchar and 8000 characters long, but it's not a part of the Select. I also just switched over to semi-dedicated hosting but can't imagine that this would have had a negative effect. Any thoughts as to how I could troubleshoot would be appreciated.
No. PHP and MySQL are server-side technologies, meaning your server processes them and the client has no bearing on how long that takes. If your server is slow, it will just be slower in returning the response to the client.
Sadly, your premise about the bottleneck here is not right. Also, when testing how one query behaves within your browser and then within phpMyAdmin (or any other GUI), you have to clear the query cache before trying to do the same query again. You didn't mention whether you did that.
The second part of tracking down what might be wrong includes confirming that your database's configuration variables have been optimally set, that you chose the proper storage engine, and that your indexing strategy is optimal (such as choosing an INT for the primary key instead of a VARCHAR, and avoiding similar atrocities).
That means that in most cases you'd go with the InnoDB storage engine. It's free, and it's quick if optimized (the server variable innodb_buffer_pool_size does wonders when set to a proper size and when you have sufficient RAM). Your use of semi-dedicated hosting implies that you don't have control over those configuration variables.
Only once you're sure that
1) you're not testing the same query off of the cache, and
2) you've done everything within your power to make it optimal (this includes making sure that you don't have rogue processes hogging your server)
can you assume there might be an error in communication between the server and client.
As both PHP and SQL run on the server side, the user's internet connection does not affect the speed of the query.
Maybe the database server was too heavily loaded at the time and couldn't process the query in time.
I have a huge person database and commonly search it by name.
SELECT * FROM tbl_person WHERE full_name LIKE 'Sparow%Jack%';
SELECT * FROM tbl_person WHERE full_name LIKE 'Sparow%';
I rarely insert new data in this table.
I want to store the results of common last_name queries on hard disk. The queries are already cached in RAM, but I lose them all each time the server reboots.
I have 1.7 billion rows in my table and each row (with index) takes 1 KB, so yes, it's a 1.7 TB database.
That's the main reason why I want to store common SELECTs on disk.
Variable_name,Value
query_alloc_block_size,8192
query_cache_limit,1048576
query_cache_min_res_unit,1024
query_cache_size,4294966272
query_cache_type,ON
query_cache_wlock_invalidate,OFF
query_prealloc_size,8192
Edit :
SELECT * FROM tbl_person WHERE full_name LIKE 'Savard%';
takes 1000 seconds to execute the first time and 2 seconds after that.
If I reboot the system and execute it again, the query takes 1000 seconds again.
I simply want to avoid MySQL taking another 1000 seconds to run the same query I already ran before the reboot.
Why not consider something like Redis for caching?
It's an in memory data store and it's very popular right now. Sites using Redis:
http://blog.togo.io/redisphere/redis-roundup-what-companies-use-redis
Redis also can persist data to disk: http://redis.io/topics/persistence
For caching though, saving to disk shouldn't be absolutely critical. The idea is that if some data is not cached, the worst case is not always loading from disk manually, but going straight through to your database.
If you are performing many such queries on your data, I suggest you index your table using Apache Lucene or Sphinx. Databases are fast, but they are not so efficient (especially MySQL) when performing partial matches on millions of rows.
I already answered a similar question about Zend Framework and Lucene, and I favor Zend's solution as I believe it is the easiest to set up and use in a PHP environment.
Luckily, Zend Framework can be used module by module, so you can easily use the Zend Search Lucene module by itself without the entire class library.
** Edit **
The role of an indexer is not to replace your DB, but to improve its search functionality by providing a way to perform partial searches. For example, given your table, you may only index a few of your fields (make them "queryable") and have other static (non-indexed) fields to reference your rows in your database.
The advantage in using an indexer is that you can also index pre-computations and directly search them, instead of querying the database.
I want to cache data in MySQL:
SET GLOBAL query_cache_size = SOME_SIZE;
Is that all that is required for caching data [efficiently] in MySQL?
Do I need to add anything extra to use the cache efficiently?
I don't have much knowledge of data caching but still need to use it to address performance issues, so if I've left out some vital info, please answer this question assuming the system is in its default state.
I don't usually recommend using the MySQL query cache. It sounds great in theory, but unfortunately isn't a great win for caching efficiently, because access to it from queries is governed by a mutex. That means many concurrent queries queue up to get access to the query cache, and this harms more than it helps if you have a lot of concurrent clients.
It even harms INSERT/UPDATE/DELETE, even though these queries don't have result sets, because they purge query results from the query cache if they update the same table(s). And this purging is subject to the same queueing on the mutex.
A better strategy is to use memcached for scalable caching of specific query results, but this requires you to think about what you want to cache and to write application code to access memcached and fall back to MySQL if the data isn't present in the cache. That's more work, but if you do it right it gives better results.
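To make that concrete, here is a minimal sketch of the read-through-with-fallback pattern. It is only an illustration: a dict stands in for memcached, an in-memory SQLite database stands in for MySQL, and the topics table, the key format and the function names are all assumptions of mine:

```python
import sqlite3

cache = {}                        # hypothetical stand-in for a memcached client
db = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
db.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO topics (id, name) VALUES (1, 'caching')")
stats = {"db_reads": 0}

def get_topic_name(topic_id):
    """Try the cache first; on a miss, read from the database and cache the value."""
    key = f"topic:{topic_id}:name"
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    row = db.execute("SELECT name FROM topics WHERE id = ?", (topic_id,)).fetchone()
    name = row[0] if row else None
    if name is not None:
        cache[key] = name
    return name

def rename_topic(topic_id, name):
    """Write to the database, then delete the one cache key this write affects."""
    db.execute("UPDATE topics SET name = ? WHERE id = ?", (name, topic_id))
    cache.pop(f"topic:{topic_id}:name", None)

first = get_topic_name(1)    # miss: goes to the database
second = get_topic_name(1)   # hit: served from the cache
rename_topic(1, "query caching")
third = get_topic_name(1)    # miss again after the targeted invalidation
```

Because the application chooses the keys, a write only evicts the entries it actually affects, unlike the query cache, which drops everything for the table.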
See TANSTAAFL.
There are quite a few settings used for caching different things within MySQL. This is a good guide to optimizing MySQL:
http://www.fromdual.com/mysql-performance-tuning-key
Be careful, the query cache is very specific in what it does:
"The query cache stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again."
http://dev.mysql.com/doc/refman/5.6/en/query-cache.html
Therefore, if anything in the related tables changes, or the query is even reworded, the cache isn't used. So select * from T where id in (1,2) and select * from T where id in (2,1) are different.
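As an illustration of what matching on exact statement text means, the snippet below keys statements by an MD5 of their raw text. This is not how MySQL implements the query cache internally; it is just an analogy (text_key is a made-up helper), and it applies equally to any proxy that caches by hashing the query string:

```python
import hashlib

def text_key(sql):
    """Key derived from the exact statement text, byte for byte."""
    return hashlib.md5(sql.encode("utf-8")).hexdigest()

a = "select * from T where id in (1,2)"
b = "select * from T where id in (2,1)"
c = "SELECT * FROM T WHERE id IN (1,2)"

same_text_hits = text_key(a) == text_key(a)   # identical text: same key
reordered_hits = text_key(a) == text_key(b)   # reordered IN list: different key
recased_hits = text_key(a) == text_key(c)     # different letter case: different key
```

All three statements are logically equivalent, but only byte-identical text would hit the same cache entry.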
SHOW VARIABLES LIKE '%query_cache%';
Will show you the current settings for the cache. But it's not as simple as just turning it on: the queries you run need to have result sets that are cacheable, and it would take more than this comment box to explain that.
If you have a particular query that you think should be cached then post it and we may be able to determine if it is cacheable.