How to efficiently invalidate cache? - mysql

I've been trying to optimize the performance of a behemoth application built on PHP and MySQL. I have already gone through caching in Apache and indexes in MySQL, but it is not enough.
Since all forms within this software are built and printed dynamically from configuration stored in the database, it sends a huge number of SQL queries and performs a lot of joins, which slows the whole thing down when many concurrent users are connected (200-300 on average).
Since we cannot touch the code, I have seen that mysql-proxy can be placed between the application server and the database server, and query results can be cached there by accessing Redis or memcached via Lua. My idea is to cache everything. However, the problem is invalidating the cache. Once a record is updated, how do I invalidate all cached result sets?
One idea was to convert the SQL query into an md5 hash and store the result set under that key. In addition, analyze the query and store the same md5 key together with references to the tables it touches. For example:
Query:
select * from products left join users on products.user_id = users.id
Cache Instance A
3b98ab273f45af78849db563df6598d1 - {result set}
Cache Instance B
products - 3b98ab273f45af78849db563df6598d1
users - 3b98ab273f45af78849db563df6598d1
So once an UPDATE, INSERT or DELETE is issued on one of these tables, it invalidates all result sets in which that particular table was queried.
This looks like quite a lot of work, and I was wondering if there are any simpler methods to achieve this.
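The scheme described above can be sketched in a few lines. This is only an illustration in Python with plain dictionaries standing in for the two cache instances; the function names (`cache_result`, `invalidate_for_write`, and so on) are hypothetical, and the regex-based table extraction is a naive assumption that a real proxy would replace with a proper SQL parser.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two cache instances.
results = {}                        # "Instance A": md5(query) -> result set
tables_to_keys = defaultdict(set)   # "Instance B": table name -> md5 keys

# Naive table extraction (assumption): looks at the word that follows
# FROM / JOIN / UPDATE / INTO. Subqueries, aliases and views are ignored.
TABLE_RE = re.compile(r"\b(?:from|join|update|into)\s+`?(\w+)`?", re.IGNORECASE)


def query_key(sql):
    """Return the md5 hex digest used as the cache key for a query."""
    return hashlib.md5(sql.encode("utf-8")).hexdigest()


def tables_in(sql):
    """Return the lowercase names of the tables the statement appears to touch."""
    return {name.lower() for name in TABLE_RE.findall(sql)}


def cache_result(sql, rows):
    """Store a result set and tag it with every table the query reads."""
    key = query_key(sql)
    results[key] = rows
    for table in tables_in(sql):
        tables_to_keys[table].add(key)


def get_cached(sql):
    """Return the cached result set, or None on a miss."""
    return results.get(query_key(sql))


def invalidate_for_write(sql):
    """Drop every cached result set that read a table this write touches."""
    for table in tables_in(sql):
        for key in tables_to_keys.pop(table, set()):
            results.pop(key, None)
```

A key may remain listed under another table after it has been dropped; that is harmless here because removing an already missing key is a no-op. Note that invalidating per table is coarse: on a write-heavy table almost nothing stays cached for long.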

Related

Does a person's internet connection affect the speed of sql queries or php parsing?

I started benchmarking with Zend_Db_Profiler by saving queries that take too long. For one user, this query:
SELECT chapter, order, topic, id, name
FROM topics
WHERE id = '1'
AND hidden = 'no'
took 2.97 seconds. I performed an Explain:
select_type  table   possible_keys  key  key_len  ref    rows  Extra
SIMPLE       topics  id             id   4        const  42    Using where
and ran the query myself from phpMyAdmin, and it only took 0.0108 seconds. I thought that perhaps the size of the table might have an effect, as there is one column which is varchar and 8000 characters long, but it's not a part of the Select. I also just switched over to semi-dedicated hosting but can't imagine that this would have had a negative effect. Any thoughts as to how I could troubleshoot would be appreciated.
No. PHP and MySQL are server-side technologies, meaning your server processes them and the client has no bearing on how fast they run. If your server is slow, it will just be slower in returning the response to the client.
Sadly, your premise about the bottleneck here is not right. Also, when comparing how a query behaves from your application and from phpMyAdmin (or any other GUI), you have to clear the query cache before running the same query again. You didn't mention whether you did that.
The second part of tracking down what might be wrong is confirming that your database's configuration variables have been set optimally, that you chose the proper storage engine, and that your indexing strategy is sound (for example, using an INT primary key rather than a VARCHAR one, and avoiding similar atrocities).
In most cases that means you'd go with the InnoDB storage engine. It's free, and it's quick if tuned (the server variable innodb_buffer_pool_size does wonders when set to a proper size and when you have sufficient RAM). Since you said you use semi-dedicated hosting, you probably don't have control over those configuration variables.
Only when you're sure that
1) you're not testing the same query off of the cache, and
2) you've done everything within your power to make it optimal (this includes making sure that you don't have rogue processes hogging your server),
can you assume there might be a problem in communication between the server and the client.
As both PHP and SQL run on the server side, the user's internet connection does not affect the speed of the query.
Maybe the database server was too loaded at the time and couldn't process the query in time.

Store common queries on disk with Mysql and windows

I have a huge person database and commonly search it by name.
SELECT * FROM tbl_person WHERE full_name LIKE 'Sparow%Jack%';
SELECT * FROM tbl_person WHERE full_name LIKE 'Sparow%';
I rarely insert new data into this table.
I want to store the results of common last_name queries on the hard disk. These queries are already cached in RAM, but I lose it all each time the server reboots.
I have 1.7 billion rows in my table and each row (with index) takes 1 KB, so yes, it's a 1.7 TB database.
That's the main reason why I want to store common selects on disk.
Variable_name,Value
query_alloc_block_size,8192
query_cache_limit,1048576
query_cache_min_res_unit,1024
query_cache_size,4294966272
query_cache_type,ON
query_cache_wlock_invalidate,OFF
query_prealloc_size,8192
Edit :
SELECT * FROM tbl_person WHERE full_name LIKE 'Savard%';
takes 1000 sec to execute the first time and 2 sec after that.
If I reboot the system and execute it again, the query takes 1000 sec again.
I simply want to avoid MySQL taking another 1000 sec to run the same query I already ran before the reboot.
Why not consider something like Redis for caching?
It's an in memory data store and it's very popular right now. Sites using Redis:
http://blog.togo.io/redisphere/redis-roundup-what-companies-use-redis
Redis also can persist data to disk: http://redis.io/topics/persistence
For caching, though, saving to disk shouldn't be absolutely critical. The idea is that if some data is not cached, the worst case is simply going straight through to your database, not loading it from disk by hand.
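The fall-through behaviour described here is usually called cache-aside. A minimal sketch, assuming a plain dictionary as a stand-in for Redis and a hypothetical `load_from_db` callable in place of the real MySQL query:

```python
class CacheAside:
    """Cache-aside lookup: try the cache first, fall through to the database."""

    def __init__(self, load_from_db):
        self._store = {}                    # stand-in for Redis (assumption)
        self._load_from_db = load_from_db   # hypothetical callable: key -> value
        self.misses = 0

    def get(self, key):
        if key in self._store:
            return self._store[key]
        # Miss: pay the database cost once, then remember the answer.
        self.misses += 1
        value = self._load_from_db(key)
        self._store[key] = value
        return value

    def clear(self):
        """Simulate a restart that loses all cached data."""
        self._store.clear()
```

After `clear()` the first lookup of each key is slow again, which is exactly the reboot situation in the question; persisting the store to disk only shortens that warm-up, it does not change correctness.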
If you are performing many such queries on your data, I suggest you index your table using Apache Lucene or Sphinx. Databases are fast, but they are not so efficient (especially MySQL) at performing partial matches on millions of rows.
I already answered a similar question about Zend Framework and Lucene, and favor Zend's solution as I believe it is the easiest to set up and use in a PHP environment.
Luckily, Zend Framework can be used module by module, so you can use the Zend Search Lucene module by itself without the entire class library.
** Edit **
The role of an indexer is not to replace your DB, but to improve its search functionality by providing a way to perform partial searches. For example, given your table, you may only index a few of your fields (make them "queryable") and have other static (non-indexed) fields to reference your rows in your database.
The advantage in using an indexer is that you can also index pre-computations and directly search them, instead of querying the database.
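To illustrate the split between an index and the database it points into, here is a toy prefix index in Python. It is not Lucene or Sphinx and does not use their APIs; `build_index` and `prefix_search` are hypothetical names, and the index holds only the searchable field plus the row id used to fetch the full row from the database afterwards.

```python
import bisect


def build_index(rows):
    """Build a sorted list of (lowercased name, row id) from (row_id, full_name) pairs."""
    return sorted((name.lower(), row_id) for row_id, name in rows)


def prefix_search(index, prefix):
    """Return the row ids whose indexed name starts with the given prefix."""
    prefix = prefix.lower()
    # Binary search for the first entry that is >= the prefix.
    pos = bisect.bisect_left(index, (prefix,))
    ids = []
    while pos < len(index) and index[pos][0].startswith(prefix):
        ids.append(index[pos][1])
        pos += 1
    return ids
```

The returned ids would then be used for a primary-key lookup in MySQL, so the database only does what it is good at.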

MySQL is so slow on Amazon EC2 m1.large

I'm migrating my .NET/MSSQL application to a RoR/MySQL/EC2/Ubuntu platform. After I transferred all my existing data into MySQL, I found that MySQL's query speed is incredibly slow, even for a super-basic query such as select count(*) from countries. It's just a country table containing around 200 records, but the query takes 0.124ms. That's obviously not normal.
I'm a newbie to MySQL. Can anyone tell me what the possible problem could be? Or is there any initial optimization switch I should turn on after installing MySQL?
A count(*) operation cannot really be optimized, since it either has to do a full table scan (O(n)) or read the cached table count (O(1)), depending on the database engine you are using. Either way, your query should not be that slow. You might want to get in touch with AWS support. It's possible the box is being choked by some other process running on it.

How to cache latest inserted data in MySQL?

Is it possible to cache recently inserted data in MySQL database internally?
I looked at the query cache etc. (http://dev.mysql.com/doc/refman/5.1/en/query-cache.html) but that's not what I am looking for. I know that a 'SELECT' query will be cached.
Details:
I am inserting lots of data into the MySQL DB every second.
I have two kinds of users for this data:
Users who query any random data
Users who query recently inserted data
For the second kind of users, my table has a unix timestamp as its primary key, which tells me how new the data is. Is there any way to cache the data at the time of insert?
One option is to write my own caching module which caches the data and then runs the 'INSERT'.
Users can query this module before going to the MySQL DB.
I was just wondering if something similar is available.
PS: I am open to other database providing similar feature.
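The "own caching module" option mentioned in the question could look roughly like the sketch below. It is only an illustration: `RecentInsertCache` and the `insert_into_db` callable are hypothetical, the buffer lives in process memory, and timestamps are assumed to be unique and to arrive in increasing order (as the primary key in the question suggests).

```python
from collections import deque


class RecentInsertCache:
    """Write-through buffer that keeps the most recently inserted rows in memory."""

    def __init__(self, insert_into_db, max_rows=1000):
        self._insert_into_db = insert_into_db   # hypothetical callable doing the real INSERT
        self._recent = deque(maxlen=max_rows)   # (timestamp, row), oldest first

    def insert(self, timestamp, row):
        # Write to the database first so the cache never holds unsaved rows.
        self._insert_into_db(timestamp, row)
        self._recent.append((timestamp, row))

    def since(self, timestamp):
        """Return rows with ts >= timestamp, or None if the buffer cannot answer fully."""
        if not self._recent or self._recent[0][0] > timestamp:
            return None   # requested range starts before the buffer: ask MySQL instead
        return [row for ts, row in self._recent if ts >= timestamp]
```

Users asking for recent data call `since()` first and fall back to a normal query when it returns `None`; users asking for random data go straight to the database.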
Usually you get the best performance from MySQL if you allow a big index cache (config setting key_buffer_size), at least for MyISAM tables.
If latency is really an issue (as it seems in your case) have a look at Sphinx which has recently introduced real-time indexes.

Expiring memcached using mysql proxy when an update occurs?

I have MySQL Proxy running, which takes a query, performs an md5 on it, and caches the result in a memcached DB. The problem occurs when an update happens in the Rails app that would invalidate that cache. Any ideas on how to invalidate all of the proper keys in the cache at that time?
The core of the problem is that you don't know what the key is, since it is generated with md5.
However, you can mitigate the problem by not storing the data for that query.
Your query may look like this: "SELECT my_data.* FROM my_data WHERE conditions"
However, you can reduce the redundancy of the data by using this query instead:
SELECT my_data.id FROM my_data WHERE conditions
Which is then followed up by
Memcache.mget( ids )
This won't prevent the return of rows that no longer match the conditions, but it may mitigate returning stale data.
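A rough sketch of this two-step lookup, with a dictionary standing in for memcached. The helper names and the `load_rows_from_db` / `write_to_db` callables are hypothetical; the point is that each row is cached under its own id, so an update knows exactly which key to delete.

```python
def fetch_rows(ids, row_cache, load_rows_from_db):
    """Resolve ids to rows: multi-get from the cache, load only the misses."""
    found = {i: row_cache[i] for i in ids if i in row_cache}
    missing = [i for i in ids if i not in found]
    if missing:
        loaded = load_rows_from_db(missing)   # hypothetical: list of ids -> {id: row}
        row_cache.update(loaded)
        found.update(loaded)
    # Preserve the order of the id query; skip ids that no longer exist.
    return [found[i] for i in ids if i in found]


def update_row(row_id, new_row, row_cache, write_to_db):
    """Write the row, then delete its single, known cache key."""
    write_to_db(row_id, new_row)
    row_cache.pop(row_id, None)
```

The id list itself can still be cached under the md5 key; if it is, that list may be briefly out of date, but the row data fetched by id is current.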
--
Another option is to look into using namespaces: See here:
http://code.google.com/p/memcached/wiki/NewProgrammingTricks#Namespacing
You can namespace all of your major queries. You won't be able to delete the keys, but you can change the key version id, which will in effect expire your data.
Logistically messy, but you could use it on a few bad queries.
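The namespacing trick can be sketched as follows, again with a dictionary standing in for memcached and hypothetical helper names. A version counter is stored per namespace and embedded in every key; bumping the counter makes all old keys unreachable, and they are left for memcached to evict.

```python
import hashlib

cache = {}   # stand-in for memcached (assumption)


def _version(namespace):
    """Return the current version of a namespace, starting at 1."""
    return cache.setdefault("ns:" + namespace, 1)


def namespaced_key(namespace, sql):
    """Build a cache key that embeds the namespace's current version."""
    digest = hashlib.md5(sql.encode("utf-8")).hexdigest()
    return "{}:v{}:{}".format(namespace, _version(namespace), digest)


def expire_namespace(namespace):
    """Bump the version so every key built so far stops being looked up."""
    cache["ns:" + namespace] = _version(namespace) + 1
```

Using the table name as the namespace gives per-table expiry: on a write to `my_data`, call `expire_namespace("my_data")`. The cost is one extra cache read per lookup to fetch the version.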
--
Lastly, you could store those queries in a different memcached server and flush it on a more frequent basis.