I think there are two reasons:
1. MySQL and Redis both provide persistence, so why is MySQL used more than Redis for persistence? Maybe it is because Redis has no index and cannot answer queries directly from disk. But since we can query from memory, there is no need to query from disk.
2. Redis saves data to disk on a periodic basis, so data loss may occur. But does MySQL save data to disk immediately after an insert, with no time window?
Redis and MySQL are really two very different technologies. Redis is a key-value store that primarily serves as a cache for holding data temporarily. While it is true that Redis can be configured to persist its data to disk (via RDB snapshots or an append-only file), that does not make it a general-purpose database. Instead, Redis is meant to store data which generally would be considered volatile.
On the other hand, MySQL is a database and a full-blown data store. MySQL is suitable for permanently storing data, and also exposes a rich query language (SQL) that makes it easy to query and search its data.
In terms of common ground, a query against an indexed MySQL column would behave somewhat similarly to a lookup in a Redis cache, each using a certain key. But the difference is that, in general, Redis will answer such lookups much faster than a disk-backed database, roughly 100 times as a rule of thumb, though the exact factor depends on the workload. For this reason, when a lightning fast cache technology is needed, MySQL often will not be suitable for this purpose, but a cache like Redis might be.
For a project we are working with an external partner. For the project we need access to their MySQL database. The problem is, they can't give us direct access. Their database is hosted in a managed environment where they don't have many configuration options, and they don't want to give us access to all of their data. So the solution they came up with is the FEDERATED storage engine.
We now have one table for each table of their database. The problem is, the amount of data we get is huge and will increase even more in the future. That means a lot of inserts are performed on our database. The optimal solution for us would be to intercept all incoming MySQL traffic, process it and then store it in bulk. We also thought about using something like Redis to store the data.
Additionally, we plan to get more data from different partners. They will potentially provide us the data in different ways. So using Redis would allow us to have all our data in one place.
Copying the data to Redis after it's stored in the MySQL database is not an option. We just can't handle that many inserts, and we need the data as fast as possible.
TL;DR
Is there a way to pretend to be a MySQL server so we can directly process data received via the FEDERATED storage engine?
We also thought about using the BLACKHOLE engine in combination with binary logging on our side. That way incoming data would only be written to the binary log and wouldn't be stored in the database. But then performance would still be limited by disk I/O.
Is it a good idea to use Redis as a persistent database (AOF strategy) to store geodata?
For example, instead of storing all positions of a user in MySQL, I want to use Redis. But I am afraid of persistence problems.
Redis persistence is not the same as durability in an ACID database. Trying to make Redis maximally durable (insofar as it can) will limit its performance and lead to large log files. You can relax persistence by various configuration options, but this naturally leads to a compromise on durability.
You should read more about it:
https://redis.io/topics/persistence
http://oldblog.antirez.com/post/redis-persistence-demystified.html
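To make the persistence versus durability trade-off above a bit more concrete, here is a minimal sketch of an append-only log with three sync policies. The policy names mirror the values of Redis's `appendfsync` setting (`always`, `everysec`, `no`), but the `AppendOnlyLog` class and `replay` function are hypothetical toys, not how Redis is implemented. In particular, this sketch only checks the one-second interval when a write arrives, which is an assumption made to keep it short.

```python
import os
import time


class AppendOnlyLog:
    """Toy append-only log; an illustration only, NOT Redis's implementation."""

    def __init__(self, path, fsync_policy="everysec"):
        if fsync_policy not in ("always", "everysec", "no"):
            raise ValueError("unknown fsync policy: " + fsync_policy)
        self.path = path
        self.fsync_policy = fsync_policy
        self._file = open(path, "a", encoding="utf-8")
        self._last_fsync = time.monotonic()
        self.fsync_count = 0  # how many times we forced data to disk

    def append(self, command):
        self._file.write(command + "\n")
        self._file.flush()  # hand the bytes to the OS page cache
        now = time.monotonic()
        due = self.fsync_policy == "everysec" and now - self._last_fsync >= 1.0
        if self.fsync_policy == "always" or due:
            os.fsync(self._file.fileno())  # force the bytes to stable storage
            self._last_fsync = now
            self.fsync_count += 1
        # With "no", the OS decides when the data reaches the disk.

    def close(self):
        self._file.close()


def replay(path):
    """Rebuild state by re-reading every logged command, as a restart would."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

With `always`, every write pays for a sync, which is the performance cost mentioned above. With `everysec` or `no`, writes made since the last sync can be lost in a crash, which is the durability compromise.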
Personally, I would not use Redis as a primary data store for any data that could not be reproduced easily. That would not be using Redis for its strength, in any case.
I have a SQL-based application and I would like to cache the results using Redis. You can think of the application as an address book with multiple SQL tables. The application performs the following tasks:
40% of the time:
Create a new record / Update an existing record
Bulk update multiple records
Review an existing record
60% of the time:
Search records based on user's criteria
This is my current approach:
The system caches a record when the record is created or updated.
When a user performs a search, the system caches the query result.
On top of that, I have a Redis look-up table (Redis Set) which stores the MySQL record ID and the Redis cache key. That way I can delete the Redis caches if the MySQL record has been changed (e.g., bulk update).
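The look-up table described above can be sketched roughly as follows. This is an illustration under assumptions: plain Python dicts and sets stand in for Redis keys and Redis Sets, and the `QueryCache` class and its method names are hypothetical, not part of any Redis client.

```python
from collections import defaultdict


class QueryCache:
    """In-memory stand-in for Redis: cached results plus a reverse index."""

    def __init__(self):
        self.results = {}                       # cache key -> list of records
        self.keys_by_record = defaultdict(set)  # record ID -> cache keys

    def store(self, cache_key, records):
        """Cache a result and remember which records it contains."""
        self.results[cache_key] = records
        for record in records:
            self.keys_by_record[record["id"]].add(cache_key)

    def get(self, cache_key):
        return self.results.get(cache_key)

    def invalidate_record(self, record_id):
        """Drop every cached result that contains the changed record."""
        for cache_key in self.keys_by_record.pop(record_id, set()):
            self.results.pop(cache_key, None)
```

Note that a newly created record has no entry in the reverse index, so creating it cannot invalidate anything.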
What if a new record is created after the system caches the search result? If the new record matches the search criteria, the system will always return the old cache (which does not include the new record), until the cache is deleted (which won't happen until an existing record in the cache is updated).
The search is driven by the users, and the number of possible combinations of search conditions is effectively unlimited. It is not possible to evaluate which cache should be deleted when a new record is created.
So far, the only solution is to remove all caches of a MySQL table when a record is created. However this is not a good choice because lots of records are created daily.
In this situation, what's the best way to implement Redis on top of MySQL?
Here's a surprising thing when it comes to PHP and MySQL (I am not sure about other languages) - not caching stuff into memcached or Redis is actually faster. Much faster. Basically, if you just built your app and queried MySQL - you'd get more out of it.
Now for the "why" part.
InnoDB, the default engine, is a superb engine. Specifically, its memory management (allocation and what not) is superior to any memory storage solutions. That's a fact, you can look it up or take my word for it - it will, at least, perform as well as Redis.
Now what happens in your app - you query MySQL and cache the result into redis. However, MySQL is also smart enough to keep cached results. What you just did is create an additional file descriptor that's required to connect to Redis. You also used some storage (RAM) to cache the result that MySQL already cached.
Here comes another interesting part - the preferred way of serving PHP scripts is by using php-fpm - it's much quicker than any mod_* crap out there. Down to the core, php-fpm is a supervisor process that spawns child processes. They don't shut down after the script is served, which means they cache connections to MySQL - connect once, use multiple times. Basically, if you serve scripts using php-fpm, they will reuse the already established connection to MySQL, meaning that you won't be opening and closing connections for each request - this is extremely resource friendly and it lets you have lightning fast connection to MySQL. MySQL, being memory efficient and having the cached result is much quicker than Redis.
Now what does all of this mean for you - having a proper setup lets you have small code that's simple, easy, doesn't involve Redis and eliminates all the problems that you might have with cache invalidation and what not and you won't waste your memory to contain the same data twice.
Ingredients you need for this to work:
php-fpm
MySQL and InnoDB based tables and most of all - sufficient RAM and tweaked innodb_buffer_pool_size variable. That one controls how much RAM InnoDB is allowed to allocate for its purposes - the larger the better.
You eliminated Redis from the game, you kept your code simple and easy to maintain, you didn't duplicate data, you didn't introduce an additional system and you let software that's meant to take care of data do its job. Pretty cheap trade-off for maximum usefulness, even if you compile all the software from scratch - it won't take more than an hour or so to get it up and running.
Or, you can just ignore what I wrote and look for a solution using Redis.
We met the same problem and we chose to do the same thing you are thinking of: remove all query caches affected by the table. It is not ideal, like you said, but fortunately our share of writes is not as high as 40%, so it's OK so far.
That's the nature of query based caching. As an alternative you can add entity based caching. Instead of caching the search result only, cache the entire table and do the search inside memory. We use C# LINQ so we can do pretty common queries in memory but if the search is too complicated then you are out of luck.
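One common way to "remove all query caches affected by the table" without tracking each key is to put a per-table version number into the cache key and bump it on every write. This is a swapped-in technique, not necessarily what the answer above used. In the sketch below a dict stands in for Redis, and `VersionedQueryCache` and its methods are hypothetical names. With a real Redis, the version would typically be incremented with `INCR` and the orphaned old keys left to expire via a TTL.

```python
import hashlib
import json


class VersionedQueryCache:
    """Query cache whose keys embed a per-table version number."""

    def __init__(self):
        self.store = {}          # stand-in for Redis key/value storage
        self.table_version = {}  # table name -> integer version

    def _key(self, table, criteria):
        version = self.table_version.get(table, 0)
        # Hash the search criteria so any combination maps to a fixed-size key.
        digest = hashlib.sha1(
            json.dumps(criteria, sort_keys=True).encode("utf-8")
        ).hexdigest()
        return f"{table}:v{version}:{digest}"

    def get(self, table, criteria):
        return self.store.get(self._key(table, criteria))

    def put(self, table, criteria, rows):
        self.store[self._key(table, criteria)] = rows

    def table_changed(self, table):
        """Call after any insert/update/delete; old keys become unreachable."""
        self.table_version[table] = self.table_version.get(table, 0) + 1
```

Invalidation is then a single counter bump, but the trade-off stays the same as described above: every write to a table throws away all of that table's cached searches.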
Currently I have a system which is based solely on Solr, which means that I store all data in Solr (using SolrJ) with no other datastore involved. The problem is that I now experience some performance issues. I thought that it might make sense to store the data in MySQL and then synchronize it with Solr using e.g. the DataImportHandler. That way I would have the read operations on the Solr index and the main write operations in MySQL, with Solr writes happening only occasionally, when synchronizing.
The thing is that I expect hundreds of millions of documents to be stored, and I don't really know if the MySQL/Solr combination makes sense.
Is there another, better solution? Maybe a master Solr for writing and Solr slaves for reading?
Update: What I forgot to say is that in case of a schema.xml change, the "storing data in MySQL" solution could also be useful in my opinion, because then I can re-index all the data without caring about Solr's self-stored data.
It's not preferable to use the same Solr instance for both reading and writing, as the write-side activity (commits and optimizes) would heavily impact read operations.
A master-slave configuration would be a nicer approach, with the master primarily for writes and the slaves for read-only purposes.
The slaves are periodically refreshed with the contents of the master, so there would be some delay.
You can always scale by adding more slaves.
Using MySQL as a persistent store with master-slave Solr would be the best approach.
MySQL provides a stable data store and would guard you against index corruption or other issues that could result in data loss.
Using the DataImportHandler you can do this easily with incremental updates, but there would be more time lag before the latest data appears on the slaves.
With this you can also use index swapping for full refreshes.
If the index grows too huge to be maintainable and performance suffers, you may want to look at Solr shards.
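The incremental-update idea mentioned above boils down to "only re-index rows changed since the last sync". Here is a minimal sketch of that pattern, not of the DataImportHandler itself. It assumes a hypothetical `documents` table with an `updated_at` column, uses `sqlite3` as a stand-in for MySQL, and uses a plain dict as a stand-in for the Solr index.

```python
import sqlite3


def sync_changed_rows(conn, index, last_sync):
    """Copy rows modified after last_sync into index.

    Returns the new high-water mark to pass in on the next run.
    """
    rows = conn.execute(
        "SELECT id, body, updated_at FROM documents "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    for doc_id, body, updated_at in rows:
        index[doc_id] = body  # stand-in for sending the document to Solr
        last_sync = max(last_sync, updated_at)
    return last_sync
```

Each run touches only the rows written since the previous run, which is why the lag on the read side depends on how often the sync is scheduled. A full refresh is the same call with `last_sync` reset to its initial value.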
I also thought about the same issue: storing everything in Solr, or storing in MySQL and indexing in Solr.
I decided to go the second way: store in MySQL and index in Solr.
The reason: handling data (reading and writing) in MySQL is much better than in Solr. Also, data import/export from/to MySQL is supported by lots of tools out of the box.
Next point: backup. There are many more established ways of backing up a MySQL DB than a Solr index.
Of course, for full-text search, Solr is much better than MySQL. So I decided that each tool should do the work it knows best.
For your information: I'm talking about a medium-sized index: 4 GB for some million documents.
//Edit: don't forget that some features, like highlighting, require stored data in Lucene (not only indexed). If you need this, you have to store the documents in Solr as well. An alternative could be implementing those features on the client side. (I did it this way.)
Is it a good idea to use Memcached for session storage with PHP? We will have a lot of servers and we must access the session data from everywhere, so we are forced to use either a database (in our case MySQL) or Memcached as session storage. What do you think?
I know people who've used Memcached for this -- it's very fast, certainly a lot faster than a database, and is built to handle a lot more concurrency.
The primary disadvantage to purely in-memory storage is that all your session data will be wiped if/when you restart the daemon. In my experience, memcached is rock-solid and I've never had to restart it because of a failure, but it is a consideration if your sysadmins aren't used to working that way, or if your systems are updated frequently. It also depends on whether losing all your user sessions once a month or year is acceptable or not (i.e. in ecommerce, management probably won't like this).
The obvious solution, if that's the case, is to go to one of the many disk-based NoSQL/hash table databases, such as MemcacheDB, which is based off of Memcached. Or see: CouchDB, MongoDB etc. Each of these daemons (including Memcached) is also a lot less complex when it comes to performance tuning than MySQL (where all sorts of things like key and sort buffers, query cache etc. have to be tuned per install/use case) -- I mean, with Memcached there's not much to do other than to allocate memory and start it up.
Personally, I am a fan of using faster, more appropriate (non-SQL) storage for temporary things like session keys, but if your database is not under load and you don't anticipate it to be, the only thing you lose by storing sessions in the database is that it's a little slower, so users see a little more latency.
Whichever way you go, I suggest that you write your session-management code in such a way that the storage engine is just a layer, and you can swap in a different storage engine relatively painlessly. You don't want to be recoding your application if you find memcached or whatever you choose isn't working well, and you want to try something else. For instance, I once wrote a caching system for a clustered CMS application that used memcached to cache various pages and objects, but when the daemon wasn't reachable, it would fail over to alternate backends that would cache to shared memory or disk on the individual webservers. (In your case, you don't necessarily need the auto-failover, just the ability to change your mind about the backend.)
I mentioned MemcacheDB because it uses the Memcache protocol, so it's extremely easy to swap in Memcached for MemcacheDB or vice versa.
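As a rough illustration of the "storage engine is just a layer" advice above, here is a minimal sketch with swappable backends and an optional failover wrapper. All class names are hypothetical. `MemoryBackend` is only a stand-in for a Memcached client, with an `available` flag to simulate an unreachable daemon, and session IDs are assumed to be safe to use as file names (e.g. hex strings).

```python
import json
import os


class MemoryBackend:
    """Stand-in for a Memcached-style store."""

    def __init__(self):
        self._data = {}
        self.available = True  # set to False to simulate an unreachable daemon

    def load(self, session_id):
        if not self.available:
            raise ConnectionError("backend unreachable")
        return self._data.get(session_id)

    def save(self, session_id, data):
        if not self.available:
            raise ConnectionError("backend unreachable")
        self._data[session_id] = data


class FileBackend:
    """Disk-based fallback; assumes session IDs are safe file names."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, session_id):
        return os.path.join(self.directory, session_id + ".json")

    def load(self, session_id):
        try:
            with open(self._path(session_id), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def save(self, session_id, data):
        with open(self._path(session_id), "w", encoding="utf-8") as f:
            json.dump(data, f)


class FailoverBackend:
    """Tries the primary backend first, then the fallback."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback

    def load(self, session_id):
        try:
            return self.primary.load(session_id)
        except ConnectionError:
            return self.fallback.load(session_id)

    def save(self, session_id, data):
        try:
            self.primary.save(session_id, data)
        except ConnectionError:
            self.fallback.save(session_id, data)


class SessionManager:
    """Application code talks only to this class, never to a backend."""

    def __init__(self, backend):
        self.backend = backend

    def get(self, session_id):
        return self.backend.load(session_id) or {}

    def set(self, session_id, data):
        self.backend.save(session_id, data)
```

Changing your mind about the backend then means changing the one line that constructs `SessionManager`, for example `SessionManager(FileBackend("/tmp/sessions"))` instead of `SessionManager(MemoryBackend())`.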