Does it make sense to use Redis instead of MySQL for many chats? Will there be an increase in speed? I think that, because Redis is single-threaded, the speed will on the contrary fall.
Yes, there are the following advantages:
Every time the user sends data (a message) I do not need to open a MySQL
connection and store it there. We can save it in Redis and use pub/sub
to broadcast it in real time.
I can publish all that data, and other clients (JavaScript/Android)
can subscribe in real time using a message queue based on Redis.
I can trigger real-time alerts (like a user going offline, etc.).
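Since Redis runs in memory, it is very fast, and it can also persist data to disk (RDB snapshots or the AOF log). So in case a crash happens, most data is not lost, although the most recent writes can be, depending on the persistence settings. Redis can perform about 110,000 SETs and about 81,000 GETs per second (commonly quoted benchmark figures; actual throughput depends on hardware and workload).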
I am running a couple of crawlers that produce millions of datasets per day. The bottleneck is the latency between the spiders and the remote database. If the spider server is located too far away, the latency will slow the crawler down to a point where it can no longer complete the datasets needed for a day.
While searching for a solution I came upon Redis, with the idea of installing Redis on the spider server, where it would temporarily store the collected data with low latency, and then that data would somehow be moved on to MySQL.
The setup is like this until now:
About 40 spiders running on multiple instances feed one central MySQL 8 remote server on a dedicated machine over TCP/IP.
Each spider writes different datasets. One kind of spider gets positions and prices of search results, where there are 100 results with around 200-300 inserts on one page. The delay between one request/page and the next is about 2-10 s.
The latter is the problem, as the spider yields every position within that page and creates a remote insert within a transaction, maybe even a connect (not sure at the moment).
This currently only works because the spiders and the remote MySQL server are close (same data center), with ping times of 0.0x ms. It does not work with ping times of 50 ms, as the spiders cannot write fast enough.
Is Redis or maybe DataMQ a valid approach to solve the problem, or are there other recommended ways of doing this?
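A rough back-of-the-envelope sketch of why 50 ms ping times break this setup. It assumes one network round trip per insert, 250 inserts per page (the middle of the 200-300 range) and 0.05 ms as a stand-in for the "0.0x ms" local ping. The function name and these values are illustrative assumptions, not measurements:

```python
def page_write_seconds(inserts_per_page, round_trip_ms, round_trips_per_insert=1):
    """Network wait per page if every insert pays its own round trip(s)."""
    return inserts_per_page * round_trips_per_insert * round_trip_ms / 1000.0


# Assumed values, see the text above.
same_datacenter = page_write_seconds(250, 0.05)  # row-by-row, local ping
remote = page_write_seconds(250, 50)             # row-by-row, 50 ms ping
remote_batched = page_write_seconds(1, 50)       # one batched write per page

print(f"same data center: {same_datacenter:.4f} s per page")
print(f"remote, row by row: {remote:.1f} s per page")
print(f"remote, one batch: {remote_batched:.2f} s per page")
```

Under these assumptions a page costs about 12.5 s of pure network wait at 50 ms, which is more than the 2-10 s between pages, while sending the whole page as one batch costs roughly a single round trip (plus transfer time, which this sketch ignores).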
Did you mean you have installed a Redis Server on each spider?
Actually, it is not a good solution for your case. But if you have already done this and still want to use MySQL to persist your data, a cron job on each server is an option.
You can create a cron job on each spider server (depending on your dataset and your needs, you can choose a daily or hourly sync job) and write a data transfer script that scans your Redis and transfers the data to the MySQL tables.
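A minimal sketch of what such a transfer script could look like. To keep it runnable anywhere, a `deque` stands in for a local Redis list and an in-memory SQLite database stands in for the remote MySQL server; the `results` table, its columns and the function name `sync_to_database` are hypothetical. The point it illustrates is writing in batches, one transaction per batch, and removing rows from the buffer only after the commit:

```python
import json
import sqlite3
from collections import deque
from itertools import islice


def sync_to_database(buffer, conn, batch_size=500):
    """Move buffered rows into the database in batches; return rows written."""
    written = 0
    while buffer:
        # Peek at the next batch without removing it from the buffer yet.
        batch = [json.loads(raw) for raw in islice(buffer, batch_size)]
        # One transaction per batch: commits on success, rolls back on error.
        with conn:
            conn.executemany(
                "INSERT INTO results (keyword, position, price) VALUES (?, ?, ?)",
                [(row["keyword"], row["position"], row["price"]) for row in batch],
            )
        # Drop the rows from the buffer only after the commit succeeded.
        for _ in range(len(batch)):
            buffer.popleft()
        written += len(batch)
    return written


# Stand-ins: a deque for the local Redis list, SQLite for the remote MySQL.
buffer = deque(
    json.dumps({"keyword": "shoes", "position": i, "price": 9.99})
    for i in range(1200)
)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (keyword TEXT, position INTEGER, price REAL)")

written = sync_to_database(buffer, conn)
stored = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(written, stored, len(buffer))
```

With a real Redis and MySQL the structure would stay the same, but the read, insert and remove steps would go through the respective client libraries.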
I recommend using MongoDB instead of MySQL to store the data.
I am tinkering with Redis and MySQL to see how caching can improve performance. Accessing data from the cache is/should be faster than accessing it from the database.
I measured the time required for both cases in my program and found that accessing data from the cache was much slower than accessing it from the database. I was/am wondering what might be the cause(s).
Some points to consider:
I am using Azure Redis Cache.
The main application is on VM instance.
I hosted the MySQL server on another VM instance.
The table is very small with 200-300 records.
There is no error in the time calculation logic.
EDIT:
Load time for cache = about 1.2 s
Load time for MySQL = about 15 ms
Turns out my application and the MySQL server were in the same region, while the Redis cache was in a different region across the globe, causing much higher latency.
But I would still like someone to explain why the fetch time for SQL was so much smaller.
If the table of 200-300 rows is fully cached in MySQL's "buffer_pool", then it won't take much time to fetch all of them and send them back to the client. 15ms is reasonable (though it depends on too many things to be more specific).
If you are fetching 1 row, and you have an index (especially the PRIMARY KEY) to locate that one row, I would expect it to be even faster than 15ms.
I'm summarizing a 40K-row table; it is taking under 2ms. But note: client and server are on the same machine. 15ms could represent the client and server being a few hundred miles apart.
How long does a simple SELECT 1 take? That will give you a clue of the latency, below which you cannot go without changing the physical location of machines.
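A minimal sketch of that latency check in Python. The helper name `measure_round_trip_ms` is hypothetical, and the in-memory `sqlite3` connection is only a stand-in so the snippet runs anywhere. Against a real server you would pass a connection from your MySQL driver instead, and the number would then include the network round trip:

```python
import sqlite3
import statistics
import time


def measure_round_trip_ms(conn, samples=20):
    """Median time in milliseconds for a trivial `SELECT 1` on this connection."""
    timings = []
    cursor = conn.cursor()
    for _ in range(samples):
        start = time.perf_counter()
        cursor.execute("SELECT 1")
        cursor.fetchall()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Stand-in connection; replace with a connection to the server being tested.
conn = sqlite3.connect(":memory:")
baseline_ms = measure_round_trip_ms(conn)
print(f"SELECT 1 median: {baseline_ms:.3f} ms")
```

Running the same measurement against the cache (for example timing a trivial command there) gives the matching floor for the Redis side, which makes a cross-region setup obvious.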
I've been developing a high-traffic ad serving platform for some years now, using a master-master MariaDB cluster with an HAProxy in front for balancing relational data queries (read queries go to all of the servers, but writes only go to one, to prevent the servers from going out of sync). By relational data I mean things like campaign settings, user details, payments. I'm also using Redis for caching some of the less dynamic MySQL information, but I believe there are a lot of opportunities to make better use of it, since as soon as the traffic increases, I frequently hit bottlenecks like:
too many connections to MySQL
deadlocks (possibly because writes start coming on multiple servers when the main one gets overloaded).
My goal is to move as many of the writes as possible away from MySQL and into Redis, but I'm having a hard time filtering MySQL data based on the counts/budgets stored in Redis, especially in places where a traditional JOIN would be used.
A simplified example of such a MySQL query, which would get the campaign with the highest bid within the user's budget:
SELECT campaigns.id, campaigns.url FROM campaigns
JOIN users ON campaigns.user_id = users.id
ORDER BY LEAST(users.credits, campaigns.bid) DESC
LIMIT 1;
After a click is delivered to that campaign, a budget reduction is immediately needed. Of course, reducing the credits in MySQL is trivial, but as soon as a user starts sending multiple clicks per second, the problems start appearing (mainly deadlocks in a cluster or reaching the maximum number of connections).
Applying a credit reduction in Redis would be preferred, but I have trouble connecting the dots between a bunch of credit records in Redis and filtering and sorting MySQL records based on that.
What would be a good approach to this problem that will allow me to touch MySQL as little as possible? Or maybe there is a fully different approach I need to take for this to happen.
Any advice or links will be much appreciated.
I would not recommend moving all write requests to Redis, especially for data that needs strong consistency (like payments).
Redis is an in-memory database, which does not give ACID transaction guarantees like MySQL does. So your data still has some chance of being lost after a write to Redis, even if you have AOF enabled, which can make your data inconsistent.
For your case I think you can integrate a message queue (Kafka, RabbitMQ) to avoid the connection issues and deadlocks:
When a transaction occurs, serialize the request with the data to write and send it to the message queue.
A consumer listens on the MQ at a fixed consume rate (based on your needs) and writes the data into MySQL sequentially (and rewrites it to Redis if you need a cache).
On the client side, you can have a thread poll for the result in a loop until the write has finished. This makes the async write behave like a sync one.
This way you avoid resource contention (like deadlocks), and the fixed consume rate also smooths out the write rate.
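The idea above can be sketched with the standard library only. Here `queue.Queue` stands in for Kafka/RabbitMQ and an in-memory SQLite database stands in for MySQL, both assumptions made so the example is self-contained; the `users.credits` column follows the query in the question. Several producers enqueue credit reductions, and a single consumer applies them one at a time, so the database only ever sees one connection and one writer:

```python
import queue
import sqlite3
import threading

STOP = object()  # sentinel telling the consumer to shut down


def consume(jobs, conn):
    """Single writer: apply queued credit changes sequentially."""
    while True:
        job = jobs.get()
        if job is STOP:
            break
        user_id, delta = job
        with conn:  # one short transaction per job
            conn.execute(
                "UPDATE users SET credits = credits + ? WHERE id = ?",
                (delta, user_id),
            )


def send_clicks(jobs, user_id, count):
    """Producer: enqueue one credit reduction per delivered click."""
    for _ in range(count):
        jobs.put((user_id, -1))


# Only the consumer touches the connection while it is running.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, credits INTEGER NOT NULL)")
conn.execute("INSERT INTO users (id, credits) VALUES (1, 100)")
conn.commit()

jobs = queue.Queue()
worker = threading.Thread(target=consume, args=(jobs, conn))
worker.start()

producers = [threading.Thread(target=send_clicks, args=(jobs, 1, 10)) for _ in range(4)]
for producer in producers:
    producer.start()
for producer in producers:
    producer.join()

jobs.put(STOP)
worker.join()

remaining = conn.execute("SELECT credits FROM users WHERE id = 1").fetchone()[0]
print(remaining)  # 100 credits minus 4 producers * 10 clicks = 60
```

A real message broker adds durability and lets the consumer run on another machine, but the ordering and single-writer properties are the same as in this in-process sketch.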
Our mobile app tracks user events (events can have many types).
Each mobile device reports user events and can later retrieve them.
I thought of writing to Redis and MySQL.
When a user makes a request:
1. Look it up in Redis
2. If it is not in Redis, look it up in MySQL
3. Return the value
4. Update Redis in case the value did not exist there.
5. Set an expiry policy on each key in Redis to avoid running out of memory.
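The five steps above are the usual cache-aside pattern. A minimal sketch, assuming a small in-process `TTLCache` class as a stand-in for Redis with per-key expiry and a plain dict as a stand-in for MySQL (both hypothetical, as are the key names):

```python
import time


class TTLCache:
    """Tiny stand-in for a cache with per-key expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)


def read_event(key, cache, database, stats):
    value = cache.get(key)          # 1. look in the cache
    if value is not None:
        stats["hits"] += 1
        return value
    stats["misses"] += 1
    value = database.get(key)       # 2. fall back to the database
    if value is not None:
        cache.set(key, value)       # 4./5. populate the cache with an expiry
    return value                    # 3. return the value


database = {"user:1:last_event": "login"}  # stand-in for MySQL
cache = TTLCache(ttl_seconds=60)
stats = {"hits": 0, "misses": 0}

first = read_event("user:1:last_event", cache, database, stats)
second = read_event("user:1:last_event", cache, database, stats)
print(first, second, stats)
```

The first read misses and goes to the database, the second is served from the cache. Keys that do not exist in the database are not cached here, so repeated lookups for them always reach the database.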
Problem:
1. Reads: If many users at once request information that does not exist in Redis, MySQL is going to be overloaded with reads (latency).
2. Writes: I am going to have lots of writes to MySQL, since every event is going to be written to both data sources.
Facts:
1. Expecting 10M concurrent users that write and read.
2. Need to serve each request with a max latency of one second.
3. Expecting a couple of thousand requests per second.
Any solutions for that kind of mechanism to get good QoS?
Is that in any way a Lambda architecture solution?
Thank you.
Sorry, but such (complex) issues rarely have a ready answer here. Too many unknowns. What is your budget and how much hardware do you have? Since 10 million clients use your service concurrently, your question is about hardware, not software.
Nothing is said about several important requirements:
What is more important - consistency vs availability?
What is the read/write ratio?
Read/write ratio requirement
If you have 10,000,000 concurrent users, this is a problem in itself. But if you have mostly reads, it is not as terrible as it may seem. In this case you should take care of having the right indexes in MySQL. Also buy servers with a lot of RAM to keep at least the index data in RAM. Then one server can handle 3000-5000 concurrent select queries without any problems with the 1-second latency requirement (one of our statistics projects handles up to 7,000 select rps per server on 4-year-old ordinary hardware).
If you have a lot of writes, everything becomes more complicated, and consistency becomes the main question.
Consistency vs availability
If consistency is important, go to the store for new servers with SSD drives and modern CPUs. Do not forget to buy as much RAM as possible. Why? If you have a lot of write requests, your SQL server has to update its indexes with every write. And you cannot do without indexes, because then your read requests would not stay within the latency requirement. By consistency I mean: if you write something, it should complete within 1 second, and if you read this data right after the write, you get the actually written information within 1 second.
Your problem 1:
Reads: If many users at once request information that does not exist in Redis, MySQL is going to be overloaded with reads (latency).
This is the well-known "cache miss" problem. It has only a few solutions: horizontal scaling (buy more hardware) or precaching. Precaching in this case may be done in at least 3 scenarios:
Use a non-blocking read and wait up to one second for the data to be fetched from the SQL server. If it does not arrive in time, return the data from Redis. Update Redis immediately or through a queue, as you prefer.
Use a blocking/non-blocking read and return the data from Redis as fast as possible, but with every read query push a job to a queue to update the cached data in Redis (the app may also be informed that it should re-query the data after some time).
Always read/write from Redis, but with every write request register a job in a queue to update the data in SQL.
Each of them is a compromise:
High availability but consistency suffers, Redis is LRU cache.
High availability but consistency suffers, Redis is LRU cache.
High availability and consistency, but requires a lot of RAM for Redis.
Writes: I am going to have lots of writes to MySQL, since every event is going to be written to both data sources.
The field of compromise again. Lots of writes come down to hardware. So buy more, or use queues for pending writes. So it is availability vs consistency again.
Event tracking (usually) means you can return data close to real time but not in real time. For example, allow 1-10 seconds of latency for updating the data on disk (MySQL) while keeping the 1-second latency for serving write/read requests.
So, it's a combination of techniques 1/2/3 (or some others) for data processing:
Use LRU in Redis and do not use expiry. Lots of expiring keys are a problem in themselves, and we cannot rely on expiry to be sure we save RAM.
Use queue to warm up missing keys in Redis.
Use queue to write data into mysql server from Redis server.
Use additional requests from the client side to update the data if a cache miss occurs.
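To illustrate the "LRU instead of expiry" point from the list above: with LRU eviction the cache is bounded by size, and the least recently used key is dropped when a new one does not fit, so memory use stays capped without a per-key timeout. Below is a minimal in-process sketch of exact LRU (the `LRUCache` class is hypothetical). In Redis itself this is configured through `maxmemory` and `maxmemory-policy allkeys-lru`, and Redis uses an approximated LRU rather than the exact one shown here:

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used key when full."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def __contains__(self, key):
        return key in self._items


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.set("c", 3)  # over capacity: "b" is evicted

print("a" in cache, "b" in cache, "c" in cache)
```

Hot keys stay in the cache for as long as they keep being read, which is the behaviour the answer relies on.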
I've built an MMORPG that uses a MySQL database to store player related data when the user logs off.
We built in an auto-save timer so that all the data of every logged-in user is saved to the database every 3 hours.
In doing so we noticed a fatal flaw....
Because all our database transactions are sent to a single DB thread, the thread can become backlogged with requests. This produces a login/saving issue. When this happens, players are unable to log in, as the login process requires the use of the DB thread to confirm login credentials. Similarly, all save requests are queued at the back of the DB thread's schedule. This produces a backlog of requests...
The only solution that I can think of for this is to introduce multiple threads and have 3-4 threads interacting with the database.
However, this opens up a new issue. Since multiple threads are sent DB requests this means that one thread can receive a save request from a player while another DB thread receives a save request from the same player.
For example....
PlayerA Logs In to the game
3 Hours pass & the auto save happens, playerA's data will now be saved.
PlayerA kills a monster and gains experience.
PlayerA logs off, which adds a save request to a DB thread.
Now we have two different save requests queued for the database. Assuming they are assigned to two different DB threads, this could cause the user's data to be saved in the wrong order... For example, maybe the thread handling PlayerA's log-out save runs first and then the auto save for PlayerA runs after that on a separate thread... This would cause loss of data (in this case experience).
How do other MMORPGs handle something like this?
You need a database connection pool if you're not using one already, and you should make sure you're not locking more data than you need. If you are saving how much gold a player has, you don't need to lock the table holding the credentials.
Keeping the order of events in a multi-threaded scenario is not a trivial problem. I suggest using a message queue, with a single producer per player and a single consumer per player. This link shows 2 strategies to keep the order.
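One way to get "a single consumer per player" with only a few DB threads is to give each thread its own queue and always route a given player to the same queue. This is a sketch under assumptions, not necessarily one of the linked strategies: the class name `OrderedSaveDispatcher`, the `fake_save` function and the CRC32-based routing are all illustrative, and the save function is a stand-in for the real database write:

```python
import queue
import threading
import zlib


class OrderedSaveDispatcher:
    """Routes each player's saves to one fixed worker, preserving their order."""

    def __init__(self, worker_count, save_fn):
        self._save_fn = save_fn
        self._queues = [queue.Queue() for _ in range(worker_count)]
        self._threads = [
            threading.Thread(target=self._run, args=(q,)) for q in self._queues
        ]
        for thread in self._threads:
            thread.start()

    def _run(self, jobs):
        while True:
            job = jobs.get()
            if job is None:  # shutdown signal
                break
            player_id, snapshot = job
            self._save_fn(player_id, snapshot)

    def submit(self, player_id, snapshot):
        # Stable hash: the same player always lands on the same queue.
        index = zlib.crc32(player_id.encode("utf-8")) % len(self._queues)
        self._queues[index].put((player_id, snapshot))

    def close(self):
        for jobs in self._queues:
            jobs.put(None)
        for thread in self._threads:
            thread.join()


saved = {}
lock = threading.Lock()


def fake_save(player_id, snapshot):
    """Stand-in for the real database write."""
    with lock:
        saved[player_id] = snapshot


dispatcher = OrderedSaveDispatcher(worker_count=4, save_fn=fake_save)
dispatcher.submit("PlayerA", {"xp": 100})  # auto save
dispatcher.submit("PlayerA", {"xp": 150})  # log-out save, submitted later
dispatcher.submit("PlayerB", {"xp": 10})
dispatcher.close()

print(saved["PlayerA"]["xp"])  # the later save wins
```

Different players still save in parallel, while two saves for the same player can never overtake each other, provided they are submitted in order from one place.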
A queue is actually important for other reasons too. If a save request fails, it would remain in the queue to be retried later. When dealing with players' money and items, you probably want this.
Your autosave is deterministic, meaning that you know exactly when the last one occurred and when the next one will occur. I would use that somehow, along with the previously suggested idea of adding a timestamp. Actually, it might be better to make the updates represent only the increments/decrements along with a user timestamp, and calculate the experience upon request (maybe cache it then).
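A small sketch of the increments idea, with hypothetical names (`XpEvent`, `experience`). Instead of saving the whole experience value, each change is stored as a timestamped delta and the total is computed on request. Because addition does not depend on order, it no longer matters which DB thread writes its event first:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XpEvent:
    player_id: str
    timestamp: float  # when the change happened in the game
    delta: int        # positive for a gain, negative for a loss


def experience(events, player_id, base=0):
    """Total experience for one player, computed from the stored deltas."""
    return base + sum(e.delta for e in events if e.player_id == player_id)


gained_at_login = XpEvent("PlayerA", timestamp=1.0, delta=100)
monster_kill = XpEvent("PlayerA", timestamp=2.0, delta=50)
other_player = XpEvent("PlayerB", timestamp=1.5, delta=10)

in_order = [gained_at_login, other_player, monster_kill]
out_of_order = [monster_kill, other_player, gained_at_login]

total_in_order = experience(in_order, "PlayerA")
total_out_of_order = experience(out_of_order, "PlayerA")
print(total_in_order, total_out_of_order)
```

The trade-off is that reads get more expensive as the event list grows, which is where caching the computed total, or folding old events into a periodic snapshot, comes in.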
To avoid this problem in all cases you must not allow users to continue doing stuff before their last database transaction has been successfully committed. Of course that means that the DB has to be very fast -- if it can't keep the request queue below a couple of seconds worth of transactions at most, you simply have to make it faster. More RAM cache, SSDs, the usual MySQL optimization dance. Adding extra logic in the form of triggers etc. isn't going to help in the long run, especially because they can become really complicated in the case of inventories and the like.
If on average the system is fast enough but struggling in peaks, like when everybody logs in during lunch break, adding something like Redis as a fast cache might help. You'd load the data into Redis when a user logs on (or when they first need a certain piece of data), remove it when they log off or when it expires, and write changes back to the relational DB as fast as it can keep up.