I've been having some serious server problems lately due to an overload of INSERT queries. All of the queries on our server have been optimized over time, but traffic has picked up again and the CPUs are maxing out because of the sheer volume of INSERTs.
We run an INSERT ... ON DUPLICATE KEY UPDATE query for every visitor that hits our website, to track which visitors and users are online and what page they are on.
I'm not sure how else to accomplish this, and I don't know what to do to reduce the server load.
Any ideas?
My favorite way to handle this is to buffer up visitor data in server memory, and then do bulk inserts every so often.
For example, only write to the database when X amount of visitor-logs have been queued up in memory, where X may be between 10 and 1000 depending upon your application.
Even if the statement you send to the database server still carries 1000 rows' worth of inserts, there will only be one round trip between your application server and the database server, so you remove the overhead of managing many distinct connections.
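A rough sketch of that buffering idea, in Python with mysql-connector (the table, columns, and flush threshold below are just placeholders, not your actual schema):

import mysql.connector

FLUSH_THRESHOLD = 500            # flush once this many visits are queued; tune for your traffic
visit_buffer = []                # in-memory queue of (visitor_id, page) tuples

def record_visit(visitor_id, page):
    visit_buffer.append((visitor_id, page))
    if len(visit_buffer) >= FLUSH_THRESHOLD:
        flush(visit_buffer)
        visit_buffer.clear()

def flush(rows):
    # Build one multi-row INSERT ... ON DUPLICATE KEY UPDATE: a single round trip for all rows.
    sql = ("INSERT INTO visitors_online (visitor_id, page, last_seen) VALUES "
           + ", ".join(["(%s, %s, NOW())"] * len(rows))
           + " ON DUPLICATE KEY UPDATE page = VALUES(page), last_seen = NOW()")
    params = [value for row in rows for value in row]
    cnx = mysql.connector.connect(host="db", user="app", password="secret", database="site")
    try:
        cur = cnx.cursor()
        cur.execute(sql, params)
        cnx.commit()
    finally:
        cnx.close()

If your application tier is multi-process (PHP, for example), the buffer would have to live in shared storage such as APCu or Redis rather than in process memory, but the single flush query stays the same.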
Furthermore, you might consider a NoSQL hybrid such as Elasticsearch for this kind of data, to take load off your MySQL server. It scales extremely well and can be tuned for indexing (inserts), queries, or both.
I am tinkering with Redis and MySQL to see how caching can improve performance. Accessing data from a cache is (or should be) faster than accessing it from the database.
I measured the time required for both cases in my program and found that accessing data from the cache was much slower than accessing it from the database. I am wondering what the cause(s) might be.
Some points to consider:
I am using Azure Redis Cache.
The main application is on VM instance.
I hosted the MySQL server on another VM instance.
The table is very small with 200-300 records.
There is no error in the time calculation logic.
EDIT:
Load time from cache: about 1.2 s
Load time from MySQL: about 15 ms
It turns out my application and the MySQL server were in the same region, while the Redis cache was in a different region across the globe, which caused much higher latency.
But I would still like someone to explain why the fetch time for SQL was so much smaller.
If the table of 200-300 rows is fully cached in MySQL's buffer pool, then it won't take much time to fetch all of them and send them back to the client. 15 ms is reasonable (though it depends on too many things to be more specific).
If you are fetching one row, and you have an index (especially the PRIMARY KEY) to locate that row, I would expect it to be even faster than 15 ms.
I'm summarizing a 40K-row table; it is taking under 2ms. But note: client and server are on the same machine. 15ms could represent the client and server being a few hundred miles apart.
How long does a simple SELECT 1 take? That will give you a clue about the latency floor you cannot get below without changing the physical location of the machines.
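Something like this quick check will show you the raw round-trip time (Python with mysql-connector; the connection settings are placeholders):

import time
import mysql.connector

cnx = mysql.connector.connect(host="mysql-vm", user="app", password="secret")
cur = cnx.cursor()

# Warm the connection up once, then time a trivial query a few times.
cur.execute("SELECT 1")
cur.fetchall()
for _ in range(5):
    start = time.perf_counter()
    cur.execute("SELECT 1")
    cur.fetchall()
    print("round trip: %.1f ms" % ((time.perf_counter() - start) * 1000))

cnx.close()

Anything your real queries add on top of that number is MySQL's own work; the rest is network.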
I've been developing a high-traffic ad-serving platform for some years now, using a master-master MariaDB cluster with HAProxy in front for balancing relational data queries (read queries go to all of the servers, but writes go to only one, to prevent the servers from going out of sync). By relational data I mean things like campaign settings, user details, and payments. I'm also using Redis for caching some of the less dynamic MySQL data, but I believe there are a lot of opportunities to make better use of it, because as soon as traffic increases I frequently hit bottlenecks like:
too many connections to MySQL
deadlocks (possibly because writes start landing on multiple servers when the main one gets overloaded).
My goal is to move as much of the writes away from MySQL and into Redis, but I'm having a hard time filtering MySQL data based on the counts/budgets stored in Redis, especially in places where a traditional JOIN would be used.
A simplified example of such a MySQL query, which gets the campaign with the highest bid within the user's budget:
SELECT campaigns.id, campaigns.url FROM campaigns
JOIN users ON campaigns.user_id = users.id
ORDER BY LEAST(users.credits, campaigns.bid) DESC
LIMIT 1;
After a click is delivered to that campaign, a budget reduction is immediately needed. Of course, reducing the credits in MySQL is trivial, but as soon as a user starts sending multiple clicks per second, the problems start appearing (mainly deadlocks in a cluster or reaching the maximum number of connections).
Applying the credit reduction in Redis would be preferred, but I have trouble connecting the dots between a bunch of credit records in Redis and filtering and sorting MySQL records based on them.
What would be a good approach to this problem that lets me touch MySQL as little as possible? Or maybe there is a completely different approach I need to take to make this happen.
Any advice or links will be much appreciated.
I would not recommend moving all write requests to Redis, especially for data that needs strong consistency (like payments).
Redis is an in-memory database and does not have the ACID transaction guarantees that MySQL has. So your data still has some chance of being lost after a write to Redis, even if you have AOF enabled, which can leave your data inconsistent.
For your case I think you can integrate a message queue (Kafka, RabbitMQ) to avoid the connection issues and deadlocks:
When a transaction occurs, serialize the request with the data to write and send it to the message queue.
A consumer listens on the MQ at a fixed consume rate (based on your needs) and writes the data into MySQL sequentially (and back into Redis if you need it cached).
On the client side, you can have a thread that polls for the result in a loop until the write has finished. This makes the asynchronous write behave like a synchronous one.
This way you avoid resource contention (like deadlocks), and the fixed consume rate also smooths out the write rate.
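A rough sketch of that pipeline, using kafka-python and mysql-connector (the topic, table and column names are made up, and error handling and batching are omitted):

import json
import time
import mysql.connector
from kafka import KafkaProducer, KafkaConsumer

# Producer side: the ad server just enqueues the click and returns immediately.
producer = KafkaProducer(bootstrap_servers="mq:9092",
                         value_serializer=lambda v: json.dumps(v).encode())

def record_click(campaign_id, user_id, cost):
    producer.send("clicks", {"campaign_id": campaign_id,
                             "user_id": user_id,
                             "cost": cost})

# Consumer side: a single worker drains the topic at a controlled pace and is
# the only writer to MySQL, so connections stay low and deadlocks disappear.
def consume_clicks():
    consumer = KafkaConsumer("clicks",
                             bootstrap_servers="mq:9092",
                             value_deserializer=lambda b: json.loads(b.decode()))
    cnx = mysql.connector.connect(host="db", user="app", password="secret", database="ads")
    cur = cnx.cursor()
    for message in consumer:
        click = message.value
        cur.execute("UPDATE users SET credits = credits - %s WHERE id = %s",
                    (click["cost"], click["user_id"]))
        cnx.commit()
        time.sleep(0.01)   # crude fixed consume rate; tune to what the cluster can absorb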
I am going to be hosting a gaming website for a bunch of card games that are pretty popular in my country. Since there is really no other convenient way to do this apart from web sockets I decided that's the path I'm going to take. Anyhow there are a bunch of concerns that I have.
I plan on having multiple servers for each type of game, and each server is going to host about 100-200 people. With that being said, it is necessary for players to see information about a server before joining, such as how many players are connected and what the average wait time is. To do this I could either use files or a database. I would very much like to go with the database, but I'd like to ask a few questions about MySQL:
I know that MySQL was not built for real-time applications but what is an acceptable interval for each server to update its status in the database?
Are there any problems I may run into when having persistent connections from each server to the MySQL server?
Is there any benefit to preparing the statement that updates the database once and then executing it every N seconds, or should I prepare it each time? I am asking because I don't know exactly what happens when a statement is prepared, so holding a prepared statement on a persistent connection may not be a good idea (see the sketch after these questions for what I mean).
Using InnoDB, is there any need to create a separate MySQL server solely for this purpose, or could I use the server that already runs the site? I'm not really sure whether those updates every N seconds would affect anything.
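To illustrate what I mean by preparing once and re-executing on an interval, roughly this (a Python/mysql-connector sketch; the table, columns and the player/wait counters are made up):

import time
import mysql.connector

SERVER_ID = 1

def count_players():          # placeholder for the game server's own bookkeeping
    return 123

def average_wait_seconds():   # placeholder
    return 4.2

cnx = mysql.connector.connect(host="db", user="game", password="secret", database="lobby")
cur = cnx.cursor(prepared=True)    # the statement is prepared on the server once

UPDATE_STATUS = ("UPDATE server_status "
                 "SET players = %s, avg_wait_seconds = %s "
                 "WHERE server_id = %s")

while True:
    cur.execute(UPDATE_STATUS, (count_players(), average_wait_seconds(), SERVER_ID))
    cnx.commit()
    time.sleep(5)    # the N-second interval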
I'm not sure if caching would be the correct term for this, but my objective is to build a website that will display data from my database.
My problem: There is a high probability of a lot of traffic and all data is contained in the database.
My hypothesized solution: Would it be faster if I created a separate program (in Java, for example) that connects to the database every couple of seconds and updates the HTML files (where the data is displayed) with the new data? (This would also improve security, as users would never connect to the database directly.) Or should I just have each user open a connection to MySQL (using PHP) and fetch the data?
If you've had any experience with a similar situation please share, and I'm sorry if I didn't word the title correctly; this is a pretty specific question and I'm not even sure I explained myself clearly.
Here are some thoughts for you to think about.
First, I do not recommend creating files; trust MySQL, but work on configuring your environment to support your traffic and application.
You should understand your data a little better (How often does the data in your tables change? What kinds of queries are you running against the data? Are your queries optimized?).
Make sure your tables are optimized and indexed correctly, and that all your queries run fast (nothing causing long row locks).
If your tables are not updated very often, you should consider using the MySQL query cache, as this will reduce your I/O and increase query speed. (BUT wait: if your tables are updated all the time, this will kill your server performance big time.)
If your query cache is set to "ON": based on my experience this is almost always a bad idea unless the data in your tables never changes. When it is set to "ON", MySQL will cache every query; then, as soon as the data in a table changes, MySQL has to invalidate the cached queries for that table, and it works harder clearing the cache, which gives you bad performance. I like to keep query_cache_type set to "DEMAND",
so that I can control which queries should be cached and which should not, using the SQL_CACHE and SQL_NO_CACHE hints.
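For example, with query_cache_type = DEMAND only the query that asks for it gets cached (a small Python sketch; the table names are just examples):

import mysql.connector

cnx = mysql.connector.connect(host="db", user="app", password="secret", database="site")
cur = cnx.cursor()

# Under DEMAND, only queries carrying SQL_CACHE are stored in the query cache:
cur.execute("SELECT SQL_CACHE id, name FROM game_types")          # small, rarely-changing table
game_types = cur.fetchall()

# SQL_NO_CACHE skips the cache explicitly (mostly useful when the cache is set to ON):
cur.execute("SELECT SQL_NO_CACHE COUNT(*) FROM page_views")       # hot, frequently-updated table
views = cur.fetchone()

cnx.close()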
Another thing you want to review is your server configuration and specs.
How much physical RAM does your server have?
What type of hard drives are you using? SSDs? If not, at what speed do they rotate? Perhaps 15k RPM?
What OS are you running MySQL on?
How is the RAID set up on your hard drives? RAID 10 or RAID 50 will help you out a lot here.
Your processor speed will also make a big difference.
If you are not using MySQL 5.6.20+, you should consider upgrading, as MySQL has been improved to help you even more.
Is your innodb_buffer_pool_size set to about 75% of your total physical RAM? Are you using InnoDB tables?
You can also use MySQL replication to add read sources for the data. Then you have multiple servers with the same data and can point half of your traffic to read from server A and the other half from server B, so the same workload is handled by multiple servers.
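A bare-bones sketch of that read/write split at the application level (Python; the hostnames are placeholders, and in practice a proxy such as HAProxy or ProxySQL usually does this routing for you):

import itertools
import mysql.connector

WRITE_HOST = "db-primary"
READ_HOSTS = itertools.cycle(["db-replica-a", "db-replica-b"])   # round-robin over the replicas

def connect(for_write=False):
    # Writes always go to the primary; reads are spread over the replicas.
    host = WRITE_HOST if for_write else next(READ_HOSTS)
    return mysql.connector.connect(host=host, user="app", password="secret", database="site")

read_cnx = connect()                    # goes to a replica
write_cnx = connect(for_write=True)     # goes to the primary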
Here is one argument for you to think about: Facebook uses MySQL and handles millions of hits per second, yet they are up practically 100% of the time. True, they have an enormous budget and their network is huge, but the idea here is that you can trust MySQL to get the job done.
I would like to convert my stats-tracking system so that it does not write to the database directly, as we're hitting bottlenecks.
We're currently using memcached for certain aspects of the site, and I wanted to use it for storing stats and committing them to the MySQL DB periodically.
The issue, however, lies in the number of items (in the millions) for which stats could potentially be collected between the cron-job runs that commit them to the database. Other than running SELECT * FROM data and checking for the existence of every single memcache key, and then updating the table... is there any other way to do this?
(I'm not saying the below is gospel; this is just my gut feeling. As I say later on, I don't have the specifics of your system, and obviously no offence is meant.)
I would advise against using memcached for this. Memcached is built to quickly retrieve values that you have stored before, not to be your primary store. The big difference is that if your cache gets full, you will lose data.
Normally you would simply have no data in your cache and would re-collect it from the source, which is impossible in this case. That alone would be reason enough for me to try and dissuade you from this.
Now, you say the major problem is the MySQL connection limit you are hitting. If you do simple stuff (like what we talked about in the comments: INSERT DELAYED), it may just be a case of increasing the limit. You should probably have enough capacity for your scripts/users to go to the database once, say "this should eventually be added", and then go away. If your users can't even open one connection for that, there's a serious resource problem you probably won't fix by adding extra layers of cache.
Obviously it's hard to say without any specs of the system, software and hardware, but my suggestion would be to see if you can just let them open their connections by increasing the limit and tuning the server variables a bit, instead of monkey-patching your system with memcached as an in-between layer.
I had a similar issue with statistics data, but please don't use memcached for it. You can't be sure that ALL your items will be moved to the DB; you can lose data and/or process it twice.
You should analyse your bottleneck in terms of how much data you are writing/reading and how many connections you need, and then switch to something scalable like Hadoop, Cassandra, Scribe or other such systems.
You need to provide additional information on the platform that you are running: O/S, database version, storage engine, RAM, and CPU (if possible).
Are you inserting into a single table or more than one table?
Can you disable the indexes on the tables you are inserting into? Indexes slow down inserts.
Are you running any triggers or stored procedures to compute values as you insert the raw data?