I am using RDS on Amazon with a MySQL interface. My application runs on EC2 nodes and reads and updates the database, but the volume of reads and writes is so high that performance suffers. Most of the time the number of connections exceeds the allowed limit.
I was considering using ElastiCache to improve performance, but I could not find resources on the web explaining how to configure the database to use it effectively.
Is this the best way to improve my read/write performance?
Any suggestions?
You can't just "turn on" memcache. You need to write code that interacts with memcache, so that your database query results are cached there. Take a look at this user guide -- I think it will give you a good idea of how memcache is used:
http://www.memcachier.com/documentation/memcache-user-guide/
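As a rough illustration of "code that interacts with memcache", here is a minimal cache-aside sketch in Python. The InMemoryCache class is a hypothetical stand-in for a real memcache client (it only mimics a get/set interface with a TTL), and cached_query and the key name are made up for the example; a real client library may expose a different API.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a memcache client: get/set with a TTL."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (time.monotonic() + ttl_seconds, value)


def cached_query(cache: InMemoryCache, key: str,
                 run_query: Callable[[], Any], ttl_seconds: float = 60.0) -> Any:
    """Return the cached result for key; hit the database only on a miss."""
    value = cache.get(key)
    if value is None:
        value = run_query()  # the expensive database call
        cache.set(key, value, ttl_seconds)
    return value
```

Note that this sketch never caches a None result, and that any code path that writes to the database also has to delete or refresh the affected keys, otherwise readers see stale data until the TTL expires.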
Performance is related to the type and structure of the queries used, so there might be some room for optimization. Maybe you can provide more details about the exact queries used. However, you could tackle this from a different angle - if you had an auto scaling capability, you could simply scale up your database to additional machines if needed, so you could accommodate an infinite number of connections even without any performance optimization (of course if you optimize it will improve performance). This is not possible on RDS, but there are at least two other cloud DB providers running on EC2 that I'm aware of which offer auto scaling - www.xeround.com and www.enterprisedb.com.
You can use ElastiCache as a second-level cache for your RDS database. If you use Java, you can use the hibernate-memcached library. But you still need to configure how and what to cache in the second-level cache, depending on your data.
Additionally, you could use RDS read replicas to at least split off the read traffic.
(What instance type are you using? Note that they have different I/O capacities.)
I have an application that is hosted in AWS ECS and having the database in AWS RDS. I'm using a microservice-based container architecture for my application. The frontend of the application is in Angular and Backends are in Java and Python. Right now, the database size is ~1GB. The database size will increase day by day as the scraped data will be inserted daily.
Right now, some queries take 4-6 seconds to execute. We need to open this application to the public, and a lot of users will be using it. When we load tested the application with 50 users, the CPU of RDS reached 100% and some queries took more than 60 seconds to execute and then timed out. The CPU and memory of the other microservices (frontend and backend) are normal. I have tried vertically scaling the application up to 64GB RAM and 4 vCPUs, but the condition remains.
Is this an issue with the query or can I do anything with the database server configuration?
The RDS storage I'm using is 100GB of General Purpose SSD. So, I guess there will be only 300 IOPS, right? I'm planning to use RDS read replicas, but before that, is there anything else I should do to improve performance, such as database configuration changes?
I also don't have a good understanding of the MySQL connection count. Right now, it is using a total of 24 connections. Do I need to change the connection count as well?
Query Optimisation
As Tim pointed out, try to optimise the queries. Since more data keeps getting inserted, consider indexing the tables and making the queries use indexed columns where possible. Also consider archiving old, unused data.
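To make the indexing advice concrete, here is a small self-contained sketch. It uses Python's built-in SQLite as a stand-in for MySQL (an assumption made purely so the example runs anywhere; on MySQL you would use EXPLAIN rather than EXPLAIN QUERY PLAN), and the orders table and idx_orders_customer index are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the planner's description of how it would execute sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

lookup = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, lookup, (42,))

# After indexing the filtered column, the planner can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The same before/after comparison is worth doing on the real database for each slow query, since an index only helps when the query actually filters or joins on the indexed column.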
Number of connections
If you have control over the code, you can use database connection pools to cap the number of connections that your applications open.
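A minimal sketch of what a pool does, assuming nothing beyond the Python standard library: a fixed number of connections is opened once and handed out on demand, so the application can never exceed that count. SQLite stands in for MySQL here, and ConnectionPool is a hypothetical helper, not a real driver API; in practice you would normally use the pooling built into your driver or framework.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: at most `size` connections are ever opened."""

    def __init__(self, connect, size: int) -> None:
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout: float = 5.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)


# SQLite is only a stand-in so the sketch runs without a MySQL server.
pool = ConnectionPool(
    lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2
)

with pool.connection() as conn:
    answer = conn.execute("SELECT 1 + 1").fetchone()[0]
```

Callers that arrive while all connections are busy wait (up to the timeout) instead of opening new connections, which is what keeps the total under the database's limit.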
CPU usage
CPU usage is closely tied to query performance. Once you optimise the queries, CPU usage should come down.
Disk usage
Use the CloudWatch metrics to monitor disk usage; based on that, you can decide whether to move to a Provisioned IOPS disk.
Hope this helps.
To give you a little background, I currently have a website that allows users to upload photos. The website was initially on a GoDaddy shared server, but recent surges in traffic have forced me to explore other options. During peak hours, the site has 400+ active visitors, which, combined with user uploads, forces the shared server to shut down.
I have a small amount of experience with setting up servers through AWS and attempted to place the website on a c1.medium instance running Amazon Linux. The website and the MySQL database are on the same instance. While I have read that this is generally frowned upon, I have also read that moving the database to another instance would not significantly increase speed. Unfortunately, the c1.medium instance was also unable to support the traffic, and I soon received an "Error establishing a database connection" message. The site does load on occasion, so the problem stems from the traffic load and not from an actual problem with the database.
My question is whether the problem revolves solely around MySQL? The database itself when backed up is around 250MB. Is the issue caused by input / output requests to the database? I read posts with people with similar problems in which they stated that installing MySQL 5.6 solved the problem, but also have read that MySQL 5.6 is slower than MySQL 5.5, which is my current version.
After conducting some preliminary research I started to believe that I could resolve the problem by increasing the IOPS of the EBS volume. Originally I had the IOPS set to standard, but I changed it to Provisioned IOPS at 30x the size of the EBS volume (i.e., 60GB – 1800 IOPS). This once again appeared to have little impact. Do I need to upgrade my instance? What measures should I focus on when deciding on the instance? It appears that the cheapest instance with high network performance and EBS optimization would be c3.xlarge. Suggestions?
Several things to consider:
1) Separate the database server from the web server
Your database should not share resources with your web server. Both will perform poorly as a result.
Separating them also makes it easier to find the bottleneck.
2) Upgrade to MySQL 5.6
In all the benchmarks that I have seen and done, 5.6 performs better than 5.5.
3) Configure your database to take advantage of your resources
Configure MySQL according to the storage engine and the memory allocated to your machine. For example, set innodb_buffer_pool_size to 70% of the (DEDICATED) RAM.
4) Monitor MySQL and check slow query log
The slow query log shows the queries that are slow and inefficient.
5) Learn to use EXPLAIN
EXPLAIN shows the query plan in MySQL. Run EXPLAIN on slow queries to tune them.
6) Use Key-Value Stores to Cache queries
Use Memcached or Redis to cache query results, so that repeated queries are served from memory instead of hitting your database.
7) Increasing IOPS and Scaling Out
Increasing IOPS and getting better hardware helps, but using efficient queries is much more effective. Most of the time, the queries and the application contribute more to performance problems than the hardware does.
8) Replication
If you still have concurrency issues, consider moving to MySQL Master/Slave replication.
Final Note: use EBS, because the instance storage on EC2 is ephemeral and will not persist.
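For point 3, the 70% rule of thumb is simple arithmetic. The sketch below turns it into a config line; the helper names are hypothetical, the 70% default comes from the answer above, and the result is only a starting point for a host dedicated to MySQL.

```python
def innodb_buffer_pool_bytes(total_ram_gib: float, fraction: float = 0.70) -> int:
    """Suggested innodb_buffer_pool_size in bytes for a dedicated DB host."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return int(total_ram_gib * fraction * 1024 ** 3)


def as_my_cnf_line(size_bytes: int) -> str:
    """Format the value as a my.cnf setting, rounded down to whole MiB."""
    return f"innodb_buffer_pool_size = {size_bytes // 1024 ** 2}M"


# Example: a dedicated host with 8 GiB of RAM.
print(as_my_cnf_line(innodb_buffer_pool_bytes(8)))
```

On RDS the equivalent value is set through the DB parameter group rather than a my.cnf file.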
We recently did extensive research on the performance bottlenecks associated with massive end-user peaks across our global customer base, and the analysis actually indicates the database as - by far - the most frequent cause of slowdowns or even crashes. The report (https://queue-it.com/trend-report) includes best practice advice from our customers on how to improve the situation, which you may find helpful.
I have published my website on Amazon EC2 (Singapore region) and I have used MySQL RDS instance for the data storage. Everything is working very fine except performance.
It seems that all my queries, especially the SELECT statements, are performing very slowly. When I test on my local PC, everything works well, but fetching data from the RDS instance is very slow. Some of the SELECT statements take 2-3 seconds to fetch data.
I have properly tuned all table indexes and normalized/de-normalized as required. I have made all the necessary settings in a custom RDS parameter group (e.g. max_connections, buffers, etc.). I don't know if I am missing something, but it didn't work for me: performance didn't increase.
So, can someone please help me with this issue?
It is worth noting that, for whatever reason, MySQL query cache is OFF by default in RDS. We learned that the hard way ourselves this week.
This won't help performance of your initial query, but it may speed things up in general.
To re-enable query cache:
Log in to the RDS Console
Click on your RDS instance to view its details
Edit the Database Parameter Group
Be sure to set both query_cache_size and query_cache_type
(Disclaimer: I am not a DBA so there may be additional things I'm missing here)
For me, it had nothing to do with MySQL but rather with the instance type I was on, t2.medium. The problem was that I ran out of CPU credits: the load on the DB was too high, and the balance kept going down until, finally, I was earning far fewer credits per hour than were needed.
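The credit mechanics described above can be estimated with a few lines of arithmetic. This is a rough sketch under stated assumptions: one CPU credit is taken to be one vCPU running at 100% for one minute, and the example figures (a t2.medium earning 24 credits per hour with a maximum balance of 576) are my recollection of the AWS burstable-instance tables, so verify them against the current documentation. The function name is hypothetical.

```python
from typing import Optional


def hours_until_depleted(balance: float, earned_per_hour: float,
                         vcpus: int, utilization: float) -> Optional[float]:
    """Hours until the credit balance hits zero, or None if it never does.

    utilization is the average per-vCPU CPU usage as a fraction (0.5 = 50%).
    """
    # One credit = one vCPU-minute at 100%, so an hour at full load costs 60.
    spent_per_hour = vcpus * utilization * 60
    net_drain = spent_per_hour - earned_per_hour
    if net_drain <= 0:
        return None  # earning at least as fast as spending
    return balance / net_drain


# Assumed t2.medium figures: 2 vCPUs, 24 credits/hour, full balance of 576.
print(hours_until_depleted(576, 24, vcpus=2, utilization=0.5))
```

Once the balance reaches zero the instance is throttled to its baseline, which from the application's point of view looks like every query suddenly becoming slow.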
Here is what I saw in RDS CloudWatch under CPU Credit Usage:
If you have the same problem it may be time to switch to a different instance. Here is the list of instance types:
https://aws.amazon.com/rds/instance-types/
Hope this helps.
It is important to have your RDS and EC2 instances not just in the same region but also in the same availability zone to minimize the latency.
I had an API hosted in Ireland on EC2 and moved the Database to a MySQL cluster in Virginia USA that we had set up for another project and the round trip on every SQL query made the API unusable.
RDS MySQL performance can be increased in the following ways, assuming the workload is read-heavy:
Use larger instance types; they come with better network bandwidth. For example, the AWS Quadruple Extra Large instance comes with 1,000 Mbps of bandwidth.
Use Provisioned IOPS (PIOPS) storage; you can extract 12,500 IOPS at 16KB from a MySQL DB.
If lots of reads are performed, add one or more Read Replicas to increase read performance.
Apply standard practices such as tuning the queries, adding indexes, etc.
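Read Replicas only help if the application actually sends its reads to them. Below is a minimal sketch of that routing decision; ReadWriteRouter and the hostnames are hypothetical, and the prefix check is deliberately naive (for example, it would wrongly send SELECT ... FOR UPDATE to a replica unless the caller marks it as part of a transaction).

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Reads rotate over the replicas; everything else goes to the primary."""

    READ_PREFIXES = ("SELECT", "SHOW", "EXPLAIN")

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(list(replicas) or [primary])

    def endpoint_for(self, sql: str, in_transaction: bool = False) -> str:
        is_read = sql.lstrip().upper().startswith(self.READ_PREFIXES)
        # Reads inside a transaction stay on the primary for consistency.
        if is_read and not in_transaction:
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter(
    "primary.example.internal",
    ["replica-1.example.internal", "replica-2.example.internal"],
)
print(router.endpoint_for("SELECT * FROM photos WHERE id = 1"))
print(router.endpoint_for("UPDATE photos SET views = views + 1 WHERE id = 1"))
```

Replicas lag slightly behind the primary, so reads that must see a write that just happened should still go to the primary.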
First, I highly recommend looking over these queries using
SHOW FULL PROCESSLIST
You can read more about it in the MySQL documentation for SHOW FULL PROCESSLIST.
This will show you how long each query is taking.
Then you can use
EXPLAIN
You can read more about it in the MySQL documentation for EXPLAIN.
This will show you whether your queries need tuning.
You can check where a query spends its time by using profiling. Run the following statements:
SET profiling = 1;
(execute your SELECT query)
SHOW PROFILE;
This will list the stages of the query and how much time it spent in each. If the sum of all the times returned by the profiling is less than the actual execution time of the query, then other factors, such as network bandwidth, may be the cause.
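The comparison described above (profiled time versus the time the client actually waited) can be sketched as follows. The function names are hypothetical and the sample rows are made-up illustrative values, not real SHOW PROFILE output; in practice you would fill profile_rows from the Status and Duration columns and measure the wall-clock time in your client.

```python
from typing import Iterable, List, Tuple

ProfileRow = Tuple[str, float]  # (Status, Duration in seconds)


def unaccounted_seconds(profile_rows: Iterable[ProfileRow],
                        wall_clock_seconds: float) -> float:
    """Time the client waited that the profile does not attribute to any stage."""
    server_side = sum(duration for _status, duration in profile_rows)
    return max(0.0, wall_clock_seconds - server_side)


def slowest_stage(profile_rows: List[ProfileRow]) -> ProfileRow:
    """The stage where the server itself spent the most time."""
    return max(profile_rows, key=lambda row: row[1])


# Illustrative values only.
rows = [("starting", 0.25), ("Sending data", 0.5), ("end", 0.25)]
print(slowest_stage(rows))             # where the server spent its time
print(unaccounted_seconds(rows, 2.5))  # likely network or client overhead
```

A large unaccounted figure points away from the query itself and toward latency between the application and the database.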
Always deploy your application and RDS in the same AWS availability zone for lower network latency, and create a private endpoint link in the VPC so that you connect to the RDS endpoint through the internal network instead of routing through the internet.
Reference: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/vpc-interface-endpoints.html
I found that after migrating to RDS all my database indexes were gone! They weren't migrated along with the schema and data. Make sure your indexes are there.
I'm thinking about moving our production environment from a self-hosted solution to Amazon AWS. I took a look at the different services and thought about using RDS as a replacement for our MySQL instances. The hardware we're using for our master seems to be better than the best hardware we can get when using RDS (Quadruple Extra Large DB Instance). Since I can't simply move our production environment to AWS and see if the performance is still good enough, I'd love to run some tests in advance.
I thought about creating a full query log from our current master, configuring the RDS instance and replaying the full query log against it. I don't even know if this kind of testing is a good idea, but I guess you'll tell me if there are better ways to make sure MySQL performance won't drop dramatically when making the move to RDS.
Is there a preferred tool to replay the full query log?
Which metrics should I look at while running the test?
cpu usage?
memory usage?
disk usage?
query time?
anything else?
Thanks in advance
I'd recommend against replaying the query log - it's almost certainly not going to give you the information you want, and will take a significant amount of effort.
Firstly, you'd need to prepare your database so that replaying the query log won't break constraints when inserting, updating or deleting data, and that subsequent "select" queries will find the records they should find. This is distinctly non-trivial on anything other than a toy database - just taking a back-up and replaying the log doesn't necessarily guarantee the ordering of DML statements will match what happened on production. This may well give you a false sense of comfort - all your select statements return in a few milliseconds, because the data they're looking for doesn't exist!
Secondly, load and performance testing rarely works by replaying what happened on production, because that doesn't (usually) reflect the peak conditions that will bring your system to its knees. For instance, most production systems run happily most of the time at <50% capacity, but go through spikes during the day, when they might reach 80% or more of capacity. That's what you care about: whether your new environment can handle the peaks.
My recommendation would be to use a tool like JMeter to write performance scripts (either directly against the database using the JDBC driver, or through the front end if you've got a web application). Your performance scripts should reflect the behaviour you see from users, and be parameterized so they're not dependent on the order in which records are created.
Set yourself some performance targets (ideally based on current production levels, with a multiplier to cover you against spikes), e.g. "100 concurrent users, with no query taking more than 1 second", and use JMeter to simulate that load. If you reach it first time, congratulations - go home! If not, look at the performance counters to see where the bottleneck is, and see if you can alleviate it (or tune your queries; your awesome on-premise hardware may be hiding some performance issues). Typical bottlenecks are CPU, RAM, and disk I/O.
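JMeter is the tool suggested here; purely to illustrate what checking a target like "N concurrent users, no query over 1 second" involves, here is a small standard-library sketch in Python. It is not a replacement for JMeter, run_load_test and meets_target are hypothetical names, and run_query is whatever callable executes one representative, parameterized query against your database.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_load_test(run_query: Callable[[], None], concurrent_users: int,
                  queries_per_user: int) -> List[float]:
    """Run queries from concurrent_users threads; return every latency in seconds."""

    def one_user(_user_id: int) -> List[float]:
        latencies = []
        for _ in range(queries_per_user):
            started = time.perf_counter()
            run_query()
            latencies.append(time.perf_counter() - started)
        return latencies

    # One thread per simulated user, all issuing queries at the same time.
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        per_user = list(pool.map(one_user, range(concurrent_users)))
    return [latency for user in per_user for latency in user]


def meets_target(latencies: List[float], max_seconds: float) -> bool:
    """True when no query exceeded the target (e.g. 1 second)."""
    return max(latencies) <= max_seconds
```

Calling run_load_test(my_query, concurrent_users=100, queries_per_user=20) and passing the result to meets_target(..., 1.0) would check the example target from the text, where my_query is your own function.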
Experiment with different test scenarios - "lots of writes", "lots of reads", "lots of reporting queries", and mix them up.
The idea is to understand the bottlenecks in the system, see how far you are from them, and understand what you can do to alleviate them. Once you know that, your decision to migrate will be far more robust.
Did you try amazon-rds? How is it, performance-wise?
I think this is a hard question to answer as it is highly specific to the problem you are trying to solve, but I will try to give you a picture of what we have seen.
We have been benchmarking RDS using CloudWatch metric gathering tools (provided here: http://aws.amazon.com/articles/2934) and have found it does perform nearly as well as our production servers for our data set. We tested both with a single RDS instance and with a Multi-AZ setup (what we plan to use in production) with no back-up retention.
With the load we have been able to throw at it so far, we can get up into the 1000-1100 Write IOPS range (their metric) even on a small database instance (db.m1.small). At least for our load, increasing the instance class did not affect our throughput in IOPS or bytes. We saw about a 10% reduction in performance when
Amazon freely admitted up front that the solution to really scale out is to subdivide your problem such that you can scale/store it across multiple database servers. We in fact have this in our application (very similar to sharding) and therefore will be able to take advantage and very easily move past this IOPS measurement.
We've found RDS to be pretty comparable performance-wise to having our own production servers (either dedicated or virtual or EC2). Note that you will always suffer some IO/performance degradation using a virtualization solution, which is what RDS seems to be using, and this will show up under heavy load (but with heavy load, you should be having a dedicated MySQL/DB box anyway.)
Take note: the biggest performance hit you will likely see is network latency. If you are reading/writing from an EC2 box to an RDS box and vice versa, the network latency will probably be the bottleneck, particularly for a large number of queries. This is likely to be worse if you are connecting from a non-Amazon/non-EC2 box to RDS.
You will probably get more performance from an equivalent spec physical box than a virtual box, but this is true of dedicated vs EC2/RDS, and is not a RDS-specific problem.
Regarding RDS vs EC2, the defaults that Amazon has set up RDS with seem to be pretty good, so if you are simply looking to have database server(s) up and running and connect to them, RDS is more than suitable. Do make sure you have the cost correctly analyzed though - it's not the same pricing model as, say, an EC2 instance.