How to identify AWS RDS Scaling conditions

How to identify AWS RDS Scaling conditions - mysql

I have an application that is hosted in AWS ECS and having the database in AWS RDS. I'm using a microservice-based container architecture for my application. The frontend of the application is in Angular and Backends are in Java and Python. Right now, the database size is ~1GB. The database size will increase day by day as the scraped data will be inserted daily.
Right now, some queries are taking 4-6 seconds to execute. We need to host this application to the public and there are a lot of users will be using the application. So when we load tested the application with 50 users, I found that the CPU of RDS reached 100% and some queries had taken more than 60 seconds to execute and then timed-out. Also, the CPU and memory of other microservices (frontend and backend) are normal. I have tried vertically scaling the application up to 64GB RAM and 4 vCPUs but still, this condition remains.
Is this an issue with the query or can I do anything with the database server configuration?
The RDS storage I'm using is 100GB with a general-purpose SSD. So, I guess there will be only 300 IOPS, right? I'm planning to use RDS read replicas but before that, I need to know is there anything that I need to do for improving the performance? Any database configurations etc?
I also not have a good idea about the MySQL connection count. Right now, it is using a total of 24 connections. Do I need to change the connection count also?

Query Optimisation
As Tim pointed out, try and optimise the queries. Since you have more data getting inserted, consider indexing the table and make the queries to use indexed columns if possible. also consider archiving unused old data.
number of connections
If you have control over the code, you can make use of database pools to control the number of connections that your applications can use.
CPU usage
the CPU usage is highly related to the performance of the queries, once you optimise the queries, the CPU usage should come down.
disk usage
Use the cloudwatch metrics to monitor the disk usage, based on that, you can decide on a provisioned IOPS disk.
hope this helps.

Related

AWS Architecture of CPU intensive MySQL searching and querying

I'm posed with a simple problem of scaling. As college students, we are new to setting up AWS services, droplets, scaling etc. We are stuck at deciding the architecture of our app. I'm unable to decide whether to use a big computing AMAZON EC2 or smaller multiple instances for benchmarking performance.
Even after code optimization, our MySQL queries are not as fast as we want it to be and clearly our hardware will address this problem. We are looking for high performance servers which require mostly searching a lot of MySQL FULL INDEXED search queries over 1.3M records (which clearly is a CPU and Memory intensive task). We intend to switch over to Solr at a later point of time. Both these tasks are very CPU demanding and RAM dependent. As of now, we are running our web app stack entirely on a single CPU with 2 cores and 4 GB RAM. However, we now wish to split the load up into multiple, say 5 instances/droplets of each 2 cores and 4 GB RAM.
Our primary concern is that, If we did create multiple ec2 instances/ droplets, wouldn't there be a considerable overhead for communicating between the instances/droplets for a standard MySQL search. As far as I know, the MySQL connection uses sockets to connect to local/remote host. Being a remote communication between 4 servers, I would expect significant overhead for EACH query.
For instance, let's say I've setup 4 instances and I've been allocated these IP's for each of them.
Server 1- x.x.x.1
Server 2- x.x.x.2
Server 3 - x.x.x.3
Server 4 - x.x.x.4
I setup a MySQL server and dump my databases into each of these instances (sounds like a very bad idea). Now I make a MySQL connection using python as:
conn1 = MySQLdb.connect(host=server1,user=user,passwd=password,db=db)
conn2 = MySQLdb.connect(host=server2,user=user,passwd=password,db=db)
conn3 = MySQLdb.connect(host=server3,user=user,passwd=password,db=db)`
conn4 = MySQLdb.connect(host=server4,user=user,passwd=password,db=db)
Since, each of these databases arn't on the localhost, I would guess that there is a huge overhead involved in contacting the server and getting the data for each query.
My Thoughts:
I'm guessing there must be a solution for integrating different droplets/ instances together. Unfortunately, I haven't found any resources to support that claim.
I've looked into Amazon RDS, which seems like a good fit. But again, I wouldn't be able to benchmark against a 4 instances MySQL search or a single huge AWS RDS server (given, it is quite expensive for new apps.)
We are also unsure of replacement of python with popular languages for scaling such as Scala which will help me tackle this problem of dealing with multiple servers.
Any suggestions will be greatly appreciated by our 3 member team :)

Will changing from a cloud server to a dedicated server improve MySQL performance?

I currently have a cloud server with 2 CPU's and 2GB RAM, Ive noticed that MySQL performs SELECT queries quite slowly on a table size greater than 250,000 rows unless the queries are cached. The table has all the correct indexes and my queries are optimized.
I have run the same MySQL setup on my personal computer and the queries are much faster. So I am thinking its more to do with the server itself rather than the MySQL queries or table setup.
I am considering upgrading the cloud server to 4GB Ram and 4CPU's or switching to a dedicated server with the same spec.
Will changing from a cloud server to a dedicated server improve MySQL performance?

Though Cloud computing utilizes the SAN storage, you will be sharing the resources like CPU, RAM and Storage among number of VMs. This may cause the slowness in I/O operations and hence in your MySQL queries.
In dedicated server, all resources will be allocated to your applications only. Thus, switching to a dedicated server will certainly improve the MySQL server performance. Also, when you have database driven application, SSD drives are highly recommended.

2GB is tiny by today's norms. Ar you using InnoDB or MyISAM? What is the cache size? Does it have SSDs?
It sounds like the table is bigger than can be cached, and that the slow query is scanning the entire table. That says that the query and/or indexes are not well optimized. Let's see the SELECT and SHOW CREATE TABLE; more tuning may be possible.
Does the Cloud limit your IOPs? That mighe explain the difference. And it would further indicate that we should study the query.

Given your query takes 20 seconds to run on Cloud environment, we can rule out network latency problems. To find out the bottleneck. Try using "top" while running the large query on cloud.
If CPU utilization is high, change to a high CPU type, maybe higher memory as well (you can see memory utilization with "top" as well).
If the %CPU of idle or wait is high, it basically suggests underlying disk is the bottleneck here. You can verify this by running some "fio" test, or use "iostat" and monitor the device usage for the query. The solution is adding more disk space or use a SSD device on cloud.

I will prefer cloud always. when it come to dedicated server there you can't find elasticity on server usage. choose a good cloud architecture to enhance the application performance. you have set proper auto scaling, devops methods.
You can also use a remote database to connect your application in order to avoid I/O request to application server.

Amazon Linux EC2 Webserver / MYSQL Upgrade – Traffic causing error establishing a database connection

To give you a little background, I currently have a website that allows users to upload photos. The website was initially on a GoDaddy shared server, but recent surges in traffic have forced me to explore other options. During peak hours, the site contains 400+ active visitors, which when combined with user uploads, forces the shared server to shut down.
I have a small amount of experience with setting up servers through AWS and attempted to place the website on a c1.medium instance, Amazon Linux. The website along with the MYSQL Database is on the same instance. While I have read that this is in general frowned upon, I have similarly read that moving the database to another instance would not significantly increase speeds. Unfortunately, the c1.medium instance also was unable to support the traffic and I soon received an error Establishing a Database connection. The site does load on occasion, so the problem stems from the traffic load and not an actual problem with the database.
My question is whether the problem revolves solely around MySQL? The database itself when backed up is around 250MB. Is the issue caused by input / output requests to the database? I read posts with people with similar problems in which they stated that installing MySQL 5.6 solved the problem, but also have read that MySQL 5.6 is slower than MySQL 5.5, which is my current version.
After conducting some preliminary research I started to believe that I could resolve the problem by increasing the IPOS of the EBS. Originally I had it set the IPOS as standard, but changed it to Provisioned IOPS and 30x the size of the EBS (i.e., 60GB – 1800 IOPS). This once again appeared to have little impact. Do I need to upgrade my instance? What measures should I be focused on when deciding on the instance? It appears that the cheapest instance with high network performance and EBS optimized would be c3.xlarge. Suggestions?

Several things to consider:
1)Separate the database server from the web server
Your database should not share resources with your web server. They will both perform poorly as the result.
It is easier to find what the bottle-neck is.
2) Upgrade to MySQL 5.6
In all the benchmarks that I have seen and done 5.6 performs better than 5.5
3) Configure your database to take advantage of your resources
Depending on the storage engine and the memory allocated in your machine configure MySQL for example set innodb_buffer_pool_size to 70% of the (DEDICATED) RAM
4) Monitor MySQL and check slow query log
Slow query log shows the queries that are slow and inefficient
5) Learn to use EXPLAIN
EXPLAIN shows query plan in MySQL run EXPLAIN on slow queries to tune them
6) Use Key-Value Stores to Cache queries
Using Memcached or Redis cache queries so they don't hit your database and return repeated queries from the memory
7) Increasing IOPS and Scaling Out
Increasing IOPS and getting better hardware helps but using efficient queries is much more effective. Queries and application most of the time are a greater contributing factor to performance problems
8) Replication
To help with concurrency consider moving to a MySQL Master/Slave replication , if you still had issues.
Final Note: use EBS because the storage on EC2 is ephemeral and will not persistent.

We recently did extensive research on the performance bottlenecks associated with massive end-user peaks across our global customer base, and the analysis actually indicates the database as - by far - the most frequent cause of slowdowns or even crashes. The report (https://queue-it.com/trend-report) includes best practice advice from our customers on how to improve the situation, which you may find helpful.

Amazon RDS MySQL instance performs very slow

I have published my website on Amazon EC2 (Singapore region) and I have used MySQL RDS instance for the data storage. Everything is working very fine except performance.
I seems that, my all queries, especially the select statement, is performing very slowly. If I check this issue on my local PC, there it is working very well. But when I am trying to get data from RDS instance, it is very slow. Some of the select statements takes 2-3 seconds to fetch data.
I have properly tuned up all table indexes, and normalized/de-normalized as required. I have made all necessary settings on RDS custom parameter group (eg. max_connection, buffer etc). I don't know if I am missing something, but it didn't work for me - performance didn't increase.
So, can someone please help me with this issue?

It is worth noting that, for whatever reason, MySQL query cache is OFF by default in RDS. We learned that the hard way ourselves this week.
This won't help performance of your initial query, but it may speed things up in general.
To re-enable query cache:
Log in to the RDS Console
Click on your RDS instance to view it's details
Edit the Database Parameter Group
Be sure to set both query_cache_size and query_cache_type
(Disclaimer: I am not a DBA so there may be additional things I'm missing here)

For me, it was nothing to do with MySQL but rather the instance type I was on t2.medium. The problem is I ran out of CPU credits because the load on the DB was too high and the balance kept going down until finally, I was getting far fewer credits hourly than were needed.
Here is what I saw in RDS CloudWatch under CPU Credit Usage:
If you have the same problem it may be time to switch to a different instance. Here is the list of instance types:
https://aws.amazon.com/rds/instance-types/
Hope this helps.

It is important to have your RDS and EC2 instances not just in the same region but also in the same availability zone to minimize the latency.
I had an API hosted in Ireland on EC2 and moved the Database to a MySQL cluster in Virginia USA that we had set up for another project and the round trip on every SQL query made the API unusable.

RDS MySQL performance can be increased in following ways assuming the system has more read ratio:
Use Larger instance types, they come with better NW bandwidth. Example AWS Quadruple EXL comes with 1,000 Mbps bandwidth.
Use PIOPS storage you can extract 12,500 IOPS of 16KB from MySQL DB
If lots of read is performed, add one or more Read Replica's to increase read performance
Apply standard practices like: Tune the queries, apply the indexes etc

First i highly recommend to look over these queries using
SHOW FULL PROCESSLIST
You can read more about it on SHOW FULL PROCESSLIST
This will show you the time each query take.
Then you can use
EXPLAIN
You can read more about it on EXPLAIN
This will show you if you need some enhancement on your queries

You can check where the query is taking time by making use of profiling. Use the below query:
set profiling=1
execute your select query
show profile
This will tell you about the status of the query and where the query is spending its time. If the sum of all the time returned by the profiling is less than the actual execution time of the query, then maybe other factors like Network bandwidth may be the cause of it.

Always should deploy source and rds in the same AWS availability zone for lower network latency and Should create a private endpoint link in VPC for RDS to connect RDS endpoint through the internal network instead of routing through the internet.
Reference: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/vpc-interface-endpoints.html

I found that after migrating to RDS all my database Indexes are gone! They weren't migrated along with the schema and data. Make sure you're indexes are there.

MySQL vs SQLite on Amazon EC2

I have a Java program and PHP website I plan to run on my Amazon EC2 instance with an EBS volume. The program writes to and reads from a database. The website only reads from the same database.
On AWS you pay for the amount of IOPS (I/O requests Per Second) to the volume. Which database has the least IOPS? Also, can SQLite handle queries from both the program and website simultaneously?

The amount of IO is going to depend a lot on how you have MySQL configured and how your application uses the database. Caching, log file sizes, database engine, transactions, etc. will all affect how much IO you do. In other words, it's probably not possible to predict in advance although I'd guess that SQLite would have more disk IO simply because the database file has to be opened and closed all the time while MySQL writes and reads (in particular) can be cached in memory by MySQL itself.
This site, Estimating I/O requests, has a neat method for calculating your actual IO and using that to estimate your EBS costs. You could run your application on a test system under simulated loads and use this technique to measure the difference in IO between a MySQL solution and a SQLite solution.
In practice, it may not really matter. The cost is $0.10 per million IO requests. On a medium-traffic e-commerce site with heavy database access we were doing about 315 million IO requests per month, or $31. This was negligible compared to the EC2, storage, and bandwidth costs which ran into the thousands. You can use the AWS cost calculator to plug in estimates and calculate all of your AWS costs.
You should also keep in mind that the SQLite folks only recommend that you use it for low to medium traffic websites. MySQL is a better solution for high traffic sites.

Yes SQLite can handle queries from both the program and website simultaneously. SQLite uses file level locking to ensure consistency.
In memory SQLite is intended for standalone or embedded programs.
Do not use in memory only SQLite:
when you share the db between multiple processes
when you have a php based website in which case you won't be able to leverage php fastcgi

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008