AWS Architecture of CPU intensive MySQL searching and querying

AWS Architecture of CPU intensive MySQL searching and querying - mysql

I'm posed with a simple problem of scaling. As college students, we are new to setting up AWS services, droplets, scaling etc. We are stuck at deciding the architecture of our app. I'm unable to decide whether to use a big computing AMAZON EC2 or smaller multiple instances for benchmarking performance.
Even after code optimization, our MySQL queries are not as fast as we want it to be and clearly our hardware will address this problem. We are looking for high performance servers which require mostly searching a lot of MySQL FULL INDEXED search queries over 1.3M records (which clearly is a CPU and Memory intensive task). We intend to switch over to Solr at a later point of time. Both these tasks are very CPU demanding and RAM dependent. As of now, we are running our web app stack entirely on a single CPU with 2 cores and 4 GB RAM. However, we now wish to split the load up into multiple, say 5 instances/droplets of each 2 cores and 4 GB RAM.
Our primary concern is that, If we did create multiple ec2 instances/ droplets, wouldn't there be a considerable overhead for communicating between the instances/droplets for a standard MySQL search. As far as I know, the MySQL connection uses sockets to connect to local/remote host. Being a remote communication between 4 servers, I would expect significant overhead for EACH query.
For instance, let's say I've setup 4 instances and I've been allocated these IP's for each of them.
Server 1- x.x.x.1
Server 2- x.x.x.2
Server 3 - x.x.x.3
Server 4 - x.x.x.4
I setup a MySQL server and dump my databases into each of these instances (sounds like a very bad idea). Now I make a MySQL connection using python as:
conn1 = MySQLdb.connect(host=server1,user=user,passwd=password,db=db)
conn2 = MySQLdb.connect(host=server2,user=user,passwd=password,db=db)
conn3 = MySQLdb.connect(host=server3,user=user,passwd=password,db=db)`
conn4 = MySQLdb.connect(host=server4,user=user,passwd=password,db=db)
Since, each of these databases arn't on the localhost, I would guess that there is a huge overhead involved in contacting the server and getting the data for each query.
My Thoughts:
I'm guessing there must be a solution for integrating different droplets/ instances together. Unfortunately, I haven't found any resources to support that claim.
I've looked into Amazon RDS, which seems like a good fit. But again, I wouldn't be able to benchmark against a 4 instances MySQL search or a single huge AWS RDS server (given, it is quite expensive for new apps.)
We are also unsure of replacement of python with popular languages for scaling such as Scala which will help me tackle this problem of dealing with multiple servers.
Any suggestions will be greatly appreciated by our 3 member team :)

Related

How to identify AWS RDS Scaling conditions

I have an application that is hosted in AWS ECS and having the database in AWS RDS. I'm using a microservice-based container architecture for my application. The frontend of the application is in Angular and Backends are in Java and Python. Right now, the database size is ~1GB. The database size will increase day by day as the scraped data will be inserted daily.
Right now, some queries are taking 4-6 seconds to execute. We need to host this application to the public and there are a lot of users will be using the application. So when we load tested the application with 50 users, I found that the CPU of RDS reached 100% and some queries had taken more than 60 seconds to execute and then timed-out. Also, the CPU and memory of other microservices (frontend and backend) are normal. I have tried vertically scaling the application up to 64GB RAM and 4 vCPUs but still, this condition remains.
Is this an issue with the query or can I do anything with the database server configuration?
The RDS storage I'm using is 100GB with a general-purpose SSD. So, I guess there will be only 300 IOPS, right? I'm planning to use RDS read replicas but before that, I need to know is there anything that I need to do for improving the performance? Any database configurations etc?
I also not have a good idea about the MySQL connection count. Right now, it is using a total of 24 connections. Do I need to change the connection count also?

Query Optimisation
As Tim pointed out, try and optimise the queries. Since you have more data getting inserted, consider indexing the table and make the queries to use indexed columns if possible. also consider archiving unused old data.
number of connections
If you have control over the code, you can make use of database pools to control the number of connections that your applications can use.
CPU usage
the CPU usage is highly related to the performance of the queries, once you optimise the queries, the CPU usage should come down.
disk usage
Use the cloudwatch metrics to monitor the disk usage, based on that, you can decide on a provisioned IOPS disk.
hope this helps.

Amazon Linux EC2 Webserver / MYSQL Upgrade – Traffic causing error establishing a database connection

To give you a little background, I currently have a website that allows users to upload photos. The website was initially on a GoDaddy shared server, but recent surges in traffic have forced me to explore other options. During peak hours, the site contains 400+ active visitors, which when combined with user uploads, forces the shared server to shut down.
I have a small amount of experience with setting up servers through AWS and attempted to place the website on a c1.medium instance, Amazon Linux. The website along with the MYSQL Database is on the same instance. While I have read that this is in general frowned upon, I have similarly read that moving the database to another instance would not significantly increase speeds. Unfortunately, the c1.medium instance also was unable to support the traffic and I soon received an error Establishing a Database connection. The site does load on occasion, so the problem stems from the traffic load and not an actual problem with the database.
My question is whether the problem revolves solely around MySQL? The database itself when backed up is around 250MB. Is the issue caused by input / output requests to the database? I read posts with people with similar problems in which they stated that installing MySQL 5.6 solved the problem, but also have read that MySQL 5.6 is slower than MySQL 5.5, which is my current version.
After conducting some preliminary research I started to believe that I could resolve the problem by increasing the IPOS of the EBS. Originally I had it set the IPOS as standard, but changed it to Provisioned IOPS and 30x the size of the EBS (i.e., 60GB – 1800 IOPS). This once again appeared to have little impact. Do I need to upgrade my instance? What measures should I be focused on when deciding on the instance? It appears that the cheapest instance with high network performance and EBS optimized would be c3.xlarge. Suggestions?

Several things to consider:
1)Separate the database server from the web server
Your database should not share resources with your web server. They will both perform poorly as the result.
It is easier to find what the bottle-neck is.
2) Upgrade to MySQL 5.6
In all the benchmarks that I have seen and done 5.6 performs better than 5.5
3) Configure your database to take advantage of your resources
Depending on the storage engine and the memory allocated in your machine configure MySQL for example set innodb_buffer_pool_size to 70% of the (DEDICATED) RAM
4) Monitor MySQL and check slow query log
Slow query log shows the queries that are slow and inefficient
5) Learn to use EXPLAIN
EXPLAIN shows query plan in MySQL run EXPLAIN on slow queries to tune them
6) Use Key-Value Stores to Cache queries
Using Memcached or Redis cache queries so they don't hit your database and return repeated queries from the memory
7) Increasing IOPS and Scaling Out
Increasing IOPS and getting better hardware helps but using efficient queries is much more effective. Queries and application most of the time are a greater contributing factor to performance problems
8) Replication
To help with concurrency consider moving to a MySQL Master/Slave replication , if you still had issues.
Final Note: use EBS because the storage on EC2 is ephemeral and will not persistent.

We recently did extensive research on the performance bottlenecks associated with massive end-user peaks across our global customer base, and the analysis actually indicates the database as - by far - the most frequent cause of slowdowns or even crashes. The report (https://queue-it.com/trend-report) includes best practice advice from our customers on how to improve the situation, which you may find helpful.

MySql localhost vs Amazon RDS instance

I have surprise with some of mysql performance.
When I run simple query 'SELECT 1;' on my local host (mysql 5.6.x) using workbench, its execute in 0.000s, but the same query I ran on Amazon RDS (medium mysql 5.5.x) it tooks almost 0.094s.
I can not understand this behavior of mysql.

I would propose that you go for simplicity of maintenance and scalability (which RDS apparently provides much better than local MySQL) over performance for now.
Later on, when you get insufficient output for dollar paid for Amazon, you could start measuring carefully to find bottlenecks.
Nonetheless, if you are used to maintain private tightly packed VPS servers — local MySQL could be more simple to maintain, and you should only go for external services much later :)

The query SELECT 1 nearly requires no parsing and no table access so its execution is quick. For remote servers however there's also the time to transmit the request and shared resources like RDS are not real-time resources, so it might take a millisecond or two to get the task executed. If there's no bigger difference then just ignore this little extra time.

MySQL vs SQLite on Amazon EC2

I have a Java program and PHP website I plan to run on my Amazon EC2 instance with an EBS volume. The program writes to and reads from a database. The website only reads from the same database.
On AWS you pay for the amount of IOPS (I/O requests Per Second) to the volume. Which database has the least IOPS? Also, can SQLite handle queries from both the program and website simultaneously?

The amount of IO is going to depend a lot on how you have MySQL configured and how your application uses the database. Caching, log file sizes, database engine, transactions, etc. will all affect how much IO you do. In other words, it's probably not possible to predict in advance although I'd guess that SQLite would have more disk IO simply because the database file has to be opened and closed all the time while MySQL writes and reads (in particular) can be cached in memory by MySQL itself.
This site, Estimating I/O requests, has a neat method for calculating your actual IO and using that to estimate your EBS costs. You could run your application on a test system under simulated loads and use this technique to measure the difference in IO between a MySQL solution and a SQLite solution.
In practice, it may not really matter. The cost is $0.10 per million IO requests. On a medium-traffic e-commerce site with heavy database access we were doing about 315 million IO requests per month, or $31. This was negligible compared to the EC2, storage, and bandwidth costs which ran into the thousands. You can use the AWS cost calculator to plug in estimates and calculate all of your AWS costs.
You should also keep in mind that the SQLite folks only recommend that you use it for low to medium traffic websites. MySQL is a better solution for high traffic sites.

Yes SQLite can handle queries from both the program and website simultaneously. SQLite uses file level locking to ensure consistency.
In memory SQLite is intended for standalone or embedded programs.
Do not use in memory only SQLite:
when you share the db between multiple processes
when you have a php based website in which case you won't be able to leverage php fastcgi

Utilizing 5 physical servers in 1 cluster

I have 5 physical servers with low end memory & cpu resources. I want to create 1 cluster using all these servers and want to run mysql db on the same such that mysql db would utilize 5 server's CPU power to execute db queries & same for memory. Could you please help me understanding how to achieve this?
Regards,

You should be aware that MySQL clustering is primarily for availability, not performance. With low end servers, your performance will not be nearly as good as one server with the same aggregate cpu/memory resources.

MySQL doesn't scale up like that. The only effective way to scale up MySQL's performance is more memory.
However, if you have the resources to re-tool your application and your database will partition well at the table level, then it's possible to have your app connect to multiple databases depending on what tables it needs. You basically will want an ORM layer (such as Hibernate) that can be taught to associated an object's table with a particular database.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008