MySQL vs SQLite on Amazon EC2

I have a Java program and PHP website I plan to run on my Amazon EC2 instance with an EBS volume. The program writes to and reads from a database. The website only reads from the same database.
On AWS you pay for the number of I/O requests made to the volume. Which database would generate the fewest I/O requests? Also, can SQLite handle queries from both the program and the website simultaneously?

The amount of IO is going to depend a lot on how you have MySQL configured and how your application uses the database. Caching, log file sizes, database engine, transactions, etc. will all affect how much IO you do. In other words, it's probably not possible to predict in advance, although I'd guess that SQLite would do more disk IO, simply because the database file has to be opened and closed all the time, while MySQL reads (in particular) and writes can be cached in memory by MySQL itself.
This site, Estimating I/O requests, has a neat method for calculating your actual IO and using that to estimate your EBS costs. You could run your application on a test system under simulated loads and use this technique to measure the difference in IO between a MySQL solution and a SQLite solution.
In practice, it may not really matter. The cost is $0.10 per million IO requests. On a medium-traffic e-commerce site with heavy database access we were doing about 315 million IO requests per month, which at $0.10 per million comes to roughly $31. This was negligible compared to the EC2, storage, and bandwidth costs, which ran into the thousands. You can use the AWS cost calculator to plug in estimates and calculate all of your AWS costs.
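As a rough sketch (not from the linked page; the volume ID, region, and one-day sampling window are placeholders you would adjust), you could sum a day of EBS read/write operations from CloudWatch with boto3 and extrapolate the monthly cost at $0.10 per million requests:
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
end = datetime.datetime.utcnow()
start = end - datetime.timedelta(days=1)        # one day under simulated load

total_ops = 0
for metric in ("VolumeReadOps", "VolumeWriteOps"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EBS",
        MetricName=metric,
        Dimensions=[{"Name": "VolumeId", "Value": "vol-0123456789abcdef0"}],
        StartTime=start,
        EndTime=end,
        Period=3600,                            # hourly buckets
        Statistics=["Sum"],
    )
    total_ops += sum(p["Sum"] for p in stats["Datapoints"])

monthly_ops = total_ops * 30                    # extrapolate one day to a month
print("Estimated monthly EBS I/O cost: $%.2f" % (monthly_ops / 1e6 * 0.10))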
You should also keep in mind that the SQLite folks only recommend that you use it for low to medium traffic websites. MySQL is a better solution for high traffic sites.

Yes, SQLite can handle queries from both the program and the website simultaneously. SQLite uses file-level locking to ensure consistency.
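As a side note, here is a minimal sketch (purely illustrative; the file name and the items table are made up, and the actual writer in the question is a Java program) of opening a file-backed SQLite database so that a writer and concurrent readers cooperate, using WAL mode and a busy timeout:
import sqlite3

# Placeholder file and schema; the PRAGMAs behave the same from any client library.
conn = sqlite3.connect("app.db", timeout=30)    # wait up to 30s if the database is locked
conn.execute("PRAGMA journal_mode=WAL")         # readers no longer block the writer
conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, price REAL)")

with conn:                                      # commits on success, rolls back on error
    conn.execute("INSERT OR REPLACE INTO items (id, price) VALUES (?, ?)", (42, 9.99))

for row in conn.execute("SELECT id, price FROM items"):
    print(row)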
In-memory SQLite is intended for standalone or embedded programs.
Do not use in-memory-only SQLite:
when you share the database between multiple processes
when you have a PHP-based website, in which case you won't be able to leverage PHP FastCGI

Related

How to identify AWS RDS Scaling conditions

I have an application hosted in AWS ECS with its database in AWS RDS. I'm using a microservice-based container architecture for my application. The frontend of the application is in Angular and the backends are in Java and Python. Right now, the database size is ~1GB. The database size will increase day by day as scraped data is inserted daily.
Right now, some queries are taking 4-6 seconds to execute. We need to open this application to the public, and a lot of users will be using it. When we load tested the application with 50 users, I found that the CPU of RDS reached 100% and some queries took more than 60 seconds to execute and then timed out. The CPU and memory of the other microservices (frontend and backend) are normal. I have tried vertically scaling the application up to 64GB RAM and 4 vCPUs, but the condition remains.
Is this an issue with the queries, or can I do anything with the database server configuration?
The RDS storage I'm using is 100GB with a general-purpose SSD. So, I guess there will be only 300 IOPS, right? I'm planning to use RDS read replicas, but before that, I need to know whether there is anything I can do to improve performance. Any database configuration changes, etc.?
I also don't have a good idea about the MySQL connection count. Right now, it is using a total of 24 connections. Do I need to change the connection count as well?
Query optimisation
As Tim pointed out, try to optimise the queries. Since more data is being inserted every day, consider indexing the table and making the queries use indexed columns where possible. Also consider archiving unused old data.
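For instance (hypothetical table and column names, shown with MySQLdb since it appears elsewhere on this page), adding an index and confirming that a query uses it might look like:
import MySQLdb

# scraped_items / scraped_at are made-up names; substitute your real table and column.
conn = MySQLdb.connect(host="your-rds-endpoint", user="user", passwd="password", db="db")
cur = conn.cursor()

cur.execute("CREATE INDEX idx_scraped_at ON scraped_items (scraped_at)")

# EXPLAIN should now show the new key (type ref/range) instead of a full table scan (type ALL).
cur.execute("EXPLAIN SELECT * FROM scraped_items WHERE scraped_at >= %s", ("2023-01-01",))
for row in cur.fetchall():
    print(row)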
Number of connections
If you have control over the code, you can use database connection pools to control the number of connections that your applications open.
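On the Python side, for example, a small pool (sketched here with mysql-connector-python; the endpoint, credentials, and pool size are placeholders) caps how many connections the service can open:
from mysql.connector import pooling

# Size the pool so the total across all your microservices stays within
# max_connections on the RDS instance.
pool = pooling.MySQLConnectionPool(
    pool_name="app_pool",
    pool_size=8,
    host="your-rds-endpoint",
    user="user",
    password="password",
    database="db",
)

conn = pool.get_connection()          # borrow a connection from the pool
try:
    cur = conn.cursor()
    cur.execute("SELECT 1")
    print(cur.fetchone())
finally:
    conn.close()                      # returns the connection to the pool rather than dropping it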
CPU usage
CPU usage is closely tied to query performance; once you optimise the queries, the CPU usage should come down.
Disk usage
Use CloudWatch metrics to monitor disk usage; based on that, you can decide whether you need a provisioned IOPS volume.
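As a rough sketch (the DB instance identifier and region are placeholders), you could pull the peak read/write IOPS for the last 24 hours with boto3 and compare them to the ~300 IOPS gp2 baseline mentioned in the question:
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
end = datetime.datetime.utcnow()
start = end - datetime.timedelta(hours=24)

for metric in ("ReadIOPS", "WriteIOPS"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-db-instance"}],
        StartTime=start,
        EndTime=end,
        Period=300,                   # 5-minute buckets
        Statistics=["Maximum"],
    )
    peak = max((p["Maximum"] for p in stats["Datapoints"]), default=0)
    print("%s peak over 24h: %.0f" % (metric, peak))
# If the peaks sit near the volume's baseline, provisioned IOPS (or more gp2 storage) will help.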
hope this helps.

AWS Architecture of CPU intensive MySQL searching and querying

I'm posed with a simple problem of scaling. As college students, we are new to setting up AWS services, droplets, scaling, etc. We are stuck deciding on the architecture of our app: I'm unable to decide whether to use one big Amazon EC2 instance or multiple smaller ones when benchmarking performance.
Even after code optimization, our MySQL queries are not as fast as we want them to be, and better hardware should clearly address this problem. We are looking for high-performance servers; the workload is mostly a lot of MySQL FULLTEXT-indexed search queries over 1.3M records (which is clearly a CPU- and memory-intensive task). We intend to switch over to Solr at a later point in time. Both these tasks are very CPU-demanding and RAM-dependent. As of now, we are running our web app stack entirely on a single machine with 2 cores and 4 GB RAM. However, we now wish to split the load across multiple instances/droplets, say 5, each with 2 cores and 4 GB RAM.
Our primary concern is that, if we did create multiple EC2 instances/droplets, wouldn't there be considerable overhead in communicating between the instances/droplets for a standard MySQL search? As far as I know, a MySQL connection uses sockets to connect to the local/remote host. With remote communication between four servers, I would expect significant overhead for EACH query.
For instance, let's say I've set up 4 instances and have been allocated these IPs for each of them:
Server 1 - x.x.x.1
Server 2 - x.x.x.2
Server 3 - x.x.x.3
Server 4 - x.x.x.4
I set up a MySQL server and dump my databases into each of these instances (sounds like a very bad idea). Now I make a MySQL connection using Python as:
import MySQLdb

# One connection per instance; server1..server4 hold the IPs listed above.
conn1 = MySQLdb.connect(host=server1, user=user, passwd=password, db=db)
conn2 = MySQLdb.connect(host=server2, user=user, passwd=password, db=db)
conn3 = MySQLdb.connect(host=server3, user=user, passwd=password, db=db)
conn4 = MySQLdb.connect(host=server4, user=user, passwd=password, db=db)
Since none of these databases is on localhost, I would guess that there is a huge overhead involved in contacting the server and getting the data for each query.
My Thoughts:
I'm guessing there must be a way to integrate different droplets/instances together. Unfortunately, I haven't found any resources to support that.
I've looked into Amazon RDS, which seems like a good fit, but again I wouldn't be able to benchmark a four-instance MySQL search against a single huge AWS RDS server (given that it is quite expensive for new apps).
We are also unsure whether replacing Python with a language popular for scaling, such as Scala, would help us tackle this problem of dealing with multiple servers.
Any suggestions will be greatly appreciated by our 3 member team :)

Will changing from a cloud server to a dedicated server improve MySQL performance?

I currently have a cloud server with 2 CPUs and 2 GB RAM. I've noticed that MySQL performs SELECT queries quite slowly on a table larger than 250,000 rows unless the queries are cached. The table has all the correct indexes and my queries are optimized.
I have run the same MySQL setup on my personal computer and the queries are much faster. So I am thinking it's more to do with the server itself rather than the MySQL queries or table setup.
I am considering upgrading the cloud server to 4 GB RAM and 4 CPUs, or switching to a dedicated server with the same spec.
Will changing from a cloud server to a dedicated server improve MySQL performance?
Though cloud computing uses SAN storage, you share resources like CPU, RAM, and storage with a number of other VMs. This can cause slowness in I/O operations and hence in your MySQL queries.
On a dedicated server, all resources are allocated to your application alone, so switching to a dedicated server will certainly improve MySQL performance. Also, for a database-driven application, SSD drives are highly recommended.
2GB is tiny by today's norms. Are you using InnoDB or MyISAM? What is the cache size? Does it have SSDs?
It sounds like the table is bigger than can be cached, and that the slow query is scanning the entire table. That says that the query and/or indexes are not well optimized. Let's see the SELECT and SHOW CREATE TABLE; more tuning may be possible.
Does the cloud limit your IOPS? That might explain the difference. And it would further indicate that we should study the query.
Given that your query takes 20 seconds to run in the cloud environment, we can rule out network latency problems. To find the bottleneck, try using "top" while running the large query on the cloud server.
If CPU utilization is high, change to a high-CPU instance type, maybe with more memory as well (you can see memory utilization with "top" too).
If the idle or iowait CPU percentage is high, it suggests the underlying disk is the bottleneck. You can verify this by running an "fio" test, or by using "iostat" to monitor device usage during the query. The solution is adding more disk space or using an SSD device in the cloud.
I would always prefer cloud. With a dedicated server you don't get elasticity in server usage. Choose a good cloud architecture to enhance application performance, and set up proper auto scaling and DevOps practices.
You can also connect your application to a remote database in order to keep database I/O requests off the application server.

Performance effects of moving mysql db to another Amazon EC2 instance

We have an EC2 instance running both Apache and MySQL at the moment. I am wondering if moving MySQL to another EC2 instance will increase or decrease the performance of the site. I am more worried about network speed issues between the two instances.
EC2 instances in the same availability zone are connected via a 10,000 Mbps network - that's faster than a good solid state drive on a SATA-3 interface (6Gb/s)
You won't see any performance drop by moving the database to another server; in fact you'll probably see a performance increase because each server gets its own memory and CPU cores.
If your worry is network latency then forget about it - not a problem on AWS in the same availability zone.
Another consideration is that you're probably storing your website and database files on an EBS-mounted volume. That EBS block storage lives off-instance, so you're already talking to a storage array over the same super-fast 10 Gbps network.
So what I'm saying is... with EBS your website and database are already talking across the network to get their data, so putting them on separate instances won't really change anything in that respect, besides giving more resources to both servers. More resources mean more data cached locally in memory and better performance.
The answer depends largely on what resources Apache and MySQL are using. They can happily co-habit if demands on your website are low and each is configured with enough memory that it doesn't get pushed out to swap. In that case, they are best kept together.
As traffic grows, or your application grows, you will benefit from splitting them out, because they can then both run inside dedicated memory. Provided that the instances are in the same region, you should see fast performance between them. I have even run a web application in Europe with the DB in the USA and performance wasn't noticeably bad! I wouldn't recommend that, though!
Because AWS is easy and cheap, your best bet is to set it up and benchmark it!

Scalability comparison between different DBMSs

By what factor does the performance (read queries/sec) increase when a machine is added to a cluster of machines running either:
a Bigtable-like database
MySQL?
Google's research paper on Bigtable suggests that "near-linear" scaling can be achieved with Bigtable. This page here, featuring MySQL's marketing jargon, suggests that MySQL is capable of scaling linearly.
Where is the truth?
Having built and benchmarked several applications using VoltDB, I consistently measure between 90% and 95% of additional transactional throughput as each new server is added to the cluster. So if an application performs 100,000 transactions per second (TPS) on a single server, I measure 190,000 TPS on 2 servers, 280,000 TPS on 3 servers, and so on. At some point we expect the server-to-server networking to become a bottleneck, but our largest cluster (30 servers) is still above 90%.
If you don't do that many writes to the database, MySQL may be a good and easy solution, especially if coupled with memcached in order to increase the read speed.
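As a sketch of that pattern (the host, credentials, table, and TTL are illustrative only; this uses the python-memcached client alongside MySQLdb), a simple read-through cache might look like:
import memcache
import MySQLdb

# Hypothetical host, credentials, table and TTL; adjust to your setup.
cache = memcache.Client(["127.0.0.1:11211"])
db = MySQLdb.connect(host="dbhost", user="user", passwd="password", db="db")

def get_article(article_id):
    key = "article:%d" % article_id
    row = cache.get(key)                              # cache hit avoids the database entirely
    if row is None:
        cur = db.cursor()
        cur.execute("SELECT title, body FROM articles WHERE id = %s", (article_id,))
        row = cur.fetchone()
        cache.set(key, row, time=60)                  # cache the result for 60 seconds
    return row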
OTOH, if your data is constantly changing, you should probably look somewhere else:
Cassandra
VoltDB
Riak
MongoDB
CouchDB
HBase
These systems have been designed to scale linearly with the number of computers added to the system.
A full list is available here.