Google Compute Engine VM disk is very slow - google-compute-engine

We just switched over to Google Compute Engine and are having major issues with disk speed. It's about 5% of what we saw on Linode, or worse. Writes have never exceeded 20 MB/s and reads 10 MB/s; most of the time it's around 15 MB/s for writes and 5 MB/s for reads.
We're currently running an n1-highmem-4 (4 vCPUs, 26 GB memory) machine. CPU and memory aren't the bottleneck: we're just running a script that reads rows from a PostgreSQL database, processes them, and then writes back to PostgreSQL. It's a routine job that updates database rows in batches. We tried running 20 processes to take advantage of the multiple cores, but overall progress is still slow.
We suspect the disk is the bottleneck because disk traffic is abnormally low.
Finally we decided to do some benchmarking. We found it's not only slow but also seems to have a major, reproducible bug:
create & connect to instance
run the benchmark at least three times:
dd if=/dev/zero bs=1024 count=5000000 of=~/5Gb.file
We found it becomes extremely slow and we aren't able to finish the benchmark at all.

Persistent Disk performance is proportional to the size of the disk itself and scales with the VM that it is attached to. The larger the disk (or the VM), the higher the performance; so, in essence, the price you pay for the disk or the VM covers not only the storage/CPU/RAM but also the IOPS and throughput.
Quoting the Persistent Disk documentation:
Persistent disk performance depends on the size of the volume and the
type of disk you select. Larger volumes can achieve higher I/O levels
than smaller volumes. There are no separate I/O charges as the cost of
the I/O capability is included in the price of the persistent disk.
Persistent disk performance can be described as follows:
IOPS performance limits grow linearly with the size of the persistent disk volume.
Throughput limits also grow linearly, up to the maximum bandwidth for the virtual machine that the persistent disk is attached to.
Larger virtual machines have higher bandwidth limits than smaller virtual machines.
There's also a more detailed pricing chart on the page which shows what you get per GB of space that you buy (data below is current as of August 2014):
                                         Standard disks   SSD persistent disks
Price (USD/GB per month)                 $0.04            $0.025
Maximum Sustained IOPS
  Read IOPS/GB                           0.3              30
  Write IOPS/GB                          1.5              30
  Read IOPS/volume per VM                3,000            10,000
  Write IOPS/volume per VM               15,000           15,000
Maximum Sustained Throughput
  Read throughput/GB (MB/s)              0.12             0.48
  Write throughput/GB (MB/s)             0.09             0.48
  Read throughput/volume per VM (MB/s)   180              240
  Write throughput/volume per VM (MB/s)  120              240
and a concrete example on the page of what a particular size of a disk will give you:
As an example of how you can use the performance chart to determine
the disk volume you want, consider that a 500GB standard persistent
disk will give you:
(0.3 × 500) = 150 small random reads
(1.5 × 500) = 750 small random writes
(0.12 × 500) = 60 MB/s of large sequential reads
(0.09 × 500) = 45 MB/s of large sequential writes

Related

Is there any configuration option for innodb that increase the performance for SSD drives?

InnoDB often assumes data is stored on spinning (spindle) hard drives, so it makes its best effort to reduce random access and opt for sequential access whenever possible. Nowadays, a lot of MySQL instances in production use SSD/Flash drives, so the benefit of these design decisions is gone, and they may cause some overhead.
Does the current development of InnoDB take this into consideration? Is there any configuration option for tuning the performance for SSD drives?
There are a few tunables that directly impact SSD versus spinning drives:
innodb_flush_neighbors = OFF -- since there is no "rotational delay"
innodb_random_read_ahead = OFF
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_page_size - Using a smaller size _may_ help if _all_ tables are
accessed randomly _and_ have small rows. (not for the faint of heart)
I don't have "good" numbers for the numeric values, and they depend on the performance characteristics of the SSD/Flash.
There may be more settings. I don't know, for example, about innodb_read_io_threads and innodb_write_io_threads. (Default 4, max 64.)
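Most of those tunables are dynamic, so you can experiment at runtime before persisting anything to my.cnf. A minimal sketch, using the values above as a starting point (they are assumptions you would still calibrate against your own SSD/Flash):
-- Dynamic InnoDB settings for SSD; values are starting points, not gospel.
SET GLOBAL innodb_flush_neighbors = 0;      -- no rotational delay to optimize for
SET GLOBAL innodb_random_read_ahead = OFF;  -- random read-ahead rarely pays off on SSD
SET GLOBAL innodb_io_capacity = 2000;       -- background flushing budget
SET GLOBAL innodb_io_capacity_max = 4000;   -- ceiling for bursts of flushing
-- innodb_page_size is NOT dynamic: it must be set in my.cnf before the data
-- directory is initialized, so it is omitted here.
The same settings go under [mysqld] in my.cnf so they survive a restart.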
A Battery-Backed Write Cache (BBWC) on the RAID controller makes writes essentially instantaneous; this is also a factor.
8.0 (the latest version) has some "cost" parameters that apply to disk versus RAM access. They are manually tunable. (Sorry, I don't have the details.) I have implored the developers to make them self-tuning.
Keep in mind that if you are "at the limit" of such tunables, you won't have much room before the system collapses.
Keep in mind that optimizing indexes and queries often gives the biggest bang for your buck.

IOPS or Throughput? - Determining Write Bottleneck in Amazon RDS Instance

We have nightly load jobs that write several hundred thousand records to a MySQL reporting database running in Amazon RDS.
The load jobs take several hours to complete, but I am having a hard time figuring out where the bottleneck is.
The instance is currently running with General Purpose (SSD) storage. Looking at the CloudWatch metrics, it appears I am averaging less than 50 IOPS over the last week. However, Network Receive Throughput is less than 0.2 MB/sec.
Is there any way to tell from this data whether I am being bottlenecked by network latency (we are currently loading the data from a remote server...this will change eventually) or by write IOPS?
If IOPS is the bottleneck, I can easily upgrade to Provisioned IOPS. But if network latency is the issue, I will need to redesign our load jobs to load raw data from EC2 instances instead of our remote servers, which will take some time to implement.
Any advice is appreciated.
UPDATE:
More info about my instance: I am using an m3.xlarge instance provisioned with 500 GB of storage. The load jobs are done with Pentaho's ETL tool. They pull from multiple (remote) source databases and insert into the RDS instance using multiple threads.
You aren't using much CPU. Your free memory is very low; an instance with more memory should be a good win.
You're only doing 50-150 IOPS. That's low; you should get 3,000 in a burst on General Purpose (SSD) storage. However, if your allocated storage is small, that is probably hurting you, since the baseline is 3 IOPS per GB of storage; so if you are on a 50 GB or smaller volume, consider paying for Provisioned IOPS.
You might also try Aurora; it speaks mysql, and supposedly has great performance.
If you can spread out your writes, the spikes will be smaller.
A very quick test is to buy provisioned IOPS, but be careful as you may get fewer than you do currently during a burst.
Another quick way to find your bottleneck is to profile the job execution application with a profiler that understands your database driver. If you're using Java, JProfiler will show the characteristics of your job and its use of the database.
A third is to configure your database driver to print statistics about the database workload. This might inform you that you are issuing far more queries than you would expect.
Your most likely culprit accessing the database remotely is actually round-trip latency. The impact is easy to overlook or underestimate.
If the remote database has, for example, a 75 millisecond round-trip time, you can't possibly execute more than 1000 (milliseconds/sec) / 75 (milliseconds/round trip) = 13.3 queries per second if you're using a single connection. There's no getting around the laws of physics.
The spikes suggest inefficiency in the loading process, where it gathers for a while, then loads for a while, then gathers for a while, then loads for a while.
Separate but related: if you don't have the MySQL client/server compression protocol enabled on the client side, find out how to enable it. (The server always supports compression, but the client has to request it during the initial connection handshake.) This won't fix the core problem, but it should improve the situation somewhat, since less data to physically transfer means less time wasted in transit.
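Once the client is connecting with compression requested (for example with the mysql CLI's --compress option, or your connector's equivalent setting), you can confirm it was actually negotiated from the SQL side. A minimal check, assuming a MySQL version of that era that still exposes the Compression status variable:
-- Returns ON when the current connection negotiated the compression protocol.
SHOW SESSION STATUS LIKE 'Compression';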
I'm not an RDS expert, and I don't know if my own particular case can shed some light, but I hope it gives you some kind of insight.
I have a db.t1.micro with 200 GB provisioned (which gives me 600 IOPS of baseline performance) on General Purpose SSD storage.
The heaviest workload is when I aggregate thousands of records from a pool of around 2.5 million rows drawn from a 10-million-row table and another table of 8 million rows. I do this every day. This is what I average (it is steady performance, unlike yours, where I see a pattern of spikes):
Write/ReadIOPS: +600 IOPS
NetworkTrafficReceived/Transmit throughput: < 3,000 Bytes/sec (my queries are relatively short)
Database connections: 15 (workers aggregating on parallel)
Queue depth: 7.5 counts
Read/Write Throughput: 10MB per second
The whole aggregation task takes around 3 hours.
Also check the "10 Tips to Improve the Performance of Your App in AWS" SlideShare deck from AWS Summit 2014.
I don't know what else to say since I'm not an expert! Good luck!
In my case it was the number of records. I was writing only 30 records per minute and had write IOPS of roughly the same, around 20 to 30. But this was eating at the CPU, which drained the CPU credits quite steeply. So I took all the data in that table, moved it to another "historic" table, and cleared all the data in the original table.
CPU dropped back down to normal levels, but write IOPS stayed about the same, which was fine. The problem: indexes. I think that because so many records needed to be indexed on each insert, it took far more CPU to maintain the index with that many rows, even though the only index I had was the primary key.
Moral of my story: the problem is not always where you think it lies. Although I had increased write IOPS, that was not the root cause; it was the CPU spent on index maintenance during inserts that caused the CPU credits to fall.
Not even X-Ray on the Lambda could catch the increased query time. That is when I started to look at the DB directly.
Your Queue Depth graph shows > 2, which indicates that the IOPS are under-provisioned. (If Queue Depth stays < 2, the IOPS are over-provisioned.)
I think you have used the default AUTOCOMMIT = 1 (autocommit mode). It performs a log flush to disk for every insert and exhausts the IOPS.
So, for performance it is better to set AUTOCOMMIT = 0 before bulk inserts of data in MySQL, so that the insert batch looks like:
SET AUTOCOMMIT = 0;
-- first 10000 recs
START TRANSACTION;
INSERT INTO SomeTable (column1, column2) VALUES (vala1,valb1),(vala2,valb2) ... (vala10000,valb10000);
COMMIT;
-- next 10000 recs
START TRANSACTION;
INSERT INTO SomeTable (column1, column2) VALUES (vala10001,valb10001),(vala10002,valb10002) ... (vala20000,valb20000);
COMMIT;
-- next 10000 recs
.
.
.
SET AUTOCOMMIT = 1;
Using the above approach on a t2.micro, I inserted 300,000 records in 15 minutes using PHP.

Improving MySQL I/O Performance (Hardware & Partitioning)

I need to improve I/O performance for my database. I'm using the "2xlarge" HW described below & considering upgrading to the "4xlarge" HW (http://aws.amazon.com/ec2/instance-types/). Thanks for the help!
Details:
CPU usage is fine (usually under 30%), and uptime load averages range from 0.5 to 2.0 (but I believe I'm supposed to divide that by the number of CPUs), so that looks okay as well. However, the I/O is bad: iostat shows favorable service times, but the time spent in queue (I suppose this means waiting to access the disk) is far too high. I've configured MySQL to flush to disk every 1 second instead of on every write, which helps, but not enough. Profiling shows that a handful of tables are responsible for most of the load (both read and write operations). Queries are already indexed and optimized, but not partitioned. Average MySQL states are: Sending data ~45%, Statistics ~20%, Updating ~15%, Sorting result ~8%.
Questions:
How much performance will I get by upgrading HW?
Same question, but if I partition the high-load tables?
Machines:
m2.2xlarge
64-bit
4 vCPU
13 ECU
34.2 GB Mem
EBS-Optimized
Network Performance: "Moderate"
m2.4xlarge
64-bit
8 vCPU
26 ECU
68.4 GB Mem
EBS-Optimized
Network Performance: "High"
In my experience, the biggest boost in MySQL performance comes from I/O. You have a lot of RAM. Try setting up a RAM drive and pointing tmpdir to it.
I have several MySQL servers that are very busy. My settings are below - maybe this can help you tweak your settings.
My setup is:
- Dual 2.66 GHz CPUs, 8 cores, with a 6-drive RAID-1E array (1.3 TB).
- InnoDB logs on separate SSD drives.
- tmpdir on a 2 GB tmpfs partition.
- 32 GB of RAM.
InnoDB settings:
innodb_thread_concurrency=16
innodb_buffer_pool_size = 22G
innodb_additional_mem_pool_size = 20M
innodb_log_file_size = 400M
innodb_log_files_in_group=8
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 2 (this is a slave machine; 1 is not required for my purposes)
innodb_flush_method=O_DIRECT
Current Queries per second avg: 5185.650
I am using Percona Server, which in my testing is quite a bit faster than other MySQL builds.
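If you want to compare a running server against the settings above without digging through my.cnf, they can be read back at runtime; a minimal check, assuming a reasonably recent MySQL or Percona Server:
-- Read the effective values on the running server.
SELECT @@innodb_buffer_pool_size,
       @@innodb_log_file_size,
       @@innodb_flush_log_at_trx_commit,
       @@innodb_flush_method;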

Amazon RDS CPU Utilization due to COUNT query

I have published my website on Amazon EC2 (Singapore region) and I have used MySQL RDS medium instance for data storage in the same region.
In my case, most of the SELECT queries involve some COUNT functionality. These queries are very slow. I have already created appropriate indexes on the tables, and I used the EXPLAIN command to analyze the queries; it shows that full table scans are being done to get the results.
On my RDS medium instance, I have configured the custom parameter group with the following settings.
log_queries_not_using_index = true,
slow_query_log = true,
long_query_time = 2 sec,
max_connections = 303,
innodb_buffer_pool_size = {DBInstanceClassMemory*3/4}
Yesterday my CPU utilization went above 95% and my site crashed due to this. There was no major increase in traffic.
Also, I dumped the data on my local system, and tested one of the COUNT queries. While it takes about 1.5 seconds for it to run on RDS, it takes only about 400 milliseconds for it to run on my local system. The configuration on my local system (4GB RAM, Intel core 2 duo 2.8GHz) is:
max_connections = 100,
slow_query_log = true,
long_query_time = 2 sec,
innodb_buffer_pool_size = 72351744
So, what could be the reason for the spike in CPU utilization as well as the difference in performance times between RDS and my local system?
Thanks,
It depends on the table size. The RDS instance uses EBS to store the data, so if you're doing a table scan, it has to pull the data from EBS rather than from a locally cached, in-memory structure, and then scan it. So you're likely seeing the added latency of the network between the RDS instance, where the CPU resides, and the EBS data in the SAN. When you run the same query on your local computer, the only lag is the disk head seek time.
Then there is the difference in CPU time: an m1.medium has less CPU capacity (and therefore less ability to scan the results) than the Core 2 Duo, based on Amazon's definition of EC2 compute units.
HTH. In general, I'd try to avoid COUNTs in your queries, as they are terribly inefficient (as you've seen) and can and will continue to cause nasty, undesired results when the DB is under varying levels of real-time load.
R
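Building on that advice: when the COUNTs cannot be avoided entirely, one common mitigation is to make them index-only so the scan stays in the buffer pool instead of reaching out to EBS. A minimal sketch with hypothetical table and column names (orders, status, created_at are placeholders, not from the original question):
-- Hypothetical: a secondary index that covers the COUNT's WHERE clause,
-- so EXPLAIN shows "Using index" instead of a full table scan.
ALTER TABLE orders ADD INDEX idx_status_created (status, created_at);

SELECT COUNT(*)
FROM orders
WHERE status = 'shipped'
  AND created_at >= '2014-01-01';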

Heavy MySQL usage: CPU or memory?

I have an Amazon EC2 instance, and the project we have on the server does a lot of INSERTs and UPDATEs and a few complex SELECTs.
We are finding that MySQL quite often takes up a lot of the CPU.
I am trying to establish whether more memory or more CPU would be the better upgrade for the setup below.
Below is an output of cat /proc/meminfo
MemTotal: 7347752 kB
MemFree: 94408 kB
Buffers: 71932 kB
Cached: 2202544 kB
SwapCached: 0 kB
Active: 6483248 kB
Inactive: 415888 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 168264 kB
Writeback: 0 kB
AnonPages: 4617848 kB
Mapped: 21212 kB
Slab: 129444 kB
SReclaimable: 86076 kB
SUnreclaim: 43368 kB
PageTables: 54104 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 3673876 kB
Committed_AS: 5384852 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 180 kB
VmallocChunk: 34359738187 kB
Current Setup:
High-CPU Extra Large Instance
7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform, I/O Performance: High, API name: c1.xlarge
Possible Setup:
High-Memory Double Extra Large Instance
34.2 GB of memory, 13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform, I/O Performance: High, API name: m2.2xlarge
I would go for the higher-memory instance and maybe more hard disks in RAID. CPU won't help that much - you have enough CPU power. You also need to configure MySQL correctly.
Leave 1-2 GB for OS cache and for temp tables.
Increase tmp_table_size
remove swap
optimize query_cache_size (don't make it too big - see mysql documentation about it)
periodically run FLUSH QUERY CACHE. If your query cache is < 512 MB, run it every 5 minutes (a scheduling sketch follows the quoted docs below). This doesn't empty the cache, it optimizes (defragments) it. This is from the MySQL docs:
Defragment the query cache to better utilize its memory. FLUSH QUERY CACHE does not remove any queries from the cache, unlike FLUSH TABLES or RESET QUERY CACHE.
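One way to run that defragmentation every 5 minutes without an external cron job is MySQL's own event scheduler; a minimal sketch, assuming the event scheduler is enabled, a non-zero query_cache_size, and a server version that permits FLUSH inside events (the query cache itself was removed in MySQL 8.0):
-- Prerequisite (once): SET GLOBAL event_scheduler = ON;
CREATE EVENT defrag_query_cache
  ON SCHEDULE EVERY 5 MINUTE
  DO FLUSH QUERY CACHE;  -- defragments the cache; does not empty it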
However, I noticed that the other option has half the disk space (850 GB), which might mean a reduced number of hard disks. That's generally a bad idea: the biggest problem in databases is the hard disks. If you use RAID 5, make sure you don't end up with fewer disks. If you don't use RAID at all, I would suggest RAID 0.
Use vmstat and iostat to find out whether CPU or I/O is the bottleneck (if I/O - add more RAM and load data into memory). Run from shell and check results:
vmstat 5
iostat -dx 5
If CPU is the problem, vmstat will show high values in the us column and iostat will show low disk utilization (%util).
If I/O is the problem, vmstat will show low values in the us column and iostat will show high disk utilization (%util); by high I mean > 50%.
It depends on the application.
You could use memcached to cache MySQL query results. This would ease CPU usage a bit, but with this method you would want more RAM for storing the cached results.
On the other hand if it's not feasible based on type of application then I would recommend higher CPU.
There are not many reasons for MySQL to use a lot of CPU: it is either the processing of stored routines (stored procedures or stored functions) or sorting that eats CPU.
If you are using a lot of CPU due to stored routines, you are doing it wrong and your soul cannot be saved anyway.
If you are using a lot of CPU due to sorting going on, some things can be done, depending on the nature of your queries: You can extend indexes to include the ORDER BY columns at the end, or you can drop the ORDER BY clauses and sort in the client.
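For the first option, the point is to let the index hand back rows already in the requested order so no filesort is needed. A minimal sketch with hypothetical names (events, user_id, created_at are placeholders, not from the original post):
-- Hypothetical: a query that filters on user_id and sorts by created_at.
-- Appending the ORDER BY column to the index lets InnoDB avoid the sort
-- (EXPLAIN should no longer show "Using filesort").
ALTER TABLE events ADD INDEX idx_user_created (user_id, created_at);

SELECT *
FROM events
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 100;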
Which approach to choose depends on the actual cause of the CPU usage (is it queries and sorting?) and on the actual queries. So in any case you will need better monitoring first.
Not having monitoring information, the general advice is always: Buy more memory, not more CPU for a database.
Doesn't the on-demand nature of EC2 make it rather straightforward to rent the possible setup for a day, and do some load testing? Measurements speak louder than words.
Use "High-CPU Extra Large Instance".
In your current setup, MySQL is not constrained by memory:
MemTotal: 7347752 kB
MemFree: 94408 kB
Buffers: 71932 kB
Cached: **2202544 kB**
Out of 7 GB of memory, roughly 2 GB is not being used by MySQL; the OS is using it as I/O cache.
In this case, increasing CPU count would give you more bang for buck.