MySQL I/O tuning on SAN environment

I have a MySQL instance:
MySQL version: 5.6.16-enterprise-commercial-advanced-log
MySQL Engine: InnoDB
MySQL Data Size: 35GB (including 9GB of indexes)
It is running on:
VM: Red Hat Enterprise Linux Server release 5.9 (Tikanga)
File system: ext3
Storage technology: SAN
Disk data format: RAID-5
Disk type: SAS with Fibre channel
I found that a lot of SELECT queries take a long time because of I/O-related operations (even though the necessary indexes and buffers have been added):
mysql> show profile for query 1;
+----------------------+------------+
| Status | Duration |
+----------------------+------------+
| starting | 0.000313 |
| checking permissions | 0.000024 |
| checking permissions | 0.000018 |
| Opening tables | 0.000086 |
| init | 0.000121 |
| System lock | 0.000092 |
| optimizing | 0.000079 |
| statistics | 0.000584 |
| preparing | 0.000070 |
| executing | 0.000014 |
| Sending data | 202.362338 |
| end | 0.000068 |
| query end | 0.000027 |
| closing tables | 0.000049 |
| freeing items | 0.000124 |
| logging slow query | 0.000135 |
| cleaning up | 0.000057 |
+----------------------+------------+
Are the following network latency and throughput figures good enough for the DB instance described above?
$ time dd if=/dev/zero of=foobar bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 1.22617 seconds, 33.4 MB/s
real 0m1.233s
user 0m0.002s
sys 0m0.049s
$ time dd if=foobar of=/dev/null bs=4k count=10000
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 0.026479 seconds, 1.5 GB/s
real 0m0.032s
user 0m0.004s
sys 0m0.024s
$ time dd if=/dev/zero of=foobar bs=128K count=10000
10000+0 records in
10000+0 records out
1310720000 bytes (1.3 GB) copied, 78.1099 seconds, 16.8 MB/s
real 1m18.241s
user 0m0.012s
sys 0m1.117s
$ time dd if=foobar of=/dev/null bs=128K count=10000
10000+0 records in
10000+0 records out
163840000 bytes (164 MB) copied, 0.084886 seconds, 1.9 GB/s
real 0m0.101s
user 0m0.002s
sys 0m0.083s
$ time dd if=/dev/zero of=foobar bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 461.587 seconds, 22.7 MB/s
real 7m42.700s
user 0m0.017s
sys 0m8.229s
$ time dd if=foobar of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 4.63128 seconds, 2.3 GB/s
real 0m4.634s
user 0m0.003s
sys 0m4.579s
Would the following changes give positive results in the context of MySQL I/O tuning?
Setting innodb_flush_method to O_DSYNC (see http://bugs.mysql.com/bug.php?id=54306 regarding read-heavy workloads)
Moving from the ext3 to the XFS file system

It's very hard to answer your question, because with performance problems, the answer is generally 'it depends'. Sorry.
The first thing you need to do is understand what's actually going on, and why your performance is lower than expected. There's a variety of tools for that, especially on a Linux system.
First off, grab a benchmark of your system read and write performance.
The simple test I tend to use is to time a dd:
time dd if=/dev/zero of=/your/san/mount/point/testfile bs=1M count=100
time dd if=/your/san/mount/point/testfile of=/dev/null bs=1M count=100
(increase the 100 to 1000 if it's quick to complete). This will give you an idea of sustained throughput of your storage system.
Testing IO operations per second is a similar thing - do the same, but use a small block size and a large count. 4k block size, 10,000 as the count - again, if it goes a bit too quick, increase the number.
This will get you an estimate of IOPs and throughput of your storage subsystem.
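As a rough illustration of how to read a dd result, the sketch below turns the byte count, elapsed time and block size from one run into throughput and operations per second. The function name `dd_stats` is hypothetical, and the input figures are the 4k write run quoted in the question. Note that dd without direct I/O goes through the page cache, so these numbers (reads especially) are best treated as an upper bound rather than true device IOPs.

```python
def dd_stats(total_bytes: int, seconds: float, block_size: int) -> dict:
    """Derive throughput and operations per second from a single dd run.

    Hypothetical helper for illustration; it only does arithmetic on the
    numbers dd prints and says nothing about caching effects.
    """
    ops = total_bytes // block_size  # number of blocks transferred
    return {
        "throughput_mb_s": total_bytes / seconds / 1_000_000,  # dd reports decimal MB
        "ops_per_s": ops / seconds,
    }


# Figures from the 4k write run in the question: 40960000 bytes in 1.22617 s.
stats = dd_stats(40_960_000, 1.22617, 4096)
print(f"{stats['throughput_mb_s']:.1f} MB/s, {stats['ops_per_s']:.0f} ops/s")
```

Comparing the small-block and large-block runs this way makes it easier to see whether a workload is limited by operations per second or by raw throughput.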
Now, you haven't been specific as to what type of disks, and number of spindles you're using. As an extremely rough rule of thumb, you should expect 75 IOPs from a SATA drive, 150 from an FC or SAS drive, and 1500 from an SSD before performance starts to degrade.
However, as you're using a RAID-5, you need to consider the write penalty of RAID-5 - which is 4. That means your RAID-5 needs 4 ops to do one write IO. (There is no read penalty, but for obvious reasons, your 'parity' drive doesn't count as a spindle).
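To make the write penalty concrete, here is a minimal sketch of the usual back-of-the-envelope estimate. It assumes the rule of thumb above (one disk's worth of capacity goes to parity and is not counted as a spindle, and each write costs 4 backend operations). The function name and the 5-disk array are hypothetical examples, not details from the question.

```python
def raid5_effective_iops(disks: int, iops_per_disk: float,
                         read_fraction: float, write_penalty: int = 4) -> float:
    """Estimate front-end IOPs a RAID-5 group can sustain for a read/write mix.

    Assumption (from the rule of thumb in the text): disks - 1 spindles
    contribute, reads cost 1 backend op, writes cost `write_penalty` ops.
    """
    if disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw_iops = (disks - 1) * iops_per_disk
    write_fraction = 1.0 - read_fraction
    return raw_iops / (read_fraction + write_fraction * write_penalty)


# Hypothetical 5-disk SAS group at 150 IOPs per drive.
for mix in (1.0, 0.5, 0.0):
    print(f"{mix:.0%} reads: {raid5_effective_iops(5, 150, mix):.0f} IOPs")
```

The same array that handles 600 pure-read IOPs drops to 150 for pure writes, which is why the read/write mix matters so much when sizing RAID-5.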
How does your workload look? Mostly reads, mostly writes? How many IOPs? And how many spindles? In all honesty, it's more likely that the root of your problem is your expectations of the storage subsystem.

Related

RDS mysql database storage space available is less than the instance allocation

My available storage doesn't seem to match up with the Instance storage size in RDS.
When I run:
SELECT table_schema "database_name",
sum( data_length + index_length ) / 1024 /
1024 "Database Size in MB",
sum( data_free )/ 1024 / 1024 "Free Space in MB"
FROM information_schema.TABLES
GROUP BY table_schema ;
I get:
+--------------------+----------------------+------------------+
| database_name      | Database Size in MB  | Free Space in MB |
+--------------------+----------------------+------------------+
| fx                 | 6787.34375000        | 3239.00000000    |
| information_schema | 0.21875000           | 0.00000000       |
| mysql              | 10.04687500          | 0.00000000       |
| performance_schema | 0.00000000           | 0.00000000       |
+--------------------+----------------------+------------------+
So the total (used plus free) is about 10 GB.
But the storage I have provisioned in RDS for this database instance is 29 GB (about 3 times more than the space I actually have).
This is after I've cleared the slow query log and general log.
Can someone clarify the discrepancy here? At the moment I'm risking running out of space.
Thanks
Turns out it was the general error log filling up - I had some batch delete jobs that could potentially run thousands of times daily, which was causing this to get big fast. Switching this off and purging it stopped the storage space continuously dropping. Hope this helps someone in future (Probably just me again).
This is also probably the best write up of options to try:
https://aws.amazon.com/premiumsupport/knowledge-center/view-storage-rds-mysql-mariadb/

MySQL Database Repair Taking Extremely Long

I have a MySQL MyISAM table which is approximately 7GB in size and has quite a few indexes. The table got corrupted yesterday and I have had a MySQL repair running for 12+ hours now.
I would like to know how long a MySQL repair actually takes for such a table. (I can't get the exact number of rows and size at the moment, because the repair is running.)
The variables I used are :
| myisam_max_sort_file_size | 9223372036853727232  |
| myisam_mmap_size          | 18446744073709551615 |
| myisam_recover_options    | FORCE                |
| myisam_repair_threads     | 1                    |
| myisam_sort_buffer_size   | 268435456            |
| read_buffer_size          | 67108864             |
| read_only                 | OFF                  |
| read_rnd_buffer_size      | 4194304              |
I was not able to change any of the global variables because I am using GoDaddy Managed Hosting.
The repair has always been "Repair by sorting" as seen by state.
Is there any other way I can speed up this repair process??
Thank you
Edit:
My memory and CPU usage can be seen in the image below
I have also tried restoring the database from a 2-day-old backup (onto a new database); it has also been stuck on "Repair with keycache" on the same table for the past 5 hours.
I have tried mysqlcheck and REPAIR TABLE, but not myisamchk, as I cannot access the specific database folder in /var/lib/mysql (it gives a Permission Denied error). Running myisamchk with no arguments also gives "command not found".
It should take minutes. If it hasn't finished after 12 hours, it probably hung and is never going to finish.
MyISAM hasn't really been maintained in over a decade, and it is quite likely you hit a bug. You might stand a better chance with myisamchk if you can get your hands on the raw database files.

Reduce the size of MySQL NDB binlog

I am running NDB Cluster and I see that on mysql api nodes, there is a very big binary log table.
+---------------------------------------+--------+-------+-------+------------+---------+
| CONCAT(table_schema, '.', table_name) | rows | DATA | idx | total_size | idxfrac |
+---------------------------------------+--------+-------+-------+------------+---------+
| mysql.ndb_binlog_index | 83.10M | 3.78G | 2.13G | 5.91G | 0.56 |
Is there any recommended way to reduce the size of that without breaking anything? I understand that this will limit the time frame for point-in-time recovery, but the data is growing out of hand and I need to do a bit of cleanup.
It looks like this is possible. I don't see anything here: http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-replication-pitr.html that says you can't based on the last epoch.
Some additional information might be gained by reading this article:
http://www.mysqlab.net/knowledge/kb/detail/topic/backup/id/8309
The mysql.ndb_binlog_index is a MyISAM table. If you are cleaning it,
make sure you don't delete entries of binary logs that you still need.

MySQL LIMIT x,y performance: huge difference between 2 machines

I have a query on an InnoDB item table which contains 400k records (only...). I need to page the result for the presentation layer (60 per page), so I use LIMIT with values depending on the page to display.
The query is (the 110000 offset is just an example):
SELECT i.id, sale_type, property_type, title, property_name, latitude,
longitude,street_number, street_name, post_code,picture, url,
score, dw_id, post_date
FROM item i WHERE picture IS NOT NULL AND picture != ''
AND sale_type = 0
ORDER BY score DESC LIMIT 110000, 60;
Running this query on my machine takes about 1s.
Running this query on our test server is 45-50s.
The EXPLAIN output is the same on both:
+----+-------------+-------+-------+---------------+-----------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-----------+---------+------+--------+-------------+
| 1 | SIMPLE | i | index | NULL | IDX_SCORE | 5 | NULL | 110060 | Using where |
+----+-------------+-------+-------+---------------+-----------+---------+------+--------+-------------+
The only configuration differences reported by SHOW VARIABLES are:
innodb_use_native_aio. It is enabled on the Test server, not on my machine. I tried disabling it and I don't see any significant change
innodb_buffer_pool_size 1G on Test server, 2G on my machine
The test server has 2 GB of RAM and a 2-core CPU:
mysqld uses > 65% of RAM at all times, but this only increases by 1-2% while running the above query
mysqld uses 14% of CPU while running the above query, none when idle
My local machine has 8 GB of RAM and an 8-core CPU:
mysqld uses 28% of RAM at all times, and this doesn't really increase while running the above query (or only so briefly that I can't see it)
mysqld uses 48% of CPU while running the above query, none when idle
Where and what can I do to have the same performance on the Test server? Is the RAM and/or CPU too low?
UPDATE
I have set up a new test server with the same specs but 8 GB of RAM and a 4-core CPU, and the performance jumped to values similar to my machine. The original server didn't seem to use all of its RAM/CPU, so why was its performance so much worse?
One of the surest ways to kill performance is to make MySQL scan an index that doesn't fit in memory. So during a query, it has to load part of the index into the buffer pool, then evict that part and load the other part of the index. Causing churn in the buffer pool like this during a query will cause a lot of I/O load, and that makes it very slow. Disk I/O is about 100,000 times slower than RAM.
So there's a big difference between 1GB of buffer pool and 2GB of buffer pool, if your index is, say 1.5GB.
Another tip: you really don't want to use LIMIT 110000, 60. That causes MySQL to read 110000 rows from the buffer pool (possibly loading them from disk if necessary) just to discard them. There are other ways to page through result sets much more efficiently.
See articles such as Optimized Pagination using MySQL.
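One commonly used alternative is keyset (or "seek") pagination: rather than skipping N rows, remember the sort key of the last row shown and ask for the rows that come after it. The sketch below is only an illustration under several assumptions: it uses an in-memory SQLite database instead of MySQL so that it is self-contained, a simplified hypothetical `item` table with synthetic data, and `id` as a tie-breaker so the ordering is unambiguous.

```python
import sqlite3

PAGE = 60  # rows per page, as in the question

# Hypothetical, simplified stand-in for the item table (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, score INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO item (id, score, title) VALUES (?, ?, ?)",
    [(i, i % 1000, f"item {i}") for i in range(1, 5001)],
)
conn.execute("CREATE INDEX idx_score ON item (score DESC, id DESC)")


def page_by_offset(offset):
    """Classic LIMIT/OFFSET paging: the engine walks and discards `offset` rows."""
    return conn.execute(
        "SELECT id, score FROM item ORDER BY score DESC, id DESC LIMIT ? OFFSET ?",
        (PAGE, offset),
    ).fetchall()


def page_after(last_score, last_id):
    """Keyset paging: seek directly to the rows after the last one displayed."""
    return conn.execute(
        "SELECT id, score FROM item "
        "WHERE score < ? OR (score = ? AND id < ?) "
        "ORDER BY score DESC, id DESC LIMIT ?",
        (last_score, last_score, last_id, PAGE),
    ).fetchall()


first = page_by_offset(0)
last_id, last_score = first[-1]
second_keyset = page_after(last_score, last_id)
second_offset = page_by_offset(PAGE)
print(second_keyset == second_offset)  # both approaches should return the same page
```

The trade-off is that keyset paging supports "next page" style navigation well but cannot jump straight to an arbitrary page number without knowing the key at that position.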

Can I/O latency cause a simple UPDATE to take seconds in MySQL?

My MySQL application is experiencing slow performance when running some UPDATE, INSERT and DELETE queries. In this question, I will only discuss one particular UPDATE, because it's enough to demonstrate the problem:
UPDATE projects SET ring = 5 WHERE id = 1
This UPDATE is usually fast enough, around 0.2ms, but every now and then (enough to be a problem) it takes several seconds. Here's an excerpt from the log (look at the 4th line):
~ (0.000282) UPDATE `projects` SET `ring` = 5 WHERE `id` = 1
~ (0.000214) UPDATE `projects` SET `ring` = 6 WHERE `id` = 1
~ (0.000238) UPDATE `projects` SET `ring` = 7 WHERE `id` = 1
~ (3.986502) UPDATE `projects` SET `ring` = 8 WHERE `id` = 1
~ (0.000186) UPDATE `projects` SET `ring` = 9 WHERE `id` = 1
~ (0.000217) UPDATE `projects` SET `ring` = 0 WHERE `id` = 1
~ (0.000162) UPDATE `projects` SET `ring` = 1 WHERE `id` = 1
projects is an InnoDB table with 6 columns of types INT and VARCHAR, 17 rows and an index on id. It happens with other tables too, but here I'm focusing on this one. When trying to solve the problem, I ensured that the queries were all sequential, so this is not a lock issue. The UPDATE above is executed in the context of a transaction. Other information on the server:
VPS with 4GB RAM (was 1GB), 12GB free disk space
CentOS 5.8 (was 5.7)
MySQL 5.5.10 (was 5.0.x)
The "was" bit above means it didn't work before or after the upgrade.
What I've tried so far, to no avail:
Setting innodb_flush_log_at_trx_commit to 0, 1 or 2
Setting innodb_locks_unsafe_for_binlog on or off
Setting timed_mutexes on or off
Changing innodb_flush_method from the default to O_DSYNC or O_DIRECT
Increasing innodb_buffer_pool_size from the default to 600M and then to 3000M
Increasing innodb_log_file_size from the default to 128M
Compiling MySQL from source
Running SHOW PROCESSLIST, which informs me that the state is "updating"
Running SHOW PROFILE ALL, which says that almost all the time was spent on "updating", and that, within that step, not so much time was spent on CPU cycles and there were many voluntary context switches (like 30)
Monitoring SHOW STATUS for changes in Innodb_buffer_pool_pages_dirty. There may be some relation between dirty pages being flushed and the slow queries, but the correlation isn't clear.
Then I decided to check the system's I/O latency with ioping. This is my first VPS, so I was surprised to see this result:
4096 bytes from . (vzfs /dev/vzfs): request=1 time=249.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=2 time=12.3 ms
4096 bytes from . (vzfs /dev/vzfs): request=3 time=110.5 ms
4096 bytes from . (vzfs /dev/vzfs): request=4 time=232.8 ms
4096 bytes from . (vzfs /dev/vzfs): request=5 time=294.4 ms
4096 bytes from . (vzfs /dev/vzfs): request=6 time=704.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=7 time=1115.0 ms
4096 bytes from . (vzfs /dev/vzfs): request=8 time=209.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=9 time=64.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=10 time=396.2 ms
Pretty erratic, I would say.
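To put a number on "erratic", the ioping lines above can be summarised with a few lines of Python. This is just a sketch: the regular expression assumes the exact `time=... ms` format shown in this output, and the statistics chosen (min, median, mean, max) are my own pick.

```python
import re
import statistics

# ioping output copied from the question above.
IOPING_OUTPUT = """\
4096 bytes from . (vzfs /dev/vzfs): request=1 time=249.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=2 time=12.3 ms
4096 bytes from . (vzfs /dev/vzfs): request=3 time=110.5 ms
4096 bytes from . (vzfs /dev/vzfs): request=4 time=232.8 ms
4096 bytes from . (vzfs /dev/vzfs): request=5 time=294.4 ms
4096 bytes from . (vzfs /dev/vzfs): request=6 time=704.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=7 time=1115.0 ms
4096 bytes from . (vzfs /dev/vzfs): request=8 time=209.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=9 time=64.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=10 time=396.2 ms
"""

# Extract the latency of each request in milliseconds.
latencies_ms = [float(t) for t in re.findall(r"time=([\d.]+) ms", IOPING_OUTPUT)]

print(f"requests: {len(latencies_ms)}")
print(f"min:      {min(latencies_ms):.1f} ms")
print(f"median:   {statistics.median(latencies_ms):.1f} ms")
print(f"mean:     {statistics.mean(latencies_ms):.1f} ms")
print(f"max:      {max(latencies_ms):.1f} ms")
```

The spread between the fastest and slowest request is nearly two orders of magnitude, which fits the pattern of occasional multi-second stalls.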
Having said all of that, I ask:
Can the I/O latency be occasionally killing MySQL performance? I always thought that, when you ran an UPDATE, the thread taking care of that connection wasn't going to flush data to disk or wait for such a flush; it would return immediately and the flushing would be done by another thread at another time.
If it can't be disk I/O, is there anything else I can possibly try, short of renting a dedicated server?
I'm replying to my own question with additional data that I collected based on your answers.
I used two notebooks connected by means of a wireless network. On notebook A, I
mounted a directory of notebook B using sshfs. Then on notebook A I started up
MySQL specifying that mounted directory as its data directory. This should
provide MySQL with a very slow I/O device. MySQL was started with
innodb_flush_log_at_trx_commit = 0.
I defined 3 sets of queries, each set consisting of an update and a select query
repeated 10,000 times, without explicit transactions. The experiments were:
US1SID: update and select on a specific row of the same table. The same row
was used in all iterations.
US1MID: update and select on a specific row of the same table. The row was a
different one in each iteration.
US2MID: update and select on rows of different tables. In this case, the table
being read by the select didn't change at all during the experiment.
Each set was run twice using a shell script (hence timings are slower than those in my original question), one under normal conditions and the other after executing the following command:
tc qdisc replace dev wlan0 root handle 1:0 netem delay 200ms
The command above adds a fixed delay of 200ms to packets transmitted through wlan0.
First, here's the mean time of the fastest 99% of updates and selects, and of the
slowest 1% of updates and selects.
|         | Delay: 0ms                 | Delay: 200ms               |
|         | US1SID | US1MID | US2MID | US1SID | US1MID | US2MID |
| top99%u | 0.0064 | 0.0064 | 0.0064 | 0.0063 | 0.0063 | 0.0063 |
| top99%s | 0.0062 | 0.0063 | 0.0063 | 0.0062 | 0.0062 | 0.0062 |
| bot01%u | 1.1834 | 1.2239 | 0.9561 | 1.9461 | 1.7492 | 1.9731 |
| bot01%s | 0.4600 | 0.5391 | 0.3417 | 1.4424 | 1.1557 | 1.6426 |
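For anyone wanting to reproduce this kind of summary, the split used in the table (mean of the fastest 99% versus mean of the slowest 1%) could be computed along these lines. This is a sketch, not the script actually used for the experiment: the function name is hypothetical and the sample timings are synthetic, not measured data.

```python
def split_means(timings, slow_fraction=0.01):
    """Return (mean of the fastest part, mean of the slowest part).

    `slow_fraction` is the share of samples treated as the slow tail;
    at least one sample always lands in the tail.
    """
    if len(timings) < 2:
        raise ValueError("need at least two timings to split")
    ordered = sorted(timings)
    n_slow = max(1, round(len(ordered) * slow_fraction))
    fast, slow = ordered[:-n_slow], ordered[-n_slow:]
    return sum(fast) / len(fast), sum(slow) / len(slow)


# Synthetic example: 99 quick queries and one stall (not real measurements).
sample = [0.006] * 99 + [1.2]
fast_mean, slow_mean = split_means(sample)
print(f"fastest 99%: {fast_mean:.4f}  slowest 1%: {slow_mean:.4f}")
```

Reporting the tail separately matters here because an overall mean would hide the rare multi-second stalls that are the actual problem.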
As is clear, even with really, really poor I/O performance, MySQL manages to
execute most queries really fast. But what concerns me the most is the worst
cases, so here's another table, showing the 10 slowest queries. A "u" means it
was an update, an "s" a select.
| Delay: 0ms | Delay: 200ms |
| US1SID | US1MID | US2MID | US1SID | US1MID | US2MID |
| 5.443 u | 5.946 u | 5.315 u | 11.500 u | 10.860 u | 11.424 s |
| 5.581 u | 5.954 s | 5.466 u | 11.649 s | 10.995 u | 11.496 s |
| 5.863 s | 6.291 u | 5.658 u | 12.551 s | 11.020 u | 12.221 s |
| 6.192 u | 6.513 u | 5.685 u | 12.893 s | 11.370 s | 12.599 u |
| 6.560 u | 6.521 u | 5.736 u | 13.526 u | 11.387 u | 12.803 u |
| 6.562 u | 6.555 u | 5.743 u | 13.997 s | 11.497 u | 12.920 u |
| 6.872 u | 6.575 u | 5.869 u | 14.662 u | 12.825 u | 13.625 u |
| 6.887 u | 7.908 u | 5.996 u | 19.953 u | 12.860 u | 13.828 s |
| 6.937 u | 8.100 u | 6.330 u | 20.623 u | 14.015 u | 16.292 u |
| 8.665 u | 8.298 u | 6.893 u | 27.102 u | 22.042 s | 17.131 u |
Conclusions:
Poor I/O performance can indeed slow MySQL to a crawl. It's not clear why
or when exactly, but it does happen.
The slowing down applies to both selects and updates, with updates suffering
more.
For some reason, even selects on a table that wasn't involved in any changes,
and which had recently been populated, were also slowed down, as is clear
from US2MID above.
As for the test cases proposed by mentatkgs, it seems that updating different
rows instead of the same ones does help a little, but doesn't solve the
problem.
I guess I will either adapt my software to tolerate such delays or try to move
to another provider. Renting a dedicated server is too expensive for this
project.
Thank you all for the comments.
As you are hosting your VPS in the cloud, you might be running into issues that are completely out of your control.
VPSs are subject to the whims of the host servers that run them. For example, CPU cycle priority at the Rackspace Cloud is weighted based on the size of the VPS. The bigger your VPS, the better likelihood that your app will perform smoothly. If there is a bigger VPS on the host you're using, it's possible that there is weighted bursting to blame. It's pretty hard to say.
Have you tried running this locally on your own machine? If it runs perfectly on your own system, and you need guaranteed performance, then your best bet will be to move to a dedicated server.
You have a VPS-related IO problem. It's not MySQL's fault.
Are you by any chance using Elastic Block Store with Amazon, or possibly RDS? Both of those use remote storage, and an IP protocol layer to talk to the storage; they can have nasty lag at times.
Question 1) Yes.
To check it write 2 apps:
Test case 1: will do this every minute for a few hours
UPDATE `projects` SET `ring` = 5 WHERE `id` = 1
UPDATE `projects` SET `ring` = 6 WHERE `id` = 1
Test case 2: will do this every minute for a few hours
UPDATE `projects` SET `ring` = 7 WHERE `id` = 1
UPDATE `projects` SET `ring` = 8 WHERE `id` = 2
Test case 1 should have a delay, while test case 2 should not.
Question 2) Use a NoSQL database.