MariaDB Galera cluster servers running at 100% CPU and load rising - mysql

I have a Drupal application which has been running on a single MySQL database server for 12 months, and has been performing relatively well (apart from peak load events). We needed to be able to support much higher spikes than the current DB server allowed, and at 32GB there was not much gain to be had from simply vertically scaling the single DB server.
We decided to set up a new MariaDB Galera cluster with 2x 32GB instances. We matched the configuration as far as possible with the soon-to-be-obsolete DB server.
After migrating to the new database servers, we noticed that the CPU usage on those instances was constantly at 100%, and load was steadily increasing. Over the course of 1 hour, load average went from 0.1 to 150.
Initially we thought it might have something to do with the synchronisation between servers, but even with one server turned off and no sync occurring, the remaining server was still maxing out its CPU as long as the web application was making requests to it.
After a lot of experimentation I found that reducing a few of the configuration options had a profound effect on the CPU usage and load. After making the below changes, the load average has stabilised between 4 and 6 on both instances.
The questions
What are some possible reasons for such a dramatic difference in CPU usage between the old and new servers, despite essentially migrating the configuration from the old server?
Load is currently hovering between 4 and 6 (and this is a low traffic period for our website). What should I be looking at to try and reduce this value, and ensure that when the site gets hit with some real traffic it won't fall over?
Config changes
innodb_buffer_pool_instances
Original value: 500 (there are 498 tables total in all databases)
New value: 92
table_cache
Original value: 8
New value: 4
max_connections
Original value: 1000
New value: 400
Current configuration
Here is the full configuration file from one of the servers /etc/mysql/my.cnf
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_type=1
bind-address=0.0.0.0
max_connections = 400
wait_timeout = 600
key_buffer_size = 16M
max_allowed_packet = 16777216
max_heap_table_size = 512M
table_cache = 92
thread_stack = 196608
thread_cache_size = 8
myisam-recover = BACKUP
query_cache_limit = 1048576
query_cache_size = 128M
expire_logs_days = 10
general_log = 0
max_binlog_size = 10485760
server-id = 0
innodb_file_per_table
innodb_buffer_pool_size = 25G
innodb_buffer_pool_instances = 4
innodb_log_buffer_size = 8388608
innodb_additional_mem_pool_size = 8388608
innodb_thread_concurrency = 16
net_buffer_length = 16384
sort_buffer_size = 2097152
myisam_sort_buffer_size = 8388608
read_buffer_size = 131072
join_buffer_size = 131072
read_rnd_buffer_size = 262144
tmp_table_size = 512M
long_query_time = 1
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
# Galera Provider Configuration
wsrep_provider=/usr/lib/galera/libgalera_smm.so
#wsrep_provider_options="gcache.size=32G"
# Galera Cluster Configuration
wsrep_cluster_name="xxx"
wsrep_cluster_address="gcomm://xxx.xxx.xxx.107,xxx.xxx.xxx.108"
# Galera Synchronization Configuration
wsrep_sst_method=rsync
#wsrep_sst_auth=user:pass
# Galera Node Configuration
wsrep_node_address="xxx.xxx.xxx.107"
wsrep_node_name="xxx01"
[mysqldump]
quick
quote-names
max_allowed_packet = 16777216
[isamchk]
key_buffer_size = 16777216

We ended up getting a Percona consultant to assist with this problem. The main issue they identified was that a large number of EXPLAIN queries were being executed. It turned out this was debugging code that had been left enabled (devel.module query logging for Drupal developers). Disabling it saw CPU usage fall off a cliff.
There were a number of additional fixes which they recommended we implement.
Add a third node to the cluster to act as an observer and maintain the integrity of the cluster.
Add primary keys to tables that do not have one.
Change MyISAM tables to InnoDB.
Change wsrep_sst_method from rsync to xtrabackup-v2.
Set innodb_log_file_size to 512M.
Set innodb_flush_log_at_trx_commit to 2 as the cluster maintains the integrity of the data.
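The MyISAM-to-InnoDB recommendation in the list above is easy to script. This is a minimal sketch, assuming you have already fetched (table name, engine) pairs, for example from information_schema.TABLES. The function name and the sample table names are hypothetical and are not taken from our setup.

```python
def innodb_conversion_statements(tables):
    """Return ALTER TABLE statements for every table not already on InnoDB.

    `tables` is an iterable of (table_name, engine) pairs. The engine may be
    None (information_schema reports NULL for views); those rows are skipped.
    """
    statements = []
    for name, engine in tables:
        if engine and engine.lower() != "innodb":
            # Backticks quote the identifier; this sketch assumes the name
            # itself contains no backtick characters.
            statements.append(f"ALTER TABLE `{name}` ENGINE=InnoDB;")
    return statements


# Hypothetical sample input, for illustration only.
sample = [("node", "InnoDB"), ("cache", "MyISAM"), ("sessions", "MyISAM"), ("v_report", None)]
for stmt in innodb_conversion_statements(sample):
    print(stmt)
```

Review the generated statements before running them; on a Galera cluster each ALTER is replicated to every node, so large tables are best converted in a quiet period.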
I hope this information helps anyone who runs into similar issues.

innodb_buffer_pool_instances should not be a function of the number of tables. The manual advocates that each instance be no smaller than 1GB. So, I suggest that even 92 is much too high. But my.cnf says only innodb_buffer_pool_instances = 4??
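To make the 1GB-per-instance guideline concrete, here is a minimal sketch. The helper name is hypothetical, and the cap of 64 is my assumption about the upper limit of this variable; check the manual for your server version.

```python
def max_buffer_pool_instances(pool_size_gb, min_instance_gb=1, hard_cap=64):
    """Largest instance count that keeps every instance at or above min_instance_gb.

    hard_cap is an assumed upper bound for innodb_buffer_pool_instances.
    """
    by_size = int(pool_size_gb // min_instance_gb)
    return max(1, min(hard_cap, by_size))


# With the 25G buffer pool from the posted my.cnf:
print(max_buffer_pool_instances(25))
```

For a 25G pool this gives an upper bound of 25, so the 4 in the posted my.cnf is within the guideline while 92 and 500 are not.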
table_cache = 92
Maybe your comments are messed up? 500 would be more reasonable for table_open_cache. (table_cache is the old name.)
This may be the problem:
query_cache_size = 128M
Whenever a write occurs, all entries in the QC for the table(s) involved are purged from the QC. Recommend no more than 50M. Or, better yet, turn the QC off completely.
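The purge-on-write behaviour can be pictured with a toy model. This is only an illustration of the idea, not how the server implements the query cache internally; the class and the table names are hypothetical.

```python
class ToyQueryCache:
    """Toy model: every cached result for a table is dropped when that table is written."""

    def __init__(self):
        self._entries = {}  # query text -> (set of tables read, cached result)

    def store(self, query, tables, result):
        self._entries[query] = (frozenset(tables), result)

    def lookup(self, query):
        entry = self._entries.get(query)
        return entry[1] if entry else None

    def invalidate_table(self, table):
        """Called on any write to `table`; returns how many entries were purged."""
        stale = [q for q, (tables, _) in self._entries.items() if table in tables]
        for q in stale:
            del self._entries[q]
        return len(stale)


cache = ToyQueryCache()
cache.store("SELECT * FROM node WHERE nid = 1", ["node"], "row1")
cache.store("SELECT * FROM node WHERE nid = 2", ["node"], "row2")
cache.store("SELECT * FROM users WHERE uid = 1", ["users"], "user1")
purged = cache.invalidate_table("node")  # a single write to `node`
print(purged, cache.lookup("SELECT * FROM users WHERE uid = 1"))
```

The larger the cache, the more entries each write may have to find and discard, which is why a smaller query_cache_size (or none) can reduce overhead on a write-heavy site.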
You have the slowlog turned on. What does pt-query-digest say are the top couple of queries? (This may be your best way to get a handle on the problem.)

Related

Database performance drop, after upgrade to MySQL 8.0.20

After upgrading MySQL from version 5.7 to 8.0, I found that database performance dropped significantly.
Before the upgrade, CPU usage was stable at around 30%. After the upgrade, CPU usage became unstable, with frequent large spikes.
Recently I noticed something interesting: when I run the same query several times in a row, each run takes longer than the previous one, as shown in the picture below.
I have read a lot of articles and Stack Overflow posts, but none of the suggested solutions really helped.
I hope someone can share ideas or experience on tuning MySQL 8.0.
I would very much appreciate it.
Please let me know if you need any more information to investigate further.
Config my.ini:-
key_buffer_size = 2G
max_allowed_packet = 1M
;Added to reduce memory used (minimum is 400)
table_definition_cache = 600
sort_buffer_size = 4M
net_buffer_length = 8K
read_buffer_size = 2M
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 2G
;Path to mysql install directory
basedir="c:/wamp64/bin/mysql/mysql8.0.20"
log-error="c:/wamp64/logs/mysql.log"
;Verbosity Value 1 Errors only, 2 Errors and warnings , 3 Errors, warnings, and notes
log_error_verbosity=2
;Path to data directory
datadir="c:/wamp64/bin/mysql/mysql8.0.20/data"
;slow_query_log = ON
;slow_query_log_file = "c:/wamp64/logs/slow_query.log"
;Path to the language
;See Documentation:
; http://dev.mysql.com/doc/refman/5.7/en/error-message-language.html
lc-messages-dir="c:/wamp64/bin/mysql/mysql8.0.20/share"
lc-messages=en_US
; The default storage engine that will be used when create new tables
default-storage-engine=InnoDB
; New for MySQL 5.6 default_tmp_storage_engine if skip-innodb enable
; default_tmp_storage_engine=MYISAM
;To avoid warning messages
secure_file_priv="c:/wamp64/tmp"
skip-ssl
explicit_defaults_for_timestamp=true
; Set the SQL mode to strict
sql-mode=""
;sql-mode="STRICT_ALL_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_ZERO_DATE,NO_ZERO_IN_DATE,NO_AUTO_CREATE_USER"
;skip-networking
; Disable Federated by default
skip-federated
; Replication Master Server (default)
; binary logging is required for replication
;log-bin=mysql-bin
; binary logging format - mixed recommended
;binlog_format=mixed
; required unique id between 1 and 2^32 - 1
; defaults to 1 if master-host is not set
; but will not function as a master if omitted
server-id = 1
; Replication Slave (comment out master section to use this)
; New for MySQL 5.6 if no slave
skip-slave-start
; The InnoDB tablespace encryption feature relies on the keyring_file
; plugin for encryption key management, and the keyring_file plugin
; must be loaded prior to storage engine initialization to facilitate
; InnoDB recovery for encrypted tables. If you do not want to load the
; keyring_file plugin at server startup, specify an empty string.
early-plugin-load=""
;innodb_data_home_dir = C:/mysql/data/
innodb_data_file_path = ibdata1:12M:autoextend
;innodb_log_group_home_dir = C:/mysql/data/
;innodb_log_arch_dir = C:/mysql/data/
; You can set .._buffer_pool_size up to 50 - 80 %
; of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 4G
; Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 16M
innodb_log_buffer_size = 8M
innodb_thread_concurrency = 64
innodb_flush_log_at_trx_commit = 2
log_bin_trust_function_creators = 1;
innodb_lock_wait_timeout = 120
innodb_flush_method=normal
innodb_use_native_aio = true
innodb_flush_neighbors = 2
innodb_autoinc_lock_mode = 1
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
; Remove the next comment character if you are not familiar with SQL
;safe-updates
[isamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer_size = 2M
write_buffer_size = 2M
[myisamchk]
key_buffer_size = 256M ;20M hys
sort_buffer_size_size = 20M
read_buffer_size = 2M
write_buffer_size = 2M
[mysqlhotcopy]
interactive-timeout
[mysqld]
port = 3306
skip-log-bin
default_authentication_plugin= mysql_native_password
max_connections = 400
max_connect_errors = 100000
innodb_read_io_threads = 32
innodb_write_io_threads = 8
innodb_thread_concurrency = 64
Hardware:-
Ram: 16GB
CPU: 4 Cores 3.0 Ghz
SHOW GLOBAL STATUS:
https://pastebin.com/FVZrgnTw
SHOW ENGINE INNODB STATUS:
https://pastebin.com/Rewp84Gi
SHOW GLOBAL VARIABLES:
https://pastebin.com/3v6cM6KZ
Rate Per Second = RPS
Suggestions to consider for your my.ini [mysqld] section
It is unusual to have more than one [mysqld] section in the my.ini configuration. The section you have near the end of your my.ini could be moved to just before [mysqldump] to avoid confusion.
innodb_lru_scan_depth=100 # from 1024 to conserve 90% of CPU cycles used for function
key_buffer_size=16M # from 1G to conserve RAM - you are not using MyISAM data tables
read_rnd_buffer_size=64K # from 2M to reduce handler_read_rnd_next RPS of 1,872,921
innodb_io_capacity=900 # from 200 to use more of your rotating drive IOPS capacity
You should find query completion time and CPU busy reduced with these changes.
select_scan averages 41 RPS and is caused by indexes not being available, causing delays.
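The RPS figures quoted above are simply a SHOW GLOBAL STATUS counter divided by the server's Uptime in seconds. A minimal sketch follows; the function name is hypothetical, and the sample numbers are made up to reproduce a rate of 41 rather than taken from the posted status output.

```python
def rate_per_second(counter, uptime_seconds):
    """Average rate of a cumulative status counter over the server's uptime."""
    if uptime_seconds <= 0:
        raise ValueError("uptime_seconds must be positive")
    return counter / uptime_seconds


# Made-up example: a counter of 3,542,400 after exactly one day of uptime.
rps = rate_per_second(3_542_400, 86_400)
print(rps)
```

Because the counters are cumulative since startup, this is a long-run average; it can hide short bursts.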
I have found the root cause and posted it at https://dba.stackexchange.com/questions/271785/query-performance-become-slower-after-upgrade-to-mysql-8-0-20 .
Thanks a lot for all the replies and suggestions. I appreciate it.
[Update: solved the problem at our site]
I recently had a very similar (maybe the same?) issue.
We have
Windows Server 2016, 4 CPUs, 32 GB RAM
MySQL 8 Community Edition
Java / Apache Tomcat based application on top
For 2 weeks we experienced severe application problems, with the mysqld process taking 100% CPU as soon as any application interaction happened, rendering the server completely unresponsive.
The last change to the setup before this degradation was updating MySQL from 8.0.18 to 8.0.20 due to security fixes.
Query monitoring shows many occurrences of the same (simple) query
SELECT COUNT(1) FROM xxxxx;
which take 5-10 seconds (although the table only has about 3 rows, so it should rather take 5 milliseconds!).
One hypothesis was this MySQL issue: https://bugs.mysql.com/bug.php?id=99593
However the recommended workaround did not help me.
Solution for us:
Apparently there was an additional bug in MySQL Community Edition, introduced in 8.0.19 or 8.0.20.
After downgrading MySQL to 8.0.18 everything worked fine again!
Additional note:
Downgrading is not supported by MySQL!
Actually in order to provide a downgraded DB on the same machine, I...
did a backup of the application schema (with mysqldump command)
did a manual installation of MySQL 8.0.18 binaries (no installer)
created an additional MySQL instance (different data directory, different port)
imported the backup into the new instance (with mysql command)
created roles and permissions exactly like "before"
switched application config to new MySQL port
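The dump and import steps above can be sketched as follows. This is only an outline under assumptions: the schema name, user, ports and file name are placeholders, and the helper names are hypothetical. The command-line options used (--port, --user, --password, --result-file, --databases) are standard mysqldump/mysql options.

```python
import subprocess


def dump_command(schema, port, user, outfile):
    """argv for dumping one schema from the existing instance."""
    return ["mysqldump", f"--port={port}", f"--user={user}", "--password",
            f"--result-file={outfile}", "--databases", schema]


def restore_command(port, user):
    """argv for the mysql client on the new instance; the dump is fed on stdin."""
    return ["mysql", f"--port={port}", f"--user={user}", "--password"]


def run_restore(cmd, dumpfile):
    """Run the import, raising if the client exits with an error."""
    with open(dumpfile, "rb") as fh:
        subprocess.run(cmd, stdin=fh, check=True)


# Placeholder values: old instance on 3306, downgraded instance on 3307.
dump_cmd = dump_command("app_schema", 3306, "root", "backup.sql")
restore_cmd = restore_command(3307, "root")
print(" ".join(dump_cmd))
print(" ".join(restore_cmd), "< backup.sql")
```

Nothing is executed here apart from printing the commands; roles and permissions still have to be recreated by hand, as described in the list.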

maximizing memory usage for a MySQL analysis server

For a social sciences research project I'm using MySQL (5.5) on a dedicated Linux server with 8 GB of memory. The data consists of some 30 million records, resulting in a MyISAM source table of about 4GB (MyISAM because the data is stable and transactions are not useful). My question is this: how can I prevent memory from being an unnecessary bottleneck?
With the current settings only about 20% of memory is ever used, but the right balance of my.ini settings is difficult to find, since many variables are interdependent. How can I let MySQL use as much memory as possible (while reserving enough to prevent Linux from swapping)?
current settings:
[mysqld]
max_connections = 3
performance_schema=on
default-storage-engine=MYISAM
local-infile=1
myisam_sort_buffer_size = 2048M
key_buffer_size = 2048M
tmp_table_size = 2048M
max_heap_table_size = 2048M
sort_buffer_size = 128M
read_buffer_size = 256M
read_rnd_buffer_size = 128M
join_buffer_size = 512M
thread_stack = 256KB
query_cache_size = 64M
query_cache_limit = 32M
table_open_cache= 256
table_definition_cache = 512
myisam_max_sort_file_size = 75G
Your best bet is to run MySQL Tuner on your database after it has been running and in use for a while. No one on SO will be able to give you relevant information for optimizing your memory usage without knowing the shape of your database and usage patterns. MySQL Tuner automates this.
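As a rough cross-check before running a tuner, a common rule of thumb is: worst-case memory is roughly the global buffers plus max_connections times the per-connection buffers. The sketch below applies it to the settings posted in the question. Which variables count as global or per-connection is my simplification, and tmp_table_size and myisam_sort_buffer_size are left out because they are only allocated for specific operations.

```python
def worst_case_memory_mb(global_buffers_mb, per_thread_buffers_mb, max_connections):
    """Rule-of-thumb upper bound, in MB: globals + connections * per-connection buffers."""
    return (sum(global_buffers_mb.values())
            + max_connections * sum(per_thread_buffers_mb.values()))


# Values copied from the question's [mysqld] section, in MB.
global_buffers = {"key_buffer_size": 2048, "query_cache_size": 64}
per_thread = {
    "sort_buffer_size": 128,
    "read_buffer_size": 256,
    "read_rnd_buffer_size": 128,
    "join_buffer_size": 512,
}
estimate = worst_case_memory_mb(global_buffers, per_thread, max_connections=3)
print(estimate, "MB")
```

The per-connection buffers are only allocated when a query needs them, so actual usage can sit well below this bound.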

Improve MySQL performance with O_DIRECT

I want to increase the performance of MySQL, so I made some configuration-level changes. I set innodb_flush_method = O_DIRECT, but the insert rate has not increased much; it is normally about 650 inserts/sec. How do I know whether O_DIRECT is working properly?
I am using Ubuntu 14.04.1 Server and MySQL 5.6. CPU, memory and disk I/O rates are normal (I use RAID, 16 GB RAM, 8 CPU cores). I use WSO2 CEP for insertion. I have implemented that part and measured it using MySQL Workbench, but I couldn't get much more performance even though I increased the insertion rate through WSO2 CEP.
I have used the following my.cnf.
my.cnf
[mysqld]
innodb_buffer_pool_size = 9G
query_cache_size = 128M
innodb_log_file_size = 1768M
innodb_flush_log_at_trx_commit = 0
innodb_io_capacity = 1000
innodb_flush_method = O_DIRECT
max_heap_table_size = 536870912
innodb_lock_wait_timeout = 1
max_connections = 400
sort_buffer_size = 128M
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
skip-host-cache
skip-name-resolve
event_scheduler=on
If you are using Event tables, note that older CEP/Siddhi versions do not perform batch insertions. That could be the cause of the above. We have fixed this in the latest SNAPSHOT source of Siddhi, and you should see considerably better numbers in the next release.
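To illustrate what batch insertion means here, the sketch below groups rows and commits once per batch instead of once per row. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; it is not Siddhi's implementation, and the table and function names are hypothetical.

```python
import sqlite3


def insert_batched(conn, rows, batch_size=500):
    """Insert rows in batches, committing once per batch. Returns the commit count."""
    cur = conn.cursor()
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One multi-row call per batch rather than one statement per event.
        cur.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", batch)
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
rows = [(i, f"event-{i}") for i in range(1200)]
commits = insert_batched(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(commits, count)
```

With per-row commits the same 1200 rows would need 1200 commits; fewer commits means fewer log flushes, which is usually where insert throughput is lost.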

Website slow loading on VPS

I have a VPS running CentOS 6 64-bit with DirectAdmin + CustomBuild.
Web server: Apache with reverse proxy by Nginx.
I am hosting a lot of websites on this VPS.
When there are a lot of hits on the website "example.com", that website becomes extremely slow (page loads can take 50 seconds or more), but the others stay fast.
I checked CPU and memory, but there is nothing unusual.
I installed "mytop" to monitor the database, and the slowdown happens when there are around 20 running queries.
my.cnf content:
[mysqld]
max_allowed_packet=16M
innodb_buffer_pool_size = 5096M
innodb_buffer_pool_instances = 12
innodb_file_per_table = 1
innodb_log_file_size = 64M
innodb_log_files_in_group = 2
innodb_log_buffer_size = 10M
innodb_flush_log_at_trx_commit = 0
innodb_buffer_pool_load_at_startup = 1
innodb_log_buffer_size = 8
innodb_thread_concurrency = 12
innodb_flush_method = O_DIRECT
innodb_read_io_threads = 4
innodb_write_io_threads = 8
#max_connections = 800
#max_user_connections = 400
local-infile=0
max_connections=120 #
interactive_timeout=300
join_buffer_size=512K
key_buffer_size=64M
query_cache_limit=4G
tmp_table_size=1024M
max_heap_table_size=512M #
thread_cache_size=4
open_files_limit=50000
table_open_cache=3000
query_cache_type=1
query_cache_size=128M #
I don't know whether it's a MySQL problem or an Apache and Nginx problem.
The information you give is pretty sparse. You should check the overall CPU usage and see which applications are using the CPU. However, if the other applications on the server are still performing well, then you probably have some slow database queries. 20+ running queries sounds like a lot, depending on what you call lots of hits. You should analyse which database queries are running and how long they take. Maybe you need to add indexes.
Another problem could be locks in the database.
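As an illustration of the "maybe you need to add indexes" point, the sketch below shows a query plan switching from a full table scan to an index lookup once an index exists. It uses Python's built-in sqlite3 module as a stand-in so it can run anywhere; on MySQL you would inspect the plan with EXPLAIN instead. The table, column and index names are hypothetical.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY, site TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO hits (site, path) VALUES (?, ?)",
    [(f"site{i % 50}.test", f"/p/{i}") for i in range(5000)],
)

query = "SELECT * FROM hits WHERE site = ?"
before = plan_for(conn, query, ("site7.test",))  # no index yet: full scan
conn.execute("CREATE INDEX idx_hits_site ON hits (site)")
after = plan_for(conn, query, ("site7.test",))   # planner can now use the index
print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, so treat the printed lines as indicative only.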

Mysql on EC2 instance: slow results after the first query

I'm trying to configure MySQL on an Amazon EC2 micro instance with 613MB of memory. This instance will only be used for running MySQL, so I want to use as much memory as possible. We have another instance of the same DB running on a different host, so I can easily compare results.
Performing a mediocre query on the original db took less than 3 seconds. Before my changes on the EC2, it took 46 seconds, but now, after changing the settings, the same query only took 4 seconds. However, executing the same query once more, it seems to take forever.
These are the settings I use in my MySQL my.cnf:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
innodb_buffer_pool_size = 128M
query_cache_size = 128M
query_cache_limit = 8M
key_buffer_size = 128M
table_open_cache = 2048
table_definition_cache = 2048
read_buffer_size = 64M
join_buffer_size = 64M
sort_buffer_size = 64M
myisam_sort_buffer_size = 64M
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
I don't think the MyISAM parameter needs to be included, since everything should be using InnoDB, but I gave it some extra memory just in case it is needed.
It would be logical if the first run were slower and later runs faster, but this is the other way round.
Any ideas would be very welcome.
EC2 micro instances generally perform terribly. You could get Linode/Rackspace for the same price.
Amazon Micro instances "may opportunistically increase CPU capacity in short bursts when additional cycles are available."
This means you're allocated X units of computation but it can jump to X + N when other Micro instances are not using their allocation and your instance will automatically scavenge it.
This is likely why you are seeing fluctuations in performance.