Abnormal bandwidth between EC2 instances for MySQL replication

I am using MySQL replication between two AWS instances (m4.xlarge).
Version 5.6.35.
Sometimes I have trouble with high replication lag: Seconds_Behind_Master grows far too much, up to xx,000 seconds. I found that the io_thread on the slave cannot keep up with the binlog growth on the master DB.
At these times, the bandwidth and bytes transmitted between master and slave are very low (measured in bytes).
But when I changed the instance type of the slave from m4.xlarge to t2.xlarge and back, the bandwidth between master and slave increased immediately (up to 400 KB/s). The master DB was kept as-is, with no changes on it. And soon the replication lag disappeared. This is weird.
(I used iftop to check the bandwidth.)
Could you please advise what is wrong here? What happens when we change the instance type, and how can we detect the root cause?
Thanks so much.

I have checked and monitored, and the problem comes from our slave DB running out of Burst Balance (https://aws.amazon.com/blogs/aws/new-burst-balance-metric-for-ec2s-general-purpose-ssd-gp2-volumes/).
Our slave DB has small storage, just 20 GB, so it only gets the baseline 100 IOPS:
IOPS = Volume size (in GB) * 3
Minimum for gp2 is 100 IOPS
Maximum for gp2 is 3000 IOPS
I increased the storage to 50 GB to get 150 IOPS, and now the io_thread is running better.
Stopping and starting the instance also resets Burst Balance to 100%, but that is only a quick fix. This metric should be checked alongside bandwidth whenever there is high latency between EC2 instances.
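Before resizing anything, it can also help to confirm on the MySQL side that the IO thread really is the bottleneck. A minimal sketch using only standard commands:
-- on the master: note the current binary log file and position
SHOW MASTER STATUS;
-- on the slave: see how far the IO thread has downloaded
SHOW SLAVE STATUS\G
-- if Master_Log_File/Read_Master_Log_Pos trail the master's File/Position,
-- the IO thread cannot keep up (network or disk bound); if they match but
-- Exec_Master_Log_Pos lags, the SQL thread is the bottleneck instead.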

It seems others have run into the same problem [1] [2].
Did you try turning off sync_binlog? (sync_binlog=0)
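sync_binlog is a dynamic variable, so (assuming you accept the reduced binlog crash safety) you can test this on the master without a restart:
-- trade binlog durability for throughput; takes effect immediately
SET GLOBAL sync_binlog = 0;
-- verify:
SHOW GLOBAL VARIABLES LIKE 'sync_binlog';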

Related

Weird spikes in MySQL query times

I'm running Node.js with MySQL (InnoDB) for a game server (player info, save data, etc.). The server is HTTP(S) based, so nothing realtime.
I'm having these weird spikes, as you can see from the graphs below (the first graph is requests/sec and the last one is queries/sec).
On the response time graph you can see max response times in purple and average response times in blue. Even with those 10-20k peaks, the average stays at 50-100 ms, as do 95% of the requests.
I've been digging around and found that the slow queries are nothing special: usually an update query with save data (a blob of ~2 KB) or a player profile update that modifies something like the username. No joins or anything like that. We're talking about tables with fewer than 100k rows.
The server runs in Azure on Ubuntu 14.04 with MySQL 5.7, using 4 cores and 7 GB of RAM.
MySQL settings:
innodb_buffer_pool_size=4G
innodb_log_file_size=1G
innodb_buffer_pool_instances=4
innodb_log_buffer_size=4M
query_cache_type=0
tmp_table_size=64M
max_heap_table_size=64M
sort_buffer_size=32M
wait_timeout=300
interactive_timeout=300
innodb_file_per_table=ON
Edit: It turned out that the problem was never MySQL performance but Node.js performance before the SQL queries. More info here: Node.js multer and body-parser sometimes extremely slow
Check your swappiness (it should be 0 on MySQL machines that maximize RAM usage):
> sysctl -A|grep swap
vm.swappiness = 0
With only 7 GB of RAM and 4 GB for the buffer pool alone, your machine will swap if swappiness is not zero.
Could you post your swap and used-memory graphs? A 4 GB buffer pool is "over the edge" for 7 GB of RAM. With 8 GB of RAM I would give it 3 GB, since you need roughly another 1 GB for everything else MySQL-wise plus about 2 GB for the OS.
Also, you have 1 GB per transaction log file, and I assume you have two log files. Do you really have enough writes to need files that large? You can use this guide: https://www.percona.com/blog/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/
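The guide's method boils down to measuring how fast InnoDB writes redo; a rough sketch of the same idea:
-- sample the cumulative redo-log byte counter twice, one minute apart
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
SELECT SLEEP(60);
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
-- (second value - first value) * 60 = redo bytes per hour;
-- size the combined log files to hold roughly an hour of writes.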

AWS RDS MySQL Read Replica Lag Issues

I run a service that needs to be able to support about 4000+ IOPS and keep replica lag <=1 second to function properly.
I am using AWS RDS MySQL instances and have 2 read replicas. My service was experiencing giant replica lag spikes on the read replicas, so I was in contact with AWS support for a week trying to understand why; I had 6000 IOPS provisioned and my instances were very powerful. They gave me all kinds of reasons.
After changing instance types, upgrading from MySQL 5.5 to 5.6 to take advantage of multi-threading, and having them replace the underlying hardware, I was still seeing significant replica lag at random.
Eventually I decided to start tinkering with the parameter groups, changing configs on just the read replicas for anything I could find that was involved in the replication process, and I am now finally seeing <= 1 second of replica lag.
Here are the settings I changed and the values that appear to be successful (I copied the default MySQL 5.6 parameter group, changed these values, and applied the updated parameter group to just the read replicas):
innodb_flush_log_at_trx_commit=0
sync_binlog=0
sync_master_info=0
sync_relay_log=0
sync_relay_log_info=0
Please read about each of these to understand the impact of the modifications: http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html
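After applying the parameter group, it is worth confirming on each replica that the values actually took effect; for example:
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('innodb_flush_log_at_trx_commit', 'sync_binlog',
                        'sync_master_info', 'sync_relay_log', 'sync_relay_log_info');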
Other things to make sure you take care of:
Convert any MyISAM tables to InnoDB
Upgrade from MySQL < 5.6 to MySQL >= 5.6
Ensure that your provisioned IOPS are > the combined read/write IOPS you require
Ensure that your read replica instances are >= master instance
If anyone else has additional parameters that could be modified on the read replicas or the master DB to get the best replication performance, I'd love to hear more.
UPDATE 7-8-2014
To take advantage of MySQL 5.6 multi-threaded replication I've set:
slave_parallel_workers=5 (note that in 5.6 this parallelizes per database/schema, so it helps only if writes are spread across multiple databases)
I found this here:
https://blogs.oracle.com/MySQL/entry/benchmarking_mysql_replication_with_multi
MySQL replication executes all the transactions for a single database in order, while the master can execute those transactions in parallel.
You probably have most of the updates executed on a single DB, and that is what prevents you from taking advantage of multithreaded replication.
Check iostat on your replica server. Most of the time these problems occur because of high IO on the machine.
In order to decrease the IO on the machine, there are several additional changes you can make:
Increase innodb_buffer_pool_size - this is the first thing you should change from the default. If this instance runs only MySQL, you can allocate about 80% of the available memory here (a quick way to check whether the pool is sized well is sketched below).
Verify also the following parameters:
log_slave_updates = false
binlog_format = STATEMENT
(If you have MIXED or ROW binlog_format configured, verify that you understand what that means: http://dev.mysql.com/doc/refman/5.6/en/binary-log-setting.html)
If you have a lot of data that is changed several times over, increasing innodb_max_dirty_pages_pct to 90 or 95% can be worth checking.
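As for the buffer pool sizing mentioned above, these two standard status counters give a quick read on whether the working set fits in memory:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- logical reads that had to go to disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- all logical read requests
-- a reads/read_requests ratio well above ~1% suggests the pool is too small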

MySQL my.cnf -- open-files-limit causing CPU Overload

I got this server: an Intel Xeon quad-core E3-1230v2 with 8 GB of DDR3 RAM. Around the clock I see this server running out of CPU; it looks badly overloaded. After observing the "Daily Process Log" I realized that the process below is eating 25% of the CPU resources, and there were three such processes (technically errors). Here is the process (error):
/usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --log-error=/var/lib/mysql/server.yacart.com.err --open-files-limit=16384 --pid-file=/var/lib/mysql/server.yacart.com.pid
As visible above, something appears to be wrong with open-files-limit=16384. I tried increasing open-files-limit in my.cnf to 16384, but in vain. Below is what my my.cnf looks like now:
[mysqld]
innodb_file_per_table=1
local-infile=0
open_files_limit=9978
Can anyone advise a good configuration for my my.cnf that would help me get rid of the CPU overload?
There is a GoogleBot-like robot script I run on the slave servers to mine data from the internet; it crawls the entire internet. When I shut down this script, everything gets back in order. I wonder if there is a fix I could apply to this script?
This robot program has about 40 databases, each 50-800 MB in size, for a total DB size of about 14 GB so far, and I expect this to shoot up to 500 GB in the future. At any one time (for a whole day) only ONE DB is used; the next day I use the next DB, and so on. I was thinking of increasing RAM once the biggest DB reaches 2 GB. Currently RAM does not seem to be an issue at all.
Thanks in advance for any help you guys can offer.
regards,
Sam
If you have WHM, look for this under Server Configuration >> Tweak Settings >> SQL
Let cPanel determine the best value for your MySQL open_files_limit configuration?
cPanel will adjust the open_files_limit value during each MySQL restart depending on your total number of tables.
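To sanity-check that value yourself, you can compare the limit the server is actually running with against your table count (with innodb_file_per_table=1, each table is at least one file):
SHOW GLOBAL VARIABLES LIKE 'open_files_limit';
SELECT COUNT(*) FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');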

Replication slave has less space

I need to set up replication,
so I took a backup from the master and imported it into the slave; now the slave has 20 GB of free space.
While I was restoring the backup, the master accumulated 5+ GB of new data.
After that I enabled replication.
Now the problem is that while data from the relay log is being written to the slave, many new relay logs are generated, and I am left with no space on the slave...
relay-log-purge is enabled, but the process of applying the relay log to the database is very slow...
You may see performance benefits by switching the binlog format to 'ROW'. In MIXED format MySQL writes the binlog statement-based (the slave has to re-parse and re-plan each statement) and only switches to row format in certain cases (http://dev.mysql.com/doc/refman/5.1/en/binary-log-mixed.html).
It can be done on the fly (note that SET GLOBAL only affects new connections):
SET GLOBAL binlog_format = 'ROW';
With ROW-based replication you offload the slaves, which makes it easier for them to keep up with the master. The only downside is that you may get larger log files; however, from "got 5+ GB of data" I assume you mainly do inserts, so there should be no difference.
Best is to try it and see how it behaves.
Another option, until the replication catches up, is to change innodb_flush_log_at_trx_commit to 2 on the slave. Only do this temporarily.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
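Once the slave has caught up, remember to restore the durable default:
SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- default: flush and sync the log at each commit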
Just turn off the slave IO thread when you get a low-disk-space alert (90% reached):
mysql> stop slave io_thread;
and start it again after some time:
mysql> start slave io_thread;
In production this is about the best option available...
Try adding this to your config
relay-log-space-limit=5G

MySQL Replication - slave lagging behind master

I have master/slave replication on my MySQL DB.
My slave DB was down for a few hours and is back up again (the master was up the whole time). When issuing SHOW SLAVE STATUS I can see that the slave is X seconds behind the master.
The problem is that the slave doesn't seem to catch up with the master; the X seconds behind master doesn't seem to drop...
Any ideas on how I can help the slave catch up?
Here is an idea
In order to verify that MySQL is fully processing the SQL from the relay logs, try the following:
STOP SLAVE IO_THREAD;
This will stop replication from downloading new entries from the master into its relay logs.
The other thread, known as the SQL thread, will continue processing the SQL statements it downloaded from the master.
When you run SHOW SLAVE STATUS\G, keep your eye on Exec_Master_Log_Pos. Run SHOW SLAVE STATUS\G again. If Exec_Master_Log_Pos does not move after a minute, you can go ahead and run START SLAVE IO_THREAD;. This may reduce Seconds_Behind_Master.
Other than that, there is really nothing you can do except to:
Trust Replication
Monitor Seconds_Behind_Master
Monitor Exec_Master_Log_Pos
Run SHOW PROCESSLIST; and take note of the SQL thread, to see whether it is processing long-running queries.
BTW, keep in mind that when you run SHOW PROCESSLIST; with replication running, there should be two DB connections whose user name is "system user". One of those connections will show the current SQL statement being processed by replication. As long as a different SQL statement is visible each time you run SHOW PROCESSLIST;, you can trust that MySQL is still replicating properly.
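If the processlist is noisy, the same two replication threads can be filtered out through information_schema, e.g.:
-- show only the replication threads (IO thread and SQL thread)
SELECT id, state, time, info
FROM information_schema.processlist
WHERE user = 'system user';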
What binary log format are you using? ROW or STATEMENT?
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
If you are using ROW as the binlog format, make sure that all your tables have a primary or unique key:
SELECT t.table_schema, t.table_name, engine
FROM information_schema.tables t
INNER JOIN information_schema.columns c
        ON t.table_schema = c.table_schema
       AND t.table_name = c.table_name
       AND t.table_schema NOT IN ('performance_schema', 'information_schema', 'mysql')
GROUP BY t.table_schema, t.table_name
HAVING SUM(IF(column_key IN ('PRI', 'UNI'), 1, 0)) = 0;
If you execute, for example, one DELETE statement on the master that removes 1 million records from a table without a PK or unique key, only one full table scan takes place on the master's side, which is not the case on the slave.
When the ROW binlog_format is used, MySQL writes the row changes to the binary logs (not the statement, as with the STATEMENT binlog_format), and that change is applied on the slave's side row by row. This means 1 million full table scans take place on the slave to reflect a single DELETE statement on the master, and that is what causes the slave lag.
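If the query above turns up such tables, the usual fix is to give each one a surrogate key so the slave can locate rows by index instead of scanning. A hypothetical example (mydb.events is just a placeholder name):
ALTER TABLE mydb.events
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;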
"seconds behind" isn't a very good tool to find out how much behind the master you really is. What it says is "the query I just executed was executed X seconds ago on the master". That doesn't mean that you will catch up and be right behind the master the next second.
If your slave is normally not lagging behind and the work load on the master is roughly constant you will catch up, but it might take some time, it might even take "forever" if the slave is normally just barely keeping up with the master. Slaves operate on one single thread so it is by design much slower than the master, also if there are some queries that take a while on the master they will block replication while running on the slave.
Just check that you have the same time and timezone on both servers, i.e., the master as well as the slave.
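Since Seconds_Behind_Master is derived from event timestamps, clock or timezone skew will distort it. A quick check to run on both servers:
SELECT @@global.time_zone, @@system_time_zone, NOW();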
If you are using InnoDB tables, check that innodb_flush_log_at_trx_commit is set to a value different from 1 (i.e., 0 or 2) on the SLAVE.
http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
We had exactly the same issue after setting up our slave from a recent backup.
We had changed the configuration of our slave to be more crash-safe:
sync_binlog = 1
sync_master_info = 1
relay_log_info_repository = TABLE
relay_log_recovery = 1
I think that sync_binlog = 1 in particular causes the problem, as this slave's specs are not as fast as the master's. This option forces the slave to sync every transaction to the binary log before it is executed, instead of the default of leaving the flushing to the operating system.
After reverting these options to their default values, I can see the slave catching up again.
Just to add the findings from my similar case:
There were a few bulk temporary-table insert/update/delete operations happening on the master, which occupied most of the relay log space on the slave. And on MySQL 5.5, replication being single-threaded, the CPU was always at 100% and it took a lot of time to process these records.
All I did was add these lines to the MySQL cnf file:
replicate-ignore-table=<dbname>.<temptablename1>
replicate-ignore-table=<dbname>.<temptablename2>
and everything became smooth again.
In order to figure out which tables are taking up the most space in the relay log, try the following commands and then open the output in a text editor. You may get some hints:
cd /var/lib/mysql
mysqlbinlog relay-bin.000010 > /root/RelayQueries.txt
less /root/RelayQueries.txt
If you have multiple schemas, consider using multi-threaded slave replication. This is a relatively new feature.
It can be enabled dynamically without stopping the server. Just stop the slave SQL thread:
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 4;  -- MySQL name; on MariaDB the variable is slave_parallel_threads
START SLAVE SQL_THREAD;
I had an issue similar to this, with both of my MySQL servers (master and replica) hosted on AWS EC2. Increasing the EBS disk size (which automatically increased the IOPS) for the MySQL slave server turned out to be the solution for me. R/W throughput and bandwidth increased, and R/W latency decreased.
Now my MySQL replication is catching up to the master, and Seconds_Behind_Master has decreased (it used to grow from day to day).
So if you have MySQL hosted on EC2, I suggest you try increasing the EBS disk size, or its IOPS, on the slave.
I know it's been a while since the OP asked, but the following answer would have helped me.
In /etc/mysql/mysql.cnf :
[mysqld]
disable_log_bin
innodb_flush_log_at_trx_commit=2
innodb_doublewrite = 0
sync_binlog=0
disable_log_bin REALLY did the trick for me.