Mysql Replica server's CPU utilization became high in EC2

Mysql Replica server's CPU utilization became high in EC2 - mysql

I have an account in amazone. my primary MySql is a large instance. and i have other 2 small instances as replication servers. The hierarchy is Primary -> slave 1-> slave 2 . The problem is that some times the slave 1 and slave 2 showing high CPU utilization. We couldn't find out the exact reason. Slave 1 is act as a slave of primary and at the same time it act as Master of Slave 2. We searched a lot but we are still stucked as blind.
Thanks in advance for all helps.

As a general rule, when using MySQL native, asynchronous replication -- which is what RDS uses -- the replica servers need be as large as the master or sometimes larger. The replicas receive "replication events" from the master, which may contain the actual queries that modified data -- insert, update, delete but not select -- this is "statement based replication" -- or may receive binary images of the rows added removed or changes on the master -- which is "row based replication." By default, the master server decides, on a query-by-query basis, which format to use to send the replication events ("mixed").
In all cases SELECT statements are not sent to the replicas (except of course for INSERT ... SELECT), but they do need sufficient capacity to handled both the incoming changes from the master, as well as the SELECT queries that are run directly against the replica by your application.
In RDS for MySQL 5.6 and above,you can set the binlog_format to ROW to force the master to always use row-based replication. This might improve your performance, and it might not -- it depends on the workload. You cannot force RDS to use only "statement" mode, nor should you want to.
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.MySQL.html
In general, however, any time the replica is a machine with fewer resources than the master, the chance of replication lags increases. The lag of the replicas can be monitored in Cloudwatch.

Related

In MySQL master-slave configuration, ignoring tables with a large amount of data

In MySQL master-slave synchronization, we configured the ignore table because of the performance problem of the slave. However, since the configuration is configured on the slave, when the host writes data in large quantities, will the data be synchronized to the slave and then filtered by the slave? In fact, if you ignore the table configuration, will the performance improvement be minimal? Or does it mean that when this data is written to the host, the ignore table information of the slave is read? Data transmission will not occur, which can greatly reduce the load of the slave? After all, I configure the ignore table to avoid the performance problems of the slave. If a large amount of data is written, the slave can't carry it.

Create a new database; I'll call it NoRepl
Put binlog-ignore-db=NoRepl on the Primary.
Move that big table into the database NoRepl.
That way, the filtering will occur much sooner -- thereby not clogging up replication either on the Replica or in the Primary's binlog.
If you are using Primary-Primary or you have another Replica that does want to see this table, then my suggestion will not work.

Why would long running select queries cause a replication lag in MySQL slave database?

we have a MySQL with a replica (5.7 with row based replication).
Now, the master performs at peak about 3000 inserts per second, and the replica seems to read that just fine. However, sometimes we execute long-time select queries (that ran from 10 to 20 seconds). And during those queries the replication lag becomes very huge.
What I do not understand is how the usual mysql threads that execute selects (without locking any tables) can cause the replication thread to slow down (i.e. it performs about 2.5K inserts instead of 3K like master)? What would I need to tune exactly?
Now I checked the slave status and it's not about the IO thread - this one manages to read events from the master just fine. It's SQL slave thread, that somehow does not manage to catch up. The isolation level is Read Committed, so the select queries potentially could lock some records and make the slave thread wait. But I'm not sure about that.
UPDATED. I have checked again - it turns out that even a single heavy query (that scans the entire table for example) on the slave produces the lag. It seems like slave sql thread is blocked, but I do not understand why?
UPDATED 2. I finally found the solution. First I increased number of slave_parallel_workers to 4 and set slave_parallel_type to LOGICAL_CLOCK. However, and this is important, that gave me no improvement at all, since the transactions were dependent. But, after I increased on master binlog_group_commit_sync_delay to 10000 (that is, 10 milliseconds), the lag disappeared.

There Might be many reasons why replication lag in mysql slave database.
But as you mentioned
It's SQL slave thread, that somehow does not manage to catch up.
Assuming that IO works fine, Percona says (emphasis mine):
[...] when the slave SQL_THREAD is the source of replication delays it is probably because of queries coming from the replication stream are taking too long to execute on the slave. This is sometimes because of different hardware between master/slave, different schema indexes, workload. Moreover, the slave OLTP workload sometimes causes replication delays because of locking. For instance, if a long-running read against a MyISAM table blocks the SQL thread, or any transaction against an InnoDB table creates an IX lock and blocks DDL in the SQL thread. Also, take into account that slave is single threaded prior to MySQL 5.6, which would be another reason for delays on the slave SQL_THREAD.

Replication issue

We have a one master and two VIP slave database servers. We changed data type of column from VARCHAR(255) to TEXT on master.
The application is currently configured to use master only for writing operations and configured slaves for reading operation.
After changing the data type on master server using ALTER TABLE command the slave server becomes unresponsive.
We are using Mariadb 10.0
[PROCESSES INFORMATION]
Id User Host Db Command Time(sec) State Info
-----------------------------------------------------------------------
203739 repl slave1 Binlog Dump 75,143,121 Master has sent all binlog to slave; waiting for binlog to be updated
203740 repl slave2 Binlog Dump 75,143,121 Master has sent all binlog to slave; waiting for binlog to be updated
The slave instance becomes very slow due to slow queries.
number of sessions: 1590
thread_pool_max_thread=500
Current value =648
After performing ALTER TABLE on Master server, it was replicating to slave server and in the same time number of sessions were get increased rapidly on slave server.
I think slaves becomes unresponsive because of slow queries.
But I don't know why this queries became so slow and slaves got unresponsive.
The DBA's saying that after executing ANALYZE TABLE command, the issue has been solved.
But I don't understand why this happened because ANALYZE TABLE only update the statistic information.
It would be helpful if anyone comment on this why it happened?
How to avoid such issues in future.

There is one minor case where TEXT is slower than VARCHAR. When a SELECT needs to build a temporary table (often for sorting due to GROUP BY or ORDER BY), it first tries to build a MEMORY table. But, TEXT and BLOB prevent it from using such, so it uses MyISAM instead. This is slower (but gets the job done).
I say this is a "minor case" because users rarely identify it with phrases like "very slow" and "becomes unresponsive". I would guess that a SELECT might run twice as slow.
Also, the ANALYZE TABLE discussion does not hold water. Again it may be coincidence, not causation.
So, the change to TEXT may be a 'red herring'. Instead, let's discover what is being slow by using the slowlog. See this for what I like to work from.

MySQL replication Master - Master ensure identical status

I have a Master to Master replication set up between two MySQL 5.0
My requirement is that before starting my application I have to ensure that the database are identical I would like to confirm that the "seconds_behind_master" of the "show slave status" command being at 0 seconds is enough to consider the 2 databases being synchronized ?
http://dev.mysql.com/doc/refman/5.0/en/show-slave-status.html
Thank you

No, that is not sufficient to guarantee that the databases are identical, just that the slave has run all statements from the master's binary log. You could update or delete large chunks of data on the slave database and still have seconds behind master = 0, but the slave would certainly not be identical to the master.
You should use a tool like Percona's pt-table-checksum if you really want to verify that the databases are identical.

It's not really a guarantee, as master-to-master replication does not guarantee atomicity across servers.
Also, a single read of 0 could be meaningless, but a consistent read of 0 over time would be a good indication that the servers are synced, if you have no errors reported.
So, I say, read it every second for 10 seconds, and if they're all 0, with no errors, you're probably good enough, short of comparing checksums, as Ike suggested.

mysql slave database problem

Currently we have 3 slave databases,
but almost always there is one of them extremly slow than others(can be an hour after master database)
Has any one met similar problem?What may be the cause?

I'd guess some other process is running on the same host as the slow replica, and it's hogging resources.
Try running "top" (or use Nagios or Cactus or something) to monitor system performance on the three replica hosts and see if there are any trends you can observe. CPU utilization pegged by another process besides mysqld, or I/O constantly saturated, that sort of thing.
update: Read the following two articles by MySQL performance expert Peter Zaitsev:
Managing Slave Lag with MySQL Replication
Fighting MySQL Replication Lag
The author points out that replication is single-threaded, and the replica executes queries sequentially, instead of in parallel as they were executed on the master. So if you have a few replicated queries that are very long-running, they can "hold up the queue."
He suggests the remedy is to simplify long-running SQL queries so they run more quickly. For example:
If you have an UPDATE that affects millions of rows, break it up into multiple UPDATEs that act on a subset of the rows.
If you have complex SELECT statements incorporated into your UPDATE or INSERT queries, separate the SELECT into its own statement, generate a set of literal values in application code, and then run your UPDATE or INSERT on these. Of course the SELECT won't be replicated, the replica will only see the UPDATE/INSERT with literal values.
If you have a long-running batch job running, it could be blocking other updates from executing on the replica. You can put some sleeps into the batch job, or even write the batch job to check the replication lag at intervals and sleep if needed.

Are all the slave servers located in the same location? In my case, one of the slave server was located in another location and it was a network issue.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008