what metric to use for failed slave AWS server - mysql

I have a master server running on a server in an independent data center, and a slave in AWS.
The replication failed with this error: "The incident LOST_EVENTS occured on the master. Message: error writing to the binary log".
Last time it went offline, write throughput jumped from 4 KB/s to 40 KB/s, and steadily increased to 252 KB/s over a couple of weeks.
1) I'm wondering why write throughput would increase steadily after the failure?
2) I'm wondering what metric within CloudWatch could be used to send me an SNS email when it does fail? Right now, I'm thinking the best thing to do is to run a simple bash script on the master that compares Master_Log_File to Relay_Master_Log_File from 'show slave status;', and to forgo CloudWatch altogether.
edit update script:
Here's my script that I run every 10 minutes to check on the slave state (until I find an alternative metric in CloudWatch):
#!/bin/bash
a=$(mysql --host=*amazonaws.com --port=3306 -u whatever -ppass -N -B -e "show slave status;")
e=$(echo "$a" | awk -F\\t '{print $12}') #Slave_SQL_Running
d=$(echo "$a" | awk -F\\t '{print $26}') #Seconds_Behind_Master
if [ "$e" != 'Yes' ]; then
echo -e "slave mysql server down \n slave SQL running: $e \n seconds behind master: $d" | mail -s 'slave mysql server down' admin#email.com
fi

I didn't find a good metric in CloudWatch, so I made this script, which checks the slave status every 10 minutes through cron. It sends an email if it finds Slave_SQL_Running or Slave_IO_Running != 'Yes':
#!/bin/bash
a=$(mysql --host=host --port=3306 -u master -ppword -N -B -e "show slave status;")
b=$(echo "$a" | awk -F\\t '{print $6}') #Master_Log_File
c=$(echo "$a" | awk -F\\t '{print $10}') #Relay_Master_Log_File
e=$(echo "$a" | awk -F\\t '{print $12}') #Slave_SQL_Running
d=$(echo "$a" | awk -F\\t '{print $26}') #Seconds_Behind_Master
f=$(echo "$a" | awk -F\\t '{print $11}') #Slave_IO_Running
if [ "$e" != 'Yes' ] || [ "$f" != 'Yes' ]; then
echo -e "server id - slave mysql server down \n master log file: $b \n relay master log file: $c \n seconds behind master: $d \n Slave IO Running: $f \n Slave SQL Running: $e " | mail -s 'slave mysql server down' email#email.com
fi
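If you still want the alert to flow through CloudWatch and SNS instead of mail, the same cron job can publish the slave state as a custom metric and let a CloudWatch alarm do the notifying. A sketch, assuming the AWS CLI is configured on the box; the Custom/MySQL namespace and the metric names are placeholders of my choosing:
#!/bin/bash
a=$(mysql --host=host --port=3306 -u master -ppword -N -B -e "show slave status;")
e=$(echo "$a" | awk -F\\t '{print $12}') #Slave_SQL_Running
d=$(echo "$a" | awk -F\\t '{print $26}') #Seconds_Behind_Master
# Publish 1 while the SQL thread runs, 0 otherwise; alarm on < 1
if [ "$e" = 'Yes' ]; then ok=1; else ok=0; fi
aws cloudwatch put-metric-data --namespace "Custom/MySQL" --metric-name SlaveSQLRunning --value "$ok"
# Seconds_Behind_Master reads NULL when replication is broken; report 0 and rely on the other metric
[ "$d" = "NULL" ] && d=0
aws cloudwatch put-metric-data --namespace "Custom/MySQL" --metric-name SecondsBehindMaster --value "$d"
A CloudWatch alarm on SlaveSQLRunning < 1 (or on SecondsBehindMaster above a threshold) can then publish to an SNS topic that emails you.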

Related

What is this bash script trying to do with mysql?

The person who wrote the following bash script has left, and I need to figure out what it does. The script is executed in a Docker container before running some test cases that require a running MySQL instance.
I guess it is about starting a MySQL server, but I am not sure what each statement in the script does.
echo -n "Loading [+"
( echo "SHOW TABLES;" | mysql mysql 2>/dev/null 1>/dev/null ) || \
run-mysqld 2>/dev/null 1>/dev/null &
while ! ( echo "SHOW TABLES;" | mysql mysql 2>/dev/null 1>/dev/null ) ;
do
echo -n +
sleep 1
done
echo "] Done."
I had to figure this out because our Bitbucket pipeline recently gets stuck and times out when running this script (previously it was fine). Thanks in advance.
This sequence attempts to run SHOW TABLES through mysql, ignoring any output. If mysql fails (because mysqld isn't running), it starts mysqld in the background.
( echo "SHOW TABLES;" | mysql mysql 2>/dev/null 1>/dev/null ) || \
run-mysqld 2>/dev/null 1>/dev/null &
The second part of the code just waits for mysqld to start up, which is signaled by the following code exiting 0:
( echo "SHOW TABLES;" | mysql mysql 2>/dev/null 1>/dev/null )
If mysqld never comes up, the second part of the code will loop forever. It looks like that's what happened.
The simplest way to keep this code from hanging is to put a limit on how long we wait:
max_sleep=15
sleep=0
while ! ( echo "SHOW TABLES;" | mysql mysql 2>/dev/null 1>/dev/null ) ;
do
echo -n +
sleep 1
((sleep++ > max_sleep)) && { echo "Failed to start mysqld] Error."; exit 1; }
done
echo "] Done."
exit 0
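An alternative worth considering: mysqladmin ping exits 0 once the server is answering, which makes the readiness check more explicit than piping SHOW TABLES into mysql. A sketch along the same lines (run-mysqld is the same helper the original script uses; the 15-second cap is an arbitrary choice):
echo -n "Loading [+"
mysqladmin ping >/dev/null 2>&1 || run-mysqld >/dev/null 2>&1 &
for i in $(seq 1 15); do
    # Exit successfully as soon as the server responds
    mysqladmin ping >/dev/null 2>&1 && { echo "] Done."; exit 0; }
    echo -n +
    sleep 1
done
echo "Failed to start mysqld] Error."
exit 1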

How can I purge up to the completed slave binary log, minus 50?

I'm using bash to purge binary logs on the Master, only if the Slave has completed them. Instead, I'd like to leave a few more logs on the server. Can you help me turn mysql-bin.000345 into mysql-bin.000295 (subtracting 50 from 345)?
Here's my script:
# Fetch the current `Relay_Master_Log_File` from Slave
relay_master_log_file=$(ssh user@slave "mysql -hlocalhost -uroot -e 'SHOW SLAVE STATUS\G' | grep Relay_Master_Log_File | cut -f2- -d':' | sed -e 's/^[ \t]*//'")
# Purge binary logs on Master up to the current `Relay_Master_Log_File`
purge_cmd="PURGE BINARY LOGS TO '$relay_master_log_file';"
ssh user@master <<EOF
mysql -e "$purge_cmd"
EOF
But I'd like to keep 50 (or n) binary logs on Master instead.
Assuming you have mysql-bin.000345 in a variable, you could perform these steps:
Strip the mysql-bin. prefix along with the leading zeros, leaving only 345
Use Bash arithmetic $((...)) to subtract n from 345
Use printf to format the result zero-padded to six digits
For example:
n=50
oldname=mysql-bin.000345
num=$(shopt -s extglob; echo ${oldname##mysql-bin.+(0)})
newname=mysql-bin.$(printf '%06d' $((num - n)))
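With n=50, that turns mysql-bin.000345 into mysql-bin.000295. One caveat: if fewer than n logs exist, num - n can hit zero or go negative, so it's worth clamping the result to 1 before purging.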
You can get this from your first command itself using awk, and as a bonus skip grep, cut, and sed entirely.
# Fetch the current `Relay_Master_Log_File` from Slave
relay_master_log_file=$(ssh user@slave "mysql -hlocalhost -uroot -e 'SHOW SLAVE STATUS\G'" |
awk -F':[[:blank:]]*' '$1~/Relay_Master_Log_File/{split($2, a, /\./); printf "%s.%06d", a[1], a[2]-50}'
)
Based on the comments, you can instead run the whole pipeline on the slave:
ssh user@slave <<-'EOF'
mysql -hlocalhost -uroot -e 'SHOW SLAVE STATUS\G' |
awk -F':[[:blank:]]*' '$1~/Relay_Master_Log_File/{split($2, a, /\./);
printf "%s.%06d\n", a[1], a[2]-50}'
EOF
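Putting the pieces together, here is a sketch of the whole purge with a guard against the counter dropping below 1; hostnames, credentials, and the keep count are placeholders:
#!/bin/bash
n=50
# Current Relay_Master_Log_File from the slave, e.g. mysql-bin.000345
current=$(ssh user@slave "mysql -hlocalhost -uroot -e 'SHOW SLAVE STATUS\G'" |
    awk -F':[[:blank:]]*' '$1~/Relay_Master_Log_File/{print $2}')
prefix=${current%.*}
num=$(( 10#${current##*.} - n ))   # 10# forces base 10 despite the leading zeros
[ "$num" -lt 1 ] && exit 0         # fewer than n logs completed; nothing to purge
target=$(printf '%s.%06d' "$prefix" "$num")
ssh user@master "mysql -e \"PURGE BINARY LOGS TO '$target';\""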

Ping flood (ping -f) vs default ping (e.g. ping -i 0.2)

We currently have a Zabbix app running on CentOS Linux that we use to log our network RTT and packet loss. We got into an internal discussion about what kind of ping we should use.
Ping Flood:
ping -f -c 500 -s 1500 -I {$HOSTVLAN2} {$DEST1IPSEC} | grep packet | awk -F" " '{print $6}' | sed -e s'/%//' -e '/^\s*$/d'
'Default' ping:
ping -c 100 -i 0.2 -s 1500 -I {$HOSTVLAN2} {$DEST1IPSEC} | grep packet | awk -F" " '{print $6}' | sed -e s'/%//' -e '/^\s*$/d'
That's a screen we made to compare the packet-loss results (screenshot not reproduced here).
So we would like an external view of this case. What do you think?
We ran into several topics regarding network load, DDoS, reliable packet-loss values, etc.
Thanks in advance.
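One side note on the pipelines themselves: the grep | awk | sed chain can be collapsed into a single awk call. A sketch, assuming the usual Linux ping summary line; scanning for the field ending in % keeps it working even when ping inserts an errors count and shifts the field positions:
ping -c 100 -i 0.2 -s 1500 -I {$HOSTVLAN2} {$DEST1IPSEC} |
    awk '/packet loss/ {for (i = 1; i <= NF; i++) if ($i ~ /%$/) {sub(/%/, "", $i); print $i}}'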

How to log mysql queries of specific database - Linux

I have been looking at this post
How can I log "show processlist" when there are more than n queries?
It works fine when I run this command:
mysql -uroot -e "show full processlist" | tee plist-$date.log | wc -l
The problem is that it overwrites the file.
I also want to run it as a cron job.
I have added this command to the /var/spool/cron/root:
* * * * * [ $(mysql -uroot -e "show full processlist" | tee plist-`date +%F-%H-%M`.log | wc -l) -lt 51 ] && rm plist-`date +%F-%H-%M`.log
But it is not working. Or maybe it is saving the log file somewhere outside the root folder.
So my question is: how can I temporarily log all queries against a specific database and a specific table, and save all of them in one file?
Note: it is not the slow/long query log I am looking for, just a temporary way to see which queries are running against a database.
The solution I ended up with is:
watch -n 1 "mysqladmin -u root -pXXXXX processlist | grep tablename" | tee -a /root/plist.log
The % character has special meaning in crontab commands, so you need to escape it:
* * * * * [ $(mysql -uroot -e "show full processlist" | tee plist-`date +\%F-\%H-\%M`.log | wc -l) -lt 51 ] && rm plist-`date +\%F-\%H-\%M`.log
If you want to use your original command, but not overwrite the file each time, you can use the -a option of tee to append:
mysql -uroot -e "show full processlist" | tee -a plist-$date.log | wc -l
To run the command every second for a minute, write a shell script:
#!/bin/bash
for i in {1..60}; do
[ $(mysql -uroot -e "show full processlist" | tee -a plist.log | wc -l) -lt 51 ] && rm plist.log
sleep 1
done
You can then run this script from cron every minute:
* * * * * /path/to/script
Although if you want to run something continuously like this, cron may not be the best way. You could use /etc/inittab to run the script when the system boots, and it will automatically restart it if it dies for some reason. Then you would just use an infinite loop:
#!/bin/bash
while :; do
[ $(mysql -uroot -e "show full processlist" | tee -a plist.log | wc -l) -lt 51 ] && rm plist.log
sleep 1
done
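One more option: polling the processlist once a second will still miss queries that finish between samples. If the goal is genuinely to capture every statement against one database for a short window, the general query log does that; a sketch, with the log path as a placeholder, and remembering to switch the log off again because it grows fast:
# Point the general query log at a known file and switch it on
mysql -uroot -e "SET GLOBAL general_log_file='/var/log/mysql/general.log'; SET GLOBAL general_log='ON';"
# ... let the workload run for the window you care about ...
sleep 60
mysql -uroot -e "SET GLOBAL general_log='OFF';"
# Keep only the statements touching the table of interest
grep -i 'tablename' /var/log/mysql/general.log > /root/plist.log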

Removing old MySQL / MariaDB backups in Bash Backup Script

I've written a bash script, run from cron, that backs up all databases on a particular machine nightly and weekly. The script correctly removes old backups, except when the month changes.
As an example, let's say it's November 2nd. The script runs at 11:00pm and correctly removes the backup made on November 1st. But come December 1st, the script gets confused and does not remove the backup made on November 30th.
How can I fix this script to correctly remove the old backups in this case?
DATABASES=$(echo 'show databases;' | mysql -u backup --password='(password)' | grep -v ^Database$)
LIST=$(echo $DATABASES | sed -e "s/\s/\n/g")
DATE=$(date +%Y%m%d)
DAYOLD=$(($DATE-1))
SUNDAY=$(date +%a)
WEEKOLD=$(($DATE-7))
for i in $LIST; do
if [[ $i != "mysql" ]]; then
mysqldump --single-transaction $i > /mnt/backups/mariadb/daily/$i.$DATE.sql
if [ -f /mnt/backups/mariadb/daily/$i.$DAYOLD.sql ]; then
rm -f /mnt/backups/mariadb/daily/$i.$DAYOLD.sql
fi
if [[ $SUNDAY == "Sun" ]]; then
cp /mnt/backups/mariadb/daily/$i.$DATE.sql /mnt/backups/mariadb/weekly/$i.$DATE.sql
rm -f /mnt/backups/mariadb/weekly/$i.$WEEKOLD.sql
fi
fi
done
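The root cause is that $(($DATE-1)) does plain integer arithmetic on a YYYYMMDD string, so on December 1st it computes a value ending in 1200 rather than November 30th. A minimal fix, assuming GNU date (as on CentOS and most Linux systems), is to let date do the calendar math and leave the rest of the script unchanged:
DATE=$(date +%Y%m%d)
DAYOLD=$(date -d "1 day ago" +%Y%m%d)    # handles month and year boundaries
WEEKOLD=$(date -d "7 days ago" +%Y%m%d)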
If you work with a fixed number of backups instead of dates, this gets much simpler: say you want to keep only the newest 30 backups and erase everything older. Counting files is easy in bash, while date arithmetic is pretty painful:
$ (ls -t | head -n 30; ls) | sort | uniq -u | xargs rm -rf   # files not among the 30 newest appear only once, so uniq -u selects them for deletion
You can then automate this by removing the oldest backups each day, so you always keep a fixed number of them:
#! /bin/bash
# Create new full backup
BACKUP_DIR="/path-to-backups/"
BACKUP_DAYS=1
# Prepare backup
cd ${BACKUP_DIR}
latest=`ls -rt | grep 201 | head -1`
# Change latest reference
ln -sf ${BACKUP_DIR}${latest} latest
# Cleanup: everything but the newest ${BACKUP_DAYS} backups
to_remove=`(ls -t | grep 201 | head -n ${BACKUP_DAYS}; ls) | sort | uniq -u`
echo "Cleaning up... $to_remove"
(ls -t|head -n ${BACKUP_DAYS};ls)|sort|uniq -u|xargs rm -rf
echo "Backup Finished"
exit 0
Then you can link it to daily cron. This is explained in this blog entry, how to do this stuff in a very straightforward fashion (but with hot backups, no mysqldump): http://codeispoetry.me/index.php/mariadb-daily-hot-backups-with-xtrabackup/
I was making this too complicated. Instead of using the date at all, I'm just checking the age of the backup files with find:
find /mnt/backups/mariadb/weekly/* -type f -mtime +8 -exec rm -f {} \;
So the entire script becomes:
DATABASES=$(echo 'show databases;' | mysql -u backup --password='foo' | grep -v ^Database$)
LIST=$(echo $DATABASES | sed -e "s/\s/\n/g")
DATE=$(date +%Y%m%d)
SUNDAY=$(date +%a)
for i in $LIST; do
if [[ $i != "mysql" ]]; then
/bin/nice mysqldump --single-transaction $i > /mnt/backups/mariadb/daily/$i.$DATE.sql
find /mnt/backups/mariadb/daily/* -type f -mtime +1 -exec rm -f {} \;
if [[ $SUNDAY == "Sun" ]]; then
cp /mnt/backups/mariadb/daily/$i.$DATE.sql /mnt/backups/mariadb/weekly/$i.$DATE.sql
find /mnt/backups/mariadb/weekly/* -type f -mtime +8 -exec rm -f {} \;
fi
fi
done
chown -R backup:backup /mnt/backups
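One caveat on the find-based cleanup: -mtime +1 matches files whose age, truncated to whole 24-hour periods, is greater than 1, i.e. files at least 48 hours old, so roughly two daily backups stick around rather than one. If a strict 24-hour cutoff is wanted, GNU find's -mmin +1440 does that.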