Tcl Expect script - Spawned process from forked child process never returns EOF

I've run into another strange behavior I can't find an answer for. It's confusing because I'm seeing the problem with the simplest possible code, nearly straight out of the Exploring Expect book, and no one else seems to have run into it.
Again, I've boiled this down to a very simple script. If I run it with no arguments, it does not fork off a child and kill the parent. If I pass it any argument at all, it uses fork to create a child. Run as a plain parent process, it works as expected. But when the same code runs in the forked child, the expect command doesn't seem to be connected to anything: it never receives any output from the spawned process. I just don't know enough about what is going on under the hood to debug this.
The simple (updated) script:
package require Expect

puts "Tcl version : [info tclversion]"
puts "Expect version: [exp_version]"

# Any passed-in argument enables forking to a child process:
#   fork_cat : forks, then spawns a process that simply cats a text file to stdout
#   fork_wish: forks, then spawns wish, which opens the Tk window as well as
#              the interactive % prompt of the interpreter
if {$argc > 0} {
    while {1} {
        # If forking fails, retry every 10 seconds until it succeeds.
        if {[catch fork child_pid] == 0} {
            break
        }
        sleep 10
    }
    if {[lindex $argv 0] == "fork_wish"} {
        # Delay so a process-tree snapshot can be captured with both parent and child alive
        sleep 20
    }
    # Kill the parent process to return terminal control to the shell
    if {$child_pid != 0} {
        puts "[pid] Parent process exiting..."
        exit
    }
    if {[lindex $argv 0] == "fork_wish"} {
        # Delay so a process-tree snapshot can be captured of only the child after the parent exits
        sleep 20
    }
    # Redefine the exit handler for the child so the process is killed for sure on exit.
    # I have no idea why a plain exit doesn't work for a child process, but this
    # seems to ensure it goes away on exit.
    exit -onexit {
        puts "[pid] Killing PID..."
        exec kill [pid]
    }
}

sleep 1

# Show stty output in case it is relevant to debugging
puts ""
stty -a
puts ""

# Spawn a process according to the argument
switch -exact [lindex $argv 0] {
    "fork_cat" {
        set spawned_pid [spawn -noecho cat 123.txt]
    }
    "fork_wish" {
        set spawned_pid [spawn -noecho wish]
    }
    default {
        set spawned_pid [spawn -noecho cat 123.txt]
    }
}

while {1} {
    expect {
        eof {
            puts ""
            puts "[pid] Process received EOF from spawned process"
            break
        }
        timeout {
            puts ""
            puts "[pid] Process expect timed out for spawned process"
        }
    }
}

puts ""
puts "[pid] Process exiting..."
exit
Running "cat" without any forking (normal foreground process):
:> temp_eof
Tcl version : 8.4
Expect version: 5.43.0
speed 38400 baud; line = 0;
-brkint ixoff -imaxbel
123
123
123
123
123
15060 Process received EOF from spawned process
15060 Process exiting...
Running "cat" without any forking (background process):
:> temp_eof &
[1] 15081
:> Tcl version : 8.4
Expect version: 5.43.0
speed 38400 baud; line = 0;
eof = <undef>; susp = <undef>; rprnt = <undef>; werase = <undef>; lnext = <undef>; min = 1; time = 0;
-brkint inlcr ixoff -imaxbel
-icanon -iexten -echo -echok
123
123
123
123
123
15081 Process received EOF from spawned process
15081 Process exiting...
[1] + Suspended (tty output) temp_eof
[lc-bun-019: AB: ~/bin]
:> fg
temp_eof
Running "cat" with forking:
:> temp_eof fork_cat
Tcl version : 8.4
Expect version: 5.43.0
15121 Parent process exiting...
:>
speed 38400 baud; line = 0;
eof = <undef>; susp = <undef>; rprnt = <undef>; werase = <undef>; lnext = <undef>; min = 1; time = 0;
-brkint inlcr ixoff -imaxbel
-icanon -iexten -echo -echok
15123 Process expect timed out for spawned process
15123 Process expect timed out for spawned process
15123 Process expect timed out for spawned process
15123 Process expect timed out for spawned process
15123 Process expect timed out for spawned process
15123 Process expect timed out for spawned process
15123 Process expect timed out for spawned process
:> pkill temp_eof
So the question is: why does expect work fine from a normal foreground process, but apparently receive no output from the spawned process when run from a forked child? I'm also not sure why the non-forked backgrounded version suspends on tty output when it exits. But you can now see how similar the stty output of a normally started background process is to that of the forked child process that does the spawning.
I have verified that the spawn command really does run from the forked child process: I changed the command from cat to something like touch, which created a file as evidence it was actually spawned. So either the spawned process has its output directed somewhere else, or expect isn't correctly seeing it. My feeling is that it's the former, and that while spawn is successfully creating a process, its stdin/stdout may be pointing at /dev/null. I'm just not experienced enough to debug further; I know enough to get myself into trouble, but apparently not enough to get myself out of it.
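(One way to check whether expect is reading anything at all from the pty is Expect's built-in diagnostic mode; exp_internal and log_user are standard Expect commands. A minimal sketch, assuming the same fork/spawn layout as the script above:
package require Expect
exp_internal 1   ;# print raw reads from the pty and every match attempt to stderr
log_user 1       ;# echo the dialogue with the spawned process to stdout
set spawned_pid [spawn -noecho cat 123.txt]
expect {
    eof     { puts "got EOF" }
    timeout { puts "timed out" }
}
If exp_internal stays silent in the forked case, nothing is arriving from the spawned process at all.)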
I also wanted to capture snapshots of the process tree at three points in time: one while the parent and child are both alive, another after the parent has died and only the child remains, and finally while the child and its spawned process are both alive. For this I switched to spawning something GUI-like that stays open until you close it; wish seemed perfect. During the 20-second delays I ran "ps axjf" from a separate shell and redirected it into a file. The important parts of that output are shown below the captured stdout.
First the shell output from this run of "fork_wish":
:> temp_eof fork_wish
Tcl version : 8.4
Expect version: 5.43.0
29172 Parent process exiting...
:>
speed 38400 baud; line = 0;
eof = <undef>; susp = <undef>; rprnt = <undef>; werase = <undef>; lnext = <undef>; min = 1; time = 0;
-brkint inlcr ixoff -imaxbel
-icanon -iexten -echo -echok
29174 Process expect timed out for spawned process
29174 Process expect timed out for spawned process
29174 Process expect timed out for spawned process
29174 Process expect timed out for spawned process
29174 Process expect timed out for spawned process
29174 Process expect timed out for spawned process
29174 Process expect timed out for spawned process
:> pkill temp_eof
First snapshot (both parent and child alive):
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 24566 24564 20131 pts/30 20131 S 9276 0:00 konsole -T Main /dev/null
24566 24569 24569 24569 pts/35 29172 Ss 9276 0:00 \_ -bin/tcsh
24569 29172 29172 24569 pts/35 29172 Sl+ 9276 0:00 | \_ /usr/local/bin/tclsh temp_eof fork_wish
29172 29174 29172 24569 pts/35 29172 S+ 9276 0:00 | \_ /usr/local/bin/tclsh temp_eof fork_wish
24566 26530 26530 26530 pts/36 29175 Ss 9276 0:00 \_ -bin/tcsh
26530 29175 29175 26530 pts/36 29175 R+ 9276 0:00 \_ ps axjf
Second snapshot (child alive, parent dead):
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 24566 24564 20131 pts/30 20131 S 9276 0:00 konsole -T Main /dev/null
24566 24569 24569 24569 pts/35 24569 Ss+ 9276 0:00 \_ -bin/tcsh
24566 26530 26530 26530 pts/36 29176 Ss 9276 0:00 \_ -bin/tcsh
26530 29176 29176 26530 pts/36 29176 R+ 9276 0:00 \_ ps axjf
1 29174 29172 24569 pts/35 24569 S 9276 0:00 /usr/local/bin/tclsh temp_eof fork_wish
Third snapshot (child and spawn alive):
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 24566 24564 20131 pts/30 20131 S 9276 0:00 konsole -T Main /dev/null
24566 24569 24569 24569 pts/35 24569 Ss+ 9276 0:00 \_ -bin/tcsh
24566 26530 26530 26530 pts/36 29192 Ss 9276 0:00 \_ -bin/tcsh
26530 29192 29192 26530 pts/36 29192 R+ 9276 0:00 \_ ps axjf
1 29174 29172 24569 pts/35 24569 S 9276 0:00 /usr/local/bin/tclsh temp_eof fork_wish
29174 29178 29178 29178 pts/37 29178 Ssl+ 9276 0:00 \_ /usr/bin/wish
The thing that looks strange to me here is that the spawned process has a new, unique TTY of pts/37, while the child process has a TTY of pts/35 (which matches the tcsh TTY that started it). This is consistent with the child still being able to write text to stdout, while the spawned process can't seem to send anything anywhere. Is the unique pts/37 TTY expected for a process spawned from Expect? I remember reading that Expect creates a new pseudo-terminal for communication with its spawned processes. But if that's the case, why is Expect losing communication with its spawned process connected to pts/37?
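(For reference, Expect does allocate a fresh pty pair for every spawn, and spawn records the slave side's name in spawn_out(slave,name), so the script itself can report which pty the spawned process got. A small sketch, reusing the spawn call from above:
package require Expect
set spawned_pid [spawn -noecho cat 123.txt]
# spawn sets spawn_out(slave,name) to the pty slave device it allocated,
# e.g. /dev/pts/37, which should match the TTY column shown by ps.
puts "[pid] spawned pty: $spawn_out(slave,name)"
)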
I've tried to gather more clues and information, but something still seems very fishy here. This is pretty much exactly what Expect is designed to do, and the behavior just doesn't line up with my understanding. Do others see this same behavior if you run my simple example script on your setup?
Thanks in advance for any help or suggestions.

Use expect_user instead of expect; that worked for me. From the man page:
expect_user [expect_args]
    is like expect but it reads characters from stdin (i.e.
    keystrokes from the user). By default, reading is performed in
    cooked mode. Thus, lines must end with a return in order for
    expect to see them. This may be changed via stty (see the stty
    command below).
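A minimal sketch of that substitution, keeping the question's loop structure (untested here beyond the claim above; only the expect command changes to expect_user):
while {1} {
    expect_user {
        eof {
            puts ""
            puts "[pid] Process received EOF"
            break
        }
        timeout {
            puts ""
            puts "[pid] Process expect_user timed out"
        }
    }
}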

Related

Issues installing IDAS on CentOS 7 VM through provided RPMs

I've been trying to install the IDAS GE in a CentOS 7 VM on my machine through the UL2.0 RPMs (download link) available on its catalogue page.
I followed the instructions on GitHub, but I get stuck starting the IoTAgent as per section 3 of the Deployment section of the instructions. If I execute init_iotagent.sh, where I inserted the local IP of the VM, I get the error:
log4cplus:ERROR No appenders could be found for logger (main).
log4cplus:ERROR Please initialize the log4cplus system properly.
HTTPFilter DESTRUCTOR 0
HTTPFilter DESTRUCTOR 0
Also, in the instructions for Starting IoTAgent as a Service, it's said that:
After installing iot-agent-base RPM an init.d script can be found in
this folder /usr/local/iot/init.d .
But this file is not there, leading me to believe that the IoTAgent wasn't installed properly from the provided RPMs.
Also, I can't find any log files for the IoTAgent; only MongoDB has a log file, at /usr/local/iot/mongodb-linux-x86_64-2.6.9/log/mongoc.log.
If anyone could help, it would be appreciated. If more info is needed, please let me know.
Thank you
I recommend cloning the GitHub repository and building the RPMs from source, then installing them on your CentOS, as explained in the documentation:
NOTE: I changed the BUILD_TYPE to Release, so I created the Release dir.
GIT_VERSION and GIT_COMMIT are not the latest ones.
git clone https://github.com/telefonicaid/fiware-IoTAgent-Cplusplus.git
cd fiware....
mkdir -p build/Release
cd build/Release
cmake -DGIT_VERSION=20527 -DGIT_COMMIT=217023407f25ed258043cfc00a46b6c05fb0b52c -DMQTT=ON -DCMAKE_BUILD_TYPE=Release ../../
make install
make package
The packages will be in pack/Linux/RPM/
rpm -i iot-agent-base-xxxxxxx (xxxxxxx will be the numbers of the build)
rpm -i iot-agent-ul-xxxxxx (xxxxxxx will be the numbers of the build)
Once installed with RPMs the init.d file is in: /usr/local/iot/init.d/iotagent
This is the content of the file:
#!/bin/bash
# Copyright 2015 Telefonica Investigación y Desarrollo, S.A.U
#
# This file is part of fiware-IoTagent-Cplusplus (FI-WARE project).
#
# iotagent    Start/Stop iotagent
#
# chkconfig: 2345 99 60
# description: iotagent

. /etc/rc.d/init.d/functions

PARAM=$1
INSTANCE=$2
USERNAME=iotagent
EXECUTABLE=/usr/local/iot/bin/iotagent
CONFIG_PATH=/usr/local/iot/config

iotagent_start()
{
    local result=0
    local instance=${1}

    if [[ ! -x ${EXECUTABLE} ]]; then
        printf "%s\n" "Fail - missing ${EXECUTABLE} executable"
        exit 1
    fi

    if [[ -z ${instance} ]]; then
        list_instances="${CONFIG_PATH}/iotagent_*.conf"
    else
        list_instances="${CONFIG_PATH}/iotagent_${instance}.conf"
    fi

    for instance_config in ${list_instances}
    do
        local NAME
        NAME=${instance_config%.conf}
        NAME=${NAME#*iotagent_}

        source ${instance_config}

        local IOTAGENT_PID_FILE="/var/run/iot/iotagent_${NAME}.pid"

        printf "Starting iotagent ${NAME}..."

        status -p ${IOTAGENT_PID_FILE} ${EXECUTABLE} &> /dev/null
        if [[ ${?} -eq 0 ]]; then
            printf "%s\n" " Already running, skipping $(success)"
            continue
        fi

        # Load the environment
        set -a
        source ${instance_config}

        # Mandatory parameters
        IOTAGENT_OPTS=" ${IS_MANAGER} \
            -n ${IOTAGENT_SERVER_NAME} \
            -v ${IOTAGENT_LOG_LEVEL} \
            -i ${IOTAGENT_SERVER_ADDRESS} \
            -p ${IOTAGENT_SERVER_PORT} \
            -d ${IOTAGENT_LIBRARY_DIR} \
            -c ${IOTAGENT_CONFIG_FILE}"

        su ${USERNAME} -c "LD_LIBRARY_PATH=\"${IOTAGENT_LIBRARY_DIR}\" \
            ${EXECUTABLE} ${IOTAGENT_OPTS} & echo \$! > ${IOTAGENT_PID_FILE}" &> /dev/null

        sleep 2 # wait some time to let iotagent start

        local PID=$(cat ${IOTAGENT_PID_FILE})
        local var_pid=$(ps -ef | grep ${PID} | grep -v grep)
        if [[ -z "${var_pid}" ]]; then
            printf "%s" "pidfile not found"
            printf "%s\n" "$(failure)"
            exit 1
        else
            printf "%s\n" "$(success)"
        fi
    done

    return ${result}
}

iotagent_stop()
{
    local result=0
    local iotagent_instance=${1}

    if [[ -z ${iotagent_instance} ]]; then
        list_run_instances="/var/run/iot/iotagent_*.pid"
    else
        list_run_instances="/var/run/iot/iotagent_${iotagent_instance}.pid"
    fi

    if [[ $(ls -l ${list_run_instances} 2> /dev/null | wc -l) -eq 0 ]]; then
        printf "%s\n" "There aren't any instances of IoTAgent ${iotagent_instance} running $(success)"
        return 0
    fi

    for run_instance in ${list_run_instances}
    do
        local NAME
        NAME=${run_instance%.pid}
        NAME=${NAME#*iotagent_}

        printf "%s" "Stopping IoTAgent ${NAME}..."

        local RUN_PID=$(cat ${run_instance})
        kill ${RUN_PID} &> /dev/null
        local KILLED_PID=$(ps -ef | grep ${RUN_PID} | grep -v grep | awk '{print $2}')
        if [[ -z ${KILLED_PID} ]]; then
            printf "%s\n" "$(success)"
        else
            printf "%s\n" "$(failure)"
            result=$((${result}+1))
        fi

        rm -f ${run_instance} &> /dev/null
    done

    return ${result}
}

iotagent_status()
{
    local result=0
    local iotagent_instance=${1}

    if [[ -z ${iotagent_instance} ]]; then
        list_run_instances="/var/run/iot/iotagent_*.pid"
    else
        list_run_instances="/var/run/iot/iotagent_${iotagent_instance}.pid"
    fi

    if [[ $(ls -l ${list_run_instances} 2> /dev/null | wc -l) -eq 0 ]]; then
        printf "%s\n" "There aren't any instances of IoTAgent ${iotagent_instance} running."
        return 1
    fi

    for run_instance in ${list_run_instances}
    do
        local NAME
        NAME=${run_instance%.pid}
        NAME=${NAME#*iotagent_}

        printf "%s\n" "IoTAgent ${NAME} status..."
        status -p ${run_instance} ${NODE_EXEC}
        result=$((${result}+${?}))
    done

    return ${result}
}

case ${PARAM} in
    'start')
        iotagent_start ${INSTANCE}
        ;;
    'stop')
        iotagent_stop ${INSTANCE}
        ;;
    'restart')
        iotagent_stop ${INSTANCE}
        iotagent_start ${INSTANCE}
        ;;
    'status')
        iotagent_status ${INSTANCE}
        ;;
esac
And the log files are in /tmp/:
IoTAgent-IoTPlatform.log
IoTAgent.log
IoTAgent-Manager.log
Hope this helps you.

How to start and stop PingFederate using a Unix script

I need a script to start and stop the PingFederate server on Unix, and I'm looking for a best practice for starting and stopping PingFederate from a script.
http://documentation.pingidentity.com/display/PF72/Running+PingFederate+as+a+Service has:
#! /bin/sh

start() {
    echo "starting PingFederate.."
    su - <pf_user> \
        -c '<pf_install>/pingfederate/sbin/pingfederate-run.sh \
            > /dev/null 2> /dev/null'
}

stop() {
    echo "stopping PingFederate.."
    su - <pf_user> \
        -c '<pf_install>/pingfederate/sbin/\
pingfederate-shutdown.sh'
}

restart(){
    stop
    # padding time to stop before restart
    sleep 60
    # To protect against any services that are not stopped,
    # uncomment the following command.
    # (Warning: this kills all Java instances running as
    # <pf_user>.)
    # su - <pf_user> -c 'killall java'
    start
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    *)
        echo "Usage: <pf_user> {start|stop|restart}"
        exit 1
esac
exit 0

Send extra string netcat

I use tcpdump on OpenWrt to capture packets and send them to a Raspberry Pi with netcat.
The problem is that I want to use multiple routers to capture the requests and forward them to the Raspberry Pi.
tcpdump -i wlan0 -e -s 256 -l type mgt subtype probe-req |nc 192.168.0.230 22222
And I receive the packet info with a Python script:
import socket

HOST = 'localhost'  # use '' to expose to all networks
PORT = 12345

def incoming(host, port):
    """Open specified port and return file-like object"""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # set SOL_SOCKET.SO_REUSEADDR=1 to reuse the socket if
    # needed later without waiting for timeout (after it is
    # closed, for example)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(0)  # do not queue connections
    request, addr = sock.accept()
    return request.makefile('r', 0)

# /-- network ---
for line in incoming(HOST, PORT):
    print line,
output:
15:17:57 801928 3933710786us tsft 1.0 Mb/s 2412 Mhz 11b -38dB signal antanna 1 BSSID: broadcast SA:xxxx ....
desired output:
192.168.0.130 15:17:57 801928 3933710786us tsft 1.0 Mb/s 2412 Mhz 11b -38dB signal antanna 1 BSSID: broadcast SA:xxxx ....
But how can I add the IP address of the router to the command, so I can see which router received the packet?
Or how can I just send an extra string like "router1" to identify the router?
You can prepend an extra string with the script below:
#! /bin/bash
ip=$(ifconfig wlan0 | grep cast | awk -F: '{print $2}' | awk '{print $1}')
tcpdump -i wlan0 -e -s 256 -l type mgt subtype probe-req |\
while read line; do
    echo "$ip" "$(date +%T)" "$line"
done | nc 192.168.0.230 22222
It will insert the IP address and a timestamp at the beginning of each line of tcpdump's output and pipe it to netcat.

pacemaker can't start my zabbix service when I stop zabbix service

I want to use corosync + pacemaker + zabbix to achieve high availability. Below is my config:
crm(live)configure# show
node zabbix1 \
        attributes standby="off" timeout="60"
node zabbix2 \
        attributes standby="off"
primitive httpd lsb:httpd \
        op monitor interval="10s"
primitive vip ocf:heartbeat:IPaddr \
        params ip="192.168.56.110" nic="eth0" cidr_netmask="24" \
        op monitor interval="10s"
primitive zabbix-ha lsb:zabbix_server \
        op monitor interval="30s" timeout="20s" \
        op start interval="0s" timeout="40s" \
        op stop interval="0s" timeout="60s"
group webservice vip httpd zabbix-ha
property $id="cib-bootstrap-options" \
        dc-version="1.1.8-7.el6-394e906" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        last-lrm-refresh="1377489711" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
and my crm_mon status is:
Last updated: Mon Aug 26 18:52:48 2013
Last change: Mon Aug 26 18:52:33 2013 via cibadmin on zabbix1
Stack: classic openais (with plugin)
Current DC: zabbix1 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
3 Resources configured.
Node zabbix1: online
httpd (lsb:httpd): Started
vip (ocf::heartbeat:IPaddr): Started
zabbix-ha (lsb:zabbix_server): Started
Node zabbix2: online
Now I stop the zabbix-ha service on zabbix1. Even after waiting 300s, pacemaker doesn't start my zabbix-ha service again:
[root@zabbix1 tmp]# ps -ef|grep zabbix
root 13287 31252 0 18:59 pts/2 00:00:00 grep zabbix
I can use crm resource stop/start zabbix-ha to stop and start zabbix-ha manually.
I'm not using Zabbix's default script (at zabbix-2.0.6/misc/init.d/fedora/core/zabbix_server); I created the LSB script myself and put it in /etc/init.d. Here is my script for zabbix_server:
#!/bin/bash
#
# zabbix: Control the zabbix Daemon
#
# author: Denglei
#
# blog: http://dl528888.blog.51cto.com/
#
# description: This is an init.d script for zabbix. Tested on CentOS6. \
#              Change DAEMON and PIDFILE if necessary.
#

# Location of the zabbix binary. Change the path as necessary.
DAEMON=/usr/local/zabbix/sbin/zabbix_server
NAME=`basename $DAEMON`

# Pid file of zabbix; should match the pid directive in the zabbix config file.
PIDFILE=/tmp/$NAME.pid

# This file's location
SCRIPTNAME=/etc/init.d/$NAME

# Only run if the binary can be found
test -x $DAEMON || exit 0

RETVAL=0

start() {
    echo $"Starting $NAME"
    $DAEMON
    RETVAL=0
}

stop() {
    echo $"Gracefully stopping $NAME"
    [ -s "$PIDFILE" ] && kill -QUIT `cat $PIDFILE`
    RETVAL=0
}

forcestop() {
    echo $"Quick stopping $NAME"
    [ -s "$PIDFILE" ] && kill -TERM `cat $PIDFILE`
    RETVAL=$?
}

reload() {
    echo $"Gracefully reloading $NAME configuration"
    [ -s "$PIDFILE" ] && kill -HUP `cat $PIDFILE`
    RETVAL=$?
}

status() {
    if [ -s $PIDFILE ]; then
        echo $"$NAME is running."
        RETVAL=0
    else
        echo $"$NAME stopped."
        RETVAL=3
    fi
}

# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    force-stop)
        forcestop
        ;;
    restart)
        stop
        start
        ;;
    reload)
        reload
        ;;
    status)
        status
        ;;
    *)
        echo $"Usage: $0 {start|stop|force-stop|restart|reload|status}"
        exit 1
esac
exit $RETVAL

Better script to restart mysql on Ubuntu 8.04

When I run sudo /etc/init.d/mysql restart on Ubuntu 8.04.2, sometimes a mysqld_safe process remains, eating 99% of the CPU and making the machine practically unusable.
Is there a better way to restart MySQL? I thought about writing a script:
sudo /etc/init.d/mysql stop
sleep 10
sudo killall mysqld_safe
sudo /etc/init.d/mysql start
But this would be an evil workaround (and the script is just a quick shot).
I googled and found that mysqld_safe is a wrapper script which starts mysqld and makes sure it gets restarted if it should die. So there should be a better way to restart the thing.
I also found that this is a common problem on this Ubuntu version. Are Debian/Ubuntu doing something wrong here? The /etc/init.d script looks quite sophisticated, and it deals with mysqld_safe too, but my skills aren't good enough to fully understand it. It would be the best place to improve, though. This is a paste of the (untouched) version on my machine:
#!/bin/bash
#
### BEGIN INIT INFO
# Provides:          mysql
# Required-Start:    $remote_fs $syslog mysql-ndb
# Required-Stop:     $remote_fs $syslog mysql-ndb
# Should-Start:      $network $named $time
# Should-Stop:       $network $named $time
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start and stop the mysql database server daemon
# Description:       Controls the main MySQL database server daemon "mysqld"
#                    and its wrapper script "mysqld_safe".
### END INIT INFO
#
set -e
set -u
${DEBIAN_SCRIPT_DEBUG:+ set -v -x}

test -x /usr/sbin/mysqld || exit 0

. /lib/lsb/init-functions

SELF=$(cd $(dirname $0); pwd -P)/$(basename $0)
CONF=/etc/mysql/my.cnf
MYADMIN="/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf"

# priority can be overridden and "-s" adds output to stderr
ERR_LOGGER="logger -p daemon.err -t /etc/init.d/mysql -i"

# Safeguard (relative paths, core dumps..)
cd /
umask 077

# mysqladmin likes to read /root/.my.cnf. This is usually not what I want
# as many admins e.g. only store a password without a username there and
# so break my scripts.
export HOME=/etc/mysql/

## Fetch a particular option from mysql's invocation.
#
# Usage: void mysqld_get_param option
mysqld_get_param() {
    /usr/sbin/mysqld --print-defaults \
        | tr " " "\n" \
        | grep -- "--$1" \
        | tail -n 1 \
        | cut -d= -f2
}

## Do some sanity checks before even trying to start mysqld.
sanity_checks() {
    # check for config file
    if [ ! -r /etc/mysql/my.cnf ]; then
        log_warning_msg "$0: WARNING: /etc/mysql/my.cnf cannot be read. See README.Debian.gz"
        echo "WARNING: /etc/mysql/my.cnf cannot be read. See README.Debian.gz" | $ERR_LOGGER
    fi

    # check for diskspace shortage
    datadir=`mysqld_get_param datadir`
    if LC_ALL=C BLOCKSIZE= df --portability $datadir/. | tail -n 1 | awk '{ exit ($4>4096) }'; then
        log_failure_msg "$0: ERROR: The partition with $datadir is too full!"
        echo "ERROR: The partition with $datadir is too full!" | $ERR_LOGGER
        exit 1
    fi
}

## Checks if there is a server running and if so if it is accessible.
#
# check_alive insists on a pingable server
# check_dead also fails if there is a lost mysqld in the process list
#
# Usage: boolean mysqld_status [check_alive|check_dead] [warn|nowarn]
mysqld_status () {
    ping_output=`$MYADMIN ping 2>&1`; ping_alive=$(( ! $? ))

    ps_alive=0
    pidfile=`mysqld_get_param pid-file`
    if [ -f "$pidfile" ] && ps `cat $pidfile` >/dev/null 2>&1; then ps_alive=1; fi

    if [ "$1" = "check_alive" -a $ping_alive = 1 ] ||
       [ "$1" = "check_dead" -a $ping_alive = 0 -a $ps_alive = 0 ]; then
        return 0 # EXIT_SUCCESS
    else
        if [ "$2" = "warn" ]; then
            echo -e "$ps_alive processes alive and '$MYADMIN ping' resulted in\n$ping_output\n" | $ERR_LOGGER -p daemon.debug
        fi
        return 1 # EXIT_FAILURE
    fi
}

#
# main()
#
case "${1:-''}" in
    'start')
        sanity_checks;
        # Start daemon
        log_daemon_msg "Starting MySQL database server" "mysqld"
        if mysqld_status check_alive nowarn; then
            log_progress_msg "already running"
            log_end_msg 0
        else
            /usr/bin/mysqld_safe > /dev/null 2>&1 &
            # 6s was reported in #352070 to be too few when using ndbcluster
            for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
                sleep 1
                if mysqld_status check_alive nowarn ; then break; fi
                log_progress_msg "."
            done
            if mysqld_status check_alive warn; then
                log_end_msg 0
                # Now start mysqlcheck or whatever the admin wants.
                output=$(/etc/mysql/debian-start)
                [ -n "$output" ] && log_action_msg "$output"
            else
                log_end_msg 1
                log_failure_msg "Please take a look at the syslog"
            fi
        fi

        # Some warnings
        if $MYADMIN variables | egrep -q have_bdb.*YES; then
            echo "BerkeleyDB is obsolete, see /usr/share/doc/mysql-server-5.0/README.Debian.gz" | $ERR_LOGGER -p daemon.info
        fi
        if [ -f /etc/mysql/debian-log-rotate.conf ]; then
            echo "/etc/mysql/debian-log-rotate.conf is obsolete, see /usr/share/doc/mysql-server-5.0/NEWS.Debian.gz" | $ERR_LOGGER
        fi
        ;;

    'stop')
        # * As a passwordless mysqladmin (e.g. via ~/.my.cnf) must be possible
        #   at least for cron, we can rely on it here, too. (although we have
        #   to specify it explicit as e.g. sudo environments points to the normal
        #   users home and not /root)
        log_daemon_msg "Stopping MySQL database server" "mysqld"
        if ! mysqld_status check_dead nowarn; then
            set +e
            shutdown_out=`$MYADMIN shutdown 2>&1`; r=$?
            set -e
            if [ "$r" -ne 0 ]; then
                log_end_msg 1
                [ "$VERBOSE" != "no" ] && log_failure_msg "Error: $shutdown_out"
                log_daemon_msg "Killing MySQL database server by signal" "mysqld"
                killall -15 mysqld
                server_down=
                for i in 1 2 3 4 5 6 7 8 9 10; do
                    sleep 1
                    if mysqld_status check_dead nowarn; then server_down=1; break; fi
                done
                if test -z "$server_down"; then killall -9 mysqld; fi
            fi
        fi

        if ! mysqld_status check_dead warn; then
            log_end_msg 1
            log_failure_msg "Please stop MySQL manually and read /usr/share/doc/mysql-server-5.0/README.Debian.gz!"
            exit -1
        else
            log_end_msg 0
        fi
        ;;

    'restart')
        set +e; $SELF stop; set -e
        $SELF start
        ;;

    'reload'|'force-reload')
        log_daemon_msg "Reloading MySQL database server" "mysqld"
        $MYADMIN reload
        log_end_msg 0
        ;;

    'status')
        if mysqld_status check_alive nowarn; then
            log_action_msg "$($MYADMIN version)"
        else
            log_action_msg "MySQL is stopped."
            exit 3
        fi
        ;;

    *)
        echo "Usage: $SELF start|stop|restart|reload|force-reload|status"
        exit 1
        ;;
esac
I found many hints, but I would like a solution with a reasonable degree of reliability for production servers.
Edit: It seems to be exactly this unsolved bug.
Maybe it is this bug from the MySQL site.
This seems related or identical.
Some people talk of a race condition with two instances of mysqld_safe. Others suggest commenting out the sanity check in the startup script.
I would try to figure out what is causing the CPU issue, rather than investigate how to re-write the startup script. The startup script is fairly standard and should work well in a production environment.