Gunicorn service fails after some time on Ubuntu server

I have a Django app running on DigitalOcean, and a single cron job also runs on that server. After a random interval (anywhere from an hour to a day or two), the gunicorn service fails.
RAM: 512 MB
● gunicorn.service - gunicorn daemon
Loaded: loaded (/etc/systemd/system/gunicorn.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-09-12 07:10:58 UTC; 3min 47s ago
Main PID: 360334 (gunicorn)
Tasks: 4 (limit: 512)
Memory: 202.4M
CPU: 4.291s
CGroup: /system.slice/gunicorn.service
├─360334 /var/www/SlackBot/env/bin/python3 /var/www/SlackBot/env/bin/gunicorn --access-logfile - --workers 3 --bind unix:/var/w>
├─360335 /var/www/SlackBot/env/bin/python3 /var/www/SlackBot/env/bin/gunicorn --access-logfile - --workers 3 --bind unix:/var/w>
├─360336 /var/www/SlackBot/env/bin/python3 /var/www/SlackBot/env/bin/gunicorn --access-logfile - --workers 3 --bind unix:/var/w>
└─360337 /var/www/SlackBot/env/bin/python3 /var/www/SlackBot/env/bin/gunicorn --access-logfile - --workers 3 --bind unix:/var/w>
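The question leaves the cause open. One possibility worth ruling out, and it is only an assumption since no failure logs are shown, is memory pressure: the status output reports 202.4M for gunicorn on a 512 MB server that also runs a cron job. Below is a minimal sketch of a gunicorn config file (which is plain Python) that sizes the worker count from a memory budget and recycles workers periodically. The helper name suggest_workers and the per-worker and reserve figures are hypothetical, not measured values.

```python
# gunicorn.conf.py -- a sketch, not the poster's actual configuration.

def suggest_workers(total_mb, per_worker_mb, reserve_mb):
    """Return how many workers fit in RAM after reserving memory for the OS,
    the cron job, and anything else on the box. Always returns at least 1."""
    usable = total_mb - reserve_mb
    return max(1, usable // per_worker_mb)

# Assumed figures: 512 MB server, about 70 MB per worker (a rough guess from
# the 202.4M in the status output), 300 MB kept free for everything else.
workers = suggest_workers(total_mb=512, per_worker_mb=70, reserve_mb=300)

# Standard gunicorn settings: restart each worker after this many requests
# (plus random jitter) so slow memory growth cannot accumulate indefinitely.
max_requests = 1000
max_requests_jitter = 50
```

Such a file would be passed with gunicorn --config, as in the unit file of the related question below. Checking journalctl -u gunicorn and dmesg for out-of-memory kills around the time of a failure would confirm or rule out this theory.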

Related

gunicorn does not start after reboot

The advice given in "gunicorn does not start after boot" does not solve my similar problem.
I'm using Ubuntu Server 20.04.1.
My unit file, /etc/systemd/system/gunicorn.service:
[Unit]
Description=Gunicorn Web Server as Unit Service Systemd - slimzulu.hopto.me
After=network.target
[Service]
User=nols
Group=nols
WorkingDirectory=/home/nols/fastapi
Environment="PATH=/home/nols/fastapi/env/bin"
ExecStart=/home/nols/fastapi/env/bin/gunicorn --config /home/nols/fastapi/gunicorn.py main:app
Restart=always
[Install]
WantedBy=multi-user.target
To activate gunicorn.service:
sudo systemctl daemon-reload
sudo systemctl enable gunicorn.service
sudo systemctl start gunicorn.service
The status of gunicorn.service after boot (sudo systemctl status gunicorn.service):
● gunicorn.service - Gunicorn Web Server as Unit Service Systemd - slimzulu.hopto.me
Loaded: loaded (/etc/systemd/system/gunicorn.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2021-10-20 14:13:26 UTC; 3min 24s ago
Process: 681 ExecStart=/home/nols/fastapi/env/bin/gunicorn --config /home/nols/fastapi/gunicorn.py main:app (code=exited, status=3)
Main PID: 681 (code=exited, status=3)
Oct 20 14:12:52 ubuntuvirtual systemd[1]: Started Gunicorn Web Server as Unit Service Systemd - slimzulu.hopto.me.
Oct 20 14:13:26 ubuntuvirtual systemd[1]: gunicorn.service: Main process exited, code=exited, status=3/NOTIMPLEMENTED
Oct 20 14:13:26 ubuntuvirtual systemd[1]: gunicorn.service: Failed with result 'exit-code'.
Any ideas on how to figure out why the gunicorn service isn't starting after reboot?
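For reference, systemd prints status=3/NOTIMPLEMENTED because it labels exit codes with the LSB init-script names. Gunicorn itself uses exit code 3 for "worker failed to boot" and 4 for "application failed to load" (WORKER_BOOT_ERROR and APP_LOAD_ERROR in its arbiter). A small sketch that pulls the code out of a journal line and translates it; the function name and the wording of the mapping are illustrative:

```python
import re

# Exit codes defined in gunicorn's arbiter; anything else is reported as unknown.
GUNICORN_EXIT_CODES = {
    3: "worker failed to boot",
    4: "application failed to load",
}

_STATUS_RE = re.compile(r"status=(\d+)")

def explain_exit(journal_line):
    """Extract the numeric exit status from a systemd journal line and
    describe what gunicorn means by it. Returns None if no status is found."""
    match = _STATUS_RE.search(journal_line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, GUNICORN_EXIT_CODES.get(code, "not a gunicorn-specific code")

line = ("Oct 20 14:13:26 ubuntuvirtual systemd[1]: gunicorn.service: "
        "Main process exited, code=exited, status=3/NOTIMPLEMENTED")
print(explain_exit(line))
```

A worker boot failure often means a worker raised an exception while starting, for example a failing import or a dependency that is not yet available at boot. Running the ExecStart command by hand as the service user, or reading journalctl -u gunicorn.service, should show the traceback.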

slurmd.service is Failed & there is no PID file /var/run/slurmd.pid

I am trying to start slurmd.service with the commands below, but it does not stay up. I would be grateful for any help resolving this issue.
systemctl start slurmd
scontrol update nodename=fwb-lab-tesla1 state=idle
This is the slurmd.service unit file:
cat /usr/lib/systemd/system/slurmd.service
[Unit]
Description=Slurm node daemon
After=network.target munge.service
ConditionPathExists=/etc/slurm/slurm.conf
[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/slurmd
ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurmd.pid
KillMode=process
LimitNOFILE=51200
LimitMEMLOCK=infinity
LimitSTACK=infinity
[Install]
WantedBy=multi-user.target
And this is the status of the node:
$ sinfo
PARTITION   AVAIL TIMELIMIT NODES STATE NODELIST
gpucompute* up    infinite  1     drain fwb-lab-tesla1
$ sinfo -R
REASON         USER TIMESTAMP           NODELIST
Low RealMemory root 2020-09-28T16:46:28 fwb-lab-tesla1
$ sinfo -Nl
Thu Oct 1 14:00:10 2020
NODELIST       NODES PARTITION   STATE   CPUS S:C:T  MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
fwb-lab-tesla1 1     gpucompute* drained 32   32:1:1 64000  0        1      (null)   Low RealMemory
Here are the contents of slurm.conf:
$ cat /etc/slurm/slurm.conf
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=FWB-Lab-Tesla
#ControlAddr=137.72.38.102
#
MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
#SlurmUser=slurm
SlurmdUser=root
StateSaveLocation=/var/spool/slurm/StateSave
SwitchType=switch/none
TaskPlugin=task/cgroup
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
# Prevent very long time waits for mix serial/parallel in multi node environment
SchedulerParameters=pack_serial_at_end
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/filetxt
# Need slurmdbd for gres functionality
#AccountingStorageTRES=CPU,Mem,gres/gpu,gres/gpu:Titan
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
#
#
# COMPUTE NODES
GresTypes=gpu
#NodeName=fwb-lab-tesla[1-32] Gres=gpu:4 RealMemory=64000 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
#PartitionName=compute Nodes=fwb-lab-tesla[1-32] Default=YES MaxTime=INFINITE State=UP
#NodeName=fwb-lab-tesla1 NodeAddr=137.73.38.102 Gres=gpu:4 RealMemory=64000 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
NodeName=fwb-lab-tesla1 NodeAddr=137.73.38.102 Gres=gpu:4 RealMemory=64000 CPUs=32 State=UNKNOWN
PartitionName=gpucompute Nodes=fwb-lab-tesla1 Default=YES MaxTime=INFINITE State=UP
There is no slurmd.pid in the path below. It appears once when the system starts, but it is gone again after a few minutes.
$ ls /var/run/
abrt cryptsetup gdm lvm openvpn-server slurmctld.pid tuned
alsactl.pid cups gssproxy.pid lvmetad.pid plymouth sm-notify.pid udev
atd.pid dbus gssproxy.sock mariadb ppp spice-vdagentd user
auditd.pid dhclient-eno2.pid httpd mdadm rpcbind sshd.pid utmp
avahi-daemon dhclient.pid initramfs media rpcbind.sock sudo vpnc
certmonger dmeventd-client ipmievd.pid mount samba svnserve xl2tpd
chrony dmeventd-server lightdm munge screen sysconfig xrdp
console ebtables.lock lock netreport sepermit syslogd.pid xtables.lock
crond.pid faillock log NetworkManager setrans systemd
cron.reboot firewalld lsm openvpn-client setroubleshoot tmpfiles.d
[shirin@FWB-Lab-Tesla Seq2KMR33]$ systemctl status slurmctld
● slurmctld.service - Slurm controller daemon
Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2020-09-28 15:41:25 BST; 2 days ago
Main PID: 1492 (slurmctld)
CGroup: /system.slice/slurmctld.service
└─1492 /usr/sbin/slurmctld
Sep 28 15:41:25 FWB-Lab-Tesla systemd[1]: Starting Slurm controller daemon...
Sep 28 15:41:25 FWB-Lab-Tesla systemd[1]: Started Slurm controller daemon.
I try to start slurmd.service, but it returns to the failed state after a few minutes:
$ systemctl status slurmd
● slurmd.service - Slurm node daemon
Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
Active: failed (Result: timeout) since Tue 2020-09-29 18:11:25 BST; 1 day 19h ago
Process: 25650 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=0/SUCCESS)
CGroup: /system.slice/slurmd.service
└─2986 /usr/sbin/slurmd
Sep 29 18:09:55 FWB-Lab-Tesla systemd[1]: Starting Slurm node daemon...
Sep 29 18:09:55 FWB-Lab-Tesla systemd[1]: Can't open PID file /var/run/slurmd.pid (yet?) after start: No ...ctory
Sep 29 18:11:25 FWB-Lab-Tesla systemd[1]: slurmd.service start operation timed out. Terminating.
Sep 29 18:11:25 FWB-Lab-Tesla systemd[1]: Failed to start Slurm node daemon.
Sep 29 18:11:25 FWB-Lab-Tesla systemd[1]: Unit slurmd.service entered failed state.
Sep 29 18:11:25 FWB-Lab-Tesla systemd[1]: slurmd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
Log output of starting slurmd:
[2020-09-29T18:09:55.074] Message aggregation disabled
[2020-09-29T18:09:55.075] gpu device number 0(/dev/nvidia0):c 195:0 rwm
[2020-09-29T18:09:55.075] gpu device number 1(/dev/nvidia1):c 195:1 rwm
[2020-09-29T18:09:55.075] gpu device number 2(/dev/nvidia2):c 195:2 rwm
[2020-09-29T18:09:55.075] gpu device number 3(/dev/nvidia3):c 195:3 rwm
[2020-09-29T18:09:55.095] slurmd version 17.11.7 started
[2020-09-29T18:09:55.096] error: Error binding slurm stream socket: Address already in use
[2020-09-29T18:09:55.096] error: Unable to bind listen port (*:6818): Address already in use
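The log file states that slurmd cannot bind to its standard port, 6818, because something else is already using that address.
Do you have another slurmd running on this node, or is something else listening there? The status output above still lists a /usr/sbin/slurmd process (PID 2986) in the service's cgroup, which may be a leftover instance holding the port. Try netstat -tulpen | grep 6818 to see what is using the address.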
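If netstat is not installed, the same "Address already in use" condition can be reproduced from Python by attempting the bind yourself. This is a minimal sketch; the function name port_in_use is made up for illustration, and unlike netstat it only reports that the port is taken, not which process holds it.

```python
import errno
import socket

def port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails with
    EADDRINUSE, i.e. some other socket already holds that address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. permission denied; not the case being tested for
        return False

if __name__ == "__main__":
    # 6818 is the slurmd port named in the log output.
    print("6818 in use:", port_in_use(6818))
```

If this prints True while slurmd.service is reported as failed, something else (possibly a stray slurmd) is still bound to the port and has to be stopped before the unit can start cleanly.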

QEMU+Virt-manager can't connect to virtlxcd-sock

I've installed qemu, virt-manager, and libvirt on Linux Mint 20. I have an AMD FX(tm)-4300 Quad-Core Processor with AMD-V enabled in the BIOS. I've restarted many times, but virt-manager (Virtual Machine Manager) keeps saying:
Unable to connect to libvirt lxc:///.
Failed to connect socket to '/var/run/libvirt/virtlxcd-sock': No such file or directory
Libvirt URI is: lxc:///
I am running this locally. The file/socket does not exist, but there is a "libvirt-sock" (and other files) in that folder.
The service is running, but reporting the same error:
libvirtd.service - Virtualization daemon
Loaded: loaded (/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-09-01 10:11:27 BST; 12min ago
TriggeredBy: ● libvirtd.socket
● libvirtd-ro.socket
● libvirtd-admin.socket
Docs: man:libvirtd(8)
https://libvirt.org
Main PID: 731 (libvirtd)
Tasks: 19 (limit: 32768)
Memory: 34.2M
CGroup: /system.slice/libvirtd.service
├─ 731 /usr/sbin/libvirtd
├─1041 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt>
└─1042 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt>
Sep 01 10:11:29 mainlinux dnsmasq[1041]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Sep 01 10:11:29 mainlinux dnsmasq-dhcp[1041]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Sep 01 10:12:35 mainlinux libvirtd[731]: libvirt version: 6.0.0, package: 0ubuntu8.3 (Marc Deslauriers <marc.deslauriers@ubuntu.com> Thu, 30 >
Sep 01 10:12:35 mainlinux libvirtd[731]: hostname: mainlinux
Sep 01 10:12:35 mainlinux libvirtd[731]: Failed to connect socket to '/var/run/libvirt/virtlxcd-sock': No such file or directory
Sep 01 10:12:35 mainlinux libvirtd[731]: End of file while reading data: Input/output error
I've updated my kernel to 5.8.5-generic, but otherwise I'm running Mint 20 (based on Ubuntu focal). Does anyone know how to fix this, or how to display a log showing why virtlxcd-sock is not being created?
I also tried sudo chmod 777 on the libvirt subfolder and restarted libvirtd, with the same error.
After googling for hours, I finally found a fix that worked for me. It seems that installing libvirt and lxc does not install this package:
sudo apt install libvirt-daemon-driver-lxc
sudo systemctl restart libvirtd
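To verify the fix, you can check whether the socket file now exists and really is a Unix domain socket. A minimal sketch, with a hypothetical helper name is_unix_socket; the two paths are the ones mentioned in the question:

```python
import os
import stat

def is_unix_socket(path):
    """Return True only if path can be stat'ed and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing file, missing parent directory, or no permission to look.
        return False
    return stat.S_ISSOCK(mode)

if __name__ == "__main__":
    for name in ("libvirt-sock", "virtlxcd-sock"):
        path = os.path.join("/var/run/libvirt", name)
        state = "socket" if is_unix_socket(path) else "missing or not a socket"
        print(path, "->", state)
```

The same check applies to any "Failed to connect socket ... No such file or directory" error: if the path is missing, the daemon or driver that should create it is not running or not installed.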

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (mysqld.sock file is missing)

On an Ubuntu VPS, I installed MySQL from the command line. mysql start results in:
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
mysqld.sock is indeed missing.
Running find / -type s shows no mysqld.sock or any other MySQL socket files.
Result of sudo service mysql status:
● mysql.service - MySQL Community Server
Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
Active: activating (start) since Mon 2020-05-04 22:42:17 EDT; 13s ago
Process: 21907 ExecStartPre=/usr/share/mysql/mysql-systemd-start pre (code=exited, status=0/SUCCES
Main PID: 7884 (code=exited, status=0/SUCCESS); Control PID: 21926 (mysqld)
Tasks: 14 (limit: 2318)
CGroup: /system.slice/mysql.service
├─21926 /usr/sbin/mysqld --daemonize --pid-file=/run/mysqld/mysqld.pid
└─21928 /usr/sbin/mysqld --daemonize --pid-file=/run/mysqld/mysqld.pid
May 04 22:42:17 qnachatphpwebsockets systemd[1]: Starting MySQL Community Server...

Can't connect to MySQL on Ubuntu

I can't connect to the MySQL service. Below is the error and status output I get when trying to start it:
Job for mysql.service failed because the control process exited with error code.
See "systemctl status mysql.service" and "journalctl -xe" for details.
ruan@master.danzlive.com:~$ systemctl status mysql.service
● mysql.service - MySQL Community Server
Loaded: loaded (/lib/systemd/system/mysql.service; enabled; vendor preset: enabled)
Active: activating (start-post) since Tue 2018-10-30 11:59:35 SAST; 1s ago
Process: 1988 ExecStartPre=/usr/share/mysql/mysql-systemd-start pre (code=exited, status=0/SUCCESS)
Main PID: 1998 (mysqld); Control PID: 1999 (mysql-systemd-s)
Tasks: 16 (limit: 19660)
Memory: 130.2M
CPU: 577ms
CGroup: /system.slice/mysql.service
├─1998 /usr/sbin/mysqld
└─control
├─1999 /bin/bash /usr/share/mysql/mysql-systemd-start post
└─2031 sleep 1
Oct 30 11:59:35 ip-197-101-38-62 systemd[1]: Starting MySQL Community Server...
ruan@master.danzlive.com:~$ mysql
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
I faced a similar issue on Ubuntu 18.04. The cause was that, the day before, I had cleaned up Ubuntu with the Stacer application, which removed the MySQL log folder under
/var/log/
So the next day, starting MySQL produced this error.
I recreated the folder and set its ownership:
sudo mkdir /var/log/mysql
sudo chown mysql:mysql -R /var/log/mysql
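Before restarting MySQL, you can confirm the log directory is back and owned by the right user. A minimal sketch; the helper name log_dir_problems is hypothetical, and it assumes the server runs as a system user named mysql, as the chown command above implies:

```python
import os

def log_dir_problems(path, expected_uid):
    """Return a list of human-readable problems with a log directory:
    missing, not a directory, or owned by a different user ID."""
    if not os.path.exists(path):
        return [f"{path} does not exist"]
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    owner = os.stat(path).st_uid
    if owner != expected_uid:
        problems.append(f"{path} is owned by uid {owner}, expected {expected_uid}")
    return problems

if __name__ == "__main__":
    import pwd
    try:
        mysql_uid = pwd.getpwnam("mysql").pw_uid
    except KeyError:
        print("no 'mysql' user on this system")
    else:
        print(log_dir_problems("/var/log/mysql", mysql_uid) or "log directory looks fine")
```

An empty list means the directory exists with the expected owner. It does not prove that MySQL will start, only that this particular cause has been ruled out.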
Try passing the database username directly on the command line with the -u parameter:
mysql -u root <database>