Error installing Eclipse Che using chectl - FailedScheduling - QEMU

I've been trying to install Eclipse Che on minikube for a while, and there is one point I can't get past. It comes right after running
chectl server:start --platform minikube
and the error appears here:
❯ ✅ Post installation checklist
❯ PostgreSQL pod bootstrap
✔ scheduling...done.
✖ downloading images
→ ERR_TIMEOUT: Timeout set to pod wait timeout 300000
...
Show important messages
› Error: Error: ERR_TIMEOUT: Timeout set to pod wait timeout 300000
› Installation failed, check logs in '/tmp/chectl-logs/1592460423915'
If we look at the log output, there is a fault that repeats:
0s Warning FailedScheduling pod/postgres-59b797464c-vdfnl running "VolumeBinding" filter plugin for pod "postgres-59b797464c-vdfnl": pod has unbound immediate PersistentVolumeClaims
0s Normal ExternalProvisioning persistentvolumeclaim/postgres-data waiting for a volume to be created, either by external provisioner "k8s.io/minikube-hostpath" or manually created by system administrator
Here is the full log, with the repeating events trimmed:
LAST SEEN TYPE REASON OBJECT MESSAGE
0s Normal ScalingReplicaSet deployment/che-operator Scaled up replica set che-operator-7f7575f6fb to 1
0s Normal SuccessfulCreate replicaset/che-operator-7f7575f6fb Created pod: che-operator-7f7575f6fb-xjqg9
0s Normal Scheduled pod/che-operator-7f7575f6fb-xjqg9 Successfully assigned che/che-operator-7f7575f6fb-xjqg9 to minikube
0s Normal Pulling pod/che-operator-7f7575f6fb-xjqg9 Pulling image "quay.io/eclipse/che-operator:7.14.2"
0s Normal Pulled pod/che-operator-7f7575f6fb-xjqg9 Successfully pulled image "quay.io/eclipse/che-operator:7.14.2"
0s Normal Created pod/che-operator-7f7575f6fb-xjqg9 Created container che-operator
0s Normal Started pod/che-operator-7f7575f6fb-xjqg9 Started container che-operator
0s Normal SuccessfulCreate job/che-tls-job Created pod: che-tls-job-6pj8p
0s Normal Scheduled pod/che-tls-job-6pj8p Successfully assigned che/che-tls-job-6pj8p to minikube
0s Normal Pulling pod/che-tls-job-6pj8p Pulling image "quay.io/eclipse/che-tls-secret-creator:alpine-3029769"
0s Normal Pulled pod/che-tls-job-6pj8p Successfully pulled image "quay.io/eclipse/che-tls-secret-creator:alpine-3029769"
0s Normal Created pod/che-tls-job-6pj8p Created container che-tls-job-job-container
0s Normal Started pod/che-tls-job-6pj8p Started container che-tls-job-job-container
0s Normal Completed job/che-tls-job Job completed
0s Normal ExternalProvisioning persistentvolumeclaim/postgres-data waiting for a volume to be created, either by external provisioner "k8s.io/minikube-hostpath" or manually created by system administrator
0s Normal ExternalProvisioning persistentvolumeclaim/postgres-data waiting for a volume to be created, either by external provisioner "k8s.io/minikube-hostpath" or manually created by system administrator
0s Normal ScalingReplicaSet deployment/postgres Scaled up replica set postgres-59b797464c to 1
0s Normal SuccessfulCreate replicaset/postgres-59b797464c Created pod: postgres-59b797464c-vdfnl
0s Warning FailedScheduling pod/postgres-59b797464c-vdfnl running "VolumeBinding" filter plugin for pod "postgres-59b797464c-vdfnl": pod has unbound immediate PersistentVolumeClaims
0s Warning FailedScheduling pod/postgres-59b797464c-vdfnl running "VolumeBinding" filter plugin for pod "postgres-59b797464c-vdfnl": pod has unbound immediate PersistentVolumeClaims
... (the FailedScheduling and ExternalProvisioning events above keep repeating until the 300000 ms pod wait timeout expires)
At this point I don't know how to proceed. I currently have minikube installed with the kvm2 driver. The machine that runs it is itself a virtual machine with nested virtualization enabled.
Following the advice in the Che documentation, I launch minikube with at least 4 GB of RAM (the virtual machine has at least 6 GB). I also tried a solution I found here on Stack Overflow for the timeout error, which was simply to extend the wait time; the error just took longer to appear, and I finally realized what a rookie mistake I had made and actually looked at the log.
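In case it helps with the diagnosis, I believe these commands should show whether minikube's hostpath provisioner is actually running and whether the claim is still pending (che is the namespace chectl uses by default; adjust if yours differs):
# is the PVC still Pending, and which StorageClass does it request?
kubectl get pvc postgres-data -n che
# is the provisioner pod up, and is a default StorageClass defined?
kubectl get pods -n kube-system | grep storage-provisioner
kubectl get storageclass
minikube addons list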
This is the list of versions I have installed:
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
minikube version: v1.11.0
commit: 57e2f55f47effe9ce396cea42a1e0eb4f611ebbd
Client Version:
version.Info{
Major:"1",
Minor:"18",
GitVersion:"v1.18.3",
GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40",
GitTreeState:"clean",
BuildDate:"2020-05-20T12:52:00Z",
GoVersion:"go1.13.9",
Compiler:"gc",
Platform:"linux/amd64"
}
Eclipse Che CLI
VERSION
chectl/7.14.2 linux-x64 node-v10.21.0
QEMU emulator version 3.1.0 (Debian 1:3.1+dfsg-8+deb10u5)
Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers

You could try minikube 1.7.3 instead of 1.11.0.
I faced the same issue and it was resolved with version 1.7.3.
I hope it helps you.
Cheers.
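Another thing that may be worth trying before downgrading, assuming the problem is simply that minikube's hostpath provisioner is not running: re-enable the storage addons and retry the install. This is only a sketch of the idea, not something verified against this exact setup.
# make sure dynamic provisioning and the default StorageClass are enabled
minikube addons enable storage-provisioner
minikube addons enable default-storageclass
# remove the failed deployment and retry
chectl server:delete
chectl server:start --platform minikube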

Related

Django does not gracefully close MySQL connection upon shutdown when run through uWSGI

I have Django 2.2.6 running under uWSGI 2.0.18, with 6 pre-forking worker processes. Each of these has its own MySQL connection socket (600-second CONN_MAX_AGE). Everything works fine, but when a worker process is recycled or shut down, the uWSGI log ironically says:
Gracefully killing worker 6 (pid: 101)...
But MySQL says:
2020-10-22T10:15:35.923061Z 8 [Note] Aborted connection 8 to db: 'xxxx' user: 'xxxx' host: '172.22.0.5' (Got an error reading communication packets)
It doesn't hurt anything, but the MySQL error log gets spammed with these, since I let uWSGI recycle the workers every 10 minutes and I have multiple servers.
It would be good if Django could catch the uWSGI worker's "graceful shutdown" and close the MySQL socket before dying. Maybe it does and I'm configuring this setup wrong. Maybe it can't. I'll dig in myself, but thought I'd ask as well.
If CONN_MAX_AGE is set to a positive value, Django creates persistent connections, which get cleaned up at request start and request end. Cleaning up here means: connections are closed if they are invalid, have had too many errors, or were opened more than CONN_MAX_AGE seconds ago.
Otherwise, connections are closed when the request ends. So this problem occurs by design when you use persistent connections and do uWSGI periodic reloads.
There is this bit of code that instructs uWSGI to shut down all sockets, but I'm unsure whether this is communicated to Django or whether uWSGI uses a more brutal method that causes the aborts. It shuts down all uWSGI-owned sockets, so from the looks of it, Unix sockets and the connections with the web server. There's also no hook that is called just before or during a reload.
Perhaps this gets you on your way. :)

HAProxy - Does it cache the configuration file

I'm not sure if this is the correct place to post this. Please let me know if it is not.
Basically, exactly what the title says: I have an HAProxy configuration file where I'm trying to set the timeouts to 3600s. However, seemingly at random, the file reverts to a previous iteration with much shorter timeout values.
What I set is as follows:
defaults
    log global
    mode http
    retries 3
    timeout client 3600s
    timeout connect 3600s
    timeout server 3600s
    option tcplog
    balance roundrobin

listen admin
    bind 127.0.0.1:22002
    mode http
    stats enable
    stats show-node
    stats uri /admin

listen stats :1936
    mode http
    log global
    maxconn 10
    timeout client 3600s
    timeout connect 3600s
    timeout server 3600s
    timeout queue 3600s
    stats enable
    stats hide-version
    stats show-node
    stats uri /haproxy?stats
However, it somehow changes to the following:
defaults
    log global
    mode http
    retries 3
    timeout client 50s
    timeout connect 5s
    timeout server 50s
    option tcplog
    balance roundrobin

listen admin
    bind 127.0.0.1:22002
    mode http
    stats enable
    stats show-node
    stats uri /admin

listen stats :1936
    mode http
    log global
    maxconn 10
    clitimeout 100s
    srvtimeout 100s
    contimeout 100s
    timeout queue 100s
    stats enable
    stats hide-version
    stats show-node
    stats uri /haproxy?stats
I have not found anything to indicate that haproxy 1.5 caches the configuration file, but I need to investigate all possibilities. Simply put: is there anything in haproxy that can cause this?
AFAIK, HAProxy doesn't implement anything that could explain that behavior.
How was HAProxy installed? What OS are you running?
If you have auditd on your server, you can add a rule to watch which process modifies the configuration file:
auditctl -w /etc/haproxy/haproxy.conf -p wa
Then watch for any activity in /var/log/audit/audit.log.
To remove the audit rule:
auditctl -W /etc/haproxy/haproxy.conf
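If auditd logs a lot on the server, searching by file is usually easier than reading audit.log directly; something along these lines should work, assuming the audit userspace tools are installed:
# list interpreted audit events that touched the HAProxy configuration file
ausearch -f /etc/haproxy/haproxy.conf -i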

Docker: Defer startup actions relying on another container?

I have two types of Docker containers: one with a web application (nginx/php-fpm) and one with a MySQL database. Both are connected through a network. The app container is aware of the DB container; however, the DB doesn't know whether zero, one or more app containers are available. Both types of containers use Supervisord.
The database container has to start up mysqld, which can take a few seconds.
The other container has to perform some startup actions, part of which require database access. As these actions depend on the DB container, I have put a loop at the top of the script that waits for the DB server to become available:
try=0
ok=0
# Poll the database once per second, giving up after 30 attempts.
until mysql -h"$dbhost" -u"$dbuser" -p"$dbpass" -e "USE $dbname" && ok=1; do
    [ $((++try)) -gt 30 ] && break
    sleep 1
done
if [ $ok -gt 0 ]; then
    # DO STUFF
else
    exit 1
fi
While this does work, I see two downsides: First, the script will fail if a DB container is down or takes longer than a certain timeout to start when the app container comes up. Second, the app container won’t know if there are changes on the DB server (e.g. migrations).
While I'm aware of Supervisord events, I wonder: how can I notify an arbitrary number of other containers in the same network of such events?
(NOTE: I’m not restricted to using Supervisord for this, I just feel that this is the most promising approach.)
You might want to use Compose.
Also, you can add a healthcheck to your database container and a condition to the web server container. Something like this:
healthcheck:
  test: ["CMD-SHELL", "mysql_check.sh"]
  interval: 30s
  timeout: 30s
  retries: 3
and
depends_on:
  mysql-database:
    condition: service_healthy
Compose will wait for the database to be ready before starting the webserver container.
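The answer refers to a mysql_check.sh script without showing it; a minimal sketch of what such a script could look like, assuming the MySQL client tools are available in the database image and the credentials are passed in as environment variables:
#!/bin/sh
# Hypothetical healthcheck: exit 0 as soon as the server answers a ping.
exec mysqladmin ping -h 127.0.0.1 -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" --silent
Note that the script has to be present and executable inside the database image, and that the long (condition) form of depends_on is only honoured by Compose file formats that support it.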

OpenShift Pod gets stuck in pending state

A MySQL pod in OpenShift gets stuck after a new deployment and shows the message "The pod has been stuck in the pending state for more than five minutes." What can I do to solve this? I tried scaling the current deployment pod to 0 and scaling the previous deployment pod to 1, but that one, which was working earlier, also got stuck.
If a pod is stuck in the pending state, you can remove it by executing:
oc delete pod/<name of pod> --grace-period=0
This command removes the pod immediately, but use it with caution because it may leave stale pid files behind on persistent volumes.
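Before force-deleting, it is usually worth checking why the scheduler cannot place the pod; the events normally name the reason (insufficient resources, an unbound PersistentVolumeClaim, node selectors, and so on). For example:
# show the scheduling events for the stuck pod
oc describe pod <name of pod>
# or list recent events in the current project
oc get events --sort-by=.metadata.creationTimestamp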

ssh connection time out after reboot GCE instance

Can anyone tell me why, after rebooting a Google Compute Engine instance, I get an SSH connection timeout? I reboot the instance with sudo reboot and from the Google Compute Engine console, and both behave the same.
When the OS shuts down to reboot, all network connections are closed, including SSH connections. From the client side, this can look like a connection timeout.
When you use gcutil resetinstance, it does the same thing as pushing the power button on a physical host. This is different from e.g. sudo reboot, because the former does not give the operating system a chance to perform any shutdown (like closing open sockets, flushing buffers, etc), while the latter does an orderly shutdown.
You should probably prefer logging in to the instance to do a reboot rather than using gcutil resetinstance if the host is still ssh-able; resetinstance (or the "Reboot Instance" button in the GUI) is a hard reset, which allows you to recover from a kernel crash or SSH failing.
In more detail:
During OS-initiated reboot (like sudo reboot), the operating system performs a number of cleanup steps and then moves to runlevel 6 (reboot). This causes all the scripts in /etc/init.d to be run and then a graceful shutdown. During a graceful shutdown, sshd will be killed; sshd could catch the kill signal to close all of its open sockets. Closing the socket will cause a FIN TCP packet to be sent, starting an orderly TCP teardown ("Connection closed" message in your ssh client). Alternatively, if sshd simply exits, the kernel sends a RST (reset) packet on all open TCP sockets, which will cause a "Connection reset" message on your ssh client. Once all the processes have been shut down, the kernel will make sure that all dirty pages in the page cache are flushed to disk, then execute one of two or three mechanisms to trigger a BIOS reboot. (ACPI, keyboard controller, or triple-fault.)
When triggering an external reset (e.g. via the resetinstance API call or GUI), the VM will go immediately to the last step, and the operating system won't have a chance to do any of the graceful shutdown steps above. This means your ssh client won't receive a FIN or RST packet like above, and will only notice the connection closed when the remote server stops responding. ("Connection timed out")
Thank you Brian Dorsey, E. Anderson and vgt for answering my question. The problem turned out to be something else. Before each reset I had brought up an Ethernet bridge with the bridge-utils utility between the "eth0" interface and a new bridge interface called "br0". After resetting the instance, either with sudo reboot or from the GCE console, the SSH connection stopped working.
But if I don't bring up the Ethernet bridge, the instance restarts fine with both methods.
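For anyone who hits the same symptom, it may help to inspect the interface state after the reboot (from the serial console, or once access is restored) to see whether br0 and eth0 still hold the expected addresses. A rough check, assuming bridge-utils is installed:
# does the bridge still exist, and which interface owns the instance's IP?
brctl show
ip addr show br0
ip addr show eth0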
If your instance image is CentOS, try removing SELinux:
sudo yum remove selinux*
Slightly orthogonal to Brian's answer. To gracefully reboot a GCE VM you can use:
gcutil resetinstance <instancename>