I had an alert for "Approaching full disk warning" on my couchbase cluster. I resolved the issue, but for some reason the alert does not clear within the cluster. How can I clear alerts within the cluster?
I have downloaded CodeReady Containers on Windows to install my OpenShift cluster. I need to deploy 3scale on it using the operator from OperatorHub, but the OperatorHub page is empty.
Digging deeper, I found that a few pods on the cluster are not running and show the state "ImagePullBackOff".
I deleted the pods to get them restarted, but the error won't go away. I checked the event logs, and the screenshots are attached.
Pods Terminal logs
This is an error that I keep getting when I start my cluster. Sometimes it comes up and sometimes the cluster starts normally, but maybe this has something to do with it.
Quay.io Error
This is my first time making a deployment on an OpenShift cluster and setting up my cluster environment. So far I have not been able to resolve the issue, even after deleting and restarting the cluster.
My Windows Server instance on GCE is shut down from time to time. Based on the GCP logging, we can tell that failing the lateBootReportEvent check only triggers a reboot some of the time. I am wondering why.
logs screenshot
I am aware that the auto-shutdown is caused by integrity monitoring (settings shown below), and I understand that my boot integrity check might fail here. I am just trying to understand why there is a "probability" involved.
Shielded-VM settings
The integrity monitor and Shielded VM features have no relation to a VM restart or shutdown.
Integrity monitoring only compares the most recent boot measurements to the integrity policy baseline and returns a pair of pass/fail results depending on whether they match or not, one for the early boot sequence and one for the late boot sequence.
Early boot is the boot sequence from the start of the UEFI firmware until it passes control to the bootloader. Late boot is the boot sequence from the bootloader until it passes control to the operating system kernel. If either part of the most recent boot sequence doesn't match the baseline, you get an integrity validation failure.
If the failure is expected, for example if you applied a system update on that VM instance, you should update the integrity policy baseline. If it is not expected, you should stop that VM instance and investigate the reason for the failure. Either way, the VM will never be shut down by integrity monitoring itself.
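If the change was expected, one way to refresh the baseline is to let Compute Engine relearn it from the current boot measurements. A rough sketch, assuming the google-api-python-client library with application-default credentials; the project, zone, and instance names are placeholders:

```python
# Sketch: ask Compute Engine to relearn the integrity policy baseline from the
# most recent boot measurements (useful after an expected change such as an OS
# update). Assumes google-api-python-client and application-default credentials.
from googleapiclient import discovery

compute = discovery.build("compute", "v1")
compute.instances().setShieldedInstanceIntegrityPolicy(
    project="my-project",          # placeholder project ID
    zone="us-central1-a",          # placeholder zone
    instance="my-windows-vm",      # placeholder instance name
    body={"updateAutoLearnPolicy": True},
).execute()
```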
To determine what actually caused the VM to restart, you will need to look at the internal Windows logs: review the Event Viewer logs for the instance at the time of the shutdown, then reference the shutdown reason against Microsoft's reason codes to determine what caused the VM to stop.
It is possible that the instance restarted to complete the installation of updates, or that it encountered an internal error; however, only the Event Viewer logs will reveal the true cause.
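For reference, here is a small sketch of how the relevant events could be pulled on the VM itself; it assumes Python is available on the Windows instance and simply shells out to the built-in wevtutil tool:

```python
# Sketch: query the Windows System event log for shutdown/restart events.
# 1074 = shutdown initiated (includes the reason code), 6008 = unexpected
# shutdown, 41 = kernel-power. Run this on the VM itself.
import subprocess

query = "*[System[(EventID=1074 or EventID=6008 or EventID=41)]]"
result = subprocess.run(
    ["wevtutil", "qe", "System", f"/q:{query}", "/f:text", "/c:10", "/rd:true"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```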
If you find any useful internal logs, please share them on this post so we can check.
I'm trying to migrate an Openshift v2 application to v3. I'm really struggling to understand the documentation. In the section on persistent volumes, it says the following:
EmptyDir has the same lifecycle as the pod:
EmptyDir volumes survive container crashes/restarts.
EmptyDir volumes are deleted when the pod is deleted.
I cannot understand what this means. I have added storage to my app using the Web Console, which allowed me to add 1 GB of persistent storage and give it to a particular mountpoint. I don't know if this is an "EmptyDir" volume or not (I think it isn't, but in that case why the warning in the persistent volumes section?). Now, every time I rebuild the application, a new pod is created (if I understand this correctly). So far my data has persisted when this happens. Does this warning mean it can suddenly be wiped out? Or is persistent storage persistent?
When you claim a persistent volume, you are not usually using an EmptyDir volume type, so that warning isn't relevant. EmptyDir is a special volume type that is managed a bit differently and would normally only be needed if you want to share some temporary file system space between different containers in the same pod.
In short, the persistent storage is indeed "persistent". You can see the difference with a simple experiment: create a MySQL deployment with "non-persistent" storage and another with "persistent" storage.
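To make the distinction concrete, here is a minimal sketch of the relevant pod spec fields, written as plain Python dicts; the volume, claim, and image names are placeholders:

```python
# Sketch of the two volume types as they would appear in a pod spec.
# An emptyDir volume lives and dies with the pod; a persistentVolumeClaim
# volume keeps its data when the pod is deleted and recreated.
emptydir_volume = {"name": "scratch", "emptyDir": {}}
pvc_volume = {
    "name": "app-data",
    "persistentVolumeClaim": {"claimName": "app-data-claim"},  # placeholder claim
}

pod_spec = {
    "containers": [{
        "name": "app",
        "image": "example/app:latest",   # placeholder image
        "volumeMounts": [
            {"name": "scratch", "mountPath": "/tmp/work"},
            {"name": "app-data", "mountPath": "/data"},
        ],
    }],
    "volumes": [emptydir_volume, pvc_volume],
}
```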
The OpenShift architecture is built on Amazon infrastructure. Please check AWS EBS volumes, which are the backbone of OpenShift persistent storage:
https://aws.amazon.com/ebs/getting-started/
EDIT
Think of it like this:
EBS Volume -- ATTACHED TO --- pod ---- WHICH CONTAINS --- Containers
What the document means is that if you destroy the pod (think of it as a machine), the EBS volume is released and becomes available again as an unattached resource; its lifecycle is bounded by the pod. However, a pod can contain multiple containers (say, JVMs), and each container can share the EBS volume (think of it as a hard disk), but killing a single container does not affect the EBS volume's lifecycle.
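As a small illustration of that last point, here is a sketch (again as plain Python dicts, with placeholder names) of one pod whose two containers mount the same volume; restarting either container leaves the volume and its data untouched:

```python
# Sketch: two containers in one pod sharing a single volume.
shared_volume = {
    "name": "shared-data",
    "persistentVolumeClaim": {"claimName": "shared-data-claim"},  # placeholder
}

pod_spec = {
    "containers": [
        {"name": "app", "image": "example/app:latest",
         "volumeMounts": [{"name": "shared-data", "mountPath": "/data"}]},
        {"name": "sidecar", "image": "example/sidecar:latest",
         "volumeMounts": [{"name": "shared-data", "mountPath": "/data"}]},
    ],
    "volumes": [shared_volume],
}
```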
I have a spark 1.2.1 cluster set up in standalone mode with a master and a few slaves. I then let my data scientists enjoy the cluster's power.
All is working fine. However, the dedicated server that my data scientists use to submit Spark jobs has its spark.local.dir gradually filling up.
Given that this machine sits outside the cluster (it is neither the master nor a worker/slave), I wouldn't have thought the local spark.local.dir would be used by Spark at all. (And why would it be? It only shows the logs.)
I could not find any documentation detailing this. Does anybody have an idea?
Not enough information about your setup to be sure, but I am guessing that the jobs are launched in client mode where the driver would be on your client node.
From the spark docs:
In client mode, the driver is launched in the same process as the client that submits the application. In cluster mode, however, the driver is launched from one of the Worker processes inside the cluster, and the client process exits as soon as it fulfills its responsibility of submitting the application without waiting for the application to finish.
I am guessing that in client mode the application's driver (on your client machine) needs plenty of scratch space to manage the workers in that case.
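If that is the case, one practical workaround is to point the driver's scratch space somewhere with more room (or somewhere you clean up periodically). A minimal PySpark sketch, assuming a standalone master at a placeholder URL and a placeholder scratch path:

```python
# Sketch: redirect the driver-side scratch directory on the submitting machine.
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setMaster("spark://spark-master:7077")        # placeholder master URL
    .setAppName("example-job")
    .set("spark.local.dir", "/mnt/scratch/spark")  # placeholder scratch path
)
sc = SparkContext(conf=conf)
# ... run the job ...
sc.stop()
```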
Yesterday I tried to delete an instance by invoking the "halt" command through SSH. Unlike AWS, GCE does not allow us to choose the behavior of the VM on shutdown; it stops the instance by default (the instance status becomes TERMINATED).
Today I was browsing the Google Compute Engine REST API documentation and I found the following description:
status : [Output Only] The status of the instance. One of the following values: PROVISIONING, STAGING, RUNNING, STOPPING, STOPPED, TERMINATED.
What is this "STOPPED" status? Both instances stopped through the web console and instances stopped with the "halt" command have the "TERMINATED" status.
Any ideas?
The STOPPED state is a new feature, added a few weeks ago, which you can reach via the Compute Engine API.
This method stops a running instance, shutting it down cleanly, and allows you to restart the instance at a later time. Stopped instances do not incur per-minute, virtual machine usage charges while they are stopped, but any resources that the virtual machine is using, such as persistent disks and static IP addresses, will continue to be charged until they are deleted. For more information, see Stopping an instance.
I think this is similar to the AWS option you mention.
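For reference, a rough sketch of calling that stop method from Python, assuming google-api-python-client with application-default credentials; the project, zone, and instance names are placeholders:

```python
# Sketch: stop a running instance via the Compute Engine API.
from googleapiclient import discovery

compute = discovery.build("compute", "v1")
operation = compute.instances().stop(
    project="my-project",        # placeholder project ID
    zone="us-central1-a",        # placeholder zone
    instance="my-instance",      # placeholder instance name
).execute()
print(operation["status"])       # PENDING/RUNNING/DONE for the stop operation
```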
For anyone stumbling on this question years later, a detailed lifecycle diagram of instances can be found here
There is no STOPPED status anymore; instances go from STOPPING to TERMINATED, whatever the stopping method is.
However, a new state that may be closer to what halt does has been introduced since: SUSPENDED. It's still in beta, though, and I'm not sure whether invoking halt would induce this state or simply terminate the instance.
See here for more details