I have a 3 node bare metal K3s cluster where an install fails on one node, but not another.
My guess is that somehow the Kubernetes image repository on the node where the deployment failed is in a bad state. I don't know how to prove that, or fix it.
I did a helm install yesterday which failed with the following error:
Apr 14 14:28:41 clstr2n1 k3s[18777]: E0414 14:28:41.878018 18777 remote_image.go:114] "PullImage from image service failed" err="rpc error: code = NotFound desc = failed to pull and unpack image \"docker.ssgh.com/device-api:1.2.0-SNAPSHOT\": failed to copy: httpReadSeeker: failed open: could not fetch content descriptor sha256:cd5b8d67fe0f3675553921aeb4310503a746c0bb8db237be6ad5160575a133f9 (application/vnd.docker.image.rootfs.diff.tar.gzip) from remote: not found" image="docker.ssgh.com/device-api:1.2.0-SNAPSHOT"
I verified that I could pull the image from the repository using docker pull docker.ssgh.com/device-api:1.2.0-SNAPSHOT on my development VM and it worked as expected.
I then set the nodeName attribute for the pod specification to force it to one of the other nodes and the deployment worked as expected.
In addition, I used cURL to fetch the content descriptor directly, and that also worked as expected.
Edit for further detail.
My original install included 6 different charts. Initially only 2 of the 6 installed correctly; the remaining 4 reported image pull errors. I deleted the failing 4 and tried again, and this time 2 of the 4 failed. I deleted the failing 2 and tried again. These 2 continued to fail unless I specified a different node, in which case they worked. I deleted them again and waited for an hour to see if Kubernetes would clean up the mess. When I tried again, 1 of them worked, but the other continued to fail. I left it overnight, and it's still failing this morning, unless I force it onto a different node.
It is worth noting that the nodes in question are able to download other images from the same private repo without issue.
There can be multiple reasons why your pod is not pulling the image on a particular node:
Docker on the non-working node does not trust the image repo
Docker cannot verify the CA that issued the repo's certificate
The firewall is not open to the image repo from the non-working node
Troubleshoot with the following steps to find the cause of the issue (a rough sketch of the commands follows the list):
Check the connectivity to the image repo from the non-working node
Check the Docker config on the non-working node to see whether it allows the image repo
Run a docker pull on the non-working node
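Since k3s uses its embedded containerd rather than Docker, the equivalent checks on the node would look roughly like this. This is only a sketch: the registry and image names are taken from the error above, and /etc/rancher/k3s/registries.yaml only exists if a private registry has been configured for k3s.
# on the non-working node
curl -v https://docker.ssgh.com/v2/                              # connectivity and TLS check against the registry API
cat /etc/rancher/k3s/registries.yaml                             # k3s registry mirror/auth config, if present
sudo k3s crictl images | grep device-api                         # what the node's containerd already has cached
sudo k3s crictl rmi docker.ssgh.com/device-api:1.2.0-SNAPSHOT    # drop a possibly corrupted cached copy
sudo k3s crictl pull docker.ssgh.com/device-api:1.2.0-SNAPSHOT   # retry the pull with the node's own runtime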
Related
I am trying to install a sample application using the Git option in OpenShift 4.7.2 (CodeReady Containers 1.24), and I keep getting the error below while OpenShift tries to build the image to be deployed.
Failed to pull image "image-registry.openshift-image-registry.svc:5000/employee-ecosys/person-service:latest": rpc error: code = Unknown desc = Error reading manifest latest in image-registry.openshift-image-registry.svc:5000/employee-ecosys/person-service: manifest unknown: manifest unknown
The application person-service is a simple CRUD application built using Spring Boot; it uses an in-memory H2 database. The GitHub repo is here
Some checks to perform:
Are the image registry pods running?
oc get pods -n openshift-image-registry
Is your specific image created?
oc get images | grep "person-service"
Do you get any images?
oc get images
"latest" is kind of a special tag. You should never manually tag an image as "latest". Openshift will consider the "latest" tag to be the newest image, regardless of what tag it has.
I am not familiar with the Git deploy method, and I personally have very little experience with s2i builds. I normally use one Git repo for the OpenShift/Kubernetes resources and one for the code (they can be the same repo, separated by folder structure), then use a pipeline or manually build the image, push it to a registry somewhere, and let OpenShift pull it from there.
I'm sometimes stuck while attempting to debug my code.
The debug session is active and code execution is suspended:
But I cannot see what really happens, as the breakpoints show as "unavailable" (the "no parking" symbol):
Does anybody know about this sign?
I still haven't found any information about it on JetBrains sites... that's why I'm here :-)
(PhpStorm 2020.3, using docker containers (linux containers) with Docker Desktop/ Windows 10)
[EDIT]:
I just noticed that "break at first line in php script" seems to be functioning, though:
But I get these weird breakpoints instead of the normal red ones, and a highlighted line.
I tried restarting my Docker containers; same issue. This happens seemingly at random and gets resolved after a while... (a reboot?...)
[EDIT] SOLVED
The path mapping (local <-> Docker) for the root of my project was empty (how did that happen...) in my Docker configuration in PhpStorm.
I'm not sure how this problem occurred, but I'll be able to solve it next time if it comes back.
If you try to disable "break at first line in php scripts", you may get this message:
17:38 Debug session was finished without being paused. It may be caused by path mappings misconfiguration or not synchronized local and remote projects. To figure out the problem check path mappings configuration for 'docker-server' server at PHP|Servers or enable Break at first line in PHP scripts option (from Run menu).
In my case, the path mapping for the root of my project was incomplete: "Absolute path on the server" was empty. I don't know how it happened, but you can check it:
In PHP | Servers
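For reference, a typical Xdebug-over-Docker wiring looks roughly like the sketch below. This is not the asker's exact setup: the image name and the /var/www/html path are assumptions, 'docker-server' is the server name from the warning above, and host.docker.internal is what Docker Desktop on Windows exposes for the host.
docker run -d \
  -e PHP_IDE_CONFIG="serverName=docker-server" \
  -e XDEBUG_MODE=debug \
  -e XDEBUG_CONFIG="client_host=host.docker.internal client_port=9003" \
  -v "$(pwd)":/var/www/html \
  my-php-app-image
# in PhpStorm, the 'docker-server' entry under PHP | Servers must map the local project root to /var/www/html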
I am using Google Cloud Build to build containers that run on Container-Optimized OS VMs across several projects.
A typical cloudbuild.yaml file looks like this:
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ["build",
         "-t", "gcr.io/${PROJECT_ID}/core-app-${BRANCH_NAME}:latest",
         "."]
- name: 'gcr.io/cloud-builders/docker'
  args: ["push", "gcr.io/${PROJECT_ID}/core-app-${BRANCH_NAME}:latest"]
- name: 'gcr.io/cloud-builders/gcloud'
  args: ["beta", "compute", "instances", "update-container", "core-app-${BRANCH_NAME}", "--container-image", "gcr.io/${PROJECT_ID}/core-app-${BRANCH_NAME}:latest", "--zone", "${_ZONE}"]
images:
- "gcr.io/${PROJECT_ID}/core-app-${BRANCH_NAME}:latest"
A trigger is defined with some branch condition.
In essence, on a commit to a given branch, an image with the tag latest is built and used to run a container on a given VM.
It worked great until a couple of weeks ago. Then suddenly, on all projects, it stopped working properly: instead of pulling latest, the VM keeps using the local copy. The only workaround I have found is to use a SHA as the tag (gcr.io/${PROJECT_ID}/core-app-${BRANCH_NAME}:${SHORT_SHA}), but this results in several images accumulating on the VM, and at some point there is not enough space anymore and the deployment fails.
So, how can I force the Container-Optimized OS VM to pull an image:tag when it already has one with the same name on the local disk?
You can delete the old images before you pull in the new ones by running a command like:
docker image prune -a -f
You will get slightly more downtime while updating from one version to another, but if that is not really an issue for you, this should work just fine.
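One way to wire that in is to prune on the VM right before the update-container step, for example over SSH. This is only a sketch: it reuses the instance name and the substitution placeholders from the cloudbuild.yaml above and assumes the caller can SSH to the instance.
gcloud compute ssh core-app-${BRANCH_NAME} --zone "${_ZONE}" --command "docker image prune -a -f"
# then run the update-container step so the VM pulls the fresh :latest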
I am trying to push a second version of my app (nodeJS + MongoDB) into my OpenShift account. It worked the first time, but now it fails with this error:
Erics-MacBook-Air:rippleRating ericg$ git push openShift master
Counting objects: 129, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (129/129), done.
Writing objects: 100% (129/129), 28.09 KiB | 0 bytes/s, done.
Total 129 (delta 94), reused 0 (delta 0)
remote: Stopping NodeJS cartridge
remote: Mon Apr 13 2015 07:53:08 GMT-0400 (EDT): Stopping application 'ripplerating' ...
remote: Mon Apr 13 2015 07:53:09 GMT-0400 (EDT): Stopped Node application 'ripplerating'
remote: Stopping MongoDB cartridge
remote: No such file or directory - /var/lib/openshift/xxxxxxxxxxxxxxxxf8000090/app-deployments/2015-04-13_07-53-10.382/metadata.json
To ssh://xxxxxxxxxxxxxxxxf8000090@ripplerating-<domain>.rhcloud.com/~/git/ripplerating.git/
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'ssh://xxxxxxxxxxxxxxxxf8000090@ripplerating-<domain>.rhcloud.com/~/git/ripplerating.git/'
If I rhc ssh to my app, I don't see the directory 2015-04-13_07-53-19.382; I have only app-files, current, and by-id (app-files has the metadata.json).
BTW, what would be a good place to add some files (secret.json) that I don't want to put in the git repo but that can be used by the Node.js app?
Thanks!
I recently came across this problem myself and wanted to share how I came to a solution.
To start, my /app-deployments directory contained the following:
by-id current redis-cli
Using ls -l reveals that current is actually a soft link to the currently running build.
current -> 2015-07-10_22-45-22.964
However, using the command file current also revealed that:
current: broken symbolic link to `2015-07-12_22-45-22.964'
That seemed strange, but it was consistent with the fact that there was no folder in the /app-deployments directory named with the most recent build's timestamp (2015-07-10_22-45-22.964). I removed current and attempted to push. Same result as the OP, however: the folder for the new build, with metadata.json inside, was missing.
After poking around in by-id and redis-cli, I found that redis-cli contained its own metadata.json file with a lot of null values in it (both 'git_sha1' and 'id' were null). I played with the git_sha1 field to match both my previous and new commits, but nothing changed. The other folder, by-id, had a soft link in it as well, which pointed to the redis-cli folder.
At this point I had backed up everything I wanted, so I attempted to force a reset of the /app-deployments directory to its defaults by deleting everything in it and pushing. Surprisingly, it worked! Now my /app-deployments directory looks like this:
2015-07-10_22-45-22.964 by-id current
which is what I normally expect to see in there. Hopefully this will be helpful to someone!
As a side note, I later decided to enable OpenShift's support for multiple rollback versions, which you can read about here. It allows you to specify how many rollbacks you want to keep, which could be very valuable in another situation like this.
I finally got to the bottom of this one. I had created a folder under app-deployments, and that upsets the auto-deployment logic in OpenShift. The current folder under app-deployments had been deleted, so I had to recreate it and put a copy of metadata.json in it. Once I had done that, I was able to deploy again using git push. I am guessing that if you have some secret data that cannot be kept in the git repo, it has to live under app-root/data, although this won't work for a scalable app; in that case I am not sure where that sensitive data should go.
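For the secret.json question, a minimal sketch of getting a file into the gear's data directory without committing it to git; the SSH user and host below are placeholders for the values shown by rhc app show ripplerating:
scp secret.json GEAR_UUID@ripplerating-DOMAIN.rhcloud.com:app-root/data/
# the app can then read it from the directory pointed to by $OPENSHIFT_DATA_DIR at runtime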
My answer is basically the one provided by @Will.R, but shorter:
The problem comes from the fact that:
* app-deployments/current is a broken symbolic link to the most recent build.
If you want to fix this problem:
* Delete everything inside /app-deployments
* Push again
* :)
* Problem fixed.
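In shell terms, the fix above looks roughly like this (a sketch only; the app name and git remote are taken from the question):
rhc ssh ripplerating
cd ~/app-deployments
rm -rf *                    # removes the broken current symlink and any stale build folders
exit
git push openShift master   # OpenShift recreates app-deployments on the next deploy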
I have an application on the OpenShift Free plan with only one gear. I want to change it to scalable and make use of all 3 free gears.
I read this blog post from OpenShift and found that there is a way to do it: I should clone my current application into a new, scalable one, which will use the 2 remaining gears, and then delete the original application. Thus, the new one will have all 3 free gears.
The command the blog suggests is: rhc create-app <clone> --from-app <existing> --scaling
I get the following error: invalid option --from-app
Update
After running the command gem update rhc, I no longer get the error above, but... a new application with the given name was created with the same starting package (Python 2.7) as the existing one, yet all the files are missing. It actually creates a blank application and not a clone of the existing one.
Update 2
Here is the structure of the folder:
-.git
-.openshift
-wsgi
---static
---views
---application
---main.py
-requirements.txt
-setup.py
From what we discussed on IRC, your problem was caused by missing SSH configuration on your Windows machine:
Creating application xxx ... done
Waiting for your DNS name to be available ...done
Setting deployment configuration ... done
No system SSH available. Please use the --ssh option to specify the path to your SSH executable, or install SSH.
I've double-checked it, and it appears to be working without any problem. The only requirement is to have the latest rhc client and PuTTY or any other SSH client. I'd recommend going through this tutorial once again and double-checking everything to make sure it is all working properly.
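If rhc cannot find an SSH client on Windows, the error above already hints at the fix: point rhc at your SSH executable explicitly. A sketch; the plink.exe path is an assumption and should be whatever SSH client you actually have installed:
rhc create-app <clone> --from-app <existing> --scaling --ssh "C:\Program Files\PuTTY\plink.exe"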
Make sure you are using the newest version of the rhc gem (run gem update rhc) so that you have access to that feature from the command line.
The --from-app option essentially does an rhc snapshot save and snapshot restore (among other things), as you can see from the source:
if from_app
  say "Setting deployment configuration ... "
  rest_app.configure({:auto_deploy => from_app.auto_deploy, :keep_deployments => from_app.keep_deployments, :deployment_branch => from_app.deployment_branch, :deployment_type => from_app.deployment_type})
  success 'done'

  snapshot_filename = temporary_snapshot_filename(from_app.name)
  save_snapshot(from_app, snapshot_filename)
  restore_snapshot(rest_app, snapshot_filename)
  File.delete(snapshot_filename) if File.exist?(snapshot_filename)

  paragraph { warn "The application '#{from_app.name}' has aliases set which were not copied. Please configure the aliases of your new application manually." } unless from_app.aliases.empty?
end
However, this will not copy over anything in your $OPENSHIFT_DATA_DIR directory, so if you're storing files there, you'll need to copy them over manually.
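Putting it together, a rough sequence; only a sketch, with placeholder app names, and the rsync step assumes you have the gears' SSH URLs from rhc app show:
gem update rhc                                       # --from-app needs a recent rhc client
rhc create-app myclone --from-app myapp --scaling    # copies config and restores a snapshot of the old app
# $OPENSHIFT_DATA_DIR is not copied, so sync it by hand via the gears' SSH URLs
rsync -av OLD_GEAR_SSH:app-root/data/ ./data-backup/
rsync -av ./data-backup/ NEW_GEAR_SSH:app-root/data/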