I'm administering a RHEL OpenShift cluster and upgrading from 4.10.x -> 4.11.x -> 4.12.2.
There are 3 masters and 7 worker nodes.
All 3 masters have updated.
3 of the workers have updated.
The upgrade is now stuck on worker0 with:
oc logs machine-config-daemon-4bs9x -n openshift-machine-config-operator
< snip >
I0216 21:00:08.555947 3136 daemon.go:1255] Current config: rendered-worker-8ebd95b2c00a22992daf1248ebc5640f
I0216 21:00:08.555986 3136 daemon.go:1256] Desired config: rendered-worker-263c6ea5fafb6f1da35a31749a1180d7
I0216 21:00:08.555992 3136 daemon.go:1258] state: Degraded
I0216 21:00:08.566365 3136 update.go:2089] Running: rpm-ostree cleanup -r
Deployments unchanged.
I0216 21:00:08.647332 3136 update.go:2104] Disk currentConfig rendered-worker-263c6ea5fafb6f1da35a31749a1180d7 overrides node's currentConfig annotation rendered-worker-8ebd95b2c00a22992daf1248ebc5640f
I0216 21:00:08.651201 3136 daemon.go:1564] Validating against pending config rendered-worker-263c6ea5fafb6f1da35a31749a1180d7
E0216 21:00:10.291740 3136 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-263c6ea5fafb6f1da35a31749a1180d7: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:f454144d2c32aa6fd99b8c68082f59554751282865dce6a866916d3c75fb02ee", have "quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:73b311468554ffe8bdd0dd51df7dafd7a791a16c3147374cc7b28f0d3d7fcc17" ("b5390d80a8b7f90b0b64f9db3e92848591c967612740716c656c6e88696e0c3f")
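For reference, the desired rendered config and the osImageURL it points to can be read directly from the cluster (object names below are taken from the log above):
oc get machineconfig rendered-worker-263c6ea5fafb6f1da35a31749a1180d7 -o jsonpath='{.spec.osImageURL}{"\n"}'
oc get node worker0.xx.com -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}{"\n"}'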
I've had this problem before and followed the Red Hat solutions, running the commands below. But this is now failing.
oc debug node/worker0.xx.com
sh-4.4# chroot /host
sh-4.4# rpm-ostree status
State: idle
Deployments:
* db83d20cf09a263777fcca78594b16da00af8acc245d29cc2a1344abc3f0dac2
Version: 412.86.202301311551-0 (2023-01-31T15:54:05Z)
sh-4.4#
sh-4.4# /run/bin/machine-config-daemon pivot "quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:f454144d2c32aa6fd99b8c68082f59554751282865dce6a8669163c75fb02ee"
I0216 21:02:54.449270 3962714 run.go:19] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-821872843 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:f454144d2c32aa6fd99b8c68082f59554751282865dce6a866916d3c75fb02ee
I0216 21:03:48.349962 3962714 rpm-ostree.go:209] Previous pivot: quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:73b311468554ffe8bdd0dd51df7dafd7a791a16c3147374cc7b28f0d3d7fcc17
I0216 21:03:49.926169 3962714 rpm-ostree.go:246] No com.coreos.ostree-commit label found in metadata! Inspecting...
I0216 21:03:49.926234 3962714 rpm-ostree.go:412] Running captured: ostree refs --repo /run/mco-machine-os-content/os-content-821872843/srv/repo
error: error running ostree refs --repo /run/mco-machine-os-content/os-content-821872843/srv/repo: exit status 1
error: opening repo: opendir(/run/mco-machine-os-content/os-content-821872843/srv/repo): No such file or directory
sh-4.4#
After a reboot and a retry, I'm now getting:
sh-4.4# /run/bin/machine-config-daemon pivot "quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:f454144d2c32aa6fd99b8c68082f59554751282865dce6a866916375fb02ee"
I0217 19:10:06.928154 1443914 run.go:19] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-903744214 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:f454144d2c32aa6fd99b8c68082f59554751282865dce6a8669163c75fb02ee
error: "quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:f454144d2c32aa6fd99b8c68082f59554751282865dce6a8669163c75fb02ee" is not a valid image reference: invalid checksum digest length
W0217 19:10:07.176459 1443914 run.go:45] nice failed: running nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-903744214 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:f454144d2c32aa6fd99b8c68082f59554751282865dce6a8669163c75fb02ee failed: error: "quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:f454144d2c32aa6fd99b8c68082f59554751282865dce6a8669163c75fb02ee" is not a valid image reference: invalid checksum digest length
: exit status 1; retrying...
^C
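The error points at the image reference itself. A quick sanity check is to count the digest characters, since a valid sha256 digest is exactly 64 hex characters (the digest below is the one from the command above):
# should print 64 for a valid sha256 digest
echo -n "f454144d2c32aa6fd99b8c68082f59554751282865dce6a866916375fb02ee" | wc -c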
I tried this:
/run/bin/machine-config-daemon pivot "quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:f454144d2c32aa6fd99b8c68082f59554751282865dce6a866916375fb02ee"
expecting this result (from a previous upgrade problem):
sh-4.4# chroot /host
sh-4.4# /run/bin/machine-config-daemon pivot "quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:73b311468554ffe8bdd0dd51df7dafd7a791a16c3147374cc7b28f0d3d7fcc17"
I0208 21:50:00.408235 2962835 run.go:19] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-3432684387 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:73b311468554ffe8bdd0dd51df7dafd7a791a16c3147374cc7b28f0d3d7fcc17
I0208 21:50:29.727695 2962835 rpm-ostree.go:353] Running captured: rpm-ostree status --json
I0208 21:50:29.780350 2962835 rpm-ostree.go:261] Previous pivot: quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:7c252d64354d207cd7fb2a6e2404e611a29bf214f63a97345dee1846055c15d8
I0208 21:50:31.456928 2962835 rpm-ostree.go:293] Pivoting to: 411.86.202301242231-0 (b5390d80a8b7f90b0b64f9db3e92848591c967612740716c656c6e88696e0c3f)
I0208 21:50:31.456966 2962835 rpm-ostree.go:325] Executing rebase from repo path /run/mco-machine-os-content/os-content-3432684387/srv/repo with customImageURL pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:73b311468554ffe8bdd0dd51df7dafd7a791a16c3147374cc7b28f0d3d7fcc17 and checksum b5390d80a8b7f90b0b64f9db3e92848591c967612740716c656c6e88696e0c3f
I0208 21:50:31.457048 2962835 update.go:1972] Running: rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-3432684387/srv/repo:b5390d80a8b7f90b0b64f9db3e92848591c967612740716c656c6e88696e0c3f --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev#sha256:73b311468554ffe8bdd0dd51df7dafd7a791a16c3147374cc7b28f0d3d7fcc17 --custom-origin-description Managed by machine-config-operator
0 metadata, 0 content objects imported; 0 bytes content written
Staging deployment... done
Upgraded:
NetworkManager 1:1.30.0-16.el8_4 -> 1:1.36.0-12.el8_6
< snip>
zlib 1.2.11-18.el8_4 -> 1.2.11-19.el8_6
Removed:
ModemManager-glib-1.10.8-2.el8.x86_64
libmbim-1.20.2-1.el8.x86_64
libqmi-1.24.0-1.el8.x86_64
openvswitch2.16-2.16.0-108.el8fdp.x86_64
redhat-release-coreos-410.84-2.el8.x86_64
Added:
WALinuxAgent-udev-2.3.0.2-2.el8_6.3.noarch
glibc-gconv-extra-2.28-189.5.el8_6.x86_64
libbpf-0.4.0-3.el8.x86_64
openvswitch2.17-2.17.0-67.el8fdp.x86_64
redhat-release-8.6-0.1.el8.x86_64
redhat-release-eula-8.6-0.1.el8.x86_64
shadow-utils-subid-2:4.6-16.el8.x86_64
Run "systemctl reboot" to start a reboot
sh-4.4# systemctl reboot
When I run realize start for a Go program, I get this error:
[14:55:13][V2-USER-API.YUMMY.ID] : Watching 159 file/s 118 folder/s
[14:55:13][V2-USER-API.YUMMY.ID] : Install started
[14:55:13][V2-USER-API.YUMMY.ID] : Install
exec: not started
I have set up my .realize.yaml file like this:
settings:
  legacy:
    force: false
    interval: 0s
schema:
  - name: v2-user-api.yummy.id
    path: ./cmd/server
    commands:
      run:
        status: true
    watcher:
      extensions:
        - go
      paths:
        - ../../
      ignored_paths:
        - .git
        - .realize
        - vendor
but I still get the error above after running realize start.
The following script works for me:
#!/usr/bin/env bash
# install realize v2.0.2 with Go modules disabled (GOPATH mode)
export GO111MODULE=off
cd ~/
go get github.com/oxequa/realize
# pin to the v2.0.2 tag and rebuild; the sources live under GOPATH
cd "$(go env GOPATH)/src/github.com/oxequa/realize" && \
git fetch && \
git checkout v2.0.2 && \
go get github.com/oxequa/realize
RV=$(realize --version)
echo "Realize installed #: $RV"
export GO111MODULE=on
Use realize version v2.0.2.
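With v2.0.2 on the PATH, run realize from the directory that contains .realize.yaml (the project path below is only an example):
cd ~/projects/v2-user-api   # hypothetical repo root containing .realize.yaml
realize start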
Getting Python to run from cgi-bin causes the lighttpd daemon to fail to start.
$HTTP["url"] =~ "^/cgi-bin/" {
alias.url += ( "/cgi-bin/" => "/var/www/cgi-bin" )
cgi.assign = (".py" => "/usr/bin/python")
}
Am I doing something wrong?
I also have the following added at the beginning of /etc/lighttpd/lighttpd.conf:
server.modules = (
"mod_indexfile",
"mod_setenv",
"mod_access",
"mod_alias",
"mod_redirect",
"mod_cgi"
)
Look in the lighttpd error log and review the trace for what you broke.
Alternatively, run lighttpd pre-flight tests on your config:
lighttpd -tt -f /etc/lighttpd/lighttpd.conf
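If the error log location isn't obvious, either of these usually shows why startup failed (assuming a systemd service and the default log file location; adjust for your distro):
journalctl -u lighttpd --no-pager -n 50
tail -n 50 /var/log/lighttpd/error.log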
I am trying to provision an AWS AMI, but the Packer build terminates with the following error.
Build 'amazon-ebs' errored: Script exited with non-zero exit status: 127
==> Some builds didn't complete successfully and had errors:
--> amazon-ebs: Script exited with non-zero exit status: 127
==> Builds finished but no artifacts were created.
My template for Packer is as follows:
{
  "variables": {
    "aws_access_key": "{{env `MY_ACCESS_KEY`}}",
    "aws_secret_key": "{{env `MY_SECRET_KEY`}}"
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "us-east-1",
    "source_ami": "ami-8c1be5f6",
    "instance_type": "t2.micro",
    "ssh_username": "ec2-user",
    "ami_name": "packer-example {{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "script": "provision.sh"
  }]
}
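For reference, the template syntax can also be checked on its own, separately from a full build:
packer validate template.json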
The error log (PACKER_LOG=1 packer build template.json) is as follows:
2017/11/01 01:30:37 [INFO] (telemetry) ending amazon-ebs
2017/11/01 01:30:37 [INFO] (telemetry) found error: Script exited with non-zero exit status: 127
2017/11/01 01:30:37 ui error: Build 'amazon-ebs' errored: Script exited with non-zero exit status: 127
2017/11/01 01:30:37 Builds completed. Waiting on interrupt barrier...
2017/11/01 01:30:37 machine readable: error-count []string{"1"}
2017/11/01 01:30:37 ui error:
==> Some builds didn't complete successfully and had errors:
2017/11/01 01:30:37 machine readable: amazon-ebs,error []string{"Script exited with non-zero exit status: 127"}
2017/11/01 01:30:37 ui error: --> amazon-ebs: Script exited with non-zero exit status: 127
2017/11/01 01:30:37 ui:
==> Builds finished but no artifacts were created.
Build 'amazon-ebs' errored: Script exited with non-zero exit status: 127
provision.sh consists of:
#!/bin/bash
sudo yum install httpd -y
sudo yum update -y
sudo aws s3 cp s3://zbcxlkxcjvlxkj/index1.html /var/www/html/ --region us-east-1
sudo service httpd start
sudo chkconfig httpd on
Exit code 127 is returned by bash when a command is not found. Most likely, not every command used in your script is installed on the source AMI before the provisioner runs.
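A quick way to find the offending command is to trace the script and verify up front that every binary it relies on exists on the source AMI (a debugging sketch, not part of the original provision.sh):
#!/bin/bash
set -x                                    # echo each command as it runs
for cmd in yum aws service chkconfig; do
  command -v "$cmd" >/dev/null || echo "missing on the AMI: $cmd"
done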
I am creating a CloudFormation template which creates resources such as an EC2 instance, an Auto Scaling group, and a launch configuration.
In the UserData property of the LaunchConfiguration resource, I tried to install the CloudWatch Logs agent as follows:
"UserData":{ "Fn::Base64" : {
"Fn::Join" : ["", [
"#!/bin/bash -xe\n",
"yum -y install aws-cfn-bootstrap\n",
"/opt/aws/bin/cfn-init -v",
" --stack ", { "Ref": "AWS::StackName" },
" --resource LaunchCongig",
" --region ", { "Ref" : "AWS::Region" },"\n",
"yum -y install wget\n",
"# Get the CloudWatch Logs agent\n",
"wget https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py\n",
"# Install the CloudWatch Logs agent\n",
"python ./awslogs-agent-setup.py -n -r ", { "Ref" : "AWS::Region" }, " -c /etc/cwlogs.cfg || error_exit 'Failed to run CloudWatch Logs agent setup'\n",
"service awslogs start"
]]}
After SSHing into the instance, I checked /var/log/cloud-init-output.log to see whether everything was fine, but here is what I got:
+ wget https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py
--2017-02-17 14:36:10-- https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.226.59
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.226.59|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 47998 (47K) [text/x-python]
Saving to: ‘awslogs-agent-setup.py’
0K .......... .......... .......... .......... ...... 100% 196K=0.2s
2017-02-17 14:36:10 (196 KB/s) - ‘awslogs-agent-setup.py’ saved [47998/47998]
+ python ./awslogs-agent-setup.py -n -r eu-west-1 -c /etc/cwlogs.cfg
Step 1 of 5: Installing pip ...Traceback (most recent call last):
File "./awslogs-agent-setup.py", line 1144, in <module>
main()
File "./awslogs-agent-setup.py", line 1140, in main
setup.setup_artifacts()
File "./awslogs-agent-setup.py", line 693, in setup_artifacts
self.install_pip()
File "./awslogs-agent-setup.py", line 600, in install_pip
fail("Could not install pip. Please try again or see " + AGENT_SETUP_LOG_FILE + " for more details")
TypeError: fail() takes exactly 2 arguments (1 given)
+ error_exit 'Failed to run CloudWatch Logs agent setup'
/var/lib/cloud/instance/scripts/part-001: line 8: error_exit: command not found
Feb 17 14:36:12 cloud-init[2798]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [127]
Feb 17 14:36:12 cloud-init[2798]: cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
Feb 17 14:36:12 cloud-init[2798]: util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
Cloud-init v. 0.7.6 finished at Fri, 17 Feb 2017 14:36:12 +0000. Datasource DataSourceEc2. Up 85.78 seconds
What is wrong with this script? Is there any other way to install the agent?
Thank you.
EDIT:
I figured it might be because the python-pip package didn't get installed, so I added this to the UserData:
"yum -y install python-pip\n",
After that I ran the template again and, strangely, I got the same error.
I am using an Amazon ECS-optimized AMI.
I solved the problem by installing the agent directly with yum install awslogs:
"UserData":{ "Fn::Base64" : {
"Fn::Join" : ["", [
"#!/bin/bash -xe\n",
"yum -y install aws-cfn-bootstrap\n",
"/opt/aws/bin/cfn-init -v",
" --stack ", { "Ref": "AWS::StackName" },
" --resource launchConfig",
" --region ", { "Ref" : "AWS::Region" },"\n",
"yum -y install awslogs\n",
"service awslogs start"
]]}
Here is the output from the log file:
Installed:
awslogs.noarch 0:1.1.2-1.10.amzn1
Dependency Installed:
aws-cli.noarch 0:1.11.29-1.45.amzn1
aws-cli-plugin-cloudwatch-logs.noarch 0:1.3.3-1.15.amzn1
freetype.x86_64 0:2.3.11-15.14.amzn1
libjpeg-turbo.x86_64 0:1.2.90-5.14.amzn1
mailcap.noarch 0:2.1.31-2.7.amzn1
python27-botocore.noarch 0:1.4.86-1.62.amzn1
python27-colorama.noarch 0:0.2.5-1.7.amzn1
python27-dateutil.noarch 0:2.1-1.3.amzn1
python27-docutils.noarch 0:0.11-1.15.amzn1
python27-futures.noarch 0:3.0.3-1.3.amzn1
python27-imaging.x86_64 0:1.1.6-19.9.amzn1
python27-jmespath.noarch 0:0.9.0-1.11.amzn1
python27-ply.noarch 0:3.4-3.12.amzn1
python27-pyasn1.noarch 0:0.1.7-2.9.amzn1
python27-rsa.noarch 0:3.4.1-1.8.amzn1
Complete!
+ service awslogs start
Starting awslogs: [ OK ]
Cloud-init v. 0.7.6 finished at Fri, 17 Feb 2017 15:33:42 +0000. Datasource DataSourceEc2. Up 83.47 seconds
Everything works fine this way. Hope this helps someone someday!
For ECS specifically, see Using CloudWatch Logs with Container Instances in the EC2 Container Service documentation for details on configuring CloudWatch Logs. The documentation recommends using yum install -y awslogs instead of the Python install script.
The documentation provides a complete sample in the Configuring CloudWatch Logs at Launch with User Data section.
In your case, since you're already managing your config files using cfn-init and CloudFormation::Init metadata in CloudFormation, you don't need any complex parsing of config files in your User-Data script, but you can still use the script as a reference. One thing worth adding to your User-Data script is running chkconfig awslogs on to make sure the service continues running on the instance after a reboot.
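A minimal sketch of the relevant part of the User Data, combining the yum-based install from the answer above with that chkconfig step (everything else in the launch configuration stays as is):
#!/bin/bash -xe
yum -y install awslogs
service awslogs start
# keep the agent running after a reboot
chkconfig awslogs on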